Aren't you missing the hit/miss ratio, or am I losing it and should I drop out of Saturday's test??
Or maybe you're assuming that out of the 256KB working set, 16KB will hit in L1 and 96KB will hit in L2, respectively??
That's what I thought too, the hit rate is missing. But the working set is defined as the amount of data the program needs to access to run, in this case 256KB. Assuming the best case, 16KB will sit in the L1 cache (hitting every time it's needed), 96KB will sit in L2, and the rest is all the way down in main memory. You're not losing it, at least not yet. So consider the second case.
I agree that this problem is missing one piece of information, and that is the distribution of requests. If the working set is 256KB but 99% of the accesses happen to fall within the 16KB held in L1, then the response time will be pretty close to 2 cycles (5ns).
If you assume a uniform distribution of requests, and you assume the contents of the L1 cache are disjoint from the contents of L2 (is this reasonable?), I think your answer might be headed in the right direction, Wood.
But what if L1 is a strict subset of L2 (plus we assume a uniform distribution of requests)? Then the "L2 outside L1" is effectively only 96-16 KB = 80KB.
Then 16/256 = 1/16th of the requests can be satisfied in 2 cycles = 5ns.
And 80/256 = 5/16th of the requests can be satisfied in 4 cycles = 10ns.
The remaining 10/16th of the requests must be satisfied in 150ns.
Yielding (1/16)*5 + (5/16)*10 + (10/16)*150 = (5 + 50 + 1500)/16 = 1555/16 ns, about 97.2 ns, a hair under 100 ns.
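The inclusive-case calculation above can be sanity-checked with a few lines of Python (a sketch using only the sizes and latencies from this thread, and assuming a uniform access distribution):

```python
# Expected access time, assuming uniform accesses over a 256 KB working
# set and L1 a strict subset of L2 (inclusive caches).
# Latencies: L1 hit = 2 cycles, L2 hit = 4 cycles, memory = 150 ns.

CYCLE_NS = 1e3 / 400       # 400 MHz clock -> 2.5 ns per cycle
WORKING_SET = 256          # KB

l1_frac = 16 / WORKING_SET          # 1/16 of accesses hit L1
l2_frac = (96 - 16) / WORKING_SET   # 5/16 hit the 80 KB of L2 outside L1
mem_frac = 1 - l1_frac - l2_frac    # remaining 10/16 go to main memory

amat = l1_frac * 2 * CYCLE_NS + l2_frac * 4 * CYCLE_NS + mem_frac * 150
print(f"{amat:.4f} ns")  # 97.1875 ns, i.e. 1555/16
```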
Wood, I don't have the answer, but the way I'm solving it I'm getting a different result. Remember we have two levels of cache.
Clock is 400MHz, so 1 cycle is 2.5ns.
L1 hit ratio is 16/256 = 1/16.
L2 hit ratio is 96/256 = 3/8.
Memory (exclusively: not in L1, not in L2):
Hit ratio = 1 - 1/16 - 3/8 = 1 - 1/16 - 6/16 = 9/16.
Access time = (1/16)*5 + (3/8)*10 + (9/16)*150 =
= (5 + 60 + 1350)/16 = 1415/16 ≈ 88.44ns
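The same check in Python for this version, which assumes L1 and L2 hold disjoint data (exclusive caches), so the full 96 KB of L2 counts:

```python
# Expected access time under the exclusive-cache assumption:
# L1 and L2 contents are disjoint, uniform accesses over 256 KB.
CYCLE_NS = 2.5             # 400 MHz clock -> 2.5 ns per cycle

l1_frac = 16 / 256                 # 1/16
l2_frac = 96 / 256                 # 3/8
mem_frac = 1 - l1_frac - l2_frac   # 9/16

amat = l1_frac * 2 * CYCLE_NS + l2_frac * 4 * CYCLE_NS + mem_frac * 150
print(f"{amat:.4f} ns")  # 88.4375 ns, i.e. 1415/16
```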
I don't think so; it would be really dumb to keep two copies of the L1 data. Why? Because you'd be storing less data close to the CPU, and you'd need to update both copies no matter what write policy you use. I think that to the CPU, L1 and L2 look like one big cache, where the main difference between them is the access time.
Miss L1, Miss L2:
- find a line in L1 to evict into L2
- find a line in L2 to throw out to memory, and place that evicted L1 line in L2
- load the new data into L1
Miss L1, Hit L2:
- the 4-cycle L2 access time is fine, but to keep the hot line close to the CPU:
- find a line in L1 to evict into L2
- bring the L2 line into L1
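The swapping described above can be sketched as a toy Python model (the single-set representation, capacities, and the `access` helper are all illustrative, not from any real simulator; there's no geometry or replacement policy, just the exclusive-policy bookkeeping):

```python
def access(addr, l1, l2, l1_cap, l2_cap):
    """Touch addr; return which level served it and update L1/L2 (exclusive policy)."""
    if addr in l1:
        return "L1"
    if addr in l2:
        # Miss L1, hit L2: swap the line with an L1 victim.
        l2.discard(addr)
        if len(l1) >= l1_cap:
            victim = l1.pop()       # line thrown out of L1 ...
            if len(l2) >= l2_cap:
                l2.pop()            # ... evicting an L2 line to memory
            l2.add(victim)          # ... lands in L2
        l1.add(addr)
        return "L2"
    # Miss both: the L1 victim falls to L2, new data loads into L1.
    if len(l1) >= l1_cap:
        victim = l1.pop()
        if len(l2) >= l2_cap:
            l2.pop()
        l2.add(victim)
    l1.add(addr)
    return "MEM"

l1, l2 = set(), set()
print(access(1, l1, l2, 1, 2))  # MEM: cold miss
print(access(1, l1, l2, 1, 2))  # L1: now resident
print(access(2, l1, l2, 1, 2))  # MEM: line 1 falls to L2
print(access(1, l1, l2, 1, 2))  # L2: swapped back up into L1
```

Note that no address ever lives in both sets at once, which is the whole point of the exclusive policy.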
Does this make sense? Why would anyone want two copies of the L1 data?
In a Celeron processor, the L2 duplicates the data in the L1.
In a Duron processor, the L2 does not duplicate the data in the L1.
Look into it. On the real test, ETS has to tell us whether L2 duplicates L1, just as they have to tell us that the access distribution is uniform.
Thanks for the problem, though. It's a good practice item.