language agnostic - Associative cache simulation - Dealing with a Faulty Scheme -
while working on simulating associative cache (in mips assembly), couple of questions came mind based on information read online;
according notes university of maryland
finding slot: @ most, 1 slot should match. if there more 1 slot matches, have faulty fully-associative cache scheme. should never have more 1 copy of cache line in slot of fully-associative cache. it's hard maintain multiple copies, , doesn't make sense. slots used other cache lines.
does mean should check time whole tag list in order check second match? after if don't, never "realize" fault cache, yet, checking every single time seems quite inefficient.
in case check, , somehow manage find second match, meaning faulty cache scheme, shall then? although best answer fix implementation, yet im interested on how handle during execution if situation should arise.
if more 1 valid slot matches address, means when previous search same address executed, either valid slot should have matched address not used (perhaps because not checked in first place) or more 1 invalid slot used store line wasn't in cache @ all.
without doubt, should considered bug.
but if we've decided not fix bug (maybe we'd rather not commit hardware better implementation) obvious option pick 1 of slots invalidate. available other cache lines.
as how pick 1 invalidate, if 1 of duplicate lines clean, invalidate 1 in preference dirty cache line. if more cache line dirty and disagree have bigger bug fix, @ rate cache out of sync , doesn't matter pick.
edit: here's how might implement hardware this:
first off, doesn't make whole lot of sense start assumption of duplicates, rather we'll work around @ appropriate time later. there few possibilities of must happen when caching new line.
- the line in cache, no action needed
- the line not in cache there invalid slots available: place new line 1 of available slots
- the line not in cache there no invalid slots available. valid line must evicted , new line takes place.
- picking eviction candidate has performance consequences. clean cache lines can evicted free, if chosen poorly, can cause cache miss in near future. consider if 1 cache line dirty. if clean cache line evicted, many sequential reads alternating between 2 addresses cause cache miss on every read. cache invalidation among 2 hard problems in comp sci (the other being 'naming things') , out of scope of exact question.
i implement search checks correct slot act on each of these. block pick first list , act on it.
now, getting question. conditions under duplicates possibly enter cache. if memory accesses strictly ordered, , implementation (as above) correct, don't think duplicates possible @ all. , there's no need check them.
now lets consider more implausible case single cache shared across 2 cpu cores. we're going simplest thing work , duplicate except cache memory each core. slot searching hardware not shared. support this, bit per slot used mutex. search hardware cannot use slot locked other core. specifically,
- if address in cache, try lock slot , return slot. if slot locked, stall until free.
- if address not in cache, find unlocked slot invalid or valid evictable.
in case can end in position 2 slots share same address. if both cores try write address not in cache, end getting different slots, , duplicate line occur. first lets think happen:
- both lines reads main memory. same value , both clean. correct evict either.
- both lines writes. both dirty, not equal. race condition should have been resolved application issuing memory fences or other memory ordering instructions. cannot guess 1 should used, if there no cache race condition persist ram. correct evict either.
- one line read , 1 write. write dirty read clean. once again race condition have persisted ram if there no intervening cache, reader have seen different value. evicting clean line right ram, , has side effect of favoring read write ordering.
so know it, logic belong. first lets think happen if don't anything. subsequent cache access same address on either core return either line. if neither core issuing writes, reads keep coming different, alternating between 2 values. breaks every conceivable idea memory ordering.
one solution might dirty lines belong 1 core only, line not dirty, dirty and owned core.
- in case of 2 concurrent reads, both lines identical, unlocked , interchangeable. doesn't matter line core gets subsequent operations.
- in case of concurrent writes, both lines out of sync, mutually invisible. although race condition creates unfortunate, still leads reasonable memory ordering, if of operations happen on discarded line happened before of operations on cleaned line.
- if read , write happen concurrently, dirty line invisible reading core. however, clean line visible both cores, , cause memory ordering break down writer. future writes cause lock both (because both dirty).
that last case pretty militates dirty lines preferred clean ones. forces @ least some hardware dirty lines first , clean lines if no dirty lines found. have new concurrent cache implementation:
- if address in cache , dirty , owned requesting core, use slot
- if address in cache clean
- for reads, use slot
- for writes, mark slot dirty , use slot
- if address not in cache , there invalid slots, use invalid slot
- if there no invalid slots, evict line , use slot.
we're getting closer, there's still hole in implementation. if both cores access same address not concurrently. simplest thing dirty lines invisible other cores. in cache dirty same not being in cache @ all.
now have think providing tool applications synchronize. i'd tool explicitly flushes line if dirty. invoke same hardware used during eviction, marks line clean instead of invalid.
to make long post short, idea deal duplicates not removing them, making sure cannot lead further memory ordering issues, , leaving deduplication work application or eventual eviction.
Comments
Post a Comment