What's the difference between physical and virtual cache?

I am having trouble understanding what a virtual cache actually is. I understand virtual memory.

If the CPU wants to access memory, as far as I understand, it sends a virtual address to the MMU, which, using page tables, works out the physical memory address.

In parallel with this, the CPU sends part of the address (just the end of the virtual address), which consists of a set number, a tag, and an offset, to the cache, which then works out whether the data resides in the cache.
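For concreteness, here is a minimal sketch of the decomposition I have in mind, assuming a hypothetical 32 KiB, 8-way cache with 64-byte lines (the sizes are made up):

```python
# Hypothetical parameters: 32 KiB, 8-way set-associative, 64-byte lines.
CACHE_SIZE = 32 * 1024
WAYS = 8
LINE_SIZE = 64

SETS = CACHE_SIZE // (WAYS * LINE_SIZE)     # 64 sets
OFFSET_BITS = (LINE_SIZE - 1).bit_length()  # 6 bits of line offset
INDEX_BITS = (SETS - 1).bit_length()        # 6 bits of set index

def split_address(addr):
    """Split an address into (tag, set index, line offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

So with these made-up sizes, bits 0-5 are the offset, bits 6-11 are the index, and the remaining bits are the tag.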

How does a virtual cache differ from this?



There are four ways to address a cache depending on whether virtual or physical address bits are used for indexing and/or for tagging.

Because indexing the cache is the most time-critical step (all the ways in a set are read in parallel, and the appropriate way is then selected based on a tag comparison), caches are typically indexed with the virtual address, allowing indexing to begin before address translation has completed. However, if only bits within the page offset are used for indexing (e.g., each way is no larger than the page size and a simple modulo of the way size is used for indexing¹), then this indexing actually uses the physical address, since those bits are identical in the virtual and physical addresses. It is not uncommon for L1 associativity to be increased primarily so that a larger cache can still be indexed by the physical address.
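That way-size condition can be checked with simple arithmetic. A sketch assuming 4 KiB pages (the function name and the example geometries are illustrative):

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

def index_is_physical(cache_size, ways, page_size=PAGE_SIZE):
    """True if each way fits within one page, so every index bit lies
    inside the page offset and 'virtual' indexing actually uses
    physical address bits."""
    way_size = cache_size // ways
    return way_size <= page_size

# 32 KiB, 8-way: 4 KiB ways -> all index bits are physical.
# 32 KiB, 4-way: 8 KiB ways -> one index bit comes from the virtual page number.
```

This is why doubling L1 capacity often comes with doubled associativity: the way size, not the total size, is what must stay within a page.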

While indexing with the physical address is possible even when ways are larger than the page size (e.g., by predicting the more significant bits, or by a fast translation mechanism that provides those bits, using the delay of indexing with the known physical address bits to hide the translation latency), it is not commonly done.

Using virtual addresses for tagging allows a cache hit to be determined before translation has completed. Permissions still need to be checked before the access can commit, but for loads the data can be forwarded to the execution units and computation on that data begun, and for stores the data can be sent to a buffer to allow delayed commitment of state. A permission exception would flush the pipeline, so handling it does not add undue design complexity.

(The vhints used by the Pentium 4 data cache provided this latency advantage by using a subset of the virtual address bits, which are available early, to speculatively select the way.)

(In the days of optional external MMUs, virtual address tags could be particularly attractive in pushing the translation almost entirely outside of the cache design.)

Although virtually indexed and virtually tagged caches can have significant latency advantages, they also introduce the potential for aliasing, where the same virtual address maps to different physical addresses (homonyms) or different virtual addresses map to the same physical address (synonyms). Indexing and tagging with physical addresses avoids both forms of aliasing.

The homonym problem is relatively easily solved by using address space identifiers (ASIDs). (Flushing the cache when changing address spaces will also guarantee no homonyms, but doing so is relatively expensive. At least partial flushing would be needed when an ASID is reused for a different address space, but an 8-bit ASID can avoid flushes on most address space changes.) Typically ASIDs are managed by the operating system, but some systems have provided hardware checks for ASID reuse based on the page table base address.
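As an illustration of how an ASID disambiguates homonyms, here is a toy, dictionary-based model of a virtually tagged cache (the class and method names are invented):

```python
class VCache:
    """Toy virtually tagged cache keyed by (ASID, virtual tag), so the
    same virtual tag in two address spaces (a homonym) cannot collide."""

    def __init__(self):
        self.lines = {}

    def fill(self, asid, vtag, data):
        self.lines[(asid, vtag)] = data

    def lookup(self, asid, vtag):
        return self.lines.get((asid, vtag))  # None means a miss

cache = VCache()
cache.fill(asid=1, vtag=0x400, data="process A's line")
cache.fill(asid=2, vtag=0x400, data="process B's line")
# Same virtual tag, different ASIDs: two distinct lines, no homonym clash.
```

Without the ASID in the key, the second fill would silently overwrite the first, which is exactly the homonym hazard.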

The synonym problem is more difficult to solve. On a cache miss, the physical addresses of any possible aliases must be checked to determine if an alias is present in the cache. If aliasing is avoided in the indexing—by indexing with the physical address or by the operating system guaranteeing that aliases have the same bits in the index (page coloring)—then only the one set needs to be probed. By relocating any detected synonym to the set indicated by the more recently used virtual address, the alias is avoided in the future (until a different mapping of the same physical address occurs).
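The page-coloring guarantee can be expressed as a simple predicate. A sketch assuming 4 KiB pages and two virtual index bits above the page offset (both numbers are illustrative):

```python
PAGE_SHIFT = 12   # assumed 4 KiB pages
COLOR_BITS = 2    # assumed: each way is 4x the page size -> 2 virtual index bits

def page_color(vaddr):
    """The virtual index bits that lie above the page offset."""
    return (vaddr >> PAGE_SHIFT) & ((1 << COLOR_BITS) - 1)

def same_set(vaddr_a, vaddr_b):
    """Two synonyms index the same set iff their pages share a color,
    which is exactly what a page-coloring OS guarantees."""
    return page_color(vaddr_a) == page_color(vaddr_b)
```

A page-coloring allocator simply refuses to map a physical page at a virtual address whose color differs from that of any existing mapping of the same page.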

In a direct-mapped, virtually tagged cache without index aliasing, a further simplification is possible. Since the potential synonym will conflict with the request and be evicted, either any necessary writeback of a dirty line can be done before the cache miss is handled (so a synonym would be found in memory or in a physically addressed higher-level cache), or a physically addressed writeback buffer can be probed before the cache line fetched from memory (or a higher-level cache) is installed. An unmodified alias need not be checked, since the memory contents will be the same as those in the cache; the only cost is unnecessary miss handling. This avoids the need for additional physical tags for the whole cache and allows translation to be relatively slow.
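To make the writeback-buffer variant concrete, here is a toy model of such a cache; it probes a physically addressed writeback buffer before installing a line fetched from memory. Draining the buffer to memory is omitted, and all names are invented:

```python
class DirectMappedVCache:
    """Toy direct-mapped, virtually tagged cache with no index aliasing:
    any synonym conflicts with the request's set and is evicted. A dirty
    victim goes to a physically addressed writeback buffer, which is
    probed before a line from memory is installed, so a just-evicted
    dirty synonym is never lost. (Draining the buffer is omitted.)"""

    def __init__(self, num_sets, memory):
        self.num_sets = num_sets
        self.memory = memory      # paddr -> data, the backing store
        self.lines = {}           # set index -> (vtag, paddr, data, dirty)
        self.wb_buffer = {}       # paddr -> data

    def load(self, vaddr, paddr):
        index, vtag = vaddr % self.num_sets, vaddr // self.num_sets
        line = self.lines.get(index)
        if line and line[0] == vtag:
            return line[2]                      # hit
        if line and line[3]:                    # evict a dirty victim
            self.wb_buffer[line[1]] = line[2]
        # Probe the writeback buffer by physical address before memory.
        data = self.wb_buffer.pop(paddr, self.memory.get(paddr))
        self.lines[index] = (vtag, paddr, data, False)
        return data

    def store(self, vaddr, paddr, data):
        self.load(vaddr, paddr)                 # allocate on write
        index, vtag = vaddr % self.num_sets, vaddr // self.num_sets
        self.lines[index] = (vtag, paddr, data, True)
```

With num_sets=4, virtual addresses 0x10 and 0x50 are synonyms for the same set: a store through one and a load through the other still observes the dirty data via the buffer probe.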

If there is no guaranteed avoidance of aliasing in the index, then even a physically tagged cache would need to check the other sets that might contain aliases. (For one non-physical bit of index, a second probing of the cache in the single alternative set may be acceptable. This would be similar to pseudo-associativity.)

For a virtually tagged cache, an extra set of physical address tags can be provided. These tags would only be accessed on misses and can be used for I/O and multiprocessor cache coherence. (Since both misses and coherence requests are relatively rare, this sharing is not typically problematic.)

AMD's Athlon, which used physical tagging with virtual indexing, provided a separate set of tags for coherence probes and alias detection. Since three virtual-only address bits were used for indexing, seven alternative sets had to be probed for possible aliases on a miss. Since this could be done while waiting for a response from the L2 cache, it did not add latency, and the extra set of tags could also be used for coherence requests, which were more frequent given the exclusivity of the L2 cache.
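The count of alternative sets follows directly from the number of virtual-only index bits; the geometry below (64 KiB, two-way, 4 KiB pages) is my reconstruction of the Athlon L1, consistent with the three virtual-only bits mentioned above:

```python
WAY_SIZE = 64 * 1024 // 2   # 32 KiB per way (64 KiB, two-way, assumed)
PAGE_SIZE = 4 * 1024        # assumed 4 KiB pages

# 32 KiB / 4 KiB = 8, i.e. 3 index bits above the page offset.
VIRTUAL_INDEX_BITS = (WAY_SIZE // PAGE_SIZE - 1).bit_length()

# The physical index bits pin a line to one of 2**3 = 8 candidate
# sets; on a miss, the 7 other sets must be probed for aliases.
ALTERNATIVE_SETS = (1 << VIRTUAL_INDEX_BITS) - 1
```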

For a large virtually indexed L1 cache, an alternative to probing many additional sets would be to provide a physical to virtual translation cache. On a miss (or coherence probe) the physical address would be translated to the virtual address that might be used in the cache. Since providing a translation cache entry for each cache line would be impractical, a means would be needed to invalidate cache lines when a translation is evicted.

If aliasing (at least of writable addresses) is guaranteed not to occur, e.g., in a typical single address space operating system, then the only disadvantage of a virtually addressed cache is the extra tag overhead from the fact that virtual addresses in such systems are larger than physical addresses. Hardware designed for a single address space OS could use a permission lookaside buffer instead of a translation lookaside buffer, delaying translation until a last level cache miss.
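That extra tag overhead is easy to quantify. A sketch with assumed widths (64-bit virtual and 44-bit physical addresses, 64 sets, 64-byte lines; all numbers illustrative):

```python
def tag_bits(addr_bits, index_bits, offset_bits):
    """Tag width = address bits not consumed by set index or line offset."""
    return addr_bits - index_bits - offset_bits

# Assumed widths: 64-bit virtual vs. 44-bit physical addresses,
# 64 sets (6 index bits), 64-byte lines (6 offset bits).
VIRTUAL_TAG = tag_bits(64, 6, 6)     # 52 tag bits per line
PHYSICAL_TAG = tag_bits(44, 6, 6)    # 32 tag bits per line
EXTRA_BITS_PER_LINE = VIRTUAL_TAG - PHYSICAL_TAG  # 20 extra bits per line
```

Twenty extra bits per 64-byte line is under 4% additional storage, which is why this disadvantage is usually considered minor.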


¹ Skewed associativity indexes the different ways of the cache with different hashes, based on more bits than are necessary for modulo indexing of same-sized ways. It is useful for reducing conflict misses, but it can introduce aliasing problems that would not be present in a modulo-indexed cache of the same size and associativity.
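A toy illustration of skewed indexing (the XOR-folding hash below is just an example, not any particular design):

```python
def skewed_indexes(addr, ways, sets, offset_bits=6):
    """Index each way with a different hash of the address bits,
    instead of using the same modulo index for every way."""
    base = addr >> offset_bits  # drop the line offset
    return [(base ^ (base >> (offset_bits + w))) % sets for w in range(ways)]

# The same address can land in different sets in different ways, which
# reduces conflict misses but complicates alias and synonym lookup.
```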