Programming Model
3.6 Memory and Cache Coherency
The 604e can support a fully coherent 4-Gbyte (2^32) memory address space. Bus snooping is used to drive a four-state (MESI) cache coherency protocol which ensures the coherency of all processor and direct-memory access (DMA) transactions to and from global memory with respect to each processor’s cache. It is important that all bus participants employ similar snooping and coherency control mechanisms. The coherency of memory is maintained at a granularity of 32-byte cache blocks (this size is also called the coherency or cache-block size).
All instruction and data accesses are performed under the control of the four memory/cache access attributes:
• Write-through (W attribute)
• Caching-inhibited (I attribute)
• Memory coherency (M attribute)
• Guarded (G attribute)
These attributes are programmed by the operating system for each page and block. The W and I attributes control how the processor performing an access uses its own cache. The M attribute ensures that coherency is maintained for all copies of the addressed memory location. The G attribute prevents speculative loading and prefetching from the addressed memory location.
3.6.1 Data Cache Coherency Protocol
Each 32-byte cache block in the 604e data cache is in one of four states. Addresses presented to the cache are indexed into the cache directory and are compared against the cache directory tags. If no tags match, the result is a cache miss. If a tag match occurs, a cache hit has occurred and the directory indicates the state of the block through three state bits kept with the tag.
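The lookup described above can be sketched as follows. This is a minimal illustration, not the 604e's actual directory logic: the set count and associativity are assumed values chosen for the example, the class and method names are hypothetical, and a block in the invalid state is treated as a miss for simplicity.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 32   # bytes per cache block (from the text)
NUM_SETS = 256    # assumed geometry, for illustration only
NUM_WAYS = 4      # assumed associativity, for illustration only

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # 5 bits of byte offset
INDEX_BITS = NUM_SETS.bit_length() - 1      # 8 bits of set index


@dataclass
class DirectoryEntry:
    tag: int = 0
    state: str = "I"   # one of "M", "E", "S", "I"


@dataclass
class CacheDirectory:
    sets: list = field(default_factory=lambda: [
        [DirectoryEntry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)
    ])

    @staticmethod
    def split(address: int):
        """Split an address into (tag, set index, byte offset)."""
        offset = address & (BLOCK_SIZE - 1)
        index = (address >> OFFSET_BITS) & (NUM_SETS - 1)
        tag = address >> (OFFSET_BITS + INDEX_BITS)
        return tag, index, offset

    def lookup(self, address: int):
        """Return ("hit", state) on a tag match with a valid block, else ("miss", None)."""
        tag, index, _ = self.split(address)
        for entry in self.sets[index]:
            if entry.state != "I" and entry.tag == tag:
                return "hit", entry.state
        return "miss", None

    def fill(self, address: int, state: str, way: int = 0):
        """Install a block in the given way (replacement policy is not modeled)."""
        tag, index, _ = self.split(address)
        self.sets[index][way] = DirectoryEntry(tag, state)
```

Two addresses that differ only in their low five bits fall in the same 32-byte block, so they hit the same directory entry.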
The four possible states for a block in the cache are the invalid state (I), the shared state (S), the exclusive state (E), and the modified state (M). The four MESI states are defined in Table 3-3 and illustrated in Figure 3-5.
The primary objective of a coherent memory system is to provide the same image of memory to all processors in the system. This is an important feature of multiprocessor systems since it allows for synchronization, task migration, and the cooperative use of shared resources. An incoherent memory system could easily produce results that vary with when a task executes and on which processor. For example, when a processor performs a store operation, it is important that the processor have exclusive access to the addressed block before the update is made. If it does not, another processor could hold a copy of the old (stale) data, and two processors reading from the same memory location would get different answers.
Table 3-3. MESI State Definitions
MESI State: Definition
Modified (M): The addressed block is valid in the cache and in only this cache. The block is modified with respect to system memory; that is, the modified data in the block has not been written back to memory.
Exclusive (E): The addressed block is in this cache only. The data in this block is consistent with system memory.
Shared (S): The addressed block is valid in the cache and in at least one other cache. This block is always consistent with system memory. That is, the shared state is shared-unmodified; there is no shared-modified state.
Invalid (I): This state indicates that the addressed block is not resident in the cache and/or any data contained is considered not useful.
To maintain a coherent memory system, each processor must follow simple rules for managing the state of the cache. These include externally broadcasting the intention to read a cache block not in the cache and externally broadcasting the intention to write into a block that is not owned exclusively. Other processors respond to these broadcasts by snooping their caches and reporting status back to the originating processor. The status returned includes a shared indicator (that is, another processor has a copy of the addressed block) and a retry indicator (that is, another processor either has a modified copy of the addressed block that it needs to push out of the chip, or another processor had a queuing problem that prevented appropriate snooping from occurring).
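The broadcast-and-snoop exchange described in this section can be sketched as a small model. It is a simplification under stated assumptions: the names `SnoopStatus`, `snoop`, and `state_after_read_miss` are hypothetical, each other cache is reduced to a single MESI letter for the addressed block, and the bus signals that carry the indicators are not modeled.

```python
from dataclasses import dataclass


@dataclass
class SnoopStatus:
    shared: bool = False   # another cache holds a copy of the block
    retry: bool = False    # another cache must push modified data, or could not snoop


def snoop(other_states, busy=()):
    """Combine the responses of the other caches for one broadcast address.

    other_states -- MESI letter each other cache holds for the addressed block
    busy         -- indices of caches that could not snoop (the queuing case)
    """
    status = SnoopStatus()
    for i, state in enumerate(other_states):
        if i in busy or state == "M":
            status.retry = True
        elif state in ("E", "S"):
            status.shared = True
    return status


def state_after_read_miss(status):
    """State the requester would assign, or None if the read must be retried."""
    if status.retry:
        return None
    return "S" if status.shared else "E"
```

For example, `state_after_read_miss(snoop(["S", "I"]))` yields `"S"`, while a modified copy elsewhere forces a retry so the owner can push its data first.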
To maximize performance, the 604e provides a second path into the data cache directory for snooping. This allows the mainstream instruction processing to operate concurrently with the snooping operation. The instruction processing is affected only when the snoop control logic detects a situation where a snoop push of modified data is required to maintain memory coherency.
Figure 3-5. MESI States
[Figure: four panels, Modified in Cache A, Shared in Cache A, Exclusive in Cache A, and Invalid in Cache A, each showing the state of the data in Cache A, Cache B, and system memory.]
3.6.2 Coherency and Secondary Caches
The 604e supports the use of a larger secondary (L2) cache that can be implemented in different configurations. An L2 cache can improve performance further by reducing the number of bus accesses. The L2 cache must operate with respect to the memory system in a manner that is consistent with the intent of the PowerPC architecture. L2 caches must forward all relevant system bus traffic onto the 604e so it can take the appropriate actions to maintain memory coherency as defined by the PowerPC architecture.
3.6.3 Page Table Control Bits
The PowerPC architecture allows certain memory characteristics to be set on a page and on a block basis. These characteristics include the following:
• Write-back/write-through (using the W bit)
• Cacheable/noncacheable (using the I bit)
• Memory coherency enforced/not enforced (using the M bit)
The ability to set these characteristics per page and per block allows both single- and multiple-processor system designs to exploit numerous system-level performance optimizations. An additional page control bit, G, handles guarded storage and is not considered here.
The PowerPC architecture defines two of the possible eight decodings of these bits to be unsupported (WIM = 110 or 111).
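A minimal sketch of decoding the three bits, assuming W is the most significant bit and M the least significant (consistent with WIM = 001 meaning write-back, caching-not-inhibited, coherency enforced). The function name `decode_wim` and the dictionary keys are hypothetical.

```python
def decode_wim(wim: int) -> dict:
    """Decode a 3-bit WIM value; reject the two unsupported decodings."""
    if not 0 <= wim <= 0b111:
        raise ValueError("WIM is a 3-bit field")
    if wim in (0b110, 0b111):
        # W = 1 and I = 1 together: defined as unsupported by the architecture
        raise ValueError(f"WIM = {wim:03b} is an unsupported decoding")
    return {
        "write_through": bool(wim & 0b100),       # W bit
        "caching_inhibited": bool(wim & 0b010),   # I bit
        "coherency_enforced": bool(wim & 0b001),  # M bit
    }
```

Both unsupported decodings set W and I at the same time, which is why a single check on those two bits would also catch them.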
Note that software must exercise care with respect to the use of these bits if coherent memory support is desired. Careless specification of these bits may create situations that present coherency paradoxes to the processor. In particular, this can happen when the state of these bits is changed without appropriate precautions (such as flushing the pages that correspond to the changed bits from the caches of all processors in the system) or when the address translations of aliased real addresses specify different values for any of the WIM bits. These coherency paradoxes can occur within a single processor or across several processors.
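One of the hazards named above, aliased real addresses whose translations carry different WIM values, can be checked for in software. The sketch below is illustrative only: the tuple format `(effective_page, real_page, wim)` and the function name `find_wim_conflicts` are assumptions, not an operating system interface.

```python
from collections import defaultdict


def find_wim_conflicts(translations):
    """Return {real_page: [wim values]} for real pages mapped with differing WIM bits.

    translations -- iterable of (effective_page, real_page, wim) tuples
                    (a hypothetical format chosen for this example)
    """
    wim_by_real = defaultdict(set)
    for _effective, real, wim in translations:
        wim_by_real[real].add(wim)
    return {real: sorted(wims) for real, wims in wim_by_real.items() if len(wims) > 1}


mappings = [
    (0x10, 0x200, 0b001),   # cacheable, coherent
    (0x11, 0x200, 0b011),   # same real page, but caching-inhibited
    (0x12, 0x300, 0b001),
]
conflicts = find_wim_conflicts(mappings)   # real page 0x200 is flagged
```

A check like this only detects the aliasing case; changing the bits of an existing mapping still requires flushing the affected pages from all caches, as the text notes.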
It is important to note that in the presence of a paradox, the operating system software is responsible for correctness. Sections 3.6.5 and 3.6.6 provide a few simple examples to convey the meaning of a paradox.
3.6.4 MESI State Diagram
The 604e includes dedicated hardware that maintains data cache coherency by snooping bus transactions. The address retry capability of the 604e enforces the MESI protocol, as shown in Figure 3-6. Figure 3-6 assumes that the WIM bits are set to 001; that is, write-back, caching-not-inhibited, and memory coherency enforced.
Figure 3-6. MESI Cache Coherency Protocol—State Diagram (WIM = 001)
Table 3-6 gives a detailed list of MESI transitions for various operations and WIM bit settings.
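The state diagram can be approximated as a transition table. The sketch below encodes the conventional MESI transitions for WIM = 001 using the diagram's event names (RH = read hit, RMS/RME = read miss shared/exclusive, WH = write hit, WM = write miss, SHR = snoop hit on a read, SHW = snoop hit on a write or read-with-intent-to-modify). It is a reading of the figure under that assumption, not a substitute for Table 3-6, and the names `TRANSITIONS` and `step` are hypothetical.

```python
# (current state, event) -> (next state, bus transaction or None)
TRANSITIONS = {
    ("I", "RMS"): ("S", "cache block fill"),
    ("I", "RME"): ("E", "cache block fill"),
    ("I", "WM"):  ("M", "read-with-intent-to-modify"),
    ("S", "RH"):  ("S", None),
    ("S", "WH"):  ("M", "invalidate transaction"),
    ("S", "SHR"): ("S", None),
    ("S", "SHW"): ("I", None),
    ("E", "RH"):  ("E", None),
    ("E", "WH"):  ("M", None),
    ("E", "SHR"): ("S", None),
    ("E", "SHW"): ("I", None),
    ("M", "RH"):  ("M", None),
    ("M", "WH"):  ("M", None),
    ("M", "SHR"): ("S", "snoop push"),
    ("M", "SHW"): ("I", "snoop push"),
}


def step(state, event):
    """Return (next_state, bus_transaction) for one event on one cache block."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition for state {state!r} on event {event!r}")


def run(events, state="I"):
    """Apply a sequence of events and return the list of states visited."""
    visited = [state]
    for event in events:
        state, _bus = step(state, event)
        visited.append(state)
    return visited
```

For instance, `run(["RME", "WH", "SHR", "SHW"])` walks a block through exclusive, modified, shared, and back to invalid.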
3.6.5 Coherency Paradoxes in Single-Processor Systems
The following coherency paradoxes can be encountered within a single processor:
• A load or store operation to a page with WIM = 0b011 and a cache hit occurs. Caching was supposed to be inhibited for this page. Any load operation to a cache-inhibited page that hits in the cache presents a paradox to the processor. The 604e ignores the data in the cache and the state of the cache block is unchanged.
• A store operation to a page with WIM = 0b10X and a cache hit on a modified cache block occurs. This page was marked as write-through, yet the processor was given access to the cache (data in write-through pages should always be consistent with main memory). Any store operation to a write-through page that hits a modified cache block in the cache presents a coherency paradox to the processor. The 604e writes the data both to the cache and to main memory (note that only the data for this store is written to main memory and not the entire cache block). The state of the cache block is unchanged.
Figure 3-6 legend (recovered from the diagram): the states are INVALID, SHARED, EXCLUSIVE, and MODIFIED. On a miss, the old line is first invalidated and copied back if modified. Transition labels: RH = Read Hit; RMS = Read Miss, Shared; RME = Read Miss, Exclusive; WH = Write Hit; WM = Write Miss; SHR = Snoop Hit on a Read; SHW = Snoop Hit on a Write or Read-with-Intent-to-Modify. Bus transactions shown: Snoop Push, Invalidate Transaction, Read-with-Intent-to-Modify, and Cache Block Fill.
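The two single-processor paradoxes of Section 3.6.5 can be expressed as a small classifier. This is a sketch under stated assumptions: `classify_access` and its return labels are hypothetical, W is taken as the most significant WIM bit, and the behavior strings only paraphrase what the text says the 604e does.

```python
def classify_access(op, wim, hit, block_state=None):
    """Return (paradox_name, behaviour) for one access, or (None, "normal").

    op          -- "load" or "store"
    wim         -- 3-bit WIM value, W is the most significant bit
    hit         -- True if the access hits in the data cache
    block_state -- MESI letter of the block that was hit, if any
    """
    # Case 1: caching-inhibited page (WIM = 0b011) that nevertheless hits.
    if wim == 0b011 and hit:
        return ("inhibited-hit", "ignore cached data; block state unchanged")

    # Case 2: write-through page (WIM = 0b10X) with a store hit on a modified block.
    write_through = (wim & 0b110) == 0b100
    if write_through and op == "store" and hit and block_state == "M":
        return ("write-through-hit-modified",
                "write store data to cache and memory; block state unchanged")

    return (None, "normal")
```

In both cases the block's state is left as it was, so the paradox persists until software flushes or invalidates the block.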
3.6.6 Coherency Paradoxes in Multiple-Processor Systems
It is possible to create a coherency paradox across multiple processors. Such paradoxes are particularly difficult to handle since some scenarios could result in the purging of modified data, and others may lead to unforeseen bus deadlocks.
Most of these paradoxes center around the interprocessor coherency of the memory coherency bit (or the M bit). Improper use of this bit can lead to multiple processors accepting a cache block into their caches and marking the data as exclusive. In turn, this can lead to a state where the same cache block is modified in multiple processor caches. Additional information on what bus operations are generated for the various instructions and state conditions can be found in Chapter 8, “System Interface Operation.”
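The failure mode described above, the same block ending up modified in more than one cache, can be reproduced in a toy model. The sketch assumes that accesses made with M = 0 are not snooped by other caches and that no cache starts with a modified copy; the function names and the two-cache setup are hypothetical.

```python
def read_miss(caches, me, coherent):
    """Bring the block into caches[me]; other caches respond only if coherent."""
    others = [k for k in caches if k != me]
    if coherent:
        holders = [k for k in others if caches[k] in ("E", "S")]
        for k in holders:
            caches[k] = "S"              # a snooped read demotes E to S
        caches[me] = "S" if holders else "E"
    else:
        caches[me] = "E"                 # nobody reported a copy, so assume exclusive


def write_hit(caches, me, coherent):
    """Store to a block already in caches[me]."""
    if coherent and caches[me] == "S":
        for k in caches:
            if k != me:
                caches[k] = "I"          # invalidate the other copies first
    caches[me] = "M"


def run(coherent):
    caches = {"A": "I", "B": "I"}
    read_miss(caches, "A", coherent)
    read_miss(caches, "B", coherent)
    write_hit(caches, "A", coherent)
    if caches["B"] != "I":               # B still believes it holds the block
        write_hit(caches, "B", coherent)
    return caches
```

With `coherent=True` only one cache ends in the modified state; with `coherent=False` both do, and each holds a different version of the block.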