Output Queue - PCX Arbiter Control Flow

3.7.3 PCX Arbiter Control Flow

4.1.2.6 Output Queue

The output queue (OQ) is a 16-entry FIFO that queues operations waiting for access to the CPX. Each entry in the OQ is 146 bits wide. The FIFO is implemented with a dual-ported array. The write port is used for writing into the OQ from the L2-cache pipe. The read port is used for reading contents for issue to the CPX. If the OQ is empty when a packet arrives from the L2-cache pipe, the packet can bypass the OQ if it is selected for issue to the CPX.

Multicast requests are dequeued from the FIFO only if all of the CPX destination queues can accept the response packet. When the OQ reaches its high-water mark, the L2-cache pipe stops accepting inputs from the miss buffer or the PCX. Fills can still happen while the OQ is full, since they do not generate CPX traffic.
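The OQ behavior described above (bypass when empty, all-destinations-ready dequeue for multicast, and the high-water stall) can be sketched as a small software model. This is only an illustration, not the hardware implementation: the class and method names, the packet representation (a dict with a `dests` set), and the high-water value of 12 are assumptions, since the text gives only the 16-entry depth.

```python
from collections import deque


class OutputQueue:
    """Toy model of the OQ: a bounded FIFO with an empty-queue bypass."""

    def __init__(self, depth=16, high_water=12):
        # depth=16 comes from the text; high_water=12 is an assumed value.
        self.depth = depth
        self.high_water = high_water
        self.entries = deque()

    def pipe_must_stall(self):
        # At or above the high-water mark, the L2-cache pipe stops accepting
        # miss buffer and PCX inputs. Fills are not affected.
        return len(self.entries) >= self.high_water

    def arrive(self, packet, selected_for_issue):
        """Packet arrives from the L2-cache pipe.

        Returns the packet if it bypasses the OQ, otherwise None.
        """
        if not self.entries and selected_for_issue:
            return packet  # OQ empty and packet selected: pass around the OQ
        if len(self.entries) >= self.depth:
            raise OverflowError("OQ overflow: the stall should prevent this")
        self.entries.append(packet)
        return None

    def try_dequeue(self, cpx_ready):
        """cpx_ready: set of destination ids whose CPX queue has room."""
        if not self.entries:
            return None
        head = self.entries[0]
        # A multicast packet leaves only if every destination can accept it.
        if not head["dests"] <= cpx_ready:
            return None
        return self.entries.popleft()
```

In this sketch a unicast packet is simply a packet whose `dests` set has one member, so the same subset check covers both cases.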

4.1.2.7 Snoop Input Queue

The snoop input queue (SNPIQ) is a two-entry FIFO for storing DMA instructions coming from the JBI. The non-data portion (the address) is stored in the SNPIQ. For a partial line write (WR8), both the control and the store data are stored in the SNPIQ.

4.1.2.8 Miss Buffer

The 16-entry miss buffer (MB) stores instructions which cannot be processed as a simple cache hit. These instructions include true L2-cache misses (no tag match), instructions that have the same cache line address as a previous miss or an entry in the writeback buffer, instructions requiring multiple passes through the L2-cache pipeline (atomics and partial stores), unallocated L2-cache misses, and accesses causing tag ECC errors.

The miss buffer is divided into a non-tag portion, which holds the store data, and a tag portion, which contains the address. The non-tag portion of the buffer is a RAM with 1 read port and 1 write port. The tag portion is a CAM with 1 read port, 1 write port, and 1 CAM port.
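The role of the CAM port can be illustrated with a short model: an incoming address is compared against every valid tag at once, and the matching entries are reported. This is a hedged sketch; the names `MissBufferTags`, `cam_match`, and `LINE_SHIFT` are hypothetical, and the 64-byte line size is inferred from the four 16-byte blocks per line mentioned in the fill buffer description.

```python
LINE_SHIFT = 6  # assumed: 64-byte line = four 16-byte quad-words


class MissBufferTags:
    """Toy model of the tag (CAM) portion of the miss buffer."""

    def __init__(self, entries=16):
        self.valid = [False] * entries
        self.tag = [0] * entries

    def write(self, index, addr):
        # Write port: store the cache-line address of a new entry.
        self.tag[index] = addr >> LINE_SHIFT
        self.valid[index] = True

    def cam_match(self, addr):
        # CAM port: compare against all valid entries in one lookup and
        # return the indices that hold the same cache-line address.
        line = addr >> LINE_SHIFT
        return [
            i
            for i, (v, t) in enumerate(zip(self.valid, self.tag))
            if v and t == line
        ]
```

A non-empty result from `cam_match` corresponds to the "same cache line address as a previous miss" case, which sends the new instruction to the miss buffer as well.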

A read request is issued to the DRAM and the requesting instruction is replayed when the critical quad-word of data arrives from the DRAM.

All entries in the miss buffer that share the same cache line address are linked in order of insertion to preserve coherency. Instructions to the same address are processed in age order, whereas instructions to different addresses are not ordered and exist as a free list.
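The per-address linking can be sketched as one singly linked chain per cache line, with no ordering between chains. The following is a minimal model under assumed names (`MissBuffer`, `insert`, `retire_oldest`); the real buffer keeps the links in hardware state rather than in Python dictionaries.

```python
class MissBuffer:
    """Toy model: entries to the same cache line form an age-ordered chain."""

    def __init__(self, size=16):
        self.line = [None] * size  # cache-line address, or None if slot is free
        self.next = [None] * size  # index of the next younger entry, same line
        self.head = {}             # line -> index of the oldest entry
        self.tail = {}             # line -> index of the youngest entry

    def insert(self, line):
        idx = self.line.index(None)  # any free slot; ValueError if full
        self.line[idx] = line
        self.next[idx] = None
        if line in self.tail:
            # Same line already present: link behind the youngest entry.
            self.next[self.tail[line]] = idx
        else:
            self.head[line] = idx
        self.tail[line] = idx
        return idx

    def retire_oldest(self, line):
        # Only the oldest entry of a chain may be processed next.
        idx = self.head[line]
        nxt = self.next[idx]
        self.line[idx] = None
        self.next[idx] = None
        if nxt is None:
            del self.head[line]
            del self.tail[line]
        else:
            self.head[line] = nxt
        return idx
```

Because slots are handed out from whatever is free, the slot index says nothing about age; only the chain order does.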

When an MB entry gets picked for issue to the DRAM (such as a load, store, or ifetch miss), the MB entry gets copied into the fill buffer and a valid bit gets set. There can

Data can come from the DRAM to the L2-cache out of order with respect to the address order. When the data comes back out of order, the MB entries get readied for issue in the order of the data return. This means that there is no concept of age in the order of data returns to the CPU as these are all independent accesses to different addresses. Therefore, when a later read gets replayed from the MB down the pipe and invalidates its slot in the MB, a new request from the pipe will take its slot in the MB, even while an older read has not yet returned data from the DRAM.

In most cases, when a data return happens, the replayed load from the MB makes it through the pipe before the fill request can. Therefore, the valid bit of the MB entry gets cleared (after the replayed MB instruction completes execution in the pipe) before the fill buffer valid bit. However, if other prior MB instructions, such as partial stores, get picked instead of the MB instruction of concern, the fill request can enter the pipe before the MB instruction. In these cases, the valid bit in the fill buffer gets cleared before the MB valid bit. In summary, the MB valid bit and the FB valid bit always get set in the same order: MB valid bit first, FB valid bit second. (These bits can get cleared in any order, however.)
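The set-in-order, clear-in-any-order rule for the two valid bits can be written down as a tiny state tracker. This is an illustrative sketch only; `MissTracker` and its method names are hypothetical and model a single miss, not the full buffers.

```python
class MissTracker:
    """Tracks the MB and FB valid bits for one outstanding miss."""

    def __init__(self):
        self.mb_valid = False
        self.fb_valid = False
        self.log = []

    def allocate_mb(self):
        self.mb_valid = True
        self.log.append("set MB")

    def issue_to_dram(self):
        # The MB entry is copied into the fill buffer, so the FB valid bit
        # can only be set while the MB valid bit is already set.
        if not self.mb_valid:
            raise RuntimeError("FB valid set before MB valid")
        self.fb_valid = True
        self.log.append("set FB")

    def replay_done(self):
        # Replayed MB instruction has completed in the pipe.
        self.mb_valid = False
        self.log.append("clear MB")

    def fill_done(self):
        # Fill has written the line; the FB entry is invalidated.
        self.fb_valid = False
        self.log.append("clear FB")


# Common case: the replayed load finishes before the fill.
common = MissTracker()
common.allocate_mb()
common.issue_to_dram()
common.replay_done()
common.fill_done()

# Other case: the fill enters the pipe first, for example when earlier
# MB instructions were picked ahead of this one.
other = MissTracker()
other.allocate_mb()
other.issue_to_dram()
other.fill_done()
other.replay_done()
```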

4.1.2.9 Fill Buffer

The fill buffer (FB) contains cache-line-wide entries to stage data from the DRAM before it fills the cache. Addresses are also stored to maintain the age ordering needed to satisfy coherency conditions.

The fill buffer is an 8-entry buffer used to temporarily store data arriving from the DRAM on an L2-cache miss request. Data arrives from the DRAM in four 16-byte blocks, starting with the critical quad-word. A load instruction waiting in the miss buffer can enter the pipeline as soon as the critical quad-word arrives from the DRAM. In this case, the data is bypassed. After all four quad-words arrive, the fill instruction enters the pipeline and fills the cache (and the fill buffer entry gets invalidated).
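The two readiness conditions in this paragraph (replay after the critical quad-word, fill after all four) can be captured in a short model of a single fill buffer entry. This is a sketch with assumed names (`FillBufferEntry`, `load_can_replay`, `fill_can_issue`); the arrival order of the three non-critical quad-words is not stated in the text, so the model accepts them in any order.

```python
QUADWORDS_PER_LINE = 4
QUADWORD_BYTES = 16


class FillBufferEntry:
    """Toy model of one cache-line-wide fill buffer entry."""

    def __init__(self, critical_qw):
        # Index (0..3) of the quad-word the waiting load actually needs.
        self.critical_qw = critical_qw
        self.data = [None] * QUADWORDS_PER_LINE

    def receive(self, qw_index, payload):
        if len(payload) != QUADWORD_BYTES:
            raise ValueError("each DRAM block is 16 bytes")
        self.data[qw_index] = payload

    def load_can_replay(self):
        # The MB load may enter the pipe once the critical quad-word is
        # present; its data is bypassed from the fill buffer.
        return self.data[self.critical_qw] is not None

    def fill_can_issue(self):
        # The fill enters the pipe only after the whole line has arrived.
        return all(block is not None for block in self.data)

    def line_bytes(self):
        return b"".join(self.data)
```

For example, with `critical_qw=2`, `load_can_replay()` becomes true after the first block is received, while `fill_can_issue()` stays false until the fourth.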

When data comes back in the FB, the instruction in the MB gets readied for reissue and the cache line gets written into the data array. These two events are independent and can happen in any order.

For a non-allocating read (for example, an I/O read), the data gets drained from the fill buffer directly to the I/O interface when the data arrives (and the fill buffer entry gets invalidated). When the FB is full, the miss buffer cannot make requests to the DRAM.
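The back-pressure from a full fill buffer and the drain path for non-allocating reads can be sketched as follows. The class and method names and the string return values are hypothetical, and the model simplifies one detail: it invalidates an allocating entry as soon as its data is complete, whereas in the text the entry is invalidated when the fill instruction goes through the pipeline.

```python
class FillBuffer:
    """Toy model of the 8-entry fill buffer and its allocation rule."""

    def __init__(self, size=8):
        self.slots = [None] * size

    def has_free_slot(self):
        return None in self.slots

    def allocate(self, line, allocating=True):
        # Called when the miss buffer wants to send a request to the DRAM.
        # Returns None when the FB is full, meaning the MB must wait.
        if not self.has_free_slot():
            return None
        idx = self.slots.index(None)
        self.slots[idx] = {"line": line, "allocating": allocating}
        return idx

    def data_complete(self, idx):
        # All data for this entry has arrived from the DRAM.
        entry = self.slots[idx]
        self.slots[idx] = None  # entry is invalidated (simplified timing)
        if entry["allocating"]:
            return "fill_cache"
        # Non-allocating read (for example, an I/O read): data goes
        # straight to the I/O interface and never fills the cache.
        return "drain_to_io"
```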

In document OpenSPARC T1 Microarchitecture Specification (Page 156-158)