ory system bus is based on a scheme in which each cache that has a copy of the data from memory also has a copy of the information about it. All cache controllers monitor, or snoop on, the bus to determine whether or not they have a copy of the shared block. Hence the system bus protocol is referred to as a snooping protocol, and the system bus is referred to as a snooping bus.
The 128-bit-wide synchronous system bus provides a write-update, five-state snooping protocol for write-back, cache-coherent 32-byte block read and write transactions to system memory address space. Each module uses a 192-pin signal connector, the same connector used by Futurebus+ modules. Each module interfaces between the system bus and its back port with two 299-pin PGA packages containing CMOS ASIC chips, which implement the bus protocol. A total of 157 signals and 35 reference connections implement the system bus in the 192-pin connector (6 interrupt and error, 8 clock and initialization, 128 command and address or data, 4 parity, 11 protocol). All control/status registers (CSRs) are visible from the bus to simplify the data paths as well as to support SMP.
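The connector pin budget above can be checked with a few lines of arithmetic. The Python sketch below uses only the counts given in the text. The data-cycle figure is plain arithmetic (256 bits of block data over a 128-bit path); it ignores the command/address cycle and any other bus overhead, so it is an illustration, not a statement about actual bus timing.

```python
# Signal groups of the 192-pin system bus connector, as listed in the text.
SIGNAL_GROUPS = {
    "interrupt and error": 6,
    "clock and initialization": 8,
    "command and address or data": 128,
    "parity": 4,
    "protocol": 11,
}
REFERENCE_CONNECTIONS = 35
CONNECTOR_PINS = 192
BLOCK_BYTES = 32
BUS_WIDTH_BITS = 128

signals = sum(SIGNAL_GROUPS.values())
pins_used = signals + REFERENCE_CONNECTIONS

# Arithmetic only: bits in one 32-byte block divided by the 128-bit data path.
# This ignores the command/address cycle and any arbitration or wait states.
data_cycles_per_block = BLOCK_BYTES * 8 // BUS_WIDTH_BITS

print(signals, pins_used, pins_used == CONNECTOR_PINS, data_cycles_per_block)
```

Run as written, this prints the 157 signals, the 192 pins they occupy together with the reference connections, and 2 data-path widths per block.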
To simplify the snooping protocol, only full block transactions are supported; masking or subblock transactions occur in each module's BIU. Transactions are described from the perspectives of a commander, a responder, and a bystander. The address space is partitioned into CSR space that cannot be cached, memory space that can be cached, and secondary I/O space for the Futurebus+ and I/O module devices. Secondary I/O space is accessible through an I/O module mailbox transaction, which pends or retries the system bus when access to very slow I/O controller registers conflicts with direct memory access (DMA) traffic. This software-assisted procedure also provides masked byte read and write access to I/O devices as well as a standard software interface. The use of 32-bit peripheral DMA devices avoided the need to implement hardware address translators. The software drivers provide physical addresses; hence mapping registers are not necessary.
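Because the bus carries only full 32-byte blocks, a masked or subblock write has to be turned into a full-block write inside the module. The Python sketch below shows one way that read-modify-write merge could work. The function name, its arguments, and the one-bit-per-byte mask encoding are all hypothetical; the text states only that masking occurs in each module's BIU.

```python
BLOCK_SIZE = 32  # bytes per system bus block, from the text


def merge_masked_write(block: bytes, offset: int, data: bytes, byte_mask: int) -> bytes:
    """Merge a masked subblock write into a full 32-byte block (hypothetical helper).

    Byte i of `data` is written to block[offset + i] only if bit i of
    `byte_mask` is set. The merged block would then go out as an ordinary
    full-block bus write, so the bus never sees a partial transaction.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError("block must be exactly 32 bytes")
    if offset < 0 or offset + len(data) > BLOCK_SIZE:
        raise ValueError("write falls outside the block")
    merged = bytearray(block)  # copy of the block as read from the bus
    for i, value in enumerate(data):
        if (byte_mask >> i) & 1:  # assumed encoding: one mask bit per byte
            merged[offset + i] = value
    return bytes(merged)


old = bytes(BLOCK_SIZE)
new = merge_masked_write(old, 8, b"\xaa\xbb\xcc\xdd", 0b0101)
print(new[8:12].hex())  # aa00cc00
```

Keeping the merge inside the module means the snooping logic only ever reasons about whole blocks, which is the simplification the text describes.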
The I/O module drives two device-related interrupt signals that are received by both CPU modules due to SMP requirements. One interrupt is associated with the Futurebus+, and the other is associated with all the device controllers local to the I/O module. The I/O module provides a silo register of Futurebus+ interrupt pointers and a device request register of local device interrupt requests. Either CPU 1 or CPU 2 is the designated interrupt dispatcher module. Privileged architecture library software subroutines, known as PALcode, run on the primary CPU module and read the device interrupt register or the Futurebus+ interrupt register to determine which local device handlers or which Futurebus+ device handlers are to be dispatched.
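The two interrupt paths can be modeled as two small dispatch routines, one per interrupt signal. The Python sketch below is illustrative only: the register layout (bit i of the device request register means local device i), the silo entry encoding (each entry is a key into a handler table), and all names are hypothetical and do not describe the actual PALcode.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Handler = Callable[[], str]


def dispatch_local(device_request_reg: int, handlers: Dict[int, Handler]) -> List[str]:
    """Service local I/O-module devices whose request bits are set.

    Hypothetical layout: bit i of the device request register is device i.
    Bits are scanned from the least significant bit upward.
    """
    results = []
    bit = 0
    while device_request_reg:
        if device_request_reg & 1:
            results.append(handlers[bit]())
        device_request_reg >>= 1
        bit += 1
    return results


def dispatch_futurebus(silo: Deque[int], handlers: Dict[int, Handler]) -> List[str]:
    """Drain the Futurebus+ interrupt-pointer silo in arrival order.

    Hypothetical encoding: each silo entry is a key into `handlers`.
    """
    results = []
    while silo:
        results.append(handlers[silo.popleft()]())
    return results


local = {0: lambda: "disk", 2: lambda: "serial"}
fb = {0x40: lambda: "fb-net"}
print(dispatch_local(0b101, local))           # ['disk', 'serial']
print(dispatch_futurebus(deque([0x40]), fb))  # ['fb-net']
```

A silo (FIFO) preserves the order in which Futurebus+ interrupts arrived, while a bit-per-device register lets the dispatcher see every pending local request in a single read.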
The enclosure, power, and cooling subsystems are capable of interrupting both processors when immediate attention is required. A CPU can obtain information from the subsystems shown in Figure 2 through the serial control bus. The serial control bus enables highly reliable communications between field replaceable subsystems. During power-up, it is used to obtain configuration information. It is also used as an error-logging channel and as a means to communicate between the CPU subsystem, the power subsystem, and the OCP. The nonvolatile RAM (NVRAM) chip implemented on each module allowed the firmware to use software switches to configure the system. The software switches avoided the need for hardware switches and jumpers, field replaceable unit identification tags, and handwritten error logs. As a result, the hardware system is fully configured through firmware, and fault information travels with the field replaceable unit.
The five-state cache coherence protocol assumes that the processor's primary write-through cache is maintained as a subset of the second-level write-back cache. The BIU on the CPU module enforces this subset policy to simplify the simulation verification process. Without it, the number of verification cases would have been excessive, difficult to express, and difficult to simulate and check for correctness. The I/O module implements an invalidate-on-write policy, such that a block it has read from memory will be invalidated and then re-read if a CPU writes to the block. The I/O module participates in the coherency policy by signaling shared status to a CPU read of a block it has buffered. The five states of the cache coherence protocol are given in Table 2.
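The subset (inclusion) policy and the I/O module's invalidate-on-write behavior can be captured with plain sets of block addresses. In the Python sketch below, the class and method names are hypothetical and data movement is omitted; it shows only the two rules in the text: anything in the primary cache must also be in the second-level cache, and a CPU write to a block the I/O module has buffered invalidates the I/O module's copy.

```python
from typing import Dict, Set


class CpuCaches:
    """Primary write-through cache kept as a subset of the second-level cache."""

    def __init__(self) -> None:
        self.primary: Set[int] = set()       # block addresses in the primary cache
        self.second_level: Set[int] = set()  # block addresses in the second-level cache

    def fill(self, addr: int) -> None:
        # A fill lands in the second-level cache as well as the primary cache.
        self.second_level.add(addr)
        self.primary.add(addr)

    def evict_primary(self, addr: int) -> None:
        # Dropping a block from the primary cache alone cannot break the subset.
        self.primary.discard(addr)

    def evict_second_level(self, addr: int) -> None:
        # Enforcing the subset: a block leaving the second level must also
        # leave the primary cache.
        self.second_level.discard(addr)
        self.primary.discard(addr)

    def subset_holds(self) -> bool:
        return self.primary <= self.second_level


class IoModuleBuffer:
    """Blocks the I/O module has read from memory (hypothetical model)."""

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}

    def buffer_block(self, addr: int, data: bytes) -> None:
        self.blocks[addr] = data

    def snoop_cpu_read(self, addr: int) -> bool:
        # Signal shared status if the CPU reads a block that is buffered here.
        return addr in self.blocks

    def snoop_cpu_write(self, addr: int) -> None:
        # Invalidate on write; the block must be re-read from memory if needed.
        self.blocks.pop(addr, None)


caches = CpuCaches()
caches.fill(0x100)
caches.fill(0x120)
caches.evict_second_level(0x100)
print(caches.subset_holds(), 0x100 in caches.primary)  # True False
```

Because the subset holds by construction, a snoop only needs to consult the second-level state (plus the primary cache's duplicate tags) to know whether the processor could hold the block, which is the kind of case reduction the text credits for easier verification.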
The cache coherence protocol ensures that only one CPU module can return a dirty response. The dirty response obligates the responding CPU module to supply the read data to the bus, since the memory copy is stale and the memory controller aborts the return of the read data. Bus writes always clear the dirty bit of the referenced cache block in both the commander module and the module that takes the update.
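The states in Table 2 and the read and write rules above can be expressed compactly. The Python sketch below encodes the five states, decides who supplies data on a bus read (the single dirty responder if there is one, otherwise memory), and clears the dirty bit after a bus write. The function names are hypothetical, and keeping the shared bit unchanged when the dirty bit is cleared is an assumption; the text says only that the dirty bit is cleared.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    NOT_VALID = 1
    VALID_NOT_SHARED_NOT_DIRTY = 2
    VALID_NOT_SHARED_DIRTY = 3
    VALID_SHARED_NOT_DIRTY = 4
    VALID_SHARED_DIRTY = 5


DIRTY = {State.VALID_NOT_SHARED_DIRTY, State.VALID_SHARED_DIRTY}
SHARED = {State.VALID_SHARED_NOT_DIRTY, State.VALID_SHARED_DIRTY}


def write_must_broadcast(state: State) -> bool:
    # Per Table 2, a write to a shared block must be broadcast to the bus.
    return state in SHARED


def read_data_source(states: Dict[str, State]) -> str:
    """Return which agent supplies data for a bus read of one block.

    `states` maps each CPU module name to its state for the block. A dirty
    responder supplies the data and memory aborts its return; otherwise
    memory supplies it.
    """
    dirty = [cpu for cpu, s in states.items() if s in DIRTY]
    if len(dirty) > 1:
        raise RuntimeError("protocol violation: more than one dirty response")
    return dirty[0] if dirty else "memory"


# Assumption: clearing the dirty bit leaves the shared bit unchanged.
CLEAR_DIRTY = {
    State.VALID_NOT_SHARED_DIRTY: State.VALID_NOT_SHARED_NOT_DIRTY,
    State.VALID_SHARED_DIRTY: State.VALID_SHARED_NOT_DIRTY,
}


def after_bus_write(state: State) -> State:
    # Applies to both the commander and any module that takes the update.
    return CLEAR_DIRTY.get(state, state)


print(read_data_source({"cpu1": State.NOT_VALID, "cpu2": State.VALID_NOT_SHARED_DIRTY}))
print(after_bus_write(State.VALID_SHARED_DIRTY).name)
```

The at-most-one-dirty check mirrors the protocol guarantee in the text; in hardware it is an invariant rather than a runtime test, but making it explicit is a convenient way to catch modeling errors.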
A CPU has two options when a bus transaction is a write and the block is found to be valid in its cache. The CPU either invalidates the block or accepts the write and updates its copy, keeping the block valid. This decision is based on the state of the primary cache's duplicate tag store and the state of the second-level cache tag store. Acceptance of the transaction into the second-level cache on a tag
Table 2 Five States of the Cache Coherence Protocol

State                         Remarks
1 NOT VALID                   Block is invalid.
2 VALID NOT SHARED NOT DIRTY  Valid for read or write, this cached block contains the only copy of the block; the copy is identical to the memory copy.
3 VALID NOT SHARED DIRTY      Valid for read or write, this cached block contains the only cached copy of the block. The cached copy has been modified more recently than the memory copy.
4 VALID SHARED NOT DIRTY      Block is valid for read or write, but a write must broadcast to the bus. This block may be in another cache, but the memory copy is identical.
5 VALID SHARED DIRTY          Block is valid for read or write, but a write must broadcast to the bus. This block may be in another cache, but the contents have been modified more recently than the memory copy. This is a transitional state that occurs when arbitrating for the bus to broadcast a write or when an unshared dirty block is returned to a bus read transaction.