The bus interface unit, the BIU, controls external chip operations, internal cache access and refresh, and arbitration for the internal data and address bus. The BIU contains two state machines.
• The internal state machine controls the arbitration for the internal data and address bus (IDAL).
• The external state machine controls the arbitration for the external pins and DALs.
The design goal was to achieve a single-cycle read operation for hits to the internal cache and a two-cycle write operation for an ideal memory subsystem. In addition, better system reliability is achieved by providing parity protection on all the external data transfers and internal cache read/write operations.
The CVAX 78034 Chip, a 32-bit Second-generation VAX Microprocessor
To accomplish a single-cycle read operation, the two state machines were implemented as self-timed PLAs that require just one phase to evaluate. The separation of control operations between the two state machines allowed the PLAs to operate in different phases. Read/write-related internal time-critical signals are generated by the internal state machine. This state machine evaluates first, stalls the CPU if necessary, controls the cache, and sets states for the external state machine. Time-critical external strobes are controlled by the external state machine. The external state machine operates next, controls the termination of external operations, clears the internal state machine flags, and grants control of external buses and strobes to external devices. On a cache miss, the external state machine unconditionally drives the external read data to the M-Box or the I-Box, and a phase later the state machine validates the data. This scheme made it possible to service the next microinstruction while the previous one was completing.
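The division of labor between the two state machines can be illustrated with a toy model. This is a sketch under stated assumptions, not the CVAX logic: the class and method names are hypothetical, each method stands for one phase of a cycle, and having the external phase release the CPU stall is a simplification chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BiuControlModel:
    """Hypothetical toy model of the split BIU control (illustrative names only).

    internal_phase() stands for the internal state machine, which evaluates first;
    external_phase() stands for the external state machine, which evaluates next.
    """
    cpu_stalled: bool = False
    read_pending: bool = False  # flag set by the internal SM, cleared by the external SM
    trace: list = field(default_factory=list)

    def internal_phase(self, cache_hit: bool) -> None:
        # Controls the cache, stalls the CPU if necessary, and sets state
        # for the external state machine.
        if cache_hit:
            self.trace.append("internal: hit, data driven onto IDAL")
        else:
            self.cpu_stalled = True
            self.read_pending = True
            self.trace.append("internal: miss, CPU stalled")

    def external_phase(self, external_data_ready: bool) -> None:
        # Terminates the external operation and clears the internal state
        # machine's flag once data arrives (assumption: the stall is released here too).
        if self.read_pending and external_data_ready:
            self.read_pending = False
            self.cpu_stalled = False
            self.trace.append("external: read terminated, flag cleared")


biu = BiuControlModel()
biu.internal_phase(cache_hit=False)            # phase 1: miss detected
biu.external_phase(external_data_ready=False)  # phase 2: memory not ready yet
biu.external_phase(external_data_ready=True)   # a later phase 2: data arrives
print(biu.trace)
```

Keeping each machine's decisions in its own phase mirrors the text's point that separating control allowed the two PLAs to evaluate in different phases.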
The BIU also controls all memory transactions. A memory read operation is performed in one cycle if there is a hit in the internal cache and no cache parity error is detected. However, when a cache miss occurs during a read operation, a two-longword block in the cache is allocated to store the data, which must now be read from memory. The BIU stalls the CPU until the first longword of data is received. The BIU initiates the external read cycle, sending the address of the first longword to the external memory system. When the first longword of data is received, the BIU sends it to the cache and the E-Box or I-Box, and unstalls the CPU. The fetch of the second longword is overlapped with other chip activity to minimize the effective memory access time. The second longword of data is written into the alternate longword in the allocated quadword (two-longword) cache block. The cache block is validated only if both longwords in the block are fetched successfully.
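The read-miss sequence above can be sketched in software. This is a minimal illustration, assuming that a fetch function returning `None` models a failed fetch and that an `unstall` callback stands in for delivering data to the E-Box or I-Box; `QuadwordBlock` and `service_read_miss` are hypothetical names. The overlap of the second fetch with other chip activity is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

LONGWORD = 4  # bytes


@dataclass
class QuadwordBlock:
    """One cache block: two longwords sharing a single valid bit."""
    data: List[Optional[int]] = field(default_factory=lambda: [None, None])
    valid: bool = False


def service_read_miss(block: QuadwordBlock, addr: int,
                      fetch: Callable[[int], Optional[int]],
                      unstall: Callable[[int], None]) -> Optional[int]:
    """Hypothetical sketch of the read-miss sequence described in the text."""
    first_addr = addr & ~(LONGWORD - 1)     # longword-aligned requested address
    slot = (first_addr // LONGWORD) & 1     # which half of the quadword block
    block.valid = False                     # block allocated but not yet valid
    first = fetch(first_addr)
    if first is None:
        return None                         # CPU stays stalled; error handling not modeled
    block.data[slot] = first
    unstall(first)                          # CPU resumes after the first longword
    # In hardware this second fetch overlaps other activity; here it is sequential.
    second = fetch(first_addr ^ LONGWORD)   # the alternate longword of the quadword
    if second is not None:
        block.data[slot ^ 1] = second
        block.valid = True                  # valid only if both longwords arrived
    return first


memory = {0x100: 0xAAAA, 0x104: 0xBBBB}
delivered = []
blk = QuadwordBlock()
service_read_miss(blk, 0x104, memory.get, delivered.append)
```

Requesting 0x104 returns its longword to the CPU first and then fills 0x100 into the other half of the block; only then is the block marked valid.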
The BIU contains a longword write buffer that supports a dump-and-run write mechanism. Chip activity, including cache reads, can proceed in parallel while the BIU is waiting for the completion of a write operation. The BIU may have up to three different operations in progress at once: a write to memory, a read from memory, and an internal cache entry invalidation. Descriptions of these operations in the BIU follow.
While a write to memory is awaiting completion, the internal state machine can service read
requests. If the read reference misses the cache, it is queued and serviced only after the write operation completes. This overlapping of read and write operations reduces the number of memory stall cycles, resulting in a lower TPI.
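A small behavioral model helps show how the one-longword write buffer lets cache hits continue while a read miss waits. It is a sketch, not the BIU design: `DumpAndRunBiu` and its methods are hypothetical, and two behaviors the text does not specify are assumptions, namely that a second write while the buffer is full stalls (modeled as returning `False`) and that a write which hits the cache also updates it.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class DumpAndRunBiu:
    """Hypothetical model: one-longword write buffer with queued read misses."""

    def __init__(self, cache: Dict[int, int], memory: Dict[int, int]):
        self.cache = cache                    # longword address -> data (hits)
        self.memory = memory
        self.pending_write: Optional[Tuple[int, int]] = None
        self.queued_reads: Deque[int] = deque()

    def write(self, addr: int, data: int) -> bool:
        """Dump the write into the buffer and let the CPU run on.

        Returns False if the buffer is already full (assumed to stall the CPU).
        """
        if self.pending_write is not None:
            return False
        if addr in self.cache:
            self.cache[addr] = data           # assumption: a write hit also updates the cache
        self.pending_write = (addr, data)
        return True

    def read(self, addr: int) -> Optional[int]:
        if addr in self.cache:
            return self.cache[addr]           # hits proceed while the write is pending
        if self.pending_write is not None:
            self.queued_reads.append(addr)    # a miss waits for the write to complete
            return None
        return self.memory.get(addr)

    def complete_write(self) -> List[Tuple[int, Optional[int]]]:
        """Retire the buffered write, then service any queued read misses."""
        if self.pending_write is not None:
            addr, data = self.pending_write
            self.memory[addr] = data
            self.pending_write = None
        serviced = [(a, self.memory.get(a)) for a in self.queued_reads]
        self.queued_reads.clear()
        return serviced


biu = DumpAndRunBiu(cache={0x10: 7}, memory={0x30: 9})
biu.write(0x20, 5)   # buffered; the CPU continues
biu.read(0x10)       # cache hit served during the write
biu.read(0x30)       # miss queued behind the write
```

Because a queued miss is serviced only after the write retires, a read to the address just written sees the new value.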
To facilitate support for multiprocessor applications and DMA activity, the BIU provides a protocol for internal cache coherency. To activate this function, an external device first gains ownership of the external address and data bus by means of the DMA request and grant protocols. The device then presents an address, qualified by certain strobes, to the processor. The processor latches the address and then performs a cache look-up. If a cache hit occurs, the matching cache entry will be invalidated.
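The invalidate protocol reduces to a simple rule: after the device owns the bus, look up the presented address and drop a matching entry. The sketch below assumes 8-byte blocks (the block size given later in this section) and represents the cache as a dictionary keyed by physical block number; `SnoopingCache` and `dma_invalidate` are hypothetical names, and the strobe qualification is reduced to a single `bus_granted` flag.

```python
from typing import Dict


class SnoopingCache:
    """Hypothetical sketch of the DMA-invalidate protocol (8-byte blocks assumed)."""
    BLOCK_BYTES = 8

    def __init__(self) -> None:
        self.blocks: Dict[int, bytes] = {}  # physical block number -> cached data

    def fill(self, paddr: int, data: bytes) -> None:
        self.blocks[paddr // self.BLOCK_BYTES] = data

    def dma_invalidate(self, paddr: int, bus_granted: bool) -> bool:
        """Latch an externally presented address and invalidate on a hit.

        Returns True if a matching entry was invalidated.
        """
        if not bus_granted:
            # The device must first own the external bus via DMA request/grant.
            raise RuntimeError("external device does not own the bus")
        return self.blocks.pop(paddr // self.BLOCK_BYTES, None) is not None


cache = SnoopingCache()
cache.fill(0x1000, b"\x00" * 8)
cache.dma_invalidate(0x1004, bus_granted=True)  # same 8-byte block: entry invalidated
```

Invalidating, rather than updating, the cached copy keeps the cache logic simple: the next CPU reference to that block simply misses and refetches from memory.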
Eight pins are dedicated to the floating-point interface. To optimize the operand transfer rate between the CVAX 78034 CPU and its floating-point processor, both chips read the floating-point operands from memory simultaneously.
Cache
The goals for the design of the internal cache were twofold: to reduce the memory access time to one microcycle for data that is resident in the cache, and to minimize the number of cache references that miss the cache.
To achieve the one-microcycle access time, the internal cache is designed to perform the cache look-up in parallel with the translation buffer look-up. This scheme uses the 9 virtual address bits that do not change during the address translation process to index into the array. Because the cache look-up and translation buffer look-up are performed in parallel, the data for the selected cache entry is ready when the translated address is being latched into the tag comparator. The cache tag is then compared to the translated address. If a match occurs, the data is driven onto the IDAL before the end of the cycle.
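Why 9 bits suffice can be checked with a little arithmetic. VAX pages are 512 bytes, so the low 9 address bits are the page offset and are identical in the virtual and physical address. Using the 1 KB, two-way, 8-byte-block geometry described in this section, 3 bits select a byte within a block and 6 bits select one of 64 sets. This particular 3 + 6 split is an assumption for illustration; the text says only that the 9 untranslated bits index the array. The constant and function names are hypothetical.

```python
PAGE_OFFSET_BITS = 9   # VAX pages are 512 bytes
BLOCK_BYTES = 8
CACHE_BYTES = 1024
WAYS = 2

SETS = CACHE_BYTES // (BLOCK_BYTES * WAYS)   # 64 sets
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1   # 3 bits: byte within block
INDEX_BITS = SETS.bit_length() - 1           # 6 bits: set number

# The index and block offset must fit inside the untranslated page offset.
assert OFFSET_BITS + INDEX_BITS == PAGE_OFFSET_BITS


def split(addr: int) -> tuple:
    """Return (set index, byte offset) using only untranslated address bits."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    return index, offset


# A virtual address and a physical address that share a page offset select the same entry.
print(split(0x80001234), split(0x00005634))
```

Since both addresses end in the same 9 bits, the cache can read out the selected set before the translation buffer delivers the physical address used for the tag compare.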
To achieve our second goal, minimizing the number of cache misses, we used a two-way set-associative cache with a block size of 8 bytes. This two-way set-associative cache was designed to meet both performance and chip size requirements. First, a random replacement algorithm was selected to reduce circuit complexity with a minimal impact on cache performance. With reference to chip size, we determined that a cache size of 1 KB was the largest that could be used. In addition, the cache is designed so that it can be configured by software to act as an instruction-only cache or as an instruction and
Digital Technical Journal No. 7 August 1988
data cache. The instruction-only option was provided to simplify hardware in multiprocessor systems where the designers do not want to deal with DMA invalidates.
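The organization described above, two ways, 8-byte blocks, 1 KB total, random replacement, and an optional instruction-only mode, can be sketched as a behavioral model. `TwoWayCache` and its parameters are hypothetical; tags are taken from a physical address, and the model always picks a random victim, even when a way is empty, because the text does not say how invalid ways are handled.

```python
import random
from typing import List, Optional


class TwoWayCache:
    """Hypothetical model: 64 sets x 2 ways x 8-byte blocks = 1 KB, random replacement."""

    def __init__(self, sets: int = 64, block: int = 8,
                 instruction_only: bool = False,
                 rng: Optional[random.Random] = None) -> None:
        self.sets = sets
        self.block = block
        self.instruction_only = instruction_only
        self.tags: List[List[Optional[int]]] = [[None, None] for _ in range(sets)]
        self.rng = rng or random.Random()

    def access(self, paddr: int, is_instruction: bool = True) -> bool:
        """Return True on a hit; on a miss, allocate the block in a random way."""
        if self.instruction_only and not is_instruction:
            return False                   # data references bypass the cache in this mode
        block_no = paddr // self.block
        index, tag = block_no % self.sets, block_no // self.sets
        ways = self.tags[index]
        if tag in ways:
            return True
        ways[self.rng.randrange(2)] = tag  # random victim: no per-set LRU state needed
        return False


cache = TwoWayCache(rng=random.Random(0))
cache.access(0x100)  # miss, allocates the block
cache.access(0x104)  # hit: same 8-byte block
```

Random replacement needs no per-set history bits, which fits the text's stated goal of reducing circuit complexity.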
The cell chosen to implement the cache array is a one-transistor (1T) dynamic RAM. The 1T cell, illustrated in Figure 4, was chosen because of its small area. A comparable array design with either a four-transistor dynamic RAM or a six-transistor static RAM cell would have required 2.4 to 3 times as much area. The storage capacitance of the 1T cell is 110 femtofarads, resulting in a bit-line-to-cell capacitance ratio of 8 to 1. With a folded bit-line structure and the use of a dummy cell (which stores half the charge of the storage cell), a voltage differential of 200 millivolts was realized at the sense amplifiers. Because of the dynamic nature of the 1T cell, a refresh counter, composed of linear feedback shift registers, was designed to control which row is refreshed during idle cache cycles.
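A linear feedback shift register makes a compact refresh counter because it steps through a fixed sequence of row addresses using only a shift and an XOR. The sketch below assumes a 6-bit register with the primitive feedback polynomial x^6 + x + 1; the actual register width, taps, and row count are not given in the text, and `lfsr6_rows` is a hypothetical name. A maximal-length 6-bit LFSR visits 63 of the 64 possible values and never reaches 0, so how the real design covered every row is not shown here.

```python
from typing import Iterator


def lfsr6_rows(seed: int = 1) -> Iterator[int]:
    """Yield refresh row numbers from a hypothetical 6-bit Fibonacci LFSR.

    New bit = bit5 XOR bit4, i.e. a[n] = a[n-6] XOR a[n-5], whose characteristic
    polynomial x^6 + x + 1 is primitive, so any nonzero seed repeats after 63 steps.
    """
    state = seed & 0x3F
    if state == 0:
        raise ValueError("an all-zero LFSR state never changes")
    while True:
        yield state
        feedback = ((state >> 5) ^ (state >> 4)) & 1
        state = ((state << 1) | feedback) & 0x3F


rows = lfsr6_rows()
refresh_order = [next(rows) for _ in range(63)]  # one full refresh pass
```

Compared with a binary counter, the LFSR needs no carry chain, so advancing to the next row is a single gate delay.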
We designed byte parity into the cache to detect data corruption resulting from either soft or hard errors. A study was done to determine the soft error rate of the cell. The soft error rate for the cache array was found to be 10 FITs, where 1 FIT is equal to 1 failure in one billion (10^9) operating hours. To protect against data corruption due to minority carrier injection, the array is surrounded by a deep N-type implant ring.
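Byte parity stores one extra bit per byte, so a longword carries four parity bits, and any single-bit error within a byte is detected on the next read. The sketch assumes even parity (the parity bit makes the count of ones in byte plus parity even), since the text does not say which sense was used; the function names are hypothetical.

```python
def byte_parity_bits(longword: int) -> int:
    """Return four parity bits, bit i covering byte i of a 32-bit longword (even parity assumed)."""
    bits = 0
    for i in range(4):
        byte = (longword >> (8 * i)) & 0xFF
        bits |= (bin(byte).count("1") & 1) << i
    return bits


def parity_error(longword: int, stored_parity: int) -> bool:
    """Recompute parity on a read and compare it with the bits stored at write time."""
    return byte_parity_bits(longword) != stored_parity


word = 0x12345678
stored = byte_parity_bits(word)            # computed when the word is written
corrupted = word ^ (1 << 9)                # a single-bit upset in byte 1
parity_error(word, stored), parity_error(corrupted, stored)
```

Parity only detects errors; it cannot locate or correct them, so a detected cache parity error must be handled by refetching or reporting.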
The CVAX CPU chip is the first microprocessor in the industry to include an on-chip dynamic 1T cell cache.