
Bus Interface Unit

In document dtj v01 07 aug1988 pdf (Page 101-103)

The bus interface unit, the BIU, controls external chip operations, internal cache access and refresh, and arbitration for the internal data and address bus. The BIU contains two state machines.

• The internal state machine controls the arbitration for the internal data and address bus (IDALs).

• The external state machine controls the arbitration for the external pins and DALs.

The design goal was to achieve a single-cycle read operation for hits to the internal cache and a two-cycle write operation for an ideal memory subsystem. In addition, better system reliability is achieved by providing parity protection on all the external data transfers and internal cache read/write operations.

The CVAX 78034 Chip, a 32-bit Second-generation VAX Microprocessor

To accomplish a single-cycle read operation, the two state machines were implemented as self-timed PLAs that require just one phase to evaluate. The separation of control operations between the two state machines allowed the PLAs to operate in different phases. Read/write-related, internal time-critical signals are generated by the internal state machine. This state machine evaluates first, stalls the CPU if necessary, controls the cache, and sets states for the external state machine. Time-critical external strobes are controlled by the external state machine. The external state machine operates next, controls the termination of external operations, clears the internal state machine flags, and grants control of external buses and strobes to external devices. On a cache miss, the external state machine unconditionally drives the external read data to the M-Box or the I-Box, and a phase later the state machine validates the data. This scheme made it possible to service the next microinstruction while the previous one was completing.

The BIU also controls all memory transactions. A memory read operation is performed in one cycle if there is a hit in the internal cache and no cache parity error is detected. However, when a cache miss occurs during a read operation, a two-longword block in the cache is allocated to store the data, which must now be read from memory. The BIU stalls the CPU until the first longword of data is received. The BIU initiates the external read cycle, sending the address of the first longword to the external memory system. When the first longword of data is received, the BIU sends it to the cache and E-Box or I-Box, and unstalls the CPU. The fetch of the second longword is overlapped with other chip activity to minimize the effective memory access time. The second longword of data is written into the alternate longword in the allocated quadword (two-longword) cache block. The cache block is validated only if both longwords in the block are fetched successfully.
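
The miss sequence described above can be sketched in a few lines of Python. This is only an illustrative model, not the chip's logic: the `Block` class, the `read_longword` callback (which returns `None` for a failed fetch), and the event log are hypothetical names introduced here, and the tag is simplified to the block address. What happens when the first fetch itself fails is not described in the text, so the sketch simply leaves the block invalid.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LONGWORD_BYTES = 4   # a VAX longword is 32 bits
BLOCK_BYTES = 8      # two-longword (quadword) cache block, from the text


@dataclass
class Block:
    tag: Optional[int] = None
    data: List[Optional[int]] = field(default_factory=lambda: [None, None])
    valid: bool = False


def fill_on_read_miss(block, address, read_longword, log):
    """Model a read miss: stall, fetch the requested longword, unstall,
    fetch the alternate longword, validate only if both fetches succeed.

    read_longword(addr) is a hypothetical memory callback returning an int,
    or None to signal a failed fetch.
    """
    block.valid = False
    block.tag = address // BLOCK_BYTES           # simplified tag (assumption)
    block.data = [None, None]
    base = address - (address % BLOCK_BYTES)
    first = (address // LONGWORD_BYTES) % 2      # which longword was requested

    log.append("stall")
    value = read_longword(base + first * LONGWORD_BYTES)
    block.data[first] = value
    log.append("unstall")                        # CPU resumes after the first longword

    other = 1 - first                            # alternate longword, fetched in the background
    block.data[other] = read_longword(base + other * LONGWORD_BYTES)
    block.valid = None not in block.data         # validate only if both arrived
    return value


memory = {0x100: 11, 0x104: 22}
events = []
blk = Block()
print(fill_on_read_miss(blk, 0x104, memory.get, events), blk.data, blk.valid, events)
```

In this model the CPU is released as soon as the requested longword arrives, while the valid bit is set only after the alternate longword has also been stored.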

The BIU contains a longword write buffer which supports a dump-and-run write mechanism. Chip activity, including cache reads, can proceed in parallel while the BIU is waiting for the completion of a write operation. The BIU may have up to three different operations in progress at once: a write to memory, a read from memory, and an internal cache entry invalidation. Descriptions of these operations in the BIU follow.

While a write to memory is awaiting completion, the internal state machine can service read requests. If the read reference misses the cache, it is queued and serviced only after the write operation completes. This overlapping of read and write operations reduces the number of memory stall cycles, resulting in a lower TPI.
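
As a rough illustration of this ordering, the following Python sketch models a one-entry write buffer with read hits that proceed during a pending write and read misses that wait for it. The class `BiuModel`, its method names, and the event strings are hypothetical. The behavior when a second write arrives while the buffer is full is not stated in the text; the sketch assumes the earlier write completes first.

```python
from collections import deque


class BiuModel:
    """Toy model: one-longword write buffer; read hits overlap a pending
    write, read misses are queued until the write completes."""

    def __init__(self, cached_addresses):
        self.cached = set(cached_addresses)  # addresses assumed to hit the cache
        self.pending_write = None            # at most one buffered write
        self.queued_reads = deque()          # misses waiting on the write
        self.events = []

    def write(self, address, value):
        if self.pending_write is not None:
            self.complete_write()            # assumption: wait for the buffer to drain
        self.pending_write = (address, value)
        self.events.append(("write buffered", address))

    def read(self, address):
        if address in self.cached:
            self.events.append(("read hit", address))
        elif self.pending_write is not None:
            self.queued_reads.append(address)
            self.events.append(("read miss queued", address))
        else:
            self.events.append(("external read", address))

    def complete_write(self):
        if self.pending_write is None:
            return
        address, _ = self.pending_write
        self.pending_write = None
        self.events.append(("write done", address))
        while self.queued_reads:             # queued misses go out after the write
            self.events.append(("external read", self.queued_reads.popleft()))


biu = BiuModel(cached_addresses={0x10})
biu.write(0x200, 1)
biu.read(0x10)      # hit: serviced while the write is still pending
biu.read(0x20)      # miss: queued behind the write
biu.complete_write()
for event in biu.events:
    print(event)
```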

To facilitate support for multiprocessor applications and DMA activity, the BIU provides a protocol for internal cache coherency. To activate this function, an external device first gains ownership of the external address and data bus by means of the DMA request and grant protocols. The device then presents an address, qualified by certain strobes, to the processor. The processor latches the address and then performs a cache look-up. If a cache hit occurs, the matching cache entry will be invalidated.
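
The look-up-and-invalidate step can be modeled as follows. This is a minimal sketch under stated assumptions: the cache is represented as a list of sets of dictionaries, the helper names (`make_cache`, `split_address`, `invalidate`) are hypothetical, and the split of an address into set index and tag is the textbook one rather than anything the article specifies. The bus request/grant handshake and the qualifying strobes are not modeled.

```python
BLOCK_BYTES = 8   # block size given in the Cache section


def make_cache(num_sets, ways=2):
    """Build an empty cache: a list of sets, each holding `ways` entries."""
    return [[{"tag": None, "valid": False} for _ in range(ways)]
            for _ in range(num_sets)]


def split_address(address, num_sets):
    """Return (set index, tag) for a byte address (textbook split, assumed)."""
    block_number = address // BLOCK_BYTES
    return block_number % num_sets, block_number // num_sets


def invalidate(cache, address):
    """Look up a latched address and clear the valid bit on a hit.

    Returns True if a matching entry was invalidated, False on a miss.
    """
    index, tag = split_address(address, len(cache))
    for entry in cache[index]:
        if entry["valid"] and entry["tag"] == tag:
            entry["valid"] = False
            return True
    return False


cache = make_cache(num_sets=64)
index, tag = split_address(0x1238, 64)
cache[index][0] = {"tag": tag, "valid": True}   # pretend this block is cached
print(invalidate(cache, 0x1238))   # hit: entry is invalidated
print(invalidate(cache, 0x1238))   # second look-up misses
```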

Eight pins are dedicated to the floating point interface. To optimize the operand transfer rate between the CVAX 78034 CPU and its floating point processor, both chips read the floating point operands from memory simultaneously.

Cache

The goals for the design of the internal cache were twofold: to reduce the memory access time to one microcycle for data that is resident in the cache; and to minimize the number of cache references that miss the cache.

To achieve the one-microcycle access time, the internal cache is designed to perform the cache look-up in parallel with the translation buffer look-up. This scheme uses the 9 virtual address bits that do not change during the address translation process to index into the array. Because the cache look-up and translation buffer look-up are performed in parallel, the data for the selected cache entry is ready when the translated address is being latched into the tag comparator. The cache tag is then compared to the translated address. If a match occurs, the data is driven onto the IDAL before the end of the cycle.

Digital Technical Journal No. 7 August 1988

To achieve our second goal, minimization of the number of cache misses, we used a two-way set associative cache with a block size of 8 bytes. This two-way set associative cache was designed to meet both performance and chip size requirements. First, a random replacement algorithm was selected to reduce circuit complexity with a minimal impact on cache performance. With reference to chip size, we determined that a cache size of 1 KB was the largest that could be used. In addition, the cache is designed so that it can be configured by software to act as an instruction-only cache or as an instruction and data cache. The instruction-only option was provided to simplify hardware in multiprocessor systems where the designers do not want to deal with DMA invalidates.
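
The cache parameters above are consistent with the 9 untranslated address bits mentioned earlier, as the short calculation below suggests. The 512-byte VAX page size is an assumption brought in from outside this section, and the split of the 9 bits into a set index and a byte offset is one plausible reading, not something the article spells out.

```python
CACHE_BYTES = 1024   # 1 KB cache (from the text)
WAYS = 2             # two-way set associative (from the text)
BLOCK_BYTES = 8      # 8-byte block (from the text)
PAGE_BYTES = 512     # assumption: VAX page size, not stated in this section


def log2_exact(n):
    """Return log2(n) for a power of two, else raise ValueError."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


num_sets = CACHE_BYTES // (WAYS * BLOCK_BYTES)
offset_bits = log2_exact(BLOCK_BYTES)        # byte within a block
index_bits = log2_exact(num_sets)            # set selection
untranslated_bits = log2_exact(PAGE_BYTES)   # page-offset bits unchanged by translation

print(num_sets, offset_bits, index_bits, untranslated_bits)   # prints: 64 3 6 9
```

Under these assumptions, 6 index bits plus 3 offset bits account for exactly the 9 bits that address translation leaves unchanged, which is what allows the array to be indexed before the translated address is available.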

The cell chosen to implement the cache array is a one-transistor (1T) dynamic RAM. The 1T cell, illustrated in Figure 4, was chosen because of its small area. A comparable array design with either a four-transistor dynamic RAM or a six-transistor static RAM cell would have required 2.4 to 3 times as much area. The storage capacitance of the 1T cell is 110 femtofarads, resulting in a bit-line to cell-capacitance ratio of 8 to 1. With a folded bit-line structure and the use of a dummy cell (which stores half the charge of the storage cell), a voltage differential of 200 millivolts was realized at the sense amplifiers. Because of the dynamic nature of the 1T cell, a refresh counter, composed of linear feedback shift registers, was designed to control which row is refreshed during idle cache cycles.
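
To illustrate why a linear feedback shift register can serve as a refresh counter, here is a minimal Python sketch of a 6-bit LFSR stepping through row addresses. The register width, the feedback taps (corresponding to the primitive polynomial x^6 + x + 1), and the function names are all assumptions chosen for illustration; the article does not give the actual row count or polynomial.

```python
def lfsr_step(state):
    """Advance a 6-bit Fibonacci LFSR by one step (feedback = bit0 XOR bit1)."""
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << 5)


def refresh_order(seed=1):
    """Return the row addresses visited in one full LFSR period."""
    rows = []
    state = seed
    while True:
        rows.append(state)
        state = lfsr_step(state)
        if state == seed:
            return rows


rows = refresh_order()
print(len(rows), len(set(rows)))   # number of rows visited, number of distinct rows
print(rows[:8])                    # start of the pseudo-random visiting order
```

Refresh only requires that every row be visited within the retention time, not that rows be visited in numerical order, so a shift register with a single XOR gate can stand in for a binary counter. A plain LFSR never reaches the all-zero state; how the chip covers that case is not described in the text.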

We designed byte parity into the cache to detect data corruption resulting from either soft or hard errors. A study was done to determine the soft error rate of the cell. The soft error rate for the cache array was found to be 10 FITs, where 1 FIT is equal to 1 failure in one billion operating hours. To protect against data corruption due to minority carrier injection, the array is surrounded by a deep N-type implant ring.
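
Byte parity of the kind mentioned above can be sketched as one parity bit per byte of a longword. The choice of even parity and the function names below are assumptions; the article does not say which polarity the chip uses or how the bits are stored.

```python
def byte_parity(byte):
    """Even-parity bit for one byte: 1 if the byte has an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") & 1


def longword_parity(longword):
    """Parity bits for the four bytes of a 32-bit longword, low byte first."""
    return [byte_parity((longword >> (8 * i)) & 0xFF) for i in range(4)]


def parity_ok(longword, stored_parity):
    """Compare freshly computed parity with the parity stored at write time."""
    return longword_parity(longword) == stored_parity


stored = longword_parity(0x01030007)       # computed when the data is written
print(stored)                              # prints: [1, 0, 0, 1]
print(parity_ok(0x01030007, stored))       # unchanged data passes
print(parity_ok(0x01030006, stored))       # a single flipped bit is detected
```

A single parity bit per byte detects any odd number of flipped bits within that byte but cannot correct them and misses even-count errors.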

The CVAX CPU chip is the first microprocessor in the industry to include an on-chip dynamic 1T cell cache.
