The VAX 9000 architecture adds vector instructions to the standard VAX environment, so a vector register file was required. There were two primary design requirements for the vector register file. First, the register file and associated cross-bar logic had to fit in a single multichip unit; second, the register file had to perform a read and a write at different addresses within a single 16-ns clock cycle. These requirements could not be met with available memory and logic chips, thus necessitating the development of a fully custom vector register chip.
Digital Technical Journal Vol. 2 No. 4 Fall 1990
The vector register file is 64 bits wide and consists of 16 vector registers with 64 elements each. The vector register chip, VRGx, was developed as an 8-bit slice of the 64-bit vector register file. The chip contains 9216 bits of RAM for data storage and the cross-bar logic (6000 equivalent gates) that allows access from the five read ports and three write ports. Integrating the register memory and the cross-bar logic on the same chip allowed timing to be optimized so that the system timing requirements were met.
VRGx Chip Physical Features and Organization
The VRGx chip is fabricated using the MOSAIC III ECL process, which was not designed as a memory process. Coordination with the vendor resulted in the addition of an implant step for the memory-cell bit line emitters. Key features of the process are three metal interconnect layers, oxide isolation, and polysilicon emitters with a drawn width of 1.75 microns.
Figure 9 shows the locations of the major circuit blocks in the VRGx chip: five read ports, three write ports, and 16 vector registers in the RAM bank array. The block diagram in Figure 10 shows the main data paths. The 16 vector registers are implemented as 64-word by 9-bit single-port RAMs. Eight bits form a slice of the 64-bit vector register file, and the ninth bit is for byte parity.
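As a quick check of the storage figures, the sketch below recomputes the 9216-bit capacity from 16 registers of 64 nine-bit words and shows how a 64-bit element could be divided into eight 8-bit slices, one per VRGx chip. The byte-to-chip ordering (chip 0 holding the least significant byte) is an assumption for illustration; the article does not say how bytes are assigned to chips.

```python
from typing import List

NUM_REGISTERS = 16   # vector registers per VRGx chip
ELEMENTS = 64        # elements per vector register
DATA_BITS = 8        # data bits per chip (one byte slice)
PARITY_BITS = 1      # byte-parity bit stored with each element
ELEMENT_WIDTH = 64   # width of the full vector register file

# 16 x 64 x 9 bits of RAM per chip
STORAGE_BITS = NUM_REGISTERS * ELEMENTS * (DATA_BITS + PARITY_BITS)
# Number of 8-bit slices (chips) needed for a 64-bit file
CHIPS_PER_FILE = ELEMENT_WIDTH // DATA_BITS


def slice_element(value: int) -> List[int]:
    """Split a 64-bit element into eight byte slices.

    Assumption (not from the article): slice i goes to chip i and holds
    bits 8*i+7 .. 8*i of the element.
    """
    if not 0 <= value < (1 << ELEMENT_WIDTH):
        raise ValueError("element must fit in 64 bits")
    return [(value >> (DATA_BITS * i)) & 0xFF for i in range(CHIPS_PER_FILE)]


def join_slices(slices: List[int]) -> int:
    """Reassemble a 64-bit element from its byte slices."""
    return sum(byte << (DATA_BITS * i) for i, byte in enumerate(slices))


print(STORAGE_BITS, CHIPS_PER_FILE)  # 9216 8
```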
Timing
A register RAM can be read at one address and written at a different address in one 16-ns clock cycle. This dual operation is made possible by a 2 to 1 multiplexer on the RAM address inputs. The read address is applied during the first portion of the cycle, and the write address is applied during the second portion. Splitting the clock cycle into read and write portions eliminates conflict between read and write ports when a single register RAM is selected for both read and write. Read data is held in a latch during the second portion of the cycle and is unaffected by the write operation.
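The split-cycle behavior can be captured in a small behavioral model. This is a sketch, not a circuit description: the class name `RegisterRAM` and its `cycle` method are hypothetical, and the model only reproduces the ordering described above (read first, latch the result, then write), which is why a read and a write to the same word in one cycle return the old value.

```python
from typing import List, Optional


class RegisterRAM:
    """Behavioral sketch of one 64-word x 9-bit register RAM (hypothetical)."""

    WORDS = 64
    MASK = 0x1FF  # 8 data bits plus 1 parity bit

    def __init__(self) -> None:
        self.words: List[int] = [0] * self.WORDS

    def cycle(self, read_addr: Optional[int] = None,
              write_addr: Optional[int] = None,
              write_data: int = 0) -> Optional[int]:
        """One 16-ns cycle: read in the first portion, write in the second."""
        # First portion: the 2:1 address mux passes the read address,
        # and the read result is captured in a latch.
        read_latch = self.words[read_addr] if read_addr is not None else None
        # Second portion: the mux passes the write address; the latched
        # read data is not disturbed by this write.
        if write_addr is not None:
            self.words[write_addr] = write_data & self.MASK
        return read_latch


ram = RegisterRAM()
ram.cycle(write_addr=5, write_data=0x1AB)
print(hex(ram.cycle(read_addr=5, write_addr=5, write_data=0x0CD)))  # 0x1ab
print(hex(ram.cycle(read_addr=5)))                                   # 0xcd
```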
A single clock cycle consists of nonoverlapping clock phases A and B. Latches on the read and write port inputs are clocked by phase A, and read port output latches are clocked by phase B. For a read operation initiated on phase A, the output read data becomes valid during phase B.
Figure 9 Photomicrograph of VRGx Chip
Cross-bar Logic
Cross-bar logic in the RAM bank array makes each of the 16 vector register RAMs independently accessible from the read and write ports. Enable inputs on the ports prevent invalid addresses from conflicting with intended addresses. Read and write ports may point to the same register RAM, but different write ports may not point to the same RAM. Also, different read ports may point to the same RAM only if the vector element address is the same. All conflicts must be resolved external to the chip.
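Because the chip leaves conflict resolution to external logic, the rules in the preceding paragraph amount to a check that the issuing logic must satisfy every cycle. The sketch below expresses those rules as a function; `PortRequest` and `find_conflicts` are hypothetical names, and the sketch covers only the sharing rules stated in the text, not any other timing constraint.

```python
from itertools import combinations
from typing import List, NamedTuple, Sequence


class PortRequest(NamedTuple):
    enabled: bool
    register: int  # 4-bit register select, 0-15
    element: int   # 6-bit vector element address, 0-63


def find_conflicts(reads: Sequence[PortRequest],
                   writes: Sequence[PortRequest]) -> List[str]:
    """List violations of the port-sharing rules; disabled ports are ignored."""
    problems: List[str] = []
    active_writes = [(i, w) for i, w in enumerate(writes) if w.enabled]
    active_reads = [(i, r) for i, r in enumerate(reads) if r.enabled]

    # Different write ports may not point to the same register RAM.
    for (i, a), (j, b) in combinations(active_writes, 2):
        if a.register == b.register:
            problems.append(f"write ports {i} and {j} share register {a.register}")

    # Read ports may share a register RAM only at the same element address.
    for (i, a), (j, b) in combinations(active_reads, 2):
        if a.register == b.register and a.element != b.element:
            problems.append(f"read ports {i} and {j} use register {a.register} "
                            f"at elements {a.element} and {b.element}")

    # A read and a write to the same register RAM are allowed, because the
    # split clock cycle separates them, so no check is made for that case.
    return problems


R = PortRequest
print(len(find_conflicts([R(True, 3, 10), R(True, 3, 11)], [])))  # 1
```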
A read port consists of an enable, a 4-bit register select, a 6-bit vector element address, and a 9-bit output. An enabled read port applies a register select code that points to a particular RAM bank. At that RAM bank, a 5 to 1 multiplexer selects the vector element address from the active read port and applies it to the read address of the RAM. Then the RAM output passes through a 16 to 1 multiplexer controlled by the register select code, so that the selected RAM output reaches the output of the active read port.
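The two multiplexing stages of a read can be sketched in software: first, each bank picks the element address of the read port that selects it (the 5 to 1 multiplexer); then each port picks its bank's output (the 16 to 1 multiplexer). This is a hypothetical functional model, `read_crossbar` is an illustrative name, and it assumes conflicts were already resolved off-chip, so read ports sharing a bank also share an element address.

```python
from typing import List, Optional, Sequence, Tuple

NUM_BANKS = 16

ReadPort = Tuple[bool, int, int]  # (enable, register select, element address)


def read_crossbar(banks: Sequence[Sequence[int]],
                  ports: Sequence[ReadPort]) -> List[Optional[int]]:
    """Functional sketch of the read path for up to five read ports."""
    # At each RAM bank, the 5:1 mux passes the element address of the
    # enabled read port whose register select points at that bank.
    bank_addr: List[Optional[int]] = [None] * NUM_BANKS
    for enable, select, element in ports:
        if enable:
            bank_addr[select] = element

    # Each addressed bank produces one 9-bit word.
    bank_out = [banks[b][a] if a is not None else None
                for b, a in enumerate(bank_addr)]

    # Each read port's 16:1 mux, steered by its register select, picks that
    # bank's output; a disabled port is shown as None in this model.
    return [bank_out[select] if enable else None
            for enable, select, _ in ports]
```

Two ports reading the same register at the same element simply receive the same bank output, which is why the text permits that case.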
A write port consists of an enable, a 4-bit register select, a 6-bit vector element address, and a 9-bit write data input. An enabled write port applies a register select code that points to a particular RAM bank. At that RAM bank, a 3 to 1 multiplexer selects the vector element address from the active write port and applies it to the write address of the RAM. Also, a 3 to 1 multiplexer selects the write data from the active write port and applies it to the RAM data input.
Semiconductor Technology in a High-performance VAX System
Figure 10 VRGx Chip Block Diagram (5x read ports, each with ENABLE, SEL<3:0>, ADDR<5:0>, and a 16:1 output multiplexer driving DOUT<8:0>; 3x write ports, each with ENABLE, SEL<3:0>, ADDR<5:0>, and DIN<8:0>; RAM bank array of 16x 64 × 9 RAMs, each bank with a 5:1 read address multiplexer and 3:1 write address and write data multiplexers)
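The write side can be sketched the same way: each bank's 3 to 1 address and data multiplexers pass the one enabled write port that selects it. The function name `write_crossbar` is hypothetical, and raising an error when two write ports select the same register is a modeling choice for illustration; the real chip does not arbitrate, and the text requires such conflicts to be resolved externally.

```python
from typing import Dict, List, Sequence, Tuple

WritePort = Tuple[bool, int, int, int]  # (enable, register select, element, 9-bit data)


def write_crossbar(banks: List[List[int]], ports: Sequence[WritePort]) -> None:
    """Functional sketch of the write path for up to three write ports."""
    # Decide which write port drives each bank's 3:1 address and data muxes.
    driver: Dict[int, int] = {}
    for index, (enable, select, _element, _data) in enumerate(ports):
        if not enable:
            continue
        if select in driver:
            # Not legal per the text; flagged here instead of arbitrated.
            raise ValueError(f"write ports {driver[select]} and {index} "
                             f"both select register {select}")
        driver[select] = index

    # Each selected bank stores the chosen port's data at its element address.
    for bank, index in driver.items():
        _enable, _select, element, data = ports[index]
        banks[bank][element] = data & 0x1FF
```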
RAM Technology
The normal transistors in an ECL process are of the NPN type, where the collector is a buried N-doped region. For memory cells, a lateral PNP transistor is placed in the same collector region, and the combined structure has the latching characteristics of a silicon controlled rectifier (SCR). The memory cell array in the 64 by 9 register RAMs is implemented with ECL SCR memory cells.
The SCR memory cell shown in Figure 11 consists of two cross-coupled SCR structures. Extra NPN emitters connect to the bit lines and provide a means of writing and sensing the cell. The "on" side of the cell saturates, allowing the bit line emitter to conduct in the inverse mode. Inverse gain of the bit line emitters must be limited to avoid excessive leakage into the unselected cells. An added process step applies a special base implant to the bit line emitters only, to control their inverse gain.
Advantages of the SCR cell include good density, low standby power, a large sense voltage differential, and low sensitivity to alpha-particle-induced soft errors. The cell has one limitation: excess charge storage due to write current can delay subsequent writing to the opposite state. This problem is eliminated with a special bit line current steering circuit that makes the write current state dependent (Figure 11).
The SCR memory cell in Figure 11 is written by applying a high current (four times the read current) to the "off" bit line emitter. The current steering transistors prevent this current from reaching a bit line emitter that is already "on." Thus, attempting to write a cell that is already in the desired state does not result in any additional cell current beyond the normal read current, and no additional charge storage occurs.
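The effect of the steering circuit can be summarized as a state-dependent current rule. The sketch below is a behavioral abstraction only (currents normalized to the read current; transients, leakage, and stored charge are not modeled), and the function names are hypothetical. It shows why repeatedly writing the same value adds no write current beyond the read current.

```python
from typing import Sequence

READ_CURRENT = 1.0                  # normalized read current
WRITE_CURRENT = 4.0 * READ_CURRENT  # "four times read current"


def write_cell_current(stored: int, desired: int) -> float:
    """Cell current during one write, in units of the read current.

    stored/desired: which side of the cross-coupled cell is "on" (0 or 1).
    """
    if stored == desired:
        # Steering keeps the write current away from the emitter that is
        # already "on", so only the normal read current flows.
        return READ_CURRENT
    # Flipping the cell: the write current drives the "off" emitter.
    return WRITE_CURRENT


def total_write_current(stored: int, pattern: Sequence[int]) -> float:
    """Sum of per-write cell currents for a sequence of writes to one cell."""
    total = 0.0
    for bit in pattern:
        total += write_cell_current(stored, bit)
        stored = bit
    return total


print(total_write_current(0, [1, 1, 1]))  # 6.0
```

Without steering, each of those three writes would draw the full write current; with it, only the first write (the one that flips the cell) does.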
Other Chip Features
Other noteworthy chip features include scan logic, parity error detection logic, and a data pipeline for write port 0 data. Scan operation gives access to the register RAMs. In a single scan-in and scan-out operation, it is possible to read five registers and to write three registers.
Parity checking logic is used to detect input errors and set error flags. There is a parity check on the 9-bit write port data inputs. Another parity checker is applied to address and control inputs. These are assigned to three parity groups, with a parity bit input for each group.
Figure 11 SCR Memory Cell with Bit Line Current Steering Circuit (key: WC, write control; UWL, upper word line; LWL, lower word line; BL, bit line (left); BR, bit line (right); VA, voltage reference)
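A parity check of this kind reduces to counting 1s. The sketch below assumes odd parity and assumes bit 8 of each 9-bit write data input is the parity bit; the article states neither, and it does not say which address and control signals fall into each of the three groups, so `group_errors` simply takes (value, width, parity) triples supplied by the caller. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

ODD_PARITY = True  # assumption: the article does not state the parity sense


def parity_bit(value: int, width: int, odd: bool = ODD_PARITY) -> int:
    """Parity bit for the low `width` bits of value."""
    ones = bin(value & ((1 << width) - 1)).count("1")
    # Odd parity makes the total count of 1s (data plus parity) odd;
    # even parity makes it even.
    return (ones + 1) % 2 if odd else ones % 2


def write_data_error(din: int) -> bool:
    """Check one 9-bit write port input, assuming bit 8 is the parity bit."""
    return ((din >> 8) & 1) != parity_bit(din & 0xFF, 8)


def group_errors(groups: Sequence[Tuple[int, int, int]]) -> List[bool]:
    """Check (value, width, parity) triples, one per parity group."""
    return [parity != parity_bit(value, width) for value, width, parity in groups]


print(write_data_error(0x100), write_data_error(0x000))  # False True
```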
The write port 0 data pipeline allows a delay of one, two, or three clock cycles to be selected, delaying the write port data as necessary to resolve register access conflicts.
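A selectable delay of this kind behaves like a short shift register whose length is the chosen delay. The sketch below models it with a bounded deque; the class name `WritePort0Pipeline` is hypothetical, and the model only reproduces the cycle-level delay, not how the chip implements the tap selection.

```python
from collections import deque
from typing import Any, Deque, Optional


class WritePort0Pipeline:
    """Delays write port 0 data by a selectable 1, 2, or 3 clock cycles."""

    def __init__(self, delay: int) -> None:
        if delay not in (1, 2, 3):
            raise ValueError("delay must be 1, 2, or 3 cycles")
        # One register stage per cycle of delay, initially empty (None).
        self.stages: Deque[Optional[Any]] = deque([None] * delay, maxlen=delay)

    def clock(self, data: Any) -> Optional[Any]:
        """Advance one cycle: emit the oldest stage and capture new data."""
        out = self.stages[0]
        self.stages.append(data)  # a full deque with maxlen drops stages[0]
        return out


p = WritePort0Pipeline(2)
print([p.clock(x) for x in (10, 20, 30, 40)])  # [None, None, 10, 20]
```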
Self-timed RAM
In the VAX 9000 system, as in any high-performance CPU, fast memory is used for cache and control store applications. Engineers traditionally use very fast static RAMs within the CPU for memory. Logic designers, however, have long recognized that CPU performance is often limited by the time needed to access data in these RAMs. This limitation results not only from the access time and write cycle performance of the devices themselves, but also from the off-chip circuitry and interconnect used for write pulse generation and distribution. The logic designers and technologists for the VAX 9000 knew that unless some architectural improvements were made to the traditional static RAM, much of the RAM performance improvement would be lost in the wiring interconnect. They also realized that Digital's memory suppliers would have to be convinced that a new RAM architecture would be marketable to their other customers. After several design iterations, the technologists submitted a set of specifications for a synchronous, self-timed RAM (STRAM) to several suppliers for their review. After extensive market surveys, our memory suppliers agreed that this new architecture could eventually become a new standard for high-speed static RAMs.
The VAX 9000 system requires two configurations of the basic STRAM device: 1K words by 4 bits, and 4K words by 4 bits. A block diagram of the STRAM is shown in Figure 12. The STRAM is similar to the traditional RAM in that it has chip select, input address and data, and output data. However, the STRAM also has several nontraditional inputs, such as write, a differential clock, and a reference voltage (Vbb). Latches added to all inputs and outputs provide pipelined timing. An internal write pulse generator controls write operations and eliminates the need to generate and distribute the write pulse signal externally on the module. Also, two optional output configurations are provided: a 50-ohm drive open emitter for standard parallel termination on the module, and a resistor and pulldown current source, which is wired externally to implement STECL or on-chip source termination.
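For reference, the address width of each configuration follows directly from its word count. Writing M for the number of address inputs, so that the array holds 2^M words of 4 bits each, the two configurations work out as follows (this arithmetic is derived from the stated sizes, not quoted from the article):

$$1\text{K} \times 4:\quad 2^{M} = 1024 \;\Rightarrow\; M = 10,\qquad 1024 \times 4 = 4096\ \text{bits}$$
$$4\text{K} \times 4:\quad 2^{M} = 4096 \;\Rightarrow\; M = 12,\qquad 4096 \times 4 = 16384\ \text{bits}$$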
The clock buffer design allows inputs to be driven differentially from off-chip to minimize clock skew. The clock buffer is also designed to accommodate customers who are not greatly concerned about skew or who may be more concerned about conserving routing area. One input of the clock buffer may be tied to the output pin of the reference generator, which provides the standard ECL threshold voltage (Vbb), allowing the other input of the clock buffer to be driven in a single-ended mode.
Input and output latches are clocked on opposite edges of the internal differential clock buffer. Timing diagrams are shown in Figure 13. On a falling edge of CLK H, data and address inputs flow into the RAM array.
If write is asserted during the next rising edge of CLK H, a write cycle is initiated, and the input data is stored in the memory at the address presented at the ADDR inputs. At the same time, the data is passed through the multiplexer and the output latch.
If write is deasserted on the rising edge of CLK H, the STRAM is in a read cycle and input data is ignored. The data stored in the RAM at the address presented at the ADDR inputs flows out to the multiplexer and output latch.
If chip select (CS) is deasserted prior to the rising edge of CLK H, write and read operations are disabled and the output latches are reset low.
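The read, write, and deselect cases above can be collected into a cycle-level model. This is a simplified sketch under stated assumptions: `STRAMModel` and its methods are hypothetical, booleans mean "asserted" (the real WRITE and CS pins are active low), address and data are captured on the falling edge, WRITE and CS are sampled on the following rising edge, and the output is returned at that rising edge rather than modeled with its own latch timing.

```python
class STRAMModel:
    """Cycle-level sketch of STRAM read, write, and chip-select behavior."""

    def __init__(self, address_bits: int) -> None:
        self.memory = [0] * (1 << address_bits)
        self.address_mask = (1 << address_bits) - 1
        self.latched_addr = 0
        self.latched_data = 0
        self.dout = 0

    def falling_edge(self, addr: int, din: int) -> None:
        # Input latches: address and data flow toward the RAM array.
        self.latched_addr = addr & self.address_mask
        self.latched_data = din & 0xF  # 4-bit data path

    def rising_edge(self, write: bool, cs: bool) -> int:
        if not cs:
            # Chip deselected: no read or write, outputs reset low.
            self.dout = 0
        elif write:
            # Write cycle: store the data and pass it through to the output.
            self.memory[self.latched_addr] = self.latched_data
            self.dout = self.latched_data
        else:
            # Read cycle: input data is ignored; stored data reaches the output.
            self.dout = self.memory[self.latched_addr]
        return self.dout
```

For example, a 1K by 4 part would be `STRAMModel(address_bits=10)`; a write followed by a read at the same address returns the written value, while a cycle with CS deasserted returns 0 and leaves memory unchanged.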
For proper operation of the STRAM, certain timing requirements must be fulfilled. The write operation is terminated by either the falling edge of CLK H or the internal write pulse generator, whichever occurs first. Therefore, CLK H must be asserted long enough to ensure that data is properly written into the memory array. The internal write pulse generator provides an output whose duration is set by a string of gates.
Figure 12 STRAM Block Diagram (inputs DIN<3:0> H, ADDR<M-1:0> H, WRITE L, CS L, and differential clock CLK H/CLK L; a 2^M × 4 RAM array with an internal write pulse generator; output DOUT<3:0> H)
Figure 13 STRAM Timing Diagrams (signals CLK, WRITE, ADDR/DIN/CS, and DATA OUT across read (RD) and write (WR) operation cycles; note: the clock high state must last long enough to complete a write cycle)
Also, the assertion of the internal write pulse signal must be delayed by an amount equal to the internal access time of the RAM. In this way, the correct data is stored, and not the data previously stored in the input registers. The delay is accomplished by the row delay circuit, which is also simply a string of gates. These features give the STRAM its self-timed nature.
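The two timing rules (the write pulse starts only after the row delay, and ends at the falling edge of CLK H or at the end of the generated pulse, whichever comes first) can be combined into one expression for the usable write window. In this sketch all times are in nanoseconds measured from the rising edge of CLK H, the generator's pulse is assumed to begin when the row delay expires, and every numeric value is a hypothetical input; the article gives no figures.

```python
def effective_write_pulse(clk_high: float, row_delay: float,
                          pulse_width: float) -> float:
    """Width of the write pulse that actually reaches the array (ns).

    clk_high: time from the rising edge until CLK H falls.
    row_delay: delay matching the internal access time.
    pulse_width: duration set by the write pulse generator's gate chain.
    """
    start = row_delay                        # pulse asserted after row delay
    generator_end = row_delay + pulse_width  # generator terminates the pulse
    end = min(clk_high, generator_end)       # whichever terminates first
    return max(0.0, end - start)


def write_is_safe(clk_high: float, row_delay: float,
                  pulse_width: float, min_write: float) -> bool:
    """True if the effective pulse meets a required minimum write time."""
    return effective_write_pulse(clk_high, row_delay, pulse_width) >= min_write


print(effective_write_pulse(5.0, 1.0, 2.0))  # 2.0
print(effective_write_pulse(2.5, 1.0, 2.0))  # 1.5
```

The second case shows why the clock high state must last long enough: an early falling edge truncates the pulse the generator would otherwise provide.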
Acknowledgments
The authors would like to acknowledge the following individuals who participated in and contributed to the success of the VAX 9000 project: Jerry Weisbach, Andy Moroney, Bob Haller, Marc Lamere, Mark Hamel, Tom Senna, Dave McCall, Patty Kroesen, Rick Jones, Jim Jensen, Terry Skrypek, Eugene Marteney, Paul Guglielmi, Elaine Fire, Larry Herman, Bill Grundmann, Mark Pascarelli, Fran Richard, Linda Greska, Jack Mason, Chris Caiazzi, Roger Dame, Mike Normand, Steve Sullivan, Rob Reinschmidt, Bob Bechdolt, Mike Warder, Mike Hickman, Brian Sadler, Wayne Nunn, Rita Wespi, Gene Yee, Bruce Smith, Alisyn Emerson, Jim Glanville.
Richard A. Brunner, Dileep P. Bhandarkar, Francis X. McKeen, Bimal Patel, William J. Rogers Jr., Gregory L. Yoder