Many novel circuit structures and detailed analysis techniques were developed to support the clock rate in conjunction with the complexity demanded by the concurrency and wide data paths. The clocking method is single-wire and level-sensitive. The bus interface unit operates from a buffered version of the main clock, and signals that cross this interface are deskewed to eliminate races. This clocking method eliminates dead time between phases and requires only a single clock signal to be routed throughout the chip.
One difficulty inherent in this clocking method is the substantial load on the clock node, 3.25 nanofarads (nF) in our design. This load and the requirement for a fast clock edge led us to take particular care with clock routing and to analyze the resulting grid extensively. Figure 6 shows the distribution of clock load among the major functional units. The clock drives into a grid of vertical metal 3 and horizontal metal 2.
Most of the loading occurs in the integer and floating-point units, which are fed from the more robust metal 3 lines. To ensure the integrity of the clock grid across the chip, the grid was extracted from the layout, and the resulting network, which contained 630,000 RC elements, was simulated using a circuit simulation program based on the AWEsim simulator from Carnegie-Mellon University. Figure 7 shows a three-dimensional representation of the output of this simulation: the clock delay from the driver to each of the 63,000 transistor gates connected to the clock grid. The 200-MHz clock signal is fed to the driver through a binary fanning tree with five levels of buffering. A horizontal shorting bar at the input to the clock driver helps smooth out possible asymmetry in the incoming wavefront. The driver itself consists of 145 separate elements, each of which contains four levels of prescaling into a final output stage that drives the clock grid.
A 200-MHz 64-bit Dual-issue CMOS Microprocessor
INT UNIT: 1129 pF
WRITE BUFFER: 82 pF
I-CACHE: 373 pF
FPU: 803 pF
D-CACHE: 208 pF
TOTAL BYPASS CAPACITANCE = 128 nF
Note: Total effective switching capacitance = 12.5 nF
Figure 6 Clock Load Distribution
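The clock driver described above reaches the grid through four prescaling stages and a final output stage per element. A common textbook way to size such a chain is geometric tapering: with N stages driving a load C_L from a first-stage input capacitance C_in, each stage is larger than the previous one by f = (C_L / C_in)^(1/N). The sketch below applies that rule to one of the 145 elements. It assumes, hypothetically, that the 3.25-nF load is shared equally among the elements and that the first prescaler has a 20-fF input capacitance; neither assumption, nor the use of this sizing rule, is stated in the article.

```python
# Hypothetical sketch of geometric buffer tapering for one clock-driver
# element. CLOCK_LOAD, ELEMENTS, and the stage count come from the text;
# equal load sharing and C_IN are assumptions made for illustration.

CLOCK_LOAD = 3.25e-9  # F, total clock-node load quoted in the text
ELEMENTS = 145        # separate driver elements quoted in the text
STAGES = 5            # four prescaling stages plus the final output stage
C_IN = 20e-15         # F, hypothetical input capacitance of the first stage


def taper_factor(c_load, c_in, stages):
    """Per-stage size ratio f such that c_in * f**stages == c_load."""
    return (c_load / c_in) ** (1.0 / stages)


def stage_input_caps(c_load, c_in, stages):
    """Input capacitance of each stage in a geometrically tapered chain."""
    f = taper_factor(c_load, c_in, stages)
    return [c_in * f ** k for k in range(stages)]


per_element = CLOCK_LOAD / ELEMENTS  # assumes the load is shared equally
f = taper_factor(per_element, C_IN, STAGES)
caps = stage_input_caps(per_element, C_IN, STAGES)
print(f"load per element: {per_element * 1e12:.1f} pF")
print(f"taper per stage: {f:.2f}")
print("stage input caps (fF):", [round(c * 1e15, 1) for c in caps])
```

With these assumed numbers the per-stage ratio comes out a little above 4; because the ratio depends on C_IN only through a fifth root, the result is fairly insensitive to the guessed input capacitance.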
Figure 7 CPU Clock Skew
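The extracted grid was simulated with an AWEsim-based circuit simulator. Asymptotic waveform evaluation matches moments of each node's response, and the first moment is the generalized Elmore delay: for an RC network with loops it is the vector tau that solves G·tau = c, where G is the nodal conductance matrix (with the step source treated as ground) and c holds the node capacitances. The sketch below computes that first-order delay and the resulting skew for a small uniform grid. It is not AWEsim or the authors' tool; the grid size, the separate horizontal and vertical segment resistances (standing in for metal 2 and metal 3), the node capacitance, and the driver placement are all hypothetical.

```python
# Hypothetical sketch: first-moment (generalized Elmore) delay of a uniform
# RC clock grid. Each node has a lumped capacitance to ground, neighbouring
# nodes are joined by wire resistances, and an ideal step source drives the
# selected nodes through a driver resistance. All values are illustrative.


def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def grid_delays(rows, cols, r_h, r_v, c_node, r_drv, driven):
    """First-moment delay at every node of a rows x cols RC grid (row-major).

    Solves G @ tau = c; for an RC tree this reduces to the Elmore delay.
    """
    n = rows * cols
    g = [[0.0] * n for _ in range(n)]

    def node(r, c):
        return r * cols + c

    def stamp(i, j, cond):
        # Add a resistor of conductance `cond` between nodes i and j.
        g[i][i] += cond
        g[j][j] += cond
        g[i][j] -= cond
        g[j][i] -= cond

    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                stamp(node(r, c), node(r, c + 1), 1.0 / r_h)  # horizontal wire
            if r + 1 < rows:
                stamp(node(r, c), node(r + 1, c), 1.0 / r_v)  # vertical wire
    for r, c in driven:
        g[node(r, c)][node(r, c)] += 1.0 / r_drv  # driver resistance to source
    return solve(g, [c_node] * n)


if __name__ == "__main__":
    # Illustrative 5 x 5 grid with a single driver at the center node.
    tau = grid_delays(5, 5, r_h=2.0, r_v=1.0, c_node=100e-15, r_drv=2.0,
                      driven=[(2, 2)])
    print(f"max delay {max(tau) * 1e12:.2f} ps, "
          f"skew {(max(tau) - min(tau)) * 1e12:.2f} ps")
```

Moving the `driven` node from the center to a corner raises both the maximum delay and the skew in this toy model.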
The clock driver and predriver represent about 40 percent of the total effective switching capacitance, which power measurement puts at 12.5 nF (worst case, including output pins). To manage the problem of di/dt on the chip power pins, explicit decoupling capacitance is provided on-chip. This consists of thin-oxide capacitance distributed around the chip, primarily under the data buses. In addition, horizontal metal 2 power and clock shorting straps run adjacent to the clock generator, and the thin-oxide decoupling capacitance under these lines supplies charge to the clock driver.
Digital Technical Journal Vol. 4 No. 4 Special Issue 1992
The di/dt for the driver alone is about 2 × 10^11 amperes per second. The total decoupling capacitance, as extracted from the layout, measures 128 nF, so the ratio of decoupling capacitance to switching capacitance is about 10:1. With this ratio, the decoupling capacitance could supply all the charge associated with a complete CPU cycle with only a 10 percent reduction in the on-chip supply voltage.
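The 10 percent figure follows from simple charge bookkeeping. The sketch below reproduces it two ways using the capacitances quoted in the text: a first-order estimate in which the decoupling capacitance alone delivers the charge Q = C_switch · V (so ΔV/V = C_switch / C_decap), and an ideal charge-sharing estimate between a charged decoupling capacitor and discharged switching nodes. The 3.3-V supply used to express the droop in volts is an assumption, not a value stated in this section.

```python
# Supply-droop estimates from the decoupling-to-switching capacitance ratio.
# C_SWITCH and C_DECAP are the values quoted in the text; VDD is assumed.

C_SWITCH = 12.5e-9  # F, effective switching capacitance (power measurement)
C_DECAP = 128e-9    # F, on-chip decoupling capacitance (layout extraction)
VDD = 3.3           # V, assumed supply voltage for illustration


def decap_ratio(c_decap, c_switch):
    """Ratio of decoupling capacitance to switching capacitance."""
    return c_decap / c_switch


def droop_first_order(c_switch, c_decap):
    """dV/V if C_decap alone delivers Q = C_switch * V, i.e. dV = Q / C_decap."""
    return c_switch / c_decap


def droop_charge_sharing(c_switch, c_decap):
    """dV/V when a charged C_decap shares charge with a discharged C_switch."""
    return c_switch / (c_switch + c_decap)


print(f"decap : switching = {decap_ratio(C_DECAP, C_SWITCH):.2f} : 1")
for name, fn in (("first order", droop_first_order),
                 ("charge sharing", droop_charge_sharing)):
    d = fn(C_SWITCH, C_DECAP)
    print(f"{name}: {d:.1%} droop, about {d * VDD:.2f} V at an assumed {VDD} V")
```

Both estimates land just under the 10 percent quoted above; the first-order form is simply the inverse of the 10:1 capacitance ratio.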
Latches
As previously described, the chip employs a single-phase approach, with nearly all latches in the core of the chip receiving the clock node, CLK, directly. A representative example is illustrated in Figure 8. Notice that L1 and L2 are transparent latches separated by random logic and are not simultaneously active: L1 is active when CLK is high, and L2 is active when CLK is low. The minimum number of delays between latches is zero, and the maximum number of delays is constrained only by the cycle time and the details of any relevant critical paths. The bus interface unit, many data-path structures, and some critical paths deviate from this approach and use buffered versions and/or conditionally buf-
(a) Latching Schema