
Circuit Implementation

In document dtj v04 04 1992 pdf (Page 42-44)

Many novel circuit structures and detailed analysis techniques were developed to support the clock rate in conjunction with the complexity demanded by the concurrence and wide data paths. The clocking method is single-wire, level-sensitive. The bus interface unit operates from a buffered version of the main clock. Signals that cross this interface are deskewed to eliminate races. This clocking method eliminates dead time between phases and requires only a single clock signal to be routed throughout the chip.


One difficulty inherent in this clocking method is the substantial load on the clock node, 3.25 nanofarad (nF) in our design. This load and the requirement for a fast clock edge led us to take particular care with clock routing and to do extensive analysis on the resulting grid. Figure 6 shows the distribution of clock load among the major functional units. The clock drives into a grid of vertical metal 3 and horizontal metal 2. Most of the loading occurs in the integer and floating-point units that are fed from the more robust metal 3 lines. To ensure the integrity of the clock grid across the chip, the grid was extracted from the layout and the resulting network, which contained 630,000 RC elements, was simulated using a circuit simulation program based on the AWEsim simulator from Carnegie-Mellon University. Figure 7 shows a three-dimensional representation of the output of this simulation and shows the clock delay from the driver to each of the 63,000 transistor gates connected to the clock grid.
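The simulation described above used a program based on AWEsim on the full extracted network. As a much simpler illustration of how RC delay accumulates along a clock wire, the following sketch computes the Elmore (first-moment) delay of a plain RC ladder. This is not the authors' method; the function name `elmore_delays` and the segment values are hypothetical and chosen only for illustration.

```python
def elmore_delays(resistances, capacitances):
    """Return the Elmore delay (seconds) at each node of an RC ladder.

    resistances[i] is the series resistance feeding node i (ohms);
    capacitances[i] is the capacitance to ground at node i (farads).
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one resistance per node")
    n = len(resistances)

    # downstream[i] = capacitance at node i plus everything beyond it
    downstream = [0.0] * n
    running = 0.0
    for i in range(n - 1, -1, -1):
        running += capacitances[i]
        downstream[i] = running

    # Each series resistor charges all of the capacitance downstream of it
    delays = []
    t = 0.0
    for i in range(n):
        t += resistances[i] * downstream[i]
        delays.append(t)
    return delays


# Hypothetical ladder: 4 segments of 1 ohm and 10 pF each (not from the paper)
delays = elmore_delays([1.0] * 4, [10e-12] * 4)
for node, d in enumerate(delays):
    print(f"node {node}: {d * 1e12:.1f} ps")
```

The delay grows from node to node along the ladder, which is the kind of driver-to-gate variation that a skew plot such as Figure 7 visualizes across the whole grid.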

The 200-MHz clock signal is fed to the driver through a binary fanning tree with five levels of buffering. There is a horizontal shorting bar at the input to the clock driver to help smooth out possible asymmetry in the incoming wave front. The driver itself consists of 145 separate elements, each of which contains four levels of prescaling into a final output stage that drives the clock grid.

A 200-MHz 64-bit Dual-issue CMOS Microprocessor

INT UNIT: 1129 pF
WRITE BUFFER: 82 pF
I-CACHE: 373 pF
FPU: 803 pF
D-CACHE: 208 pF

TOTAL BYPASS CAPACITANCE = 128 nF

Note: Total effective switching capacitance = 12.5 nF

Figure 6 Clock Load Distribution
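As a quick check on the statement that most of the clock loading occurs in the integer and floating-point units, the per-unit values from Figure 6 can be summed and turned into shares. This is a minimal sketch; the dictionary name `clock_load_pf` is an assumption. The figure may not list every load on the clock node, so the sum need not match the 3.25 nF total quoted in the text.

```python
# Per-unit clock loads as listed in Figure 6, in picofarads
clock_load_pf = {
    "INT UNIT": 1129,
    "WRITE BUFFER": 82,
    "I-CACHE": 373,
    "FPU": 803,
    "D-CACHE": 208,
}

total_pf = sum(clock_load_pf.values())

# Fraction of the listed load contributed by each unit
shares = {unit: load / total_pf for unit, load in clock_load_pf.items()}

int_fpu_share = shares["INT UNIT"] + shares["FPU"]

print(f"sum of listed loads: {total_pf} pF")
for unit, share in shares.items():
    print(f"{unit:>12}: {share:6.1%}")
print(f"integer + floating-point units: {int_fpu_share:.1%}")
```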

Figure 7 CPU Clock Skew

Digital Technical Journal Vol. 4 No. 4 Special Issue 1992

The clock driver and predriver represent about 40 percent of the total effective switching capacitance determined by power measurement to be 12.5 nF (worst case including output pins). To manage the problem of di/dt on the chip power pins, explicit decoupling capacitance is provided on-chip. This consists of thin oxide capacitance that is distributed around the chip, primarily under the data buses. In addition, there are horizontal metal 2 power and clock shorting straps adjacent to the clock generator, and the thin oxide decoupling cap under these lines supplies charge to the clock driver. di/dt for the driver alone is about 2 x 10^11 amperes per second. The total decoupling capacitance as extracted from the layout measures 128 nF. Thus the ratio of decoupling capacitance to switching cap is about 10:1. With this capacitance ratio, the decoupling cap could supply all the charge associated with a complete CPU cycle with only a 10 percent reduction in the on-chip supply voltage.
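The 10:1 ratio and the roughly 10 percent droop can be reproduced with first-order arithmetic on the two capacitance figures quoted above. The sketch below is an illustration under a simple charge-sharing assumption, not the authors' analysis; all variable names are hypothetical.

```python
# Values quoted in the text, in nanofarads
C_DECOUPLING_NF = 128.0   # decoupling capacitance extracted from the layout
C_SWITCHING_NF = 12.5     # total effective switching capacitance

ratio = C_DECOUPLING_NF / C_SWITCHING_NF

# First-order estimate: the decoupling cap alone delivers the charge
# C_sw * V for one cycle, so its voltage falls by about C_sw / C_dec.
droop_simple = C_SWITCHING_NF / C_DECOUPLING_NF

# Charge-sharing estimate (assumption): the decoupling cap at V shares
# charge with a fully discharged switching cap, giving a fractional
# drop of C_sw / (C_sw + C_dec).
droop_sharing = C_SWITCHING_NF / (C_SWITCHING_NF + C_DECOUPLING_NF)

# The text attributes about 40 percent of the switching capacitance
# to the clock driver and predriver.
clock_driver_nf = 0.40 * C_SWITCHING_NF

print(f"decoupling : switching ratio = {ratio:.2f} : 1")
print(f"droop, first-order estimate  = {droop_simple:.1%}")
print(f"droop, charge-sharing        = {droop_sharing:.1%}")
print(f"clock driver and predriver   = {clock_driver_nf:.1f} nF")
```

Both estimates land close to the 10 percent figure quoted in the text.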

Latches

As previously described, the chip employs a single-phase approach, with nearly all latches in the core of the chip receiving the clock node, CLK, directly. A representative example is illustrated in Figure 8. Notice that L1 and L2 are transparent latches separated by random logic and are not simultaneously active; L1 is active when CLK is high and L2 is active when CLK is low. The minimum number of delays between latches is zero and the maximum number of delays is constrained only by the cycle time and the details of any relevant critical paths. The bus interface unit, many data-path structures, and some critical paths deviate from this approach and use buffered versions and/or conditionally buf-

(a) Latching Schema (diagram labels: CLK, L1 OPAQUE, TR, TF, L2 OPAQUE)
