Clocking
The CFPA chip employs a four-phase overlapping clocking scheme which provides timing resolution. Much of the control circuitry design calls for combinational circuits that operate between latches clocked on nonconsecutive phases, which are nonoverlapping.
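The relationship between consecutive and nonconsecutive phases can be illustrated with a small model. The following is only a sketch: the text does not give the phase widths, so the 3/8-cycle high time and quarter-cycle spacing below are assumed values, chosen so that neighboring phases overlap while nonconsecutive ones do not. All function names are hypothetical.

```python
from fractions import Fraction

NUM_PHASES = 4
# Assumption (not from the text): each phase is high for 3/8 of the
# cycle, and a new phase starts every 1/4 cycle.
PHASE_WIDTH = Fraction(3, 8)
PHASE_STEP = Fraction(1, 4)


def high_intervals(phase):
    """Return the sub-intervals of [0, 1) during which a phase (0..3) is high."""
    start = phase * PHASE_STEP
    end = start + PHASE_WIDTH
    if end <= 1:
        return [(start, end)]
    # The phase wraps past the end of the cycle into the next one.
    return [(start, Fraction(1)), (Fraction(0), end - 1)]


def overlap(a, b):
    """True if phases a and b are ever high at the same time."""
    return any(
        s1 < e2 and s2 < e1
        for s1, e1 in high_intervals(a)
        for s2, e2 in high_intervals(b)
    )


def consecutive(a, b):
    """True if the phases are neighbors, counting phase 3 -> phase 0."""
    return (a - b) % NUM_PHASES in (1, NUM_PHASES - 1)
```

Under these assumed widths, `overlap(0, 1)` is true and `overlap(0, 2)` is false, so logic placed between latches on phases 0 and 2 never sees both latches open at once.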
Multiplier
As noted in the section Multiplication, it was recognized early in the chip design that the multiplier array would be key to meeting the desired performance. The CFPA implements multiplication by using an array of carry save adders with partial product wraparound. The wraparound enables the array to be cycled as many times as necessary. The final carry and sum addition is executed in the fraction ALU. A static implementation of the carry save adders is necessary since data propagates through multiple rows of the array.
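The idea of carry save addition with wraparound can be sketched in software. This is a simplified behavioral model, not the chip's circuit: it assumes one multiplier bit is retired per row (the text does not say how partial products are formed), it uses unbounded Python integers instead of fixed-width datapaths, and the function names are hypothetical.

```python
ROWS_PER_PASS = 4  # the text describes a four-row array


def carry_save_add(s, c, x):
    """3:2 compressor: reduce three operands to a sum word and a carry word."""
    new_sum = s ^ c ^ x
    new_carry = ((s & c) | (s & x) | (c & x)) << 1
    return new_sum, new_carry


def array_pass(s, c, multiplicand, multiplier, first_bit):
    """One trip through the array; each row adds one partial product."""
    for row in range(ROWS_PER_PASS):
        bit = first_bit + row
        partial = (multiplicand << bit) if (multiplier >> bit) & 1 else 0
        s, c = carry_save_add(s, c, partial)
    return s, c


def multiply(multiplicand, multiplier, width):
    """Multiply unsigned integers; the multiplier must fit in `width` bits."""
    s, c = 0, 0
    for first_bit in range(0, width, ROWS_PER_PASS):
        # Wraparound: the sum and carry outputs feed the array inputs again.
        s, c = array_pass(s, c, multiplicand, multiplier, first_bit)
    # Final carry-propagate addition (done in the fraction ALU on the chip).
    return s + c
```

For example, `multiply(13, 11, 8)` makes two passes through the four rows and returns 143. No carry propagates along a word until the final addition, which is why each row's delay is independent of operand width.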
To build the carry save adders, we used a four-transistor XOR. This approach allowed for minimum delay and required the least amount of chip area. As a result of SPICE simulation, we found that doubling the minimum size of the transistors in the multiplier array could provide a 20 percent speed increase. Since the cell area was constrained by the necessary interconnect in the metal layers, the device sizes were increased without affecting the cell size. Further device size increases, however, would have forced us to increase the cell size and would not have improved speed appreciably due to increased self-loading. With the approach we chose, SPICE simulation showed a worst-case delay of 6.5 ns per row and a typical delay of 4.5 ns.
Digital Technical Journal No. 7 August 1988
Figure j CFPA Physical Layout
To obtain the desired multiplication performance and minimize the area necessary for the multiplier array, we used a technique in which the array is cycled twice per microcycle. For worst-case devices, a half cycle takes 45 ns. An array size of four rows takes 26 ns to propagate through the array, allowing 19 ns for latching, return of partial products, and control switching. For typical devices four rows complete in 18 ns, allowing 22 ns in an 80-ns cycle for the wraparound path.
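The timing budget above is simple arithmetic and can be checked with a short calculation. This is a sketch only: the function name is hypothetical, the per-row delays are taken as 6.5 ns (worst case) and 4.5 ns (typical), and the 90-ns worst-case cycle is inferred from the stated 45-ns half cycle rather than quoted from the text.

```python
ROWS = 4  # rows in the multiplier array


def wraparound_budget(cycle_ns, row_delay_ns, passes_per_cycle=2):
    """Return (array delay, time left over per pass), both in ns."""
    pass_ns = cycle_ns / passes_per_cycle   # time available for one pass
    array_ns = ROWS * row_delay_ns          # propagation through all rows
    return array_ns, pass_ns - array_ns


# Worst-case devices: 45-ns half cycle, i.e. an inferred 90-ns cycle.
worst = wraparound_budget(cycle_ns=90, row_delay_ns=6.5)
# Typical devices: 80-ns cycle.
typical = wraparound_budget(cycle_ns=80, row_delay_ns=4.5)
```

Here `worst` evaluates to `(26.0, 19.0)` and `typical` to `(18.0, 22.0)`, matching the 19 ns and 22 ns left for latching, partial product return, and control switching.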
Control PLA
We also recognized the fraction shift control PLA as a possible speed limitation. The shift control PLA was the largest PLA in the control section and had to evaluate in a single clock phase. Because no clock signals were available to control evaluation of the PLA, we used a "dummy" AND array term to start evaluation of the OR array. A "dummy" OR line controls output clocking, making the PLA self-timed. Because this PLA could be evaluated in a single clock phase, both alignment and normalization operations were able to eliminate an unnecessary wait cycle present on the MicroVAX FPU. We were also able to expand the divide algorithm to 4 bit shifts per cycle.
As we had suspected, the limiting factor in the final chip cycle time was the multiplier array. The ALUs and the large control PLAs in both the microcode control section and the BIU easily met speed requirements in the CMOS I process.