The small spacing dimensions of the submicron CMOS-4 process caused fringing and lateral capacitances to contribute significantly to the total nodal capacitance. The existing layout extraction tool extracted only overlapping parallel-plate capacitance. Thus, a new layout capacitance extractor, CUP, was written to accurately extract fringing, lateral, and area capacitances.
CUP extracted interconnect capacitance from layout by decomposing the interconnect layout into pieces of uniform layout cross section. The geometry of each interconnect piece, and its distance from layers above, below, and adjacent to it, were used to calculate its area, fringing, and lateral components of capacitance. The empirical formulae used to calculate the capacitive components were based on curves of two-dimensional electrostatic simulation data of various layout cross sections. This technique produced accurate internodal and total interconnect capacitance data. This accuracy resulted in CUP being very compute intensive.
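The three capacitance components computed for each interconnect piece can be illustrated with a minimal sketch. Only the parallel-plate area term below is a standard formula; the fringing and lateral terms use hypothetical fit constants (`fringe_coeff`, `lateral_coeff`) as stand-ins, because the text does not give CUP's empirical formulae. The function name and its parameters are assumptions made for illustration.

```python
# Permittivity of SiO2 in farads per micron (relative permittivity 3.9).
EPS_OX = 3.9 * 8.854e-18


def piece_capacitance(length, width, height, spacing, thickness,
                      fringe_coeff=1.0, lateral_coeff=1.0):
    """Return area, fringing, and lateral capacitance (farads) of one piece.

    All dimensions are in microns. fringe_coeff and lateral_coeff are
    HYPOTHETICAL fit constants; CUP's real empirical formulae, derived from
    two-dimensional electrostatic simulations, are not given in the text.
    """
    # Parallel-plate term to the layer directly below (or above).
    area = EPS_OX * width * length / height
    # Placeholder fringing term, proportional to the two edges of the piece.
    fringe = fringe_coeff * EPS_OX * 2.0 * length
    # Placeholder sidewall term to one adjacent wire at the given spacing.
    lateral = lateral_coeff * EPS_OX * thickness * length / spacing
    return {"area": area, "fringe": fringe, "lateral": lateral,
            "total": area + fringe + lateral}


piece = piece_capacitance(length=100.0, width=1.0, height=1.0,
                          spacing=1.0, thickness=0.5)
for name, value in piece.items():
    print(f"{name:8s} {value * 1e15:.3f} fF")
```

Even with placeholder constants, the sketch shows why the lateral term matters at small spacings: it grows as the spacing to the neighboring wire shrinks, while the area term does not change.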
Multiprocessing was employed again to reduce the elapsed turnaround time for capacitance extraction batch jobs. CUP sectioned the layout database into fixed-size stripes, which were inserted into the batch queues of multiple CPUs. This method reduced the data complexity and allowed as many parallel computations as there were processors. During NVAX chip design, capacitance extraction was partitioned across as many as 20 CPUs. For example, multiprocessing reduced the NVAX I-box capacitance extraction from 26 hours to just 8 hours using 4 processors. Extraction of the E-box took 40 hours using one processor, but only 12 hours with 4 CPUs. Table 2 shows the device and node counts of the NVAX boxes (excluding the caches), and the CUP extraction run times on a VAX 6000 Model 500. Each box run resulted in approximately 500,000 extracted parasitic capacitors.
Resistance Extraction
Verifying the NVAX power, ground, and clock networks, and long signal lines, required accurate extraction of interconnect resistance from layout. To meet this requirement, the REX resistance extractor was developed [9]. REX processed the output of HILEX to produce a series and parallel combination of resistors that modeled a node's interconnect. The resistor network was generated by fracturing the node layout into polygons based on changes in the layout geometries (width, length, bends) of the node or the occurrence of contacts. The effective resistance of each polygon and contact, or cluster of contacts, was then determined from technology parameters and the polygon geometries.
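As a minimal sketch of how a polygon's resistance follows from technology parameters and geometry, the example below uses the standard sheet-resistance relation R = R_sheet × length / width and treats a contact cluster as identical resistors in parallel. The function names and the numeric values (sheet resistance, contact resistance, dimensions) are hypothetical; the text does not describe REX's actual formulas or data structures.

```python
def polygon_resistance(sheet_ohms_per_square, length, width):
    """Resistance of a rectangular polygon with current flowing along its length."""
    return sheet_ohms_per_square * length / width


def contact_cluster_resistance(contact_ohms, count):
    """Identical contacts in a cluster conduct in parallel."""
    return contact_ohms / count


def series(*resistances):
    return sum(resistances)


def parallel(*resistances):
    return 1.0 / sum(1.0 / r for r in resistances)


# HYPOTHETICAL technology values, chosen only for illustration.
METAL_SHEET = 0.05   # ohms per square
CONTACT = 2.0        # ohms per contact

# A wide trunk, a cluster of four contacts, then a narrow branch, in series.
trunk = polygon_resistance(METAL_SHEET, length=200.0, width=10.0)
cluster = contact_cluster_resistance(CONTACT, count=4)
branch = polygon_resistance(METAL_SHEET, length=50.0, width=2.0)
print(f"node resistance: {series(trunk, cluster, branch):.2f} ohms")

# Two identical branches feeding the same point combine in parallel.
print(f"two branches in parallel: {parallel(branch, branch):.3f} ohms")
```

Fracturing at every change in width, every bend, and every contact yields many small resistors per node, which is consistent with the large resistor counts the text reports.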
The power and ground resistor networks were extracted for individual boxes rather than the entire chip. The resulting networks were still quite large due to the fine granularity of the REX extraction process. Table 3 shows the extraction times for a REX job running on a VAX 6000 Model 500 and the total number of resistors extracted from each box.
Table 2 CUP Parasitic Capacitance Extraction Batch Run-time Data for NVAX Boxes
Box      Device Count   Node Count   Single CPU (Hours)   Four CPUs (Hours)
I-box    107,000        36,830       26                   8
M-box    102,000        38,770       29                   8
E-box    107,600        41,760       40                   12
C-box    92,400         45,050       42                   12
F-box    129,150        55,550       45                   12.5
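The benefit of striping the extraction across four CPUs can be quantified directly from the Table 2 run times. The short sketch below computes speedup (single-CPU time divided by four-CPU time) and parallel efficiency (speedup divided by the number of CPUs). The variable and function names are illustrative and not part of CUP.

```python
# Run times in hours taken from Table 2: (single CPU, four CPUs).
RUN_TIMES = {
    "I-box": (26, 8),
    "M-box": (29, 8),
    "E-box": (40, 12),
    "C-box": (42, 12),
    "F-box": (45, 12.5),
}
CPUS = 4


def speedup(single_cpu_hours, multi_cpu_hours):
    """Ratio of elapsed time on one CPU to elapsed time on several CPUs."""
    return single_cpu_hours / multi_cpu_hours


for box, (single, multi) in RUN_TIMES.items():
    s = speedup(single, multi)
    print(f"{box}: speedup {s:.2f}x, efficiency {s / CPUS:.0%}")
```

On these figures the speedup ranges from 3.25 to about 3.6 on four CPUs, so the stripe-based partitioning kept each processor usefully busy for most of the run.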
The NVAX CPU Chip: Design Challenges, Methods, and CAD Tools
Table 3 REX Extracted Parasitic Resistance Data and Batch Run-time Data for NVAX Boxes
Box      Resistor Count   Extraction Time (Hours)
M-box    602,000          5
C-box    621,000          5
I-box    522,000          10
E-box    719,000          10
F-box    1,200,000        36
New Proprietary CAD Tools
Several other novel CAD tools were specifically designed for the NVAX chip. These tools provided practical solutions to verification and analysis problems that were previously unmanageable or intractable.
CHANGO Logic Simulator
CHANGO was an important development for NVAX functional verification because it allowed significantly more simulation cycles and functional verification tests to be performed from the NVAX transistor-level description than was previously possible. CHANGO is a two-state gate-level logic simulator designed to maximize performance through compiled, straight-line simulation. Electrical issues such as gate delay and charge sharing were not modeled since CHANGO was used for functional, not timing, verification. CHANGO's parallel simulation capability allowed the simultaneous execution of 13 different NVAX model simulations on one CPU, which resulted in an eightfold increase in simulation performance. Overall, CHANGO has been shown to accelerate simulation by one to two orders of magnitude over traditional event-driven gate-level simulators. Its high throughput enabled us to boot the VMS operating system twice (75 million cycles) prior to tape-out.
To create a CHANGO model, a transistor-level netlist description of the design was input to a preprocessor called GEN_MODEL. GEN_MODEL transformed the netlist into a logical description of the design, consisting of simple Boolean elements, D-type latches, and SR flops. CHANGO transformed this logical description into a highly optimized simulation stream of VAX assembly code.
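The idea of compiling a logical description into branch-free simulation code can be sketched as follows. This is only an illustration of compiled, straight-line, two-state simulation: it emits Python source rather than VAX assembly, handles only simple Boolean elements (no latches or SR flops), and assumes the gate list is already in evaluation order. The gate tuple format and the `compile_model` name are assumptions, not CHANGO's actual interfaces.

```python
def compile_model(gates):
    """Translate a gate list into one straight-line Python function.

    gates: list of (output, op, inputs) tuples in evaluation order, where
    op is "AND", "OR", "XOR", or "NOT". Signals are two-state (0 or 1).
    """
    operators = {"AND": " & ", "OR": " | ", "XOR": " ^ "}
    lines = ["def step(s):"]
    for output, op, inputs in gates:
        if op == "NOT":
            expression = f"1 ^ s[{inputs[0]!r}]"
        else:
            expression = operators[op].join(f"s[{name!r}]" for name in inputs)
        # One assignment per gate: no branches, no event queue.
        lines.append(f"    s[{output!r}] = {expression}")
    lines.append("    return s")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["step"]


# A half adder as a tiny example model.
half_adder = compile_model([
    ("sum", "XOR", ("a", "b")),
    ("carry", "AND", ("a", "b")),
    ("not_carry", "NOT", ("carry",)),
])
print(half_adder({"a": 1, "b": 1}))
```

Every gate is evaluated on every call regardless of whether its inputs changed, which is the trade-off that pays off when a large fraction of signals switch each cycle.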
Digital Technical Journal Vol. 4 No. 3 Summer 1992
CHANGO achieved its high simulation throughput in many ways. Conditional branch latency penalties were largely avoided because CHANGO simulation code is designed to execute in a straight-line fashion. Due to the high switching event densities we observed on NVAX, 18 percent on average, this straight-line compiled approach to simulation was more efficient than event-driven simulators, which typically fail to compete when event densities increase beyond 3 to 5 percent. The CHANGO translation process further optimized the simulation by partitioning it according to the signals that should be evaluated during each particular clock phase. This avoided processing signals during clock phases when signal transitions could not occur. Further, a switching event was evaluated only when the signal could affect the evaluation of some other signal. This prevented simulation of unimportant switching events that were ignored by the rest of the design. Redundant signals (i.e., nodes with the same logical behavior) were grouped together as a list of synonym signals in order to model multiple nodes by only one simulation event.
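The synonym-signal optimization can be illustrated with a small sketch that merges gates computing the same function of the same inputs, so that only one of them is simulated. How CHANGO actually detected nodes with the same logical behavior is not described in the text; the structural matching below, the gate tuple format, and the `group_synonyms` name are all assumptions. The sketch also assumes that gates arrive in evaluation order and that every operator is commutative.

```python
def group_synonyms(gates):
    """Merge gates that compute the same function of the same inputs.

    gates: list of (output, op, inputs) tuples in evaluation order.
    Returns (kept_gates, representative), where representative maps every
    gate output to the one signal that is actually simulated for it.
    """
    representative = {}
    seen = {}
    kept = []
    for output, op, inputs in gates:
        # Rewrite inputs to their representatives, then sort so that input
        # order does not matter (valid for AND, OR, XOR, and one-input NOT).
        canonical = tuple(sorted(representative.get(n, n) for n in inputs))
        key = (op, canonical)
        if key in seen:
            representative[output] = seen[key]   # synonym of an earlier node
        else:
            seen[key] = output
            representative[output] = output
            kept.append((output, op, canonical))
    return kept, representative


kept, representative = group_synonyms([
    ("n1", "AND", ("a", "b")),
    ("n2", "AND", ("b", "a")),     # same behavior as n1
    ("y", "OR", ("n2", "c")),
    ("z", "OR", ("c", "n1")),      # same behavior as y once n2 maps to n1
])
print(kept)
print(representative)
```

In this example four gates reduce to two simulated gates, with `n2` and `z` recorded as synonyms of `n1` and `y`.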
NTV Timing Verifier
NTV is a static timing verification tool developed for use on the NVAX microprocessor [10]. NTV processed 350,000 circuit paths and checked 42,000 timing constraints on the NVAX design. NTV eliminated the need for the pattern-dependent dynamic speed verification strategy used by other chip designs and significantly reduced the extensive speed verification work needed for SPICE simulations. It identified critical paths that would otherwise have remained undetected due to the complexity and size of the NVAX design.
NTV read multiple flat transistor netlists with or without parasitics and automatically identified circuit structures such as complementary, dynamic, and cascode gates as well as several types of latches. Based on the classification of these structures, NTV identified timing constraints. For example, NTV checked that the latch storage nodes became valid before the latches closed. NTV also read user-specified timing for primary inputs and propagated node timing throughout the design based on when signals arrived at gate inputs, the drive capability of each gate, and its output loading.
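The propagation of node timing and the latch check can be sketched in a few lines. This is a generic static-timing illustration, not NTV's algorithm: each gate output becomes valid at the latest input arrival plus a fixed gate delay, and a latch storage node is flagged if it becomes valid after the latch closes. The data layout, the function names, and all times (in arbitrary units) are hypothetical, and gates are assumed to be listed in evaluation order.

```python
def propagate_arrivals(primary_inputs, gates):
    """Compute the latest arrival time at every node.

    primary_inputs: dict of node -> user-specified arrival time.
    gates: list of (output, inputs, delay) tuples in evaluation order.
    """
    arrivals = dict(primary_inputs)
    for output, inputs, delay in gates:
        arrivals[output] = max(arrivals[name] for name in inputs) + delay
    return arrivals


def latch_violations(arrivals, latch_close_times):
    """Return (node, lateness) for storage nodes valid after the latch closes."""
    return [(node, arrivals[node] - close)
            for node, close in latch_close_times.items()
            if arrivals[node] > close]


arrivals = propagate_arrivals(
    {"a": 0.0, "b": 1.0},
    [("n1", ("a", "b"), 2.0),
     ("n2", ("n1",), 1.5)],
)
violations = latch_violations(arrivals, {"n2": 4.0})
print(arrivals)
print(violations)
```

In a real verifier the fixed per-gate delay would be replaced by a delay model that accounts for drive capability and output loading.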
NVAX-microprocessor VAX systems
NTV has three delay models that were used for calculating gate delay: (1) unit delay was used for an initial rough timing estimate before real parasitics were known, (2) a SPICE-calibrated lumped RC model was used for delay calculation of complementary gates, and (3) an Elmore distributed RC model was used for other structures [11]. NTV flagged circuits that failed to meet the identified timing constraints within a user-specified time tolerance. As with other static timing verifiers, some paths identified by NTV were "don't cares" or were logically impossible. The user eliminated these false paths by deleting timing constraint checks or by specifying mutual exclusivity between specified groups of nodes.
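The difference between a lumped and an Elmore distributed RC estimate can be shown on a simple RC ladder. The sketch below uses the textbook Elmore formula for the far end of a ladder, where each resistance is multiplied by all of the capacitance downstream of it. How NTV calibrated or applied its models is not described in the text, and the segment values are hypothetical.

```python
def elmore_delay(segments):
    """Elmore delay at the far end of an RC ladder.

    segments: list of (resistance, capacitance) pairs ordered from driver
    to load, with each capacitance attached at the far end of its resistor.
    """
    downstream = sum(c for _, c in segments)
    delay = 0.0
    for resistance, capacitance in segments:
        delay += resistance * downstream   # this resistor charges everything beyond it
        downstream -= capacitance
    return delay


def lumped_delay(segments):
    """Single-pole estimate: total resistance times total capacitance."""
    return sum(r for r, _ in segments) * sum(c for _, c in segments)


# HYPOTHETICAL wire split into two equal segments (ohms, farads).
wire = [(1.0, 1.0), (1.0, 1.0)]
print(elmore_delay(wire), lumped_delay(wire))
```

For this two-segment wire the Elmore estimate is R1(C1 + C2) + R2·C2, which is smaller than the lumped product because the capacitance near the driver is not charged through the full wire resistance.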