
Institut für Integrierte Systeme / Integrated Systems Laboratory

Department of Information Technology and Electrical Engineering

VLSI II: Entwurf von hochintegrierten Schaltungen

227-0147-00

Training 1

SoC Encounter for Designers II

Prof. Dr. H. Kaeslin

Dr. N. Felber

SVN Rev.: 1016 Last Changed: 2013-10-15

Reminder:


1 Overview

Unlike other exercises in the VLSI lectures, the back-end design flow requires you to learn how to use a commercial Electronic Design Automation (EDA) tool, in our case CADENCE SOC ENCOUNTER from Cadence Design Systems. These exercises are therefore called 'Trainings' and will teach you the basics of CADENCE SOC ENCOUNTER so that you can use it for your semester projects.

There will be three trainings:

• Training 1
Floorplanning, placement, clock tree synthesis, optimization, routing and timing analysis with CADENCE SOC ENCOUNTER.

• Training 2
Determining power consumption, IR drop analysis.

• Training 3
Tape-out preparation, performing Design Rule Check (DRC) and Layout Versus Schematic (LVS) on your final database.

Students who plan to work on an ASIC semester project should make sure to visit all three trainings.

1.1 About the Style

We will try to use a number of different styles to identify different types of actions. These are summarized below:

Student Task: Parts of the text that have a gray background, like the current paragraph, indicate steps required to complete the exercise.

Actions that require you to select a specific menu will be shown like the following: menu→sub-menu→sub-sub-menu

Whenever there is an option or a tab that can be found in the current view/menu we will use a BUTTON to indicate such an option.

Throughout the exercise you will be asked to enter certain commands using the command line¹. The following is an example of the Linux command line.

sh > command to be entered on the linux command line

Other commands will be entered on the command line of the CADENCE SOC ENCOUNTER tool, such as:

enc > this command is an encounter command

1 There are many reasons for using a command line. Some functionality cannot be accessed through GUI commands, and in some cases using the command line will be much faster. Most importantly, things you enter on the command line can be converted into a script and executed repeatedly.

2 Introduction

In this training we will start with a structural Verilog design netlist (from synthesis) and create step by step a physical layout that can be manufactured. To keep runtimes reasonably low, we will use an example design with a (slightly) lower complexity than most student design projects.

2.1 Example Design

The example design is based on the FIR filter that we have been using in the past exercises. The filter has been changed to include several pipelined filter stages as shown in the block diagram below.

[Figure: block diagram of the example design. The hierarchy levels are labeled filter, fiter_top and filter_chip. The filter consists of pipelined stages (filter_stage1, filter_stage2, filter_stage3, filter_stage4 and filter_stage8 are labeled), each with a LUT and data paths of 16, 32 and 48 bits; the input of the first stage is tied to '0'. The RAM macro SY180_2048X16X1CM8 is connected through RamRDxD, RamWDxD and RamAddrxD. Chip pins: DataInxDI, DataInReqxSI, DataInAckxSO, ResetxRBI, ClkxCI, ScanEnxTI, RamTestxTI, DataOutxDO, DataOutAckxSI, DataOutReqxSO.]

Each filter stage contains a large multiplier, a look-up table and an accumulator. Note that the input of the first stage is tied to constants and therefore greatly simplified. The following is a short description of all pins of the circuit:

Pin Descriptions

Name           Bits  Dir  Description
ClkxCI         1     In   Clock input
ResetxRBI      1     In   Reset input, active low signal, 0: Reset
ScanEnxTI      1     In   Scan Enable for testing, 1: Scan
RamTestxTI     1     In   Ram bypass control, 1: Test (RAM bypassed)
DataInxDI      16    In   16-bit data input
DataInReqxSI   1     In   Request signal for data input
DataInAckxSO   1     Out  Acknowledge signal for data input
DataOutxDO     16    Out  16-bit data output
DataOutReqxSO  1     Out  Request signal for data output
DataOutAckxSI  1     In   Acknowledge signal for data output

3 Getting Started

You will need a terminal program to type in commands throughout this exercise. On the computers in room ETZ D61.2 you can get a terminal by opening the menu in the top left corner and selecting Applications→Accessories→Terminal.

Student Task 1:

• Change to your home directory and install the training files with the script provided:

sh > cd ~
sh > /home/vlsi2/t1/install_t1

• Change to the design directory:

sh > cd training_1

The copied files and folders are arranged in a certain structure which is described in the next section.

3.1 Directory Structure

The following figure shows the directory structure for a design directory that was created by the cockpit tool developed by the Design Zentrum (DZ) of ETH Zurich.

design
  .cockpitrc     Configuration for the cockpit
  calibre        Final layout, DRC and LVS
  encounter
    out          Final output files: netlist, layout, timing (Verilog, GDSII, SDF)
    save         Save files for Encounter (Encounter native format)
    scripts      Example scripts, run scripts (TCL)
    src          Input source files: netlist, constraints, io placement
      sample     Sample input files
    tech         Links to technology files, etc.
      docs       Links to documents
      lef        Links to abstracts and technology
      lib        Links to timing libraries
  modelsim       Simulation tool
  simvectors     Stimuli and expected responses
  sourcecode     VHDL sourcecode
  synopsys       Synthesis environment
  tetramax       Test vector generation, test coverage

In this structure, there are five subdirectories for CADENCE SOC ENCOUNTER. It is strongly recommended to use them in the following way:

out Place all final data to be exported from CADENCE SOC ENCOUNTER in this directory. This includes the final netlist (the initial netlist gets modified by clock tree insertion, optimization etc.), layout and delay files that will be used for postlayout simulation and/or physical verification and chip finishing. A sample script that generates all these files is provided (scripts/exportall.tcl).

save Put all CADENCE SOC ENCOUNTER save files, i.e. files in native CADENCE SOC ENCOUNTER format, in this directory.

scripts Contains TCL scripts. By default several example scripts for common tasks are provided. It is highly recommended to develop a run script that contains all the commands used for your design.

src All user input files should be placed here. These include the initial Verilog netlist, the I/O placement file, timing constraints file and clock tree definition file (all will be explained later in section 3.2).

tech Holds links to technology specific files. Cockpit manages this directory automatically.

3.2 Input Files

The input files required for back-end design with CADENCE SOC ENCOUNTER can be divided into two categories:

• Design files that describe (or are closely related with) the circuit, first of all the Verilog netlist of our synthesized design.

• Technology files that describe the technology itself as well as libraries of standard building blocks implemented in this technology.


Let’s start with the first category.

3.2.1 Verilog Netlist

The Verilog netlist we obtain from synthesis contains standard cells, functional I/O pads and their interconnection information. While the functionality including scan circuitry is already complete, some special cells are still missing:

• Supply pads to provide power and ground to the core (pads ’VCCKD’ and ’GNDKD’) and to the padframe (pads ’VCC3IOD’ and ’GNDIOD’).

• Corner pads that need to be placed in the corners of the padframe to complete the power lines running inside the padframe (pad CORNERD).

Due to the arrangement we have with our ASIC manufacturer, student designs are strictly limited in size. As a consequence at most 56 pads (not including the 4 corner pads) can be placed in the padframe. Furthermore, to ease chip testing on the ASIC tester two predefined power schemes have been established:

1. 40 signal pads, 16 supply pads

Take a look at the following web page for an illustration of the power schemes and to obtain further information on constraints for the semester design projects.

http://www.eda.ee.ethz.ch/index.php/UmcL180#Mini.40sic

With all this information we are now ready to add the missing corner and supply pads to our Verilog netlist.

A typical Verilog netlist that you will obtain from SYNOPSYS DESIGN COMPILER will contain many levels of hierarchy. Each level of hierarchy is enclosed between the statements

module name ( pin names separated by comma )
...
endmodule

where 'name' refers to the name of the module (module is the Verilog equivalent of an entity in VHDL). In our case we need to add the pads to the top-level module which contains the rest of the I/O pads. The top-level design is almost always the last module definition in a Verilog file³.

3 The content of a module needs to be defined before it can be instantiated by a different module. Consequently the top-level module is the last to be defined. However, not all Verilog files need to be hierarchical, and a design can also be spread over multiple files.

Student Task 2:

• Copy the Verilog netlist to encounter/src/ in order to have a clean copy of the initial netlist even if synthesis is rerun.

sh > cd encounter/src/
sh > cp -p ../../synopsys/netlists/filter_chip.v \
        filter_chip.v.initial

The file specialpads.v contains four corner pads and 16 supply pads corresponding to power scheme 1. As our design uses power scheme 1, no changes are required to this file. For power scheme 2, we would have to comment out the eight additional supply pads (comments in Verilog start with //).

What remains to do is to add the contents of specialpads.v at the right point, i.e. where the other pads are, to the initial netlist.

• Using a text editorᵃ, open filter_chip.v.initial and find the definition of the top-level module 'chip' by searching for:

module chip

Below this declaration you should see lines that instantiate the pads. Insert the contents of specialpads.v at this point. As long as you are in the module body, it does not matter where exactly you insert them.

• Save the file as filter_chip.v and exit the text editor.

a There are many text editors you can use: terminal-based editors (vi, vim, nvi, joe, jed, pico, nano etc.), editors that are mainly terminal-based but have a simple GUI (emacs, xemacs, gvim etc.), and GUI-based editors (mousepad, gedit, nedit, kate etc.). Of these, emacs, vi (and derivatives) and nedit are the most advanced editors.

Remark: In the future you can use a small Perl script to add the special pads to the initial netlist, i.e.

sh > ./insert_specialpads ../../synopsys/netlists/filter_chip.v \
        ./specialpads.v > filter_chip.v

inserts the contents of specialpads.v into the last module defined in ../../synopsys/netlists/filter_chip.v and writes the modified netlist to filter_chip.v.
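The idea behind this remark can be sketched in a few lines of Python. This is not the insert_specialpads Perl script itself, only a minimal illustration that assumes the top-level module is the last one in the file and that no 'endmodule' appears in a trailing comment. The function name and the instance name corner_tl are hypothetical.

```python
import re


def insert_into_last_module(netlist: str, pads: str) -> str:
    """Return the netlist with `pads` inserted before the last 'endmodule'."""
    # Find every line that starts (after optional indentation) with 'endmodule'.
    matches = list(re.finditer(r"^[ \t]*endmodule\b", netlist, flags=re.MULTILINE))
    if not matches:
        raise ValueError("no 'endmodule' found in netlist")
    pos = matches[-1].start()  # start of the line holding the last 'endmodule'
    return netlist[:pos] + pads.rstrip("\n") + "\n" + netlist[pos:]


# Tiny made-up netlist: a sub-module followed by the top-level module.
netlist = (
    "module stage (a);\n"
    "endmodule\n"
    "module chip (clk);\n"
    "  XMD ClkxCI_PAD ( .I(clk) );\n"
    "endmodule\n"
)
pads = "  CORNERD corner_tl ();\n"  # hypothetical instance name
result = insert_into_last_module(netlist, pads)
print(result)
```

Since the position inside the module body does not matter, inserting right before the closing endmodule is the simplest choice.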

3.2.2 I/O File

After the last step our Verilog netlist contains all pads. However, there is no information that actually tells the tool where each pad should be placed. The pad placement is very important as it directly determines the PCB layout⁴. In our case, we want all designs to share common power and ground pad locations so that a single test board can be used on our ASIC tester. For practical reasons we have decided to use a 56-pin package for all designs. So even though the chip has only 48 physical pins, it will be placed in a package that contains 56 pins⁵. Depending on the power configuration, a different bonding scheme will be used. These two configurations can be seen on the following webpage:

http://www.eda.ee.ethz.ch/index.php/UmcL180#Mini.40sic

The cockpit will copy sample I/O files automatically to the src/sample directory⁶. All lines starting with '#' are comments. The file consists of two main sections: globals and iopad.

(globals
    [global definitions]
)
(iopad
    (topleft
        [pads that are on the top left]
    )
    (left
        [pads that are on the left side]
    )
    [definitions for other sides]
)

4 A good pinout could simplify the routing on the PCB, allow you to use fewer layers and result in fewer parasitics.

5 8 pins will be left unconnected.

6 For this technology there will be four files. There will be two template files, chip.io-template and chip-ep.io-template, for the normal and extended power configuration respectively. These files have all the required power connections in place, and the data sections are commented out. There are also two example files that have fictional I/O placement where all pins are defined.

For us the relevant part is the iopad section. This part contains eight subsections that define the names of the pad instances and their locations on the four sides and in the four corners. We do not have to touch the corner specifications as they will be the same for all designs. We have to distribute the pads among the four sides of the chip: top, right, bottom and left. If you look at the sample file you will see that for each pad there is a single line entry in the following form:

(inst name="NAME_OF_PAD" offset=OFFSET_VALUE ) # pin no: PIN_NUMBER

The last part following # is a comment; it is there just for your information. Regardless of the power scheme you are using, we will use the same 56-pin package as illustrated on the webpage above. The PIN_NUMBER is just a reminder to show which particular location is being defined. The location is specified using the OFFSET_VALUE. CADENCE SOC ENCOUNTER uses a coordinate system that places the coordinate (0,0) in the bottom-left corner as shown in the figure below:

[Figure: pad placement coordinate system. The origin (0,0) is in the bottom-left corner. The four sides (top, bottom, left, right) and the four corners (topleft, topright, bottomleft, bottomright) are labeled, and pads 1, 2, 3 are positioned along a side by their offset.]

On the left and right side the pads will be ordered from bottom-to-top, and on the top and bottom side the pads will be ordered from left-to-right. This ordering can be quite confusing, as it is neither clockwise, nor counterclockwise. Therefore the aforementioned comments showing the actual pin numbers will be very useful.


The OFFSET_VALUEs given in the template represent fixed locations for the given pad. It is very important that you do not change these values, as the chip-finishing part will rely on the pads being located exactly at these locations.

You can assign your pads by writing the name of each pad into the corresponding NAME_OF_PAD. The name of the pad will be the name of the instance in the Verilog file. For example, assume that you are using the standard power scheme and your clock signal is assigned to a pad named ClkxCI_PAD. In your Verilog file you would have the following entry for this pad:

XMD ClkxCI_PAD ( .I(ClkxCI) [other pin definitions] )

If you now want to place this pad on pin number 48 of your package, you will find the subsection top in the I/O file and edit the line for pin 48:

...
(iopad
    ...
    (top
        ...
        (inst name="ClkxCI_PAD" offset= 864.28 ) # pin no: 48
        ...
    )
    ...
)

Be careful: do not modify the offset value while you are editing the I/O file. Since we use a fixed bonding scheme for the power and ground pins, all we need to do is extract the instance names of all our signal pads and insert each of them into the inst name="" statement whose OFFSET_VALUE corresponds to the desired location. It is also recommended to put the clock pin (if possible) on pin number 48. All new test boards will make sure that pin 48 has the best signal quality.

Preparing the I/O file from scratch can be a lengthy and tedious task. To avoid unnecessary work during this exercise we will start with an almost complete I/O file, but before doing so we will describe the full procedure recommended when starting from scratch:

1. Start CADENCE SOC ENCOUNTER and proceed to design import by selecting Design→Import Design. In this form make sure that the IO ASSIGNMENT FILE is empty.

2. If everything works well, the design will be loaded. Now we can write out a template file that will contain all the names of the pads. Use Design→Save→I/O File ... to save an I/O file src/chip-sequence.io. You can select the SEQUENCE checkbox, however it is not imperative. What we need is only the names of the pads.

3. Copy the template I/O file src/sample/chip.io-template to src/chip.io. As noted earlier, this file includes all offset= statements, and all statements for corner and supply pads.

4. Using a text editor open the files src/chip.io and src/chip-sequence.io. You need to move the pad names from the file src/chip-sequence.io to the correct positions in the file src/chip.io.

5. All entries for data pins in the template file are by default commented out using the '#' character. Do not forget to remove the comment character for the pads you are using.
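Steps 4 and 5 are mechanical, so they lend themselves to a small script. The following Python sketch is only an illustration under assumptions: it supposes that a commented-out entry looks exactly like a normal entry with a leading '#', and the function name and the second offset value (950.00) are made up. It fills in the pad name for a given pin number, removes the comment character, and leaves the offset untouched.

```python
import re

# One pad entry, optionally commented out with a leading '#'.
ENTRY = re.compile(
    r'^(?P<indent>\s*)#?\s*\(inst name="[^"]*"'
    r'(?P<rest>\s+offset=\s*[\d.]+\s*\)\s*#\s*pin no:\s*(?P<pin>\d+).*)$'
)


def assign_pads(template: str, pin_to_pad: dict) -> str:
    """Write pad names into the entries whose pin number is in pin_to_pad."""
    out = []
    for line in template.splitlines():
        m = ENTRY.match(line)
        if m and int(m.group("pin")) in pin_to_pad:
            pad = pin_to_pad[int(m.group("pin"))]
            # Rebuild the line without '#', keeping the offset and the comment.
            line = f'{m.group("indent")}(inst name="{pad}"{m.group("rest")}'
        out.append(line)
    return "\n".join(out) + "\n"


template = (
    "(top\n"
    '  #(inst name="" offset= 864.28 ) # pin no: 48\n'
    '  (inst name="pad_gnd_c4" offset= 950.00 ) # pin no: 49\n'  # made-up offset
    ")\n"
)
print(assign_pads(template, {48: "ClkxCI_PAD"}))
```

Entries whose pin number is not in the mapping, such as the supply pad above, pass through unchanged.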


Student Task 3:

• Now, for this exercise you can start with the almost complete I/O file src/chip.io-incomplete instead of the template file. This file has all the pads placed properly with the exception of the 16 pads of the input bus DataInxDI which are still missing.

Furthermore the file src/filter_chip.sequence.io mentioned above has already been generated for you.

The desired I/O assignment is depicted in the figure below and can also be found in the file src/filter_chip.io.psᵃ.

• Create the complete I/O file and save it as src/filter_chip.io.

a PostScript viewers were very common in earlier days. You can use gv, kghostview or evince to view this file.

You can use the utility src/io2ps.pl to generate a PostScript file from your I/O file. This utility will also verify that you have used the correct offset locations in your I/O file, and will report errors. For best results, you should also provide the Verilog netlist file, which will enable the script to make even more checks.

sh > ./io2ps.pl filter_chip.io > filter_chip.pin_diagram.ps

The src/io2ps.pl utility uses a configuration file with the extension .pads. By default the file src/io2ps.pads will be used. If you are planning to use the extended power scheme, you will have to add the configuration file src/io2ps-ep.pads to the command as well.

[Figure: pin diagram of the desired I/O assignment for the 56-pin package (pins 1 to 56). Pad labels in the order they appear in the figure, in four groups of 14:
pad_vcc_p1, DataInxDI_PAD_9, DataInxDI_PAD_8, DataInxDI_PAD_7, DataInxDI_PAD_6, DataInxDI_PAD_5, pad_gnd_c1, pad_vcc_c1, DataInxDI_PAD_4, DataInxDI_PAD_3, DataInxDI_PAD_2, DataInxDI_PAD_1, DataInxDI_PAD_0, pad_gnd_p1;
pad_vcc_p2, DataInxDI_PAD_10, DataInxDI_PAD_11, DataInxDI_PAD_12, DataInxDI_PAD_13, DataInxDI_PAD_14, pad_gnd_c2, pad_vcc_c2, DataInxDI_PAD_15, DataOutAckxSI_PAD, DataOutReqxSO_PAD, DataOutxDO_PAD_0, DataOutxDO_PAD_1, pad_gnd_p2;
pad_vcc_p3, DataOutxDO_PAD_2, DataOutxDO_PAD_3, DataOutxDO_PAD_4, DataOutxDO_PAD_5, DataOutxDO_PAD_6, pad_gnd_c3, pad_vcc_c3, DataOutxDO_PAD_7, DataOutxDO_PAD_8, DataOutxDO_PAD_9, DataOutxDO_PAD_10, DataOutxDO_PAD_11, pad_gnd_p3;
pad_vcc_p4, DataInReqxSI_PAD, DataInAckxSO_PAD, RamTestxTI_PAD, ScanEnxTI_PAD, ClkxCI_PAD, pad_gnd_c4, pad_vcc_c4, ResetxRBI_PAD, DataOutxDO_PAD_15, DataOutxDO_PAD_14, DataOutxDO_PAD_13, DataOutxDO_PAD_12, pad_gnd_p4.]


3.2.3 Timing Constraints

Just as for synthesis, we need to specify timing constraints for the back-end design with CADENCE SOC ENCOUNTER.

With decreasing process geometries the impact of placement and routing on timing, power, etc. is steadily increasing. Therefore, timing analysis and optimization have become very important in order to arrive at a layout that (still) satisfies all requirements.

As CADENCE SOC ENCOUNTER supports most of the more common SYNOPSYS DESIGN COMPILER commands/constraints it should be rather straightforward to create an appropriate timing constraints file based on the constraints used for synthesis.

Student Task 4:

• There is an example constraint file src/sample/chip.sdc-sample that contains the most commonly used commands along with many useful and important comments.

Copy this file to src/chip.sdc and modify it so that the following constraints get set (and nothing else!):

– Define a 125 MHz clock
– Specify 3.5 ns input delay for all inputs
– Specify 5.0 ns output delay for all outputs
– Specify an input transition time of 0.8 ns at all inputs
– Specify a 15 pF output load for all outputs
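To see what these numbers mean for the design, it helps to translate them into time budgets. The sketch below is a rough calculation, assuming the usual interpretation that input and output delays are measured relative to the clock and consume part of one clock period; it ignores setup times, clock skew and uncertainty, and the function name is only illustrative.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


period = clock_period_ns(125.0)   # 8.0 ns
input_budget = period - 3.5       # time left on-chip for input paths
output_budget = period - 5.0      # time left on-chip for output paths

print(f"clock period : {period:.1f} ns")
print(f"input paths  : {input_budget:.1f} ns")
print(f"output paths : {output_budget:.1f} ns")
```

Under these assumptions a path from an input pad to the first register has about 4.5 ns, and a path from the last register to an output pad about 3.0 ns, including the pad delays.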

3.2.4 Technology Files

The tech directory and the two subdirectories contain technology files that describe the technology itself as well as libraries of standard building blocks implemented in this technology, i.e. standard cells, pads, RAM/ROM.

• Technology files (UMCL180)

lef/header6_V55.lef Base technology description, defines metal layers, vias, spacing rules, routing

umcL180.capTbl Table used to extract parasitic capacitances and resistances for signal and power wires.

streamout.map Layer mapping table used when exporting the final layout in GDSII format.

• Library files (standard cells, pads, macro-cells)

lef/*.lef Physical description, shape and allowed orientation of cells, layer and shape of pins, blockages, antenna information, ...

lib/*.lib Functional description, timing and power information, maximum load/fanout or transition-time allowed, ...


3.2.5 Macro-cells

The macro-cells for the umcL180 process are created using dedicated memory compilers. The specific memory compiler we have access to is able to create five different types of macro-cells with various capacities:

• SU180 : single-port static RAM
• SJ180 : dual-port⁹ static RAM
• SY180 : single-port register-file¹⁰
• SZ180 : two-port¹¹ register-file
• SP180 : via programmable ROM

The following parameters are used for the macro-cells:

words

Number of words in the memory

sub-word size

Number of bits within a sub-word of the memory. The sub-word is the smallest unit used for data access in the macro-cell¹².

number of sub-words per data word

This parameter allows creating multiple sub-words. Each sub-word can be written to separately. For example, a 32-bit RAM can be configured as having a single 32-bit sub-word, two 16-bit sub-words, four 8-bit sub-words and so on.

column or block multiplexer

This parameter affects the geometry of the macro-block. This can have significant influence on the performance of the macro-block. There is no general rule to determine this parameter. Once the memory requirements are known, all possible geometries will be considered and the most suitable one will be determined.

Several macro-cells are available; their datasheets can be found under:

/usr/pack/designkits-1.0-ma/umc_L180/faraday/gen/memaker/200901.1.1/datasheet.dz

If none of the available macro-cells suits your needs, more can easily be generated on demand. Please contact the Microelectronics Design Center for this purpose.

Our example design uses a single-port RAM named SY180_2048X16X1CM8. This RAM has 2048 words of 16 bits each (single sub-word) and a block multiplexer of 8. All necessary preparations to work with this macro-cell have already been done, so you do not need to do anything additional for this exercise.
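The macro-cell name encodes the parameters described above. The following Python sketch decodes it, assuming that the pattern seen in this single example (type, words, sub-word size, sub-words per word, multiplexer) holds for other macro-cells as well; the memory compiler documentation is the authoritative source, and the function name is made up.

```python
import re

# Macro-cell types as listed in this section.
MACRO_TYPES = {
    "SU180": "single-port static RAM",
    "SJ180": "dual-port static RAM",
    "SY180": "single-port register-file",
    "SZ180": "two-port register-file",
    "SP180": "via programmable ROM",
}

# Assumed layout: <type>_<words>X<sub-word bits>X<sub-words><mux letters><mux>
NAME = re.compile(
    r"^(?P<type>S[UJYZP]180)_(?P<words>\d+)X(?P<bits>\d+)X(?P<subwords>\d+)"
    r"(?P<muxkind>[A-Z]+)(?P<mux>\d+)$"
)


def parse_macro_name(name: str) -> dict:
    """Decode a macro-cell name into its parameters and total capacity."""
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized macro-cell name: {name}")
    words = int(m.group("words"))
    bits = int(m.group("bits"))
    subwords = int(m.group("subwords"))
    return {
        "type": MACRO_TYPES[m.group("type")],
        "words": words,
        "sub_word_bits": bits,
        "sub_words": subwords,
        "mux": int(m.group("mux")),
        "total_bits": words * bits * subwords,
    }


info = parse_macro_name("SY180_2048X16X1CM8")
print(info["words"], "words,", info["total_bits"], "bits in total")
```

For the RAM used here this gives 2048 × 16 × 1 = 32768 bits, i.e. 4 KiB of storage.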

9 Dual-port memories have two completely independent access ports. Two separate memory addresses can be accessed at the same time, for both read and write.

10 Although the name suggests that the memory is made out of individual registers, it is very similar in design to SRAM.

11 In two-port memories, the read and write ports are separate, so you can simultaneously read and write. There are timing constraints for reads and writes to the same address; please refer to the memory compiler manual for details.

12 In many places this sub-word is referred to as 'byte'. This might be slightly confusing, since a byte is commonly accepted to be an information unit consisting of 8 bits.

4 Importing the Design

Student Task 5:

• Start CADENCE SOC ENCOUNTERᵃ either from your design directory by using cockpit

sh > cd ~/training_1
sh > icdesign umcL180 &

• or from the encounter directory by issuing the commands

sh > cd ~/training_1/encounter
sh > cds_soc81 encounter

a This exercise uses version 8.1 of Cadence SoC Encounter. There are newer versions of this software; however, the main principles have not changed much, so we will continue to use this version for this exercise. Newer versions have slightly changed GUI elements and improved capabilities for some functions.

We will now import our design.

CADENCE SOC ENCOUNTER uses a large configuration file that defines the design and technology files to be loaded as well as some global settings to be applied.

Cockpit automatically generates an appropriate sample configuration file src/sample/chip.conf that should be used as a starting point.

Student Task 6:

• Copy the sample file into the src directory.

sh > cp src/sample/chip.conf src/filter_chip.conf

• Select Design→Import Design ... to open the design import form. This form contains fields for all configuration options. At the bottom of this window, there are buttons to load and save the configuration from/to a file. Use the LOAD ... button to load the configuration file we have just copied to the src directory.

• On the BASIC tab make sure that VERILOG NETLIST:, TIMING CONSTRAINT FILE: and IO ASSIGNMENT FILE: match your design. COMMON TIMING LIBRARIES: and LEF FILES: should already be correct.

• On the ADVANCED tab the only setting you might want to adapt for your design is the DEFAULT DELAY PIN LIMIT: in the category DELAY CALCULATION. We will explain this later.

• Once you are happy with the configuration don't forget to save your changes to the configuration file.

• Click OK to import your design. Monitor the messages on the console for errors.

Pay attention to the messages where the timing constraint file is loaded ("Reading timing constraint file") to see if everything was accepted! If there are errors, you need to fix them!

We are now in the floorplan view of CADENCE SOC ENCOUNTER which displays an empty floorplan with only the pads placed. All top-level modules of the netlist are shown as a pink/purple square to the left and all macro-cells to the right. Note that all standard cells are inside the modules.

5 Floorplanning

Now we will have to decide how cells and macro-cells will be placed on our chip. This process is called floorplanning. For a standard design, our main concern would be to find a floorplan that will result in the smallest possible area, while fulfilling all performance and reliability requirements. This is purely driven by economic reasons, since chip costs are mainly determined by the area. In some cases there are additional geometrical constraints. The manufacturing company may impose certain limits on the aspect ratio of the final layout¹³, or even dictate the maximum height or width of the layout.

Back-end design is not only used for complete chips. Macro-cells that will be part of a larger system-on-chip design can also be designed in this way. In such cases there might be even more restrictions. For example, certain metal layers might be reserved for the system level.

So the question is, “How small can my layout be so that I am still able to fulfill all specifications?”. As a lower bound, you will need enough area to place all your I/O pads and standard cells. Ideally, in terms of area (and assuming your design is not pad limited, see exercise 2), you will want to place standard cells without leaving extra space in between, completely filling out the core area. This is hardly ever possible because:

• The number of interconnections that can pass through a certain area is limited by the number of metal layers available¹⁴, wire width and minimum spacing requirements. Depending on the interconnection overhead, the area above the cells¹⁵ may not be sufficient for routing.

• Timing is greatly affected by the placement of your cells. Placing them next to each other with no space in between does not leave the tool any flexibility in placing cells. This in turn reduces the optimization options of the tool, like the ability to cluster cells that are closely interconnected.

• All designs require power routing for operation. Some wires of the power connection limit where the cells can be placed, or restrict signal routing, which in turn increases the area requirement.

• The majority of designs require a clock tree to function. This clock tree is added during the back-end design. This requires additional area for the buffers used in the clock tree. Furthermore, the clock tree synthesis algorithm can produce better results if it has more freedom to place its buffers.

• Macro-cells, like the RAM in our example, usually require some extra space along the edges so that they can properly be connected to power and signal lines.

• Designs that have a high switching activity require a lot of current for a short time, which is called a surge. The power distribution network may need additional decoupling capacitors to store some charge that can provide some of the current of the standard cells during such a surge. Additional space for these decoupling cells may be required during placement.

As a consequence, the standard cell rows (which form the core area) cannot be filled completely with standard cells; in other words, some free space needs to remain in between cells.

Utilization indicates to what extent the standard cell rows are filled. 100% utilization is the upper bound where all cells are abutted and there is no extra space, while a utilization of 50% means that half of the core area is empty.
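The definition translates directly into a ratio of areas. The Python sketch below is a simplified illustration: it assumes that utilization is the total standard cell area divided by the area of the standard cell rows, and it ignores macro-cells, halos and power stripes. The function names and the example cell area are made up.

```python
def utilization(cell_area_um2: float, core_area_um2: float) -> float:
    """Fraction of the core area that is occupied by standard cells."""
    if core_area_um2 <= 0:
        raise ValueError("core area must be positive")
    return cell_area_um2 / core_area_um2


def required_core_area(cell_area_um2: float, target_utilization: float) -> float:
    """Core area needed to place the cells at the given target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target utilization must be in (0, 1]")
    return cell_area_um2 / target_utilization


cell_area = 500_000.0  # hypothetical total standard cell area in square microns
print(utilization(cell_area, 1_000_000.0))   # half of the core is empty
print(required_core_area(cell_area, 0.8))    # roughly 625000 square microns
```

Working backwards from a target utilization gives a first estimate of the core size, which then has to be confirmed by actually placing and routing the design.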

Usually, it is not possible to predict whether or not all requirements can be fulfilled with a certain utilization¹⁶. You will have to try and find out. This is the main reason why back-end design is an iterative process¹⁷.

13 Especially in MPW runs, a lot of silicon area is wasted if all designs have wildly different dimensions.

14 For our technology there are 6 metal layers.

15 Cells in our technology mostly use the lowest metal layer Metal-1 and very rarely Metal-2 for internal connections; all other layers are free for routing.

16 Both placement and routing are separately NP-complete problems. Without completing the placement and routing you will not know if it is possible to fulfill the requirements.

17 Obviously, technology plays an important role, and it is possible to give certain guidelines for a technology. However, back-end design is always highly dependent on the design itself. You will usually see in a few iterations what is possible and what is not.

5.1 Semester Projects

The MPW provider used for the semester projects offers modules called Mini ASIC (mini@sic) with a size of 1519.62 µm × 1519.62 µm. Therefore, the chip size for the semester project ASICs is fixed. Please refer to the following web page to learn the details.

http://www.eda.ee.ethz.ch/index.php/UmcL180#Mini.40sic

As a consequence, we only have to make sure that our design fits on this area, and there is no need to find the smallest possible layout. We may however need to constrain the core area to make it smaller if the utilization is too low, since a spread out design has longer interconnections that may adversely affect timing.

5.2 Sketching a Floorplan

Before we go on with CADENCE SOC ENCOUNTER we need to do some planning and understand some key concepts. The figure on the following page is an example floorplan (not an ideal one) that shows the important concepts.

In CADENCE SOC ENCOUNTER die area corresponds to the total silicon area available to place pads (excluding bonding area for this technology) and core cells. For the semester projects this is strictly limited to 1519.62 µm × 1519.62 µm. All pads (I/O, power and corner) are placed in what is known as the padframe. The remaining area can be used for the core of the chip. For semester projects the theoretical maximum for core area is 1239.38 µm × 1239.38 µm = 1.54 mm².
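The numbers above can be checked with a short calculation. The sketch below uses only the two side lengths given in the text; the per-side margin is derived under the assumption that the core is centered in the die with the same margin on all four sides, and the function names are illustrative.

```python
DIE_SIDE_UM = 1519.62    # side length of the die, from the text
CORE_SIDE_UM = 1239.38   # maximum side length of the core, from the text


def area_mm2(side_um: float) -> float:
    """Area of a square with the given side length, in square millimeters."""
    return side_um ** 2 / 1e6  # 1 mm^2 = 1e6 um^2


def margin_per_side_um(die_side_um: float, core_side_um: float) -> float:
    """Width left for the padframe on each side, assuming a centered core."""
    return (die_side_um - core_side_um) / 2


print(f"core area : {area_mm2(CORE_SIDE_UM):.2f} mm^2")
print(f"margin    : {margin_per_side_um(DIE_SIDE_UM, CORE_SIDE_UM):.2f} um per side")
```

This reproduces the 1.54 mm² quoted above and shows that about 140 µm per side remain for the padframe. Any I/O to core spacing for a power ring has to come out of the core area.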

As can be seen from the figure, the core area is surrounded by a core power ring. In its simplest form this consists of two (one for VCC, one for GND) wide¹⁸ metal lines that evenly distribute the power all around the chip. In order to leave room for the power ring, we need to leave a certain I/O to core spacing.

The standard cells are designed in such a way that, when placed next to each other, their VCC and GND pins can be connected with a horizontal power line. These horizontal lines are then extended to the core power ring. These power connections are relatively narrow (0.76 µm in the technology that we use) and run over the entire width of the core area. This could be a problem for designs that consume much power, since the cells towards the middle would not have a good power connection¹⁹. To improve this, vertical power stripes that connect to the horizontal power lines can be added, thereby forming a sort of mesh.

The core area is filled with standard cell rows on which later all standard cells will be placed. In the same area we will usually also need to make room for our macro-cells. Most macro-cells need some free space around themselves. This free space is required to make signal connections, add a block power ring around the macro-cell or simply to prevent standard cells from being placed too close to the macro-cell. We will define a block halo to specify this free space.

18 The width of the metal line depends on the amount of current drawn from the line; you will be able to judge this better after exercise 3, which is dedicated to estimating the power consumption. We will mostly use a width of 20 µm, since this is the widest metal that can be manufactured without slotting (wider metal lines require slots/holes which break up the metal shape).

19 The problem is that if much current is drawn, there will be a significant IR drop along the power lines. The cells in the middle will be supplied with a lower VCC than the ones on the sides. This could dramatically affect the performance of the system.


When placing a macro-cell, you should also take into account where the power and signal pins of the block are located and what metal layer they are on. Often signal connections are only on two edges and you want them to face the core and not the I/O pads.

Now, when we consider all the above, the core area that remains free to place core cells on is much smaller than the 1.54 mm² that we started with. Our example design has a total cell area (including RAM) of 0.82 mm² and should therefore comfortably fit into the designated area.
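The area figures quoted above can be checked with a few lines of arithmetic. The following is only a back-of-the-envelope sketch: the variable names are ours, and the ratio computed at the end is a lower bound on the utilization, since power rings, stripes and halos (not modeled here) reduce the usable area further.

```python
# Back-of-the-envelope check of the floorplan numbers quoted in the text.
DIE_EDGE_UM = 1519.62        # fixed die edge of the mini@sic module
MAX_CORE_EDGE_UM = 1239.38   # theoretical maximum core edge
CELL_AREA_MM2 = 0.82         # total cell area of the example design (incl. RAM)

# Depth taken up by the padframe on each side of the die.
pad_depth_um = (DIE_EDGE_UM - MAX_CORE_EDGE_UM) / 2

# Theoretical maximum core area, converted from um^2 to mm^2.
max_core_area_mm2 = MAX_CORE_EDGE_UM ** 2 / 1e6

# Lower bound on utilization: rings, stripes and halos are ignored.
min_utilization = CELL_AREA_MM2 / max_core_area_mm2

print(f"padframe depth : {pad_depth_um:.2f} um")
print(f"max core area  : {max_core_area_mm2:.2f} mm^2")
print(f"min utilization: {min_utilization:.0%}")
```

Even before any area is lost to power routing, slightly more than half of the theoretical core area is occupied by cells.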

[Figure: example floorplan. Labels: I/O and corner pads placed on the padframe, power pad connections, I/O to core spacing, core power ring (VDD, GND), power stripe, standard cell rows, standard cells, standard cell power connections, macro cell (RAM), block power ring, block power connection, block halo. Die edge 1519.62 µm, maximum core edge 1239.38 µm.]

5.3 Initialize Floorplan

We are now ready to proceed with CADENCE SOC ENCOUNTER.

Student Task 7:

• From the menu select Floorplan →Specify Floorplan.... A large window will open.

• Select the DIE SIZE BY: WIDTH AND HEIGHT option and make sure that both values are 1519.62.

• Now we need to specify the I/O to core spacing by filling in the four values under the CORE


Larger values will reduce the area available to place the core cells thereby increasing core utilization.

As noted earlier, some iterations are usually required to find optimal values for a particular design.

• In this exercise we will assume that we will use one VCC and one GND line of maximum width 20 µm. We need some extra space between the lines and, for the moment, we can start with a distance of 50 µm for all sides and click on OK.

The floorplan should now look as shown in the screen-shot below. Note that the pads are all placed at their proper locations, as the I/O file used during design import specifies absolute locations and we made sure that the die size stays fixed to the proper size during the initialize floorplan step.

Student Task 8:

• Next we need to place the RAM macro-cell. Change the cursor mode to MOVE/RESIZE/RESHAPE by selecting the appropriate icon (next to the ruler icon) or use the keyboard shortcut ’SHIFT-R’. Now you can select the RAM macro-cell and drag it to any location you like. The blue lines displayed are so-called flightlines that show where the signal connections to the block are.

You can change the orientation of the RAM either by using Floorplan →Edit Floorplan →Flip/Rotate Instances ... (or press ’r’), or with the attribute editor (press ’q’). Note that the RAM macro will completely block Metal-1, Metal-2, Metal-3 and Metal-4. Only Metal-5 and Metal-6 will be available for routing over the RAM macro-cell²⁰.

20 By default, the internal structures within a cell or block are not displayed. You need to make “Cell Blkg” visible to see the so-called blockages within a cell.


5.4 Power Planning

The next step is to create the power distribution network.

The Verilog netlist that we started with does not contain any power connections, therefore we need to create this connectivity now. We have to connect the power/ground pins of all instances to the respective global power/ground net that was specified on the DESIGN IMPORT form (category POWER on the ADVANCED tab)²¹.

This can be done using the Floorplan →Connect Global Nets ... form or you can use the globalnet.tcl script provided.

Student Task 9:

• Execute the script provided by typing on the command line of CADENCE SOC ENCOUNTER (not GUI):

enc > source scripts/globalnet.tcl

21 There is also a special rule required if there are logic one/zero values 1’b1/1’b0 instead of TIE1/TIE0 cells in your netlist. You should however not have such logic values in your netlist.


Next we will add the core power rings that distribute power all around the core.

Student Task 10:

• Select the menu Power →Power Planning →Add Rings.... A large window will appear. The NET(S) field on the top defines for which nets rings will be created. The default is to create power VCC as well as ground GND rings.

• In the RING CONFIGURATION section you can specify on what layers the ring segments will be created. Select metal5 H for TOP and BOTTOM and metal6 V for LEFT and RIGHT. Specify WIDTH as 20 µm, SPACING as 1.5 µm and OFFSET as 4 µm and click OK.

There are many alternative power distribution schemes that can be used. The one that we have chosen here is a very simple one. We have selected the upper metal layers Metal-5 and Metal-6 for the ring, because in this technology Metal-6 is thicker and consequently has less parasitic resistance which is desirable for power distribution.

For your own designs, you should perform a power analysis (topic of Training 2) to find out the best power distribution approach that matches your design.

The width has been chosen as 20 µm for convenience reasons. Basically, the wider the power connection, the better. But as already mentioned earlier, in this technology, metal lines wider than 20 µm need to be slotted (’stress relief slots’), which requires extra effort. As an alternative to slotting it is also possible to create several smaller parallel rings, e.g. two VCC and two GND rings.


SPACING determines the distance between the two nets and OFFSET determines the distance between the core area and the innermost ring.
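As a quick sanity check, the ring settings can be compared against the I/O to core spacing chosen when the floorplan was initialized. The sketch below simply adds up the values given in the text; it assumes that the ring pair is stacked outward from the core edge exactly as OFFSET, WIDTH and SPACING are described above, and the variable names are ours.

```python
# Ring settings from Student Task 10 and the I/O to core spacing from Task 7.
RING_WIDTH_UM = 20.0
RING_SPACING_UM = 1.5
RING_OFFSET_UM = 4.0
NUM_RINGS = 2                  # one VCC ring and one GND ring
IO_TO_CORE_SPACING_UM = 50.0

# Extent of the ring pair measured outward from the core edge.
ring_extent_um = (RING_OFFSET_UM
                  + NUM_RINGS * RING_WIDTH_UM
                  + (NUM_RINGS - 1) * RING_SPACING_UM)

# Space left between the outermost ring and the pads.
margin_um = IO_TO_CORE_SPACING_UM - ring_extent_um

print(f"rings occupy {ring_extent_um} um, leaving {margin_um} um to the pads")
```

With these values the two rings need 45.5 µm, so the 50 µm spacing leaves a few micrometers to spare.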

We also need a (partial) ring around the macro-cell; you will see later why this is necessary.

Student Task 11:

• Select the menu Power →Power Planning →Add Rings... just like before. This time in the RING TYPE box, select BLOCK RING(S) AROUND. You can leave the selection at EACH BLOCK since we have only one block anyway.

CADENCE SOC ENCOUNTER is usually smart enough to create wires only on the edges where no power lines are yet, i.e. to not create new wires on top of the core ring.

• If this fails you can specify the segments and connections you want on the ADVANCED tab.

• Fill in the values/settings similar to those of the ADD RINGS form and click on OK.

At any point if you wish to delete part of the floorplan you can:

• use the UNDO feature by simply pressing ’u’

• select and remove objects of a specific class (press ’d’)

• use the menu option Floorplan →Edit Floorplan →Clear Floorplan...

• select an object and hit the ‘Del’ key on the keyboard

Student Task 12:

• Also, you can save or load (restore) your floorplan at any time using the menu Design →Save →Floorplan ... and Design →Load →Floorplan ... respectively.

• Save your floorplan to the save directory.

At this point power reaches the standard cells only from the sides. Especially for fast designs, the standard cells in the middle of a standard cell row may not receive sufficient power, so it is important to add vertical stripes to improve the power distribution.

Student Task 13:

• Select Power →Power Planning →Add Stripes ....

The SET CONFIGURATION part of the window defines the properties of one stripe set. The SET PATTERN part defines how many stripes will be added. We can either choose to insert a fixed number of sets or only specify the distance between two sets (SET-TO-SET DISTANCE).

• In the FIRST/LAST STRIPE part, we select RELATIVE FROM CORE OR SELECTED AREA. Enter values for X FROM LEFT and X FROM RIGHT that place the stripe sets in such a way that the standard cell rows get divided into three equally long pieces. See the screen shot for width, spacing and layer. Note: You can fine tune this later by moving the stripe sets.

• By default stripes will continue over macro cells. To prevent this, select the OMIT STRIPES INSIDE BLOCK RINGS option in the STRIPE BREAKING section of the ADVANCED tab.
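To get a feeling for why dividing the rows into three pieces helps, the following rough sketch estimates the worst-case IR drop on one horizontal power rail. It uses a textbook model (a uniformly loaded rail fed from both ends, where the drop at the midpoint is I·R/8) and assumes that the stripes feed each piece ideally. Only the rail width of 0.76 µm is taken from the text; the rail length, the sheet resistance and the row current are purely hypothetical values chosen for illustration.

```python
def max_ir_drop_v(rail_length_um, rail_width_um, sheet_res_ohm_sq,
                  row_current_a, segments=1):
    """Worst-case IR drop on a uniformly loaded rail fed from both ends.

    With stripes the rail is split into `segments` equal pieces, each assumed
    to be fed ideally at both of its ends. For one piece with resistance R and
    total current I, the drop at its midpoint is I * R / 8.
    """
    seg_res = sheet_res_ohm_sq * (rail_length_um / segments) / rail_width_um
    seg_cur = row_current_a / segments
    return seg_cur * seg_res / 8.0


RAIL_WIDTH_UM = 0.76       # from the text
RAIL_LENGTH_UM = 1139.38   # assumption: 1239.38 um minus 50 um spacing per side
SHEET_RES_OHM_SQ = 0.08    # hypothetical Metal-1 sheet resistance (ohm/square)
ROW_CURRENT_A = 2e-3       # hypothetical current drawn by one cell row

# Rail fed only from the core ring on both sides.
no_stripes = max_ir_drop_v(RAIL_LENGTH_UM, RAIL_WIDTH_UM,
                           SHEET_RES_OHM_SQ, ROW_CURRENT_A)

# Two stripe sets divide the rail into three equally long pieces.
two_stripe_sets = max_ir_drop_v(RAIL_LENGTH_UM, RAIL_WIDTH_UM,
                                SHEET_RES_OHM_SQ, ROW_CURRENT_A, segments=3)

print(f"without stripes     : {no_stripes * 1e3:.1f} mV")
print(f"with two stripe sets: {two_stripe_sets * 1e3:.1f} mV")
```

In this simplified model the drop scales with the square of the piece length, so three pieces reduce it by a factor of nine. A real power analysis (topic of Training 2) is needed for actual numbers.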


It is rather easy to move wires in CADENCE SOC ENCOUNTER. Click on the move wires button (or press ’m’), select the wires you want to move, and drag them to their new location. CADENCE SOC ENCOUNTER will make sure that electrical connections remain intact. If you want, you can use this to fine tune the stripe placement.

We still need to define a block halo for the RAM macro-cell. This is necessary to keep standard cells from being placed too close to the RAM and also to avoid problems when routing the power lines of the standard cell rows.

The figure below illustrates one common problem with the block halo.

[Figure: a macro-block with its block halo next to two standard cell rows and their power rails, showing a terminated power line (good) and a dangling power line (bad).]


In this figure, only two standard cell rows are shown. The block halo around the first row extends far enough to cover the two power lines²². This is how it should be.

For the second row, the block halo does not cover the power rails, and when making the power connections CADENCE SOC ENCOUNTER will try to extend the power connection past the power rails as shown in the figure. This leaves a dangling power line²³. While this will not render your chip useless, it should be avoided.

Student Task 14:

• From the menu select Floorplan →Edit Floorplan →Edit Halo.... A window will appear, where you can specify a keep-out zone for routing and/or placement around the macro-cell. Usually we only need a Placement Halo. The size will depend on your power routing/floorplan.

• Create an appropriate Placement Halo.

Notice that the I/O pads are placed with some distance between them²⁴. At some point in the design flow we need to close the gaps between the I/O pads in order to complete the supply rings that run around the core (within the pad cells) and are required to supply the circuitry within the pad cells.

Student Task 15:

• Instead of using wires, we will place so called filler cells that completely fill the gaps and establish the required connectivity.

There is a script that will automatically insert matching filler cells. Type the following in the CADENCE SOC ENCOUNTER console window:

enc > source scripts/fillperi.tcl

22 This is just for illustration. It is not possible to draw a block halo that has this (L) shape.

23 This sort of dangling wire is known as a geometry antenna in Cadence SoC Encounter.

24 This is due to the constraints set by the company that bonds the chips. They specify that the minimum distance between two adjacent pads is 90 µm. Since even a core-limited pad in this technology is roughly 60 µm wide, we need to place them with gaps in between.


Now we need to finalize the power connections of the chip. The following connections still need to be made:

• The core ring needs to be connected to the core supply pads (VCC3IOD and GNDIOD).

• All standard cells need to be connected to VCC and GND lines.

• All macro-cells need to be connected to VCC and GND lines.

Student Task 16:

• Select Route →Special Route ... from the menu. SRoute is the special net router, and is only used to make power connections.

The ROUTE: part contains the different connection types we have listed above. BLOCK PINS are macro-cell power connections, PAD PINS are the connections from the core supply pads to the core ring. We will not need PAD RINGS since we have already used filler cells to complete these rings. STANDARD CELL PINS will add power lines to the standard cell rows. Finally, if you still have stripes that are not connected to power (not very likely) you can use the STRIPES (UNCONNECTED) option.

• While it is possible to route all connections at the same time, it is strongly recommended to do it one by one:

1. Start with PAD PINS. If nothing happens you have most likely forgotten to source the globalnet.tcl script.

2. Route BLOCK PINS. Check the result: did the router connect the macro-cell the way you wanted? If not, you may need to study the ADVANCED tab of the SRoute window. If all fails you can edit the connections manually.

3. Route the STANDARD CELL PINS. This should create many horizontal Metal-1 lines that connect to the rings and stripes. Look for dangling wires around the block halo (adjust the block halo if necessary).

We are now finished with floorplanning. Your floorplan should look similar to the following screen shot.


6 Placement

We will now start with the placement of the standard cells in the core area. Placement is a computationally very intensive problem, and mostly heuristic algorithms are used for this purpose.

Student Task 17:

• Select Place →Standard Cells ....

We want to run a full placement and not an incremental or just the quick prototyping one. INCLUDE PRE-PLACE OPTIMIZATION however is very useful as it removes all buffer/inverter trees from the netlist, which will help us for timing analysis as you will see later.

• To set advanced options click MODE. Set CONGESTION EFFORT to LOW and deselect RUN TIMING DRIVEN PLACEMENT, as timing driven placement takes much longer and might not help that much to improve timing. There are several other options that you can set, but at this time we will leave them as they are. Apply the changes by pressing OK.

• You will come back to the placement window seen below, click OK to start placement. This may take some time.

We have to warn you about the various performance-related options such as CONGESTION EFFORT and RUN TIMING DRIVEN PLACEMENT above. In the exercises we will sometimes advise you to use certain settings for these options in order to reduce runtime, or because we have found that a particular option gives better results for this particular design. When you do your own designs, you should consider evaluating which options are better suited rather than copying all options from this exercise.

For each standard cell, the placement algorithm will try to find the optimum location so that there is a feasible routing solution and the total length of the connections is minimized.
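To illustrate what "total length of the connections" means as an optimization target, the sketch below computes the half-perimeter wirelength (HPWL), a widely used estimate of net length during placement. This is a generic textbook metric and not necessarily the cost function used inside CADENCE SOC ENCOUNTER; the cell names, nets and coordinates are hypothetical.

```python
def hpwl(net_pins):
    """Half-perimeter wirelength of one net given its (x, y) pin positions."""
    xs = [x for x, _ in net_pins]
    ys = [y for _, y in net_pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_hpwl(placement, nets):
    """Sum of HPWL over all nets; `placement` maps cell name -> (x, y)."""
    return sum(hpwl([placement[cell] for cell in net]) for net in nets)


# Hypothetical example: three cells connected by two nets.
nets = [("a", "b"), ("b", "c")]
spread_out = {"a": (0, 0), "b": (100, 40), "c": (10, 5)}
compact = {"a": (0, 0), "b": (10, 0), "c": (20, 0)}

print("spread out:", total_hpwl(spread_out, nets))
print("compact   :", total_hpwl(compact, nets))
```

A placer comparing these two candidate placements would prefer the compact one, provided that it can still be routed.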

Examine the placement by using the design browser (switch to the physical view). You will notice that standard cells within the same entity are mostly placed next to each other.

The available space and the placement of macro-cells and I/O pads can have a great influence on the placement of standard cells. Even though more space seems to be a good idea, too much space sometimes results in placements where the average distance between standard cells and consequently the delays caused by wire capacitance/resistance become larger. Only experience and several iterations will allow you to find a placement for your circuit that is close to optimal.


The results for placement (and later routing) are strongly design dependent. For example, structures with many interconnections such as look-up tables will usually need much more space than synthesis predicted, as the cells need to be spread out in order to have enough space to route all the interconnections. This is why generalizations for back-end design, such as "During back-end design, your circuit area will increase by 10%" don’t work very well.

Student Task 18:

• Let us save the entire design with Design →Save Design As →SoCE. This will save the configuration file, netlist, floorplan, special route, placement and routing files as well as the current mode, options and preferences. A design saved in this way can be restored using Design →Restore Design ... →SoCE.

The space required is surprisingly small as most files are compressed and the library files do not get saved along with the design.

• Remember to save under the save directory.

Alternatively you could also just save the placement. Select Design →Save →Place ....

During synthesis, SYNOPSYS DESIGN COMPILER assigns constant logic values to two special standard cells named TIE0x and TIE1x, where x is a drive strength modifier. This creates a small inconvenience, as often one of these cells is assigned to drive many outputs at the same time, creating relatively long interconnections.

There is sufficient space on the chip to place several of these cells. We will use a script that first removes all these cells. Then we will set the rules for placing these cells. The example script scripts/tiehilo.tcl sets the maximum number of connections driven by a single cell to 20, and the maximum distance between the pin and the tie cell to 250 µm. Finally we insert the tie cells according to the rules we have defined.
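The fanout rule alone already gives a lower bound on how many tie cells will be inserted. The sketch below is only an illustration of that arithmetic: the pin count is hypothetical, and the distance rule (250 µm) is ignored here, so the tool may well insert more cells than this bound.

```python
import math

MAX_FANOUT = 20  # connections per tie cell, as set by scripts/tiehilo.tcl


def min_tie_cells(num_pins, max_fanout=MAX_FANOUT):
    """Lower bound on the number of tie cells from the fanout rule alone."""
    return math.ceil(num_pins / max_fanout)


# Hypothetical example: 45 pins in the netlist are tied to logic zero.
print(min_tie_cells(45))
```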

Student Task 19:

• At the command line type:

enc > source scripts/tiehilo.tcl

7 Timing

The synthesis tools we currently use for HDL synthesis (SYNOPSYS DESIGN COMPILER) are not aware of any instance placement information. Therefore the interconnects can only be estimated based on a statistical model, i.e. the fanout of a net determines its length, capacitance, resistance and area. Now that the placement and even trial-routing are available, the timing might differ considerably from the numbers obtained from SYNOPSYS DESIGN COMPILER.

7.1 Analysis

CADENCE SOC ENCOUNTER has a practical timing analysis function, where you only have to specify the current state of the design:

Pre-Place design is not placed

Pre-CTS design is placed but clock tree is not yet inserted

Post-CTS design is placed and the clock tree is inserted

Post-Route design is placed and routed

Sign-Off will use extra tools for even more precise analysis. We will not use this as these tools are not installed/setup.

Depending on this state, trial route (a very simple, but fast routing) and/or parasitic extraction might be run automatically prior to the timing analysis. This will improve the accuracy and help to avoid unnecessary iterations.

Student Task 20:

• Open Timing →Analyze Timing and make sure PRE-CTS and SETUP are selected.

• Start the timing analysis by clicking OK.

Note: You could also do this from the command line with

enc > timeDesign -preCTS

As the design is not routed, CADENCE SOC ENCOUNTER will perform trial route and parasitic extraction before doing the timing analysis. A short summary will be displayed on the console (the actual numbers may differ slightly):

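+--------------------+---------+---------+---------+---------+---------+---------+
|     Setup mode     |   all   | reg2reg | in2reg  | reg2out | in2out  | clkgate |
+--------------------+---------+---------+---------+---------+---------+---------+
|           WNS (ns):| -9.069  | -6.554  | -9.069  | -0.686  | -7.328  |   N/A   |
|           TNS (ns):| -2684.3 | -1776.9 | -2392.1 | -1.172  | -43.761 |   N/A   |
|    Violating Paths:|   861   |   732   |   454   |    7    |    6    |   N/A   |
|          All Paths:|  1807   |  1342   |   817   |   18    |    6    |   N/A   |
+--------------------+---------+---------+---------+---------+---------+---------+

+----------------+-------------------------------+------------------+
|                |             Real              |      Total       |
|      DRVs      +------------------+------------+------------------|
|                |  Nr nets(terms)  | Worst Vio  |  Nr nets(terms)  |
+----------------+------------------+------------+------------------+
|    max_cap     |    187 (187)     |   -3.774   |    188 (188)     |
|    max_tran    |   368 (13826)    |   -8.333   |   387 (13867)    |
|   max_fanout   |      0 (0)       |     0      |      0 (0)       |
+----------------+------------------+------------+------------------+

Density: 59.566%
Routing Overflow: 0.00% H and 0.25% V
---------------------------------------------------------------------

The summary gives a very good overview of the current design timing. Some explanations: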

• The analysis was run in setup mode, i.e. setup time checks were performed but no hold time checks.


• The columns contain numbers for all paths in the design (ALL) or for specific path groups, e.g. reg2reg for all register to register paths.

• Worst negative slack (WNS) reports the slack for the most critical path. Negative numbers mean that the constraints are violated by this value.

• Total negative slack (TNS) is the sum of the negative slacks of all violating paths. Together with the number of violating paths this figure helps to see how severe the violations are.

• Real/Total DRV show (electrical) design rule violations. Some libraries have a maximum transition time for all nets. The report above shows that 368 nets have a transition violation (the signal takes too long to change from logic-1 to logic-0 or vice versa). In addition 187 nets have a maximum capacitance violation (the total amount of capacitance driven by a net exceeds the limit set by the design library). These violations are mostly related to excessive parasitic capacitance due to interconnections, and generally cause timing violations as well. However, even if a DRV does not cause a timing violation it needs to be fixed.

• DENSITY and ROUTING OVERFLOW show the placement utilization and routing resources, i.e. are a measure for the feasibility of the current floorplan/placement.
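The relationship between WNS, TNS and the number of violating paths can be made concrete with a small sketch. It only mirrors the definitions given above and is not how the tool computes its report; the slack values are hypothetical.

```python
def summarize_slacks(slacks_ns):
    """Return (WNS, TNS, number of violating paths) for a list of slacks."""
    violating = [s for s in slacks_ns if s < 0]
    wns = min(slacks_ns)       # slack of the most critical path
    tns = sum(violating)       # sum of all negative slacks
    return wns, tns, len(violating)


# Hypothetical slacks (in ns) of four timing paths.
slacks = [0.5, -1.0, -0.25, 2.0]
wns, tns, num_violating = summarize_slacks(slacks)
print(f"WNS = {wns} ns, TNS = {tns} ns, violating paths = {num_violating}")
```

A large TNS together with a small WNS points to many slightly failing paths, whereas a TNS close to the WNS means that only a few paths are responsible.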

Remark: Refer to exercise 4 of VLSI I²⁵ if you have problems with timing concepts.

The summary looks really terrible. Obviously we have many timing violations that we need to have a closer look at, before we try to optimize the timing with CADENCE SOC ENCOUNTER.

Here are some important points to consider when doing so:

• The timing depends entirely on the constraints you have specified in the file src/chip.sdc. The most common mistake is to have errors in this file. Before you go any further make sure that your timing constraints are correct.

• Make sure to not accidentally use constraints that were written for the core level (chip without pads) at the chip level (with pads) and vice versa. The pads affect the I/O timing quite a bit and the drive capabilities of a standard cell and an output pad are entirely different, i.e. set_load needs to be very different.

• Inputs and outputs used for test and debugging may cause timing violations. Most of these signals are not dynamic (they are not toggled during normal operation) and the timing paths originating from these inputs or ending at these outputs should be ignored, i.e. left unconstrained or explicitly disabled.

• To speed up delay calculation CADENCE SOC ENCOUNTER does not compute the timing of nets with a fanout above a certain limit but rather swaps in predefined values for delay, capacitance and transition time. All these numbers are specified on the DESIGN IMPORT form on the ADVANCED tab in the "Delay Calculation" category. As a result you will not see the real timing²⁶ of these nets in timing analysis and furthermore optimization will not see (and therefore not fix) violations²⁷ on these nets. However, this is usually the desired behavior as we give these nets a special treatment anyway (with CTS).

25 You can access the exercise descriptions, files, and solutions under /home/vlsi1/u4.

26 To see the real timing you can change the limit on the fly from 1000 to a very high value in the console with setUseDefaultDelayLimit 100000. More on this topic later.

27 DRV violations will be fixed but no setup/hold violations. Clock nets are even more special; no DRV fixing will be done there either.


Let’s now examine the detailed reports that were generated by timing analysis and can be found in the timingReports folder. Each analysis produces multiple files. Among these there are three files dedicated to design rule violations (max capacitance: *.cap, max fanout: *.fanout, max transition time: *.tran) and separate *.tarpt timing analysis report files for the different path groups (in2out, in2reg, reg2reg, reg2out).

Student Task 21:

• Where do the violating paths in the in2out path category start?

• Where do the violating paths in the in2reg path category start?

• Do the paths in reg2out and reg2reg look like normal paths that should be optimized to meet timing or is there something wrong?

• Why are the reg2reg paths too slow? Look for large numbers in the Delay column and check the drive strength of the corresponding cell.

There are several different problems in the .sdc file that we have used. First of all, two of our inputs should not be considered for timing analysis²⁸. We also have several nets (clock, reset and scan enable) that we will take care of separately (using the clock tree synthesizer, which we will see later). These nets will show up in the DRV reports. We do not want to solve timing-related problems for these nets (since they will be solved later anyway); the time and effort required to optimize these nets could prevent other parts of the design from being optimized.

We can use the DEFAULT PIN LIMIT feature of CADENCE SOC ENCOUNTER to stop CADENCE SOC ENCOUNTER from extracting timing information (and reporting timing violations) for the nets that we will be optimizing later on. By default the pin limit of CADENCE SOC ENCOUNTER is set to 1000. In our case this number is too high (we have slightly more than 400 flip flops in our design).

Student Task 22:

• Let us see the nets which have a large fanout. Report all nets with e.g. more than 400 pins. Use the console command:

enc > report_net -min_fanout 400

• Now set a suitable limit with the command

enc > setUseDefaultDelayLimit <number>

so that the high fanout nets will not be considered for timing. Also make the necessary changes to the timing constraints file src/chip.sdc to disable the offending input ports. Reload the timing constraints by selecting the menu Timing →Load Timing Constraint ....

• Then rerun timing analysis.

If you have done everything correctly, the only setup violations should be in the path groups register-to-register and register-to-out. There should no longer be pins that belong to the scan enable or reset network in the transition time violation report.
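The effect of the pin limit can be pictured with a small sketch. It assumes that a net is treated specially when its pin count exceeds the limit (whether the tool compares with "greater than" or "greater or equal" is not stated here), and all net names and pin counts below are hypothetical.

```python
def nets_above_limit(net_pins, limit):
    """Names of nets whose pin count exceeds the limit, largest first."""
    selected = [name for name, pins in net_pins.items() if pins > limit]
    return sorted(selected, key=lambda name: -net_pins[name])


# Hypothetical pin counts for a design with slightly more than 400 flip-flops.
net_pins = {
    "clock_net": 431,
    "reset_net": 428,
    "scan_enable_net": 427,
    "ordinary_data_net": 35,
}

print("limit 1000:", nets_above_limit(net_pins, 1000))
print("limit  400:", nets_above_limit(net_pins, 400))
```

With the default limit of 1000 none of the high fanout nets is excluded in this example, which is why the limit has to be lowered for a design of this size.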

28 Cadence SoC Encounter provides a special timing calculation mode that is called Multi-Mode Multi-Corner Analysis (MMMC). In this mode it is possible to define several scenarios (i.e. separate test and functional modes). The setup for MMMC is slightly involved and will not be covered as part of this exercise.


7.2 Optimization

In order to (better) meet the constraints, CADENCE SOC ENCOUNTER can try to optimize the design at every stage of the design process. In our case, the worst setup time violation is about 5.8 ns (for an 8 ns period), although the netlist delivered by the synthesis tool had no timing violations. This is due to differences in interconnect parasitics between the two tools. While the synthesis tool relies on an estimate (based on a statistical model), CADENCE SOC ENCOUNTER can use the real placement and (trial-)routing at hand. Consider the following line from a timing report (broken down over many lines for readability):

Path 1: VIOLATED Setup Check with Pin i_filter_top/u_filter/u_filter_stage_5/
        RegxDP_reg_42_/CK
Endpoint:   i_filter_top/u_filter/u_filter_stage_5/RegxDP_reg_42_/D (^)
            checked with leading edge of 'ClkxCI'
Beginpoint: i_filter_top/u_ram_wrapper/i_ram/DO5 (^)
            triggered by leading edge of 'ClkxCI'
Path Groups: {reg2reg}
Other End Arrival Time          0.000
- Setup                         0.149
+ Phase Shift                   8.000
= Required Time                 7.851
- Arrival Time                 14.405
= Slack Time                   -6.554
     Clock Rise Edge            0.000
     = Beginpoint Arrival Time  0.000
Timing Path:
+-----------------------------------------------------------------------------------------------------------+
| Instance                           | Arc           | Cell               | Slew  | Load  | Delay | Arrival |
|                                    |               |                    |       |       |       | Time    |
|------------------------------------+---------------+--------------------+-------+-------+-------+---------|
|                                    | ClkxCI ^      |                    | 0.000 | 1.828 |       |   0.000 |
| ClkxCI_PAD                         | I ^ -> O ^    | XMD                | 0.000 | 0.000 | 0.000 |   0.000 |
| i_filter_top/u_ram_wrapper/i_ram   | CK ^ -> DO5 ^ | SY180_2048X16X1CM8 | 0.130 | 0.033 | 1.750 |   1.750 |
| i_filter_top/u_ram_wrapper/i_test_ | A ^ -> O ^    | MUX2               | 8.441 | 1.874 | 3.973 |   5.722 |
| bypass_mux5                        |               |                    |       |       |       |         |

The last line reports a standard cell instance MUX2 with low driving capability (2) that has to drive a big load on its output (1.874 pF). The propagation delay is therefore huge (3.973 ns).

The timing values of the same cell as reported by synthesis are: Delay: 0.15 ns, Slew: 0.09, Load: 0.01. While this is an extreme case, you see how wrong synthesis can be without knowing the actual placement and wire loads.
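The header of the report can be reproduced by hand, which is a useful exercise when reading timing reports. The short sketch below simply redoes the arithmetic with the numbers printed in the report above; the variable names are ours.

```python
# Values copied from the report header (all in ns).
other_end_arrival_ns = 0.000
setup_ns = 0.149
phase_shift_ns = 8.000   # one clock period
arrival_ns = 14.405

# Required time at the endpoint and resulting slack.
required_ns = other_end_arrival_ns - setup_ns + phase_shift_ns
slack_ns = required_ns - arrival_ns

# Delay of the weak MUX2 and of the same cell as estimated by synthesis.
mux_delay_ns = 3.973
mux_delay_synthesis_ns = 0.15

print(f"required time = {required_ns:.3f} ns")
print(f"slack         = {slack_ns:.3f} ns")
print(f"MUX2 delay is {mux_delay_ns / mux_delay_synthesis_ns:.0f}x the synthesis estimate")
```

The result matches the required time of 7.851 ns and the slack of -6.554 ns in the report, and shows that this single cell is more than an order of magnitude slower than synthesis assumed.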

Student Task 23:

• Open the optimization form by selecting Timing →Optimize ....

DESIGN STAGE needs to be set to the current design stage. Some options are only available for certain stages, e.g. hold time optimization cannot be performed during PRE-CTS as it doesn’t make much sense.

Timing is not the only thing that can be optimized. Most technologies specify design rules like maximum transition time, maximum capacitance driven by a certain cell or maximum fanout.

• After pressing the MODE button, within the THRESHOLDS section you can find options that can be used to tighten the constraints in order to get some marginᵃ.


• Set the options as shown in the figure below and hit OK. Watch the progress of the optimization in the console window. CADENCE SOC ENCOUNTER is very verbose with its actions.

a Cadence SoC Encounter will already automatically add a small margin on its own (internally).

During optimization CADENCE SOC ENCOUNTER can select different drive strengths for cells, add/remove buffers and inverters, add/remove instances or even restructure part of the logic (just like synthesis does).

Optimization is done using iterations of timing analysis, optimization, trial-route and parasitic extraction.

As a last step CADENCE SOC ENCOUNTER performs a timing analysis on the optimized design, prints the summary to the console and writes the detailed reports to the timingReports directory.

Student Task 24:

• Take a look at the summary and the final reports generated. There should be no violations left.

But what happens if we cannot fix the violations with optimization? Again, first make sure to understand what your constraints are and why they are violated. Often there are errors in converting the design specifications to constraints (is the input delay really 3.5 ns? Also for this pin?) and describing them properly with the commands available. If you still have problems, there are three levels where you can reach a solution:

Optimization during backend design (CADENCE SOC ENCOUNTER)

CADENCE SOC ENCOUNTER can optimize the design at every stage of the design process. In general, the earlier the stage, the more changes can be made, e.g. PRE-CTS optimization has much more flexibility than POST-ROUTE optimization. At the PRE-CTS stage registers can be moved and resized; this will no longer be possible after clock tree insertion. On the other hand, the parasitic interconnect information is much more accurate at later stages of the design, so the timing information (and hence the optimization goals) will be more accurate.

We can (re)run the optimization at various stages, try a new placement or even start with a new floorplan. It is impossible to give general guidelines; you will have to see what works best for your design. If you are far from meeting your target (e.g. for a 10 ns clock, if after all optimizations you still have a timing violation of 2 ns), you may need to go back to synthesis.


Optimization during synthesis

Once you have tried to place and route a netlist, you will have a better idea of how synthesis results relate to back-end results in terms of area and timing. You may use this information to adjust the timing constraints and re-synthesize the circuit.

Architectural optimizations

If nothing else helps, you will have to modify your architecture. During this iteration you will have a much better idea about what is critical for your circuit.

If all of the above fails, you will have to see if the specifications could be changed.
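
To put the 10 ns / 2 ns example from the back-end level above into numbers, the following Python sketch expresses the remaining violation as a fraction of the clock period and as an achievable clock frequency. It assumes the violation is a setup violation on a register-to-register path, so that lengthening the clock period by the same amount would remove it; the function and variable names are made up for this illustration.

```python
def period_needed(clock_period_ns, worst_violation_ns):
    """Smallest clock period (ns) that would absorb a setup violation."""
    return clock_period_ns + worst_violation_ns


clock_period_ns = 10.0      # target clock period from the example
worst_violation_ns = 2.0    # violation left after all optimizations

needed_ns = period_needed(clock_period_ns, worst_violation_ns)
miss_fraction = worst_violation_ns / clock_period_ns
target_mhz = 1000.0 / clock_period_ns
achievable_mhz = 1000.0 / needed_ns

print(f"violation is {miss_fraction:.0%} of the clock period")
print(f"target {target_mhz:.1f} MHz, achievable about {achievable_mhz:.1f} MHz")
```

A miss of this size is unlikely to be recovered by resizing cells or inserting buffers alone, which is why the text suggests going back to synthesis in such a case.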

Student Task 25:

• Your design has changed considerably as the optimization algorithms have modified the netlist and placement. Save it by using Design →Save Design As.

8 Clock Tree Insertion

The fan-out of a net refers to the number of inputs driven by a particular output. High fan-out nets (that drive hundreds or even thousands of inputs) need to be handled differently from standard interconnections. Note: For timing analysis we adjusted the pin limit (setUseDefaultDelayLimit) so that these nets are treated differently.

Every synchronous circuit has at least one high fan-out net, namely the clock net. For most circuits reset and scan-enable signals have to be distributed to each and every flip-flop as well.

The main problem with high fan-out nets is the large load capacitance that needs to be driven. Each driven input adds its own input capacitance to the total load capacitance and, in addition, the interconnection required to distribute the signal to all these inputs increases the load capacitance further. There are three important parameters for such nets:

Transition time This is the time it takes to change the logic level of a node (e.g. 0 → 1). Basically, the more load an output has to drive, the more time is required to charge this load. CMOS drivers consume additional short circuit current during the transition, therefore long transition times are not very welcome. Furthermore, noise on signals with long transition times can result in glitching. Most libraries set an upper limit for the transition time (for the technology we are using this is 1.79 ns for typical libraries). To lower the transition time, a tree of buffers can be inserted so that the total load is shared between the buffers. The lower the desired transition time, the more buffers are required.

Insertion delay The time required for the signal to travel from the driver to the end-points. This delay is usually different for each end-point. Each level of buffers in the buffer tree will add a delay to the signal.

Skew The difference between insertion delays of different end-points. To minimize skew, a balanced buffer tree has to be built. Generally, the lower the desired skew, the more buffers are required.

Which parameters are most important depends on the type of net:

Clock Our main concern is to reduce the skew, since it affects our timing. The maximum acceptable skew depends on the clock period. As an example, for a 20 MHz clock (50 ns period) a clock skew of 0.5 ns is acceptable. But for a 200 MHz clock (5 ns period), the same skew equals 10% of the clock period and would be too high.

If you over-constrain your skew, you will need a deep (and large) clock tree and your insertion delay will rise, which will affect your input and output timing. Therefore you will want to balance the skew against the insertion delay and the number of buffers. Constraining the maximum insertion delay too tightly will usually degrade results.

Usually, a tree that gives you an acceptable skew will also give you a decent transition time, so you don’t have to worry about that.

Reset We are interested in propagating the reset to all flip-flops in our design within one clock cycle. For designs with on-chip reset synchronization this is strictly required. The insertion delay should therefore be less than the clock period and the transition times must stay within the bounds imposed by the technology, while skew does not matter at all.

Scan Enable Very similar to the reset signal. Usually a slower clock is used for scan testing, therefore we can allow an even larger insertion delay. For transition time and skew the same holds true as for the reset.
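
As a rough illustration of the three net types above, the following Python sketch computes insertion delay and skew from a list of per-sink delays and checks them against the rules of thumb in the text. All function names and the sample delay values are hypothetical; in practice these numbers come from the reports of the back-end tool. Transition times are left out to keep the sketch short.

```python
def tree_metrics(sink_delays_ns):
    """Return (min insertion delay, max insertion delay, skew) in ns."""
    shortest = min(sink_delays_ns)
    longest = max(sink_delays_ns)
    return shortest, longest, longest - shortest


def skew_fraction(skew_ns, clock_mhz):
    """Skew expressed as a fraction of the clock period."""
    period_ns = 1000.0 / clock_mhz
    return skew_ns / period_ns


def check_net(kind, sink_delays_ns, period_ns, max_skew_ns=None):
    """Collect violations for a 'clock', 'reset' or 'scan_enable' net.

    For reset and scan enable, period_ns is the period of the clock the
    signal must settle within (the slower scan clock for scan enable).
    """
    _, longest, skew = tree_metrics(sink_delays_ns)
    problems = []
    if kind == "clock":
        # Skew is the main concern for the clock net.
        if max_skew_ns is not None and skew > max_skew_ns:
            problems.append(f"skew {skew:.2f} ns exceeds {max_skew_ns:.2f} ns")
    elif kind in ("reset", "scan_enable"):
        # Skew is irrelevant here; the signal only has to arrive in time.
        if longest >= period_ns:
            problems.append(
                f"insertion delay {longest:.2f} ns is not below {period_ns:.2f} ns"
            )
    else:
        raise ValueError(f"unknown net type: {kind}")
    return problems


# Numbers from the text: 0.5 ns of skew at 20 MHz and at 200 MHz.
print(f"{skew_fraction(0.5, 20):.0%} of the period at 20 MHz")
print(f"{skew_fraction(0.5, 200):.0%} of the period at 200 MHz")

# Hypothetical per-sink insertion delays in ns, checked for a 5 ns clock.
clock_report = check_net("clock", [1.20, 1.35, 1.28, 1.41], 5.0, max_skew_ns=0.2)
reset_report = check_net("reset", [3.0, 4.8, 5.2], 5.0)
print(clock_report)
print(reset_report)
```

The same reset delays would pass as a scan enable net if the scan clock period were, say, 20 ns, which reflects the remark that scan testing tolerates a larger insertion delay.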

[Figure: clock tree parameters. Labels: AutoCTS Root Pin, Buf Tran, Sink Tran, Min Delay, Max Delay, Max Skew]

In CADENCE SOC ENCOUNTER, clock tree synthesis (CTS) is used to generate optimized buffer trees to drive high fan-out nets. It can be configured to satisfy a variety of constraints.
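
To get a feeling for why tighter constraints cost buffers and insertion delay, the following Python sketch estimates the size of a balanced buffer tree. It is a deliberately simplified model and not the algorithm used by the tool: the load is approximated by a plain fan-out count (wire capacitance is ignored), every buffer and the root driver may drive at most max_fanout inputs, and the function name and the sink counts are invented for this example.

```python
import math


def buffer_tree_estimate(num_sinks, max_fanout):
    """Estimate (buffer levels, total buffers) of a balanced buffer tree.

    Levels are added from the sinks towards the root until the remaining
    number of loads can be driven directly by the root driver.
    """
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fan-out of at least 2")
    levels = 0
    buffers = 0
    loads = num_sinks
    while loads > max_fanout:
        # One more level: each new buffer takes over up to max_fanout loads.
        loads = math.ceil(loads / max_fanout)
        buffers += loads
        levels += 1
    return levels, buffers


# 1000 flip-flops with progressively tighter fan-out limits per driver.
for fanout in (20, 10, 5):
    levels, buffers = buffer_tree_estimate(1000, fanout)
    print(f"max fan-out {fanout:2d}: {levels} levels, {buffers} buffers")
```

Lowering the allowed fan-out per driver, which is what a tighter transition time limit amounts to in this model, increases both the number of buffers and the number of levels, and every extra level adds to the insertion delay.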

Student Task 26:

• A sample clock tree synthesis configuration file can be found under src/sample/chip.ctstch-sample. The sample file contains three different configurations for a clock, a reset and a scan enable signal.

• Copy this file to the src directory and adapt the ’AutoCTSRootPin’ statements to match your design.

• For educational purposes, change the clock tree specifications as follows: max. skew 0.2 ns, max. insertion delay 4 ns, max. transition time at buffers 0.6 ns and at clock pins 0.4 ns.
