The goal of clock tree synthesis is to build a buffer distribution network to meet the timing requirements among the leaf pins. It includes the following jobs.
• Creating clock tree spec file
• Building a buffer distribution network
• Routing clock nets using NanoRoute
Creating the Clock Specification File
Clock tree synthesis (CTS) is a series of procedures to build a buffer distribution network to meet the design's timing targets. The clock tree specification file is used to direct clock tree synthesis and includes:
• Design constraints including latency, skew, and design rules
• Buffer and routing type definitions
• Trace and synthesis controls like: MacroModel, ClkGroup, NoGating, LeafPin, ExcludedPin, PreservePin, ThroughPin, and GatingGroupInstances
• Flow controls like:
o Whether or not to generate a detail report
o Whether or not to route the clock net
o Whether or not to perform post-CTS optimization
You can generate the default clock tree spec file with the command:
createClockTreeSpec -file filename
Automatically generating a clock tree specification translates the following information from the timing constraint file into suitable records for the clock tree spec file.
• create_clock - Becomes AutoCTSRootPin in CTS constraints file
• set_clock_transition - Becomes SinkLeafTran and BufMaxTran (Default: 400 ps)
• set_clock_latency value - Becomes MaxDelay (Default: clock period) and MinDelay (Default: 0)
• set_clock_latency -source value - Becomes SrcLatency value in ns
• set_clock_uncertainty - Becomes MaxSkew (Default: 300 ps)
• create_generated_clock - Will add necessary ThroughPin statement to the CTS constraints file
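As an illustration of the mapping above, the following is a minimal sketch of a timing constraint file that would feed createClockTreeSpec. The clock name, port name, and all values are hypothetical, and the time unit is assumed to be ns. The comments name the spec file record that each constraint is described as producing.

```tcl
# Hypothetical SDC constraints; names and values are for illustration only.
create_clock -name clk -period 2.0 [get_ports clk]   ;# becomes AutoCTSRootPin
set_clock_transition 0.2 [get_clocks clk]            ;# becomes SinkLeafTran and BufMaxTran
set_clock_latency 1.0 [get_clocks clk]               ;# becomes MaxDelay (and MinDelay)
set_clock_latency -source 0.3 [get_clocks clk]       ;# becomes SrcLatency
set_clock_uncertainty 0.1 [get_clocks clk]           ;# becomes MaxSkew
```

If a constraint is missing from the timing constraint file, the corresponding record takes the default listed above (for example, 300 ps for MaxSkew).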
You can control which buffer types are used to build clock trees by listing them in the config file or by using "createClockTreeSpec -bufferList bufferList".
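A minimal sketch of the second approach is shown here. The cell names and the output file name are hypothetical, and passing the buffer list as a brace-enclosed Tcl list is an assumption; check the command reference for the exact list format your release expects.

```tcl
# Hypothetical buffer list and spec file name, for illustration only.
set cts_buffers {BUFX4 BUFX8 BUFX12}

# Generate the default spec file restricted to the listed buffer types.
createClockTreeSpec -bufferList $cts_buffers -file clock.ctstch
```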
You can also control the routing types for the clock nets. Route types for nets connected to leaf cells and nets connected to non-leaf cells can be specified separately with LeafRouteType and RouteType, respectively. Also, you can use setCTSMode before running specifyClockTree to change the default routing type and global clock tree synthesis controls.
Note: Set the preferredExtraSpace to 0 in the LeafRouteType definition in the spec file. If you do not set it to zero, the software uses the value specified in the RouteType definition, which might be greater than zero. This can cause congestion.
The clock tree specification file is very important and directly affects the result of clock tree synthesis. A good clock tree plan including suitable constraints and placement space can improve the results of clock tree synthesis and avoid problems for post-CTS timing closure.
The Pre-CTS Clock Tree Tracer (Clock - Trace Pre-CTS Clock Tree) user interface can be used to traverse the clock tree structure logically and physically based on the applied clock specification file before committing clock tree synthesis. You can use it as a basis for changing the clock tree specification file to consolidate the clock tree structure and improve the results of clock tree synthesis.
Synthesizing the Clock Tree
To generate the clock tree, use the clockDesign command. This command performs the following operations during clock tree synthesis:
• Deletes any existing buffers on the clock nets
• Builds a buffer distribution network to distribute the clock signal(s) to the registers
• Routes the clock nets using NanoRoute
• Optimizes the clock tree
clockDesign is a super-command that runs the commands in the CTS flow (for example, createClockTreeSpec, specifyClockTree, deleteClockTree, and ckSynthesis). Note that clockDesign automatically enables some CTS options that are disabled by default. If you are comparing a clockDesign run to a run where each command is run separately, ensure the settings are consistent. The table below compares the clockDesign settings to the default settings:
Option         clockDesign Setting   Default Setting
RouteClkNet    Yes                   No
PostOpt        Yes                   Yes
OptAddBuffer   Yes                   No
The clockDesign command generates the default clock tree specification file (if one is not specified), deletes existing clock trees, builds the clock tree, calls NanoRoute to route the clock nets, and then optimizes the clock tree to improve the skew by resizing buffers or inverters, adding buffers, refining placement, and correcting routing.
If you performed useful skew optimization (setOptMode -usefulSkew true), clockDesign automatically checks for any scheduling file in the working directory, or checks for "rda_Input ui_scheduling_file", and honors the scheduling file while building the clock tree.
If the clockDesign command calls NanoRoute to route the clock nets, direct NanoRoute to follow the route guide by using the command "setCTSMode -routeGuide true". This is enabled by default. This operation can improve the correlation between pre-route and post-route clock nets.
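The two ways of running CTS described in this section can be sketched as follows. This is an outline under stated assumptions, not a verified script: the -specFile option of clockDesign, the -file option of specifyClockTree, the file name clock.ctstch, and the use_super_command variable are assumptions, and the arguments of deleteClockTree and ckSynthesis are omitted. Check the command reference for your release before using it.

```tcl
# Hypothetical switch between the two flows; not a tool variable.
set use_super_command 1

# Direct NanoRoute to follow the CTS route guide (enabled by default).
setCTSMode -routeGuide true

if {$use_super_command} {
    # One-step flow: generates the spec if needed, deletes old trees,
    # builds the tree, routes the clock nets, and optimizes the tree.
    clockDesign -specFile clock.ctstch   ;# -specFile is an assumed option
} else {
    # Step-by-step flow using the commands that clockDesign wraps.
    createClockTreeSpec -file clock.ctstch
    specifyClockTree -file clock.ctstch  ;# -file is an assumed option
    deleteClockTree                      ;# arguments omitted
    ckSynthesis                          ;# arguments omitted
}
```

When comparing the two flows, remember that RouteClkNet and OptAddBuffer are enabled by clockDesign but not by default, so the step-by-step run needs matching settings to produce comparable results.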
Tips for Performing CTS on High Performance Designs
Skew, slew, and latency constraints are the primary levers for getting the highest-performance design, but there is a trade-off between tightening these constraints and getting the best clock tree (tighter values often increase the area and power of the tree). Reducing latency often lessens the impact of derating, but it can sometimes come at the expense of common paths, which are filtered out by CPPR.
Mode settings to reduce latency:
• The following affects actual tree construction. It can increase runtime considerably and ignores MinDelay constructs:
setCTSMode -synthLatencyEffort high
• The following performs optimization after the tree construction mostly by optimizing the location of the tree elements and can also add significant runtime:
setCTSMode -optLatency true
• The following reduces the size of the tree by performing optimization after the tree construction to delete and downsize elements to recover area:
setCTSMode -optArea true
Routing layer selection
• It is advised to have CTS route the clock nets and fix the wires.
• Using the upper thicker layers often improves performance due to the lower lateral capacitance.
o Try to pair and limit the layers (i.e. horizontal and vertical) with the same pitch/width/spacing so CTS estimation is more accurate.
• Using a preferred extra spacing of 1 helps to reduce capacitance and future SI impact.
o Using values larger than 1 often does not improve the situation and often has no effect on SI pushouts. That is because once the wires are more than 1 track away from the clock, the primary coupling occurs between the cross-over and cross-under wires.
• Routing with wider widths (2 to 3 times the default width) is an effective way to reduce resistance which is a big factor for designs at 40nm and below.
Buffer/Inverter selection
• Limiting the list of available cells to 3 or 4 often improves both runtime and quality of results. The low drive clock cells typically shouldn't be used as they are susceptible to routing changes. If the tree can be built with LVT cells, use those as the tree will be smaller and more SI immune. Additionally, you can run tests to see if trees built with all buffers or all inverters are faster.
Setting constraints
• Using MaxCap constraints on elements in the tree can help reduce the potential for large jumps through these cells. For example:
MaxCap
+ SGCLATNX4 0.030pF
+ CLKINVX16 0.20pF
• Apply realistic MinDelay, MaxDelay, and MaxSkew values because they help to balance the different clocks against each other.
• Make sure no clock gating cells are marked dont_use or dont_touch; otherwise they will not be resized.
o It is very common for the default library definition to have them marked dont_use and/or dont_touch.
Manual skewing
• Sometimes certain elements must be manually skewed. This can arise when preCTS useful skew is not enabled or when preCTS cannot predict the magnitude of the problem due to skew or derating. In preCTS you can model this using the set_clock_latency SDC construct. The following example shows the clock delay to A/B/RAM1/CLKA pulled in by 500 ps:
set_clock_latency -0.5 A/B/RAM1/CLKA
• To model this in the CTS spec file it would appear as follows. The +0.5ns means that 500ps of latency is "inside" the CLKA pin of A/B/RAM1:
MacroModel pin A/B/RAM1/CLKA 0.5ns 0.5ns 0.5ns 0.5ns 0pF
Tips for Performing CTS on Congested and High Utilization Designs
CTS congestion normally results from any of the following:
• Too many cells inserted due to overly tight constraints
• Poor choice of top/bottom preferred routing layers
o For example, restricting the top layer so that clock routing cannot go over RAMs
• NDRs or other routing rules
For CTS in high utilization designs, typically the goal is to make the tree as small as possible.
The following setting is used to reduce the size of the tree. CTS performs optimization after the tree construction to delete and downsize elements to recover area. Also, when too many cells are inserted, try relaxing the constraints (typically the Buf/Sink MaxTran):
setCTSMode -optArea true
When routing rules are causing the problems, consider using the rules only for the non-sink levels (or using a less restrictive rule for the sinks).
• In the CTS spec file:
o RouteType controls the routing rules for non-sink levels
o LeafRouteType controls the routing rules for the sink level
• MaxTran constraints typically have the largest effect on size of tree so relaxing these helps reduce impact.
Buffer/Inverter selection
• If the tree can be built with LVT cells, use those as the tree will be smaller and more SI immune.
Analyzing and Debugging the Clock Tree Results
You can use the Clock Tree Browser (Clock - Browse Clock Tree) user interface to fine tune the clock tree to improve the results. From the user interface you can perform the following operations:
• Add buffers
• Delete buffers
• Size cells
• Change net connections
Use Global Clock Tree Debug (Clock - Debug Clock Tree) to debug the timing result. Refer to the chapter, Synthesizing Clock Trees, in the EDI System User Guide for more information. Also, see the chapter, Clock Menu, in the Encounter Menu Reference for descriptions of the forms and fields of the user interface.
Sometimes a degradation in clock delay or skew occurs during CTS when comparing the results before and after the clocks are routed. If this occurs try the following:
• Confirm the RC scaling factors for the clocks are set properly. See How to Generate Scaling Factors for RC Correlation.
• Constrain the routing to two upper routing layers using a RouteType in the CTS specification file. Constraining the routing to two layers reduces differences in layer assignment between CTS and NanoRoute.
• Use displayClockMinMaxPaths with the -preRoute and -clkRouteOnly options to compare pre-route and clock route paths.
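A minimal sketch of the last check is shown here. It uses only the command and options named above; running them as two separate invocations, one per view, is an assumption.

```tcl
# Show the min/max clock paths using pre-route estimates.
displayClockMinMaxPaths -preRoute

# Show the min/max clock paths based on the routed clock nets.
displayClockMinMaxPaths -clkRouteOnly
```

Large differences between the two views on the same path point to a correlation problem, such as incorrect RC scaling factors or a layer assignment mismatch.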
Optimizing the Clock Tree
After clockDesign, ckECO can be used to improve the tree based on the parasitics and timing seen by the optimizer.
• ckECO by default can use all the allowed buffers/inverters. To limit it to only those in the CTS spec file use the -useSpecFileCellsOnly option.
ckECO -postCTS -useSpecFileCellsOnly
• A similar flow can be used after detailed routing. Be aware that if useful skew was applied during postCTS optimization, "ckECO -postRoute" may undo it because its goal is to minimize skew:
ckECO -postRoute [-useSpecFileCellsOnly]
• If you are looking for local skew reduction (skew between flip-flops that communicate with each other), use the -localSkew option:
ckECO -postCTS -useSpecFileCellsOnly -localSkew
• Check the CTS log file for clock gating element movement during optDesign -postCTS:
o Use the "setPlaceMode -clockGateAware true" option in placement (see placeDesign section)
o Or don't allow gated elements to move during CTS:
setCTSMode -optLatencyMoveGate false
Clock Specification File Example
The following example shows a clock specification file.
#
# FirstEncounter(TM) Clock Synthesis Technology File Format
#
MacroModel pin freg/mod004048/CLK 20ps 18ps 20ps 18ps 30ff
ClkGroup
+ CGEN_1
+ CGEN_2
RouteTypeName CK1
PreferredExtraSpace 1
TopPreferredLayer 4
BottomPreferredLayer 3
Buffer BUFX4 BUFX8 BUFX12 INVX1
MaxCap
+ DFF_B/CLK
End