Ultra High Speed (5 GHz) Block Custom
Physical Design Flow with ICC
Prakash Jayasekharan
Senior PD Engineer
Suman Musunuru
Senior Design Engineer
Agenda
• Challenges in High Speed Physical Design
- Design Constraints, Library and Design Issues
• Custom Solutions with Synopsys ICC Flow
- Matrix re-characterization, synthesis improvements, placement sensitive flow, CTS waveform balancing, signal EM, power
• Timing/STA Correlation Results
- Star-RC vs Calibre, ICC vs PT-SI
• Conclusion/Takeaways
• Appendix A
Design Constraints
• 65 nm SoC design
- 2.4 million gates
- Block A and Block B @ 5 GHz (200 ps period)
- 5% late, 10% early derating (both clock and data), 5% jitter
- Target skew ~15 ps, transition ~20 ps, pulse width ~80 ps
- IR drop < 3% peak
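As a rough illustration of how tight these targets are, the numbers on this slide can be turned into a simple budget. This is only a sketch: treating jitter and skew as straight subtractions from the period is a simplifying assumption of this example, not the sign-off method, and the variable names are made up for illustration.

```python
# Rough timing budget from the slide's targets (illustrative arithmetic only).
FREQ_GHZ = 5.0
period_ps = 1000.0 / FREQ_GHZ          # 5 GHz -> 200 ps

jitter_ps = 0.05 * period_ps           # 5% jitter
target_skew_ps = 15.0                  # target skew from the slide
min_pulse_ps = 80.0                    # target pulse width from the slide

# Assumption: jitter and skew are simply subtracted from the period.
usable_ps = period_ps - jitter_ps - target_skew_ps

# Narrowest allowed clock pulse expressed as a fraction of the period.
min_duty = min_pulse_ps / period_ps

print(f"period                       : {period_ps:.0f} ps")
print(f"jitter (5%)                  : {jitter_ps:.0f} ps")
print(f"left for CK-Q + logic + setup: {usable_ps:.0f} ps")
print(f"minimum duty cycle           : {min_duty:.0%}")
```

Under these assumptions, the 80 ps pulse-width target corresponds to a 40% duty cycle at a 200 ps period.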
Library Issues
• Re-characterization of timing libraries
- Traditional library tables produce pessimism in timing delay calculation (setup/delays worse by at least 10 ps)
...Library issues
• Extra pessimism is not tolerable because
- 10 ps per cell accumulates along a path and becomes significant
- Paths become too tight to fix
• Library is mostly made of weak drive-strength buffers and complex gates. Realistic fanout < 5
• Asymmetric clock cells cause low pulse width
• IR drop is not part of the timing delay tables in .lib
Design Issues
• Alternative lower-frequency architecture not done
- Would consume 2x area and power
• Very good skew and transition times required
- Very fast transition => higher switching power => higher insertion delay
- Weak clock tree cells cause more insertion delay
• > 70% of the logic is sequential. Setup (reg2reg) timing is critical
...Design Issues
• Small coupling caps (1 fF) due to size of design
- Coupling caps on small nets can fall below the extraction threshold and be dropped. Use coupling_abs_threshold to lower the threshold
• 4 corners for IR/EM, 3 corners for timing
- High-voltage, high-temperature corner added at the end for IR/EM
Voltage (V)   Temp (C)   Tag      Description
0.9           125        WCCOM    Traditional worst-case timing
1.1           -40        LTCOM    Traditional best-case timing
0.9           -40        WCLCOM   Temperature-inversion corner
1.1           125        MLCOM    Worst EM/IR/leakage
[Figure: the four corners plotted on voltage (0.9 V, 1.1 V) vs temperature (-40C, 125C) axes]
Matrix re-characterization
BEFORE (3x3)
timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_3x3) {
    index_1("0.003, 0.2019, 0.9");
    index_2("0.003, 0.2019, 0.9");
    values("0.00995, 0.0199, 0.06965",\
           "0.08955, 0.1095, 0.2089",\
           "0.2189, 0.1791, 0.3184");
  }
}

AFTER (10x10): 10x10 reduces extra pessimism
timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_10x10) {
    index_1("0.003, 0.009191, 0.03092, 0.07243, 0.1371, 0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    index_2("0.003, 0.009191, 0.03092, 0.07243, 0.1371, 0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    values("0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965",\
           "0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965",\
           "0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955",\
           "0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194",\
           "0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691",\
           "0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288",\
           "0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686",\
           "0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885",\
           "0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085",\
           "0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184");
  }
}
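To see why table size matters, the two setup tables can be compared at a point that lies between the 3x3 grid points. The sketch below assumes plain bilinear interpolation between neighbouring table entries, which simplifies what a timing tool actually does, and assumes the library time unit is ns. The function name and the query point are hypothetical.

```python
from bisect import bisect_right

def lookup(index_1, index_2, values, x1, x2):
    """Bilinear interpolation in a 2-D table whose rows follow index_1."""
    def segment(axis, x):
        # Grid segment [lo, hi] that contains x, clamped to the table edges.
        hi = min(max(bisect_right(axis, x), 1), len(axis) - 1)
        lo = hi - 1
        return lo, hi, (x - axis[lo]) / (axis[hi] - axis[lo])

    r0, r1, tr = segment(index_1, x1)
    c0, c1, tc = segment(index_2, x2)
    near = values[r0][c0] + tc * (values[r0][c1] - values[r0][c0])
    far = values[r1][c0] + tc * (values[r1][c1] - values[r1][c0])
    return near + tr * (far - near)

# Tables copied from the slide (time unit assumed to be ns).
AXIS_3 = [0.003, 0.2019, 0.9]
SETUP_3X3 = [
    [0.00995, 0.0199, 0.06965],
    [0.08955, 0.1095, 0.2089],
    [0.2189, 0.1791, 0.3184],
]
AXIS_10 = [0.003, 0.009191, 0.03092, 0.07243, 0.1371,
           0.2278, 0.3472, 0.4976, 0.6812, 0.9]
SETUP_10X10 = [
    [0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965],
    [0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965],
    [0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955],
    [0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194],
    [0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691],
    [0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288],
    [0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686],
    [0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885],
    [0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085],
    [0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184],
]

# Hypothetical query point: 3 ps on index_1, 137.1 ps on index_2.
x1, x2 = 0.003, 0.1371
coarse = lookup(AXIS_3, AXIS_3, SETUP_3X3, x1, x2)
fine = lookup(AXIS_10, AXIS_10, SETUP_10X10, x1, x2)
print(f"3x3 setup   : {coarse * 1000:.2f} ps")
print(f"10x10 setup : {fine * 1000:.2f} ps")
print(f"difference  : {(coarse - fine) * 1000:.2f} ps")
```

At this query point the 3x3 table has to interpolate across a wide interval, while the 10x10 table has a characterized entry nearby. The size of the gap depends on where the query falls between the coarse grid points.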
Synthesis Improvements
• Very slow cells like XOR, 4:1 mux, and AOI gates prohibited
- Some sensitive logic hand-instantiated to prevent AOI or XOR selection
• Register cloning/fanout optimization to reduce fanout
- 10-15% increase in sequential area, but helps reduce flop delay
- set_register_replication (DC) can be used
[Figure: one register driving load cap C, cloned into two registers each driving load cap C/2]
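The benefit of halving the load can be sketched with a simple linear delay model. The intrinsic delay, the per-fF slope, and the load value below are made-up numbers chosen only to show the trend; real cell delays come from the library tables, not from this formula.

```python
# Linear delay model: delay = intrinsic + slope * load_cap.
# All numbers are hypothetical and only illustrate the trend.
INTRINSIC_PS = 30.0     # assumed CK-Q delay with no load
SLOPE_PS_PER_FF = 2.0   # assumed extra delay per fF of load

def ck_to_q_ps(load_ff):
    """CK-Q delay of a flop driving load_ff femtofarads (toy model)."""
    return INTRINSIC_PS + SLOPE_PS_PER_FF * load_ff

load_c = 10.0                       # fF driven by the original flop
before = ck_to_q_ps(load_c)         # one flop drives C
after = ck_to_q_ps(load_c / 2)      # each clone drives C/2

print(f"one flop, load C     : {before:.1f} ps")
print(f"two clones, load C/2 : {after:.1f} ps")
```

In this toy model only the load-dependent part of the delay shrinks, which is why cloning trades sequential area for flop delay.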
Placement Sensitive Flow
• Cell placement is closely controlled in all stages
• Bad timing due to:
- Placement of cells due to loose constraints
- High buffer insertion to close timing
• Clocks over-constrained by 10%; incremental psynopt runs improve timing
- Best possible flop placement achieved
• Clock latency set to simulate post-CTS derating in placement
Placement.. Default timing flow
[Figure: default timing flow. create_placement + psynopt (WNS: -0.05, 50 paths) -> clock_opt (WNS: -0.10, 60 paths) -> route_opt + route_opt -incr (WNS: -0.18, 90 paths). Timing degrades through the flow due to derating and SI + wires.]
Placement.. PS flow
[Figure: PS flow. Steps shown: create_placement + psynopt; psynopt(1); psynopt(2); clock_opt -only_cts; route_opt + route_opt -incr; route_opt -incr (reg2reg only). Annotations: 40 ps uncertainty; don't upsize, just move; allow buffer resizing; remove extra uncertainty (24 ps); don't move registers; SI + wires. WNS values shown: -0.05 (50 paths), +0.005 (10 paths), -0.10 (80 paths), -0.025 (50 paths), -0.08 (20 paths), -0.015 (10 paths, waived).]
CTS-Waveform Balancing
• Getting around the asymmetry of the clock cells
- Decision to use the same non-equal duty cycle inverter back to back to avoid pulse width issues
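The reason back-to-back identical inverters help can be sketched with a toy edge model: each clock edge sees one slow (output-rising) and one fast (output-falling) transition across the pair, so the asymmetry cancels. The delay values are hypothetical, and the sketch assumes both stages see the same load and input slew, which a real tree only approximates.

```python
# Hypothetical asymmetric inverter: slower pulling its output high than low.
T_RISE_PS = 14.0   # assumed delay when the output rises
T_FALL_PS = 8.0    # assumed delay when the output falls

def invert(edges):
    """Map input edges [(time_ps, 'rise' | 'fall'), ...] to output edges."""
    out = []
    for t, kind in edges:
        if kind == "rise":
            out.append((t + T_FALL_PS, "fall"))   # input rises -> output falls
        else:
            out.append((t + T_RISE_PS, "rise"))   # input falls -> output rises
    return out

def pulse_width(edges):
    """Time between the two edges of a single pulse."""
    return abs(edges[1][0] - edges[0][0])

clock_in = [(0.0, "rise"), (100.0, "fall")]   # 100 ps high pulse in a 200 ps period
one_stage = invert(clock_in)                  # inverted pulse, width distorted
two_stage = invert(one_stage)                 # same polarity as the input again

print("input pulse     :", pulse_width(clock_in), "ps")
print("after 1 inverter:", pulse_width(one_stage), "ps")
print("after 2 inverters:", pulse_width(two_stage), "ps")
```

In this model a single stage stretches one phase by the rise/fall mismatch at the expense of the other, and the second identical stage undoes it.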
CTS-others
• Register placement is fixed
• Fast transition times help speed up CK-Q timing
- Also reduces setup times at the flops
• Final duty cycle tolerance: 40/60%
Power Analysis
• Both blocks are in a special power domain (not shared by top)
• Target < 3% (i.e. 33 mV)
• IR drop achieved @ MLCOM (1.1 V, 125C) is 14 + 17 = 31 mV
[Figure: IR drop map showing the pads, block A and block B]
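The IR-drop target and result on this slide reduce to a short budget check. The sketch below only restates the slide's arithmetic; the variable names are illustrative, and it does not assume what each of the two quoted contributions represents.

```python
SUPPLY_V = 1.1           # MLCOM corner voltage from the slide
BUDGET_FRACTION = 0.03   # target: IR drop < 3%

budget_mv = SUPPLY_V * BUDGET_FRACTION * 1000   # 3% of 1.1 V in mV
contributions_mv = [14, 17]                     # the two values quoted on the slide
total_mv = sum(contributions_mv)
margin_mv = budget_mv - total_mv

print(f"budget : {budget_mv:.0f} mV")
print(f"total  : {total_mv} mV")
print(f"margin : {margin_mv:.0f} mV")
```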
Power EM
• EM, Rj issues due to high current through buses with insufficient vias (important run for high speed)
• ICC custom route tool used to add extra Via2, M2
Signal EM
[Figure: signal EM flow. Statistical EM; timing clean-up (worst functional mode for power); simulate and generate VCD/SAIF file; SAIF-based EM; fix signal EM (if any) with fix_signal_em or a script; fix minor DRCs/antennas; STA. Repeat for critical functional modes. The flow reduces timing iterations.]
Correlation
• Bottom-up flow to make sure ICC settings are close enough to PrimeTime and Star-RC (SolvNet: IC Compiler Correlation Checklist Trilogy)
• Extraction settings
- OPERATING_TEMPERATURE: 25, COUPLE_TO_GROUND: NO, COUPLING_ABS_THRESHOLD: 1e-15, MODE=400, EXTRACT_VIA_CAPS=YES
• Noise / timing settings
- set db_load_ccs_noise_data true
- set timing_crpr_threshold_ps 0
Star-RC vs Calibre SPEF
ICC vs PT-SI slack
• Block B: ICC (4 ps) slightly pessimistic vs PT (2 ps)
[Figure: two slack histograms (number of paths vs slack); axis ticks 0.000, -0.002, 0.005, 0.011 and 0.000, -0.004, 0.005, 0.009]
Conclusion / Takeaways
• Fix library issues
- Good range of cells with decent strengths for optimization
- Cell names must be user friendly to limit use (for better EM/IR)
- Larger matrices for setup/pulse timing to prevent timing pessimism
- Symmetric clock cells tagged with special naming
- Don't-use cells should be clearly marked
• Fix process corners (e.g. MLCOM, WCLCOM)
- Special situations like temperature inversion for timing, high-temperature corners for leakage, and peak IR drop should be known well in advance
…Conclusion / Takeaways
• Think top level
- Think about next stage, top level
• Correlate (SolvNet: IC Compiler Correlation Checklist Trilogy)
- Star-RC / ICC extraction should be correlated to device level
- PT-SI and ICC noise settings should be checked
• Tune ICC to meet requirements (e.g. custom placement, custom CTS, custom router, etc.)
- Get to know all options available
- Script for reusability
Thanks…
Synopsys Hotline
• Filed and accepted requests for EM GUI and temperature scaling
• Retaining FILLs in soft block after flattening
• Ability to check min grid during zroute verify
Others
1. KhanKap Mounarath – Sr. Scientist, Maxim
2. DSM group / Library, Maxim EDA
3. Bill Sicaras - Synopsys AC
Appendix A
• PT-SI and SPICE correlation
- SPICE-level simulation performed on the worst path
  Startpoint: clk_div_0/div_by2_by4_0/sig_i4_reg
  (rising edge-triggered flip-flop clocked by dac_clk1)
  Endpoint: clk_div_0/div_by2_by4_0/sig_i4_reg
  (rising edge-triggered flip-flop clocked by dac_clk1)
  Path Group: dac_clk1
  Path Type: max
- ∑ (launch clock delay + CK-Q delay + combinational delay to the endpoint register) is within 5% for Block B
Appendix B (scripts)
Script used for placement
## Source the common settings for placement and optimization
source common_placement_settings_icc.tcl
set placer_max_cell_density_threshold 0.68
## 15% of the clock period which is 200ps is 30ps
## 30ps plus 10ps uncertainty is 40ps overconstraining
set_timing_derate -late 1.15
set_clock_uncertainty 0.01 [all_clocks]
set_critical_range 0.090 cd18_decoder_dac
## INITIAL PLACEMENT
create_placement -effort high -congestion -congestion_effort high
legalize_placement
## FIRST ROUND OF optimizations
set_dont_touch [get_cells *]
set_dont_touch [get_nets *]
psynopt
## tighten the output paths
set_clock_uncertainty 0.015 [all_clocks]
set_clock_latency 0.200 [get_clocks dac_clk]
set_clock_latency 0.100 [get_clocks dac_clko]
psynopt
## SECOND ROUND OF optimization
## Remove the dont touches and let the tool optimize the
## timing more. (upsize cells etc.)
remove_attribute [get_cells -hier *] dont_touch -quiet
remove_attribute [get_nets -hier *] dont_touch -quiet
## do not optimize some sensitive logic
set_dont_touch [get_cells U*]
psynopt
Appendix B
Script used for CTS
## DON'T MOVE CAREFULLY PLACED CELLS
set_dont_touch_placement [get_cells -hier *_reg*]
set_attribute [get_cells -hier spr*] is_fixed true
remove_clock_tree -clock_trees {dac_clk dac_clko} -honor_dont_touch
reset_clock_tree_references
define_routing_rule decoder_clk_shield_rule -default_reference_rule -taper_level 0 \
    -multiplier_width 2 -multiplier_spacing 1 -shield
## CONTROL TRANSITION FOR CLOCKS
## RELAX BUFFER LEVELS TO help fix fanout
set_clock_tree_options -layer_list $runOption(input,clkRoutelayerList) \
    -routing_rule decoder_clk_shield_rule -use_default_routing_for_sinks 1 \
    -target_skew 0.010 -max_buffer_levels 9 -max_transition .024
set_clock_tree_options -clock_trees dac_clk -routing_rule decoder_clk_shield_rule \
    -use_default_routing_for_sinks 1 -target_skew 0.010 -max_buffer_levels 9
set_max_fanout 2 [get_ports dac_clk]
set_max_fanout 2 [get_ports dac_clko]
## Tighter transition on output clk. timing is ok.
set_clock_tree_options -clock_trees dac_clko -max_buffer_levels 3 -max_transition 0.022
check_clock_tree -clocks dac_clk
report_clock_tree -summary -clock_trees dac_clk -level_info
report_clock_tree -show_all_sinks
report_clock_tree -settings > clktree/settings.rpt
update_clock_latency
## Turn on removal and recovery check ##
set enable_recovery_removal_arcs true
## Perform clock tree synthesis only