Academic year: 2021


(1)

Ultra High Speed (5 GHz) Block Custom Physical Design Flow with ICC

Prakash Jayasekharan

Senior PD Engineer

Suman Musunuru

Senior Design Engineer

(2)

Agenda

Challenges in High Speed Physical Design

- Design constraints, library and design issues

Custom solutions with Synopsys ICC flow

- Matrix re-characterization, synthesis improvements, placement sensitive flow, CTS waveform balancing, signal EM, power

Timing/STA correlation results

- Star-RC vs Calibre, ICC vs PT-SI

Conclusion/Takeaways

Appendix A

(3)

Design Constraints

65 nm SOC design

- 2.4 million gates

- Block A and Block B @ 5 GHz (200 ps period)

- 5% late, 10% early derating (both clock and data), 5% jitter

- Target skew ~15 ps, transition ~20 ps, pulse width ~80 ps

- IR drop < 3% peak
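To give a feel for how little room these constraints leave, the sketch below subtracts jitter and target skew from the 200 ps period and then applies the 5% late derate. The way the terms are combined (plain subtraction, with the derate applied to the whole launch-to-capture path) is a simplifying assumption for illustration, not the sign-off calculation used on the project.

```python
# Rough setup budget for one 5 GHz cycle, using the numbers on this slide.
# Assumption: jitter and skew are simply subtracted from the period, and the
# late derate scales the whole data path (CK-Q + logic + setup).
PERIOD_PS = 200.0        # 5 GHz clock
JITTER_FRACTION = 0.05   # 5% jitter
TARGET_SKEW_PS = 15.0    # target skew
LATE_DERATE = 1.05       # 5% late derating


def setup_budget_ps(period_ps=PERIOD_PS,
                    jitter_fraction=JITTER_FRACTION,
                    skew_ps=TARGET_SKEW_PS,
                    late_derate=LATE_DERATE):
    """Return the un-derated path delay (ps) that still fits in one cycle."""
    jitter_ps = period_ps * jitter_fraction
    usable_ps = period_ps - jitter_ps - skew_ps
    return usable_ps / late_derate


if __name__ == "__main__":
    print(f"Un-derated path budget: {setup_budget_ps():.1f} ps")
```

Under these assumptions the budget has to cover the CK-Q delay, all combinational logic and the setup time of the capturing flop.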

(4)

Library Issues

Re-characterization of timing libraries

- Traditional library tables add pessimism to timing delay calculation (setup/delays worse by at least 10 ps)

(5)

...Library issues

Extra pessimism is not tolerable because

- 10 ps per cell adds up along a path and becomes significant

- Paths become too tight to fix

Library is mostly made of weak drive strength buffers and complex gates. Realistic fanout < 5

Asymmetric clock cells cause low pulse width

IR drop is not part of the timing delay tables in .lib
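The accumulation argument can be made concrete with a small sketch. It takes the 10 ps per-cell pessimism quoted above and the 200 ps period from the design constraints; the stage counts are hypothetical values chosen only for illustration.

```python
# How much of a 200 ps cycle is lost to table pessimism alone.
# Assumption: every cell on the path contributes the same 10 ps of pessimism.
def pessimism_fraction(stages, per_cell_ps=10.0, period_ps=200.0):
    """Fraction of the clock period consumed by accumulated pessimism."""
    return stages * per_cell_ps / period_ps


if __name__ == "__main__":
    for stages in (2, 4, 6):  # hypothetical path depths
        share = pessimism_fraction(stages)
        print(f"{stages} cells: {share:.0%} of the period")
```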

(6)

Design Issues

Alternative lower frequency architecture not done

- Would consume 2x area and power

Very good skew and transition times required

- Very fast transition => higher switching power => higher insertion delay

- Weak clock tree cells cause more insertion delay

> 70% of the logic is sequential. Setup (reg2reg) timing is critical

(7)

...Design Issues

• Small coupling caps (1 fF) due to size of design

- Small nets in the design do not get extracted and can be dropped. Use coupling_abs_threshold to reduce the threshold

• 4 corners for IR/EM, 3 corners for timing

- High V, high temp corner added finally for IR/EM

Voltage (V)   Temp (C)   Tag      Description
0.9           125        WCCOM    Traditional worst case timing
1.1           -40        LTCOM    Traditional best case timing
0.9           -40        WCLCOM   Temp inversion corner
1.1           125        MLCOM    Worst EM/IR/leakage

[Figure: corner map, temperature (-40C, 125C) vs. voltage (0.9, 1.1)]
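The corner table above can be captured as data so that scripts pick the right set per analysis. The sketch below is one way to do that. The split into "timing" and "ir_em" sets is inferred from the slide (three timing corners, with the high-voltage, high-temperature MLCOM corner added for IR/EM), and the names `Corner` and `corners_for` are hypothetical.

```python
from typing import NamedTuple


class Corner(NamedTuple):
    tag: str
    voltage_v: float
    temp_c: float
    description: str


CORNERS = [
    Corner("WCCOM", 0.9, 125.0, "Traditional worst case timing"),
    Corner("LTCOM", 1.1, -40.0, "Traditional best case timing"),
    Corner("WCLCOM", 0.9, -40.0, "Temp inversion corner"),
    Corner("MLCOM", 1.1, 125.0, "Worst EM/IR/leakage"),
]

# Assumption: MLCOM is the corner that was added only for IR/EM analysis.
IR_EM_ONLY_TAGS = {"MLCOM"}


def corners_for(analysis):
    """Return the corners for 'timing' (3 corners) or 'ir_em' (all 4)."""
    if analysis == "timing":
        return [c for c in CORNERS if c.tag not in IR_EM_ONLY_TAGS]
    if analysis == "ir_em":
        return list(CORNERS)
    raise ValueError(f"unknown analysis: {analysis}")


if __name__ == "__main__":
    for kind in ("timing", "ir_em"):
        print(kind, [c.tag for c in corners_for(kind)])
```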


(9)

Matrix re-characterization

BEFORE (3x3)

timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_3x3) {
    index_1("0.003, 0.2019, 0.9");
    index_2("0.003, 0.2019, 0.9");
    values("0.00995, 0.0199, 0.06965",\
           "0.08955, 0.1095, 0.2089",\
           "0.2189, 0.1791, 0.3184");
  }
}

AFTER (10x10)

timing() {
  related_pin : "cp" ;
  timing_type : setup_rising ;
  fall_constraint(cnst_ctin_rtin_10x10) {
    index_1("0.003, 0.009191, 0.03092, 0.07243, 0.1371, \
             0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    index_2("0.003, 0.009191, 0.03092, 0.07243, 0.1371, \
             0.2278, 0.3472, 0.4976, 0.6812, 0.9");
    values("0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965",\
           "0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965",\
           "0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955",\
           "0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194",\
           "0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691",\
           "0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288",\
           "0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686",\
           "0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885",\
           "0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085",\
           "0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184");
  }
}

10x10 reduces extra pessimism
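To see why table granularity matters, the sketch below looks up the same setup constraint in both tables using bilinear interpolation between the four surrounding grid points. Bilinear interpolation is a common way lookup tables are evaluated, but treating it as the exact method used by the tools here is an assumption, as is the unit (taken to be ns). The query point is hypothetical, and the size and sign of the gap between the two tables depends on where the query falls.

```python
from bisect import bisect_right

IDX3 = [0.003, 0.2019, 0.9]
VAL3 = [[0.00995, 0.0199, 0.06965],
        [0.08955, 0.1095, 0.2089],
        [0.2189, 0.1791, 0.3184]]

IDX10 = [0.003, 0.009191, 0.03092, 0.07243, 0.1371,
         0.2278, 0.3472, 0.4976, 0.6812, 0.9]
VAL10 = [
    [0.00995, 0.00995, 0.00995, 0.00995, 0.00995, 0.0199, 0.02985, 0.0398, 0.04975, 0.06965],
    [0.0199, 0.0199, 0.00995, 0.0199, 0.0199, 0.0199, 0.02985, 0.0398, 0.0597, 0.06965],
    [0.02985, 0.02985, 0.02985, 0.02985, 0.02985, 0.0398, 0.04975, 0.0597, 0.06965, 0.08955],
    [0.04975, 0.04975, 0.0398, 0.04975, 0.0597, 0.06965, 0.0796, 0.08955, 0.1095, 0.1194],
    [0.06965, 0.06965, 0.0597, 0.06965, 0.0796, 0.0995, 0.1095, 0.1293, 0.1492, 0.1691],
    [0.08955, 0.08955, 0.0796, 0.08955, 0.0995, 0.1194, 0.1393, 0.1691, 0.199, 0.2288],
    [0.1194, 0.1095, 0.0995, 0.1095, 0.1194, 0.1393, 0.1592, 0.189, 0.2288, 0.2686],
    [0.1492, 0.1393, 0.1194, 0.1293, 0.1393, 0.1492, 0.1791, 0.2089, 0.2487, 0.2885],
    [0.1791, 0.1791, 0.1492, 0.1492, 0.1592, 0.1691, 0.189, 0.2288, 0.2587, 0.3085],
    [0.2189, 0.2189, 0.1791, 0.1791, 0.1791, 0.189, 0.2089, 0.2388, 0.2786, 0.3184],
]


def _bracket(axis, x):
    """Return (lower grid index, fractional position) for x, clamped to the axis."""
    x = min(max(x, axis[0]), axis[-1])
    i = min(bisect_right(axis, x) - 1, len(axis) - 2)
    return i, (x - axis[i]) / (axis[i + 1] - axis[i])


def lookup(index_1, index_2, values, x1, x2):
    """Bilinear interpolation; rows follow index_1, columns follow index_2."""
    i, t = _bracket(index_1, x1)
    j, u = _bracket(index_2, x2)
    return ((1 - t) * (1 - u) * values[i][j]
            + (1 - t) * u * values[i][j + 1]
            + t * (1 - u) * values[i + 1][j]
            + t * u * values[i + 1][j + 1])


if __name__ == "__main__":
    x1, x2 = 0.003, 0.1  # hypothetical query point
    coarse = lookup(IDX3, IDX3, VAL3, x1, x2)
    fine = lookup(IDX10, IDX10, VAL10, x1, x2)
    print(f"3x3:   {coarse:.5f}")
    print(f"10x10: {fine:.5f}")
    print(f"gap:   {(coarse - fine) * 1000:.2f} ps (if table units are ns)")
```

The 3x3 table has to draw one straight line across a span that the 10x10 table covers with several grid points, so any curvature in the underlying data shows up as interpolation error.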

(10)

Synthesis Improvements

Very slow cells like XOR, 4:1 mux and AOI gates prohibited

- Some sensitive logic hand instantiated to prevent AOI or XOR selection

Register cloning / fanout optimization to reduce fanout

- 10-15% increase in sequential area, but helps reduce flop delay

- set_register_replication (DC) can be used

[Figure: register cloning. One register with load cap = C is replaced by two registers, each with load cap = C/2]
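The load-splitting idea behind register cloning can be sketched as follows. This is only an illustration of dividing sinks among clones so each copy sees a smaller fanout. It is not how set_register_replication chooses its partitions, and the function name `clone_register` and the sink names are hypothetical.

```python
import math


def clone_register(sinks, max_fanout):
    """Split a register's sinks into groups, one per clone, each within max_fanout.

    Sinks are dealt out round-robin so the clones carry near-equal load.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    clones = max(1, math.ceil(len(sinks) / max_fanout))
    return [sinks[i::clones] for i in range(clones)]


if __name__ == "__main__":
    sinks = [f"U{i}/D" for i in range(8)]  # hypothetical sink pins
    for n, group in enumerate(clone_register(sinks, max_fanout=4)):
        print(f"clone {n}: {group}")
```

With equal sink capacitances, two clones each drive half the original load, which is the C to C/2 change shown on the slide.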

(11)

Placement Sensitive Flow

Cell placement is closely controlled in all stages

Bad timing is due to:

- Placement of cells under loose constraints

- Heavy buffer insertion to close timing

Clocks are over-constrained by 10% and incremental psynopt runs improve timing

- Best possible flop placement achieved

Clock latency set to simulate post-cts derating in

(12)

Placement..

Default timing flow

- create_placement + psynopt: WNS -0.05, 50 paths

- clock_opt: WNS -0.10, 60 paths

- route_opt + route_opt -incr: WNS -0.18, 90 paths

Timing degrades through the flow due to derating and SI + wires

(13)

Placement..

PSFlow

[Flow diagram; the pairing of results and annotations to steps is not recoverable from the extracted text]

Steps: create_placement + psynopt, psynopt(1), psynopt(2), clock_opt -only_cts, route_opt + route_opt -incr, route_opt -incr (reg2reg only)

WNS labels: -0.05, 50 paths; +0.005, 10 paths; -0.10, 80 paths; -0.025, 50 paths; -0.08, 20 paths; -0.015, 10 paths (waived)

Annotations: SI + wires; 40 ps uncertainty; don't upsize, just move; allow buffer resizing; remove extra uncertainty (24 ps); don't move registers

(14)

CTS-Waveform Balancing

Getting around the asymmetry of the clock cells

- Decision: use the same unequal duty cycle inverter back to back to avoid pulse width issues

(15)

CTS-others

Register placement is fixed

Fast transition times help speed up Ck-Q timing

- Also reduces setup times at the flops

Final duty cycle tolerance: 40/60%
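The duty cycle tolerance ties back to the pulse width target. The sketch below converts a 40/60% duty cycle on the 200 ps clock from the design constraints into a minimum pulse width. Reading the pulse width as period times the smaller duty fraction is the usual definition; applying it to this design's numbers is an illustration only.

```python
def min_pulse_width_ps(period_ps, worst_duty_fraction):
    """Narrowest clock pulse (ps) for a given worst-case duty cycle."""
    narrow_side = min(worst_duty_fraction, 1.0 - worst_duty_fraction)
    return period_ps * narrow_side


if __name__ == "__main__":
    width = min_pulse_width_ps(period_ps=200.0, worst_duty_fraction=0.40)
    print(f"Minimum pulse width at 40/60 duty cycle: {width:.0f} ps")
```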

(16)

Power Analysis

Both blocks are in a special power domain (not shared by top)

Target < 3% (i.e. 33 mV)

IR drop achieved @ MLCOM (1.1 V, 125C) is 14 + 17 = 31 mV

[Figure: IR drop map showing pads, block B and block A]
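The IR drop target and result above reduce to a short check. The sketch takes 3% of the 1.1 V MLCOM supply as the limit and compares it with the two reported contributions. What the 14 mV and 17 mV components each represent is not stated on the slide, so they are kept as anonymous terms.

```python
def ir_drop_limit_mv(supply_v, limit_fraction):
    """Peak IR drop limit in millivolts."""
    return supply_v * 1000.0 * limit_fraction


def meets_ir_target(contributions_mv, supply_v=1.1, limit_fraction=0.03):
    """Return (total drop in mV, limit in mV, pass/fail)."""
    total = sum(contributions_mv)
    limit = ir_drop_limit_mv(supply_v, limit_fraction)
    return total, limit, total <= limit


if __name__ == "__main__":
    total, limit, ok = meets_ir_target([14.0, 17.0])
    print(f"IR drop {total:.0f} mV vs limit {limit:.0f} mV: {'pass' if ok else 'fail'}")
```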

(17)

Power EM

EM, Rj issues due to high current through buses with insufficient vias (important run for high speed)

ICC custom route tool used to add extra Via2, M2

(18)

Signal EM

[Flow diagram, steps in extracted order]

- Statistical EM

- Timing clean up (worst functional mode for power)

- Simulate / generate VCD / SAIF file

- SAIF based EM

- Fix signal EM * (iterations)

- Fix signal EM (if any), fix minor DRCs/antennas

- STA

- Repeat for critical functional modes

* fix_signal_em (or) script

Result: reduced timing iterations

(19)

...Signal EM


(21)

Correlation

Bottom up flow to make sure ICC settings are close enough to PrimeTime, Star-RC (SolvNet: IC Compiler Correlation Checklist Trilogy)

Extraction settings

OPERATING_TEMPERATURE: 25
COUPLE_TO_GROUND: NO
COUPLING_ABS_THRESHOLD: 1e-15
MODE: 400
EXTRACT_VIA_CAPS: YES

Noise / timing settings

set db_load_ccs_noise_data true
set timing_crpr_threshold_ps 0

(22)

Star-RC vs Calibre spef

(23)

ICC vs PT-SI slack

Block B: ICC (4 ps) slightly pessimistic vs PT (2 ps)

[Figure: two histograms of # paths vs slack, one for ICC and one for PT-SI]


(25)

Conclusion / Takeaways

Fix Library Issues

- Good range of cells with decent strengths for optimization

- Cell names must be user friendly to limit use (for better EM/IR)

- Larger matrices for setup/pulse timing to prevent timing pessimism

- Symmetric clock cells tagged with special naming

- Don't-use cells should be clearly marked

Fix Process Corners (e.g. MLCOM, WCLCOM)

- Special situations like temperature inversion for timing, high temp corners for leakage, and peak IR drop should be known well in advance

(26)

…Conclusion / Takeaways

Think Top Level

- Think about next stage, top level

Correlate (SolvNet: IC Compiler Correlation Checklist Trilogy)

- Star-RC / ICC extraction should be correlated to device level

- PT-SI and ICC noise settings should be checked

Tune ICC to meet requirements (e.g. custom placement, custom CTS, custom router, etc.)

- Get to know all options available

- Script for reusability

(27)

Thanks…

Synopsys Hotline

Filed and accepted requests for EM GUI and temperature scaling

Retaining FILLs in soft block after flattening

Ability to check min grid during zroute verify

Others

1. KhanKap Mounarath - Sr. Scientist, Maxim

2. DSM group / Library, Maxim EDA

3. Bill Sicaras - Synopsys AC

(28)

Appendix A

PT-SI and Spice correlation

Spice level simulation performed on the worst path

Startpoint: clk_div_0/div_by2_by4_0/sig_i4_reg

(rising edge-triggered flip-flop clocked by dac_clk1)

Endpoint: clk_div_0/div_by2_by4_0/sig_i4_reg

(rising edge-triggered flip-flop clocked by dac_clk1)

Path Group: dac_clk1

Path Type: max

(launch clock delay + CK-Q delay + combinational delay to the endpoint register) is within 5% for Block B

(29)

Appendix B (scripts)

Script used for placement

## Source the common settings for placement and optimization
source common_placement_settings_icc.tcl
set placer_max_cell_density_threshold 0.68

## 15% of the clock period which is 200ps is 30ps
## 30ps plus 10ps uncertainty is 40ps overconstraining
set_timing_derate -late 1.15
set_clock_uncertainty 0.01 [all_clocks]
set_critical_range 0.090 cd18_decoder_dac

## INITIAL PLACEMENT
create_placement -effort high -congestion -congestion_effort high
legalize_placement

## FIRST ROUND OF optimizations
set_dont_touch [get_cells *]
set_dont_touch [get_nets *]
psynopt

## tighten the output paths
set_clock_uncertainty 0.015 [all_clocks]
set_clock_latency 0.200 [get_clocks dac_clk]
set_clock_latency 0.100 [get_clocks dac_clko]
psynopt

## SECOND ROUND OF optimization
## Remove the dont touches and let the tool optimize the
## timing more. (upsize cells etc.)
remove_attribute [get_cells -hier *] dont_touch -quiet
remove_attribute [get_nets -hier *] dont_touch -quiet

## do not optimize some sensitive logic
set_dont_touch [get_cells U*]
psynopt
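The over-constraint arithmetic in the script comments can be checked with a short sketch. It reproduces the 15% late derate on the 200 ps period plus the 10 ps clock uncertainty. Treating the derate as a flat fraction of the full period follows the script's own comment and is a simplification of how derating actually applies per path.

```python
def overconstraint_ps(period_ps, derate_fraction, uncertainty_ps):
    """Total margin (ps) added by a late derate plus clock uncertainty."""
    return period_ps * derate_fraction + uncertainty_ps


if __name__ == "__main__":
    # 1.15 late derate -> 0.15 extra; set_clock_uncertainty 0.01 ns -> 10 ps
    total = overconstraint_ps(period_ps=200.0, derate_fraction=0.15, uncertainty_ps=10.0)
    print(f"Placement over-constraint: {total:.0f} ps")
```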

(30)

Appendix B

Script used for CTS

## DON'T MOVE CAREFULLY PLACED CELLS
set_dont_touch_placement [get_cells -hier *_reg*]
set_attribute [get_cells -hier spr*] is_fixed true

remove_clock_tree -clock_trees {dac_clk dac_clko} -honor_dont_touch
reset_clock_tree_references

define_routing_rule decoder_clk_shield_rule -default_reference_rule -taper_level 0 \
    -multiplier_width 2 -multiplier_spacing 1 -shield

## CONTROL TRANSITION FOR CLOCKS
## RELAX BUFFER LEVELS TO help fix fanout
set_clock_tree_options -layer_list $runOption(input,clkRoutelayerList) \
    -routing_rule decoder_clk_shield_rule -use_default_routing_for_sinks 1 \
    -target_skew 0.010 -max_buffer_levels 9 -max_transition .024

set_clock_tree_options -clock_trees dac_clk -routing_rule decoder_clk_shield_rule \
    -use_default_routing_for_sinks 1 -target_skew 0.010 -max_buffer_levels 9

set_max_fanout 2 [get_ports dac_clk]
set_max_fanout 2 [get_ports dac_clko]

## Tighter transition on output clk. timing is ok.
set_clock_tree_options -clock_trees dac_clko -max_buffer_levels 3 -max_transition 0.022

check_clock_tree -clocks dac_clk
report_clock_tree -summary -clock_trees dac_clk -level_info
report_clock_tree -show_all_sinks
report_clock_tree -settings > clktree/settings.rpt
update_clock_latency

## Turn on removal and recovery check ##
set enable_recovery_removal_arcs true

## Perform clock tree synthesis only
