Via Circuit Retiming¹
Brian Lockyear and Carl Ebeling
Department of Computer Science and Engineering
University of Washington
Seattle, Washington 98195
Technical Report 93-5-04
May, 1992
¹ This research was funded in part by the Defense Advanced Research Projects Agency under Contract N00014-J-91-4041. Carl Ebeling is supported in part by an NSF Presidential Young Investigator Award with matching funds provided by IBM.
Clock skew is often cited as an impediment to designing high-performance synchronous circuits. Clock skew reduces the performance of a circuit by reducing the time available for computation, and it may even cause circuit failure by allowing race conditions. In this paper, we show how to use retiming to reduce the effect of clock skew in both edge-clocked and level-clocked circuits. We include both fixed clock skew, for example skew which results from clock distribution wiring and buffering, and variable clock skew, which results from uncontrolled variation in process parameters and operating conditions. By including fixed skew in the maximum delay constraint equations, retiming can find the fastest circuit given that skew. By including variable skew, retiming can also generate the circuit that tolerates the most variation in clock skew for a given clock frequency. Clock skew can also be used to speed up circuits using a technique similar to retiming described by Fishburn [2]. We describe a method for combining this technique, which uses added clock skew, with retiming to optimize the performance of edge-clocked circuits.
1 Introduction
Retiming is a circuit optimization technique, well known to hardware designers, which is used to reduce the clock period of a circuit by relocating registers to maximize the amount of computation done in each clock period. For example, in order to pipeline a circuit, a designer places the registers to spread computation out evenly over several cycles, thereby minimizing the cycle period. While retiming has long been an ad hoc technique used in design, it was not until 1983 that an efficient algorithm was reported for automatically retiming circuits [4]. In this work, Leiserson, Rose and Saxe formalized retiming as an optimization technique that improves the performance of a circuit without changing its external behavior. This early work was limited to circuits using edge-triggered registers. However, high-performance systems make wide use of level-sensitive latches to allow computations to borrow time across clock cycle boundaries. Only recently have efficient algorithms for retiming been described for these level-clocked circuits [7, 3].
In this paper we extend retiming to handle the problem of clock skew in both edge-clocked and level-clocked circuits. Clock skew slows down circuits by reducing the time available for computation, and in some cases it may cause race conditions. As circuits become faster, clock skew will become even more of a problem for synchronous circuit design. Clock skew can be divided into two components: fixed skew, which is built into the delivery of the clock, and variable skew, which is caused by variations in process parameters, temperature, power supply voltage and other operating conditions. Fixed skew is the skew that the designer controls, or at least can measure, in the clock distribution tree, while variable skew is the unpredictable variation in delay that can occur on the clock signal. Including clock skew in the retiming constraints allows retiming to find faster circuits and also to increase tolerance to clock skew variation.
We begin in Section 2 by reviewing the circuit and clock model used by the retiming algorithms. We then describe how clock skew is added to this model, along with synchronizer² delay and setup time.
The remainder of the paper describes four related results. Section 4 first describes the relatively straightforward extension to retiming that adds clock skew to the edge-triggered circuit model and retiming algorithms developed in [4]. Section 5 then extends our level-clocked retiming algorithms [7] to incorporate clock skew as well. In neither case does the inclusion of clock skew increase the asymptotic time complexity of the algorithms. These two sections show that fixed clock skew can be treated as modifying the delay of circuit elements. Thus retiming can generate the fastest circuit given a fixed skew, which can be faster or slower than the fastest possible circuit with zero clock skew. Retiming also ensures that the circuit generated will operate correctly in the presence of variable skew, which always slows down the circuit.
Section 6 describes how level-clocked circuits can be retimed to maximize the tolerance to variation in clock skew. Although minimizing the clock period is equivalent to maximizing clock skew tolerance in edge-clocked circuits, this is not always true for level-clocked circuits.
Finally, Section 7 shows how fixed clock skew can be used to actually increase the performance of an edge-clocked circuit. In a technique closely related to circuit retiming presented by Fishburn in [2], … computational delay between selected registers at the expense of reduced delay between others. For a fixed register placement in an edge-triggered circuit, Fishburn presents a Linear Program that solves for the skews required to produce the minimum possible clock period. We present here our preliminary work on combining retiming with intentional clock skew to produce faster circuits than can be produced with either technique used alone.
2 Background
We begin this paper with a review of the circuit and clock models presented in [4, 7] for edge-triggered and level-clocked circuits. The reader is encouraged to read these earlier papers for full details.
2.1 Circuit Graph Model
A circuit is represented as a graph with a vertex, $v$, for each functional element and an edge, $u \xrightarrow{e} v$, for each interconnecting wire. Each vertex has a delay, $d(v)$, the maximum delay of the corresponding functional element. A unique host vertex $v_h$, with $d(v_h) = 0$, represents the environment external to the circuit. Registers and latches are placed on the edges connecting vertices, and each edge has a weight, $w(e)$, indicating the number of registers or latches on the connection.
A path $u \overset{p}{\rightsquigarrow} v$ is a sequence of vertices and edges from $u$ to $v$. A simple path contains no vertex twice. The weight $w(p)$ of a path $p = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{n-1}} v_n$ is the number of registers or latches placed along it, that is, the sum of the edge weights: $w(p) = \sum_{i=0}^{n-1} w(e_i)$. We say that registers or latches are adjacent if the path connecting them has zero weight. The weight of a cycle is the weight of the same sequence of edges and vertices treated as a path. Similarly, the delay of a path $d(p)$ is the sum of the delays of the vertices along the path: $d(p) = \sum_{i=0}^{n} d(v_i)$. The delay of a cycle $d(c)$, $c = v_0 \xrightarrow{e_0} v_1 \xrightarrow{e_1} \cdots v_{n-1} \xrightarrow{e_{n-1}} v_0$, includes the delay of vertex $v_0$ only once; hence $d(c) = \sum_{i=0}^{n-1} d(v_i)$.
A circuit $G$ is transformed into a corresponding retimed circuit $G_r$ through assignment of a retiming (or lag) value $r(v)$ to each of the vertices in $G$. This retiming value represents the number of registers (latches) removed from the output edges of vertex $v$ and added to its input edges. The resulting weight of an edge $u \xrightarrow{e} v$ in the retimed graph is: $w_r(e) = w(e) + r(v) - r(u)$.
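To make these definitions concrete, here is a minimal Python sketch of the graph model. It is our own illustration, not code from [4] or [7]; the `Edge` record, the function names and the example delays are assumptions chosen for the example. It computes $w(p)$, $d(p)$, $d(c)$ and the retimed weights $w_r(e)$ exactly as defined above, and rejects a retiming that would produce a negative edge weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    u: str  # source vertex
    v: str  # destination vertex
    w: int  # number of registers or latches on this wire


def path_weight(path):
    """w(p): sum of the edge weights along the path."""
    return sum(e.w for e in path)


def path_delay(path, d):
    """d(p): sum of the delays of every vertex v_0 .. v_n on the path."""
    vertices = [path[0].u] + [e.v for e in path]
    return sum(d[x] for x in vertices)


def cycle_delay(cycle, d):
    """d(c): like d(p), but the start vertex v_0 (== v_n) is counted once."""
    return sum(d[e.u] for e in cycle)


def retime(edges, r):
    """Apply w_r(e) = w(e) + r(v) - r(u); reject retimings with negative weights."""
    out = [Edge(e.u, e.v, e.w + r[e.v] - r[e.u]) for e in edges]
    if any(e.w < 0 for e in out):
        raise ValueError("retiming produces a negative-weight edge")
    return out


if __name__ == "__main__":
    d = {"h": 0, "a": 3, "b": 7}
    ring = [Edge("h", "a", 1), Edge("a", "b", 0), Edge("b", "h", 0)]
    print(path_weight(ring[:2]), path_delay(ring[:2], d), cycle_delay(ring, d))
    # Move the register from the input of a to its output: r(a) = -1.
    print([e.w for e in retime(ring, {"h": 0, "a": -1, "b": 0})])
```

Running it prints `1 10 10` and then `[0, 1, 0]`: moving the register across vertex a changes individual edge weights but leaves the weight of the cycle unchanged, as any retiming must.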
2.2 Clock Model
For our retiming work we have adopted the clock model of Sakallah, Mudge & Olukotun [8], which provides a convenient way to describe the resulting timing constraints. A $k$-phase clock is a set of $k$ periodic signals, $\Phi = \{\phi_1, \ldots, \phi_k\}$, where $\phi_i$ is phase $i$ of the clock $\Phi$. All $\phi_i$ have a common cycle time $T_\Phi$. An edge-triggered circuit has a single clock phase, and values are passed through the registers once per cycle on the "clocking" edge. For level-clocked circuits, each phase divides the clock cycle into two intervals, as shown in Figure 1: an active interval of duration $T_i$ and a passive interval of duration $(T_\Phi - T_i)$. The latches controlled by a clock phase are enabled during its active interval and disabled during its passive interval. The clock transitions into and out of the active interval are called the enabling and latching edges respectively. We refer to the clock phase controlling latch $l$ as $P(l)$.
Relative to the beginning of its passive interval at time $t = 0$, the enabling edge of a phase occurs at $t = T_\Phi - T_i$, and its latching edge at $T_\Phi$ (Figure 1). Sakallah et al. additionally introduce an arbitrary global time reference and values $e_i$ denoting the time, relative to the global time reference, at which phase $\phi_i$ ends for some specified cycle. Phases are ordered relative to the global time reference so that $e_1 \le e_2 \le \cdots \le e_{k-1} \le e_k$ and $e_k \le T_\Phi$. The phase following $\phi_i$ in the clock set is referred to as $\phi_{i+1}$, with $\phi_{k+1} \equiv \phi_1$ and $\phi_{1-1} \equiv \phi_k$.
Figure 1: Diagram from Sakallah et al. showing a clock phase $\phi_i$ and its local time zone: a passive interval from $0$ to $(T_\Phi - T_i)$ followed by an active interval ending at $T_\Phi$.
Figure 2: The phase shift operator $E_{i,j}$ provides the relative difference between times in the local time zones of different phases $\phi_i$ and $\phi_j$.
Finally, a phase shift operator, $E_{i,j}$, is defined as:
$$E_{i,j} \equiv \begin{cases} e_j - e_i, & \text{for } i < j \\ T_\Phi + e_j - e_i, & \text{for } i \ge j \end{cases}$$
$E_{i,j}$ takes on positive values in the range $(0, T_\Phi]$. When subtracted from a time point in the current period of $\phi_i$, it changes the frame of reference to the next period of $\phi_j$, taking into account a possible cycle boundary crossing (Figure 2). Because the period of each phase is identical and $e_i \ge e_{i-1}$, the sum of the shifts between $k$ successive phases is $T_\Phi$:
$$\sum_{i=1}^{k} E_{i,i+1} = T_\Phi. \qquad (1)$$
A symmetric level-clocked schedule is one in which all active phase periods $T_i$ are equal and all phase shifts $E_{i,i+1} = T_\Phi / k$.
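The phase shift operator and Eqn. (1) transcribe directly into code. The Python sketch below is only an illustration: representing the phase end times $e_i$ as a dictionary keyed by phase number, and the helper names `phase_shift`, `next_phase` and `is_symmetric`, are our assumptions.

```python
def phase_shift(e, T, i, j):
    """E_{i,j}: shift between the local time zones of phases i and j.

    e maps each phase index 1..k to its end time e_i (global time reference).
    """
    return e[j] - e[i] if i < j else T + e[j] - e[i]


def next_phase(i, k):
    """Index of phi_{i+1}, with phi_{k+1} wrapping around to phi_1."""
    return i % k + 1


def is_symmetric(e, T, T_active):
    """True if all active intervals are equal and every E_{i,i+1} equals T/k."""
    k = len(e)
    equal_active = len(set(T_active.values())) == 1
    equal_shifts = all(
        abs(phase_shift(e, T, i, next_phase(i, k)) - T / k) < 1e-9 for i in e
    )
    return equal_active and equal_shifts


if __name__ == "__main__":
    T = 12
    e = {1: 3, 2: 9, 3: 10}  # an asymmetric 3-phase schedule
    shifts = [phase_shift(e, T, i, next_phase(i, len(e))) for i in e]
    print(shifts, sum(shifts))  # [6, 1, 5] 12, as required by Eqn. (1)
```

Even for an asymmetric schedule the individual shifts differ but still sum to $T_\Phi$; a shift from a phase to itself, $E_{i,i}$, is a full period.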
The latest arrival time of a signal at latch $l$ is denoted by $A_l$ and the latest departure time by $D_l$, both in terms of the local time zone.

Note that this clock model does not provide for clock phases with differing periods nor for gated clock signals. For simplicity, we assume that the delay characteristics of synchronizers do not vary as they are moved across combinational logic.
2.3 Correct Operation, Valid Schedules and Well-formed Circuits
We define a circuit to be correctly timed if, for any pair of adjacent synchronizers, the signal leaving the first arrives at the second during the next clock period in edge-triggered circuits, or during the next clock phase in level-clocked ones. Thus the definition of correctness for a level-clocked circuit is a straightforward extension of the definition of correct operation commonly used for edge-clocked circuits. Timing correctness can be summarized as a pair of timing constraints for each case (illustrated for level-clocked circuits in Figure 3).
For any two adjacent registers $r$ and $s$ in an edge-clocked circuit:
E1. Maximum Delay: $A_s = d(p) \le T_\Phi$
E2. Non-interference: $A_s = d(p) > t_{hold}$

Figure 3: Graphical representation of the constraints on the clock phases $P(l)$ and $P(m)$ that are required for correct operation of a well-formed level-clocked circuit. Latches $l$ and $m$ are any pair connected by a zero-weight path beginning from $l$.

For any two adjacent latches $l$ and $m$ in a level-clocked circuit:
L1. Maximum Delay: $A_m = D_l + d(p) - E_{P(l),P(m)} \le T_\Phi$
L2. Non-interference: $A_m = D_l + d(p) - E_{P(l),P(m)} > t_{hold}$

These constraints assume that clock skew, synchronizer propagation delay and setup time are all zero. These parameters will be added later in this paper.
The retiming techniques in [7] restrict the type of clocks and circuits that can be retimed. Clock schedules must be "valid" so that only maximum delay constraints need to be satisfied for correct operation. Valid clock schedules do not allow races to occur even if all circuit delays are zero. This is guaranteed if, for any two latches $l$ and $m$ connected by a zero-weight path $l \rightsquigarrow m$, $E_{P(m),P(l)} > T_l$. In general, the class of valid clocks includes schedules with phase overlap, underlap, or both; however, two-phase clocks are required to be non-overlapping and no single-phase schedules are allowed.
In addition, level-clocked circuits must be "well-formed," that is, latches must occur in phase order along every path.³ Retiming is always free to move latches across vertices in well-formed circuit graphs, and well-formed circuits remain well-formed following retiming. Furthermore, the minimum number of registers required on a path can be computed solely from the path delay and the phase of the first latch.

Since this work extends our previous work, these restrictions on clock schedules and level-clocked circuits also apply to this paper. Although the designer is constrained to work within this model, it applies to most circuits commonly used in practice.
3 Clock Skew Parameters
We use two parameters to model the fixed clock skew: $\sigma_r(l)$ and $\sigma_f(l)$, the fixed skew of the rising and falling edge, respectively, of the clock at latch $l$.⁴
In edge-triggered circuits only one clock edge is of interest, so $\sigma(r)$ is used to represent the skew of that edge at register $r$. If skew is positive the clock arrives late relative to the reference clock, and if negative it arrives early. For an edge $e$, $\sigma_r(e)$ and $\sigma_f(e)$ are the fixed skew of a clock at the physical circuit location of the wire represented by the edge. Adding to the clock skew effectively moves an equivalent amount of delay from the fanins of the synchronizer to the fanouts.
Variable skew occurs due to variations in circuit fabrication and operating conditions. The maximum amount by which these variations may shift clock edges in either direction away from the fixed skew is modeled by the parameter $\delta$. Thus a rising edge may arrive as late as $\sigma_r(l) + \delta$ and as early as $\sigma_r(l) - \delta$. In this work we assume that variable skew is the same across the circuit, although our techniques can easily be extended to make the variable skew depend on physical location.
In order for us to be able to use the efficient algorithms we have already developed for retiming circuits, we must make the following restriction on clock skew. The relative clock skew between the input and output edges of a vertex, that is, the difference in the clock skew between the synchronizers on these edges, must … it ensures that we need not consider the case where increasing the length of a path reduces the number of synchronizers required. It is easy to see that this constraint is met by practical circuits. Note that this does not impose a minimum on the combinational delay. If the actual delay is less than the skew, a more conservative circuit than necessary will be generated by the algorithms. Formally, the requirement is stated as:

For every vertex $u$ and pair of edges $e \in \mathrm{fanin}(u)$ and $e' \in \mathrm{fanout}(u)$:
SR: $\sigma_r(e') - \sigma_r(e) \le d(u)$    SF: $\sigma_f(e') - \sigma_f(e) \le d(u)$

³ A simple exception to this rule allows input and output signals at the host node to occur on different phases as long as timing constraints across the host are not imposed.
⁴ Properly, these parameters should refer to the enabling and latching edges rather than rising and falling; however, the letters e and l are commonly used elsewhere in the text to refer to edges and latches. We assume that the enabling edge is the rising …
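The monotonicity requirement can be checked vertex by vertex before any retiming is attempted. The Python sketch below is our illustration: it assumes skews are stored per edge id, that fanin and fanout lists are available for each vertex, and that it is called once with the rising-edge skews (SR) and once with the falling-edge skews (SF). The function name `skew_monotone` is hypothetical.

```python
def skew_monotone(d, fanin, fanout, sigma):
    """Check SR (or SF): sigma(e') - sigma(e) <= d(u) for all e in fanin(u), e' in fanout(u).

    d:      vertex -> delay
    fanin:  vertex -> list of edge ids entering it
    fanout: vertex -> list of edge ids leaving it
    sigma:  edge id -> fixed skew of the rising (SR) or falling (SF) clock edge
    Returns the list of vertices that violate the condition.
    """
    bad = []
    for u, delay in d.items():
        ins, outs = fanin.get(u, []), fanout.get(u, [])
        if not ins or not outs:
            continue  # no (e, e') pair exists at this vertex
        # The worst pair is the largest fanout skew against the smallest fanin skew.
        if max(sigma[x] for x in outs) - min(sigma[x] for x in ins) > delay:
            bad.append(u)
    return bad
```

For example, a zero-delay vertex b whose output wire is clocked 0.5 units later than its input wire violates the condition, while a delay-2 vertex tolerates a skew increase of up to 2 across it.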
In addition to parameters for clock skew, it is straightforward to add register or latch propagation delay, $P$, and required setup time, $S$, into our model. We make the simplifying assumption that $P$ and $S$ are the same for all synchronizers in the circuit. Algorithms for asymmetric level-clocked circuits in [7, 3] could be extended to solve problems in which propagation delays and setup requirements vary with physical placement of synchronizers, with the exception of propagation delay in level-clocked circuits, a limitation discussed in Section 5.
4 Clock Skew in Edge-Triggered Circuits
This section incorporates the parameters of clock skew, register propagation delay and setup time into the edge-triggered timing algorithms developed in [4]. While this is a relatively simple extension of the previous work, it provides the necessary first step towards algorithms which combine retiming and intentional clock skew (Section 7) and a basis for understanding the effects of skew on level-clocked circuits.
Register propagation delay and setup time directly reduce the maximum delay allowed on a zero-weight path. Propagation delay causes the signal at the beginning of the path to be delayed from the clocking edge by $P$, while setup time requires that signals arrive $S$ time prior to the next clocking edge. Thus the time available for computation is reduced by both $P$ and $S$.
Clock skew increases the maximum delay allowed on a path between two registers if the clocking edges occur farther apart, and decreases it if they occur closer together. The latest possible arrival of a clocking edge at register $r$ is $\sigma(r) + \delta$ and the earliest at $s$ is $\sigma(s) - \delta$. Hence the maximum delay allowed from $r$ to $s$ is increased by $\sigma(s) - \sigma(r) - 2\delta$, and the maximum delay constraint for each pair of adjacent registers $r$ and $s$ in an edge-clocked circuit becomes:
E1'. Maximum Delay: $A_s = P + d(p) \le T_\Phi - S + \sigma(s) - \sigma(r) - 2\delta$

Clock skew can also cause edge-triggered circuits to fail by causing race conditions due to short paths. The delay of the path is increased by the register propagation delay $P$, but the relative skew between the two clocks effectively increases the hold time required at register $s$. Thus the minimum delay constraint becomes:

E2'. Non-interference: $A_s = P + d(p) > t_{hold} + \sigma(s) - \sigma(r) + 2\delta$
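As a sketch of how E1' and E2' are evaluated for a single register pair, the Python function below (a hypothetical helper, not part of [4]) takes the longest and shortest combinational delays separately, since E1' concerns the maximum delay and E2' the minimum. Parameter names mirror the symbols in the text.

```python
def check_register_pair(d_max, d_min, T, sigma_r, sigma_s,
                        P=0.0, S=0.0, delta=0.0, t_hold=0.0):
    """Evaluate E1' and E2' for adjacent registers r -> s.

    d_max / d_min: longest / shortest combinational delay between r and s.
    sigma_r / sigma_s: fixed clock skew at r and s (positive = late clock).
    delta: variable skew bound; P: propagation delay; S: setup time.
    Returns (E1' satisfied, E2' satisfied).
    """
    # E1': latest launch (sigma_r + delta) must meet the earliest capture (sigma_s - delta).
    max_ok = P + d_max <= T - S + sigma_s - sigma_r - 2 * delta
    # E2': earliest launch must not overrun the hold window of the latest capture.
    min_ok = P + d_min > t_hold + sigma_s - sigma_r + 2 * delta
    return max_ok, min_ok
```

With $T_\Phi = 10$, $P = S = t_{hold} = 1$, a longest delay of 8 and a shortest delay of 1, the pair passes both checks with zero skew; an early capture clock ($\sigma(s) = -1$) breaks E1', while an early launch clock ($\sigma(r) = -1$) relaxes E1' but breaks E2'.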
We can enforce the minimum delay constraint with a static check that ensures that E2' holds independently for each vertex in the circuit. This can be done because the fixed skew is associated with edges, not with registers, which may be moved by the retiming. If this constraint holds for all vertices, then it holds for all longer paths. Thus E2' holds for all retimings of the circuit, and minimum delay retiming constraints are not required.
To retime a circuit to satisfy the delay constraint E1', the value of $r(u) - r(v)$ is constrained so that there is at least one register on each path $u \overset{p}{\rightsquigarrow} v$ whose delay exceeds that given by E1'. That is, for each path $u \overset{p}{\rightsquigarrow} v$ and pair of edges $e \in \mathrm{fanin}(u)$ and $e' \in \mathrm{fanout}(v)$, if:
$$d(p) > T_\Phi - P - S + (\sigma(e') - \sigma(e)) - 2\delta$$
… register. If registers are not actually placed on edges $e$ and $e'$, the skew of those edges will not affect the clock period; however, the clock skew monotonicity constraint rules out the possibility that placing registers farther apart will reduce skew more than the path delay is increased. Thus it is only possible to satisfy the circuit timing constraints by placing registers closer together, i.e., by requiring the weight of the path in the retimed circuit to be at least one.
Clearly a potentially exponential number of path constraints exist; however, because each constraint is of the form $r(u) - r(v) \le C(u \overset{p}{\rightsquigarrow} v)$, where $C(u \overset{p}{\rightsquigarrow} v)$ is the constant $w(p) - 1$, it is possible to identify critical paths which result in a minimum value of $C(u \overset{p}{\rightsquigarrow} v)$. Satisfaction of the smallest value of $C(u \overset{p}{\rightsquigarrow} v)$ implies that all other constraints on the same pair of variables are satisfied as well. Because propagation delay, setup and skew are unaffected by the selection of the path joining $u$ and $v$, a critical path $p$ remains one with maximum delay of all paths with minimum weight, as identified in [4]. $W(u,v)$ and $D(u,v)$ are defined as the weight and delay of critical paths. The critical fanin and fanout edges are those for which:
$$\sigma_{max}(u) = \max\{\sigma(e) : e \in \mathrm{fanin}(u)\} \qquad \sigma_{min}(v) = \min\{\sigma(e) : e \in \mathrm{fanout}(v)\}.$$
Thus the following theorem allows identification of a retiming $R$ which satisfies all circuit timing constraints in the presence of register propagation delay, setup time and clock skew.
Theorem 1: An edge-triggered circuit $G$ under retiming $R$ is correctly timed iff:
1) For all vertices $u$ and $v$ connected by a path in $G$ such that $D(u,v) > T_\Phi - P - S + (\sigma_{min}(v) - \sigma_{max}(u)) - 2\delta$: $r(u) - r(v) \le W(u,v) - 1$; and
2) For all edges $u \xrightarrow{e} v \in G$: $r(u) - r(v) \le w(e)$.
Proof Sketch: Satisfying the critical path constraints guarantees that the timing constraints in Eqn. 2 are satisfied. Conversely, if a critical path constraint is not satisfied, skew monotonicity guarantees that some timing constraint from Eqn. 2 is also unsatisfied. The edge constraints prevent negative-weight edges, which are physically meaningless.
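The critical-path values $W(u,v)$ and $D(u,v)$ used in Theorem 1, and the critical skews $\sigma_{max}(u)$ and $\sigma_{min}(v)$, can be computed before any retiming is attempted. The Python sketch below is our illustration, not code from [4]: it runs Floyd-Warshall over lexicographically compared pairs $(w(e), -d(u))$, the approach used in [4] for W and D. Beyond the text, it assumes there are no zero-weight cycles, at most one edge per vertex pair in the skew table, and zero skew on unlisted edges; the function names are hypothetical.

```python
import math


def critical_paths(vertices, edges, d):
    """All-pairs W(u, v) and D(u, v).

    W(u, v): fewest registers on any path u ~> v.
    D(u, v): largest delay among the paths that achieve W(u, v).
    edges: (u, v, w) triples.  Each edge u -> v is labelled (w(e), -d(u)) and
    labels are compared lexicographically, so minimising the label sum first
    minimises the register count and then maximises the delay.
    """
    INF = (math.inf, 0)
    best = {(x, y): INF for x in vertices for y in vertices}
    for x in vertices:
        best[(x, x)] = (0, 0)
    for u, v, w in edges:
        best[(u, v)] = min(best[(u, v)], (w, -d[u]))
    for m in vertices:  # Floyd-Warshall over lexicographic pairs
        for i in vertices:
            for j in vertices:
                a, b = best[(i, m)], best[(m, j)]
                if a[0] == math.inf or b[0] == math.inf:
                    continue
                cand = (a[0] + b[0], a[1] + b[1])
                if cand < best[(i, j)]:
                    best[(i, j)] = cand
    W, D = {}, {}
    for (i, j), (wt, neg) in best.items():
        if wt != math.inf:
            W[(i, j)] = wt
            D[(i, j)] = d[j] - neg  # add the delay of the final vertex
    return W, D


def critical_skews(edges, sigma):
    """sigma_max(u) over fanin(u) and sigma_min(v) over fanout(v).

    sigma is keyed by (u, v); unlisted edges have skew 0.
    """
    smax, smin = {}, {}
    for u, v, _w in edges:
        s = sigma.get((u, v), 0)
        smax[v] = max(smax.get(v, -math.inf), s)  # this edge lies in fanin(v)
        smin[u] = min(smin.get(u, math.inf), s)   # this edge lies in fanout(u)
    return smax, smin
```

On a small graph where a reaches c both directly (no register) and through b (one register), the sketch reports $W(a,c) = 0$ with the delay of the direct path, because register count takes priority over delay.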
The retiming constraints presented in Theorem 1 are in the form of an Integer Linear Program (ILP) and may be solved directly using the Bellman-Ford technique or translated to a form solvable using the Mixed-ILP or Feas algorithms presented in [5]. Because the clock period is determined by some timing constraint relationship expressed in Eqn. 2 that is exactly met by the circuit, the minimum possible clock period exists in the set:
$$T_\Phi^{opt} \in \{D(u,v) + P + S - (\sigma_{min}(v) - \sigma_{max}(u)) + 2\delta\} \quad \text{for } u, v \in V.$$
A binary search over this set identifies the optimum clock period. The minimum period and a retiming to it may be found in $O(|V|^2 \log |V|)$ time by using the simple Feas algorithm from [5] for constraint solution.
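The whole procedure can be sketched compactly. The Python code below is our illustration under stated assumptions: it builds the two families of Theorem 1 constraints for a trial period, checks them with plain Bellman-Ford (the simple technique mentioned above, rather than Feas or Mixed-ILP), and binary-searches the candidate set. $W$ and $D$ are taken as precomputed dictionaries, unlisted critical skews default to 0, and the function names and the optional `host` normalisation are hypothetical.

```python
def feasible_retiming(vertices, edges, W, D, T, smax, smin,
                      P=0, S=0, delta=0, host=None):
    """Solve the Theorem 1 difference constraints at clock period T.

    Each constraint r(u) - r(v) <= c becomes a graph edge v -> u of length c;
    Bellman-Ford from an implicit source (all distances start at 0) yields r,
    or None if a negative cycle makes the constraints infeasible.
    """
    cons = [(v, u, w) for u, v, w in edges]  # condition 2: w_r(e) >= 0
    for (u, v), Duv in D.items():
        limit = T - P - S + (smin.get(v, 0) - smax.get(u, 0)) - 2 * delta
        if Duv > limit:
            cons.append((v, u, W[(u, v)] - 1))  # condition 1: a register is needed
    r = {x: 0 for x in vertices}
    for _ in range(len(vertices)):
        changed = False
        for a, b, c in cons:
            if r[a] + c < r[b]:
                r[b] = r[a] + c
                changed = True
        if not changed:
            if host is not None:  # normalise so that r(host) = 0
                r = {x: val - r[host] for x, val in r.items()}
            return r
    return None


def min_period(vertices, edges, W, D, smax, smin, P=0, S=0, delta=0, host=None):
    """Binary search over the candidate periods; returns (T, r)."""
    cands = sorted({Duv + P + S - (smin.get(v, 0) - smax.get(u, 0)) + 2 * delta
                    for (u, v), Duv in D.items()})
    lo, hi, best = 0, len(cands) - 1, None
    while lo <= hi:
        mid = (lo + hi) // 2
        r = feasible_retiming(vertices, edges, W, D, cands[mid], smax, smin,
                              P, S, delta, host)
        if r is None:
            lo = mid + 1
        else:
            best, hi = (cands[mid], r), mid - 1
    return best
```

The search is valid because feasibility is monotone in $T_\Phi$: a larger period only removes condition-1 constraints. On a three-vertex ring h → a → b → h with delays 0, 5 and 5 and both registers initially on h → a, the sketch settles on a period of 5 with one register moved onto a → b.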
Figure 4 shows a correlator circuit in which two edges have early-arriving clocks modeled by a fixed skew of $\sigma = -1$. All other edges have a fixed skew of 0, and $\delta = 0$. Unlike the original circuit from [4], cross-host timing constraints are prevented by assuming that a pair of registers is hidden inside the host vertex; thus the circuit is acyclic. At the optimal clock period of $T_\Phi = 9$, the early arrival of the clock at $v_3 \to v_4$ prevents the placement of a register on that edge from satisfying the delay constraint on path $v_1 \rightsquigarrow v_3$. On the other hand, the early arrival at $v_2 \to v_3$ allows path $v_3 \rightsquigarrow v_5$ with delay $d(v_3 \rightsquigarrow v_5)$ …

Figure 4: The edge-triggered correlator from [4] with fixed skew in the clock signals delivered to edges $v_2 \to v_3$ and $v_3 \to v_4$ (both marked $\sigma = -1$). Vertex delays: $d(v_h) = 0$, $d(v_1) = \cdots = d(v_4) = 3$, $d(v_5) = d(v_6) = d(v_7) = 7$.
5 Level-Clocked Circuits and Synchronization Parameters
As in edge-triggered circuits, timing constraints in level-clocked designs are derived from the time span between clock edges, from the rising edge enabling the first latch on the path to the falling edge latching the final latch. Latch propagation delay and setup time again directly reduce the time available for computation along circuit paths, while clock skew changes the time span between the relevant edges (see Figure 5).
Figure 5: The propagation delay of each latch placed along the path (latches l1 through l5) must be counted against the maximum path delay. Only the setup time of the final latch placed along the path (l5) is counted against the maximum delay.
Unlike edge-triggered timing constraints, level-clocked timing constraints extend across paths with multiple latches. Signals are still required to arrive at the final latch of a path $S$ time prior to the arrival of the falling edge; however, the constraints must account for the propagation delay of all intermediate latches as well as the initial one (see Figure 5). Retiming constraints are stated in terms of the retiming values $r(u)$ and $r(v)$ of the vertices at the beginning and end of a path, and they can refer only to the weight of the path, not to the location of latches on the path. Thus it is not possible to vary the latch propagation delay based on latch location.
With skew, the departure time of a signal from a latch $l$ is given by:
$$D_l = \max\{A_l,\; T_\Phi - T_{P(l)} + \sigma_r(l) + \delta\} + P, \qquad (3)$$
and the arrival time at a subsequent latch $m$ connected by a zero-weight path is:
$$A_m = D_l + d(p) - E_{P(l),P(m)}. \qquad (4)$$
Correct timing requires that signals arrive at latch $m$ at least $S$ time before the earliest occurrence of its falling edge at $T_\Phi + \sigma_f(m) - \delta$ (see Figure 6). Thus the maximum delay constraint for each pair of adjacent latches $l$ and $m$ in a level-clocked circuit becomes:

L1'. Maximum Delay: $A_m = D_l + d(p) - E_{P(l),P(m)} \le T_\Phi - S + \sigma_f(m) - \delta$

Figure 6: Late clock arrival at latch $l$ and early clock arrival at latch $m$ reduce the maximum allowable delay of the path from $u$ to $v$ and may require a latch to be placed on the path. (Figure labels: reference and actual clocks at $l$ and $m$; latest time of the enabling edge; earliest time of the latching edge; latch propagation delay; required setup time; maximum path delay based on the reference clock; maximum actual path delay.)

L2'. Non-interference: $A_m = D_l + d(p) - E_{P(l),P(m)} > t_{hold} + \sigma_f(m) + \delta$
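Eqns. 3 and 4 and constraint L1' translate directly into code. In the Python sketch below all times are in the local time zone of the relevant phase, as in the text; the function names `departure`, `arrival` and `meets_setup` are hypothetical.

```python
def departure(A_l, T, T_phase, sigma_r, delta, P):
    """Eqn. 3: latest departure from latch l, in the local time zone of P(l).

    The latch opens at T - T_phase, delayed by its rising-edge skew sigma_r and
    by up to delta of variable skew; a signal arriving later than that passes
    straight through the already transparent latch.
    """
    return max(A_l, T - T_phase + sigma_r + delta) + P


def arrival(D_l, d_p, E_lm):
    """Eqn. 4: arrival at the next latch m, expressed in the time zone of P(m)."""
    return D_l + d_p - E_lm


def meets_setup(A_m, T, S, sigma_f, delta):
    """L1': arrive at least S before the earliest latching edge T + sigma_f(m) - delta."""
    return A_m <= T - S + sigma_f - delta
```

For a symmetric two-phase clock with $T_\Phi = 10$, $T_i = 4$ and $E = 5$, a signal arriving at $l$ at time 3 waits for the enabling edge and departs at 7 (with $P = 1$); after a path of delay 6 it reaches $m$ at 8, which meets setup ($S = 1$) unless the latching edge at $m$ arrives 2 units early.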
We first address the L2' constraint, which constrains the minimum delay in the circuit. By combining this constraint with the earliest possible departure time of a signal from latch $l$, which is $T_\Phi - T_{P(l)} + P + \sigma_r(l) - \delta$, we get a minimum delay constraint of:
$$d(p) > t_{hold} - P + \sigma_f(m) - \sigma_r(l) + 2\delta - T_\Phi + T_{P(l)} + E_{P(l),P(m)}$$
where the term $T_\Phi - T_{P(l)} - E_{P(l),P(m)}$ is the amount of underlap between the two clock phases. In our previous work we have ignored short paths by relying on valid clock schedules to avoid races even when the minimum delay is zero. If the latch hold time, propagation delay and clock skew are all zero, as we assumed in previous work, then this reduces to a constraint that the path delay be greater than the phase overlap (the amount of negative underlap). Allowing minimum delays to be zero then leads to the definition of a valid clock schedule, which for the case of 2-phase clocks forbids phase overlap. We can weaken the valid clock schedule constraint somewhat by following the same approach we took for edge-clocked circuits, ensuring via a static check that the minimum delay constraint will be satisfied regardless of retiming. We do this by checking that the minimum delay constraint L2' holds for each vertex independently. This gives retiming the freedom to place latches on any edge without violating the minimum delay constraint. Note that in the case of level-clocked circuits, the underlap value used in the static check must be the smallest over all pairs of successive phases, since the latches assigned to the edges may use any pair.
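The static check described above can be sketched as follows. This Python code is our illustration; the helper names and the per-vertex skew lists are assumptions. The underlap for each pair of successive phases is computed from the phase shift operator, and the smallest one is used because retiming may assign any pair of phases to the latches around a vertex.

```python
def successive_underlaps(e, T, T_active):
    """Underlap T - T_i - E_{i,i+1} for every pair of successive phases.

    e: phase -> end time e_i; T_active: phase -> active interval T_i.
    A negative value means the two phases overlap.
    """
    k = len(e)
    out = []
    for i in e:
        j = i % k + 1  # phase following phi_i, wrapping phi_{k+1} to phi_1
        E = e[j] - e[i] if i < j else T + e[j] - e[i]
        out.append(T - T_active[i] - E)
    return out


def vertex_hold_ok(d_u, fanin_sigma_r, fanout_sigma_f, t_hold, P, delta, underlaps):
    """Static L2' check for one vertex u, independent of where latches end up.

    Uses the worst skew pair (largest falling-edge skew on a fanout edge,
    smallest rising-edge skew on a fanin edge) and the smallest underlap.
    """
    worst_skew = max(fanout_sigma_f) - min(fanin_sigma_r)
    return d_u > t_hold - P + worst_skew + 2 * delta - min(underlaps)
```

With a non-overlapping two-phase clock ($T_\Phi = 10$, $T_i = 4$, both underlaps equal to 1) and $t_{hold} = P = 1$, even a zero-delay vertex passes; with overlapping phases ($T_i = 6$, underlaps of $-1$) the same vertex needs a delay greater than 1.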
5.1 Maximum Path Delay Constraints
By combining the earliest possible departure time of a signal from $l$, $T_\Phi - T_{P(l)} + \sigma_r(l) + \delta + P$, with the latest permissible arrival at each successive latch on a path in turn, we can compute the maximum allowable delay along a signal path of any weight. The next theorem uses the definition of correct circuit timing to identify the latching clock edge for each latch. Since latches occur in phase order along paths of well-formed circuits, the available computation time is computed by summing successive phase shifts. An important aspect of this result is that only the skew of the initial rising and final falling edges appears in the maximum path delay constraint. Although each internal latch along the path may be affected by skew, the skew is added to the time on one side of the latch and subtracted from the time on the other side, thus canceling out. Thus clock skew can vary with latch location since the skew appearing in the maximum delay constraints … the delay of any simple path $l_0 \overset{p}{\rightsquigarrow} l_{n+1}$ is bounded by:
$$d(p) \le T_{l_0} + \sum_{i=0}^{w(p)} \left(E_{P(l_i),P(l_{i+1})} - P\right) - \sigma_r(l_0) + \sigma_f(l_{n+1}) - 2\delta - S.$$
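Proof sketch. Through induction on the path weight, and substitution of $D_{l_0} = T_\Phi - T_{P(l_0)} + \sigma_r(l_0) + \delta + P$ for the earliest possible departure time of a signal from latch $l_0$ and $A_{l_{n+1}} = T_\Phi + \sigma_f(l_{n+1}) - \delta - S$ for the maximum permissible arrival time at latch $l_{n+1}$, Eqn. 4 becomes:
$$d(p) \le T_{l_0} - \sigma_r(l_0) - \delta + \sigma_f(l_0) - \delta - S + \sum_{i=0}^{w(p)} \left(E_{P(l_i),P(l_{i+1})} + \left(\sigma_f(l_{i+1}) + \delta - \sigma_f(l_i) - \delta\right)\right) - \sum_{i=0}^{w(p)} P.$$
The skew terms in the sum telescope to $\sigma_f(l_{n+1}) - \sigma_f(l_0)$, so this equation simplifies to the desired result.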
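The bound can be checked numerically against latch-by-latch propagation with Eqns. 3 and 4. The Python sketch below is our illustration (hypothetical helper names): `theorem2_bound` evaluates the closed form, and `simulate` pushes a signal through the latches. The two agree when every intermediate latch is already transparent when the signal reaches it. Note that the bound must hold for every sub-path too: a path whose total delay meets the bound can still fail if an intermediate latch is reached before it opens, and this shows up as a violated bound on a shorter sub-path.

```python
def theorem2_bound(T_first, shifts, P, S, sigma_r_first, sigma_f_last, delta):
    """Maximum allowable delay d(p) of a path l_0 ~> l_{n+1} (Theorem 2).

    shifts holds E_{P(l_i),P(l_{i+1})} for i = 0 .. w(p), i.e. w(p) + 1 entries.
    """
    return (T_first + sum(E - P for E in shifts)
            - sigma_r_first + sigma_f_last - 2 * delta - S)


def simulate(segments, T, T_active, shifts, P, sigma_r, delta, A0=float("-inf")):
    """Propagate a signal latch by latch using Eqns. 3 and 4.

    segments[i] is the combinational delay from latch l_i to l_{i+1};
    T_active[i] and sigma_r[i] belong to latch l_i.  A0 is the arrival time at
    l_0 (default: before its enabling edge).  Returns the arrival time at the
    final latch, in that latch's local time zone.
    """
    A = A0
    for i, d_i in enumerate(segments):
        D = max(A, T - T_active[i] + sigma_r[i] + delta) + P  # Eqn. 3
        A = D + d_i - shifts[i]                               # Eqn. 4
    return A
```

With $T_\Phi = 10$, $T_i = 4$, $E = 5$, $P = S = 1$ and no skew, the bound for a path through one intermediate latch is 11. Splitting 11 units of delay as 5 and 6 arrives at 9, exactly $T_\Phi - S$; splitting it as 3 and 8 arrives at 10 and fails, and the one-segment bound of 7 on the second sub-path flags the problem.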
5.2 Critical Cycles
Because the latch at the start of the cycle is the same as the one at the end, the maximum delay allowed around a complete cycle in the circuit is unaffected by clock skew or setup time; however, it is reduced by the total latch propagation delay. This amount is fixed, since propagation delay is constant and the number of latches on a cycle is unchanged by retiming. The lower bound on possible clock periods caused by circuit cycles may again be found using the max-ratio-cycle algorithm.
Theorem 3: A multi-phase, level-clock graph using a valid clock schedule is correctly timed only if the delay of any cycle $l_0 \overset{c}{\rightsquigarrow} l_{n+1}$ is bounded by:
$$d(c) \le \sum_{i=0}^{w(c)-1} \left(E_{P(l_i),P(l_{i+1})} - P\right)$$
Proof omitted.
Corollary 4: A well-formed graph using a $k$-phase clock schedule is correctly timed if and only if:
$$\forall \text{ cycles } c \in G: \quad \frac{T_\Phi}{k} \ge \frac{d(c)}{w(c)} + P.$$
Proof sketch: In a well-formed graph each cycle must contain some multiple of $k$ latches; therefore, by Eqn. 1, the value $\sum_{i=0}^{w(c)-1} \left(E_{P(l_i),P(l_{i+1})} - P\right)$ reduces to $\frac{w(c)}{k} T_\Phi - w(c) P$. If cycle constraints are satisfied and the previous departure time of a signal was prior to the setup period (which must be guaranteed independently by path constraints), then the signal's arrival after traversing the cycle will be prior to the setup period as well. Thus the required setup time $S$ does not appear as a parameter in cycle constraints.
Using a maximum-ratio-cycle algorithm such as the one in [1], we solve for the maximum value of $d(c)/w(c)$ over all circuit cycles. This in turn identifies the critical cycle bound, or the minimum possible value of $T_\Phi$ to which the circuit can be retimed. To identify the minimum period possible through retiming, a search is performed at and above this bound for the minimum period at which a retiming satisfying path constraints …
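The Python sketch below computes this cycle bound. It does not reproduce the algorithm of [1]; it substitutes a simpler, slower parametric method: binary-search a trial ratio $\lambda$ and test, with Bellman-Ford, whether the edge costs $\lambda w(e) - d(u)$ contain a negative cycle (which happens exactly when some cycle has $d(c)/w(c) > \lambda$). Corollary 4 then gives the smallest period the cycles permit. Function names, the tolerance and the assumption that every cycle carries at least one latch are ours.

```python
def max_cycle_ratio(vertices, edges, d, tol=1e-9):
    """Largest d(c)/w(c) over all cycles, by binary search on the ratio.

    edges: (u, v, w) triples; the delay of u is charged to each edge leaving
    u, so the charges around a cycle add up to d(c).  Assumes w(c) >= 1.
    """
    def exceeded(lam):
        # Some cycle has d(c) - lam * w(c) > 0 iff the costs lam*w - d(u)
        # contain a negative cycle; Bellman-Ford from an implicit source.
        dist = {x: 0.0 for x in vertices}
        for _ in range(len(vertices)):
            changed = False
            for u, v, w in edges:
                cost = lam * w - d[u]
                if dist[u] + cost < dist[v] - 1e-12:
                    dist[v] = dist[u] + cost
                    changed = True
            if not changed:
                return False
        return True

    lo, hi = 0.0, float(sum(d.values()))  # w(c) >= 1 bounds the ratio by total delay
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if exceeded(mid):
            lo = mid
        else:
            hi = mid
    return hi


def min_period_from_cycles(vertices, edges, d, k, P):
    """Corollary 4: T_Phi >= k * (max d(c)/w(c) + P)."""
    return k * (max_cycle_ratio(vertices, edges, d) + P)
```

For a ring of total delay 10 carrying two latches, the ratio is 5, so a two-phase clock with $P = 1$ needs $T_\Phi \ge 12$. The result is a lower bound only; path constraints may force a longer period.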
As in edge-triggered retiming, identifying the critical paths in the circuit graph instead of enumerating the constraints for all paths is crucial to finding a polynomial time bound on the algorithm. Lemma 5 provides a characteristic of critical paths which allows them to be identified using an all-pairs-shortest-paths algorithm.
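Lemma 5: (modified from Lemma 5.5 [7]) A path $u \overset{p}{\rightsquigarrow} v$ in a well-formed circuit is a critical path iff:
$$w(p)\left(\frac{T_\Phi}{k} - P\right) - d(p) \le w(q)\left(\frac{T_\Phi}{k} - P\right) - d(q) \quad \text{for all } u \overset{q}{\rightsquigarrow} v.$$
Proof sketch. Clock skew and setup time reduce the slack equally along all paths between two vertices, and thus do not affect which path is critical.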
The weight and delay of critical paths are again defined as $W(u,v)$ and $D(u,v)$ respectively. The critical fanin and fanout edges are again those for which:
$$\sigma_{rmax}(u) = \max\{\sigma_r(e) : e \in \mathrm{fanin}(u)\} \qquad \sigma_{fmin}(v) = \min\{\sigma_f(e) : e \in \mathrm{fanout}(v)\}.$$
Substitution of these values into Theorem 2 and combining that result with Theorem 3 leads to Corollary 6, which defines all timing requirements that must be satisfied for correct timing of a symmetric-phase, level-clocked circuit:
Corollary 6: A well-formed, level-clocked circuit graph $G$, using a symmetric, $k$-phase clock schedule, is correctly timed if and only if the weights $W(u,v)$ of all critical paths are bounded by:
$$W(u,v) \ge \frac{D(u,v) - T + \sigma_{rmax}(u) - \sigma_{fmin}(v) + 2\delta + S}{\frac{T_\Phi}{k} - P} - 1,$$
where $T$ denotes the common active interval $T_i$ of the symmetric schedule, and, for all cycles $c$:
$$\frac{T_\Phi}{k} \ge \frac{d(c)}{w(c)} + P.$$
Proof omitted.
For simplicity, several parameters from Corollary 6 can be combined into a single value which represents the effective delay of a path:
$$\Delta(u,v) = D(u,v) + \sigma_{rmax}(u) - \sigma_{fmin}(v) + 2\delta + S. \qquad (5)$$
Using $\Delta$, the path constraints of Corollary 6 can be written as:
$$W(u,v) \ge \frac{\Delta(u,v) - T}{\frac{T_\Phi}{k} - P} - 1. \qquad (6)$$
The ceiling of this value is the minimum number of latches required along the critical path between vertices $u$ and $v$ and is referred to as $L(u,v)$. ILP constraints are now formed as
$$r(u) - r(v) \le W(u,v) - L(u,v).$$
Mixed-ILP constraints may also be formed and the more efficient solution technique ($O(|V|^2 \log |V|)$) for them used [3]. The formulation may also be extended for unequal phase schedules as shown in [7] and solved using extensions to the Bellman-Ford algorithm in $O(k \cdot |V|^3$ …
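Putting Lemma 5, the effective delay and the ILP constraints together gives the following Python sketch for the symmetric $k$-phase case. It is our illustration under stated assumptions, not the implementation of [3] or [7]. Critical paths are found by Floyd-Warshall on the slack graph of Lemma 5, and a negative diagonal entry signals a violated cycle constraint (Corollary 4). $L(u,v)$ is taken as the ceiling of the right-hand side of Corollary 6, and the constraints are solved with plain Bellman-Ford rather than the Mixed-ILP technique. The optional `eps` argument anticipates the tolerance of Section 6.1. All names are hypothetical.

```python
import math


def level_clocked_retiming(vertices, edges, d, T_phi, k, T_act,
                           P=0, S=0, delta=0, srmax=None, sfmin=None, eps=0):
    """Return a retiming r meeting Corollary 6 (symmetric k-phase clock), else None.

    edges: (u, v, w) triples.  T_act is the common active interval T of the
    schedule.  srmax/sfmin hold sigma_rmax(u) and sigma_fmin(v) (default 0).
    eps is an optional tolerance added to the effective path delay.
    """
    srmax, sfmin = srmax or {}, sfmin or {}
    a = T_phi / k - P  # time each latch adds to a path (Lemma 5)
    if a <= 0:
        return None
    INF = (math.inf, 0, 0)
    # best[(u, v)] = (slack without d(v), W, delay without d(v)) of a critical path
    best = {(x, y): INF for x in vertices for y in vertices}
    for x in vertices:
        best[(x, x)] = (0, 0, 0)
    for u, v, w in edges:
        best[(u, v)] = min(best[(u, v)], (a * w - d[u], w, d[u]))
    for m in vertices:  # Floyd-Warshall on the slack graph
        for i in vertices:
            for j in vertices:
                p, q = best[(i, m)], best[(m, j)]
                if p[0] == math.inf or q[0] == math.inf:
                    continue
                cand = (p[0] + q[0], p[1] + q[1], p[2] + q[2])
                if cand[0] < best[(i, j)][0]:
                    best[(i, j)] = cand
    if any(best[(x, x)][0] < 0 for x in vertices):
        return None  # negative-slack cycle: Corollary 4 is violated
    cons = [(v, u, w) for u, v, w in edges]  # r(u) - r(v) <= w(e)
    for (u, v), (s, W, dp) in best.items():
        if s == math.inf:
            continue
        # Effective delay: D(u,v) plus the skew, variable skew and setup terms.
        eff = dp + d[v] + srmax.get(u, 0) - sfmin.get(v, 0) + 2 * delta + S
        L = math.ceil((eff + eps - T_act) / a) - 1  # minimum latches on the path
        cons.append((v, u, W - L))  # r(u) - r(v) <= W(u,v) - L(u,v)
    r = {x: 0 for x in vertices}
    for _ in range(len(vertices)):  # Bellman-Ford on the difference constraints
        changed = False
        for x, y, c in cons:
            if r[x] + c < r[y]:
                r[y] = r[x] + c
                changed = True
        if not changed:
            return r
    return None
```

Ties between critical paths of equal slack do not affect the result: for a fixed slack, $W(u,v) - L(u,v)$ is the same for every path achieving it, so the tie-breaking inside Floyd-Warshall is harmless.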
Retiming is usually cast as a way to find the minimum clock period for a circuit, but often the problem is to meet some goal clock period rather than to find the minimum period. In this section, we show that we can use extra freedom in the clock period to make a circuit robust with respect to clock skew variations.

Tolerance to parameter variation is defined as the maximum amount by which the actual parameter values can vary without a resulting constraint violation. Operating a particular circuit at a shorter clock period naturally implies less tolerance to parameter variation. Determining a circuit's tolerance to variation in a particular parameter involves enumerating the path and cycle maximum delay constraints that depend on that parameter. The tolerance is the amount of "slack" in the most constraining of these constraints.
The advantages of identifying the most robust circuit retiming are clear: each of the parameters used in the circuit model, including critical path delay $D(u,v)$, clock skew values $\sigma_r(u)$ and $\sigma_f(v)$, latch setup $S$, and propagation delay $P$, may vary from the expected value due either to poor estimates or to variations in the implemented circuit. If a timing constraint is violated in the actual circuit because a delay exceeds the modeled values used in its retiming, the circuit will fail to operate correctly.
Retiming cannot increase the tolerance in parameters involved in cycle constraints in level-clocked circuits, because it cannot change the number of latches on a cycle. Thus the tolerance of a cycle constraint to parameter variation is simply $w(c)\left(\frac{T_\Phi}{k} - P\right) - d(c)$. Similarly, the edge-clocked circuit most tolerant of parameter error is found by retiming to the minimum cycle period and then operating at the slower goal clock period. However, retiming level-clocked circuits can increase the slack in path constraints by increasing the number of latches on the path. Moreover, as shown in Theorem 3 and Corollary 4, clock skew is not involved in cycle constraints but is included in path constraints. Thus, tolerance to variation in clock skew can be improved by retiming paths even if the tolerance to delay variation is limited by cycle delays.
6.1 An Algorithm to Generate Robust Circuits
To improve circuit tolerance to clock skew variations, we add a tolerance, $\epsilon$, to the effective path delay defined in Eqn. 5. Using $\epsilon$, the required path weight in Eqn. 6 is written:
$$W(u,v) \ge \frac{\Delta(u,v) + \epsilon - T}{\frac{T_\Phi}{k} - P} - 1.$$
An increase in $\epsilon$ increases the effective path delay, as would be caused by an increase in variable clock skew, an increase in the relative skew between rising and falling edges, or an increase in the setup time. For a given clock period, if a circuit is retimed with a value of $\epsilon$, then that circuit can tolerate variations in these parameters summing to $\epsilon$.
A circuit retimed with some value of $\epsilon$ can also tolerate increases in path and latch propagation delays; however, changing either of these delay parameters may also change which paths are critical. Thus a retiming using $\epsilon$ does not necessarily guarantee a circuit that can tolerate this much variation in element delays.
Tond a circuit retimingthat tolerates themost clo ck skew variation, we x T
8
to the desired clo ck
p erio dandsearchoverincreasingvaluesof forthemaximumoneatwhicharetimingcanb efound. Finding
themaximumvalueof isequivalenttondingtheminimumclo ckp erio dunlessacriticalcycleb oundlimits
theminimumclo ckp erio d. Furtherdecreases ofT
8
arepreventedb ecausenegativeweightcycles app earin
theslackgraphonwhichashortestpathalgorithmisusedtoidentifycriticalpaths. Thusitmayb ep ossible
toincrease furtherbutnottofurtherdecreaseT
8
. Inotherwords,retimingtothefastestclo ckp ossibleand
thenrunningwithaslowerclo ckdo esnotgivethemostrobust circuitwhenthecycleconstraintsdetermine
theoptimalclo ckp erio d.
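The search described above can be sketched as follows. This is not Eqn. 6. To stay self-contained, the sketch uses a simplified edge-triggered model in the style of Leiserson and Saxe [5]:
* $\delta$ is added to every path delay $D(u,v)$.
* A path then needs $\lceil (D(u,v)+\delta)/T_\phi \rceil - 1$ registers.
* Feasibility of the resulting difference constraints is tested with Bellman-Ford (a negative cycle means no retiming exists).

It also replaces the linear scan over increasing $\delta$ with bisection, which assumes feasibility is monotone in $\delta$. The input format and function names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, int]          # (u, v, w(e)): registers on edge u -> v
Path = Tuple[str, str, int, float]   # (u, v, W(u,v), D(u,v)); (v, v, 0, d(v)) for a vertex


def feasible(nodes: List[str], edges: List[Edge], paths: List[Path],
             period: float, delta: float) -> bool:
    """True if some retiming r meets every constraint at this period and delta."""
    # Difference constraints of the form r(u) - r(v) <= c.
    cons = [(u, v, w) for u, v, w in edges]           # retimed edge weights >= 0
    for u, v, W, D in paths:
        need = math.ceil((D + delta) / period) - 1    # registers the path now needs
        cons.append((u, v, W - need))
    # Bellman-Ford from an implicit source tied to every node with weight 0.
    dist: Dict[str, int] = {n: 0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for u, v, c in cons:
            if dist[v] + c < dist[u]:                 # edge v -> u with weight c
                dist[u] = dist[v] + c
                changed = True
        if not changed:
            return True                               # dist is a valid retiming
    return False                                      # still relaxing: negative cycle


def max_tolerance(nodes: List[str], edges: List[Edge], paths: List[Path],
                  period: float, hi: float, tol: float = 1e-6) -> Optional[float]:
    """Largest delta in [0, hi] that still admits a retiming (bisection)."""
    if not feasible(nodes, edges, paths, period, 0.0):
        return None                                   # period infeasible even at delta = 0
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(nodes, edges, paths, period, mid):
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical two-vertex ring a -> b -> a, one register per edge, delay 3 each.
nodes = ["a", "b"]
edges = [("a", "b", 1), ("b", "a", 1)]
paths = [("a", "a", 0, 3.0), ("b", "b", 0, 3.0), ("a", "b", 1, 6.0), ("b", "a", 1, 6.0)]
print(max_tolerance(nodes, edges, paths, period=8.0, hi=20.0))  # prints 5.0
```

In this toy ring, the single-vertex constraint $3 + \delta \le 8$ is what limits $\delta$. The path constraints still have slack at that point.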
Increased circuit tolerance is illustrated in Figure 7. Retiming with the clock schedule shown and increasing values
of $\delta$ transforms the initial circuit in (A) into the one in (B). Even though the circuit in (B) cannot run
faster, because some other constraint defines the clock period, it can tolerate a significantly greater amount
of clock skew than the one in (A). Because there are no changes in critical paths, the increased tolerance to
[Figure 7: panels (A) and (B), each marked "Result must arrive before here"; total delay error tolerance = 0.5 in (A) and 1.0 in (B).]
Figure 7: A simple example showing the tolerance gain from optimizing circuit paths. Each circuit operates
correctly under the given schedule; however, the circuit in (A) can only tolerate a total of 0.5 units of delay
estimation error in the two nodes, while (B) can tolerate 0.5 units of error in each of the nodes. (The clock
period shown here is determined by some other cycle in the circuit.)
7 Retiming with Forced Clock-Skew
This section presents preliminary results of an algorithm that combines retiming via register movement
with retiming via intentionally added clock skew. Retiming using clock skew allows the "virtual" movement
of registers on a finer grain than the usual retiming [2]. Using the two techniques together allows the
greatest flexibility in synchronizer placement and potentially faster circuit designs. Particularly interesting
is the resulting ability of edge-triggered circuits to achieve speeds comparable to level-clocked designs. We
describe here an algorithm for combined retiming of edge-triggered circuits; we are currently extending these
techniques to level-clocked circuits as well. For simplicity, we remove the synchronizer parameters from our
maximum delay constraint and return to the original requirement on edge-triggered circuits, which depends
only on the delay of functional units. That is, the delay between two adjacent registers must be less than the
clock period. This requirement may be satisfied either by moving registers closer together by retiming, or by
skewing the clock signals to the registers to lengthen the clock period for just that pair (see Figure 8). The
smallest amount by which retiming can increase the available time for a delay path is one full clock period,
by adding one register to the path. Clock skew, by contrast, can increase the available time by amounts up
to one clock period. Thus the two techniques are complementary, with small amounts of time being added
by skewing the clock signals and larger amounts added by moving registers.
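As a toy illustration of this division of labor, the time a path needs can be split into two parts:
* whole clock periods, supplied by moving registers;
* a remainder smaller than one period, supplied by delaying the capturing register's clock.

The helper below is hypothetical and not the paper's algorithm. It ignores how the added registers partition the path into segments.

```python
import math
from typing import Tuple


def registers_and_skew(path_delay: float, period: float) -> Tuple[int, float]:
    """Cover a path delay with whole extra registers plus a fractional skew.

    Returns (k, rho) with 0 <= rho < 1 and path_delay <= (k + 1 + rho) * period,
    using as few extra registers as possible.
    """
    extra = path_delay / period - 1.0   # periods needed beyond the first one
    if extra <= 0.0:
        return 0, 0.0                   # already fits within one clock period
    k = math.floor(extra)               # whole periods: move registers
    rho = extra - k                     # remainder: delay the capturing clock
    return k, rho


print(registers_and_skew(27.0, 10.0))   # roughly (1, 0.7)
```

A delay of 27 units with a period of 10 is covered by one extra register (two periods) plus a clock delay of about 0.7 of a period on the capturing register.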
[Figure 8 diagram: registers r and s around vertices u and v; either delay the clock so register s appears to be inside the vertex, or move register s across the vertex to its input edge.]
Figure 8: The delay between registers r and s may be reduced by moving s across the vertex physically with
retiming, or virtually by delaying its clock signal.
We assign a value $\rho(e)$ to each edge, where $T_\phi \cdot \rho(e)$ is the amount by which the clock signal to the
signal arrival. Furthermore, skewing a subsequent register unnecessarily shortens the time available to paths
following it. $\rho(e)$ is limited in value to the range $0 \le \rho(e) < 1$. Thus clock signals are delayed relative to the
base clock by an amount less than one clock period. In addition, we require that for each edge $e \in \mathrm{fanout}(v)$,
$\rho(e) = \rho(v)$, so that we can associate the skew value $\rho(v)$ with vertices and not edges. Skewing all the
registers following a vertex by the same amount has the effect of virtually moving the registers inside the
vertex. Note, however, that not all edges following a vertex are required to have a register.
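The per-vertex bookkeeping just described fits in a small data structure. In this hypothetical sketch (class and method names are ours):
* $\rho$ is stored once per vertex;
* every fanout edge inherits its source vertex's value;
* values outside $0 \le \rho < 1$ are rejected.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Edge = Tuple[str, str]  # (source vertex, destination vertex)


@dataclass
class VertexSkew:
    """Added clock skew stored per vertex; every fanout edge inherits it."""
    period: float
    rho: Dict[str, float] = field(default_factory=dict)

    def set(self, v: str, value: float) -> None:
        # Skew is a fraction of the clock period, restricted to 0 <= rho < 1.
        if not 0.0 <= value < 1.0:
            raise ValueError(f"rho({v}) = {value} is outside [0, 1)")
        self.rho[v] = value

    def edge_rho(self, e: Edge) -> float:
        # rho(e) = rho(v) for every e in fanout(v); unset vertices have no skew.
        return self.rho.get(e[0], 0.0)

    def clock_delay(self, e: Edge) -> float:
        # Absolute clock delay for a register on edge e, if the edge carries one.
        return self.edge_rho(e) * self.period


skew = VertexSkew(period=10.0)
skew.set("v1", 0.7)
print(skew.clock_delay(("v1", "v5")))
```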
The difference in added clock skew at two successive registers determines how much extra time is available
to the vertices between them. We first extend the maximum delay constraints to include intentional skew,
omitting here the other synchronization parameters:
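Theorem 7: An edge-triggered circuit G is correctly timed iff, for all paths $u \overset{p}{\rightsquigarrow} v$ such that $w(p) = 0$
and edges $e \in \mathrm{fanin}(u)$ and $e' \in \mathrm{fanout}(v)$, $d(p)$ is bounded by:
$$d(p) \le T_\phi\,\bigl(1 - \rho(e) - \rho(e')\bigr)$$
Proof omitted.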
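We now have circuit paths $u \overset{p}{\rightsquigarrow} v$ with retiming variables $r(u)$, $r(v)$ and skew $\rho(v) \cdot T_\phi$ assigned to each
edge in the fanout of $v$. For each path $p$ and edge $e \in \mathrm{fanin}(u)$, a constraint is written:
$$\bigl(r(u) - \rho(e)\bigr) - \bigl(r(v) + \rho(v)\bigr) \le W(u,v) - \frac{D(u,v)}{T_\phi} \qquad (7)$$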
Our algorithm uses an extension of the reversed Bellman-Ford (RevBF) relaxation technique [6] to solve
constraint sets of the form in Eqn. 7. The RevBF method satisfies difference-of-two-variable constraints, such
as $r(u) - r(v) \le C(u,v)$, by setting $r(v) = r(u) - C(u,v)$ (instead of setting $r(u) = r(v) + C(u,v)$
as in the standard Bellman-Ford approach). The RevBF algorithm was developed in our work on retiming
unequal-phase level-clocked circuits, which used constraint weights dependent on one of the two constrained
variables, because timing constraints dependent on $r(u)$ could be expressed with greater ease than those
depending on $r(v)$. For constraints of the form of Eqn. 7, we adjust register placement by assigning a new value to
$r(v)$ that satisfies the integral part of $\bigl(r(u) - \rho(e)\bigr) - \bigl(W(u,v) - D(u,v)/T_\phi\bigr)$, and then adjust the clock skew
by setting $\rho(v)$ to satisfy the real (fractional) part. Thus all edges in the fanout of each vertex have the same value of $\rho$,
while edges in its fanin may have differing values.
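Here is a minimal sketch of this relaxation. It assumes each constraint is supplied in the form of Eqn. 7 as a tuple (u, x, v, C), with these meanings:
* x is the source vertex of the fanin edge e of u, so $\rho(e) = \rho(x)$;
* $C = W(u,v) - D(u,v)/T_\phi$.

How the sketch works:
* Each relaxation raises $r(v) + \rho(v)$ to the smallest value that satisfies the constraint.
* The integer part of that value goes into $r(v)$ and the fractional part into $\rho(v)$.
* The loop stops when a full pass changes nothing.

The tuple encoding, the function name, and the cap of $|V| + 1$ passes are assumptions; the text only conjectures that $|V|$ applications per constraint suffice.

```python
import math
from typing import Dict, List, Optional, Tuple

# (u, x, v, C): (r(u) - rho(x)) - (r(v) + rho(v)) <= C, where x is the source of
# a fanin edge e of u (so rho(e) = rho(x)) and C = W(u, v) - D(u, v) / T_phi.
Constraint = Tuple[str, str, str, float]


def rev_bf(nodes: List[str], constraints: List[Constraint],
           eps: float = 1e-12) -> Optional[Tuple[Dict[str, int], Dict[str, float]]]:
    r = {n: 0 for n in nodes}        # register placement (integral part)
    rho = {n: 0.0 for n in nodes}    # added skew as a fraction of T_phi
    for _ in range(len(nodes) + 1):
        changed = False
        for u, x, v, c in constraints:
            need = (r[u] - rho[x]) - c          # smallest allowed r(v) + rho(v)
            if r[v] + rho[v] < need - eps:      # RevBF: adjust the tail variable v
                r[v] = math.floor(need)         # integral part -> move registers
                rho[v] = need - r[v]            # real part -> delay the clock
                if rho[v] >= 1.0:               # guard against rounding up to 1
                    r[v] += 1
                    rho[v] = 0.0
                changed = True
        if not changed:
            return r, rho
    return None                                 # no fixed point within the pass cap


# Hypothetical three-vertex example: h -> a -> b, with h feeding a.
result = rev_bf(["h", "a", "b"], [("a", "h", "b", -1.5), ("b", "a", "h", -0.3)])
print(result)
```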
To show that races cannot occur because of short paths, we must show that the E2′ minimum delay
constraint will not be violated following an assignment of intentional skew to circuit edges and any retiming
of registers. In particular, we show that the constraints on values of $\rho(e)$ ensure that E2′ is not violated.
[Figure 9 diagram: vertex v with register r on its fanin edge e and register s on its fanout edge e′.]
Figure 9: A vertex with surrounding latches.
If two registers r and s are placed across any vertex in the circuit as shown in Figure 9 (a placement
possible under retiming), constraint E2′ requires:
$$\text{E2}':\quad A_s = d(v) + P > t_{hold} + \sigma(e') - \sigma(e) + 2\epsilon. \qquad (8)$$
Intentional skew produces a circuit in which $\sigma(e) = \rho(e)\,T_\phi$ for each edge. Thus Eqn. 8 requires:
$$d(v) + P > t_{hold} + \rho(e')\,T_\phi - \rho(e)\,T_\phi + 2\epsilon.$$
In the worst case, $\rho(e) = 0$, resulting in a bound on $\rho(e')$ of
$$\rho(e') < \frac{d(v) + P - t_{hold} - 2\epsilon}{T_\phi},$$
in addition to the existing requirement that $\rho(e') < 1$. A value of $\rho(e')$ greater than $d(v)/T_\phi$ is never assigned
by our algorithm, because it exceeds the amount of time necessary to satisfy the vertex delay. That is, any
greater constraint would instead move the register across the vertex by setting $r(v) = 1$. Note that the worst
case for races occurs when the maximum value $\rho(e') = d(v)/T_\phi$ is assigned, and the constraint reduces to
$P > t_{hold} + 2\epsilon$, which is the familiar skew constraint for edge-clocked circuits. This also describes
the case where two or more registers appear on an edge, that is, when $d(v) = 0$ and $\rho(e') = 0$.
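The race argument can be turned into a quick numerical check. The sketch writes the variable clock-skew allowance as 2ε; that notation is an assumption, as are the function names and the sample numbers. It computes two things:
* the worst-case ($\rho(e) = 0$) strict upper bound on $\rho(e')$;
* whether the largest value the algorithm assigns, $d(v)/T_\phi$, stays below that bound.

The second check holds exactly when $P > t_{hold} + 2\epsilon$.

```python
def max_race_free_rho(d_v: float, P: float, t_hold: float, eps: float,
                      period: float) -> float:
    """Strict upper bound on rho(e') from E2' in the worst case rho(e) = 0.

    Combines rho(e') < (d(v) + P - t_hold - 2*eps) / T_phi with rho(e') < 1.
    """
    return min(1.0, (d_v + P - t_hold - 2.0 * eps) / period)


def largest_assignment_is_race_free(d_v: float, P: float, t_hold: float,
                                    eps: float, period: float) -> bool:
    """Check the largest skew the algorithm assigns, rho(e') = d(v) / T_phi."""
    return d_v / period < max_race_free_rho(d_v, P, t_hold, eps, period)


# Hypothetical numbers: vertex delay 3, T_phi = 10, t_hold = 0.2, eps = 0.3.
print(largest_assignment_is_race_free(3.0, 1.0, 0.2, 0.3, 10.0))  # True:  P > t_hold + 2*eps
print(largest_assignment_is_race_free(3.0, 0.7, 0.2, 0.3, 10.0))  # False: P < t_hold + 2*eps
```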
Figures 10 and 11 provide two correlator circuit examples of edge-triggered circuits with intentional clock
skew. Figure 10 includes timing constraints across the host vertex, resulting in cycles which limit the
minimum clock period to 10. Level-clocked circuits are able to achieve this bound [7], while edge-triggered
retimings without delayed clock signals have a minimum period of $T_\phi = 13$ [4]. By using delayed clock
signals, the circuit in Figure 10 has a minimum period of $T_\phi = 10$, equal to the cycle-imposed lower bound.
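For edge-triggered circuits, the cycle-imposed lower bound is the largest ratio d(c)/w(c) over all cycles, for two reasons:
* retiming cannot change a cycle's register count w(c);
* added skews cancel around a cycle, so the cycle always has exactly w(c)·T_φ of time for its delay d(c).

The sketch below computes this bound by bisection on T_φ, using Bellman-Ford to detect a cycle with d(c) > T_φ·w(c). The graph is a small hypothetical one, not the correlator of Figure 10, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (u, v, registers on edge u -> v)


def period_ok(delays: Dict[str, float], edges: List[Edge], T: float) -> bool:
    """True if every cycle c satisfies d(c) <= T * w(c).

    Edge u -> v gets weight T * w(e) - d(u), so a violating cycle is exactly a
    negative cycle, found with Bellman-Ford from an implicit source.
    """
    dist = {v: 0.0 for v in delays}
    for _ in range(len(delays)):
        changed = False
        for u, v, w in edges:
            cand = dist[u] + T * w - delays[u]
            if cand < dist[v] - 1e-9:
                dist[v] = cand
                changed = True
        if not changed:
            return True
    return False


def cycle_bound(delays: Dict[str, float], edges: List[Edge], tol: float = 1e-6) -> float:
    """Max over cycles of d(c) / w(c): a lower bound on the clock period."""
    lo, hi = 0.0, sum(delays.values())
    if not period_ok(delays, edges, hi):
        return float("inf")            # a register-free (combinational) cycle
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if period_ok(delays, edges, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical graph: cycle a->b->c->a (delay 13, one register), cycle b->c->b
# (delay 10, two registers); the first cycle sets the bound.
delays = {"a": 3.0, "b": 3.0, "c": 7.0}
edges = [("a", "b", 0), ("b", "c", 1), ("c", "a", 0), ("c", "b", 1)]
print(cycle_bound(delays, edges))
```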
[Figure 10 diagram: correlator graph with host vertex v_h (delay 0), vertices v1 to v4 (delay 3) and v5 to v7 (delay 7); added skew values ρ = 0.7, 0.0, 0.0, 0.7, 0.7.]
Figure 10: An edge-triggered correlator retiming with intentional clock skew. Cross-host timing constraints
are imposed. The optimal period is $T_\phi = 10$.
In Figure 11, no timing constraints across the external environment are imposed, and thus there are no
circuit cycles. This may be thought of as the inclusion of a register inside the host vertex, controlled by
a clock signal which cannot be delayed. Without intentionally skewed clocks, the minimum edge-triggered
period is $T_\phi = 9$, while the minimum symmetric level-clocked period is $T_\phi = 8$. The edge-triggered example
in Figure 11 achieves a minimum period of $T_\phi = 7.5$ through the use of skewed clock arrival times.
As in the standard Bellman-Ford algorithm, we conjecture that the relaxation routine still has $O(1)$
time complexity and that it must be applied $|V|$ times to each ILP constraint. Each of the $O(|V|^2)$ critical
paths has an average of $|E|/|V|$ fanin edges, and a single constraint must exist for each fanin edge and critical
path. Thus the overall number of constraints is $O(|V|^2 \cdot |E|/|V|)$. Real circuit graphs are typically sparse and
$|E|/|V| = O(1)$, resulting in a linear increase in the expected running time of the algorithm. The Mixed-ILP
and FEAS algorithms presented in [5] may also be modified to solve these problems with a linear increase in
expected running time.
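The constraint count above simplifies as shown below. The second and third lines take the text's conjecture as an assumption: each constraint is relaxed $|V|$ times at $O(1)$ cost, and sparse graphs have $|E|/|V| = O(1)$.

$$
\begin{aligned}
N_{\text{constraints}} &= O\!\left(|V|^2 \cdot \frac{|E|}{|V|}\right) = O(|V|\,|E|),\\
\text{relaxation work} &= |V| \cdot O(|V|\,|E|) = O(|V|^2\,|E|),\\
\frac{|E|}{|V|} = O(1) \;&\Longrightarrow\; N_{\text{constraints}} = O(|V|^2), \quad \text{relaxation work} = O(|V|^3).
\end{aligned}
$$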
[Figure 11 diagram: correlator graph with host vertex v_h (delay 0), vertices v1 to v4 (delay 3) and v5 to v7 (delay 7); added skew values ρ = 0.0, 0.0, 0.2, 0.0, 0.6, 0.1333, 0.0667.]
Figure 11: An edge-triggered correlator retiming with intentional clock skew. Cross-host timing constraints
are prevented by assuming that a pair of registers is hidden inside the host vertex. The optimal period is
$T_\phi = 7.5$.
8 Summary
We have presented in this paper techniques for including clock skew parameters in the retiming algorithms
for both edge-clocked and level-clocked circuits. While this allows us to handle clock skew in many situations,
a static analysis, but a more general approach would incorporate minimum delay constraints in the retiming
algorithm itself. Perhaps the most interesting problem is extending the combined intentional skew and
retiming algorithm to level-clocked circuits. We conjecture that such an algorithm, in conjunction with a
reasonable approach to minimum delays, will allow us to retime circuits to achieve the same performance in
the presence of variable skew as in the case when the clock skew is zero.
References
[1] S. Burns. Performance Analysis and Optimization of Asynchronous Circuits. PhD thesis, California
Institute of Technology, 1991. Caltech-CS-TR-91-01.
[2] J. P. Fishburn. Clock skew optimization. IEEE Transactions on Computers, 39(7):945–951, 1990.
Correspondence.
[3] A. T. Ishii, C. E. Leiserson, and M. C. Papaefthymiou. Optimizing two-phase, level-clocked circuitry. In
Advanced Research in VLSI and Parallel Systems: Proc. of the Brown/MIT Conference, pages 245–264,
1992.
[4] C. E. Leiserson, F. Rose, and J. B. Saxe. Optimizing synchronous circuitry by retiming. In Proc. of the
3rd Caltech Conference on VLSI, Mar. 1983.
[5] C. E. Leiserson and J. B. Saxe. Retiming synchronous circuitry. Algorithmica, 6(1):5–35, 1991. Also
available as MIT/LCS/TM-372.
[6] B. E. Lockyear and C. Ebeling. Retiming of multi-phase, level-clocked circuits. Technical Report 91-10-1,
University of Washington, Dept. of Computer Science, Oct. 1991.
[7] B. E. Lockyear and C. Ebeling. Retiming of multi-phase, level-clocked circuits. In Advanced Research in
VLSI and Parallel Systems: Proc. of the Brown/MIT Conference, pages 265–280, Mar. 1992.
[8] K. A. Sakallah, T. N. Mudge, and O. A. Olukotun. Analysis and design of latch-controlled synchronous