3.4 Asynchronous behavioral synthesis
3.4.5 Variable length time-slot behavioral synthesis
Sacker [88] proposes a method which resembles the synchronous behavioral synthesis flow, but where the target operation-group time-slots are of variable length. Borrowing from compiler technology and synchronous synthesis, the group has extended their existing synchronous behavioral synthesis system MOODS to handle asynchronous circuits. The target is a single control sequence of operation-groups, where each operation-group can consist of several operations in parallel and has the execution time of the slowest operation in the group. Multi-cycle operations are not supported, but chaining is. However, chaining implies that data is fed directly between two FUs without being stored in registers, and therefore no resource sharing of the FUs involved in chaining is allowed. There has to be a sufficient number of FUs such that, for all operation-groups, all the operations in an operation-group have a direct mapping to a FU.
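The timing rule described above (each operation-group lasts as long as its slowest operation) can be illustrated with a minimal sketch. The operation names, delays and grouping below are hypothetical, and the fixed-slot comparison assumes a clocked schedule whose period equals the slowest operation overall, which is an assumption made here for contrast and not a claim about MOODS.

```python
from typing import Dict, List


def variable_slot_length(groups: List[List[str]], delay: Dict[str, float]) -> float:
    """Total time of a single control sequence of operation-groups.

    Each group takes as long as its slowest operation (variable-length slot).
    """
    return sum(max(delay[op] for op in group) for group in groups)


def fixed_slot_length(groups: List[List[str]], delay: Dict[str, float]) -> float:
    """Same sequence if every slot had the length of the slowest operation overall.

    Assumption for comparison only: no multi-cycle operations, one global period.
    """
    period = max(delay[op] for group in groups for op in group)
    return period * len(groups)


# Hypothetical delays (arbitrary time units) and a hypothetical schedule.
delay = {"add1": 2.0, "add2": 2.0, "mul1": 5.0, "sub1": 2.0, "add3": 2.0}
groups = [["add1", "add2"], ["mul1", "sub1"], ["add3"]]

t_variable = variable_slot_length(groups, delay)  # 2 + 5 + 2
t_fixed = fixed_slot_length(groups, delay)        # 3 slots of length 5
```

In this toy schedule only the group containing the multiplication pays for the slow operation, which is the intended benefit of variable-length time-slots.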
The starting point is a VHDL behavioral model. From this an intermediate format, which they call ICODE, is extracted; it is a representation equivalent to a CDFG. Then scheduling, allocation and binding are performed, with the synchronous schedule represented by a control-step graph. The asynchronous control is handled by mapping the elements of the control-step graph via predefined asynchronous controller-cells to an asynchronous circuit. The datapath is synthesized through a set of templates. The asynchronous signaling used is based on 4-phase handshake protocols. The flow
3.5 Summary
Currently, research in behavioral synthesis of asynchronous circuits is primarily focused on syntax-directed synthesis and desynchronization. Besides these, there is a multitude of more or less successful attempts at high-level synthesis.
There are three aspects we would like our asynchronous behavioral synthesis to contain:
• Ability to construct systems operating in continuous time and using methods from behavioral synthesis and Operations Research in continuous time. Desynchronization methods are limited by their use of a discrete time-evolution to find the optimal schedule.
• Ability to use existing behavioral synthesis methods developed for synchronous synthesis, such as the methods for low-power behavioral synthesis reviewed in the beginning of this chapter. Leveraging existing techniques that are well proven both in theory and practice will prove very beneficial.
• Use of handshake components both for controller synthesis and datapath synthesis to facilitate construction of large-scale designs. For an asynchronous behavioral synthesis to be effective it has to be able to synthesize industry-scale designs.
The research presented in this thesis tries to implement these aspects by introducing a computation model that allows the use of both synthesis methods for synchronous discrete time and methods for continuous time, and that targets asynchronous handshake components both for datapath and controller synthesis. As an implementation we currently build upon the Balsa language, but this is not a restriction; our work could easily be extended to target other languages or design approaches.
Behavioral Synthesis for Asynchronous Circuits
Synchronous circuit synthesis utilizes a simple model for implementing synchronous computation, and this method has proven to be highly successful. Therefore, rather than inventing a different computation model, we adapt the existing computation model for asynchronous circuit synthesis. This has the added advantage of opening up for the use of many of the existing methods from synchronous behavioral synthesis in asynchronous circuit synthesis. In this chapter we address this in detail.
4.1 From synchronous to asynchronous behavioral synthesis
Let us first review and analyze the elements of synchronous behavioral synthesis. Based on the CDFG, synchronous behavioral synthesis involves three sets of transformations in order to create a suitable hardware architecture:
• Scheduling, in which operator nodes of the CDFG are grouped into operation-groups or time-slots, and where the execution of the next operation-group is handled by a synchronization event, $E_i$, where $i$ strictly orders the events in time. In the case of synchronous behavioral synthesis $E_i$ is controlled by the system clock.
• Allocation, in which the minimum hardware resources/functional units (FUs), r,j k
• Binding (or assignment), where individual operator nodes are tied to specific hardware resources.
Figure 4.1: Adapting synchronous synthesis (left) into the asynchronous handshake domain (right).
Figure 4.2: First step in adapting the synchronous computation model into the asynchronous domain.
The synchronization events determine (i) the beginning of executing an operation and (ii) writing the result of an operation.
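To make the three transformations concrete, the following minimal sketch applies them to a small hypothetical CDFG. The graph, the operator names and the choice of ASAP scheduling are assumptions made for illustration; ASAP does not minimize resources, so the allocation shown is simply what this particular schedule requires.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical CDFG: operator -> (operation type, predecessor operators).
CDFG = Dict[str, Tuple[str, List[str]]]

cdfg: CDFG = {
    "m1": ("mul", []),
    "m2": ("mul", []),
    "a1": ("add", ["m1", "m2"]),
    "a2": ("add", []),
    "a3": ("add", ["a1", "a2"]),
}


def asap_schedule(graph: CDFG) -> Dict[str, int]:
    """Scheduling: put each operator in the earliest group after its predecessors."""
    step: Dict[str, int] = {}

    def visit(op: str) -> int:
        if op not in step:
            preds = graph[op][1]
            step[op] = 1 + max((visit(p) for p in preds), default=-1)
        return step[op]

    for op in graph:
        visit(op)
    return step


def allocate(graph: CDFG, step: Dict[str, int]) -> Dict[str, int]:
    """Allocation: FUs per type = peak number of same-type operators in one group."""
    per_group: Dict[Tuple[str, int], int] = defaultdict(int)
    for op, s in step.items():
        per_group[(graph[op][0], s)] += 1
    needed: Dict[str, int] = {}
    for (kind, _), count in per_group.items():
        needed[kind] = max(needed.get(kind, 0), count)
    return needed


def bind(graph: CDFG, step: Dict[str, int]) -> Dict[str, str]:
    """Binding: tie each operator to one specific FU instance of its type."""
    used: Dict[Tuple[str, int], int] = defaultdict(int)
    binding: Dict[str, str] = {}
    for op in sorted(graph, key=lambda o: (step[o], o)):
        kind = graph[op][0]
        binding[op] = f"{kind}{used[(kind, step[op])]}"
        used[(kind, step[op])] += 1
    return binding


step = asap_schedule(cdfg)
fus = allocate(cdfg, step)
binding = bind(cdfg, step)
```

Here the three additions end up in different operation-groups and can share one adder, while the two multiplications fall into the same group and need two multipliers.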
The CDFG extracted in the synchronous behavioral synthesis is a 1-bounded colored Petri net, where colors represent data values, edges represent places, and nodes represent transitions. Interestingly, the Petri net model is based on an asynchronous execution semantics, which should make it an obvious model for asynchronous synthesis as well. In the synchronous synthesis, Figure 4.1 (left), operations are ordered according to a global synchronization event, $E_i$, i.e., the read events ($E_{r,j}$) for operator $j$ happen at the same point in time as the write events ($E_{w,i}$) for operator $i$ in the previous operation-group: $E^0_{w,i} = E^0_{r,j} = E^0$, and furthermore all operations in an operation-group are executed simultaneously: $E^0_{r,j} = E^0_{r,k} = E^0$.
Figure 4.3: Rearranging components to get the initial computation model.
If we relax these assumptions, $E_{w,i} \neq E_{r,j}$ and $E_{r,j} \neq E_{r,k}$, as shown in Figure 4.1 (right), and if we make these synchronization events controlled by the controller, we can create a hardware architecture consisting of a datapath and a controller which operates in continuous time.
We start with the synchronous computation model as shown in Figure 4.2 (left). This is a standard Moore machine datapath with memory (registers) controlled by a clock and some functional units (combinatorial circuitry) to operate on the data. To move data back and forth between the memory and the functional units, two layers of muxes control the dataflow, controlled by signals $M_I$ and $M_{II}$. The first step in adopting this computation model is to move the components into the asynchronous handshake domain. We will use this to model the asynchronous timing assumptions. Then we expand the registers by splitting the synchronization events: $E_{w,i} \neq E_{r,j}$.
The next step is to let the synchronization events completely control the computation (datapath). This is done by rearranging the latches and transfer components, for example by reducing the muxes to merge components. From this we get the initial computation model shown in Figure 4.3. In this model the individual synchronization events $E_{w,i}$, $E_{r,i}$ control the computation. The model shows that $E_{w,i}$ is active during the actual computation and $E_{r,i}$ is active only for the transfer from latch to latch. This model is suboptimal, as we are using a latch for temporary data and the FU can only have one target.
Figure 4.4: Rearranging to get the temporary variable into the memory.
Figure 4.5: Final computation model without normally opaque latches on input and output ports of the functional units.
Figure 4.6: Computation model with input and output latches.
To continue from here we have two options which reflect the properties of our datapath, and lead to two datapath topologies. The first we designate alpha, and here the functional units are purely combinatorial without latches on input and output ports. The second we designate beta, and here the functional units have normally opaque latches both on input and output ports. The use of input and output latches tends to increase speed and to reduce power consumption by preventing spurious signal transitions from propagating beyond latch boundaries. If input and output latches are not used, more variable latches may be needed in the datapath in order to accommodate the longer lifetime requirements and in order to avoid auto assignments. In the following we pursue both directions, starting with alpha, no latches on input and output ports:
Rearranging the temporary latch after the FU as shown in Figure 4.4 (left), we next move the temporary data into the memory, becoming $L_w$, by substituting $E_{w,i} \rightarrow E_{r,j}$, getting Figure 4.4 (right). We still have the restriction that the FU always writes to $L_w$, but $L_w$ can be used by others. By reinserting write synchronization events we get a computation model which allows all latches to be used as source and target for all functional units. This is shown in Figure 4.5. $E_{r,i} \parallel E_{w,j}$ moves data from $L_v$ to $L_w$ through the FU doing computation. Restriction: $L_v$ cannot be used as both source and target, and while $L_v$ and $L_w$ are being used in computation, there can be (i) no other write to $L_v$ and (ii) no other read or write to $L_w$.
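The restrictions on the alpha model can be read as a conflict check over the transfers that are active at the same time. The sketch below is one such reading; the `Transfer` record, the latch names and the function `alpha_conflicts` are hypothetical and introduced only for illustration.

```python
from typing import FrozenSet, List, NamedTuple


class Transfer(NamedTuple):
    """One computation through an FU: read the source latches, write the target latch."""
    name: str
    sources: FrozenSet[str]  # latches read (the L_v side)
    target: str              # latch written (the L_w side)


def alpha_conflicts(concurrent: List[Transfer]) -> List[str]:
    """Report violations of the alpha-model restrictions among concurrent transfers."""
    problems: List[str] = []
    # A latch cannot be both source and target of the same transfer.
    for t in concurrent:
        if t.target in t.sources:
            problems.append(f"{t.name}: {t.target} is both source and target")
    for i, t in enumerate(concurrent):
        for u in concurrent[i + 1:]:
            # (i) no other write to a source latch, (ii) no other read of a target latch.
            if u.target in t.sources:
                problems.append(f"{u.name} writes {u.target} while {t.name} reads it")
            if t.target in u.sources:
                problems.append(f"{t.name} writes {t.target} while {u.name} reads it")
            # (ii) no other write to a target latch.
            if t.target == u.target:
                problems.append(f"{t.name} and {u.name} both write {t.target}")
    return problems


op1 = Transfer("op1", frozenset({"La", "Lb"}), "Lc")
op2 = Transfer("op2", frozenset({"La"}), "Ld")   # shares a source with op1: allowed
op3 = Transfer("op3", frozenset({"Lc"}), "Le")   # reads op1's target: not allowed
op4 = Transfer("op4", frozenset({"La"}), "La")   # auto assignment: not allowed

ok = alpha_conflicts([op1, op2])
clash = alpha_conflicts([op1, op3])
auto = alpha_conflicts([op4])
```

Several transfers may read the same source latch concurrently, since the restrictions only forbid writes to a source and any other access to a target.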
Next we will pursue the datapath (beta) with latches on input and output ports:
We already have input latches, so we insert output latches and are thus forced to get an extra synchronization event controlling the computation. The execution of a computation takes the following form: $\{E_{r,i}\} ; E_{compute} ; \{E_{w,j}\}$, as shown in Figure 4.6. Then we remove the control of this computation event by decoupling the control of the FU, making it an independent process as shown in Figure 4.7 (left). This model can operate with
Figure 4.7: Final computation model with normally opaque latches on input and output ports of the functional units.
in Figure 4.7 (right), it resembles the synchronous architecture but it is completely asynchronous.
Both models have the same architecture; the only differences are the time the data needs to be held in the source latch and the restrictions on the target latch. Both methods can therefore be used heterogeneously in the same datapath, using the most suitable method for the specific FU. We will denote such a mixed model gamma.
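As a rough illustration of how the event structure differs per FU in a mixed (gamma) datapath, the sketch below builds the control expression for one operation in either style. The beta form follows the sequence given for Figure 4.6, before the compute event is decoupled from the controller. The function name, the textual event notation and the extension of the alpha form to two source latches are assumptions made here.

```python
from typing import Dict, List


def control_expression(sources: List[str], target: str, model: str) -> str:
    """Return a textual event expression for one operation on one FU.

    alpha: read and write events run in parallel through a combinatorial FU.
    beta:  read events, then a compute event, then the write event.
    """
    reads = [f"E_r({s})" for s in sources]
    write = f"E_w({target})"
    if model == "alpha":
        return " || ".join(reads + [write])
    if model == "beta":
        return "{" + ", ".join(reads) + "} ; E_compute ; {" + write + "}"
    raise ValueError(f"unknown model: {model}")


# Hypothetical gamma datapath: the model is chosen per functional unit.
gamma: Dict[str, str] = {"adder": "alpha", "multiplier": "beta"}

adder_expr = control_expression(["Li", "Lj"], "Lk", gamma["adder"])
mult_expr = control_expression(["Li", "Lj"], "Lk", gamma["multiplier"])
```

The two expressions use the same latches and the same FU interface, which reflects the point that the architecture is shared and only the timing of the events differs.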
This idea allows us to use any of, but not restricted to, the many synchronous behavioral synthesis techniques to obtain a hardware architecture (datapath and controller) and then to implement this architecture using asynchronous circuit techniques. At the same time, this idea allows the use of behavioral synthesis techniques
4.2 Asynchronous behavioral synthesis
Figure 4.8: (Top) One-to-one correspondence between CDFG and asynchronous circuit. (Bottom) Scheduled CDFG using a non-essential precedence-constraint (thick solid line) and mapping to asynchronous circuit.
Having approached our target computation model from the synchronous side, we will now approach our model from the asynchronous side. The starting point is the one-to-one correspondence between the CDFG representing the computation and the asynchronous handshake component network, as shown in Figure 4.8 (left) with a small example. For this CDFG there is a single essential precedence constraint: $f_1 < g$. The delay of the circuit is given by $T = \max(T_{f_2}, T_{f_1} + T_g)$ and the total area is given by $A = A_{f_1} + A_{f_2} + A_g$.
The basic idea behind constraint-based synthesis and resource sharing is to perform time-multiplexed mapping of several operators onto a smaller set of functional units. As only one operation can be performed per FU at a time, this requires memory. In this setting the time-multiplexing corresponds to the scheduling. The mapping of operators to FUs corresponds to the assignment, and the set of FUs themselves corresponds to the allocation. The scheduling can be represented by a minimal set of non-essential precedence constraints [95] or resource-arcs [2], specifying the time-ordering. This is illustrated in Figure 4.8 (right) with the non-essential precedence constraint $f_1 < f_2$, represented by the thick arrow from $f_1$ to $f_2$, which are mapped onto the same functional unit $F$. In this case the delay of the circuit is given by $T = \max(T_{f_1} + T_{f_2}, T_{f_1} + T_g) = T_{f_1} + \max(T_{f_2}, T_g)$ and the total area is given by $A = A_{f_1,f_2} + A_g$.
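The two delay expressions above are longest-path computations over the precedence constraints, which a short sketch can reproduce. The delay and area numbers are hypothetical, and the area of the shared unit ($A_{f_1,f_2}$) is simply an assumed value; latch and multiplexer overhead is ignored, as in the formulas quoted.

```python
from typing import Dict, List, Tuple


def critical_path(delay: Dict[str, float], before: List[Tuple[str, str]]) -> float:
    """Longest path through a precedence graph; (u, v) in `before` means u < v."""
    finish: Dict[str, float] = {}

    def done(op: str) -> float:
        if op not in finish:
            start = max((done(u) for u, v in before if v == op), default=0.0)
            finish[op] = start + delay[op]
        return finish[op]

    return max(done(op) for op in delay)


# Hypothetical delays and areas for the operators f1, f2 and g.
delay = {"f1": 3.0, "f2": 4.0, "g": 2.0}
area = {"f1": 10.0, "f2": 10.0, "g": 6.0}
area_shared_f = 12.0  # assumed area of one FU that implements both f1 and f2

essential = [("f1", "g")]

# One FU per operator: only the essential constraint f1 < g applies.
t_parallel = critical_path(delay, essential)
a_parallel = area["f1"] + area["f2"] + area["g"]

# f1 and f2 share FU F: add the non-essential constraint f1 < f2.
t_shared = critical_path(delay, essential + [("f1", "f2")])
a_shared = area_shared_f + area["g"]
```

With these numbers sharing trades two time units of delay for a smaller area, which is the trade-off the added precedence constraint encodes.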
Figure 4.9: Mapping operator $\sigma$ to a FU.
Figure 4.10: The control handshake component and the transfer handshake component.
To proceed from here we need the mapping of a single operator $\sigma$ with source data $a$, $b$ in latches $L_i$ and $L_j$ respectively, and target data $c$ assigned to $L_k$. This mapping is given in Figure 4.9 as the simplest construction of such a mapping. To construct the control circuits for this mapping we introduce the dual component to the transfer handshake component, the control component, cf. Figure 4.10. The behavior of the control component is as follows: First the component waits for a request from all input ports $a_0, a_1, \ldots$; then a request is placed on output port $b$. When an acknowledge arrives from $b$, the handshakes with input ports $a_0, a_1, \ldots$ are completed and the handshakes with output ports $c_0, c_1, \ldots$ are commenced and completed. The STG for a four-phase implementation of the component is shown in Figure 4.11.
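The behavior just described can be written out as one possible four-phase event order. This is only a sequential sketch of the prose: the STG in Figure 4.11 allows concurrency that is flattened here, and the point at which the handshake on `b` returns to zero is an assumption, since the text does not state it. The function name and the `port.r+` style labels are chosen for illustration.

```python
from typing import List


def control_trace(n_inputs: int, n_outputs: int) -> List[str]:
    """One linear four-phase event order for the control handshake component.

    r = request, a = acknowledge, + = rising, - = falling.
    """
    inputs = [f"a{i}" for i in range(n_inputs)]
    outputs = [f"c{i}" for i in range(n_outputs)]
    trace: List[str] = []
    # Wait for a request on every input port.
    trace += [f"{p}.r+" for p in inputs]
    # Place a request on b and wait for its acknowledge.
    trace += ["b.r+", "b.a+"]
    # Complete the handshakes with the input ports.
    for p in inputs:
        trace += [f"{p}.a+", f"{p}.r-", f"{p}.a-"]
    # Commence and complete the handshakes with the output ports.
    for p in outputs:
        trace += [f"{p}.r+", f"{p}.a+", f"{p}.r-", f"{p}.a-"]
    # ASSUMPTION: return-to-zero on b is placed last; the prose does not say where it occurs.
    trace += ["b.r-", "b.a-"]
    return trace


trace = control_trace(n_inputs=2, n_outputs=1)
```

A real implementation would handshake on all `a` channels and all `c` channels concurrently, so this trace is just one interleaving that satisfies the stated ordering.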
Together with the transfer component, the control component maps the CDFG onto a control part and a data part. The mapping depends on whether our functional units have input/output latches or not. Both solutions to this problem are shown in Figure 4.12. We now see there is a direct correspondence between the CDFG node and the control node of our asynchronous circuit and the functional unit mapping. For the alpha model there is a direct correspondence between the CDFG node and the control component. For the beta model there is a direct correspondence between the CDFG input arcs and the control node responsible for loading the data into the functional unit, and a direct correspondence between the CDFG output arc and the control node responsible for reading the result of the functional unit. We will
Figure 4.11: Four-phase STG for the control handshake component with only one $a$ and $c$ channel. For multiple $a_0, a_1, \ldots$ and multiple $c_0, c_1, \ldots$ the $a$ and $c$ have to be replaced by concurrent handshaking on all these channels.
Performing a one-to-one mapping of the control nodes in the CDFG and the alpha model generates the circuit shown in Figure 4.13. Using this approach we have moved from the one-to-one correspondence between CDFG and functional units to a model with a one-to-one correspondence between the CDFG and the control part of the handshake circuit only. The functional units now follow the behavioral synthesis allocation. The control part of the handshake circuit could be implemented using any methodology for asynchronous state-machine design: Burst-mode [109], Petrify [26], a set of handshake components [92], or a Balsa/Tangram [7, 11] style controller.
We will implement the control part of the circuit using a different method to generate the events, which uses handshake components such as sequencers and parallel components. These are better suited for our behavioral synthesis algorithms, which operate with a sequence of discrete events.
The same datapath and control circuit can be built for the beta model, using the same approach. To build a compact, efficient computation unit (datapath) we will look at how to generate this in general in the following section.