

3.4 Asynchronous behavioral synthesis

3.4.5 Variable length time-slot behavioral synthesis

Sacker [88] proposes a method which resembles the synchronous behavioral synthesis flow, but where the target operation-group time-slots are of variable length. Borrowing from compiler technology and synchronous synthesis, the group has extended their existing synchronous behavioral synthesis system MOODS to handle asynchronous circuits.

The target is a single control sequence of operation-groups. Each operation-group can consist of several operations in parallel and has the execution time of the slowest operation in the group. Multi-cycle operations are not supported, but chaining is. However, chaining implies that data is fed directly between two FUs without being stored in registers, and therefore no resource sharing of the FUs involved in chaining is allowed. There has to be a sufficient number of FUs such that, for all operation-groups, all the operations in an operation-group have a direct mapping to an FU.
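The Python sketch below illustrates variable-length time-slots. It computes three things:

- the duration of each operation-group, taken as the delay of its slowest operation;
- the resulting latency of the single control sequence;
- the minimum number of FUs per type, so that every operation in a group gets its own FU.

The operation names, FU types and delays are hypothetical, and chaining is not modelled. The fixed-slot figure is only a synchronous reference point (one clock period per group, set by the slowest group), not part of the method in [88].

```python
from collections import Counter
from typing import Dict, List

# Hypothetical example data: operation name -> (FU type, delay in ns).
OPS: Dict[str, tuple] = {
    "a1": ("add", 2.0),
    "a2": ("add", 2.0),
    "m1": ("mul", 5.0),
    "a3": ("add", 2.0),
}

def slot_lengths(groups: List[List[str]]) -> List[float]:
    # A variable-length time-slot lasts as long as its slowest operation.
    return [max(OPS[op][1] for op in g) for g in groups]

def variable_latency(groups: List[List[str]]) -> float:
    # The single control sequence executes the groups one after another.
    return sum(slot_lengths(groups))

def fixed_latency(groups: List[List[str]]) -> float:
    # Synchronous reference: every slot is one clock period set by the slowest group.
    return len(groups) * max(slot_lengths(groups))

def min_fus(groups: List[List[str]]) -> Counter:
    # Every operation in a group needs its own FU, so per FU type we need as many
    # units as the largest number of operations of that type in any single group.
    need: Counter = Counter()
    for g in groups:
        for fu_type, n in Counter(OPS[op][0] for op in g).items():
            need[fu_type] = max(need[fu_type], n)
    return need

schedule = [["a1", "a2"], ["m1"], ["a3"]]
print(slot_lengths(schedule))      # [2.0, 5.0, 2.0]
print(variable_latency(schedule))  # 9.0
print(fixed_latency(schedule))     # 15.0
print(dict(min_fus(schedule)))     # {'add': 2, 'mul': 1}
```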

The starting point is a VHDL behavioral model. From this, an intermediate format they call ICODE is extracted, which is a representation equivalent to a CDFG. Then scheduling, allocation and binding are performed, with the synchronous schedule represented by a control-step graph. The asynchronous control is handled by mapping the elements of the control-step graph, via predefined asynchronous controller cells, to an asynchronous circuit. The datapath is synthesized through a set of templates. The asynchronous signaling used is based on 4-phase handshake protocols. The flow …
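The minimal Python sketch below shows 4-phase (return-to-zero) signaling. It replays the events of a single handshake channel and rejects any trace that breaks the protocol order:

1. request up,
2. acknowledge up,
3. request down,
4. acknowledge down.

The event names r+, a+, r-, a- follow the `.r`/`.a` notation used for the STG in Figure 4.11. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

def check_four_phase(trace: List[str]) -> Optional[Tuple[int, int]]:
    """Replay a single-channel trace of 'r+', 'a+', 'r-', 'a-' events starting
    from req = ack = 0. Return the final (req, ack) wire levels, or None if the
    trace violates the four-phase protocol."""
    req, ack = 0, 0
    for ev in trace:
        if ev == "r+" and (req, ack) == (0, 0):
            req = 1  # active side raises the request
        elif ev == "a+" and (req, ack) == (1, 0):
            ack = 1  # passive side acknowledges
        elif ev == "r-" and (req, ack) == (1, 1):
            req = 0  # active side withdraws the request
        elif ev == "a-" and (req, ack) == (0, 1):
            ack = 0  # passive side withdraws the acknowledge: channel idle again
        else:
            return None
    return (req, ack)

print(check_four_phase(["r+", "a+", "r-", "a-"]))  # (0, 0): one complete handshake
print(check_four_phase(["r+", "r-"]))              # None: request withdrawn before acknowledge
```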

3.5 Summary

Currently, research in behavioral synthesis of asynchronous circuits is primarily focused on syntax-directed synthesis and desynchronization. Besides these, there is a multitude of more or less successful attempts at high-level synthesis.

There are three aspects we would like our asynchronous behavioral synthesis to include:

• Ability to construct systems operating in continuous time, using methods from behavioral synthesis and Operations Research in continuous time. Desynchronization methods are limited by their use of a discrete time-evolution to find the optimal schedule.
• Ability to use existing behavioral synthesis methods developed for synchronous synthesis, such as the methods for low-power behavioral synthesis reviewed at the beginning of this chapter. Leveraging existing techniques that are well proven both in theory and in practice will prove very beneficial.
• Use of handshake components both for controller synthesis and for datapath synthesis, to facilitate the construction of large-scale designs. For an asynchronous behavioral synthesis to be effective, it has to be able to synthesize industry-scale designs.

The research presented in this thesis tries to implement these aspects. It introduces a computation model that allows the use both of synthesis methods for synchronous discrete time and of methods for continuous time. It targets asynchronous handshake components for both datapath and controller synthesis. As an implementation we currently build upon the Balsa language. This is not a restriction: our work could easily be extended to target other languages or design approaches.

Behavioral Synthesis for Asynchronous Circuits

Synchronous circuit synthesis utilizes a simple model for implementing synchronous computation, and this method has proven to be highly successful. Therefore, rather than inventing a different computation model, we adapt the existing computation model for asynchronous circuit synthesis. This has the added advantage of opening up many of the existing methods from synchronous behavioral synthesis for use in asynchronous circuit synthesis. In this chapter we address this in detail.

4.1 From synchronous to asynchronous behavioral synthesis

Let us first review and analyze the elements of synchronous behavioral synthesis. Based on the CDFG, synchronous behavioral synthesis involves three sets of transformations in order to create a suitable hardware architecture:

• Scheduling, in which operator nodes of the CDFG are grouped into operation-groups or time-slots, and where the execution of the next operation-group is handled by a synchronization event, $E_i$, where $i$ strictly orders the events in time. In the case of synchronous behavioral synthesis, $E_i$ is controlled by the system clock.

• Allocation, in which the minimum hardware resources/functional units (FUs), …
• Binding (or assignment), where individual operator nodes are tied to specific hardware resources.

Figure 4.1: Adapting synchronous synthesis (left) into the asynchronous handshake domain (right).

Figure 4.2: First step in adapting the synchronous computation model into the asynchronous domain.

The synchronization events determine (i) the beginning of the execution of an operation and (ii) the writing of the result of an operation.

The CDFG extracted in synchronous behavioral synthesis is a 1-bounded colored Petri net, where colors represent data values, edges represent places, and nodes represent transitions. Interestingly, the Petri net model is based on asynchronous execution semantics, which should make it an obvious model for asynchronous synthesis as well. In synchronous synthesis (Figure 4.1, left), operations are ordered according to a global synchronization event, $E_i$. That is, the read events ($E_{r,j}$) for operator $j$ happen at the same point in time as the write events ($E_{w,i}$) for operator $i$ in the previous operation-group, $E_{w,i}^{0} = E_{r,j}^{0} = E_{0}$. Furthermore, all operations in an operation-group are executed simultaneously, $E_{r,j}^{0} = E_{r,k}^{0} = E_{0}$.

Figure 4.3: Rearranging components to get the initial computation model.

Suppose we relax these assumptions, $E_{w,i} \neq E_{r,j}$ and $E_{r,j} \neq E_{r,k}$, as shown in Figure 4.1 (right), and make these synchronization events controlled by the controller. We can then create a hardware architecture consisting of a datapath and a controller which operates in continuous time.
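The sketch below makes the effect of relaxing these assumptions concrete. It compares event times for a small hypothetical CDFG under two models:

- the synchronous model: one global event per operation-group, with the period set by the slowest operation;
- an idealized relaxed model: each read event $E_{r,j}$ fires as soon as the write events of $j$'s own predecessors have occurred.

The CDFG and its delays are assumptions. Handshake and controller overhead are taken as zero, so this illustrates event ordering only, not real timing.

```python
from typing import Dict, List, Tuple

# Hypothetical CDFG in topological order: op -> (delay, predecessor ops).
CDFG: Dict[str, Tuple[float, List[str]]] = {
    "f1": (2.0, []),
    "f2": (5.0, []),
    "g": (2.0, ["f1"]),
    "h": (1.0, ["g", "f2"]),
}
SCHEDULE = [["f1", "f2"], ["g"], ["h"]]  # operation-groups

def synchronous_events(groups: List[List[str]]) -> Dict[str, Tuple[float, float]]:
    """(E_r, E_w) per op when one global event ends slot k-1 and starts slot k."""
    period = max(CDFG[op][0] for group in groups for op in group)
    return {op: (k * period, (k + 1) * period)
            for k, group in enumerate(groups) for op in group}

def relaxed_events() -> Dict[str, Tuple[float, float]]:
    """(E_r, E_w) per op when E_r,j only waits for the writes of j's predecessors."""
    events: Dict[str, Tuple[float, float]] = {}
    for op, (delay, preds) in CDFG.items():
        e_read = max((events[p][1] for p in preds), default=0.0)
        events[op] = (e_read, e_read + delay)
    return events

print(synchronous_events(SCHEDULE)["h"])  # (10.0, 15.0)
print(relaxed_events()["h"])              # (5.0, 6.0)
```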

We start with the synchronous computation model shown in Figure 4.2 (left). This is a standard Moore machine datapath with memory (registers) controlled by a clock, and some functional units (combinatorial circuitry) to operate on the data. To move data back and forth between the memory and the functional units, two layers of muxes control the data flow, controlled by signals $M_I$ and $M_{II}$. The first step in adapting this computation model is to move the components into the asynchronous handshake domain. We will use this to model the asynchronous timing assumptions.

Then we expand the registers by splitting the synchronization events, $E_{w,i} \neq E_{r,j}$.

The next step is to let the synchronization events completely control the computation (datapath). This is done by rearranging the latches and transfer components such that the muxes are reduced to merge components. From this we get the initial computation model shown in Figure 4.3. In this model the individual synchronization events $E_{w,i}$ and $E_{r,i}$ control the computation. The model shows that $E_{w,i}$ is active during the actual computation, and $E_{r,i}$ is active only for the transfer from latch to latch. This model is suboptimal, as we are using a latch for temporary data and the FU can only have one target.

To continue from here we have two options, which reflect the properties of our datapath and lead to two datapath topologies:

• Alpha: the functional units are purely combinatorial, without latches on input and output ports.
• Beta: the functional units have normally opaque latches both on input and output ports.

The use of input and output latches tends to increase speed and to reduce power consumption by preventing spurious signal transitions from propagating beyond latch boundaries. If input and output latches are not used, more variable latches may be needed in the datapath, both to accommodate the longer lifetime requirements and to avoid auto-assignments.

Figure 4.4: Rearranging to get the temporary variable into the memory.

Figure 4.5: Final computation model without normally opaque latches on input and output ports of the functional units.

In the following we pursue both directions, starting with alpha (no latches on input and output ports).

Figure 4.6: Computation model with input and output latches.

We first rearrange the temporary latch after the FU, as shown in Figure 4.4 (left). Next we move the temporary data into the memory, where it becomes $L_w$, by substituting $E_{w,i} \rightarrow E_{r,j}$, which gives Figure 4.4 (right). We still have the restriction that the FU always writes to $L_w$, but $L_w$ can be used by others. By reinserting write synchronization events, we get a computation model which allows all latches to be used as source and target for all functional units. This is shown in Figure 4.5.

In this model, $E_{r,i} \| E_{w,j}$ moves data from $L_v$ to $L_w$ through the FU doing the computation. Restriction: $L_v$ cannot be used as both source and target. In addition, while $L_v$ and $L_w$ are being used in the computation, there can be (i) no other write to $L_v$ and (ii) no other read or write to $L_w$.
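The alpha-model restrictions can be stated as a simple conflict check. In the hypothetical Python sketch below, each concurrently active operation is described only by the latches it reads and writes. `Access` and `alpha_conflicts` are illustrative names, not part of any tool.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass(frozen=True)
class Access:
    """Hypothetical record of the latches one concurrently active operation uses."""
    name: str
    reads: FrozenSet[str] = frozenset()
    writes: FrozenSet[str] = frozenset()

def alpha_conflicts(src: str, dst: str, concurrent: List[Access]) -> List[str]:
    """List the alpha-model restrictions violated by a computation E_r,i || E_w,j
    that moves data from latch `src` (Lv) through an FU into latch `dst` (Lw)."""
    errors = []
    if src == dst:
        errors.append(f"{src} used as both source and target")
    for other in concurrent:
        if src in other.writes:  # (i) no other write to Lv
            errors.append(f"{other.name} writes source {src}")
        if dst in other.reads or dst in other.writes:  # (ii) no other read or write to Lw
            errors.append(f"{other.name} accesses target {dst}")
    return errors

reader = Access("t1", reads=frozenset({"L1"}), writes=frozenset({"L3"}))
print(alpha_conflicts("L1", "L2", [reader]))  # []: reading the source is allowed
```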

Next we pursue the datapath with latches on input and output ports (beta). We already have input latches, so we insert output latches and are thus forced to add an extra synchronization event controlling the computation. The execution of a computation takes the form $\{E_{r,i}\}; E_{compute}; \{E_{w,j}\}$, as shown in Figure 4.6. Then we remove the control of this computation event by decoupling the control of the FU, making it an independent process as shown in Figure 4.7 (left). This model can operate with …

Figure 4.7: Final computation model with normally opaque latches on input and output ports of the functional units.

… in Figure 4.7 (right). It resembles the synchronous architecture, but it is completely asynchronous.

Both models have the same architecture. The only differences are the time the data needs to be held in the source latch and the restrictions on the target latch. Both methods can therefore be used heterogeneously in the same datapath, using the most suitable method for each specific FU. We will denote such a mixed model gamma.
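The difference between the two models can be sketched as how long a computation occupies its latches. The Python sketch below assumes each FU is characterized by three hypothetical delays: input transfer, combinatorial FU delay and output transfer. Handshake overhead is ignored.

Under these assumptions:

- in alpha, the source latch is held and the target latch is locked for the whole computation;
- in beta, the source is held only for the input transfer and the target is locked only for the output transfer.

A gamma datapath simply picks a model per FU. All names and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class FUTiming:
    """Hypothetical delays (ns) for one functional unit and its transfers."""
    t_in: float   # transfer from the source latch into the FU
    t_fu: float   # combinatorial delay of the FU itself
    t_out: float  # transfer from the FU into the target latch

def source_hold_time(model: str, t: FUTiming) -> float:
    if model == "alpha":
        # No FU latches: E_r,i || E_w,j, so the source must stay stable
        # until the result has been written into the target latch.
        return t.t_in + t.t_fu + t.t_out
    if model == "beta":
        # The FU input latch captures the operands; the source is free afterwards.
        return t.t_in
    raise ValueError(f"unknown model {model!r}")

def target_busy_time(model: str, t: FUTiming) -> float:
    if model == "alpha":
        # No other read or write of the target while the computation is active.
        return t.t_in + t.t_fu + t.t_out
    if model == "beta":
        # Assumption: the target is only involved in the write from the FU output latch.
        return t.t_out
    raise ValueError(f"unknown model {model!r}")

# A hypothetical gamma datapath: each FU uses whichever model suits it.
GAMMA = {
    "adder": ("alpha", FUTiming(1.0, 2.0, 1.0)),
    "multiplier": ("beta", FUTiming(1.0, 5.0, 1.0)),
}
for fu, (model, timing) in GAMMA.items():
    print(fu, model, source_hold_time(model, timing), target_busy_time(model, timing))
```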

This idea allows us to use any of the many synchronous behavioral synthesis techniques (though we are not restricted to them) to obtain a hardware architecture (datapath and controller), and then to implement this architecture using asynchronous circuit techniques. At the same time, this idea allows the use of behavioral synthesis techniques …

4.2 Asynchronous behavioral synthesis

Figure 4.8: (Top) One-to-one correspondence between CDFG and asynchronous circuit. (Bottom) Scheduled CDFG using a non-essential precedence constraint (thick solid line) and mapping to asynchronous circuit.

Having approached our target computation model from the synchronous side, we will now approach it from the asynchronous side. The starting point is the one-to-one correspondence between the CDFG representing the computation and the asynchronous handshake component network, shown with a small example in Figure 4.8 (top). For this CDFG there is a single essential precedence constraint: $f_1 < g$. The delay of the circuit is given by $T = \max(T_{f_2}, T_{f_1} + T_g)$ and the total area is given by $A = A_{f_1} + A_{f_2} + A_g$.
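The delay expression is a longest-path computation over the CDFG, and the area expression is a sum over its operators. Below is a minimal Python sketch with hypothetical delays and areas for $f_1$, $f_2$ and $g$. The essential constraint $f_1 < g$ is encoded as a predecessor list.

```python
from typing import Dict, List

# Hypothetical delays (ns) and areas for the operators of Figure 4.8 (top).
DELAY: Dict[str, float] = {"f1": 2.0, "f2": 3.0, "g": 2.0}
AREA: Dict[str, float] = {"f1": 10.0, "f2": 10.0, "g": 6.0}
# Essential precedence constraints as predecessor lists: f1 < g.
PREDS: Dict[str, List[str]] = {"f1": [], "f2": [], "g": ["f1"]}

def finish_time(node: str, preds: Dict[str, List[str]]) -> float:
    # A node starts when all its predecessors have finished (longest path).
    start = max((finish_time(p, preds) for p in preds[node]), default=0.0)
    return start + DELAY[node]

def circuit_delay(preds: Dict[str, List[str]]) -> float:
    return max(finish_time(n, preds) for n in preds)

def circuit_area() -> float:
    # One dedicated hardware unit per CDFG operator.
    return sum(AREA.values())

print(circuit_delay(PREDS))  # 4.0 = max(T_f2, T_f1 + T_g) = max(3.0, 4.0)
print(circuit_area())        # 26.0 = A_f1 + A_f2 + A_g
```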

The basic idea behind constraint-based synthesis and resource sharing is to perform a time-multiplexed mapping of several operators onto a smaller set of functional units. As only one operation can be performed per FU at a time, this requires memory. In this setting:

• the time-multiplexing corresponds to the scheduling;
• the mapping of operators to FUs corresponds to the assignment;
• the set of FUs itself corresponds to the allocation.

The scheduling can be represented by a minimal set of non-essential precedence constraints [95] or resource arcs [2] specifying the time-ordering. This is illustrated in Figure 4.8 (bottom) with the non-essential precedence constraint $f_1 < f_2$. It is represented by the thick arrow from $f_1$ to $f_2$, which are mapped onto the same functional unit $F$. In this case the delay of the circuit is given by $T = \max(T_{f_1} + T_{f_2}, T_{f_1} + T_g) = T_{f_1} + \max(T_{f_2}, T_g)$ and the total area is given by $A = A_{f_1,f_2} + A_g$.
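A worked instance with hypothetical values shows the trade-off: the non-essential constraint costs delay and saves area. The values are $T_{f_1}=2$, $T_{f_2}=3$, $T_g=2$; $A_{f_1}=A_{f_2}=10$, $A_g=6$; and an assumed $A_{f_1,f_2}=12$ for the shared FU $F$, including the extra memory the sharing requires.

```latex
\begin{aligned}
T_{\text{unscheduled}} &= \max(T_{f_2},\, T_{f_1}+T_g) = \max(3,\, 4) = 4,\\
T_{\text{scheduled}}   &= \max(T_{f_1}+T_{f_2},\, T_{f_1}+T_g) = T_{f_1} + \max(T_{f_2},\, T_g) = 2 + \max(3,\, 2) = 5,\\
A_{\text{unscheduled}} &= A_{f_1}+A_{f_2}+A_g = 10+10+6 = 26,\\
A_{\text{scheduled}}   &= A_{f_1,f_2}+A_g = 12+6 = 18.
\end{aligned}
```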

Figure 4.9: Mapping operator $\sigma$ to an FU.

Figure 4.10: The control handshake component and the transfer handshake component.

To proceed from here we need the mapping of a single operator $\sigma$ with source data $a$, $b$ in latches $L_i$ and $L_j$ respectively, and target data $c$ assigned to $L_k$. The simplest construction of such a mapping is given in Figure 4.9. To construct the control circuits for this mapping, we introduce the dual component to the transfer handshake component: the control component (cf. Figure 4.10). The behavior of the control component is as follows:

1. The component waits for a request from all input ports $a_0, a_1, \ldots$.
2. It then places a request on output port $b$.
3. When an acknowledge arrives from $b$, the handshakes with input ports $a_0, a_1, \ldots$ are completed, and the handshakes with output ports $c_0, c_1, \ldots$ are commenced and completed.

The STG for a four-phase implementation of the component is shown in Figure 4.11.
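The textual description of the control component can be turned into an event trace. The Python sketch below produces one serialization of a four-phase run for input channels $a_0, a_1, \ldots$, output channel $b$, and output channels $c_0, c_1, \ldots$.

Three caveats apply:

- Handshakes that may proceed concurrently are listed one after the other.
- The point at which $b$ returns to zero is our assumption; the text does not specify it.
- The sketch is not a transcription of the STG in Figure 4.11.

The function names are hypothetical.

```python
from typing import List

def control_component_trace(a: List[str], c: List[str], b: str = "b") -> List[str]:
    """One serialized four-phase trace of the control component (sketch)."""
    trace = [f"{x}.r+" for x in a]      # environment requests on all inputs a0, a1, ...
    trace += [f"{b}.r+", f"{b}.a+"]     # component requests on b; b acknowledges
    trace += [f"{b}.r-", f"{b}.a-"]     # assumption: b returns to zero at this point
    for x in a:                         # complete the handshakes on the inputs
        trace += [f"{x}.a+", f"{x}.r-", f"{x}.a-"]
    for y in c:                         # commence and complete the outputs c0, c1, ...
        trace += [f"{y}.r+", f"{y}.a+", f"{y}.r-", f"{y}.a-"]
    return trace

def channel_events(trace: List[str], ch: str) -> List[str]:
    # Project the trace onto one channel, e.g. "a0" -> ["r+", "a+", "r-", "a-"].
    return [e.split(".", 1)[1] for e in trace if e.split(".", 1)[0] == ch]

t = control_component_trace(["a0", "a1"], ["c0"])
print(t)
print(channel_events(t, "a0"))  # ['r+', 'a+', 'r-', 'a-']
```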

Together with the transfer component, the control component maps the CDFG onto a control part and a data part. How this is done depends on whether our functional units have input/output latches or not. Both solutions are shown in Figure 4.12. We now see that there is a direct correspondence between the CDFG node, the control node of our asynchronous circuit, and the functional unit mapping:

• For the alpha model, there is a direct correspondence between the CDFG node and the control component.
• For the beta model, the CDFG input arcs correspond directly to the control node responsible for loading the data into the functional unit. The CDFG output arc corresponds directly to the control node responsible for reading the result of the functional unit.

We will …

Figure 4.11: Four-phase STG for the control handshake component with only one $a$ and one $c$ channel. For multiple $a_0, a_1, \ldots$ and multiple $c_0, c_1, \ldots$, the $a$ and $c$ have to be replaced by concurrent handshaking on all these channels.

Performing a one-to-one mapping of the control nodes in the CDFG and the alpha model generates the circuit shown in Figure 4.13. Using this approach, we have moved from a one-to-one correspondence between the CDFG and functional units to a model with a one-to-one correspondence between the CDFG and the control part of the handshake circuit only. The functional units now follow the behavioral synthesis allocation. The control part of the handshake circuit could be implemented using any methodology for asynchronous state-machine design: Burst-mode [109], Petrify [26], a set of handshake components [92], or Balsa/Tangram [7, 11] style controllers.

We will implement the control part of the circuit using a different method to generate the events, one which uses handshake components such as sequencers and parallel components. These are better suited to our behavioral synthesis algorithms, which operate with a sequence of discrete events.
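The hypothetical Python model below sketches why sequencer- and parallel-style handshake components fit a discrete-event view. It represents control as a tree of sequence and parallel nodes, analogous to Balsa's `;` and `||` composition, over leaf activations with given delays. It then computes when each activation starts and ends.

Handshake overhead is ignored. The delays are the same hypothetical values used for the schedule of Figure 4.8 (bottom): $f_1$ followed by $f_2 \| g$.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

@dataclass
class Act:
    """Leaf: one activation (transfer or computation) lasting `delay` time units."""
    name: str
    delay: float

@dataclass
class Seq:
    """Sequencer: activates its children one after another."""
    children: List["Node"]

@dataclass
class Par:
    """Parallel component: activates all children, completes when all have completed."""
    children: List["Node"]

Node = Union[Act, Seq, Par]

def run(node: Node, start: float, log: Dict[str, Tuple[float, float]]) -> float:
    """Return the completion time of `node` activated at `start`,
    recording (start, end) of every leaf activation in `log`."""
    if isinstance(node, Act):
        log[node.name] = (start, start + node.delay)
        return start + node.delay
    if isinstance(node, Seq):
        t = start
        for child in node.children:
            t = run(child, t, log)
        return t
    # Par: all children start together; done when the slowest one is done.
    return max((run(child, start, log) for child in node.children), default=start)

# f1 ; (f2 || g) with hypothetical delays T_f1 = 2, T_f2 = 3, T_g = 2.
SCHEDULE_TREE = Seq([Act("f1", 2.0), Par([Act("f2", 3.0), Act("g", 2.0)])])
log: Dict[str, Tuple[float, float]] = {}
print(run(SCHEDULE_TREE, 0.0, log))  # 5.0 = T_f1 + max(T_f2, T_g)
print(log)
```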

The same datapath and control circuit can be built for the beta model using the same approach. To build a compact, efficient computation unit (datapath), we will look at how to generate this in general in the following section.