Simulated annealing and evolutionary algorithm

In this section we investigate two meta-heuristic algorithms for solving the behavioral synthesis problem: (i) simulated annealing and (ii) evolutionary algorithms [78, 42, 79, 66, 43, 32, 52]. Meta-heuristic algorithms are interesting in this context because large DFGs can be scheduled with fast run-times. Furthermore, they can easily be stopped early if the optimal solution is not required, but just a solution which falls within the area requirement. The power constraint has not yet been implemented in these algorithms.

For these algorithms we target DFG fragments to be scheduled and a time constraint which specifies the maximum number of control steps allowed for the execution of the DFG fragment. The DFGs considered here are acyclic directed graphs with vertices $\sigma_i$, representing the operators to be executed, and edges $\sigma_i \to \sigma_l$ specifying the order in which they have to be executed for the computation to be correct ($\sigma_i$ has to be executed before $\sigma_l$). The DFG is augmented with a source vertex (connecting to inputs, I) and a target vertex (connecting from outputs, O). To execute operations we use the same resource library of functional units, defined in Table 6.2.

With the hard time-frame constraint we need to find a schedule in which to execute the operations in the DFG on some FUs such that all operators finish before the time frame $T$ (without violating their dependencies) and at the same time minimize the area. This involves trade-offs between scheduling e.g. many $\{+, -, >\}$ operations in parallel (requiring more cheap ALUs) and serializing more $\{*\}$ operations (requiring fewer expensive mul1 units), as well as trade-offs between different subtypes of FUs (fast or slow). All this depends strongly on the specific DFG and the time frame $T$ we have available.

6.3.1 Problem formulation

First, we formulate the behavioral synthesis problem as an ILP problem. We have a DFG with operators $\sigma_i$, $i = 1 \ldots n$, and dependencies $\sigma_i \to \sigma_l$, a resource library with functional units of type $FU_j$, $j = 1 \ldots m$, each having a silicon area $w_j$, and a time interval $k = 1 \ldots T$ giving for each operator $\sigma_i$ a time interval in which it can be scheduled: $S_i \ldots L_i$. We want to minimize the used silicon area. Let us start by introducing the variables in our formulation:

x: Let $x_{i,j,k}$ be a 0-1 integer variable associated with the operator $\sigma_i$: $x_{i,j,k} = 1$ if $\sigma_i$ is scheduled to start in time step $k$ and assigned to execute on $FU_j$, and $x_{i,j,k} = 0$ otherwise.

N: Let $N_j$ be an integer variable which denotes the number of functional units of type $FU_j$ we will allocate on our IC.

The objective function is:

minimize $A = \sum_{j=1}^{m} w_j N_j$ (6.2)
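As a rough illustration of the objective (area per FU type times the number of units of that type), the sketch below evaluates the area of a given schedule. It assumes a schedule is a list of (start step, FU type) pairs, that an operation occupies its unit for the full delay of that FU type, and that $N_j$ can be taken as the peak number of simultaneously busy units of type $j$. The names `LIBRARY`, `units_needed` and `area_cost` are illustrative only; the area and delay values are those of Table 6.2.

```python
from collections import Counter

# FU type -> (area, delay in time slots), values from Table 6.2
LIBRARY = {
    "ALU": (2965.00, 1),
    "mul1": (41978.50, 3),
    "mul2": (28414.50, 6),
}

def units_needed(schedule):
    """Peak number of concurrently busy units per FU type (used as N_j)."""
    busy = Counter()  # (fu_type, time_step) -> number of running operations
    for start, fu in schedule:
        _, delay = LIBRARY[fu]
        for step in range(start, start + delay):
            busy[(fu, step)] += 1
    needed = Counter()
    for (fu, _), count in busy.items():
        needed[fu] = max(needed[fu], count)
    return dict(needed)

def area_cost(schedule):
    """A = sum over FU types of w_j * N_j."""
    return sum(LIBRARY[fu][0] * n for fu, n in units_needed(schedule).items())

# Two multiplications: in parallel on two mul1 units, or serialized on one.
parallel = [(1, "mul1"), (1, "mul1")]
serial = [(1, "mul1"), (4, "mul1")]
print(area_cost(parallel), area_cost(serial))
```

Serializing the two multiplications halves the multiplier area at the price of three extra time steps, which is the kind of trade-off the time frame $T$ decides.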

The objective function (equ. 6.2) states that we want to minimize the total used silicon area: it sums over all functional unit types and for each multiplies its area by the number required for the schedule. The first constraint (equ. 6.3) simply states that all operators must be scheduled to start in some time step and on some $FU_j$. The second constraint (equ. 6.4) specifies that for each DFG dependency $\sigma_i \to \sigma_l$, operator $l$ can only start after operator $i$ finishes, $t_l \ge d_j + t_i$ (which depends on which FU $i$ is scheduled on). The third constraint (equ. 6.5) states that an FU can only execute one operation at a time. The final constraint (equ. 6.6) ensures that nowhere is more power used than is available. This last constraint will be ignored in

[Figure 6.9 labels: feasible and infeasible regions, perturbation, cost gradient, solutions $i \ldots i+3$ with areas $A_i \ldots A_{i+3}$.]

Figure 6.9: Crossing from one island of the solution space to another by keeping the infeasible solutions, when the perturbation is smaller than the minimum required distance. The sequence of $\phi_j$'s indicated by the dots are the actual solutions, and the sequence of $F(\phi_j) = A_j$ indicated by the crosses corresponds to the feasible solutions the cost area function is computed from.

6.3.2 Representation and feasibility

We use a solution vector containing $n$ tuples (one for each operator), each consisting of the pair $(k_i, j_i)$, where $k_i$ is the time step where operator $i$ starts and $j_i$ is the FU type to execute it on ($k_i \in S_i \ldots L_i$ and $j : \sigma_i \in FU_j$). Let the schedule be defined by:

$\phi = [(k_1, j_1), (k_2, j_2), \ldots, (k_n, j_n)]$

In both simulated annealing and evolutionary algorithms we will likely produce (and start with) solutions which are infeasible, where infeasible means that we are violating DFG dependencies. We therefore need to make the solution feasible, $\phi \to \phi'$. We also use this feasibility algorithm to allow for easy crossing over regions of infeasible solutions, as illustrated in Figure 6.9. We keep the infeasible solution but compute its cost by making the solution feasible and then computing the cost of that feasible solution. This requires, however, that the feasibility algorithm is deterministic, such that the best (feasible) solution can be regenerated from a possibly infeasible best solution. This is a better approach than working with a penalty function or removing the infeasible solutions.

First, let us revisit the ASAP algorithm. Before the algorithm starts, assume we assign an operator $\sigma_i$ to a time step within $t_i \in S_i \ldots L_i$ and with $j_i$ equal to the fastest $FU_j$. The output is the earliest time $S'_l$ the other operators $\sigma_l$ can be scheduled when $\sigma_i$ is scheduled in time step $k_i$. Only successors to $\sigma_i$ are affected: $S_l \le S'_l$.

Critical for this to be of any use is $S'_l \le L_l \ \forall l$. Assume we at some point get $S'_l > L_l$ after assigning operator $r$ to time step $t_r$ ($t_r \in S_r \ldots L_r$, $S_r \le L_r$). Let $p$ be the longest path $\sigma_r \to \sigma_l$ and $q$ the longest path $\sigma_l \to \sigma_r$ (going 'backwards'): $S'_l \ge t_r + |p|$ and $L_r \le L_l - |q|$. Since the DFG is acyclic, $|p| = |q|$, so $S'_l \ge t_r + |p|$ and $L_r + |p| \le L_l$; therefore $S'_l > L_l \Leftrightarrow t_r + |p| > L_r + |p|$, or $t_r > L_r$, which is a contradiction.

The same applies to the ALAP algorithm, and by running both algorithms in succession we reduce the time intervals for all other operators $\sigma_l$:

$k_l \in S'_l \ldots L'_l$, with $S_l \le S'_l$, $S'_l \le L'_l$, $L'_l \le L_l$

Up until now we have assumed $j_i$ was assigned to the fastest FU. The available delay is the minimal $L'_l$ time for its successors $\sigma_l$ minus the start time: $delay_i = \min\{L'_l\} - k_i$. So any $FU_j$ with $d_j \le delay_i$ can be chosen.
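A small sketch of this selection rule, using the three multiplier variants of Table 6.2. The helper names `available_delay` and `slowest_allowable` are hypothetical, and picking the slowest fitting unit is shown only as one possible policy (in this library the slower multipliers are also the smaller ones).

```python
# Multiplier variants from Table 6.2: name -> (area, delay in time slots)
MULTIPLIERS = {
    "mul1": (41978.50, 3),
    "mul2": (28414.50, 6),
    "mul3": (14638.75, 17),
}

def available_delay(k_i, successor_latest):
    """delay_i = min{L'_l} - k_i, taken over the successors of operator i."""
    return min(successor_latest) - k_i

def slowest_allowable(candidates, delay):
    """Return the FU with the largest d_j <= delay, or None if none fits."""
    fitting = {name: spec for name, spec in candidates.items()
               if spec[1] <= delay}
    if not fitting:
        return None
    return max(fitting, key=lambda name: fitting[name][1])

# Operator starts in step 2; its successors must start by steps 12 and 9.
delay = available_delay(k_i=2, successor_latest=[12, 9])
print(delay, slowest_allowable(MULTIPLIERS, delay))
```

With seven time steps available, mul1 and mul2 both fit and mul3 does not, so the sketch selects mul2.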

The algorithm for feasibility is as follows:

Initial Set $\phi'$ empty.

Step 1 Pick an unscheduled operator $\sigma_r$ in $\phi$.

Step 2 Schedule $\sigma_r$ in time step: $\phi'.k_r = \phi.k_r$.

Step 3 Compute $delay_r = \min\{L'_l\} - k_r$.

Step 4 If the delay of the FU $\phi.j_r$ fits, i.e. $d_{\phi.j_r} \le delay_r$, set $\phi'.j_r = \phi.j_r$; else assign $\phi'.j_r = j$ ($j$ is the one with the slowest allowable execution), where $\sigma_r \in FU_j$ and $d_j \le delay_r$.

Step 5 ASAP (update $S_l \to S'_l$).

Step 6 ALAP (update $L_l \to L'_l$).

Step 7 For all unscheduled operators $\sigma_l$ in $\phi$: if $\phi.k_l < S'_l$ set $\phi.k_l = S'_l$, and if $\phi.k_l > L'_l$ set $\phi.k_l = L'_l$.

Step 8 If any unscheduled operators remain in $\phi$, go to Step 1.

The algorithm works by iteratively scheduling operators one at a time, each time running ASAP and ALAP to reduce the valid time intervals for the unscheduled operators, so that a feasible schedule can be obtained. The algorithm is deterministic and has complexity $O(n^2)$.
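The feasibility procedure can be sketched in simplified form as follows. This sketch only repairs start times: it assumes a fixed delay per operator (the FU re-selection of Step 4 is left out), operators supplied in topological order, and initial windows `S`, `L` that are already consistent with the dependencies. The names `tighten` and `make_feasible` are illustrative, not taken from the original implementation.

```python
def tighten(S, L, preds, succs, delay, order):
    """One ASAP pass and one ALAP pass; shrinks the windows in place."""
    for v in order:                      # forward pass (ASAP)
        for u in preds[v]:
            S[v] = max(S[v], S[u] + delay[u])
    for v in reversed(order):            # backward pass (ALAP)
        for w in succs[v]:
            L[v] = min(L[v], L[w] - delay[v])

def make_feasible(k, S, L, succs, delay, order, first=None):
    """Return start times that respect every dependency.

    k     -- possibly infeasible start step per operator
    S, L  -- earliest / latest start step per operator
    order -- operators in topological order (assumed to be given)
    first -- optional operator to fix first, e.g. the perturbed one
    """
    S, L, k = dict(S), dict(L), dict(k)
    preds = {v: [u for u in order if v in succs[u]] for v in order}
    pending = list(order)
    if first is not None:
        pending.remove(first)
        pending.insert(0, first)
    for r in pending:
        k[r] = min(max(k[r], S[r]), L[r])   # clamp into the valid window
        S[r] = L[r] = k[r]                   # the operator is now scheduled
        tighten(S, L, preds, succs, delay, order)
    return k

# Chain a -> b -> c with unit delays and a time frame of 5 steps.
succs = {"a": ["b"], "b": ["c"], "c": []}
delay = {"a": 1, "b": 1, "c": 1}
order = ["a", "b", "c"]
S = {"a": 1, "b": 2, "c": 3}
L = {"a": 3, "b": 4, "c": 5}
phi = {"a": 3, "b": 2, "c": 3}              # b and c start too early
print(make_feasible(phi, S, L, succs, delay, order))
print(make_feasible(phi, S, L, succs, delay, order, first="c"))
```

Fixing the operators in a different order gives a different but still feasible schedule, and the same input always gives the same output, which is the determinism the cost evaluation relies on.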

6.3.3 Simulated annealing

The simulated annealing algorithm is a meta-heuristic algorithm for solving ILP problems which borrows from the physical model of near-adiabatic crystallization, i.e. the formation of a perfect crystal lattice.

Simulated annealing algorithm:

Initial Generate an initial feasible solution vector $\to \phi$ and compute its area cost $A$.

Step 1 Perturb $\phi$ by randomly moving an operator in time and changing its FU assignment $\to \phi'$.

Step 2 Generate a feasible solution from the perturbed solution vector: $F(\phi') \to \phi'_{feasible}$.

Step 3 Compute the area cost of $\phi'_{feasible} \to A'$.

Step 4 If the new cost is smaller than that of the existing solution ($A' < A$), accept the new solution $\phi'$; otherwise conditionally accept $\phi'$ if $\exp(-(A' - A)/Temp) > random(1)$ is true.

Step 5 Update the solution space $(\phi', A', Temp') \to (\phi, A, Temp)$ and, while not in thermal equilibrium, go to Step 1.

Step 6 Reduce the temperature exponentially: $Temp' = \alpha\,Temp$, with $0 < \alpha < 1$.

Step 7 If the temperature $Temp'$ is larger than $Temp_{crystal}$ (the stopping temperature) and $A'$ is larger than $A_{accept}$, go to Step 1.
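The steps above can be sketched as a generic loop. This is a minimal sketch, not the original implementation: the perturbation, the feasibility algorithm $F$ and the cost function are passed in as callables, the fixed `inner_loops` count stands in for the thermal equilibrium condition, and the default `alpha` is an arbitrary placeholder. The toy problem at the end (integer vectors, clamping as "repair", sum of squares as "area") is purely illustrative.

```python
import math
import random

def simulated_annealing(phi, cost, perturb, repair, temp=10000.0,
                        temp_crystal=1.0, alpha=0.9, inner_loops=20,
                        a_accept=float("-inf"), rng=random):
    """Keep the (possibly infeasible) vector, but cost its repaired form."""
    area = cost(repair(phi))
    best, best_area = phi, area
    while temp > temp_crystal and area > a_accept:
        for _ in range(inner_loops):              # "thermal equilibrium"
            cand = perturb(phi, rng)
            cand_area = cost(repair(cand))
            worse_by = cand_area - area
            # Always move downhill; move uphill with Boltzmann probability.
            if worse_by < 0 or math.exp(-worse_by / temp) > rng.random():
                phi, area = cand, cand_area
                if area < best_area:
                    best, best_area = phi, area
        temp *= alpha                             # exponential cooling
    return repair(best), best_area

# Toy stand-ins, not from the text.
def toy_perturb(phi, rng):
    out = list(phi)
    out[rng.randrange(len(out))] += rng.choice((-1, 1))
    return out

def toy_repair(phi):
    return [min(max(x, -5), 5) for x in phi]

def toy_cost(phi):
    return sum(x * x for x in phi)

start = [9, -7, 4]
solution, area = simulated_annealing(start, toy_cost, toy_perturb,
                                     toy_repair, rng=random.Random(0))
print(solution, area)
```

Remembering the best vector seen so far is an addition made here for convenience; because the repair step is deterministic, its feasible form can be regenerated at the end.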

In the iteration step a random operator $\sigma_i$ is chosen and random (acceptable) values are inserted for both $k_i$ and $j_i$. Then the schedule is made feasible, starting with scheduling $\sigma_i$ and then scheduling the rest. In this way we ensure the perturbation survives the feasibility process. Then, depending on the cost and the temperature, we accept this new schedule or not. The fundamental difference between simulated annealing and local search lies in the ability at high temperatures to move uphill, i.e. to accept solutions which are less optimal (as well as always moving downhill, i.e. accepting more optimal solutions). This is handled by the accept function, which maintains the Boltzmann distribution from statistical mechanics. Initially the algorithm is started with a random solution which is made feasible. The thermal equilibrium condition repeats the inner loop a certain number of times; this is determined in the following chapter.

$Temp_{crystal}$ stops the algorithm if the temperature comes down to 1. It can be shown mathematically that by selecting the correct temperature function specific to the problem, the simulated annealing algorithm will find the optimal solution. However, the time spent on finding the optimal solution can be shown to be equal to or larger than the time to perform an exhaustive search. We set the start temperature to 10000, and it can be shown that an adiabatic cool-off in temperature corresponds to an exponential temperature decay, i.e. the new temperature is generated by $Temp' = \alpha\,Temp$ with $0 < \alpha < 1$. We determine the appropriate value for $\alpha$ in the following chapter.
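To give a feeling for how $\alpha$ controls the run-time, the sketch below counts how many temperature reductions are needed to cool from the start temperature 10000 down to the stopping temperature 1. The function name `cooling_steps` and the sample values of $\alpha$ are illustrative assumptions; the value actually used is determined in the following chapter.

```python
import math

def cooling_steps(temp_start, temp_crystal, alpha):
    """Smallest n such that temp_start * alpha**n <= temp_crystal."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ratio = temp_crystal / temp_start
    return max(0, math.ceil(math.log(ratio) / math.log(alpha)))

for alpha in (0.5, 0.9, 0.99):
    print(alpha, cooling_steps(10000, 1, alpha))
```

The total number of perturbations is this count multiplied by the number of inner-loop iterations per temperature, so values of $\alpha$ close to 1 buy a finer search at a proportionally higher cost.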

6.3.4 Evolutionary algorithm

The evolutionary algorithm approach is a meta-heuristic algorithm for solving ILP problems which is biologically inspired and implements the concept of survival of the fittest.

Evolutionary algorithm:

Initial Generate an initial set of feasible solution vectors $\to \Phi = \{\phi\}$, the population, compute their respective area costs $A = \{A\}$ and set the generation count to zero, $G = 0$.

Step 1 Keep the half of the population $\Phi$ with the lowest area cost $\to \Phi_{1/2}$ (i.e. discard the most unfit half) and set $\Phi' = \emptyset$.

Step 2 Select two elements from $\Phi_{1/2} \to \{\phi_a, \phi_b\}$, the parent solution vectors, and remove the elements from the set: $\Phi_{1/2} \setminus \{\phi_a, \phi_b\} \to \Phi'_{1/2}$.

Step 3 Select a random crossover position and form two new solution vectors $\{\phi_a, \phi_b\} \to \{\psi, \varphi\}$, the child solution vectors.

Step 4 Mutate $\{\psi, \varphi\}$ by randomly moving an operator in time and changing its FU assignment $\to \{\psi', \varphi'\}$, using a low probability $\chi$ for mutating the solution vectors.

Step 5 Add the parent and the child solution vectors to the new population: $\Phi' + \{\phi_a, \phi_b, \psi', \varphi'\} \to \Phi''$.

Step 6 Update the solution sets ($\Phi'_{1/2}$ …

Step 7 Generate feasible solutions from the perturbed solution vectors in $\Phi'$: $F(\Phi'_{perturbed}) \to \Phi'_{feasible}$.

Step 8 Compute the area cost of $\Phi'_{feasible} \to A'_{feasible}$.

Step 9 Increment the generation count $G$ and update the solution space $(\Phi', A') \to (\Phi, A)$.

Step 10 If the best solution $A_{best}$ is larger than $A_{accept}$ and the generation $G$ is less than $G_{stop}$, go to Step 1.
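One generation of the procedure above can be sketched as follows. This is a simplified sketch under stated assumptions: solution vectors are plain lists, the population size is a multiple of four so that the survivors pair up evenly, and cost, mutation and the feasibility algorithm are supplied as callables. The function names and the toy problem (integer vectors, clamping as "repair", sum of squares as "area") are illustrative and not taken from the original implementation.

```python
import random

def crossover(a, b, rng):
    """Single-point crossover of two equal-length solution vectors."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def next_generation(population, cost, mutate, repair, chi=0.01, rng=random):
    """Keep the fittest half, then refill with crossed-over children."""
    ranked = sorted(population, key=lambda phi: cost(repair(phi)))
    survivors = ranked[:len(ranked) // 2]      # lowest area cost
    rng.shuffle(survivors)                     # random parent pairing
    new_population = []
    for a, b in zip(survivors[0::2], survivors[1::2]):
        children = crossover(a, b, rng)
        children = [mutate(c, rng) if rng.random() < chi else c
                    for c in children]
        new_population += [a, b, *children]    # parents and children
    return new_population

# Toy stand-ins, not from the text.
def toy_cost(phi):
    return sum(x * x for x in phi)

def toy_repair(phi):
    return [min(max(x, -5), 5) for x in phi]

def toy_mutate(phi, rng):
    out = list(phi)
    out[rng.randrange(len(out))] += rng.choice((-1, 1))
    return out

def best_cost(population):
    return min(toy_cost(toy_repair(phi)) for phi in population)

rng = random.Random(1)
population = [[rng.randint(-5, 5) for _ in range(4)] for _ in range(8)]
before = best_cost(population)
for generation in range(20):
    population = next_generation(population, toy_cost, toy_mutate,
                                 toy_repair, rng=rng)
after = best_cost(population)
print(before, after)
```

Because the parents are carried over unchanged, the best cost in the population can never get worse from one generation to the next; the crossover and the rare mutation supply the new candidates.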

The algorithm works by first deleting the most unfit half of the population. Then, for pairs of two survivors, we select a random cross point and perform the crossover, thereby producing two new children. We then occasionally, at random, add a mutation to the children. The children are then made feasible (in the same way as for simulated annealing), the cost functions are evaluated and they are put into the new population. The fundamental difference between local search/simulated annealing and the evolutionary algorithm is the use of a population of solutions in the latter. The deletion of the most unfit half in principle works as the downhill-moving part, with the crossover and mutation as the potential downhill/uphill-moving part. Initially the algorithm is started with a set of random solutions, made feasible and evaluated. The mutation rate is included in evolutionary algorithms to prevent the entire population from converging to a single collection of similar solutions. The mutation rate should not be the principal solution-space exploration method of the algorithm and should be very low; we chose $\chi = 0.01$. The generation count terminates the main loop if more than $G_{stop}$ generations have passed. In the following chapter we determine both the population size and the $G_{stop}$ parameter.

Module   Oprs        Area       Time-slots   E/time-slot [nJ]
add      {+}         2032.75    1            0.0266
sub      {−}         2032.75    1            0.0266
comp     {>}         2032.75    1            0.0266
ALU      {+, −, >}   2965.00    1            0.0266
mul1     {∗}         41978.50   3            0.1046
mul2     {∗}         28414.50   6            0.0523
mul3     {∗}         14638.75   17           0.0319
input    i           43.00      1            0.0
output   o           43.00      1            0.0

Table 6.2: 16-bit functional unit library based on balsa-cost numbers, available to the synthesis algorithm.

Figure 6.10: (Left) Partition of our CDFG into DFG fragments. (Right) The corresponding task graph to the partition of the CDFG.

In document Behavioral synthesis of asynchronous circuits (Page 96-102)
