
Simulated annealing and evolutionary algorithm

In this section we investigate two meta-heuristic algorithms for solving the behavioral synthesis problem: (i) simulated annealing and (ii) evolutionary algorithms [78, 42, 79, 66, 43, 32, 52]. Meta-heuristic algorithms are interesting in this context because large DFGs can be scheduled with fast run-times. Furthermore, they can easily be stopped early if the optimal solution is not required, but just a solution which falls within the area requirement. The power constraint has not yet been implemented in these algorithms.

For these algorithms we target DFG fragments to be scheduled and a time constraint which specifies the maximum number of control steps allowed for the execution of the DFG fragment. The DFGs considered here are acyclic directed graphs with vertices $\sigma_i$, representing the operators to be executed, and edges $\sigma_i \to \sigma_l$, specifying the order in which they have to be executed for the computation to be correct ($\sigma_i$ has to be executed before $\sigma_l$). The DFG is augmented with a source vertex (connecting to inputs, I) and a target vertex (connecting from outputs, O). To execute operations we use the same resource library of functional units, defined in Table 6.2.

With the hard time frame constraint we need to find a schedule in which to execute the operations in the DFG onto some FUs such that we finish all operators before the time frame $T$ (without violating their dependencies) and at the same time minimize the area. This involves trade-offs between scheduling e.g. many $\{+, -, >\}$ operations in parallel (requiring more cheap ALUs) and serializing more $\{*\}$ operations (requiring fewer expensive mul1), as well as trade-offs between different subtypes of FUs (fast or slow). All this depends strongly on the specific DFG and the time frame $T$ we have available.
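The dependence on the time frame can be illustrated with a small calculation over the multiplier variants of Table 6.2. This is only a sketch under strong simplifying assumptions: the multiplications are taken to be independent of each other (no DFG dependencies), the garbled mul1 area is read as 41978.50, and the function names are hypothetical.

```python
import math

# Areas and latencies (in time slots) of the multiplier variants in Table 6.2.
MULTIPLIERS = {
    "mul1": {"area": 41978.50, "slots": 3},
    "mul2": {"area": 28414.50, "slots": 6},
    "mul3": {"area": 14638.75, "slots": 17},
}

def multiplier_area(num_ops, time_frame, fu):
    """Area needed to run num_ops independent multiplications on units of
    type fu within time_frame control steps (None if not even one fits)."""
    per_unit = time_frame // MULTIPLIERS[fu]["slots"]  # ops one unit can serialize
    if per_unit == 0:
        return None
    units = math.ceil(num_ops / per_unit)
    return units * MULTIPLIERS[fu]["area"]

def cheapest(num_ops, time_frame):
    """Return (fu, area) with the smallest area among the variants that fit."""
    options = {fu: multiplier_area(num_ops, time_frame, fu) for fu in MULTIPLIERS}
    feasible = {fu: a for fu, a in options.items() if a is not None}
    return min(feasible.items(), key=lambda item: item[1])
```

Under these assumptions, four multiplications in a tight time frame favor a single fast mul1, while a relaxed time frame lets one slow mul3 serialize all of them at a fraction of the area.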

6.3.1 Problem formulation

First, we formulate the behavioral synthesis problem as an ILP problem. We have a DFG with operators $\sigma_i$, $i = 1 \ldots n$, and dependencies $\sigma_i \to \sigma_l$, and a resource library with functional units of type $FU_j$, $j = 1 \ldots m$, each having a silicon area $w_j$. We also have a time interval $k = 1 \ldots T$, giving for each operator $\sigma_i$ a time interval in which it can be scheduled: $S_i \ldots L_i$. We want to minimize the used silicon area. Let us start by introducing the variables in our formulation:

x: Let $x_{i,j,k}$ be a 0-1 integer variable associated with the operator $\sigma_i$: $x_{i,j,k} = 1$ if $\sigma_i$ is scheduled to start in time step $k$ and assigned to execute on $FU_j$, and $x_{i,j,k} = 0$ otherwise.

N: Let $N_j$ be an integer variable which denotes the number of functional units of type $FU_j$ we will allocate on our IC.

The objective function is:

$$\text{minimize} \quad A = \sum_{j=1}^{m} w_j N_j \qquad (6.2)$$

The objective function (equ. 6.2) states that we want to minimize the total used silicon area: it sums over all functional unit types and for each multiplies its area by the number required for the schedule. The first constraint (equ. 6.3) simply states that all operators must be scheduled to start in some time step and on some $FU_j$. The second constraint (equ. 6.4) specifies that for each DFG dependency $\sigma_i \to \sigma_l$, operator $l$ can only start after operator $i$ finishes, $t_l \ge d_j + t_i$ (which depends on which FU $i$ is scheduled on). The third constraint (equ. 6.5) states that an FU can only execute one operation at a time. The final constraint (equ. 6.6) ensures that nowhere is more power used than is available. This last constraint will be ignored in

Figure 6.9: Crossing from one island of the solution space to another by keeping the infeasible solutions when the perturbation is smaller than the minimum required distance. The sequence of $\phi_j$'s indicated by the dots are the actual solutions, and the sequence of $F(\phi_j) = A_j$ indicated by the crosses corresponds to the feasible solutions from which the area cost function is computed. (Figure labels: Feasible, Infeasible, Perturbation, cost gradient; costs $A_i$, $A_{i+1}$, $A_{i+2}$, $A_{i+3}$ at solutions $i$ to $i+3$.)

6.3.2 Representation and feasibility

We use a solution vector containing $n$ tuples (one for each operator), each consisting of the pair $(k_i, j_i)$, where $k_i$ is the time step in which operator $i$ starts and $j_i$ is the FU type to execute it on ($k_i \in S_i \ldots L_i$ and $j : \sigma_i \in FU_j$). Let the schedule be defined by:

$$\phi = [(k_1, j_1), (k_2, j_2), \ldots, (k_n, j_n)]$$

In both simulated annealing and evolutionary algorithms we will likely produce (and start with) solutions which are infeasible, where infeasible means that we are violating DFG dependencies. We therefore need to make the solution feasible, $\phi \to \phi'$.

We also use this feasibility algorithm to allow for easy crossing over regions of infeasible solutions, as illustrated in Figure 6.9. We keep the infeasible solution, but compute its cost by making the solution feasible and then computing the cost of the resulting feasible solution. This requires, however, that the feasibility algorithm is deterministic, such that the best (feasible) solution can be regenerated from a possibly infeasible best solution. This is a better approach than working with a penalty function or removing the infeasible solutions.
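As an illustration of the cost evaluation, the area of a feasible schedule can be computed directly from the solution vector. The sketch below is not taken from the original implementation: the function name and argument layout are hypothetical, and it assumes that the number of units $N_j$ of each type equals the peak number of operators simultaneously executing on that type (which follows from each FU executing one operation at a time).

```python
from collections import defaultdict

def area_cost(schedule, delay, area):
    """Total area A = sum_j w_j * N_j of a feasible schedule.

    schedule: list of (k_i, j_i) pairs, start step and FU type per operator
    delay:    dict FU type -> d_j, execution time in time steps
    area:     dict FU type -> w_j, silicon area of one unit
    """
    busy = defaultdict(int)  # (FU type, time step) -> units occupied
    for start, fu in schedule:
        for step in range(start, start + delay[fu]):
            busy[(fu, step)] += 1
    units = defaultdict(int)  # N_j: peak simultaneous use of type j
    for (fu, _step), count in busy.items():
        units[fu] = max(units[fu], count)
    return sum(area[fu] * n for fu, n in units.items())
```

For an infeasible solution vector, the cost described above would be obtained by first applying the feasibility algorithm and then calling such a function on the result.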

First, let us revisit the ASAP algorithm. Before the algorithm starts, assume we assign an operator $\sigma_i$ to a time step $t_i \in S_i \ldots L_i$, with $j_i$ equal to the fastest $FU_j$. The output is the earliest time $S'_l$ at which the other operators $\sigma_l$ can be scheduled when $\sigma_i$ is scheduled in time step $k_i$. Only successors of $\sigma_i$ are affected: $S_l \le S'_l$.

Critical for this to be of any use is $S'_l \le L_l \;\forall l$. Assume that at some point we get $S'_l > L_l$ after assigning operator $r$ to time step $t_r$ ($\in S_r \ldots L_r$, $S_r \le L_r$). Let $p$ be the longest path $\sigma_r \to \sigma_l$ and $q$ the longest path $\sigma_l \to \sigma_r$ (going 'backwards'): $S'_l \ge t_r + |p|$ and $L_r \le L_l - |q|$. Since the DFG is acyclic, $|p| = |q|$, so $S'_l \ge t_r + |p|$ and $L_r + |p| \le L_l$. Therefore, if $S'_l > L_l \Leftrightarrow t_r + |p| > L_r + |p|$, or $t_r > L_r$, which is a contradiction.

The same applies to the ALAP algorithm, and by running both algorithms in succession we reduce the time intervals for all other operators $\sigma_l$: $k_l \in S'_l \ldots L'_l$, with $S_l \le S'_l$, $S'_l \le L'_l$, $L'_l \le L_l$.

Up until now we have assumed that $j_i$ was assigned to the fastest FU. The available delay is the minimal $L_l$ time of its successors $\sigma_l$ minus the start time: $delay_i = \min\{L_l\} - k_i$. So any $FU_j$ with $d_j \le delay_i$ can be chosen.
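The interval computation and the available delay can be sketched as follows. This is a minimal illustration, not the original implementation: the names `asap_alap` and `available_delay` are hypothetical, time steps are assumed to be numbered $1 \ldots T$, and an operator starting in step $k$ with delay $d$ is assumed to let its successors start in step $k + d$ at the earliest (matching $t_l \ge d_j + t_i$).

```python
from graphlib import TopologicalSorter

def asap_alap(preds, delay, T):
    """preds: dict operator -> list of predecessors (every operator is a key).
    delay: dict operator -> execution time on its fastest FU.
    Returns dicts S (earliest start) and L (latest start), time steps 1..T."""
    order = list(TopologicalSorter(preds).static_order())  # predecessors first
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    S = {}
    for v in order:  # ASAP: start right after the slowest-finishing predecessor
        S[v] = max((S[p] + delay[p] for p in preds[v]), default=1)
    L = {}
    for v in reversed(order):  # ALAP: finish just before the earliest successor deadline
        L[v] = min((L[s] for s in succs[v]), default=T + 1) - delay[v]
    return S, L

def available_delay(v, start, preds, L, T):
    """delay_v = min{L_l over successors l} - start (T + 1 stands in for sinks)."""
    succ_L = [L[s] for s, ps in preds.items() if v in ps]
    return min(succ_L, default=T + 1) - start
```

For a chain add, multiply, add with $T = 8$ and the multiplication started in step 2, the available delay is 6, so under these assumptions either mul1 or mul2 from Table 6.2 could be chosen, but not mul3.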

The algorithm for feasibility is as follows:

Initial Set $\phi'$ empty.

Step 1 Pick an unscheduled operator $\sigma_r$ in $\phi$.

Step 2 Schedule $\sigma_r$ in time step: $\phi'.k_r = \phi.k_r$.

Step 3 Compute $delay_r = \min\{L_l\} - k_r$.

Step 4 If $\phi.j_r \le delay_r$: $\phi'.j_r = \phi.j_r$, else assign $\phi'.j_r = j$ ($j$ is the one with the slowest allowable execution), where $\sigma_r \in FU_j$ and $d_j \le delay_r$.

Step 5 ASAP (update $S_l \to S'_l$).

Step 6 ALAP (update $L_l \to L'_l$).

Step 7 For all unscheduled operators $\sigma_l$ in $\phi$: if $\phi.k_l < S'_l$ set $\phi.k_l = S'_l$, and if $\phi.k_l > L'_l$ set $\phi.k_l = L'_l$.

Step 8 If any unscheduled operators remain in $\phi$, go to step 1.

The algorithm works by iteratively scheduling operators one at a time, each time running ASAP and ALAP to reduce the valid time intervals of the unscheduled operators, so that a feasible schedule can be obtained. The algorithm is deterministic and has complexity $O(n^2)$.
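The steps above can be sketched in code as follows. This is an illustrative reading of the algorithm rather than the original implementation: the function name, the dictionary-based data layout and the `first` argument are assumptions, operators are picked in sorted order to keep the result deterministic, and the time frame $T$ is assumed to be large enough that $S_i \le L_i$ holds initially. Clamping an operator's start time only when it is picked gives the same result as clamping all unscheduled operators in every iteration, because the intervals only shrink.

```python
from graphlib import TopologicalSorter

def make_feasible(phi, preds, fus, T, first=None):
    """Return a feasible schedule phi' derived deterministically from phi.

    phi:   dict operator -> (k, j), start step and FU type (possibly infeasible)
    preds: dict operator -> list of predecessors (every operator is a key)
    fus:   dict operator -> {FU type: delay} for the types that can execute it
    T:     time frame; first: operator to schedule first (e.g. the perturbed one)
    """
    order = list(TopologicalSorter(preds).static_order())
    succs = {v: [s for s in preds if v in preds[s]] for v in preds}
    d = {v: min(fus[v].values()) for v in preds}  # fastest FU until scheduled
    done = {}                                     # phi'

    def bounds():
        S, L = {}, {}
        for v in order:                           # ASAP
            S[v] = done[v][0] if v in done else max(
                (S[p] + d[p] for p in preds[v]), default=1)
        for v in reversed(order):                 # ALAP
            L[v] = done[v][0] if v in done else min(
                (L[s] for s in succs[v]), default=T + 1) - d[v]
        return S, L

    picks = sorted(preds)                         # fixed order: deterministic
    if first is not None:
        picks.remove(first)
        picks.insert(0, first)
    S, L = bounds()
    for r in picks:                               # Steps 1 and 8
        k_r = min(max(phi[r][0], S[r]), L[r])     # Steps 7 and 2
        slack = min((L[s] for s in succs[r]), default=T + 1) - k_r  # Step 3
        j_r = phi[r][1]
        if fus[r][j_r] > slack:                   # Step 4: slowest FU that fits
            allowed = [j for j in fus[r] if fus[r][j] <= slack]
            j_r = max(allowed, key=lambda j: fus[r][j])
        d[r] = fus[r][j_r]
        done[r] = (k_r, j_r)
        S, L = bounds()                           # Steps 5 and 6
    return done
```

Passing the perturbed operator as `first` schedules it before all others, so its new start time is kept whenever it lies inside the operator's valid interval.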

6.3.3 Simulated annealing

The simulated annealing algorithm is a meta-heuristic algorithm for solving ILP problems which borrows from the physical model of near-adiabatic crystallization, i.e. the formation of a perfect crystal lattice.

Simulated annealing algorithm:

Initial Generate an initial feasible solution vector $\to \phi$ and compute its area cost $A$.

Step 1 Perturb $\phi$ by randomly moving an operator in time and changing its FU assignment $\to \phi'$.

Step 2 Generate a feasible solution from the perturbed solution vector: $F(\phi') \to \phi'_{feasible}$.

Step 3 Compute the area cost of $\phi'_{feasible} \to A'$.

Step 4 If the new cost is smaller than that of the existing solution ($A' < A$), accept the new solution $\phi'$; otherwise conditionally accept $\phi'$ if $\exp(-(A' - A)/Temp) > random(1)$ is true.

Step 5 Update the solution space $(\phi', A', Temp') \to (\phi, A, Temp)$ and, while not in thermal equilibrium, go to step 1.

Step 6 Reduce the temperature exponentially: $Temp = \alpha\, Temp$, with $0 < \alpha < 1$.

Step 7 If the temperature $Temp$ is larger than $Temp_{crystal}$ (the stopping temperature) and $A$ is larger than $A_{accept}$, go to step 1.

In the iteration step a random operator $\sigma_i$ is chosen and random (acceptable) values are inserted for both $k_i$ and $j_i$. Then the schedule is made feasible, starting with scheduling $\sigma_i$ and then scheduling the rest. In this way we ensure that the perturbation survives the feasibility process. Then, depending on the cost and the temperature, we accept this new schedule or not. The fundamental difference between simulated annealing and local search lies in the ability, at high temperatures, to move uphill, i.e. to accept solutions which are less optimal (as well as always moving downhill, i.e. accepting more optimal solutions). This is handled by the accept function, which maintains the Boltzmann distribution from statistical mechanics. Initially the algorithm is started with a random solution which is made feasible. The thermal equilibrium condition repeats the inner loop a certain number of times; this number is determined in the following chapter.
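The loop described above can be sketched generically as follows. This is a hedged illustration, not the original implementation: `perturb`, `make_feasible` and `cost` are caller-supplied placeholders, the values of `alpha` and `inner_loops` are arbitrary assumptions (the text determines them in the following chapter), and the thermal equilibrium condition is modeled as a fixed number of inner iterations.

```python
import math
import random

def simulated_annealing(phi, perturb, make_feasible, cost, alpha=0.9,
                        temp=10000.0, temp_crystal=1.0, inner_loops=50,
                        a_accept=0.0, rng=None):
    """Return (best feasible solution, its cost). The working solution phi
    may be infeasible; its cost is always that of its feasible counterpart."""
    rng = rng or random.Random(0)
    area = cost(make_feasible(phi))
    best, best_area = phi, area
    while temp > temp_crystal and area > a_accept:
        for _ in range(inner_loops):                  # thermal equilibrium loop
            cand = perturb(phi, rng)                  # perturb the current solution
            cand_area = cost(make_feasible(cand))     # cost of its feasible version
            # Always move downhill; move uphill with Boltzmann probability.
            if cand_area < area or \
                    math.exp(-(cand_area - area) / temp) > rng.random():
                phi, area = cand, cand_area
                if area < best_area:
                    best, best_area = phi, area
        temp *= alpha                                 # exponential cooling
    # The feasibility step is deterministic, so the best feasible
    # solution can be regenerated from the stored (infeasible) one.
    return make_feasible(best), best_area
```

Because uphill moves may be accepted, the sketch keeps track of the best solution seen so far instead of returning the last accepted one.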

$Temp_{crystal}$ stops the algorithm if the temperature comes down to 1. It can be shown mathematically that by selecting the correct temperature function specific to the problem, the simulated annealing algorithm will find the optimal solution. However, the time spent on finding the optimal solution can be shown to be equal to or larger than the time needed to perform an exhaustive search. We set the start temperature to 10000, and it can be shown that an adiabatic cool-off in temperature corresponds to an exponential temperature decay, i.e. the new temperature is generated by $Temp = \alpha\, Temp$ with $0 < \alpha < 1$. We determine the appropriate value for $\alpha$ in the following chapter.
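As a worked illustration of the exponential decay, the number of temperature reductions $K$ needed to cool from the start temperature to $Temp_{crystal}$ follows directly from the update rule. The value $\alpha = 0.9$ below is purely an assumed example, not the value chosen in the following chapter.

```latex
Temp_K = \alpha^{K}\, Temp_0 \le Temp_{crystal}
\quad\Longleftrightarrow\quad
K \ge \frac{\ln\left(Temp_0 / Temp_{crystal}\right)}{\ln(1/\alpha)}

% Example with Temp_0 = 10000, Temp_{crystal} = 1 and an assumed \alpha = 0.9:
K \ge \frac{\ln 10^{4}}{\ln(1/0.9)} \approx \frac{9.2103}{0.10536} \approx 87.4
\quad\Longrightarrow\quad K = 88
```

With these numbers the outer loop runs 88 times, so the total number of perturbations is 88 times the length of the thermal equilibrium loop.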

6.3.4 Evolutionary algorithm

The evolutionary algorithm approach is a meta-heuristic algorithm for solving ILP problems which is biologically inspired and implements the concept of survival of the fittest.

Evolutionary algorithm:

Initial Generate an initial set of feasible solution vectors $\to \Phi = \{\phi\}$, the population, compute their respective area costs $A = \{A\}$, and set the generation count to zero, $G = 0$.

Step 1 Extract the half of the population $\Phi$ with the lowest area cost $\to \Phi_{1/2}$ (the survivors) and set $\Phi' = \emptyset$.

Step 2 Select two elements from $\Phi_{1/2} \to \{\phi_a, \phi_b\}$, the parent solution vectors, and remove the elements from the set: $\Phi_{1/2} \setminus \{\phi_a, \phi_b\} \to \Phi_{1/2}$.

Step 3 Select a random crossover position and form two new solution vectors $\{\phi_a, \phi_b\} \to \{\psi, \varphi\}$, the child solution vectors.

Step 4 Mutate $\{\psi, \varphi\}$ by randomly moving an operator in time and changing its FU assignment $\to \{\psi', \varphi'\}$, using a low probability $\chi$ for mutating the solution vectors.

Step 5 Add the parent and the child solution vectors to the new population: $\Phi' + \{\phi_a, \phi_b, \psi', \varphi'\} \to \Phi''$.

Step 6 Update the solution sets

1 2

Step 7 Generate feasible solutions from the perturbed solution vectors in $\Phi'$: $F(\Phi'_{perturbed}) \to \Phi'_{feasible}$.

Step 8 Compute the area cost of $\Phi'_{feasible} \to A'_{feasible}$.

Step 9 Increment the generation count $G$ and update the solution space $(\Phi', A') \to (\Phi, A)$.

Step 10 If the best solution $A_{best}$ is larger than $A_{accept}$ and the generation $G$ is less than $G_{stop}$, go to Step 1.

The algorithm works by first deleting the most unfit half of the population. Then, for pairs of two survivors, we select a random cross point and perform the crossover, thereby producing two new children. Occasionally, at random, we add a mutation to the children. The children are then made feasible (in the same way as for simulated annealing), their cost functions are evaluated, and they are put into the new population. The fundamental difference between local search/simulated annealing and the evolutionary algorithm is the use of a population of solutions in the latter. The deletion of the most unfit half in principle works as the downhill-moving part, with the crossover and mutation as the potential downhill/uphill-moving part. Initially the algorithm is started with a set of random solutions, made feasible and evaluated. The mutation rate is included in evolutionary algorithms to prevent the entire population from converging to a single collection of similar solutions. The mutation rate should not be the principal solution space exploration method of the algorithm and should be very low; we chose $\chi = 0.01$. The generation count terminates the main loop if more than $G_{stop}$ generations have passed. In the following chapter we determine both the population size and the $G_{stop}$ parameter.
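A generic sketch of this generation loop is given below. It is an illustration under stated assumptions, not the original implementation: `make_feasible`, `cost` and `mutate` are caller-supplied placeholders, solution vectors are plain Python lists with at least two entries, the population size is assumed to be a multiple of four so that the survivors pair up evenly and the population size stays constant, and the default `g_stop` is arbitrary.

```python
import random

def evolve(population, make_feasible, cost, mutate, chi=0.01, g_stop=100,
           a_accept=0.0, rng=None):
    """Return (best feasible solution, its cost) after at most g_stop generations."""
    rng = rng or random.Random(0)

    def score(phi):
        # Cost of a (possibly infeasible) vector is that of its feasible version.
        return cost(make_feasible(phi))

    population = sorted(population, key=score)
    for _ in range(g_stop):
        if score(population[0]) <= a_accept:
            break
        survivors = population[: len(population) // 2]  # drop the most unfit half
        rng.shuffle(survivors)
        new_population = []
        for a, b in zip(survivors[0::2], survivors[1::2]):  # pair up parents
            cut = rng.randrange(1, len(a))                  # random crossover position
            children = [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            children = [mutate(c, rng) if rng.random() < chi else c
                        for c in children]                  # rare mutation
            new_population += [a, b] + children             # parents and children
        population = sorted(new_population, key=score)
    return make_feasible(population[0]), score(population[0])
```

Since the parents are carried over into the new population, the best cost in this sketch never increases from one generation to the next.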

Module  Oprs       Area      Time-slots  E/time-slot [nJ]
add     {+}        2032.75   1           0.0266
sub     {−}        2032.75   1           0.0266
comp    {>}        2032.75   1           0.0266
ALU     {+, −, >}  2965.00   1           0.0266
mul1    {∗}        41978.50  3           0.1046
mul2    {∗}        28414.50  6           0.0523
mul3    {∗}        14638.75  17          0.0319
input   i          43.00     1           0.0
output  o          43.00     1           0.0

Table 6.2: 16-bit functional unit library based on balsa-cost numbers, available to the synthesis algorithm.

Figure 6.10: (Left) Partition of our CDFG into DFG fragments. (Right) The corresponding task graph to the partition of the CDFG.