
INTEGRATED CODE GENERATION. C. Kessler, IDA, Linköpings Universitet, 2002.

INTEGRATED CODE GENERATION

Introduction to code generation

Instruction selection

Register allocation

Instruction scheduling

Interdependences

Phase ordering problems for irregular architectures (DSPs)

Integrated code generation approaches

Compiler structure

[Figure: compiler pipeline. Source program -> front end -> IR1 (AST) -> CFG generation -> IR2 (CFG) -> instruction selection -> IR3 (symbolic registers) -> register allocation and instruction scheduling -> assembler program, with optimizations ("opt.") on the intermediate representations. Instruction selection, register allocation, and instruction scheduling form the CODE GENERATION part of the COMPILER. Insets: a control flow graph of basic blocks from "main" to "exit"; a basic block with the statements t1 = a+1; t2 = 2*t1; t3 = t1+t2; x[3] = t3; and a directed acyclic graph (DAG) with LOAD, CONST, ADD, MUL, SUB, and STORE nodes over the temporaries t1 to t9.]

INSTRUCTION SELECTION

Naive instruction selection: for each LIR operation, generate a sequence of equivalent target instructions.

Example: { int i; char c; i=c+4; }

Frame layout: i at fp+4, c at fp+8, d at fp+12.

LIR tree:

    ASGNI
      ADDRLP i
      ADDI
        CVCI
          INDIRC
            ADDRLP c
        CNSTI 4

Naive instruction selection, arranged by a postorder traversal:

    addi  fp,#4,s1      ! ADDRLP(i)
    addi  fp,#8,s2      ! ADDRLP(c)
    load  0(s2),s3      ! INDIRC
    nop                 ! ", delay slot
                        ! CVCI
    addi  R0,#4,s4      ! CNSTI
    addi  s3,s4,s5      ! ADDI
    store s5,0(s1)      ! ASGNI

(needs 7 cycles, 5 sregs)
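The naive scheme can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Node` class, the function name `naive_select`, and the way symbolic registers are numbered are hypothetical, and only the six LIR operators of the example are handled.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Node:
    op: str                                  # LIR operator, e.g. "ADDI"
    kids: list = field(default_factory=list)
    value: int = 0                           # frame offset (ADDRLP) or constant (CNSTI)

def naive_select(root):
    """Emit one fixed instruction sequence per LIR node, in postorder."""
    code, fresh = [], count(1)

    def visit(n):
        regs = [visit(k) for k in n.kids]    # operands first (postorder)
        if n.op == "CVCI":                   # conversion costs nothing here
            return regs[0]
        if n.op == "ASGNI":
            code.append(f"store {regs[1]},0({regs[0]})")
            return None
        r = f"s{next(fresh)}"                # fresh symbolic register
        if n.op == "ADDRLP":
            code.append(f"addi fp,#{n.value},{r}")
        elif n.op == "CNSTI":
            code.append(f"addi R0,#{n.value},{r}")
        elif n.op == "INDIRC":
            code.extend([f"load 0({regs[0]}),{r}", "nop"])   # nop fills the delay slot
        elif n.op == "ADDI":
            code.append(f"addi {regs[0]},{regs[1]},{r}")
        else:
            raise ValueError(f"unknown LIR operator {n.op}")
        return r

    visit(root)
    return code

# LIR tree for i = c + 4, with i at fp+4 and c at fp+8
tree = Node("ASGNI", [
    Node("ADDRLP", value=4),
    Node("ADDI", [Node("CVCI", [Node("INDIRC", [Node("ADDRLP", value=8)])]),
                  Node("CNSTI", value=4)])])
code = naive_select(tree)
print("\n".join(code))
```

For the example tree this yields seven instructions and uses the symbolic registers s1 to s5, in line with the cycle and register counts stated on the slide.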

Instruction selection by Tree-pattern-matching (1)

The LIR often has a finer granularity than the target instructions!
-> pattern matching

Example: { int i; char c; i=c+4; }

[Figure: the LIR tree ASGNI(ADDRLP i, ADDI(CVCI(INDIRC(ADDRLP c)), CNSTI 4)) covered by three patterns; the two inner patterns each deliver their result in a reg.]

Pattern-matching instruction selection, arranged by a postorder traversal:

    load  8(fp),s1      ! ADDRLP(c)+INDIRC+CVCI
    nop                 ! ", delay slot
    addi  s1,#4,s2      ! CNSTI+ADDI
    store s2,4(fp)      ! ADDRLP(i)+ASGNI

(needs 4 cycles, 2 sregs)

Instruction selection by Tree-pattern-matching

The target processor is described by a tree grammar G = (N, T, s, P)

nonterminals N = { stmt, reg, con, addr, mem, ... }
(start symbol is stmt)

terminals T = { CNSTI, ADDRLP, ... }

production rules P:

    rule                               emitted code          cost
    reg  -> ADDI(reg,con)              addi %r,%c,%r         1
    reg  -> ADDI(reg,reg)              addi %r,%r,%r         1
    stmt -> ASGNI(addr,reg)            store %r,%a           1
    stmt -> ASGNI(reg,reg)             store %r,0(%r)        1
    reg  -> ADDRLP                     addi fp,#%d,%r        1
    addr -> ADDRLP                     %d(fp)                0
    reg  -> addr                       addi %a,%r            1
    reg  -> INDIRC(addr)               load %a,%r; nop       2
    reg  -> INDIRC(reg)                load 0(%r),%r; nop    2
    reg  -> CVCI(INDIRC(addr))         load %a,%r; nop       2
    reg  -> CVCI(INDIRC(reg))          load 0(%r),%r; nop    2
    con  -> CNSTI                      %d                    0
    reg  -> con                        addi R0,#%c,%r        1

Instruction selection by Tree-pattern-matching

Find a least-cost derivation of the LIR tree

[Figure: stepwise derivation of the LIR tree ASGNI(ADDRLP, ADDI(CVCI(INDIRC(ADDRLP)), CNSTI)). The two ADDRLP leaves are reduced to addr (cost 0) and CNSTI to con (cost 0); CVCI(INDIRC(addr)) is reduced to reg (cost 2); ADDI(reg,con) is reduced to reg (cost 1); ASGNI(addr,reg) is reduced to stmt (cost 1).]

Tree pattern matching by dynamic programming (example: IBURG)

[Figure: the LIR tree ASGNI(ADDRLP i, ADDI(CVCI(INDIRC(ADDRLP c)), CNSTI)), labeled bottom-up, one node per build step, with the matching rules and their accumulated costs:]

    node                  matching rule                  cost
    ADDRLP i, ADDRLP c    addr -> ADDRLP                 0
                          reg  -> ADDRLP                 1
                          reg  -> addr                   1
    INDIRC                reg  -> INDIRC(addr)           2+0 = 2
                          reg  -> INDIRC(reg)            2+1 = 3
    CVCI                  reg  -> CVCI(INDIRC(addr))     2
                          reg  -> CVCI(INDIRC(reg))      3
    CNSTI                 con  -> CNSTI                  0
                          reg  -> con                    1
    ADDI                  reg  -> ADDI(reg,con)          1+2+0 = 3
                          reg  -> ADDI(reg,reg)          1+2+1 = 4
    ASGNI                 stmt -> ASGNI(addr,reg)        1+0+3 = 4
                          stmt -> ASGNI(reg,reg)         1+1+3 = 5

Dynamic programming bottom-up tree pattern matching:
TWIG [Aho/Ganapathi/Tjiang ’89], IBURG [Fraser/Hanson/Proebsting ’92]

Alternatively: tree parsing using LR-parsing [Graham/Glanville ’78]
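The bottom-up labeling can be sketched as follows. This is a simplified illustration, not the IBURG implementation: the rule encoding, the `Node` class, and the function names `match_cost`, `label`, and `cover` are assumptions made for this sketch. The rules and costs are the ones from the tree grammar of the example target.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str
    kids: list = field(default_factory=list)
    best: dict = field(default_factory=dict)     # nonterminal -> (cost, rule index)

# (left-hand side, pattern, cost). A pattern is a nonterminal name (chain rule)
# or a tuple (operator, subpattern, ...).
RULES = [
    ("reg",  ("ADDI", "reg", "con"), 1),
    ("reg",  ("ADDI", "reg", "reg"), 1),
    ("stmt", ("ASGNI", "addr", "reg"), 1),
    ("stmt", ("ASGNI", "reg", "reg"), 1),
    ("reg",  ("ADDRLP",), 1),
    ("addr", ("ADDRLP",), 0),
    ("reg",  "addr", 1),
    ("reg",  ("INDIRC", "addr"), 2),
    ("reg",  ("INDIRC", "reg"), 2),
    ("reg",  ("CVCI", ("INDIRC", "addr")), 2),
    ("reg",  ("CVCI", ("INDIRC", "reg")), 2),
    ("con",  ("CNSTI",), 0),
    ("reg",  "con", 1),
]

def match_cost(pat, node):
    """Summed cost of the operand derivations if pat matches at node, else None."""
    if isinstance(pat, str):                     # nonterminal: use the node's label
        return node.best[pat][0] if pat in node.best else None
    if pat[0] != node.op or len(pat) - 1 != len(node.kids):
        return None
    total = 0
    for sub, kid in zip(pat[1:], node.kids):
        c = match_cost(sub, kid)
        if c is None:
            return None
        total += c
    return total

def label(node):
    """Bottom-up pass: record the cheapest rule per nonterminal at every node."""
    for kid in node.kids:
        label(kid)
    changed = True
    while changed:                               # repeat until chain rules settle
        changed = False
        for i, (lhs, pat, cost) in enumerate(RULES):
            sub = match_cost(pat, node)
            if sub is not None and (lhs not in node.best
                                    or sub + cost < node.best[lhs][0]):
                node.best[lhs] = (sub + cost, i)
                changed = True
    return node

def cover(node, nt, out):
    """Top-down pass: collect the rule indices of the least-cost derivation."""
    _, i = node.best[nt]
    def walk(pat, n):
        if isinstance(pat, str):
            cover(n, pat, out)
        else:
            for sub, kid in zip(pat[1:], n.kids):
                walk(sub, kid)
    walk(RULES[i][1], node)
    out.append(i)                                # operands before their consumer
    return out

tree = label(Node("ASGNI", [
    Node("ADDRLP"),
    Node("ADDI", [Node("CVCI", [Node("INDIRC", [Node("ADDRLP")])]),
                  Node("CNSTI")])]))
total_cost = tree.best["stmt"][0]
rules_used = cover(tree, "stmt", [])
print(total_cost, [RULES[i][:2] for i in rules_used])
```

On the example tree the cheapest derivation to stmt costs 4, which agrees with the 1+0+3=4 label at the ASGNI node. A generator such as IBURG produces the matcher from the grammar ahead of time rather than interpreting the rule list at every node as this sketch does.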

Conflicts with instruction scheduling and register allocation

+ The cost attribute of a production is only a rough estimate.
  The actual impact on execution time is only known for a given scheduling situation:
    + currently free functional units
    + other instructions that may be executed simultaneously
    + latency constraints due to previously scheduled instructions
  -> integration with instruction scheduling would be great!

+ Different instruction selections may result in different register need.

+ Mutations with different unit usage [Nicolau, Novack ’94]:
  a = 2*b is equivalent to a = b<<1 and a = b+b

REGISTER ALLOCATION

1. Register allocation
   determines values (variables, temporaries, large constants) to be kept in registers

2. Register assignment
   determines in which register an allocated value should reside

Values that are live simultaneously cannot be kept in the same register
-> interference graph

Requires live ranges to be known:
some fixed (pre-scheduled) linear sequence of code must be given!

+ local (usage counts / linear scan): linear time
+ global (graph coloring): NP-complete

Global register allocation by graph coloring

Example: i = c+4; d = c-2; c = c*i;

    load  8(fp),s1      ! c
    nop
    addi  s1,#4,s2
    subi  s1,#2,s3
    store s3,12(fp)     ! d
    muli  s1,s2,s4
    store s4,8(fp)      ! c
    store s2,4(fp)      ! i

[Figure: live ranges of fp and of the symbolic registers s1, s2, s3, s4 along the code, and the interference graph over s1 to s4, colored with the physical registers r1, r2, r3.]

Color the interference graph with R colors
+ NP-complete for R >= 3 [Garey/Johnson ’79]

Heuristics
+ degree < R rule [Chaitin ’81]
  if not R-colorable: spill / coalesce / rematerialize live ranges, try again
+ optimistic coloring (postponing spilling decisions) [Briggs ’92]
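A minimal sketch of the simplify/select scheme behind these heuristics, in Python. The function name `color` and the data layout are assumptions, and the interference edges below were derived by hand from the example listing (s1 live from the load to the muli, s2 from the addi to the last store, s3 and s4 from their definition to their store), assuming that s4 may reuse the register of s1 because s1 dies at the muli. Spill code generation, coalescing, and rematerialization are left out.

```python
def color(edges, nodes, R):
    """Try to color the interference graph with R colors.

    Returns (colors, spilled): a dict node -> color in 1..R and the list of
    nodes that received no color.
    """
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Simplify: repeatedly remove a node of degree < R (it can always be colored).
    work = {n: set(adj[n]) for n in nodes}
    stack = []
    while work:
        low = [n for n in sorted(work) if len(work[n]) < R]
        # If no node qualifies, remove a highest-degree node anyway and postpone
        # the spill decision to the select phase (optimistic coloring).
        n = low[0] if low else max(sorted(work), key=lambda m: len(work[m]))
        stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)

    # Select: reinsert nodes in reverse order, giving each the lowest free color.
    colors, spilled = {}, []
    while stack:
        n = stack.pop()
        used = {colors[m] for m in adj[n] if m in colors}
        free = [c for c in range(1, R + 1) if c not in used]
        if free:
            colors[n] = free[0]
        else:
            spilled.append(n)
    return colors, spilled

nodes = ["s1", "s2", "s3", "s4"]
edges = [("s1", "s2"), ("s1", "s3"), ("s2", "s3"), ("s2", "s4")]   # hand-derived
colors, spilled = color(edges, nodes, 3)
print(colors, spilled)
```

With three colors every symbolic register of the example receives a physical register. With only two colors the triangle s1, s2, s3 cannot be colored, and one of its nodes ends up in `spilled`.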


INSTRUCTION SCHEDULING (1)

Local scheduling
+ expression trees,
+ basic blocks (DAGs),
+ extended basic blocks

Loop scheduling
+ dependency analysis, loop transformation, loop parallelization
+ software pipelining

Global scheduling
+ branch delay slot filling
+ trace scheduling, percolation scheduling, region scheduling
+ speculation / multithreading

INSTRUCTION SCHEDULING (2)

Optimization goals:
+ execution time
+ space (registers, stack size)
+ power consumption

Techniques:
+ precedence constraints: dependence graph
+ algorithms for special cases
+ heuristics (list scheduling)
+ exhaustive search / CLP / Branch & Bound / dynamic programming
+ integer linear programming (ILP)

Local instruction scheduling and register allocation

[Figure: a DAG G with nodes a, b, c, d; a schedule S of G; the time slot allocation of S over time slots 1 to 5; and the register allocation of S, which places the temporaries t1, t2, t3 in the registers R1 and R2.]

Execution time: time(S) = 5
S is time-optimal? (time(S) <= time(S’) for all S’ of G)

Register need: regs(S) = 2
S is space-optimal (regs(S) <= regs(S’) for all S’ of G)

Optimization problems in local instruction scheduling

MRIS: minimum register need instruction scheduling
+ Spilling (store/reload) takes additional time
+ Power consumption in embedded processors increases with the number of memory accesses
+ Superscalar processors with shadow registers and register renaming:
  compiler-generated spill code cannot be eliminated at runtime
- NP-complete [Sethi ’75]

MTIS: minimum time instruction scheduling
+ hiding pipeline delays
+ exploiting instruction-level parallelism (for superscalar / VLIW)
- NP-complete [Garey/Johnson ’79, Gross ’83, Lawler et al. ’87]

RCMTIS: register-constrained minimum time instruction scheduling

SMRTIS: simultaneous minimization of space and time

Space-optimal scheduling strategies for DAGs

(a) based on depth-first-search traversal of the dependence DAG

Special case: tree: space-optimal schedule in linear time [Sethi, Ullman ’70]
Special case: vector tree (node size attribute): space-optimal in O(n log n) [Rauber ’90]
Special case: series-parallel DAG: space-optimal schedule in polynomial time [Güttler ’81]
Generally: contiguous schedules -> 2^n
+ Random dfs [K., Paul, Rauber ’91]
+ Enumeration with DC strategy [K., Rauber ’93, ’95]

[Figure: a node w with children v1 and v2 and the subDAGs V1 and V2 below them.]

(b) based on topological sorting of the DAG

General schedules -> n!
+ space-optimal (enumeration + dynamic programming) [K. ’96, ’98]
+ time-optimal (enumeration + dynamic programming) [K. ’00]
+ random list scheduling [K. ’00]

(c) based on integer linear programming [Govindarajan et al. ’99]

List Scheduling = Local Scheduling by Topological Sorting [Coffman ’76]

[Figure: DAG G divided into the already scheduled part scheduled(z) and the zero-indegree set z; selecting a node v from z, with successors u, leads to the new zero-indegree set z’.]

    topsort( Set z, int[] INDEG, int t )
      if z != {}                          // t <= n
        select arbitrary node v in z;
        // implicitly remove all edges (v,u):
        INDEG’(u) = INDEG(u) - 1   where (v,u) in E
        INDEG’(u) = INDEG(u)       elsewhere
        // update zero-indegree set:
        z’ = (z - {v}) + { new leaves } = (z - {v}) + { new u : INDEG’(u) = 0 }
        S(t) <- v;
        topsort( z’, INDEG’, t+1 );
      else output S(1:n)
      fi

Call topsort( z0, INDEG0, 1 ) produces a schedule in S(1:n)
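The topsort scheme translates almost directly into Python. The sketch below is iterative rather than tail-recursive, and the function name, the successor-dict representation (every node must appear as a key), and the pluggable `choose` parameter are assumptions for illustration. The four-node DAG is a made-up example.

```python
import random

def topsort(succ, choose=min):
    """Return one topological order of a DAG given as {node: [successors]}."""
    indeg = {v: 0 for v in succ}
    for v in succ:
        for u in succ[v]:
            indeg[u] += 1
    z = {v for v in succ if indeg[v] == 0}       # zero-indegree set z0
    schedule = []
    while z:
        v = choose(sorted(z))                    # selection heuristic plugs in here
        z.remove(v)
        schedule.append(v)                       # S(t) <- v
        for u in succ[v]:                        # implicitly remove edges (v, u)
            indeg[u] -= 1
            if indeg[u] == 0:
                z.add(u)                         # new leaves
    if len(schedule) != len(succ):
        raise ValueError("graph contains a cycle")
    return schedule

# Hypothetical DAG: d uses a and c, c uses b.
dag = {"a": ["d"], "b": ["c"], "c": ["d"], "d": []}
first = topsort(dag)                             # always picks the smallest name
rand = topsort(dag, choose=random.choice)        # random list scheduling
print(first, rand)
```

Replacing `choose` by a priority function (critical path length, register pressure, and so on) turns the same skeleton into the usual list scheduling heuristics.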

Greedy list scheduling for VLIW architectures

[Figure: a VLIW processor with the functional units MEM, ADD, SHIFT, and FMUL connected to a register file; a VLIW instruction word whose slots hold load, addi, NOP, NOP; and a DAG G with scheduled part scheduled(z) and zero-indegree set z containing load and addi.]

Greedy heuristic:
fill in one step as many slots in a VLIW word as possible
with ready instructions of the zero-indegree set.
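The greedy heuristic can be sketched as follows, under simplifying assumptions that are not taken from the slides: one slot per functional unit, every instruction bound to exactly one unit, fully pipelined units, and a per-instruction latency (default 1). The function name `greedy_vliw` and the five-instruction example are hypothetical; the unit names follow the figure.

```python
def greedy_vliw(succ, unit, slots, latency=None):
    """Greedy list scheduling: returns a list of VLIW words,
    each a dict {slot: instruction or "NOP"}."""
    latency = latency or {}
    if any(unit[v] not in slots for v in succ):
        raise ValueError("instruction bound to an unknown unit")
    indeg = {v: 0 for v in succ}
    for v in succ:
        for u in succ[v]:
            indeg[u] += 1
    ready_at = {v: 0 for v in succ}          # earliest cycle with all operands available
    z = {v for v in succ if indeg[v] == 0}   # zero-indegree set
    words, cycle = [], 0
    while z:
        word = {s: "NOP" for s in slots}
        issued = []
        for v in sorted(z):                  # fill as many slots as possible
            if ready_at[v] <= cycle and word[unit[v]] == "NOP":
                word[unit[v]] = v
                issued.append(v)
        for v in issued:                     # update z only after the word is complete
            z.remove(v)
            done = cycle + latency.get(v, 1)
            for u in succ[v]:
                ready_at[u] = max(ready_at[u], done)
                indeg[u] -= 1
                if indeg[u] == 0:
                    z.add(u)
        words.append(word)
        cycle += 1
    return words

slots = ["MEM", "ADD", "SHIFT", "FMUL"]
succ = {"ld1": ["add"], "ld2": ["add"], "shl": [], "add": ["mul"], "mul": []}
unit = {"ld1": "MEM", "ld2": "MEM", "shl": "SHIFT", "add": "ADD", "mul": "FMUL"}
words = greedy_vliw(succ, unit, slots, latency={"ld1": 2, "ld2": 2})
for w in words:
    print(w)
```

In the example the two loads compete for the single MEM slot, so the second load is delayed by one cycle, and the load latency of 2 forces one word that contains only NOPs before the add can issue.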

Phase-decoupled code generation

[Figure: diagram of the possible phase orderings between IR and target code. It is spanned by three phases, instruction selection, instruction scheduling, and register allocation, each of which can be carried out at IR level or at target level; every path from IR to target code applies the three phases one after the other.]

Phase-decoupled code generation in gcc, lcc, ...

[Figure: diagram of the possible phase orderings between IR and target code, spanned by instruction selection, instruction scheduling, and register allocation at IR level and at target level; the phase orderings used by gcc and by lcc are marked as paths.]

Phase ordering problem

Register allocation before scheduling
-> introduces additional data dependences
-> less parallelism / alternatives

Example: reg(t1) = reg(t3) = R1 implies "b before c", as c overwrites t1 in R1.

[Figure: a DAG with nodes a, b, c, d, temporaries t1, t2, t3, and delay = 1 on two edges; its register assignment to R1, R2, R1; and three schedules of a, b, c, d over time slots 1 to 5.]

Register allocation after scheduling
-> scheduling determines live ranges and thus interferences
-> spill code must be scheduled
-> may compromise quality of schedule!

[Figure: two orderings of the definitions and uses of a and b. In "a = ...; b = ...; ... = ..a..; ... = ..b.." the live ranges of a and b overlap; in "a = ...; ... = ..a..; b = ...; ... = ..b.." they do not.]

More phase ordering problems: Code generation for DSPs

Clustered VLIW architectures, e.g. TI C6201:

[Figure: two clusters. Register file A (A0-A15) with the units .L1, .S1, .M1, .D1 and register file B (B0-B15) with the units .L2, .S2, .M2, .D2, connected by the cross paths 1X and 2X; program cache / program memory and data cache / data memory.]

Simultaneously, e.g.: load on A, load on B, move A <-> B

+ mapping instructions to clusters
  may profit from information about free copy slots in the schedule
+ instruction scheduling
  must generate copy instructions
  to match residence of operands and instructions

Heuristic [Leupers ’00]: iterative optimization with simulated annealing

More phase ordering problems: Code generation for DSPs

Example: Hitachi SH3-DSP

[Figure: data paths between the register sets (A, M, X, and Y registers) and the load/store, add/sub, and mul units, shown for two parallel instruction combinations: add + mul and add + NOP.]

Residence constraints on concurrent execution (load + mul, add + mul, ...)

Instruction scheduling and register allocation are not separable!

Phase-decoupled standard methods generate code of poor quality.

Integrated code generation

[Figure: diagram of the possible phase orderings between IR and target code, spanned by instruction selection, instruction scheduling, and register allocation at IR level and at target level; integrated code generation is drawn as the direct path from IR to target code.]

Towards integrated code generation

Integration of register allocation and instruction scheduling

+ quantitative evaluation [Bradlee et al. ’91]

+ heuristics for space-aware scheduling
  [Goodman/Hsu ’88], [Freudenberger/Ruttenberg ’92], [Pinter ’93], ...

+ integer linear programming
  SILP [Zhang ’96] [Kästner ’97, ’00]: O(n^2) vars, O(n^2) inequalities
  OASIC [Gebotys/Elmasry ’92, ’93] [Kästner ’97, ’00]: O(n^2) vars, exponential number of inequalities if register allocation is integrated
  versatile, but optimal solution only for small problem instances

+ graph-based, dynamic programming algorithm [K. ’00] [K./Bednarski ’01]

Graph-based method: Naive Enumeration, Selection Tree (1)

[Figure: selection step from a zero-indegree set z to its successor z’ by selecting a node v of z; z0 = { set of all leaves of G }.]

Selection tree T
nodes = instances of zero-indegree sets
edge (z, z’), labeled by v, iff z’ = selection(v, z)

    alltopsort( Set z, int[] INDEG, int t )
      if z != {}                          // t <= n
        forall v in z do
          (z’, INDEG’) <- selection(v, z, INDEG);
          S(t) <- v;
          alltopsort( z’, INDEG’, t+1 );
        od
      else output S(1:n);
      fi

Call alltopsort( z0, INDEG0, 1 ) enumerates all topological sortings of DAG G

Run time: O(n · #enumerated schedules), which is in O(n · n!)
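A Python sketch of this enumeration, written as a generator. The function name `all_topsorts` and the successor-dict representation are assumptions; the four-node DAG is a made-up example. Each recursive call corresponds to one node of the selection tree.

```python
def all_topsorts(succ):
    """Yield every topological order of a DAG given as {node: [successors]}."""
    indeg = {v: 0 for v in succ}
    for v in succ:
        for u in succ[v]:
            indeg[u] += 1
    schedule = []

    def rec(z):                                  # z: current zero-indegree set
        if not z:
            yield tuple(schedule)
            return
        for v in sorted(z):                      # one selection tree edge per v
            new = set()
            for u in succ[v]:                    # selection(v, z): remove edges (v, u)
                indeg[u] -= 1
                if indeg[u] == 0:
                    new.add(u)
            schedule.append(v)
            yield from rec((z - {v}) | new)
            schedule.pop()                       # undo the selection
            for u in succ[v]:
                indeg[u] += 1

    yield from rec({v for v in succ if indeg[v] == 0})

# Hypothetical DAG: d uses a and c, c uses b.
dag = {"a": ["d"], "b": ["c"], "c": ["d"], "d": []}
orders = list(all_topsorts(dag))
print(orders)
```

The example DAG has three topological orders. For n independent nodes the generator would produce all n! permutations, which is where the factorial bound comes from.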


Selection Tree -> Selection DAG

[Figure: for an example DAG with the nodes a to h, the selection tree lists level by level (level 0, level 1, level 2, ...) all zero-indegree sets reachable from {a,b,c}; the same set, for instance {a,e}, occurs several times. The selection DAG merges all occurrences of the same zero-indegree set into a single node, from {a,b,c} down to {h} and {}.]

Selection Tree -> Selection DAG (2)

Lemma: For finding an optimal solution of MRIS it is sufficient to store, for each selection node z, one schedule S_z that is optimal for z.

Proof (idea):

For each path π (prefix schedule S_π) ending in the same zero-indegree set z, the following holds after execution of S_π:

The same values reside in registers, namely

$$\mathrm{alive}(z) = \{\, u \in \mathrm{scheduled}(z) : \exists\, (u,v) \in E,\ v \notin \mathrm{scheduled}(z) \,\}$$

Hence we may choose any path π to z whose prefix schedule S_π minimizes the register need m(S_π).

By induction follows:

The schedule S_∅ stored in the selection node ∅ is optimal for G.
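The lemma suggests a dynamic programming pass over the selection DAG that keeps one best prefix schedule per selection node. The Python sketch below is a simplified illustration, not the algorithm of the cited papers: selection nodes are identified by their set of scheduled nodes, and the register need of a step is modeled as the number of alive values after the step plus the value just computed, which assumes that a result may reuse the register of an operand that dies at that instruction. All names are hypothetical.

```python
def alive(succ, done):
    """Values already computed that some unscheduled node still needs."""
    return {u for u in done if any(v not in done for v in succ[u])}

def register_need(succ, schedule):
    """Register need m(S) of a complete or prefix schedule under the model above."""
    done, need = set(), 0
    for v in schedule:
        done.add(v)
        need = max(need, len(alive(succ, done) | {v}))
    return need

def min_register_schedule(succ):
    """Return (register need, schedule) with minimum register need for a DAG."""
    pred = {v: [] for v in succ}
    for u in succ:
        for v in succ[u]:
            pred[v].append(u)
    # One entry per selection node: scheduled set -> (need, best prefix schedule)
    level = {frozenset(): (0, ())}
    for _ in range(len(succ)):                   # one level per schedule length
        nxt = {}
        for done, (need, sched) in level.items():
            zero_indeg = [v for v in succ
                          if v not in done and all(p in done for p in pred[v])]
            for v in zero_indeg:
                done2 = done | {v}
                need2 = max(need, len(alive(succ, done2) | {v}))
                if done2 not in nxt or need2 < nxt[done2][0]:
                    nxt[done2] = (need2, sched + (v,))   # keep only the best prefix
        level = nxt
    return level[frozenset(succ)]

# Hypothetical expression DAG: e = a op b, f = c op d, g = e op f
dag = {"a": ["e"], "b": ["e"], "c": ["f"], "d": ["f"],
       "e": ["g"], "f": ["g"], "g": []}
need, sched = min_register_schedule(dag)
print(need, sched, register_need(dag, list("abcdefg")))
```

For the example the best schedule needs 3 registers, whereas loading all four leaves first needs 4. The number of table entries is bounded by the number of distinct scheduled sets instead of the number of schedules, which is the saving the selection DAG gives over the selection tree.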

Time profiles

[Figure: a schedule of the instructions a to h on the functional units U1, U2, U3 along the time axis, with empty slots marked "-"; the time profiles (t, P) and (t’, P) are marked as windows at the end of the schedule, and the delay slot profile distinguishes fillable from non-fillable delay slots.]

Time profile P: window of the instructions scheduled last for each unit that may still influence future scheduling decisions.

Extended selection node (z, t, P) summarizes all schedules of scheduled(z) that end with time profile (t, P).

Time-inferior extended selection nodes can be pruned.

Register-constrained time optimization with time profiles

Example for a single-issue pipelined processor

[Figure: a DAG with the nodes a to h (delay = 1 on two edges) and its extended selection DAG. Each extended selection node shows its zero-indegree set, from {a,b,c} at 0cc down to {}, the best prefix schedule found for it, its execution time in clock cycles (cc), its time profile, and its register need; the complete schedule shown is (b,c,e,a,d,f,g,h).]

Theorem: All (prefix) schedules with the same zero-indegree set and the same time profile are comparable.

It is thus sufficient to keep, per extended selection node, only one of them, a locally optimal one.

[K. ’00] [K./Bednarski ’01]

Structuring the space of partial solutions

[Figure: three-dimensional grid with the axes length (i), time (t), and #regs (k).]

Dependence structure: By appending any DAG node v to a schedule S:

$$\mathrm{time}(S) \le \mathrm{time}(S \bowtie \{v\}) \le \mathrm{time}(S) + 1 + \mathrm{MAXDELAY}$$

$$\mathrm{regs}(S) \le \mathrm{regs}(S \bowtie \{v\}) \le \mathrm{regs}(S) + 1$$

-> structure the space of selection nodes as a grid of subsets L, indexed by length i, register need k, and time t

-> change the order of construction of the subsets L according to the optimization goal

Summary of the dynamic programming algorithm

Optimal solution of MRIS, MTIS, RCMTIS for practical problem sizes!

Algorithmic techniques used:
+ topological sorting (list scheduling) -> random schedules
+ enumeration with decision tree -> O(n · n!)
+ dynamic programming -> O(n · 2^n) (exploiting domain-specific properties)
+ structuring of search space as grid, dependence analysis, modification of order of construction -> up to 50 DAG nodes
+ time profiles -"-
+ time-space profiles for heterogeneous register sets (current work)

Alternative to methods based on integer linear programming.

Extensions

Time-space profiles for non-homogeneous register sets [K./Bednarski ’02]

[Figure: an example DAG with the nodes a, b, c, d and its extended selection DAG for a target with two units and non-homogeneous register sets. Besides the DAG nodes, transfer instructions (mov a, mov b, mov c) can be selected. Each extended selection node shows its zero-indegree set, its execution time (0cc to 6cc), its time profile, and the residence sets of the values; some selections are marked "not selectable", and inferior nodes are marked "prune".]

Conclusion and outlook

+ Need for integrated code generation, especially for DSPs
+ Considerable resources (time, space) available for optimization
+ An optimal solution makes it possible to check the quality of fast heuristics
+ Dynamic programming algorithm: feasible for up to 50 IR operations

Future work
+ More powerful instruction selection (forest patterns -> SIMD instructions)
+ Global code generation
+ Improved retargetability
+ Quantitative comparison with ILP methods

Other DSP features that require integrated approaches:
+ memory bank allocation in dual-memory DSPs
+ optimization of data layout w.r.t. address generation units
