INTEGRATED CODE GENERATION. C. Kessler, IDA, Linköpings Universitet, 2002.
INTEGRATED CODE GENERATION
Introduction to code generation
Instruction selection
Register allocation
Instruction scheduling
Interdependences
Phase ordering problems for irregular architectures (DSPs)
Integrated code generation approaches
OPTIMIST

Compiler structure
[Figure: compiler pipeline. Source program -> front end -> IR1 (AST) -> CFG generation -> IR2 (CFG), with optimizations -> CODE GENERATION: instruction selection, instruction scheduling, register allocation (symbolic registers) -> IR3, with optimizations -> assembler program. Insets: a control flow graph from main to exit with a basic block containing t1=a+1; t2=2*t1; t3=t1+t2; x[3]=t3, and the directed acyclic graph (DAG) of a basic block with LOAD, CONST, ADD, MUL, SUB and STORE nodes over temporaries t1 to t9.]

INSTRUCTION SELECTION
naive instruction selection: for each LIR operation generate a sequence of equivalent target instructions
Example: { int i; char c; i=c+4; }
[Figure: stack frame with i at fp+4, c at fp+8, d at fp+12; LIR tree ASGNI(ADDRLP i, ADDI(CVCI(INDIRC(ADDRLP c)), CNSTI 4))]
naive instruction selection, arranged by a postorder traversal:
    addi fp,#4,s1     ! ADDRLP(i)
    addi fp,#8,s2     ! ADDRLP(c)
    load 0(s2),s3     ! INDIRC, CVCI
    nop               ! load delay slot
    addi R0,#4,s4     ! CNSTI
    addi s3,s4,s5     ! ADDI
    store s5,0(s1)    ! ASGNI
(needs 7 cycles, 5 sregs)

Instruction selection by tree pattern matching (1)
The LIR often has a finer granularity than the target instructions!
-> pattern matching
Example: { int i; char c; i=c+4; }
[Figure: the LIR tree ASGNI(ADDRLP i, ADDI(CVCI(INDIRC(ADDRLP c)), CNSTI 4)) covered by three patterns; the subtrees rooted at CVCI and at ADDI each reduce to reg]
pattern-matching instruction selection, arranged by a postorder traversal:
    load 8(fp),s1     ! ADDRLP(c)+INDIRC+CVCI
    nop               ! load delay slot
    addi s1,#4,s2     ! CNSTI+ADDI
    store s2,4(fp)    ! ADDRLP(i)+ASGNI
(needs 4 cycles, 2 sregs)

Instruction selection by tree pattern matching
The target processor is described by a tree grammar G = (N, T, s, P)
nonterminals N = { stmt, reg, con, addr, mem, ... } (start symbol is stmt)
terminals T = { CNSTI, ADDRLP, ... }
production rules P:
    rule                            target code            cost
    reg  -> ADDI(reg,con)           addi %r,%c,%r          1
    reg  -> ADDI(reg,reg)           addi %r,%r,%r          1
    stmt -> ASGNI(addr,reg)         store %r,%a            1
    stmt -> ASGNI(reg,reg)          store %r,0(%r)         1
    reg  -> ADDRLP                  addi fp,#%d,%r         1
    addr -> ADDRLP                  %d(fp)                 0
    reg  -> addr                    addi %a,%r             1
    reg  -> INDIRC(addr)            load %a,%r; nop        2
    reg  -> INDIRC(reg)             load 0(%r),%r; nop     2
    reg  -> CVCI(INDIRC(addr))      load %a,%r; nop        2
    reg  -> CVCI(INDIRC(reg))       load 0(%r),%r; nop     2
    con  -> CNSTI                   %d                     0
    reg  -> con                     addi R0,#%c,%r         1

Instruction selection by tree pattern matching
Find a least-cost derivation of the LIR tree
[Figure: stepwise reduction of the LIR tree ASGNI(ADDRLP, ADDI(CVCI(INDIRC(ADDRLP)), CNSTI)). The ADDRLP leaves reduce to addr (cost 0), CNSTI reduces to con (cost 0), CVCI(INDIRC(addr)) reduces to reg (cost 2), ADDI(reg,con) reduces to reg (cost 1), and ASGNI(addr,reg) reduces to stmt (cost 1).]

Tree pattern matching by dynamic programming (example: IBURG)
The LIR tree ASGNI(ADDRLP i, ADDI(CVCI(INDIRC(ADDRLP c)), CNSTI)) is labeled bottom-up. Each node is annotated with the rules that match there and their accumulated costs:
    ADDRLP i:   addr -> ADDRLP  0            reg -> ADDRLP  1            reg -> addr  1
    ADDRLP c:   addr -> ADDRLP  0            reg -> ADDRLP  1            reg -> addr  1
    INDIRC:     reg -> INDIRC(addr)  2+0=2   reg -> INDIRC(reg)  2+1=3
    CVCI:       reg -> CVCI(INDIRC(addr))  2+0=2    reg -> CVCI(INDIRC(reg))  2+1=3
    CNSTI:      con -> CNSTI  0              reg -> con  1
    ADDI:       reg -> ADDI(reg,con)  1+2+0=3       reg -> ADDI(reg,reg)  1+2+1=4
    ASGNI:      stmt -> ASGNI(addr,reg)  1+0+3=4    stmt -> ASGNI(reg,reg)  1+1+3=5
Dynamic programming bottom-up tree pattern matching:
TWIG [Aho/Ganapathi/Tjiang ’89], IBURG [Fraser/Hanson/Proebsting ’92]
alternatively: tree parsing using LR parsing [Graham/Glanville ’78]
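
The bottom-up labeling can be sketched in a few lines of Python. This is only an illustration under assumptions, not the IBURG implementation: the names `Node`, `RULES`, `match` and `label` are hypothetical, the rules and costs are copied from the tree grammar slide, and only costs are recorded (a real code generator would also remember the winning rule per nonterminal in order to emit code).

```python
from dataclasses import dataclass

@dataclass
class Node:
    op: str
    kids: tuple = ()

# (lhs, pattern, cost). A pattern is either a nonterminal name (chain rule)
# or a tuple (operator, subpattern, ...) whose leaves are nonterminal names.
RULES = [
    ("reg",  ("ADDI", "reg", "con"), 1),
    ("reg",  ("ADDI", "reg", "reg"), 1),
    ("stmt", ("ASGNI", "addr", "reg"), 1),
    ("stmt", ("ASGNI", "reg", "reg"), 1),
    ("reg",  ("ADDRLP",), 1),
    ("addr", ("ADDRLP",), 0),
    ("reg",  "addr", 1),
    ("reg",  ("INDIRC", "addr"), 2),
    ("reg",  ("INDIRC", "reg"), 2),
    ("reg",  ("CVCI", ("INDIRC", "addr")), 2),
    ("reg",  ("CVCI", ("INDIRC", "reg")), 2),
    ("con",  ("CNSTI",), 0),
    ("reg",  "con", 1),
]
INF = float("inf")

def match(pattern, node, labels):
    """Cost of deriving `node` from `pattern`, excluding the rule's own cost."""
    if isinstance(pattern, str):              # nonterminal: look up the DP table
        return labels[id(node)].get(pattern, INF)
    if pattern[0] != node.op or len(pattern) - 1 != len(node.kids):
        return INF
    return sum(match(p, k, labels) for p, k in zip(pattern[1:], node.kids))

def label(node, labels):
    """Fill labels[id(n)] = {nonterminal: least cost} for every n below node."""
    for kid in node.kids:
        label(kid, labels)
    best = labels[id(node)] = {}
    for lhs, pat, cost in RULES:              # rules rooted at an operator
        if isinstance(pat, tuple):
            c = cost + match(pat, node, labels)
            if c < best.get(lhs, INF):
                best[lhs] = c
    changed = True
    while changed:                            # chain rules, until stable
        changed = False
        for lhs, pat, cost in RULES:
            if isinstance(pat, str) and pat in best:
                c = cost + best[pat]
                if c < best.get(lhs, INF):
                    best[lhs], changed = c, True
    return best

# LIR tree for i = c + 4
TREE = Node("ASGNI", (
    Node("ADDRLP"),
    Node("ADDI", (Node("CVCI", (Node("INDIRC", (Node("ADDRLP"),)),)),
                  Node("CNSTI"))),
))
```

Calling `label(TREE, {})` returns the cost table of the root; its `stmt` entry is the least cost of deriving the whole tree, which should agree with the cost 4 annotated at ASGNI on the slide.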

Conflicts with instruction scheduling and register allocation
+ The cost attribute of a production is only a rough estimate.
  The actual impact on execution time is only known for a given scheduling situation:
    + currently free functional units
    + other instructions that may be executed simultaneously
    + latency constraints due to previously scheduled instructions
  Integration with instruction scheduling would be great!
+ Different instruction selections may result in different register need.
+ Mutations with different unit usage [Nicolau, Novack ’94]
  a = 2*b is equivalent to a = b<<1 and to a = b+b

REGISTER ALLOCATION
1. Register allocation
   determines values (variables, temporaries, large constants) to be kept in registers
2. Register assignment
   determines in which register an allocated value should reside
Values that are live simultaneously cannot be kept in the same register
-> interference graph
requires live ranges to be known:
some fixed (pre-scheduled) linear sequence of code given!
+ local (usage counts / linear scan) – linear in time
+ global (graph coloring) – NP-complete
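
As a small illustration of why a fixed linear code sequence is needed, the following Python sketch derives the interference graph of straight-line code by a single backward liveness pass. The function name `interference_graph` and the tuple encoding of instructions are assumptions made for this example. The sample sequence is the code for i = c+4; d = c-2; c = c*i used on the graph coloring slide, with one assumed instruction order and with fp treated as a dedicated register that is not allocated.

```python
def interference_graph(code):
    """code: list of (defined_register_or_None, [used_registers]) in program order.

    Returns the set of interference edges as frozensets {r1, r2}.
    Only valid for straight-line code (one basic block, no branches).
    """
    live = set()      # registers live after the instruction being visited
    edges = set()
    for dst, uses in reversed(code):
        if dst is not None:
            # the defined value interferes with everything live after its definition
            for other in live - {dst}:
                edges.add(frozenset((dst, other)))
            live.discard(dst)
        live.update(uses)
    return edges

# Assumed order of the example code (symbolic registers s1..s4):
CODE = [
    ("s1", []),            # load  8(fp),s1    ! c
    (None, []),            # nop
    ("s2", ["s1"]),        # addi  s1,#4,s2
    (None, ["s2"]),        # store s2,4(fp)    ! i
    ("s3", ["s1"]),        # subi  s1,#2,s3
    (None, ["s3"]),        # store s3,12(fp)   ! d
    ("s4", ["s1", "s2"]),  # muli  s1,s2,s4
    (None, ["s4"]),        # store s4,8(fp)    ! c
]
```

For this order, s1, s2 and s3 are pairwise live at the same time, while s4 is defined where s1 and s2 die and therefore interferes with none of them. A different schedule of the same instructions can give a different graph, which is the point of the remark about pre-scheduled code.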

Global register allocation by graph coloring
Example: i = c+4; d = c-2; c = c*i;
    load 8(fp),s1      ! c
    nop
    addi s1,#4,s2
    store s2,4(fp)     ! i
    subi s1,#2,s3
    store s3,12(fp)    ! d
    muli s1,s2,s4
    store s4,8(fp)     ! c
[Figure: live ranges of fp and s1 to s4 along the code, and the interference graph over s1, s2, s3, s4 colored with the registers r1, r2, r3]
color the interference graph with R colors
+ NP-complete for R >= 3 [Garey/Johnson ’79]
Heuristics
+ degree < R rule [Chaitin ’81]
  if not R-colorable: spill / coalesce / rematerialize live ranges, try again
+ optimistic coloring (postponing spilling decisions) [Briggs ’92]
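
The degree < R rule can be sketched as a simplify-and-select loop. The Python below is a minimal sketch in the spirit of Chaitin's heuristic, not the published allocator: the function name `color` is hypothetical, and spilling, coalescing and rematerialization are left out, so the function simply gives up when no node of degree < R remains.

```python
def color(graph, R):
    """graph: dict node -> set of neighbours (symmetric). Colors are 0..R-1.

    Returns a dict node -> color, or None if the degree < R rule gets stuck
    (a full allocator would pick a live range to spill and try again).
    """
    work = {n: set(adj) for n, adj in graph.items()}
    stack = []
    while work:
        # simplify: remove any node with fewer than R remaining neighbours
        n = next((n for n in sorted(work) if len(work[n]) < R), None)
        if n is None:
            return None
        stack.append(n)
        for m in work.pop(n):
            work[m].discard(n)
    coloring = {}
    while stack:
        # select: reinsert in reverse order; fewer than R neighbours are colored,
        # so a free color always exists
        n = stack.pop()
        used = {coloring[m] for m in graph[n] if m in coloring}
        coloring[n] = min(c for c in range(R) if c not in used)
    return coloring

# Interference graph assumed for the example: s1, s2, s3 pairwise interfere.
GRAPH = {
    "s1": {"s2", "s3"},
    "s2": {"s1", "s3"},
    "s3": {"s1", "s2"},
    "s4": set(),
}
```

With R = 3 the triangle s1, s2, s3 receives three different colors and s4 can reuse one of them. With R = 2 the rule gets stuck on the triangle and the function returns None, which is where a real allocator would spill.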

INSTRUCTION SCHEDULING (1)
local scheduling
+ expression trees,
+ basic blocks (DAGs),
+ extended basic blocks
loop scheduling
+ dependency analysis, loop transformation, loop parallelization
+ software pipelining
global scheduling
+ branch delay slot filling
+ trace scheduling, percolation scheduling, region scheduling
+ speculation / multithreading

INSTRUCTION SCHEDULING (2)
Optimization goals:
+ execution time
+ space (registers, stack size)
+ power consumption
Techniques:
+ precedence constraints: dependence graph
+ algorithms for special cases
+ heuristics (list scheduling)
+ exhaustive search / CLP / Branch & Bound / dynamic programming
+ integer linear programming (ILP)

Local instruction scheduling and register allocation
[Figure: a DAG G with nodes a, b, c, d; a schedule S of G (a sequence of the four nodes); the register allocation of S, which maps the values t1, t2, t3 to the registers R1, R2; and the time slot allocation of S over the time slots 1 to 5]
register need: regs(S) = 2
S is space-optimal (regs(S) <= regs(S’) for all schedules S’ of G)
execution time: time(S) = 5
S is time-optimal? (time(S) <= time(S’) for all schedules S’ of G)
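
To make regs(S) and time(S) concrete, here is a minimal Python sketch that evaluates both for a given schedule. Everything in it is an assumption made for illustration: the function name `regs_and_time`, the machine model (single issue, one instruction per cycle, a result usable `1 + delay` cycles after issue), the register model (an operand's register is freed at its last use and may be reused for the result), and the example DAG, which is not the one in the figure.

```python
def regs_and_time(preds, delay, schedule):
    """preds: node -> list of operand nodes; delay: node -> extra latency.

    Returns (register need, issue slot of the last instruction) of `schedule`,
    which must be a topological order of the DAG.
    """
    users_left = {v: 0 for v in preds}
    for v in preds:
        for u in preds[v]:
            users_left[u] += 1
    live, max_live = set(), 0
    issue, t = {}, 0
    for v in schedule:
        # earliest slot in which all operands are available
        ready = max((issue[u] + 1 + delay.get(u, 0) for u in preds[v]), default=1)
        t = max(t + 1, ready)          # idle slots correspond to NOPs
        issue[v] = t
        for u in preds[v]:             # free operands at their last use
            users_left[u] -= 1
            if users_left[u] == 0:
                live.discard(u)
        live.add(v)
        max_live = max(max_live, len(live))
        if users_left[v] == 0:         # result with no users is not kept
            live.discard(v)
    return max_live, t

# Hypothetical DAG: a and b are loads with one delay slot, c uses a, d uses b and c.
PREDS = {"a": [], "b": [], "c": ["a"], "d": ["b", "c"]}
DELAY = {"a": 1, "b": 1}
```

Under these assumptions the schedule a, b, c, d fills the delay slot of a with b and finishes in slot 4, whereas a, c, b, d leaves both delay slots empty and finishes in slot 6, with the same register need of 2.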

Optimization problems in local instruction scheduling
MRIS – minimum register need instruction scheduling
+ Spilling (store/reload) takes additional time
+ Power consumption in embedded processors increases with the number of memory accesses
+ Superscalar processors with shadow registers and register renaming:
  compiler-generated spill code cannot be eliminated at run time
– NP-complete [Sethi ’75]
MTIS – minimum time instruction scheduling
+ hiding pipeline delays
+ exploiting instruction-level parallelism (for superscalar / VLIW)
– NP-complete [Garey/Johnson ’79, Gross ’83, Lawler et al. ’87]
RCMTIS – register-constrained minimum time instruction scheduling
SMRTIS – simultaneous minimization of space and time

Space-optimal scheduling strategies for DAGs
(a) based on depth-first-search traversal of the dependence DAG
    Special case: tree: space-optimal schedule in linear time [Sethi, Ullman ’70]
    Special case: vector tree (node size attribute): space-optimal in O(n log n) [Rauber ’90]
    Special case: series-parallel DAG: space-optimal schedule in polynomial time [Güttler ’81]
    Generally: contiguous schedules, 2^n
    + Random dfs [K., Paul, Rauber ’91]
    + Enumeration with DC strategy [K., Rauber ’93, ’95]
    [Figure: DAG V with a node w whose operands v1 and v2 root the sub-DAGs V1 and V2]
(b) based on topological sorting of the DAG
    general schedules, n!
    + space-optimal (enumeration + dynamic programming) [K. ’96, ’98]
    + time-optimal (enumeration + dynamic programming) [K. ’00]
    + random list scheduling [K. ’00]
(c) based on integer linear programming [Govindarajan et al. ’99]

List scheduling = local scheduling by topological sorting [Coffman ’76]
[Figure: DAG G with the already scheduled part scheduled(z) and its zero-indegree set z; selecting a node v from z and scheduling it yields the new zero-indegree set z’]
topsort(Set z, int[] INDEG, int t)
    if z ≠ ∅                          // t ≤ n
        select arbitrary node v ∈ z;
        // implicitly remove all edges (v,u):
        INDEG’(u) ← INDEG(u) − 1 where (v,u) ∈ E, INDEG(u) elsewhere
        // update zero-indegree set:
        z’ ← z − {v} ∪ { new leaves u : INDEG’(u) = 0 }
        S(t) ← v;
        topsort(z’, INDEG’, t+1);
    else output S(1:n)
    fi
Call topsort(z0, INDEG0, 1) produces a schedule in S(1:n).
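
A runnable counterpart of the topsort pseudocode might look as follows. It is a sketch that replaces the tail recursion by a loop; the function name `topsort`, the successor-list encoding of the DAG and the `select` parameter (the "select arbitrary node" policy) are choices made for this example.

```python
def topsort(succs, select=min):
    """succs: node -> list of successor nodes of a DAG.

    Returns one schedule (topological order). `select` picks the next node
    from the zero-indegree set; any choice yields a valid schedule.
    """
    indeg = {v: 0 for v in succs}
    for v in succs:
        for u in succs[v]:
            indeg[u] += 1
    z = {v for v in succs if indeg[v] == 0}    # initial zero-indegree set z0
    schedule = []
    while z:
        v = select(z)
        z.remove(v)
        for u in succs[v]:                     # implicitly remove edges (v,u)
            indeg[u] -= 1
            if indeg[u] == 0:                  # u becomes a new leaf
                z.add(u)
        schedule.append(v)
    return schedule

# Hypothetical example DAG with edges a->c, b->d, c->d.
DAG = {"a": ["c"], "b": ["d"], "c": ["d"], "d": []}
```

Passing a priority function as `select` gives a list scheduling heuristic; passing something like `lambda z: random.choice(sorted(z))` (with `import random`) would give random list scheduling.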

Greedy list scheduling for VLIW architectures
[Figure: a VLIW word with one slot for each of the units ADD, SHIFT, FMUL and MEM, all connected to the register file. Ready instructions from the zero-indegree set z of DAG G (here addi and load) are placed in the ADD and MEM slots; the other slots hold NOPs.]
greedy heuristic:
fill in one step as many slots in a VLIW word as possible
with ready instructions of the zero-indegree set.
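
The greedy heuristic can be sketched as follows. The code is an illustration under simplifying assumptions that are not stated on the slide: every unit has exactly one slot per word, every instruction runs on exactly one unit, and all latencies are one word. The names `greedy_vliw_schedule`, `unit_of` and the example DAG are hypothetical.

```python
def greedy_vliw_schedule(succs, unit_of, units):
    """succs: node -> successors; unit_of: node -> unit name; units: slot order.

    Returns a list of VLIW words, each a list with one entry per unit
    (an instruction name or "NOP").
    """
    indeg = {v: 0 for v in succs}
    for v in succs:
        for u in succs[v]:
            indeg[u] += 1
    ready = {v for v in succs if indeg[v] == 0}     # zero-indegree set
    words = []
    while ready:
        word = {}
        for v in sorted(ready):                     # fill as many slots as possible
            unit = unit_of[v]
            if unit in units and unit not in word:
                word[unit] = v
        if not word:
            raise ValueError("no unit can execute the ready instructions")
        for v in word.values():                     # issue the word
            ready.remove(v)
            for u in succs[v]:
                indeg[u] -= 1
                if indeg[u] == 0:
                    ready.add(u)                    # ready from the next word on
        words.append([word.get(unit, "NOP") for unit in units])
    return words

UNITS = ["ADD", "SHIFT", "FMUL", "MEM"]
SUCCS = {"load1": ["addi"], "load2": ["fmul"], "addi": ["fmul"], "fmul": []}
UNIT_OF = {"load1": "MEM", "load2": "MEM", "addi": "ADD", "fmul": "FMUL"}
```

In this example the two loads compete for the single MEM slot, so the second load is issued together with addi in the second word and fmul follows in a third word.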

Phase-decoupled code generation
[Figure: the space of phase orderings between IR and target code, drawn as a cube. Each edge applies one phase (instruction selection, instruction scheduling, or register allocation) at IR level or at target level; every path from the IR corner to the target code corner is one phase-decoupled code generator.]

Phase-decoupled code generation in gcc, lcc, ...
[Figure: the same cube of phase orderings between IR and target code, with the paths taken by gcc and by lcc marked]

Phase ordering problem
Register allocation before scheduling
    introduces additional data dependences
    -> less parallelism / alternatives
    Example: reg(t1) = reg(t3) = R1 implies "b before c", as c overwrites t1 in R1.
    [Figure: DAG with nodes a, b, c, d, values t1, t2, t3 assigned to R1, R2, R1, and two edges with delay = 1; candidate schedules of a, b, c, d laid out over the time slots 1 to 5]
Register allocation after scheduling
    scheduling determines live ranges -> interferences
    spill code must be scheduled
    -> may compromise quality of schedule!
    [Figure: two orders of the same four statements. In "a = ...; b = ...; ... = ..a..; ... = ..b.." the live ranges of a and b overlap; in "a = ...; ... = ..a..; b = ...; ... = ..b.." they do not.]

More phase ordering problems: Code generation for DSPs
Clustered VLIW architectures, e.g. TI C6201:
[Figure: two clusters. Register file A (A0-A15) with the units .L1, .S1, .M1, .D1 and register file B (B0-B15) with the units .L2, .S2, .M2, .D2, connected by the cross paths 1X and 2X; data cache / data memory; program cache / program memory]
simultaneously e.g. load on A, load on B, move between A and B
+ mapping instructions to clusters
  may profit from information about free copy slots in the schedule
+ instruction scheduling
  must generate copy instructions
  to match residence of operands and instructions
Heuristic [Leupers ’00]: iterative optimization with simulated annealing

More phase ordering problems: Code generation for DSPs
Example: Hitachi SH3-DSP
[Figure: SH3-DSP data paths between the X, Y, M and A registers and the units for load/load, store, mul and add/sub, shown for two instruction combinations, e.g. add + mul and add + NOP]
Residence constraints on concurrent execution (load + mul, add + mul, ...)
Instruction scheduling and register allocation are not separable!
Phase-decoupled standard methods generate code of poor quality.

Integrated code generation
[Figure: the cube of phase orderings between IR and target code (instruction selection, instruction scheduling and register allocation at IR level and at target level), with integrated code generation drawn as the direct diagonal from IR to target code]

Towards integrated code generation
Integration of register allocation and instruction scheduling
+ quantitative evaluation [Bradlee et al. ’91]
+ heuristics for space-aware scheduling
  [Goodman/Hsu ’88], [Freudenberger/Ruttenberg ’92], [Pinter ’93], ...
+ integer linear programming
  SILP [Zhang ’96] [Kästner ’97, ’00]: O(n^2) variables, O(n^2) inequalities
  OASIC [Gebotys/Elmasry ’92, ’93] [Kästner ’97, ’00]:
  O(n^2) variables, exponential number of inequalities if register allocation is integrated
  versatile, but optimal solution only for small problem instances
+ graph-based, dynamic programming algorithm [K. ’00] [K./Bednarski ’01]

Graph-based method: Naive enumeration, selection tree (1)
[Figure: a selection step from zero-indegree set z to z’ by selecting node v; the root is z0 = { set of all leaves of G }]
selection tree T:
    nodes = instances of zero-indegree sets
    edge (z, z’), labeled by v, iff z’ = selection(v, z)
alltopsort(Set z, int[] INDEG, int t)
    if z ≠ ∅                          // t ≤ n
        forall v ∈ z do
            (z’, INDEG’) ← selection(v, z, INDEG);
            S(t) ← v;
            alltopsort(z’, INDEG’, t+1);
        od
    else output S(1:n);
    fi
Call alltopsort(z0, INDEG0, 1) enumerates all topological sortings of DAG G.
Run time: O(n · #enumerated schedules) = O(n · n!)
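
The enumeration can be written as a short recursive generator. This is a sketch of the pseudocode above, with the `selection` step inlined and undone on backtracking; the function name `alltopsort` is taken from the slide, the data layout (successor lists) is an assumption.

```python
def alltopsort(succs):
    """Yield every topological sorting of the DAG `succs` (node -> successors)."""
    indeg = {v: 0 for v in succs}
    for v in succs:
        for u in succs[v]:
            indeg[u] += 1
    prefix = []                                # the schedule S built so far

    def rec(z):
        if not z:                              # all nodes scheduled
            yield tuple(prefix)
            return
        for v in sorted(z):                    # one selection tree edge per v in z
            new_z = set(z) - {v}
            for u in succs[v]:                 # selection(v, z, INDEG)
                indeg[u] -= 1
                if indeg[u] == 0:
                    new_z.add(u)
            prefix.append(v)
            yield from rec(new_z)
            prefix.pop()                       # backtrack
            for u in succs[v]:
                indeg[u] += 1

    yield from rec({v for v in succs if indeg[v] == 0})

# Hypothetical example DAG with edges a->c, b->d, c->d.
DAG = {"a": ["c"], "b": ["d"], "c": ["d"], "d": []}
```

For a DAG without any edges the generator produces all n! permutations, which is the worst case behind the O(n · n!) bound.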

Selection tree -> selection DAG
[Figure: for an example DAG with nodes a to h, the selection tree (levels 0, 1, 2, ...) whose edges are labeled with the selected node, and the selection DAG obtained from it by merging all selection nodes that carry the same zero-indegree set. The selection DAG runs from {a,b,c} via sets such as {b,c}, {a,c}, {a,b}, {c,d}, {a,e}, ... and {h} down to {}.]
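
Selection tree -> selection DAG (2)
Lemma: For finding an optimal solution of MRIS
it is sufficient to store for each selection node z
one schedule $S_z$ that is optimal for z.
Proof (idea):
For each path $\pi$ (= prefix schedule $S_\pi$) ending in the same zero-indegree set z, the following holds after execution of $S_\pi$:
The same values reside in registers, namely
    $\mathrm{alive}(z) = \{\, u \in \mathrm{scheduled}(z) : \exists\, (u,v) \in E,\ v \notin \mathrm{scheduled}(z) \,\}$
Hence we may choose any path $\pi$ to z whose schedule $S_\pi$ minimizes the register need $m(S_\pi)$.
By induction follows:
The schedule $S_\emptyset$ stored in selection node $\emptyset$ is optimal for G.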
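
The lemma suggests a dynamic programming scheme over the selection DAG, sketched below in Python. It is a simplified illustration, not the published algorithm: selection nodes are keyed by the set of scheduled nodes rather than by the zero-indegree set, and the register model is an assumption (one register per value, operand registers reusable for the result at the operand's last use). The names `min_register_need` and `alive` are chosen for this example.

```python
def min_register_need(succs):
    """Return (register need, schedule) of a space-optimal schedule of the DAG."""
    nodes = frozenset(succs)
    preds = {v: set() for v in succs}
    for u in succs:
        for v in succs[u]:
            preds[v].add(u)

    def alive(scheduled):
        # values that still have an unscheduled user, as in the lemma
        return {u for u in scheduled if any(v not in scheduled for v in succs[u])}

    # one stored entry per selection node: (register need, optimal prefix schedule)
    best = {frozenset(): (0, ())}
    level = [frozenset()]
    while level:                               # level k holds all prefixes of length k
        next_level = set()
        for done in level:
            need, prefix = best[done]
            for v in sorted(nodes - done):
                if preds[v] <= done:           # v is in the zero-indegree set
                    new = done | {v}
                    cand = (max(need, len(alive(new) | {v})), prefix + (v,))
                    if new not in best or cand[0] < best[new][0]:
                        best[new] = cand       # keep only one optimal prefix
                    next_level.add(new)
        level = list(next_level)
    return best[nodes]

# Hypothetical expression DAG: g = (a + b) + (d + e), i.e. c = a+b, f = d+e, g = c+f.
EXPR = {"a": ["c"], "b": ["c"], "d": ["f"], "e": ["f"],
        "c": ["g"], "f": ["g"], "g": []}
```

For this balanced expression tree the sketch finds a schedule that needs 3 registers, whereas an order that loads all four leaves first would need 4. The number of table entries is bounded by the number of distinct scheduled sets instead of the number of schedules.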

Time profiles
[Figure: instructions a to h scheduled on the functional units U1, U2, U3 along the time axis, with fillable and non-fillable delay slots marked; two time profiles (t,P) and (t’,P) are shown as windows over the most recently issued instructions, together with the delay slot profile]
Time profile P: window of the instructions scheduled last for each unit
that may still influence future scheduling decisions.
Extended selection node (z, t, P)
summarizes all schedules of scheduled(z)
that end with time profile (t, P).
Time-inferior extended selection nodes can be pruned.

Register-constrained time optimization with time profiles
[Figure: extended selection DAG for an example DAG with nodes a to h, two of whose edges carry delay = 1. Each extended selection node shows its zero-indegree set, the stored prefix schedule (up to the complete schedule (b,c,e,a,d,f,g,h)), its time profile, its execution time in clock cycles (cc), and its register need.]
Theorem: All (prefix) schedules with the same zero-indegree set and the same time profile are comparable.
It is thus sufficient to keep, per extended selection node, only one, locally optimal, of them.
[K. ’00] [K./Bednarski ’01]
Example for single-issue pipelined processor

Structuring the space of partial solutions
[Figure: three-dimensional grid of partial solutions with the axes length i, time t, and #regs k]
Dependence structure: By appending any DAG node v to a schedule S:
    $\mathrm{time}(S) \le \mathrm{time}(S \circ \langle v \rangle) \le \mathrm{time}(S) + 1 + \mathrm{MAXDELAY}$
    $\mathrm{regs}(S) \le \mathrm{regs}(S \circ \langle v \rangle) \le \mathrm{regs}(S) + 1$
-> structure the space of selection nodes as a grid of subsets $L_i^{k,t}$
-> change the order of construction of the $L_i^{k,t}$ according to the optimization goal

Summary of the dynamic programming algorithm
Optimal solution of MRIS, MTIS, RCMTIS for practical problem sizes!
Algorithmic techniques used:
+ topological sorting (list scheduling) -> random schedules
+ enumeration with decision tree -> O(n · n!)
+ dynamic programming -> O(n 2^n) (exploiting domain-specific properties)
+ structuring of search space as grid, dependence analysis,
  modification of order of construction -> up to 50 DAG nodes
+ time profiles -> up to 50 DAG nodes
+ time–space profiles for heterogeneous register sets (current work)
Alternative to methods based on integer linear programming.

Extensions
+ time–space profiles for non-homogeneous register sets [K./Bednarski ’02]
[Figure: extended selection DAG for an example DAG with nodes a, b, c, d on a target with two units and non-homogeneous register sets. Each node shows its zero-indegree set, its time profile, its space profile (where the live values reside), and its execution time in cc. Transfer instructions (mov a, mov b, mov c) appear as additional selection steps; some selections are marked "not selectable" and inferior nodes are marked "prune".]

Conclusion and outlook
+ Need for integrated code generation, especially for DSPs
+ Considerable resources (time, space) available for optimization
+ An optimal solution allows checking the quality of fast heuristics
+ Dynamic programming algorithm: feasible for <= 50 IR operations
Future work
+ More powerful instruction selection (forest patterns -> SIMD instructions)
+ Global code generation
+ Improved retargetability
+ Quantitative comparison with ILP methods
Other DSP features that require integrated approaches:
+ memory bank allocation in dual-memory DSPs
+ optimization of data layout w.r.t. address generation units