The GEM code generator matches code templates to sections of IL trees.9 The code generator has a set of approximately 600 code patterns and uses dynamic programming to guide the selection of a least-cost covering for each statement tree in the IL graph produced by the global optimizer.
Each code pattern specifies a set of interpretive code-generation actions to be applied if the template is selected. The code-generation actions create temporaries, determine their lifetimes, allocate registers and stack locations, and actually emit sequences of instructions. These actions are applied during the following four separate code-generation passes over the IL graph for a procedure:
• Context. During the context pass, the code generator creates data structures that describe each temporary variable. The information computed includes the lifetime, usage counts, and a weight scaled by loop depth.
• Register history. During the register history pass, the code generator does a dominator-order walk of the flow graph to identify potential redundant loads of values that could be available in registers.
• Temp name. During the temp name pass, the code generator performs register allocation using the lifetime and weight information computed during the context pass. The code generator also uses register history to allocate temporaries that hold the same value in the same register. If successful, this action eliminates load and move instructions.
• Code. During the code pass, the code generator emits instructions and code labels. The resulting code cells are an internal representation at the assembly code level. Each code cell contains a single target machine instruction. The code cells have specific registers and bound offsets from base registers. References to labels in the code stream are in a symbolic form, pending further optimization and final offset assignment after instruction peephole optimization and instruction scheduling.
Template Matching and Result Modes
Code template enumeration and selection occur during the context pass. The enumeration phase scans IL nodes in execution order (bottom-up) and labels each node with alternative patterns and costs. When a root node such as a store or branch tuple is reached, the lowest-cost template for that node is selected. The selection process is then applied recursively to the leaves for the entire tree.10
The IL tree pattern of a code-generation template consists of four pieces of information:
• A pattern tree that describes the arrangement of IL nodes that can be coded by this template. The interior nodes of the pattern tree are IL operators; the leaves are either result mode sets or IL operators with no operands.
• A predicate on the tree nodes of the pattern. The predicate must be true in order for the pattern to be applicable.
• A result mode that encodes the representation of a value computed by the template's generated code.
• An integer that represents the cost of the code generated by this template.
The result modes are an enumeration of the different ways the compiler can represent a value in the machine.11 GEM compilers use the following result modes:
• Scalar, for a value, negated value, and complemented value
• Boolean, for low-bit, high-bit, and nonzero values
• Flow, for a Boolean represented as control flow
• Result modes for different sizes of integer literals
• Result modes for delayed generation of addressing calculations
• Result modes indicating that only a part of a value has been materialized, i.e., the low byte, or that the materialized value has used a lower-cost solution
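To make the interplay of templates, costs, and result modes concrete, the following is a minimal Python sketch of bottom-up labeling followed by top-down selection. It is not GEM's implementation. The template names, the `lit8` result mode, the costs, and the functions `label`, `select`, and `cover` are all hypothetical, and each pattern tree is simplified to a single IL operator whose leaves are single result modes rather than result mode sets.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Template:
    name: str
    op: str                              # IL operator at the root of the pattern
    leaf_modes: List[str]                # result mode required of each operand
    predicate: Callable[["Node"], bool]  # must hold for the template to apply
    result_mode: str                     # representation of the computed value
    cost: int


@dataclass
class Node:
    op: str
    kids: List["Node"] = field(default_factory=list)
    value: Optional[int] = None
    # Solution vector: result mode -> (total cost, cheapest template).
    best: Dict[str, Tuple[int, Template]] = field(default_factory=dict)


def always(node: Node) -> bool:
    return True


# Hypothetical templates; real GEM patterns are far richer.
TEMPLATES = [
    Template("lit8", "const", [], lambda n: 0 <= n.value <= 255, "lit8", 0),
    Template("load_const", "const", [], always, "scalar", 1),
    Template("load_var", "var", [], always, "scalar", 1),
    Template("add_rr", "add", ["scalar", "scalar"], always, "scalar", 1),
    Template("add_ri", "add", ["scalar", "lit8"], always, "scalar", 1),
    Template("store", "store", ["scalar"], always, "none", 1),
]


def label(node: Node) -> None:
    """Bottom-up walk: record the cheapest solution per result mode."""
    for kid in node.kids:
        label(kid)
    for t in TEMPLATES:
        if t.op != node.op or len(t.leaf_modes) != len(node.kids):
            continue
        if not t.predicate(node):
            continue
        pairs = list(zip(node.kids, t.leaf_modes))
        if any(mode not in kid.best for kid, mode in pairs):
            continue  # an operand cannot be produced in the required mode
        total = t.cost + sum(kid.best[mode][0] for kid, mode in pairs)
        if t.result_mode not in node.best or total < node.best[t.result_mode][0]:
            node.best[t.result_mode] = (total, t)


def select(node: Node, mode: str, chosen: List[str]) -> None:
    """Top-down walk: the chosen template fixes each operand's result mode."""
    _, t = node.best[mode]
    for kid, kid_mode in zip(node.kids, t.leaf_modes):
        select(kid, kid_mode, chosen)
    chosen.append(t.name)


def cover(root: Node) -> Tuple[int, List[str]]:
    """Label the tree, then pick the lowest-cost entry in the root's vector."""
    label(root)
    mode, (cost, _) = min(root.best.items(), key=lambda item: item[1][0])
    chosen: List[str] = []
    select(root, mode, chosen)
    return cost, chosen


tree = Node("store", [Node("add", [Node("var"), Node("const", value=5)])])
print(cover(tree))
```

For this tree the sketch prints `(3, ['load_var', 'lit8', 'add_ri', 'store'])`: the constant 5 satisfies the `lit8` predicate, so the add-immediate template beats the register-register form. With a constant of 1000 the predicate fails and the covering falls back to `load_const` and `add_rr` at a total cost of 4.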
As templates are matched to portions of the IL tree, each node is labeled with a vector of possible solutions. The vector is indexed by result mode, and the lowest-cost solution for each result mode is recorded on the forward bottom-up walk. When a root node is encountered, the lowest-cost template in its vector of solutions is chosen. This choice then determines the required result mode and solution for each leaf of the pattern, recursively.
GEM Code Generator Action Language
The GEM code generator uses and extends methods developed in the BLISS compilers, the Carnegie Mellon University Production-Quality Compiler Compiler Project, and Digital's VAX Pascal
compiler. 1 ! 1' One key GEM innovation is the use of a formalized action language to give a unified description of all actions performed in the four code-generation passes. The same formal action descriptions are interpreted by four different interpreters. For example, the Allocate_TN action is used to allocate long-lived temporaries that may be in a register or in memory. This action creates a data structure describing the temporary in the context pass, allocates a register during the temp name pass, and provides the actual temporary location for code emission.
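The idea of one action description driven by several interpreters can be illustrated with a small Python sketch. This is an assumption-laden illustration, not the GEM action language. Only the name Allocate_TN comes from the text. The `Emit` action, the handler functions, the state layout, and the first-come register assignment (GEM uses lifetime and weight information) are hypothetical simplifications.

```python
# One action list, written once, interpreted differently in each pass.
ACTIONS = [
    ("Allocate_TN", "t1"),
    ("Emit", "load", "t1"),      # "Emit" is a hypothetical action
    ("Allocate_TN", "t2"),
    ("Emit", "add", "t2"),
]


def ctx_allocate(state, tn):
    # Context pass: create the descriptor for the temporary.
    state["tns"][tn] = {"uses": 0, "location": None}


def ctx_emit(state, opcode, tn):
    # Context pass: no code yet, only usage counts.
    state["tns"][tn]["uses"] += 1


def name_allocate(state, tn):
    # Temp name pass: take a free register, else fall back to a stack slot.
    if state["free_regs"]:
        state["tns"][tn]["location"] = state["free_regs"].pop(0)
    else:
        state["tns"][tn]["location"] = f"stack[{state['next_slot']}]"
        state["next_slot"] += 1


def code_allocate(state, tn):
    # Code pass: expose the location already chosen for the temporary.
    state["bound"][tn] = state["tns"][tn]["location"]


def code_emit(state, opcode, tn):
    # Code pass: emit an instruction with a concrete operand location.
    state["code"].append(f"{opcode} {state['bound'][tn]}")


# Each pass is an interpreter: a table from action name to handler.
PASSES = [
    ("context", {"Allocate_TN": ctx_allocate, "Emit": ctx_emit}),
    ("register history", {}),    # omitted in this sketch
    ("temp name", {"Allocate_TN": name_allocate}),
    ("code", {"Allocate_TN": code_allocate, "Emit": code_emit}),
]


def generate(actions, registers):
    """Run the same action list through all four interpreters in order."""
    state = {"tns": {}, "free_regs": list(registers),
             "next_slot": 0, "bound": {}, "code": []}
    for _, handlers in PASSES:
        for name, *args in actions:
            handler = handlers.get(name)
            if handler is not None:  # actions a pass ignores are skipped
                handler(state, *args)
    return state["code"]


print(generate(ACTIONS, ["r1"]))
```

With a single free register, `t1` lands in `r1` and `t2` spills to a stack slot, so the sketch prints `['load r1', 'add stack[0]']`. The point of the design is that the template author writes the action list once, and each pass supplies its own meaning for the same action names.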
Tree-matching code generators were originally developed for complex instruction set computer (CISC) machines, like the PDP-11 and VAX computers. The technique is also an effective way to build a retargetable compiler system for current RISC architectures. The overall code-generation structure and many of the actions are target independent. Some IL trees use simple, general code patterns, whereas special cases use more elaborate patterns and result modes.