procedure Uniform(s) returns boolean
   s: in integer
begin
   u, v, x: Vocab
   item: Item
   for each item ∈ ItemSet[s] (item.pos ≤ |item.rt|) do
      if item.pos ≠ 0 then
         x := Parent(item.rt↓item.pos,item.rt)
         if Left_Child(item.rt↓item.pos,item.rt) then
            for each u ∈ Vocab do
               if Action(s,u) = ⟨Error,∅⟩
                  & (Left_First(x,u) ∨ Right_First(x,u)) then
                  return false
               fi
            od
         fi
      fi
   od
   return true
end   || Uniform
procedure Closure(S) returns set of Item
   S: in set of Item
begin
   OldS: set of Item
   item, s: Item
   repeat
      || add to S every item that has the dot at its left end and whose
      || left side is the nonterminal just after the dot in an item of S
      OldS := S
      S ∪= {item ∈ Item where ∃s ∈ S (s.pos < |s.rt|
         & s.rt↓(s.pos+1) ∈ Nonterminal
         & item.lt = s.rt↓(s.pos+1) & item.pos = 0)}
   until S = OldS
   return S
end   || Closure
procedure Advance(S) returns set of Item
   S: in set of Item
begin
   s, item: Item
   || advance the dot one position in each item
   || that does not have the dot at its right end
   return {item ∈ Item where ∃s ∈ S (item.lt = s.lt
      & s.pos < |s.rt| & item.rt = s.rt
      & item.pos = s.pos+1)}
end   || Advance
FIG. 6.7 The functions Uniform( ), Closure( ), and Advance( ) used in constructing the SLR(1) parsing tables.
Section 6.2 A Syntax-Directed Technique 147
property called uniformity defined as follows: a set of rules is uniform if and only if any left operand of a binary operator is a valid left operand of that operator in any Polish-prefix string containing the operator, and similarly for right operands and operands of unary operators. To be suitable for Graham-Glanville code generation, the set of rules must be uniform. The procedure Uniform( ) uses two functions that are defined below, namely, Parent( ) and Left_Child( ), and two functions that are defined from the relations discussed in the next two paragraphs, namely, Left_First(x,u) and Right_First(x,u), which return true if x Left First u and x Right First u, respectively, and false otherwise.
The function
   Parent: Vocab × Prefix → Vocab
where Prefix is the set of prefixes of sentences generated by the rules, returns the parent of its first argument in the tree form of its second argument, and the function
   Left_Child: Vocab × Prefix → boolean
returns a Boolean value indicating whether its first argument is the leftmost child of its parent in the tree form of its second argument.
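To make the two functions concrete, the following Python sketch computes them for a Polish-prefix string held as a list of symbols. It is an illustration under stated assumptions, not the book's implementation: the operator arities in ARITY are read off the example rules used later in this section, positions are passed as list indices rather than as symbols (a symbol may occur more than once in a prefix), and the names parent, left_child, and _links are hypothetical.

```python
# Assumed arities: binary '←' and '+', unary '↑'; every other symbol is an operand.
ARITY = {"←": 2, "+": 2, "↑": 1}


def _links(prefix):
    """For each position, return (parent index, child number), or None for a root."""
    links = []
    open_ops = []  # each entry: [operator index, children seen so far, arity]
    for i, sym in enumerate(prefix):
        # Discard operators that already have all of their operands.
        while open_ops and open_ops[-1][1] == open_ops[-1][2]:
            open_ops.pop()
        if open_ops:
            op = open_ops[-1]
            links.append((op[0], op[1]))
            op[1] += 1
        else:
            links.append(None)
        if ARITY.get(sym, 0) > 0:
            open_ops.append([i, 0, ARITY[sym]])
    return links


def parent(i, prefix):
    """Symbol that is the parent of position i in the tree form of prefix."""
    link = _links(prefix)[i]
    return None if link is None else prefix[link[0]]


def left_child(i, prefix):
    """True if position i is the leftmost child of its parent."""
    link = _links(prefix)[i]
    return link is not None and link[1] == 0


rule = ["+", "r", "k"]
print(parent(1, rule), left_child(1, rule))
print(parent(2, rule), left_child(2, rule))
```

Because the scan only looks leftward from each position, it also works on an incomplete prefix of a sentence, which is the situation Uniform( ) is in when it examines the symbols to the left of the dot.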
The relations used in the algorithm and in defining the functions and relations used in it are as follows (where BinOp and UnOp are the sets of binary and unary operator symbols, respectively, and the lowercase Greek letters other than ε stand for strings of symbols):
Left ⊆ (BinOp ∪ UnOp) × Vocab
   x Left y if and only if there is a rule r ⇒ αxyβ, where r may be ε.
Right ⊆ BinOp × Vocab
   x Right y if and only if there is a rule r ⇒ αxβyγ for some β ≠ ε, where r may be ε.
First ⊆ Vocab × Vocab
   x First y if and only if there is a derivation x ⇒* yα, where α may be ε.
Last ⊆ Vocab × Vocab
   x Last y if and only if there is a derivation x ⇒* αy, where α may be ε.
EpsLast ⊆ Vocab
   EpsLast = {x | ∃ a rule ε ⇒ αy and y Last x}.
RootOps ⊆ BinOp ∪ UnOp
   RootOps = {x | ∃ a rule ε ⇒ xα and x ∈ BinOp ∪ UnOp}.
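The definitions above translate fairly directly into set comprehensions. The Python sketch below computes the six relations and sets for the small rule set used as the running example in this section (an empty left side is written "ε"). Two points are assumptions rather than part of the definitions: First and Last are computed here only for nonterminal left operands and only for derivations of one or more steps, and every name in the code is hypothetical.

```python
RULES = [
    ("ε", ["←", "r", "r"]),
    ("r", ["r"]),
    ("r", ["+", "r", "r"]),
    ("r", ["+", "r", "k"]),
    ("r", ["↑", "r"]),
]
BIN_OPS = {"+", "←"}
UN_OPS = {"↑"}
NONTERMINALS = {"r"}


def transitive_closure(pairs):
    """Smallest transitive relation containing pairs."""
    pairs = set(pairs)
    while True:
        new = {(x, z) for x, y in pairs for y2, z in pairs if y == y2} - pairs
        if not new:
            return pairs
        pairs |= new


# x Left y: y immediately follows the operator x in some right side.
left = {(x, y) for _, rhs in RULES for x, y in zip(rhs, rhs[1:])
        if x in BIN_OPS | UN_OPS}

# x Right y: y follows the binary operator x with at least one symbol between.
right = {(rhs[i], rhs[j]) for _, rhs in RULES for i in range(len(rhs))
         if rhs[i] in BIN_OPS for j in range(i + 2, len(rhs))}

# x First y / x Last y: y can begin / end a string derived from x.
first = transitive_closure((lhs, rhs[0]) for lhs, rhs in RULES if lhs in NONTERMINALS)
last = transitive_closure((lhs, rhs[-1]) for lhs, rhs in RULES if lhs in NONTERMINALS)

eps_last = {x for lhs, rhs in RULES if lhs == "ε"
            for y, x in last if y == rhs[-1]}
root_ops = {rhs[0] for lhs, rhs in RULES
            if lhs == "ε" and rhs[0] in BIN_OPS | UN_OPS}

for name, value in [("Left", left), ("Right", right), ("First", first),
                    ("Last", last), ("EpsLast", eps_last), ("RootOps", root_ops)]:
    print(name, sorted(value))
```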
The function Follow: Vocab → Vocab can be defined from the auxiliary function Follow1: Vocab → Vocab and sets EpsLast and RootOps as follows:
   Follow1(u) = {v | ∃ a rule r ⇒ αxyβ such that x Last u and y First v}
   Follow(u) = Follow1(u) ∪ RootOps   if u ∈ EpsLast
   Follow(u) = Follow1(u)             otherwise
148 Producing Code Generators Automatically

   (a)                  (b)                  (c)
   r.2 ← r.1            r.2 ⇒ r.1            or  r.1,0,r.2
   r.3 ← r.1 + r.2      r.3 ⇒ + r.1 r.2      add r.1,r.2,r.3
   r.3 ← r.1 + k.2      r.3 ⇒ + r.1 k.2      add r.1,k.2,r.3
   r.2 ← [r.1]          r.2 ⇒ ↑ r.1          ld  [r.1],r.2
   [r.2] ← r.1          ε ⇒ ← r.2 r.1        st  r.1,[r.2]

FIG. 6.8 (a) LIR instructions, (b) Graham-Glanville machine-description rules, and (c) corresponding SPARC instruction templates.
As an example of a Graham-Glanville code generator, consider the very simple machine-description rules in Figure 6.8. In the example, we write items in their traditional notation, that is, we write [x ⇒ α • β] in place of ⟨lt:x, rt:αβ, pos:|α|⟩. First, the Left, Right, and so on relations, sets, and functions are as given in Figure 6.9. Next, we trace several stages of Gen_Tables( ) for the given grammar. Initially we have StateNo = MaxStateNo = 0 and ItemSet[0] = {[ε ⇒ • ← r r]}. Next we call Successors(0), which sets v = '←', computes Action(0,'←') = ⟨Shift,∅⟩, and sets NextItems to
   Closure(Advance({[ε ⇒ • ← r r]}))
which evaluates to
   NextItems = {[ε ⇒ ← • r r], [r ⇒ • r], [r ⇒ • + r r], [r ⇒ • + r k], [r ⇒ • ↑ r]}
Now MaxStateNo is increased to 1, ItemSet[1] is set to the value just computed for NextItems, and Next(0,'←') is set to 1.
Next we compute Uniform(1). All items in ItemSet[1] have the dot at the beginning, so this is vacuously true. In the following steps, Uniform( ) always returns true.
Next, StateNo is set to 1 and we call Successors(1). It first sets v = 'r', computes Action(1,'r') = ⟨Shift,∅⟩, and sets NextItems to
   Closure(Advance({[ε ⇒ ← • r r], [r ⇒ • r]}))
which evaluates to
   NextItems = {[ε ⇒ ← r • r], [r ⇒ r •], [r ⇒ • r], [r ⇒ • + r r], [r ⇒ • + r k], [r ⇒ • ↑ r]}
Now MaxStateNo is increased to 2, ItemSet[2] is set to the value just computed for NextItems, and Next(1,'r') is set to 2.
Next, Successors(1) sets v = '+', computes Action(1,'+') = ⟨Shift,∅⟩, and sets NextItems to
   Closure(Advance({[r ⇒ • + r r], [r ⇒ • + r k]}))
   '+' Left 'r'      '↑' Left 'r'      '←' Left 'r'
   '+' Right 'r'     '+' Right 'k'     '←' Right 'r'
   'r' First 'r'     'r' First '+'     'r' First '↑'
   'r' Last 'r'      'r' Last 'k'
   EpsLast = {'r','k'}      RootOps = {'←'}
   Follow1('ε') = {'r', …}
   Follow1('r') = {'r', …, '↑', …}
   Follow1('k') = {'r', …, '↑', …}
   Follow1('+') = {'r', …, '↑', …}
   Follow1('↑') = ∅
   Follow1('←') = ∅
   Follow('ε') = …
   Follow('r') = {'r','+','↑', …}
   Follow('k') = {'r','+','↑','←'}
   Follow('+') = {'r','+','↑', …}
   Follow('↑') = ∅
   Follow('←') = ∅
FIG. 6.9 Relations, sets, and functions for our example machine-description grammar in Figure 6.8.
which evaluates to
   NextItems = {[r ⇒ + • r r], [r ⇒ + • r k], [r ⇒ • r], [r ⇒ • + r r], [r ⇒ • + r k], [r ⇒ • ↑ r]}
Now MaxStateNo is increased to 3, ItemSet[3] is set to the value just computed for NextItems, and Next(1,'+') is set to 3.
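The three item sets traced so far can be reproduced with a small Python sketch of Closure( ) and Advance( ). Items are modeled as (lt, rt, pos) tuples, mirroring the ⟨lt, rt, pos⟩ records of the text. The helper with_next_symbol, which selects the items that have a given symbol just after the dot, and the printing function show are hypothetical names introduced only for this illustration, and the sketch does not attempt the rest of Gen_Tables( ).

```python
RULES = [
    ("ε", ("←", "r", "r")),
    ("r", ("r",)),
    ("r", ("+", "r", "r")),
    ("r", ("+", "r", "k")),
    ("r", ("↑", "r")),
]
NONTERMINALS = {"r"}


def closure(items):
    """Add dot-at-left items for every nonterminal that appears just after a dot."""
    items = set(items)
    while True:
        wanted = {rt[pos] for _, rt, pos in items
                  if pos < len(rt) and rt[pos] in NONTERMINALS}
        new = {(lt, rt, 0) for lt, rt in RULES if lt in wanted} - items
        if not new:
            return items
        items |= new


def advance(items):
    """Move the dot one position right in each item whose dot is not at the right end."""
    return {(lt, rt, pos + 1) for lt, rt, pos in items if pos < len(rt)}


def with_next_symbol(items, v):
    """Items that have the symbol v immediately after the dot."""
    return {(lt, rt, pos) for lt, rt, pos in items if pos < len(rt) and rt[pos] == v}


def show(item):
    lt, rt, pos = item
    return "[" + " ".join([lt, "⇒", *rt[:pos], "•", *rt[pos:]]) + "]"


item_set = {0: {("ε", ("←", "r", "r"), 0)}}
item_set[1] = closure(advance(with_next_symbol(item_set[0], "←")))
item_set[2] = closure(advance(with_next_symbol(item_set[1], "r")))
item_set[3] = closure(advance(with_next_symbol(item_set[1], "+")))
for n in (1, 2, 3):
    print(n, sorted(show(i) for i in item_set[n]))
```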
The code-generator generator continues producing Shift actions until it reaches MaxStateNo = 9, for which ItemSet[9] is {[r ⇒ + r k •]}, which results in a Reduce action, namely,
   ⟨Reduce,{[r ⇒ + r k •]}⟩
The resulting Action/Next table is shown in Figure 6.10 and a graphic presentation of the parsing automaton appears in Figure 6.11. In both the table and the diagram, only non-error transitions are shown. The entries in the table that contain a number correspond to shift actions, e.g., in state 3 with lookahead 'r', we shift and go to state 6. The entries for reduce actions are given as the set of items to reduce by, e.g., in state 5 for lookahead '←', '↑', '+', or 'r', reduce by the item set {[ε ⇒ ← r r •]}. In the diagram, a shift transition is represented by an arrow from one state to another labeled with the corresponding lookahead symbol. Thus,
   State                 Lookahead Symbol
   Number    ←     ↑     +     r     k     $
   0         1                             Accept
   1               4     3     2
   2               4     3     5
   3               4     3     6
   4               4     3     7
   5         {[ε ⇒ ← r r •]}
   6               4     3     8     9
   7         {[r ⇒ ↑ r •]}
   8         {[r ⇒ + r r •]}
   9         {[r ⇒ + r k •]}
FIG. 6.10 Action/Next table for the machine-description grammar given in Figure 6.8.
for example, the arrow from state 3 to state 6 labeled with an 'r' corresponds to the transition just described for the tabular form.
Next, we trace the action of the code generator with the above Action/Next table on the intermediate-code string
   ← + r1 2 + ↑ r3 3 $
The process begins by setting state to 0, pushing 0 onto the stack, and fetching the symbol '←' as the value of lookahead. The action for '←' in state 0 is to shift, so the symbol is pushed onto the stack, the next state is fetched from the Action/Next table and pushed onto the stack, the lookahead symbol is discarded from the input string, and the lookahead symbol is set to the next symbol, namely, '+'. The stack now has the contents
   1 '←' 0
The action for '+' in state 1 is to shift and the next state is 3. The resulting stack is
   3 '+' 1 '←' 0
The action for lookahead r1 in state 3 is to shift and enter state 6, so the stack becomes
   6 r1 3 '+' 1 '←' 0
One more shift action puts the parser in state 9, with the lookahead set to '+' and the stack
   9 2 6 r1 3 '+' 1 '←' 0
FIG. 6.11 The code-generation automaton produced for the machine-description grammar in Figure 6.8.
The appropriate action in state 9 is to reduce by the set of items {[r ⇒ + r k •]}, so Emit_Instrs( ) is called. It allocates a register for the result of the addition operation, namely, r2, and outputs the instruction
   add r1,2,r2
The value of left is set to r2 and right is set to 3, so six items are popped off the stack and discarded, and r2 and the next state (2) are pushed onto the stack, resulting in
   2 r2 1 '←' 0
We leave it to the reader to continue the process and to verify that the following sequence of instructions is produced, assuming registers r4 and r5 are allocated as shown:
   add r1,2,r2
   ld  [r3],r4
   add r4,3,r5
   st  r5,[r2]
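The whole run can be checked mechanically. The Python sketch below drives the Action/Next table of Figure 6.10 over the same input string. Several things in it are assumptions made for the illustration: a reduce state reduces without inspecting the lookahead, each register token r<n> is classified as 'r' and each integer as 'k', new registers are the lowest-numbered ones not already in use (which happens to give r2, r4, and r5 here), and the operand order of each emitted instruction is taken from the templates of Figure 6.8(c), so the store for ε ⇒ ← r.2 r.1 is emitted as st r.1,[r.2]. The names NEXT, REDUCE, symbol_class, and generate are hypothetical.

```python
import re

# Shift entries of the Action/Next table (state -> symbol class -> next state).
NEXT = {
    0: {"←": 1},
    1: {"↑": 4, "+": 3, "r": 2},
    2: {"↑": 4, "+": 3, "r": 5},
    3: {"↑": 4, "+": 3, "r": 6},
    4: {"↑": 4, "+": 3, "r": 7},
    6: {"↑": 4, "+": 3, "r": 8, "k": 9},
}
# Reduce entries: state -> (left side, length of right side, template).
# In a template, {0} is the result register and {1}, {2} are the operands.
REDUCE = {
    5: ("ε", 3, "st {2},[{1}]"),
    7: ("r", 2, "ld [{1}],{0}"),
    8: ("r", 3, "add {1},{2},{0}"),
    9: ("r", 3, "add {1},{2},{0}"),
}


def symbol_class(token):
    """Map a concrete token to the grammar symbol it stands for."""
    if re.fullmatch(r"r\d+", token):
        return "r"
    if re.fullmatch(r"\d+", token):
        return "k"
    return token


def generate(tokens):
    used = {int(t[1:]) for t in tokens if symbol_class(t) == "r"}

    def new_register():
        n = 1
        while n in used:
            n += 1
        used.add(n)
        return f"r{n}"

    stack, code, pos = [0], [], 0  # stack grows to the right; top is stack[-1]
    while True:
        state = stack[-1]
        if state in REDUCE:
            left, length, template = REDUCE[state]
            popped = stack[-2 * length:]  # symbol, state, symbol, state, ...
            del stack[-2 * length:]
            operands = popped[2::2]       # the symbols after the operator
            result = new_register() if left == "r" else None
            code.append(template.format(result, *operands))
            if left == "r":
                stack += [result, NEXT[stack[-1]]["r"]]
            continue
        lookahead = tokens[pos]
        if state == 0 and lookahead == "$":
            return code                   # Accept
        target = NEXT.get(state, {}).get(symbol_class(lookahead))
        if target is None:
            raise ValueError(f"no action for {lookahead!r} in state {state}")
        stack += [lookahead, target]      # Shift
        pos += 1


print("\n".join(generate("← + r1 2 + ↑ r3 3 $".split())))
```

The stack here is a Python list with its top at the right, so the contents shown in the text appear in the reverse order if the list is printed.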
6.2.3 Eliminating Chain Loops
In a parsing grammar, it is relatively rare to find chain loops, i.e., sets of nonterminals such that each of them can derive the others. On the other hand, such loops are extremely common in machine descriptions. As an example of the effect of chain loops, consider the simple grammar consisting of the following rules (the fact that the language generated by this grammar is empty is irrelevant; adding productions that generate terminals does not affect the presence of the chain loop):
   r ⇒ ↑ r
   r ⇒ s
   s ⇒ t
   t ⇒ r
   ε ⇒ ← s t
The parsing automaton for this grammar is shown in Figure 6.12. Now, if we take as input the intermediate-code string ← r1 ↑ r2, then, after processing r1, the automaton is in state 1, the stack is
1 0
and the lookahead symbol is '↑'. From this state, the code generator emits a register-to-register move and returns to the same state, stack, and lookahead; that is, it is stuck.
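Whether a grammar contains such a loop can be checked before any tables are built. The Python sketch below does only the detection step, using plain graph reachability over the chain rules (rules whose right side is a single nonterminal); it is not the elimination procedure of this section, and the function name chain_loops and the rule encoding are hypothetical.

```python
def chain_loops(rules, nonterminals):
    """Return the sets of nonterminals that can derive one another via chain rules."""
    # Edge a -> b for every chain rule a ⇒ b between nonterminals.
    edges = {}
    for lhs, rhs in rules:
        if lhs in nonterminals and len(rhs) == 1 and rhs[0] in nonterminals:
            edges.setdefault(lhs, set()).add(rhs[0])

    def reachable(start):
        """Nonterminals reachable from start in one or more chain steps."""
        seen, work = set(), [start]
        while work:
            node = work.pop()
            for succ in edges.get(node, ()):
                if succ not in seen:
                    seen.add(succ)
                    work.append(succ)
        return seen

    loops = set()
    for n in nonterminals:
        reach = reachable(n)
        if n in reach:  # n can derive itself through chain rules
            loops.add(frozenset(m for m in reach if n in reachable(m)))
    return loops


rules = [
    ("r", ["↑", "r"]),
    ("r", ["s"]),
    ("s", ["t"]),
    ("t", ["r"]),
    ("ε", ["←", "s", "t"]),
]
print(chain_loops(rules, {"r", "s", "t"}))
```

For the grammar above this reports a single loop containing r, s, and t, which are exactly the nonterminals the automaton cycles among.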
Eliminating loops can be done as a preprocessing step applied to the machine-description grammar or during construction of the Action/Next table. The code in Figure 6.13 provides a way to do this as a grammar preprocessor. The procedure Elim_Chain_Loops( ) finds productions ⟨lt:l, rt:r⟩ in MGrammar that have a
FIG. 6.12 Parsing automaton for the example grammar that has a chain loop.
procedure Elim_Chain_Loops( )
begin
   r1, r2: Rule
   C, MG: set of Rule
   R: set of Nonterminal
   MG := MGrammar
   for each r1 ∈ MG do
      if r1.lt ≠ ε & |r1.rt| = 1 then
         if Close(r1,C,R) then
            for each r2 ∈ C do
               MGrammar := (MGrammar - {r2})
                  ∪ Replace(R,r2,r1.lt)
            od
         fi
      fi
      MG -= {r1}
   od
end   || Elim_Chain_Loops
procedure Close(rule,C,R) returns boolean
   rule: in Rule
   C: out set of Rule
   R: out set of Nonterminal
begin
   r1: Rule
   || determine set of grammar rules making up a chain loop