7.3 LR Parsers
7.3.5 Implementation
In order to carry out the parsing practically, a table of the left sides and lengths of the right sides of all productions (other than chain productions), as well as parser actions to be invoked at connection points, must be known to the transition function. The transition function is partitioned in this way to ease the storage management problems. Because of cost we store the transition function as a packed data structure and employ an access routine that locates the value $f(q, \nu)$ given $(q, \nu)$. Some systems work with a list representation of the (sparse) transition matrix; the access may be time consuming if such a scheme is used, because lists must be searched.

The access time is reduced if the matrix form of the transition function is retained, and the storage requirements are comparable to those of the list method if as many rows and columns as possible are combined. In performing this combination we take advantage of the fact that two rows can be combined not only when they agree, but also when they are compatible according to the following definition:
7.8 Definition
Consider a transition matrix $f(q, \nu)$. Two rows $q, q' \in Q$ are compatible if, for each column $\nu$, either $f(q, \nu) = f(q', \nu)$ or one of the two entries is a don't-care entry.

Compatibility is defined analogously for two columns $\nu, \nu' \in V$. We shall only discuss the combination of rows here.

We inspect the terminal transition matrix, the submatrix of $f(q, \nu)$ with $\nu \in T$, separately from the nonterminal transition matrix. Often different combinations are possible for the two submatrices, and by exploiting them separately we can achieve a greater storage reduction. This can be seen in the case of Figure 7.18a, which is an implementation of the transition matrix of Figure 7.17. In the terminal transition matrix rows 0, 4, 5 and 6 are compatible, but none of these rows are compatible in the nonterminal transition matrix.

In order to increase the number of compatible rows, we introduce a Boolean failure matrix,
$F[q, t]$, $q \in Q$, $t \in T$. This matrix is used to filter the access to the terminal transition matrix:

$$f(q, t) = \textbf{if } F[q, t] \textbf{ then } \mathit{error} \textbf{ else } \text{entry in the transition matrix}$$

For this purpose we define $F[q, t]$ as follows:

$$F[q, t] = \begin{cases} \mathit{true} & \text{if } f(q, t) = \mathit{ERROR} \\ \mathit{false} & \text{otherwise} \end{cases}$$

Figure 7.18b shows the failure matrix derived from the terminal transition matrix of Figure 7.18a. Note that the failure matrix may also contain don't-care entries, derived as discussed at the end of Section 7.3.2. Row and column combinations applied to Figure 7.18b reduce it from $9 \times 6$ to $4 \times 4$.
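The effect of the failure matrix on row compatibility can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the text: rows are stored as lists of strings, `"."` marks an explicit error entry, `None` marks a don't-care, and the helper names are hypothetical. The four sample rows are the terminal parts of states 0, 2, 8 and 9 as read from Figure 7.18a (columns i ( ) + * #).

```python
ERROR = "."        # explicit error entry, written "." in Figure 7.18a
DONT_CARE = None   # entry that can never be consulted


def compatible(row_a, row_b):
    """Definition 7.8: every column agrees or holds at least one don't-care."""
    return all(a == b or a is DONT_CARE or b is DONT_CARE
               for a, b in zip(row_a, row_b))


def failure_matrix(matrix):
    """F[q][t] is True exactly where f(q, t) = ERROR; don't-cares are kept."""
    return {q: [DONT_CARE if e is DONT_CARE else e == ERROR for e in row]
            for q, row in matrix.items()}


def without_errors(matrix):
    """Once F filters the access, former error entries become don't-cares."""
    return {q: [DONT_CARE if e == ERROR else e for e in row]
            for q, row in matrix.items()}


# Terminal rows of states 0, 2, 8 and 9 (columns: i ( ) + * #).
terminal = {
    0: ["-6", "4", ".", ".", ".", "."],
    2: [".", ".", ".", "5", "6", "*"],
    8: [".", ".", "-7", "5", "6", "."],
    9: [".", ".", "+2", "+2", "6", "+2"],
}

F = failure_matrix(terminal)
filtered = without_errors(terminal)
```

Before filtering, rows 0 and 2 are incompatible because an error entry meets a real transition; after filtering they can share one stored row, while rows 8 and 9 still conflict in the `)` column.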
With the introduction of the failure matrix, all previous error entries become don't-care entries. Figure 7.18c shows the resulting compression of the terminal transition matrix. The nonterminal transition matrix is not affected by this process; in our example it can be compressed by combining both rows and columns as shown in Figure 7.18d. Each matrix requires an access map consisting of two additional arrays specifying the row (column) of the matrix to be used for a given state (symbol). For grammars of the size of the LAX grammar, the total storage requirements are generally reduced to 5-10% of their original values.
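A filtered access through the access maps might look like the following Python sketch. The names `ROW_MAP`, `COLUMN_MAP`, `COMPRESSED`, `FAILURE` and `f` are hypothetical, the data are our reading of Figures 7.18b and 7.18c (the positions of the blank don't-care entries in rows 1 and 7 are inferred), and the failure matrix is left uncompressed to keep the example short.

```python
TERMINALS = ["i", "(", ")", "+", "*", "#"]

# Access maps: state -> stored row, terminal -> stored column.
ROW_MAP = {0: 0, 1: 0, 2: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 1}
COLUMN_MAP = {t: k for k, t in enumerate(TERMINALS)}  # no columns merged here

# Compressed terminal transition matrix (Figure 7.18c); None = don't-care.
COMPRESSED = [
    ["-6", "4", "-7", "5", "6", "*"],
    [None, None, "+2", "+2", "6", "+2"],
]

Y, N, X = True, False, None   # X marks a don't-care in the failure matrix
FAILURE = {
    0: [N, N, Y, Y, Y, Y],
    1: [X, X, Y, N, X, N],
    2: [Y, Y, Y, N, N, N],
    4: [N, N, Y, Y, Y, Y],
    5: [N, N, Y, Y, Y, Y],
    6: [N, N, Y, Y, Y, Y],
    7: [X, X, N, N, X, Y],
    8: [Y, Y, N, N, N, Y],
    9: [Y, Y, N, N, N, N],
}


def f(q, t):
    """Terminal transition for state q and terminal t, filtered by FAILURE."""
    col = COLUMN_MAP[t]
    if FAILURE[q][col]:
        return "error"
    return COMPRESSED[ROW_MAP[q]][col]
```

A don't-care in `FAILURE` simply falls through to whatever the shared row holds, which is harmless because such an entry is never requested by a correctly constructed parser.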
We have a certain freedom in combining the rows of the transition matrix. For example, in the terminal matrix of Figure 7.18a we could also have chosen the grouping {(0,4,5,6,9),(1,2,7,8)}. In general these groupings differ in the final state count; we must therefore examine a number of possible choices. The task of determining the minimum number of rows reduces to a problem in graph theory: We construct the (undirected) incompatibility graph $I = (Q, D)$ for our state set $Q$, in which two nodes $q$ and $q'$ are connected if the rows are incompatible. Minimization of the number of rows is then equivalent to the task of coloring the nodes with a minimum number of colors such that any pair of nodes connected by a branch are of different colors. (Graph coloring is discussed in Section B.3.3.) Further compression may be possible as indicated in Exercises 7.12 and 7.13.
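The reduction to graph coloring can be illustrated with a small Python sketch. It uses a greedy heuristic (largest degree first), which is an assumption of this example and does not guarantee a minimum number of colors; all function names are hypothetical. The rows are the terminal rows of Figure 7.18a after the failure matrix has turned error entries into don't-cares (`None`).

```python
from itertools import combinations


def compatible(row_a, row_b):
    """Definition 7.8 with None as the don't-care entry."""
    return all(a == b or a is None or b is None for a, b in zip(row_a, row_b))


def incompatibility_graph(rows):
    """Undirected graph I = (Q, D): an edge joins two incompatible rows."""
    graph = {q: set() for q in rows}
    for p, q in combinations(rows, 2):
        if not compatible(rows[p], rows[q]):
            graph[p].add(q)
            graph[q].add(p)
    return graph


def greedy_coloring(graph):
    """Heuristic coloring, highest degree first; not necessarily minimal."""
    color = {}
    for node in sorted(graph, key=lambda n: -len(graph[n])):
        used = {color[m] for m in graph[node] if m in color}
        color[node] = next(c for c in range(len(graph)) if c not in used)
    return color


def merge_rows(rows, color):
    """Overlay all rows of one color class into a single stored row."""
    width = len(next(iter(rows.values())))
    merged = [[None] * width for _ in range(max(color.values()) + 1)]
    for q, row in rows.items():
        for k, entry in enumerate(row):
            if entry is not None:
                merged[color[q]][k] = entry
    return merged


_ = None  # don't-care (columns: i ( ) + * #)
ROWS = {
    0: ["-6", "4", _, _, _, _],
    1: [_, _, _, "5", _, "*"],
    2: [_, _, _, "5", "6", "*"],
    4: ["-6", "4", _, _, _, _],
    5: ["-6", "4", _, _, _, _],
    6: ["-6", "4", _, _, _, _],
    7: [_, _, "-7", "5", _, _],
    8: [_, _, "-7", "5", "6", _],
    9: [_, _, "+2", "+2", "6", "+2"],
}

color = greedy_coloring(incompatibility_graph(ROWS))
merged = merge_rows(ROWS, color)
groups = {}
for q, c in color.items():
    groups.setdefault(c, []).append(q)
```

Merging a color class is safe because its rows are pairwise compatible, so every column holds at most one distinct specified value. On this input the heuristic happens to produce the alternative grouping mentioned in the text, (0,4,5,6,9) and (1,2,7,8).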
       i    (    )    +    *    #    E    T    F
  0   -6    4    .    .    .    .    1    2    2
  1              .    5         *
  2    .    .    .    5    6    *
  4   -6    4    .    .    .    .    7    8    8
  5   -6    4    .    .    .    .         9    9
  6   -6    4    .    .    .    .             -4
  7             -7    5         .
  8    .    .   -7    5    6    .
  9    .    .   +2   +2    6   +2
a) Transition matrix for Figure 7.17 with shift-reduce transitions

       i      (      )      +      *      #
  0   false  false  true   true   true   true
  1                 true   false         false
  2   true   true   true   false  false  false
  4   false  false  true   true   true   true
  5   false  false  true   true   true   true
  6   false  false  true   true   true   true
  7                 false  false         true
  8   true   true   false  false  false  true
  9   true   true   false  false  false  false
b) Uncompressed failure matrix for (a)

                     i    (    )    +    *    #
  0,1,2,4,5,6,7,8   -6    4   -7    5    6    *
  9                           +2   +2    6   +2
c) Compressed terminal transition matrix

             E    T,F
  0,1,2      1     2
  4          7     8
  5                9
  6,7,8,9         -4
d) Compressed nonterminal transition matrix

Figure 7.18: Table Compression
7.4 Notes and References
LL(1) parsing in the form of recursive descent was, according to McClure [1972], the most frequently-used technique in practice. Certainly its flexibility and the fact that it can be hand-coded contribute to this popularity.
LR languages form the largest class of languages that can be processed with deterministic pushdown automata. Other techniques (precedence grammars, $(m, n)$-bounded context grammars or Floyd-Evans Productions, for example) either apply to smaller language classes or do not attain the same computational efficiency or error recovery properties as the techniques treated here. Operator precedence grammars have also achieved significant usage because one can easily construct parsers by hand for expressions with infix operators. Aho and Ullman [1972] give quite a complete overview of the available parsing techniques and their optimal implementation.
Instead of obtaining the LALR(1) parser from the LR(1) parser by merging states, one could begin with the SLR(1) parser and determine the exact right context only for those states in which the transition function is ambiguous. This technique reduces the computation time, but unfortunately does not generalize to an algorithm that eliminates all chain productions.
Construction 7.7 requires a redundant effort that can be avoided in practice. For example, the closure of a situation $[X \to \mu \cdot B\gamma;\ \Omega]$ depends only upon the nonterminal $B$ if the lookahead set is ignored. The closure can thus be computed ahead of time for each $B \in N$, and only the lookahead sets must be supplied during parser construction. Also, the repeated construction of the follower state of an LALR(1) state that develops from the combination of two LR(1) states with distinct lookahead sets can be simplified. This repetition, which results from the marking of states as not yet examined, leaves the follower state (specified as a set of situations) unaltered. It can at most add lookahead symbols to single situations. This addition can also be accomplished without computing the entire state anew.

Our technique for chain production elimination is based upon an idea of Pager [1974]. Use of the failure matrix to increase the number of don't-care entries in the transition matrix was first proposed by Joliat [1973, 1974].
Exercises
7.1 Consider a grammar with embedded connection points. Explain why transformations of the grammar can be guaranteed to leave the invocation sequence of the associated parser actions invariant.
7.2 State the LL(1) condition in terms of the extended BNF notation of Section 5.1.3. Prove that your statement is equivalent to Theorem 7.2.
7.3 Give an example of a grammar in which the graph of LAST contains a cycle. Prove that $FOLLOW(A) = FOLLOW(B)$ for arbitrary nodes $A$ and $B$ in the same strongly connected subgraph.
7.4 Design a suitable internal representation of a grammar and program the generation algorithm of Section 7.2.3 in terms of it.
7.5 Devise an LL(1) parser generation algorithm that accepts the extended BNF notation of Section 5.1.3. Will you be able to achieve a more efficient parser by operating upon this form directly, or by converting it to productions? Explain.
7.6 Consider the interpretive parser of Figure 7.9.
(a) Define additional operation codes to implement connection points, and add the appropriate alternatives to the case statement. Carefully explain the interface conventions for the parser actions. Would you prefer a different kind of parse table entry? Explain.
(b) Some authors provide special operations for the situations $[X \to \mu \cdot B]$ and $[X \to \mu \cdot tB]$. Explain how some recursion can be avoided in this manner, and write appropriate alternatives for the case statement.
(c) Once the special cases of (b) are recognized, it may be advantageous to provide extra operations identical to 4 and 5 of Figure 7.9, except that the conditions are reversed. Why? Explain.
(d) Recognize the situation $[X \to \mu \cdot t]$ and alter the code of case 4 to absorb the processing of the 2 operation following it.
(e) What is your opinion of the value of these optimizations? Test your predictions on some language with which you are familiar.
7.7 Show that the following grammar is LR(1) but not LALR(1):
$Z \to A$, $A \to aBcB$, $A \to B$, $A \to D$, $B \to b$, $B \to Ff$, $D \to dE$, $E \to FcA$, $E \to FcE$, $F \to b$
7.8 Repeat Exercise 7.5 for the LR case. Use the algorithm of Section 7.3.4.
7.9 Show that $FIRST(A)$ can be computed by any marking algorithm for directed graphs that obtains a `spanning tree', $B$, for the graph. $B$ has the same node set as the original graph, $G$, and its branch set is a subset of that of $G$.
7.10 Consider the grammar with the following productions:
$Z \to AXd$, $Z \to BX$, $Z \to C$, $A \to B$, $A \to C$, $B \to CXb$, $C \to c$, $X \to \epsilon$
(a) Derive an LALR(1) parser for this grammar.
(b) Delete the reductions by the chain productions $A \to B$ and $A \to C$.
7.11 Use the techniques discussed in Section 7.3.5 to compress the transition matrix produced for Exercise 7.8.
7.12 [Anderson et al., 1973] Consider a transition matrix for an LR parser constructed by one of the algorithms of Section 7.3.2.
(a) Show that for every state $q$ there is exactly one symbol $z(q)$ such that $f(q', a) = q$ implies $a = z(q)$.
(b) Show that, in the case of shift-reduce transitions introduced by the algorithms of Sections 7.3.3 and 7.3.4, an unambiguous symbol $z(A \to \chi)$ exists such that $f(q, a) =$ `shift and reduce $A \to \chi$' implies $a = z(A \to \chi)$.
(c) Show that the states (and shift-reduce transitions) can be numbered in such a way that all states in column $c$ have sequential numbers $c_0 + i$, $i = 0, 1, \ldots$ Thus it suffices to store only the relative number $i$ in the transition matrix; the base $c_0$ is only given once for each column. In exactly the same manner, a list of the reductions in a row can be assigned to this row, so that only the appropriate index to this list is retained in the transition matrix.
(d) Make these alterations in the transition matrix produced for Exercise 7.8 before beginning the compression of Exercise 7.11, and compare the result with that obtained previously.
7.13 [Bell, 1974] Consider an $m \times n$ transition matrix, $t$, in which all unspecified entries are don't-cares. Show that the matrix can be compressed into a $p \times q$ matrix $c$, two length-$m$ arrays $f$ and $u$, and two length-$n$ arrays $g$ and $v$ by the following algorithm: Initially $f_i = g_j = 1$, $1 \le i \le m$, $1 \le j \le n$, and $k = 1$. If all occupied columns of the $i$th row of $t$ uniformly contain the value $r$, then set $f_i := k$, $k := k + 1$, $u_i := r$ and delete the $i$th row of $t$. If the $j$th column is uniformly occupied, delete it also and set $g_j := k$, $k := k + 1$, $v_j := r$. Repeat this process until no uniformly-occupied row or column remains. The remaining matrix is the matrix $c$. We then enter the row (column) number in $c$ of the former $i$th row ($j$th column) into $u_i$ ($v_j$). The following relation then holds:

$$t_{i,j} = \textbf{if } f_i < g_j \textbf{ then } u_i \textbf{ else if } f_i > g_j \textbf{ then } v_j \textbf{ else } (*\ f_i = g_j = 1\ *)\ c_{u_i, v_j}$$

(Hint: Show that the size of $c$ is independent of the sequence in which the rows and columns are deleted.)

Attribute Grammars
Semantic analysis and code generation are based upon the structure tree. Each node of the tree is `decorated' with attributes describing properties of that node, and hence the tree is often called an attributed structure tree for emphasis. The information collected in the attributes of a node is derived from the environment of that node; it is the task of semantic analysis to compute these attributes and check their consistency. Optimization and code generation can be also described in similar terms, using attributes to guide the transformation of the tree and ultimately the selection of machine instructions.
Attribute grammars have proven to be a useful aid in representing the attribution of the structure tree because they constitute a formal definition of all context-free and context-sensitive language properties on the one hand, and a formal specification of the semantic analysis on the other. When deriving the specification, we need not be overly concerned with the sequence in which the attributes are computed because this can (with some restrictions) be derived mechanically. Storage for the attribute values is also not reflected in the specification. We begin by assuming that all attributes belonging to a node are stored within that node in the structure tree; optimization of the attribute storage is considered later.
Most examples in this chapter are included to show constraints and pathological cases; practical examples can be found in Chapter 9.
8.1 Basic Concepts of Attribute Grammars
An attribute grammar is based upon a context-free grammar $G = (N, T, P, Z)$. It associates a set $A(X)$ of attributes with each symbol, $X$, in the vocabulary of $G$. Each attribute represents a specific (context-sensitive) property of the symbol $X$, and can take on any of a specified set of values. We write $X.a$ to indicate that attribute $a$ is an element of $A(X)$.

Each node in the structure tree of a sentence in $L(G)$ is associated with a particular set of values for the attributes of some symbol $X$ in the vocabulary of $G$. These values are established by attribution rules $R(p) = \{X_i.a \leftarrow f(X_j.b, \ldots, X_k.c)\}$ for the productions $p: X_0 \to X_1 \ldots X_n$ used to construct the tree. Each rule defines an attribute $X_i.a$ in terms of attributes $X_j.b, \ldots, X_k.c$ of symbols in the same production. (Note that in this chapter we use upper-case letters to denote vocabulary symbols, rather than using case to distinguish terminals from nonterminals. The reason for this is that any symbol of the vocabulary may have attributes, and the distinction between terminals and nonterminals is generally irrelevant for attribute computation.)
In addition to the attribution rules, a condition $B(X_i.a, \ldots, X_j.b)$ involving attributes of symbols occurring in $p$ may be given. $B$ specifies the context condition that must be fulfilled if a syntactically correct sentence is correct according to the static semantics and therefore translatable. We could also regard this condition as the computation of a Boolean attribute consistent, which we associate with the left-hand side of the production.

    rule assignment ::= name ':=' expression .
    attribution
      name.environment ← assignment.environment;
      expression.environment ← assignment.environment;
      name.postmode ← name.primode;
      expression.postmode ←
        if name.primode = ref_int_type then int_type else real_type ;

    rule expression ::= name addop name .
    attribution
      name[1].environment ← expression.environment;
      name[2].environment ← expression.environment;
      expression.primode ←
        if coercible (name[1].primode, int_type) and
           coercible (name[2].primode, int_type)
        then int_type else real_type ;
      addop.mode ← expression.primode ;
      name[1].postmode ← expression.primode ;
      name[2].postmode ← expression.primode ;
    condition
      coercible (expression.primode, expression.postmode);

    rule addop ::= '+' .
    attribution
      addop.operation ←
        if addop.mode = int_type then int_addition else real_addition ;

    rule name ::= identifier .
    attribution
      name.primode ← defined_type (identifier.symbol, name.environment);
    condition
      coercible (name.primode, name.postmode);

Figure 8.1: Simplified LAX Assignment
As an example, Figure 8.1 gives a simplified attribute grammar for LAX assignments. Each $p \in P$ is marked by the keyword rule and written using EBNF notation (restricted to express only productions). The elements of $R(p)$ follow the keyword attribution. We use a conventional expression-oriented programming language notation for the functions $f$, and terminate each element with a semicolon. Particular instances of an attribute are distinguished by numbering multiple occurrences of symbols in the production (e.g. name[1], name[2]) from left to right. Any condition is also marked by a keyword and terminated by a semicolon.

In order to check the consistency of the assignment and to further identify the + operator, we must take the operand types into account. For this purpose we define two attributes, primode and postmode, for the symbols expression and name, and one attribute, mode, for the symbol addop. Primode describes the type determined directly from the node and its descendants; postmode describes the type expected when the result is used as an operand by other nodes. Any difference between primode and postmode must be resolved by coercions.
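To make the interplay of primode, postmode and the conditions concrete, the following Python sketch applies the rules of Figure 8.1 by hand to an assignment of the form x := y + z. Several parts are assumptions made only for this illustration: the coercion table (dereferencing, then widening int to real), the type name `ref_real_type`, the dictionary used as environment, and the function `attribute_assignment` itself. The text does not define `coercible` or `defined_type`; the versions here are stand-ins.

```python
# Hypothetical coercion steps (an assumption, not taken from LAX):
# dereferencing, then widening int to real.
COERCION_STEPS = {
    "ref_int_type": ["int_type"],
    "ref_real_type": ["real_type"],
    "int_type": ["real_type"],
    "real_type": [],
}


def coercible(source, target):
    """True if source equals target or reaches it through coercion steps."""
    return source == target or any(
        coercible(step, target) for step in COERCION_STEPS[source])


def defined_type(symbol, environment):
    """Declared type of an identifier; the environment is a dict here."""
    return environment[symbol]


def attribute_assignment(x, y, z, environment):
    """Apply the rules of Figure 8.1 to 'x := y + z'."""
    a = {}
    # rule name ::= identifier, once per name. Every name inherits the
    # same environment, so the environment attributes are not stored.
    for k, symbol in enumerate((x, y, z), start=1):
        a[f"name{k}.primode"] = defined_type(symbol, environment)
    # rule assignment ::= name ':=' expression
    a["name1.postmode"] = a["name1.primode"]
    a["expression.postmode"] = (
        "int_type" if a["name1.primode"] == "ref_int_type" else "real_type")
    # rule expression ::= name addop name
    both_int = (coercible(a["name2.primode"], "int_type")
                and coercible(a["name3.primode"], "int_type"))
    a["expression.primode"] = "int_type" if both_int else "real_type"
    a["addop.mode"] = a["expression.primode"]
    a["name2.postmode"] = a["expression.primode"]
    a["name3.postmode"] = a["expression.primode"]
    # rule addop ::= '+'
    a["addop.operation"] = (
        "int_addition" if a["addop.mode"] == "int_type" else "real_addition")
    # conditions: each primode must be coercible to the matching postmode
    nodes = ("name1", "name2", "name3", "expression")
    violated = [n for n in nodes
                if not coercible(a[f"{n}.primode"], a[f"{n}.postmode"])]
    return a, violated


all_int = {"x": "ref_int_type", "y": "ref_int_type", "z": "ref_int_type"}
mixed = {"x": "ref_int_type", "y": "ref_int_type", "z": "ref_real_type"}
ok_attrs, ok_violations = attribute_assignment("x", "y", "z", all_int)
bad_attrs, bad_violations = attribute_assignment("x", "y", "z", mixed)
```

With three integer variables every condition holds and the operator is identified as integer addition. When z is real, the expression becomes real while the integer target expects an integer, so the condition attached to the expression rule fails under the assumed coercion table.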
    assignment
        name_1
            identifier_1
        expression
            name_2
                identifier_2
            addop
                '+'
            name_3
                identifier_3
a) Syntactic structure tree

    assignment.environment    identifier_i.symbol
b) Attribute values given initially (i = 1, ..., 3)

    name_1.environment    expression.environment
    name_i.environment    name_1.primode
    name_1.postmode    expression.postmode    name_i.primode
    expression.primode    name_1 condition
    addop.mode    name_i.postmode    expression condition
    addop.operation    name_i condition
c) Attribute values computed (i = 2, 3)

Figure 8.2: Analysis of x := y + z
Figure 8.2 shows the analysis of x := y + z according to the grammar of Figure 8.1. (Assignment.environment would be computed from the declarations of x, y and z, but here we show it as given in order to make the example self-contained.) Attributes on the same line of Figure 8.2c can be computed collaterally; every attribute is dependent upon at least one attribute from the previous line. These dependency relations can be expressed as a graph (Figure 8.3). Each large box represents the production whose application corresponds to the node of the structure tree contained within it. The small boxes making up the node itself represent the attributes of the symbol on the left-hand side of the production, and the arrows represent the dependency relations arising from the attribution rules of the production. The node set of the dependency graph is just the set of small boxes representing attributes; its edge set is the set of arrows representing dependencies.
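The line-by-line order of Figure 8.2c can be derived mechanically from the dependencies. The Python sketch below is one possible way to do so, assuming a plain dictionary encoding of the dependency graph; the names `DEPENDS_ON`, `GIVEN` and `evaluation_layers` are hypothetical, and the dependencies are transcribed from the rules of Figure 8.1 for x := y + z (name1 is x, name2 is y, name3 is z). Each layer contains the attributes whose prerequisites are all known, so the members of a layer could be computed collaterally.

```python
# attribute -> attributes it is computed from (rules of Figure 8.1)
DEPENDS_ON = {
    "name1.environment": ["assignment.environment"],
    "expression.environment": ["assignment.environment"],
    "name2.environment": ["expression.environment"],
    "name3.environment": ["expression.environment"],
    "name1.primode": ["identifier1.symbol", "name1.environment"],
    "name2.primode": ["identifier2.symbol", "name2.environment"],
    "name3.primode": ["identifier3.symbol", "name3.environment"],
    "name1.postmode": ["name1.primode"],
    "expression.postmode": ["name1.primode"],
    "expression.primode": ["name2.primode", "name3.primode"],
    "addop.mode": ["expression.primode"],
    "name2.postmode": ["expression.primode"],
    "name3.postmode": ["expression.primode"],
    "addop.operation": ["addop.mode"],
    "name1.condition": ["name1.primode", "name1.postmode"],
    "name2.condition": ["name2.primode", "name2.postmode"],
    "name3.condition": ["name3.primode", "name3.postmode"],
    "expression.condition": ["expression.primode", "expression.postmode"],
}

# Attribute values given initially (Figure 8.2b).
GIVEN = {"assignment.environment", "identifier1.symbol",
         "identifier2.symbol", "identifier3.symbol"}


def evaluation_layers(depends_on, given):
    """Group attributes into layers; each layer needs only earlier ones."""
    known = set(given)
    pending = dict(depends_on)
    layers = []
    while pending:
        ready = sorted(a for a, deps in pending.items()
                       if all(d in known for d in deps))
        if not ready:
            # Nothing can be computed: the remaining attributes form a cycle.
            raise ValueError("cyclic dependency among: "
                             + ", ".join(sorted(pending)))
        layers.append(ready)
        known.update(ready)
        for a in ready:
            del pending[a]
    return layers


layers = evaluation_layers(DEPENDS_ON, GIVEN)
```

For this graph the function yields six layers that correspond to the six lines of Figure 8.2c. If some attributes never become ready, the graph contains a cycle and no evaluation order exists, which the sketch reports as an error.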
[Figure 8.3: dependency graph for the analysis of Figure 8.2 (graphic not recoverable). Boxes: assignment, expression, addop, three name nodes and three identifier nodes; attributes shown: environment (env), primode (pri), postmode (post), mode, operation (oper), symbol.]
We must know all of the values upon which an attribute depends before we can compute the value of that attribute. Clearly this is only possible if the dependency graph is acyclic. Figure 8.3 is acyclic, but consider the following LAX type definition, which we shall discuss in more detail in Sections 9.1.2 and 9.1.3:

    type t = record (real x, ref t p);

We must compute a type attribute for each of the identifiers t, x and p so that the associated type is known at each use of the identifier. The type attribute of t consists of the keyword record plus the types and identifiers of the fields. Now, however, the type of p contains an application of t, implying that the type identified by t depends upon which type a use of t identifies. Thus the type t depends cyclically upon itself. (We shall show how to eliminate the cycle from this example in Section 9.1.3.)

Let us now make the intuition gained from these examples more precise. We begin with