
Ad-hoc Syntax-directed Translation

In document Engineering A Compiler pdf (Page 128-139)

The rule-based evaluators for attribute grammars introduced a powerful idea that actually serves as the basis for the ad hoc techniques used for context-sensitive analysis in many compilers. In the rule-based evaluators, the compiler writer specifies a sequence of actions in terms of productions in the grammar. The underlying observation, that the actions required for context-sensitive analysis can be organized around the structure of the grammar, leads to a powerful, albeit ad hoc, approach to incorporating this kind of analysis into the process of parsing a context-free grammar. We refer to this approach as ad hoc syntax-directed translation.

In this scheme, the compiler writer provides arbitrary snippets of code that will execute at parse time. Each snippet, or action, is directly tied to a production in the grammar. Each time the parser reduces by the right-hand side of some production, the corresponding action is invoked to perform its task. In a top-down, recursive-descent parser, the compiler writer simply adds the appropriate code to the parsing routines and has complete control over when the actions execute. In a shift-reduce parser, the actions are performed each time the parser performs a reduce action. This is more restrictive, but still workable.

The other points in the parse where the compiler writer might want to perform an action are: (1) in the middle of a production, or (2) on a shift action. To accomplish the first, the compiler writer can transform the grammar so that it reduces at the appropriate place. Usually, this involves breaking the production into two pieces around the point where the action should execute. A higher-level production is added that sequences the first part, then the second. When the first part reduces, the parser will invoke the action. To force actions on shifts, the compiler writer can either move them into the scanner, or add a production to hold the action. For example, to perform an action whenever the parser shifts the terminal symbol Variable, the compiler writer can add a production

    ShiftedVariable → Variable

and replace every occurrence of Variable with ShiftedVariable. This adds an extra reduction for every occurrence of the terminal symbol Variable in the input. Thus, the additional cost is directly proportional to the number of such terminal symbols in the program.
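The grammar rewrite described above can be sketched in a few lines of Python. The grammar encoding (a dict from each nonterminal to a list of right-hand sides) and the name add_shift_action are assumptions made for this illustration; they do not come from any particular parser generator.

```python
def add_shift_action(grammar, terminal, wrapper):
    """Replace `terminal` by `wrapper` in every right-hand side and
    add the production wrapper -> terminal, which carries the action."""
    rewritten = {
        lhs: [[wrapper if sym == terminal else sym for sym in rhs]
              for rhs in alternatives]
        for lhs, alternatives in grammar.items()
    }
    rewritten[wrapper] = [[terminal]]
    return rewritten


if __name__ == "__main__":
    # Hypothetical grammar fragment, used only to exercise the rewrite.
    grammar = {"Factor": [["(", "Expr", ")"], ["Variable"]]}
    new_grammar = add_shift_action(grammar, "Variable", "ShiftedVariable")
    for lhs, alternatives in new_grammar.items():
        for rhs in alternatives:
            print(lhs, "->", " ".join(rhs))
```

The action attached to the new production then runs on the reduction that immediately follows each shift of Variable.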

4.4.1 Making It Work

For ad hoc syntax-directed translation to work, the parser must provide mechanisms to sequence the application of the actions, to pass results between actions, and to provide convenient and consistent naming. We will describe these problems and their solution in shift-reduce parsers; analogous ideas will work for top-down parsers. Yacc, an early LR(1) parser generator for Unix systems, introduced a set of conventions to handle these problems. Most subsequent systems have used similar techniques.

Sequencing the Actions

In fitting an ad hoc syntax-directed translation scheme to a shift-reduce parser, the natural way to sequence actions is to associate each code snippet with the right-hand side of a production. When the parser reduces by that production, it invokes the code for the action. As discussed earlier, the compiler writer can massage the grammar to create additional reductions that will, in turn, invoke the code for their actions.

To execute the actions, we can make a minor modification to the skeleton LR(1) parser's reduce action (see Figure 3.8).

    else if action[s,token] = "reduce A → β" then
        invoke the appropriate reduce action
        pop 2 × |β| symbols
        s ← top of stack
        push A
        push goto[s,A]

The parser generator can gather the syntax-directed actions together, embed them in a case statement that switches on the number of the production being reduced, and execute the case statement just before it pops the right-hand side from the stack.
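As a rough illustration of the modified reduce step, the sketch below dispatches on the production number before popping the right-hand side. The table encodings (productions, goto, actions), the function name reduce_by, and the state numbers are all assumptions for this example; a generated parser would typically emit a case statement rather than a dictionary lookup.

```python
def reduce_by(stack, prod, productions, goto, actions):
    """One reduce step of a skeleton LR parser.

    `stack` alternates grammar symbols and states, with a state on top.
    """
    lhs, rhs = productions[prod]
    action = actions.get(prod)         # stands in for the case statement
    if action is not None:
        action()                       # runs just before the pop
    if rhs:                            # guard: del stack[-0:] would empty the stack
        del stack[-2 * len(rhs):]      # pop 2 x |beta| entries
    state = stack[-1]                  # s <- top of stack
    stack.append(lhs)                  # push A
    stack.append(goto[(state, lhs)])   # push goto[s, A]


if __name__ == "__main__":
    productions = {1: ("Term", ["Term", "÷", "Factor"])}
    goto = {(0, "Term"): 2}
    log = []
    actions = {1: lambda: log.append("reduced by production 1")}
    stack = [0, "Term", 2, "÷", 5, "Factor", 7]   # made-up state numbers
    reduce_by(stack, 1, productions, goto, actions)
    print(stack, log)
```

Productions with no entry in actions simply fall through, which mirrors a production that has no syntax-directed action.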

Communicating Between Actions

To connect the actions, the parser must provide a mechanism for passing values between the actions for related productions. Consider what happens in the execution time estimator when it recognizes the identifier y while parsing x ÷ y. The next two reductions are

    Factor → Ident
    Term → Term ÷ Factor

For syntax-directed translation to work, the action associated with the first production, Factor → Ident, needs a location where it can store values. The action associated with the second production, Term → Term ÷ Factor, must know where to find the result of the action caused by reducing y to Factor.

The same mechanism must work with the other productions that can derive Factor, such as Term → Term × Factor and Term → Factor. For example, in x ÷ y, the value for the Term on the right-hand side of Term → Term ÷ Factor is, itself, the result of an earlier reduction by Factor → Ident, followed by a reduction by Term → Factor. The lifetimes of the values produced by the action for Factor → Ident depend on the surrounding syntactic context; thus, the parser needs to manage the storage for values.

To accomplish this, a shift-reduce parser can simply store the results in the parsing stack. Each reduction pushes its result onto the stack. For the production Term → Term ÷ Factor, the topmost result will correspond to Factor. The second result will correspond to ÷, and the third result will correspond to Term. The results will be interspersed with grammar symbols and states, but they occur at fixed intervals in the stack. Any results that lie below the Term's slot on the stack represent the results of other reductions that form a partial left context for the current reduction.

Adding this behavior to the skeleton parser requires two further changes. To keep the changes simple, most parser generators restrict the results to a fixed size. First, rather than popping 2 × |β| symbols on a reduction by A → β, the parser must now pop 3 × |β| symbols. Second, the result must be stacked in a consistent position; the simplest modification pushes the result before the grammar symbol. With these restrictions, the result that corresponds to each symbol on the right-hand side can be easily found. When an action needs to return multiple values, or a complex value such as a piece of an abstract syntax tree, the action allocates a structure and pushes a pointer into the appropriate stack location.

Naming Values

With this stack-based communication mechanism, the compiler writer needs a mechanism for naming the stack locations corresponding to symbols in the production's right-hand side. Yacc introduced a concise notation to address this problem. The symbol $$ refers to the result location for the current production. Thus, the assignment $$ = 17; would push the integer value seventeen as the result corresponding to the current reduction. For the right-hand side, the symbols $1, $2, . . . , $n refer to the locations for the first, second, and nth symbols in the right-hand side, respectively. These symbols translate directly into offsets from the top of the stack. In a reduction by A → β, $1 becomes 3 × |β| slots below the top of the stack, while $4 becomes 3 × (|β| − 4 + 1) slots from the top of the stack. This simple, natural notation allows the action snippets to read and write the stack locations directly.
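The stack layout and the $i offsets can be made concrete with a small sketch. It assumes that each frame is stored as three consecutive slots (result, grammar symbol, state) and that "top of the stack" means the index one past the last occupied slot; the class name ValueStack, the goto encoding, and the state numbers are hypothetical.

```python
class ValueStack:
    """Parse stack holding (result, symbol, state) triples in one flat list."""

    def __init__(self, start_state=0):
        # Bottom frame: only the start state is meaningful.
        self.slots = [None, None, start_state]

    def push(self, result, symbol, state):
        # The result is pushed before (below) the grammar symbol.
        self.slots += [result, symbol, state]

    def dollar(self, i, n):
        """Value of $i for a right-hand side of length n:
        3 * (n - i + 1) slots below the top of the stack."""
        return self.slots[len(self.slots) - 3 * (n - i + 1)]

    def reduce(self, lhs, n, action, goto):
        args = [self.dollar(i, n) for i in range(1, n + 1)]
        result = action(*args)          # the action computes $$
        if n:
            del self.slots[-3 * n:]     # pop 3 x |beta| slots
        state = self.slots[-1]
        self.push(result, lhs, goto[(state, lhs)])
        return result


if __name__ == "__main__":
    stack = ValueStack()
    stack.push(12, "Term", 2)
    stack.push(None, "÷", 5)
    stack.push(4, "Factor", 7)
    # Term -> Term ÷ Factor with the action $$ = $1 / $3 (integer division here)
    print(stack.reduce("Term", 3, lambda t, _op, f: t // f, {(0, "Term"): 2}))
```

Passing $1 through $n as positional arguments is one convenient way to expose the slots in Python; Yacc instead rewrites the $i names into stack offsets inside the action code.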

4.4.2 Back to the Example

To understand how ad hoc syntax-directed translation works, consider rewriting the execution-time estimator using this approach. The primary drawback of the attribute grammar solution lies in the proliferation of rules to copy information around the tree. This creates many additional rules in the specification. It also creates many copies of the sets. Even a careful implementation that stores pointers to a single copy of each set instance must create new sets whenever an identifier's name is added to the set.

To address these problems in an ad hoc syntax-directed translation scheme, the compiler writer can introduce a central repository for information about variables, as suggested earlier. For example, the compiler can create a hash table that contains a record for each Ident in the code. If the compiler writer sets aside a field in the table, named InRegister, then the entire copy problem can be avoided. When the table is initialized, the InRegister field is set to false. The code for the production Factor → Ident checks the InRegister field and selects the appropriate cost for the reference to Ident. The code would look something like:

    i = hash(Ident);
    if (Table[i].InRegister = false) then
        cost = cost + Cost(load);
        Table[i].InRegister = true;

Because the compiler writer can use arbitrary constructs, the cost can be accumulated into a single variable, rather than being passed around the parse tree. The resulting set of actions is somewhat smaller than the attribution rules for the simplest execution model, even though it can provide the accuracy of the more complex model. Figure 4.6 shows the full code for an ad hoc version of the example shown in Figures 4.4 and 4.5.
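A runnable sketch of the same idea follows. It assumes a Python dict can stand in for the hash table and that the per-operation costs are supplied by the caller; the class name CostEstimator and the cost figures in the usage example are invented for illustration.

```python
class CostEstimator:
    """Accumulates an execution-time estimate in a single counter."""

    def __init__(self, op_cost):
        self.op_cost = op_cost      # e.g. {"load": 3}; the numbers are assumptions
        self.in_register = {}       # plays the role of Table[i].InRegister
        self.cost = 0

    def charge(self, op):
        """Generic action body: cost = cost + Cost(op)."""
        self.cost += self.op_cost[op]

    def factor_ident(self, name):
        """Action for Factor -> Ident: charge a load only on first use."""
        if not self.in_register.get(name, False):
            self.charge("load")
            self.in_register[name] = True


if __name__ == "__main__":
    est = CostEstimator({"load": 3, "div": 5})
    # Reductions for x ÷ x: two Ident references, then one division.
    est.factor_ident("x")
    est.factor_ident("x")
    est.charge("div")
    print(est.cost)
```

The second reference to x adds nothing to the total, which is the behavior that the attribute grammar version needed its Before and After sets to express.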

In the ad hoc version, several productions have no action. The remaining actions are quite simple, except for the action taken on the reduction by Factor → Ident. All of the complication introduced by tracking loads falls into that single action;

    Production                  Syntax-directed Actions

    Block0  → Block1 Assign
            | Assign            { cost = 0; }
    Assign  → Ident = Expr ;    { cost = cost + Cost(store); }
    Expr0   → Expr1 + Term      { cost = cost + Cost(add); }
            | Expr1 − Term      { cost = cost + Cost(sub); }
            | Term
    Term0   → Term1 × Factor    { cost = cost + Cost(mult); }
            | Term1 ÷ Factor    { cost = cost + Cost(div); }
            | Factor
    Factor  → ( Expr )
            | Number            { cost = cost + Cost(loadI); }
            | Ident             { i = hash(Ident);
                                  if (Table[i].InRegister = false) then
                                      cost = cost + Cost(load);
                                      Table[i].InRegister = true; }

Figure 4.6: Tracking loads with ad hoc syntax-directed translation

contrast that with the attribute grammar version, where the task of passing around the Before and After sets came to dominate the specification. Because it can accumulate cost into a single variable and use a hash table to store global information, the ad hoc version is much cleaner and simpler. Of course, these same strategies could be applied in an attribute grammar framework, but doing so violates the spirit of the attribute grammar paradigm and forces all of the work outside the framework into an ad hoc setting.

4.4.3 Uses for Syntax-directed Translation

Compilers use syntax-directed translation schemes to perform many different tasks. By associating syntax-directed actions with the productions for source-language declarations, the compiler can accumulate information about the variables in a program and record it in a central repository, often called a symbol table. As part of the translation process, the compiler must discover the answer to many context-sensitive questions, similar to those mentioned in Section 4.1. Many, if not all, of these questions can be answered by placing the appropriate code in syntax-directed actions. The parser can build an IR for use by the rest of the compiler by systematic use of syntax-directed actions. Finally, syntax-directed translation can be used to perform complex analysis and transformations. To understand the varied uses of syntax-directed actions, we will examine several different applications of ad hoc syntax-directed translation.

Building a Symbol Table

To centralize information about variables, labels, and procedures, most compilers construct a symbol table for the input program. The compiler writer can use syntax-directed actions to gather the information, insert that information into the symbol table, and perform any necessary processing. For example, the grammar fragment shown in Figure 4.7 describes a subset of the syntax for declaring variables in C. (It omits typedefs, structs, unions, the type qualifiers const and volatile, and the initialization syntax; it also leaves several non-terminals unelaborated.) Consider the actions required to build symbol table entries for each declared variable.

Each Declaration begins with a set of one or more qualifiers that specify either the variable's type, its storage class, or both. The qualifiers are followed by a list of one or more variable names; the variable name can include a specification about indirection (one or more occurrences of *), about array dimensions, and about initial values for the variable. To build symbol table entries for each variable, the compiler writer can gather up the attributes from the qualifiers, add any indirection, dimension, or initialization attributes, and enter the variable in the table.

For example, to track storage classes, the parser might include a variable StorageClass, initializing it to a value "none." Each production that reduces to StorageClass would set StorageClass to an appropriate value. The language definition allows either zero or one storage class specifier per declaration, even though the context-free grammar admits an arbitrary number. The syntax-directed action can easily check this condition. Thus, the following code might execute on a reduction by StorageClass → auto:

    if (StorageClass = none)
        then StorageClass = auto
        else report the error

Similar actions set StorageClass for the reductions by static, extern, and register. In the reduction for DirectDeclarator → Identifier, the action creates a new symbol table entry, and uses StorageClass to set the appropriate field in the symbol table. To complete the process, the action for the production Declaration → SpecifierList InitDeclaratorList ; needs to reset StorageClass to none.
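The three actions just described (set, use, and reset of StorageClass) might be organized as in the sketch below. The class and method names are hypothetical, None stands in for the value "none," and the calls in the usage example simulate the order in which a shift-reduce parser would perform the reductions rather than running a real parser.

```python
class DeclarationActions:
    """Syntax-directed actions that track the storage class of a declaration."""

    def __init__(self):
        self.storage_class = None     # stands in for the value "none"
        self.symbol_table = {}
        self.errors = []

    def on_storage_class(self, keyword):
        """Action for StorageClass -> auto | static | extern | register."""
        if self.storage_class is None:
            self.storage_class = keyword
        else:
            self.errors.append(
                f"multiple storage classes: {self.storage_class}, {keyword}")

    def on_identifier(self, name):
        """Action for DirectDeclarator -> Identifier."""
        self.symbol_table[name] = {"storage_class": self.storage_class}

    def on_declaration(self):
        """Action for Declaration -> SpecifierList InitDeclaratorList ;"""
        self.storage_class = None     # reset for the next declaration


if __name__ == "__main__":
    actions = DeclarationActions()
    # Reduction order for: static int x, y;
    actions.on_storage_class("static")
    actions.on_identifier("x")
    actions.on_identifier("y")
    actions.on_declaration()
    print(actions.symbol_table, actions.errors)
```

Because the specifier list is reduced before any declarator, every identifier in the list sees the same storage class, and the reset keeps that value from leaking into the next declaration.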

Following these lines, the compiler writer can arrange to record all of the attributes for each variable. In the reduction to Identifier, this information can be written into the symbol table. Care must be taken to initialize and reset the attributes in the appropriate place; for example, the attribute set by the Pointer reduction should be reset for each InitDeclarator. The action routines can check for valid and invalid TypeSpecifier combinations, such as signed char (legal) and double char (illegal).

    DeclarationList    → DeclarationList Declaration
                       | Declaration
    Declaration        → SpecifierList InitDeclaratorList ;
    SpecifierList      → Specifier SpecifierList
                       | Specifier
    Specifier          → StorageClass
                       | TypeSpecifier
    StorageClass       → auto | static | extern | register
    TypeSpecifier      → char | short | int | long | unsigned | float | double
    InitDeclaratorList → InitDeclaratorList , InitDeclarator
                       | InitDeclarator
    InitDeclarator     → Declarator = Initializer
                       | Declarator
    Declarator         → Pointer DirectDeclarator
                       | DirectDeclarator
    Pointer            → *
                       | * Pointer
    DirectDeclarator   → Identifier
                       | ( Declarator )
                       | DirectDeclarator ( )
                       | DirectDeclarator ( ParameterTypeList )
                       | DirectDeclarator ( IdentifierList )
                       | DirectDeclarator [ ]
                       | DirectDeclarator [ ConstantExpr ]

When the parser finishes building the DeclarationList, it has symbol table entries for each variable declared in the current scope. At that point, the compiler may need to perform some housekeeping chores, such as assigning storage locations to declared variables. This can be done in an action for the production that leads to the DeclarationList. (That production has DeclarationList on its right-hand side, but not on its left-hand side.)

Building a Parse Tree

Another major task that parsers often perform is building an intermediate representation for use by the rest of the compiler. Building a parse tree through syntax-directed actions is easy. Each time the parser reduces by a production A → βγδ, it should construct a node for A and make the nodes for β, γ, and δ children of A, in order. To accomplish this, it can push pointers to the appropriate nodes onto the stack.

In this scheme, each reduction creates a node to represent the non-terminal on its left-hand side. The node has a child for each grammar symbol on the right-hand side of the production. The syntax-directed action creates the new node, uses the pointers stored on the stack to connect the node to its children, and pushes the new node’s address as its result.
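One way to picture these actions is as a family of functions, one per production, that take the values of $1 through $n and return $$. The sketch below assumes that shifts leave leaf nodes on the stack for terminals; the names Node and make_node_action are illustrative only.

```python
class Node:
    """A parse-tree node labeled with a grammar symbol."""

    def __init__(self, symbol, children=()):
        self.symbol = symbol
        self.children = list(children)


def make_node_action(lhs):
    """Build the action for any production whose left-hand side is `lhs`.

    The action receives the values of $1..$n (child nodes taken from the
    stack) and returns the new node, which the parser pushes as $$.
    """
    def action(*rhs_values):
        return Node(lhs, rhs_values)
    return action


if __name__ == "__main__":
    leaf = Node("Id")                           # pushed when Id is shifted
    factor = make_node_action("Factor")(leaf)   # Factor -> Id
    term = make_node_action("Term")(factor)     # Term -> Factor
    print(term.symbol, [child.symbol for child in term.children])
```

Every production gets the same kind of action here, which is why a full parse tree contains a node for every reduction, including chains such as Term → Factor.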

As an example, consider parsing x − 2 × y with the classic expression grammar. The parse produces the following tree, shown here with each node's children indented beneath it:

    Goal
      Expr
        Expr
          Term
            Factor
              Id
        −
        Term
          Term
            Factor
              Num
          ×
          Factor
            Id

Of course, the parse tree is large relative to its information content. The compiler writer might, instead, opt for an abstract syntax tree that retains the essential elements of the parse tree, but gets rid of internal nodes that add nothing to our understanding of the underlying code (see §6.3.2).

To build an abstract syntax tree, the parser follows the same general scheme as for a parse tree. However, it only builds the desired nodes. For a production like A → B, the action returns as its result the pointer that corresponds to B. This eliminates many of the interior nodes. To further simplify the tree, the compiler writer can build a single node for a construct such as the if-then-else, rather than individual nodes for the if, the then, and the else.

An AST for x − 2 × y is much smaller than the corresponding parse tree:

    −
      Id
      ×
        Num
        Id
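The difference between the two schemes shows up in the actions themselves. In the sketch below, unit productions pass their child through unchanged and only binary operators create nodes. The tuple encoding of nodes and the function names are assumptions for the example, and the calls simulate the reductions for x − 2 × y by hand.

```python
def leaf(kind, lexeme):
    """Leaf node for a terminal such as Id or Num."""
    return (kind, lexeme)


def pass_through(child):
    """Action for a production like A -> B: $$ = $1, no new node."""
    return child


def binary(left, op, right):
    """Action for a production like Expr -> Expr − Term: one operator node."""
    return (op, left, right)


def size(tree):
    """Count nodes; a leaf is a (kind, lexeme) pair whose second field is a string."""
    if isinstance(tree[1], str):
        return 1
    return 1 + sum(size(child) for child in tree[1:])


if __name__ == "__main__":
    # Factor -> Id, Term -> Factor, Expr -> Term
    x = pass_through(pass_through(pass_through(leaf("Id", "x"))))
    # Factor -> Num, Term -> Factor
    two = pass_through(pass_through(leaf("Num", "2")))
    # Factor -> Id
    y = pass_through(leaf("Id", "y"))
    ast = binary(x, "−", binary(two, "×", y))
    print(ast, size(ast))
```

The resulting tree has one node per operator and operand, while the parse tree for the same input also carries a node for every unit reduction.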

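            Left Recursion                        Right Recursion

    Grammar            Actions             Grammar            Actions
    List → List Elt    $$ ← L($1,$2);      List → Elt List    $$ ← L($1,$2);
         | Elt         $$ ← $1;                 | Elt         $$ ← $1;

    [Tree diagrams for the input Elt1 Elt2 Elt3 Elt4 Elt5: the left-recursive scheme yields a tree that leans to the left, with Elt1 and Elt2 at the deepest level; the right-recursive scheme yields a tree that leans to the right, with Elt4 and Elt5 at the deepest level.]

Figure 4.8: Recursion versus Associativity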

Changing Associativity

As we saw in Section 3.6.3, associativity can make a difference in numerical computation. Similarly, it can change the way that data structures are built. We can use syntax-directed actions to build representations that reflect a different associativity than the grammar would naturally produce. In general, left-recursive grammars naturally produce left associativity, while right-recursive grammars naturally produce right associativity. To see this, consider the left-recursive and right-recursive list grammars, augmented with syntax-directed actions to build lists, shown at the top of Figure 4.8. Assume that L(x,y) is a constructor that returns a new node with x and y as its children. The lower part of the figure shows the result of applying the two translation schemes to an input consisting of five Elts.
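The effect of the two schemes can be reproduced without a parser by replaying the reductions in the order a shift-reduce parser would perform them. In this sketch, L builds a pair, and build_left and build_right are hypothetical helpers that mimic the left-recursive and right-recursive actions, respectively.

```python
def L(x, y):
    """Constructor for a node with children x and y, modeled as a pair."""
    return (x, y)


def build_left(elts):
    """List -> List Elt | Elt: reductions proceed left to right."""
    result = elts[0]                  # List -> Elt       : $$ = $1
    for elt in elts[1:]:
        result = L(result, elt)       # List -> List Elt  : $$ = L($1, $2)
    return result


def build_right(elts):
    """List -> Elt List | Elt: the rightmost List is reduced first."""
    result = elts[-1]                 # List -> Elt       : $$ = $1
    for elt in reversed(elts[:-1]):
        result = L(elt, result)       # List -> Elt List  : $$ = L($1, $2)
    return result


if __name__ == "__main__":
    elts = ["Elt1", "Elt2", "Elt3", "Elt4", "Elt5"]
    print(build_left(elts))
    print(build_right(elts))
```

Both results list the elements in the same left-to-right order; only the nesting differs.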

The two trees are, in many ways, equivalent. An in-order traversal of both trees visits the leaf nodes in the same order. However, the tree produced from the left-recursive version is strangely counter-intuitive. If we add parentheses to reflect the tree structure, the left-recursive tree is ((((Elt1,Elt2),Elt3),Elt4),Elt5) while the right-recursive tree is (Elt1,(Elt2,(Elt3,(Elt4,Elt5)))). The ordering produced by left recursion corresponds to the classic left-to-right ordering for al-
