

7.1.1 The Parser Interface

A parser accepts a sequence of basic symbols, recognizes the extant syntactic structure, and outputs that structure along with the identity of the relevant symbols. If the syntactic structure is not error-free, the parser invokes the error handler to report errors and to aid in recovery so that processing can continue. (The details of the recovery mechanism will be discussed in Section 12.2.2.) Figure 7.1 shows the information flow involved in the parsing process.

[Figure: the lexical analyzer supplies tokens to the parser; the parser passes connection points to the semantic analyzer and error reports to the error handler; the error handler supplies synthesized tokens.]

Figure 7.1: Parser Information Flow

Three possible interface specifications are suggested by Figure 7.1, depending upon the overall organization of the compiler. The most common is for the parser module to provide the operation parse_program. It invokes the lexical analyzer's next_symbol operation for each basic symbol, and reports each connection point by invoking an appropriate operation of some other module. (We term this invocation a parser action.) Control of the entire transduction process resides within the parser in this design. By moving the control out of the parser module, we obtain the two alternative designs: The parser module provides either an operation parse_symbol that is invoked with a token as an argument, or an operation next_connection that is invoked to obtain a connection point specification.
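The three interface designs differ only in which module holds control. The following Python sketch illustrates this for a deliberately trivial "language" in which every symbol before an end marker is reported as a symbol connection. The tuple representation of a connection point, the "EOF" end marker, and the class names PushParser and PullParser are assumptions made for illustration; only the operation names come from the text.

```python
from typing import Callable, Iterator, Optional, Tuple

# Hypothetical representation of a connection point: (kind, payload).
Connection = Tuple[str, str]


def parse_program(next_symbol: Callable[[], str],
                  action: Callable[[Connection], None]) -> None:
    """Design 1: the parser holds control and calls both neighbours."""
    while True:
        sym = next_symbol()          # pull a basic symbol from the lexer
        if sym == "EOF":
            break
        action(("symbol", sym))      # parser action at a connection point


class PushParser:
    """Design 2: the caller holds control and pushes one token at a time."""

    def __init__(self, action: Callable[[Connection], None]) -> None:
        self.action = action
        self.done = False

    def parse_symbol(self, sym: str) -> None:
        if sym == "EOF":
            self.done = True
        else:
            self.action(("symbol", sym))


class PullParser:
    """Design 3: the consumer holds control and asks for connection points."""

    def __init__(self, next_symbol: Callable[[], str]) -> None:
        self._points = self._run(next_symbol)

    def _run(self, next_symbol: Callable[[], str]) -> Iterator[Connection]:
        while (sym := next_symbol()) != "EOF":
            yield ("symbol", sym)

    def next_connection(self) -> Optional[Connection]:
        # None signals that the input is exhausted.
        return next(self._points, None)
```

All three variants report the same sequence of connection points for the same input; a real parser would of course keep parsing state between calls in the second and third designs.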

It is also possible to divide the parsing over more than one pass. Properties of the language and demands of the parsing algorithm can lead to a situation where we need to know the semantics of certain symbols before we can parse the context of the definitions of these symbols. ALGOL 68, for example, permits constructs whose syntactic structure can be recognized by deterministic left-to-right analysis only if the complete set of type identifiers is known beforehand. When the parsing is carried out in several passes, the sequence of symbols produced by the lexical analyzer will be augmented by other information collected by parser actions during previous passes. The details depend upon the source language.

We have already considered the interface between the parser and the lexical analyzer, and the representation of symbols. The parser looks ahead some number of symbols in order to control the parsing. As soon as it has accepted one of the lookahead symbols as a component of the sentence being analyzed, it reads the next symbol to maintain the supply of lookahead symbols. Through the use of LL or LR techniques, we can be certain that the program is syntactically correct up to and including the accepted symbol. The parser thus need not retain accepted symbols. If the code for these symbols, or their values, must be passed on to other compiler modules via parser actions, these actions must be connected directly to the acceptance of the symbol. We shall term connection points serving this purpose symbol connections.

We can distinguish a second class of connection point, the structure connection. It is used to connect parser actions to the attainment of certain sets of situations (in the sense of Section 5.3.2) and permits us to trace the phrases recognized by the parser in the source program. Note carefully that symbol and structure connections provide the only information that a compiler extracts from the input text.

In order to produce the parse tree as an explicit data structure, it suffices to provide one structure connection at each reduction of a simple phrase and one symbol connection at acceptance of each symbol having a symbol value; at the structure connections we must know which production was applied. We can fix the connection points for this process mechanically from the grammar. This process has proved useful, particularly with bottom-up parsing.
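As an illustration of how such mechanically placed connection points could yield an explicit tree, consider the following Python sketch. It assumes a bottom-up parser that reports events in the order in which they occur; the event encoding, the ARITY table, and the small grammar (E -> E+T, E -> T, T -> id) are hypothetical and chosen only for this example. Each symbol connection pushes a leaf, and each structure connection pops as many subtrees as the applied production has value-carrying components.

```python
from typing import Dict, Iterable, List, Tuple, Union

Tree = Union[str, Tuple[str, list]]
Event = Tuple[str, str]  # ("symbol", value) or ("structure", production)

# Hypothetical table: number of subtrees each production contributes.
# The terminal '+' carries no symbol value, so E->E+T has two children.
ARITY: Dict[str, int] = {"E->E+T": 2, "E->T": 1, "T->id": 1}


def build_tree(events: Iterable[Event]) -> Tree:
    """Build a parse tree from symbol and structure connections."""
    stack: List[Tree] = []
    for kind, payload in events:
        if kind == "symbol":
            # Symbol connection: the accepted symbol's value becomes a leaf.
            stack.append(payload)
        else:
            # Structure connection: the production tells us how many
            # subtrees on top of the stack belong to the new node.
            first_child = len(stack) - ARITY[payload]
            children = stack[first_child:]
            del stack[first_child:]
            stack.append((payload, children))
    if len(stack) != 1:
        raise ValueError("event sequence does not describe a single tree")
    return stack[0]
```

For the input a + b, a bottom-up parser for this grammar would report the leaf a, the reductions T -> id and E -> T, the leaf b, the reduction T -> id, and finally E -> E+T, from which build_tree assembles the complete tree.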

Parser actions that enter declarations into tables or generate code directly cannot be fixed mechanically, but must be introduced by the programmer. Moreover, we often know which production is to be applied well before the reduction actually takes place, and we can make good use of this knowledge. In these cases we must explicitly mark the connection points and parser actions in the grammar from which the parser is produced. We add the symbol encoding (code and value) taken from the lexical analyzer as a parameter to the symbol connections, whereas parser actions at structure connections extract all of their information from the state of the parser.

Expression ::= Term ('+' Term %Addop)* .
Term ::= Factor ('*' Factor %Mulop)* .
Factor ::= 'Identifier' &Ident | '(' Expression ')' .

a) A grammar for expressions

Addop: Output "+"
Mulop: Output "*"
Ident: Output the identifier returned by the lexical analyzer

b) Parser actions to produce postfix

Figure 7.2: Connection Points

Figure 7.2 illustrates a grammar with connection points. The character % marks structure connections, the character & symbol connections. Following these characters, the parser action at that point is specified. Definitions of the parser actions are given in Figure 7.2b. The result of these specifications is a translation of arithmetic expressions from infix to postfix form.
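One way to see the effect of these connection points is to realize them in a hand-written recursive-descent parser. The Python sketch below is such an illustration, not the generated parser discussed in this chapter. It assumes that each parenthesized group of the grammar may be repeated zero or more times, and the names tokenize, Parser, and to_postfix are hypothetical. The methods addop, mulop, and ident correspond to the parser actions of Figure 7.2b; note that only ident receives a parameter, the symbol value.

```python
import re
from typing import List, Optional, Tuple

Token = Tuple[str, Optional[str]]  # (code, value)


def tokenize(text: str) -> List[Token]:
    """Split text into identifiers and single-character symbols."""
    tokens: List[Token] = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|\S", text):
        if lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append(("Identifier", lexeme))
        else:
            tokens.append((lexeme, None))
    return tokens


class Parser:
    def __init__(self, tokens: List[Token], output: List[str]) -> None:
        self.tokens = tokens + [("EOF", None)]
        self.pos = 0
        self.out = output

    def peek(self) -> str:
        return self.tokens[self.pos][0]

    def accept(self, code: str) -> Optional[str]:
        found, value = self.tokens[self.pos]
        if found != code:
            raise SyntaxError(f"expected {code!r}, found {found!r}")
        self.pos += 1
        return value

    # Parser actions (Figure 7.2b).
    def addop(self) -> None:
        self.out.append("+")

    def mulop(self) -> None:
        self.out.append("*")

    def ident(self, value: str) -> None:
        self.out.append(value)

    # One procedure per nonterminal (Figure 7.2a).
    def expression(self) -> None:
        self.term()
        while self.peek() == "+":
            self.accept("+")
            self.term()
            self.addop()                              # %Addop

    def term(self) -> None:
        self.factor()
        while self.peek() == "*":
            self.accept("*")
            self.factor()
            self.mulop()                              # %Mulop

    def factor(self) -> None:
        if self.peek() == "Identifier":
            self.ident(self.accept("Identifier"))     # &Ident
        else:
            self.accept("(")
            self.expression()
            self.accept(")")


def to_postfix(text: str) -> str:
    out: List[str] = []
    parser = Parser(tokenize(text), out)
    parser.expression()
    parser.accept("EOF")
    return " ".join(out)
```

With these definitions, to_postfix("a + b * c") yields "a b c * +", and to_postfix("(a + b) * c") yields "a b + c *". Because each structure connection follows the second operand, the operator is emitted only after both operands, which is exactly the postfix order.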

The processes for parser generation to be described in Sections 7.2 and 7.3 can interpret symbol and structure connections introduced explicitly into the grammar as additional nonterminals generating the null string. Thus the connection points do not require special treatment; only the generated parsing algorithm must distinguish them from symbols of the grammar. In addition, none of the transformations used during the generation process alters the invocation sequence of the associated parser actions.
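The interpretation of connection points as nonterminals generating the null string can be sketched as a simple grammar transformation. The Python fragment below assumes a hypothetical encoding in which a grammar is a list of (left-hand side, right-hand side) pairs and a connection point appears in a right-hand side as a string beginning with % or &; the function name lower_connections and the bracketed names of the new nonterminals are likewise assumptions.

```python
from typing import Dict, List, Tuple

Production = Tuple[str, List[str]]


def lower_connections(
    productions: List[Production],
) -> Tuple[List[Production], Dict[str, str]]:
    """Replace each connection point by a nonterminal deriving the null string.

    Returns the transformed productions and a map from each new
    nonterminal to the name of the parser action it stands for.
    """
    result: List[Production] = []
    actions: Dict[str, str] = {}
    for lhs, rhs in productions:
        new_rhs: List[str] = []
        for sym in rhs:
            if len(sym) > 1 and sym[0] in "%&":
                marker = "<" + sym + ">"      # fresh nonterminal name
                actions[marker] = sym[1:]     # remember the parser action
                new_rhs.append(marker)
            else:
                new_rhs.append(sym)
        result.append((lhs, new_rhs))
    # Each marker nonterminal gets a single production with an empty
    # right-hand side, so the language generated is unchanged.
    for marker in actions:
        result.append((marker, []))
    return result, actions
```

The generated parser then treats a reduction by one of the empty productions as the signal to invoke the recorded parser action, which is the only point at which the markers must be distinguished from ordinary symbols.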

The introduction of connection points can alter the properties of the grammar. For example, the grammar whose productions are $\{Z \to S,\ S \to abc,\ S \to abd\}$ is LR(0). The modified grammar $\{Z \to S,\ S \to a\,\&A\,bc,\ S \to a\,\&B\,bd\}$ no longer possesses this property: After reading $a$ it is not yet clear which of the parser actions should be carried out.
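This loss of the LR(0) property can be checked mechanically. The following Python sketch builds the LR(0) item sets of a grammar and reports whether any of them contains a conflict. It is a simplified illustration under stated assumptions, not the construction of Section 7.3: the function name is_lr0 and the list-of-pairs grammar encoding are hypothetical, and the connection points &A and &B are modeled as nonterminals A and B with empty right-hand sides.

```python
from typing import FrozenSet, Iterator, List, Sequence, Set, Tuple

Production = Tuple[str, Sequence[str]]
Item = Tuple[int, int]  # (production index, position of the dot)


def is_lr0(productions: List[Production], start: str) -> bool:
    """Return True if no LR(0) item set contains a conflict."""
    nonterminals = {lhs for lhs, _ in productions}

    def closure(items: Set[Item]) -> FrozenSet[Item]:
        result = set(items)
        work = list(result)
        while work:
            p, dot = work.pop()
            rhs = productions[p][1]
            if dot < len(rhs) and rhs[dot] in nonterminals:
                for q, (lhs, _) in enumerate(productions):
                    if lhs == rhs[dot] and (q, 0) not in result:
                        result.add((q, 0))
                        work.append((q, 0))
        return frozenset(result)

    def successors(state: FrozenSet[Item]) -> Iterator[FrozenSet[Item]]:
        symbols = {productions[p][1][d] for p, d in state
                   if d < len(productions[p][1])}
        for sym in symbols:
            yield closure({(p, d + 1) for p, d in state
                           if d < len(productions[p][1])
                           and productions[p][1][d] == sym})

    def adequate(state: FrozenSet[Item]) -> bool:
        # Complete items call for a reduction; items with the dot in
        # front of a terminal call for a shift.
        complete = [(p, d) for p, d in state if d == len(productions[p][1])]
        shifts = [(p, d) for p, d in state
                  if d < len(productions[p][1])
                  and productions[p][1][d] not in nonterminals]
        return not complete or (len(complete) == 1 and not shifts)

    first = closure({(q, 0) for q, (lhs, _) in enumerate(productions)
                     if lhs == start})
    seen = {first}
    work = [first]
    while work:
        state = work.pop()
        if not adequate(state):
            return False
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return True


ORIGINAL = [("Z", ["S"]), ("S", ["a", "b", "c"]), ("S", ["a", "b", "d"])]
MODIFIED = [("Z", ["S"]), ("S", ["a", "A", "b", "c"]),
            ("S", ["a", "B", "b", "d"]), ("A", []), ("B", [])]
```

Here is_lr0(ORIGINAL, "Z") returns True, whereas is_lr0(MODIFIED, "Z") returns False: the item set reached after reading a contains the two complete items for A and B, so the parser cannot decide between the two reductions without lookahead.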

If a grammar does not have a desired property before connection points are introduced, then their inclusion will not provide that property. This does not, however, prohibit a parser action from altering the state of the parser and thus simulating some desirable property. For example, one can occasionally distinguish among several possible state transitions through the use of semantic information and in this manner establish an LL property not previously present. More problems are generally created than avoided by such ad hoc measures, however.