• No results found

Non-deterministic automata

Both examples above feature two components: a machine that can make substitutions and record a parse tree, and a control mechanism that decides which moves the machine should make. The machine is relatively simple since its substitutions are res- tricted to those allowed by the grammar, but the control mechanism can be made arbi- trarily complex and may incorporate extensive knowledge of the grammar.

This structure can be discerned in all parsing methods; there always is a substitut- ing and record-keeping machine and a guiding control mechanism (Figure 3.4).

control mechanism

substituting and record-keeping

mechanism

Figure 3.4 Global structure of a parser

The substituting machine is called a non-deterministic automaton or NDA; it is called “non-deterministic” since it often has several possible moves and the particular choice is not predetermined, and an “automaton” since it fits the Webster† definition “an apparatus that automatically performs certain actions by responding to preset controls or encoded instructions”. It manages three items: the input string (actually a copy of it), the partial parse tree and some internal administration. Every move of the NDA transfers some information from the input string through the administration to the par- tial parse tree; each of the three items may be modified in the process:

partial parse trees control input internal administration

The great strength of a NDA, and the main source of its usefulness, is that it can easily be constructed so that it can only make “correct” moves, that is, moves that keep

Sec. 3.4] Non-deterministic automata 69 the system of partially processed input, internal administration and partial parse tree consistent. This has the consequence that we may move the NDA any way we choose: it may move in circles, it may even get stuck, but if it ever gives us an answer, i.e., a finished parse tree, that answer will be correct. It is also essential that the NDA can make all correct moves, so that it can produce all parsings if the control mechanism is clever enough to guide the NDA there. This property of the NDA is also easily arranged.

The inherent correctness of the NDA allows great freedom to the control mechan- ism, the “control” for short. It may be naive or sophisticated, it may be cumbersome or it may be efficient, it may even be wrong, but it can never cause the NDA to produce an incorrect parsing; and that is a comforting thought. (If it is wrong it may, however, cause the NDA to miss a correct parsing, to loop infinitely or to get stuck in a place where it should not).

3.4.1 Constructing the NDA

The NDA derives directly from the grammar. For a top-down parser its moves consist essentially of the production rules of the grammar and the internal administration is ini- tially the start symbol. The control moves the machine until the internal administration is equal to the input string; then a parsing has been found. For a bottom-up parser the moves consist essentially of the reverse of the production rules of the grammar (see 3.3.2) and the internal administration is initially the input string. The control moves the machine until the internal administration is equal to the start symbol; then a parsing has been found. A left-corner parser works like a top-down parser in which a carefully chosen set of production rules has been reversed and which has special moves to undo this reversion when needed.

3.4.2 Constructing the control mechanism

Constructing the control of a parser is quite a different affair. Some controls are independent of the grammar, some consult the grammar regularly, some use large tables precalculated from the grammar and some even use tables calculated from the input string. We shall see examples of each of these: the “hand control” that was demonstrated at the beginning of this section comes in the category “consults the gram- mar regularly”, backtracking parsers often use a grammar-independent control, LL and LR parsers use precalculated grammar-derived tables, the CYK parser uses a table derived from the input string and Earley’s and Tomita’s parsers use several tables derived from the grammar and the input string.

Constructing the control mechanism, including the tables, from the grammar is almost always done by a program. Such a program is called a parser generator; it is fed the grammar and perhaps a description of the terminal symbols and produces a program which is a parser. The parser often consists of a driver and one or more tables, in which case it is called table-driven. The tables can be of considerable size and of extreme complexity.

The tables that derive from the input string must of course be calculated by a rou- tine that is part of the parser. It should be noted that this reflects the traditional setting in which a large number of different input strings is parsed according to a relatively static and unchanging grammar. The inverse situation is not at all unthinkable: many grammars are tried to explain a given input string (for instance, an observed sequence of events).