A naive approach - Recursive descent - Parsing Techniques A Practical Guide pdf

6.6 Recursive descent

6.6.1 A naive approach

As a first approach, we regard a grammar rule as a procedure for recognizing its left-hand side. The rule

S → aB | bA

is regarded as a procedure to recognize an S. This procedure then states something like the following:

S succeeds if
    a succeeds and then B succeeds
or else
    b succeeds and then A succeeds

This does not differ much from the grammar rule, but it does not look like a piece of Pascal or C either. Like a cookbook recipe that usually does not tell us that we must peel the potatoes, let alone how to do that, the procedure is incomplete.

There are several bits of information that we must maintain when carrying out such a procedure. First, there is the notion of a “current position” in the rule. This current position indicates what must be tried next. When we implement rules as procedures, this current position is maintained automatically, by the program counter, which tells us where we are within a procedure. Next, there is the input sentence itself. When implementing a backtracking parser, we usually keep the input sentence in a global array, with one element for each symbol in the sentence. The array must be global, because it contains information that must be accessible equally easily from all procedures. Then, there is the notion of a current position in the input sentence. When the current position in the rule indicates a terminal symbol, and this symbol corresponds to the symbol at the current position in the input sentence, both current positions will be advanced one position. The current position in the input sentence is also global information. We will therefore maintain this position in a global variable, of a type that is suitable for indexing the array containing the input sentence. Also, when starting a rule we must remember the current position in the input sentence, because we need it for the “or else” clauses. These must all be started at the same position in the input sentence. For instance, starting with the rule for S of grammar 6.1, suppose that the a matches the symbol at the current position of the input sentence. The current position is advanced and then B is tried. For B, we have a rule similar to that of S. Now suppose that B fails. We then have to try the next choice for S, and back up the position in the input sentence to what it was when we started the rule for S. This is backtracking, just as we have seen it earlier.
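The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sentence`, `pos`, `terminal`, and `recognize` are invented here, and the rules A → a and B → b are stand-ins chosen to keep the sketch self-contained; they are not the rules of grammar 6.1.

```python
# Global state, as in the text: the input sentence and the current position in it.
sentence = []   # one element per input symbol
pos = 0         # index of the current position in `sentence`


def terminal(symbol):
    """Match one terminal; advance the input position only on success."""
    global pos
    if pos < len(sentence) and sentence[pos] == symbol:
        pos += 1
        return True
    return False


# Stand-in rules (assumptions for this sketch only): A -> a and B -> b.
def A():
    return terminal("a")


def B():
    return terminal("b")


def S():
    """S -> aB | bA, written as 'succeeds if ... or else ...'."""
    global pos
    start = pos                    # remember where this rule started
    if terminal("a") and B():      # a succeeds and then B succeeds
        return True
    pos = start                    # or else: back up the input position
    if terminal("b") and A():      # b succeeds and then A succeeds
        return True
    pos = start                    # leave the input position untouched on failure
    return False


def recognize(text):
    """Run S on `text` and require that all input is consumed."""
    global sentence, pos
    sentence, pos = list(text), 0
    return S() and pos == len(sentence)


print(recognize("ab"), recognize("ba"), recognize("aa"))  # True True False
```

The position within the rule never appears as a variable: it is implicit in which statement of `S` is executing, which is the role the text assigns to the program counter.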

In general, we have a grammar that has more than one non-terminal, so there will be more than one rule. When we arrive at a non-terminal in a rule, we have to execute the rule for that non-terminal, and, if it succeeds, return to the current invocation and continue there. We achieve this automatically by using the procedure-call mechanism of the implementation language.

Another detail that we have not covered yet is that we have to remember the grammar rules that we use. If we do not remember them, we will not know afterwards how the sentence was derived. Therefore we note them in a separate list, striking them out when they fail. Each procedure must keep its own copy of the index in this list, again because we need it for the “or else” clauses: if a choice fails, all choices that have been made after the choice now failing must be discarded. In the end, when the rule for S' succeeds, the grammar rules left in this list represent a left-most derivation of the sentence.
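A possible way to keep such a list is sketched below. The helper names (`alternative`, `t`, `parse_sentence`) are invented for this illustration. The grammar is the one used in the worked example of this section, as far as its rules can be read off the text; the second alternative of B (B → bBc) is an assumption.

```python
sentence, pos, parse = [], 0, []   # global input, input position, list of rules used


def terminal(symbol):
    """Match one terminal; advance the input position only on success."""
    global pos
    if pos < len(sentence) and sentence[pos] == symbol:
        pos += 1
        return True
    return False


def t(symbol):
    """Wrap a terminal so it can be called like a non-terminal procedure."""
    return lambda: terminal(symbol)


def alternative(rule, *members):
    """Try one right-hand side; on failure, undo everything it did."""
    global pos
    start, mark = pos, len(parse)   # this invocation's own copies of both indexes
    parse.append(rule)              # note the rule we are about to use
    if all(member() for member in members):   # stops at the first failing member
        return True
    pos = start                     # back up in the input sentence
    del parse[mark:]                # strike out this rule and all rules noted after it
    return False


def S(): return alternative("S -> DC", D, C) or alternative("S -> AB", A, B)
def D(): return alternative("D -> ab", t("a"), t("b")) or alternative("D -> aDb", t("a"), D, t("b"))
def C(): return alternative("C -> c", t("c")) or alternative("C -> cC", t("c"), C)
def A(): return alternative("A -> a", t("a")) or alternative("A -> aA", t("a"), A)
# B -> bBc is an assumption; only B -> bc can be inferred from the text.
def B(): return alternative("B -> bc", t("b"), t("c")) or alternative("B -> bBc", t("b"), B, t("c"))
def S_prime(): return alternative("S' -> S#", S, t("#"))


def parse_sentence(text):
    """Return the list of rules used, or None if the naive parser fails."""
    global sentence, pos
    sentence, pos = list(text) + ["#"], 0
    parse.clear()
    return list(parse) if S_prime() else None


print(parse_sentence("abbcc"))
# ["S' -> S#", 'S -> AB', 'A -> a', 'B -> bBc', 'B -> bc']
```

For the input abbcc, the rules S → DC and D → ab are noted first and then struck out together when C fails, which is the "discard all choices made after the failing choice" behaviour described above.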

Now, let us see how a parser, as described above, works for an example. Let us consider again the grammar of Figure 6.6, and input sentence abc. As before, we add a rule S' → S# to the grammar and a # to the end of the sentence, so our parser starts in the following state:

Active rules          Sentence   Parse

1: S' → •S#           •abc#      1: S' → S#

Our administration is divided into three parts; the “Active rules” part indicates the active rules, with a dot (•) indicating the current position within that rule. The bottom rule in this part is the rule that we are currently working on. The “Sentence” part indicates the sentence, including a position marker indicating the current position in the sentence. The “Parse” part will be used to remember the rules that we use (not only the currently active ones). The entries in this part are numbered, and each entry in the “Active rules” part also contains its index in the “Parse” part. As we shall see later, this is needed to back up after having taken a wrong choice.
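One way to picture an entry of this administration is as a small record. The sketch below is only an illustration: the class `ActiveRule`, its field names, and the two formatting functions are assumptions made here, and every grammar symbol is taken to be a single character.

```python
from dataclasses import dataclass

DOT = "•"


@dataclass
class ActiveRule:
    parse_index: int     # index of this rule's entry in the "Parse" part
    lhs: str             # left-hand side, for example "S'"
    alternatives: list   # right-hand sides as strings, for example ["DC", "AB"]
    choice: int          # which alternative is currently being tried
    dot: int             # current position within that alternative
    input_pos: int       # current position in the sentence for this entry


def show_rule(entry):
    """Format an entry as in the 'Active rules' column, with the dot in place."""
    parts = []
    for i, rhs in enumerate(entry.alternatives):
        if i == entry.choice:
            rhs = rhs[:entry.dot] + DOT + rhs[entry.dot:]
        parts.append(rhs)
    return f"{entry.parse_index}: {entry.lhs} → {' | '.join(parts)}"


def show_sentence(sentence, input_pos):
    """Format the sentence with its position marker, as in the 'Sentence' column."""
    return sentence[:input_pos] + DOT + sentence[input_pos:]


entry = ActiveRule(parse_index=2, lhs="S", alternatives=["DC", "AB"],
                   choice=0, dot=1, input_pos=0)
print(show_rule(entry), "  ", show_sentence("abc#", entry.input_pos))
# 2: S → D•C | AB    •abc#
```

In a recursive-descent implementation none of these fields is stored explicitly in such a record; the list of active rules corresponds to the call stack, and `choice` and `dot` to the point of execution within each procedure.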

There is only one possibility here: the current position in the procedure indicates that we must invoke the procedure for S, so let us do so:

Active rules          Sentence   Parse

1: S' → S•#           •abc#      1: S' → S#
2: S → •DC | AB       •abc#      2: S → DC

Notice that we have advanced the position in the S' rule. It now indicates where we have to continue when we are finished with S (the return address). Now we try the first alternative for S. There is a choice here, so the current position in the input sentence is saved. We have not made this explicit in the pictures, because this position is already present in the “Sentence” part of the entry that invoked S.

Active rules          Sentence   Parse

1: S' → S•#           •abc#      1: S' → S#
2: S → D•C | AB       •abc#      2: S → DC
3: D → •ab | aDb      •abc#      3: D → ab

Now the first choice for D is tried. The a succeeds, and then the b also succeeds, so we get:

Active rules          Sentence   Parse

1: S' → S•#           •abc#      1: S' → S#
2: S → D•C | AB       •abc#      2: S → DC
3: D → ab• | aDb      ab•c#      3: D → ab

Now, we are at the end of a choice for D. This means that it succeeds, and we remove this entry from the list of active rules, after updating the current positions in the entry above. Next, it is C's turn:

Active rules          Sentence   Parse

1: S' → S•#           •abc#      1: S' → S#
2: S → DC• | AB       ab•c#      2: S → DC
4: C → •c | cC        ab•c#      3: D → ab
                                 4: C → c

Now, the c succeeds, so the C succeeds, and then the S also succeeds.

Active rules          Sentence   Parse

1: S' → S•#           abc•#      1: S' → S#
                                 2: S → DC
                                 3: D → ab
                                 4: C → c

Now, the # also succeeds, and thus S' succeeds, resulting in:

Active rules          Sentence   Parse

1: S' → S#•           abc#•      1: S' → S#
                                 2: S → DC
                                 3: D → ab
                                 4: C → c

The “Parse” part now represents a left-most derivation of the sentence:

S' → S# → DC# → abC# → abc#.
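That the list of rules really encodes a left-most derivation can be checked by replaying it: at each step, the next rule in the list is applied to the leftmost non-terminal of the current sentential form. The function below is a small sketch of that replay; its name and the representation of rules as (left-hand side, right-hand side) pairs are assumptions made for this illustration.

```python
def leftmost_derivation(start, rules, nonterminals):
    """Replay `rules` in order, always rewriting the leftmost non-terminal.

    `rules` is a list of (lhs, rhs) pairs, where rhs is a list of symbols.
    Returns the successive sentential forms as strings.
    """
    form = [start]
    steps = ["".join(form)]
    for lhs, rhs in rules:
        # Locate the leftmost non-terminal in the current sentential form.
        index = next((i for i, sym in enumerate(form) if sym in nonterminals), None)
        if index is None or form[index] != lhs:
            raise ValueError(f"rule for {lhs} does not fit the leftmost non-terminal")
        form[index:index + 1] = rhs      # replace it by the right-hand side
        steps.append("".join(form))
    return steps


NONTERMINALS = {"S'", "S", "A", "B", "C", "D"}

# The "Parse" part of the final state of the example.
parse = [("S'", ["S", "#"]), ("S", ["D", "C"]), ("D", ["a", "b"]), ("C", ["c"])]

print(" -> ".join(leftmost_derivation("S'", parse, NONTERMINALS)))
# S' -> S# -> DC# -> abC# -> abc#
```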

This method is called recursive descent. Descent, because it operates top-down, and recursive, because each non-terminal is implemented as a procedure that can directly or indirectly (through other procedures) invoke itself. It should be stressed that “recursive descent” is merely an implementation issue, albeit an important one. It should also be stressed that the parser described above is a backtracking parser, independent of the implementation method used. Backtracking is a property of the parser, not of the implementation.

The backtracking method developed above is aesthetically pleasing, because we in fact use the grammar itself as a program (or we transform the grammar rules into procedures, which can be done mechanically). There is only one problem: the recursive descent method, as described above, does not always work! We already know that it does not work for left-recursive grammars, but the problem is worse than that. For instance, aabc and abcc are sentences that are not recognized, but should be. Parsing of the aabc sentence gets stuck after the first a, and parsing of the abcc sentence gets stuck after the first c. Yet, aabc can be derived as follows:

S → AB → aAB → aaB → aabc,

and abcc can be derived with

S → DC → abC → abcC → abcc.

So, let us examine why our method fails. A little investigation shows that we never try the A → aA choice when parsing aabc, because the A → a choice succeeds. Such a problem arises whenever more than one right-hand side can succeed, and this is the case whenever a right-hand side can derive a prefix of a string derivable from another right-hand side of the same non-terminal. The method developed so far is too optimistic, in that it assumes that if a choice succeeds, it must be the right choice. It does not allow us to backtrack over such a choice when it was the wrong one. This is a particularly serious problem if the grammar has ε-rules, because ε-rules always succeed. Another consequence of being unable to back up over a succeeding choice is that it does not allow us to get all parses when there is more than one (this is possible for ambiguous grammars).

Improvement is certainly needed here. Our criterion for determining whether a choice is the right one clearly is wrong. Looking back at the backtracking parser of the beginning of this section, we see that that parser does not have this problem, because it does not consider choices independently of their context. One can only decide that a choice is the right one if taking it results in a successful parse; even if the choice ultimately succeeds, we have to try the other choices as well if we want all parses. In the next section, we will develop a recursive-descent parser that solves all the problems mentioned above.

Meanwhile, the method above only works for grammars that are prefix-free. A non-terminal A is prefix-free if A →* x and A →* xy, where x and y are strings of terminal symbols, implies that y = ε. A grammar is called prefix-free if all its non-terminals are prefix-free.
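The failure on aabc and abcc can be reproduced with a small table-driven sketch of the naive method. The function names and the dictionary representation are assumptions made for this illustration, and so is the rule B → bBc; the other rules are those that appear in the example and derivations of this section. Every grammar symbol is taken to be a single character.

```python
GRAMMAR = {
    "S": ["DC", "AB"],
    "A": ["a", "aA"],
    "B": ["bc", "bBc"],   # the second alternative is an assumption
    "C": ["c", "cC"],
    "D": ["ab", "aDb"],
}


def naive_parse(symbol, text, pos):
    """Recognize `symbol` at `pos`; return the new position, or None on failure.

    The first alternative that succeeds is accepted and never reconsidered,
    which is exactly the over-optimistic criterion discussed above.
    """
    if symbol not in GRAMMAR:                       # a terminal symbol
        return pos + 1 if text[pos:pos + 1] == symbol else None
    for rhs in GRAMMAR[symbol]:
        cur = pos                                   # every alternative starts at `pos`
        for member in rhs:
            cur = naive_parse(member, text, cur)
            if cur is None:
                break                               # this alternative fails
        else:
            return cur                              # first success wins
    return None


def recognizes(text):
    """Apply S' -> S# to `text`, with the end marker # appended."""
    full = text + "#"
    end = naive_parse("S", full, 0)
    return end is not None and full[end:] == "#"


for sentence in ["abc", "aabbc", "aabc", "abcc"]:
    print(sentence, recognizes(sentence))
# abc True
# aabbc True
# aabc False
# abcc False
```

Neither A nor C is prefix-free in this grammar: A derives both a and aa, and C derives both c and cc. In each failing case the shorter alternative succeeds first and is never revisited.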
