
Grammar for a Pushdown Automaton

In document This page intentionally left blank (Page 152-156)

Solution to Selected Exercises

Algorithm 3.67 Grammar for a Pushdown Automaton

Input: • a pushdown automaton M = (MΣ, MR).

Output: • a grammar G = (GΣ, GR) such that L(G) = εL(M).

Method begin

GΣ = GN ∪ G∆ with G∆ = M∆ and GN = {〈pAq〉| A ∈ MΓ, p, q ∈ MQ} ∪ {GS};

GR = {GS → 〈MsMSq〉| q ∈ MQ};

repeat

if Aq0a → BnBn-1…B1q1 ∈ MR, where A ∈ MΓ, Bi ∈ MΓ, 1 ≤ i ≤ n, a ∈ M∆ ∪ {ε}, q0, q1 ∈ MQ, for some n ≥ 1

then include {〈q0Aqn+1〉 → a〈q1B1q2〉〈q2B2q3〉…〈qn-1Bn-1qn〉〈qnBnqn+1〉| qj ∈ MQ, 2 ≤ j ≤ n + 1} into GR

until no change;

repeat

if Aq0a → q1 ∈ MR, where q0, q1 ∈ MQ, A ∈ MΓ, a ∈ M∆ ∪ {ε} then add 〈q0Aq1〉 → a to GR

until no change;

end.
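The construction above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the book's implementation: the function name pda_to_cfg, the tuple encoding of rules, and the name "GS" for the grammar's start symbol are hypothetical. Nonterminals 〈pAq〉 are encoded as triples (p, A, q), and the start rules GS → 〈MsMSq〉 are included because the proof of Lemma 3.68 and Exercise 3.35 both rely on them. The sample data is the automaton from Exercise 3.35.

```python
from itertools import product

START = "GS"  # hypothetical name for the grammar's start symbol


def pda_to_cfg(states, start_state, start_symbol, pda_rules):
    """Build grammar rules (lhs, rhs) from pushdown-automaton rules.

    Each PDA rule (A, q0, a, pushed, q1) reads "A q0 a -> pushed q1",
    where pushed = (Bn, ..., B1) as written in the text (B1 is the new top)
    and a == "" stands for the empty string.
    """
    grammar = set()
    # Start rules GS -> <s S q> for every state q (assumed from the proof).
    for q in states:
        grammar.add((START, ((start_state, start_symbol, q),)))
    for A, q0, a, pushed, q1 in pda_rules:
        prefix = (a,) if a else ()
        tops_first = tuple(reversed(pushed))  # B1, B2, ..., Bn
        n = len(tops_first)
        # For n == 0, product yields one empty tuple, giving <q0 A q1> -> a.
        for tail in product(states, repeat=n):  # every choice of q2, ..., q(n+1)
            chain = (q1,) + tail
            body = tuple((chain[i], tops_first[i], chain[i + 1]) for i in range(n))
            grammar.add(((q0, A, chain[-1]), prefix + body))
    return grammar


# The automaton of Exercise 3.35: states s and q, start state s, start symbol S.
PDA_RULES = [
    ("S", "s", "a", ("S", "a"), "s"),  # Ssa -> Sas
    ("a", "s", "a", ("a", "a"), "s"),  # asa -> aas
    ("a", "s", "b", (), "q"),          # asb -> q
    ("a", "q", "b", (), "q"),          # aqb -> q
    ("S", "q", "", (), "q"),           # Sq  -> q
]
G = pda_to_cfg(["s", "q"], "s", "S", PDA_RULES)
```

Treating a rule that pushes nothing as the case n = 0 lets one loop cover both repeat loops of the algorithm.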

Lemma 3.68. Let M be a pushdown automaton. With M as its input, Algorithm 3.67 correctly constructs a context-free grammar G such that L(G) = εL(M).

Proof. To establish L(G) = εL(M) this proof first demonstrates the following claim.

Claim. For all w ∈ M∆*, A ∈ MΓ, and q, q′ ∈ MQ, 〈qAq′〉 lm⇒* w in G if and only if Aqw ⇒* q′ in M.

Only if. For all i ≥ 0, 〈qAq′〉 lm⇒i w in G implies Aqw ⇒* q′ in M, where w ∈ M∆*, A ∈ MΓ, and q, q′ ∈ MQ.

Basis. For i = 0, G cannot make 〈qAq′〉 lm⇒i w. Thus, this implication holds vacuously, and the basis is true.

Induction hypothesis. Assume that the implication holds for all j-step derivations, where j = 1, …, i, for some i > 0.

Induction step. Consider 〈qAq′〉 lm⇒* w [pπ] in G, where π consists of i rules, and p ∈ GR. Thus, 〈qAq′〉 lm⇒i+1 w [pπ] is a derivation that has i + 1 steps. From Algorithm 3.67, p has the form p: 〈qAq′〉 → a〈q1B1q2〉〈q2B2q3〉…〈qnBnqn+1〉, where q′ = qn+1. Express 〈qAq′〉 lm⇒* w [pπ] as

〈qAq′〉 lm⇒ a〈q1B1q2〉〈q2B2q3〉…〈qnBnqn+1〉 lm⇒* w

In greater detail, 〈qAq′〉 lm⇒ a〈q1B1q2〉〈q2B2q3〉…〈qnBnqn+1〉 lm⇒* w, where w = aw1w2…wn and 〈qjBjqj+1〉 lm⇒* wj in G, for all j = 1, …, n. As π consists of no more than i rules, the induction hypothesis implies Bjqjwj ⇒* qj+1 in M, so Bn…Bj+1Bjqjwj ⇒* Bn…Bj+1qj+1 in M. As p: 〈qAq′〉 → a〈q1B1q2〉〈q2B2q3〉…〈qnBnqn+1〉 ∈ GR, MR contains r: Aqa → Bn…B1q1. Thus, Aqw ⇒ Bn…B1q1w1w2…wn [r] in M. Consequently, Aqw ⇒ Bn…B1q1w1w2…wn ⇒* Bn…B2q2w2…wn ⇒* … ⇒* Bnqnwn ⇒* qn+1 in M. Because q′ = qn+1, Aqw ⇒* q′ in M, and the inductive step is completed.

If. For i ≥ 0, Aqw ⇒i q′ in M implies 〈qAq′〉 lm⇒* w in G, where w ∈ M∆*, A ∈ MΓ, and q, q′ ∈ MQ.

Basis. For i = 0, Aqw ⇒i q′ in M is ruled out, so this implication holds vacuously, and the basis is true.

Induction hypothesis. Assume that the above implication holds for all j-step derivations, where j = 1, …, i, for some i > 0.

Induction step. Consider Aqw ⇒* q′ [rρ] in M, where ρ represents a rule word consisting of i rules, and r ∈ MR. Thus, Aqw ⇒i+1 q′ [rρ] in M. From Algorithm 3.67, r has the form r: Aqa → Bn…B1q1. Now express Aqw ⇒* q′ [rρ] as

Aqav1v2…vn ⇒ Bn…B1q1v1v2…vn [r] ⇒* Bn…B2q2v2…vn [ρ1] ⇒* … ⇒* Bnqnvn [ρn-1] ⇒* qn+1 [ρn],

where q′ = qn+1, w = av1v2…vn, and ρ = ρ1…ρn-1ρn. As ρj consists of no more than i rules, the induction hypothesis implies 〈qjBjqj+1〉 lm⇒* vj [πj] in G, for all j = 1, …, n. As r: Aqa → Bn…B1q1 ∈ MR and q2, …, qn+1 ∈ MQ, GR contains p: 〈qAqn+1〉 → a〈q1B1q2〉〈q2B2q3〉…〈qnBnqn+1〉 from the first repeat loop of Algorithm 3.67. Consequently, 〈qAqn+1〉 lm⇒ a〈q1B1q2〉〈q2B2q3〉…〈qnBnqn+1〉 [p] lm⇒* av1v2…vn [π], where π = π1π2…πn. As q′ = qn+1 and w = av1v2…vn, G makes this derivation 〈qAq′〉 lm⇒* w. That is, the inductive step is completed.

Consequently, the if part of this claim is true as well, so the claim holds.

Consider the above claim for A = MS and q = Ms. At this point, for all w ∈ M∆*, 〈MsMSq′〉 lm⇒* w in G if and only if MSMsw ⇒* q′ in M. Therefore, GS lm⇒ 〈MsMSq′〉 lm⇒* w if and only if MSMsw ⇒* q′ in M. In other words, L(G) = εL(M). Thus, Lemma 3.68 holds.

■

3.35. Consider Algorithm 3.67 Grammar for a Pushdown Automaton (see the solution to Exercise 3.34). Initially, this algorithm sets

N = {〈sSs〉, 〈qSq〉, 〈sSq〉, 〈qSs〉, 〈sas〉, 〈qaq〉, 〈saq〉, 〈qas〉, S}

and ∆ = {a, b}. Then, Algorithm 3.67 enters its first repeat loop. From Ssa → Sas, this loop produces 〈sSs〉 → a〈sas〉〈sSs〉, 〈sSs〉 → a〈saq〉〈qSs〉, 〈sSq〉 → a〈sas〉〈sSq〉, 〈sSq〉 → a〈saq〉〈qSq〉 and adds these four rules to GR. Analogously, from asa → aas, it constructs 〈sas〉 → a〈sas〉〈sas〉, 〈sas〉 → a〈saq〉〈qas〉, 〈saq〉 → a〈sas〉〈saq〉, 〈saq〉 → a〈saq〉〈qaq〉 and adds these four rules to GR.

The second repeat loop handles the rules of M that push nothing. Based on asb → q, it adds 〈saq〉 → b to GR. From aqb → q, it constructs 〈qaq〉 → b and includes this rule in GR. Finally, from Sq → q, it produces 〈qSq〉 → ε and adds it to GR. Together with the start rules S → 〈sSs〉 and S → 〈sSq〉, GR consists of the following rules

S → 〈sSs〉, S → 〈sSq〉, 〈sSs〉 → a〈sas〉〈sSs〉, 〈sSs〉 → a〈saq〉〈qSs〉, 〈sSq〉 → a〈sas〉〈sSq〉, 〈sSq〉 → a〈saq〉〈qSq〉, 〈sas〉 → a〈sas〉〈sas〉, 〈sas〉 → a〈saq〉〈qas〉, 〈saq〉 → a〈sas〉〈saq〉, 〈saq〉 → a〈saq〉〈qaq〉,

〈saq〉 → b, 〈qaq〉 → b, 〈qSq〉 → ε

For simplicity, use Algorithm 3.30 Useful Symbols to turn this grammar into the following equivalent grammar, which contains only useful symbols

p0: S → 〈sSq〉

p1: 〈sSq〉 → a〈saq〉〈qSq〉

p2: 〈saq〉 → a〈saq〉〈qaq〉

p3: 〈saq〉 → b

p4: 〈qaq〉 → b

p5: 〈qSq〉 → ε

Observe that this grammar generates {aⁿbⁿ| n ≥ 1}.
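The observation about the generated language can be checked mechanically for short strings. The following is a minimal sketch, assuming the six rules p0 through p5 above; the function name generate and the ASCII spellings of the nonterminals (for example "<saq>" for 〈saq〉) are hypothetical. It explores leftmost derivations breadth-first and discards sentential forms that already contain more terminals than the length bound.

```python
from collections import deque

# Rules p0..p5 of the reduced grammar; nonterminals are written in ASCII.
RULES = {
    "S": [("<sSq>",)],                                 # p0
    "<sSq>": [("a", "<saq>", "<qSq>")],                # p1
    "<saq>": [("a", "<saq>", "<qaq>"), ("b",)],        # p2, p3
    "<qaq>": [("b",)],                                 # p4
    "<qSq>": [()],                                     # p5 (epsilon rule)
}


def generate(max_len):
    """Return every terminal string of length <= max_len derivable from S."""
    found = set()
    seen = {("S",)}
    queue = deque(seen)
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, if any.
        idx = next((i for i, x in enumerate(form) if x in RULES), None)
        if idx is None:
            found.add("".join(form))
            continue
        for rhs in RULES[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for x in new if x not in RULES)
            # Terminals are never erased, so longer forms can be pruned.
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


SHORT_STRINGS = generate(6)
```

For the bound 6, the search yields exactly the strings ab, aabb, and aaabbb, which agrees with the claimed language on this finite sample; it is a sanity check, not a proof.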


From now on, compared to the previous chapters, this book becomes less theoretical and more practical. Regarding parsing, while the previous chapter has explained its basic methodology in general, this chapter and the next chapter give a more realistic insight into parsing because they discuss its deterministic methods, which fulfill a central role in practice.

A deterministic top-down parser verifies that the tokenized version of a source program is syntactically correct by constructing its parse tree. Reading the input string that represents the tokenized program from left to right, the parser starts from the root of the tree and proceeds down toward the frontier denoted by the input string. In terms of derivations, it builds up the leftmost derivation of this tokenized program starting from the start symbol. Frequently, this parser is based upon LL grammars, where the first L stands for the left-to-right scan of tokens and the second L stands for the leftmost derivation. By making use of predictive sets constructed for the rules of these grammars, the parser makes a completely deterministic selection of the applied rule in every step of the leftmost derivation.

Based on LL grammars, we concentrate our attention on predictive parsing, which is perhaps the most frequently used deterministic top-down parsing method in practice. More specifically, first, we return to the popular recursive descent method (see Section 3.2), which frees us from explicitly implementing a pushdown list, and create its deterministic version. Then, we use the LL grammars and their predictive sets to make a predictive table used by a deterministic predictive table-driven parser, which explicitly implements a pushdown list. In this parser, any grammatical change only leads to a modification of the table while its control procedure remains unchanged, which is its key pragmatic advantage. We also explain how this parser handles syntax errors and recovers from them.

Synopsis. Section 4.1 introduces and discusses predictive sets and LL grammars. Then, based upon LL grammars, Section 4.2 discusses predictive recursive-descent and table-driven parsing.

4.1 Predictive Sets and LL Grammars

Consider a grammar, G = (GΣ, GR), and a G-based top-down parser working with an input string w (see Section 3.2). Suppose that the parser has already found the beginning of the leftmost derivation for w, S lm⇒* tAv, where t is a prefix of w. More precisely, let w = taz, where a is the current input symbol, which follows t in w, and z is the suffix of w, which follows a. In tAv, A is the leftmost nonterminal to be rewritten in the next step. Assume that there exist several different A-rules, so the parser has to select one of them to continue the parsing process, and if the parser works deterministically, it cannot revise this selection later on. A predictive parser selects the right rule by predicting whether its application gives rise to a leftmost derivation of a string starting with a. To make this prediction, every rule r ∈ GR is accompanied by its predictive set containing all terminals that can begin a string resulting from a derivation whose first step is made by r. If the A-rules have their predictive sets pairwise disjoint, the parser deterministically selects the rule whose predictive set contains a. To construct the predictive sets, we first need the first and follow sets, described next.
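The selection step described above can be illustrated by a small sketch. Everything in it is hypothetical: the function name select_rule, the toy rules A → aAb and A → c, and their hand-written predictive sets {a} and {c} are assumptions chosen for illustration, not taken from the text.

```python
def select_rule(nonterminal, lookahead, predictive_sets):
    """Return the right-hand side of the rule whose predictive set contains lookahead.

    predictive_sets maps a nonterminal to {rhs_tuple: set_of_terminals}.
    Returns None when no rule applies, which a parser would report as a syntax error.
    """
    candidates = [
        rhs
        for rhs, pset in predictive_sets[nonterminal].items()
        if lookahead in pset
    ]
    if len(candidates) > 1:
        # Deterministic selection needs pairwise disjoint predictive sets.
        raise ValueError("predictive sets are not pairwise disjoint")
    return candidates[0] if candidates else None


# Hypothetical toy grammar: A -> aAb with predictive set {a}, A -> c with {c}.
PREDICTIVE = {
    "A": {
        ("a", "A", "b"): {"a"},
        ("c",): {"c"},
    },
}
CHOSEN = select_rule("A", "a", PREDICTIVE)
```

Because the two sets are disjoint, one lookahead symbol is enough to pick the rule, and the parser never has to revise the choice.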

First. The predictive set corresponding to r ∈ GR obviously contains the terminals that occur as the first symbol in a string derived from rhs(r).

Definition 4.1 first. Let G = (GΣ, GR) be a grammar. For every string x ∈ GΣ*,

first(x) = {a| x ⇒* w, where either w ∈ G∆+ with a = symbol(w, 1) or w = ε = a}, where symbol(w, 1) denotes the leftmost symbol of w (see Section 1.1).

In general, first is defined in terms of ⇒. However, as for every w ∈ G∆*, x ⇒* w if and only if x lm⇒* w (see Theorem 3.20), we could equivalently rephrase this definition in terms of the leftmost derivations, which play a crucial role in top-down parsing, as

first(x) = {a| x lm⇒* w, where either w ∈ G∆+ with a = symbol(w, 1) or w = ε = a}

Furthermore, observe that if x ⇒* ε, where x ∈ GΣ*, then ε is in first(x); as a special case, for x = ε, first(ε) = {ε}.

Next, we will construct the first sets for all strings contained in G∆ ∪ {lhs(r)| r ∈ GR} ∪ {y| y ∈ suffixes(rhs(r)) with r ∈ GR}. We make use of some subsets of these first sets later in this section (see Algorithm 4.4 and Definition 4.5).

Goal. Construct first(x) for every x ∈ G∆ ∪ {lhs(r)| r ∈ GR} ∪ {y| y ∈ suffixes(rhs(r)) with r ∈ GR}.

Gist. Initially, set first(a) to {a} for every a ∈ G∆ ∪ {ε} because a lm⇒0 a for these as.

Furthermore, if A → uw ∈ GR with u ⇒* ε, then A ⇒* w, so add the symbols of first(w) to first(A) (notice that u lm⇒* ε if and only if u is a string consisting of ε-nonterminals, determined by Algorithm 3.33). Repeat this extension of all the first sets in this way until no more symbols can be added to any of the first sets.
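The gist above amounts to a fixed-point iteration, which can be sketched as follows. This is the standard textbook formulation rather than the book's own algorithm: the function names first_of_string and first_sets and the toy grammar S → AB, A → aA, A → ε, B → b are assumptions made for illustration. Instead of consulting a separately computed set of ε-nonterminals, the sketch records ε directly in the first sets.

```python
EPS = "ε"


def first_of_string(symbols, first):
    """Compute first(x) for a sequence of symbols from the first sets of its symbols."""
    result = set()
    for sym in symbols:
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result  # sym cannot vanish, so later symbols do not contribute
    result.add(EPS)  # every symbol can derive ε, or the sequence is empty
    return result


def first_sets(terminals, rules):
    """rules is a list of (lhs, rhs) pairs; rhs is a tuple, and () is an ε-rule."""
    first = {a: {a} for a in terminals}
    for lhs, rhs in rules:
        for sym in (lhs, *rhs):
            first.setdefault(sym, set())
    changed = True
    while changed:  # repeat until no first set grows any more
        changed = False
        for lhs, rhs in rules:
            new = first_of_string(rhs, first)
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


# Hypothetical toy grammar: S -> AB, A -> aA, A -> ε, B -> b.
TOY_RULES = [
    ("S", ("A", "B")),
    ("A", ("a", "A")),
    ("A", ()),
    ("B", ("b",)),
]
FIRST = first_sets({"a", "b"}, TOY_RULES)
```

In the toy grammar, A can derive ε, so first(S) picks up b from first(B) in addition to a; first sets of rule suffixes can then be read off with first_of_string.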
