Automata for Deterministic Context-Free Iterations

Now we define semantic automata for type ⟨1,1,2⟩ deterministic context-free iterations. Recall that for a (D)PDA we use the notation ⟨q, x, α, β, q′⟩ for δ(q, x, α) = (q′, β). Similarly, if δ is the transition function of a DFA, we will write ⟨q, x, q′⟩ if δ(q, x) = q′.

Definition 6.2.1. Let Q1 = (Q1, Σ1, Γ1, δ1, s1, F1) be any DPDA recognizing a deterministic context-free quantifier language LQ1 and Q2 = (Q2, Σ2, Γ2, δ2, s2, qaccept, qreject) be a DPDA modified according to Lemma 6.1.1 recognizing an endmarked deterministic context-free quantifier language LQ2. Define the iteration DPDA Q1⋅Q2 by:

• Q = Q1 ∪ Q2
• Σ = Σ1 ∪ Σ2
• Γ = Γ1 ∪ Γ2 ∪ Q1
• δ = δ2 (1)
  ∪ {⟨q, ε, α, β, q′⟩ ∶ ⟨q, ε, α, β, q′⟩ ∈ δ1} (2)
  ∪ {⟨q, ε, α, qα, s2⟩ ∶ (q, x, α) ∈ dom(δ1) and x ∈ {0,1}} (3)
  ∪ {⟨qaccept, ε, qα, β, q′⟩ ∶ ⟨q, 1, α, β, q′⟩ ∈ δ1} (4)
  ∪ {⟨qreject, ε, qα, β, q′⟩ ∶ ⟨q, 0, α, β, q′⟩ ∈ δ1} (5)
• s = s1
• F = F1

We take the states of Q1 and the states of Q2 and connect them in the following way: for every transition in δ1 in which some state q reads a symbol, we replace that transition with an ε-transition to the start state of Q2 and push q to the stack. Thus all subwords wi⧈ of the input are processed by Q2; in any case, Q2 empties its stack up to q and ends up in one of qaccept or qreject, and transitions back into Q1, with the new state and new stack contents decided by q.⁶

Of course, natural language iterations often involve a mixture of regular and context-free quantifiers:

(i) A third of the students answered every question correctly.

(ii) Fewer than five students attended more than half of the presentations.

Whether the DPDA-computable quantifier is outermost (as in (i)) or embedded (as in (ii)), we can still utilize the stack to record the progress of the computation (state of the outermost machine) before beginning a subcomputation with the embedded machine, avoiding the need to create multiple copies of the latter. The following two definitions make this precise; note the only significant departure from Definition 6.2.1 is with respect to δ, and in any case the resulting iteration automaton is a DPDA. For simplicity we do not pursue the “minimal” DPDA by distinguishing between terminal and non-terminal states of the outer quantifier.

Note that since there is no state merging and the transitions from states in Q1 to states in Q2 are all ε-transitions, the definition works for words of the form (wi⧈)∗ even if some wi are ε.

⁶ The proof sketch of DCFL iteration closure in [45] is by DPDA construction and indicates that the construction proceeds similarly to the DFA case. Since simply complementing the accepting states of a given DPDA may not result in the correct behavior (as it may continue to transition between accepting and rejecting states after reading the input), the correctness of our definitions in this section relies on the modifications described in Lemma 6.1.1. The cases are not necessarily similar (that is, we cannot necessarily use a single DPDA Q2 to

Definition 6.2.2. Let Q1 = (Q1, Σ1, Γ1, δ1, s1, F1) be a DPDA recognizing a deterministic context-free quantifier language LQ1 and Q2 = (Q2, Σ2, δ2, s2, F2) a DFA recognizing a regular quantifier language LQ2. Define the DPDA Q1⋅Q2 by:

• Q = Q1 ∪ Q2
• Σ = Σ1 ∪ {⧈}
• Γ = Γ1 ∪ Q1
• δ = {⟨q, x, α, α, q′⟩ ∶ ⟨q, x, q′⟩ ∈ δ2} (1’)
  ∪ {⟨q, ε, α, β, q′⟩ ∶ ⟨q, ε, α, β, q′⟩ ∈ δ1} (2’)
  ∪ {⟨q, ε, α, qα, s2⟩ ∶ (q, x, α) ∈ dom(δ1) and x ∈ {0,1}} (3’)
  ∪ {⟨p, ⧈, qα, β, p′⟩ ∶ p ∈ F2 and ⟨q, 1, α, β, p′⟩ ∈ δ1} (4’)
  ∪ {⟨p, ⧈, qα, β, p′⟩ ∶ p ∉ F2 and ⟨q, 0, α, β, p′⟩ ∈ δ1} (5’)
• s = s1
• F = F1
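The clauses of Definition 6.2.2 can be sketched in Python as follows. This is only an illustration under assumptions that are not in the text: moves are stored in a dictionary from (state, input, popped symbols) to (new state, pushed symbols) with the stack top written first, the empty string stands for ε, and the example machines `MOST` (a DPDA for "more 1s than 0s"), `SOME` (a DFA for "at least one 1"), the function names, and the bottom-of-stack symbol `Z` are all hypothetical.

```python
BOX = "⧈"   # end-of-subword marker
EPS = ""    # stands for ε

def iterate_dpda_dfa(d1, q1_states, gamma1, d2, s2, f2):
    """Build δ of Q1⋅Q2 following clauses (1')-(5') of Definition 6.2.2.

    d1: DPDA moves {(q, x, pop): (q', push)}; pop/push are tuples, top first.
    d2: DFA moves {(p, x): p'}.
    """
    gamma = set(gamma1) | set(q1_states)            # Γ = Γ1 ∪ Q1
    q2_states = {p for p, _ in d2} | set(d2.values())
    delta = {}
    for (p, x), p_next in d2.items():               # (1') DFA moves, stack inert
        for a in gamma:
            delta[(p, x, (a,))] = (p_next, (a,))
    for (q, x, pop), (q_next, push) in d1.items():
        if x == EPS:                                # (2') keep ε-moves of δ1
            delta[(q, EPS, pop)] = (q_next, push)
            continue
        delta[(q, EPS, pop)] = (s2, (q,) + pop)     # (3') push q, jump to s2
        for p in q2_states:                         # (4')/(5') return on ⧈
            if (p in f2) == (x == "1"):
                delta[(p, BOX, (q,) + pop)] = (q_next, push)
    return delta

def accepts(delta, start, finals, word, bottom="Z", max_steps=10_000):
    """Run the PDA; accept if a final state is reached with the input consumed."""
    state, stack, i = start, (bottom,), 0
    for _ in range(max_steps):
        if i == len(word) and state in finals:
            return True
        moves = [(k, v) for k, v in delta.items()
                 if k[0] == state and stack[:len(k[2])] == k[2]
                 and (k[1] == EPS or (i < len(word) and k[1] == word[i]))]
        if not moves:
            return False
        assert len(moves) == 1, "the construction should be deterministic"
        (_, x, pop), (state, push) = moves[0]
        stack = push + stack[len(pop):]
        i += len(x)
    return False

# Hypothetical outer DPDA: more 1s than 0s. "P0" marks the first surplus 1,
# so the machine can tell when the surplus drops back to zero.
MOST = {
    ("le", "1", ("Z",)): ("gt", ("P0", "Z")),
    ("le", "0", ("Z",)): ("le", ("N", "Z")),
    ("le", "0", ("N",)): ("le", ("N", "N")),
    ("le", "1", ("N",)): ("le", ()),
    ("gt", "1", ("P0",)): ("gt", ("P", "P0")),
    ("gt", "1", ("P",)): ("gt", ("P", "P")),
    ("gt", "0", ("P0",)): ("le", ()),
    ("gt", "0", ("P",)): ("gt", ()),
}
# Hypothetical inner DFA: at least one 1.
SOME = {("none", "0"): "none", ("none", "1"): "some",
        ("some", "0"): "some", ("some", "1"): "some"}

DELTA = iterate_dpda_dfa(MOST, {"le", "gt"}, {"Z", "N", "P0", "P"},
                         SOME, "none", {"some"})
print(accepts(DELTA, "le", {"gt"}, "10⧈00⧈01⧈"))
```

In this sketch the word 10⧈00⧈01⧈ has two blocks containing a 1 and one block without, so the outer machine sees the symbols 1, 0, 1, as Lemma 6.2.5 would predict for these stand-in machines.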

Definition 6.2.3. Let Q1 = (Q1, Σ1, δ1, s1, F1) be a DFA recognizing a regular quantifier language LQ1 and Q2 = (Q2, Σ2, δ2, s2, qaccept, qreject) a DPDA modified according to Lemma 6.1.1 recognizing an endmarked deterministic context-free quantifier language LQ2. Define the DPDA Q1⋅Q2 by:

• Q = Q1 ∪ Q2
• Σ = Σ2
• Γ = Γ2 ∪ Q1
• δ = δ2 (1”)
  ∪ {⟨q, ε, ε, q, s2⟩ ∶ (q, x) ∈ dom(δ1)} (3”)
  ∪ {⟨qaccept, ε, q, ε, q′⟩ ∶ ⟨q, 1, q′⟩ ∈ δ1} (4”)
  ∪ {⟨qreject, ε, q, ε, q′⟩ ∶ ⟨q, 0, q′⟩ ∈ δ1} (5”)
• s = s1
• F = F1

Since the DFA Q1 only has 0,1 transitions, there is no need for a corresponding (2”) case.

Claim 6.2.4. Each automaton Q1⋅Q2 yielded by Definitions 6.2.1, 6.2.2 and 6.2.3 is deterministic.

Proof. First we show this holds when both Q1 and Q2 are DPDA. We show there is at most one move per configuration in δ by examining each part (1)-(5) of the definition:

(1) δ2 has at most one move per configuration.

(2) These are exactly the ε-moves of δ1, which is deterministic and so has at most one move per configuration.

(3) A transition of this type is added if q has 0,1 moves in δ1 with α on the stack. This means q does not have an ε-move with α on the stack in δ1 (or a transition with both ε input and ε stack). Thus, replacing 0,1 with ε with α on the stack leaves q with one choice in δ.

(4) In δ2, qaccept has no moves by construction, and δ1 is deterministic, so there is exactly one move in δ for configuration (p, ε, qα).

(5) The same argument in (4) applies for qreject.

The same arguments suffice to see that the δ given by (1’)-(5’) and (1”)-(5”) are deterministic, with the additional minor observations: for (1’), adding an inert stack component does not affect choice; for (4’/5’), the states p ∈ Q2 have no ⧈-transitions in δ2.

To see the correctness of these automata definitions, we again prove a lemma relating transitions on ⧈-ended words in Q1⋅Q2 to transitions on individual symbols in Q1.

Lemma 6.2.5. Let g be the characteristic function of LQ2. For wi ∈ {0,1}∗ and q ∈ Q1, δ(q, wi⧈, α) = δ1(q, g(wi), α).

Proof. There are three cases; one per definition.

(i) Let Q1, Q2 both be DPDA. Assume w.l.o.g. that q has 0,1-transitions in δ1 (otherwise there is an ε-transition to some q′, in both δ1 and δ, with the same effect on the stack (2)). Then in δ, q has an ε-move to s2 with q pushed to the stack (3). Since q is not in Γ2, this is effectively an empty stack to δ2, so by (1) and Lemma 6.1.1 we have that δ(s2, wi⧈, qα) goes to (qaccept, qα) if g(wi) = 1 or (qreject, qα) if g(wi) = 0. By (4) and (5), there is an ε-move to δ1(q, g(wi), α).

(ii) Let Q1 be a DPDA and Q2 a DFA. Again assume w.l.o.g. that q has 0,1-moves (2’). By (3’), there is an ε-move to s2 with q pushed to the stack. By (1’), δ(s2, wi, qα) = (p, qα) (as the stack is left untouched), where p ∈ F2 if and only if g(wi) = 1. Finally, by (4’/5’), there is a ⧈-move to δ1(q, g(wi), α).

(iii) Let Q1 be a DFA and Q2 a DPDA. In this case we take α = ε and show that δ(q, wi⧈, ε) = (δ1(q, g(wi)), ε). By (3”), there is first an ε-move to s2 with q pushed to the stack. As in case (i), Q2 goes from s2 with effectively empty stack (q on top) to qaccept if g(wi) = 1 or qreject if g(wi) = 0 with effectively empty stack (q on top). By (4”/5”), there is an ε-move to (δ1(q, g(wi)), ε).

Theorem 6.2.6. The language accepted by the DPDA Q1⋅Q2, generated by

Proof. This follows from the preceding lemma and the argument for Theorem 5.1.6 for the two-DFA case.

We do not give a formal definition for reasons of space, but these constructions can be generalized to iterations of an arbitrary number of DFA and DPDA by (1) using ⧈i as an indicator for the ith embedded DPDA to empty its stack

and (2) expanding the stack alphabet with each new embedding to maintain a history (as in Definition 5.2.5 for generalized iteration DFA).

Summary and Open Questions

In this chapter we saw a proof of the closure of deterministic context-free languages under quantifier iteration and definitions for constructing iteration DPDA when one or more of the quantifiers is DPDA-computable.

Are there other natural classes of quantifiers for which we can investigate iteration closure? For instance:

Question 6.2.7. Are context-sensitive languages (for example, the same number of a’s, b’s, and c’s, which is type ⟨1,1,1⟩) closed under quantifier iteration? The answer is not immediate; like DCFLs, context-sensitive languages are not closed under substitution.

Mostowski [38] identifies a subset of quantifiers that are accepted by DPDA by both final state and empty stack (recall: this means the DPDA is in a final state and the stack is also empty; not the usual notion of empty stack). This class more-or-less corresponds to exact proportional quantifiers (for example, exactly 1/3). Is this subset of DCFLs closed under quantifier iteration? This question is actually easy to answer.

Fact 6.2.8. This natural proper subset of deterministic context-free quantifier languages is not closed under iteration.

Proof. This follows immediately from almost-linear quantifiers lacking complement closure (indeed, the complement of an almost-linear quantifier is never almost-linear).⁷

Chapter 7

Cumulation Automata

[46] mentions in a footnote the possibility of defining cumulation automata as the sequential composition of iteration automata. The contribution of this chapter is to give precise definitions of automata both for type ⟨1,1,2⟩ regular cumulations as well as for type ⟨1,1,…,n⟩ regular cumulations based on that suggestion. This requires a modification of the translation function to record both R and R⁻¹, which turns out to substantially reduce the difficulty of the language, indicating that the choice of model representation is an integral factor of semantic automata complexity. Cumulation constitutes an interesting extension of semantic automata because the quantifiers resulting from this lift are irreducibly polyadic. They are somehow on or around the Frege boundary since they are definable from iterations, but are not themselves iterations.

7.1 Automata for Type ⟨1,1,2⟩ Regular Cumulations

Recall that the cumulation (Q1,Q2)cl(A, B, R) can be defined by

(Q1⋅some)(A, B, R) ∧ (Q2⋅some)(B, A, R⁻¹)

Here we cannot interpret the “∧” as intersection, as we might in the case of the language of “More than two and less than four.” The first iteration is evaluated using a string generated by τ2 given R, and the other using R⁻¹. To extend the semantic automata model to cumulation, we need a way to combine DFAs Q1⋅some and Q2⋅some into a single automaton that accepts a single input combining the strings they should evaluate. What we need is the sequential composition of these iteration automata, accepting concatenations of strings from the two languages.

Recall that the concatenation of L1 and L2 is L1L2 = {uv ∣ u ∈ L1, v ∈ L2}, and regular languages are closed under concatenation (Theorem 2.2.5). Sequential composition is the automata operation corresponding to concatenation, illustrated in Figure 7.1, consisting of connecting final states of N1 to the start state of N2 by ε-transitions.

Construction of N to recognize A1 ∘ A2:

Let N1 = (Q1, Σ, δ1, q1, F1) recognize A1, and N2 = (Q2, Σ, δ2, q2, F2) recognize A2. Construct N = (Q, Σ, δ, q1, F2) to recognize A1 ∘ A2.

1. Q = Q1 ∪ Q2. The states of N are all the states of N1 and N2.
2. The state q1 is the same as the start state of N1.
3. The accept states F2 are the same as the accept states of N2.
4. Define δ so that for any q ∈ Q and any a ∈ Σε,

   δ(q, a) =
     δ1(q, a)           if q ∈ Q1 and q ∉ F1
     δ1(q, a)           if q ∈ F1 and a ≠ ε
     δ1(q, a) ∪ {q2}    if q ∈ F1 and a = ε
     δ2(q, a)           if q ∈ Q2.

Figure 7.1: Sequential composition of automata ([44])
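
The sequential composition in Figure 7.1 can be sketched in Python as below. This is a minimal illustration, assuming NFAs are given as a triple of a transition dictionary (mapping (state, symbol) to a set of states, with the empty string standing for ε), a start state, and a set of accept states; the function names and the two toy machines are hypothetical. States are tagged with 1 or 2 so that the two state sets are disjoint.

```python
EPS = ""  # stands for ε

def concat_nfa(n1, n2):
    """Connect every accept state of n1 to the start state of n2 by an ε-move."""
    d1, q1, f1 = n1
    d2, q2, f2 = n2
    delta = {}
    for (q, a), targets in d1.items():
        delta[((1, q), a)] = {(1, t) for t in targets}
    for (q, a), targets in d2.items():
        delta[((2, q), a)] = {(2, t) for t in targets}
    for f in f1:
        delta.setdefault(((1, f), EPS), set()).add((2, q2))
    return delta, (1, q1), {(2, f) for f in f2}

def eps_closure(delta, states):
    """All states reachable from `states` using only ε-moves."""
    todo, seen = list(states), set(states)
    while todo:
        q = todo.pop()
        for t in delta.get((q, EPS), ()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen

def nfa_accepts(nfa, word):
    """Standard subset simulation of an NFA with ε-moves."""
    delta, start, finals = nfa
    current = eps_closure(delta, {start})
    for a in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, a), set())
        current = eps_closure(delta, moved)
    return bool(current & finals)

# Toy machines: N1 accepts one or more a's, N2 accepts exactly "b".
N1 = ({(0, "a"): {1}, (1, "a"): {1}}, 0, {1})
N2 = ({(0, "b"): {1}}, 0, {1})
N = concat_nfa(N1, N2)
print(nfa_accepts(N, "aab"), nfa_accepts(N, "a"))
```

The composed machine accepts a word only after it has passed through an accept state of the first machine and then finished in an accept state of the second, which is why "a" alone is rejected.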

Definition 7.1.1. Let M = ⟨M, A, B, R⟩ be a model with a⃗ and b⃗ any enumerations of A and B. Define the translation function τ2^cl, which takes two sets and a binary relation as arguments:

τ2^cl(a⃗, b⃗, R) = τ2(a⃗, b⃗, R) ⊠ τ2(b⃗, a⃗, R⁻¹)

Definition 7.1.2. Let Q1 and Q2 be quantifiers of type ⟨1,1⟩. Define the language L(Q1,Q2)cl by:

L(Q1,Q2)cl = {w1⊠w2 ∶ w1 ∈ LQ1⋅some, w2 ∈ LQ2⋅some}

By separating the words in LQ1⋅some and LQ2⋅some with a distinguished ⊠ symbol, we avoid the problem of the automaton having to “guess” when it has seen the end of the first word and the beginning of the second. Such non-determinism is not really an issue since NFAs and DFAs both recognize exactly the regular languages; however, determinism is easily retained in this fashion. Moreover, these separator symbols may be necessary for cumulation automata to mean what we intend them to mean (see the discussion following Question 7.2.11 in the summary).
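
Since ⊠ occurs exactly once in every word of L(Q1,Q2)cl, membership can be decided by splitting at ⊠ and testing each half on its own, with no guessing. The Python sketch below illustrates this. The helper names are hypothetical, and reading a numeral n as "exactly n" is an assumption made for the illustration, not a definition taken from the text.

```python
BOX, SEP = "⧈", "⊠"

def blocks(word):
    """Split a word of the form (w⧈)* into its subwords w; None if malformed."""
    if any(ch not in "01" + BOX for ch in word):
        return None
    if word and not word.endswith(BOX):
        return None
    return word.split(BOX)[:-1]

def in_n_some(word, n):
    """Assumed reading of L(n⋅some): exactly n blocks contain at least one 1."""
    bs = blocks(word)
    return bs is not None and sum("1" in b for b in bs) == n

def in_cumulation(word, in_first, in_second):
    """w1⊠w2 is in the cumulation language iff w1 and w2 pass their own tests."""
    parts = word.split(SEP)
    return len(parts) == 2 and in_first(parts[0]) and in_second(parts[1])

def three(w):
    return in_n_some(w, 3)

def five(w):
    return in_n_some(w, 5)

word = "11110⧈10100⧈00001⧈" + SEP + "110⧈100⧈110⧈100⧈001⧈"
print(in_cumulation(word, three, five))
```

A word without the separator, or with the two halves swapped, fails the check because each half is tested against its own language.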

Example 7.1.3. Consider the sentence “Three cinephiles watched five movies”, on the reading that the three cinephiles, all together, watched a sum total of five movies. To translate the model in Figure 7.2, we calculate τ2^cl(c⃗, m⃗, W). τ2(c⃗, m⃗, W) yields the string 11110⧈10100⧈00001⧈ and τ2(m⃗, c⃗, W⁻¹) yields the string 110⧈100⧈110⧈100⧈001⧈ (imagine looking at the mirror image of the model). Concatenating these with ⊠ in the middle results in:

11110⧈10100⧈00001⧈ ⊠ 110⧈100⧈110⧈100⧈001⧈

Since the pre-⊠ portion of the string is in L3⋅some and the post-⊠ portion is in L5⋅some, the whole string is in L(3,5)cl.

[Diagram: the set C = {c1, c2, c3} of cinephiles, the set M = {m1, m2, m3, m4, m5} of movies, and the relation W between them]

Figure 7.2: Model for Example 7.1.3
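
The translation in Example 7.1.3 can be reproduced with a short Python sketch. It assumes, judging from the strings in the example, that τ2 writes one block per element of the first enumeration, with one bit per element of the second enumeration and ⧈ closing each block. The pairs in `W` are likewise read off from those strings rather than from Figure 7.2, and the function names are hypothetical.

```python
BOX, SEP = "⧈", "⊠"

def tau2(xs, ys, rel):
    """One block per x: a bit for each y (1 iff (x, y) is in rel), closed by ⧈."""
    return "".join(
        "".join("1" if (x, y) in rel else "0" for y in ys) + BOX
        for x in xs
    )

def tau2_cl(xs, ys, rel):
    """Translate with R, then with the inverse relation, separated by ⊠."""
    inverse = {(y, x) for (x, y) in rel}
    return tau2(xs, ys, rel) + SEP + tau2(ys, xs, inverse)

C = ["c1", "c2", "c3"]
M = ["m1", "m2", "m3", "m4", "m5"]
# Assumed edges, reconstructed from the strings given in Example 7.1.3.
W = {("c1", "m1"), ("c1", "m2"), ("c1", "m3"), ("c1", "m4"),
     ("c2", "m1"), ("c2", "m3"),
     ("c3", "m5")}

print(tau2_cl(C, M, W))
```

Changing the order of either enumeration permutes the bits or blocks but not the number of blocks containing a 1, which is what the iteration with some depends on.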

Definition 7.1.4. Let Q1 and Q2 be DFAs accepting the monadic quantifier languages LQ1 and LQ2, respectively. Construct the iteration DFAs Q1⋅some and Q2⋅some. Denote these by A1 and A2, respectively. The cumulation DFA (Q1,Q2)cl is given by:

• Q = QA1 ∪ QA2
• Σ = ΣA1 ∪ {⊠}
• δ(q, x) =
    δA1(q, x)   if q ∈ QA1, x ≠ ⊠
    sA2         if q ∈ FA1, x = ⊠
    δA2(q, x)   if q ∈ QA2, x ≠ ⊠
• s = sA1
• F = FA2
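
A minimal Python sketch of Definition 7.1.4 follows. It assumes DFAs are given as a triple of a partial transition dictionary, a start state, and a set of final states, where a missing entry means rejection. The four-state machine `SOME_SOME` is a hand-built stand-in for an iteration DFA (it accepts words (w⧈)* in which at least one block contains a 1), not the output of the iteration construction, and all names are hypothetical.

```python
BOX, SEP = "⧈", "⊠"

def cumulation_dfa(a1, a2):
    """Glue a1 to a2: a ⊠-move leads from each final state of a1 to the start of a2."""
    d1, s1, f1 = a1
    d2, s2, f2 = a2
    delta = {}
    for (q, x), r in d1.items():          # δ_A1 on states tagged 1
        delta[((1, q), x)] = (1, r)
    for (q, x), r in d2.items():          # δ_A2 on states tagged 2
        delta[((2, q), x)] = (2, r)
    for f in f1:                          # q ∈ F_A1, x = ⊠  ↦  s_A2
        delta[((1, f), SEP)] = (2, s2)
    return delta, (1, s1), {(2, f) for f in f2}

def dfa_accepts(dfa, word):
    delta, state, finals = dfa
    for x in word:
        if (state, x) not in delta:       # undefined move: implicit trap state
            return False
        state = delta[(state, x)]
    return state in finals

# Stand-in iteration DFA. States: a = no witness block yet, b = inside a block
# with a 1, c = a witness block has been closed (final), d = inside a later block.
SOME_SOME = (
    {("a", "0"): "a", ("a", "1"): "b", ("a", BOX): "a",
     ("b", "0"): "b", ("b", "1"): "b", ("b", BOX): "c",
     ("c", "0"): "d", ("c", "1"): "d", ("c", BOX): "c",
     ("d", "0"): "d", ("d", "1"): "d", ("d", BOX): "c"},
    "a",
    {"c"},
)

CL = cumulation_dfa(SOME_SOME, SOME_SOME)
print(dfa_accepts(CL, "10⧈00⧈" + SEP + "1⧈0⧈"))
```

Tagging the states keeps the two copies disjoint even when the same machine is used twice, and because ⊠ is only defined on final states of the first copy, the composed machine stays deterministic.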

Theorem 7.1.5. The language accepted by the DFA (Q1,Q2)cl is L(Q1,Q2)cl.

Proof. Let A1 and A2 denote the DFAs Q1⋅some and Q2⋅some.

(⊆) Suppose w ∈ L(Q1,Q2)cl, so w = w1⊠w2 where w1 ∈ L(A1) and w2 ∈ L(A2). By definition δ(s, w1) = f ∈ FA1, δ(f, ⊠) = sA2, and δ(sA2, w2) = f′ ∈ FA2, so δ(s, w) ∈ F.

(⊇) Suppose δ(s, w) ∈ F. By construction, any path from s to f ∈ F consists of sA1⋯f sA2⋯f′ where f ∈ FA1 and f′ ∈ FA2. Clearly w must consist of some

Fact 7.1.6. The state complexity of cumulation is at most twice the state complexity of iteration. More specifically, the size of (Q1,Q2)cl is at most the size of Q1⋅some plus the size of Q2⋅some. This upper bound is not reached in case both Q1⋅some and Q2⋅some have qT states. These can be merged since if either automaton reaches a terminal non-final state, the cumulation automaton will not accept (a terminal non-final state of either iteration automaton is of course also a terminal non-final state of the whole cumulation automaton).

Theorem 7.1.7. Regular languages are closed under cumulation (using the language definition developed for this chapter allowing inverse relations).

Proof. This result follows from the closure of regular languages under quantifier iteration and concatenation.

Finally, we also sketch the corresponding result for deterministic context-free languages. Since cumulation automata are essentially a simple sequential composition of iteration automata, this follows from the results of the previous chapter, and it is clear that our definitions of cumulation DFA can easily be extended to cumulation DPDA.

Theorem 7.1.8. Deterministic context-free languages are closed under cumulation.

Proof. Let LQ1 and LQ2 be binary deterministic context-free languages recognized by Q1 and Q2. Since Lsome is regular, and thus a DCFL, by Theorem 6.1.2, LQ1⋅some and LQ2⋅some are also DCFL, and we can construct their automata Q1⋅some and Q2⋅some using Definition 6.2.2. DCFL are not in general closed under concatenation; however, since L(Q1,Q2)cl is defined by concatenating LQ1⋅some and LQ2⋅some with a distinguished symbol ⊠, it is clear that connecting Q1⋅some and Q2⋅some with a ⊠-transition as in Definition 7.1.4 (and using ⊠ as a marker for Q1⋅some to empty its stack) yields a deterministic PDA recognizing the correct language.

In document MoL 2014 14: An Automata Theoretic Perspective on Polyadic Quantification in Natural Language (Page 70-78)