Model Checking - Genuinely Polyadic Quantifier Languages are Not Context-Free


9.1.1 Model Checking

Verification tasks have long been used in cognitive science to study the neural and psychological representations of generalized quantifiers. Such tasks are essentially model checking: presented with a visual scene and a quantified sentence describing it, people are asked to judge whether the sentence is true of the scene. By controlling for features of the scene, researchers can study how particular semantic features affect processing by measuring response time, accuracy, and activation in particular brain regions, among other things.

Verification Studies Using Monadic Semantic Automata

McMillan et al. [34] were the first to investigate the neural bases of generalized quantifier comprehension by observing patterns of neuroanatomical recruitment using BOLD fMRI while people assessed the truth-value of a quantified sentence paired with a pictorial scene. McMillan et al. concluded that higher-order quantifiers such as even and most recruit the prefrontal cortex, including executive resources like working memory, while first-order quantifiers like some and at least three do not, but that both recruit the right inferior parietal cortex (indicating a numerosity component). They further claimed that this maps onto the distinction between DFA and PDA, the former being memoryless while the latter possesses memory in the form of a stack. Their subsequent study [35] further supported the claim that automata-based properties of quantifiers are related to the actual neural underpinnings of quantifier comprehension by studying people with particular neurological diseases. They found that patients with FTD (frontotemporal dementia) and AD (Alzheimer’s disease), which involve working memory limitations, had a harder time understanding higher-order quantifiers, while patients with CBD (corticobasal degeneration), which involves number knowledge impairment, had more trouble than the other groups with both kinds of quantifiers.

Szymanik [48] points out that their interpretation of those results is not entirely correct since, as we saw in Section 3.2, divisibility or parity quantifiers, while not definable in first-order logic, are computable by (looping) finite automata. Thus parity and proportional quantifiers cannot be lumped together. Szymanik and Zajenkowski [52] designed studies to test whether there are interesting correspondences between computational models and logical definability when all the relevant distinctions are made. They compared the following three basic types of quantifiers:

• FO-definable (Aristotelian and counting), computable by acyclic finite automata

• Parity (computable by finite automata with loops)

• Proportional (computable by PDA)¹

Additionally, they chose a counting quantifier of “high-rank” (requiring counting to at least seven or eight) based on the hypothesis that the number of states of the automaton has a greater impact on resource recruitment than the existence of loops. Fascinatingly, all the predictions based on structural dissimilarities in the automata were actually attested in the response times of the verification task. Proportional quantifiers required the longest time, followed by high-rank cardinals, then parity, and finally Aristotelian. A subsequent study [53] also demonstrated that proportional quantifiers place a higher demand on working memory than parity quantifiers.
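To make the three classes concrete, the following is a minimal sketch rather than the construction used in the cited studies. It assumes the usual string encoding of a monadic model, in which each element of the restrictor contributes one symbol, 1 if it satisfies the scope and 0 otherwise. The names run_dfa, AT_LEAST_THREE, EVEN, and most are illustrative only.

```python
def run_dfa(transitions, start, accepting, word):
    """Run a DFA given as a dict {(state, symbol): next_state}."""
    state = start
    for symbol in word:
        state = transitions[(state, symbol)]
    return state in accepting


# "at least three": states 0..3 count the 1s seen so far, capped at 3.
# Ignoring self-loops, the automaton never returns to an earlier state.
AT_LEAST_THREE = {
    (q, s): min(q + (s == "1"), 3) for q in range(4) for s in "01"
}

# "an even number of": two states that swap on every 1, so the
# automaton has a genuine loop between distinct states.
EVEN = {(q, s): q ^ (s == "1") for q in range(2) for s in "01"}


def most(word):
    """Pushdown-style check that 1s outnumber 0s.

    The stack only ever holds copies of one symbol: a new symbol
    cancels against a different one on top, otherwise it is pushed.
    """
    stack = []
    for s in word:
        if stack and stack[-1] != s:
            stack.pop()
        else:
            stack.append(s)
    return bool(stack) and stack[-1] == "1"


print(run_dfa(AT_LEAST_THREE, 0, {3}, "10110"))  # three 1s
print(run_dfa(EVEN, 0, {0}, "1010"))             # two 1s
print(most("11010"))                             # three 1s, two 0s
```

The two DFA need only a fixed number of states, whereas most needs storage that grows with the input, which is the structural difference the response-time predictions rely on.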

The qualitative difference between proportional and other quantifiers is further corroborated by [57], finding that schizophrenic patients perform on par with healthy subjects in verification tasks with the exception of proportional quantifiers. This suggests the semantic automata model gives a partial explanation of the combined working memory and language deficits observed in those with schizophrenia. See also [58] and [59] for further research connecting the processing of proportional quantifiers and working memory.

Next we move on to verification tasks involving multi-quantifier sentences and whether semantic automata for polyadic quantifiers have similar predictive power. But first, we list a few of the interesting open questions remaining concerning the processing of simple quantifiers:

Question 9.1.1. [52] found that, while high-rank cardinals were more difficult than parity quantifiers, these two had the pairwise smallest difference among the types of quantifiers tested. Could further studies decide conclusively whether the number of states or the existence of loops has a greater effect on difficulty? For example, it might be informative to study processing times for cardinal quantifiers of even higher number, and parity quantifiers other than even and odd.

Question 9.1.2. In light of Kanazawa’s recent characterization of nondeterministic PDA (see Section 3.3), it would be interesting to compare processing times for DPDA and NPDA-computable quantifiers (for instance, more than 1/3 versus more than 1/3 and less than 2/3). As it stands, it is unclear how to reconcile the existence of semantic automata positing non-deterministic algorithms with the deterministic nature of human cognition.

Predictions for Multi-quantifier Sentences Using Iterating Semantic Automata

After the publication of [46], which proposed stack iteration automata (presented in Chapter 4), but before a general mechanism to generate iteration DFA was known, Szymanik, Steinert-Threlkeld, Zajenkowski, and Icard III [51] took the first steps toward answering whether iterated quantifiers actually utilize memory as the stack construction predicts. In particular, they compared stack versions of every⋅some and some⋅every with their minimal DFA versions, shown in Figure 9.1 (the only difference is that our depiction of every⋅some has a complete transition function, so there is an “extra” dead state looping on every symbol).

[Figure 9.1: Iteration DFA used in multi-quantifier verification study. Panels: (a) every⋅some; (b) some⋅every. Transitions are labeled with 0, 1, and the separator ⧈.]

They predicted that true instances of every⋅some would be harder than true instances of some⋅every, since in the former case one must run through the entire model to verify a sentence, and in the latter one need only find a single witness. They further predicted that the opposite relationship would hold for false instances, since the negations of these iterations become some⋅none and every⋅not every, respectively.
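The asymmetry behind these predictions can be sketched in a few lines. This is an illustration under stated assumptions, not the automata of Figure 9.1: we assume the model is encoded as one block of 0s and 1s per element of the first restrictor, each block terminated by a separator (written "#" here in place of ⧈). The early return stands in for entering an absorbing state after which the verdict can no longer change. The function names and the returned count of symbols read are our own.

```python
SEP = "#"  # stands in for the separator symbol ⧈


def every_some(word):
    """Return (verdict, symbols_read) for every⋅some.

    True only if every block contains at least one 1; a block
    without a 1 settles the answer as False immediately.
    """
    found = False
    for i, s in enumerate(word, start=1):
        if s == "1":
            found = True
        elif s == SEP:
            if not found:
                return False, i
            found = False
    return True, len(word)


def some_every(word):
    """Return (verdict, symbols_read) for some⋅every.

    True as soon as one block consists only of 1s; otherwise the
    whole word has to be read before answering False.
    """
    all_ones = True
    for i, s in enumerate(word, start=1):
        if s == "0":
            all_ones = False
        elif s == SEP:
            if all_ones:
                return True, i
            all_ones = True
    return False, len(word)


print(every_some("10#01#11#"))  # true instance: whole word is read
print(some_every("11#01#00#"))  # true instance: first block suffices
print(every_some("10#00#11#"))  # false instance: second block suffices
print(some_every("10#01#00#"))  # false instance: whole word is read
```

On these inputs the true every⋅some instance and the false some⋅every instance both require reading all nine symbols, while their counterparts stop after three and six symbols respectively, mirroring the predicted reversal between true and false instances.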

Interestingly, their results were best explained by positing the stack model for some⋅every and the DFA model for every⋅some. True instances of the latter were indeed more difficult, but the expected relation for false instances was not observed. Subjects had longer response times and were less accurate for every...some sentences, while the study showed that only some...every sentences engaged working memory. If their explanation is correct, it accounts for the memory engagement observed for some...every sentences and suggests that a strategy resembling the DFA is less reliable than a strategy resembling the PDA, which is plausible.

It remains to be seen why these particular results were observed. Now that a completely general method is known for constructing iteration DFA, more empirical predictions comparing the stack and DFA versions may be made by judiciously choosing from the whole gamut of regular iterations. Since differences were already observed when the iterated quantifiers were as simple as some and every, we think there are likely interesting phenomena to observe by studying (1) differences between combinations of different types of quantifiers and (2) for a given type of combination, the difference between the two orderings.

Our initial suspicion was that the DFA model would be appropriate for most simple applications, while a stack-like algorithm would be invoked for more difficult sentences, under various possible interpretations of “difficulty.” We might think that difficulty correlates with the number of states of the semantic automata. Recall that for Q1⋅Q2, where |Q1| = m and |Q2| = n, the stack version has precisely m+n+1 states while the DFA version has on the order of m⋅n states; the state-space of the latter therefore becomes unwieldy when one or more of the quantifiers is a high-rank cardinal. However, we also noted previously that a computation of the stack version always takes more steps than its DFA counterpart, since it at least reads the entire input before processing its stack contents. For this reason, we should expect a mechanism utilizing memory like the stack version to take longer. This does not mesh with the conclusion of [51]; however, studies may now be done with quantifiers with a wider range of complexity (loops and state size) to see how response time and memory recruitment vary. As Steinert-Threlkeld said in [46], “Indeed, a general notion of complexity for automata in the context of language processing would be useful in this context.”²
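As a rough worked example of the state counts just recalled, the sketch below compares the two figures for hypothetical component sizes. The stack count m+n+1 is taken from the text; the DFA figure is only the order-of-magnitude proxy m⋅n, not an exact count, and the function names are ours.

```python
def stack_states(m, n):
    """State count of the stack iteration automaton, as stated in the text."""
    return m + n + 1


def dfa_states_order(m, n):
    """Order-of-magnitude proxy (m * n) for the iteration DFA, not an exact count."""
    return m * n


# Hypothetical sizes: a small quantifier combined with increasingly
# large ones, then two large quantifiers combined.
for m, n in [(2, 2), (2, 9), (9, 9)]:
    print(m, n, stack_states(m, n), dfa_states_order(m, n))
```

For m = n = 9 the stack version has 19 states against a proxy of 81 for the DFA version, which illustrates how quickly the gap widens once both components are large.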

One potential issue with teasing apart the explanatory contributions of the kind of iteration automata presented in this thesis and stack automata is that, once iterations involve simple quantifiers that are already themselves DPDA-computable, it is no longer a question of whether the proper model has a stack or not (because even the version defined here must utilize a stack). There is still a qualitative difference between DFA/DPDA iteration automata and stack automata, which is more apparent if one considers the generalizations in Sections 5.2 and 5.3.

² Note that, for example, state-space is probably not a good metric for the complexity of

In [46], Steinert-Threlkeld comments on the finding that parity quantifiers, which have both DFA and PDA representations,³ recruit working memory, suggesting “This provides prima facie reason to believe that working memory will be recruited when processing sentences with multiple quantifiers each computable by a DFA. This would show that the PDA [stack] representation more closely resembles the actual processing mechanism.” We note that this is not necessarily the case. It could be that a good model is given by using the DPDA representation of even in one of our definitions from Section 6.2, instead of using either the DFA or DPDA representation of even in the stack model. Perhaps eye-tracking studies could provide evidence for one or the other method when memory recruitment cannot be the distinguishing factor.

In document MoL 2014 14: An Automata Theoretic Perspective on Polyadic Quantification in Natural Language (Page 97-101)