A. Simple lexical decision
2. How is context represented?
I have so far focused on how context affects responses to stimuli. But we still need an account of the representations or processes that could produce these responses. Several possibilities have been considered in the linguistic and psycholinguistic literature. At one extreme, some linguistic theories argue that words (specifically, open-class roots) bear no syntactic information whatsoever (Borer, 2005; Marantz, 1997). To explain the behavioral effects, these theories could argue that reading usually involves syntactic processing. The artificiality of the task does not overcome the expectations of the system, and the independent syntactic system kicks in when exposed to the word. This syntactic activity could feed into the lexical decision. Such an account would be difficult if not impossible to distinguish from theories that include syntactic information in the lexicon.
Other accounts annotate words for syntactic features. In these theories, syntactic features are matched against labeled positions in syntactic trees to ensure that each word is slotted into its appropriate position. This is the general logic behind terminal productions in context-free grammars (CFGs). For example, in the production N → chicken, the nonterminal N category is equivalent to a syntactic annotation for chicken that constrains its distribution in the broader syntactic system. These annotations can be much more elaborate than simple part-of-speech labels. For example, in Lexical Functional Grammar (LFG; Bresnan, 2001; Neidle, 1994), lexical entries are marked for the complements (arguments) they take, at both the functional level (e.g., agent) and the structural level (e.g., subject). This information is stored as categorical feature labels that allow the word to trigger syntactic building processes. These theories can therefore account for the relationship between verbs and the diversity of structures studied in Linzen et al. (2013) without having to invoke spontaneous task-irrelevant syntactic activity. However, these theories are explicitly non-probabilistic. In order for these theories to be correct, the number of syntactic structures should explain the RTs better than the frequency distribution of a target across those structures. Earlier probabilistic findings would be recast as noisy approximations of the number of syntactic types per word (type count and entropy are positively correlated).
Another class of theories emphasizes the functional and theoretical similarities between words and syntax. Many such theories even go so far as to define syntax as simply the most abstract end of the lexicon (e.g., Langacker, 1987; Goldberg, 1995; Diessel, 2015). Words relate directly to syntactic structures, similar to the representations in LFG. However, they can relate to any type of syntactic structure, irrespective of whether they are functional heads of that structure. This more inclusive position predicts that syntactic distributions beyond the subcategorization frames studied by Linzen and colleagues should also impact processing. Possible support for this comes from Baayen et al. (2011), who found distributional effects for a syntactic structure in which the noun is not the head.
Another important feature of these theories is that they treat word–syntax relationships as fundamentally probabilistic (Diessel, 2015). Relationships between nodes in the network are tuned by experience: the more often and more distinctively that two linguistic units are experienced in close syntagmatic or syntactic conjunction, the stronger the connection between them (and the weaker the connections between these and other structures). This model allows for a straightforward interpretation of the distributional effects observed in lexical decision. These effects could arise through a pattern of feedback between lexical and syntactic nodes, where the local feedback potentials are proportional to frequency.
Psycholinguists have proposed their own set of models of the lexicon and lexical access specific to comprehension (Baayen et al., 2011; Coltheart et al., 2001; Davis, 2010; Grainger & Jacobs, 1996; Norris, 2006; Plaut, 1997; Morton, 1978; Seidenberg & McClelland, 1989). None of these models contains a syntactic component, but neither do any specifically preclude syntax. However, as pointed out by Norris (2013), some models are more flexible than others. For example, the interactive activation models of Seidenberg and colleagues (e.g., Harm & Seidenberg, 2004; Seidenberg & McClelland, 1989) require a new component and interfaces between that component and the others. The fundamental organization of the model would change, but the functional properties would remain the same. Specifically, a tier of syntactic nodes would be added, with connections at least to orthography/phonology, and likely to semantics as well. Prototype effects could be modeled by setting resting activation at the syntactic tier according to the prototypical distribution. Other models, such as the Bayesian Reader of Norris (2006), simply need to “plug” syntax into the existing machinery. For example, syntactic information could be fed into the prior probability of the Bayesian equation. Similarly, task-specific models such as the Drift-Diffusion Model of two-way choices (Ratcliff et al., 2004) could also easily accommodate new information streams when making predictions about behavior in lexical decision.
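The idea of feeding syntactic information into the prior can be made concrete with a minimal sketch of Bayes' rule over a small candidate set. This is not the Bayesian Reader itself: the function names, the candidate words, all numbers, and the multiplicative reweighting of the frequency-based prior are assumptions chosen purely for illustration.

```python
def posterior(likelihood, prior):
    """Apply Bayes' rule over a closed set of candidate words."""
    unnormalized = {w: likelihood[w] * prior[w] for w in likelihood}
    total = sum(unnormalized.values())
    return {w: p / total for w, p in unnormalized.items()}


def reweight_prior(prior, syntactic_weight):
    """Hypothetical step: scale each prior by a syntactic weight, then renormalize."""
    scaled = {w: prior[w] * syntactic_weight.get(w, 1.0) for w in prior}
    total = sum(scaled.values())
    return {w: p / total for w, p in scaled.items()}


# Invented values, for illustration only.
likelihood = {"chicken": 0.6, "chicane": 0.4}         # P(input | word)
frequency_prior = {"chicken": 0.7, "chicane": 0.3}    # P(word) from frequency alone
syntactic_weight = {"chicken": 1.5, "chicane": 1.0}   # assumed syntactic contribution

baseline = posterior(likelihood, frequency_prior)
with_syntax = posterior(likelihood, reweight_prior(frequency_prior, syntactic_weight))
```

Under these assumed values, the word whose prior receives the syntactic boost ends up with a higher posterior than it has under the frequency-only prior, which is the kind of shift that would translate into faster recognition in a model of this type.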
Other models might not need any change at all. Baayen et al. (2011) introduce a two-tier network of input orthographic nodes and output meaning nodes. They couple this network with an expectation-based, error-driven learning algorithm (Rescorla & Wagner, 1972). The network was able to model the syntactic paradigm effects from Milin et al. (2009). This means that a morphological paradigm exerted its effect without being represented in the model! They explain the success of the model in terms of discriminative learning. Syntax provides stable points of variability within the input. For example, English prepositional phrases define a position relatively close to nouns in which prepositions may vary. Over time, this variability helps to carve away the incidental aspects of the context to solidify the connection between the noun's form and its meaning. What remains is the most cross-contextually stable meaning that coincides with the presence of the noun. With more diverse distributions come stronger and more targeted inferences from noun form to meaning. Similarly, prototypical words, whose distributions match the expectations of the system the best, stand to benefit the most from the contextual variability that drives learning. This leads us to the third hypothesis:
The argument from discriminative learning runs into a problem with the findings of Linzen et al. (2013), who found no effect of diversity or prototype measures on lexical decision RTs. Why should the discriminative logic play out for nouns but not verbs? A possible answer presents itself if we consider the different ways that the two studies defined their syntactic distributions. Baayen and colleagues looked at lexical variation within a single syntactic construction. Their measure therefore amounts to a syntactically constrained version of the lexical co-occurrence measures discussed in Bullinaria and Levy (2012), which are typically interpreted as capturing semantics, not syntax. Furthermore, by looking at lexical variation, Baayen and colleagues bias the question in favor of the abilities of their two-tier model. By contrast, Linzen and colleagues looked at syntactic variation within a single lexical item. The distributions they considered were based on abstract syntactic templates. Therefore, they specifically ignore the lexical contribution of the syntactic context, where Baayen and colleagues rely on it completely. Without the overt lexical cues for the different syntactic constructions, discriminative learning may not apply. The question is whether other measures of syntactic diversity and prototypicality will likewise produce null results for nouns once lexical cues have been filtered out.
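The discriminative-learning logic at issue here, in which overt lexical variation around a noun strengthens the link from noun form to meaning, can be illustrated with a toy Rescorla-Wagner simulation. This is a sketch under simplifying assumptions, not the model of Baayen et al. (2011): there is a single outcome that is present on every trial, and the cue names, learning rate, and trial counts are invented.

```python
from collections import defaultdict


def rescorla_wagner(trials, rate=0.1, lam=1.0):
    """Learn cue weights for one outcome with the Rescorla-Wagner rule.

    Each trial is a set of cues that co-occur with the outcome. Every cue
    present on a trial is updated by the same shared prediction error.
    """
    weights = defaultdict(float)
    for cues in trials:
        prediction = sum(weights[c] for c in cues)
        error = lam - prediction
        for c in cues:
            weights[c] += rate * error
    return dict(weights)


# Hypothetical toy input: the noun form always accompanies its meaning,
# while the neighboring preposition either never varies or rotates.
prepositions = ["on", "in", "with", "under"]
n_trials = 800

single_context = [{"chicken", "on"} for _ in range(n_trials)]
varied_context = [{"chicken", prepositions[i % 4]} for i in range(n_trials)]

w_single = rescorla_wagner(single_context)
w_varied = rescorla_wagner(varied_context)
```

With a constant preposition, the noun and the preposition split the associative strength evenly, so the noun's weight settles near 0.5. With four rotating prepositions, the noun's weight settles near 0.8. The benefit depends entirely on the varying items being available as overt cues, which is the property that abstract syntactic templates lack.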
The above literature review suggests three general hypotheses about possible syntactic effects on lexical recognition. These three hypotheses relate to syntactic diversity, measured categorically and probabilistically, and prototypicality. They are outlined below:
― categorical hypothesis: words that are attested in more syntactic constructions are recognized faster.
― probabilistic hypothesis: nouns that are distributed more uniformly across the syntactic structures in which they occur will be recognized faster.
― prototypicality hypothesis: nouns with syntactic distributions that resemble that of the prototypical noun will be recognized faster.
There are two points to note about these hypotheses. First, the two diversity measures, categorical and probabilistic, are treated separately. This is because the number of available syntactic structures is logically independent of the frequencies with which a word occurs in those structures, and so may independently affect RTs. Second, prototypicality is treated as a hypothesis of its own alongside the diversity measures. This is because Linzen et al. (2013) found neurophysiological evidence that diversity and prototypicality tap into separate mental processes. Therefore, predictions about effects from these two sources are not logically attached to the same null hypothesis.
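The logical independence of the measures can be shown with a small sketch. The operationalizations here are assumptions: type count for categorical diversity, Shannon entropy for probabilistic diversity, and relative entropy (KL divergence) from a prototype distribution for prototypicality. The structure labels, the counts, and the prototype are hypothetical.

```python
import math


def type_count(counts):
    """Categorical diversity: number of structures attested at least once."""
    return sum(1 for c in counts.values() if c > 0)


def entropy(counts):
    """Probabilistic diversity: Shannon entropy (bits) of the distribution."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)


def relative_entropy(counts, prototype):
    """KL divergence (bits) from the prototype; lower means more prototypical.

    Assumes the prototype has a nonzero count for every attested structure.
    """
    total = sum(counts.values())
    proto_total = sum(prototype.values())
    divergence = 0.0
    for structure, c in counts.items():
        if c > 0:
            p = c / total
            q = prototype[structure] / proto_total
            divergence += p * math.log2(p / q)
    return divergence


# Hypothetical counts of two nouns across four dependency structures.
noun_a = {"subj": 25, "obj": 25, "pobj": 25, "mod": 25}
noun_b = {"subj": 85, "obj": 5, "pobj": 5, "mod": 5}
prototype = {"subj": 30, "obj": 30, "pobj": 30, "mod": 10}
```

Both hypothetical nouns are attested in four structures, so they are identical on the categorical measure, yet `noun_a` is distributed uniformly and `noun_b` is concentrated in one structure, so they differ on the probabilistic measure. Their distances from the prototype differ as well, and that distance depends on the shape of the prototype rather than on uniformity as such.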
Across the three hypotheses, there are eight possible outcomes. Of these, only four are seriously associated with (psycho)linguistic theory. These patterns along with the compatible theories are given in Table 1. Pluses indicate support for the hypothesis; minuses indicate no support.
Table 1: Possible outcomes across hypotheses and compatible theories.
Diversity:     Diversity:
Categorical    Probabilistic    Prototypicality    Supported theory
+              +                +                  Baayen et al., 2011; Diessel, 2015; Goldberg, 2006
-              +                +                  Baayen et al., 2011; Diessel, 2015; Goldberg, 2006
+              -                +                  not predicted
+              +                -                  not predicted
+              -                -                  Bresnan, 2001; Chomsky, 1995
-              +                -                  not predicted
-              -                +                  not predicted
-              -                -                  Linzen et al., 2013
The discriminative learning and usage-based models are compatible with positive results for probabilistic diversity and prototypicality. Two outcomes meet this requirement, shown in the first two rows of Table 1. The difference between these outcomes speaks to a secondary question regarding the independence of categorical and probabilistic diversity effects. Do they tap into distinct processes, or are the categorical measures merely worse approximations of the same phenomenon underlying both measures? If the latter is true, the effect of probabilistic measures should swallow that of the categorical measures. If not, we should see independent effects of each. Importantly, we should not see effects of categorical diversity and prototypicality without an effect of probabilistic diversity. This outcome would require that probabilistic information is represented and exploited by the prototypicality system but ignored by the diversity system. No theory reviewed here could explain this outcome in a principled way.
In the next section, I introduce categorical, probabilistic, and prototypical measures of the syntactic distributions of nouns. I adopt a lower-level approach than Linzen et al. (2013) based on Dependency Grammar (Hudson, 2007; Mel'čuk, 1988; Nivre, 2005; Tesnière, 1959) that accounts for both word order and headedness in a straightforward way. With the help of these measures, we can evaluate whether (truly) syntactic distributions affect isolated noun processing and, if so, how.