Conclusion - The Processing of Lexical Sequences

In the three experiments presented here I looked at the subjective frequency of words and n-grams and how it relates to their objective frequency. In Experiment 1, I found that, like subjective frequency ratings for words, subjective frequency ratings for n-grams were correlated with their corpus frequencies. In Experiment 2, I introduced a relative frequency judgement task and applied it to words. In Experiment 3, I extended this task to n-grams, and I saw that the ratio of the frequencies of two n-grams can predict the likelihood of correctly choosing the higher-frequency n-gram in a forced-choice task. My efforts to remove lexical frequency cues by matching stimuli on the geometric mean of their component word frequencies were successful, as I saw no predictive contribution from word frequencies or from the ratio of their word frequencies. Relative n-gram frequency was the key predictor of accuracy in Experiment 3. These results imply that people have implicit access to the relative frequency of n-grams, and that n-gram and word probabilities are involved in our processing of n-grams, just as letter and word probabilities are involved in our processing of words.
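The matching and ratio measures described above can be illustrated with a minimal sketch. The function names and the counts below are hypothetical and chosen only for illustration; they are not the stimuli or the exact procedure used in the experiments. The sketch assumes that two n-grams count as matched when the geometric means of their component word frequencies are equal, and that relative frequency is expressed as a log ratio.

```python
import math

def geometric_mean(freqs):
    """Geometric mean of positive frequencies, computed in log space."""
    return math.exp(sum(math.log(f) for f in freqs) / len(freqs))

def log_frequency_ratio(freq_a, freq_b):
    """Log ratio of two frequencies; positive when the first is larger."""
    return math.log(freq_a / freq_b)

# Hypothetical counts for one stimulus pair (illustration only).
ngram_a = {"ngram_freq": 1200, "word_freqs": [50000, 8000, 20000]}
ngram_b = {"ngram_freq": 150, "word_freqs": [40000, 10000, 20000]}

# The component word frequencies differ, but their geometric means match,
# so word frequency alone gives no cue about which n-gram is more frequent.
gm_a = geometric_mean(ngram_a["word_freqs"])
gm_b = geometric_mean(ngram_b["word_freqs"])

# The n-gram frequencies themselves differ by a factor of eight.
ratio = log_frequency_ratio(ngram_a["ngram_freq"], ngram_b["ngram_freq"])
```

In a pair like this one, any above-chance accuracy in choosing the more frequent n-gram would have to come from knowledge about the n-grams rather than about their component words.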

Subjective frequency knowledge is used implicitly in many tasks, linguistic and non-linguistic, such as word segmentation (Saffran, Aslin, & Newport, 1996), lexical recognition (M. S. Seidenberg & McClelland, 1989), visual object perception (Kirkham, Slemmer, & Johnson, 2002), social learning (Bandura, 1997) and many others. It is not surprising to see subjective frequency effects for groups of words, but the sheer number of n-grams that we are exposed to in our lives makes it difficult to see how we could keep track of our familiarity with each one. This concept of a mental lexicon, with separate entries for each word, compound word or n-gram, has recently been criticized by Elman (2009, 2011) and Dilkina, McClelland, and Plaut (2010b). Using Elman’s ideas, I see n-gram representations as dynamic, interactive relations between many types of non-symbolic knowledge. Memory systems could be interacting with comprehension systems and production systems when reading n-grams. In this kind of representation, recall of episodic memory traces, ease of articulatory simulation and ease of semantic accessibility all contribute to our ability to judge the absolute and relative frequency of n-grams. Frequency of exposure and the depth of entrenchment of n-grams can contribute to the strength of a representation in all of these mental systems, and this could explain why my data show such a consistent influence of n-gram frequency on performance in my tasks.

Another way that n-gram subjective frequency may emerge is from the sensation of fluency that arises from accessing the meaning of an n-gram. Much as lexical access takes longer for words whose meanings we do not know, n-gram access may take longer for n-grams whose meanings we do not know. If n-gram subjective frequency emerges from the same processes that produce lexical subjective frequency, and if subjective frequency is related to speed of recognition, then we can look at recent models of word recognition for ideas on how this may happen. Some recent models posit a process of accumulation of evidence when we read and recognize words (Norris & Kinoshita, 2008; Dilkina et al., 2010b; Baayen et al., 2011). One of these models, the Naive Discriminative Reader (NDR), has already been applied to modeling the reading of n-grams. It is important to note that the NDR does not assume separate representations for word forms or n-gram forms, but rather shows the emergence of morphological and lexical effects using nothing but sub-lexical probabilistic information. Baayen and Hendrix (2011) used the NDR model to predict reading times for the stimuli used by Arnon and Snider (2010). The NDR model predicted the reading times from its knowledge of the statistical properties of patterns of letters and letter bigrams in the input. This model is an example of the kinds of long-term memory traces that are being created from our experience with words and n-grams: distributed probabilistic traces. More work will need to be done to link subjective frequency models to models of reading, but I feel that this may be a promising direction to head in if we are to discover what creates the qualia of word or n-gram frequency.

In my results there were some effects that I could not predict when I designed my studies, but that I was able to detect due to the correlational design of my experiments. By choosing stimuli that simultaneously covered a broad span of frequencies and frequency ratios I was able to capture the influence of component n-gram frequencies. For example, in Experiment 1 I found that both the n-gram frequency and the split bigram frequency bf3 contributed to predicting participants’ subjective frequency ratings. This result suggests that the first and third words are salient for subjective frequency judgements in 3-grams, but not for other n-grams. These split-grams may be related to the discontiguous subtrees proposed by Bod (2009). They are used in Bod’s data-oriented parsing (DOP) model to help explain our ability to parse nonadjacent dependencies such as “BA carried more people than cargo in 2005” (Bod, 2009, p. 764). This discontiguous subtree, more XX than, bears a striking resemblance to the split 3-gram, and the influence of the split 3-gram’s frequency might provide some behavioural support for parsing models that allow these discontiguous constructions. In contrast, most of the other split-grams that I included in my analyses (see Section 2.6 for the full list) had no detectable influence on the outcomes. The only other time a split-gram entered one of my models was in the relative frequency accuracy model for 5-grams, when the split 3-gram consisting of the first, second and fifth words of the 5-gram rose in importance above the other variables. More evidence will need to be collected before any links can be made between probabilistic reading models and syntactic models that presume a representation for many different types of split-grams. My results also hint at differences in the amount of influence of the various grain-sizes. If there was a recurring size of n-gram that predicted performance in all of my tasks, it was the 3-gram. In Experiment 3, 3-gram frequency ratios were found to be the most salient predictors for judging the relative frequency of 3-, 4- and 5-grams. One possible explanation for this could be that the probability of seeing groupings of three words provides a particularly strong signal to the language system compared to n-grams of other sizes.

This result extends the work of Tremblay and Tucker (2011) by finding that 3-gram ratios were being used in my tasks, just as they found that probabilistic information in 3-grams was being used more than that of other n-gram sizes to recognize 4-grams in their experiment. Furthermore, in both absolute (Exp. 1) and relative (Exp. 3) frequency judgments for 5-grams, the frequencies of their component n-grams were more predictive of the outcome than the frequency of the 5-gram itself. These results suggest that when we read longer n-grams, the subjective frequency of the shorter internal n-grams is involved somehow in the process. This type of converging evidence strongly supports continued exploration of the contribution of internal n-gram probabilities in future analyses of n-gram processing. Such analyses could explore whether a specific subset of internal probabilities comes into play in the majority of psycholinguistic tasks.
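To make the notions of internal n-grams and split-grams concrete, here is a minimal sketch that enumerates them for a word sequence. The helper names are hypothetical, positions are 1-based, and the example words are taken from the Bod (2009) sentence quoted earlier rather than from my stimuli. It is only an illustration of the definitions, not the code used to build the predictors.

```python
def contiguous_ngrams(words, n):
    """All internal n-grams of length n, in left-to-right order."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def split_gram(words, positions):
    """Words at the given 1-based positions, skipping the words in between."""
    return tuple(words[p - 1] for p in positions)

five_gram = "carried more people than cargo".split()

# Internal 3-grams of the 5-gram.
internal_trigrams = contiguous_ngrams(five_gram, 3)

# Split bigram of a 3-gram: its first and third words.
split_bigram = split_gram(["more", "people", "than"], (1, 3))

# Split 3-gram of a 5-gram: its first, second and fifth words.
split_trigram = split_gram(five_gram, (1, 2, 5))
```

Here the split bigram of "more people than" is the pair "more ... than", and each internal or split unit would then be paired with its own corpus frequency when entered as a predictor.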

If 3-gram frequency is implicated in the processing of 4-grams and 5-grams, I speculate that 3-gram probability information is being used continuously and implicitly during the reading of longer streams of text. This simultaneous interaction between the probabilities of multiple n-gram components in my evidence bears a striking resemblance to recent results from research into the processing of polymorphemic and compound words, where lexeme and morpheme frequency and meaning are all simultaneously involved in processing, even when they are not required, or even helpful, for the task (Kuperman, Dambacher, Nuthmann, & Kliegl, 2010; Kuperman et al., 2008; Gagné & Spalding, 2009; Juhasz & Berkowitz, 2011). This probabilistic interaction within n-gram processing buttresses the argument that n-gram processing may be analogous to word processing, with the only difference being the length and probabilistic complexity of the input. The possibility that words and n-grams are somehow represented as entries in a lexicon, and that there is a search process across this lexicon as proposed by Forster and Hector (2002), looks increasingly untenable. The sheer number of representations that would be required in a localist model of language that included words and n-grams in a lexicon would be around 10^9, and even if this search could proceed faster than the fastest ...

My results are compatible with an emergent account of lexical processing that does not depend on unique representations for words or n-grams.

The work presented in this chapter supports the notion that n-gram probability is a new and important element in psycholinguistics, one that will allow us to explore language processing in new ways. The vast majority of models of word and sentence processing have thus far avoided dealing with the impact of arbitrary n-grams on language performance. I have presented experimental evidence that the granularity of language extends beyond words to n-grams, and that the probability of n-grams influences their subjective frequency. The evidence I have presented here, built upon the work of many others, suggests that subjective n-gram probability effects exist at many grain-sizes. Considering the accumulation of evidence presented here, the time has come to bring n-gram probability information into language processing models. New models of reading, such as the NDR model (Baayen et al., 2011), that can predict n-gram frequency effects and incorporate linguistic knowledge of patterns of varying sizes and levels of abstraction will give us the necessary context to better understand experimental results and to determine what cognitive limitations shape our ability to process n-grams. There may be fundamental upper bounds on the complexity of the probabilistic information that we can use when reading n-grams, and those constraints will require further exploration before they become clearly defined.
