3.5.5 Third Analysis: Number of Fixations
The number of fixations made during the reading of a trigram is another measurable variable of interest. Would my measures of probability and information help me predict how many fixations each participant would make on each trigram? The analysis of the fixation counts follows.
To make sure that my analysis was not biased by the number of regressive saccades in each trial, I subtracted the number of fixations after regressive saccades in each trial from the total number of fixations. The median number of fixations was 5 and the standard deviation was 1.57. The distribution of these fixation counts was still skewed, and so I applied a log transformation to the fixation counts. After transforming the fixation counts, the skewness, g1, of
Table 3.8: Model Comparisons for models predicting total fixations for a trigram. ∆AIC denotes the change in AIC between two models. All random slopes were for the random effect of subject.

Model                                                                                                   AIC    ∆AIC   Relative Model Likelihood
Model 1: Random intercepts for Participants and Items with random slopes for PrevTrialDur and sTrial   -214
Model 2: Model 1 + random slopes for sPMI and sPMI                                                      -248   -34    3e+07
Model 3: Model 2 + cc2                                                                                  -272   -24    1e+05
Model 4: Model 3 + n-gram frequency                                                                     -282   -10    1e+02
Model 5: Model 4 + sPMI×cc2                                                                             -292   -10    1e+02
[Figure 3.4 plot: x-axis sPMI; y-axis Number of Fixations; separate lines for "Word 2 is closed class": no, yes.]
Figure 3.4: Partial effect of the interaction between Pointwise Mutual Information and class of second word in predicting the number of fixations.
that I entered into all of the models.
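The preprocessing described at the start of this section (removing fixations that follow regressive saccades, log-transforming the counts, and checking the skewness g1) can be sketched as follows. This is a minimal illustration, not the analysis code used for the thesis: the per-trial counts are made up, and the natural logarithm is an assumption, since the base of the transformation is not stated. The choice of base does not affect g1, because changing the base only rescales the values.

```python
import math


def skewness_g1(values):
    """Sample skewness g1 = m3 / m2**1.5, using central moments m2 and m3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical per-trial counts (illustrative only, not data from the experiment).
total_fixations = [4, 5, 5, 6, 7, 5, 4, 12, 6, 5]
fixations_after_regressions = [0, 1, 0, 1, 2, 0, 0, 4, 1, 0]

# Remove the fixations that followed regressive saccades in each trial.
adjusted = [t - r for t, r in zip(total_fixations, fixations_after_regressions)]

# Log-transform the adjusted counts (natural log assumed here).
log_counts = [math.log(c) for c in adjusted]

print("g1 before log transform:", round(skewness_g1(adjusted), 3))
print("g1 after log transform: ", round(skewness_g1(log_counts), 3))
```

With right-skewed counts such as these, the log transformation pulls in the long upper tail, so g1 moves toward zero without necessarily reaching it.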
I added random slopes per subject for predictors along with the predictors themselves to find out if the effect was generalizable. In this process, I retained three new random slopes for each subject: Pointwise Mutual Information (sPMI), longitudinal effects (sTrial) and previous trial duration (PrevTrialDur). In Model 3, the closed class status of the second word was added, causing an increase in model likelihood. In Model 4, n-gram frequency was added, and in the final model, Model 5, an interaction between sPMI and cc2 was added, which was the best model of all.
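The relative model likelihood column in Table 3.8 appears to follow the usual relationship between an AIC difference and the evidence ratio, exp(-∆AIC/2). The sketch below applies that formula to the rounded AIC values from the table. The function name is hypothetical, and the results can differ somewhat from the tabulated values, presumably because the table was computed from unrounded AICs.

```python
import math


def relative_likelihood(delta_aic):
    """Evidence ratio exp(-dAIC / 2) in favour of the model with the lower AIC."""
    return math.exp(-delta_aic / 2.0)


# Rounded AIC values as reported in Table 3.8.
aic = [
    ("Model 1", -214),
    ("Model 2", -248),
    ("Model 3", -272),
    ("Model 4", -282),
    ("Model 5", -292),
]

# Change in AIC between each model and the one before it.
deltas = [aic[i][1] - aic[i - 1][1] for i in range(1, len(aic))]

for (name, _), delta in zip(aic[1:], deltas):
    print(f"{name}: dAIC = {delta}, relative likelihood = {relative_likelihood(delta):.0e}")
```

A negative ∆AIC means the more complex model is preferred, and each step of -10 corresponds to the larger model being roughly 150 times more likely than its predecessor.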
Table 3.9: MCMC-based estimates for the coefficients for the fixed effects in the linear mixed effects model fitted to the observed total fixations.
Predictor          Estimated β   MCMC mean β   HPD lower   HPD upper   pMCMC
Intercept           0.7299        0.7256        0.6296      0.8252     0.0002
n-gram frequency   -0.0072       -0.0071       -0.0117     -0.0027     0.0012
PC1                 0.0226        0.0227        0.0156      0.0290     0.0002
PC2                 0.0452        0.0454        0.0372      0.0529     0.0002
PC3                -0.0146       -0.0146       -0.0194     -0.0100     0.0002
PC4                -0.0225       -0.0225       -0.0288     -0.0151     0.0002
PC5                -0.0633       -0.0634       -0.0717     -0.0553     0.0002
sTrial             -0.0344       -0.0344       -0.0569     -0.0130     0.0040
PrevTrialDur        0.2084        0.2096        0.1843      0.2361     0.0002
sPMI                0.0202        0.0199        0.0040      0.0363     0.0176
cc2                 0.0459        0.0459        0.0284      0.0631     0.0002
sPMI×cc2           -0.0232       -0.0232       -0.0360     -0.0114     0.0002
The fixed effect coefficients for Model 5 are shown in Table 3.9. As with total duration, PC1-PC5, sTrial and PrevTrialDur all contributed to explaining variability. Above and beyond all of these predictors, the two predictors of greatest interest are cc2 and n-gram frequency. There was an increase in the number of fixations when the second word was a closed class word. Finally, there was a decrease in the number of fixations for n-grams of higher frequency. The partial effect of the interaction term in the model is shown in Figure 3.4. The estimated standard deviations for all of the random effects in my model are reported in Table 3.10 along with the 95% highest posterior density credible intervals from the MCMC simulations. All of the standard deviations are within the 95% HPD intervals.
Table 3.10: MCMC-based estimates for the random effects in the linear mixed effects model fitted to the observed number of fixations.
Random effect                             Standard Dev   HPD lower   HPD upper
Random Intercept: Item                    0.072          0.060       0.070
Random Intercept: Subject                 0.049          0.014       0.057
Random Slope: PrevTrialDur for Subjects   0.044          0.032       0.063
Random Slope: sTrial for Subjects         0.014          0.009       0.023
Random Slope: sPMI for Subjects           0.209          0.087       0.197
Residual                                  0.232          0.230       0.235
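For readers unfamiliar with highest posterior density (HPD) intervals such as those in Tables 3.9 and 3.10, the following sketch shows one common way to obtain them from MCMC draws: sort the samples and take the narrowest window that contains the requested share of them. This is a generic illustration and not necessarily the routine that produced the tables. The function name and the simulated draws are hypothetical.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the sorted samples."""
    xs = sorted(samples)
    n = len(xs)
    k = max(1, math.ceil(mass * n))  # number of samples the interval must cover
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Hypothetical posterior draws for a single coefficient (not thesis data).
random.seed(1)
draws = [random.gauss(0.02, 0.008) for _ in range(10_000)]

lower, upper = hpd_interval(draws)
print(f"95% HPD interval: [{lower:.4f}, {upper:.4f}]")
```

For a symmetric, unimodal posterior the HPD interval nearly coincides with the central quantile interval. The two diverge for skewed posteriors, which is common for standard deviations of random effects.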
3.5.6 Discussion
My best model predicting the number of fixations showed linear effects of frequency and PMI, but no interactions between frequency and PMI, and no involvement of any information content variable at all. Why did none of the information content predictors improve the quality of the model during the stepwise model selection? It seems logical that more predictable trigrams should have fewer fixations, but I did not find any such effects. This is quite different from the situation with both total duration and regressive saccades.
There was more efficient reading (fewer fixations) for high frequency n-grams, a replication of the effect found by Siyanova-Chanturia et al. (2011). The interaction of PMI and cc2 is of primary interest because when the second word was a closed class word, PMI had almost no effect on the number of fixations. When the second word was an open class word, higher PMI trigrams had more fixations. It is unclear why this should be, but I can speculate that the coherency of trigrams created the need for caution during the planning of the saccades. The care taken during the reading of the coherent trigrams may have influenced the number of fixations.
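The pattern described for the interaction can be read directly off the fixed effect estimates in Table 3.9 by computing the slope of sPMI separately for each class of second word. The sketch below assumes that cc2 is coded 0 for an open class second word and 1 for a closed class second word. That coding is an assumption and is not stated in the table. The slopes are on the scale of the log-transformed fixation counts.

```python
# Fixed effect estimates for Model 5, taken from Table 3.9.
coef = {"sPMI": 0.0202, "sPMI:cc2": -0.0232}


def spmi_slope(cc2):
    """Slope of sPMI on log fixation count; cc2 assumed coded 0 = open, 1 = closed."""
    return coef["sPMI"] + coef["sPMI:cc2"] * cc2


print(f"Open class second word:   {spmi_slope(0):+.4f}")
print(f"Closed class second word: {spmi_slope(1):+.4f}")
```

Under this coding the slope is clearly positive for open class second words and close to zero for closed class second words, which matches the description above.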
The final group of variables I analyzed was the sub-gazes. Considering the trigrams as unitary wholes, the full gaze time should include all of the fixations made during the reading of that trigram. I divided this gaze into sub-gazes in much the same way that other researchers have done when looking at the sub-gazes made within a compound word (Kuperman et al., 2008). I hypothesized that there would be an unfolding of information that could be detected by modeling the sub-gazes. I defined two sub-gazes, the first sub-gaze (SG1) and the second sub-gaze (SG2). SG1 is the sum of the fixations on the first word in the trigram before there are any intra-word regressive saccades. SG2 is the sum of the fixations on the first and second words in the trigram before there are any inter-word regressive saccades or intra-word regressive saccades.
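One possible reading of these two definitions is sketched below. It rests on several assumptions that are not stated in the text: fixations are listed in temporal order, each fixation is tagged with the position of the fixated word in the trigram, a regressive saccade is any leftward move in horizontal position, and "sum of the fixations" means the sum of fixation durations. The `Fixation` record, the `sub_gaze` function and the example trial are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Fixation:
    word: int        # position of the fixated word in the trigram: 1, 2 or 3
    x: float         # horizontal position, e.g. a character index
    duration: float  # fixation duration in milliseconds


def sub_gaze(fixations: List[Fixation], last_word: int) -> float:
    """Summed duration of the initial run of fixations on words 1..last_word.

    The run ends at the first fixation beyond `last_word` or at the first
    fixation reached by a regressive (leftward) saccade, whether that
    saccade stays within a word or crosses into an earlier word.
    """
    total = 0.0
    prev_x = None
    for fix in fixations:
        if fix.word > last_word:
            break  # the eyes have moved past the region of interest
        if prev_x is not None and fix.x < prev_x:
            break  # regressive saccade ends the sub-gaze
        total += fix.duration
        prev_x = fix.x
    return total


# Hypothetical trial: two fixations on word 1, one on word 2,
# a regression within word 2, then a fixation on word 3.
trial = [
    Fixation(word=1, x=2, duration=200),
    Fixation(word=1, x=5, duration=150),
    Fixation(word=2, x=9, duration=180),
    Fixation(word=2, x=7, duration=120),
    Fixation(word=3, x=15, duration=210),
]

sg1 = sub_gaze(trial, last_word=1)  # fixations on word 1 only
sg2 = sub_gaze(trial, last_word=2)  # fixations on words 1 and 2
print(sg1, sg2)
```

Under this reading SG1 is always contained in SG2, so the two measures are cumulative rather than independent.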