
4.5 Multilevel models

The dataset that we will use for multilevel modeling is loaded as follows:

lexdec = read.table("DATA/lexdec.txt", header=TRUE)

This data set is reported in Baayen and Hay [2004]. For 21 subjects, we have visual lexical decision latencies for 79 words, of which 44 refer to animals and 35 to plants (fruits and vegetables).

This kind of design is known as a repeated measures design: for each word, we have 21 repeated measures (one measure for each subject), or, alternatively, we have 79 repeated measures for each subject (one for each item). In this study, our interest is in how the words are processed, so we want to do several things simultaneously:

1. We want to know how the response latency depends on a range of variables capturing properties of the words.

2. We want to know whether these properties have the same effect for all subjects, or whether subjects differ in how sensitive they are to these properties.

3. We want to know whether properties of the subjects also affect the response latency.

lexdec$RT = log(lexdec$RT)

lexdec2 = lexdec[lexdec$RT > 5.9 & lexdec$RT < 7,]

> nrow(lexdec)-nrow(lexdec2)
[1] 45
> (nrow(lexdec)-nrow(lexdec2))/nrow(lexdec)
[1] 0.02712477
> mean(lexdec$RT)+3*sd(lexdec$RT)
[1] 7.109617
> mean(lexdec$RT)-3*sd(lexdec$RT)
[1] 5.660563

lexdec2$Native = lexdec2$Country=="Anglo"
lexdec3 = lexdec2[lexdec2$Correct == "correct",]

par(mfrow=c(2,2))
plot(density(lexdec$RT))
plot(density(log(lexdec$RT)))
plot(sort(log(lexdec$RT)))
abline(h=5.9)
abline(h=7)
qqnorm(lexdec2$RT)
qqline(lexdec2$RT)
par(mfrow=c(1,1))
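The trimming step above can be tried out without the original file. The following is a minimal sketch on simulated data: the vector `rt_ms` and its distribution parameters are assumptions chosen only to resemble the scale of the log RTs reported here, not the real lexdec data. It compares a fixed trimming window with the three-standard-deviation reference points.

```r
# Simulated stand-in for the lexical decision latencies (NOT the real lexdec file).
set.seed(42)
n_obs  <- 1659                                       # same number of rows as lexdec
rt_ms  <- exp(rnorm(n_obs, mean = 6.385, sd = 0.24)) # hypothetical raw RTs in ms
log_rt <- log(rt_ms)                                 # work on the log scale

# Fixed trimming window on the log scale, as in the text.
keep         <- log_rt > 5.9 & log_rt < 7
n_removed    <- sum(!keep)
prop_removed <- n_removed / n_obs

# Reference points: three standard deviations on either side of the mean.
three_sd_bounds <- mean(log_rt) + c(-3, 3) * sd(log_rt)

print(c(removed = n_removed, proportion = round(prop_removed, 4)))
print(round(three_sd_bounds, 3))
```

With these simulated values the window from 5.9 to 7 lies inside the three-standard-deviation bounds, so it trims somewhat more aggressively than a plain 3 SD rule would.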

[Plot panels: density(x = lexdec$RT), N = 1659, bandwidth = 0.04382; density(x = log(lexdec$RT)), N = 1659, bandwidth = 0.006893; sort(log(lexdec$RT)) against Index; Normal Q-Q Plot, Sample Quantiles against Theoretical Quantiles.]

Figure 4.10: Distribution of RTs.

We load the multilevel modeling library nlme, written by Pinheiro and Bates [2000],

library(nlme)

and we begin by inspecting the effect of Trial, the trial number in the experiment.

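library(lattice)  # provides xyplot() and the panel functions used below
xyplot(RT ~ Trial | Subject, data = lexdec3,
  xlab = "Trial", ylab = "log RT",
  prepanel = function(x, y) prepanel.loess(x, y, span = 1),
  panel = function(x, y) {
    panel.grid(h = -1, v = 2)      # reference grid
    panel.xyplot(x, y)             # raw data points
    panel.loess(x, y, span = 1)    # smoother through the points
  })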

We see effects for some subjects, not for others, and sometimes in opposite directions. Do we have a main effect in one direction?

lexdec3.lme = lme(RT ~ Trial, data=lexdec3, random = ~1|Subject)
anova(lexdec3.lme)

            numDF denDF  F-value p-value
(Intercept)     1  1534 47248.34  <.0001
Trial           1  1534     3.31  0.0693

The effect of Trial is not significant at the 0.05 level, so there is no main effect, i.e., no general trend towards learning or towards fatigue. Still, there might be practice effects for some subjects, and effects of fatigue for others. We can test this by adding Trial to the random effects part of the model.

lexdec3.lmeA = lme(RT ~ Trial, data=lexdec3, random = ~1+Trial|Subject)
anova(lexdec3.lme, lexdec3.lmeA)

             Model df ...   logLik   Test  L.Ratio p-value
lexdec3.lme      1  4 ... 555.1314
lexdec3.lmeA     2  6 ... 567.6075 1 vs 2 24.95213  <.0001

We made a new model, and compared it with the old model using anova(), which carried out a log-likelihood ratio test. This test confirms that it makes sense to assume that there are differences between subjects with respect to how they respond to Trial (i.e., how they go through the experiment). If we inspect the main effect of Trial, we see that it is no longer significant.
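The same kind of comparison can be reproduced on data that ships with nlme. The sketch below uses the Orthodont data set as a stand-in (an assumption made only so that the example runs without lexdec.txt); `distance` and `age` play the roles of RT and Trial. It also recomputes the likelihood ratio statistic by hand.

```r
library(nlme)  # provides lme() and the Orthodont example data

# Random intercepts only: subjects differ in their overall level.
fm_int <- lme(distance ~ age, data = Orthodont, random = ~ 1 | Subject)

# Random intercepts and slopes: subjects may also differ in the effect of age.
fm_slope <- lme(distance ~ age, data = Orthodont, random = ~ 1 + age | Subject)

# Likelihood ratio test of the two nested random-effects structures.
cmp <- anova(fm_int, fm_slope)
print(cmp)

# The same statistic by hand: twice the difference in (restricted) log-likelihood.
lr_stat <- 2 * (as.numeric(logLik(fm_slope)) - as.numeric(logLik(fm_int)))
df_diff <- cmp$df[2] - cmp$df[1]   # one extra standard deviation plus one correlation
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
print(c(L.Ratio = lr_stat, df = df_diff, p = p_value))
```

As in the table above, the simpler model has 4 parameters and the model with by-subject slopes has 6.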

anova(lexdec3.lmeA)

            numDF denDF  F-value p-value
(Intercept)     1  1534 49280.99  <.0001
Trial           1  1534     1.21  0.2722

[Trellis plot with one panel per subject; x-axis: Trial.]

Figure 4.11: RT as a function of trial: trellis plot for subjects.

Once you take the differences between the subjects into account, you can see that there is no overall trend. You can describe this as an interaction between Trial and the main grouping factor, Subject. To see the differences between the two models, let’s consider the random effects in the model:

ranef(lexdec3.lme)

          (Intercept)
Abby     -0.095316943
Andrew   -0.139028488
Anita     0.008304117
Christha -0.047403501
Daniel    0.036900454
...

ranef(lexdec3.lmeA)

          (Intercept)        Trial
Abby     -0.117064890 2.017390e-04
Andrew   -0.180280990 3.907184e-04
Anita    -0.030725951 3.716809e-04
Christha -0.068212268 1.971881e-04
Daniel   -0.116685631 1.491633e-03
...

In the first model, the only difference that we allow into the model for the subjects is that some subjects are faster and other subjects are slower. The numbers you see listed under (Intercept) are the adjustments that we have to make to the overall intercept to make the model precise for the separate subjects. The intercept of the model gives a kind of average that is not precise for the actual subjects. There is no such thing as a 'mean subject'; nevertheless, we want to have a good model for the individual subjects. Therefore, we adjust the model for each subject by adding the estimated by-subject random effect (which may be negative) to the intercept. Note that in this model, this is the only difference between the subjects. In other words, it assumes that the effect of Trial is identical for all subjects. In the more complex model, we allow the effect of Trial to vary from subject to subject. That is, we now have to adjust the slope of Trial such that it becomes adequate for each individual subject. This is done by the small adjustments that are listed for the random effects for Trial in the second column above. This is very clear from the next two plots:

plot(lexdec3.lme, predict(.) ~ Trial|Subject, type="l")
plot(lexdec3.lmeA, predict(.) ~ Trial|Subject, type="l")
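The way the by-subject adjustments combine with the fixed effects can be checked numerically. This is a minimal sketch on nlme's Orthodont data, used here only as a runnable stand-in for lexdec3: adding each subject's adjustments to the fixed effects should give the subject-specific coefficients that coef() reports.

```r
library(nlme)

# Random intercepts and random slopes for age, by subject.
fm <- lme(distance ~ age, data = Orthodont, random = ~ 1 + age | Subject)

fixed  <- fixef(fm)              # population-level intercept and slope
adjust <- as.matrix(ranef(fm))   # by-subject adjustments, one row per subject

# Add the fixed effects to every row of adjustments (column by column).
by_subject <- sweep(adjust, 2, fixed, "+")

# coef() returns the subject-specific intercepts and slopes directly.
print(head(by_subject, 3))
print(head(coef(fm), 3))
```

Each row gives the intercept and slope of one subject's own regression line, which is what the by-subject panels of fitted values display.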

Since Trial is no longer significant, we remove it from the fixed effects of the model (it stays in the random effects), and consider another variable: whether the subject is a native speaker of English.

bwplot(Native ~ RT, data=lexdec3)

[Trellis plot with one panel per subject; x-axis: Trial, y-axis: Fitted values.]

Figure 4.12: Effect of Trial in a model with a random effect only for the intercept.

[Trellis plot with one panel per subject (Abby, Andrew, Anita, Christha, Daniel, Ingrid, Junghwa, Kate, Marissa, Mira, Phuing, Rebecca, Rick, Robert, Sarah, Tehari, Tomiko, Victoria, Waham, Winston, Zinook); x-axis: Trial, y-axis: Fitted values.]

Figure 4.13: Effect of Trial in a model with a random effect not only for the intercept but also for Trial.

[Box-and-whisker plot; x-axis: RT, y-axis: Native (FALSE, TRUE).]

Figure 4.14: Distribution of reaction times for native (TRUE) and nonnative (FALSE) speakers of English.

This suggests that nonnative speakers took longer for their lexical decisions. Let’s add this factor to the model:

lexdec3.lmeA = lme(RT ~ Native, data=lexdec3, random = ~1+Trial|Subject)
anova(lexdec3.lmeA)

            numDF denDF  F-value p-value
(Intercept)     1  1535 64591.12  <.0001
Native          1    19     7.34  0.0139

A very important predictor in visual lexical decision is frequency, and when we add this to the model, we consider the possibility that it interacts with whether you are a native speaker:

lexdec3.lmeA = lme(RT ~ Native*Frequency, data=lexdec3, random = ~1+Trial|Subject)
anova(lexdec3.lmeA)

                 numDF denDF  F-value p-value
(Intercept)          1  1533 64658.94  <.0001
Native               1    19     7.44  0.0134
Frequency            1  1533   137.05  <.0001
Native:Frequency     1  1533     8.63  0.0034

Apparently, it does interact, and we look at the coefficients to see how:

round(summary(lexdec3.lmeA)$tTable, 4)

                       Value Std.Error   DF  t-value p-value
(Intercept)           6.6770    0.0453 1533 147.5322  0.0000
NativeTRUE           -0.2334    0.0595   19  -3.9210  0.0009
Frequency            -0.0591    0.0060 1533  -9.7999  0.0000
NativeTRUE:Frequency  0.0232    0.0079 1533   2.9373  0.0034

Note that Frequency has a negative coefficient (you get faster if the frequency is higher), but that if you are a native speaker, you have to add 0.0232 to the coefficient of Frequency, making it less negative. In other words, the frequency effect is stronger for non-native speakers.
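The arithmetic behind this interpretation can be spelled out with the estimates from the coefficient table. In the sketch below the coefficients are copied from that table; the three `frequency` values are an assumption chosen purely for illustration and are not taken from the data.

```r
# Fixed-effect estimates copied from the coefficient table above.
b_intercept   <-  6.6770
b_native      <- -0.2334
b_freq        <- -0.0591
b_native_freq <-  0.0232

# Slope of Frequency in each group.
slope_nonnative <- b_freq                   # Native == FALSE
slope_native    <- b_freq + b_native_freq   # Native == TRUE: -0.0591 + 0.0232 = -0.0359

# Predicted log RT at a few illustrative Frequency values (hypothetical values).
frequency      <- c(2, 4, 6)
pred_nonnative <- b_intercept + slope_nonnative * frequency
pred_native    <- b_intercept + b_native + slope_native * frequency

print(c(nonnative = slope_nonnative, native = slope_native))
print(data.frame(frequency, pred_nonnative, pred_native))
```

The native speakers start out faster (lower intercept), but their latencies drop less steeply with increasing frequency.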

Another variable that we can look at is the ontological class of the noun’s referent: animal versus plant (fruit, vegetable, nut).

lexdec3.lmeA = lme(RT ~ Native*Frequency+Class, data=lexdec3, random = ~1+Trial|Subject)
anova(lexdec3.lmeA)

                 numDF denDF  F-value p-value
(Intercept)          1  1532 64607.55  <.0001
Native               1    19     7.43  0.0134
Frequency            1  1532   139.01  <.0001
Class                1  1532    22.60  <.0001
Native:Frequency     1  1532     8.65  0.0033

Finally, we add Length (LenEnglish in the data frame) and the estimated Size (predSize), averaged over subjects in the size rating experiment.

lexdec3.lmeA = lme(RT ~ Native*Frequency+Class+LenEnglish+predSize, data=lexdec3, random = ~1+Trial|Subject)
anova(lexdec3.lmeA)

                 numDF denDF  F-value p-value
(Intercept)          1  1530 63990.31  <.0001
Native               1    19     7.37  0.0138
Frequency            1  1530   143.39  <.0001
Class                1  1530    23.31  <.0001
LenEnglish           1  1530    44.89  <.0001
predSize             1  1530     4.04  0.0446
Native:Frequency     1  1530     9.91  0.0017

A summary of the kinds of effects is again the table of coefficients:

round(summary(lexdec3.lmeA)$tTable, 4)

                       Value Std.Error   DF  t-value p-value
(Intercept)           6.5267    0.0559 1530 116.8043  0.0000
NativeTRUE           -0.2375    0.0594   19  -3.9999  0.0008
Frequency            -0.0659    0.0067 1530  -9.8684  0.0000
Classplant            0.0622    0.0249 1530   2.5013  0.0125
LenEnglish            0.0139    0.0024 1530   5.8622  0.0000
predSize              0.0231    0.0113 1530   2.0386  0.0417
NativeTRUE:Frequency  0.0243    0.0077 1530   3.1480  0.0017

Note that longer words elicited longer latencies, as expected. Furthermore, and surprisingly, words that were judged to be larger in size also elicited longer response latencies. Upon closer inspection, however, it turns out that the effect of word length is not the same for all subjects. If we add length to the random effects structure, thereby allowing the slope for length to be different from subject to subject, we get an improvement according to the log-likelihood ratio test.

lexdec3.lmeB = lme(RT ~ Native*Frequency+Class+LenEnglish+predSize, data=lexdec3, random = ~1+Trial+LenEnglish|Subject)
anova(lexdec3.lmeA, lexdec3.lmeB)

             Model df       AIC       BIC   logLik   Test  L.Ratio p-value
lexdec3.lmeA     1 11 -1298.421 -1239.622 660.2106
lexdec3.lmeB     2 14 -1312.917 -1238.082 670.4587 1 vs 2 20.49608   1e-04
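The numbers in this comparison table can be recomputed by hand, which makes clear what the test does. The sketch below uses only the log-likelihoods and parameter counts printed above; small discrepancies with the printed L.Ratio are due to rounding of the log-likelihoods.

```r
# Values copied from the anova() table above.
loglik_A <- 660.2106; df_A <- 11   # random intercept and Trial slope
loglik_B <- 670.4587; df_B <- 14   # adds a by-subject slope for LenEnglish

lr_stat <- 2 * (loglik_B - loglik_A)   # likelihood ratio statistic, about 20.496
df_diff <- df_B - df_A                 # 1 extra standard deviation + 2 extra correlations
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(c(L.Ratio = lr_stat, df = df_diff, p = p_value))
```

Adding one random slope to a model that already has two correlated random effects costs three parameters, not one, because the new slope is allowed to correlate with both existing random effects.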

If we then inspect the anova table, we see that the main effect of Native is no longer significant. This indicates that there are differences in how native and non-native speakers deal with word length.

anova(lexdec3.lmeB)

                 numDF denDF  F-value p-value
(Intercept)          1  1530 77140.73  <.0001
Native               1    19     1.48  0.2381
Frequency            1  1530   139.93  <.0001
Class                1  1530     8.73  0.0032
LenEnglish           1  1530    17.90  <.0001
predSize             1  1530     4.23  0.0399
Native:Frequency     1  1530     9.87  0.0017

Let’s look more closely at the random effects in this new model:

ranef(lexdec3.lmeB)

          (Intercept)         Trial   LenEnglish
Abby     -0.016931244  1.168329e-04 -0.010514814
Andrew   -0.063755535  3.722471e-04 -0.014374656
Anita    -0.027408542  4.369463e-04 -0.006105166
Christha  0.002637974 -3.770859e-05 -0.002645735
Daniel   -0.108946569  1.312312e-03 -0.002183162

We see that we have three adjustments for each subject. As we are dealing with three properties for each subject, it could be that the adjustments are correlated. This possibility was explicitly left open by the way we modelled the random effects. You can see that these three vectors of adjustments are indeed correlated:

cor(ranef(lexdec3.lmeB))

            (Intercept)      Trial LenEnglish
(Intercept)   1.0000000 -0.6843181  0.7271104
Trial        -0.6843181  1.0000000 -0.6024104
LenEnglish    0.7271104 -0.6024104  1.0000000

In fact, it is not necessary to calculate these correlations by hand, because the lme() function provides much better estimates of them, which you can see when you run summary(). These correlations are among the parameters that the model estimates.

summary(lexdec3.lmeB)
...
Random effects:
 Formula: ~1 + Trial + LenEnglish | Subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev       Corr
(Intercept) 0.1171583426 (Intr) Trial
Trial       0.0006486677 -0.658
LenEnglish  0.0127279644  0.473 -0.504
Residual    0.1489753204
...

The first column lists four standard deviations. The first concerns the intercept: it is the standard deviation estimated for the population of by-subject adjustments to the intercept, so it describes the variation in the first column of ranef(lexdec3.lmeB) shown above. The second standard deviation describes the variation in the adjustments for Trial, and the third that in the adjustments for word length. The final standard deviation is the residual standard deviation, the noise that remains unaccounted for. The remaining three numbers are the three pairwise correlations that the model estimated.
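These quantities can also be pulled out of a fitted model as numbers rather than read off the printed summary. The following sketch again uses nlme's Orthodont data as a runnable stand-in (an assumption; with lexdec3.lmeB the same calls would return a 3 by 3 matrix). It relies on VarCorr() and getVarCov() from nlme.

```r
library(nlme)

fm <- lme(distance ~ age, data = Orthodont, random = ~ 1 + age | Subject)

# Formatted table of variances, standard deviations and correlations,
# i.e. the "Random effects" part of summary().
print(VarCorr(fm))

# The same information as a numeric covariance matrix of the random effects.
psi      <- unclass(getVarCov(fm))
re_sd    <- sqrt(diag(psi))   # standard deviations of the by-subject adjustments
re_cor   <- cov2cor(psi)      # correlation between intercept and slope adjustments
resid_sd <- fm$sigma          # residual standard deviation

print(re_sd)
print(round(re_cor, 3))
print(resid_sd)
```

The model-based correlation in `re_cor` is the counterpart of the Corr column in the summary, and will generally differ from the correlation computed directly on the output of ranef().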

What you should realize at this point is that we invested three parameters in the correlations between our three random effects. The question arises whether this is really necessary. To answer this question, we can run an lme() model that explicitly assumes that there are no such correlations. In order to run this model, we first need to create a new kind of data frame, namely, a grouped data frame that records the dependent variable and the main grouping factor (Subject in this example). If you look at the first few lines of the grouped data frame, you will see that it now has a formula attached to it.

lexdec3.g = groupedData(RT ~ 1|Subject, data=lexdec3)

> lexdec3.g[1:4,]
Grouped Data: RT ~ 1 | Subject
  Subject  Hand Sex Country Trial       RT ...
1    Abby right   F   Anglo    23 6.340359 ...
2    Abby right   F   Anglo    27 6.308098 ...
3    Abby right   F   Anglo    29 6.349139 ...
4    Abby right   F   Anglo    30 6.186209 ...
...

We now run lme() with the same fixed effects (the variables listed in the main formula), and we now use a rather complex formula for specifying the random effects:

lexdec3.lmeC = lme(RT ~ Native*Frequency+Class+LenEnglish+predSize,
  data=lexdec3.g,
  random=pdBlocked(list(pdIdent(~1), pdIdent(~LenEnglish-1), pdIdent(~Trial-1))))

What this random effects specification says is that we have adjustments for each subject for the intercept, and then for word length, and then for trial. This model has fewer parameters (listed as df in the table below), and when we run the log-likelihood ratio test, we can see whether investing extra parameters (those for the correlations) is worth the trouble.

anova(lexdec3.lmeC, lexdec3.lmeB)

             Model df       AIC       BIC   logLik   Test  L.Ratio p-value
lexdec3.lmeC     1 11 -1308.250 -1249.451 665.1251
lexdec3.lmeB     2 14 -1312.917 -1238.082 670.4587 1 vs 2 10.66711  0.0137

As you can see, the p-value is significant, so it indeed pays to have the additional correlations in the model. In order to highlight the differences between the two models, consider the random effects of the (incorrect) model with uncorrelated random effects:

ranef(lexdec3.lmeC)

          (Intercept)    LenEnglish         Trial
Marissa  -0.139943508 -0.0059269772 -1.762411e-04
Waham    -0.027822362 -0.0069026178 -3.073388e-04
Christha -0.007170513 -0.0046285662 -6.030426e-05
Kate     -0.160249930 -0.0069378666  3.609386e-04
Tehari    0.065270554 -0.0057416587 -4.119152e-04
...

summary(lexdec3.lmeC)

Random effects:
 Composite Structure: Blocked

 Block 1: (Intercept)
 Formula: ~1 | Subject
        (Intercept)
StdDev:   0.1117201

 Block 2: LenEnglish
 Formula: ~LenEnglish - 1 | Subject
        LenEnglish
StdDev: 0.01391892

 Block 3: Trial
 Formula: ~Trial - 1 | Subject
               Trial  Residual
StdDev: 0.0005686653 0.1490820

As you can see in the summary, this model has only 3+1 standard deviations as parameters, and no correlations between the three pairs of random effects.
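A model without correlations between the random effects can also be requested with nlme's pdDiag class. This is offered as an alternative sketch, not as the specification used in the text: it is fitted to the Orthodont data (a stand-in so that the example runs), and the centering of age at 11 is an assumption made only to keep the illustration numerically well behaved.

```r
library(nlme)

# Correlated random intercepts and slopes (general positive-definite matrix).
fm_full <- lme(distance ~ I(age - 11), data = Orthodont,
               random = ~ 1 + I(age - 11) | Subject)

# Same random effects, but with their correlation fixed at zero.
fm_diag <- lme(distance ~ I(age - 11), data = Orthodont,
               random = list(Subject = pdDiag(~ 1 + I(age - 11))))

# Does the extra correlation parameter pay off?
cmp <- anova(fm_diag, fm_full)
print(cmp)
```

With two random effects the two models differ by a single parameter (one correlation); with three random effects, as for lexdec3.lmeB, the difference is three parameters.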

As with any regression model, we have to check whether the model is doing a reasonable job. There are some very nice diagnostic plot functions that you can use. First a plot of residuals against fitted values:

plot(lexdec3.lmeB)

[Scatterplot; x-axis: Fitted values, y-axis: Standardized residuals.]

Figure 4.15: Standardized residuals versus fitted values for model lexdec3.lmeB. Note that the scatter is greater for small fitted values than for large fitted values.

This might signal trouble. It may indicate that we need to relax the assumption of homoskedasticity. There are various variance functions that you could use to do this; we choose one here, and test whether the extra parameter that we need to build this function into our model is justified:

lexdec3.lmeD = lme(RT ~ Native*Frequency+Class+LenEnglish+predSize,
  data=lexdec3, random = ~1+Trial+LenEnglish|Subject,
  weights = varExp(form = ~ fitted(.)))
anova(lexdec3.lmeD, lexdec3.lmeB)

             Model df       AIC       BIC   logLik   Test  L.Ratio p-value
lexdec3.lmeD     1 15 -1312.092 -1231.911 671.0458
lexdec3.lmeB     2 14 -1312.917 -1238.082 670.4587 1 vs 2 1.174204  0.2785

The p-value suggests that it is not worth the trouble to add this extra parameter; the heteroskedasticity in the plot is not a source of worry.
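For reference, the variance function requested with varExp() can be written out as follows. The formula follows the definition given in the nlme documentation; the symbols (the indices i for subject and j for observation, and the name of the parameter) are notational assumptions introduced here.

```latex
\operatorname{Var}(\varepsilon_{ij}) \;=\; \sigma^{2}\,\exp\!\left(2\,\delta\,v_{ij}\right),
\qquad v_{ij} = \hat{y}_{ij} \quad \text{(the fitted value, as requested by \texttt{form = \textasciitilde fitted(.)})}
```

For \delta = 0 this reduces to the constant residual variance \sigma^{2} of lexdec3.lmeB, so the comparison above is a test of the single extra parameter \delta, which matches the difference of one in the df column (15 versus 14).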

A qqnorm plot also warns us that there is something wrong with our model, which is not surprising as there is something wrong with the distribution of our RTs as well.

Figure 4.16 shows the qqnorm plots for the individual subjects in the experiment.

qqnorm(lexdec3.lme)   # not shown
qqnorm(lexdec3.lmeB, ~resid(.)|Subject)

One of the steps to take at this point is to remove outliers, and check whether the effects remain robust.

There are many outliers in this model, as you can see in Figure 4.15. Let’s see whether anything changes when we take them out:

lexdec3$resid = resid(lexdec3.lmeB)
lexdec3$rstandard = scale(lexdec3$resid, scale=T)
nrow(lexdec3[lexdec3$rstandard > 2,])
[1] 60

lexdec3.lmeB1 = lme(RT ~ Native*Frequency+Class+LenEnglish+predSize,
  data=lexdec3[lexdec3$rstandard < 2,],
  random = ~1+Trial+LenEnglish|Subject)
anova(lexdec3.lmeB1)

                 numDF denDF  F-value p-value
(Intercept)          1  1470 76805.18  <.0001
Native               1    19     1.16  0.2946
Frequency            1  1470   169.73  <.0001
Class                1  1470    10.94  0.0010
LenEnglish           1  1470    21.00  <.0001
predSize             1  1470     0.58  0.4472
Native:Frequency     1  1470    12.87  0.0003

A qqnorm plot of the residuals is now much better. Note that predSize is no longer significant. This indicates that its effect is present primarily among the outliers.
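The filter used above removes only large positive standardized residuals (unusually slow responses). If unusually fast responses should be trimmed as well, a two-sided criterion is one option. The sketch below illustrates the difference on simulated residuals; the vector `res` is hypothetical, and this is an alternative to, not a reproduction of, the analysis in the text.

```r
# Hypothetical heavy-tailed residuals, simulated for illustration only.
set.seed(1)
res <- rt(1500, df = 5)

# scale() returns a one-column matrix; as.vector() flattens it to a plain vector.
z <- as.vector(scale(res))

n_upper <- sum(z > 2)        # one-sided count, the criterion used in the text
n_both  <- sum(abs(z) > 2)   # two-sided count, also catching large negative residuals

keep_two_sided <- abs(z) < 2 # logical index for refitting on the trimmed data
print(c(upper_only = n_upper, both_tails = n_both, kept = sum(keep_two_sided)))
```

Whichever criterion is chosen, the point of the exercise stays the same: refit the model on the trimmed data and check which effects remain robust.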

It is also possible to run a multilevel logistic regression model. To do so, you will need the library MASS, and then use the glmmPQL() function available in this library. This function is a wrapper around the lme() function from the nlme library. It is called as follows:

data.glmm = glmmPQL(ZeroOrOne ~ predictor1 + ... + predictorN,
  data=myDataFrame, family = "binomial",
  random = ~1|MyMainGroupingFactor)

Note that you have to specify here that you are using a "binomial" model (this tells glmmPQL() to model the log odds). Also note that this function requires a vector of zeros (failures) and ones (successes), and does not work with a binary vector with strings, as does lrm() from the Design library.
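A complete call can be tried out on the bacteria data set that ships with MASS. This is a minimal sketch under the assumption that this data set is an acceptable stand-in; the column `y01` is a helper introduced here to obtain the zero/one coding described above.

```r
library(MASS)   # glmmPQL() and the bacteria example data
library(nlme)   # fixef() and other extractors for the fitted object

# Recode the response as 1 (bacteria present) and 0 (absent).
bact <- bacteria
bact$y01 <- as.numeric(bact$y == "y")

# Logistic model with a random intercept for each patient (ID).
fit <- glmmPQL(y01 ~ trt + I(week > 2),
               random = ~ 1 | ID,
               family = binomial,
               data = bact,
               verbose = FALSE)   # suppress the iteration messages

# Coefficients on the log odds scale, as for the lme() models above.
print(round(summary(fit)$tTable, 4))
```

The fitted object inherits from the lme class, so extractor functions such as fixef() and ranef() can be applied to it as well.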

An example of a data set where this function is useful is the NP versus PP choice for verbs with dative alternation.

cueni = read.table("DATA/cueni.txt",T)
cueni$Resp = as.numeric(cueni$realization_of_recipient)-1
table(cueni$realiz)

n p

[Normal quantile plot; x-axis: Residuals, y-axis: Quantiles of standard normal.]