4.4 Linear Mixed Modelling
4.4.2 General Procedure and Model Assumptions
In the linear mixed modelling analyses that follow, UGMs consisted solely of the grand mean of the continuous outcome variable plus random intercepts and/or slopes for subject. FGMs additionally included the fixed effects of time and group, together with a time × group interaction where the latter was found to add meaningfully to the model4. With only three repeated measurements, models were restricted to linear trajectories, as additional measurement occasions would be required to fit non-linear (e.g. quadratic) growth terms (Singer & Willett, 2003; Law et al., 2008). All linear mixed models were fitted using the lme4 package (Bates, Maechler, Bolker & Walker, 2015) in R (R Core Development Team, 2017).
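As a rough guide to the two model types described above, they can be written as follows. This is a sketch under assumptions not stated in the text: subjects are indexed i and measurement occasions j, t_ij is the coded measurement occasion, g_i is a 0/1 group indicator (an illustrative coding), ε_ij is the residual (labelled µ in the assumptions paragraph), and the UGM is shown with both a random intercept and a random slope (the b_1i term is dropped where only random intercepts were fitted). In lme4 syntax these would correspond roughly to `y ~ 1 + (1 + time | subject)` and `y ~ time * group + (1 + time | subject)`, where `y`, `time`, `group` and `subject` are hypothetical variable names.

```latex
\begin{aligned}
\text{UGM:}\quad y_{ij} &= \beta_0 + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij} \\
\text{FGM:}\quad y_{ij} &= \beta_0 + \beta_1 t_{ij} + \beta_2 g_i + \beta_3 (t_{ij} \times g_i) + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij} \\
(b_{0i},\, b_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}), \qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2) \\
g_i = 0:&\quad \text{intercept } \beta_0,\ \text{slope } \beta_1 \\
g_i = 1:&\quad \text{intercept } \beta_0 + \beta_2,\ \text{slope } \beta_1 + \beta_3
\end{aligned}
```

The last two lines show why the interaction term matters for trajectories: β_2 shifts the starting level of the group coded 1, and β_3 is the difference between the two groups' slopes, which is the quantity that the group-specific intercept and slope estimates make explicit.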
Reported statistics from LMMs include fixed-effect coefficients (β) and their standard errors (SE), t-statistics, variance estimates for the random intercept and slope, the residual variance, and AIC. Statistical significance of fixed effects is assessed using the Kenward and Roger (1997) scaled Wald F-statistic, with denominator degrees of freedom estimated by a Satterthwaite-type approximation; this method accounts for uncertainty in the estimated variance-covariance matrix and has been found to give more accurate inference in small-sample studies (Verbeke & Molenberghs, 2009; available in the lmerTest package of Kuznetsova, Brockhoff & Christensen, 2016). Model fit is reported with AIC statistics for UGMs and FGMs, where a reduction in AIC is interpreted as an improvement in model fit. Raw AIC values are reported in tables, whereas the change in AIC at each step of the model-building process is reported in-text5. The intraclass correlation coefficient (ICC) is reported for the FGM as a measure of within-subject consistency over time: a high ICC indicates that a large share of outcome variance lies between rather than within subjects, and is taken as justification for the inclusion of random effects in the model (Bliese & Ployhart, 2002; Burton et al., 1998). Marginal and conditional pseudo-R² statistics are also reported, using the MuMIn package in R (Bartoń, 2015). The lme4 package also allows intercepts and slopes to be estimated separately for each group; these analyses were conducted for each model in order to answer questions regarding group differences in developmental trajectories (reported in-text).
4
Note that this occurred in only five instances (Models 15 to 17 for YARC passage reading rate, accuracy and comprehension [all using scaled scores], and Models 20.1 and 20.2 for morphosyntactic and semantic error rates in writing, respectively).
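The variance-based summaries mentioned above (ICC, marginal and conditional pseudo-R², ΔAIC) reduce to simple arithmetic once a model's variance components are known. The sketch below is illustrative only: the input numbers are invented, and it uses the random-intercept-only formulas (in the style of Nakagawa and Schielzeth's R² for mixed models). With random slopes the random-effect variance also depends on the time values; MuMIn (e.g. its `r.squaredGLMM` function) handles that case, and this sketch does not.

```python
"""Illustrative variance-component summaries for a fitted LMM.

Hypothetical sketch: the variance components and AIC values are invented
inputs, not results from this study. Random-intercept-only formulas are used.
"""


def icc(var_intercept: float, var_residual: float) -> float:
    """Share of total variance lying between subjects (random-intercept model)."""
    return var_intercept / (var_intercept + var_residual)


def pseudo_r2(var_fixed: float, var_intercept: float, var_residual: float) -> tuple[float, float]:
    """Marginal and conditional R^2 from variance components.

    var_fixed is the variance of the fixed-effect predictions (X times beta).
    Marginal R^2: variance explained by fixed effects alone.
    Conditional R^2: variance explained by fixed plus random effects.
    """
    total = var_fixed + var_intercept + var_residual
    marginal = var_fixed / total
    conditional = (var_fixed + var_intercept) / total
    return marginal, conditional


def delta_aic(aic_new: float, aic_old: float) -> float:
    """Change in AIC; a negative value means the newer model fits better."""
    return aic_new - aic_old


if __name__ == "__main__":
    # Hypothetical variance components, chosen only to make the arithmetic easy.
    print(round(icc(var_intercept=6.0, var_residual=4.0), 2))  # 0.6
    r2m, r2c = pseudo_r2(var_fixed=2.0, var_intercept=6.0, var_residual=4.0)
    print(round(r2m, 3), round(r2c, 3))  # 0.167 0.667
    print(round(delta_aic(aic_new=950.2, aic_old=962.7), 1))  # -12.5
```

Here an ICC of 0.6 would mean that 60% of the outcome variance lies between subjects, and the gap between conditional and marginal R² (0.667 versus 0.167) is the share attributable to the random intercepts.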
Basic checks for univariate normality included inspection of Q-Q plots, boxplots and histograms, and calculation of the proportion of data points with absolute z-scores of ≥ 1.96, 2.58 and 3.29 (given the assumption of a normal distribution, no more than 5% of absolute z-scores should exceed 1.96, no more than 1% should exceed 2.58, and very few, if any, should exceed 3.29; Field, 2012). Justification for the removal of outliers is provided where applicable. As a regression framework, linear mixed modelling is subject to certain distributional assumptions; specifically, the residuals (µ) and the estimated random effects are assumed to be normally distributed with a mean of zero, and to have constant variance across covariates (Pinheiro & Bates, 2000). The lme4 package provides a number of exploratory and graphical tools for investigating LMM assumptions. For each fitted model, the following plots were generated and interpreted: histograms of all residuals; boxplots of residuals by subject; boxplots of residuals disaggregated by group and time; scatterplots of fitted versus observed values, as an indication of how well each model accounts for the data; and normal Q-Q plots of the estimated random effects (best linear unbiased predictors, BLUPs). Pearson standardised residuals were used so that plots could be compared across models.
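The z-score screen described above can be sketched with the standard library alone. This is a minimal illustration, not the procedure actually run in the study: the data are invented, z-scores use the sample mean and sample standard deviation, and the guideline limits (5% beyond 1.96, 1% beyond 2.58, none beyond 3.29) are assumed here following commonly cited guidance.

```python
"""Screen a variable for univariate outliers using absolute z-scores.

Minimal sketch using only the standard library; data values are invented.
"""
from statistics import mean, stdev

# Assumed guideline limits: maximum acceptable proportion beyond each |z| threshold.
GUIDELINE = {1.96: 0.05, 2.58: 0.01, 3.29: 0.0}


def z_scores(values):
    """Standardise values using the sample mean and sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def proportion_beyond(values, thresholds=tuple(GUIDELINE)):
    """Proportion of observations whose absolute z-score is >= each threshold."""
    zs = z_scores(values)
    n = len(zs)
    return {t: sum(abs(z) >= t for z in zs) / n for t in thresholds}


def exceeds_guideline(props):
    """True where the observed proportion is larger than the assumed limit."""
    return {t: props[t] > GUIDELINE[t] for t in props}


if __name__ == "__main__":
    # Hypothetical scores: 19 identical values and one extreme value.
    example = [0] * 19 + [10]
    props = proportion_beyond(example)
    print(props)  # {1.96: 0.05, 2.58: 0.05, 3.29: 0.05}
    print(exceeds_guideline(props))  # {1.96: False, 2.58: True, 3.29: True}
```

In this toy example the single extreme value keeps the 1.96 check within its 5% limit but breaches the stricter 2.58 and 3.29 limits, which is why the three thresholds are reported together rather than relying on 1.96 alone.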
Traditional diagnostics for influential observations include Mahalanobis distance, which indexes leverage, and Cook's D, which indexes influence on the estimated coefficients. However, these diagnostics are considered inappropriate for LMMs because of their hierarchical structure, correlated errors and inclusion of random effects (Banerjee & Frees, 1997; Nieuwenhuis, Grotenhuis & Pelzer, 2012). Case-deletion diagnostics provide an alternative: each subject (or observation) is deleted in turn and the model is refitted, in order to observe changes in coefficients and model fit (West et al., 2007). This procedure was carried out for each FGM using the influence.ME package in R (Nieuwenhuis et al., 2012), with particular attention paid to changes in the t-value of the fixed effect of group. Given the repeated-measures nature of the data, the removal of individual influential observations was preferred over the removal of whole subjects, in order to preserve data6.
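The logic of case deletion (delete one unit, refit, record the change) can be illustrated without a mixed-model fitter. In the sketch below, which is an assumption-laden stand-in rather than what influence.ME does, a pooled two-sample t statistic on subject means replaces the LMM fixed effect of group; influence.ME refits the actual mixed model. Subject IDs, group codes and scores are all hypothetical.

```python
"""Leave-one-subject-out (case-deletion) sketch.

Assumption: a pooled two-sample t statistic on subject means stands in for the
LMM fixed effect of group. Subject IDs and scores are hypothetical.
"""
from math import sqrt
from statistics import mean, variance

# Hypothetical data: subject -> (group code, scores at three time points).
EXAMPLE_SUBJECTS = {
    "s01": (0, [10, 11, 12]),
    "s02": (0, [9, 10, 12]),
    "s03": (0, [11, 12, 13]),
    "s04": (1, [8, 10, 11]),
    "s05": (1, [7, 9, 10]),
    "s06": (1, [2, 3, 4]),  # deliberately unusual subject
}


def group_t(subjects):
    """Pooled-variance t statistic (group 1 minus group 0) on subject means."""
    g0 = [mean(s) for g, s in subjects.values() if g == 0]
    g1 = [mean(s) for g, s in subjects.values() if g == 1]
    n0, n1 = len(g0), len(g1)
    pooled = ((n0 - 1) * variance(g0) + (n1 - 1) * variance(g1)) / (n0 + n1 - 2)
    return (mean(g1) - mean(g0)) / sqrt(pooled * (1 / n0 + 1 / n1))


def case_deletion(subjects):
    """Delete each subject in turn, recompute t, and return the change per subject."""
    full = group_t(subjects)
    changes = {}
    for sid in subjects:
        reduced = {k: v for k, v in subjects.items() if k != sid}
        changes[sid] = group_t(reduced) - full
    return full, changes


if __name__ == "__main__":
    full, changes = case_deletion(EXAMPLE_SUBJECTS)
    print(f"full-data t = {full:.2f}")
    # Largest absolute change first: these subjects would be examined most closely.
    for sid, delta in sorted(changes.items(), key=lambda kv: -abs(kv[1])):
        print(f"{sid}: change in t = {delta:+.2f}")
```

In this toy data, deleting s06 shifts the stand-in t statistic the most, so it would be the first candidate for closer inspection; in the repeated-measures setting, the preferred next step would be to check whether a single observation rather than the whole subject drives that change.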
ICCs of all UGMs were positive, ranging from 0.12 to 0.88 (mean = 0.47). This may be interpreted as justification for the inclusion of random effects, as it indicates a degree of non-independence, i.e. within-subject consistency over time (Bliese & Ployhart, 2002). Due to space limitations, ICCs are reported for FGMs only. The order of analyses follows the categorisation of dependent variables presented in Table 4.1 on page 74, beginning with vocabulary and oral language measures and then moving on to phonological processing and literacy measures. Each analysis is structured identically, beginning with a conceptual summary of findings and then detailing the model-fitting process. Group trajectories are plotted in figures to aid interpretation. Shaded areas on figures represent standard error (in keeping with the descriptive tables, green represents the monolingual group and blue the EAL group) and, where available, population normative means of the assessments are shown as evenly dashed lines (i.e. y = 10 for scaled scores or y = 100 for standard scores). A summary of group-specific intercepts and slopes for all models is provided in Table 4.21 before the Discussion.
5
∆AIC denotes the change in AIC between successive models.
6
As noted in Section 4.4, a major advantage of linear mixed modelling is its flexible handling of missing data. Just as LMMs need not resort to listwise deletion when data are missing, they need not discard whole subjects when data are influential. For example, in the vector [5, 6, 15], only the third data point represents an outlier, and an LMM can be fitted with only this data point removed rather than the whole vector. Where possible, this strategy was applied in the present study. In some cases, however, whole subjects rather than individual observations were found to be highly influential, and these subjects were removed.