Pioneered by Laird and Ware (1982), LME models are a common approach to modelling longitudinal data in which there is variation between individuals, captured through a random effects term, and within-individual dependence among the repeated observations. They are flexible models that handle unbalanced longitudinal datasets well, where the number of observations per individual is not the same. LME models include random effects, which are multivariate Normally distributed, alongside the fixed effects. This allows analysis of both the between-individual (random effects) and within-individual (fixed effects) variation in the repeated observations over time. The estimated within-individual changes over time can be referred to as growth curves or latent developmental trajectories, which can vary in their characteristics from person to person. The estimated growth parameters explain the changes in the average response in the population and can be used to predict changes in individual trajectories over time. The LME model divides the regression parameters into fixed effects and random effects, where the random effects vary randomly from individual to individual; this results in a single trajectory for the entire population, with individuals varying around this trajectory. The model thereby takes into account the natural heterogeneity in the population or sample being examined. The idea is that individuals have their own developmental trajectories with a subject-specific mean response over time, so that a subset of the regression parameters is viewed as random. This mean response from individuals is a mix of attributes which are assumed to be
shared by all individuals in the population (fixed effects) and features which are specific to each individual (random effects). Incorporating random effects allows the covariates to be measured as functions of time among the repeated responses. These random effects can be interpreted as reflecting the natural heterogeneity in the population that arises from factors which are not measured.
Let $y_{ij}$ be the response for individual $i$ ($i = 1, \dots, N$) at the $j$th time occasion ($j = 1, \dots, n_i$), where $n_i$ denotes the number of responses observed from individual $i$. Assume also that the $y_{ij}$ are continuous and Normally distributed. Let $p$ and $q$ denote the number of fixed effects and random effects parameters respectively. Define $\boldsymbol{X}_{ij} = (x_{ij1}, x_{ij2}, \dots, x_{ijp})$ to be the fixed effects covariates for individual $i$ at time occasion $j$ and $\boldsymbol{Z}_{ij} = (z_{ij1}, z_{ij2}, \dots, z_{ijq})$ to be the random effects covariates. Also define $\boldsymbol{\beta} = (\beta_1, \dots, \beta_p)$ to be a $p$-vector of unknown regression coefficients for the fixed effects. Assume $\varepsilon_{ij} \sim \text{Normal}(0, \sigma^2)$ and that the $\boldsymbol{b}_i = (b_{i0}, b_{i1}, \dots, b_{iq})$ parameters are multivariate Normally distributed with mean zero and variance-covariance matrix $\Phi$:
$$\boldsymbol{b}_i \sim \text{MVN}(\boldsymbol{0}, \Phi).$$
Then the general form of the LME can be defined as follows:
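$$y_{ij} = \beta_0 x_{ij0} + \beta_1 x_{ij1} + \dots + \beta_p x_{ijp} + b_{i0} z_{ij0} + b_{i1} z_{ij1} + \dots + b_{iq} z_{ijq} + \varepsilon_{ij}. \qquad (5.1)$$
Here $x_{ij0}$ and $z_{ij0}$ are taken to be 1, so that $\beta_0$ and $b_{i0}$ act as the fixed and random intercepts.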
In the LME model the fixed effects and random effects covariates are usually identical; thus fixed effects covariates of age and age-squared imply random effects covariates of age and age-squared.
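Equation (5.1) can be illustrated with a small simulation. The following is a minimal sketch in plain Python, assuming an intercept and a single age covariate used for both the fixed and the random effects; the function names, the coefficient values and the fixed draw of the random effects are hypothetical and serve only to show how the fixed part, the individual-specific part and the error term combine.

```python
import random


def linear_predictor(beta, b_i, x_row, z_row):
    """Fixed part plus individual-specific part of (5.1), without the error term."""
    fixed = sum(coef * x for coef, x in zip(beta, x_row))
    individual = sum(coef * z for coef, z in zip(b_i, z_row))
    return fixed + individual


def simulate_individual(beta, b_i, ages, sigma, rng):
    """Simulate the responses of one individual at the given ages."""
    responses = []
    for age in ages:
        # Intercept column (1) and age; the same row is used for X and Z,
        # mirroring the usual choice of identical covariates.
        row = [1.0, float(age)]
        mean = linear_predictor(beta, b_i, row, row)
        responses.append(mean + rng.gauss(0.0, sigma))
    return responses


rng = random.Random(1)
beta = [2.0, 0.5]    # hypothetical population intercept and slope
b_i = [0.3, -0.1]    # hypothetical random effects for one individual
# Unbalanced data are unproblematic: each individual may have any number of occasions.
ys = simulate_individual(beta, b_i, [10, 11, 12, 14], sigma=1.0, rng=rng)
```

The individual's own trajectory has intercept 2.0 + 0.3 and slope 0.5 - 0.1, so the population trajectory is shifted and tilted by the random effects before the occasion-level error is added.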
trajectories so it is necessary to use polynomial growth curves. Therefore, the first column of $\boldsymbol{X}_i$ is a vector of ones and the other columns are the polynomial time transformations of a chosen order $s$. For example, in a cubic model, the entries of $\boldsymbol{X}_{ij}$ would be $1$, $t_{ij}$, $t_{ij}^2$ and $t_{ij}^3$, where $t_{ij}$ is a time measurement such as age for individual $i$ at time occasion $j$.
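The construction of the polynomial covariates can be sketched in a few lines of plain Python. This is an illustration only; the function names `polynomial_row` and `design_matrix` and the example ages are hypothetical.

```python
def polynomial_row(t, order=3):
    """Return [1, t, t^2, ..., t^order] for a single time measurement t."""
    return [float(t) ** k for k in range(order + 1)]


def design_matrix(times, order=3):
    """Stack one polynomial row per time occasion for a single individual."""
    return [polynomial_row(t, order) for t in times]


ages = [10, 12, 15]          # hypothetical ages at three occasions
x_i = design_matrix(ages)    # first column is all ones, then age, age^2, age^3
```

In practice the time variable is often centred or rescaled before the powers are taken, which keeps the higher-order columns on a manageable numerical scale.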
The typical structure of the model includes the estimation of the intercept and slopes, at both the individual and group level, which are represented via time covariates such as age. The random effects part of the model represents the variance of the intercept and the growth parameters. The LME model assumes that the variation in the responses is attributable to variation within individuals and to variation between individuals. The within-individual variation is the deviation between the individual observations $y_{ij}$ and the linear trajectory. The betas in the fixed effects part of the model are used to define the trajectory pathway $\beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$ for individual $i$, where $x_{ip}$ is (typically) the $p$th power of $t_i$. Each individual has their own intercept and slope, and the within-individual variation is reflected in the deviation between the observations and the individual trajectories.
For the datasets in this chapter, an extended form of the LME model is used, which allows the response to be a count variable. The aim is to estimate the mean number of convictions over time for the entire population, as well as to obtain predictions of individual counts of convictions over time.
5.2.1 Extending the model for count data
Standard LME models are limited to a continuous dependent variable. The model therefore needs to be extended to accommodate a count dependent variable and to allow a subset of the regression coefficients to vary randomly
between individuals. A generalised linear mixed effects model (GLMM) can be used for this. The GLMM is an extension of the generalised linear model (GLM) to longitudinal data, building upon the LME approach (Fitzmaurice et al., 2004). In a GLMM, the assumption is made that the responses from each individual are, conditional on the random effects, independent observations from a distribution in the exponential family. For example, if the response variable $y_{ij}$ is a count, then the Poisson distribution is usually a sensible choice. As the response variable in the two conviction datasets is the total number of convictions, the Poisson distribution is the chosen distribution for the models in this chapter.
5.2.2 Linear Mixed effects model for count data
Let $y_{it}$ be the observed number of convictions for offender $i$ in time period $t$. It is assumed that the polynomial used to represent the mean trajectory is cubic in this development; this assumption can easily be changed to other orders of polynomial.
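The GLMM for count data can be written as:
$$y_{it} \sim \text{Poisson}(\mu_{it})$$
with
$$\log(\mu_{it}) = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \alpha_{0i} + \alpha_{1i} t + \alpha_{2i} t^2 + \alpha_{3i} t^3,$$
where
$$\boldsymbol{\alpha}_i = \begin{bmatrix} \alpha_{0i} \\ \alpha_{1i} \\ \alpha_{2i} \\ \alpha_{3i} \end{bmatrix} \sim \text{MVN}\left(\boldsymbol{0}, \begin{bmatrix} v_{00} & v_{01} & v_{02} & v_{03} \\ v_{01} & v_{11} & v_{12} & v_{13} \\ v_{02} & v_{12} & v_{22} & v_{23} \\ v_{03} & v_{13} & v_{23} & v_{33} \end{bmatrix}\right)$$
and where MVN represents the multivariate Normal distribution.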
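The symmetric 4 × 4 variance-covariance matrix of the random effects therefore contains ten distinct unknown terms (four variances and six covariances).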
The above model is a log-linear regression model that includes intercepts and slopes that are allowed to vary randomly. The model assumes that the counts follow a Poisson distribution conditional on the random effects.
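The data-generating mechanism implied by this model can be illustrated with a short simulation. The following is a minimal sketch in plain Python and is not the estimation procedure used in this chapter: the conditional Poisson mean is written as `mu`, and the coefficient values, the covariance matrix and the function names are hypothetical, chosen only for illustration.

```python
import math
import random


def cholesky(v):
    """Lower-triangular L with L L^T = v, for a small positive-definite matrix."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = sum(low[r][k] * low[c][k] for k in range(c))
            if r == c:
                low[r][c] = math.sqrt(v[r][r] - s)
            else:
                low[r][c] = (v[r][c] - s) / low[c][c]
    return low


def draw_random_effects(low, rng):
    """Draw alpha_i ~ MVN(0, V) as L z, with z independent standard Normals."""
    z = [rng.gauss(0.0, 1.0) for _ in low]
    return [sum(low[r][k] * z[k] for k in range(r + 1)) for r in range(len(low))]


def conditional_mean(beta, alpha_i, t):
    """mu_it = exp(sum_k (beta_k + alpha_ki) * t^k), the inverse of the log link."""
    eta = sum((b + a) * t ** k for k, (b, a) in enumerate(zip(beta, alpha_i)))
    return math.exp(eta)


def draw_poisson(mu, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, count, prod = math.exp(-mu), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


rng = random.Random(2024)
beta = [-1.0, 0.8, -0.15, 0.005]       # hypothetical fixed cubic coefficients
v = [[0.25, 0.01, 0.0, 0.0],           # hypothetical covariance matrix V
     [0.01, 0.04, 0.0, 0.0],
     [0.0, 0.0, 0.0004, 0.0],
     [0.0, 0.0, 0.0, 0.000004]]
low = cholesky(v)
alpha_i = draw_random_effects(low, rng)    # one offender's random effects
counts = [draw_poisson(conditional_mean(beta, alpha_i, t), rng) for t in range(6)]
```

Each offender receives one draw of the four random effects, which shifts that offender's cubic trajectory on the log scale; the counts in each period are then Poisson draws around the resulting conditional mean.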