
Pioneered by Laird and Ware (1982), LME models are a common approach to modelling longitudinal data sets in which there is variation between individuals, captured through a random effects term, and within-individual dependence among the repeated observations. They are flexible models that handle imbalance in longitudinal data sets well, where the number of observations per individual is not the same. LME models include random effects, which are multivariate Normally distributed, alongside the fixed effects. This allows analysis of both the between-individual (random effects) and within-individual (fixed effects) variation in the repeated observations over time. The estimated within-individual changes over time can be referred to as growth curves or latent developmental trajectories, which can vary in their characteristics from person to person. The estimated growth parameters explain the changes in the average response of the population and can be used to predict how individual trajectories change over time.

The LME model divides the regression parameters into fixed effects and random effects that vary randomly from individual to individual, which results in a single trajectory for the entire population, with individuals varying around this trajectory. This takes into account the natural heterogeneity in the population or sample being examined. The idea is that individuals have their own developmental trajectories with a subject-specific mean response over time, so that a subset of the regression parameters is viewed as random. This mean response for an individual is a mix of attributes which are assumed to be shared by all individuals in the population (fixed effects) and individual-specific features which are exclusive to each individual (random effects). Incorporating random effects allows the covariates to be measured as functions of time among the repeated responses. These random effects can be interpreted as reflecting the natural heterogeneity in the population that arises from unmeasured factors.

Let $Y_{ij}$ be the response for individual $i$ ($i = 1, \dots, N$) at the $j$th time occasion ($j = 1, \dots, n_i$), where $n_i$ denotes the number of responses observed from individual $i$. Assume also that the $Y_{ij}$ are continuous and Normally distributed. Let $p$ and $q$ denote the number of fixed effects and random effects covariates respectively, not counting the intercept terms. Define $\mathbf{X}_{ij} = (X_{ij0}, X_{ij1}, \dots, X_{ijp})$ to be the fixed effects covariates for individual $i$ at time occasion $j$ and $\mathbf{Z}_{ij} = (Z_{ij0}, Z_{ij1}, \dots, Z_{ijq})$ to be the random effects covariates, where $X_{ij0}$ and $Z_{ij0}$ are typically equal to 1 and correspond to the intercepts. Also define $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)$ to be the vector of unknown regression coefficients for the fixed effects. Assume $e_{ij} \sim \text{Normal}(0, \sigma^2)$ and that the random effects $\mathbf{a}_i = (a_{i0}, a_{i1}, \dots, a_{iq})$ are multivariate Normally distributed with mean zero and variance-covariance matrix $\Phi$:

$$\mathbf{a}_i \sim MVN(\mathbf{0}, \Phi).$$

Then the general form of the LME can be defined as follows:

$$Y_{ij} = \beta_0 X_{ij0} + \beta_1 X_{ij1} + \dots + \beta_p X_{ijp} + a_{i0} Z_{ij0} + a_{i1} Z_{ij1} + \dots + a_{iq} Z_{ijq} + e_{ij}. \qquad (5.1)$$

In the LME model the fixed effects and random effects covariates are usually identical; thus fixed effects covariates of age and age-squared imply random effects covariates of age and age-squared as well.
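As a rough illustration of equation (5.1), the following sketch simulates unbalanced longitudinal data from an LME with polynomial time covariates. It is only a sketch under stated assumptions: the function name `simulate_lme`, the parameter values, the choice of 2 to 6 observations per individual, and the use of independent random effects (a diagonal $\Phi$) are all hypothetical simplifications and are not taken from the analysis in this chapter.

```python
import random


def simulate_lme(n_individuals, beta, sd_random, sd_error, seed=0):
    """Simulate Y_ij = sum_k (beta_k + a_ik) * t_ij**k + e_ij.

    Illustrative assumptions: Z_ij equals X_ij, the random effects are
    drawn independently (diagonal Phi), and each individual contributes
    between 2 and 6 observations, so the data are unbalanced.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n_individuals):
        # Individual-specific deviations a_i from the population coefficients.
        a_i = [rng.gauss(0.0, sd) for sd in sd_random]
        n_i = rng.randint(2, 6)  # number of observations differs by individual
        for j in range(n_i):
            t_ij = float(j)
            x_ij = [t_ij ** k for k in range(len(beta))]  # 1, t, t^2, ...
            fixed = sum(b * x for b, x in zip(beta, x_ij))   # population part
            rand = sum(a * z for a, z in zip(a_i, x_ij))     # individual part
            e_ij = rng.gauss(0.0, sd_error)                  # residual error
            data.append((i, t_ij, fixed + rand + e_ij))
    return data


# Hypothetical values: intercept 1.0, slope 0.5, random intercept and slope.
rows = simulate_lme(5, beta=[1.0, 0.5], sd_random=[0.3, 0.1], sd_error=0.2)
```

Setting every entry of `sd_random` and `sd_error` to zero collapses all individuals onto the single population trajectory, which is one way to see how the random effects and the error term each add variation around it.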


trajectories so it is necessary to use polynomial growth curves. Therefore, the first column of $\mathbf{X}_i$ is a vector of ones and the other columns are polynomial transformations of time up to a chosen order $p$. For example, in a cubic model, the entries of $\mathbf{X}_{ij}$ would be $1$, $t_{ij}$, $t_{ij}^2$ and $t_{ij}^3$, where $t_{ij}$ is a time measurement such as age for individual $i$ at time occasion $j$.
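The polynomial covariates described above can be built mechanically from the time measurements. The sketch below is illustrative only; the function names and the example times are hypothetical.

```python
def polynomial_design_row(t, order):
    """Return [1, t, t**2, ..., t**order] for one time measurement t."""
    return [float(t) ** k for k in range(order + 1)]


def polynomial_design_matrix(times, order=3):
    """Stack one row per time occasion; the first column is all ones."""
    return [polynomial_design_row(t, order) for t in times]


# Cubic design matrix for one individual observed at three occasions.
X_i = polynomial_design_matrix([0, 1, 2], order=3)
```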

The typical structure of the model includes the estimation of the intercept and slopes, at both the individual and group level, which are represented via time effect covariates such as age. The random effects part of the model represents the variance of the intercept and the growth parameters. The LME model assumes that the variation in the responses is attributable to variation within individuals and to variation between individuals. The within-individual variation is the deviation between the individual observations $Y_{ij}$ and that individual's trajectory. The betas in the fixed effects part of the model are used to define the trajectory pathway $\beta_0 + \beta_1 X_{i1} + \dots + \beta_p X_{ip}$ for individual $i$, where $X_{ij}$ is (typically) the $j$th power of $t_i$ (here $j$ indexes the polynomial term rather than the time occasion). Each individual has their own intercept and slope, and the within-individual variation is reflected in the deviation between the observations and the individual trajectories.
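To make the split between the two sources of variation concrete, the sketch below computes an individual's trajectory as the population coefficients plus that individual's random effects, and then the within-individual deviations of the observations from it. The function names and all numerical values are hypothetical and chosen only for illustration.

```python
def trajectory(t, coefs):
    """Evaluate the polynomial coefs[0] + coefs[1]*t + coefs[2]*t**2 + ..."""
    return sum(c * t ** k for k, c in enumerate(coefs))


def within_individual_deviations(times, responses, beta, a_i):
    """Deviations of one individual's observations from their own trajectory.

    The individual's coefficients are the fixed effects beta shifted by
    the individual's random effects a_i (between-individual variation).
    """
    coefs_i = [b + a for b, a in zip(beta, a_i)]
    return [y - trajectory(t, coefs_i) for t, y in zip(times, responses)]


# Hypothetical individual whose intercept is above and slope below average.
deviations = within_individual_deviations(
    times=[0, 1, 2],
    responses=[1.2, 1.6, 2.5],
    beta=[1.0, 0.5],
    a_i=[0.2, -0.1],
)
```

Here the random effects `a_i` move the individual's curve away from the population curve, while `deviations` holds what remains unexplained within the individual.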

For the datasets in this chapter, an extended form of the LME model is used, which allows the response to be a count variable. The aim is to estimate the mean number of the counts of convictions over time for the entire population, as well as obtaining predictions of individual counts of convictions over time.

5.2.1 Extending the model for count data

Standard LME models are limited to a continuous dependent variable. The model therefore needs to be extended to accommodate a count dependent variable while still allowing a subset of the regression coefficients to vary randomly between individuals. A generalised linear mixed effects model (GLMM) can be used for this. The GLMM is an extension of the Generalised Linear Model (GLM) to longitudinal data, building upon the LME approach (Fitzmaurice et al., 2004). In a GLMM, the assumption is made that each individual's responses are independent observations from an exponential family distribution, conditional on the random effects. For example, if the response variable $Y_{ij}$ is a count, then the Poisson distribution is usually a sensible choice. As the response variable in the two conviction datasets is the total number of convictions, the Poisson distribution is used for the models in this chapter.

5.2.2 Linear Mixed effects model for count data

Let $Y_{it}$ be the observed number of convictions for offender $i$ in time period $t$. It is assumed that the polynomial used to represent the mean trajectory is cubic in this development; this assumption can easily be changed to other orders of polynomial.

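The GLMM for count data can be written as:

$$Y_{it} \sim \text{Poisson}(\lambda_{it})$$

with

$$\log(\lambda_{it}) = \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 + \alpha_{0i} + \alpha_{1i} t + \alpha_{2i} t^2 + \alpha_{3i} t^3$$

where

$$\boldsymbol{\alpha}_i = \begin{bmatrix} \alpha_{0i} \\ \alpha_{1i} \\ \alpha_{2i} \\ \alpha_{3i} \end{bmatrix} \sim MVN\left(\mathbf{0}, \begin{bmatrix} v_{00} & v_{01} & v_{02} & v_{03} \\ v_{01} & v_{11} & v_{12} & v_{13} \\ v_{02} & v_{12} & v_{22} & v_{23} \\ v_{03} & v_{13} & v_{23} & v_{33} \end{bmatrix}\right)$$

and where MVN represents the multivariate normal distribution.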
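The cubic Poisson model can be sketched in a few lines to show how the log link and the random effects combine. This is a minimal illustration under assumptions: the function names, the coefficient values and the time points are hypothetical; the random effects for the offender are supplied directly rather than drawn from the multivariate normal distribution; and the Poisson draws use Knuth's multiplication method, which is a convenient stand-in for a library sampler and is only suitable for small means.

```python
import math
import random


def conditional_mean(t, beta, alpha_i):
    """lambda_it = exp(sum_k (beta_k + alpha_ki) * t**k) for k = 0..3."""
    eta = sum((b + a) * t ** k for k, (b, a) in enumerate(zip(beta, alpha_i)))
    return math.exp(eta)


def draw_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(times, beta, alpha_i, seed=0):
    """Simulate one offender's counts, conditional on their random effects."""
    rng = random.Random(seed)
    return [draw_poisson(conditional_mean(t, beta, alpha_i), rng) for t in times]


# Hypothetical coefficients on a rescaled time axis (not estimates from the data).
beta = [0.1, 0.2, -0.1, 0.0]
alpha_i = [0.05, 0.0, 0.0, 0.0]  # this offender has a slightly higher intercept
counts = simulate_counts([0.0, 0.5, 1.0], beta, alpha_i, seed=3)
```

Because the linear predictor is exponentiated, a random intercept of `alpha_i[0]` multiplies the offender's expected count by `exp(alpha_i[0])` at every time point, rather than adding a constant to it.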

unknown covariance terms.

The above model is a log-linear regression model with intercepts and slopes that are allowed to vary randomly between individuals. The model assumes a Poisson distribution for the counts, conditional on the random effects.