12.2.3 Predicting abnormal scores at age 7-8: multilevel binomial models

A binary variable was derived for the Total Difficulties scale and all subscales, using Goodman’s standard cut-offs for the teacher-rated SDQs (Goodman, 2013b). This produced two groups: an ‘abnormal’ group (those with likely difficulties) and a ‘normal’ group (those without difficulties or showing only borderline difficulties). It is important to examine the binary groups in models as well as the continuous scores, as the effect of being in the ‘abnormal’ group may arguably be greater than that of simply having a higher score at any point on the scale.
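The dichotomisation described above can be sketched in a few lines of Python. This is only an illustration: the function name, the example scores and the cut-off of 16 are assumptions made for the example, and the cut-offs actually applied are those given in Goodman (2013b).

```python
def to_abnormal_flag(score, cutoff=16):
    """Return 1 if a score falls in the 'abnormal' band, otherwise 0.

    cutoff is an assumed, illustrative lower bound of the 'abnormal' band.
    'Normal' and 'borderline' scores are both coded 0.
    """
    return 1 if score >= cutoff else 0


# Hypothetical Total Difficulties scores for five children.
scores = [3, 11, 14, 16, 27]
flags = [to_abnormal_flag(s) for s in scores]
print(flags)
```

Each subscale would need its own cut-off passed in place of the default.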

Furthermore, previous research has demonstrated that, of 5-10 year old children who are rated by their teacher as being in the ‘abnormal’ group, 59.8% will go on to have a psychiatric diagnosis (Goodman et al., 2004). This is arguably the group that we are most interested in, both in predicting what may be associated with such outcomes, and in following up over time in Glasgow City.

A range of methods is available for estimating multilevel logistic models, several of which can be run in MLwiN.

Generalised linear models are estimated using maximum likelihood methods. Maximum likelihood estimation chooses the values of the model parameters (for a normal model, the mean and variance) that make the observed sample most probable, given an assumed distribution for the data. For multilevel models with a non-linear link, the prevailing approach is to approximate the link function using quasi-likelihood methods.
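For reference, a two-level random-intercept logistic model of the kind discussed here can be written as follows. The notation is my own assumption for illustration: i indexes children, j indexes schools, x_{ij} stands for a single predictor and u_j is the school-level residual.

```latex
\begin{aligned}
y_{ij} &\sim \text{Bernoulli}(\pi_{ij}) \\
\operatorname{logit}(\pi_{ij}) &= \log\frac{\pi_{ij}}{1-\pi_{ij}} = \beta_0 + \beta_1 x_{ij} + u_j \\
u_j &\sim N(0, \sigma_u^2)
\end{aligned}
```

The logit link is what makes the model non-linear in its parameters, which is why the approximations described below are needed.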

Quasi-likelihood is a way of allowing for overdispersion, i.e. greater variability in the data than the statistical model would predict. For example, the number of boys and girls in a family does not tend to conform to a 50:50 split, as might be expected; rather, each family may have a skew towards one gender, yielding an observed variance which is larger than that predicted by a binomial model. The downside of quasi-likelihood models in general is that they tend to produce estimates which are downwardly biased. This is particularly the case in datasets with few level 1 units per level 2 unit (i.e. few students per school in this case) (Everitt & Palmer, 2011).
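The idea of overdispersion in the family example can be made concrete with a small calculation. The sketch below compares the observed variance of the number of boys per family with the variance a binomial model would predict. The data and the function name are hypothetical, and the ratio is a simple descriptive check rather than the quasi-likelihood estimator itself.

```python
def dispersion_ratio(successes, trials):
    """Observed variance of group counts divided by the binomial variance.

    successes: number of 'successes' (here, boys) in each group (family).
    trials: common group size (children per family).
    A ratio well above 1 suggests overdispersion.
    """
    n_groups = len(successes)
    p_hat = sum(successes) / (n_groups * trials)
    mean_count = sum(successes) / n_groups
    observed_var = sum((s - mean_count) ** 2 for s in successes) / (n_groups - 1)
    binomial_var = trials * p_hat * (1 - p_hat)
    return observed_var / binomial_var


# Hypothetical families of four children, most skewed towards one gender.
boys_per_family = [0, 4, 4, 0, 1, 3, 4, 0, 0, 4]
ratio = dispersion_ratio(boys_per_family, trials=4)
print(round(ratio, 2))
```

In this made-up sample the overall proportion of boys is exactly one half, yet the family counts vary far more than a binomial model allows.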

In quasi-likelihood models, the non-linear function is ‘linearised’ using an approximation known as the Taylor series expansion. This represents a non-linear function as an infinite series of terms, which is truncated in practice. Analysts can choose whether to use only the first term of the series, known as the first order Taylor approximation, or to include the second term as well, giving the second order Taylor approximation (Hox, 2002). Generally speaking, first order algorithms tend to slightly underestimate the size of the effect (Rodriguez & Goldman, 2001). In order to get the most precise estimates I will therefore use the second order algorithm.
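The difference between first and second order approximations can be illustrated on the inverse logit function itself. The sketch below is a generic scalar Taylor expansion, not MLwiN's estimation algorithm; the function names and the expansion points are assumptions chosen for the example.

```python
import math


def inv_logit(x):
    """Inverse logit: maps the linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def taylor_inv_logit(x, x0, order):
    """Taylor approximation of inv_logit at x, expanded around x0."""
    p0 = inv_logit(x0)
    d1 = p0 * (1 - p0)          # first derivative at x0
    d2 = d1 * (1 - 2 * p0)      # second derivative at x0
    approx = p0 + d1 * (x - x0)
    if order >= 2:
        approx += 0.5 * d2 * (x - x0) ** 2
    return approx


x0, x = -1.0, -0.2              # expansion point and evaluation point
exact = inv_logit(x)
err_first = abs(taylor_inv_logit(x, x0, order=1) - exact)
err_second = abs(taylor_inv_logit(x, x0, order=2) - exact)
print(err_first, err_second)
```

For these values the second order approximation lies closer to the exact probability than the first order one, which is the motivation for preferring it where it converges.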

The Taylor series linearization of a non-linear function depends on the values of its parameters. In multilevel modelling, maximum likelihood estimation is iterative: it starts with approximate parameter values, which are improved with each iteration. This means that the Taylor series linearization must be repeated after each iteration, using the latest parameter estimates. There are two options at this point. The analyst can choose to fit the Taylor series linearization using the current values of the fixed part of the model only; this is called ‘marginalised quasi-likelihood’ (MQL). The other option is to use the fixed part in conjunction with the current residual values; this is referred to as ‘penalised (or sometimes, predictive) quasi-likelihood’ (PQL) (Hox, 2002). After applying the chosen quasi-likelihood method, the model is then estimated using iterative generalised least squares (IGLS) (Khan & Shaw, 2011).
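The distinction between the two expansion points can be written out for a random-intercept model. The notation below is an assumption for illustration: f is the inverse logit, x_{ij}'β is the fixed part, u_j is the school-level residual, and hats with superscript (t) denote estimates from iteration t. Only the first order expansion is shown.

```latex
\begin{aligned}
\pi_{ij} &= f(x_{ij}'\beta + u_j) \\
\text{MQL:}\quad \pi_{ij} &\approx f(H_{ij}) + x_{ij}'(\beta - \hat\beta^{(t)})\, f'(H_{ij}) + u_j\, f'(H_{ij}),
  & H_{ij} &= x_{ij}'\hat\beta^{(t)} \\
\text{PQL:}\quad \pi_{ij} &\approx f(H_{ij}) + x_{ij}'(\beta - \hat\beta^{(t)})\, f'(H_{ij}) + (u_j - \hat u_j^{(t)})\, f'(H_{ij}),
  & H_{ij} &= x_{ij}'\hat\beta^{(t)} + \hat u_j^{(t)}
\end{aligned}
```

In both cases the right-hand side is linear in β and u_j, so the linearised model can be passed to a linear multilevel estimation step.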

The combination of these two choices (first or second order approximation, and MQL or PQL) gives us four options for estimating the models:

1st order marginalised quasi-likelihood (MQL-1)
1st order penalised quasi-likelihood (PQL-1)
2nd order marginalised quasi-likelihood (MQL-2)
2nd order penalised quasi-likelihood (PQL-2)

Taken in this order, the options range from high convergence, high error to low convergence, low error.

Rodriguez and Goldman explored these different quasi-likelihood algorithms and concluded that all of the methods applied were severely biased, with the exception of 2nd order PQL. They found that 1st order MQL models produce estimates which are little different to those of a standard logit model, whilst MQL-2 and PQL-1 ‘offer only slight improvements’. They therefore recommend the use of 2nd order PQL models for giving the most precise estimates (Rodriguez & Goldman, 2001). However, PQL-2 models can suffer from convergence problems (Khan & Shaw, 2011). For this reason, all unadjusted and adjusted models were fitted as MQL-1 models in the first instance. The final model produced was then extended into a PQL-2 model.

In order to gain an even higher level of accuracy in the models, the final stage was re-run using Bayesian methods. Bayesian statistics are based around probabilities or ‘degrees of belief’. In particular, Bayesian models require the formulation of a set of prior probability distributions for the unknown parameters: that is, distributions expressing the uncertainty about each value before the evidence is taken into account. The most common Bayesian estimation approach in multilevel modelling is Markov chain Monte Carlo (MCMC). These methods have been found to eliminate the bias seen in quasi-likelihood models (Rodriguez & Goldman, 2001). However, they are computationally intensive, so are not recommended for exploratory models (Hox, 2002); hence only the final model was fitted in this way.
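To give a sense of what an MCMC method does, the sketch below runs a random-walk Metropolis sampler for a single logit-scale intercept. It is a deliberately minimal, single-level illustration and not the sampler used by MLwiN: the data (30 ‘abnormal’ scores out of 200 children), the normal prior with standard deviation 10, the step size and the function names are all assumptions made for the example.

```python
import math
import random


def log_posterior(theta, successes, trials, prior_sd=10.0):
    """Binomial log-likelihood on the logit scale plus a normal log-prior."""
    log_lik = successes * theta - trials * math.log1p(math.exp(theta))
    log_prior = -0.5 * (theta / prior_sd) ** 2
    return log_lik + log_prior


def metropolis(successes, trials, n_iter=5000, burn_in=1000, step=0.3, seed=1):
    """Random-walk Metropolis; returns post-burn-in draws of the probability."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, successes, trials)
    draws = []
    for i in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(1.0 / (1.0 + math.exp(-theta)))
    return draws


draws = metropolis(successes=30, trials=200)
posterior_mean = sum(draws) / len(draws)
print(round(posterior_mean, 3))
```

Even this one-parameter chain needs thousands of iterations, which gives some indication of why the approach is costly for full multilevel models.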

The multilevel binomial models were fitted in the same order as the linear models, with individual level factors and school level factors first entered into models individually, before the final model was gradually built up. As explained above, in order to get the best model estimates, the final model was then re-run first as a PQL-2 model, and finally as an MCMC model.

The fitting of cross-classification models, which take into account the fact that a student is a member of both a school and a home area, was also considered. Both of these memberships may have a relationship with the child’s social, emotional and behavioural development at P3. However, schools and areas are not nested within one another: children from the same area may go to different schools and, equally, a school may take in children from a number of neighbourhoods. Following the fitting of the first set of multilevel models, it was found that area effects were considerably weaker than those at an individual level. Indeed, area level deprivation was not a significant factor in any of the multilevel models once other factors were controlled for. For this reason it was decided that it would not be of benefit to fit cross-classification models for either the linear or binomial models.
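Whether a dataset has a cross-classified rather than a purely nested structure can be checked directly from the membership data. The sketch below uses hypothetical school and area labels and an assumed function name; it simply tests whether any school draws from more than one area and any area sends children to more than one school.

```python
from collections import defaultdict


def is_cross_classified(memberships):
    """True if schools span several areas and areas span several schools.

    memberships: iterable of (school, area) pairs, one per child.
    """
    areas_by_school = defaultdict(set)
    schools_by_area = defaultdict(set)
    for school, area in memberships:
        areas_by_school[school].add(area)
        schools_by_area[area].add(school)
    schools_span_areas = any(len(a) > 1 for a in areas_by_school.values())
    areas_span_schools = any(len(s) > 1 for s in schools_by_area.values())
    return schools_span_areas and areas_span_schools


# Hypothetical (school, area) memberships for three children.
children = [("School A", "Area 1"), ("School A", "Area 2"), ("School B", "Area 2")]
crossed = is_cross_classified(children)
print(crossed)
```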

12.3 Results

In document Modelling social, emotional and behavioural development in the first three years of school: what impact do schools have? (Page 162-165)