Structural equation modelling (SEM) is a statistical technique that can be used to estimate and test causal relationships, and to adjust for mediator-outcome confounding and exposure-mediator interaction. SEM can model multiple outcomes and intermediate variables simultaneously and can construct latent variables, while accounting for the correlations between them.
The term 'structural equation modelling' conveys two important aspects of the method: a) that the causal processes under study are represented by a series of structural (i.e. regression) equations, and b) that these structural relations can be modelled pictorially to enable a clearer conceptualisation of the theory under study [Byrne, 2011].
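For reference, one common textbook way of writing a general SEM is the LISREL-style notation below. It is included only to make the "series of structural equations" concrete; it is not a specification of the model in Figure 7.2. Here $\boldsymbol{\eta}$ and $\boldsymbol{\xi}$ are endogenous and exogenous latent variables, $\mathbf{y}$ and $\mathbf{x}$ their observed indicators, $\mathbf{B}$ and $\boldsymbol{\Gamma}$ the structural (regression) coefficients, $\boldsymbol{\Lambda}_y$ and $\boldsymbol{\Lambda}_x$ the factor loadings, and $\boldsymbol{\zeta}$, $\boldsymbol{\varepsilon}$, $\boldsymbol{\delta}$ the disturbance and measurement-error terms. An observed variable that enters the model directly is the special case with a loading of 1 and no measurement error.

```latex
\begin{aligned}
\boldsymbol{\eta} &= \mathbf{B}\,\boldsymbol{\eta} + \boldsymbol{\Gamma}\,\boldsymbol{\xi} + \boldsymbol{\zeta}
  && \text{(structural model)} \\
\mathbf{y} &= \boldsymbol{\Lambda}_y\,\boldsymbol{\eta} + \boldsymbol{\varepsilon}
  && \text{(measurement model, endogenous)} \\
\mathbf{x} &= \boldsymbol{\Lambda}_x\,\boldsymbol{\xi} + \boldsymbol{\delta}
  && \text{(measurement model, exogenous)}
\end{aligned}
```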
Figure 7.2: Theoretical associations (a-j) between early parental depression; recurrent parental depression between the ages of 1 and 13 years; internalizing behaviour between ages 5 and 10 years, manifested by diagnoses of recurrent abdominal pain, migraine and fatigue; internalizing behaviour between ages 10 and 13 years, manifested by diagnoses of anxiety, depressive symptoms and sleep disorders; and adolescent depression between the ages of 13 and 18 years
The theoretical model for my study, shown in Figure 7.2, describes the association between early comorbid parental depression and adolescent depression. I will first use Cox regression to obtain a crude (unadjusted) estimate of this association. I will then estimate how much of this association, if there is one, is mediated by recurrent parental depression, by decomposing the total effect into direct and indirect effects. SEM makes this decomposition possible because the regression equations are estimated simultaneously, so that each can be adjusted for the others. Finally, I will assess whether internalizing behaviour lies on the causal pathway, again by comparing the direct and indirect effects of early parental depression on adolescent depression.
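The direct/indirect decomposition can be illustrated with a minimal sketch, assuming the simplest case: one mediator, linear models and a continuous outcome, estimated with ordinary least squares on simulated data. The variable names and effect sizes are hypothetical. In this linear setting the indirect effect is the product of the exposure-to-mediator coefficient $a$ and the mediator-to-outcome coefficient $b$, and the total effect equals the direct effect $c'$ plus $ab$ exactly. This identity does not carry over exactly to nonlinear models such as the Cox model used for the crude estimate, and the chapter's actual SEM estimation is done in Mplus, not with this code.

```python
import numpy as np

def ols(y, *predictors):
    """Least-squares slopes for y on an intercept plus the given predictors."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]  # drop the intercept

def decompose_effects(x, m, y):
    """Product-of-coefficients mediation with one mediator (linear models).

    m = a*x + e1              exposure -> mediator
    y = c_prime*x + b*m + e2  direct path and mediator -> outcome
    """
    (a,) = ols(m, x)
    c_prime, b = ols(y, x, m)
    indirect = a * b
    return {"a": a, "b": b, "direct": c_prime,
            "indirect": indirect, "total": c_prime + indirect}

# Simulated, hypothetical data: true indirect effect 0.5 * 0.4 = 0.2
rng = np.random.default_rng(42)
n = 5000
x = rng.binomial(1, 0.3, n).astype(float)   # e.g. early parental depression (0/1)
m = 0.5 * x + rng.normal(size=n)            # e.g. recurrent parental depression score
y = 0.2 * x + 0.4 * m + rng.normal(size=n)  # hypothetical continuous outcome
effects = decompose_effects(x, m, y)
```

Because both regressions use the same sample, `effects["total"]` matches the slope from regressing `y` on `x` alone, which is a quick check that the decomposition is internally consistent.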
As mentioned in chapter 5, internalizing behaviour is not measured directly in THIN. However, individual behaviours and diagnoses that could indicate internalizing behaviour are recorded in THIN. As many different types of behaviours and disorders could potentially be indicative of internalizing behaviour, I will explore a measurement model that reduces these variables to one or more latent variables using exploratory factor analysis (EFA) [Fabrigar et al., 1999, Gorsuch, 1983]. In Figure 7.3, the latent variable is represented by an oval, indicating that it is not observed directly but is estimated by the measurement model. The measured variables (the indicators) are represented by rectangles.

Figure 7.3: A theoretical measurement model for internalizing behaviour
EFA is used to select the indicators that measure the latent variable of interest; its goal is to explain the correlations among a set of observed variables with a smaller number of latent factors [Rabe-Hesketh and Skrondal, 2008]. Before using EFA, I will inspect the correlation matrix of the candidate indicators. EFA assumes that the selected indicators measure a common underlying concept, so they should be at least moderately correlated with one another. If the majority of correlations are below 0.20, this would suggest that the selected indicators measure different things and that EFA is not appropriate.
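The correlation check can be sketched as follows, assuming the indicators are held as a numeric matrix with one column per indicator. For illustration, binary "diagnoses" are simulated from a one-factor model with hypothetical loadings and compared with unrelated binary variables. The sketch uses ordinary Pearson (phi) correlations for simplicity; for binary indicators, tetrachoric correlations are usually more appropriate, and they are not computed here.

```python
import numpy as np

def low_correlation_share(data, threshold=0.20):
    """Return the share of distinct indicator pairs with |r| < threshold,
    plus the full correlation matrix (columns of `data` are indicators)."""
    corr = np.corrcoef(data, rowvar=False)
    i, j = np.triu_indices_from(corr, k=1)   # each pair once, diagonal excluded
    share = float(np.mean(np.abs(corr[i, j]) < threshold))
    return share, corr

rng = np.random.default_rng(1)
n = 2000
loadings = np.array([0.8, 0.7, 0.7, 0.6, 0.6, 0.5])   # hypothetical values
latent = rng.normal(size=n)                            # simulated latent propensity
# One-factor model for underlying continuous responses, dichotomised at zero
# to mimic recorded yes/no diagnoses.
y_star = np.outer(latent, loadings) + rng.normal(size=(n, 6)) * np.sqrt(1 - loadings**2)
indicators = (y_star > 0).astype(float)
share, corr = low_correlation_share(indicators)

# Unrelated indicators for contrast: their pairwise correlations fall below 0.20.
unrelated = rng.binomial(1, 0.5, size=(n, 6)).astype(float)
share_unrelated, _ = low_correlation_share(unrelated)
```

With a common factor, only a minority of pairs fall below 0.20; with unrelated indicators, essentially all do, which is the pattern that would argue against EFA.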
In EFA, each indicator is assigned a loading on each factor. A loading indicates how strongly the indicator is associated with the latent variable, and is used to determine which indicators can be used to estimate it. Indicators with a loading above 0.40 are conventionally considered acceptable, and ideally each indicator loads substantially onto only one latent variable.
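The 0.40 rule and the "one latent variable per indicator" preference can be expressed as a small classification step, sketched below. The loading matrix is entirely made up for illustration (it is not a result of this study), and the indicator names are hypothetical labels for the diagnoses listed in Figure 7.2.

```python
import numpy as np

def classify_indicators(loadings, names, cutoff=0.40):
    """Classify indicators by their EFA loadings (rows = indicators, columns = factors).

    kept:    loads above the cutoff on exactly one factor -> {name: factor index}
    cross:   loads above the cutoff on more than one factor
    dropped: loads above the cutoff on no factor
    """
    above = np.abs(loadings) > cutoff
    counts = above.sum(axis=1)
    kept = {name: int(np.argmax(above[r]))
            for r, name in enumerate(names) if counts[r] == 1}
    cross = [name for r, name in enumerate(names) if counts[r] > 1]
    dropped = [name for r, name in enumerate(names) if counts[r] == 0]
    return kept, cross, dropped

names = ["abdominal_pain", "migraine", "fatigue",
         "anxiety", "depressive_symptoms", "sleep_disorders"]
loadings = np.array([   # hypothetical two-factor loadings, for illustration only
    [0.62, 0.05],
    [0.55, 0.10],
    [0.35, 0.20],
    [0.08, 0.71],
    [0.45, 0.52],
    [0.12, 0.48],
])
kept, cross, dropped = classify_indicators(loadings, names)
```

Here `fatigue` would be modelled individually (no loading above 0.40) and `depressive_symptoms` would need a judgement call because it cross-loads.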
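To determine the number of factors (latent variables) to retain, I will assess the eigenvalues of the indicators' correlation matrix. The eigenvalue of a factor represents the amount of the indicators' total variance that the factor explains. I will use the Kaiser criterion to select the number of factors, which means retaining factors with an eigenvalue greater than 1 [Kaiser, 1958]; because each standardised indicator contributes a variance of 1, such a factor explains more variance than a single indicator.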
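A minimal sketch of the Kaiser criterion, assuming a correlation matrix is already available. The example matrix is hypothetical: two clusters of three indicators with a within-cluster correlation of 0.5 and a between-cluster correlation of 0.1. Its eigenvalues are 2.3, 1.7 and four values of 0.5, so two factors exceed 1.

```python
import numpy as np

def kaiser_n_factors(corr):
    """Number of factors to retain under the Kaiser criterion (eigenvalue > 1),
    together with the eigenvalues in descending order."""
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # eigvalsh returns ascending order
    return int(np.sum(eigenvalues > 1.0)), eigenvalues

# Hypothetical correlation matrix: two clusters of three indicators
r_within, r_between = 0.5, 0.1
corr = np.full((6, 6), r_between)
corr[:3, :3] = r_within
corr[3:, 3:] = r_within
np.fill_diagonal(corr, 1.0)

n_factors, eigenvalues = kaiser_n_factors(corr)
```

The eigenvalues of a correlation matrix sum to the number of indicators (here 6), which is why an eigenvalue of 1 is the natural reference point.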
I will estimate the EFA using an oblique factor rotation, because oblique rotations allow the factors to be correlated with one another.
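The chapter does not specify which oblique criterion will be used (Mplus, for example, uses Geomin by default for EFA). The sketch below uses promax instead, chosen only because it is short to implement: a varimax rotation followed by a least-squares fit to a "sharpened" target, which lets the factors become correlated. The unrotated loadings are hypothetical. The example shows the defining property of an oblique solution: it returns both a pattern matrix and a factor correlation matrix $\Phi$, and the common variance they reproduce ($P\Phi P^\top$) is the same as that of the unrotated loadings.

```python
import numpy as np

def varimax(loadings, max_iter=500, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and rotation matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        lam = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (lam**3 - lam @ np.diag(np.sum(lam**2, axis=0)) / p))
        R = u @ vt
        d_old, d = d, np.sum(s)
        if d_old != 0 and d < d_old * (1 + tol):
            break
    return loadings @ R, R

def promax(loadings, power=4):
    """Oblique promax rotation: varimax, then a least-squares fit to a target
    with loadings raised to `power` (sign kept). Returns (pattern, phi)."""
    rotated, R = varimax(loadings)
    target = rotated * np.abs(rotated) ** (power - 1)      # sign-preserving power
    U, *_ = np.linalg.lstsq(rotated, target, rcond=None)   # rotated @ U ~ target
    # Rescale columns so the rotated factors have unit variance
    U = U @ np.diag(np.sqrt(np.diag(np.linalg.inv(U.T @ U))))
    pattern = rotated @ U
    T = R @ U                                               # total transformation
    phi = np.linalg.inv(T.T @ T)                            # factor correlations
    return pattern, phi

unrotated = np.array([   # hypothetical unrotated two-factor loadings
    [0.70, 0.30], [0.65, 0.35], [0.60, 0.25],
    [0.55, -0.40], [0.60, -0.35], [0.50, -0.30],
])
pattern, phi = promax(unrotated)
```

The off-diagonal element of `phi` is the estimated correlation between the two factors; an orthogonal rotation such as varimax would force it to zero.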
After selecting the indicators for the latent variable(s), I will use exploratory structural equation modelling (ESEM)1 [Wall and Li, 2003, McArdle, 2009]. ESEM combines features of unrestrictive measurement models (EFA) with those of restrictive measurement models (confirmatory factor analysis, CFA). The measurement part of the model is exploratory: unlike in CFA, cross-loadings are not constrained to zero, so each latent factor may load on all observed indicators, while the relationships between the latent factors and the other variables follow a pre-specified structural model. Because this structural model is multivariate, I will be able to estimate direct and indirect effects simultaneously.
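The difference between the CFA and ESEM measurement parts can be illustrated through the model-implied correlation matrix $\Lambda\Phi\Lambda^\top + \Theta$, in a sketch that assumes standardised indicators and two correlated factors. All loadings and the factor correlation are hypothetical. In the CFA pattern each indicator loads on one factor and cross-loadings are fixed at 0; in the ESEM pattern every loading enters the model (identification comes from rotation, as in EFA), so small cross-loadings contribute to the correlations between indicator clusters. When true cross-loadings are forced to zero, those between-cluster correlations can only be reproduced through the factor correlation.

```python
import numpy as np

def implied_correlation(loadings, factor_corr):
    """Model-implied correlation matrix Lambda Phi Lambda' + Theta, with the
    residual variances Theta chosen so that every diagonal element equals 1."""
    common = loadings @ factor_corr @ loadings.T
    theta = np.diag(1.0 - np.diag(common))
    return common + theta

# Hypothetical 6 indicators, 2 factors.
# CFA: each indicator loads on one factor; cross-loadings fixed at 0.
cfa = np.array([[0.7, 0.0], [0.6, 0.0], [0.6, 0.0],
                [0.0, 0.7], [0.0, 0.6], [0.0, 0.5]])
# ESEM: same main loadings plus small, freely estimated cross-loadings.
esem = cfa + np.array([[0.0, 0.10], [0.0, 0.05], [0.0, 0.10],
                       [0.10, 0.0], [0.05, 0.0], [0.10, 0.0]])
phi = np.array([[1.0, 0.4], [0.4, 1.0]])   # hypothetical factor correlation

free_cfa = int(np.count_nonzero(cfa))      # 6 loadings estimated
free_esem = esem.size                      # all 12 loadings enter the model
sigma_cfa = implied_correlation(cfa, phi)
sigma_esem = implied_correlation(esem, phi)
```

For indicators 1 and 4, the CFA pattern implies a correlation of $0.7 \times 0.4 \times 0.7 = 0.196$, whereas the ESEM pattern implies a higher value because the cross-loadings add to it.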
It is likely that EFA will not retain all of the variables I have identified as potential indicators of internalizing behaviour. Indicators that do not load onto the latent variable(s) will be modelled individually. If no latent variable construct is appropriate, I will construct two binary variables indicating whether children had any of the indicators between ages 5 and 10 years (recurrent abdominal pain, migraine and fatigue) or between ages 10 and 13 years (anxiety, sleep disorders and depressive symptoms). In that case, I will use conventional SEM instead of ESEM, as there will be no latent variable constructs to estimate. I will adjust the analyses for the covariates described in the previous chapter (Townsend deprivation quintile, maternal and paternal age at birth, birth year, child gender, potential child maltreatment or neglect, parental illicit drug use, alcohol abuse and comorbidity). For simplicity, the covariates are not shown in the diagrams; their associations with the outcome are shown in Appendix D.1.
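The fallback binary variables can be sketched as follows, assuming diagnosis records are available as (child, age, diagnosis) tuples. The diagnosis labels are hypothetical stand-ins for the THIN code lists, and the age windows are treated as half-open intervals, [5, 10) and [10, 13); the chapter does not state how the boundary ages are handled, so this is an assumption. Children are taken from a separate cohort list so that those with no qualifying record are still coded 0.

```python
# Hypothetical diagnosis labels standing in for the THIN code lists.
EARLY_INDICATORS = {"recurrent_abdominal_pain", "migraine", "fatigue"}   # ages 5-10
LATE_INDICATORS = {"anxiety", "sleep_disorder", "depressive_symptoms"}   # ages 10-13

def internalizing_flags(child_ids, records):
    """Return {child_id: (any_early, any_late)} as 0/1 flags.

    records: iterable of (child_id, age_in_years, diagnosis_label).
    Age windows are treated as half-open, [5, 10) and [10, 13) (an assumption).
    Children with no qualifying record keep (0, 0).
    """
    flags = {child: [0, 0] for child in child_ids}
    for child, age, diagnosis in records:
        if child not in flags:
            continue  # ignore records for children outside the cohort
        if diagnosis in EARLY_INDICATORS and 5 <= age < 10:
            flags[child][0] = 1
        if diagnosis in LATE_INDICATORS and 10 <= age < 13:
            flags[child][1] = 1
    return {child: tuple(f) for child, f in flags.items()}

records = [
    ("c1", 6.5, "migraine"),
    ("c1", 11.0, "anxiety"),
    ("c2", 12.0, "fatigue"),        # outside the 5-10 window, so not counted
    ("c3", 10.5, "sleep_disorder"),
]
flags = internalizing_flags(["c1", "c2", "c3", "c4"], records)
```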
All data management and exploratory analyses will be performed using Stata SE version 12.1 (StataCorp, College Station, TX). EFA and (E)SEM analyses will be performed using Mplus version 7.0 (Muthén & Muthén, 2012).

1 Like SEM, which combines path analysis with confirmatory factor analysis, ESEM combines path analysis with a factor-analytic measurement model (here an exploratory one), and as such provides a method for describing the assumed causal relationships between observed variables that are themselves correlated. Traditional multiple regression can run into problems with interpretation and multicollinearity when multiple correlated predictor variables are considered, but this can be avoided with SEM. This method is particularly useful for analysing longitudinal repeated-measures data.