Multivariate Multilevel Models - Multilevel Longitudinal Models

2.3 Multilevel Longitudinal Models

2.3.2 Multivariate Multilevel Models

Longitudinal multilevel models, as described above, assume conditional independence between the repeated outcomes, after controlling for the explanatory variables and the random effects. However, in longitudinal panel data, it is expected that successive measurements within the same individual are correlated. For example, the income of a head of household measured in month t + 1 is expected to be correlated with their income measured in month t, conditional on the head of household's characteristics. This conditional correlation imposes a structure on the error covariance matrix, which is often of interest in the analysis of a longitudinal data set.

Multivariate multilevel models provide the appropriate tools for the analysis of a longitudinal data set where the error covariance matrix has no restrictions and is also of interest. This extends the general multivariate regression analysis, where a balanced data set with no missing observations is required (Longford, 1993). Multivariate multilevel models are also seen as an extension of the two-level growth curve model. However, time is now treated as a discrete variable. Therefore, a categorical variable for time is included in the model and each occasion is represented by a “dummy” variable. These occasion dummies are treated as having both fixed and random coefficients. Extra levels of the data hierarchy, in addition to the occasion and the individual levels, can also be considered in the model. The occasion dummies can be set to vary randomly at these higher levels as well. However, this is not a necessary setup. A potential disadvantage of the multivariate multilevel model is that, as it is equivalent to setting up one equation for each occasion response, it may not be suitable for panel data with a large number of occasions.

There are some advantages of fitting models to longitudinal data within the multivariate multilevel framework. For example, a balanced data set is no longer a requirement. The data can include individuals with different numbers of time points. Therefore, these models can be used to analyse rotating panel data and can be further constrained to accommodate the planned missing data, as done in Yang et al. (2002). However, they are not as flexible as the growth curve models when dealing with unequally spaced data (Fraine et al., 2005), as some error structures can only be considered for equally spaced data. The multivariate multilevel approach also handles missing data, as long as the assumption of missing at random holds.

As already mentioned, this approach does not assume that the repeated outcomes are conditionally independent (Griffiths et al., 2004). The error variance components are parameters of the model to be estimated. Moreover, constraints on these parameters can be made in order to impose different error covariance structures, such as those in subsection 2.3.1. However, here lies another disadvantage of the multivariate approach. Although the variance components are now estimated and they can also be constrained, they are no longer interpreted as cluster specific effects or individual specific effects as before (Snijders and Bosker, 1999).

The multivariate multilevel approach extends the random slope model in equation 2.21 so that each individual response in a given occasion t is considered as a component of a multivariate normally distributed random vector y_ij. These y_ij are simultaneously modelled under the multivariate multilevel model (Goldstein, 2003).

To make it clear, consider the same hierarchy as before, where occasions are nested within individuals, which are nested within clusters. The occasion level (subscript t, varying from 0 to T) defines the multivariate structure (Goldstein, 2003). The individuals are the level one (subscript i, varying from 1 to n_j) and clusters the level two (subscript j, varying from 1 to n). A multivariate multilevel model with only the time variable as covariate, treated as having both fixed and random effects at the individual level, can be written as:

\[ y_{tij} = d_{tij}^{T}\beta + d_{tij}^{T}v_{j} + d_{tij}^{T}u_{ij}. \tag{2.23} \]

In this model d_tij is the vector with the T occasion dummies. They are defined to indicate whether the row in the data set refers to the response at occasion t, taking the value 1 if it does and 0 otherwise (Snijders and Bosker, 1999). Note that this model can handle intermittently missing responses by setting all the elements of the dummy vector for the missing occasion to zero. The occasion dummies are associated with the vector of fixed regression coefficients

\[ \beta^{T} = (\beta_{1}, \beta_{2}, \ldots, \beta_{T}), \]

and are also associated with the vectors of random effects at both the individual level, u_ij, and the cluster level, v_j. Note that all the T dummies are included in the model. Therefore the model in equation 2.23 does not contain an intercept. This is a two-level multivariate model.
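The construction of the occasion dummies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `design_matrix`, the zero-based occasion labels and the values in `beta` are hypothetical and not taken from the source. A missing occasion is represented, as described above, by a row of zeros.

```python
def design_matrix(observed_occasions, n_occasions):
    """Build one dummy vector per occasion for a single individual.

    Row t has a 1 in position t if occasion t was observed; the row for
    a missing occasion is left as all zeros.
    """
    observed = set(observed_occasions)
    return [
        [1 if (k == t and t in observed) else 0 for k in range(n_occasions)]
        for t in range(n_occasions)
    ]


def fixed_part(rows, beta):
    """Fixed part of the model: each dummy row picks out its own coefficient."""
    return [sum(d * b for d, b in zip(row, beta)) for row in rows]


# Hypothetical individual observed at occasions 0, 1 and 3 (occasion 2 missing).
rows = design_matrix([0, 1, 3], n_occasions=4)
for row in rows:
    print(row)

# Made-up occasion means, used only to show how the dummies select them.
beta = [10.0, 11.0, 12.0, 13.0]
print(fixed_part(rows, beta))
```

Because every occasion has its own dummy, no separate intercept column is needed; the coefficient attached to each dummy plays the role of the occasion-specific mean.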

Now consider a balanced data set where the total number of occasions per individual is fixed and equal to four, and where the time points are labelled from 0 to 3. Here it is also assumed that:

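\[ v_{j} \sim MN(0, \Sigma_{v}) \quad \text{and} \quad u_{ij} \sim MN(0, \Sigma_{u}), \]

where

\[
\Sigma_{v} =
\begin{pmatrix}
\sigma^{2}_{v0} & & & \\
\sigma_{v10} & \sigma^{2}_{v1} & & \\
\sigma_{v20} & \sigma_{v21} & \sigma^{2}_{v2} & \\
\sigma_{v30} & \sigma_{v31} & \sigma_{v32} & \sigma^{2}_{v3}
\end{pmatrix}
\quad \text{and} \quad
\Sigma_{u} =
\begin{pmatrix}
\sigma^{2}_{u0} & & & \\
\sigma_{u10} & \sigma^{2}_{u1} & & \\
\sigma_{u20} & \sigma_{u21} & \sigma^{2}_{u2} & \\
\sigma_{u30} & \sigma_{u31} & \sigma_{u32} & \sigma^{2}_{u3}
\end{pmatrix}.
\]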

Defining Σr = Σv + Σu, the multivariate vector of responses for an individual is:

\[ y_{ij} \sim MN(D_{ij}\beta, \Sigma_{r}), \]

where D_ij is a matrix containing the vectors d_tij^T. The model in equation 2.23 is a fully multivariate multilevel model with a saturated covariance structure (Snijders and Bosker, 1999). Therefore, this model has no occasion-level variance estimated. This reflects an important assumption of the model, namely that there are no measurement errors in the repeated outcomes (Fraine et al., 2005). However, it is a necessary assumption to ensure model identification. This model also assumes that the effect of time varies randomly across clusters. An alternative model formulation would be to consider a common random intercept at the cluster level for all the occasions. This model can be written as:

\[ y_{tij} = d_{tij}^{T}\beta + v_{j} + d_{tij}^{T}u_{ij}. \tag{2.24} \]

Here it is assumed that

\[ v_{j} \sim N(0, \sigma^{2}_{v}) \quad \text{and} \quad u_{ij} \sim MN(0, \Sigma_{u}), \]

where σ_v^2 is a scalar and Σu is as defined above. Linear and non-linear constraints can be applied to the elements of Σu in order to express the different forms of correlation structures. Multivariate multilevel models can be fitted imposing these different structures. The IGLS and RIGLS methods, described in section 2.1.2, can also be used to estimate the multivariate multilevel model. These methods provide, for the fixed part of the model, statistically efficient parameter estimates and accurate standard errors. They also provide efficient estimates of the error variance components (Goldstein, 2003).
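To illustrate what constraining the elements of Σu amounts to, the sketch below builds two commonly used constrained covariance matrices and compares their number of free parameters with that of the unstructured matrix. It is an assumption that compound symmetry and a first-order autoregressive structure are among the structures of interest here; the function names and the numerical values are hypothetical.

```python
def compound_symmetry(sigma2, rho, n):
    """Equal variances and a single common correlation between any two occasions."""
    return [[sigma2 if s == t else rho * sigma2 for s in range(n)] for t in range(n)]


def ar1(sigma2, rho, n):
    """Equal variances; correlation decays as rho**|t - s| (equally spaced occasions)."""
    return [[sigma2 * rho ** abs(t - s) for s in range(n)] for t in range(n)]


def n_free_unstructured(n):
    """Number of distinct variances and covariances in an n x n covariance matrix."""
    return n * (n + 1) // 2


n_occasions = 4
print("unstructured parameters:", n_free_unstructured(n_occasions))
print("compound symmetry parameters: 2 (sigma2, rho)")
print("AR(1) parameters: 2 (sigma2, rho)")

# Illustrative values only.
for row in ar1(sigma2=2.0, rho=0.5, n=n_occasions):
    print(row)
```

With four occasions the unstructured matrix already has ten parameters at each level where the dummies are random, and the count grows quadratically with the number of occasions, whereas the two constrained forms keep two parameters regardless of the number of occasions.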

The multivariate model in equation 2.24 has no explanatory variables other than the time dummies but the inclusion of such variables is straightforward. This model still allows for the inclusion of different sets of covariates for the different levels of the data, including the occasion level. These variables can be further considered as having common or separate coefficients for each of the time points. Separate coefficients are produced by including interaction terms between the time dummies and the covariates, and they can be jointly tested for their significance in the model.
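The difference between common and separate coefficients can be made concrete with a small sketch of the fixed-part design row for one covariate. The function names, the zero-based occasion index and the covariate value are hypothetical and serve only as an illustration.

```python
def occasion_dummies(t, n_occasions):
    """Dummy vector with a 1 in the position of occasion t."""
    return [1.0 if k == t else 0.0 for k in range(n_occasions)]


def row_common(t, x, n_occasions):
    """Dummies followed by x itself: one coefficient for x shared by all occasions."""
    return occasion_dummies(t, n_occasions) + [x]


def row_separate(t, x, n_occasions):
    """Dummies followed by dummy-by-x interactions: one coefficient for x per occasion."""
    d = occasion_dummies(t, n_occasions)
    return d + [dk * x for dk in d]


# A row for occasion 2 (of four) with covariate value 1.5.
print(row_common(2, 1.5, 4))
print(row_separate(2, 1.5, 4))
```

In the second layout the covariate appears only in the column belonging to its own occasion, so each occasion receives its own coefficient, and the set of interaction coefficients can then be tested jointly.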

The same steps for model checking as those for univariate multilevel models are applied to the multivariate multilevel models. In addition, hypothesis testing for the significance of parameters of a given explanatory variable can be performed for each of the outcomes. In other words, some explanatory variables may be statistically significant at one occasion but not at others.
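As a minimal sketch of occasion-by-occasion testing, the code below computes a large-sample Wald z statistic and its two-sided p-value for the coefficient of one covariate at each occasion. It assumes approximate normality of the estimates; the estimates and standard errors are made-up numbers used purely for illustration, not results from any fitted model.

```python
import math


def wald_z_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical (estimate, standard error) pairs, one per occasion.
per_occasion = [(0.40, 0.10), (0.25, 0.12), (0.05, 0.10), (0.02, 0.11)]

for t, (est, se) in enumerate(per_occasion):
    z, p = wald_z_test(est, se)
    print(f"occasion {t}: z = {z:.2f}, p = {p:.4f}")
```

A joint test of the coefficients across occasions would additionally require their estimated covariance matrix, which is not shown here.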

This approach has not been widely explored in the statistical literature. Among the few examples, Yang et al. (2000) presented the multivariate approach for the analysis of a longitudinal data set on voting attitudes. They modelled a discrete response, comparing the general multilevel longitudinal model with the multivariate model. The models from both approaches were estimated via penalized quasi-likelihood estimation. Their multivariate model, a two-level model, had time dummies treated as random at both the cluster level and the individual level. Multivariate Wald tests were used to decide on the inclusion or exclusion of variables for each of the responses. Furthermore, the models tested were based upon a data set with a fixed number of occasions for each individual.

Barbosa and Goldstein (2000) presented a multilevel longitudinal model for discrete response assuming the responses within the same individuals were positively correlated. They used the same data example as Yang et al. (2000), trying to extend their models to accommodate unequal time points, but noted that in this case the multivariate multilevel approach could no longer be applied. Instead, Barbosa and Goldstein (2000) fitted one three-level longitudinal model and two time-series multilevel models as defined in Goldstein et al. (1994). In this definition, the time-series model is a multilevel longitudinal model where the level one variance is considered as a function of time through an autocorrelation function (first or second order). This allows for rather complex dependency structures of the level one residuals. In their analysis of the time-series models, Barbosa and Goldstein (2000) considered different autocorrelation functions and aimed to compare their results with the multivariate model in Yang et al. (2000).

In another article, Yang et al. (2002) applied the multivariate longitudinal framework to data under a non-random missing mechanism. This was made possible by setting up constraints in the covariance matrices of both levels considered in the analysis. Fraine et al. (2005) compared the longitudinal growth curve model with the multivariate multilevel model. Their models were applied to data on student well-being. They advocate the use of multivariate models when the number of time points is small. Their multivariate model considered an unstructured error covariance, and testing of different specifications for the covariance matrix was supported. Plewis (2005) also compared the multivariate multilevel model formulation with the growth curve models, for both continuous and binary outcomes. His findings under the different specifications were consistent. His multivariate models imposed restrictions on the error covariance matrix, and tests for other structures were also suggested.

In document Methods for analysing complex panel data using multilevel models with an application to the Brazilian labour force survey (Page 54-59)