

Chapter 3 General Overview of Statistical Methods

3.1 Multilevel and multivariate response multilevel modelling

This section briefly describes the general multilevel and multivariate response multilevel modelling methods.

3.1.1 Multilevel modelling

Multilevel modelling (Goldstein, 2003; Snijders and Bosker, 2012) is a methodology for analysing data that are clustered, hierarchical or nested in nature. The structure of such data introduces different sources of variability, and this section of the thesis focuses on variability arising from nesting, for instance patients nested within doctors. Here there is one source of variability between patients and another between doctors, and failing to distinguish them could lead to incorrect conclusions. In analysing such data, it is crucial to take the various levels of nesting into account because each level may be associated with variability that has a distinct interpretation.

Multilevel analysis has been used in many disciplines, including the behavioural, social, biomedical and life sciences. Robinson (1950) discussed the need to distinguish individual-level from group-level effects and gave a detailed account of the confusion between aggregate and individual-level effects, termed the ecological fallacy. Davis et al. (1961) also discussed the distinction between within-group and between-group regression analysis. For a detailed history of multilevel analysis, see the bibliographical sections of Longford (1993).


Let $i$ denote a level-one unit (e.g., an individual) and $j$ a level-two unit (e.g., the group to which the individual belongs). Also, let $Y_{ij}$ denote an outcome variable for individual $i$ in group $j$, and let $x_{ij}$ be an explanatory variable for individual $i$ in group $j$, which may relate directly to the individual or to the group the individual belongs to. A two-level random intercept variance components model in a multilevel framework can be presented as:

$$Y_{ij} = \beta_0 + U_{0j} + \beta_1 x_{ij} + \varepsilon_{ij} \tag{3.1}$$

with $U_{0j} \sim N(0, \sigma_u^2)$ and $\varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2)$. The regression coefficient $\beta_1$ is common to all groups, $\beta_0$ is the average intercept, $U_{0j}$ is the group-dependent deviation and $\varepsilon_{ij}$ is an individual-level residual.

The variance components $\sigma_u^2$ and $\sigma_\varepsilon^2$ are used to obtain the intra-class correlation coefficient (ICC), which for model (3.1) coincides with the variance partition coefficient (VPC), given as $\{\sigma_u^2 / (\sigma_u^2 + \sigma_\varepsilon^2)\} \times 100$, the percentage of the total variation attributable to the group level. Model (3.1) can be extended to more than two levels when further levels of grouping are available.
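As a small illustration of the ICC/VPC calculation for a two-level random intercept model, the sketch below computes the ratio of the group-level variance to the total variance. The function name and the numerical values are hypothetical and chosen only for illustration; they are not estimates from this thesis.

```python
def variance_partition_coefficient(sigma2_u, sigma2_e):
    """VPC (equal to the ICC in a random intercept model), as a proportion.

    sigma2_u: group-level (level-two) variance.
    sigma2_e: individual-level (level-one) residual variance.
    """
    if sigma2_u < 0 or sigma2_e < 0:
        raise ValueError("variance components must be non-negative")
    total = sigma2_u + sigma2_e
    if total == 0:
        raise ValueError("total variance must be positive")
    return sigma2_u / total


# Hypothetical values: between-doctor variance 0.5, between-patient variance 1.5.
vpc = variance_partition_coefficient(0.5, 1.5)
print(f"{100 * vpc:.1f}% of the variation lies at the group level")
```

With these illustrative values, a quarter of the total variation would be attributed to differences between groups.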

We can also extend model (3.1) by allowing the explanatory variable $x_{ij}$ to have a random coefficient at a level higher than the individual level, which is appropriate where the effect of $x_{ij}$ might differ significantly across the groups. This model is presented as:

$$Y_{ij} = \beta_0 + U_{0j} + \beta_1 x_{ij} + U_{1j} x_{ij} + \varepsilon_{ij} \tag{3.2}$$

The group-dependent coefficients $(U_{0j}, U_{1j})$ are assumed to be independent across $j$ and to follow a bivariate normal distribution with expected values $(0, 0)$ and covariance matrix given by $\mathrm{var}(U_{0j}) = \sigma_{u0}^2$, $\mathrm{var}(U_{1j}) = \sigma_{u1}^2$ and $\mathrm{cov}(U_{0j}, U_{1j}) = \sigma_{u01}$. For further discussion of multilevel modelling techniques, see Goldstein (1987, 2003) and Snijders and Bosker (2012).
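One consequence of the random coefficient model is that the group-level variance, and hence the share of variation at the group level, becomes a function of the explanatory variable: the group-level part of the outcome is the random intercept plus the random slope times x, whose variance is the intercept variance plus twice the covariance times x plus the slope variance times x squared. The sketch below evaluates this under the assumption that the individual-level residual is independent of the group-level terms. Function names and numerical values are hypothetical.

```python
def group_level_variance(x, s2_u0, s2_u1, s_u01):
    """Variance of (U0j + U1j * x) for a given value x of the covariate."""
    # A valid covariance matrix needs cov^2 <= var0 * var1.
    if s_u01 ** 2 > s2_u0 * s2_u1:
        raise ValueError("covariance is inconsistent with the variances")
    return s2_u0 + 2.0 * s_u01 * x + s2_u1 * x ** 2


def vpc_at(x, s2_u0, s2_u1, s_u01, s2_e):
    """Proportion of total variance at the group level, evaluated at x."""
    between = group_level_variance(x, s2_u0, s2_u1, s_u01)
    return between / (between + s2_e)


# Hypothetical variance components, for illustration only.
params = dict(s2_u0=1.0, s2_u1=0.25, s_u01=0.2, s2_e=2.0)
for x in (0.0, 1.0, 2.0):
    print(x, round(vpc_at(x, **params), 3))
```

At x = 0 the expression reduces to the random intercept case, so the single ICC of model (3.1) is recovered as a special case.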

3.1.2 Multivariate response multilevel modelling

Multivariate response data arise when two or more dependent variables (outcomes) are collected on the same individual. Note that the term 'multivariate' as used in this thesis refers to two or more outcome variables. The multivariate version of the multilevel model arises when the individuals on whom two or more outcomes were collected belong to groups, for example examination scores in mathematics and biology (level-one units) nested within students (level-two units) within schools (level-three units).

Multivariate response multilevel models (Snijders and Bosker, 2012; Thum, 1997) may be necessary when two or more outcomes are measured on individuals within groups and the researcher wishes to: draw conclusions about the degree to which the residual correlations depend on the individual and the group level; investigate the specific effect of a covariate across two or more outcomes; or conduct a single test of the joint effect of a covariate on two or more outcomes. This approach can also yield more accurate and reliable estimates than modelling the outcomes separately, especially when the outcomes are at least moderately correlated.

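A three-level multivariate response multilevel model with outcomes $Y_{ij}^{(1)}$ and $Y_{ij}^{(2)}$, where the superscripts (1) and (2) denote the first and second outcome variables for individual $i$ within group $j$, is of the form:

$$
\begin{aligned}
Y_{ij}^{(1)} &= \beta^{(1)} X_{ij} + U_j^{(1)} + \varepsilon_{ij}^{(1)} \\
Y_{ij}^{(2)} &= \beta^{(2)} X_{ij} + U_j^{(2)} + \varepsilon_{ij}^{(2)}
\end{aligned}
\tag{3.3}
$$

with,

$$
\begin{pmatrix} U_j^{(1)} \\ U_j^{(2)} \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{u(1)}^2 & \\ \sigma_{u(1,2)} & \sigma_{u(2)}^2 \end{pmatrix} \right), \qquad
\begin{pmatrix} \varepsilon_{ij}^{(1)} \\ \varepsilon_{ij}^{(2)} \end{pmatrix} \sim \mathrm{MVN}\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{\varepsilon(1)}^2 & \\ \sigma_{\varepsilon(1,2)} & \sigma_{\varepsilon(2)}^2 \end{pmatrix} \right)
$$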

where $X_{ij}$ is a covariate that can be defined at the child or household level, and $\beta^{(1)}$ and $\beta^{(2)}$ are vectors of regression coefficients for $Y_{ij}^{(1)}$ and $Y_{ij}^{(2)}$ respectively. The quantities $\varepsilon_{ij}^{(1)}$ and $\varepsilon_{ij}^{(2)}$ are the residuals at the individual level for $Y_{ij}^{(1)}$ and $Y_{ij}^{(2)}$ respectively, and the quantities $U_j^{(1)}$ and $U_j^{(2)}$ are the residuals at the group level for $Y_{ij}^{(1)}$ and $Y_{ij}^{(2)}$ respectively. In addition, the quantities $\sigma_{u(1,2)}$ and $\sigma_{\varepsilon(1,2)}$ are the covariances between $Y_{ij}^{(1)}$ and $Y_{ij}^{(2)}$ at the group and individual levels respectively. Note that there is no level-one variation specified because level one exists only to define the multivariate structure.
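To make the structure of the bivariate model concrete, the following sketch simulates data in which each group contributes a correlated pair of group-level residuals and each individual a correlated pair of individual-level residuals. It is a simplified illustration only: it assumes a single scalar covariate drawn from a standard normal distribution and no intercept, and every name and parameter value is hypothetical.

```python
import math
import random


def draw_pair(rng, var1, var2, cov):
    """Draw one zero-mean bivariate normal pair via a 2x2 Cholesky factor."""
    l11 = math.sqrt(var1)
    l21 = cov / l11
    l22 = math.sqrt(var2 - l21 ** 2)  # requires cov**2 < var1 * var2
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * z1, l21 * z1 + l22 * z2


def simulate_bivariate(n_groups, group_size, beta, group_cov, indiv_cov, seed=0):
    """Return rows (group, individual, x, y1, y2).

    beta: (beta1, beta2), one slope per outcome.
    group_cov, indiv_cov: (var1, var2, cov) at the group and individual level.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        u1, u2 = draw_pair(rng, *group_cov)      # shared by everyone in group j
        for i in range(group_size):
            x = rng.gauss(0.0, 1.0)              # assumed covariate distribution
            e1, e2 = draw_pair(rng, *indiv_cov)  # specific to individual i
            rows.append((j, i, x, beta[0] * x + u1 + e1, beta[1] * x + u2 + e2))
    return rows


rows = simulate_bivariate(
    n_groups=50, group_size=10, beta=(0.5, -0.3),
    group_cov=(0.4, 0.9, 0.3), indiv_cov=(1.6, 0.9, 0.6),
)
print(len(rows))  # 500
```

Drawing the group-level pair once per group is what induces the dependence between individuals in the same group, while the individual-level pair governs the dependence between the two outcomes of one individual.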

The corresponding population correlation coefficients (residual correlations) at the group ($\rho_2$) and individual ($\rho_1$) level are, respectively,

$$
\rho_2 = \frac{\sigma_{u(1,2)}}{\sqrt{\sigma_{u(1)}^2 \times \sigma_{u(2)}^2}}, \quad \text{and} \quad \rho_1 = \frac{\sigma_{\varepsilon(1,2)}}{\sqrt{\sigma_{\varepsilon(1)}^2 \times \sigma_{\varepsilon(2)}^2}}
$$

It is also possible to estimate the correlation between the observed outcomes $Y_{ij}^{(1)}$ and $Y_{ij}^{(2)}$ between individuals, which can be presented as:

$$
\rho\left(Y_{ij}^{(1)}, Y_{ij}^{(2)}\right) = \frac{\sigma_{u(1,2)} + \sigma_{\varepsilon(1,2)}}{\sqrt{\left(\sigma_{u(1)}^2 + \sigma_{\varepsilon(1)}^2\right) \times \left(\sigma_{u(2)}^2 + \sigma_{\varepsilon(2)}^2\right)}}
$$

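Furthermore, for a hypothetical group of size $n$, the correlation between group means can be obtained as:

$$
\rho\left(\bar{Y}_{j}^{(1)}, \bar{Y}_{j}^{(2)}\right) = \frac{\sigma_{u(1,2)} + \sigma_{\varepsilon(1,2)}/n}{\sqrt{\left(\sigma_{u(1)}^2 + \sigma_{\varepsilon(1)}^2/n\right) \times \left(\sigma_{u(2)}^2 + \sigma_{\varepsilon(2)}^2/n\right)}}
$$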
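The residual, observed-outcome and group-mean correlations described above can all be computed from the same six variance and covariance components, each as a covariance divided by the square root of the product of the two corresponding variances. The sketch below is a minimal illustration; the function names and the numerical values are hypothetical and are not estimates from this thesis.

```python
import math


def correlation(cov, var1, var2):
    """Correlation implied by a covariance and two variances."""
    return cov / math.sqrt(var1 * var2)


def group_mean_correlation(group, indiv, n):
    """Correlation between the group means of two outcomes for group size n.

    group, indiv: (var1, var2, cov) at the group and individual level.
    With n = 1 this is the correlation between the observed outcomes.
    """
    gv1, gv2, gc = group
    iv1, iv2, ic = indiv
    return correlation(gc + ic / n, gv1 + iv1 / n, gv2 + iv2 / n)


# Hypothetical components (var outcome 1, var outcome 2, covariance).
group = (0.4, 0.9, 0.3)
indiv = (1.6, 0.9, 0.6)

rho_group = correlation(group[2], group[0], group[1])   # group-level residual correlation
rho_indiv = correlation(indiv[2], indiv[0], indiv[1])   # individual-level residual correlation
rho_observed = group_mean_correlation(group, indiv, n=1)
rho_means = group_mean_correlation(group, indiv, n=10)
print(rho_group, rho_indiv, rho_observed, rho_means)
```

As the group size grows, the individual-level terms divided by n vanish and the correlation between group means approaches the group-level residual correlation.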

Further discussion on multivariate response multilevel modelling is available elsewhere (Snijders and Bosker, 2012; Thum, 1997).