Variance Component models - Multi-level Models

2.6 Multi-level Models

2.6.1 Variance Component models

In this section, we simultaneously examine the effects of individual-level and group-level factors on the risk of HIV infection. The data define a multilevel structure: sites are nested within states, and the states are grouped into zones. Information on HIV positivity is collected from individuals at the sites. Hence, it is possible to fit a three-level (site, state and zone) model or a two-level (site/state or state/zone) model, as the data are also aggregated by state. Langford et al. (112) extended the multilevel models developed by Goldstein (105), (107) to disease mapping. The simplest Poisson multilevel model incorporates a measure of the extra-Poisson variation as a higher-level variable. Given the covariates x_i, the logarithm of the Poisson mean µ_i is

log(µ_i) = log(E_i) + α + x^T_i β + u_i

where u_i are the heterogeneity effects, or extra-Poisson variation, caused by variation among the underlying populations at risk in the areas considered; log(E_i) is the offset, which accounts for the population at risk; and α is the intercept.

u_i ∼ N(0, σ_u)

and

log(θ_i) ∼ N(µ_i, σ_u)

where

µ_i = x^T_i β
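The single-level model above can be illustrated by simulation. The following is a minimal sketch rather than the procedure used in this work: the function name `simulate_site_counts`, the argument `var_u` and the example values are hypothetical, and σ_u is assumed to denote a variance (so its square root is used as the standard deviation).

```python
import numpy as np

def simulate_site_counts(E, X, alpha, beta, var_u, rng):
    """Draw O_i ~ Poisson(mu_i), log(mu_i) = log(E_i) + alpha + x_i^T beta + u_i."""
    E = np.asarray(E, dtype=float)
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Heterogeneity effects u_i ~ N(0, var_u); var_u is treated as a variance (assumption).
    u = rng.normal(0.0, np.sqrt(var_u), size=E.shape[0])
    log_theta = alpha + X @ beta + u      # log relative risk of each site
    mu = E * np.exp(log_theta)            # offset log(E_i) enters multiplicatively
    counts = rng.poisson(mu)
    return counts, np.exp(log_theta)

# Hypothetical example: five sites, one covariate.
rng = np.random.default_rng(0)
E = [12.0, 8.0, 20.0, 15.0, 5.0]
X = [[0.1], [0.4], [0.2], [0.9], [0.5]]
counts, theta = simulate_site_counts(E, X, alpha=-0.2, beta=[0.5], var_u=0.3, rng=rng)
```

Setting `var_u` to zero removes the extra-Poisson variation, so the relative risks reduce to exp(α + x^T_i β).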

This model can be extended to more than one higher level of geographical aggregation (107), (118), (112). For a two-level model with site i nested in state j, the observed HIV count in site i of state j becomes

O_ij ∼ Poisson(θ_ij E_ij)

And the log-linear model becomes

log(µ_ij) = log(E_ij) + α + x^T_ij β + u_ij + v_j

u_ij ∼ N(0, σ_u), v_j ∼ N(0, σ_v)

For a three-level model comprising site i nested in state j nested within zone k, we have

O_ijk ∼ Poisson(θ_ijk E_ijk)

and

log(µ_ijk) = log(E_ijk) + α + x^T_ijk β + u_ijk + v_jk + y_k

u_ijk ∼ N(0, σ_u), v_jk ∼ N(0, σ_v), y_k ∼ N(0, σ_y)

where u_ijk are the random effects for the sites, v_jk are the random effects for the states and y_k are the random effects for the zones.

Hence log(θ) ∼ MVN(µ, Σ), where Σ is a block-diagonal matrix comprising the variances of the three random effects due to site, state and zone, respectively.
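To make the structure of Σ concrete, the sketch below builds the covariance matrix of log(θ) implied by independent site, state and zone effects. It is an illustration under stated assumptions: the function name `log_risk_covariance`, the membership vectors and the variance values are hypothetical, and σ_u, σ_v and σ_y are taken to be variances.

```python
import numpy as np

def log_risk_covariance(state_of_site, zone_of_state, var_u, var_v, var_y):
    """Covariance of log(theta) for sites nested in states nested in zones."""
    state = np.asarray(state_of_site)
    zone = np.asarray(zone_of_state)[state]          # zone of each site
    same_state = (state[:, None] == state[None, :]).astype(float)
    same_zone = (zone[:, None] == zone[None, :]).astype(float)
    n = state.shape[0]
    # Site effects are independent; state and zone effects are shared within groups.
    return var_u * np.eye(n) + var_v * same_state + var_y * same_zone

# Hypothetical layout: sites 0-1 in state 0, site 2 in state 1 (both zone 0),
# site 3 in state 2 (zone 1).
Sigma = log_risk_covariance([0, 0, 1, 2], [0, 0, 1], var_u=1.0, var_v=0.5, var_y=0.25)
```

With the sites ordered by zone, entries linking sites in different zones are zero, which gives the block-diagonal form.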

The models we have considered so far are the variance component models.

The effects of the spatial distribution of the sites and the states were not taken into cognisance.

The variance component models can be fitted using quasi-likelihood, iterative generalized least squares (IGLS), the Fisher scoring algorithm or restricted iterative generalized least squares (RIGLS). A detailed account of the estimation algorithm for the multilevel model is given in Goldstein (105).

Iterative Generalized Least Squares

This method is based on generalized least squares which gives the maximum likelihood estimates for hierarchically structured data (105). A simple model of fixed and random effects (107),(184) is given as,

Y = Xβ + Zθ (2.58)

where Xβ is the fixed part and Zθ is the random part. Y is the observed vector of events being modelled by predictor variables X and the fixed parameters β, and the predictor variable Z with random coefficients θ. The design matrices X and Z need not be the same. Z may represent variables random at any level in the model.

The IGLS procedure is a two-stage process that estimates the fixed parameters and the variances and covariances of the random parameters in successive iterations. The first stage estimates the fixed parameters by ordinary least squares regression, taking the higher-level variances to be zero.

The vector of residuals from this initial model is then used to construct the initial values for the dispersion matrix V. The dispersion matrix is then used to estimate the fixed parameters by generalized least squares as

ˆβ = (X^T V⁻¹ X)⁻¹ X^T V⁻¹ Y (2.59)

Again, the vector of residuals is computed as

˜Y = Y − X ˆβ (2.60)

and we obtain the cross-product matrix of the residuals, Y^∗ = ˜Y ˜Y^T, such that

V = E(Y^∗) = E(˜Y ˜Y^T) (2.61)

We then stack the columns of the cross-product matrix into a vector,

Y^∗∗= vec( ˜Y ˜Y^T) = vec[(Y − X ˆβ)(Y − X ˆβ)^T] (2.62)

and Y^∗∗ is then used as the response variable in a regression equation to estimate the random parameters. The covariance of the random coefficients θ is estimated as

cov(ˆθ) = (Z^T (V^∗)⁻¹ Z)⁻¹ Z^T (V^∗)⁻¹ Y^∗∗ (2.63)

where V^∗ is the Kronecker product of V with itself, that is, V^∗ = V ⊗ V. Assuming multivariate normality, the estimated covariance matrix for the fixed parameters is

cov(ˆβ) = (X^T V⁻¹ X)⁻¹ (2.64)

Goldstein and Rasbash (108) gave the covariance matrix of the estimates of the random parameters as

cov(ˆθ) = 2(Z^T (V^∗)⁻¹ Z)⁻¹ (2.65)

Hence, the iterative procedure continues alternating between estimation of the fixed and random parameter vectors until convergence is achieved.
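The alternation between the fixed and random parts can be sketched for the simplest case, a two-level linear model with a random intercept, y = Xβ + u_group + e. This is an illustrative sketch only, not the software used in this work: the function name `igls_random_intercept` and the simulated data are hypothetical. It assumes that V is linear in the two variances, so that the random-part design consists of the vectorised group-indicator and identity matrices, and it uses the identity (V ⊗ V)⁻¹ vec(A) = vec(V⁻¹ A V⁻¹) to avoid forming V^∗ explicitly.

```python
import numpy as np

def igls_random_intercept(y, X, groups, max_iter=100, tol=1e-8):
    """IGLS sketch for y = X beta + u_group + e; returns beta, (var_u, var_e)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n = y.shape[0]
    # V = var_u * D_u + var_e * D_e, so V is linear in the random parameters.
    D_u = (groups[:, None] == groups[None, :]).astype(float)
    D_e = np.eye(n)
    designs = (D_u, D_e)

    # First stage: ordinary least squares with the higher-level variance set to zero.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    theta = np.array([0.0, resid @ resid / n])

    for _ in range(max_iter):
        V_inv = np.linalg.inv(theta[0] * D_u + theta[1] * D_e)
        # Fixed part by generalized least squares, equation (2.59).
        XtVi = X.T @ V_inv
        beta = np.linalg.solve(XtVi @ X, XtVi @ y)
        resid = y - X @ beta
        cross = np.outer(resid, resid)            # cross-product matrix Y*
        # Random part, equation (2.63), written with traces instead of V kron V.
        W = [V_inv @ D @ V_inv for D in designs]
        lhs = np.array([[np.sum(Wk * Dl) for Dl in designs] for Wk in W])
        rhs = np.array([np.sum(Wk * cross) for Wk in W])
        new_theta = np.linalg.solve(lhs, rhs)
        converged = np.max(np.abs(new_theta - theta)) < tol
        theta = new_theta
        if converged:
            break

    cov_beta = np.linalg.inv(XtVi @ X)            # equation (2.64)
    cov_theta = 2.0 * np.linalg.inv(lhs)          # equation (2.65)
    return beta, theta, cov_beta, cov_theta

# Hypothetical data: 30 groups of 4 observations, var_u = 1.0, var_e = 0.5.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(30), 4)
x = rng.normal(size=groups.size)
X = np.column_stack([np.ones_like(x), x])
y = (X @ np.array([1.0, 2.0]) + rng.normal(0.0, 1.0, 30)[groups]
     + rng.normal(0.0, np.sqrt(0.5), groups.size))
beta_hat, theta_hat, cov_beta, cov_theta = igls_random_intercept(y, X, groups)
```

The trace form keeps every matrix at size n × n; forming V ⊗ V directly would require an n² × n² matrix and is practical only for very small data sets.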

Fisher Scoring algorithm

This iterative technique is used to obtain the maximum likelihood estimate of the hyperparameters γ which are updated using estimates from the pth iteration as

γ^(p+1) = γ^(p) + [i^(p)]⁻¹ U^(p)

where i^(p) is the Fisher information matrix and U^(p) is the score statistic, both evaluated at γ^(p). See Breslow and Clayton (184) for more details on this estimation procedure.
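The update can be sketched generically as follows. The example is an assumption chosen for illustration, not the hyperparameter model of this section: a single parameter γ with O_i ∼ Poisson(E_i exp(γ)), for which the score is Σ(O_i − E_i exp(γ)), the information is Σ E_i exp(γ), and the maximum likelihood estimate has the closed form log(ΣO_i / ΣE_i). The function name `fisher_scoring` and the data are hypothetical.

```python
import numpy as np

def fisher_scoring(score, information, gamma0, max_iter=50, tol=1e-10):
    """Iterate gamma <- gamma + information(gamma)^-1 score(gamma)."""
    gamma = np.atleast_1d(np.asarray(gamma0, dtype=float))
    for _ in range(max_iter):
        step = np.linalg.solve(information(gamma), score(gamma))
        gamma = gamma + step
        if np.max(np.abs(step)) < tol:
            break
    return gamma

# Hypothetical observed and expected counts.
O = np.array([3.0, 7.0, 5.0, 12.0])
E = np.array([4.0, 6.0, 5.0, 9.0])

def score(g):
    # Derivative of the Poisson log-likelihood with respect to gamma.
    return np.array([np.sum(O - E * np.exp(g[0]))])

def information(g):
    # Expected negative second derivative (Fisher information).
    return np.array([[np.sum(E * np.exp(g[0]))]])

gamma_hat = fisher_scoring(score, information, gamma0=0.0)
```

Because the closed-form estimate is known here, the iteration can be checked against log(O.sum() / E.sum()).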

Penalized Quasi-likelihood (PQL)

Suppose that the observed number of cases (O_i) in each site follows the Poisson distribution with mean µ_i and that

log(µ_i) = log(E_i) + α + x_i β + u_i (2.66)

This equation implies a nonlinear (logarithmic) relationship between the observed number of cases O_i and the predictor part of the model. Hence, the normal distribution approximation does not directly apply here. To estimate the random parameters ˆu_i from the model, we use the penalized quasi-likelihood estimation procedure, which applies an approximate linearizing technique at each iteration using a first- and second-order Taylor series approximation. If

µ_i = f(H)

where H = α + x_i β + u_i, and if H_t is the value of the linear predictor H at iteration t, then f(H_{t+1}) is expressed as a function of H_t through a second-order Taylor expansion about the current fixed and random part estimates as

f(H_{t+1}) = f(H_t) + x_i(β_{t+1} − ˆβ_t) f′(H_t) + (u_{t+1,i} − ˆu_{t,i}) f′(H_t) + (u_{t+1,i} − ˆu_{t,i})² f″(H_t)/2 (2.67)

The first two terms on the right-hand side provide the updating function for the fixed part of the model and the last two terms are for the estimation of the random part. See Breslow and Clayton (184), Goldstein (107) and Goldstein and Rasbash (109) for a full description of the linearizing procedure. For the Poisson distribution

f(H) = f′(H) = f″(H) = exp(X_i ˆβ_t + ˆu_i) (2.68)
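A small numerical check illustrates how the expansion behaves for the Poisson case, where f and its first two derivatives all equal the exponential function. The function name `pql_linearized_mean` and the values of H_t, x_i and the parameter changes are hypothetical, chosen only to show the size of the approximation error.

```python
import math

def pql_linearized_mean(H_t, x_i, d_beta, d_u, second_order=True):
    """Right-hand side of the expansion (2.67) with f = exp.

    d_beta is beta_{t+1} - beta_t and d_u is u_{t+1,i} - u_{t,i}.
    """
    f = math.exp(H_t)                      # f = f' = f'' for the log link
    approx = f + x_i * d_beta * f + d_u * f
    if second_order:
        approx += d_u ** 2 * f / 2.0       # quadratic term in the random part
    return approx

H_t, x_i, d_beta, d_u = 0.5, 1.0, 0.01, 0.1
exact = math.exp(H_t + x_i * d_beta + d_u)
second = pql_linearized_mean(H_t, x_i, d_beta, d_u)
first = pql_linearized_mean(H_t, x_i, d_beta, d_u, second_order=False)
```

For these values the expansion with the quadratic term lies closer to the exact mean than the expansion without it; the gap between the two widens as the change in the random effect grows.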

Langford et al. (1999) gave the extension of the PQL procedure to spatial models as follows:

log(µ_i) = log(E_i) + α + x_i β + u_i + v_i (2.69)

The random parameters ˆu_i and ˆv_i are estimated from the model using the procedure outlined above, with

µ_i = f(H)

where H = α + x_i β + u_i + v_i and

f(H_{t+1}) = f(H_t) + (α_{t+1} − ˆα_t) + x_i(β_{t+1} − ˆβ_t) f′(H_t) + (u_{t+1,i} − ˆu_{t,i}) f′(H_t) + (v_{t+1,i} − ˆv_{t,i}) f′(H_t) + (u_{t+1,i} − ˆu_{t,i})² f″(H_t)/2 + (v_{t+1,i} − ˆv_{t,i})² f″(H_t)/2 (2.70)

The first three terms on the right-hand side provide the updating function for the fixed part of the model and the last four terms are for the estimation of the random part.

Marginal Quasi-likelihood (MQL)

The linearizing procedure given by equations (2.67) and (2.70) above can lead to convergence problems, or the model may fail, if the residuals are very large. To overcome this limitation, the MQL procedure (184), (107) can be adopted, whereby the second-order terms in equations (2.67) and (2.70) are omitted. In extreme cases, estimates can be based only on the fixed part of the model such that

H_t = X_i ˆβ_t

This procedure has the disadvantage of producing biased estimates when the sample size is small. However, the bias can be corrected using bootstrap procedures (107). Generally, the PQL procedure gives better estimates than the MQL procedure (107).

Restricted Iterative Generalized Least Squares

This is an extension of the IGLS. Like the maximum likelihood estimates, the IGLS estimates are biased. Goldstein (106) shows that a slight modification of the IGLS by restricting the model to take account of the sampling variations in the parameters can lead to unbiased estimates of the fixed and random parameters.

Given the general model,

Y = Xβ + Zθ

such that E[(Zθ)(Zθ)^T] = V and cov(ˆβ) = (X^T V⁻¹ X)⁻¹. Then

E[(Y − X ˆβ)(Y − X ˆβ)^T] = V − X cov(ˆβ) X^T = V − X(X^T V⁻¹ X)⁻¹ X^T

where X is the design matrix for the fixed effects in the model with full rank. V is then updated at each iteration using its current value ˆV as

V = (Y − X ˆβ)(Y − X ˆβ)^T + X(X^T ˆV⁻¹ X)⁻¹ X^T

The last term, X(X^T ˆV⁻¹ X)⁻¹ X^T, can be regarded as a bias correction term.

Under the assumption of multivariate normality this procedure is equivalent to restricted maximum likelihood.
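The expectation that motivates the bias correction can be verified numerically. With the generalized least squares estimate, the residual vector is MY, where M = I − X(X^T V⁻¹ X)⁻¹ X^T V⁻¹, so its covariance is M V M^T. The sketch below checks that this equals V − X(X^T V⁻¹ X)⁻¹ X^T for a small example; the function names, the design matrix and the variance values are hypothetical.

```python
import numpy as np

def rigls_bias_term(X, V):
    """Bias correction term X (X^T V^-1 X)^-1 X^T."""
    V_inv = np.linalg.inv(V)
    return X @ np.linalg.inv(X.T @ V_inv @ X) @ X.T

def expected_residual_cross_product(X, V):
    """Covariance of the GLS residuals Y - X beta_hat when cov(Y) = V."""
    n = X.shape[0]
    V_inv = np.linalg.inv(V)
    P = X @ np.linalg.inv(X.T @ V_inv @ X) @ X.T @ V_inv
    M = np.eye(n) - P                      # residual-forming matrix
    return M @ V @ M.T

# Hypothetical two-level layout: three groups of two observations.
groups = np.repeat(np.arange(3), 2)
X = np.column_stack([np.ones(6), np.arange(6.0)])
V = 0.5 * np.eye(6) + 1.0 * (groups[:, None] == groups[None, :])

lhs = expected_residual_cross_product(X, V)
rhs = V - rigls_bias_term(X, V)
```

Adding the correction term back to the expected residual cross-product recovers V, which is the adjustment RIGLS applies at each iteration.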

In document Modelling HIV/AIDS epidemic in Nigeria (Page 110-119)