Diagnostics for generalised linear mixed models

(1)

Diagnostics for generalised linear

mixed models

Sophia Rabe-Hesketh, Institute of Psychiatry, King’s College, London Anders Skrondal, Norwegian Institute of Public Health, Oslo

UK Stata Users’ Group Meeting

(2)

Outline

• Example: Longitudinal epileptic seizure count data • Influence

• Empirical Bayes (EB) prediction of higher-level residuals • Detecting outliers by cross-validation

(3)

Example: Longitudinal count data

• Famous epilepsy data from Thall & Vail (1990)

• 59 subjects j were randomized to receive progabide or placebo • Outcomes:

– Counts yij of epileptic seizures during the two weeks before each of

four clinic visits, i = 1, · · · , nj, nj = 4

• Between-subject covariates xj:

– [Lbas] The logarithm of a quarter of the number of seizures in the eight weeks preceding entry into the trial

– [Treat] Dummy variable for treatment group – [LbasTrt] Interaction between two variables above – [Lage] Logarithm of age

• Within-subject covariate zij:

(4)

Model and estimates

• Model II from Breslow & Clayton (1993)

yij ∼ Poisson(µij), ln(µij) = x0jβ + β5zij + uj, uj ∼ N(0, σ2) gllamm y lbas treat lbas_trt lage v4, i(subj) fam(poiss) nip(15) adapt gllamm, robust

Robust Est (SE) (SE) Fixed effects: β0 [Cons] 2.11 (0.22) (0.21) β1 [Lbas] 0.88 (0.13) (0.11) β2 [Treat] -0.93 (0.40) (0.40) β3 [LbasTrt] 0.34 (0.20) (0.20) β4 [Lage] 0.48 (0.35) (0.30) β5 [V4] -0.16 (0.05) (0.07) Random effect: σ 0.50 (0.06) (0.06) Log-likelihood -665.29

(5)

Influence of top-level unit j

• Influence on log-likelihood: Cook’s D Dj = −2s0jH

−1_s j,

• Dj can be interpreted as a quadratic approximation to twice the change

in log-likelihood when parameters are estimated with and without cluster j

– sj is the score vector (first derivatives of log-likelihood contribution)

for cluster j

– H is the Hessian of the total log-likelihood – In gllamm (using numerical derivatives):

(6)

Interpreting influence of top-level unit j

• Influence on particular parameter θp

DFBETASpj = b θp− bθp(−j) SE(bθp) , b

(7)

Influence for epilepsy data

DFBETAS Cook’s

Subj. [Base] y_j D [Treat] [V4] σ

Placebo 126 13.0 40 20 23 12 1.10 -0.02 0.51 0.02 135 2.5 14 13 6 0 1.52 0.39 0.40 -0.34 227 13.8 18 24 76 25 1.46 -0.14 0.39 -0.33 Progabide 207 37.8 102 65 72 63 1.68 0.58 0.24 -0.16 225 5.5 1 23 19 8 1.05 -0.23 0.18 -0.45 232 3.3 0 0 0 0 1.57 0.34 0.00 -0.44

Mean over all subjects

(8)

• [Treat]

– Deleting subjects with large counts in placebo group (135) and small counts in progabide group (232) will diminish the negative treatment effect

=⇒ positive DFBETAS

– Deleting subjects with small counts in placebo group and large counts in progabide group (225) will increase the negative treatment effect =⇒ negative DFBETAS

– Subject 207 is complicated; due to the lage baseline value, this subject is responsible for the positive coefficient of [LbasTrt] with a DFBETAS of -0.71 (the coefficient becomes nearly 0)

• [V4]: Subjects 126, 135 and 227 have a large drop at visit 4, so that deleting them will diminish the negative coefficient of [V4]

=⇒ positive DFBETAS

• σ: Deleting subjects with extreme counts, relative to baseline, (large: 135, 227, 225; small: 232) will decrease σ

(9)

Estimation using adaptive quadrature

• Likelihood contribution for cluster j by Gaussian quadrature:

`j(β, σ) = Z φ(uj; 0, σ) Y i f (yij | uj; β) | {z } ∝ posterior of uj duj ≈ R X r=1 Wr Y i f (yij | σAr; β)

– Ar: Quadrature locations – Wr: Quadrature weights

• Adaptive quadrature: `j(β, σ) ≈ R X r=1 ωjr Y i f (yij | σαjr; β)

– αjr: Adaptive quadrature location: uej+ τjAr ∗ _euj: Posterior mean of uj

=⇒ Locations shifted to posterior mean ≈ peak of integrand ∗ τj: Posterior standard deviation of uj

=⇒ Locations scaled by posterior sd ≈ width of peak – ωjr: Adaptive quadrature weights:

√

(10)

Adaptive quadrature

Quadrature

Adaptive quadrature

(11)

Empirical Bayes using adaptive quadrature

• Posterior mean and variance given y_j with bβ and _bσ plugged in

e uj = E[uj | yj, xj; bβ,σ] =b R ujφ(uj; 0,σ)b Q if (yij | uj; bβ)duj `j(bβ,bσ) τ_j2 = var[uj | yj, xj; bβ,σ] =b R u2 jφ(uj; 0,bσ) Q if (yij | uj; bβ)duj `j(bβ,bσ) −u_e2_j

• Adaptive quadrature (in gllamm; similar to Naylor & Smith, 1988) – Start with u_e0_j = 0 and τ_j0 = 1

– In iteration k (between NR steps):

`j(bβ,bσ) k ₌ R X r=1 wk−1_jr Y i f (yij |bσα k−1 jr ; bβ) e uk_j = PR r=1(bσα k−1 jr )w k−1 jr Q if (yij |bσα k−1 jr ; bβ) `j(bβ,bσ) k (τ_jk)2 = PR r=1(bσα k−1 jr )2w k−1 jr Q if (yij |bσα k−1 jr ; bβ) `j(bβ,σ)b k − (ue k j)2

(12)

Variances for EB prediction & approximations

• Posterior variance (by numerical integration): var[uj | yj, xj; bθ]

• Marginal sampling variance:

ν_j2 _{≡ vary[e}uEB_j | xj; bθ] ≈ bσ

2 _{− τ}2 j

‘Diagnostic’ variance

• Prediction error variance (marginal):

vary[euEB_j −uj | xj; bθ] ≈ τ_j2

(13)

Deletion residuals

• A large true residual will lead to a larger estimate of the random effects variance, making the residual appear more consistent with the model • To avoid this problem, estimate EB residuals _euj(−j) using parameter

es-timates bθ_(−j) when the jth top-level cluster is deleted

e

u_j(−j) = E[uj | yj, xj; bθ(−j)]

• Standardised deletion residual e u_j(−j) νj(−j)

• In multilevel models, delete the top-level cluster to derive deletion resid-uals for all lower-level units in that cluster

(14)

EB prediction in

gllamm

• Raw and standardised residuals:

gllapred res_, u /* posterior mean and sd in res_m1 res_s1 */ gllapred stres_, ustd /* stres_m1 = _euj/νj */

• Deletion residuals:

gllamm ... if subj~=126, i(subj) from(a) ...

gllapred dres if subj==126, u fsample /* fsample to include 126 */ gllapred dstres if subj==126, ustd fsample

(15)

Level-2 residuals for epilepsy data

DFBETAS Subj. σ uej(−j) νj(−j) e uj νj uej Placebo 126 0.02 1.04 0.89 0.44 135 -0.34 2.23 1.97 0.93 206 -0.32 -2.11 -1.91 -0.88 227 -0.33 2.19 1.93 0.96 Progabide 207 -0.16 1.97 1.37 0.69 112 -0.32 2.25 2.07 1.01 225 -0.46 2.47 2.26 1.09 232 -0.44 -2.92 -2.77 -0.97

(16)

Cross-validation by simulation

• Obtain sampling distribution of deletion statistic S_j(−j) for cluster j under null hypoth-esis that the responses for cluster j come from the same distribution as for remaining clusters (Similar to Marshall & Spiegelhalter, 2001):

– For cluster j, simulate new responses yk_j from the model with parameters bθ_(−j) – Obtain the statistic S_j(−j)k for the simulated responses

• Stata commands for simulating standardised deletion residuals under null hypothesis:

postfile file res using delres, replace forvalues i=1/1000 {

gllasim y1 if subj==126, fsample /* simulate new responses */ replace y = y1 if subj==126

gllapred b if subj==126, ustd fsample /* simulated std. del. res. */ summ bm1

post file (r(mean)) drop y1 bm1 bs1 }

postclose file

(17)

Cross-validation results

Std. Deletion Residual euj(−j)

νj(−j) Del. Log-likelihood `j(−j)

Power α = 0.05 Power α = 0.05

Subj. Obs. p-value uj= −1 uj= 1 Obs. p-value uj= −1 uj= 1

Placebo 126 1.04 0.314 0.43 0.58 -19.1 0.005 0.00 0.55 135 2.23 0.026 0.26 0.47 -20.1 0.001 0.00 0.49 206 -2.11 0.058 0.33 0.44 -19.4 0.004 0.00 0.52 227 2.20 0.026 0.38 0.69 -39.9 0.001 0.01 0.63 Progabide 207 1.98 0.068 0.50 0.40 -21.3 0.004 0.01 0.58 112 2.25 0.028 0.49 0.68 -13.8 0.043 0.00 0.63 225 2.47 0.020 0.35 0.46 -26.4 0.001 0.00 0.50 232 -2.92 0.002 0.25 0.57 -6.4 0.821 0.00 0.57

(18)

Conclusions

• Adaptive quadrature can be used to obtain reliable estimates and empirical Bayes pre-dictions

• Cook’s distances and DFBETAS are useful for identifying influential top-level clusters • Standardized residuals (and their deletion counterparts) can flag potential outliers at

any level

• Cross-validation is a useful method for testing for outliers/influential units at any level. This method is feasible for applications since the parameters do not need to be re-estimated in each simulation

• All diagnostics discussed, as well as simulations, are available in gllamm (from next update after 20 May 2003)

• gllamm can also be used to compute expected counts for categorical data. If there is a moderate number of response and covariate patters, these can be used to obtain the deviance, Pearson X2 and various residuals

• gllamm can be downloaded from:

(19)

References to our work

• Generalized multilevel structural equation modeling. Psychometrika, in press. (S.Rabe-Hesketh, A.Skrondal & A.Pickles).

• Generalized latent variable modeling: Multilevel, longitudinal and struc-tural equation models. Boca Raton, FL: Chapman & Hall/ CRC, to appear. (A.Skrondal & S.Rabe-Hesketh).

• GLLAMM Manual. London: Institute of Psychiatry, 2001. (S.Rabe-Hesketh, A.Pickles & A.Skrondal).

• Reliable estimation of generalized linear mixed models using adaptive quadrature. The Stata Journal, 2: 1-21, 2002.

(S.Rabe-Hesketh, A.Skrondal & A.Pickles).

• Correcting for covariate measurement error in logistic regression using nonparametric maximum likelihood estimation. Statistical Modelling, in press. (S.Rabe-Hesketh, A.Pickles & A.Skrondal).

(20)

References

• Breslow & Clayton (1993). Approximate inference in generalized linear mixed models. JASA 88, 9-25.

• Marshall C. & Spiegelhalter D. (2001). Simulation-based tests for diver-gent behaviour in hierarchical models. Submitted.

• Naylor & Smith (1988). Econometric illustrations of novel numerical integration strategies for Bayesian inference. Journal of Econometrics 38, 103-125.

• Thall & Vail (1990). Some covariance models for longitudinal count data with overdispersion. Biometrics 46, 657-671.