• No results found

Introduction to mixed model and missing data issues in longitudinal studies

N/A
N/A
Protected

Academic year: 2021

Share "Introduction to mixed model and missing data issues in longitudinal studies"

Copied!
30
0
0

Loading.... (view fulltext now)

Full text

(1)

Introduction to mixed model and missing data issues in longitudinal studies

Hélène Jacqmin-Gadda

INSERM, U897, Bordeaux, France

Inserm workshop, St Raphael

(2)

Outline of the talk I

Introduction

Mixed models

Typology of missing data

Exploring incomplete data

Methods MAR data

Conclusion

(3)

Longitudinal data : definition

Definition :

Variables measured at several times on the same subjects Examples :

repeated measures of biological markers (CD4, HIV RNA) in HIV patients

repeated measures of neuropsychological tests to study cognitive aging

Repeated events : dental caries, absences from school or job, ...

(4)

Longitudinal data analysis

Objective :

Describe change of the variable with time

Identify factors associated with change

Problem : Intra-subject correlation

(5)

Example : HIV clinical trial

Xi=1 if treatment A, Xi=0 if treatment B

Criterion : Change over time of CD4

Repeated measures of CD4 over the follow-up period.

t= 0at initiation of treatment.

Yij=CD4 measure for subjectiat timetij,i= 1, ..., N, j= 1, ..., ni.

(6)

Analysis assuming independence

Yij= β0+ β1tij+ β2Xi+ β3Xitij+ ǫij

withǫij∼ N (O, σ2)andǫij⊥ ǫij

Intra-subject correlation

Var( ˆˆ β)biased

→ Tests forβ biased

For time-independent covariate :

var( ˆβ2)under-estimated

Tests forH0: β2 = 0anti-conservative (p value too small)

(7)

Linear mixed model with random intercept

Yij= (β00i) + β1tij+ β2Xi+ β3Xitij+ ǫij

withγ0i∼ N (O, σ02), andǫij∼ N (O, σ2)andǫij⊥ ǫij

γ0iare random variables

Only one additional parameter :σ02

(8)

Linear mixed model with random intercept (2)

Population (marginal) mean :

E(Yij) = β0+ β1tij+ β2Xi+ β3Xitij

Subject-specific (conditional) mean :

E(Yij0i) = (β00i) + β1tij+ β2Xi+ β3Xitij

Assume common correlation between all the repeated measures

(9)

Linear mixed model with random intercept and slope

Yij= (β00i) + (β11i)tij+ β2Xi+ β3Xitij+ ǫij, γ0i ∼ N (O, σ20),γ1i∼ N (O, σ12),ǫij∼ N (O, σ2),ǫij⊥ ǫij

Population (marginal) mean :

E(Yij) = β0+ β1tij+ β2Xi+ β3Xitij

Subject-specific (conditional) mean :

E(Yiji)= (β00i) + (β11i)tij+ β2Xi+ β3Xitij

The correlation between repeated measures depend on measurement times

(10)

Linear mixed model : general formulation

Yij= XijTβ+ ZTijγi+ ǫij γi∼ N (0, B)andǫi∼ N (0, Ri).

Xij: vector of explanatory variables β: vector of fixed effects

Zij: sub-vector ofXij(including functions of time) γi : vector of random effects.

Population (marginal) mean :E(Yij) = XTijβ

Subject-specific (conditional) mean :E(Yiji) = XijTβ+ ZTijγi

(11)

Linear mixed model : example

Linear mixed model with AR Gaussian error Yij= (β00i) + (β11i)tij+ β2Xi+ β3Xitij+wij+ eij

withγit= (γ0i, γ1i)∼ N (0, B), eij∼ N (O, σ2),eij⊥ eij,

wij∼ N (O, σw2)andCorr(wij, wij) = exp(−δ|tij− tij|)

(12)

Linear mixed model : Estimation

Maximum likelihood estimator

Yi = (Yi1, ..., Yij, ..., Yini)T multivariate Gaussian with

meanXiβ

and covariance matrixVi= ZiBZiT+ Ri

Softwares : SAS Proc mixed, R lme, stata

(13)

Generalized linear mixed model

Yij∼exponential family of distribution and

g(E(Yiji)) = XijTβ+ ZijTγi withγi ∼ N (O, B).

Example : Logistic mixed model

logit(Pr(Yij= 1|γi)) = XijTβ+ ZijTγiwithγi ∼ N (0, B).

Maximum likelihood estimation : Numerical integration

Softwares : SAS Proc nlmixed, R nlme, stata

(14)

Typology of missing data in longitudinal studies

Notation :

Yi= (Yobs,i, Ymis,i)

withYobs,i the observed part ofYi andYmis,ithe missing part, Rij= 1ifYijis observed andRij= 0ifYijis missing

Ri= (Ri1, ..., Rij, ..., Rini)

Xiexplanatory variables completely observed

(15)

Typology of missing data (2)

Monotone missing data = dropout :P(Rij= 0|Rij−1 = 0) = 1 Rimay be summarized by the time to dropoutTi

and an indicator for dropoutδi

Intermittent missing data :P(Rij= 0|Rij−1= 0) < 1

(16)

Typology of missing data (3)

Missing Completely at random (MCAR) : P(Rij= 1)is constant

The observed sample is representative of the whole sample.

→Loss of precision, no bias Covariate-dependent missingness process : P(Rij= 1) = f (Xi)

→Loss of precision, no bias if analyses are adjusted onXi

(17)

Typology of missing data (4)

Missing at random (MAR) : P(Rij= 1) = f (Yobs,i, Xi)

Example : Probability of dropout depends on past observed values

→Loss of precision, no bias with appropriate statistical methods Informatives or MNAR : P(Rij= 1) = f (Ymis,i, Yobs,i, Xi)

Example : Probability thatY be observed depends on currentY value

→Loss of precision, biases

→Sensitivity analyses

(18)

Exploring incomplete data

Describe missing data frequency

Cross classify missing data patterns with covariates

Compare mean evolution for available data and complete cases

Compare mean evolution until timetgiven observation status at timet+ 1

Logistic regression forP(Rij= 1)given covariates and Yik, k < j

Cox regression for time to dropout given covariates

→Impossible to distinguish MAR from MNAR

(19)

An example : Paquid data set

The Paquid Cohort in Gironde

2792 subjects of 65 years and older at baseline

Living at home at the beginning of the study (1988) in Gironde (France)

Seen at home at 1, 3, 5, 8, and 10 years after the baseline visit

Cognitive measure : Digit Symbol Substitution Test of Wechsler (attention, limited time to 90s)

Sample :

2026 subjects

without diagnosis of dementia between T0 and T10

with the test completed at least once (at T0)

(20)

Description of dropout : Kaplan-Meyer

Dropout time (=event): first visit with missing score Probability to be in the cohort

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

0 1 5 8 10

Probability

Follow-up time

95% confidence interval Kaplan-Meyer estimate

(21)

Observed means of the DSST score given time

10 15 20 25 30 35 40

65-69 years 70-74 75-79 80 and +

Score

Age

Available data

(22)

Observed means of the DSST score given time

10 15 20 25 30 35 40

65-69 years 70-74 75-79 80 and +

Score

Age

Complete data Available data

(23)

Logistic regression model for dropout in the first 5 years

Covariates OR 95% CI of the OR

T3 0.02 0.003 - 0.10

T5 0.01 0.001 - 0.09

age 1.01 0.99 - 1.02

age×T3 1.05 1.03 - 1.08 age×T5 1.06 1.03 - 1.09 previous MMSE score 0.91 0.88 - 0.93

men 0.86 0.75 - 0.99

Education (vs university level)

No education 1.88 1.15 - 3.07 no diploma 2.02 1.39 - 2.93

CEP 1.67 1.17 - 2.40

high school level 1.39 0.96 - 2.00

(24)

Methods for MCAR or MAR data

Complete case analysis (loss of precision, require MCAR)

Imputation (require MCAR or MAR)

Maximum likelihood using available data (require MAR)

(25)

Maximum likelihood for MAR data (1)

Objective :Estimateθfrom the distributionf(Y|θ) Likelihood of the observed data :Yobs, R

f(Yobs, R|θ, ψ) = Z

f(Yobs, Ymis|θ)f (R|Yobs, Ymis, ψ)dYmis

(26)

Maximum likelihood for MAR data (2)

If the data are MAR : f(Yobs, R|θ, ψ) =

Z

f(Yobs, Ymis|θ)f (R|Yobs, ψ)dYmis

= f (R|Yobs, ψ) Z

f(Yobs, Ymis|θ)dYmis

= f (R|Yobs, ψ)f (Yobs|θ)

Log-likelihood :

l(θ, ψ|Yobs, R) = l(θ|Yobs) + l(ψ|R, Yobs) Ifψandθare distinct :

→the missing data areignorable

→θis estimated by maximisation ofl(θ|Yobs)using only available reponses.

(27)

Example : MAR analysis of Paquid data

Mixed effect model

Yijtest score for subjectiat timetij

Yij= (β0+ ageiγ0+ α0i) + (β1+ ageiγ1+ α1i) × tij+ β3I{tij=0}+ eij with

αi = (α0iα1i)T∼ N(0, G),eij∼ N 0, σe2

ageivector of indicators for baseline age classes (70-74, 75-79, 80 years and older , ref= 65-69)

I{tij=0} indicator of the baseline visit

(28)

Observed and predicted means of the score given time

10 15 20 25 30 35 40

65-69 years 70-74 75-79 80 and +

Score

Age

Complete data Available data Mixed model (MAR)

(29)

Conclusion

Advantages of mixed models

use all the available information (repeated measures)

Flexibly handle intra-subject correlation (unbiased inference)

Any number and times of measurements

Robust to missing at random data

Available in most softwares Limits of mixed models

Assume homogeneous population

−→extended models included latent classes(mixture)

As the MAR assumption is uncheckable, complete the study by a sensitivity analysis

−→extended models for MNAR data

(30)

References

Chavance, M. et Manfredi R.Modélisation d’observation incomplètes. Revue d’Epidémiologie et Santé Publique 2000,48,389-400.

Diggle PJ, Heagerty P, Liang KY, Zeger SL.Analysis of Longitudinal Data.2nd Edition. Oxford Statistical Science series 2002, Oxford University Press.

Jacqmin-Gadda H, Commenges D, Dartigues JF.Analyse de données longitudinales gaussiennes comportant des données manquantes sur la variable à expliquer. Revue d’Epidémiologie et Santé Publique 1999, 47,525-534.

Little R.J.A. et Rubin D.B.Statistical Analysis with Missing Data, New York : John Wiley & Sons, 1987.

Verbeke G and Molenberghs GLinear mixed models for longitudinal data . Springer Series in Statistics, Springer-Verlag,2000, New-York.

References

Related documents

Given low utilization rates and poor quality of services in existing public facilities, UNICEF concluded in its National Primary Health Management Information Report

They propose a generalized moments (GM) estimator for the parameters of the spatial regressive error process in a homoskedastic random effects panel data model without

The first step of the reflective process is describing current or recent teaching responsibilities, goals, activities, and acquisition of outcome evidence3. It answers

Lack of time to analyse energy- saving potentials (TIME) is found to be statistically signi fi cant in two of the fi ve sub-sectors falling under public and private services

2.2.2.3 On lines fitted with extra strength magnets, the trainborne equipment shall be capable of detecting the minimum field strengths set out in Table 9, measured in free air at

values are normalized such that the max is unity and the minimum is zero. The contours take steps of 0.1.. The forces are normalized such that the max is unity and the minimum is

The percent- ages of blacks, Hispanics, Asians, other race students, males, free-lunch eligible students, special education students, English language learners, and students

20 Years of Work with Composers John Cage Christian Wolff Jennifer Walshe Saskia Moore James Saunders Christian Marclay Laurence Crane Tim Parkinson Claudia Molitor Christopher