6.2 Model specification
6.3 Inference and considerations for missing data
6.4 Best linear unbiased prediction and empirical Bayes
6.5 Implementation via the EM algorithm
Population-averaged linear models for continuous response:
• Appropriate when questions of scientific interest are questions about features of population mean response profiles
• Population mean response represented directly by a linear model
• Overall aggregate covariance matrix of a response vector is modeled directly
• The aggregate pattern of variance and correlation may be sufficiently complex that standard models cannot represent it
• With different $n_i$ and/or different time points for different individuals, exploration and implementation of covariance structure is problematic
General class of models and methods for continuous response:
• Subject-specific perspective ⇒ Linear mixed effects model
• Does not require balance: responses need not be at the same $n$ time points
• Individual inherent response trajectories represented by a linear model incorporating within-individual covariates
• Individual-specific features of trajectories represented by a linear model incorporating among-individual covariates
• Within- and among-individual sources of variance and correlation explicitly acknowledged and modeled separately
• Appropriate when questions of scientific interest are questions about “typical” or average features of individual trajectories; e.g., the “typical slope”
Linear models: For inherent trajectories and features
• The subject-specific model implies a linear model for overall population mean response
• And induces a model for the overall aggregate covariance matrix
• ⇒ Induces a linear population-averaged model
• The model is a relevant framework for addressing either subject-specific or population-averaged questions of interest
• The induced covariance structure ameliorates the problems with direct specification of overall pattern and implementation with unbalanced data suffered by a population-averaged model ⇒ great flexibility
Implementation: Because a population-averaged model is induced
• Fitting using maximum likelihood under the assumption of normality and REML
• Same large sample theory applies for approximate inference
• Same issues/approaches in the presence of missing data
6.2 Model specification
Recall: The observed data are
$(Y_i, z_i, a_i) = (Y_i, x_i)$, $i = 1, \ldots, m$, independent across $i$
• $Y_i = (Y_{i1}, \ldots, Y_{in_i})^T$, $Y_{ij}$ recorded at time $t_{ij}$, $j = 1, \ldots, n_i$ (possibly different times for different individuals)
• $z_i = (z_{i1}^T, \ldots, z_{in_i}^T)^T$ comprising within-individual covariate information $u_i$ and the $t_{ij}$
• $a_i$ = collection of among-individual covariates
• All covariates $x_i = (z_i^T, a_i^T)^T$
• $\tilde{x}$ = collection of all $x_i$, $i = 1, \ldots, m$
Linear mixed effects model: Basic form
$$Y_i = X_i\beta + Z_i b_i + e_i, \quad i = 1, \ldots, m$$
• $X_i$ $(n_i \times p)$ and $Z_i$ $(n_i \times q)$ are design matrices incorporating individual $i$'s covariates $x_i$ and time
• $\beta$ $(p \times 1)$ is referred to as the fixed effects parameter
• $b_i$ $(q \times 1)$ is a vector of random effects characterizing among-individual behavior, independent across $i$
• Standard assumption: $b_i \perp\!\!\!\perp x_i$
$E(b_i | x_i) = E(b_i) = 0$, $\mathrm{var}(b_i | x_i) = \mathrm{var}(b_i) = D$, $b_i \sim \mathcal{N}(0, D)$
where $D$ characterizes among-individual variance and correlation
• Can be relaxed to dependence on among-individual covariates $a_i$:
$E(b_i | x_i) = 0$, $\mathrm{var}(b_i | x_i) = \mathrm{var}(b_i | a_i) = D(a_i)$, $b_i | x_i \sim \mathcal{N}\{0, D(a_i)\}$
• Within-individual deviation $e_i = (e_{i1}, \ldots, e_{in_i})^T$ represents aggregate effects of within-individual realization and measurement error, independent across $i$
• Standard assumption: $e_i \perp\!\!\!\perp b_i, x_i$
$E(e_i | x_i, b_i) = E(e_i) = 0$, $\mathrm{var}(e_i | x_i, b_i) = \mathrm{var}(e_i) = R_i(\gamma)$, $e_i \sim \mathcal{N}\{0, R_i(\gamma)\}$; can be relaxed to allow dependence of $e_i$ on $x_i$ and $b_i$
• $(n_i \times n_i)$ covariance matrix $R_i(\gamma)$, parameters $\gamma$ (often written $R_i$ for brevity)
• Default assumption (often made without adequate thought): $R_i(\gamma) = \sigma^2 I_{n_i}$, $\gamma = \sigma^2$, for all $i = 1, \ldots, m$
Conceptual framework: Chapter 2
$$Y_i = \mu_i + B_i + e_i = \mu_i + B_i + e_{Pi} + e_{Mi}$$
• $\mu_i = X_i\beta$ $(n_i \times 1)$: overall population mean response
• $B_i = Z_i b_i$ $(n_i \times 1)$: vector of deviations from the population mean ⇒ among-individual variation
• $e_i$ $(n_i \times 1)$: vector of within-individual deviations
• Individual-specific trajectory: $X_i\beta + Z_i b_i$
This structure offers great latitude for representing individual profiles (examples coming up)
Conditional on $b_i$ and $x_i$: The model implies $Y_i$ is $n_i$-variate normal with mean $X_i\beta + Z_i b_i$ and covariance matrix $R_i(\gamma)$
$$Y_i | x_i, b_i \sim \mathcal{N}\{X_i\beta + Z_i b_i, R_i(\gamma)\}$$
• Characterizes how observations for individual $i$ vary/covary about $i$'s inherent trajectory $X_i\beta + Z_i b_i$ due to the realization process and measurement error
• Let $p(y_i | x_i, b_i; \beta, \gamma)$ denote this normal density
Population-averaged density: Of $Y_i$ given $x_i$
$$p(y_i | x_i; \beta, \gamma, D) = \int p(y_i | x_i, b_i; \beta, \gamma)\, p(b_i; D)\, db_i$$
• Straightforward: This is an $n_i$-variate normal density with mean vector $X_i\beta$ and covariance matrix
$$V_i(\gamma, D, x_i) = V_i(\xi, x_i) = Z_i D Z_i^T + R_i(\gamma), \quad \xi = \{\gamma^T, \mathrm{vech}(D)^T\}^T \tag{6.1}$$
• $\mathrm{vech}(D)$ = vector of distinct elements of $D$ (Appendix A)
Result: Implied population-averaged model
$E(Y_i | x_i) = X_i\beta$, $\mathrm{var}(Y_i | x_i) = V_i = V_i(\xi, x_i)$, $Y_i | x_i \sim \mathcal{N}\{X_i\beta, V_i(\xi, x_i)\}$, $i = 1, \ldots, m$
Induced aggregate covariance matrix: Has particular form (6.1)
$$V_i(\xi, x_i) = V_i(\gamma, D, x_i) = Z_i D Z_i^T + R_i(\gamma) = Z_i D Z_i^T + R_i$$
• Specific form depends on choices of $R_i(\gamma)$, reflecting beliefs about the within-individual realization and measurement error processes, and of $D$, characterizing among-individual variability in individual trajectories $X_i\beta + Z_i b_i$
• Generalizes in the obvious way when $\mathrm{var}(b_i | x_i) = \mathrm{var}(b_i | a_i) = D(a_i)$
• Conceptual framework: From this point of view
$$V_i(\xi, x_i) = \mathrm{var}(Y_i | x_i) = \mathrm{var}(B_i | x_i) + \mathrm{var}(e_i) = Z_i D Z_i^T + R_i(\gamma)$$
⇒ first term represents contribution of among-individual sources, second term represents contribution of within-individual sources
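The mean and covariance of the implied population-averaged model can be checked without integrating the densities. The following is a short sketch using iterated expectations, under the standard assumptions stated earlier ($b_i \perp\!\!\!\perp x_i$ with mean 0 and covariance $D$, and $e_i \perp\!\!\!\perp b_i, x_i$); it verifies the first two moments only, not normality.

```latex
\begin{align*}
E(Y_i \mid x_i)
  &= E\{ E(Y_i \mid x_i, b_i) \mid x_i \}
   = X_i\beta + Z_i\, E(b_i \mid x_i)
   = X_i\beta, \\
\mathrm{var}(Y_i \mid x_i)
  &= \mathrm{var}\{ E(Y_i \mid x_i, b_i) \mid x_i \}
   + E\{ \mathrm{var}(Y_i \mid x_i, b_i) \mid x_i \} \\
  &= \mathrm{var}(X_i\beta + Z_i b_i \mid x_i) + R_i(\gamma)
   = Z_i D Z_i^T + R_i(\gamma).
\end{align*}
```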
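“Stacked” notation:
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_m \end{pmatrix} (N \times 1), \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} (mq \times 1), \quad e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{pmatrix} (N \times 1)$$
$$X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix} (N \times p), \quad Z = \begin{pmatrix} Z_1 & 0 & \cdots & 0 \\ 0 & Z_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Z_m \end{pmatrix} (N \times mq)$$
$$R = \begin{pmatrix} R_1 & 0 & \cdots & 0 \\ 0 & R_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_m \end{pmatrix} (N \times N), \quad \tilde{D} = \begin{pmatrix} D & 0 & \cdots & 0 \\ 0 & D & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & D \end{pmatrix} (mq \times mq)$$
“Stacked” notation: The model is often written succinctly as
$$Y = X\beta + Zb + e \tag{6.2}$$
$E(Y | \tilde{x}) = X\beta$, $\mathrm{var}(Y | \tilde{x}) = V(\xi, \tilde{x}) = V = Z\tilde{D}Z^T + R$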
• Commonly expressed as in (6.2) in the literature and most software documentation
Two-stage hierarchy: A model of form (6.2) usually results from specification of
• The form of individual inherent trajectories in terms of individual-specific parameters and an assumption on the within-individual covariance matrix $R_i$
• A representation of how individual-specific parameters vary among individuals in the population, including in association with among-individual covariates
In general: $R_i(\gamma) = \mathrm{var}(e_i | b_i, x_i)$, where $e_i = e_{Pi} + e_{Mi}$
• Usual assumption: For the linear mixed effects model, $e_i \perp\!\!\!\perp b_i, x_i$ or at least $e_i \perp\!\!\!\perp b_i$
Assume the former for this discussion; dependence on $b_i$ is allowed in Chapter 9
• Usual assumption: $e_{Pi} \perp\!\!\!\perp e_{Mi}$, discussed further in Chapter 7
• $\mathrm{var}(e_{Mi} | b_i, x_i) = \mathrm{var}(e_{Mi})$, the contribution to $R_i$ due to measurement error, a diagonal matrix
• $\mathrm{var}(e_{Pi} | b_i, x_i) = \mathrm{var}(e_{Pi})$, the contribution due to the realization process, may exhibit correlation due to time-ordered data collection
• ⇒ Decompose $R_i(\gamma)$ as
$$R_i(\gamma) = R_{Pi}(\gamma_P) + R_{Mi}(\gamma_M), \quad \gamma = (\gamma_P^T, \gamma_M^T)^T \tag{6.3}$$
Default specification: $R_i(\gamma) = \sigma^2 I_{n_i}$
• From perspective of (6.3)
$$R_i(\gamma) = \sigma_P^2 I_{n_i} + \sigma_M^2 I_{n_i}, \quad \sigma^2 = \sigma_P^2 + \sigma_M^2$$
• Assumes negligible serial correlation due to realization process; could be reasonable if time points are sufficiently intermittent
• Assumes measurement errors are haphazard with variance the same regardless of the magnitude of the true realization
• $\sigma^2 = \sigma_P^2 + \sigma_M^2$ is variance due to combined effects of realization process and measurement error
• If measurement error is assumed negligible or not to exist, reduces to
$$R_i(\gamma) = \sigma_P^2 I_{n_i}, \quad \sigma^2 = \sigma_P^2$$
More generally: Common to assume measurement error is haphazard with constant variance as above and simplify (6.3) to
$$R_i(\gamma) = R_{Pi}(\gamma_P) + \sigma_M^2 I_{n_i}, \quad \gamma = (\gamma_P^T, \sigma_M^2)^T \tag{6.4}$$
• (6.4) is a reasonable starting point when outcome is ascertained by a device or analytical procedure (e.g., dental distance, hæmatocrit, CD4 count)
• If no measurement error is plausible (e.g., visual acuity), (6.4) simplifies to
$$R_i(\gamma) = R_{Pi}(\gamma_P), \quad \gamma = \gamma_P$$
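Specifying $R_{Pi}(\gamma_P)$: Represent as
$$R_{Pi}(\gamma_P) = T_i^{1/2}(\theta)\, \Gamma_i(\alpha)\, T_i^{1/2}(\theta), \quad \gamma_P = (\theta^T, \alpha^T)^T$$
• $\Gamma_i(\alpha)$ $(n_i \times n_i)$ correlation matrix
• $T_i(\theta)$ is diagonal; diagonal elements reflect realization process variance, e.g., constant over time
$$T_i(\theta) = \sigma_P^2 I_{n_i}, \quad \theta = \sigma_P^2$$
• Or for $n$ intended times and $i$ with $n_i = n$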
In practice: Unfortunately, it is common to fail to acknowledge that there are separate contributions due to realization and measurement error and to represent the entire covariance structure as
$$R_i(\gamma) = T_i^{1/2}(\theta)\, \Gamma_i(\alpha)\, T_i^{1/2}(\theta) \quad \text{or} \quad R_i(\gamma) = \sigma^2 \Gamma_i(\alpha)$$
• Some software restricts to models for $R_i(\gamma)$ of this form
• Implicitly takes measurement error to be negligible, which it may not be
• ⇒ Confusion and lack of understanding among users
More generally: In the literature, when correctly distinguishing the separate contributions of realization and measurement error processes, it is common to assume constant variances and take as the default specification
$$R_i(\gamma) = \sigma_P^2 \Gamma_i(\alpha) + \sigma_M^2 I_{n_i}, \quad \gamma = (\sigma_P^2, \alpha^T, \sigma_M^2)^T \tag{6.5}$$
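To make the decomposition into realization and measurement error contributions concrete, here is a minimal sketch that builds $R_i(\gamma) = \sigma_P^2\Gamma_i(\alpha) + \sigma_M^2 I_{n_i}$ for one individual. The function name `within_individual_cov`, the numerical values, and in particular the exponential correlation model with $(j,k)$ element $\alpha^{|t_j - t_k|}$ are assumptions made for illustration; the text does not prescribe a form for $\Gamma_i(\alpha)$.

```python
def within_individual_cov(times, sigma2_p, sigma2_m, alpha):
    """Return R_i = sigma2_p * Gamma_i(alpha) + sigma2_m * I as a list of rows.

    Gamma_i(alpha) is taken to be an exponential correlation matrix with
    (j, k) element alpha ** |t_j - t_k|; this correlation model is an
    assumption made for illustration only.
    """
    n = len(times)
    R = []
    for j in range(n):
        row = []
        for k in range(n):
            corr = alpha ** abs(times[j] - times[k])
            value = sigma2_p * corr          # realization process contribution
            if j == k:
                value += sigma2_m            # measurement error: diagonal only
            row.append(value)
        R.append(row)
    return R


times = [0.0, 1.0, 2.0, 3.0]                 # hypothetical observation times
R = within_individual_cov(times, sigma2_p=2.0, sigma2_m=0.5, alpha=0.6)
# With alpha = 0 the realization process is uncorrelated and R_i collapses to
# the default sigma^2 * I with sigma^2 = sigma2_p + sigma2_m.
R_default = within_individual_cov(times, sigma2_p=2.0, sigma2_m=0.5, alpha=0.0)
```

Only the diagonal receives the measurement error variance, so the implied within-individual correlations are smaller than those in $\Gamma_i(\alpha)$ whenever $\sigma_M^2 > 0$.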
[Figure: distance (mm) versus age (years), ages 8 to 14, plotted in two panels; recovered panel title: Girls]
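Recall: $g_i = 0\ (1)$ if $i$ is a girl (boy), $(t_1, \ldots, t_4) = (8, 10, 12, 14)$, subject-specific perspective
• Question of interest: Is the typical or average rate of change of dental distance for boys different from that for girls?
• Straight-line model for individual trajectory
$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}, \quad j = 1, \ldots, n_i = n = 4, \quad \beta_i = \begin{pmatrix} \beta_{0i} \\ \beta_{1i} \end{pmatrix}$$
• Can summarize as
$$Y_i = C_i\beta_i + e_i, \quad C_i = \begin{pmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix} = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ 1 & t_4 \end{pmatrix} \tag{6.6}$$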
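Individual-specific intercepts and slopes:
$\beta_{0i} = \beta_{0,B}\, g_i + \beta_{0,G}(1 - g_i) + b_{0i}$,
$\beta_{1i} = \beta_{1,B}\, g_i + \beta_{1,G}(1 - g_i) + b_{1i}$, $\quad b_i = (b_{0i}, b_{1i})^T$
• Can summarize as
$$\beta_i = A_i\beta + B_i b_i, \tag{6.7}$$
$$\beta = \begin{pmatrix} \beta_{0,G} \\ \beta_{1,G} \\ \beta_{0,B} \\ \beta_{1,B} \end{pmatrix}, \quad A_i = \begin{pmatrix} (1-g_i) & 0 & g_i & 0 \\ 0 & (1-g_i) & 0 & g_i \end{pmatrix}, \quad B_i = I_2$$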
Remark: In early literature, (6.6) with $\beta_i$ as in (6.7) is referred to as a random coefficient model
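Combining: Substituting (6.7) in (6.6)
$Y_i = C_i A_i\beta + C_i B_i b_i + e_i = X_i\beta + Z_i b_i + e_i$, $\quad X_i = C_i A_i$, $Z_i = C_i B_i$
$$X_i = \begin{pmatrix} (1-g_i) & (1-g_i)t_1 & g_i & g_i t_1 \\ \vdots & \vdots & \vdots & \vdots \\ (1-g_i) & (1-g_i)t_4 & g_i & g_i t_4 \end{pmatrix}, \quad Z_i = \begin{pmatrix} 1 & t_1 \\ 1 & t_2 \\ 1 & t_3 \\ 1 & t_4 \end{pmatrix}$$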
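The construction $X_i = C_i A_i$, $Z_i = C_i B_i$ can be sketched numerically for the dental study design. This is an illustrative sketch only: the helper names `matmul` and `design_matrices` are hypothetical, and plain Python lists are used so that no libraries are needed.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


times = [8, 10, 12, 14]
C = [[1, t] for t in times]          # C_i: the same for every child (balanced)


def design_matrices(g):
    """Return (X_i, Z_i) for gender indicator g (0 = girl, 1 = boy)."""
    # A_i picks out the girl or boy intercept and slope from
    # beta = (beta_0G, beta_1G, beta_0B, beta_1B)^T
    A = [[1 - g, 0, g, 0],
         [0, 1 - g, 0, g]]
    B = [[1, 0], [0, 1]]             # B_i = I_2: intercept and slope both random
    return matmul(C, A), matmul(C, B)


X_girl, Z_girl = design_matrices(0)
X_boy, Z_boy = design_matrices(1)
```

For a girl the last two columns of $X_i$ are zero and for a boy the first two are, so each child's mean depends only on the fixed effects for his or her gender, while $Z_i = C_i$ for every child.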
Complete the model: Specify models for among-individual covariance matrix $\mathrm{var}(b_i | a_i)$ and within-individual covariance matrix $R_i(\gamma)$
Recall: Exploratory analyses suggest
• Overall correlation is different for boys and girls; overall variance constant across time but possibly larger for boys
• Within-individual residuals from individual-specific fits do not show strong evidence of within-individual correlation; within-child variance due to the combined effects of realization and measurement error is constant but possibly different
• Larger variance for boys could be an artifact of one “unusual” boy
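Within-individual covariance matrix: Assume (6.5), $R_i(\gamma) = \sigma_P^2\Gamma_i(\alpha) + \sigma_M^2 I_{n_i}$
• Within-child aggregate variance:
$$\mathrm{var}(e_i | a_i) = R_i(\gamma) = \begin{cases} \sigma_{PG}^2 I_4 + \sigma_{MG}^2 I_4 & \text{if } i \text{ is a girl} \\ \sigma_{PB}^2 I_4 + \sigma_{MB}^2 I_4 & \text{if } i \text{ is a boy} \end{cases}$$
• Final specification:
$$\mathrm{var}(e_i | a_i) = R_i(\gamma, a_i) = \{\sigma_G^2 I(g_i = 0) + \sigma_B^2 I(g_i = 1)\} I_4$$
$\sigma_G^2 = \sigma_{PG}^2 + \sigma_{MG}^2$, $\quad \sigma_B^2 = \sigma_{PB}^2 + \sigma_{MB}^2$
• If different variances are an artifact of the “unusual” boy, could reduce to the default $R_i(\gamma) = \sigma^2 I_4$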
Among-child sources: Lack of within-individual correlation suggests overall correlation is due to among-child sources
• If we assume for each $i$
$$\mathrm{var}(b_i | a_i) = D = \begin{pmatrix} D_{11} & D_{12} \\ D_{12} & D_{22} \end{pmatrix}$$
⇒ Contribution to overall covariance structure $Z_i D Z_i^T$ has diagonal elements
$$D_{11} + D_{22} t_j^2 + 2 D_{12} t_j, \quad j = 1, \ldots, 4$$
and $(j, j')$ off-diagonal element
$$D_{11} + D_{22} t_j t_{j'} + D_{12}(t_j + t_{j'}), \quad j, j' = 1, \ldots, 4$$
• Induced pattern of among-individual covariance/correlation is clearly nonstationary ⇒ can represent complex covariance
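The element formulas for $Z_i D Z_i^T$ can be checked numerically. The sketch below multiplies out $Z_i D Z_i^T$ for the dental study times and compares every element with the closed form; the values of $D_{11}$, $D_{12}$, $D_{22}$ are hypothetical, chosen only so that $D$ is a legitimate covariance matrix.

```python
times = [8.0, 10.0, 12.0, 14.0]
D11, D12, D22 = 4.0, -0.2, 0.05      # hypothetical values, for illustration only

Z = [[1.0, t] for t in times]
D = [[D11, D12], [D12, D22]]


def among_cov(Z, D):
    """Return Z D Z^T as a list of rows."""
    n, q = len(Z), len(D)
    return [[sum(Z[j][a] * D[a][b] * Z[k][b]
                 for a in range(q) for b in range(q))
             for k in range(n)] for j in range(n)]


def closed_form(tj, tk):
    """Element formula from the text: D11 + D22*tj*tk + D12*(tj + tk)."""
    return D11 + D22 * tj * tk + D12 * (tj + tk)


ZDZt = among_cov(Z, D)
max_abs_diff = max(abs(ZDZt[j][k] - closed_form(times[j], times[k]))
                   for j in range(4) for k in range(4))
variances = [ZDZt[j][j] for j in range(4)]   # among-individual variance at each age
```

With these values the among-individual variance changes with age even though $D$ is fixed, which is the nonstationarity noted above.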
Among-child sources: Evidence suggests overall pattern different by gender, suggesting the model
$$\mathrm{var}(b_i | a_i) = D(a_i) = D_G I(g_i = 0) + D_B I(g_i = 1)$$
Recall: $m = 30$ subjects, $g_i = 0\ (1)$ if $i$ is female (male), $a_i$ = age, hæmatocrit measured at week 0, prior to surgery, and ideally at weeks 1, 2, and 3 thereafter, where some subjects are missing the week 2 and possibly baseline measure
• Subject-specific perspective: Differences between genders in individual-specific features of the pattern of change of hæmatocrit following hip replacement?
• Quadratic model for individual trajectory
$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + \beta_{2i} t_{ij}^2 + e_{ij}$$
• Written succinctly
$$Y_i = C_i\beta_i + e_i, \quad \beta_i = \begin{pmatrix} \beta_{0i} \\ \beta_{1i} \\ \beta_{2i} \end{pmatrix}, \quad C_i = \begin{pmatrix} 1 & t_{i1} & t_{i1}^2 \\ \vdots & \vdots & \vdots \\ 1 & t_{in_i} & t_{in_i}^2 \end{pmatrix}$$
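Individual-specific parameters:
$\beta_{0i} = \{\beta_{0,F}(1-g_i) + \beta_{0,M}\, g_i\} + \{\beta_{3,F}(1-g_i) + \beta_{3,M}\, g_i\} a_i + b_{0i}$
$\beta_{1i} = \beta_{1,F}(1-g_i) + \beta_{1,M}\, g_i + b_{1i}$
$\beta_{2i} = \beta_{2,F}(1-g_i) + \beta_{2,M}\, g_i + b_{2i}$
Can be represented as $\beta_i = A_i\beta + B_i b_i$ with
$$\beta = (\beta_{0,F}, \beta_{0,M}, \beta_{1,F}, \beta_{1,M}, \beta_{2,F}, \beta_{2,M}, \beta_{3,F}, \beta_{3,M})^T,$$
$$A_i = \begin{pmatrix} (1-g_i) & g_i & 0 & 0 & 0 & 0 & (1-g_i)a_i & g_i a_i \\ 0 & 0 & (1-g_i) & g_i & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & (1-g_i) & g_i & 0 & 0 \end{pmatrix}$$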
Combining: $Z_i = C_i$
$$X_i = \begin{pmatrix} (1-g_i) & g_i & (1-g_i)t_{i1} & g_i t_{i1} & (1-g_i)t_{i1}^2 & g_i t_{i1}^2 & (1-g_i)a_i & g_i a_i \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ (1-g_i) & g_i & (1-g_i)t_{in_i} & g_i t_{in_i} & (1-g_i)t_{in_i}^2 & g_i t_{in_i}^2 & (1-g_i)a_i & g_i a_i \end{pmatrix}$$
• Complete the model with specification of within-individual covariance matrix $R_i(\gamma)$ and the covariance matrix $\mathrm{var}(b_i | a_i)$ for the random effects as above
• Note: If $\mathrm{var}(b_i | a_i) = D$ $(3 \times 3)$, $D$ has 6 distinct parameters, and $Z_i D Z_i^T$ corresponding to among-individual sources is an $(n_i \times n_i)$ matrix whose elements have a complicated form (try it)
More generally: The induced overall covariance model $V_i = Z_i D Z_i^T + R_i(\gamma)$ can depend on high-dimensional $\xi$
• Capable of representing complex true patterns of overall variance and correlation
• But also can be overkill
Among-individual variation: In an individual-specific model like the quadratic for the hip replacement data
• In principle: All of individual-specific intercepts, linear terms, and quadratic terms vary in the population
• However: Although quadratic terms $\beta_{2i}$ do vary in the population, this variation is practically negligible relative to the extent of variation in intercepts and linear terms $\beta_{0i}$ and $\beta_{1i}$
Eliminating random effects: A common tactic under quadratic and higher-order polynomial individual-specific models to simplify the model for $\beta_i$
• Remove random effects associated with quadratic and higher-order terms, redefine $Z_i$ and $D$ accordingly
• ⇒ $Z_i D Z_i^T$ and induced $V_i$ still sufficiently rich to approximate the true among-individual and overall covariance structures
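Demonstration: Hip replacement study, eliminate $b_{2i}$
$\beta_{0i} = \{\beta_{0,F}(1-g_i) + \beta_{0,M}\, g_i\} + \{\beta_{3,F}(1-g_i) + \beta_{3,M}\, g_i\} a_i + b_{0i}$
$\beta_{1i} = \beta_{1,F}(1-g_i) + \beta_{1,M}\, g_i + b_{1i}$
$\beta_{2i} = \beta_{2,F}(1-g_i) + \beta_{2,M}\, g_i$
$$\beta_i = A_i\beta + B_i b_i, \quad b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}, \quad B_i = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad \text{so that } B_i b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \\ 0 \end{pmatrix}$$
• From a subject-specific perspective, strictly an approximation of convenience
• Do not really believe that individuals of each gender have individual-specific trajectories characterized by exactly the same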
General consideration: Important issue in specifying models
• Although from the SS point of view all individual-specific parameters are expected to exhibit variation in the population, what matters is their relative magnitudes of variation
Example:
• Straight line inherent trend is reasonable
• Individual-specific intercepts clearly vary substantially
• Underlying straight lines appear to have very similar slopes
• Although scientifically slopes should vary, relative to variation in intercepts, variation in slopes is orders of magnitude smaller
• With no covariates: $\beta_{0i}$, $\beta_{1i}$ intercept, slope for individual $i$
$$\beta_{0i} = \beta_0 + b_{0i}, \quad \beta_{1i} = \beta_1 + b_{1i}, \quad b_i = (b_{0i}, b_{1i})^T, \quad \mathrm{var}(b_i) = D$$
⇒ If $D_{11}$ is nonnegligible relative to $\beta_0$, then intercepts vary perceptibly; if $D_{22}$ is virtually negligible relative to $\beta_1$, then variation in slopes is almost undetectable
Example: Approximation to achieve numerical stability
$$\beta_{0i} = \beta_0 + b_{0i}, \quad \beta_{1i} = \beta_1$$
• Do not really “believe” slopes do not vary at all in the population
• But invoke this approximation recognizing that their magnitude of variation is inconsequential relative to that of other features
• Design matrix $B_i$ in the general model specification accommodates this possibility
Terminology: Popular to distinguish between individual-specific features being “fixed” or “random”; here, $\beta_{0i}$ would be said to be “random” while $\beta_{1i}$ would be referred to as “fixed”
[Figure: four panels, one per treatment regimen (ZDV + alt ddI, ZDV+ZAL, ZDV+ddI, ZDV+ddI+NVP); horizontal axis: Week (0 to 40); vertical axis ticks 0 to 6]
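Recall: Subjects randomized to four treatment regimens, $\boldsymbol{a}_i = (g_i, a_i, \delta_{i1}, \ldots, \delta_{i4})^T$, $g_i = 0\ (1)$ if $i$ is female (male); $a_i$ age; $\delta_{i\ell} = 1$ if subject $i$ randomized to regimen $\ell$ and 0 otherwise, $\ell = 1, \ldots, 4$
• Straight-line inherent log(CD4+1) trajectory
$$Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + e_{ij}$$
⇒ $\beta_{0i}$ is $i$'s inherent mean log(CD4+1) immediately prior to therapy
• Subject-specific perspective: Are typical or average slopes different among the four regimens?
$\beta_{0i} = \beta_{00} + \beta_{01} a_i + \beta_{02} g_i + b_{0i}$
$\beta_{1i} = \beta_{10} + \beta_{11}\delta_{i1} + \beta_{12}\delta_{i2} + \beta_{13}\delta_{i3} + b_{1i}$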
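Standard formulation: The linear mixed effects model is often presented formally as a two-stage hierarchy
• Stage 1 - Individual model:
$$Y_i = C_i\beta_i + e_i \ (n_i \times 1), \quad e_i | x_i \sim \mathcal{N}\{0, R_i(\gamma)\} \tag{6.8}$$
$C_i$ $(n_i \times k)$ design matrix ordinarily depends on times $t_{i1}, \ldots, t_{in_i}$; $\beta_i$ $(k \times 1)$; often $e_i \perp\!\!\!\perp b_i, x_i$ or $e_i \perp\!\!\!\perp b_i$
• Stage 2 - Population model:
$$\beta_i = A_i\beta + B_i b_i \ (k \times 1), \quad b_i | x_i \sim \mathcal{N}(0, D) \ (q \times 1) \tag{6.9}$$
$\beta$ $(p \times 1)$ fixed effects; $A_i$ $(k \times p)$, $B_i$ $(k \times q)$ design matrices; $A_i$ incorporates among-individual covariates, $B_i$ indicates elements of $\beta_i$ treated as “fixed” or “random”; often $b_i \perp\!\!\!\perp x_i$
Linear mixed effects model: Substituting (6.9) in (6.8)
$$Y_i = X_i\beta + Z_i b_i + e_i, \quad b_i | x_i \sim \mathcal{N}(0, D), \quad e_i | x_i \sim \mathcal{N}\{0, R_i(\gamma)\}$$
• $X_i = C_i A_i$ $(n_i \times p)$ fixed effects design matrix
• $Z_i = C_i B_i$ $(n_i \times q)$ random effects design matrix
• $e_i | x_i \sim \mathcal{N}\{0, R_i(\gamma)\}$, $b_i | x_i \sim \mathcal{N}(0, D)$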
6.3 Inference and considerations for missing data
Given a linear mixed effects model: Induced population-averaged model
$E(Y_i | x_i) = X_i\beta$, $\quad \mathrm{var}(Y_i | x_i) = V_i = V_i(\xi, x_i)$
$V_i(\xi, x_i) = Z_i D Z_i^T + R_i(\gamma)$, $\quad \xi = \{\gamma^T, \mathrm{vech}(D)^T\}^T$
$(Y_i, x_i)$, $i = 1, \ldots, m$, independent across $i$
• Succinctly:
$E(Y | \tilde{x}) = X\beta$, $\quad \mathrm{var}(Y | \tilde{x}) = V(\xi, \tilde{x}) = V = Z\tilde{D}Z^T + R$
• Can be generalized to $\mathrm{var}(b_i | x_i) = D(a_i)$
Estimation of $\beta$ and $\xi$: Normality assumptions at each stage of the hierarchy yield the assumption
$$Y_i | x_i \sim \mathcal{N}\{X_i\beta, V_i(\xi, x_i)\}$$
• ⇒ $\beta$ and $\xi$ can be estimated by solving the estimating equations corresponding to maximum likelihood or REML
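Large sample inference: Approximate sampling distributions for $\hat\beta$ using ML or REML
• Model-based:
$$\hat\beta \stackrel{\cdot}{\sim} \mathcal{N}(\beta_0, \hat\Sigma_M), \quad \hat\Sigma_M = \left\{ \sum_{i=1}^m X_i^T V_i^{-1}(\hat\xi, x_i) X_i \right\}^{-1} = \{X^T V^{-1}(\hat\xi, \tilde{x}) X\}^{-1}$$
• Robust or empirical:
$$\hat\beta \stackrel{\cdot}{\sim} \mathcal{N}(\beta_0, \hat\Sigma_R)$$
$$\hat\Sigma_R = \left\{ \sum_{i=1}^m X_i^T V_i^{-1}(\hat\xi, x_i) X_i \right\}^{-1} \left\{ \sum_{i=1}^m X_i^T V_i^{-1}(\hat\xi, x_i)(Y_i - X_i\hat\beta)(Y_i - X_i\hat\beta)^T V_i^{-1}(\hat\xi, x_i) X_i \right\} \left\{ \sum_{i=1}^m X_i^T V_i^{-1}(\hat\xi, x_i) X_i \right\}^{-1}$$
• Can be used for inference on linear functions $L\beta$ with either a SS or PA interpretation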
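The two covariance approximations can be compared in the simplest special case, where both reduce to scalars. The sketch below assumes a random-intercept model $Y_{ij} = \beta + b_i + e_{ij}$ with $\mathrm{var}(b_i) = D$ and $\mathrm{var}(e_i) = \sigma^2 I_{n_i}$, so that $X_i = Z_i = 1_{n_i}$ and $1^T V_i^{-1} = 1^T/(\sigma^2 + n_i D)$. The data, the values of $\sigma^2$ and $D$ (treated as known rather than estimated), and the function name `gls_and_variances` are all hypothetical.

```python
# Hypothetical unbalanced data: one list of responses per individual.
data = [
    [10.1, 11.0, 9.7],
    [12.4, 13.1],
    [8.9, 9.5, 9.1, 8.7],
    [11.2, 10.6, 11.9],
]
sigma2, D = 0.5, 2.0   # xi treated as known here, purely for illustration


def gls_and_variances(data, sigma2, D):
    """Return (beta_hat, model_based_var, robust_var) for the random-intercept model."""
    bread = 0.0          # sum_i X_i^T V_i^{-1} X_i
    xvy = 0.0            # sum_i X_i^T V_i^{-1} Y_i
    for y in data:
        denom = sigma2 + len(y) * D
        bread += len(y) / denom
        xvy += sum(y) / denom
    beta_hat = xvy / bread
    model_based = 1.0 / bread
    meat = 0.0           # sum_i {X_i^T V_i^{-1} (Y_i - X_i beta_hat)}^2
    for y in data:
        denom = sigma2 + len(y) * D
        score = sum(v - beta_hat for v in y) / denom
        meat += score ** 2
    robust = model_based * meat * model_based   # "sandwich" form
    return beta_hat, model_based, robust


beta_hat, var_model, var_robust = gls_and_variances(data, sigma2, D)
```

The model-based version relies on $V_i$ being correctly specified, whereas the robust version replaces the middle term with the empirical outer products of the residuals.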
Information criteria: Can be used to compare non-nested models as in Chapter 5
• In particular: Compare different models for overall covariance structure induced by combinations of choices of models for $\mathrm{var}(e_i | x_i)$ and $\mathrm{var}(b_i | a_i)$
• Dental study: Compare taking $\mathrm{var}(b_i | a_i) = \mathrm{var}(b_i) = D$ to
$\mathrm{var}(b_i | a_i) = D(a_i) = D_G I(g_i = 0) + D_B I(g_i = 1)$
• Dental study: Compare taking $\mathrm{var}(e_i | x_i) = \sigma^2 I_4$ for all $i$ to
Implications of missing data: Identical to those discussed in Chapter 5
• Under assumptions of a MAR mechanism and normality, estimators for $\beta$ and $\xi$ are consistent
• Model-based approximation to sampling distribution of $\hat\beta$ can be used, but with $\hat\Sigma_M$ replaced by the appropriate element of the inverse of the observed information matrix
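Curiosity: Balanced data, $Y_i$ $(n \times 1)$, $i = 1, \ldots, m$, same $n$ times
• $Z_i = Z^*$, same for all $i$ (verify)
• If $\mathrm{var}(b_i) = D$, $R_i(\gamma) = \sigma^2 I_n$, then $V_i(\xi, x_i) = V^* = Z^* D Z^{*T} + \sigma^2 I_n$
• With $\hat{D}$ and $\hat\sigma^2$ obtained by ML or REML ⇒ $\hat{V}^* = Z^* \hat{D} Z^{*T} + \hat\sigma^2 I_n$, it can be shown that
$$\hat\beta = \left( \sum_{i=1}^m X_i^T \hat{V}^{*-1} X_i \right)^{-1} \sum_{i=1}^m X_i^T \hat{V}^{*-1} Y_i$$
and the OLS estimator
$$\hat\beta_{OLS} = \left( \sum_{i=1}^m X_i^T X_i \right)^{-1} \sum_{i=1}^m X_i^T Y_i$$
are equivalent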
Equivalence: Shown using matrix inversion results given in Appendix A
• Continues to hold if $\sigma^2$ and $D$ take on different values corresponding to different levels of an among-individual covariate
• The diligent student will of course verify this
• Important: This equivalence does not mean that one can disregard the need to characterize covariance structure and take all $N$ observations to be mutually independent
• Correct characterization of the approximate sampling distribution of the estimator requires that the overall covariance be
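The equivalence can be seen numerically in a small balanced example. The sketch below assumes a random-intercept model ($Z^* = 1_n$, so $V^* = D J_n + \sigma^2 I_n$ with inverse $\sigma^{-2}\{I_n - D/(\sigma^2 + nD)\, J_n\}$) and a straight-line population mean with $X_i = [1, t]$ for every individual; the responses and the values of $D$ and $\sigma^2$ are made up, and the helper names are hypothetical.

```python
times = [8.0, 10.0, 12.0, 14.0]      # balanced: same n times for everyone
n = len(times)

# Hypothetical responses for m = 3 individuals (made-up numbers).
Y = [
    [21.0, 20.0, 21.5, 23.0],
    [26.0, 25.0, 29.0, 31.0],
    [23.0, 22.5, 23.0, 26.5],
]


def solve_2x2(a, rhs):
    """Solve the 2x2 system a @ x = rhs by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [
        (rhs[0] * a[1][1] - a[0][1] * rhs[1]) / det,
        (a[0][0] * rhs[1] - a[1][0] * rhs[0]) / det,
    ]


def v_inverse(D, sigma2):
    """Inverse of V* = D*J_n + sigma2*I_n (random intercept, Z* = 1_n)."""
    c = D / (sigma2 + n * D)
    return [[((1.0 if j == k else 0.0) - c) / sigma2 for k in range(n)]
            for j in range(n)]


def weighted_estimate(W):
    """Return (sum_i X^T W X)^{-1} sum_i X^T W Y_i with X = [1, t]."""
    X = [[1.0, t] for t in times]
    xtwx = [[0.0, 0.0], [0.0, 0.0]]
    xtwy = [0.0, 0.0]
    for y in Y:
        for r in range(2):
            for j in range(n):
                for k in range(n):
                    xtwy[r] += X[j][r] * W[j][k] * y[k]
                    for s in range(2):
                        xtwx[r][s] += X[j][r] * W[j][k] * X[k][s]
    return solve_2x2(xtwx, xtwy)


identity = [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]
beta_ols = weighted_estimate(identity)                   # weights = I_n
beta_gls = weighted_estimate(v_inverse(D=4.0, sigma2=1.5))  # weights = V*^{-1}
```

The point estimates agree, but the two weightings imply very different standard errors, which is why the covariance structure still has to be characterized.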
Motivation for linear mixed effects model: From a subject-specific perspective
• Aligns with the conceptual framework in Chapter 2
• Is natural when questions of interest involve individual-specific phenomena and features
Implied model: From a population-averaged perspective
• Form of population mean response and overall covariance structure incorporating components due to among- and within-individual sources are induced
• ⇒ Model for overall covariance structure is “automatic” rather than chosen explicitly by the data analyst
• Sidesteps the challenge of specifying a suitable overall structure
• The induced model is richly parameterized and can represent
However: SS vs. PA perspective is critical for inference on and interpretation of the induced overall covariance matrix
• Population-averaged perspective: The induced overall covariance structure is a convenient and flexible way of representing a complex true structure
• $\xi = \{\gamma^T, \mathrm{vech}(D)^T\}^T$ are parameters that characterize this structure, with no restrictions on possible values of $\xi$
• ⇒ $D$ need not be a legitimate covariance matrix (e.g., with non-negative diagonal elements), and $\gamma$ need not be restricted so that $R_i(\gamma)$ is a legitimate covariance matrix
• Subject-specific perspective: $D$ and $R_i(\gamma)$ must be legitimate covariance matrices corresponding to among- and within-individual sources of variation and correlation
• ⇒ Restrictions on the parameter space of $\xi = \{\gamma^T, \mathrm{vech}(D)^T\}^T$ that ensure this
6.4 Best linear unbiased prediction and empirical Bayes
Subject-specific perspective: Each individual $i$ has specific parameters $\beta_i$ characterizing his/her inherent trajectory, depending on random effects $b_i$ reflecting how $i$'s parameters deviate from the “typical” values and how $i$'s inherent trajectory deviates from the overall population mean profile
• Of interest: “Estimate” $b_i$ (and $\beta_i$) for each individual $i$
• ⇒ Characterize individual-specific features and trajectories, identify outlying individuals
• $b_i$ is a random vector corresponding to a randomly chosen individual from the population ⇒ prediction of the value taken on by a random vector
• Inference on $b_i$ is a prediction problem
• Because $Y_i$ contains information about $b_i$, it is natural to view this prediction problem as characterizing $b_i$ given that we have observed $Y_i = y_i$
Natural predictor: The value of $b_i$ “most likely” given we have observed $Y_i = y_i$
Posterior mode: “Estimate” $b_i$ by the value that maximizes the posterior distribution of $b_i$ given $Y_i$ evaluated at $y_i$
• Bayesian view: $b_i$ are parameters, $b_i \sim \mathcal{N}(0, D)$ is referred to as the prior distribution, density $p(b_i; D)$
• Density of $Y_i | x_i, b_i \sim \mathcal{N}\{X_i\beta + Z_i b_i, R_i(\gamma)\}$ is $p(y_i | x_i, b_i; \beta, \gamma)$
• Here: Do not consider $\beta$ and $\xi$ from the classical Bayesian perspective as random quantities with prior distributions, but treat them as fixed and known; more on this momentarily
• Bayes theorem: The posterior density is
$$p(b_i | y_i, x_i; \beta, \gamma, D) = \frac{p(y_i | x_i, b_i; \beta, \gamma)\, p(b_i; D)}{p(y_i | x_i; \beta, \gamma, D)} \tag{6.10}$$
$$p(y_i | x_i; \beta, \gamma, D) = \int p(y_i | x_i, b_i; \beta, \gamma)\, p(b_i; D)\, db_i$$
• Can show: (6.10) is a normal density with mean
$$D Z_i^T V_i^{-1}(\xi, x_i)\{y_i - X_i\beta\} \tag{6.11}$$
• The mean of a normal distribution is also the mode of the density ⇒ (6.11) maximizes the posterior density
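Result: Natural to substitute estimators $\hat\beta$ and $\hat\xi$ in (6.11) and obtain the empirical Bayes “estimator” for $b_i$
$$\hat{b}_i = \hat{D} Z_i^T V_i^{-1}(\hat\xi, x_i)\{Y_i - X_i\hat\beta\} \tag{6.12}$$
• Ideally: If $\xi$ were known, (6.12) is
$$\hat{b}_i = D Z_i^T V_i^{-1}(\xi, x_i)\{Y_i - X_i\hat\beta\} \tag{6.13}$$
• Can show: Conditional on $\tilde{x}$, (6.13) has mean zero and
$$\mathrm{var}(\hat{b}_i | \tilde{x}) = D Z_i^T \left\{ V_i^{-1} - V_i^{-1} X_i \left( \sum_{k=1}^m X_k^T V_k^{-1} X_k \right)^{-1} X_i^T V_i^{-1} \right\} Z_i D$$
• Can understate the variability in $\hat{b}_i$ because $b_i$ is a “moving target” (random rather than fixed); use instead
$$\mathrm{var}(\hat{b}_i - b_i | \tilde{x}) = D - D Z_i^T \left\{ V_i^{-1} - V_i^{-1} X_i \left( \sum_{k=1}^m X_k^T V_k^{-1} X_k \right)^{-1} X_i^T V_i^{-1} \right\} Z_i D$$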
Alternatively: Can obtain (6.13) directly by using Bayes theorem with $\xi$ treated as known but $\beta$ viewed instead as a random vector independent of $b_i$
• Take the prior distribution of $\beta$ to be a $\mathcal{N}(\beta^*, H)$ distribution with prior density $p(\beta | \beta^*, H)$ depending on hyperparameters $\beta^*$ and $H$
• ⇒ Assume vague prior information on $\beta$ by setting $H^{-1} = 0$
• Can show the mean of the posterior density for $\beta$ is $\hat\beta$
• And the mean of the posterior density for $b_i$ is (6.13)
• Left as an exercise for the diligent student
Standard principle: A “best” predictor $c(Y_i)$ minimizes mean squared error
$$E[\{c(Y_i) - b_i\}^T A \{c(Y_i) - b_i\}]$$
where the expectation is with respect to the joint distribution of $Y_i$ and $b_i$, and $A$ is any positive definite symmetric matrix
• It is straightforward (try it) that the best predictor in this sense is
$$E(b_i | Y_i)$$
which does not depend on $A$
• Thus: Under the normality assumptions for the linear mixed model and $\xi$ known, $\hat{b}_i$ in (6.13) is “best”
• Because $\hat{b}_i$ is also linear in $Y_i$, it is the best linear function of $Y_i$ to use as a predictor under normality
Restricting to linear predictors: Even without normality, can show with $\xi$ known that (6.13) minimizes mean squared error, is a linear function of $Y_i$, and $E(\hat{b}_i) = E(b_i) = 0$
• Best linear unbiased predictor (BLUP)
• Derivation: Searle, Casella, and McCulloch (2006, Chapter 7), Robinson (1991)
• In practice: $\xi$ is replaced by ML or REML estimator $\hat\xi$ ⇒ estimated best linear unbiased predictor or EBLUP
• The terms BLUP, empirical Bayes estimator, and EBLUP are often used interchangeably
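Another approach to a predictor: For known $\xi$ (Henderson, 1984)
• “Estimate” $b_i$, $i = 1, \ldots, m$, in the “stacked” vector $b$ jointly with $\beta$, by minimizing in $\beta$ and $b$ the objective function
$$\log|\tilde{D}| + b^T \tilde{D}^{-1} b + \log|R| + (Y - X\beta - Zb)^T R^{-1} (Y - X\beta - Zb)$$
• Under normality: Twice the negative log of the posterior density of $b$ (fixed $\beta$) and twice the negative loglikelihood for $\beta$ (fixed $b$)
• Using Appendix A, solve
$$X^T R^{-1}(Y - X\beta - Zb) = 0$$
$$\tilde{D}^{-1} b - Z^T R^{-1}(Y - X\beta - Zb) = 0$$
• Rearrange to obtain the mixed model equations
$$\begin{pmatrix} X^T R^{-1} X & X^T R^{-1} Z \\ Z^T R^{-1} X & Z^T R^{-1} Z + \tilde{D}^{-1} \end{pmatrix} \begin{pmatrix} \hat\beta \\ \hat{b} \end{pmatrix} = \begin{pmatrix} X^T R^{-1} Y \\ Z^T R^{-1} Y \end{pmatrix}$$
• Can show using Appendix A
$$R^{-1} - R^{-1} Z (Z^T R^{-1} Z + \tilde{D}^{-1})^{-1} Z^T R^{-1} = (R + Z\tilde{D}Z^T)^{-1} = V^{-1}$$
that the solutions to the mixed model equations are
$$\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y, \quad \hat{b} = \tilde{D} Z^T V^{-1}(Y - X\hat\beta),$$
that is, the generalized least squares estimator for $\beta$ and the stacked version of (6.13)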
Predictions are “shrunk” toward the mean: For demonstration, consider (6.13), $\xi$ known
$$\hat{b}_i = D Z_i^T V_i^{-1}(\xi, x_i)\{Y_i - X_i\hat\beta\}$$
in the case of the simplest linear mixed model (no covariates)
$$Y_i = 1_{n_i}\beta + 1_{n_i} b_i + e_i, \quad \mathrm{var}(e_i) = \sigma^2 I_{n_i}, \quad \mathrm{var}(b_i) = D$$
• $X_i = 1_{n_i}$ $(p = 1)$, $Z_i = 1_{n_i}$ $(q = 1)$, scalar random effect $b_i$ ⇒
• $V_i = D J_{n_i} + \sigma^2 I_{n_i}$ (compound symmetric),
$$V_i^{-1} = \sigma^{-2} \left( I_{n_i} - \frac{D}{\sigma^2 + n_i D} J_{n_i} \right)$$
• BLUP: Can show (verify)
$$\hat{b}_i = \frac{n_i D}{\sigma^2 + n_i D} (\bar{Y}_i - \hat\beta), \quad \bar{Y}_i = n_i^{-1} \sum_{j=1}^{n_i} Y_{ij} \tag{6.15}$$
• Write (6.15) as the weighted average
$$\hat{b}_i = w_i(\bar{Y}_i - \hat\beta) + (1 - w_i)\, 0, \quad w_i = \frac{n_i D}{\sigma^2 + n_i D} < 1$$
of the best guess for where $i$ “sits” in the population relative to overall mean $\beta$, based solely on the data, and 0, the mean of $b_i$
• “Weight” $w_i < 1$ moves $\hat{b}_i$ away from being solely based on the data
• The larger $n_i$ (more data on $i$), the closer $w_i$ is to 1 ⇒ more weight on $(\bar{Y}_i - \hat\beta)$
• Similarly, if among-individual variation is large relative to within-individual variation, $D/\sigma^2$ is large, and $w_i$ is closer to 1
• If instead $n_i$ is small and/or $D/\sigma^2$ is small ⇒ information on $i$ is
Individual-specific mean: $\beta + b_i$, obvious predictor
$$\hat\beta + \hat{b}_i = w_i \bar{Y}_i + (1 - w_i)\hat\beta = \frac{n_i D}{\sigma^2 + n_i D} \bar{Y}_i + \frac{\sigma^2}{\sigma^2 + n_i D} \hat\beta$$
• If $n_i$ and/or $D/\sigma^2$ is large, $w_i$ is close to 1, and prediction is based mainly on data from $i$, $\bar{Y}_i$
• ⇒ quality of information from $i$ is high and/or little to be learned about a specific individual from the population
• If $n_i$ and/or $D/\sigma^2$ is small, $w_i$ is close to 0, and prediction is based mainly on $\hat\beta$ (population)
• ⇒ quality of information from $i$ is low and/or little to be learned about a specific individual from his/her data
Terminology: This is referred to as shrinkage
• In predicting where an individual “sits” in the population and thus his/her individual-specific trajectory, information from his/her data is “shrunk” toward the overall population mean
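The effect of $n_i$ on the shrinkage weight can be illustrated with a few lines of arithmetic. This is a sketch under the simplest model above, with $\sigma^2$ and $D$ treated as known; the function names and all numerical values are hypothetical.

```python
def shrinkage_weight(n_i, D, sigma2):
    """w_i = n_i*D / (sigma2 + n_i*D), the weight on individual i's own data."""
    return n_i * D / (sigma2 + n_i * D)


def predict_individual_mean(y_i, beta_hat, D, sigma2):
    """Predictor of beta + b_i: w_i * ybar_i + (1 - w_i) * beta_hat."""
    w = shrinkage_weight(len(y_i), D, sigma2)
    ybar = sum(y_i) / len(y_i)
    return w * ybar + (1.0 - w) * beta_hat


beta_hat = 10.0                       # hypothetical estimated population mean
y_few = [14.0, 13.0]                  # 2 observations, sample mean 13.5
y_many = [14.0, 13.0] * 10            # 20 observations, same sample mean

pred_few = predict_individual_mean(y_few, beta_hat, D=1.0, sigma2=4.0)
pred_many = predict_individual_mean(y_many, beta_hat, D=1.0, sigma2=4.0)
```

Both individuals have the same sample mean, but the prediction for the individual with only two observations is pulled much closer to the population mean.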
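General form: Predictor of individual-specific trajectory $X_i\beta + Z_i b_i$
$$X_i\hat\beta + Z_i\hat{b}_i = X_i\hat\beta + Z_i D Z_i^T V_i^{-1}(\xi, x_i)(Y_i - X_i\hat\beta)$$
$$= (I_{n_i} - Z_i D Z_i^T V_i^{-1}) X_i\hat\beta + Z_i D Z_i^T V_i^{-1} Y_i = R_i V_i^{-1} X_i\hat\beta + (I_{n_i} - R_i V_i^{-1}) Y_i$$
• A weighted average of the estimated overall population mean profile $X_i\hat\beta$ and the data $Y_i$ on $i$
• Predictor of individual-specific parameters $\beta_i = A_i\beta + B_i b_i$: $\hat\beta_i = A_i\hat\beta + B_i\hat{b}_i$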
Common: Use $\hat{b}_i$ for diagnostic purposes
• Histograms and scatterplots of components of $\hat{b}_i$ to identify unusual individuals
• Histograms and normal quantile plots of the components of the $\hat{b}_i$ to evaluate relevance of the normality assumption on $b_i$
Caveats:
• $\hat{b}_i$ have different distributions for each $i$ unless $X_i$ and $Z_i$ are the same for all $i$ ⇒ graphics based on the raw $\hat{b}_i$ may be uninterpretable [can standardize the $\hat{b}_i$ using (6.14)]
• Even with the same distribution, $\hat{b}_i$ are subject to shrinkage, so graphics and summaries will reflect less variability than actually present in the distribution of the true $b_i$
6.5 Implementation via the EM algorithm
History: Prior to modern computing, optimization of the ML/REML objective functions could be computationally challenging
• In a world-famous paper, Laird and Ware (1982) showed how this could be accomplished using the expectation-maximization (EM) algorithm
EM algorithm: A computational technique to maximize an objective function
• Can be motivated generically from a MAR missing data perspective starting with the observed data likelihood
• Under reasonable conditions, the algorithm converges to the values of the model parameters maximizing the objective function, and the value of the objective function is guaranteed to increase at each iteration
Here: Do not derive the algorithm, but sketch heuristically the rationale for and form of the algorithm for maximizing the ML objective function in the particular linear mixed model
$$Y_i = X_i\beta + Z_i b_i + e_i, \quad b_i \sim \mathcal{N}(0, D), \quad e_i \sim \mathcal{N}(0, \sigma^2 I_{n_i}), \quad i = 1, \ldots, m$$
• Missing data analogy: $(Y_i, x_i, b_i)$, $i = 1, \ldots, m$, are the full data
• $b_i$, $i = 1, \ldots, m$, are “missing” for all $i$ ⇒ $(Y_i, x_i)$, $i = 1, \ldots, m$, are the observed data
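“Full data” loglikelihood: The joint density of $(Y_i, b_i)$, $i = 1, \ldots, m$, given $\tilde{x}$ is proportional to
$$\prod_{i=1}^m \sigma^{-n_i} \exp\{-(Y_i - X_i\beta - Z_i b_i)^T (Y_i - X_i\beta - Z_i b_i)/(2\sigma^2)\}\, |D|^{-1/2} \exp(-b_i^T D^{-1} b_i / 2)$$
• If $\beta$ were known, so only $\xi$ is unknown, sufficient statistics for $\sigma^2$ and $D$ are
$$T_1 = \sum_{i=1}^m e_i^T e_i, \quad e_i = Y_i - X_i\beta - Z_i b_i, \quad T_2 = \sum_{i=1}^m b_i b_i^T$$
which could be calculated if the “full data” $(Y_i, x_i, b_i)$, $i = 1, \ldots, m$, were available
• Yielding estimators for $\sigma^2$ and $D$
$$\hat\sigma^2 = T_1 / N, \quad \hat{D} = T_2 / m$$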
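EM algorithm: Based on repeated evaluation of the conditional expectations of the “full data” sufficient statistics given the “observed data” $(Y_i, x_i)$, $i = 1, \ldots, m$
• ⇒ Must derive $E(T_1 | Y_i, x_i)$, $E(T_2 | Y_i, x_i)$
• Can be found by noting that
$$\begin{pmatrix} Y_i \\ b_i \\ e_i \end{pmatrix} \Bigg|\, x_i \sim \mathcal{N} \left\{ \begin{pmatrix} X_i\beta \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} Z_i D Z_i^T + \sigma^2 I_{n_i} & Z_i D & \sigma^2 I_{n_i} \\ D Z_i^T & D & 0 \\ \sigma^2 I_{n_i} & 0 & \sigma^2 I_{n_i} \end{pmatrix} \right\}$$
from which it follows that
$E(b_i | Y_i, x_i) = D Z_i^T V_i^{-1}(Y_i - X_i\beta)$, $\quad \mathrm{var}(b_i | Y_i, x_i) = D - D Z_i^T V_i^{-1} Z_i D$
$E(e_i | Y_i, x_i) = \sigma^2 V_i^{-1}(Y_i - X_i\beta) = Y_i - X_i\beta - Z_i D Z_i^T V_i^{-1}(Y_i - X_i\beta)$
$\mathrm{var}(e_i | Y_i, x_i) = \sigma^2 (I_{n_i} - \sigma^2 V_i^{-1})$
Sufficient statistics: Using the above results
$$T_1 = \sum_{i=1}^m e_i^T e_i, \quad e_i = Y_i - X_i\beta - Z_i b_i, \quad T_2 = \sum_{i=1}^m b_i b_i^T$$
• $E(T_2 | Y_i, x_i)$ can be obtained from
$$E(b_i b_i^T | Y_i, x_i) = E(b_i | Y_i, x_i) E(b_i | Y_i, x_i)^T + \mathrm{var}(b_i | Y_i, x_i)$$
• $E(T_1 | Y_i, x_i)$ can be obtained from
$$E(e_i^T e_i | Y_i, x_i) = \mathrm{tr}\{E(e_i e_i^T | Y_i, x_i)\}$$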
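EM algorithm: With starting values $\sigma^{2(0)}$, $D^{(0)}$, at the $\ell$th iteration, with current iterates $\sigma^{2(\ell)}$, $D^{(\ell)}$ and $V_i^{(\ell)} = \sigma^{2(\ell)} I_{n_i} + Z_i D^{(\ell)} Z_i^T$, carry out the following two steps
1. Calculate
$$\beta^{(\ell)} = \left( \sum_{i=1}^m X_i^T \{V_i^{(\ell)}\}^{-1} X_i \right)^{-1} \sum_{i=1}^m X_i^T \{V_i^{(\ell)}\}^{-1} Y_i$$
2. Define
$$r_i^{(\ell)} = Y_i - X_i\beta^{(\ell)}, \quad b_i^{(\ell)} = D^{(\ell)} Z_i^T \{V_i^{(\ell)}\}^{-1} r_i^{(\ell)}, \quad i = 1, \ldots, m$$
Then update $\sigma^{2(\ell)}$ and $D^{(\ell)}$ as
$$\sigma^{2(\ell+1)} = N^{-1} \sum_{i=1}^m \left[ (r_i^{(\ell)} - Z_i b_i^{(\ell)})^T (r_i^{(\ell)} - Z_i b_i^{(\ell)}) + \sigma^{2(\ell)}\, \mathrm{tr}\left( I_{n_i} - \sigma^{2(\ell)} \{V_i^{(\ell)}\}^{-1} \right) \right]$$
$$D^{(\ell+1)} = m^{-1} \sum_{i=1}^m \left[ b_i^{(\ell)} b_i^{(\ell)T} + D^{(\ell)} - D^{(\ell)} Z_i^T \{V_i^{(\ell)}\}^{-1} Z_i D^{(\ell)} \right]$$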
Remarks:
• Details of implementation in Laird, Lange, and Stram (1987); these authors also present an algorithm for maximizing the REML objective function
• The algorithm can be very slow to reach convergence, but the value of the objective function is guaranteed to increase at every iteration, which is reassuring
• With modern computing, implementations of direct optimization in SAS and R have been optimized to the point that it is unusual to encounter computational difficulties with linear mixed effects models
• However, EM algorithms can be very valuable in much more complicated statistical models
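The two steps can be sketched for the simplest special case, a random-intercept model $Y_{ij} = \beta + b_i + e_{ij}$ with scalar $D$, where $V_i^{-1}$ has a closed form and every quantity reduces to a scalar. This is an illustrative sketch only, not the implementation of Laird and Ware or of any software package; the data, starting values, and function names are hypothetical. The marginal loglikelihood is recorded at each iteration so that the monotone increase can be inspected.

```python
import math


def marginal_loglik(data, beta, sigma2, D):
    """Normal loglikelihood of Y_i ~ N(1*beta, D*J + sigma2*I), up to a constant."""
    total = 0.0
    for y in data:
        n = len(y)
        r = [v - beta for v in y]
        ss, s = sum(v * v for v in r), sum(r)
        logdet = (n - 1) * math.log(sigma2) + math.log(sigma2 + n * D)
        quad = (ss - D * s * s / (sigma2 + n * D)) / sigma2   # r^T V^{-1} r
        total += -0.5 * (logdet + quad)
    return total


def em_random_intercept(data, sigma2=1.0, D=1.0, iterations=50):
    """EM iterations for Y_ij = beta + b_i + e_ij; returns (beta, sigma2, D, logliks)."""
    N, m = sum(len(y) for y in data), len(data)
    logliks = []
    for _ in range(iterations):
        # Step 1: beta^(l) is the weighted (GLS) mean given the current sigma2, D
        weights = [len(y) / (sigma2 + len(y) * D) for y in data]
        beta = sum(w * sum(y) / len(y) for w, y in zip(weights, data)) / sum(weights)
        logliks.append(marginal_loglik(data, beta, sigma2, D))
        # Step 2: conditional expectations of the "full data" sufficient statistics
        t1, t2 = 0.0, 0.0
        for y in data:
            n = len(y)
            shrink = n * D / (sigma2 + n * D)       # also tr(I - sigma2 * V^{-1})
            b = shrink * (sum(y) / n - beta)        # E(b_i | Y_i)
            t1 += sum((v - beta - b) ** 2 for v in y) + sigma2 * shrink
            t2 += b * b + D * (1.0 - shrink)        # E(b_i^2 | Y_i)
        sigma2, D = t1 / N, t2 / m
    return beta, sigma2, D, logliks


# Hypothetical unbalanced data: one list of responses per individual.
data = [
    [10.1, 11.0, 9.7],
    [12.4, 13.1],
    [8.9, 9.5, 9.1, 8.7],
    [11.2, 10.6, 11.9],
]
beta, sigma2_hat, D_hat, logliks = em_random_intercept(data)
```

The returned `beta` is the value from the last pass through step 1, and `logliks` holds the loglikelihood evaluated at each iteration's $\beta^{(\ell)}$, $\sigma^{2(\ell)}$, $D^{(\ell)}$.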
Recap: Induced overall covariance structure
\[
\text{var}(Y \mid \tilde{x}) = V(\xi, \tilde{x}) = V = Z \widetilde{D} Z^T + R
\]
• Population-averaged perspective: This induced structure is a flexible way of representing a complex true structure
• $\xi = \{\gamma^T, \text{vech}(D)^T\}^T$ are parameters that characterize this structure, with no restrictions on possible values of $\xi$
• ⇒ $D$ and $R_i(\gamma)$ need not be legitimate covariance matrices
• Subject-specific perspective: $D$ and $R_i(\gamma)$ must be legitimate covariance matrices corresponding to among- and within-individual sources of variation and correlation
• ⇒ Restrictions on the parameter space of $\xi = \{\gamma^T, \text{vech}(D)^T\}^T$
Imperative: The analyst must acknowledge the modeling perspective in making inferences on $\xi = \{\gamma^T, \text{vech}(D)^T\}^T$
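Hip replacement study data: Recall the SS model
\[
Y_{ij} = \beta_{0i} + \beta_{1i} t_{ij} + \beta_{2i} t_{ij}^2 + e_{ij}, \qquad \beta_i = (\beta_{0i}, \beta_{1i}, \beta_{2i})^T
\]
\[
\beta_i = A_i\beta + B_i b_i, \qquad b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \\ b_{2i} \end{pmatrix}, \qquad B_i = I_3
\]
• With $b_i \perp\!\!\!\perp a_i$, $\text{var}(b_i) = D$ $(3 \times 3)$, $\text{var}(e_i) = \sigma^2 I_{n_i}$
\[
V_i = Z_i D Z_i^T + \sigma^2 I_{n_i}, \qquad \xi = \{\sigma^2, \text{vech}(D)^T\}^T \quad (7 \times 1)
\]
• PA perspective: This model induces population mean and overall covariance models; no restrictions on $\xi$
• SS perspective: Requires that $D$ is nonnegative definite (legitimate covariance matrix) and $\sigma^2 \geq 0$ (variance)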
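“Reduced” model: Take $\beta_{2i}$ to be “fixed” (eliminate $b_{2i}$)
\[
b_i = \begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix}, \qquad B_i = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}
\]
• ⇒ $\text{var}(b_i) = D_2$, $\xi = \{\sigma^2, \text{vech}(D_2)^T\}^T$ $(4 \times 1)$
• PA perspective: A way to induce a more parsimonious overall covariance structure with fewer parameters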
• SS perspective: Relative to variation in the population in intercepts and linear components, individual-specific quadratic components either do not vary at all or exhibit negligible variation
• ⇒ Are individual-specific quadratic components “fixed” or “random”?
Either case: Is the “full” model required, or is the “reduced” model adequate?
• PA perspective: Is a simpler representation of the overall covariance structure based on fewer parameters adequate?
• SS perspective: Belief about the relative magnitude of variation in individual-specific quadratic components
• The “reduced” model can be represented as taking $D$ $(3 \times 3)$ to be
\[
D = \begin{pmatrix} D_2 & 0 \\ 0 & 0 \end{pmatrix}
\]
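The representation of the “reduced” model as a zero-padded $D$ can be confirmed numerically. The sketch below assumes NumPy; the time points and parameter values are hypothetical, and `n_cov_params` is a helper introduced here to count the elements of $\xi = \{\sigma^2, \text{vech}(D)^T\}^T$.

```python
import numpy as np

def n_cov_params(q):
    """Length of xi = {sigma^2, vech(D)^T}^T for a (q x q) matrix D."""
    return 1 + q * (q + 1) // 2

# Hypothetical time points and parameter values (illustration only)
t = np.array([0.0, 1.0, 2.0, 3.0])
Z_full = np.column_stack([np.ones_like(t), t, t**2])   # intercept, linear, quadratic
Z_red = Z_full[:, :2]                                  # quadratic column dropped
D2 = np.array([[1.0, 0.3], [0.3, 0.5]])
sigma2 = 0.25

# Pad D2 with a zero row and column to obtain the (3 x 3) matrix D
D = np.zeros((3, 3))
D[:2, :2] = D2

V_full = Z_full @ D @ Z_full.T + sigma2 * np.eye(len(t))
V_red = Z_red @ D2 @ Z_red.T + sigma2 * np.eye(len(t))

print(n_cov_params(3), n_cov_params(2))   # 7 4
print(np.allclose(V_full, V_red))         # True
```

Both parameterizations induce the same overall covariance matrix $V_i$, so the “reduced” model is nested within the “full” model.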
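Formally: Can be addressed by testing the null hypothesis
\[
H_0: D = \begin{pmatrix} D_{11} & D_{12} & D_{13} \\ D_{12} & D_{22} & D_{23} \\ D_{13} & D_{23} & D_{33} \end{pmatrix}
= \begin{pmatrix} D_{11} & D_{12} & 0 \\ D_{12} & D_{22} & 0 \\ 0 & 0 & 0 \end{pmatrix}
= \begin{pmatrix} D_2 & 0 \\ 0 & 0 \end{pmatrix}
\]
against an appropriate alternative
• Nested models ⇒ suggests likelihood ratio test (LRT)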
Validity of the LRT for $H_0$: A required regularity condition for usual large sample theory approximations is that the true value of a parameter is not on the boundary of its parameter space but rather lies in its interior
• Hypothesis testing: The value of the parameter under the null hypothesis cannot be on the boundary of the parameter space for usual asymptotic arguments leading to tests to be valid
Implication:
• PA perspective: $D$ $(3 \times 3)$ is a symmetric matrix with parameters characterizing overall covariance structure ⇒ no restriction on $D_{33}$ (or any element of $D$), so that the value of $D_{33}$ under $H_0$ is in the interior of the parameter space
• SS perspective: $D$ $(3 \times 3)$ is a legitimate covariance matrix ⇒ $D_{33}$ is a variance; for $D$ to be nonnegative definite, it must be that $D_{33} \geq 0$, so that the value of $D_{33}$ under $H_0$ is on the boundary of the parameter space
Result: Under a PA perspective, comparing the usual likelihood ratio test statistic described above to the appropriate chi-square critical value will yield a valid test of $H_0$ (interpreted from a PA perspective as above)
Subject-specific perspective: Carrying out the likelihood ratio test in the usual way will not lead to a valid test of $H_0$
• Must appeal to specialized theoretical results for nonstandard testing situations in a classic paper by Self and Liang (1987)
• Stram and Lee (1994) used this theory to demonstrate that, when $R_i = \sigma^2 I_{n_i}$, the large sample distribution of the LRT statistic is a mixture of chi-squared distributions
• See also Section 6.3 of Verbeke and Molenberghs (2000) and Verbeke and Molenberghs (2003)
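General result: Under a SS perspective, with $D$ $\{(q+1) \times (q+1)\}$, test
\[
H_0: D = \begin{pmatrix} D_q & 0 \\ 0 & 0 \end{pmatrix}
\]
for $D_q$ $(q \times q)$ positive definite versus the alternative that $D$ is a general $\{(q+1) \times (q+1)\}$ nonnegative definite matrix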
• Large sample distribution of the LRT statistic under $H_0$ is a mixture of a $\chi^2_{q+1}$ and a $\chi^2_q$ distribution with equal weights of 0.5
• Effect: Reduces the p-value relative to the p-value obtained by (incorrectly) using the LRT procedure in the usual way ⇒ ignoring the “boundary problem” leads to rejection of $H_0$ less often and possibly adopting models that are too simple
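The effect of the mixture on the p-value can be illustrated with a short calculation. The sketch below uses only the Python standard library; `chi2_sf` and `lrt_pvalues` are helper names introduced here, the value of the LRT statistic is hypothetical, and the "usual" p-value is assumed to refer to a $\chi^2_{q+1}$ distribution, since $H_0$ sets $q + 1$ distinct elements of $D$ to zero.

```python
import math

def chi2_sf(x, df):
    """P(chi-square with df degrees of freedom > x), df a positive integer.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z)
    while a < df / 2.0:
        q += z**a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def lrt_pvalues(lrt_stat, q):
    """Return (usual p-value, 50:50 mixture p-value) for the LRT of H0."""
    naive = chi2_sf(lrt_stat, q + 1)
    mixture = 0.5 * chi2_sf(lrt_stat, q + 1) + 0.5 * chi2_sf(lrt_stat, q)
    return naive, mixture

# Hypothetical LRT statistic for "full" (3 x 3) versus "reduced" (2 x 2) D
naive, mixture = lrt_pvalues(7.0, q=2)
print(naive, mixture)
```

Because the $\chi^2_q$ tail probability is smaller than the $\chi^2_{q+1}$ tail probability at any positive value of the statistic, the mixture p-value is always the smaller of the two.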
Additional questions of interest: From a SS perspective, covariance parameters $\xi$ may be of scientific interest
• E.g., diagonal elements of $D$ represent magnitudes of variation in features of inherent trajectories in $\beta_i$, such as individual-specific intercepts and slopes, in the population of individuals
• Thus, estimates of these parameters may be desired
Approximate standard errors: In principle can be based on a large sample approximation to the sampling distribution of the estimator $\widehat{\xi}$
• Can be derived by an estimating equation argument ($n_i$ fixed) assuming the induced models for population mean response and overall covariance structure are correctly specified
• Issue: The covariance matrix of the approximate sampling distribution depends on third and fourth moments of the true distribution of $Y_i \mid x_i$
• If $Y_i \mid x_i$ is taken to be normally distributed, this covariance matrix can be derived from the information matrix and depends on the fourth moment of a normal distribution
• However, if the true distribution departs from the normal even slightly, these approximate standard errors for components of $\widehat{\xi}$ can be very unreliable