Regression Models for Longitudinal Data - Missing Data in Longitudinal Studies: Dropout, Causal

2.1 Introduction

2.1.1 Longitudinal data

The material in this book is organized around regression models for repeated measurements. Appealing to first principles, one can think of longitudinal data as arising from the joint evolution of response and covariates,

{Yi(t), xi(t) : t ≥ 0}

If the process is observed at a discrete set of time points t = (t1, . . . , tJ)^T that is common to all individuals, the resulting response data can be written in terms of the J × 1 vector

Yi = {Yi(t) : t ∈ t} = (Yi1, . . . , YiJ)^T.

The covariate process {xi(t)} is p × 1. At time tj, the observed covariates are collected in the vector xij = (xij1, . . . , xijp)^T. Hence the full collection of observed covariates is contained in the J × p matrix

Xi = (xi1, xi2, . . . , xiJ)^T,

that is, the matrix whose jth row is x^T_ij.

When the set of observation times is common to all individuals, we say the responses are balanced or temporally aligned. It is sometimes the case that observation times are unbalanced, or temporally misaligned, in that they vary by subject, in which case the times for subject i are ti1, . . . , tiJi and the dimensions of Yi and Xi are Ji × 1 and Ji × p, respectively.
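The distinction between balanced and unbalanced data can be made concrete with a small sketch. It assumes a long-format list of records; the toy values and the helper name `subject_arrays` are hypothetical and not taken from the text. The point is only that Yi and Xi have Ji rows, where Ji may differ across subjects.

```python
import numpy as np

# Hypothetical toy data: two subjects observed at different times (unbalanced).
# Each record is (subject id, time t_ij, response Y_ij, covariates x_ij with p = 2).
records = [
    (1, 0.0, 5.1, [1.0, 0.0]),
    (1, 1.0, 5.6, [1.0, 1.0]),
    (1, 2.5, 6.0, [1.0, 2.5]),
    (2, 0.0, 4.2, [1.0, 0.0]),
    (2, 3.0, 4.9, [1.0, 3.0]),
]

def subject_arrays(records, subject):
    """Return (t_i, Y_i, X_i) for one subject, with shapes (J_i,), (J_i,), (J_i, p)."""
    rows = [r for r in records if r[0] == subject]
    t = np.array([r[1] for r in rows])
    Y = np.array([r[2] for r in rows])
    X = np.array([r[3] for r in rows])
    return t, Y, X

t1, Y1, X1 = subject_arrays(records, 1)   # J_1 = 3
t2, Y2, X2 = subject_arrays(records, 2)   # J_2 = 2
print(Y1.shape, X1.shape)
print(Y2.shape, X2.shape)
```

In the balanced case every subject would share the same time vector, and the per-subject arrays could be stacked into a single n × J response array.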

In regression, we are interested in characterizing the effect of covariates X on a longitudinal dependent variable Y. Formally, we wish to draw inference about the joint distribution of the vector Yi of responses, conditionally on Xi,

[Yi | Xi] = [Yi1, . . . , YiJi| Xi].

Likelihood-based regression models for longitudinal data require a specification of this joint distribution using a model f(y | x, θ). The parameter θ is a finite-dimensional vector of parameters indexing the model; it might include regression coefficients, variance components, and parameters indexing serial correlation.

The joint distribution of responses can be specified directly or indirectly. Directly specified models are written in terms of the marginal mean at each measurement occasion or time point t, together with a model for the variance-covariance structure. Indirectly specified models typically use a multilevel format, for example involving subject-specific random effects or latent variables bi to partition within- and between-subject variation.

The usual strategy is to specify the joint distribution of responses and random effects, factored as

[Yi, bi| Xi] = [Yi| bi, Xi] [bi| Xi].

The distribution of interest, [Yi | Xi], is obtained by integrating over the bi. Both directly- and indirectly-specified models are common for modeling longitudinal data, and in our review we will give several examples.

2.1.2 Regression models

The literature on regression models for longitudinal data is vast, and we make no attempt to be comprehensive here. Our review is designed to highlight predominant approaches to regression modeling, emphasizing those models used in later chapters. Readers are referred to Diggle et al. [DHLZ02b], Fitzmaurice et al. [FLW04], Laird [Lai04], Jones [Jon93], Davidian and Giltinan [DG98], Crowder and Hand [CH90], Verbeke and Molenberghs [VM00], and Lindsey [Lin99] for a variety of perspectives.

As we review several different regression models, the intent is to give the reader a sense of the rich variety of models that can be used to characterize longitudinal data, and to illustrate that these fit coherently into a single framework. As a result, missing data strategies described in later chapters can be applied very generally. Specific models described here will be familiar to those with experience analyzing longitudinal data (e.g. multivariate normal regression model, random effects models), but others represent fairly new developments (e.g. marginalized transition models [Hea02], regression splines [EM96, LZ99, RWC03]). Here we focus on specification and interpretation; Chapter 3 covers various aspects of inference.

Because many regression models for longitudinal data have their foundation in the generalized linear model (GLM) for cross-sectional data [MN99], our review begins with a concise description of GLMs. Coverage of models for longitudinal data begins with random effects models; these build directly on the GLM structure by introducing individual-level random effects to capture between-subject variation. Conditionally on the random effects, within-level variation can be described by a simpler model, such as a GLM. Random effects models are very attractive in that they naturally partition variation in the dependent variable into its between- and within-subject components, and they can be used to model both balanced and unbalanced data. At the same time, there is sometimes the disadvantage that the implied marginal distribution of responses is opaque.

An alternative to random effects models is directly-specified models of the joint marginal distribution of responses (Section 2.4). Frequently referred to as marginal models, directly-specified models have a natural construction when the error distribution is multivariate normal, but for binary, count, and other discrete data, the choice of an appropriate joint distribution is less obvious. Our review touches on some recent developments for discrete longitudinal responses, such as the marginalized transition model [Hea02] and others. For a detailed review of likelihood-based models of multivariate discrete responses, see Chapter 11 of Diggle et al. [DHLZ02a] and Chapter 7 of Laird [Lai04].

For all models covered in the first part of the chapter, the regression function is linear in covariates and takes a known functional form. Section 2.5 describes models in which the regression function can be nonlinear, either through a known function of the covariates or through an unspecified smooth function. The latter type of model is typically called semiparametric, because the regression is left unspecified but distributional assumptions are made about the error structure. Nonlinear and semiparametric models have a close connection to the GLM structure; our discussion of these models emphasizes that connection and illustrates that regression models as a whole can be very generally characterized [HTF01, RWC03].

The final element of our review concerns interpretation of covariate effects in longitudinal models. Because the response and covariates change with time, models of longitudinal data afford the opportunity to infer both within- and between-subject covariate effects; however, the importance of underlying assumptions to the interpretation of covariate effects should not be underestimated. Section 2.6 discusses three key aspects of interpretation and specification for longitudinal models: cross-sectional versus longitudinal effects of a time-varying covariate, marginal (population-averaged) versus conditional (subject-specific) covariate effects, and the assumptions governing the use of time-varying covariates.

2.1.3 Full vs. observed data

Throughout Chapters 2 and 3, the models refer to a full-data distribution. The distinction between full and observed data is particularly important when drawing inference from incomplete longitudinal data.

We define the full data as those observations intended to be collected on a pre-specified interval, such as [0, T]. For example, if intended collection times t1, . . . , tJ are common to all individuals, then the full response and covariate data are (Yi1, Xi1), . . . , (YiJ, XiJ), where Yij = Yi(tj) and Xij = Xi(tj).∗

In most applications, interest lies in the effect of covariates on the mean structure. When data are fully observed, the variance and covariance models can frequently be treated as nuisance parameters. Correct specification of variance and covariance allows more efficient use of the data, but it is not always necessary for obtaining proper inferences about mean parameters. When data are not fully observed, variance-covariance specification takes on heightened importance because missing data will effectively be imputed or extrapolated from observed data, based on modeling assumptions. For longitudinal data, unobserved responses will be imputed from observed responses for the same individual; the assumed correlation structure will usually have considerable influence on the imputation. This theme recurs throughout the book, and therefore our review pays particular attention to aspects of variance-covariance specification.

2.1.4 Additional notation

Random variables and their realizations are denoted by Roman letters (e.g., X, x), and parameters are represented by Greek letters (e.g., α, θ). Vector- and matrix-valued random variables and parameters are represented using boldface (e.g., x, Y, β, Σ). For any matrix or vector A, we use A^T to denote transpose. If A is invertible, then A⁻¹ is its inverse and S = A^{1/2} is the lower triangular matrix square root such that SS^T = A. A full listing of notational conventions appears in the Appendix.

∗ In Chapter 5, we expand the definition of full data to include random variables such as dropout time that characterize the missing data process.
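The lower triangular square root can be illustrated numerically. The sketch below assumes A is symmetric positive definite (the usual setting for a variance matrix) and uses NumPy's Cholesky routine; the matrix entries are illustrative only.

```python
import numpy as np

# Illustrative symmetric positive definite matrix (values are hypothetical).
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

S = np.linalg.cholesky(A)   # lower triangular factor of A

print(S)
print(np.allclose(S @ S.T, A))   # checks S S^T = A
```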

2.2 Generalized linear models for cross-sectional data

The generalized linear model (GLM) forms the foundation for many approaches to regression with multivariate responses, such as longitudinal or clustered data. Models such as random effects or mixed effects models, latent variable and latent class models, and regression splines, all highly flexible and general, are based on the GLM framework. Moment-based methods such as generalized estimating equations (GEE) also follow directly from the GLM for cross-sectional data [LZ86].

The GLM is a regression model for a dependent variable arising from the exponential family of distributions,

f(y | θ, ψ) = exp{(yθ − b(θ))/a(ψ) + c(y, ψ)},

where a, b and c are known functions, θ is the canonical parameter, and ψ is a scale parameter. The exponential family includes several commonly-used distributions, such as normal, Poisson, binomial, and gamma. It can be readily shown that

E(Y) = b′(θ),   var(Y) = a(ψ) b″(θ)

(see McCullagh and Nelder [MN89], Section 2.2.2 for details). The effect of covariates xi = (xi1, . . . , xip)^T can be modeled by introducing the linear predictor

ηi(xi, β) = x^T_iβ,

where β = (β1, . . . , βp)^T is a vector of regression coefficients. Now define µi = µ(xi, β) = E(Yi | xi). A smooth, monotone function g links the mean µi to the linear predictor ηi via

g(µi) = ηi = x^T_iβ. (2.1)

In many exponential family distributions, it is possible to identify a link function g such that X^TY is the sufficient statistic for β (here, X is the n × p design matrix and Y = (Y1, . . . , Yn)^T is the n × 1 vector of responses). In this case, the canonical parameter is θ = η. Examples are well-known and widespread: for the Poisson distribution, the canonical parameter is log(µ); for the binomial distribution, it is the log odds (logit), log{µ/(1 − µ)}.
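As a quick numerical check of the moment identities E(Y) = b′(θ) and var(Y) = a(ψ)b″(θ), consider the Poisson case, where θ = log(µ), b(θ) = exp(θ) and a(ψ) = 1. The sketch below approximates the derivatives by central finite differences; the value µ = 3 and the step size are arbitrary choices made for illustration.

```python
import numpy as np

def b(theta):
    """Cumulant function of the Poisson family: b(theta) = exp(theta)."""
    return np.exp(theta)

mu = 3.0                 # illustrative Poisson mean
theta = np.log(mu)       # canonical parameter for the Poisson
h = 1e-3                 # finite-difference step

# Central differences approximating b'(theta) and b''(theta)
b1 = (b(theta + h) - b(theta - h)) / (2 * h)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2

print(b1, b2)   # both should be close to mu, the Poisson mean and variance
```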

Although canonical links are sometimes convenient, their use is not necessary to form a GLM. In general, it only requires specification of a mean and variance function, conditionally on covariates. The mean follows (2.1), and the variance is given by

v(µi, φ) = φh(µi),

where h(·) is some function of the mean and φ > 0 is a scale factor. Certain choices of g and h will yield likelihood score equations for common parametric regression models based on exponential family distributions. For example, setting g(µ) = log{µ/(1 − µ)}, h(µ) = µ(1 − µ) and φ = 1 yields logistic regression under a Bernoulli distribution [Yi | xi]. Similarly, Poisson regression can be specified by setting g(µ) = log µ, h(µ) = µ and φ = 1.
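To show how a choice of g and h translates into an estimation routine, here is a minimal sketch of iteratively reweighted least squares (IRLS) for the Poisson case g(µ) = log µ, h(µ) = µ, φ = 1. IRLS is a standard way to solve GLM score equations, but it is not described in the text above; the function name `fit_poisson_irls`, the simulated data, and the "true" coefficients are all assumptions made for illustration.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=25, tol=1e-10):
    """Solve the Poisson GLM score equations with log link by IRLS."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                 # linear predictor
        mu = np.exp(eta)               # inverse link
        W = mu                         # working weights: h(mu) = mu, phi = 1
        z = eta + (y - mu) / mu        # working response
        beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated cross-sectional data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = fit_poisson_irls(X, y)
score = X.T @ (y - np.exp(X @ beta_hat))   # should be approximately zero
print(beta_hat, score)
```

Swapping in the logit link and h(µ) = µ(1 − µ) would give the corresponding logistic regression update.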

2.3 Conditionally specified (random effects) models

Conditionally specified models using random effects or latent variables provide a highly flexible class of models for handling longitudinal data.

A defining characteristic of these models is that they impose structure on marginal variance and correlation using individual-specific random effects or latent variables. The models can be applied either to balanced or unbalanced response patterns, and can be used to capture key features of both between- and within-subject variation using relatively few parameters.

A standard approach is to specify a regression model that includes subject-level random effects or latent variables b, and then to assume that conditionally on the latent variables, the distribution [Y | X, b] has a simple form (e.g. its elements are independent). Integrating out the random effects yields marginal correlations between the {Yij} within subject [BK99].

Many models, regression and otherwise, can be represented using a random effects or latent variable formulation. These include standard random effects regression models for responses that are continuous [LW82, Dig88] or discrete [SLW84, GH97, HG94, NMK00]; see Breslow and Clayton [BC93] and Daniels and Gatsonis [DG99] for an overview. This class of models also includes regression models with factor-analytic and latent class structures [SR96, AW00, SL96, RLR99, RA01]. See Bartholomew and Knott [BK99] for a full account.

Here we briefly review conditionally-specified regression models where conditioning is done on random effects; these models also are known by a variety of names, including ‘mixed effects models’, ‘random effects models’, and ‘random coefficient models’. We use the term ‘random effects’ models.

The most common random effects models for longitudinal data specify the joint distribution [Yi, bi| Xi, θ] as

[Yi| bi, Xi, θ1] [bi| Xi, θ2].

The parameter θ1 captures the conditional effect of X on Y. The marginal distribution [Yi | Xi] is obtained by integrating bi out of the joint distribution, and is indexed by the full set of parameters θ = (θ1, θ2).

2.3.1 Random effects models based on GLMs

By including random effects, generalized linear models can be used to model longitudinal and clustered data. For common distributions such as Bernoulli and Poisson, the GLM with random effects can be written in terms of the conditional mean and variance. The conditional mean takes the form

g{E(Yij | xij, zij, bi)} = g(µ^b_ij) = xijβ + zijbi,

where g(·) is a link function and zij is a design matrix for the subject-specific random effects. This representation of the conditional mean motivates the term ‘mixed-effects model’ because the coefficients quantify both population-level (β) and individual-level (bi) effects. The conditional variance is

V_ij^b = var(Yij | xij, zij, bi) = φh(µ^b_ij).

Finally, within-subject correlation is specified through a covariance function

C_ijk^b (γ) = cov(Yij, Yik | xij, xik, bi, γ).

In many cases it is assumed that C_ijk^b = 0; i.e., that the random effects capture relevant within-subject correlation (after averaging over their distribution), but this assumption may not always be appropriate for longitudinal responses.

At the second level, the random effects bi follow some distribution such as multivariate normal. The model for the marginal joint distribution of (Yi1, . . . , YiJ | Xi) is obtained by integrating over bi,

f(y1, . . . , yJ | Xi, θ) = ∫ f(y1, . . . , yJ | Xi, bi, θ1) dF(bi | Xi, θ2).

The relationship between marginal and conditional (random effects) models is important to understand, particularly as it relates to interpreting covariate effects. In what follows we give several examples to illustrate.
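Before turning to those examples, the integration over bi can be sketched numerically in a case where the answer is known in closed form. The sketch assumes a Poisson model with log link and a single normal random intercept bi ∼ N(0, σ²), which is not one of the examples in the text; there the marginal mean is exp(η + σ²/2), the mean of a log-normal variable. The function name and the values of η and σ are hypothetical, and Gauss-Hermite quadrature is simply one convenient way to approximate the integral.

```python
import numpy as np

def marginal_poisson_mean(eta, sigma, n_nodes=30):
    """Approximate E[exp(eta + b)] for b ~ N(0, sigma^2) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes          # change of variables to N(0, sigma^2)
    return np.sum(weights * np.exp(eta + b)) / np.sqrt(np.pi)

eta, sigma = 0.4, 0.7                         # illustrative values
numeric = marginal_poisson_mean(eta, sigma)
closed_form = np.exp(eta + sigma**2 / 2)      # log-normal mean

print(numeric, closed_form)
```

In this special case the marginal and conditional models differ only by the intercept shift σ²/2 on the log scale; with other links, such as the logit, no such simple relationship holds.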

2.3.2 Random effects models for continuous response

A natural choice for modeling continuous or measured responses is the normal distribution. In random effects models, allowing both within- and between-subject variation to follow a normal distribution, or more generally a Gaussian process, affords considerable modeling flexibility while retaining interpretability.

Example 2.1. Normal random effects model for continuous responses.

A common model for continuous longitudinal responses is the normal random effects model. This model illustrates well the concept of an ‘indirectly-specified’ joint distribution because the variance-covariance structure in [Yi | Xi, θ] is a by-product of the assumed random effects distribution.

Like many random effects models, it is easiest to describe in two stages. At the first stage, the responses Yi are normal conditionally on a q × 1 vector of random effects bi,

[Yi| Xi, bi, θ1] ∼ N(µ^b_i, Σ^b_i),

where superscript b is added to emphasize that the mean and covariance are conditional on bi. To incorporate covariate effects, let

µ^b_i = Xiβ+ Zibi,

where Zi is the design matrix for random effects. The variance matrix Σ^b_i = Σ^b_i(φ) captures within-subject variation and is parameterized by the r × 1 vector φ of nonredundant parameters. Hence θ1 = (β, φ).

When Zi ⊆ Xi, as is usually the case, the bi can be thought of as error terms for regression coefficients, which gives rise to the term ‘random coefficient model.’ For example, if Xi = Zi, we obtain a ‘random-coefficients model’,

µ^b_i = Xiβi = Xi(β + bi), (2.2)

where the random effects bi can be interpreted as individual-specific deviations from β.

The within-subject variance Σ^b_i(φ) usually has a simplified structure, parameterized through a covariance function Cijk(φ). For example, an exponential structure takes the form

Cijk(φ) = σ² ρ^{|tij − tik|},

where φ = (σ², ρ) and 0 ≤ ρ ≤ 1.

At the second level, the random effects are assigned a distribution that can depend on covariates. The (multivariate) normal is a common choice,

[bi| Xi] ∼ N(0, Ω),

where Ω = Ω(η) is a q × q variance matrix indexed by η (hence θ2= η).

It also is possible to allow η to depend on individual-level covariates through appropriate specifications [DZ03]; this is covered in more detail in Chapter 6.

The marginal distribution of Yi follows the multivariate normal distribution

[Yi | Xi, Zi, θ] ∼ N(Xiβ, ZiΩZ^T_i + Σ^b_i). (2.3)

The marginal variance var(Yi | Xi) is indirectly specified because it depends on parameters from both [Yi | Xi, bi] and [bi | Xi]. Moreover, we see by comparing (2.2) and (2.3) that β can be interpreted both as a marginal and a conditional effect of Xi on Yi.

A version of this model is used to analyze data described in Example 1.1, a longitudinal clinical trial comparing three doses of an antipsychotic to the standard of care in schizophrenia patients. The analysis appears in Data Analysis 4.2. □
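The marginal variance in (2.3) can be assembled directly from its two pieces. The following sketch assumes a random intercept and slope (so Zi has columns 1 and tij) together with the exponential within-subject covariance from Example 2.1; the observation times and all parameter values are hypothetical.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 4.0])           # observation times t_ij (illustrative)
Z = np.column_stack([np.ones_like(t), t])    # random intercept and slope, q = 2

Omega = np.array([[1.0, 0.2],
                  [0.2, 0.5]])               # var(b_i), illustrative values
sigma2, rho = 0.8, 0.6                       # phi = (sigma^2, rho), illustrative values

# Exponential within-subject covariance: C_ijk = sigma^2 * rho^{|t_ij - t_ik|}
Sigma_b = sigma2 * rho ** np.abs(t[:, None] - t[None, :])

# Marginal variance of Y_i: between-subject part plus within-subject part
V = Z @ Omega @ Z.T + Sigma_b
print(np.round(V, 3))
```

Because the random slope enters through Zi, the marginal variances on the diagonal of V change with time even though the within-subject variance σ² is constant.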

2.3.3 Random effects models for discrete responses

Random effects specifications can be very useful for modeling longitudinal discrete responses, where the joint distribution rarely takes an obvious form and principles from generalized linear models are not easily applied. In the case of longitudinal binary data, for example, it is straightforward to show that the joint distribution of a J-dimensional response variable can be represented by a multinomial distribution with 2^J categories. When J is appreciably large, however, parameter constraints must be imposed to make modeling practical. See Laird [Lai04], Chapter 7 for a more detailed discussion.

Compared to direct specification of the joint distribution, random effects models offer the advantage of being parsimonious, providing a natural decomposition of sources of variation, and applying equally well to balanced and unbalanced response profiles. The regression parameters represent covariate effects in the conditional rather than marginal joint distribution of Y, however, and because the link functions are nonlinear transformations of the mean (e.g., log, logit), these do not generally coincide. Therefore care must be taken when interpreting regression effects.

The logistic regression with normal random effects illustrates several of these points rather well.

Example 2.2. Logistic regression with random effects.

As in Example 2.1, a logistic random effects model is specified in terms of the joint distribution

[Yi, bi| Xi, θ] = [Yi| Xi, bi, θ1] [bi| Xi, θ2],

where θ = (θ1, θ2). The conditional distribution of each component in Yi follows the Bernoulli model,

[Yij | xij, bi, θ1] ∼ Ber(µ^b_ij),

where

g(µ^b_ij) = xijβ + zijbi (2.4)

(hence θ1 = β). The random effects distribution follows

[bi | Xi, θ2] ∼ N(0, Ω),

so θ2 = Ω.

The parameter β characterizes the conditional, or subject-specific effect of Xi on Yi. By contrast, the marginal — or population-averaged — distribution [Yi | Xi, θ] must be obtained by integrating over bi. The marginal mean µij(β, Ω) = E(Yij | xij, β, Ω) is

µij(β, Ω) = ∫ µ^b_ij(β) dF(bi | xij, Ω)

= ∫ [exp(xijβ + zijbi) / {1 + exp(xijβ + zijbi)}] dF(bi | xij, Ω).

The marginal effect of Xi differs from the conditional effect in that it is a function of both β and Ω, and on the logit scale, it is no longer linear. Zeger and Liang [ZL92] show that in some cases, the marginal effect in the logit-normal model is approximately linear on the logit scale, and differs from the conditional effect by a scale factor that depends on Ω; i.e., g(µij) = xijβ^∗, where β^∗ = βk(Ω) and k : R^q → R^p is a known function.

In many cases the population-averaged effect β^∗ is attenuated relative to the subject-specific effect β; for example, when q = 1 (random intercept model), bi is normally distributed, bi is independent of Xi, and Xi has a single time-constant covariate xi, then |β^∗| ≤ |β|, with the difference |β^∗ − β| increasing with var(bi). Interpreting the marginal and conditional effects is considered further in Section 2.6.
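The attenuation can be seen numerically in the random intercept case. The sketch below assumes a single binary covariate x, conditional coefficients chosen for illustration, and bi ∼ N(0, σ²); it approximates the marginal mean by Gauss-Hermite quadrature and then computes the marginal log odds ratio directly, rather than using the approximation of Zeger and Liang. The function names are hypothetical.

```python
import numpy as np

def marginal_mean(eta, sigma, n_nodes=40):
    """Approximate E[expit(eta + b)] for b ~ N(0, sigma^2) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes          # change of variables to N(0, sigma^2)
    p = 1.0 / (1.0 + np.exp(-(eta + b)))      # conditional mean given b
    return np.sum(weights * p) / np.sqrt(np.pi)

def logit(p):
    return np.log(p / (1.0 - p))

beta0, beta1 = -0.5, 1.0      # conditional (subject-specific) coefficients, illustrative
sigmas = (0.5, 1.0, 2.0)      # standard deviations of the random intercept
stars = []
for sigma in sigmas:
    # Marginal log odds ratio comparing x = 1 with x = 0
    beta1_star = (logit(marginal_mean(beta0 + beta1, sigma))
                  - logit(marginal_mean(beta0, sigma)))
    stars.append(beta1_star)
    print(sigma, round(beta1_star, 3))
```

Under these assumptions each marginal log odds ratio is smaller than the conditional value of 1.0, and the gap widens as σ grows.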

In Chapter 4, we use this model to characterize the effect of a behavioral intervention on weekly smoking cessation status using longitudinal binary data from a recent clinical trial. The data are described in Dataset ?? and analyzed in Data Analysis ??. □

Examples 2.1 and 2.2 assume the random effects bi follow a normal distribution; this is not necessary and in many cases it may be inappropriate or incorrect. Zhang and Davidian [ZD01] describe models where the random effects distribution belongs to a flexible class of densities that includes the normal as a special case. Verbeke and Lesaffre [VL96] describe random effects distributions that follow discrete mixtures of normal distributions. For some simple models, it is sometimes possible to use exploratory analysis in order to ascertain whether a normal or other symmetric distribution is suitable for describing the random effects. In other cases more formal methods of model choice may be needed.

2.4 Directly specified (marginal) models

This section reviews the family of models in which the joint distribution [Y | X] is directly specified by a model f(y | x, θ). Usually the most challenging aspect of model specification is finding a suitable parameterization for the correlation and/or covariance, particularly when observations are unbalanced in time or when the number of observations per subject is large relative to sample size. In these cases, sensible decisions about dimension reduction must be made.

For continuous data that can be characterized using a normal distribution or Gaussian process, model specification (though not necessarily selection) can be reasonably straightforward, owing to the natural separation of mean and variance parameters in the normal distribution. The analyst can focus efforts separately on sensible models for mean and variance/covariance structures.

Other types of data also pose more significant challenges to the process of direct specification due to a lack of obvious choices for joint distribution models. Unlike the normal distribution, which generalizes naturally to

In document Missing Data in Longitudinal Studies: Dropout, Causal Inference, and Sensitivity Analysis (Page 23-51)