Joint Partially Linear Model for Longitudinal Data with
Informative Drop-Outs
Sehee Kim1,*, Donglin Zeng2, and Jeremy M. G. Taylor1
1Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A 2Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599,
U.S.A
Summary
In biomedical research, a steep rise or decline in longitudinal biomarkers may indicate latent disease progression, which may subsequently cause patients to drop out of the study. Ignoring the informative drop-out can cause bias in estimation of the longitudinal model. In such cases, a full parametric specification may be insufficient to capture the complicated pattern of the longitudinal biomarkers. For these types of longitudinal data with the issue of informative drop-outs, we develop a joint partially linear model, with an aim to find the trajectory of the longitudinal biomarker. Specifically, an arbitrary function of time along with linear fixed and random covariate effects is proposed in the model for the biomarker, while a flexible semiparametric transformation model is used to describe the drop-out mechanism. Advantages of this semiparametric joint modeling approach are the following: 1) it provides an easier interpretation, compared to standard nonparametric regression models, and 2) it is a natural way to control for common (observable and unobservable) prognostic factors that may affect both the longitudinal trajectory and the drop-out process. We describe a sieve maximum likelihood estimation procedure using the EM algorithm, where the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are considered to select the number of knots. We show that the proposed estimators achieve desirable asymptotic properties through empirical process theory. The proposed methods are evaluated by simulation studies and applied to prostate cancer data.
Keywords
Joint models; Longitudinal data; Nonparametric maximum likelihood; Partially linear model; Random effects; Sieve maximum likelihood; Transformation models
1. Introduction
In prostate cancer studies, Prostate-specific Antigen (PSA) has been widely used to make clinical decisions. Higher or rising PSA patterns after treatment are related to an increased
7. Supplementary Materials
HHS Public Access
Author manuscript
Biometrics
. Author manuscript; available in PMC 2017 July 25.Published in final edited form as:
Biometrics. 2017 March ; 73(1): 72–82. doi:10.1111/biom.12566.
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
risk of prostate cancer recurrence. One important scientific goal is, therefore, to identify the PSA trajectory among patients who are treated with radiation therapy for localized prostate cancer (Proust-Lima et al., 2008). In cases where the treatment has been successful, PSA levels are expected to drop over the first post-radiation year and then remain stable at low levels (Zagars et al., 1995). In the other cases, however, PSA values after treatment declined for a short period of time and then might rise again at later times, for example, as illustrated in Web Figure 1. Hence, it is desirable to develop a flexible model to capture this nonlinear temporal trend of PSA levels. One key challenge here is that the follow-up of PSA stopped when salvage hormone therapy was initiated, which is known to change the PSA level or when prostate cancer recurred, resulting in possibly informative drop-out which can lead to bias for PSA trajectory estimation if not accounted for properly.
Motivated by the prostate cancer data, we propose a method that flexibly models longitudinal trajectories of biomarkers using a partially linear model, while taking informative drop-outs into account. We treat the informative drop-out as an event which makes subsequent PSA measurements missing, and these unobserved measures are related to the drop-out process via subject-specific random effects. Advantages of this semiparametric joint modeling approach are 1) it provides an easier interpretation of the covariate effects, compared to standard nonparametric regression models, and 2) it is a natural way to control for common (observable and unobservable) prognostic factors that may affect both the longitudinal trajectory and the informative drop-out.
Without considering complications due to the informative drop-out, several authors have studied partially linear models for longitudinal/clustered data (Zeger and Diggle, 1994; Zhang et al., 1998; Lin and Carroll, 2001; Lin and Ying, 2001). Other related work about joint models with time-varying coefficients, but not considering informative drop-outs, includes Cai et al. (2012) and Lu and Huang (2015). In longitudinal studies, the problem of informative drop-out has received enormous attention. Hogan and Laird (1997) provided an excellent review of model-based approaches to handling incomplete longitudinal
measurements, where most of the existing methods are based on the full parametric
specification of the longitudinal model and dependence structure of the drop-out mechanism. We refer the readers to Hogan and Laird (1997) for a more detailed explanation and
discussion of the limitations of these parametric models. The conventional pattern-mixture and selection models reviewed by them were extended by Roy (2003) and Beunckens et al. (2008), who allowed multiple latent subgroups of subjects (i.e., joint latent class models). The assumption on latent subgroups has been relaxed by Muthén et al. (2011) so that a subject’s subgroup can differ for drop-out and outcomes in their pattern-mixture and selection models. However, all these methods assume some parametric or even linear trends for the longitudinal trajectories, therefore not appropriate for our application.
Roy and Lin (2005) considered longitudinal data with missing time-varying covariates in addition to informative drop-outs. They used a generalized linear mixed model for
continuous or binary longitudinal outcomes, and used a transition model to estimate missing time-varying covariates. For informative drop-out, they used a conventional selection model
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
random variable, the proportional hazards model is commonly assumed to characterize the drop-out process, however, it may not be appropriate in some cases. For example, patients receiving more aggressive treatment may suffer elevated risk in the beginning but may benefit in the long term if they tolerate the treatment. Considering a general class of transformations for such cases will lead to a better model fit. The transformation models for a single survival time and recurrent event times have been extensively studied, dating back to Dabrowska and Doksum (1988) and more recently by Kosorok et al. (2004) and Kim et al. (2012).
In this article, we propose a joint partially linear model for longitudinal data with the issue of informative drop-out, where the informative drop-out is allowed to be dependent on covariates. By applying general transformation models for informative drop-out, we do not restrict to the proportional hazards cases. The underlying trajectory is estimated using a B-spline approach. A key advantage of a B-B-spline approach is the computational simplicity of using a small number of knots and a parametric regression-type implementation via the sieve approximation, which counterbalances the model complexity in the joint modeling
approach. The knot selection procedures based on AIC and BIC are described in Section 3 and numerically evaluated in Sections 4–5. Through our flexible, but readily interpretable, modeling approach, we can detect significant changes in the trajectory, which may not be found using linear mixed effects models due to their parametric model constraints, while correcting biases caused by the informative drop-outs.
2. Joint Partially Linear Model (JPLM)
Let Y(t) be the longitudinal response at time t, and let T be the informative drop-out time. We define = {X(t); t ≥ 0} and = {Z(t); t ≥ 0} as the covariate processes of fixed and random effects, respectively. The vectors of external covariates X(t) and Z(t) are possibly time-varying, and their subsets are denoted as Xk(t) and Zk(t) (k = 1, 2) in models (1) and
(2). We consider a partially linear model for Y(t)
(1)
and a transformed Cox model for T with the cumulative hazard function
(2)
where α(t) is the unspecified underlying trajectory, β and γ are the vectors of unknown regression coefficients, Λ(·) is an unspecified non-decreasing function, and ε(t) is a white
noise process with variance . To account for the correlation between Y(t) and T, we introduce a common latent variable b, following a (multivariate) normal distribution with mean zero and covariance matrix Σb. We assume Y(t) and T are independent, conditional on
, , and b. In model (2), ϕ is a set of unknown constants with the same number of
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
elements as b, and ϕ∘ b denotes the component-wise product of ϕ and b. For instance, if one would expect that the informative drop-out is related to the current level of a biomarker, as in our application, then (ϕ∘ b)TZ2(t) can take the form ϕ̃(b1 + b2t) by taking Z2(t) = (1, t)T, bT
= (b1, b2) and ϕT = (ϕ̃, ϕ̃). We note that each patient’s longitudinal responses and informative
drop-out rate are linked through the unobservable latent factors b as well as the observed common covariates, for example, if X1(t) = X2(t). The amount of variation in the informative
drop-out process due to the latent factors is characterized by ϕ.
In model (2), the transformation function H(·) is assumed to be continuously differentiable and strictly increasing, and is required to be specified in the analysis. For example, H(x) can take the form of the logarithmic transformation,
The choices of η = 0 and η = 1 lead to the proportional hazards model and the proportional odds model, respectively.
Let C be the non-informative drop-out time (e.g., administrative end date) assumed to be independent of {Y(·), T, b} given and , and let V = min(T, C) denote the observed drop-out time. The observed data for the ith subject with ni repeated measurements are denoted by
Oi= {Yi(tij), Vi, Δi, X(u), Z(u); tij ≤ Vi, u ≤ Vi, i = 1, …, n, j = 1, …, ni}, where Δi = I(Ti ≤
Ci) with I(·) being the indicator function. The log-likelihood function for the observed data is
given by
(3)
3. Inference Procedure
We propose a maximum likelihood estimation procedure to estimate the finite dimensional
parameters , the underlying trajectory function α(t), and the baseline cumulative hazard function Λ(t), where Vec(Σb) denotes the vector consisting of the
upper triangular elements of Σb. Specifically, in Section 3.1 we use sieve maximum
likelihood estimation for α(t) (Geman and Hwang, 1982; Shen and Wong, 1994) in which α(t) is approximated by a combination of known basis functions (e.g., cubic B-splines) and unknown sieve coefficients, while in Section 3.2 we use nonparametric maximum likelihood estimation (NPMLE) for Λ (t) by allowing Λ (t) to be any increasing right-continuous function. The transformation H(x) is fixed.
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
3.1. Sieve Approximation for α(t)
Suppose that subjects are followed up to a fixed time τ. We approximate α(t) in (1) through a finite number of basis functions in a sieve space of t in = [0, τ] as follows:
where { } is a basis function of t with the highest degree less than m, ζk is the
regression coefficient with a fixed knot sequence, and Kn is the number of interior points in
the sieve space. Rigorously speaking, the sieve space for α(t) is defined as
on a finite partition of
where the constants Kn and Mn depend on the data and the sample size (n). The
boundedness condition guarantees the sieve space defined by
Sn(m,Kn,Mn) is a bounded set in a finite dimensional space. Unlike parametric regression,
both the number of knots and the coefficient of the basis function at each knot need to be estimated from the data. In practice, m is usually chosen to be at least 2, which corresponds to a linear function. In particular, we use cubic B-spline functions (m = 4),
and for k = 1, …, (m + Kn). By the properties of B-splines, for a
given t value, only at most m basis functions among { } are nonzero, therefore, α(t) is
approximated by a linear combination of { } on m nearest knot points at any point t. Therefore, conditional on { }, and hence Kn and {sk}, we can use the methodology
that has been developed for the parametric longitudinal data analysis in this nonparametric context. It consequently reduces the computational burden of using nonparametric estimation in both longitudinal and survival components.
For the knot locations {sk}, equally spaced knots are commonly used. For longitudinal
studies with the issue of informative drop-outs, however, we suggest choosing {sk} based on
the observed data. It can prevent numerical problems in {ζk} caused by sparsity in the later
study period. To determine the number of interior knots with the best fit to the data, we use
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
the AIC- or BIC-based selection procedures. The performance of AIC as a knot selection criterion has previously been investigated by Shibata (1981) and Ding and Wang (2008) among others, but that of BIC has not been studied in the joint modeling context. The proposed knot selection procedure is as follows. For a given Kn, locations of interior knots
are determined as every distinct q100/(Kn+1), the 100/(Kn + 1)th percentile of the observed
longitudinal measurement times. For example, when Kn = 3 and Kn = 9 are considered, {q25,
q50, q75} and {q10, q20, …, q90} are used as the locations of interior knots, respectively. We
next repeatedly fit the joint model using the candidate interior points Kn∈ {2, 3, …}, and
calculate AIC (or BIC). Then, the model with the smallest AIC (or BIC) is considered the best fitting one. In our simulation study, we observed that AIC and BIC selected the same model approximately 42% of the time, and the number of knots selected by AIC was always greater than or equal to that selected by BIC. Moreover, our numerical evaluation was consistent with the results obtained in Huang et al. (2002); the same estimate α̂ can be achieved through different sets of basis functions and their corresponding {ζ̂k}. Further
detailed comparisons of the two selection procedures are provided in Sections 4 and 5.
3.2. Nonparametric Maximum Likelihood Estimation for Λ(t)
Using the NPMLE approach, we treat Λ as a nondecreasing step function with jumps only at the observed failure times and replace λ(t) with the jump size of Λ at t, denoted by Λ{t}, in the likelihood function (3). For commonly used transformation functions such as log-transformation, exp{−H(x)} can be expressed as the Laplace transformation of some
function δ(ξ) for ξ ≥ 0, such that . For example, if we choose a gamma frailty ξ with mean one and variance η, then it holds that H(x) = log(1 + ηx)/η. Applying the Laplace transformation, the observed log-likelihood function (3) can be rewritten as
(4)
where ζ = (ζ1, …, ζKn)T, , and ξ is assumed to be
independent of b.
The most attractive feature about writing the transformation in this way is that the modified log-likelihood (4) can be seen as the proportional hazards frailty model (Kosorok et al., 2004) with the conditional hazard function
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
This makes the algorithm more stable and computationally efficient. Now, the MLEs can be obtained by maximizing the modified log-likelihood function over Sn(m,Kn,Mn), θ and all
jump sizes of Λ at the observed failure times. Since this maximization involves unobservable variables ξ and b, it can be carried out through the following EM algorithm, treating ξ and b as missing data.
3.3. EM Algorithm
We describe the EM algorithm (Dempster et al., 1977) to compute the MLEs of (θ, ζ, Λ{·}). In the E-step, we calculate conditional expectations of certain functions of (ξ, b) given the observed data Oi, say Ê[ξ gi(b) | Oi]. Hereafter, we omit to write that the expectations are
conditional on the observed data and the current parameter estimates, and abbreviate such expectation Ê[ξ gi(b) | Oi] as Ê[ξ gi(b)]. Computation of this expectation can be simplified
by first obtaining the nested conditional expectation of ξ, given b and Oi. That is, Ê[ξ gi(b)]
can be calculated as Êb[Êξ [ξ | b] gi(b)]. Since the conditional distribution of ξ given b is
proportional to
the conditional expectation of ξ given b has the form of
where . Once Êξ [ξ | b] is calculated, which is a function of b, the conditional expectation Ê[ξ gi(b)] can be computed using numerical
approximation methods such as Gaussian quadrature with Hermite orthogonal polynomials. Note that the conditional distribution of b given Oi is proportional to Γ(Oi| b)f (b;Σb), where
We thus calculate the conditional expectation by
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
In the M-step, we maximize the expectation of the complete-data log-likelihood function:
Maximizing the above objective function over (ζ, β, , Σb) is as simple as in a classic linear
regression; whereas the rest of the parameters (γ, ϕ, Λ{.}) do not have a closed-form of the maximizers. Using a reliable numerical approach, we solve the following equation for γ:
(5)
and the following equation for ϕ:
(6)
where Rj(t) = I(Vj ≥ t) and q2j(t) = exp{γTX2j(t) + (ϕ∘ b)TZ2j(t)}. In addition, Λ is estimated
as a step function with the following jump size at Vi:
(7)
At each M-step, we update γ and ϕ by solving the equations (5) and (6) through a one-step Newton–Raphson algorithm, and update the jump sizes of Λ by equation (7).
To obtain the MLEs, we iterate the E-step and M-step until the parameter estimates converge. The variances of the MLEs can be estimated from the inverse of the observed information matrix of all parameters (θ, ζ, Λ{·}). The observed information matrix can be computed from the complete data log-likelihood function denoted by for the ith subject using the following Louis formula (Louis, 1982) of
(8)
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
where u⊗2 = uuT, ∇ and ∇2 are the first and second derivatives with respect to parameters, and Ê is the conditional expectation of a function of b given the observed data.
3.4. Asymptotic Properties
One of the attractive features of the proposed estimator is that its large sample properties can be shown by using techniques from empirical process theory. Let (θ̂, α̂, Λ̂) denote the estimator maximizing (4), and let (θ0, α0, Λ0) denote the true parameter values. Zeng and
Cai (2005) showed the strong consistency and asymptotic normality of θ̂ and Λ̂(·) when α0(t) is constant over time. We relax their full parametric assumptions for Y(t) by adding a
nonparametric functional component into their linear mixed effects models. The key difference in the proof is to find an upper bound of supt∈[0,τ] |α̂(t) − α0(t)|. The upper bound
is constructed based on a linear span of α0(t) into the sieve space Sn(m,Kn,Mn) that consists
of functions with uniformly bounded rth (r ≥ 2) derivatives. Under the mild regularity conditions (A1)–(A11) stated in the Web Appendix A, we can show that if the smoothing parameters satisfy Mn = O(log log n) and Kn = O(nr0) with 1/(4r) < r0 < 1/3, then the MLEs
are uniformly consistent in the sense that ||θ̂ − θ0|| = op(1), supt∈[0,τ] |Λ̂(t) − Λ0(t)| = op(1)
and supt∈[0,τ] |α̂(t) − α0(t)| = op(1), where . From the consistency and the choice
of Mn = O(log log n), we can further show that
. By choosing Kn = O(n1/(4r−1)), for example, we can obtain the -convergence rate of the MLEs. Finally,
it is proved that n1/2(θ̂ − θ0) is asymptotically normal with mean zero and θ̂ is a
semiparametric efficient estimator for θ0. The asymptotic covariance of θ̂ can be estimated
by (8), and its finite sample properties are shown to be good in Section 4. A detailed proof of the asymptotic properties is given in Web Appendix A.
4. Simulation Studies
To assess the performance of the proposed method, we conduct extensive simulation studies under settings mimicking the prostate cancer data. Specifically, using the analysis results in Figure 3 and Table 3 as a basis, we assume that true α(t) = (t + 0.5)−1.5 + (t/2 + 1)3 − 4 and ϕ = 1.7, choose n = 500, and target a similar early drop-out rate as in the real data where 50% and 75% of informative drop-outs occurred within 3 and 5 years, respectively, over the maximum follow-up time of 14 years. For simplicity we include one observed covariate x, baseline PSA level, following a normal distribution with true β = γ = 0.5. Then, the longitudinal outcomes are generated from Y(t) = α(t) + βx + b1 + b2t + ε(t), where
with and (b1, b2) are from a normal distribution with zero means,
, and . The correlation between b1 and b2 is ρ = −0.1. Informative drop-out times are generated from a transformation model taking the form Λ(t | x, b1) = H(exp(γx + ϕb1)Λ(t)).
Various missing mechanisms are explored by varying H(·) and ϕ. We consider two popular models for survival data: the proportional hazards model H(x) = x (when η = 0) and the proportional odds model H(x) = log(1 + x) (when η = 1). Assuming true ϕ = 0 represents missing at random (MAR). True ϕ = 0.5 and ϕ = 1.7 represent missing not at random
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
(NMAR) with the same and larger effect size than the coefficient of the observed covariate (γ = 0.5). These settings produce about six measurements of Y, on average.
We also investigate the impact of knot selection procedures on α(t) estimation when AIC or BIC is used. We repeatedly fit the joint model using interior points Kn∈ {2, 3, …, 20},
where the locations of knots are determined by every distinct 100/(Kn + 1)th percentile of
the observed measurement times. The best numbers of interior knots by AIC (or BIC), denoted by kAIC (or kBIC), may be different in each data set. In our simulation settings, we
observe that kAIC equals kBIC for 42% of the data sets, and kAIC is always greater than or
equal to kBIC. For comparison purposes, in Table 1 we also report estimation results when
the interior knots are always fixed with Kn = 2, Kn = 4, or Kn = 8.
Simulation results based on 1000 replications are presented in Tables 1 and 2 for various missing mechanisms by different ϕ and H(·), where Bias is the average of the differences between the true parameter and the estimates, SD is the sample standard deviation of the parameter estimator, SEE is the average of the standard error estimates, and CP is the coverage probability of 95% confidence intervals. The confidence intervals of variance components are constructed based on the Satterthwaite approximation.
Table 1 and Web Tables 1–2 report the performance of α(t) estimation in the joint modeling approach, compared with that using the marginal modeling approach (i.e., ignoring the informative drop-out), in terms of Bias, SD, the mean square error (MSE), and the ratio of the MSE for joint estimates to the marginal estimates (MSER). Under MAR (i.e., ϕ = 0), both approaches are similar in Bias and SD. However, under NMAR (i.e., ϕ = 0.5 or 1.7), the marginal approach resulted in a larger Bias but with a similar magnitude of SD, which indicates that the joint modeling approach leads to a more accurate and efficient estimator.
In Table 1, when the number of interior knots (Kn) increases, the SD consistently increases,
whereas the Bias decreases or increases in earlier or later observations times, respectively. The AIC-based knot selection procedure tends to result in smaller biases than the BIC-based knot selection procedure but at the cost of larger variation. It is worth noting that, when AIC- or BIC-selected knots are used, part of SD is attributed to the uncertainty of model selection, whereas such variability does not exist with the fixed knots. In Table 1, we observe that SDs with the AIC- and BIC-selected knots are similar to SDs with the fixed knots, indicating that the major source of variations in estimates was attributed to the magnitude of Kn itself,
rather than the variability in Kn chosen differently by AIC or BIC. This might be the reason
why the coverage probabilities could remain close to the nominal level 95%, even if not accounting for the uncertainty of model selection. Figure 1 shows the estimates of α(t) for all t, using different knot selection procedures and transformations. The average, minimum, and maximum of estimates over 1000 simulated data sets indicate that the proposed estimator α̂(t) behaves well for both transformation models when BIC was used. However, the AIC-estimates appeared to have larger variations in both tails. Therefore, we suggest using the BIC-based knot selection procedure, based on negligibly small biases and consistently smaller SDs in all scenarios we studied.
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
Table 2 shows good performance of the remaining parameters for both H(x) = x and H(x) = log(1 + x) when BIC-based knot selection procedure was used. That is, the MLEs are unbiased, the standard error estimates calculated using the Louis formula reflect well the true variations in the proposed estimators, and the coverage probabilities are in a reasonable range. When AIC was used as the knot selection criterion, simulation results are very similar to those in Table 2 and hence omitted here (see Web Table 3).
5. Data Application
We illustrate the application of the proposed method using prostate cancer data from the University of Michigan. The data were collected from a total of 503 patients with average age of 69 (range of 34–86) who received planned radiation therapy as the primary treatment method. The objective of this analysis was to identify the trajectory of post-radiation PSA change, while correctly accounting for the informative drop-out either by the start of salvage hormone therapy or by tumor recurrence. A detailed description of the data and possible clinical impact achieved from the study are discussed in Proust-Lima et al. (2008) and Taylor et al. (2013).
In this cohort, 118 patients (23.5%) dropped out, and the PSA level was measured nine times on average within a median follow-up time of 4.5 years. The model to characterize the PSA changes in the presence of informative drop-out was
where Y(t) is the observed values of log(PSA(t) + 0.1) at time t, Λ(t | X, b) is the cumulative hazard function of time to informative drop-out from the end of radiation therapy, and the prognostic factors (denoted as X) are log(baseline PSA+0.1), T-stage, Gleason score, and age at diagnosis. The log-transformation of PSA values and the choice of which covariates to include were based on the findings in Proust-Lima et al. (2008). A subject-specific random intercept and time slope (i.e., b1 + b2t) were included in both longitudinal and survival components to account for the dependence of informative drop-out on the PSA trajectory. The inclusion of b1 + b2t in the survival component implied that the time to informative drop-out was linked to the current PSA level.
To adjust for the informative drop-out mechanism better, we assumed different
transformation models H(x) = log(1 + ηx)/η by varying η values in the range [0, 1]. This was because, when η = 0 or η = 1 can be assumed, the chosen transformation provides useful interpretations for the drop-out process. That is, η = 0 implies that the unit change in a covariate has a linear impact on the log-hazard of dropping-out. The choice of η = 1 implies the data fits better to a model with a linear increment in the log-odds of dropping-out per unit change in a covariate.
To select the best transformation model η and interior knots Kn, model selection approaches
such as AIC and BIC are considered. Under each set of (η, Kn) in η∈ {0, 0.1, …, 1} and Kn
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
∈ {2, 4, …, 12}, we computed the MLEs for the regression coefficients using the proposed method, and the resulting AIC and BIC values were compared. The search range of Kn was
chosen to cover the upper and lower bounds of Kn satisfying Conditions (A9)–(A11) in Web
Section A. In Figure 2 the smallest BIC value corresponding to (η,Kn) = (1, 5) indicated that
use of five interior knots and the proportional odds model produced the best fit to the data. As a sensitivity analysis, we expanded the range of the transformation parameter to η > 1, leading to the smallest BIC value at η = 1.6 and Kn = 6 (Web Figure 2). Since the decrement
in BIC value was very small, for a better interpretation in practice, we still reported the results from the proportional odds model.
Table 3 summarizes the analysis results under the selected best model. The positive ϕ̂ = 1.730 (P<0.001) indicates that a higher current PSA level was significantly associated with a higher rate of informative drop-out. The rate of informative drop-out statistically
significantly increased with a higher baseline PSA level, T-stage and Gleason score, and for older patients. On the other hand, there was no significant difference in post-radiation PSA level by T-stage and Gleason score, although it was significantly affected by baseline PSA level and patient’s age. We noticed that overall results for time-constant covariates were similar between joint and marginal (i.e., ignoring the informative drop-outs) analyses, whereas the results for the temporal trend in PSA level were different. In Figure 3, circles and dots present the full history of all post-radiation PSA values for patients whose follow-up was informatively and non-informatively dropped out, respectively. The mark “×” on some circles indicates the last observation of PSA before the informative drop-out occurred. Based on the descriptive summary indicated by the circles, dots, and ×’s, it was observed that patients who dropped out informatively had higher PSA scores and shorter follow-up time in general. When we compare the curve for the “Joint Estimate” to that for the “Marginal Estimate” in Figure 3, we can see that the joint modeling approach led to the estimated underlying PSA trajectory curve being lower than the marginal estimate. This might be because the joint modeling is a method to account for the fact that some of the PSA values were observed under some degree of disease progression, which could cause
informative drop-out due to the tumor recurrence or sufficient concern that the patient considered hormone therapy. Moreover, the reduction in the estimate by the joint modeling approach was larger at longer times.
Sensitivity to the inference based on BIC selection was examined by comparing the results by AIC selection. Web Figure 3 shows that the AIC-based procedure selected the model with (η,Kn) = (1, 7), however, the resulting estimates were very similar to those based on the
BIC-based selection procedure and hence omitted here (see Web Figure 4 and Web Table 4).
6. Discussion
We have discussed a method of fitting a partially linear model for longitudinal data with informative drop-out, which can handle a covariate-dependent drop-out mechanism through the transformed survival model. For estimation of the model parameters, we have maximized the likelihood, and the resulting MLEs have been theoretically justified. The proposed joint modeling approach has clearly shown the capability to correct biases induced by ignoring
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
By exploring a broad class of models for the missing data through varying H(·) and ϕ, the proposed method reduces errors due to the misspecified missing data mechanism to some extent. The proposed methodology, however, does require extensive modeling assumptions, including specification of the drop-out mechanism, the mean and covariance structures, such as linearity in the covariate effects, and a normality assumption of latent variables.
Therefore, sensitivity analysis to these assumptions as well as future efforts in research and development of model checking tools are needed to promote practical application of the proposed joint partially linear model (JPLM). In the JPLM, the distribution of the outcomes after drop-out is nonidentifiable, and thereby we assume that it remains the same as before the drop-out. One simple approach to exploring sensitivity to this untestable assumption is to 1) introduce a sensitivity parameter as the difference between the mean of observed and unobserved responses, and then 2) examine how sensitive the results are over a clinically plausible range of the sensitivity parameter (for more details, see Web Section D). For the assessment of JPLM’s fit to observed data, a graphical inspection tool has been illustrated in Web Section C. With regard to more rigorous model diagnostic procedures, the main difficulty is that some model assumptions are made about unobserved variables, and hence the standard model diagnostics based on the observed data alone are not sufficient. One possible research direction is to adopt the multiple-imputation-based diagnostic method by Rizopoulos et al. (2010). The key idea of Rizopoulos et al. (2010) is to create multiple sets of complete data by resampling missing longitudinal outcomes from the posterior
distribution given the observed data, and apply standard model diagnostics for mixed effects models and survival models with complete data. Since the general framework needed for the multiple imputation procedures in Rizopoulos et al. (2010) has already been established in equation (5) and Sections 3.3 and 3.4, the extension to survival transformation models appears promising. The assumption on the linear effects of covariate can be relaxed by extending to time-varying coefficients models. When a fixed numbers of knots are used or the number of knots can be assumed to the same for all time-varying coefficients, we do not expect any additional technical challenge in the extension. However, further efforts to reduce the computational burden are needed for selecting different numbers of knots for each covariate.
There are a few other ways in which we can extend our proposal. In this article, we assume that the observation times of the biomarkers are independent of the level of biomarkers. However, in some cases where biomarkers are observed at hospitalizations or whenever clinicians may suspect some progress of the diseases for example, our JPLM can be extended to accommodate the informative observation process by jointly modeling the additional component. The AIC and BIC were considered to determine both the best transformation and the selection of the number of knots, but we can also explore and compare the validity of other resampling-based model selection criteria such as cross-validation in the future. Lastly, there is no theoretical justification for any of these model selection procedures, and the proposed inference does not account for the uncertainty of model selection. Post-model selection inference methods under the joint modeling setting are worth pursuing.
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
Supplementary Material
Refer to Web version on PubMed Central for supplementary material.
References
Beunckens C, Molenberghs G, Verbeke G, Mallinckrodt C. A latent-class mixture model for incomplete longitudinal Gaussian data. Biometrics. 2008; 64:96–105. [PubMed: 17608789] Cai N, Lu W, Zhang HH. Time-varying latent effect model for longitudinal data with informative
observation times. Biometrics. 2012; 68:1093–1102. [PubMed: 23025338]
Dabrowska DM, Doksum KA. Partial likelihood in transformation models with censored data. Scandinavian Journal of Statistics. 1988; 15:1–23.
Ding J, Wang JL. Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics. 2008; 64:546–556. [PubMed: 17888040]
Geman S, Hwang CR. Nonparametric maximum likelihood estimation by the method of sieves. Annals of Statistics. 1982; 10:401–414.
Hogan JW, Laird NM. Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine. 1997; 16:259–272. [PubMed: 9004396]
Huang JZ, Wu CO, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002; 89:111–128.
Kim S, Zeng D, Chambless L, Li Y. Joint models of longitudinal data and recurrent events with informative terminal event. Statistics in Biosciences. 2012; 4:262–281. [PubMed: 23227131] Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazards frailty regression
models. Annals of Statistics. 2004; 32:1448–1491.
Lin D, Ying Z. Semiparametric and nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association. 2001; 96:103–126.
Lin X, Carroll R. Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association. 2001; 96:1045–1056.
Lu T, Huang Y. Bayesian inference on mixed-effects varying-coefficient joint models with skew-t distribution for longitudinal data with multiple features. Statistical Methods in Medical Research. 2015; page Advance online publication. doi: 10.1177/0962280215569294
Muthén B, Asparouhov T, Hunter AM, Leuchter AF. Growth modeling with nonignorable dropout: Alternative analyses of the star*d antidepressant trial. Psychological Methods. 2011; 16:17–33. [PubMed: 21381817]
Proust-Lima C, Taylor JMG, Williams SG, Ankerst DP, Liu N, Kestin LL, et al. Determinants of change in prostate-specific antigen over time and its association with recurrence after external beam radiation therapy for prostate cancer in five large cohorts. International Journal of Radiation Oncology* Biology* Physics. 2008; 72:782–791.
Rizopoulos D, Verbeke G, Molenberghs G. Multiple-imputation-based residuals and diagnostic plots for joint models of longitudinal and survival outcomes. Biometrics. 2010; 66:20–29. [PubMed: 19459832]
Roy J. Modeling longitudinal data with nonignorable dropouts using a latent dropout class model. Biometrics. 2003; 59:829–836. [PubMed: 14969461]
Roy J, Lin X. Missing covariates in longitudinal data with informative dropouts: Bias analysis and inference. Biometrics. 2005; 61:837–846. [PubMed: 16135036]
Shen X, Wong WH. Convergence rate of sieve estimates. Annals of Statistics. 1994; 22:580–615. Shibata R. An optimal selection of regression variables. Biometrika. 1981; 68:45–54.
Taylor JMG, Park Y, Ankerst DP, Proust-Lima C, Williams S, Kestin L, et al. Real-time individual predictions of prostate cancer recurrence using joint models. Biometrics. 2013; 69:206–213. [PubMed: 23379600]
Zagars GK, Pollack A, Kavadi VS, von Eschenbach AC. Prostate-specific antigen and radiation
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
Zeger S, Diggle P. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics. 1994; 50:689–699. [PubMed: 7981395]
Zeng D, Cai J. Asymptotic results for maximum likelihood estimators in joint analysis of repeated measurements and survival time. Annals of Statistics. 2005; 33:2132–2163.
Zhang D, Lin X, Raz J, Sowers M. Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association. 1998; 93:710–719.
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
Figure 1.
Simulation results for α(t) estimation when ϕ = 1.7. The solid curve indicates true α(t) = (t + 0.5)−1.5 + (t/2 + 1)3 − 4. The dash-dotted (dotted) curves indicate the maximum, average, and minimum of the estimated α(t) over 1000 simulated data sets when the AIC-based (BIC-based) knot selection procedure was used. The AIC-estimates appeared to have larger variations in both tails than the BIC-estimates. This figure appears in color in the electronic version of this article.
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
Figure 2.
Bayesian information criterion (BIC) plotted for different transformations H(x) = log(1 + ηx)/η and different numbers of interior knots (Kn).
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
Figure 3.
Coefficient function of log PSA score, adjusted by T-stage, Gleason score, and age, under the best fit of transformation H(x) = log(1 + x) and 5 interior points. The solid (dashed) curve is an estimate from the joint (marginal) model. The circles and dots present the full history of all post-radiation PSA values for patients who dropped out informatively and non-informatively, respectively. The mark “×” on some circles indicates the last observation of PSA before the informative drop-out occurred.
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
T ab le 1Simulation results for true
ϕ
= 1.7 (i.e., missing not at random) with
Kn ∈ { kAIC , kBIC
, 2, 4, 8} interior knots of B-spline approximation. T
rue v
alues are
α
(
τ20
) = −1.10,
α
(
τ40
) = −0.58, and
α
(
τ80
) = 2.16, where
τp
represents p% of study duration
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
T ab le 2Simulation results by v
arying
ϕ
and
H
(·) when the interior knots of B-spline approximation are selected by BIC (i.e.,
Kn
=
kBIC
).
τp
represents p% of
study duration τ . P arameter T ar get H (x ) = x H (x
) = log(1 +
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
P arameter T ar get H (x ) = x H (x) = log(1 +
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
T ab le 3Joint analysis results of the prostate cancer data under the best f
it of transformation
H
(x
) = log(1 +
x
) and f
iv
e interior knots selected by BIC. The 50:50
mixture of
χ
2 distrib
utions is used for testing v
ariances. Reference groups for cate
gorical co
v
ariates are T
-stage=1, Gleason score between 2 and 6, and
age<65.
τp
represents p% of study duration
τ . Effect J oint model Mar ginal model Est SE P-v alue Est SE P-v alue
Longitudinal PSA score
α ( τ20 ) 0.340 0.087 .0001 0.553 0.092 <.0001 α ( τ40 ) 0.963 0.135 <.0001 1.368 0.141 <.0001 α ( τ60 ) 1.792 0.195 <.0001 2.388 0.201 <.0001 α ( τ80 ) 2.637 0.262 <.0001 3.424 0.269 <.0001 log(baseline PSA+0.1) 0.510 0.034 <.0001 0.509 0.034 <.0001 T -stage 2 −0.058 0.064 .3617 −0.045 0.064 .4828 T
-stage 3 or 4
0.018 0.117 .8770 0.027 0.117 .8171
Gleason score 7 to 9
0.055 0.060 .3592 0.040 0.060 .4942
Age 65–75 years
−0.190 0.069 .0056 −0.201 0.069 .0037
Age > 75 years
−0.115 0.088 .1882 −0.130 0.088 .1408 0.117 0.003 <.0001 0.118 0.003 <.0001 Random ef fects 0.373 0.028 <.0001 0.371 0.027 <.0001 0.206 0.018 <.0001 0.195 0.017 <.0001 ρ −0.121 0.055 .0281 −0.114 0.054 .0359 Informati v e drop-out log(baseline PSA+0.1) 0.532 0.161 .0010 T -stage 2 1.193 0.320 .0002 T
-stage 3 or 4
1.815
0.467
.0001
Gleason score 7–9
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
uscr
ipt
A
uthor Man
Effect
J
oint model
Mar
ginal model
Est
SE
P-v
alue
Est
SE
P-v
alue
Age > 75 years
−1.543
0.471
.0011
ϕ
1.730
0.134