PubMedCentral-PMC5525063.pdf

(1)

Joint Partially Linear Model for Longitudinal Data with

Informative Drop-Outs

Sehee Kim1,*, Donglin Zeng2, and Jeremy M. G. Taylor1

1_{Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, U.S.A} 2_{Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina 27599,}

U.S.A

Summary

In biomedical research, a steep rise or decline in longitudinal biomarkers may indicate latent disease progression, which may subsequently cause patients to drop out of the study. Ignoring the informative drop-out can cause bias in estimation of the longitudinal model. In such cases, a full parametric specification may be insufficient to capture the complicated pattern of the longitudinal biomarkers. For these types of longitudinal data with the issue of informative drop-outs, we develop a joint partially linear model, with an aim to find the trajectory of the longitudinal biomarker. Specifically, an arbitrary function of time along with linear fixed and random covariate effects is proposed in the model for the biomarker, while a flexible semiparametric transformation model is used to describe the drop-out mechanism. Advantages of this semiparametric joint modeling approach are the following: 1) it provides an easier interpretation, compared to standard nonparametric regression models, and 2) it is a natural way to control for common (observable and unobservable) prognostic factors that may affect both the longitudinal trajectory and the drop-out process. We describe a sieve maximum likelihood estimation procedure using the EM algorithm, where the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are considered to select the number of knots. We show that the proposed estimators achieve desirable asymptotic properties through empirical process theory. The proposed methods are evaluated by simulation studies and applied to prostate cancer data.

Keywords

Joint models; Longitudinal data; Nonparametric maximum likelihood; Partially linear model; Random effects; Sieve maximum likelihood; Transformation models

1. Introduction

In prostate cancer studies, Prostate-specific Antigen (PSA) has been widely used to make clinical decisions. Higher or rising PSA patterns after treatment are related to an increased

*_{[email protected].}

7. Supplementary Materials

HHS Public Access

Author manuscript

Biometrics

. Author manuscript; available in PMC 2017 July 25.

Published in final edited form as:

Biometrics. 2017 March ; 73(1): 72–82. doi:10.1111/biom.12566.

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(2)

risk of prostate cancer recurrence. One important scientific goal is, therefore, to identify the PSA trajectory among patients who are treated with radiation therapy for localized prostate cancer (Proust-Lima et al., 2008). In cases where the treatment has been successful, PSA levels are expected to drop over the first post-radiation year and then remain stable at low levels (Zagars et al., 1995). In the other cases, however, PSA values after treatment declined for a short period of time and then might rise again at later times, for example, as illustrated in Web Figure 1. Hence, it is desirable to develop a flexible model to capture this nonlinear temporal trend of PSA levels. One key challenge here is that the follow-up of PSA stopped when salvage hormone therapy was initiated, which is known to change the PSA level or when prostate cancer recurred, resulting in possibly informative drop-out which can lead to bias for PSA trajectory estimation if not accounted for properly.

Motivated by the prostate cancer data, we propose a method that flexibly models longitudinal trajectories of biomarkers using a partially linear model, while taking informative drop-outs into account. We treat the informative drop-out as an event which makes subsequent PSA measurements missing, and these unobserved measures are related to the drop-out process via subject-specific random effects. Advantages of this semiparametric joint modeling approach are 1) it provides an easier interpretation of the covariate effects, compared to standard nonparametric regression models, and 2) it is a natural way to control for common (observable and unobservable) prognostic factors that may affect both the longitudinal trajectory and the informative drop-out.

Without considering complications due to the informative drop-out, several authors have studied partially linear models for longitudinal/clustered data (Zeger and Diggle, 1994; Zhang et al., 1998; Lin and Carroll, 2001; Lin and Ying, 2001). Other related work about joint models with time-varying coefficients, but not considering informative drop-outs, includes Cai et al. (2012) and Lu and Huang (2015). In longitudinal studies, the problem of informative drop-out has received enormous attention. Hogan and Laird (1997) provided an excellent review of model-based approaches to handling incomplete longitudinal

measurements, where most of the existing methods are based on the full parametric

specification of the longitudinal model and dependence structure of the drop-out mechanism. We refer the readers to Hogan and Laird (1997) for a more detailed explanation and

discussion of the limitations of these parametric models. The conventional pattern-mixture and selection models reviewed by them were extended by Roy (2003) and Beunckens et al. (2008), who allowed multiple latent subgroups of subjects (i.e., joint latent class models). The assumption on latent subgroups has been relaxed by Muthén et al. (2011) so that a subject’s subgroup can differ for drop-out and outcomes in their pattern-mixture and selection models. However, all these methods assume some parametric or even linear trends for the longitudinal trajectories, therefore not appropriate for our application.

Roy and Lin (2005) considered longitudinal data with missing time-varying covariates in addition to informative drop-outs. They used a generalized linear mixed model for

continuous or binary longitudinal outcomes, and used a transition model to estimate missing time-varying covariates. For informative drop-out, they used a conventional selection model

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(3)

random variable, the proportional hazards model is commonly assumed to characterize the drop-out process, however, it may not be appropriate in some cases. For example, patients receiving more aggressive treatment may suffer elevated risk in the beginning but may benefit in the long term if they tolerate the treatment. Considering a general class of transformations for such cases will lead to a better model fit. The transformation models for a single survival time and recurrent event times have been extensively studied, dating back to Dabrowska and Doksum (1988) and more recently by Kosorok et al. (2004) and Kim et al. (2012).

In this article, we propose a joint partially linear model for longitudinal data with the issue of informative drop-out, where the informative drop-out is allowed to be dependent on covariates. By applying general transformation models for informative drop-out, we do not restrict to the proportional hazards cases. The underlying trajectory is estimated using a B-spline approach. A key advantage of a B-B-spline approach is the computational simplicity of using a small number of knots and a parametric regression-type implementation via the sieve approximation, which counterbalances the model complexity in the joint modeling

approach. The knot selection procedures based on AIC and BIC are described in Section 3 and numerically evaluated in Sections 4–5. Through our flexible, but readily interpretable, modeling approach, we can detect significant changes in the trajectory, which may not be found using linear mixed effects models due to their parametric model constraints, while correcting biases caused by the informative drop-outs.

2. Joint Partially Linear Model (JPLM)

Let Y(t) be the longitudinal response at time t, and let T be the informative drop-out time. We define = {X(t); t ≥ 0} and = {Z(t); t ≥ 0} as the covariate processes of fixed and random effects, respectively. The vectors of external covariates X(t) and Z(t) are possibly time-varying, and their subsets are denoted as Xk(t) and Zk(t) (k = 1, 2) in models (1) and

(2). We consider a partially linear model for Y(t)

(1)

and a transformed Cox model for T with the cumulative hazard function

(2)

where α(t) is the unspecified underlying trajectory, β and γ are the vectors of unknown regression coefficients, Λ(·) is an unspecified non-decreasing function, and ε(t) is a white

noise process with variance . To account for the correlation between Y(t) and T, we introduce a common latent variable b, following a (multivariate) normal distribution with mean zero and covariance matrix Σb. We assume Y(t) and T are independent, conditional on

, , and b. In model (2), ϕ is a set of unknown constants with the same number of

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(4)

elements as b, and ϕ∘ b denotes the component-wise product of ϕ and b. For instance, if one would expect that the informative drop-out is related to the current level of a biomarker, as in our application, then (ϕ∘ b)TZ2(t) can take the form ϕ̃(b1 + b2t) by taking Z2(t) = (1, t)T, bT

= (b1, b2) and ϕT = (ϕ̃, ϕ̃). We note that each patient’s longitudinal responses and informative

drop-out rate are linked through the unobservable latent factors b as well as the observed common covariates, for example, if X1(t) = X2(t). The amount of variation in the informative

drop-out process due to the latent factors is characterized by ϕ.

In model (2), the transformation function H(·) is assumed to be continuously differentiable and strictly increasing, and is required to be specified in the analysis. For example, H(x) can take the form of the logarithmic transformation,

The choices of η = 0 and η = 1 lead to the proportional hazards model and the proportional odds model, respectively.

Let C be the non-informative drop-out time (e.g., administrative end date) assumed to be independent of {Y(·), T, b} given and , and let V = min(T, C) denote the observed drop-out time. The observed data for the ith subject with ni repeated measurements are denoted by

Oi= {Yi(tij), Vi, Δi, X(u), Z(u); tij ≤ Vi, u ≤ Vi, i = 1, …, n, j = 1, …, ni}, where Δi = I(Ti ≤

Ci) with I(·) being the indicator function. The log-likelihood function for the observed data is

given by

(3)

3. Inference Procedure

We propose a maximum likelihood estimation procedure to estimate the finite dimensional

parameters , the underlying trajectory function α(t), and the baseline cumulative hazard function Λ(t), where Vec(Σb) denotes the vector consisting of the

upper triangular elements of Σb. Specifically, in Section 3.1 we use sieve maximum

likelihood estimation for α(t) (Geman and Hwang, 1982; Shen and Wong, 1994) in which α(t) is approximated by a combination of known basis functions (e.g., cubic B-splines) and unknown sieve coefficients, while in Section 3.2 we use nonparametric maximum likelihood estimation (NPMLE) for Λ (t) by allowing Λ (t) to be any increasing right-continuous function. The transformation H(x) is fixed.

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(5)

3.1. Sieve Approximation for α(t)

Suppose that subjects are followed up to a fixed time τ. We approximate α(t) in (1) through a finite number of basis functions in a sieve space of t in = [0, τ] as follows:

where { } is a basis function of t with the highest degree less than m, ζk is the

regression coefficient with a fixed knot sequence, and Kn is the number of interior points in

the sieve space. Rigorously speaking, the sieve space for α(t) is defined as

on a finite partition of

where the constants Kn and Mn depend on the data and the sample size (n). The

boundedness condition guarantees the sieve space defined by

Sn(m,Kn,Mn) is a bounded set in a finite dimensional space. Unlike parametric regression,

both the number of knots and the coefficient of the basis function at each knot need to be estimated from the data. In practice, m is usually chosen to be at least 2, which corresponds to a linear function. In particular, we use cubic B-spline functions (m = 4),

and for k = 1, …, (m + Kn). By the properties of B-splines, for a

given t value, only at most m basis functions among { } are nonzero, therefore, α(t) is

approximated by a linear combination of { } on m nearest knot points at any point t. Therefore, conditional on { }, and hence Kn and {sk}, we can use the methodology

that has been developed for the parametric longitudinal data analysis in this nonparametric context. It consequently reduces the computational burden of using nonparametric estimation in both longitudinal and survival components.

For the knot locations {sk}, equally spaced knots are commonly used. For longitudinal

studies with the issue of informative drop-outs, however, we suggest choosing {sk} based on

the observed data. It can prevent numerical problems in {ζk} caused by sparsity in the later

study period. To determine the number of interior knots with the best fit to the data, we use

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(6)

the AIC- or BIC-based selection procedures. The performance of AIC as a knot selection criterion has previously been investigated by Shibata (1981) and Ding and Wang (2008) among others, but that of BIC has not been studied in the joint modeling context. The proposed knot selection procedure is as follows. For a given Kn, locations of interior knots

are determined as every distinct q100/(_Kn+1), the 100/(Kn + 1)th percentile of the observed

longitudinal measurement times. For example, when Kn = 3 and Kn = 9 are considered, {q25,

q50, q75} and {q10, q20, …, q90} are used as the locations of interior knots, respectively. We

next repeatedly fit the joint model using the candidate interior points Kn∈ {2, 3, …}, and

calculate AIC (or BIC). Then, the model with the smallest AIC (or BIC) is considered the best fitting one. In our simulation study, we observed that AIC and BIC selected the same model approximately 42% of the time, and the number of knots selected by AIC was always greater than or equal to that selected by BIC. Moreover, our numerical evaluation was consistent with the results obtained in Huang et al. (2002); the same estimate α̂ can be achieved through different sets of basis functions and their corresponding {ζ̂k}. Further

detailed comparisons of the two selection procedures are provided in Sections 4 and 5.

3.2. Nonparametric Maximum Likelihood Estimation for Λ(t)

Using the NPMLE approach, we treat Λ as a nondecreasing step function with jumps only at the observed failure times and replace λ(t) with the jump size of Λ at t, denoted by Λ{t}, in the likelihood function (3). For commonly used transformation functions such as log-transformation, exp{−H(x)} can be expressed as the Laplace transformation of some

function δ(ξ) for ξ ≥ 0, such that . For example, if we choose a gamma frailty ξ with mean one and variance η, then it holds that H(x) = log(1 + ηx)/η. Applying the Laplace transformation, the observed log-likelihood function (3) can be rewritten as

(4)

where ζ = (ζ1, …, ζ_Kn)T, , and ξ is assumed to be

independent of b.

The most attractive feature about writing the transformation in this way is that the modified log-likelihood (4) can be seen as the proportional hazards frailty model (Kosorok et al., 2004) with the conditional hazard function

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(7)

This makes the algorithm more stable and computationally efficient. Now, the MLEs can be obtained by maximizing the modified log-likelihood function over Sn(m,Kn,Mn), θ and all

jump sizes of Λ at the observed failure times. Since this maximization involves unobservable variables ξ and b, it can be carried out through the following EM algorithm, treating ξ and b as missing data.

3.3. EM Algorithm

We describe the EM algorithm (Dempster et al., 1977) to compute the MLEs of (θ, ζ, Λ{·}). In the E-step, we calculate conditional expectations of certain functions of (ξ, b) given the observed data Oi, say Ê[ξ gi(b) | Oi]. Hereafter, we omit to write that the expectations are

conditional on the observed data and the current parameter estimates, and abbreviate such expectation Ê[ξ gi(b) | Oi] as Ê[ξ gi(b)]. Computation of this expectation can be simplified

by first obtaining the nested conditional expectation of ξ, given b and Oi. That is, Ê[ξ gi(b)]

can be calculated as Êb[Êξ [ξ | b] gi(b)]. Since the conditional distribution of ξ given b is

proportional to

the conditional expectation of ξ given b has the form of

where . Once Ê_ξ [ξ | b] is calculated, which is a function of b, the conditional expectation Ê[ξ gi(b)] can be computed using numerical

approximation methods such as Gaussian quadrature with Hermite orthogonal polynomials. Note that the conditional distribution of b given Oi is proportional to Γ(Oi| b)f (b;Σb), where

We thus calculate the conditional expectation by

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(8)

In the M-step, we maximize the expectation of the complete-data log-likelihood function:

Maximizing the above objective function over (ζ, β, , Σb) is as simple as in a classic linear

regression; whereas the rest of the parameters (γ, ϕ, Λ{.}) do not have a closed-form of the maximizers. Using a reliable numerical approach, we solve the following equation for γ:

(5)

and the following equation for ϕ:

(6)

where Rj(t) = I(Vj ≥ t) and q2j(t) = exp{γTX2j(t) + (ϕ∘ b)TZ2j(t)}. In addition, Λ is estimated

as a step function with the following jump size at Vi:

(7)

At each M-step, we update γ and ϕ by solving the equations (5) and (6) through a one-step Newton–Raphson algorithm, and update the jump sizes of Λ by equation (7).

To obtain the MLEs, we iterate the E-step and M-step until the parameter estimates converge. The variances of the MLEs can be estimated from the inverse of the observed information matrix of all parameters (θ, ζ, Λ{·}). The observed information matrix can be computed from the complete data log-likelihood function denoted by for the ith subject using the following Louis formula (Louis, 1982) of

(8)

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(9)

where u⊗2 = uuT, ∇ and ∇2 are the first and second derivatives with respect to parameters, and Ê is the conditional expectation of a function of b given the observed data.

3.4. Asymptotic Properties

One of the attractive features of the proposed estimator is that its large sample properties can be shown by using techniques from empirical process theory. Let (θ̂, α̂, Λ̂) denote the estimator maximizing (4), and let (θ0, α0, Λ0) denote the true parameter values. Zeng and

Cai (2005) showed the strong consistency and asymptotic normality of θ̂ and Λ̂(·) when α0(t) is constant over time. We relax their full parametric assumptions for Y(t) by adding a

nonparametric functional component into their linear mixed effects models. The key difference in the proof is to find an upper bound of supt∈[0,τ] |α̂(t) − α0(t)|. The upper bound

is constructed based on a linear span of α0(t) into the sieve space Sn(m,Kn,Mn) that consists

of functions with uniformly bounded rth (r ≥ 2) derivatives. Under the mild regularity conditions (A1)–(A11) stated in the Web Appendix A, we can show that if the smoothing parameters satisfy Mn = O(log log n) and Kn = O(nr0) with 1/(4r) < r0 < 1/3, then the MLEs

are uniformly consistent in the sense that ||θ̂ − θ0|| = op(1), supt∈[0,τ] |Λ̂(t) − Λ0(t)| = op(1)

and supt∈[0,τ] |α̂(t) − α0(t)| = op(1), where . From the consistency and the choice

of Mn = O(log log n), we can further show that

. By choosing Kn = O(n1/(4r−1)), for example, we can obtain the -convergence rate of the MLEs. Finally,

it is proved that n1/2(θ̂ − θ0) is asymptotically normal with mean zero and θ̂ is a

semiparametric efficient estimator for θ0. The asymptotic covariance of θ̂ can be estimated

by (8), and its finite sample properties are shown to be good in Section 4. A detailed proof of the asymptotic properties is given in Web Appendix A.

4. Simulation Studies

To assess the performance of the proposed method, we conduct extensive simulation studies under settings mimicking the prostate cancer data. Specifically, using the analysis results in Figure 3 and Table 3 as a basis, we assume that true α(t) = (t + 0.5)−1.5 + (t/2 + 1)3 − 4 and ϕ = 1.7, choose n = 500, and target a similar early drop-out rate as in the real data where 50% and 75% of informative drop-outs occurred within 3 and 5 years, respectively, over the maximum follow-up time of 14 years. For simplicity we include one observed covariate x, baseline PSA level, following a normal distribution with true β = γ = 0.5. Then, the longitudinal outcomes are generated from Y(t) = α(t) + βx + b1 + b2t + ε(t), where

with and (b₁, b₂) are from a normal distribution with zero means,

, and . The correlation between b₁ and b₂ is ρ = −0.1. Informative drop-out times are generated from a transformation model taking the form Λ(t | x, b₁) = H(exp(γx + ϕb₁)Λ(t)).

Various missing mechanisms are explored by varying H(·) and ϕ. We consider two popular models for survival data: the proportional hazards model H(x) = x (when η = 0) and the proportional odds model H(x) = log(1 + x) (when η = 1). Assuming true ϕ = 0 represents missing at random (MAR). True ϕ = 0.5 and ϕ = 1.7 represent missing not at random

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(10)

(NMAR) with the same and larger effect size than the coefficient of the observed covariate (γ = 0.5). These settings produce about six measurements of Y, on average.

We also investigate the impact of knot selection procedures on α(t) estimation when AIC or BIC is used. We repeatedly fit the joint model using interior points Kn∈ {2, 3, …, 20},

where the locations of knots are determined by every distinct 100/(Kn + 1)th percentile of

the observed measurement times. The best numbers of interior knots by AIC (or BIC), denoted by kAIC (or kBIC), may be different in each data set. In our simulation settings, we

observe that kAIC equals kBIC for 42% of the data sets, and kAIC is always greater than or

equal to kBIC. For comparison purposes, in Table 1 we also report estimation results when

the interior knots are always fixed with Kn = 2, Kn = 4, or Kn = 8.

Simulation results based on 1000 replications are presented in Tables 1 and 2 for various missing mechanisms by different ϕ and H(·), where Bias is the average of the differences between the true parameter and the estimates, SD is the sample standard deviation of the parameter estimator, SEE is the average of the standard error estimates, and CP is the coverage probability of 95% confidence intervals. The confidence intervals of variance components are constructed based on the Satterthwaite approximation.

Table 1 and Web Tables 1–2 report the performance of α(t) estimation in the joint modeling approach, compared with that using the marginal modeling approach (i.e., ignoring the informative drop-out), in terms of Bias, SD, the mean square error (MSE), and the ratio of the MSE for joint estimates to the marginal estimates (MSER). Under MAR (i.e., ϕ = 0), both approaches are similar in Bias and SD. However, under NMAR (i.e., ϕ = 0.5 or 1.7), the marginal approach resulted in a larger Bias but with a similar magnitude of SD, which indicates that the joint modeling approach leads to a more accurate and efficient estimator.

In Table 1, when the number of interior knots (Kn) increases, the SD consistently increases,

whereas the Bias decreases or increases in earlier or later observations times, respectively. The AIC-based knot selection procedure tends to result in smaller biases than the BIC-based knot selection procedure but at the cost of larger variation. It is worth noting that, when AIC- or BIC-selected knots are used, part of SD is attributed to the uncertainty of model selection, whereas such variability does not exist with the fixed knots. In Table 1, we observe that SDs with the AIC- and BIC-selected knots are similar to SDs with the fixed knots, indicating that the major source of variations in estimates was attributed to the magnitude of Kn itself,

rather than the variability in Kn chosen differently by AIC or BIC. This might be the reason

why the coverage probabilities could remain close to the nominal level 95%, even if not accounting for the uncertainty of model selection. Figure 1 shows the estimates of α(t) for all t, using different knot selection procedures and transformations. The average, minimum, and maximum of estimates over 1000 simulated data sets indicate that the proposed estimator α̂(t) behaves well for both transformation models when BIC was used. However, the AIC-estimates appeared to have larger variations in both tails. Therefore, we suggest using the BIC-based knot selection procedure, based on negligibly small biases and consistently smaller SDs in all scenarios we studied.

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(11)

Table 2 shows good performance of the remaining parameters for both H(x) = x and H(x) = log(1 + x) when BIC-based knot selection procedure was used. That is, the MLEs are unbiased, the standard error estimates calculated using the Louis formula reflect well the true variations in the proposed estimators, and the coverage probabilities are in a reasonable range. When AIC was used as the knot selection criterion, simulation results are very similar to those in Table 2 and hence omitted here (see Web Table 3).

5. Data Application

We illustrate the application of the proposed method using prostate cancer data from the University of Michigan. The data were collected from a total of 503 patients with average age of 69 (range of 34–86) who received planned radiation therapy as the primary treatment method. The objective of this analysis was to identify the trajectory of post-radiation PSA change, while correctly accounting for the informative drop-out either by the start of salvage hormone therapy or by tumor recurrence. A detailed description of the data and possible clinical impact achieved from the study are discussed in Proust-Lima et al. (2008) and Taylor et al. (2013).

In this cohort, 118 patients (23.5%) dropped out, and the PSA level was measured nine times on average within a median follow-up time of 4.5 years. The model to characterize the PSA changes in the presence of informative drop-out was

where Y(t) is the observed values of log(PSA(t) + 0.1) at time t, Λ(t | X, b) is the cumulative hazard function of time to informative drop-out from the end of radiation therapy, and the prognostic factors (denoted as X) are log(baseline PSA+0.1), T-stage, Gleason score, and age at diagnosis. The log-transformation of PSA values and the choice of which covariates to include were based on the findings in Proust-Lima et al. (2008). A subject-specific random intercept and time slope (i.e., b₁ + b₂t) were included in both longitudinal and survival components to account for the dependence of informative drop-out on the PSA trajectory. The inclusion of b₁ + b₂t in the survival component implied that the time to informative drop-out was linked to the current PSA level.

To adjust for the informative drop-out mechanism better, we assumed different

transformation models H(x) = log(1 + ηx)/η by varying η values in the range [0, 1]. This was because, when η = 0 or η = 1 can be assumed, the chosen transformation provides useful interpretations for the drop-out process. That is, η = 0 implies that the unit change in a covariate has a linear impact on the log-hazard of dropping-out. The choice of η = 1 implies the data fits better to a model with a linear increment in the log-odds of dropping-out per unit change in a covariate.

To select the best transformation model η and interior knots Kn, model selection approaches

such as AIC and BIC are considered. Under each set of (η, Kn) in η∈ {0, 0.1, …, 1} and Kn

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(12)

∈ {2, 4, …, 12}, we computed the MLEs for the regression coefficients using the proposed method, and the resulting AIC and BIC values were compared. The search range of Kn was

chosen to cover the upper and lower bounds of Kn satisfying Conditions (A9)–(A11) in Web

Section A. In Figure 2 the smallest BIC value corresponding to (η,Kn) = (1, 5) indicated that

use of five interior knots and the proportional odds model produced the best fit to the data. As a sensitivity analysis, we expanded the range of the transformation parameter to η > 1, leading to the smallest BIC value at η = 1.6 and Kn = 6 (Web Figure 2). Since the decrement

in BIC value was very small, for a better interpretation in practice, we still reported the results from the proportional odds model.

Table 3 summarizes the analysis results under the selected best model. The positive ϕ̂ = 1.730 (P<0.001) indicates that a higher current PSA level was significantly associated with a higher rate of informative drop-out. The rate of informative drop-out statistically

significantly increased with a higher baseline PSA level, T-stage and Gleason score, and for older patients. On the other hand, there was no significant difference in post-radiation PSA level by T-stage and Gleason score, although it was significantly affected by baseline PSA level and patient’s age. We noticed that overall results for time-constant covariates were similar between joint and marginal (i.e., ignoring the informative drop-outs) analyses, whereas the results for the temporal trend in PSA level were different. In Figure 3, circles and dots present the full history of all post-radiation PSA values for patients whose follow-up was informatively and non-informatively dropped out, respectively. The mark “×” on some circles indicates the last observation of PSA before the informative drop-out occurred. Based on the descriptive summary indicated by the circles, dots, and ×’s, it was observed that patients who dropped out informatively had higher PSA scores and shorter follow-up time in general. When we compare the curve for the “Joint Estimate” to that for the “Marginal Estimate” in Figure 3, we can see that the joint modeling approach led to the estimated underlying PSA trajectory curve being lower than the marginal estimate. This might be because the joint modeling is a method to account for the fact that some of the PSA values were observed under some degree of disease progression, which could cause

informative drop-out due to the tumor recurrence or sufficient concern that the patient considered hormone therapy. Moreover, the reduction in the estimate by the joint modeling approach was larger at longer times.

Sensitivity to the inference based on BIC selection was examined by comparing the results by AIC selection. Web Figure 3 shows that the AIC-based procedure selected the model with (η,Kn) = (1, 7), however, the resulting estimates were very similar to those based on the

BIC-based selection procedure and hence omitted here (see Web Figure 4 and Web Table 4).

6. Discussion

We have discussed a method of fitting a partially linear model for longitudinal data with informative drop-out, which can handle a covariate-dependent drop-out mechanism through the transformed survival model. For estimation of the model parameters, we have maximized the likelihood, and the resulting MLEs have been theoretically justified. The proposed joint modeling approach has clearly shown the capability to correct biases induced by ignoring

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(13)

By exploring a broad class of models for the missing data through varying H(·) and ϕ, the proposed method reduces errors due to the misspecified missing data mechanism to some extent. The proposed methodology, however, does require extensive modeling assumptions, including specification of the drop-out mechanism, the mean and covariance structures, such as linearity in the covariate effects, and a normality assumption of latent variables.

Therefore, sensitivity analysis to these assumptions as well as future efforts in research and development of model checking tools are needed to promote practical application of the proposed joint partially linear model (JPLM). In the JPLM, the distribution of the outcomes after drop-out is nonidentifiable, and thereby we assume that it remains the same as before the drop-out. One simple approach to exploring sensitivity to this untestable assumption is to 1) introduce a sensitivity parameter as the difference between the mean of observed and unobserved responses, and then 2) examine how sensitive the results are over a clinically plausible range of the sensitivity parameter (for more details, see Web Section D). For the assessment of JPLM’s fit to observed data, a graphical inspection tool has been illustrated in Web Section C. With regard to more rigorous model diagnostic procedures, the main difficulty is that some model assumptions are made about unobserved variables, and hence the standard model diagnostics based on the observed data alone are not sufficient. One possible research direction is to adopt the multiple-imputation-based diagnostic method by Rizopoulos et al. (2010). The key idea of Rizopoulos et al. (2010) is to create multiple sets of complete data by resampling missing longitudinal outcomes from the posterior

distribution given the observed data, and apply standard model diagnostics for mixed effects models and survival models with complete data. Since the general framework needed for the multiple imputation procedures in Rizopoulos et al. (2010) has already been established in equation (5) and Sections 3.3 and 3.4, the extension to survival transformation models appears promising. The assumption on the linear effects of covariate can be relaxed by extending to time-varying coefficients models. When a fixed numbers of knots are used or the number of knots can be assumed to the same for all time-varying coefficients, we do not expect any additional technical challenge in the extension. However, further efforts to reduce the computational burden are needed for selecting different numbers of knots for each covariate.

There are a few other ways in which we can extend our proposal. In this article, we assume that the observation times of the biomarkers are independent of the level of biomarkers. However, in some cases where biomarkers are observed at hospitalizations or whenever clinicians may suspect some progress of the diseases for example, our JPLM can be extended to accommodate the informative observation process by jointly modeling the additional component. The AIC and BIC were considered to determine both the best transformation and the selection of the number of knots, but we can also explore and compare the validity of other resampling-based model selection criteria such as cross-validation in the future. Lastly, there is no theoretical justification for any of these model selection procedures, and the proposed inference does not account for the uncertainty of model selection. Post-model selection inference methods under the joint modeling setting are worth pursuing.

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(14)

Supplementary Material

Refer to Web version on PubMed Central for supplementary material.

References

Beunckens C, Molenberghs G, Verbeke G, Mallinckrodt C. A latent-class mixture model for incomplete longitudinal Gaussian data. Biometrics. 2008; 64:96–105. [PubMed: 17608789] Cai N, Lu W, Zhang HH. Time-varying latent effect model for longitudinal data with informative

observation times. Biometrics. 2012; 68:1093–1102. [PubMed: 23025338]

Dabrowska DM, Doksum KA. Partial likelihood in transformation models with censored data. Scandinavian Journal of Statistics. 1988; 15:1–23.

Ding J, Wang JL. Modeling longitudinal data with nonparametric multiplicative random effects jointly with survival data. Biometrics. 2008; 64:546–556. [PubMed: 17888040]

Geman S, Hwang CR. Nonparametric maximum likelihood estimation by the method of sieves. Annals of Statistics. 1982; 10:401–414.

Hogan JW, Laird NM. Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine. 1997; 16:259–272. [PubMed: 9004396]

Huang JZ, Wu CO, Zhou L. Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika. 2002; 89:111–128.

Kim S, Zeng D, Chambless L, Li Y. Joint models of longitudinal data and recurrent events with informative terminal event. Statistics in Biosciences. 2012; 4:262–281. [PubMed: 23227131] Kosorok MR, Lee BL, Fine JP. Robust inference for univariate proportional hazards frailty regression

models. Annals of Statistics. 2004; 32:1448–1491.

Lin D, Ying Z. Semiparametric and nonparametric regression analysis of longitudinal data. Journal of the American Statistical Association. 2001; 96:103–126.

Lin X, Carroll R. Semiparametric regression for clustered data using generalized estimating equations. Journal of the American Statistical Association. 2001; 96:1045–1056.

Lu T, Huang Y. Bayesian inference on mixed-effects varying-coefficient joint models with skew-t distribution for longitudinal data with multiple features. Statistical Methods in Medical Research. 2015; page Advance online publication. doi: 10.1177/0962280215569294

Muthén B, Asparouhov T, Hunter AM, Leuchter AF. Growth modeling with nonignorable dropout: Alternative analyses of the star*d antidepressant trial. Psychological Methods. 2011; 16:17–33. [PubMed: 21381817]

Proust-Lima C, Taylor JMG, Williams SG, Ankerst DP, Liu N, Kestin LL, et al. Determinants of change in prostate-specific antigen over time and its association with recurrence after external beam radiation therapy for prostate cancer in five large cohorts. International Journal of Radiation Oncology* Biology* Physics. 2008; 72:782–791.

Rizopoulos D, Verbeke G, Molenberghs G. Multiple-imputation-based residuals and diagnostic plots for joint models of longitudinal and survival outcomes. Biometrics. 2010; 66:20–29. [PubMed: 19459832]

Roy J. Modeling longitudinal data with nonignorable dropouts using a latent dropout class model. Biometrics. 2003; 59:829–836. [PubMed: 14969461]

Roy J, Lin X. Missing covariates in longitudinal data with informative dropouts: Bias analysis and inference. Biometrics. 2005; 61:837–846. [PubMed: 16135036]

Shen X, Wong WH. Convergence rate of sieve estimates. Annals of Statistics. 1994; 22:580–615. Shibata R. An optimal selection of regression variables. Biometrika. 1981; 68:45–54.

Taylor JMG, Park Y, Ankerst DP, Proust-Lima C, Williams S, Kestin L, et al. Real-time individual predictions of prostate cancer recurrence using joint models. Biometrics. 2013; 69:206–213. [PubMed: 23379600]

Zagars GK, Pollack A, Kavadi VS, von Eschenbach AC. Prostate-specific antigen and radiation

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(15)

Zeger S, Diggle P. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics. 1994; 50:689–699. [PubMed: 7981395]

Zeng D, Cai J. Asymptotic results for maximum likelihood estimators in joint analysis of repeated measurements and survival time. Annals of Statistics. 2005; 33:2132–2163.

Zhang D, Lin X, Raz J, Sowers M. Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association. 1998; 93:710–719.

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(16)

Figure 1.

Simulation results for α(t) estimation when ϕ = 1.7. The solid curve indicates true α(t) = (t + 0.5)−1.5 + (t/2 + 1)3 − 4. The dash-dotted (dotted) curves indicate the maximum, average, and minimum of the estimated α(t) over 1000 simulated data sets when the AIC-based (BIC-based) knot selection procedure was used. The AIC-estimates appeared to have larger variations in both tails than the BIC-estimates. This figure appears in color in the electronic version of this article.

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(17)

Figure 2.

Bayesian information criterion (BIC) plotted for different transformations H(x) = log(1 + ηx)/η and different numbers of interior knots (Kn).

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(18)

Figure 3.

Coefficient function of log PSA score, adjusted by T-stage, Gleason score, and age, under the best fit of transformation H(x) = log(1 + x) and 5 interior points. The solid (dashed) curve is an estimate from the joint (marginal) model. The circles and dots present the full history of all post-radiation PSA values for patients who dropped out informatively and non-informatively, respectively. The mark “×” on some circles indicates the last observation of PSA before the informative drop-out occurred.

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

(19)

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

T ab le 1

Simulation results for true

ϕ

= 1.7 (i.e., missing not at random) with

Kn ∈ { kAIC , kBIC

, 2, 4, 8} interior knots of B-spline approximation. T

rue v

alues are

α

(

τ20

) = −1.10,

α

(

τ40

) = −0.58, and

α

(

τ80

) = 2.16, where

τp

represents p% of study duration

(20)

(21)

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

T ab le 2

Simulation results by v

arying

ϕ

and

H

(·) when the interior knots of B-spline approximation are selected by BIC (i.e.,

Kn

=

kBIC

).

τp

represents p% of

study duration τ . P arameter T ar get H (x ) = x H (x

) = log(1 +

(22)

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

P arameter T ar get H (x ) = x H (x

) = log(1 +

(23)

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

T ab le 3

Joint analysis results of the prostate cancer data under the best f

it of transformation

H

(x

) = log(1 +

x

) and f

iv

e interior knots selected by BIC. The 50:50

mixture of

χ

2 distrib

utions is used for testing v

ariances. Reference groups for cate

gorical co

v

ariates are T

-stage=1, Gleason score between 2 and 6, and

age<65.

τp

represents p% of study duration

τ . Effect J oint model Mar ginal model Est SE P-v alue Est SE P-v alue

Longitudinal PSA score

α ( τ20 ) 0.340 0.087 .0001 0.553 0.092 <.0001 α ( τ40 ) 0.963 0.135 <.0001 1.368 0.141 <.0001 α ( τ60 ) 1.792 0.195 <.0001 2.388 0.201 <.0001 α ( τ80 ) 2.637 0.262 <.0001 3.424 0.269 <.0001 log(baseline PSA+0.1) 0.510 0.034 <.0001 0.509 0.034 <.0001 T -stage 2 −0.058 0.064 .3617 −0.045 0.064 .4828 T

-stage 3 or 4

0.018 0.117 .8770 0.027 0.117 .8171

Gleason score 7 to 9

0.055 0.060 .3592 0.040 0.060 .4942

Age 65–75 years

−0.190 0.069 .0056 −0.201 0.069 .0037

Age > 75 years

−0.115 0.088 .1882 −0.130 0.088 .1408 0.117 0.003 <.0001 0.118 0.003 <.0001 Random ef fects 0.373 0.028 <.0001 0.371 0.027 <.0001 0.206 0.018 <.0001 0.195 0.017 <.0001 ρ −0.121 0.055 .0281 −0.114 0.054 .0359 Informati v e drop-out log(baseline PSA+0.1) 0.532 0.161 .0010 T -stage 2 1.193 0.320 .0002 T

-stage 3 or 4

1.815

0.467

.0001

Gleason score 7–9

(24)

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

uscr

ipt

A

uthor Man

Effect

J

oint model

Mar

ginal model

Est

SE

P-v

alue

Est

SE

P-v

alue

Age > 75 years

−1.543

0.471

.0011

ϕ

1.730

0.134