Cure Models - Bayesian Computation - Bayesian survival analysis using gene expression

3.4 Bayesian Computation

3.4.3 Cure Models

This section defines the cure rate model. Models for survival analysis typically assume that every individual in the study population is susceptible to the event of interest and will eventually experience it if follow-up is sufficiently long. In recent years, there has been increasing interest in modelling survival data with long-term survivors, because in some situations a proportion of individuals is not expected to experience the event of interest; those individuals are often referred to as immune, cured or non-susceptible (Ibrahim et al., 2001b). For example, researchers may be interested in analysing the recurrence of a disease, yet many individuals may never experience a recurrence. In these situations, cure rate models are applied.

The most popular type of cure rate model is the mixture model, introduced by Berkson and Gage (1952) following earlier work by Boag (1949), which is also called the standard cure rate model. Let $S_1(t)$ be the survivor function for the entire population, $S^*(t)$ be the survivor function for the non-cured group in the population, and $\pi$ be the cure rate. Then the standard cure rate model is given by

$$S_1(t) = \pi + (1 - \pi) S^*(t). \tag{3.6}$$
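As a minimal sketch of model (3.6), the snippet below evaluates the population survivor function when the non-cured group follows a Weibull distribution. The function name, the argument names and the parameterisation $S^*(t) = \exp(-(t/\text{scale})^{\alpha})$ are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

def mixture_cure_survival(t, pi, alpha, scale):
    """Population survivor function S1(t) = pi + (1 - pi) * S*(t).

    Assumption (illustrative): the non-cured group is Weibull with
    S*(t) = exp(-(t / scale) ** alpha).
    """
    t = np.asarray(t, dtype=float)
    s_star = np.exp(-(t / scale) ** alpha)  # survivor function of the non-cured
    return pi + (1.0 - pi) * s_star

# Hypothetical values: 30% cured, Weibull shape 1.5 and scale 2.
t_grid = np.linspace(0.0, 10.0, 6)
surv = mixture_cure_survival(t_grid, pi=0.3, alpha=1.5, scale=2.0)
print(np.round(surv, 4))
```

The curve starts at 1 and levels off at the cure rate $\pi$ rather than at 0, which is the plateau that distinguishes cure models from standard survival models.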

Exponential and Weibull distributions are commonly used for $S^*(t)$. This model has been discussed extensively in the statistical literature. Detailed reviews of the cure model can be found in the books by Maller and Zhou (1996, p.97-223) and Ibrahim et al. (2001b, p.155-205). In the frequentist context for parametric models, Farewell (1986) examined mixture models and advocated the likelihood function as an informative inference tool for estimating the fraction of patients cured of breast cancer. Farewell (1986) also suggested that cure rate models are sensible in a clinical setting only if the data are based on long-term follow-up. Peng and Xu (2012) proposed a novel interpretation of a recently proposed Box-Cox transformation cure model for colon cancer, which leads to a natural extension of the cure model. An alternative formulation of the parametric cure rate model is discussed in Yakovlev and Tsodikov (1996).

In the Bayesian context, Chen et al. (1985) generalised the mixture model of Berkson and Gage (1952) and used it for the analysis of survival data from cancer. Chen et al. (1999) considered Bayesian methods for right-censored survival data for populations with a cure fraction and proposed a model that is quite different from the standard mixture model for cure rates. Basu and Tiwari (2010) developed a model that unifies the mixture cure and competing risks approaches and that can handle masked causes of death from breast cancer in a natural way by using Markov chain sampling. Moreover, Cancho et al. (2011) discussed the use of MCMC methods for Bayesian inference in the analysis of survival data with a cure rate, proposing the negative binomial distribution as an extension of the model presented in Chen et al. (1999). Cancho et al. (2012) developed a Bayesian analysis, using MCMC methods, for right-censored survival data when cured individuals may be present in the population from which the data are taken.

There have been various applications of the mixture cure model. Angelis et al. (2007) analysed the survival of colon cancer patients by adjusting for background mortality. Simonetti et al. (2008) applied the mixture cure model to estimate the complete prevalence of childhood cancer. Chen et al. (2002) developed and compared three Bayesian models (a piecewise exponential model, a fully parametric cure rate model and a semiparametric cure rate model) for analysing time to event data for high-risk melanoma. Besides the parametric mixture cure models, there are a number of semiparametric mixture cure models in the literature, for example, semiparametric estimation procedures using the promotion time cure model (Chen et al., 1999, Ibrahim et al., 2001a, Kim et al., 2007), the proportional hazards mixture cure model (Peng and Dear, 2000, Goldman, 2000) and the accelerated failure time mixture cure model (Peng and Dear, 2009, Zhang and Peng, 2007, Zhang et al., 2011). The mixture cure model has also been implemented in several statistical packages. Peng et al. (1998) developed an R package, GFCURE, and Corbiere and Joly (2007) provided a SAS macro for parametric and semiparametric mixture cure models. The CANSURV software (Yu et al., 2005) of the National Cancer Institute (NCI) fits mixture cure models to population-based cancer survival data using the maximum likelihood method.

As in Yakovlev and Tsodikov (1996), Chen et al. (1999) and Ibrahim et al. (2001b), for an individual in the population, let $N$ denote the number of latent competing causes of the event. Assume that $N$ has a Poisson distribution with mean $\theta$. Let $Z_i$, $i = 1, \ldots, N$, denote the random time associated with the $i$th latent cause, where the $Z_i$ are i.i.d. with distribution function $F(t) = 1 - S(t)$ and are independent of $N$. The time to event is then defined by the random variable $Y = \min(Z_i, 0 \le i \le N)$, where $P(Z_0 = \infty) = 1$. Hence, the survival function for the population is given by

$$S_{pop}(t) = P(N = 0) + P(Z_1 > t, \ldots, Z_N > t, N \ge 1) = \exp(-\theta) + \sum_{k=1}^{\infty} [S(t)]^k \frac{\theta^k}{k!} \exp(-\theta) = \exp(-\theta F(t)). \tag{3.7}$$

The corresponding cure fraction in model (3.7) is $\lim_{t \to \infty} S_{pop}(t) = S_{pop}(\infty) = P(N = 0) = \exp(-\theta) > 0$. As $\theta \to \infty$, the cure fraction tends to 0, whereas as $\theta \to 0$, the cure fraction tends to 1. The corresponding population density and hazard functions are $f_{pop}(t) = -\frac{d}{dt} S_{pop}(t) = \theta f(t) \exp(-\theta F(t))$ and $h_{pop}(t) = \theta f(t)$, respectively, where $f(t) = \frac{d}{dt} F(t)$.
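The construction behind (3.7) can be checked by simulation. The sketch below draws $N$ from a Poisson distribution, takes the minimum of $N$ latent times, and compares the empirical survival proportion with $\exp(-\theta F(t))$. The Weibull form $S(t) = \exp(-e^{\lambda} t^{\alpha})$, the parameter values and the function name are assumptions chosen for illustration only.

```python
import numpy as np

def simulate_event_times(theta, alpha, lam, n, rng):
    """Simulate Y = min(Z_1, ..., Z_N) with N ~ Poisson(theta).

    Assumption (illustrative): Z has survivor function
    S(t) = exp(-exp(lam) * t ** alpha). Subjects with N = 0 get Y = inf.
    """
    N = rng.poisson(theta, size=n)
    kmax = max(int(N.max()), 1)
    # If E ~ Exp(1), then (E / exp(lam)) ** (1 / alpha) has the Weibull law above.
    Z = (rng.exponential(size=(n, kmax)) / np.exp(lam)) ** (1.0 / alpha)
    # Discard latent times beyond each subject's own count N_i.
    Z[np.arange(kmax)[None, :] >= N[:, None]] = np.inf
    return Z.min(axis=1)

theta, alpha, lam = 1.5, 1.2, -0.5      # hypothetical parameter values
rng = np.random.default_rng(2024)
Y = simulate_event_times(theta, alpha, lam, n=50_000, rng=rng)

t_grid = np.array([0.5, 1.0, 2.0, 5.0])
emp_surv = np.array([(Y > t).mean() for t in t_grid])
F = 1.0 - np.exp(-np.exp(lam) * t_grid ** alpha)
theo_surv = np.exp(-theta * F)           # S_pop(t) from (3.7)
emp_cure = np.isinf(Y).mean()            # proportion with N = 0

print(np.round(emp_surv, 3), np.round(theo_surv, 3))
print(round(emp_cure, 3), round(float(np.exp(-theta)), 3))
```

Up to Monte Carlo error, the empirical proportions should agree with the formula, and the share of subjects who never experience the event should be close to $\exp(-\theta)$.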

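The PH structure with the covariates is modelled through $\theta$ (Chen et al., 1999, Ibrahim et al., 2001b). The population survival function (3.7) can be written in the form of the standard cure rate model (3.6) as

$$S_{pop}(t) = \exp(-\theta) + [1 - \exp(-\theta)] S^*(t),$$

where $S^*(t) = \frac{\exp(-\theta F(t)) - \exp(-\theta)}{1 - \exp(-\theta)}$, and the corresponding density is $f^*(t) = \frac{\exp(-\theta F(t))}{1 - \exp(-\theta)} \theta f(t)$.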
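That this mixture representation agrees with $\exp(-\theta F(t))$ follows by direct substitution. The short verification below is added for illustration and uses only the quantities already defined.

```latex
\begin{align*}
\exp(-\theta) + [1 - \exp(-\theta)]\, S^*(t)
  &= \exp(-\theta) + [1 - \exp(-\theta)]\,
     \frac{\exp(-\theta F(t)) - \exp(-\theta)}{1 - \exp(-\theta)} \\
  &= \exp(-\theta) + \exp(-\theta F(t)) - \exp(-\theta) \\
  &= \exp(-\theta F(t)) = S_{pop}(t).
\end{align*}
```

Since $F(0) = 0$ and $F(\infty) = 1$, we also have $S^*(0) = 1$ and $S^*(\infty) = 0$, so $S^*(t)$ is a proper survivor function for the non-cured group, with cure rate $\pi = \exp(-\theta)$.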

Following Chen et al. (1999) and Ibrahim et al. (2001b), we construct the likelihood function. Suppose we have $n$ subjects and assume that the $N_i$'s are independent Poisson random variables with means $\theta_i$, $i = 1, \ldots, n$. Let $Z_{i1}, \ldots, Z_{iN_i}$ denote the times for the $N_i$ competing causes, which are unobserved and which have cumulative distribution function $F(\cdot)$. In this section, we specify a parametric form for $F(\cdot)$, namely the Weibull distribution. Let $\psi = (\alpha, \lambda)'$, where $\alpha$ is the shape parameter and $\lambda$ is the scale parameter. We incorporate covariates into the cure rate model through the cure parameter $\theta$, so that each subject has a different cure rate parameter $\theta_i$.

Let $x_i$ denote the vector of $k$ covariates for subject $i$, and let $\beta = (\beta_1, \ldots, \beta_k)'$ denote the corresponding vector of regression coefficients. We relate $\theta$ to the covariates by $\theta_i = \exp(x_i'\beta)$. Let $t_i$ denote the survival time for subject $i$, which may be right censored, let $C_i$ be the censoring time, and let $\delta_i$ be the censoring indicator, equal to 1 if $t_i$ is a failure time and 0 if it is right censored. The observed data are $D = (n, t, \delta, X)$, where $t = (t_1, \ldots, t_n)'$, $\delta = (\delta_1, \ldots, \delta_n)'$ and $X = (x_1, \ldots, x_n)'$. The complete data are given by $D_c = (n, t, \delta, X, N)$, where $N = (N_1, \ldots, N_n)'$. The complete-data likelihood function of the parameter $(\psi, \beta)$ can be written as

$$L(\psi, \beta \mid D_c) = \left( \prod_{i=1}^{n} S(t_i \mid \psi)^{N_i - \delta_i} \left( N_i f(t_i \mid \psi) \right)^{\delta_i} \right) \times \exp\left( \sum_{i=1}^{n} \left[ N_i \log(\theta_i) - \log(N_i!) - \theta_i \right] \right). \tag{3.8}$$
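Because the $N_i$ are unobserved, they are summed out of the complete-data likelihood. The worked step below is a sketch of that marginalisation for a single subject, writing $S_i = S(t_i \mid \psi)$ and $f_i = f(t_i \mid \psi)$ as shorthand (notation introduced here for brevity). It yields the likelihood factor that appears in the posterior (3.9).

```latex
% Contribution of subject i to the complete-data likelihood:
%   S_i^{N_i - \delta_i} (N_i f_i)^{\delta_i} \theta_i^{N_i} e^{-\theta_i} / N_i!
\begin{align*}
\delta_i = 0:\quad
  & \sum_{N_i = 0}^{\infty} \frac{(\theta_i S_i)^{N_i}}{N_i!}\, e^{-\theta_i}
    = \exp\bigl(-\theta_i (1 - S_i)\bigr), \\
\delta_i = 1:\quad
  & \sum_{N_i = 1}^{\infty} N_i f_i\, S_i^{N_i - 1}\, \frac{\theta_i^{N_i}}{N_i!}\, e^{-\theta_i}
    = \theta_i f_i \sum_{m = 0}^{\infty} \frac{(\theta_i S_i)^{m}}{m!}\, e^{-\theta_i}
    = \theta_i f_i \exp\bigl(-\theta_i (1 - S_i)\bigr).
\end{align*}
```

Both cases combine into $(\theta_i f_i)^{\delta_i} \exp(-\theta_i (1 - S_i))$, and taking the product over $i = 1, \ldots, n$ gives the observed-data likelihood.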

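Again, we assume independent priors for $\beta$ and $\psi$, where $\alpha \sim \text{Gamma}(a_\alpha, b_\alpha)$, $\lambda \sim N(\mu_\lambda, \Sigma_\lambda)$ and $\beta \sim N(\mu_\beta, \Sigma_\beta)$. We also assume $p(\alpha, \lambda) = p(\alpha \mid \delta_0, \tau_0)\, p(\lambda)$, where $p(\alpha \mid \delta_0, \tau_0) \propto \alpha^{\delta_0 - 1} \exp(-\tau_0 \alpha)$ and the hyperparameters $(\delta_0, \tau_0)$ are specified (Chen et al., 1999, Ibrahim et al., 2001b).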

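Combining these specifications with the likelihood function (3.8) and summing over the unobserved $N$, the joint posterior distribution of $(\alpha, \lambda, \beta)$ becomes

$$p(\alpha, \lambda, \beta \mid D) \propto \prod_{i=1}^{n} \left( \theta_i f(t_i \mid \alpha, \lambda) \right)^{\delta_i} \exp\left( -\theta_i \left( 1 - S(t_i \mid \alpha, \lambda) \right) \right) \times p(\alpha \mid \delta_0, \tau_0)\, p(\lambda)\, p(\beta). \tag{3.9}$$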
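For computation it is usually convenient to work with the logarithm of (3.9). The function below is a minimal sketch under stated assumptions: the Weibull is parameterised as $S(t) = \exp(-e^{\lambda} t^{\alpha})$ (consistent with a normal prior on $\lambda$, but not stated in the text), the normal priors are taken to be independent with common standard deviations, and all function names, argument names and default hyperparameter values are hypothetical.

```python
import numpy as np

def log_posterior(alpha, lam, beta, t, delta, X,
                  delta0=1.0, tau0=0.01,
                  mu_lam=0.0, sd_lam=100.0,
                  mu_beta=0.0, sd_beta=100.0):
    """Unnormalised log posterior of (alpha, lam, beta), following (3.9).

    Assumptions (illustrative): S(t) = exp(-exp(lam) * t ** alpha);
    independent normal priors on lam and on each component of beta.
    """
    if alpha <= 0:
        return -np.inf                       # shape must be positive
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    eta = np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    theta = np.exp(eta)                      # theta_i = exp(x_i' beta)
    cum_haz = np.exp(lam) * t ** alpha       # -log S(t_i)
    surv = np.exp(-cum_haz)
    log_f = np.log(alpha) + (alpha - 1.0) * np.log(t) + lam - cum_haz
    # log of prod_i (theta_i f_i)^delta_i * exp(-theta_i (1 - S_i))
    loglik = np.sum(delta * (eta + log_f) - theta * (1.0 - surv))
    log_prior = ((delta0 - 1.0) * np.log(alpha) - tau0 * alpha
                 - 0.5 * ((lam - mu_lam) / sd_lam) ** 2
                 - 0.5 * np.sum(((np.asarray(beta) - mu_beta) / sd_beta) ** 2))
    return loglik + log_prior

# One uncensored subject with a single covariate equal to zero.
lp = log_posterior(alpha=1.0, lam=0.0, beta=[0.0],
                   t=[1.0], delta=[1], X=[[0.0]])
print(lp)
```

Returning `-np.inf` for a non-positive shape lets a sampler reject such proposals automatically.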

The joint posterior density of $(\alpha, \lambda, \beta)$ in equation (3.9) is analytically intractable because the integral needed to normalise it cannot be evaluated in closed form. Hence, inferences are based on MCMC simulation methods. We can use the Metropolis-Hastings algorithm or slice sampling to simulate samples of $\alpha$, $\lambda$ and $\beta$. MCMC computations were implemented using the WinBUGS system (Spiegelhalter et al., 2002).
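To illustrate the kind of sampler involved, the sketch below implements a generic random-walk Metropolis algorithm, a special case of Metropolis-Hastings with a symmetric proposal. It is not the WinBUGS implementation used for the computations; the function name, its arguments and the toy standard normal target are assumptions for illustration. In practice the `log_post` argument would be an unnormalised log posterior such as the logarithm of (3.9).

```python
import numpy as np

def random_walk_metropolis(log_post, init, step, n_iter, rng):
    """Random-walk Metropolis sampler for an unnormalised log density.

    log_post : callable mapping a parameter vector to a log density value
    init     : starting parameter vector
    step     : standard deviation of the Gaussian proposal
    """
    x = np.asarray(init, dtype=float)
    lp = log_post(x)
    draws = np.empty((n_iter, x.size))
    accepted = 0
    for it in range(n_iter):
        proposal = x + step * rng.standard_normal(x.size)
        lp_prop = log_post(proposal)
        # Symmetric proposal: accept with probability min(1, ratio of densities).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
            accepted += 1
        draws[it] = x
    return draws, accepted / n_iter

# Toy target (assumption): a bivariate standard normal density.
def toy_log_density(x):
    return -0.5 * np.sum(x ** 2)

rng = np.random.default_rng(0)
draws, acc_rate = random_walk_metropolis(
    toy_log_density, init=[3.0, -3.0], step=1.0, n_iter=20_000, rng=rng)
kept = draws[2_000:]                 # discard burn-in
print(kept.mean(axis=0), kept.std(axis=0), acc_rate)
```

When applying such a sampler to (3.9), a positive parameter such as $\alpha$ is often sampled on the log scale (with the corresponding Jacobian term added to the log density) so that proposals remain in the parameter space.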
