Proportional Odds model for competing risks data

Chapter 4 Some applications

5.3 Discrete time competing risks models

5.3.1 Proportional Odds model for competing risks data

Cox [1972] proposed a Proportional Odds (PO) model for discrete times and a single cause of failure. It is a discrete variation of the well-known Cox PH model, proposed in the same seminal paper. Let $x_i \in \mathbb{R}^k$ be a vector containing the values of $k$ covariates associated with individual $i$ and $\beta = (\beta_1, \ldots, \beta_k)' \in \mathbb{R}^k$ a vector of regression parameters. The Cox PO model is given by

\[
\log\left(\frac{h(t \mid \delta_t, \beta; x_i)}{1 - h(t \mid \delta_t, \beta; x_i)}\right) = \log\left(\frac{h(t)}{1 - h(t)}\right) + x_i'\beta \equiv \delta_t + x_i'\beta, \quad i = 1, \ldots, n, \tag{5.5}
\]

where $\{\delta_1, \delta_2, \ldots\}$ respectively represent the baseline log-odds at times $\{1, 2, \ldots\}$ and $t = 1, \ldots, t_i$. The model in (5.5) can be estimated in most statistical software by means of a binary logistic regression [Singer and Willett, 1993]. For this purpose, the data have to be transformed into a person-period format. Define $Y_{it}$ as 1 if the event is observed at time $t$ for individual $i$, and 0 otherwise. In the person-period format, each individual is represented by as many rows as periods in which he/she was at risk. To illustrate, Table 5.4 shows the transformed version of the fictional data displayed in Table 5.3. The period-indicators $\delta_t$ are estimated by introducing binary variables to the set of covariates.

One basic assumption of logistic regression is independence between observations. However, in the person-period data, there is a clear association between observations linked to the same individual. Nevertheless, as shown in Singer and Willett [1993], the likelihood related to the survival process coincides with the likelihood of the logistic regression model for which the rows in the person-period data are treated as independent Bernoulli trials. In fact, the contribution to the likelihood of individual $i$ (data collection for this individual stops if the event is observed or right censoring is recorded) is given by

\[
L_i = P(Y_{it_i} = y_{it_i}, \ldots, Y_{i1} = y_{i1}) = h(t_i)^{y_{it_i}} \prod_{s=1}^{t_i} [1 - h(s)]^{1 - y_{is}}, \tag{5.6}
\]

which is derived by decomposing $P(Y_{it_i} = y_{it_i}, \ldots, Y_{i1} = y_{i1})$ as a sequential product of conditional probabilities (covariates are omitted for ease of notation). Equivalently, defining $c_i = 0$ if the survival time is observed (i.e. $Y_{it_i} = 1, Y_{i(t_i - 1)} = 0, \ldots, Y_{i1} = 0$) and $c_i = 1$ if right censoring occurs (with $t_i$ as the terminal time), we can express the likelihood contribution as

\[
L_i = \left[\frac{h(t_i)}{1 - h(t_i)}\right]^{1 - c_i} S(t_i), \quad S(t_i) = \prod_{s=1}^{t_i} [1 - h(s)], \tag{5.7}
\]

which is the same expression that would be obtained in a survival setting.
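The person-period transformation and the equality between (5.6) and (5.7) can be checked numerically. The following is a minimal sketch in Python, assuming the two fictional records of Table 5.3; the function names and the logistic hazard (with coefficients -2.0 and 0.1) are hypothetical choices made only for illustration and are not part of the original analysis.

```python
import math

def to_person_period(records):
    """Expand (id, follow_up_time, observed) records into person-period rows.

    Each row is (id, period, outcome); the outcome is 1 only in the final
    period of an individual whose event was observed.
    """
    rows = []
    for ident, t_i, observed in records:
        for s in range(1, t_i + 1):
            outcome = 1 if (observed and s == t_i) else 0
            rows.append((ident, s, outcome))
    return rows

def bernoulli_likelihood(rows, hazard):
    """Treat every person-period row as an independent Bernoulli trial, as in (5.6)."""
    lik = 1.0
    for _, s, y in rows:
        h = hazard(s)
        lik *= h if y == 1 else (1.0 - h)
    return lik

def survival_likelihood(records, hazard):
    """Product over individuals of [h(t_i) / (1 - h(t_i))]^(1 - c_i) S(t_i), as in (5.7)."""
    lik = 1.0
    for _, t_i, observed in records:
        surv = math.prod(1.0 - hazard(s) for s in range(1, t_i + 1))
        odds = hazard(t_i) / (1.0 - hazard(t_i))
        lik *= (odds if observed else 1.0) * surv
    return lik

def hazard(s):
    """Hypothetical discrete hazard, logistic in the period index (illustration only)."""
    return 1.0 / (1.0 + math.exp(-(-2.0 + 0.1 * s)))

# Table 5.3: individual 1 has the event at t = 8; individual 2 is censored at t = 3.
records = [(1, 8, True), (2, 3, False)]
rows = to_person_period(records)

lik_rows = bernoulli_likelihood(rows, hazard)
lik_surv = survival_likelihood(records, hazard)
```

Here `rows` reproduces the layout of Table 5.4, and `lik_rows` and `lik_surv` agree up to floating-point error, which is the equivalence that justifies fitting (5.5) with standard logistic regression software.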

The model in (5.5) can be extended in order to accommodate $R$ possible events. Let $B = \{\beta_{(1)}, \ldots, \beta_{(R)}\}$ be a collection of cause-specific regression parameters.

Table 5.3: Fictional data. Example of a standard competing risks dataset (covariates are omitted for simplicity).

ID   Follow-up time   Event
1    8                Observed
2    3                Censored

Table 5.4: Fictional data. Person-period format for the data shown in Table 5.3.

ID   Period   Outcome
1    1        0
1    2        0
1    3        0
1    4        0
1    5        0
1    6        0
1    7        0
1    8        1
2    1        0
2    2        0
2    3        0

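multinomial logistic regression can be defined as

\[
\log\left(\frac{h(r, t \mid \delta, B; x_i)}{h(0, t \mid \delta, B; x_i)}\right) = \delta_{rt} + x_i'\beta_{(r)}, \quad r = 1, \ldots, R;\ t = 1, \ldots, t_i;\ i = 1, \ldots, n, \tag{5.8}
\]

where

\[
h(0, t \mid \delta, B; x_i) = 1 - \sum_{r=1}^{R} h(r, t \mid \delta, B; x_i) \tag{5.9}
\]

is the hazard of no event being observed at time $t$. The latter is equivalent to

\[
h(r, t \mid \delta, B; x_i) = \frac{e^{\delta_{rt} + x_i'\beta_{(r)}}}{1 + \sum_{s=1}^{R} e^{\delta_{st} + x_i'\beta_{(s)}}}. \tag{5.10}
\]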
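As a minimal sketch of how (5.9) and (5.10) translate into a computation, the Python function below returns the hazard of no event together with the $R$ cause-specific hazards for one individual and one period. The function name and the numerical values of the period effects, coefficients and covariates are hypothetical and serve only as an illustration.

```python
import math

def cause_specific_hazards(delta_t, betas, x):
    """Return [h(0,t), h(1,t), ..., h(R,t)] following (5.9) and (5.10).

    delta_t: list of the R period effects delta_{rt} for the period of interest
    betas:   list of the R coefficient vectors beta_(r)
    x:       covariate vector x_i
    """
    # Linear predictor delta_{rt} + x_i' beta_(r) for each cause r.
    linear = [d + sum(b_j * x_j for b_j, x_j in zip(b, x))
              for d, b in zip(delta_t, betas)]
    denom = 1.0 + sum(math.exp(eta) for eta in linear)
    hazards = [math.exp(eta) / denom for eta in linear]
    # h(0,t) = 1 - sum_r h(r,t), which simplifies to 1 / denom.
    return [1.0 / denom] + hazards

# Hypothetical values for R = 2 causes and k = 2 covariates.
h = cause_specific_hazards(delta_t=[-2.0, -3.0],
                           betas=[[0.5, -0.2], [0.1, 0.3]],
                           x=[1.0, 2.0])
```

The returned values sum to one, and `math.log(h[r] / h[0])` recovers the linear predictor of cause `r`, in agreement with (5.8). For linear predictors of large magnitude, a numerically stabilised version of the exponentials would be preferable to this direct form.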

This notation implies that the same predictors are used for each cause-specific component (but this is easily generalised). In (5.8), covariates have an effect that is homogeneous over time. Hence, changes in the covariates influence both the marginal probability of the event ($P(R = r)$) and the speed at which the event occurs. In fact, positive values of the cause-specific coefficients indicate that (at any time point) the hazard of the corresponding event increases with unit changes in the associated covariates.

In the context of university outcomes, (5.8) has been used by Scott and Kennedy [2005], Arias Ortis and Dehon [2011] and Clerici et al. [2014], among others. Nonetheless, its use has some drawbacks. First, it involves a large number of parameters. In fact, if $T$ is the maximum of the recorded survival/censoring times, there are $R \times T$ different $\delta_{rt}$'s. Scott and Kennedy [2005] overcome this by assigning a unique indicator $\delta_{rt_0}$ to the period $[t_0, \infty)$ (for fixed $t_0$). The choice of this threshold is rather arbitrary, but it is reasonable to choose a value of $t_0$ such that most of the individuals have already experienced one of the events by time $t_0$. Second, maximum likelihood inference for the multinomial logistic regression is precluded when the outcomes are (quasi-)completely separated with respect to the predictors, i.e. a subset of the possible outcomes is not (or is rarely) observed for some covariate configurations [Albert and Anderson, 1984]. In other words, the predictors can (almost) perfectly predict the outcomes. In the case of (5.8), these predictors include binary variables that are related to the period indicators $\delta_{rt}$.
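The effect of separation on maximum likelihood can be seen in a deliberately simplified sketch. Assume a single binary outcome and one period indicator $\delta$ for a period in which none of the individuals at risk experiences the event; the function name and the value of 100 individuals at risk are hypothetical. Each row then contributes $\log(1 - h)$ with $h = e^{\delta}/(1 + e^{\delta})$, so the log-likelihood keeps increasing as $\delta$ decreases and has no finite maximiser.

```python
import math

def log_lik_no_events(delta, n_at_risk):
    """Bernoulli log-likelihood when none of the n_at_risk rows has the event.

    With hazard h = exp(delta) / (1 + exp(delta)), each row contributes
    log(1 - h) = -log(1 + exp(delta)).
    """
    return -n_at_risk * math.log1p(math.exp(delta))

# Evaluate the log-likelihood at increasingly negative values of delta.
deltas = [0.0, -5.0, -10.0, -20.0]
values = [log_lik_no_events(d, n_at_risk=100) for d in deltas]
```

The entries of `values` are strictly increasing and approach zero from below, so an optimiser would push $\delta$ towards $-\infty$ without converging.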

Therefore, (quasi-)complete separation will occur if the event types are (almost) entirely defined by the survival times. This is a major issue in the context of university outcomes. For example, no graduations can be observed during the second semester of enrollment. Therefore, the likelihood function will be maximized when the cause-specific hazard related to graduations (defined in (5.10)) is equal to zero at time $t = 2$. Thus, the “best” value of the corresponding period-indicator is $-\infty$. In order to overcome these problems, Singer and Willett [2003] suggest polynomial baseline odds when modelling single outcomes. This can be easily extended to the competing risks case as

\[
\log\left(\frac{h(r, t \mid \delta^*, B; x_i)}{h(0, t \mid \delta^*, B; x_i)}\right) = \delta^*_{r0} + \delta^*_{r1}(t - 1) + \delta^*_{r2}(t - 1)^2 + \cdots + \delta^*_{rP}(t - 1)^P + x_i'\beta_{(r)}, \tag{5.11}
\]

where $\delta^* = \{\delta^*_{10}, \ldots, \delta^*_{R0}, \ldots, \delta^*_{1P}, \ldots, \delta^*_{RP}\}$ and $P$ denotes the degree of the polynomial. Defining the polynomial in terms of $t - 1$ facilitates the interpretation of the intercept ($\delta^*_{r0}$ represents the baseline cause-specific log-odds at time $t = 1$). This option is less flexible than (5.8), but it is not affected by a separation of the outcomes with respect to the survival times. Nevertheless, its use is only attractive when a low-degree polynomial is good enough to represent the baseline hazard odds. This is not the case for the PUC dataset, where cause-specific hazard rates have a rather complicated behaviour (e.g. even semesters exhibit spikes in the hazard of voluntary dropouts). In practice, not even large values of $P$ would provide a good fit.
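The polynomial baseline in (5.11) can be sketched as follows. This is a minimal Python illustration with covariates omitted; the function name, the coefficient values and the choices of $R$, $T$ and $P$ are hypothetical and only meant to show how the baseline term is evaluated and how the number of baseline parameters compares with that of (5.8).

```python
def polynomial_baseline(delta_star_r, t):
    """Baseline log-odds for cause r at period t under (5.11), covariates omitted.

    delta_star_r: coefficients [delta*_{r0}, ..., delta*_{rP}]
    """
    return sum(d * (t - 1) ** p for p, d in enumerate(delta_star_r))

# Hypothetical quadratic baseline (P = 2) for a single cause.
coefs = [-2.0, 0.3, -0.05]
baseline_t1 = polynomial_baseline(coefs, t=1)  # equals the intercept delta*_{r0}
baseline_t3 = polynomial_baseline(coefs, t=3)

# Number of baseline parameters: period indicators in (5.8) versus (5.11).
R, T, P = 2, 12, 2           # hypothetical numbers of causes, periods and degree
n_indicator = R * T          # one delta_{rt} per cause and period
n_polynomial = R * (P + 1)   # P + 1 polynomial coefficients per cause
```

Because the polynomial is written in terms of $t - 1$, the value at $t = 1$ reduces to the intercept, and the number of baseline parameters no longer grows with $T$.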

Here, the model in (5.8) is adopted for the analysis of the PUC dataset, using Bayesian methods to handle separation. We define the last period as $[t_0, \infty)$ [for fixed $t_0$, as in Scott and Kennedy, 2005], and period-indicators for time $t = 1$ are

In document Incorporating unobserved heterogeneity and multiple event types in survival models : a Bayesian approach (Page 118-122)