
1 Introduction

1.4 Statistical analysis of competing events

1.4.5 Competing risks regression

The statistical advantages of flexible parametric models over semi-parametric approaches (Snell, 2015) were outlined in Section 1.3.4. Recall that Royston-Parmar flexible parametric models (introduced in Section 1.3.4) estimate a smooth baseline hazard function using restricted cubic splines, alongside estimates of regression coefficients, on the log cumulative hazard scale (Royston and Parmar, 2002). These methods have been adapted to incorporate competing events using both the cause-specific (Hinchliffe and Lambert, 2013b) and subdistribution (Lambert et al., 2017) approaches. The two approaches are discussed below. Again, for simplicity, each participant is at risk of experiencing K = 2 mutually exclusive events: the event of interest, k = 1, and a competing event, k = 2. For regression purposes, we include a vector of prognostic factors 𝐗k,i = (xk,1, xk,2, …) for each cause k and each individual i. The model parameters can be estimated using standard maximum likelihood techniques.

1.4.5.1 Cause-specific hazards approach

The flexible parametric modelling approach that uses cause-specific hazards to estimate cause-specific cumulative incidence functions (Hinchliffe and Lambert, 2013b) models on the log cumulative cause-specific hazard scale. Assuming proportional hazards, the log cumulative cause-specific hazard function for event k with a vector of prognostic factors 𝐗k,i can be written as:

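$$\ln[H_k(t \mid \mathbf{X}_{k,i})] = \mathrm{Spline}\{\ln[t] \mid \boldsymbol{\gamma}_k, \mathbf{N}_k\} + \boldsymbol{\beta}_k^{T}\mathbf{X}_{k,i} \qquad \text{(Equation 1.28)}$$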

where Spline{ln[t]|𝛄k, 𝐍k} represents a restricted cubic spline function (as given in Equation 1.14), with parameter vector 𝛄k and a vector of knot locations 𝐍k, modelling the log cumulative baseline cause-specific hazard function for event k. The resulting vector of regression coefficients 𝛃k, which corresponds to the vector of prognostic factors 𝐗k,i, is estimated using maximum likelihood techniques (Hinchliffe and Lambert, 2013b). The regression coefficients can be interpreted as log cause-specific hazard ratios under the proportional hazards assumption, and thus describe the (adjusted) effect of each prognostic factor on the risk of the event of interest when competing events cannot occur.
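To make Equation 1.28 concrete, the sketch below evaluates the log cumulative cause-specific hazard for one individual. It assumes the standard Royston-Parmar restricted cubic spline basis in ln(t), with the first and last knots as boundary knots and one derived variable per interior knot; Equation 1.14 is not reproduced here, so this basis is an assumption. The knot positions, spline parameters 𝛄k and coefficient 𝛃k are hypothetical values chosen for illustration, not estimates from any study.

```python
import numpy as np


def _cube_pos(z):
    """Truncated cubic (z)_+^3 = max(z, 0)^3."""
    return np.clip(z, 0.0, None) ** 3


def rcs_basis(x, knots):
    """Restricted cubic spline basis in x = ln(t), Royston-Parmar form (assumed).

    knots[0] and knots[-1] are the boundary knots; the rest are interior knots.
    Returns the columns [x, v_1(x), ..., v_m(x)], one v_j per interior knot.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k_min, k_max = knots[0], knots[-1]
    cols = [x]
    for k_j in knots[1:-1]:
        w_j = (k_max - k_j) / (k_max - k_min)  # spline weight, not a hazard
        cols.append(_cube_pos(x - k_j) - w_j * _cube_pos(x - k_min)
                    - (1 - w_j) * _cube_pos(x - k_max))
    return np.column_stack(cols)


def log_cum_hazard(t, gamma, knots, beta, x_covs):
    """ln H_k(t | X) = gamma_0 + basis(ln t) @ gamma[1:] + beta^T X (Equation 1.28)."""
    basis = rcs_basis(np.log(t), knots)
    return gamma[0] + basis @ gamma[1:] + np.dot(beta, x_covs)


# Hypothetical values: one interior knot, one prognostic factor (e.g. centred age).
knots = np.log([0.5, 2.0, 6.0])
gamma = np.array([-2.0, 1.2, 0.05])
beta = np.array([0.03])
t = np.array([1.0, 3.0, 5.0])
H = np.exp(log_cum_hazard(t, gamma, knots, beta, x_covs=np.array([10.0])))
print("cumulative cause-specific hazard:", H)
```

With no interior knots the spline reduces to 𝛾k0 + 𝛾k1 ln t, so the model collapses to a Weibull model; the interior knots add flexibility between the boundary knots, while the "restricted" constraint keeps the function linear in ln(t) beyond them.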

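The cause-specific hazard function is obtained by differentiating the cumulative cause-specific hazard function with respect to time, and thus involves the derivative of the restricted cubic spline function with respect to ln[t]:

$$h_k(t \mid \mathbf{X}_{k,i}) = \frac{d}{dt}\exp\left(\mathrm{Spline}\{\ln[t] \mid \boldsymbol{\gamma}_k, \mathbf{N}_k\} + \boldsymbol{\beta}_k^{T}\mathbf{X}_{k,i}\right) = \frac{1}{t}\,\frac{d\,\mathrm{Spline}\{\ln[t] \mid \boldsymbol{\gamma}_k, \mathbf{N}_k\}}{d\ln[t]}\,\exp\left(\mathrm{Spline}\{\ln[t] \mid \boldsymbol{\gamma}_k, \mathbf{N}_k\} + \boldsymbol{\beta}_k^{T}\mathbf{X}_{k,i}\right) \qquad \text{(Equation 1.29)}$$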
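Equation 1.29 can be evaluated in closed form with the chain rule, because d ln(t)/dt = 1/t. The sketch below assumes the same Royston-Parmar basis as the Equation 1.28 sketch (reproduced here so the block is self-contained) and uses hypothetical parameter values. It differentiates each basis column analytically and multiplies by exp(ln Hk)/t.

```python
import numpy as np


def _pos(z):
    """Truncated function (z)_+ = max(z, 0)."""
    return np.clip(z, 0.0, None)


def rcs_basis_and_deriv(x, knots):
    """Assumed Royston-Parmar spline basis in x = ln(t) and its derivative w.r.t. x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k_min, k_max = knots[0], knots[-1]
    basis, deriv = [x], [np.ones_like(x)]
    for k_j in knots[1:-1]:
        w_j = (k_max - k_j) / (k_max - k_min)
        basis.append(_pos(x - k_j) ** 3 - w_j * _pos(x - k_min) ** 3
                     - (1 - w_j) * _pos(x - k_max) ** 3)
        deriv.append(3 * (_pos(x - k_j) ** 2 - w_j * _pos(x - k_min) ** 2
                          - (1 - w_j) * _pos(x - k_max) ** 2))
    return np.column_stack(basis), np.column_stack(deriv)


def cause_specific_hazard(t, gamma, knots, beta, x_covs):
    """h_k(t | X) = exp(eta) * [d Spline / d ln t] / t   (Equation 1.29)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    basis, dbasis = rcs_basis_and_deriv(np.log(t), knots)
    eta = gamma[0] + basis @ gamma[1:] + np.dot(beta, x_covs)  # ln H_k(t | X)
    ds_dlogt = dbasis @ gamma[1:]  # derivative of the spline w.r.t. ln t
    return np.exp(eta) * ds_dlogt / t


# Hypothetical parameters, as in the Equation 1.28 sketch.
knots = np.log([0.5, 2.0, 6.0])
gamma = np.array([-2.0, 1.2, 0.05])
beta, x_covs = np.array([0.03]), np.array([10.0])
print(cause_specific_hazard([1.0, 3.0, 5.0], gamma, knots, beta, x_covs))
```

The formula returns a negative value wherever the fitted spline decreases in ln(t). Inspecting fitted hazards over the follow-up period is therefore a useful check on any fitted model.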

Each cause-specific hazard function can be estimated by fitting a separate model for each of the K events, censoring all other events. Alternatively, all K cause-specific hazard functions can be estimated simultaneously using a multi-state data format. Provided no parameters are shared across events, the two approaches give the same results, but simultaneous estimation is considered more flexible: it allows coefficient estimates and baseline hazard functions to be shared across events, and allows convenient testing and comparison of coefficient estimates across event types (Lunn and McNeil, 1995).

To model all K cause-specific hazard functions simultaneously, the dataset to which the model is applied must be expanded to mimic the K different datasets that would be used if the analyses were performed separately. This is best illustrated using an example. When investigating the risk of disease recurrence following breast cancer surgery, death is a competing event; as such, there are K = 2 mutually exclusive events. The original data may look like that depicted in Figure 1.11a, with participant 1 experiencing disease recurrence at 3.1 years, participant 2 dying at 4.1 years, and participant 3 being censored at 5.9 years. To analyse all events simultaneously, the dataset is expanded to the multi-state format depicted in Figure 1.11b. Each participant now has K = 2 rows, one for each of the mutually exclusive events. Participant 1 experiences disease recurrence at 3.1 years (row 1) but is censored for death at 3.1 years (row 2). Participant 2 is censored for disease recurrence at 4.1 years (row 3) but dies at 4.1 years (row 4). Finally, participant 3 is censored at 5.9 years for both events (rows 5 and 6).

Figure 1.11: Original and multi-state datasets for competing risks analysis

a) Original data (wide format)

ID   Age   Time   Disease Recurrence   Death
1    34    3.1    1                    0
2    56    4.1    0                    1
3    42    5.9    0                    0
…    …     …      …                    …

b) Multi-state data (long format)

Row   ID   Age   Time   Event                Status
1     1    34    3.1    Disease Recurrence   1
2     1    34    3.1    Death                0
3     2    56    4.1    Disease Recurrence   0
4     2    56    4.1    Death                1
5     3    42    5.9    Disease Recurrence   0
6     3    42    5.9    Death                0
…     …    …     …      …                    …
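The expansion from Figure 1.11a to Figure 1.11b is mechanical. Each participant contributes one row per event type, and the status is 1 only on the row for the event actually observed. The sketch below assumes the wide data are held as a list of Python dictionaries whose keys mirror the column names of Figure 1.11. The function name `expand_to_multistate` is hypothetical. Statistical packages provide their own reshaping tools, which are not reproduced here.

```python
# Event names matching the columns of Figure 1.11a (assumed data layout).
EVENTS = ("Disease Recurrence", "Death")


def expand_to_multistate(wide_rows, events=EVENTS):
    """Return one row per participant per event type (long format)."""
    long_rows = []
    for record in wide_rows:
        for event in events:
            long_rows.append({
                "Row": len(long_rows) + 1,
                "ID": record["ID"],
                "Age": record["Age"],
                "Time": record["Time"],
                "Event": event,
                # 1 if this event was observed at Time; otherwise the
                # participant is censored for this event at the same Time.
                "Status": record[event],
            })
    return long_rows


wide = [
    {"ID": 1, "Age": 34, "Time": 3.1, "Disease Recurrence": 1, "Death": 0},
    {"ID": 2, "Age": 56, "Time": 4.1, "Disease Recurrence": 0, "Death": 1},
    {"ID": 3, "Age": 42, "Time": 5.9, "Disease Recurrence": 0, "Death": 0},
]
for row in expand_to_multistate(wide):
    print(row["Row"], row["ID"], row["Age"], row["Time"], row["Event"], row["Status"])
```

Running this prints the six rows of Figure 1.11b. In a simultaneous model, the Event column would then enter as a factor, interacted with the prognostic factors and spline terms as required, so that each event can have its own baseline and coefficients.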

In prognostic model research, the key measure of interest is the cause-specific cumulative incidence function, as it provides individual absolute risk predictions. As discussed previously, estimating the cause-specific cumulative incidence function using the cause-specific approach requires estimates of all K cause-specific hazard functions. To estimate the cause-specific cumulative incidence function for event k with a vector of prognostic factors 𝐗k,i, recall Equation 1.25:

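$$\hat{F}_k(t \mid \mathbf{X}_{k,i}) = \int_0^t \hat{h}_k(u \mid \mathbf{X}_{k,i}) \exp\left(-\int_0^u \sum_{j=1}^{K} \hat{h}_j(v \mid \mathbf{X}_{j,i})\,dv\right) du \qquad \text{(Equation 1.30)}$$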

In general, the integral in the above equation cannot be evaluated analytically, so additional methods are required, such as numerical integration (Hinchliffe and Lambert, 2013b) or the simulation approach (Fiocco et al., 2008, Crowther and Lambert, 2017). Briefly, the simulation approach simulates a large sample of participants and calculates a transition probability matrix using Nelson-Aalen estimators of the cumulative cause-specific hazard functions Hk(t). The simulated participants iterate […] The cause-specific cumulative incidence function estimates for event k given a prognostic factor vector 𝐗k,i are calculated as the proportion of simulated participants with the same vector of prognostic factor values that experience event k. The cause-specific flexible parametric model can be fitted to all causes simultaneously using the expanded multi-state dataset (Crowther and Lambert, 2017). An example of a published model developed using the cause-specific approach is provided in Box 1.4.
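The two estimation routes can be compared on a case where the answer is known. The sketch below assumes Weibull cause-specific hazards, the special case of Equation 1.28 with no interior knots. In that case ln Hk(t) = 𝛾k0 + 𝛾k1 ln t + 𝛃kT𝐗k,i, so Hk(t) = ak t^pk with ak = exp(𝛾k0 + 𝛃kT𝐗k,i) and pk = 𝛾k1.

The sketch evaluates Equation 1.30 by the trapezoidal rule. Separately, it simulates latent event times from each cause-specific hazard and takes the earliest. This simulation is a simplified stand-in chosen for brevity; it is not the Nelson-Aalen transition-matrix algorithm of Fiocco et al. (2008) described above.

With a common shape p, the cumulative incidence has the closed form a1/(a1 + a2){1 − exp[−(a1 + a2)t^p]}, which serves as a check. All parameter values are hypothetical.

```python
import numpy as np

# Hypothetical Weibull parameters (scale a_k, shape p_k) per cause, with
# exp(beta^T X) already absorbed into a_k. Index 0 = event of interest.
PARAMS = [(0.10, 1.3), (0.05, 1.3)]


def cif_numerical(t, params, k, n_grid=20_001):
    """F_k(t) from Equation 1.30 via the trapezoidal rule (needs p_k >= 1 so
    the integrand is finite at u = 0)."""
    u = np.linspace(0.0, t, n_grid)
    # overall survival exp(-sum_j H_j(u)); the inner integral is closed form here
    overall_surv = np.exp(-sum(a * u ** p for a, p in params))
    a_k, p_k = params[k]
    integrand = a_k * p_k * u ** (p_k - 1) * overall_surv
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(u)))


def cif_simulated(t, params, k, n_sim=200_000, seed=1):
    """F_k(t) as the proportion of simulated participants whose first event is k by t."""
    rng = np.random.default_rng(seed)
    # inverse-transform sampling: a * T**p = E with E ~ Exponential(1)
    latent = np.column_stack([
        (-np.log(1.0 - rng.uniform(size=n_sim)) / a) ** (1.0 / p)
        for a, p in params
    ])
    first_time = latent.min(axis=1)
    first_cause = latent.argmin(axis=1)
    return float(np.mean((first_cause == k) & (first_time <= t)))


def cif_closed_form(t, params, k):
    """Exact F_k(t) when all causes share the same Weibull shape p."""
    scales = [a for a, _ in params]
    p = params[0][1]
    return scales[k] / sum(scales) * (1.0 - np.exp(-sum(scales) * t ** p))


t = 10.0
print("numerical :", cif_numerical(t, PARAMS, k=0))
print("simulated :", cif_simulated(t, PARAMS, k=0))
print("exact     :", cif_closed_form(t, PARAMS, k=0))
```

Numerical integration is deterministic and fast for a single covariate pattern. The simulation estimate carries Monte Carlo error that shrinks with the number of simulated participants.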

1.4.5.2 Subdistribution hazards approach

The flexible parametric modelling approach that uses subdistribution hazards to estimate cause-specific cumulative incidence functions (Lambert et al., 2017) models on the log cumulative subdistribution hazard scale. For continuous time distributions, the cumulative subdistribution hazard function is defined as $\Lambda_k(t) = \int_0^t \lambda_k(u)\,du$.

Assuming proportional subdistribution hazards, the log cumulative subdistribution hazard function for event k with a vector of prognostic factors 𝐗k,i is written:

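$$\ln[\Lambda_k(t \mid \mathbf{X}_{k,i})] = \mathrm{Spline}\{\ln[t] \mid \boldsymbol{\gamma}_k, \mathbf{N}_k\} + \boldsymbol{\beta}_k^{T}\mathbf{X}_{k,i} \qquad \text{(Equation 1.31)}$$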

In this case, the restricted cubic spline function models the log cumulative baseline subdistribution hazard for event k. The resulting vector of regression coefficients 𝛃k, which corresponds to the vector of prognostic factors 𝐗k,i, is estimated by maximising a weighted likelihood function (Lambert et al., 2017). The regression coefficients can be interpreted as log subdistribution hazard ratios under the proportional subdistribution hazards assumption. They thus describe the effect of each prognostic factor on the risk of the event of interest, accounting for the occurrence of competing events.

Box 1.4: Example of a model developed using the cause-specific approach

The time to death or discharge in neonatal care was examined in a recent study (Hinchliffe et al., 2013) using flexible parametric modelling and the cause-specific approach. The model was developed using retrospective data from 2,723 babies born at 24-28 weeks gestational age and admitted to neonatal care. Flexible parametric methods were used to analyse death and discharge alive as competing events.

Gestational age (weeks), sex, and birthweight (centiles) were found to significantly affect the time to death or discharge. Cause-specific hazard ratios were not reported. The cause-specific cumulative incidences of death and discharge alive for female babies admitted to neonatal care are provided below for different gestational ages (top to bottom) and birthweights (left to right).

Absolute probabilities for death (black) and discharge (grey) for female babies admitted to neonatal care, by gestational age and birthweight centile.

The subdistribution hazard function for the event of interest, k = 1, can be estimated without modelling the competing events. This is achieved by using the subdistribution risk set, in which participants who experience competing events remain in the risk set but are unable to experience the event of interest (Figure 1.9). If proportional subdistribution hazards are not biologically plausible, alternative link functions may be used (e.g. a logit link gives a proportional odds model) (Lambert et al., 2017), or time-dependent effects can be modelled by incorporating interactions between the prognostic factors and the restricted cubic spline function (Hinchliffe and Lambert, 2013b).
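The difference between the two risk sets can be seen using the three participants of Figure 1.11a. The sketch below, with hypothetical function names, lists who is at risk of disease recurrence (k = 1) at time t under each definition.

It ignores the weighting that the subdistribution approach needs in practice. Participants kept in the risk set after a competing event have unobserved potential censoring times, and handling this is part of the weighted estimation of Lambert et al. (2017), which is not reproduced here.

```python
# Figure 1.11a as (ID, time, event): 1 = disease recurrence, 2 = death, 0 = censored
DATA = [(1, 3.1, 1), (2, 4.1, 2), (3, 5.9, 0)]


def cause_specific_risk_set(t, data=DATA):
    """IDs still free of every event and uncensored just before time t."""
    return [pid for pid, time, _ in data if time >= t]


def subdistribution_risk_set(t, data=DATA, k=1):
    """IDs that have not had event k and are not censored before t.

    Participants whose first event was a competing event stay in the set
    for all later times, although they can no longer experience event k.
    """
    return [pid for pid, time, event in data
            if time >= t or event not in (0, k)]


for t in (2.0, 5.0):
    print(t, cause_specific_risk_set(t), subdistribution_risk_set(t))
```

At t = 5.0, participant 2 (who died at 4.1 years) has left the cause-specific risk set but remains in the subdistribution risk set, while participant 1 (recurrence at 3.1 years) has left both.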

The cause-specific cumulative incidence is of key interest in prognostic model research. Using the subdistribution approach, it can be estimated by substituting the model for the log cumulative subdistribution hazard (Equation 1.31) into Equation 1.27:

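$$\hat{F}_k(t \mid \mathbf{X}_{k,i}) = 1 - \exp\left\{-\exp\left(\mathrm{Spline}\{\ln[t] \mid \boldsymbol{\gamma}_k, \mathbf{N}_k\} + \boldsymbol{\beta}_k^{T}\mathbf{X}_{k,i}\right)\right\} \qquad \text{(Equation 1.32)}$$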
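Because Equation 1.32 maps the linear predictor directly to an absolute risk, the cumulative incidence can be computed without any integration. The sketch below assumes the Royston-Parmar spline basis used for Equation 1.28 and hypothetical parameter values.

It also includes, as an illustrative assumption, the logit-link alternative mentioned above, written as ln{Fk/(1 − Fk)} = Spline{ln[t]|𝛄k, 𝐍k} + 𝛃kT𝐗k,i. Under that link, exp(𝛃k) is an odds ratio rather than a subdistribution hazard ratio.

```python
import numpy as np


def _cube_pos(z):
    """Truncated cubic (z)_+^3."""
    return np.clip(z, 0.0, None) ** 3


def rcs_basis(x, knots):
    """Assumed Royston-Parmar restricted cubic spline basis in x = ln(t)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    k_min, k_max = knots[0], knots[-1]
    cols = [x]
    for k_j in knots[1:-1]:
        w_j = (k_max - k_j) / (k_max - k_min)
        cols.append(_cube_pos(x - k_j) - w_j * _cube_pos(x - k_min)
                    - (1 - w_j) * _cube_pos(x - k_max))
    return np.column_stack(cols)


def cif_subdistribution(t, gamma, knots, beta, x_covs, link="cloglog"):
    """Cause-specific cumulative incidence F_k(t | X) from a subdistribution model."""
    eta = gamma[0] + rcs_basis(np.log(t), knots) @ gamma[1:] + np.dot(beta, x_covs)
    if link == "cloglog":
        # Equation 1.32: ln Lambda_k(t) = eta and F_k = 1 - exp(-Lambda_k)
        return 1.0 - np.exp(-np.exp(eta))
    if link == "logit":
        # proportional odds alternative (assumed form): ln[F_k / (1 - F_k)] = eta
        return 1.0 / (1.0 + np.exp(-eta))
    raise ValueError(f"unknown link: {link}")


knots = np.log([0.5, 2.0, 6.0])
gamma = np.array([-2.0, 1.2, 0.05])
beta = np.array([0.4])  # hypothetical log subdistribution hazard ratio
t = np.array([1.0, 5.0])
F_ref = cif_subdistribution(t, gamma, knots, beta, np.array([0.0]))
F_exposed = cif_subdistribution(t, gamma, knots, beta, np.array([1.0]))
print(F_ref, F_exposed)
# Proportional subdistribution hazards imply 1 - F(t|x=1) = (1 - F(t|x=0)) ** exp(beta)
print(np.allclose(1 - F_exposed, (1 - F_ref) ** np.exp(0.4)))
```

The final check expresses the 1:1 link between the subdistribution hazard and the cumulative incidence: a subdistribution hazard ratio of exp(𝛃k) acts as a power on the probability of remaining free of event k.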

While it is not necessary to model any event other than the event of interest, it is possible to fit the subdistribution flexible parametric model to each cause, either separately or simultaneously (Lambert et al., 2017). An example of a published prognostic model developed using the subdistribution approach is provided in Box 1.5.

Box 1.5: Example of a prognostic model developed using the subdistribution approach

The time to cancer-specific mortality, with death from other causes as a competing event, in patients with head and neck squamous cell carcinoma was examined in a recent study (Shen et al., 2015) using flexible parametric modelling and the subdistribution approach. The model was developed using a cohort of 23,494 patients with head and neck squamous cell carcinoma.

The following prognostic factors were investigated for each cause: age (years), race (white, black, other), marital status (unmarried, married), radiotherapy (none, yes), tumour size (mm), grade, T and N classifications, and site (lip, oral cavity, salivary gland, oropharynx, hypopharynx, larynx, other).

Cause-specific cumulative incidences were estimated separately for each cause.

Nomograms for predicting the 5- and 10-year probabilities of each cause of death are shown below:

Nomogram for predicting the 5- and 10-year probability of cancer-specific death (Shen et al., 2015).

Nomogram for predicting the 5- and 10-year probability of death from other causes (Shen et al., 2015).

1.4.5.3 Differences between cause-specific and subdistribution approaches

A summary of the key differences between the cause-specific and subdistribution modelling approaches is provided in Table 1.2.

Table 1.2: Differences between cause-specific and subdistribution modelling approaches

Model assumptions
  Cause-specific approach: Proportional cause-specific hazards hk(t).
  Subdistribution approach: Proportional subdistribution hazards λk(t).

Risk sets
  Cause-specific approach: The same as standard time-to-event analysis: contains participants who have experienced neither the event of interest nor any competing event. Participants who experience competing events are censored.
  Subdistribution approach: Differs from standard time-to-event analysis: contains participants who have not experienced the event of interest. Participants who experience competing events remain in the risk set, and are “cured” from the event of interest.

Interpretation of hazard (prognostic factor associations)
  Cause-specific approach: The cause-specific hazard measures the direct association of prognostic factors with the event of interest, assuming the competing events cannot occur (i.e. ignoring the indirect effects of the competing events).
  Subdistribution approach: The subdistribution hazard measures the association of the prognostic factors with the “real world” risk of the event of interest, incorporating the indirect effects of the competing event.

Link to cumulative incidence (absolute risks)
  Cause-specific approach: There is no 1:1 relationship between the cause-specific hazard and the cumulative incidence function. Estimation of ALL cause-specific hazard functions is required to obtain absolute risk estimates.
  Subdistribution approach: There is a 1:1 relationship between the subdistribution hazard and the cumulative incidence function. Only the subdistribution hazard function for the event of interest needs to be estimated to obtain absolute risk estimates.

When should the approach be used?
  Cause-specific approach: Prognostic factor research.
  Subdistribution approach: Prognostic model research.

Main advantage
  Cause-specific approach: Measures direct associations, so aids understanding of aetiological questions.
  Subdistribution approach: Straightforward link to the cumulative incidence function, which makes it easier to model just the event of interest.
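The "no 1:1 relationship" entry in Table 1.2 can be illustrated with constant cause-specific hazards, an assumption made purely for simplicity. Under constant hazards h1 and h2, F1(t) = h1/(h1 + h2){1 − exp[−(h1 + h2)t]}.

The sketch holds h1 fixed and changes only the competing hazard h2. It reports the cumulative incidence of the event of interest and the implied subdistribution hazard λ1(t) = f1(t)/{1 − F1(t)}, where f1(t) = dF1/dt. The function names and hazard values are hypothetical.

```python
import math


def cif_constant(t, h1, h2):
    """F_1(t) under constant cause-specific hazards h1 (interest) and h2 (competing)."""
    total = h1 + h2
    return h1 / total * (1.0 - math.exp(-total * t))


def subdistribution_hazard_constant(t, h1, h2):
    """lambda_1(t) = f_1(t) / (1 - F_1(t)), with f_1(t) = h1 * exp(-(h1 + h2) * t)."""
    density = h1 * math.exp(-(h1 + h2) * t)
    return density / (1.0 - cif_constant(t, h1, h2))


h1 = 0.10  # identical cause-specific hazard for the event of interest
for h2 in (0.02, 0.20):
    print(f"h2={h2:.2f}  F_1(5)={cif_constant(5.0, h1, h2):.3f}  "
          f"lambda_1(5)={subdistribution_hazard_constant(5.0, h1, h2):.4f}")
```

With the same cause-specific hazard for the event of interest, the 5-year cumulative incidence falls from about 0.376 to about 0.259 when the competing hazard rises. The subdistribution hazard falls with it. This is why the cause-specific approach needs every cause-specific hazard to produce absolute risks, whereas the subdistribution hazard for the event of interest determines F1(t) on its own.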

In document Investigating the presence and impact of competing events on prognostic model research (Page 56-65)