2.3 Analysis of survival data

2.3.2 Estimation of the Case Fatality Risk using survival analysis with compet-

Ghani et al. (2005) exploit competing-risks models to formulate an estimator for the CFR. Set $k = 2$ and define a competing-risks process for the time from hospitalization to death or recovery ($h = D, R$), as illustrated in Figure 2.6.

[Diagram: state H with a transition to D at hazard $\alpha_d(t)$ and a transition to R at hazard $\alpha_r(t)$.]

Figure 2.6: Competing-risks model for death and recovery: h, d and r represent, respectively, the hospitalized, death and recovered states. $\alpha_d(t)$ is the hazard of death at time $t$ since hospital admission and $\alpha_r(t)$ is the hazard of recovery at time $t$ since hospital admission.

Let calendar time be indexed by $s$ ($s \in [0, S]$, with $S$ the time of the end of the epidemic) and let the time since hospital admission (i.e. the time-to-event) be indexed by $t \in [0, T]$. Assume that the CFR is constant over calendar time.

Let:

$$\alpha_d(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t,\ h = D \mid T \ge t)}{\Delta t}$$

be the cause-specific hazard function of death;

$$\alpha_r(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t,\ h = R \mid T \ge t)}{\Delta t}$$

be the cause-specific hazard function of recovery;

$t_{\max}(s)$ be the maximum observed time from hospital admission to death or recovery that has occurred by time $s$;

$S(t)$ be the overall survival function, i.e. the survival if both endpoints are treated as a single composite endpoint;

$I_d(t) = \int_0^t S(u)\,\alpha_d(u)\,du$ be the cumulative intensity function of death, i.e. the probability of death at or before time $t$;

$I_r(t) = \int_0^t S(u)\,\alpha_r(u)\,du$ be the cumulative intensity function of recovery, i.e. the probability of recovery at or before time $t$.

Then the overall probability of death before or at calendar time $s$ can be estimated by the cumulative intensity function computed at the maximum observed time:

$$\theta_d(s) = I_d(t_{\max}(s)) = \int_0^{t_{\max}(s)} S(t)\,\alpha_d(t)\,dt \tag{2.8}$$

and the probability of recovery before or at calendar time $s$ can be estimated by:

$$\theta_r(s) = I_r(t_{\max}(s)) = \int_0^{t_{\max}(s)} S(t)\,\alpha_r(t)\,dt \tag{2.9}$$
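As a rough numerical illustration of Equations 2.8 and 2.9, the sketch below assumes constant cause-specific hazards, so that $S(t) = e^{-(\alpha_d + \alpha_r)t}$. This assumption, the function name `theta` and the hazard values are hypothetical and serve only this example; they are not taken from Ghani et al. (2005).

```python
import math


def theta(alpha_cause, alpha_d, alpha_r, t_max, n_steps=10_000):
    """Trapezoidal approximation of the integral of S(t) * alpha_cause on [0, t_max].

    Assumption (illustration only): both hazards are constant, hence
    S(t) = exp(-(alpha_d + alpha_r) * t).
    """
    step = t_max / n_steps
    total_hazard = alpha_d + alpha_r
    values = [
        math.exp(-total_hazard * i * step) * alpha_cause
        for i in range(n_steps + 1)
    ]
    # Trapezoidal rule: end points get half weight.
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


alpha_d, alpha_r = 0.02, 0.08  # hypothetical daily hazards of death and recovery

for t_max in (5, 20, 100):
    theta_d = theta(alpha_d, alpha_d, alpha_r, t_max)
    theta_r = theta(alpha_r, alpha_d, alpha_r, t_max)
    print(t_max, round(theta_d, 4), round(theta_r, 4), round(theta_d + theta_r, 4))
```

Under these assumed hazards both quantities increase with $t_{\max}$, their sum approaches 1, and $\theta_d$ approaches $\alpha_d / (\alpha_d + \alpha_r) = 0.2$.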

Note that for $t \to +\infty$, $I_d(t)$ represents the overall probability of death, so that, following from Equation 2.7,

$$\mathrm{CFR} = \lim_{t \to +\infty} I_d(t)$$

and therefore, if data are available for a $t_{\max}(s)$ large enough, $\theta_d(s)$ approximates the overall probability of death. This is certainly the case if the epidemic is complete ($s = S$), because there cannot be any $t > t_{\max}(S)$ and everyone has either died or recovered. In this setting, the only alternative events are death and recovery; at and after the largest time-to-event $t_{\max}(S)$ no individual is at risk, everyone has had an event and, since death and recovery form a partition,

$$\theta_d(S) + \theta_r(S) = 1$$

from which it follows that

$$\mathrm{CFR} = \theta_d(S)$$

Therefore, to derive an estimate of the CFR, estimates of the cumulative intensity functions for both events must be obtained from the data on the whole epidemic (until its end $S$) and evaluated at the maximum observed survival time $t_{\max}(S)$:

$$\widehat{\mathrm{CFR}} = \hat{\theta}_d(S)$$

To illustrate the use of the cumulative intensity function to estimate the CFR, estimates of the cumulative intensity functions obtained from the analysis of the whole survival datasets of Figure 2.4 are plotted below.

Approximating the CFR before the end of the epidemic

During the epidemic $\theta_d(s) + \theta_r(s) < 1$, because individuals may have recovered or died, but may also still be in hospital without having had any event. Before the end of the epidemic, when $s \le S$, $t_{\max}(s) \le t_{\max}(S)$, from which it follows that the probability of death at or before time $s$ is less than or equal to the probability of death at or before $S$, which is the CFR. The same reasoning applies to the probability of recovery, $\theta_r(s) \le \theta_r(S) = 1 - \theta_d(S)$, so that:

$$\theta_d(s) \le \theta_d(S) \le 1 - \theta_r(s)$$

This inequality can be observed in Figure 2.7, where the empirical cumulative intensity function of death at $S$, drawn in yellow, always lies between the empirical cumulative intensity function of death at $s$ and 1 minus the empirical cumulative intensity function of recovery at $s$.

[Plot "Empirical cumulative intensity function"; x-axis: survival time $s$ (0 to 35); y-axis: CIF($s$) (0 to 1); legend: $1 - \widehat{\mathrm{CIF}}_R(s)$, $\widehat{\mathrm{CIF}}_D(s)$, $p_D$, $\widehat{\mathrm{CIF}}_D(S)$.]

Figure 2.7: Results from the competing-risks analysis of the time to death and recovery. The x-axis is the time to event; the green curves are the empirical cumulative intensity function of death and 1 minus the cumulative intensity function of recovery. The limit of the cumulative intensity function of death for $s \to \infty$ is approximated by the cumulative intensity function of death at the highest survival time $S$, which is drawn in yellow. The value of the CFR used to generate the data is drawn in red and denoted by $p_D$.

Under the assumption that patients who remain in the hospital between $s$, the observation time, and $S$, the end of the epidemic, experience a CFR equal to that of those who had an event up to time $s$:

$$\hat{\theta}_d(S) = \frac{\theta_d(s)}{\theta_d(s) + \theta_r(s)} \tag{2.10}$$
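A worked special case may help to see what Equation 2.10 relies on. Suppose, purely for illustration (this is not assumed in the source), that both cause-specific hazards are constant, $\alpha_d(t) = \alpha_d$ and $\alpha_r(t) = \alpha_r$. Then the overall survival and the two cumulative intensities are

$$S(t) = e^{-(\alpha_d + \alpha_r)t}, \qquad \theta_d(s) = \frac{\alpha_d}{\alpha_d + \alpha_r}\left(1 - e^{-(\alpha_d + \alpha_r)\,t_{\max}(s)}\right), \qquad \theta_r(s) = \frac{\alpha_r}{\alpha_d + \alpha_r}\left(1 - e^{-(\alpha_d + \alpha_r)\,t_{\max}(s)}\right)$$

so that the common factor cancels in the ratio:

$$\frac{\theta_d(s)}{\theta_d(s) + \theta_r(s)} = \frac{\alpha_d}{\alpha_d + \alpha_r} = \lim_{t \to +\infty} I_d(t) = \mathrm{CFR}$$

In this special case the ratio equals the CFR for any $t_{\max}(s) > 0$; with time-varying hazards the equality need not hold exactly.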

Estimates of $\theta_d(s)$ and $\theta_r(s)$ can be obtained using parametric survival methods to estimate the cumulative intensity functions. They can then be plugged into Equation 2.10, which allows the estimation of $\theta_d(S)$, the probability of dying over the whole epidemic (i.e. during $[0, S]$), which approximates the CFR.

Estimating the CFR

The steps to obtain estimates of the cumulative intensity function using standard KM methods are summarised below. Consider the discrete time from hospitalization indexed by $j = 1, 2, \ldots, J$, for example days. Given an analysis time $s$, denote by:

$d_{dj}(s)$: number of deaths on day $j$ from admission to hospital;

$d_{rj}(s)$: number of recoveries on day $j$ from admission to hospital;

$n_j(s)$: number remaining at risk $j$ days after admission to hospital;

$J(s)$: the maximum observed number of days from hospital admission to death or recovery that has occurred by time $s$ (i.e. $J(s)$ is a discrete version of $t_{\max}(s)$).

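The overall survival probability is computed with the KM formula considering both endpoints (in the product below, the index $r$ runs over days since admission and does not denote recovery):

$$\hat{S}_j(s) = \prod_{r=1}^{j} \left(1 - \frac{d_{dr}(s) + d_{rr}(s)}{n_r(s)}\right)$$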

The discretised version of the hazard of dying is:

$$\alpha_{dj} = P(J = j,\ i = D \mid J > j - 1)$$

where $i$ indexes the event. At analysis time $s$, it can be estimated by:

$$\hat{\alpha}_{dj}(s) = \frac{d_{dj}(s)}{n_j(s)}$$

Then the overall probability of death before or at calendar time $s$ can be approximated by the cumulative intensity function for death computed at the maximum observed time $t = J(s)$:

$$\hat{\theta}_d(s) = \sum_{j=1}^{J(s)} \hat{S}_{j-1}(s)\,\hat{\alpha}_{dj}(s) \tag{2.11}$$

Similarly, the overall probability of recovery before or at calendar time $s$ is

$$\hat{\theta}_r(s) = \sum_{j=1}^{J(s)} \hat{S}_{j-1}(s)\,\hat{\alpha}_{rj}(s) \tag{2.12}$$
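The sums in Equations 2.11 and 2.12 can be sketched in a few lines of Python. The function name `cumulative_intensities`, the convention $\hat{S}_0(s) = 1$ and the daily counts are assumptions made for this illustration; the counts are invented and contain no censoring.

```python
def cumulative_intensities(deaths, recoveries, at_risk):
    """Discrete estimates of theta_d(s) and theta_r(s) (Equations 2.11 and 2.12).

    deaths[j-1], recoveries[j-1]: events on day j since hospital admission.
    at_risk[j-1]: n_j(s), the number at risk on day j.
    Returns (theta_d, theta_r, overall survival after the last day).
    """
    surv_prev = 1.0  # S_0(s) = 1: assumed starting value of the KM product
    theta_d = 0.0
    theta_r = 0.0
    for d_d, d_r, n in zip(deaths, recoveries, at_risk):
        if n == 0:
            break  # nobody left at risk, hazards are undefined from here on
        theta_d += surv_prev * d_d / n          # S_{j-1} * alpha_dj
        theta_r += surv_prev * d_r / n          # S_{j-1} * alpha_rj
        surv_prev *= 1.0 - (d_d + d_r) / n      # KM update with both endpoints
    return theta_d, theta_r, surv_prev


# Hypothetical counts for the first four days after admission.
deaths = [2, 3, 2, 1]
recoveries = [8, 12, 13, 9]
at_risk = [100, 90, 75, 60]

theta_d, theta_r, surv = cumulative_intensities(deaths, recoveries, at_risk)
print(theta_d, theta_r, surv)
```

With these counts the two cumulative intensities and the remaining survival add up to 1, which reflects the fact that during the epidemic some patients have had neither event yet.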

An estimator for the CFR at an early stage of the epidemic can be obtained by plugging 2.11 and 2.12 into 2.10:

$$\widehat{\mathrm{CFR}}_{(\mathrm{ghani})} = \hat{\theta}_d(S) = \frac{\hat{\theta}_d(s)}{\hat{\theta}_d(s) + \hat{\theta}_r(s)} \tag{2.13}$$

This estimator was computed at several time points during the simulated epidemic reported in Figure 2.4. On this dataset alone the correction provided by Ghani's estimator is not particularly evident (Figure 2.8(a)). However, if a set of simulations is carried out and the median behaviour of the estimator is analysed, the estimator based on competing risks converges to the true parameter much earlier than WHO's estimator (Figure 2.8(b)).
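To show how Equation 2.13 might be applied to patient-level data observed part-way through an epidemic, the sketch below builds the daily counts from individual records and returns the ratio estimator. The record format, the outcome codes `"D"`, `"R"` and `"C"` (still in hospital at the analysis time), the function name `ghani_cfr` and the data are all hypothetical. Censored patients are counted as at risk up to and including their last observed day, which is an assumed convention rather than one stated in the source.

```python
from collections import Counter


def ghani_cfr(records):
    """Ratio estimator of Equation 2.13 from a list of (day, outcome) records.

    day: days since admission at which the outcome was recorded (1, 2, ...).
    outcome: "D" (death), "R" (recovery) or "C" (censored, still in hospital).
    Raises ValueError if no death or recovery has been observed.
    """
    deaths = Counter(day for day, outcome in records if outcome == "D")
    recoveries = Counter(day for day, outcome in records if outcome == "R")
    last_event_day = max(list(deaths) + list(recoveries))  # J(s)

    surv_prev = 1.0  # S_0(s) = 1 (assumed convention)
    theta_d = 0.0
    theta_r = 0.0
    for j in range(1, last_event_day + 1):
        # n_j(s): patients whose event or censoring day is j or later.
        n_j = sum(1 for day, _ in records if day >= j)
        theta_d += surv_prev * deaths[j] / n_j
        theta_r += surv_prev * recoveries[j] / n_j
        surv_prev *= 1.0 - (deaths[j] + recoveries[j]) / n_j
    return theta_d / (theta_d + theta_r)


# Hypothetical records: 2 deaths, 3 recoveries, 3 patients still in hospital.
records = [
    (1, "D"), (2, "R"), (2, "R"), (3, "C"),
    (3, "R"), (4, "D"), (5, "C"), (5, "C"),
]
print(ghani_cfr(records))
```

In this toy dataset the estimate differs from the crude proportion of deaths among resolved cases (2 out of 5), because the censored patients contribute to the risk sets on the days they were still under observation.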

The estimator proposed by Ghani et al. (2005) is very appealing, as it takes a fully non-parametric approach and allows correction for right censoring. However, this method calls for two improvements. The first follows from the fact that, instead of computing the cumulative intensity function of death at $+\infty$, $I_d(\infty)$ is approximated by $I_d(t_{\max})$, and therefore hazard and survival functions are only defined on the observed survival times. This is because a KM estimator is used for the survival function. It therefore seems natural to extend this estimation method to a parametric setting where, in some cases, $\lim_{t \to +\infty} I_d(t)$ can be solved analytically, given the estimated cause-specific hazards and the assumed survival distribution. Moreover, the main disadvantage of this estimator is the assumption of a constant CFR over calendar time $s$, which is not realistic. To relax this hypothesis, a time-varying version of the estimator is also proposed below.

In document Statistical inference in stochastic/deterministic epidemic models to jointly estimate transmission and severity (Page 50-53)