§3.4 Model and estimation

In this section, we discuss our model specification and estimation methods. The most important risk factor is age. As mentioned, age also captures time and cohort effects. Therefore, we use age as analysis time. Specifically, we define analysis time as $t = (\text{age} - 40)/10$ for age > 40. (Normalising the time unit to a decade makes the scale of certain parameters more readable.) It is also well known that risk follows dynamic patterns: those who have experienced an AMI event are more likely to experience subsequent events, and the risk is particularly high for some time immediately after that event. Therefore, we specify separate models for the first and subsequent AMI events, and the equation for subsequent AMIs is allowed to depend on the timing of the most recent event.

Heterogeneity in risk can be considerable. Unfortunately, our data contain no risk markers, be they biological, socioeconomic, or behavioural factors. Therefore, we include so-called random effects (frailty) to capture the effect of unobserved individual heterogeneity. Specifically, we include an unobserved random variable $v \sim N(0, 1)$ in the model specifications. To allow for more flexibility, we estimate a separate model for each gender and ethnic group. For notational simplicity, we suppress subscripts indicating the groups in the following.

Each model consists of two equations. The first equation represents the hazard function, h1, of the first AMIs:

$$h_1(t \mid v, \theta) = \exp(t\alpha_1 + \mu_1 + v\sigma_1), \tag{3.1}$$

where $\theta$ denotes the entire unknown parameter vector to be estimated. Parameter $\alpha_1$ captures age dependence in the risk, parameter $\mu_1$ captures the median overall level of risk for first AMIs, and parameter $\sigma_1$ is the influence of the random effect. The second equation represents the hazard function, $h_2$, of subsequent AMIs:

$$h_2(t \mid t^-, v, \theta) = \exp(t\alpha_2 + \mathit{Recent}\,\gamma + \mu_2 + v\sigma_2), \tag{3.2}$$

where $t^-$ is the timing of the most recent AMI and the variable $\mathit{Recent}$ is defined by $\mathit{Recent} = \mathbf{1}(t \le t^- + \tau)$, where the value of $\tau$ corresponds to 1 year (i.e. $\tau = 0.1$). Parameter $\alpha_2$ captures age dependence in the risk, parameter $\gamma$ indicates the dynamic effect of the most recent AMI, parameter $\mu_2$ embodies the median overall level of risk of subsequent AMIs, and parameter $\sigma_2$ is the influence of the random effect.
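As a minimal sketch, the two hazards in Equations (3.1) and (3.2) can be coded directly. The `Params` container and its field names are hypothetical and chosen only for illustration; analysis time is measured in decades since age 40, so one year corresponds to `TAU = 0.1`.

```python
import math
from dataclasses import dataclass

TAU = 0.1  # one year, in analysis-time units of decades


def analysis_time(age_years: float) -> float:
    """Analysis time t = (age - 40) / 10, i.e. decades since age 40."""
    return (age_years - 40.0) / 10.0


@dataclass(frozen=True)
class Params:
    """Hypothetical container for one group's parameters (names are illustrative)."""
    alpha1: float
    mu1: float
    sigma1: float
    alpha2: float
    gamma: float
    mu2: float
    sigma2: float


def h1(t: float, v: float, p: Params) -> float:
    """Gompertz hazard of a first AMI, Equation (3.1)."""
    return math.exp(t * p.alpha1 + p.mu1 + v * p.sigma1)


def h2(t: float, t_prev: float, v: float, p: Params, tau: float = TAU) -> float:
    """Hazard of a subsequent AMI, Equation (3.2); t_prev is the most recent AMI."""
    recent = 1.0 if t <= t_prev + tau else 0.0  # Recent = 1(t <= t_prev + tau)
    return math.exp(t * p.alpha2 + recent * p.gamma + p.mu2 + v * p.sigma2)
```

Because $\mathit{Recent}$ enters the exponent additively, evaluating $h_2$ at the same $t$ but with the most recent AMI more than a year in the past divides the hazard by exactly $e^{\gamma}$.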

The Gompertz specifications embodied in (3.1) and (3.2) assume that the hazard function progresses exponentially with age. The law of exponential progression is suitable for many common age patterns in actuarial, biological, and demographic applications (e.g. Wienke, 2010). It is also appropriate in the context of AMI risk until age 85, as shown in Figure 3.1. We expect positive signs of $\alpha_1$ and $\alpha_2$, given that AMI risk increases as people age. Note that analysis time is not reset after an AMI event. We capture history dependence partly by distinguishing between $h_1$ and $h_2$ and partly by including the time-varying covariate $\mathit{Recent}$. The latter allows the risk to be elevated by a factor of $e^{\gamma}$ within 1 year following the most recent event. The cutoff between the two regimes for the risk of subsequent events is somewhat arbitrary but follows the literature.¹⁰ There is no theoretical basis for assuming an abrupt change in risk after 1 year, but this specification allows us to distinguish short-term and long-term risks in a simple way.
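Under the Gompertz form, the integral of each hazard from the most recent event $t^-$ up to $t$ has a closed form: for the first AMI it is $e^{\mu_1 + v\sigma_1}(e^{\alpha_1 t} - e^{\alpha_1 t^-})/\alpha_1$, while for subsequent AMIs the integral splits at $t^- + \tau$, with the integrand scaled by $e^{\gamma}$ before the cutoff. A sketch, assuming a constant hazard as the fallback when the age slope is numerically zero; the function names are illustrative.

```python
import math


def gompertz_integral(a: float, b: float, alpha: float, log_level: float) -> float:
    """Integral of exp(alpha * y + log_level) over [a, b]."""
    scale = math.exp(log_level)
    if abs(alpha) < 1e-12:  # limit as alpha -> 0: a constant hazard
        return scale * (b - a)
    return scale * (math.exp(alpha * b) - math.exp(alpha * a)) / alpha


def H1(t: float, t_prev: float, v: float, alpha1: float, mu1: float, sigma1: float) -> float:
    """Cumulative first-AMI hazard over (t_prev, t] under Equation (3.1)."""
    return gompertz_integral(t_prev, t, alpha1, mu1 + v * sigma1)


def H2(t: float, t_prev: float, v: float, alpha2: float, gamma: float,
       mu2: float, sigma2: float, tau: float = 0.1) -> float:
    """Cumulative subsequent-AMI hazard over (t_prev, t] under Equation (3.2).

    The hazard is scaled by exp(gamma) on (t_prev, t_prev + tau] and
    returns to the baseline Gompertz level afterwards.
    """
    level, cut = mu2 + v * sigma2, t_prev + tau
    if t <= cut:
        return gompertz_integral(t_prev, t, alpha2, level + gamma)
    return (gompertz_integral(t_prev, cut, alpha2, level + gamma)
            + gompertz_integral(cut, t, alpha2, level))
```

The survival probability between $t^-$ and $t$ is then $\exp(-H)$, which is the building block of the history densities used in the likelihood.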

Since we estimate separate models for each group, effectively all parameters are interacted with gender and ethnicity. In particular, gender and ethnicity are not assumed to have a simple proportional effect on risk. Group-specific parameters mean that differences in outcomes can arise because of a combination of differences in age dependence, in the dynamic effect of the most recent AMI, and in the distribution of the random effects.

As mentioned, the main problem for estimating the models is left-censoring. For people whose histories are left-censored, we cannot tell whether the first observed AMI is the first experienced AMI or a subsequent AMI, nor do we know the value of $\mathit{Recent}$ during the first year of the observation period. Comparing the numbers of people aged 30–39 and 30–85 years in Table 3.1 reveals that about 75% of the AMI histories in our analysis data are left-censored, so the problem is substantial.

Left-censoring and history dependence mean that the likelihood function for the observed data is analytically intractable. Therefore, we estimate the models using the maximum simulated likelihood (MSL) method developed by Lee and Gørgens [2017]. To discuss this method, some additional notation is needed. Let $C_i = 0$ indicate that the history for individual $i$ is not left-censored, and let $b_{i1} = (b_{i1k_{i1}}, \ldots, b_{i11})$ denote their event history, where each $b_{i1k}$ is the analysis time when person $i$ had event $k$ and $k_{i1}$ is their total number of events (possibly 0). Persons with $C_i = 0$ are under age 40 on 1 July 2002, and $b_{i1}$ is the analysis time of their AMI events from the date they turn 40 until 30 June 2012 or until the analysis time of their death, whichever is earlier. Let $C_i = 1$ indicate that the history for individual $i$ is left-censored, and let $(b_{i2}, b_{i1}) = (b_{i2k_{i2}}, \ldots, b_{i21}, b_{i1k_{i1}}, \ldots, b_{i11})$ denote their event history, where $b_{i2}$ is observed and $b_{i1}$ is unobserved. Persons with $C_i = 1$ are those who are over age 40 on 1 July 2002; $b_{i2}$ is the analysis time of their AMIs from 1 July 2002 until 30 June 2012 or until their analysis time of death, while $b_{i1}$ is the analysis time of their AMIs from age 40 to 1 July 2002.

¹⁰ Many studies in the literature consider mortality during 1 year after an AMI event (see Introduction).
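The mapping from ages to these analysis-time quantities can be sketched as follows. The `Window` record, its field names, and the use of ages in years as inputs are assumptions for illustration. The fields correspond to the beginning of analysis time (age 40), analysis time on 1 July 2002, and analysis time at the end of follow-up; the second is negative for people younger than 40 on 1 July 2002, whose whole history from age 40 onwards is observed.

```python
from dataclasses import dataclass
from typing import Tuple


def analysis_time(age_years: float) -> float:
    """t = (age - 40) / 10: decades since age 40."""
    return (age_years - 40.0) / 10.0


@dataclass(frozen=True)
class Window:
    """Illustrative record of one person's analysis-time quantities."""
    censored: int  # C_i: 1 if the history is left-censored, 0 otherwise
    b10: float     # beginning of analysis time (age 40)
    b20: float     # analysis time on 1 July 2002 (negative if younger than 40 then)
    b30: float     # analysis time on 30 June 2012 or at death, whichever is earlier

    def observed(self) -> Tuple[float, float]:
        """Analysis-time interval over which AMIs are recorded."""
        return (self.b20, self.b30) if self.censored else (self.b10, self.b30)


def make_window(age_on_1jul2002: float, age_at_exit: float) -> Window:
    """Build a Window from ages in years; age_at_exit is the age at the end of follow-up."""
    if age_at_exit <= 40.0:
        raise ValueError("person never reaches analysis time t > 0 during follow-up")
    censored = 1 if age_on_1jul2002 > 40.0 else 0  # over 40 on 1 July 2002
    return Window(censored=censored, b10=0.0,
                  b20=analysis_time(age_on_1jul2002), b30=analysis_time(age_at_exit))
```

For a left-censored person, the AMIs in $(b_{i10}, b_{i20}]$ form the unobserved part $b_{i1}$, and only the AMIs in the interval returned by `observed()` are recorded.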

Let $g_1$ and $g_2$ denote the density functions of $b_{i1}$ and $b_{i2}$. They can be derived from the hazard functions given in Equations (3.1) and (3.2). To state the expressions formally, let $b_{i10}$ denote the beginning of analysis time, let $b_{i20}$ denote analysis time on 1 July 2002, and let $b_{i30}$ denote analysis time on 30 June 2012 or on the date of death. Furthermore, let $H_1(t \mid t^-, v, \theta) = \int_{t^-}^{t} h_1(y \mid v, \theta)\,dy$ for $t > t^-$ denote the value of the cumulative hazard function from time $t^-$ until time $t$. Similarly, define $H_2(t \mid t^-, v, \theta) = \int_{t^-}^{t} h_2(y \mid t^-, v, \theta)\,dy$ for $t > t^-$. Then the density $g_1$ of $b_{i1}$ evaluated at

If $k_1 = 1$, the product over $k$ in the middle is void, and if $k_1 = 0$, then only the very last exponential term is present, with $H_1$ replacing $H_2$. When $k_1 = 0$ (so $b_1$ is empty) and $k_2 > 1$, the conditional density $g_2$ of $b_{i2}$ given $b_{i1} = b_1$ evaluated at $b_2$ is

The modifications for individuals with other values of $k_2$ are relatively straightforward; see Lee and Gørgens [2017] for details.¹¹
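Both densities can be evaluated with one routine. The sketch below assumes the standard construction in which the density of an event history is the product of the hazard at each event time and the survival probability $\exp(-H)$ over each gap, which agrees with the remarks above on void products and on $H_1$ replacing $H_2$; it is an illustration, not the author's code. Calling it from $b_{i10}$ with no prior AMI gives $\ln g_1$; calling it from $b_{i20}$ with the time of the last unobserved AMI gives $\ln g_2$ given $b_1$. The parameter dictionary keys are hypothetical, and event times are passed in chronological order.

```python
import math
from typing import Optional, Sequence


def _gompertz_integral(a: float, b: float, alpha: float, log_level: float) -> float:
    """Integral of exp(alpha * y + log_level) over [a, b]."""
    scale = math.exp(log_level)
    if abs(alpha) < 1e-12:
        return scale * (b - a)
    return scale * (math.exp(alpha * b) - math.exp(alpha * a)) / alpha


def log_segment_density(events: Sequence[float], start: float, end: float,
                        last_event: Optional[float], v: float, th: dict,
                        tau: float = 0.1) -> float:
    """Log density of the AMI times in (start, end], given the state at `start`.

    `last_event` is the time of the most recent AMI before `start`, or None
    if no AMI has occurred yet (the next AMI is then a first AMI, hazard h1).
    """
    def log_hazard(t: float, prev: Optional[float]) -> float:
        if prev is None:  # Equation (3.1)
            return t * th["alpha1"] + th["mu1"] + v * th["sigma1"]
        recent = th["gamma"] if t <= prev + tau else 0.0  # Equation (3.2)
        return t * th["alpha2"] + recent + th["mu2"] + v * th["sigma2"]

    def cum_hazard(t0: float, t1: float, prev: Optional[float]) -> float:
        # Integral of the relevant hazard over (t0, t1]: H1 or H2.
        if prev is None:
            return _gompertz_integral(t0, t1, th["alpha1"], th["mu1"] + v * th["sigma1"])
        level, cut, total = th["mu2"] + v * th["sigma2"], prev + tau, 0.0
        if t0 < cut:  # part of (t0, t1] lies in the elevated 'Recent' regime
            total += _gompertz_integral(t0, min(t1, cut), th["alpha2"], level + th["gamma"])
        if t1 > cut:
            total += _gompertz_integral(max(t0, cut), t1, th["alpha2"], level)
        return total

    logd, t0, prev = 0.0, start, last_event
    for t in events:
        # Survive from t0 to t, then an AMI occurs at t.
        logd += log_hazard(t, prev) - cum_hazard(t0, t, prev)
        t0, prev = t, t
    # No further AMI between the last event (or start) and the end of the segment.
    return logd - cum_hazard(t0, end, prev)
```

Splitting a history at a time that is not an event time factorises its density into an earlier part and a conditional later part, which is the structure $g_2(b_2 \mid b_1)\,g_1(b_1)$ used in the likelihood.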

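With these definitions, and letting $\Phi$ denote the standard normal cumulative distribution function, the log likelihood function for $N$ observed histories can be written¹²

$$L(\theta) = \sum_{i=1}^{N} \left[ (1 - C_i) \ln \int_{\mathbb{R}} g_1(b_{i1} \mid v, \theta)\, d\Phi(v) + C_i \ln \int_{\mathbb{R}} \int_{\mathrm{Support}(b_1)} g_2(b_{i2} \mid b_1, v, \theta)\, g_1(b_1 \mid v, \theta)\, db_1\, d\Phi(v) \right]. \tag{3.6}$$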

The first term in the sum on the right-hand side of Equation (3.6) is the log likelihood contribution if individual $i$ is not left-censored. The integral here is over the random effect. The second term is the log likelihood contribution if individual $i$ is left-censored. Here the outer integral is over the random effect and the inner integral is over the unobserved history.
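The integrals over $v$ in Equation (3.6) are expectations under a standard normal distribution. The sketch below, using only the Python standard library, computes Gauss-Hermite nodes and weights rescaled so that $\sum_q w_q f(v_q) \approx \int f(v)\,d\Phi(v)$, by locating the roots of the probabilists' Hermite polynomial $He_n$ with bisection. As an illustration, it evaluates the contribution of a hypothetical non-left-censored person with no AMI over $(0, T]$, for whom only the term $\exp(-H_1)$ remains in $g_1$. In practice a library routine such as NumPy's `numpy.polynomial.hermite_e.hermegauss` would normally be used, with its weights divided by $\sqrt{2\pi}$.

```python
import math
from typing import List, Tuple


def _hermite_e(n: int, x: float) -> Tuple[float, float]:
    """Return (He_n(x), He_{n-1}(x)) for the probabilists' Hermite polynomials, n >= 1."""
    p_prev, p = 1.0, x  # He_0(x) and He_1(x)
    for k in range(1, n):
        p_prev, p = p, x * p - k * p_prev  # He_{k+1} = x He_k - k He_{k-1}
    return p, p_prev


def normal_quadrature(n: int, step: float = 1e-3) -> Tuple[List[float], List[float]]:
    """Nodes v_q and weights w_q with sum_q w_q f(v_q) ~ E[f(V)], V ~ N(0, 1)."""
    bound = math.sqrt(4 * n + 2) + 1.0  # every root of He_n lies inside (-bound, bound)
    nodes: List[float] = []
    a, fa = -bound, _hermite_e(n, -bound)[0]
    while a < bound and len(nodes) < n:
        b = a + step
        fb = _hermite_e(n, b)[0]
        if fa == 0.0:
            nodes.append(a)
        elif fa * fb < 0.0:  # sign change: refine the bracketed root by bisection
            lo, hi, flo = a, b, fa
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                fm = _hermite_e(n, mid)[0]
                if flo * fm <= 0.0:
                    hi = mid
                else:
                    lo, flo = mid, fm
            nodes.append(0.5 * (lo + hi))
        a, fa = b, fb
    # Weights for the N(0, 1) measure: n! / (n^2 He_{n-1}(v_q)^2); they sum to one.
    weights = [math.factorial(n) / (n * n * _hermite_e(n, x)[1] ** 2) for x in nodes]
    return nodes, weights


def no_ami_contribution(T: float, alpha1: float, mu1: float, sigma1: float, n: int = 10) -> float:
    """Approximate the integral of exp(-H1(T | 0, v)) dPhi(v): no AMI over (0, T]."""
    nodes, weights = normal_quadrature(n)
    total = 0.0
    for v, w in zip(nodes, weights):
        cum_h1 = math.exp(mu1 + v * sigma1) * (math.exp(alpha1 * T) - 1.0) / alpha1
        total += w * math.exp(-cum_h1)
    return total
```

With $n$ nodes the rule is exact for polynomials in $v$ up to degree $2n - 1$, which is why a modest number of points (the footnote on tuning below mentions $Q = 10$) usually suffices for a one-dimensional random effect.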

In Equation (3.6), we assume that right-censoring and AMI events are conditionally independent given previous event history, and we do not model the process of right-censoring explicitly. As mentioned, most individuals are right-censored because the study period ends on 30 June 2012, but a small number are right-censored when they die before 30 June 2012. If mortality risk and AMI risk are correlated (e.g. competing risks correlated through the random effects), then the likelihood function is misspecified. However, the misspecification bias is likely to be small since the death rate is small. Note that right-censoring due to death does not cause missing data here, as the cause of death is observed in all cases.

¹¹ It is necessary to keep track of whether $(h_1, H_1)$ or $(h_2, H_2)$ applies, as well as the timing of the most recent event $t^-$.

¹² Lee and Gørgens [2017] consider a more general setup than is necessary here. For example, they allow for multiple observation periods for each individual. In the present application, there is a single observation period from 1 July 2002 until the earlier of 30 June 2012 and the date of death. For simplicity, we use a simple indicator variable $C_i$ to represent observed and unobserved periods. In the terminology of Lee and Gørgens [2017], $C_i = 0$ corresponds to the case where odd-numbered periods

There are no closed-form solutions to the integrals, so analytical evaluation of the likelihood function is not possible. The solution investigated by Lee and Gørgens [2017] is to use a combination of quadrature and simulation methods to evaluate $L(\theta)$.

The integrals over the random effects are one-dimensional and can be handled by, e.g., Gaussian quadrature. The integral over the unobserved history is difficult to evaluate, essentially because the dimension of $b_1$ (the number of unobserved events) is unknown. To handle that, we consider two importance sampling simulation methods, unnormalised (ISU) and normalised (ISN). For the ISU method, the simulated log likelihood function that we maximise is

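$$L(\theta) \approx \sum_{i=1}^{N} \left[ (1 - C_i) \ln\left( \sum_{q=1}^{Q} w_q\, g_1(b_{i1} \mid v_q, \theta) \right) + C_i \ln\left( \sum_{q=1}^{Q} w_q\, \frac{1}{R} \sum_{r=1}^{R} g_2(b_{i2} \mid b^{qr}_{i1}, v_q, \theta)\, \frac{g_1(b^{qr}_{i1} \mid v_q, \theta)}{g_1(b^{qr}_{i1} \mid v_q, \theta^*)} \right) \right], \tag{3.7}$$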

where the $v_q$s are Gauss-Hermite quadrature points, the $w_q$s are the corresponding weights, and the $b^{qr}_{i1}$s are simulated pseudo-event histories. The idea of importance sampling is to draw $b^{qr}_{i1}$ from $g_1(\cdot \mid v_q, \theta^*)$ using a fixed $\theta^*$, instead of drawing from $g_1(\cdot \mid v_q, \theta)$ using the $\theta$ at which the likelihood function is evaluated, and to correct the 'mismatch' through the adjustment factor $g_1(b^{qr}_{i1} \mid v_q, \theta)/g_1(b^{qr}_{i1} \mid v_q, \theta^*)$.¹³ One of the advantages of using importance sampling is that the simulated likelihood function is continuous in $\theta$, so gradient-based algorithms can be used to find the maximum. Since event timings depend on prior history, it is not possible to draw an entire history $b^{qr}_{i1}$ from $g_1(\cdot \mid v_q, \theta^*)$ in a single step. Instead, it is necessary to draw the individual pseudo-event timings sequentially; see Lee and Gørgens [2017] for details.

¹³ In our empirical analysis, we set $Q = 10$ and $R = 100$. For $\theta^*$, we use estimates obtained using Heckman's approach as discussed in Lee and Gørgens [2017], with the modification that $\ln\sigma_1^* = \ln\sigma_2^* = 0$. When estimating a model without random effects for female Europeans, Heckman's approach resulted in non-sensible estimates, so we substituted the estimates for female Maoris. (Using estimates for male Europeans gave similar final estimates.)
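One way to draw pseudo-event timings sequentially is inverse-transform sampling: given the current state, draw $E \sim \text{Exp}(1)$ and solve $H(t \mid t^-) = E$ for the next event time, which has a closed form under the Gompertz hazards. This sketch is an illustration of that technique, not necessarily the exact scheme of Lee and Gørgens [2017]; the parameter dictionary keys are hypothetical, and event times are returned in chronological order (the reverse of the ordering in the notation above). For importance sampling, the draws would be generated once under $\theta^*$ with a fixed seed and reused for every $\theta$.

```python
import math
import random
from typing import List, Optional


def _gompertz_integral(a: float, b: float, alpha: float, log_level: float) -> float:
    """Integral of exp(alpha * y + log_level) over [a, b]."""
    scale = math.exp(log_level)
    if abs(alpha) < 1e-12:
        return scale * (b - a)
    return scale * (math.exp(alpha * b) - math.exp(alpha * a)) / alpha


def _invert_gompertz(t0: float, target: float, alpha: float, log_level: float) -> Optional[float]:
    """Solve _gompertz_integral(t0, t, alpha, log_level) = target for t.

    Returns None when the cumulative hazard never reaches `target`, which can
    happen only if alpha < 0 (the event then never occurs).
    """
    scale = math.exp(log_level)
    if abs(alpha) < 1e-12:
        return t0 + target / scale
    arg = math.exp(alpha * t0) + alpha * target / scale
    if arg <= 0.0:
        return None
    return math.log(arg) / alpha


def draw_history(start: float, end: float, v: float, th: dict,
                 rng: random.Random, tau: float = 0.1) -> List[float]:
    """Draw AMI times in (start, end] one at a time, assuming no AMI before `start`."""
    events: List[float] = []
    # First AMI: hazard h1 from Equation (3.1); solve H1(t | start) = E with E ~ Exp(1).
    t = _invert_gompertz(start, rng.expovariate(1.0), th["alpha1"],
                         th["mu1"] + v * th["sigma1"])
    level = th["mu2"] + v * th["sigma2"]
    while t is not None and t <= end:
        events.append(t)
        e = rng.expovariate(1.0)
        cut = t + tau
        # Cumulative hazard available in the elevated 'Recent' regime (t, t + tau].
        recent_mass = _gompertz_integral(t, cut, th["alpha2"], level + th["gamma"])
        if e <= recent_mass:
            t = _invert_gompertz(t, e, th["alpha2"], level + th["gamma"])
        else:
            t = _invert_gompertz(cut, e - recent_mass, th["alpha2"], level)
    return events
```

For example, `draw_history(b10, b20, v_q, theta_star, random.Random(12345))` would produce one pseudo-history $b^{qr}_{i1}$ for the unobserved period, where all arguments are hypothetical placeholders.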

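For the ISN method, the simulated log likelihood function is essentially the same; the only difference is that the adjustment factors are normalised so that they sum to $R$. That is, $g_1(b^{qr}_{i1} \mid v_q, \theta)/g_1(b^{qr}_{i1} \mid v_q, \theta^*)$ in Equation (3.7) is replaced by

$$\frac{g_1(b^{qr}_{i1} \mid v_q, \theta)}{g_1(b^{qr}_{i1} \mid v_q, \theta^*)} \Bigg/ \left( \frac{1}{R} \sum_{r'=1}^{R} \frac{g_1(b^{qr'}_{i1} \mid v_q, \theta)}{g_1(b^{qr'}_{i1} \mid v_q, \theta^*)} \right). \tag{3.8}$$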
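Given the simulated pseudo-histories, the ISU and ISN versions of a left-censored person's contribution differ only in how the adjustment factors are scaled. A minimal sketch, assuming the log densities $\ln g_2$, $\ln g_1(\cdot \mid \theta)$ and $\ln g_1(\cdot \mid \theta^*)$ have already been evaluated for each node $q$ and draw $r$; it works on the log scale to avoid underflow, since densities of long histories can be very small. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def log_mean_exp(xs: Sequence[float]) -> float:
    """Numerically stable ln((1/len(xs)) * sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def censored_contribution(quad_weights: Sequence[float],
                          log_g2: Sequence[Sequence[float]],
                          log_g1_theta: Sequence[Sequence[float]],
                          log_g1_star: Sequence[Sequence[float]],
                          normalised: bool) -> float:
    """Simulated log likelihood contribution of one left-censored person.

    Entry [q][r] belongs to quadrature node v_q and pseudo-history b^{qr}.
    normalised=False follows the ISU form of Equation (3.7); normalised=True
    rescales the adjustment factors as in Equation (3.8) (ISN).
    """
    terms: List[float] = []
    for w, lg2, lg1, lg1s in zip(quad_weights, log_g2, log_g1_theta, log_g1_star):
        log_ratio = [a - b for a, b in zip(lg1, lg1s)]  # log adjustment factors
        inner = log_mean_exp([a + b for a, b in zip(lg2, log_ratio)])
        if normalised:
            inner -= log_mean_exp(log_ratio)  # divide by the mean adjustment factor
        terms.append(math.log(w) + inner)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))  # ln sum_q w_q * inner_q
```

When $\theta = \theta^*$ all adjustment factors equal one and the two versions coincide. ISN is also unaffected by multiplying all adjustment factors at a node by a common constant, whereas ISU shifts accordingly.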

Since ISU and ISN are just different ways of approximating the exact likelihood function in Equation (3.6), both methods should provide similar results. In practice, there may be some differences, and we report estimates from both methods in the discussion of the results.

The MSL estimation method is computationally burdensome, because a large number of draws is required in order to obtain a satisfactory approximation to the exact likelihood function for the observed data (large $Q$ and large $R$). However, as we show in related research, MSL estimation is more efficient in handling the left-censoring problem than the ad hoc solutions that previously have been considered in the literature [Lee and Gørgens, 2017]. In the present context, this is particularly true for the risk of the first event.

In document Essays on missing data problems: MSL estimation in the analysis of censored data and doubly robust estimation in the analysis of treatment effects (Page 74-80)