Structural Nested Accelerated Failure Time models

CHAPTER I. INTRODUCTION

1.5 Proposed solutions considered in this dissertation

1.5.1 Structural Nested Accelerated Failure Time models

Robins proposed the use of structural nested models (specifically Structural Nested Accelerated Failure Time [SNAFT] models in the case of survival outcomes) as a way to estimate causal effects in the presence of time-varying mediating confounders (Robins and Tsiatis (1992)). These models adjust for confounding by time-varying covariates using g-estimation (Robins (1989)). G-estimation allows estimation of a marginal parameter, as with the parametric g-formula, and is not subject to the pitfalls of regression models noted above.

A SNAFT model can be expressed in a simple case as

$$T_{\bar{0}} = \int_{k=0}^{T} \exp(\psi X_k)\, dk \qquad (1.9)$$

where $T_{\bar{0}}$ is the failure time we would observe if the individual had never been exposed and $T$ is the observed failure time. The parameter of interest $\exp(-\psi)$ is interpreted as the factor by which exposure contracts one's lifespan: under constant exposure ($X_k = 1$ for all $k$), equation 1.9 gives $T_{\bar{0}} = e^{\psi} T$, so $T = \exp(-\psi)\, T_{\bar{0}}$. For example, if $\exp(-\psi) = 1/2$, exposure cuts the amount of potential time one could live in half; that is, the failure time observed under constant exposure is half that we would observe under no exposure. To account for time-varying exposures, we integrate the exposure function over the observed time. This model allows us to calculate the potential failure time under no exposure from the observed data. For example, if $\psi = 0.2$ for an individual with an observed failure time $T = 2.2$ years and exposure history $\bar{X}_k = (1, 0, 1)$ over the intervals $[0, 1)$, $[1, 2)$ and $[2, 2.2]$, the survival time under the exposure history "never exposed" ($\bar{X}_k = (0, 0, 0)$) would be

$$T_{\bar{0}} = e^{0.2 \cdot 1}(1-0) + e^{0.2 \cdot 0}(2-1) + e^{0.2 \cdot 1}(2.2-2) = 1.221 + 1.000 + 0.244 \approx 2.47 \text{ years}$$

Estimation of the SNAFT model is done using g-estimation (Robins (1989); Robins and Tsiatis (1992)). A simple g-estimation algorithm would proceed as follows:

First, take a guess at $\psi$ (called $\tilde{\psi}$) and use equation 1.9 to generate a set of potential failure times under no exposure at $\tilde{\psi}$, called $\tilde{T}_{\bar{0}}$, as in

$$\tilde{T}_{\bar{0}} = \int_{k=0}^{T} \exp(\tilde{\psi} X_k)\, dk.$$
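Equation 1.9, used both in this first step and in the worked example above, reduces to a finite sum when exposure is constant within unit time intervals. The following is a minimal sketch under that assumption; the function name `potential_failure_time` and the interval layout are illustrative choices, not part of the original analysis. It reproduces the 2.47-year example.

```python
import math


def potential_failure_time(t_obs, exposure, psi, interval=1.0):
    """Evaluate equation 1.9 for an exposure history that is constant within intervals.

    Assumption (illustrative): exposure[k] is X_k on [k*interval, (k+1)*interval),
    and follow-up ends at the observed failure time t_obs.
    """
    if len(exposure) * interval < t_obs:
        raise ValueError("exposure history does not cover follow-up to t_obs")
    total = 0.0
    for k, x_k in enumerate(exposure):
        start = k * interval
        if start >= t_obs:
            break
        # Time actually spent in interval k (the last interval may be partial).
        length = min(interval, t_obs - start)
        total += math.exp(psi * x_k) * length
    return total


# Worked example from the text: psi = 0.2, T = 2.2 years, exposure history (1, 0, 1).
print(round(potential_failure_time(2.2, [1, 0, 1], psi=0.2), 2))  # 2.47
```

Setting `psi=0.0` returns the observed failure time, as expected when exposure has no effect; evaluating the same function at a guessed $\tilde{\psi}$ gives $\tilde{T}_{\bar{0}}$.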

Next, one uses an estimating equation to evaluate the association between $\tilde{T}_{\bar{0}}$ and $X_k$. This could be a logistic model such as

$$\operatorname{logit}\left[\Pr(X_k = 1 \mid V, \bar{L}_k, \bar{X}_{k-1}, \tilde{T}_{\bar{0}}, T > k; \beta, \theta)\right] = \beta_{0k} + V\beta_1 + \bar{L}_k\beta_2 + \bar{X}_{k-1}\beta_3 + \tilde{T}_{\bar{0}}\theta \qquad (1.10)$$

in which the dependent variable is the exposure at time $k$ (Witteman et al. (1998); Hernán et al. (2005)). The model includes an intercept $\beta_{0k}$ that may be time varying. The outcome we would observe under no exposure can only depend on exposure if there is unmeasured confounding ($T_{\bar{0}}$ is treated as a baseline variable determined prior to any exposure or covariates). Therefore, if all confounders are accounted for on the right side of equation 1.10, $\tilde{T}_{\bar{0}}$ will equal $T_{\bar{0}}$ (that is, $\tilde{\psi} = \psi$) at the value of $\tilde{\psi}$ for which $\theta = 0$ (Robins (1989)). One can perform a grid search over a reasonable range of $\tilde{\psi}$, and the $\tilde{\psi}$ that yields a Z-statistic for $\theta$ equal to 0 (or very close to 0) is the estimate $\hat{\psi}$. G-estimation is the term given to this search.
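The grid search can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the dissertation's implementation. The data come from a hypothetical simulation (`simulate`) in which a binary time-varying covariate $L_k$ depends on $T_{\bar{0}}$ and on prior exposure, exposure $X_k$ depends only on $L_k$, and there is no censoring. The exposure model therefore conditions only on $L_k$ (a simplification of equation 1.10) and is fitted as the proportion exposed within each level of $L_k$. Instead of the Wald Z-statistic for $\theta$, the sketch evaluates the score for $\theta$ at $\theta = 0$, $\sum (X_k - \hat{p}_k)\tilde{T}_{\bar{0}}$ over at-risk person-intervals. Both cross zero at the same $\tilde{\psi}$, so the search targets the same value. No standard error is computed.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, psi, seed=2024):
    """Hypothetical data-generating process consistent with equation 1.9 (no censoring).

    Returns one list per person of (L_k, X_k, time spent in interval k) tuples,
    for each unit interval k in which the person is still at risk.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        t0 = rng.uniform(0.5, 5.0)  # failure time under no exposure
        remaining, x_prev, records = t0, 0, []
        while True:
            # L_k depends on T0 (so it is a confounder) and on prior exposure.
            l_k = int(rng.random() < 0.2 + 0.5 * (t0 < 2.0) + 0.2 * x_prev)
            x_k = int(rng.random() < expit(-1.0 + 2.0 * l_k))  # depends on L_k only
            rate = math.exp(psi * x_k)  # exposed time uses up T0 faster when psi > 0
            if remaining <= rate:  # failure occurs within this interval
                records.append((l_k, x_k, remaining / rate))
                break
            records.append((l_k, x_k, 1.0))
            remaining -= rate
            x_prev = x_k
        people.append(records)
    return people


def g_estimate(people, grid):
    """Grid search for the psi_tilde at which exposure is unassociated with T0_tilde."""
    # Exposure model Pr(X_k = 1 | L_k): proportion exposed within each level of L_k.
    counts = {0: [0, 0], 1: [0, 0]}  # L_k -> [n exposed, n at-risk person-intervals]
    for records in people:
        for l_k, x_k, _ in records:
            counts[l_k][0] += x_k
            counts[l_k][1] += 1
    p_hat = {l: c[0] / c[1] for l, c in counts.items()}

    # T0_tilde is constant within a person, so the per-person residual sum suffices.
    summaries = []  # (sum_k (X_k - p_hat), exposed time, unexposed time)
    for records in people:
        resid = sum(x - p_hat[l] for l, x, _ in records)
        exposed = sum(t for _, x, t in records if x == 1)
        unexposed = sum(t for _, x, t in records if x == 0)
        summaries.append((resid, exposed, unexposed))

    def score(psi_tilde):
        w = math.exp(psi_tilde)
        # T0_tilde = unexposed + exp(psi_tilde) * exposed  (equation 1.9, binary X_k)
        return sum(r * (u + w * e) for r, e, u in summaries)

    return min(grid, key=lambda g: abs(score(g)))


people = simulate(20000, psi=0.3)
grid = [round(-0.5 + 0.01 * i, 2) for i in range(151)]
psi_hat = g_estimate(people, grid)
print(psi_hat)  # should be close to the true value 0.3
```

With a binary exposure, $\tilde{T}_{\bar{0}}$ is linear in $\exp(\tilde{\psi})$, so the score is monotone in $\tilde{\psi}$ and the grid has a single crossing.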

To develop some intuition, one could consider $T_{\bar{0}}$ to be the "residual" outcome once the net effect of exposure (the parameter of interest) is removed: it is the variation in the observed outcome that is not due to exposure. The $\psi$ parameter corresponds to this net effect. Once we have removed the variation in the outcome that is due to exposure, the residual outcome should be independent of an increment of exposure ($X_k$). We might expect the residual outcome to vary between individuals with different employment and exposure histories, so we test for this independence within strata of these variables.

The formulation of a SNAFT model described in this section assumes that all outcomes are observed. This model requires modification when some outcomes are unobserved due to censoring. A description of the g-estimation of a SNAFT model in the presence of censoring can be found in Witteman et al. (1998) or in Joffe et al. (2012) (the latter of which is more technical).

The positivity assumption for g-estimation

As discussed in Appendix C.3, the positivity assumption requires that we observe all levels of the exposure in all levels of the covariates. Using estimating equation 1.10 as an example, note that occupational studies are sensitive to this assumption. If exposure cannot occur off of work, then the probability of exposure when $L_k = 0$ (currently off work, a component of $\bar{L}_k$) will always be zero. In such a case, the coefficients $\beta_1$, $\beta_2$ and $\theta$ are not estimable in a saturated model.

G-estimation can overcome such violations of the positivity assumption, provided we replace positivity with another assumption (Joffe et al. (2010)). An example of this assumption was given by Chevrier et al. (2012), who used a modified estimating equation of the form

$$\operatorname{logit}\left[\Pr(X_k = 1 \mid V, \bar{X}_{k-1}, \tilde{T}_{\bar{0}}, T > k, L_k = 1; \beta, \theta)\right] = \beta_{0k} + V\beta_1 + \bar{X}_{k-1}\beta_2 + \tilde{T}_{\bar{0}}\theta$$

where the estimating equation is only used on the person-years of active employment in the data ($L_k = 1$). Using occupational data, we can estimate all coefficients in this model because there is no longer a violation of positivity. This amounts to a relaxation of the assumption of no unmeasured confounding (see Appendix C.3) so that it only applies to the person-time at work. Joffe et al. (2010) refer to this as a selective ignorability assumption and offer several examples in which, even though the assumption of no unmeasured confounding does not hold for all data, information from a subset in which it does hold (e.g. subsets with better exposure measurement or no missing covariates) can be leveraged to estimate causal effects with less bias than would be possible using the full data.
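One way to see the problem, and the effect of the restriction used by Chevrier et al. (2012), is to tabulate the proportion exposed within each stratum of the exposure model's covariates. The following is a minimal sketch on a handful of made-up person-periods; the record layout (work status, prior exposure, current exposure) and the function names are hypothetical.

```python
from collections import defaultdict


def exposure_proportions(records):
    """Proportion exposed within each stratum of (at_work, prior_exposure).

    records: iterable of (at_work, prior_exposure, exposed) tuples, one per
    at-risk person-period (hypothetical layout).
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n exposed, n person-periods]
    for at_work, prior, exposed in records:
        counts[(at_work, prior)][0] += exposed
        counts[(at_work, prior)][1] += 1
    return {s: c[0] / c[1] for s, c in counts.items()}


def nonpositive_strata(props):
    # Strata where exposure is never (or always) observed: a saturated logistic
    # exposure model has no finite coefficient estimates there.
    return sorted(s for s, p in props.items() if p == 0.0 or p == 1.0)


records = [
    (1, 0, 1), (1, 0, 0), (1, 1, 1), (1, 1, 0), (1, 1, 1),
    (0, 0, 0), (0, 1, 0), (0, 0, 0),  # off work: exposure impossible
]
props = exposure_proportions(records)
print(nonpositive_strata(props))  # [(0, 0), (0, 1)]

# Restriction in the spirit of Chevrier et al. (2012): keep only person-periods at work.
at_work_only = [r for r in records if r[0] == 1]
print(nonpositive_strata(exposure_proportions(at_work_only)))  # []
```

Restricting to $L_k = 1$ removes the structural zeros created by the off-work person-time; zeros that arise only from sparse data are a different situation.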

An alternative to the approach of Chevrier et al. (2012) is to fit the model in equation 1.10 using an unsaturated model. In that case, empty strata will be smoothed over. However, in occupational data, the approach of Chevrier et al. (2012) is likely more defensible, since the smoothing done by the model would amount to allowing some individuals to be exposed off work. The smoothing approach is likely a good choice when nonpositivity arises from sparse data, such as in analyses of data with long follow-up and many potential confounders.

The validity of this approach relies on correct specification of the structural nested model and of the model proposed for the estimating equation. This is fewer models than are required by the g-formula, but more than a standard regression approach. Most examples of structural nested models in the literature are relatively simple models with dichotomous exposures (Joffe et al. (1998); Hernán et al. (2005); Chevrier et al. (2012); Neophytou et al. (2014)), and there are relatively few examples of fitting SNAFT models with quantitative exposures (Joffe et al. (2012); Naimi et al. (2014a)). Even in simulated data, SNAFT models may not perform well in reasonably sized data sets (Young et al. (2010)), and Joffe et al. (2012) noted difficulties with g-estimation when trying to estimate multiple SNAFT model parameters. However, these models may complement standard regression models, and they appear promising for wider use in occupational studies to control healthy worker survivor bias (Joffe (2012)).

In document Keil_unc_0153D_14884.pdf (Page 46-50)