
CHAPTER IV. CONCLUSION

A.4 Sensitivity analyses for structural nested models

A.4.1 Sensitivity analysis: exposure model used for g-estimation

A.4.1.1 Methods

Alternative exposure model. To assess the sensitivity of the TR to parametric assumptions of the exposure model, we fit several SNAFT models under different assumptions about the exposure. Namely, we fit the following exposure models:

• Zero-inflated log-linear model (as reported in the text)

• Zero-inflated log-linear model with a log transform on time variables

• Zero-inflated linear model for exposure

• Zero-inflated log-normal model for exposure, as in Li et al. (2011)

• Zero-inflated three-parameter Gamma model for exposure

• Zero-inflated multinomial model (cumulative logit) for exposure deciles, as in Kelley and Anderson (2008)

• Generalized additive model for exposure

For each exposure model, we fit the fully adjusted SNAFT model as in the main text, along with an “exposure only” SNAFT model in which employment duration was removed from the exposure model. The latter model provides only partial adjustment for healthy worker survivor bias and may be useful for considering model fit.

A.4.1.2 Results

Similar to the model in the main text, the “zero-inflated” models are two-part models whose likelihoods factorize into a binary (logistic) model for any exposure (X > 0) and a separate log-normal or Gamma model fit among the exposed. These have also been referred to as generalized linear models with composite links and exploded likelihoods (Rabe-Hesketh and Skrondal, 2007). The only exposure models for which our SNAFT model converged to an estimate were the zero-inflated log-linear models and the zero-inflated linear model (Table A.4). The fully adjusted linear model yielded a smaller time ratio than the log-linear models, and the linear models showed a smaller apparent difference between the exposure-only and fully adjusted estimates.
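Because the likelihood factorizes, the two parts can be fit separately and their log-likelihoods added. The sketch below illustrates this factorization and the AIC definition given under Table A.4. Its specification is assumed here, not taken from the main text: one covariate, a logistic model for any exposure, a normal linear model for log exposure among the exposed, and simulated data. The helper names (`fit_logistic`, `fit_gaussian`, `two_part_fit`) are hypothetical.

```python
import numpy as np

def fit_logistic(design, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns (coefficients, log-likelihood)."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (y - p)
        hess = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    return beta, np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_gaussian(design, z):
    """Normal linear model with the ML variance; returns (coefficients, log-likelihood)."""
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ beta
    sigma2 = resid @ resid / z.size          # ML estimate divides by n
    return beta, -0.5 * z.size * (np.log(2.0 * np.pi * sigma2) + 1.0)

def two_part_fit(x, covars):
    """Hypothetical two-part ("zero-inflated") exposure model: P(X > 0) by logistic
    regression, then a normal linear model for log(X) among the exposed.
    Returns (log-likelihood, number of parameters p, AIC = -2 * (loglik - p))."""
    design = np.column_stack([np.ones(x.size), covars])
    exposed = x > 0
    b_any, ll_any = fit_logistic(design, exposed.astype(float))
    b_pos, ll_pos = fit_gaussian(design[exposed], np.log(x[exposed]))
    # ll_pos is on the log(X) scale; adding -sum(log(x[exposed])) would give the
    # log-normal density of X itself, which matters when comparing across scales.
    loglik = ll_any + ll_pos                 # the parts factorize, so log-likelihoods add
    p = b_any.size + b_pos.size + 1          # +1 for the residual variance
    return loglik, p, -2.0 * (loglik - p)

# Hypothetical simulated data (not CPUM): one covariate, roughly 60% exposed.
rng = np.random.default_rng(1)
L = rng.normal(size=2000)
any_exposure = rng.random(2000) < 1.0 / (1.0 + np.exp(-(0.5 + L)))
x = np.where(any_exposure, np.exp(1.0 + 0.3 * L + 0.5 * rng.normal(size=2000)), 0.0)
loglik, p, aic = two_part_fit(x, L)
print(f"log-likelihood={loglik:.1f}, p={p}, AIC={aic:.1f}")
```

The logistic and continuous parts share no parameters, so maximizing each part on its own also maximizes the joint likelihood. This is what makes these two-part models straightforward to fit.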

Table A.4: Comparing alternative exposure models for the SNAFT model of the radon-lung cancer dose-response

Exposure model                   Adjustment set   Time ratio   95% CI (lower, upper)   AIC*
Zero-inflated log-linear**       Adjusted         1.166        (1.152, 1.174)          -313202
                                 Exposure only    1.088        (1.084, 1.090)          -305054
Zero-inflated log-linear         Adjusted         1.167        (1.152, 1.172)          -313240
  (log-time in model)            Exposure only    1.091        (1.086, 1.100)          -304848
Zero-inflated linear             Adjusted         1.057        (1.047, 1.075)          -309094
                                 Exposure only    1.048        (1.032, 1.059)          -305354

*AIC = -2(log-likelihood - p), where p is the total number of parameters estimated in the model. Estimation algorithms did not converge for the zero-inflated log-normal, zero-inflated Gamma, or generalized additive models, so these models are not included.
**Also reported in the main text.

A.4.1.3 Discussion

Because our SNAFT model requires a model for the exposure, an exposure model that does not accurately characterize the patterns of exposure in the CPUM data may result in bias. To see why, note that the independence function shown in Equation 8 tests the independence between exposure and the potential outcome within strata of prior confounders. Another way to conceptualize this is as a test of the independence of the “residual” outcome and the “residual” exposure. The residual outcome is the part of the outcome variable that remains after subtracting (or dividing) out the part of the outcome due to exposure effects. The residual exposure is the part of the exposure variable that remains after subtracting out the part of the exposure that is predicted by prior covariates. Thus, if the SNAFT model accurately characterizes the exposure effects, the residual outcome and residual exposure will appear to be independent when the residual outcome is computed at the true time ratio. However, if the exposure model is incorrect, the “residual” exposure will reflect not only the novel part of the exposure (the part not predicted by prior covariates) but also the poor correspondence between the model and the population function from which exposures arise. As a result, the independence test, and hence the estimated time ratio, can be distorted.
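One generic way to write the residual outcome, residual exposure and their independence test is sketched below. It assumes a SNAFT parameterization in which exposure at each time rescales the time scale by exp(ψX) and there is no censoring. The notation is assumed here and may differ from Equation 8; h is any chosen function of the potential outcome, such as log T0.

```latex
T_{0i}(\psi) = \int_0^{T_i} \exp\{\psi X_i(u)\}\,du, \qquad \mathrm{TR} = \exp(\psi)

R_i(t) = X_i(t) - \hat{E}\left[X_i(t) \mid \bar{L}_i(t), \bar{X}_i(t-1)\right]

U(\psi) = \sum_{i} \sum_{t \le T_i} R_i(t)\, h\{T_{0i}(\psi)\}, \qquad U(\hat{\psi}) = 0
```

With a correctly specified exposure model, R_i(t) has conditional mean zero, so U(ψ) has mean zero at the true ψ. When the exposure model is misspecified, R_i(t) no longer has conditional mean zero, and that property can fail. Under this parameterization, exp(0.067) ≈ 1.069, consistent with the pairing of ψ and TR reported for the main model.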

The robustness of SNAFT model results to misspecification of a parametric exposure model is of some interest. While it is worth exploring alternative parametric or semi-parametric models for the exposure, in the CPUM there are limits to the usefulness of such an approach. Namely, exposure data are derived from records of the dates on which a certain cumulative exposure threshold was reached. We created monthly exposure data from these records by linear interpolation, such that the exposure rate was assumed to be constant within each interval of time between exposure thresholds. As a consequence, model fit statistics for the exposure model are biased at best and may be misleading if used in a model selection algorithm. For example, a linear model for exposure could appear to provide a good fit to the data, but this may be largely an artifact of the way in which the data were generated. Figure A.3 shows the distribution of intervals between exposure readings in the CPUM. While a majority of the 19,260 records cover intervals shorter than 1 year (median = 244 days), only six percent cover 31 days or less. Thus, there is an inherent tension between capturing the variation in exposure over short intervals (to accurately capture the time-varying nature of the data) and avoiding bias in parametric modeling of exposure due to the inclusion of large amounts of linearly interpolated data.
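The interpolation step described above can be sketched as follows. The sketch assumes that readings are given as day offsets together with the cumulative exposure reached at each reading, and it approximates months as fixed 30-day blocks. The function name and both conventions are assumptions for illustration, not the procedure actually used to build the CPUM monthly data.

```python
import numpy as np

def monthly_exposure(reading_days, cum_exposure, month_len=30):
    """Hypothetical helper: convert cumulative-exposure readings into monthly exposure.

    Cumulative exposure is linearly interpolated between readings, so the exposure
    rate is constant within each interval. `reading_days` must be strictly
    increasing day offsets; months are fixed blocks of `month_len` days (assumed).
    """
    days = np.asarray(reading_days, dtype=int)
    cum = np.asarray(cum_exposure, dtype=float)
    daily = np.zeros(days[-1] - days[0])
    for start, stop, increment in zip(days[:-1], days[1:], np.diff(cum)):
        # spread this interval's increment evenly over its days (constant rate)
        daily[start - days[0]:stop - days[0]] = increment / (stop - start)
    n_months = int(np.ceil(daily.size / month_len))
    padded = np.zeros(n_months * month_len)
    padded[:daily.size] = daily
    return padded.reshape(n_months, month_len).sum(axis=1)

# Hypothetical miner: readings on days 0, 90 and 300 with cumulative exposure 0, 3 and 5.
monthly = monthly_exposure([0, 90, 300], [0.0, 3.0, 5.0])
print(np.round(monthly, 3))
```

In this example the first three months each receive 1.0, and the remaining seven each receive 2/7 (about 0.286). Every month inside the long second interval therefore carries an identical value. That artificial smoothness is what can make a parametric exposure model appear to fit well.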

[Figure A.3 image: histogram with x-axis “Days between exposure readings” and y-axis “Frequency”; inset histogram of intervals shorter than 1000 days]

Figure A.3: Frequency of intervals (days) between exposure readings in the source data from the Colorado Plateau Uranium Miners (CPUM) cohort. The inset histogram shows the conditional distribution of intervals shorter than 1000 days (the largest bar in the primary figure).
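The interval statistics quoted in the text (median gap, share of gaps of 31 days or less, share shorter than 1 year) are summaries of the gaps between consecutive readings within each miner, pooled across miners. The sketch below computes them. It assumes readings are available as per-miner lists of ISO dates and treats “shorter than 1 year” as fewer than 365 days. The function name and example dates are hypothetical.

```python
import numpy as np

def interval_summary(reading_dates_by_miner):
    """Pool the gaps (in days) between consecutive exposure readings across miners."""
    gaps = np.concatenate([
        # sort each miner's dates, take successive differences, convert to whole days
        np.diff(np.sort(np.asarray(dates, dtype="datetime64[D]"))).astype(int)
        for dates in reading_dates_by_miner
    ])
    return {
        "n_intervals": int(gaps.size),
        "median_days": float(np.median(gaps)),
        "share_le_31_days": float(np.mean(gaps <= 31)),
        "share_lt_1_year": float(np.mean(gaps < 365)),     # 365-day year assumed
        "share_lt_1000_days": float(np.mean(gaps < 1000)),  # the inset's range
    }

# Hypothetical reading dates for two miners (not CPUM data).
summary = interval_summary([
    ["1955-01-01", "1955-01-21", "1955-09-01"],
    ["1956-03-01", "1957-03-01"],
])
print(summary)
```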

Joffe et al. (2012) also noted, in an analysis of mortality among hemodialysis patients, that some parametric exposure models did not ultimately yield estimates from a SNAFT model. This can occur either because of difficulties with an optimization algorithm or because of more fundamental issues with G-estimation. Namely, G-estimation relies on a series of nested hypothesis tests to establish independence between exposure and the potential outcome under the exposure regime “never exposed”. If, for example, the potential outcome is not independent of exposure at any realistic value of the time ratio, then the SNAFT model will not yield a time ratio estimate. Under the fully adjusted, zero-inflated log-linear model from the main text, the G-function (the χ² value testing the null hypothesis of no association between the exposure and the potential outcome under a given value of the time ratio) has a clear minimum at ψ = 0.067 (TR = 1.069; Figure A.4). Contrast this with the SNAFT G-function under a zero-inflated log-normal model for exposure, in which the χ² value never drops below 130. While this G-function clearly trends towards a minimum, the strong association between the potential outcome and exposure suggests that the zero-inflated log-normal model for exposure does not fit the data well. Consequently, we do not report results for SNAFT models in which the exposure model was a zero-inflated log-normal model, a zero-inflated Gamma model, or a generalized additive model.
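To make the G-function concrete, here is a deliberately simplified sketch of g-estimation by grid search. It assumes a single time-fixed exposure, no censoring, a Gaussian linear exposure model with one confounder, and simulated data, so that T0(ψ) = T·exp(ψX). None of this is the CPUM analysis or the zero-inflated exposure models above, and the function names are hypothetical. For each ψ, the sketch adds log T0(ψ) to the exposure model and records the Wald χ² for that term. The grid minimizer is ψ̂, the time ratio is exp(ψ̂), and grid values with χ² below 3.84 form the 95% confidence set.

```python
import numpy as np

def wald_chi2(x, covars, h):
    """Wald chi-square for the coefficient on h in an OLS exposure model x ~ 1 + covars + h."""
    n = x.size
    design = np.column_stack([np.ones(n), covars, h])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef[-1] ** 2 / cov[-1, -1]

def g_function(psi_grid, log_t, x, covars):
    """G-function over a grid; for a time-fixed exposure, log T0(psi) = log T + psi * X."""
    return np.array([wald_chi2(x, covars, log_t + psi * x) for psi in psi_grid])

# Hypothetical simulated data: L confounds X and T0, and X is independent of T0 given L.
rng = np.random.default_rng(2012)
n, true_psi = 5000, 0.1
L = rng.normal(size=n)
X = 1.0 + 0.5 * L + rng.normal(size=n)
log_t0 = 2.0 + 0.5 * L + rng.normal(size=n)
log_t = log_t0 - true_psi * X            # exposure shortens observed time by exp(-psi * X)

grid = np.arange(0.0, 0.2001, 0.005)
g = g_function(grid, log_t, X, L)
psi_hat = grid[np.argmin(g)]
accepted = grid[g < 3.84]                # not rejected at the 5% level (chi-square, 1 df)
print(f"psi_hat={psi_hat:.3f}  TR={np.exp(psi_hat):.3f}  "
      f"accepted psi from {accepted.min():.3f} to {accepted.max():.3f}")
```

The confidence set is simply the collection of grid values that are not rejected, so in general it need not be a single interval, and reporting only its minimum and maximum would hide any gaps. A G-function that never dips below 3.84, like the log-normal case described above, would leave this set empty.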

Because the exposure model used in G-estimation need only provide a valid hypothesis test of the association between the potential outcome and exposure, the linear model may seem attractive, since it is known to be relatively robust to misspecification. However, while the results in Table A.2 show that using a linear model for log-linearly distributed exposure data yields a relatively unbiased SNAFT parameter, the G-function may also support a wide range of values of ψ. As shown in Figure A.4, which is typical of the simulation results for linear models, the G-function reaches a minimum around ψ = 0.075, but the confidence set (the values of ψ at which the χ² statistic is below 3.84, the 95th percentile of a χ² distribution with one degree of freedom) extends all the way to the bottom of the grid search at ψ = 0.03. The G-functions fit to the CPUM data did not show this characteristic. While existing algorithms for simulating from a SNAFT model (Young et al., 2010) are useful, more work is needed to develop robust methods for simulating more realistic scenarios.

A.4.2 Sensitivity analysis: structural nested accelerated failure time model

In document Keil_unc_0153D_14884.pdf (Page 123-127)