Estimation approach - Flexible causal mediation analysis using natural effect models

Vansteelandt et al. (2012b) proposed an imputation procedure for fitting natural effect models for single mediators (also see Steen et al. (2016b); Loeys et al. (2013)). Below we describe how this procedure can be extended to recover all possible three-way decompositions in Table 5.2 in settings with a binary exposure (coded 0/1) and two sequential mediators. We first focus on estimation of component effects as defined within strata of C, a covariate set assumed to be sufficient for conditions (i’)-(vi’) to be met, and next describe how population-average analogs can be obtained. In Technical appendix 5.A.3 we provide some intuition as to why this procedure works and how it relates to Monte Carlo procedures based on generalizations of Pearl (2001, 2012)’s mediation formula (Albert and Nelson, 2011; Daniel et al., 2015).

1. Fit a suitable model for the probability (density) of either

(i) the first mediator conditional on exposure and covariate set C, for instance, a logistic regression model for binary M1

$$\operatorname{logit} P(M_1 = 1 \mid A, C) = \beta_0 + \beta_1 A + \beta_2^\top C, \tag{5.11}$$

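or

(ii) the second mediator conditional on exposure, the first mediator and covariate set C, for instance, a linear regression model for normally distributed M2 with constant variance $\sigma^2$

$$f(M_2 \mid A, M_1, C) = N\left(\gamma_0 + \gamma_1 A + \gamma_2 M_1 + \gamma_3 A M_1 + \gamma_4^\top C,\ \sigma^2\right). \tag{5.12}$$

2. Fit a suitable model for the outcome mean conditional on exposure, both mediators and covariate set C, for instance, a logistic regression model for binary outcome Y

$$\operatorname{logit} P(Y = 1 \mid A, M_1, M_2, C) = \delta_0 + \delta_1 A + \delta_2 M_1 + \delta_3 M_2 + \delta_4 A M_1 + \delta_5 A M_2 + \delta_6 M_1 M_2 + \delta_7 A M_1 M_2 + \delta_8^\top C. \tag{5.13}$$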

3. Construct an extended data set by replicating the observed data set 4 times, so that each individual contributes four rows. A similar step has previously been described by Lange et al. (2014) and is best understood in terms of sequential duplication. For the first duplication, add three auxiliary variables $a$, $a'$ and $a''$. Let $a$ take on the value of the observed exposure $A_i$ for the first replication and of the counterfactual exposure $1 - A_i$ for the second replication (for each individual $i$). Let both $a'$ and $a''$ take on the observed exposure level for both replications. Next, duplicate the resulting extended data once again, now letting $a'$ ($a''$) take on counterfactual exposure level $1 - A_i$ if model (5.11) ((5.12)) is selected as working model (as illustrated in Tables 5.3 and 5.4, respectively).

4. If model (5.11) is selected, compute weights

$$W_{1i,a'} = \frac{\hat{P}(M_1 = M_{1i} \mid A = a', C_i)}{\hat{P}(M_1 = M_{1i} \mid A = a'', C_i)} = \frac{\hat{P}(M_1 = M_{1i} \mid A = a', C_i)}{\hat{P}(M_1 = M_{1i} \mid A = A_i, C_i)},$$

or, if model (5.12) is selected, compute weights

$$W_{2i,a''} = \frac{\hat{f}(M_2 = M_{2i} \mid A = a'', M_{1i}, C_i)}{\hat{f}(M_2 = M_{2i} \mid A = a', M_{1i}, C_i)} = \frac{\hat{f}(M_2 = M_{2i} \mid A = a'', M_{1i}, C_i)}{\hat{f}(M_2 = M_{2i} \mid A = A_i, M_{1i}, C_i)}$$

for each row in the extended data set.

5. Impute nested counterfactuals $Y_i(a, M_{1i}(a'), M_{2i}(a'', M_{1i}(a')))$ as fitted values $\hat{E}(Y_i \mid A = a, M_{1i}, M_{2i}, C_i)$ from outcome model (5.13) in step 2, for each row in the extended data set.

6. Fit a natural effect model of interest for

$$E\{Y(a, M_1(a'), M_2(a'', M_1(a'))) \mid C\}$$

to the extended data by regressing the imputed outcomes on $a$, $a'$, $a''$ and C, weighting by the weights obtained in step 4.
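Steps 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation referred to in the technical appendix: the function names (`extend_rows`, `p_m1`, `impute_y`, `build_extended`) are hypothetical, C is taken to be a single scalar covariate, model (5.11) is used as the working model for the weights, and the coefficient vectors `beta` and `delta` are made-up placeholders standing in for the estimates from steps 1 and 2. The variables `a1` and `a2` play the role of a′ and a″.

```python
import math

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def extend_rows(A, working_model="M1"):
    """Step 3: the four (a, a', a'') rows for an individual with binary exposure A."""
    rows = []
    for flip_mediator in (0, 1):      # second duplication
        for flip_a in (0, 1):         # first duplication
            a = 1 - A if flip_a else A
            shifted = 1 - A if flip_mediator else A
            if working_model == "M1":  # model (5.11): vary a', keep a'' = A
                rows.append((a, shifted, A))
            else:                      # model (5.12): vary a'', keep a' = A
                rows.append((a, A, shifted))
    return rows

def p_m1(m1, a, c, beta):
    """P(M1 = m1 | A = a, C = c) under the logistic model (5.11)."""
    p1 = expit(beta[0] + beta[1] * a + beta[2] * c)
    return p1 if m1 == 1 else 1.0 - p1

def impute_y(a, m1, m2, c, delta):
    """Step 5: fitted P(Y = 1 | A = a, M1, M2, C) under model (5.13)."""
    lin = (delta[0] + delta[1] * a + delta[2] * m1 + delta[3] * m2
           + delta[4] * a * m1 + delta[5] * a * m2 + delta[6] * m1 * m2
           + delta[7] * a * m1 * m2 + delta[8] * c)
    return expit(lin)

def build_extended(data, beta, delta):
    """Steps 3-5 for records (A, M1, M2, C), weighting by the M1 model."""
    out = []
    for i, (A, m1, m2, c) in enumerate(data, start=1):
        for a, a1, a2 in extend_rows(A, "M1"):
            # Step 4: ratio of M1 probabilities at a' and at a'' (= observed A)
            w = p_m1(m1, a1, c, beta) / p_m1(m1, a2, c, beta)
            out.append({"i": i, "a": a, "a1": a1, "a2": a2,
                        "y_hat": impute_y(a, m1, m2, c, delta), "w": w})
    return out

# Placeholder coefficients and two toy individuals (A, M1, M2, C); not real estimates.
beta = (-0.5, 1.0, 0.3)
delta = (-1.0, 0.5, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0, 0.3)
ext = build_extended([(1, 1, 0.7, 0.0), (0, 0, -0.2, 1.0)], beta, delta)
```

The rows in `ext` (imputed outcome `y_hat`, weight `w`, and the auxiliary exposures) are what step 6 would pass to a weighted regression routine of the analyst's choice. Rows with a′ = a″ receive weight 1, so only the rows in which the mediator exposure was switched are effectively reweighted.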

 i   A_i   a   a'   a''   Y_i(a·a'·a'')   Ŷ_{i,a}    W_{1i,a'}
 1    1    1   1    1     Y_1             Ŷ_{1,1}    W_{11,1}
      1    0   1    1     .               Ŷ_{1,0}    W_{11,1}
      1    1   0    1     .               Ŷ_{1,1}    W_{11,0}   *
      1    0   0    1     .               Ŷ_{1,0}    W_{11,0}   *
 2    0    0   0    0     Y_2             Ŷ_{2,0}    W_{12,0}
      0    1   0    0     .               Ŷ_{2,1}    W_{12,0}
      0    0   1    0     .               Ŷ_{2,0}    W_{12,1}   *
      0    1   1    0     .               Ŷ_{2,1}    W_{12,1}   *
 ...  ...  ... ...  ...   ...             ...        ...

Table 5.3: Data extension for working models $E(Y \mid A, M_1, M_2, C)$ and $P(M_1 \mid A, C)$. We use $Y(a \cdot a' \cdot a'')$ and $\hat{Y}_{i,a}$ as shorthand notation for $Y(a, M_1(a'), M_2(a'', M_1(a')))$ and $\hat{E}(Y_i \mid A_i = a, M_{1i}, M_{2i}, C_i)$, respectively. Imputed nested counterfactuals $\hat{Y}_{i,a}$ for which $a' \neq a''$ (rows marked *) need to be weighted by $W_{1i,a'} = \hat{P}(M_1 = M_{1i} \mid A = a', C_i) / \hat{P}(M_1 = M_{1i} \mid A = a'', C_i)$.

 i   A_i   a   a'   a''   Y_i(a·a'·a'')   Ŷ_{i,a}    W_{2i,a''}
 1    1    1   1    1     Y_1             Ŷ_{1,1}    W_{21,1}
      1    0   1    1     .               Ŷ_{1,0}    W_{21,1}
      1    1   1    0     .               Ŷ_{1,1}    W_{21,0}   *
      1    0   1    0     .               Ŷ_{1,0}    W_{21,0}   *
 2    0    0   0    0     Y_2             Ŷ_{2,0}    W_{22,0}
      0    1   0    0     .               Ŷ_{2,1}    W_{22,0}
      0    0   0    1     .               Ŷ_{2,0}    W_{22,1}   *
      0    1   0    1     .               Ŷ_{2,1}    W_{22,1}   *
 ...  ...  ... ...  ...   ...             ...        ...

Table 5.4: Data extension for working models $E(Y \mid A, M_1, M_2, C)$ and $f(M_2 \mid A, M_1, C)$. We use $Y(a \cdot a' \cdot a'')$ and $\hat{Y}_{i,a}$ as shorthand notation for $Y(a, M_1(a'), M_2(a'', M_1(a')))$ and $\hat{E}(Y_i \mid A_i = a, M_{1i}, M_{2i}, C_i)$, respectively. Imputed nested counterfactuals $\hat{Y}_{i,a}$ for which $a' \neq a''$ (rows marked *) need to be weighted by $W_{2i,a''} = \hat{P}(M_2 = M_{2i} \mid A = a'', M_{1i}, C_i) / \hat{P}(M_2 = M_{2i} \mid A = a', M_{1i}, C_i)$.

In contrast to direct application of the generalized mediation formula (Albert and Nelson, 2011; Daniel et al., 2015), which relies on a model for the distribution of each of the mediators, our procedure requires only one of these models. This allows investigators to weight by the ratio of densities of the mediator whose corresponding model they believe is less prone to misspecification. If, for instance, M1 is binary and M2 continuous, as in the examples given for models (5.11) and (5.12), weighting for M1 would be most appropriate, since it allows analysts to refrain from modeling the (conditional) relationship between the mediators and making distributional assumptions.
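To see how weighting by the M2 density leans on distributional assumptions, it may help to write out the weight under the normal working model (5.12). The shorthand $\mu_i(a)$ for the fitted conditional mean is introduced here for illustration only and is not notation from the text.

```latex
\mu_i(a) = \hat\gamma_0 + \hat\gamma_1 a + \hat\gamma_2 M_{1i}
           + \hat\gamma_3 a M_{1i} + \hat\gamma_4^\top C_i ,
\qquad
W_{2i,a''}
  = \frac{\hat f(M_{2i} \mid A = a'', M_{1i}, C_i)}
         {\hat f(M_{2i} \mid A = a',  M_{1i}, C_i)}
  = \exp\!\left\{
      \frac{\bigl(\mu_i(a'') - \mu_i(a')\bigr)
            \bigl(2 M_{2i} - \mu_i(a'') - \mu_i(a')\bigr)}
           {2 \hat\sigma^2}
    \right\}.
```

The normalizing constants cancel because the variance is assumed constant, so the weight depends on the data only through the fitted means and $\hat\sigma^2$. Both the normality and the constant-variance assumption therefore enter the weights directly, which is not the case for the probability ratio based on a logistic model for binary M1.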

The natural effect model from step 6 can be fitted to the weighted imputations to obtain estimates for stratum-specific component effects. If both exposure A and confounders C are discrete, saturated models can be fitted as long as C is not high-dimensional. In all other cases, our approach demands model restrictions. This improves interpretability of the results, but also increases the risk of misspecification of the natural effect model which may, in turn, lead to biased estimation of the component effects. However, as long as the structure of the imputation model is chosen sufficiently rich so as to minimize the risk of it being misspecified, results from an overly restrictive natural effect model may still be viewed as a useful summary (Vansteelandt et al., 2012b).
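As a purely illustrative sketch of what such a restriction might look like, consider a natural effect model that includes all interactions between the three exposure arguments but only main effects of C. The link function $g$ and the parameters $\theta$ are assumptions introduced here, not notation from the text.

```latex
g\bigl[E\{Y(a, M_1(a'), M_2(a'', M_1(a'))) \mid C\}\bigr]
  = \theta_0 + \theta_1 a + \theta_2 a' + \theta_3 a''
  + \theta_4\, a a' + \theta_5\, a a'' + \theta_6\, a' a''
  + \theta_7\, a a' a'' + \theta_8^\top C
```

Setting $\theta_4 = \theta_5 = \theta_6 = \theta_7 = 0$ gives a main-effects model, under which $\theta_1$, $\theta_2$ and $\theta_3$ each describe the change in the mean on the scale of $g$ when $a$, $a'$ or $a''$, respectively, moves from 0 to 1 with the other two arguments held fixed. The fuller version lets each of these contrasts depend on the levels at which the other arguments are held.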

Component effects within strata of $C^*$, a subset of C, can be obtained by fitting a natural effect model for $E\{Y(a, M_1(a'), M_2(a'', M_1(a'))) \mid C^*\}$ conditional on $a$, $a'$, $a''$ and $C^*$ upon multiplying the weights from step 4 by $\hat{P}(A = A_i \mid C^*_i) / \hat{P}(A = A_i \mid C_i)$. If $C^*$ is empty, the corresponding natural effect model encodes population-average rather than stratum-specific effects and the numerator can simply be replaced by 1. Inverse weighting then enables transporting results to the general population as it accounts for the possibly selective nature of subjects with observed exposure $A = A_i$.
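The additional weight factor can be sketched as follows. The function name `exposure_weight` and its arguments are hypothetical: `ps_full` and `ps_sub` are assumed to be fitted probabilities of A = 1 given C and given C∗, respectively, obtained from exposure models that the analyst would have to specify (for example, logistic regressions).

```python
def exposure_weight(A, ps_full, ps_sub=None):
    """Multiplier P(A = A_i | C*_i) / P(A = A_i | C_i) for one individual.

    ps_full: fitted P(A = 1 | C_i)
    ps_sub:  fitted P(A = 1 | C*_i); None means C* is empty, so the numerator is 1
    """
    denom = ps_full if A == 1 else 1.0 - ps_full
    if ps_sub is None:
        return 1.0 / denom
    numer = ps_sub if A == 1 else 1.0 - ps_sub
    return numer / denom
```

The value returned for an individual would multiply the step 4 weight on each of that individual's rows in the extended data set before the natural effect model is fitted.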

Finally, standard errors and confidence intervals for this imputation estimator can be obtained using a bootstrap procedure (including steps 1-6). Bootstrapping is preferred over use of default standard errors for parameter estimates of natural effect models returned by statistical software as the latter fail to account for uncertainty due to estimation of the working models.
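A minimal sketch of such a bootstrap is given below. It assumes a hypothetical user-supplied function `estimator` that reruns steps 1 to 6 on a resampled data set and returns a single effect estimate. The percentile interval is one common choice and is an assumption here, since the text does not specify the interval type.

```python
import random

def bootstrap_ci(data, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Nonparametric bootstrap standard error and percentile interval.

    data:      list of original (non-extended) records, one per individual
    estimator: function mapping a list of records to a scalar estimate;
               it should refit the working models and redo the data
               extension, weighting and imputation on each resample
    """
    rng = random.Random(seed)
    n = len(data)
    stats = []
    for _ in range(n_boot):
        # Resample individuals with replacement from the ORIGINAL data,
        # not from the extended data set.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        stats.append(estimator(sample))
    stats.sort()
    mean = sum(stats) / n_boot
    se = (sum((s - mean) ** 2 for s in stats) / (n_boot - 1)) ** 0.5
    lower = stats[int((alpha / 2) * (n_boot - 1))]
    upper = stats[int((1 - alpha / 2) * (n_boot - 1))]
    return se, (lower, upper)
```

Resampling the original records rather than the extended rows keeps the four replicates of each individual together and lets every bootstrap replicate carry the uncertainty from re-estimating the working models.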

Technical appendix 5.A.4 provides a detailed description of how to adapt the above procedure to continuous exposures (building on Vansteelandt et al. (2012b)), and to settings without interactions between component effects (building on an estimation procedure similar to the one described in VanderWeele et al. (2014)). It also explains how to implement our procedure and obtain bootstrap-based standard errors and confidence intervals in R.

In the next section, we reassess the mediating mechanisms from the empirical example introduced earlier by applying our suggested procedure to obtain a three-way decomposition of the total effect of dampness or mold exposure (A) on the presence of depressive symptoms (Y).
