The adjustment formula

In document Flexible causal mediation analysis using natural effect models (Page 39-44)

The previous example illustrates that, in some cases, the identification result for P(Y|do(A = a)) obtained via the truncated factorization formula (in expression (2.5)) may be simplified to expression (2.7).

2.4.1 Conditional ignorability

This result can, in fact, be shown to naturally relate to a sufficient condition for identification of causal effects defined in the counterfactual outcomes framework, i.e. that of conditional ignorability. This assumption, denoted as a conditional independence statement involving counterfactual outcomes



\[ Y(a) \perp\!\!\!\perp A \mid C, \quad \text{for all } a \tag{2.8} \]

states that the counterfactual outcome Y(a) that (possibly contrary to fact) would have been observed under an intervention that sets A = a does not depend on the actual exposure level A within strata of a set of covariates C. Assumption (2.8) has also been named the assumption of no omitted confounders or no unmeasured confounding, to capture the more intuitive notion that C constitutes a sufficient set to adjust for potential confounding of the relation between A and Y.

When combined with a consistency assumption, which states that Y = Y(a) if A = a, conditional ignorability (2.8) allows the counterfactual distribution P(Y(a)), which essentially corresponds to P(Y|do(A = a)), to be expressed by the adjustment formula (2.7) as follows:

\[
\begin{aligned}
P(Y(a)) &= \sum_c P(Y(a) \mid C = c)\, P(C = c) \\
        &= \sum_c P(Y(a) \mid A = a, C = c)\, P(C = c) \\
        &= \sum_c P(Y \mid A = a, C = c)\, P(C = c).
\end{aligned}
\]
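As a numerical illustration of the adjustment formula, the following minimal sketch evaluates it on a small discrete distribution. The data-generating model (one binary confounder C, binary A and Y, and all probabilities) is a hypothetical assumption made for illustration, and the helper names `prob`, `adjusted` and `unadjusted` are not from the source.

```python
from itertools import product

# Hypothetical data-generating model; every number is an illustrative assumption.
p_c = {0: 0.6, 1: 0.4}                      # P(C = c)
p_a1_given_c = {0: 0.2, 1: 0.7}             # P(A = 1 | C = c)
p_y1_given_ac = {(0, 0): 0.1, (1, 0): 0.4,  # P(Y = 1 | A = a, C = c), keyed by (a, c)
                 (0, 1): 0.5, (1, 1): 0.8}

# Observed joint distribution P(C = c, A = a, Y = y) implied by the model.
joint = {}
for c, a, y in product((0, 1), repeat=3):
    pa = p_a1_given_c[c] if a == 1 else 1 - p_a1_given_c[c]
    py = p_y1_given_ac[(a, c)] if y == 1 else 1 - p_y1_given_ac[(a, c)]
    joint[(c, a, y)] = p_c[c] * pa * py


def prob(**fixed):
    """Probability of the event that fixes some of c, a, y to the given values."""
    names = ("c", "a", "y")
    return sum(p for key, p in joint.items()
               if all(dict(zip(names, key))[k] == v for k, v in fixed.items()))


def adjusted(a):
    """Adjustment formula: sum_c P(Y = 1 | A = a, C = c) P(C = c)."""
    return sum(prob(c=c, a=a, y=1) / prob(c=c, a=a) * prob(c=c) for c in (0, 1))


def unadjusted(a):
    """Plain conditional probability P(Y = 1 | A = a)."""
    return prob(a=a, y=1) / prob(a=a)


print(round(adjusted(1), 3), round(unadjusted(1), 3))
```

Under these assumed numbers the adjusted value for a = 1 is 0.56, whereas the unadjusted conditional probability is 0.68; the gap reflects confounding by C.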

2.4.2 The adjustment criterion


Shpitser et al. (2010) provide a graphical criterion for identification of P(Y|do(A = a)) by the adjustment formula (2.7); a criterion that, in other words, makes it possible to find all adjustment sets C that satisfy conditional ignorability (2.8). This adjustment criterion has been shown to generalize and subsume Pearl (1995a)'s back-door criterion.⁴

In order to provide a more precise and formal definition of this criterion, especially in the case where A may be a joint or sequential intervention, as in the examples discussed below, we will need to introduce the following terminology.

Definition 2.4.1. Proper causal path (Shpitser et al., 2010). Let X, Y be sets of nodes. A directed path from a node A ∈ X to a node in Y is called proper causal with respect to X if it does not intersect X except at A.

More generally, a path from X to Y is called proper if only its first node is in X (Perković et al., 2015). For example, suppose X = {A, M} in the graphs in Figure 2.3. In the graph in panel (A), there are two proper causal paths from X to Y, i.e. A → Y and M → Y. Note that A → M → Y is not proper causal with respect to X because it intersects X at M. In the graph in panel (B), there is an additional proper causal path from X to Y, i.e. A → L → Y.

Definition 2.4.2. Adjustment criterion (Shpitser et al., 2010). Z satisfies the adjustment criterion relative to (X, Y) in the original graph G if

(i) no element in Z is a descendant in $G_{\overline{X}}$ of any W ∉ X which lies on a proper causal path from X to Y, and

(ii) all proper⁵ non-causal paths in G from X to Y are blocked by Z.

The only non-causal path from {A, M} to Y in the graph in Figure 2.3A is M ← C → Y. This path can be blocked by C, which is not on a proper causal path from {A, M} to Y, nor is it a descendant of a node on such a proper causal path. So C satisfies the adjustment criterion relative to

⁴ For this reason, the back-door criterion is not further discussed.

⁵ Shpitser et al. (2010)'s original formulation claimed that all non-causal paths in G from X to Y should be blocked by Z. However, in accordance with Perković et al. (2015), we provide a slight reformulation in which this is only required for all proper non-causal paths.


[Figure 2.3: Two mediation graphs with different proper causal paths from {A, M} to Y. Panel (A) has nodes A, M, Y and C; panel (B) has nodes A, M, Y and L.]

({A, M}, Y) in this graph, such that P(Y|do(A = a, M = m)) is identified by

\[ P(Y \mid \mathrm{do}(A = a, M = m)) = \sum_c P(Y \mid A = a, M = m, C = c)\, P(C = c). \]

Likewise, in the graph in Figure 2.3B, L blocks the only non-causal path from {A, M} to Y, i.e. M ← L → Y. However, L lies on the proper causal path A → L → Y in $G_{\overline{AM}}$ and thus does not satisfy the adjustment criterion relative to ({A, M}, Y) in this graph. Nonetheless, P(Y|do(A = a, M = m)) can be computed from the observed data by expression (2.5), which yields

\[ P(Y \mid \mathrm{do}(A = a, M = m)) = \sum_l P(Y \mid A = a, M = m, L = l)\, P(L = l \mid A = a). \]
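The difference between this expression and the adjustment formula with Z = {L} can be checked numerically. The sketch below assumes a hypothetical model for the graph in Figure 2.3B, with edges A → L, A → M, L → M, A → Y, M → Y and L → Y as implied by the paths named in the text; all probabilities and function names are illustrative assumptions.

```python
from itertools import product

# Hypothetical conditional probabilities for Figure 2.3B (illustrative assumptions).
def p_l1(a):         # P(L = 1 | A = a)
    return 0.2 + 0.6 * a

def p_m1(a, l):      # P(M = 1 | A = a, L = l)
    return 0.1 + 0.3 * a + 0.4 * l

def p_y1(a, m, l):   # P(Y = 1 | A = a, M = m, L = l)
    return 0.1 + 0.2 * a + 0.3 * m + 0.3 * l

def bern(p1, value):
    """Probability that a Bernoulli(p1) variable equals value."""
    return p1 if value == 1 else 1 - p1

# Observed joint P(A, L, M, Y), assuming P(A = 1) = 0.5.
joint = {(a, l, m, y): 0.5 * bern(p_l1(a), l) * bern(p_m1(a, l), m)
                       * bern(p_y1(a, m, l), y)
         for a, l, m, y in product((0, 1), repeat=4)}

def prob(**fixed):
    """Probability of the event that fixes some of a, l, m, y."""
    names = ("a", "l", "m", "y")
    return sum(p for key, p in joint.items()
               if all(dict(zip(names, key))[k] == v for k, v in fixed.items()))

def truncated(a, m):
    """sum_l P(Y = 1 | a, m, l) P(L = l | A = a), from observed quantities only."""
    return sum(prob(a=a, m=m, l=l, y=1) / prob(a=a, m=m, l=l)
               * prob(a=a, l=l) / prob(a=a) for l in (0, 1))

def adjust_for_l(a, m):
    """sum_l P(Y = 1 | a, m, l) P(L = l): the adjustment formula with Z = {L}."""
    return sum(prob(a=a, m=m, l=l, y=1) / prob(a=a, m=m, l=l) * prob(l=l)
               for l in (0, 1))

print(round(truncated(1, 1), 3), round(adjust_for_l(1, 1), 3))
```

With these assumed numbers, the truncated factorization gives 0.84 for a = m = 1, while weighting by the marginal P(L = l) gives 0.75, because the latter ignores that L is itself affected by A.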

Intuitively, these examples illustrate that the first part of the adjustment criterion keeps us from adjusting for mediators, whereas the second part ensures that we adjust for common causes.
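For small graphs, the two conditions of Definition 2.4.2 can be checked mechanically by enumerating paths. The following is a minimal sketch, not an efficient or authoritative implementation. It assumes that the graph with a bar over X denotes G with all edges pointing into X removed, that Y is a single node, and that the two graphs of Figure 2.3 have the edges implied by the paths named in the text. All function names are hypothetical.

```python
def descendants(edges, node):
    """Return node together with every node reachable from it along directed edges."""
    seen, stack = {node}, [node]
    while stack:
        cur = stack.pop()
        for parent, child in edges:
            if parent == cur and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def simple_paths(edges, start, goal, path=None):
    """Yield all simple paths from start to goal, ignoring edge direction."""
    path = path or [start]
    if start == goal:
        yield path
        return
    for u, v in edges:
        for here, nxt in ((u, v), (v, u)):
            if here == start and nxt not in path:
                yield from simple_paths(edges, nxt, goal, path + [nxt])

def is_causal(edges, path):
    """A path is causal if every edge points forward along it."""
    return all((u, v) in edges for u, v in zip(path, path[1:]))

def is_blocked(edges, path, z):
    """d-separation rule for a single path given the conditioning set z."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider and not (descendants(edges, mid) & z):
            return True   # collider with no descendant in z blocks the path
        if not collider and mid in z:
            return True   # conditioned non-collider blocks the path
    return False

def satisfies_adjustment_criterion(edges, x, y, z):
    edges, x, z = set(edges), set(x), set(z)
    # Proper paths: only the first node belongs to X.
    proper = [p for s in x for p in simple_paths(edges, s, y)
              if not (set(p[1:]) & x)]
    g_bar_x = {(u, v) for u, v in edges if v not in x}  # edges into X removed
    forbidden = set()
    for p in proper:
        if is_causal(edges, p):
            for w in p[1:]:  # nodes outside X on a proper causal path
                forbidden |= descendants(g_bar_x, w)
    cond_i = not (forbidden & z)
    cond_ii = all(is_blocked(edges, p, z)
                  for p in proper if not is_causal(edges, p))
    return cond_i and cond_ii

graph_a = [("A", "M"), ("A", "Y"), ("M", "Y"), ("C", "M"), ("C", "Y")]
graph_b = [("A", "M"), ("A", "Y"), ("M", "Y"), ("A", "L"), ("L", "M"), ("L", "Y")]
print(satisfies_adjustment_criterion(graph_a, {"A", "M"}, "Y", {"C"}))
print(satisfies_adjustment_criterion(graph_b, {"A", "M"}, "Y", {"L"}))
```

Under these assumptions the check succeeds for Z = {C} in panel (A) and fails for Z = {L} in panel (B), where L violates condition (i). Path enumeration is exponential in the worst case, so this approach only suits small graphs.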

2.4.3 Flexible estimation strategies for the adjustment formula

Most often, interest lies in comparing some mean outcome of interest under different hypothetical interventions in the population. That is, E(Y|do(A = a)) is the causal quantity of interest, rather than the interventional distribution P(Y|do(A = a)) per se. Estimating this quantity from observed data via direct application of the adjustment formula may be cumbersome, as it requires modeling P(C = c). This can be challenging, especially when C contains continuous covariates and/or is high-dimensional and data are sparse.


Below we show that there are two ways of rewriting the adjustment formula that give rise to estimators that may considerably reduce modeling demands, in the sense that neither requires modeling P(C = c).

Inverse probability weighting

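The first estimator arises from rewriting the adjustment formula as follows:

\[
\begin{aligned}
E(Y \mid \mathrm{do}(A = a)) &= \sum_{y,c} y \cdot P(Y = y \mid A = a, C = c)\, P(C = c) \\
&= \sum_{y,c} y \cdot \frac{P(Y = y, A = a, C = c)}{P(A = a \mid C = c)} \\
&= \sum_{y,c} y \cdot \frac{P(Y = y, C = c \mid A = a)\, P(A = a)}{P(A = a \mid C = c)} \\
&= E\left[ \frac{Y\, I(A = a)}{P(A = a \mid C)} \right].
\end{aligned}
\]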

The corresponding sample estimator

\[ \frac{1}{n} \sum_{i=1}^{n} \frac{Y_i\, I(A_i = a)}{\hat{P}(A_i = a \mid C_i)} \]

corresponds to a weighted mean outcome, where each individual exposed at level A = a is weighted by the inverse of their estimated propensity of being exposed at that level given baseline covariates C, P̂(A = a|C). Inverse weighting can be thought of as aiming to construct a pseudo-population in which confounding by C is eliminated (i.e. mimicking a randomized trial). This weighting-based estimator thus focuses solely on modeling the relation between A and C, as it only requires a propensity score model for P(A|C).
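The weighted mean described above can be sketched on simulated data as follows. The simulation model (a single binary confounder C), the use of within-stratum proportions as the propensity model, and the function names are all assumptions made for illustration; with continuous or high-dimensional C one would typically fit a parametric propensity model instead.

```python
import random

def simulate(n, seed=0):
    """Draw (C, A, Y) from a hypothetical model with one binary confounder."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = int(rng.random() < 0.4)
        a = int(rng.random() < 0.2 + 0.5 * c)
        y = int(rng.random() < 0.1 + 0.3 * a + 0.4 * c)
        data.append((c, a, y))
    return data

def fit_propensity(data, a):
    """Estimate P(A = a | C = c) by the observed proportion in each stratum of C."""
    out = {}
    for c in {row[0] for row in data}:
        stratum = [row for row in data if row[0] == c]
        out[c] = sum(row[1] == a for row in stratum) / len(stratum)
    return out

def ipw_mean(data, a):
    """(1/n) * sum_i Y_i I(A_i = a) / P_hat(A_i = a | C_i)."""
    ps = fit_propensity(data, a)
    return sum(y / ps[c] for c, a_i, y in data if a_i == a) / len(data)

data = simulate(20000)
ipw = ipw_mean(data, 1)
# Crude (unweighted) mean outcome among those with A = 1, for comparison.
crude = sum(y for _, a, y in data if a == 1) / sum(a == 1 for _, a, _ in data)
print(round(ipw, 3), round(crude, 3))
```

Under the assumed model, the true value of E(Y|do(A = 1)) is 0.56, while the crude mean among the exposed targets 0.68; the weighted estimate should land near the former.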


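The second estimator results from simply applying the law of iterated expectations, so that one can average over the empirical distribution of C in the observed data, as follows:

\[ E(Y \mid \mathrm{do}(A = a)) = \sum_c E(Y \mid A = a, C = c)\, P(C = c) = E\left[ E(Y \mid A = a, C) \right], \]

where the outer expectation is taken over the marginal distribution of C.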

The resulting expression gives rise to an imputation-based estimator

\[ \frac{1}{n} \sum_{i=1}^{n} \hat{E}(Y_i \mid A_i = a, C_i) \]

that requires imputing each individual’s outcome under observed levels of the covariate set C but a (possibly) counterfactual exposure level a. E(Y|do(A = a)) can then be estimated by simply calculating the mean of these imputed outcomes. This estimator thus focuses on modeling the relation between Y and C within strata of A as it only requires an imputation model for the mean outcome E(Y|A, C).
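The imputation steps just described can be sketched as follows, again on simulated data. The simulation model and the saturated outcome model (cell means of Y within each combination of A and C) are illustrative assumptions; in practice a regression model for E(Y|A, C) would usually take the place of `fit_outcome_model`, a hypothetical helper name.

```python
import random
from collections import defaultdict

def simulate(n, seed=1):
    """Draw (C, A, Y) from a hypothetical model with a continuous outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = int(rng.random() < 0.4)
        a = int(rng.random() < 0.2 + 0.5 * c)
        y = 1.0 + 2.0 * a + 3.0 * c + rng.gauss(0.0, 1.0)
        data.append((c, a, y))
    return data

def fit_outcome_model(data):
    """Saturated outcome model: the sample mean of Y in each (a, c) cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, a, y in data:
        sums[(a, c)] += y
        counts[(a, c)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

def imputation_mean(data, a):
    """Impute every individual's mean outcome at exposure level a, then average."""
    model = fit_outcome_model(data)
    imputed = [model[(a, c)] for c, _, _ in data]  # observed C, counterfactual a
    return sum(imputed) / len(imputed)

data = simulate(20000)
mean_1 = imputation_mean(data, 1)
mean_0 = imputation_mean(data, 0)
print(round(mean_1, 2), round(mean_0, 2), round(mean_1 - mean_0, 2))
```

Under the assumed model the targets are E(Y|do(A = 1)) = 4.2 and E(Y|do(A = 0)) = 2.2, so the estimated contrast should be close to 2.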

Marginal structural models

E(Y|do(A = a)) or E(Y(a)) can be parameterized using so-called marginal structural models (Robins, 1999; Robins et al., 2000). The parameters of such models correspond to interventional contrasts of interest. For instance, in the marginal structural model

\[ E(Y(a)) = \beta_0 + \beta_1 a, \tag{2.9} \]

β1 captures the average causal effect of a one-unit increase in the exposure; for a binary exposure, this is the effect of changing the exposure from A = 0 to A = 1, i.e. E(Y(1) − Y(0)).

Model (2.9) could be considered a special case of a wider class of generalized linear marginal structural models

\[ E(Y(a)) = g^{-1}\{\beta^\top W(a)\} \tag{2.10} \]

with W(a) a known vector with components that may depend on a. W may be specified so as to accommodate non-linearities in the case of a continuous exposure. β is an unknown parameter vector and g(·) a known link function, the choice of which permits some flexibility as to the scale on which the causal effect of interest is desired to be expressed.

The marginal structural model framework provides a natural environment for implementing the aforementioned estimators. That is, marginal structural models are traditionally fitted by weighted regression models, in which the weights correspond to the inverse probability weights discussed in section 2.4.3 (Robins et al., 2000). Alternatively, one may regress imputed mean outcomes on the exposure (Snowden et al., 2011). The latter approach is, however, computationally more intensive, as it requires replicating the original data along multiple values of the exposure and imputing outcomes for each individual under each of these exposure levels.
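The weighted-regression route can be sketched for the linear model (2.9) with a binary exposure. The simulation model, the within-stratum propensity estimates and the closed-form weighted least squares helper are illustrative assumptions, and the function names are hypothetical. The sketch returns point estimates only and does not address standard errors.

```python
import random

def simulate(n, seed=2):
    """Draw (C, A, Y) from a hypothetical model with a continuous outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = int(rng.random() < 0.4)
        a = int(rng.random() < 0.2 + 0.5 * c)
        y = 1.0 + 2.0 * a + 3.0 * c + rng.gauss(0.0, 1.0)
        data.append((c, a, y))
    return data

def ip_weights(data):
    """Weights 1 / P_hat(A = A_i | C_i), with P_hat a within-stratum proportion."""
    n_c, n_ac = {}, {}
    for c, a, _ in data:
        n_c[c] = n_c.get(c, 0) + 1
        n_ac[(a, c)] = n_ac.get((a, c), 0) + 1
    return [n_c[c] / n_ac[(a, c)] for c, a, _ in data]

def weighted_least_squares(x, y, w):
    """Closed-form weighted fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x)))
    return ybar - b1 * xbar, b1

data = simulate(20000)
a_obs = [a for _, a, _ in data]
y_obs = [y for _, _, y in data]
beta0, beta1 = weighted_least_squares(a_obs, y_obs, ip_weights(data))
print(round(beta0, 2), round(beta1, 2))
```

Under the assumed model, E(Y(a)) = 2.2 + 2a, so the weighted fit should return estimates close to β0 = 2.2 and β1 = 2, whereas an unweighted regression of Y on A would also absorb the confounding by C.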

In chapter 3, similar estimators will be developed for estimating natural direct and indirect effects in a mediation context. Similarly, marginal structural models will be generalized to parameterize mean nested counterfactuals E(Y(a, M(a′))). The motivation for these extensions follows from the fact that the adjustment criterion can be generalized to covariate sets that enable identifying natural direct and indirect effects by a generalized adjustment formula for mediation analysis (Shpitser and VanderWeele, 2011).
