The previous example illustrates that, in some cases, the identification result for P(Y|do(A = a)) obtained via the truncated factorization formula (in expression (2.5)) may be simplified to expression (2.7).

**2.4.1** **Conditional ignorability**

This result can, in fact, be shown to relate naturally to a sufficient condition for identification of causal effects defined in the counterfactual outcomes framework, namely conditional ignorability. This assumption, expressed as a conditional independence statement involving counterfactual outcomes,

$$Y(a) \perp\!\!\!\perp A \mid C, \quad \text{for all } a, \tag{2.8}$$

states that the counterfactual outcome Y(a) that (possibly contrary to fact) would have been observed under an intervention that sets A = a does not depend on the actual level of A within strata of a set of covariates C. Assumption (2.8) has also been named the assumption of no omitted confounders or no unmeasured confounding, to capture the more intuitive notion that C constitutes a sufficient set to adjust for potential confounding of the relation between A and Y.

When combined with a consistency assumption, which states that Y = Y(a) if A = a, conditional ignorability (2.8) allows the counterfactual distribution P(Y(a)), which essentially corresponds to P(Y|do(A = a)), to be expressed by the adjustment formula (2.7) as follows:

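$$\begin{aligned}
P(Y(a)) &= \sum_{c} P(Y(a) \mid C=c)\,P(C=c) \\
&= \sum_{c} P(Y(a) \mid A=a, C=c)\,P(C=c) \\
&= \sum_{c} P(Y \mid A=a, C=c)\,P(C=c).
\end{aligned}$$

Here the second equality follows from conditional ignorability (2.8) and the third from consistency.

**2.4.2** **The adjustment criterion**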


… identification of P(Y|do(A = a)) by the adjustment formula (2.7); a criterion that, in other words, makes it possible to find all adjustment sets C that satisfy conditional ignorability (2.8). This adjustment criterion has been shown to generalize and subsume Pearl (1995a)'s back-door criterion.⁴

In order to provide a more precise and formal definition of this criterion, especially in the case where A may be a joint or sequential intervention, as in the examples discussed below, we will need to introduce the following terminology.

**Definition 2.4.1.** Proper causal path (Shpitser et al., 2010). Let X, Y be sets of nodes. A directed path from a node A ∈ X to a node in Y is called proper causal with respect to X if it does not intersect X except at A.

More generally, a path from X to Y is called proper if only its first node is in X (Perković et al., 2015). For example, suppose X = {A, M} in the graphs in Figure 2.3. In the graph in panel (A), there are two proper causal paths from X to Y, i.e. A → Y and M → Y. Note that A → M → Y is not proper causal with respect to X because it intersects X at M. In the graph in panel (B), there is an additional proper causal path from X to Y, i.e. A → L → Y.
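The enumeration of proper causal paths in this example can be reproduced with a short sketch. The edge lists below are an assumption: they are read off the textual description of Figure 2.3 rather than taken from the figure itself, and the function name `proper_causal_paths` is purely illustrative.

```python
from typing import Dict, List, Set

# Assumed edges, inferred from the text's description of Figure 2.3.
GRAPH_A: Dict[str, List[str]] = {
    "A": ["M", "Y"],
    "M": ["Y"],
    "C": ["M", "Y"],
    "Y": [],
}
GRAPH_B: Dict[str, List[str]] = {
    "A": ["L", "M", "Y"],
    "L": ["M", "Y"],
    "M": ["Y"],
    "Y": [],
}


def proper_causal_paths(
    graph: Dict[str, List[str]], X: Set[str], Y: Set[str]
) -> List[List[str]]:
    """Directed paths from X to Y whose only node in X is the first one.

    For simplicity a path is closed as soon as it reaches a node in Y.
    """
    found: List[List[str]] = []

    def extend(path: List[str]) -> None:
        for child in graph[path[-1]]:
            if child in X or child in path:
                continue  # re-entering X would make the path improper
            if child in Y:
                found.append(path + [child])
            else:
                extend(path + [child])

    for start in sorted(X):
        extend([start])
    return found


if __name__ == "__main__":
    for name, graph in (("A", GRAPH_A), ("B", GRAPH_B)):
        paths = proper_causal_paths(graph, {"A", "M"}, {"Y"})
        print(name, [" -> ".join(p) for p in paths])
```

Under these assumed edges, panel (A) yields A → Y and M → Y, and panel (B) additionally yields A → L → Y, in line with the text.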

**Definition 2.4.2.** Adjustment criterion (Shpitser et al., 2010). Z satisfies the adjustment criterion relative to (X, Y) in the original graph G if

(i) no element in Z is a descendant in G_X of any W ∉ X which lies on a proper causal path from X to Y, and

(ii) all proper⁵ non-causal paths in G from X to Y are blocked by Z.

⁴ For this reason, the back-door criterion is not further discussed.

⁵ Shpitser et al. (2010)'s original formulation claimed that all non-causal paths in G from X to Y should be blocked by Z. However, in accordance with Perković et al. (2015), we provide a slight reformulation in which this is only required for all proper non-causal paths.

**Figure 2.3:** Two mediation graphs with different proper causal paths from {A, M} to Y. Panel (A) contains the nodes A, M, Y and C; panel (B) contains the nodes A, M, Y and L.

The only non-causal path from {A, M} to Y in the graph in Figure 2.3A is M ← C → Y. This path can be blocked by C, which is not on a proper causal path from {A, M} to Y, nor is it a descendant of a node on such a proper causal path. So C satisfies the adjustment criterion relative to ({A, M}, Y) in this graph, such that P(Y|do(A = a, M = m)) is identified by

$$P(Y \mid \mathrm{do}(A=a, M=m)) = \sum_{c} P(Y \mid A=a, M=m, C=c)\,P(C=c).$$

Likewise, in the graph in Figure 2.3B, L blocks the only non-causal path from {A, M} to Y, i.e. M ← L → Y. However, L lies on the proper causal path A → L → Y in G_{AM} and thus does not satisfy the adjustment criterion relative to ({A, M}, Y) in this graph. Nonetheless, P(Y|do(A = a, M = m)) can be computed from the observed data by expression (2.5), which yields

$$P(Y \mid \mathrm{do}(A=a, M=m)) = \sum_{l} P(Y \mid A=a, M=m, L=l)\,P(L=l \mid A=a).$$

Intuitively, these examples illustrate that the first part of the adjustment criterion keeps us from adjusting for mediators, whereas the second part ensures that we adjust for common causes.
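To make the contrast in the Figure 2.3B example concrete, the following sketch computes both expressions exactly for binary variables. All conditional probabilities are hypothetical values chosen for illustration, and the edges (A → L, A → M, A → Y, L → M, L → Y, M → Y) are assumed from the textual description of the graph. It compares the expression obtained from (2.5), which weights by P(L = l|A = a), with a naive use of the adjustment formula (2.7) with L as adjustment set, which weights by P(L = l).

```python
from itertools import product

NAMES = ("a", "l", "m", "y")
P_A1 = 0.5  # hypothetical P(A = 1)


def p_l1(a):
    return 0.2 + 0.6 * a  # hypothetical P(L = 1 | A = a)


def p_m1(a, l):
    return 0.3 + 0.2 * a + 0.4 * l  # hypothetical P(M = 1 | A = a, L = l)


def p_y1(a, m, l):
    return 0.1 + 0.2 * a + 0.3 * m + 0.3 * l  # hypothetical P(Y = 1 | a, m, l)


def bern(p, value):
    return p if value == 1 else 1.0 - p


# Observational joint distribution implied by the assumed graph.
JOINT = {
    (a, l, m, y): bern(P_A1, a) * bern(p_l1(a), l)
    * bern(p_m1(a, l), m) * bern(p_y1(a, m, l), y)
    for a, l, m, y in product((0, 1), repeat=4)
}


def prob(**fixed):
    """Observational probability that the named variables take these values."""
    return sum(
        p for values, p in JOINT.items()
        if all(values[NAMES.index(k)] == v for k, v in fixed.items())
    )


def cond_y1(a, m, l):
    return prob(a=a, l=l, m=m, y=1) / prob(a=a, l=l, m=m)


def truncated_factorization(a, m):
    # sum_l P(Y=1 | a, m, l) P(l | a), as obtained from expression (2.5)
    return sum(cond_y1(a, m, l) * prob(a=a, l=l) / prob(a=a) for l in (0, 1))


def adjust_for_l(a, m):
    # sum_l P(Y=1 | a, m, l) P(l): formula (2.7) with L as adjustment set
    return sum(cond_y1(a, m, l) * prob(l=l) for l in (0, 1))


def interventional_truth(a, m):
    # Set A and M by intervention; L still responds to A in the model.
    return sum(p_y1(a, m, l) * bern(p_l1(a), l) for l in (0, 1))


if __name__ == "__main__":
    print(truncated_factorization(1, 1), adjust_for_l(1, 1), interventional_truth(1, 1))
```

With these hypothetical numbers, the expression based on (2.5) recovers the interventional probability P(Y = 1|do(A = 1, M = 1)) = 0.84, whereas adjusting for L as if it were a confounder gives 0.75, because it ignores the effect of A on L.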

**2.4.3** **Flexible estimation strategies for the adjustment formula**

Most often, interest lies in comparing the mean outcome under different hypothetical interventions in the population. That is, E(Y|do(A = a)) is the causal quantity of interest, rather than the interventional distribution P(Y|do(A = a)) per se. Estimating this quantity from observed data via direct application of the adjustment formula may be cumbersome, as it requires modeling P(C = c). This can be challenging, especially when C is high-dimensional and/or contains continuous covariates and data are sparse.


Below we show that there are two ways of rewriting the adjustment formula that give rise to estimators that may considerably reduce modeling demands, in the sense that neither requires modeling P(C = c).

**Inverse probability weighting**

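The first estimator arises from rewriting the adjustment formula as follows:

$$\begin{aligned}
E(Y \mid \mathrm{do}(A=a)) &= \sum_{y,c} y \cdot P(Y=y \mid A=a, C=c)\,P(C=c) \\
&= \sum_{y,c} y \cdot \frac{P(Y=y, A=a, C=c)}{P(A=a \mid C=c)} \\
&= \sum_{y,c} y \cdot \frac{P(Y=y, C=c \mid A=a)\,P(A=a)}{P(A=a \mid C=c)} \\
&= E\left\{\frac{Y\,I(A=a)}{P(A=a \mid C)}\right\}.
\end{aligned}$$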

The corresponding sample estimator

$$n^{-1}\sum_{i=1}^{n} \frac{Y_i\,I(A_i=a)}{\hat{P}(A_i=a \mid C_i)}$$

corresponds to a weighted mean outcome, where each individual exposed at level A = a is weighted by the inverse of their propensity of being exposed at that level given baseline covariates C, P̂(A = a|C). Inverse weighting can be thought of as constructing a pseudo-population in which confounding by C is eliminated (i.e. mimicking a randomized trial). This weighting-based estimator thus focuses solely on modeling the relation between A and C, as it only requires a propensity score model for P(A|C).
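As an illustration, the weighted mean above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not a general implementation: C is taken to be a single discrete covariate so that the propensity score can be estimated by stratum frequencies, and the data-generating mechanism in `simulate` is entirely hypothetical.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data: binary confounder C, binary exposure A, outcome Y."""
    rng = np.random.default_rng(seed)
    c = rng.binomial(1, 0.5, size=n)
    a = rng.binomial(1, 0.2 + 0.6 * c)  # exposure probability depends on C
    y = 1.0 + 2.0 * a + 3.0 * c + rng.normal(size=n)
    return y, a, c


def ipw_mean(y, a, c, level):
    """n^-1 * sum_i Y_i I(A_i = level) / P_hat(A_i = level | C_i), discrete C."""
    y, a, c = (np.asarray(v) for v in (y, a, c))
    exposed = (a == level).astype(float)
    propensity = np.empty(len(y))
    for stratum in np.unique(c):
        in_stratum = c == stratum
        # Nonparametric propensity estimate: exposure frequency in the stratum.
        propensity[in_stratum] = exposed[in_stratum].mean()
    # Individuals not exposed at `level` get weight zero; the `where` guard
    # avoids 0/0 in strata where nobody is exposed at that level.
    weights = np.divide(
        exposed, propensity, out=np.zeros_like(exposed), where=exposed > 0
    )
    return float(np.mean(y * weights))


if __name__ == "__main__":
    y, a, c = simulate(200_000)
    print("IPW estimate of E(Y(1)):", ipw_mean(y, a, c, 1))
    print("Unadjusted mean among exposed:", y[a == 1].mean())
```

In this hypothetical setup E(Y(1)) = 1 + 2 + 3 · 0.5 = 4.5, while the unadjusted mean among the exposed is 5.4 in expectation because exposed individuals more often have C = 1. With continuous or high-dimensional C, the stratum frequencies would be replaced by a fitted propensity score model.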

**Imputation**

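The second estimator results from simply applying the law of iterated expectations, so that one can average over the empirical distribution of C in the observed data, as follows:

$$E(Y \mid \mathrm{do}(A=a)) = \sum_{c} E(Y \mid A=a, C=c)\,P(C=c) = E\{E(Y \mid A=a, C)\},$$

where the outer expectation is taken over the marginal distribution of C (rather than its distribution among those with A = a).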

The resulting expression gives rise to an imputation-based estimator

$$n^{-1}\sum_{i=1}^{n} \hat{E}(Y_i \mid A_i=a, C_i)$$

that requires imputing each individual’s outcome under observed levels of the covariate set C but a (possibly) counterfactual exposure level a. E(Y|do(A = a)) can then be estimated by simply calculating the mean of these imputed outcomes. This estimator thus focuses on modeling the relation between Y and C within strata of A as it only requires an imputation model for the mean outcome E(Y|A, C).
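The imputation-based estimator can be sketched along the same lines. The sketch below assumes, purely for illustration, a linear outcome model E(Y|A, C) = b0 + b1 A + b2 C fitted by ordinary least squares; the model form, the function names and the simulated data are hypothetical choices, not part of the original development.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data: continuous confounder C, binary exposure A, outcome Y."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)
    a = rng.binomial(1, 1.0 / (1.0 + np.exp(-c)))  # exposure depends on C
    y = 1.0 + 2.0 * a + 3.0 * c + rng.normal(size=n)
    return y, a, c


def imputation_mean(y, a, c, level):
    """n^-1 * sum_i E_hat(Y | A = level, C_i) under an assumed linear model."""
    y, a, c = (np.asarray(v, dtype=float) for v in (y, a, c))
    # Step 1: fit the outcome model E(Y | A, C) on the observed data.
    design = np.column_stack([np.ones_like(a), a, c])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Step 2: impute each individual's mean outcome at the (possibly
    # counterfactual) exposure level, keeping the observed covariate C_i.
    counterfactual = np.column_stack(
        [np.ones_like(a), np.full_like(a, level), c]
    )
    # Step 3: average the imputed outcomes over all n individuals.
    return float(np.mean(counterfactual @ coef))


if __name__ == "__main__":
    y, a, c = simulate(200_000)
    print("Imputation estimate of E(Y(1)):", imputation_mean(y, a, c, 1))
    print("Imputation estimate of E(Y(0)):", imputation_mean(y, a, c, 0))
```

In this hypothetical setup E(Y(1)) = 3 and E(Y(0)) = 1. Note that the average runs over all individuals, not only those observed at A = a, which is what distinguishes it from the unadjusted mean E(Y|A = a).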

**Marginal structural models**

E(Y|do(A = a)) or E(Y(a)) can be parameterized using so-called marginal structural models (Robins, 1999; Robins et al., 2000). The parameters of such models correspond to interventional contrasts of interest. For instance, in the marginal structural model

$$E(Y(a)) = \beta_0 + \beta_1 a, \tag{2.9}$$

β1 captures the average causal effect of a one-unit increase in the exposure, e.g. from A = 0 to A = 1, i.e. E(Y(1) − Y(0)). More generally, under model (2.9), E(Y(a) − Y(0)) = β1 a.

Model (2.9) could be considered a special case of a wider class of generalized linear marginal structural models

$$E(Y(a)) = g^{-1}\{\beta^{\top} W(a)\}, \tag{2.10}$$

with W(a) a known vector with components that may depend on a. W may be specified so as to accommodate non-linearities in the case of a continuous exposure. β is an unknown parameter vector and g(·) a known link function, the choice of which permits some flexibility as to the scale on which the causal effect of interest is expressed.

The marginal structural model framework provides a natural environment for implementing the aforementioned estimators. That is, marginal structural models are traditionally fitted by weighted regression models, in which the weights correspond to the inverse probability weights discussed in section 2.4.3 (Robins et al., 2000). Alternatively, one may regress imputed mean outcomes on the exposure (Snowden et al., 2011). The latter approach is, however, computationally more intensive, as it requires replicating the original data along multiple values of the exposure and imputing outcomes for each individual under each of these exposure levels.
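The weighted-regression route can be sketched as follows for the linear model (2.9) with a binary exposure. The sketch assumes a single discrete covariate C, so that the weights 1/P̂(A_i|C_i) can be obtained from stratum frequencies, and it uses plain weighted least squares; the function names and the simulated data are hypothetical.

```python
import numpy as np


def simulate(n, seed=0):
    """Hypothetical data: binary confounder C, binary exposure A, outcome Y."""
    rng = np.random.default_rng(seed)
    c = rng.binomial(1, 0.5, size=n)
    a = rng.binomial(1, 0.2 + 0.6 * c)
    y = 1.0 + 2.0 * a + 3.0 * c + rng.normal(size=n)
    return y, a, c


def fit_linear_msm(y, a, c):
    """Weighted least squares fit of E(Y(a)) = beta0 + beta1 * a."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a)
    c = np.asarray(c)
    # P_hat(A_i = observed level | C_i), estimated by stratum frequencies.
    p_observed = np.empty(len(y))
    for stratum in np.unique(c):
        in_stratum = c == stratum
        for level in np.unique(a[in_stratum]):
            cell = in_stratum & (a == level)
            p_observed[cell] = cell.sum() / in_stratum.sum()
    weights = 1.0 / p_observed
    # Weighted least squares via rescaling rows by sqrt(weight).
    design = np.column_stack([np.ones(len(y)), a.astype(float)])
    root_w = np.sqrt(weights)
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return beta


if __name__ == "__main__":
    beta0, beta1 = fit_linear_msm(*simulate(200_000))
    print("beta0:", beta0, "beta1:", beta1)
```

In this hypothetical setup the true values are β0 = E(Y(0)) = 2.5 and β1 = E(Y(1) − Y(0)) = 2. An unweighted regression of Y on A would instead be biased by the confounding through C. Valid standard errors would require a robust (sandwich) or bootstrap variance, which this sketch does not compute.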

In chapter 3, similar estimators will be developed for estimating natural direct and indirect effects in a mediation context. Similarly, marginal structural models will be generalized to parameterize mean nested counterfactuals E(Y(a, M(a′))). The motivation for these extensions follows from the fact that the adjustment criterion can be generalized to covariate sets that enable identifying natural direct and indirect effects by a generalized adjustment formula for mediation analysis (Shpitser and VanderWeele, 2011).