**2.5 Identifiability in the presence of hidden variables**

**2.5.5 Conditional causal effects**

An extension of the ID algorithm for identifying conditional causal effects in subsets of the population defined by strata of a covariate set C – i.e. causal queries of the form P(Y|do(A = a), C) – was later developed by Shpitser and Pearl (2006a) and is referred to as the IDC algorithm, shown in Figure 2.8. The logic behind this algorithm is to re-express P(Y|do(A = a), C) in terms of unconditional interventional distributions, such that further identification can be obtained using the ID algorithm.

INPUT: disjoint sets A, Y, Z ⊂ V.
OUTPUT: Expression for P(Y|do(A = a), Z) in terms of P or FAIL.

1. If there exists a variable W ∈ Z such that Y ⊥⊥ W | A, Z\{W} in the subgraph G_{A̅W̲}, return IDC(Y, A ∪ {W}, Z\{W}).
2. Else let P′ = ID(Y ∪ Z, A) and return P′ / ∑_y P′.

**Figure 2.8:** Algorithm IDC(Y, A, Z) (Shpitser and Pearl, 2006a)

This can be achieved by relying on rule 2 of Pearl (1995a)'s do-calculus, which allows interventions do(W = w) and observations W = w to be exchanged whenever the conditional independence Y ⊥⊥ W | A, Z\{W} holds in the subgraph G_{A̅W̲}, obtained by removing from the original graph G all edges pointing into nodes in A and all edges emanating from nodes in W. The idea is to iteratively apply this rule to find the unique maximal set W ⊆ C that enables expressing P(Y|do(A = a), C) as P(Y|do(A = a, W), C\W). If W = C, then P(Y|do(A = a), C) simply equals P(Y|do(A = a, C)). Often, however, not all conditioning variables can be eliminated, that is, W ⊂ C. In this case, P(Y|do(A = a), C) equals

P(Y, C\W|do(A = a, W)) / P(C\W|do(A = a, W)) = P(Y, C\W|do(A = a, W)) / ∑_y P(Y = y, C\W|do(A = a, W)),
such that its identification ultimately depends on identification of the unconditional joint interventional distribution P(Y, C\W|do(A = a, W)), which can be assessed by the ID algorithm.

Suppose that interest lies in the effect of A on Y conditional on {C1, C2, C3}, i.e. P(Y|do(A = a), C1, C2, C3), in graph G in Figure 2.7 (or Figure 2.10).¹³ It follows from the subgraphs

¹³ It is worth noting here that if, contrary to fact, P(Y|do(A = a)) were identified by the adjustment formula upon adjusting for {C1, C2, C3} (refer to section 2.5.4), this would necessarily have implied that P(Y|do(A = a), C1, C2, C3) were likewise identified. This can be seen upon noting that identification of both interventional distributions can be obtained under conditional ignorability Y ⊥⊥ A | C1, C2, C3, which would be implied if {C1, C2, C3} were to satisfy the adjustment criterion with respect to (A, Y). As opposed to more general identification strategies, such as the ID algorithm, identification by the adjustment criterion can thus be conceived as being agnostic as to whether the interventional distribution is

**Figure 2.9:** Different subgraphs of G in Figure 2.7 that aid in finding a maximal set W ⊆ C through recursive applications of the first step of the IDC algorithm. Its three panels depict the subgraphs G_{A̅C̲1}, G_{A̅C̲2} and G_{A̅C̲3}, each over the nodes A, M, Y, C1, C2, C3, U1 and U2; in the first panel Y ⊥⊥ C1 | A, C2, C3 holds, whereas in the other two Y is not independent of C2 given A, C1, C3, nor of C3 given A, C1, C2.
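The recursive search for a maximal set W illustrated in Figure 2.9 can be sketched in code. Below is a minimal sketch, assuming the DAG is stored as a set of directed edges and using a moralisation-based d-separation test; the toy graph at the end is hypothetical and is not the graph of Figure 2.7.

```python
def ancestors(edges, nodes):
    """All ancestors of `nodes` (inclusive) in a DAG given as (tail, head) edges."""
    anc = set(nodes)
    while True:
        new = {u for (u, v) in edges if v in anc} - anc
        if not new:
            return anc
        anc |= new

def d_separated(edges, xs, ys, zs):
    """Test xs ⊥⊥ ys | zs via moralisation: keep the ancestral graph of
    xs ∪ ys ∪ zs, marry co-parents, drop directions, delete zs, and check
    that no path connects xs to ys."""
    anc = ancestors(edges, xs | ys | zs)
    sub = {(u, v) for (u, v) in edges if u in anc and v in anc}
    und = {frozenset(e) for e in sub}
    for v in anc:                                   # marry each node's parents
        pa = [u for (u, w) in sub if w == v]
        und |= {frozenset((p, q)) for p in pa for q in pa if p != q}
    und = {e for e in und if not (e & zs)}          # delete conditioned nodes
    reach = set(xs)
    while True:
        new = {w for e in und if e & reach for w in e} - reach
        if not new:
            return not (reach & ys)
        reach |= new

def cut(edges, A, W):
    """The subgraph obtained by dropping edges into A and edges out of W."""
    return {(u, v) for (u, v) in edges if v not in A and u not in W}

def maximal_w(edges, Y, A, Z):
    """First step of IDC: repeatedly move a covariate W from behind the
    conditioning bar to behind the do-operator whenever rule 2 licenses
    the exchange; return the set of covariates that could be moved."""
    A, Z, moved = set(A), set(Z), set()
    progress = True
    while progress:
        progress = False
        for W in sorted(Z):
            if d_separated(cut(edges, A, {W}), set(Y), {W}, (A | Z) - {W}):
                A, Z, moved, progress = A | {W}, Z - {W}, moved | {W}, True
                break
    return moved

# Toy DAG (hypothetical): C -> A -> M -> Y, C -> Y, and a descendant Y -> D.
edges = {("C", "A"), ("A", "M"), ("M", "Y"), ("C", "Y"), ("Y", "D")}
print(maximal_w(edges, {"Y"}, {"A"}, {"C", "D"}))
```

In this toy graph the confounder C can be moved behind the do-operator while the descendant D of Y cannot, loosely mirroring how C1, but neither C2 nor C3, enters W in the example discussed here.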

**Figure 2.10:** A somewhat more involved graph G, over the nodes A, M, Y, C1, C2, C3, U1, U2 and U3, and the subgraph G_{V\{A,C1}} = G_{D′}, over M, Y, C2, C3, U1 and U2, required for application of the second step of the IDC algorithm.

G_{A̅C̲1}, G_{A̅C̲2} and G_{A̅C̲3} in Figure 2.9 that the unique maximal set W ⊆ C = {C1, C2, C3} such that P(Y|do(A = a), C) = P(Y|do(A = a, W), C\W) contains only C1. That is, the first iteration of the IDC algorithm (Figure 2.8) picks W = C1 and then reinvokes the algorithm as IDC(Y, {A, C1}, {C2, C3}), which assesses whether Y ⊥⊥ C2 | A, C1, C3 in the subgraph G_{A̅C̅1C̲2}, or Y ⊥⊥ C3 | A, C1, C2 in the subgraph G_{A̅C̅1C̲3}. Note, however, that since no edges enter C1, G_{A̅C̅1C̲2} and G_{A̅C̅1C̲3} correspond to G_{A̅C̲2} and G_{A̅C̲3} in Figure 2.9, respectively. Since we have already established that these conditional independencies do not hold in the latter subgraphs, we conclude that {C1} is the unique maximal set such that P(Y|do(A = a), C) = P(Y|do(A = a, W), C\W). Consequently, we have

P(Y|do(A = a), C1, C2, C3) = P(Y, C2, C3|do(A = a, C1)) / ∑_y P(Y = y, C2, C3|do(A = a, C1)),

such that identification of P(Y|do(A = a), C1, C2, C3) depends on identification of the unconditional joint interventional distribution P(Y, C2, C3|do(A = a, C1)), which can be assessed by the ID algorithm as follows.

All variables in the subgraph G_{V\{A,C1}} (Figure 2.10) are ancestors of {Y, C2, C3}, such that G_{V\{A,C1}} = G_{D′}.¹⁴ This subgraph G_{D′} contains two districts, i.e. D′_1 = {C2, C3, M} and D′_2 = {Y}, such that P(Y, C2, C3|do(A = a, C1 = c1)) can be expressed as

∑_m Q[{C2, C3, M}] Q[{Y}].

¹³ (cont.) conditional on (a subset of) covariates in the sufficient adjustment set.

¹⁴ In order to avoid confusion, we will denote the set of ancestors of {Y, C2, C3} in G_{V\{A,C1}} by D′ and the districts in G_{D′} by D′_i.

It was shown in section 2.5.4 that Q[{Y}] = P(Y|A, M, C3), so we only need to obtain Q[D′_1] from its corresponding district in G, i.e. Q[S2]. For this purpose, we need to invoke Identify(D′_1, S2, Q[S2]). However, because the set of ancestors of D′_1 in the subgraph G_{S2} (Figure 2.7) coincides with S2, Q[D′_1] is not identifiable. Because identification of Q[D′_1] fails, identification of P(Y, C2, C3|do(A = a, C1)) also fails, which ultimately leads to the conclusion that the conditional effect P(Y|do(A = a), C1, C2, C3) is not identifiable from observable data.

However, it can easily be shown that, in contrast, e.g. P(Y|do(A = a), C1, C3) is identifiable. This can mainly be appreciated upon noting that, by avoiding conditioning on the collider C2, C3 may – in addition to C1 – also be included in the unique maximal subset W such that P(Y|do(A = a), C\C2) = P(Y|do(A = a, W), (C\C2)\W). As a result, P(Y|do(A = a), C1, C3) equals P(Y|do(A = a, C1, C3)), which can be shown to be identified via the

**3 Identifying natural and path-specific effects from observed data**

This chapter is an adapted version of a handbook chapter submitted for peer review in M. Drton, S. Lauritzen, M. Maathuis, M. Wainwright (Eds.), Handbook of Graphical Models. CRC Press.

In this chapter, we will study non-parametric identification of natural direct and indirect effects, and of path-specific effects (Avin et al., 2005) in general. In particular, we revisit earlier identifying assumptions (Pearl, 2001) in the light of a recently proposed graphical identification criterion for path-specific effects (Shpitser, 2013) that extends previous work on complete conditions for non-parametric identification of total treatment effects (Huang and Valtorta, 2006; Shpitser and Pearl, 2006a,b, 2008a; Tian and Pearl, 2002, 2003) – as discussed in chapter 2 – to allow for effect decomposition. Through various worked-out examples, we aim to provide insight into the use of this graphical criterion, as well as into the nature of the assumptions on which mediation analysis relies. Before concluding this chapter by extending notions of natural direct and indirect effects to more generally defined path-specific effects, we highlight that Shpitser (2013)'s graphical criterion leads to novel insights that may contribute to a more comprehensive understanding of recent conceptual developments and formulations inspired by the debate about the distinct and controversial nature of both definitions and required assumptions of targeted path-specific effects (Robins and Richardson, 2010).

**3.1 Cross-world counterfactuals...**

Despite their formal and intuitive appeal, non-parametric identification of natural effects is subtle and a source of much controversy. The reason is that the usual consistency assumptions alone – namely that M(a) and Y(a) equal M and Y, respectively, when A = a, and that Y(a, m) equals Y when A = a and M = m – do not suffice to link all counterfactual data to observed data. In particular, nested counterfactual outcomes Y(a, M(a′)) are unobservable when a ≠ a′. Data, whether experimental or observational, thus never carry information about the distribution of these counterfactuals as they imply a union of two incompatible states a and a′ that may only seem to coexist 'across multiple worlds'. Mediation analyses based on natural effects are thus bound to rely on assumptions that cannot be empirically verified or guaranteed by the study design. Even randomised cross-over trials, where one would first manipulate A to a′ to observe M(a′), and then manipulate A to a and M to M(a′) to finally observe Y(a, M(a′)), would require strong assumptions of no period effect and no carry-over effects at the individual level (Imai et al., 2013; Josephy et al., 2015; Robins and Greenland, 1992).
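The unverifiability claimed above can be made concrete numerically. The two binary structural models below are hypothetical (they are not taken from any of the cited works): they agree on every observational and single-world interventional distribution, yet disagree on the cross-world quantity P(Y(1, M(0)) = 1).

```python
from itertools import product
from fractions import Fraction

def world(model, u1, u2, u3):
    """Potential outcomes of one unit with exogenous noise (u1, u2, u3).
    Both models share M(1), Y(0, m) and Y(1, m); they differ only in which
    noise term drives M(0)."""
    M = {1: u1, 0: u2 if model == 1 else u3}          # M(a)
    Y = {(0, 0): 0, (0, 1): 0,                        # Y(0, m) = 0
         (1, 0): u3, (1, 1): 1 - u3}                  # Y(1, m) = m XOR u3
    return M, Y

def prob(model, event):
    """Exact probability of `event` under uniform, independent binary noise."""
    hits = [event(*world(model, *u)) for u in product((0, 1), repeat=3)]
    return Fraction(sum(hits), len(hits))

# All single-world distributions agree across the two models ...
for a in (0, 1):
    for m in (0, 1):
        assert prob(1, lambda M, Y: M[a] == m) == prob(2, lambda M, Y: M[a] == m)
        assert prob(1, lambda M, Y: Y[a, m] == 1) == prob(2, lambda M, Y: Y[a, m] == 1)
        assert (prob(1, lambda M, Y: (M[a], Y[a, M[a]]) == (m, 1))
                == prob(2, lambda M, Y: (M[a], Y[a, M[a]]) == (m, 1)))

# ... yet the cross-world quantity P(Y(1, M(0)) = 1) differs:
p1 = prob(1, lambda M, Y: Y[1, M[0]] == 1)
p2 = prob(2, lambda M, Y: Y[1, M[0]] == 1)
print(p1, p2)   # model 1 gives 1/2, model 2 gives 0
```

Because the two models are indistinguishable both observationally and experimentally, no amount of data can decide between the two cross-world answers, which is precisely why additional, untestable assumptions are needed.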

**3.2 ... require cross-world assumptions**

To develop intuition into non-parametric identification of natural effects – and, by extension, path-specific effects – we will work through a number of simple, but typical examples.

Consider the basic mediation setting depicted in the causal diagram in Figure 3.1. Identification of natural effects in this setting can be obtained if we recover the distribution of nested counterfactuals P(Y(a, M(a′)) = y), which can be obtained by marginalizing the joint counterfactual distribution P(Y(a, m) = y, M(a′) = m) over m. When a ≠ a′, the observed data carry no information about the dependence of Y(a, m) and M(a′). This articulates why natural effects cannot, in general, be identified from experimental data without further, untestable assumptions.

**Figure 3.1:** A simple, typically over-simplistic, mediation graph over the nodes A, M and Y.

One such assumption is that of cross-world independence,

Y(a, m) ⊥⊥ M(a′),    (i)

which Pearl (2001) claimed to be key to 'experimental' identification of natural effects. Under this assumption, we can factorize P(Y(a, m) = y, M(a′) = m) as a product of interventional distributions¹ – each of which is identified from observed data under the assumptions encoded in the causal diagram in Figure 3.1 – as follows

P(Y(a, M(a′)) = y) = ∑_m P(Y(a, m) = y, M(a′) = m)
= ∑_m P(Y(a, m) = y) P(M(a′) = m)
= ∑_m P(Y = y|A = a, M = m) P(M = m|A = a′).
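As a sanity check, the chain of equalities above can be verified exactly for a small structural model compatible with Figure 3.1 in which the noise terms driving M and Y are independent, so that cross-world independence (i) holds. The structural equations below are hypothetical, chosen only so that all distributions are non-degenerate.

```python
from itertools import product
from fractions import Fraction

def f_M(a, uM):             # hypothetical structural equation for M
    return int(uM + a >= 2)

def f_Y(a, m, uY):          # hypothetical structural equation for Y
    return int(a + m + uY >= 3)

noise = list(product(range(4), range(4)))   # (uM, uY), uniform and independent

def cross_world(a, a_prime):
    """P(Y(a, M(a')) = 1) by direct evaluation of the structural model."""
    hits = [f_Y(a, f_M(a_prime, uM), uY) for uM, uY in noise]
    return Fraction(sum(hits), len(hits))

def mediation_formula(a, a_prime):
    """sum_m P(Y=1 | A=a, M=m) P(M=m | A=a') from the observational law."""
    total = Fraction(0)
    for m in (0, 1):
        ym = [(f_M(a, uM) == m, f_Y(a, m, uY)) for uM, uY in noise]
        num = sum(y for sel, y in ym if sel)        # units with A=a, M=m, Y=1
        den = sum(1 for sel, _ in ym if sel)        # units with A=a, M=m
        p_y = Fraction(num, den)
        p_m = Fraction(sum(f_M(a_prime, uM) == m for uM, uY in noise), len(noise))
        total += p_y * p_m
    return total

# The direct cross-world evaluation matches the mediation formula exactly.
for a, a_prime in product((0, 1), repeat=2):
    assert cross_world(a, a_prime) == mediation_formula(a, a_prime)
print(cross_world(1, 0))   # 5/8 in this model
```

Removing the independence between uM and uY would break the second equality in the derivation, while leaving the mediation-formula value unchanged.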