Conditional causal effects

In document Flexible causal mediation analysis using natural effect models (Page 56-63)

2.5 Identifiability in the presence of hidden variables

2.5.5 Conditional causal effects

An extension of the ID algorithm for identifying conditional causal effects in subsets of the population defined by strata of a covariate set C, i.e. causal queries of the form P(Y|do(A = a), C), was later developed by Shpitser and Pearl (2006a) and is referred to as the IDC algorithm (Figure 2.8). The logic behind this algorithm is to re-express P(Y|do(A = a), C) in terms of unconditional interventional distributions, such that further identification can be obtained using the ID algorithm.

INPUT: disjoint sets A, Y, Z ⊂ V.
OUTPUT: Expression for P(Y|do(A = a), Z) in terms of P or FAIL.
1. If there exists a variable W ∈ Z such that Y ⊥⊥ W | A, Z\{W} in the subgraph $G_{\overline{A}\,\underline{W}}$, return IDC(Y, A ∪ {W}, Z\{W}).
2. Else let P′ = ID(Y ∪ Z, A) and return P′ / ∑_y P′.

Figure 2.8: Algorithm IDC(Y, A, Z) (Shpitser and Pearl, 2006a)

This re-expression can be achieved by relying on rule 2 of Pearl (1995a)’s do-calculus, which allows interventions do(W = w) and observations W = w to be exchanged if the conditional independence Y ⊥⊥ W | A, Z\{W} holds in the subgraph $G_{\overline{A}\,\underline{W}}$, obtained by removing from the original graph G all edges pointing to nodes in A and all edges emanating from nodes in W. The idea is to iteratively apply this rule to find a unique maximal set W ⊆ C that enables expressing P(Y|do(A = a), C) as P(Y|do(A = a, W), C\W). If W = C, P(Y|do(A = a), C) then simply equals P(Y|do(A = a, C)). However, often one cannot get rid of all conditioning variables, that is, W ⊂ C. In this case, P(Y|do(A = a), C) equals

$$
\frac{P(Y, C \setminus W \mid do(A = a, W))}{P(C \setminus W \mid do(A = a, W))} = \frac{P(Y, C \setminus W \mid do(A = a, W))}{\sum_y P(Y = y, C \setminus W \mid do(A = a, W))},
$$

such that its identification ultimately depends on identification of the unconditional joint interventional distribution P(Y, C\W|do(A = a, W)), which can be assessed by the ID algorithm.

Suppose that interest lies in the effect of A on Y conditional on {C1, C2, C3}, i.e. P(Y|do(A = a), C1, C2, C3), in graph G in Figure 2.7 (or Figure 2.10).¹³ It follows from the subgraphs $G_{\overline{A}\,\underline{C_1}}$, $G_{\overline{A}\,\underline{C_2}}$ and $G_{\overline{A}\,\underline{C_3}}$ in Figure 2.9 that the unique maximal set W ⊆ C = {C1, C2, C3} such that P(Y|do(A = a), C) = P(Y|do(A = a, W), C\W) contains only C1. That is, the first iteration of the IDC algorithm (Figure 2.8) picks W = C1 and then reinvokes the algorithm as IDC(Y, {A, C1}, {C2, C3}), which assesses whether Y ⊥⊥ C2 | A, C1, C3 in the subgraph $G_{\overline{A, C_1}\,\underline{C_2}}$ or Y ⊥⊥ C3 | A, C1, C2 in the subgraph $G_{\overline{A, C_1}\,\underline{C_3}}$. However, note that, since no edges are entering C1, $G_{\overline{A, C_1}\,\underline{C_2}}$ and $G_{\overline{A, C_1}\,\underline{C_3}}$ correspond to $G_{\overline{A}\,\underline{C_2}}$ and $G_{\overline{A}\,\underline{C_3}}$ in Figure 2.9, respectively. Since we already have that the above conditional independencies do not hold in the latter subgraphs, we conclude that {C1} is the unique maximal set W such that P(Y|do(A = a), C) = P(Y|do(A = a, W), C\W). Consequently, we have

$$
P(Y \mid do(A = a), C_1, C_2, C_3) = \frac{P(Y, C_2, C_3 \mid do(A = a, C_1))}{\sum_y P(Y = y, C_2, C_3 \mid do(A = a, C_1))},
$$

such that identification of P(Y|do(A = a), C1, C2, C3) depends on identification of the unconditional joint interventional distribution P(Y, C2, C3|do(A = a, C1)), which can be obtained by the ID algorithm as follows.

¹³ It is worth noting here that if, contrary to fact, P(Y|do(A = a)) were identified by the adjustment formula upon adjusting for {C1, C2, C3} (refer to section 2.5.4), this would necessarily have implied that P(Y|do(A = a), C1, C2, C3) was likewise identified. This can be seen upon noting that identification of both interventional distributions can be obtained under conditional ignorability Y ⊥⊥ A | C1, C2, C3, which would be implied if {C1, C2, C3} satisfied the adjustment criterion with respect to (A, Y). As opposed to more general identification strategies, such as the ID algorithm, identification by the adjustment criterion can thus be conceived as being agnostic as to whether the interventional distribution is conditional on (a subset of) covariates in the sufficient adjustment set.

[Figure 2.9 panels, each on nodes A, M, Y, C1, C2, C3, U1, U2: $(Y \perp\!\!\!\perp C_1 \mid A, C_2, C_3)$ in $G_{\overline{A}\,\underline{C_1}}$; $(Y \not\perp\!\!\!\perp C_2 \mid A, C_1, C_3)$ in $G_{\overline{A}\,\underline{C_2}}$; $(Y \not\perp\!\!\!\perp C_3 \mid A, C_1, C_2)$ in $G_{\overline{A}\,\underline{C_3}}$.]

Figure 2.9: Different subgraphs of G (Figure 2.7) that aid in finding a maximal set W ⊆ C through recursive applications of the first step of the IDC algorithm.

[Figure 2.10 panels: G on nodes A, M, Y, C1, C2, C3, U1, U2, U3; $G_{V \setminus \{A, C_1\}} = G_{D'}$ on nodes M, Y, C2, C3, U1, U2.]

Figure 2.10: A somewhat more involved graph G and a subgraph required for application of the second step of the IDC algorithm.

All variables in the subgraph $G_{V \setminus \{A, C_1\}}$ (Figure 2.10) are ancestors of {Y, C2, C3}, such that $G_{V \setminus \{A, C_1\}} = G_{D'}$.¹⁴ This subgraph $G_{D'}$ contains two districts, i.e. $D'_1$ = {C2, C3, M} and $D'_2$ = {Y}, such that P(Y, C2, C3|do(A = a, C1 = c1)) can be expressed as

$$
\sum_{m} Q[\{C_2, C_3, M\}]\, Q[\{Y\}].
$$

¹⁴ In order to avoid confusion, we will denote the set of ancestors of {Y, C2, C3} in $G_{V \setminus \{A, C_1\}}$ by $D'$ and the districts in $G_{D'}$ by $D'_i$.

It was shown in section 2.5.4 that Q[{Y}] = P(Y|A, M, C3), so we only need to obtain $Q[D'_1]$ from its corresponding district in G, i.e. Q[S2]. For this purpose, we need to invoke Identify($D'_1$, S2, Q[S2]). However, because the set of ancestors of $D'_1$ in the subgraph $G_{S_2}$ (Figure 2.7) coincides with S2, $Q[D'_1]$ is not identifiable. Because identification of $Q[D'_1]$ fails, identification of P(Y, C2, C3|do(A = a, C1)) also fails, which ultimately leads to the conclusion that the conditional effect P(Y|do(A = a), C1, C2, C3) is not identifiable from observable data.
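The search performed by the first step of the IDC algorithm (Figure 2.8) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `d_separated` and `idc_move_to_do` are hypothetical, hidden variables are represented as explicit nodes, and the toy graph is invented for the example and is not the graph of Figure 2.7. D-separation is checked with the standard moralised ancestral graph construction.

```python
from itertools import combinations


def d_separated(edges, xs, ys, zs):
    """Return True if xs and ys are d-separated given zs in the DAG `edges`.

    `edges` is a set of (parent, child) pairs; xs, ys, zs are sets of nodes.
    """
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    # 1. Restrict attention to the ancestors of xs, ys and zs.
    anc, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))
    # 2. Moralise: link every node to its parents and "marry" co-parents.
    nbrs = {v: set() for v in anc}
    for c in anc:
        ps = parents.get(c, set())
        for p in ps:
            nbrs[p].add(c)
            nbrs[c].add(p)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # 3. Delete zs and check whether xs can still reach ys.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        v = stack.pop()
        if v in seen or v in zs:
            continue
        seen.add(v)
        stack.extend(nbrs[v])
    return not (seen & ys)


def idc_move_to_do(edges, y, a, z):
    """Repeat IDC step 1: move W from z to the do-set while rule 2 applies."""
    a, z = set(a), set(z)
    moved = True
    while moved:
        moved = False
        for w in sorted(z):
            # Subgraph: drop edges into the current do-set and edges out of w.
            sub = {(p, c) for p, c in edges if c not in a and p != w}
            if d_separated(sub, set(y), {w}, (a | z) - {w}):
                a.add(w)
                z.remove(w)
                moved = True
                break
    return a, z


# Hypothetical toy DAG (NOT Figure 2.7); U stands for an unobserved variable.
TOY_EDGES = {("C1", "A"), ("C1", "Y"), ("A", "Y"), ("U", "C2"), ("U", "Y")}

do_set, cond_set = idc_move_to_do(TOY_EDGES, {"Y"}, {"A"}, {"C1", "C2"})
print(sorted(do_set), sorted(cond_set))  # prints ['A', 'C1'] ['C2']
```

In this toy graph, C1 can be exchanged for an intervention, whereas C2 cannot because it shares the hidden cause U with Y. The conditional query would therefore be passed to the second step as a joint interventional distribution over Y and C2, which is then normalised by its sum over y.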

However, it can easily be shown that, in contrast, e.g. P(Y|do(A = a), C1, C3) is identifiable. This can mainly be appreciated upon noting that, by not conditioning on the collider C2, C3 may (in addition to C1) also be included in the unique maximal subset W such that P(Y|do(A = a), C\C2) = P(Y|do(A = a, W), (C\C2)\W). As a result, P(Y|do(A = a), C1, C3) equals P(Y|do(A = a, C1, C3)), which can be shown to be identified via the

Identifying natural and path-specific effects from observed data

This chapter is an adapted version of a handbook chapter submitted for peer review in M. Drton, S. Lauritzen, M. Maathuis, M. Wainwright (Eds.), Handbook of Graphical Models. CRC Press.

In this chapter, we will study non-parametric identification of natural direct and indirect effects, and of path-specific effects (Avin et al., 2005) in general. In particular, we revisit earlier identifying assumptions (Pearl, 2001) in the light of a recently proposed graphical identification criterion for path-specific effects (Shpitser, 2013) that extends previous work on complete conditions for non-parametric identification of total treatment effects (Huang and Valtorta, 2006; Shpitser and Pearl, 2006a,b, 2008a; Tian and Pearl, 2002, 2003), as discussed in chapter 2, to allow for effect decomposition. Through various worked-out examples, we aim to provide insight into the use of this graphical criterion, as well as into the nature of the assumptions on which mediation analysis relies. Before concluding this chapter by extending notions of natural direct and indirect effects to more generally defined path-specific effects, we highlight that Shpitser (2013)’s graphical criterion leads to novel insights that may contribute to a more comprehensive understanding of recent conceptual developments and formulations inspired by the debate about the distinct and controversial nature of both definitions and required assumptions of targeted path-specific effects (Robins and Richardson, 2010).


Cross-world counterfactuals...

Despite the formal and intuitive appeal of natural effects, their non-parametric identification is subtle and a source of much controversy. The reason is that the usual consistency assumptions alone (namely that M(a) and Y(a) equal M and Y, respectively, when A = a, and that Y(a, m) equals Y when A = a and M = m) do not suffice to link all counterfactual data to observed data. In particular, nested counterfactual outcomes Y(a, M(a′)) are unobservable when a ≠ a′. Data, whether experimental or observational, thus never carry information about the distribution of these counterfactuals, as they imply a union of two incompatible states a and a′ that may only seem to coexist ‘across multiple worlds’. Mediation analyses based on natural effects are thus bound to rely on assumptions that cannot be empirically verified or guaranteed by the study design. Even randomised cross-over trials, where one would first manipulate A to a′ to observe M(a′), and then manipulate A to a and M to M(a′) to finally observe Y(a, M(a′)), would require strong assumptions of no period effect and no carry-over effects at the individual level (Imai et al., 2013; Josephy et al., 2015; Robins and Greenland, 1992).


... require cross-world assumptions

To develop intuition into non-parametric identification of natural effects – and, by extension, path-specific effects – we will work through a number of simple, but typical examples.

Consider the basic mediation setting depicted in the causal diagram in Figure 3.1. Identification of natural effects in this setting can be obtained if we recover the distribution of nested counterfactuals P(Y(a, M(a′)) = y). This distribution can be obtained by summing the joint distribution P(Y(a, m) = y, M(a′) = m) over m. When a ≠ a′, the observed data carry no information about the dependence between Y(a, m) and M(a′). This articulates why natural effects cannot, in general, be identified from experimental data without further, untestable assumptions.

Figure 3.1: A simple, typically over-simplistic, mediation graph.

One such assumption is that of cross-world independence,

$$
Y(a, m) \perp\!\!\!\perp M(a'), \tag{i}
$$

which Pearl (2001) claimed to be key to ‘experimental’ identification of natural effects. Under this assumption, we can factorize P(Y(a, m) = y, M(a′) = m) as a product of interventional distributions,¹ each of which is identified from observed data under the assumptions encoded in the causal diagram in Figure 3.1, as follows

$$
\begin{aligned}
P(Y(a, M(a')) = y) &= \sum_m P(Y(a, m) = y, M(a') = m) \\
&= \sum_m P(Y(a, m) = y)\, P(M(a') = m) \\
&= \sum_m P(Y = y \mid A = a, M = m)\, P(M = m \mid A = a')
\end{aligned}
$$
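For discrete Y and M, the last line of this derivation is a weighted sum that can be evaluated directly from two conditional probability tables. The sketch below is illustrative only: the function name `nested_counterfactual_dist`, the dictionary layout and all probabilities are made up for the example, and the result equals P(Y(a, M(a′)) = y) only if assumption (i) and the assumptions encoded in Figure 3.1 hold.

```python
def nested_counterfactual_dist(p_y_given_am, p_m_given_a, a, a_prime):
    """Evaluate sum_m P(Y = y | A = a, M = m) * P(M = m | A = a_prime).

    p_y_given_am[(a, m)][y] holds P(Y = y | A = a, M = m);
    p_m_given_a[a][m] holds P(M = m | A = a).
    """
    dist = {}
    for m, p_m in p_m_given_a[a_prime].items():
        for y, p_y in p_y_given_am[(a, m)].items():
            dist[y] = dist.get(y, 0.0) + p_y * p_m
    return dist


# Made-up conditional probabilities for binary A, M and Y.
P_M_GIVEN_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.3, 1: 0.7}}
P_Y_GIVEN_AM = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.7, 1: 0.3},
    (1, 0): {0: 0.6, 1: 0.4},
    (1, 1): {0: 0.2, 1: 0.8},
}

# Outcome under exposure a = 1 with the mediator distributed as under a' = 0.
dist = nested_counterfactual_dist(P_Y_GIVEN_AM, P_M_GIVEN_A, a=1, a_prime=0)
print(round(dist[1], 2))  # prints 0.52
```

With these numbers, the value 0.52 arises as 0.4 × 0.7 + 0.8 × 0.3. Setting a′ = a instead recovers the ordinary interventional distribution of Y under A = a.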

