
Flexible causal mediation analysis using natural effect models


Academic year: 2021


convenient, both of which dispense us from reflecting.”


Prof. dr. Stijn Vansteelandt
Department of Applied Mathematics, Computer Science and Statistics
Faculty of Sciences

Prof. dr. Tom Loeys
Department of Data Analysis
Faculty of Psychology and Educational Sciences

Prof. dr. Beatrijs Moerkerke
Department of Data Analysis
Faculty of Psychology and Educational Sciences

Other members of the examination board

Prof. dr. Vanessa Didelez
Leibniz Institute for Prevention Research and Epidemiology
University of Bremen, Germany

Prof. dr. Theis Lange
Department of Public Health, Section of Biostatistics
University of Copenhagen, Denmark

Prof. dr. Yves Rosseel
Department of Data Analysis
Faculty of Psychology and Educational Sciences

Prof. dr. ir. Olivier Thas (chair)
Department of Mathematical Modelling, Statistics and Bio-informatics
Faculty of Bioscience Engineering

Dr. Karel Vermeulen
Department of Mathematical Modelling, Statistics and Bio-informatics
Faculty of Bioscience Engineering

Dean
Prof. dr. Herwig Dejonghe
Department of Physics and Astronomy
Faculty of Sciences

Rector
Prof. dr. Anne De Paepe
Department of Pediatrics and Medical Genetics
Faculty of Medicine and Health Sciences

Flexible causal mediation analysis using natural effect models

Johan Steen

Dissertation submitted in fulfilment of the requirements for the degree of Doctor in Statistical Data Analysis

Academic year 2016–2017

Department of Applied Mathematics, Computer Science and Statistics


Research Foundation Flanders (G.0111.12)

Data analyzed to illustrate the methods in this thesis were kindly provided by

the Department of Experimental-Clinical and Health Psychology, Faculty of Psychology, Ghent University (Interdisciplinary Project for the Optimisation of Separation trajectories; IPOS).

the World Health Organization’s European Centre for Environment and Health, Bonn office (Large Analysis and Review of European Housing and Health Status; LARES). The corresponding chapters in this thesis reflect the author’s opinion and not necessarily the position of the WHO.


Acknowledgements

1 Introduction
1.1 Motivating examples
1.1.1 The Job Search Intervention Study (JOBS II)
1.1.2 The Interdisciplinary Project for the Optimization of Separation trajectories
1.1.3 The Large Analysis and Review of European Housing and Health Status project
1.2 Counterfactual outcomes
1.3 Natural direct and indirect effects
1.4 Challenges in mediation analysis
1.4.1 Causal assumptions
1.4.2 Modeling assumptions
1.5 Main contributions
1.6 Outline of this thesis

2 Inferring causal effects from observed data
2.1 Encoding conditional independencies in a graph
2.1.1 d-separation
2.1.2 Observational equivalence
2.2 What makes a diagram a causal diagram
2.3 The truncated factorization formula
2.3.1 An example
2.4.2 The adjustment criterion
2.4.3 Flexible estimation strategies for the adjustment formula
2.5 Identifiability in the presence of hidden variables
2.5.1 A simple example: the front-door formula
2.5.2 C-component factorization
2.5.3 A complete identification algorithm
2.5.4 A somewhat more involved example
2.5.5 Conditional causal effects

3 Identifying natural and path-specific effects from observed data
3.1 Cross-world counterfactuals...
3.2 ... require cross-world assumptions
3.2.1 Non-parametric structural equation models
3.2.2 Unmeasured mediator-outcome confounding
3.2.3 Identification by the mediation formula
3.2.4 Treatment-induced mediator-outcome confounding
3.2.5 Pearl’s graphical criteria for cross-world independence
3.3 Avoiding recantation...
3.3.1 From recanting witnesses...
3.3.2 ... to recanting districts
3.3.3 Some examples
3.4 ... yields interventional identification
3.4.1 The recanting district criterion
3.4.2 Interventional identification 1.0
3.4.3 Interventional identification 2.0
3.4.4 Stratum-specific natural effects
3.5 Complementary identification strategies
3.5.1 Interchanging cross-world assumptions
3.5.2 Two types of auxiliary variables
3.6 From mediating instruments to conceptual clarity
3.6.3 Examples
3.7 Path-specific effects
3.7.1 Alternative decompositions in the presence of multiple mediators or intermediate confounding
3.7.2 Coarser decompositions in the presence of unobserved confounding
3.7.3 Costs of fine-grained decompositions: assumptions
3.7.4 Costs of fine-grained decompositions: interpretation

4 Flexible mediation analysis with a single mediator
4.1 Introduction
4.2 The mediation formula
4.2.1 Counterfactual outcomes and effect decomposition
4.2.2 The mediation formula
4.2.3 Applying the mediation formula in practice
4.3 Mediation analysis via natural effect models
4.3.1 Fitting natural effect models
4.3.2 Weighting-based approach
4.3.3 Imputation-based approach
4.4 Dealing with different types of variables
4.4.1 Multicategorical exposures
4.4.2 Continuous exposures
4.5 Effect modification of natural effects
4.5.1 Exposure-mediator interactions
4.5.2 Effect modification by baseline covariates
4.6 Tools for calculating and visualizing causal effect estimates
4.6.1 Linear combinations of parameter estimates
4.6.2 Effect decomposition
4.6.3 Global hypothesis tests
4.6.4 Visualizing effect estimates and their uncertainty
4.7 Population-average natural effects
4.8 Intermediate confounding: a joint mediation approach
4.9.2 Missing data
4.10 Concluding remarks
4.A Technical appendices
4.A.1 Semi-parametric estimators
4.A.2 Constructing sandwich estimators

5 Flexible mediation analysis with multiple mediators
5.1 Introduction
5.2 Effect decomposition into path-specific effects
5.2.1 Decomposition in a single mediator setting
5.2.2 Decomposition in a setting with two sequential mediators
5.3 Estimation approach
5.4 Motivating example revisited
5.5 Discussion
5.A Technical appendices
5.A.1 Targeted decompositions
5.A.2 Identification
5.A.3 Relation between weighted imputation and direct application of the generalized mediation formula
5.A.4 Estimation procedure
5.B Empirical analysis
5.B.1 Data set and baseline covariates
5.B.2 Working models
5.B.3 Conditional logistic natural effect model
5.B.4 Marginal logistic natural effect model

6 Discussion
6.1 Identifying assumptions
6.1.1 Why non-parametric identification?
6.1.2 Identification via the adjustment criterion
6.1.3 Beyond the adjustment criterion
6.2 Flexible modeling using natural effect models
6.2.1 Strengths and weaknesses of the proposed estimators
6.2.2 Multiply robust estimators
6.2.3 Inverse odds weighting
6.2.4 Multiple sequential mediators
6.2.5 Finite sample performance
6.2.6 Measures of precision
6.3 Further challenges
6.3.1 Mediation analysis with time-to-event outcomes
6.3.2 Mediation analysis with longitudinal measurements and latent constructs

7 Summary (Samenvatting)

It has taken quite some doing. That is the least one can say. I think quite a few people close to me can confirm that. Yet without them, what now lies in front of you would not have become what it is. A small word of thanks is therefore in order.

Not one, not two, but three supervisors turned out to be needed to get the best out of me. Bieke, that is why I have not regretted for a moment that, a while ago, I filled in the necessary forms so that I could officially call you my third supervisor. Thank you for the humor and for helping me keep things in perspective!

Tom, ever since you supervised my master's thesis you have shown that you appreciate my work. This mutual appreciation only made our collaboration over the past years more pleasant.

Stijn, from you I have learned above all that even difficult material becomes easy to digest once some context and intuition are added. Thank you especially for staying calm whenever it turned out, time and again, that I would not meet an agreed deadline. Your calm, endless patience and belief in a good outcome have undoubtedly contributed to where I stand now. I believe that after these 4.5 years I may have become a bit more of a statistician, and you a bit more of a psychologist. Thank you, Tom, Bieke and Stijn, for the opportunities you have given me. I have learned a lot from you over the past years!

I’d also like to express my gratitude towards the members of the reading committee, Vanessa Didelez, Theis Lange, Yves Rosseel, Karel Vermeulen and Olivier Thas, for their careful reading of this thesis, their insightful,


step back and try to get a grip on the bigger picture.

Theis, thanks for detecting numerous bugs in medflex, for providing valuable input and the opportunity to contribute to the mediation workshop in Copenhagen. But most of all thanks for being such an enthusiastic advertiser of the package!

Yves, I do not merely have a strong suspicion, I am fairly certain that you sparked my interest in statistics.

Karel, I want to thank you at least as much for our time as fellow students as for all your efforts as a jury member for my dissertation. Your door was always open at the moments when I felt I could not possibly bother Stijn yet again with technical questions or requests for clarification. Machteld, thank you too for always leaving your door ajar. Together with the two of you and Joke, Boston and New York were an incredible experience, by the way!

Thanks also to the many other great TWIST colleagues, in particular those with whom I shared an office, and not forgetting the people of the secretariat. Unfortunately you are, together with the ex-colleagues, too numerous to all be mentioned by name. The risk of forgetting someone is, after all, too great. Nonetheless, I would like to especially thank Bashir, Hilmar, Bea, Diego, Oliver, Mushthofa, Gustavo, Holger, Xianming, Koen, Véronique, Vahe, Sjouke, Christophe, José, Camila and Paula, for the good times, lunches, dinners, trips and interesting intercultural discussions. I wish I had been less of a procrastinator, so that I would have had the time to thank each one of you individually.

I’d also like to express my gratitude towards Ilya Shpitser for the many e-mail replies with technical clarifications on his work. They have surely kept me from getting cross-eyed on all those cross-world counterfactual independencies. Many thanks to Kathleen Felix for granting me permission to use the Rube Goldberg cartoon as cover art for this thesis.

Thanks also to the ‘experimental’ classmates for the many reunions in between. The boys, thanks for the necessary nonsense and mischief. The increasingly rare ‘banquets’ were a breath of fresh air every time. Nor will I soon forget the legendary bike rides (with or without

picking up! Thanks also to the Doornlaan gang for the many lovely dinners and weekends, which we look forward to every time!

Ann, I cannot thank you enough either for your support and the always warm welcome at ‘hotel Vorselaar’ (see also Landuyt D., 2015). I hope Marc would secretly have been proud too. Thank you, Evelien, for your inexhaustible enthusiasm and your enterprising spirit, which will undoubtedly make the catering a true success! Dries, one could not wish for a nicer and handier brother-in-law. Who knows, maybe one day we will work on Bayesian belief networks together?

Mum, Dad, however much I dislike lapsing into clichés, I cannot avoid it. Thank you for the opportunities you have given me. There were (and are) many! I cannot thank you, together with David and Nathan, enough for pulling me through that difficult, dark period. Thanks also to the rest of the family, in particular my grandmothers, for your care and unconditional support.

Carmen, your patience is probably the one I have tested the most. Your ever-positive attitude has always had its effect, even if it sometimes needed time to work its way under my skin. I cannot imagine a better partner. Thanks to you I know what ‘happiness’ means. Thank you for what you do. But above all, for who you are.

Johan Steen
Destelbergen, November 2016

1 Introduction

The well-known mantra ‘association is not causation’ has led to the widespread belief that one can only infer causal relations from randomized trials, as they are often considered the gold standard for causal inference.

For example, observational studies in the 1950s reporting associations between smoking and lung cancer have long been criticized for not providing decisive evidence on the supposed causal effect of smoking on lung cancer, because of the simple fact that smokers and non-smokers are different not only in their smoking behavior, but also in many other respects. Both the tobacco industry and some prominent statisticians strongly supported the hypothesis that this association could be explained by a genetic predisposition to both lung cancer and smoking. Although the impact of potential confounding factors, such as a genetic predisposition, is eliminated by design in randomized trials, these designs are often not feasible because of ethical concerns.

Over the last few decades, methodological advances in the causal inference literature have successfully demonstrated that appropriately analyzed data from observational studies may, nonetheless, shed light on causal enquiries. In particular, the potential outcomes framework (Splawa-Neyman et al., 1990; Rubin, 1974) has provided a formal language for clarifying and communicating sufficient conditions under which well-defined causal effects can be estimated from the data at hand.

from studies that aim to open the ‘black box’ of causality in order to deepen our understanding of the precise mechanisms behind established cause-effect relations, as witnessed by the widespread usage of mediation analyses. This statistical tool, which is the main topic of this thesis, aims to unravel different causal pathways by separating the component effect that acts through a given intermediate variable or so-called mediator (i.e. an indirect effect) from the remaining direct effect and by quantifying each of their respective contributions to the overall causal effect. The improved understanding of underlying processes that results from such analyses may not only be of pure scientific or etiologic interest, but may also inform policymakers as to which type of intervention or reform is most effective.

Below, we first list three empirical studies that focused on better understanding the causal mechanisms behind the effect of a certain intervention or exposure. Each of these examples will be discussed and/or analyzed in more detail in later chapters of this thesis. Next, we briefly introduce the central notion of potential or counterfactual outcomes, which naturally leads to formal yet intuitive definitions of the causal effects of interest and makes it possible to clearly articulate the causal assumptions that are required for obtaining unbiased and valid estimates of these effects from observed data. We then provide some intuition into the main challenges in mediation analysis and give a short overview of the contributions of this thesis in terms of dealing with these challenges, followed by a more detailed outline of the subsequent chapters of this thesis.

1.1 Motivating examples

1.1.1 The Job Search Intervention Study (JOBS II)

The JOBS II field experiment (Vinokur et al., 1995), an often cited empirical mediation example, was designed to assess the effectiveness of a theory-driven job training intervention that aimed to both increase reemployment and reduce depressive symptoms in unemployed workers. A total of 1,801 subjects were randomly assigned to either participate in several sessions of job search skills workshops that also focused on enhancing one’s sense of mastery or self-efficacy and inoculation against setbacks after losing one’s job (treatment group) or receive a booklet with job search tips (control group).

Vinokur and Schul (1997) conducted a detailed analysis of potential mediating mechanisms after beneficial effects on both reemployment and mental health had been established in earlier analyses (Vinokur et al., 1995). One mediation question of interest was whether workshop participation leads to a reduction in depressive symptoms (at two months follow-up) by increasing the chances of getting reemployed (at two months follow-up).

1.1.2 The Interdisciplinary Project for the Optimization of Separation trajectories

The Interdisciplinary Project for the Optimization of Separation trajectories (Ghent University and Catholic University of Louvain, 2010) was a large-scale survey study which involved the recruitment of individuals who divorced between March 2008 and March 2009 in four major courts in Flanders. The main aim of this project was to improve the quality of life in families during and after the divorce by translating research findings into practical guidelines for separation specialists (such as lawyers, judges, psychologists, welfare workers...) and by promoting evidence-based policy. A subsample of 385 individuals responded to a battery of questionnaires related to romantic relationship characteristics, such as adult attachment style, and break-up characteristics, such as break-up initiator status, experiencing negative affectivity and engaging in unwanted pursuit behaviors towards the ex-partner (De Smet et al., 2012). Respondents were asked to imagine their former partner as well as possible and to remember how they generally felt in their relationship before the breakup when completing the attachment style questionnaire. The mediation hypothesis of interest concerned the question whether and to what extent the level of emotional distress or negative affectivity experienced during the breakup mediates the effect that attachment style towards the ex-partner before the breakup exerts on the potential display of unwanted pursuit behaviors after the breakup (Loeys et al., 2013).

1.1.3 The Large Analysis and Review of European Housing and Health Status project

The last motivating example also concerns a survey study. The Large Analysis and Review of European Housing and Health Status (LARES) project conducted by the World Health Organization (Shenassa et al., 2007) collected survey data in the winter and spring of 2002/2003 from 5,882 adult respondents from 2,983 households in 8 European cities. Baseline measurements were available on both respondent characteristics (age, gender, marital status, education level, employment, smoking and environmental tobacco smoke at home) and household characteristics (ownership, size, tenure, crowding, ventilation, natural light, heating and city of residence).

One of the mediation questions of interest was whether and to what extent the effect of living in damp and moldy conditions on the risk of depression is mediated by the respondent’s perceived control over their home.

1.2 Counterfactual outcomes

The counterfactual or potential outcomes framework appeals to human intuition, because it defines causal effects by comparing an outcome of interest in the population under different hypothetical scenarios or interventions. For instance, in this framework, the causal effect of smoking on lung cancer could be defined as the difference in lung cancer incidence if the entire population were to smoke versus if no one were to smoke.

This ‘what if’ type of reasoning has been formalized by the use of so-called counterfactual or potential outcomes. For instance, when $A$ denotes the exposure or treatment of interest and $Y$ the outcome of interest, then $Y(a)$ denotes the value of the outcome that would have been observed had $A$, possibly contrary to the fact, been set to level $a$. This notation enables defining total causal effects as $E\{Y(a) - Y(a')\}$, where $a$ and $a'$ correspond to meaningful choices for active and reference (baseline) levels of treatment or exposure, respectively.$^{1}$ For expositional simplicity, we will restrict our current presentation to binary treatments ($a = 1$ and $a' = 0$), although definitions and results extend to multicategorical or continuous treatments. The population-average effect of smoking $A$ (where $A = 1$ indicates smoking) on lung cancer $Y$ would thus be defined as $E\{Y(1) - Y(0)\}$.

$^{1}$ This is essentially identical to the interventional contrast $E(Y \mid do(A = a)) - E(Y \mid do(A = a'))$ in terms of Pearl’s do-operator.
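To make the notation concrete, the following minimal sketch computes the population-average total effect for a small, entirely hypothetical population in which both potential outcomes of every unit are assumed to be known. The data and the names `population` and `average_total_effect` are illustrative assumptions and do not come from any of the studies discussed in this thesis.

```python
# Hypothetical population: each unit carries BOTH potential outcomes,
# Y(1) (outcome under exposure) and Y(0) (outcome under no exposure).
# In real data at most one of the two is observed for any given unit.
population = [
    {"Y1": 1, "Y0": 0},
    {"Y1": 1, "Y0": 1},
    {"Y1": 0, "Y0": 0},
    {"Y1": 1, "Y0": 0},
    {"Y1": 0, "Y0": 0},
]


def average_total_effect(units):
    """Return the sample analogue of E{Y(1) - Y(0)}."""
    return sum(u["Y1"] - u["Y0"] for u in units) / len(units)


print(average_total_effect(population))
```

In this toy population, two of the five units would develop the outcome only under exposure, so the average total effect equals 2/5.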

1.3 Natural direct and indirect effects

Mediation analysis aims to decompose the average treatment or exposure effect, $E\{Y(1) - Y(0)\}$, into the components that respectively capture the treatment’s indirect effect on the outcome along an intermediate variable of interest $M$, and the treatment’s remaining direct effect via potential other mechanisms.

Robins and Greenland (1992) laid the foundations for such decomposition by introducing nested counterfactuals $Y(a, M(a'))$, which denote the value of the outcome that would have been observed had, possibly contrary to the fact, $A$ been set to level $a$ and $M$ to $M(a')$, the value that would have been observed for the mediator had $A$ been set to $a'$. Using such nested counterfactuals, one can now isolate and quantify the part of the treatment effect that is transmitted through the mediator $M$ by leaving treatment unchanged at $A = 1$, but changing the counterfactual intermediate outcome $M(1)$ to $M(0)$, the value it would have taken under no treatment, leading to the definition of the so-called total indirect effect

$$E\{Y(1, M(1)) - Y(1, M(0))\}.$$

Its complement, the pure direct effect

$$E\{Y(1, M(0)) - Y(0, M(0))\},$$

then captures the intuitive notion of blocking the treatment’s effect on the mediator by keeping the latter fixed at whatever value it would have attained under no treatment.
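As an illustration of how nested counterfactuals generate these two effects, the sketch below evaluates them in a small hypothetical structural model. The functional forms of `mediator` and `outcome`, the background terms in `units`, and all names are assumptions made purely for illustration; they are not the models used later in this thesis.

```python
# Hypothetical structural model (illustrative assumption only):
#   M(a)    = 1 if a + u_m >= 1 else 0
#   Y(a, m) = 2*a + 3*m + a*m + u_y
# Each unit is described by its own background terms (u_m, u_y).
units = [(0, 0), (0, 1), (1, 0), (-1, 2)]


def mediator(a, u_m):
    """Counterfactual mediator M(a) for a unit with background term u_m."""
    return 1 if a + u_m >= 1 else 0


def outcome(a, m, u_y):
    """Counterfactual outcome Y(a, m) for a unit with background term u_y."""
    return 2 * a + 3 * m + a * m + u_y


def mean_nested(a, a_ref):
    """Sample analogue of E{Y(a, M(a_ref))}."""
    values = [outcome(a, mediator(a_ref, u_m), u_y) for u_m, u_y in units]
    return sum(values) / len(values)


total_effect = mean_nested(1, 1) - mean_nested(0, 0)
total_indirect_effect = mean_nested(1, 1) - mean_nested(1, 0)
pure_direct_effect = mean_nested(1, 0) - mean_nested(0, 0)

print(total_effect, pure_direct_effect, total_indirect_effect)
```

Because the sketch evaluates every unit under all combinations of $a$ and $a'$, the pure direct and total indirect effects add up to the total effect by construction, even though the hypothetical outcome model contains an exposure-mediator interaction.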

decompose the total exposure effect of mold on mental health, which compares the average risk of depression in the population if everyone were to be exposed to mold versus if no one were exposed. The total indirect effect then captures the average change in risk of depression in the population if everyone’s perception of control were to be changed from what it would be under exposure to mold to what it would be under no exposure. The pure direct effect, on the other hand, captures the average change in risk of depression in the population if we were to change everyone’s exposure status from being unexposed to being exposed, while leaving unchanged everyone’s perceived control at the level that it would be under no exposure.

A primary appeal of these (and similar) effect estimands is that, as opposed to definitions in the linear structural equation modeling tradition, they are model-free: they combine to produce the total effect, irrespective of the scale of interest or the presence of interactions or nonlinearities, under the composition assumption that $Y(a, M(a)) = Y(a)$. For instance, although the above effects are expressed in terms of mean differences, the total effect risk ratio of a binary outcome could similarly be expressed as the product of the pure direct effect risk ratio and the total indirect effect risk ratio

$$\frac{P\{Y(1) = 1\}}{P\{Y(0) = 1\}} = \frac{P\{Y(1, M(0)) = 1\}}{P\{Y(0, M(0)) = 1\}} \times \frac{P\{Y(1, M(1)) = 1\}}{P\{Y(1, M(0)) = 1\}}.$$
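The additive counterpart of this identity follows from a simple telescoping argument. The derivation below only restates the definitions given earlier, using the composition assumption in its first step; it is included as a worked illustration.

```latex
\begin{align*}
E\{Y(1) - Y(0)\}
  &= E\{Y(1, M(1))\} - E\{Y(0, M(0))\} \\
  &= \underbrace{E\{Y(1, M(0))\} - E\{Y(0, M(0))\}}_{\text{pure direct effect}}
   + \underbrace{E\{Y(1, M(1))\} - E\{Y(1, M(0))\}}_{\text{total indirect effect}}.
\end{align*}
```

The intermediate term $E\{Y(1, M(0))\}$ is added and subtracted; on the ratio scale the same term is multiplied and divided, which yields the product of risk ratios.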

The expectation of nested counterfactuals can be modelled using a so-called natural effect model (Lange et al., 2012, 2014; Loeys et al., 2013; Steen et al., 2016a,b; Vansteelandt et al., 2012a), e.g.

$$E\{Y(a, M(a'))\} = g^{-1}\{\beta_0 + \beta_1 a + \beta_2 a' + \beta_3 a a'\},$$

where $g(\cdot)$ is a known link function. If $g(\cdot)$ is the identity link, $\beta_1$ captures the pure direct effect and $\beta_2 + \beta_3$ captures the total indirect effect.$^{2}$ By differently apportioning the interaction term $\beta_3$, an alternative decomposition can be obtained in terms of the total direct effect

$$E\{Y(1, M(1)) - Y(0, M(1))\},$$

as captured by $\beta_1 + \beta_3$, and the pure indirect effect

$$E\{Y(0, M(1)) - Y(0, M(0))\},$$

as captured by $\beta_2$. In accordance with VanderWeele (2013), either of these two decompositions can thus be further refined, leading to the same unique three-way decomposition into the pure direct effect $\beta_1$, the pure indirect effect $\beta_2$, and a mediated interactive effect $\beta_3$. Pearl (2001) later adopted the definitions of Robins and Greenland (1992) but named these parameters natural (rather than pure) direct and indirect effects to refer to the fact that pure direct effects, as opposed to controlled direct effects $E\{Y(1, m) - Y(0, m)\}$, allow for natural variation in the mediator. That is, pure direct effects reflect the effect of treatment upon fixing the mediator at values that would, for each individual, have naturally occurred under no treatment, rather than at some predetermined level $m$ (uniformly across the population). In the remainder of this thesis, we will adopt Pearl’s terminology of ‘natural’ effects to refer to any of the above instances.

$^{2}$ Similarly, effect estimates on the risk and odds ratio scale can be obtained by choosing $g(\cdot)$ to represent the log and logit link function, respectively.
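The mapping from model coefficients to effects can be checked mechanically. The sketch below evaluates the linear predictor of the natural effect model above for binary $a$ and $a'$ and derives the four effects from it. The coefficient values in `BETAS` and all function names are hypothetical and chosen for illustration only; this is not the interface of any software package.

```python
import math

# Hypothetical coefficients (beta0, beta1, beta2, beta3); illustrative only.
BETAS = (1.0, 2.0, 0.5, 0.25)


def linear_predictor(a, a_ref, betas=BETAS):
    """beta0 + beta1*a + beta2*a' + beta3*a*a' for exposure levels a and a'."""
    b0, b1, b2, b3 = betas
    return b0 + b1 * a + b2 * a_ref + b3 * a * a_ref


def natural_effects(betas=BETAS):
    """Effects on the linear predictor scale (mean differences if g is identity)."""
    lp = lambda a, a_ref: linear_predictor(a, a_ref, betas)
    return {
        "pure_direct": lp(1, 0) - lp(0, 0),     # beta1
        "total_indirect": lp(1, 1) - lp(1, 0),  # beta2 + beta3
        "total_direct": lp(1, 1) - lp(0, 1),    # beta1 + beta3
        "pure_indirect": lp(0, 1) - lp(0, 0),   # beta2
        "total": lp(1, 1) - lp(0, 0),           # beta1 + beta2 + beta3
    }


def natural_effect_ratios(betas=BETAS):
    """With a log link, exponentiated contrasts are risk ratios."""
    return {name: math.exp(value) for name, value in natural_effects(betas).items()}


effects = natural_effects()
ratios = natural_effect_ratios()
print(effects)
print(ratios)
```

Both two-way decompositions recover the same total effect: the components add up on the identity scale and multiply on the ratio scale.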

1.4 Challenges in mediation analysis

1.4.1 Causal assumptions

Adopting this counterfactual notation naturally leads to framing causal inference as a missing data problem (Holland, 1986). That is, for each subject $i$, only one counterfactual outcome, i.e. $Y_i = Y_i(A_i) = Y_i(A_i, M_i(A_i))$, is observed. In order to infer causal effects from observational data, we will thus inevitably need to make some causal assumptions.
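The missing data perspective can be illustrated with a toy example. In the sketch below, the hypothetical units, their exposure assignment and the names `observe`, `true_effect` and `naive_contrast` are all illustrative assumptions; the exposure is deliberately assigned in a way that is related to the potential outcomes.

```python
# Hypothetical units: exposure actually received (A) and both potential outcomes.
units = [
    {"A": 1, "Y1": 1, "Y0": 1},
    {"A": 1, "Y1": 1, "Y0": 0},
    {"A": 0, "Y1": 1, "Y0": 0},
    {"A": 0, "Y1": 0, "Y0": 0},
]


def observe(unit):
    """Reveal only Y = Y(A); the other potential outcome is missing (None)."""
    a = unit["A"]
    return {
        "A": a,
        "Y": unit["Y1"] if a == 1 else unit["Y0"],
        "Y1": unit["Y1"] if a == 1 else None,
        "Y0": unit["Y0"] if a == 0 else None,
    }


def mean(values):
    values = list(values)
    return sum(values) / len(values)


observed = [observe(u) for u in units]

# Computable only because this toy example knows both potential outcomes.
true_effect = mean(u["Y1"] - u["Y0"] for u in units)

# Computable from observed data: difference in mean outcome between groups.
naive_contrast = (
    mean(o["Y"] for o in observed if o["A"] == 1)
    - mean(o["Y"] for o in observed if o["A"] == 0)
)

print(true_effect, naive_contrast)
```

Here the observed contrast between exposed and unexposed units differs from the average causal effect, which is why additional causal assumptions are needed before observed associations can be given a causal interpretation.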

Although such assumptions will be discussed more formally and in more detail in the next chapters, an important difference between inferring a total causal effect (in point exposure studies) and, subsequently, learning about its component effects, such as natural direct and indirect effects, merits attention here. While the former mainly requires that common causes of treatment or exposure and outcome are adjusted for by statistical methods or eliminated by experimental design, the latter, in addition, requires adjustment for common causes of mediator and outcome.

Moreover, additional complexities arise when such mediator-outcome confounders are themselves affected by treatment, because such variables are then simultaneously a confounder and a mediator on the causal pathways that we aim to disentangle. For this reason, causal assumptions generally get more complicated in mediation settings.

For instance, in the motivating example in section 1.1.3, the relation between perceived control over one’s household and mental health may be confounded by many factors, such as age, education level, ventilation in the house, etc. Such potential common causes thus need to be taken into account in statistical analyses. However, some of these potential confounders, such as physical health, are likely also affected by exposure to mold (Kaufman, 2010).

As will be discussed in more detail later, the presence of such so-called intermediate confounders generally prevents us from obtaining valid estimates of natural direct and indirect effects with respect to the mediator of interest. Nonetheless, in cases with multiple sequential mediators, alternative decompositions of the total effect may still be obtained from the data at hand, in order to shed light on underlying causal mechanisms.

1.4.2 Modeling assumptions

It thus seems that answering mediation questions often, if not always, requires some form of statistical adjustment for confounders. In most applications, the set of confounders will be high-dimensional and will usually consist of a mix of discrete and continuous covariates. To deal with the curse of dimensionality, we will thus necessarily need to rely on some modeling assumptions, preferably as few as possible. A further challenge is that the risk of making incorrect modeling assumptions increases as more and more confounders and mediators enter the picture. Although this challenge is not unique to mediation analysis, semi-parametric approaches, which allow certain modeling assumptions to be relaxed, have only recently been adapted to this setting (Tchetgen Tchetgen and Shpitser, 2012, 2014; Zheng and van der Laan, 2012).

1.5 Main contributions

In this thesis, we aim to contribute to the fast-growing field of mediation analysis by – at least partially – addressing each of the aforementioned challenges.

First, we give a detailed and up-to-date review of causal assumptions that permit toidentify– i.e. obtain consistent estimates of – component or path-specific effects of interest from observed data. Recently, significant advances have been made towards a complete characterization of causal scenarios that permit non-parametric identification of natural (and more generally defined path-specific) effects, thus providing both sufficient and necessary conditions (Shpitser, 2013). However, to the best of our knowl-edge, a systematic comparison of this recent work on complete conditions and earlier work on sufficient conditions (Pearl, 2001) is currently lacking. We contribute to the field by providing such a detailed comparison. In doing so, we aim to offer the reader some deeper intuitive understanding of particular obstacles that may prevent us from making progress in our quest to learn about causal mechanisms. Such improved understanding of necessary causal assumptions – often encoded in graphical models – may ‘aid [applied researchers] in planning of data collection and analysis, in communication of results, and in avoiding subtle pitfalls of confounder selection’ (Greenland et al., 1999). Importantly, we further reflect upon the specific implications of the completeness of this recent result in terms of complementary identification strategies that rely on so-called mediating instruments. Moreover, we integrate these novel insights with earlier con-ceptual considerations on the controversial nature of certain key identifying assumptions (Robins and Richardson, 2010).

Second, we provide practical solutions for mediation analysis tailored to the needs of applied researchers. In doing so, we build on a recently proposed unified and flexible modeling framework for mediation analysis

(26)

1

(Lange et al., 2012, 2014; Loeys et al., 2013; Vansteelandt et al., 2012b) that, as compared to other modeling approaches, has the potential to both con-siderably simplify result reporting and hypothesis testing, and to enable straightforward implementations in standard statistical software. A main contribution of this thesis, in this respect, is the development of a user-friendly software package that implements two proposed semi-parametric estimators within this modeling framework (Steen et al., 2016b), each of which reduces modeling demands by allowing to refrain from modeling certain aspects of the observed data distribution. Importantly, this package handles a larger class of parametric models for mediator and outcome than alternative software applications for modern mediation analysis that rely on closed-form expressions (Valeri and VanderWeele, 2013), and is less computer-intensive as compared to implementations that rely on Monte Carlo integration (Imai et al., 2010a; Tingley et al., 2014a). The latter asset is, in part, due to the development and implementation of robust sandwich variance estimators, which permit to avoid reliance on bootstrap procedures. Finally, we further extend thisnatural effect modelingframework, along with semi-parametric estimators, to accommodate more complex mediation set-tings with multiple, causally ordered mediators (Steen et al., 2016a). In particular, we demonstrate that such an extension both enables a more comprehensive assessment of underlying mechanisms and their potential interactions, as compared to existing analytical approaches (VanderWeele and Vansteelandt, 2013), and reduces modeling demands – and thus risk of model misspecification bias – as compared to fully parametric approaches (e.g. Daniel et al., 2015). Moreover, it offers a more principled solution to cope with increasing complexity in the face of multiple mediators. 
In addition, we propose a sufficient criterion for identification of $(k+1)$-way decompositions in the presence of $k$ sequential mediators. This criterion extends previous work, as it boils down to sequential application of an existing graphical identification criterion for adjustment for a common set of covariates (Shpitser et al., 2010; Shpitser and VanderWeele, 2011), leading to a standard and generally applicable identification result. Its simplicity can be considered to induce a trade-off between general applicability and reduced identification power.


1.6 Outline of this thesis

In the next two chapters, we mainly focus on causal assumptions.

In chapter 2, we first introduce the necessary theoretical background on graphical causal models, which are commonly used to visually encode and communicate the causal assumptions that give certain statistical parameters a causal interpretation. Moreover, we review some important algorithms that have mainly been developed within the field of artificial intelligence, but that can be widely applied in any field of empirical research that attempts to address causal queries. Their importance follows from the fact that, whereas often only sufficient conditions are articulated, these algorithms make it possible to deduce conditions that are both sufficient and necessary for identifying total causal effects from available observed data, thus providing a (more) complete characterization of hypothetical causal scenarios that permit identification. As discussed in more detail in this chapter, this is of particular relevance for graphical causal models that considerably weaken certain causal assumptions by allowing for the presence of unobserved common causes.

In chapter 3, we provide intuition into the distinct and controversial nature of some of the identifying assumptions for mediation analysis. In particular, we revisit earlier assumptions for identifying natural direct and indirect effects (Pearl, 2001) in the light of recent developments (Shpitser, 2013) that build on the insights and algorithms discussed in chapter 2. Importantly, we point out that these recent developments also lead to novel insights that are in line with and help to frame some recent conceptual formulations that were inspired by the debate about the controversial nature of the targeted effects.

In the remaining chapters, we shift focus to flexible modeling and estimation of the causal effects of interest. In chapter 4, we discuss estimation of so-called natural effect models (Lange et al., 2012, 2014; Loeys et al., 2013; Vansteelandt et al., 2012b), which were recently introduced in the literature to offer a simple yet flexible alternative to other state-of-the-art modeling approaches that, from the perspective of an applied researcher, may either complicate obtaining interpretable results or hypothesis testing (Imai et al., 2010a) or pose a barrier to routine application because of their relative complexity (Tchetgen Tchetgen and Shpitser, 2011; van der Laan and Petersen, 2008). In this chapter, we moreover give a detailed discussion of the features of medflex, our free, open-source software package for R, which implements two proposed semi-parametric estimators within this modeling framework.

More general methods for mediation analysis are provided in chapter 5, in which we extend the natural effect modeling framework to settings with multiple sequential mediators. Not only does such an extension offer feasible alternative decompositions in settings in which the mediator of interest is subject to intermediate confounding, it also enables parsimonious modeling, which may be advocated given the multitude of possible decompositions in the presence of an increasing number of mediators.

We conclude in chapter 6 with some further reflections and challenges.

Individual contributions

The major parts of this dissertation are based on two accepted papers, one submitted handbook chapter and a software package. Although the aim is to present a coherent and well-structured overview of my research, merging these papers and chapters, which are not all presented in chronological order of writing, inevitably leads to some repetition and loss of continuity. In this subsection, a chronological overview is presented of the work in this thesis, along with a list of my individual contributions to each of the chapters, excluding the introduction and discussion (hence the switch in narrative voice).

Chapter 4 and chapter 5 can be considered the product of a close collaboration with Stijn Vansteelandt, Tom Loeys and Beatrijs Moerkerke. I have developed and documented the medflex package, which implements the methods in Lange et al. (2012) and Vansteelandt et al. (2012b) and is currently available³ from CRAN: https://cran.r-project.org/package=medflex. In order to ensure both compatibility with future extensions of the package and optimal user experience, certain crucial choices had to be made, mainly with respect to the core structure of the package. These choices have greatly benefited from close consultation with S. Vansteelandt, T. Loeys and B. Moerkerke. Valuable input, especially concerning the neWeight function, has also been provided by Theis Lange. Occasional technical support in the developing stages of the package has been provided by Joris Meys. Several bugs have been reported by S. Vansteelandt, T. Loeys, T. Lange, as well as by users of the package (a more detailed list, along with some patches provided by users, can be found at https://github.com/jmpsteen/medflex/issues). S. Vansteelandt and Karel Vermeulen have provided guidance in constructing generic robust sandwich variance estimators for combinations of a wide class of parametric models (with canonical link functions).

³ Up-to-date development releases of the package are available from https://github.com/jmpsteen/medflex/.

Chapter 4 provides a detailed user guide for the package, using a dataset that has also been used in Loeys et al. (2013) as an illustrating example. The theoretical content of this chapter is largely based on Vansteelandt et al. (2012b), Lange et al. (2012) and Loeys et al. (2013) (a paper to which I have also contributed). I have taken the lead in writing this chapter, which is available as a vignette to the package, in a slightly modified version, and has been accepted for publication in Journal of Statistical Software (Steen et al., 2016b).

I have also taken the lead in writing chapter 5, although S. Vansteelandt has made major contributions in rewriting parts of this chapter in order to make it more accessible for an epidemiologic audience. The estimation procedure and graphical translation of identifying assumptions into a generalization of the adjustment criterion (Shpitser et al., 2010; Shpitser and VanderWeele, 2011) (in the technical appendix) were developed by myself, with guidance from S. Vansteelandt, T. Loeys and B. Moerkerke. In addition, I have implemented all R code in the technical appendix, and have conducted all data analyses. This chapter has been accepted for publication in American Journal of Epidemiology (Steen et al., 2016a).

The content of chapter 2 is largely based on other introductory texts including Elwert (2013), Pearl (2000), Pearl et al. (2016), and Tian and Shpitser (2010).

Chapter 3 is based on a chapter that has recently been submitted for peer review and is to appear in M. Drton, S. Lauritzen, M. Maathuis, M. Wainwright (Eds.), Handbook of Graphical Models. CRC Press. The detailed comparison of identifying assumptions, novel insights and relation with Robins and Richardson (2010), as mentioned in section 1.5, are mainly individual contributions. Ilya Shpitser has helped a great deal in shaping this chapter by providing valuable clarifications regarding his paper in Cognitive Science (Shpitser, 2013). S. Vansteelandt has significantly contributed by improving the structure and clarity of earlier versions of this chapter.


Inferring causal effects from observed data

Over the years, graphical models have proven to be an indispensable tool for visualizing and communicating causal assumptions within a given research context. Such models typically consist of a causal diagram or causal directed acyclic graph (DAG) $G$ with nodes (or vertices) $V = \{V_1, \ldots, V_n\}$ representing random variables of interest and directed edges (or arrows) connecting these nodes.¹

2.1 Encoding conditional independencies in a graph

These diagrams are used to visualize a set of assumed conditional independencies. More specifically, whereas arrows between variables encode probabilistic dependencies among those variables, the absence of an arrow translates into an assumption of conditional independence stating that each variable $V_i$ is independent of its non-descendants conditional on its parents $PA_i$ in the graph (i.e. the variables that have an arrow feeding directly into $V_i$). This Markov assumption allows linking the structure of the graph to the observed data on $V$. In particular, these conditional independence assumptions impose a set of restrictions on the joint probability distribution of $V$, $P(V)$, so that it factorizes as a product of conditional distributions $P(V_i \mid PA_i)$, each of which only involves the parents $PA_i$ of $V_i$:

$$P(V) = \prod_i P(V_i \mid PA_i), \tag{2.1}$$

such that $P(V)$ satisfies the global Markov property relative to $G$ (see next section).

¹ Typically, kinship terminology (i.e. ‘parents’, ‘children’, ‘ancestors’ and ‘descendants’) is used to describe the relationships between nodes implied by the arrows connecting them. By convention, we consider $V_i$ to be both an ancestor and a descendant of itself.

Consider, for instance, the diagram in Figure 2.1A with $V = \{C, A, M, Y\}$. It follows from the Markov assumption relative to this diagram that $M$ and $C$ are conditionally independent given $A$,

$$P(M \mid A, C) = P(M \mid A), \tag{2.2}$$

denoted $M \perp\!\!\!\perp C \mid A$, and that $Y$ and $A$ are conditionally independent given $\{M, C\}$, i.e. $Y \perp\!\!\!\perp A \mid M, C$,

$$P(Y \mid A, M, C) = P(Y \mid M, C). \tag{2.3}$$

$P(V)$ thus factorizes as

$$P(C, A, M, Y) = P(Y \mid M, C)\, P(M \mid A)\, P(A \mid C)\, P(C).$$
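The link between this factorization and the conditional independencies (2.2) and (2.3) can be checked numerically. The sketch below is only an illustration: it assumes binary variables and arbitrary, hypothetical conditional probability tables (none of these numbers come from the text), builds the joint distribution from the factorization, and reports the largest violation of each independence statement.

```python
from itertools import product

# Hypothetical conditional probability tables for binary C, A, M, Y
# (arbitrary illustrative values, not taken from the text).
P_C1 = 0.3                                     # P(C=1)
P_A1_GIVEN_C = {0: 0.2, 1: 0.6}                # P(A=1 | C=c)
P_M1_GIVEN_A = {0: 0.1, 1: 0.8}                # P(M=1 | A=a)
P_Y1_GIVEN_MC = {(0, 0): 0.05, (0, 1): 0.30,
                 (1, 0): 0.40, (1, 1): 0.70}   # P(Y=1 | M=m, C=c)


def bern(p_one, value):
    """Probability that a binary variable equals `value` when P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


# Joint distribution built from P(Y|M,C) P(M|A) P(A|C) P(C).
joint = {}
for c, a, m, y in product((0, 1), repeat=4):
    joint[(c, a, m, y)] = (bern(P_Y1_GIVEN_MC[(m, c)], y)
                           * bern(P_M1_GIVEN_A[a], m)
                           * bern(P_A1_GIVEN_C[c], a)
                           * bern(P_C1, c))

NAMES = ("c", "a", "m", "y")


def prob(**fixed):
    """Probability of the event that fixes the named variables (c, a, m, y)."""
    return sum(p for key, p in joint.items()
               if all(key[NAMES.index(k)] == v for k, v in fixed.items()))


total = sum(joint.values())

# Largest violation of (2.2): P(M=1 | A, C) versus P(M=1 | A).
gap_22 = max(abs(prob(m=1, a=a, c=c) / prob(a=a, c=c)
                 - prob(m=1, a=a) / prob(a=a))
             for a, c in product((0, 1), repeat=2))

# Largest violation of (2.3): P(Y=1 | A, M, C) versus P(Y=1 | M, C).
gap_23 = max(abs(prob(y=1, a=a, m=m, c=c) / prob(a=a, m=m, c=c)
                 - prob(y=1, m=m, c=c) / prob(m=m, c=c))
             for a, m, c in product((0, 1), repeat=3))

print(total, gap_22, gap_23)
```

Both gaps vanish up to floating-point error for any choice of tables, because the independencies follow from the structure of the factorization rather than from the particular numbers.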

2.1.1 d-separation

In this simple example, all conditional independencies encoded in the graph follow directly from the local Markov property. More generally, Pearl (1988)’s d-separation criterion provides a graphical rule that enables summarizing all (conditional) independencies encoded in a given graph, irrespective of its complexity. To fully appreciate this rule, however, one needs to distinguish three elementary causal structures, which can be considered the building blocks of every causal DAG. Each of these structures corresponds to a different source of association between observed variables.

[Figure 2.1: Original graph $G$ (A), on the nodes $A$, $M$, $Y$, $C$; mutilated graph $G_A$ (B), on the nodes $A = a$, $M$, $Y$, $C$; and graph $G'$ (C), on the nodes $A$, $M$, $Y$, $U$.]

Causation and, to a lesser extent, confounding are two potential sources of association that match relatively well with human intuition. Their causal structures correspond to chains $V_i \to V_j \to V_k$ and forks $V_i \leftarrow V_j \to V_k$, respectively. In both of these structures $V_i$ and $V_k$ are marginally dependent, but conditionally independent given $V_j$. That is, as long as $V_j$ is not conditioned on, $V_i$ and $V_k$ are said to be d-connected. If these causal structures were viewed as an electric circuit (Shipley, 2002), in both cases $V_j$ could be considered an active switch that enables electricity to be transmitted between $V_i$ and $V_k$ along their connecting edges. The circuit can be broken by turning off the switch. Similarly, the path connecting $V_i$ and $V_k$ can be blocked by conditioning on $V_j$, rendering $V_i$ and $V_k$ d-separated.

A third type of association, in contrast, arises when conditioning on a third variable. That is, if the structure is an inverted fork $V_i \to V_j \leftarrow V_k$, $V_i$ and $V_k$ are marginally independent, but they become dependent when conditioning on their common effect $V_j$. Nodes with converging edges, so-called colliders, such as $V_j$, act like inactive switches that do not transmit electricity unless they are conditioned on. In that case, the blocked path between $V_i$ and $V_k$ is unblocked, rendering these formerly d-separated nodes d-connected. Conditioning on a collider may thus induce artificial or spurious associations. This seems to be at odds with human intuition (Burns and Wieth, 2004), as many would assume that conditioning on a third variable would, if anything, reduce or eliminate any dependence. A simple example may, however, help to elucidate this counterintuitive phenomenon (Pearl, 2000). Suppose the admission criteria for a graduate school are high grades and/or unusual musical talent and suppose one may assume these attributes to be uncorrelated in the general population. Learning that a random person has obtained high or low grades is thus uninformative as to whether this person has unusual musical talent (and vice versa). However, learning that a student of that school has obtained low grades tells us that this student must be exceptionally gifted in music. Likewise, students of that school who are not musically talented are more likely to have obtained high grades. These two causal attributes, which are uncorrelated (or marginally independent) in the general population, thus become dependent upon learning about their common consequence, i.e. that a student has gained admission. This phenomenon, which also occurs when conditioning on a descendant of a collider, has been termed Berkson’s paradox in epidemiology and statistics (Berkson, 1946) or the explaining away effect in artificial intelligence (Kim and Pearl, 1983). Other commonly used terms are collider(-stratification) bias (Greenland, 2003) or selection bias (Hernán et al., 2004). The latter terms clarify, as in the above example, that this bias may arise not only from, for instance, regression adjustment, but also from selective sampling from a specific subpopulation (i.e. stratification).
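The admission example can be made concrete with a small exact calculation. The sketch below assumes hypothetical marginal probabilities for high grades and musical talent (the numbers are illustrative only) and compares the probability of musical talent given grades in the general population with the same probability among admitted students.

```python
from itertools import product

# Hypothetical marginal probabilities (illustrative values only).
P_HIGH_GRADES = 0.3
P_MUSICAL = 0.2


def weight(grades, musical):
    """Population probability of a (grades, musical) profile under independence."""
    p_g = P_HIGH_GRADES if grades else 1.0 - P_HIGH_GRADES
    p_m = P_MUSICAL if musical else 1.0 - P_MUSICAL
    return p_g * p_m


def p_musical_given(grades, admitted_only):
    """P(musical = 1 | grades), optionally restricted to admitted students."""
    num = den = 0.0
    for g, m in product((0, 1), repeat=2):
        admitted = bool(g or m)  # admission requires high grades and/or talent
        if g != grades or (admitted_only and not admitted):
            continue
        den += weight(g, m)
        num += weight(g, m) * m
    return num / den


pop_low = p_musical_given(0, admitted_only=False)
pop_high = p_musical_given(1, admitted_only=False)
adm_low = p_musical_given(0, admitted_only=True)
adm_high = p_musical_given(1, admitted_only=True)

print(pop_low, pop_high)  # identical: grades are uninformative in the population
print(adm_low, adm_high)  # different: selection on admission induces dependence
```

In the full population the two conditional probabilities coincide, whereas among admitted students a low-grade student is musically talented with certainty, which is exactly the dependence induced by conditioning on the collider.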

In contrast to the graphs associated with these three elementary structures, most graphs are of considerably higher complexity, containing both more nodes and more edges. In particular, two nodes may have multiple paths² connecting them, each of which may contain any combination of these structures and may hence be blocked or unblocked by a set of other nodes. Given these elementary structures, however, we may predict the dependencies encoded in a graph of any level of complexity, using the following graphical criterion.

Definition 2.1.1. d-separation (Pearl, 2000) A path $p$ is said to be d-separated (or blocked) by a set of nodes $Z$ if and only if

(i) $p$ contains a chain $V_i \to V_j \to V_k$ or a fork $V_i \leftarrow V_j \to V_k$ such that the middle node $V_j$ is in $Z$, or

(ii) $p$ contains an inverted fork $V_i \to V_j \leftarrow V_k$ such that the middle node $V_j$ is not in $Z$ and such that no descendant of $V_j$ is in $Z$.

A set $Z$ is said to d-separate $X$ from $Y$ if and only if $Z$ blocks every path from a node in $X$ to a node in $Y$.

² A path is a sequence of distinct nodes where any two adjacent nodes in the sequence are connected by an edge (of any directionality).


While $X$ and $Y$ are conditionally independent given $Z$ whenever they are d-separated by $Z$, the converse does not necessarily hold. For instance, d-connected nodes may be independent if an exact cancellation of positive and negative effects occurs. Because such exact cancellations are unlikely to occur, it is usually assumed that d-connected nodes are dependent, an assumption referred to as faithfulness (Spirtes et al., 1993).
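The d-separation criterion is easily automated. The following sketch is one possible implementation, not a method described in the text: rather than enumerating paths, it uses the well-known equivalent construction based on the moralized ancestral graph. The function names and the dictionary encoding of the graph (each node mapped to its set of parents) are choices made for this illustration.

```python
def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    found, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents.get(node, ()))
    return found


def d_separated(parents, xs, ys, zs):
    """True if zs blocks every path between a node in xs and a node in ys.

    `parents` maps each node to the set of its parents. The check uses the
    moralized ancestral graph, which is equivalent to the path-based
    definition of d-separation.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)
    # Build the undirected, moralized graph on the ancestral set.
    adj = {v: set() for v in keep}
    for child in keep:
        pars = list(parents.get(child, ()))
        for p in pars:                      # keep parent-child edges
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(pars):        # 'marry' parents of a common child
            for q in pars[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # Search from xs without ever entering a node in zs.
    seen, stack = set(), [x for x in xs if x not in zs]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(n for n in adj[node] if n not in zs and n not in seen)
    return not (seen & ys)


# Graph of Figure 2.1A: C -> A -> M -> Y and C -> Y.
fig_21a = {"C": set(), "A": {"C"}, "M": {"A"}, "Y": {"M", "C"}}
# Admission example: grades -> admitted <- talent.
admission = {"grades": set(), "talent": set(), "admitted": {"grades", "talent"}}

print(d_separated(fig_21a, {"M"}, {"C"}, {"A"}))        # independence (2.2)
print(d_separated(fig_21a, {"Y"}, {"A"}, {"M", "C"}))   # independence (2.3)
print(d_separated(fig_21a, {"A"}, {"Y"}, set()))        # A and Y are d-connected
print(d_separated(admission, {"grades"}, {"talent"}, {"admitted"}))
```

Applied to the graph in Figure 2.1A, the function recovers the two independencies (2.2) and (2.3); applied to the admission example, it confirms that conditioning on the collider opens the path between grades and talent.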

2.1.2 Observational equivalence

Importantly, since conditional independencies encoded in a graph impose constraints on the probability distribution that governs the generated data, they can be tested from observed data on the variables in the graph. This enables us to partially test the validity of the causal model associated with a given graph, but also serves as the basis for causal discovery algorithms. However, the ability to falsify a given graphical model from observable data usually does not make it possible to distinguish between multiple graphs that are compatible with the observed data.

For instance, the three graphs in Figure 2.2 encode the same set of conditional independencies, i.e. $X \perp\!\!\!\perp Y \mid W$ and $W \perp\!\!\!\perp Z \mid X, Y$. Because they share an identical set of testable implications, observational data do not carry any information to decide which of the three graphical models reflects the true underlying data generating mechanism. This example illustrates that conditional independencies usually do not allow us to infer directionality for all edges in a given graph. Nonetheless, we can infer some information about directionality in each of the three graphs. Since we may learn from observed data that $X \perp\!\!\!\perp Y \mid W$ and that $X \not\perp\!\!\!\perp Y \mid W, Z$, we can infer that $Z$ must be a collider and hence that the edges between $Z$ and $X$ and between $Z$ and $Y$ must be pointing towards $Z$. Directionality may thus to some extent be inferred by discovering so-called v-structures (i.e. colliders whose parents are not adjacent).

[Figure 2.2: three graphs, panels (A), (B) and (C), each on the nodes $W$, $X$, $Y$, $Z$.]

Graphs that share a common skeleton (i.e. the same configuration of edges, irrespective of their direction) and common v-structures, such as the graphs in Figure 2.2, are said to be observationally equivalent or to belong to the same Markov equivalence class (Verma and Pearl, 1991). That is, because they share an identical set of conditional independencies, they are empirically indistinguishable. To assess the causal effect of, say, $W$ on $Z$, it is, however, crucial to distinguish between each of these graphs. Necessarily, to make progress, we will need to make certain assertions about directionality based on subject matter knowledge and/or expert judgment.
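The skeleton-plus-v-structures characterization translates directly into a check for observational equivalence. In the sketch below, the edge sets `g_a`, `g_b` and `g_c` are assumptions: the arrow directions in Figure 2.2 cannot be read off from the text, so these are merely three graphs on $W$, $X$, $Y$, $Z$ that are consistent with the independencies stated above. The graph `g_d` is a hypothetical contrast in which $Z$ is no longer a collider.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a set of directed edges (parent, child)."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges):
    """Colliders a -> c <- b whose parents a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for a, b in combinations(sorted(pars), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Same skeleton and same v-structures (Verma and Pearl, 1991)."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Assumed edge sets (the actual arrows of Figure 2.2 are not recoverable).
g_a = {("W", "X"), ("W", "Y"), ("X", "Z"), ("Y", "Z")}   # X <- W -> Y
g_b = {("X", "W"), ("W", "Y"), ("X", "Z"), ("Y", "Z")}   # X -> W -> Y
g_c = {("W", "X"), ("Y", "W"), ("X", "Z"), ("Y", "Z")}   # X <- W <- Y
g_d = {("W", "X"), ("W", "Y"), ("Z", "X"), ("Y", "Z")}   # edge X - Z reversed

print(v_structures(g_a))
print(markov_equivalent(g_a, g_b), markov_equivalent(g_a, g_c))
print(markov_equivalent(g_a, g_d))
```

The first three graphs share the single v-structure $X \to Z \leftarrow Y$ and are therefore equivalent, whereas reversing the edge between $X$ and $Z$ changes the set of v-structures and yields a graph with different testable implications.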

2.2 What makes a diagram a causal diagram

Since the notion of causation is often formalized by referring to hypothetical interventions, e.g. setting $A$ to $a$, we ultimately wish to learn about some aspects of the joint distribution of the other observed variables $P(V \setminus A)$ (usually the mean of some outcome $Y \in V$) under such different interventions in the population. Our ability to do so rests on the assumption that the directed edges in a graph represent causal influences between the corresponding variables and that the graph can be conceived to reflect a modular system, in the sense that one can manipulate or change one part of the system without affecting the rest. More specifically, this invariance property states that each parent-child relation represents a stable and autonomous physical mechanism. The ideas of intervention and modularity match the intuitive notion of causation and the conditions that enable turning purely correlational claims into causal ones. These are therefore considered to grant causal DAGs their causal interpretation.

Consider again, for example, the graph in Figure 2.1A. If we were to intervene locally on $A$, fixing it to $a$, we would only curtail $A$’s natural tendency to vary in response to $C$ (e.g. a potential confounder), without affecting the natural responses of the other variables. This action is often represented graphically by performing a kind of surgery on the original graph $G$, turning it into $G_A$, by removing all directed edges into $A$ (as in Figure 2.1B), or mathematically, using Pearl (2000)’s do-operator, where $do(A = a)$ represents the action or intervention that fixes $A$ to $a$. In order to learn about causal effects, we thus aim to compare joint interventional distributions $P(V \setminus A \mid do(A = a))$, or interventional distributions of an outcome of interest $P(Y \mid do(A = a))$, corresponding to different hypothetical interventions enforced uniformly over the population.

2.3 The truncated factorization formula

Importantly, assuming modularity enables us to obtain the joint interventional distribution by applying the usual factorization to the manipulated graph $G_A$:

$$P(V \setminus A \mid do(A = a)) = \prod_{i \mid V_i \notin A} P(V_i \mid PA_i)\, I(A = a), \tag{2.4}$$

since the factors $P(V_i \mid PA_i)$ corresponding to variables in $A$ are either 1 (when $A = a$) or 0 (when $A \neq a$), while those corresponding to the other variables remain unaltered. It can be seen that the resulting truncated factorization formula (Pearl, 1995a) in expression (2.4), which has been referred to earlier as the g-computation formula (Robins, 1986) and is implied by the manipulation theorem (Spirtes et al., 1993), simply omits (from expression (2.1)) the conditional distribution of the node $A$ that we intervene on. The interventional distribution of some outcome of interest $Y$ can then simply be obtained by summing³ expression (2.4) over $V \setminus \{A, Y\}$:

$$P(Y \mid do(A = a)) = \sum_{v \setminus \{a, y\}} \prod_{i \mid V_i \notin A} P(V_i \mid PA_i = pa_i), \tag{2.5}$$

where $pa_i$ denotes the vector of value assignments to $PA_i$ such that, if $A \in PA_i$, value assignment $PA_i = pa_i$ is consistent with $A = a$. Note that, in the absence of hidden variables, the modularity assumption implies $P(V_i \mid PA_i) = P(V_i \mid do(PA_i))$ for each $V_i$, such that the truncated factorization in expression (2.5) can be rewritten in terms of interventional distributions:

$$P(Y \mid do(A = a)) = \sum_{v \setminus \{a, y\}} \prod_{i \mid V_i \notin A} P(V_i \mid do(PA_i = pa_i)). \tag{2.6}$$

³ Throughout, for continuous $V_i$, replace summations by integrals and probabilities by […]

2.3.1 An example

Suppose, for example, that the variables in Figure 2.1A, as in Pearl (2000), represent smoking $A$, amount of tar deposited in the lungs $M$, development of lung cancer $Y$ and a certain genotype $C$ that predisposes to both smoking and developing lung cancer. Application of the truncated factorization formula yields that, under the assumptions encoded in the graph in Figure 2.1A, the interventional distribution of $Y$ under an intervention that would, irrespective of potential ethical objections, either ban smoking, i.e. $do(A = 0)$, or enforce it, i.e. $do(A = 1)$ (or more generally, $do(A = a)$), in the general population equals

$$P(Y \mid do(A = a)) = \sum_{c,m} P(Y \mid M = m, C = c)\, P(M = m \mid A = a)\, P(C = c).$$

Moreover, exploiting the conditional independencies (2.2) and (2.3) encoded in the graph, we can simplify this resulting expression as follows:

$$\begin{aligned}
& \sum_{c,m} P(Y \mid A = a, M = m, C = c)\, P(M = m \mid A = a, C = c)\, P(C = c) \\
&\quad = \sum_{c,m} P(Y, M = m \mid A = a, C = c)\, P(C = c) \\
&\quad = \sum_{c} P(Y \mid A = a, C = c)\, P(C = c).
\end{aligned} \tag{2.7}$$

This yields an expression commonly referred to as the adjustment formula or the back-door formula (Pearl, 1993).
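The equality between the truncated factorization and the adjustment formula can be verified numerically. The sketch below assumes binary variables and hypothetical conditional probability tables for the smoking example (the numbers are arbitrary illustrations). It computes $P(Y = 1 \mid do(A = a))$ once from the truncated factorization and once from expression (2.7) using only the observed joint distribution, and contrasts both with the unadjusted conditional probability $P(Y = 1 \mid A = a)$.

```python
from itertools import product

# Hypothetical CPTs for binary genotype C, smoking A, tar M, cancer Y.
P_C1 = 0.3
P_A1_GIVEN_C = {0: 0.2, 1: 0.6}
P_M1_GIVEN_A = {0: 0.1, 1: 0.8}
P_Y1_GIVEN_MC = {(0, 0): 0.05, (0, 1): 0.30, (1, 0): 0.40, (1, 1): 0.70}


def bern(p_one, value):
    """Probability that a binary variable equals `value` when P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c, a, m, y):
    """Observed joint P(C, A, M, Y) implied by the graph in Figure 2.1A."""
    return (bern(P_Y1_GIVEN_MC[(m, c)], y) * bern(P_M1_GIVEN_A[a], m)
            * bern(P_A1_GIVEN_C[c], a) * bern(P_C1, c))


def truncated_factorization(a):
    """P(Y=1 | do(A=a)): drop the factor P(A|C), then sum over c and m."""
    return sum(P_Y1_GIVEN_MC[(m, c)] * bern(P_M1_GIVEN_A[a], m) * bern(P_C1, c)
               for c, m in product((0, 1), repeat=2))


def adjustment_formula(a):
    """sum_c P(Y=1 | A=a, C=c) P(C=c), computed from the joint only."""
    total = 0.0
    for c in (0, 1):
        p_ac = sum(joint(c, a, m, y) for m, y in product((0, 1), repeat=2))
        p_y1_ac = sum(joint(c, a, m, 1) for m in (0, 1))
        total += p_y1_ac / p_ac * bern(P_C1, c)
    return total


def naive_conditional(a):
    """P(Y=1 | A=a), which ignores confounding by C."""
    p_a = sum(joint(c, a, m, y) for c, m, y in product((0, 1), repeat=3))
    p_y1_a = sum(joint(c, a, m, 1) for c, m in product((0, 1), repeat=2))
    return p_y1_a / p_a


for a in (0, 1):
    print(a, truncated_factorization(a), adjustment_formula(a), naive_conditional(a))
```

The first two quantities agree for every exposure level, whereas the unadjusted conditional probability differs because it mixes the effect of smoking with the association induced by the genotype.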


2.4 The adjustment formula

The previous example illustrates that, in some cases, the identification result for P(Y|do(A = a)) obtained via the truncated factorization formula (in expression (2.5)) may be simplified to expression (2.7).

2.4.1 Conditional ignorability

This result can, in fact, be shown to naturally relate to a sufficient condition for identification of causal effects defined in the counterfactual outcomes framework, i.e. that of conditional ignorability. This assumption, denoted as a conditional independence statement involving counterfactual outcomes,

$$Y(a) \perp\!\!\!\perp A \mid C, \quad \text{for all } a, \tag{2.8}$$

states that the counterfactual outcome $Y(a)$ that, possibly contrary to the fact, would have been observed under an intervention that sets $A = a$ does not depend on the actual level of $A$ within strata of a set of covariates $C$. Assumption (2.8) has also been named the assumption of no omitted confounders or no unmeasured confounding, to capture the more intuitive notion that $C$ constitutes a sufficient set to adjust for potential confounding of the relation between $A$ and $Y$.

When combined with a consistency assumption, which states that $Y = Y(a)$ if $A = a$, conditional ignorability (2.8) allows the counterfactual distribution $P(Y(a))$, which essentially corresponds to $P(Y \mid do(A = a))$, to be expressed by the adjustment formula (2.7) as follows:

$$\begin{aligned}
P(Y(a)) &= \sum_{c} P(Y(a) \mid C = c)\, P(C = c) \\
&= \sum_{c} P(Y(a) \mid A = a, C = c)\, P(C = c) \\
&= \sum_{c} P(Y \mid A = a, C = c)\, P(C = c).
\end{aligned}$$

2.4.2 The adjustment criterion
[…] identification of $P(Y \mid do(A = a))$ by the adjustment formula (2.7); a criterion that, in other words, makes it possible to find all possible adjustment sets $C$ that satisfy conditional ignorability (2.8). This adjustment criterion has been shown to generalize and subsume Pearl (1995a)’s back-door criterion.⁴

In order to provide a more precise and formal definition of this criterion, especially in the case where $A$ may be a joint or sequential intervention, as in the examples discussed below, we will need to introduce the following terminology.

Definition 2.4.1. Proper causal path (Shpitser et al., 2010) Let $X$, $Y$ be sets of nodes. A directed path from a node $A \in X$ to a node in $Y$ is called proper causal with respect to $X$ if it does not intersect $X$ except at $A$.

More generally, a path from $X$ to $Y$ is called proper if only its first node is in $X$ (Perković et al., 2015). For example, suppose $X = \{A, M\}$ in the graphs in Figure 2.3. In the graph in panel (A), there are two proper causal paths from $X$ to $Y$, i.e. $A \to Y$ and $M \to Y$. Note that $A \to M \to Y$ is not proper causal with respect to $X$ because it intersects $X$ at $M$. In the graph in panel (B), there is an additional proper causal path from $X$ to $Y$, i.e. $A \to L \to Y$.

Definition 2.4.2. Adjustment criterion (Shpitser et al., 2010) $Z$ satisfies the adjustment criterion relative to $(X, Y)$ in the original graph $G$ if

(i) No element in $Z$ is a descendant in $G_X$ of any $W \notin X$ which lies on a proper causal path from $X$ to $Y$, and

(ii) All proper⁵ non-causal paths in $G$ from $X$ to $Y$ are blocked by $Z$.

The only non-causal path from $\{A, M\}$ to $Y$ in the graph in Figure 2.3A is $M \leftarrow C \to Y$. This path can be blocked by $C$, which is not on a proper causal path from $\{A, M\}$ to $Y$, nor is it a descendant of a node on such a proper causal path. So $C$ satisfies the adjustment criterion relative to $(\{A, M\}, Y)$ in this graph, such that $P(Y \mid do(A = a, M = m))$ is identified by

$$P(Y \mid do(A = a, M = m)) = \sum_{c} P(Y \mid A = a, M = m, C = c)\, P(C = c).$$

[Figure 2.3: Two mediation graphs with different proper causal paths from $\{A, M\}$ to $Y$. Panel (A) is on the nodes $A$, $M$, $Y$, $C$; panel (B) is on the nodes $A$, $M$, $Y$, $L$.]

⁴ For this reason, the back-door criterion is not further discussed.

⁵ Shpitser et al. (2010)’s original formulation required that all non-causal paths in $G$ from $X$ to $Y$ be blocked by $Z$. However, in accordance with Perković et al. (2015), we provide a slight reformulation in which this is only required for all proper non-causal paths.

Likewise, in the graph in Figure 2.3B, $L$ blocks the only non-causal path from $\{A, M\}$ to $Y$, i.e. $M \leftarrow L \to Y$. However, $L$ lies on the proper causal path $A \to L \to Y$ in $G_{AM}$ and thus does not satisfy the adjustment criterion relative to $(\{A, M\}, Y)$ in this graph. Nonetheless, $P(Y \mid do(A = a, M = m))$ can be computed from the observed data by expression (2.5), which yields

$$P(Y \mid do(A = a, M = m)) = \sum_{l} P(Y \mid A = a, M = m, L = l)\, P(L = l \mid A = a).$$

Intuitively, these examples illustrate that the first part of the adjustment criterion keeps us from adjusting for mediators, whereas the second part ensures that we adjust for common causes.
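The difference between the two identifying expressions in the graph of Figure 2.3B can be illustrated with a small exact calculation. The sketch below assumes binary variables, the edges $A \to L$, $A \to M$, $L \to M$, $A \to Y$, $L \to Y$ and $M \to Y$ (as implied by the paths named in the text), and hypothetical conditional probabilities. It compares the expression obtained from (2.5) with the result of inappropriately applying the adjustment formula with adjustment set $\{L\}$.

```python
# Hypothetical CPTs for binary A, L, M, Y with assumed edges
# A->L, A->M, L->M, A->Y, L->Y, M->Y (illustrative values only).
P_A1 = 0.5
P_L1_GIVEN_A = {0: 0.2, 1: 0.7}


def p_y1(a, m, l):
    """P(Y=1 | A=a, M=m, L=l), an arbitrary illustrative choice."""
    return 0.1 + 0.2 * a + 0.3 * m + 0.3 * l


def bern(p_one, value):
    """Probability that a binary variable equals `value` when P(1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def g_formula(a, m):
    """Expression (2.5) for do(A=a, M=m): sum_l P(Y=1 | a, m, l) P(l | a)."""
    return sum(p_y1(a, m, l) * bern(P_L1_GIVEN_A[a], l) for l in (0, 1))


def adjust_for_l(a, m):
    """Adjustment formula with Z = {L}: sum_l P(Y=1 | a, m, l) P(l).

    L violates the adjustment criterion in this graph, so this quantity is
    not the interventional probability.
    """
    p_l1 = sum(bern(P_A1, a2) * P_L1_GIVEN_A[a2] for a2 in (0, 1))
    return sum(p_y1(a, m, l) * bern(p_l1, l) for l in (0, 1))


print(g_formula(1, 1), adjust_for_l(1, 1))
```

Because $L$ is itself affected by $A$, weighting by the marginal distribution of $L$ rather than by its distribution given $A = a$ removes part of the effect of $A$ that is transmitted through $L$, so the two numbers differ.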

2.4.3 Flexible estimation strategies for the adjustment formula

Most often, interest lies in comparing some mean outcome of interest under different hypothetical interventions in the population. That is, $E(Y \mid do(A = a))$ is the causal quantity of interest, rather than the interventional distribution $P(Y \mid do(A = a))$ per se. Estimating this quantity from observed data via direct application of the adjustment formula may be cumbersome, as it requires modeling $P(C = c)$. This can be challenging, especially when $C$ is high-dimensional and/or contains continuous covariates and data are sparse.

Below we show that there are two ways of rewriting the adjustment formula that give rise to estimators that may considerably reduce modeling demands, in the sense that neither requires modeling $P(C = c)$.

Inverse probability weighting

The first estimator arises from rewriting the adjustment formula as follows:

$$\begin{aligned}
E(Y \mid do(A = a)) &= \sum_{y,c} y \cdot P(Y = y \mid A = a, C = c)\, P(C = c) \\
&= \sum_{y,c} y \cdot \frac{P(Y = y, A = a, C = c)}{P(A = a \mid C = c)} \\
&= \sum_{y,c} y \cdot \frac{P(Y = y, C = c \mid A = a)\, P(A = a)}{P(A = a \mid C = c)} \\
&= E\left\{ \frac{Y\, I(A = a)}{P(A = a \mid C)} \right\}.
\end{aligned}$$

The corresponding sample estimator

$$n^{-1} \sum_{i=1}^{n} \frac{Y_i\, I(A_i = a)}{\hat{P}(A_i = a \mid C_i)}$$

corresponds to a weighted mean outcome, where each individual exposed at level $A = a$ is weighted by the inverse of its propensity of being exposed at that exposure level given baseline covariates $C$, $\hat{P}(A = a \mid C)$. Inverse weighting can be thought of as aiming to construct a pseudo-population in which confounding by $C$ is eliminated (i.e. mimicking a randomized trial). This weighting-based estimator thus focuses solely on modeling the relation between $A$ and $C$, as it only requires a propensity score model for $P(A \mid C)$.
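As a minimal sketch of this estimator, the simulation below draws data from a hypothetical data-generating mechanism with a single binary confounder (all probabilities are assumptions made for this illustration), estimates the propensity score by stratum frequencies, and computes the inverse probability weighted mean. Under this mechanism, $E(Y(1)) = 0.56$ and $E(Y(0)) = 0.26$.

```python
import random


def simulate(n, seed=1):
    """Draw (c, a, y) triples from a hypothetical confounded mechanism."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = int(rng.random() < 0.4)
        a = int(rng.random() < 0.2 + 0.5 * c)
        y = int(rng.random() < 0.1 + 0.3 * a + 0.4 * c)
        data.append((c, a, y))
    return data


def fit_propensity(data):
    """Estimate P(A=1 | C=c) by stratum frequencies (a saturated model)."""
    prop = {}
    for c_val in (0, 1):
        stratum = [a for c, a, _ in data if c == c_val]
        prop[c_val] = sum(stratum) / len(stratum)
    return prop


def ipw_mean(data, a_val):
    """Inverse probability weighted estimate of E(Y | do(A = a_val))."""
    prop = fit_propensity(data)
    total = 0.0
    for c, a, y in data:
        if a == a_val:
            p = prop[c] if a_val == 1 else 1.0 - prop[c]
            total += y / p
    return total / len(data)


data = simulate(100_000)
ipw_1 = ipw_mean(data, 1)
ipw_0 = ipw_mean(data, 0)
exposed = [y for _, a, y in data if a == 1]
naive_1 = sum(exposed) / len(exposed)

print(ipw_1, ipw_0, naive_1)
```

The weighted means should lie close to the true interventional means, whereas the unweighted mean among the exposed overstates $E(Y(1))$ because exposed individuals more often carry the confounder.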

Imputation

The second estimator results from simply applying the law of iterated expectations, so that one can average over the empirical distribution of $C$ in the observed data, as follows:

$$\begin{aligned}
E(Y \mid do(A = a)) &= \sum_{c} E(Y \mid A = a, C = c)\, P(C = c) \\
&= E\{E(Y \mid A = a, C)\}.
\end{aligned}$$

The resulting expression gives rise to an imputation-based estimator

$$n^{-1} \sum_{i=1}^{n} \hat{E}(Y_i \mid A_i = a, C_i)$$

that requires imputing each individual’s outcome under observed levels of the covariate set $C$ but a (possibly) counterfactual exposure level $a$. $E(Y \mid do(A = a))$ can then be estimated by simply calculating the mean of these imputed outcomes. This estimator thus focuses on modeling the relation between $Y$ and $C$ within strata of $A$, as it only requires an imputation model for the mean outcome $E(Y \mid A, C)$.
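The imputation-based estimator can be sketched along the same lines. The simulation below again assumes a hypothetical data-generating mechanism with one binary confounder, under which $E(Y(1)) = 0.56$ and $E(Y(0)) = 0.26$. The outcome model is fitted as a table of cell means, which is a saturated model chosen here for simplicity; in practice a regression model would typically take its place.

```python
import random


def simulate(n, seed=2):
    """Draw (c, a, y) triples from a hypothetical confounded mechanism."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = int(rng.random() < 0.4)
        a = int(rng.random() < 0.2 + 0.5 * c)
        y = int(rng.random() < 0.1 + 0.3 * a + 0.4 * c)
        data.append((c, a, y))
    return data


def fit_outcome_model(data):
    """Estimate E(Y | A=a, C=c) by cell means (a saturated imputation model)."""
    sums, counts = {}, {}
    for c, a, y in data:
        sums[(a, c)] = sums.get((a, c), 0) + y
        counts[(a, c)] = counts.get((a, c), 0) + 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def imputation_mean(data, a_val):
    """Average the imputed outcomes under exposure a_val over all individuals."""
    model = fit_outcome_model(data)
    # Every individual contributes, with the observed C but exposure set to a_val.
    return sum(model[(a_val, c)] for c, _, _ in data) / len(data)


data = simulate(100_000)
imp_1 = imputation_mean(data, 1)
imp_0 = imputation_mean(data, 0)

print(imp_1, imp_0, imp_1 - imp_0)
```

Note that the average runs over all individuals, exposed or not, which is what distinguishes the imputed mean from the observed mean outcome among those with $A = a$.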

Marginal structural models

$E(Y \mid do(A = a))$ or $E(Y(a))$ can be parameterized using so-called marginal structural models (Robins, 1999; Robins et al., 2000). The parameters of such models correspond to interventional contrasts of interest. For instance, in the marginal structural model

$$E(Y(a)) = \beta_0 + \beta_1 a, \tag{2.9}$$

$\beta_1$ captures the average causal effect of a one-unit increase in the exposure, so that a change in the exposure from $A = 0$ to $A = a$ corresponds to an average causal effect $E(Y(a) - Y(0)) = \beta_1 a$.

Model (2.9) could be considered a special case of a wider class of generalized linear marginal structural models

$$E(Y(a)) = g^{-1}\{\beta^\top W(a)\}, \tag{2.10}$$

with $W(a)$ a known vector with components that may depend on $a$. $W$ may be specified so as to accommodate non-linearities in the case of a continuous exposure. $\beta$ is an unknown parameter vector and $g(\cdot)$ a known link function, the choice of which permits some flexibility as to the scale on which the causal effect of interest is expressed.

The marginal structural model framework provides a natural environment for implementing the aforementioned estimators. That is, marginal structural models are traditionally fitted by weighted regression models, in which the weights correspond to the inverse probability weights discussed in section 2.4.3 (Robins et al., 2000). Alternatively, one may regress imputed mean outcomes on the exposure (Snowden et al., 2011). The latter approach is, however, computationally more intensive, as it requires replicating the original data along multiple values of the exposure and imputing outcomes for each individual under each of these exposure levels.
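The weighted regression approach can be sketched for the linear marginal structural model (2.9) with a binary exposure. The simulation below assumes a hypothetical data-generating mechanism under which $\beta_0 = 0.26$ and $\beta_1 = 0.30$, estimates inverse probability weights from stratum frequencies, and fits the model by weighted least squares in closed form. All function names are choices made for this illustration.

```python
import random


def simulate(n, seed=3):
    """Draw (c, a, y) triples from a hypothetical confounded mechanism."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        c = int(rng.random() < 0.4)
        a = int(rng.random() < 0.2 + 0.5 * c)
        y = int(rng.random() < 0.1 + 0.3 * a + 0.4 * c)
        data.append((c, a, y))
    return data


def ip_weights(data):
    """Weights 1 / P(A = a_i | C = c_i), with P estimated by stratum frequencies."""
    prop = {}
    for c_val in (0, 1):
        stratum = [a for c, a, _ in data if c == c_val]
        prop[c_val] = sum(stratum) / len(stratum)
    return [1.0 / (prop[c] if a == 1 else 1.0 - prop[c]) for c, a, _ in data]


def weighted_least_squares(xs, ys, ws):
    """Intercept and slope of a weighted simple linear regression of y on x."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    s_xx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


data = simulate(100_000)
weights = ip_weights(data)
beta0, beta1 = weighted_least_squares([a for _, a, _ in data],
                                      [y for _, _, y in data],
                                      weights)

print(beta0, beta1)
```

With a binary exposure the fitted slope equals the difference between the weighted mean outcomes of exposed and unexposed individuals, so the regression formulation mainly pays off for multi-level or continuous exposures, where the model imposes structure across exposure levels.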

In chapter 3, similar estimators will be developed for estimating natural direct and indirect effects in a mediation context. Similarly, marginal structural models will be generalized to parameterize mean nested counterfactuals $E(Y(a, M(a')))$. The motivation for these extensions follows from the fact that the adjustment criterion can be generalized to covariate sets that enable identifying natural direct and indirect effects by a generalized adjustment formula for mediation analysis (Shpitser and VanderWeele, 2011).

2.5 Identifiability in the presence of hidden variables

When all relevant variables are observed, all causal queries of the form $P(Y \mid do(A = a))$ can be computed from the observed joint distribution $P(V)$ via the truncated factorization formula (expression (2.4)). However, the assumption that all common causes of any two (or more) variables in the graph are also included in the graph, i.e. that of causal sufficiency, is often unrealistic because it dismisses the possibility of unmeasured confounding. Whenever we relax this assumption, the question of identifiability arises, i.e. whether $P(Y \mid do(A = a))$ can be expressed as a function of the joint distribution of observed variables $P(V)$.


2.5.1 A simple example: the front-door formula

Consider again the smoking example. Suppose the genetic predisposition for both smoking and developing lung cancer is unmeasured and named $U$, as in Figure 2.1C. The graphical model associated with this causal diagram can be considered a semi-Markovian model.⁶ Often, semi-Markovian models are represented by acyclic directed mixed graphs (ADMGs) (Richardson, 2003), where the presence of an unobserved common cause of two nodes is indicated by bi-directed edges (↔). However, for the purpose of our presentation, we will explicitly represent hidden variables $U$ by circled nodes and their direct effects on observed variables $V$ by dashed edges.

Since U is unobserved, the adjustment criterion cannot be satisfied.⁷ Likewise, the truncated factorization formula (expression (2.4)) yields

$$
\begin{aligned}
P(Y \mid do(A = a)) &= \sum_{u,m} P(M = m, Y, U = u \mid do(A = a)) \\
&= \sum_{u,m} P(Y \mid M = m, U = u)\, P(M = m \mid A = a)\, P(U = u),
\end{aligned}
\tag{2.11}
$$

which involves U and thus cannot be evaluated. However, progress can be made upon noting that, when recovering the joint distribution P(V) by summing over U, factors involving observed variables without unobserved parents, such as M, ‘factor out’ of the summation, as follows:

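$$
\begin{aligned}
P(A, M, Y) &= \sum_{u} P(Y \mid M, U = u)\, P(M \mid A)\, P(A \mid U = u)\, P(U = u) \\
&= P(M \mid A) \sum_{u} P(Y \mid M, U = u)\, P(A \mid U = u)\, P(U = u).
\end{aligned}
\tag{2.12}
$$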

The joint distribution P(A, M, Y) can thus be written as the product of P(M|A) and a factor that involves the confounded nodes A and Y.

⁶ A model whose corresponding graph only includes unobserved variables that (i) have no parents (i.e. are root nodes) and (ii) have exactly two observed children is called a semi-Markovian model. Even though identification results and algorithms described below can be extended to more general Markovian models with arbitrary sets of unobserved variables upon obtaining a semi-Markovian projection of these models (Tian and Pearl, 2003), for ease of exposition, throughout this thesis, we will focus on semi-Markovian models.

⁷ Also note that the graph in Figure 2.1C carries no more testable implications since all conditional independencies encoded in the graph involve U.

A key observation is that, despite U being unobserved, the second factor can be expressed in terms of the observed data V = {A, M, Y}, as follows (from expression (2.12)):

$$
\sum_{u} P(Y \mid M, U = u)\, P(A \mid U = u)\, P(U = u) = \frac{P(A, M, Y)}{P(M \mid A)} = P(Y \mid A, M)\, P(A).
\tag{2.13}
$$

Moreover, because no factors in the summation over U in expression (2.11) depend on A, and because ∑_{a′} P(A = a′|U = u) = 1 for every u, we can rewrite expression (2.11) as

$$
\sum_{m} P(M = m \mid A = a) \sum_{a', u} P(Y \mid M = m, U = u)\, P(A = a' \mid U = u)\, P(U = u),
$$

which, by expression (2.13), reduces to

$$
\sum_{m} P(M = m \mid A = a) \sum_{a'} P(Y \mid A = a', M = m)\, P(A = a'),
\tag{2.14}
$$

an expression generally referred to as the front-door formula (Pearl, 1995a).

This example illustrates that, at least in some settings, we may still be able to identify P(Y|do(A = a)) from P(V), despite the presence of unmeasured confounding. In fact, as will be elucidated in section 2.5.3, identification via the front-door formula can be considered to arise via sequential application of the adjustment formula, by which P(M|do(A = a)) is identified by P(M|A = a) via adjustment for the empty set, while P(Y|do(M = m)) is identified by ∑_{a′} P(Y|A = a′, M = m)P(A = a′) via adjustment for A. A crucial assumption here, though, is that M intercepts all directed paths from A to Y, or in other words, that M mediates the entire effect of A on Y. If this exclusion restriction did not hold, we could not have rewritten expression (2.11) as expression (2.14) and would not have obtained identification.
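As a numerical check on the derivation above, the following minimal sketch evaluates expression (2.11), which requires U, and the front-door formula (2.14), which uses observed quantities only, for one discrete model compatible with Figure 2.1C. All probability tables and variable names are hypothetical values chosen for illustration; they do not come from this thesis or from any data.

```python
import numpy as np

# Hypothetical parameters for binary U, A, M, Y (assumptions for illustration only).
p_u = np.array([0.6, 0.4])        # P(U = u)
p_a_u = np.array([[0.8, 0.2],     # P(A = a | U = u): rows u, columns a
                  [0.3, 0.7]])
p_m_a = np.array([[0.9, 0.1],     # P(M = m | A = a): rows a, columns m
                  [0.2, 0.8]])
p_y1_mu = np.array([[0.1, 0.5],   # P(Y = 1 | M = m, U = u): rows m, columns u
                    [0.4, 0.9]])

# Ground truth P(Y = 1 | do(A = a)) from expression (2.11); this needs the unobserved U.
truth = np.einsum("mu,am,u->a", p_y1_mu, p_m_a, p_u)

# Observed-data quantities, obtained by summing U out as in expression (2.12).
p_a = p_u @ p_a_u                                                  # P(A = a)
p_am_y1 = np.einsum("u,ua,am,mu->am", p_u, p_a_u, p_m_a, p_y1_mu)  # P(A = a, M = m, Y = 1)
p_y1_am = p_am_y1 / (p_a[:, None] * p_m_a)                         # P(Y = 1 | A = a, M = m)

# Front-door formula (2.14): sum_m P(m | a) * sum_a' P(Y = 1 | a', m) P(a').
inner = np.einsum("am,a->m", p_y1_am, p_a)
front_door = p_m_a @ inner

# Naive conditional probability P(Y = 1 | A = a), which is confounded by U.
naive = p_am_y1.sum(axis=1) / p_a

print("truth (uses U):     ", truth)
print("front-door formula: ", front_door)
print("naive conditional:  ", naive)
```

In this sketch the front-door result coincides with the interventional distribution computed from the full model, while the naive conditional probability does not. Adding a direct dependence of Y on A to the assumed model (a violation of the exclusion restriction) would be expected to break this agreement.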


[Figure 2.4 not reproduced: causal diagram with observed nodes V1, V2, V3, V4, V5 and hidden nodes U1, U2, U3.]

Figure 2.4: Graph for a semi-Markovian model with c-components {V1, V3, V5} and {V2, V4}.

2.5.2 C-component factorization

The factorization in expression (2.12) moreover illustrates that the set of observed variables V can be partitioned into j disjoint sets or components, according to whether they share common unobserved parents. These disjoint sets have been referred to as confounded components (abbreviated: c-components) (Tian and Pearl, 2002) or districts (Richardson, 2009). More generally, it is said that any two observed variables sharing a common unobserved parent belong to the same c-component S_j. The importance of c-components can be appreciated by the fact that their disjointness implies that the joint distribution of observed v
