
Chapter 3 Methods for estimating causal effects in longitudinal data

3.2 DAG-informed regression methods

As introduced previously (§2.5.2), a DAG is a qualitative (i.e. nonparametric) map of the data-generating process for a set of variables (39). For any given DAG, the principles of graphical model theory provide a way of determining whether a causal effect can be identified and, if so, what set(s) of variables need to be conditioned on to do so.

Where the true structure of a DAG is not known, as in almost all observational contexts, its structure must be assumed based upon subject matter knowledge and theories, and then tested and further refined according to available data (47, 57). In this way, the DAG represents the hypothesised data-generating process, and all inferences are made subject to the DAG being correct.

This DAG may then also be combined with parametric assumptions about the data-generating process in order to estimate causal effects. The primary method for achieving this is through regression modelling. In the following subsections, we outline how DAG-informed regression modelling can be implemented in order to estimate causal effects in observational data, for both time-fixed (§3.2.1) and time-varying exposures (§3.2.2).

Throughout, we use capital letters (e.g. $Y$) to denote random variables and small letters to denote specific values (e.g. $y = 0$ or $y = 1$), by convention (26).

3.2.1 For time-fixed exposures

To illustrate, we consider the DAG in Figure 3.1, which represents the hypothesised data-generating process for a time-fixed exposure $X$, outcome $Y$, confounders $A$, $B$, and $C$, and mediator $D$ in a population of individuals (all continuous random variables).

Figure 3.1 DAG depicting the hypothesised data-generating process for a time-fixed exposure $X$, an outcome $Y$, a set of confounders $A$, $B$, and $C$, and a mediator $D$

Observed and/or measured variables are depicted in rectangular boxes, and latent variables are depicted in ovals.

By the backdoor criterion (§2.5.3.1), there exist two sets of variables which are minimally sufficient for identifying the total causal effect of $X$ on $Y$:

Set 1: $A$ and $B$

Set 2: $C$

Therefore, conditioning on either of these sets of variables will allow us to estimate the desired total causal effect. However, given that $C$ is unmeasured, Set 1 would be chosen as the conditioning set.

In the context of linear regression, conditioning is achieved by including the variable as a covariate in the model. Estimating the total causal effect of $X$ on $Y$ in our example context thus becomes a matter of estimating the parameters of the following model:

$$Y = \beta_0 + \beta_1 X + \beta_2 A + \beta_3 B + \varepsilon$$

Assuming the model has been correctly parameterised, we are able to interpret $\hat{\beta}_1$ as the estimated total causal effect of $X$ on $Y$. In other words, for individuals with the same values of $A$ and $B$ (i.e. conditionally exchangeable groups), every one-unit difference in the exposure corresponds to an expected difference in the outcome of $\hat{\beta}_1$.

The expected counterfactual outcome associated with a particular value $x$ of the exposure for an individual whose values of $A$ and $B$ were equal to $a$ and $b$, respectively, can thus be computed as:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 a + \hat{\beta}_3 b$$
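The adjustment and plug-in steps for the time-fixed case can be sketched in Python. This is a minimal illustration under stated assumptions, not the analysis used in this chapter: Figure 3.1 is not reproduced here, so the simulated structure, all coefficient values, and the helper name `expected_outcome` are hypothetical, chosen only so that $\{A, B\}$ and $\{C\}$ are each sufficient adjustment sets and $D$ is a mediator.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (assumed, not taken from Figure 3.1).
C = rng.normal(size=n)                 # latent confounder, never used in the model
A = 0.8 * C + rng.normal(size=n)
B = -0.5 * C + rng.normal(size=n)
X = 0.7 * C + rng.normal(size=n)
D = 0.6 * X + rng.normal(size=n)       # mediator: deliberately NOT adjusted for
Y = 1.0 + 0.5 * X + 0.4 * D + 0.9 * A + 0.3 * B + rng.normal(size=n)
true_total_effect = 0.5 + 0.4 * 0.6    # direct path plus the path through D

# Fit Y = b0 + b1*X + b2*A + b3*B + e by ordinary least squares.
design = np.column_stack([np.ones(n), X, A, B])
beta_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)

def expected_outcome(x, a, b):
    """Plug-in expected outcome for exposure x at covariate values (a, b)."""
    return beta_hat[0] + beta_hat[1] * x + beta_hat[2] * a + beta_hat[3] * b

print("estimated total effect:", round(float(beta_hat[1]), 3))
print("expected Y at x=1, a=0, b=0:", round(float(expected_outcome(1.0, 0.0, 0.0)), 3))
```

Because the mediator is left out of the model, the coefficient on the exposure targets the total effect rather than the direct effect alone.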

3.2.2 For time-varying exposures

We next consider the DAG in Figure 3.2, which represents the hypothesised data-generating process for two measurements of a time-varying exposure $X$ (i.e. $X_0$ and $X_1$), one subsequent outcome $Y$, and one time-dependent confounder $M_1$ (all continuous random variables) in a population of individuals.

Figure 3.2 DAG depicting the hypothesised data-generating process for two measurements of a time-varying exposure $X$ (i.e. $X_0$ and $X_1$), one outcome $Y$, and one time-dependent confounder $M_1$

The joint causal effect of $X_0$ and $X_1$ on the outcome $Y$ is identifiable by the sequential backdoor criterion (39). However, $M_1$ is both a confounder of the effect of $X_1$ on $Y$ (and so must be conditioned on) and a mediator of the effect of $X_0$ on $Y$ (and so must not be), and simultaneously conditioning and not conditioning on $M_1$ is impossible in a conventional single-equation regression model (39). Thus, one of the three ‘g-methods’ may be used to estimate the average counterfactual outcomes associated with different exposure regimes. Each g-method is summarised in the following subsections; more detailed descriptions are provided by Robins, J.M. and M.A. Hernán (26), Naimi, A.I. et al. (56), Daniel, R.M. et al. (58), Arnold, K.F. and M.S. Gilthorpe (59), Taubman, S.L. et al. (60), Robins, J.M. et al. (61), Vansteelandt, S. and M. Joffe (62), and Picciotto, S. and A.M. Neophytou (63).

3.2.2.1 The (parametric) g-formula

Implementing the parametric g-formula requires that we first use our data to estimate the functions which govern the data-generating process, thereby creating a sequence of functions which combine to generate the values for every endogenous node in the DAG (see footnote 8). For example, if we assume a linear process, we would estimate the parameters for each of the following models:

$$M_1 = \beta_{00} + \beta_{10} X_0 + \varepsilon_{M_1}$$

$$X_1 = \beta_{01} + \beta_{11} X_0 + \beta_{21} M_1 + \varepsilon_{X_1}$$

$$Y = \beta_{02} + \beta_{12} X_0 + \beta_{22} M_1 + \beta_{32} X_1 + \varepsilon_{Y}$$

Estimating the average value of $Y$ that would have been observed if the exposures $X_0$ and $X_1$ had been equal to whatever values we are interested in (e.g. $x_0$ and $x_1$, respectively) therefore requires replacing $X_0$ with $x_0$ and $X_1$ with $x_1$ in our estimated models and sequentially computing the expected value of each variable, as in:

$$\hat{M}_1 = \hat{\beta}_{00} + \hat{\beta}_{10} x_0$$

$$X_1 = x_1$$

$$\hat{Y} = \hat{\beta}_{02} + \hat{\beta}_{12} x_0 + \hat{\beta}_{22} \hat{M}_1 + \hat{\beta}_{32} x_1$$
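Substituting the expression for the predicted confounder into the expression for the predicted outcome makes explicit how the two paths from the first exposure contribute. This is only an algebraic rearrangement of the plug-in equations under the same linearity assumption, written with the same coefficient labels:

```latex
\hat{Y} = \left(\hat{\beta}_{02} + \hat{\beta}_{22}\hat{\beta}_{00}\right)
        + \left(\hat{\beta}_{12} + \hat{\beta}_{22}\hat{\beta}_{10}\right) x_0
        + \hat{\beta}_{32}\, x_1
```

The coefficient on $x_0$ therefore combines the direct path ($\hat{\beta}_{12}$) with the path through $M_1$ ($\hat{\beta}_{22}\hat{\beta}_{10}$), while the coefficient on $x_1$ is unchanged.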

Footnote 8: In low-dimensional settings with discrete data, the conditional probability of each variable may be estimated nonparametrically; in such cases, this method is simply referred to as ‘the g-formula’ (64).

The g-formula thus effectively simulates the joint distribution of the variables that would have been observed under a hypothetical intervention targeting the exposure, based on the joint distribution that was actually observed (6).
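The sequential plug-in computation of the parametric g-formula can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the simulated coefficients and the helper names `ols` and `g_formula_mean` are hypothetical, the data-generating process is assumed to be linear with no unmeasured confounding, and only a static regime (fixed values for both exposures) is considered.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical linear data-generating process (assumed for illustration).
X0 = rng.normal(size=n)
M1 = 0.5 + 0.8 * X0 + rng.normal(size=n)
X1 = 0.2 + 0.3 * X0 + 0.6 * M1 + rng.normal(size=n)
Y = 1.0 + 2.0 * X0 + 1.5 * M1 + 3.0 * X1 + rng.normal(size=n)

def ols(y, *covariates):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Estimate the models for the nodes that the intervention does not fix.
# The model for X1 is not used here, because X1 is set by the intervention.
m_coef = ols(M1, X0)             # M1 ~ X0
y_coef = ols(Y, X0, M1, X1)      # Y  ~ X0 + M1 + X1

def g_formula_mean(x0, x1):
    """Expected Y had every individual received X0 = x0 and X1 = x1."""
    m1_hat = m_coef[0] + m_coef[1] * x0
    return y_coef[0] + y_coef[1] * x0 + y_coef[2] * m1_hat + y_coef[3] * x1

# True mean under the regime (x0, x1) = (1, 0), known only because we simulated.
true_mean = 1.0 + 2.0 * 1.0 + 1.5 * (0.5 + 0.8 * 1.0) + 3.0 * 0.0
print("g-formula estimate:", round(float(g_formula_mean(1.0, 0.0)), 3))
print("true value:", true_mean)
```

Comparing `g_formula_mean` at two regimes, for example (1, 1) against (0, 0), gives the estimated joint effect of the two exposures.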

3.2.2.2 Inverse probability of treatment weighting (IPTW) of marginal structural models

The second g-method uses weighting instead of conditioning to estimate the average counterfactual outcome associated with different exposure regimes.

Inverse probability of treatment weighting (IPTW) refers to the process of creating a ‘pseudo-population’ by estimating the expected value of each measurement of the exposure conditional on previous exposure and confounding history in the whole sample, calculating the expected value of each measurement of the exposure for each individual, and then weighting each individual by the inverse of their expected value of each measurement of the exposure. For example, based on the DAG in Figure 3.2 and assuming linearity, we would first estimate the parameters of the following models:

$$X_0 = \alpha_{00} + \varepsilon_{X_0}$$

$$X_1 = \alpha_{01} + \alpha_{11} X_0 + \alpha_{21} M_1 + \varepsilon_{X_1}$$

For any individual, we can then calculate the expected value of $X_0$, and the expected value of $X_1$ when $X_0 = x_0$ and $M_1 = m_1$, as:

$$\hat{X}_0 = \hat{\alpha}_{00}$$

$$\hat{X}_1 = \hat{\alpha}_{01} + \hat{\alpha}_{11} x_0 + \hat{\alpha}_{21} m_1$$

Each individual’s weight ($w$) is then calculated by multiplying the inverse of their expected $X_0$ by the inverse of their expected $X_1$, i.e.:

$$w = \frac{1}{\hat{X}_0} \cdot \frac{1}{\hat{X}_1}$$

In the resulting pseudo-population, the counterfactual mean associated with each exposure regime is equal to that in the true population, but the exposure at each time point depends only on prior exposure history (i.e. there is no time-dependent confounding). The DAG for the pseudo-population is depicted in Figure 3.3, in which there is no arrow between $M_1$ and $X_1$.

Figure 3.3 DAG depicting the pseudo-population created by inverse probability of treatment weighting (IPTW) for the DAG in Figure 3.2

IPTW creates a pseudo-population in which there exists no time-dependent confounding (i.e. there is no arrow between $M_1$ and $X_1$).

Because there exists no time-dependent confounding in the pseudo-population, the joint effect of $X_0$ and $X_1$ on $Y$ can be estimated by estimating the parameters of a single model:

$$Y = \beta_0 + \beta_1 X_0 + \beta_2 X_1 + \beta_3 X_0 X_1 + \varepsilon_Y$$

In the above ‘marginal structural model’, $\beta_1$ represents the average effect of $X_0$, $\beta_2$ represents the average effect of $X_1$, and $\beta_3$ represents the average additional joint effect of $X_0$ and $X_1$.

The average value of $Y$ that would have been observed if the exposures $X_0$ and $X_1$ had been equal to whatever values we are interested in (e.g. $x_0$ and $x_1$, respectively) is therefore:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_0 + \hat{\beta}_2 x_1 + \hat{\beta}_3 x_0 x_1$$
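The weighting and marginal structural model steps can be sketched in Python. To keep the sketch short and verifiable, it departs from the running example in one labelled way: the exposures and the confounder are assumed to be binary rather than continuous, so that the expected value of each exposure is a probability and each individual is weighted by the inverse probability of the exposure value they actually received. The simulated coefficients and the helper name `weighted_ols` are hypothetical, and the exposure models are estimated as simple stratum-specific proportions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical BINARY data-generating process (an assumption of this sketch).
X0 = rng.binomial(1, 0.5, size=n)
M1 = rng.binomial(1, 0.3 + 0.4 * X0)
X1 = rng.binomial(1, 0.2 + 0.3 * X0 + 0.4 * M1)
Y = 1.0 + 2.0 * X0 + 1.5 * M1 + 3.0 * X1 + rng.normal(size=n)

# Exposure models, estimated as proportions within each history stratum.
p_x0 = X0.mean()                       # P(X0 = 1)
p_x1 = np.empty(n)                     # P(X1 = 1 | X0, M1)
for a in (0, 1):
    for m in (0, 1):
        stratum = (X0 == a) & (M1 == m)
        p_x1[stratum] = X1[stratum].mean()

# Probability of the exposure value each individual actually received.
pr_x0 = np.where(X0 == 1, p_x0, 1.0 - p_x0)
pr_x1 = np.where(X1 == 1, p_x1, 1.0 - p_x1)
w = 1.0 / (pr_x0 * pr_x1)

def weighted_ols(y, design, weights):
    """Weighted least squares via row scaling by the square root of the weights."""
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef

# Marginal structural model: Y = b0 + b1*X0 + b2*X1 + b3*X0*X1.
design = np.column_stack([np.ones(n), X0, X1, X0 * X1])
msm = weighted_ols(Y, design, w)
naive = weighted_ols(Y, design, np.ones(n))   # unweighted, for comparison

print("weighted MSM coefficients:", np.round(msm, 2))
print("unweighted coefficients:  ", np.round(naive, 2))
```

In this simulation the true marginal structural model has coefficients 1.45, 2.6, 3.0 and 0, so the unweighted fit illustrates the bias that time-dependent confounding by $M_1$ would otherwise introduce.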

3.2.2.3 G-estimation of structural nested models (SNMs)

The condition of sequential conditional exchangeability underlies causal inference for time-varying exposures, as outlined in Chapter 2. Moreover, the conceptualisation of longitudinal data as arising from a ‘nested’ sequence of trials is the foundation for g-estimation, which exploits conditional exchangeability to estimate average counterfactual outcomes (63). Heuristically, the idea is to estimate the average effect of the exposure for the innermost (most recent) trial first (i.e. the average effect of $X_1$ on $Y$), while adjusting for past exposure and covariate history (i.e. $X_0$ and $M_1$, respectively). The estimated effect of $X_1$ is then removed from $Y$, and the process is repeated for $X_0$. Ultimately, the average counterfactual outcome associated with the exposure regime $x_0, x_1$ is computed.

For the DAG in Figure 3.2 and assuming linearity, for example, we could construct the following two structural nested models (SNMs):

$$Y = \beta_{01} + \beta_{11} X_1 + \beta_{21} X_1 M_1 + \beta_{31} X_1 X_0 + \beta_{41} X_1 M_1 X_0 + \varepsilon_{Y1}$$

$$Y = \beta_{02} + \beta_{12} X_0 + \varepsilon_{Y2}$$

G-estimation refers to the method by which the parameters of the above models are estimated. The first model expresses the average effect of $X_1$ on $Y$, which may be modified by $X_0$ and $M_1$. The second model expresses the average effect of $X_0$ on $Y$, when the exposure at time 1 is set to some counterfactual value of interest (i.e. $X_1 = x_1$).

Sequential conditional exchangeability implies that the counterfactual outcome associated with a particular exposure regime $x_0, x_1$ is independent of the exposure regime that was actually observed. G-estimation directly leverages this assumption by determining the parameters for which the counterfactual outcomes are statistically independent of the observed exposures. In practice, this often involves a grid search or optimisation algorithm (63).
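A grid-search version of this idea can be sketched in Python. This is a simplified illustration under stated assumptions rather than a general implementation: the effect of each exposure is assumed to be constant (none of the interaction terms of the first SNM are included), independence is operationalised as a zero coefficient on the candidate counterfactual outcome in a linear model for the exposure, and the simulated coefficients and the helper names `last_coef`, `grid_search`, and `counterfactual_mean` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Hypothetical linear data-generating process with no effect modification.
X0 = rng.normal(size=n)
M1 = 0.5 + 0.8 * X0 + rng.normal(size=n)
X1 = 0.2 + 0.3 * X0 + 0.6 * M1 + rng.normal(size=n)
Y = 1.0 + 2.0 * X0 + 1.5 * M1 + 3.0 * X1 + rng.normal(size=n)

def last_coef(y, *covariates):
    """Coefficient on the LAST covariate in a least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(y)), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[-1]

def grid_search(exposure, outcome, history, grid):
    """Choose psi so that (outcome - psi * exposure) does not predict the
    exposure once the supplied history is adjusted for."""
    scores = [abs(last_coef(exposure, *history, outcome - psi * exposure))
              for psi in grid]
    return float(grid[int(np.argmin(scores))])

grid = np.arange(0.0, 6.0001, 0.05)

# Innermost trial: effect of X1, adjusting for past exposure and covariates.
psi1 = grid_search(X1, Y, [X0, M1], grid)
y_without_x1 = Y - psi1 * X1            # remove the estimated effect of X1

# Outer trial: effect of X0 (there is no earlier history to adjust for).
psi0 = grid_search(X0, y_without_x1, [], grid)
baseline = float(np.mean(y_without_x1 - psi0 * X0))   # mean under regime (0, 0)

def counterfactual_mean(x0, x1):
    """Estimated average outcome under the regime (x0, x1)."""
    return baseline + psi0 * x0 + psi1 * x1

print("psi1:", psi1, "psi0:", psi0)
print("mean under (1, 1):", round(counterfactual_mean(1.0, 1.0), 3))
```

In this simulation the true values are 3.0 for the effect of $X_1$ and 3.2 for the effect of $X_0$ with $X_1$ held fixed (2.0 directly plus 1.5 × 0.8 through $M_1$); a finer grid or a root-finding routine could replace the coarse search used here.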