
Chapter 3 Methods for estimating causal effects in longitudinal data

3.2 DAG-informed regression methods

As introduced previously (§2.5.2), a DAG is a qualitative (i.e. nonparametric) map of the data-generating process for a set of variables (39). For any given DAG, the principles of graphical model theory provide a way of determining whether a causal effect can be identified and, if so, what set(s) of variables need to be conditioned on to do so.

Where the true structure of a DAG is not known, as in almost all observational contexts, its structure must be assumed based upon subject matter knowledge and theories, and then tested and further refined according to available data (47, 57). In this way, the DAG represents the hypothesised data-generating process, and all inferences are made subject to the DAG being correct.

This DAG may then also be combined with parametric assumptions about the data-generating process in order to estimate causal effects. The primary means of doing so is regression modelling. In the following subsections, we outline how DAG-informed regression modelling can be implemented in order to estimate causal effects in observational data, for both time-fixed (§3.2.1) and time-varying exposures (§3.2.2).

Throughout, we use capital letters (e.g. 𝑌) to denote random variables and small letters to denote specific values (e.g. 𝑦 = 0 or 𝑦 = 1), by convention (26).

3.2.1 For time-fixed exposures

To illustrate, we consider the DAG in Figure 3.1, which represents the hypothesised data-generating process for a time-fixed exposure 𝑋, outcome 𝑌, confounders 𝐴, 𝐵, and 𝐶, and mediator 𝐷 in a population of individuals (all continuous random variables).

Figure 3.1 DAG depicting the hypothesised data-generating process for a time-fixed exposure 𝑿, an outcome 𝒀, a set of confounders 𝑨, 𝑩, and 𝑪, and a mediator 𝑫

Observed and/or measured variables are depicted in rectangular boxes, and latent variables are depicted in ovals.

By the backdoor criterion (§2.5.3.1), there exist two sets of variables which are minimally sufficient for identifying the total causal effect of 𝑋 on 𝑌:

Set 1: 𝐴 and 𝐵

Set 2: 𝐶

Therefore, conditioning on either of these sets of variables allows us to estimate the desired total causal effect. However, because 𝐶 is unmeasured, Set 1 is the only usable conditioning set.

In the context of linear regression, conditioning is achieved by including the relevant variables as covariates in the model. Estimating the total causal effect of 𝑋 on 𝑌 in our example thus becomes a matter of estimating the parameters of the following model:

$$Y = \beta_0 + \beta_1 X + \beta_2 A + \beta_3 B + \varepsilon$$

Assuming the model has been correctly specified, we can interpret 𝛽̂₁ as the estimated total causal effect of 𝑋 on 𝑌. In other words, among individuals with the same values of 𝐴 and 𝐵 (i.e. conditionally exchangeable groups), each one-unit difference in the exposure corresponds to an expected difference in the outcome of 𝛽̂₁. Because the mediator 𝐷 is deliberately left out of the model, this estimate includes the part of the effect of 𝑋 that is transmitted through 𝐷.

The expected counterfactual outcome associated with a particular value 𝑥 of the exposure for an individual whose values of 𝐴 and 𝐵 were equal to 𝑎 and 𝑏, respectively, can thus be computed as:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 a + \hat{\beta}_3 b$$
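As a rough illustration, the following Python (NumPy) sketch simulates data from a hypothetical linear structure in which {𝐴, 𝐵} and {𝐶} are both minimally sufficient adjustment sets, fits the model above by ordinary least squares, and computes 𝑌̂ for one exposure value. The structural equations, coefficient values, seed and the helper `counterfactual_mean` are illustrative assumptions, not taken from Figure 3.1.

```python
import numpy as np

# Hypothetical linear data-generating process: one structure consistent with
# {A, B} and {C} both being minimally sufficient adjustment sets. Figure 3.1
# may differ; all coefficients here are illustrative assumptions.
rng = np.random.default_rng(2024)
n = 100_000
A = rng.normal(size=n)
B = rng.normal(size=n)
C = 0.6 * A + 0.6 * B + rng.normal(size=n)   # latent: never used in a model
X = 0.8 * C + rng.normal(size=n)
D = 0.5 * X + rng.normal(size=n)             # mediator
Y = 2.0 + 0.3 * X + 0.6 * D + 0.7 * A + 0.7 * B + rng.normal(size=n)

true_total_effect = 0.3 + 0.5 * 0.6          # direct path plus path via D

# Condition on Set 1 by including A and B as covariates: Y ~ 1 + X + A + B
design = np.column_stack([np.ones(n), X, A, B])
(b0, b1, b2, b3), *_ = np.linalg.lstsq(design, Y, rcond=None)

# For contrast: also adjusting for the mediator D blocks X -> D -> Y,
# so the X coefficient then reflects only the direct path (about 0.3).
design_with_d = np.column_stack([design, D])
direct_only = np.linalg.lstsq(design_with_d, Y, rcond=None)[0][1]

def counterfactual_mean(x, a, b):
    """Estimated expected outcome under exposure x for A = a, B = b."""
    return b0 + b1 * x + b2 * a + b3 * b

print(f"total effect estimate: {b1:.3f} (simulated truth {true_total_effect:.1f})")
print(f"adjusting for D as well: {direct_only:.3f}")
print(f"Y-hat at x = 1, a = 0, b = 0: {counterfactual_mean(1.0, 0.0, 0.0):.3f}")
```

Under the simulated equations the total effect is 0.3 + 0.5 × 0.6 = 0.6, which the {𝐴, 𝐵}-adjusted estimate should approximate; additionally adjusting for 𝐷 recovers only the direct path.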

3.2.2 For time-varying exposures

We next consider the DAG in Figure 3.2, which represents the hypothesised data-generating process for two measurements of a time-varying exposure 𝑋 (i.e. 𝑋₀ and 𝑋₁), one subsequent outcome 𝑌, and one time-dependent confounder 𝑀₁ (all continuous random variables) in a population of individuals.

Figure 3.2 DAG depicting the hypothesised data-generating process for two measurements of a time-varying exposure 𝑿 (i.e. 𝑿₀ and 𝑿₁), one outcome 𝒀, and one time-dependent confounder 𝑴₁

The joint causal effect of 𝑋₀ and 𝑋₁ on the outcome 𝑌 is identifiable by the sequential backdoor criterion (39). However, 𝑀₁ confounds the effect of 𝑋₁ on 𝑌, and so must be conditioned on, while also mediating part of the effect of 𝑋₀ on 𝑌 (via 𝑋₀ → 𝑀₁ → 𝑌), and so must not be conditioned on. Simultaneously conditioning and not conditioning on 𝑀₁ is impossible in a conventional single-equation regression model (39). Thus, one of the three ‘g-methods’ may be used instead to estimate the average counterfactual outcomes associated with different exposure regimes. Each g-method is summarised in the following subsections; more detailed descriptions are provided by Robins, J.M. and M.A. Hernán (26), Naimi, A.I. et al. (56), Daniel, R.M. et al. (58), Arnold, K.F. and M.S. Gilthorpe (59), Taubman, S.L. et al. (60), Robins, J.M. et al. (61), Vansteelandt, S. and M. Joffe (62), and Picciotto, S. and A.M. Neophytou (63).
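The dilemma can be seen in a small simulation. The sketch below assumes hypothetical linear equations consistent with Figure 3.2 (coefficients chosen for illustration only), under which the joint effects are 0.9 for 𝑋₀ (0.5 directly plus 0.5 × 0.8 through 𝑀₁) and 0.7 for 𝑋₁. Omitting 𝑀₁ leaves the effect of 𝑋₁ confounded, while including it removes the part of the effect of 𝑋₀ that passes through 𝑀₁. The helper `ols` is a hypothetical name.

```python
import numpy as np

def ols(columns, y):
    """OLS coefficients for y on an intercept plus `columns`."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical linear equations consistent with Figure 3.2 (illustrative values).
rng = np.random.default_rng(7)
n = 200_000
X0 = rng.normal(size=n)
M1 = 0.5 * X0 + rng.normal(size=n)
X1 = 0.4 * X0 + 0.4 * M1 + rng.normal(size=n)
Y = 1.0 + 0.5 * X0 + 0.8 * M1 + 0.7 * X1 + rng.normal(size=n)
# Implied joint effects: X0 -> 0.5 + 0.5 * 0.8 = 0.9, X1 -> 0.7

without_m1 = ols([X0, X1], Y)       # X1 -> Y left confounded by M1
with_m1 = ols([X0, X1, M1], Y)      # X0 -> M1 -> Y path blocked

print("omit M1:   X0 %.3f, X1 %.3f" % (without_m1[1], without_m1[2]))
print("adjust M1: X0 %.3f, X1 %.3f" % (with_m1[1], with_m1[2]))
```

Under these assumed equations, neither single regression recovers both joint effects: the first overstates the coefficient of 𝑋₁, and the second returns roughly the direct effect of 𝑋₀ (0.5) rather than 0.9.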

3.2.2.1 The (parametric) g-formula

Implementing the parametric g-formula requires that we first use our data to estimate the functions which govern the data-generating process, thereby creating a sequence of functions which combine to generate the values for every endogenous node in the DAG.⁸ For example, if we assume a linear process, we would estimate the parameters of each of the following models:

$$M_1 = \beta_{00} + \beta_{10} X_0 + \varepsilon_{M_1}$$

$$X_1 = \beta_{01} + \beta_{11} X_0 + \beta_{21} M_1 + \varepsilon_{X_1}$$

$$Y = \beta_{02} + \beta_{12} X_0 + \beta_{22} M_1 + \beta_{32} X_1 + \varepsilon_{Y}$$

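Estimating the average value of 𝑌 that would have been observed if the exposures 𝑋₀ and 𝑋₁ had been equal to whatever values we are interested in (e.g. 𝑥₀ and 𝑥₁, respectively) therefore requires replacing 𝑋₀ with 𝑥₀ and 𝑋₁ with 𝑥₁ in our estimated models and sequentially computing the expected value of each variable, as in:

$$\hat{M}_1 = \hat{\beta}_{00} + \hat{\beta}_{10} x_0$$

$$X_1 = x_1$$

$$\hat{Y} = \hat{\beta}_{02} + \hat{\beta}_{12} x_0 + \hat{\beta}_{22} \hat{M}_1 + \hat{\beta}_{32} x_1$$

Substituting the expected value 𝑀̂₁ is sufficient here because the model for 𝑌 is linear in 𝑀₁; with nonlinear models, values of 𝑀₁ are instead simulated from their estimated distribution and the resulting values of 𝑌 are averaged.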
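A minimal Python sketch of the parametric g-formula for the linear case follows. It assumes simulated data from hypothetical equations consistent with Figure 3.2 (the coefficients, seed and helper names such as `ols`, `g_formula_mean` and `true_mean` are illustrative), estimates the three models by ordinary least squares, and then substitutes 𝑥₀ and 𝑥₁ in causal order.

```python
import numpy as np

def ols(columns, y):
    """OLS coefficients for y on an intercept plus `columns`."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical linear equations consistent with Figure 3.2 (illustrative values).
rng = np.random.default_rng(11)
n = 200_000
X0 = rng.normal(size=n)
M1 = 0.5 * X0 + rng.normal(size=n)
X1 = 0.4 * X0 + 0.4 * M1 + rng.normal(size=n)
Y = 1.0 + 0.5 * X0 + 0.8 * M1 + 0.7 * X1 + rng.normal(size=n)

# Step 1: estimate each structural model by OLS. The X1 model mirrors the
# text but is not used below, because X1 is set by the intervention.
b00, b10 = ols([X0], M1)
b01, b11, b21 = ols([X0, M1], X1)
b02, b12, b22, b32 = ols([X0, M1, X1], Y)

# Step 2: set X0 = x0 and X1 = x1, then compute expected values in causal order.
# Plugging in the mean of M1 is valid here because the Y model is linear in M1.
def g_formula_mean(x0, x1):
    m1_hat = b00 + b10 * x0
    return b02 + b12 * x0 + b22 * m1_hat + b32 * x1

def true_mean(x0, x1):
    """E[Y] under (x0, x1) implied by the simulated equations."""
    return 1.0 + (0.5 + 0.8 * 0.5) * x0 + 0.7 * x1

for regime in [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]:
    print(regime, round(float(g_formula_mean(*regime)), 3), round(true_mean(*regime), 3))
```

Under the assumed equations the true average counterfactual outcome is 1 + 0.9𝑥₀ + 0.7𝑥₁, which the g-formula estimates should approximate closely.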

⁸ In low-dimensional settings with discrete data, the conditional probability of each variable may be estimated nonparametrically; in such cases, this method is simply referred to as ‘the g-formula’ (64).

The g-formula thus effectively simulates the joint distribution of the variables that would have been observed under a hypothetical intervention targeting the exposure, based on the joint distribution that was actually observed (6).
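To make the simulation interpretation concrete, the sketch below is a Monte Carlo version of the same procedure: rather than substituting expected values, it draws 𝑀₁ and then 𝑌 from the fitted models (assuming normally distributed residuals) with 𝑋₀ and 𝑋₁ fixed at the regime values, and averages the simulated outcomes. The data-generating equations, sample sizes and the helpers `ols_with_sd` and `simulate_regime` are hypothetical.

```python
import numpy as np

def ols_with_sd(columns, y):
    """OLS coefficients plus residual standard deviation."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return coef, resid.std(ddof=design.shape[1])

# Hypothetical linear equations consistent with Figure 3.2 (illustrative values).
rng = np.random.default_rng(3)
n = 200_000
X0 = rng.normal(size=n)
M1 = 0.5 * X0 + rng.normal(size=n)
X1 = 0.4 * X0 + 0.4 * M1 + rng.normal(size=n)
Y = 1.0 + 0.5 * X0 + 0.8 * M1 + 0.7 * X1 + rng.normal(size=n)

m_coef, m_sd = ols_with_sd([X0], M1)
y_coef, y_sd = ols_with_sd([X0, M1, X1], Y)

def simulate_regime(x0, x1, n_sim=200_000, seed=0):
    """Draw (M1, Y) under the intervention X0 = x0, X1 = x1 (Monte Carlo g-formula)."""
    sim = np.random.default_rng(seed)
    m1 = m_coef[0] + m_coef[1] * x0 + sim.normal(scale=m_sd, size=n_sim)
    y = (y_coef[0] + y_coef[1] * x0 + y_coef[2] * m1 + y_coef[3] * x1
         + sim.normal(scale=y_sd, size=n_sim))
    return m1, y

_, y_11 = simulate_regime(1.0, 1.0)
_, y_00 = simulate_regime(0.0, 0.0)
print("mean Y under (1, 1):", round(float(y_11.mean()), 3))   # simulated truth 2.6
print("mean Y under (0, 0):", round(float(y_00.mean()), 3))   # simulated truth 1.0
print("contrast:", round(float(y_11.mean() - y_00.mean()), 3))
```

Because the models here are linear, this gives essentially the same answer as substituting expected values; the simulation approach is what generalises to nonlinear models and interactions.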

3.2.2.2 Inverse probability of treatment weighting (IPTW) of marginal structural models

The second g-method uses weighting instead of conditioning to estimate the average counterfactual outcome associated with different exposure regimes.

Inverse probability of treatment weighting (IPTW) refers to the process of creating a ‘pseudo-population’ by modelling each measurement of the exposure conditional on previous exposure and confounder history in the whole sample, using these models to calculate the probability (or, for continuous exposures, the probability density) of each individual’s observed exposure at each time point, and then weighting each individual by the inverse of the product of these probabilities. For example, based on the DAG in Figure 3.2 and assuming linearity, we would first estimate the parameters of the following models:

$$X_0 = \alpha_{00} + \varepsilon_{X_0}$$

$$X_1 = \alpha_{01} + \alpha_{11} X_0 + \alpha_{21} M_1 + \varepsilon_{X_1}$$

For any individual, we can then calculate the expected value of 𝑋₀, and the expected value of 𝑋₁ when 𝑋₀ = 𝑥₀ and 𝑀₁ = 𝑚₁, as:

$$\hat{X}_0 = \hat{\alpha}_{00}$$

$$\hat{X}_1 = \hat{\alpha}_{01} + \hat{\alpha}_{11} x_0 + \hat{\alpha}_{21} m_1$$

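Assuming, in addition, that the error terms of the two exposure models are normally distributed, with estimated standard deviations 𝜎̂₀ and 𝜎̂₁, each individual’s weight (𝑤) is then calculated by multiplying the inverse of the probability density of their observed 𝑋₀ by the inverse of the probability density of their observed 𝑋₁, i.e.:

$$w = \frac{1}{\phi(x_0;\ \hat{X}_0, \hat{\sigma}_0)} \cdot \frac{1}{\phi(x_1;\ \hat{X}_1, \hat{\sigma}_1)}$$

where 𝜙(·; 𝜇, 𝜎) denotes the normal probability density function with mean 𝜇 and standard deviation 𝜎. For continuous exposures, stabilised weights, in which each factor is multiplied by the density of that exposure conditional on prior exposure history only, are generally preferred in practice because unstabilised weights can be highly variable.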
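IPTW weights are the inverse of the probability densities of the observed exposures given their history. The following Python sketch computes them under the assumption of normally distributed exposure errors, using hypothetical simulated data consistent with Figure 3.2. It returns both the unstabilised weight, 1 / (𝑓(𝑥₀) 𝑓(𝑥₁ | 𝑥₀, 𝑚₁)), and a stabilised version whose numerator is the density of 𝑋₁ given 𝑋₀ only; the helper names and coefficient values are illustrative assumptions.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Normal probability density function."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def fit_normal(columns, y):
    """Fitted conditional means and residual SD from OLS on an intercept plus `columns`."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    return fitted, (y - fitted).std(ddof=design.shape[1])

# Hypothetical simulated exposures and confounder for Figure 3.2.
rng = np.random.default_rng(5)
n = 200_000
X0 = rng.normal(size=n)
M1 = 0.5 * X0 + rng.normal(size=n)
X1 = 0.4 * X0 + 0.4 * M1 + rng.normal(size=n)

# Denominator models: each exposure given its full history (the alpha models).
x0_hat, x0_sd = fit_normal([], X0)
x1_hat, x1_sd = fit_normal([X0, M1], X1)
dens_x0 = normal_pdf(X0, x0_hat, x0_sd)
dens_x1 = normal_pdf(X1, x1_hat, x1_sd)
w = 1.0 / (dens_x0 * dens_x1)              # unstabilised weight

# Stabilising numerator: X1 given prior exposure only. The X0 factor is the
# same in numerator and denominator here, because X0 has no parents.
x1_hat_num, x1_sd_num = fit_normal([X0], X1)
sw = (dens_x0 * normal_pdf(X1, x1_hat_num, x1_sd_num)) / (dens_x0 * dens_x1)

print("stabilised:   mean %.3f, max %.2f" % (sw.mean(), sw.max()))
print("unstabilised: mean %.1f, max %.1f" % (w.mean(), w.max()))
```

Stabilised weights should average approximately 1, whereas unstabilised weights for continuous exposures are on an arbitrary scale and tend to be far more dispersed.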

In the resulting pseudo-population, the counterfactual mean associated with each exposure regime is equal to that in the true population, but the exposure at each time point depends only on prior exposure history (i.e. there is no time-dependent confounding). The DAG for the pseudo-population is depicted in Figure 3.3, in which there is no arrow between 𝑀1 and 𝑋1.

Figure 3.3 DAG depicting the pseudo-population created by inverse probability of treatment weighting (IPTW) for the DAG in Figure 3.2

IPTW creates a pseudo-population in which there exists no time-dependent confounding (i.e. there is no arrow between 𝑀1 and 𝑋1).
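One way to check this property numerically is to regress 𝑋₁ on its history with and without weights. The sketch below assumes hypothetical simulated data consistent with Figure 3.2 and stabilised normal-density weights (numerator conditional on 𝑋₀ only), so the association between 𝑀₁ and 𝑋₁ should vanish in the weighted data while the dependence on 𝑋₀ remains. All names and coefficients are illustrative.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Normal probability density function."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def wls(columns, y, weights):
    """Weighted least squares on an intercept plus `columns` (rows scaled by sqrt(w))."""
    design = np.column_stack([np.ones(len(y)), *columns])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef, design @ coef

# Hypothetical simulated data for Figure 3.2 (coefficients are illustrative).
rng = np.random.default_rng(19)
n = 200_000
X0 = rng.normal(size=n)
M1 = 0.5 * X0 + rng.normal(size=n)
X1 = 0.4 * X0 + 0.4 * M1 + rng.normal(size=n)

# Stabilised weights for X1 (the X0 factors cancel because X0 has no parents).
ones = np.ones(n)
_, den_mean = wls([X0, M1], X1, ones)
_, num_mean = wls([X0], X1, ones)
den_sd = (X1 - den_mean).std(ddof=3)
num_sd = (X1 - num_mean).std(ddof=2)
sw = normal_pdf(X1, num_mean, num_sd) / normal_pdf(X1, den_mean, den_sd)

# The M1 coefficient corresponds to the M1 -> X1 arrow absent in Figure 3.3.
unweighted, _ = wls([X0, M1], X1, ones)
weighted, _ = wls([X0, M1], X1, sw)
print("M1 coefficient, observed population:", round(float(unweighted[2]), 3))  # about 0.4
print("M1 coefficient, pseudo-population:  ", round(float(weighted[2]), 3))    # about 0
```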

Because there is no time-dependent confounding in the pseudo-population, the joint effect of 𝑋₀ and 𝑋₁ on 𝑌 can be estimated by fitting a single model to the weighted data (i.e. weighting each individual by 𝑤):

$$Y = \beta_0 + \beta_1 X_0 + \beta_2 X_1 + \beta_3 X_0 X_1 + \varepsilon_Y$$

In the above ‘marginal structural model’, 𝛽₁ represents the average effect of 𝑋₀ when 𝑋₁ = 0, 𝛽₂ represents the average effect of 𝑋₁ when 𝑋₀ = 0, and 𝛽₃ represents the average additional joint effect of 𝑋₀ and 𝑋₁ (their interaction).

The average value of 𝑌 that would have been observed if the exposures 𝑋₀ and 𝑋₁ had been equal to whatever values we are interested in (e.g. 𝑥₀ and 𝑥₁, respectively) is therefore estimated as:

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_0 + \hat{\beta}_2 x_1 + \hat{\beta}_3 x_0 x_1$$
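A minimal sketch of fitting this marginal structural model by weighted least squares is given below. It assumes stabilised normal-density weights (numerator: density of 𝑋₁ given 𝑋₀ only) and hypothetical simulated data consistent with Figure 3.2, under which the true values are 𝛽₀ = 1, 𝛽₁ = 0.9, 𝛽₂ = 0.7 and 𝛽₃ = 0. Helper names and coefficients are illustrative.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Normal probability density function."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def wls(columns, y, weights):
    """Weighted least squares on an intercept plus `columns` (rows scaled by sqrt(w))."""
    design = np.column_stack([np.ones(len(y)), *columns])
    root_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef, design @ coef

# Hypothetical simulated data for Figure 3.2 (coefficients are illustrative).
rng = np.random.default_rng(23)
n = 200_000
X0 = rng.normal(size=n)
M1 = 0.5 * X0 + rng.normal(size=n)
X1 = 0.4 * X0 + 0.4 * M1 + rng.normal(size=n)
Y = 1.0 + 0.5 * X0 + 0.8 * M1 + 0.7 * X1 + rng.normal(size=n)

# Stabilised weights for X1 (the X0 factors cancel because X0 has no parents).
ones = np.ones(n)
_, den_mean = wls([X0, M1], X1, ones)
_, num_mean = wls([X0], X1, ones)
den_sd = (X1 - den_mean).std(ddof=3)
num_sd = (X1 - num_mean).std(ddof=2)
sw = normal_pdf(X1, num_mean, num_sd) / normal_pdf(X1, den_mean, den_sd)

# Marginal structural model Y ~ X0 + X1 + X0*X1, fitted without and with weights.
naive_coef, _ = wls([X0, X1, X0 * X1], Y, ones)
msm_coef, _ = wls([X0, X1, X0 * X1], Y, sw)

def msm_mean(x0, x1):
    """Estimated average counterfactual outcome under the regime (x0, x1)."""
    b0, b1, b2, b3 = msm_coef
    return b0 + b1 * x0 + b2 * x1 + b3 * x0 * x1

print("unweighted coefficients:", np.round(naive_coef, 3))
print("MSM coefficients:       ", np.round(msm_coef, 3))  # simulated truth: 1, 0.9, 0.7, 0
print("Y-hat under (1, 1):", round(float(msm_mean(1.0, 1.0)), 3))
```

The standard errors implied by a plain weighted least-squares fit do not account for the weights having been estimated; robust (sandwich) or bootstrap standard errors are commonly used instead.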

3.2.2.3 G-estimation of structural nested models (SNMs)

The condition of sequential conditional exchangeability underlies causal inference for time-varying exposures, as outlined in Chapter 2. Moreover, the conceptualisation of longitudinal data as arising from a ‘nested’ sequence of trials is the foundation for g-estimation, which exploits conditional exchangeability to estimate average counterfactual outcomes (63). Heuristically, the idea is to estimate the average effect of the exposure for the innermost (most recent) trial first (i.e. the average effect of 𝑋₁ on 𝑌), while adjusting for past exposure and covariate history (i.e. 𝑋₀ and 𝑀₁, respectively). The estimated effect of 𝑋₁ is then removed from 𝑌, and the process is repeated for 𝑋₀. Finally, the average counterfactual outcome associated with the exposure regime (𝑥₀, 𝑥₁) is computed.

For the DAG in Figure 3.2 and assuming linearity, for example, we could construct the following two structural nested models (SNMs):

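$$Y = \beta_{01} + \beta_{11} X_1 + \beta_{21} X_1 M_1 + \beta_{31} X_1 X_0 + \beta_{41} X_1 M_1 X_0 + \varepsilon_{Y_1}$$

$$Y = \beta_{02} + \beta_{12} X_0 + \varepsilon_{Y_2}$$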

G-estimation refers to the method by which the parameters of the above models are estimated. The first model expresses the average effect of 𝑋₁ on 𝑌, which may be modified by 𝑋₀ and 𝑀₁. The second model expresses the average effect of 𝑋₀ on 𝑌 when the exposure at time 1 is set to some counterfactual value of interest (i.e. 𝑋₁ = 𝑥₁).

Sequential conditional exchangeability implies that, at each time point, the counterfactual outcome associated with a particular exposure regime (𝑥₀, 𝑥₁) is independent of the exposure actually received, conditional on prior exposure and covariate history. G-estimation directly leverages this assumption by finding the parameter values for which the implied counterfactual outcomes are statistically independent of the observed exposures, given that history. In practice, this often involves a grid search or an optimisation algorithm (63).
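For the linear case without effect modification (i.e. assuming 𝛽₂₁ = 𝛽₃₁ = 𝛽₄₁ = 0 in the first SNM), g-estimation can be sketched in Python as follows. For each candidate effect 𝜓₁ of 𝑋₁ (playing the role of 𝛽₁₁), the outcome with that effect removed, 𝐻(𝜓₁) = 𝑌 − 𝜓₁𝑋₁, is computed, and 𝜓₁ is chosen so that 𝐻(𝜓₁) is uncorrelated with 𝑋₁ given 𝑋₀ and 𝑀₁; the step is then repeated for 𝑋₀ (giving 𝜓₀, in the role of 𝛽₁₂). Zero correlation is used here as a tractable stand-in for full independence, and the data, coefficients, grid and names (`psi1`, `psi0`, `residualise`) are hypothetical.

```python
import numpy as np

def residualise(columns, target):
    """Residual of `target` after OLS on an intercept plus `columns`."""
    design = np.column_stack([np.ones(len(target)), *columns])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

# Hypothetical simulated data for Figure 3.2 (coefficients are illustrative).
rng = np.random.default_rng(31)
n = 200_000
X0 = rng.normal(size=n)
M1 = 0.5 * X0 + rng.normal(size=n)
X1 = 0.4 * X0 + 0.4 * M1 + rng.normal(size=n)
Y = 1.0 + 0.5 * X0 + 0.8 * M1 + 0.7 * X1 + rng.normal(size=n)

# Innermost trial first. H(psi1) = Y - psi1 * X1 is the implied outcome with
# X1 set to 0; we seek psi1 at which H(psi1) is uncorrelated with the part of
# X1 not explained by its history (X0, M1), i.e. with the residual r1.
r1 = residualise([X0, M1], X1)
a, b = np.mean(r1 * Y), np.mean(r1 * X1)

def estimating_function(psi1):
    return a - psi1 * b            # equals mean(r1 * (Y - psi1 * X1))

grid = np.linspace(-2.0, 2.0, 4001)                       # step 0.001
psi1_grid = grid[np.argmin(np.abs(estimating_function(grid)))]
psi1 = a / b                                              # exact root (linear case)

# Remove the estimated effect of X1, then repeat for X0 (no prior history).
H1 = Y - psi1 * X1
r0 = residualise([], X0)
psi0 = np.sum(r0 * H1) / np.sum(r0 * X0)

# Fully "blipped-down" outcome: estimated counterfactual Y under (0, 0).
H0 = H1 - psi0 * X0

def snm_mean(x0, x1):
    """Average counterfactual outcome under the regime (x0, x1)."""
    return H0.mean() + psi0 * x0 + psi1 * x1

print(f"psi1: grid {psi1_grid:.3f}, closed form {psi1:.3f} (simulated truth 0.7)")
print(f"psi0: {psi0:.3f} (simulated truth 0.9)")
print(f"Y-hat under (1, 1): {snm_mean(1.0, 1.0):.3f}")
```

Because the estimating function is linear in 𝜓₁ here, the grid search and the closed-form root agree to within the grid spacing; with effect modification or nonlinear models, a numerical search over several parameters would typically be needed.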

In document Statistical and simulation-based modelling approaches for causal inference in longitudinal data: Integrating counterfactual thinking into established methods for longitudinal data analysis (Page 44-49)