Chapter 3 Methods for estimating causal effects in longitudinal data
3.2 DAG-informed regression methods
As introduced previously (§2.5.2), a DAG is a qualitative (i.e. nonparametric) map of the data-generating process for a set of variables (39). For any given DAG, the principles of graphical model theory provide a way of determining whether a causal effect can be identified and, if so, which set(s) of variables need to be conditioned on to do so.
Where the true structure of a DAG is not known, as in almost all observational contexts, its structure must be assumed based upon subject matter knowledge and theories, and then tested and further refined according to available data (47, 57). In this way, the DAG represents the hypothesised data-generating process, and all inferences are made subject to the DAG being correct.
This DAG may then also be combined with parametric assumptions about the data-generating process in order to estimate causal effects. The primary method for achieving this is regression modelling. In the following subsections, we outline how DAG-informed regression modelling can be implemented in order to estimate causal effects in observational data, for both time-fixed (§3.2.1) and time-varying exposures (§3.2.2).
Throughout, we follow convention (26) in using capital letters (e.g. $Y$) to denote random variables and small letters to denote specific values (e.g. $y = 0$ or $y = 1$).
3.2.1 For time-fixed exposures
To illustrate, we consider the DAG in Figure 3.1, which represents the hypothesised data-generating process for a time-fixed exposure $X$, outcome $Y$, confounders $A$, $B$, and $C$, and mediator $D$ in a population of individuals (all continuous random variables).
Figure 3.1 DAG depicting the hypothesised data-generating process for a time-fixed exposure $X$, an outcome $Y$, a set of confounders $A$, $B$, and $C$, and a mediator $D$
Observed and/or measured variables are depicted in rectangular boxes, and latent variables are depicted in ovals.
By the backdoor criterion (§2.5.3.1), there exist two sets of variables which are minimally sufficient for identifying the total causal effect of $X$ on $Y$:
Set 1: $A$ and $B$
Set 2: $C$
Therefore, conditioning on either of these sets of variables will allow us to estimate the desired total causal effect. However, given that $C$ is unmeasured, Set 1 would be chosen as the conditioning set.
In the context of linear regression, conditioning is achieved by including the variable as a covariate in the model. Estimating the total causal effect of $X$ on $Y$ in our example context thus becomes a matter of estimating the parameters of the following model:
$$Y = \beta_0 + \beta_1 X + \beta_2 A + \beta_3 B + \varepsilon$$
Assuming the model has been correctly parameterised, we are able to interpret $\hat{\beta}_1$ as the estimated total causal effect of $X$ on $Y$. In other words, for individuals with the same values of $A$ and $B$ (i.e. conditionally exchangeable groups), every one-unit difference in the exposure corresponds to an expected difference in the outcome of $\hat{\beta}_1$.
The expected counterfactual outcome associated with a particular value $x$ of the exposure for an individual whose values of $A$ and $B$ were equal to $a$ and $b$, respectively, can thus be computed as:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x + \hat{\beta}_2 a + \hat{\beta}_3 b$$
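The adjustment described above can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for illustration: the edge structure (a latent $C$ affecting $X$ only through $A$ and $B$, and affecting $Y$ directly) is just one structure consistent with the two adjustment sets named above and may differ from Figure 3.1; the coefficient values are arbitrary; and `ols` and `expected_outcome` are hypothetical helper names. The mediator $D$ is deliberately left out of the model because the target is the total effect.

```python
import random

random.seed(0)
N = 20_000

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Simulate from the assumed (hypothetical) structure.
rows, y = [], []
for _ in range(N):
    c = random.gauss(0, 1)                    # latent: never enters the model
    a = 0.8 * c + random.gauss(0, 1)
    b = 0.6 * c + random.gauss(0, 1)
    x = 0.5 * a + 0.5 * b + random.gauss(0, 1)
    d = 0.8 * x + random.gauss(0, 1)          # mediator: not adjusted for
    y.append(1.0 + 0.3 * x + 0.5 * d + 0.9 * c + random.gauss(0, 1))
    rows.append([1.0, x, a, b])               # intercept, X, A, B (Set 1)

beta = ols(rows, y)                           # [b0, b1, b2, b3]

def expected_outcome(x, a, b):
    """Expected counterfactual outcome at exposure x for covariates a, b."""
    return beta[0] + beta[1] * x + beta[2] * a + beta[3] * b

# Simulated total effect: direct 0.3 plus 0.5 * 0.8 through D = 0.7.
print(f"estimated total effect of X on Y: {beta[1]:.3f} (simulated truth: 0.7)")
```

In this sketch the coefficients on $A$ and $B$ absorb the influence of the latent $C$ and should not themselves be read causally; only the coefficient on $X$ is the quantity of interest.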
3.2.2 For time-varying exposures
We next consider the DAG in Figure 3.2, which represents the hypothesised data-generating process for two measurements of a time-varying exposure $X$ (i.e. $X_0$ and $X_1$), one subsequent outcome $Y$, and one time-dependent confounder $M_1$ (all continuous random variables) in a population of individuals.
Figure 3.2 DAG depicting the hypothesised data-generating process for two measurements of a time-varying exposure $X$ (i.e. $X_0$ and $X_1$), one outcome $Y$, and one time-dependent confounder $M_1$
The joint causal effect of $X_0$ and $X_1$ on the outcome $Y$ is identifiable by the sequential backdoor criterion (39). However, estimating it would require simultaneously conditioning and not conditioning on $M_1$, which is impossible in a conventional single-equation regression model (39): $M_1$ confounds the effect of $X_1$ on $Y$ and so must be conditioned on, yet it also lies on a causal path from $X_0$ to $Y$ and so must not be. Thus, one of the three ‘g-methods’ may be used to estimate the average counterfactual outcomes associated with different exposure regimes. Each g-method is summarised in the following subsections; more detailed descriptions are provided by Robins and Hernán (26), Naimi et al. (56), Daniel et al. (58), Arnold and Gilthorpe (59), Taubman et al. (60), Robins et al. (61), Vansteelandt and Joffe (62), and Picciotto and Neophytou (63).
3.2.2.1 The (parametric) g-formula
Implementing the parametric g-formula requires that we first use our data to estimate the functions which govern the data-generating process, thereby creating a sequence of functions which combine to generate the values for every endogenous node in the DAG (see footnote 8). For example, if we assume a linear process, we would estimate the parameters for each of the following models:
$$M_1 = \beta_{00} + \beta_{10} X_0 + \varepsilon_{M_1}$$
$$X_1 = \beta_{01} + \beta_{11} X_0 + \beta_{21} M_1 + \varepsilon_{X_1}$$
$$Y = \beta_{02} + \beta_{12} X_0 + \beta_{22} M_1 + \beta_{32} X_1 + \varepsilon_{Y}$$
Estimating the average value of $Y$ that would have been observed if the exposures $X_0$ and $X_1$ had been equal to whatever values we are interested in (e.g. $x_0$ and $x_1$, respectively) therefore requires replacing $X_0$ with $x_0$ and $X_1$ with $x_1$ in our estimated models and sequentially computing the expected value of each variable, as in:
$$\hat{M}_1 = \hat{\beta}_{00} + \hat{\beta}_{10} x_0$$
$$X_1 = x_1$$
$$\hat{Y} = \hat{\beta}_{02} + \hat{\beta}_{12} x_0 + \hat{\beta}_{22} \hat{M}_1 + \hat{\beta}_{32} x_1$$
8 In low-dimensional settings with discrete data, the conditional probability of each variable may be estimated nonparametrically; in such cases, this method is simply referred to as ‘the g-formula’ (64).
The g-formula thus effectively simulates the joint distribution of the variables that would have been observed under a hypothetical intervention targeting the exposure, based on the joint distribution that was actually observed (6).
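The sequential substitution used by the parametric g-formula can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the data are simulated from linear models of the kind given above with arbitrary coefficient values, and `ols` and `g_formula` are hypothetical helper names. The exposure model for $X_1$ is fitted only for completeness, since $X_1$ is fixed by the intervention and its model plays no part in the prediction.

```python
import random

random.seed(1)
N = 20_000

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):                        # Gauss-Jordan with pivoting
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Simulated data (assumed coefficients, chosen only for illustration).
x0s, m1s, x1s, ys = [], [], [], []
for _ in range(N):
    x0 = random.gauss(0, 1)
    m1 = 0.5 + 0.6 * x0 + random.gauss(0, 1)
    x1 = 0.2 + 0.3 * x0 + 0.5 * m1 + random.gauss(0, 1)
    y = 1.0 + 0.4 * x0 + 0.5 * m1 + 0.3 * x1 + random.gauss(0, 1)
    x0s.append(x0); m1s.append(m1); x1s.append(x1); ys.append(y)

# Step 1: estimate one model per endogenous node.
b_m = ols([[1.0, x0] for x0 in x0s], m1s)
b_x = ols([[1.0, x0, m1] for x0, m1 in zip(x0s, m1s)], x1s)  # not used below
b_y = ols([[1.0, x0, m1, x1] for x0, m1, x1 in zip(x0s, m1s, x1s)], ys)

# Step 2: set the exposures and propagate expected values through the models.
def g_formula(x0, x1):
    m1_hat = b_m[0] + b_m[1] * x0             # expected M1 under X0 = x0
    return b_y[0] + b_y[1] * x0 + b_y[2] * m1_hat + b_y[3] * x1

# Simulated truth at (1, 1): 1 + 0.4 + 0.5 * (0.5 + 0.6) + 0.3 = 2.25
print(f"estimated mean outcome under (x0=1, x1=1): {g_formula(1, 1):.3f}")
```

Because every model here is linear, plugging in the expected value of $M_1$ is enough; with nonlinear models the same steps would usually be carried out by simulating $M_1$ rather than substituting its mean.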
3.2.2.2 Inverse probability of treatment weighting (IPTW) of marginal structural models
The second g-method uses weighting instead of conditioning to estimate the average counterfactual outcome associated with different exposure regimes.
Inverse probability of treatment weighting (IPTW) refers to the process of creating a ‘pseudo-population’ by estimating the expected value of each measurement of the exposure conditional on previous exposure and confounding history in the whole sample, calculating the expected value of each measurement of the exposure for each individual, and then weighting each individual by the inverse of their expected value of each measurement of the exposure. For example, based on the DAG in Figure 3.2 and assuming linearity, we would first estimate the parameters of the following models:
$$X_0 = \alpha_{00} + \varepsilon_{X_0}$$
$$X_1 = \alpha_{01} + \alpha_{11} X_0 + \alpha_{21} M_1 + \varepsilon_{X_1}$$
For any individual, we can then calculate the expected value of $X_0$, and the expected value of $X_1$ when $X_0 = x_0$ and $M_1 = m_1$, as:
$$\hat{X}_0 = \hat{\alpha}_{00}$$
$$\hat{X}_1 = \hat{\alpha}_{01} + \hat{\alpha}_{11} x_0 + \hat{\alpha}_{21} m_1$$
Each individual’s weight ($w$) is then calculated by multiplying the inverse of their expected $X_0$ by the inverse of their expected $X_1$, i.e.:
$$w = \frac{1}{\hat{X}_0} \times \frac{1}{\hat{X}_1}$$
In the resulting pseudo-population, the counterfactual mean associated with each exposure regime is equal to that in the true population, but the exposure at each time point depends only on prior exposure history (i.e. there is no time-dependent confounding). The DAG for the pseudo-population is depicted in Figure 3.3, in which there is no arrow between $M_1$ and $X_1$.
Figure 3.3 DAG depicting the pseudo-population created by inverse probability of treatment weighting (IPTW) for the DAG in Figure 3.2
IPTW creates a pseudo-population in which there exists no time-dependent confounding (i.e. there is no arrow between $M_1$ and $X_1$).
Because there exists no time-dependent confounding in the pseudo-population, the joint effect of $X_0$ and $X_1$ on $Y$ can be estimated by fitting a single model to the weighted sample:
$$Y = \beta_0 + \beta_1 X_0 + \beta_2 X_1 + \beta_3 X_0 X_1 + \varepsilon_Y$$
In the above ‘marginal structural model’, $\beta_1$ represents the average effect of $X_0$, $\beta_2$ represents the average effect of $X_1$, and $\beta_3$ represents the average additional joint effect of $X_0$ and $X_1$.
The average value of $Y$ that would have been observed if the exposures $X_0$ and $X_1$ had been equal to whatever values we are interested in (e.g. $x_0$ and $x_1$, respectively) is therefore:
$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_0 + \hat{\beta}_2 x_1 + \hat{\beta}_3 x_0 x_1$$
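The weighting and marginal structural model steps can be sketched as below. Note the assumptions, which depart from the continuous variables of Figure 3.2: the sketch uses binary exposures and a binary confounder, so that each fitted expected value is a probability, and each weight uses the probability of the exposure level actually received (the expected value for the exposed, one minus it for the unexposed). The simulated coefficients are arbitrary, and `wls`, `prob_received`, and `counterfactual_mean` are hypothetical helper names.

```python
import random

random.seed(2)
N = 50_000

def wls(rows, y, w):
    """Weighted least squares from the normal equations (X'WX) b = X'Wy."""
    k = len(rows[0])
    aug = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
           + [sum(wi * r[i] * yi for r, yi, wi in zip(rows, y, w))]
           for i in range(k)]
    for c in range(k):                        # Gauss-Jordan with pivoting
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Simulated binary data (assumed probabilities, for illustration only).
data = []
for _ in range(N):
    x0 = int(random.random() < 0.5)
    m1 = int(random.random() < 0.3 + 0.4 * x0)
    x1 = int(random.random() < 0.2 + 0.3 * x0 + 0.3 * m1)
    y = 1.0 + 0.4 * x0 + 0.5 * m1 + 0.3 * x1 + random.gauss(0, 1)
    data.append((x0, m1, x1, y))

ones = [1.0] * N
# Exposure models: X0 on an intercept; X1 on X0 and M1.
alpha0 = wls([[1.0] for _ in data], [d[0] for d in data], ones)
alpha1 = wls([[1.0, d[0], d[1]] for d in data], [d[2] for d in data], ones)

def prob_received(expected, observed):
    """Probability of the exposure level actually received (binary case)."""
    return expected if observed == 1 else 1.0 - expected

# Weight = inverse of the product of the two probabilities.
w = []
for x0, m1, x1, _ in data:
    e0 = alpha0[0]
    e1 = alpha1[0] + alpha1[1] * x0 + alpha1[2] * m1
    w.append(1.0 / (prob_received(e0, x0) * prob_received(e1, x1)))

# Marginal structural model fitted to the weighted sample (M1 is omitted).
msm = wls([[1.0, x0, x1, x0 * x1] for x0, _, x1, _ in data],
          [d[3] for d in data], w)

def counterfactual_mean(x0, x1):
    return msm[0] + msm[1] * x0 + msm[2] * x1 + msm[3] * x0 * x1

# Simulated truth: 1.15 + 0.6 * x0 + 0.3 * x1 (no interaction).
print(f"estimated mean outcome under (x0=1, x1=1): {counterfactual_mean(1, 1):.3f}")
```

For continuous exposures, as in the text, the weights would need to be constructed differently; this sketch only illustrates the binary case, where the link between expected values and treatment probabilities is direct.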
3.2.2.3 G-estimation of structural nested models (SNMs)
The condition of sequential conditional exchangeability underlies causal inference for time-varying exposures, as outlined in Chapter 2. Moreover, the conceptualisation of longitudinal data as arising from a ‘nested’ sequence of trials is the foundation for g-estimation, which exploits conditional exchangeability to estimate average counterfactual outcomes (63). Heuristically, the idea is to estimate the average effect of the exposure for the innermost (most recent) trial first (i.e. the average effect of $X_1$ on $Y$), while adjusting for past exposure and covariate history (i.e. $X_0$ and $M_1$, respectively). The estimated effect of $X_1$ is then removed from $Y$, and the process is repeated for $X_0$. Ultimately, the average counterfactual outcome associated with the exposure regime $x_0, x_1$ is computed.
For the DAG in Figure 3.2 and assuming linearity, for example, we could construct the following two structural nested models (SNMs):
$$Y = \beta_{01} + \beta_{11} X_1 + \beta_{21} X_1 M_1 + \beta_{31} X_1 X_0 + \beta_{41} X_1 M_1 X_0 + \varepsilon_{Y1}$$
$$Y = \beta_{02} + \beta_{12} X_0 + \varepsilon_{Y2}$$
G-estimation refers to the method by which the parameters of the above models are estimated. The first model expresses the average effect of $X_1$ on $Y$, which may be modified by $X_0$ and $M_1$. The second model expresses the average effect of $X_0$ on $Y$, when the exposure at time 1 is set to some counterfactual value of interest (i.e. $X_1 = x_1$).
Sequential conditional exchangeability implies that, at each time point and conditional on prior exposure and covariate history, the counterfactual outcome associated with a particular exposure regime $x_0, x_1$ is independent of the exposure that was actually observed. G-estimation directly leverages this assumption by determining the parameter values for which the counterfactual outcomes are statistically independent of the observed exposures. In practice, this often involves a grid search or optimisation algorithm (63).
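The grid-search idea can be sketched for a deliberately simplified case. The assumptions are: each exposure has a single constant effect (`psi1` for $X_1$, `psi0` for $X_0$), so the effect-modification terms of the first SNM above are omitted; the data are simulated with arbitrary linear coefficients; and independence is checked only through zero covariance between the candidate counterfactual outcome and the part of the exposure not explained by its history. All function names are hypothetical. Because the covariance is linear in the parameter here, a closed-form solution also exists; the grid is used only to mirror the search described above.

```python
import random

random.seed(3)
N = 20_000
GRID = [i / 100 for i in range(-200, 201)]    # candidate parameter values

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for c in range(k):                        # Gauss-Jordan with pivoting
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def residuals(rows, target):
    """Part of the exposure not explained by its history."""
    coef = ols(rows, target)
    return [t - sum(c * v for c, v in zip(coef, r))
            for r, t in zip(rows, target)]

def g_estimate(resid, h, x):
    """Pick psi on the grid so that h - psi * x is uncorrelated with resid."""
    s_rh = sum(r * hi for r, hi in zip(resid, h))
    s_rx = sum(r * xi for r, xi in zip(resid, x))
    return min(GRID, key=lambda psi: abs(s_rh - psi * s_rx))

# Simulated data (assumed coefficients, for illustration only).
x0s, m1s, x1s, ys = [], [], [], []
for _ in range(N):
    x0 = random.gauss(0, 1)
    m1 = 0.5 + 0.6 * x0 + random.gauss(0, 1)
    x1 = 0.2 + 0.3 * x0 + 0.5 * m1 + random.gauss(0, 1)
    y = 1.0 + 0.4 * x0 + 0.5 * m1 + 0.3 * x1 + random.gauss(0, 1)
    x0s.append(x0); m1s.append(m1); x1s.append(x1); ys.append(y)

# Innermost trial: effect of X1, adjusting for X0 and M1.
r1 = residuals([[1.0, a, b] for a, b in zip(x0s, m1s)], x1s)
psi1 = g_estimate(r1, ys, x1s)

# Remove the estimated effect of X1 from Y, then repeat for X0.
h1 = [y - psi1 * x1 for y, x1 in zip(ys, x1s)]
r0 = residuals([[1.0] for _ in x0s], x0s)
psi0 = g_estimate(r0, h1, x0s)

# Mean outcome with both exposures removed, then add back a chosen regime.
h0 = [h - psi0 * x0 for h, x0 in zip(h1, x0s)]
baseline = sum(h0) / N

def counterfactual_mean(x0, x1):
    return baseline + psi0 * x0 + psi1 * x1

# Simulated truth: psi1 = 0.3, psi0 = 0.4 + 0.5 * 0.6 = 0.7, baseline = 1.25.
print(f"psi1={psi1:.2f}, psi0={psi0:.2f}, mean under (1, 1)={counterfactual_mean(1, 1):.3f}")
```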