Chapter 3 Methods for estimating causal effects in longitudinal data
6.5 Discussion
For our example scenario, our simulations broadly aligned with our expectations. That is, both the g-formula and microsimulation faithfully replicated the true natural and counterfactual histories of the population when they correctly modelled the data-generating process of the population. Our results also suggested that small mis-specifications in this context do not make substantial differences for either the g-formula or microsimulation, but that more serious mis-specifications were more likely to negatively impact MSMs. It is cause for cautious optimism that the most accurate results were produced by the most plausible hypothesised autocorrelation structures (i.e. AS1 and AS2). However, our simulations were deliberately simplified and thus the magnitude of any biases should not be assumed to be transferable to other contexts.
Our sensitivity analyses produced a larger divergence between the correctly specified and slightly mis-specified autocorrelation structures, and provided evidence for the g-formula being more robust to small mis-specifications in the data-generating process. However, the magnitude of these differences was still relatively modest, suggesting they may also be the result of some other structural factor(s) present in the example context chosen. For instance, both obesity and diabetes were simulated to have a strong serial correlation, reflecting that they are conditions which are difficult to transition out of; the probability of becoming non-obese at any given time point ranged between 0.03 and 0.05 in the original simulation, whereas the probability of becoming non-diabetic was zero. Moreover, diabetes incidence was simulated to be very low in absolute terms, both in the original simulation and subsequent sensitivity analyses. Across other contexts, in which individuals can more easily transition in and out of different states, the differences might become more pronounced.
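To give a rough sense of how persistent obesity is under remission probabilities of this size, suppose (as a simplifying assumption for illustration only, not a feature of the simulations reported here) that an obese individual faces a constant probability p of becoming non-obese at each time point, independent of all other variables. The symbols p and k are introduced here purely for this sketch.

```latex
\[
  \Pr(\text{still obese after } k \text{ time points}) = (1 - p)^{k},
  \qquad
  \mathbb{E}[\text{time points until becoming non-obese}] = \frac{1}{p}.
\]
```

Under this assumption, p = 0.05 gives an expected 20 time points until becoming non-obese and p = 0.03 gives roughly 33, while the probability of remaining obese across 10 consecutive time points is approximately 0.60 and 0.74 respectively. With so few transitions occurring, alternative autocorrelation structures have limited opportunity to produce divergent histories.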
Despite the g-formula being potentially more robust to mis-specifications than microsimulation, it is worth keeping in mind that the utility of MSMs lies in their ability to produce estimates of a future population, which are inherently uncertain; thus, those employing MSMs may be more willing to sacrifice a certain degree of accuracy and/or precision for the sake of utility. Nevertheless, where possible, researchers would benefit from modelling different plausible data-generating processes as sensitivity analyses.
6.5.1 Limitations and future work
Our simulations were deliberately simplified in several respects. First, we considered only three binary variables (i.e. sex, obesity, and diabetes), when in reality there are many others which are likely relevant to the causal processes of interest. This simplification also meant that the conditional probabilities of each variable could be nonparametrically estimated using both methods. Second, the true data-generating process (as depicted in Figure 6.1) had only first-order autocorrelation, i.e. variables at one time point did not affect any future variables except for those at the immediately subsequent time point. For example, obesity status at time 𝑡 was dependent only on variables at time 𝑡 − 1 and not on any variables at time 𝑡 − 2. Third, as simulated, the true probabilities governing transitions in and out of obesity were the same for every time point, which could be interpreted as representing no change in the underlying obesogenic environment. We did not consider a situation involving transition probabilities which changed over time. We suspect that had the true data-generating process been more complex, and had the true transition probabilities been simulated to change over time, mis-specifications in the hypothesised data-generating process when using the g-formula and microsimulation would have been more consequential.
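The second and third simplifications can be made concrete with a minimal sketch of a first-order, time-constant transition process for a single binary variable. This is not the simulation code used in this chapter: the function names, the incidence probability, and the initial prevalence are hypothetical, sex and diabetes are omitted, and only the remission probability (0.04, within the 0.03 to 0.05 range reported above) is taken from the text.

```python
import random


def simulate_obesity(n_people, n_steps, p_incidence, p_remission, p_initial, seed=0):
    """Simulate binary obesity histories under a first-order process.

    Status at time t depends only on status at time t - 1, and the
    transition probabilities are identical at every time point.
    All parameter values are supplied by the caller (hypothetical here).
    """
    rng = random.Random(seed)
    histories = []
    for _ in range(n_people):
        # Draw baseline status from the assumed initial prevalence.
        state = 1 if rng.random() < p_initial else 0
        history = [state]
        for _ in range(n_steps):
            if state == 1:
                # Obese at t - 1: may become non-obese with p_remission.
                state = 0 if rng.random() < p_remission else 1
            else:
                # Non-obese at t - 1: may become obese with p_incidence.
                state = 1 if rng.random() < p_incidence else 0
            history.append(state)
        histories.append(history)
    return histories


def prevalence_by_time(histories):
    """Return the proportion of individuals who are obese at each time point."""
    n_people = len(histories)
    n_times = len(histories[0])
    return [sum(h[t] for h in histories) / n_people for t in range(n_times)]


if __name__ == "__main__":
    # Hypothetical inputs: incidence 0.02 and initial prevalence 0.25 are
    # illustrative assumptions, not values from the chapter.
    sims = simulate_obesity(
        n_people=1000, n_steps=10,
        p_incidence=0.02, p_remission=0.04, p_initial=0.25, seed=1,
    )
    print(prevalence_by_time(sims))
```

Relaxing either simplification would amount to letting the transition step depend on status at time 𝑡 − 2 as well, or replacing the fixed probabilities with values indexed by time point.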
Other limitations include that we did not consider interventions which varied over the course of the simulation, and that one of the hypothesised autocorrelation structures considered (i.e. AS3, in Figure 6.5) was so simple that it is unlikely to be encountered in practical applications. Nevertheless, it was chosen as part of a broad range of possible data-generating processes. We also did not consider the added complexity of parameterising our MSMs with estimates from different datasets, as this was not the primary focus of this particular research.27 Future simulations are warranted to explore these issues, with the current simulation providing a foundation for doing so.
6.6 Summary
Microsimulation provides a promising method for estimating causal effects in a longitudinal setting via the simulation of counterfactual scenarios. This chapter demonstrates the utility of DAGs for understanding how the specification of data-generating processes affects estimation of both natural and counterfactual histories. DAGs are also demonstrated to be an invaluable tool for clearly explicating the assumptions made about the causal structure of an MSM, thereby aiding interpretability and reproducibility. The simulations presented in this chapter provide a framework for evaluating individual-based simulation methods intended for causal inference, and inform how the robustness and reliability of such methods may be improved by accurately capturing data-generating processes.
27 Murray, E.J. et al. (99) have begun to explore this issue, and provide a useful starting point for considering some of the potential issues arising from the combination of parameter estimates which have come from populations which differ in their distribution of unmeasured confounders.