Conclusion - Robust methods in Mendelian randomization

To calculate the Sanderson-Windmeijer conditional F-statistics, we require individual level data on the J genetic variants Gj (j = 1, ..., J) and K risk factors Xk (k = 1, ..., K). In order to assess the strength of the IVs when only summary level data on the risk factors are available (i.e. in a two-sample Mendelian randomization study), Sanderson et al. [82] have proposed an adapted version of Cochran’s Q statistic to test for instrument strength. Although this modified Q statistic performed well in the authors’ simulation study, it requires information on the covariance structure between the genetic associations of the K risk factors.

Benefits of multivariable Mendelian randomization

The main benefit of multivariable Mendelian randomization is that it offers an alternative to ‘univariable Mendelian randomization’ (as considered in Section 2.2), and can be used in the primary or sensitivity analysis. Multivariable Mendelian randomization may be a good alternative to univariable Mendelian randomization if: a) the IV2 or IV3 assumptions for univariable Mendelian randomization are suspected to be violated; and/or b) the risk factor under consideration is known to be correlated with other risk factors, and the causal effects of all of these risk factors on the outcome are to be investigated. In either case, assumptions about the relationships between the genetic variants, risk factors and outcome must be made, and these should be informed by biological evidence.

2.7 Conclusion

Ideally, a genetic variant should only be included in a Mendelian randomization analysis if its biological function is well understood. Although this should reduce the risk of including pleiotropic genetic variants, only considering variants with known biological mechanisms would severely limit the scope of Mendelian randomization. To provide robustness to the results from the main analysis, methods that account for pleiotropic genetic variants must be considered in the sensitivity analysis of a Mendelian randomization study.

Section 2.6 highlighted the range of sensitivity analyses that may be used in Mendelian randomization to account for pleiotropic genetic variants using summary level data. The majority of these methods focus on identifying and removing pleiotropic genetic variants from the analysis. Although the multivariable IVW method was developed to account for measured pleiotropy, there are no methods that can be used as a sensitivity analysis under a multivariable framework. The application of the MR-Egger method to the multivariable setting in Chapter 4 should help to fill this gap in the literature.

Since the IVW method is equivalent to performing a meta-analysis of the causal ratio estimates, many of the sensitivity analyses and tests for heterogeneity among the causal ratio estimates discussed in Sections 2.5 and 2.6 have been adapted from the meta-analysis literature [48, 63, 29]. Since pleiotropic genetic variants may appear as outlying data points in a Mendelian randomization analysis, we suspected that some of the methods in the robust statistics literature [84] that try to reduce the influence of outlying data points on the analysis may be a useful addition to Mendelian randomization (considered in Chapter 3). Some of these robust methods, such as Lasso penalization [65, 67, 85], have already been adapted to Mendelian randomization with individual level data. Since the use of summary level data in Mendelian randomization continues to grow in popularity, the application of methods like Lasso penalization to summary level data should be considered.
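The equivalence between the IVW method and a meta-analysis of the causal ratio estimates can be checked numerically. The following is a minimal sketch in Python, assuming first-order standard errors for the ratio estimates (the standard error of the variant-outcome association divided by the absolute variant-risk factor association) and a fixed-effect meta-analysis. The summary statistics and all function names are hypothetical and chosen purely for illustration; they are not taken from the analyses in this thesis.

```python
import math

# Hypothetical summary statistics for four genetic variants (illustrative only).
bx = [0.10, 0.08, 0.12, 0.05]         # variant-risk factor associations
by = [0.020, 0.018, 0.021, 0.030]     # variant-outcome associations
se_by = [0.005, 0.006, 0.004, 0.007]  # standard errors of the variant-outcome associations


def ratio_estimates(bx, by, se_by):
    """Causal ratio estimates with first-order standard errors (an assumption)."""
    theta = [y / x for x, y in zip(bx, by)]
    se = [s / abs(x) for x, s in zip(bx, se_by)]
    return theta, se


def ivw_meta(theta, se):
    """Fixed-effect inverse-variance weighted meta-analysis of the ratio estimates."""
    weights = [1.0 / s ** 2 for s in se]
    estimate = sum(w * t for w, t in zip(weights, theta)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def ivw_regression(bx, by, se_by):
    """Weighted regression of by on bx through the origin, with weights 1 / se_by^2."""
    numerator = sum(x * y / s ** 2 for x, y, s in zip(bx, by, se_by))
    denominator = sum(x ** 2 / s ** 2 for x, s in zip(bx, se_by))
    return numerator / denominator


theta, se = ratio_estimates(bx, by, se_by)
est_meta, se_meta = ivw_meta(theta, se)
est_reg = ivw_regression(bx, by, se_by)
print(est_meta, est_reg)  # the two estimates agree up to floating point error
```

The agreement follows because each meta-analysis weight multiplied by its ratio estimate reduces to bx * by / se_by^2, which is the corresponding term in the numerator of the weighted regression slope.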

Chapter 3 Downweighting or removing heterogeneous causal estimates: robust methods for Mendelian randomization with multiple genetic variants

3.1 Introduction

If the genetic variants in a Mendelian randomization study are all valid IVs, then the individual causal ratio estimates should be similar. Pleiotropic genetic variants are likely to have heterogeneous causal ratio estimates. In the two applied examples considered in this Chapter, we found that heterogeneity of the causal estimates may arise under two scenarios: 1) when there is over-dispersion in the estimates, that is, more variance between the variant-specific causal estimates than expected by chance (as seen in the effect of body mass index on schizophrenia); and 2) when specific variants have outlying causal estimates, and they alone are responsible for driving the observed heterogeneity (as seen in the effect of low-density lipoprotein on Alzheimer’s disease).
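The distinction between these two scenarios can be illustrated by splitting a heterogeneity statistic into variant-specific contributions. The sketch below is in Python and assumes Cochran's Q computed around the fixed-effect IVW estimate as the heterogeneity measure. The ratio estimates, standard errors and function name are hypothetical, with the fifth variant deliberately constructed as an outlier; none of the values come from the applied examples.

```python
# Hypothetical variant-specific causal ratio estimates and standard errors.
# The fifth variant is constructed to be an outlier (illustrative only).
theta = [0.20, 0.22, 0.19, 0.21, 0.60]
se = [0.05, 0.05, 0.05, 0.05, 0.05]


def cochran_q(theta, se):
    """Cochran's Q around the IVW estimate, with each variant's contribution."""
    weights = [1.0 / s ** 2 for s in se]
    pooled = sum(w * t for w, t in zip(weights, theta)) / sum(weights)
    contributions = [w * (t - pooled) ** 2 for w, t in zip(weights, theta)]
    return sum(contributions), contributions, pooled


q, contributions, pooled = cochran_q(theta, se)
df = len(theta) - 1  # under homogeneity, Q is approximately chi-squared on J - 1 df

# Index of the variant contributing most to Q, and its share of the total.
largest = max(range(len(theta)), key=lambda j: contributions[j])
share = contributions[largest] / q

print(f"Q = {q:.2f} on {df} degrees of freedom")
print(f"variant {largest + 1} accounts for {share:.0%} of Q")
```

When a single variant accounts for most of Q, as here, the heterogeneity resembles the second scenario. Contributions that are spread more evenly across the variants, with Q still well above its degrees of freedom, would be closer to the first scenario of general over-dispersion.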

There are numerous methods in the Mendelian randomization literature that detect or account for pleiotropic variants using summary level data. As highlighted in Section 2.6, these methods can be divided into two broad categories: methods that detect pleiotropic genetic variants and either downweight or remove them; and methods that estimate consistent causal effects in the presence of pleiotropic genetic variants without downweighting their contribution to the causal estimate. In this Chapter, we focus on identifying ‘robust methods’ for summary level data that either downweight or remove genetic variants with heterogeneous causal ratio estimates. We consult the literature on robust statistics and recent developments in Mendelian randomization to identify such methods. The methods identified in this Chapter will be applied to the two examples highlighted above (the effect of body mass index on schizophrenia, and the effect of low-density lipoprotein on Alzheimer’s disease), and we anticipate that these methods will be used in our investigation of the effect of adiposity and body composition on asthma in Chapter 6.

In Section 3.2, we provide a brief overview of the fundamental aims of robust statistics, and discuss why some of the methods in the robust statistics literature may be relevant to Mendelian randomization when there is heterogeneity among the causal ratio estimates. In Section 3.3, we introduce two additional robust methods from the robust statistics literature (robust regression (MM-estimation) and least trimmed squares selection), and outline a selection procedure based on Lasso regression and recent developments in Mendelian randomization [67]. We also outline a robust method that uses the Q-statistic to penalize genetic variants with heterogeneous causal ratio estimates. We apply the methods introduced in Section 3.3 to published data on body mass index and schizophrenia risk, and on low-density lipoprotein and Alzheimer’s disease risk (Section 3.4). In Section 3.5, we perform a simulation study to compare the bias and coverage properties of the robust methods when some of the genetic variants are invalid IVs. Finally, we discuss the results of the Chapter and their implications for applied Mendelian randomization research (Section 3.6).

All of the computational work for the applied examples (Section 3.4) and the simulation study (Section 3.5) was written and performed by Jessica Rees in RStudio version 3.5.3 [86] unless explicitly stated otherwise. Details on the packages and libraries used in RStudio are provided throughout the Chapter.
