Downweighting or removing genetic variants

2.6 Sensitivity analyses

2.6.1 Downweighting or removing genetic variants

In this section, we consider methods that try to identify pleiotropic genetic variants and either downweight their contribution to the causal estimate or exclude them from the analysis. These methods assume that a subgroup of the genetic variants satisfies the IV assumptions. Note that under the downweighting methods, genetic variants may be assigned a weight of zero but are not explicitly removed from the dataset.

Downweighting genetic variants

The simple median estimator is the median of the $J$ ratio estimates $\hat{\theta}_j$ ($j = 1, \ldots, J$) [52]. It will produce consistent causal estimates if at least 50% of the genetic variants are valid IVs, known as the ‘50% rule’ or ‘majority rule’ [70]. The standard error of the simple median estimate is obtained through bootstrapping methods. If there is variability in the precision of the ratio estimates, then the efficiency of the median estimator can be improved by using inverse-variance weights, giving the ‘weighted median estimator’ [52]. The weighted median estimate is the 50th percentile of the inverse-variance weighted empirical distribution of the $\hat{\theta}_j$. The estimate will be consistent if valid IVs contribute 50% or more of the total weight.

If more than 50% of the genetic variants are invalid IVs, then the simple median estimator will be biased. To overcome this limitation, Hartwig et al. [71] introduced a mode-based estimator (MBE) that produces asymptotically consistent causal estimates even when more than 50% of the variants are invalid IVs. The estimates from the MBE will be consistent if the valid IVs make up the largest subset of homogeneous ratio estimates, known as the ‘zero modal pleiotropy assumption’ (ZEMPA). The estimate from the MBE is the mode of the smoothed empirical density function of the $J$ ratio estimates:

$$f(x) = \frac{1}{h\sqrt{2\pi}} \sum_{j=1}^{J} w_{\mathrm{MBE},j} \exp\left[-\frac{1}{2}\left(\frac{x - \hat{\theta}_j}{h}\right)^2\right],$$

where the causal estimate is the value of $x$ that maximizes $f(x)$, $h$ is the smoothing bandwidth, and $w_{\mathrm{MBE},j}$ are the weights [71]. The ratio estimates can either contribute equally to the estimate under the ‘simple MBE’, or standardized inverse-variance weights can be used under the ‘weighted MBE’. The value of $h$ must be specified by the user, with larger values producing more precise estimates. Results from the MBE can be highly sensitive to the value of $h$ [72]. Simulation studies have shown that the MBE may be less efficient than the weighted median estimator [71].
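To make the definition of the MBE concrete, the following is a minimal sketch that evaluates the smoothed density f(x) on a grid and returns the grid point at which it is largest. The function name `mode_based_estimate`, the grid search, the example ratio estimates, and the bandwidth of 0.1 are all assumptions made for illustration; they are not taken from Hartwig et al. [71], and no bandwidth selection rule or standard error calculation is included.

```python
import numpy as np


def mode_based_estimate(theta, h, weights=None, grid_size=2001):
    """Grid-search sketch of the MBE: the x that maximizes the smoothed density f(x).

    theta   : ratio estimates, one per genetic variant
    h       : smoothing bandwidth (must be chosen by the user)
    weights : None for the simple MBE (equal weights), or non-negative
              weights (e.g. inverse-variance) for the weighted MBE
    """
    theta = np.asarray(theta, dtype=float)
    if weights is None:
        weights = np.full(theta.size, 1.0 / theta.size)
    else:
        weights = np.asarray(weights, dtype=float)
        weights = weights / weights.sum()  # standardize so the weights sum to one

    # Candidate values of x spanning the range of the ratio estimates (illustrative choice).
    grid = np.linspace(theta.min(), theta.max(), grid_size)

    # f(x) = 1 / (h * sqrt(2 * pi)) * sum_j w_j * exp(-0.5 * ((x - theta_j) / h)^2)
    z = (grid[:, None] - theta[None, :]) / h
    density = (weights * np.exp(-0.5 * z**2)).sum(axis=1) / (h * np.sqrt(2.0 * np.pi))
    return float(grid[np.argmax(density)])


# Hypothetical ratio estimates: three variants cluster near 0.5, four are scattered.
theta_example = [0.48, 0.50, 0.52, 1.0, 1.5, -0.4, 2.0]
mbe_example = mode_based_estimate(theta_example, h=0.1)
```

In this hypothetical example only three of the seven variants agree with one another, so fewer than 50% are mutually consistent, yet the largest homogeneous cluster still determines the mode. Rerunning the sketch with a much larger `h` merges the separate peaks, which illustrates why the results can be sensitive to the bandwidth.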

Burgess et al. [72] have developed a heterogeneity-penalized model-averaging method that also relies on ZEMPA. The overall causal estimate from this method is the mode of the mixture distribution of the estimates obtained from each possible subset of genetic variants (excluding subsets with 0 or 1 genetic variants). Larger subsets of genetic variants are given a greater weight in the mixture distribution unless their ratio estimates are heterogeneous, in which case the contribution of the subset to the mixture distribution is dramatically reduced. Unlike the method proposed by Hartwig et al. [71], a bandwidth does not need to be specified by the user, and the standard errors are obtained without using bootstrapping methods [72].

Although the median- or mode-based estimators do not explicitly remove any of the genetic variants from the analysis, some of the genetic variants will have no direct contribution to the overall causal estimate. As such, these methods may be less efficient than methods that explicitly exclude genetic variants from the analysis (considered below).
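The simple and weighted median estimators described earlier in this section can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of [52]: the function names are hypothetical, the inverse-variance weights are taken to be the reciprocal of the squared standard error of each ratio estimate, and the 50th percentile of the weighted empirical distribution is located by linear interpolation between midpoint cumulative weights. Bootstrap standard errors are not included.

```python
import numpy as np


def simple_median(theta):
    """Median of the ratio estimates (every variant counts equally)."""
    return float(np.median(np.asarray(theta, dtype=float)))


def weighted_median(theta, se_theta):
    """50th percentile of the inverse-variance weighted empirical distribution.

    theta    : ratio estimates, one per genetic variant
    se_theta : standard errors of the ratio estimates (assumed available)
    """
    theta = np.asarray(theta, dtype=float)
    w = 1.0 / np.asarray(se_theta, dtype=float) ** 2  # inverse-variance weights

    # Sort the estimates and standardize the weights so they sum to one.
    order = np.argsort(theta)
    theta, w = theta[order], w[order] / w.sum()

    # Place each estimate at the midpoint of its share of the cumulative weight,
    # then interpolate linearly to the point where half of the weight is reached.
    position = np.cumsum(w) - 0.5 * w
    return float(np.interp(0.5, position, theta))


# Hypothetical data: three precise estimates and two imprecise, larger estimates.
theta_example = [0.1, 0.2, 0.3, 0.9, 1.0]
se_example = [0.05, 0.05, 0.05, 0.5, 0.5]
simple_example = simple_median(theta_example)
weighted_example = weighted_median(theta_example, se_example)
```

With equal standard errors the interpolation reduces to the ordinary median, so the two functions agree. In the hypothetical data the two imprecise estimates carry almost no weight, so the weighted median moves towards the middle of the three precise estimates, and several variants have no direct influence on either estimate.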

Removing genetic variants

The Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) method has three main functions using summary level data: 1) to test for directional pleiotropy (MR-PRESSO global test); 2) to identify outlying variants that may be pleiotropic (MR-PRESSO outlier test); and 3) to empirically test the difference between the IVW estimates using the full and reduced sets of genetic variants [73]. The MR-PRESSO global test is performed by calculating the global observed residual sum of squares ($RSS_{obs}$):

$$RSS_{obs} = \sum_{j=1}^{J} RSS_{obs,j} = \sum_{j=1}^{J} \left(\hat{\beta}_{Y_j} - \hat{\theta}_{\mathrm{IVW},-j}\,\hat{\beta}_{X_j}\right)^2 \mathrm{se}(\hat{\beta}_{Y_j})^{-2},$$

where $\hat{\theta}_{\mathrm{IVW},-j}$ is the estimate from the IVW model when the $j$th genetic variant has been removed. $RSS_{obs}$ is compared against a simulated expected distribution of the residual sum of squares under the null hypothesis of no pleiotropy. The expected RSS is given by:

$$RSS_{exp} = \sum_{j=1}^{J} RSS_{exp,j} = \sum_{j=1}^{J} \left(\hat{\beta}'_{Y_j} - \hat{\theta}_{\mathrm{IVW},-j}\,\hat{\beta}'_{X_j}\right)^2 \mathrm{se}(\hat{\beta}_{Y_j})^{-2},$$

where the genetic associations $\hat{\beta}'_{X_j}$ and $\hat{\beta}'_{Y_j}$ are simulated from the normal distributions:

$$\hat{\beta}'_{X_j} \sim N\left(\hat{\beta}_{X_j},\ \mathrm{se}(\hat{\beta}_{X_j})^2\right) \quad \text{and} \quad \hat{\beta}'_{Y_j} \sim N\left(\hat{\theta}_{\mathrm{IVW},-j}\,\hat{\beta}_{X_j},\ \mathrm{se}(\hat{\beta}_{Y_j})^2\right).$$

$RSS_{exp}$ is generated $N$ times to obtain a distribution of $N$ expected residual sums of squares. An empirical p-value for the global test of directional pleiotropy is calculated as the proportion of the $N$ expected residual sums of squares that are greater than $RSS_{obs}$. Verbanck et al. [73] recommend that $N \geq 1{,}000$ to ensure there is adequate precision of the p-value. Note that the individual residual sums of squares $RSS_{obs,j}$ can be used to identify pleiotropic genetic variants, allowing for the ‘corrected’ causal estimate to be obtained from the IVW method based on the reduced set of genetic variants.
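The global test can be sketched directly from the formulas for the observed and expected residual sums of squares given in this section. The code below is a hedged illustration, not the MR-PRESSO software of Verbanck et al. [73]: the function names are hypothetical, the IVW estimate is assumed to be the usual weighted regression of the variant-outcome associations on the variant-exposure associations through the origin with weights se(β̂_Yj)^-2, and the outlier and distortion tests are omitted.

```python
import numpy as np


def ivw(bx, by, se_y):
    """IVW estimate (assumed form): weighted regression through the origin."""
    w = se_y ** -2.0
    return np.sum(w * bx * by) / np.sum(w * bx ** 2)


def leave_one_out_ivw(bx, by, se_y):
    """IVW estimate with the j-th variant removed, for each j."""
    keep = ~np.eye(bx.size, dtype=bool)  # row j is True everywhere except at j
    return np.array([ivw(bx[k], by[k], se_y[k]) for k in keep])


def presso_global_test(bx, by, se_x, se_y, n_sim=1000, seed=0):
    """Return (RSS_obs, empirical p-value) following the formulas in the text."""
    bx, by, se_x, se_y = (np.asarray(a, dtype=float) for a in (bx, by, se_x, se_y))
    rng = np.random.default_rng(seed)
    theta_loo = leave_one_out_ivw(bx, by, se_y)

    # Observed residual sum of squares.
    rss_obs = np.sum((by - theta_loo * bx) ** 2 / se_y ** 2)

    # Expected residual sums of squares under the null of no pleiotropy.
    rss_exp = np.empty(n_sim)
    for i in range(n_sim):
        bx_sim = rng.normal(bx, se_x)              # beta'_X ~ N(beta_X, se_X^2)
        by_sim = rng.normal(theta_loo * bx, se_y)  # beta'_Y ~ N(theta_-j * beta_X, se_Y^2)
        rss_exp[i] = np.sum((by_sim - theta_loo * bx_sim) ** 2 / se_y ** 2)

    # Proportion of simulated values that exceed the observed value.
    p_value = float(np.mean(rss_exp > rss_obs))
    return float(rss_obs), p_value


# Hypothetical summary data: 20 variants with a true causal effect of 0.5.
bx_example = np.linspace(0.1, 0.5, 20)
se_x_example = np.full(20, 0.01)
se_y_example = np.full(20, 0.02)
by_pleiotropic = 0.5 * bx_example
by_pleiotropic[:3] += 0.3  # three variants given an additional direct effect
rss_example, p_example = presso_global_test(
    bx_example, by_pleiotropic, se_x_example, se_y_example
)
```

In this hypothetical dataset the three variants with an added direct effect leave residuals that are many standard errors from zero, so the observed residual sum of squares lies far in the tail of the simulated distribution. The per-variant terms of the observed sum could be inspected in the same way to flag candidate outliers.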

The global and individual tests for direct effects (GLIDE) method has also been proposed to detect pleiotropy in summary level data [74]. Like the MR-PRESSO method, GLIDE tests for global pleiotropy among the genetic variants and tries to identify and exclude genetic variants that may be pleiotropic. This method uses Q-Q plots and permutation procedures to identify pleiotropic genetic variants. Rather than considering a continuous outcome, the GLIDE method expresses the causal effect in terms of the relative risk.

Other methods have been proposed that identify and remove pleiotropic variants, including the heterogeneity in dependent instruments (HEIDI) method that was introduced under the summary data-based Mendelian randomization (SMR) method [75], and was developed further under the generalized SMR framework (GSMR).

In document Robust methods in Mendelian randomization (Page 52-55)