2.6 Sensitivity analyses
2.6.2 MR-Egger
The MR-Egger (Mendelian randomization-Egger) method [29] was adapted from Egger regression, a tool used in meta-analysis to detect small-study bias [76]. MR-Egger can be used to obtain consistent estimates of the causal effect in the presence of pleiotropic genetic variants, and to test the validity of the IV assumptions. The MR-Egger method replaces the IV3 condition with the untestable assumption that the genetic associations with the risk factor are independent of the direct effects of the genetic variants on the outcome ($\beta_X \perp\!\!\!\perp \alpha$), known as the InSIDE (instrument strength independent of direct effect) assumption. Like the IVW method, MR-Egger also assumes that the NOME assumption is satisfied.
The MR-Egger method fits a weighted linear regression of the genetic association estimates with the outcome ($\hat\beta_{Yj}$) on the genetic association estimates with the risk factor ($\hat\beta_{Xj}$) [45], using inverse-variance weights $\mathrm{se}(\hat\beta_{Yj})^{-2}$ and leaving the intercept unconstrained:
$$\hat\beta_{Yj} = \theta_{0E} + \theta_E\,\hat\beta_{Xj} + \epsilon_{Ej}, \qquad \epsilon_{Ej} \sim N\left(0,\ \varphi_E^2\,\mathrm{se}(\hat\beta_{Yj})^2\right), \tag{2.9}$$
where $\theta_{0E}$ is the intercept term, $\theta_E$ is the MR-Egger causal effect, $\epsilon_{Ej}$ is the error term, and $\varphi_E$ is the residual standard error under the MR-Egger model. The interpretation of the estimates $\hat\theta_{0E}$ and $\hat\theta_E$ from the MR-Egger method is discussed alongside the InSIDE and NOME assumptions in the subsections below.
$\hat\theta_E$ and the InSIDE assumption
We initially assume that there is no estimation (or measurement) error in the genetic associations with the risk factor, i.e. the NOME assumption is satisfied. If there are no pleiotropic effects ($\alpha = 0$), the MR-Egger estimate $\hat\theta_E$ should be asymptotically equivalent to the IVW estimate $\hat\theta_{IVW}$ (Equation 2.7). The InSIDE assumption must be satisfied for $\hat\theta_E$ to be a consistent estimate of the causal effect $\theta$ in the presence of pleiotropy: if the InSIDE assumption holds, the weighted covariance of $\beta_X$ and $\alpha$, $\mathrm{cov}_w(\alpha, \beta_X)$, will tend to zero as the number of genetic variants $J$ tends to infinity. The estimate of $\theta$ from MR-Egger is:
$$\hat\theta_E = \frac{\mathrm{cov}_w(\hat\beta_Y, \hat\beta_X)}{\mathrm{var}_w(\hat\beta_X)} \xrightarrow{N \to \infty} \frac{\mathrm{cov}_w(\beta_Y, \beta_X)}{\mathrm{var}_w(\beta_X)} = \theta + \frac{\mathrm{cov}_w(\alpha, \beta_X)}{\mathrm{var}_w(\beta_X)}, \tag{2.10}$$
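which is equal to $\theta$ if the InSIDE assumption is satisfied, where $\mathrm{cov}_w$ and $\mathrm{var}_w$ denote the weighted covariance and weighted variance using the inverse-variance weights $\mathrm{se}(\hat\beta_{Yj})^{-2}$:
$$\mathrm{cov}_w(\alpha, \beta_X) = \frac{\sum_j (\alpha_j - \bar\alpha_w)(\beta_{Xj} - \bar\beta_{Xw})\,\mathrm{se}(\hat\beta_{Yj})^{-2}}{\sum_j \mathrm{se}(\hat\beta_{Yj})^{-2}}, \qquad \mathrm{var}_w(\beta_X) = \frac{\sum_j (\beta_{Xj} - \bar\beta_{Xw})^2\,\mathrm{se}(\hat\beta_{Yj})^{-2}}{\sum_j \mathrm{se}(\hat\beta_{Yj})^{-2}},$$
$$\bar\alpha_w = \frac{\sum_j \alpha_j\,\mathrm{se}(\hat\beta_{Yj})^{-2}}{\sum_j \mathrm{se}(\hat\beta_{Yj})^{-2}}, \qquad \bar\beta_{Xw} = \frac{\sum_j \beta_{Xj}\,\mathrm{se}(\hat\beta_{Yj})^{-2}}{\sum_j \mathrm{se}(\hat\beta_{Yj})^{-2}}.$$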
If the InSIDE assumption is violated, and there is balanced or directional pleiotropy, then the estimate of θ from MR-Egger will be biased due to the non-zero bias term in Equation 2.10.
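To make Equation 2.10 concrete, the sketch below computes the weighted covariance and variance defined above on hypothetical, error-free associations (so that NOME holds) and checks that $\mathrm{cov}_w(\beta_Y, \beta_X)/\mathrm{var}_w(\beta_X)$ splits into $\theta$ plus the bias term. The simulated $\beta_X$, $\alpha$ and weights, the generating model $\beta_{Yj} = \theta\beta_{Xj} + \alpha_j$, and the helper names `wmean`, `wcov` and `egger_decomposition` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def wmean(v, w):
    """Weighted mean of v with weights w."""
    return np.sum(w * v) / np.sum(w)


def wcov(a, b, w):
    """Weighted covariance cov_w(a, b); wcov(a, a, w) gives var_w(a)."""
    return np.sum(w * (a - wmean(a, w)) * (b - wmean(b, w))) / np.sum(w)


# Hypothetical error-free associations (NOME holds) for J = 200 variants
rng = np.random.default_rng(2024)
J, theta = 200, 0.3
beta_x = rng.uniform(0.05, 0.25, J)
w = rng.uniform(0.01, 0.02, J) ** -2                        # weights se(beta_Yj)^-2
alpha_indep = rng.normal(0.02, 0.02, J)                     # drawn independently of beta_x
alpha_dep = 0.5 * beta_x - 0.05 + rng.normal(0.0, 0.01, J)  # InSIDE violated


def egger_decomposition(alpha):
    """Return (theta_E, bias term) of Equation 2.10 for beta_Y = theta * beta_X + alpha."""
    beta_y = theta * beta_x + alpha
    theta_e = wcov(beta_y, beta_x, w) / wcov(beta_x, beta_x, w)
    bias = wcov(alpha, beta_x, w) / wcov(beta_x, beta_x, w)
    return theta_e, bias


for label, alpha in [("InSIDE holds", alpha_indep), ("InSIDE violated", alpha_dep)]:
    theta_e, bias = egger_decomposition(alpha)
    print(f"{label}: theta_E = {theta_e:.3f}, theta + bias = {theta + bias:.3f}")
```

In the first case the bias term is close to zero, so $\hat\theta_E$ is close to $\theta = 0.3$; in the second the bias term is roughly 0.5, so $\hat\theta_E$ is roughly 0.8 even though the true effect is 0.3.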
It seems more plausible that the InSIDE assumption will hold if the pleiotropic effects are independent of the unmeasured variables $U$ that confound the $X$–$Y$ association ($\delta = 0$ in Figure 2.2). If the pleiotropic variants do lie on the same causal pathway as the unmeasured confounders $U$ of the $X$–$Y$ association ($\delta \neq 0$), then it is difficult to see how the InSIDE assumption could be satisfied, because the strength of the genetic associations with $X$ would then depend on the strength of the pleiotropic effects via $U$. As noted in Section 2.4, throughout this dissertation we assume that a pleiotropic variant is associated with the outcome $Y$ via a causal pathway that is independent of the risk factor $X$ and the set of unmeasured confounding variables $U$ ($\alpha_j \neq 0$ in Figure 2.2).
In terms of the standard error of $\hat\theta_E$, the need to estimate the intercept term in Equation 2.9 means that the MR-Egger method will give less precise estimates of the causal effect than the IVW method. Since the MR-Egger method explicitly allows the genetic variants to be pleiotropic, which introduces heterogeneity beyond that expected from sampling error alone, applying a fixed-effect model to Equation 2.9 (fixing $\varphi_E = 1$) would not be appropriate. A multiplicative random-effects model, in which the residual standard error $\varphi_E$ is estimated, is therefore applied to the MR-Egger method throughout this dissertation.
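A minimal Python sketch of fitting Equation 2.9 is given below, assuming NumPy and summary statistics supplied as arrays; the function name `mr_egger` and the simulated data are hypothetical. It solves the weighted least-squares normal equations with weights $\mathrm{se}(\hat\beta_{Yj})^{-2}$, estimates $\varphi_E$ from the weighted residuals on $J - 2$ degrees of freedom, and scales the fixed-effect standard errors by $\hat\varphi_E$ to obtain multiplicative random-effects standard errors.

```python
import numpy as np


def mr_egger(beta_x, beta_y, se_y):
    """Weighted MR-Egger regression (Equation 2.9).

    Regresses beta_y on beta_x with an unconstrained intercept and weights
    se_y**-2, and reports multiplicative random-effects standard errors
    (residual standard error phi_E estimated from the data).
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    w = np.asarray(se_y, dtype=float) ** -2              # inverse-variance weights
    X = np.column_stack([np.ones_like(beta_x), beta_x])  # columns: intercept, beta_x
    XtW = X.T * w                                        # X'W with W = diag(w)
    xtwx = XtW @ X
    coef = np.linalg.solve(xtwx, XtW @ beta_y)           # weighted least squares
    resid = beta_y - X @ coef
    J = beta_x.size
    # Residual standard error phi_E on J - 2 degrees of freedom (two parameters)
    phi = np.sqrt(np.sum(w * resid ** 2) / (J - 2))
    # Fixed-effect standard errors (phi_E = 1), then scaled by the estimated phi_E
    se_fixed = np.sqrt(np.diag(np.linalg.inv(xtwx)))
    se_random = phi * se_fixed
    return {
        "intercept": coef[0], "slope": coef[1], "phi": phi,
        "se_intercept": se_random[0], "se_slope": se_random[1],
    }


# Illustrative (hypothetical) data: 30 variants, slope 0.4, intercept 0.01
rng = np.random.default_rng(1)
J = 30
beta_x = rng.uniform(0.05, 0.3, J)
se_y = rng.uniform(0.01, 0.03, J)
beta_y = 0.01 + 0.4 * beta_x + rng.normal(0.0, se_y)
print(mr_egger(beta_x, beta_y, se_y))
```

Some software additionally truncates $\hat\varphi_E$ below at 1, so that the random-effects standard errors are never smaller than the fixed-effect ones; that refinement is omitted from this sketch.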
Orientation of the genetic variants
In this dissertation, we assume that $G_j$ ($j = 1, \ldots, J$) can take the value 0, 1 or 2, representing the number of copies of either the risk factor-increasing or the risk factor-decreasing allele of a bi-allelic genetic variant. The interpretation of the genetic associations with the risk factor or outcome will depend on which of these two alleles $G_j$ counts. If $G_j$ has been orientated with respect to the risk factor-increasing allele for $X$, then the genetic association with the risk factor represents the average change in $X$ per additional copy of the risk factor-increasing allele. Re-orientating a variant reverses the sign of both $\hat\beta_{Xj}$ and $\hat\beta_{Yj}$ while leaving their standard errors unchanged. Since the intercept term in Equation 2.4 is fixed at zero, the orientation of the genetic variants therefore has no effect on the estimate of the causal effect $\theta$ from the IVW method.
The orientation of the genetic variants will, however, affect the estimates $\hat\theta_{0E}$ and $\hat\theta_E$ from the MR-Egger method, since the orientation determines the sign, and hence the definition, of each pleiotropic effect $\alpha_j$. Hence, the orientation of the genetic variants will also affect the definition of the InSIDE assumption. Bowden et al. [29] therefore suggest that the genetic variants be orientated so that the genetic associations with the risk factor are either all positive or all negative.
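The effect of orientation can be checked numerically. The sketch below uses hypothetical summary data in which two variants are coded on the risk factor-decreasing allele; it recodes variants so that all $\hat\beta_{Xj}$ are positive (flipping the sign of both associations) and compares the IVW and MR-Egger estimates before and after recoding. The helpers `orient`, `ivw` and `egger` are illustrative names; `egger` relies on `numpy.polyfit`, whose `w` argument multiplies the unsquared residuals, so $w_j = 1/\mathrm{se}(\hat\beta_{Yj})$ gives inverse-variance weighting.

```python
import numpy as np


def orient(beta_x, beta_y):
    """Recode variants so every association with the risk factor is positive.

    Switching the coded allele multiplies both associations by -1; the
    standard errors are unchanged, so they are not needed here.
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    sign = np.where(beta_x < 0, -1.0, 1.0)
    return sign * beta_x, sign * beta_y


def ivw(beta_x, beta_y, se_y):
    """IVW estimate: weighted regression through the origin (cf. Equation 2.4)."""
    w = se_y ** -2
    return np.sum(w * beta_x * beta_y) / np.sum(w * beta_x ** 2)


def egger(beta_x, beta_y, se_y):
    """Weighted MR-Egger (Equation 2.9); returns (intercept, slope)."""
    slope, intercept = np.polyfit(beta_x, beta_y, 1, w=1 / se_y)
    return intercept, slope


# Hypothetical data; pleiotropy (intercept 0.02) is defined on the original coding
beta_x = np.array([0.10, -0.20, 0.15, -0.05, 0.30])
se_y = np.array([0.02, 0.01, 0.03, 0.02, 0.015])
beta_y = 0.02 + 0.4 * beta_x
bx_o, by_o = orient(beta_x, beta_y)

print("IVW:", ivw(beta_x, beta_y, se_y), ivw(bx_o, by_o, se_y))
print("Egger:", egger(beta_x, beta_y, se_y), egger(bx_o, by_o, se_y))
```

The IVW estimates agree exactly, whereas the MR-Egger intercept and slope change, because the pleiotropic effect defined on the original coding changes sign for the recoded variants.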
$\hat\theta_{0E}$ and the MR-Egger intercept test
If we assume that the genetic variants $G_j$ ($j = 1, \ldots, J$) are orientated with respect to the risk factor-increasing alleles, and the InSIDE assumption is satisfied, then the estimate of the intercept term $\hat\theta_{0E}$ can be interpreted as the average direct (pleiotropic) effect of the $J$ genetic variants per copy of the risk factor-increasing allele [52]. In this setting, the InSIDE assumption is the condition $\beta_X \perp\!\!\!\perp \alpha$ with the genetic variants orientated with respect to the risk factor-increasing alleles. If there is balanced pleiotropy and the InSIDE assumption is valid, then the estimated intercept should tend to zero as the sample size increases. If the intercept term differs from zero, then either the InSIDE assumption is violated, or there is directional pleiotropy, or both. Testing whether the intercept term in Equation 2.9 equals zero is therefore a way of assessing the validity of the IV assumptions, and is known as the ‘MR-Egger intercept test’ [46].
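The MR-Egger intercept test can be sketched as a Wald test of $H_0\colon \theta_{0E} = 0$ based on the multiplicative random-effects fit. This is a minimal illustration under stated assumptions: the variants are first orientated to the risk factor-increasing allele, a two-sided normal approximation is used for the p-value (a $t$-distribution with $J - 2$ degrees of freedom is a possible alternative), and the function name `egger_intercept_test` and the simulated data are hypothetical.

```python
import math

import numpy as np


def egger_intercept_test(beta_x, beta_y, se_y):
    """Wald test of H0: theta_0E = 0 (two-sided, normal approximation)."""
    beta_x, beta_y, se_y = (np.asarray(a, dtype=float) for a in (beta_x, beta_y, se_y))
    sign = np.where(beta_x < 0, -1.0, 1.0)      # orient to risk factor-increasing alleles
    bx, by = sign * beta_x, sign * beta_y
    w = se_y ** -2                              # inverse-variance weights
    X = np.column_stack([np.ones_like(bx), bx])
    XtW = X.T * w
    xtwx = XtW @ X
    coef = np.linalg.solve(xtwx, XtW @ by)
    resid = by - X @ coef
    phi = math.sqrt(np.sum(w * resid ** 2) / (bx.size - 2))  # estimated phi_E
    se_int = phi * math.sqrt(np.linalg.inv(xtwx)[0, 0])      # random-effects SE
    z = coef[0] / se_int
    p = math.erfc(abs(z) / math.sqrt(2))        # equals 2 * (1 - Phi(|z|))
    return coef[0], se_int, z, p


# Hypothetical data with directional pleiotropy (average direct effect 0.05)
rng = np.random.default_rng(3)
J = 50
beta_x = rng.uniform(0.05, 0.3, J)
se_y = np.full(J, 0.01)
beta_y = 0.05 + 0.3 * beta_x + rng.normal(0.0, se_y)
print(egger_intercept_test(beta_x, beta_y, se_y))
```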
Violation of the NOME assumption
We now consider the impact that violation of the NOME assumption has on the MR-Egger method. First, consider the weighted variance of the genetic associations with the risk factor:
$$\mathrm{var}_w(\hat\beta_X) = \mathrm{var}_w(\beta_X) + s_w^2,$$
where $s_w^2$ is the weighted average of the variability in $\hat\beta_X$ due to estimation (or measurement) error. If the NOME assumption is satisfied, there is no uncertainty in the genetic associations with the risk factor, and $s_w^2$ is equal to zero.
If the InSIDE assumption is satisfied, then the expected value of the MR-Egger estimate can be expressed as [49]:
$$\mathbb{E}[\hat\theta_E] = \mathbb{E}\left[\frac{\mathrm{cov}_w(\hat\beta_Y, \hat\beta_X)}{\mathrm{var}_w(\beta_X)} \cdot \frac{\mathrm{var}_w(\beta_X)}{\mathrm{var}_w(\hat\beta_X)}\right] \approx \theta\,\frac{\mathrm{var}_w(\beta_X)}{\mathrm{var}_w(\beta_X) + s_w^2}, \tag{2.11}$$
so the MR-Egger method will produce a consistent estimate of the causal effect if the NOME assumption is satisfied. From Equation 2.11, the MR-Egger estimate will be attenuated towards zero if the NOME assumption is violated ($s_w^2 \neq 0$). Violation of the NOME assumption will also lead to an increased Type I error rate for the MR-Egger intercept test [49].
The extent to which the MR-Egger estimate $\hat\theta_E$ is attenuated depends on the relative sizes of $\mathrm{var}_w(\beta_X)$ and $s_w^2$. If there is a lot of variability in $\beta_X$ and little estimation error, then the attenuation of the MR-Egger estimate towards the null will be small. However, if there is little variability in $\beta_X$ relative to the estimation error, then the attenuation of the MR-Egger estimate will be more severe.
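The attenuation described by Equation 2.11 can be illustrated with a small Monte Carlo experiment. Assuming no pleiotropy, hypothetical true associations $\beta_{Xj}$ and equal standard errors, the sketch below repeatedly draws $\hat\beta_{Xj}$ and $\hat\beta_{Yj}$ around their true values, fits the weighted MR-Egger slope, and compares the average with $\theta\,\mathrm{var}_w(\beta_X)/(\mathrm{var}_w(\beta_X) + s_w^2)$. Here $s_w^2$ is computed, as an assumption, as the weighted average of $\mathrm{se}(\hat\beta_{Xj})^2$ with weights $\mathrm{se}(\hat\beta_{Yj})^{-2}$; the function names are illustrative.

```python
import numpy as np


def egger_slope(beta_x, beta_y, se_y):
    """Weighted MR-Egger slope (Equation 2.9); polyfit's w = 1/se gives inverse-variance weights."""
    return np.polyfit(beta_x, beta_y, 1, w=1 / se_y)[0]


def expected_egger(theta, var_w_bx, s2_w):
    """Approximate E[theta_E] from Equation 2.11 under InSIDE."""
    return theta * var_w_bx / (var_w_bx + s2_w)


# Hypothetical set-up: no pleiotropy, NOME violated (se_x > 0)
rng = np.random.default_rng(7)
J, theta, reps = 100, 0.5, 2000
se_y = np.full(J, 0.01)
se_x = np.full(J, 0.02)
beta_x_true = rng.normal(0.10, 0.04, J)
w = se_y ** -2
mean_w = np.sum(w * beta_x_true) / np.sum(w)
var_w_bx = np.sum(w * (beta_x_true - mean_w) ** 2) / np.sum(w)
s2_w = np.sum(w * se_x ** 2) / np.sum(w)     # weighted average error variance

sims = [
    egger_slope(rng.normal(beta_x_true, se_x),
                rng.normal(theta * beta_x_true, se_y), se_y)
    for _ in range(reps)
]
print(np.mean(sims), expected_egger(theta, var_w_bx, s2_w))
```

Increasing `se_x` relative to the spread of `beta_x_true` makes the attenuation more severe, in line with the discussion above.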
To account for the violation of the NOME assumption in Equation 2.11, we require an estimate of $\mathrm{var}_w(\beta_X)/\mathrm{var}_w(\hat\beta_X)$. Bowden et al. [49] have shown that $\mathrm{var}_w(\beta_X)/\mathrm{var}_w(\hat\beta_X)$ can be estimated through an adapted version of the $I^2$ statistic used in the meta-analysis literature to assess heterogeneity:
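$$I^2 = \frac{Q_{GX} - (J - 1)}{Q_{GX}}, \tag{2.12}$$
where $Q_{GX}$ is Cochran’s Q statistic for the genetic associations with the risk factor:
$$Q_{GX} = \sum_{j=1}^{J} \frac{\left(\hat\beta_{Xj}\,\mathrm{se}(\hat\beta_{Yj})^{-1} - \bar{\hat\beta}_X\right)^2}{\mathrm{se}(\hat\beta_{Xj})^2\,\mathrm{se}(\hat\beta_{Yj})^{-2}},$$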
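and $\bar{\hat\beta}_X$ is the weighted mean of the rescaled genetic associations with the risk factor, $\hat\beta_{Xj}\,\mathrm{se}(\hat\beta_{Yj})^{-1}$, using the same weights as in $Q_{GX}$, namely $\mathrm{se}(\hat\beta_{Xj})^{-2}\,\mathrm{se}(\hat\beta_{Yj})^{2}$.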
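The $I^2$ statistic takes values between 0 and 1 (negative values, which occur when $Q_{GX} < J - 1$, are conventionally truncated at zero), with smaller values corresponding to more strongly attenuated, and hence more biased, MR-Egger estimates. If the $I^2$ statistic is close to 1, then there should be little or no attenuation of the causal estimate from the MR-Egger method.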
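A sketch of the $I^2$ calculation in Equation 2.12 follows. It assumes, consistent with the form of $Q_{GX}$, that the statistic is computed on the rescaled associations $\hat\beta_{Xj}/\mathrm{se}(\hat\beta_{Yj})$ with standard errors $\mathrm{se}(\hat\beta_{Xj})/\mathrm{se}(\hat\beta_{Yj})$, and that the variants are orientated so that all $\hat\beta_{Xj}$ are positive; negative values are truncated at zero. The function name `i2_gx` and the example data are hypothetical.

```python
import numpy as np


def i2_gx(beta_x, se_x, se_y):
    """I^2 statistic of Equation 2.12 for weighted MR-Egger (sketch)."""
    bx = np.abs(np.asarray(beta_x, dtype=float))  # orientated: all associations positive
    se_x = np.asarray(se_x, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    y = bx / se_y                     # rescaled associations beta_Xj * se(beta_Yj)^-1
    w = (se_x / se_y) ** -2           # inverse variances of the rescaled associations
    mu = np.sum(w * y) / np.sum(w)    # weighted mean of the rescaled associations
    q = np.sum(w * (y - mu) ** 2)     # Cochran's Q_GX
    if q <= 0:                        # identical associations: no between-variant spread
        return 0.0
    return float(max(0.0, (q - (y.size - 1)) / q))


# Hypothetical summary data for five variants
beta_x = np.array([0.12, -0.08, 0.30, 0.05, 0.21])
se_x = np.array([0.02, 0.02, 0.03, 0.01, 0.02])
se_y = np.array([0.01, 0.015, 0.02, 0.01, 0.012])
print(i2_gx(beta_x, se_x, se_y))
```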
Since Bowden et al. [49] obtained unstable results when the MR-Egger estimate $\hat\theta_E$ was divided by $I^2$, the authors suggest that a simulation extrapolation (SIMEX) method be used to adjust for the violation of the NOME assumption when $I^2 < 0.9$.
Under the SIMEX approach, estimates of the genetic associations with the risk factor $\hat\beta^{\lambda}_{Xj}$ ($j = 1, \ldots, J$) are simulated from:
$$\hat\beta^{\lambda}_{Xj} \sim N\left(\hat\beta_{Xj},\ \lambda\,\mathrm{se}(\hat\beta_{Xj})^2\right),$$
where $\hat\beta_{Xj}$ and $\mathrm{se}(\hat\beta_{Xj})$ are the observed data, so that, once the original estimation error is included, $\mathrm{var}(\hat\beta^{\lambda}_{Xj}) = (1 + \lambda)\,\mathrm{se}(\hat\beta_{Xj})^2$. For a given value of $\lambda > 0$, the simulated genetic associations $\hat\beta^{\lambda}_{Xj}$ and the observed genetic associations with the outcome $\hat\beta_{Yj}$ ($j = 1, \ldots, J$) are used to obtain an MR-Egger estimate of the causal effect. This process is repeated multiple times to obtain an average value of the MR-Egger estimate for that value of $\lambda$. The whole process is then applied to a range of $\lambda$ values that increase in small increments. As $\lambda$ increases, the average MR-Egger estimate will shrink in magnitude, as there will be more attenuation towards zero. The average MR-Egger estimates from the different values of $\lambda$ are then extrapolated back to $\lambda = -1$, at which the variance $(1 + \lambda)\,\mathrm{se}(\hat\beta_{Xj})^2$ would be zero, to estimate what the MR-Egger estimate would have been had the NOME assumption been satisfied.
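The SIMEX procedure described above can be sketched as follows. This is an illustration rather than the implementation used by Bowden et al. [49]: the grid of $\lambda$ values, the number of simulations per $\lambda$, and the quadratic extrapolant (a common default choice in SIMEX) are all assumptions, as are the function names `egger_slope` and `simex_egger` and the simulated data.

```python
import numpy as np


def egger_slope(beta_x, beta_y, se_y):
    """Weighted MR-Egger slope (Equation 2.9)."""
    return np.polyfit(beta_x, beta_y, 1, w=1 / se_y)[0]


def simex_egger(beta_x, se_x, beta_y, se_y, lambdas=None, n_sim=200, seed=0):
    """SIMEX-corrected MR-Egger slope (sketch; quadratic extrapolant assumed)."""
    if lambdas is None:
        lambdas = np.arange(0.1, 2.01, 0.1)       # hypothetical grid of lambda values
    rng = np.random.default_rng(seed)
    lam = np.concatenate([[0.0], lambdas])
    means = [egger_slope(beta_x, beta_y, se_y)]   # lambda = 0: the naive estimate
    for lam_k in lambdas:
        # Add extra noise with variance lam_k * se_x^2 to the observed beta_x
        sims = [egger_slope(rng.normal(beta_x, np.sqrt(lam_k) * se_x), beta_y, se_y)
                for _ in range(n_sim)]
        means.append(np.mean(sims))
    coefs = np.polyfit(lam, means, 2)             # mean estimate as a quadratic in lambda
    return np.polyval(coefs, -1.0), means[0]      # extrapolate to lambda = -1


# Hypothetical data: no pleiotropy, NOME violated
rng = np.random.default_rng(11)
J, theta = 50, 0.5
beta_x_true = rng.normal(0.08, 0.03, J)
se_x = np.full(J, 0.02)
se_y = np.full(J, 0.005)
beta_x = rng.normal(beta_x_true, se_x)            # observed, with estimation error
beta_y = rng.normal(theta * beta_x_true, se_y)
corrected, naive = simex_egger(beta_x, se_x, beta_y, se_y)
print(f"naive = {naive:.3f}, SIMEX = {corrected:.3f}, truth = {theta}")
```

In a set-up like this the attenuation curve is convex in $\lambda$, so a quadratic extrapolant tends to under-correct slightly: the SIMEX estimate usually moves most, but not all, of the way from the naive estimate back towards the true value.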
Instrument strength
The strength of the association between the genetic variants and the risk factor for the IVW method is usually assessed through the F-statistic from the regression of the risk factor on the genetic variant(s). Genetic variants are often classified as ‘weak’ IVs if they have an F-statistic less than 10. Causal estimates based on weak IVs are asymptotically unbiased, but in finite samples they are biased (known as ‘weak instrument bias’) [77, 78]. For one-sample Mendelian randomization, this bias will be towards the confounded observational association, and for two-sample Mendelian randomization the bias will be towards the null [79].
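With summary data, the F-statistic for a single variant is often approximated by the squared ratio of its association with the risk factor to its standard error, because for one variant the regression F-statistic equals the squared $t$-statistic. The sketch below applies this approximation, which is an assumption when only summary statistics are available, to hypothetical values and flags variants below the conventional threshold of 10; the function name `approx_f_stat` is illustrative.

```python
import numpy as np


def approx_f_stat(beta_x, se_x):
    """Approximate per-variant F-statistic (beta_Xj / se(beta_Xj))^2 from summary data."""
    return (np.asarray(beta_x, dtype=float) / np.asarray(se_x, dtype=float)) ** 2


# Hypothetical summary statistics for three variants
beta_x = np.array([0.10, 0.02, -0.05])
se_x = np.array([0.01, 0.01, 0.01])
f = approx_f_stat(beta_x, se_x)
weak = f < 10                      # conventional 'weak instrument' threshold
print(f, weak)
```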
For MR-Egger, instrument strength should be assessed through the $I^2$ statistic (Equation 2.12) rather than the F-statistic. An $I^2$ value close to 1 suggests that the MR-Egger estimate does not suffer from weak instrument bias. If the $I^2$ statistic is 0.9, then the bias of the MR-Egger estimate towards the null will be approximately 10% of the causal effect $\theta$. To correspond with the classification of ‘weak’ IVs under the F-statistic for the IVW method, Bowden et al. [49] suggest that the SIMEX method be applied when $I^2$ is less than 0.9.
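The 10% figure can be seen directly by combining Equation 2.11 with the interpretation of $I^2$ as an estimate of $\mathrm{var}_w(\beta_X)/\mathrm{var}_w(\hat\beta_X)$, assuming the InSIDE assumption holds:
$$\mathbb{E}[\hat\theta_E] \approx \theta\,\frac{\mathrm{var}_w(\beta_X)}{\mathrm{var}_w(\hat\beta_X)} \approx \theta\, I^2 \quad\Longrightarrow\quad \theta - \mathbb{E}[\hat\theta_E] \approx (1 - I^2)\,\theta = 0.1\,\theta \ \text{ when } I^2 = 0.9.$$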