2.6 Sensitivity analyses
2.6.3 Multivariable Mendelian randomization
Multivariable Mendelian randomization assumes that the direct effect α of the genetic variants on the outcome is fully mediated through additional measured risk factors. Rather than replacing the IV3 assumption with a weaker assumption (as done by the MR-Egger method in Section 2.6.2), multivariable Mendelian randomization expands the IV assumptions to allow for the causal effects of multiple risk factors on the outcome to be estimated in the same model. Whilst the MR-Egger method should only be included in the sensitivity analysis of a Mendelian randomization study, multivariable Mendelian randomization can be used as a sensitivity analysis, or as the primary analysis model.
Instrumental variable assumptions
Suppose we have K continuous risk factors Xk (k = 1, . . . , K), a continuous outcome Y, and K sets of unmeasured confounding variables Uk (k = 1, . . . , K) of the X − Y
associations. The following assumptions must be satisfied in a multivariable Mendelian randomization analysis:
• IV1(M): each genetic variant Gj (j = 1, . . . , J) is associated with at least one of the K risk factors Xk (k = 1, . . . , K),
• IV2(M): each risk factor Xk (k = 1, . . . , K) is associated with at least one of the J genetic variants Gj (j = 1, . . . , J),
• IV3(M): the variants Gj (j = 1, . . . , J) are independent of all unmeasured confounders U of each of the risk factor–outcome associations, and
• IV4(M): the variants Gj (j = 1, . . . , J) are independent of the outcome Y conditional on the risk factors X and confounders U [28].
From the above conditions, each genetic variant Gj (j = 1, . . . , J) must be associated
with at least one of the risk factors Xk (k = 1, . . . , K), and each risk factor Xk
(k = 1, . . . , K) must be associated with at least one of the genetic variants Gj (j =
1, . . . , J). Genetic variants that are associated with multiple risk factors can be used in multivariable Mendelian randomization provided that these risk factors are included
in the analysis. There must be at least as many genetic variants as there are risk factors for the causal effects to be estimated, i.e. J ≥ K.
Figure 2.4 contains a DAG where the IV assumptions for multivariable Mendelian randomization are satisfied for K = 3 risk factors. We assume that for each individual
i (i = 1, . . . , N1) each risk factor Xki is a linear function of the J genetic variants Gj
(j = 1, . . . , J), the unmeasured confounders Uki of the Xk− Y association, and the
error term ϵXki:
$$X_{ki} = \beta_{k0} + \sum_{j=1}^{J} \beta_{X_k j} G_{ij} + U_{ki} + \epsilon_{X_k i},$$
where βXkj is the effect of the jth genetic variant on Xk. Gij is the number of minor
alleles at the jth genetic variant for the ith individual, and can take the value 0, 1 or 2.
The J genetic associations with the risk factor Xk can be estimated by regressing Xk
against each of the genetic variants in linear regression models, where it is assumed that the minor allele has an additive effect on Xk. We also assume that for each individual i (i = 1, . . . , N2) the outcome is a linear function of the risk factors Xk (k = 1, . . . , K),
the unmeasured confounders Uki (k = 1, . . . , K) of the X − Y associations, and the
error term ϵYi. For K = 3, we can express Yi as:
$$Y_i = \theta_0 + \theta_1 X_{1i} + \theta_2 X_{2i} + \theta_3 X_{3i} + U_{1i} + U_{2i} + U_{3i} + \epsilon_{Yi},$$
where θ are the direct effects of the risk factors on the outcome. The J genetic associations with the outcome βYj (j = 1, . . . , J) can be estimated by regressing the
outcome against each of the genetic variants in linear regression models.
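As a rough illustration of how these variant-by-variant regressions might be carried out, the following Python sketch simulates allele counts and a single risk factor, and then estimates each genetic association and its standard error by ordinary least squares. The function name variant_association, the simulated effect sizes, the allele frequency and the sample size are assumptions made purely for illustration; they are not taken from the analyses described in this dissertation.

```python
import numpy as np

def variant_association(g, trait):
    """Slope and standard error from regressing `trait` on one variant's allele count."""
    g = np.asarray(g, dtype=float)
    trait = np.asarray(trait, dtype=float)
    design = np.column_stack([np.ones_like(g), g])    # intercept + additive allele count
    coef, *_ = np.linalg.lstsq(design, trait, rcond=None)
    resid = trait - design @ coef
    sigma2 = resid @ resid / (len(g) - 2)             # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)   # covariance of (intercept, slope)
    return coef[1], np.sqrt(cov[1, 1])

# Hypothetical simulation: 4 independent variants, allele counts in {0, 1, 2}.
rng = np.random.default_rng(1)
n, J = 5000, 4
G = rng.binomial(2, 0.3, size=(n, J))
beta_x = np.array([0.3, 0.2, 0.0, 0.4])               # assumed true effects on X1
x1 = G @ beta_x + rng.normal(size=n)                  # confounder and error folded into one noise term

# One (estimate, standard error) pair per variant.
estimates = [variant_association(G[:, j], x1) for j in range(J)]
```

The same function could be applied to the outcome in place of the risk factor to obtain the genetic associations with the outcome and their standard errors.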
The aim of a multivariable Mendelian randomization analysis is to estimate the direct effects of the risk factors on the outcome, when conditioned on each other. These estimates can be obtained by using individual level data in one–sample multivariable Mendelian randomization, or summary level data in two–sample multivariable Mendelian randomization, as outlined in the subsections below (considered when K = 3).
Individual level data
Fig. 2.4 Directed acyclic graph illustrating the multivariable Mendelian randomization assumptions for the J genetic variants Gj (j = 1, . . . , J) to investigate the causal effect of K = 3 continuous risk factors Xk (k = 1, . . . , K) on a continuous outcome Y. The genetic effect of Gj on Xk is βXkj, and the direct causal effect of the risk factor Xk on the outcome Y is θk. Uk represents the set of unmeasured variables that confound the associations between Xk and Y.
We assume that we have individual level data for the J genetic variants Gj (j = 1, . . . , J), the three risk factors, and the outcome on the same set of participants. Consistent estimates of θ can be obtained from TSLS regression in a one–sample multivariable Mendelian randomization study by: regressing each of the K risk factors against the J genetic variants Gj (j = 1, . . . , J) to obtain the predicted values of the risk factors (X̂1, X̂2 and X̂3); and then regressing the outcome Y against X̂1, X̂2 and X̂3. The coefficient estimates for X̂1, X̂2 and X̂3 from the second stage regression model should be consistent estimates of θ if the IV assumptions for multivariable Mendelian randomization are satisfied.
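The two-stage procedure described above might be sketched in Python as follows. This is a minimal illustration under assumed conditions, not the implementation used in this dissertation: the function name tsls, the simulated effect sizes, and the way the confounders enter the simulation are all hypothetical, and the sketch returns point estimates only (valid standard errors require a correction that is omitted here).

```python
import numpy as np

def tsls(G, X, y):
    """Two-stage least squares: regress each column of X on G, then y on the fitted X."""
    n = G.shape[0]
    Z = np.column_stack([np.ones(n), G])                  # first-stage design with intercept
    first_stage, *_ = np.linalg.lstsq(Z, X, rcond=None)   # one coefficient column per risk factor
    X_hat = Z @ first_stage                               # predicted values of the risk factors
    D = np.column_stack([np.ones(n), X_hat])              # second-stage design with intercept
    theta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return theta[1:]                                      # drop the intercept

# Hypothetical simulation with K = 3 risk factors and J = 10 variants.
rng = np.random.default_rng(0)
n, J = 50000, 10
G = rng.binomial(2, 0.3, size=(n, J)).astype(float)
B = rng.normal(0.0, 0.5, size=(J, 3))                     # assumed variant effects on X1, X2, X3
U = rng.normal(size=(n, 3))                               # confounders U1, U2, U3
X = G @ B + U + rng.normal(size=(n, 3))
theta_true = np.array([0.5, -0.3, 0.2])                   # assumed direct effects
y = X @ theta_true + U.sum(axis=1) + rng.normal(size=n)

theta_hat = tsls(G, X, y)
```

Because the confounders affect both the risk factors and the outcome in this simulation, regressing y directly on X would be expected to give biased estimates, whereas theta_hat should lie close to theta_true when the variants are strong instruments.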
Summary level data
We assume that we have summary level data for the three risk factors and outcome, i.e. the genetic associations with the three risk factors are estimated in one sample, and the genetic associations with the outcome are estimated in an independent sample. Consistent estimates of θ can be obtained from the multivariable weighted linear regression of the genetic association estimates with the outcome on the genetic association estimates with the K risk factors, with se(β̂Yj)⁻² as weights and the intercept set to zero (known as the ‘multivariable IVW method’) [80]. Assuming there are three risk factors, under the multivariable IVW method we consider:
$$\hat{\beta}_{Yj} = \theta_{1MV}\,\hat{\beta}_{X_1 j} + \theta_{2MV}\,\hat{\beta}_{X_2 j} + \theta_{3MV}\,\hat{\beta}_{X_3 j} + \epsilon_{MVj}, \qquad \epsilon_{MVj} \sim N\left(0,\ \varphi_{MV}^{2}\,\mathrm{se}(\hat{\beta}_{Yj})^{2}\right), \tag{2.13}$$
where θMV are the causal effects of the risk factors Xk (k = 1, 2, 3) on the outcome Y, when conditioned on each other, and φMV represents the residual standard error
under the multivariable IVW model. If K = 1, then the multivariable IVW model is equivalent to the ‘univariable’ IVW method in Equation 2.7.
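The weighted regression in Equation 2.13 could be sketched in Python as below. The function name mv_ivw, the simulated summary statistics and the use of J − K residual degrees of freedom for the estimate of φMV are assumptions for illustration, and the sketch is not a substitute for established Mendelian randomization software.

```python
import numpy as np

def mv_ivw(beta_x, beta_y, se_y):
    """Weighted regression of beta_y on the columns of beta_x, no intercept, weights se_y**-2."""
    beta_x = np.asarray(beta_x, dtype=float)              # J x K genetic associations with risk factors
    beta_y = np.asarray(beta_y, dtype=float)              # J genetic associations with the outcome
    w = np.asarray(se_y, dtype=float) ** -2               # inverse-variance weights
    XtWX = beta_x.T @ (w[:, None] * beta_x)
    XtWy = beta_x.T @ (w * beta_y)
    theta = np.linalg.solve(XtWX, XtWy)                   # weighted least squares through the origin
    resid = beta_y - beta_x @ theta
    J, K = beta_x.shape
    phi2 = np.sum(w * resid ** 2) / (J - K)               # estimate of phi_MV squared
    se_theta = np.sqrt(phi2 * np.diag(np.linalg.inv(XtWX)))
    return theta, se_theta, np.sqrt(phi2)

# Hypothetical summary statistics for J = 30 variants and K = 3 risk factors.
rng = np.random.default_rng(0)
J = 30
beta_x = rng.normal(0.0, 0.1, size=(J, 3))
se_y = rng.uniform(0.01, 0.02, size=J)
theta_true = np.array([0.4, 0.0, -0.2])                   # assumed causal effects
beta_y = beta_x @ theta_true + rng.normal(0.0, se_y)

theta_hat, se_theta, phi_hat = mv_ivw(beta_x, beta_y, se_y)
```

Passing a single column of genetic associations (K = 1) reduces the estimate to the ratio of the weighted sum of β̂Xj β̂Yj to the weighted sum of β̂Xj², which is the univariable IVW estimate.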
There may be circumstances where the risk factors X are linearly related. For example, suppose the risk factors under investigation in Figure 2.4 are low-density lipoprotein cholesterol (LDL-C), triglycerides and high-density lipoprotein cholesterol (HDL-C). LDL-C is rarely measured directly, but is estimated from measurements of total cholesterol, triglycerides and HDL-C via the Friedewald equation as total cholesterol minus HDL-C minus 0.2 times triglycerides (assuming all measurements are in mg/dL) [81]. We would therefore expect LDL-C, triglycerides and HDL-C measurements to be correlated, and since lipid fractions are associated with common genetic variants, we may find that the estimates of the genetic associations (β̂X) are also correlated. If the estimates β̂X are correlated, then the multivariable IVW method (Equation 2.13) may be affected by collinearity, leading to imprecise estimates. Collinearity in multivariable Mendelian randomization is considered briefly in the main applied example of the dissertation (Chapter 6) by estimating the correlation structure between the risk factors and the correlation structure between the genetic associations of the risk factors.
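As a small illustration of the two ideas in this paragraph, the Python sketch below evaluates the Friedewald equation as stated above and computes the correlation matrix between the columns of a matrix of genetic association estimates. The function name friedewald_ldl and all numerical values are hypothetical and chosen only to show the calculation; they are not data from this dissertation.

```python
import numpy as np

def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimated LDL-C in mg/dL: total cholesterol - HDL-C - 0.2 * triglycerides."""
    return total_chol - hdl - 0.2 * triglycerides

# Hypothetical measurements in mg/dL; gives roughly 120 mg/dL, up to floating-point rounding.
ldl = friedewald_ldl(total_chol=200.0, hdl=50.0, triglycerides=150.0)

# Hypothetical genetic association estimates: rows are variants,
# columns are LDL-C, triglycerides and HDL-C (made-up values).
beta_x = np.array([
    [0.12,  0.03, -0.01],
    [0.08,  0.10, -0.06],
    [0.02,  0.15, -0.09],
    [0.10,  0.01,  0.02],
    [0.01, -0.04,  0.11],
])

# Pairwise correlations between the columns of genetic association estimates.
corr = np.corrcoef(beta_x, rowvar=False)
```

Off-diagonal entries of corr that are close to 1 or −1 would suggest that the multivariable IVW regression may suffer from collinearity.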
Instrument strength
For multivariable Mendelian randomization, the set of genetic variants G are considered to be strong IVs if: a) the variants are associated with each of the K risk factors individually; and b) the variants are jointly associated with the K risk factors [82]. The first condition can be assessed through the F-statistics from the regression of each of the K risk factors Xk (k = 1, . . . , K) against G. In order for the second condition to hold, the genetic variants must be able to predict the values of each risk factor Xk after predicting the values of the remaining K − 1 risk factors. In DAG A) of Figure 2.5, all of the risk factors are individually strongly predicted by the J genetic variants Gj (j = 1, . . . , J). X2 and X3 are jointly predicted by G in DAG A), but X1 is not jointly predicted by G. In DAG B), all of the risk factors are individually predicted and jointly predicted by G.
Fig. 2.5 Directed acyclic graphs, labelled A) and B), illustrating the potential set up for multivariable Mendelian randomization for the set of genetic variants G and the risk factors X1, X2 and X3. In DAG A), X1 is individually, but not jointly, strongly predicted by G. All of the risk factors in DAG B) are both individually and jointly strongly predicted by G.
To assess whether the J genetic variants Gj (j = 1, . . . , J) are jointly associated
with the K risk factors, the Sanderson-Windmeijer conditional F-statistic should be estimated for each risk factor [82]. The conditional F-statistic for X1 when there are
K = 3 risk factors can be calculated by:
1. X2 is regressed against the J genetic variants Gj (j = 1, . . . , J) and the predicted values of X2 (X̂2) calculated;
2. the above regression model is refitted with X2 replaced with X3, and X̂3 calculated;
3. X1 is then regressed against X̂2 and X̂3, and the residual errors from the regression model (X1 − X̂1) saved;
4. the saved residual errors are regressed against the J genetic variants Gj (j = 1, . . . , J), and the F-statistic obtained from this regression model, multiplied by a degrees of freedom correction factor of J/(J − 2) [83], is the ‘conditional’ F-statistic for X1.
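Steps 1 to 4 above can be sketched in Python as follows. The sketch follows the steps as literally written (the residuals in step 3 are taken from the regression on the predicted values), so it may differ in detail from published implementations of the Sanderson-Windmeijer statistic. The function names, the generalisation of the correction factor to J/(J − (K − 1)), and the simulated scenario, in which the genetic associations with X1 are an exact linear combination of those with X2 and X3, are assumptions made for illustration.

```python
import numpy as np

def _fit(design, target):
    """Fitted values from an OLS regression of `target` on `design` plus an intercept."""
    Z = np.column_stack([np.ones(len(design)), design])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return Z @ coef

def f_statistic(G, target):
    """Overall F-statistic from regressing `target` on the J columns of G."""
    n, J = G.shape
    rss = np.sum((target - _fit(G, target)) ** 2)
    tss = np.sum((target - target.mean()) ** 2)
    return ((tss - rss) / J) / (rss / (n - J - 1))

def conditional_f(G, X, k):
    """Conditional F-statistic for column k of X, following steps 1-4 as written."""
    J = G.shape[1]
    others = [c for c in range(X.shape[1]) if c != k]
    X_hat_others = _fit(G, X[:, others])                   # steps 1-2: predict the other risk factors
    resid = X[:, k] - _fit(X_hat_others, X[:, k])          # step 3: residuals of X_k given those predictions
    return f_statistic(G, resid) * J / (J - len(others))   # step 4: F-statistic with correction factor

# Hypothetical simulation with J = 10 variants and K = 3 risk factors.
rng = np.random.default_rng(0)
n, J = 20000, 10
G = rng.binomial(2, 0.3, size=(n, J)).astype(float)
B = rng.normal(0.0, 0.5, size=(J, 3))                      # assumed variant effects
X_strong = G @ B + rng.normal(size=(n, 3))                 # distinct genetic predictors for each risk factor

B_dep = B.copy()
B_dep[:, 0] = 0.5 * B[:, 1] + 0.5 * B[:, 2]                # X1 effects depend linearly on X2 and X3 effects
X_dep = G @ B_dep + rng.normal(size=(n, 3))

f_individual = f_statistic(G, X_dep[:, 0])                 # individual F-statistic for X1
f_cond_dep = conditional_f(G, X_dep, 0)                    # conditional F-statistic, dependent case
f_cond_strong = conditional_f(G, X_strong, 0)              # conditional F-statistic, independent case
```

In the dependent scenario X1 would be expected to have a large individual F-statistic but a small conditional F-statistic, whereas in the independent scenario both should be large.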
The degrees of freedom correction factor in step 4 takes into consideration that the same set of genetic variants G was used to predict the values of X̂2 and X̂3 in steps 1 and 2. Note that fitting the three regression models in steps 1-3 is equivalent to performing TSLS regression when X1 is treated as the ‘outcome’, and X2 and X3 are
the risk factors. The conditional F-statistics for X2 and X3 should also be calculated,
and the J genetic variants Gj (j = 1, . . . , J) would be considered as strong IVs for
multivariable Mendelian randomization if the F-statistics and conditional F-statistics for all of the K risk factors Xk (k = 1, . . . , K) were sufficiently large, e.g. we may