
3.4 Real data analysis

3.4.2 Safety comparison

The safety outcome of interest is the risk of major bleeding events, including both intracranial and extracranial bleeding. A total of 2,327 major bleeding events were observed in the study sample, an overall event rate of 12.6%. Of these, 618 occurred among dabigatran takers, corresponding to an event rate of 10.3% in the dabigatran arm, and 1,709 occurred among warfarin takers, corresponding to an event rate of 13.7% in the warfarin arm.

Table 3.3: Estimated marginal hazard ratio for stroke in dabigatran takers

Method          Estimate   Bootstrap SE   Bootstrap 95% CI
MA-KM           0.869      0.135          [0.628, 1.168]
MA-PSM          0.892      –              –
KM(PIP>0.5)     0.863      –              –
PSM(PIP>0.5)    0.911      –              –
KM(LASSO)       0.900      –              –
PSM(LASSO)      0.852      –              –
KM(all)         0.883      –              –
PSM(all)        0.965      –              –

Figure 3.3: Posterior probability of inclusion for each potential confounder in efficacy comparison
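The major bleeding counts reported at the start of this section can be cross-checked: the two arm counts must sum to the total, and the overall rate, being a size-weighted mean of the two arm rates, must lie between them. Arm sizes are not given in this section, so the sketch below back-calculates them approximately from the rounded rates; those implied sizes are illustrative only, not study data.

```python
# Figures reported in the text for major bleeding (safety comparison).
EVENTS = {"dabigatran": 618, "warfarin": 1709}
RATES = {"dabigatran": 0.103, "warfarin": 0.137}  # rounded to 0.1 percentage point
TOTAL_EVENTS = 2327
OVERALL_RATE = 0.126


def implied_arm_sizes(events, rates):
    """Approximate arm sizes implied by event counts and rounded event rates."""
    return {arm: events[arm] / rates[arm] for arm in events}


def implied_overall_rate(events, rates):
    """Total events over total implied subjects: a size-weighted mean of arm rates."""
    sizes = implied_arm_sizes(events, rates)
    return sum(events.values()) / sum(sizes.values())


# The arm counts should add up to the reported total.
assert sum(EVENTS.values()) == TOTAL_EVENTS

overall = implied_overall_rate(EVENTS, RATES)
# A weighted mean must fall between the two arm rates.
assert RATES["dabigatran"] < overall < RATES["warfarin"]
print(f"implied overall event rate: {overall:.3f} (reported: {OVERALL_RATE})")
```

Because the arm rates are rounded, agreement with the reported 12.6% is only expected to within rounding error.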

Table 3.4 summarizes the estimated marginal hazard ratio for major bleeding events in dabigatran takers. All eight estimates were between 0.72 and 0.75, indicating that dabigatran may be associated with a considerably reduced risk of major bleeding. The bootstrap SE obtained with MA-KM was small (0.039), suggesting that the point estimate was fairly precise. The bootstrap 95% CI was [0.671, 0.820]; because the interval lies entirely below 1, it suggests that the hazard of major bleeding in dabigatran takers would have been higher had they taken warfarin instead. Figure 3.4 displays the posterior inclusion probability for all 52 potential confounders in the safety comparison. Unlike in the efficacy comparison, the PIP varies considerably across potential confounders: 17 variables had a PIP above 90%, 6 had a PIP between 20% and 60%, and 29 had a PIP below 20%. That 6 variables had a mid-range PIP underscores the importance of acknowledging uncertainty in confounder selection.
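The band counts above (17, 6 and 29 of 52) can be tallied directly from a vector of PIPs. In this sketch the band edges follow the text, while the treatment of boundary values, the extra 0.6 to 0.9 band (empty in the safety comparison, since 17 + 6 + 29 = 52), and the example confounder names and PIPs are all hypothetical. The `select_by_pip` helper assumes that the PIP>0.5 rows in Tables 3.3 and 3.4 adjust for the confounders whose PIP exceeds 0.5.

```python
from collections import Counter


def pip_band(pip: float) -> str:
    """Place a posterior inclusion probability (PIP) in one of the bands used in the text.

    Boundary handling (e.g. whether 0.6 belongs to the middle band) is an assumption.
    """
    if not 0.0 <= pip <= 1.0:
        raise ValueError(f"PIP must lie in [0, 1], got {pip}")
    if pip > 0.9:
        return "above 0.9"
    if pip > 0.6:
        return "0.6 to 0.9"
    if pip >= 0.2:
        return "0.2 to 0.6"
    return "below 0.2"


def tally_pips(pips):
    """Count how many potential confounders fall in each PIP band."""
    return Counter(pip_band(p) for p in pips)


def select_by_pip(names, pips, threshold=0.5):
    """Confounders whose PIP exceeds a threshold (assumed rule behind the PIP>0.5 rows)."""
    return [name for name, p in zip(names, pips) if p > threshold]


# Hypothetical PIPs for five made-up confounders (not the study's estimates).
names = ["age", "sex", "ckd", "prior_bleed", "statin"]
pips = [0.98, 0.93, 0.35, 0.12, 0.04]
print(tally_pips(pips))
print(select_by_pip(names, pips))
```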

Table 3.4: Estimated marginal hazard ratio for major bleeding events in dabigatran takers

Method          Estimate   Bootstrap SE   Bootstrap 95% CI
MA-KM           0.733      0.039          [0.671, 0.820]
MA-PSM          0.732      –              –
KM(PIP>0.5)     0.731      –              –
PSM(PIP>0.5)    0.732      –              –
KM(LASSO)       0.740      –              –
PSM(LASSO)      0.724      –              –
KM(all)         0.742      –              –
PSM(all)        0.734      –              –

Figure 3.4: Posterior inclusion probability for each potential confounder in safety comparison
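Only MA-KM carries a bootstrap SE and 95% CI in Tables 3.3 and 3.4. The section does not say which bootstrap interval was used, so the sketch below assumes a nonparametric subject-level bootstrap with a percentile interval; `estimator` is a hypothetical placeholder for the whole model-averaged matching pipeline. Abadie and Imbens (2008) show that the standard bootstrap can be invalid for some matching estimators, so this illustrates the mechanics only, not the validity of the interval.

```python
import random
import statistics


def bootstrap_se_and_ci(data, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Nonparametric bootstrap SE and percentile CI for estimator(data).

    `data` is a list of subject-level records; `estimator` maps a resampled list
    to a scalar (a stand-in for the full model-averaged matching pipeline).
    """
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]  # n subjects, with replacement
        reps.append(estimator(resample))
    reps.sort()
    se = statistics.stdev(reps)
    lower = reps[int(alpha / 2 * n_boot)]
    upper = reps[int((1 - alpha / 2) * n_boot) - 1]
    return se, (lower, upper)


def ci_excludes(ci, null_value=1.0):
    """True if the interval lies entirely on one side of the null (1 for a hazard ratio)."""
    lower, upper = ci
    return upper < null_value or lower > null_value


# MA-KM intervals reported in Tables 3.3 (stroke) and 3.4 (major bleeding).
print(ci_excludes((0.628, 1.168)))  # stroke
print(ci_excludes((0.671, 0.820)))  # major bleeding
```

The second check mirrors the reasoning in the text: the bleeding interval lies below 1, whereas the stroke interval contains 1.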

3.5 Discussion

We proposed a model averaged matching estimator to estimate the effect of a treatment in observational studies. The method accounts for the uncertainty in confounder selection using BMA. Following the BAC approach proposed by Wang et al. (2012) and Wang et al. (2015), our proposed method jointly considers a treatment model relating the treatment variable to the vector of potential confounders and an outcome model relating the outcome to the treatment variable and the potential confounders. It specifies a prior distribution on the model space such that any potential confounder that is significantly associated with the treatment in the treatment model will a priori have a large probability of being included in the outcome model as well.
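A prior with this property can be written with a dependence parameter ω. The sketch below assumes a form along the lines of the BAC prior of Wang et al. (2012): each confounder enters the treatment model with prior probability 1/2, and, if it does, enters the outcome model with prior odds ω (otherwise with odds 1), so a large ω ties the two models together. The function name and the factorization across confounders are illustrative assumptions, not necessarily the exact prior used in this chapter.

```python
import math


def bac_log_prior(alpha_x, alpha_y, omega):
    """Log prior of a (treatment model, outcome model) pair under a BAC-style prior.

    alpha_x[j] / alpha_y[j] are 1 if confounder j is in the treatment / outcome model.
    Assumed form: treatment-model inclusion is a priori a fair coin; if
    alpha_x[j] == 1, the prior odds of alpha_y[j] == 1 are omega, otherwise 1.
    """
    if len(alpha_x) != len(alpha_y):
        raise ValueError("inclusion vectors must have the same length")
    if omega <= 0:
        raise ValueError("omega must be positive")
    log_p = 0.0
    for ax, ay in zip(alpha_x, alpha_y):
        log_p += math.log(0.5)  # P(alpha_x[j])
        if ax == 1:
            p_in, p_out = omega / (1.0 + omega), 1.0 / (1.0 + omega)
        else:
            p_in, p_out = 0.5, 0.5
        log_p += math.log(p_in if ay == 1 else p_out)  # P(alpha_y[j] | alpha_x[j])
    return log_p


# With omega = 100, a confounder in the treatment model is far more likely
# a priori to also be in the outcome model.
print(math.exp(bac_log_prior([1], [1], 100.0)))  # equals 0.5 * 100/101
print(math.exp(bac_log_prior([1], [0], 100.0)))  # equals 0.5 * 1/101
```

Setting ω = 1 removes the dependence, so every pair of inclusion indicators is equally likely a priori.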

However, our method differs from BAC in that we do not perform confounder selection and treatment effect estimation simultaneously, since doing so is vulnerable to model dependence and may yield a biased treatment effect estimate. Instead, our method uses the BAC approach only to estimate the posterior model probabilities, which quantify model uncertainty, and performs parameter estimation separately. To avoid model dependence, we proposed computing the model-specific treatment effect estimates within the BMA framework using matching estimators.
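Under this two-stage scheme, the model-averaged estimate can be read as a posterior-weighted average of the model-specific matching estimates, Δ̂_MA = Σ_k P(M_k | data) Δ̂_k. The sketch assumes that posterior model probabilities are estimated from the sampler's visit frequencies and that a matching estimate is available for each visited model; the confounder names and numbers are hypothetical. It averages on the hazard-ratio scale; averaging log hazard ratios is an equally plausible choice, and the section does not say which is used.

```python
from collections import Counter


def posterior_from_visits(visited_models):
    """Estimate posterior model probabilities by the sampler's visit frequencies."""
    counts = Counter(visited_models)
    total = sum(counts.values())
    return {model: c / total for model, c in counts.items()}


def model_averaged_estimate(post_probs, estimate_under_model):
    """Posterior-weighted average of model-specific (e.g. matching) estimates."""
    weight_sum = sum(post_probs.values())
    return sum(p * estimate_under_model(m) for m, p in post_probs.items()) / weight_sum


# Hypothetical sampler output: each model is the frozenset of adjusted confounders.
visits = [frozenset({"age"}), frozenset({"age"}), frozenset({"age", "ckd"}), frozenset({"age"})]
# Hypothetical matching estimates (hazard ratios) under each visited model.
estimates = {frozenset({"age"}): 0.80, frozenset({"age", "ckd"}): 0.70}

probs = posterior_from_visits(visits)
print(model_averaged_estimate(probs, estimates.__getitem__))
```

Keeping estimation outside the sampler means each model's matching estimate is computed once per distinct model, regardless of how often the chain visits it.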

Through simulations, we have demonstrated the validity of our method and shown that it performs at least as well as several widely used methods that do not account for the uncertainty in confounder selection. While our method successfully accounts for model uncertainty and avoids model dependence, it has certain limitations. First, an MC3 algorithm is needed to estimate the posterior model probabilities, which can be computationally intensive for large n; this is, however, a common problem for Bayesian methods that rely on Monte Carlo simulation. Second, any shortcoming of matching estimators carries over to our method. For example, matching methods usually require substantial overlap in baseline characteristics between treatment groups, which is usually satisfied in large samples but not guaranteed in small ones. Relatedly, when the overlap between treatment groups is narrow, some treated subjects may need to be pruned from the sample; because our method considers multiple models in the model space, a different set of treated subjects may be pruned under each model, which makes the final model averaged matching estimate difficult to interpret. Finally, while matching provides a useful way to estimate the average treatment effect on the treated subpopulation, estimation of causal quantities for the entire population is not possible with this approach.
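The core of an MC3 sampler is short; its cost lies in evaluating the log posterior, which in practice means fitting models for every proposed model, and that work grows with n. The sketch below is a generic single-flip Metropolis version over one inclusion vector, with a hypothetical user-supplied `log_post`; in the BAC setting the state would instead be the pair of treatment- and outcome-model inclusion vectors.

```python
import math
import random


def mc3(log_post, p, n_iter=5000, seed=0):
    """Markov chain Monte Carlo model composition (MC3) over inclusion vectors.

    log_post maps a tuple of 0/1 inclusion indicators to an unnormalised log
    posterior model probability (log marginal likelihood + log prior).
    """
    rng = random.Random(seed)
    state = [0] * p  # start from the empty model
    current = log_post(tuple(state))
    visits = []
    for _ in range(n_iter):
        j = rng.randrange(p)  # propose adding or dropping one variable
        proposal = state.copy()
        proposal[j] = 1 - proposal[j]
        candidate = log_post(tuple(proposal))
        # Symmetric proposal: Metropolis acceptance uses the posterior ratio only.
        if candidate >= current or rng.random() < math.exp(candidate - current):
            state, current = proposal, candidate
        visits.append(tuple(state))
    return visits


def inclusion_probabilities(visits):
    """Fraction of sampled models that include each variable (estimated PIPs)."""
    n = len(visits)
    return [sum(model[j] for model in visits) / n for j in range(len(visits[0]))]


def toy_log_post(model):
    """Toy log posterior: variable 0 favoured, variable 1 disfavoured, variable 2 neutral."""
    return 5.0 * model[0] - 5.0 * model[1]


print(inclusion_probabilities(mc3(toy_log_post, p=3)))
```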

References

1000 GENOMES PROJECT CONSORTIUM ET AL. (2015). A global reference for human genetic variation. Nature 526 68–74.

ABADIE, A. and IMBENS, G. W. (2008). On the failure of the bootstrap for matching estimators. Econometrica 76 1537–1557.

AKAIKE, H. (1972). Information theory and an extension of the Maximum Likelihood Principle. Akademiai Kiado, Budapest.

ASIMIT, J. and ZEGGINI, E. (2010). Rare variant association analysis methods for complex traits. Annual Review of Genetics 44 293–308.

AUSTIN, P. C. (2013). The performance of different propensity score methods for estimating marginal hazard ratios. Statistics in Medicine 32 2837–2849.

BABRON, M.-C., DETAYRAC, M., RUTLEDGE, D. N., ZEGGINI, E. and GÉNIN, E. (2012). Rare and low frequency variant stratification in the UK population: Description and impact on association tests. PLoS ONE 7.

BARBIERI, M. M. and BERGER, J. O. (2004). Optimal predictive model selection. Annals of Statistics 870–897.

BAYE, T. M., HE, H., DING, L., KUROWSKI, B. G., ZHANG, X. and MARTIN, L. J. (2011). Population structure analysis using rare and common functional variants. In BMC Proceedings, vol. 5.

after selection among high-dimensional controls. The Review of Economic Studies 81 608–650.

BENDER, R., AUGUSTIN, T. and BLETTNER, M. (2005). Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine 24 1713–1723.

BESAG, J. (1986). On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society. Series B (Methodological) 48 259–302.

BODMER, W. and BONILLA, C. (2008). Common and rare variants in multifactorial susceptibility to common diseases. Nature Genetics 40 695–701.

BOYD, A., GOLDING, J., MACLEOD, J., LAWLOR, D. A., FRASER, A., HENDERSON, J., MOLLOY, L., NESS, A., RING, S. and SMITH, G. D. (2012). Cohort profile: the ’children of the 90s’, the index offspring of the Avon Longitudinal Study of Parents and Children. International Journal of Epidemiology 42 111–127.

BRESLOW, N. E. and CLAYTON, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88 9–25.

CAMPBELL, C. D., OGBURN, E. L., LUNETTA, K. L., LYON, H. N., FREEDMAN, M. L., GROOP, L. C., ALTSHULER, D., ARDLIE, K. G. and HIRSCHHORN, J. N. (2005). Demonstrating stratification in a European American population. Nature Genetics 37 868–872.

CARDON, L. R. and PALMER, L. J. (2003). Population stratification and spurious allelic association. The Lancet 361 598–604.

CHEN, H., WANG, C., CONOMOS, M. P., STILP, A. M., LI, Z., SOFER, T., SZPIRO, A. A., CHEN, W., BREHM, J. M., CELEDÓN, J. C. ET AL. (2016). Control for population structure and relatedness for binary traits in genetic association studies via logistic mixed models. The American Journal of Human Genetics 98 653–666.

COGAN, J. D., KROPSKI, J. A., ZHAO, M., MITCHELL, D. B., RIVES, L., MARKIN, C., GARNETT, E. T., MONTGOMERY, K. H., MASON, W. R., MCKEAN, D. F. ET AL. (2015). Rare variants in RTEL1 are associated with familial interstitial pneumonia. American Journal of Respiratory and Critical Care Medicine 191 646–655.

COHEN, J. C., KISS, R. S., PERTSEMLIDIS, A., MARCEL, Y. L., MCPHERSON, R. and HOBBS, H. H. (2004). Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science 305 869–872.

COHEN, J. C., PERTSEMLIDIS, A., FAHMI, S., ESMAIL, S., VEGA, G. L., GRUNDY, S. M. and HOBBS, H. H. (2006). Multiple rare variants in NPC1L1 associated with reduced sterol absorption and plasma low-density lipoprotein levels. Proceedings of the National Academy of Sciences of the United States of America 103 1810–1815.

CONNOLLY, S. J., EZEKOWITZ, M. D., YUSUF, S., EIKELBOOM, J., OLDGREN, J., PAREKH, A., POGUE, J., REILLY, P. A., THEMELES, E., VARRONE, J. ET AL. (2009). Dabigatran versus warfarin in patients with atrial fibrillation. N Engl J Med 361 1139–1151.

CROSS, G. R. and JAIN, A. K. (1983). Markov random field texture models. IEEE Transactions on Pattern Analysis and Machine Intelligence 25–39.

CRUCHAGA, C., CHAKRAVERTY, S., MAYO, K., VALLANIA, F. L., MITRA, R. D., FABER, K., WILLIAMSON, J., BIRD, T., DIAZ-ARRASTIA, R., FOROUD, T. M. ET AL. (2012). Rare variants in APP, PSEN1 and PSEN2 increase risk for AD in late-onset Alzheimer’s disease families. PLoS ONE 7 e31039.

DAVIES, R. B. (1980). The distribution of a linear combination of χ² random variables. Journal of the Royal Statistical Society. Series C (Applied Statistics) 29 323–333.

DEMPSTER, A. P., LAIRD, N. M. and RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 39 1–38.

DEVLIN, B. and ROEDER, K. (1999). Genomic control for association studies. Biometrics

DRAPER, D. (1995). Assessment and propagation of model uncertainty. Journal of the Royal Statistical Society. Series B (Methodological) 45–97.

DUBES, R. C. and JAIN, A. K. (1989). Random field models in image analysis. Journal of Applied Statistics 16 131–164.

DUCHON, J. (1977). Splines minimizing rotation-invariant semi-norms in Sobolev spaces. In Constructive Theory of Functions of Several Variables. Springer, 85–100.

EICHLER, E. E., FLINT, J., GIBSON, G., KONG, A., LEAL, S. M., MOORE, J. H. and NADEAU, J. H. (2010). Missing heritability and strategies for finding the underlying causes of complex disease. Nature Reviews Genetics 11 446–450.

FAN, J. and LI, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 1348–1360.

FISHER, R. A. (1932). Statistical methods for research workers. 4th ed. Oliver and Boyd, Edinburgh.

FORBES, F. and PEYRARD, N. (2003). Hidden Markov random field model selection criteria based on mean field-like approximations. IEEE Transactions on Pattern Analysis and Machine Intelligence 25 1089–1101.

FREEDMAN, M. L., REICH, D., PENNEY, K. L., MCDONALD, G. J., MIGNAULT, A. A., PATTERSON, N., GABRIEL, S. B., TOPOL, E. J., SMOLLER, J. W., PATO, C. N. ET AL. (2004). Assessing the impact of population stratification on genetic association studies. Nature Genetics 36 388–393.

GEMAN, S. and GEMAN, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 721–741.

GEMAN, S. and GRAFFIGNE, C. (1986). Markov random field image models and their applications to computer vision. In Proceedings of the International Congress of Mathematicians, vol. 1.

GÉNIN, E., LETORT, S. and BABRON, M.-C. (2015). Population stratification of rare variants. In Assessing Rare Variation in Complex Traits. 227–237.

GHOSH, J. (2015). Bayesian model selection using the median probability model. Wiley Interdisciplinary Reviews: Computational Statistics 7 185–193.

GILMOUR, A. R., THOMPSON, R. and CULLIS, B. R. (1995). Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 1440–1450.

GREEN, P. J. and SILVERMAN, B. W. (1993). Nonparametric regression and generalized linear models: a roughness penalty approach. CRC Press.

GREENLAND, S. (1987). Interpretation and choice of effect measures in epidemiologic analyses. American Journal of Epidemiology 125 761–768.

GUDMUNDSSON, J., SULEM, P., GUDBJARTSSON, D. F., MASSON, G., AGNARSSON, B. A., BENEDIKTSDOTTIR, K. R., SIGURDSSON, A., MAGNUSSON, O. T., GUDJONSSON, S. A., MAGNUSDOTTIR, D. N. ET AL. (2012). A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nature Genetics 44 1326–1329.

HARROW, J., FRANKISH, A., GONZALEZ, J. M., TAPANARI, E., DIEKHANS, M., KOKOCINSKI, F., AKEN, B. L., BARRELL, D., ZADISSA, A., SEARLE, S. ET AL. (2012). GENCODE: the reference human genome annotation for the ENCODE project. Genome Research 22 1760–1774.

HARVILLE, D. A. (1977). Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association 72 320–338.

HECKMAN, J. J., ICHIMURA, H. and TODD, P. E. (1997). Matching as an econometric evaluation estimator: Evidence from evaluating a job training programme. The Review of Economic Studies 64 605–654.

HELD, K., KOPS, E. R., KRAUSE, B. J., WELLS, W. M., KIKINIS, R. and MULLER-GARTNER, H.-W. (1997). Markov random field segmentation of brain MR images. IEEE Transactions on Medical Imaging 16 878–886.

HELGASON, A., YNGVADÓTTIR, B., HRAFNKELSSON, B., GULCHER, J. and STEFÁNSSON, K. (2005). An Icelandic example of the impact of population structure on association studies. Nature Genetics 37 90–95.

HENDERSON, C. R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics 423–447.

HINDORFF, L. A., SETHUPATHY, P., JUNKINS, H. A., RAMOS, E. M., MEHTA, J. P., COLLINS, F. S. and MANOLIO, T. A. (2009). Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences 106 9362–9367.

HO, D. E., IMAI, K., KING, G. and STUART, E. A. (2007). Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference. Political Analysis 15 199–236.

HUDSON, R. R. (2002). Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18 337–338.

HUYGHE, J. R., JACKSON, A. U., FOGARTY, M. P., BUCHKOVICH, M. L., STANČÁKOVÁ, A., STRINGHAM, H. M., SIM, X., YANG, L., FUCHSBERGER, C., CEDERBERG, H. ET AL. (2013). Exome array analysis identifies new loci and low-frequency variants influencing insulin processing and secretion. Nature Genetics 45 197–201.

IACUS, S. M., KING, G., PORRO, G. and KATZ, J. N. (2012). Causal inference without balance checking: Coarsened exact matching. Political Analysis 1–24.

JI, W., FOO, J. N., O’ROAK, B. J., ZHAO, H., LARSON, M. G., SIMON, D. B., NEWTON-CHEH, C., STATE, M. W., LEVY, D., LIFTON, R. P. ET AL. (2008). Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nature Genetics 40 592–599.

JONSSON, T., ATWAL, J. K., STEINBERG, S., SNAEDAL, J., JONSSON, P. V., BJORNSSON, S., STEFANSSON, H., SULEM, P., GUDBJARTSSON, D., MALONEY, J. ET AL. (2012). A mutation in APP protects against Alzheimer’s disease and age-related cognitive decline. Nature 488 96–99.

KANG, H. M., SUL, J. H., SERVICE, S. K., ZAITLEN, N. A., KONG, S.-Y., FREIMER, N. B., SABATTI, C., ESKIN, E. ET AL. (2010). Variance component model to account for sample structure in genome-wide association studies. Nature Genetics 42 348–354.

KANG, H. M., ZAITLEN, N. A., WADE, C. M., KIRBY, A., HECKERMAN, D., DALY, M. J. and ESKIN, E. (2008). Efficient control of population structure in model organism association mapping. Genetics 178 1709–1723.

KASS, R. E. and RAFTERY, A. E. (1995). Bayes factors. Journal of the American Statistical Association 90 773–795.

KENNY, E., CORMICAN, P., FURLONG, S., HERON, E., KENNY, G., FAHEY, C., KELLEHER, E., ENNIS, S., TROPEA, D., ANNEY, R. ET AL. (2014). Excess of rare novel loss-of-function variants in synaptic genes in schizophrenia and autism spectrum disorders. Molecular Psychiatry 19 872–879.

LEE, S., ABECASIS, G. R., BOEHNKE, M. and LIN, X. (2014). Rare-variant association analysis: study designs and statistical tests. The American Journal of Human Genetics 95 5–23.

LEE, S., WU, M. C. and LIN, X. (2012). Optimal tests for rare variant effects in sequencing association studies. Biostatistics 13 762–775.

LI, B. and LEAL, S. M. (2008). Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. The American Journal of Human Genetics 83 311–321.

LI, C. (1969). Population subdivision with respect to multiple alleles. Ann Hum Genet 33

LI, S. Z. (2012). Markov random field modeling in computer vision. Springer Science & Business Media.

LIN, X. (1997). Variance component testing in generalised linear models with random effects. Biometrika 84 309–326.

LOUIS, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological) 226–233.

MADIGAN, D. and RAFTERY, A. E. (1994). Model selection and accounting for model uncertainty in graphical models using Occam’s window. Journal of the American Statistical Association 89 1535–1546.

MADIGAN, D., YORK, J. and ALLARD, D. (1995). Bayesian graphical models for discrete data. International Statistical Review/Revue Internationale de Statistique 215–232.

MADSEN, B. E. and BROWNING, S. R. (2009). A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet 5 e1000384.

MANOLIO, T. A., COLLINS, F. S., COX, N. J., GOLDSTEIN, D. B., HINDORFF, L. A., HUNTER, D. J., MCCARTHY, M. I., RAMOS, E. M., CARDON, L. R., CHAKRAVARTI, A. ET AL. (2009). Finding the missing heritability of complex diseases. Nature 461 747–753.

MARCHINI, J., CARDON, L. R., PHILLIPS, M. S. and DONNELLY, P. (2004). The effects of human population structure on large genetic association studies. Nature Genetics 36 512–517.

MATHERON, G. (1973). The intrinsic random functions and their applications. Advances in Applied Probability 439–468.

MATHIESON, I. and MCVEAN, G. (2012). Differential confounding of rare and common variants in spatially structured populations. Nature Genetics 44 243–246.

MATHIESON, I. and MCVEAN, G. (2014). Demography and the age of rare variants. PLoS

MCCULLAGH, P. (1984). Generalized linear models. European Journal of Operational Research 16 285–292.

MCLACHLAN, G. J. and PEEL, D. (2000). Finite Mixture Models. Wiley, New York.

NEJENTSEV, S., WALKER, N., RICHES, D., EGHOLM, M. and TODD, J. A. (2009). Rare

In document Adjustment for Population Stratification in Sequencing Association Studies and Model Averaged Matching Estimator (Page 88-104)