Boston University Theses & Dissertations
Gene- and pathway-based genomics of breast cancer and type 2 diabetes in African American women
https://hdl.handle.net/2144/20855
Boston University
BOSTON UNIVERSITY SCHOOL OF PUBLIC HEALTH
GENE- AND PATHWAY-BASED GENOMICS OF BREAST CANCER
AND TYPE 2 DIABETES IN AFRICAN AMERICAN WOMEN
STEPHEN ANTHONY HADDAD
B.A., Dartmouth College, 1996
M.S., Harvard School of Public Health, 2004
Submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
2017
© 2017 by STEPHEN ANTHONY HADDAD
All rights reserved except for Project 1, which is © 2016 by Oxford University Press, and Project 2, which is © 2015 by Springer
First Reader: Edward A. Ruiz-Narváez, Sc.D., Associate Professor of Epidemiology
Second Reader: Julie R. Palmer, Sc.D., Professor of Epidemiology
Third Reader: Ching-Ti Liu, Ph.D.
I wish to thank the members of my dissertation committee, Dr. Edward Ruiz-Narváez, Dr. Ching-Ti Liu, and Dr. Julie Palmer, who provided me with all of the support, guidance, and instruction necessary to successfully complete my dissertation.
Dr. Ruiz-Narváez served as my dissertation committee chair and gave me valuable advice throughout the research process, sharing his genetic epidemiology expertise along with his good nature and calming influence. I visited his office countless times to discuss research topics and ask for advice, and he was always generous with his time and ready to help.
Dr. Liu asked important questions and suggested helpful edits to improve the quality of my research from a statistical genetics perspective. I am very thankful that he agreed to serve on my committee.
Dr. Palmer served as my academic advisor at BUSPH and as my boss at Slone Epidemiology Center. I was so fortunate to have her guidance along the way as she shared her years of investigatory experience and began to teach me the ropes of academic research. I also wish to thank Dr. Palmer for offering me the opportunity to work at Slone and to conduct my dissertation research with data from the Black Women’s Health Study and the AMBER Consortium. My dissertation could not have been completed or even started without her.
I have learned so much from the investigators and staff of the Slone Epidemiology Center, the Black Women’s Health Study, and the AMBER Consortium. I am grateful to all who have welcomed me and allowed me to learn from them each and every day.
Dr. Jennifer Rider and Dr. Gina Peloso also deserve recognition. They enthusiastically agreed to be additional readers for my dissertation. I am thankful to them for performing this task and for providing me with valuable feedback that led to several meaningful final edits.
I also wish to thank Dr. Kathryn Lunetta for all of her help. First, as a student in two of her classes, I benefited greatly from her statistical genetics knowledge and experience. Then, in working on the AMBER Consortium projects, I benefited from her assistance in carrying out data processing, QC, and analysis tasks, and I was also fortunate to have her guidance regarding analytic strategies and methods.
I would also like to acknowledge the important role that my years at MGH played in providing me with a solid foundation in the field of genetic epidemiology. Dr. Jordan Smoller gave me the opportunity and immense challenge of managing a busy genetics laboratory, and it is thanks to him that I experienced a period of great personal and professional growth during those years.
It was in Dr. Susan Santangelo’s class at the Harvard School of Public Health that I first learned the core concepts of genetic epidemiology. Dr. Santangelo became my master’s thesis advisor and a trusted mentor. She also offered me my first job at MGH, and as my boss there for several years, she allowed me to gain experience with a number of different research tasks and genetic epidemiology projects. She encouraged me to pursue my Ph.D., and the successful completion of my dissertation is in no small part thanks to her positive influence.
My family and friends have always supported my life goals. I wish to especially thank my fiancée Alexandra Rostikova. I met her during the busiest part of my Ph.D. program when my time was very limited, but she remained patient with me then and ever since. She has always supported my career efforts, and I am lucky to have the caring company of her refreshingly unique perspective and joyful outlook.
Anything and everything I am able to accomplish is a direct reflection on my parents, Alan and Mary Haddad. One could not ask for a better, kinder tree to fall from. As teachers, they have always stressed the importance of education and hard work, and these lessons have helped me immensely. Their love and support is strong and constant. I am truly blessed.
GENE- AND PATHWAY-BASED GENOMICS OF BREAST CANCER
AND TYPE 2 DIABETES IN AFRICAN AMERICAN WOMEN
STEPHEN ANTHONY HADDAD
Boston University School of Public Health, 2017
Major Professor: Edward A. Ruiz-Narváez, Sc.D., Associate Professor of Epidemiology
ABSTRACT
Women of African ancestry (AA) experience a greater burden from breast cancer and type 2 diabetes compared to women of European ancestry. Some of the racial disparities observed for these diseases may be explained by AA-specific genetic risk variants. The projects conducted here sought to discover risk variants in AA women for overall and subtype-specific breast cancer and for type 2 diabetes using pathway- and gene-based analytic approaches.
Project 1 evaluated 170,812 mostly rare variants across the exome in 3629 breast cancer cases (1093 estrogen receptor negative (ER-), 1968 ER+, 568 ER unknown) and 4658 controls from the African American Breast Cancer Epidemiology and Risk (AMBER) Consortium. Gene-based analyses found ER- associations with PDE4D (previously identified in GWAS) and FBXL22 (novel), based on very small counts at extremely rare SNPs.
Project 2 evaluated common SNPs in 308 genes in hormone pathways using 3663 breast cancer cases (1098 ER-, 1983 ER+, 582 ER unknown) and 4687 controls from AMBER. Gene-based and single SNP analyses identified eight genes (CALM2, CETP, NR0B1, IGF2R, CYP1B1, PGR, MAPK3, and MAP3K1) that contained common variants associated with overall or subtype-specific breast cancer after gene-level correction for multiple testing.
Project 3 evaluated common SNPs in 69 genes involved in the Wnt pathway using 2632 type 2 diabetes cases and 2596 controls from the Black Women’s Health Study. Gene-based and single SNP analyses were run, and an association was observed between the PSMD2 gene region and type 2 diabetes. Association data on a subset of the top PSMD2 SNPs were available from a large, independent AA sample; associations were in the same direction, but weak and not statistically significant. We also identified a TCF7L2 SNP that may represent a novel, independent association signal seen only in AA populations.
Many of the SNPs identified in the present research are more common in AA populations, possibly explaining their lack of discovery by European ancestry genome-wide association studies. Replication of the associations we observed using independent AA samples is necessary. Future studies should consider the entire gene regions
TABLE OF CONTENTS
ACKNOWLEDGMENTS ... iv
ABSTRACT ... vii
TABLE OF CONTENTS ... ix
LIST OF TABLES ... xii
LIST OF FIGURES ... xiv
LIST OF ABBREVIATIONS ... xv
PROJECT 1: An Exome-Wide Analysis of Low Frequency and Rare Variants in Relation to Risk of Breast Cancer in African American Women: the AMBER Consortium ... 10
ABSTRACT ... 10
INTRODUCTION ... 11
METHODS ... 13
Study Population ... 13
Genotyping and QC ... 15
Association Analysis ... 17
RESULTS ... 19
DISCUSSION ... 22
REFERENCES ... 28
TABLES ... 33
SUPPLEMENTARY TABLES ... 38
SUPPLEMENTARY FIGURES ... 54
PROJECT 2: Hormone-Related Pathways and Risk of Breast Cancer Subtypes in African American Women: the AMBER Consortium ... 58
Study Population ... 60
SNP Selection for Genotyping ... 62
Genotyping and QC ... 63
Association Analysis ... 64
RESULTS ... 65
DISCUSSION ... 68
REFERENCES ... 73
TABLES ... 81
SUPPLEMENTARY TABLES ... 83
PROJECT 3: A Novel TCF7L2 Type 2 Diabetes SNP Identified from Fine Mapping in African American Women ... 99
Study Population ... 102
SNP Selection for Genotyping ... 103
Genotyping and QC ... 104
Association Analysis ... 105
RESULTS ... 106
DISCUSSION ... 109
REFERENCES ... 114
TABLES ... 118
SUPPLEMENTARY TABLES ... 123
CONCLUSION ... 124
BIBLIOGRAPHY ... 127
VITA ... 144
LIST OF TABLES
Table 1.1: Characteristics of participants in the AMBER Consortium by study site ...33
Table 1.2: Number of gene-based tests conducted and corresponding significance criteria ...34
Table 1.3: The most significant gene-based test results for each analysis ...35
Table 1.4: SNPs contributing to significant gene-based tests for ER- breast cancer ...37
Supplementary Table 1.S1: Exome chip SNP roles ...38
Supplementary Table 1.S2: The 50 most significant gene-based test results for each analysis ...39
Supplementary Table 1.S3: The 50 most significant SNPs for overall, ER+, and ER- breast cancer ...50
Table 2.1: Associations of genes from steroid hormone pathways with overall, ER+, and ER- breast cancer risk in the AMBER Consortium ...81
Table 2.2: Relation of SNPs selected from gene-based analyses to risk of overall, ER+, and ER- breast cancer in AMBER ...82
Supplementary Table 2.S1: Associations of all pathways and genes analyzed with overall, ER+, and ER- breast cancer risk in the AMBER Consortium ...83
Supplementary Table 2.S2: Relation of SNPs selected from gene-based analyses to risk of overall, ER+, and ER- breast cancer, by genotyping project (BWHS/WCHS/CBCS vs. MEC) ...95
Table 3.1: Associations of Wnt pathway genes with risk of type 2 diabetes in the BWHS ...118
Table 3.2: Genetic variants comprising the optimal models for PSMD2 and TCF7L2: associations with risk of type 2 diabetes ...120
Table 3.3: Genetic variants comprising the optimal models for PSMD2 and TCF7L2: analyses conditioning on the top SNPs ...121
Supplementary Table 3.S1: Meta-analysis of BWHS and MEDIA for the PSMD2 region ...123
LIST OF FIGURES
Supplementary Figure 1.S1: QQ plots for single SNP and gene-based analyses ... 54
Supplementary Figure 1.S2: Cluster plots for the four SNPs that contributed to the
LIST OF ABBREVIATIONS
1000G ... 1000 Genomes
AA ... African ancestry
AFR ... African samples
AIMs ... ancestry informative markers
AMBER ... African American Breast Cancer Epidemiology and Risk
ARTP ... adaptive rank truncated product
BMI ... body mass index
BPC3 ... National Cancer Institute Breast and Prostate Cancer Cohort Consortium
BWHS ... Black Women’s Health Study
CaM ... calmodulin
CBCS ... Carolina Breast Cancer Study
CHARGE ... Cohorts for Heart and Aging Research in Genomic Epidemiology
CI ... confidence interval
CIDR ... Center for Inherited Disease Research
CPE ... carboxypeptidase E
ctls ... controls
DCIS ... ductal carcinoma in situ
ER ... estrogen receptor
EUR ... European samples
FIN ... Finnish
GB ... gene-based
GC ... GenCall
GnRH ... gonadotropin-releasing hormone
GWAS ... genome-wide association studies
HER2 ... human epidermal growth factor receptor 2
ITU ... Indian Telugu in the UK
LD ... linkage disequilibrium
MAF ... minor allele frequency
MEC ... Multiethnic Cohort
MEDIA ... Meta-analysis of type 2 diabetes in African Americans
MSigDB ... Molecular Signatures Database
NA ... not applicable
NAFLD ... non-alcoholic fatty liver disease
ncRNA ... non-coding RNA
NHANES ... National Health and Nutrition Examination Surveys
NHLBI ESP ... National Heart, Lung, and Blood Institute Exome Sequencing Project
NIS ... Health Care Utilization Project-Nationwide Inpatient Sample
NJ ... New Jersey
NYC ... New York City
OR ... odds ratio
PCA ... principal components analysis
PR ... progesterone receptor
RDD ... random digit dialing
SEER ... Surveillance, Epidemiology, and End Results
SES ... socioeconomic status
SHBG ... sex-hormone-binding globulin
SKAT-O ... unified optimal sequence kernel association test
TCGA ... The Cancer Genome Atlas
typed ... genotyped
UCR ... upstream conserved region
UTR ... untranslated region
WCHS ... Women’s Circle of Health Study
WTCCC ... Wellcome Trust Case Control Consortium
YRI ... Yoruban population
Women of African ancestry (AA) experience a greater burden from breast cancer compared to women of European ancestry. Although incidence of breast cancer has historically been higher in white women, a recent report from the Surveillance, Epidemiology, and End Results (SEER) program of the National Cancer Institute showed an increase in breast cancer incidence among AA women from 2008 to 2012, during which time the incidence rate for white women remained relatively stable. This resulted in a convergence of the breast cancer incidence rates for AA and white women in 2012 (1). At the same time as the incidence rates are converging, AA women continue to experience higher mortality from breast cancer. Based on SEER data, there were 31.0 vs. 21.9 (age-adjusted) breast cancer deaths per 100,000 U.S. black vs. non-Hispanic white women per year from 2008 to 2012 (1).
The higher mortality from breast cancer experienced by AA women is likely due in part to their more aggressive disease profile. AA women have a younger mean age at diagnosis compared to women of European ancestry, with this difference being small but statistically significant (60.3 compared to 61.1 years, based on 2010 SEER data, and adjusted for the overall younger age distribution of U.S. blacks) (2). Likewise, a higher proportion of AA women are diagnosed before age 50 (3). Multiple studies and data sources have shown that AA women are more likely to be diagnosed with estrogen receptor (ER) negative (4–6), triple-negative (1,6,7), and basal-like breast cancers (8), which carry a poor prognosis. For example, in the 2012 SEER data, AA women had twice the proportion of triple-negative breast cancers compared to non-Hispanic whites (22% vs. 11%) (1). In addition, AA women are more likely to present with breast cancers that have progressed beyond Stage I and are classified as regional or metastatic (4,5,7,9). Furthermore, NIS data showed that AA women with a primary diagnosis of breast cancer are more likely to present with comorbidities than white women (adjusted odds ratio = 1.58, p < 0.001) (9).
While the higher breast cancer mortality experienced by AA women may in part be due to their aggressive disease profile, studies have reported higher risks of breast cancer death and lower survival rates even after accounting for tumor characteristics such as stage and hormone receptor status (1,3,5). Therefore, additional factors likely contribute to the racial disparities seen in breast cancer outcomes. Socioeconomic status (SES) and access to health care may play a role; however, studies controlling for factors related to SES and health care access have still shown a higher risk of breast cancer death in AA vs. white women (4,5). Thus, it is likely that there are true biological differences impacting these racial disparities.
In addition to the disparities observed for breast cancer, AA women also experience a greater burden from type 2 diabetes compared to women of European ancestry. While the prevalence of this disease has risen greatly over the past several decades across multiple racial groups, African Americans have experienced a disproportionate increase in the number of cases. Evidence of this disparity comes from the National Health and Nutrition Examination Surveys (NHANES), which showed a greater rise in the prevalence of type 2 diabetes from NHANES I (1971-1975) to NHANES 1999-2004 among African Americans (206.7% increase) vs. whites (169.0% increase) (10). In line with these observed trends in prevalence, the incidence of type 2 diabetes in African American women has been reported to be more than twice that in white women (11). Furthermore, African Americans with diabetes have shown poorer glycemic control (12) and an increased risk of diabetic complications and mortality (13) compared to whites.
Racial disparities in factors predisposing to type 2 diabetes have also been detailed. The prevalence of obesity is higher in AA vs. white women, and AA women have lower daily energy expenditure including less physical activity and comparatively smaller volumes of metabolically active organs (14). In addition, nondiabetic African American women have higher fasting insulin levels compared to whites, and this has been shown for both obese and nonobese women (15). Greater insulin resistance has also been observed in nondiabetic African Americans vs. whites (16). Despite these racial differences in factors predisposing to diabetes, it has been reported that more than half of the excess risk of type 2 diabetes in AA women remains after adjustment for known risk factors including body mass index (BMI) (11). Thus, additional elements must be at play in producing the racial disparities seen.
Genetic differences between AA and white women may explain some of the racial disparities in breast cancer mortality and type 2 diabetes incidence. Risk alleles that are more common in AA populations may drive the occurrence of more aggressive breast tumors and may also lead to higher breast cancer mortality through biological mechanisms that are independent of commonly assessed tumor characteristics. Likewise, type 2 diabetes susceptibility alleles that are more common in AA populations may lead to higher incidence through known predisposing factors such as BMI and insulin resistance, or through other biological mechanisms that are independent of commonly measured diabetes risk factors.
Most previously reported genetic risk loci for breast cancer and type 2 diabetes have been discovered in genome-wide association studies (GWAS) of European or Asian ancestry populations (17–23), and the majority of these associations have failed to reach statistical significance in studies of AA subjects (24–40). This further suggests the presence of risk variants specific to AA individuals and emphasizes the need for more genetic research in this understudied but disproportionately affected population. Another reason for conducting genetic association studies in AA populations is their greater genetic variability compared to European or Asian ancestry populations. Lower levels of linkage disequilibrium (LD) exist in AA populations (i.e., genetic markers are less correlated with each other on average, and these correlations span shorter stretches of the genome); therefore, genetic variants associated with disease in AA studies are more likely to be located closer to the underlying causal variants (41,42).
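To make the LD concept above concrete, the following is a minimal Python sketch that is not taken from the dissertation. It estimates pairwise LD as the squared Pearson correlation (r^2) between the allele dosages (0, 1, or 2 copies of a given allele) of two SNPs genotyped on the same subjects. The function name ld_r2 and the toy genotype vectors are hypothetical, and reporting r^2 = 0 for a monomorphic SNP is an assumption made only for illustration.

```python
def ld_r2(dosages_a, dosages_b):
    """Squared Pearson correlation between two SNPs' allele dosages (0, 1, 2)."""
    if len(dosages_a) != len(dosages_b):
        raise ValueError("both SNPs must be genotyped on the same subjects")
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    # Sums of cross-products and squares about the means (the 1/n factors cancel in r^2).
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        # Assumption: a monomorphic SNP has undefined correlation; report 0 here.
        return 0.0
    return cov * cov / (var_a * var_b)


# Hypothetical toy data: dosages for eight subjects at three SNPs.
snp_1 = [0, 1, 2, 1, 0, 2, 1, 0]
snp_2 = [0, 1, 2, 1, 0, 2, 1, 0]  # identical to snp_1: complete LD
snp_3 = [0, 0, 1, 1, 0, 0, 1, 1]  # nearly uncorrelated with snp_1: low LD

r2_high = ld_r2(snp_1, snp_2)
r2_low = ld_r2(snp_1, snp_3)
print(f"r^2 (snp_1, snp_2) = {r2_high:.3f}")
print(f"r^2 (snp_1, snp_3) = {r2_low:.3f}")
```

In a population with lower average LD, fewer SNP pairs would show high r^2 values, and high values would be confined to shorter genomic distances, which is why an associated marker is expected to lie closer to the causal variant.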
Given the disease disparities and research rationale described above for AA populations, the projects conducted here sought to discover risk variants for overall and subtype-specific breast cancer (Projects 1 and 2) and for type 2 diabetes (Project 3) in women of African ancestry from the African American Breast Cancer Epidemiology and Risk (AMBER) Consortium Study (Projects 1 and 2) and the Black Women’s Health Study (BWHS) (Project 3). Association analyses were carried out using genotyped and imputed data generated from large-scale arrays, which included selected custom content in biological pathways of interest. Pathway- and gene-based analytic approaches were utilized in an attempt to identify important pathways and genes with multiple risk variants that might otherwise be missed in a SNP-based approach. The projects were bolstered by use of the largest AA sample to date for studying breast cancer genetics (~3600 cases and 4700 controls) and by a respectable AA sample size for type 2 diabetes (~2600 cases and 2600 controls).
1. DeSantis CE, Fedewa SA, Goding Sauer A, Kramer JL, Smith RA, Jemal A. Breast cancer statistics, 2015: Convergence of incidence rates between black and white women: Breast Cancer Statistics, 2015. CA: A Cancer Journal for Clinicians. 2016;66:31–42.
2. Robbins HA, Engels EA, Pfeiffer RM, Shiels MS. Age at cancer diagnosis for blacks compared with whites in the United States. Journal of the National Cancer Institute. 2015;107: dju489.
3. Daly B, Olopade OI. A perfect storm: How tumor biology, genomics, and health care delivery patterns collide to create a racial survival disparity in breast cancer and proposed interventions for change: Closing the Racial Disparity Gap in Breast Cancer. CA: A Cancer Journal for Clinicians. 2015;65:221–38.
4. Iqbal J, Ginsburg O, Rochon PA, Sun P, Narod SA. Differences in Breast Cancer Stage at Diagnosis and Cancer-Specific Survival by Race and Ethnicity in the United States. JAMA: the Journal of the American Medical Association. 2015;313:165.
5. Adams SA, Butler WM, Fulton J, Heiney SP, Williams EM, Delage AF, et al. Racial disparities in breast cancer mortality in a multiethnic cohort in the Southeast: Disparities in Breast Cancer Mortality. Cancer. 2012;118:2693–9.
6. Stark A, Kleer CG, Martin I, Awuah B, Nsiah-Asare A, Takyi V, et al. African ancestry and higher prevalence of triple-negative breast cancer: Findings from an international study. Cancer. 2010;116:4926–32.
7. Kurian AW, Fish K, Shema SJ, Clarke CA. Lifetime risks of specific breast cancer subtypes among women in four racial/ethnic groups. Breast Cancer Research. 2010;12:R99.
8. Carey LA, Perou CM, Livasy CA, Dressler LG, Cowan D, Conway K, et al. Race, breast cancer subtypes, and survival in the Carolina Breast Cancer Study. JAMA: the Journal of the American Medical Association. 2006;295:2492–502.
9. Dehal A, Abbas A, Johna S. Racial disparities in clinical presentation, surgical treatment and in-hospital outcomes of women with breast cancer: analysis of nationwide inpatient sample database. Breast Cancer Research and Treatment. 2013;139:561–9.
10. Zhang Q, Wang Y, Huang ES. Changes in racial/ethnic disparities in the prevalence of Type 2 diabetes by obesity level among US adults. Ethnicity & Health. 2009;14:439–57.
11. Brancati FL, Kao WH, Folsom AR, Watson RL, Szklo M. Incident type 2 diabetes mellitus in African American and white adults: the Atherosclerosis Risk in Communities Study. JAMA: the Journal of the American Medical Association. 2000;283:2253–9.
12. Kirk JK, D’Agostino RB, Bell RA, Passmore LV, Bonds DE, Karter AJ, et al. Disparities in HbA1c Levels Between African-American and Non-Hispanic White Adults With Diabetes: A meta-analysis. Diabetes Care. 2006;29:2130–6.
13. Lanting LC, Joung IM, Mackenbach JP, Lamberts SW, Bootsma AH. Ethnic differences in mortality, End-stage complications, and quality of care among diabetic patients a review. Diabetes Care. 2005;28:2280–8.
14. Staiano AE, Harrington DM, Johannsen NM, Newton Jr RL, Sarzynski MA, Swift DL, et al. Uncovering physiological mechanisms for health disparities in type 2 diabetes. Ethnicity & Disease. 2014;25:31–7.
15. Carnethon MR, Palaniappan LP, Burchfiel CM, Brancati FL, Fortmann SP. Serum Insulin, Obesity, and the Incidence of Type 2 Diabetes in Black and White Adults The Atherosclerosis Risk in Communities Study: 1987–1998. Diabetes Care. 2002;25:1358–64.
16. Haffner SM, Saad MF, Rewers M, Mykkänen L, Selby J, Howard G, et al. Increased insulin resistance and insulin secretion in nondiabetic African-Americans and Hispanics compared with non-Hispanic whites: the Insulin Resistance Atherosclerosis Study. Diabetes.
17. Qi Q, Hu FB. Genetics of type 2 diabetes in European populations: T2D genetics in Europeans. Journal of Diabetes. 2012;4:203–12.
18. Morris AP, Voight BF, Teslovich TM, Ferreira T, Segrè AV, Steinthorsdottir V, et al. Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nature Genetics. 2012;44:981–90.
19. Hara K, Fujita H, Johnson TA, Yamauchi T, Yasuda K, Horikoshi M, et al. Genome-wide association study identifies three novel loci for type 2 diabetes. Human Molecular Genetics. 2014;23:239–46.
20. Mahajan A, Go MJ, Zhang W, Below JE, Gaulton KJ, Ferreira T, et al. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nature Genetics. 2014;46:234–44.
21. Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nature Genetics. 2013;45:353–61.
22. Michailidou K, Beesley J, Lindstrom S, Canisius S, Dennis J, Lush MJ, et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nature Genetics. 2015;47:373–80.
23. Couch FJ, Kuchenbaecker KB, Michailidou K, Mendoza-Fandino GA, Nord S, Lilyquist J, et al. Identification of four novel susceptibility loci for oestrogen receptor negative breast cancer. Nature Communications. 2016;7:11375.
24. Zheng W, Cai Q, Signorello LB, Long J, Hargreaves MK, Deming SL, et al. Evaluation of 11 Breast Cancer Susceptibility Loci in African-American Women. Cancer Epidemiology Biomarkers & Prevention. 2009;18:2761–4.
25. Ruiz-Narvaez EA, Rosenberg L, Cozier YC, Cupples LA, Adams-Campbell LL, Palmer JR. Polymorphisms in the TOX3/LOC643714 Locus and Risk of Breast Cancer in African-American Women. Cancer Epidemiology Biomarkers & Prevention. 2010;19:1320–7.
26. Barnholtz-Sloan JS, Shetty PB, Guan X, Nyante SJ, Luo J, Brennan DJ, et al. FGFR2 and other loci identified in genome-wide association studies are associated with breast cancer in African-American and younger women. Carcinogenesis. 2010;31:1417–23.
27. Hutter CM, Young AM, Ochs-Balcom HM, Carty CL, Wang T, Chen CTL, et al. Replication of Breast Cancer GWAS Susceptibility Loci in the Women’s Health Initiative African American SHARe Study. Cancer Epidemiology Biomarkers & Prevention. 2011;20:1950–9.
28. Chen F, Chen GK, Millikan RC, John EM, Ambrosone CB, Bernstein L, et al. Fine-mapping of breast cancer susceptibility loci characterizes genetic risk in African Americans. Human Molecular Genetics. 2011;20:4491–503.
29. Zheng Y, Ogundiran TO, Adebamowo C, Nathanson KL, Domchek SM, Rebbeck TR, et al. Lack of association between common single nucleotide polymorphisms in the TERT-CLPTM1L locus and breast cancer in women of African ancestry. Breast Cancer Research and Treatment. 2012;132:341–5.
30. Huo D, Zheng Y, Ogundiran TO, Adebamowo C, Nathanson KL, Domchek SM, et al. Evaluation of 19 susceptibility loci of breast cancer in women of African ancestry. Carcinogenesis. 2012;33:835–40.
31. Palmer JR, Ruiz-Narvaez EA, Rotimi CN, Cupples LA, Cozier YC, Adams-Campbell LL, et al. Genetic Susceptibility Loci for Subtypes of Breast Cancer in an African American Population. Cancer Epidemiology Biomarkers & Prevention. 2013;22:127–34.
32. Zheng Y, Ogundiran TO, Falusi AG, Nathanson KL, John EM, Hennis AJM, et al. Fine mapping of breast cancer genome-wide association studies loci in women of African ancestry identifies novel susceptibility markers. Carcinogenesis. 2013;34:1520–8.
33. Long J, Zhang B, Signorello LB, Cai Q, Deming-Halverson S, Shrubsole MJ, et al. Evaluating Genome-Wide Association Study-Identified Breast Cancer Risk Variants in African-American Women. Peterlongo P, editor. PLoS ONE. 2013;8:e58350.
34. O’Brien KM, Cole SR, Poole C, Bensen JT, Herring AH, Engel LS, et al. Replication of Breast Cancer Susceptibility Loci in Whites and African Americans Using a Bayesian Approach. American Journal of Epidemiology. 2014;179:382–94.
35. Palmer ND, McDonough CW, Hicks PJ, Roh BH, Wing MR, An SS, et al. A Genome-Wide Association Search for Type 2 Diabetes Genes in African Americans. Kronenberg F, editor. PLoS ONE. 2012;7:e29202.
36. Waters KM, Stram DO, Hassanein MT, Le Marchand L, Wilkens LR, Maskarinec G, et al. Consistent Association of Type 2 Diabetes Risk Variants Found in Europeans in Diverse Racial and Ethnic Groups. McCarthy MI, editor. PLoS Genetics. 2010;6:e1001078.
37. Saxena R, Elbers CC, Guo Y, Peter I, Gaunt TR, Mega JL, et al. Large-scale gene-centric meta-analysis across 39 studies identifies type 2 diabetes loci. The American Journal of Human Genetics. 2012;90:410–25.
38. Cooke JN, Ng MCY, Palmer ND, An SS, Hester JM, Freedman BI, et al. Genetic Risk Assessment of Type 2 Diabetes-Associated Polymorphisms in African Americans. Diabetes Care. 2012;35:287–92.
39. Haiman CA, Fesinmeyer MD, Spencer KL, Buzkova P, Voruganti VS, Wan P, et al. Consistent Directions of Effect for Established Type 2 Diabetes Risk Variants Across Populations: The Population Architecture using Genomics and Epidemiology (PAGE) Consortium. Diabetes. 2012;61:1642–7.
40. Ng MCY, Shriner D, Chen BH, Li J, Chen W-M, Guo X, et al. Meta-analysis of genome-wide association studies in african americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genetics. 2014;10:e1004517.
41. McCormack S, Grant SFA. Genetics of Obesity and Type 2 Diabetes in African Americans. Journal of Obesity. 2013;2013:1–12.
42. Ruiz-Narvaez EA, Rosenberg L, Yao S, Rotimi CN, Cupples AL, Bandera EV, et al. Fine-mapping of the 6q25 locus identifies a novel SNP associated with breast cancer risk in African-American women. Carcinogenesis. 2013;34:287–91.
An Exome-Wide Analysis of Low Frequency and Rare Variants in Relation to Risk of Breast Cancer in African American Women: the AMBER Consortium (Carcinogenesis 2016; 37(9): 870-877, with permission of Oxford University Press)
A large percentage of breast cancer heritability remains unaccounted for, and most of the known susceptibility loci have been established in European and Asian populations. Rare variants may contribute to the unexplained heritability of this disease, including in women of African ancestry (AA). We conducted an exome-wide analysis of rare variants in relation to risk of overall and subtype-specific breast cancer in the African American Breast Cancer Epidemiology and Risk (AMBER) Consortium, which includes data from four large studies of AA women. Genotyping on the Illumina Human Exome Beadchip yielded data for 170,812 SNPs and 8287 subjects: 3629 cases (1093 estrogen receptor negative (ER-), 1968 ER+, 568 ER unknown) and 4658 controls, the largest exome chip study to date for AA breast cancer. Pooled gene-based association analyses were performed using the unified optimal sequence kernel association test (SKAT-O) for variants with minor allele frequency (MAF) ≤ 5%. In addition, each variant with MAF >0.5% was tested for association using logistic regression. There were no significant associations with overall breast cancer. However, a novel gene, FBXL22 (gene-based p = 8.2 × 10^-6), and a gene previously identified in GWAS of European ancestry populations, PDE4D (gene-based p = 1.2 × 10^-6), were significantly associated with ER- breast cancer after correction for multiple testing. Cases with the associated rare variants were also negative for progesterone and human epidermal growth factor receptors – thus, triple-negative cancer. Replication is required to confirm these gene-level associations, which are based on very small counts at extremely rare SNPs.
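The two MAF cut-offs described above (MAF ≤ 5% for the gene-based tests, MAF > 0.5% for the single-variant tests) can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the study's actual pipeline: the function names, the toy genotype data, and the exclusion of monomorphic variants are hypothetical, and the Bonferroni correction with 20,000 tests is used only as a generic example of a multiple-testing threshold (the number of tests actually performed is not given in this section).

```python
def minor_allele_frequency(dosages):
    """MAF from per-subject allele counts (0, 1, 2); missing calls are None."""
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def partition_variants(genotypes, gene_max_maf=0.05, single_min_maf=0.005):
    """Assign each variant to the gene-based and/or single-variant analysis set."""
    gene_based, single_variant = [], []
    for variant_id, dosages in genotypes.items():
        maf = minor_allele_frequency(dosages)
        if maf == 0:
            continue  # Assumption: monomorphic variants carry no information and are dropped.
        if maf <= gene_max_maf:
            gene_based.append(variant_id)      # enters the pooled gene-based test
        if maf > single_min_maf:
            single_variant.append(variant_id)  # enters the single-variant regression
    return gene_based, single_variant


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical toy data: 200 subjects, three variants with different frequencies.
genotypes = {
    "rare_variant": [1] * 1 + [0] * 199,      # MAF = 1/400  = 0.0025
    "low_freq_variant": [1] * 8 + [0] * 192,  # MAF = 8/400  = 0.02
    "common_variant": [1] * 60 + [0] * 140,   # MAF = 60/400 = 0.15
}

gene_based, single_variant = partition_variants(genotypes)
threshold = bonferroni_threshold(20000)  # hypothetical number of gene-based tests
print("gene-based set:", gene_based)
print("single-variant set:", single_variant)
print("Bonferroni threshold:", threshold)
```

Note that a variant with MAF between 0.5% and 5% falls into both sets, so the two analyses overlap for low frequency variants.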
Genome-wide association studies (GWAS) have identified more than 90 genetic loci associated with breast cancer (1,2). Per-allele odds ratios have been modest (most <1.2), as is typical for GWAS findings. These low-penetrance loci, together with previously discovered high- and moderate-penetrance genes, fail to explain the majority of the genetic contribution to the disease (1–4). Most GWAS-based associations have been established in European or Asian populations, and the majority of these associations have failed to reach statistical significance in studies of African ancestry (AA) women (5–15). While larger AA sample sizes and accounting for differences in linkage disequilibrium (LD) across ethnicities would likely result in more successful replications, the European-discovered risk variants may also explain less of the genetic contribution to breast cancer in AA women.
While some of the unexplained breast cancer heritability in AA women may be due to unidentified common susceptibility SNPs in this population, another portion may be explained by less common (1-5%) and rare variants (< 1%). These lower frequency variants represent a large proportion of all human genetic variation but are poorly captured by most GWAS arrays (16). There are a growing number of examples of rare variants associated with complex disease, with findings for autism, schizophrenia, inflammatory bowel disease, and diabetes (17). In addition to the already established high- and moderate-penetrance genes for breast cancer (3,4,18), novel low frequency risk variants for cancers including prostate (19) and ovarian (20,21) have also been reported. Still, it remains unclear how much rare variants contribute to the heritability of breast cancer and other complex diseases.
In recent years the development of exome-wide arrays has allowed for the relatively inexpensive assessment of known rare exonic variants. Current exome arrays include >200,000 coding variants and were developed on the basis of whole exome sequencing data from ~12,000 individuals. Most of those sequenced were of European ancestry, but a small number of AAs and other ethnicities were included as well (16,22).
A case-control study nested in the Multiethnic Cohort (MEC) used the Illumina exome chip to investigate the role of rare exonic variation in the etiology of breast cancer (16). Single SNP analyses were conducted, as well as gene-based testing of the burden of rare alleles. Only one significant association was found, for splice-site SNP rs145889899 in the LDLRAD1 gene. This variant was only seen in AAs (with a frequency of 0.65% in AA controls) and had an odds ratio of 3.74. While no additional findings were significant, there was low power to detect genotype relative risks ≤ 2 in the AA participants due to the modest number of available AA cases (N = 591).
The present study combined the MEC exome chip data with exome chip data from three additional studies of breast cancer in AA women, forming the African American Breast Cancer Epidemiology and Risk (AMBER) Consortium, the largest exome wide analysis sample to date for AA breast cancer (3629 cases and 4658 controls).
We primarily used gene-based methods for association analysis, given the relatively low power and high multiple testing burden for single SNP analyses of rare variants (22,23). Gene-based testing has the potential to increase power when multiple SNPs in a given gene are associated (22).
METHODS
Study Population
This investigation was conducted using data from the AMBER Consortium, a collaboration of four of the largest studies of breast cancer in AA women. The AMBER Consortium has been described previously (24), and prior reports have detailed the individual studies: the Carolina Breast Cancer Study (CBCS) (25), the Women’s Circle of Health Study (WCHS) (26,27), the Black Women’s Health Study (BWHS) (28), and the Multiethnic Cohort (MEC) (29). Institutional Review Board approval was obtained for each study, and all participants provided written informed consent.
Briefly, the CBCS is a North Carolina population-based case-control study of women aged 20 to 74 years that began in 1993. The North Carolina Central Cancer Registry’s rapid case ascertainment system was used for case identification, and controls were selected through 2001 using Division of Motor Vehicles lists (age <65 years) and Health Care Financing Administration lists (age ≥ 65). Interviewers collected questionnaire data and samples for DNA analysis in home visits.
The WCHS is a multi-site case-control study in New York City (NYC) (2002-2008) and New Jersey (NJ) (2006-present). Hospital-based ascertainment of cases aged 20 to 75 years was used in NYC, and controls were selected through random digit dialing (RDD). Cases in NJ are identified by the NJ State Cancer Registry using rapid case ascertainment, and controls are identified through RDD and community-based efforts (27). Risk factor data and samples for DNA analysis are obtained during in-person interviews.
The BWHS is a prospective cohort study of 59,000 AA women from across the United States who enrolled by completing a postal health questionnaire in 1995. The age range at baseline was 21–69 years. Biennial follow-up questionnaires identify new cases of breast cancer, and these cases are confirmed by medical records or from state cancer registry data and the National Death Index. Nearly 27,000 BWHS participants provided saliva samples for DNA analysis.
The MEC is a prospective cohort study that began in 1993 with the enrollment of men and women aged 45–75 years from a range of ethnic groups in Hawaii and California. Data are collected by mailed questionnaire at 5-year intervals, and breast cancer cases are confirmed through the Hawaii and California state cancer registries and the National Death Index. Blood samples were collected from study participants for DNA analysis.
Eligible cases for the present analyses were AA women with incident invasive breast cancer or ductal carcinoma in situ (DCIS). For BWHS and MEC, controls were chosen from among women without breast cancer, and were frequency matched to cases on geographical region, sex, race, and 5-year age group. Estrogen receptor (ER) status for cases was determined using pathology data from hospital records or cancer registry records.
Genotyping and QC
Genotyping of DNA from participants in the BWHS, CBCS, and WCHS was performed by the Center for Inherited Disease Research (CIDR) using the Illumina Human Exome Beadchip v1.1. This array includes >200,000 coding variants, as well as tag SNPs for GWAS hits, a grid of common variants, and ancestry informative markers (AIMs). A description of the exome chip design is available from http://genome.sph.umich.edu/wiki/Exome_Chip_Design. CIDR used the GenTrain Version 1.0 calling algorithm in GenomeStudio version 2011.1, Genotyping Module 1.9.4. Manual review was conducted for all Y, XY pseudoautosomal, and mitochondrial SNPs. Autosomal and X chromosome SNPs were also manually reviewed if a rare heterozygous cluster may have been missed by the GenCall algorithm and if the zCall algorithm (30) identified four or more possible new heterozygous points.
A total of 246,519 SNPs were genotyped, and 231,705 SNPs remained after excluding variants that failed technical filters imposed by CIDR or QC filters recommended by the University of Washington. Briefly, genotypes with a GenCall (GC) score <0.15 were classified as missing, and SNPs were removed if they had poor cluster properties (e.g., cluster separation <0.2 or <0.3 depending on allele frequency), call rates <0.98, Hardy-Weinberg Equilibrium p < 1 × 10⁻⁴, >1 Mendelian error in trios from HapMap (31), or >2 discordant calls in duplicate samples. Mitochondrial and Y chromosome SNPs were also removed. Genotypes were attempted for 6936 participants from the BWHS, CBCS, and WCHS, and were completed with call rate >98% for 6828 participants, which included 3130 cases (963 estrogen receptor negative (ER-), 1674 ER+, 493 ER unknown) and 3698 controls.
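The SNP-level exclusion rules listed above can be sketched as a simple filter. This is only an illustration under stated assumptions: the record type `SnpQc` and its field names are hypothetical and do not come from the CIDR or University of Washington pipelines, and the cluster-separation rule is omitted because its frequency-dependent cutoffs are not fully specified in the text.

```python
from dataclasses import dataclass


@dataclass
class SnpQc:
    """Per-SNP QC summary (hypothetical field names, for illustration only)."""
    call_rate: float        # fraction of samples with a non-missing genotype
    hwe_p: float            # Hardy-Weinberg equilibrium p-value
    mendelian_errors: int   # Mendelian errors observed in HapMap trios
    discordant_calls: int   # discordant calls among duplicate samples
    chromosome: str         # e.g. "1", "X", "Y", "MT"


def passes_qc(snp: SnpQc) -> bool:
    """Apply the thresholds quoted in the text; True means the SNP is kept."""
    if snp.chromosome in {"Y", "MT"}:   # Y and mitochondrial SNPs removed
        return False
    if snp.call_rate < 0.98:            # call rate < 0.98
        return False
    if snp.hwe_p < 1e-4:                # HWE p < 1 x 10^-4
        return False
    if snp.mendelian_errors > 1:        # > 1 Mendelian error
        return False
    if snp.discordant_calls > 2:        # > 2 discordant duplicate calls
        return False
    return True


# Toy records, not study data.
good = SnpQc(call_rate=0.995, hwe_p=0.40, mendelian_errors=0,
             discordant_calls=0, chromosome="5")
low_call = SnpQc(call_rate=0.97, hwe_p=0.40, mendelian_errors=0,
                 discordant_calls=0, chromosome="5")
print(passes_qc(good), passes_qc(low_call))
```

Writing each rule as its own early return keeps the thresholds easy to audit against the text.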
Genetic data from 499 cases (130 ER-, 294 ER+, 75 ER unknown) and 960 controls in the MEC were available from genotyping on a previous version of the exome chip (16), which contained >99% of the high quality variants from v1.1. Genotypes from MEC were combined with the data from the other AMBER studies into a data set containing 245,571 SNPs. More than 66,000 SNPs were monomorphic in the combined set and were omitted from analyses, as were SNPs with high quality data from only one of the two exome chips and SNPs with any discordant genotypes across the two chips for 30 MEC participants who were included on both. The final data set for analysis included 170,812 SNPs and 8287 participants: 3629 cases (1093 ER-, 1968 ER+, 568 ER unknown) and 4658 controls.
We used the CHARGE (Cohorts for Heart and Aging Research in Genomic Epidemiology) Consortium’s annotation of exome chip variants (version 6, 11/7/14) downloaded from http://www.chargeconsortium.com/main/exomechip (32). This annotation was performed with dbNSFP version 2.6 (33,34).
We used the smartpca program in the EIGENSOFT package (35) to conduct a principal components analysis (PCA) based on ~42,000 common SNPs, most of which were custom content additions to the exome chip for use in other AMBER projects. In a separate analysis, PLINK version 1.07 (36) was used to estimate identity by descent in participant pairs, and identified 130 sets of relatives across and within the individual studies, consisting of 270 individuals. These 270 individuals were flagged, as were 35 outlying individuals from the PCA, so that sensitivity analyses could be performed. Genotype principal components were tested for association with case status after controlling for the study covariates: study, DNA source (blood, saliva [Oragene], saliva [mouthwash]), and the matching variables. While no principal components were strongly associated in the multivariable model, we included terms for principal components with p <0.1 in our analyses.
Gene-based association analyses for overall, ER+, and ER- breast cancer were conducted using the unified optimal sequence kernel association test (SKAT-O) (37), as implemented in the R package seqMeta (38). As a linear combination of the burden and SKAT (39) tests, SKAT-O achieves robust power whether a given gene has a high proportion of causal variants exerting effects in the same direction, or instead has many noncausal variants or variants exerting effects in opposite directions (22). We used the default SKAT-O option in the seqMeta package that considers rho = 1 (burden) and rho = 0 (SKAT) tests and selects the optimal of the two tests. Depending on which test is chosen, SKAT-O models the phenotype vs. a weighted aggregation of either the variants (burden test) or the variant score test statistics (SKAT) to produce a gene-level p-value that indicates the degree of enrichment of rare variant associations in that gene (37). We included variants with minor allele frequency (MAF) ≤ 5%, and used the beta distribution weights proposed by Wu et al. (39), which upweight rarer variants, for both tests. We used a Bonferroni correction based on the number of genes evaluated to assess the significance of the gene-based test results.
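To make the relationship between the burden and SKAT statistics concrete, the following minimal sketch computes the combined statistic Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden for a toy gene. It is not the seqMeta implementation. Two assumptions are made here that the text does not state: the beta weights use parameters (1, 25), which are the commonly used defaults for the Wu et al. weights, and the null model is intercept-only, whereas the actual analyses adjusted for covariates. The sketch stops at the test statistic and does not compute a p-value, which requires the null distribution of Q_rho.

```python
import math


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at the MAF; larger for rarer variants.
    The (1, 25) parameters are an assumption, not taken from the text."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * maf ** (a - 1) * (1 - maf) ** (b - 1)


def score_statistics(genotypes, phenotype):
    """Per-variant score S_j = sum_i g_ij * (y_i - mean(y)).
    Intercept-only null model (a simplification for illustration)."""
    n = len(phenotype)
    mu = sum(phenotype) / n
    n_variants = len(genotypes[0])
    return [sum(genotypes[i][j] * (phenotype[i] - mu) for i in range(n))
            for j in range(n_variants)]


def q_rho(genotypes, phenotype, rho):
    """rho = 1 gives the burden statistic, rho = 0 the SKAT statistic."""
    n = len(phenotype)
    n_variants = len(genotypes[0])
    mafs = [sum(row[j] for row in genotypes) / (2 * n)
            for j in range(n_variants)]
    weights = [beta_weight(f) for f in mafs]
    scores = score_statistics(genotypes, phenotype)
    weighted = [w * s for w, s in zip(weights, scores)]
    q_burden = sum(weighted) ** 2            # aggregate first, then square
    q_skat = sum(x * x for x in weighted)    # square first, then aggregate
    return (1 - rho) * q_skat + rho * q_burden


# Toy data (not study data): 8 subjects, 2 rare variants, minor allele counts.
genotypes = [[1, 0], [0, 1], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
phenotype = [1, 1, 1, 0, 0, 0, 0, 0]   # 1 = case, 0 = control
print(q_rho(genotypes, phenotype, rho=1.0))  # burden
print(q_rho(genotypes, phenotype, rho=0.0))  # SKAT
```

When all weighted scores share a sign, as in this toy example, the burden statistic is at least as large as the SKAT statistic. This mirrors the text's point that burden tests do well when causal variants act in the same direction.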
We performed separate gene-based analyses for three sets of exonic variants: 1) “NS_strict” variants (based on Purcell et al. (40)): stopgain, stoploss, frameshift, or predicted damaging by all five of the following algorithms: SIFT (41), mutationTaster category [A or D] (42), LRT (43), PolyPhen_HDIV (44), and PolyPhen_HVAR (44), 2) “NS_broad” variants (Purcell et al. (40)): “NS_strict” variants plus those variants that are predicted damaging by at least one of the five algorithms, and 3) All nonsynonymous variants (“NS_all”): “NS_broad” variants plus all other missense and splice variants. Testing of these three sets of variants gave us more flexibility to find the best set of SNPs for gene-based analysis (ideally a set including most or all truly associated SNPs, but few, if any, unassociated SNPs).
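The nesting of the three variant sets (NS_strict within NS_broad within NS_all) can be sketched as a small classifier. The role labels and the dictionary of per-algorithm calls below are hypothetical stand-ins for the annotation fields, chosen only for illustration. The real annotation (for example, the mutationTaster A/D categories) would need to be mapped to a damaging/not-damaging flag first.

```python
# Hypothetical role labels; the actual annotation strings may differ.
LOSS_OF_FUNCTION = {"stopgain", "stoploss", "frameshift"}
NONSYNONYMOUS = LOSS_OF_FUNCTION | {"missense", "splicing"}
ALGORITHMS = ("SIFT", "MutationTaster", "LRT", "PolyPhen_HDIV", "PolyPhen_HVAR")


def variant_sets(role, damaging):
    """Return the variant sets a SNP belongs to.

    role     -- functional role of the variant (hypothetical label)
    damaging -- dict mapping algorithm name to True if predicted damaging
    """
    n_damaging = sum(bool(damaging.get(name, False)) for name in ALGORITHMS)
    sets = []
    # NS_strict: stopgain, stoploss, frameshift, or damaging by all five.
    if role in LOSS_OF_FUNCTION or n_damaging == len(ALGORITHMS):
        sets.append("NS_strict")
    # NS_broad: NS_strict plus variants damaging by at least one algorithm.
    if sets or n_damaging >= 1:
        sets.append("NS_broad")
    # NS_all: NS_broad plus all other missense and splice variants.
    if sets or role in NONSYNONYMOUS:
        sets.append("NS_all")
    return sets


all_five = {name: True for name in ALGORITHMS}
print(variant_sets("missense", all_five))        # in all three sets
print(variant_sets("missense", {"SIFT": True}))  # NS_broad and NS_all
print(variant_sets("missense", {}))              # NS_all only
```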
Single SNP association analyses were conducted using logistic regression as implemented in PLINK version 1.07. These analyses were restricted to variants with MAF >0.5% in order to avoid performing a large number of underpowered tests. We used a Bonferroni adjustment for the effective number of independent tests, applying the method of Gao et al. (45), to assess the significance of the single SNP results.
Both gene-based and single SNP analyses were adjusted for study, age, geographic region, DNA source, and genotype principal components 5, 6, and 8 in a pooled analysis that combined individual level data across the four studies in AMBER. This approach was preferred over meta-analysis given prior evidence that pooled analysis is more powerful for gene-based testing of rare variants under conditions where pooling is appropriate (46).
The present analyses included 3629 breast cancer cases (1093 ER-, 1968 ER+, 568 ER unknown) and 4658 controls. Table 1.1 shows the distribution of ER subtypes and age at diagnosis for the cases by study site.
There were 184,100 annotation records for the 170,812 SNPs that passed QC filters: some SNPs mapped to more than one gene, and these multiple mappings were maintained for the gene-based analyses we performed. More than 80% of the SNP records were annotated as nonsynonymous, including missense, stopgain, stoploss, frameshift, and splicing variants (see Supplementary Table 1.S1 for the full distribution of roles for the final SNP set). Over 80% of the SNPs had MAF <5% in AMBER, over 70% had MAF <1%, and nearly half of the SNPs had MAF <0.1%. QQ plots for the gene-based and single SNP association analyses we performed are shown in Supplementary Figure 1.S1. As is common for SKAT analyses of binary traits, there was inflation in the gene-based test results (47,48).
The number of gene-based tests conducted and the resulting alpha levels for significance are listed in Table 1.2 by outcome and SNP group. As the SNP functional group became more strict, fewer tests were conducted because fewer genes contained at least two SNPs in the given group. Fewer gene-based tests were conducted for the ER+ and ER- analyses compared to overall breast cancer because these subtype analyses had smaller sample sizes, which resulted in more monomorphic SNPs that were excluded. Table 1.3 shows the five most significant genes for each SKAT-O run (see Supplementary Table 1.S2 for the top 50 genes for each set of variants). For overall and ER+ breast cancer, RTN4RL1 was the most significant gene for both the “NS_all” and “NS_broad” SNP sets, with nominal p-values ranging from 1.8 × 10⁻⁵ to 1.9 × 10⁻⁴. These results were based on 6-10 SNPs that were used in burden tests (the SKAT-O method selected rho = 1 as optimal in these instances). For the “NS_strict” variants, the most significant genes for overall and ER+ breast cancer were IQCA1 (p = 4.6 × 10⁻⁴) and FSCN3 (p = 2.3 × 10⁻⁴), respectively. None of the top results for overall or ER+ breast cancer survived a multiple testing correction based on the number of genes evaluated.
The most significant genes for ER- breast cancer were PDE4D (p = 1.2 × 10⁻⁶ using either the “NS_all” or “NS_broad” SNP sets) and FBXL22 (p = 8.2 × 10⁻⁶ using the “NS_strict” SNP set), and these survived a correction for multiple testing. The PDE4D and FBXL22 results were each based on burden testing (rho = 1) of two SNPs with a cumulative MAF ~0.02%. Details of the four SNPs contributing to these significant test results are shown in Table 1.4. All of the contributing SNPs are nonsynonymous coding SNPs for multiple isoforms of PDE4D or FBXL22. These SNPs had good genotyping cluster properties (Supplementary Figure 1.S2) and 100% genotyping pass rates in the present study. Each SNP had an MAF ~0.01% in the AMBER analysis of ER- cases and controls, due to the presence of one rare allele in one invasive ER- case. The rare allele carriers were four independent participants (one for each SNP) with varying ages at diagnosis and percentages of African ancestry (Table 1.4). All four of these women had triple-negative breast cancer (tumors negative for estrogen receptors, progesterone receptors and human epidermal growth factor receptor 2). Among all 1093 ER- cases, 599 had been classified as triple negative based on available data on all three molecular markers. The four SNPs of interest are monomorphic in AAs from the 1000 Genomes Project (49) Phase 3 and the NHLBI ESP (National Heart, Lung, and Blood Institute Exome Sequencing Project) (50). These same projects report very low allele frequencies (≤ 0.2%) for these SNPs in European ancestry populations (Table 1.4).
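The approximate allele frequencies quoted above follow from simple counting. The sketch below assumes that the ER- analysis set consists of the 1093 ER- cases plus all 4658 controls, that every participant has a called genotype at these autosomal SNPs (consistent with the 100% pass rates reported), and that each carrier is heterozygous.

```python
def minor_allele_frequency(minor_allele_count, n_subjects):
    """MAF for an autosomal SNP: minor alleles over 2 alleles per subject."""
    return minor_allele_count / (2 * n_subjects)


n_er_negative_cases = 1093
n_controls = 4658
n_subjects = n_er_negative_cases + n_controls  # assumed ER- analysis set

per_snp = minor_allele_frequency(1, n_subjects)   # one rare allele per SNP
per_gene = minor_allele_frequency(2, n_subjects)  # two SNPs per gene

print(f"{per_snp:.4%}")   # 0.0087%, i.e. roughly 0.01%
print(f"{per_gene:.4%}")  # 0.0174%, i.e. roughly 0.02%
```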
Sensitivity analyses were run for the PDE4D and FBXL22 genes vs. ER- breast cancer, excluding first-degree and second-degree relatives identified via the genotypes, as well as PCA outliers who clustered with HapMap 3 Europeans, Mexicans, or Asians. Results became more significant with these exclusions (PDE4D p = 9.9 × 10⁻⁷; FBXL22 p = 7.4 × 10⁻⁶).
Single SNP association analyses were performed for the 58,776 SNPs with MAF >0.5%. The correlation among these SNPs yielded the equivalent of 50,245 independent tests (45); therefore, the threshold for significance was set at 9.95 × 10⁻⁷. SNP rs8100241, a previously reported risk marker at the ER- / triple-negative GWAS locus on chromosome 19p13.11 (12,51–55), met this study-wide threshold for ER- disease (p = 1.7 × 10⁻⁷). The A allele at rs8100241 had a frequency of 40% in the present study and was associated with a decreased risk of ER- breast cancer (OR 0.75, 95% CI 0.68, 0.84). No other individual SNPs reached statistical significance (Supplementary Table 1.S3), including the LDLRAD1 SNP rs145889899 (p = 0.17), for which an association had been reported in the MEC exome chip study (16).
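The single SNP significance threshold can be reproduced by dividing a family-wise alpha by the effective number of independent tests. The alpha of 0.05 is an assumption here. It is not stated explicitly in the text, but it reproduces the reported threshold.

```python
def bonferroni_threshold(alpha, n_effective_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_effective_tests


threshold = bonferroni_threshold(0.05, 50_245)  # alpha = 0.05 is assumed
print(f"{threshold:.3g}")  # 9.95e-07

# p-values reported in the text for two SNPs in the ER- analysis
for snp, p in [("rs8100241", 1.7e-7), ("rs145889899", 0.17)]:
    print(snp, "significant" if p < threshold else "not significant")
```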
In these analyses, we observed significant associations between the PDE4D and FBXL22 genes and ER- breast cancer in a relatively large sample of AA women, using gene-based testing of rare exonic variants. Two nonsynonymous coding SNPs in each of these genes were responsible for their significant associations. The minor allele at each of these four SNPs was present in one invasive ER- case each (a different case subject for each of the four SNPs). These four cases were all triple-negative breast cancers. The four SNPs of interest from these two genes were absent in the AMBER controls. This is consistent with their reported monomorphism in AA samples from the 1000 Genomes Project and the NHLBI ESP.
Although the association we report for the PDE4D gene is with ER- (and triple-negative) breast cancer, one of the two rare SNPs contributing to this association (rs201360779) was also seen in one invasive ER+ case. Thus, this gene may affect the risk of both ER subtypes. The two contributing PDE4D SNPs in our study were predicted to be damaging by mutationTaster (42), although SIFT (41) predicted that these mutations would be tolerated. The PolyPhen HDIV and HVAR models (44) predicted damaging results from rs200725508, but these algorithms predicted benign results for rs201360779 (with the exception of the HDIV prediction of “possibly damaging” for this SNP for one PDE4D isoform).
The 2013 GWAS meta-analysis by Michailidou and colleagues (1) reported a breast cancer association with SNP rs1353747, which is located in an intron of PDE4D. In that study, the minor G allele at this common SNP showed weak protective associations for both ER+ (OR = 0.93) and ER- (OR = 0.92) disease. In the current study, rs1353747 showed slightly stronger effects but in the opposite direction (ER+ OR = 1.13; ER- OR = 1.29), and these results did not reach statistical significance given the smaller sample size compared to Michailidou et al. and the lower frequency of this variant in African vs. European populations (1% vs. 8% based on 1000 Genomes).
PDE4D is located on chromosome 5q11.2-12.1 and encodes phosphodiesterase subtype 4D, a member of the PDE4 family of phosphodiesterases, which multiple tumor cell types express as major regulators of cAMP degradation (56). PDE4D may function as a tumor-promoting factor by causing lower cAMP concentrations, which have been linked to increased survival and proliferation of cancer cells. This oncogenic role is supported by experiments showing that inhibition of PDE4D causes apoptosis and growth retardation in multiple types of cancer cells, including breast, but not in nonmalignant epithelial cells (56).
Lin et al. (56) reported PDE4D homozygous deletions in 198 of 5569 (3.6%) primary tumors from The Cancer Genome Atlas (TCGA) projects and TumorScape (57), with most being internal microdeletions. They also found microdeletions in established cancer cell lines including breast. These microdeletions were associated with increased expression of the protein, and they affected upstream conserved regions 1 (UCR1) and 2 (UCR2) of the gene. UCR1 and UCR2 inhibit PDE4D activity, likely by forming complexes with the PDE4D catalytic domain before cAMP enters the site. Lin et al. showed that a short isoform of PDE4D with no functional UCR1 or UCR2 promoted cancer cell growth, while a long isoform that contained both UCR1 and UCR2 did not.
In the present study, the two rare missense mutations contributing to the PDE4D gene-level association were located upstream of UCR1 and UCR2 and were risk variants (not protective). It could be hypothesized that these variants act by inducing a change in protein structure that disrupts the interaction of the UCRs with the catalytic domain of PDE4D, thereby increasing protein activity.
FBXL22 has not previously been associated with breast cancer. This gene is located on chromosome 15q22.31 and encodes F-box and leucine-rich repeat protein 22. This F-box protein, a ubiquitin ligase component, has been shown to promote the degradation of sarcomeric proteins, and is critical for maintaining cardiac contractility in vivo (58). It is unclear what biological mechanism might link FBXL22 to breast cancer development. Nevertheless, the two rare SNPs contributing to the FBXL22 / ER- association in this study met strict criteria for variant functionality: these nonsynonymous SNPs were predicted to be damaging by five different algorithms (41–44).
Single SNP analyses confirmed an ER- association for the GWAS locus on chromosome 19p13.11 (12,51–55). The associated SNP in the present study was the common missense variant rs8100241 in the ANKLE1 gene. This SNP has shown significant associations with overall (51), ER- (54), and triple-negative (52) breast cancer in prior studies of mostly European ancestry subjects. These studies reported odds ratios ranging from 0.84 to 0.88 for the A allele at this SNP, as compared with the odds ratio of 0.75 reported in the present analysis for ER- breast cancer. It should be noted, however, that the large GAME-ON meta-analysis (http://gameon.dfci.harvard.edu) reported weaker effect estimates for rs8100241: the odds ratio reported for overall breast cancer was 0.95 (95% CI 0.92, 0.99; p = 0.017), and the odds ratio for ER- breast cancer was 0.94 (95% CI 0.83, 1.07; p = 0.36).
Although the present study sample is the largest exome wide analysis sample to date for AA breast cancer, this analysis was underpowered to detect per-allele odds ratios <1.5, except when cumulative risk allele frequencies per gene approached or exceeded 5%. Further power limitations existed for analyses by ER status.
The significant gene-level findings reported here are based on four SNP variants that appear only once each in the AMBER sample of ER- cases and controls. Given these very small counts and the inflation seen in the gene-based test statistics, our results should be interpreted with caution. A simple Fisher’s exact test for the presence of a rare allele in PDE4D vs. ER- case / control status yields a p-value of 0.036, as does the same test for FBXL22. Fisher’s exact test is conservative and does not upweight rarer variants or account for covariates; however, the modest p-value from this test emphasizes the need for replication to verify associations between these genes and ER- breast cancer.
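The Fisher’s exact p-value quoted above can be checked with a short standard-library calculation. The 2x2 table is an assumption reconstructed from the counts given in the text: two rare-allele carriers per gene, both among the 1093 ER- cases, and no carriers among the 4658 controls. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases / controls; columns are carriers / non-carriers.
    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one, with the margins held fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Probability that x of the col1 carriers fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))


# Assumed table: 2 carriers among 1093 ER- cases, 0 among 4658 controls.
p_value = fisher_exact_two_sided(2, 1091, 0, 4658)
print(f"{p_value:.3f}")  # 0.036
```

With only two carriers, the result reduces to the probability that both fall among the cases by chance, (1093 × 1092) / (5751 × 5750). The same table applies to both genes, which is why the two p-values are identical.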
This is not to say that the rare variant calls for the four SNPs of interest are questionable. There is good reason to believe that these calls were accurate in our study. These SNPs are known rare variants that the exome chip was designed to capture. Each SNP was seen in two or more studies that contributed sequence data for development of the exome chip: the minor allele at rs201360779 was seen 29 times across the ~12,000 sequenced individuals, the minor allele at rs149590841 was seen seven times, and the minor alleles for the other two SNPs were each seen three times. All four SNPs showed high quality genotype clusters and clear separation of the heterozygous calls from the remainder of the genotypes. In addition, SNP rs201360779 was not a true singleton in AMBER, having also been seen in one ER+ case.
The exome chip used here has inherent limitations. First and most obvious, this array-based method includes only selected rare variants and is therefore not as exhaustive as exome sequencing in capturing rare exonic variants. Second, this chip does not attempt to assay rare variation in noncoding regions. Third, the chip was designed using exome sequencing data from mostly European samples. Therefore, rare variants in non-Europeans are not as well captured, and our data set may have lacked information on some important rare SNPs in AA populations (22).
Another potential limitation of our study is a current limitation of the field: the use of traditional methods such as PCA (or linear mixed models) to adjust for population stratification in rare variant association studies. These methods may not adequately control for population structure in some rare variant analysis settings; thus, the development of new methods for this purpose is an area of active research (23).
We should also acknowledge that while our multiple testing correction adjusted for the number of genes analyzed, there were additional levels of testing that were not included in this correction. Multiple outcomes (overall, ER+, ER- breast cancer) and SNP functional groups (NS_all, NS_broad, NS_strict) were also analyzed. We did not correct for the multiple breast cancer outcomes because there was considerable overlap among the groups of patients analyzed for overall, ER+, and ER- breast cancer, and we considered these to be tests of related hypotheses. There was also a high amount of interdependence among the three SNP functional groups, which would render a Bonferroni correction overly conservative. Nevertheless, implementing an adjustment for all of the ER- gene-based tests conducted across the three SNP functional groups results in a corrected p-value of 0.035 for PDE4D, although the corrected p-value for FBXL22 becomes non-significant (0.240).
In summary, an exome-wide gene-based analysis of rare variants found significant associations between the PDE4D and FBXL22 genes and ER- breast cancer in a collaborative study of AA women. The previous GWAS finding of a breast cancer risk marker in the PDE4D gene supports the idea that rare variants in this region in particular might affect breast cancer risk. Replication is needed to confirm the gene-level associations reported here, which are based on very small counts at extremely rare variants.
1. Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nature Genetics. 2013;45:353–61.
2. Michailidou K, Beesley J, Lindstrom S, Canisius S, Dennis J, Lush MJ, et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nature Genetics. 2015;47:373–80.
3. Lalloo F, Evans DG. Familial Breast Cancer. Clinical Genetics. 2012;82:105–14.
4. Njiaju UO, Olopade OI. Genetic Determinants of Breast Cancer Risk: A Review of Current Literature and Issues Pertaining to Clinical Application. The Breast Journal. 2012;18:436–42.
5. Zheng W, Cai Q, Signorello LB, Long J, Hargreaves MK, Deming SL, et al. Evaluation of 11 Breast Cancer Susceptibility Loci in African-American Women. Cancer Epidemiology Biomarkers & Prevention. 2009;18:2761–4.
6. Ruiz-Narvaez EA, Rosenberg L, Cozier YC, Cupples LA, Adams-Campbell LL, Palmer JR. Polymorphisms in the TOX3/LOC643714 Locus and Risk of Breast Cancer in African-American Women. Cancer Epidemiology Biomarkers & Prevention. 2010;19:1320–7.
7. Barnholtz-Sloan JS, Shetty PB, Guan X, Nyante SJ, Luo J, Brennan DJ, et al. FGFR2 and other loci identified in genome-wide association studies are associated with breast cancer in African-American and younger women. Carcinogenesis. 2010;31:1417–23.
8. Hutter CM, Young AM, Ochs-Balcom HM, Carty CL, Wang T, Chen CTL, et al. Replication of Breast Cancer GWAS Susceptibility Loci in the Women’s Health Initiative African American SHARe Study. Cancer Epidemiology Biomarkers & Prevention. 2011;20:1950–9.
9. Chen F, Chen GK, Millikan RC, John EM, Ambrosone CB, Bernstein L, et al. Fine-mapping of breast cancer susceptibility loci characterizes genetic risk in African Americans. Human Molecular Genetics. 2011;20:4491–503.
10. Zheng Y, Ogundiran TO, Adebamowo C, Nathanson KL, Domchek SM, Rebbeck TR, et al. Lack of association between common single nucleotide polymorphisms in the TERT-CLPTM1L locus and breast cancer in women of African ancestry. Breast Cancer Research and Treatment. 2012;132:341–5.
11. Huo D, Zheng Y, Ogundiran TO, Adebamowo C, Nathanson KL, Domchek SM, et al. Evaluation of 19 susceptibility loci of breast cancer in women of African ancestry. Carcinogenesis. 2012;33:835–40.
12. Palmer JR, Ruiz-Narvaez EA, Rotimi CN, Cupples LA, Cozier YC, Adams-Campbell LL, et al. Genetic Susceptibility Loci for Subtypes of Breast Cancer in an African American Population. Cancer Epidemiology Biomarkers & Prevention. 2013;22:127–34.
13. Zheng Y, Ogundiran TO, Falusi AG, Nathanson KL, John EM, Hennis AJM, et al. Fine mapping of breast cancer genome-wide association studies loci in women of African ancestry identifies novel susceptibility markers. Carcinogenesis. 2013;34:1520–8.
14. Long J, Zhang B, Signorello LB, Cai Q, Deming-Halverson S, Shrubsole MJ, et al. Evaluating Genome-Wide Association Study-Identified Breast Cancer Risk Variants in African-American Women. Peterlongo P, editor. PLoS One. 2013;8:e58350.
15. O’Brien KM, Cole SR, Poole C, Bensen JT, Herring AH, Engel LS, et al. Replication of Breast Cancer Susceptibility Loci in Whites and African Americans Using a Bayesian Approach. American Journal of Epidemiology. 2014;179:382–94.
16. Haiman CA, Han Y, Feng Y, Xia L, Hsu C, Sheng X, et al. Genome-wide testing of putative functional exonic variants in relationship with breast and prostate cancer risk in a multiethnic population. PLoS Genetics. 2013;9:e1003419.
17. Panoutsopoulou K, Tachmazidou I, Zeggini E. In search of low-frequency and rare variants affecting complex traits. Human Molecular Genetics. 2013;22:R16–21.
18. Apostolou P, Fostira F. Hereditary Breast Cancer: The Era of New Susceptibility Genes. BioMed Research International. 2013;2013:1–11.
19. Gudmundsson J, Sulem P, Gudbjartsson DF, Masson G, Agnarsson BA, Benediktsdottir KR, et al. A study based on whole-genome sequencing yields a rare variant at 8q24 associated with prostate cancer. Nature Genetics. 2012;44:1326–9.
20. Loveday C, Turnbull C, Ramsay E, Hughes D, Ruark E, Frankum JR, et al. Germline mutations in RAD51D confer susceptibility to ovarian cancer. Nature Genetics. 2011;43:879–82.
21. Rafnar T, Gudbjartsson DF, Sulem P, Jonasdottir A, Sigurdsson A, Jonasdottir A, et al. Mutations in BRIP1 confer high risk of ovarian cancer. Nature Genetics. 2011;43:1104–7.
22. Lee S, Abecasis GR, Boehnke M, Lin X. Rare-Variant Association Analysis: Study Designs and Statistical Tests. American Journal of Human Genetics. 2014;95:5–23.
23. Auer PL, Lettre G. Rare variant association studies: considerations, challenges and opportunities. Genome Medicine. 2015;7:16.
24. Palmer JR, Ambrosone CB, Olshan AF. A collaborative study of the etiology of breast cancer subtypes in African American women: the AMBER consortium. Cancer Causes and Control. 2014;25:309–19.
25. Newman B, Moorman PG, Millikan R, Qaqish BF, Geradts J, Aldrich TE, et al. The Carolina Breast Cancer Study: integrating population-based epidemiology and molecular biology. Breast Cancer Research and Treatment. 1995;35:51–60.
26. Ambrosone CB, Ciupak GL, Bandera EV, Jandorf L, Bovbjerg DH, Zirpoli G, et al. Conducting Molecular Epidemiological Research in the Age of HIPAA: A Multi-Institutional Case-Control Study of Breast Cancer in African-American and European-American Women. Journal of Oncology. 2009;2009:871250.
27. Bandera EV, Chandran U, Zirpoli G, McCann SE, Ciupak G, Ambrosone CB. Rethinking sources of representative controls for the conduct of case-control studies in minority populations. BMC Medical Research Methodology. 2013;13:71.
28. Rosenberg L, Adams-Campbell L, Palmer JR. The Black Women’s Health Study: a follow-up study for causes and preventions of illness. Journal of the American Medical Women's Association. 1995;50:56–8.
29. Kolonel LN, Henderson BE, Hankin JH, Nomura AM, Wilkens LR, Pike MC, et al. A multiethnic cohort in Hawaii and Los Angeles: baseline characteristics. American Journal of Epidemiology. 2000;151:346–57.
30. Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M, et al. zCall: a rare variant caller for array-based genotyping. Bioinformatics. 2012;28:2543–5.
31. The International HapMap Consortium. A haplotype map of the human genome. Nature. 2005;437:1299–320.
32. Grove ML, Yu B, Cochran BJ, Haritunians T, Bis JC, Taylor KD, et al. Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium. PLoS One. 2013;8:e68095.
33. Liu X, Jian X, Boerwinkle E. dbNSFP: A lightweight database of human nonsynonymous SNPs and their functional predictions. Human Mutation. 2011;32:894–9.
34. Liu X, Jian X, Boerwinkle E. dbNSFP v2.0: A Database of Human Non-synonymous SNVs and Their Functional Predictions and Annotations. Human Mutation. 2013;34:E2393–E2402.
35. Patterson N, Price AL, Reich D. Population Structure and Eigenanalysis. PLoS Genetics. 2006;2:e190.
36. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A Tool Set for Whole-Genome Association and Population-Based Linkage Analyses. American Journal of Human Genetics. 2007;81:559–75.
37. Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, et al. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. American Journal of Human Genetics.
38. Voorman A, Brody J, Chen H, Lumley T. seqMeta: An R package for meta-analyzing region-based tests of rare DNA variants [Internet]. 2012. Available from: https://cran.r-project.org/web/packages/seqMeta/index.html.
39. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. American Journal of Human Genetics. 2011;89:82–93.
40. Purcell SM, Moran JL, Fromer M, Ruderfer D, Solovieff N, Roussos P, et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature. 2014;506:185–90.
41. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols. 2009;4:1073–81.
42. Schwarz JM, Rödelsperger C, Schuelke M, Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nature Methods. 2010;7:575–6.
43. Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Research. 2009;19:1553–61.
44. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, et al. A method and server for predicting damaging missense mutations. Nature Methods. 2010;7:248–9.
45. Gao X, Starmer J, Martin ER. A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genetic Epidemiology. 2008;32:361–9.
46. Liu L, Sabo A, Neale BM, Nagaswamy U, Stevens C, Lim E, et al. Analysis of Rare, Exonic Variation amongst Subjects with Autism Spectrum Disorders and Population Controls. PLoS Genetics. 2013;9:e1003443.
47. Ma C, Blackwell T, Boehnke M, Scott LJ, the GoT2D investigators. Recommended Joint and Meta-Analysis Strategies for Case-Control Association Testing of Single Low-Count Variants. Genetic Epidemiology. 2013;37:539–50.
48. Lee S, Fuchsberger C, Kim S, Scott L. An efficient resampling method for calibrating single and gene-based rare variant association analysis in case–control studies. Biostatistics. 2015;kxv033.
49. McVean GA, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, et al. An integrated map of genetic variation from 1,092 human genomes. Nature. 2012;491:56–65.
50. Exome Variant Server, NHLBI GO Exome Sequencing Project (ESP) [Internet]. Seattle, WA; Available from: http://evs.gs.washington.edu/EVS/. Accessed August 2015.
51. Antoniou AC, Wang X, Fredericksen ZS, McGuffog L, Tarrell R, Sinilnikova OM, et al. A locus on 19p13 modifies risk of breast cancer in BRCA1 mutation carriers and is associated with hormone receptor–negative breast cancer in the general population. Nature Genetics. 2010;42:885–92.