In document Hodonsky_unc_0153D_18920.pdf (Page 76-84)

3.8. State of the literature: quantitative trait analysis

3.8.1. Pleiotropy

One avenue to address missing heritability is to exploit the correlation between traits with a suspected shared genetic basis (i.e., “pleiotropy”) to increase statistical power. The concept of pleiotropy was first discussed in the 1950s, and animal models, particularly D. melanogaster, have been used in the molecular evaluation of pleiotropy for over 50 years389-392.

phenotypes, such as CHARGE syndrome or velocardiofacial syndrome, exhibit pleiotropy by definition393-396. Without identifying the actual genes at play, several family-based studies have successfully used pairs or groups of traits to identify pleiotropic mechanisms underlying phenotypes such as metabolic traits and lipid biomarkers since the early 90s397-399. Correlated traits have long been understood to have a shared genetic basis, and highly polygenic complex traits in particular have been understood to be good candidates for pleiotropy since the development of quantitative trait linkage analysis methods400; 401.

Since the outset of population-based genomic analyses, association of multiple complex traits with the same locus has been demonstrated for multiple seemingly distinct groups of phenotypes or diseases401-404. Over the past decade it has become clear that pleiotropy is extremely common among complex traits that share biological underpinnings, including RBC traits; within five years of the first GWAS, nearly 5% of GWAS loci were reported to be associated with more than one trait400; 401; 403; 405. As a specific example, GWAS of autoimmune disorders have shown that rheumatoid arthritis (RA), systemic lupus erythematosus (SLE), and Sjoegren’s Syndrome largely overlap in symptomatology, relevant biomarkers, and genetic-association signatures14; 402; 406-411. These diseases typically do not present together in individuals, but they do commonly co-occur in families, suggesting that the overlapping genetic architecture of these diseases includes both shared and disease-specific causal variants12; 14; 402; 403; 406; 412.

Therefore, it stands to reason that diseases like RA, SLE, and Sjoegren’s Syndrome may also share a genetic signature that could be identified by incorporating information from all of the traits into one analysis. In a gene-expression meta-analysis of RA, SLE, and Sjoegren’s Syndrome, Toro-Dominguez et al. identified 371 genes that were significantly differentially expressed between individuals with one or more autoimmune diseases compared to controls12.

In a separate cross-phenotype analysis of immune-disease risk SNPs in seven autoimmune diseases, 47% of SNPs tested were associated with more than one but not all of the phenotypes tested14. Continuing to leverage correlation among traits is important in the genomics era: identification of loci with shared effects across traits of interest, as well as clarification of the molecular relationships between correlated traits, could lead to new applications of treatments that have been successful in diseases with shared genetic architecture413; 414.

Despite the long-standing awareness of pleiotropy in both monogenic and complex phenotypes, genomic studies of pleiotropy have been mostly limited to evaluating shared signals in traits that have been analyzed individually. Methods to evaluate correlated traits in combination have only been developed within the last decade, and implementation of these methods requires a sufficient sample size, often deeply phenotyped data, and access to high-powered computing clusters. Recently introduced methods have shown the value of jointly analyzing multiple related traits that are expected to share an underlying biology. Combined-phenotype analysis has been successful in identifying associations that were not significant in univariate analyses of any one trait for groups of phenotypes known to share biological pathways and overlap in physiologic mechanisms415-417. Below we describe the mechanisms behind pleiotropy as well as the benefits of co-examining traits that likely share genetic underpinnings.

3.8.1.1 Definition and description of biological pleiotropy

To describe why assessing pleiotropy improves on standard GWAS methodology, we must consider the mechanism(s) underlying an association signal shared by multiple phenotypes418. As described by Solovieff et al. in 2013, pleiotropy identified in a genetic

on biological, or “true”, pleiotropy. Descriptions in the literature vary widely with regard to whether the term “pleiotropy” refers to a single causal variant, a locus, or a gene. Here, we define pleiotropy as the effect of one variant on multiple phenotypes (in this work, RBC traits), without making claims as to the relevant tissue type(s) or proposed function aside from that which can be implicated via bioinformatics analysis (Section 3.8.2.3). The expectation of identifying “true” pleiotropic variants is that performing validation studies in a molecular setting would characterize one or more causal SNPs with a measurable effect on multiple associated traits. While we cannot assign functional pleiotropy to potential causal variants within the scope of this work, combined-phenotype analysis of correlated traits which are suspected to exhibit overlapping genetic architecture has been used as a proxy for pleiotropy with success419; 420.

3.8.1.2 Combined-phenotype studies

As described above, correlated traits often show a moderate to significant amount of overlap in associated genomic loci, suggesting a potential point of leverage for identifying loci associated with multiple correlated phenotypes but not previously reported for any of them individually. Correlation among a set of traits that also share some biological underpinnings makes such traits excellent candidates for pleiotropy402; 421; 422. Many groups of correlated phenotypes are expected to share some underlying biology, but this potentially useful feature cannot be utilized in traditional GWAS. While pleiotropy is suspected to be the genomic feature at work behind many shared association signals across correlated traits, causality cannot be assigned directly in association studies. Such limitations have prompted use of the phrase “combined phenotype association” or “multi-phenotype association” in the absence of a systematic and comprehensive interrogation of pleiotropy.402

3.8.1.2.1 Combined-phenotype analysis using summary statistics

One area of active research is the combination of univariate GWAS summary statistics for combined-phenotype inference. Compared to individual-level data, analysis of summary statistics for multiple phenotypes is advantageous when population sizes differ across traits, or when summary statistics for the traits of interest are publicly available (for instance, from dbGaP)345; 347; 419; 423.

Our work and that of others has demonstrated that these methods are scalable to densely imputed GWAS data; are accurate for common (MAF>0.05) as well as uncommon (0.01<MAF<0.05) variants; and have shown success in identifying loci undetected by very large GWAS415.

Many combined-phenotype and multi-phenotype methods are currently available; we present a comprehensive evaluation of eight methods in Section 4.4.3, which indicates that the adaptive sum of powered score (aSPU) test is well-suited to the proposed work. In this supporting work, we also performed combined-phenotype analysis of published body-mass index (BMI),364 waist-hip ratio,365 fasting glucose, and fasting insulin424 summary statistics (HapMap 2 imputed data) from consortia of predominantly EU participants (N range: 51,750 [fasting insulin] – 339,224 [BMI]) to interrogate potential pleiotropy425; 426. Six previously unreported loci were significantly associated (p < 5.0 × 10⁻⁸) with aggregated information for all four phenotypes, but with none of the traits in univariate analysis. These results show the utility of combined-phenotype analyses to identify novel loci undetected by meta-GWAS.
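As a rough illustration of how a sum-of-powered-score approach can aggregate univariate summary statistics, the sketch below computes an aSPU-style Monte Carlo p-value for a single variant from its Z-scores across traits. It is a minimal sketch under stated assumptions, not the implementation evaluated in Section 4.4.3: the function name `aspu_pvalue`, the set of powers, the number of null replicates, and the assumption that null Z-scores follow a multivariate normal distribution with trait correlation matrix `corr` are all illustrative choices.

```python
import numpy as np

def aspu_pvalue(z, corr, powers=(1, 2, 4, 8, np.inf), n_sim=10_000, seed=0):
    """Monte Carlo aSPU-style p-value for one variant (illustrative sketch).

    z     : Z-scores of the variant for each trait (length k)
    corr  : assumed k x k correlation of Z-scores under the null
    powers: powers gamma used in SPU(gamma) = sum_j z_j ** gamma
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    corr = np.asarray(corr, dtype=float)

    def spu(zmat):
        # |SPU(gamma)| for each power; gamma = inf is taken as max |z_j|.
        stats = []
        for g in powers:
            if np.isinf(g):
                stats.append(np.max(np.abs(zmat), axis=-1))
            else:
                stats.append(np.abs(np.sum(zmat ** g, axis=-1)))
        return np.array(stats)

    obs = spu(z)                                   # shape (n_powers,)
    null_z = rng.multivariate_normal(np.zeros(len(z)), corr, size=n_sim)
    null = spu(null_z)                             # shape (n_powers, n_sim)

    # Empirical p-value of the observed statistic for each power.
    p_obs = (1 + np.sum(null >= obs[:, None], axis=1)) / (n_sim + 1)

    # Empirical p-value of every null replicate for each power, by rank.
    ranks = null.argsort(axis=1).argsort(axis=1) + 1   # 1 = smallest
    p_null = (n_sim - ranks + 1) / n_sim

    # Adaptive step: take the minimum p-value over powers, then calibrate
    # that minimum against the same set of null replicates.
    t_obs = p_obs.min()
    t_null = p_null.min(axis=0)
    return (1 + np.sum(t_null <= t_obs)) / (n_sim + 1)
```

A call such as `aspu_pvalue([2.5, 2.8, 2.1, 2.6], corr)` would return a single p-value that draws on all four traits. In this sketch the smallest value it can return is 1/(n_sim + 1), so the number of null replicates bounds the attainable significance.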

3.8.1.3 Leveraging RBC trait correlation for evidence of shared genetic associations

Identifying pleiotropic loci in the context of multiple traits can benefit basic-science and translational research, as a shared effect on more than one trait may help elucidate undescribed molecular pathways contributing to that trait, or in some cases even implicate shared treatment options for disease phenotypes401; 402; 427. Understanding how overall gene expression affects one

can also improve understanding of that gene’s biological role across phenotypes413; 418; 428; 429.

RBC traits exhibit a range of correlation, with several trait pairs being moderately or highly correlated. While trait correlation is not necessary for pleiotropy to exist, highly correlated traits can be an indicator of a common underlying mechanism, either genetic or environmental, with pleiotropy being a reasonable expectation347.

Of equal import, RBC trait GWAS have uncovered associations that consistently replicate across multiple traits and diverse study populations17; 18; 26; 29; 301. Table 6 demonstrates that while some signals associate strongly and repeatedly with only one or two RBC traits, several association signals have been found for multiple traits across study populations. In addition, visualization of results in many published RBC trait GWAS shows distributions with strong tails of suggestively significant association signals. Together, the available evidence suggests that RBC traits exhibit pleiotropy and are therefore excellent candidates for a combined-phenotype approach28.

3.8.1.4 Benefits of combined-phenotype analysis

There are multiple benefits to analysis methods that leverage correlation across traits, beginning with statistical gains over traditional GWAS methods418; 430. The number of suggestively significant associations in univariate GWAS, some of which are almost certainly true associations that the analysis is underpowered to detect, implies that increased power would improve the ability to identify new associations. Recently developed methods combining trait summary statistics demonstrate a cost-effective gain in statistical power compared to the alternative of analyzing hundreds of thousands of individuals for each trait301; 425; 430; 431. Statistical power for discovery is increased when an association is present for multiple traits, without an accompanying inflation of Type I error421; 432. This is particularly meaningful for highly polygenic traits, which likely have a large number of contributing causal variants with (primarily) small effect sizes that are influenced by lack of precision and relatively small study populations301; 433; 434.

Additionally, fine-mapping of known and previously unreported loci (described more fully in Section 3.8.2) to detect independent associations and to identify potential causal variants can be improved when analyzing traits together435. Recent work indicates that fine-mapping of multiple correlated traits can significantly reduce the number of SNPs required to identify the 90% credible set compared to fine-mapping phenotype results individually.

Additionally, shared genetic architecture among traits with different units of measurement (i.e., variable effect sizes) can benefit from a combination of summary statistics which allows for difference in effect by trait (see Section 4.4.4.4)347; 436. In combination, the benefits of combined-phenotype analysis show great promise to detect and characterize loci underlying a shared genetic architecture.
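To make the credible-set comparison above concrete, the sketch below shows one common way a 90% credible set can be built from per-variant posterior probabilities at a locus. It is a minimal sketch under stated assumptions: the posterior probabilities are taken as given (how they are computed, and whether a single causal variant is assumed, is outside this example), and the function name `credible_set` is hypothetical.

```python
import numpy as np

def credible_set(posterior_probs, coverage=0.90):
    """Indices of the smallest set of variants whose summed posterior
    probability reaches `coverage` (illustrative sketch).

    posterior_probs: per-variant posterior probabilities of being causal
                     at one locus (assumed given; renormalized to sum to 1)
    """
    pp = np.asarray(posterior_probs, dtype=float)
    pp = pp / pp.sum()

    # Rank variants from most to least probable.
    order = np.argsort(pp)[::-1]
    cumulative = np.cumsum(pp[order])

    # First position at which the running total reaches the coverage.
    n_keep = int(np.searchsorted(cumulative, coverage)) + 1
    n_keep = min(n_keep, len(pp))   # guard against rounding at the boundary
    return order[:n_keep]
```

Under these assumptions, a locus whose posterior mass is concentrated on a few variants yields a small credible set, whereas diffuse posterior probabilities require many variants to reach the same coverage. A smaller credible set corresponds to the reduction in SNPs described above.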

3.8.1.5 Limitations of combined-phenotype analysis

Several limitations to combined-phenotype analysis merit examination. Foremost, all of the above pleiotropic mechanisms can be implied but not confirmed by association studies; evidence of pleiotropy requires more extensive functional evaluation than can be ascertained from association studies alone. Additionally, fine-mapping of combined-phenotype results cannot detect trait-specific associations within one association signal: fine-mapping methods only test one variant per independent association signal, but allelic heterogeneity is common, particularly when multiple tissues may be involved in the physiology of the outcomes. If multiple variants contribute to one signal via different traits, narrowing down a list of causal variants may be difficult or inaccurate419; 435; 437; 438.

GWAS of RBC traits have consistently demonstrated replication in some traits for association signals that are rarely or never found in other traits; for example, the functional variant rs855791 in TMPRSS6 strongly associates with multiple RBC traits but has never been reported for RBC count (Table 6). While our definition of pleiotropy and proposed combined-phenotype analysis methods emphasize the shared effect of a single variant within an association signal on correlated traits, the importance of assessing pleiotropy at the gene level cannot be overlooked, although it is outside the scope of the proposed project. Genes expressed in multiple tissue types often exhibit tissue-specific isoforms, and causal variants in shared (or variably expressed) exons may lead to true pleiotropy even in diseases that present very differently428; 439-442. Finally, some methods for combined analysis of summary statistics from multiple phenotypes require extensive computing resources for simulations, preventing analysts from differentiating between signals that exceed our proposed genome-wide significance (GWS) threshold of ~8 × 10⁻⁹. However, our continued evaluation and refinement of available methods suggests several alternatives that are under investigation. Overall, none of these limitations precludes combined-phenotype analyses from being useful tools to identify previously unreported genetic associations with RBC traits, and their benefits suggest that they will be useful for identifying associations which do not exceed GWS in univariate analyses.
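The computing constraint noted above can be illustrated with simple arithmetic. The sketch below assumes that a simulation-based method reports the usual empirical p-value, (1 + number of null statistics at least as extreme) / (B + 1) for B null replicates; that estimator and the function names are assumptions made for illustration, not a description of any specific method.

```python
import math

def min_attainable_p(n_sim: int) -> float:
    """Smallest empirical p-value obtainable from n_sim null replicates,
    assuming p = (1 + count of null statistics >= observed) / (n_sim + 1)."""
    return 1.0 / (n_sim + 1)

def simulations_needed(threshold: float) -> int:
    """Fewest null replicates for which the smallest attainable empirical
    p-value is at or below `threshold`."""
    return math.ceil(1.0 / threshold) - 1

if __name__ == "__main__":
    gws = 8e-9  # approximate GWS threshold quoted in the text
    print("replicates needed:", simulations_needed(gws))
    print("floor with 1e6 replicates:", min_attainable_p(10**6))
```

Under this assumption, reaching a threshold near 8 × 10⁻⁹ takes roughly 1.25 × 10⁸ null replicates per tested variant. With fewer replicates, every signal stronger than the simulation floor receives the same minimum p-value, so such signals cannot be ranked against one another.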

3.8.1.6 Summary

The proposed work will leverage RBC trait correlation in a diverse study population and established combined-phenotype analysis methods to attempt to identify genomic loci that previous studies were underpowered to detect. We hypothesize that the improvement in power gained by evaluating traits with a shared genetic architecture will enable the identification of genetic loci associated with multiple RBC traits, which is beneficial not only for discovery purposes but also for contextualizing the fine-mapping results of combined-phenotype genome-wide-significant loci.
