Introduction

Over the last several years, genome-wide association studies (GWAS) have mapped thousands of genetic variants associated with complex diseases and disease-related quantitative traits (57, 56). However, most variants mapped to date explain only a modest proportion of the heritability of most traits (102, 99, 40), suggesting that many additional trait loci remain to be discovered. A general hypothesis for why many trait loci have yet to be identified is that the standard GWAS paradigm for mapping complex traits, which tests the main effect of each common single-nucleotide polymorphism (SNP) across the autosomes individually, may be too naïve for reliable gene identification. One particular limitation of current methodology lies in the identification of gene-gene interactions, or epistasis. Epistatic relationships (29) may explain some of the missing heritability (173) and can account for replication failures of initial GWAS findings (60, 49).

Despite some success in identifying gene-gene interactions for Crohn’s disease (41), autism (93), and multiple sclerosis (16), powerful analysis of epistatic effects continues to pose a major challenge for the field. Currently, the standard examination of epistatic effects uses a regression model that includes the main effects of the genotypes of two SNPs as well as the two-way interaction between them. One then tests whether the regression coefficient for the interaction term differs from zero. Such analyses are often applied to candidate SNPs, e.g. those lying within particular pathways or with significant main effects. They can also be applied exhaustively, though the computation quickly becomes prohibitively expensive and the threshold for statistical significance extremely stringent. In addition, this form of analysis suffers from reduced power when the causal SNPs are not genotyped (124), and it cannot accommodate more complex interaction effects involving multiple pairs of SNPs or higher-order interactions.
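As a minimal sketch of the standard single-pair test described above, the Python code below fits y = b0 + b1*g1 + b2*g2 + b3*g1*g2 + e by ordinary least squares and returns a Wald statistic for the interaction coefficient b3. It assumes additive 0/1/2 genotype coding and uses numpy. The p-value approximates the t distribution with a standard normal, which is reasonable only for large samples. The function name snp_interaction_test and the simulated data are illustrative and are not taken from this study.

```python
import math

import numpy as np


def snp_interaction_test(y, g1, g2):
    """Wald test of b3 in y = b0 + b1*g1 + b2*g2 + b3*(g1*g2) + e.

    g1, g2: additive genotype codes (0/1/2 minor-allele counts).
    Returns (b3_hat, t_stat, p_value). The two-sided p-value uses a
    large-sample standard-normal approximation to the t distribution.
    """
    y = np.asarray(y, dtype=float)
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    n = y.shape[0]
    # Design matrix: intercept, two main effects, and their product.
    X = np.column_stack([np.ones(n), g1, g2, g1 * g2])
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if rank < X.shape[1]:
        raise ValueError("design matrix is rank deficient")
    resid = y - X @ beta
    df = n - X.shape[1]
    sigma2 = resid @ resid / df                  # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # OLS covariance of beta
    se = math.sqrt(cov[3, 3])
    t_stat = beta[3] / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # 2 * (1 - Phi(|t|))
    return beta[3], t_stat, p_value


rng = np.random.default_rng(1)
n = 2000
g1 = rng.binomial(2, 0.3, n)
g2 = rng.binomial(2, 0.4, n)
# Simulated trait with a true interaction effect of 0.3.
y = 0.2 * g1 + 0.1 * g2 + 0.3 * g1 * g2 + rng.normal(0, 1, n)
b3, t, p = snp_interaction_test(y, g1, g2)
print(f"b3={b3:.3f} t={t:.2f} p={p:.2e}")
```

Running this fit for every SNP pair is what makes exhaustive analysis expensive: the number of regressions grows quadratically with the number of SNPs.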

Many of the problems related to analyzing epistatic effects on the SNP-by-SNP scale are analogous to the difficulties of individual SNP analysis, in which each SNP is examined one by one, for testing main effects. SNP-set testing is a common way of bypassing many of the limitations of individual-feature analysis when studying main SNP effects. SNP-set analysis groups multiple SNPs into a SNP set based on prior knowledge, analyzes them simultaneously, and assesses the cumulative effect of the SNPs in the set on a trait. Common grouping structures include all features associated with a gene (e.g. all SNPs within the gene), within a pathway, or within some functional group. SNP-set analysis reduces the multiple-comparison burden and facilitates detection of multi-SNP effects. It also improves gene mapping when the causal SNP is untyped but in modest linkage disequilibrium (LD) with multiple genotyped SNPs. Although SNP-set analysis has proven powerful for analyzing main SNP effects, relatively few methods are available for interrogating epistasis within a multi-SNP framework.
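A gene-based SNP set of the kind described above can be built by assigning each SNP to every gene whose boundaries, optionally padded by a flanking window, contain it. The sketch below does this in plain Python. The gene names, coordinates, rs identifiers, and the flank parameter are hypothetical. Real analyses would take gene boundaries from an annotation source and would use an interval index rather than this simple quadratic scan.

```python
from collections import defaultdict


def group_snps_by_gene(snps, genes, flank=0):
    """Assign SNPs to gene-based SNP sets.

    snps:  list of (snp_id, chrom, position)
    genes: list of (gene_name, chrom, start, end), inclusive coordinates
    flank: hypothetical option adding this many base pairs on each side of a gene
    A SNP inside overlapping genes is placed in every matching set;
    genes containing no SNPs are omitted from the result.
    """
    snp_sets = defaultdict(list)
    for gene, g_chrom, start, end in genes:
        lo, hi = start - flank, end + flank
        for snp_id, s_chrom, pos in snps:
            if s_chrom == g_chrom and lo <= pos <= hi:
                snp_sets[gene].append(snp_id)
    return dict(snp_sets)


# Hypothetical annotation and genotyped SNPs.
genes = [("GENE_A", "1", 1_000, 5_000),
         ("GENE_B", "1", 4_500, 9_000),
         ("GENE_C", "2", 100, 800)]
snps = [("rs1", "1", 1_200), ("rs2", "1", 4_800), ("rs3", "1", 8_000),
        ("rs4", "2", 50), ("rs5", "2", 700)]

sets = group_snps_by_gene(snps, genes)
print(sets)  # {'GENE_A': ['rs1', 'rs2'], 'GENE_B': ['rs2', 'rs3'], 'GENE_C': ['rs5']}
```

Note that rs2 lands in two sets because GENE_A and GENE_B overlap. This is one reason SNP-set methods must handle correlated, shared features.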

In this paper, we propose the SNP-set kernel interaction test (SKIT), a novel approach for epistasis testing using SNP sets in quantitative trait studies. SKIT considers interactions on the scale of multiple SNPs, rather than single SNPs, by testing for interactions between two different SNP sets, e.g. sets comprising SNPs from two different genes. Specifically, we use the powerful kernel machine regression (KMR) framework, which has become a popular method for main-effect testing of common (71, 161, 83, 101) and rare variants (165, 74) and has been successfully applied in many studies (86, 92, 98, 142, 95). Under the KMR framework, pair-wise similarity between individuals in terms of genotype, measured through a kernel, is compared to pair-wise similarity in phenotype, with correlation between the two indicating association. SKIT extends this idea to interactions by comparing pair-wise similarity in the interaction effects to pair-wise similarity in the phenotype while adjusting for main effects.
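To make the kernel comparison concrete, the sketch below computes linear kernels K1 and K2 for two SNP sets and a score-type statistic Q = rᵀ(K1 ∘ K2)r. Here r holds residuals from a null model containing covariates and the main effects of both sets, and ∘ is the elementwise (Hadamard) product. This is an illustrative assumption, not necessarily the SKIT statistic described in the next section. Main effects enter the null model as fixed linear terms, the interaction kernel is taken to be the Hadamard product of the two set kernels, and Q is neither variance-scaled nor calibrated to a p-value. The hypothetical helper pairwise_products shows why the product kernel captures interactions. For linear kernels, K1 ∘ K2 equals the linear kernel built on all SNP-by-SNP products across the two sets.

```python
import numpy as np


def linear_kernel(G):
    """Linear kernel: K[i, j] = sum over SNPs k of G[i, k] * G[j, k]."""
    G = np.asarray(G, dtype=float)
    return G @ G.T


def pairwise_products(G1, G2):
    """Row i holds every product G1[i, a] * G2[i, b] (row-wise Kronecker product)."""
    G1 = np.asarray(G1, dtype=float)
    G2 = np.asarray(G2, dtype=float)
    return np.einsum("ia,ib->iab", G1, G2).reshape(G1.shape[0], -1)


def interaction_score_statistic(y, X, G1, G2):
    """Unscaled score-type statistic Q = r^T (K1 o K2) r.

    Illustrative assumptions: r are OLS residuals from a null model with an
    intercept, covariates X, and linear fixed main effects of both SNP sets;
    the interaction kernel is the elementwise product of linear set kernels.
    No variance scaling or p-value calibration is performed.
    """
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    X_null = np.column_stack([np.ones(n), X, G1, G2])
    beta, *_ = np.linalg.lstsq(X_null, y, rcond=None)
    r = y - X_null @ beta
    K_int = linear_kernel(G1) * linear_kernel(G2)  # Hadamard (elementwise) product
    return float(r @ K_int @ r)


rng = np.random.default_rng(0)
n = 200
G1 = rng.binomial(2, 0.3, size=(n, 4)).astype(float)  # SNP set 1, e.g. gene 1
G2 = rng.binomial(2, 0.3, size=(n, 3)).astype(float)  # SNP set 2, e.g. gene 2
age = rng.normal(50, 10, size=n)                        # a covariate
# Simulated trait with an interaction between SNP 0 of set 1 and SNP 0 of set 2.
y = 0.02 * age + G1[:, 0] + 0.8 * G1[:, 0] * G2[:, 0] + rng.normal(size=n)

Q = interaction_score_statistic(y, age, G1, G2)
print(f"Q = {Q:.1f}")
```

Because the Hadamard product of two positive semidefinite matrices is itself positive semidefinite (the Schur product theorem), K1 ∘ K2 is a valid kernel, so Q is nonnegative.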

As in the main-effects testing setting, SNP-set modeling of interactions has several advantages over single-SNP modeling. Modeling multiple SNPs in a gene jointly should tag untyped causal variants better than individual SNPs do, while our procedure has fewer degrees of freedom than an analogous haplotype test and should therefore be more powerful. Furthermore, assuming a commercial panel of 1 million SNPs that tags ∼22,000 genes, exhaustive pairwise gene-based interaction testing using kernels should be both more powerful (owing to a less stringent multiple-testing adjustment) and more computationally efficient than exhaustive SNP-based interaction testing, since the former requires approximately 2,000-fold fewer tests than the latter. Beyond the gene-based nature of the interaction test, KMR methods have several other features attractive for practical analysis that other methods lack. For instance, KMR accommodates covariates, accounts for LD among variants, and allows rapid, analytic p-value computation.
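The "approximately 2000-fold" figure follows from counting unordered pairs. The short Python calculation below compares exhaustive SNP-by-SNP and gene-by-gene testing for the panel described above. As an illustrative assumption, it also shows Bonferroni-corrected per-test thresholds at a family-wise α of 0.05. The text itself does not specify which multiple-testing correction is used.

```python
from math import comb

n_snps = 1_000_000   # SNPs on the commercial panel
n_genes = 22_000     # genes tagged by the panel

snp_pairs = comb(n_snps, 2)    # exhaustive SNP x SNP interaction tests
gene_pairs = comb(n_genes, 2)  # exhaustive gene x gene interaction tests
ratio = snp_pairs / gene_pairs

alpha = 0.05  # assumed family-wise error rate; Bonferroni used only for illustration
print(f"SNP pairs:  {snp_pairs:,}  per-test threshold {alpha / snp_pairs:.1e}")
print(f"Gene pairs: {gene_pairs:,}  per-test threshold {alpha / gene_pairs:.1e}")
print(f"Gene-based testing needs about {ratio:.0f}-fold fewer tests")
```

The counts are about 5.0 × 10^11 SNP pairs versus 2.4 × 10^8 gene pairs, a ratio of roughly 2,066, consistent with the text. Under Bonferroni, the per-test threshold correspondingly relaxes from about 1.0 × 10^-13 to 2.1 × 10^-10.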

The remainder of this article is organized as follows. In the next section, we describe the SNP-set kernel interaction test, first under a simple linear model and then under the general KMR framework. We then present simulation results comparing SKIT to alternative SNP-set-based tests of epistasis, as well as an application to a genetic association study of birth weight. We conclude with a brief discussion.
