Chapter 2 Analysis of two breast cancer GWAS and the COGS

2.1 Introduction

2.1.2 Estimating the genetic variation explained by common SNPs for polygenic traits

Since the first GWAS, many traits and diseases have been shown to have a polygenic basis, which makes it difficult to identify all of the associated genetic variants. GWAS are also expensive to conduct in large numbers of individuals, and many of the studies conducted to date have not been large enough to detect associated variants with small effect sizes. However, researchers have developed methods to estimate the proportion of phenotypic variance that can be explained by genotyped SNPs, without first having to discover the genetic variants associated with the trait (81). These estimates are known as chip, or SNP, heritability estimates, and are particularly useful because they allow the potential of a genotyping array to explain the heritability of a trait to be evaluated. Estimates can be produced using unrelated individuals, which is an advantage because it is easier to collect a large number of unrelated individuals than of related individuals. Using unrelated individuals also reduces the risk of shared environments inflating the chip heritability estimate (60). The estimates also allow researchers to assess whether a certain group of variants, for example SNPs mapping to a specific chromosome, explains more phenotypic variation than other groups. This type of analysis is known as genome partitioning; it is not discussed further in this chapter, but is discussed in chapter 3.

For many polygenic traits and diseases, chip heritability estimates have been produced, and these have shown that a fairly large proportion of the variation in a trait can be explained by genotyped SNPs that have not yet reached genome-wide significance. An early study conducted by Yang et al. (82) showed that a large proportion of the heritability of human height could be explained by common SNPs when using genomic-relatedness-based restricted maximum likelihood (GREML) (83), implemented as part of the genome-wide complex trait analysis (GCTA) software. They estimated that 45% (se = 8%) of the phenotypic variance for height could be explained by 294,831 SNPs genotyped in 3,925 individuals of European descent. This estimate was much larger than the estimated 5% explained by the combination of genome-wide significant SNPs published before the analysis was conducted, and suggested that over 50% of the heritability of height could be explained by common SNPs. Otowa et al. (84) used both GREML and another estimation method, LD score regression (LDSC) (85), to produce chip heritability estimates for anxiety disorder. LDSC is a method that uses summary data to produce chip heritability estimates. In this study, anxiety disorder was defined by five phenotypes: generalized anxiety disorder, panic disorder, and three phobias (social phobia, agoraphobia, and specific phobias). Based on 3,695 European individuals from the Rotterdam Study Cohort, Otowa et al. estimated, using GREML, that 13.8% (se = 18%) of the variation in liability to anxiety disorder could be explained by genotyped SNPs. An LDSC chip heritability estimate was produced using summary statistics from a meta-analysis of over 18,000 individuals and 995,869 SNPs across nine cohorts. Using LDSC, they estimated that 9.5% (se = 3.7%) of the variation in liability to anxiety disorder could be explained by genotyped SNPs. These results showed that approximately a third of the genetic variation in anxiety disorders could be explained by common SNPs.
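To give a feel for how precise the estimates quoted above are, the following minimal sketch converts each point estimate and standard error into an approximate 95% interval. The normal (Wald) approximation, the truncation of the interval to the range 0 to 1, and the helper name wald_interval are assumptions made here for illustration; they are not taken from the cited studies, which may have derived their intervals differently.

```python
def wald_interval(estimate, se, z=1.96, lower=0.0, upper=1.0):
    """Approximate 95% interval for a chip heritability estimate.

    Assumes the estimate is roughly normally distributed and truncates
    the interval to [lower, upper], since a proportion of variance
    cannot fall outside 0-1. Both choices are illustrative assumptions.
    """
    return (max(lower, estimate - z * se), min(upper, estimate + z * se))


# Point estimates and standard errors as quoted in the text (as proportions).
estimates = {
    "height, GREML": (0.45, 0.08),
    "anxiety, GREML": (0.138, 0.18),
    "anxiety, LDSC": (0.095, 0.037),
}

intervals = {name: wald_interval(est, se) for name, (est, se) in estimates.items()}

for name, (low, high) in intervals.items():
    print(f"{name}: {low:.3f} to {high:.3f}")
```

Under these assumptions the GREML interval for anxiety disorder reaches zero, whereas the LDSC interval, based on a much larger sample, does not, which illustrates how strongly the precision of a chip heritability estimate depends on sample size.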

Chip heritability estimates, on the unobserved liability scale, have also been produced for breast cancer. Lu et al. (86) produced a chip heritability estimate based on 489,247 genotyped SNPs in 1,081 breast cancer cases and 1,085 controls. Using GREML, they estimated that 13% (95% CI: [0%-56%]) of the variation in liability to breast cancer could be explained by genotyped SNPs. Assuming that the heritability of breast cancer on the unobserved liability scale is 44%, approximately 30% of the genetic variation in liability to breast cancer could be explained by these genotyped SNPs. However, the 95% CI for the estimate was wide, most likely because of the limited sample size: only ~2,000 individuals were used in the analysis. The true proportion could therefore be much larger than 13%. Sampson et al. (87) have also produced a chip heritability estimate for breast cancer, but focussed on ER-negative breast cancer. Their estimate was based on GWAS SNPs genotyped in 1,998 ER-negative breast cancer cases and 3,263 controls. Using GREML, they estimated that 9.6% (95% CI: [0%-19.9%]) of the variation in liability to ER-negative breast cancer could be explained by genotyped SNPs.
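The step from a chip heritability of 13% to "approximately 30% of the genetic variation" is a simple ratio of the chip heritability to the assumed total heritability. The sketch below reproduces that arithmetic for the Lu et al. point estimate and confidence limits. The function name share_of_heritability and the capping of the ratio at 1 are assumptions introduced here for illustration, and the 44% total heritability is the assumed value stated in the text.

```python
def share_of_heritability(chip_h2, total_h2):
    """Fraction of the total heritability captured by genotyped SNPs.

    Both arguments are proportions of variance on the liability scale.
    The ratio is capped at 1 because genotyped SNPs cannot explain more
    than all of the genetic variation (an illustrative assumption for
    handling upper confidence limits that exceed total_h2).
    """
    if not 0.0 < total_h2 <= 1.0:
        raise ValueError("total_h2 must lie in (0, 1]")
    return min(1.0, chip_h2 / total_h2)


ASSUMED_TOTAL_H2 = 0.44  # heritability of breast cancer assumed in the text

point = share_of_heritability(0.13, ASSUMED_TOTAL_H2)  # point estimate
lower = share_of_heritability(0.00, ASSUMED_TOTAL_H2)  # lower 95% limit
upper = share_of_heritability(0.56, ASSUMED_TOTAL_H2)  # upper 95% limit

print(f"point: {point:.2f}, lower: {lower:.2f}, upper: {upper:.2f}")
```

Because the upper confidence limit (56%) exceeds the assumed total heritability (44%), the data are compatible with anything from none to all of the genetic variation being captured by the genotyped SNPs, which is why the point estimate alone should be interpreted cautiously.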

The non-breast cancer studies mentioned are only a very small sample of the chip heritability studies that have been conducted to date. In general, studies have shown that the proportion of variation in a trait that can be explained by currently genotyped SNPs is relatively large compared with the proportion that can be explained by SNPs reaching genome-wide significance. This common finding indicates that much of the missing heritability for many phenotypes may be explained by SNPs that have not yet reached genome-wide significance.
