• No results found

Quantifying harmful mutations in human populations

N/A
N/A
Protected

Academic year: 2021

Share "Quantifying harmful mutations in human populations"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

1

Short Report March 9, 2011

Quantifying harmful mutations in human populations

Sankar Subramanian

Environmental Futures Centre and Australian Rivers Institute, Griffith University, 170 Kessels Road, Nathan Qld 4111, Australia

Running head: Quantifying deleterious polymorphisms

Keywords: human evolution, deleterious mutations, disease-associated mutations, genome-wide association, rare and common variants, SNP and population genetics theory

Address for correspondence: Dr. Sankar Subramanian Environmental Futures Centre School of Environment Griffith University 170 Kessels Road Nathan QLD 4111 Australia Phone: +61-7-3735 7495 Fax: +61-7-3735 7459 E-mail: [email protected]

(2)

2 ABSTRACT

1

A number of previous studies suggested the presence of deleterious amino acid altering 2

nonsynonymous single nucleotide polymorphisms (nSNPs) in human populations. 3

However, the proportions of deleterious nSNPs among rare and common variants are not 4

known. To estimate these, >77,000 SNPs from human protein-coding genes were analyzed. 5

Based on two independent methods this study reveals that up to 53% of rare nSNPs (Minor 6

Allele Frequency <0.002) could be deleterious in nature. The fraction of deleterious nSNPs 7

declines with the increase in their allele frequencies and only 12% of the common nSNPs 8

(MAF >0.4) were found to be harmful. This shows that even at high frequencies significant 9

fractions of deleterious polymorphisms are present in human populations. These results 10

could be useful for genome-wide association studies in understanding the relative 11

contributions of rare and common variants in causing human genetic diseases. 12 13 14 15 16 17 18 19 20 21 22 23

(3)

3 INTRODUCTION

24

Although harmful mutations affect fitness of an organism they are nevertheless present in human 25

populations and contribute to the diversity due to random genetic drift1. However natural 26

selection eliminates such deleterious mutations over time and thus they are prevented from 27

reaching high frequencies. Therefore low frequency single nucleotide polymorphisms (SNPs) 28

typically comprise of deleterious as well as neutral polymorphisms, whereas high frequency 29

SNPs are largely neutral in nature. Since amino acid changing SNPs might be detrimental to 30

proper protein function, a significant proportion of them could be harmful. A number of 31

previous studies have shown an enrichment of low frequency nSNPs compared to those with 32

high frequencies2-5, which indirectly suggests that these nSNPs are deleterious and removed over 33

time by natural selection. However the fraction of deleterious nSNPs with respect to their allele 34

frequencies is unclear. In other words the proportion of deleterious nSNPs among low (or high) 35

frequency variants has not been quantified. To estimate this, the present investigation has 36

gathered over 77, 000 SNPs from human protein-coding genes and grouped them based on their 37

minor allele frequencies. Two independent methods were used to estimate the proportion of 38

deleterious nSNPs and the frequency distribution of these harmful nSNPs was examined. 39

40

MATERIALS AND METHODS 41

42

SNP Data 43

First, SNPs of all human protein coding genes (dbSNP build130) were obtained from the UCSC 44

genome resource (http://genome.ucsc.edu/). Then using the rsIDs of SNPs, their corresponding 45

minor allele frequencies were obtained from the dbSNP database 46

(4)

4

(http://www.ncbi.nlm.nih.gov/projects/SNP/). For consistency, only the SNPs and their allele 47

frequencies reported by the 1000 genome project2 (1000 Genome phase 1 – May 2011 data 48

release) were used for further analysis. This final dataset consisted of 37,123 nSNPs and 40,599 49

synonymous SNPs (sSNPs). These SNPs were grouped in to ten categories based on their minor 50

allele frequencies and the proportion of deleterious nSNPs was computed for each category as 51

described below. 52

53

Estimation of the deleterious proportion of nSNPs 54

McDonald and Kreitman6 showed that under neutral evolution the ratio of nonsynonymous 55

(Pn) to synonymous (Ps) polymorphisms (Pn/Ps) within species is expected to be equal to the 56

ratio of nonsynonymous (Dn) to synonymous SNPs (Ds) substitutions between species i.e. 57 s n s n D D P P = . 58

However it is clear from table 1 that the ratios of SNPs are always higher than that of the 59

substitutions between human-chimp .i.e. 60 s n s n D D P P > . 61

This is due to the presence of deleterious nSNPs in the human populations as predicted by 62

previous theoretical studies 1,7. Hence to subtract the fraction of deleterious nSNPs (

δ

) the 63

equation could be written as 64 s n s n n D D P P P = −δ . 65

This equation could be simplified to estimate the fraction of deleterious nSNPs (

δ

) as: 66

(5)

5 n s s n P D P D − =1

δ

. 67

The measure

δ

is the fraction of deleterious nSNPs that are segregating in the population8. The 68

numbers of nonsynonymous (Dn= 47079)and synonymous (Ds= 71956)substitutions (based on 69

13,454 orthologous human-chimpanzee protein coding genes) were obtained from a previous 70

study9. To obtain the standard error, a bootstrap procedure was used by resampling the SNPs 71

(1000 replications). 72

73

Quantification of the fraction of damaging nSNPs 74

To determine the deleterious nature of each nSNP, the online software tool Polyphen-2 75

(http://genetics.bwh.harvard.edu/pph2/bgi.shtml) was used10. Using protein secondary 76

structures, functional motifs, and relative conservation of each amino acid in the protein the 77

above program predicts the possible impact of an amino acid replacement polymorphism on the 78

structure and/or function of a human protein. For each nSNP, this program predicted whether 79

the given type of amino acid change is benign, possibly damaging or probably damaging. The 80

fraction of damaging nSNPs (

ρ

) was computed by adding the counts of possibly and probably 81

damaging nSNPs and dividing this by the total nSNP count. The binomial variance was used to 82

estimate the standard error. 83

84

RESULTS 85

Since some of the amino acid polymorphisms are deleterious, selection prevent such nSNPs from 86

spreading in a population. Therefore nSNPs are expected to be more abundant at low 87

frequencies than at high frequencies. In contrast, all sSNPs are largely neutral and hence they 88

(6)

6

are likely to be present in equal proportions at low and high frequencies. Therefore sSNPs could 89

be used as a normalizing factor and thus the ratio of nSNPsto sSNPs(Pn/Ps) will reflect the 90

excess fraction of nSNPs. Table 1 shows that this ratio has a negative relationship with the 91

minor allele frequencies of SNPs. This ratio is roughly two times (1.4 Vs 0.74) higher for the 92

SNPs with a minor allele frequency (MAF) of <0.002 compared to those with a MAF > 0.4. It 93

should be noted that the discovery of very low frequency variants (MAF < 0.002) might be error 94

prone as the observed number of minor alleles was small (< 4)2. However the method used to 95

estimate the fraction of deleterious SNPs is based on the ratio of nonsynonymous and 96

synonymous SNPs. Hence this estimate will not be significantly affected as the error rate is 97

expected to be fairly the same for both types of SNPs. 98

99

The ratio of nonsynonymous to synonymous substitutions (Dn/Ds) estimated for the human-100

chimp pair (0.65) is significantly smaller than all Pn/Ps ratios (G test, P < 0.0001). This suggests 101

an overabundance of nSNPs with respect to the nonsynonymous substitutions and this excess 102

fraction of nSNPs is deleterious as they were prevented from becoming fixed (see Rand and 103

Kann11). This deleterious fraction was estimated as described in the methods. Clearly the 104

deleterious proportion of nSNPs (

δ

) shows a negative relationship with minor allele frequencies 105

(Figure 1A). Deleterious nSNPs constitute as high as 53% of the nSNPs with a MAF <0.002, 106

whereas for common nSNPs (MAF =0.4-0.5) the deleterious fraction is only 12%. 107

108

I also used an independent method to quantify the fraction of deleterious nSNPs using the online 109

software tool Polyphen-210. This program determines the deleterious nature of amino acid 110

changing nSNPs based on their effect on protein structure and/or function and based on their 111

(7)

7

location in the protein. Using this software the fraction of damaging nSNPs (

ρ

)was estimated as 112

explained in the methods. Interestingly the relationship between

ρ

and MAF shown in Figure 1B 113

is very similar to that observed for

δ

and MAF (Figure 1A). The estimate

ρ

obtained for low 114

frequency nSNPs (MAF <0.002) was 2.6 times higher than that estimated for high frequency 115

nSNPs with a MAF=0.4-0.5 (0.34 Vs 0.13). Here the estimate

ρ

includes the nSNPs that are 116

predicted by Polyphen-2 as ‘possibly damaging’ and ‘probably damaging’ with probabilities of 117

>50% and >95% respectively to disrupt the structure and/or function of a protein. However 118

using only ‘probably damaging’ nSNPs also produced a negative relationship with similar 119

magnitude and the

ρ

of low frequency nSNPs was 3 times higher than that of high frequency 120 SNPs (0.19 Vs 0.06). 121 122 DISCUSSION 123

Based on two independent methods this study estimated the proportion of deleterious amino acid 124

variants in human populations. The first method showed a much higher fraction of deleterious 125

nSNPs among the rare variants (MAF < 0.002) compared to the second method (53% Vs 34%; 126

Table 1). Since the second method (using Polyphen-2) depends on the relevant information 127

available for a protein (to predict the deleterious nature of an SNP), this method is rather 128

subjective. More detailed information about proteins in the future might result in redefining 129

some of the harmless nSNPs to harmful ones. In contrast the first method is based on a ratio, 130

which is objective and not depended on the availability of protein specific information. 131

132

The high fraction of deleterious nSNPs reported for the low frequency nSNPs suggests that rare 133

variants are more likely to be associated with diseases than common variants5,12. On the other 134

(8)

8

hand the results also showed that a significant fraction of high frequency nSNPs could be 135

deleterious in nature. This suggests a likely association of some of the common variants to 136

human genetic diseases13. The deleterious fraction of nSNPs reported here could be an 137

underestimate of deleterious mutations in humans as it does not include lethal or strongly 138

deleterious mutations. On the other hand, these estimates might include false positive SNPs due 139

to sequencing errors 14. 140

141

The present study has estimated the proportion of deleterious SNPs (

δ

) only for protein-coding 142

regions. However, the same formula could be used to estimate

δ

for SNPs in constrained 143

noncoding regions such as UTRs, promotors, enhancers and silencers. For such a calculation, Pn 144

and Dn are the number of SNPs and substitutions observed in the noncoding region (eg. 145

promotor) and Ps and Ds are the number of SNPs and substitutions in synonymous positions or 146

intron(s). The findings of this study might have implications in genome-wide association studies 147

in understanding the respective contributions of rare as well as common variants to human 148 diseases. 149 150 CONFLICT OF INTERST 151 152

The author declares no conflict of interest. 153

154

ACKNOWLEDGEMENTS 155

The author is grateful to David Lambert and thanks Leon Huynen and two anonymous reviewers 156

for valuable comments. 157

(9)

9 REFERENCES

159 160 161

1. Kimura M: The neutral theory of molecular evolution, Cambridge: Cambridge University 162

press, 1983. 163

164

2. 1000 Genomes Project Consortium: A map of human genome variation from population-165

scale sequencing. Nature 2010; 467: 1061-1073. 166

167

3. Cargill M, Altshuler D, Ireland J et al: Characterization of single-nucleotide 168

polymorphisms in coding regions of human genes. Nat Genet 1999; 22: 231-238. 169

170

4. Frazer KA, Ballinger DG, Cox DR et al: A second generation human haplotype map of 171

over 3.1 million SNPs. Nature 2007; 449: 851-861. 172

173

5. Zhu Q, Ge D, Maia JM et al: A genome-wide comparison of the functional properties of 174

rare and common genetic variants in humans. Am J Hum Genet 2011; 88: 458-468. 175

176

6. McDonald JH, Kreitman M: Adaptive protein evolution at the Adh locus in Drosophila. 177

Nature 1991; 351: 652-654. 178

179

7. Kryazhimskiy S, Plotkin JB: The Population Genetics of dN/dS. Plos Genetics 2008; 4: 180

e1000304. 181

8. Subramanian S: High proportions of deleterious polymorphisms in constrained human 182

genes. Mol Biol Evol 2011; 28: 49-52. 183

184

9. Mikkelsen TS, Hillier LW, Eichler EE et al: Initial sequence of the chimpanzee genome 185

and comparison with the human genome. Nature 2005; 437: 69-87. 186

187

10. Adzhubei IA, Schmidt S, Peshkin L et al: A method and server for predicting damaging 188

missense mutations. Nat Methods 2010; 7: 248-249. 189

190

11. Rand DM, Kann LM: Excess amino acid polymorphism in mitochondrial DNA: contrasts 191

among genes from Drosophila, mice, and humans. Mol Biol Evol 1996; 13: 735-748. 192

193

12. Pritchard JK: Are rare variants responsible for susceptibility to complex diseases? Am J 194

Hum Genet 2001; 69: 124-137. 195

196

13. Reich DE, Lander ES: On the allelic spectrum of human disease. Trends Genet 2001; 17: 197

502-510. 198

199

14. MacArthur DG, Tyler-Smith C: Loss-of-function variants in the genomes of healthy 200

humans. Hum Mol Genet; 19: R125-130. 201

(10)

10 203

(11)

11 Figure Legends

Figure 1. (A) Deleterious fraction of nSNPs (

δ

) estimated for variants with different minor allele frequencies (MAF). SNPs were grouped into 10 categories based on their MAF and only the upper values of the ranges are shown (X-axis). Error bars denote the standard error, which was estimated using a bootstrap procedure (1000 replications). (B) Theproportion of damaging nSNPs (ρ) was estimated using amino acid variants belonging to 10 MAF categories. Error bars indicate the standard error computed using the binomial variance.

(12)

12

Table1. Human polymorphisms, substitutions (between human-chimp) and deleterious fractions of nSNPs Minor Allele Frequency (%) Nonsynonymous SNPs (Pn) Synonymous SNPs (Ps) Pn/Ps

δ

(SE) Damaging SNPs Benign SNPs

ρ

(SE) <0.2 1290 919 1.40 0.534 (0.021) 375 721 0.342 (0.014) 0.2-0.5 2618 2156 1.21 0.461 (0.015) 773 1466 0.345 (0.010) 0.5-1 3633 3330 1.09 0.400 (0.014) 938 2181 0.301 (0.008) 1-2 4819 4696 1.03 0.362 (0.013) 1148 2942 0.281 (0.007) 2-5 6915 7516 0.92 0.289 (0.011) 1461 4406 0.249 (0.006) 5-10 5335 6051 0.88 0.258 (0.014) 992 3422 0.225 (0.006) 10-20 4965 5978 0.831 0.212 (0.016) 716 3364 0.175 (0.006) 20-30 3061 3914 0.782 0.163 (0.020) 360 2148 0.144 (0.007) 30-40 2362 3173 0.744 0.121 (0.024) 222 1723 0.114 (0.007) 40-50 2125 2866 0.741 0.118 (0.025) 236 1530 0.134 (0.008) Human-Chimp1 47079 (D n) 71956 (Ds) 0.654 - - -

(13)

References

Related documents

H-V Epipolar Geometry Calibration and Slope Extraction for Advance Depth Camera to Estimate Target Distance..

Nair (2007) observes pointing out some of the problems resulting from lack of cohesive national policy for addressing refugee flows, ‘…it limits the ability of

concluded that the United States should &#34;review and reconsider Torres's conviction and sentence in light of the consequences of the treaty violation.&#34; '

The Mortgage to Rent scheme, which is being piloted by Clúid with a number of lenders and the Department of the Environment, Community and Local Government, involves people

ligands based upon residue preference will help in better understanding of various ligand interactions. A web-based method named LPIcom is also developed for identification of

A lots of insoluble drugs (hydrophobic)will require a carrier for its secure delivery to targeted organs, and the lower the critical micelle concentration- cmc,