
Population Genetics 1


Academic year: 2021


(1)

Population Genetics

(2)

Introduction

At the time Alfred Russel Wallace and Charles Darwin first identified natural selection as the mechanism of adaptive evolution in the mid-nineteenth century, there was no accurate model of the mechanisms responsible for variation and inheritance.

Gregor Mendel published his work on the inheritance of traits in 1866, but it received little notice at the time.

The rediscovery of Mendel’s work in 1900 began a 30-year effort to reconcile Mendel’s concept of genes and alleles with the theory of evolution.

(3)

In a key insight, biologists realized that the frequency of phenotypic traits in a population is linked to the relative abundance of the alleles influencing those traits.

This insight laid the foundation for population genetics: the study of genetic variation in populations and how it changes over time.

Population geneticists investigate patterns of genetic variation within and among groups of interbreeding individuals.

Because changes in genetic structure form the basis for evolution of a population, population genetics has become an important subdiscipline of evolutionary biology.

(4)

Early in the twentieth century a number of workers, including G. Udny Yule, William Castle, Godfrey Hardy, and Wilhelm Weinberg, formulated the basic principles of population genetics.

Theoretical geneticists Sewall Wright, Ronald Fisher, and J.B.S. Haldane developed mathematical models to describe the genetic structure of a population.

More recently, field researchers have been able to test these models using biochemical and molecular techniques that measure variation directly at the protein and DNA levels.

These experiments examine allele frequencies and the forces that act on them, such as selection, mutation, migration, and genetic drift.

(5)

With advances in the application of molecular techniques to studies of the human genome, the genetic composition of populations can be examined in detail.

Different mutations in the same gene are known to cause diseases such as cystic fibrosis, Tay-Sachs, phenylketonuria, hemophilia A, and familial hypercholesterolemia.

The frequencies of these disease alleles often differ among populations.

Normal variation within and among populations is also a result of differences in allele frequencies.

The principles of population genetics attempt to explain the genetic diversity in present populations and the changes in allele and genotype frequencies over time.

(6)

Population genetic studies may provide insight into the value of carrier screening programs and the effect of medical intervention on the population frequency of a disease.

Allele and genotype frequencies depend on factors such as mating patterns, population size and distribution, mutation, migration, and selection.

By making specific assumptions about these factors, the Hardy-Weinberg law, a fundamental principle of population genetics, provides a model for calculating genotype frequencies from allele frequencies for a random mating population in equilibrium.

(7)

Allele Frequencies in Population Gene Pools Vary in Space and Time

A population is a group of individuals who share a common set of genes, live in the same geographic area, and actually or potentially interbreed.

All the alleles shared by these individuals constitute the gene pool for the population.

Often when we examine a single genetic locus in a population, we find that combinations of the alleles at this locus result in individuals with different genotypes.

In studying a population, geneticists face three important tasks:

1. computing the frequencies of the various alleles in the gene pool,

2. computing the frequencies of the different genotypes in the population,

3. and determining the changes in frequency that occur from one generation to the next.

(8)

Population geneticists use these calculations to address questions such as: How much genetic variation is present in a population?

Are genotypes randomly distributed in time and space, or is there a pattern in their distribution?

What processes affect the composition of a population’s gene pool? Do these processes produce genetic divergence among populations?

(9)

Populations are dynamic; they expand and contract through changes in birth and death rates, migration, or contact with other populations.

Often, some individuals within a population will produce more offspring than others, contributing a disproportionate amount of their alleles to the next generation.

Thus the dynamic nature of a population can, over time, lead to changes in the population’s gene pool.

(10)

Hardy-Weinberg Law

The allele frequencies at a locus can always be calculated from the genotype frequencies, but the converse is not necessarily true.

The Hardy-Weinberg law states that, for a single autosomal locus in a large population in which

1) mating takes place at random with respect to genotype,

2) allele frequencies are the same in males and females, and

3) mutation, selection, and migration are negligible,

genotype frequencies can be calculated from allele frequencies after one generation, regardless of the allele and genotype frequencies in the initial population.

(11)

This is not true for a single X-linked locus or for any set of loci considered jointly; for these loci, the establishment of this relationship between allele and genotype frequencies takes more than one generation.

The theoretical relationship between the relative proportions of alleles in the gene pool and the frequencies of different genotypes in the population was elegantly described in the early 1900s in a mathematical model developed independently by the British mathematician Godfrey H. Hardy and the German physician Wilhelm Weinberg.

This model, called the Hardy-Weinberg law, describes what happens to alleles and genotypes in an “ideal” population, meaning one that is infinitely large, is not subject to any evolutionary forces such as mutation, migration, or selection, and in which mates are selected randomly.

(12)

Under these conditions the Hardy-Weinberg model makes two predictions:

1. The frequencies of the alleles in the gene pool do not change over time.

2. If a locus may be occupied by either of two alleles, A or a, with frequencies p and q, then after one generation of random mating the frequencies of the resulting genotypes AA, Aa, and aa are p², 2pq, and q², where p² + 2pq + q² = 1.

(13)

Calculating genotype frequencies from allele frequencies. Gametes represent withdrawals from the gene pool to form the genotypes of the next generation. In this population, the frequency of the A allele, or fr(A), is 0.7, and the frequency of the a allele, or fr(a), is 0.3. The frequencies of the genotypes in the next generation are calculated as 0.49 for AA, 0.42 for Aa, and 0.09 for aa. Under the Hardy-Weinberg law, the frequencies of A and a remain constant from generation to generation.
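The arithmetic in this caption can be checked with a few lines of Python. This is only a minimal sketch: the function names `hardy_weinberg` and `allele_freq_from_genotypes` are illustrative choices, not part of the original slides.

```python
def hardy_weinberg(p):
    """Expected (AA, Aa, aa) frequencies when allele A has frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


def allele_freq_from_genotypes(f_AA, f_Aa, f_aa):
    """Frequency of A: every allele in AA plus half of the alleles in Aa."""
    return f_AA + 0.5 * f_Aa


# fr(A) = 0.7 and fr(a) = 0.3, as in the caption.
genotypes = hardy_weinberg(0.7)
next_p = allele_freq_from_genotypes(*genotypes)
print([round(g, 2) for g in genotypes], round(next_p, 2))
```

Recomputing the allele frequency from the new genotype frequencies returns 0.7, which is the first Hardy-Weinberg prediction in miniature.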

(14)

The general description of allele and genotype frequencies under Hardy-Weinberg assumptions. The frequency of allele A is p, and the frequency of allele a is q. Random mating produces the three genotypes AA, Aa, and aa in the frequencies p², 2pq, and q², respectively.

(15)

These calculations demonstrate the two main predictions of the Hardy-Weinberg law:

allele frequencies in our hypothetical population do not change from one generation to the next, and genotype frequencies after one generation of random mating can be predicted from the allele frequencies.

In other words, this population does not change or evolve with respect to the locus we have examined.

Remember, however, the assumptions about the ideal population described by the Hardy-Weinberg model:

1. Individuals of all genotypes have equal rates of survival and equal reproductive success—that is, there is no selection.

2. No new alleles are created and no old alleles are lost by mutation.

(16)

3. Individuals do not migrate into or out of the population.

4. The population is infinitely large, which in practical terms means it is large enough that sampling errors and other random effects are negligible.

(17)

Allelic variation in the CCR5 gene. Michel Samson and colleagues used PCR to amplify a part of the CCR5 gene containing the site of the 32-bp deletion, cut the resulting DNA with a restriction enzyme, and ran the fragments on an electrophoresis gel. Each lane reveals the genotype of a single individual. The normal (+) allele produces a 332-bp fragment and a 403-bp fragment; the Δ32 allele produces a 332-bp fragment and a 371-bp fragment. Heterozygotes produce three bands.

(18)
(19)

The frequency (percentage) of the CCR5-Δ32 allele in 18 European populations.

(20)

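1. AUTOSOMAL LOCUS

Consider a locus with two alleles, A1 and A2, and suppose the population frequencies of the three genotypes A1A1, A1A2, and A2A2 are p11, p12, and p22, respectively, where p11 + p12 + p22 = 1.

Then, in this initial population, the frequency of A1 is p11 + ½ p12 and the frequency of A2 is p22 + ½ p12. Random mating is approximately equivalent to random union of gametes. Thus, random mating within this initial population results in the following genotype frequencies in the next generation:

$$A_1A_1:\ \left(p_{11} + \tfrac{1}{2}p_{12}\right)^2, \qquad A_1A_2:\ 2\left(p_{11} + \tfrac{1}{2}p_{12}\right)\left(p_{22} + \tfrac{1}{2}p_{12}\right), \qquad A_2A_2:\ \left(p_{22} + \tfrac{1}{2}p_{12}\right)^2$$

The genotype frequencies in this second generation may be different from those in the first generation. However, calculation of the allele frequencies from the genotype frequencies in the second generation gives, for A1:

$$\left(p_{11} + \tfrac{1}{2}p_{12}\right)^2 + \left(p_{11} + \tfrac{1}{2}p_{12}\right)\left(p_{22} + \tfrac{1}{2}p_{12}\right) = \left(p_{11} + \tfrac{1}{2}p_{12}\right)\left(p_{11} + p_{12} + p_{22}\right) = p_{11} + \tfrac{1}{2}p_{12}$$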

(21)

Similarly, the frequency of A2 is p22 + ½ p12, which is equal to 1 - (p11 + ½ p12). These allele frequencies are identical to those in the first generation.

In other words, if the allele frequencies are p = p11 + ½ p12 and q = 1 - p = p22 + ½ p12, then after one generation of random mating, the genotype frequencies are p², 2pq, and q².

These frequencies are the Hardy-Weinberg proportions, and the population is said to be in Hardy-Weinberg equilibrium.

(22)

Establishment of Equilibrium in One Generation for an Autosomal Locus

                              Offspring Genotype Frequencies
Mating         Frequency      A1A1         A1A2         A2A2
A1A1 × A1A1    (.1)²          (.1)²        0            0
A1A1 × A1A2    2(.1)(.2)      (.1)(.2)     (.1)(.2)     0
A1A1 × A2A2    2(.1)(.7)      0            2(.1)(.7)    0
A1A2 × A1A2    (.2)²          ¼(.2)²       ½(.2)²       ¼(.2)²
A1A2 × A2A2    2(.2)(.7)      0            (.2)(.7)     (.2)(.7)
A2A2 × A2A2    (.7)²          0            0            (.7)²

(23)

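The table presents a numerical example in which the initial population comprises 20, 40, and 140 individuals with genotypes A1A1, A1A2, and A2A2, respectively.

The genotype frequencies are: p11 = 20/200 = .1, p12 = 40/200 = .2, and p22 = 140/200 = .7.

The allele frequencies are: p = .1 + ½(.2) = .2 for A1 and q = .7 + ½(.2) = .8 for A2.

Random union of gametes results in the following genotype frequencies in the next generation: (.2)² = .04 for A1A1, 2(.2)(.8) = .32 for A1A2, and (.8)² = .64 for A2A2.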

(24)

Note that these genotype frequencies are different from those in the initial population. To confirm that these results are correct, the table shows the genotype frequencies in the offspring that result from each of the six possible mating types.

For example, all the offspring of the mating A1A1 × A1A1 must be A1A1, while for the mating type A1A1 × A1A2, each of the two offspring genotypes, A1A1 and A1A2, has a probability of one half.

Summing the columns in the table gives the frequencies of each of the genotypes in the second generation.

These frequencies are the same as those obtained by random union of gametes, and the allele frequencies calculated from these genotype frequencies are .04 + ½(.32) = .2 for A1 and .64 + ½(.32) = .8 for A2.

Repeating these steps will give identical genotype and allele frequencies in the third generation to those in the second generation.
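The mating-type bookkeeping described above can be reproduced programmatically. The sketch below is an illustration under stated assumptions: genotypes are encoded by their number of A1 alleles (2, 1, 0), and the helper names `offspring_distribution` and `next_generation` are hypothetical. It enumerates every ordered pair of parental genotypes, which is equivalent to the six unordered mating types with their factors of 2.

```python
from itertools import product


def offspring_distribution(mother, father):
    """Mendelian offspring probabilities for one mating, keyed by A1 count."""
    # A parent carrying k copies of A1 transmits A1 with probability k / 2.
    tm, tf = mother / 2, father / 2
    return {
        2: tm * tf,
        1: tm * (1 - tf) + (1 - tm) * tf,
        0: (1 - tm) * (1 - tf),
    }


def next_generation(freqs):
    """Genotype frequencies after one round of random mating."""
    out = {2: 0.0, 1: 0.0, 0: 0.0}
    for (m, f_m), (f, f_f) in product(freqs.items(), repeat=2):
        for genotype, prob in offspring_distribution(m, f).items():
            out[genotype] += f_m * f_f * prob
    return out


# 20, 40, and 140 individuals of genotypes A1A1, A1A2, and A2A2.
gen0 = {2: 20 / 200, 1: 40 / 200, 0: 140 / 200}
gen1 = next_generation(gen0)
gen2 = next_generation(gen1)
print(gen1, gen2)
```

Under these assumptions `gen1` and `gen2` agree, which is the one-generation approach to equilibrium stated in the text.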

(25)

The chi-square goodness-of-fit test may be used to determine whether the observed numbers of each genotype are significantly different from those expected under Hardy-Weinberg equilibrium.

The total number of individuals is 200, so the expected numbers for the three genotypes are (.2)²(200) = 8, 2(.2)(.8)(200) = 64, and (.8)²(200) = 128, compared with the observed numbers of 20, 40, and 140. The test value is (20 - 8)²/8 + (40 - 64)²/64 + (140 - 128)²/128 ≈ 28.

This value is compared to the chi-square distribution with one degree of freedom.

(There are three classes, but the total number of individuals is known and the allele frequencies are known. Thus, there is only one independent class and one degree of freedom.)

In general, the number of degrees of freedom is equal to the number of genotypes minus the number of alleles.

(26)

The 99.9th percentile of the chi-square distribution with one degree of freedom is 10.83. Thus, the observed numbers of each genotype in the initial population are significantly different at the 0.1% level from those expected under Hardy-Weinberg equilibrium.

However, after one generation of random mating, the observed and expected numbers are the same.
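The goodness-of-fit calculation can be written out as a short sketch. The function name `hwe_chi_square` is an assumption made for illustration; the critical value 10.83 is the one quoted in the text, so no statistics library is needed.

```python
def hwe_chi_square(n11, n12, n22):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations (two alleles, one degree of freedom)."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)  # frequency of A1 estimated from the counts
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n11, n12, n22)
    # Assumes every expected count is positive (0 < p < 1).
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


stat, expected = hwe_chi_square(20, 40, 140)
CRITICAL_0_001 = 10.83  # 99.9th percentile for 1 df, as quoted in the text
print(round(stat, 3), stat > CRITICAL_0_001)
```

For the counts 20, 40, and 140 the statistic is 28.125, which the text rounds to 28.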

Calculation of allele frequencies from genotype frequencies is straightforward when all three genotypes are observable, but, in the case of recessive diseases, such as cystic fibrosis, only two phenotype classes are observed.

However, if equilibrium is assumed, the frequency of affected individuals is q²; thus, the square root of this frequency is the frequency of the disease allele.

(27)

For example, in populations of European ancestry, the frequency of cystic fibrosis is estimated to be 1/2000; thus, the frequency of the abnormal allele is .022 and the frequency of the normal allele is .978.

The frequency of heterozygotes is therefore 2 × .022 × .978, which is about 1/23. That is, approximately 4% of the population are carriers, but less than .1% are affected.

Several different mutations have been described in the cystic fibrosis gene.

Each one of these is an abnormal allele; thus, the frequency of .022 is actually the sum of the frequencies of all the abnormal alleles at the cystic fibrosis locus.
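The carrier calculation for a recessive disease can be sketched as follows, assuming Hardy-Weinberg equilibrium as the text does. The function name `carrier_frequency` is illustrative.

```python
import math


def carrier_frequency(incidence):
    """Return (q, 2pq) for a recessive disease with the given incidence q**2,
    assuming Hardy-Weinberg proportions."""
    if not 0.0 <= incidence <= 1.0:
        raise ValueError("incidence must lie in [0, 1]")
    q = math.sqrt(incidence)  # disease-allele frequency
    p = 1.0 - q               # normal-allele frequency
    return q, 2.0 * p * q


# Cystic fibrosis incidence of 1/2000, as quoted in the text.
q, carriers = carrier_frequency(1 / 2000)
print(round(q, 3), round(carriers, 3), round(1 / carriers, 1))
```

Working with the unrounded allele frequency gives a carrier frequency slightly above the value obtained from the rounded .022 and .978, but it still corresponds to roughly 1 in 23.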

The Hardy-Weinberg principle may be extended to more than two alleles.

(28)

In general, for n alleles, A1, A2, …, An with frequencies p1, p2, …, pn, the genotype frequencies are $p_i^2$ for homozygotes AiAi and $2p_ip_j$ for heterozygotes AiAj.

The heterozygosity value (H) for a locus is the total frequency of heterozygotes, and it may be written as

$$H = \sum_{i<j} 2p_ip_j = 1 - \sum_{i=1}^{n} p_i^2$$

For two alleles, the maximum heterozygosity is .5, for 5 alleles it is .8, and for 10 alleles it is .9.

In other words, for a locus to have a heterozygosity of 80%, it must have at least 5 alleles. (The maximum heterozygosity, 1 - 1/n, is reached when the alleles have equal frequencies.)

(29)

Example

Suppose a locus has 5 alleles (designated 1, 2, 3, 4, 5) with frequencies .5, .3, .1, .08, .02.

What are the genotype frequencies when Hardy-Weinberg equilibrium is established? What is the heterozygosity value (H) at this locus?

With n alleles, there are n(n + 1)/2 genotypes. Thus, for 5 alleles there are 15 genotypes.

The frequencies of the five homozygotes, 1-1, 2-2, 3-3, 4-4, 5-5, are .25, .09, .01, .0064, .0004, respectively.

The frequencies of the 10 heterozygotes, 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5, are .3, .1, .08, .02, .06, .048, .012, .016, .004, .0032, respectively.
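The worked example can be reproduced with a short sketch. It computes the heterozygosity as one minus the total homozygote frequency; the function names are illustrative assumptions, and alleles are numbered 1 to n as in the example.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Hardy-Weinberg genotype frequencies for any number of alleles.
    Keys are (i, j) with i <= j, alleles numbered from 1."""
    numbered = list(enumerate(allele_freqs, start=1))
    freqs = {}
    for (i, p_i), (j, p_j) in combinations_with_replacement(numbered, 2):
        freqs[(i, j)] = p_i * p_i if i == j else 2 * p_i * p_j
    return freqs


def heterozygosity(allele_freqs):
    """H = 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in allele_freqs)


alleles = [0.5, 0.3, 0.1, 0.08, 0.02]
freqs = genotype_frequencies(alleles)
H = heterozygosity(alleles)
print(len(freqs), round(H, 4))
```

For these five alleles the sketch yields 15 genotypes and H = .6432, which equals the sum of the ten heterozygote frequencies listed above.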

(30)
(31)

Approach to Equilibrium for a Locus on the X Chromosome

Generation     pm               pf
0              .33              .57
1              .57              .45
2              .45              .51
3              .51              .48
4              .48              .495
5              .495             .4875
6              .4875            .49125
7              .49125           .489375
8              .489375          .4903125
9              .4903125         .48984375
10             .48984375        .490078125
11             .490078125       .4899609375
12             .4899609375      .49001953125
Equilibrium    .49              .49

(32)

The genotype frequencies at a locus on the X chromosome differ in the two sexes because males have only one X chromosome, whereas females have two X chromosomes.

Thus, in males the genotype frequency is equal to the allele frequency.

For Hardy-Weinberg equilibrium, the allele frequencies in males must be equal to those in females.

Suppose the frequency of the A1 allele is pm in males and pf in females in the first generation.

By the principles of X-linked inheritance, the frequency of this allele in males in the second generation must be pf because males get their X chromosomes from their mothers.

(33)

By contrast, in females in the second generation the frequency of the A1 allele is ½(pm + pf) because females get one X chromosome from each parent.

The difference between the male and female frequencies in this generation is pf -½(pm + pf) = ½(pf - pm), which is one half of the difference in the first generation.

Similarly, in the third generation, the male allele frequency is ½(pm + pf), while the female frequency is ¼pm + ¾pf, and the difference is ¼(pf - pm).

With each generation, the difference between the male and female frequencies becomes smaller, and equilibrium is reached when they are the same.

The equilibrium allele frequency of A1 is equal to ⅔pf + ⅓pm in both sexes.

(34)

The table shows the approach to equilibrium for an X-linked locus when the initial allele frequencies are .33 in males and .57 in females.

With each generation, the difference between the frequencies in males and females is reduced, and they approach the equilibrium frequency of ⅓(.33) + ⅔(.57) = .49.
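The recursion behind this approach to equilibrium is short enough to iterate directly. In the sketch below, `x_linked_trajectory` and `equilibrium_frequency` are illustrative names; the update rule is the one stated in the text (sons take the maternal frequency, daughters the average of the two parental frequencies).

```python
def x_linked_trajectory(pm, pf, generations):
    """Male and female allele frequencies at an X-linked locus over time."""
    rows = [(pm, pf)]
    for _ in range(generations):
        # Sons inherit their X from the mother; daughters get one X from each parent.
        pm, pf = pf, (pm + pf) / 2
        rows.append((pm, pf))
    return rows


def equilibrium_frequency(pm, pf):
    """Weighted average: one third of X chromosomes are in males, two thirds in females."""
    return pm / 3 + 2 * pf / 3


trajectory = x_linked_trajectory(0.33, 0.57, 12)
for generation, (m, f) in enumerate(trajectory):
    print(generation, round(m, 10), round(f, 10))
print("equilibrium", round(equilibrium_frequency(0.33, 0.57), 4))
```

The quantity pm + 2pf is unchanged by the update (the new value is pf + (pm + pf)), which is why the equilibrium is the weighted average ⅓pm + ⅔pf of the starting frequencies.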

If the frequencies of the two alleles at the locus (A1 and A2) are p = ⅓pm + ⅔pf and q = 1 - p, the equilibrium genotype frequencies are p and q in males and p², 2pq, and q² in females.

(35)

Genotype Equilibrium Frequencies for an X-Linked Locus

                                  Offspring Genotype Frequencies
Mating                            Male            Female
(Male × Female)    Frequency      A1      A2      A1A1    A1A2    A2A2
A1 × A1A1          p³             p³      0       p³      0       0
A1 × A1A2          2p²q           p²q     p²q     p²q     p²q     0
A1 × A2A2          pq²            0       pq²     0       pq²     0
A2 × A1A1          p²q            p²q     0       0       p²q     0
A2 × A1A2          2pq²           pq²     pq²     0       pq²     pq²
A2 × A2A2          q³             0       q³      0       0       q³
Total                             p       q       p²      2pq     q²

(36)

The table gives the frequency of each possible mating type and the expected offspring genotype frequencies for males and females.

Summing these genotype frequencies shows that the equilibrium frequencies are maintained in the next generation.

(37)

3. TWO LOCI

Equilibrium is reached after one generation of random mating for a single autosomal locus and over several generations for an X-linked locus.

However, the approach to equilibrium may be much longer for two loci considered jointly, and the number of generations depends on the recombination fraction.

Suppose the first locus has alleles A1 and A2, with frequencies p1 and p2, and the second locus has alleles B1 and B2, with frequencies q1 and q2, respectively.

The four possible gametes are A1B1, A1B2, A2B1, A2B2; let their frequencies in the population be g11, g12, g21, g22, where p1 = g11 + g12, p2 = g21 + g22, q1 = g11 + g21, and q2 = g12 + g22.

(38)

Allowing these gametes to unite at random gives the genotype frequencies in the next generation.

Now consider the gametic output of this population.

In doing so we must take into account the fact that the frequency of gametes produced by the double heterozygote (A1A2B1B2) depends on the recombination fraction, θ.

If the phase is A1B1/A2B2, A1B1 and A2B2 are nonrecombinants, and A1B2 and A2B1 are recombinants.

Conversely, if phase is A1B2/A2B1, A1B2 and A2B1 are nonrecombinants, and A1B1 and A2B2 are recombinants.

(39)

In addition, all the gametes produced by individuals with the genotype A1A1B1B1, and one half of those produced by individuals with the genotypes A1A1B1B2 and A1A2B1B1 will be A1B1.

Thus, the total frequency of A1B1 gametes in this generation is

$$g_{11}^2 + g_{11}g_{12} + g_{11}g_{21} + g_{11}g_{22}(1 - \theta) + g_{12}g_{21}\theta = g_{11} - \theta D,$$

where D = g11g22 - g12g21. D is called the coefficient of linkage disequilibrium and is a measure of allelic association.

Similar calculations may be done for each of the gametic types, and the frequencies obtained are g12 + θ D, g21 + θ D, and g22 – θ D for A1B2, A2B1, and A2B2, respectively.
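The gametic recursion can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, and the starting gamete frequencies (.5, .1, .3, .1) are arbitrary illustrative values chosen so that D is positive.

```python
def disequilibrium(g11, g12, g21, g22):
    """Coefficient of linkage disequilibrium, D = g11*g22 - g12*g21."""
    return g11 * g22 - g12 * g21


def next_gamete_frequencies(g11, g12, g21, g22, theta):
    """Gamete frequencies after one generation of random mating,
    for recombination fraction theta between the two loci."""
    d = disequilibrium(g11, g12, g21, g22)
    return (g11 - theta * d, g12 + theta * d, g21 + theta * d, g22 - theta * d)


start = (0.5, 0.1, 0.3, 0.1)
after = next_gamete_frequencies(*start, theta=0.5)
print(disequilibrium(*start), after, disequilibrium(*after))
```

Each step leaves the allele frequencies unchanged (the θD terms cancel within each marginal sum) and multiplies D by 1 - θ, so with θ = ½ the disequilibrium is halved.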

(40)

Joint Genotype Frequencies for Two Loci

Genotype     Frequency             Equilibrium Frequency
A1A1B1B1     g11²                  p1²q1²
A1A1B1B2     2g11g12               2p1²q1q2
A1A1B2B2     g12²                  p1²q2²
A1A2B1B1     2g11g21               2p1p2q1²
A1A2B1B2     2g11g22 + 2g12g21     4p1p2q1q2
A1A2B2B2     2g12g22               2p1p2q2²
A2A2B1B1     g21²                  p2²q1²
A2A2B1B2     2g21g22               2p2²q1q2
A2A2B2B2     g22²                  p2²q2²

(41)

If the loci are unlinked, θ = ½, and the change in gametic frequency from one generation to the next is ½ D.

For linked loci the change is θ D.

Thus, the more closely two loci are linked, the slower is the approach to equilibrium. The coefficient of linkage disequilibrium after t generations may be written as

$$D_t = (1 - \theta)^t D_0$$

(42)

At equilibrium, D is equal to zero and the genotype and gametic frequencies are products of the allele frequencies.

The gametic frequencies may be written as g11 = p1q1 + D, g12 = p1q2 - D, g21 = p2q1 -D, and g22 = p2q2 + D.

Each of these gametic frequencies must be greater than or equal to zero.

Thus, D must be greater than or equal to both -p1q1 and -p2q2, and D must be less than or equal to both p1q2 and p2q1.

These results may be written as

$$\max(-p_1q_1,\ -p_2q_2) \le D \le \min(p_1q_2,\ p_2q_1)$$

(43)

For two loci each with two alleles, D must lie between -.25 and .25, and it can reach these extreme values only if the frequencies of the four alleles are .5.

Thus, the value of D is dependent on allele frequencies, meaning that D values for different pairs of loci are not comparable.

The value of the standardized measure, D' = D/Dextreme, where Dextreme = -Dmin if D < 0 and Dmax if D > 0, is less dependent on the allele frequencies and lies between -1 and 1.
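A small sketch of the standardized measure follows. The function name `d_prime` is an assumption, and the convention for D = 0 (returning 0) is a choice made here rather than something stated in the text.

```python
def d_prime(g11, g12, g21, g22):
    """Return (D, D') from the four gamete frequencies."""
    p1, p2 = g11 + g12, g21 + g22  # allele frequencies at the first locus
    q1, q2 = g11 + g21, g12 + g22  # allele frequencies at the second locus
    d = g11 * g22 - g12 * g21
    if d >= 0:
        d_extreme = min(p1 * q2, p2 * q1)  # Dmax
    else:
        d_extreme = min(p1 * q1, p2 * q2)  # -Dmin, where Dmin = max(-p1q1, -p2q2)
    # Convention chosen for this sketch: D' = 0 when no disequilibrium is possible.
    return d, (d / d_extreme if d_extreme > 0 else 0.0)


print(d_prime(0.5, 0.1, 0.3, 0.1))  # moderate positive association
print(d_prime(0.5, 0.0, 0.0, 0.5))  # complete association
print(d_prime(0.0, 0.5, 0.5, 0.0))  # complete association of the opposite sign
```

Dividing by the largest magnitude that D could take for the observed allele frequencies keeps the result between -1 and 1, which is what makes values from different pairs of loci more comparable.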

(44)

Linkage disequilibrium studies are particularly useful in refining the flanking markers for a disease locus in populations that grew from a small number of founders with very little migration into the population.

For example, the Usher syndrome type I locus in the Acadian population of southwestern Louisiana was localized to a small region on chromosome 11 by linkage disequilibrium analysis.

Individuals in this population are the descendants of a small group of Acadians who moved to southwestern Louisiana, when they were expelled from the Nova Scotia region of Canada by the British in 1755.

(45)

The statistic, δ, is another measure of linkage disequilibrium that is useful for estimating the location of a disease locus if a single mutation is likely.

The formula is

$$\delta = \frac{p_D - p_N}{1 - p_N}$$

where pD is the frequency of the associated allele on disease chromosomes and pN is the frequency of this allele on normal chromosomes.

This value represents an estimate of the proportion of disease chromosomes bearing the original associated allele.

If there is a single mutation, the proportion of chromosomes carrying this mutation is the same for all marker loci, so differences in δ across loci should largely represent effects of recombination.

(46)

Thus, δ can be used to determine the most likely location of the disease locus among a set of tightly linked marker loci and also to estimate the age of the mutation in the population.

Using this approach, Risch and colleagues refined the location of the idiopathic torsion dystonia locus on chromosome 9 and estimated its age to be approximately 350 years in the Ashkenazi Jewish population.

Linkage disequilibrium is one possible explanation for association between a phenotype and a marker allele in a population.

In this case, the disease locus is tightly linked to the marker locus.

(47)

Other possible reasons for association are pleiotropy (multiple effects of the same gene), such as the association between stomach cancer and the A allele of the ABO blood group, and departures from random mating due to events such as racial admixture, stratification, inbreeding, and assortative mating.

1. Suppose the frequencies of the gametes A1B1, A1B2, A2B1, A2B2 are .5, .1, .3, .1, respectively.

What is the value of D after one generation of random mating if 1) the two loci are unlinked and 2) the recombination fraction between the two loci is .01?

The value of D in the original population is (.5)(.1) - (.1)(.3) = .02.

After one generation, D = (1 - .5)(.02) = .01 if the two loci are unlinked, and D = (1 - .01)(.02) = .0198 if the recombination fraction is .01.

(48)

2. How many generations are required for the value of D to be one-half its initial value?

Dt/D0 = (1 - θ)^t = ½; therefore, t = log(½)/log(1 - θ).

Thus, for θ equal to .3, .1, .01, and .001, the numbers of generations required are approximately 2, 7, 69, and 693, respectively.
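Both answers follow from the geometric decay of D, and a brief sketch makes the dependence on θ easy to explore. The function names are illustrative.

```python
import math


def d_after(d0, theta, t):
    """Linkage disequilibrium after t generations of random mating."""
    return d0 * (1 - theta) ** t


def generations_to_fraction(theta, fraction=0.5):
    """Generations needed for D to fall to the given fraction of its initial value."""
    if not 0 < theta <= 0.5:
        raise ValueError("recombination fraction must lie in (0, 0.5]")
    return math.log(fraction) / math.log(1 - theta)


print(d_after(0.02, 0.5, 1), d_after(0.02, 0.01, 1))
for theta in (0.3, 0.1, 0.01, 0.001):
    print(theta, round(generations_to_fraction(theta)))
```

For small θ the result is close to 0.693/θ, since log(1 - θ) is approximately -θ; this is why the last two answers are near 69 and 693.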

(49)
(50)
(51)

Factors That Affect Hardy-Weinberg Equilibrium

The assumption of a large, random mating population is fundamental to Hardy-Weinberg equilibrium.

If mating is not at random, the allele frequencies at a locus (say, p and q) in the population do not change from one generation to the next, but the genotype frequencies are not p², 2pq, and q².

Evolutionary forces such as random genetic drift, mutation, selection, and migration, however, will change allele frequencies (and consequently genotype frequencies) from one generation to the next.

(52)

A. FACTORS THAT AFFECT GENOTYPE FREQUENCIES BUT NOT ALLELE FREQUENCIES

Random mating has been assumed so far in all the derivations.

If gametes do not unite at random, the genotype frequencies are not in Hardy-Weinberg proportions and cannot be derived simply from allele frequencies.

Consanguinity (inbreeding), assortative mating, and stratification (e.g., ethnic subgroups within a population) are examples of nonrandom mating.

In these situations, the frequency of homozygotes is increased at the expense of heterozygotes, and the genotype frequencies may be significantly different from Hardy-Weinberg expectations.

(53)

1. Consanguinity and Inbreeding

Individuals who are related are consanguineous, and the offspring of matings between such individuals are said to be inbred.

Inbreeding increases the frequency of homozygous genotypes and decreases the frequency of heterozygous genotypes in the population.

The offspring of consanguineous marriages have an increased risk over that of the general population of having recessive disorders.

The increase in risk depends on the population frequency of the disease allele and the degree of relationship between the parents.

(54)

In cultures in which uncle-niece and first- and second-cousin marriages are encouraged, recessive disorders that are rare in most populations may be relatively common.

However, the occurrence of a recessive disease that is relatively frequent in Caucasian populations, such as cystic fibrosis, is rarely the result of inbreeding.

(55)

The coefficient of inbreeding for a child of a consanguineous marriage is the probability that the child receives two alleles at a given locus that were both from the same ancestor and are, thus, identical by descent (autozygous).

For example, half first cousins share a grandparent in common.

The probability that a child of half first cousins is homozygous by descent at a locus is (½)⁵ = 1/32.

In general, for autosomal loci, the inbreeding coefficient for an individual is $(\tfrac{1}{2})^{n_1 + n_2 + 1}$, where n1 and n2 are the numbers of generations separating the individuals in the consanguineous mating from their common ancestor. (This formula assumes that the common ancestor is not inbred.)

Half first cousins are separated from their common grandparent by two generations.

(56)

Thus, the exponent is 2 + 2 + 1 = 5.

The table gives the estimated proportion of alleles shared by consanguineous individuals that are identical by descent as well as the coefficient of inbreeding (F) for the offspring of these consanguineous matings.

If a child is inbred through more than one line of descent, the total coefficient of inbreeding is the sum of the separate coefficients.

For example, first cousins are related through two grandparents.

Thus, the inbreeding coefficient for the offspring of first cousins is (½)⁵ + (½)⁵ = (½)⁴ = 1/16.

The coefficient of inbreeding is also an estimate of the proportion of loci at which an individual is autozygous.
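The path-counting rule can be sketched with exact fractions. The representation used here, a list of (n1, n2) pairs with one pair per common ancestor, is an assumption made for illustration, and like the formula in the text it assumes that no common ancestor is inbred.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Autosomal inbreeding coefficient of a child.

    paths: one (n1, n2) pair per common ancestor of the two parents, giving
    the number of generations from each parent back to that ancestor.
    """
    half = Fraction(1, 2)
    return sum((half ** (n1 + n2 + 1) for n1, n2 in paths), Fraction(0))


print(inbreeding_coefficient([(2, 2)]))          # half first cousins
print(inbreeding_coefficient([(2, 2), (2, 2)]))  # first cousins
print(inbreeding_coefficient([(1, 2), (1, 2)]))  # uncle-niece
print(inbreeding_coefficient([(3, 3), (3, 3)]))  # second cousins
```

Using `Fraction` keeps the results in the same form as the tabulated values (1/32, 1/16, 1/8, 1/64) rather than as rounded decimals.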

(57)

Proportion of Alleles Shared by Related Individuals That Are Identical by Descent and the Inbreeding Coefficient (F) in the Offspring of Various Types of Consanguineous Matings

Type of Mating Proportion of Shared Alleles F

Parent-offspring 1/2 1/4

Brother-sister 1/2 1/4

Half sibs 1/4 1/8

Uncle-niece, aunt-nephew 1/4 1/8

First cousins 1/8 1/16

Double first cousins 1/4 1/8

Half first cousins 1/16 1/32

First cousins once removed 1/16 1/32

Second cousins 1/32 1/64

Second cousins once removed 1/64 1/128

(58)

The coefficient of inbreeding for X-linked loci depends on the number of males in the lines of descent and is always zero for male offspring, because they have only one X chromosome.

In order to calculate the inbreeding coefficient for daughters of first cousins, four possibilities need to be considered for the first cousins: their fathers are brothers, their mothers are sisters, or the father of the male cousin and the mother of the female cousin are siblings, and vice versa.

If the fathers are brothers, the first cousins cannot share any X-linked alleles in common.

(59)

Thus, female offspring of this type of first cousin mating are not inbred for X-linked loci and have an inbreeding coefficient of zero.

Similarly, if the first cousins are offspring of a brother and a sister with the father being the son of the brother and the mother being the daughter of the sister, the inbreeding coefficient for their daughters is zero because the first cousins cannot share any X-linked alleles in common.

On the other hand, if the mothers of the first cousins are sisters, then the inbreeding coefficient for X-linked loci in their daughters is greater than that for autosomal loci because a male transmits the X chromosome he received from his mother to all his daughters.

(60)

Thus, the inbreeding coefficient in this situation is (½)3 + (½)4 = 3/16.

The fourth possibility is that the first cousins are offspring of a brother and sister, with the sister being the mother of the male and the brother being the father of the female.

(61)

2. Assortative Mating

Assortative mating is the tendency for people to choose mates who are more similar (positive) or dissimilar (negative) to themselves in phenotype characteristics than would be expected by chance.

If these characteristics are genetically determined, positive assortative mating may increase homozygosity in the population.

An important difference between inbreeding and positive assortative mating is that inbreeding affects all loci, while assortative mating affects only those that play a role in the phenotype characteristics that are similar.

(62)

Clinical examples of positive assortative matings are those between individuals who are profoundly hearing impaired or blind, which in some cases may be attributable to the same genotypes.

(63)

B. FACTORS THAT AFFECT ALLELE FREQUENCIES

Evolutionary forces such as random genetic drift, mutation, selection, and migration change the allele frequencies in a population.

Important examples of each of these forces have been documented in human populations, and they are likely to become more relevant as knowledge of the genetic structure of populations at the DNA level increases.

(64)

1. Random Genetic Drift

The Hardy-Weinberg principle assumes that population size is large, and this assumption is probably valid for many present-day populations.

However, if the population size is small, allele frequencies may change from one generation to the next by chance alone.

This change is a consequence of sampling in small populations and is called random genetic drift.

The sample is the set of gametes that contributes to the next generation.

Suppose this sample consists of 2N gametes (N individuals) and consider a locus with two alleles, A1 and A2.

(65)

The 2N + 1 possible values of the frequency of A1 are: 0, 1/2N, 2/2N, 3/2N, …, (2N - 1)/2N, 2N/2N.

The probability that the number of A1 alleles in the population is k (0 ≤ k ≤ 2N) depends on the population size and the frequencies, p and q, of A1 and A2, respectively, in the previous generation.

It may be written as

$$P(k) = \binom{2N}{k}\, p^k\, q^{2N-k}$$

(66)

Thus, if N and p are known, the probability of a particular frequency of A1 in the next generation may be calculated.

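For example, if N = 50 and p = .5, the frequency of A1 in the next generation has a standard deviation of $\sqrt{pq/2N} = .05$. Under the normal approximation, the probability that this frequency is less than .4 or greater than .6 is about .046 (.023 in each tail), while the probability that it is between .45 and .55 is about .68.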

The probability that A1 will either be lost or become fixed in the population in the next generation is extremely small but is greater than zero.

If N = 50 and p = .01, the probability that A1 will be lost in the next generation is .37, and the probability that it will have a frequency of greater than .05 is .002.
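The sampling probabilities above can be explored numerically. The following is a minimal sketch, assuming the standard binomial sampling model of 2N gametes; the function name `drift_pmf` and the variable names are illustrative, not from the source. Exact binomial values can differ somewhat from rounded or approximate figures.

```python
from math import comb

def drift_pmf(n_individuals, p):
    """Binomial distribution of the number of A1 copies among 2N sampled gametes.

    Assumes a binomial sampling model; entry k is the probability of k copies.
    """
    two_n = 2 * n_individuals
    q = 1.0 - p
    return [comb(two_n, k) * p**k * q**(two_n - k) for k in range(two_n + 1)]

# Rare allele: N = 50, p = .01
pmf_rare = drift_pmf(50, 0.01)
prob_lost = pmf_rare[0]                # no A1 copies sampled, i.e. (0.99)**100

# Common allele: N = 50, p = .5
pmf_even = drift_pmf(50, 0.5)
# Exact probability that the new frequency lies in .4 to .6 inclusive (40..60 copies)
prob_within_tenth = sum(pmf_even[40:61])

print(f"P(A1 lost | p = .01)          = {prob_lost:.3f}")
print(f"P(.4 <= freq <= .6 | p = .5)  = {prob_within_tenth:.3f}")
```

Changing `n_individuals` shows how quickly the spread of next-generation frequencies narrows as the population grows.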

The precise change in allele frequency from one generation to the next cannot be predicted because drift is a random process.

(67)

However, over a number of generations, drift can lead to the loss of some alleles from the population, with others becoming fixed.

If a large number of populations are considered, the average behavior of allele frequencies can be predicted.

The probability that a new allele in a population will eventually become fixed is 1/2N, the frequency of the allele in the population at the time it arose.

If the allele is to become fixed in the population, the average time to fixation is approximately 4N generations.
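The 1/2N fixation probability can be checked for a small population by following the full probability distribution of allele counts over many generations. This is a sketch under the assumption of binomial sampling of 2N gametes each generation; the population size (2N = 20) and the function names are chosen only for illustration.

```python
from math import comb

def transition_matrix(two_n):
    """matrix[i][j] = probability of moving from i to j copies in one generation."""
    rows = []
    for i in range(two_n + 1):
        p = i / two_n
        rows.append([comb(two_n, j) * p**j * (1 - p)**(two_n - j)
                     for j in range(two_n + 1)])
    return rows

def fixation_probability(two_n, start_copies, generations=2000):
    """Probability mass at 2N copies after many generations of pure drift."""
    matrix = transition_matrix(two_n)
    dist = [0.0] * (two_n + 1)
    dist[start_copies] = 1.0           # all populations start with the same count
    for _ in range(generations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(two_n + 1))
                for j in range(two_n + 1)]
    return dist[two_n]

two_n = 20                             # hypothetical population of N = 10
fix = fixation_probability(two_n, start_copies=1)
print(f"fixation probability of a new allele: {fix:.4f} (1/2N = {1 / two_n:.4f})")
```

Starting from more than one copy (`start_copies=k`) gives a fixation probability of k/2N under the same assumptions.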

After a large enough number of generations of random genetic drift, every allele in a population can be traced back to a single allele in the initial ancestral population.

All other alleles in the initial population will have been lost.

(68)

This concept is known as coalescence, and it has been used to model DNA sequence variation in populations.

Random genetic drift in a population is similar to inbreeding and stratification in that its effect on the population is a reduction in the number of heterozygotes and an increase in the number of homozygotes.

When the population size is drastically reduced (a bottleneck), the genetic drift is known as a founder effect.

Examples of this effect (e.g., new colonization by a small subset of a population or environmental disasters such as plague and famine) abound in history.

Founder effect explains the relatively high frequency of diseases such as Usher syndrome type I in the Acadian population of southwestern Louisiana and idiopathic

(69)

2. Mutation

When mutations occur in the germ cells, they may be passed on to the next generation.

The change in the DNA may be a single nucleotide substitution or it may involve many nucleotides, such as in the case of an insertion or deletion.

Many hemoglobinopathies are due to point mutations that cause the replacement of an amino acid (missense) and consequently an abnormal protein product.

The most common mutation causing Tay-Sachs disease is a 4-base-pair (bp) insertion (frameshift), whereas the ΔF508 mutation in the cystic fibrosis gene is a 3-bp deletion.

The source of genetic variation in a population is mutation.

(70)

Mutation rates in humans have been estimated to be of the order $10^{-4}$ to $10^{-6}$ per gene per generation.

The rate of nucleotide substitutions is estimated to be one in $10^8$ nucleotides per generation, implying that about 30 new nucleotide mutations would be expected in each human gamete, which carries roughly $3 \times 10^9$ nucleotides. Most new mutations are lost due to chance.

However, new mutations arise in each generation, and some become established in the population.

Suppose μ is the mutation rate from A1 to A2 per generation.

If the frequencies of A1 and A2 are $p_t$ and $q_t$, respectively, in generation t, then in the (t+1)th generation the frequency of A2 is:

$q_{t+1} = q_t + \mu p_t = \mu + (1 - \mu)q_t$

(71)

Similarly, $q_t = \mu + (1 - \mu)q_{t-1}$, $q_{t-1} = \mu + (1 - \mu)q_{t-2}$, and so forth.

By substitution, $q_t$ may be written in terms of $q_0$, the frequency of A2 in the initial generation:

$q_t = 1 - (1 - \mu)^t (1 - q_0)$

Because μ is very small, $(1 - \mu)^t$ is approximately equal to $e^{-t\mu}$.

Thus, the number of generations required to change the frequency of A2 from $q_0$ to $q_t$, $t \approx \frac{1}{\mu}\ln\frac{1 - q_0}{1 - q_t}$, is inversely proportional to the mutation rate.

Also note that as t gets larger and larger, $q_t$ gets closer and closer to 1.

In other words, if mutation from A1 to A2 is the only force acting to change the allele frequencies, then A2 will eventually become fixed in the population.

The change in allele frequency from one generation to the next is $q_{t+1} - q_t = \mu(1 - q_t)$, meaning that the change in allele frequency is greater for smaller frequencies of A2.

(72)

So far we have considered mutation in only one direction.

Now suppose the mutation rate from A1 to A2 is μ and the reverse rate from A2 to A1 is ν.

Then the change in the frequency of A2 per generation is μp - νq, and equilibrium is reached when this change is equal to zero.

Thus, the equilibrium frequencies are p = ν/(μ + ν) and q = μ/(μ + ν).

This equilibrium is stable, meaning that if the frequencies are disturbed, they will eventually return to their equilibrium values as long as no other forces are affecting them.

(73)

Mutation rates have been estimated for a number of autosomal dominant disorders, such as neurofibromatosis type I, which has the high rate of $10^{-4}$, and tuberous sclerosis, with a rate of about $10^{-5}$.

Some of these disorders (e.g., achondroplasia, for which the mutation rate is estimated to be $10^{-5}$) have reduced fitness, which is discussed in the next section.

(74)

1. How many generations will be required to change the frequency of A2 a) from .1 to .2, b) from .8 to .9, if the mutation rate from A1 to A2 is $10^{-4}$?

The number of generations is $t = \frac{1}{\mu}\ln\frac{1 - q_0}{1 - q_t}$.

Therefore, for a mutation rate of $10^{-4}$, 1178 generations are required, whereas for a mutation rate of $10^{-5}$, about 11,780 generations are required to change the frequency of A2 from .1 to .2.

On the other hand, to change the frequency from .8 to .9 requires 6932 generations if the mutation rate is $10^{-4}$ and 69,315 generations if the mutation rate is $10^{-5}$.
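The arithmetic in this problem can be reproduced with a short script. This is a sketch that assumes the small-μ approximation $(1 - \mu)^t \approx e^{-t\mu}$ used in the text; the function names are illustrative.

```python
from math import log

def generations_needed(q0, qt, mu):
    """Generations for one-way mutation to move the A2 frequency from q0 to qt.

    Uses t = ln((1 - q0) / (1 - qt)) / mu, the small-mu approximation.
    """
    return log((1.0 - q0) / (1.0 - qt)) / mu

def iterate_mutation(q0, mu, generations):
    """Apply the exact recurrence q' = mu + (1 - mu) * q generation by generation."""
    q = q0
    for _ in range(generations):
        q = mu + (1.0 - mu) * q
    return q

t_low = generations_needed(0.1, 0.2, 1e-4)    # part a
t_high = generations_needed(0.8, 0.9, 1e-4)   # part b
q_check = iterate_mutation(0.1, 1e-4, round(t_low))

print(f"from .1 to .2: about {t_low:.0f} generations")
print(f"from .8 to .9: about {t_high:.0f} generations")
print(f"recurrence check after {round(t_low)} generations: q = {q_check:.4f}")
```

Dividing μ by 10 multiplies both answers by 10, which is the inverse proportionality noted earlier.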

(75)

2. Suppose the mutation rate from A1 to A2 is $10^{-4}$ and the reverse rate is $10^{-5}$.

What is the equilibrium frequency of A1?

The equilibrium frequency of A1 is $10^{-5}/(10^{-4} + 10^{-5}) = .091$.

However, to reach this equilibrium frequency may take tens of thousands of generations, depending on the initial allele frequencies.
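The equilibrium under forward and reverse mutation, and the slow approach to it, can be illustrated as follows. This is a sketch assuming mutation is the only force acting; the starting frequency of .5 and the function names are hypothetical choices.

```python
def equilibrium(mu, nu):
    """Return (p_hat, q_hat) for forward rate mu (A1 -> A2) and reverse rate nu."""
    total = mu + nu
    return nu / total, mu / total

def iterate_two_way(q0, mu, nu, generations):
    """Apply q' = q + mu * p - nu * q for the given number of generations."""
    q = q0
    for _ in range(generations):
        q = q + mu * (1.0 - q) - nu * q
    return q

mu, nu = 1e-4, 1e-5
p_hat, q_hat = equilibrium(mu, nu)
q_10k = iterate_two_way(0.5, mu, nu, 10_000)     # still far from equilibrium
q_100k = iterate_two_way(0.5, mu, nu, 100_000)   # essentially at equilibrium

print(f"equilibrium: p = {p_hat:.3f}, q = {q_hat:.3f}")
print(f"q after 10,000 generations:  {q_10k:.3f}")
print(f"q after 100,000 generations: {q_100k:.3f}")
```

The deviation from equilibrium shrinks by a factor of (1 - μ - ν) each generation, which is why so many generations are needed.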

(76)

1. SELECTION

2. MIGRATION

(77)

Natural Selection

We have noted that the Hardy-Weinberg law allows us to estimate allele and genotype frequencies in ideal populations, that is, in populations described by the assumptions of equal viability and fertility, absence of mutation, absence of migration, sufficient size, and random mating.

Obviously, it is almost impossible to find natural populations in which all these assumptions hold for all loci.

In nature, populations are dynamic, and changes in size and gene pool are common. However, an understanding of the Hardy-Weinberg law allows us to investigate populations that vary from the ideal.

(78)

In this and the following sections, we discuss factors that prevent populations from reaching Hardy-Weinberg equilibrium, or that drive populations toward a different equilibrium, and the relative contributions of these factors to evolutionary change.

The first assumption of the Hardy-Weinberg law is that individuals of all genotypes have equal survival rates and equal reproductive success.

If this assumption does not hold, allele frequencies may change from one generation to the next.

To see why, let’s imagine a population of 100 individuals in which allele A has a frequency of 0.5 and allele a has a frequency of 0.5.

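Assuming the previous generation mated randomly, we find that the genotype frequencies in the present generation are $p^2 = 0.25$ for AA, $2pq = 0.5$ for Aa, and $q^2 = 0.25$ for aa.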

(79)

Because our population contains 100 individuals, we have 25 AA individuals, 50 Aa individuals, and 25 aa individuals.

Now suppose that individuals with different genotypes have different rates of survival: all 25 AA individuals survive to reproduce; 90 percent, or 45, of the Aa individuals survive to reproduce; and 80 percent, or 20, of the aa individuals survive to reproduce.

When the survivors reproduce, each contributes two gametes to the new gene pool, giving us 2(25) + 2(45) + 2(20) = 180 gametes.

What are the frequencies of the two alleles in the surviving population?

We have 50 A gametes from AA individuals, plus 45 A gametes from Aa individuals, so the frequency of allele A is (50 + 45)/180 = 0.53.

We have 45 a gametes from Aa individuals, plus 40 a gametes from aa individuals, so the frequency of allele a is (45 + 40)/180 = 0.47.

(80)

These frequencies differ from those we started with. Allele A has increased, while allele a has declined.
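The gamete counting in this example can be written out as a short calculation. This is a sketch using the genotype counts and survival rates from the example; the function name and dictionary layout are illustrative.

```python
def allele_freqs_after_survival(counts, survival):
    """Allele frequencies among gametes of survivors.

    counts and survival are keyed by genotype: "AA", "Aa", "aa".
    Each survivor contributes two gametes to the new gene pool.
    """
    survivors = {g: counts[g] * survival[g] for g in counts}
    gametes = 2 * sum(survivors.values())
    big_a = 2 * survivors["AA"] + survivors["Aa"]      # A gametes
    small_a = 2 * survivors["aa"] + survivors["Aa"]    # a gametes
    return big_a / gametes, small_a / gametes

freq_A, freq_a = allele_freqs_after_survival(
    counts={"AA": 25, "Aa": 50, "aa": 25},
    survival={"AA": 1.0, "Aa": 0.9, "aa": 0.8},
)
print(f"freq(A) = {freq_A:.2f}, freq(a) = {freq_a:.2f}")
```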

A nonrandom difference among individuals of different genotypes in survival or reproduction rate or both is called natural selection.

Natural selection is the principal force that shifts allele frequencies within large populations and is one of the most important factors in evolutionary change.

(81)

1. Fitness and Selection

Selection occurs whenever individuals with a particular genotype enjoy an advantage in survival or reproduction over other genotypes.

However, the strength of selection may vary from less than 1 percent to 100 percent (the latter indicating a lethal allele).

In the previous hypothetical example, selection was strong.

Weak selection might involve just a fraction of a percent difference in the survival rates of different genotypes.

Advantages in survival and reproduction ultimately translate into increased genetic contribution to future generations.

(82)

An individual organism’s genetic contribution to future generations is called its fitness. Genotypes associated with high rates of reproductive success are said to have high fitness, whereas genotypes associated with low reproductive success are said to have low fitness.

The fitness of an individual is defined as its ability to survive and reproduce.

Natural selection is the process by which genotypes with greater fitness increase in frequency in the population.

It acts to decrease the frequencies of the less fit genotypes.

The relative fitness is defined as 1 - s, where s is the selection coefficient against the deleterious genotype.

(83)

Thus, the most fit genotype has a relative fitness of 1 (and a selection coefficient of 0). Consider the situation where there are three genotypes, A1A1, A1A2, A2A2, at a locus with relative fitnesses of 1, 1, 1 - s, respectively.

That is, there is selection against the A2A2 homozygote. (If s = 1 the selection is complete, meaning that individuals with the A2A2 genotype do not reproduce.)

(84)

Table A shows the change in allele frequencies from one generation to the next.

In the case in which s = 1, the frequencies after t generations can be written in terms of the initial allele frequencies.

Substituting in the formula given in Table A shows that when the A2A2 homozygote does not reproduce, the number of generations required to reduce the frequency of A2 to one half its initial value is equal to the reciprocal of its initial value.

Thus, if the frequency of A2 is .01, it will take 100 generations of complete selection against the A2A2 homozygote to reduce the frequency of A2 to .005.
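The halving time can be confirmed by iterating the one-generation result for complete selection against A2A2, where the frequency q of A2 becomes q/(1 + q). This is a minimal sketch; the function names are illustrative.

```python
def lethal_recessive_step(q):
    """One generation of complete selection (s = 1) against A2A2: q' = q / (1 + q)."""
    return q / (1.0 + q)

def closed_form(q0, t):
    """Frequency of A2 after t generations of complete selection: q0 / (1 + t * q0)."""
    return q0 / (1.0 + t * q0)

q0 = 0.01
q = q0
for _ in range(100):                   # 1 / q0 = 100 generations
    q = lethal_recessive_step(q)

print(f"after 100 generations: q = {q:.4f}")
print(f"closed form:           q = {closed_form(q0, 100):.4f}")
```

The rarer the allele, the longer the halving time, because most copies sit in heterozygotes that selection does not touch.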

(85)

In other words, lack of reproduction of individuals with a rare recessive disease does not lead to a rapid reduction in the frequency of the deleterious allele from one generation to the next.

Hardy-Weinberg analysis also allows us to examine fitness.

By convention, population geneticists use the letter w to represent fitness.

Thus, $w_{AA}$ represents the relative fitness of genotype AA, $w_{Aa}$ the relative fitness of genotype Aa, and $w_{aa}$ the relative fitness of genotype aa.

Assigning the values $w_{AA} = 1$, $w_{Aa} = 0.9$, and $w_{aa} = 0.8$ would mean, for example, that all AA individuals survive, 90 percent of the Aa individuals survive, and 80 percent of the aa individuals survive, as in the previous hypothetical case.

Let’s consider selection against deleterious alleles.

(86)

Fitness values $w_{AA} = 1$, $w_{Aa} = 1$, and $w_{aa} = 0$ describe a situation in which a is a homozygous lethal allele.

As homozygous recessive individuals die without leaving offspring, the frequency of allele a will decline.

The decline in the frequency of allele a is described by the equation

$q_g = \frac{q_0}{1 + g q_0}$

where $q_g$ is the frequency of allele a in generation g, $q_0$ is the starting frequency of a (i.e., the frequency of a in generation zero), and g is the number of generations that have passed.

(87)

Table A. Selection Against the A2A2 Genotype at an Autosomal Locus

| Genotype | A1A1 | A1A2 | A2A2 |
|---|---|---|---|
| Frequency before selection | $p^2$ | $2pq$ | $q^2$ |
| Relative fitness | 1 | 1 | $1 - s$ |
| Frequency after selection | $p^2$ | $2pq$ | $q^2(1 - s)$ |

After one generation of selection:

Freq. A1 = $(p^2 + pq)/[p^2 + 2pq + q^2(1 - s)] = p/(1 - sq^2)$

Freq. A2 = $[pq + (1 - s)q^2]/(1 - sq^2)$

If s = 1 (that is, complete selection against the A2A2 genotype), then:

After one generation of selection: Freq. A1 = $(p^2 + pq)/(p^2 + 2pq) = 1/(1 + q)$; Freq. A2 = $pq/(p^2 + 2pq) = q/(1 + q)$

After t generations of selection: Freq. A1 = $[1 + (t - 1)q]/(1 + tq)$; Freq. A2 = $q/(1 + tq)$

(88)

Now consider the situation in which there is partial selection against the A2A2 genotype.

The allele frequencies after t generations cannot be written in terms of the initial frequencies, but the decrease in the frequency of the A2 allele from one generation to the next can be calculated.

This decrease is equal to $sq^2(1 - q)/(1 - sq^2)$, and the number of generations required to change the frequency of A2 from its initial value to a new value can be approximated.

For example, if s = .001 and the initial frequency of A2 is .01, more than 100,000 generations will be required to reduce the frequency to .005.

This example makes the point that even if the selective disadvantage of a genotype is very small, the allele frequencies in the population will gradually change.

(89)

Change in the frequency of a lethal recessive allele, a. The frequency of a is halved in two generations and halved again by the sixth generation. Subsequent reductions occur slowly because the majority of a alleles are carried by heterozygotes.

(90)

For the same selection coefficient (s = .001), 11,665 generations are required to reduce the frequency of A2 from .7 to .1.

If there is selection against the heterozygous genotype (A1A2) as well as the A2A2 genotype, with s = .001 for A2A2 and s = .0005 for A1A2, then 6156 generations are required to reduce the frequency of A2 from .7 to .1.
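Generation counts of this kind can be estimated by iterating the one-generation change in allele frequency until the target is reached. The sketch below assumes the standard viability-selection recurrence for three genotypes with relative fitnesses w11, w12, and w22; the function names are illustrative, and the counts it produces should fall within a few percent of the figures quoted in the text rather than match them exactly.

```python
def next_q(q, w11, w12, w22):
    """Frequency of A2 after one generation of selection with the given fitnesses."""
    p = 1.0 - q
    mean_w = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (p * q * w12 + q * q * w22) / mean_w

def generations_to_reach(q0, q_target, w11, w12, w22, max_gen=1_000_000):
    """Count generations until the A2 frequency falls to q_target or below."""
    q, t = q0, 0
    while q > q_target and t < max_gen:
        q = next_q(q, w11, w12, w22)
        t += 1
    return t

s = 0.001
# Selection against A2A2 only (recessive deleterious allele)
t_rare = generations_to_reach(0.01, 0.005, 1.0, 1.0, 1.0 - s)
t_common = generations_to_reach(0.7, 0.1, 1.0, 1.0, 1.0 - s)
# Selection against A1A2 (s/2) as well as A2A2 (s)
t_partial_dom = generations_to_reach(0.7, 0.1, 1.0, 1.0 - s / 2, 1.0 - s)

print(f".01 -> .005, recessive:        {t_rare} generations")
print(f".7  -> .1,   recessive:        {t_common} generations")
print(f".7  -> .1,   heterozygote too: {t_partial_dom} generations")
```

Selection on heterozygotes roughly halves the time here because most copies of A2 are exposed to selection instead of being hidden in carriers.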

In the case in which there is selection favoring the heterozygote over both homozygotes, an equilibrium state is reached for the allele frequencies.

Table B shows the change in allele frequencies from one generation to the next. At equilibrium, $s_1 p = s_2 q$, so that $p = s_2/(s_1 + s_2)$ and $q = s_1/(s_1 + s_2)$.

(91)

This equilibrium is stable and is called a balanced polymorphism. This type of selection is known as overdominance.

If, on the other hand, selection is against the heterozygote, the equilibrium is unstable, and the selection is known as underdominance.

The equilibrium frequencies are the same, but if a disturbance occurs such that $q > s_1/(s_1 + s_2)$, q will increase further rather than returning to its equilibrium value.

The reverse is also true, so eventually one allele or the other will be eliminated.
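The stability of the equilibrium under heterozygote advantage can be seen by starting from frequencies on either side of it. This is a sketch assuming relative fitnesses of 1 - s1, 1, and 1 - s2 for A1A1, A1A2, and A2A2; the selection coefficients below are hypothetical values chosen only for illustration.

```python
def next_q_overdominance(q, s1, s2):
    """Frequency of A2 after one generation with fitnesses 1 - s1, 1, 1 - s2."""
    p = 1.0 - q
    mean_w = 1.0 - s1 * p * p - s2 * q * q
    return (q - s2 * q * q) / mean_w

def run(q0, s1, s2, generations):
    """Iterate the recurrence for the given number of generations."""
    q = q0
    for _ in range(generations):
        q = next_q_overdominance(q, s1, s2)
    return q

s1, s2 = 0.1, 0.3                      # hypothetical selection coefficients
q_hat = s1 / (s1 + s2)                 # predicted equilibrium frequency of A2

q_from_high = run(0.9, s1, s2, 2000)
q_from_low = run(0.01, s1, s2, 2000)

print(f"predicted equilibrium q = {q_hat:.3f}")
print(f"starting at .90: q = {q_from_high:.3f}")
print(f"starting at .01: q = {q_from_low:.3f}")
```

Both starting points converge on the same value, which is what distinguishes a stable equilibrium from the unstable one under selection against the heterozygote.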

(92)

Table B. Selection Favoring the Heterozygous Genotype at an Autosomal Locus

| Genotype | A1A1 | A1A2 | A2A2 |
|---|---|---|---|
| Frequency before selection | $p^2$ | $2pq$ | $q^2$ |
| Relative fitness | $1 - s_1$ | 1 | $1 - s_2$ |
| Frequency after selection | $p^2(1 - s_1)$ | $2pq$ | $q^2(1 - s_2)$ |

After one generation of selection:

Freq. A1 = $(p - s_1p^2)/(1 - s_1p^2 - s_2q^2)$

Freq. A2 = $(q - s_2q^2)/(1 - s_1p^2 - s_2q^2)$

The change in the frequency of A2 from one generation to the next is $pq(s_1p - s_2q)/(1 - s_1p^2 - s_2q^2)$. Equating this quantity to zero gives the equilibrium allele frequencies, which are

Freq. A1 = $s_2/(s_1 + s_2)$; Freq. A2 = $s_1/(s_1 + s_2)$

(93)

Let us now consider a balance between mutation and selection.

Suppose the mutation rate from A1 to A2 is μ, and the relative fitnesses of the genotypes A1A1, A1A2, A2A2, are 1, 1, 1 - s, respectively.

As shown in Table A, the frequency of A1 after selection is $p/(1 - sq^2)$.

Thus, the increase in frequency of A2 due to mutation from A1 to A2 is $\mu p/(1 - sq^2)$, while the decrease due to selection is $sq^2(1 - q)/(1 - sq^2)$.

At equilibrium, $\mu p/(1 - sq^2) = sq^2(1 - q)/(1 - sq^2)$, which simplifies to $q = \sqrt{\mu/s}$ because $p = 1 - q$.

This equilibrium is stable, and $q = \sqrt{\mu}$ when s = 1. Thus, for a lethal recessive disease and a mutation rate of $10^{-6}$, the equilibrium frequency of the deleterious allele is 1/1000.
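The balance between mutation and selection can be checked by alternating the two steps each generation and watching where the frequency settles. This is a sketch that assumes selection acts first and mutation then converts a fraction μ of the remaining A1 alleles; the starting frequencies and function names are illustrative.

```python
from math import sqrt

def next_q(q, s, mu):
    """One generation: selection against A2A2 (fitness 1 - s), then A1 -> A2 mutation."""
    q_sel = q * (1.0 - s * q) / (1.0 - s * q * q)   # A2 frequency after selection
    return q_sel + mu * (1.0 - q_sel)               # mutation adds mu * p

def run(q0, s, mu, generations):
    """Iterate selection plus mutation for the given number of generations."""
    q = q0
    for _ in range(generations):
        q = next_q(q, s, mu)
    return q

s, mu = 1.0, 1e-6                      # lethal recessive, mutation rate from the text
q_predicted = sqrt(mu / s)
q_from_above = run(0.01, s, mu, 20_000)
q_from_below = run(0.0, s, mu, 20_000)

print(f"predicted equilibrium: {q_predicted:.6f}")
print(f"starting at .01:       {q_from_above:.6f}")
print(f"starting at 0:         {q_from_below:.6f}")
```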

(94)

In the case of a deleterious dominant phenotype, the fitness of both the homozygote and the heterozygote is reduced.

With relative fitnesses of 1 - s, 1 - s, and 1, the increase in the frequency of the deleterious allele A1 due to mutation is equal to the decrease due to selection when q is approximately μ/s, where q here denotes the frequency of the deleterious allele; this reduces to q = μ for s = 1.

If individuals with a dominant disease do not reproduce, the frequency of the deleterious allele in the next generation is equal to the mutation rate.

Examples of such disorders are atelosteogenesis and thanatophoric dysplasia, which are both lethal forms of short-limbed dwarfism.

In the case of achondroplasia, fitness is not zero, but it is considerably lower than one, and is estimated to be about .2.

(95)

Thus, the equilibrium frequency of the deleterious allele is $10^{-5}/.8 = 1.25 \times 10^{-5}$, or slightly higher than the mutation rate.

(96)

Table C. X-Linked Locus: Selection Against the A2A2 Genotype in Females and the A2 Genotype in Males

Females

| Genotype | A1A1 | A1A2 | A2A2 |
|---|---|---|---|
| Frequency before selection | $p^2$ | $2pq$ | $q^2$ |
| Relative fitness | 1 | 1 | $1 - s$ |
| Frequency after selection | $p^2$ | $2pq$ | $q^2(1 - s)$ |

Males

| Genotype | A1 | A2 |
|---|---|---|
| Frequency before selection | $p$ | $q$ |
| Relative fitness | 1 | $1 - s$ |
| Frequency after selection | $p$ | $q(1 - s)$ |

(97)

Selection against genotypes at loci on the X chromosome needs to be tabulated separately for males and females because males have only one allele at an X-linked locus.

Table C shows the case in which the A2A2 genotype and the A2 genotype are selected against in females and males, respectively.

The decrease in frequency of A2 due to reduced fitness in females is extremely small compared with the decrease due to reduced fitness in males.

Thus, we will only consider males.

The loss of A2 alleles is equal to sq(1 - q)/(1 - sq), which is q if s = 1.

(98)

In other words, if selection is complete, all male A2 alleles are lost in each generation. Because males have only one allele and females have two, this loss represents one third of the A2 alleles in the population.

If the mutation rate from A1 to A2 is μ, the increase in frequency of A2 due to mutation in males is $\mu p/(1 - sq)$.

But mutation in males represents only one third of the mutations that are occurring in the population.

Thus, increase in frequency due to mutation balances decrease due to selection when $3\mu p/(1 - sq) = sq(1 - q)/(1 - sq)$, which reduces to $q = 3\mu/s$.

(99)

For an X-linked recessive lethal, s = 1, and μ = q/3.

In other words, one third of the deleterious alleles in the population, and, thus, one third of cases of diseases such as Duchenne muscular dystrophy, are new mutations.

In less severe X-linked disorders, the proportion of cases that are new mutations is not as high; for example, the relative fitness of individuals with hemophilia A is about 70%, so s = .3.

Therefore, μ = sq/3 = .3q/3 = .1q, meaning that about 10% of cases are new mutations.
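The relationship μ = sq/3 means that the fraction of X-linked recessive disease alleles that are new mutations is s/3. A small sketch, with an illustrative function name, makes the two cases above explicit:

```python
def new_mutation_fraction(relative_fitness):
    """Fraction of deleterious X-linked alleles that are new mutations: mu / q = s / 3."""
    s = 1.0 - relative_fitness         # selection coefficient against affected males
    return s / 3.0

lethal = new_mutation_fraction(0.0)        # X-linked recessive lethal, s = 1
hemophilia_a = new_mutation_fraction(0.7)  # relative fitness of about 70%

print(f"lethal disorder: {lethal:.2f} of cases are new mutations")
print(f"hemophilia A:    {hemophilia_a:.2f} of cases are new mutations")
```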

Of course, during the initial years of the AIDS epidemic when blood was not being screened for HIV, the relative fitness of hemophiliacs was considerably lower than 70%.

The effect of this transient reduction on the frequency of hemophilia A will be seen in future generations.

(100)

The effect of selection on allele frequency. The rate at which a deleterious allele is removed from a population depends heavily on the strength of selection.

(101)

2. Migration (Gene Flow)

Migration introduces new alleles into the population and, like mutation, increases heterozygosity.

In general, migration rates tend to be much higher than mutation rates, so migration is much more effective than mutation in counterbalancing the effects of genetic drift.

Comparison of alleles in different ethnic groups demonstrates the contribution of gene flow to the current population gene pools.

For example, the most common mutations in phenylalanine hydroxylase that cause phenylketonuria (PKU) are likely to be of Celtic origin.

(102)

These same mutations have been found in many different populations throughout the world, reflecting the migrations of the Celts.

As an example of the effect of gene flow on allele frequencies, suppose the frequency of an allele at an autosomal locus is 40% in one population and is not present in another population of equal size.

Then the frequency of homozygotes for this allele is 16% in the first population and zero in the second.

If these two populations fuse and undergo random mating, the frequency of the allele in the combined population is (.4 + 0)/2 = .2, and the frequency of homozygotes is 4%.

(103)

The frequency of homozygotes in this combined population is considerably smaller than the average of the two populations (8%).

This example illustrates the reduction in homozygosity and consequently the increase in heterozygosity that results from gene flow between populations.

In general, suppose two populations of equal size have allele frequencies of $p_1$ and $q_1$, and $p_2$ and $q_2$.

Then the frequency of heterozygotes in the mixed population is $p_1q_1 + p_2q_2$. After one generation of random mating, the frequency of heterozygotes is $p_1q_1 + p_2q_2 + \tfrac{1}{2}(p_1 - p_2)^2$, which is greater than $p_1q_1 + p_2q_2$ whenever $p_1 \neq p_2$.

[Note that $(p_1 - p_2)^2 = (q_1 - q_2)^2$ and the decrease in each of the two homozygote frequencies is $\tfrac{1}{4}(p_1 - p_2)^2$.]
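The gain in heterozygosity after two populations fuse can be verified directly. This is a sketch assuming two populations of equal size that mate at random after fusion; the function name is illustrative, and the inputs are the .4 and 0 allele frequencies from the example above.

```python
def fuse(p1, p2):
    """Heterozygote frequency before and after random mating in an equal-size mixture."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    het_before = p1 * q1 + p2 * q2     # average of 2*p1*q1 and 2*p2*q2
    p_bar = (p1 + p2) / 2.0            # allele frequency in the fused population
    het_after = 2.0 * p_bar * (1.0 - p_bar)
    return het_before, het_after

het_before, het_after = fuse(0.4, 0.0)
gain = het_after - het_before

print(f"heterozygotes before random mating: {het_before:.2f}")
print(f"heterozygotes after random mating:  {het_after:.2f}")
print(f"gain: {gain:.2f}  (half of (p1 - p2)^2 = {0.5 * 0.4 ** 2:.2f})")
```

The gain of .08 is matched by a drop of .04 in each homozygote class, consistent with the fall in homozygotes for the allele from 8% to 4% in the example.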

(104)
(105)

ETHNIC DIVERSITY OF DISEASE ALLELES

The existence of different disease alleles among ethnic groups is significant both for understanding the origins of the disease in a population and for estimating recurrence risks that will depend on ethnicity. Several examples show the benefit of applying population history to medical genetics.

The thalassemias have relatively high frequencies in many different populations, presumably because heterozygotes have a selective advantage (increased relative fitness) over homozygotes in resisting malaria.

(106)

Mutations in the genes encoding both the α-chain (chromosome 16) and the β-chain (chromosome 11) of hemoglobin cause thalassemia.

Most of the β-thalassemia mutations are single base-pair substitutions as opposed to the α-thalassemias, in which complete genes are deleted.

More than 80 β-chain point mutations that cause β-thalassemia have been described. These mutations have a wide ethnic distribution, with several different common alleles found in Mediterranean, African, and Southeast Asian populations.

(107)

The most common mutation in the hexosaminidase-A α-subunit gene (chromosome 15) causing Tay-Sachs disease in the Ashkenazi Jewish population is a 4-bp insertion in exon 11.

It is found in 80% of mutant alleles in this population but in less than 20% in other populations.

Three alleles account for 99% of mutations in the Ashkenazi Jewish population. The frequency of Tay-Sachs alleles is also relatively high in the French Canadian population, in which two different mutant alleles have been described.

Members of several Acadian families in southwestern Louisiana were found with Tay-Sachs disease; in 11 of 12 disease alleles, the mutation was the 4-bp insertion that is the most frequent mutant allele in the Ashkenazi Jewish population.

(108)

Other diseases, such as gyrate atrophy and familial hypercholesterolemia, show similar ethnic diversity in the distribution of mutant alleles.

Founder effect (random genetic drift), selection, and gene flow determine the frequencies of mutations in different populations.

(109)

EVOLUTIONARY PATTERNS

Various types of DNA marker alleles have been analyzed in studies of population structure and evolution, and before the introduction of DNA markers, similar studies were done using blood group and protein polymorphisms.

The evolutionary tree derived from these studies suggested four major groupings, consisting of Africa, Europe/Asia, Americas, and Australia/New Guinea, with the most likely origin being the African branch.

More recently, detailed analyses of mitochondrial DNA (mtDNA) and Y-chromosome haplogroups have extended these findings and confirmed that contemporary populations are largely the descendants of people who migrated out of Africa about 50,000 years

(110)

For example, the mtDNA and Y haplogroups found in southeastern Asia and Australia are distinct from those in the rest of Asia and Europe.

This variation is most likely the result of random genetic drift and northern and southern migrations at different times.

The Americas were the last continents to be colonized, and as would be anticipated, most Native American Y chromosomes belong to a single haplogroup.

Interestingly, the results of voyages by Europeans to the Americas and Oceania in the past 500 years are clearly revealed through mtDNA and Y-chromosome analyses.

In these populations, European Y-chromosome haplogroups are relatively common, while the mtDNA haplogroups are those of the indigenous population.

(111)

GENOME VARIATION

The advances of the Human Genome Project have renewed appreciation and interest in the study of naturally occurring variation in the human genome.

About 90% of human DNA variation is due to single nucleotide base changes.

On average, a single base-pair difference between two human genomes is observed every 1000 base pairs.

But the odds of finding a difference may be as much as 100-fold greater in some regions of the genome than in others.

Single nucleotide polymorphisms (SNPs) are defined as loci with alleles that differ at a single base, with the rarer allele having a frequency of at least 1% in a random set of individuals in a population.
