A estimates of 9^ - The indirect estimation of mutation rates in man

model, it is important to notice that the infinite sites model without recombination is identical to the infinite alleles model. Besides, for alleles with low frequency the expected number of different alleles in a sample approximates well the number of different segregating sites (Nei, 1977). Any difference in the results using these two models must, therefore, be related to the different aspects of allelic data used as input.

For the step-wise mutation model, the form of equation (2.29) is given as

$$E(k) = \sum_{j=1}^{2n} \binom{2n}{j} \frac{B(B' + j,\; \theta + 2n - j)}{B(\theta,\; B' + 1)} \tag{2.39}$$

where $B(\cdot,\cdot)$ is the beta function and $B'$ is as given in (2.2). This expression is practically identical to (2.29) for $\theta < 0.01$, as seen in extensive numerical computations by Chakraborty et al. (1980).

2.4.1.1.3. Number of singletons ($k_s$).

An exact solution of (2.19) may be obtained for singletons, or single copy alleles, in the sample ($k_s$). By taking the binomial sampling equation for $j = 1$, we get

$$E(k_s) = 2n \int_0^1 x(1-x)^{2n-1}\,\Phi(x)\,dx = 2n\theta \int_0^1 (1-x)^{2n+\theta-2}\,dx = \frac{2n\theta}{2n+\theta-1} \tag{2.40}$$

which yields a method of moments estimator, $\hat{\theta}_s$, as

$$\hat{\theta}_s = \frac{k_s(2n-1)}{2n-k_s} \tag{2.41}$$

where $k_s$ is the observed number of singletons per locus in the sample. For the infinite sites model,

° l k s k s /Zn (2.42)

■k

where k is the proportion of sites segregating as singletons. The equation (2.42) is quite similar to (2.41) for large 2n.
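The singleton relations can be checked numerically. The following is a minimal sketch, assuming the closed forms E(k_s) = 2nθ/(2n+θ-1) for (2.40) and θ̂_s = k_s(2n-1)/(2n-k_s) for (2.41); the function names and the sample size used in the demonstration are illustrative choices, not taken from the text.

```python
def expected_singletons(theta: float, two_n: int) -> float:
    """Expected number of singletons per locus, assuming the form of (2.40)."""
    return two_n * theta / (two_n + theta - 1.0)


def theta_from_singletons(k_s: float, two_n: int) -> float:
    """Method of moments estimator, assuming the form of (2.41)."""
    if not 0.0 <= k_s < two_n:
        raise ValueError("k_s must satisfy 0 <= k_s < 2n")
    return k_s * (two_n - 1.0) / (two_n - k_s)


if __name__ == "__main__":
    two_n = 200  # hypothetical sample of 100 diploid individuals
    for theta in (0.01, 0.1, 1.0):
        k_s = expected_singletons(theta, two_n)
        # Inverting (2.40) through (2.41) should return the starting theta.
        print(theta, k_s, theta_from_singletons(k_s, two_n))
```

Because (2.41) is the algebraic inverse of (2.40), feeding the expected singleton count back into the estimator recovers θ exactly. For small k_s and large 2n the estimator is close to k_s itself.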

The above solutions of the sampling equations have been given on the assumption of sampling from an infinite population with random mating, without replacement. This is not approached in actual situations. While the sampling procedures for finite populations will be taken up in the next section, it may be relevant to include the role of inbreeding in recovering the number of alleles in the population.

Templeton (1980) has given the appropriate expected number of neutral alleles under the infinite

model, as E (k I a) = £l-[l-x] 4'+ax(l-x) ] 0 2n-l 2n-j = E(k j a=0) - 0 E (x)dx f (2 n + 1 ) T (2 n + j + 0) j =0 2n-j r(j+l)r(2n+0) (2.43)

where $E(k \mid \alpha = 0)$ is equation (2.11); $2n$, $\Gamma$ and $\theta$ are already defined. It is readily seen from (2.43) that the mean number of alleles recovered in a sample decreases with increase in $\alpha$, the inbreeding coefficient of the population. No simple expression for $\theta$ in terms of $E(k \mid \alpha)$, leading to an estimator based on $k$, is, however, available.

In an earlier section it was pointed out that $\hat{\theta}_F$ is a biased estimator of $\theta$. For $0.6 < \theta < 2.0$, Ewens and Gillespie (1974) show, by simulation, that the mean value of $\hat{\theta}_F$ is rather consistently about 40% or more in excess of $\theta$. Although $\hat{\theta}_k$ is also a biased estimator (no unbiased estimator of $\theta$ exists; Ewens, 1979a), its bias decreases to zero asymptotically. For $2n = 200$, the bias is negligible (Ewens, 1979a). Furthermore, $\hat{\theta}_k$ has a very small mean square error, typically 1/7th or 1/8th of that of $\hat{\theta}_F$. In the context of strict neutrality there appears no excuse for estimating $\theta$ through $F$.

Under generalized neutrality, however, the above comparison does not hold universally for large values of $\alpha'' = 2Ns$, where $s$ is the selection coefficient. For large $\alpha''$, $\hat{\theta}_F$ has less bias than $\hat{\theta}_k$ under a number of conditions. It might seem paradoxical then to estimate $\theta$ from $\hat{\theta}_k$ when $\alpha''$ is low and from $\hat{\theta}_F$ when $\alpha''$ is high for the property of unbiasedness. In addition, the two estimates will then represent "total" $\theta$ and "neutral" $\theta$.

In dealing with protein data, when strict neutrality is presumed but neither proved nor disproved using a statistical test, the choice of $\hat{\theta}_F$ may, fortuitously, provide a less biased estimate if the bias is removed by defining a new estimator $\hat{\theta}_F^{*}$ as

$$\hat{\theta}_F^{*} = 0.71\,\hat{\theta}_F \tag{2.44}$$

This new estimator is designed to allow for the 40% bias in $\hat{\theta}_F$. Thus $\hat{\theta}_F^{*}$ will be an approximately unbiased estimator of "total" $\theta$, i.e., when $\alpha'' = 0$, and an unbiased estimator of "neutral only" $\theta$, whichever might apply. This blind estimation procedure is the only alternative, unless strong test statistics are developed to discriminate between the various aspects of neutrality.
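The correction in (2.44) amounts to dividing out an upward bias of about 40%, since 1/1.4 ≈ 0.71. The sketch below illustrates this. The helper `theta_from_homozygosity` is an assumption: it uses the infinite alleles relation E(F) ≈ 1/(1+θ) to obtain θ̂_F = 1/F - 1, which may differ in detail from the definition given in the earlier section.

```python
def theta_from_homozygosity(f_hat: float) -> float:
    """Assumed form of the homozygosity-based estimator: theta_F = 1/F - 1."""
    if not 0.0 < f_hat <= 1.0:
        raise ValueError("homozygosity must lie in (0, 1]")
    return 1.0 / f_hat - 1.0


def bias_corrected_theta(theta_f: float, factor: float = 0.71) -> float:
    """Rescale theta_F as in (2.44); 0.71 is roughly 1/1.4 (a 40% excess)."""
    return factor * theta_f


if __name__ == "__main__":
    f_hat = 0.5  # hypothetical observed homozygosity
    theta_f = theta_from_homozygosity(f_hat)
    print(theta_f, bias_corrected_theta(theta_f))
```

If the mean of θ̂_F is 1.4θ, the rescaled value has mean 0.994θ, which is why the corrected estimator is described as only approximately unbiased.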

2.4.2 Branching process models.

Alternative approaches to estimate the mutation rate from protein data were suggested by Neel and Thompson (1978) and Rothman and Adams (1978) using branching process models. Fisher (1930) and Karlin and McGregor (1967) had earlier considered these models for estimating the total number of heterozygous loci in the genome and number of alleles represented by j copies in the population, respectively.

One of the advantages in working with branching process models is that the assumptions of fixed population size

and equilibrium are not required to describe the transition of one allele from the jth allelic state to ith allelic state where an allelic state is defined by the number of copies by which an allele is

represented in the population. However, for the estimation of mutation rate these questions are difficult to avoid.

2.4.2.1. Rothman and Adams' method.

Let $K_s^{t}$ denote the number of different single copy alleles segregating at a locus in a population in the tth generation. The presence of these alleles can be attributed to three different sources, provided there is no immigration or emigration and intragenic recombination is low. These sources are:

(1) New mutations introduced in the tth generation at a rate $2N^{t}\mu$, where $N^{t}$ is the population size in the tth generation. It is assumed that an infinite series of alleles can be generated at this locus, i.e., every new allele is a novel allele,

(2) The drift of higher frequency alleles in the (t-1)th generation to the singleton class in the tth generation,

(3) The retention of singletons in the (t-1)th generation as singletons in the tth generation.

The drift to and retention of singletons is given by the probability transition matrix $P$, where the individual elements $P_{ji}$ indicate the probability that an allele present in $j$ copies in the (t-1)th generation is changed to $i$ copies in the tth generation. Quantitatively, this is given by Rothman and Adams (1978) as

$$E(K_s) = 2N\mu + K \sum_{j=1} g(j)\,P_{j1} \tag{2.49}$$

where $E(K_s)$ is the expected number of singletons in the population, $g(j)$ is the relative proportion of alleles each represented by $j$ copies in the population, $K$ is the expected number of alleles in the population and $P_{j1}$ is the transition probability vector. This equation represents the balance, at equilibrium, between the expected number of alleles entering the singleton class and those alleles which exit.
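Read as an estimating equation, the balance in (2.49) can be solved for the mutation rate once the other quantities are supplied. The following is a minimal sketch, assuming the form E(K_s) = 2Nμ + K Σ_j g(j) P_j1; the function names, the dictionary representation of g(j) and P_j1, and the numbers in the demonstration are all hypothetical.

```python
from typing import Mapping


def drift_inflow_to_singletons(k_total: float,
                               g: Mapping[int, float],
                               p_to_one: Mapping[int, float]) -> float:
    """K * sum_j g(j) * P_j1: singletons arising by drift or retention."""
    return k_total * sum(g_j * p_to_one[j] for j, g_j in g.items())


def mutation_rate_from_balance(k_singletons: float,
                               k_total: float,
                               g: Mapping[int, float],
                               p_to_one: Mapping[int, float],
                               n_pop: float) -> float:
    """Solve E(K_s) = 2*N*mu + K * sum_j g(j) * P_j1 for mu."""
    inflow = drift_inflow_to_singletons(k_total, g, p_to_one)
    return (k_singletons - inflow) / (2.0 * n_pop)


if __name__ == "__main__":
    g = {1: 0.5, 2: 0.5}         # hypothetical copy-number proportions
    p_to_one = {1: 0.4, 2: 0.2}  # hypothetical P_j1 values
    print(mutation_rate_from_balance(5.0, 10.0, g, p_to_one, n_pop=1000))
```

The estimate is the part of the singleton class not explained by drift and retention, expressed per gene per generation. It is only as reliable as the supplied values of g(j) and P_j1.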

The method of Rothman and Adams, of course, assumes that the mutational events are given under the infinite alleles model. The form of the equation (2.49) implicitly also assumes that mutation is introduced as a replication error during gametogenesis and is expressed phenotypically in the offspring. This being a unique event under the infinite alleles model, the possibility of a similar slip occurring again

An alternative model for the occurrence of mutation has been put forward by Vogel (1970; 1975). Under this model the mutation is introduced in the non-expressible form in the gamete cells of one of the parents. The probability of transmission of such mutations is governed by the usual demographic processes. The form of equation (2.45) under this model will be

$$E(K_s^{t}) = \left[2N^{t-1}\mu + K_s^{t-1}\right]P_{11} + K^{t-1}\sum_{j=2} g(j)\,P_{j1} = 2N^{t-1}\mu\,P_{11} + K^{t-1}\sum_{j=1} g(j)\,P_{j1} \tag{2.50}$$

Expanding over $t$ generations, and after rearranging, we get at equilibrium:

$$E(K_s) = 2N\mu\,P_{11} + K\sum_{j=1} g(j)\,P_{j1} \tag{2.51}$$

The model derived here, however, assumes that the mutations are introduced during the pre-pubertal period. Adjustments to the transmission probability $P_{11}$ (associated with fresh mutations) will have to be made if the mutation is introduced in the gametes of the parents during the reproductive period.

Although the second model is not entirely acceptable (Vogel, 1975), the above equations have implications when the model is extended to expanding or contracting populations. While under the first model the mutation rate is measured in terms of the size of the tth generation population, under the second model the size of the previous generation is taken into consideration. A comparison of equations (2.45) and (2.51) reveals that, under the first model, the adjustment for the size of the previous generation is not admissible.

Rothman and Adams (1978) have given the

equation which takes into consideration the growth

rate per generation in the estimation of K. Accordingly,

2Nt ‘1y + H Zg(j)Pn j=l J '

(2.52)

which is an extension of the approach taken by Lea

and Coulson (1949). However, this equation is not

extendable to either of the two models of mutation

mentioned earlier.

Neel and Rothman (1978) rewrite the expression (2.49) as

$$2N\mu + K\sum_{j>1} g(j)\,P_{j1} = K\,g(1)\,(1 - P_{11}) \tag{2.53}$$

which expresses the balance between the number of

singletons lost and gained per generation. This

The elements of the transition probability matrix $P$ are calculated as

$$P_{ji} = \sum_{h=1}^{\min(i,j)} \binom{j}{h}\binom{i-1}{i-h}\left(1 - \frac{b}{1-c}\right)^{j-h} b^{h}\,c^{i-h} \tag{2.54}$$

where $b$ and $c$ are the parameters of the geometric distribution.
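The transition probabilities can be sketched in code under stated assumptions. The example below assumes a two-parameter geometric offspring distribution in which a single copy leaves no copies with probability p0 = 1 - b/(1-c) and i ≥ 1 copies with probability b·c^(i-1), and it reads (2.54) as a sum over the number h of parental copies that leave at least one descendant. The function name and the treatment of the i = 0 case are additions made for illustration.

```python
from math import comb


def transition_probability(j: int, i: int, b: float, c: float) -> float:
    """P_ji: probability that j copies in one generation become i in the next.

    Assumes each copy independently leaves 0 copies with probability
    p0 = 1 - b/(1 - c) and m >= 1 copies with probability b * c**(m - 1).
    """
    if j < 1 or i < 0:
        raise ValueError("need j >= 1 and i >= 0")
    if not (0.0 < c < 1.0 and 0.0 < b <= 1.0 - c):
        raise ValueError("need 0 < c < 1 and 0 < b <= 1 - c")
    p0 = 1.0 - b / (1.0 - c)
    if i == 0:
        # Not covered by the sum from h = 1: every copy is lost.
        return p0 ** j
    total = 0.0
    for h in range(1, min(i, j) + 1):
        # h of the j copies survive; the i descendants are split among them.
        ways = comb(j, h) * comb(i - 1, i - h)
        total += ways * p0 ** (j - h) * b ** h * c ** (i - h)
    return total


if __name__ == "__main__":
    b, c = 0.3, 0.5  # hypothetical geometric parameters
    print([transition_probability(2, i, b, c) for i in range(4)])
```

With j = 1 the sum collapses to b·c^(i-1), the offspring distribution itself, which gives a quick check on any tabulated matrix.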

The population values of $g(i)$, the expected relative frequencies of the alleles, are obtained from

$$\sum_{j=1} g(j)\,P_{ji} = g(i) \tag{2.55}$$

for $i \ge 2$. The relative frequency $g(1)$, however, is given as

$$g(1) = \sum_{j=1} g(j)\,P_{j1} + \frac{2N\mu}{K} \tag{2.56}$$

The estimation of the relative proportions $g(j)$, however, needs well documented demographic data on the population as well as extensive computations. In the absence of such data, rough estimates of $g(j)$ can be obtained from the observed distribution of rare alleles by taking the number of copies over a set of protein loci for sufficient sample sizes.

(1978) give the estimated number of alleles in the population, using the number $k$ in the sample, as

$$\hat{K} = \frac{k}{1 - \sum_{j=1}^{2N} g(j)\,\binom{2N-j}{2n}\Big/\binom{2N}{2n}} \tag{2.57}$$

the binomial approximation of which is

$$\hat{K} = \frac{k}{1 - \sum_{j=1}^{2N} g(j)\,(1-f)^{j}} \tag{2.58}$$

where $f = n/N$ is the sampling fraction. For $j > 30$, $g(j)(1-f)^{j}$ is negligible and the summation may be truncated.
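The extrapolation from k to K can be sketched as follows. This is a minimal illustration, assuming that (2.57) uses the hypergeometric probability that an allele with j copies among 2N genes is absent from a sample of 2n genes, and that (2.58) replaces it with (1-f)^j. The function name, argument names and demonstration values are hypothetical.

```python
from math import comb
from typing import Mapping


def estimate_population_alleles(k: float,
                                g: Mapping[int, float],
                                n_pop: int,
                                n_sample: int,
                                exact: bool = False) -> float:
    """Scale the sample allele count k up to a population count K.

    g maps copy number j to the assumed proportion of alleles with j copies.
    exact=True uses the hypergeometric form, otherwise the binomial form.
    """
    two_N, two_n = 2 * n_pop, 2 * n_sample
    f = n_sample / n_pop  # sampling fraction
    missed = 0.0
    for j, g_j in g.items():
        if exact:
            # Chance that none of the j copies fall in the sample.
            p_absent = comb(two_N - j, two_n) / comb(two_N, two_n)
        else:
            p_absent = (1.0 - f) ** j
        missed += g_j * p_absent
    return k / (1.0 - missed)


if __name__ == "__main__":
    g = {1: 0.6, 2: 0.3, 5: 0.1}  # hypothetical rare-allele spectrum
    print(estimate_population_alleles(12, g, n_pop=1000, n_sample=100))
    print(estimate_population_alleles(12, g, n_pop=1000, n_sample=100, exact=True))
```

The two forms agree closely when the sample is a small fraction of the population. Both depend on g(j), which is itself unknown in practice.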

The estimation procedure of Neel and Rothman (1978), however, is very difficult to utilise since there are too many unknowns. In the absence of well documented demographic data over a number of generations, calculation of the values of the elements of the probability transition matrix is difficult. Similarly, the population values of $g(j)$ are not known. Furthermore, extrapolation of the values of $k$ to obtain $K$ is a very uncertain proposition since $k$ is a random variable rather than an

2.4.2.2. Neel and Thompson's method.

Thompson and Neel (1978), using the tth generation distribution forms of the number of copies given by Keiding and Nielsen (1975), have given the parameters of the cumulative distribution for the two-parameter geometric form. Neel and Thompson (1978) utilize these results to give an estimator of mutation rate as

T

K (Aäj) = nN E {(I---i-)j - i >

t t t

(2.59)

where $K(A \ge j)$ represents the number of alleles with $j$ or more copies in the population per locus, and the mean appearing on the right hand side is the tth generation mean number of replicates, conditional on being non-zero. However, the summation on the right hand side is unbounded which, for the private variants, may be truncated to include only the time since tribal differentiation. The approach is quite useful for utilizing information on private polymorphisms.

2.4.3. Equilibrium methods.

The diffusion approximation approach outlined above helps in arriving at some of the results in simple approximate forms. These approximations are, however, based on a number of assumptions which may be considered unrealistic for natural populations.

Included in this section is the equilibrium approach of Ewens (1964) and Kimura and Ohta (1969),
