MUTATION-SELECTION BALANCE IN MULTI-LOCUS SYSTEMS. I. DUPLICATE GENE ACTION
EVELYN PRITCHETT-EWING
Department of Biology, The University of New Mexico, Albuquerque, New Mexico 87131
Manuscript received October 2, 1979. Revised copy received April 22, 1981
ABSTRACT
A theoretical model is presented that extends the case of selection against homozygous recessives, counterbalanced by mutation, to a system of n loci. This extension allows analysis of the role of gene duplication in the evolution of new function. Whether function is retained long enough to allow divergence, or nonfunctional loci are instead silenced, is discussed in relation to examples in salmonid and catostomid fishes and in the globin-like clusters.
An important aspect of evolution is the ability to acquire new functions. Gene duplication (OHNO 1970) provides a likely mechanism for allowing divergence in function. Duplicated loci are free to accumulate and "experiment" with mutations, because the normal functioning allele retained at the ancestral locus shelters them from selection. Duplications may arise tandemly, via unequal crossing over, or from chromosome or genome doubling. In either case there are two possible results: (1) through the accumulation of mutations, a new function may evolve and be retained, or (2) deleterious nonfunctional products may result and be silenced. The outcome, and the rate of loss of function, depend upon population size (LI 1980; BAILEY, POULTER and STOCKWELL 1978). The importance of duplication in evolution is illustrated quite clearly in certain fishes, e.g., the salmonids and catostomids, which arose from tetraploid ancestors (BAILEY, POULTER and STOCKWELL 1978), and in the evolution of the α- and β-like globin clusters (DAYHOFF 1969; MANIATIS et al. 1980).

This problem is addressed here by extending to n loci a two-locus model that was analyzed by CHRISTIANSEN and FRYDENBERG (1977; see also KIMURA and KING 1979). By incorporating multiple-locus effects, the examples of evolution in some of the fishes and in the globin genes can now be examined rather specifically from a theoretical standpoint. The deterministic model incorporates selection against the completely homozygous recessive, counterbalanced by irreversible mutation. The extension from two to n loci allows a more complete analysis with respect to the evolution of function. Both tandem and genomic duplications are included, as the results hold for all possible recombination values. This type of selection is generally associated with duplicate genes, but it can also be applied to genes that interact epistatically.
410 E. PRITCHETT-EWING
THE MODEL
CHRISTIANSEN and FRYDENBERG (1977) examined a two-locus model of duplicate gene action in which the relative fitnesses of all genotypes are 1, with the exception of the completely recessive homozygote, whose fitness is $1-s$.
In the absence of mutation or some other pressure, the deleterious gamete will eventually be eliminated, because selection overrides any initial effect of recombination in raising the frequency of this gamete in the population. However, depending upon the initial gametic frequencies and the values of the recombination fraction ($r$) and the selection coefficient ($s$), the elimination of the deleterious gamete can be extremely slow. If irreversible mutation is incorporated and the rates differ between the two loci, the result is monomorphism at one locus and polymorphism at the other. This gives the appearance of a single-locus situation. As in the analogous single-locus model of selection against the recessive, the frequency of the detrimental gamete is counterbalanced by one-way mutation. If, on the other hand, the mutation rates are equal at the two loci, both loci are polymorphic when $r$ is sufficiently larger than the mutation rate. So, depending upon the relationship between the mutation rates, what is in fact caused by two loci may appear to result from one or two loci. This can then be related to the loss or retention of function.

The extension of the model to three or more loci, with two alleles per locus, is direct. It shows that stability requirements and equilibrium behavior can be predicted from the corresponding models involving fewer loci. The major focus will be on the three-locus case.
Consider three loci, two alleles per locus. The eight gametic frequencies are denoted by $x_1 = f(ABC)$, $x_2 = f(ABc)$, $x_3 = f(AbC)$, $x_4 = f(Abc)$, $x_5 = f(aBC)$, $x_6 = f(aBc)$, $x_7 = f(abC)$ and $x_8 = f(abc)$. As before, let the relative fitnesses of all genotypes be 1, with the exception of the completely recessive homozygote $aabbcc$, whose fitness is $1-s$. Then the new gametic frequencies after selection are as given in the recursion relationships listed below, following the notation of STROBECK (1973):
where the $D$ terms are the various measures of disequilibrium and where $\bar{W}$, the average fitness over all genotypes, is given by

$$\bar{W} = 1 - s x_8^2.$$
Here, $r_1$ denotes the recombination fraction between the A and B loci; $r_2$, the recombination fraction between the B and C loci; and $r_3$, the recombination fraction between the A and C loci.
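The selection and recombination step can also be computed without disequilibrium terms, by enumerating genotypes explicitly. This Python sketch uses that substitute technique in place of STROBECK's recursion. It forms every ordered pair of gametes and weights it by its fitness. It then splits each pair into offspring gametes according to the crossover classes implied by $r_1$, $r_2$ and $r_3$. It assumes random union of gametes and recombination fractions for which all four class probabilities are nonnegative. The function names are hypothetical.

```python
from itertools import product

# Gametes in the order of the text: x1 = ABC, x2 = ABc, ..., x8 = abc.
# 1 marks a dominant allele (A, B, C) and 0 a recessive one (a, b, c).
GAMETES = list(product((1, 0), repeat=3))
INDEX = {g: i for i, g in enumerate(GAMETES)}


def transmission_patterns(r1, r2, r3):
    """Probability of each transmission pattern.

    A pattern lists, locus by locus, which parental gamete (0 or 1) the
    offspring gamete copies. r1, r2 and r3 are the A-B, B-C and A-C
    recombination fractions.
    """
    classes = {
        (0, 0, 0): 1 - (r1 + r2 + r3) / 2,  # no locus separated from the others
        (0, 1, 1): (r1 - r2 + r3) / 2,      # A separated from B and C
        (0, 0, 1): (r2 - r1 + r3) / 2,      # C separated from A and B
        (0, 1, 0): (r1 + r2 - r3) / 2,      # B separated from A and C
    }
    patterns = {}
    for pattern, prob in classes.items():
        complement = tuple(1 - k for k in pattern)
        # A pattern and its complement are equally likely.
        patterns[pattern] = prob / 2
        patterns[complement] = prob / 2
    return patterns


def select_and_recombine(x, s, r1, r2, r3):
    """Return (frequencies after selection and recombination, mean fitness)."""
    patterns = transmission_patterns(r1, r2, r3)
    new = [0.0] * 8
    wbar = 0.0
    for i, g in enumerate(GAMETES):
        for j, h in enumerate(GAMETES):
            fitness = 1 - s if i == j == 7 else 1.0  # only aabbcc is penalized
            weight = x[i] * x[j] * fitness
            wbar += weight
            for pattern, prob in patterns.items():
                child = tuple((g, h)[p][k] for k, p in enumerate(pattern))
                new[INDEX[child]] += weight * prob
    return [v / wbar for v in new], wbar


x = [0.1, 0.2, 0.05, 0.15, 0.1, 0.1, 0.1, 0.2]
y, wbar = select_and_recombine(x, s=0.3, r1=0.1, r2=0.2, r3=0.26)
print(wbar, 1 - 0.3 * x[7] ** 2)      # mean fitness computed two ways
print(sum(y[:4]), sum(x[:4]) / wbar)  # p_A after selection vs p_A / W-bar
```

For any normalized `x`, the returned mean fitness equals $1 - s x_8^2$. The frequency of $A$ after this step equals $p_A/\bar W$, which is equation (3a) without the mutation factor.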
Now let mutation act to alter these frequencies into the actual gametic frequencies of the next generation. In particular, let $\mu$ be the mutation rate from $A$ to $a$, $\nu$ the rate from $B$ to $b$, and $\tau$ the rate from $C$ to $c$. After mutation, the gametic frequencies can be computed by the equations given below (see KARLIN and MCGREGOR 1971). A single prime denotes frequencies after selection, and a double prime denotes frequencies after selection and mutation:

$$x_1'' = (1-\mu)(1-\nu)(1-\tau)\,x_1'$$

$$x_2'' = (1-\mu)(1-\nu)\,x_2' + (1-\mu)(1-\nu)\,\tau\,x_1'$$

$$\vdots$$

$$x_8'' = 1 - \sum_{i=1}^{7} x_i''$$
We find that $p_A''$, the frequency of $A$ after selection and mutation, can be computed as follows:

$$p_A'' = x_1'' + x_2'' + x_3'' + x_4'' = (1-\mu)\,p_A/\bar W \tag{3a}$$

$$p_B'' = (1-\nu)\,p_B/\bar W \tag{3b}$$

$$p_C'' = (1-\tau)\,p_C/\bar W \tag{3c}$$

Similarly, we have that
and
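By analogy with the two-locus case, each of these can be generalized immediately as follows:

$$p_{A,n} = \frac{(1-\mu)^n\,p_{A,0}}{\prod_{i=1}^{n}\bar W_i} \tag{4a}$$

$$p_{B,n} = \frac{(1-\nu)^n\,p_{B,0}}{\prod_{i=1}^{n}\bar W_i} \tag{4b}$$

$$p_{C,n} = \frac{(1-\tau)^n\,p_{C,0}}{\prod_{i=1}^{n}\bar W_i} \tag{4c}$$

where $p_{A,n}$, $p_{B,n}$ and $p_{C,n}$ are the frequencies of the $A$, $B$ and $C$ alleles, respectively, after $n$ generations, and where $\bar W_i$ $(i = 1, 2, \ldots, n)$ is the average fitness in generation $i$.

Case with differing mutation rates: To analyze the equilibrium behavior, we merely examine equations (4a, b, c) in pairwise combinations as $n \to \infty$. The case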
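$\hat p_A = \hat p_B = \hat p_C = 0$ is unstable. If $\mu > \nu > \tau$, then $p_{A,n} \to 0$ and $p_{B,n} \to 0$ as $n \to \infty$. The reason is that, by equations (4), the ratios $p_{A,n}/p_{C,n}$ and $p_{B,n}/p_{C,n}$ are proportional to $[(1-\mu)/(1-\tau)]^n$ and $[(1-\nu)/(1-\tau)]^n$, respectively, and both tend to zero. By setting $\Delta p_C = 0$, i.e.,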
$$\Delta p_C = p_C'' - p_C = \frac{p_C}{\bar W}\left(1 - \tau - \bar W\right) = 0,$$

we obtain $\bar W = 1 - s\hat x_8^2 = 1 - \tau$, that is,

$$\hat x_8 = \sqrt{\tau/s}.$$

Then

$$(\hat p_A, \hat p_B, \hat p_C) = \left(0,\ 0,\ 1 - \sqrt{\tau/s}\right) \tag{5a}$$

or, in terms of gametic frequencies,

$$(\hat x_1, \hat x_2, \hat x_3, \hat x_4, \hat x_5, \hat x_6, \hat x_7, \hat x_8) = \left(0, 0, 0, 0, 0, 0, 1 - \sqrt{\tau/s}, \sqrt{\tau/s}\right). \tag{5b}$$

Parallel results emerge for the relationships $\mu > \tau > \nu$ and $\nu > \tau > \mu$. So, for different mutation rates at each of the three loci, there is global convergence to the monogenic state, giving the appearance of a single-locus situation.

Case where two mutation rates are identical: The next case to be considered is that in which two of the mutation rates are the same. For example, let $\mu = \nu > \tau$. By examining the limiting behavior of the ratios $p_{A,n} : p_{C,n}$ and $p_{B,n} : p_{C,n}$, we find that both $p_{A,n}$ and $p_{B,n}$ converge to zero as $n \to \infty$. Again, by examining $\Delta p_C$, we find that the frequency of the $abc$ gamete is given by

$$\hat x_8 = \sqrt{\tau/s}.$$

Consequently, we obtain the following:

$$(\hat p_A, \hat p_B, \hat p_C) = \left(0,\ 0,\ 1 - \sqrt{\tau/s}\right). \tag{6a}$$

As a result, there is again the appearance of deterioration to a monogenic condition. Further, as this result is global, the extension to the three-locus case now applies to duplicated loci.
If, on the other hand, $\mu = \nu < \tau$, there is the illusion of a two-locus condition. By looking at $p_{C,n} : p_{A,n}$ and $p_{C,n} : p_{B,n}$, we see that $p_{C,n}$ approaches zero, while the ratio $p_{B,n}/p_{A,n}$ remains constant. Therefore,

$$(\hat p_A, \hat p_B, \hat p_C) = (\hat p_A,\ c\,\hat p_A,\ 0), \tag{6b}$$

where $c = p_{B,0}/p_{A,0}$.
that $(\hat p_A, \hat p_B, \hat p_C) = (\hat p_A,\ c_1\hat p_A,\ c_2\hat p_A)$ at equilibrium, giving the appearance of a trigenic condition.
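The equal-rate cases follow from dividing one of equations (4a, b, c) by another. The product of mean fitnesses cancels, which leaves $p_{X,n}/p_{Y,n} = [(1-\mu_X)/(1-\mu_Y)]^n\,p_{X,0}/p_{Y,0}$ for any two loci $X$ and $Y$. Here $\mu_X$ and $\mu_Y$ are generic labels for their mutation rates. Below is a minimal Python sketch of that ratio. The helper name is hypothetical and the rates are arbitrary.

```python
def frequency_ratio(rate_x, rate_y, n, start_ratio=1.0):
    """p_{X,n} / p_{Y,n} after n generations, from equations (4a, b, c).

    The mean-fitness product is common to both loci and cancels.
    """
    return ((1 - rate_x) / (1 - rate_y)) ** n * start_ratio


mu = nu = 0.01
tau = 0.03
n = 1000
# mu = nu: the ratio p_B/p_A keeps its initial value (here 0.8 / 0.9).
print(frequency_ratio(nu, mu, n, start_ratio=0.8 / 0.9))
# tau > mu: p_C/p_A shrinks geometrically, so p_C is driven toward zero.
print(frequency_ratio(tau, mu, n))
```

With $\mu = \nu = \tau$, every ratio stays fixed. That is the trigenic case, in which the constants $c_1$ and $c_2$ are the initial frequency ratios.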
Generalization to n loci: Now consider an $n$-locus system, two alleles per locus, such that $A_j$ and $a_j$ are the dominant and recessive alleles at locus $j$ $(j = 1, 2, \ldots, n)$. Again assume that the relative fitness of each genotype is 1, with the exception of the completely recessive homozygote, whose viability is $1-s$. After selection, then, the average fitness over all genotypes is $\bar W = 1 - s x_{2^n}^2$, where $x_{2^n}$ is the frequency of the gamete $a_1 a_2 \cdots a_n$. Again, mutation acts to alter gametic frequencies, with $\mu_j$ the mutation rate at locus $j$ from $A_j$ to $a_j$. Following selection and mutation, the gene frequency of $A_j$ is given by

$$p_{A_j}'' = (1-\mu_j)\,p_{A_j}/\bar W \quad \text{for all } j = 1, 2, \ldots, n, \tag{8}$$

so that, after $n$ generations,

$$p_{A_j,n} = \frac{(1-\mu_j)^n\,p_{A_j,0}}{\prod_{i=1}^{n}\bar W_i} \quad \text{for all } j = 1, 2, \ldots, n, \tag{9}$$

where $p_{A_j,n}$ is the frequency of $A_j$ after $n$ generations, and $\bar W_i$ is the average fitness in generation $i$.
As a consequence, if $\mu_j \neq \mu_m$ for all $j \neq m$ $(j, m = 1, 2, \ldots, n)$, the limiting process used in the three-locus case can be employed. Thus, $p_{A_j,n} \to 0$ as $n \to \infty$ for all $j \neq m$, and $\hat p_{A_m} = 1 - \sqrt{\mu_m/s}$, where $\mu_m$ is the smallest mutation rate. Once more there is the illusion of a single-locus disorder. The equilibrium frequency of the detrimental gamete is $\hat x_{2^n} = \sqrt{\mu_m/s}$, where $n$ is the number of loci. Similarly, if two of the mutation rates are the same, the appearance of monogenicity or digenicity may be given, etc. (see, for example, equations 6a-c). We obtain the generalization
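For the distinct-rate case described above, the predicted equilibrium can be tabulated directly. Below is a small Python sketch under the same assumption of pairwise-distinct rates. The hypothetical helper also requires that the smallest rate not exceed $s$, so that the predicted frequency lies in $[0, 1]$.

```python
import math


def predicted_equilibrium(rates, s):
    """Equilibrium frequencies of A_1..A_n and of the all-recessive gamete.

    Assumes pairwise-distinct mutation rates: only the locus m with the
    smallest rate stays polymorphic, with p_{A_m} = 1 - sqrt(mu_m / s).
    """
    if len(set(rates)) != len(rates):
        raise ValueError("this sketch covers distinct mutation rates only")
    m = min(range(len(rates)), key=lambda j: rates[j])
    if rates[m] > s:
        raise ValueError("smallest mutation rate must not exceed s")
    x_hat = math.sqrt(rates[m] / s)   # frequency of the a_1 a_2 ... a_n gamete
    freqs = [0.0] * len(rates)
    freqs[m] = 1 - x_hat
    return freqs, x_hat


freqs, x_hat = predicted_equilibrium([1e-5, 4e-6, 2e-5, 8e-6], s=0.1)
print(freqs, x_hat)
```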
CONCLUSIONS
Whether duplicated loci retain function long enough to allow divergence or are silenced (NEI 1975) depends in part upon population size. The two-locus version of the model presented here has been examined specifically in terms of size effects (BAILEY, POULTER and STOCKWELL 1978; KIMURA and KING 1979; LI 1980). BAILEY, POULTER and STOCKWELL (1978) used evidence from two groups of fishes, the salmonids and catostomids, which are thought to have arisen from tetraploid ancestors. The evidence indicates that a large percentage of their duplicated loci have remained functional. Their computer simulation suggests that this slow rate of silencing can be explained by a form of the model presented here under conditions of large population size. Specifically, for population size $N > 1000$, the time for a 50% probability of silencing is about $15N^{1/4}\mu^{-3/4}$, where $\mu$ is the mutation rate. Thus, the model shows that unlinked duplicates can be retained effectively in the functional state long enough to be available for the evolution of new function.
LI (1980) re-examined this question using a different and more extensive simulation approach, with application to the fish data. In addition, he included the effects of tandem duplications and linkage disequilibrium. LI showed that if $N\mu > 0.01$, the population remains polymorphic for normal and null alleles, but if $N\mu \le 0.01$, the population becomes monomorphic for the normal allele. Further, his results indicate that if more than two loci are involved, as in the model given above, the rates of gene loss increase, particularly in large populations, as sheltering becomes more effective. He found that genes can persist if the time for diploidization is long, the mutation rate is low, the effective size is large and/or divergence in regulation or function results.
The aspects of the model pertaining to multiple tandem duplications may be exemplified in the evolution of the globin-like genes (see MANIATIS et al. 1980; DAYHOFF 1969; NEI 1975). It is believed that the α- and β-like genes evolved through duplication, separating about 500 million years ago. Recently, they have been mapped to two different chromosomes in man. Both the α- and β-like genes then underwent a series of tandem duplications. These involved amino acid sequence divergence, as well as apparent regulatory evolution in the form of switches in gene expression during development. In humans, there are embryonic-fetal-adult switches in the β-like cluster and fetal-adult switches in the α-like cluster. If these developmental switches result from alterations of the flanking sequences, the above model is directly applicable; otherwise, as pointed out by LI, it is only an approximation. Within both clusters there are silenced loci. In the β-like cluster, for example, five duplicates are active in man: ε in embryonic development, $^{G}\gamma$ and $^{A}\gamma$ in fetal development, and δ and β in adults. Two pseudogenes, ψβ1 and ψβ2, have been found. They show high sequence homology with the normal adult β-gene, yet do not encode a functional polypeptide, having been silenced by deletions and insertions that alter the reading frame. Thus, this is an excellent example of duplication followed by divergence (OHNO 1970). This sort of structure has been found in all mammalian globin clusters examined (e.g.,
I wish to thank J. FELSENSTEIN, H. HARPENDING and M. NEI for their valuable comments. This research was supported in part by Public Health Service grant GM-07661.
LITERATURE CITED
BAILEY, G. S., R. T. M. POULTER and P. A. STOCKWELL, 1978 Gene duplication in tetraploid fish: model for gene silencing at unlinked duplicated loci. Proc. Natl. Acad. Sci. U.S. 75: 5575-5579.
CHRISTIANSEN, F. B. and O. FRYDENBERG, 1977 Selection-mutation balance for two nonallelic recessives producing an inferior double homozygote. Am. J. Hum. Genet. 29: 195-207.
DAYHOFF, M. O. (ed.), 1969 Atlas of Protein Sequence and Structure. Natl. Biomed. Res. Found., Silver Spring, MD.
HARDISON, R. C., E. T. BUTLER, E. LACY, T. MANIATIS, N. ROSENTHAL and A. EFSTRATIADIS, 1979 The structure and transcription of four linked rabbit β-like globin genes. Cell 18: 1285-1297.
KARLIN, S. and J. MCGREGOR, 1971 On mutation-selection balance for two-locus haploid and diploid populations. Theoret. Pop. Biol. 2: 60-70.
KIMURA, M. and J. L. KING, 1979 Fixation of a deleterious allele at one of two "duplicate" loci by mutation pressure and random drift. Proc. Natl. Acad. Sci. U.S. 76: 2858-2861.
LI, W.-H., 1980 Rate of gene silencing at duplicate loci: a theoretical study and interpretation of data from tetraploid fishes. Genetics 95: 237-258.
MANIATIS, T., E. F. FRITSCH, J. LAUER and R. M. LAWN, 1980 The molecular genetics of human hemoglobins. Ann. Rev. Genet. 14: 145-178.
NEI, M., 1975 Molecular Population Genetics and Evolution. North-Holland Pub. Co., Amsterdam-Oxford.
OHNO, S., 1970 Evolution by Gene Duplication. Springer, Berlin.
STROBECK, C., 1973 The three locus model with multiplicative fitness values. Genet. Res., Camb.
Corresponding editor: M. NEI