Copyright © 1986 by the Genetics Society of America
MULTILOCUS POPULATION GENETICS WITH WEAK EPISTASIS. II. EQUILIBRIUM PROPERTIES OF MULTILOCUS MODELS: WHAT IS THE UNIT OF SELECTION?
ALAN HASTINGS
Department of Mathematics and Division of Environmental Studies, University of California, Davis, California 95616
Manuscript received December 2, 1984 Revised copy accepted September 21, 1985
ABSTRACT
Using perturbation techniques, I study the equilibrium of deterministic discrete time multilocus models with weak epistasis. The most important results are on the relationship between epistasis and disequilibrium. Disequilibrium involving a particular set of loci reflects only epistasis simultaneously involving those loci. Moreover, all the disequilibria of all orders vary approximately as the inverse of the probability of at least one recombination event among the loci involved. Finally, higher order disequilibria among loci will be lower than lower order ones, even if the level of epistasis is the same at all orders. In this sense, the unit of selection is small. However, given the larger number of higher order disequilibria, these higher order disequilibria may play an important role in the computation of gametic frequencies from allelic frequencies in models with a large number of loci. Finally, I show that epistasis between blocks of loci will be averages of epistatic effects, not additions of epistatic effects. Thus, failure to find significant epistasis on a chromosomal basis does not rule out the importance of epistatic effects.
The study of two-locus models in population genetics was spurred by the fact that two loci are the minimum number for which the effects of linkage, recombination and epistasis appear. A large number of results for two-locus models have been obtained (reviewed in KARLIN 1975; EWENS 1979). A
natural question is the extent to which results for multiple loci reflect two-
locus results (e.g., KARLIN and LIBERMAN 1982). One way to phrase this is to
ask the question (see LEWONTIN 1974), what is the unit of selection? I will give
answers to this question using deterministic, discrete time models for selection with nonoverlapping generations.
A natural way to start the study of multilocus models is to assume weak
additive epistasis. Experimental results reviewed in SIMMONS and CROW (1977)
indicate that epistasis is nonzero, but weak. This would suggest the importance
of extending the results in HASTINGS (1985; hereafter referred to as I) on
the two-locus model with weak epistasis to the multilocus case.
I shall begin with a review of previous results for models with more than
two loci. Among the first studies of more than two loci were simulations
performed by LEWONTIN (1964a,b) and FRANKLIN and LEWONTIN (1970). In
the former case, a symmetric model was studied, and linkage disequilibrium was shown to be important to a greater extent than would have been predicted
by two-locus models. In the simulations of FRANKLIN and LEWONTIN (1970) a
symmetric, overdominant model with a large number of loci was studied. Here, they found very high disequilibrium values of all even orders, and for larger recombination rates than would have been predicted from two-locus theory.
For weaker and probably more realistic levels of selection, CLEGG (1978)
showed that this “crystallization” effect was not important. TURELLI and GINZBURG (1983) simulated a large number of “random” fitness matrices and found that, in general, the intuition from one-locus two-allele models that heterosis is required for a stable equilibrium held for multilocus models.
Another approach to the study of multilocus models has been to determine conditions under which an equilibrium with zero disequilibrium of any order is stable. This equilibrium has been called the product or Hardy-Weinberg equilibrium. For a wide variety of models, conditions on local stability of the
product equilibrium are obtained in KARLIN and LIBERMAN (1979a,b), KARLIN
and AVNI (1981) and KARLIN and LIBERMAN (1982). An important result they
obtain is that, generally, for nonepistatic or symmetric fitnesses, if the product equilibrium is stable for a given recombination pattern, it is also stable for any
pattern with “more” recombination. In KARLIN and LIBERMAN (1978; 1979a,b),
it is shown that the product equilibrium is locally stable for the multilocus multiallele additive model as long as there is some recombination between all
the loci. KARLIN and LIBERMAN (1982) determine when the stability of the
product equilibrium for a multilocus multiplicative model is controlled by two-locus conditions.
Another analytical approach to multilocus models has been to consider the
case of small recombination rate (see KARLIN and MCGREGOR 1972; KARLIN
1978). This is done by examining the case with no recombination and then considering the implications for the case with small recombination. This approach treats a case different from the one considered here. The results obtained by the method of small recombination give insight into solutions with
large levels of disequilibrium; the results in the current paper complement the small recombination results. In fact, the justification for the method used here
can be expressed in terms of the results in KARLIN and MCGREGOR (1972).
Finally, there have been several papers that have examined three-locus models in detail. Some of these are modifier models, with only two loci undergoing direct viability selection. The symmetric three-locus model was
studied by FELDMAN, FRANKLIN and THOMSON (1974), who explicitly found a
large number of equilibrium solutions for weak recombination. A three-locus
multiplicative model was studied by STROBECK (1976).
There have also been several studies that have considered properties of
equilibria without considering the role of stability. EWENS and THOMSON (1977)
derived a number of properties of marginal subsystems, which are used below.
HASTINGS (1984) showed that conditions for maintaining disequilibrium appear [...]
A different approach has been taken by NAGYLAKI (1976, 1977) and then by SHAHSHAHANI (1979) and AKIN (1979). NAGYLAKI showed that, with weak selection, linkage disequilibrium disappears fairly rapidly in a two-locus model and is at a low level thereafter. SHAHSHAHANI and AKIN use sophisticated mathematical tools to examine the continuous time models for multilocus systems. In some ways, the approach of the current paper is similar in spirit to these works, by examining small deviations away from the globally stable additive model. One important difference is that, with the continuous time model, disequilibrium can be generated without epistasis by the effects of overlapping generations. Also, I concentrate on an explicit determination of the equilibrium rather than the dynamics.
A primary goal of the current paper is to contrast results for multiple locus models with weak epistasis with results for two-locus models obtained in (I). There are two ways to envision potential complicating effects of multiple loci on disequilibria. First, there could be an imbedding effect that would cause an increase in the level of pairwise disequilibria in a multilocus model relative to a two-locus model. Second, the higher order disequilibria could be important if their magnitude is roughly that of the pairwise disequilibria. The primary goal of this paper is to determine whether either or both of these effects is important in the context of weak epistasis.
OVERVIEW
Additive models for multiple loci, and for nonepistatic models in general,
are a natural class of multilocus models that have been extensively studied (see
KARLIN and LIBERMAN 1979a,b; 1982). Nonepistatic models are models for which the fitness of an individual depends on independent effects at different
loci. In the additive model, the fitness of an individual is determined by sum-
ming effects at single loci; in the multiplicative model it is determined by
multiplying effects at single loci. More general nonepistatic models are also
possible (KARLIN and LIBERMAN 1979a).
For all nonepistatic models there is a product equilibrium, an equilibrium
where there is no correlation between the alleles at different loci (KARLIN and
LIBERMAN 1979a). At this equilibrium, gametes have a frequency equal to the product of the frequencies that alleles at single loci would have, as determined
by the single locus effects entering into the fitnesses. For the additive model,
KARLIN and LIBERMAN (1978; 1979a,b) have shown that this product equilib-
rium is locally stable, independent of the number of loci or alleles involved,
for positive recombination rates. KARLIN and LIBERMAN have also shown that
this equilibrium is stable for the multiplicative model, if recombination rates are large enough.
It is unlikely that fitnesses in natural systems are ever truly additive or truly multiplicative across loci. This is confirmed by the experimental work with
Drosophila reviewed by SIMMONS and CROW (1977). Hence, I shall consider
models where the fitnesses deviate from those of a nonepistatic model, usually
the additive model, by terms for which the size is measured by a parameter δ.
[...] attention to those cases for which the product equilibrium is locally stable (if epistasis is weak, δ is small), there remains a stable equilibrium for the model, close to the product equilibrium (see KARLIN and MCGREGOR 1972). The new equilibrium is a function of δ.
The goal of this paper is to characterize this perturbed equilibrium. This is done by calculating the effect of the epistatic terms on mean fitness, allele frequencies and disequilibrium, up to first order in δ. The main results are as follows, with the calculations and a number of minor results collected in later sections.
Result 1: The change in the mean fitness due to weak epistasis does not depend on the recombination pattern, to a first approximation. As in the additive model, the mean fitness is independent of recombination, to a first approximation. This result does not depend on the additive model, but requires that there should be a stable equilibrium with no disequilibrium, which is independent of the recombination rates.
The remainder of the analysis depends on the study of marginal systems (EWENS and THOMSON 1977), that is, what would be observed at a subset of loci in a multiple locus system. For this purpose the following result is necessary and also is of independent interest. This result would also apply to small deviations from the multiplicative model, when the recombination rates are large enough so the product equilibrium is stable.
Result 2: For a one-locus marginal system, fitnesses depend only on epistatic
parameters actually involving the particular locus being considered.
As for the one-locus marginal fitnesses, it is straightforward to show that, if
there is no simultaneous epistasis (at the order equal to the number of loci or
greater) involving some set of m loci, then for that set of m loci there is no
epistasis (at equilibrium) for the marginal fitnesses involving that set of m loci. Thus, if in a four-locus model all the epistasis can be accounted for by two-factor interactions, then there is no deviation from additivity for all three-locus marginal systems, to lowest order in weakly epistatic systems.
Deviations away from additive fitness at a particular group of loci in a marginal system are weighted averages of the deviations away from additive fitnesses at the loci involved. In particular, the epistatic effects are not added. Making use of result 2, information about allele frequencies and disequilibrium can be deduced.
Result 3: To lowest order, weak epistasis affects allele frequencies only
through epistasis directly involving the particular locus in question. Moreover,
only the epistasis is involved, and the recombination pattern does not enter.
The next results depend on the original nonepistatic model being additive.
Result 4: Pairwise disequilibria reflect additive epistasis involving only the loci being considered. The disequilibria vary as the reciprocal of the recombination rate between the loci.
Result 5: Three-way disequilibria directly reflect the presence of additive epistasis at the loci involved. Three-way disequilibria probably are smaller than pairwise disequilibria because, in most reasonable models, the strength of epistasis [...] frequencies at the third locus. Finally, the dependence of disequilibrium on recombination is simple: disequilibrium is roughly proportional to one divided by the probability that there is some recombination among the loci involved.
Proceeding to more than three loci in this manner leads to more and more algebra. Instead, the following conjecture is offered, with some motivation given below.
Conjecture: For weak selection, result 5 is approximately true for higher
order disequilibria, with the appropriate changes in the wording.
MULTILOCUS MODELS
Models describing the dynamics of multilocus genetic systems have been written down in a variety of places. Some care is needed in the choice of notation. I shall use a notation similar to that of KARLIN and LIBERMAN (1979a,b). Let the vector
$$ i = (i_1, i_2, \ldots, i_n) \tag{1} $$
represent an n-locus gamete, where $i_\alpha$ represents the allele at locus $\alpha$. Define the fitness (viability) of an individual with gametes i and j as $w_{ij}$. I shall assume that there is no position or cis-trans effect in the fitnesses. The difficulty of induced position effects studied by TURELLI (1982) need not be considered, as I shall include an arbitrarily large number of loci. The definition of “enough” loci will be one result of the analysis. Let x(j) represent the frequency of the gamete j. The marginal gametic fitness is defined as
$$ w_i = \sum_j w_{ij}\, x(j) \tag{2} $$
and the mean fitness of the population as
$$ \bar{w} = \sum_i x(i)\, w_i . \tag{3} $$
Then the dynamics of this system can be written as
$$ x(i)' = \Big[ \sum_{j,k} w_{jk}\, x(j)\, x(k)\, R(i, j, k) \Big] \Big/ \bar{w} \tag{4} $$
where R(i, j, k) is the probability that an individual with gametes j and k produces the gamete i as a result of meiosis. The inclusion of interference makes the function R particularly complicated.
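As a rough illustration of a recursion of the form (4), the sketch below implements one generation for two-allele loci. It is not code from the paper: gametes are coded as integer bit patterns (an assumption that matches the 0/1 allele labels introduced later), and `mask_probs` is a hypothetical way of representing R(i, j, k) as a probability distribution over which loci the offspring gamete inherits from gamete j.

```python
import numpy as np

def next_generation(x, w, mask_probs):
    """One generation of selection and recombination for two-allele loci.

    x          : gamete frequencies, gamete i coded as an integer bit pattern
    w          : symmetric fitness matrix, w[j, k] for the genotype (j, k)
    mask_probs : dict {mask: probability}; bits set in mask are the loci
                 inherited from gamete j, the remaining loci come from k
    """
    n_gametes = len(x)
    full = n_gametes - 1                 # bit pattern with every locus set
    w_bar = x @ w @ x                    # mean fitness
    x_new = np.zeros(n_gametes)
    for j in range(n_gametes):
        for k in range(n_gametes):
            weight = w[j, k] * x[j] * x[k]
            for mask, prob in mask_probs.items():
                i = (j & mask) | (k & (full ^ mask))   # gamete produced
                x_new[i] += weight * prob
    return x_new / w_bar

# Two loci, recombination fraction 0.2, no selection (all fitnesses equal).
rec = 0.2
mask_probs = {0b11: (1 - rec) / 2, 0b00: (1 - rec) / 2,
              0b10: rec / 2, 0b01: rec / 2}
x = np.array([0.4, 0.1, 0.1, 0.4])       # gametes AB, Ab, aB, ab
x_next = next_generation(x, np.ones((4, 4)), mask_probs)
D_before = x[0] * x[3] - x[1] * x[2]
D_after = x_next[0] * x_next[3] - x_next[1] * x_next[2]
```

With equal fitnesses the pairwise disequilibrium shrinks by the factor 1 - rec in one generation, which is a quick sanity check on the encoding of R.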
One additional notation that will be needed below is a way to describe recombination. The notation is easiest to describe by example (cf. BENNETT 1954). The quantity $r_{12/34}$ will be the probability of a recombination event that separates loci 1 and 2 from 3 and 4, while $r_{13/2}$ is the probability of an event that separates 1 and 3 from 2. Thus, it is possible to describe recombination among fewer loci than the number included in the model and to describe the probability of no recombination. For example, [...]
Marginal systems: The definition of marginal genetic systems (EWENS and THOMSON 1977) will be important in the discussion of the equilibrium properties of weakly epistatic systems. Think of the fitnesses actually depending on n loci when only m loci are, in fact, observed. Define a marginal m-locus subsystem of an n-locus system as the system obtained by averaging all the fitnesses over the missing loci, weighted by the appropriate frequencies of the gametes. Thus, following EWENS and THOMSON, denote the frequencies of the m-locus gamete p by z(p). By properly renumbering the loci, the m loci being considered can be thought of as being the ones numbered one through m in the full n-locus system. (Note that this means that the numerical order of the loci need not correspond to the physical order on the chromosome.) Thus,
$$ z(p) = \sum_{i \in S_p} x(i) \tag{6} $$
where the set $S_p$ is defined as
$$ S_p = \{\, i \mid i_\alpha = p_\alpha,\ \alpha = 1, \ldots, m \,\}, \tag{7} $$
the set of n-locus gametes that have the same alleles at the first m loci as the m-locus gamete p.
The induced marginal fitness of the genotype formed by the m-locus gametes p and q will be denoted by $W_{pq}$ and is obtained by averaging over all genotypic combinations making up these two m-locus gametes, weighted appropriately by fitnesses and frequencies. This yields [...]
This definition is useful below because of the following fact found by EWENS and THOMSON. The dynamic equations for the m-locus subsystem are the same as those for a genuine m-locus system, with the marginal fitnesses taking the place of the actual fitnesses. Note, however, that the marginal fitnesses are not constants, but depend on allele frequencies and disequilibria at other loci. Thus, the marginal systems are particularly important for deducing equilibrium behavior.
A slight change in notation will prove convenient below. Since the gametes
themselves denote the number of loci being considered, the “x” designation
for marginal systems will be used at times instead of the “z” designation. No
confusion results, since the number of loci being specified is given explicitly.
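The construction of a marginal system can be sketched numerically. The code below is only an illustration under stated assumptions: two alleles per locus, gametes coded as bit patterns with the m observed loci in the highest-order bits, and a marginal fitness taken to be the frequency-weighted average of w[i, j] over the gametes in S_p and S_q. That weighting is my reading of the verbal definition above, not a formula quoted from the source, and the function name `marginal_system` is hypothetical.

```python
import numpy as np

def marginal_system(x, w, n_loci, m):
    """Marginal m-locus frequencies z and fitnesses W from an n-locus system.

    Assumes two alleles per locus, gametes coded as n_loci-bit integers,
    and the m observed loci stored in the highest-order bits.
    """
    n_gametes = 2 ** n_loci
    # m-locus gamete p to which each n-locus gamete i belongs (i is in S_p)
    group = np.arange(n_gametes) >> (n_loci - m)
    z = np.bincount(group, weights=x, minlength=2 ** m)
    W = np.zeros((2 ** m, 2 ** m))
    for i in range(n_gametes):
        for j in range(n_gametes):
            W[group[i], group[j]] += w[i, j] * x[i] * x[j]
    W /= np.outer(z, z)          # assumed weighting: average over S_p x S_q
    return z, W

rng = np.random.default_rng(0)
x = rng.random(8)
x /= x.sum()                     # arbitrary three-locus gamete frequencies
w = rng.random((8, 8))
w = (w + w.T) / 2                # arbitrary symmetric fitnesses
z, W = marginal_system(x, w, n_loci=3, m=2)
mean_full = x @ w @ x            # mean fitness in the full system
mean_marginal = z @ W @ z        # mean fitness seen in the marginal system
```

Under this weighting the mean fitness computed from the marginal system equals that of the full system, and W changes whenever x changes, which reflects the remark that marginal fitnesses are not constants.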
PROPERTIES OF THE ADDITIVE MODEL
As the multilocus additive model is an approximation to the weakly epistatic case, it is necessary to review some properties of the additive model here. Both the definitions of additive nonepistasis and the equilibrium properties of an additive model play an important role in the analysis of weakly epistatic systems.
Epistasis: The study of epistasis began with FISHER’S (1918) classic paper and was continued by many later workers (see KEMPTHORNE 1957). In an [...]
$$ w_{ij} = \sum_{\alpha} w_\alpha(i_\alpha, j_\alpha) \tag{9} $$
where the sum is over all the loci in the model (see KARLIN and LIBERMAN 1979a,b). In what follows, the one-locus fitnesses $w_\alpha$ are assumed to satisfy the conditions for all the alleles at locus $\alpha$ in the model to be present at equilibrium (see KINGMAN 1961).
Two-factor epistasis can be included by adding to the definition of fitness terms of the form $w_{\alpha\beta}(i_\alpha, i_\beta, j_\alpha, j_\beta)$ that depend on the alleles at the two loci $\alpha$ and $\beta$, where the sum will now be over all pairs of loci in the model. This procedure can be expanded to include epistasis with any given number of factors.
Equilibrium properties of the additive model: When fitnesses are additive, satisfying (9), a number of important properties emerge (KARLIN and LIBERMAN 1979a,b), as discussed above. From the fitnesses at each particular locus it is possible to define a one-locus equilibrium, where the frequency of the allele $i_\alpha$ at locus $\alpha$ is $x(i_\alpha)$. The equilibrium for the full model given by
$$ x(i) = \prod_{\alpha=1}^{n} x(i_\alpha) \tag{10} $$
will be called the product equilibrium.
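The product equilibrium is simple to construct once the one-locus allele frequencies are known. The following sketch builds the gamete frequencies as products of allele frequencies; the function name and the numerical frequencies are illustrative assumptions, not values from the paper.

```python
import itertools
import numpy as np

def product_equilibrium(allele_freqs):
    """Gamete frequencies formed as products of one-locus allele frequencies.

    allele_freqs: one sequence of allele frequencies per locus.
    Gametes are returned in lexicographic order of their allele indices.
    """
    gametes = list(itertools.product(*[range(len(f)) for f in allele_freqs]))
    freqs = np.array([np.prod([f[a] for f, a in zip(allele_freqs, g)])
                      for g in gametes])
    return gametes, freqs

# Hypothetical allele frequencies at three two-allele loci.
gametes, x = product_equilibrium([[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]])
x_AB = x[0] + x[1]    # frequency of the two-locus gamete with allele 0 at loci 1 and 2
```

Here the two-locus gamete frequency x_AB equals 0.6 * 0.5, so the pairwise disequilibrium is zero, as it must be at a product equilibrium.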
ANALYSIS
The first step in the analysis will be to study the definition of marginal fitnesses under weak epistasis in a multilocus system. The equilibrium of the dynamic equations (4) will be studied, using regular perturbation techniques. As mentioned above, the equilibrium is known in the additive case, and each fitness in the case studied here is assumed to be “close” to the additive model. Hence, write all the fitnesses as
$$ w_{ij} = w_{ij;0} + \delta\, w_{ij;1} \tag{11} $$
where δ is a small parameter and $\delta w_{ij;1}$ gives the deviation away from the additive fitness, $w_{ij;0}$. Thus, $w_{ij;0}$ satisfies (9). Then, write all the allelic frequencies, disequilibria of all orders, gametic frequencies and various marginal fitnesses in a similar fashion. As the notation is consistent, I shall just give several examples:
$$ x(i) = x(i)_{;0} + \delta\, x(i)_{;1} + O(\delta^2) \tag{12} $$
and for a typical disequilibrium coefficient D of unspecified order,
$$ D = D_{;0} + \delta\, D_{;1} + O(\delta^2). \tag{13} $$
The terms with a second subscript of zero (after the “;”) are the equilibrium values for the additive model given in (10). Thus, all the zero order terms are known. Other quantities of interest will be expressed in a similar manner. It is clear, however, that
$$ D_{;0} = 0 \tag{14} $$
The analysis described here will be used to find the terms with the second subscript of “1” in the expansions for the disequilibrium (13) and for the allele frequencies. This will give the dependence of these terms on epistasis. As an example, the marginal fitness of allele i depends on the small parameter δ as: [...]
Definition of disequilibrium: Before proceeding with the analysis, it is first necessary to write down formulas for linkage disequilibria of higher order, as has been done in BENNETT (1954) and SLATKIN (1972). As noted in HILL (1974), in general these two definitions are, in fact, different, although they are identical to the order (in δ) considered here. BENNETT defines the higher order disequilibria in such a way so that
$$ D' = rD \tag{16} $$
where r is the probability that there is no recombination among the loci involved in the disequilibrium specified by D. SLATKIN defines the higher order disequilibrium as statistical correlations.
The definitions of pairwise and third-order disequilibria are the same in both SLATKIN’S and BENNETT’S form. Pairwise disequilibria between the alleles $i_1$ and $i_2$ are defined as
$$ D(i_1 i_2) = x(i_1 i_2) - x(i_1)\,x(i_2), \tag{17} $$
which reduces to the familiar
$$ D_{AB} = x(AB) - x(A)\,x(B) \tag{18} $$
in the two-allele case. Third-order disequilibria are defined as
$$ D_{ABC} = x(ABC) - x(A)\,x(B)\,x(C) - x(A)\,D_{BC} - x(B)\,D_{AC} - x(C)\,D_{AB} \tag{19} $$
in the two-allele case. A definition like (17) is possible here as well, with extensions to an arbitrary number of alleles and loci a straightforward exercise.
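To make the definitions concrete, the sketch below computes the pairwise and third-order disequilibria for three two-allele loci from a vector of gamete frequencies. The third-order expression used here (the part of x(ABC) left after removing the product of allele frequencies and the pairwise contributions) is an assumed form, consistent with the statement that the two definitions coincide at third order; the function name is hypothetical.

```python
import numpy as np

def disequilibria(x):
    """Pairwise and three-way disequilibria for three two-allele loci.

    x: frequencies of the gametes ABC, ABc, AbC, Abc, aBC, aBc, abC, abc.
    """
    x = np.asarray(x, dtype=float).reshape(2, 2, 2)   # axes: locus A, B, C
    pA, pB, pC = x[0].sum(), x[:, 0].sum(), x[:, :, 0].sum()
    D_AB = x[0, 0].sum() - pA * pB
    D_AC = x[0, :, 0].sum() - pA * pC
    D_BC = x[:, 0, 0].sum() - pB * pC
    # Third-order term: what is left of x(ABC) after removing the product
    # term and the pairwise contributions (assumed form, see lead-in).
    D_ABC = x[0, 0, 0] - pA * pB * pC - pA * D_BC - pB * D_AC - pC * D_AB
    return D_AB, D_AC, D_BC, D_ABC

# Only the gametes ABC and abc are present, in equal frequency.
D_AB, D_AC, D_BC, D_ABC = disequilibria([0.5, 0, 0, 0, 0, 0, 0, 0.5])
```

In this example every pairwise disequilibrium is 0.25 while the three-way term vanishes, so strong pairwise association does not by itself imply third-order disequilibrium.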
Change in mean fitness: I shall now analyze the model equations. The first step will be to compute the change in the mean fitness caused by the addition of the epistatic terms. A small amount of algebra shows (cf. I): [...]
This is result 1.
Using the fact that the zero order terms arise from the additive model, one can show that the sum [...] depends only on the m-locus gamete p, not on which n-locus gamete i determines p. This calculation makes use of the fact that the fitnesses are nonepistatic and that the zero order frequencies are a product equilibrium.
Making use of (6) and the fact that (22) is independent of i, one sees by interchanging the order of summation that the first two terms of (21) cancel the last two terms of (21) to yield [...]
This formula leads to result 2, after using (9) and (10) and the formula for the equilibrium of a one-locus model and performing a straightforward calculation to show that the first-order change in the marginal fitnesses (23) for a one-locus marginal system depends only on epistatic parameters actually involving the particular locus being considered.
Numbering of gametes: To facilitate the derivation of the results for the equilibrium allele frequencies and two- and three-locus disequilibria, it is convenient to restrict attention to two-allele models. Analogous results would hold for more alleles. Moreover, a traditional labeling of loci by letters of the alphabet, and numbering of gametes, is useful. At one locus, label the allele A as 0, and the allele a as 1. At two loci, let the gametes AB, Ab, aB, ab be numbered 0, 1, 2, 3, respectively. Also, in the three-locus model, let the gametes ABC, ABc, AbC, Abc, aBC, aBc, abC, abc be numbered as 0, 1, 2, 3, 4, 5, 6, 7.
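This numbering amounts to reading each gamete as a binary number, with upper case alleles as 0 and lower case alleles as 1. A small sketch of that correspondence follows; the helper names are hypothetical and are introduced only for illustration.

```python
def gamete_label(index, n_loci):
    """Label for gamete `index` under the 0 = upper case, 1 = lower case coding."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[:n_loci]
    bits = format(index, "0{}b".format(n_loci))      # one bit per locus
    return "".join(ch.lower() if b == "1" else ch for ch, b in zip(letters, bits))

def lower_case_count(index):
    """Number of lower case alleles in the gamete, i.e. the number of set bits."""
    return bin(index).count("1")

labels = [gamete_label(i, 3) for i in range(8)]
```

The list `labels` reproduces the three-locus ordering given above, and `lower_case_count` gives the count of lower case alleles that determines the sign attached to each gamete in the later sums.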
Computation of allele frequencies: To find the first-order change in the allele frequencies, use equations (23) and a formula analogous to (30) in (I). With the more general perturbation to additive fitnesses allowed here, formula (30) from (I) is not correct. Taking a derivative of the formula for the allele [...]
$$ z(0)_{;1} = \frac{(1 - 2z(0)_{;0})\,W_{01;1} + z(0)_{;0}\,W_{00;1} - z(1)_{;0}\,W_{11;1}}{2W_{01;0} - W_{11;0} - W_{00;0}}, \tag{24} $$
where the marginal system that the z’s refer to is a one-locus system, and the standard notation for numbering alleles and gametes introduced above is used. Use
$$ 2W_{01;0} - W_{11;0} - W_{00;0} = 2w_k(1, 0) - w_k(1, 1) - w_k(0, 0), \tag{25} $$
which follows from the fact that the unperturbed fitnesses are for an additive model. Thus, the single locus being referred to is locus k. Combining (23), (24) and (25) yields
$$ z(0)_{;1} = \frac{(1 - 2z(0)_{;0})\,W_{01;1} + z(0)_{;0}\,W_{00;1} - z(1)_{;0}\,W_{11;1}}{2w_k(1, 0) - w_k(1, 1) - w_k(0, 0)}. \tag{26} $$
Making use of result 2, (26) can be summarized as result 3.
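The first-order expression for the change in allele frequency can be checked against a direct numerical derivative of the standard one-locus equilibrium formula. The sketch below does this under a simplifying assumption: the three fitnesses are treated as constants, whereas in the paper they are marginal fitnesses that depend on frequencies at other loci. All numerical values and function names are hypothetical.

```python
def one_locus_equilibrium(w00, w01, w11):
    """Polymorphic equilibrium frequency of allele 0 for constant fitnesses."""
    return (w01 - w11) / (2 * w01 - w00 - w11)

def first_order_change(w_zero, w_one):
    """First-order change in the frequency of allele 0.

    w_zero: unperturbed fitnesses (w00, w01, w11)
    w_one : first-order perturbations of the same three fitnesses
    """
    w00, w01, w11 = w_zero
    d00, d01, d11 = w_one
    z0 = one_locus_equilibrium(w00, w01, w11)
    z1 = 1.0 - z0
    return ((1 - 2 * z0) * d01 + z0 * d00 - z1 * d11) / (2 * w01 - w11 - w00)

# Hypothetical overdominant fitnesses and perturbation directions.
w_zero = (0.9, 1.0, 0.8)
w_one = (0.3, -0.2, 0.5)
delta = 1e-6
perturbed = [a + delta * b for a, b in zip(w_zero, w_one)]
numeric = (one_locus_equilibrium(*perturbed) - one_locus_equilibrium(*w_zero)) / delta
analytic = first_order_change(w_zero, w_one)
```

The finite-difference value `numeric` and the closed-form value `analytic` should agree up to terms of order delta.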
Calculation of pairwise disequilibria: I shall now describe the calculation of the disequilibria along the lines of the calculation in (I). To simplify the notation in the presentation, I shall present the derivations as though each system of m loci being considered were truly an m-locus system. If the system has more loci than the order of the disequilibrium, use a marginal system with only the loci involved in the disequilibrium coefficient being calculated.
The first step will be to compute the pairwise disequilibria. Take the dynamic equations for a two-locus system and set x(i)' = x(i). Substitute for each term its appropriate Taylor series (in δ, about δ = 0, the additive case). From this, the equations determining the order δ terms are easily seen to be (cf. equation (19) in (I))
$$ x(i)_{;0}\,\bar{w}_{;1} = x(i)_{;0}\,w_{i;1} \pm (1 - r_{ab})\,w_{03;0}\,D_{AB;1}. \tag{27} $$
(The sign in (27) is determined by the usual convention for two-locus models.) Take each equation in (27) and divide by $x(i)_{;0}$, and multiply each equation by the sign in front of the disequilibrium term. Summing the resulting four equations yields
$$ w_{03;0}\,D_{AB;1}\,(1 - r_{ab}) \sum_{i=0}^{3} (x(i)_{;0})^{-1} = w_{0;1} - w_{1;1} - w_{2;1} + w_{3;1}. \tag{28} $$
It would be useful to express this answer in the form of a correlation coefficient, $\rho_{AB}$, where
$$ \rho_{AB} = D_{AB}\,[x(A)x(B)(1 - x(A))(1 - x(B))]^{-1/2}, \tag{29} $$
where the role played by the allele frequencies is made clear. Make use of (10) to write
$$ \rho_{AB;1} = \frac{[x(A)x(B)(1 - x(A))(1 - x(B))]^{1/2}\,[w_{0;1} - w_{1;1} - w_{2;1} + w_{3;1}]}{w_{03;0}\,(1 - r_{ab})}. \tag{30} $$
Although the definition (15) of the quantities $w_{i;1}$ would appear to suggest that the right-hand side of (30) itself depends on the disequilibria, one can make use of (9) to show that this is not the case. This is where the fact that the original nonepistatic model is additive is used.
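The first-order result for pairwise disequilibrium can be compared with a direct iteration of a two-locus model. The sketch below is an illustration under stated assumptions, not the paper's calculation: the base model is a symmetric overdominant additive model, the epistatic perturbation `w1` is a hypothetical bonus to the AB/AB and ab/ab genotypes, `rec` is the recombination fraction (the quantity written as one minus the probability of no recombination in the text), and the recursion is the standard two-locus form with equal fitnesses for the two double heterozygotes.

```python
import numpy as np

SIGN = np.array([1.0, -1.0, -1.0, 1.0])        # gametes AB, Ab, aB, ab

def two_locus_step(x, w, rec):
    """Selection followed by recombination; rec is the recombination fraction."""
    w_marg = w @ x                              # marginal gametic fitnesses
    w_bar = x @ w_marg                          # mean fitness
    D = x[0] * x[3] - x[1] * x[2]
    return (x * w_marg - SIGN * rec * w[0, 3] * D) / w_bar

# Additive, overdominant base model: fitness = effect at A + effect at B.
one_locus = np.array([[0.45, 0.50], [0.50, 0.45]])
w0 = np.array([[one_locus[i >> 1, j >> 1] + one_locus[i & 1, j & 1]
                for j in range(4)] for i in range(4)])
# Hypothetical epistatic perturbation: a bonus for AB/AB and ab/ab genotypes.
w1 = np.zeros((4, 4))
w1[0, 0] = w1[3, 3] = 1.0

rec, delta = 0.5, 0.01
x0 = np.full(4, 0.25)                           # product equilibrium of w0

# First-order prediction: signed sum of first-order marginal fitnesses divided
# by (double-heterozygote fitness) * rec * (sum of inverse gamete frequencies).
w1_marg = w1 @ x0
D1 = (SIGN @ w1_marg) / (w0[0, 3] * rec * np.sum(1.0 / x0))

# Iterate the full model to (near) equilibrium and measure D directly.
x = x0.copy()
for _ in range(500):
    x = two_locus_step(x, w0 + delta * w1, rec)
D_sim = x[0] * x[3] - x[1] * x[2]
```

For this example the simulated disequilibrium `D_sim` should agree with `delta * D1` up to terms of order delta squared, and the prediction scales as one over `rec`.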
Calculation of higher order disequilibria: The next step in the analysis will be to calculate the three-way disequilibria in a manner analogous to that used to calculate the pairwise disequilibria. Rather than give the details for these extensive calculations, which were performed with the aid of a computer algebra program, I shall summarize the steps. First, form the equations analogous to (27) that arise in a three-locus model. Second, use the two-locus case as a guide to the appropriate sign by which to multiply each equation. Divide each equation by $x(i)_{;0}$, multiply by the appropriate sign and then sum all eight equations. Replacing any gametic frequencies by allele frequencies and disequilibria and simplifying, one obtains
$$ D_{ABC;1} \Big\{ [x(c)_{;0} w_{17;0} + x(C)_{;0} w_{06;0}]\, r_{a/b} + [x(b)_{;0} w_{27;0} + x(B)_{;0} w_{05;0}]\, r_{a/c} + [x(a)_{;0} w_{47;0} + x(A)_{;0} w_{03;0}]\, r_{b/c} \Big\} \Big[ \sum_{i=0}^{7} (x(i)_{;0})^{-1} \Big] \Big/ 2 = \sum_{i=0}^{7} (-1)^{f(i)}\, w_{i;1}, \tag{31} $$
where f(i) is the number of “lower case” alleles in the gamete i. Note that only the differences between the detriments in fitness caused by the homozygotes at each of the three loci arising on the left-hand side of (31) make the formula differ from [...] the three-locus analogue of (28). Thus, one would conjecture that the following holds for higher order disequilibria, where the scheme for numbering gametes is extended in the natural way:
$$ w_{\cdot\cdot;0}\, D_{;1}\,(1 - r) \sum_{i} (x(i)_{;0})^{-1} = \sum_{i} (-1)^{f(i)}\, w_{i;1} + \text{ERROR}, \tag{33} $$
where ERROR is a term that depends only on differences among the fitnesses of different genotypes that are homozygous at n - 2 loci (where n is the order of the disequilibrium) and, therefore, will usually be fairly small. (Note, however, that ERROR may increase as the number of loci increases.) Here $w_{\cdot\cdot;0}$ is the fitness of an individual heterozygous at all the loci in the system, D is the disequilibrium coefficient for all the loci in the system, r is the probability of no recombination among the loci in the system, and the sums extend over all the gametes in the system. The information in this section and the preceding ones is summarized in results 3 through 5.
A crude approximation: Is there an approximation that sheds some light on formulas (31)-(33)? I provide details in the case of three loci, but the steps carry over to an arbitrary number of loci, mutatis mutandis. BENNETT shows that
$$ x'(ABC) = r_{abc}\,x(ABC) + r_{a/bc}\,x(A)x(BC) + r_{b/ac}\,x(B)x(AC) + r_{c/ab}\,x(C)x(AB). \tag{34} $$
The definition of the recombination parameters [see equation (5)] shows this [...]
$$ x'(ABC) = x(ABC) + r_{a/bc}\,x(A)x(BC) + r_{b/ac}\,x(B)x(AC) + r_{c/ab}\,x(C)x(AB) - x(ABC)\,(r_{a/bc} + r_{b/ac} + r_{c/ab}). \tag{35} $$
Substituting for the second x(ABC) in equation (35) from (19) yields [...]
Now substitute from (18) for all the two-locus gametic frequencies [x(AB) etc.] in (36) to obtain, after rearrangement
$$ x'(ABC) = x(ABC) - (1 - r_{abc})\,D_{ABC} - (1 - r_{bc})\,x(A)D_{BC} - (1 - r_{ac})\,x(B)D_{AC} - (1 - r_{ab})\,x(C)D_{AB}. \tag{37} $$
Similar equations hold for the other gametic frequencies, with changes in the signs on the right-hand side of (37). Note that an alternative method of deriving (37) would be to use the compact notation employed by KARLIN and LIBERMAN (1979a,b). These methods would be needed to study cases of more loci or alleles.
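The rearrangement leading from the recombination-only recursion to its form in terms of disequilibria can be checked numerically. The sketch below evaluates both forms for arbitrary gamete frequencies. It assumes the pairwise and third-order disequilibria take the form D_AB = x(AB) - x(A)x(B) and D_ABC = x(ABC) - x(A)x(B)x(C) - x(A)D_BC - x(B)D_AC - x(C)D_AB, and the recombination probabilities are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(8)
x /= x.sum()                            # arbitrary gamete frequencies
x = x.reshape(2, 2, 2)                  # axes: locus A, B, C; index 0 = upper case

# Hypothetical probabilities of events separating one locus from the other two.
r_a_bc, r_b_ac, r_c_ab = 0.10, 0.02, 0.15
r_abc = 1.0 - (r_a_bc + r_b_ac + r_c_ab)   # no recombination among the three loci
r_ab = 1.0 - (r_a_bc + r_b_ac)             # no recombination between A and B
r_ac = 1.0 - (r_a_bc + r_c_ab)             # no recombination between A and C
r_bc = 1.0 - (r_b_ac + r_c_ab)             # no recombination between B and C

pA, pB, pC = x[0].sum(), x[:, 0].sum(), x[:, :, 0].sum()
xAB, xAC, xBC, xABC = x[0, 0].sum(), x[0, :, 0].sum(), x[:, 0, 0].sum(), x[0, 0, 0]
D_AB, D_AC, D_BC = xAB - pA * pB, xAC - pA * pC, xBC - pB * pC
D_ABC = xABC - pA * pB * pC - pA * D_BC - pB * D_AC - pC * D_AB

# Recombination-only recursion written with gametic frequencies.
lhs = r_abc * xABC + r_a_bc * pA * xBC + r_b_ac * pB * xAC + r_c_ab * pC * xAB
# The same quantity written with disequilibria.
rhs = (xABC - (1 - r_abc) * D_ABC - (1 - r_bc) * pA * D_BC
       - (1 - r_ac) * pB * D_AC - (1 - r_ab) * pC * D_AB)
```

The two expressions `lhs` and `rhs` agree to rounding error, which shows that each disequilibrium is reduced in proportion to the probability of recombination among the loci it involves.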
It is the analogy between this formula and the corresponding formula for
the two-locus model that leads to the approximations (32) and (33). Adding
selection to (37) is not simple, because the disequilibria do not involve just
specified pairs of gametes as in the two-locus case. If selection is sufficiently weak, however, the errors resulting from simply multiplying the disequilibria
on the right-hand side of (37) by a particular fitness will be small.
This leads to the conjecture presented earlier.
DISCUSSION
One of the primary questions that arises in the study of multilocus systems is the question of how many loci must be included to obtain a valid description of the natural system. This question can be rephrased as asking on how many loci do linkage disequilibria of various orders depend? The results of this paper answer this question in the case of weak epistasis.
First, disequilibria of any order depend only on epistatic interactions of that
order. Second, disequilibria among any group of loci depend only on epistatic interactions involving that group of loci. Thus, there is absolutely no imbedding effect. There is a direct correlation between disequilibria and epistasis. Also, disequilibria scale as one divided by the recombination fraction, which is a faster decline with recombination than in the case of the additive model (for
two loci) with drift (FELSENSTEIN 1974). Finally, higher order disequilibria are
However, the higher order disequilibria are not a lower order (in δ) than
the lower order disequilibria. Also, in a system of n loci there are many more
disequilibria of order roughly n/2 than pairwise disequilibria. Thus, these
higher order disequilibria may play an important role in the computation of gametic frequencies from allelic frequencies in models with a large number of loci.
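The counting argument can be made concrete with binomial coefficients. The sketch below assumes two alleles per locus, so that there is one disequilibrium coefficient for each set of two or more loci; the choice of ten loci and the function name are illustrative only.

```python
from math import comb

def disequilibria_by_order(n_loci):
    """Number of sets of loci of each size k >= 2, one disequilibrium per set
    in the two-allele case."""
    return {k: comb(n_loci, k) for k in range(2, n_loci + 1)}

counts = disequilibria_by_order(10)
pairwise = counts[2]          # sets of two loci
middle = counts[5]            # sets of n/2 = 5 loci
total = sum(counts.values())  # all disequilibria of order two or more
```

For ten loci there are 45 pairwise disequilibria but 252 of order five, so even individually small higher order terms can matter collectively when gametic frequencies are reconstructed from allelic frequencies.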
For a definition of what I mean by weak epistasis here, it is necessary to consider the results of KARLIN and LIBERMAN (1979a,b; 1982) and KARLIN and AVNI (1981). As in (I), I shall argue that epistasis must be weak relative to both recombination and selection for the results here to apply. I also claim that one can obtain a rough estimate of the minimum level of recombination for which the results here apply by taking the level of recombination necessary for the product equilibrium to be stable in nonadditive models. This level of recombination is truly small. Estimating this way by plugging into formulas in the papers by KARLIN and co-workers, it can be seen that, even if the recombination between adjacent loci is a small fraction of the per locus selection strength, then the results of this paper would apply. The numerical calculations in (I) for two-locus models suggest that an estimate obtained in this way is reasonable. Moreover, the numerical results there suggest that estimates of disequilibrium are quite accurate even for δ as large as or larger than 0.2. The error is typically about 10% for δ about 0.2.
What is the evidence on the level of epistasis in natural populations? Unfortunately, this is a very difficult question to answer. There are grave statistical
difficulties in estimating higher order epistasis (KEMPTHORNE 1957). Typical
analyses (e.g., MUKAI et al. 1974) just try to estimate the presence of epistasis,
without trying to ascertain more details.
However, the formula (22) and result 2 do suggest an interesting interpretation
of one attempt to detect epistasis by TEMIN et al. (1969). They do not
detect significant epistasis between chromosomes, but only within chromosomes.
Since (22) suggests that deviations from additivity are averaged in marginal
systems (not added), by looking at larger blocks of genes at one time, one may be covering up epistasis, rather than making it easier to detect. In
fact, this may be just what TEMIN et al. (1969) found. Thus, the failure to
detect epistasis by looking for it between chromosomes does not imply that
epistasis is unimportant.
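The contrast between averaging and adding can be illustrated with a toy calculation. This is not formula (22); it is only a sketch under the assumption that the epistatic deviation seen between two blocks behaves like the mean, rather than the sum, of the pairwise deviations between loci in the two blocks. The numbers in `pairwise_effects` are hypothetical.

```python
# Hypothetical pairwise epistatic deviations between loci in block A
# and loci in block B (illustrative values, not data from any study).
pairwise_effects = [0.02, -0.01, 0.03, 0.00]

# If block-level epistasis were the sum, it would grow with block size.
added = sum(pairwise_effects)

# If it is an average, it stays on the scale of a single pairwise term
# and shrinks further when effects of opposite sign cancel.
averaged = added / len(pairwise_effects)

print(f"added: {added:.3f}, averaged: {averaged:.3f}")
```

Under this assumption the block-level signal is no larger than a typical single pairwise effect, which is why examining whole chromosomes could hide epistasis rather than amplify it.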
What is the evidence on higher order disequilibria? Estimating higher order disequilibria in nature also has grave statistical difficulties for essentially the same underlying reason that it is hard to measure higher order epistasis (see
BROWN 1975). Among the studies that have searched for disequilibrium in
natural populations, that of LANGLEY, TOBARI and KOJIMA (1974) is among
the most extensive for outcrossers. They found no evidence for higher order correlations among loci.
LITERATURE CITED
AKIN, E., 1979 The Geometry of Population Genetics. Springer-Verlag, New York.
BENNETT, J. H., 1954 On the theory of random mating. Ann. Eugen. (Lond.) 18: 311-317.
BROWN, A. H. D., 1975 Sample sizes required to detect linkage disequilibrium between two or three loci. Theor. Pop. Biol. 8: 184-201.
CLEGG, M. T., 1978 Dynamics of correlated genetic systems. II. Simulation studies of chromosomal segments under selection. Theor. Pop. Biol. 13: 1-23.
EWENS, W., 1979 Mathematical Population Genetics. Springer-Verlag, New York.
EWENS, W. and G. THOMSON, 1977 Properties of equilibria in multi-locus genetic systems. Genetics 87: 807-819.
FELDMAN, M., I. FRANKLIN and G. THOMSON, 1974 Selection in complex genetic systems I. The symmetric equilibria of the three-locus symmetric viability model. Genetics 76: 135-162.
FELSENSTEIN, J., 1974 Uncorrelated genetic drift of gene frequencies and linkage disequilibrium in some models of linked overdominant polymorphisms. Genet. Res. 24: 281-294.
FISHER, R. A., 1918 The correlation between relatives on the supposition of mendelian inheritance. Trans. R. Soc. Edinb. 52: 399-433.
FRANKLIN, I. and R. LEWONTIN, 1970 Is the gene the unit of selection? Genetics 65: 701-734.
HASTINGS, A., 1984 Linkage disequilibrium, selection and recombination at three loci. Genetics 106: 153-164.
HASTINGS, A., 1985 Multilocus population genetics with weak epistasis. I. Equilibrium properties of two-locus two-allele models. Genetics 109: 799-812.
HILL, W. G., 1974 Disequilibrium among several linked genes in finite population. I. Mean changes in disequilibrium. Theor. Pop. Biol. 5: 366-392.
KARLIN, S., 1975 General two-locus selection models: some objectives, results and interpretations. Theor. Pop. Biol. 7: 364-398.
KARLIN, S., 1978 Theoretical aspects of multi-locus selection balance I. pp. 503-587. In: Studies in Mathematical Biology Part II: Populations and Communities, Edited by S. A. LEVIN. Math. Assoc. Amer., Washington, D.C.
KARLIN, S. and H. AVNI, 1981 Analysis of central equilibria in multilocus systems: a generalized symmetric viability regime. Theor. Pop. Biol. 20: 241-280.
KARLIN, S. and U. LIBERMAN, 1978 The two-locus multi-allele additive viability model. J. Math. Biol. 5: 201-211.
KARLIN, S. and U. LIBERMAN, 1979a Representation of nonepistatic selection models and analysis of multilocus Hardy-Weinberg equilibrium configurations. J. Math. Biol. 7: 353-374.
KARLIN, S. and U. LIBERMAN, 1979b Central equilibria in multilocus systems. I. Generalized nonepistatic regimes. Genetics 91: 777-798.
KARLIN, S. and U. LIBERMAN, 1982 The reduction property for central polymorphisms in nonepistatic systems. Theor. Pop. Biol. 22: 69-95.
KARLIN, S. and J. MCGREGOR, 1972 Application of method of small parameters to multi-niche population genetic models. Theor. Pop. Biol. 3: 186-209.
KEMPTHORNE, O., 1957 An Introduction to Genetic Statistics. John Wiley and Sons, New York.
KINGMAN, J. F. C., 1961 A mathematical problem in population genetics. Proc. Camb. Phil. Soc. 57: 574-582.
LANGLEY, C. H., Y. N. TOBARI and K.-I. KOJIMA, 1974 Linkage disequilibrium in natural pop
LEWONTIN, R., 1964a The interaction of selection and linkage. I. General considerations; heterotic models. Genetics 49: 49-67.
LEWONTIN, R., 1964b The interaction of selection and linkage. II. Optimum models. Genetics 50: 757-782.
LEWONTIN, R., 1974 The Genetic Basis of Evolutionary Change. Columbia University Press, New York.
MUKAI, T., R. CARDELLINO, T. K. WATANABE and J. F. CROW, 1974 The genetic variance for viability and its components in a local population of Drosophila melanogaster. Genetics 78: 1195-1208.
NAGYLAKI, T., 1976 The evolution of one- and two-locus systems. Genetics 83: 583-600.
NAGYLAKI, T., 1977 The evolution of one- and two-locus systems. II. Genetics 85: 347-354.
SHASHAHANI, S., 1979 A new mathematical framework for the study of linkage and selection. Memoirs AMS 211.
SIMMONS, M. J. and J. F. CROW, 1977 Mutations affecting fitness in Drosophila populations. Annu. Rev. Genet. 11: 49-78.
SLATKIN, M., 1972 On treating the chromosome as the unit of selection. Genetics 72: 157-168.
STROBECK, C., 1976 The three-locus model with multiplicative fitness values: the crystallization of the genome. In: Population Genetics and Ecology, Edited by S. KARLIN and E. NEVO. Academic Press, New York.
TEMIN, R. G., H. U. MEYER, P. S. DAWSON and J. F. CROW, 1969 The influence of epistasis on homozygous viability depression in Drosophila melanogaster. Genetics 61: 497-519.
TURELLI, M., 1982 Cis-trans effects induced by linkage disequilibrium. Genetics 102: 807-815.
TURELLI, M. and L. GINZBURG, 1983 Should individual fitness increase with heterozygosity. Genetics 104: 191-209.