Copyright © 1998 by the Genetics Society of America
Fixation Indices in Subdivided Populations
Thomas Nagylaki
Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois 60637
Manuscript received April 30, 1997 Accepted for publication October 3, 1997
ABSTRACT
Without restricting the evolutionary forces that may be present, the theory of fixation indices, or F-statistics, in an arbitrarily subdivided population is developed systematically in terms of allelic and genotypic frequencies. The fixation indices for each homozygous genotype are expressed in terms of the fixation indices for the heterozygous genotypes. Therefore, together with the allelic frequencies, the latter suffice to describe population structure. Possible random fluctuations in the allelic frequencies (which may be caused, e.g., by finiteness of the subpopulations) are incorporated so that the fixation indices are parameters, rather than random variables, and these parameters are expressed in terms of ratios of evolutionary expectations of heterozygosities. The interpretation of some measures of population differentiation is also discussed. In particular, $F_{ST}$ is an appropriate index of gene-frequency differentiation if and only if the genetic diversity is low.
Address for correspondence: Thomas Nagylaki, Department of Ecology and Evolution, The University of Chicago, 1101 East 57th Street, Chicago, IL 60637.

Wright's fixation indices, or F-statistics, are the parameters most widely used to describe population structure. Wright (1969, pp. 294–295; 1978, pp. 80–89; and refs. therein) defined the fixation indices as correlations between uniting gametes. His treatment is restricted to neutral diallelic loci; it is somewhat artificial (because numerical values are assigned to gametes) and not entirely clear.

Cockerham (1969, 1973; see also Weir and Cockerham 1984; Cockerham and Weir 1986) based his study of population structure on the analysis of the variance and covariances of indicator variables for allelic state, and he related his parameters to fixation indices and measures of identity by descent. Although Cockerham's analysis is more lucid and general than Wright's, it is disturbing that negative variance components may occur if mates are less closely related than the average within subpopulations (i.e., if Wright's $F_{IS} < 0$).

Nei (1973; 1977; 1986; 1987, pp. 159–166; see also Nei and Chesser 1983) presented a third approach, formulated entirely in terms of the allelic and genotypic frequencies in the population. He expressed the fixation indices in terms of ratios of heterozygosities. His treatment is biologically the most direct, and it clearly requires no restrictions on the action of the evolutionary forces.

Allelic and genotypic frequencies may fluctuate because of finite subpopulation numbers or random variation in evolutionary forces. Even in this case, Wright's and Cockerham's measures of population structure are still parameters because they are defined in terms of expectations or probabilities. Nei's indices, however, become random variables through their dependence on the allelic and genotypic frequencies in the population. Therefore, his indices are more difficult to relate to theoretical investigations of population structure (Nagylaki 1989 and refs. therein; Nagylaki et al. 1993), which are usually formulated in terms of covariances of allelic frequencies or probabilities of identity in allelic state or of identity by descent.

Here, we shall combine some of the desirable properties of the treatments of Cockerham and Nei. In the next section, we shall develop Nei's approach fully and systematically for deterministic genotypic frequencies. Then we shall extend our analysis to randomly varying allelic frequencies. In the final section, we shall discuss some of our results and the interpretation of some measures of population differentiation.

DETERMINISTIC GENOTYPIC FREQUENCIES

After defining the fixation indices, we shall present the constraints they satisfy, express the indices for each homozygote in terms of the indices for heterozygotes, derive the generalization of Wright's hierarchical relationship among the indices, and evaluate the complement of each index as a ratio of heterozygosities.

The population is subdivided into an arbitrary number of subpopulations. Let $w_k$ denote the proportion of the population in subpopulation $k$, so that

$$\sum_k w_k = 1. \tag{1}$$

We consider a single autosomal locus with $r$ alleles $A_i$. The frequencies of the allele $A_i$ and the ordered genotype $A_iA_j$ in subpopulation $k$ are $p_{i,k}$ and $P_{ij,k}$, respectively.
Thus, $P_{ij,k} = P_{ji,k}$ for every $i$ and $j$, and the frequencies of the unordered genotypes $A_iA_i$ and $A_iA_j$ in subpopulation $k$ are $P_{ii,k}$ and $2P_{ij,k}$ for $i \neq j$, respectively. Then we have

$$p_{i,k} = \sum_j P_{ij,k}. \tag{2}$$

The frequencies of the allele $A_i$ and the genotype $A_iA_j$ in the entire population are

$$\bar p_i = \sum_k w_k p_{i,k}, \qquad \bar P_{ij} = \sum_k w_k P_{ij,k}, \tag{3}$$

where the bar indicates averaging over subpopulations.

We do not restrict the action of the evolutionary forces, except that they must be deterministic. This implies, in particular, that every subpopulation must be (in principle) infinite.

We now define Nei's (1977) genotype-specific fixation indices. The subscripts $I$, $S$, and $T$ refer to individuals, subpopulations, and the total population, respectively. The parameters $F_{IS,ij,k}$ and $F_{IT,ij}$ designate standardized measures of the deviation from Hardy-Weinberg proportions of genotype $A_iA_j$ in subpopulation $k$ and in the entire population, respectively; $F_{ST,ij}$ signifies a standardized measure of the covariance of the frequencies of the alleles $A_i$ and $A_j$:

$$P_{ii,k} = p_{i,k}^2 + F_{IS,ii,k}\, p_{i,k}(1 - p_{i,k}), \tag{4a}$$

$$P_{ij,k} = (1 - F_{IS,ij,k})\, p_{i,k} p_{j,k}, \quad i \neq j; \tag{4b}$$

$$\bar P_{ii} = \bar p_i^2 + F_{IT,ii}\, \bar p_i(1 - \bar p_i), \tag{5a}$$

$$\bar P_{ij} = (1 - F_{IT,ij})\, \bar p_i \bar p_j, \quad i \neq j; \tag{5b}$$

$$\overline{p_i^2} = \bar p_i^2 + F_{ST,ii}\, \bar p_i(1 - \bar p_i), \tag{6a}$$

$$\overline{p_i p_j} = (1 - F_{ST,ij})\, \bar p_i \bar p_j, \quad i \neq j. \tag{6b}$$

If every subpopulation is panmictic, then (4) implies that $F_{IS,ij,k} = 0$ for every $i$, $j$, and $k$. In this case, $\bar P_{ij} = \overline{p_i p_j}$, so comparing (5) with (6) informs us that $F_{IT,ij} = F_{ST,ij}$ for every $i$ and $j$.

The panmictic indices are the complements of the fixation indices:

$$H_{IS,ij,k} = 1 - F_{IS,ij,k}, \tag{7a}$$

$$H_{IT,ij} = 1 - F_{IT,ij}, \tag{7b}$$

$$H_{ST,ij} = 1 - F_{ST,ij}. \tag{7c}$$

The fixation indices satisfy some simple constraints. From (4b), (5b), and (6b) we see immediately

$$F_{IS,ij,k},\ F_{IT,ij},\ F_{ST,ij} \le 1, \quad i \neq j. \tag{8}$$

These fixation indices can be negative. Since $0 \le P_{ii,k} \le p_{i,k}$ and $0 \le \bar P_{ii} \le \bar p_i$, from (4a) and (5a) we conclude

$$-\frac{p_{i,k}}{1 - p_{i,k}} \le F_{IS,ii,k} \le 1, \qquad -\frac{\bar p_i}{1 - \bar p_i} \le F_{IT,ii} \le 1, \tag{9}$$

which is misprinted in Chakraborty (1993). Rewriting (6) as

$$F_{ST,ii} = \frac{\mathrm{Var}(p_i)}{\bar p_i(1 - \bar p_i)}, \tag{10a}$$

$$F_{ST,ij} = -\frac{\mathrm{Cov}(p_i, p_j)}{\bar p_i \bar p_j}, \quad i \neq j, \tag{10b}$$

and noting that

$$\mathrm{Var}(p_i) \le \bar p_i(1 - \bar p_i) \tag{11a}$$

and

$$[\mathrm{Cov}(p_i, p_j)]^2 \le \mathrm{Var}(p_i)\,\mathrm{Var}(p_j) \le \bar p_i(1 - \bar p_i)\,\bar p_j(1 - \bar p_j), \tag{11b}$$

from (10) we infer

$$0 \le F_{ST,ii} \le 1 \tag{12a}$$

(Chakraborty 1993) and

$$|F_{ST,ij}| \le \left[\frac{(1 - \bar p_i)(1 - \bar p_j)}{\bar p_i \bar p_j}\right]^{1/2}, \quad i \neq j. \tag{12b}$$

Now we express the fixation indices for each homozygote in terms of the heterozygote indices, which therefore suffice for the analysis of population structure. Substituting (4) into (2) leads to

$$(1 - p_{i,k})\, F_{IS,ii,k} = \sum_{j:\, j \neq i} p_{j,k} F_{IS,ij,k}, \tag{13}$$

which can be rewritten more compactly but less instructively as

$$F_{IS,ii,k} = \sum_j p_{j,k} F_{IS,ij,k}.$$

Inserting (5) into the average of (2) yields (National Research Council 1996, Appendix 4A)

$$(1 - \bar p_i)\, F_{IT,ii} = \sum_{j:\, j \neq i} \bar p_j F_{IT,ij}. \tag{14}$$

Finally, substituting (6) into the equation

$$\sum_j \overline{p_i p_j} = \bar p_i,$$

we find that $F_{ST,ij}$ also satisfies (14):

$$(1 - \bar p_i)\, F_{ST,ii} = \sum_{j:\, j \neq i} \bar p_j F_{ST,ij}. \tag{15}$$

Thus, in each subpopulation, the $\tfrac{1}{2}r(r+1) - 1$ independent genotypic frequencies can be replaced by the $r - 1$ independent allelic frequencies and the $\tfrac{1}{2}r(r-1)$ heterozygote fixation indices $F_{IS,ij,k}$ ($i \neq j$). An analogous reparametrization holds for the mean genotypic frequencies in (5) and the covariances [see (10)] in (6).

Note that if $F_{IS,ij,k} = \tilde F_{IS,k}$, independent of $i$ and $j$, for every $i$ and $j$ such that $i \neq j$, then (13) appropriately implies that $F_{IS,ii,k} = \tilde F_{IS,k}$ for every $i$. Similar results hold for $F_{IT,ij}$ and $F_{ST,ij}$.

Next, we derive the generalization of Wright's (1943) relationship among the fixation indices. First, guided by (4), we define the weighted average of $F_{IS,ij,k}$ over subpopulations (Nei 1977; Wright 1978, pp. …)
$$\bar F_{IS,ii} = \frac{1}{\bar p_i - \overline{p_i^2}} \sum_k w_k\, p_{i,k}(1 - p_{i,k})\, F_{IS,ii,k}, \tag{16a}$$

$$\bar F_{IS,ij} = \frac{1}{\overline{p_i p_j}} \sum_k w_k\, p_{i,k} p_{j,k}\, F_{IS,ij,k}, \quad i \neq j. \tag{16b}$$

Inserting (8) into (16b) and (9) into (16a) demonstrates that $\bar F_{IS,ij} \le 1$ for every $i$ and $j$. Since the averages (16) are properly normalized (i.e., the sum of the weights is 1), from (7a) we have

$$\bar H_{IS,ij} = 1 - \bar F_{IS,ij}. \tag{17}$$

Note carefully that the weighting in (16) differs from that in (3).

Solving (4) for $F_{IS,ij,k}$, substituting into (16), and recalling (3), we deduce (Nei 1977)

$$\bar F_{IS,ii} = \frac{\bar P_{ii} - \overline{p_i^2}}{\bar p_i - \overline{p_i^2}}, \tag{18a}$$

$$\bar F_{IS,ij} = \frac{\overline{p_i p_j} - \bar P_{ij}}{\overline{p_i p_j}}, \quad i \neq j. \tag{18b}$$

We insert (13) into (16a) and invoke (16b) to express every average homozygote index in terms of the average heterozygote indices:

$$\bar F_{IS,ii} = \frac{1}{\bar p_i - \overline{p_i^2}} \sum_{j:\, j \neq i} \overline{p_i p_j}\, \bar F_{IS,ij}. \tag{19}$$

Now we can prove that

$$H_{IT,ij} = \bar H_{IS,ij}\, H_{ST,ij} \tag{20}$$

for every genotype $A_iA_j$. From (18) we obtain

$$\bar P_{ii} = \overline{p_i^2} + \bar F_{IS,ii}\,(\bar p_i - \overline{p_i^2}), \tag{21a}$$

$$\bar P_{ij} = (1 - \bar F_{IS,ij})\, \overline{p_i p_j}, \quad i \neq j. \tag{21b}$$

For $i = j$, we equate (21a) to (5a), solve for $F_{IT,ii}$, and invoke (6a), (7b), (7c), and (17) to establish (20). For $i \neq j$, we equate (21b) to (5b), employ (7b) and (17), solve for $H_{IT,ij}$, and deduce (20) from (6b) and (7c).

Finally, we express each panmictic index as a ratio of heterozygosities, or gene diversities. Let $f_{I,k}$ and $f_I$ denote the actual homozygosities in subpopulation $k$ and in the entire population, respectively; the corresponding heterozygosities are $h_{I,k}$ and $h_I$:

$$f_{I,k} = \sum_i P_{ii,k}, \qquad f_I = \sum_i \bar P_{ii} = \sum_k w_k f_{I,k}, \tag{22a}$$

$$h_{I,k} = 1 - f_{I,k} = \sum_{i,j:\, i \neq j} P_{ij,k}, \tag{22b}$$

$$h_I = 1 - f_I = \sum_{i,j:\, i \neq j} \bar P_{ij} = \sum_k w_k h_{I,k}. \tag{22c}$$

If subpopulation $k$ were panmictic, its homozygosity would be $f_{S,k}$; if every subpopulation were panmictic, the homozygosity in the entire population would be $f_S$. The corresponding heterozygosities are $h_{S,k}$ and $h_S$. Thus,

$$f_{S,k} = \sum_i p_{i,k}^2, \qquad f_S = \sum_i \overline{p_i^2} = \sum_k w_k f_{S,k}, \tag{23a}$$

$$h_{S,k} = 1 - f_{S,k} = \sum_{i,j:\, i \neq j} p_{i,k} p_{j,k} = \sum_i p_{i,k}(1 - p_{i,k}), \tag{23b}$$

$$h_S = 1 - f_S = \sum_{i,j:\, i \neq j} \overline{p_i p_j} = \sum_i (\bar p_i - \overline{p_i^2}) = \sum_k w_k h_{S,k}. \tag{23c}$$

Therefore, $f_{S,k}$ is the probability that two genes chosen at random from subpopulation $k$ are the same allele; the probability that two genes chosen at random from the same subpopulation are the same allele is $f_S$. The corresponding probabilities that the two genes are different alleles are $h_{S,k}$ and $h_S$.

If the entire population were panmictic, its homozygosity and heterozygosity would become $f_T$ and $h_T$, respectively:

$$f_T = \sum_i \bar p_i^2, \tag{24a}$$

$$h_T = 1 - f_T = \sum_{i,j:\, i \neq j} \bar p_i \bar p_j = \sum_i \bar p_i(1 - \bar p_i). \tag{24b}$$

Therefore, $f_T$ is the probability that two genes chosen at random from the entire population are the same allele; the probability that they are different alleles is $h_T$. From (23a) and (24a) we see at once that $f_S \ge f_T$, whence $h_S \le h_T$.

We shall indicate averages over genotypes by an asterisk. Consider first $F_{IS,ij,k}$. Multiplying (13) by $p_{i,k}$ and summing over $i$ yields the equivalent homozygote and heterozygote averages

$$F^*_{IS,k} = \frac{1}{h_{S,k}} \sum_i p_{i,k}(1 - p_{i,k})\, F_{IS,ii,k} \tag{25a}$$

$$\phantom{F^*_{IS,k}} = \frac{1}{h_{S,k}} \sum_{i,j:\, i \neq j} p_{i,k} p_{j,k}\, F_{IS,ij,k}, \tag{25b}$$

which are properly normalized because of (23b). Inserting (4b) into (25b) and invoking (22b) and (23b) leads to

$$H^*_{IS,k} = 1 - F^*_{IS,k} = h_{I,k}/h_{S,k} \tag{26}$$

in every subpopulation $k$. Therefore, $F^*_{IS,k}$ can be negative, but $F^*_{IS,k} \le 1$ for every $k$.

Recalling (23c), we define the averages of $F^*_{IS,k}$ over subpopulations as

$$\bar F^*_{IS} = \frac{1}{h_S} \sum_k w_k h_{S,k}\, F^*_{IS,k}. \tag{27}$$

Substituting (26) into (27) and employing (22c) and (23c) yields

$$\bar H^*_{IS} = 1 - \bar F^*_{IS} = h_I/h_S. \tag{28}$$

This simple result, in which the numerator and denominator in (26) are averaged separately, follows from the weightings in (25) and (27). Note that $\bar F^*_{IS}$ can be negative, but $\bar F^*_{IS} \le 1$.

By substituting (25) into (27) and appealing to (16), we can also express $\bar F^*_{IS}$ as an average over homozygotes
$$\bar F^*_{IS} = \frac{1}{h_S} \sum_i (\bar p_i - \overline{p_i^2})\, \bar F_{IS,ii} \tag{29a}$$

$$\phantom{\bar F^*_{IS}} = \frac{1}{h_S} \sum_{i,j:\, i \neq j} \overline{p_i p_j}\, \bar F_{IS,ij}, \tag{29b}$$

which are properly normalized by (23c).

Now we turn to $F_{IT,ij}$. Multiplying (14) by $\bar p_i$ and summing over $i$ gives the equivalent homozygote and heterozygote averages

$$F^*_{IT} = \frac{1}{h_T} \sum_i \bar p_i(1 - \bar p_i)\, F_{IT,ii} \tag{30a}$$

$$\phantom{F^*_{IT}} = \frac{1}{h_T} \sum_{i,j:\, i \neq j} \bar p_i \bar p_j\, F_{IT,ij}, \tag{30b}$$

whose normalization is justified by (24). Inserting (5b) into (30b) and utilizing (22c) and (24b), we obtain

$$H^*_{IT} = 1 - F^*_{IT} = h_I/h_T. \tag{31}$$

Therefore, $F^*_{IT} \le 1$, but $F^*_{IT}$ can be negative.

For $F_{ST,ij}$, from (15) we get

$$F^*_{ST} = \frac{1}{h_T} \sum_i \bar p_i(1 - \bar p_i)\, F_{ST,ii} \tag{32a}$$

$$\phantom{F^*_{ST}} = \frac{1}{h_T} \sum_{i,j:\, i \neq j} \bar p_i \bar p_j\, F_{ST,ij}. \tag{32b}$$

Substituting (6b) into (32b) and using (23c) and (24b), we find

$$H^*_{ST} = 1 - F^*_{ST} = h_S/h_T. \tag{33}$$

Since $h_T \ge h_S \ge 0$, we have $0 \le F^*_{ST} \le 1$.

From (28), (31), and (33) we infer at once the hierarchical formula

$$H^*_{IT} = \bar H^*_{IS}\, H^*_{ST}. \tag{34}$$

Nei (1977) derived (28), (31), (33), and (34) for homozygotes. Our treatment establishes these results also for heterozygotes. Observe from (34) that when (20) is averaged over genotypes, the factors on the right-hand side are averaged separately. This occurs because the weightings in (30) and (32) differ from those in (29).

In the above analysis, we posited a discretely subdivided population. However, if we restrict our attention to $F_{IT,ij}$, this assumption becomes unnecessary. Indeed, the definitions (5), (22c), and (24) involve only allelic and genotypic frequencies in the entire population. Therefore, (14), (30), and (31) hold for arbitrary population structure.

STOCHASTIC ALLELIC FREQUENCIES

Here, we shall extend the analysis in the last section to randomly varying allelic frequencies, which may reflect finite subpopulation numbers or random variation in evolutionary forces. In this case, it is obvious that Nei's (1977) definitions (4), (5), and (6) lead to fixation indices that are random variables. Because (28), (31), and (33) are ratios of random heterozygosities, even their expectations are difficult to evaluate and to relate to theoretical studies of population structure, which are usually formulated in terms of covariances of allelic frequencies or probabilities of identity in allelic state or of identity by descent. The fixation indices we shall define are parameters.

We shall examine only the allelic frequencies. These are of greatest evolutionary interest and suffice for most theoretical investigations of population structure, which are usually restricted to panmictic subpopulations. To account for random variation, we imagine that the population $T$, which comprises the subpopulations $S$, is replicated infinitely many times to form the metapopulation $U$. Each of these replicates is an independent realization of the evolutionary process, so $U$ is an infinite collection of such realizations. We do not assume that the subpopulations $S$ are panmictic.

The arrangement of this section is the same as that of the preceding one.

The allelic frequencies $p_{i,k}$ are now random variables. As in the last section, a bar indicates averages over subpopulations $S$ within the population $T$:

$$\bar p_i = \sum_k w_k p_{i,k}. \tag{35}$$

Of course, $\bar p_i$ is now a random variable. For typographical simplicity, we use an angle bracket to signify averages over evolutionary realizations (or sample paths). Thus, $\langle p_{i,k} \rangle$ is averaged over $T$ within $U$, and the grand mean of the frequency of $A_i$ is

$$\bar{\bar p}_i \equiv E(\bar p_i) = \langle \bar p_i \rangle. \tag{36}$$

Analogy with (21), (5), and (6) suggests the definitions

$$\langle \overline{p_i^2} \rangle = \langle \bar p_i^2 \rangle + F_{ST,ii}\, \langle \bar p_i(1 - \bar p_i) \rangle, \tag{37a}$$

$$\langle \overline{p_i p_j} \rangle = (1 - F_{ST,ij})\, \langle \bar p_i \bar p_j \rangle, \quad i \neq j; \tag{37b}$$

$$\langle \overline{p_i^2} \rangle = \bar{\bar p}_i^2 + F_{SU,ii}\, \bar{\bar p}_i(1 - \bar{\bar p}_i), \tag{38a}$$

$$\langle \overline{p_i p_j} \rangle = (1 - F_{SU,ij})\, \bar{\bar p}_i \bar{\bar p}_j, \quad i \neq j; \tag{38b}$$

$$\langle \bar p_i^2 \rangle = \bar{\bar p}_i^2 + F_{TU,ii}\, \bar{\bar p}_i(1 - \bar{\bar p}_i), \tag{39a}$$

$$\langle \bar p_i \bar p_j \rangle = (1 - F_{TU,ij})\, \bar{\bar p}_i \bar{\bar p}_j, \quad i \neq j. \tag{39b}$$

As in (7), the panmictic indices are the complements of the above fixation indices.

Solving (37) to (39) for the fixation indices yields

$$F_{ST,ii} = \frac{\langle \mathrm{Var}(p_i \mid T) \rangle}{\langle \bar p_i(1 - \bar p_i) \rangle}, \tag{40a}$$

$$F_{ST,ij} = -\frac{\langle \mathrm{Cov}(p_i, p_j \mid T) \rangle}{\langle \bar p_i \bar p_j \rangle}, \quad i \neq j; \tag{40b}$$

$$F_{SU,ii} = \frac{\mathrm{Var}(p_i)}{\bar{\bar p}_i(1 - \bar{\bar p}_i)}, \tag{41a}$$

$$F_{SU,ij} = -\frac{\mathrm{Cov}(p_i, p_j)}{\bar{\bar p}_i \bar{\bar p}_j}, \quad i \neq j; \tag{41b}$$
$$F_{TU,ii} = \frac{\mathrm{Var}(\bar p_i)}{\bar{\bar p}_i(1 - \bar{\bar p}_i)}, \tag{42a}$$

$$F_{TU,ij} = -\frac{\mathrm{Cov}(\bar p_i, \bar p_j)}{\bar{\bar p}_i \bar{\bar p}_j}, \quad i \neq j. \tag{42b}$$

A glance at (37b), (38b), and (39b) immediately reveals the constraints

$$F_{ST,ij},\ F_{SU,ij},\ F_{TU,ij} \le 1, \quad i \neq j. \tag{43}$$

These fixation indices can be negative. Reasoning as in (11), from (40a), (41a), and (42a) we deduce

$$0 \le F_{ST,ii},\ F_{SU,ii},\ F_{TU,ii} \le 1. \tag{44}$$

Bounds corresponding to (12b) are easy to derive, but are too complicated to be illuminating.

We can easily derive the remaining results in this section ab initio, but we can obtain them more quickly by the following transformation. In (21), (5), and (6), we drop the bar from $\bar F_{IS,ij}$; make the substitutions $I \to S$, $S \to T$, and $T \to U$; replace the bars by angle brackets; and finally substitute $P_{ij} \to \overline{p_i p_j}$ and $p_i \to \bar p_i$. This transformation yields $\overline{p_i p_j} \to \langle \bar p_i \bar p_j \rangle$ and $\bar p_i \to \bar{\bar p}_i$. Then (21), (5), and (6) become (37), (38), and (39), respectively.

To express the fixation indices for each homozygote in terms of the heterozygote indices, we apply our transformation to (19), (14), and (15), which become, respectively,

$$F_{ST,ii} = \frac{1}{\langle \bar p_i(1 - \bar p_i) \rangle} \sum_{j:\, j \neq i} \langle \bar p_i \bar p_j \rangle\, F_{ST,ij}, \tag{45a}$$

$$F_{SU,ii} = \frac{1}{1 - \bar{\bar p}_i} \sum_{j:\, j \neq i} \bar{\bar p}_j\, F_{SU,ij}, \tag{45b}$$

$$F_{TU,ii} = \frac{1}{1 - \bar{\bar p}_i} \sum_{j:\, j \neq i} \bar{\bar p}_j\, F_{TU,ij}. \tag{45c}$$

The generalization (20) of Wright's relationship among the fixation indices becomes

$$H_{SU,ij} = H_{ST,ij}\, H_{TU,ij} \tag{46}$$

for every $i$ and $j$.

Finally, we express each panmictic index as a ratio of expected heterozygosities. If every subpopulation $S$ were panmictic, the expected homozygosity and heterozygosity in the entire population $T$ would be $f_S$ and $h_S$, respectively. Thus, in this case, $f_S$ and $h_S$ are the homozygosity and heterozygosity in the metapopulation $U$:

$$f_S = \sum_i \langle \overline{p_i^2} \rangle, \tag{47a}$$

$$h_S = 1 - f_S = \sum_{i,j:\, i \neq j} \langle \overline{p_i p_j} \rangle = \sum_i \langle \bar p_i - \overline{p_i^2} \rangle. \tag{47b}$$

If the entire population $T$ were panmictic, these expectations would become

$$f_T = \sum_i \langle \bar p_i^2 \rangle, \tag{48a}$$

$$h_T = 1 - f_T = \sum_{i,j:\, i \neq j} \langle \bar p_i \bar p_j \rangle = \sum_i \langle \bar p_i(1 - \bar p_i) \rangle. \tag{48b}$$

If the metapopulation $U$ were panmictic, its homozygosity and heterozygosity would be

$$f_U = \sum_i \bar{\bar p}_i^2, \tag{49a}$$

$$h_U = 1 - f_U = \sum_{i,j:\, i \neq j} \bar{\bar p}_i \bar{\bar p}_j = \sum_i \bar{\bar p}_i(1 - \bar{\bar p}_i). \tag{49b}$$

Note that the definitions (47), (48), and (49) follow from the transformation of (22), (23), and (24), respectively.

From (47a), (48a), and (49a) we obtain easily $f_S \ge f_T \ge f_U$, which implies that $h_S \le h_T \le h_U$.

To average $F_{ST,ij}$ over homozygotes or heterozygotes, we transform (29):

$$F^*_{ST} = \frac{1}{h_T} \sum_i \langle \bar p_i(1 - \bar p_i) \rangle\, F_{ST,ii} \tag{50a}$$

$$\phantom{F^*_{ST}} = \frac{1}{h_T} \sum_{i,j:\, i \neq j} \langle \bar p_i \bar p_j \rangle\, F_{ST,ij}, \tag{50b}$$

for which (28) yields

$$H^*_{ST} = 1 - F^*_{ST} = h_S/h_T. \tag{51}$$

For $F_{SU,ij}$, from (30) and (31) we obtain

$$F^*_{SU} = \frac{1}{h_U} \sum_i \bar{\bar p}_i(1 - \bar{\bar p}_i)\, F_{SU,ii} \tag{52a}$$

$$\phantom{F^*_{SU}} = \frac{1}{h_U} \sum_{i,j:\, i \neq j} \bar{\bar p}_i \bar{\bar p}_j\, F_{SU,ij}, \tag{52b}$$

$$H^*_{SU} = 1 - F^*_{SU} = h_S/h_U. \tag{53}$$

For $F_{TU,ij}$, from (32) and (33) we get

$$F^*_{TU} = \frac{1}{h_U} \sum_i \bar{\bar p}_i(1 - \bar{\bar p}_i)\, F_{TU,ii} \tag{54a}$$

$$\phantom{F^*_{TU}} = \frac{1}{h_U} \sum_{i,j:\, i \neq j} \bar{\bar p}_i \bar{\bar p}_j\, F_{TU,ij}, \tag{54b}$$

$$H^*_{TU} = 1 - F^*_{TU} = h_T/h_U. \tag{55}$$

Since $h_U \ge h_T \ge h_S \ge 0$, the results (51), (53), and (55) inform us that

$$0 \le F^*_{ST},\ F^*_{SU},\ F^*_{TU} \le 1,$$

which also follows easily from (44), (50a), (52a), and (54a).

From (51), (53), and (55) we establish immediately the hierarchical result

$$H^*_{SU} = H^*_{ST}\, H^*_{TU}, \tag{56}$$

in accordance with (34).

The panmictic index $H^*_{ST}$ is a measure of variation between subpopulations. Our development justifies the use of (51) for this parameter in theoretical investigations (see, e.g., Takahata 1983; Crow and Aoki 1984; Takahata and Nei 1984; Slatkin and Barton 1989; Slatkin 1991, 1993), and the ratio (51) of expected heterozygosities may also be preferable for data analysis
… heterozygosities (Nei and Chakravarti 1977; Nei et al. 1977). Substituting (47) and (48) into (51) produces the explicit formula

$$H^*_{ST} = \frac{1 - \sum_i \langle \overline{p_i^2} \rangle}{1 - \sum_i \langle \bar p_i^2 \rangle}. \tag{57}$$

DISCUSSION

Without restricting the evolutionary forces that may be present, we have developed systematically the theory of fixation indices in an arbitrarily subdivided population. Our indices are parameters, rather than random variables. To estimate the pattern and strength of evolutionary forces (such as migration) from the above theory, a model must be specified and used to derive formulas for the fixation indices, as in examples 3 and 4 at the end of this section.

The formulas (26), (28), (31), (33), (51), (53), and (55) for the panmictic indices all have the same simple form: if $B$ is a finer level of subdivision than $C$, then

$$H_{BC} = h_B/h_C, \tag{58}$$

where $h_X$ designates the expected heterozygosity with random mating within subdivisions at level $X$. Then not only are the hierarchical relations (34) and (56) obvious, but so is their extension to further nested subdivision (Wright 1969, p. 295). Thus, if $R$, $S$, $T$, and $U$ signify increasingly coarse subdivision, we have

$$H_{RU} = H_{RS}\, H_{ST}\, H_{TU}. \tag{59}$$

We proceed to discuss the interpretation of some measures of population differentiation. According to (10a) and (12a), the fixation index $F_{ST,ii}$ is a standardized measure of the intersubpopulation variance of the frequency $p_i$ of the allele $A_i$. By (10b), the corresponding covariance measure for the frequencies of $A_i$ and $A_j$ is $F_{ST,ij}$. If every subpopulation is panmictic, then $F_{IT,ij} = F_{ST,ij}$ for every $i$ and $j$, and therefore (5) shows that the parameters $F_{ST,ij}$ yield the genotypic frequencies in the entire population.

Now consider in more depth the interpretation of the homozygote or heterozygote average index $F^*_{ST}$, defined by (32) and evaluated in (33). Wright (1978, p. 82) noted and exemplified that $F^*_{ST}$ measures "the amount of differentiation among subpopulations, relative to the limiting amount under complete fixation" and that $F^*_{ST}$ is "not a measure of degree of differentiation in the sense implied in the extreme case by absence of any common allele. It measures differentiation within the total array in the sense of the extent to which the process of fixation has gone toward completion." These observations suggest that $F^*_{ST}$ is an appropriate measure of differentiation in a population with low genetic diversity, but not in a highly diverse population. Below, we develop this idea more precisely and illustrate it by four examples.

Since nucleotide diversities are generally low, $F^*_{ST}$ is usually a suitable measure of differentiation at the nucleotide or codon level.

We separate the cases of high and low genetic diversity and use the criteria of Kimura and Maruyama (1971); see also Nagylaki (1983, 1985, 1986).

Our index of genetic diversity is the effective number of alleles (Kimura and Crow 1964; Maruyama 1970)

$$n_e = 1/f_T, \tag{60}$$

where $f_T$ is given by (24a) or (48a). In an infinite, panmictic population with $\ell$ alleles, it is trivial to prove that $n_e \le \ell$, with equality if and only if all the alleles are equally frequent (Nagylaki 1992, pp. 29–30). Diversity is high if $n_e \gg 1$ and low if $n_e \approx 1$.

For high diversity, our measure of gene-frequency differentiation is $f_T/f_S$. We shall say that differentiation is strong if $f_T \ll f_S$ (defined as $f_T/f_S \ll 1$) and weak if $f_T \approx f_S$ (recall that $f_T \le f_S$).

For low diversity, the ratio $f_T/f_S$ is insensitive to differentiation because $f_T \approx f_S \approx 1$. A more sensitive measure is $h_S/h_T$: strong and weak differentiation correspond to $h_S \ll h_T$ and $h_S \approx h_T$, respectively.

Now consider

$$F^*_{ST} = \frac{h_T - h_S}{h_T} = \frac{f_S - f_T}{1 - f_T}. \tag{61}$$

For low diversity, our criteria are, indeed, equivalent to $F^*_{ST} \approx 1$ if differentiation is strong and to $F^*_{ST} \ll 1$ if it is weak. For high diversity, however, $F^*_{ST} \approx f_S - f_T$, so if $f_T \ll f_S \ll 1$, then differentiation is strong yet $F^*_{ST} \ll 1$; thus, strong differentiation does not imply that $F^*_{ST} \approx 1$. Weak differentiation does imply that $F^*_{ST} \ll 1$.

Example 1: Suppose that there are $K$ subpopulations, of which $L$ ($0 < L < K$) are fixed for $A_1$ and $K - L$ for $A_2$. Then (23c) and (24b) give $h_S = 0$ and $h_T > 0$, whence (33) yields $F^*_{ST} = 1$. This indicates that every subpopulation is fixed, and not all for the same allele. Since there are only two alleles, however, complete differentiation between subpopulations (in the sense of having no common alleles) is possible only for two subpopulations.

Example 2: By contrast, consider $n$ subpopulations of the same size, without common alleles, each with homozygosity $f_S$. Then $f_T = (1/n) f_S$, so from (33) we obtain

$$F^*_{ST} = \frac{(n - 1) f_S}{n - f_S}. \tag{62}$$

Thus, $F^*_{ST} < 1$ unless $f_S = 1$, even though the subpopulations are fully differentiated. Furthermore, $F^*_{ST} \approx 1$ if $f_S \approx 1$, whereas $F^*_{ST} \ll 1$ if $f_S \ll 1$. The second possibility is misleading unless carefully interpreted. For high diversity, $f_S \ll n$ (which must always hold if $n \gg 1$), so $F^*_{ST} \ll 1$ for small $n$, and this result can occur for any $n$. If diversity is low, then $f_S \approx 1$ and $n$ must be small, which correctly implies that $F^*_{ST} \approx 1$.
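Examples 1 and 2 can be checked numerically. The following is a minimal sketch, assuming deterministic allelic frequencies stored as a NumPy array with one row per subpopulation and one column per allele; the function names `fst_star` and `private_allele_frequencies` and the particular values of $K$, $L$, $n$, and $\ell$ are illustrative choices, not part of the original treatment. It evaluates $f_S$ from (23a), $f_T$ from (24a), and $F^*_{ST}$ from (61), and compares the result for Example 2 with (62).

```python
import numpy as np


def fst_star(p, w=None):
    """Return (f_S, f_T, F*_ST) for allele frequencies p[k, i].

    p : array of shape (subpopulations, alleles); each row sums to 1.
    w : subpopulation weights w_k summing to 1 (equal weights if omitted).
    Undefined (division by zero) if the total population is monomorphic.
    """
    p = np.asarray(p, dtype=float)
    n_sub = p.shape[0]
    if w is None:
        w = np.full(n_sub, 1.0 / n_sub)
    else:
        w = np.asarray(w, dtype=float)
    p_bar = w @ p                            # mean allele frequencies, eq. (3)
    f_S = float(w @ (p ** 2).sum(axis=1))    # eq. (23a)
    f_T = float((p_bar ** 2).sum())          # eq. (24a)
    return f_S, f_T, (f_S - f_T) / (1.0 - f_T)   # eq. (61)


def private_allele_frequencies(n, ell):
    """n subpopulations, each with ell equally frequent private alleles."""
    p = np.zeros((n, n * ell))
    for k in range(n):
        p[k, k * ell:(k + 1) * ell] = 1.0 / ell
    return p


# Example 1 (illustrative K = 5, L = 2): every subpopulation is fixed.
fixed = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
_, _, fst_fixed = fst_star(fixed)

# Example 2 (illustrative n = 3, ell = 4): no alleles in common.
n, ell = 3, 4
f_S, f_T, fst_private = fst_star(private_allele_frequencies(n, ell))
fst_eq62 = (n - 1) * f_S / (n - f_S)         # eq. (62)

print(fst_fixed, fst_private, fst_eq62)
```

In this sketch the fully differentiated subpopulations of Example 2 give an $F^*_{ST}$ well below 1, whereas the fixed subpopulations of Example 1 give a value of 1 even though they share alleles, which is the contrast drawn in the text.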
Two special cases illustrate the above observations. If $n \gg 1$, then $F^*_{ST} \approx f_S$. If each subpopulation has $\ell$ equally frequent alleles, then $f_S = 1/\ell$, and hence $F^*_{ST} = (n - 1)/(n\ell - 1)$.

Example 3: Our third example is the island model (Moran 1959; Maruyama 1970; Maynard Smith 1970; Nagylaki 1983, 1986, and refs. therein). Generations are discrete and nonoverlapping. Each of $n$ ($\ge 2$) panmictic (including selfing) subpopulations comprises $N$ monoecious, diploid individuals. These colonies exchange gametes with no spatial effect on dispersion, i.e., if the migration rate is $m$ ($0 < m < 1$), every colony receives a proportion $m/(n - 1)$ of its gametes from each of the other colonies. Selection is absent, and every allele mutates to new alleles at the same rate $u$ ($0 \le u \le 1$).

We posit that migration is weak and that mutation is weak relative to the stronger one of migration and random drift:

$$m \ll 1 \quad \text{and} \quad u \ll \max(m, 1/N). \tag{63}$$

Then, at equilibrium,

$$n_e \approx \frac{n[m + u(4mN_T + n - 1)]}{nm + (n - 1)u} \tag{64}$$

(Nagylaki 1983), where $N_T = nN$ represents the total population number;

$$F^*_{ST} \approx \frac{1}{4Nm\alpha + 1}, \tag{65}$$

where $\alpha = [n/(n - 1)]^2$ (Nei 1975, p. 123; Nagylaki 1983; Takahata 1983; Crow and Aoki 1984; Takahata and Nei 1984; Cockerham and Weir 1987); and differentiation is strong if and only if

$$4mN \ll \max(1, 4N_T u) \tag{66a}$$

and weak if and only if

$$4mN \gg \max(1, 4N_T u) \tag{66b}$$

(Nagylaki 1986). Using $F^*_{ST}$ to assess differentiation would replace (66a) and (66b) by $4mN \ll 1$ and $4mN \gg 1$, respectively, which is correct if and only if $4N_T u \le 1$. Thus, $F^*_{ST}$ provides the correct criterion for differentiation if and only if diversity is low (cf. Nagylaki 1983, 1986).

Example 4: Our last example is the unbounded, unidimensional stepping-stone model (Malécot 1949, 1950, 1951; Kimura 1953; Nagylaki 1989, and refs. therein). As in the island model, generations are discrete and nonoverlapping; selection is absent; and every allele mutates to new alleles at the same rate $u$ ($0 \le u \le 1$). There are panmictic (including selfing) colonies of $N$ monoecious, diploid individuals at all the integers. These demes exchange gametes at rates that depend on displacement, but not on initial and final positions separately, i.e., dispersion is homogeneous. … which genes are sampled. We write the variance of the single-generation gametic displacement as $\tfrac{1}{2}\sigma^2$ and introduce the scaled, dimensionless separation

$$\xi = 2\sqrt{u}\, w/\sigma. \tag{67}$$

For weak mutation ($u \ll 1$) and large neighborhood size ($N\sigma \gg 1$), the probability at equilibrium that two distinct genes sampled from demes separated by a distance $w$ ($\ge 0$) are the same allele is adequately approximated by (Nagylaki 1989, and refs. therein)

$$f(\xi) \approx \frac{e^{-\xi}}{1 + \beta}, \tag{68}$$

where $\beta = 4N\sigma\sqrt{u}$ designates a dimensionless parameter. We set

$$h(\xi) = 1 - f(\xi). \tag{69}$$

The expected heterozygosity

$$h(0) \approx \frac{\beta}{1 + \beta} \tag{70}$$

is high if $\beta \gtrsim 1$ and low if $\beta \ll 1$.

Now consider two demes with scaled separation $\xi$. The effective number of alleles in these two demes is

$$n_e = \frac{2}{f(0) + f(\xi)} \approx \frac{2(1 + \beta)}{1 + e^{-\xi}}, \tag{71}$$

so their diversity is high if $\beta \gg 1$ and low if $\beta \lesssim 1$.

For high diversity, we use $f(\xi)/f(0)$ as a simple index of differentiation between the two demes. Therefore, differentiation is strong if $e^{-\xi} \ll 1$ and weak if $e^{-\xi} \approx 1$, independent of $\beta$. For low diversity, the measure $h(0)/h(\xi)$ reveals that differentiation is strong if

$$\beta \ll 1 - e^{-\xi} \tag{72a}$$

and weak if

$$\beta \gg 1 - e^{-\xi}. \tag{72b}$$

From (61) we obtain

$$F^*_{ST}(\xi) = \frac{h(\xi) - h(0)}{h(\xi) + h(0)} \approx \frac{1 - e^{-\xi}}{1 + 2\beta - e^{-\xi}}. \tag{73}$$

Again, $F^*_{ST}$ yields the correct criterion for differentiation if and only if diversity is low.

I thank Brian Charlesworth, James F. Crow, and Magnus Nordborg for useful comments on the manuscript. This work was supported by National Science Foundation grant DEB-9706912.

LITERATURE CITED

Chakraborty, R., 1993 Analysis of genetic structure of populations: meaning, methods, and implications, pp. 189–206 in Human Population Genetics, edited by P. P. Majumder. Plenum Press, New York.
Cockerham, C. C., 1969 Variance of gene frequencies. Evolution 23: 72–84.
Cockerham, C. C., 1973 Analyses of gene frequencies. Genetics 74: 679–700.
Cockerham, C. C., and B. S. Weir, 1986 Estimation of inbreeding parameters in stratified populations. Ann. Hum. Genet. 50: 271–281.
Cockerham, C. C., and B. S. Weir, 1987 Correlations, descent measures: drift with migration and mutation. Proc. Natl. Acad. Sci. USA 84: 8512–8514.
Crow, J. F., and K. Aoki, 1984 Group selection for a polygenic behavioral trait: estimating the degree of population subdivision. Proc. Natl. Acad. Sci. USA 81: 6073–6077.
Kimura, M., 1953 "Stepping-stone" model of population. Annu. Rept. Natl. Inst. Genet. Jpn. 3: 62–63.
Kimura, M., and J. F. Crow, 1964 The number of alleles that can be maintained in a finite population. Genetics 49: 725–738.
Kimura, M., and T. Maruyama, 1971 Pattern of neutral polymorphism in a geographically structured population. Genet. Res. 18: 125–131.
Malécot, G., 1949 Les processus stochastiques de la génétique. Coll. Int. Cent. Nat. Rech. Sci. 13: 121–126.
Malécot, G., 1950 Quelques schémas probabilistes sur la variabilité des populations naturelles. Ann. Univ. Lyon Sci. Sec. A 13: 37–60.
Malécot, G., 1951 Un traitement stochastique des problèmes linéaires (mutation, linkage, migration) en Génétique de Population. Ann. Univ. Lyon Sci. Sec. A 14: 79–117.
Maruyama, T., 1970 Effective number of alleles in a subdivided population. Theor. Popul. Biol. 1: 273–306.
Maynard Smith, J., 1970 Population size, polymorphism, and the rate of non-Darwinian evolution. Am. Nat. 104: 231–237.
Moran, P. A. P., 1959 The theory of some genetical effects of population subdivision. Aust. J. Biol. Sci. 12: 109–116.
Nagylaki, T., 1983 The robustness of neutral models of geographical variation. Theor. Popul. Biol. 24: 268–294.
Nagylaki, T., 1985 Homozygosity, effective number of alleles, and interdeme differentiation in subdivided populations. Proc. Natl. Acad. Sci. USA 82: 8611–8613.
Nagylaki, T., 1986 Neutral models of geographical variation, pp. 216–237 in Stochastic Spatial Processes, edited by P. Tautu. Springer, Berlin.
Nagylaki, T., 1989 Gustave Malécot and the transition from classical to modern population genetics. Genetics 122: 253–268.
Nagylaki, T., 1992 Introduction to Theoretical Population Genetics. Springer, Berlin.
Nagylaki, T., P. T. Keenan and T. F. Dupont, 1993 The influence of spatial inhomogeneities on neutral models of geographical variation. III. Migration across a geographical barrier. Theor.
National Research Council, 1996 The Evaluation of Forensic DNA Evidence. National Academy Press, Washington, DC.
Nei, M., 1973 Analysis of gene diversity in subdivided populations. Proc. Natl. Acad. Sci. USA 70: 3321–3323.
Nei, M., 1975 Molecular Population Genetics and Evolution. North-Holland Publishing Co., Amsterdam.
Nei, M., 1977 F-statistics and analysis of gene diversity in subdivided populations. Ann. Hum. Genet. 41: 225–233.
Nei, M., 1986 Definition and estimation of fixation indices. Evolution 40: 643–645.
Nei, M., 1987 Molecular Evolutionary Genetics. Columbia University Press, New York.
Nei, M., and A. Chakravarti, 1977 Drift variances of FST and GST statistics obtained from a finite number of isolated populations. Theor. Popul. Biol. 11: 307–325.
Nei, M., A. Chakravarti and Y. Tateno, 1977 Mean and variance of FST in a finite number of incompletely isolated populations. Theor. Popul. Biol. 11: 291–306.
Nei, M., and R. K. Chesser, 1983 Estimation of fixation indices and gene diversities. Ann. Hum. Genet. 47: 253–259.
Slatkin, M., 1991 Inbreeding coefficients and coalescence times. Genet. Res. 58: 167–175.
Slatkin, M., 1993 Isolation by distance in equilibrium and non-equilibrium populations. Evolution 47: 264–279.
Slatkin, M., and N. H. Barton, 1989 A comparison of three indirect methods for estimating average levels of gene flow. Evolution 43: 1349–1368.
Takahata, N., 1983 Gene identity and genetic differentiation of populations in the finite island model. Genetics 104: 497–512.
Takahata, N., and M. Nei, 1984 FST and GST statistics in the finite island model. Genetics 107: 501–504.
Weir, B. S., and C. C. Cockerham, 1984 Estimating F-statistics for the analysis of population structure. Evolution 38: 1358–1370.
Wright, S., 1943 Isolation by distance. Genetics 28: 114–138.
Wright, S., 1969 Evolution and the Genetics of Populations, Vol. II. The Theory of Gene Frequencies. University of Chicago Press, Chicago.
Wright, S., 1978 Evolution and the Genetics of Populations, Vol. IV. Variability Within and Among Natural Populations. University of Chicago Press, Chicago.