Population Genetics of Polymorphism and Divergence Under Fluctuating Selection

(1)

DOI: 10.1534/genetics.107.073361

Population Genetics of Polymorphism and Divergence Under

Fluctuating Selection

Emilia Huerta-Sanchez,* Rick Durrett

†

_{and Carlos D. Bustamante}

‡,1

*Center for Applied Mathematics,†Department of Mathematics,‡Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853

Manuscript received March 14, 2007 Accepted for publication October 2, 2007

ABSTRACT

Current methods for detecting fluctuating selection require time series data on genotype frequencies. Here, we propose an alternative approach that makes use of DNA polymorphism data from a sample of individuals collected at a single point in time. Our method uses classical diffusion approximations to model temporal fluctuations in the selection coefficients to find the expected distribution of mutation frequencies in the population. Using the Poisson random-field setting we derive the site-frequency spectrum (SFS) for three different models of fluctuating selection. We find that the general effect of fluctuating selection is to produce a more ‘‘U’’-shaped site-frequency spectrum with an excess of high-frequency derived mutations at the expense of middle-frequency variants. We present likelihood-ratio tests, comparing the fluctuating selection models to the neutral model using SFS data, and use Monte Carlo simulations to assess their power. We find that we have sufficient power to reject a neutral hypothesis using samples on the order of a few hundred SNPs and a sample size of20 and power to distinguish between selection that varies in time and constant selection for a sample of size 20. We also find that fluctuating selection increases the probability of fixation of selected sites even if, on average, there is no difference in selection among a pair of alleles segregating at the locus. Fluctuating selection will, therefore, lead to an increase in the ratio of divergence to polymorphism similar to that observed under positive directional selection.

T

WO mechanisms by which evolution can occur are the adaptive process of natural selection and the neutral processes of genetic drift. Which of these is the principal force in the evolution of a population has been one of the central issues in evolutionary biology. An early exchange in this debate was over the changes in the frequencies of a color polymorphism in a pop-ulation of the scarlet tiger moth,Callimorpha (Panaxia) dominula. Fisherand Ford(1947) argued that the pop-ulation size was too large for the changes in the fre-quencies to be due to drift and suggested that fluctuating selection must be acting. Wright(1948) replied by ar-guing that multiple factors could affect a population and that the effective population size might be much smaller than the census population size.

The potential importance of fluctuating selection on the rate and patterns of molecular evolution is well established on the basis of theoretical and simulation arguments (see Gillespie1991, 1994). While it is likely that changing environmental conditions affect the fit-ness and level of genetic variation in natural populations, only a handful of empirical population genetic studies have sought to investigate the issue. Part of the reason

this problem remains understudied is the lack of power-ful statistical tools for comparing patterns of polymor-phism to the predictions under fluctuating selection.

We begin by summarizing key experimental evidence for the importance of fluctuating selection. Mueller et al. (1985) developed statistical tools on the basis of time series analysis to examine variations in allele fre-quencies. These tests were applied to Drosophila pseu-doobscura andD. persimilis populations sampled over a 3-year period. They found significant evidence that fluc-tuating natural selection likely maintained genetic poly-morphisms in 15–20% of cases, suggested by correlated allele frequencies at different enzyme loci. Their model allowed for time-correlated changes in the environmen-tal conditions and it was noted that the power to detect selection was at a maximum when the environmental changes were strongly positively correlated in time rel-ative to the sampling time. A year later, Lynch(1987) argued that their findings of fluctuating selection effects could have been the result of nearby migration that could cause shifts in gene frequency. To eliminate this possibility, he used enzyme loci in Daphnia populations. These populations were large enough to dismiss signif-icant variation due to drift over the timescale of the study and isolated enough to preclude mass migration. He detected variation in allele frequencies attributable to fluctuating selection in the short term, but also discovered 1_{Corresponding author:} _{101 Biotechnology Bldg., Cornell University,}

Ithaca, NY 14853. E-mail: [email protected]

(2)

that the time-averaged selection coefficients for all loci were not significantly different from zero.

A decade later, Cookand Jones(1996) reexamined some of the earlier results by Fisher and Ford. Analyzing census data from natural and artificial colonies of C. dominula, they concluded that the color polymorphism was maintained by frequency-dependent selection (i.e., selection for a rare variant and against a common var-iant). They posited small effective population size as the reason for the large variation in observed frequencies, as opposed to fluctuating selection acting on the popula-tions. More recently, O’Hara (2005) took a Bayesian approach to fit the frequencies of the same medionigra gene inC. dominulaexamined by Fisher and Ford, with the aim of disentangling the relative contribution from both drift and selection in the observations. He inferred that, apart from a recent 8-year period (from the total 60 years of observations), random drift could explain most of the data, but effects of selection were also present. Furthermore, he determined that the time-averaged selection coefficients were close to zero and no change in fitness over the length of the study could be detected, in agreement with previous authors. In contrast to these earlier works, we develop procedures for estimation of fluctuating selection coefficients from sequence poly-morphism data taken from a sample of individuals at a single point in time.

For statistical estimation and the derivation of the the-oretical site-frequency spectrum, we follow the Poisson random-field approach (Sawyerand Hartl1992; Hartl et al.1994). This framework has proved to be useful in estimating mutation and selection parameters in a vari-ety of population genetic settings including genic selec-tion in a populaselec-tion of constant size (Bustamanteet al. 2001), general diploid selection (Williamsonet al.2004), and selection in a population undergoing a change in size (Williamsonet al.2005). In all of the cases exam-ined, the site-frequency spectrum contains sufficient in-formation so as to allow for parameter estimation and hypothesis testing given sufficient polymorphism data (normally in the hundreds or thousands of SNPs).

Our first step is to calculate the predicted effect of fluctuating selection on the site-frequency spectrum (SFS). That is, the SFS is the number of mutations at frequency i=n, where 1#i,n1 andnis the number of indi-viduals sampled. To obtain the SFS we use a diffusion approximation for the stationary distribution of the mu-tation frequencies in a large population. Kimura(1954) was the first to attempt to use diffusion theory to study how the allele frequencies might change under fluctu-ating environments. There was an error in the calcula-tion that was later corrected by Gillespie (1973) and Jensen (1973). Karlin and Levikson (1974) studied the model with independent fluctuations in more detail and with some simulations. They calculated fixation prob-abilities and the expected time to fixation under different assumptions. Takahataet al.(1975) (TIM) derived the

diffusion approximation when fluctuations were corre-lated in time. The TIM model served as a springboard for the rich body of theoretical investigation by Gillespie and others (see Gillespie1991). Briefly, the TIM model is part of a class of diploid selection models denoted by Gillespie as stochastic additive scale–concave fitness func-tion (SAS–CFF) models. SAS describes the stochastic ad-ditive scale to which alleles contribute, and the adad-ditive scale is mapped to a CFF that is used to assign fitness to the different genotypes. These models can also be ex-tended to include dominance, subdivision, or autocor-relations in the environment.

To compute the theoretical SFS when there are tempo-ral fluctuations in the environment, we use the diffusion approximation. We take the mean selection coefficient of each allele to be equal, implying a net mean selection coefficient of zero. This is in agreement with the con-clusions of earlier works on the absence of a long-term net fitness advantage between alleles, when examining temporal data. We also derive the theoretical site fre-quency, assuming an autocorrelation in environmental changes. As a validation of the diffusion approximation, we compare the diffusion approximation prediction for the SFS to data gathered from simulation assuming in-dependent sites.

The site-frequency spectrum data are then used to define a maximum-likelihood function from which we obtain estimates of the mutation rate parameter,u, and a parameter,b, which measures the variance of the fluc-tuating selection coefficients. This allows the derivation of the asymptotic variance and covariances for these param-eters and a comparison to likelihood-based uncertainty bounds. To distinguish between fluctuating selection and a neutral model, we compute the power of the likelihood-ratio test (LRT) for various values ofb. We also investi-gate the coverage ofband explore the power to distinguish fluctuating selection and negative or positive (directional) selection by generating a series of empirical distribu-tions for the LRT statistic under the appropriate null hypothesis.

THEORY AND METHODS

Diffusion approximation for the site-frequency spectrum: We begin with the model of Karlin and Levikson(1974). Consider two allelesAandathat have the following fitnesses

A a

Fitness in generationn: 11sðnÞ 11tðnÞ;

(3)

bðxÞ ¼xð1xÞð2NÞ½EðstÞ Eðs2t2Þ=2 1EðstÞ2_ð₁₌₂

xÞ ð1Þ

and the variance is

aðxÞ ¼xð1xÞ1x2ð1xÞ2₂

NEðstÞ2_: _ð₂_Þ (For a full derivation see theappendix.) To simplify the formulas we let

a¼2N½EðstÞ Eðs2_t2_Þ₌₂₁_E_ð_s_t_Þ2₌₂

b¼2NEðstÞ2 so that

bðxÞ ¼xð1xÞðabxÞ aðxÞ ¼xð1xÞ½11bxð1xÞ: ð3Þ

At first glance, the drift term looks like balancing se-lection, but there is an extra term in the variance that speeds up the movement and prevents accumulation of intermediate-frequency alleles. We consider the special case in which s and thave the same distribution, so E(st)¼0,E(s2_t2₎_¼_{0, and hence}_a_¼_b_{/2. In these} models,bis a scaled measure of the total variance in the fluctuating selection since var(st)¼b/(2N) . Thus, substitutinga¼b/2 we get

bðxÞ ¼xð1xÞ b

2bx

aðxÞ ¼xð1xÞ½11bxð1xÞ:

ð4Þ

Finding the stationary solution to the diffusion equa-tion (see theappendix) and using the formula derived in Kimura(1962) or in Sawyerand Hartl(1992), we find the density of mutations for a unit overall mutation rate is given byf(y,b)dy, where

fðy;bÞ ¼ 2

KðbÞyð1yÞlog

1r1ðbÞ

yr1ðbÞ

r2ðbÞ y r2ðbÞ 1

: ð5Þ

Here,r1ðbÞ¼1 2

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 11ð4=bÞ

p

=2,r2ðbÞ¼1 21

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 11ð4=bÞ

p

=2, and

KðbÞ ¼log 1r1ðbÞ

r1ðbÞ

r2ðbÞ r2ðbÞ 1

:

K(b) is ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffib2₁

4b

p

f(1), wheref(y) is an increasing scale function normalized byf(0)¼0. To show the re-lationship to the neutral case (b¼0), we rewrite (5) as

fðy;bÞ ¼2

y 1

KðbÞð1yÞlog

1r1ðbÞ

yr1ðbÞ

r2ðbÞ y r2ðbÞ 1

[2 yhðb;yÞ:

The functionh(b,y)/1 asb/0 as we would expect. Whenb6¼0, we find that limy/0hðb;yÞ ¼1 and

lim

y/1 hðb;yÞ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

bð41bÞ

p

2 logðb=2111pbffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffið41bÞ=2Þ

!

$1:

Therefore, compared to the neutral model, the occur-rence of rare alleles (withyclose to 0) is unaffected by the fluctuating environmental effects while the propor-tion of high-frequency derived alleles (yclose to 1) is higher than expected under neutrality. Intermediate-frequency alleles, however, are underrepresented with respect to the neutral case (seeresults). The reason for this is that fluctuating selection (unlike directional se-lection) does not result in a net increase in the sojourn time of selected mutations relative to neutrality. Rather, it appears as if the increase in fixation comes at the ex-pense of higher genetic drift at intermediate-frequency alleles.

Autocorrelated selection coefficients:Takahataet al. (1975) considered a Wright–Fisher diffusion with varying selection:

1

4Nxð1xÞ d2

dx21sðtÞxð1xÞ d dx:

They let s¼E½sðtÞ and V ¼Ð₀‘Eð½sðtÞ s½sð0Þ sÞdt and found that in the diffusion approximation

aðxÞ ¼ 1

2Nxð1xÞ12Vx

2_ð₁_x_Þ2 bðxÞ ¼sxð1xÞ1Vxð1xÞð12xÞ:

To connect this to the model discussed above (which as-sumed s and t uncorrelated in time), suppose that E(s2_t2₎_¼_{0, let}_s_¼_E_ð_s_t_Þ_{, and equate}_V_¼_E₍_s

t)2_{/2. This suggests that to extend the previous analysis} to autocorrelated selection coefficients, (s_n,t_n), all we do is replaceE(st)2_{/2 by the sum of the} autocovar-iance function

X‘

n¼0

E½ðs₀t₀Þðs_nt_nÞ ð6Þ

ifE(st)¼0. In other words, we use (6) as the formula forb/(4N) in (5). While the above is not a proof, it is supported by the calculations that Takahata et al. (1975) give for their model. We also show numerically that substituting the autocovariance (6) for V gives a good approximation to the case wheresandtare not independent and identically distributed.

We consider two different forms for the autocorre-lation of the selection coefficients. In an earlier article (Gillespie1993), Gillespie used an autocorrelated model where the fitness of an allele changed by assigning a random fitness 11Xi(t). TheXi(t)’s are normal random variables that remain constant, on average, for 1/a gen-erations before changing,

XiðtÞ ¼ Xi

ðt1Þ with probability 1a

j_iðtÞ with probabilitya;

(4)

wherej(t) are independent and identically distributed normal random variables with mean zero and variance

s2_{. For this case the autocovariance is}

EðXðtÞ Xðt1kÞÞ ¼s2ð1aÞk

and

X‘

k¼0

EðXiðtÞ;Xiðt1kÞÞ ¼

s2

a: ð8Þ

We also consider an autocorrelated model with a higher degree of noise every generation:

Xiðt11Þ ¼

j_iðt11Þ1XiðtÞ with probability 1a

j_iðt11Þ with probabilitya:

ð9Þ

The sum of the autocovariance is

X‘

k¼0

EðXiðtÞ;Xiðt1kÞÞ ¼

2s2

a2

s2

a: ð10Þ

We get this quantity by rewriting theXi(t)’s as the se-quence of random variablesX(0),X(0) 1 S1,X(0) 1

S2,. . .,X(0)1Sk1,jk, whereSk¼j11j21. . .1jk. To find the covariance we need to find

EððXð0Þ1SiÞ ðXð0Þ1SjÞÞ ¼EXð0Þ21ESi2: ð11Þ

The middle terms disappear sinceX(0) andSkare inde-pendent and sinceSiSj¼S2

i 1SiSjiforj.iandSi,Sji are independent. Thus the covariance is

X‘

i¼0

EXð0Þ2₁_ES2

i ¼

X‘

k¼0

s2 a 1is

2

ð1aÞi

¼s

2 a21s

2X

‘

i¼0

ið1aÞi_: _ð₁₂_Þ

To evaluate the sum in the second line of (12), we can rewrite it as

s2X

‘

i¼0 ia

að1aÞð1aÞ

i1_¼_s2 1 a1 _X‘

i¼0

iað1aÞi1

and now note that the sum (on the right) is the geo-metric mean so (12) simplifies to

X‘

i¼0

EXð0Þ2₁

ES_i2 ¼2s

2 a2

s2

a: ð13Þ

Again, the quantity (10) is simply substituted forb/(4N) in Equation 5.

Site-frequency spectrum: Whether we are discussing the time-correlated or -uncorrelated selection model, the stationary polymorphism density of the frequency of a single mutation per unit overall mutation rate is

fðq;bÞdq¼ 2

KðbÞ

1 qð1qÞlog

1r1 qr1

r2q r21

dq: ð14Þ

Givenf(q,b) we can now write formulas for the expected site-frequency spectrum, following Bustamante et al. (2001). For a particular site containing a derived mu-tation at frequencyqin the population, the probability of samplingisequences with the derived mutation and niwith the ancestral type is binomially distributed with parametersnandq. We assume an infinite-sites model of mutation where sites are independent, and each muta-tion that enters the populamuta-tion is described by the two-allele model outlined in the previous section. Therefore, if mutations are entering the population at rateu/2, the number of sites that have a derived mutation counti,i.e., the site-frequency spectrum,fYign_i¼1, are independent

Poisson-distributed random variables with meanuF(i,b), where

Fði;bÞ ¼

ð1

0 fðq;bÞ

2 n

i q

i_ð₁_q_Þni_dq_: _ð₁₅_Þ

Simulations: We verify the accuracy of the diffusion approximation in the previous section, using indepen-dent Wright–Fisher Monte Carlo simulations for each site. The numerical routine proceeds as follows:

1. Run Wright–Fisher simulations to estimate the de-rived mutation frequency, starting at geometrically distributed time intervals measured in generations with rate u/2. This represents the appearance of a mutation at a previously unmutated site (the sites are independent, so each mutation follows the two-allele model described earlier). The mutant and ancestral alleles have fluctuating selection coefficients,sand

t, with strengths and autocorrelation parameters dic-tated by the model we are simulating. The population size for the Wright–Fisher simulations,N, should be sufficiently large such that the diffusion approxima-tion holds. More specifically, the variances of the se-lection parameters, quantified bybshould be of order 1=pffiffiffiffiffiN. Thusbis of order one. We have chosenN¼

2000 for all results presented here.

2. After a suitable burn-in to ensure stationarity (10N generations in our simulations), we begin to sample from the population at regularly spaced time intervals. At each time interval, we determine the population frequency of each of the derived mutations. The num-ber of derived mutations in the population (i.e., the number of segregating sites,S) is simply the number of Wright–Fisher trajectories that have not been fixed or lost at the given sample time.

(5)

4. We make a histogram {y1,y2,. . .,yn1} of the number

of sites carrying derived mutations that occur with frequency 1,. . .,n 1 in the sample. This sample provides one realization of the site-frequency spec-trum for a given value ofb. The results are discussed later and shown in Figure 2.

Parameter estimation and inference: As the site-frequency spectrum is independent Poisson distributed, the likelihood function for an observed spectrum, {yi}, is given by

Lðu;bÞ ¼Y n1

i¼1

expðuFði;bÞÞðuFði;bÞyi_Þ

yi!

: ð16Þ

Therefore, the log-likelihood function ½dropping the term log(yi!) independent of parametersis

lðu;bÞ ¼X n1

i¼1

yilogðuFði;bÞÞ u

Xn1

i¼1

Fði;bÞ: ð17Þ

The parametersuandbcan be estimated by maximizing the likelihood (using standard optimization techniques, e.g., conjugate gradient). We note (as in Bustamante et al.2001) that the maximizer ofufor a givenbis easily computed to be ˜u¼S=Pn_i_¼₁1Fði;bÞ, whereS¼Pn1

i¼1 yi

is the number of segregating sites in the sample. There-fore we can work with theprofilelog-likelihood function, l*ðbÞ ¼lðu˜ðbÞ;bÞ, which is now a function of a single var-iable. We use both simulation and asymptotic theory to study properties of these estimates. We follow Bustamante et al.(2001), where all their equations (9)–(32) apply to this method as well, with the appropriate changes given the new expression forf(q,b).

To obtain asymptotic confidence intervals, we must compute the inverse of the Fisher information matrix (FIM),I, given by the expected value of the Hessian of

1 times the log-likelihood function (17):

I ¼ E

@2l

@u2

@2l

@u@ðlogbÞ

@2l

@ðlogbÞ@u

@2l

@ðlogbÞ2 0

B B B @

1 C C C

A: ð18Þ

The derivatives are all evaluated at the maximum-likelihood estimates (MLEs) for the parameters. As we now show, the mixed partial derivatives are in fact zero for all values ofuandb, so the FIM is diagonal and the covariances Cov(b,u)¼0. To see this, note that

@l

@u¼

S

u

Xn1

i¼1

Fði;bÞ;

whereS¼Pn_i¼11yi. In fact,f(y,b)1f(1y,b) does not

depend onb, which implies thatPn_i¼11Fði;bÞis

inde-pendent of b; therefore ˆu is constant. It is more con-venient to work with logbsince the profile-likelihood function is better approximated by a Gaussian when

using this transformation (Figure 1). By the invariance principle of maximum likelihood, the estimates of the parameter are not affected, but we obtain better perfor-mance when far away from the maximum in calculating derived statistics such as confidence intervals. For exam-ple, using the logbtransformation, we are assured that the lower confidence bounds in the original coordi-nates are prevented from being negative. The asymp-totic variance–covariance matrix for the ML estimates is provided by the inverse FIM,

V ¼ 1

I11I22I₁₂2

I22 I12

I12 I11

¼ 1=I11 0

0 1=I22

¼ s

2 ˆ

u 0

0 s2

logðbˆÞ

!

: ð19Þ

For a 95% confidence interval on ˆu, we compute ˆu6 1:96suˆ. For logðbˆÞthe confidence interval is logðbˆÞ6

1:96s_log_ðbˆÞ.

For the empirical distribution of the parameter esti-mates, we generate 1200 data sets (site-frequency spectra) for given values ofuandb. We then compute the MLE Figure 1.—Comparison of normal approximation to the

(6)

values ˆuand ˆbfor each of the data sets and obtain 95% confidence intervals on the basis of both a normal ap-proximation of the sampling distribution of the MLE as well as ax2_{-approximation of the sampling distribution} of the log likelihood.

Unfortunately, the parameters involved in the auto-correlated cases are not individually identifiable; they enter the likelihood expression only throughb. That is, for both of the autocorrelated cases we haveb¼4Ng(s2_, a) where g is the function described in (10) or (8). There is a one-parameter family of pairs of values fors2 andathat yields the same value forb. Note that, for the form of autocorrelation in (7),b¼4Ntv(t¼1/a, and v is the variance of the autocorrelated noise); then, given an MLE forb, ˆb, any combination of (t,v) witht $

1 andtv¼4Nbˆ will be consistent with the data.

RESULTS AND DISCUSSION

The site-frequency spectrum: In the theory and methodssection we computed the expected frequency of the derived alleles in the whole population and we can calculate the expected site-frequency spectrumuF(i,b) for a sample of sizen. To validate the theory we use simu-lations (described in theSimulationssection) withn¼20,

u¼20, andb¼50. The expected site-frequency spectra are shown in Figure 2. The diffusion approximation is excellent; the data generated by the simulation for the independent case agree with the theory (see Figure 2A). Note that as mentioned before, the more common alleles are favored while alleles with intermediate frequencies are underrepresented, and there are fewer rare alleles than in the neutral case. Another interesting feature is that the expected number of segregating sites does not change withbsincePn_i¼11F9ði;bÞ ¼0 (the integrand is

an odd function), which implies thatuPn_i_¼₁1Fði;bÞ is constant. For the autocorrelated case corresponding to (7) and (9) see Figure 2, B and C. We useda¼0.1, which represents a change in the environment about every 10 generations, on average. The theory curve is slightly off at the edges of the spectrum but it is sufficiently accurate as to validate our substitution ofE(st)2_{/2 by the sum} of the autocovariances.

We present in Figure 3 the predicted fixation rates relative to neutrality for fluctuation selection and direc-tional selection as a function of the selection parameter. The natural scaling ispffiffiffibfor the fluctuating selection model andg¼2Nsfor the directional selection model. We see that in the parameter range of½0, 30, the two models make qualitatively similar predictions as the in-crease in substitution relative to neutrality. This result suggests that fluctuating selection will lead to an excess of divergence relative to polymorphism and in the anal-ysis of protein-coding DNA sequences result in rejection of the McDonald–Kreitman test in the same direction as positive selection.

Figure2.—Comparison of predictions from diffusion

(7)

We also considered the impact of fluctuating selection on commonly used SFS statistics for detecting deviation from the standard neutral model. Surprisingly, we find that the expected values of Tajima’sDand Fu and Li’s statistics are not affected by fluctuating selection. A po-tential explanation for this observation is that Tajima’sD is derived from the folded site-frequency spectrum that is not nearly as affected by selection as the unfolded SFS that can distinguish rare from high-frequency derived alleles.

The power of the test:We perform a test of neutrality that compares the null hypothesis (b¼0) with the alter-native hypothesis (b6¼0). The likelihood test statistic is

L¼ Lðuˆ;bˆjxÞ

Lðuˆw;0jxÞ

;

where ˆu_w is Watterson’s (1975) estimator ofu and 2 ln(L) has (asymptotically) ax2_{-distribution with 1 d.f. To}

assess the power of the test, we generate 1200 data sets (described in theSimulationssection), and for each data set, we apply the LRT at the (1a) significance level wherea¼0.05. We used different levels of mutationu2

{20, 40, 60} withb2{2, 8, 14, 20, 26, 32, 38, 44, 50, 56}. The results are shown in Figure 4. We observe that the test has very good power to detect deviations from neu-trality as u and bincrease. For higher mutation rates, there are a larger number of sites with derived mutations in any given sample, and intuitively the power to detect for selection is improved. TheP-values obtained here de-pend on the indede-pendent-sites assumption. However, it is possible to findP-values in the presence of linkage by using coalescent simulations with recombination to find the critical value for the test statistic (Zhuand Bustamante 2005).

Sampling distributions of logðbˆÞ and ˆu: Using the generated data under different mutation rates we find that the distributions for logðbˆÞare centered around the true value (Figure 5) with increasing variance asbmoves Figure4.—Power of the LRT test. On thex-axis are the

dif-ferent values ofbunder which the data were simulated. The

y-axis represents the proportion of data sets of 1200 that re-jected neutrality (b¼0).

Figure 3.—Fixation rate of mutations under directional

selection with scaled selection coefficientgor under fluctuat-ing selection with standard deviationpffiffiffibrelative to neutrality (g¼0).

Figure_{5.—The sampling distributions of log}b_{as a}

(8)

away from zero. The variance is large for smalleru. This happens because for some data sets we get very large values for ˆband can have ˆb¼‘. For example, if we use a data set generated withu¼20 andb¼56, we getb¼‘. If one plots the log-profile-likelihood function, it is easy to see that the function is monotonic and peaks atb¼‘

(see Figure 6). On the other hand, the estimation ofuis excellent for the different values ofb. The variance stays mostly constant as we varyb(see Figure 7).

Confidence sets: We examined two types of confi-dence sets: (1) the region that contains (1a)100% of the area under the normal approximation to likelihood function and (2) the region in the profile-likelihood space that is ,0:5x2

1;1alikelihood units from the

maximum-likelihood point. The corresponding area ata¼0.05 is 1.96 standard deviations away from logðbˆÞ, where the

standard deviation is given by

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi varðlogðbˆÞÞ

q

for the first confidence set. For the second confidence set, ata¼0.05, this corresponds to the region wherel*ðbˆÞ l*ðbÞ,1:92 likelihood units.

From Figure 8, we see that the confidence intervals based on the profile-likelihood function have good cov-erage. The confidence intervals based on the normal ap-proximation to the likelihood do cover very well for all values ofb, in fact, extremely well foru¼20. This occurs because for smalleruit is common to see that the best estimate forbis zero. Sincef(i,b) is not defined forb¼0, we have to use the limit asbgoes to zero. This results in an upper confidence interval that is‘and a lower confi-dence interval that is zero. Here we are estimating log(b) instead ofbbecause the quadratic approximation to the log-profile function is better when we use log(b). It is important to note that sometimes usingbdoes not give good coverage. In Figure 9 we compare the coverage us-ingb, log(b), andpffiffiffib;bgives bad coverage,pffiffiffibgives good coverage, and using log(b) gives big coverage. This im-plies that perhapspffiffiffibis the best function ofbof the three.

Distinguishing between positive selection and fluctu-ating selection: We also investigated how much power we have to distinguish fluctuating selection from positive or negative (constant) selection. Since these two models are not nested, we need to empirically find the distribu-tion of our statistic under the null (constant selecdistribu-tion) model. That is:

1. Generate a data set as explained in theSimulations section but under theg-model;i.e., the ancestral al-lele has fitness one, and the derived alal-lele has fitness 11s, wheresis constant in time, andg¼2Ns. 2. Find the MLE estimate ofgusing the likelihood for

the g-model ½this requires using the appropriate f(y,g) that is given in Bustamante et al.2001and the MLE estimate of busing the likelihood for the

b-model.

3. LetL¼2lðˆu;bˆÞ 2lðˆu;gˆÞ.

4. Repeat steps 1–3Mtimes to generate the empirical distribution of the statisticL.

Figure 6.—The log-profile-likelihood function ofbfor a

data set with trueb¼56 and maximum-likelihood estimate atb¼‘.

Figure7.—The sampling distributions of uas a function

(9)

To compute the power to distinguish these two models, we generated data under the b-model and computed the MLE estimates forgand forb. We then calculated

Lobs¼2lðuˆ;bˆÞ 2lðuˆ;gˆÞ, and we computedLobsfor 1000

data sets. The power is then the proportion of times (of 1000) that theP-value was,0.05. This translates to the number of times that Lobs lies in the upper 5th

per-centile of the empirical distribution. Figure 10 shows the results. The power to distinguish fluctuating selection from negative selection is excellent for all values ofb. However, the power to distinguish positive selection from fluctuating selection is not very good for smallband the power fluctuations are big; but for bigger values ofbthe power is satisfactory.

Conclusions: Fluctuating selection is potentially an important force in shaping polymorphism, and even though it has received much theoretical attention, it is commonplace to use population genetic models that

assume that all new mutations are under constant selec-tion (deleterious, neutral, or positive). We have devel-oped a statistical inference for fitting population genetic models that incorporate temporal fluctuations in the environment. Our simulation results suggest that the method has reasonable power to distinguish fluctuating selection from neutral evolution, with increasing power as the number of segregating sites increases. We also examined the possibility of distinguishing constant se-lection from fluctuating sese-lection. The power to do so was very good when we tried to distinguish negative se-lection from fluctuating sese-lection; however, when we looked at positive selectionvs. fluctuating selection, it became harder to detect true fluctuating selection when Figure8.—Coverage of 95% confidence intervals using two

different methods for building confidence sets. One is based on the normal approximation and the other uses the differ-ence in profile log likelihood. The coverage under the normal approximation is always higher than when using the profile log-likelihood function.

Figure 9.—Coverage of 95% confidence intervals under

u ¼ 60 using the normal approximation method for b, logb, andpffiffiffib.

Figure10.—Power to distinguish positive and negative

(10)

bis small since both models lead to an increase in high-frequency derived alleles.

We also note that fluctuating selection, much like po-sitive directional selection, leads to an increase in the ratio of divergence to polymorphism as compared to neutrality. The magnitude of this effect is very compa-rable forpffiffiffib (standard deviation of fluctuations) and

g (strength of directional selection) in the range of selection parameters likely to be important in molecular evolution. This implies that often-used tests of neutrality that make use of polymorphism and divergence (such as the HKA and McDonald–Kreitman tests) may reject neutrality if fluctuating selection has shaped the evolu-tion of the molecule and the rejecevolu-tion will occur ‘‘in the same direction’’ as for positive selection. This raises the tantalizing hypothesis that the signature of positive se-lection in Drosophila may really be the result of fluc-tuating (and not directional) selection. This hypothesis has also recently been put forth in a similar analysis by Mustonenand Lassig(2007).

This method also allowed us to make point estimates forðˆu;bˆÞand to compute confidence intervals for these parameters. Finally, we compared methods for making confidence bounds on our estimators. We showed that we obtained good coverage when using the likelihood function directly, but the normal approximation implied by using the Fisher information matrix was often overly conservative.

The approaches outlined here are also readily gener-alized to more complicated demographic models such as selection in a population undergoing size change (e.g., expansion, contraction, or bottlenecks) (Williamsonet al. 2005). Even though this analysis does not incorporate important factors such as population structure and as-sumes high levels of recombination, this method ap-pears to have good promise for detecting temporal fluctuations in selection from genomewide patterns of polymorphism.

LITERATURE CITED

Bustamante, C. D., J. Wakeley, G. Sawyerand D. L. Hartl, 2001

Di-rectional selection and the site-frequency spectrum. Genetics159:

1779–1788.

Cook, L. M., and D. A. Jones, 1996 The medionigra gene in the moth

panaxia dominula: the case for selection. Philos. Trans. R. Soc. Lond. B351:1623–1634.

Fisher_{, R. A., and E. B. F}ord_{, 1947} _{The spread of a gene in natural}

conditions in a colony of the mothPanaxia dominula.Heredity1:

143–174.

Gillespie, J. H., 1973 Natural selection with varying selection

coef-ficients-a haploid model. Genet. Res.21:115–120.

Gillespie_{, J. H., 1991} _{The Causes of Molecular Evolution.}_Oxford

Uni-versity Press, New York.

Gillespie_{, J. H., 1993} _{Substitution processes in molecular}

evolu-tion. I. Uniform and clustered substitutions in a haploid model. Genetics134:971–981.

Gillespie, J. H., 1994 Substitution processes in molecular

evolu-tion. II. exchangeble models from population genetics. Evolu-tion48:1101–1113.

Hartl, D. L., E. N. Moriyamaand S. A. Sawyer, 1994 Selection

in-tensity for codon bias. Genetics138:227–234.

Jensen, L., 1973 Random selective advantage of genes and their

probabilities of fixation. Genet. Res.21:215–219.

Karlin, S., and B. Levikson, 1974 Temporal fluctutations in

selec-tion intensities: case of small populaselec-tion size. Theor. Popul. Biol.

6:383–412.

Kimura_{, M., 1954} _{Process leading to quasi-fixation of genes in a}

population. Genetics39:280–295.

Kimura_{, M., 1962} _{On the probability of fixation of mutant genes in}

a population. Genetics47:713–719.

Lynch, M., 1987 The consequences of fluctuating selection for

iso-zyme polymorphisms in daphnia. Genetics115:657–669. Mueller, L. D., L. G. Barrand F. J. Ayala, 1985 Natural selection

vs. random drift: evidence from temporal variation in allele fre-quencies in nature. Genetics111:517–554.

Mustonen_{, V., and M. L}assig_{, 2007} _{Adaptations to fluctuating}

se-lection in Drosophila. Proc. Natl. Acad. Sci. USA104:2277–2282. O’Hara_{, R. B., 2005} _{Comparing the effects of genetic drift and}

fluc-tuating selection on genotype frequency changes in the scarlet tiger moth. Proc. R. Soc. Lond. Ser. B272:211–217.

Sawyer, S. A., and D. L. Hartl, 1992 Population genetics of

poly-morphism and divergence. Genetics132:1161–1176.

Takahata_{, N., K. I}shii_{and H. M}atsuda_{, 1975} _{Effect of temporal}

fluctuation of selection coefficient on gene frequency in a pop-ulation. Proc. Natl. Acad. Sci. USA72:4541–4545.

Watterson, G. A., 1975 On the number of segregating sites in

genet-ical models without recombination. Theor. Popul. Biol.7:256–276. Williamson, S. H., A. Fledel-Alonand C. D. Bustamante, 2004

Pop-ulation genetics of polymorphism and divergence for diploid selec-tion models with arbitrary dominance. Genetics168:463–475. Williamson, S. H., R. Hernandez, A. Fledel-Alon, L. Zhu,

R. Nielsen _{et al.}_{, 2005} _{Simultaneous inference of selection}

and population growth from patterns of variation in the human genome. Proc. Natl. Acad. Sci. USA102(22): 7882–7887. Wright, S., 1948 On the roles of directed and random changes in

gene frequency in the genetics of populations. Evolution2:279– 294.

Zhu_{, L., and C. D. B}ustamante_{, 2005} _{A composite-likelihood}

ap-proach for detecting directional selection from DNA sequence data. Genetics170:1411–1421.

Communicating editor: J. B. Walsh

APPENDIX

To derive the diffusion approximation, note that the change in frequency in one generation is

xð11sÞ

xð11sÞ1ð1xÞð11tÞx¼

ðstÞxð1xÞ

11sx1tð1xÞ

ðstÞxð1xÞ½1sxtð1xÞ:

Writingx¼1

2 ð12xÞand 1x¼12112xwe have

(11)

Taking the expected value and speeding up time by a factor of 2Nthe drift coefficient is

bðxÞ ¼xð1xÞð2NÞ½EðstÞ Eðs2t2Þ=21EðstÞ2_ð1

2xÞ: ðA1Þ

To compute the variance, letDXbe the change in frequency andYbe the environment:

varðDXÞ ¼EvarðDXjYÞ1varðEðDXjYÞÞ:

To evaluate the first term, recall that the variance of Binomial(2N,p) is 2Np(1p) so

varðDXjYÞ ¼ xð11sÞð1xÞð11tÞ

2N½xð11sÞ1ð1xÞð11tÞ2xð1xÞ=2N sincesandtare small. As we computed above

EðDXjYÞ ¼ ðstÞxð1xÞ

11sx1tð1xÞ:

This time we can ignore the contribution from the denominator:

varðEðDXjYÞÞ ¼x2ð1xÞ2

EðstÞ2_: Adding the two results and speeding up time by a factor of 2Ngives

aðxÞ ¼xð1xÞ1x2ð1xÞ2₂_NE_ð_s_t_Þ2_: _ð_A2_Þ

To simplify formulas we let

a¼2N½EðstÞ Eðs2t2Þ=21EðstÞ2₌₂

b¼2NEðstÞ2 so that

bðxÞ ¼xð1xÞðabxÞ aðxÞ ¼xð1xÞ½11bxð1xÞ: ðA3Þ

To compute the natural scale we want to solvec9(x)¼ 2b(x)c(x)/a(x). To do this, we begin by noting

2bðxÞ

aðxÞ ¼

2½abx

11xð1xÞb: ðA4Þ

The quadratic in the denominator has two rootsr1,0,1,r2given by

16pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi114=b

2 :

To evaluate the integral we write

2bðxÞ

aðxÞ ¼

2½abx

11xð1xÞb¼

2½a=bx

1=b1xð1xÞ¼

C xr11

D r2x:

The quadratic in the denominator has two rootsr1,0,1,r2given by

16pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi114=b

2 :

Note that the two roots are symmetric about1₂.

To find the constants we solveC1D¼2 andCr2Dr1¼ 2a/bto find

C ¼2r12a=b

r2r1 D¼

2r22a=b

r2r1 :

(12)

ðy _C

xr11 D

r2xdx ¼C lnðyr1Þ Dlnðr2yÞ

yields

cðyÞ ¼exp ðy

2bðxÞ

aðxÞ dx

¼ ðyr1ÞC_ð_r2_y_ÞD_:

Consider now the special case in whichsandthave the same distribution soE(st)¼0,E(s2_t2₎_¼_{0, and hence}

a¼b/2. Then

C ¼2r12a=b

r2r1 ¼

2ðr11=2Þ

r2r1 ¼ 1

D¼2r22a=b

r2r1 ¼

2ðr21=2Þ

r2r1 ¼1

and we have the very nice formula

cðyÞ ¼ ðyr1Þ1_ð_r2_y_Þ1_¼ 1 yð1yÞ11=b:

One can check this by noting that

c9ðyÞ ¼ 12y ½yð1yÞ11=b2

¼ 12y

yð1yÞ11=bcðyÞ ¼

2ðb=2byÞ

11byð1yÞcðyÞ

whena¼b/2 and comparing with (A4). Karlinand Levikson(1974) findc(y)¼ ½11by(1y)1in their p. 402, but this agrees with our computation since the solution ofc9(y)¼ 2b(y)c(y)/a(y) is determined only up to a constant multiple. To make it easier to compare with their formulas, we let

cðyÞ ¼ 1

byð1yÞ11¼b

1_ð_y_r1_Þ1_ð_r2_y_Þ1_: _ð_A5_Þ

To compute the natural scalef, we integrate to find

fðxÞ ¼b1

ðx

0

ðyr1Þ1_ð_r2_y_Þ1_dy

¼ 1

bðr2r1Þ

ðx

0 1 yr11

1 r2ydy

¼ ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi1

b214b

p ½lnðxr1Þ lnðr1Þ lnðr2xÞ1lnðr2Þ: ðA6Þ

This is close to but not exactly the same as that in Karlinand Levikson(1974). Their roots arel₂¼r₁andl₁¼r₂, and they writew¼b/2, so their constant has 2binstead of 4b.

Sincef(0)¼0, the probability of fixation starting from frequencyxis

hðxÞ ¼fðxÞ

fð1Þ¼

ln½ðxr1Þ=r1r2=ðr2xÞ

ln½ð1r1Þ=r1r2=ðr21Þ

sincer21₂¼1₂r1andr21¼ r1sofð1₂Þ=fð1Þ ¼1₂and we can write the above as

1 21

fðxÞ fð1=2Þ

fð1Þ ¼

1 21

log½ðxr1Þ=ðr2xÞ

2 log½r2=ðr1Þ :

(13)

The speed measure

mðyÞ ¼ 2

aðyÞcðyÞ¼2

byð1yÞ11 yð1yÞð11byð1yÞÞ¼

2 yð1yÞ

is exactly the same as for the neutral case. Sincef(0)¼0, Green’s functionG(x,y) is

2fðxÞ

fð1Þ

fð1Þ fðyÞ

yð1yÞ 0#x#y

2fð1Þ fðxÞ

fð1Þ

fðyÞ

yð1yÞ y#x#1:

The expected time to fixation is

Ext¼2

fðxÞ

fð1Þ

ð1

x

fð1Þ fðyÞ

yð1yÞ dy12

fð1Þ fðxÞ

fð1Þ

ðx

0

fðyÞ

yð1yÞdy:

Since 1/(1y)y¼1/(1y)11/yhas antiderivative log(y/(1y)), integrating by parts gives

¼2fðxÞ

fð1Þðfð1Þ fðyÞÞ log

y 1y

1

x

12fðxÞ

fð1Þ

ð1

x

cðyÞlog y 1y

dy

12fð1Þ fðxÞ

fð1Þ fðyÞ log

y 1y

x

0

2fð1Þ fðxÞ

fð1Þ

ðx

0

cðyÞlog y 1y

dy:

Sincef(1)f(y)f9(1)(1y) asy/1 andxlogx/0 asx/0, evaluating the first term at 1 gives 0, as does evaluating the third term at 0. Evaluating the first term atxcancels with evaluating the third atx, so the above becomes

¼2fðxÞ

fð1Þ

ð1

0

cðyÞlog y 1y

dy1 ðx

0

2cðyÞlog 1y y

dy:

c(y) is symmetric about1₂and log(y/(1y)) is antisymmetric about1₂so the first integral vanishes and Ext¼

ðx

0 2

11byð1yÞlog

1y y

dy:

Whenx#1

2;ð1yÞ=y$1 and hence the log is nonnegative throughout the range of integration, soExtis a decreasing function ofb. Whenx$1

2we can use the fact that the total integral is 0 to write Ext¼

ð1

x

2

11byð1yÞlog

y 1y

dy

and we again conclude that this decreases asbincreases. This result, which is on p. 402 of Karlinand Levikson (1974), is somewhat surprising since (A3) shows that the diffusion has a drift toward1₂, which will encourage it to spend more time at intermediate values. However, as the computations above show, this effect is counteracted by the increase ina(x).

By Hartl and Sawyer’s formula the site-frequency spectrum is

vðxÞ ¼ lim

N/‘2NuGð1=2N;yÞdy¼

uf9ð0Þ

fð1Þ

fð1Þ fðyÞ

yð1yÞ ;

which up to a constant is

2

yð1yÞlog

1r1 yr1

r2y r21