Copyright 2000 by the Genetics Society of America
Multipoint Mapping of Viability and Segregation Distorting Loci
Using Molecular Markers
Claus Vogl*,† and Shizhong Xu†
*Department of Biology, University of Oulu, FIN-90401 Oulu, Finland and †Department of Botany and Plant Sciences, University of California, Riverside, California 92521
Manuscript received July 13, 1998
Accepted for publication April 3, 2000
ABSTRACT
In line-crossing experiments, deviations from Mendelian segregation ratios are usually observed for some markers. We hypothesize that these deviations are caused by one or more segregation-distorting loci (SDL) linked to the markers. We develop both a maximum-likelihood (ML) method and a Bayesian method to map SDL using molecular markers. The ML mapping is implemented via an EM algorithm and the Bayesian method is performed via the Markov chain Monte Carlo (MCMC). The Bayesian mapping is computationally more intensive than the ML mapping but can handle more complicated models such as multiple SDL and variable number of SDL. Both methods are applied to a set of simulated data and real data from a cross of two Scots pine trees.
Corresponding author: Claus Vogl, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521. E-mail: [email protected]

CHROMOSOMAL regions that cause distorted segregation ratios in early life stages may be referred to as segregation-distorting loci (SDL). These distortions are caused either by differential representation of SDL genotypes in gametes before fertilization or by viability differences of SDL genotypes after fertilization but before genotype scoring. In both cases, the observable phenotype is a distortion of marker locus genotypes in chromosomal regions close to the SDL. Hence, regardless of the timing of action of the SDL, mapping of locations and estimation of effects of SDL follow the same statistical treatment.

Let us first discuss mechanisms that cause deviated segregation ratios by altering the gametic proportions. With meiotic drive, gametic proportions become distorted during meiosis because one chromosome type may preferentially end up in the egg nucleus. Meiotic drive is known, e.g., for the maize chromosome 10 where a variant carrying a heterochromatic knob is preferentially transmitted (reviewed in Grant 1975). Gametes carrying a certain allele may also act to render gametes carrying the homologous chromosome dysfunctional, e.g., the segregation distorter (SD) and sex ratio (SR) loci of Drosophila and the t-alleles of mice (e.g., Hartl and Clark 1997, p. 244ff). Meiotic drive can be a powerful selective force. The t-alleles are maintained in the population, even though they are homozygous lethals, due to their 0.95 probability of being passed to the next generation in heterozygotes. In many species hybridizations, outbreeding depression and segregation distortion have been observed in the F2 generation. These are often caused by structural differences between chromosomes (Whitkus 1998), i.e., by events before fertilization.

Haploid life stages can be exposed to selection, especially in plants. In the life cycle of mosses, the haploid life stage (the gametophyte) is dominant over the diploid life stage (the sporophyte). In vascular plants, maize gametophytic mutations indicate that pollen tube growth rates are determined in part by the genotypes of the microgametophytes (reviewed in Grant 1975).

Viability selection after fertilization may be more important than gametic selection. Viability selection is common in consanguineous matings where inbreeding depression reduces the survival of homozygotes compared to heterozygotes (Charlesworth and Charlesworth 1987). Viability selection gives rise to segregation ratios distorted from 1:2:1 at linked loci. Inbreeding depression is often expressed in very early life stages (Husband and Schemske 1996). In Scots pine, only ~15% of self-fertilized embryos develop into mature seeds, whereas ~75% do so in wind-pollinated seeds (Kärkkäinen et al. 1996). Some aspects of the genetic basis of inbreeding depression require further investigation, e.g., number and effects of loci and degree of dominance. Yet these factors have major consequences for mating system evolution (Charlesworth and Charlesworth 1998), conservation genetics (Hedrick 1994), and plant breeding (e.g., Williams and Savolainen 1996). A biased segregation ratio due to viability differences of genotypes also occurs in the F2 generation of wide crosses. This is generally thought to be caused by epistatic interactions.

Often events before fertilization cannot be distinguished from events after fertilization. McColdrick and Hedgecock (1997) reported that crosses of
Crassostrea gigas, the Pacific oyster, produced biased segregation ratios when tested as adults. Later Launey and Hedgecock (1999) showed that, for many loci, the ratios were Mendelian when 6-hr-old larvae were assessed, but the ratios deviated from the Mendelian ratios when the animals were 2 to 3 mo old in the same crosses. Hence, the differences are due to postfertilization viability selection.

Quantitative trait loci (QTL) are usually mapped in agronomically important plants and animals. To increase differences of parental types, and thus to increase the power of mapping, crosses are often conducted between inbred lines or between distantly related cultivars or even between species. As discussed above, these conditions promote segregation distortion.

For molecular characterization of the genetic causes of distorted segregation ratios, mapping of the location and effects of SDL would be desirable. As the phenotype in SDL mapping is different from that of QTL mapping (data in SDL mapping usually consist of frequencies of genotypes among survivors), QTL methods cannot be used for SDL mapping. Development of advanced methods for estimation of locations and effects of SDL has been lagging behind that for QTL mapping. In the past, often a single marker was considered at a time, where only the linkage between one fully informative marker and a single SDL was tested (Sorensen 1967; Servitová and Cetl 1984; Hedrick and Muona 1990; Fu and Ritland 1994a; Kärkkäinen et al. 1999). In a single-marker test, the number of distinguishable genotypic configurations of the marker is at best equal to the number of genotypic configurations of a linked SDL, but the genotypic frequencies of the marker are affected by the recombination fraction in addition to the frequencies of the SDL's genotypic configurations. Hence, for a single-marker test, estimations of the position and effect are confounded.

Errors in marker genotyping may also cause systematic deviations from the expected segregation ratio. Randomly amplified polymorphic DNA (RAPD) markers are often misscored, as a faint band may be interpreted as absent. This may lead to misscoring of only a single marker. In contrast, if segregation distortion is caused by SDL, all markers in the vicinity of the SDL will be affected.

Fu and Ritland (1994b), Mitchell-Olds (1995), and Cheng et al. (1996) have developed maximum-likelihood methods for mapping one SDL using flanking markers, i.e., an interval mapping strategy (Lander and Botstein 1989). Given a map of fully informative markers, no missing data, no interference between recombinations, and no more than one SDL per chromosome, this theory can be used to scan the genome for SDL. Under these assumptions, loci outside the interval flanking the SDL contribute no information to the segregation of the SDL. But more than one SDL per […] partially informative. Furthermore, due to the effects of SDL, estimation of map distances of markers might become biased (Lorieux et al. 1995a,b; Liu 1998). This might cause the interval mapping method to become inefficient and biased.

The SDL analysis is based on binomial (or multinomial) distributions instead of normal distributions, and hence multiple regression is not readily available and cannot be combined with conventional interval mapping as in the composite interval mapping (CIM; Zeng 1994) or the multiple QTL mapping (MQM) scheme (Jansen and Stam 1994). Therefore, multiple SDL on a single chromosome pose an unsolved theoretical problem. On the other hand, if maps are inferred correctly and if SDL on different chromosomes do not interact epistatically, i.e., SDL effects combine multiplicatively, linkage to an SDL is solely responsible for the phenotype. SDL analysis of one chromosome is therefore usually independent from other chromosomes.

We present a multipoint method for mapping multiple SDL using a backcross design. The multipoint method is developed under both the maximum-likelihood and the Bayesian frameworks.

THEORY

Model: We develop and present the model under a backcross design only, although the method can be applied to other controlled mating designs as well. We assume that the parents that initiate the cross are pure inbred lines. The F1 of the cross is backcrossed to one of the parents and a total of N individuals are generated in the backcross (BC) family for mapping. We are interested in mapping loci responsible for segregation distortion using multiple markers that are already mapped on the genome. The data here are the observed marker genotypes (configurations). The parameters, however, are the number of SDL, the locations, and effects of these loci. We assume that all markers are neutral in the sense that their segregations would be Mendelian if there were no linked SDL on the same chromosome. The observed segregation distortions on these neutral markers, however, are caused by one or more SDL near the markers.

Note that the flow of causality is from the SDL to the genotypic configurations of the SDL, then from the genotypic configurations of the SDL to the genotypic configurations of the marker loci, and finally from the genotypic configurations of the marker loci to the observed marker information. We first consider a single SDL. The genotype of the F1 is heterozygous and that of a BC individual (generated from F1 backcrossed to the first inbred parent) is either heterozygous or homozygous for the allele of the first parent with an unequal probability. The degree of asymmetry in the probability
[…]

$$\xi_i = \begin{cases} 0 & \text{if } i \text{ is homozygous} \\ 1 & \text{if } i \text{ is heterozygous} \end{cases}$$

for $i = 1, \ldots, N$. This indicator variable, $\xi_i$, is also called the "inheritance digit" because it indicates which of the two alleles carried by the F1 has been inherited by the ith progeny. Parameters of interest are the effect, denoted by $\pi$, and location, denoted by $\lambda$, of the segregation-distorting locus. The distribution of $\xi_i$ is Bernoulli with

$$\Pr(\xi_i|\pi,\lambda) = \Pr(\xi_i|\pi) = \pi^{1-\xi_i}(1-\pi)^{\xi_i} \tag{1}$$

for $i = 1, \ldots, N$, with

$$\pi = \Pr(\xi_i = 0). \tag{2}$$

Note that in the SDL case the distribution of the inheritance digit of the SDL given $\pi$ is independent of the location. Another parameter of interest is the location of the SDL on the chromosome, denoted by $\lambda$, which will be dealt with later. In the absence of segregation distortion, we have $\pi = 1/2$. Therefore, the deviation of $\pi$ from $1/2$ is the effect or size of the SDL. If $\xi_i$ were observable, we could directly estimate and test $\pi$. The maximum-likelihood estimate would be

$$\hat{\pi} = \frac{1}{N}\sum_{i=1}^{N}(1-\xi_i) \tag{3}$$

if we could maximize the following log-likelihood function:

$$l(\pi|\xi) = \sum_{i=1}^{N}\ln\Pr(\xi_i|\pi) = \sum_{i=1}^{N}\left[(1-\xi_i)\log\pi + \xi_i\log(1-\pi)\right]. \tag{4}$$

But $\xi_i$ is not observable; only the inheritance digits of marker alleles can be observed. Therefore, an entirely different approach is required to estimate $\pi$. Consider M markers with known map positions on the chromosome of interest. Define the inheritance digits of the ith individual at the jth marker locus as

$$\varphi_{ij} = \begin{cases} 0 & \text{if } i \text{ is homozygous for marker } j \\ 1 & \text{if } i \text{ is heterozygous for marker } j \end{cases}$$

for $i = 1, \ldots, N$. Without genotyping errors, there are just three possibilities of marker information $I_{ij}$ of the ith individual at the jth marker locus. The first two cases are mutually exclusive events: either one or the other marker inheritance digit is observed. In the third case of a missing observation, we define the marker information as the union of the former two cases. Thus, $\Pr(I_{ij}|\varphi_{ij}) = 1$ if the marker information is compatible with $\varphi_{ij}$ and $\Pr(I_{ij}|\varphi_{ij}) = 0$ otherwise. In the case of a missing observation, $\Pr(I_{ij}|\varphi_{ij})$ is equal to 1 independent of the inheritance digit. If there are genotyping errors, $\Pr(I_{ij}|\varphi_{ij})$ will assume values intermediate between 0 and 1. Note that conditional on the jth inheritance digit the jth marker information is independent from all other variables.

Given the position ($\lambda$) of the SDL on the chromosome, the joint distribution for $\xi_i$ and $\varphi_{i1}, \ldots, \varphi_{iM}$ is

$$\Pr(\varphi_{i1},\ldots,\varphi_{iM},\xi_i|\pi,\lambda) = \Pr(\varphi_{i1},\ldots,\varphi_{iM}|\xi_i,\lambda)\Pr(\xi_i|\pi), \tag{5}$$

where $\Pr(\varphi_{i1},\ldots,\varphi_{iM}|\xi_i,\lambda)$ can be found using the property of a two-state Markov chain (Lander and Green 1987; Jiang and Zeng 1997). We assume that there is no interference between two consecutive crossovers so that Haldane's mapping function applies. Under this assumption, the sequence

$$\{\varphi_{i1},\ldots,\varphi_{ik},\xi_i,\varphi_{i(k+1)},\ldots,\varphi_{iM}\}$$

forms a Markov chain with two discrete states, where the markers are ordered according to their positions on the chromosome and the SDL is located between markers k and k + 1. We, thus, have

$$\Pr(\varphi_{i1},\ldots,\varphi_{iM}|\xi_i,\lambda) = \left[\Pr(\varphi_{ik}|\xi_i,\lambda)\prod_{j=1}^{k-1}\Pr(\varphi_{ij}|\varphi_{i(j+1)})\right] \times \left[\Pr(\varphi_{i(k+1)}|\xi_i,\lambda)\prod_{j=k+1}^{M-1}\Pr(\varphi_{i(j+1)}|\varphi_{ij})\right], \tag{6}$$

where

$$\Pr(\varphi_{ij}|\varphi_{i(j+1)}) = \begin{cases} 1-r_{j(j+1)} & \text{if } \varphi_{ij} = \varphi_{i(j+1)} \\ r_{j(j+1)} & \text{if } \varphi_{ij} \neq \varphi_{i(j+1)} \end{cases}$$

is the transition probability between two consecutive loci and $r_{j(j+1)}$ is the recombination fraction between loci j and j + 1. The transition probability between the SDL and the nearby marker k is

$$\Pr(\varphi_{ik}|\xi_i,\lambda) = \begin{cases} 1-r_{kl} & \text{if } \varphi_{ik} = \xi_i \\ r_{kl} & \text{if } \varphi_{ik} \neq \xi_i, \end{cases}$$

where $r_{kl}$ is the recombination fraction between the kth marker and the SDL identified as locus l. The transition probability between the SDL and the (k + 1)th locus is obtained similarly.

Let $I_i = [I_{i1}, \ldots, I_{iM}]$. Combining formula (6) with the marker information and "summing out" the marker inheritance digits, we get

$$\Pr(I_i|\xi_i,\lambda) = \sum_{\varphi_{i1}}\cdots\sum_{\varphi_{iM}}\left(\Pr(\varphi_{i1},\ldots,\varphi_{iM}|\xi_i,\lambda)\prod_{j=1}^{M}\Pr(I_{ij}|\varphi_{ij})\right),$$

where we have made use of the independence from other markers of the jth marker information conditional on the jth marker inheritance digit. Combining the previous formula with formula (5) results in the following equation:

$$\Pr(I_i,\xi_i|\pi,\lambda) = \Pr(I_i|\xi_i,\lambda)\Pr(\xi_i|\pi) = \Pr(I_i|\xi_i,\lambda)\,\pi^{(1-\xi_i)}(1-\pi)^{\xi_i}. \tag{7}$$
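The summation over marker inheritance digits in the formulas leading to equation (7) does not need to enumerate all marker configurations; the Markov property lets each side of the SDL be summed out one marker at a time. The following Python sketch illustrates this under stated assumptions: marker positions are given in Morgans and sorted, observations are coded 0 (homozygous), 1 (heterozygous), or `None` (missing), and there are no genotyping errors. The function names `haldane`, `emission`, `side_message`, and `marker_likelihood` are hypothetical and are not taken from the paper.

```python
import math

def haldane(d):
    """Recombination fraction for map distance d in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))

def emission(obs):
    """Pr(I_ij | phi_ij) for phi_ij = 0 and 1; obs is 0, 1, or None (missing)."""
    if obs is None:
        return (1.0, 1.0)          # a missing score is compatible with both digits
    return (1.0, 0.0) if obs == 0 else (0.0, 1.0)

def side_message(dists, observations):
    """Sum out all markers on one side of the SDL.

    dists[0] is the distance from the SDL to the nearest marker and dists[t]
    the distance between consecutive markers moving away from the SDL.
    Returns the probability of that side's data given SDL digit 0 and 1.
    """
    msg = (1.0, 1.0)
    # Start at the outermost marker and move inward toward the SDL.
    for d, obs in zip(reversed(dists), reversed(observations)):
        r = haldane(d)
        e = emission(obs)
        w = (e[0] * msg[0], e[1] * msg[1])
        msg = ((1.0 - r) * w[0] + r * w[1], r * w[0] + (1.0 - r) * w[1])
    return msg

def marker_likelihood(positions, observations, lam):
    """Return (Pr(I_i | xi_i = 0, lam), Pr(I_i | xi_i = 1, lam))."""
    pairs = list(zip(positions, observations))
    left = [po for po in pairs if po[0] <= lam]
    right = [po for po in pairs if po[0] > lam]
    left.reverse()                 # nearest marker first on both sides

    def pack(side):
        pts = [lam] + [p for p, _ in side]
        dists = [abs(b - a) for a, b in zip(pts, pts[1:])]
        return dists, [o for _, o in side]

    ml = side_message(*pack(left))
    mr = side_message(*pack(right))
    return (ml[0] * mr[0], ml[1] * mr[1])
```

For a single heterozygous marker at 0.4 M and a putative SDL at 0.5 M, `marker_likelihood([0.4], [1], 0.5)` returns the pair (r, 1 - r) with r = `haldane(0.1)`. A missing marker between the SDL and an observed marker simply passes information through, because Haldane's function composes recombination fractions across adjacent intervals.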
[…] probability model, we now introduce a maximum-likelihood method to estimate and test the SDL. There are several ways to find the maximum-likelihood estimate of $\pi$; we adopt an expectation maximization (EM) algorithm and treat $\xi_i$ as missing data. We treat $\lambda$ as a known constant for the moment. Let $I = [I_1, \ldots, I_N]$ and $\xi = [\xi_1, \ldots, \xi_N]$. For the EM algorithm we need to determine the logarithm of $\Pr(I,\xi|\pi,\lambda)$, i.e.,

$$\log\Pr(I,\xi|\pi,\lambda) = \sum_{i=1}^{N}\log\left[\Pr(I_i|\xi_i,\lambda)\,\pi^{(1-\xi_i)}(1-\pi)^{\xi_i}\right] = \text{const} + \sum_{i=1}^{N}\left[(1-\xi_i)\log(\pi) + \xi_i\log(1-\pi)\right]. \tag{8}$$

The constant does not depend on the parameter of interest, $\pi$.

Conditional on the data, the position, and the initial value of the parameter, $\pi^{(0)}$, the posterior probabilities of $\xi_i = 0$ and $\xi_i = 1$ are, respectively,

$$\Pr(\xi_i = 0|I_i,\pi^{(0)},\lambda) = \frac{\Pr(I_i|\xi_i = 0,\lambda)\,\pi^{(0)}}{\Pr(I_i|\xi_i = 0,\lambda)\,\pi^{(0)} + \Pr(I_i|\xi_i = 1,\lambda)(1-\pi^{(0)})} \tag{9a}$$

and

$$\Pr(\xi_i = 1|I_i,\pi^{(0)},\lambda) = \frac{\Pr(I_i|\xi_i = 1,\lambda)(1-\pi^{(0)})}{\Pr(I_i|\xi_i = 0,\lambda)\,\pi^{(0)} + \Pr(I_i|\xi_i = 1,\lambda)(1-\pi^{(0)})}. \tag{9b}$$

Because $\Pr(\xi_i|I_i,\pi^{(0)},\lambda)$ follows a Bernoulli distribution, the probability in (9b) is equivalent to the expectation $E[\xi_i|I_i,\pi^{(0)},\lambda] = \hat{\xi}_i^{(0)}$. Taking the expectation of (8) with respect to $\xi$ and substituting $\hat{\xi}_i^{(0)}$ into the resulting formula, we have completed the expectation step in the EM algorithm. The M-step consists of maximizing the resulting equation to obtain

$$\pi^{(1)} = \frac{\sum_{i=1}^{N}(1-\hat{\xi}_i^{(0)})}{N}. \tag{10}$$

Equations 9 and 10 are iterated until convergence. We can now test the null hypothesis that there is no segregation distortion for the particular location. The null hypothesis is formulated as $H_0$: $\pi = 1/2$, which can be tested using the likelihood-ratio test statistic $\Lambda = -2(l(1/2,\lambda) - l(\hat{\pi},\lambda))$, where $l(\hat{\pi},\lambda)$ is the log likelihood

$$\log\Pr(I|\pi,\lambda) = \sum_{i=1}^{N}\log\left[\sum_{\xi_i}\Pr(I_i,\xi_i|\pi,\lambda)\right] \tag{11}$$

evaluated at the maximum-likelihood estimate $\hat{\pi}$, and $l(1/2,\lambda) = N\log(1/2)$ is the log-likelihood value under Mendelian segregation. Under the null model, $\Lambda$ is approximately distributed as a chi-square variable with 1 d.f. The location of the SDL, $\lambda$, can be obtained by examining the likelihood-ratio profile along the chromosome, as is commonly done in interval mapping of QTL.

Bayesian analysis: We now introduce the Bayesian analysis of SDL implemented via Markov chain Monte Carlo (MCMC). We first classify variables into observables and unobservables. The observables are the data, denoted by $I$. The unobservables include parameters and missing information. The parameters here include $\pi$ and $\lambda$, and the missing information consists of the inheritance digits $\varphi$ and $\xi$ in the current situation. We always sum over all the missing information, such that inheritance digits will only appear in intermediate steps. The joint posterior distribution of the parameters is

$$\Pr(\pi,\lambda|I) \propto \Pr(\pi)\Pr(\lambda)\prod_{i=1}^{N}\Pr(I_i|\pi,\lambda) = \Pr(\pi)\Pr(\lambda)\prod_{i=1}^{N}\sum_{\xi_i}\Pr(I_i|\xi_i,\lambda)\Pr(\xi_i|\pi), \tag{12}$$

where $\Pr(\pi)$ and $\Pr(\lambda)$ are the prior distributions for the parameters of interest: Beta(1, 1) for the former and uniform for the latter. Samples are simulated from the joint posterior distribution via MCMC. In the MCMC analysis, instead of sampling all the unobservables simultaneously, we sample one unobservable at a time with others taking values simulated in the previous cycle. When all the unobservables are updated, we have completed one cycle of the Markov chain. When the chain reaches a stationary stage, subsequent samples are considered to be drawn from the joint posterior distribution.

Starting with an initial value for each parameter, $\{\pi^{(0)},\lambda^{(0)}\}$, we sample $\pi$ using the Metropolis-Hastings algorithm (e.g., Gelman et al. 1995). A new proposal, $\pi^*$, is sampled from a beta proposal distribution $J(\pi^*|\pi^{(0)}) = \text{Beta}(\pi^{(0)}N + 2, (1-\pi^{(0)})N + 2)$. The proposal $\pi^*$ is accepted with probability $\min\{1, a(\pi^*,\pi^{(0)})\}$, where

$$a(\pi^*,\pi^{(0)}) = \frac{\Pr(\pi^*,\lambda^{(0)}|I)}{\Pr(\pi^{(0)},\lambda^{(0)}|I)}\,\frac{J(\pi^{(0)}|\pi^*)}{J(\pi^*|\pi^{(0)})}. \tag{13}$$

Note that the first term is the ratio of posterior probabilities of the parameters and the second term is the ratio of proposal probabilities. If $\pi^*$ is accepted, we take $\pi^{(1)} = \pi^*$; otherwise we do not update the effect of the SDL and simply take $\pi^{(1)} = \pi^{(0)}$. The beta proposal distribution assures that $0 \leq \pi \leq 1$. The simulated value of $\pi$, denoted by $\pi^{(1)}$, is then used to generate $\lambda$. We use the Metropolis algorithm (e.g., Gelman et al. 1995). First, a new value of $\lambda$ is proposed by a small perturbation from $\lambda^{(0)}$, i.e.,

$$\lambda^* = \lambda^{(0)} \pm x,$$

where x is a uniform variable sampled from U(0, d) and d
[…] of the linkage group. We accept this proposal with probability $\min\{1, a(\lambda^*,\lambda^{(0)})\}$, where

$$a(\lambda^*,\lambda^{(0)}) = \frac{\Pr(\lambda^*,\pi^{(1)}|I)}{\Pr(\lambda^{(0)},\pi^{(1)}|I)}. \tag{14}$$

If $\lambda^*$ is accepted, we take $\lambda^{(1)} = \lambda^*$; otherwise $\lambda^{(1)} = \lambda^{(0)}$.

Multiple-SDL model: Consider the joint action of L SDL located on the chromosome of interest. Define the locations of these SDL by $\lambda = \{\lambda_l\}$ for $l = 1, \ldots, L$, in contrast to the single-SDL model where $\lambda$ is a scalar. Also define the marginal effects of the SDL by $\pi = \{\pi_l\}$ for $l = 1, \ldots, L$. Assume that these SDL act multiplicatively; then the joint effect of all the SDL can be formulated as a product of these marginal effects. Define $\xi_i = [\xi_{i1}, \ldots, \xi_{iL}]$ and $\varphi_i = [\varphi_{i1}, \ldots, \varphi_{iM}]$ as vectors of inheritance digits of all SDL and marker loci, respectively, for the ith individual. Using Bayes' theorem, the joint posterior distribution of $\xi_i$ can be formulated as

$$\Pr(\xi_i|\pi,\lambda) = \frac{\left(\prod_{l=1}^{L-1}\Pr(\xi_{i(l+1)}|\xi_{il},\lambda)\right)\prod_{l=1}^{L}\Pr(\xi_{il}|\pi_l)}{\sum_{\xi_i}\left(\prod_{l=1}^{L-1}\Pr(\xi_{i(l+1)}|\xi_{il},\lambda)\right)\prod_{l=1}^{L}\Pr(\xi_{il}|\pi_l)}. \tag{15}$$

The joint posterior distribution of the parameters is

$$\Pr(\pi,\lambda|I) \propto \Pr(\pi)\Pr(\lambda)\prod_{i=1}^{N}\sum_{\xi_i}\left(\Pr(I_i|\xi_i,\lambda)\Pr(\xi_i|\pi,\lambda)\right), \tag{16}$$

where $\Pr(\pi) = \prod_{l=1}^{L}\Pr(\pi_l)$, $\Pr(\lambda) = \prod_{l=1}^{L}\Pr(\lambda_l)$, and

$$\Pr(I_i|\xi_i,\lambda) = \frac{\Pr(I_i,\xi_i|\lambda)}{\Pr(\xi_i|\lambda)} = \frac{\sum_{\varphi_i}\left(\Pr(\varphi_i,\xi_i|\lambda)\prod_{j}\Pr(I_{ij}|\varphi_{ij})\right)}{\Pr(\xi_i|\lambda)}. \tag{17}$$

Under the multiple-SDL model, formulation of an EM algorithm seems impossible. On the other hand, the Bayesian method requires little modification: instead of updating the effect and location of a single locus at a time, $\pi$ and $\lambda$ are updated iteratively for all loci.

With the Bayesian approach, the number of SDL (L) can be treated as an unknown variable. This involves a change in the dimension of the model. Reversible jump MCMC (Green 1995; Satagopan and Yandell 1996; Heath 1997; Richardson and Green 1997; Sillanpää and Arjas 1998; Stephens and Fisch 1998) is an extension to the Metropolis-Hastings sampler, permitting moves to be made between models with different dimensions. The joint posterior distribution of the parameters is

$$\Pr(\pi,\lambda,L|I) \propto \Pr(\pi|L)\Pr(\lambda|L)\Pr(L) \times \prod_{i=1}^{N}\sum_{\xi_i}\left(\Pr(I_i|\xi_i,\lambda,L)\Pr(\xi_i|\pi,\lambda,L)\right), \tag{18}$$

where $\Pr(L)$ is the prior probability of the number of […] $\Pr(L)$ truncated at $L_{\max}$. After each existing SDL has been updated, we propose two types of move to update L: adding a locus if $L < L_{\max}$ (with probability $p_a$) and deleting a locus if $L > 0$ (with probability $p_d$).

For adding an SDL, a new location $\lambda_{L+1}$ and effect $\pi_{L+1}$ are sampled from their uniform priors for the new SDL. The new sets of parameters are $\lambda^* = (\lambda^{(0)},\lambda_{L+1})$ and $\pi^* = (\pi^{(0)},\pi_{L+1})$. We then accept this new SDL with probability $\min\{1, a(L+1, L)\}$, where

$$a(L+1,L) = \frac{\Pr(I|\pi^*,\lambda^*,L+1)}{\Pr(I|\pi^{(0)},\lambda^{(0)},L)}\,\frac{1}{L+1}\,\frac{p_d}{p_a}. \tag{19}$$

If the new SDL is accepted, its location and effect are accepted simultaneously; otherwise, the number of SDL remains the same. In the deleting step, a random SDL is proposed to be deleted. Then the SDL are renumbered such that the candidate SDL is the last SDL, i.e., the Lth SDL. The new parameter sets will be $\lambda^* = (\lambda_1^{(0)}, \ldots, \lambda_{L-1}^{(0)})$ and $\pi^* = (\pi_1^{(0)}, \ldots, \pi_{L-1}^{(0)})$. The proposal is accepted with probability $\min\{1, a(L-1, L)\}$, where

$$a(L-1,L) = \frac{\Pr(I|\pi^*,\lambda^*,L-1)}{\Pr(I|\pi^{(0)},\lambda^{(0)},L)}\,\frac{L}{1}\,\frac{p_a}{p_d}. \tag{20}$$

Note that we handle SDL within the same marker interval in exactly the same way as SDL in different intervals and that (20) is just the inverse of (19). Our interpretation of the terms $(L+1)^{-1}$ and $L$ in (19) and (20), respectively, differs from the usual. Usually, these terms are included to account for a perceived imbalance in the number of loci selected for a delete step vs. that selected for an addition step if the order of loci is not fixed. We believe that the balance is one to one in both the addition and deletion steps and no balancing is necessary; we include these terms because of the Poisson prior. The difference to the usual algorithm, however, is just a minor modification of the prior distribution and thus irrelevant in most biological applications.

APPLICATIONS

To illustrate the method, a simulation study and an analysis of a data set from one cross of two Scots pine (Pinus sylvestris) trees are presented. The simulation study conforms to an inbred line BC situation. In the pine data analysis, we concentrate on the maternal part of the progeny of a single tree, i.e., a pseudobackcross design. In a backcross it is not possible to distinguish between gametic selection and viability selection after fertilization.

Simulations: In the simulation study, first, a single viability locus that eliminates 50% of the progeny of the heterozygous genotype, i.e., $\pi = 2/3$, was placed in the middle of a chromosome of length 1 M; six markers were spaced at regular intervals of 0.2 M along the chromosome; no missing data were considered. In the
[…] the single-SDL situation were placed at locations 0.33 M and 0.67 M, respectively. In both cases, simulations with sample sizes of 500 were repeated five times and results were compared; additionally, simulations with sample sizes of 100 and less were also performed. Compared to empirical reports of distortions of marker loci from Mendelian ratios, the simulated effect is high but not unrealistic. The marker map is rather dense and fully informative.

The outcomes of the analyses of the five simulated data sets were almost identical such that we present only one of them. In the maximum-likelihood (ML) analysis, the number of SDL was fixed to one. The inferred effect and the likelihood-ratio statistic Λ are reported at each location. We also performed an MCMC analysis of the same data. From Figure 1A, we see that the position and effect of the SDL are estimated quite accurately. For the other four simulations, the inferred positions were also mostly between the two middle markers and the estimated effects were close to the true value. Reducing sample sizes did not appreciably change the estimate of location or effect. The likelihood-ratio statistic, however, dropped considerably (results not shown). We do not present the ML results with two SDL, because the model is not appropriate.

With the Bayesian MCMC analysis, the Poisson prior mean was set to 1 and the maximum number of SDL was set to three. The chain length was $10^5$. The chain was thinned by storing only every 10th cycle. No burn-in period was discarded because the chain reached approximate stationarity very quickly. The posterior probability of the simulated number of SDL (i.e., one or two, respectively) was always between 0.6 and 0.9. In the one-SDL case, frequencies are higher at the center, i.e., close to the simulated position (Figure 1B). Effects are very similar to those estimated with the ML method. In the two-SDL case, posterior distributions of both the locations as well as the effects are about correct (Figure 1C). It can be easily discerned from the posterior distribution of frequencies that there are actually two SDL present. When the number of individuals was reduced, the posterior probability of the different numbers of SDL approached that of the prior distribution rapidly (data not shown). This corresponds to the decrease in the likelihood-ratio statistic with decreasing sample size.

Figure 1. Simulated data. (A and B) A simulation with one SDL; (C) a simulation with two SDL. The scale on the x-axis is 1 M, the positions of the markers are indicated with an "×," while the positions of the SDL are indicated with a circle. "Likelihood" refers to the broken line and to twice the log-likelihood ratio; "frequency" to the posterior probability of an SDL in an interval of 0.04 the length of the linkage group; and "effect" to the solid line and to the probability of finding the homozygote genotype in the BC.
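The single-SDL simulation and the ML analysis at one fixed location can be illustrated with a short Python sketch. It is a simplified stand-in, not the authors' program: it assumes fully informative markers with no missing data, so that only the two markers flanking the putative SDL carry information; it evaluates a single location (0.5 M, flanked by markers at 0.4 M and 0.6 M) instead of scanning the chromosome; and it evaluates the null log likelihood with the same function as the alternative. All function names and the random seed are hypothetical choices for illustration.

```python
import math
import random

def haldane(d):
    """Recombination fraction for map distance d in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(d)))

def simulate_flanking(n, pi, r_left, r_right, rng):
    """Simulate the two flanking marker digits of n backcross progeny."""
    data = []
    for _ in range(n):
        xi = 0 if rng.random() < pi else 1            # Pr(xi = 0) = pi
        left = xi if rng.random() >= r_left else 1 - xi
        right = xi if rng.random() >= r_right else 1 - xi
        data.append((left, right))                    # xi itself stays unobserved
    return data

def flanking_likelihoods(data, r_left, r_right):
    """Pr(I_i | xi_i = s) for s = 0 and 1 from the two flanking markers."""
    def trans(marker, s, r):
        return 1.0 - r if marker == s else r
    lik0 = [trans(a, 0, r_left) * trans(b, 0, r_right) for a, b in data]
    lik1 = [trans(a, 1, r_left) * trans(b, 1, r_right) for a, b in data]
    return lik0, lik1

def em_pi(lik0, lik1, pi=0.5, tol=1e-10, max_iter=1000):
    """Iterate the E-step and M-step for the effect at a fixed location."""
    n = len(lik0)
    for _ in range(max_iter):
        # E-step: posterior probability that xi_i = 0 for each individual.
        post0 = [l0 * pi / (l0 * pi + l1 * (1.0 - pi))
                 for l0, l1 in zip(lik0, lik1)]
        new_pi = sum(post0) / n                       # M-step
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi

def log_lik(pi, lik0, lik1):
    """Observed-data log likelihood with the SDL digit summed out."""
    return sum(math.log(l0 * pi + l1 * (1.0 - pi))
               for l0, l1 in zip(lik0, lik1))

def lrt(pi_hat, lik0, lik1):
    """Likelihood-ratio statistic against Mendelian segregation (pi = 1/2)."""
    return 2.0 * (log_lik(pi_hat, lik0, lik1) - log_lik(0.5, lik0, lik1))

if __name__ == "__main__":
    rng = random.Random(1)                            # arbitrary seed
    r = haldane(0.1)                                  # SDL 0.1 M from each flanking marker
    data = simulate_flanking(500, 2.0 / 3.0, r, r, rng)
    lik0, lik1 = flanking_likelihoods(data, r, r)
    pi_hat = em_pi(lik0, lik1)
    print(pi_hat, lrt(pi_hat, lik0, lik1))
```

Repeating the last four lines of the script over a grid of candidate locations, with the recombination fractions recomputed for each, would give a likelihood-ratio profile of the kind plotted in Figure 1A. With partially informative markers or missing data, the flanking-marker likelihoods would have to be replaced by a multipoint computation.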
Pine data: In the second application, data consisted of the megagametophytes of open-pollinated offspring of a single Scots pine (P. sylvestris) tree, P304 (Hurme and Savolainen 1999). Megagametophytes are haploid tissues consisting of the maternal part of the seedling's genome and can be scored at the seedling stage without damaging the seedling. We treated the progeny of this tree as a pseudobackcross family. Map distances and linkage phases were determined with Mapmaker as described in Hurme and Savolainen (1999). Five RAPD […] C02-680, G13-750, K09-750, E09-250, and AC15-270 at positions 0.038 M, 0.115 M, 0.287 M, 0.461 M, and 0.478 M, respectively. As determined from other crosses, the map length of the whole linkage group was ~0.85 M. The sample size was 73 individuals, and in many individuals some markers were scored as missing.

With the ML analysis, the log-likelihood ratio statistic was appreciable only close to the marker G13-750 (Figure
[…] of the heterozygous genotype of ~0.2 over the Mendelian value of 0.5. For the Bayesian MCMC analysis, the prior distribution was the same as for the simulation study. The posterior probabilities of zero, one, two, and three SDL were 0.01, 0.15, 0.61, and 0.23, respectively. This result is, however, quite sensitive to the prior distribution of SDL number. We report the posterior distribution […] were inferred, most often location and effect of one of the SDL was similar to the single-SDL case, while the other counteracted its effect at the other end of the linkage group (Figure 2C).

Figure 2. Pine data. The notation is the same as in Figure 1. The ML result is presented in A, and the posterior distribution of the single-SDL case is in B and of the two-SDL cases in C. The marker loci are (from left to right) C02-680, G13-750, K09-750, E09-250, and AC15-270.

DISCUSSION

Herein, a method for mapping SDL in a backcross is presented. The method makes efficient use of a map of partially or fully informative marker loci by using the multipoint method (Lander and Green 1987; Jiang and Zeng 1997). A maximum-likelihood analysis via an EM algorithm as well as a Markov chain Monte Carlo Bayesian analysis using a reversible jump algorithm for varying the number of loci is presented in detail. Given a dense marker map, the method can be used for precision analysis of positions and effects of the SDL. The best previously available methods (Fu and Ritland 1994b; Mitchell-Olds 1995; Cheng et al. 1996) rely on fully informative markers flanking the putative SDL and assume just one SDL per chromosome.

With our approach, it is possible to efficiently analyze the number, positions, and effects of SDL in organisms for which a high-resolution marker map has been developed and where inbred line crosses can be performed easily. Analysis can be extended easily to a general full-sib family or to the selfing of an outcrossing individual: the dimension changes from two to four, binomial distributions change to multinomial distributions, and the transition probabilities between adjacent loci change. Marker information now contributes to the full or partial identification of four combinations of genotypic configurations. As with the BC case, partial marker information can be defined as the union of compatible cases. All the above changes are rather trivial consequences of the change in dimension but complicate presentation substantially. Additionally, the missing phase information needs to be considered. Furthermore, the multipointing algorithm becomes more important for the full-sib design.

Presently, our method for the backcross can only be used to analyze the SDL currently segregating in the two lines, not those that have been segregating in the ancestral population from which the inbred lines derived. Segregation distortion might have already affected the inbreeding process for creation of the lines. Extrapolation from the current to the ancestral situation is therefore problematic. This problem is even more pressing for recombinant inbred lines, where overrepresentation of chromosomal fragments of one or the other parent is commonly observed (e.g., Lister and Dean 1993) and requires a more elaborate approach.
which restricts the achievable combinations of genotypic proportions. On the other hand, SDL acting after fertilization may alter genotypic proportions directly. Thus, many more combinations of genotypic proportions are possible for SDL acting after fertilization. In experimental crosses more complex than the backcross design, inferred genotypic proportions of an SDL may thus render unlikely prefertilization mechanisms of segregation distortion. Two or more SDL acting before fertilization may, however, mimic the effect of SDL acting after fertilization because of the increase in combinatorial possibilities.

In hybrids of species or subspecies, segregation distortion commonly occurs (see, e.g., Whitkus 1998 and references therein). This may be caused by structural rearrangements, e.g., inversions, which constitute a prefertilization mechanism. Alternatively, the segregation distortion may be caused by postfertilization differences in viability between genotypic configurations, most probably caused by epistatic interactions. Our method can be used to detect chromosomal areas that are causing these distortions. But because of the presumed epistasis, relaxation of the assumption of a multiplicative effect of different SDL may be necessary.

Our method may also be used to map loci influencing early viability. This would enhance our understanding of the nature of early inbreeding depression. The method provides another approach for estimating the number and effects of loci causing inbreeding depression. Traditionally, such information has been derived mainly from biometric analysis of crosses (e.g., Dudash and Carr 1998). But as inbreeding depression can be expressed in embryonic life stages not amenable to biometric analysis, application of this method is limited. To gain insight on these early life stages, sparse maps and single-marker methods have been used to infer the effect of a viability locus influencing inbreeding depression (Sorensen 1967; Servitová and Cetl 1984; Hedrick and Muona 1990; Fu and Ritland 1994a; Kärkkäinen et al. 1999). With single-marker analysis, estimation of position and effect of the SDL is, however, confounded and multiple SDL on a single linkage group cannot be handled at all. Interval methods (Fu and Ritland 1994b; Mitchell-Olds 1995; Cheng et al. 1996) rely on fully informative markers flanking the putative SDL and assume just one SDL per chromosome. Dense linkage maps of fully informative markers may be hard to obtain in closely related individuals that need to be considered in the analysis of inbreeding depression. Like the interval methods, our method requires a dense linkage map of polymorphic markers but is not restricted to fully informative markers; instead it can make efficient use of, e.g., dominant markers.

Only rarely have data sets been gathered for mapping segregation distortion or viability selection (see, however, Harushima et al. 1996 and Kuang et al. 1998).

to increase differences between parents and thus the power of mapping. Probably for this reason, markers with segregation ratio distortions are commonly observed in data sets used for QTL mapping resulting from wide crosses (e.g., van Ooijen et al. 1994). Segregation ratio distortion is also commonly observed in doubled haploid lines (e.g., Fulton et al. 1997).

Usually generation of a linkage map of marker loci precedes QTL analysis. If a dense map of informative markers is inferred correctly, the bias introduced by segregation distortion into QTL analysis will be negligible. But if recombination fractions or, worse, order of marker loci are inferred incorrectly, basic assumptions of QTL analysis do not hold and results will be imprecise at best. Hence, aside from being interesting in themselves, SDL cause practical problems in QTL projects as observed, e.g., by Sandbrink et al. (1995). Thus, segregation distortion should be accounted for in mapping projects.

Segregation distortion is known to bias estimation of recombination fractions in two-point inference of recombination distances between markers (Lorieux et al. 1995a,b; Liu 1998). If markers are fully informative, estimation of the recombination fraction of only the markers flanking the SDL will be affected. Only in the unlikely case of coincidence of SDL and marker location will no bias be observed. If less than fully informative markers are used, the effects of the distortion are spread out to the smallest interval of fully informative markers flanking the distorted region. As a remedy, markers that show obvious segregation distortion are often excluded from the map. But that reduces coverage of the genome and qualitative or quantitative trait loci might be missed.

Our method can be extended to allow for detection of SDL concurrently with estimation of a linkage. Cheng et al. (1996) have already developed an EM algorithm to infer positions of two fully informative markers in the presence of a single SDL (an interval method) in a backcross or doubled haploid lines. This could be extended to a multipoint inference of a marker map in the presence of SDL by augmenting the EM or MCMC schemes presented herein by allowing the markers to change their positions relative to each other.

The source code for a C++ program and executables for a Sun workstation, with which the above calculations can be performed, are available from Claus Vogl (claus@genetics.ucr.edu).

We thank Päivi Hurme and Outi Savolainen for the data set and Elja Arjas, Anita de Haan, Mikko Sillanpää, and Nengjun Yi for discussion of this and related issues. Outi Savolainen, Elja Arjas, and Lori Weingartner have commented on earlier versions of this manuscript. We thank Zhao-Bang Zeng and two anonymous reviewers for their patient work, which helped to improve this article a lot. This work was supported by grants from the Environment and Natural Resources Research Council and the Medical Research Council to Outi Savolainen and by the National Institutes of Health Grant GM-55321 and the U.S. Department of Agriculture National Research Initiative
LITERATURE CITED

Charlesworth, B., and D. Charlesworth, 1987 Inbreeding depression and its evolutionary consequences. Annu. Rev. Ecol. Syst. 18: 237–268.
Charlesworth, B., and D. Charlesworth, 1998 Some evolutionary consequences of deleterious mutations. Genetica 102/103: 3–19.
Cheng, R., A. Saito and Y. Ukai, 1996 Estimation of the position and effect of a lethal factor locus on a molecular marker linkage map. Theor. Appl. Genet. 93: 494–502.
Dudash, M. W., and D. E. Carr, 1998 Genetics underlying inbreeding depression in Mimulus with contrasting mating systems. Nature 393: 682–684.
Fu, Y.-B., and K. Ritland, 1994a Evidence for the partial dominance of viability genes contributing to inbreeding depression in Mimulus guttatus. Genetics 136: 323–331.
Fu, Y.-B., and K. Ritland, 1994b On estimating the linkage of marker genes to viability genes controlling inbreeding depression. Theor. Appl. Genet. 88: 925–932.
Fulton, T.-M., J. C. Nelson and S. D. Tanksley, 1997 Introgression and DNA marker analysis of Lycopersicum peruvianum, a wild relative of the cultivated tomato, into Lycopersicum esculentum, followed through three successive backcross generations. Theor. Appl. Genet. 95: 895–902.
Gelman, A., J. B. Carlin, H. S. Stern and D. B. Rubin, 1995 Bayesian Data Analysis. Chapman and Hall, London.
Grant, V., 1975 Genetics of Flowering Plants. Columbia University Press, New York.
Green, P. J., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732.
Hartl, D. L., and A. G. Clark, 1997 Principles of Population Genetics, Ed. 3. Sinauer, Sunderland, MA.
Harushima, Y., N. Kurata, M. Yano, Y. Nagamura, T. Sasaki et al., 1996 Detection of segregation distortions in an indica-japonica rice cross using a high-resolution molecular map. Theor. Appl. Genet. 92: 145–150.
Heath, S. C., 1997 Markov-chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61: 748–760.
Hedrick, P. W., 1994 Purging inbreeding depression and the probability of extinction: full-sib families. Heredity 73: 363–372.
Hedrick, P. W., and O. Muona, 1990 Linkage of viability genes to marker loci in selfing organisms. Heredity 64: 67–72.
Hurme, P., and O. Savolainen, 1999 Comparison of homology and linkage of RAPD markers between individual trees of Scots pine (Pinus sylvestris L.). Mol. Ecol. 8: 15–22.
Husband, B. C., and D. W. Schemske, 1996 Evolution of magnitude and timing of inbreeding depression in plants. Evolution 50: 554–570.
Jansen, R. C., and P. Stam, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447–1455.
Jiang, J., and Z.-B. Zeng, 1997 Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101: 47–58.
Kärkkäinen, K., V. Koski and O. Savolainen, 1996 Geographical variation in inbreeding depression in Scots pine. Evolution 50: 111–119.
Kärkkäinen, K., H. Kuittinen, R. van Treuren, C. Vogl and O. Savolainen, 1999 Genetic basis of inbreeding depression in Arabis petrea. Evolution 53: 1354–1365.
1998 An allele responsible for seedling death in Pinus radiata D. Don. Theor. Appl. Genet. 96: 640–644.
Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.
Lander, E. S., and P. Green, 1987 Construction of multilocus genetic maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363–2367.
Launey, S., and D. Hedgecock, 1999 Genetic load causes segregation ratio distortion in oysters: mapping at 6 hours. Plant and Animal Genome VII, abstracts W14, p. 33.
Lister, C., and C. Dean, 1993 Recombinant inbred lines for mapping RFLP and phenotypic markers in Arabidopsis thaliana. Plant J. 4: 745–750.
Liu, B. H., 1998 Statistical Genomics: Linkage, Mapping, and QTL Analysis. CRC Press, Boca Raton, FL.
Lorieux, M., B. Goffinet, X. Perrier, D. González de León and C. Lanaud, 1995a Maximum likelihood models for mapping genetic markers showing segregation distortion. 1. Backcross populations. Theor. Appl. Genet. 90: 73–80.
Lorieux, M., X. Perrier, B. Goffinet, C. Lanaud and D. González de León, 1995b Maximum likelihood models for mapping genetic markers showing segregation distortion. 2. F2-populations. Theor. Appl. Genet. 90: 81–89.
McGoldrick, D. J., and D. Hedgecock, 1997 Fixation, segregation and linkage of allozyme loci in inbred families of the Pacific oyster Crassostrea gigas (Thunberg): implications for the causes of inbreeding depression. Genetics 146: 321–334.
Mitchell-Olds, T., 1995 Interval mapping of viability loci causing heterosis in Arabidopsis. Genetics 140: 1105–1109.
Richardson, S., and P. J. Green, 1997 On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. B 59: 731–792.
Sandbrink, J. M., J. W. van Ooijen, C. C. Purimahua, M. Vrielink, R. Verkerk et al., 1995 Localization of genes for bacterial resistance in Lycopersicon peruvianum using RFLPs. Theor. Appl. Genet. 90: 444–450.
Satagopan, R. J., and B. S. Yandell, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Diseases. Biometric Section, Statistical Meeting. Chicago, IL.
Servitová, J., and I. Cetl, 1984 The use of recessive lethal chlorophyll mutants for linkage mapping of Arabidopsis thaliana (L.) Heynh. Arabidopsis Inf. Serv. 21: 59–64.
Sillanpää, M., and E. Arjas, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373–1388.
Sorensen, F. C., 1967 Linkage between marker genes and embryonic lethal factors may cause disturbed segregation ratios. Silvae Genet. 16: 132–134.
Stephens, D. A., and R. D. Fisch, 1998 Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54: 1334–1347.
van Ooijen, J. W., J. M. Sandbrink, M. Vrielink, R. Verkerk, P. Zabel et al., 1994 An RFLP linkage map of Lycopersicum peruvianum. Theor. Appl. Genet. 89: 1007–1013.
Whitkus, R., 1998 Genetics of adaptive radiation in Hawaiian and Cook Island species of Tetramolopium (Asteraceae). II. Genetic linkage map and its implications for interspecific breeding barriers. Genetics 150: 1209–1216.
Williams, C. G., and O. Savolainen, 1996 Inbreeding depression in conifers: implications for breeding strategy. For. Sci. 42: 102–117.
Zeng, Z.-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.