Multipoint Mapping of Viability and Segregation Distorting Loci Using Molecular Markers


Copyright2000 by the Genetics Society of America

Claus Vogl*,† and Shizhong Xu†

*Department of Biology, University of Oulu, FIN-90401 Oulu, Finland and †Department of Botany and Plant Sciences, University of California, Riverside, California 92521

Manuscript received July 13, 1998 Accepted for publication April 3, 2000

ABSTRACT

In line-crossing experiments, deviations from Mendelian segregation ratios are usually observed for some markers. We hypothesize that these deviations are caused by one or more segregation-distorting loci (SDL) linked to the markers. We develop both a maximum-likelihood (ML) method and a Bayesian method to map SDL using molecular markers. The ML mapping is implemented via an EM algorithm and the Bayesian method is performed via the Markov chain Monte Carlo (MCMC). The Bayesian mapping is computationally more intensive than the ML mapping but can handle more complicated models such as multiple SDL and variable number of SDL. Both methods are applied to a set of simulated data and real data from a cross of two Scots pine trees.

Corresponding author: Claus Vogl, Department of Botany and Plant Sciences, University of California, Riverside, CA 92521. E-mail: [email protected]

Chromosomal regions that cause distorted segregation ratios in early life stages may be referred to as segregation-distorting loci (SDL). These distortions are caused either by differential representation of SDL genotypes in gametes before fertilization or by viability differences of SDL genotypes after fertilization but before genotype scoring. In both cases, the observable phenotype is a distortion of marker locus genotypes in chromosomal regions close to the SDL. Hence, regardless of the timing of action of the SDL, mapping of locations and estimation of effects of SDL follow the same statistical treatment.

Let us first discuss mechanisms that cause deviated segregation ratios by altering the gametic proportions. With meiotic drive, gametic proportions become distorted during meiosis because one chromosome type may preferentially end up in the egg nucleus. Meiotic drive is known, e.g., for the maize chromosome 10, where a variant carrying a heterochromatic knob is preferentially transmitted (reviewed in Grant 1975). Gametes carrying a certain allele may also act to render gametes carrying the homologous chromosome dysfunctional, e.g., the segregation distorter (SD) and sex ratio (SR) loci of Drosophila and the t-alleles of mice (e.g., Hartl and Clark 1997, p. 244ff). Meiotic drive can be a powerful selective force. The t-alleles are maintained in the population, even though they are homozygous lethals, due to their 0.95 probability of being passed to the next generation in heterozygotes. In many species hybridizations, outbreeding depression and segregation distortion have been observed in the F2 generation. These are often caused by structural differences between chromosomes (Whitkus 1998), i.e., by events before fertilization.

Haploid life stages can be exposed to selection, especially in plants. In the life cycle of mosses, the haploid life stage (the gametophyte) is dominant over the diploid life stage (the sporophyte). In vascular plants, maize gametophytic mutations indicate that pollen tube growth rates are determined in part by the genotypes of the microgametophytes (reviewed in Grant 1975).

Viability selection after fertilization may be more important than gametic selection. Viability selection is common in consanguineous matings, where inbreeding depression reduces the survival of homozygotes compared to heterozygotes (Charlesworth and Charlesworth 1987). Viability selection gives rise to segregation ratios distorted from 1:2:1 at linked loci. Inbreeding depression is often expressed in very early life stages (Husband and Schemske 1996). In Scots pine, only ~15% of self-fertilized embryos develop into mature seeds, whereas ~75% do so in wind-pollinated seeds (Kärkkäinen et al. 1996). Some aspects of the genetic basis of inbreeding depression require further investigation, e.g., number and effects of loci and degree of dominance. Yet these factors have major consequences for mating system evolution (Charlesworth and Charlesworth 1998), conservation genetics (Hedrick 1994), and plant breeding (e.g., Williams and Savolainen 1996). A biased segregation ratio due to viability differences of genotypes also occurs in the F2 generation of wide crosses. This is generally thought to be caused by epistatic interactions.

Often events before fertilization cannot be distinguished from events after fertilization. McGoldrick and Hedgecock (1997) reported that crosses of Crassostrea gigas, the Pacific oyster, produced biased segregation ratios when tested as adults. Later Launey and Hedgecock (1999) showed that, for many loci, the ratios were Mendelian when 6-hr-old larvae were assessed, but the ratios deviated from the Mendelian ratios when the animals were 2 to 3 mo old in the same crosses. Hence, the differences are due to postfertilization viability selection.

Quantitative trait loci (QTL) are usually mapped in agronomically important plants and animals. To increase differences of parental types, and thus to increase the power of mapping, crosses are often conducted between inbred lines or between distantly related cultivars or even between species. As discussed above, these conditions promote segregation distortion.

For molecular characterization of the genetic causes of distorted segregation ratios, mapping of the location and effects of SDL would be desirable. As the phenotype in SDL mapping is different from that of QTL mapping (data in SDL mapping usually consist of frequencies of genotypes among survivors), QTL methods cannot be used for SDL mapping. Development of advanced methods for estimation of locations and effects of SDL has been lagging behind that for QTL mapping. In the past, often a single marker was considered at a time, where only the linkage between one fully informative marker and a single SDL was tested (Sorensen 1967; Servitová and Cetl 1984; Hedrick and Muona 1990; Fu and Ritland 1994a; Kärkkäinen et al. 1999). In a single-marker test, the number of distinguishable genotypic configurations of the marker is at best equal to the number of genotypic configurations of a linked SDL, but the genotypic frequencies of the marker are affected by the recombination fraction in addition to the frequencies of the SDL's genotypic configurations. Hence, for a single-marker test, estimations of the position and effect are confounded.

Errors in marker genotyping may also cause systematic deviations from the expected segregation ratio. Randomly amplified polymorphic DNA (RAPD) markers are often misscored, as a faint band may be interpreted as absent. This may lead to misscoring of only a single marker. In contrast, if segregation distortion is caused by SDL, all markers in the vicinity of the SDL will be affected.

Fu and Ritland (1994b), Mitchell-Olds (1995), and Cheng et al. (1996) have developed maximum-likelihood methods for mapping one SDL using flanking markers, i.e., an interval mapping strategy (Lander and Botstein 1989). Given a map of fully informative markers, no missing data, no interference between recombinations, and no more than one SDL per chromosome, this theory can be used to scan the genome for SDL. Under these assumptions, loci outside the interval flanking the SDL contribute no information to the segregation of the SDL. But more than one SDL per chromosome [...] partially informative. Furthermore, due to the effects of SDL, estimation of map distances of markers might become biased (Lorieux et al. 1995a,b; Liu 1998). This might cause the interval mapping method to become inefficient and biased.

The SDL analysis is based on binomial (or multinomial) distributions instead of normal distributions, and hence multiple regression is not readily available and cannot be combined with conventional interval mapping as in the composite interval mapping (CIM; Zeng 1994) or the multiple QTL mapping (MQM) scheme (Jansen and Stam 1994). Therefore, multiple SDL on a single chromosome pose an unsolved theoretical problem. On the other hand, if maps are inferred correctly and if SDL on different chromosomes do not interact epistatically, i.e., SDL effects combine multiplicatively, linkage to an SDL is solely responsible for the phenotype. SDL analysis of one chromosome is therefore usually independent from other chromosomes.

We present a multipoint method for mapping multiple SDL using a backcross design. The multipoint method is developed under both the maximum-likelihood and the Bayesian frameworks.

THEORY

Model: We develop and present the model under a backcross design only, although the method can be applied to other controlled mating designs as well. We assume that the parents that initiate the cross are pure inbred lines. The F1 of the cross is backcrossed to one of the parents and a total of N individuals are generated in the backcross (BC) family for mapping. We are interested in mapping loci responsible for segregation distortion using multiple markers that are already mapped on the genome. The data here are the observed marker genotypes (configurations). The parameters, however, are the number of SDL and the locations and effects of these loci. We assume that all markers are neutral in the sense that their segregations would be Mendelian if there were no linked SDL on the same chromosome. The observed segregation distortions on these neutral markers, however, are caused by one or more SDL near the markers.

Note that the flow of causality is from the SDL to the genotypic configurations of the SDL, then from the genotypic configurations of the SDL to the genotypic configurations of the marker loci, and finally from the genotypic configurations of the marker loci to the observed marker information. We first consider a single SDL. The genotype of the F1 is heterozygous and that of a BC individual (generated from F1 backcrossed to the first inbred parent) is either heterozygous or homozygous for the allele of the first parent with an unequal probability.

The degree of asymmetry in the probability [...]

$$\varphi_i = \begin{cases} 0 & \text{if } i \text{ is homozygous} \\ 1 & \text{if } i \text{ is heterozygous} \end{cases}$$

for $i = 1, \ldots, N$. This indicator variable, $\varphi_i$, is also called the "inheritance digit" because it indicates which of the two alleles carried by the F1 has been inherited by the $i$th progeny. Parameters of interest are the effect, denoted by $\pi$, and the location, denoted by $\lambda$, of the segregation-distorting locus. The distribution of $\varphi_i$ is Bernoulli with

$$\Pr(\varphi_i \mid \pi, \lambda) = \Pr(\varphi_i \mid \pi) = \pi^{1-\varphi_i}(1-\pi)^{\varphi_i} \tag{1}$$

for $i = 1, \ldots, N$, with

$$\pi = \Pr(\varphi_i = 0). \tag{2}$$

Note that in the SDL case the distribution of the inheritance digit of the SDL given $\pi$ is independent of the location. Another parameter of interest is the location of the SDL on the chromosome, denoted by $\lambda$, which will be dealt with later. In the absence of segregation distortion, we have $\pi = 1/2$. Therefore, the deviation of $\pi$ from $1/2$ is the effect or size of the SDL. If $\varphi_i$ were observable, we could directly estimate and test $\pi$. The maximum-likelihood estimate would be

$$\hat{\pi} = \frac{1}{N}\sum_{i=1}^{N}(1-\varphi_i) \tag{3}$$

if we could maximize the following log-likelihood function:

$$l(\pi \mid \varphi) = \sum_{i=1}^{N}\ln \Pr(\varphi_i \mid \pi) = \sum_{i=1}^{N}\left[(1-\varphi_i)\log\pi + \varphi_i\log(1-\pi)\right]. \tag{4}$$

But $\varphi_i$ is not observable; only the inheritance digits of marker alleles can be observed. Therefore, an entirely different approach is required to estimate $\pi$. Consider $M$ markers with known map positions on the chromosome of interest. Define the inheritance digits of the $i$th individual at the $j$th marker locus as

$$\phi_{ij} = \begin{cases} 0 & \text{if } i \text{ is homozygous for marker } j \\ 1 & \text{if } i \text{ is heterozygous for marker } j \end{cases}$$

for $i = 1, \ldots, N$. Without genotyping errors, there are just three possibilities of marker information $I_{ij}$ of the $i$th individual at the $j$th marker locus. The first two cases are mutually exclusive events: either one or the other marker inheritance digit is observed. In the third case of a missing observation, we define the marker information as the union of the former two cases. Thus, $\Pr(I_{ij} \mid \phi_{ij}) = 1$ if the marker information is compatible with $\phi_{ij}$ and $\Pr(I_{ij} \mid \phi_{ij}) = 0$ otherwise. For a missing observation, $\Pr(I_{ij} \mid \phi_{ij})$ is equal to 1 independent of the inheritance digit. If there are genotyping errors, $\Pr(I_{ij} \mid \phi_{ij})$ will assume values intermediate between 0 and 1. Note that conditional on the $j$th inheritance digit the $j$th marker information is independent from all other variables.

Given the position ($\lambda$) of the SDL on the chromosome, the joint distribution for $\varphi_i$ and $\phi_{i1}, \ldots, \phi_{iM}$ is

$$\Pr(\phi_{i1}, \ldots, \phi_{iM}, \varphi_i \mid \pi, \lambda) = \Pr(\phi_{i1}, \ldots, \phi_{iM} \mid \varphi_i, \lambda)\Pr(\varphi_i \mid \pi), \tag{5}$$

where $\Pr(\phi_{i1}, \ldots, \phi_{iM} \mid \varphi_i, \lambda)$ can be found using the property of a two-state Markov chain (Lander and Green 1987; Jiang and Zeng 1997). We assume that there is no interference between two consecutive crossovers so that Haldane's mapping function applies. Under this assumption, the sequence

$$\{\phi_{i1}, \ldots, \phi_{ik}, \varphi_i, \phi_{i(k+1)}, \ldots, \phi_{iM}\}$$

forms a Markov chain with two discrete states, where the markers are ordered according to their positions on the chromosome and the SDL is located between markers $k$ and $k+1$. We, thus, have

$$\Pr(\phi_{i1}, \ldots, \phi_{iM} \mid \varphi_i, \lambda) = \left[\Pr(\phi_{ik} \mid \varphi_i, \lambda)\prod_{j=1}^{k-1}\Pr(\phi_{ij} \mid \phi_{i(j+1)})\right] \times \left[\Pr(\phi_{i(k+1)} \mid \varphi_i, \lambda)\prod_{j=k+1}^{M-1}\Pr(\phi_{i(j+1)} \mid \phi_{ij})\right], \tag{6}$$

where

$$\Pr(\phi_{ij} \mid \phi_{i(j+1)}) = \begin{cases} 1 - r_{j(j+1)} & \text{if } \phi_{ij} = \phi_{i(j+1)} \\ r_{j(j+1)} & \text{if } \phi_{ij} \neq \phi_{i(j+1)} \end{cases}$$

is the transition probability between two consecutive loci and $r_{j(j+1)}$ is the recombination fraction between loci $j$ and $j+1$. The transition probability between the SDL and the nearby marker $k$ is

$$\Pr(\phi_{ik} \mid \varphi_i, \lambda) = \begin{cases} 1 - r_{kl} & \text{if } \phi_{ik} = \varphi_i \\ r_{kl} & \text{if } \phi_{ik} \neq \varphi_i, \end{cases}$$

where $r_{kl}$ is the recombination fraction between the $k$th marker and the SDL identified as locus $l$. The transition probability between the SDL and the $(k+1)$th locus is obtained similarly.

Let $I_i = [I_{i1}, \ldots, I_{iM}]$. Combining formula (6) with the marker information and "summing out" the marker inheritance digits, we get

$$\Pr(I_i \mid \varphi_i, \lambda) = \sum_{\phi_{i1}} \cdots \sum_{\phi_{iM}} \left(\Pr(\phi_{i1}, \ldots, \phi_{iM} \mid \varphi_i, \lambda)\prod_{j=1}^{M}\Pr(I_{ij} \mid \phi_{ij})\right),$$

where we have made use of the independence from other markers of the $j$th marker information conditional on the $j$th marker inheritance digit. Combining the previous formula with formula (5) results in the following equation:

$$\Pr(I_i, \varphi_i \mid \pi, \lambda) = \Pr(I_i \mid \varphi_i, \lambda)\Pr(\varphi_i \mid \pi) = \Pr(I_i \mid \varphi_i, \lambda)\,\pi^{(1-\varphi_i)}(1-\pi)^{\varphi_i}. \tag{7}$$
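The multiple sum over marker inheritance digits need not be enumerated explicitly. Because the chain is Markov, the markers to the left and to the right of the SDL are conditionally independent given the SDL digit, and each side can be summed recursively from the outermost marker inward. The following is a minimal sketch of that computation under the stated assumptions (Haldane's mapping function, no genotyping error); the function names, the use of `None` for a missing observation, and the requirement that marker positions are given in ascending order in Morgans are choices made for this illustration, not part of the original method description.

```python
import math


def haldane_r(distance):
    """Recombination fraction for a map distance in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance)))


def transition(a, b, r):
    """Two-state transition probability: 1 - r if the digits agree, r otherwise."""
    return 1.0 - r if a == b else r


def info_prob(info, digit):
    """Pr(I_ij | phi_ij) without genotyping error; info is 0, 1, or None (missing)."""
    return 1.0 if info is None or info == digit else 0.0


def side_prob(loci, sdl_pos, sdl_digit):
    """Sum out the marker digits on one side of the SDL.

    `loci` holds (position, info) pairs ordered from the outermost marker
    toward the SDL. msg[s] is the probability of all information seen so far
    given that the current marker has inheritance digit s.
    """
    msg, prev_pos = None, None
    for pos, info in loci:
        if msg is None:
            new = [info_prob(info, s) for s in (0, 1)]
        else:
            r = haldane_r(pos - prev_pos)
            new = [info_prob(info, s)
                   * sum(transition(s, t, r) * msg[t] for t in (0, 1))
                   for s in (0, 1)]
        msg, prev_pos = new, pos
    if msg is None:  # no markers on this side of the SDL
        return 1.0
    r = haldane_r(sdl_pos - prev_pos)
    return sum(transition(sdl_digit, t, r) * msg[t] for t in (0, 1))


def prob_info_given_sdl(info, marker_pos, sdl_pos, sdl_digit):
    """Pr(I_i | varphi_i, lambda) for one individual."""
    pairs = list(zip(marker_pos, info))
    left = [p for p in pairs if p[0] <= sdl_pos]
    right = [p for p in pairs if p[0] > sdl_pos][::-1]  # outermost first
    return side_prob(left, sdl_pos, sdl_digit) * side_prob(right, sdl_pos, sdl_digit)
```

Calling `prob_info_given_sdl` once with `sdl_digit=0` and once with `sdl_digit=1` gives the two per-individual quantities that enter Equation 7. The cost is linear in the number of markers rather than exponential.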

Given the probability model, we now introduce a maximum-likelihood method to estimate and test the SDL. There are several ways to find the maximum-likelihood estimate of $\pi$; we adopt an expectation maximization (EM) algorithm and treat $\varphi_i$ as missing data. We treat $\lambda$ as a known constant for the moment. Let $I = [I_1, \ldots, I_N]$ and $\varphi = [\varphi_1, \ldots, \varphi_N]$. For the EM algorithm we need to determine the logarithm of $\Pr(I, \varphi \mid \pi, \lambda)$, i.e.,

$$\log \Pr(I, \varphi \mid \pi, \lambda) = \sum_{i=1}^{N}\log\left[\Pr(I_i \mid \varphi_i, \lambda)\,\pi^{(1-\varphi_i)}(1-\pi)^{\varphi_i}\right] = \text{const} + \sum_{i=1}^{N}\left[(1-\varphi_i)\log(\pi) + \varphi_i\log(1-\pi)\right]. \tag{8}$$

The constant does not depend on the parameter of interest, $\pi$.

Conditional on the data, the position, and the initial value of the parameter, $\pi^{(0)}$, the posterior probabilities of $\varphi_i = 1$ and $\varphi_i = 0$ are, respectively,

$$\Pr(\varphi_i = 1 \mid I_i, \pi^{(0)}, \lambda) = \frac{\Pr(I_i \mid \varphi_i = 1, \lambda)(1-\pi^{(0)})}{\Pr(I_i \mid \varphi_i = 0, \lambda)\,\pi^{(0)} + \Pr(I_i \mid \varphi_i = 1, \lambda)(1-\pi^{(0)})} \tag{9a}$$

and

$$\Pr(\varphi_i = 0 \mid I_i, \pi^{(0)}, \lambda) = \frac{\Pr(I_i \mid \varphi_i = 0, \lambda)\,\pi^{(0)}}{\Pr(I_i \mid \varphi_i = 0, \lambda)\,\pi^{(0)} + \Pr(I_i \mid \varphi_i = 1, \lambda)(1-\pi^{(0)})}. \tag{9b}$$

Because $\Pr(\varphi_i \mid I_i, \pi^{(0)}, \lambda)$ follows a Bernoulli distribution, the probability in (9a) is equivalent to the expectation $E[\varphi_i \mid I_i, \pi^{(0)}, \lambda] = \hat{\varphi}_i^{(0)}$. Taking the expectation of (8) with respect to $\varphi$ and substituting $\hat{\varphi}_i^{(0)}$ for $\varphi_i$ in the resulting formula completes the expectation step of the EM algorithm. The M-step consists of maximizing the resulting equation to obtain

$$\pi^{(1)} = \frac{1}{N}\sum_{i=1}^{N}\left(1 - \hat{\varphi}_i^{(0)}\right). \tag{10}$$

Equations 9 and 10 are iterated until convergence. We can now test the null hypothesis that there is no segregation distortion for the particular location $\lambda$. The null hypothesis is formulated as $H_0\colon \pi = 1/2$, which can be tested using the likelihood-ratio test statistic $\Lambda = -2\,(l(1/2, \lambda) - l(\hat{\pi}, \lambda))$, where $l(\hat{\pi}, \lambda)$ is the log likelihood

$$\log \Pr(I \mid \pi, \lambda) = \sum_{i=1}^{N}\log\left[\sum_{\varphi_i}\Pr(I_i, \varphi_i \mid \pi, \lambda)\right] \tag{11}$$

evaluated at the maximum-likelihood estimate $\hat{\pi}$, and $l(1/2, \lambda) = N\log(1/2)$ is the log-likelihood value under Mendelian segregation. Under the null model, $\Lambda$ is approximately distributed as a chi-square variable with 1 d.f.

The location of the SDL, $\lambda$, can be obtained by examining the likelihood-ratio profile along the chromosome, as is commonly done in interval mapping of QTL.

Bayesian analysis: We now introduce the Bayesian analysis of SDL implemented via Markov chain Monte Carlo (MCMC). We first classify variables into observables and unobservables. The observables are the data, denoted by $I$. The unobservables include parameters and missing information. The parameters here include $\pi$ and $\lambda$, and the missing information consists of the inheritance digits $\varphi$ and $\phi$ in the current situation. We always sum over all the missing information, such that inheritance digits will only appear in intermediate steps. The joint posterior distribution of the parameters is

$$\Pr(\pi, \lambda \mid I) \propto \Pr(\pi)\Pr(\lambda)\prod_{i=1}^{N}\Pr(I_i \mid \pi, \lambda) = \Pr(\pi)\Pr(\lambda)\prod_{i=1}^{N}\sum_{\varphi_i}\Pr(I_i \mid \varphi_i, \lambda)\Pr(\varphi_i \mid \pi), \tag{12}$$

where $\Pr(\pi)$ and $\Pr(\lambda)$ are the prior distributions for the parameters of interest: Beta(1, 1) for the former and uniform for the latter. Samples are simulated from the joint posterior distribution via MCMC. In the MCMC analysis, instead of sampling all the unobservables simultaneously, we sample one unobservable at a time with the others taking values simulated in the previous cycle. When all the unobservables are updated, we have completed one cycle of the Markov chain. When the chain reaches a stationary stage, subsequent samples are considered to be drawn from the joint posterior distribution.

Starting with an initial value for each parameter, $\{\pi^{(0)}, \lambda^{(0)}\}$, we sample $\pi$ using the Metropolis-Hastings algorithm (e.g., Gelman et al. 1995). A new proposal, $\pi^*$, is sampled from a beta proposal distribution $J(\pi^* \mid \pi^{(0)}) = \text{Beta}(\pi^{(0)}N + 2,\ (1-\pi^{(0)})N + 2)$. The proposal $\pi^*$ is accepted with probability $\min\{1, a(\pi^*, \pi^{(0)})\}$, where

$$a(\pi^*, \pi^{(0)}) = \frac{\Pr(\pi^*, \lambda^{(0)} \mid I)}{\Pr(\pi^{(0)}, \lambda^{(0)} \mid I)} \cdot \frac{J(\pi^{(0)} \mid \pi^*)}{J(\pi^* \mid \pi^{(0)})}. \tag{13}$$

Note that the first term is the ratio of posterior probabilities of the parameters and the second term is the ratio of proposal probabilities. If $\pi^*$ is accepted, we take $\pi^{(1)} = \pi^*$; otherwise we do not update the effect of the SDL and simply take $\pi^{(1)} = \pi^{(0)}$. The beta proposal distribution assures that $0 \le \pi \le 1$. The simulated value of $\pi$, denoted by $\pi^{(1)}$, is then used to generate $\lambda$. We use the Metropolis algorithm (e.g., Gelman et al. 1995). First, a new value of $\lambda$ is proposed by a small perturbation from $\lambda^{(0)}$, i.e.,

$$\lambda^* = \lambda^{(0)} \pm x,$$

where $x$ is a uniform variable sampled from $U(0, d)$.
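Both single-SDL estimators of $\pi$ at a fixed location, the EM iteration of Equations 9 and 10 and the Metropolis-Hastings update of Equation 13, need only two numbers per individual: $\Pr(I_i \mid \varphi_i = 0, \lambda)$ and $\Pr(I_i \mid \varphi_i = 1, \lambda)$. The sketch below assumes these have already been computed and are passed in as the lists `p0` and `p1`; all function names are hypothetical and chosen for this illustration. The null log likelihood is evaluated directly from Equation 11 at $\pi = 1/2$, and the flat Beta(1, 1) prior cancels from the posterior ratio.

```python
import math
import random


def log_lik(pi, p0, p1):
    """Equation 11: log Pr(I | pi, lambda) with the SDL digit summed out."""
    return sum(math.log(a * pi + b * (1.0 - pi)) for a, b in zip(p0, p1))


def em_pi(p0, p1, pi=0.5, tol=1e-10, max_iter=1000):
    """Iterate the E-step (Equation 9) and M-step (Equation 10) at fixed lambda."""
    for _ in range(max_iter):
        # E-step: posterior probability that individual i is homozygous (digit 0).
        post0 = [a * pi / (a * pi + b * (1.0 - pi)) for a, b in zip(p0, p1)]
        # M-step: the new pi is the mean of 1 - E[varphi_i].
        new_pi = sum(post0) / len(post0)
        if abs(new_pi - pi) < tol:
            return new_pi
        pi = new_pi
    return pi


def lr_statistic(p0, p1):
    """Likelihood-ratio statistic for H0: pi = 1/2 at this location."""
    pi_hat = em_pi(p0, p1)
    return -2.0 * (log_lik(0.5, p0, p1) - log_lik(pi_hat, p0, p1))


def log_beta_pdf(x, a, b):
    """Log density of Beta(a, b) at x, for 0 < x < 1."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def mh_update_pi(pi, p0, p1, rng):
    """One Metropolis-Hastings update of pi with the beta proposal of Equation 13."""
    n = len(p0)
    a_fwd, b_fwd = pi * n + 2.0, (1.0 - pi) * n + 2.0
    proposal = rng.betavariate(a_fwd, b_fwd)
    if not 0.0 < proposal < 1.0:  # guard against numerical boundary values
        return pi
    a_rev, b_rev = proposal * n + 2.0, (1.0 - proposal) * n + 2.0
    log_a = (log_lik(proposal, p0, p1) - log_lik(pi, p0, p1)          # posterior ratio
             + log_beta_pdf(pi, a_rev, b_rev)                          # J(pi | proposal)
             - log_beta_pdf(proposal, a_fwd, b_fwd))                   # J(proposal | pi)
    if rng.random() < math.exp(min(0.0, log_a)):
        return proposal
    return pi
```

Repeating `lr_statistic` over a grid of candidate locations, with `p0` and `p1` recomputed for each location, would give the likelihood-ratio profile used to locate the SDL. Working on the log scale in `mh_update_pi` avoids underflow when N is large.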

Here $d$ [...] of the linkage group. We accept this proposal with probability $\min\{1, a(\lambda^*, \lambda^{(0)})\}$, where

$$a(\lambda^*, \lambda^{(0)}) = \frac{\Pr(\lambda^*, \pi^{(1)} \mid I)}{\Pr(\lambda^{(0)}, \pi^{(1)} \mid I)}. \tag{14}$$

If $\lambda^*$ is accepted, we take $\lambda^{(1)} = \lambda^*$; otherwise $\lambda^{(1)} = \lambda^{(0)}$.

Multiple-SDL model: Consider the joint action of $L$ SDL located on the chromosome of interest. Define the locations of these SDL by $\lambda = \{\lambda_l\}$ for $l = 1, \ldots, L$, in contrast to the single-SDL model where $\lambda$ is a scalar. Also define the marginal effects of the SDL by $\pi = \{\pi_l\}$ for $l = 1, \ldots, L$. Assume that these SDL act multiplicatively; then the joint effect of all the SDL can be formulated as a product of these marginal effects. Define $\varphi_i = [\varphi_{i1}, \ldots, \varphi_{iL}]$ and $\phi_i = [\phi_{i1}, \ldots, \phi_{iM}]$ as vectors of inheritance digits of all SDL and marker loci, respectively, for the $i$th individual. Using Bayes' theorem, the joint posterior distribution of $\varphi_i$ can be formulated as

$$\Pr(\varphi_i \mid \pi, \lambda) = \frac{\left(\prod_{l=1}^{L-1}\Pr(\varphi_{i(l+1)} \mid \varphi_{il}, \lambda)\right)\prod_{l=1}^{L}\Pr(\varphi_{il} \mid \pi_l)}{\sum_{\varphi_i}\left(\prod_{l=1}^{L-1}\Pr(\varphi_{i(l+1)} \mid \varphi_{il}, \lambda)\right)\prod_{l=1}^{L}\Pr(\varphi_{il} \mid \pi_l)}. \tag{15}$$

The joint posterior distribution of the parameters is

$$\Pr(\pi, \lambda \mid I) \propto \Pr(\pi)\Pr(\lambda)\prod_{i=1}^{N}\sum_{\varphi_i}\left(\Pr(I_i \mid \varphi_i, \lambda)\Pr(\varphi_i \mid \pi, \lambda)\right), \tag{16}$$

where $\Pr(\pi) = \prod_{l=1}^{L}\Pr(\pi_l)$, $\Pr(\lambda) = \prod_{l=1}^{L}\Pr(\lambda_l)$, and

$$\Pr(I_i \mid \varphi_i, \lambda) = \frac{\Pr(I_i, \varphi_i \mid \lambda)}{\Pr(\varphi_i \mid \lambda)} = \frac{\sum_{\phi_i}\left(\Pr(\phi_i, \varphi_i \mid \lambda)\prod_{j}\Pr(I_{ij} \mid \phi_{ij})\right)}{\Pr(\varphi_i \mid \lambda)}. \tag{17}$$

Under the multiple-SDL model, formulation of an EM algorithm seems impossible. On the other hand, the Bayesian method requires little modification: instead of updating the effect and location of a single locus at a time, $\lambda$ and $\pi$ are updated iteratively for all loci.

With the Bayesian approach, the number of SDL ($L$) can be treated as an unknown variable. This involves a change in the dimension of the model. Reversible jump MCMC (Green 1995; Satagopan and Yandell 1996; Heath 1997; Richardson and Green 1997; Sillanpää and Arjas 1998; Stephens and Fisch 1998) is an extension to the Metropolis-Hastings sampler, permitting moves to be made between models with different dimensions. The joint posterior distribution of the parameters is

$$\Pr(\pi, \lambda, L \mid I) \propto \Pr(\pi \mid L)\Pr(\lambda \mid L)\Pr(L) \times \prod_{i=1}^{N}\sum_{\varphi_i}\left(\Pr(I_i \mid \varphi_i, \lambda, L)\Pr(\varphi_i \mid \pi, \lambda, L)\right), \tag{18}$$

where $\Pr(L)$ is the prior probability of the number of SDL, a Poisson distribution truncated at $L_{\max}$. After each existing SDL has been updated, we propose two types of move to update $L$: adding a locus if $L < L_{\max}$ (with probability $p_a$) and deleting a locus if $L > 0$ (with probability $p_d$).

For adding an SDL, a new location $\lambda_{L+1}$ and effect $\pi_{L+1}$ are sampled from their uniform priors for the new SDL. The new sets of parameters are $\pi^* = (\pi^{(0)}, \pi_{L+1})$ and $\lambda^* = (\lambda^{(0)}, \lambda_{L+1})$. We then accept this new SDL with probability $\min\{1, a(L+1, L)\}$, where

$$a(L+1, L) = \frac{\Pr(I \mid \pi^*, \lambda^*, L+1)}{\Pr(I \mid \pi^{(0)}, \lambda^{(0)}, L)} \cdot \frac{1}{L+1} \cdot \frac{p_d}{p_a}. \tag{19}$$

If the new SDL is accepted, its location and effect are accepted simultaneously; otherwise, the number of SDL remains the same. In the deleting step, a random SDL is proposed to be deleted. Then the SDL are renumbered such that the candidate SDL is the last SDL, i.e., the $L$th SDL. The new parameter sets will be $\pi^* = (\pi_1^{(0)}, \ldots, \pi_{L-1}^{(0)})$ and $\lambda^* = (\lambda_1^{(0)}, \ldots, \lambda_{L-1}^{(0)})$. The proposal is accepted with probability $\min\{1, a(L-1, L)\}$, where

$$a(L-1, L) = \frac{\Pr(I \mid \pi^*, \lambda^*, L-1)}{\Pr(I \mid \pi^{(0)}, \lambda^{(0)}, L)} \cdot \frac{L}{1} \cdot \frac{p_a}{p_d}. \tag{20}$$

Note that we handle SDL within the same marker interval in exactly the same way as SDL in different intervals and that (20) is just the inverse of (19). Our interpretation of the terms $(L+1)^{-1}$ and $L$ in (19) and (20), respectively, differs from the usual. Usually, these terms are included to account for a perceived imbalance in the number of loci selected for a delete step vs. that selected for an addition step if the order of loci is not fixed. We believe that the balance is one to one in both the addition and deletion steps and no balancing is necessary; we include these terms because of the Poisson prior. The difference to the usual algorithm, however, is just a minor modification of the prior distribution and thus irrelevant in most biological applications.

APPLICATIONS

To illustrate the method, a simulation study and an analysis of a data set from one cross of two Scots pine (Pinus sylvestris) trees are presented. The simulation study conforms to an inbred line BC situation. In the pine data analysis, we concentrate on the maternal part of the progeny of a single tree, i.e., a pseudobackcross design. In a backcross it is not possible to distinguish between gametic selection and viability selection after fertilization.

Simulations: In the simulation study, first, a single viability locus that eliminates 50% of the progeny of the heterozygous genotype, i.e., $\pi = 2/3$, was placed in the middle of a chromosome of length 1 M; six markers were spaced at regular intervals of 0.2 M along the chromosome; no missing data were considered.
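A data set of this kind can be generated by walking along the chromosome as a two-state Markov chain and then applying viability selection at the SDL. The sketch below is one possible way to do this and is not taken from the original study; the function name, the seed, and the rejection-sampling scheme (simulate zygotes, discard half of the SDL heterozygotes, repeat until n survivors remain) are assumptions made for illustration. With a heterozygote survival probability of 0.5, the homozygote frequency among survivors at the SDL is 1/(1 + 0.5) = 2/3.

```python
import math
import random


def haldane_r(distance):
    """Recombination fraction for a map distance in Morgans (no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * abs(distance)))


def simulate_backcross(n, marker_pos, sdl_pos, survive_het, rng):
    """Simulate marker digits (0 = homozygous, 1 = heterozygous) for n survivors.

    Heterozygotes at the SDL survive with probability `survive_het`;
    homozygotes always survive.
    """
    # Loci in map order; the SDL is tagged with index None.
    loci = sorted([(p, j) for j, p in enumerate(marker_pos)] + [(sdl_pos, None)],
                  key=lambda locus: locus[0])
    survivors = []
    while len(survivors) < n:
        digits = [None] * len(marker_pos)
        sdl_digit = None
        prev_pos, prev_digit = None, None
        for pos, j in loci:
            if prev_digit is None:
                digit = rng.randint(0, 1)          # Mendelian 1:1 at the first locus
            elif rng.random() < haldane_r(pos - prev_pos):
                digit = 1 - prev_digit             # recombination switches the digit
            else:
                digit = prev_digit
            if j is None:
                sdl_digit = digit
            else:
                digits[j] = digit
            prev_pos, prev_digit = pos, digit
        # Viability selection acts on the SDL genotype only.
        if sdl_digit == 1 and rng.random() >= survive_het:
            continue
        survivors.append(digits)
    return survivors


rng = random.Random(2000)  # arbitrary seed, for reproducibility only
markers = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
data = simulate_backcross(500, markers, 0.5, 0.5, rng)
# Observed homozygote frequency at each marker; distortion should peak near 0.5 M.
print([sum(1 - row[j] for row in data) / len(data) for j in range(len(markers))])
```

Missing data could be mimicked by replacing a random subset of the returned digits with `None` before analysis.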

In the two-SDL case, two SDL [...] the single-SDL situation were placed at locations 0.33 M and 0.67 M, respectively. In both cases, simulations with sample sizes of 500 were repeated five times and results were compared; additionally, simulations with sample sizes of 100 or fewer were also performed. Compared to empirical reports of distortions of marker loci from Mendelian ratios, the simulated effect is high but not unrealistic. The marker map is rather dense and fully informative.

The outcomes of the analyses of the five simulated data sets were almost identical, so we present only one of them. In the maximum-likelihood (ML) analysis, the number of SDL was fixed to one. The inferred effect and the likelihood-ratio statistic $\Lambda$ are reported at each location. We also performed an MCMC analysis of the same data. From Figure 1A, we see that the position and effect of the SDL are estimated quite accurately. For the other four simulations, the inferred positions were also mostly between the two middle markers and the estimated effects were close to the true value. Reducing sample sizes did not appreciably change the estimate of location or effect. The likelihood-ratio statistic, however, dropped considerably (results not shown). We do not present the ML results with two SDL, because the model is not appropriate.

With the Bayesian MCMC analysis, the Poisson prior mean was set to $\mu = 1$ and the maximum number of SDL was set to three. The chain length was $10^5$. The chain was thinned by storing only every 10th cycle. No burn-in period was discarded because the chain reached approximate stationarity very quickly. The posterior probability of the simulated number of SDL (i.e., one or two, respectively) was always between 0.6 and 0.9. In the one-SDL case, frequencies are higher at the center, i.e., close to the simulated position (Figure 1B). Effects are very similar to those estimated with the ML method. In the two-SDL case, posterior distributions of both the locations and the effects are about correct (Figure 1C). It can be easily discerned from the posterior distribution of frequencies that there are actually two SDL present. When the number of individuals was reduced, the posterior probability of the different numbers of SDL approached that of the prior distribution rapidly (data not shown). This corresponds to the decrease in the likelihood-ratio statistic with decreasing sample size.

Figure 1. Simulated data. (A and B) A simulation with one SDL; (C) a simulation with two SDL. The scale on the x-axis is 1 M; the positions of the markers are indicated with an "×," while the positions of the SDL are indicated with a circle. "Likelihood" refers to the broken line and to twice the log-likelihood ratio; "frequency" to the posterior probability of an SDL in an interval of 0.04 the length of the linkage group; and "effect" to the solid line and to the probability of finding the homozygote genotype in the BC.

Pine data: In the second application, data consisted of the megagametophytes of open-pollinated offspring of a single Scots pine (P. sylvestris) tree, P304 (Hurme and Savolainen 1999). Megagametophytes are haploid tissues consisting of the maternal part of the seedling's genome and can be scored at the seedling stage without damaging the seedling. We treated the progeny of this tree as a pseudobackcross family. Map distances and linkage phases were determined with Mapmaker as described in Hurme and Savolainen (1999). Five RAPD [...] C02-680, G13-750, K09-750, E09-250, and AC15-270 at positions 0.038 M, 0.115 M, 0.287 M, 0.461 M, and 0.478 M, respectively. As determined from other crosses, the map length of the whole linkage group was ~0.85 M. The sample size was 73 individuals, and in many individuals some markers were scored as missing.

With the ML analysis, the log-likelihood ratio statistic was appreciable only close to the marker G13-750 (Figure 2A) [...]
of the heterozygous genotype of ~0.2 over the Mendelian value of 0.5. For the Bayesian MCMC analysis, the prior distribution was the same as for the simulation study. The posterior probabilities of zero, one, two, and three SDL were 0.01, 0.15, 0.61, and 0.23, respectively. This result is, however, quite sensitive to the prior distribution of SDL number. We report the posterior distribution [...] were inferred, most often the location and effect of one of the SDL were similar to the single-SDL case, while the other counteracted its effect at the other end of the linkage group (Figure 2C).

Figure 2. Pine data. The notation is the same as in Figure 1. The ML result is presented in A, and the posterior distribution of the single-SDL case is in B and of the two-SDL cases in C. The marker loci are (from left to right) C02-680, G13-750, K09-750, E09-250, and AC15-270.

DISCUSSION

Herein, a method for mapping SDL in a backcross is presented. The method makes efficient use of a map of partially or fully informative marker loci by using the multipoint method (Lander and Green 1987; Jiang and Zeng 1997). A maximum-likelihood analysis via an EM algorithm as well as a Markov chain Monte Carlo Bayesian analysis using a reversible jump algorithm for varying the number of loci is presented in detail. Given a dense marker map, the method can be used for precision analysis of positions and effects of the SDL. The best previously available methods (Fu and Ritland 1994b; Mitchell-Olds 1995; Cheng et al. 1996) rely on fully informative markers flanking the putative SDL and assume just one SDL per chromosome.

With our approach, it is possible to efficiently analyze the number, positions, and effects of SDL in organisms for which a high-resolution marker map has been developed and where inbred line crosses can be performed easily. Analysis can be extended easily to a general full-sib family or to the selfing of an outcrossing individual: the dimension changes from two to four, binomial distributions change to multinomial distributions, and the transition probabilities between adjacent loci change. Marker information now contributes to the full or partial identification of four combinations of genotypic configurations. As with the BC case, partial marker information can be defined as the union of compatible cases. All the above changes are rather trivial consequences of the change in dimension but complicate presentation substantially. Additionally, the missing phase information needs to be considered. Furthermore, the multipoint algorithm becomes more important for the full-sib design.

Presently, our method for the backcross can only be used to analyze the SDL currently segregating in the two lines, not those that have been segregating in the ancestral population from which the inbred lines derived. Segregation distortion might have already affected the inbreeding process for creation of the lines. Extrapolation from the current to the ancestral situation is therefore problematic. This problem is even more pressing for recombinant inbred lines, where overrepresentation of chromosomal fragments of one or the other parent is commonly observed (e.g., Lister and Dean 1993) and requires a more elaborate approach.

(8)

which restricts the achievable combinations of genotypic proportions. On the other hand, SDL acting after fertilization may alter genotypic proportions directly. Thus, many more combinations of genotypic proportions are possible for SDL acting after fertilization. In experimental crosses more complex than the backcross design, inferred genotypic proportions of an SDL may thus render unlikely prefertilization mechanisms of segregation distortion. Two or more SDL acting before fertilization may, however, mimic the effect of SDL acting after fertilization because of the increase in combinatorial possibilities.

In hybrids of species or subspecies, segregation distortion commonly occurs (see, e.g., Whitkus 1998 and references therein). This may be caused by structural rearrangements, e.g., inversions, which constitute a prefertilization mechanism. Alternatively, the segregation distortion may be caused by postfertilization differences in viability between genotypic configurations, most probably caused by epistatic interactions. Our method can be used to detect chromosomal areas that are causing these distortions. But because of the presumed epistasis, relaxation of the assumption of a multiplicative effect of different SDL may be necessary.

Our method may also be used to map loci influencing early viability. This would enhance our understanding of the nature of early inbreeding depression. The method provides another approach for estimating the number and effects of loci causing inbreeding depression. Traditionally, such information has been derived mainly from biometric analysis of crosses (e.g., Dudash and Carr 1998). But as inbreeding depression can be expressed in embryonic life stages not amenable to biometric analysis, application of this method is limited. To gain insight into these early life stages, sparse maps and single-marker methods have been used to infer the effect of a viability locus influencing inbreeding depression (Sorensen 1967; Servitová and Cetl 1984; Hedrick and Muona 1990; Fu and Ritland 1994a; Kärkkäinen et al. 1999). With single-marker analysis, estimation of position and effect of the SDL is, however, confounded and multiple SDL on a single linkage group cannot be handled at all. Interval methods (Fu and Ritland 1994b; Mitchell-Olds 1995; Cheng et al. 1996) rely on fully informative markers flanking the putative SDL and assume just one SDL per chromosome. Dense linkage maps of fully informative markers may be hard to obtain in closely related individuals that need to be considered in the analysis of inbreeding depression. Like the interval methods, our method requires a dense linkage map of polymorphic markers but is not restricted to fully informative markers; instead it can make efficient use of, e.g., dominant markers.

Only rarely have data sets been gathered for mapping segregation distortion or viability selection (see, however, Harushima et al. 1996 and Kuang et al. 1998).

to increase differences between parents and thus the power of mapping. Probably for this reason, markers with segregation ratio distortions are commonly observed in data sets used for QTL mapping resulting from wide crosses (e.g., van Ooijen et al. 1994). Segregation ratio distortion is also commonly observed in doubled haploid lines (e.g., Fulton et al. 1997).

Usually generation of a linkage map of marker loci precedes QTL analysis. If a dense map of informative markers is inferred correctly, the bias introduced by segregation distortion into QTL analysis will be negligible. But if recombination fractions or, worse, order of marker loci are inferred incorrectly, basic assumptions of QTL analysis do not hold and results will be imprecise at best. Hence, aside from being interesting in themselves, SDL cause practical problems in QTL projects as observed, e.g., by Sandbrink et al. (1995). Thus, segregation distortion should be accounted for in mapping projects.

Segregation distortion is known to bias estimation of recombination fractions in two-point inference of recombination distances between markers (Lorieux et al. 1995a,b; Liu 1998). If markers are fully informative, estimation of the recombination fraction of only the markers flanking the SDL will be affected. Only in the unlikely case of coincidence of SDL and marker location will no bias be observed. If less than fully informative markers are used, the effects of the distortion are spread out to the smallest interval of fully informative markers flanking the distorted region. As a remedy, markers that show obvious segregation distortion are often excluded from the map. But that reduces coverage of the genome and qualitative or quantitative trait loci might be missed.

Our method can be extended to allow for detection of SDL concurrently with estimation of a linkage map. Cheng et al. (1996) have already developed an EM algorithm to infer positions of two fully informative markers in the presence of a single SDL (an interval method) in a backcross or doubled haploid lines. This could be extended to a multipoint inference of a marker map in the presence of SDL by augmenting the EM or MCMC schemes presented herein by allowing the markers to change their positions relative to each other.

The source code for a C++ program and executables for a Sun workstation, with which the above calculations can be performed, are available from Claus Vogl (claus@genetics.ucr.edu).

We thank Päivi Hurme and Outi Savolainen for the data set and Elja Arjas, Anita de Haan, Mikko Sillanpää, and Nengjun Yi for discussion of this and related issues. Outi Savolainen, Elja Arjas, and Lori Weingartner have commented on earlier versions of this manuscript. We thank Zhao-Bang Zeng and two anonymous reviewers for their patient work, which helped to improve this article a lot. This work was supported by grants from the Environment and Natural Resources Research Council and the Medical Research Council to Outi Savolainen and by the National Institutes of Health Grant GM-55321 and the U.S. Department of Agriculture National Research Initiative


LITERATURE CITED

Charlesworth, B., and D. Charlesworth, 1987 Inbreeding depression and its evolutionary consequences. Annu. Rev. Ecol. Syst. 18: 237–268.

Charlesworth, B., and D. Charlesworth, 1998 Some evolutionary consequences of deleterious mutations. Genetica 102/103: 3–19.

Cheng, R., A. Saito and Y. Ukai, 1996 Estimation of the position and effect of a lethal factor locus on a molecular marker linkage map. Theor. Appl. Genet. 93: 494–502.

Dudash, M. W., and D. E. Carr, 1998 Genetics underlying inbreeding depression in Mimulus with contrasting mating systems. Nature 393: 682–684.

Fu, Y.-B., and K. Ritland, 1994a Evidence for the partial dominance of viability genes contributing to inbreeding depression in Mimulus guttatus. Genetics 136: 323–331.

Fu, Y.-B., and K. Ritland, 1994b On estimating the linkage of marker genes to viability genes controlling inbreeding depression. Theor. Appl. Genet. 88: 925–932.

Fulton, T.-M., J. C. Nelson and S. D. Tanksley, 1997 Introgression and DNA marker analysis of Lycopersicon peruvianum, a wild relative of the cultivated tomato, into Lycopersicon esculentum, followed through three successive backcross generations. Theor. Appl. Genet. 95: 895–902.

Gelman, A., J. B. Carlin, H. S. Stern and D. B. Rubin, 1995 Bayesian Data Analysis. Chapman and Hall, London.

Grant, V., 1975 Genetics of Flowering Plants. Columbia University Press, New York.

Green, P. J., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732.

Hartl, D. L., and A. G. Clark, 1997 Principles of Population Genetics, Ed. 3. Sinauer, Sunderland, MA.

Harushima, Y., N. Kurata, M. Yano, Y. Nagamura, T. Sasaki et al., 1996 Detection of segregation distortions in an indica-japonica rice cross using a high-resolution molecular map. Theor. Appl. Genet. 92: 145–150.

Heath, S. C., 1997 Markov-chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61: 748–760.

Hedrick, P. W., 1994 Purging inbreeding depression and the probability of extinction: full-sib families. Heredity 73: 363–372.

Hedrick, P. W., and O. Muona, 1990 Linkage of viability genes to marker loci in selfing organisms. Heredity 64: 67–72.

Hurme, P., and O. Savolainen, 1999 Comparison of homology and linkage of RAPD markers between individual trees of Scots pine (Pinus sylvestris L.). Mol. Ecol. 8: 15–22.

Husband, B. C., and D. W. Schemske, 1996 Evolution of magnitude and timing of inbreeding depression in plants. Evolution 50: 554–570.

Jansen, R. C., and P. Stam, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447–1455.

Jiang, J., and Z.-B. Zeng, 1997 Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101: 47–58.

Kärkkäinen, K., V. Koski and O. Savolainen, 1996 Geographical variation in inbreeding depression in Scots pine. Evolution 50: 111–119.

Kärkkäinen, K., H. Kuittinen, R. van Treuren, C. Vogl and O. Savolainen, 1999 Genetic basis of inbreeding depression in Arabis petraea. Evolution 53: 1354–1365.

Kuang et al., 1998 An allele responsible for seedling death in Pinus radiata D. Don. Theor. Appl. Genet. 96: 640–644.

Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.

Lander, E. S., and P. Green, 1987 Construction of multilocus genetic maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363–2367.

Launey, S., and D. Hedgecock, 1999 Genetic load causes segregation ratio distortion in oysters: mapping at 6 hours. Plant and Animal Genome VII, abstracts W14, p. 33.

Lister, C., and C. Dean, 1993 Recombinant inbred lines for mapping RFLP and phenotypic markers in Arabidopsis thaliana. Plant J. 4: 745–750.

Liu, B. H., 1998 Statistical Genomics: Linkage, Mapping, and QTL Analysis. CRC Press, Boca Raton, FL.

Lorieux, M., B. Goffinet, X. Perrier, D. González de León and C. Lanaud, 1995a Maximum likelihood models for mapping genetic markers showing segregation distortion. 1. Backcross populations. Theor. Appl. Genet. 90: 73–80.

Lorieux, M., X. Perrier, B. Goffinet, C. Lanaud and D. González de León, 1995b Maximum likelihood models for mapping genetic markers showing segregation distortion. 2. F2 populations. Theor. Appl. Genet. 90: 81–89.

McColdrick, D. J., and D. Hedgecock, 1997 Fixation, segregation and linkage of allozyme loci in inbred families of the Pacific oyster Crassostrea gigas (Thunberg): implications for the causes of inbreeding depression. Genetics 146: 321–334.

Mitchell-Olds, T., 1995 Interval mapping of viability loci causing heterosis in Arabidopsis. Genetics 140: 1105–1109.

Richardson, S., and P. J. Green, 1997 On Bayesian analysis of mixtures with an unknown number of components. J. R. Stat. Soc. B 59: 731–792.

Sandbrink, J. M., J. W. van Ooijen, C. C. Purimahua, M. Vrielink, R. Verkerk et al., 1995 Localization of genes for bacterial resistance in Lycopersicon peruvianum using RFLPs. Theor. Appl. Genet. 90: 444–450.

Satagopan, R. J., and B. S. Yandell, 1996 Estimating the number of quantitative trait loci via Bayesian model determination. Special Contributed Paper Session on Genetic Analysis of Quantitative Traits and Complex Diseases. Biometric Section, Statistical Meeting. Chicago, IL.

Servitová, J., and I. Cetl, 1984 The use of recessive lethal chlorophyll mutants for linkage mapping of Arabidopsis thaliana (L.) Heynh. Arabidopsis Inf. Serv. 21: 59–64.

Sillanpää, M., and E. Arjas, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373–1388.

Sorensen, F. C., 1967 Linkage between marker genes and embryonic lethal factors may cause disturbed segregation ratios. Silvae Genet. 16: 132–134.

Stephens, D. A., and R. D. Fisch, 1998 Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo. Biometrics 54: 1334–1347.

van Ooijen, J. W., J. M. Sandbrink, M. Vrielink, R. Verkerk, P. Zabel et al., 1994 An RFLP linkage map of Lycopersicon peruvianum. Theor. Appl. Genet. 89: 1007–1013.

Whitkus, R., 1998 Genetics of adaptive radiation in Hawaiian and Cook Island species of Tetramolopium (Asteraceae). II. Genetic linkage map and its implications for interspecific breeding barriers. Genetics 150: 1209–1216.

Williams, C. G., and O. Savolainen, 1996 Inbreeding depression in conifers: implications for breeding strategy. For. Sci. 42: 102–117.

Zeng, Z.-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457–1468.