Pedigree Data Analysis With Crossover Interference


Sharon Browning

1

Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695-7566 and GlaxoSmithKline, Research Triangle Park, North Carolina 27709

Manuscript received May 27, 2002
Accepted for publication March 7, 2003

ABSTRACT

We propose a new method for calculating probabilities for pedigree genetic data that incorporates crossover interference using the chi-square models. Applications include relationship inference, genetic map construction, and linkage analysis. The method is based on importance sampling of unobserved inheritance patterns conditional on the observed genotype data and takes advantage of fast algorithms for no-interference models while using reweighting to allow for interference. We show that the method is effective for arbitrarily many markers with small pedigrees.

1 Address for correspondence: GlaxoSmithKline, Five Moore Dr., MAI.A112B.1G, Research Triangle Park, NC 27709. E-mail: sharon.r.browning@gsk.com

Existing methods for likelihood-based analysis of pedigree genotype data assume absence of crossover interference, even though such interference has been well documented in humans (Weeks et al. 1993; Broman and Weber 2000; Lin et al. 2001) and other species (Zhao et al. 1995b; Broman et al. 2002). Moreover, simulation studies have shown that failure to incorporate crossover interference into linkage analyses is inefficient (Goldgar et al. 1989; Goldstein et al. 1995; Lin and Speed 1999).

The major reason that crossover interference models are not used is the lack of computationally feasible methods for working with more than a handful of linked markers. Lin and Speed (1996) present a method for exact computation of probabilities for pedigree data with the chi-square models; however, computational time increases exponentially with both the number of meioses in the pedigree and the number of markers, severely limiting its applicability. Thompson (2000a,b) has a Markov chain Monte Carlo approach for incorporating crossover interference, but it is limited to at most 12 loci per chromosome.

We propose a new method for calculating probabilities for pedigree genetic data that does incorporate crossover interference. The method is based on importance sampling, which is a Monte Carlo technique for estimating the value of an integral or sum (in this case probabilities of the data, expressed as a sum over unobserved inheritance indicators). Importance sampling involves evaluating the integrand at independently sampled realizations from a probability distribution that is roughly proportional to the integrand. Correct weighting of the sampled values gives an unbiased estimate that converges to the true value of the integral as the number of sample repetitions is increased. Because the samples are independent, the method is not plagued by the difficulties in assessing convergence encountered in use of Markov chain Monte Carlo (MCMC) methods.

The probabilities produced by this method may be used for multipoint linkage mapping, genetic map construction, and relationship inference. The method can be used with arbitrarily many linked markers for small pedigrees. For small pedigrees the method is fast enough that it could be used on a routine basis in analysis of pedigree genetic data.

BACKGROUND

It has long been known that the locations of multiple crossovers on a chromosome at meiosis are not independent, but exhibit crossover interference whereby the existence of a crossover at one location suppresses the occurrence of crossovers in nearby regions. A common approach is to use the Kosambi map function with methods that assume independence between recombination events in nonoverlapping intervals. This approach does not have any real effect, as genetic distances are used only for reporting results, while recombination fractions are used at all steps in the calculations. That is, observed recombination fractions are converted to genetic distances with the map function and are converted back to recombination fractions with the same map function for use in the analysis. As Speed (1996) points out, the only adequate way to incorporate crossover interference is with a point process model. Such an approach can account for nonindependence of recombination between intervals.

A useful family of point process models for crossover interference is the chi-square models. These are renewal models for the occurrence of crossovers on the four-strand chromatid bundle. A parameter $m$ controls the strength of interference: $m = 0$ corresponds to no interference, while $m = 4$ corresponds to approximately the

level of interference found in humans (Lin and Speed 1996). The genetic distance between crossovers on the four-strand bundle is modeled as the sum of $m + 1$ independent exponential random variables each with rate $2(m + 1)$. This sum follows a gamma, or scaled chi-square, distribution. A gamete receives only one of the four chromatids, and since each crossover involves two of the four chromatids, the probability that a certain crossover on the chromatid bundle is seen in the gamete is one-half. Assuming no chromatid interference (Zhao et al. 1995a), whether or not one crossover on the bundle is transmitted to the gamete is independent of the outcome for other crossovers. McPeek and Speed (1995) and Zhao et al. (1995b) found that the chi-square models provide an excellent fit to available data. Zhao et al. (1995b) provide a method for calculating the probability of a pattern of recombination between consecutive markers for a single meiosis, which involves on the order of $Lm^2$ calculations for $L$ marker loci. We reproduce the details of this method in the APPENDIX since our method makes use of these probabilities.

In pedigree data, the recombination patterns are generally not directly observed, due to unobserved individuals and insufficiently polymorphic markers. Thus, calculating probabilities for genotype data requires summation over possible recombination patterns. For $K$ meioses and $L$ marker loci there are $2^{K(L-1)}$ possible recombination patterns, so this computation, if performed exactly, rapidly becomes infeasible for increasing numbers of markers and meioses.

When independence (i.e., no-interference) models for recombination are used, exploitation of the independence can reduce the number of computations to the order of $LK2^K$ (Kruglyak and Lander 1998), so that large numbers of markers may be analyzed, while the size of the pedigree (measured by the number of meioses) must be restricted. McPeek and Sun (2000) describe a related approach for the chi-square models; however, the number of computations required by such an approach is proportional to $L4^{K(m+1)}$, so that for moderate values of $m$, calculation is feasible only with extremely small pedigrees. Thus in general, calculating exact probabilities for pedigree data with the chi-square or other interference models involves summing over all $2^{KL}$ recombination patterns, resulting in a computational burden that increases exponentially with both the number of markers and the number of meioses (Lin and Speed 1996). This is the approach taken by Lin and Speed (1999) in their work showing the efficiency gains obtainable using the chi-square model in gene mapping with pedigree data. They were able to work with 12 meioses (a three-generational family with four grandparents, two parents, and four children) and seven marker loci.

To reduce the computational burden while incorporating crossover interference, we propose an importance-sampling approach that takes advantage of the special algorithms for independence models, while using reweighting to allow for interference.

METHODS

We present a method for calculating probabilities for genotypic data under the chi-square models. The method is based on importance sampling of underlying unobserved inheritance patterns. We focus primarily on calculation of the likelihood, which is equal to the probability of the genotypic data under the assumed model. Likelihoods may be used to compare models, for example, to find the most likely genealogical relationship between individuals (Boehnke and Cox 1997). In addition, the methodology presented here may be used to calculate probabilities of inheritance patterns given the genotypic data, which are used in genetic map construction and multipoint linkage analysis, and we give details of those calculations at the end of this section.

Importance sampling of inheritance indicators: For each marker locus $l$ and meiosis $k$, let $X_{kl}$ be a zero-one inheritance indicator. The parent involved in the meiosis has two copies of the DNA, one maternal (0) and the other paternal (1). The indicator $X$ describes which of these two copies was transmitted to the offspring at this locus. If $X_{kl} = X_{k,l+1}$, no recombination occurred on meiosis $k$ over the interval between markers $l$ and $l + 1$, while recombination occurred if $X_{kl} \neq X_{k,l+1}$. The inheritance pattern $X$ represents the inheritance indicators over all meioses and loci.

We can write the probability of the genotype data $Y$ as a sum over inheritance patterns

$$P_C(Y) = \sum_X P_C(Y \mid X)\, P_C(X),$$

where $P_C$ denotes probabilities under the chi-square crossover model, with a given choice of parameter $m$. Since $P_C(Y \mid X) P_C(X) = P_C(X, Y) \propto P_C(X \mid Y)$, only those terms with large values of $P_C(X \mid Y)$ contribute significantly to the sum. An importance-sampling approach aims to sample with high frequency those terms with significant contribution, while sampling with low frequency the terms with negligible contribution. Ideally we would like to sample from $P_C(X \mid Y)$, but we have no way to do so directly. Instead we sample from $P_I(X \mid Y)$, where $P_I$ denotes probabilities under the independence model. The probabilities $P_I(X \mid Y)$ are sufficiently close to $P_C(X \mid Y)$ to result in a useful importance sampler.

In what follows we assume that probabilities for genotypes $Y_l$ at a marker $l$ are conditionally independent of genotypes and inheritance patterns at other markers given the inheritance pattern $X_l$ at the marker. This assumption holds if the markers are in linkage equilibrium, which will be approximately true if all the genotyped individuals come from a single homogeneous population.
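As a rough illustration of the importance-sampling idea described in this section, the following Python sketch estimates a small sum by drawing indices from a proposal distribution and averaging the ratio of summand to proposal probability. It is a toy under stated assumptions, not the paper's implementation: the names `importance_sample_sum`, `terms`, and `q_probs` and all numerical values are hypothetical. The summand `f` stands in for $P(Y \mid X) P_C(X)$ and the proposal `q` for $P_I(X \mid Y)$.

```python
import random


def importance_sample_sum(f, q, draw, n, rng):
    """Estimate sum_x f(x) as the average of f(x)/q(x) over n draws x ~ q."""
    total = 0.0
    for _ in range(n):
        x = draw(rng)
        total += f(x) / q(x)
    return total / n


# Hypothetical summands over x = 0..7 (toy stand-ins for the terms of the sum).
terms = [0.30, 0.20, 0.05, 0.01, 0.01, 0.01, 0.005, 0.005]
exact = sum(terms)

# Proposal that is only roughly proportional to the summands.
raw = [t + 0.01 for t in terms]
q_probs = [r / sum(raw) for r in raw]


def f(x):
    return terms[x]


def q(x):
    return q_probs[x]


def draw(rng):
    # random.Random.choices samples an index with the given weights.
    return rng.choices(range(len(terms)), weights=q_probs)[0]


rng = random.Random(0)
estimate = importance_sample_sum(f, q, draw, 10_000, rng)

# With a proposal exactly proportional to the summands, every ratio f/q
# equals the exact sum, so the estimator has zero variance.
ideal_probs = [t / exact for t in terms]
ideal = importance_sample_sum(
    f,
    lambda x: ideal_probs[x],
    lambda r: r.choices(range(len(terms)), weights=ideal_probs)[0],
    100,
    random.Random(1),
)
```

The closer the proposal is to proportional to the summand, the less the ratios vary. This is why a proposal such as $P_I(X \mid Y)$ that is close to $P_C(X \mid Y)$ is expected to give a useful sampler.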

A consequence of this assumption is that probabilities of genotypes $Y$ given inheritance patterns $X$ do not depend on the crossover model, so $P_C(Y \mid X) = P_I(Y \mid X) \equiv P(Y \mid X)$. Then $P(Y \mid X) = P_I(Y \mid X) = P_I(X \mid Y) P_I(Y) / P_I(X)$ and we can write

$$P_C(Y) = \sum_X P(Y \mid X)\, P_C(X) = \sum_X \frac{P_I(Y)\, P_C(X)}{P_I(X)}\, P_I(X \mid Y).$$

Thus if $X^{(1)}, X^{(2)}, \ldots, X^{(n)}$ are sampled from $P_I(X \mid Y)$, an unbiased estimate of $P_C(Y)$ is given by

$$\hat{P}_C(Y) = \frac{1}{n} \sum_{i=1}^{n} \frac{P_I(Y)\, P_C(X^{(i)})}{P_I(X^{(i)})}.$$

Thompson (2000b, Sect. 8.4) describes how to sample $X$ from $P_I(X \mid Y)$. Let $X_l = (X_{1l}, X_{2l}, \ldots, X_{Kl})$ be the inheritance vector at locus $l$. Making use of the hidden-Markov structure (the inheritance vectors $\{X_1, \ldots, X_L\}$ form a hidden-Markov model for the genotypes under the independence model), we can recursively sample $X_L$ conditional on the genotype data and then sample $X_{L-1}$ given the realized value of $X_L$ and the genotype data and so on through to $X_1$ to obtain one realization of $X$. The sampling algorithm has computational order $L4^K$, which is the same as for the original algorithm of Lander and Green (1987) for the independence model. We give further details in the APPENDIX.

Improved performance through resampling: This method is an example of sequential importance sampling (Liu 2001, Sect. 2.6.3), because the inheritance patterns $X$ are sampled sequentially along each chromosome. Improvement, in terms of reducing the standard error of the estimate for a fixed number of iterations $n$, may be obtained by resampling (Liu 2001, Sect. 3.4.4). The idea is to concurrently sample a number of inheritance patterns $X$. After sampling the inheritance indicators at a couple of loci, write $X_p^{(i)}$ for the partial inheritance pattern in realization $i$. We then calculate the weight $P_C(X_p^{(i)}) / P_I(X_p^{(i)})$ for each $i$. If this weight is small for a particular $i$, the sample is not worth continuing, and it is replaced by a copy of one of the samples with a larger weight. Sampling continues for further loci (punctuated by further resampling), and samples that had been duplicated tend to differ in the values sampled at the remaining loci. Thus the samples are correlated, but the final weights $P_C(X^{(i)}) / P_I(X^{(i)})$ tend to be less variable than they would be without resampling, which reduces the standard error of the estimate $\hat{P}_C(Y)$.

We now give more details of the resampling algorithm, in which we apply the method of residual resampling described in Sect. 3.4.4 of Liu (2001). First we set a batch size $B$ and a resampling interval $T$ (choice of these values is discussed below). We start by sampling $X_L^{(i)}, X_{L-1}^{(i)}, \ldots, X_{L-T+1}^{(i)}$ for $i = 1, 2, \ldots, B$, perform resampling as described below, and then sample $X_{L-T}^{(i)}, X_{L-T-1}^{(i)}, \ldots, X_{L-2T+1}^{(i)}$, perform resampling, and continue in this fashion until reaching $X_1$ at the end of the chromosome. As noted in Liu (2001), it is best not to resample after sampling $X_1$, as doing so would only add noise.

Write $X_{(j)}^{(i)}$ for the $i$th partial sample at the $j$th resampling. For example, $X_{(1)}^{(i)} = \{X_L^{(i)}, X_{L-1}^{(i)}, \ldots, X_{L-T+1}^{(i)}\}$ and $X_{(2)}^{(i)} = \{X_L^{(i)}, X_{L-1}^{(i)}, \ldots, X_{L-2T+1}^{(i)}\}$. The resampling weight for this sample is

$$w_j^{(i)} = w_{j-1}^{(i)}\, \frac{P_C(X_{(j)}^{(i)}) / P_C(X_{(j-1)}^{(i)})}{P_I(X_{(j)}^{(i)}) / P_I(X_{(j-1)}^{(i)})}$$

for $j > 1$, while at the first resampling

$$w_1^{(i)} = \frac{P_C(X_{(1)}^{(i)})}{P_I(X_{(1)}^{(i)})}.$$

At the $j$th resampling, let $W_j = \sum_{i=1}^{B} w_j^{(i)}$. Start by retaining $k_i = [B w_j^{(i)} / W_j]$ copies of $X_{(j)}^{(i)}$, where $[\ ]$ is the floor function giving the largest integer less than or equal to its argument. The number of elements remaining to be resampled is $r = B - \sum_{i=1}^{B} k_i$. Add to the set of retained inheritance patterns $r$ independent random draws from the $X_{(j)}^{(i)}$ with probabilities proportional to $B w_j^{(i)} / W_j - k_i$, $i = 1, 2, \ldots, B$. Now reset the weights $w_j^{(i)}$ to $W_j / B$ for $i = 1, 2, \ldots, B$ and continue with the sequential sampling of the inheritance patterns.

Let $J = [(L - 1)/T]$ be the total number of resampling points. After the samples are completed, the weight of the $i$th sample $X^{(i)}$ is

$$w^{(i)} = w_J^{(i)}\, \frac{P_C(X^{(i)}) / P_C(X_{(J)}^{(i)})}{P_I(X^{(i)}) / P_I(X_{(J)}^{(i)})}.$$

An unbiased estimate of $P_C(Y)$ is

$$\hat{P}_C(Y) = \frac{1}{n} \sum_{i=1}^{n} w^{(i)}\, P_I(Y).$$

The values obtained from a single batch are correlated and cannot be used to estimate the standard error of $\hat{P}_C(Y)$. To estimate the standard error we divide our total number of iterations into batches of size $B$. Choice of $B$ has negligible effect on computing time. Small choices of $B$ are less efficient (i.e., give larger standard error) because resampling is less effective with a smaller pool. Thus it is best to choose $B$ to be reasonably large, subject to computer memory limitations and to having a sufficient number of batches to give a good estimate of standard error (say at least 20 batches). We used $B = 100$ in all the examples presented in RESULTS.

A couple of issues are involved in the best choice of resampling interval $T$. First, resampling involves computational time, and thus frequent resampling can be computationally expensive. The computational time for resampling is essentially independent of the pedigree size; thus the expense of resampling is most notable in computations on small pedigrees and is negligible relative to total computation time for larger pedigrees (such as

the eight-meiosis pedigree for four full-sibs). Second, as the resampling interval increases, the standard error of the estimates tends to increase. This increase in standard error is particularly evident in larger pedigrees, which are also the pedigrees for which resampling is most beneficial. Thus small values of $T$ (say $1 \le T \le 5$) are best for large pedigrees, and large values of $T$ (say $T \ge 5$) are best for small pedigrees. We chose to use $T = 5$ for simplicity throughout the examples presented in RESULTS.

We note that it is not necessary (and is very inefficient) to calculate $P_C(X_{(j)}^{(i)})$ and $P_I(X_{(j)}^{(i)})$ from scratch at each resampling point. For the independence sampler, $P_I(X_{(j)}^{(i)})$ equals $P_I(X_{(j-1)}^{(i)})$ multiplied by the probabilities of the recombination pattern in $X_{L-(j-1)T+1}^{(i)}, X_{L-(j-1)T}^{(i)}, \ldots, X_{L-jT+1}^{(i)}$. For the chi-square sampler it is necessary to save an $m$-dimensional vector of partial probabilities from calculation of $P_C(X_{(j-1)}^{(i)})$ for use in calculation of $P_C(X_{(j)}^{(i)})$.

Our implementation of the algorithm described above is available from the author on request.

Probabilities for linkage analysis and map construction: Linkage analysis and map construction are based on probabilities of inheritance indicators $X_l$ at a locus $l$ given the genotype information (Kruglyak et al. 1996) or the joint probabilities of inheritance indicators $X_l$ and $X_{l+1}$ at two neighboring loci given the genotype information (Lander and Green 1987). Consider estimating

$$P_C(X_l = x_l \mid Y) = \sum_X P(x_l \mid X, Y)\, P_C(X \mid Y) \propto \sum_X P(x_l \mid X)\, \frac{P_C(X)}{P_I(X)}\, P_I(X \mid Y),$$

where $P(x_l \mid X, Y) = P(x_l \mid X)$ is one if $X_l = x_l$ and zero otherwise. An unbiased estimate of this probability is proportional to

$$\frac{1}{n} \sum_{i=1}^{n} P(x_l \mid X^{(i)})\, \frac{P_C(X^{(i)})}{P_I(X^{(i)})},$$

where $X^{(1)}, \ldots, X^{(n)}$ are independent realizations from $P_I(X \mid Y)$ and the constant of proportionality may be found by summing over all possible values of $x_l$. Probabilities $P_C(X_l = x_l, X_{l+1} = x_{l+1} \mid Y)$ may be estimated similarly. Thus these probabilities may be estimated under interference models using the machinery presented here and then used in linkage mapping or genetic map construction.

RESULTS

We present results from analysis of simulated data, to demonstrate the capabilities of the proposed approach. Our simulated data were designed to represent human chromosome 1, with an estimated genetic length of 2.9 morgans (Broman et al. 1998). The other human chromosomes are shorter, and since results (not shown) indicate that standard errors tend to be lower with shorter chromosomes because of smaller numbers of crossovers, chromosome 1 represents the "worst case." We simulated sets of data with marker spacing of 1 cM (291 markers) and 10 cM (29 markers), with single-nucleotide polymorphisms (SNPs; allele frequencies 0.7 and 0.3), and with microsatellites (seven alleles with frequencies 0.4, 0.2, 0.2, 0.05, 0.05, 0.05, 0.05, and heterozygosity 0.75). Calculation was performed under the chi-square model with $m = 4$ [giving $\hat{P}_C(Y)$] and, for comparison, under the independence model with the Kosambi map function [giving $P_I(Y)$]. We look at three relationships: two half-sibs (two meioses), an aunt-niece pair (five meioses), and four full-siblings (eight meioses). The data were simulated under the chi-square model with $m = 4$, although this choice has little impact on the results. Table 1 shows results and computing times for one simulated chromosome for each map/relationship combination.

From the results in Table 1, we see that the type of marker has little impact on the computing time or standard errors, but does affect the magnitude of the likelihoods. Marker spacing affects computing time, but has little impact on standard errors. The most computing-intensive part of the algorithm is sampling of the inheritance patterns from the distribution $P_I(X \mid Y)$, which is of computational order $L2^K$, so that computing time doubles for each additional meiosis. Hence, with the current implementation, eight meioses is about the maximum that one would want to work with: $10^5$ iterations for chromosome 1 with a 1-cM map took ~1 hr (fewer iterations and hence shorter computing time would suffice in some cases). Standard errors also increase with the number of meioses, so one would generally want to increase the number of iterations as pedigree size increases.

In considering whether an estimate of $\ln P_C(Y)$ is sufficiently precise, one minimal requirement is that the standard error be less than the difference between $\ln P_C(Y)$ and $\ln P_I(Y)$, where $\ln P_I(Y)$ is the natural log of the probability of the data under the independence model using the Kosambi map function. If this requirement is not satisfied, then the exact answer obtained under the independence model is a better estimate of $P_C(Y)$ than that from the importance sampler. We note that the requirement is satisfied in all but 1 of the 12 examples looked at, and in most of the examples a much smaller number of iterations would have sufficed to meet this minimal requirement (in fact, <100 iterations, at approximately one-thousandth of the computing time shown, would have sufficed in 7 out of 12 of the examples). Thus $10^5$ iterations are more than enough (based on this criterion) for this chromosome length, choice of model parameter $m$, and pedigree size, but more iterations may be required for longer chromosomes, larger pedigrees, or different values of $m$.
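The minimal precision requirement can be checked mechanically once batch estimates are available. The Python sketch below is illustrative only. It assumes that each batch yields one estimate of $P_C(Y)$, that the batches are independent, and that the standard error on the log scale is approximated by the first-order Taylor expansion used in the note to Table 1 (standard error of $\hat{P}_C(Y)$ divided by $\hat{P}_C(Y)$). The function names and example numbers are hypothetical. Real likelihoods such as $e^{-1858}$ would underflow double precision, so a practical version would carry a common scale factor on the log scale.

```python
import math


def batch_summary(batch_estimates):
    """Return (mean, standard error) from independent per-batch estimates."""
    b = len(batch_estimates)
    mean = sum(batch_estimates) / b
    var = sum((e - mean) ** 2 for e in batch_estimates) / (b - 1)
    return mean, math.sqrt(var / b)


def log_scale_check(batch_estimates, ln_p_indep):
    """Return (ln estimate, approximate SE of ln estimate, criterion met?).

    The criterion is SE(ln P_C-hat) < |ln P_C-hat - ln P_I|.
    """
    mean, se = batch_summary(batch_estimates)
    ln_est = math.log(mean)
    se_ln = se / mean  # first-order Taylor (delta-method) approximation
    return ln_est, se_ln, se_ln < abs(ln_est - ln_p_indep)


# Hypothetical batch estimates, on an arbitrary scale for illustration.
ln_est, se_ln, ok = log_scale_check([1.0, 2.0, 3.0], ln_p_indep=0.0)
```

When the criterion fails, the text's recommendation applies: the exact independence-model value is the better estimate unless more iterations are run.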

TABLE 1

Standard errors and computing times

Map                      Relationship   ln P̂_C(Y)   ln(P̂_C(Y)/P_I(Y))   SE(ln P̂_C(Y))   Time (min)
10 cM, SNP               2 half-sibs       -45.09    0.03                 0.003            0.4
                         Aunt-niece        -49.23    0.01                 0.005            0.9
                         4 siblings        -94.61    0.3                  0.008            6
10 cM, microsatellite    2 half-sibs      -158.15    0.3                  0.003            0.3
                         Aunt-niece       -146.98    0.04                 0.006            0.8
                         4 siblings       -197.21    0.003                0.005            6
1 cM, SNP                2 half-sibs      -518.82    0.02                 0.003            8
                         Aunt-niece       -520.42    0.4                  0.006            17
                         4 siblings       -717.80    1.0                  0.009            65
1 cM, microsatellite     2 half-sibs     -1514.62    0.8                  0.004            8
                         Aunt-niece      -1441.38    0.5                  0.006            17
                         4 siblings      -1858.53    6.3                  0.015            63

Estimates of $P_C(Y)$ are based on $10^5$ iterations. Results are reported on the natural log scale. The data are 291 simulated microsatellite markers at 1-cM spacing. $\mathrm{SE}(\ln \hat{P}_C(Y))$ is the approximate standard error of $\ln \hat{P}_C(Y)$ based on a first-order Taylor series expansion [standard error of $\hat{P}_C(Y)$ divided by $\hat{P}_C(Y)$]. $\ln(\hat{P}_C(Y)/P_I(Y)) = \ln(\hat{P}_C(Y)) - \ln(P_I(Y))$ is provided for comparison with the standard error of the estimate (see the text). Times are in minutes on a Sun Ultra 60 running at 450 MHz.

Resampling was effective in reducing the amount of computing time required to achieve a given standard error, with benefit increasing with pedigree size. For the aunt-niece and four full-sib relationships, savings in computational time due to the resampling were at least twofold and typically around fivefold.

DISCUSSION

We have presented an algorithm for calculating probabilities for pedigree genetic data under the chi-square family of interference models. The method is based on importance sampling and thus gives an approximate calculation with precision depending on the number of importance sampling iterations. The results shown indicate that the algorithm is feasible for pedigrees with up to eight meioses and for any number of markers. Calculated probabilities of the data may be used as likelihoods for relationship inference, by repeating the calculation with differing pedigrees. In addition, the approach may be used to calculate probabilities of inheritance patterns given the data for linkage gene mapping.

Computational time is higher than that under a no-interference assumption; in general, the computing time without interference would be approximately the time to perform one iteration of the importance sampler used here. However, the important point is that the computation times for this method scale the same with increasing marker density and pedigree size as do those for the basic no-interference algorithm of Lander and Green (1987), so that with increasing computing speed, the size of the problem that can be addressed with interference will lag only a couple of steps behind that for no interference.

We have presented this algorithm for the chi-square models. To directly apply the approach here, it is necessary to be able to calculate the probability of a pattern of recombinations over a series of markers. Such calculation may not be computationally feasible for some models. A more natural approach would involve sampling of not only the recombination pattern but also the actual locations of crossovers on the chromatid bundle under the independence model and conditional on the genotype data $Y$. Such an approach has the potential to be very flexible, although it will add some noise, resulting in lower precision for a given number of iterations of the importance sampler.

The author thanks Reinhard Hopperger for programming assistance and Terry Speed and the referees for helpful comments. This work was supported in part by a Research Starter Grant in Informatics from the PhRMA Foundation.

LITERATURE CITED

Baum, L. E., 1972 An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, pp. 1–8 in Inequalities III: Proceedings of the Third Symposium on Inequalities Held at The University of California, Los Angeles, September 1–9, 1969, edited by O. Shisha. Academic Press, San Diego.

Boehnke, M., and N. J. Cox, 1997 Accurate inference of relationships in sib-pair linkage studies. Am. J. Hum. Genet. 61: 423–429.

Broman, K. W., and J. L. Weber, 2000 Characterization of human crossover interference. Am. J. Hum. Genet. 66: 1911–1926.

Broman, K. W., J. C. Murray, V. C. Sheffield, R. L. White and J. L. Weber, 1998 Comprehensive human genetic maps: individual and sex-specific variation in recombination. Am. J. Hum. Genet. 63: 861–869.

Broman, K. W., L. B. Rowe, G. A. Churchill and K. Paigen, 2002 Crossover interference in the mouse. Genetics 160: 1123–1131.

Goldgar, D. E., P. R. Fain and W. J. Kimberling, 1989 Chiasma-based models of multilocus recombination: increased power for exclusion mapping and gene ordering. Genomics 5: 283–290.

Goldstein, D. R., H. Zhao and T. P. Speed, 1995 Relative

efficiencies for $\chi^2$ models of recombination for exclusion mapping and gene ordering. Genomics 27: 265–273.

Kruglyak, L., and E. S. Lander, 1998 Faster multipoint linkage analysis using Fourier transforms. J. Comput. Biol. 5: 1–7.

Kruglyak, L., M. J. Daly, M. P. Reeve-Daly and E. S. Lander, 1996 Parametric and nonparametric linkage analysis: a unified multipoint approach. Am. J. Hum. Genet. 58: 1347–1363.

Lander, E. S., and P. Green, 1987 Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84: 2363–2367.

Lin, S., and T. P. Speed, 1996 Incorporating crossover interference into pedigree analysis using the $\chi^2$ model. Hum. Hered. 46: 315–322.

Lin, S., and T. P. Speed, 1999 Relative efficiencies of the chi-square recombination models for gene mapping with human pedigree data. Ann. Hum. Genet. 63: 81–95.

Lin, S., R. Cheng and F. A. Wright, 2001 Genetic crossover interference in the human genome. Ann. Hum. Genet. 65: 79–93.

Liu, J. S., 2001 Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York.

McPeek, M. S., and T. P. Speed, 1995 Modeling interference in genetic recombination. Genetics 139: 1031–1044.

McPeek, M. S., and L. Sun, 2000 Statistical tests for detection of misspecified relationships by use of genome screen data. Am. J. Hum. Genet. 66: 1076–1094.

Speed, T. P., 1996 What is a genetic map function? pp. 65–88 in Genetic Mapping and DNA Sequencing (IMA Volumes in Mathematics and its Applications, Vol. 81), edited by T. P. Speed and M. S. Waterman. Springer-Verlag, New York.

Thompson, E. A., 2000a MCMC estimation of multi-locus genome sharing and multipoint gene location scores. Int. Stat. Rev. 68: 53–73.

Thompson, E. A., 2000b Statistical Inference From Genetic Data on Pedigrees. Institute of Mathematical Statistics, Beachwood, OH.

Weeks, D. E., G. M. Lathrop and J. Ott, 1993 Multipoint mapping under genetic interference. Hum. Hered. 43: 86–97.

Zhao, H., M. S. McPeek and T. P. Speed, 1995a Statistical analysis of chromatid interference. Genetics 139: 1057–1065.

Zhao, H., T. P. Speed and M. S. McPeek, 1995b Statistical analysis of crossover interference using the chi-square model. Genetics 139: 1045–1056.

Communicating editor: S. Tavaré

APPENDIX

Probability of recombination pattern for chi-square models: We reproduce the method given in Zhao et al. (1995b) for calculating the probability of a pattern of recombination between consecutive markers for a single meiosis. Following their notation, let $p = m + 1$ and let $D_s(y)$ be the $p \times p$ matrix with $i,j$th entry $e^{-y} y^{ps+j-i} / (ps + j - i)!$ (but with zero entries where $ps + j - i < 0$). The $i,j$th entry of $D_s(y)$ represents the probability that over genetic distance $x$, $s$ crossovers occurred on the four-strand bundle, and the number of intermediate events (the events implied by the sum of $m + 1$ independent exponential random variables) since the last crossover changed from $i$ to $j$ (hence the zero terms, since $j$ cannot be less than $i$ if $s = 0$ crossovers occurred). For interval $l$ between markers $l$ and $l + 1$, let $x_l$ be the genetic distance of the interval, $y_l = 2(m + 1) x_l$, and define $N_l = D_0(y_l) + \tfrac{1}{2} \sum_{s \ge 1} D_s(y_l)$ and $R_l = \tfrac{1}{2} \sum_{s \ge 1} D_s(y_l)$. Then the probability of a recombination pattern over the $L - 1$ intervals between $L$ consecutive loci on a chromosome is given by $(1/p)\, \mathbf{1} M_1 M_2 \cdots M_{L-1} \mathbf{1}'$, where $\mathbf{1}$ is a row vector of ones, $M_l = N_l$ whenever there is no recombination in the $l$th interval, and $M_l = R_l$ when there is recombination. See Zhao et al. (1995b) for the proof of this result.

Sampling inheritance patterns conditional on the genotype data under the independence model: Thompson (2000b, Sect. 8.4) describes how to sample the inheritance pattern $X$ from $P_I(X \mid Y)$, the probability distribution of the inheritance patterns given the genotype data $Y$ under the independence model, and we give the details here. Let $X_l = (X_{1l}, X_{2l}, \ldots, X_{Kl})$ be the inheritance vector at locus $l$, $Y_l$ the genotype at locus $l$, and

$$\alpha_l(s) = P(X_l = s, Y_1, Y_2, \ldots, Y_l).$$

The terms $\alpha$ can be calculated recursively as

$$\alpha_{l+1}(s) = \sum_t P(Y_{l+1} \mid X_{l+1} = s)\, P(X_{l+1} = s \mid X_l = t)\, \alpha_l(t)$$

and

$$\alpha_1(s) = P(Y_1 \mid X_1 = s)\, P(X_1 = s).$$

This is an application of the Baum (1972) forward algorithm and gives $P_I(Y) = \sum_s \alpha_L(s)$. Then

$$\alpha_L(s) = P(X_L = s, Y) \propto P(X_L = s \mid Y)$$

can be used to sample $X_L$ from $P_I(X \mid Y)$. Recursively, after sampling $X_l$, sample $X_{l-1}$ from
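The matrix-product formula for the chi-square models stated in this appendix can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names (`d_matrix`, `interval_matrices`, `pattern_probability`), the truncation rule for the infinite sum over $s$, and the example distances are all hypothetical choices. Interval lengths are in morgans and are assumed to be strictly positive.

```python
import math


def d_matrix(s, y, p):
    """D_s(y): p x p matrix with (i, j) entry e^{-y} y^{ps+j-i} / (ps+j-i)!,
    and zero where ps + j - i < 0. Requires y > 0."""
    mat = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            k = p * s + j - i
            if k >= 0:
                # Poisson-type term, evaluated on the log scale for stability.
                mat[i][j] = math.exp(-y + k * math.log(y) - math.lgamma(k + 1))
    return mat


def interval_matrices(x, m, tol=1e-15, max_terms=500):
    """Return (N, R) for one interval of genetic length x morgans."""
    p = m + 1
    y = 2.0 * p * x
    tail = [[0.0] * p for _ in range(p)]  # truncated sum over s >= 1 of D_s(y)
    for s in range(1, max_terms + 1):
        d_s = d_matrix(s, y, p)
        for i in range(p):
            for j in range(p):
                tail[i][j] += d_s[i][j]
        # Stop once past the mode of the Poisson terms and all entries are tiny.
        if p * (s - 1) > y and max(max(row) for row in d_s) < tol:
            break
    d_0 = d_matrix(0, y, p)
    n_mat = [[d_0[i][j] + 0.5 * tail[i][j] for j in range(p)] for i in range(p)]
    r_mat = [[0.5 * tail[i][j] for j in range(p)] for i in range(p)]
    return n_mat, r_mat


def pattern_probability(recomb, distances, m):
    """Probability (1/p) 1 M_1 ... M_{L-1} 1' of a recombination pattern.

    recomb[l] is True if interval l is recombinant; distances[l] is its
    length in morgans.
    """
    p = m + 1
    vec = [1.0 / p] * p  # (1/p) times a row vector of ones
    for rec, x in zip(recomb, distances):
        n_mat, r_mat = interval_matrices(x, m)
        mat = r_mat if rec else n_mat
        vec = [sum(vec[i] * mat[i][j] for i in range(p)) for j in range(p)]
    return sum(vec)  # right-multiplication by the column vector of ones


# Example: three intervals under m = 4, with recombination in the first only.
example = pattern_probability([True, False, False], [0.1, 0.2, 0.05], m=4)
```

Two sanity checks follow from the formula itself. With $m = 0$ a single interval gives $\tfrac{1}{2}(1 - e^{-2x})$, the no-interference recombination fraction, and for any $m$ the probabilities of all $2^{L-1}$ patterns sum to one because each row of $N_l + R_l$ sums to one.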