Pedigree Data Analysis With Crossover Interference
Sharon Browning
1Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695-7566 and GlaxoSmithKline, Research Triangle Park, North Carolina 27709
Manuscript received May 27, 2002 Accepted for publication March 7, 2003
ABSTRACT
We propose a new method for calculating probabilities for pedigree genetic data that incorporates crossover interference using the chi-square models. Applications include relationship inference, genetic map construction, and linkage analysis. The method is based on importance sampling of unobserved inheritance patterns conditional on the observed genotype data and takes advantage of fast algorithms for no-interference models while using reweighting to allow for interference. We show that the method is effective for arbitrarily many markers with small pedigrees.
E
XISTING methods for likelihood-based analysis of number of sample repetitions is increased. Because the samples are independent, the method is not plagued pedigree genotype data assume absence ofcross-over interference, even though such interference has by the difficulties in assessing convergence encountered in use of Markov chain Monte Carlo (MCMC) methods. been well documented in humans (Weeks et al. 1993;
Broman andWeber 2000; Lin et al.2001) and other The probabilities produced by this method may be
used for multipoint linkage mapping, genetic map con-species (Zhaoet al.1995b;Bromanet al.2002).
More-over, simulation studies have shown that failure to incor- struction, and relationship inference. The method can be used with arbitrarily many linked markers for small porate crossover interference into linkage analyses is
inefficient (Goldgaret al.1989;Goldsteinet al.1995; pedigrees. For small pedigrees the method is fast enough that it could be used on a routine basis in analysis of
LinandSpeed1999).
The major reason that crossover interference models pedigree genetic data. are not used is the lack of computationally feasible
meth-ods for working with more than a handful of linked
mark-BACKGROUND
ers.LinandSpeed(1996) present a method for exact
computation of probabilities for pedigree data with the It has long been known that the locations of multiple chi-square models; however, computational time increases crossovers on a chromosome at meiosis are not indepen-exponentially with both the number of meioses in the
dent, but exhibitcrossover interferencewhereby the exis-pedigree and the number of markers, severely limiting
tence of a crossover at one location suppresses the occur-its applicability.Thompson(2000a,b) has a Markov chain
rence of crossovers in nearby regions. A common approach Monte Carlo approach for incorporating crossover
in-is to use the Kosambi map function with methods that terference, but it is limited to at most 12 loci per
chromo-assume independence between recombination events some.
in nonoverlapping intervals. This approach does not have We propose a new method for calculating
probabili-any real effect, as genetic distances are used only for ties for pedigree genetic data that does incorporate
reporting results, while recombination fractions are used crossover interference. The method is based on
impor-at all steps in the calculimpor-ations. Thimpor-at is, observed recombi-tance sampling, which is a Monte Carlo technique for
nation fractions are converted to genetic distances with estimating the value of an integral or sum (in this case
the map function and are converted back to recombina-probabilities of the data, expressed as a sum over
unob-tion fracunob-tions with the same map funcunob-tion for use in the served inheritance indicators). Importance sampling
in-analysis. AsSpeed(1996) points out, the only adequate volves evaluating the integrand at independently
sam-way to incorporate crossover interference is with a point pled realizations from a probability distribution that is
process model. Such an approach can account for non-roughly proportional to the integrand. Correct
weight-independence of recombination between intervals. ing of the sampled values gives an unbiased estimate
A useful family of point process models for crossover that converges to the true value of the integral as the
interference is the chi-square models. These are renewal models for the occurrence of crossovers on the four-strand chromatid bundle. A parametermcontrols the 1Address for correspondence:GlaxoSmithKline, Five Moore Dr., MAI.
strength of interference:m⫽0 corresponds to no
inter-A112B.1G, Research Triangle Park, NC 27709.
E-mail: sharon.r.browning@gsk.com ference, whilem⫽4 corresponds to approximately the
level of interference found in humans (LinandSpeed tage of the special algorithms for independence models, while using reweighting to allow for interference. 1996). The genetic distance between crossovers on the
four-strand bundle is modeled as the sum of m ⫹ 1 independent exponential random variables each with
METHODS
rate 2(m⫹1). This sum follows a gamma, or scaled
chi-square, distribution. A gamete receives only one of the We present a method for calculating probabilities four chromatids, and since each crossover involves four for genotypic data under the chi-square models. The chromatids, the probability that a certain crossover on method is based on importance sampling of underlying the chromatid bundle is seen in the gamete is one- unobserved inheritance patterns. We focus primarily half. Assuming no chromatid interference (Zhaoet al. on calculation of the likelihood, which is equal to the 1995a), whether or not one crossover on the bundle is probability of the genotypic data under the assumed transmitted to the gamete is independent of the out- model. Likelihoods may be used to compare models, come for other crossovers. McPeek andSpeed(1995) for example, to find the most likely genealogical rela-andZhaoet al.(1995b) found that the chi-square mod- tionship between individuals (BoehnkeandCox1997). els provide an excellent fit to available data.Zhaoet al. In addition, the methodology presented here may be (1995b) provide a method for calculating the probabil- used to calculate probabilities of inheritance patterns ity of a pattern of recombination between consecutive given the genotypic data, which are used in genetic map markers for a single meiosis, which involves on the order construction and multipoint linkage analysis, and we ofLm2calculations forLmarker loci. We reproduce the
give details of those calculations at the end of this sec-details of this method in theappendixsince our method tion.
makes use of these probabilities. Importance sampling of inheritance indicators: For In pedigree data, the recombination patterns are gen- each marker locus l and meiosis k, let X
kl be a
zero-erally not directly observed, due to unobserved indi- one inheritance indicator. The parent involved in the viduals and insufficiently polymorphic markers. Thus, meiosis has two copies of the DNA, one maternal (0) to calculate probabilities for genotype data requires and the other paternal (1). The indicator Xdescribes summation over possible recombination patterns. For which of these two copies was transmitted to the off-Kmeioses and Lmarker loci there are 2K(L⫺1)possible
spring at this locus. If Xkl ⫽ Xk,l⫹1 no recombination
recombination patterns, so this computation, if per- occurred on meiosiskover the interval between markers formed exactly, rapidly becomes infeasible for increas- landl⫹1, while recombination occurred ifX
kl⬆Xk,l⫹1.
ing numbers of markers and meioses. The inheritance pattern X represents the inheritance When independence (i.e., no-interference) models indicators over all meioses and loci.
for recombination are used, exploitation of the inde- We can write the probability of the genotype dataY pendence can reduce the number of computations to as a sum over inheritance patterns
the order ofLK2K (Kruglyak and Lander1998), so
PC(Y)⫽
兺
XPC(Y|X)PC(X),
that large numbers of markers may be analyzed, while the size of the pedigree (measured by the number of
meioses) must be restricted. McPeek andSun (2000) where PC denotes probabilities under the chi-square
crossover model, with a given choice of parameter m. describe a related approach for the chi-square models;
however, the number of computations required by such Since PC(Y|X)PC(X)⫽ PC(X, Y)⬀ PC(X|Y), only those
terms with large values of PC(X|Y) contribute
signifi-an approach is proportional toL4K(m⫹1), so that for
mod-erate values of m, calculation is feasible only with ex- cantly to the sum. An importance-sampling approach aims to sample with high frequency those terms with tremely small pedigrees. Thus in general to calculate
exact probabilities for pedigree data with the chi-square significant contribution, while sampling with low fre-quency the terms with negligible contribution. Ideally or other interference models involves summing over all
2KL recombination patterns, resulting in a computa- we would like to sample from
PC(X|Y), but we have no
way to do so directly. Instead we sample from PI(X|Y),
tional burden that increases exponentially with both
the number of markers and the number of meioses (Lin wherePIdenotes probabilities under the independence
model. The probabilities PI(X|Y) are sufficiently close
and Speed 1996). This is the approach taken by Lin
andSpeed(1999) in their work showing the efficiency toPC(X|Y) to result in a useful importance sampler.
In what follows we assume that probabilities for geno-gains obtainable using the chi-square model in gene
map-ping with pedigree data. They were able to work with 12 typesYlat a markerlare conditionally independent of
genotypes and inheritance patterns at other markers meioses (a three-generational family with four
grand-parents, two grand-parents, and four children) and seven given the inheritance pattern Xl at the marker. This
assumption holds if the markers are in linkage equilib-marker loci.
To reduce the computational burden while incorpo- rium, which will be approximately true if all the geno-typed individuals come from a single homogeneous rating crossover interference, we propose an
A consequence of this assumption is that probabilities X(i)
L⫺T⫺1, . . . ,XL(i⫺)2T⫹1, perform resampling, and continue
in this fashion until reachingX1at the end of the
chro-of genotypes Y given inheritance patterns X do not
depend on the crossover model, soPC(Y|X)⫽PI(Y|X)⬅ mosome. As noted inLiu(2001) it is best not to
resam-ple after samplingX1, as doing so would only add noise.
P(Y|X). ThenP(Y|X)⫽ PI(Y|X)⫽ PI(X|Y)PI(Y)/PI(X)
and we can write WriteX(i)
(j)for theith partial sample at thejth
resam-pling. For example,X(i)
(1)⫽{X(Li),XL(i⫺)1, . . . , X(Li⫺)T⫹1} and
PC(Y)⫽
兺
XP(Y|X)PC(X) X(i)
(2)⫽{X(Li),XL(i⫺)1, . . . ,X(Li⫺)2T⫹1}. The resampling weight
for this sample is
⫽
兺
X
PI(Y)PC(X)
PI(X)
PI(X|Y).
w(i)
j ⫽w(j⫺i)1
PC(X((ij)))/PC(X((ij⫺)1))
PI(X((ij)))/PI(X((ij)⫺1))
Thus ifX(1),X(2), . . . ,X(n)are sampled fromP
I(X|Y), an
unbiased estimate ofPC(Y) is given by
forj⬎ 1, while at the first resampling
PˆC(Y)⫽
兺
n
i⫽1
PI(Y)PC(X(i))
PI(X(i))
.
w(i) 1 ⫽
PC(X((ij)))
PI(X((ij)))
.
Thompson(2000b, Sect. 8.4) describes how to sample
At the jth resampling, let Wj⫽ 兺iB⫽1w(ji). Start by
re-XfromPI(X|Y). LetXl⫽(Xl1,Xl2, . . . ,XlK) be the
inheri-tainingki⫽[Bw(ji)/Wj] copies of X((ij)), where [ ] is the
tance vector at locusl. Making use of the hidden-Markov
floor function giving the largest integer less than or structure (the inheritance vectors {X1, . . . , XL} form
equal to its argument. The number of elements remain-a hidden-Mremain-arkov model for the genotypes under the
ing to be resampled isr⫽B⫺兺B
i⫽1ki. Add to the set of
independence model), we can recursively sample XL
retained inheritance patterns r independent random conditional on the genotype data and then sampleXL⫺1 draws from theX(i)
(j)with probabilities proportional to
given the realized value ofXL and the genotype data
Bw(i)
j /Wj⫺ki,i ⫽ 1, 2, . . . ,B. Now reset the weights
and so on through toX1to obtain one realization ofX.
w(i)
j toWj/Bfori⫽1, 2, . . . ,Band continue with the
The sampling algorithm has computational orderL4K,
sequential sampling of the inheritance patterns. which is the same as for the original algorithm ofLander
Let J ⫽[(L ⫺1)/T] be the total number of
resam-andGreen(1987) for the independence model. We give
pling points. After the samples are completed, the further details in theappendix.
weight of theith sample X(i)is
Improved performance through resampling: This method is an example of sequential importance
sam-w(i)⫽w(i)
J
PC(X(i))/PC(X((iJ)))
PI(X(i))/PI(X((iJ)))
. pling (Liu 2001, Sect. 2.6.3), because the inheritance
patternsXare sampled sequentially along each
chromo-An unbiased estimate ofP(Y) is some. Improvement, in terms of reducing the standard
error of the estimate for a fixed number of iterations
PˆC(Y)⫽
1 n
兺
n
i⫽1
w(i)P I(Y).
n, may be obtained by resampling (Liu2001, Sect. 3.4.4). The idea is to concurrently sample a number of
inheri-tance patternsX. After sampling the inheritance indica- The values obtained from a single batch are correlated tors at a couple of loci, writeX(i)
p for the partial inheri- and cannot be used to estimate the standard error of
tance pattern in realization i. We then calculate the PˆC(Y). To estimate the standard error we divide our weightPC(X(pi))/PI(X(pi)) for eachi. If this weight is small total number of iterations into batches of sizeB. Choice
for a particular i the sample is not worth continuing, of B has negligible effect on computing time. Small and it is replaced by a copy of one of the samples with choices ofBare less efficient (i.e., give larger standard a larger weight. Sampling continues for further loci error) because resampling is less effective with a smaller (punctuated by further resampling), and samples that pool. Thus it is best to chooseBto be reasonably large, had been duplicated tend to differ in the values sampled subject to computer memory limitations and to having at the remaining loci. Thus the samples are correlated, a sufficient number of batches to give a good estimate but the final weights PC(X(i))/PI(X(i)) tend to be less of standard error (say at least 20 batches). We usedB⫽
variable than they would be without resampling, which 100 in all the examples presented inresults.
reduces the standard error of the estimatePˆC(Y). A couple of issues are involved in best choice of
resam-We now give more details of the resampling algo- pling intervalT. First, resampling involves computational rithm, in which we apply the method of residual resam- time, and thus frequent resampling can be computa-pling described in Sect. 3.4.4 ofLiu (2001). First we set tionally expensive. The computational time for resam-a bresam-atch sizeB and a resampling intervalT (choice of pling is essentially independent of the pedigree size; these values is discussed below). We start by sampling thus the expense of resampling is most notable in com-X(i)
L, X(Li⫺)1, . . . , X(Li⫺)T⫹1 for i ⫽ 1, 2, . . . , B, perform putations on small pedigrees and is negligible relative
to total computation time for larger pedigrees (such as resampling as described below, and then sampleX(i)
the eight-meiosis pedigree for four full-sibs). Second, shown) indicate that standard errors tend to be lower with shorter chromosomes because of smaller numbers as the resampling interval increases, the standard error
of the estimates tends to increase. This increase in stan- of crossovers, chromosome 1 represents the “worst case.” We simulated sets of data with marker spacing of 1 cM dard error is particularly evident in larger pedigrees,
which are also the pedigrees for which resampling is (291 markers) and 10 cM (29 markers), with single-nucleotide polymorphisms (SNPs; allele frequencies 0.7 most beneficial. Thus small values of T (say 1ⱕ T ⱕ
5) are best for large pedigrees, and large values of T and 0.3), and with microsatellites (seven alleles with fre-quencies 0.4, 0.2, 0.2, 0.05, 0.05, 0.05, 0.05, and hetero-(say T ⱖ 5) are best for small pedigrees. We chose
to use T ⫽ 5 for simplicity throughout the examples zygosity 0.75). Calculation was performed under the chi-square model with m ⫽ 4 [giving PˆC(Y)] and, for
presented inresults.
We note that it is not necessary (and is very inefficient) comparison, under the independence model with the Kosambi map function [givingPI(Y)]. We look at three
to calculate PC(X((ij))) and PI(X((ij))) from scratch at each
resampling point. For the independence sampler,PI(X((ij))) relationships: two half-sibs (two meioses), an aunt-niece
pair (five meioses), and four full-siblings (eight meioses). equals PI(X((ij)⫺1)) multiplied by the probabilities of the
recombination pattern in X(i)
L⫺(j⫺1)T⫹1, XL(i⫺)(j⫺1)T, . . . , The data were simulated under the chi-square model
with m ⫽ 4, although this choice has little impact on X(i)
L⫺jT⫹1. For the chi-square sampler it is necessary to save
an m-dimensional vector of partial probabilities from the results. Table 1 shows results and computing times for one simulated chromosome for each map/relation-calculation ofPC(X((ij⫺)1)) for use in calculation ofPC(X((ij))).
Our implementation of the algorithm described ship combination.
From the results in Table 1, we see that the type above is available from the author on request.
Probabilities for linkage analysis and map construc- of marker has little impact on the computing time or standard errors, but does affect the magnitude of the tion:Linkage analysis and map construction are based
on probabilities of inheritance indicators Xlat a locus likelihoods. Marker spacing affects computing time, but
has little impact on standard errors. The most comput-lgiven the genotype information (Kruglyaket al.1996)
or the joint probabilities of inheritance indicators Xl ing-intensive part of the algorithm is sampling of the
inheritance patterns from the distribution PI(X|Y),
and Xl⫹1 at two neighboring loci given the genotype
information (LanderandGreen1987). Consider esti- which is of computational orderL2K
, so that computing time doubles for each additional meiosis. Hence, with mating
the current implementation, eight meioses is about the PC(Xl⫽xl|Y)⫽
兺
X
P(xl|X,Y)PC(X|Y) maximum that one would want to work with—105
itera-tions for chromosome 1 with a 1-cM map took ⵑ1 hr (fewer iterations and hence shorter computing time
⬀
兺
X
P(xl|X)
PC(X)
PI(X)
PI(X|Y),
would suffice in some cases). Standard errors also in-crease with the number of meioses, so one would gener-where P(xl|X,Y)⫽ P(xl|X) is one if Xl ⫽ xl and zero ally want to increase the number of iterations as
pedi-otherwise. An unbiased estimate of this probability is gree size increases.
proportional to In considering whether an estimate of ln P
C(Y) is
sufficiently precise, one minimal requirement is that 1
n
兺
n
i⫽1
P(xl|X(i))
PC(X(i))
PI(X(i))
, the standard error be less than the difference between lnPC(Y) and lnPI(Y), where lnPI(Y) is the natural log
whereX(1), . . . ,X(n)are independent realizations from
of the probability of the data under the independence PI(X|Y) and the constant of proportionality may be model using the Kosambi map function. If this
require-found by summing over all possible values ofxl. Probabil- ment is not satisfied, then the exact answer obtained
itiesPC(Xl⫽xl,Xl⫹1⫽xl⫹1|Y) may be estimated similarly. under the independence model is a better estimate of
Thus these probabilities may be estimated under inter- PC(Y) than that from the importance sampler. We note
ference models using the machinery presented here that the requirement is satisfied in all but 1 of the 12 and then used in linkage mapping or genetic map con- examples looked at, and in most of the examples a much struction. smaller number of iterations would have sufficed to meet this minimal requirement (in fact,⬍100 iterations, at approximately one-thousandth of the computing time
RESULTS
shown, would have sufficed in 7 out of 12 of the exam-ples). Thus 105iterations are more than enough (based
We present results from analysis of simulated data, to
demonstrate the capabilities of the proposed approach. on this criterion) for this chromosome length, choice of model parameter m, and pedigree size, but more Our simulated data were designed to represent
hu-man chromosome 1, with an estimated genetic length iterations may be required for longer chromosomes, larger pedigrees, or different values ofm.
of 2.9 morgans (Bromanet al.1998). The other human
reduc-TABLE 1
Standard errors and computing times
Map Relationship lnPˆC(Y) ln(PˆC(Y)/PI(Y)) SE(lnPˆC(Y)) Time
10 cM, SNP 2 half-sibs ⫺45.09 0.03 0.003 0.4
Aunt-niece ⫺49.23 0.01 0.005 0.9
4 siblings ⫺94.61 0.3 0.008 6
10 cM, microsatellite 2 half-sibs ⫺158.15 0.3 0.003 0.3
Aunt-niece ⫺146.98 0.04 0.006 0.8
4 siblings ⫺197.21 0.003 0.005 6
1 cM, SNP 2 half-sibs ⫺518.82 0.02 0.003 8
Aunt-niece ⫺520.42 0.4 0.006 17
4 siblings ⫺717.80 1.0 0.009 65
1 cM, microsatellite 2 half-sibs ⫺1514.62 0.8 0.004 8
Aunt-niece ⫺1441.38 0.5 0.006 17
4 siblings ⫺1858.53 6.3 0.015 63
Estimates ofPC(Y) are based on 105iterations. Results are reported on the natural log scale. The data are
291 simulated microsatellite markers at 1-cM spacing. SE(ln PˆC(Y)) is the approximate standard error of ln
PˆC(Y) based on a first-order Taylor series expansion [standard error of PˆC(Y) divided by PˆC(Y)]. ln(PˆC(Y)/
PI(Y))⫽ln(PˆC(Y))⫺ln(PI(Y)) is provided for comparison with the standard error of the estimate (see the
text). Times are in minutes on a Sun Ultra 60 running at 450 MHz.
ing the amount of computing time required to achieve models. To directly apply the approach here, it is neces-a given stneces-andneces-ard error, with benefit increneces-asing with pedi- sary to be able to calculate the probability of a pattern gree size. For the aunt-niece and four full-sib relation- of recombinations over a series of markers. Such calcula-ships, savings in computational time due to the resam- tion may not be computationally feasible for some mod-pling were at least twofold and typically around fivefold. els. A more natural approach would involve sampling of not only the recombination pattern but also the actual locations of crossovers on the chromatid bundle under
DISCUSSION
the independence model and conditional on the geno-type dataY. Such an approach has the potential to be We have presented an algorithm for calculating
prob-very flexible, although it will add some noise, resulting abilities for pedigree genetic data under the chi-square
in lower precision for a given number of iterations of family of interference models. The method is based on
the importance sampler. importance sampling and thus gives an approximate
calculation with precision depending on the number The author thanks Reinhard Hopperger for programming assis-of importance sampling iterations. The results shown tance and Terry Speed and the referees for helpful comments. This work was supported in part by a Research Starter Grant in Informatics
indicate that the algorithm is feasible for pedigrees with
from the PhRMA Foundation.
up to eight meioses and for any number of markers. Calculated probabilities of the data may be used as likeli-hoods for relationship inference, by repeating the
calcu-lation with differing pedigrees. In addition, the approach LITERATURE CITED may be used to calculate probabilities of inheritance
Baum, L. E., 1972 An inequality and associated maximization
tech-patterns given the data for linkage gene mapping. nique in statistical estimation for probabilistic functions of
Mar-kov processes, pp. 1–8 in Inequalities III: Proceedings of the Third
Computational time is higher than that under a
no-Symposium on Inequalities Held at The University of California, Los
interference assumption—in general, the computing
Angeles, September 1–9, 1969, edited by O.Shisha. Academic Press,
time without interference would be approximately the San Diego.
time to perform one iteration of the importance sam- Boehnke, M., andN. J. Cox, 1997 Accurate inference of relation-ships in sib-pair linkage studies. Am. J. Hum. Genet.61:423–429.
pler used here; however, the important point is that the
Broman, K. W., andJ. L. Weber, 2000 Characterization of human
computation times for this method scale the same with crossover interference. Am. J. Hum. Genet.66:1911–1926. increasing marker density and pedigree size as do those Broman, K. W., J. C. Murray, V. C. Sheffield, R. L. WhiteandJ. L.
Weber, 1998 Comprehensive human genetic maps: individual
for the basic no-interference algorithm ofLanderand
and sex-specific variation in recombination. Am. J. Hum. Genet.
Green(1987), so that with increasing computing speed, 63:861–869.
the size of the problem that can be addressed with Broman, K. W., L. B. Rowe, G. A. ChurchillandK. Paigen, 2002 Crossover interference in the mouse. Genetics160:1123–1131.
interference will lag only a couple of steps behind that
Goldgar, D. E., P. R. FainandW. J. Kimberling, 1989
Chiasma-for no interference.
based models of multilocus recombination: increased power for
We have presented this algorithm for the chi-square exclusion mapping and gene ordering. Genomics5:283–290.
Goldstein, D. R., H. ZhaoandT. P. Speed, 1995 Relative
cies for2models of recombination for exclusion mapping and
jth entry e⫺yyps⫹j⫺i/(ps ⫹j⫺ i)! (but with zero entries
gene ordering. Genomics27:265–273.
whereps⫹j⫺i⬍0). Thei,jth entry ofDs(y) represents
Kruglyak, L., andE. S. Lander, 1998 Faster multipoint linkage
the probability that over genetic distancex,scrossovers
analysis using Fourier transforms. J. Comput. Biol.5:1–7.
Kruglyak, L., M. J. Daly, M. P. Reeve-Daly and E. S. Lander, occurred on the four-strand bundle, and the number 1996 Parametric and nonparametric linkage analysis: a unified
of intermediate events (the events implied by the sum
multipoint approach. Am. J. Hum. Genet.58:1347–1363.
of m⫹1 independent exponential random variables)
Lander, E. S., andP. Green, 1987 Construction of multilocus
ge-netic linkage maps in humans. Proc. Natl. Acad. Sci. USA84: since the last crossover changed fromitoj(hence the
2363–2367.
zero terms, sincejcannot be less thaniifs⫽0 crossovers
Lin, S., and T. P. Speed, 1996 Incorporating crossover
interfer-occurred). For intervallbetween markerslandl⫹ 1,
ence into pedigree analysis using the2model. Hum. Hered.46:
315–322. letx
l be the genetic distance of the interval,yl⫽2(m⫹
Lin, S., andT. P. Speed, 1999 Relative efficiencies of the chi-square
1)xland defineNk⫽D0(yl)⫹1⁄2兺sⱖ1Ds(yl) andRl⫽1⁄2
recombination models for gene mapping with human pedigree
兺sⱖ1 Ds(yl). Then the probability of a recombination
data. Ann. Hum. Genet.63:81–95.
Lin, S., R. ChengandF. A. Wright, 2001 Genetic crossover interfer- pattern over theL⫺1 intervals betweenLconsecutive ence in the human genome. Ann. Hum. Genet.65:79–93.
loci on a chromosome is given by (1/p)1M1M2 . . .
Liu, J. S., 2001 Monte Carlo Strategies in Scientific Computing.
Springer-ML⫺11ⴕ, whereMl⫽Nlwhenever there is no
recombina-Verlag, New York.
McPeek, M. S., and T. P. Speed, 1995 Modeling interference in tion in thelth interval andMl⫽Rlwhen there is recom-genetic recombination. Genetics139:1031–1044.
bination. See Zhaoet al.(1995b) for the proof of this
McPeek, M. S., andL. Sun, 2000 Statistical tests for detection of
result.
misspecified relationships by use of genome screen data. Am. J.
Hum. Genet.66:1076–1094. Sampling inheritance patterns conditional on the
ge-Speed, T. P., 1996 What is a genetic map function? pp. 65–88 in notype data under the independence model:Thompson Genetic Mapping and DNA Sequencing(IMA Volumes in
Mathemat-(2000b, Sect. 8.4) describes how to sample the
inheri-ics and its Applications, Vol. 81), edited by T. P.Speedand M. S.
Waterman. Springer-Verlag, New York. tance patternX fromPI(X|Y), the probability
distribu-Thompson, E. A., 2000a MCMC estimation of multi-locus genome tion of the inheritance patterns given the genotype data sharing and multipoint gene location scores. Int. Stat. Rev.68:
Yunder the independence model, and we give the
de-53–73.
tails here. LetXl⫽(Xl1,Xl2, . . . ,XlK) be the inheritance
Thompson, E. A., 2000b Statistical Inference From Genetic Data on
Pedi-grees. Institute of Mathematical Statistics, Beachwood, OH. vector at locusl,Y
lthe genotype at locusl, and
Weeks, D. E., G. M. LathropandJ. Ott, 1993 Multipoint mapping
under genetic interference. Hum. Hered.43:86–97. ␣
l(s)⫽P(Xl ⫽s,Y1,Y2, . . . ,Yl).
Zhao, H., M. S. McPeekandT. P. Speed, 1995a Statistical analysis
of chromatid interference. Genetics139:1057–1065. The terms␣can be calculated recursively as Zhao, H., T. P. SpeedandM. S. McPeek, 1995b Statistical analysis
of crossover interference using the chi-square model. Genetics ␣l⫹1(s)⫽
兺
t
P(Yl⫹1|Xl⫹1⫽s)P(Xl⫹1⫽s|Xl⫽t)␣l(t)
139:1045–1056.
Communicating editor: S.Tavare´ and
␣1(s)⫽P(Y1|X1⫽s)P(X1⫽ s).
APPENDIX This is an application of theBaum(1972) forward
algo-rithm and givesPI(Y)⫽兺s␣L(s). Then
Probability of recombination pattern for chi-square
models:We reproduce the method given inZhaoet al. ␣L(s)⫽ P(XL⫽s,Y)⬀P(XL⫽s|Y)
(1995b) for calculating the probability of a pattern of
can be used to sample XL from PI(X|Y). Recursively,
recombination between consecutive markers for a single
after samplingXl, sample Xl⫺1from
meiosis. Following their notation, let p ⫽ m ⫹ 1 and