Simultaneous Fine Mapping of Multiple Closely Linked Quantitative Trait Loci Using Combined Linkage Disequilibrium and Linkage With a General Pedigree


DOI: 10.1534/genetics.106.057653


S. H. Lee¹ and J. H. J. Van der Werf

School of Rural Science and Agriculture and The Institute of Genetics and Bioinformatics, University of New England, Armidale, NSW 2351, Australia

Manuscript received February 26, 2006
Accepted for publication May 26, 2006

ABSTRACT

Within a small region (e.g., <10 cM), there can be multiple quantitative trait loci (QTL) underlying phenotypes of a trait. Simultaneous fine mapping of closely linked QTL needs an efficient tool to remove confounded shade effects among QTL within such a small region. We propose a variance component method using combined linkage disequilibrium (LD) and linkage information and a reversible jump Markov chain Monte Carlo (MCMC) sampling for model selection. QTL identity-by-descent (IBD) coefficients between individuals are estimated by a hybrid MCMC combining the random walk and the meiosis Gibbs sampler. These coefficients are used in a mixed linear model and an empirical Bayesian procedure combines residual maximum likelihood (REML) to estimate QTL effects and a reversible jump MCMC that samples the number of QTL and the posterior QTL intensities across the tested region. Note that two MCMC processes are used, i.e., an (internal) MCMC for IBD estimation and an (external) MCMC for model selection. In a simulation study, the use of the multiple-QTL model clearly removes the shade effects between three closely linked QTL located at 1.125, 3.875, and 7.875 cM across the region of 10 cM, using 40 markers at 0.25-cM intervals. It is shown that the use of combined LD and linkage information gives much more useful information compared to using linkage information alone for both single- and multiple-QTL analyses. When using a lower marker density (11 markers at 1-cM intervals), the signal of the second QTL can disappear. Extreme values of past effective size (resulting in extreme levels of LD) decrease the mapping accuracy.

The information from genetic markers has been a valuable resource to detect quantitative trait loci (QTL) in natural, outbred, or experimental populations. The inheritance state of markers has been used to detect the existence of quantitative trait variation associated with a putative QTL by using pedigree-based linkage mapping (Lander and Botstein 1989; Haley and Knott 1992; Andersson et al. 1994; Georges et al. 1995). The use of closely linked markers also allows the use of linkage disequilibrium (LD) information and the positioning of a QTL within a few centimorgans, i.e., fine mapping (Riquet et al. 1999; Meuwissen and Goddard 2000; Farnir et al. 2002; Grisart et al. 2002; Meuwissen et al. 2002; Perez-Enciso 2003).

In some cases, there may be multiple QTL underlying phenotypes of a trait within a small region (<10 cM). The effects of closely linked QTL can be easily confounded, which may negatively affect precision and accuracy of mapping of each QTL. Multiple QTL in such a small region may be considered as a single QTL; therefore the confidence interval covers all of the region. This makes fine mapping of each QTL impossible. The question is how easily multiple QTL within a small region can be accurately mapped.

Shade effects of neighboring QTL (e.g., ghost QTL) have been discussed for linkage mapping by Haley and Knott (1992) and Martinez and Curnow (1992). Jansen (1993) and Zeng (1993) have proposed a combined method of interval mapping and multiple regression on marker genotypes for mapping multiple QTL simultaneously. In the method, the effects in each putative position are sequentially estimated, conditional on cofactors, i.e., selected markers associated with significant QTL and linked or unlinked to the tested putative QTL position. Due to using cofactors, the effects in a putative position are more accurately estimated, resulting in an improvement of QTL mapping. A decision should be made on which markers should be selected as cofactors and how many cofactors should be simultaneously used, which is a model selection problem.

Model selection strategies have been developed and applied for this problem in gene mapping (Broman and Speed 2002; Sillanpaa and Corander 2002). Stepwise selection (Jansen 1993; Kao et al. 1999; Basten et al. 2000) is a standard model selection technique where the Akaike information criterion (Akaike 1969) or Bayesian information criterion (Schwarz 1978) can be used to correct the model likelihood for the number of parameters fitted. Randomized approaches such as Markov chain Monte Carlo (MCMC) or genetic algorithms have been proposed to find an optimal model (Satagopan et al. 1996; Sillanpaa and Arjas 1998; Calborg et al. 2000; Yi and Xu 2000; Nakamichi et al. 2001; Meuwissen and Goddard 2004).

¹Corresponding author: School of Rural Science and Agriculture, UNE, Armidale, NSW 2351, Australia. E-mail: [email protected]

Green (1995) proposed a reversible jump MCMC that allows the Markov chain to move through the state space across different model dimensions according to the correct posterior distribution. This is a generalization of Metropolis–Hastings methods (Metropolis et al. 1953; Hastings 1970) dealing with model selection problems. This technique has been used in multiple-QTL analysis to estimate the number of QTL and their positions in linkage mapping (Heath 1997; Sillanpaa and Arjas 1998, 1999; Lee and Thomas 2000; Yi and Xu 2000; Yi et al. 2003).

In this article, we propose the use of a reversible jump MCMC in a variance component approach using combined LD and linkage (LDL) information to investigate multiple-QTL mapping in a small region. The use of populationwide LD can give critical information about different identity-by-descent (IBD) probabilities in different chromosome segments where typically the size of a segment can be >1 cM. We hypothesize that it should be possible to simultaneously map multiple closely linked QTL with a proper model selection approach such as a reversible jump MCMC. The aim of this study is to investigate by simulation the performance of LDL mapping of closely linked multiple QTL within a small region.

MATERIALS AND METHODS

Simulation study: One hundred generations of a historical population with effective size of 100 were simulated for 40 markers and three QTL in a 10-cM region. No pedigree was recorded for this historical population. The markers are evenly spaced every 0.25 cM and the QTL are positioned at 1.125, 3.875, and 7.875 cM from the first marker. In each generation, the number of male and female parents was 50 and their alleles were transmitted to descendants on the basis of Mendelian segregation using the gene-dropping method (MacCluer et al. 1986). Parents were randomly mated with a total of two offspring for each of 50 mating pairs.
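The gene-dropping step described above can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the function names `make_gamete` and `mutate`, the per-interval recombination probability of 0.0025 (treating 0.25 cM as a 0.25% crossover chance), and the rule that a mutation switches to any other allele with equal probability are all assumptions made for illustration.

```python
import random

def make_gamete(hap_a, hap_b, rec_prob, rng):
    """Drop one gamete from a parent that carries haplotypes hap_a and hap_b."""
    parents = (hap_a, hap_b)
    current = rng.choice((0, 1))          # Mendelian choice of the starting strand
    gamete = []
    for locus in range(len(hap_a)):
        # Assumed model: independent crossover chance between adjacent loci.
        if locus > 0 and rng.random() < rec_prob:
            current = 1 - current
        gamete.append(parents[current][locus])
    return gamete

def mutate(gamete, mut_rate, n_alleles, rng):
    """With probability mut_rate per locus, switch to a different allele (assumed model)."""
    out = []
    for allele in gamete:
        if rng.random() < mut_rate:
            allele = rng.choice([a for a in range(n_alleles) if a != allele])
        out.append(allele)
    return out

rng = random.Random(1)
sire = ([0] * 40, [1] * 40)               # toy parent with two distinct haplotypes
gamete = mutate(make_gamete(sire[0], sire[1], 0.0025, rng), 4e-4, 4, rng)
```

Repeating this for both parents of each of the 50 mating pairs, two offspring per pair, gives one generation of the historical population under these assumptions.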

The number of base alleles in each marker locus was four and starting allele frequencies were all at 0.25. The marker alleles were mutated at a rate of $4 \times 10^{-4}$/generation (Dallas 1992; Weber and Wong 1993; Ellegren 1995). Therefore, this historical population would have an equilibrium distribution of alleles in all marker loci and LD between the QTL and closely flanking markers.

For each QTL, one of the base alleles surviving with a frequency ($p$) of >0.1 and <0.9 was randomly chosen and treated as favorable with additive effect $q$ and nonadditive effect $d$ compared to other QTL alleles in generation 100. The magnitudes of $q$ and $d$ were determined from

$$\sigma^2_q = 2p(1-p)\,[q + d(1-2p)]^2 \quad \text{and} \quad \sigma^2_d = [2p(1-p)\,d]^2$$

with fixed variances of $\sigma^2_q = 10$, 8, and 6 and $\sigma^2_d = 5$, 4, and 3 for the first, second, and last QTL. Pedigree, marker genotypes, and phenotypic values were assumed known only for generations 100 and 101 each with 100 animals. Phenotypes were simulated as

$$y = \mu + \sum_{i=1}^{3} (q_i + d_i) + u + e.$$

The mean of population ($\mu$) was 100, values for $u$ were drawn from $N(0, \mathbf{A}\sigma^2_u)$ with $\sigma^2_u = 20$, where $\mathbf{A}$ ignored the ancestral relationships beyond the known pedigree, and values for $e$ were from $N(0, \mathbf{I}\sigma^2_e)$ with $\sigma^2_e = 50$. On the basis of this simulation model, we estimated positions of multiple QTL for 40 replicates.
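The two variance expressions for the simulated QTL can be inverted to obtain the allele effects for a given allele frequency. The sketch below is one way to do this under the assumption that the positive roots are taken (the chosen allele is favorable); the function names `allele_effects` and `variances` are hypothetical and the code is not taken from the article.

```python
import math

def allele_effects(p, var_q, var_d):
    """Solve sigma2_d = [2p(1-p)d]^2 and sigma2_q = 2p(1-p)[q + d(1-2p)]^2 for (q, d)."""
    het = 2.0 * p * (1.0 - p)
    d = math.sqrt(var_d) / het                       # positive root assumed
    q = math.sqrt(var_q / het) - d * (1.0 - 2.0 * p)  # positive root assumed
    return q, d

def variances(p, q, d):
    """Forward formulas: additive and dominance variance for effects q and d."""
    het = 2.0 * p * (1.0 - p)
    return het * (q + d * (1.0 - 2.0 * p)) ** 2, (het * d) ** 2

# First QTL in the simulation (sigma2_q = 10, sigma2_d = 5) at an example frequency.
q1, d1 = allele_effects(0.3, 10.0, 5.0)
check_q, check_d = variances(0.3, q1, d1)
```

Substituting the solved effects back into the forward formulas should return the target variances, which is a convenient check on the algebra.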

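Mixed linear model: A vector of phenotypic observations is written as a linear function of fixed effects, a polygenic term representing the sum of other unidentified additive genetic effects, the additive and nonadditive effects due to $n$ QTL and residuals. It is assumed that there is no epistatic interaction between QTL. The model can be written as

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{Z}\mathbf{u} + \sum_{i=1}^{n} (\mathbf{Z}\mathbf{q}_i + \mathbf{Z}\mathbf{d}_i) + \mathbf{e}, \tag{1}$$

where $\mathbf{y}$ is a vector of $N_r$ observations on the trait of interest, $\mathbf{b}$ is a vector of fixed effects, $\mathbf{u}$ is a vector of $N_u$ random polygenic effects for each animal, $\mathbf{q}_i$ and $\mathbf{d}_i$ are vectors of $N_q$ additive and nonadditive random effects due to the $i$th putative QTL, and $\mathbf{e}$ are residuals. Note that usually $N_u = N_q$ is the number of animals in the pedigree. The random effects ($\mathbf{u}$, $\mathbf{q}_i$, $\mathbf{d}_i$, and $\mathbf{e}$) are assumed to be normally distributed with mean zero and variance $\mathbf{A}\sigma^2_u$, $\mathbf{G}_i\sigma^2_{q_i}$, $\mathbf{D}_i\sigma^2_{d_i}$, and $\mathbf{I}\sigma^2_e$, where $\mathbf{A}$ is a numerator relationship matrix, $\mathbf{G}_i$ and $\mathbf{D}_i$ are an additive and nonadditive genotype relationship matrix at the $i$th putative QTL position, and $\mathbf{I}$ is an $N_r$-order identity matrix. $\mathbf{X}$ and $\mathbf{Z}$ are incidence matrices for the effects $\mathbf{b}$ and $\mathbf{u}$, $\mathbf{q}_i$ and $\mathbf{d}_i$, respectively. The associated variance–covariance matrix ($\mathbf{V}$) of all observations given pedigree and marker genotypes is modeled as

$$\mathbf{V} = \mathbf{Z}\mathbf{A}\mathbf{Z}'\sigma^2_u + \sum_{i=1}^{n} (\mathbf{Z}\mathbf{G}_i\mathbf{Z}'\sigma^2_{q_i} + \mathbf{Z}\mathbf{D}_i\mathbf{Z}'\sigma^2_{d_i}) + \mathbf{I}\sigma^2_e. \tag{2}$$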
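To make the structure of the variance–covariance matrix concrete, the following sketch assembles it for the special case in which every animal has exactly one record, so that the incidence matrix is the identity. That simplification, the function name `build_v`, and the toy relationship values are assumptions for illustration only.

```python
def build_v(A, G_list, D_list, var_u, var_q, var_d, var_e):
    """Assemble V = A*var_u + sum_i (G_i*var_q[i] + D_i*var_d[i]) + I*var_e, assuming Z = I."""
    n = len(A)
    V = [[var_u * A[r][c] for c in range(n)] for r in range(n)]   # polygenic part
    for G, D, vq, vd in zip(G_list, D_list, var_q, var_d):        # one term per QTL
        for r in range(n):
            for c in range(n):
                V[r][c] += vq * G[r][c] + vd * D[r][c]
    for r in range(n):
        V[r][r] += var_e                                          # residual part
    return V

# Toy example: two animals unrelated through the pedigree, one putative QTL.
A = [[1.0, 0.0], [0.0, 1.0]]
G1 = [[1.0, 0.5], [0.5, 1.0]]
D1 = [[1.0, 0.25], [0.25, 1.0]]
V = build_v(A, [G1], [D1], var_u=20.0, var_q=[10.0], var_d=[5.0], var_e=50.0)
```

In this toy case the off-diagonal covariance comes entirely from the QTL terms, which is the kind of signal that LD-based IBD between pedigree-unrelated animals contributes.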

The LDL-based IBD distribution and covariance structure among chromosome segments or haplotypes are accommodated in the matrices $\mathbf{G}$ and $\mathbf{D}$ used in a variance component approach that treats QTL as random effects. Due to its generality and robustness, the variance component approach has been widely used in mapping studies (George et al. 2000; Yi and Xu 2000; Lund et al. 2003; Perez-Enciso 2003). For $\mathbf{G}$ and $\mathbf{D}$, a sampler combining the random walk approach (Sobel and Lange 1996) and the meiosis Gibbs sampler (Thompson and Heath 1999) is used, which is robust and efficient especially for a complex pedigree, many markers, and missing genotypes (Lee et al. 2005). These $\mathbf{G}$ and $\mathbf{D}$ are then incorporated as known quantities into the QTL model selection in a two-step approach (George et al. 2000). The QTL model is defined by the number of QTL and their positions, which are sampled from a proposal distribution. For a given QTL model, residual maximum-likelihood (REML) estimates for the model parameters are obtained, which is an empirical Bayesian approach (Casella 2001). The proposed variables and model

MCMC approach estimating IBD probabilities $\mathbf{G}$ and $\mathbf{D}$ given marker data: IBD coefficients between individuals can be estimated according to the pattern of inheritance states ($S$). The probability of one configuration of $S$ given observed marker data is

$$\mathrm{pr}(S \mid G) = \frac{\mathrm{pr}(G \mid S)\,\mathrm{pr}(S)}{\sum_{S} \mathrm{pr}(G \mid S)\,\mathrm{pr}(S)}, \tag{3}$$

where $G$ represents the observed marker data, $\mathrm{pr}(S)$ is the prior probability of the segregation state, $\mathrm{pr}(G \mid S)$ is the probability of the observed marker data given $S$, and the denominator is summed over the probabilities of all possible configurations of $S$. Since the computation of the denominator is infeasible in a general pedigree with many markers, an MCMC approach is used to sample configurations of $S$ according to the posterior distribution.

Sampling schemes for segregation states: In a MCMC method, updated variables for segregation states are proposed on the basis of an approximate distribution and acceptance for the updated variables is determined on the basis of the Metropolis–Hastings algorithm (Metropolis et al. 1953; Hastings 1970), which gives the correct equilibrium distribution of segregation states. In a Gibbs sampler (a special case of MCMC), updated variables are always accepted because they are sampled on the basis of the correct distribution.

In the MCMC process used in this study, the meiosis Gibbs sampler (Thompson and Heath 1999) is first applied to all loci for every individual. During the meiosis sampler, potential reducible sites can be found where transition probabilities from a current state to other states are zero. After one cycle of the meiosis sampler, the random-walk approach (Sobel and Lange 1996) is carried out for proposing segregation states where the size and direction are randomly determined. If proposed variables include any potentially reducible sites detected in the meiosis sampler, proposal variables are accepted as new variables with a Metropolis acceptance probability (4). This combined sampler is computationally efficient and has a better mixing property:

$$\alpha_{\mathrm{current},\mathrm{new}} = \min\left\{1, \frac{\mathrm{pr}(S_{\mathrm{new}} \mid G)}{\mathrm{pr}(S_{\mathrm{current}} \mid G)}\right\} = \min\left\{1, \frac{\mathrm{pr}(G \mid S_{\mathrm{new}})}{\mathrm{pr}(G \mid S_{\mathrm{current}})}\right\} \tag{4}$$

(for more details, see Lee et al. 2005).
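A Metropolis acceptance step of the form in Equation 4 is usually coded on the log scale to avoid numerical underflow when likelihoods are tiny. The sketch below shows only that generic accept/reject decision; the function name `metropolis_accept` and the use of log likelihoods as inputs are assumptions, and the computation of the marker likelihood given a segregation state is not shown.

```python
import math
import random

def metropolis_accept(loglik_new, loglik_current, rng):
    """Accept with probability min(1, exp(loglik_new - loglik_current))."""
    log_ratio = loglik_new - loglik_current
    if log_ratio >= 0.0:
        return True                     # proposals that are at least as likely are always kept
    return rng.random() < math.exp(log_ratio)

rng = random.Random(0)
better = metropolis_accept(-10.0, -12.0, rng)    # ratio > 1, always accepted
hopeless = metropolis_accept(-1000.0, 0.0, rng)  # ratio underflows to 0, never accepted
```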

Haplotype reconstruction: Since LD-based IBD probabilities are derived from haplotype similarity between unrelated base animals, ordered genotypes for base animals are required to reconstruct haplotypes. The ordered genotypes can be sampled on the basis of the distribution of compatible allele assignments to founder genes that are consistent with the sampled segregation states (Sobel and Lange 1996). When this procedure is implemented for multiple marker loci, haplotypes for base animals are established. This procedure is performed in each sampling round.

QTL allelic IBD coefficients based on LD and linkage: Given the sampled segregation state and haplotypes, it is possible to estimate QTL allelic IBD coefficients on the basis of LD and linkage. Sampled haplotypes for unrelated founders in the pedigree are used to estimate LD-based IBD probabilities between them, using an approximated coalescence method (Meuwissen and Goddard 2001). This method assumes that the past effective size and the number of generations since the base population are known. Sampled segregation states at multiple marker loci for descendants are used to estimate linkage-based IBD probabilities between haplotypes of relatives. Therefore, IBD probabilities between all haplotypes can be estimated on the basis of joint information from LD and linkage. In this study, the IBD probabilities are estimated at the middle point of each marker bracket. There are four IBD probabilities between any pair of individuals $i$ and $j$ in the pedigree for a given putative QTL position (Liu et al. 2002). They are $\mathrm{pr}(Q_i^P \equiv Q_j^P \mid G)$, $\mathrm{pr}(Q_i^P \equiv Q_j^M \mid G)$, $\mathrm{pr}(Q_i^M \equiv Q_j^P \mid G)$, and $\mathrm{pr}(Q_i^M \equiv Q_j^M \mid G)$, the probability of the paternal (P) or maternal (M) QTL allele of individual $i$ being IBD to the paternal (P) or maternal (M) allele of individual $j$, given marker genotypes.

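From the probabilities, the additive genotype coefficient between animals $i$ and $j$ at the QTL is

$$G_{ij} = \tfrac{1}{2}\left[\mathrm{pr}(Q_i^P \equiv Q_j^P \mid G) + \mathrm{pr}(Q_i^P \equiv Q_j^M \mid G) + \mathrm{pr}(Q_i^M \equiv Q_j^P \mid G) + \mathrm{pr}(Q_i^M \equiv Q_j^M \mid G)\right]$$

and the dominance relationship coefficient between animals $i$ and $j$ at the QTL is

$$D_{ij} = \mathrm{pr}(Q_i^P \equiv Q_j^P \mid G)\,\mathrm{pr}(Q_i^M \equiv Q_j^M \mid G) + \mathrm{pr}(Q_i^P \equiv Q_j^M \mid G)\,\mathrm{pr}(Q_i^M \equiv Q_j^P \mid G).$$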
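The two coefficients are simple functions of the four allelic IBD probabilities, as the following sketch shows. The function name `relationship_coefficients` and the argument names are hypothetical; the two example calls use textbook values (a non-inbred animal with itself, and a sib pair sharing each parental allele with probability 0.5) chosen here only for illustration.

```python
def relationship_coefficients(pp, pm, mp, mm):
    """Return (G_ij, D_ij) from the four allelic IBD probabilities.

    pp: paternal allele of i IBD to paternal allele of j
    pm: paternal allele of i IBD to maternal allele of j
    mp: maternal allele of i IBD to paternal allele of j
    mm: maternal allele of i IBD to maternal allele of j
    """
    g_ij = 0.5 * (pp + pm + mp + mm)   # additive genotype coefficient
    d_ij = pp * mm + pm * mp           # dominance relationship coefficient
    return g_ij, d_ij

self_pair = relationship_coefficients(1.0, 0.0, 0.0, 1.0)   # animal with itself, not inbred
sib_pair = relationship_coefficients(0.5, 0.0, 0.0, 0.5)    # expected values for full sibs
```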

The brief summary of the sampling procedure to estimate G and D:

Do 1, N cycles
    Sample segregation states using the combined MCMC sampler
    Sample haplotypes for unrelated founders
    IBD estimation based on sampled haplotypes and segregation states
    Construct G based on IBD coefficients
    Construct D based on IBD coefficients
End do
Average G and D over N cycles

We used 100 cycles for the sampling. In each cycle, the meiosis Gibbs sampler was applied to all meioses, followed by thousands of random-walk samples.
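The outer loop of the summary above amounts to averaging the matrices built in each cycle. A minimal sketch follows, assuming a user-supplied callable `sample_cycle` (hypothetical) that performs the sampling and IBD steps of one cycle and returns the two matrices as nested lists; the samplers themselves are not implemented here.

```python
def average_relationships(sample_cycle, n_cycles):
    """Average the (G, D) matrices returned by sample_cycle() over n_cycles >= 1 cycles."""
    g_sum, d_sum = None, None
    for _ in range(n_cycles):
        g, d = sample_cycle()                       # one cycle: sample states, haplotypes, IBD
        if g_sum is None:
            size = len(g)
            g_sum = [[0.0] * size for _ in range(size)]
            d_sum = [[0.0] * size for _ in range(size)]
        for r in range(size):
            for c in range(size):
                g_sum[r][c] += g[r][c]
                d_sum[r][c] += d[r][c]
    g_mean = [[x / n_cycles for x in row] for row in g_sum]
    d_mean = [[x / n_cycles for x in row] for row in d_sum]
    return g_mean, d_mean

# Stand-in for the real sampler: two pre-computed cycles for a pair of animals.
samples = iter([
    ([[1.0, 0.4], [0.4, 1.0]], [[1.0, 0.1], [0.1, 1.0]]),
    ([[1.0, 0.6], [0.6, 1.0]], [[1.0, 0.3], [0.3, 1.0]]),
])
G_bar, D_bar = average_relationships(lambda: next(samples), 2)
```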

Reversible jump MCMC for simultaneous mapping of multiple QTL: The number of QTL ($n$), the position of each QTL ($r_i$, $i = 1 \ldots n$), and the model parameters ($\Theta = \{\sigma^2_{q_1} \ldots \sigma^2_{q_n}, \sigma^2_{d_1} \ldots \sigma^2_{d_n}, \sigma^2_u, \sigma^2_e\}$) are to be estimated for the model (1). Note that $n$ ranges from 0 to the number of marker brackets as only the middle point of each marker bracket is investigated. The probability of estimated parameters given observed phenotypes is

$$\mathrm{pr}(n, r, \Theta \mid \mathbf{y}) = \frac{\mathrm{pr}(\mathbf{y} \mid n, r, \Theta)\,\mathrm{pr}(n, r, \Theta)}{\sum \mathrm{pr}(\mathbf{y} \mid n, r, \Theta)\,\mathrm{pr}(n, r, \Theta)}, \tag{5}$$

where $\mathrm{pr}(\mathbf{y} \mid n, r, \Theta)$ is the likelihood of the observed phenotypes given the estimated parameters, $\mathrm{pr}(n, r, \Theta)$ is the joint prior probability of the estimated parameters, and the denominator is summed over the probabilities of all possible parameter states. If the computation of the denominator is infeasible, a MCMC approach can be an efficient tool for this problem.

When varying the number of QTL, the model dimension (the number of parameters in the model) is changed. A Metropolis–Hastings sampler cannot infer the correct distribution unless the model dimension is fixed. However, a reversible jump MCMC (Green 1995) allows the Markov chain

For the move of the Markov chain within a fixed model dimension (the number of QTL is unchanged), each QTL position in the current model is subsequently updated with a Metropolis mechanism as in Sillanpaa and Arjas (1998) and Yi and Xu (2000). For the $i$th QTL in the model, $r_i^*$ is proposed from a uniform distribution over unoccupied marker brackets. For a given QTL model (i.e., a proposal for $r_i^*$ and $n$), REML estimates for the model parameters ($\Theta^*$) are explicitly determined using an average information (AI) algorithm (see Lee and Van der Werf 2006). Note that with the REML estimates, individual level parameters (1) are automatically determined. This procedure is different from conventional MCMC where the model parameters are proposed with the proposal distribution. Assuming that the priors of $r^*$ have a noninformative flat distribution, the proposal is accepted with probability

$$\alpha_{r,r^*} = \min\left\{1, \frac{\mathrm{pr}(\mathbf{y} \mid n, r^*, \Theta^*)}{\mathrm{pr}(\mathbf{y} \mid n, r, \Theta)}\right\}. \tag{6}$$
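The proposal part of this within-dimension move can be sketched as follows. Marker brackets are indexed 0 to `n_brackets - 1`, the function name `propose_position` is hypothetical, and the bracket currently held by the QTL being updated is treated as occupied, which is one reading of "unoccupied marker brackets". The REML fit and the likelihood ratio of Equation 6 are not shown.

```python
import random

def propose_position(positions, i, n_brackets, rng):
    """Propose a new bracket for QTL i, drawn uniformly from the unoccupied brackets."""
    occupied = set(positions)
    free = [b for b in range(n_brackets) if b not in occupied]
    proposal = list(positions)          # copy so the current model is left untouched
    proposal[i] = rng.choice(free)      # raises IndexError if every bracket is occupied
    return proposal

rng = random.Random(7)
current = [4, 15, 31]                   # bracket indices of three QTL in the current model
proposal = propose_position(current, 1, 39, rng)
```

The proposed model would then be fitted by REML and accepted or rejected by comparing its likelihood with that of the current model.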

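For the move of the Markov chain across different model dimensions (the number of QTL is changed), a new QTL is added ($n + 1$) or deleted ($n - 1$) with a proposal probability. A new QTL with position ($r_{n+1}$) is uniformly sampled from all unoccupied putative positions across the region. The proposal is accepted with probability

$$\alpha_{n,n+1} = \min\left\{1, \frac{\mathrm{pr}(\mathbf{y} \mid n+1, r^*, \Theta^*)}{\mathrm{pr}(\mathbf{y} \mid n, r, \Theta)} \times \frac{\mathrm{pr}(n+1)\,\mathrm{pr}(n \mid n+1)}{\mathrm{pr}(n)\,\mathrm{pr}(n+1 \mid n)} \times J\right\}. \tag{7}$$

A QTL is deleted from the model by randomly choosing among the QTL. The proposal is accepted with probability

$$\alpha_{n,n-1} = \min\left\{1, \frac{\mathrm{pr}(\mathbf{y} \mid n-1, r^*, \Theta^*)}{\mathrm{pr}(\mathbf{y} \mid n, r, \Theta)} \times \frac{\mathrm{pr}(n-1)\,\mathrm{pr}(n \mid n-1)}{\mathrm{pr}(n)\,\mathrm{pr}(n-1 \mid n)} \times J\right\}. \tag{8}$$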

In (7) and (8), the first term in the right-hand side is the likelihood ratio of proposal parameters over current parameters, and the second term consists of the prior ratio and the proposal ratio. The prior of the number of QTL [e.g., $\mathrm{pr}(n)$] has a Poisson distribution with mean $\mu_n = 1$, assuming that the tested region has a single QTL, $\mathrm{pr}(n^* \mid n)$ is a proposal probability of changing the number of QTL in the model from $n$ to $n^*$, and $J$ is the Jacobian of the transformation function probability from the current model to the other. Because adding or deleting a QTL in the method is the identity transformation, $J = 1$ (Sillanpaa and Arjas 1998; Yi and Xu 2000; Jannink and Fernando 2004). The acceptance ratio is formally derived in the APPENDIX and empirically checked (see DISCUSSION).
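The acceptance probabilities for adding or deleting a QTL can be evaluated on the log scale as in the sketch below. It follows the three factors named in the text (likelihood ratio, Poisson prior ratio times proposal ratio, and Jacobian), but the function names, argument names, and the example proposal probabilities of 0.5 are assumptions rather than values given in the article.

```python
import math

def log_poisson_pmf(n, mean):
    """Log of the Poisson probability of n QTL for a prior with the given mean."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)

def dimension_move_acceptance(loglik_new, loglik_cur, n_new, n_cur,
                              prop_reverse, prop_forward,
                              prior_mean=1.0, log_jacobian=0.0):
    """min(1, likelihood ratio x prior ratio x proposal ratio x J), computed via logs."""
    log_ratio = (loglik_new - loglik_cur
                 + log_poisson_pmf(n_new, prior_mean) - log_poisson_pmf(n_cur, prior_mean)
                 + math.log(prop_reverse) - math.log(prop_forward)
                 + log_jacobian)
    return math.exp(min(0.0, log_ratio))

# Equal likelihoods and symmetric proposals isolate the effect of the prior on n.
add_prob = dimension_move_acceptance(-50.0, -50.0, n_new=2, n_cur=1,
                                     prop_reverse=0.5, prop_forward=0.5)
delete_prob = dimension_move_acceptance(-50.0, -50.0, n_new=1, n_cur=2,
                                        prop_reverse=0.5, prop_forward=0.5)
```

With a prior mean of 1, moving from one to two QTL without any gain in likelihood is accepted half of the time, while the reverse move is always accepted, so extra QTL need likelihood support to persist.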

RESULTS

Single- and multiple-QTL analysis: Likelihood ratio ($\mathrm{LR} = 2[\log L_{\mathrm{QTL}} - \log L_{\mathrm{no\ QTL}}]$) from a single-QTL analysis and Bayesian posterior QTL intensity from a reversible jump MCMC (as a multiple-QTL analysis) at each putative QTL position averaged over replicates are illustrated in Figure 1. The values of the posterior QTL intensities from the multiple-QTL analysis are clearly highest at the true QTL positions, compared to the neighboring regions. The three QTL are clearly distinguishable and mapped at the correct positions. However, the likelihood-ratio (LR) values from the single-QTL analysis are constantly high across the first and second QTL region. The LR profile across the region between the second and third QTL is also less distinctive, compared to the profile from the multiple-QTL analysis. This shows that the reversible jump MCMC can accurately map the three QTL in such a small region, whereas accurate fine mapping is not feasible by using a single-QTL method.

Multiple-QTL analysis with or without $\mathbf{D}$: Figure 2 compares the posterior QTL intensity from the multiple-QTL analysis using $\mathbf{D}$ (Figure 2A) and that without using $\mathbf{D}$ (Figure 2B). The QTL intensity on the true QTL is higher with $\mathbf{D}$ (0.19, 0.16, and 0.14) than on those without $\mathbf{D}$ (0.13, 0.09, and 0.11 for the first, second, and third QTL). The curve is more clearly peaked at the correct QTL position when using $\mathbf{D}$, indicating that using $\mathbf{D}$ helps map QTL more accurately in the case of the dominance mode.

LDL vs. linkage information only: The LR and the posterior QTL intensity based on LDL information are compared with those based on linkage information only in Figure 3. The LR curve is much lower and flatter when using linkage information alone, compared to using LDL information (Figure 3A). The curve of the QTL intensity in the multiple-QTL analysis also shows that linkage information alone does not catch any QTL signal while LDL information gives substantial evidence for the true QTL positions (Figure 3B). It should be noted that the LR and QTL intensity are from analyses without using $\mathbf{D}$ for a fair comparison because there is little or no information about dominance relationships for linkage information alone. If $\mathbf{D}$ is used, the advantage of LDL information increases over linkage information only (see Figure 2A).

Figure 1. Likelihood ratio ($\mathrm{LR} = 2[\log L_{\mathrm{QTL}} - \log L_{\mathrm{no\ QTL}}]$)

Effect of marker density: The profiles of the QTL intensity from multiple-QTL analysis with a marker spacing of 0.25 and 1 cM are compared for a region of 10 cM in Figure 4. Note that the intensity value is estimated at the middle of each marker bracket. When using a marker spacing of 1 cM, the QTL intensity values are constantly high across the region between the first and the second QTL. This makes it impossible to simultaneously map all the QTL although the region can be roughly detected. This is different from when using a marker spacing of 0.25 cM where the intensity value is highly peaked at each true QTL position. With a marker spacing of 1 cM, although the QTL intensity increases at the third QTL, it is not clearly peaked compared to a marker spacing of 0.25 cM.

Past effective size in relation to levels of LD: Levels of LD vary with different values of past effective sizes. Figure 5 shows the QTL intensities averaged over replicates when effective sizes of 10, 100, 400, and 2000 were simulated for 10, 100, 400, and 2000 generations, respectively (the number of generations was equal to the effective size so that a similar population equilibrium was achieved). The intensity values are lower and flatter with the extreme values of effective size (e.g., $N_e = 10$ or 2000), compared to those with the intermediate values of effective size (e.g., $N_e = 100$ or 400). The expected levels of LD of chromosome segments given the past effective size are $E(\mathrm{LD}) = 1/(4N_e c + 1)$ (Sved 1971) where $N_e$ is the past effective size and $c$ is the recombination rate of the chromosome segment. The levels of LD can be defined here as the probability of the chromosome segment being IBD when two random haplotypes are taken from the population (Hayes et al. 2003). For the marker interval used in the simulated data, the expected levels of LD are 0.9, 0.5, 0.2, and 0.05 for $N_e = 10$, 100, 400, and 2000, respectively. This shows that $N_e = 10$ or 2000 results in extreme values of LD (0.9 or 0.05) and the mapping accuracy decreases.
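The expected LD values quoted above can be reproduced directly from the Sved (1971) expression. The sketch assumes that the 0.25-cM marker interval corresponds to a recombination rate of c = 0.0025 (1 cM taken as 1% recombination); the function name `expected_ld` is hypothetical.

```python
def expected_ld(effective_size, recombination_rate):
    """Expected LD of a segment: E(LD) = 1 / (4 * Ne * c + 1)."""
    return 1.0 / (4.0 * effective_size * recombination_rate + 1.0)

C_INTERVAL = 0.0025   # assumed recombination rate for a 0.25-cM marker interval
ld_by_size = {ne: expected_ld(ne, C_INTERVAL) for ne in (10, 100, 400, 2000)}
for ne, ld in ld_by_size.items():
    print(f"Ne = {ne:4d}: E(LD) = {ld:.3f}")
```

Under this assumption the four values round to 0.9, 0.5, 0.2, and 0.05, in line with the figures given in the text.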

Estimation of the number of QTL and variance components: Figure 6 shows the histogram of estimated QTL number from 40 replicates. Each estimated value is the average of sampled values of all RJMCMC rounds in each replicate. The distribution coincides with the true value ($n = 3$) with a small standard error. Estimated values of variance components are also shown to be accurate (Figure 7). The distribution of estimated polygenic and residual variances from 40 replicates shows the highest density at the true value ($\sigma^2_u = 20$ and $\sigma^2_e = 50$) (Figure 7, A and B). Although the distribution of estimated additive and dominance QTL variance is upwardly skewed, the mode of the distribution coincides with the true value ($\sigma^2_{q_1} + \sigma^2_{q_2} + \sigma^2_{q_3} = 24$, and $\sigma^2_{d_1} + \sigma^2_{d_2} + \sigma^2_{d_3} = 12$) (Figure 7, C and D).

Figure 2. Posterior QTL intensity from multiple-QTL analysis with fitting dominance (A) and without fitting dominance (B) averaged over 40 replicates. The vertical bars indicate empirical standard error. Triangles show the true QTL positions.

Figure 3. Likelihood ratio (LR) and posterior QTL intensity from single- (A) and multiple- (B) QTL analyses based on combined LD and linkage (shaded line) and linkage only (solid thin line). Dominance is not considered for a fair comparison. The vertical bars indicate empirical standard error. Triangles show the true QTL positions.

Figure 4. Posterior QTL intensity from multiple-QTL

DISCUSSION

In this study, we investigated the performance of simultaneous fine mapping of closely linked QTL that each had different and relatively small effects (the heritabilities of the first, second, and third QTL were $h^2_{q_1} = 0.094$, $h^2_{q_2} = 0.076$, and $h^2_{q_3} = 0.057$, and the ratios of dominance variance over phenotypic variance were $h^2_{d_1} = 0.047$, $h^2_{d_2} = 0.038$, and $h^2_{d_3} = 0.028$). A multiple-QTL analysis using a reversible jump MCMC could simultaneously position every QTL within a reasonably fine region with 200 genotyped and phenotyped individuals. However, a single-QTL analysis could not remove shade effects between closely linked QTL. The multiple-QTL analysis with dominance relationship matrices improved the mapping accuracy and resolution.
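The variance ratios quoted above follow from the simulated variance components if the phenotypic variance is taken as the plain sum of all components, which is an assumption of this sketch (it ignores any covariance between components). The variable names are illustrative.

```python
# Simulated variance components from MATERIALS AND METHODS.
var_q = [10.0, 8.0, 6.0]     # additive QTL variances
var_d = [5.0, 4.0, 3.0]      # dominance QTL variances
var_u, var_e = 20.0, 50.0    # polygenic and residual variances

# Assumption: phenotypic variance is the sum of all components.
var_p = sum(var_q) + sum(var_d) + var_u + var_e

h2_q = [v / var_p for v in var_q]   # QTL heritabilities
h2_d = [v / var_p for v in var_d]   # dominance variance over phenotypic variance

for i, (hq, hd) in enumerate(zip(h2_q, h2_d), start=1):
    print(f"QTL {i}: h2_q = {hq:.3f}, h2_d = {hd:.3f}")
```

The computed ratios agree with the reported values to within about 0.001.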

The use of linkage information alone gave a poorer mapping resolution compared to using LDL information. This was because such a small region (10 cM) is not likely to have sufficient recombination from the pedigree of two generations, explaining lower and flatter LR and QTL intensity values from the analyses based on only linkage information. This result agreed with that of Lee and Van der Werf (2005).

Figure 5. Posterior QTL intensity from multiple-QTL analysis averaged over 20 replicates with effective sizes of 10 (A), 100 (B), 400 (C), and 2000 (D).

Figure 6. Histogram of estimated QTL number for 40 replicates. The true value is 3.

Figure 7. Histogram of estimated variance components

When a marker spacing is not dense enough, IBD information from variation in small segments will not contribute to the analyses. This is probably why the second QTL was not clearly distinguishable with a marker spacing of 1 cM while the signal at all the QTL was clearly shown with a marker spacing of 0.25 cM. Optimal marker spacing might depend on the LD in the population. With very small past effective size ($N_e = 10$) resulting in $E(\mathrm{LD}) = 0.9$ between flanking markers, the mapping resolution was decreased. This was expected because 400 haplotypes used for mapping QTL were descended from only a few founder haplotypes. This made haplotype homozygosity very high, which is not useful for mapping. With very large past effective size (e.g., $N_e = 2000$), resulting in $E(\mathrm{LD}) = 0.05$, the mapping resolution was also decreased. This was because there were too few common haplotypes with a marker interval of 0.25 cM. If marker density increases, the number of common haplotypes can be increased (i.e., high level of LD), which will give more useful information for mapping of QTL.

In our method, model parameters ($\Theta$) were not sampled from their proposal distributions but the most likely parameter values were determined by REML for a given model defined by the number ($n$) and location ($r$) of QTL. Using estimates rather than sampled values for some parameters is known as an empirical Bayesian approach (see, e.g., Casella 2001). This differs from the full Bayesian approach where in an MCMC algorithm all model parameters are sampled conditional on data and other parameters. Hence, the posterior distributions for the model parameters might differ somewhat from those of a full Bayesian approach.

The empirical Bayesian approach likely has an efficiency advantage because for sampling values for $n$ and $r$ no time is wasted through staying with less likely values of $\Theta$ and estimates converge more quickly compared to the full Bayesian approach. It is unlikely that much information is lost in this empirical Bayesian approach because parameters in $\Theta$ have smooth distributions and it is not likely that less likely values have critical information. Casella (2001) discussed the empirical Bayesian procedure where in an iterative procedure maximum-likelihood estimates (MLEs) were obtained for hyperparameters in a hierarchical model and other parameters were sampled conditional on these MLEs. He showed that this procedure can be statistically justified by showing that it implies an EM algorithm. In our approach, REML estimates for $\Theta$ and the likelihood of the data are explicitly estimated for each model and used in the RJMCMC to get the posterior QTL density across model dimensions. The justification for our procedure is given in the APPENDIX, showing that the acceptance ratios for $n$ and $r$ do not depend on conditional distributions for $\Theta$, and hence acceptance ratios used are not different from that in a full Bayesian approach.

We checked the validity of this approach empirically through an analysis without phenotypic data (500,000 rounds), following the suggestion by Jannink and Fernando (2004) and Sillanpaa et al. (2004). In this case, $\mathrm{pr}(\mathbf{y} \mid n, r, \Theta) = 1$ for all models; therefore, the posterior distribution of the QTL number and their positions should be the same as the prior distribution. Note that the prior for the QTL position was a uniform distribution (see Equation 6) and that for the QTL number was a Poisson distribution with mean $\mu_n = 1$ (see Equation 7). After 50,000 RJMCMC rounds, the posterior of the QTL position was a uniform distribution, and the posterior of the QTL number was a Poisson distribution with $\mu_n = 1.006$. This implies that the acceptance ratio used in the RJMCMC was correct.
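The logic of this prior-only check can be mimicked with a toy chain that tracks only the number of QTL. The sketch below is a simplified stand-in and not the authors' analysis: it sets the likelihood ratio to 1, proposes an addition or a deletion with equal probability 0.5 (an assumption), ignores QTL positions, and caps the number of QTL at 39 marker brackets. The function name `prior_only_chain` is hypothetical.

```python
import random

def prior_only_chain(n_rounds, prior_mean=1.0, n_max=39, seed=0):
    """Run add/delete moves with likelihood ratio 1 and return the mean number of QTL."""
    rng = random.Random(seed)
    n = 0
    total = 0
    for _ in range(n_rounds):
        if rng.random() < 0.5:
            # Addition: Poisson prior ratio pr(n+1)/pr(n) = prior_mean / (n + 1).
            if n < n_max and rng.random() < min(1.0, prior_mean / (n + 1)):
                n += 1
        else:
            # Deletion: Poisson prior ratio pr(n-1)/pr(n) = n / prior_mean.
            if n > 0 and rng.random() < min(1.0, n / prior_mean):
                n -= 1
        total += n
    return total / n_rounds

mean_n = prior_only_chain(200_000)
print(f"mean number of QTL without data: {mean_n:.3f}")
```

If the acceptance rule is coded correctly, the average number of QTL visited by the chain should be close to the prior mean of 1; a clear departure would point to an error in the prior or proposal ratio.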

Although, for estimating the number of QTL ($n$), we used a prior of Poisson distribution with mean $\mu_n = 1$, the posterior QTL intensities were substantially higher at all three QTL, and the estimated QTL number was unbiased and accurate (Figure 6). This agreed with previous studies (Yi et al. 2003; Jannink and Fernando 2004; Sillanpaa et al. 2004) that estimates of QTL number were robust against prior values.

The distribution of the estimated variance components from 40 replicates was upwardly skewed and wide especially for additive QTL variance although the mode of the distribution coincided with the true values. This is probably due to the fact that 200 records used in the analyses are not sufficient to (very) accurately estimate variance components. However, the estimated number of QTL and their positions were estimated very accurately with 200 records.

In analyses with linkage information alone, with lower marker density (1-cM intervals), and with alternative past effective sizes, we used true haplotypes instead of sampled haplotypes in the MCMC, as the latter is very time consuming. The results with true and sampled haplotypes were very similar. This agreed with Morris et al. (2004) and Lee and Van der Werf (2005) when using complete genotypes.

Interaction between QTL (epistasis) may be important in multiple-QTL mapping but has been ignored in this study. Epistasis can be considered by working out the appropriate Hadamard products of haplotype (gamete) relationship matrices whose levels are two times the number of individuals. Further investigation on fine mapping of epistasis QTL would be desirable.

We are grateful for the comments from an anonymous reviewer. This study was supported by Australian Wool Innovation and Sygen.

LITERATURE CITED

Akaike, H., 1969 Fitting autoregressive models for prediction. Ann. Inst. Stat. Math. 21: 243–247.

Andersson, L., C. S. Haley, H. Ellegren, S. A. Knott, M. Johansson


Basten, C. J., B. S. Weir and Z-B. Zeng, 2000 QTL Cartographer, Version 1.14. North Carolina State University, Raleigh, NC.

Broman, K. W., and T. P. Speed, 2002 A model selection approach for identification of quantitative trait loci in experimental crosses. J. R. Stat. Soc. B 64: 641–656.

Calborg, O., L. Andersson and B. P. Kinghorn, 2000 The use of a genetic algorithm for simultaneous mapping of multiple interacting quantitative trait loci. Genetics 155: 2003–2010.

Casella, G., 2001 Empirical Bayes Gibbs sampling. Biostatistics 2: 485–500.

Dallas, J. F., 1992 Estimation of microsatellite mutation rates in recombinant inbred strains of mouse. Mamm. Genome 3: 452–456.

Ellegren, H., 1995 Mutation rates at porcine microsatellite loci. Mamm. Genome 6: 376–377.

Farnir, F., B. Grisart, W. Coppieters, J. Riquet, P. Berzi et al., 2002 Simultaneous mining of linkage and linkage disequilibrium to fine map quantitative trait loci in outbred half-sib pedigrees: revisiting the location of a quantitative trait locus with major effect on milk production on bovine chromosome 14. Genetics 161: 275–287.

George, A. W., P. M. Visscher and C. S. Haley, 2000 Mapping quantitative trait loci in complex pedigrees: a two-step variance component approach. Genetics 156: 2081–2092.

Georges, M., D. Nielsen, M. Mackinnon, A. Mishra, R. Okimoto et al., 1995 Mapping quantitative trait loci controlling milk production in dairy cattle by exploiting progeny testing. Genetics 139: 907–920.

Green, P., 1995 Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732.

Grisart, B., W. Coppieters, F. Farnir, L. Karim, L. Ford et al., 2002 Positional candidate cloning of a QTL in dairy cattle: identification of a missense mutation in the bovine dgat1 gene with major effect on milk yield and composition. Genome Res. 12: 222–231.

Haley, C. S., and S. A. Knott, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.

Hastings, W. K., 1970 Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97–109.

Hayes, B. J., P. M. Visscher, H. C. McPartlan and M. E. Goddard, 2003 Novel multilocus measure of linkage disequilibrium to estimate past effective population size. Genome Res. 13: 635–643.

Heath, S. C., 1997 Markov chain Monte Carlo segregation and linkage analysis for oligogenic models. Am. J. Hum. Genet. 61: 748–760.

Jannink, J.-L., and R. L. Fernando, 2004 On the Metropolis–Hastings acceptance probability to add or drop a quantitative trait locus in Markov chain Monte Carlo-based Bayesian analyses. Genetics 166: 641–643.

Jansen, R. C., 1993 Interval mapping of multiple quantitative trait loci. Genetics 135: 205–211.

Kao, C.-H., Z-B. Zeng and R. D. Teasdale, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216.

Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.

Lee, J. K., and D. C. Thomas, 2000 Performance of Markov chain Monte Carlo approaches for mapping genes in oligogenetic models with an unknown number of loci. Am. J. Hum. Genet. 67: 1232–1250.

Lee, S. H., and J. H. J. Van der Werf, 2005 The role of pedigree information in combined linkage disequilibrium and linkage mapping of quantitative trait loci in a general complex pedigree. Genetics 169: 455–466.

Lee, S. H., and J. H. J. Van der Werf, 2006 An efficient variance component approach implementing an average information REML suitable for combined LD and linkage mapping with a general complex pedigree. Genet. Sel. Evol. 38: 25–43.

Lee, S. H., J. H. Van der Werf and B. Tier, 2005 Combining the meiosis Gibbs sampler with the random walk approach for linkage and association studies with a general complex pedigree and multimarker loci. Genetics 171: 2063–2072.

Liu, Y., G. B. Jansen and C. Y. Lin, 2002 The covariance between relatives conditional on genetic markers. Genet. Sel. Evol. 34: 657–678.

Lund, M. S., P. Sorensen, B. Guldbrandtsen and D. A. Sorensen, 2003 Multitrait fine mapping of quantitative trait loci using combined linkage disequilibria and linkage analysis. Genetics 163: 405–410.

MacCluer, J. W., J. L. VanderBerg, B. Raed and O. A. Ryder, 1986 Pedigree analysis by computer simulation. Zoo Biol. 5: 147–160.

Martinez, O., and R. N. Curnow, 1992 Estimating the locations and sizes of effects of quantitative trait loci using flanking markers. Theor. Appl. Genet. 85: 480–488.

Metropolis, N., A. Rosenbluth, M. Rosenbluth, A. Teller and E. Teller, 1953 Equations of state calculation by fast computing machines. J. Chem. Phys. 21: 1087–1092.

Meuwissen, T. H. E., and M. E. Goddard, 2000 Fine mapping of quantitative trait loci using linkage disequilibria with closely linked marker loci. Genetics 155: 421–430.

Meuwissen, T. H. E., and M. E. Goddard, 2001 Prediction of identity by descent probabilities from marker-haplotypes. Genet. Sel. Evol. 33: 605–634.

Meuwissen, T. H. E., and M. E. Goddard, 2004 Mapping multiple QTL using linkage disequilibrium and linkage analysis information and multitrait data. Genet. Sel. Evol. 36: 261–279.

Meuwissen, T. H. E., A. Karlsen, S. Lien, I. Olsaker and M. E. Goddard, 2002 Fine mapping of a quantitative trait locus for twinning rate using combined linkage and linkage disequilibrium mapping. Genetics 161: 373–379.

Morris, A. P., J. C. Whittaker and D. J. Balding, 2004 Little loss of information due to unknown phase for fine-scale linkage disequilibrium mapping with single-nucleotide-polymorphism genotype data. Am. J. Hum. Genet. 74: 945–953.

Nakamichi, R., Y. Ukai and H. Kishino, 2001 Detection of closely linked multiple quantitative trait loci using genetic algorithm. Genetics 158: 463–475.

Perez-Enciso, M., 2003 Fine mapping of complex trait genes combining pedigree and linkage disequilibrium information: a Bayesian unified framework. Genetics 163: 1497–1510.

Riquet, J., W. Coppieters, N. Cambisano, J. J. Arranz, P. Berzi et al., 1999 Fine-mapping of quantitative trait loci by identity by descent in outbred populations: application to milk production in dairy cattle. Proc. Natl. Acad. Sci. USA 96: 9252–9257.

Satagopan, J. M., B. S. Yandell, M. A. Newton and T. C. Osborn, 1996 A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics 144: 805–816.

Schwarz, G., 1978 Estimating the dimension of a model. Ann. Stat. 6: 461–464.

Sillanpaa, M. J., and E. Arjas, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373–1388.

Sillanpaa, M. J., and E. Arjas, 1999 Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data. Genetics 151: 1605–1619.

Sillanpaa, M. J., and J. Corander, 2002 Model choice in gene mapping: what and why. Trends Genet. 18: 301–307.

Sillanpaa, M. J., D. Gasbarra and E. Arjas, 2004 Comment on "On the Metropolis–Hastings acceptance probability to add or drop a quantitative trait locus in Markov chain Monte Carlo-based Bayesian analyses." Genetics 167: 1037.

Sobel, E., and K. Lange, 1996 Descent graphs in pedigree analysis: applications to haplotyping, location scores, and marker-sharing statistics. Am. J. Hum. Genet. 58: 1323–1337.

Sorensen, D., and D. Gianola, 2002 Likelihood, Bayesian, and MCMC methods in quantitative genetics. Springer, New York.

Sved, J. A., 1971 Linkage disequilibrium and homozygosity of chromosome segments in finite populations. Theor. Popul. Biol. 2: 125–141.

Thompson, E. A., and S. C. Heath, 1999 Estimation of conditional multilocus gene identity among relatives, pp. 95–113 in Statistics in Molecular Biology and Genetics (IMS lecture notes), edited by F. Seller-Moiseiwitsch. Institute of Mathematical Statistics,


Weber, J. L., and C. Wong, 1993 Mutation of human short tandem repeats. Hum. Mol. Genet. 2: 1123–1128.

Yi, N., and S. Xu, 2000 Bayesian mapping of quantitative trait loci under the identity-by-descent-based variance component model. Genetics 156: 411–422.

Yi, N., S. Xu and D. B. Allison, 2003 Bayesian model choice and search strategies for mapping interacting quantitative trait loci. Genetics 165: 867–883.

Zeng, Z-B., 1993 Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc. Natl. Acad. Sci. USA 90: 10972–10976.

Communicating editor: C. Haley

APPENDIX

Proving the validity of the empirical Bayesian procedure: In this method, we obtain a REML estimate of $\Theta$ with (RJ)MCMC rather than sampling $\Theta$ from proposal distributions, which would be the case in a full Bayesian approach. We show here that the proposal probabilities with REML estimates are not omitted but cancel out of the acceptance ratios used in the full Bayesian approach.

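Metropolis–Hastings ratio for QTL position: In a full Bayesian procedure the acceptance ratio for QTL position given a fixed number of QTL is

\[
\alpha_{r,r^*} = \min\left\{1,\; \frac{\mathrm{pr}(y \mid n, r^*, \Theta^*)\,\mathrm{pr}(n)\,\mathrm{pr}(r^* \mid n)\,\mathrm{pr}(\Theta^* \mid r^*, n)}{\mathrm{pr}(y \mid n, r, \Theta)\,\mathrm{pr}(n)\,\mathrm{pr}(r \mid n)\,\mathrm{pr}(\Theta \mid r, n)} \times \frac{\mathrm{pr}(r \mid r^*)\,\mathrm{pr}(\Theta \mid r, n)}{\mathrm{pr}(r^* \mid r)\,\mathrm{pr}(\Theta^* \mid r^*, n)}\right\}
\]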

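with all symbols defined in the text (see Reversible jump MCMC for simultaneous mapping of multiple QTL). The first term is the posterior density consisting of the likelihood and the prior, and the second term is the proposal probability of $r$. In our approach, $\Theta$ is always determined by REML with $\mathrm{pr}(\Theta \mid r, n)$. The factors $\mathrm{pr}(\Theta \mid r, n)$ and $\mathrm{pr}(\Theta^* \mid r^*, n)$ therefore appear in both terms and cancel, as does $\mathrm{pr}(n)$, and the equation can be rewritten as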

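\[
\alpha_{r,r^*} = \min\left\{1,\; \frac{\mathrm{pr}(y \mid n, r^*, \Theta^*)\,\mathrm{pr}(r^* \mid n)}{\mathrm{pr}(y \mid n, r, \Theta)\,\mathrm{pr}(r \mid n)} \times \frac{\mathrm{pr}(r \mid r^*)}{\mathrm{pr}(r^* \mid r)}\right\}.
\]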

The prior and the proposal distribution of QTL positions are both uniform; therefore, they cancel out. The acceptance ratio is therefore simplified to Equation 6.
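Once the uniform prior and proposal terms cancel, the position update depends only on the likelihood ratio. A minimal sketch of that step, assuming the two REML log-likelihoods are already available as plain numbers (the function name and arguments are hypothetical and not taken from the paper):

```python
import math


def position_acceptance(loglik_proposed, loglik_current):
    """Acceptance probability min(1, likelihood ratio), computed on the log scale.

    Assumes a uniform prior and a uniform proposal for the QTL position,
    so that only the likelihood ratio remains.
    """
    log_ratio = loglik_proposed - loglik_current
    # Avoid overflow in exp() when the proposed position fits much better.
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)
```

Working with log-likelihood differences is a design choice for numerical stability: the likelihoods themselves are typically far too small to represent directly.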

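RJMCMC probability for QTL birth: The original acceptance ratio for a QTL birth is

\[
\alpha_{n,n+1} = \min\left\{1,\; \frac{\mathrm{pr}(y \mid n+1, r^*, \Theta^*)\,\mathrm{pr}(n+1)\,\mathrm{pr}(r^* \mid n+1)\,\mathrm{pr}(\Theta^* \mid r^*, n+1)}{\mathrm{pr}(y \mid n, r, \Theta)\,\mathrm{pr}(n)\,\mathrm{pr}(r \mid n)\,\mathrm{pr}(\Theta \mid r, n)} \times \frac{\mathrm{pr}(n \mid n+1)}{\mathrm{pr}(n+1 \mid n)} \times \frac{(n+1)^{-1}\,\mathrm{pr}(\Theta \mid r, n)}{\mathrm{pr}(r_{n+1})\,\mathrm{pr}(\Theta^* \mid r^*, n+1)}\right\}.
\]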

The first term is the posterior density consisting of the likelihood and the prior, and the second term is the proposal probability of adding or deleting a QTL from the model. When deleting a QTL from the model, one QTL is randomly selected with a probability of $(n+1)^{-1}$, and the parameters involved in the selected QTL are removed from the model. REML estimates of the model parameters given the reduced number of QTL and their positions are determined with $\mathrm{pr}(\Theta \mid r, n)$. When adding a QTL to the model, $r_{n+1}$ is used with its prior probability (Jannink and Fernando 2004), and REML estimates of the model parameters given the increased number of QTL and positions are determined with $\mathrm{pr}(\Theta^* \mid r^*, n+1)$. The equation can be rewritten as

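\[
\alpha_{n,n+1} = \min\left\{1,\; \frac{\mathrm{pr}(y \mid n+1, r^*, \Theta^*)\,\mathrm{pr}(n+1)\,\mathrm{pr}(r^* \mid n+1)}{\mathrm{pr}(y \mid n, r, \Theta)\,\mathrm{pr}(n)\,\mathrm{pr}(r \mid n)} \times \frac{\mathrm{pr}(n \mid n+1)}{\mathrm{pr}(n+1 \mid n)} \times \frac{(n+1)^{-1}}{\mathrm{pr}(r_{n+1})}\right\}.
\]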

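Following Jannink and Fernando (2004),

\[
\frac{\mathrm{pr}(r^* \mid n+1)}{\mathrm{pr}(r \mid n)\,\mathrm{pr}(r_{n+1})} = \frac{(n+1)!\,\prod_{i=1}^{n+1}\mathrm{pr}(r_i)}{n!\,\left(\prod_{i=1}^{n}\mathrm{pr}(r_i)\right)\mathrm{pr}(r_{n+1})} = n+1.
\]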

This simplifies the equation for $\alpha_{n,n+1}$ to Equation 7. The acceptance ratio for QTL death (Equation 8) can be derived
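The prior-ratio identity used for the birth step can be checked numerically. The sketch below is an illustration only: it assumes a uniform prior density for each QTL position over a region of hypothetical length `region_cm`, and the function names are not from the paper.

```python
from math import factorial, isclose


def ordered_position_prior(n, region_cm=10.0):
    """pr(r | n) = n! * prod_i pr(r_i), with uniform pr(r_i) = 1 / region_cm."""
    return factorial(n) * (1.0 / region_cm) ** n


def birth_prior_ratio(n, region_cm=10.0):
    """pr(r* | n + 1) / (pr(r | n) * pr(r_{n+1})) for a uniform position prior."""
    new_position_density = 1.0 / region_cm
    return ordered_position_prior(n + 1, region_cm) / (
        ordered_position_prior(n, region_cm) * new_position_density
    )


if __name__ == "__main__":
    for n in range(5):
        # Each ratio should equal n + 1 up to floating-point rounding.
        print(n, birth_prior_ratio(n), isclose(birth_prior_ratio(n), n + 1))
```

The ratio does not depend on the region length, because the density of the new position appears once in the numerator and once in the denominator.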
