A Mixed-Model Approach to Mapping Quantitative Trait Loci in Barley on the Basis of Multiple Environment Data


Copyright2000 by the Genetics Society of America


Hans-Peter Piepho

Institut für Nutzpflanzenkunde, FB 11, Universität Kassel, 37213 Witzenhausen, Germany

Manuscript received March 22, 2000

Accepted for publication July 25, 2000

ABSTRACT

In this article, I propose a mixed-model method to detect QTL with significant mean effect across environments and to characterize the stability of effects across multiple environments. I demonstrate the method using the barley dataset by the North American Barley Genome Mapping Project. The analysis raises the need for mixed modeling in two different ways. First, it is reasonable to regard environments as a random sample from a population of target environments. Thus, environmental main effects and QTL-by-environment interaction effects are regarded as random. Second, I expect a genetic correlation among pairs of environments caused by undetected QTL. I show how random QTL-by-environment effects as well as genetic correlations are straightforwardly handled in a mixed-model framework. The main advantage of this method is the ability to assess the stability of QTL effects. Moreover, the method allows valid statistical inferences regarding average QTL effects.

There are several different strategies to map quantitative trait loci (QTL; Kearsey and Farquhar 1998), e.g., single-marker locus analysis (Liu 1998); simple interval mapping (IM; Lander and Botstein 1989); composite interval mapping (CIM; Zeng 1993, 1994), also called multiple QTL mapping (MQM; Jansen and Stam 1994); simplified CIM (Tinker and Mather 1995); marker regression (Kearsey and Hyne 1994; Wu and Li 1994); Bayesian methods (Sillanpää and Arjas 1998); and multiple interval mapping (MIM; Kao et al. 1999; Zeng et al. 1999). The latter methods have been shown to yield better power of QTL detection than IM and single-marker locus analysis (Liu 1998; Lynch and Walsh 1998). In this article the main focus is on CIM for its simplicity and its importance in practice.

Within a frequentist framework, two different model-fitting approaches can be distinguished among procedures for QTL mapping by CIM: those based on maximum-likelihood (ML) estimation (Lander and Botstein 1989; Zeng 1994) and those using multiple linear regression (Haley and Knott 1992; Martinez and Curnow 1992, 1994). The latter approach usually gives results very similar to the former, though it must be seen as an approximate method. The main advantage of the multiple regression approach is its computational simplicity (Martinez and Curnow 1994). The computational tradeoff may become quite considerable in complex situations, i.e., when the model is extended to cover several random effects, as in this article. I therefore prefer the regression approach to ML and also to Bayesian methods using cofactors (Sillanpää and Arjas 1998).

In this article, I analyze the barley dataset by the North American Barley Genome Mapping Project (Han and Ullrich 1993). This dataset comprises 16 trials performed in different environments. The aim of the analysis is to detect QTL with significant mean effect across environments and to characterize the stability of effects across multiple environments. A need for mixed modeling arises in two ways. First, for an assessment of mean effects and of stability, it is necessary to regard environments as a random factor. Second, I expect a genetic correlation among pairs of environments caused by undetected QTL (Korol et al. 1998). Random QTL-by-environment effects as well as genetic correlations are straightforwardly handled in a mixed-model framework. The main advantage of my method is the ability to assess the stability of QTL effects. Moreover, the method allows valid statistical inferences regarding average QTL effects.

MATERIALS AND METHODS

The data: I use the six-row Steptoe/Morex mapping population by the North American Barley Genome Mapping Project (Han and Ullrich 1993). The data comprise 150 doubled haploid (DH) lines, 223 markers, and 16 environments in the United States and Canada in 1991 and 1992. The same 150 genotypes were tested in all environments. The data are partially replicated with two replications. I use genotype-by-environment means for different analyses.

Address for correspondence: Institut für Nutzpflanzenkunde, Universität Kassel, Steinstrasse 19, 37213 Witzenhausen, Germany. E-mail: [email protected]

The model for a single environment: The model will be adjusted to DH and backcross progeny data, but it is readily extended for $F_2$ and other populations. Assume that the $F_1$ cross is $M_1QM_2/m_1qm_2$ with respect to the two flanking markers bordering the interval of interest ($M_1$ and $M_2$) and the QTL ($Q$). Define a random variable $g_i$ from the $i$th genotype taking value $g_i = 1$ if the $F_1$ gamete carries the $Q$ allele and $g_i = 0$ if the $F_1$ gamete carries the $q$ allele. In the regression approach to QTL mapping (Haley and Knott 1992; Martinez and Curnow 1992), the phenotype is regressed on the expected value of $g_i$, given the flanking markers. Table 1 gives explicit expressions for $z_i = E(g_i \mid \text{flanking markers})$ assuming no interference (Martinez and Curnow 1992). Missing marker data can be handled using the method of Martinez and Curnow (1994). In the case of backcross progeny, the genetic effect for $z_i$ constitutes a mixture of additive and dominance effects, whereas for doubled haploids, the genetic effects are confined to additive effects. For CIM using DH data, the following basic regression model can be used (Lynch and Walsh 1998, p. 465),

$$y_i = \mu + \alpha z_i + \gamma_1 c_{1i} + \gamma_2 c_{2i} + \cdots + e_i, \tag{1}$$

where $y_i$ is a phenotypic observation for the $i$th genotype ($i = 1, \ldots, N$); $\mu$ is an intercept term; $\alpha$ is the effect of the putative QTL; $c_{1i}, c_{2i}, \ldots$ are cofactors corresponding to markers on the map, which control for other QTL; $\gamma_1, \gamma_2, \ldots$ are the associated regression coefficients; and $e_i$ is a residual accounting for both environmental variation and unexplained genetic variation. The residual genetic variation modeled by $e_i$ is regarded as random. If the unexplained residual genetic variation captured by $e_i$ is made up of a sum of small genetic contributions, the normality assumption for $e_i$ may be a suitable approximation. Note, however, that $e_i$ will also contain a component due to the different genetic effects at the putative QTL, so a more realistic model is a mixture of normal distributions, with the number of components depending on the number of genotypes at the putative QTL. The normality assumption for $e_i$ is a matter of convenience, allowing model fitting by ordinary least squares in the familiar regression framework, rather than by ML. Several authors have pointed out that using least squares in place of ML tends to cause only a marginal loss of information (Haley and Knott 1992; Martinez and Curnow 1992). Also, ML estimation has large sample optimality properties only when the model is correctly specified. In most applications, the true underlying genetic model will be complex and any fitted model can at best be regarded as an attempt to approximate the true model as closely as possible (Burnham and Anderson 1998). If there are a number of QTL not accounted for by the fixed part of the model and errors are normal, the residual will be a mixture of an unspecified number of component normals. Approximating this mixture by a normal distribution should be adequate in most circumstances. Model (1) may be expressed in vector notation as

$$y_i = \boldsymbol{\beta}'\mathbf{x}_i + e_i, \tag{2}$$

where $\boldsymbol{\beta} = (\mu, \alpha, \boldsymbol{\gamma}')'$ with $\boldsymbol{\gamma} = (\gamma_1, \gamma_2, \ldots)'$, $\mathbf{x}_i' = (1, z_i, \mathbf{c}_i')$, and $\mathbf{c}_i' = (c_{1i}, c_{2i}, \ldots)$. The parameter vector $\boldsymbol{\beta}$ is easily extended to include multiple QTL and effects for epistasis (Moreno-Gonzalez 1992), but this is not elaborated here.

TABLE 1
Expectation of g conditional on flanking markers for a DH population (Martinez and Curnow 1992)

Flanking markers    z = E(g | flanking markers) (a)
M1M2                $1 - a$
m1M2                $1 - b$
M1m2                $b = (1 - r)(\theta - r)/[(1 - 2r)\theta]$
m1m2                $a = r(\theta - r)/[(1 - 2r)(1 - \theta)]$

(a) $r$, recombination frequency between left flanking marker and QTL; $\theta$, recombination frequency between both flanking markers.

Model for data from multiple environments: The basic model (1) may be taken as a building block for more refined modeling to account for the design. Here, I am particularly interested in modeling data from multiple-environment trials (METs). Specifically, I contend that a realistic model for genotype-by-environment effects is needed to make unbiased inferences regarding QTL effects and positions. Moreover, the problem of genetic correlation among environments needs to be taken into account in case the same set of genotypes is tested in the different environments (Korol et al. 1998). Let $y_{ij}$ be the observation of the $i$th genotype ($i = 1, \ldots, N$) in the $j$th environment ($j = 1, \ldots, M$). The proposed model is

$$y_{ij} = \mu + \alpha z_i + \gamma_1 c_{1i} + \gamma_2 c_{2i} + \cdots + e_i + u_j + a_j z_i + g_{1j} c_{1i} + g_{2j} c_{2i} + \cdots + d_{ij}. \tag{3}$$

This model is derived by regarding all regression parameters in (1) as means across environments and allowing an environment-specific deviation. Thus, the environmental main effect $u_j$ corresponds to $\mu$, $a_j$ is a deviation of the QTL effect from the average $\alpha$ in the $j$th environment, $g_{kj}$ ($k = 1, 2, \ldots$) is a deviation from the average cofactor regression coefficient $\gamma_k$ ($k = 1, 2, \ldots$) in the $j$th environment, and $d_{ij}$ is a random deviation for the $i$th genotype in the $j$th environment from the "average" residual effect $e_i$. Note that as opposed to (1), $e_i$ in (3) is a residual genetic effect for the $i$th genotype, which is free of experimental error. The experimental error is now captured by $d_{ij}$, which models both error and residual genotype-by-environment interaction. An alternative way of deriving (3) is to allow a separate model of the form (1) for each environment, to then combine all models into a single model, and finally to partition model parameters into average effects across environments and environment-specific deviations.

All environment-specific deviations ($u_j$, $a_j$, $g_{kj}$, and $d_{ij}$) and the residual genetic effect $e_i$ are regarded as random normal deviates. To fully state the model, I need to specify the variances and covariances of all random terms. The variance-covariance structure should allow sufficient generality to realistically model real data. Before stating the full second moment assumptions, model (3) is modified to allow more generality. Model (3) contains a residual genetic effect $e_i$ and a residual $d_{ij}$. This model corresponds to the usual factorial partitioning of main effects and interaction for genotype-by-environment data in plant breeding and quantitative genetics (Lynch and Walsh 1998). To highlight the fact that $d_{ij}$ contains both error and genotype-by-environment interaction, I may write $d_{ij} = f_{ij} + h_{ij}$, where $f_{ij}$ is the interaction and $h_{ij}$ is error. The customary assumption in mixed-model analyses, henceforth denoted as compound symmetry assumption, is that $e_i$, $f_{ij}$, and $h_{ij}$ are identically distributed with zero mean and constant variances, $\sigma_e^2$, $\sigma_f^2$, and $\sigma_h^2$, respectively. This assumption is quite restrictive because it implies constancy of genetic correlation among pairs of environments as well as constancy across environments of genetic variances within environments. For this reason, I replace the term $e_i + d_{ij}$ by a term $e_{ij}$ and initially assume that the random vector $\mathbf{e}_i = (e_{i1}, \ldots, e_{iM})'$ has unstructured variance-covariance matrix $\mathrm{var}(\mathbf{e}_i) = \mathbf{R}$, where $\mathbf{R}$ is symmetric and positive definite. I then explore various structured models for $\mathbf{R}$, which have fewer parameters than in the unstructured case. The compound symmetry assumption is [...] $\mathbf{I}_M$ is the $M$-dimensional identity matrix, and $\sigma^2 = \sigma_f^2 + \sigma_h^2$.

The modified scalar model reads

$$y_{ij} = \mu + \alpha z_i + \gamma_1 c_{1i} + \gamma_2 c_{2i} + \cdots + u_j + a_j z_i + g_{1j} c_{1i} + g_{2j} c_{2i} + \cdots + e_{ij}. \tag{4}$$

In vector notation this can be written as

$$y_{ij} = \boldsymbol{\beta}'\mathbf{x}_i + \mathbf{b}_j'\mathbf{x}_i + e_{ij}, \tag{5}$$

where $\boldsymbol{\beta}$ is as defined in (2) and $\mathbf{b}_j = (u_j, a_j, \mathbf{g}_j')'$ with $\mathbf{g}_j = (g_{1j}, g_{2j}, \ldots)'$. The main interest of our analysis is in the average QTL effects in $\boldsymbol{\beta}$, while the random vector $\mathbf{b}_j$ basically plays the role of an error term. To cater for generality, correlation among elements in $\mathbf{b}_j$ is allowed. Details are discussed in the next paragraphs. A desirable feature of the model is this: instead of dropping certain genetic effects completely, I can choose whether to move them to $\boldsymbol{\beta}$ and $\mathbf{b}_j$, or to $e_{ij}$. I now discuss suitable variance-covariance structures for $e_{ij}$ and $\mathbf{b}_j$.

Genotype-by-environment effects: I regard the genetic components in $e_{ij}$ as random. When testing the same set of genotypes in the various environments, as has been assumed so far, genetic correlation among observations on the same genotype made in different environments needs to be allowed for. Many articles on the mapping of QTL based on multienvironment data corresponding to this design (Jansen et al. 1995; Beavis and Keim 1996; Romagosa et al. 1996; Sari-Gorla et al. 1997; Korol et al. 1998) have employed a multiple regression model with independent errors. This model implicitly assumes absence of genetic correlation, which is an unrealistic assumption. It is to be expected that in the presence of positive genetic correlation, the information on QTL position provided by data from a sample of environments is smaller than when genetic correlation is absent. This is best explained by a simple example (not specifically designed for QTL analysis). Consider two observations $y_1$ and $y_2$. The mean $(y_1 + y_2)/2$ has variance $(\sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2)/4$, where $\sigma_1^2$ and $\sigma_2^2$ are the variances of $y_1$ and $y_2$, respectively, and $\rho$ is the correlation. Clearly, the variance of the mean is an increasing function of the correlation $\rho$. Similarly, in QTL mapping, accuracy of parameter estimates is expected to decrease with genetic correlation. Ignoring genetic correlation can therefore lead to overoptimistic inferences, i.e., to spurious detections and to inappropriate standard errors for parameter estimates. For this reason, I principally allow correlations among elements in $\mathbf{e}_i$.

A variety of models for $\mathbf{R} = \mathrm{var}(\mathbf{e}_i)$ can be considered. The most complex choice leaves $\mathbf{R}$ unstructured, and the simplest model is $\mathbf{R} = \mathbf{I}_M\sigma^2$. While the unstructured model is often the most realistic one, it may entail an unnecessarily large number of parameters, when the number of environments ($M$) is large. Overparameterization may be avoided by imposing a certain variance-covariance structure. I consider models commonly used for the analysis of genotype-by-environment data, i.e., compound symmetry ($\mathbf{R} = \mathbf{J}_M\sigma_e^2 + \mathbf{I}_M\sigma^2$), heteroscedastic ($\mathbf{R} = \mathbf{D}$ and $\mathbf{R} = \mathbf{J}_M\sigma_g^2 + \mathbf{D}$, where $\mathbf{D}$ is a diagonal matrix; Shukla 1972), and various factor-analytic models, which are characterized by a component $\boldsymbol{\lambda}\boldsymbol{\lambda}'$, where $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_M)'$ is a vector of factor loadings associated with the individual environments (Piepho 1998a). The simplest factor-analytic variance-covariance structure is given by $\mathbf{R} = \boldsymbol{\lambda}\boldsymbol{\lambda}' + \mathbf{I}_M\sigma^2$, corresponding to the model $e_{ij} = \lambda_j u_i + w_{ij}$, where $u_i$ is a standard normal score for the $i$th genotype and $w_{ij}$ is normal with zero mean and variance $\sigma^2$. Environments with a large absolute value for the factor loading $\lambda_j$ will have residuals $e_{ij}$ more widely spread out than environments with small $\lambda_j$. I note in passing that model (3) is also applicable if a different set of genotypes is tested [...] diagonal elements may be either homogeneous ($\mathbf{R} = \mathbf{I}_M\sigma^2$) or heterogeneous ($\mathbf{R} = \mathbf{D}$).

QTL-by-environment effects and environmental main effect: In my analysis inferences are to be drawn with respect to a target population of environments. I regard the testing environments as a random sample from the target. The purpose of a mixed-model analysis is to reveal mean QTL effects across environments. Here, random QTL-by-environment interaction essentially plays the role of an error term. Moreover, the stability of QTL effects across environments is an important aspect. The larger the variance of QTL-by-environment effects, the lower the stability. Finally, I can make environment-specific inferences, employing best linear unbiased predictions (BLUPs) of QTL-by-environment effects.

It is necessary to allow for correlation among the effects in $\mathbf{b}_j$. For example, a perfect correlation must be assumed for regression coefficients pertaining to adjacent markers, since both are linear in the additive genetic effect of the flanked QTL (Whittaker et al. 1996). Also, variances of two components in $\mathbf{b}_j$ corresponding to a pair of markers will have to be heterogeneous, considering the explicit expressions given in Whittaker et al. (1996, p. 25, right, bottom). Moreover, different QTL may be responding similarly to differential environments, giving rise to positive correlation among regression coefficients corresponding to markers adjacent to different QTL. For these reasons I make the general assumption that

$$\mathrm{var}(\mathbf{b}_j) = \mathbf{G}, \tag{6}$$

where $\mathbf{G}$ is symmetrical and positive definite, but otherwise unstructured. Note that this assumption also ensures scale invariance, i.e., invariance to the particular coding for markers (0/1, 1/2, −1/1). Again, explicit modeling of the variance-covariance structure is worthwhile to keep the number of parameters low, although many parsimonious structures suffer from lack of scale invariance. I can consider the same structures as for $\mathbf{R}$. It should be stressed that $\mathbf{G}$ does not contain genetic effects, but merely QTL-by-environmental effects and an environmental main effect.

The effect of the putative QTL in the $j$th environment is given by $(\alpha + a_j)$. The diagonal element in $\mathbf{G}$ corresponding to $a_j$ can therefore be interpreted as a measure of stability of the effect of the putative QTL. The larger the variance of $a_j$, the more variable/less stable are the environment-specific QTL effects $(\alpha + a_j)$. The breeder will seek a large absolute value for $\alpha$ and a small variance for $a_j$, i.e., a high stability. This interpretation of a variance component as a measure of QTL effect stability is akin to approaches for assessing yield stability in MET data (Lin et al. 1986; Becker and Léon 1988; Piepho 1998b).

It is convenient to write the full model for $y_{ij}$ in matrix form as

$$\mathbf{y} = (\mathbf{1}_M \otimes \mathbf{X})\boldsymbol{\beta} + (\mathbf{I}_M \otimes \mathbf{X})\mathbf{b} + \mathbf{e}, \tag{7}$$

where $\mathbf{y} = (y_{11}, \ldots, y_{N1}, y_{12}, \ldots, y_{NM})'$, $\mathbf{1}_M$ is a vector of $M$ ones, $\mathbf{X} = (\mathbf{x}_1, \ldots, \mathbf{x}_N)'$, $\boldsymbol{\beta} = (\mu, \alpha, \boldsymbol{\gamma}')'$, $\mathbf{I}_M$ is the $M$-dimensional identity matrix, $\mathbf{b} = (\mathbf{b}_1', \ldots, \mathbf{b}_M')'$, and $\mathbf{e} = (e_{11}, \ldots, e_{N1}, e_{12}, \ldots, e_{NM})'$. Apart from the residual variance, this model is similar to the random effects model for longitudinal data (Laird and Ware 1982). The first two moments under this model are

$$E(\mathbf{y}) = (\mathbf{1}_M \otimes \mathbf{X})\boldsymbol{\beta} \tag{8}$$

and

$$\mathrm{var}(\mathbf{y}) = \mathbf{I}_M \otimes \mathbf{X}\mathbf{G}\mathbf{X}' + \mathbf{R} \otimes \mathbf{I}_N. \tag{9}$$
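The Kronecker structure in (9) can be made concrete with a small numerical sketch. The block below is only an illustration under stated assumptions: the dimensions (N = 3 genotypes, M = 2 environments), the design matrix X, and the entries of G and R are hypothetical, and the helper functions are plain-Python stand-ins rather than any particular library's API. It builds var(y) from (9) and checks it against the element-wise reading cov(y_ij, y_i'j') = [j = j'] x_i'G x_i' + [i = i'] R_jj'.

```python
# Sketch of Eq. (9): var(y) = I_M (x) X G X' + R (x) I_N, using plain Python lists.
# All numbers below are hypothetical and chosen only for illustration.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def kron(a, b):
    # Kronecker product: the row/column index of `a` is the slow index.
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

N, M = 3, 2                                   # genotypes, environments (toy sizes)
X = [[1.0, 0.9], [1.0, 0.1], [1.0, 0.5]]      # rows x_i' = (1, z_i), no cofactors
G = [[2.0, -0.3], [-0.3, 0.15]]               # var(b_j) for b_j = (u_j, a_j)'
R = [[1.0, 0.4], [0.4, 2.0]]                  # var(e_i) across the M environments

XGXt = matmul(matmul(X, G), transpose(X))
V = add(kron(identity(M), XGXt), kron(R, identity(N)))

def idx(i, j):
    # Position of y_ij in y = (y_11, ..., y_N1, y_12, ..., y_NM)': genotype runs fastest.
    return j * N + i

def cov_elementwise(i, j, i2, j2):
    same_env_term = XGXt[i][i2] if j == j2 else 0.0     # shared b_j within an environment
    same_genotype_term = R[j][j2] if i == i2 else 0.0   # shared genotype across environments
    return same_env_term + same_genotype_term

max_gap = max(
    abs(V[idx(i, j)][idx(i2, j2)] - cov_elementwise(i, j, i2, j2))
    for i in range(N) for j in range(M) for i2 in range(N) for j2 in range(M)
)
print(max_gap)
```

Read this way, two different genotypes are correlated only when they share an environment (through G), and observations from different environments are correlated only when they come from the same genotype (through R).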

[...] it is tempting to compute genotype means across environments and subject these to standard CIM. Such an analysis assumes that means of different genotypes, $\bar{\mathbf{y}} = M^{-1}(\mathbf{1}_M' \otimes \mathbf{I}_N)\mathbf{y}$, are stochastically independent and have homoscedastic errors ($M$ is the number of environments). This assumption is problematic with model (5), under which I have for the means

$$E(\bar{\mathbf{y}}) = \mathbf{X}\boldsymbol{\beta} \tag{10}$$

and

$$\mathrm{var}(\bar{\mathbf{y}}) = M^{-1}\mathbf{X}\mathbf{G}\mathbf{X}' + \mathbf{I}_N\sigma_R^2, \tag{11}$$

where $\sigma_R^2 = M^{-2}\mathbf{1}_M'\mathbf{R}\mathbf{1}_M$. The important point to note here is that all elements in $\bar{\mathbf{y}}$ are correlated among one another due to the term $M^{-1}\mathbf{X}\mathbf{G}\mathbf{X}'$ on the right-hand side of (11). Thus, the means $\bar{\mathbf{y}}$ violate the assumptions underlying a standard QTL analysis. An appropriate means analysis could be based on generalized least squares using (11), but this is not recommended for two reasons. First, using (11) requires an estimate of $\mathbf{G}$, which necessitates an analysis of replicate data, thus annihilating the computational advantages of the means analysis. Second, and more importantly, the means themselves are unweighted and thus ignore the genetic correlation structure. Therefore the means analysis is not optimal.

Model selection: In what follows I assume random environments and consider CIM for mean QTL effects across environments. Model selection is necessary regarding three aspects: (1) the markers to be used as cofactors, (2) the variance-covariance structure for $\mathbf{R}$, and (3) the model for QTL-by-environment interaction, i.e., for $\mathbf{G}$. These selection problems are briefly discussed. Unfortunately, the three problems cannot be tackled in an entirely independent manner. For example, in a joint analysis of the data, the chosen variance-covariance model will have an effect on the selected set of markers and vice versa. From a theoretical point of view it seems desirable to handle these three model components simultaneously. Due to the large number of candidate models (i.e., combinations of choices for 1–3), however, this simultaneous approach is not usually feasible in practice. Thus, some form of sequential approach is preferable. While this may entail the risk of missing some good-fitting models, it has the important advantage of reducing the total number of models to be considered. I suggest first selecting the markers, then the variance-covariance structure for $\mathbf{R}$, and finally the model for QTL-by-environment interaction ($\mathbf{G}$). At each step, I use the Schwarz Bayesian Criterion (SBC) to choose among options (Wolfinger 1993; McQuarrie and Tsai 1998). The criterion is given by $\mathrm{SBC} = \log L - \tfrac{1}{2}\,p\log(n)$, where $L$ is the maximized likelihood, $p$ is the number of parameters, and $n$ is the number of observations. Models with large values for SBC are preferred.

Cofactor selection: I select cofactors by multiple linear regression of means $\bar{\mathbf{y}}$ on marker types using the marker pair selection (MPS) approach by H.-P. Piepho and H. G. Gauch (unpublished results). This procedure has three distinctive features: (i) markers are selected in adjacent pairs to increase the chance of selecting flanking markers while reducing the risk of selecting markers not linked to QTL; (ii) an exhaustive search per chromosome is used in place of simple forward selection, which reduces the risk of missing the best fitting model; (iii) a model selection criterion such as SBC is employed to select the final model among a sequence of models. Among a selected pair, I use the marker that fits best as a cofactor for CIM. The procedure was developed for models with a single error term. Extension to the mixed model (4) for the replicate data is straightforward in principle but not generally feasible at present, mainly because of the prohibitive workload of having to fit a multitude of complex models by ML or restricted maximum likelihood (REML). This is the [...] approach is of an ad hoc nature, considering the fact that strictly speaking the means violate the independence assumption. As more efficient software becomes available, applying MPS to the replicate data using a mixed-model framework is preferable.

Genotype-by-environment interaction (R): I initially assume the most general model for QTL-by-environment interaction on the basis of the genetic model corresponding to the selected cofactors (Wolfinger 1993). At this stage, the model does not yet contain the covariate $z_i$ for the pair of markers flanking the putative QTL. Interactions are regarded as fixed, although at later steps of the analysis I take QTL-by-environment interaction as random. No specific model is assumed for the interactions. The model I select for $\mathbf{R}$ is used subsequently when modeling QTL-by-environment interaction and when scanning the genome for putative QTL.

QTL-by-environment interaction and environmental main effect (G): I now take QTL-by-environment interactions and the environmental main effect as random. Thus, I have the task of selecting an appropriate variance-covariance structure for $\mathbf{G}$. The need for invariance under recoding of the markers dictates the unstructured model for $\mathbf{G}$, while the parsimony principle suggests that simpler approximating structures may be worth considering. I propose to generally fit an unstructured model, except when the dimension of $\mathbf{G}$ is large and an unstructured model is difficult to fit.

Scanning the genome: The same model as selected for both $\mathbf{R}$ and $\mathbf{G}$ in the previous steps is used when scanning the genome for QTL. Of course, $\mathbf{R}$ and $\mathbf{G}$ are reestimated at each putative QTL position. Note that $\mathbf{G}$ is extended at the scanning stage by the covariate $z_i$ for the two flanking markers. Assuming the same structure as selected for the case where $\mathbf{x}_i$ contains only the cofactors may not be optimal. It would not be practicable, however, to select a different model at each step during the genome scan. Also, the type of model finally chosen for $\mathbf{R}$ and $\mathbf{G}$ not only depends on model fit but also on ease of estimation. Some structures for $\mathbf{R}$ and $\mathbf{G}$ may be well fitting but difficult to estimate (convergence problems, difficulty in choosing good starting values, etc.), thus making them infeasible for automated QTL scans, where the same model has to be estimated a large number of times.

EXAMPLE

I used the barley data to exemplify the proposed [...]

TABLE 2
Fitting information for different models for R

Model for R (a)                                                        Log-likelihood    p (b)    SBC
Unstructured                                                           −199.94           136      −729.20
$\mathbf{D}$                                                           −531.36           16       −593.63
$\mathbf{J}_M\sigma_g^2 + \mathbf{D}$                                  −467.97           17       −534.13
$\boldsymbol{\lambda}\boldsymbol{\lambda}' + \mathbf{D}$               −428.73           32       −553.26
$\mathbf{J}_M\sigma_e^2 + \boldsymbol{\lambda}\boldsymbol{\lambda}' + \mathbf{D}$    −344.35    33    −472.77
$\mathbf{J}_M\sigma_e^2 + \mathbf{I}_M\sigma^2$                        −680.41           2        −688.19
$\mathbf{I}_M\sigma^2$                                                 −741.22           1        −745.11

Environment main effects and QTL-by-environment interaction effects are regarded as fixed. Models fitted by REML.
(a) $\mathbf{D}$, a diagonal matrix; $\mathbf{J}_M$, a square matrix of ones everywhere; $\boldsymbol{\lambda}$, column vector of $M$ factor loadings; $\mathbf{I}_M$, the $M$-dimensional identity matrix.
(b) $p$, number of parameters for the variance-covariance matrix $\mathbf{R}$.
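The SBC formula from the model-selection paragraph can be retraced on the entries of Table 2. The sketch below is illustrative only: the function name `sbc`, the dictionary layout, and the plain-text model labels are hypothetical, and the sample size n = 150 × 16 = 2400 is an assumption (it is not stated next to the table, but it reproduces the tabulated SBC values to rounding).

```python
import math

def sbc(log_lik, n_params, n_obs):
    """Schwarz Bayesian Criterion as defined in the text: log L - (1/2) p log(n)."""
    return log_lik - 0.5 * n_params * math.log(n_obs)

# Assumption: n is the number of genotype-by-environment means (150 lines x 16 environments).
N_OBS = 150 * 16

# (log-likelihood, p, SBC as printed) copied from Table 2; labels are plain-text stand-ins.
TABLE2 = {
    "Unstructured":                  (-199.94, 136, -729.20),
    "D":                             (-531.36, 16, -593.63),
    "J_M*s2_g + D":                  (-467.97, 17, -534.13),
    "lambda*lambda' + D":            (-428.73, 32, -553.26),
    "J_M*s2_e + lambda*lambda' + D": (-344.35, 33, -472.77),
    "J_M*s2_e + I_M*s2":             (-680.41, 2, -688.19),
    "I_M*s2":                        (-741.22, 1, -745.11),
}

computed = {name: sbc(ll, p, N_OBS) for name, (ll, p, _) in TABLE2.items()}
largest_gap = max(abs(computed[name] - printed) for name, (_, _, printed) in TABLE2.items())

# Larger SBC is preferred under this sign convention.
best_model = max(computed, key=computed.get)
print(best_model, round(computed[best_model], 2), round(largest_gap, 4))
```

Under this sign convention the penalty grows with the number of variance-covariance parameters, which is why the unstructured model, despite having the highest log-likelihood, is not the preferred one.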

[...] Between these two, M82 had a better fit than M81 if fitted alone. Subsequently, the cofactor M82 and the interactions with environments were included in the fixed part of the model and various structures were fitted to $\mathbf{R}$ by the REML method. At this stage, $\mathbf{x}_i$ did not yet contain a covariate $z_i$ for the putative QTL. All mixed-model analyses were done using ASREML (Gilmour et al. 1999). The same genotypes have been tested in all environments, so genetic correlation needs to be modeled in $\mathbf{R}$. Since there are $M = 16$ environments and hence $\mathbf{R}$ is an $M \times M$ matrix, there are $M(M + 1)/2 = 136$ parameters for the unstructured model. The results for different models are shown in Table 2. On the basis of SBC the factor-analytic model $\mathbf{R} = \mathbf{J}_M\sigma_e^2 + \boldsymbol{\lambda}\boldsymbol{\lambda}' + \mathbf{D}$ was selected.

The Wald F-statistic for cofactor(M82)-by-environment interaction in $\mathbf{b}$ was 10.84, which is significant at [...] covariate $z_i$ for the putative QTL. This again raises the question of how to model $\mathbf{G}$. A priori, the unstructured model seems most appropriate, especially since there is only one cofactor so that parsimony is not a pressing issue. Thus, I used the unstructured model. The window size was 10 cM; i.e., for putative QTL within 10 cM of a cofactor, the cofactor was dropped from the model. The step size of the chromosome scan was 1 cM. During the chromosome scan, parameter estimates for $\mathbf{G}$ and $\mathbf{R}$ from the present putative QTL position were used as starting values at the next position. This resulted in convergence of the REML algorithm within a few iterations (typically 5–10). [One referee noted that this choice of starting values may cause convergence to a local, but not the global maximum of the (restricted) log-likelihood. In my experience the likelihood of this problem is small with a short step size. In the present example the problem was not observed. The same referee indicated that using fixed starting values gives more stable results.] At each position, I computed a Wald [...]

TABLE 3
Parameter estimates for fixed effects and standard errors (SE) based on analysis of means and a mixed-model analysis with $\mathbf{R} = \mathbf{J}_M\sigma_e^2 + \boldsymbol{\lambda}\boldsymbol{\lambda}' + \mathbf{D}$ and unstructured $\mathbf{G}$ (includes a cofactor for M82, but no effect for putative QTL)

               Analysis of means        Mixed model
Parameter      Estimate    SE           Estimate    SE
Intercept      5.56        0.040        5.55        0.368
M82            −0.50       0.055        −0.49       0.112

Model fitted by REML.

TABLE 4
Fitting information for different models for G

Model for G (a)                    Log-likelihood    p (b)    SBC
None (c)                           −782.60           33       −911.023
Unstructured                       −353.28           36       −493.378
$\mathbf{D}$                       −355.46           35       −491.666
Environmental main effect only     −386.24           34       −518.555

Mixed-model analysis with $\mathbf{R} = \mathbf{J}_M\sigma_e^2 + \boldsymbol{\lambda}\boldsymbol{\lambda}' + \mathbf{D}$. Model fitted by REML.
(a) $\mathbf{D}$, a diagonal matrix.
(b) $p$, number of parameters for the whole variance-covariance structure, i.e., for $\mathbf{R}$ and $\mathbf{G}$. The model for $\mathbf{R}$ has 33 parameters in all cases.
(c) This model has no random effects corresponding to $\mathbf{G}$.
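The scan settings described in the text (1-cM steps, with any cofactor within 10 cM of the putative QTL dropped) can be sketched as follows. This is a minimal illustration and not the ASREML workflow itself: the function names are hypothetical, the map position given to M82 is a made-up placeholder, and treating a distance of exactly 10 cM as "within the window" is an assumption, since the text does not say whether the bound is inclusive.

```python
def scan_grid(chrom_length_cm, step_cm=1.0):
    """Putative QTL positions along one chromosome, spaced step_cm apart."""
    n_steps = int(chrom_length_cm // step_cm)
    return [k * step_cm for k in range(n_steps + 1)]

def active_cofactors(chrom, pos_cm, cofactors, window_cm=10.0):
    """Names of cofactors kept in the model when testing (chrom, pos_cm).

    A cofactor is dropped only if it lies on the same chromosome and within
    window_cm of the putative QTL (inclusive bound: an assumption).
    """
    kept = []
    for name, (c_chrom, c_pos) in cofactors.items():
        if c_chrom == chrom and abs(c_pos - pos_cm) <= window_cm:
            continue
        kept.append(name)
    return kept

# Hypothetical map position for the single cofactor; only the marker name comes from the text.
COFACTORS = {"M82": (3, 55.0)}

for position in (40.0, 58.9, 70.0):
    print(position, active_cofactors(3, position, COFACTORS))
```

In a full scan, each grid position would be followed by a REML fit that starts from the variance estimates obtained at the previous position, as described above; that fitting step is left out here because it depends on the mixed-model software in use.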

Figure 1. F profile for composite interval mapping based on mixed model for genotype-by-environment data (left, critical threshold at F = 12.83) and on simple regression model for genotype means across environments (right, critical threshold at F = 13.01). Chromosomes 1–7.

[...] mixed-model analysis. Since the estimated QTL position is within 10 cM of the cofactor M82, this cofactor was dropped for the analysis at this position. The estimate for $\mathbf{G}$ was

$$\hat{\mathbf{G}} = \begin{pmatrix} 2.14 & -0.296 \\ -0.296 & 0.137 \end{pmatrix}.$$

Thus, the variance of environmental main effects was 2.14, which is notably larger than the variance of the QTL effects across environments (0.137). The variance of QTL effects corresponds to a standard deviation of 0.370, which is fairly large relative to the average QTL effect $\alpha$. This shows that the detected QTL is not very stable across environments. For example, assuming normality, the probability that the QTL effect in a randomly chosen environment (given by $\alpha + a_j$) has positive sign [...] 0.10, where $\Phi$ is the cumulative distribution function of the standard normal distribution. Table 5 gives BLUPs of $a_j$. For 1 out of the 16 environments (SKg92), the resulting estimate of $\alpha + a_j$ has a positive sign. This finding is in good agreement with the estimated probability of 10%. Thus, despite the relatively large average QTL effect, some surprises are possible in specific environments.

TABLE 5
BLUPs of QTL-by-environment effects $a_j$ at position 58.9 cM on chromosome 3 in mixed-model analysis with $\mathbf{R} = \mathbf{J}_M\sigma_e^2 + \boldsymbol{\lambda}\boldsymbol{\lambda}' + \mathbf{D}$ and unstructured $\mathbf{G}$

Environment    BLUP($a_j$)    SE
ID91           −0.530         0.175
ID92           −0.177         0.141
MAN92          0.045          0.131
MIN92          0.022          0.130
MTd91          −0.201         0.124
MTd92          0.202          0.142
MTi91          −0.128         0.131
MTi92          −0.163         0.131
NY92           −0.033         0.136
ONT92          0.193          0.129
OR91           0.281          0.199
SKg92          0.583          0.133
SKo92          −0.183         0.177
SKs92          0.461          0.118
WA91           −0.753         0.156
WA92           0.380          0.126

Model fitted by REML.

DISCUSSION

A common feature of QTL analyses is that QTL effects depend on environment. Many researchers have dealt with this problem by analyzing each environment separately. This approach is quite useful, if one is interested in the particular test environments. As pointed out by Tinker and Mather (1995), "separate analysis by environment [...] QTL-by-environment interaction and avoids complications due to environmental heterogeneity. However, the results of separate analyses are difficult to interpret, and they do not take advantage of the built-in replication provided by multiple environments." Quite frequently, test environments are just a sample from a target population, and the breeder is interested in making broad inferences not restricted to the particular test environments (Melchinger et al. 1998). This objective calls for a mixed-model analysis with random environments (Beavis and Keim 1996). The present article has shown how to implement such an analysis for CIM. The approach is easily extended to cover multiple-QTL and epistatic effects by appropriate modifications of $\boldsymbol{\beta}$ and $\mathbf{b}_j$ (Moreno-Gonzalez 1992; Wang et al. 1999). This is useful as an additional analysis step, when several QTL have been detected by CIM.

A simple alternative to a mixed-model analysis of MET QTL data is to proceed in two steps as follows: first, the [...]

[...] differences, the main one being that Wang et al. (1999) do not allow for genetic correlation. They exploit map information to model the equivalent of my $\mathbf{G}$. This approach requires a specific genetic model, including epistasis and specification of multiple QTL. It assumes that all genetic effects are modeled by $\mathbf{G}$ and that all correlation among effects is solely due to the map. It is to be expected that the approach is susceptible to misspecification of the model. By contrast the model used in this article allows unexplained genetic effects and genetic correlation to be subsumed in the residual $e_{ij}$. As a result there is no need to assume a specific overall model. Moreover, using a general model for $\mathbf{G}$ allows for covariance among QTL that is due to correlated response to differential environmental condition. Such correlation can arise even for QTL on different chromosomes, which would be regarded as independent under the model used by Wang et al. (1999). My model could be easily extended to exploit the map in modeling the correlation among cofactors and the putative QTL in $\mathbf{G}$, but I prefer to work with a general model for $\mathbf{G}$ both for ease of computation and to reduce the danger of model misspecification. Also, if the approach is extended to multiple QTL and epistatic effects, it is still preferable to work with general $\mathbf{G}$ for the same reasons. When the dimension of $\mathbf{G}$ becomes large, it is reasonable to consider more parsimonious models such as those used for $\mathbf{R}$.

In this article I have demonstrated how to use mixed models for assessing mean and stability of QTL effects based on genotype-by-environment means from MET data. My mixed-model framework is easily extended to other settings, e.g., when spatial heterogeneity needs to be modeled at the plot level (nearest neighbor analyses; Moreau et al. 1999) and when the experimental designs give rise to random effects (incomplete blocks, etc.; Haley and Knott 1992). Error variance heterogeneity among different environments in MET can be accounted for by weighting (Cullis et al. 1996), though in my experience weighting has little effect on final parameter estimates and standard errors.

My analysis has assumed random environments. In a model with fixed environments, the focus is on studying

quantitative trait is analyzed by ANOVA techniques to QTL-by-environment interactions. It has been stressed byKorolet al. (1998) that the number of interaction obtain (adjusted) genotype means across environments.

Second, the means together with the marker data are parameters in models such as (4) increases linearly with the number of environments. These authors used the submitted to a routine for QTL analysis. My theoretical

considerations and analysis of a real dataset led me to regression approach ofEberhartandRussell(1966) to model interactions with fewer parameters. The re-conclude that such analyses may lead to inappropriate

inferences, mainly because standard error estimates are gression approach by Eberhart and Russell(1966) was originally proposed for the analysis of genotype-by-inappropriate. A mixed-model framework allows more

valid inferences to be obtained by incorporation of dif- environment interaction, but it can be applied in the same way to model QTL-by-environment interaction, as ferent random components of variance that

appropri-ately account for the environmental and genetic struc- demonstrated byKorolet al.(1998). In fact, there are a large number of different models for genotype-envi-ture of the data.


[...] which are potentially useful for modeling QTL-by-environment interaction. This potential seems to have gone largely unnoticed in QTL work. For example, extensions of the Eberhart-Russell regression such as the additive main effects multiplicative interaction model (Gauch 1988) typically explain a much larger fraction of the total interaction and so promise improved performance (Romagosa et al. 1996). These approaches can be incorporated into my mixed model by taking $b$ as fixed and imposing some structure such as in Eberhart-Russell regression.

Thanks are due to Hugh Gauch Jr. and Susan McCouch for inspiring discussions. H.F. Utz (University of Hohenheim, Germany) is thanked for helpful comments on an earlier draft. Support of the Heisenberg Programm of the Deutsche Forschungsgemeinschaft is gratefully acknowledged. Part of the research for this article was conducted while the author was visiting the Department of Biometrics and the Department of Plant Breeding, College of Agriculture and Life Sciences, Cornell University, Ithaca, NY.

LITERATURE CITED

Beavis, W. D., and P. Keim, 1996 Identification of quantitative trait loci that are affected by environment, pp. 123–149 in Genotype-by-Environment Interaction, edited by M. S. Kang and H. G. Gauch Jr. CRC Press, Boca Raton, FL.

Becker, H. C., and J. Léon, 1988 Stability analysis in plant breeding. Plant Breed. 101: 1–23.

Burnham, K. P., and D. R. Anderson, 1998 Model Selection and Inference. Springer, New York.

Cullis, B. R., F. M. Thomson, J. A. Fisher, A. R. Gilmour and R. Thompson, 1996 The analysis of the NSW wheat variety database. II. Variance component estimation. Theor. Appl. Genet. 92: 28–39.

Davies, R. B., 1977 Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64: 247–254.

Davies, R. B., 1987 Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74: 33–43.

Eberhart, S. A., and W. A. Russell, 1966 Stability parameters for comparing varieties. Crop Sci. 6: 36–40.

Gauch, H. G., 1988 Model selection and validation for yield trials with interaction. Biometrics 44: 705–715.

Gilmour, A. R., B. R. Cullis, S. J. Welham and R. Thompson, 1999 ASREML. User manual, ftp://ftp.res.bbsrc.ac.uk/pub/aar/.

Haley, C. S., and S. A. Knott, 1992 A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity 69: 315–324.

Han, F., and S. E. Ullrich, 1993 The North American Barley Genome Mapping Project: Mapping of quantitative trait loci associated with malting quality. Barley Genet. Newsl. 23: 84–97.

Jansen, R. C., and P. Stam, 1994 High resolution of quantitative traits into multiple loci via interval mapping. Genetics 136: 1447–1455.

Jansen, R. C., J. W. van Ooijen, P. Stam, C. Lister and C. Dean, 1995 Genotype-by-environment interaction in genetic mapping of multiple quantitative trait loci. Theor. Appl. Genet. 91: 33–37.

Kao, C. H., Z-B. Zeng and R. D. Teasdale, 1999 Multiple interval mapping for quantitative trait loci. Genetics 152: 1203–1216.

Kearsey, M. J., and A. G. L. Farquhar, 1998 QTL analysis in plants; where are we now? Heredity 80: 137–142.

Kearsey, M. J., and V. Hyne, 1994 QTL analysis: a simple ‘marker regression’ approach. Theor. Appl. Genet. 89: 698–702.

Korol, A. B., Y. I. Ronin and E. Nevo, 1998 Approximate analysis of QTL-environment interaction with no limits on the number of environments. Genetics 148: 2015–2028.

Lander, E. S., and D. Botstein, 1989 Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 121: 185–199.

Lin, C. S., M. R. Binns and L. P. Levkovitch, 1986 Stability analysis: where do we stand? Crop Sci. 26: 894–900.

Liu, B.-H., 1998 Statistical Genomics. CRC Press, Boca Raton, FL.

Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative Traits. Sinauer, Sunderland, MA.

Martinez, O., and R. N. Curnow, 1992 Estimating the locations and the sizes of the effects of quantitative trait loci using flanking markers. Theor. Appl. Genet. 85: 480–488.

Martinez, O., and R. N. Curnow, 1994 Missing markers when estimating quantitative trait loci using regression mapping. Heredity 73: 198–206.

McQuarrie, A. D. R., and C.-L. Tsai, 1998 Regression and Time Series Model Selection. World Science Publishing Company, Singapore.

Melchinger, A. E., H. F. Utz and C. C. Schön, 1998 Quantitative trait locus (QTL) mapping using different testers and independent population samples in maize reveals low power of QTL detecting and large bias in estimates of QTL effects. Genetics 149: 383–403.

Moreau, L., H. Monod, A. Charcosset and A. Gallais, 1999 Marker-assisted selection with spatial analysis of unreplicated field trials. Theor. Appl. Genet. 98: 234–242.

Moreno-Gonzalez, J., 1992 Genetic models to estimate additive and non-additive effects of marker-associated QTL using multiple regression techniques. Theor. Appl. Genet. 85: 435–444.

Piepho, H. P., 1998a Empirical best linear unbiased prediction in cultivar trials using factor analytic variance-covariance structures. Theor. Appl. Genet. 97: 195–201.

Piepho, H. P., 1998b Methods for comparing the yield stability of cropping systems - A review. J. Agron. Crop Sci. 180: 193–213.

Romagosa, I., S. E. Ullrich, F. Han and P. M. Hayes, 1996 Use of additive main effects and multiplicative interaction model in QTL mapping for adaption in barley. Theor. Appl. Genet. 93: 30–37.

Sari-Gorla, M., T. Calinski, Z. Kaczmarek and P. Krajewski, 1997 Detecting QTL × environment interaction in maize by a least squares interval mapping method. Heredity 78: 146–157.

Shukla, G. K., 1972 Some statistical aspects of partitioning genotype-environmental components of variability. Heredity 29: 237–245.

Sillanpää, M. J., and E. Arjas, 1998 Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data. Genetics 148: 1373–1388.

Tinker, N. A., and D. E. Mather, 1995 Methods for QTL analysis with progeny replicated in multiple environments. J. Quant. Trait Loci 1: http://probe.nalusda.gov:8000/otherdocs/jqtl/.

van Eeuwijk, F. A., J. B. Denis and M. S. Kang, 1996 Incorporating additional information on genotypes and environments in models for two-way genotype by environment tables, pp. 15–50 in Genotype-by-Environment Interaction, edited by M. S. Kang and H. G. Gauch Jr. CRC Press, Boca Raton, FL.

Wang, D. L., J. Zhu, Z. K. Li and A. H. Paterson, 1999 Mapping QTLs with epistatic effects and QTL × environment interaction by mixed linear model approaches. Theor. Appl. Genet. 99: 1255–1264.

Whittaker, J. C., R. Thompson and P. M. Visscher, 1996 On the mapping of QTL by regression of phenotype on marker-type. Heredity 77: 23–32.

Wolfinger, R. D., 1993 Covariance structure selection in general mixed models. Commun. Stat. A 22: 1079–1106.

Wu, W.-R., and W.-M. Li, 1994 A new approach for mapping quantitative trait loci using complete genetic marker linkage maps. Theor. Appl. Genet. 89: 535–539.

Zeng, Z-B., 1993 Theoretical basis of separation of multiple linked gene effects on mapping quantitative trait loci. Proc. Natl. Acad. Sci. USA 90: 10972–10976.

Zeng, Z-B., 1994 Precision mapping of quantitative trait loci. Genetics 136: 1457–1466.

Zeng, Z-B., C. H. Kao and C. J. Basten, 1999 Estimating the genetic architecture of quantitative traits. Genet. Res. 74: 279–289.

Laird, N. M., andJ. H. Ware, 1982 Random effects model for