
Single-marker regression (SMR), and simple (SIM), composite (CIM) and multiple interval mapping (MIM) can be solved using the same general linear model framework. SMR and SIM use essentially the same model to detect QTL: the former uses marker data, while the latter uses interpolated QTL genotype estimates. CIM and MIM also use the same model. CIM uses marker data, while MIM uses the calculated QTL estimates as the cofactors.

1995). MT-CIM exploits the correlation structure to improve the accuracy of QTL detection. In addition, it provides additional tests that cannot be performed for single traits, such as pleiotropy and QTL-by-environment interaction.

In the absence of multiple traits, MT-CIM reduces to single-trait CIM, and its performance and accuracy are the same as those of CIM. Multiple-trait analyses, such as pleiotropy and QTL-by-environment tests, can then no longer be performed.

Although SIM and CIM are still widely used today, they have less power than, and have been largely superseded by, MIM (Kao et al. 1999). SIM and CIM may be useful for preliminary analyses because they can be computed rapidly. Although MIM computation is much slower than that of SMR, SIM, or CIM, modern computers can perform MIM computation in seconds to minutes.

Chapter 3

Shrinkage interval mapping

Abstract

The conventional solution to the QTL-mapping model-selection problem, in which only a few QTLs are selected from all candidate QTLs by stepwise or forward variable-selection methods, has been shown to perform poorly because of the bias introduced by favoring QTLs that are associated with the largest statistics. A proposed remedy to this problem is penalized regression, in particular the penalized maximum-likelihood estimation method (PMLE). However, this method tends to overpenalize, and thereby may fail to detect, QTLs with smaller effects. In an attempt to overcome this defect, I develop a two-stage hybrid method between MIM and partially penalized regression that can be considered a generalization of PMLE. A multiple-trait extension is also developed. Simulation experiments showed that the method may obtain a more precise QTL-location estimate, but may overestimate the QTL effects. It has a marginal advantage over PMLE in detecting QTLs with smaller effects. Both PMLE and this method (shrinkIM) are shown to be superior to multiple interval mapping (MIM).

3.1 Introduction

Consider the linear model $Y = X\beta + \epsilon$, where $Y$ is an $n \times 1$ trait vector, $X$ an $n \times m$ marker matrix, $\beta$ an $m \times 1$ vector of regression coefficients, and $\epsilon$ an $n \times 1$ random error vector with $\epsilon \sim N(0, I_n\sigma^2)$. For an oversaturated model, where $m > n$, the ordinary least-squares estimate of $\beta$ cannot be calculated as $(X'X)^{-1}X'Y$ because the matrix $X'X$ is singular.
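The singularity claim can be checked numerically. The following is a minimal sketch, assuming a hypothetical marker matrix coded as -1/+1 (the coding, the dimensions, and the function name `gram_rank` are illustrative choices, not taken from the text). It compares the rank of $X'X$ with its dimension $m$ when $m > n$.

```python
import numpy as np

def gram_rank(n, m, seed=0):
    """Return (rank of X'X, m) for a simulated n x m marker matrix.

    The -1/+1 marker coding is an assumption made for illustration only.
    """
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n, m))  # hypothetical marker codes
    G = X.T @ X                               # m x m matrix X'X
    return int(np.linalg.matrix_rank(G)), G.shape[0]

# Oversaturated case: more markers (m = 50) than individuals (n = 20).
rank, size = gram_rank(n=20, m=50)
print(rank, size)
```

Because the rank of $X'X$ cannot exceed $n$, it is always smaller than $m$ in the oversaturated case, so the inverse in the least-squares formula does not exist.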

Composite interval mapping (CIM) was proposed to overcome this situation by stepwise or forward variable-selection methods. However, it has been shown (Hoerl et al. 1986) that such methods perform poorly due to the bias introduced by favoring variables (or in this case QTL) that are associated with the largest statistics.

Another proposed solution to this problem is ridge regression (Hoerl 1962), which imposes a penalty on the regression coefficients. Let $\tau$ be a penalty parameter. The restricted least-squares estimate is $(X'X + \tau I_m)^{-1}X'Y$ under the quadratic constraint $\sum_{j=1}^{m} \beta_j^2 < \tau$ on $\beta$. Ridge regression has been proposed (Boer et al. 2002) for QTL epistasis analysis, with varying penalties on regression coefficients.
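As an illustration of the ridge estimate, here is a minimal sketch using a single common penalty $\tau$ on all coefficients. The simulated data, the -1/+1 marker coding, and the function name `ridge_estimate` are assumptions for illustration. The identity matrix is taken to be $m \times m$ so that it conforms with $X'X$.

```python
import numpy as np

def ridge_estimate(X, y, tau):
    """Solve (X'X + tau * I) beta = X'y, with I of size m x m."""
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + tau * np.eye(m), X.T @ y)

# Hypothetical data: n = 20 individuals, m = 50 markers, one true effect.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(20, 50))
y = 2.0 * X[:, 5] + rng.normal(size=20)

beta_small = ridge_estimate(X, y, tau=1.0)
beta_large = ridge_estimate(X, y, tau=100.0)
# A larger penalty shrinks the coefficient vector more strongly.
print(np.linalg.norm(beta_small), np.linalg.norm(beta_large))
```

For any $\tau > 0$ the penalized matrix is positive definite, so the system is solvable even though $m > n$. Using `np.linalg.solve` avoids forming the inverse explicitly.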

Because the inversion of the matrix $X'X + \tau I_m$ required by ridge regression becomes time-consuming as $m$ grows, Xu (2003) developed a Bayesian shrinkage regression method for simultaneously estimating the genetic effects associated with the markers along the whole genome map. Each marker effect is allowed to have its own variance parameter, so that the variance can be estimated from the data. This method was extended (Wang et al. 2005) to allow localizing a QTL within an interval, using Metropolis–Hastings sampling because the QTL location parameter does not have an explicit posterior distribution.

To eliminate the intensive computation required by the Bayesian method, the penalized maximum-likelihood estimation (PMLE) method (Zhang and Xu 2005) was developed. It imposes a normal prior $N(\mu_j, \sigma_j^2)$ as a penalty on each QTL effect $\beta_j$, allowing the penalty to vary across the $\beta_j$. An EM-based algorithm is used to estimate the regression coefficients and other parameters. PMLE is similar in spirit to the multiple-marker Bayesian shrinkage method in that both shrink small marker or QTL effects to zero. PMLE was shown (Zhang and Xu 2005) to be comparable to the shrinkage method in terms of performance.

The initial PMLE method could localize a QTL only to a marker and not between markers. An extension was developed (Zhang 2006) to accommodate QTLs within intervals.

While both the shrinkage and PMLE methods offer much power for QTL detection, the QTL effects can sometimes be underestimated (Zhang and Xu 2005). Although this is not serious if the effect is large, it can be highly misleading for small effects.

To overcome this limitation of PMLE, an unpublished method called shrinkage interval mapping (shrinkIM) (Guo et al. 2007) was proposed. It used PMLE as a QTL selector and used unpenalized QTL effect estimates to find other QTLs with smaller effects. This method exploited partially penalized regression to leave QTL effect estimates unpenalized while penalizing spurious effects.

Here, I propose an improvement to shrinkIM that combines PMLE with multiple interval mapping (MIM) (Kao et al. 1999). This method is a multiple-pass method. In the first pass, PMLE is used to detect QTLs with larger effects. In the second pass, a hybrid between MIM and partially penalized regression is used to fit, without penalty, the QTLs found in the first pass while simultaneously searching for additional QTLs with smaller effects. This simultaneous unpenalized QTL fitting is analogous to that of MIM and is intended to improve precision and power. Further passes repeat the second pass until no more QTLs are found.
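The idea of leaving selected effects unpenalized while penalizing the rest can be sketched with a generalized ridge fit. This is not the EM-based PMLE or the shrinkIM algorithm itself, only a simplified stand-in. The function name `partially_penalized_fit`, the fixed penalty `tau`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def partially_penalized_fit(X, y, selected, tau):
    """Generalized ridge fit with zero penalty on the `selected` columns.

    Solves (X'X + D) beta = X'y, where D is diagonal with entries 0 for
    selected columns and tau elsewhere. A simplified stand-in, not PMLE.
    """
    m = X.shape[1]
    penalties = np.full(m, float(tau))
    penalties[np.asarray(selected, dtype=int)] = 0.0  # leave these unpenalized
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)

# Hypothetical data: one large effect (column 3) and one small (column 17).
rng = np.random.default_rng(2)
X = rng.choice([-1.0, 1.0], size=(30, 60))
y = 3.0 * X[:, 3] + 0.8 * X[:, 17] + rng.normal(size=30)

# Suppose a first pass selected column 3; fit it without penalty.
beta = partially_penalized_fit(X, y, selected=[3], tau=5.0)
print(beta[3], beta[17])
```

In this sketch the system stays solvable when $m > n$ as long as the unpenalized columns are linearly independent and `tau` is positive, because every remaining direction is regularized by the penalty.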
