• No results found

Effects of Genetic and Environmental Factors on Trait Network Predictions From Quantitative Trait Locus Data

N/A
N/A
Protected

Academic year: 2020

Share "Effects of Genetic and Environmental Factors on Trait Network Predictions From Quantitative Trait Locus Data"

Copied!
13
0
0

Loading.... (view fulltext now)

Full text

(1)

DOI: 10.1534/genetics.108.092668

Effects of Genetic and Environmental Factors on Trait Network

Predictions From Quantitative Trait Locus Data

David L. Remington

1

University of North Carolina, Greensboro, North Carolina 27402 Manuscript received June 15, 2008

Accepted for publication December 31, 2008

ABSTRACT

The use of high-throughput genomic techniques to map gene expression quantitative trait loci has spurred the development of path analysis approaches for predicting functional networks linking genes and natural trait variation. The goal of this study was to test whether potentially confounding factors, including effects of common environment and genes not included in path models, affect predictions of cause–effect relationships among traits generated by QTL path analyses. Structural equation modeling (SEM) was used to test simple QTL-trait networks under different regulatory scenarios involving direct and indirect effects. SEM identified the correct models under simple scenarios, but when common-environment effects were simulated in conjunction with direct QTL effects on traits, they were poorly distinguished from indirect effects, leading to false support for indirect models. Application of SEM to loblolly pine QTL data provided support for biologically plausiblea priorihypotheses of QTL mechanisms affecting height and diameter growth. However, some biologically implausible models were also well supported. The results emphasize the need to include any available functional information, including predictions for genetic and environmental correlations, to develop plausible models if biologically useful trait network predictions are to be made.

T

HE emergence of genome technologies provides

unprecedented opportunities to dissect the regu-latory and functional networks by which genes affect phenotypes. The field of systems biology arising from these opportunities has been devoted primarily to un-derstanding constitutive processes by investigating and modeling phenotypic effects of knockout mutants, but recent interests have expanded into understanding how natural genetic variation affects phenotypic variability (Federand Mitchell-Olds2003; Benfeyand Mitchell -Olds2008; Keurentjeset al.2008; Kooninand Wolf 2008). Genetic mapping of quantitative trait loci (QTL) in segregating families has become a key tool for uncov-ering the genetic architecture of natural trait variation and ultimately identifying the underlying genes over the last two decades (Mackay2001, 2004). However, the use of high-throughput techniques has expanded the def-inition of ‘‘quantitative trait’’ to encompass transcript levels assayed on a genomewide scale, leading to expres-sion QTL (eQTL) mapping (Bremet al.2002; Schadt et al. 2005; Keurentjes et al. 2007), and similar ap-proaches are being used to map QTL associated with metabolomic variation (Keurentjeset al. 2006; Lisec et al.2008).

Interest in incorporating eQTL data into a systems biology framework has spawned the development of new

analytical approaches for predicting functional net-works linking genes and traits using multiple-trait QTL data. Basic methods for combined analysis of traditional multiple-trait QTL, such as mapping QTL for eigenvec-tors defined by principal components analysis (Langlade et al.2005; Breweret al.2007) and multiple-trait QTL analysis ( Jiangand Zeng1995; Rauhet al.2002; Hall et al.2006), may be useful for detecting pleiotropic (or closely linked) QTL and for providing insights into the nature of multiple-trait QTL effects, but yield little information for causal inference. One proposed method for identifying key regulators of gene expres-sion variation uses average expresexpres-sion levels of gene regulatory networks identifieda priorias composite traits for eQTL analysis (Kliebensteinet al.2006). In another approach, expression patterns of genes located within eQTL regions shared among genes in known pathways are compared to those of the genes in the pathway to identify likely pathway regulators (Keurentjes et al. 2007).

In contrast with the above approaches, Doss et al. (2005) suggested using patterns of overallvs. residual correlations of gene expression in eQTL studies, which differ among tightly linked gene pairs with shared vs. independent regulatory mechanisms, in combination with path analysis to make causative predictions about gene regulation. Inference methods based on this reasoning have been proposed for de novo prediction of causal relationships in gene regulatory pathways (Zhu 1Address for correspondence:Department of Biology, 312 Eberhart Bldg.,

University of North Carolina, P.O. Box 26170, Greensboro, NC 27402-6170. E-mail: dlreming@uncg.edu

(2)

et al. 2004; Liu et al. 2008), between gene regulation processes and disease (Schadtet al.2005), and among sets of functionally related complex morphological and metabolic phenotypes (Liet al.2006). Various models of network regulatory structure, in which sets of QTL control variation in multiple traits directly or indirectly through cause–effect relationships, are tested with genetic marker and trait data to identify the models best sup-ported by the data. Statistical approaches include Bayes-ian networks (Zhuet al. 2004), likelihood methods to evaluate conditional trait correlations for their support under alternative regulatory model configurations (Schadt et al.2005; Netoet al. 2008), and structural equation modeling to compare the fit of different regulatory path models to the multiple-trait covariance structure (Li et al. 2006; Liu et al. 2008). The information from residual trait correlations used in these techniques is normally neglected in conventional QTL analysis, ex-cept for some multiple-trait QTL analysis techniques ( Jiangand Zeng1995) where it is used to increase the power to detect joint QTL but not for functional prediction.

Structural equation modeling (SEM) is well suited to causal network prediction because it can accommodate a large variety of model structures, including feedback relationships (Liuet al.2008), and is adaptable to var-ious modes of exploratory and confirmatory analyses of biological models (Stein et al. 2003; Li et al. 2006; Tonsor and Scheiner 2007; Liu et al. 2008). More-over, SEM is supported by software packages such as

SAS/STAT (SAS Institute 2002) and AMOS (Amos

Development2006). Path analysis has long been recog-nized as a potentially valuable but probably underutilized tool for testing causative rather than merely correlative relationships in quantitative genetic data (Wright1921; Lynchand Walsh1998). Nevertheless, some important issues must be considered in evaluating the reliability of SEM and other path analysis methodologies for in-ferring trait networks from QTL data. For one thing, cause–effect relationships among traits, environmental effects, and effects of other QTL can all generate within-genotype trait correlations (Steinet al.2003; Dosset al. 2005; Schadtet al. 2005). Can path analysis correctly distinguish these potentially confounding effects? Sec-ond, how valid are the results of path analysis for exploring the likelihood of different model structures (i.e., different directions of cause–effect relationships among traits) rather than merely testing for the pres-ence of particular connections within an established structure or estimating path coefficients for a particular model? Third, the true regulatory network shaping patterns of trait variation may be much more complex than the tested models, including feedback loops or unobserved traits and QTL (Schadtet al.2005; Liuet al. 2008). Under these circumstances, will poor support for overly simple models provide reliable clues to the existence of a more complex network structure?

The goal of this study was to test the causal inferences from structural equation models under a set of relatively simple QTL regulatory scenarios. Specifically, I wanted to test whether models could correctly distinguish direct vs.indirect mechanisms of QTL effects on a pair of traits and test the effects of incomplete identification of QTL, common-environment (CE) effects, and direct regula-tion of unobserved traits on model predicregula-tions. To do this, I simulated multiple QTL data sets using a back-cross design involving three QTL jointly affecting two traits in various direct and indirect modes of action. SEMs representing different models were tested for their ability to distinguish the correct functional sce-nario from incorrect scesce-narios in each case. For the sake of consistent comparisons, similar locations for each QTL and similar sets of effect sizes on the observed traits were simulated under all of the initial scenarios, a constraint that was later relaxed. Finally, I used SEM to evaluate data from a loblolly pine (Pinus taedaL.) QTL study of diameter and height growth for which bi-ologically reasonable predictions could be made about models of joint genetic effects on the set of traits.

MATERIALS AND METHODS

QTL simulations and detection:All data sets were initially simulated using QTL Cartographer version 1.17 (Bastenet al. 1994, 2004) and modified as needed to incorporate indirect QTL effects. The genome simulated using Rmap consisted of five chromosomes, each with eight markers spaced 15 cM apart starting at position 0 cM, generated using the Kosambi map function. Three QTL were simulated using Rqtl at the following locations: chromosome 1, 50 cM (Q1); chromosome 2, 40 cM (Q2); and chromosome 3, 60 cM (Q3). Thus, the first two simulated QTL were in intervals between markers and the third was at a marker position. Environmental variance was set to keep the total phenotypic variance close to 1.0 so all effect sizes would be approximately in phenotypic standard de-viation units. Effects of all QTL were strictly additive, with simulated allele substitution effects (a) after all modifications on traits T1 and T2, respectively, of 0.7 and 0.5 for Q1 and 0.5 and 0.35 for Q2 and Q3. Thus,afor each QTL was uniformly greater on T1 than on T2 by a factor of pffiffiffi2, and the simulated variance explained by each QTL for T1 was very close to double that for T2. Backcross families of 500 progeny segregating for the genetic markers and three QTL in the two traits were simulated in QTL Cartographer using Rcross.

Direct-effects scenario: Under a scenario of direct effects, the simulated QTL directly and independently affected both traits (Figure 1a). All marker and trait data were directly simulated in QTL Cartographer as described above.

Indirect-effects scenario: Under the indirect-effects (or cause–effect) scenario, the simulated QTL directly affected only trait 1, which in turn affected trait 2 (Figure 1b). Only the data fort1were simulated in QTL Cartographer with the effect

sizes as described above for the basic model. Data fort2were

simulated using the equation

t2i¼

ffiffi

2

p =2t1i1

ffiffi

2

p

=2zi; ð1Þ

wheret1iandt2iare the values for T1 and T2, respectively, in

progeny i, and zi is a standard normal random variate

(3)

(SAS Institute 2002). Thus, half the variance in T2 is explained by the value of T1, yielding expected substitution effects for Q1, Q2, and Q3 of0.5, 0.35, and 0.35, respectively, in T2, similar to the effect sizes under the direct-effects model. A new marker-trait input file was then created for Rcross that included the simulated values for both traits.

Latent-direct scenario:In this scenario, the three QTL were assumed to have direct effects on the unobserved trait T0, which in turn had independent direct effects on T1 and T2 (Figure 1c). Progeny values for T0 were simulated in Rcross using substitution effects of 1.0, 0.7, and 0.7 for the three QTL. Progeny values for T1 and T2 were then simulated using the equations

t1i¼

ffiffi

2

p =2t0i1

ffiffi

2

p

=2z1i; ð2aÞ

and

t2i¼1=2t0i1

ffiffi

3

p

=2z2i; ð2bÞ

with T0 explaining half of the variance in T1 and one-fourth of the variance in T2, where z1i and z2i are standard normal

variates. This resulted in substitution effects for the three QTL on the two traits that were similar to those described above for the base model. The generated values fort1andt2were then

used as input into Rcross.

Direct-effects, common-environment scenarios: In these scenarios, the simulated QTL directly and independently affected the two traits in similar fashion to the direct-effects model, but a common-environment effect on the two traits was included (Figure 1d). To generate simulated data sets under this scenario, the two traits T1* and T2* were first simulated using Rqtl and Rcross, with Q1, Q2, and Q3 having substitution effects of 1.0, 0.7, and 0.7, respectively, on T1* and 0.7, 0.5, and 0.5 on T2*. Three vectors of the standard normal variates,zC,

z1, and z2, were then generated to represent common, T1-specific, and T2-specific environmental effects, respectively. Progeny values for T1 and T2 were then generated using the equations:

t1i¼

ffiffi

2

p

=2t1*i1bzCi1cz1i; ð3aÞ

and

t2i¼

ffiffi

2

p =2t*

2i1bzCi1cz2i: ð3bÞ

Values ofbandcwere chosen so thatb21c2¼0.5 and were

varied in different simulated data sets to adjust the strength of the common-environment effect. Values ofb¼pffiffi2=

2andc¼0

were used to simulate strong environmental correlation, and b¼pffiffiffiffiffiffiffiffiffi0:15 and c¼pffiffiffiffiffiffiffiffiffi0:35 were used to generate a weaker environmental correlation. The resulting substitution effects of QTL on the two traits were thus similar to those of the base model. The values for T1 and T2 were then substituted for the original T1* and T2* values as input into QTL Cartographer.

Heterogeneous-effects, common-environment scenarios:

These scenarios were similar to the direct-effects, common-environment scenarios and data were generated using the same approach, except that the assumption of a homogeneous ratio of relative QTL effects on the two traits was relaxed (Figure 1e). Values for T1* and T2* were changed to result in substitution effects for Q1, Q2, and Q3 of 0.7, 0.5, and 0.3 for T1, and 0.5, 0.5, and 0.5 for T2. Both strong and weak environmental correlations were simulated in different data sets.

F2 direct-effects scenario: A model with direct and in-dependent effects on the two traits was also used to simulate an F2 family of 500 progeny. QTL locations and additive

sub-stitution effects on the two traits were identical to the basic model described above, and a dominance effect was also included such that the simulatedffiffi d/aratio for both traits was

2

p

=2for Q1, 1.0 for Q2, and 0 for Q3.

QTL detection and structural equation modeling: QTL effects and locations were estimated by maximum-likelihood composite interval mapping in QTL Cartographer, using model 6 in Zmapqtl. Prior to running Zmapqtl, background markers were fit using forward plus backward stepwise re-gression using a P-value threshold of 0.05 in SRmapqtl to control for background effects, including those of other QTL when conducting the QTL scans. In Zmapqtl, a 2-cM walking speed was used to scan the genome for QTL effects, a window size of 10 cM around each tested position was used to control for QTL effects outside the current marker interval, and up to five of the markers selected in SRmapqtl were used to control for background effects. For backcross data sets, an empirical genomewidea¼0.05 likelihood-ratio test statistic threshold of 11.28 was estimated from 1000 permutations of simulated

Figure1.—Path models for scenarios simulated or

mod-eled in this study: (a) Direct-effects scenario. (b) Indirect sce-nario. (c) Latent-direct scesce-nario. (d) Direct-effects scenario with common-environment effects (direct CE scenario). (e) Direct CE scenario with heterogeneous QTL effect ratios. (f) Full model including both direct effects and indirect ef-fects on T2. In a–f, Q1, Q2, and Q3 represent simulated QTLs, T1 and T2 represent observed traits, T0 represents an unob-served (latent) trait, and ECrepresents common-environment

(4)

data set 1 for the direct-effects scenario. For F2data sets, an

empirical threshold of 14.37 was similarly estimated from permutation tests of one of the two F2 simulations. Finally,

multiple-trait composite interval mapping was done using model 6 in JZmapqtl to estimate joint QTL effects on the two traits. The positions with the peak multiple-trait likelihood ratios when comparing the full and reduced (a¼0) models were used as the QTL locations in the subsequent SEM analysis in place of the ‘‘true’’ simulated locations. If the detected multiple-trait likelihood peaks were.3 cM outside the range defined by the single-trait likelihood peaks identified using Zmapqtl, a location closer to the range of the detected single-trait peaks was chosen to represent the QTL.

Virtual markers were generated for the likelihood-ratio peak locations of each detected QTL. For backcross progeny sets, the virtual marker score was the probability of inheriting a donor parent allele from the heterozygous parent, given the flanking marker genotypes (coded as 1 for a heterozygote and 0 for a recurrent parent homozygote) and recombination fre-quencies with the respective markers. For F2progeny, additive

genotypic coefficients at flanking markers were scored as 1, 0, and1 for parent 1 homozygotes, heterozygotes, and parent 2 homozygotes, respectively. Dominance coefficients were scored as 0.5 and0.5 for heterozygotes and homozygotes, respectively. The additive and dominance scores for each virtual marker were calculated as the expected values of the additive and dominance genotypic coefficients at the esti-mated QTL position, given the flanking marker scores and recombination frequencies with each marker. The MultiRe-gress module in QTL Cartographer was used to calculate the additive and dominance scores for the F2data sets.

SEM was conducted using PROC CALIS in SAS version 9 (SAS Institute2002). SEMs were implemented by solving a set of linear equations for each tested model using maximum-likelihood estimation, with each equation describing a trait value (an endogenous variable) as a linear function of variables directly ‘‘upstream,’’ which could be virtual marker scores at QTL positions and/or another trait. In F2designs,

additive and dominance scores at each virtual marker were included as separate variables. Virtual markers for all QTL significant at the empirical threshold for at least one of the two traits were included as variables in the SEM. All simulated QTL were detected with genomewide significance for at least one, and generally both, traits. In all cases in which a simulated QTL lacked genomewide significance for a trait, a likelihood-ratio peak with at least comparisonwise significance was present. In several simulated data sets, significant effects were also detected at loci that did not correspond to simulated QTL locations, and these were also included in the models. The coefficients of the virtual marker scores in the linear equations represent the maximum-likelihood estimates of direct QTL effects on the traits. This approach differs from the mixture model maximum-likelihood estimation used for interval mapping in QTL Cartographer. However, in models in which the only explanatory variables were direct QTL effects on the observed traits, the estimated QTL effects from SEM were very similar to those obtained with QTL Cartographer. When common-environment effects were included in models, they were incorporated either as unobserved variables (known as latent variables) affecting multiple traits or as a covariance between the error terms for the traits, with both approaches producing the same results. The SEM methodology described above was implemented in PROC CALIS by using Lineqs statements.

Analyses of all data sets included: (1) a full model with direct effects of all three QTL on both T1 and T2 and indirect effects of T1 on T2 (Figure 1f); (2) a direct-effects model correspond-ing to Figure 1a; (3) an indirect-effects model correspondcorrespond-ing

to Figure 1b; and (4) a reversed indirect-effects model, corresponding to Figure 1b but with T2 downstream of T1. The full model under both backcross and F2scenarios could

not be tested because it was fully specified (i.e., had zero degrees of freedom), and thus the solutions resulted in a perfect fit to the covariance structure of the marker-trait data (see also Netoet al.2008). Alternative fully specified models with the same sets of observed (manifest) variables but different path structures also have a statistically equivalent perfect fit to the data. Including the full model in the analyses, however, provided unique estimates of the relative contribu-tions of the direct and indirect paths to the values of T2, given the model structure. A model corresponding to the direct-effects, common-environment scenario (Figure 1d) could not be tested under the simulated conditions. This model has no degrees of freedom if all QTL are included as effects for both traits, and thus the maximum-likelihood solutions perfectly fit the data as in the full model. Models corresponding to the latent-direct scenario (Figure 1c, with the unobserved T0 included in the model as a latent variable) also could not be validly tested. Due to the path structure in which T0 is directly connected to all explanatory QTL and observed traits, the latent variable can be fit optimally to be highly correlated to the upstream trait in an indirect scenario or to characterize the correlated QTL effects on the two traits in a direct scenario. Consequently, a latent-direct model fit the data better than both the direct and indirect models under all of the simulated scenarios and was uninformative about the true model.

Structural equation models were evaluated and compared using three criteria.

1. The likelihood-ratio (LR) test statistic (LRT) is2 times the log-likelihood ratio for the data under the tested modelvs. a fully specified model. All tested models are nested within a fully specified model (either the full model described above or a statistically equivalent model with a different path structure), so the LR test statistic has an asymptoticx2

distribution based on the number of degrees of freedom in the tested model. A significantP-value (P,0.05) repre-sents a failure of the tested model to explain the covariance structure of the data fully (i.e., insufficiency of the model), and insignificantP-values identify models that adequately fit the data (Mitchell1992).

2. Standard and adjusted goodness-of-fit indices (GFI and AGFI, respectively) estimate model fit on the basis of the residual errors, with the latter adjusted for degrees of freedom. Thus, GFI may be considered analogous to theR2

in a linear model and AGFI analogous to an adjustedR2.

While both measures should produce values between 0 and 1, poorly fitting models with small degrees of freedom may result in negative AGFI (PROC CALIS documentation in SAS Institute2002).

3. Akaike’s information criterion (AIC; Akaike 1974, 1987) reduces the LRT statistic by twice the number of degrees of freedom in the model, allowing comparison of non-nested models. The model with the lowest AIC is favored.

Loblolly pine QTL data collection and analysis: Year 8 height and diameter data were collected from 146 surviving selfed progeny of a single loblolly pine parental clone from a previously reported genetic mapping study of inbreeding depression; the progeny were growing in a South Carolina Forestry Commission field site near Tillman, South Carolina

(Remington and O’Malley 2000a,b). Malformed trees for

(5)

the field measurement crew determined to be reasonable estimates.

QTL mapping was done using the 8-year field data, pre-viously scored AFLP marker data (Remingtonand O’Malley 2000a,b), and a genetic map developed from outcross progeny of the same parental clone (Remingtonet al.1999). Compos-ite interval mapping was performed using QTL Cartographer as described above for simulated data. Trait data for year 3 height from the study of Remingtonand O’Malley(2000a) were also included, and height growth between year 3 and year 8 (years 3–8 growth) was calculated from the difference between the year 8 and the year 3 height measurements. Per-mutation tests on year 8 height data established an extremely high empirical genomewidea¼0.05 LR test statistic threshold of 55.76 for year 8 height, probably because the trait data were bimodally distributed due to very large phenotypic effects of the QTL associated with thecadlocus in year 8. Therefore, for all traits I used the empirical genomewidea ¼0.05 LR test statistic threshold of 16.42 estimated for earlier height growth measurements in Remingtonand O’Malley(2000a). I also included two suggestive QTL peaks for year 3 height with lower test statistic values because they corresponded to significant QTL regions for annual growth in either year 2 or year 3. For the purposes of this study, using this lower threshold is actually conservative, since not including all true QTL effects in the model introduces residual trait correlations that lead to bias in the SEM results. Virtual markers were developed for single-trait or multiple-single-trait QTL likelihood peak locations estimated from Zmapqtl or JZMapqtl and evaluated by SEM in SAS PROC CALIS as described above for simulated data.

In evaluating SEMs for year 3 height and years 3–8 growth and for year 8 height and diameter, QTL regions were included as direct effects for each trait unless there was no evidence at all of contribution to one of the traits (i.e., no likelihood ratio peak in the region exceeding the nominal comparison-wise significance level of 5.99 and with homozygous effects in the same direction as those of the other traits). Excluding non-contributing QTL effects provided additional degrees of freedom that allowed testing of models that included various combinations of direct, indirect, and correlated-environment effects on traits. This opportunity did not exist with the simulated data, since in almost every case all QTL were either significant at a genomewide level or suggestive at a compari-son-wise level for both traits. In models with indirect effects, QTL regions contributing only to the downstream trait were included as effects on that trait.

RESULTS

Backcross simulations and SEM testing: Two simu-lated backcross data sets were tested against SEMs for each causal scenario to test the hypotheses (a) that the correct model would have the best fit to the simulated data sets using AIC as a criterion and (b) that the correct model would adequately explain the simulated data (like-lihood ratioP-value$0.05). All simulations consisted of three QTL (Q1, Q2, and Q3) affecting two observed traits (T1 and T2). The three QTL were simulated to have homogeneous effects ratios on the two traits except as noted. Simulation results are summarized in Table 1. Direct-effects scenario: The first pair of simulations was generated under the direct scenario (Figure 1a), in which the three QTL each have direct effects on both T1 and T2. In both simulated data sets, the direct model

correctly outperformed the indirect model in which T2 was downstream of T1 (Figure 1b), with the two simu-lations resulting in AIC values of 2.7627vs.18.6995 and 3.6130vs.16.7255, respectively, for the directvs.indirect models. The direct model outperformed the reversed indirect model (T1 downstream of T2) by even larger margins. In both simulations, however, the direct model was weakly rejected as a sufficient explanation for the data (LRTP-values of 0.029 and 0.018, respectively, and modestly positive AIC values relative to a fully specified model) while the indirect model was strongly rejected (P,0.0001). In the full model in which T2 was affected by the QTL via both direct and indirect paths, a substantial proportion of the effect on T2 (0.165 and 0.196, respectively) was inferred to be indirect and exerted through T1.

Single QTL in SEM:I also tested the first of the two direct model simulations with three SEMs that each included only one of the predicted QTL. Not including all true QTL effects in the model can be expected to result in residual correlations between the traits, con-founding the ‘‘signal’’ used by SEM to distinguish direct and indirect models. Consistent with this prediction, the direct model was strongly rejected and the indirect model had higher AIC values in each case when only one of the three QTL was included (Table 2). In one case, when only Q3 was included in the model, the indirect model was found to fit the data adequately (P¼0.1253). Indirect-effects scenario: When simulations were done under the indirect-effects scenario (Figure 1b), the indirect-effects model adequately fit the data, with insignificantP-values and negative AIC values, and cor-rectly performed far better than either the direct model or the indirect model with the trait effects reversed from the simulated scenario (T2 upstream of T1). In the full model, the proportion of the effect on T2 inferred to be via the indirect path was.0.98 in both simulations.

Latent-direct scenario: A more complex but poten-tially realistic scenario was represented by a pair of simulations in which the QTL affected an unobserved trait, which in turn had independent direct effects on T1 and T2 (Figure 1c). The latent-direct scenario could correspond to a situation in which genetic variation in the amount or activity of a transcriptional regulator af-fects expression of downstream genes or other observed traits or any other situation in which the direct QTL effects are on a trait that is not readily observed. In these simulations, both the direct and indirect models were rejected, with the direct model showing a much poorer fit than the indirect model with T2 downstream but better than the reversed indirect model with T1 down-stream. In the full model, proportions of the effects on T2 inferred to come through the indirect pathway were 0.70 and 0.84, respectively, in the two simulations.

Common-environment direct scenario: The next

(6)

common-environment effects on T1 and T2 were in-cluded (CE direct scenario; Figure 1d). This corresponds to a highly realistic situation in which an environmental variable such as nutrient availability may jointly affect multiple aspects of growth and development. In the first of these simulations, common-environment effects were simulated to be strong, explaining 50% of the variance in T1 and T2. Under this scenario, the indirect model not only was incorrectly favored but also adequately fit the data, while the direct model and the reversed indirect model were strongly rejected. In the full model, essentially all of the effect on T2 was inferred to come through the indirect pathway. In a second simulation, in which the common-environment effect explained only 15% of the phenotypic variance in both traits, all three of these models were strongly rejected but the indirect model showed by far the best overall fit. In the full model, the proportion of the effect on T2 attributed to the indirect path was 0.73.

CE direct scenario with relaxed homogeneity assumptions: There is no particular reason to expect multiple QTL to have a homogeneous ratio of effects on multiple traits under a direct-effects scenario. Thus, I generated additional data sets under the common-environment effects scenario in which the relative effects of the three QTL on the two traits were different to test whether the direct model would perform better than the indirect model under these relaxed assumptions. Under the CE heterogeneous-direct scenarios, Q1, Q2, and Q3 had larger, equal, and smaller effects, respectively, on T1 relative to T2. Two data sets were simulated under this scenario—one with strong correlated-environment ef-fects and the other with weaker correlated efef-fects.

Under both CE heterogeneous-direct scenarios, the fit of the direct model and of both indirect models were all strongly rejected (Table 1). When the correlated-environment effects were strong, the indirect model performed best by AIC and adjusted goodness-of-fit TABLE 1

Summary of structural equation model analyses for backcross simulations

Simulated scenario T2 VI/VQa Tested model GFI AGFI LRP-value AIC

Direct (set 1) 0.165 Directb 0.9962 0.9432 0.0291 2.7627

Indirectc 0.9810 0.9052 ,0.0001 18.6995

Indirect reversedd 0.9332 0.6660 ,0.0001 92.3938

Direct (set 2) 0.196 Direct 0.9955 0.9332 0.0178 3.6130

Indirect 0.9825 0.9125 ,0.0001 16.7255

Indirect reversed 0.9408 0.7042 ,0.0001 79.3354

Indirect (set 1) 0.989 Direct 0.8440 1.3404 ,0.0001 307.5001

Indirect 0.9955 0.9776 0.1297 0.3460

Indirect reversed 0.9553 0.7767 ,0.0001 56.0307

Indirect (set 2) 0.985 Direct 0.8880 2.1365 ,0.0001 288.6861

Indirect 0.9964 0.9800 0.2796 3.7162

Indirect reversed 0.9804 0.8904 ,0.0001 26.1377

Latent direct (set 1) 0.844 Direct 0.9591 0.3861 ,0.0001 54.2958

Indirect 0.9907 0.9537 0.0081 5.7892

Indirect reversed 0.9381 0.6904 ,0.0001 83.9911

Latent direct (set 2) 0.696 Direct 0.9594 0.3903 ,0.0001 53.8670

Indirect 0.9810 0.9048 ,0.0001 18.8166

Indirect reversed 0.9477 0.7387 ,0.0001 68.0184

CE (strong) 0.999 Direct 0.8987 1.1270 ,0.0001 203.9092

Direct Indirect 0.9992 0.9957 0.8750 6.7810

Indirect reversed 0.9496 0.7357 ,0.0001 78.4443

CE (weak) 0.724 Direct 0.9698 0.3665 ,0.0001 46.8858

Direct Indirect 0.9872 0.9327 0.0005 11.8183

Indirect reversed 0.9524 0.7500 ,0.0001 73.0995

CE (strong) 0.925 Direct 0.8766 0.8516 ,0.0001 214.5460

Heterogeneous Indirect 0.9815 0.9073 ,0.0001 18.1409

Direct Indirect reversed 0.9698 0.8492 ,0.0001 34.3724

CE (weak) 0.325 Direct 0.9880 0.8194 ,0.0001 13.4429

Heterogeneous Indirect 0.9747 0.8736 ,0.0001 27.4442

Direct Indirect reversed 0.9595 0.7975 ,0.0001 49.6496

aProportion of model variance in T2 (sum of squared standardized model path coefficients) attributed to T1 in a fully specified model including both direct QTL effects and T1 effects on T2.

bDirect model has 1 residual degree of freedom (d.f.) in all backcross simulations. c

Indirect models have 1 residual d.f. per QTL in the model. d

(7)

criteria, followed in order by the indirect-reversed and direct models. Under the full model,0.92 of the effects on T2 were attributed to the indirect pathway. When the correlated-environment effects were weaker, the direct model outperformed the other models using AIC as the criterion, but the indirect model had the highest adjusted goodness of fit. Under the full model, more than two-thirds of the effect on T2 was attributed to the direct pathway.

F2 simulations under direct scenario: I also tested SEM performance on F2progeny sets simulated under the direct scenario. The additive effects of the three QTL were simulated to be the same as those in the backcross simulations. Dominance effects were also simulated, withd/aratios differing among QTL but homogeneous between the two traits. The F2SEMs included indepen-dent additive and dominance effects for each detected QTL. The SEM results were similar to those obtained for the direct scenario in the backcross simulations, with the direct model strongly outperforming the indirect mod-els but still failing as a sufficient explanation of the data (Table 3). Under the full model, the proportions of the effects on T2 attributed to the indirect pathway were 0.11 and 0.13, respectively, in the two simulations, slightly less than in the direct-scenario backcross simulations.

SEM predictions with loblolly pine growth QTL data: The F2models were tested on data from loblolly pine (P.

taedaL.) progeny of a self-pollinated parent, originally planted as part of a QTL study of inbreeding depression (Remingtonand O’Malley2000a,b). Surviving 8-year-old trees were scored for height and breast-height diameter. The F2progeny had previously been scored for a set of mapped AFLP markers covering the loblolly pine genome (Remingtonet al.1999; Remingtonand O’Malley 2000b) and for annual height growth through the first 3 years after germination (Remington and O’Malley2000a). Applying SEM to year 3 and year 8 QTL data provided an opportunity to address ques-tions about the genetic basis for juvenile-mature corre-lations and plant allometry, topics of interest to plant biologists and tree breeders. One hypothesis was that QTL affecting year 3 height would continue to affect subsequent height growth between years 3 and 8 (years 3–8 growth). A prediction of this hypothesis is that year 3 height QTL would also have direct effects on years 3–8 growth. However, year 3 height QTL might also be ex-pected to continue to affect subsequent growth indirectly through a carryover effect of the growth trajectory established in individual plants through year 3. A second hypothesis was that QTL effects on year 8 diameter would largely be indirect effects of genetic variation for year 8 height rather than vice versa, under the assump-tion that diameter growth in trees is controlled largely as an allometric response to height growth. A height-diameter allometric exponent close to 1 (i.e., a linear relationship between height and diameter) has been reported for relatively young trees such as those in this study, so indirect effects of tree-height QTL on diameter would be expected to be more or less linear (Niklas 1995). Common-environment effects would also be ex-pected to generate positive correlations between traits in both the years 3–8 and height-diameter analyses.

Four QTL regions that exceeded the likelihood ratio test threshold of 16.42 were identified for eight-year height or diameter. Two QTL regions, LG1-14 and LG8-88, were significant for both year 8 height and diameter. The LG8-88 region corresponds to the location of the segregating cad-n1 mutation, which results in produc-TABLE 2

Predictions for direct scenario (set 1) with only one QTL included in the structural equation model

QTL included

in model Tested model LRP-value AIC

Q1 Direct 0.0002 12.0834

Indirect 0.0104 4.5721

Q2 Indirect 0.0001 12.6641

Direct 0.0004 10.7011

Q3 Direct ,0.0001 15.1621

Indirect 0.1253 0.3499

TABLE 3

Summary of structural equation model analyses for F2simulations

Simulated scenario T2 VI/VQa Tested modelb GFI AGFI LRP-value AIC

Direct (set 1) 0.105 Direct 0.9963 0.7948 0.0021 7.4309

Indirect 0.9721 0.8082 ,0.0001 61.2901

Indirect reversed 0.9306 0.5229 ,0.0001 216.8199

Direct (set 2) 0.134 Direct 0.9957 0.7624 0.0009 8.9432

Indirect 0.9760 0.8351 ,0.0001 49.4271

Indirect reversed 0.9398 0.5863 ,0.0001 176.5202

a

Proportion of model variance in T2 (sum of squared standardized model path coefficients) attributed to T1 in a fully specified model including both direct QTL effects and T1 effects on T2.

b

(8)

tion of chemically altered lignin and reduced wood stiffness, apparently interfering with the trees’ ability to maintain upright form and sustained height growth (Mackayet al.1997; Ralphet al.1997; Remingtonand O’Malley 2000a). A third QTL region (LG9-08) was significant for year 8 height and strongly suggestive for diameter (LRT¼16.24). Each of these regions had large additive effects with substantial overdominance for both traits. A fourth QTL region (LG1-80) was significant for year 8 height, with large symmetrically overdominant effects, but had no apparent effect on diameter (Table 4). The height effect associated with LG1-80 may be an artifact of extreme segregation distortion, as it is flanked by near-lethal recessive alleles linked in repulsion and has substantial deficits of both homozygous genotypes, but this locus was included in the SEMs anyway to account for all prospective QTL effects. The LG8-88 and LG1-14 QTL regions had similar effects on height at earlier ages (Remington and O’Malley 2000a). The LG9-08 region was also suggestive of effects on year 3 growth, with a peak LRT value of 9.18 and additive and dominance effects in the same direction as in year 8. Two QTL regions on linkage groups 12 (LG12-116) and 3 (LG3-03), which were significant for year 2 and year 3 growth, respectively, had nearly significant effects on total year 3 height but showed no evidence of effects in

year 8. QTL effects for years 3–8 growth were very similar to those for year 8 height (Table 5).

In fitting SEMs for year 3 height and years 3–8 growth, only QTL with significant or suggestive effects were included as direct effects; thus, LG12-116 and LG3-03 were included only in direct effects for year 3 height, and LG1-80 was included only for years 3–8 growth. Similarly, in comparing year 8 height and diameter, LG1-80 was included only in direct effects for year 8 height. In both cases, QTL regions with direct effects only on the down-stream trait were included in models with indirect effects. The additional degrees of freedom provided by excluding noncontributing QTL allowed testing of mod-els that included various combinations of direct, in-direct, and correlated-environment effects on traits.

Effects on year 3 height and years 3–8 growth were poorly explained both by models with only direct QTL effects and with only indirect effects of year 3 height on years 3–8 growth (Table 6). A model with both direct and indirect effects on years 3–8 growth (direct1indirect model) adequately fit the data and had the lowest AIC value. Adding environmental correlation to the direct1

indirect model only marginally improved model fit but this direct 1 indirect 1 CE model had a higher AIC value than the simpler direct1indirect model. A model with direct effects and environmental correlation (di-TABLE 4

QTL map locations and standardized effects for year 8 height and diameter in a self-pollinated loblolly pine family

Year 8 height (cm)a Year 8 diameter (cm)a

QTL Peak position (cM) LR statistic 2a/SD d/SD Peak position (cM) LR statistic 2a/SD d/SD

LG1-14 17.8 21.41 0.91 0.77 15.8 18.15 0.79 0.74

LG1-80 79.6 21.41 0.05 1.62 NA NA NA NA

LG8-88 91.4 93.75 1.59 1.20 83.4 71.87 1.33 1.17

LG9-08 7.7 20.57 0.80 0.58 7.7 16.24 0.59 0.55

aEstimates were derived using Zmapqtl in QTL Cartographer version 1.17 (Basten

et al.1994, 2004), using model 6 (composite interval mapping) with window size 10 cM and fitting of up to five background markers. LR statistics and parameter estimates are from hypothesis tests comparing a full model with both additive and dominance values (aandd) estimated to a null model (H3:H0). Phenotypic standard deviation¼246.3 cm for year 8 height and 4.62 cm for year 8 diameter.

TABLE 5

QTL map locations and standardized effects for year 3 height and years 3–8 growth in a self-pollinated loblolly pine family

Year 3 height (cm)a Years 3–8 growth (cm)a

QTL Peak position (cM) LR statistic 2a/SD d/SD Peak position (cM) LR statistic 2a/SD d/SD

LG1-14 23.8 32.34 1.11 0.81 13.8 23.05 1.02 0.74

LG1-80 NA NA NA NA 79.6 18.15 0.46 1.52

LG3-03 2.2 13.54 0.64 0.20 NA NA NA NA

LG8-88 87.4 36.94 0.76 0.57 91.4 99.57 1.72 1.25

LG9-08 3.4 9.18 0.51 0.25 7.7 18.62 0.80 0.55

LG12-116 116.1 12.68 0.67 0.03 NA NA NA NA

a

(9)

rect1CE model) was marginally significant for lack of fit (P¼0.0344) and had slightly poorer fit and a slightly higher AIC value than the full model, suggesting that adding indirect effects may explain a small amount of effect not explained by direct and common-environment effects alone but not vice versa. Even with the additional degrees of freedom available with this data set, however, there is very little information to distinguish whether indirect effects or common-environment effects (or both) contribute to the relationship between the two traits.

To evaluate the utility of SEM for distinguishing be-tween different causal structures, I also tested a set of biologically implausible models in which year 3 height was a downstream effect of years 3–8 growth. One im-plausible model with year 3 height QTL explained only as an indirect effect of years 3–8 growth, and another model with both direct and indirect effects on year 3 height, had only marginally significant lack of fit to the data (P ¼ 0.0169 and P ¼ 0.0256, respectively); the former model had an AIC value within 1.0 of the best-fitting plausible direct1indirect model.

By contrast, effects on year 8 height and diameter were adequately explained by a model with only indirect effects of year 8 height on year 8 diameter (Table 7). Models that included direct QTL effects on both traits in combination with indirect effects on either height or diameter or correlated-environment effects also

ade-quately fit the data, but had AIC values that were.4.0 larger than the indirect-only model due to the addi-tional degrees of freedom used in fitting the model. Models with only direct QTL effects on both traits and with height explained only by diameter failed to fit the data and had much higher AIC values. Thus, the data were consistent with the prediction that diameter growth was largely an allometric response to height growth.

DISCUSSION

For this study, I used simulations to examine the ability of structural equation models to distinguish between simple but realistic genetic regulatory scenarios for multiple-trait QTL mapping data, including those with common-environment effects and incomplete QTL formation, and to examine the sensitivity of SEM to in-correct or incomplete model structures. The consistent results from the replicate simulations in each case sug-gest that the patterns that they reveal are valid and that their implications should be taken into account when interpreting the results of QTL-based causal inference studies.

The correct models were strongly favored using P -value and AIC criteria when data sets were simulated under direct and indirect scenarios without confound-ing factors and all QTL were included in the model. TABLE 6

Summary of structural equation model analyses on loblolly pine year 3 height and years 3–8 height-growth QTL data

Tested model Years 3–8 VI/VQa d.f. GFI AGFI LRP-value AIC

Direct1indirect 0.453 6 0.9886 0.8006 0.0689 0.2958

Direct1reversed indirectb 6 0.9861 0.7566 0.0256 2.3834

Direct 7 0.9486 0.2286 ,0.0001 53.6942

Direct1CE 6 0.9867 0.7671 0.0344 1.5998

Indirect 12 0.9346 0.4276 ,0.0001 65.2851

Reversed indirect 12 0.9771 0.7995 0.0169 0.5877

Direct1indirect1CE 5 0.9895 0.7792 0.0558 0.7843

aProportion of model variance in years 3–8 growth (sum of squared standardized model path coefficients) attributed to year 3 height in a fully specified model including both direct QTL effects and year 3 height effects on years 3–8 growth.

bIndirect model with year 3 height downstream of years 3–8 growth.

TABLE 7

Summary of structural equation model analyses on loblolly year 8 height- and diameter-QTL data

Tested model Diameter VI/VQa d.f. GFI AGFI LRP-value AIC

Direct1indirect 0.981 2 0.9993 0.9802 0.7730 3.4851

Direct1reversed indirectb 2 0.9951 0.8643 0.1660 0.4087

Direct 3 0.8978 0.8739 ,0.0001 114.4400

Direct1CE 2 0.9951 0.8642 0.1658 0.4059

Indirect 8 0.9890 0.9245 0.4171 7.8308

Reversed indirect 8 0.9513 0.6651 ,0.0001 25.0491

a

Proportion of model variance in year 8 diameter (sum of squared standardized model path coefficients) attributed to year 8 height in a fully specified model including both direct QTL effects and year 8 height effects on year 8 diameter.

b

(10)

However, the direct model provided only marginal fit to the simulated data when it was the correct model, which is discussed in more detail below. The simulated QTL effects were relatively small, with Q2 and Q3 explaining only3% of the phenotypic variance for T2, the trait with smaller QTL effects, in the backcross design. Suc-cess in identifying the correct model in these circum-stances is consistent with the results of a larger set of simulations by Schadtet al.(2005), who found that the power to detect the correct model under similar circum-stances was80% when QTL explained as little as 1–2% of the variation in the less-affected trait.

Confounding effects of incomplete QTL and trait detection: These simulations clearly show that not including all relevant QTL in the models can lead to incorrect functional predictions with multiple-trait QTL data. Much of the ‘‘signal’’ for detecting indirect QTL effects on traits in a network, whether the traits are gene expression or functional phenotypes, comes from the degree and pattern of residual correlations between trait values within QTL genotypic classes (Steinet al.2003; Dosset al.2005; Schadtet al.2005). In the absence of common-environment effects, only the genetic compo-nent of trait variation in a pair of traits should be correlated if a set of genes affects both traits indepen-dently. When QTL effects on one trait are indirect effects of variation in an upstream trait, however, both genetic and environmental effects on the upstream trait will affect the downstream trait in the same manner, leading to similar correlations within and among QTL classes. If all QTL that jointly affect a set of traits are not included in the model, effects of the unobserved or excluded QTL will also result in residual correlations between traits. These residual correlations may generate false support for indirect models, with the trait having larger standardized QTL effects appearing to be up-stream. Thus, analyses that test only a single QTL region for effects on a suite of polygenic traits are virtually guar-anteed to place the traits in a common regulatory path-way, whether such a pathway exists or not. Even when all experimentally detected QTL regions are included in a model, some actual QTL are likely to be missed due to limited experimental power (Beavis1994) and lead to correlated residuals. Testing of directvs.indirect QTL path models has been suggested as a test of close linkage vs.pleiotropy with shared QTL regions, since the former are by definition independently regulated (Dosset al. 2005; Schadt et al.2005). For this test to identify in-stances of close linkage effectively, however, these results show that all other QTL regions that jointly affect the two traits must be included in the model.

Unobserved traits can also affect model predictions if they occur at branching points leading to observed traits, as in the latent-direct scenario. In the two latent-direct simulations, neither the direct nor the indirect model was well supported, providing some indication that the true model was more complex, but these results are not

distinguishable from those expected if the true scenario were a combination of direct and indirect effects.

Distinguishing indirect vs. common-environment

effects: These results show that shared environmental effects on a pair of traits may be difficult to distinguish from indirect effects. Models combining direct effects with either indirect effects or with common-environment effects on a pair of traits cannot be distinguished if all QTL affect both traits; both will lack residual degrees of freedom and have perfect fit. It should be noted that similar indistinguishable submodels could potentially be nested within more complex trait networks even when the overall model has residual degrees of freedom. Even if both model structures can be tested, however, as with the loblolly pine years 3–8 analyses, there may be little information to distinguish common-environment effects from indirect effects because both mechanisms will generate trait correlations within QTL genotype classes. When common-environment effects were included in simulations of direct-effects scenarios, indirect models were more strongly supported in most cases. Genes with smaller standardized genetic effects tended to be placed downstream of those with larger genetic effects even when their genetic regulation by shared QTL was independent. In biologically realistic situations, shared environmen-tal effects are probably common because each individual organism in a genetic mapping population is shaped by the unique environment that it has experienced and by the peculiarities of developmental ‘‘noise.’’ These envi-ronmental and developmental effects are likely to have correlated effects on multiple traits (Lynchand Walsh 1998). The influence of common environment on re-sidual trait correlations is well recognized (Steinet al. 2003; Doss et al. 2005; Schadt et al. 2005), but the question of whether or not path models can distinguish environmental correlations from causative indirect ef-fects has received little scrutiny.

(11)

and for these three regions the ratio of standardized homozygous effects on the two traits ranged from 0.44 to 1.09. By contrast, three QTL regions affecting year 8 height and diameter (ignoring LG1-80, which had negligible homozygous effects) had standardized effect ratios in a much narrower range of 1.16–1.37, consistent with the SEM inference of indirect effects on diameter. When heterogeneous QTL effects were simulated, the false support for the indirect model relative to the direct model due to common-environment effects was weak-ened or even reversed. Moreover, genetic and environ-mental correlations may be in opposite directions in actual trait networks, unlike those in our simulations. For example, traits involved in life-history trade-offs are likely to have negative genetic correlations, but shared environmental effects (e.g., variation in nutritional status) may generate positive correlations. In such situations, confounding of common-environment effects with in-direct effects would be unlikely, and in some cases SEM results might even be biased against indirect models.

Sensitivity of the direct models:In contrast with the effects of confounding factors that favor indirect mod-els, these data show that direct models are highly sensitive to any residual correlation between trait values. In all four simulations under simple direct-effects scenar-ios (two backcross and two F2simulations), the correct direct model was rejected as an adequate explanation of the data (LRP-values,0.05) although it outperformed the indirect model. Concomitantly, the full models that included both direct and indirect effects estimated as much as 20% of the effect on T2 to occur via the indirect pathway under the direct-effects scenarios, even though none of the actual effect was indirect. The basis for this bias toward indirect effects is uncertain. Imprecise estimates of the true QTL locations is one possibility, although replacing the estimated QTL locations with the actual simulated locations in one of the simulations did not improve the fit (data not shown). Imprecise estimates of the true QTL effects will result in under- or overfitting of the model to the true sources of effects and could also result in residual correlations. It is also possible that the residual values generated in the QTL Cartographer simulations were not entirely random.

Regardless of the explanation for the lack of complete model fit in the simulated data, actual data from directly regulated trait sets can generally be expected to show even less independence of residuals due to the perva-siveness of common-environment effects and undetected QTL discussed above. Given these realities, standard tests of model fit are inappropriate for evaluating the sufficiency of direct-effects models. Empirical tests will be necessary but not easy to implement, as permutation tests would have to randomize trait values within QTL classes and handling recombinants will present special problems. Parametric bootstrap simulations may be a better option, but the algorithms required to generate hundreds of simulated data sets, estimate QTL locations

and virtual marker scores, and run multiple SEMs on each simulation will be complex. Effects of unobserved QTL could be incorporated as polygenic effects in a parametric bootstrap. Crude but potentially useful estimates of polygenic variance could be obtained from the percentage of differences between parental lines not explained by the identified QTL, especially if the mapping population were derived from divergent pa-rental lines.

In spite of the factors causing bias against the direct model, direct effects were identified as the largest con-tributor to QTL effects in the experimental data on lob-lolly pine years 3–8 growth. These results were consistent witha prioripredictions of the carryover effects of earlier QTL action and possibly common-environment effect as additional contributors to years 3–8 growth. In contrast, QTL effects on year 8 diameter were fully explained as an indirect effect of year 8 height, which is also consistent witha prioripredictions based on allometry theory.

(12)

tool for testing the adequacy of causal hypotheses (Mitchell 1992). Purely exploratory analyses may be useful in systems-biology applications for reducing the set of model structures for further experimentation (or ‘‘sparsification,’’ per Liuet al.2008), but the results of such analyses should not be interpreted as support for particular causal structures.

Summary: Results of this study confirm that the re-sidual correlation structure in multiple-trait QTL data, which is typically unused in QTL analyses, provides a powerful but limited source of information on the func-tional basis of trait variation. Factors likely to be pervasive in real biological data, including common-environment effects and incomplete identification of QTL, can mas-querade as indirect causal effects and lead to incorrect conclusions if not carefully considered. Applications of SEM or other approaches to path analysis for functional predictions from QTL data need to incorporate as much a priori information as possible on the traits being studied, including the expected patterns of common-environment effects on traits, if the predictions are to be biologically useful.

Further studies should be conducted to test the pre-dictive power of SEM under scenarios such as the ones used here. Other studies have used large numbers of simulations to examine the power of network modeling approaches (Schadtet al.2005; Liu et al.2008; Neto et al. 2008), but have not explored the questions ad-dressed in this study. Issues with statistically indistin-guishable models incorporating common-environment vs.indirect effects need to be examined in more com-plex trait networks, where untestable, fully specified modules may be nested within larger models. Future research needs also include evaluating different pat-terns of direct genetic and common-environment ef-fects, developing methods for empirical statistical tests, and incorporating epistatic QTL effects.

I thank Ross Whetten and Elizabeth Lacey for assistance with year 8 data collection from the Tillman, South Carolina, loblolly pine genetic mapping population; Rongling Wu for first suggesting to me the use of path analysis approaches for functional prediction from QTL data; Elizabeth Lacey, Sergey Nuzhdin, and David Aylor for valuable discussions; and Malcolm Schug, Rebecca Doerge, and two anony-mous reviewers for helpful comments that improved the manuscript.

LITERATURE CITED

Akaike, H., 1974 A new look at the statistical identification model.

IEEE Trans. Automat. Contr.19:716–723.

Akaike, H., 1987 Factor analysis and AIC. Psychometrika52:317–332.

AmosDevelopment, 2006 Amos 7.0.0. Amos Development, Spring

House, PA.

Basten, C. J., B. S. Weirand Z.-B. Zeng, 1994 Zmap: a QTL

cartog-rapher. Fifth World Congress on Genetics Applied to Livestock Production: Computing Strategies and Software, edited by C. Smith, J. S. Gavora, B. Benkel, J. Chesnais, W. Fairfull

et al.Organizing Committee, 5th World Congress on Genetics Applied to Livestock Production, Guelph, ON, Canada, pp. 65–66.

Basten, C. J., B. S. Weirand Z.-B. Zeng, 2004 QTL Cartographer,

Version 1.17. North Carolina State University, Raleigh, NC.

Beavis, W. D., 1994 The power and deceit of QTL experiments:

les-sons from comparative QTL studies. Proceedings of the 49th An-nual Corn and Sorghum Industry Research Conference, Chicago, pp. 250–266.

Benfey, P. N., and T. Mitchell-Olds, 2008 From genotype to

phe-notype: systems biology meets natural variation. Science 320:

495–497.

Brem, R. B., G. Yvert, R. Clintonand L. Kruglyak, 2002 Genetic

dissection of transcriptional regulation in budding yeast. Science

296:752–755.

Brewer, M. T., J. B. Moyseenko, A. J. Monforteand E.van der

Knaap, 2007 Morphological variation in tomato: a

comprehen-sive study of quantitative trait loci controlling fruit shape and de-velopment. J. Exp. Bot.58:1339–1349.

Doss, S., E. E. Schadt, T. A. Drakeand A. J. Lusis, 2005 Cis-acting

expression quantitative trait loci in mice. Genome Res.15:681– 691.

Feder, M. E., and T. Mitchell-Olds, 2003 Evolutionary and

ecolog-ical functional genomics. Nat. Rev. Genet.4:651–657. Grace, J. B., and B. H. Pugesek, 1998 On the use of path analysis

and related procedures for the investigation of ecological prob-lems. Am. Nat.152:151–159.

Hall, M. C., C. J. Bastenand J. H. Willis, 2006 Pleiotropic

quan-titative trait loci contribute to population divergence in traits as-sociated with life-history variation inMimulus guttatus.Genetics

172:1829–1844.

Jiang, C.-J., and Z. B. Zeng, 1995 Multiple trait analysis of genetic

mapping for quantitative trait loci. Genetics140:1111–1127. Keurentjes, J. J. B., J. Fu, C. H. Ric deVos, A. Lommen, R. D. Hall

et al., 2006 The genetics of plant metabolism. Nat. Genet.38:

842–849.

Keurentjes, J. J. B., J. Fu, I. R. Terpstra, J. M. Garcia, G.van den

Ackevekenet al., 2007 Regulatory network construction in

Ara-bidopsisby using genome-wide gene expression quantitative trait loci. Proc. Natl. Acad. Sci. USA104:1708–1713.

Keurentjes, J. J. B., M. Koornneef and D. Vreugdenhil,

2008 Quantitative genetics in the age of omics. Curr. Opin. Plant Biol.11:123–128.

Kliebenstein, D. J., M. A. L. West, H.vanLeeuwen, O. Loudet,

R. W. Doergeet al., 2006 Identification of QTLs controlling

gene expression networks defineda priori.BMC Bioinformatics7:

308.

Koonin, E. V., and Y. I. Wolf, 2008 Evolutionary systems biology, pp.

11–25 inEvolutionary Genomics and Proteomics, edited by M. Pagel

and A. Pomiankowski. Sinauer Associates, Sunderland, MA.

Langlade, N. B., X. Feng, T. Dransfield, L. Copsey, A. I. Hanna

et al., 2005 Evolution through genetically controlled allometry space. Proc. Natl. Acad. Sci. USA102:10221–10226.

Li, R., S.-W. Tsaih, K. Shockley, I. M. Stylianou, J. Wergedalet al.,

2006 Structural model analysis of multiple quantitative traits. PLoS Genet.2:e114.

Lisec, J., R. C. Meyer, M. Steinfath, H. Redestig, M. Becheret al.,

2008 Identification of metabolic and biomass QTL in Arabidop-sis thalianain a parallel analysis of RIL and IL populations. Plant J.53:960–972.

Liu, B., A.de laFuenteand I. Hoeschele, 2008 Gene network

in-ference via structural equation modeling in genetical genomics experiments. Genetics178:1763–1776.

Lynch, M., and B. Walsh, 1998 Genetics and Analysis of Quantitative

Traits.Sinauer, Sunderland, MA.

Mackay, T. F. C., 2001 The genetic architecture of quantitative

traits. Annu. Rev. Genet.35:303–339.

Mackay, T. F. C., 2004 Genetic dissection of quantitative traits, pp. 51–

73 inThe Evolution of Population Biology, edited by R. S. Singhand M.

Y. Uyenoyama. Cambridge University Press, Cambridge, UK.

MacKay, J. J., D. M. O’Malley, T. Presnell, F. L. Booker, M. M.

Campbellet al., 1997 Inheritance, gene expression, and lignin

characterization in a mutant pine deficient in cinnamyl alcohol dehydrogenase. Proc. Natl. Acad. Sci. USA94:8255–8260. Mitchell, R. J., 1992 Testing evolutionary and ecological

hypothe-ses using path analysis and structural equation modeling. Funct. Ecol.6:123–129.

Mu¨ ndermann, L., Y. Erasmus, B. Lane, E. Coenand P. Prusinkiewicz,

(13)

Neto, E. C., C. T. Ferrara, A. D. Attie and B. S. Yandell,

2008 Inferring causal phenotype networks from segregating populations. Genetics179:1089–1100.

Niklas, K. J., 1995 Size-dependent allometry of tree height,

diame-ter and trunk-taper. Ann. Bot.75:217–227.

Ralph, J., J. J. MacKay, R. D. Hatfield, D. M. O’Malley, R. W.

Whettenet al., 1997 Abnormal lignin in a loblolly pine

mu-tant. Science277:235–239.

Rauh, B. L., C. Bastenand E. S. Buckler, 2002 Quantitative trait

loci analysis of growth response to varying nitrogen sources in

Arabiodpsis thaliana.Theor. Appl. Genet.104:743–750. Remington, D. L., and D. M. O’Malley, 2000a Evaluation of major

genetic loci contributing to inbreeding depression for survival and early growth in a selfed family of Pinus taeda. Evolution

54:1580–1589.

Remington, D. L., and D. M. O’Malley, 2000b Whole-genome

characterization of embryonic stage inbreeding depression in a selfed loblolly pine family. Genetics155:337–348.

Remington, D. L., R. W. Whetten, B.-H. Liuand D. M. O’Malley,

1999 Construction of an AFLP genetic map with nearly complete genome coverage inPinus taeda.Theor. Appl. Genet.98:1279– 1292.

SAS Institute, 2002 The SAS System for Windows, Version 9.1.Cary,

NC.

Schadt, E. E., J. Lamb, X. Yang, J. Zhu, S. Edwards et al.,

2005 An integrative genomics approach to infer causal asso-ciations between gene expression and disease. Nat. Genet.37:

710–717.

Stein, C. M., Y. Song, R. C. Elston, G. Jun, H. K. Tiwariet al.,

2003 Structural equation model-based genome scan for the metabolic syndrome. BMC Genet.4(Suppl. 1): S99.

Tonsor, S. J., and S. M. Scheiner, 2007 Plastic trait integration

across a CO2 gradient in Arabidopsis thaliana. Am. Nat. 169:

E119–E140.

Wright, S., 1921 Correlation and causation. J. Agric. Res.20:557–

585.

Zhu, J., P. Y. Lum, J. Lamb, D. GuhaThaktura, S. W. Edwardset al.,

2004 An integrative genomics approach to the reconstruction of gene networks in segregating populations. Cytogenet. Ge-nome Res.105:363–374.

Zhu, J., B. Zhang, E. N. Smith, B. Drees, R. B. Bremet al., 2008

In-tegrating large-scale functional genomic data to dissect the com-plexity of yeast regulatory networks. Nat. Genet.40:854–861.

Figure

TABLE 1
TABLE 2
TABLE 4
TABLE 6

References

Related documents

The figure below depicts the main elements of the search form in this mode and the corresponding search process behavior in the CQC Catalogue (quality reports not in the scope of

I am going to emphasize three of them; data about fraud losses, the need for competitive markets and the need for fairness.. The knowledge assumption points to the need

ink level, checking 17 inserting 17 replacing 17 storing 18, 42 testing 40 printer accessories 9, 42 cleaning 39 connecting 35 documentation 3 error messages 53 menu 10 parts 4

Economic impact assessments are the most commonly used form of measuring the economic benefits of cultural sector organisations, so a second case study that used this method has

What if the project team were able to provide a complete set of the business requirements previously captured, review them with the users who have the changes in

While the focus of this research was to study the role of training, incentives, and benefits received by part-time hotel employees on job commitment, it is possible that other

 Cohort paper: Data Resource Profile: The Aarhus Birth Cohort Biobank (ABC

Sustainable development for leadership is the development of leadership qualities that meet the economic, environmental, and social needs of the present while preparing to meet