Estimating polygenic models for multivariate data on large pedigrees.


E. A. Thompson* and R. G. Shaw†

*Department of Statistics, University of Washington, Seattle, Washington 98115, and †Department of Botany and Plant Sciences, University of California, Riverside, California 92521

Manuscript received July 15, 1991. Accepted for publication April 28, 1992

ABSTRACT

We have developed algorithms for the likelihood estimation of additive genetic models for quantitative traits on large pedigrees. The approach uses the expectation-maximization (EM) algorithm, but avoids intensive computation. In this paper, we focus on extensions of previous work to the case of multivariate data. We exemplify the approach by analyses of bivariate data on a four-generation, 949-member pedigree of the snail Lymnaea elodes, and on a three-generation pedigree of the guppy Poecilia reticulata containing about 400 individuals.

IN genetic studies of the inheritance of continuous traits, measures of multiple traits are often available on each member of a set of related individuals. It is then of interest to determine genetic and environmental components of covariance between traits, in addition to the genetic and environmental components of variance of each trait. The genetic covariance between traits reflects their shared genetic basis, due either to segregation of pleiotropic alleles or to linkage disequilibrium. The environmental covariance between traits quantifies the degree to which they are jointly influenced by environmental variation. Magnitudes of both types of covariance component, and their associated correlations, can suggest mechanistic bases for phenotypic associations between traits. Genetic covariances, moreover, are involved in the connected goals of predicting the joint response of multiple characters to selection on all or any subset of the characters (FALCONER 1989; LANDE 1979) and of deriving selection indices used to maximize genetic improvement of multiple traits simultaneously (FALCONER 1989). Over several decades, these goals have motivated massive efforts toward estimation of quantitative genetic parameters in populations of domesticated plants and animals. Moreover, many recent studies of natural populations similarly have partitioned phenotypic variances and covariances with the objective of predicting response to natural selection.

The statistical problem of estimation of variance and covariance components for random effects in a mixed linear model has a long history (SEARLE 1971). However, a great deal of attention has recently focused on methods based on the principle of maximum likelihood (HARTLEY and RAO 1967; PATTERSON and THOMPSON 1971). Extensions of likelihood methods for the multivariate case were developed by THOMPSON (1973) for a two-way classification. SCHAEFFER, WILTON and THOMPSON (1978) extended these methods to infer the genetic components of variance and covariance of yearling weights in male and female cattle.

Genetics 131: 971-978 (August, 1992)

General treatments of likelihood analysis in the multivariate case have been presented by LANGE and BOEHNKE (1983) and SHAW (1987). The computational burden of repeated operations on large matrices involved in iterative solution has, however, discouraged application of these general methods. Others, taking account of particular aspects of commonly used experimental designs, have introduced specialized algorithms that simplify the computations and render likelihood analysis feasible for very large data structures. For example, MEYER (1985) provides an algorithm using FISHER'S method of scoring to obtain restricted maximum likelihood estimates (REMLEs) from data in a two-way classification, employing a transformation to canonical scale so that the multivariate analysis can be carried out on the new variates singly. GIANOLA (1986), considering data in the same two-way design, presents an EM algorithm for REMLEs of variances and covariances when these may differ among levels of the fixed factor. Previous developments in covariance estimation, especially in the context of animal breeding, were reviewed by HENDERSON (1986a), who urged application of these methods in multiple-trait evaluations (HENDERSON 1988).

[...] remains computationally demanding if data are taken in a nonstandard design; for example, an extended pedigree with some intervening relatives unavailable for study. Some authors have therefore proposed alternative estimators and advocated them over likelihood methods on the basis of computational ease. For example, SRIVASTAVA, KEEN and KATAPA (1988) proposed one alternative method and illustrated it using bivariate data on 92 families of first-degree relatives. They note the similarity between their estimates and estimates based on likelihood and remark on the vast difference in computational effort for the two methods.

We recently presented details of an EM algorithm that minimizes matrix manipulation in obtaining MLEs for variance components of a single trait taken on an arbitrary pedigree (THOMPSON and SHAW 1990). GUO and THOMPSON (1991) have developed a Monte Carlo EM approach allowing extension to more complex genetic models with multiple fixed and random heritable effects. In this paper, we retain a genetic model with only additive genetic random effects, but extend the development of THOMPSON and SHAW (1990) to accommodate estimation of variance and covariance components for joint analysis of q characters, showing that this requires only repeated inversion of q by q matrices. The approach has been implemented for the case of two characters; in this case, formulae for the inverses and eigenvalues of the 2 by 2 matrices involved are of course explicitly available.

THEORY AND METHODS

The univariate case; basic equations: THOMPSON and SHAW (1990) show how the simple additive polygenic model may be fitted to a quantitative trait observed on members of a large and complex pedigree, without the need for inversion of large matrices. For clarity, before developing the extension to a set of traits which may be genetically and environmentally correlated, we first review some of the EM equations from the univariate case.

The model for the vector of $k$ observed values of the trait, $y$, is

$$y = \mu + z + e \tag{1}$$

where $\mu$ is a vector of fixed effects, which may include effects of any observable covariates (for example, sex and/or generation effects) as well as an overall mean, $e$ is the vector of environmental residual effects and is $N(0, \tau^2 I)$, $z$ is the vector of additive genetic values for the observed individuals and is $N(0, \sigma^2 G)$ where $G$ is the "numerator relationship matrix" for the $k$ observed individuals. (The elements of $G$ are twice the coefficients of kinship: CROW and KIMURA 1970). The variance of $y$ is thus $V = \sigma^2 G + \tau^2 I$.

The form of the EM equations for this model is well known (HARTLEY and RAO 1967; HENDERSON 1986b); for simplicity they are given here for the case $\mu = \mu\mathbf{1}$, a common mean for all observations. The natural sufficient statistics for the "complete data" $(y, z)$ are $k^{-1}\mathbf{1}'(y - z)$, $k^{-1}z'G^{-1}z$ and $k^{-1}e'e$ whose unconditioned expectations are $\mu$, $\sigma^2$ and $\tau^2$, respectively. Thus the iterative equations are formed by setting new values for these parameters (denoted here by *) equal to the conditional expectations of the statistics taken at current parameter values:

$$\begin{aligned}
\mu^* &= k^{-1} E_{\mu,\sigma^2,\tau^2}(\mathbf{1}'(y - z) \mid y) = k^{-1}\mathbf{1}'(y - a), \\
\sigma^{2*} &= k^{-1} E_{\mu,\sigma^2,\tau^2}(z'G^{-1}z \mid y) = k^{-1}\left(a'G^{-1}a + \mathrm{tr}(G^{-1}W)\right), \\
\tau^{2*} &= k^{-1} E_{\mu,\sigma^2,\tau^2}(e'e \mid y) = k^{-1}\left(h'h + \mathrm{tr}(W)\right),
\end{aligned} \tag{2}$$

where

$$a = E_{\mu,\sigma^2,\tau^2}(z \mid y) = \tau^{-2} W x, \quad h = E_{\mu,\sigma^2,\tau^2}(e \mid y) = x - a, \quad x = y - \mu\mathbf{1} \tag{3}$$

and

$$W = \mathrm{Var}(z \mid y) = (\tau^{-2} I + \sigma^{-2} G^{-1})^{-1} = \sigma^2 \tau^2 G V^{-1} = \tau^2 (I - \tau^2 V^{-1}). \tag{4}$$

These alternative equivalent representations of the conditional covariance matrix $W$ result from the fact that we have only two random effects, whose conditional variances are thus equal. From these two forms, we see that the trace terms in Equation 2 are readily known if the eigenvalues of $V^{-1}$ are known, and the key to the approach that avoids direct computation of inverses and traces of large matrices results from the fact that these eigenvalues are simple functions of the eigenvalues of $G$.
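As a minimal sketch of how the trace terms of Equation 2 follow from the eigenvalues of G alone, the fragment below assumes NumPy and a toy four-member relationship matrix (two unrelated parents and their two offspring). The function name `em_trace_terms` and the parameter values are illustrative assumptions, not part of the authors' programs. It relies only on the fact that an eigenvalue λ of G gives the eigenvalue σ²λ + τ² of V.

```python
import numpy as np

def em_trace_terms(lam, sigma2, tau2):
    """Return (tr(W), tr(G^{-1} W)) from the eigenvalues `lam` of G."""
    v = sigma2 * lam + tau2        # eigenvalues of V = sigma2*G + tau2*I
    w = tau2 - tau2 ** 2 / v       # eigenvalues of W = tau2*(I - tau2*V^{-1})
    ginv_w = sigma2 * tau2 / v     # eigenvalues of G^{-1} W = sigma2*tau2*V^{-1}
    return w.sum(), ginv_w.sum()

# Toy relationship matrix (hypothetical): two unrelated parents, two offspring.
G = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])

lam = np.linalg.eigvalsh(G)        # needed once only, reused at every EM step
tr_W, tr_GinvW = em_trace_terms(lam, sigma2=0.3, tau2=0.7)
```

Once `lam` is stored, each EM iteration costs only O(k) arithmetic for the trace terms, whatever the current values of the two variance components.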

Finally, $a$ and $h = x - a$ are computed by iterative updating of the local conditional expectations of the breeding value $z_l$ of each individual $l$ in turn, given the parameter values, the data $y_l$ (if $l$ is observed) and the current estimates of the expected breeding values of all pedigree neighbors of $l$ (THOMPSON and SHAW 1990). A useful form for the value of the log-likelihood is (THOMPSON and SHAW 1990)

$$-\frac{1}{2}\left[\log|V| + \frac{x'x}{\tau^2} - \frac{a'a}{\tau^2} - \frac{a'G^{-1}a}{\sigma^2}\right]. \tag{5}$$

Additionally

$$x = a + h, \quad a = \tau^{-2} W x, \quad\text{so}\quad a = \tau^{-2}\sigma^2 G h. \tag{6}$$

This enables $a'G^{-1}a$ to be computed as $\tau^{-4}\sigma^4 h'Gh$, so that $G^{-1}$ is nowhere required in the implementation of the EM equations or log-likelihood evaluation.
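The univariate update and the log-likelihood form can be sketched as follows, assuming NumPy. One loud assumption: a dense linear solve stands in for the iterative local updating of breeding values described in the text, which is only sensible for a toy pedigree. The names `em_step` and `loglik`, the toy matrix G and the trait values are hypothetical.

```python
import numpy as np

def em_step(y, G, lam, mu, sigma2, tau2):
    """One update of (mu, sigma2, tau2) following Equations 2, 3 and 6."""
    k = len(y)
    x = y - mu
    V = sigma2 * G + tau2 * np.eye(k)
    # Assumption: dense solve replaces the paper's iterative local updates.
    h = tau2 * np.linalg.solve(V, x)            # h = E(e | y)
    a = x - h                                   # a = E(z | y)
    v = sigma2 * lam + tau2                     # eigenvalues of V
    tr_W = np.sum(sigma2 * tau2 * lam / v)
    tr_GinvW = np.sum(sigma2 * tau2 / v)
    quad = (sigma2 / tau2) ** 2 * (h @ G @ h)   # a'G^{-1}a without forming G^{-1}
    return np.mean(y - a), (quad + tr_GinvW) / k, (h @ h + tr_W) / k

def loglik(y, G, lam, mu, sigma2, tau2):
    """Log-likelihood, additive constant dropped, in the form of Equation 5."""
    k = len(y)
    x = y - mu
    h = tau2 * np.linalg.solve(sigma2 * G + tau2 * np.eye(k), x)
    a = x - h
    quad = (sigma2 / tau2) ** 2 * (h @ G @ h)
    logdet = np.sum(np.log(sigma2 * lam + tau2))
    return -0.5 * (logdet + x @ x / tau2 - a @ a / tau2 - quad / sigma2)

# Toy data (hypothetical): two unrelated parents and their two offspring.
G = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
lam = np.linalg.eigvalsh(G)
y = np.array([1.2, 0.7, 1.5, 0.9])

mu, sigma2, tau2 = 1.0, 0.3, 0.7
for _ in range(20):
    mu, sigma2, tau2 = em_step(y, G, lam, mu, sigma2, tau2)
```

With only four observations the estimates themselves carry no meaning; the point is that neither function ever inverts G.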

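Breeding values of unobserved individuals: In this [...] exponent in the joint normal probability density of $x$ (for the observed individuals, "o") and $z_+' = (z_o', z_u')$ for any larger set possibly including unobserved individuals ("u") is

$$-\frac{1}{2}\left[\frac{1}{\sigma^2} z_+' G_+^{-1} z_+ + \frac{1}{\tau^2}(x - z_o)'(x - z_o)\right],$$

where $G_+$ is the numerator relationship matrix for the set of individuals in the vector $z_+$. So if $a_+ = E(z_+ \mid y)$, which maximizes this exponent over $z_+$, the vector elements corresponding to the observed individuals satisfy the equation

$$\frac{a_o}{\tau^2} + \left[\frac{1}{\sigma^2} G_+^{-1} a_+\right]_o = \frac{x}{\tau^2} \tag{7}$$

and for the unobserved individuals

$$\left[\frac{1}{\sigma^2} G_+^{-1} a_+\right]_u = 0. \tag{8}$$

Since $x = a_o + h_o$, (7) becomes

$$\left[\frac{1}{\sigma^2} G_+^{-1} a_+\right]_o = \frac{h_o}{\tau^2}. \tag{9}$$

Combining (9) and (8), in the case where no unobserved individuals are included in $z_+$, rearrangement gives Equation 6. More generally,

$$\frac{1}{\sigma^2} G_+^{-1} a_+ = \frac{1}{\tau^2} h_+ \tag{10}$$

where $h_+$ is $h_o$ augmented by zeros.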
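The relation between the expected breeding values for the larger set and the zero-padded residuals can be checked numerically. The sketch below assumes NumPy and, as a stand-in for a real numerator relationship matrix, a random positive-definite matrix `G_plus` whose first k rows and columns play the role of the observed individuals; all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 4                          # n individuals in the larger set, first k observed
M = rng.standard_normal((n, n))
G_plus = M @ M.T / n + np.eye(n)     # stand-in positive-definite "relationship" matrix
G = G_plus[:k, :k]                   # sub-matrix for the observed individuals
sigma2, tau2 = 0.4, 0.6
x = rng.standard_normal(k)           # centred observations y - mu

V = sigma2 * G + tau2 * np.eye(k)
Vinv_x = np.linalg.solve(V, x)

# E(z_+ | y) = Cov(z_+, y) V^{-1} x for all n individuals
a_plus = sigma2 * G_plus[:, :k] @ Vinv_x

# h = E(e | y) for observed individuals, padded with zeros for the rest
h_plus = np.zeros(n)
h_plus[:k] = tau2 * Vinv_x

lhs = np.linalg.solve(G_plus, a_plus) / sigma2
rhs = h_plus / tau2
```

Here `lhs` and `rhs` agree to rounding error, and the last n - k entries of both are zero, which is the content of the equation for the unobserved individuals.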

The case of multivariate traits; basic equations: For a set of observations $y$ on a multivariate trait, let $y_{i\cdot}$ be the subvector of observations for trait $i$, so that (1) becomes

$$y_{i\cdot} = \mu_{i\cdot} + z_{i\cdot} + e_{i\cdot} \quad\text{or}\quad x_{i\cdot} = y_{i\cdot} - \mu_{i\cdot} = z_{i\cdot} + e_{i\cdot} \quad\text{for } i = 1, \ldots, q \tag{11}$$

where $z_{i\cdot}$ is distributed $N(0, \sigma_{ii} G)$ and $e_{i\cdot}$ is $N(0, \tau_{ii} I)$. Further let the genetic and environmental covariances between traits $i$ and $j$ be $\sigma_{ij}$ and $\tau_{ij}$ respectively. We assume the $k$ individuals observed are the same for all traits (see DISCUSSION). Thus the total vector $y' = (y_{11}, \ldots, y_{1k}, \ldots, y_{q1}, \ldots, y_{qk})$ is normally distributed with mean $\mu' = (\mu_{1\cdot}', \ldots, \mu_{q\cdot}')$ and $qk \times qk$ variance covariance matrix

$$V = A \otimes G + T \otimes I$$

where $A = (\sigma_{ij})$ and $T = (\tau_{ij})$ are the $q \times q$ symmetric matrices of additive and residual variance component parameters. Many of the EM equations go through directly; the natural sufficient statistics corresponding to the variance parameters in $A$ and $T$ for the "complete-data" model (with $z$ observed) are

$$z_{i\cdot}' G^{-1} z_{j\cdot}, \quad e_{i\cdot}' e_{j\cdot}, \quad\text{for } i, j = 1, \ldots, q, \; i \le j.$$

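Thus we require the conditional expectations of these statistics, given the data $y$. The key ingredient is the conditional variance of $z$ given $y$. This is (see Equation 4)

$$\begin{aligned}
W &= \left((T \otimes I)^{-1} + (A \otimes G)^{-1}\right)^{-1} \\
  &= (A \otimes G) V^{-1} (T \otimes I) \\
  &= (T \otimes I)\left(I - V^{-1}(T \otimes I)\right),
\end{aligned} \tag{12}$$

so that

$$W^{-1} = (A \otimes G)^{-1} + (T \otimes I)^{-1} = A^{-1} \otimes G^{-1} + T^{-1} \otimes I.$$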

Further, with regard to the eigenvalues and eigenvectors of this variance matrix, which determine the trace terms required for the EM algorithm, suppose that $g$ is an eigenvector of $G$ and $\alpha$ is an eigenvector of $A$. Then $\alpha \otimes g$ is an eigenvector of the Kronecker product $A \otimes G$, with eigenvalue equal to the product of the two corresponding eigenvalues. Further, let $V_{ij} = \sigma_{ij} G + \tau_{ij} I$ be the block of $V$ corresponding to the pair of traits $i$ and $j$, and $V^{ij}$ the corresponding block of the inverse $V^{-1}$. (Of course, $V^{ij}$ is not the inverse of $V_{ij}$.) Now if $g$ is an eigenvector of $G$ with corresponding eigenvalue $\lambda$, it is an eigenvector of $V_{ij}$ with eigenvalue

$$s_{ij} = \lambda\sigma_{ij} + \tau_{ij}$$

and thus also of each subblock of $G^{-1}V$. Moreover (see APPENDIX) $g$ is also an eigenvector of each block $V^{ij}$ of $V^{-1}$ with eigenvalue $s^{ij}$, where the matrix $(s^{ij})$ is the inverse of the $q \times q$ matrix $S_\lambda = (s_{ij})$.
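The eigenvector property of the blocks of the inverse can be illustrated numerically. The sketch below assumes NumPy, a toy relationship matrix and made-up 2 by 2 matrices A and T; it forms V and its inverse densely (feasible only because the example is tiny) and records the largest discrepancy between each block applied to g and the corresponding element of the inverse of λA + T times g.

```python
import numpy as np

# Toy inputs (hypothetical): two unrelated parents and their two offspring.
G = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
A = np.array([[0.3, 0.1], [0.1, 0.2]])   # additive (co)variances
T = np.array([[0.7, 0.2], [0.2, 0.5]])   # residual (co)variances
k, q = G.shape[0], A.shape[0]

V = np.kron(A, G) + np.kron(T, np.eye(k))
V_inv = np.linalg.inv(V)

def block(M, i, j):
    """k x k block of M for the trait pair (i, j)."""
    return M[i * k:(i + 1) * k, j * k:(j + 1) * k]

lam, vecs = np.linalg.eigh(G)
max_err = 0.0
for l, g in zip(lam, vecs.T):
    S_inv = np.linalg.inv(l * A + T)     # the q x q matrix of elements s^{ij}
    for i in range(q):
        for j in range(q):
            err = np.abs(block(V_inv, i, j) @ g - S_inv[i, j] * g).max()
            max_err = max(max_err, err)
```

In this example `max_err` is at the level of rounding error, so only the small matrices λA + T need inverting in practice.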

Thus we have, from the eigenvalues of $G$, the eigenvalues of all the matrices appearing in the trace terms of the EM equations; this will enable the EM algorithm to be implemented without inversion of more than the $q \times q$ matrices $S_\lambda$. Note, however, that there are $k$ eigenvectors $g$ of $G$, each with a corresponding eigenvalue $\lambda$ and matrix $S_\lambda$. Moreover, these matrices change as iteration changes the parameter estimates $(\sigma_{ij})$ and $(\tau_{ij})$. Although this is ameliorated to some degree by the fact that, for many pedigrees, large numbers of the eigenvalues of $G$ may be equal, computational feasibility will be limited to fairly small values of $q$.

The case $q = 2$ is very readily implemented, for in this case explicit formulae are available. For the relationship between $V^{ij}$ and $V_{ij}$ we have

$$V^{11} = (V_{11} - V_{12} V_{22}^{-1} V_{12})^{-1}, \quad V^{12} = -V_{22}^{-1} V^{11} V_{12},$$

with analogous formulas for $V^{21}$ and $V^{22}$. Correspondingly, we have

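$$s^{11} = (s_{11} - s_{12}^2/s_{22})^{-1}, \quad s^{12} = -s_{22}^{-1} s^{11} s_{12} = -s_{11}^{-1} s^{22} s_{12}, \quad\text{and}\quad s^{22} = (s_{22} - s_{12}^2/s_{11})^{-1},$$

where

$$s_{11} = \sigma_{11}\lambda + \tau_{11}, \quad s_{12} = \sigma_{12}\lambda + \tau_{12}, \quad\text{and}\quad s_{22} = \sigma_{22}\lambda + \tau_{22}.$$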
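For two traits the explicit formulas translate directly into a few lines of arithmetic. The function name `s_inverse_2x2` and the numerical values of A, T and λ below are illustrative assumptions.

```python
def s_inverse_2x2(lam, A, T):
    """Elements (s^11, s^12, s^22) of the inverse of S = lam*A + T for q = 2."""
    s11 = A[0][0] * lam + T[0][0]
    s12 = A[0][1] * lam + T[0][1]
    s22 = A[1][1] * lam + T[1][1]
    u11 = 1.0 / (s11 - s12 ** 2 / s22)   # s^11
    u22 = 1.0 / (s22 - s12 ** 2 / s11)   # s^22
    u12 = -u11 * s12 / s22               # s^12
    return u11, u12, u22

A = [[0.3, 0.1], [0.1, 0.2]]   # illustrative additive (co)variances
T = [[0.7, 0.2], [0.2, 0.5]]   # illustrative residual (co)variances
u11, u12, u22 = s_inverse_2x2(1.5, A, T)
```

Since many pedigrees have repeated eigenvalues of G, the result could be cached per distinct λ within an iteration.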

The decomposition of eigenvalues via the matrices $S_\lambda$ may be compared with the canonical transformations approach of MEYER (1985); these transformations likewise depend on current variance component estimates. In that case also there are explicit formulae in the case $q = 2$ (JUGA and THOMPSON 1990).

EM equations for residual variances: For the EM equations for the elements of $T$ we have (see Equation 2)

$$\tau_{ij}^* = k^{-1} E(e_{i\cdot}' e_{j\cdot} \mid y) = k^{-1}\left((x - a)_{i\cdot}'(x - a)_{j\cdot} + \mathrm{tr}(W_{ij})\right) = k^{-1}\left(h_{i\cdot}' h_{j\cdot} + \mathrm{tr}(W_{ij})\right). \tag{13}$$

The trace term is most readily computed, as in the univariate case, by using $W = (T \otimes I)(I - V^{-1}(T \otimes I))$ so that

$$W_{ij} = \tau_{ij} I - \sum_{l=1}^{q}\sum_{m=1}^{q} \tau_{il}\tau_{jm} V^{lm}.$$

Thus the eigenvalues of $W_{ij}$ are given by

$$\tau_{ij} - \sum_{l=1}^{q}\sum_{m=1}^{q} \tau_{il}\tau_{jm} s^{lm}$$

or the $(i,j)$ element of the matrix $(T - T S_\lambda^{-1} T)$. Each eigenvalue $\lambda$ of $G$ gives one matrix $S_\lambda = \lambda A + T$, and hence one eigenvalue for each of the $W_{ij}$.
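A short sketch of the trace computation for the residual update, assuming NumPy: summing the small matrix T minus T times the inverse of λA + T times T over the eigenvalues λ of G gives all the block traces at once. The function name `trace_W_blocks` and the toy inputs are hypothetical.

```python
import numpy as np

def trace_W_blocks(lam, A, T):
    """q x q matrix whose (i, j) entry is tr(W_ij), from the eigenvalues of G."""
    total = np.zeros_like(T, dtype=float)
    for l in lam:
        S = l * A + T                           # S_lambda
        total += T - T @ np.linalg.solve(S, T)  # one eigenvalue of every W_ij
    return total

# Toy inputs (hypothetical): two unrelated parents and their two offspring.
G = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
A = np.array([[0.3, 0.1], [0.1, 0.2]])
T = np.array([[0.7, 0.2], [0.2, 0.5]])

lam = np.linalg.eigvalsh(G)
tr_W = trace_W_blocks(lam, A, T)
```

The loop touches only q by q matrices, so its cost grows linearly in the number of observed individuals.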

Also, $a_{i\cdot}$ and hence $h_{i\cdot} = x_{i\cdot} - a_{i\cdot}$ $(i = 1, \ldots, q)$ can be computed iteratively. Analogously to the univariate case, the $q$-vector of breeding values for each individual in turn is updated to its conditional expectation given the trait values for the individual (if observed) and the $q$-vectors of breeding values for neighboring pedigree members. The $q \times q$ matrices $A^{-1}$ and $T^{-1}$ replace the univariate components $\sigma^{-2}$ and $\tau^{-2}$ in the equation of THOMPSON and SHAW (1990). The multivariate equations are tedious; genetic and environmental covariances mean that the computation of conditional means cannot be done separately for each trait. In the case $q = 2$, explicit formulae are available and the iterative evaluation can be implemented readily; as in the univariate case convergence of breeding values is rapid. More generally, only the inverses of $q \times q$ matrices are involved. Thus the EM equations (13) can be implemented, provided $q$ is not large.

EM for genetic variance components: In the EM equation for $\sigma_{ij}$, the relevant natural sufficient statistic is $z_{i\cdot}' G^{-1} z_{j\cdot}$ whose unconditional expectation is $k\sigma_{ij}$. The EM Equation 2 becomes

$$\sigma_{ij}^* = k^{-1} E(z_{i\cdot}' G^{-1} z_{j\cdot} \mid y) = k^{-1}\left(a_{i\cdot}' G^{-1} a_{j\cdot} + \mathrm{tr}(G^{-1} W_{ij})\right), \quad i, j = 1, \ldots, q, \; i \le j, \tag{14}$$

where $a = E(z \mid y)$ and, since, using (12) above

$$G^{-1} W_{ij} = \sum_{l=1}^{q}\sum_{m=1}^{q} \sigma_{il}\tau_{jm} V^{lm},$$

the eigenvalues of $G$ determine those of $G^{-1} W_{ij}$, as

$$\sum_{l=1}^{q}\sum_{m=1}^{q} \sigma_{il}\tau_{jm} s^{lm}$$

or as the $(i, j)$ element of the matrix $A S_\lambda^{-1} T$. Determination of the conditional means $a_{i\cdot}$ was already described above. Thus, if $G^{-1}$ were known, we could implement the EM equations (14). However, these EM equations involve the quadratic forms $a_{i\cdot}' G^{-1} a_{j\cdot}$. Although there are analogues to the univariate Equation 6 that avoid direct use of $G^{-1}$, these are not computationally appealing. Moreover, since this $G$ is the numerator relationship matrix for the observed individuals, $G^{-1}$ is not readily available. Further, neither $G$ nor $G^{-1}$ is sparse. It is highly desirable to avoid storing either matrix. The following result shows that use of these matrices can be avoided.
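Putting the pieces together, one multivariate update of the additive and residual matrices might be sketched as below, assuming NumPy. Two loud assumptions: the conditional means are obtained by a dense solve rather than by iterative local updating, and the quadratic forms use a dense solve with G directly, which is exactly what the text argues should be avoided on a real pedigree. The function name `em_update` and all inputs are hypothetical.

```python
import numpy as np

def em_update(X, G, lam, A, T):
    """One EM update of (A, T); X is q x k with row i holding x_i. = y_i. - mu_i."""
    q, k = X.shape
    V = np.kron(A, G) + np.kron(T, np.eye(k))
    U = np.linalg.solve(V, X.reshape(-1)).reshape(q, k)  # V^{-1} x, one row per trait
    H = T @ U                      # h = (T (x) I) V^{-1} x = E(e | y)
    a = X - H                      # a = E(z | y)
    tr_W = np.zeros((q, q))
    tr_GW = np.zeros((q, q))
    for l in lam:
        S_inv = np.linalg.inv(l * A + T)
        tr_W += T - T @ S_inv @ T  # eigenvalues of the W_ij
        tr_GW += A @ S_inv @ T     # eigenvalues of the G^{-1} W_ij
    # Assumption: dense G^{-1} used for the quadratic forms in this toy example.
    quad = a @ np.linalg.solve(G, a.T)
    return (quad + tr_GW) / k, (H @ H.T + tr_W) / k

# Toy inputs (hypothetical): two unrelated parents and their two offspring.
G = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
A = np.array([[0.3, 0.1], [0.1, 0.2]])
T = np.array([[0.7, 0.2], [0.2, 0.5]])
X = np.array([[0.2, -0.5, 0.4, -0.1],
              [0.3, -0.2, 0.6, -0.4]])   # made-up centred trait values

lam = np.linalg.eigvalsh(G)
A_new, T_new = em_update(X, G, lam, A, T)
```

Both returned matrices are symmetric, so only the elements with i ≤ j need to be stored.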

A useful result: As above, let $G$ be the numerator relationship matrix for the observed individuals, $G_+$ be the matrix for any larger set, and $G_0$ be $G$ augmented by rows and columns of zeros to make it of the dimension of $G_+$. Let $a_{i\cdot}$ be the expected breeding values for the $i$th trait for the observed individuals (as in Equation 14), and $a_i^{(+)}$ for the larger set of individuals corresponding to the individuals in $G_+$. Let $h = x - a$ be the conditional expected individual residuals for the observed individuals and $h^{(+)}$ the vector $h$ augmented by zeros. Now the multivariate analogue of Equation 10 is

$$(A^{-1} \otimes G_+^{-1}) a^{(+)} = (T^{-1} \otimes I) h^{(+)}.$$

That is,

$$a^{(+)} = (A T^{-1} \otimes G_+) h^{(+)} \quad\text{and}\quad a^{(0)} = (A T^{-1} \otimes G_0) h^{(+)},$$

where $a^{(0)}$ denotes $a$ augmented by zeros, the last equality deriving from the zeros in $h^{(+)}$. Thus

$$(I \otimes G_+^{-1}) a^{(+)} = (A T^{-1} \otimes I) h^{(+)} = (I \otimes G_0^{-}) a^{(0)},$$

where $G_0^{-}$ is $G^{-1}$ augmented by rows and columns of zeros. Hence,

$$a_i^{(+)\prime} G_+^{-1} a_j^{(+)} = a_i^{(0)\prime} G_0^{-} a_j^{(0)}$$

or, noting the zeros,

$$a_i^{(+)\prime} G_+^{-1} a_j^{(+)} = a_{i\cdot}' G^{-1} a_{j\cdot}. \tag{15}$$

[...] (16) can use any chosen set of individuals provided it includes at least the observed individuals; it need not be the same set as is used in computing the trace terms. If $G^{-1}$ is more readily available for a larger set, this can be used; HENDERSON (1976) gives an algorithm for the determination of the inverse of the numerator relationship matrix for the full pedigree. Moreover, in the case of a pedigree without inbreeding, this algorithm requires neither recursion nor iteration, contributions to the components of this inverse matrix being immediately determined from the neighborhood structure (parents, spouses, and offspring) of each individual in turn. Additionally the matrix is sparse; only its nonzero terms and their readily determined locations need be held by the program. Thus all the EM equations for the residual and additive variance parameters can be implemented, while those for the means (fixed effects) are analogous to the single-trait case.
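For a pedigree without inbreeding, the direct construction of the sparse inverse can be sketched with the standard rules usually attributed to HENDERSON (1976): each individual contributes to its own diagonal element and to the elements linking it to, and between, its known parents. The code below assumes NumPy, a hypothetical six-member pedigree listed parents before offspring, and hypothetical function names; the dense tabular construction `relationship_matrix` is included only to check the result on this small example.

```python
import numpy as np

def henderson_inverse(pedigree):
    """Nonzero terms of the inverse relationship matrix, non-inbred pedigree.

    `pedigree` lists (sire, dam) index pairs, None meaning unknown,
    with parents appearing before their offspring.
    """
    inv = {}
    def add(i, j, v):
        inv[(i, j)] = inv.get((i, j), 0.0) + v
    for i, parents in enumerate(pedigree):
        known = [p for p in parents if p is not None]
        d = 4.0 / (4 - len(known))     # 1, 4/3 or 2 for 0, 1 or 2 known parents
        add(i, i, d)
        for p in known:
            add(i, p, -d / 2)
            add(p, i, -d / 2)
            for r in known:
                add(p, r, d / 4)
    return inv

def relationship_matrix(pedigree):
    """Dense numerator relationship matrix by the tabular method (for checking)."""
    n = len(pedigree)
    Gm = np.zeros((n, n))
    for i, parents in enumerate(pedigree):
        known = [p for p in parents if p is not None]
        for j in range(i):
            Gm[i, j] = Gm[j, i] = 0.5 * sum(Gm[j, p] for p in known)
        Gm[i, i] = 1.0 + (0.5 * Gm[known[0], known[1]] if len(known) == 2 else 0.0)
    return Gm

# Hypothetical pedigree: founders 0, 1, 4; full sibs 2, 3; individual 5 from 2 x 4.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (None, None), (2, 4)]
sparse_inv = henderson_inverse(pedigree)
```

Only the dictionary of nonzero terms needs to be held, in line with the storage argument in the text.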

Note that this convenient result does not affect the EM procedure in any way; it is simply an algebraic identity within the existing EM equations (14). In the trace terms of our EM equations we consider eigenvalues of matrices corresponding to only the observed individuals. It is the breeding values considered "missing" that determine the exact matrices involved in the trace terms, and thence the precise EM iteration. If desired, additional unobserved individuals can also be included in $W$ and $V$, as is done in the univariate case by some authors (HENDERSON 1986; GIANOLA 1988). However, generally there seems to be no advantage in augmenting the set of individuals in this way; the proportion of "missing" data should be kept as small as possible.
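The algebraic identity can be checked numerically on a small example. The sketch assumes NumPy and, in place of a real pedigree, a random positive-definite matrix `G_plus` whose leading k by k block stands for the observed individuals; A, T and the data X are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, q = 7, 4, 2                     # larger set, observed individuals, traits
M = rng.standard_normal((n, n))
G_plus = M @ M.T / n + np.eye(n)      # stand-in for the larger relationship matrix
G = G_plus[:k, :k]                    # observed individuals only
A = np.array([[0.3, 0.1], [0.1, 0.2]])
T = np.array([[0.7, 0.2], [0.2, 0.5]])
X = rng.standard_normal((q, k))       # centred observations, one row per trait

V = np.kron(A, G) + np.kron(T, np.eye(k))
U = np.linalg.solve(V, X.reshape(-1)).reshape(q, k)   # V^{-1} x by trait

a_obs = A @ U @ G                     # E(z | y), observed individuals (q x k)
a_plus = A @ U @ G_plus[:k, :]        # E(z_+ | y), larger set (q x n)

# Quadratic forms a_i' G^{-1} a_j computed on each set of individuals
quad_plus = a_plus @ np.linalg.solve(G_plus, a_plus.T)
quad_obs = a_obs @ np.linalg.solve(G, a_obs.T)
```

The two q by q matrices coincide, so the quadratic forms may be evaluated on whichever set has the more convenient inverse.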

Log-likelihood evaluation: The form of the log-likelihood for multivariate data that is analogous to the univariate form (5) is

$$-\frac{1}{2}\left[\log|V| + x'(T \otimes I)^{-1} x - a'(T^{-1} \otimes I) a - a'(A^{-1} \otimes G^{-1}) a\right]. \tag{16}$$

Here the vectors and matrices again correspond to the observed individuals, but again the result (15) enables the quadratic forms involving $G^{-1}$ to be evaluated knowing only the nonzero terms of the sparse inverse of the numerator relationship matrix for all the individuals in the pedigree.

Thus, to use (16), we now require the eigenvalues of $V = (A \otimes G) + (T \otimes I)$. Again suppose that $g$ is an eigenvector of $G$ with eigenvalue $\lambda$. Let $\beta$ be any $q$-vector. Then

$$V(\beta \otimes g) = ((\lambda A + T)\beta) \otimes g = (S_\lambda \beta) \otimes g.$$

If $\beta \otimes g$ is to be an eigenvector of $V$ with eigenvalue $\eta$, we therefore require

$$(\eta\beta) \otimes g = \eta(\beta \otimes g) = V(\beta \otimes g) = (S_\lambda \beta) \otimes g.$$

That is, $\eta\beta = S_\lambda \beta$, or $\eta$ is an eigenvalue of $S_\lambda$. Each of the $k$ eigenvalues $\lambda$ of $G$ provides $q$ eigenvalues $\eta$ of the corresponding matrix $S_\lambda$, and the total set of $kq$ numbers is the total set of eigenvalues of $V$, and the sum of the logs of these is the required log-determinant of $V$ in (16). Log-likelihood evaluation is an important aspect of an EM implementation, since the progress of the log-likelihood often provides better information about convergence than does the progress of individual parameter values. Moreover, it is differences in log-likelihood that provide the shape of the likelihood surface and test statistics. However, likelihood evaluation need not be performed every EM step.
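The log-determinant term can therefore be accumulated from q by q determinants alone. A minimal sketch, assuming NumPy; the function name `logdet_V` and the toy inputs are hypothetical.

```python
import numpy as np

def logdet_V(lam, A, T):
    """log|V| for V = A (x) G + T (x) I, using only the eigenvalues of G."""
    # Each eigenvalue l of G contributes log det(l*A + T), the sum of the logs
    # of the q eigenvalues of S_lambda.
    return sum(np.linalg.slogdet(l * A + T)[1] for l in lam)

# Toy inputs (hypothetical): two unrelated parents and their two offspring.
G = np.array([[1.0, 0.0, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 1.0]])
A = np.array([[0.3, 0.1], [0.1, 0.2]])
T = np.array([[0.7, 0.2], [0.2, 0.5]])

lam = np.linalg.eigvalsh(G)
value = logdet_V(lam, A, T)
```

Working with the determinant of each small matrix avoids extracting its individual eigenvalues, which are not needed for the log-likelihood.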

RESULTS

We have implemented the above methods for the case of bivariate data, independently in C on a DEC3100 Unix Workstation (E.A.T.), and in Pascal on a MicroVAXII and Sun4 (R.G.S.). We illustrate their feasibility and computational efficiency using data for life-history traits on the large pedigrees of two experimental populations. We present here the numerical results, but do not discuss their biological interpretation or evolutionary implications; that is rightfully left to the colleagues who generously provided their unpublished data.

Example 1: Data were obtained from M. LYNCH (University of Oregon) on life-history traits on 1123 non-inbred members of the species Lymnaea elodes. A subset of 949 individuals formed the connected four-generation pedigree used in this example. Age and size at first reproduction were the two traits chosen for this analysis; they were both log-transformed to improve normality. Of the full pedigree, 936 individuals were measured for both these traits. Since this snail is hermaphroditic and individuals were used both as sires and as dams, no sex effects were modeled. However, exploratory analysis revealed an increasing trend in age at first reproduction over the four generations; a fixed generation effect was therefore included for this trait. Under this model the transformed data were analyzed with our programs. Several runs from different starting parameters all indicated little or no genetic variance for either trait. After 200 iterations the genetic variances were less than [...] and constituted less than 5% of the random effects variance. Therefore, the likelihood for the case of independent observations (zero genetic variances) was also computed, verifying that the MLE was

$$\sigma_{11} = \sigma_{12} = \sigma_{22} = 0$$

and $\tau_{11} = 0.0107$, $\tau_{12} = 0.0016$, $\tau_{22} = 0.0136$ with grand means 3.90 and 3.74 for the two traits, and generation effects [...] for the first trait. The log-likelihood at the boundary MLE was 2.7 greater than at the nearby 200-iteration value. Thus this analysis, which is supported by an independent ML analysis using the Fisher scoring algorithm, provides strong evidence of zero genetic effects; 936 observed individuals are quite informative.

On a DEC-3100 workstation, determination of the eigenvalues of the 936 by 936 relationship matrix, $G$, for the observed individuals took 15 min, but each subsequent set of 200 EM iterations took less than 1 min. Since the eigenvalue determination is required once only for any given set of observed individuals, regardless of how many traits, transformations, and fixed-effects models are then analyzed, the speed of the EM process is the key feature of computational efficiency. Each EM run did include determination of the 2526 nonzero upper-triangular terms of the sparse symmetric 949 by 949 $G_+^{-1}$; this also could be done once only and then used in multiple runs, but in fact it takes only a few seconds.

Determination of the eigenvalues of $G$ is a substantial component of the total computing time. THOMPSON and SHAW (1990; Section 4) proposed a method of first seeking the "immediate" eigenvalues and then determining the remainder from a reduced matrix. However, the reduced matrix of THOMPSON and SHAW (1990) is not symmetric. Efficient determination of eigenvalues for a nonsymmetric $n \times n$ matrix, which is not sparse, requires computations of order $4n^3$; in contrast, the eigensolution of a symmetric $k \times k$ matrix is of order $2k^3/3$ (PRESS et al. 1989). Thus computing time would be reduced only if $n/k < 6^{-1/3}$, or if about half the eigenvalues were immediate. We are indebted to ROBIN THOMPSON (personal communication) who points out that a symmetric reduced $G$ matrix with the same eigenvalues is similarly obtainable. Thus although, in the case of the snails data, only 217 of the $k = 936$ eigenvalues are "immediate," leaving a reduced matrix of dimension $n = 719$, use of the symmetric reduced matrix would reduce the time for determination of eigenvalues by a factor $(719/936)^3$, or by about half.
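The operation counts quoted above can be checked with a few lines of arithmetic. The function names below are hypothetical; the cost expressions and the dimensions 719 and 936 are those given in the text.

```python
def nonsymmetric_cost(n):
    """Order of work for the eigenvalues of a dense nonsymmetric n x n matrix."""
    return 4 * n ** 3

def symmetric_cost(k):
    """Order of work for the eigensolution of a symmetric k x k matrix."""
    return 2 * k ** 3 / 3

# Break-even ratio n/k below which the nonsymmetric reduction would pay off
threshold = 6 ** (-1 / 3)

# Snail data: k = 936 observed individuals, reduced dimension n = 719
n, k = 719, 936
ratio_nonsymmetric = nonsymmetric_cost(n) / symmetric_cost(k)
ratio_symmetric = (n / k) ** 3
```

For these dimensions the nonsymmetric reduced matrix would cost more than the full symmetric problem, whereas the symmetric reduction brings the cost to a little under half.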

Times taken for the EM part of the analysis depend on how frequently it is decided to evaluate the log-likelihood, how stringent the convergence criteria are for the breeding values within each EM iteration, and how many EM iterations are performed. Log-likelihood values are useful in monitoring EM convergence, but are not required at every step. It is important that posterior mean breeding values are sufficiently accurate that the EM procedure is not invalidated. However, after the first few EM steps, breeding values change little as parameter values change, and often only one iteration of breeding values is required to meet even a stringent convergence criterion.

Example 2: Data on life history traits of guppies Poecilia reticulata were obtained from D. REZNICK (University of California, Riverside). Observations were available on males and females of the final two generations of two unconnected extended three-generation pedigrees. Two of the female traits measured were age and weight at first parturition. The males were measured for the corresponding characters of age and weight at sexual maturity, but since the male traits differed both in distribution and heritability from their female counterparts (D. REZNICK, personal communication) the male and female traits were analyzed separately. The pedigrees for the analysis of the female traits contained 269 individuals of whom 165 females were observed. The corresponding numbers for the analysis of the traits observed in males were 273 and 171.

The traits were log-transformed to improve normality, and models with no fixed effects (other than grand means) were fitted. For each analysis, 200 EM iterations were performed. On a DEC3100 workstation, the total time taken was about 70 sec, including the eigenvalue determination. With the Pascal program run on a Sun4, the total time taken for each analysis was about 15 min. In the analysis of the data on females, parameter and log-likelihood convergence were essentially complete within 50 iterations. The maximum likelihood estimates of the grand means for the two traits were 4.50 and 5.17, for the logs of age and weight at first parturition, respectively. The estimates of the (co)variance components were

$$\sigma_{11} = 0.0027, \quad \sigma_{12} = 0.0045, \quad\text{and}\quad \sigma_{22} = 0.0091, \quad\text{with}$$

$$\tau_{11} = 0.0191, \quad \tau_{12} = 0.0254, \quad\text{and}\quad \tau_{22} = 0.0447.$$

Thus the heritabilities for the two traits are estimated as 12% and 17%. The estimated genetic correlation between the two traits is 0.91.

For the corresponding male traits the estimates of the grand means for the two traits were 3.91 and 4.33 for the logs of age and weight at sexual maturity, respectively. The estimates of the (co)variance components were

$$\sigma_{11} = 0.0258, \quad \sigma_{12} = 0.0157, \quad\text{and}\quad \sigma_{22} = 0.0149, \quad\text{with}$$

$$\tau_{11} = 0.0064, \quad \tau_{12} = 0.0017, \quad\text{and}\quad \tau_{22} = 0.0060.$$

Thus the heritabilities for the two male traits are estimated as 80% and 71%. The estimated genetic correlation is 0.80.
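The quoted heritabilities and genetic correlations follow from the variance component estimates by simple arithmetic, taking heritability as the additive variance over the sum of additive and residual variances. The function name `summaries` is a hypothetical helper; the numerical inputs are the estimates reported above.

```python
from math import sqrt

def summaries(s11, s12, s22, t11, t22):
    """Heritabilities of the two traits and their genetic correlation."""
    h1 = s11 / (s11 + t11)
    h2 = s22 / (s22 + t22)
    r_g = s12 / sqrt(s11 * s22)
    return h1, h2, r_g

# Female traits: age and weight at first parturition (log scale)
female = summaries(0.0027, 0.0045, 0.0091, 0.0191, 0.0447)

# Male traits: age and weight at sexual maturity (log scale)
male = summaries(0.0258, 0.0157, 0.0149, 0.0064, 0.0060)
```

Rounded to two decimals these reproduce the values in the text: 0.12, 0.17 and 0.91 for females, and 0.80, 0.71 and 0.80 for males.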

Determination of the eigenvalues of the matrix $G$ [...] 64 immediate eigenvalues, with 79 remaining. (As noted above, there are 171 observed males in all.) For the 165 observed females, the corresponding numbers were 15, 15, 62 and 73, respectively. Given the simple structure and (relatively) large sibship sizes, almost 50% of the eigenvalues are immediate, even though the first generation ancestors are unobserved. Use of the reduced symmetric $G$ matrix, proposed by ROBIN THOMPSON (see above), reduces the total time taken for eigenvalue determination by a factor of about 6. On larger pedigrees, this level of reduction can be an important factor in ensuring computational practicality.

The male analysis demonstrated the sensitivity of EM performance to starting values of the parameters (see also GROENEVELD and KOVAC 1990). In some runs, a grand mean was started at a value outside the range of the data, necessitating a very high estimate of total variance. In this case, 200 EM iterations took the process to a point at which the residual variance component for this trait was near zero, the additive variance remaining rather large. Maxima of a constrained model in which some variance components are set to zero are stationary points of the EM equations for the full model. It is well known that the EM process can remain close to such boundary points for a very large number of iterations. Close examination revealed that convergence had not been achieved; eventually, the residual variance estimate begins to increase again. Moreover, analyses using raw means as starting values for the fixed effects converged directly to the interior MLE above. The log-likelihood at this true MLE is more than 50 greater than at the boundary solution. This illustrates the necessity both for caution in choice of starting values and for checking convergence to the true MLE by using several alternative starting values (GROENEVELD and KOVAC 1990).

DISCUSSION

Because genetic covariances between traits can greatly influence their joint response to selection (FALCONER 1989), multivariate analysis of traits is crucial (HENDERSON 1988). Our previous algorithm (THOMPSON and SHAW 1990), although extremely computationally economical, was limited to single traits. The extension to the case of multivariate traits greatly expands its potential usefulness to researchers involved in applications in plant and animal breeding, in conservation genetics, and in other studies of natural populations. Although our computing implementation is only bivariate, pairwise analysis of multiple traits provides estimates of genetic covariances and hence a first approximation to prediction of joint response to selection.

The basic equations involved in the estimation for the multivariate case are, in most cases, direct generalizations of the univariate equations, and are no more computationally intensive. The recognition that the sub-blocks of the total variance matrix, $V$, share eigenvectors leads to the explicit determination of eigenvalues, and permits this multivariate extension. The only complication in this extension is that the previous approach to the determination of $a'G^{-1}a$, using the $G$ for the restricted set of only the observed individuals, has no computationally simple multivariate extension. Instead, the result (15), which requires only the explicitly determined $G_+^{-1}$ for all individuals, completely circumvents this barrier. Use of $G_+^{-1}$ rather than $G$ in the evaluation of quadratic forms is also computationally economical, since the former matrix is sparse and can be stored very compactly. We emphasize again that the "missing values" of the EM derivation (and the trace terms of the EM equations) still relate only to the observed individuals. Thus the eigenvalue determination required (once only) is for a possibly much smaller $k \times k$ matrix. The result (15), or its usefulness in greatly enhancing computational economy, seems not to have been previously recognized.

The equations of this paper require that the same individuals are observed for all the traits which are jointly analyzed. While for a bivariate analysis this may not be a severe restriction, when multiple traits are analyzed there are likely to be individuals observed for some but not all traits. For more complex models or data structures (such as multiple incomplete multivariate observations), the simplest solution is to use a Monte Carlo EM approach (GUO and THOMPSON 1991). That approach provides a natural framework for handling missing observations. Its disadvantage is its "random" nature, but the Monte Carlo variation can be controlled. Moreover, it is possible to limit the Monte Carlo to obtaining conditional expectations that are not otherwise obtainable, and to use the deterministic formulas of this paper for those parts of the pedigree or model where they remain applicable.

Our examples illustrate that analysis of quantitative genetic data on large pedigrees should no longer be considered computationally prohibitive, even on small workstations. In contrast to practice in plant and animal breeding, the multigeneration pedigrees of experimental populations are often unrecorded. Extensive pedigrees contain much information on the inheritance of quantitative traits. With the availability of new algorithms, pedigree and trait information on large natural or experimental populations becomes valuable. The programs of either author are available to interested researchers; the Pascal versions (R.G.S.) are being developed to facilitate their wider use.


of a symmetric reduction of the matrix $G$, and two referees for their careful reading and expert and useful comments. This research was supported in part by National Science Foundation (NSF) grant BSR-8817756 (R.G.S.) and U.S. Department of Agriculture contract 88-37151-3958 and NSF grant BSR-8921839 (E.A.T.). Computing support to R.G.S. was provided by Pioneer Hi-Bred International. Travel to complete the work was made possible by a grant to R.G.S. from the Academic Senate of the University of California, Riverside.

LITERATURE CITED

CROW, J. F., and M. KIMURA, 1970 An Introduction to Population Genetics Theory. Harper & Row, New York.

FALCONER, D. S., 1989 Introduction to Quantitative Genetics, Ed. 3. Longman Group, London.

GIANOLA, D., 1986 On selection criteria and estimation of parameters when the variance is heterogeneous. Theor. Appl. Genet. 72: 671-677.

GROENEVELD, E., and M. KOVAC, 1990 A note on multiple solutions in multivariate restricted maximum likelihood covariance component estimation. J. Dairy Sci. 73: 2221-2229.

GUO, S.-W., and E. A. THOMPSON, 1991 Monte Carlo estimation of variance component models. IMA J. Math. Appl. Med. Biol. 8: 171-189.

HARTLEY, H. O., and J. N. K. RAO, 1967 Maximum likelihood estimation for the mixed analysis of variance model. Biometrika 54: 93-108.

HENDERSON, C. R., 1976 A simple method for computing the inverse of a numerator relationship matrix used in prediction of breeding values. Biometrics 32: 69-83.

HENDERSON, C. R., 1986a Recent developments in variance and covariance estimation. J. Anim. Sci. 63: 208-216.

HENDERSON, C. R., 1986b Estimation of variances in animal model and reduced animal model for single traits and single records. J. Dairy Sci. 69: 1394-1402.

HENDERSON, C. R., 1988 Progress in statistical methods applied to quantitative genetics since 1976, pp. 85-90 in Proceedings of the Second International Conference on Quantitative Genetics, edited by B. S. WEIR, E. J. EISEN, M. M. GOODMAN and G. NAMKOONG. Sinauer, Sunderland, Mass.

JUGA, J., and R. THOMPSON, 1990 Estimation of bivariate variance components, in Proceedings of the 4th World Congress on Genetics Applied to Livestock Production, Edinburgh (23-27 July).

LANDE, R., 1979 Quantitative genetic analysis of multivariate evolution, applied to brain:body size allometry. Evolution 33: 402-416.

LANGE, K., and M. BOEHNKE, 1983 Extensions to pedigree analysis. IV. Covariance components models for multivariate traits. Am. J. Med. Genet. 14: 513-524.

MACKINNON, M. J., K. MEYER and D. J. S. HETZEL, 1991 Genetic variation and covariation for growth, parasite resistance and heat intolerance in tropical cattle. Livest. Prod. Sci. 27: 105-122.

MEYER, K., 1985 Maximum likelihood estimation of variance components for a multivariate mixed model with equal design matrices. Biometrics 41: 153-165.

MEYER, K., 1991 Estimating variances and covariances for multivariate animal models by restricted maximum likelihood. Genet. Sel. Evol. 23: 67-83.

MEYER, K., K. HAMMOND, M. J. MACKINNON and P. F. PARNELL, 1991 Estimates of covariance between growth and reproduction in Australian beef cattle. J. Anim. Sci. 69: 3533-3543.

PATTERSON, H. D., and R. THOMPSON, 1971 Recovery of interblock information when block sizes are unequal. Biometrika 58: 545-554.

PRESS, W. H., B. P. FLANNERY, S. A. TEUKOLSKY and W. T. VETTERLING, 1989 Numerical Recipes in Pascal. Cambridge University Press, Cambridge.

SCHAEFFER, L. R., J. W. WILTON and R. THOMPSON, 1978 Simultaneous estimation of variance and covariance components from multitrait mixed model equations. Biometrics 34: 199-208.

SEARLE, S. R., 1971 Linear Models. Wiley, New York.

SHAW, R. G., 1987 Maximum likelihood approaches applied to quantitative genetics of natural populations. Evolution 41: 812-826.

SRIVASTAVA, M. S., K. J. KEEN and R. S. KATAPA, 1988 Estimation of interclass and intraclass correlations in multivariate familial data. Biometrics 44: 141-150.

THOMPSON, E. A., and R. G. SHAW, 1990 Pedigree analysis for quantitative traits: variance components without matrix inversion. Biometrics 46: 399-413.

THOMPSON, R., 1973 The estimation of variance and covariance components with an application when records are subject to culling. Biometrics 29: 527-550.

Communicating editor: T. F. C. MACKAY

APPENDIX

As in the text, we have the total $kq \times kq$ variance matrix $V$, which consists of $q^2$ blocks $V_{ij}$, each $k \times k$. The inverse $V^{-1}$ consists of corresponding blocks $V^{ij}$; of course, $V^{ij}$ is not the inverse of $V_{ij}$.

Further, for any given eigenvector $g$ of $G$ with eigenvalue $\lambda$, $s_{ij} = \lambda\sigma_{ij} + \ldots$ is an eigenvalue of $V_{ij}$. The elements $(s_{ij})$ form the $q \times q$ matrix $S_\lambda$; $(s^{ij})$ are the elements of $S_\lambda^{-1}$. Our objective is to show that $s^{ij}$ is an eigenvalue of $V^{ij}$, corresponding to the same original eigenvector $g$. The matrices $V$, $V_{ij}$, $V^{ij}$ and $S_\lambda$ are all symmetric.

Now, from the definition of the notation,

$$\sum_{l} V^{il} V_{lj} = \delta_{ij} I, \qquad i, j = 1, \ldots, q,$$

where $\delta_{ij} = 1$ if $i = j$ and 0 otherwise, and $I$ is the $k \times k$ identity matrix. Thus

$$\sum_{l} V^{il} V_{lj}\, g = \delta_{ij}\, g, \qquad i, j = 1, \ldots, q,$$

or

$$\sum_{l} s_{lj} V^{il} g = \delta_{ij}\, g, \qquad i, j = 1, \ldots, q.$$

Thus, for $m = 1, \ldots, q$, $i = 1, \ldots, q$,

$$\sum_{j} \sum_{l} s^{mj} s_{lj} V^{il} g = \sum_{j} s^{mj} \delta_{ij}\, g = s^{mi} g.$$

Since $S_\lambda$ is symmetric, and from the definition of $(s^{mj})$, the left side reduces to

$$\sum_{l} \delta_{lm} V^{il} g = V^{im} g.$$

Or (appealing again to symmetry) we have

$$V^{im} g = s^{im} g$$

and, as was required, the elements of $S_\lambda^{-1}$ are indeed the eigenvalues of the blocks of $V^{-1}$.
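The result can be checked numerically on a toy case. The Python sketch below assumes that each block of $V$ is a multiple of $G$ plus a multiple of the identity; the matrices `gen` and `env` holding those multiples, and all numerical values, are hypothetical. It builds the blocks of $V^{-1}$ from the elements of the inverse of the $q \times q$ eigenvalue matrix, one eigenvector of $G$ at a time, and confirms that these blocks multiply with the blocks of $V$ to give the identity.

```python
import math

# Toy G for k = 2 individuals, with its eigenpairs written out by hand.
G = [[1.0, 0.5], [0.5, 1.0]]
r = 1 / math.sqrt(2)
eigenpairs = [(1.5, [r, r]), (0.5, [r, -r])]

# Assumed covariance components for q = 2 traits (hypothetical values).
gen = [[2.0, 0.6], [0.6, 1.0]]  # multiplies G
env = [[1.0, 0.3], [0.3, 1.5]]  # multiplies the identity
q, k = 2, 2


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def block_v(i, j):
    """Block V_ij = gen_ij * G + env_ij * I (the assumed model form)."""
    return [[gen[i][j] * G[a][b] + (env[i][j] if a == b else 0.0)
             for b in range(k)] for a in range(k)]


def block_v_inverse(i, j):
    """Block V^{ij}, built as the sum over eigenpairs of s^{ij} g g'."""
    out = [[0.0] * k for _ in range(k)]
    for lam, g in eigenpairs:
        s = [[lam * gen[m][n] + env[m][n] for n in range(q)] for m in range(q)]
        s_upper = inv2(s)[i][j]  # element (i, j) of the inverse of S
        for a in range(k):
            for b in range(k):
                out[a][b] += s_upper * g[a] * g[b]
    return out


def max_deviation():
    """Largest absolute deviation of sum_l V^{il} V_{lj} from delta_ij I."""
    worst = 0.0
    for i in range(q):
        for j in range(q):
            for a in range(k):
                for b in range(k):
                    total = sum(block_v_inverse(i, ell)[a][c] * block_v(ell, j)[c][b]
                                for ell in range(q) for c in range(k))
                    target = 1.0 if (i == j and a == b) else 0.0
                    worst = max(worst, abs(total - target))
    return worst


max_error = max_deviation()
print("max deviation from identity:", max_error)
```

The only matrix inverted here is the $q \times q$ matrix of eigenvalues, once per eigenvalue of $G$; no $kq \times kq$ inversion is needed.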