
An Approximate Model of Polygenic Inheritance


Copyright © 1997 by the Genetics Society of America

Kenneth Lange

Departments of Biostatistics and Mathematics, University of Michigan, Ann Arbor, Michigan 48109-2029

Manuscript received November 25, 1996
Accepted for publication July 23, 1997

ABSTRACT

The finite polygenic model approximates polygenic inheritance by postulating that a quantitative trait is determined by $n$ independent, additive loci. The $3^n$ possible genotypes for each person in this model limit its applicability. CANNINGS, THOMPSON, and SKOLNICK suggested a simplified, nongenetic version of the model involving only $2n + 1$ genotypes per person. This article shows that this hypergeometric polygenic model also approximates polygenic inheritance well. In particular, for noninbred pedigrees, trait means, variances, covariances, and marginal distributions match those of the ordinary finite polygenic model. Furthermore, as $n \to \infty$, the trait values within a pedigree collectively tend toward multivariate normality. The implications of these results for likelihood evaluation under the polygenic threshold and mixed models of inheritance are discussed. Finally, a simple numerical example illustrates the application of the hypergeometric polygenic model to risk prediction under the polygenic threshold model.

The mixed model of polygenic plus major gene inheritance has proved to be a useful alternative to classical Mendelian models in the analysis of pedigree data (ELSTON and STEWART 1971; MORTON and MACLEAN 1974). Unfortunately, exact likelihood calculation under the mixed model is virtually intractable except for nuclear families and small pedigrees. This impasse has prompted the development of approximate methods of likelihood evaluation in the mixed model. In two recent papers, ELSTON, FERNANDO, and STRICKER (FERNANDO et al. 1994; STRICKER et al. 1995) have suggested a particularly interesting approach that appears to produce much more accurate results than the best previous approximations (HASSTEDT 1982, 1991). This new approach is computationally fast enough to permit linkage analysis under the mixed model.

ELSTON, FERNANDO, and STRICKER (FERNANDO et al. 1994; STRICKER et al. 1995) adopt as their point of departure the finite polygenic model suggested earlier by CANNINGS et al. (1978). The finite polygenic model approximates polygenic inheritance (FISHER 1918) by postulating that trait values are determined by a small number of biallelic loci that have equal and additive effects. If for convenience the two alleles at each contributing locus are termed positive and negative, then the finite polygenic model incorporates the symmetry assumption that all positive genes contribute $+1$ and all negative genes $-1$ to an individual's trait. Trait means are forced to equal 0 by taking positive and negative alleles to be equally frequent in the surrounding population. An arbitrary trait variance can be achieved by scaling the positive and negative contributions by the same multiplicative constant.

Author email: [email protected]

Genetics 147: 1423-1430 (November, 1997)

As just described, the finite polygenic model for $n$ loci involves $3^n$ possible genotypes for each person. This represents a depressingly fast escalation in combinatorial complexity that defeats likelihood calculation for $n$ sufficiently large to approximate normality well. However, CANNINGS et al. (1978) note that the $2n + 1$ possible phenotypes of the model are determined solely by the $2n + 1$ possible counts for the number of positive genes at the participating loci. They then suggest treating multilocus genotypes as equivalent if they involve the same number of positive genes. This radical simplification, which we call the hypergeometric polygenic model, is inconsistent with Mendelian transmission at $n$ separate loci. Because of these objections, the finite and hypergeometric polygenic models have languished.

ELSTON, FERNANDO, and STRICKER (FERNANDO et al. 1994; STRICKER et al. 1995) now raise the possibility of reviving the finite polygenic model for computational purposes. They do so by altering its gamete transmission probabilities. Their tinkering with the finite polygenic model captures the essential features of the hypergeometric polygenic model. In this article, we explore some of the logical consequences of the hypergeometric polygenic model and demonstrate mathematically that it provides an excellent approximation to polygenic inheritance. Because of this good mimicry, the hypergeometric polygenic model is also apt to be extremely helpful in pedigree calculations involving the polygenic mixed model (MORTON and MACLEAN 1974) and the polygenic threshold model (LANGE et al. 1976a). In this regard, the numerical results of ELSTON, FERNANDO, and STRICKER (FERNANDO et al. 1994; STRICKER et al.


MODEL DEFINITION

The hypergeometric polygenic model of CANNINGS et al. (1978) is distinguished by two features. One is the absence of loci; the other is transmission from parent to offspring by sampling without replacement. To avoid introducing a completely alien vocabulary in discussing the model, we will retain traditional genetic terms such as gene and genotype but invest them with slightly different meanings. The transmitted agents or genes in the model are classified as either positive or negative. As noted above, positive genes contribute $+1$ and negative genes $-1$ to the quantitative trait $x$ of a person. Each person has a genotype consisting of a set of exactly $2n$ genes and a trait value ranging from $-2n$ to $+2n$. Because genes come in only two varieties, we can identify a genotype $g$ with the number of positive genes contained in it. In this sense, there are $2n + 1$ possible genotypes per person. The formulas $x = 2(g - n)$ and $g = n + (x/2)$ convert a genotype $g$ into a trait value $x$ and vice versa.

Pedigree founders are created by independently sampling $2n$ genes from an infinite pool of equally likely positive and negative genes. Thus, the founder genotype $g$ occurs with the binomial probability

$$\binom{2n}{g}\left(\frac{1}{2}\right)^{2n}. \qquad (1)$$

The genotype of each child in the pedigree is created by sampling without replacement $n$ genes from the genotype of his mother and $n$ genes from the genotype of his father. These two transmitted sets of genes or gametes are pooled to form the child's genotype. Sampling of parental genes is done independently for each gamete created. Sampling without replacement implies that gamete transmission probabilities are hypergeometric. In other words, the probability that a parent with genotype $i$ contributes a gamete with $j$ positive genes is

$$\tau_{i \to j} = \frac{\binom{i}{j}\binom{2n-i}{n-j}}{\binom{2n}{n}}. \qquad (2)$$

Because of independent formation of gametes, parents with genotypes $i$ and $j$ produce a child with genotype $k$ with probability

$$\tau_{i \times j \to k} = \sum_{m=\max(0,\,k-j)}^{\min(i,\,k)} \tau_{i \to m}\,\tau_{j \to k-m}, \qquad (3)$$

as noted by CANNINGS et al. (1978).
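The three probabilities just defined can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `founder_prob`, `gamete_prob`, and `child_prob` are hypothetical and do not come from the article, and the hypergeometric form used for the gamete probability is the standard one for drawing $n$ genes without replacement from $2n$.

```python
from math import comb


def founder_prob(n, g):
    """Binomial probability that a founder carries g positive genes out of 2n."""
    return comb(2 * n, g) / 4 ** n  # 4**n equals 2**(2n)


def gamete_prob(n, i, j):
    """Hypergeometric probability that a parent with i positive genes
    transmits j positive genes in a gamete of n genes."""
    if j < 0 or j > n:
        return 0.0
    # comb returns 0 when the lower index exceeds the upper index
    return comb(i, j) * comb(2 * n - i, n - j) / comb(2 * n, n)


def child_prob(n, i, j, k):
    """Probability that parents with genotypes i and j have a child with genotype k,
    obtained by convolving the two independent gamete distributions."""
    lo, hi = max(0, k - j), min(i, k)
    return sum(gamete_prob(n, i, m) * gamete_prob(n, j, k - m)
               for m in range(lo, hi + 1))
```

For any fixed pair of parental genotypes, the values `child_prob(n, i, j, k)` for `k` from 0 to `2 * n` should sum to 1, which gives a quick sanity check on an implementation.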

This completes our description and reinterpretation of the hypergeometric polygenic model. We will demonstrate that it satisfies three crucial desiderata. First, in noninbred pedigrees it gives exactly the same covariance structure as the finite polygenic model and indeed the polygenic model itself. Second, in this setting it also entails the same marginal distribution of trait values as the finite polygenic model. Third and finally, it shares with the finite polygenic model the desirable feature that the standardized trait values within a pedigree tend toward multivariate normality as the underlying number of genes $2n$ tends to $\infty$ (LANGE 1978; LANGE and BOEHNKE 1983).

We will also consider a variant of the hypergeometric polygenic model that involves gamete formation by sampling with replacement. This variant is apt to give a decidedly inferior approximation to polygenic inheritance because it fails to capture correctly the covariance structure of a pedigree. This and other issues will be taken up after reviewing some elementary moment calculations relevant to sampling without replacement.

REVIEW OF SAMPLING WITHOUT REPLACEMENT

Suppose that a sample of $n$ genes is taken without replacement from a random collection of $2n$ genes with numerical values $W = (W_1, \ldots, W_{2n})$. The trait value associated with this gamete sample can then be expressed as

$$Y = \sum_{i=1}^{2n} A_i W_i,$$

where the $A_i$ are correlated indicator random variables satisfying the conditions $\Pr(A_i = 1) = 1/2$ and $\sum_{i=1}^{2n} A_i = n$. From this indicator function representation (COCHRAN 1977), we immediately deduce the conditional and unconditional means

$$E(Y \mid W) = \sum_{i=1}^{2n} E(A_i) W_i = \frac{1}{2}\sum_{i=1}^{2n} W_i,$$

$$E(Y) = \frac{1}{2}\sum_{i=1}^{2n} E(W_i).$$

If $E(W_i) = 0$, then $E(Y) = 0$ as well.

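To compute variances, note that each $\operatorname{Var}(A_i) = 1/4$ and that under sampling without replacement

$$\operatorname{Cov}(A_i, A_j) = \Pr(A_i = 1, A_j = 1) - \Pr(A_i = 1)\Pr(A_j = 1) = -\frac{1}{4(2n-1)}$$

for $i \ne j$. It follows that

$$\operatorname{Var}(Y \mid W) = \left[\frac{1}{4} + \frac{1}{4(2n-1)}\right]\sum_{i=1}^{2n} W_i^2 - \frac{1}{4(2n-1)}\left(\sum_{i=1}^{2n} W_i\right)^2$$

and

$$\operatorname{Var}(Y) = E[\operatorname{Var}(Y \mid W)] + \operatorname{Var}[E(Y \mid W)].$$

When each $W_i^2 = 1$ and $E(W_i) = 0$, as is the case for the positive and negative genes of the model, these formulas give

$$E[\operatorname{Var}(Y \mid W)] = \left[\frac{1}{4} + \frac{1}{4(2n-1)}\right]2n - \frac{1}{4(2n-1)}\operatorname{Var}\left(\sum_{i=1}^{2n} W_i\right)$$

and

$$\operatorname{Var}(Y) = \left[\frac{1}{4} + \frac{1}{4(2n-1)}\right]2n + \left[\frac{1}{4} - \frac{1}{4(2n-1)}\right]\operatorname{Var}\left(\sum_{i=1}^{2n} W_i\right). \qquad (4)$$

If, in addition, the random variables $W_i$ are independent and have unit variances, then $\operatorname{Var}(Y) = n$.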
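The indicator covariance and the conditional variance formula can be checked by brute force for small $n$. The sketch below enumerates every equally likely gamete with exact rational arithmetic; the helper names `indicator_moments`, `conditional_variance`, and `formula_variance` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def indicator_moments(n):
    """Return Pr(A_1 = 1) and Cov(A_1, A_2) by enumerating all samples
    of n genes drawn without replacement from 2n genes."""
    samples = list(combinations(range(2 * n), n))
    total = len(samples)
    p1 = Fraction(sum(0 in s for s in samples), total)
    p12 = Fraction(sum(0 in s and 1 in s for s in samples), total)
    return p1, p12 - p1 * p1


def conditional_variance(w):
    """Exact Var(Y | W = w), where Y sums the sampled gene values."""
    n = len(w) // 2
    sums = [sum(w[i] for i in s) for s in combinations(range(len(w)), n)]
    mean = Fraction(sum(sums), len(sums))
    return Fraction(sum(y * y for y in sums), len(sums)) - mean ** 2


def formula_variance(w):
    """Closed-form conditional variance in terms of sum(w_i^2) and (sum(w_i))^2."""
    n = len(w) // 2
    c = Fraction(1, 4 * (2 * n - 1))
    return (Fraction(1, 4) + c) * sum(x * x for x in w) - c * sum(w) ** 2
```

Enumeration is only feasible for small $n$ because the number of gametes grows as $\binom{2n}{n}$, but it is enough to confirm the algebra on a few gene vectors.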

MEANS, VARIANCES, AND COVARIANCES

We now undertake computation of the means, variances, and covariances of the trait values within a pedigree. Assuming each person has $2n$ genes, let $X_i$ be the trait value of pedigree member $i$. When $i$ is a pedigree founder, then $E(X_i) = 0$ by virtue of binomial sampling. When $i$ has parents $k$ and $l$ in the pedigree, then we can decompose $X_i$ into a gamete contribution from $k$ plus a gamete contribution from $l$. In symbols, $X_i = Y_{k \to i} + Y_{l \to i}$. If we assume inductively that the means of the parental values $X_k$ and $X_l$ vanish, then

$$E(X_i) = E[E(Y_{k \to i} \mid X_k)] + E[E(Y_{l \to i} \mid X_l)] = E\left(\tfrac{1}{2}X_k\right) + E\left(\tfrac{1}{2}X_l\right) = 0.$$

Obvious iteration of this argument shows that all trait means within the pedigree vanish. The equality $E(X_i) = 0$ also holds under the polygenic model, the finite polygenic model, and the sampling with replacement variant of the hypergeometric polygenic model.

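We next turn to the calculation of variances and covariances. If $i$ is a founder, then $\operatorname{Var}(X_i) = 2n$. If $i$ has parents $k$ and $l$, then in view of (4)

$$\operatorname{Var}(X_i) = E[\operatorname{Var}(Y_{k \to i} \mid X_k)] + E[\operatorname{Var}(Y_{l \to i} \mid X_l)] + \operatorname{Var}\left(\tfrac{1}{2}X_k + \tfrac{1}{2}X_l\right)$$

$$= \left[\frac{1}{2} + \frac{1}{2(2n-1)}\right]2n + \frac{1}{4}\left(1 - \frac{1}{2n-1}\right)[\operatorname{Var}(X_k) + \operatorname{Var}(X_l)] + \frac{1}{2}\operatorname{Cov}(X_k, X_l). \qquad (5)$$

When $\operatorname{Var}(X_k) = \operatorname{Var}(X_l) = 2n$ and $\operatorname{Cov}(X_k, X_l) = 0$, this reduces to

$$\operatorname{Var}(X_i) = \left[\frac{1}{2} + \frac{1}{2(2n-1)}\right]2n + \frac{1}{2}\left(1 - \frac{1}{2n-1}\right)2n = 2n.$$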

If the parents $k$ and $l$ of $i$ are unrelated, then $X_k$ and $X_l$ are independent and $\operatorname{Cov}(X_k, X_l) = 0$ holds. Next let us address covariances. If $i$ is a founder and $j$ is not a descendant of $i$, then just as with unrelated parents the trait pairs $X_i$ and $X_j$ are uncorrelated. If $i$ has parents $k$ and $l$ and $j$ is not a descendant of $i$, then standard calculations show that

$$\operatorname{Cov}(X_i, X_j) = \tfrac{1}{2}\operatorname{Cov}(X_k, X_j) + \tfrac{1}{2}\operatorname{Cov}(X_l, X_j), \qquad (6)$$

based on the conditional independence of $X_i$ and $X_j$ given the parental values $X_k$ and $X_l$. This same recurrence relation holds under the polygenic model, the finite polygenic model, and the sampling with replacement variant of the hypergeometric polygenic model.

To specify the trait covariance matrix $\Omega_n = (\omega_{nij})$ for a pedigree of $q$ people, we begin by numbering the people $1, \ldots, q$ in such a way that parents precede their children. Then mimicking the usual recursive procedure for computing a kinship matrix (LANGE et al. 1976b), we fill in $\Omega_n$, starting in its upper left corner with person 1. The initial conditions for founders and the recurrences (5) and (6) inductively enlarge the upper left block of $\Omega_n$ by adding the partial row and partial column corresponding to the current person $i$. The pedigree numbering scheme ensures that all previously visited people $j$ are not descendants of $i$.


uniformly in a noninbred pedigree. In the presence of inbreeding, $\omega_{nii} > 2n$ can occur. However, we always have the bounds $0 \le \omega_{nii} \le (2 + q)n$ on the entries of the $q \times q$ matrix $\Omega_n$. Verification is left to the reader.

For a noninbred pedigree, the hypergeometric polygenic model not only correctly captures means, variances, and covariances, but it also implies that all people share the marginal trait distribution characterizing founders. This fact is more or less obvious because all genes contributing to any person $i$ are unique and equally likely to derive from positive or negative genes drawn from the ancestral pool sampled in creating founders. Skeptical readers can verify analytically that the number of positive genes possessed by $i$ follows the binomial distribution (1) by checking the reproductive property for gene transmission under sampling without replacement.
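One reading of this reproductive property can be checked numerically: if a parent's genotype is binomial with $2n$ trials, then a gamete drawn without replacement should be binomial with $n$ trials, and pooling two independent gametes from unrelated parents should return a binomial genotype with $2n$ trials. The sketch below tests that interpretation with exact fractions; the function names are hypothetical, and the article's own displayed identity may be stated differently.

```python
from fractions import Fraction
from math import comb


def binom_half(m, g):
    """Binomial(m, 1/2) probability of g successes."""
    return Fraction(comb(m, g), 2 ** m)


def gamete_prob(n, i, j):
    """Hypergeometric probability of j positive genes in a gamete of n genes
    drawn without replacement from a genotype with i positive genes."""
    return Fraction(comb(i, j) * comb(2 * n - i, n - j), comb(2 * n, n))


def gamete_marginal(n, j):
    """Gamete distribution when the parental genotype is Binomial(2n, 1/2)."""
    return sum(binom_half(2 * n, i) * gamete_prob(n, i, j)
               for i in range(2 * n + 1))


def child_marginal(n, k):
    """Genotype distribution of a child of two unrelated parents,
    formed by convolving two independent gamete distributions."""
    return sum(gamete_marginal(n, m) * gamete_marginal(n, k - m)
               for m in range(max(0, k - n), min(n, k) + 1))
```

Because the arithmetic is exact, equality with the binomial probabilities can be asserted directly rather than up to rounding error.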

When properly standardized, the marginal trait distribution (1) shared by noninbred people quickly approaches univariate normality. Not only does the marginal distribution behave well, the joint distribution of trait values within a pedigree follows an approximate multivariate normal distribution. Because our treatment of this central limit theorem is necessarily lengthy, we defer a precise statement and proof to the APPENDIX.

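It is interesting to contrast these encouraging results with the results under sampling with replacement. As noted above, the mean condition $E(X_i) = 0$ and covariance recurrence (6) continue to hold. However, the variance recurrence now becomes

$$\operatorname{Var}(X_i) = 2n + \frac{1}{4}\left(1 - \frac{1}{n}\right)\operatorname{Var}(X_k) + \frac{1}{4}\left(1 - \frac{1}{n}\right)\operatorname{Var}(X_l) + \frac{1}{2}\operatorname{Cov}(X_k, X_l). \qquad (7)$$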

This recurrence can be verified by noting that the gamete contribution $Y_{k \to i}$ can be expressed as $2G_{k \to i} - n$, where $G_{k \to i}$ is the number of positive genes sampled from $k$. It is clear that given $X_k$, the random variable $G_{k \to i}$ is binomially distributed with success probability $(n + X_k/2)/(2n)$ over $n$ trials. Hence,

$$E[\operatorname{Var}(Y_{k \to i} \mid X_k)] = E\left(n - \frac{1}{4n}X_k^2\right) = n - \frac{1}{4n}\operatorname{Var}(X_k).$$

Substituting this expression and the corresponding expression for the quantity $E[\operatorname{Var}(Y_{l \to i} \mid X_l)]$ into the expansion (5) yields the recurrence (7).

The recurrence relation (7) does not lead to stability of trait variances in a noninbred pedigree. Let $c_m$ be the trait variance of a person $m$ generations removed from all pedigree founders. Then $c_m$ satisfies the recurrence

$$c_{m+1} = 2n + \frac{1}{2}\left(1 - \frac{1}{n}\right)c_m, \qquad c_0 = 2n,$$

with solution

$$c_m = \frac{4n}{1 + \frac{1}{n}} + \left[\frac{1}{2}\left(1 - \frac{1}{n}\right)\right]^m\left(2n - \frac{4n}{1 + \frac{1}{n}}\right).$$

The limiting value $4n/(1 + 1/n)$ imposes an upper bound on the trait variances in a noninbred pedigree.
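The drift of the variance under sampling with replacement is easy to see numerically. The following sketch iterates the recurrence from the founder value $2n$ and compares it with the closed-form solution; it assumes, as in the text, unrelated parents who are both $m$ generations removed from the founders. The function names are hypothetical.

```python
from fractions import Fraction


def variance_by_recurrence(n, m):
    """Iterate c -> 2n + (1/2)(1 - 1/n) c, starting from the founder variance 2n."""
    ratio = Fraction(n - 1, 2 * n)  # equals (1/2)(1 - 1/n)
    c = Fraction(2 * n)
    for _ in range(m):
        c = 2 * n + ratio * c
    return c


def variance_closed_form(n, m):
    """Closed-form solution with limiting value 4n / (1 + 1/n)."""
    limit = Fraction(4 * n * n, n + 1)  # equals 4n / (1 + 1/n)
    ratio = Fraction(n - 1, 2 * n)
    return limit + ratio ** m * (2 * n - limit)
```

For example, with `n = 5` the variance starts at 10 for founders and climbs toward the limit $100/6 \approx 16.7$ rather than staying at $2n$, which is the instability described above.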

VARIANCES AND COVARIANCES FOR INBRED PEDIGREES

When inbreeding occurs under the hypergeometric polygenic model, trait variances can diverge from those predicted under the finite polygenic model. According to known arguments involving kinship coefficients (JACQUARD 1974), the correct recurrence relation for variances under the finite polygenic model is

$$\operatorname{Var}(X_i) = 2n + \tfrac{1}{2}\operatorname{Cov}(X_k, X_l). \qquad (8)$$

The two recurrences (5) and (8) differ whenever either $\operatorname{Var}(X_k) > 2n$ or $\operatorname{Var}(X_l) > 2n$. For example, the hypergeometric polygenic model implies a slightly higher variance for the child of an inbred parent than does the finite polygenic model. In general, one can prove by induction that the variances and covariances of the hypergeometric polygenic model always dominate their counterparts in the finite polygenic model.

To gain some feel for the differences between the two models, it is instructive to consider a simple example. For convenience, let us first pass to standardized random variables $(1/\sqrt{2n})X_i$ with unit variances for founders. In the limit as $n \to \infty$, the matrix $\Omega = (\omega_{ij})$ of standardized variances and covariances satisfies under sampling without replacement the initial conditions

$$\omega_{ii} = 1, \qquad \omega_{ij} = 0 \qquad (9)$$

for a founder $i$ and his nondescendant $j$ and the recurrence relations

$$\omega_{ii} = \tfrac{1}{2} + \tfrac{1}{4}\omega_{kk} + \tfrac{1}{4}\omega_{ll} + \tfrac{1}{2}\omega_{kl}, \qquad \omega_{ij} = \tfrac{1}{2}\omega_{kj} + \tfrac{1}{2}\omega_{lj} \qquad (10)$$

for a person $i$ with parents $k$ and $l$ and nondescendant $j$. These initial conditions and recurrences coincide with those of the polygenic model except for substitution of the variance recurrence $\omega_{ii} = 1 + \tfrac{1}{2}\omega_{kl}$.

FIGURE 1.-An inbred pedigree.

Now consider the eight-member pedigree depicted in Figure 1. The covariance matrix calculated under the polygenic initial conditions and recurrences is

7 2 '/2 7 2

7 2 7 2 7 2 '/2

74

74

1

74 74

"/4

74 74

74

7 4 7 4

78

78

7*

0 0 0

0 0 0 0

0 0 '/2 1

78

7 8

We derive exactly the same matrix $\Omega = (\omega_{ij})$ from the initial conditions (9) and recurrences (10) except for the single slightly inflated entry $\omega_{88} = 17/16$.
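The recursive construction of the standardized covariance matrix can be sketched as follows. The pedigree used in the usage note is a hypothetical seven-person example, not the pedigree of Figure 1, whose structure is not reproduced here; the function name `covariance_matrix` and the encoding of the pedigree as a list of parent pairs are likewise assumptions made for illustration.

```python
from fractions import Fraction


def covariance_matrix(parents, hypergeometric=True):
    """Limiting standardized covariance matrix for a pedigree.

    parents[i] is None for a founder, or a pair (k, l) of indices of earlier
    people. People must be numbered so that parents precede their children.
    """
    q = len(parents)
    w = [[Fraction(0)] * q for _ in range(q)]
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    for i, par in enumerate(parents):
        if par is None:
            w[i][i] = Fraction(1)  # founder: unit variance, zero covariances
            continue
        k, l = par
        for j in range(i):  # covariance recurrence with every earlier person
            w[i][j] = w[j][i] = half * (w[k][j] + w[l][j])
        if hypergeometric:  # variance recurrence under sampling without replacement
            w[i][i] = half + quarter * (w[k][k] + w[l][l]) + half * w[k][l]
        else:  # variance recurrence under the polygenic model
            w[i][i] = 1 + half * w[k][l]
    return w
```

As a usage example, take `parents = [None, None, (0, 1), (0, 1), (2, 3), None, (4, 5)]`: two founders, their two children, the inbred child of that sib mating, an unrelated founder, and finally the child of the inbred person and that founder. The two models agree on every entry except the last diagonal entry, which is 17/16 under the hypergeometric recurrences and 1 under the polygenic recurrence, illustrating how the child of an inbred parent acquires a slightly inflated variance.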

APPLICATION TO RISK PREDICTION

FIGURE 2.-Risk prediction under the polygenic threshold model.

For a simple numerical application to the polygenic threshold model, consider the pedigree of Figure 2. In this pedigree, darkened individuals are afflicted by a hypothetical disease with a prevalence of 0.01 and a heritability of 0.5. We approximate the polygenic liability to disease of person $i$ in the pedigree by the sum

$$Z_i = \frac{\sigma_a}{\sqrt{2n}}X_i + \sigma_e Y_i,$$

where $X_i$ is determined by the hypergeometric polygenic model with $2n$ polygenes, the $Y_i$ are independent, standard normal deviates, the additive genetic variance $\sigma_a^2 = 0.5$, and the random environmental variance $\sigma_e^2 = 0.5$. Given that each $Z_i$ follows an approximate standard normal distribution, the liability threshold of 2.326 is determined by the prevalence condition

$$\frac{1}{\sqrt{2\pi}}\int_{2.326}^{\infty} e^{-z^2/2}\,dz = 0.01.$$
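The threshold value quoted above can be reproduced with the standard normal quantile function. The short sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

prevalence = 0.01

# Liability threshold: the point whose upper tail area equals the prevalence.
threshold = NormalDist().inv_cdf(1.0 - prevalence)

# Upper tail probability at the rounded threshold quoted in the text.
tail = 1.0 - NormalDist().cdf(2.326)
```

Here `threshold` agrees with 2.326 to three decimal places, and `tail` is within rounding error of the stated prevalence of 0.01.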

Three potential children are represented by open diamonds (◊) in Figure 2. Table 1 gives the risks that these unborn children will be afflicted with the disease under the hypergeometric polygenic model. The recurrence risks recorded evidently stabilize at approximately 24%, 8%, and 5% as the number of polygenes $2n \to \infty$. Under the alternative hypothesis of an autosomal dominant mode of disease inheritance, these risks are 1/2, and 0, respectively. Evidently the calculated risks are strongly model dependent.

DISCUSSION

The stunning reduction in computational complexity in exchanging $2n + 1$ possible genotypes for $3^n$ possible genotypes in the finite polygenic model should pay rich dividends in genetic epidemiology. Quick computing and good biological modeling are inseparable. If a model does not permit accurate likelihood evaluation, then by and large it is untestable. As geneticists tackle common diseases, the necessity of alternatives to classical Mendelian inheritance becomes paramount. The mixed model is one well-posed alternative, even if rather naive.

The hypergeometric polygenic model developed here should be a good approximation to the polygenic model. Although strictly speaking the hypergeometric polygenic model is nongenetic, there is no compelling theoretical evidence to suggest that it approximates the polygenic model less well than the finite polygenic model does. It is true that the hypergeometric polygenic model gives slightly inflated variances and covariances for inbred pedigrees, but the majority of applications involve noninbred data. Furthermore, likelihood evaluation is substantially more demanding for inbred pedigrees than for noninbred pedigrees. Of course, it would be helpful to know the rate of convergence to multivariate normality of each model, but this would require a more refined analysis than the one undertaken in the APPENDIX.

TABLE 1

Recurrence risks for the unborn children in Figure 2

Polygenes 2n    Child 8    Child 11    Child 12
10              0.189      0.058       0.045
20              0.221      0.073       0.051
30              0.231      0.079       0.053
40              0.235      0.082       0.053
50              0.237      0.083       0.054

The finite polygenic model neatly sidesteps some of the computational problems associated with the polygenic model. For a large pedigree, naive likelihood computation under the polygenic model is easily frustrated by the task of inverting the trait covariance matrix (LANGE et al. 1976b). Fortunately, ELSTON and coworkers (ELSTON and STEWART 1971; ELSTON et al. 1992) have devised likelihood algorithms that avoid matrix inversion for the polygenic model under some forms of shared environment and no dominance component. Under the polygenic threshold model, even graver problems arise in evaluating complicated multivariate normal distribution functions (MENDELL and ELSTON 1974; LANGE et al. 1976a; RICE et al. 1979). Finally, exact likelihood evaluation under the mixed model is virtually impossible except for small pedigrees. The hypergeometric polygenic model dramatically improves on the advantageous behavior of the finite polygenic approximation to the polygenic threshold and mixed models.

Besides the obvious computational advantages for deterministic likelihood evaluation, the hypergeometric polygenic model lends itself well to Markov chain Monte Carlo methods (LANGE and MATTHYSSE 1989; LANGE and SOBEL 1991; THOMPSON and GUO 1991; SOBEL and LANGE 1993; THOMPSON 1994). One can easily construct a relevant Markov chain for a pedigree. A state of the chain specifies the number of positive genes carried by each person and the number of positive genes transmitted by each gamete. A transition between two states occurs by random resampling of founder genotypes or by random resampling of gametes. Resampling via the binomial distribution (1) and the hypergeometric distribution (2) are natural in this context, but other proposal distributions such as the uniform also offer interesting possibilities. Radical rearrangements can be achieved by taking a random number of transitions per step of the chain (SOBEL and LANGE 1993). Imposing the usual Metropolis mechanism for accepting proposed steps guarantees that the chain has an equilibrium distribution consistent with the conditional distribution of states given observed phenotypes. One can even include this chain as part of more comprehensive Markov chains involving the mixed model and linked markers.

Over time, the hypergeometric polygenic model promises to become a standard technique in the repertoire of genetic epidemiologists. Before this happens, good software needs to be developed and tested. ELSTON, FERNANDO, and STRICKER (FERNANDO et al. 1994; STRICKER et al. 1995) have made an excellent start. In spite of continuing dramatic gains in computer hardware, good algorithms are as relevant as ever.

I thank MICHAEL BOEHNKE, SUN-WEI GUO, and STEVEN MATTHYSSE for suggesting various improvements to the first draft of this manuscript. This research was supported in part by U.S. Public Health Service grant GM-53275.

LITERATURE CITED

BICKEL, P. J., and K. A. DOKSUM, 1977 Mathematical Statistics: Basic Ideas and Selected Topics. Holden-Day, Oakland, CA.

BILLINGSLEY, P., 1986 Probability and Measure. Wiley, New York.

CANNINGS, C., E. A. THOMPSON and M. H. SKOLNICK, 1978 Probability functions on complex pedigrees. Adv. Appl. Prob. 10: 26-61.

COCHRAN, W. G., 1977 Sampling Techniques, Ed. 3. Wiley, New York.

ELSTON, R. C., and J. STEWART, 1971 A general model for the genetic analysis of pedigree data. Hum. Hered. 21: 523-542.

ELSTON, R. C., V. T. GEORGE and F. SEVERTSON, 1992 The Elston-Stewart algorithm for continuous genotypes and environmental factors. Hum. Hered. 42: 16-27.

FERNANDO, R. L., C. STRICKER and R. C. ELSTON, 1994 The finite polygenic mixed model: an alternative formulation for the mixed model of inheritance. Theoret. Appl. Genet. 88: 573-580.

FISHER, R. A., 1918 The correlation between relatives on the supposition of Mendelian inheritance. Trans. Roy. Soc. Edinb. 52: 399-433.

HASSTEDT, S. J., 1982 A mixed model approximation for large pedigrees. Comput. Biomed. Res. 15: 295-307.

HASSTEDT, S. J., 1991 A variance components/major locus likelihood approximation on quantitative data. Genet. Epidemiol. 8: 113-125.

JACQUARD, A., 1974 The Genetic Structure of Populations. Springer, New York.

LANGE, K., 1978 Central limit theorems for pedigrees. J. Math. Biol. 6: 59-66.

LANGE, K., and M. BOEHNKE, 1983 Extensions to pedigree analysis. IV. Covariance components models for multivariate traits. Am. J. Med. Genet. 14: 513-524.

LANGE, K., and S. MATTHYSSE, 1989 Simulation of pedigree genotypes by random walks. Am. J. Hum. Genet. 45: 959-970.

LANGE, K., and E. SOBEL, 1991 A random walk method for computing genetic location scores. Am. J. Hum. Genet. 49: 1320-1334.

LANGE, K., J. WESTLAKE and M. A. SPENCE, 1976a Extensions to pedigree analysis. II. Recurrence risk calculation under the polygenic threshold model. Hum. Hered. 26: 337-348.

LANGE, K., J. WESTLAKE and M. A. SPENCE, 1976b Extensions to pedigree analysis. III. Variance components by the scoring method. Ann. Hum. Genet. 39: 485-491.

MENDELL, N., and R. C. ELSTON, 1974 Multifactorial qualitative traits: genetic analysis and prediction of recurrence risks. Biometrics 30: 41-57.

MORTON, N. E., and C. J. MACLEAN, 1974 Analysis of family resemblance. III. Complex segregation analysis of quantitative traits. Am. J. Hum. Genet. 26: 489-503.

RÉNYI, A., 1970 Probability Theory. North-Holland, Amsterdam.

RICE, J., T. REICH and C. R. CLONINGER, 1979 An approximation to the multivariate normal integral: its application to multifactorial qualitative traits. Biometrics 35: 451-459.

SOBEL, E., and K. LANGE, 1993 Metropolis sampling in pedigree analysis. Stat. Methods Med. Res. 2: 263-282.

STRICKER, C., R. L. FERNANDO and R. C. ELSTON, 1995 Linkage analysis with an alternative formulation for the mixed model of inheritance: the finite polygenic mixed model. Genetics 141: 1651-1656.

THOMPSON, E. A., 1994 Monte Carlo likelihood in genetic mapping. Stat. Sci. 9: 355-366.

THOMPSON, E. A., and S. W. GUO, 1991 Evaluation of likelihood ratios for complex genetic models. I. M. A. J. Math. Appl. Biol. Med. 8: 149-169.


APPENDIX

MULTIVARIATE NORMALITY

To demonstrate multivariate normality, let $X_n = (X_{n,1}, \ldots, X_{n,q})$ denote the trait vector for a pedigree with $q$ people. We now show that the standardized random vectors $(1/\sqrt{2n})X_n$ tend in law to a multivariate normal distribution with mean vector 0 and covariance matrix $\Omega = (\omega_{ij})$ defined by the initial conditions (9) and recurrences (10). According to the well-known Cramér-Wold device (BILLINGSLEY 1986), it suffices to prove for every vector $v$ that $(1/\sqrt{2n})v^tX_n$ tends in law to a univariate normal distribution with mean 0 and variance $v^t\Omega v$.

We argue by induction on the number of people $q$. The claim is certainly true for $q = 1$ owing to the classical central limit theorem for independent, identically distributed random variables. Now suppose we add person $q$ to an existing pedigree of $q - 1$ people, assuming as before that parents always precede their children in the pedigree. If $q$ is a founder of the enlarged pedigree, then consider the decomposition

$$\frac{1}{\sqrt{2n}}v^tX_n = \frac{1}{\sqrt{2n}}\sum_{i=1}^{q-1} v_iX_{n,i} + \frac{1}{\sqrt{2n}}v_qX_{n,q}.$$

The terms $(1/\sqrt{2n})\sum_{i=1}^{q-1} v_iX_{n,i}$ and $(1/\sqrt{2n})v_qX_{n,q}$ are independent and by the inductive hypothesis separately converge to univariate normal distributions with means 0 and variances $\sum_{i=1}^{q-1}\sum_{j=1}^{q-1} v_i\omega_{ij}v_j$ and $v_q^2\omega_{qq}$, respectively. But the sum of these two quadratic forms is the correct quadratic form associated with the block diagonal matrix $\Omega$ under independence. Since convergence in law preserves convolution of distribution functions and the convolution of normal distributions is normal, multivariate normality follows in this special case.

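If $q$ is the child of the existing members $k$ and $l$ of the pedigree, then consider the decomposition

$$\frac{1}{\sqrt{2n}}v^tX_n = \frac{1}{\sqrt{2n}}\sum_{i=1}^{q-1} w_iX_{n,i} + \frac{v_q}{\sqrt{2n}}\left[X_{n,q} - \frac{X_{n,k}}{2} - \frac{X_{n,l}}{2}\right], \qquad (11)$$

where $w_i = v_i$ when $i \ne k$ or $l$, $w_k = v_k + \tfrac{1}{2}v_q$, and $w_l = v_l + \tfrac{1}{2}v_q$. As usual we decompose $X_{n,q}$ as $X_{n,q} = Y_{n,k \to q} + Y_{n,l \to q}$, where $Y_{n,k \to q}$ and $Y_{n,l \to q}$ are the gamete contributions of $k$ and $l$ to $q$, respectively.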

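The centered gamete contributions $Y_{n,k \to q} - (X_{n,k}/2)$ and $Y_{n,l \to q} - (X_{n,l}/2)$ are almost independent of $(X_{n,1}, \ldots, X_{n,q-1})$. We can achieve independence by defining $Z_{n,k \to q}$ and $Z_{n,l \to q}$ to be gamete contributions sampled independently without replacement from two separate gene pools containing exactly $n$ positive genes and $n$ negative genes each. With this in mind, we rewrite the decomposition (11) as

$$\frac{1}{\sqrt{2n}}v^tX_n = \frac{1}{\sqrt{2n}}\sum_{i=1}^{q-1} w_iX_{n,i} + \frac{v_q}{\sqrt{2n}}Z_{n,k \to q} + \frac{v_q}{\sqrt{2n}}Z_{n,l \to q} + \frac{v_q}{\sqrt{2n}}D_{n,k \to q} + \frac{v_q}{\sqrt{2n}}D_{n,l \to q},$$

where $D_{n,k \to q} = Y_{n,k \to q} - (X_{n,k}/2) - Z_{n,k \to q}$ and $D_{n,l \to q} = Y_{n,l \to q} - (X_{n,l}/2) - Z_{n,l \to q}$.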

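Our strategy now is to couple $Y_{n,k \to q} - (X_{n,k}/2)$ and $Z_{n,k \to q}$ so that

$$\frac{1}{\sqrt{2n}}D_{n,k \to q} = \frac{1}{\sqrt{2n}}\left[Y_{n,k \to q} - \frac{X_{n,k}}{2} - Z_{n,k \to q}\right]$$

tends in probability to 0. Likewise coupling $Y_{n,l \to q} - (X_{n,l}/2)$ and $Z_{n,l \to q}$, we can invoke Slutsky's theorem (BICKEL and DOKSUM 1977) and deduce that $(1/\sqrt{2n})v^tX_n$ has in the limit a distribution that is the convolution of the limit distributions of the three independent terms $(1/\sqrt{2n})\sum_{i=1}^{q-1} w_iX_{n,i}$, $(v_q/\sqrt{2n})Z_{n,k \to q}$, and $(v_q/\sqrt{2n})Z_{n,l \to q}$. By the induction hypothesis, the sum $(1/\sqrt{2n})\sum_{i=1}^{q-1} w_iX_{n,i}$ tends in law to a univariate normal distribution with mean 0 and variance

$$\sum_{i=1}^{q-1}\sum_{j=1}^{q-1} w_i\omega_{ij}w_j = \sum_{i=1}^{q-1}\sum_{j=1}^{q-1} v_i\omega_{ij}v_j + v_q\sum_{j=1}^{q-1}\omega_{kj}v_j + v_q\sum_{j=1}^{q-1}\omega_{lj}v_j + v_q^2\left(\tfrac{1}{4}\omega_{kk} + \tfrac{1}{4}\omega_{ll} + \tfrac{1}{2}\omega_{kl}\right).$$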

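According to Bernstein's central limit theorem for the hypergeometric distribution (RÉNYI 1970), $(1/\sqrt{2n})Z_{n,k \to q}$ and $(1/\sqrt{2n})Z_{n,l \to q}$ converge in law to normal distributions with means 0 and variances $1/4$. Hence, the sum

$$\frac{1}{\sqrt{2n}}\sum_{i=1}^{q-1} w_iX_{n,i} + \frac{v_q}{\sqrt{2n}}Z_{n,k \to q} + \frac{v_q}{\sqrt{2n}}Z_{n,l \to q}$$

converges in law to a univariate normal distribution with mean 0 and variance

$$\sum_{i=1}^{q-1}\sum_{j=1}^{q-1} v_i\omega_{ij}v_j + v_q\sum_{j=1}^{q-1}\omega_{kj}v_j + v_q\sum_{j=1}^{q-1}\omega_{lj}v_j + v_q^2\left(\tfrac{1}{2} + \tfrac{1}{4}\omega_{kk} + \tfrac{1}{4}\omega_{ll} + \tfrac{1}{2}\omega_{kl}\right).$$

In view of the variance and covariance recurrences (10) applied to $i = q$, the total variance reduces to the quadratic form $v^t\Omega v = \sum_{i=1}^{q}\sum_{j=1}^{q} v_i\omega_{ij}v_j$.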

This completes the proof except for coupling $Y_{n,k \to q} - (X_{n,k}/2)$ and $Z_{n,k \to q}$. By definition, person $k$ has $n + \tfrac{1}{2}X_{n,k}$ positive genes and $n - \tfrac{1}{2}X_{n,k}$ negative genes. If we imagine ordering these $2n$ genes so that the positive genes come before the negative genes, then we can write

$$Y_{n,k \to q} = \sum_{i=1}^{2n} A_iW_i,$$

where $W_i = 1$ if $1 \le i \le n + \tfrac{1}{2}X_{n,k}$, $W_i = -1$ if $n + \tfrac{1}{2}X_{n,k} + 1 \le i \le 2n$, and $A_i$ is a random variable indicating whether the $i$th gene is transmitted to $q$ or not.

As noted earlier, we have $E(A_i) = 1/2$, $\operatorname{Var}(A_i) = 1/4$, and $\operatorname{Cov}(A_i, A_j) = -1/(4(2n-1))$. The random variable $Z_{n,k \to q}$ can be represented as

$$Z_{n,k \to q} = \sum_{i=1}^{2n} A_iU_i,$$

where $U_i = 1$ if $1 \le i \le n$ and $U_i = -1$ if $n + 1 \le i \le 2n$. By construction,

$$\sum_{i=1}^{2n}(W_i - U_i) = X_{n,k}, \qquad \sum_{i=1}^{2n}(W_i - U_i)^2 = 2|X_{n,k}|.$$
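The two identities for the coupled gene vectors can be confirmed directly. The sketch below builds the ordered vectors $W$ and $U$ for a given parental trait value and leaves the sums to the caller; the function name `coupling_vectors` is hypothetical, and the trait value `x` is assumed to be an even integer between $-2n$ and $2n$, as it must be in the model.

```python
def coupling_vectors(n, x):
    """Return the ordered gene vectors W and U for a parent with trait value x.

    W lists the parent's n + x/2 positive genes first, then the negative genes.
    U lists n positive genes followed by n negative genes.
    """
    positives = n + x // 2  # x is even, so integer division is exact
    w = [1] * positives + [-1] * (2 * n - positives)
    u = [1] * n + [-1] * n
    return w, u
```

The vectors differ in exactly $|x|/2$ positions, and each such position contributes $\pm 2$ to the first sum and 4 to the second, which is where the values $x$ and $2|x|$ come from.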

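Conditional on $X_{n,k}$, the random difference

$$D_{n,k \to q} = Y_{n,k \to q} - \frac{X_{n,k}}{2} - Z_{n,k \to q} = \sum_{i=1}^{2n} A_i(W_i - U_i) - \frac{X_{n,k}}{2}$$

has mean

$$E(D_{n,k \to q} \mid X_{n,k}) = \frac{1}{2}\sum_{i=1}^{2n}(W_i - U_i) - \frac{X_{n,k}}{2} = 0$$

and variance

$$\operatorname{Var}(D_{n,k \to q} \mid X_{n,k}) = \left[\frac{1}{4} + \frac{1}{4(2n-1)}\right]\sum_{i=1}^{2n}(W_i - U_i)^2 - \frac{1}{4(2n-1)}\left[\sum_{i=1}^{2n}(W_i - U_i)\right]^2$$

$$= \left[\frac{1}{4} + \frac{1}{4(2n-1)}\right]2|X_{n,k}| - \frac{1}{4(2n-1)}X_{n,k}^2.$$

Hence, the unconditional mean and variance of $(1/\sqrt{2n})D_{n,k \to q}$ are 0 and

$$\frac{1}{2n}\left[\frac{1}{4} + \frac{1}{4(2n-1)}\right]2E(|X_{n,k}|) - \frac{1}{2n}\cdot\frac{1}{4(2n-1)}E(X_{n,k}^2). \qquad (12)$$

Because Cauchy's inequality implies

$$E(|X_{n,k}|) \le \sqrt{E(X_{n,k}^2)}$$

and

$$\frac{1}{2n}E(X_{n,k}^2) = \frac{1}{2n}\operatorname{Var}(X_{n,k})$$

is bounded, it follows from (12) that

$$\lim_{n \to \infty}\operatorname{Var}\left(\frac{1}{\sqrt{2n}}D_{n,k \to q}\right) = 0.$$

Chebychev's inequality finally gives the bound

$$\Pr\left(\left|\frac{1}{\sqrt{2n}}D_{n,k \to q}\right| > \epsilon\right) \le \frac{1}{\epsilon^2}\operatorname{Var}\left(\frac{1}{\sqrt{2n}}D_{n,k \to q}\right)$$

for every $\epsilon > 0$, so that $(1/\sqrt{2n})D_{n,k \to q}$ tends in probability to 0.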
