Simultaneous Procedures in Discriminant Analysis Involving Two Groups


Ronald J. McKay

School of Mathematical and Physical Sciences Murdoch University

Murdoch, Western Australia 6153

A simultaneous test procedure is proposed for reducing the number of variables required to discriminate between two multivariate normal populations. This procedure forms part of a wider simultaneous procedure, for detailed descriptive analysis of group differences, which isolates subsets of variables providing significant discrimination as well as subsets of variables providing essentially as much group separation as the original set of variables.

KEY WORDS

Discriminant Analysis; Simultaneous Test Procedure; Irrelevant Variables; Union-Intersection Principle; Group Separation

1. INTRODUCTION

A number of procedures have been proposed for reducing the number of variables required to discriminate between (separate) two multivariate normal populations. Possibly the best known is Roy's [13] “stepwise 8”’ procedure in which Rao's [12] test for "additional information" is used to examine variables successively for their contribution to discrimination (Bock and Haggard [5]). The distribution theory on which the repeated F-tests are based assumes that the variables are examined in a predetermined order of relative importance. In this case the Type I family error rate for the sequence of tests can be determined. In practice, however, the order of relative importance is often decided from the data (Hall [10]). Then the true significance levels of the tests involved and the Type I family error rate are unknown. An alternative approach is needed for the case when variables cannot be prearranged in a specified order.

In this paper a procedure is developed to find all subsets of variables whose discriminatory capabilities are not significantly poorer than those of the set of variables originally considered. The Type I family error rate can be controlled and the significance level of each test performed can be determined. The procedure is based on Rao's test for additional information and is a simultaneous test procedure (STP) in Gabriel's [9] sense.

Received November 1974; revised August 1975.

Gabriel [8] has given an STP which can be used to find all subsets of variables whose corresponding mean vectors are not the same in each population. Gabriel's procedure is closely connected to the STP proposed in this paper, and the two procedures can be combined in a wider STP having meaningful applications.

2. THE HYPOTHESIS FOR VARIABLE REDUCTION

Suppose that the vector $\mathbf{x}_s$ consists of $p$ characteristics, $x_1, \ldots, x_p$, measurable on the individuals of each of two populations. $s = \{1, \ldots, p\}$ is a set of subscripts, and the elements in a subvector $\mathbf{x}_f$ of $\mathbf{x}_s$ can be identified by the subscripts in the subset $f$ of $s$. Let $\mathbf{x}_{sij}$ consist of the measurements of these characteristics on the $j$th individual in a sample of size $n_i$ from the $i$th population, and suppose that $\mathbf{x}_{sij} \sim N_p(\boldsymbol{\mu}_{si}, \boldsymbol{\Sigma}_{ss})$, $i = 1, 2$; $j = 1, \ldots, n_i$. It is assumed that $\boldsymbol{\Sigma}_{ss}$ is positive definite and that $\nu = n_1 + n_2 - 2 \ge p$.

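The hypothesis that the variables in $\mathbf{x}_s$ do not discriminate between populations is

$$\omega_s : \boldsymbol{\delta}_s = \mathbf{0},$$

where $\boldsymbol{\delta}_s = \boldsymbol{\mu}_{s1} - \boldsymbol{\mu}_{s2}$. A statistic for testing this hypothesis is Hotelling's (Anderson [3])

$$T_s^2 = \kappa^2 \mathbf{d}_s' \mathbf{S}_{ss}^{-1} \mathbf{d}_s,$$

where $\kappa^2 = n_1 n_2/(n_1 + n_2)$, $\mathbf{d}_s = \bar{\mathbf{x}}_{s1} - \bar{\mathbf{x}}_{s2}$,

$$\mathbf{S}_{ss} = \sum_{i=1}^{2} \sum_{j=1}^{n_i} (\mathbf{x}_{sij} - \bar{\mathbf{x}}_{si})(\mathbf{x}_{sij} - \bar{\mathbf{x}}_{si})'/\nu,$$

in which

$$\bar{\mathbf{x}}_{si} = \sum_{j=1}^{n_i} \mathbf{x}_{sij}/n_i, \quad i = 1, 2.$$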


The quantity $[(\nu - p + 1)/(p\nu)]\,T_s^2$ follows the noncentral F-distribution with degrees of freedom $p$ and $\nu - p + 1$ and non-centrality parameter

$$\tau_s^2 = \kappa^2 \boldsymbol{\delta}_s' \boldsymbol{\Sigma}_{ss}^{-1} \boldsymbol{\delta}_s.$$

This measure of departure from nullity of $\omega_s$ is essentially a measure of the separation of the two populations; $\tau_s^2/\kappa^2$ is known as the squared Mahalanobis distance between the populations.
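The definitions above can be turned into a short computation. The following is a minimal sketch, not the author's code: the function names `solve` and `hotelling_t2` and the toy data are assumptions made for illustration, and only the Python standard library is used. It forms the pooled covariance matrix with divisor $\nu$, the sample coefficient vector $\mathbf{S}_{ss}^{-1}\mathbf{d}_s$, and Hotelling's $T_s^2$.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def hotelling_t2(sample1, sample2):
    """Return (T^2, v) for two samples given as lists of p-vectors."""
    n1, n2, p = len(sample1), len(sample2), len(sample1[0])
    nu = n1 + n2 - 2                      # degrees of freedom
    kappa2 = n1 * n2 / (n1 + n2)
    means = [[sum(x[k] for x in smp) / len(smp) for k in range(p)]
             for smp in (sample1, sample2)]
    d = [m1 - m2 for m1, m2 in zip(*means)]
    # Pooled covariance matrix S_ss with divisor nu.
    s = [[0.0] * p for _ in range(p)]
    for smp, mean in zip((sample1, sample2), means):
        for x in smp:
            for a in range(p):
                for b in range(p):
                    s[a][b] += (x[a] - mean[a]) * (x[b] - mean[b]) / nu
    v = solve(s, d)                       # sample coefficients S_ss^{-1} d_s
    t2 = kappa2 * sum(di * vi for di, vi in zip(d, v))
    return t2, v


# Made-up toy data (not from the paper): two groups of four 2-vectors.
group1 = [[0, 0], [2, 0], [0, 2], [2, 2]]
group2 = [[3, 0], [5, 0], [3, 2], [5, 2]]
t2, v = hotelling_t2(group1, group2)
mahalanobis_sq = t2 / (4 * 4 / (4 + 4))   # sample analogue of tau^2 / kappa^2
```

In this toy case the second variable has identical means in both groups and is uncorrelated with the first, so its coefficient is zero and it contributes nothing to the separation.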

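Consider the arbitrary partition $\mathbf{x}_s' = (\mathbf{x}_f'\ \mathbf{x}_g')$, where $f$, $g$ contain $p - r$, $r$ elements respectively ($1 \le r < p$). Correspondingly, let

$$\boldsymbol{\delta}_s' = (\boldsymbol{\delta}_f'\ \boldsymbol{\delta}_g'), \quad \boldsymbol{\Sigma}_{ss} = \begin{pmatrix} \boldsymbol{\Sigma}_{ff} & \boldsymbol{\Sigma}_{fg} \\ \boldsymbol{\Sigma}_{gf} & \boldsymbol{\Sigma}_{gg} \end{pmatrix}, \quad \boldsymbol{\nu}_s' = \boldsymbol{\delta}_s' \boldsymbol{\Sigma}_{ss}^{-1} = (\boldsymbol{\nu}_{f.s}'\ \boldsymbol{\nu}_{g.s}').$$

Note that the subvectors of the vector $\boldsymbol{\nu}_s$ of discriminant function coefficients are not in general the vectors $\boldsymbol{\nu}_f = \boldsymbol{\Sigma}_{ff}^{-1}\boldsymbol{\delta}_f$, $\boldsymbol{\nu}_g = \boldsymbol{\Sigma}_{gg}^{-1}\boldsymbol{\delta}_g$.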

It is not difficult to show that the variables comprising $\mathbf{x}_f$ separate the populations to the same extent as do all $p$ variables if and only if $\boldsymbol{\nu}_{g.s} = \mathbf{0}$. Otherwise, the separation provided by the $p$-set is greater than that provided by the $(p - r)$-subset.

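For (McKay [11])

$$\tau_s^2 = \tau_f^2 + \kappa^2 \boldsymbol{\nu}_{g.s}' \boldsymbol{\Sigma}_{gg.f} \boldsymbol{\nu}_{g.s},$$

where $\tau_f^2 = \kappa^2 \boldsymbol{\delta}_f' \boldsymbol{\Sigma}_{ff}^{-1} \boldsymbol{\delta}_f$ is the appropriate measure of the extent to which the variables in $\mathbf{x}_f$ discriminate between populations, and

$$\boldsymbol{\Sigma}_{gg.f} = \boldsymbol{\Sigma}_{gg} - \boldsymbol{\Sigma}_{gf} \boldsymbol{\Sigma}_{ff}^{-1} \boldsymbol{\Sigma}_{fg}.$$

It follows that

$$\tau_s^2 \ge \tau_f^2,$$

with equality if and only if $\boldsymbol{\nu}_{g.s} = \mathbf{0}$.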

For discrimination purposes the variables in $\mathbf{x}_g$ may be considered irrelevant, in the presence of the variables in $\mathbf{x}_f$, if and only if the corresponding discriminant function coefficients are zero. In Rao's terminology, $\mathbf{x}_g$ is said to supply no additional information about departures from nullity of $\omega_s$, independently of $\mathbf{x}_f$, if $\boldsymbol{\nu}_{g.s} = \mathbf{0}$. Rao has given a test of the hypothesis

$$H_{g.s} : \boldsymbol{\nu}_{g.s} = \mathbf{0}$$

for a specified set of subscripts $g$. However, in practice it is not generally known which variables are to be considered possibly irrelevant, so that it will be necessary to test simultaneously all hypotheses $H_{g.s}$ specified by varying $g$ over all nonempty proper subsets of $s$.

TECHNOMETRICS©, VOL. 18, NO. 1, FEBRUARY 1976

3. SIMULTANEOUS TEST PROCEDURE

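A suitable simultaneous procedure can be constructed using the union-intersection (UI) principle of S. N. Roy [14] and Rao's "additional information" statistic. To conform with previous partitions let

$$\mathbf{d}_s' = (\mathbf{d}_f'\ \mathbf{d}_g'), \quad \mathbf{v}_s' = \mathbf{d}_s' \mathbf{S}_{ss}^{-1} = (\mathbf{v}_{f.s}'\ \mathbf{v}_{g.s}').$$

Rao's test is based on the statistic

$$\phi_{g.s} = \frac{(\nu - p + 1)\,\kappa^2 \mathbf{v}_{g.s}' \mathbf{S}_{gg.f} \mathbf{v}_{g.s}}{\nu + T_s^2 - \kappa^2 \mathbf{v}_{g.s}' \mathbf{S}_{gg.f} \mathbf{v}_{g.s}},$$

since, as for corresponding population quantities,

$$T_s^2 = T_f^2 + \kappa^2 \mathbf{v}_{g.s}' \mathbf{S}_{gg.f} \mathbf{v}_{g.s}.$$

Here $T_f^2 = \kappa^2 \mathbf{d}_f' \mathbf{S}_{ff}^{-1} \mathbf{d}_f$ is Hotelling's statistic for testing $\omega_f : \boldsymbol{\delta}_f = \mathbf{0}$, and $\mathbf{S}_{gg.f} = \mathbf{S}_{gg} - \mathbf{S}_{gf} \mathbf{S}_{ff}^{-1} \mathbf{S}_{fg}$. Note that $\mathbf{S}_{gg.f}^{-1}$ consists of those elements common to the rows and columns of the matrix $\mathbf{S}_{ss}^{-1}$ specified by the subscripts in $g$. Under $H_{g.s}$, $\phi_{g.s}/r$ is distributed as central F with degrees of freedom $r$ and $\nu - p + 1$.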
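Because $T_s^2 - T_f^2$ equals the quadratic form in the numerator, Rao's statistic can be written purely in terms of the two Hotelling statistics, as $(\nu - p + 1)(T_s^2 - T_f^2)/(\nu + T_f^2)$. The sketch below illustrates this rearrangement; the function name `rao_phi` is an assumption introduced here, and the numerical inputs are the values reported for the coal example in Section 4.

```python
def rao_phi(t2_full, t2_sub, nu, p):
    """Rao's additional-information statistic from Hotelling's T^2 values.

    t2_full is T_s^2 for all p variables; t2_sub is T_f^2 for the retained
    subset f. The r = p - |f| omitted variables form g.
    """
    return (nu - p + 1) * (t2_full - t2_sub) / (nu + t2_sub)


nu, p = 198, 4            # n1 = n2 = 100 in the coal example of Section 4
t2_s = 266.31             # Hotelling's statistic for all four variables

phi_drop_x1 = rao_phi(t2_s, 250.51, nu, p)   # f = {2, 3, 4}, g = {1}
phi_drop_x4 = rao_phi(t2_s, 212.16, nu, p)   # f = {1, 2, 3}, g = {4}
```

Each value would then be compared with the common critical constant of the simultaneous procedure rather than with a separate F-point for each subset.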

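In deriving the desired STP it is convenient to view the hypothesis $H_{g.s}$ as a partial intersection of atomic hypotheses

$$H(\mathbf{a}) : \mathbf{a}'\boldsymbol{\nu}_s = 0$$

specified by varying $\mathbf{a}$ over all non-null $p$-vectors. In particular, with $\mathbf{a}' = (\mathbf{a}_f'\ \mathbf{a}_g')$ to conform with the partition of $\mathbf{x}_s$, $H_{g.s}$ can be expressed as

$$\bigcap_{\mathbf{a}:\,\mathbf{a}_f = \mathbf{0}} H(\mathbf{a}).$$

The intersection of all hypotheses $H(\mathbf{a})$, namely

$$\bigcap_{\mathbf{a}} H(\mathbf{a}),$$

is the hypothesis $\boldsymbol{\nu}_s = \boldsymbol{\Sigma}_{ss}^{-1}\boldsymbol{\delta}_s = \mathbf{0}$, which is clearly equivalent to $\omega_s$. The required STP will follow from a procedure for simultaneously testing all $H(\mathbf{a})$, and this approach will facilitate comparisons with Gabriel's [8] STP.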

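A statistic for testing $H(\mathbf{a})$, for some specified $\mathbf{a}$, is obtained by considering the transformation $\mathbf{x}_s \to \mathbf{P}'\mathbf{x}_s$, where $\mathbf{P}$ is a $p \times p$ orthogonal matrix whose last column is proportional to $\mathbf{a}$. $H(\mathbf{a})$ is equivalent to the hypothesis that the last element in the vector of discriminant function coefficients

$$(\mathbf{P}'\boldsymbol{\Sigma}_{ss}\mathbf{P})^{-1}\mathbf{P}'\boldsymbol{\delta}_s = \mathbf{P}'\boldsymbol{\Sigma}_{ss}^{-1}\boldsymbol{\delta}_s$$

is equal to zero. The appropriate test statistic is

$$\phi(\mathbf{a}) = \frac{(\nu - p + 1)\,\kappa^2\,\mathbf{a}'\mathbf{v}_s\mathbf{v}_s'\mathbf{a}}{(\nu + T_s^2)\,\mathbf{a}'\mathbf{S}_{ss}^{-1}\mathbf{a} - \kappa^2\,\mathbf{a}'\mathbf{v}_s\mathbf{v}_s'\mathbf{a}},$$

which is distributed as central F on 1 and $\nu - p + 1$ degrees of freedom under the null hypothesis.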

A simultaneous level $\gamma$ test (that is, a test having probability $\gamma$ of incorrectly rejecting at least one hypothesis) is obtained by not rejecting when

$$\phi(\mathbf{a}) < c_{\nu,p}^{\gamma},$$

where $c_{\nu,p}^{\gamma}$ is the upper $100\gamma\%$ point of the null distribution of the UI statistic

$$\phi_s = \max_{\mathbf{a}} \phi(\mathbf{a})$$

for testing $\omega_s$. It is straightforward to verify that

$$\phi_s = \frac{\nu - p + 1}{\nu}\,T_s^2$$

(see Appendix for details). Thus, the simultaneous level $\gamma$ acceptance region is

$$\phi(\mathbf{a}) < p F_{p,\nu-p+1}^{\gamma}.$$

The statistic $\phi_s$ is included among the $\phi(\mathbf{a})$, as is the UI statistic for testing $H_{g.s}$, namely,

$$\max_{\mathbf{a}:\,\mathbf{a}_f = \mathbf{0}} \phi(\mathbf{a}) = \phi_{g.s}$$

(Appendix and McKay [11]). This establishes an STP of level $\gamma$ (Gabriel [9]) which consists of rejecting $\omega_s$ if

$$\phi_s > p F_{p,\nu-p+1}^{\gamma},$$

and of rejecting any $H_{g.s}$ for which

$$\phi_{g.s} > p F_{p,\nu-p+1}^{\gamma}.$$

In applying this procedure the hypothesis $\omega_s$ is tested initially, the more familiar rejection region being

$$T_s^2 > \frac{\nu p}{\nu - p + 1}\,F_{p,\nu-p+1}^{\gamma}.$$

If $\omega_s$ is not rejected then the variables in $\mathbf{x}_s$ provide no (significant) separation of groups and the analysis is concluded. On the other hand, if $\omega_s$ is rejected then all hypotheses $H_{g.s}$ are of interest. If any $H_{g.s}$ is not rejected then the variables in $\mathbf{x}_g$ can be considered irrelevant for discrimination purposes, and the separation of groups provided by the variables in $\mathbf{x}_f$ is not significantly less at simultaneous level $\gamma$ than that provided by all $p$ variables. The acceptance region for any $H_{g.s}$ may be written

$$T_f^2 > \frac{(\nu - p + 1)\,T_s^2 - \nu p F_{p,\nu-p+1}^{\gamma}}{\nu - p + 1 + p F_{p,\nu-p+1}^{\gamma}},$$

and the set of variables comprising any $\mathbf{x}_f$, $f \subset s$, for which this condition holds will be called an adequate ($\gamma$) discriminant set.

An important property of the procedure is that of coherence. Since Hotelling's statistic based on a given set of variables must be at least as great as Hotelling's statistic based on a subset of the given set, all subsets of variables including adequate ($\gamma$) discriminant sets must themselves be adequate ($\gamma$) discriminant sets, and if a subset of variables is not an adequate ($\gamma$) discriminant set then it cannot contain any adequate ($\gamma$) discriminant sets. The property of coherence of STPs in general is discussed at length by Gabriel [9].

It is a simple matter to determine the significance level $\alpha$ of the test of any $H_{g.s}$ in the STP. For any specified $\gamma$, $\alpha$ must satisfy

$$r F_{r,\nu-p+1}^{\alpha} = p F_{p,\nu-p+1}^{\gamma},$$

which may be solved for $\alpha$ by interpolation in F-tables or in tables of the incomplete beta function. Alternatively, it may be desirable to specify $\alpha$ for some $r$ and then $\gamma$ can be determined accordingly. The question of the choice of $\gamma$ is discussed in the ANOVA context by Gabriel [7].

4. AN EXAMPLE

Baten and Dewitt [4] have used four characteristics of coal, $x_1$, $x_2$, $x_3$ and $x_4$ representing, respectively, B.T.U. per pound, percent of volatile material, percent of fixed carbon and percent of ash, to discriminate between coals from two mines. For these data $s = \{1, 2, 3, 4\}$, $p = 4$, $n_1 = n_2 = 100$ and Hotelling's statistic $T_s^2 = 266.31$.

The hypothesis that the four characteristics do not discriminate between coals is rejected at level $\gamma = 0.025$ since

$$T_s^2 > \frac{(198)\,4F_{4,195}^{0.025}}{195} = 11.49.$$

Further, the set of variables comprising any $\mathbf{x}_f$, $f \subset s$, for which

$$T_f^2 > \frac{(195)\,266.31 - (198)\,4F_{4,195}^{0.025}}{195 + 4F_{4,195}^{0.025}} = 240.84$$

is an adequate (0.025) discriminant set. Any such set provides essentially the same discrimination between coals as the original set of four variables.
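The two bounds just quoted can be reproduced from the general formulas of Section 3. In the sketch below the function names are assumptions introduced for illustration, and the critical constant $4F_{4,195}^{0.025} = 11.32$ is taken from the value the paper itself reports later in this section rather than computed from F-tables.

```python
def overall_bound(nu, p, p_times_f):
    """Lower bound on T_f^2 for rejecting the overall hypothesis omega_s."""
    return nu * p_times_f / (nu - p + 1)


def adequacy_bound(t2_full, nu, p, p_times_f):
    """Lower bound on T_f^2 for x_f to be an adequate discriminant set."""
    return ((nu - p + 1) * t2_full - nu * p_times_f) / (nu - p + 1 + p_times_f)


nu, p = 198, 4
p_times_f = 11.32          # p * F critical value, as quoted in the paper

reject_overall_above = overall_bound(nu, p, p_times_f)
adequate_above = adequacy_bound(266.31, nu, p, p_times_f)
```

Rounded to two decimals these agree with the 11.49 and 240.84 given in the text.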

All $T_f^2$ values are listed below, those satisfying the last condition being marked with asterisks.

| $f$ | $T_f^2$ |
|---|---|
| {1,2,3} | 212.16 |
| {1,2,4} | 242.65* |
| {1,3,4} | 207.28 |
| {2,3,4} | 250.51* |
| {1,2} | 198.00 |
| {1,3} | 204.41 |
| {1,4} | 92.32 |
| {2,3} | 210.98 |
| {2,4} | 225.54 |
| {3,4} | 173.96 |
| {1} | 61.81 |
| {2} | 197.09 |
| {3} | 127.47 |
| {4} | 0.02 |


Since the procedure is coherent it is usually not necessary to compute all statistics $T_f^2$. Moreover, the results of the analysis can be summarised by listing only minimal adequate sets, these being adequate discriminant sets which contain no further adequate discriminant sets. (Any subset of variables containing a minimal adequate discriminant set is an adequate discriminant set.) In the present example it is necessary to compute only the italicized values of $T_f^2$, and the minimal adequate (0.025) discriminant sets are $\{x_1, x_2, x_4\}$ and $\{x_2, x_3, x_4\}$.

In this case these are the only adequate (0.025) discriminant sets. These results indicate that the variable $x_1$ may be considered irrelevant in the presence of $x_2$, $x_3$, $x_4$, and that $x_3$ may be considered irrelevant in the presence of $x_1$, $x_2$, $x_4$; either $x_1$ or $x_3$ may be eliminated. However, since $\{x_2, x_4\}$ is not adequate, $x_1$ and $x_3$ are not irrelevant when considered jointly, and elimination of both would result in significant loss of discrimination.
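The selection of minimal adequate sets can be sketched in a few lines. This is an illustrative reading of the definition rather than the author's algorithm; the dictionary simply transcribes the $T_f^2$ values listed above, and the variable names are assumptions.

```python
# T_f^2 for every nonempty proper subset f of {1, 2, 3, 4}, from the table.
t2 = {
    (1, 2, 3): 212.16, (1, 2, 4): 242.65, (1, 3, 4): 207.28,
    (2, 3, 4): 250.51, (1, 2): 198.00, (1, 3): 204.41, (1, 4): 92.32,
    (2, 3): 210.98, (2, 4): 225.54, (3, 4): 173.96,
    (1,): 61.81, (2,): 197.09, (3,): 127.47, (4,): 0.02,
}
bound = 240.84   # adequacy bound at simultaneous level 0.025

# Adequate sets: those whose statistic exceeds the bound.
adequate = [set(f) for f, value in t2.items() if value > bound]

# Minimal adequate sets: adequate sets with no adequate proper subset.
minimal = [f for f in adequate if not any(g < f for g in adequate)]
```

Here every adequate set is already minimal, which matches the remark that these are the only adequate (0.025) discriminant sets.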

In the above analysis an STP of conventional level 0.025 was applied for demonstration purposes. This means that an $r$-subset of the variables $x_1$, $x_2$, $x_3$, $x_4$ was tested for relevance at significance level $\alpha_r$ given by

$$r F_{r,195}^{\alpha_r} = 4F_{4,195}^{0.025} = 11.32.$$

It is readily established that $\alpha_r$ has the approximate (conservative) values 0.0008, 0.005 and 0.01 for $r = 1$, 2, and 3, respectively. Less conservative levels would correspond to an increased simultaneous level, and in practice STPs of unconventional levels may have to be contemplated (Gabriel [7]).
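In place of interpolation in tables, the individual levels $\alpha_r$ can be approximated numerically from the upper tail of the F-distribution, which is a regularized incomplete beta function. The sketch below is an assumption-laden illustration: the helper names are invented here, and the incomplete beta function is evaluated with a standard continued-fraction expansion (modified Lentz method) using only the standard library.

```python
import math


def _beta_cf(a, b, x, max_iter=500, eps=1e-15):
    """Continued fraction for the incomplete beta function (Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def f_upper_tail(x, d1, d2):
    """P(F > x) for an F variate with d1 and d2 degrees of freedom."""
    return reg_inc_beta(d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * x))


critical = 11.32   # 4 * F critical value, as quoted in the text
levels = {r: f_upper_tail(critical / r, r, 195) for r in (1, 2, 3)}
```

The resulting `levels` should be of the same order as the approximate values quoted above; exact agreement is not expected, since the paper rounds its figures.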

5. A WIDER STP

An STP given by Gabriel [8] can be used in the present set-up to test all hypotheses $\omega_f$, specified by varying $f$ over all nonempty subsets of $s$. The procedure consists of rejecting any $\omega_f$, $f \subseteq s$, at simultaneous level $\gamma$ if

$$T_f^2 > \frac{\nu p}{\nu - p + 1}\,F_{p,\nu-p+1}^{\gamma}.$$

The analysis achieved by this STP is clearly different from that achieved by the STP proposed above. Rejection of an $\omega_f$ means simply that the variables in $\mathbf{x}_f$ discriminate significantly between groups at simultaneous level $\gamma$.

Since Gabriel's procedure is coherent not all statistics $T_f^2$ need be computed to determine all rejected hypotheses $\omega_f$. Further, the analysis can be summarised by listing only those subsets of variables corresponding to minimal rejected hypotheses; a minimal rejected hypothesis in the family of hypotheses involved in Gabriel's STP is one which is rejected and such that all hypotheses in the family implying it are rejected, while all hypotheses in the family implied by it are not rejected.

For the Baten and Dewitt example, the simultaneous size 0.025 rejection region for Gabriel's procedure is

$$T_f^2 > \frac{(198)\,4F_{4,195}^{0.025}}{195} = 11.49.$$

Subsets of variables corresponding to minimal rejected hypotheses are $\{x_1\}$, $\{x_2\}$, $\{x_3\}$. That is, any subset of variables including one or more of the variables $x_1$, $x_2$, $x_3$ provides significant discrimination between coals at simultaneous level 0.025.
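The same tabulated statistics can be used to illustrate Gabriel's procedure and the monotonicity that underlies coherence. This is a sketch under the assumption that the $T_f^2$ values of Section 4 are entered as below; the helper `is_monotone` is a name introduced here, not part of the paper.

```python
# T_f^2 values for the coal example, transcribed from Section 4.
t2 = {
    (1, 2, 3): 212.16, (1, 2, 4): 242.65, (1, 3, 4): 207.28,
    (2, 3, 4): 250.51, (1, 2): 198.00, (1, 3): 204.41, (1, 4): 92.32,
    (2, 3): 210.98, (2, 4): 225.54, (3, 4): 173.96,
    (1,): 61.81, (2,): 197.09, (3,): 127.47, (4,): 0.02,
}
bound = 11.49    # rejection bound for Gabriel's STP at level 0.025


def is_monotone(stats):
    """Check that T^2 never decreases when variables are added."""
    return all(stats[g] <= stats[f]
               for f in stats for g in stats if set(g) < set(f))


# Rejected hypotheses omega_f: subsets that discriminate significantly.
rejected = [set(f) for f, value in t2.items() if value > bound]

# Minimal rejected hypotheses: rejected sets with no rejected proper subset.
minimal_rejected = [f for f in rejected if not any(g < f for g in rejected)]
```

Only $\{x_4\}$ fails to exceed the bound, so the minimal rejected hypotheses correspond to the three single variables named in the text.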

So far two STPs concerned with analysis of the two group set-up have been described. It is immediately apparent that there is a connection between the two procedures. Firstly, both procedures involve the same overall hypothesis $\omega_s$. Secondly, if both procedures are of the same level $\gamma$ then they involve the same percentage point $F_{p,\nu-p+1}^{\gamma}$ of the F-distribution. In fact, the hypotheses of the two STPs can be viewed as partial intersections of the same atomic hypotheses. To construct Gabriel's procedure using the UI principle in the manner described in section 3 it is appropriate to consider initially all hypotheses

$$\omega(\mathbf{l}) : \mathbf{l}'\boldsymbol{\delta}_s = 0,$$

specified by varying the elements of the non-null $p$-vector $\mathbf{l}$, whose intersection,

$$\bigcap_{\mathbf{l}} \omega(\mathbf{l}),$$

is the hypothesis $\omega_s$. Clearly, the hypothesis $\mathbf{a}'\boldsymbol{\nu}_s = 0$, considered in the construction of the STP for variable reduction, can be rewritten $\mathbf{l}'\boldsymbol{\delta}_s = 0$ where $\mathbf{l} = \boldsymbol{\Sigma}_{ss}^{-1}\mathbf{a}$. It follows immediately that the two $\gamma$-level STPs can be combined in a wider STP, also of level $\gamma$, which includes all tests for variable reduction and all tests for finding those subsets of variables which provide significant discrimination between groups.

For example, the simultaneous level for the family consisting of all tests carried out previously on the Baten and Dewitt data is 0.025.

While each of the STPs making up the wider STP is coherent, there are certain implication relations among the hypotheses of the two families which may not be preserved by the wider procedure. It is clear that a subset of variables may provide significant discrimination between groups at simultaneous level $\gamma$ and yet not be an adequate ($\gamma$) discriminant set. This simply reflects the different purposes of the two STPs, and is apparent in the analysis of the Baten and Dewitt data. However, it is also possible that an adequate ($\gamma$) discriminant set may not provide significant discrimination at simultaneous level $\gamma$. Such a result is not as alarming as it may appear at first glance if hypotheses are viewed as being "not rejected" rather than "accepted". It simply means that neither the additional information supplied by the omitted variables nor the discrimination provided by the retained variables reaches the extent specified by the experimenter through his choice of the level $\gamma$ of the procedure. One way to treat this situation is to require that a subset of variables provide significant discrimination at simultaneous level $\gamma$ as a prerequisite to being called an adequate ($\gamma$) discriminant set. Dissonances of this type are discussed at length for general STPs by Gabriel [9] and for the present set-up by McKay [11].

6. UTILIZING PRIOR INFORMATION

When there is information available suggesting that a particular subset of the original variables is of particular importance in describing group separation, a modification of the STP of section 3 may be applied. In this situation only those subsets of variables including the specified important subset need to be examined for adequacy.

Suppose that only the variables comprising $\mathbf{x}_h$, where $h$ ($\subset s$) contains $r'$ elements ($1 \le r' < p$), are to be examined for their relevance, the remaining $p - r'$ variables being considered important for discrimination purposes. Here, it is necessary to test simultaneously all hypotheses $H_{g.s}$, $g \subseteq h$, and this means that a subset of variables $\mathbf{x}_f$, $s - h \subseteq f \subset s$, will now be called an adequate ($\gamma$) discriminant set if

$$T_f^2 > \frac{(\nu - p + 1)\,T_s^2 - \nu r' F_{r',\nu-p+1}^{\gamma}}{\nu - p + 1 + r' F_{r',\nu-p+1}^{\gamma}}.$$

In the Baten and Dewitt example, if the variable $x_2$ is considered a particularly important discriminator then only the subsets of variables specified by the following sets of subscripts $f$ need be examined for adequacy: {2}, {1,2}, {2,3}, {2,4}, {1,2,3}, {1,2,4}, {2,3,4}. Any such subset for which

$$T_f^2 > \frac{(195)\,266.31 - (198)\,3F_{3,195}^{0.025}}{195 + 3F_{3,195}^{0.025}} = 244.45$$

is an adequate (0.025) discriminant set. The increase in the lower bound defining adequacy reflects the utilization of prior information, and a slightly different conclusion is reached; there is only one adequate (0.025) discriminant set, namely, $\{x_2, x_3, x_4\}$.
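The restricted search can be sketched as follows. The bound 244.45 is taken directly from the text (it is not recomputed here, since the paper does not quote the value of $F_{3,195}^{0.025}$ separately), the $T_f^2$ values are those tabulated in Section 4, and the variable names are assumptions made for illustration.

```python
from itertools import combinations

# T_f^2 values for the coal example, transcribed from Section 4.
t2 = {
    (1, 2, 3): 212.16, (1, 2, 4): 242.65, (1, 3, 4): 207.28,
    (2, 3, 4): 250.51, (1, 2): 198.00, (1, 3): 204.41, (1, 4): 92.32,
    (2, 3): 210.98, (2, 4): 225.54, (3, 4): 173.96,
    (1,): 61.81, (2,): 197.09, (3,): 127.47, (4,): 0.02,
}
variables = (1, 2, 3, 4)
important = {2}      # variables retained on prior grounds (s - h)
bound = 244.45       # adequacy bound with r' = 3, as quoted in the text

# Only proper subsets that contain every important variable are examined.
candidates = [f for size in range(1, len(variables))
              for f in combinations(variables, size)
              if important <= set(f)]

adequate = [f for f in candidates if t2[f] > bound]
```

The seven candidates are exactly the sets listed above, and only one of them clears the raised bound.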

In situations of this kind Gabriel's STP is not applicable since it is tacitly assumed that the specified important subset of discriminators provides some separation of populations, implying that this will also be true of all subsets of variables considered for adequacy.

7. DISCUSSION

The procedure presented in this paper provides a useful descriptive tool in discriminant analysis. It indicates how variables perform individually and together in describing group separation, and complements the procedure proposed by Gabriel for detailed analysis of group differences. These two procedures can be carried out concurrently and the family Type I error rate (simultaneous level of significance) can be controlled.

An important point is that no one adequate discriminant set is isolated as being "best" in some sense. From a statistical viewpoint it can only be said that each adequate discriminant set provides essentially as much information about group separation as is provided by the original set of variables. The experimenter wishing to "economise" on variables in further studies of the two populations must use subjective judgement in choosing a suitable adequate discriminant set. Those containing the least number of variables need not be the only contenders for selection. Aitkin [2] has given a simultaneous procedure for choosing variable subsets in multiple regression which is analogous to the procedure presented in this paper. His remarks (for example, on the difficulty of finding a "best" subset, and on the choice of the simultaneous level $\gamma$) are relevant in the present context.

Variable reduction in two group discriminant analysis is often sought in an effort to construct an economical sample discriminant function to be used for classification purposes. The usual aim is to eliminate variables while maintaining low probabilities of misclassification. Examples of some suggested approaches to this problem have been cited by Weiner and Dunn [15], who have also compared some of the better known techniques (including “stepwise-8”‘) in a sampling study (the usefulness of which has been questioned by Aitkin [1]). Each of these procedures involves a search for a "best" sample discriminant function, but it is doubtful that such a function can be isolated on purely statistical grounds.

It is not unreasonable to expect that a reduction in the number of variables in the sample discriminant function may sometimes lead to a decrease in the probabilities of misclassification, since a reduction in the variability of the discriminant function may override a slight decrease in group separation. The point is clearly demonstrated by the results of a study performed by Dunn [6]. Ideally, a simultaneous procedure for variable reduction is required to indicate subsets of variables whose corresponding misclassification probabilities are not significantly larger than those of the original set of variables.


A suitable procedure has not yet been derived. However, such a procedure could be expected to be of a similar form to the procedure proposed in this paper in that it would set a lower bound on values of Hotelling's statistic. This lower bound could be expected to be different to some extent from the lower bound of the procedure given in section 3 for a specified simultaneous level. One of the subsets of variables whose corresponding statistic exceeded the lower bound could then be selected subjectively for the classification problem. Until such a procedure is available it is suggested that an adequate discriminant set derived from the procedure of this paper can be selected for classification purposes, and that the probabilities of misclassification will not be vastly different from those arising when all of the original variables are employed. The recent development by Aitkin [2] of a procedure for choosing variable subsets in multiple regression on the basis of the mean square error of prediction, and its similarity to his procedure for variable reduction in multiple regression which is analogous to the discriminant analysis procedure described here, would appear to support these remarks. Dunn's study also supports this contention to some extent, and may be useful in selecting a suitable adequate discriminant set to be used in the construction of the sample discriminant function.

It is fitting to conclude this discussion with a comment offered by an associate editor and relevant to much of the literature on variable selection in discriminant (and regression) analysis. The associate editor has pointed out that in practical applications the sample from which a discriminant function is computed will not be perfectly representative of the population to which the discriminant function will be applied. Assessment of the performance of a discriminant function on the sample from which it is computed may be misleading; performance may deteriorate when the discriminant function is applied to further samples from the population. This applies to discriminant functions based on subsets of variables as well as to that based on the entire set. In particular, a subset of variables which yields lowest misclassification probabilities in the original sample may not do so in further samples.

8. ACKNOWLEDGEMENTS

I wish to express my appreciation to Dr. Murray Aitkin for his expert guidance and assistance in the research on which this paper is based. I am also grateful to two referees and an associate editor for their helpful comments.

9. APPENDIX

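The following well known (Rao [12]) results are important in establishing that $\phi_s$, $\phi_{g.s}$ are UI statistics for testing the hypotheses $\omega_s$, $H_{g.s}$ respectively:

(i) If $\mathbf{S}$ is a nonsingular $p \times p$ matrix, and $\mathbf{u}$, $\mathbf{v}$ are $p$-vectors, then

$$(\mathbf{S} + \mathbf{u}\mathbf{v}')^{-1} = \mathbf{S}^{-1} - \frac{\mathbf{S}^{-1}\mathbf{u}\mathbf{v}'\mathbf{S}^{-1}}{1 + \mathbf{v}'\mathbf{S}^{-1}\mathbf{u}}.$$

(ii) If $\mathbf{S}$ is a positive definite $p \times p$ matrix, and $\mathbf{u}$, $\mathbf{v}$ are $p$-vectors, then

$$\max_{\mathbf{u}} \frac{(\mathbf{u}'\mathbf{v})^2}{\mathbf{u}'\mathbf{S}\mathbf{u}} = \mathbf{v}'\mathbf{S}^{-1}\mathbf{v}.$$

Now,

$$\max_{\mathbf{a}} \phi(\mathbf{a}) = (\nu - p + 1)\,\kappa^2 \max_{\mathbf{a}} \left\{ \frac{\mathbf{a}'\mathbf{v}_s\mathbf{v}_s'\mathbf{a}}{\mathbf{a}'[(\nu + T_s^2)\mathbf{S}_{ss}^{-1} - \kappa^2\mathbf{v}_s\mathbf{v}_s']\mathbf{a}} \right\}$$

$$= (\nu - p + 1)\,\kappa^2\,\mathbf{v}_s'[(\nu + T_s^2)\mathbf{S}_{ss}^{-1} - \kappa^2\mathbf{v}_s\mathbf{v}_s']^{-1}\mathbf{v}_s, \quad \text{by (ii) above}$$

$$= (\nu - p + 1)\,\kappa^2 \left\{ \frac{\mathbf{v}_s'\mathbf{S}_{ss}\mathbf{v}_s}{\nu + T_s^2} + \frac{\kappa^2(\mathbf{v}_s'\mathbf{S}_{ss}\mathbf{v}_s)^2}{(\nu + T_s^2)(\nu + T_s^2 - \kappa^2\mathbf{v}_s'\mathbf{S}_{ss}\mathbf{v}_s)} \right\}, \quad \text{by (i) above}$$

which simplifies immediately to

$$\frac{\nu - p + 1}{\nu}\,T_s^2 = \phi_s$$

on recalling that $T_s^2 = \kappa^2\mathbf{d}_s'\mathbf{S}_{ss}^{-1}\mathbf{d}_s = \kappa^2\mathbf{v}_s'\mathbf{S}_{ss}\mathbf{v}_s$. Similarly,

$$\max_{\mathbf{a}:\,\mathbf{a}_f = \mathbf{0}} \phi(\mathbf{a}) = (\nu - p + 1)\,\kappa^2 \max_{\mathbf{a}_g} \left\{ \frac{\mathbf{a}_g'\mathbf{v}_{g.s}\mathbf{v}_{g.s}'\mathbf{a}_g}{\mathbf{a}_g'[(\nu + T_s^2)\mathbf{S}_{gg.f}^{-1} - \kappa^2\mathbf{v}_{g.s}\mathbf{v}_{g.s}']\mathbf{a}_g} \right\} = \phi_{g.s}.$$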
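The intermediate steps of the second maximisation are not written out in the paper. As a sketch of how they go, applying results (ii) and (i) exactly as in the first case and writing $x = \kappa^2 \mathbf{v}_{g.s}'\mathbf{S}_{gg.f}\mathbf{v}_{g.s}$ as shorthand (a symbol introduced here for brevity, not used in the paper):

```latex
\begin{aligned}
\max_{\mathbf{a}:\,\mathbf{a}_f=\mathbf{0}} \phi(\mathbf{a})
  &= (\nu-p+1)\,\kappa^2\,
     \mathbf{v}_{g.s}'\bigl[(\nu+T_s^2)\mathbf{S}_{gg.f}^{-1}
       -\kappa^2\mathbf{v}_{g.s}\mathbf{v}_{g.s}'\bigr]^{-1}\mathbf{v}_{g.s}
     && \text{by (ii)} \\
  &= (\nu-p+1)\left[\frac{x}{\nu+T_s^2}
       +\frac{x^2}{(\nu+T_s^2)(\nu+T_s^2-x)}\right]
     && \text{by (i)} \\
  &= \frac{(\nu-p+1)\,x}{\nu+T_s^2-x}
   \;=\; \phi_{g.s}.
\end{aligned}
```

The restriction $\mathbf{a}_f = \mathbf{0}$ enters through $\mathbf{a}'\mathbf{v}_s = \mathbf{a}_g'\mathbf{v}_{g.s}$ and $\mathbf{a}'\mathbf{S}_{ss}^{-1}\mathbf{a} = \mathbf{a}_g'\mathbf{S}_{gg.f}^{-1}\mathbf{a}_g$, the latter because $\mathbf{S}_{gg.f}^{-1}$ is the block of $\mathbf{S}_{ss}^{-1}$ indexed by $g$.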

REFERENCES

[1] AITKIN, M. A. (1971). Statistical theory (behavioural science application). Ann. Rev. Psychol., 22, 225-250.

[2] AITKIN, M. A. (1974). Simultaneous inference and the choice of variable subsets in multiple regression. Technometrics, 16, 221-227.

[3] ANDERSON, T. W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley, New York.

[4] BATEN, W. D. and DEWITT, C. C. (1944). Use of the discriminant function in the comparison of proximate coal analyses. Indus. Eng. Chem., Anal. Ed., 16, 32-34.

[5] BOCK, R. D. and HAGGARD, E. A. (1968). The use of multivariate analysis in behavioural research. In D. K. Whitla (Ed.), Handbook of Measurement and Assessment in the Behavioural Sciences, 100-142. Addison-Wesley, Massachusetts.

[6] DUNN, O. J. (1971). Some expected values for probabilities of correct classification in discriminant analysis. Technometrics, 13, 345-353.

[7] GABRIEL, K. R. (1964). A procedure for testing the homogeneity of all sets of means in analysis of variance. Biometrics, 20, 459-477.

[8] GABRIEL, K. R. (1968). Simultaneous test procedures in multivariate analysis of variance. Biometrika, 55, 489-504.

[9] GABRIEL, K. R. (1969). Simultaneous test procedures - some theory of multiple comparisons. Ann. Math. Statist., 40, 224-250.

[10] HALL, C. E. (1971). Generalising the Wherry-Doolittle battery reduction procedure to canonical correlation and MANOVA. Journal Exp. Ed., 39, 47-51.

[11] McKAY, R. J. (1973). Simultaneous procedures in discriminant and regression analysis. Unpublished Ph.D. thesis. University of New South Wales.

[12] RAO, C. R. (1965). Linear Statistical Inference and its Applications. Wiley, New York.

[13] ROY, J. (1958). Step-down procedures in multivariate analysis. Ann. Math. Statist., 29, 1177-1187.

[14] ROY, S. N. (1953). On a heuristic method of test construction and its use in multivariate analysis. Ann. Math. Statist., 24, 220-238.

[15] WEINER, J. M. and DUNN, O. J. (1966). Elimination of variates in linear discrimination problems. Biometrics, 22, 268-275.