Inference for Multivariate Means

(1)

Inference for Multivariate Means

Statistics 407, ISU

1

Inference for the Population Mean

This section focuses on the question:

Whether a pre-determined mean is a ________ value for the normal population mean, given the sample that we have collected.

In hypothesis testing language, this means we want to test

where is the true population mean and is the hypothesized mean.

Whether a pre-determined mean is a plausible value for the normal population mean, given the sample that we have collected.

Ho : µ = µ0 vs HA : µ �= µ0

where µ = (µ₁, µ₂, . . . , µ_p) is the true population mean and µ₀ = (µ₀₁, µ₀₂, . . . , µ₀_p) is the hypothesized mean.

1 Inference for the Population Mean

Ho : µ = µ0 vs HA : µ �= µ0

where µ = (µ₁, µ₂, . . . , µp) is the true population mean and µ0 =

(µ₀₁, µ₀₂, . . . , µ₀_p) is the hypothesized mean.

1 Inference for the Population Mean

Ho : µ = µ0 vs HA : µ �= µ0

where µ = (µ₁, µ₂, . . . , µp) is the true population mean and µ0 = (µ01, µ02, . . . , µ0p) is the hypothesized mean.

(2)

Univariate Testing of a Mean

Recall, in the univariate situation, when we want to test

we calculate the t-statistic, as follows:

Then we would compare |t| with values from a t-distribution with (n-1)

degrees of freedom, and reject Ho only if |t| is larger than the critical

values.

Univariate Testing of a Mean

Recall, in the univariate situation, when we want to test

Ho : µ = µ0 vs HA : µ �= µ0

we would calculate the t-statistic, as follows:

t = X¯ −µ0 s/√n .

Then we would compare _|t_| with values from a t-distribution with (n₋1) degrees of freedom, and reject Ho only if |t| is larger than

the critical values.

2 3

Confidence Interval for a Mean

We could also compute a 100(1- )% confidence interval for the population mean, , using

We could also compute a 100(1

−

α

)% confidence interval for

the population mean,

µ

, using

¯

X

±

t

_n₋₁

(

α

/

2)

√

s

n

3

We could also compute a 100(1

−

α

)% confidence interval for

the population mean,

µ

, using

¯

X

±

t

n

−

1

(

α

/

2)

√

s

n

3 4

(3)

Multivariate Testing of a Mean

The multivariate analog of the t-statistic is:

It is called Hotellings .

follows an distribution. (Remember? In univariate case when then .)

Multivariate Testing of a Mean The multivariate analog of the t-statistic is:

T2 = n( ¯X₋µ₀)�S−_n₋1₁( ¯X₋µ₀) It is called Hotellings T2.

T2 follows an (n_n−₋1)_pp_Fp,n₋p,α distribution. (Remember that, in the univariate case, when t _∼t_n₋₁, then t2 _∼ _F₁_,n₋₁. Note in the multivariate analog, that the degrees of freedom incorporate the dimensionality of the data.)

4

T2 = n( ¯X ₋

µ

₀)�S−_n₋1₁( ¯X₋

µ

₀)

It is called Hotellings T2.

T2 follows an (n_n−₋1)_p p_F_p,n₋_p,_α distribution. (Remember that, in the univariate case, when t _∼ t_n₋₁, then t2 _∼ _F₁_,n₋₁. Note in the multivariate analog, that the degrees of freedom incorporate the dimensionality of the data.)

4 Multivariate Testing of a Mean

T2 = n( ¯X₋

µ

₀)�S−_n₋1₁( ¯X ₋

µ

₀)

T2 follows an (n_n−₋1)_pp_F_p,n₋_p,_α distribution. (Remember that, in the univariate case, when t _∼ t_n₋₁, then t2 _∼ _F₁_,n₋₁. Note in the multivariate analog, that the degrees of freedom incorporate the dimensionality of the data.)

4

Multivariate Testing of a Mean

The multivariate analog of the

t

-statistic is:

T

2

=

n

( ¯

X

−

µ

₀

)

�

S

−_n₋1₁

( ¯

X

−

µ

₀

)

It is called

Hotellings

T

2

.

T

2

follows an

(n_n−₋1)_pp

F

_p,n₋_p,_α

distribution. (Remember that, in

the univariate case, when

t

∼

t

_n₋₁

, then

t

2

∼

F

₁_,n₋₁

. Note in the

multivariate analog, that the degrees of freedom incorporate the

dimensionality of the data.)

4

T2 = n( ¯X₋

µ

₀)�S−_n₋1₁( ¯X₋

µ

₀)

T2 follows an (n_n−₋1)_pp_Fp,n₋p,α distribution. (Remember that, in

the univariate case, when t _∼ t_n₋₁, then t2 _∼ _F₁_,n₋₁. Note in the

multivariate analog, that the degrees of freedom incorporate the dimensionality of the data.)

4

T2 = n( ¯X₋µ₀)�S−_n₋1₁( ¯X ₋µ₀)

It is called Hotellings T2. T2 follows an (n_n−1)p

−p Fp,n−p,α distribution. (Remember that, in

the univariate case, when t _∼ t_n₋₁, then t2 _∼ _F₁_,n₋₁. Note in the multivariate analog, that the degrees of freedom incorporate the dimensionality of the data.)

4

5

Multivariate Testing of a Mean

Null hypothesis: Alternative hypothesis:

BRIEF ARTICLE

THE AUTHOR

H

0

:

µ

=

µ

₀

H

A

:

µ

�

=

µ

0

Why aren’t there one-sided alternatives?

(4)

Example: Sweat data

Perspiration of 20 healthy females for 3 components: Sweat Rate, Sodium Content and Potassium Content.

Example: Sweat data

SweatRate 20 30 40 50 60 70 ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 3 4 5 6 7 8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 30 40 50 60 70 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Na ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 3 4 5 6 7 8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 7 8 9 10 12 14 7 8 9 10 12 14 K ¯ X =    4.61 45.6 9.97    S_n₋₁ =    2.75 9.37 ₋1.72 9.37 190.8 ₋5.35 −1.72 ₋5.35 3.45    S−_n₋1₁ =    0.606 ₋0.0222 0.268 −0.0222 0.00630 ₋0.00132 0.268 ₋0.00132 0.422    5

Example: Sweat data

SweatRate 2030 40 50 6070 ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 3 4 5 6 7 8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 20 30 40 50 60 70 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Na ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 2 3 4 5 6 7 8 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 7 8 9 10 12 14 7 8 9 10 12 14 K ¯ X=    4.61 45.6 9.97    S_n₋₁ =    2.75 9.37 ₋1.72 9.37 190.8 ₋5.35 −1.72 ₋5.35 3.45    S−_n₋1₁ =    0.606 ₋0.0222 0.268 −0.0222 0.00630 ₋0.00132 0.268 ₋0.00132 0.422    5 7 We want to test We want to test if Ho : µ = (4 50 10)� vs HA : µ �= (4 50 10)� T2 = 20    4.61₋4 45.6₋50 9.97₋10    �   0.606 ₋0.0222 0.268 −0.0222 0.00630 ₋0.00132 0.268 ₋0.00132 0.422       4.61₋4 45.6₋50 9.97₋10    = 9.74 6 We want to test if Ho:µ = (4 50 10)� vs HA :µ �= (4 50 10)� T2 = 20    4.61₋4 45.6₋50 9.97₋10    �   0.606 ₋0.0222 0.268 −0.0222 0.00630 ₋0.00132 0.268 ₋0.00132 0.422       4.61₋4 45.6₋50 9.97₋10    = 9.74 6 Calculate: 8

(5)

Using then we would _______ Ho at the 10% level of significance.

Using

then we would ______________ Ho at the 5% level of significance.

Taking α = 0.10, (n_n−1)p

−p Fp,n−p(0.10) = 1917×3×2.44 = 8.18. Then we would reject Ho at the 10% level of significance.

Taking α = 0.05, (n_n−1)p

−p Fp,n−p(0.05) = 1917×3 ×3.2 = 10.7. Then we would NOT reject Ho at the 5% level of significance.

7 Taking α = 0.10, (n_n−1)p

−p Fp,n−p(0.10) = 1917×3×2.44 = 8.18. Then

we would reject Ho at the 10% level of significance.

Taking α = 0.05, (n_n−₋1)_pp_F_p,n₋_p(0.05) = 19₁₇×3 _×3.2 = 10.7. Then we would NOT reject Ho at the 5% level of significance.

7

9

Confidence Regions and Simultaneous

Comparisons of Component Means

A confidence region for the mean of a p-dimensional normal distribution is the ellipsoid determined by all such that

where are sample observations from a population.

This equation defines an ___________ in p-space.

Confidence Regions and Simultaneous Comparisons of Component Means

A 100(1₋α)% confidence region for the mean of a p-dimensional

normal distribution is the ellipsoid determined by all µ such that

n( ¯X₋µ)�S−_n₋1₁( ¯X₋µ) _≤ p(n−1)

n₋p Fp,n−p(α)

where X₁, . . . ,Xn are sample observations from a Np(µ,Σ)

pop-ulation.

This equation defines an ellipse in p-space.

8

Confidence Regions and Simultaneous Comparisons of

Component Means

A 100(1

−

α

)% confidence region for the mean of a

p

-dimensional

normal distribution is the ellipsoid determined by all

µ

such that

n

( ¯

X

−

µ

)

�

S

−_n₋1₁

( ¯

X

−

µ

)

≤

p

(

n

−

1)

n

−

p

F

p,n−p

(

α

)

where

X

₁

, . . . ,

X

_n

are sample observations from a

N

_p

(

µ

,

Σ

)

pop-ulation.

This equation defines an ellipse in

p

-space.

A 100(1₋α)% confidence region for the mean of a p-dimensional normal distribution is the ellipsoid determined by all µ such that

n( ¯X₋µ)�S−_n₋1₁( ¯X₋µ)_≤ p(n−1)

n₋p Fp,n−p(α)

where X₁, . . . ,Xn are sample observations from a Np(µ,Σ) pop-ulation.

8

A 100(1₋α)% confidence region for the mean of a p-dimensional

normal distribution is the ellipsoid determined by all µ such that

n( ¯X₋µ)�S−_n₋1₁( ¯X₋µ) _≤ p(n−1)

n−p Fp,n−p(α)

where X₁, . . . ,Xn are sample observations from a Np(µ,Σ)

pop-ulation.

(6)

Lengths of the major axes.

The shape of the confidence ellipse is dictated by the

variance-covariance matrix, . The ____________ is given by the eigenvectors, and the ______ of the major axes of the confidence ellipse are given by

where is the i'th eigenvalue.

Lengths of the major axes.

The shape of the confidence ellipse is dictated by the variance-covariance matrix, S. The orientation is given by the eigenvec-tors of S, and the lengths of the major axes of the confidence ellipse are given by

�

λ_i _×

�

p(n ₋1)

n(n ₋p)Fp,n−p(α) where λ_i is the i’th eigenvalue of S.

9

Lengths of the major axes.

The shape of the confidence ellipse is dictated by the

variance-covariance matrix,

S

. The orientation is given by the

eigenvec-tors of

S

, and the lengths of the major axes of the confidence

ellipse are given by

�

λ

_i

×

�

p

(

n

−

1)

n

(

n

−

p

)

F

p,n−p

(

α

)

where

λ

_i

is the

i

’th eigenvalue of

S

.

9 11

Example: Sweat data

90% confidence region x data points

+ sample mean,

hypothesized mean, confidence region (3D)

Notice that is ________ the confidence region, which

corresponds to the “_____” result for the hypothesis test.

Inference for the Population Mean

Whether a pre-determined mean is a plausible value for

the normal population mean, given the sample that we

have collected.

H

o

:

µ

=

µ

0

vs

H

A

:

µ

�

=

µ

0

where

µ

= (

µ

₁

, µ

₂

, . . . , µ

p

) is the true population mean and

µ

0

=

(

µ

₀₁

, µ

₀₂

, . . . , µ

₀_p

) is the hypothesized mean.

1

have collected.

H

_o

:

µ

=

µ

₀

vs

H

_A

:

µ

�

=

µ

₀

where

µ

= (

µ

₁

, µ

₂

, . . . , µ

p

µ

0

=

(

µ

₀₁

, µ

₀₂

, . . . , µ

₀_p

) is the hypothesized mean.

T2 = n( ¯X₋ µ₀)�S−_n₋1₁( ¯X ₋µ₀) It is called Hotellings T2.

4

(7)

Relationship between Interval and

Hypothesis Test

Hypothesis testing is equivalent to examining the hypothesized value in relation to the simultaneous confidence ellipse. If the hypothesized

value lies ________ the ellipse, then the null hypothesis is _________.

13

Simultaneous Confidence Intervals

Scheffe’s:

Bonferroni Method vs T2 intervals

Another method for computing simultaneous confidence intervals is ¯ X_i± � (n−1)p n₋p Fp,n−p,α s_i √ n

which is called Scheﬀe’s method. These intervals are the confi-dence ellipse projected into the coordinate axes.

The t-interval is the smallest interval, followed by Bonferroni, followed by Scheﬀe’s.

(8)

Simultaneous Confidence Intervals

Bonferroni:

Bonferroni Method of Multiple Comparisons

The (1

−

α)100%

t

-interval

¯

X

i

±

t

n

−

1

,

α 2

s

i

√

n

is calculated to contain the true mean with probability (1

−

α).

When computing multiple (

p

) confidence intervals the joint

prob-ability that they contain the true means needs to be at least

(1

−

α

). To achieve this each interval will need to have a higher

probability that it contains the true mean. One method for

en-suring this is the Bonferroni correction:

¯

X

i

±

t

n

−

1

,

α 2p

s

i

√

n

11 15

Intervals vs Region

90% confidence region x data points + sample mean, hypothesized mean, confidence region (3D) Bonferroni confidence interval, region

Ellipse extends outside of box, in some directions. Hypothesized mean is inside box always.

This section focuses on the question:

have collected.

H

o

:

µ

=

µ

0

vs

H

A

:

µ

�

=

µ

0

where

µ

= (

µ

₁

, µ

₂

, . . . , µ

p

µ

0

=

(

µ

₀₁

, µ

₀₂

, . . . , µ

₀_p

) is the hypothesized mean.

T2 = n( ¯X₋ µ₀)�S−_n₋1₁( ¯X ₋µ₀) It is called Hotellings T2.

4

(9)

Paired Samples

! _____________ the second value from the first and treat as a one

sample problem.

! Example: “Municipal wastewater treatment plants are required by law to monitor

their discharges into rivers and streams on a regular basis. Concern about the reliability of data from one of these self-monitoring programs led to a study in which samples of effluent were divided and sent to two laboratories were divided and sent to two laboratories for testing. One half of each sample was sent to the Wisconsin state lab and the other half sent to a private commercial lab routinely used in the monitoring program. Measurements were made on biochemical oxygen demand (BOD) and suspended solids (SS) were obtained, on n=11 samples.”

! Are the analyses producing equivalent results?

17

Data

Sample BOD

SS

BOD SS

1

6

27

25

15

-19

12

2

6

23

28

13

-22

10

3

18

64

36

22

-18

42

4

8

44

35

29

-27

15

5

11

30

15

31

-4

-1

6

34

75

44

64

-10

11

7

28

26

42

30

-14

-4

8

71

124

54

64

17

60

9

43

54

34

56

9

-2

10

33

30

29

20

4

10

11

20

14

39

21

-19

-7

18

(10)

Plots

! Plot _____ vs _________

! Are the values for BOD

the same from labs 1 & 2?

! Are the values for SS the

same from labs 1 & 2?

! Plot _________ in

multivariate plots.

! Is there a systematic shift

from one rep to another.

! Plot ____________, along

with sample mean and hypothesized mean.

! How similar are the

sample mean and hyp mean? ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1.0 1.2 1.4 1.6 1.8 2.0 10 20 30 40 50 60 70 Lab BOD ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● 1.0 1.2 1.4 1.6 1.8 2.0 20 40 60 80 100 120 Lab SS 10 20 30 40 50 60 70 20 40 60 80 100 120 BOD SS 1 1 1 1 1 1 1 1 1 1 122 2 2 2 2 2 2 2 2 2 ● ● ● ● ● ● ● ● ● ● ● −20 −10 0 10 0 10 20 30 40 50 60 Differences BOD SS x +

¯

D

=

�

−

9

.

36

13

.

27

�

S

d

=

�

199

.

3

88

.

3

88

.

3

418

.

6

�

T

2

= 11[

−

9

.

36 13

.

27]

�

0

.

0055

−

0

.

0012

−

0

.

0012

0

.

0026

� �

−

9

.

36

13

.

27

�

= 13

.

6

2

×

10

9

F

2

,

9

(0

.

05) = 9

.

47

⇒

Reject

H

o

.

21 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1.0 1.2 1.4 1.6 1.8 2.0 10 20 30 40 50 60 70 Lab BOD ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● 1.0 1.2 1.4 1.6 1.8 2.0 20 40 60 80 100 120 Lab SS 10 20 30 40 50 60 70 20 40 60 80 100 120 BOD SS 1 1 1 1 1 1 1 1 1 1 122 2 2 2 2 2 2 2 2 2 ● ● ● ● ● ● ● ● ● ● ● −20 −10 0 10 0 10 20 30 40 50 60 Differences BOD SS x +

¯

D

=

�

−

9

.

36

13

.

27

�

S

d

=

�

199

.

3

88

.

3

88

.

3

418

.

6

�

T

2

= 11[

−

9

.

36 13

.

27]

�

0

.

0055

−

0

.

0012

−

0

.

0012

0

.

0026

� �

−

9

.

36

13

.

27

�

= 13

.

6

2

×

10

9

F

2

,

9

(0

.

05) = 9

.

47

⇒

Reject

H

o

.

21 19

Two Sample Hotellings

For two samples collected independently from multivariate normal

populations, , the test statistic for vs is

distributed as and

Two Sample Hotellings

T

2

For two samples collected independently from multivariate

nor-mal populations,

N

(

µ

1

,

Σ

)

, N

(

µ

2

,

Σ

), the test statistic for

H

₀

:

µ

1

=

µ

2

vs

H

A

:

µ

1

�

=

µ

2

is:

T

2

= ( ¯

X

1

−

X

¯

2

)

�

((

1

n

1

+

1

n

2

)

S

pooled

)

−

1

( ¯

X

1

−

X

¯

2

)

is distributed as

(

n

1

+

n

2

−

2)

p

n

1

+

n

2

−

1

−

p

F

p,n

1

+

n

2

−

p

−

1

,

α

and

S

pooled

=

n

1

−

1

n

1

+

n

2

−

2

S

1

+

n

2

−

1

n

1

+

n

2

−

2

S

2

.

22

Two Sample Hotellings T2

For two samples collected independently from multivariate nor-mal populations, N(µ₁,Σ), N(µ₂,Σ), the test statistic for H₀ : µ₁ = µ₂ vs H_A : µ₁ _�= µ₂ is: T2 = ( ¯X₁₋X¯₂)�(( 1 n₁ + 1 n₂)Spooled) −1_{( ¯}_X 1−X¯2) is distributed as (n1+n2−2)p n₁+n₂₋1₋pFp,n1+n2−p−1,α and S_pooled = n1−1 n1+n2−2S1 + n₂₋1 n1+n2−2S2. 22

For two samples collected independently from multivariate nor-mal populations, N(µ₁,Σ), N(µ₂,Σ), the test statistic for H0 : µ₁ = µ2 vs HA :µ1 �= µ2 is: T2 = ( ¯X1−X¯2)�(( 1 n1 + 1 n2 )S_pooled)−1( ¯X1 −X¯2) is distributed as (n1+n2−2)p n1+n2−1−pFp,n1+n2−p−1,α and S_pooled = n1−1 n₁+n₂₋2S1+ n₁n+2n−₂1₋2S2. 22

Two Sample Hotellings

T

2

For two samples collected independently from multivariate

nor-mal populations,

N

(

µ

₁

,

Σ)

, N

(

µ

₂

,

Σ), the test statistic for

H

₀

:

µ

₁

=

µ

₂

vs

H

_A

:

µ

₁

�

=

µ

₂

is:

T

2

= ( ¯

X

₁

−

X

¯

₂

)

�

((

1

n

₁

+

1

n

₂

)

S

pooled

)

−1

( ¯

X

1

−

X

¯

2

)

is distributed as

(n1+n2−2)p n₁+n₂₋1₋p

Fp,n

1+n2−p−1,α

and

S

_pooled

=

n1−1 n₁+n₂₋2

S

1

+

n₁n+n2−₂1₋2

S

2

.

22

For two samples collected independently from multivariate nor-mal populations, N(µ₁,Σ), N(µ₂,Σ), the test statistic for H0 :

µ₁ = µ2 vs HA : µ1 �= µ2 is: T2 = ( ¯X₁ ₋X¯₂)�(( 1 n1 + 1 n2 )S_pooled)−1( ¯X₁ ₋X¯₂) is distributed as (n1+n2−2)p n₁+n₂₋1₋pFp,n1+n2−p−1,α and S_pooled = n1−1 n₁+n₂₋2S1 + n₁n+n2−₂1₋2S2. 22 20

(11)

Test for Homogeneity of

Variance-covariances

To test the hypothesis against the alternative that they are not all equal, SAS PROC DISCRIM uses a LRT test

(extends Bartlett’s univariate test.) No exact distribution, but for large data can be used to obtain p-values. Also Box’s M test,

, there’s R code for this. BUT, both tend to be sensitive, to ______________________________!! BRIEF ARTICLE THE AUTHOR H0 : µ1 = µ2 HA : µ1 �= µ2 H0 : Σ1 = Σ2 = ... = Σg 1

Testing for Equality of Variance-Covariance

Bartlett’s test for homogeneity of variances, for univariate data is: which generalizes to Λ = �g i=1|Si|(ni−1)/2 |S_pooled_|(n₋p)/2

has no exact distribution.

27

BRIEF ARTICLE

THE AUTHOR

H

₀

:

µ

₁

=

µ

₂

H

A

:

µ

1

�

=

µ

2

H

₀

:

Σ

₁

=

Σ

₂

=

...

=

Σ

_g

χ

2 2 Appendix 14

Statistics

Means y_i y_ij _i j n n i = =

!

1

Cell Covariance Matrix

S y y y y 0 i ij ij i j n ij i i i i w n n n i = " " # " > $

!

"#

$#

=

!

1

%

&%

& '

1

(

1 1 if if

Pooled Covariance Matrix

S S 0 = " " > $

!

"#

$#

=

!

n n g n g n g i i i g 1 1

'

( ) *

if if Box’s M Statistic M n g ni i i g = " " " > $

!

"#

$#

!

=

) *

logS

'

(

logS S S 1 0 0 1 if SYSMIS if 21

Comparing several Multivariate

Means - MANOVA

! Data: __samples of size ; __ variables.

! Population: g populations; different means , same

covariance , multivariate normal.

! Arises from (1) designed experimental studies, where there are

multiple responses measured for one or more treatments, (2) in

classification problems, where we want to classify a new observation into a class, based on two or more measured variables.

Comparing several Multivariate Means - MANOVA

Data

:

g

samples of size

n

₁

, . . . n

g

;

p

variables.

Population

:

g

populations; di

ﬀ

erent means (µ

₁

, . . . ,

µ

_g

), same

covariance (

Σ

), multivariate normal.

Arises from (1) designed experimental studies, where there are

multiple responses measured for one or more treatments, (2) in

classification problems, where we want to classify a new

obser-vation into a class, based on two or more measured variables.

Comparing several Multivariate Means - MANOVA Data: g samples of size n₁, . . . ng; p variables.

Population: g populations; diﬀerent means (µ₁, . . . ,µ_g), same covariance (Σ), multivariate normal.

Arises from (1) designed experimental studies, where there are multiple responses measured for one or more treatments, (2) in classification problems, where we want to classify a new obser-vation into a class, based on two or more measured variables.

28

Comparing several Multivariate Means - MANOVA Data: g samples of size n₁, . . . ng; p variables.

Population: g populations; diﬀerent means (µ₁, . . . ,µ_g), same covariance (Σ), multivariate normal.

Arises from (1) designed experimental studies, where there are multiple responses measured for one or more treatments, (2) in classification problems, where we want to classify a new obser-vation into a class, based on two or more measured variables.

(12)

Recall Univariate ANOVA

We have random samples from

l=1, ..., g. We are interested to know if the population means of the groups are different, that is, if .

The problem is parametrized into the following model formulation:

with the constraint , which leads to the null hypothesis notation:

Recall Univariate ANOVA

We have random samples X_l₁, X_l₂, . . . , X_ln_l from N(µ_l,σ2), l =

1, . . . , g. We are interested to know if the population means of

the groups are diﬀerent, that is, if µ1 = µ2 = . . . = µg.

The problem is parametrized into the following model formula-tion:

X_lj = µ + τ_l + e_lj

with the constraint �g_l₌₁n_lτ_l = 0, which leads to the null

hy-pothesis notation:

Ho : τ1 = τ2 = . . . = τg = 0

29

Recall Univariate ANOVA

We have random samples

X

_l₁

, X

_l₂

, . . . , X

_ln_l

from

N

(

µ

_l

,

σ

2

),

l

=

1

, . . . , g

. We are interested to know if the population means of

the groups are di

ﬀ

erent, that is, if

µ

₁

=

µ

₂

=

. . .

=

µ

g

.

The problem is parametrized into the following model

formula-tion:

X

_lj

=

µ

+

τ

_l

+

e

_lj

with the constraint

�g_l₌₁

n

_l

τ

_l

= 0, which leads to the null

H

o

:

τ

1

=

τ

2

=

. . .

=

τ

g

= 0

29

Recall Univariate ANOVA

We have random samples X_l₁, X_l₂, . . . , X_ln_l from N(µ_l,σ2), l =

1, . . . , g. We are interested to know if the population means of

the groups are diﬀerent, that is, if µ₁ = µ₂ = . . . =µ_g.

The problem is parametrized into the following model formula-tion:

with the constraint �g_l₌₁n_lτ_l = 0, which leads to the null

H_o : τ₁ = τ₂ = . . . = τ_g = 0

29

23

The model is fit to the data by splitting each observation into components:

X_lj = ¯X + ( ¯X_l₋X¯) + (X_lj₋X¯_l), j = 1, . . . , n_l; l = 1, . . . , g

and computing the sums of squares for each component:

g � l=1 n_l � j=1 (X_lj−X¯)2= g � l=1 n_l( ¯X_l−X¯)2+ g � l=1 n_l � j=1 (X_lj−X¯_l)2 30

The model is fit to the data by splitting each observation into components:

X_lj = ¯X + ( ¯X_l₋X¯) + (X_lj₋X¯_l), j = 1, . . . , n_l; l= 1, . . . , g

g � l=1 n_l � j=1 (X_lj−X¯)2 = g � l=1 n_l( ¯X_l−X¯)2+ g � l=1 n_l � j=1 (X_lj−X¯_l)2 30 24

(13)

ANOVA Table

Then we test the hypothesis using

ANOVA Table

Source SS df Treatment (Between) SS_{T r} g ₋1

Error (Within) SS_Res �g_l₌₁n_l ₋g

Total SS_{CorT ot} �g_l₌₁n_l ₋1

Then we test the hypothesis Ho : τ1 = τ2 = . . . = τg = 0 vs HA : not all equal 0, using:

F = SST r/(g −1) SS_Res/(�g_l₌₁n_l ₋g) ∼ Fg−1,�nl−g(α) 31 ANOVA Table Source SS df Treatment (Between) SS_{T r} g−1

Error (Within) SS_Res �g_l₌₁n_l₋g

Total SS_{CorT ot} �g_l₌₁n_l−1

Then we test the hypothesis Ho: τ1 =τ2 =. . .= τg = 0 vs HA : not all equal 0, using:

F = SST r/(g−1)

SS_Res/(�g_l₌₁n_l−g)∼Fg−1,�nl−g(α)

31

25

Multivariate ANOVA

The corresponding model formulation is

And the corresponding data decomposition

Multivariate ANOVA

The corresponding model formulation is

where X_lj,e_lj are n_l_× p matrices, µ,τ_l are p-D vectors, giving the corresponding data representation:

X_lj = ¯X + ( ¯X_l₋X¯) + (X_lj₋X¯_l), j = 1, . . . , n_l; l = 1, . . . , g

(14)

The corresponding sums of squares is given by: BRIEF ARTICLE THE AUTHOR H0:µ1=µ2 HA:µ1�=µ2 H0:Σ1=Σ2=...=Σg χ2 g � l=1 nl � j=1 (Xlj−X¯)(Xlj−X¯)�₌ g � l=1 nl( ¯Xl−X¯)( ¯Xl−X¯)�+ g � l=1 nl � j=1 (Xlj−Xl¯ )(Xlj−Xl¯ )� 1

The corresponding sums of squares is given by:

g � l=1 n_l � j=1 (X_lj ₋X¯)(X_lj ₋X¯)� = g � l=1 n_l( ¯X_l ₋X¯)( ¯X_l ₋X¯)�+ g � l=1 n_l � j=1 (X_lj ₋X¯_l)(X_lj ₋X¯_l)� = B+W MANOVA Table Source SS df Treatment (Between) B g ₋1 Error (Within) W �g_l₌₁n_l ₋g Total B+W �g_l₌₁n_l ₋1 33 MANOVA Table 27

The big complication here is that we are comparing matrices now

instead of single numbers. The matrices correspond to variance ellipses in p-D space. So there are numerous test statistics. The most popular is Wilks' lambda:

If is too small then reject . If is large,

is distributed as . For smaller

n, some combinations of p,g the test statistic has an F distribution.

Testing the hypothesis

Testing the hypothesis,

H

_o

:

τ

₁

=

. . .

=

τ

_l

=

0

instead of single numbers. The matrices correspond to variance

ellipses in

p

-D space. So there are numerous test statistics. The

is too small then reject

H

_o

. If

�

n

_l

=

n

is large,

−

(

n

−

1

−

(

p

+

g

)

/

2) ln

�

|W_|

|B+W_|

�

is distributed as

χ

2_p(g₋₁₎

(

α

). For smaller

n

,

some combinations of

p, g

the test statistic has an

F

distribution.

34

H

o

:

τ

1

=

. . .

=

τ

l

=

0

instead of single numbers. The matrices correspond to variance

ellipses in

p

-D space. So there are numerous test statistics. The

Λ

∗

=

|

W

|

B

+

W

|

If

Λ

∗

is too small then reject

H

o

. If

�

n

_l

=

n

is large,

−

(

n

−

1

−

(

p

+

g

)

/

2) ln

�

|W_|

|B+W_|

�

is distributed as

χ

2_p₍_g₋₁₎

(

α

). For smaller

n

,

p, g

F

distribution.

34

H

_o

:

τ

1

=

. . .

=

τ

l

=

0

instead of single numbers. The matrices correspond to variance

ellipses in

p

-D space. So there are numerous test statistics. The

Λ

∗

=

|

W

|

B

+

W

|

If

Λ

∗

is too small then reject

H

o

. If

�

n

_l

=

n

is large,

−

(

n

−

1

−

(

p

+

g

)

/

2) ln

�

|W_| |B+W|

�

is distributed as

χ

2_p₍_g₋₁₎

(

α

). For smaller

n

,

p, g

F

distribution.

34

Testing the hypothesis, Ho : τ1 = . . . = τl = 0

The big complication here is that we are comparing matrices now instead of single numbers. The matrices correspond to variance ellipses in p-D space. So there are numerous test statistics. The most popular is Wilks’ lambda:

Λ∗ = |W|

|B +W_|

If Λ∗ is too small then reject Ho. If �nl = n is large, −(n − 1−

(p+g)/2) ln � |W_| |B+W_| � is distributed as χ2_p₍_g −1)(α). For smaller n,

some combinations of p, g the test statistic has an F distribution.

34 BRIEF ARTICLE THE AUTHOR H0:µ1 =µ2 HA :µ1 �=µ2 H0:Σ1 =Σ2 =...=Σg χ2 g � l=1 nl � j=1 (Xlj−X¯)(Xlj −X¯)� = g � l=1 nl( ¯Xl−X¯)( ¯Xl−X¯)�+ g � l=1 nl � j=1 (Xlj−X¯l)(Xlj −X¯l)� −(n−1−(p+g)/2) ln�_|_B|W₊_W| _|� 1

Testing the hypothesis, Ho : τ1 = . . . = τl = 0

The big complication here is that we are comparing matrices now instead of single numbers. The matrices correspond to variance ellipses in p-D space. So there are numerous test statistics. The most popular is Wilks’ lambda:

Λ∗ = |W|

|B+W_|

If Λ∗ is too small then reject Ho. If �n_l = n is large, −(n−1 −

(p+g)/2) ln �

|W_|

|B+W|

�

is distributed as χ2_p₍_g₋₁₎(α). For smaller n, some combinations of p, g the test statistic has an F distribution.

34

(15)

Note that we can also express as

Note that we can also express

Λ

∗

as

Λ

∗

=

min(_�p,g₋1) i=1

1

1 +

λ

_i

where

λ

_i

;

i

= 1

, . . . ,

min(

p, g

−

1) are the eigenvalues of

W

−1

B

.

Also note, being based on determinants of the matrices that

Wilks lambda is essentially comparing volumes of the

correspond-ing ellipses. Another way of calculatcorrespond-ing volumes of ellipses is the

products of the lengths of the leading axes (eigenvalues).

35 Note that we can also express Λ∗ as

Λ∗ =

min(_�p,g₋1)

i=1

1 1 + λ_i

where λ_i; i = 1, . . . ,min(p, g ₋1) are the eigenvalues of W−1B. Also note, being based on determinants of the matrices that Wilks lambda is essentially comparing volumes of the correspond-ing ellipses. Another way of calculatcorrespond-ing volumes of ellipses is the products of the lengths of the leading axes (eigenvalues).

35

where are the _______________ of .

Note that we can also express Λ∗ as

Λ∗ =

min(_�p,g−1)

i=1

1

1 + λ_i

where λ_i; i = 1, . . . ,min(p, g ₋ 1) are the eigenvalues of W−1B.

Also note, being based on determinants of the matrices that Wilks lambda is essentially comparing volumes of the correspond-ing ellipses. Another way of calculatcorrespond-ing volumes of ellipses is the products of the lengths of the leading axes (eigenvalues).

35

Note that we can also express Λ∗ as

Λ∗ = min(p,g₋1) � i=1 1 1 +λ_i

where λ_i; i = 1, . . . ,min(p, g ₋ 1) are the eigenvalues of W−1B.

Also note, being based on determinants of the matrices that Wilks lambda is essentially comparing volumes of the correspond-ing ellipses. Another way of calculatcorrespond-ing volumes of ellipses is the products of the lengths of the leading axes (eigenvalues).

35

Also note, being based on determinants of the matrices that Wilks lambda is essentially ___________________ of the corresponding ellipses. Another way of calculating volumes of ellipses is the products of the lengths of the leading axes (eigenvalues).

29

Other Test Statistics

! Roy's largest eigenvalue of

! Lawley and Hotellings: ! Pillai's trace:

Other Test Statistics

• Roy’s largest eigenvalue of BW−1

• Lawley and Hotellings: T = trace(BW−1) • Pillai’s trace: V = trace(B(B+W)−1)

36

•

Roy’s largest eigenvalue of

BW

−1

•

Lawley and Hotellings:

T

=

trace

(

BW

−1

)

•

Pillai’s trace:

V

=

trace

(

B

(

B

+

W

)

−1

)

36

• Roy’s largest eigenvalue of BW−1

• Lawley and Hotellings: T = trace(BW−1)

• Pillai’s trace: V =trace(B(B+W)−1)

(16)

Example: Dominican Republic

Student Testing

82 Students, enrolled in one of Technology (23), Architecture (38), Medical Technology (21) programs.

The students were given 4 tests: Aptitude, Math, Language, General Knowledge.

Is there is a difference in the average test scores for each student type?

31

Summary Statistics

BRIEF ARTICLE THE AUTHOR H0:µ1=µ2 HA:µ1�=µ2 H0:Σ1=Σ2=...=Σg χ2 g � l=1 nl � j=1 (Xlj−X¯)(Xlj−X¯)� = g � l=1 nl( ¯Xl−X¯)( ¯Xl−X¯)�+ g � l=1 nl � j=1 (Xlj−X¯l)(Xlj−X¯l)� −(n−1−(p+g)/2) ln�_|_B|W₊_W| _|� ¯ X=     49.1 46.8 73.1 22.3     X¯1=     39.0 47.4 70.6 19.2     X¯2=     67.2 51.2 77.3 21.5     X¯3=     27.4 38.1 68.1 27.2     1 n=82, n1=23, n2=38, n3=21, p=4, g=3 2 THE AUTHOR S1 =     1067.7 224.1 97.5 74.0 224.1 192.9 34.6 37.9 97.5 34.6 68.9 3.2 74.0 37.9 3.2 50.5     S2 =     488.9 25.8 37.8 67.5 25.8 235.7 46.0 ₋28.3 37.8 46.0 57.2 0.57 67.5 −28.3 0.57 68.2     S3 =     672.9 112.9 101.2 68.3 112.9 81.2 7.8 4.4 101.2 7.8 56.3 ₋9.7 68.3 4.4 −9.7 170.4     2 THE AUTHOR S1 =     1067.7 224.1 97.5 74.0 224.1 192.9 34.6 37.9 97.5 34.6 68.9 3.2 74.0 37.9 3.2 50.5     S2 =     488.9 25.8 37.8 67.5 25.8 235.7 46.0 −28.3 37.8 46.0 57.2 0.57 67.5 ₋28.3 0.57 68.2     S3 =     672.9 112.9 101.2 68.3 112.9 81.2 7.8 4.4 101.2 7.8 56.3 ₋9.7 68.3 4.4 −9.7 170.4     2 THE AUTHOR S1 =     1067.7 224.1 97.5 74.0 224.1 192.9 34.6 37.9 97.5 34.6 68.9 3.2 74.0 37.9 3.2 50.5     S2 =     488.9 25.8 37.8 67.5 25.8 235.7 46.0 ₋28.3 37.8 46.0 57.2 0.57 67.5 ₋28.3 0.57 68.2     S3 =     672.9 112.9 101.2 68.3 112.9 81.2 7.8 4.4 101.2 7.8 56.3 ₋9.7 68.3 4.4 ₋9.7 170.4   

 Can we consider that the population variance-covariances

might be equal?

(17)

Descriptive

graphics

value co un t 0 5 10 15 0 5 10 15 0 5 10 15 Apt 0 20 40 60 80 100 Math 0 20 40 60 80100 Lang 0 20 40 60 80 100 GenKnow 0 20 40 60 80 100 T ech Arch Me dT ech StudType Tech Arch MedTech

Can we consider that the population

variances might be equal?

Are the population means different? 33

Descriptive

graphics

Ap t Ma th Lang G en Kn ow

Apt Math Lang GenKnow

20 40 60 80 Corr: 0.408 Corr: 0.512 Corr: 0.139 0 20 40 60 80 Corr: 0.413 Corr: -0.104 0 20 40 60 80 Corr: -0.0811 0 20 40 60 80 10 20 30 40 50 10 20 30 40 50 10 20 30 40 50 10 20 30 40 50 Can we consider that

the population

variance-covariances might be equal?

Are the means different?

(18)

Descriptive

graphics

Can we consider that the population

variance-covariances might be equal?

Are the means different? 35

Test

OR B =      24600.2 6832.6 5709.5 ₋2040.4 6832.6 2329.6 1570.2 ₋1064.7 5709.5 1570.2 1325.7 ₋455.6 −2040.4 ₋1064.7 ₋455.6 743.5      W =      55036.2 8140.0 5570.0 5490.1 8140.0 14589.0 2619.2 ₋128.0 5570.0 2619.2 4759.9 ₋102.4 5490.1 ₋128.0 ₋102.4 7040.6      42 36

(19)

Test

2 THE AUTHOR

S

1

=







1067

.

7 224

.

1 97

.

5 74

.

0

224

.

1 192

.

9 34

.

6 37

.

9

97

.

5

34

.

6 68

.

9 3

.

2

74

.

0

37

.

9

3

.

2 50

.

5







S

2

=







488

.

9

25

.

8

37

.

8

67

.

5

25

.

8

235

.

7 46

.

0

−

28

.

3

37

.

8

46

.

0

57

.

2

0

.

57

67

.

5

−

28

.

3 0

.

57

68

.

2







S

3

=







672

.

9 112

.

9 101

.

2 68

.

3

112

.

9 81

.

2

7

.

8

4

.

4

101

.

2

7

.

8

56

.

3

−

9

.

7

68

.

3

4

.

4

−

9

.

7 170

.

4







H

_o

:

µ

₁

=

µ

₂

=

µ

₃

H

_o

:

τ

₁

=

τ

₂

=

τ

₃

=

0

OR

> summary(manova(cbind(Apt, Math, Lang, GenKnow)~StudType, domrep), test="Wilks")

Df Wilks approx F num Df den Df Pr(>F) StudType 2 _______ _______ 8 152 1.384e-07 *** Residuals 79 2 THE AUTHOR S1 =     1067.7 224.1 97.5 74.0 224.1 192.9 34.6 37.9 97.5 34.6 68.9 3.2 74.0 37.9 3.2 50.5     S2 =     488.9 25.8 37.8 67.5 25.8 235.7 46.0 −28.3 37.8 46.0 57.2 0.57 67.5 ₋28.3 0.57 68.2     S3 =     672.9 112.9 101.2 68.3 112.9 81.2 7.8 4.4 101.2 7.8 56.3 ₋9.7 68.3 4.4 ₋9.7 170.4     Ho:µ1 =µ2 =µ3 Ho:τ1 =τ2=τ3=0 B=     24600.2 6832.6 5709.5 ₋2040.4 6832.6 2329.6 1570.2 ₋1064.7 5709.5 1570.2 1325.7 −455.6 −2040.4 −1064.7 −455.6 743.5     W=     55036.2 8140.0 5570.0 5490.1 8140.0 14589.0 2619.2 −128.0 5570.0 2619.2 4759.9 −102.4 5490.1 −128.0 −102.4 7040.6     37

Which variables?

Analysis of variance on each variable

All variables have significant difference between means! In order of importance: ____________________________ 2 THE AUTHOR S1 =     1067.7 224.1 97.5 74.0 224.1 192.9 34.6 37.9 97.5 34.6 68.9 3.2 74.0 37.9 3.2 50.5     S2 =     488.9 25.8 37.8 67.5 25.8 235.7 46.0 ₋28.3 37.8 46.0 57.2 0.57 67.5 ₋28.3 0.57 68.2     S3 =     672.9 112.9 101.2 68.3 112.9 81.2 7.8 4.4 101.2 7.8 56.3 ₋9.7 68.3 4.4 ₋9.7 170.4     Ho : µ1 =µ2 =µ3 Ho : τ1 =τ2 =τ3 =0 B=     24600.2 6832.6 5709.5 ₋2040.4 6832.6 2329.6 1570.2 −1064.7 5709.5 1570.2 1325.7 ₋455.6 −2040.4 −1064.7 −455.6 743.5     W=     55036.2 8140.0 5570.0 5490.1 8140.0 14589.0 2619.2 ₋128.0 5570.0 2619.2 4759.9 −102.4 5490.1 ₋128.0 ₋102.4 7040.6    

Df Sum Sq Mean Sq F value Pr(>F) Apt 2 24600 12300.1 17.656 4.589e-07 *** Math 2 2329.6 1164.80 6.3074 0.002875 ** Lang 2 1325.7 662.85 11.001 6.097e-05 *** GenKnow 2 743.5 371.74 4.1712 0.01896 *

(20)

Which groups?

Tukey’s multiple comparisons

Apt: _____________________________________ Math: ____________________________________ Lang: ____________________________________ GenKnow: _______________________________ 2 THE AUTHOR S1 =     1067.7 224.1 97.5 74.0 224.1 192.9 34.6 37.9 97.5 34.6 68.9 3.2 74.0 37.9 3.2 50.5     S2 =     488.9 25.8 37.8 67.5 25.8 235.7 46.0 −28.3 37.8 46.0 57.2 0.57 67.5 −28.3 0.57 68.2     S3 =     672.9 112.9 101.2 68.3 112.9 81.2 7.8 4.4 101.2 7.8 56.3 −9.7 68.3 4.4 −9.7 170.4     Ho:µ1=µ2=µ3 Ho:τ1=τ2=τ3=0 B=     24600.2 6832.6 5709.5 −2040.4 6832.6 2329.6 1570.2 −1064.7 5709.5 1570.2 1325.7 −455.6 −2040.4 −1064.7 −455.6 743.5     W=     55036.2 8140.0 5570.0 5490.1 8140.0 14589.0 2619.2 −128.0 5570.0 2619.2 4759.9 −102.4 5490.1 −128.0 −102.4 7040.6    

Df Sum Sq Mean Sq F value Pr(>F)

Apt 2 24600 12300.1 17.656 4.589e-07 *** Math 2 2329.6 1164.80 6.3074 0.002875 ** Lang 2 1325.7 662.85 11.001 6.097e-05 *** GenKnow 2 743.5 371.74 4.1712 0.01896 * diff lwr upr p adj Arch-Tech 28.16 11.50 44.81 0.00 MedTech-Tech -11.57 -30.60 7.46 0.32 MedTech-Arch -39.73 -56.87 -22.59 0.00 Apt BRIEF ARTICLE 3 diff lwr upr p adj Arch-Tech 3.79 -4.78 12.37 0.54 MedTech-Tech -9.30 -19.09 0.50 0.07 MedTech-Arch -13.09 -21.92 -4.26 0.00 diff lwr upr p adj Arch-Tech 6.68 1.78 11.58 0.00 MedTech-Tech -2.47 -8.06 3.13 0.55 MedTech-Arch -9.15 -14.19 -4.11 0.00 diff lwr upr p adj Arch-Tech 2.31 -3.65 8.27 0.63 MedTech-Tech 7.97 1.17 14.78 0.02 MedTech-Arch 5.66 -0.47 11.80 0.08 Math BRIEF ARTICLE 3 diff lwr upr p adj Arch-Tech 3.79 -4.78 12.37 0.54 MedTech-Tech -9.30 -19.09 0.50 0.07 MedTech-Arch -13.09 -21.92 -4.26 0.00 diff lwr upr p adj Arch-Tech 6.68 1.78 11.58 0.00 MedTech-Tech -2.47 -8.06 3.13 0.55 MedTech-Arch -9.15 -14.19 -4.11 0.00 diff lwr upr p adj Arch-Tech 2.31 -3.65 8.27 0.63 MedTech-Tech 7.97 1.17 14.78 0.02 MedTech-Arch 5.66 -0.47 11.80 0.08 Lang BRIEF ARTICLE 3 diff lwr upr p adj Arch-Tech 3.79 -4.78 12.37 0.54 MedTech-Tech -9.30 -19.09 0.50 0.07 MedTech-Arch -13.09 -21.92 -4.26 0.00 diff lwr upr p adj Arch-Tech 6.68 1.78 11.58 0.00 MedTech-Tech -2.47 -8.06 3.13 0.55 MedTech-Arch -9.15 -14.19 -4.11 0.00 diff lwr upr p adj Arch-Tech 2.31 -3.65 8.27 0.63 MedTech-Tech 7.97 1.17 14.78 0.02 MedTech-Arch 5.66 -0.47 11.80 0.08 GenKnow 39

Summary

In general, using a multivariate test, such as _________, is preferable to multiple univariate tests, such as __________. This controls for the correlation structure in the data. Its possible to get non-significant results in the univariate tests, but significant results in the multivariate tests. That is, the univariate tests may not detect the differences, in the presence of correlation between the responses.

(21)

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.