• No results found

Multivariate normal distribution and testing for means (see MKB Ch 3)

N/A
N/A
Protected

Academic year: 2021

Share "Multivariate normal distribution and testing for means (see MKB Ch 3)"

Copied!
8
0
0

Loading.... (view fulltext now)

Full text

(1)

Multivariate normal distribution and testing for means (see MKB Ch 3)

Where are we going? 2

One-sample t-test (univariate). . . 3

Two-sample t-test (univariate) . . . 4

One sample T2-test (multivariate) . . . 5

Two sample T2-test (multivariate). . . 6

Multivariate normal distribution 7 Definition. . . 8

Why is it important in multivariate statistics? . . . 9

Why is it important in multivariate statistics? . . . 10

Characterization . . . 11

Properties . . . 12

Properties . . . 13

Data matrices . . . 14

Wishart distribution 15 Quadratic forms of normal data matrices . . . 16

Definition. . . 17

Properties . . . 18

Properties . . . 19

Properties . . . 20

Hotelling’s T2 distribution 21 Definition. . . 22

One-sample T2 test . . . 23

Relationship with F distribution . . . 24

Mahalonobis distance. . . 25

Two-sample T2 test. . . 26

Final remarks 27 Assumptions for one-sample T2 test. . . 28

Assumptions for two-sample T2 test. . . 29

Multivariate test versus univariate tests . . . 30

(2)

Where are we going? 2 / 30

One-sample t-test (univariate)

Suppose that x1, . . . , xn are i.i.d. N (µ, σ2). Then

x =¯ n1Pn

i=1xi ∼ N(µ, σ2/n)

ns2=Pn

i=1(xi− ¯x)2∼ σ2χ2n−1

x and s¯ 2 are independent

Suppose that the mean µ is unknown. Then we can do a one-sample t-test:

H0: µ = µ0, Ha: µ 6= µ0.

Test statistic: t = s¯x−µ0

u/

n, where su is sample standard deviation with n − 1 in the denominator.

Under H0, t ∼ tn−1, i.e., it has a student t-distribution with n − 1 degrees of freedom.

Compute p-value. If p-value < 0.05, we reject the null hypothesis.

3 / 30

Two-sample t-test (univariate)

Suppose that x1, . . . , xn are i.i.d. N (µX, σ2X), and let y1, . . . , ym be i.i.d. N (µY, σ2Y) (independent of x1, . . . , xn).

Suppose we want to test whether µX = µY, under the assumption that σX = σY. Then we can do a two-sample t-test:

H0: µX = µY, Ha: µX 6= µY.

Test statistic: t = q x−¯¯ y

s2p(n1+m1), where s2p= 1

n + m − 2 ns2X+ ms2Y .

Under H0, t ∼ tn+m−2.

4 / 30

One sample T2-test (multivariate)

Suppose that x1, . . . , xn are i.i.d.Np(µ, Σ). Then

x =¯ n1Pn

i=1xi ∼ Np(µ, Σ/n).

nS ∼ Wp(Σ, n − 1), where S is the sample covariance matrix and Wp is aWishart distribution.

x and S are independent.¯

Suppose that the mean µ is unknown. Then we can do a one-sample t-test:

H0: µ = µ0, Ha: µ 6= µ0.

Test statistic: T = n(¯x − µ0)Su−1(¯x − µ0), where Su is the sample covariance matrix with n − 1 in the denominator.

Under H0, T ∼ T2(p, n − 1), i.e., it hasHotelling’s T2 distributionwith parameters p and n − 1.

5 / 30

(3)

Two sample T2-test (multivariate)

Suppose that x1, . . . , xn are i.i.d. NpX, ΣX), and let y1, . . . , ym be i.i.d. NpY, ΣY) (independent of x1, . . . , xn).

Suppose we want to test whether µX = µY, under the assumption that ΣX = ΣY. Then we can do a two-sample t-test:

H0: µX = µY, Ha: µX 6= µY.

Test statistic: T = (nm/(n + m))(¯x − ¯y)Su−1(¯x − ¯y), where

Su= 1

n + m − 2(nS1+ mS2) .

Under H0, T ∼ T2(p, n + m − 2).

6 / 30

Multivariate normal distribution 7 / 30

Definition

A random variable x ∈ R has a univariate normal distribution with mean µ and variance σ2 (we write x ∼ N(µ, σ2)) iff its density can be written as

f (x) = 1

√2πσ exp



−1 2

(x − µ)2 σ2



= {2πσ2}−1/2exp



−1

2(x − µ){σ2}−1(x − µ)

 .

A random vector x ∈ Rp has a p-variate normal distribution with mean vector µ and covariance matrix Σ (we write x ∼ Np(µ, Σ)) iff its density can be written as

f (x) = |2πΣ|−1/2exp



−1

2(x − µ)Σ−1(x − µ)

 .

8 / 30

Why is it important in multivariate statistics?

It is an easy generalization of the univariate normal distribution. Such generalizations are not obvious for all univariate distributions; sometimes there are several plausible ways to generalize them.

The multivariate normal distribution is entirely defined by its first two moments. Hence, it has a sparse parametrization using only p(p + 3)/2 parameters (see board).

In the case of normal variables, zero correlation implies independence, and pairwise independence implies mutual independence. These properties do not hold for many other distributions.

9 / 30

(4)

Why is it important in multivariate statistics?

Linear functions of a multivariate normal are univariate normal. This yields simple derivations.

Even when the original data is not multivariate normal, certain functions like the sample mean will be approximately multivariate normal due to the central limit theorem.

The multivariate normal distribution has a simple geometry. Its equiprobability contours are ellipsoids (see picture on slide).

10 / 30

Characterization

x is p-variate normal iff ax is univariate normal for all fixed vectors a ∈ Rp (To allow for a = 0, we regard constants as degenerate forms of the normal distribution.)

Geometric interpretation: x is p-variate normal iff its projection on any univariate subspace is normal.

This characterization will allow us to derive many properties of the multivariate normal without writing down densities.

11 / 30

Properties

(Th 3.1.1 of MKB) If x is p-variate normal, and if y = Ax + c where A is any q × p matrix and c is any q-vector, then y is q-variate normal (see proof on board).

(Cor 3.1.1.1 of MKB) Any subset of elements of a multivariate normal vector are multivariate normal. In particular, all individual elements are univariate normal (see proof on board).

12 / 30

Properties

(Th 3.2.1 of MKB) If x ∼ Np(µ, Σ) and y = Ax + c, then y ∼ Nq(Aµ + c, AΣA) (see proof on board).

(Cor 3.2.1.1 of MKB) If x ∼ Np(µ, Σ) with Σ > 0, then y = Σ−1/2(x − µ) ∼ Np(0, I) and (x − µ)Σ−1(x − µ) =Pp

i=1yi2 ∼ χ2p (see proof on board).

13 / 30

(5)

Data matrices

Let x1, . . . , xn be a random sample from N (µ, Σ). Then we call X = (x1, . . . , xn) a data matrix from N (µ, Σ) or a normal data matrix.

(Th. 3.3.2 of MKB, without proof) If X(n × p) is a normal data matrix from Np(µ, Σ) and if Y (m × q) satisfies Y = AXB, then Y is a normal matrix iff the following two properties hold:

A1 = α1 for some scalar α, or Bµ = 0

AA = βI for some scalar β, or BΣB = 0

When both these conditions are satisfied then Y is a normal data matrix from Nq(αBµ, βBΣB).

Note: Pre-multiplication with A means that we take linear combinations of the rows. The conditions on A ensure that the new rows are independent. Post-multiplication with B means that we take linear combinations of the columns (variables).

14 / 30

Wishart distribution 15 / 30

Quadratic forms of normal data matrices

We now consider quadratic forms of normal data matrices, i.e., functions of the form XCX for some symmetric matrix C.

A special case of such a quadratic form is the covariance matrix, which we obtain when C = n−1(I − n−111) (see proof on board; H = I − n−111 is called the centering matrix).

16 / 30

Definition

If M (p × p) can be written as XX where X(m × p) is a data matrix from Np(0, Σ), then M is said to have a p-variate Wishart distribution with scale matrix Σ and m degrees of freedom. We write M ∼ Wp(Σ, m). When Σ = Ip, then the distribution is said to be in standard form.

Note: The Wishart distribution is a multivariate generalization of the χ2 distribution: when p = 1, the W12, m) distribution is given by xx, where x ∈ Rm contains i.i.d. N1(0, σ2) variables. Hence, W12, m) = σ2χ2m.

17 / 30

(6)

Properties

(Th 3.4.1 of MKB) If M ∼ Wp(Σ, m) and B is a p × q matrix, then BM B ∼ Wq(BΣB, m) (see proof on board).

(Cor 3.4.1.1 of MKB) Diagonal submatrices of M (square submatrices of M whose diagonal corresponds to the diagonal of M ) have a Wishart distribution.

(Cor 3.4.1.2 of MKB) Σ−1/2M Σ−1/2∼ Wp(I, m)

(Cor 3.4.1.3 of MKB) If M ∼ Wp(I, m) and B(p × q) satisfies BB = Iq, then BM B ∼ Wq(I, m)

(Cor 3.4.2.1 of MKB) The ith diagonal element of M , mii, has a σi2χ2m distribution (where σi2 is the ith diagonal element of Σ).

All these corollaries follow by choosing particular values of B in Th 3.4.1.

18 / 30

Properties

(Th 3.4.3 of MKB) If M1∼ Wp(Σ, m1) and M2 ∼ Wp(Σ, m2), and if M1 and M2 are independent, then M1+ M2∼ Wp(Σ, m1+ m2) (see proof on board).

19 / 30

Properties

(Th 3.4.4 of MKB) If X(n × p) is a data matrix from Np(0, Σ) and C(n × n) is a symmetric matrix, then

XCX has the same distribution as a weighted sum of independent Wp(Σ, 1) matrices, where the weights are eigenvalues of C;

XCX has a Wishart distribution if C is idempotent. In this case XCX ∼ Wp(Σ, r) where r = tr(C) = rank(C);

If S = n−1XHX is the sample covariance matrix, then nS ∼ Wp(Σ, n − 1).

(See proof on board)

20 / 30

Hotelling’s T

2

distribution 21 / 30

Definition

If α can be written as mdM−1d where d and M are independently distributed as Np(0, I) and Wp(I, m), then we say that α has the Hotelling T2 distribution with parameters p and m. We write α ∼ T2(p, m).

Hotelling’s T2 distribution is a generalization of the student t-distribution. If x ∼ tm, then x2∼ T2(1, m).

22 / 30

(7)

One-sample T2 test

(Th 3.5.1 of MKB) If x and M are independently distributed as Np(µ, Σ) and Wp(Σ, m) then m(x − µ)M−1(x − µ) ∼ T2(p, m) (see proof on board).

(Cor 3.5.1.1 of MKB) if ¯x and S are the mean vector and covariance matrix of a sample of size n from Np(µ, Σ), and Su= (n/(n − 1))S, then

T12 = (n − 1)(¯x − µ)S−1(¯x − µ) = n(¯x − µ)Su−1(¯x − µ) has a T2(p, n − 1) distribution (see proof on board).

23 / 30

Relationship with F distribution

The T2 distribution is not readily available in R. But the T2 distribution is closely related to the F -distribution:

(Th 3.5.2 of MKB, without proof)

T2(p, m) = {mp/(m − p + 1)}Fp,m−p+1.

(Cor 3.5.2.1 of MKB) If ¯x and S are the mean and covariance of a sample of size n from Np(µ, Σ), then (n−1)pn−p T12 has a Fp,n−p distribution.

24 / 30

Mahalonobis distance

The so-called Mahalonobis distance between two populations with means µ1 and µ2 and common covariance matrix Σ is given by ∆, where

2 = (µ1− µ2)Σ−11− µ2)

In other words, D is the euclidian distance between the re-scaled vectors Σ−1/2µ1 and Σ−1/2µ2.

The sample version of the Mahalonobis distance, D, is defined by D2 = (¯x1− ¯x2)Su−1(¯x1− ¯x2),

where Su= (n1S1+ n2S2)/(n − 2), ¯xi is the sample mean of sample i, ni is the sample size of sample i, and n = n1+ n2.

25 / 30

(8)

Two-sample T2 test

(Th 3.6.1 of MKB, without proof) If X1 and X2 are independent data matrices, and if the ni rows of Xi are i.i.d. Npi, Σi), i = 1, 2, then when µ1= µ2 and Σ1 = Σ2,

T22= n1n2 n D2 has a T2(p, n − 2) distribution.

Corollary: n−1−p(n−2)pT22 has a Fp,n−1−p distribution.

26 / 30

Final remarks 27 / 30

Assumptions for one-sample T2 test

We have a simple random sample from the population

The population has a multivariate normal distribution

28 / 30

Assumptions for two-sample T2 test

We have a simple random sample from each population

In each population the variables have a multivariate normal distribution

The two populations have the same covariance matrix

29 / 30

Multivariate test versus univariate tests See board

30 / 30

References

Related documents

We test the performance of random forests as a screening procedure to identify small numbers of risk-associated SNPs from among large numbers of unassociated SNPs using complex

The Committee and those it consulted did not find an ideal communication method with which to reach physicians, especially, and to ensure that they were familiar with and able to

Conference participants will tour Nexus Press, as well as view the Atlanta College of Art's renowned artist book collection.. DECORATIVE ARTS • Deanne Levison, co-author of

pMatlab implements the distributed arrays parallel programming model used to achieve high performance on the largest computers in the world.. This model gives the

Con el propósito de evitar dicha liberación, se han preparado y caracterizado cementos obtenidos a partir de la reacción entre el ácido poliacrílico y la wollastonita (CaSiO 3 ) en

You demonstrate an excellent understanding of the course objectives and course materials based on exam scores, assignment scores and class discussions..  You demonstrate

Since this might have important implications in the estimated relationship between evasion and tax rate, we have used transaction-level data on trade costs to generate a value

This article examines key issues for companies to consider when advertising in social media, including through consumer endorsements and testimonials, and running a contest,