Multivariate normal distribution and testing for means (see MKB Ch 3)
Where are we going? 2
One-sample t-test (univariate). . . 3
Two-sample t-test (univariate) . . . 4
One sample T2-test (multivariate) . . . 5
Two sample T2-test (multivariate). . . 6
Multivariate normal distribution 7 Definition. . . 8
Why is it important in multivariate statistics? . . . 9
Why is it important in multivariate statistics? . . . 10
Characterization . . . 11
Properties . . . 12
Properties . . . 13
Data matrices . . . 14
Wishart distribution 15 Quadratic forms of normal data matrices . . . 16
Definition. . . 17
Properties . . . 18
Properties . . . 19
Properties . . . 20
Hotelling’s T2 distribution 21 Definition. . . 22
One-sample T2 test . . . 23
Relationship with F distribution . . . 24
Mahalonobis distance. . . 25
Two-sample T2 test. . . 26
Final remarks 27 Assumptions for one-sample T2 test. . . 28
Assumptions for two-sample T2 test. . . 29
Multivariate test versus univariate tests . . . 30
Where are we going? 2 / 30
One-sample t-test (univariate)
■ Suppose that x1, . . . , xn are i.i.d. N (µ, σ2). Then
◆ x =¯ n1Pn
i=1xi ∼ N(µ, σ2/n)
◆ ns2=Pn
i=1(xi− ¯x)2∼ σ2χ2n−1
◆ x and s¯ 2 are independent
■ Suppose that the mean µ is unknown. Then we can do a one-sample t-test:
◆ H0: µ = µ0, Ha: µ 6= µ0.
◆ Test statistic: t = s¯x−µ0
u/√
n, where su is sample standard deviation with n − 1 in the denominator.
◆ Under H0, t ∼ tn−1, i.e., it has a student t-distribution with n − 1 degrees of freedom.
◆ Compute p-value. If p-value < 0.05, we reject the null hypothesis.
3 / 30
Two-sample t-test (univariate)
■ Suppose that x1, . . . , xn are i.i.d. N (µX, σ2X), and let y1, . . . , ym be i.i.d. N (µY, σ2Y) (independent of x1, . . . , xn).
■ Suppose we want to test whether µX = µY, under the assumption that σX = σY. Then we can do a two-sample t-test:
◆ H0: µX = µY, Ha: µX 6= µY.
◆ Test statistic: t = q x−¯¯ y
s2p(n1+m1), where s2p= 1
n + m − 2 ns2X+ ms2Y .
◆ Under H0, t ∼ tn+m−2.
4 / 30
One sample T2-test (multivariate)
■ Suppose that x1, . . . , xn are i.i.d.Np(µ, Σ). Then
◆ x =¯ n1Pn
i=1xi ∼ Np(µ, Σ/n).
◆ nS ∼ Wp(Σ, n − 1), where S is the sample covariance matrix and Wp is aWishart distribution.
◆ x and S are independent.¯
■ Suppose that the mean µ is unknown. Then we can do a one-sample t-test:
◆ H0: µ = µ0, Ha: µ 6= µ0.
◆ Test statistic: T = n(¯x − µ0)′Su−1(¯x − µ0), where Su is the sample covariance matrix with n − 1 in the denominator.
◆ Under H0, T ∼ T2(p, n − 1), i.e., it hasHotelling’s T2 distributionwith parameters p and n − 1.
5 / 30
Two sample T2-test (multivariate)
■ Suppose that x1, . . . , xn are i.i.d. Np(µX, ΣX), and let y1, . . . , ym be i.i.d. Np(µY, ΣY) (independent of x1, . . . , xn).
■ Suppose we want to test whether µX = µY, under the assumption that ΣX = ΣY. Then we can do a two-sample t-test:
◆ H0: µX = µY, Ha: µX 6= µY.
◆ Test statistic: T = (nm/(n + m))(¯x − ¯y)′Su−1(¯x − ¯y), where
Su= 1
n + m − 2(nS1+ mS2) .
◆ Under H0, T ∼ T2(p, n + m − 2).
6 / 30
Multivariate normal distribution 7 / 30
Definition
■ A random variable x ∈ R has a univariate normal distribution with mean µ and variance σ2 (we write x ∼ N(µ, σ2)) iff its density can be written as
f (x) = 1
√2πσ exp
−1 2
(x − µ)2 σ2
= {2πσ2}−1/2exp
−1
2(x − µ){σ2}−1(x − µ)
.
■ A random vector x ∈ Rp has a p-variate normal distribution with mean vector µ and covariance matrix Σ (we write x ∼ Np(µ, Σ)) iff its density can be written as
f (x) = |2πΣ|−1/2exp
−1
2(x − µ)′Σ−1(x − µ)
.
8 / 30
Why is it important in multivariate statistics?
■ It is an easy generalization of the univariate normal distribution. Such generalizations are not obvious for all univariate distributions; sometimes there are several plausible ways to generalize them.
■ The multivariate normal distribution is entirely defined by its first two moments. Hence, it has a sparse parametrization using only p(p + 3)/2 parameters (see board).
■ In the case of normal variables, zero correlation implies independence, and pairwise independence implies mutual independence. These properties do not hold for many other distributions.
9 / 30
Why is it important in multivariate statistics?
■ Linear functions of a multivariate normal are univariate normal. This yields simple derivations.
■ Even when the original data is not multivariate normal, certain functions like the sample mean will be approximately multivariate normal due to the central limit theorem.
■ The multivariate normal distribution has a simple geometry. Its equiprobability contours are ellipsoids (see picture on slide).
10 / 30
Characterization
■ x is p-variate normal iff a′x is univariate normal for all fixed vectors a ∈ Rp (To allow for a = 0, we regard constants as degenerate forms of the normal distribution.)
■ Geometric interpretation: x is p-variate normal iff its projection on any univariate subspace is normal.
■ This characterization will allow us to derive many properties of the multivariate normal without writing down densities.
11 / 30
Properties
■ (Th 3.1.1 of MKB) If x is p-variate normal, and if y = Ax + c where A is any q × p matrix and c is any q-vector, then y is q-variate normal (see proof on board).
■ (Cor 3.1.1.1 of MKB) Any subset of elements of a multivariate normal vector are multivariate normal. In particular, all individual elements are univariate normal (see proof on board).
12 / 30
Properties
■ (Th 3.2.1 of MKB) If x ∼ Np(µ, Σ) and y = Ax + c, then y ∼ Nq(Aµ + c, AΣA′) (see proof on board).
■ (Cor 3.2.1.1 of MKB) If x ∼ Np(µ, Σ) with Σ > 0, then y = Σ−1/2(x − µ) ∼ Np(0, I) and (x − µ)′Σ−1(x − µ) =Pp
i=1yi2 ∼ χ2p (see proof on board).
13 / 30
Data matrices
■ Let x1, . . . , xn be a random sample from N (µ, Σ). Then we call X = (x1, . . . , xn)′ a data matrix from N (µ, Σ) or a normal data matrix.
■ (Th. 3.3.2 of MKB, without proof) If X(n × p) is a normal data matrix from Np(µ, Σ) and if Y (m × q) satisfies Y = AXB, then Y is a normal matrix iff the following two properties hold:
◆ A1 = α1 for some scalar α, or B′µ = 0
◆ AA′ = βI for some scalar β, or B′ΣB = 0
When both these conditions are satisfied then Y is a normal data matrix from Nq(αB′µ, βB′ΣB).
Note: Pre-multiplication with A means that we take linear combinations of the rows. The conditions on A ensure that the new rows are independent. Post-multiplication with B means that we take linear combinations of the columns (variables).
14 / 30
Wishart distribution 15 / 30
Quadratic forms of normal data matrices
■ We now consider quadratic forms of normal data matrices, i.e., functions of the form X′CX for some symmetric matrix C.
■ A special case of such a quadratic form is the covariance matrix, which we obtain when C = n−1(I − n−111′) (see proof on board; H = I − n−111′ is called the centering matrix).
16 / 30
Definition
■ If M (p × p) can be written as X′X where X(m × p) is a data matrix from Np(0, Σ), then M is said to have a p-variate Wishart distribution with scale matrix Σ and m degrees of freedom. We write M ∼ Wp(Σ, m). When Σ = Ip, then the distribution is said to be in standard form.
■ Note: The Wishart distribution is a multivariate generalization of the χ2 distribution: when p = 1, the W1(σ2, m) distribution is given by x′x, where x ∈ Rm contains i.i.d. N1(0, σ2) variables. Hence, W1(σ2, m) = σ2χ2m.
17 / 30
Properties
■ (Th 3.4.1 of MKB) If M ∼ Wp(Σ, m) and B is a p × q matrix, then B′M B ∼ Wq(B′ΣB, m) (see proof on board).
■ (Cor 3.4.1.1 of MKB) Diagonal submatrices of M (square submatrices of M whose diagonal corresponds to the diagonal of M ) have a Wishart distribution.
■ (Cor 3.4.1.2 of MKB) Σ−1/2M Σ−1/2∼ Wp(I, m)
■ (Cor 3.4.1.3 of MKB) If M ∼ Wp(I, m) and B(p × q) satisfies B′B = Iq, then B′M B ∼ Wq(I, m)
■ (Cor 3.4.2.1 of MKB) The ith diagonal element of M , mii, has a σi2χ2m distribution (where σi2 is the ith diagonal element of Σ).
All these corollaries follow by choosing particular values of B in Th 3.4.1.
18 / 30
Properties
■ (Th 3.4.3 of MKB) If M1∼ Wp(Σ, m1) and M2 ∼ Wp(Σ, m2), and if M1 and M2 are independent, then M1+ M2∼ Wp(Σ, m1+ m2) (see proof on board).
19 / 30
Properties
■ (Th 3.4.4 of MKB) If X(n × p) is a data matrix from Np(0, Σ) and C(n × n) is a symmetric matrix, then
◆ X′CX has the same distribution as a weighted sum of independent Wp(Σ, 1) matrices, where the weights are eigenvalues of C;
◆ X′CX has a Wishart distribution if C is idempotent. In this case X′CX ∼ Wp(Σ, r) where r = tr(C) = rank(C);
◆ If S = n−1X′HX is the sample covariance matrix, then nS ∼ Wp(Σ, n − 1).
(See proof on board)
20 / 30
Hotelling’s T
2distribution 21 / 30
Definition
■ If α can be written as md′M−1d where d and M are independently distributed as Np(0, I) and Wp(I, m), then we say that α has the Hotelling T2 distribution with parameters p and m. We write α ∼ T2(p, m).
■ Hotelling’s T2 distribution is a generalization of the student t-distribution. If x ∼ tm, then x2∼ T2(1, m).
22 / 30
One-sample T2 test
■ (Th 3.5.1 of MKB) If x and M are independently distributed as Np(µ, Σ) and Wp(Σ, m) then m(x − µ)′M−1(x − µ) ∼ T2(p, m) (see proof on board).
■ (Cor 3.5.1.1 of MKB) if ¯x and S are the mean vector and covariance matrix of a sample of size n from Np(µ, Σ), and Su= (n/(n − 1))S, then
T12 = (n − 1)(¯x − µ)′S−1(¯x − µ) = n(¯x − µ)Su−1(¯x − µ) has a T2(p, n − 1) distribution (see proof on board).
23 / 30
Relationship with F distribution
■ The T2 distribution is not readily available in R. But the T2 distribution is closely related to the F -distribution:
(Th 3.5.2 of MKB, without proof)
T2(p, m) = {mp/(m − p + 1)}Fp,m−p+1.
■ (Cor 3.5.2.1 of MKB) If ¯x and S are the mean and covariance of a sample of size n from Np(µ, Σ), then (n−1)pn−p T12 has a Fp,n−p distribution.
24 / 30
Mahalonobis distance
■ The so-called Mahalonobis distance between two populations with means µ1 and µ2 and common covariance matrix Σ is given by ∆, where
∆2 = (µ1− µ2)′Σ−1(µ1− µ2)
In other words, D is the euclidian distance between the re-scaled vectors Σ−1/2µ1 and Σ−1/2µ2.
■ The sample version of the Mahalonobis distance, D, is defined by D2 = (¯x1− ¯x2)′Su−1(¯x1− ¯x2),
where Su= (n1S1+ n2S2)/(n − 2), ¯xi is the sample mean of sample i, ni is the sample size of sample i, and n = n1+ n2.
25 / 30
Two-sample T2 test
■ (Th 3.6.1 of MKB, without proof) If X1 and X2 are independent data matrices, and if the ni rows of Xi are i.i.d. Np(µi, Σi), i = 1, 2, then when µ1= µ2 and Σ1 = Σ2,
T22= n1n2 n D2 has a T2(p, n − 2) distribution.
■ Corollary: n−1−p(n−2)pT22 has a Fp,n−1−p distribution.
26 / 30
Final remarks 27 / 30
Assumptions for one-sample T2 test
■ We have a simple random sample from the population
■ The population has a multivariate normal distribution
28 / 30
Assumptions for two-sample T2 test
■ We have a simple random sample from each population
■ In each population the variables have a multivariate normal distribution
■ The two populations have the same covariance matrix
29 / 30
Multivariate test versus univariate tests See board
30 / 30