(1)

SF2940: Probability theory

Lecture 8: Multivariate Normal Distribution

Timo Koski

24.09.2014

(2)

Learning outcomes

Random vectors, mean vector, covariance matrix, rules of transformation

Multivariate normal R.V., moment generating functions, characteristic function, rules of transformation

Density of a multivariate normal RV

Joint PDF of bivariate normal RVs

Conditional distributions in a multivariate normal distribution

(3)

PART 1: Mean vector, Covariance matrix, MGF, Characteristic function

(4)

Vector Notation: Random Vector

A random vector $\mathbf{X}$ is a column vector

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = (X_1, X_2, \ldots, X_n)^T.$$

Each $X_i$ is a random variable.

(5)

Sample Value Random Vector

A column vector

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (x_1, x_2, \ldots, x_n)^T.$$

We can think of $x_i$ as an outcome of $X_i$.

(6)

Joint CDF, Joint PDF

The joint CDF (cumulative distribution function) of a continuous random vector $\mathbf{X}$ is

$$F_{\mathbf{X}}(\mathbf{x}) = F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = P(\mathbf{X} \le \mathbf{x}) = P(X_1 \le x_1, \ldots, X_n \le x_n).$$

The joint probability density function (PDF) is

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1,\ldots,X_n}(x_1, \ldots, x_n).$$

(7)

Mean Vector

$$\boldsymbol{\mu}_{\mathbf{X}} = E[\mathbf{X}] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix},$$

a column vector of the means (expectations) of the components of $\mathbf{X}$.

(8)

Matrix, Scalar Product

If $\mathbf{X}^T$ is the transposed column vector (a row vector), then $\mathbf{X}\mathbf{X}^T$ is an $n \times n$ matrix, and

$$\mathbf{X}^T\mathbf{X} = \sum_{i=1}^{n} X_i^2$$

is a scalar product, a real-valued R.V.

(9)

Covariance Matrix of A Random Vector

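Covariance matrix

$$C_{\mathbf{X}} := E\left[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})^T\right],$$

where the element $(i, j)$

$$C_{\mathbf{X}}(i, j) = E\left[(X_i - \mu_i)(X_j - \mu_j)\right]$$

is the covariance of $X_i$ and $X_j$.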
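As a rough illustration of these definitions, the sketch below replaces the expectations by averages over a small data set. The data values and the helper names `mean_vector` and `covariance_matrix` are made up for this example, and dividing by n (rather than n − 1) is an assumption that treats the data as the whole distribution.

```python
# Hypothetical data: five observations of a 2-dimensional random vector.
samples = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0), (10.0, 8.0)]

def mean_vector(data):
    """Componentwise average, standing in for mu_X = E[X]."""
    n = len(data)
    dim = len(data[0])
    return [sum(row[i] for row in data) / n for i in range(dim)]

def covariance_matrix(data):
    """Average of (x - mu)(x - mu)^T over the data, standing in for C_X."""
    n = len(data)
    mu = mean_vector(data)
    dim = len(mu)
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in data) / n
             for j in range(dim)]
            for i in range(dim)]

mu = mean_vector(samples)
C = covariance_matrix(samples)
print(mu)  # mean vector
print(C)   # entry (i, j) is the covariance of components i and j
```

The diagonal entries of `C` are the variances of the two components, and the two off-diagonal entries coincide.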

(10)

A Quadratic Form

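$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j C_{\mathbf{X}}(i, j).$$

We see that

$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j E\left[(X_i - \mu_i)(X_j - \mu_j)\right] = E\left[\sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j (X_i - \mu_i)(X_j - \mu_j)\right]. \qquad (\ast)$$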

(11)

Properties of a Covariance Matrix

A covariance matrix is nonnegative definite, i.e., for all $\mathbf{x}$ we have

$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} \ge 0.$$

Hence $\det C_{\mathbf{X}} \ge 0$.

The covariance matrix is symmetric: $C_{\mathbf{X}} = C_{\mathbf{X}}^T$.

(12)

Properties of a Covariance Matrix

The covariance matrix is symmetric, $C_{\mathbf{X}} = C_{\mathbf{X}}^T$, since

$$C_{\mathbf{X}}(i, j) = E\left[(X_i - \mu_i)(X_j - \mu_j)\right] = E\left[(X_j - \mu_j)(X_i - \mu_i)\right] = C_{\mathbf{X}}(j, i).$$

(13)

Properties of a Covariance Matrix

A covariance matrix is positive definite, i.e., $\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} > 0$ for all $\mathbf{x} \ne \mathbf{0}$, iff

$$\det C_{\mathbf{X}} > 0$$

(i.e., $C_{\mathbf{X}}$ is invertible).

(14)

Properties of a Covariance Matrix

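Proposition

$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} \ge 0$$

Pf: By $(\ast)$ above

$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} = \mathbf{x}^T E\left[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})^T\right] \mathbf{x} = E\left[\mathbf{x}^T (\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})^T \mathbf{x}\right] = E\left[\mathbf{x}^T \mathbf{w} \cdot \mathbf{w}^T \mathbf{x}\right],$$

where we have set $\mathbf{w} = \mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}}$. Then by linear algebra $\mathbf{x}^T \mathbf{w} = \mathbf{w}^T \mathbf{x} = \sum_{i=1}^{n} w_i x_i$. Hence

$$E\left[\mathbf{x}^T \mathbf{w} \mathbf{w}^T \mathbf{x}\right] = E\left[\left(\sum_{i=1}^{n} w_i x_i\right)^2\right] \ge 0.$$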

(15)

Properties of a Covariance Matrix

In terms of the entries $c_{i,j}$ of a covariance matrix $C = (c_{i,j})_{i,j=1}^{n}$ there are the following necessary properties.

1. $c_{i,j} = c_{j,i}$ (symmetry).

2. $c_{i,i} = \mathrm{Var}(X_i) = \sigma_i^2 \ge 0$ (the elements in the main diagonal are the variances, and thus all elements in the main diagonal are nonnegative).

3. $c_{i,j}^2 \le c_{i,i} \cdot c_{j,j}$ (Cauchy-Schwarz inequality).
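The word "necessary" matters here: a matrix can pass all three entrywise checks and still fail to be nonnegative definite. The sketch below shows one such matrix. The matrix `D`, the test vector and the function names are chosen purely for illustration.

```python
def quadratic_form(C, x):
    """x^T C x written as the double sum over i and j."""
    n = len(C)
    return sum(x[i] * x[j] * C[i][j] for i in range(n) for j in range(n))

def satisfies_necessary_properties(C):
    """Symmetry, nonnegative diagonal, and the Cauchy-Schwarz bound."""
    n = len(C)
    symmetric = all(C[i][j] == C[j][i] for i in range(n) for j in range(n))
    nonneg_diag = all(C[i][i] >= 0 for i in range(n))
    cauchy_schwarz = all(C[i][j] ** 2 <= C[i][i] * C[j][j]
                         for i in range(n) for j in range(n))
    return symmetric and nonneg_diag and cauchy_schwarz

# Hypothetical matrix: pairwise "correlations" 0.9, -0.9, 0.9 that cannot
# all hold at once for three random variables.
D = [[1.0, 0.9, -0.9],
     [0.9, 1.0, 0.9],
     [-0.9, 0.9, 1.0]]

passes = satisfies_necessary_properties(D)
q = quadratic_form(D, [1.0, -1.0, 1.0])
print(passes)  # the three entrywise conditions hold
print(q)       # yet the quadratic form is negative for this x
```

Since the quadratic form is negative for one choice of x, `D` cannot be the covariance matrix of any random vector, even though its entries satisfy properties 1 to 3.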

(16)

Coefficient of Correlation

The coefficient of correlation $\rho$ of $X$ and $Y$ is defined as

$$\rho := \rho_{X,Y} := \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X) \cdot \mathrm{Var}(Y)}},$$

where $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$. This is normalized:

$$-1 \le \rho_{X,Y} \le 1.$$

For random variables $X$ and $Y$, $\mathrm{Cov}(X, Y) = \rho_{X,Y} = 0$ does not always mean that $X$ and $Y$ are independent.
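A standard small example of zero covariance without independence, worked with exact fractions: X is uniform on {−1, 0, 1} and Y = X². The distribution and the helper name `expect` are assumptions made only for this illustration.

```python
from fractions import Fraction

# X takes the values -1, 0, 1 with probability 1/3 each, and Y = X**2.
outcomes = [(Fraction(x), Fraction(x * x), Fraction(1, 3)) for x in (-1, 0, 1)]

def expect(g):
    """E[g(X, Y)] under the discrete joint distribution above."""
    return sum(p * g(x, y) for x, y, p in outcomes)

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Independence would require P(X = 0, Y = 0) = P(X = 0) * P(Y = 0).
p_joint = expect(lambda x, y: 1 if (x == 0 and y == 0) else 0)
p_x0 = expect(lambda x, y: 1 if x == 0 else 0)
p_y0 = expect(lambda x, y: 1 if y == 0 else 0)

print(cov_xy)                # covariance of X and Y
print(p_joint, p_x0 * p_y0)  # joint probability versus product of marginals
```

The covariance comes out as zero while the joint probability differs from the product of the marginals, so X and Y are uncorrelated but dependent.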

(17)

Special case: Covariance Matrix of A Bivariate Vector

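$\mathbf{X} = (X_1, X_2)^T$.

$$C_{\mathbf{X}} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

where $\rho$ is the coefficient of correlation of $X_1$ and $X_2$, and $\sigma_1^2 = \mathrm{Var}(X_1)$, $\sigma_2^2 = \mathrm{Var}(X_2)$. $C_{\mathbf{X}}$ is invertible iff $\rho^2 \ne 1$; for the proof we note that

$$\det C_{\mathbf{X}} = \sigma_1^2 \sigma_2^2 \left(1 - \rho^2\right).$$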

(18)

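Special case: Covariance Matrix of A Bivariate Vector

$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$

If $\rho^2 \ne 1$, the inverse exists and

$$\Lambda^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.$$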
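The inverse formula can be checked numerically by multiplying it against Λ. This is a minimal sketch; the parameter values and the function names are illustrative only.

```python
def bivariate_cov(s1, s2, rho):
    """Lambda for standard deviations s1, s2 and correlation rho."""
    return [[s1 * s1, rho * s1 * s2],
            [rho * s1 * s2, s2 * s2]]

def bivariate_cov_inverse(s1, s2, rho):
    """Closed-form inverse; needs rho**2 != 1 and s1, s2 > 0."""
    det = s1 * s1 * s2 * s2 * (1.0 - rho * rho)
    if det == 0.0:
        raise ValueError("Lambda is singular")
    return [[s2 * s2 / det, -rho * s1 * s2 / det],
            [-rho * s1 * s2 / det, s1 * s1 / det]]

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Lam = bivariate_cov(2.0, 0.5, 0.8)
Lam_inv = bivariate_cov_inverse(2.0, 0.5, 0.8)
product = matmul2(Lam, Lam_inv)
print(product)  # should be the 2x2 identity up to rounding
```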

(19)

Y = BX + b

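Proposition

$\mathbf{X}$ is a random vector with mean vector $\boldsymbol{\mu}_{\mathbf{X}}$ and covariance matrix $C_{\mathbf{X}}$. $B$ is an $m \times n$ matrix. If $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$, then

$$E[\mathbf{Y}] = B\boldsymbol{\mu}_{\mathbf{X}} + \mathbf{b}, \qquad C_{\mathbf{Y}} = B C_{\mathbf{X}} B^T.$$

Pf: For simplicity of writing, take $\mathbf{b} = \boldsymbol{\mu}_{\mathbf{X}} = \mathbf{0}$. Then

$$C_{\mathbf{Y}} = E\left[\mathbf{Y}\mathbf{Y}^T\right] = E\left[B\mathbf{X}(B\mathbf{X})^T\right] = E\left[B\mathbf{X}\mathbf{X}^T B^T\right] = B\,E\left[\mathbf{X}\mathbf{X}^T\right] B^T = B C_{\mathbf{X}} B^T.$$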
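The same two rules hold exactly for averages over a finite data set, which gives a quick sanity check. In the sketch below the data, the 3 × 2 matrix `B`, the shift `b` and all function names are hypothetical; expectations are replaced by averages with divisor n.

```python
def mean_vector(data):
    n = len(data)
    return [sum(row[i] for row in data) / n for i in range(len(data[0]))]

def covariance_matrix(data):
    n, mu = len(data), mean_vector(data)
    d = len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in data) / n
             for j in range(d)] for i in range(d)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def affine(B, b, x):
    """Compute B x + b for a single vector x."""
    return [sum(B[i][k] * x[k] for k in range(len(x))) + b[i]
            for i in range(len(B))]

samples = [(2.0, 1.0), (4.0, 3.0), (6.0, 2.0), (8.0, 6.0), (10.0, 8.0)]
B = [[1.0, 1.0], [1.0, -1.0], [2.0, 0.0]]  # m = 3, n = 2
b = [1.0, 0.0, -3.0]

ys = [affine(B, b, x) for x in samples]
mean_direct = mean_vector(ys)                          # average of Y
mean_rule = affine(B, b, mean_vector(samples))         # B mu_X + b
cov_direct = covariance_matrix(ys)                     # covariance of Y
cov_rule = matmul(matmul(B, covariance_matrix(samples)), transpose(B))

max_mean_gap = max(abs(u - v) for u, v in zip(mean_direct, mean_rule))
max_cov_gap = max(abs(cov_direct[i][j] - cov_rule[i][j])
                  for i in range(3) for j in range(3))
print(max_mean_gap, max_cov_gap)  # both should be zero up to rounding
```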

(20)

Moment Generating and Characteristic Functions

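Definition

The moment generating function of $\mathbf{X}$ is defined as

$$\psi_{\mathbf{X}}(\mathbf{t}) \overset{\text{def}}{=} E\left[e^{\mathbf{t}^T\mathbf{X}}\right] = E\left[e^{t_1 X_1 + t_2 X_2 + \cdots + t_n X_n}\right].$$

Definition

The characteristic function of $\mathbf{X}$ is defined as

$$\varphi_{\mathbf{X}}(\mathbf{t}) \overset{\text{def}}{=} E\left[e^{i\mathbf{t}^T\mathbf{X}}\right] = E\left[e^{i(t_1 X_1 + t_2 X_2 + \cdots + t_n X_n)}\right].$$

Special case: take $t_2 = t_3 = \cdots = t_n = 0$; then

$$\varphi_{\mathbf{X}}(\mathbf{t}) = \varphi_{X_1}(t_1).$$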

(21)

PART 2: Def I of a multivariate normal distribution

We recall first some of the properties of univariate normal distribution

(22)

Normal (Gaussian) One-dimensional RVs

$X$ is a normal random variable if

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x - \mu)^2},$$

where $\mu$ is real and $\sigma > 0$.

Notation: $X \in N(\mu, \sigma^2)$

Properties: $E(X) = \mu$, $\mathrm{Var}(X) = \sigma^2$

(23)

Normal (Gaussian) One-dimensional RVs

[Figure: two panels plotting the density $f_X(x)$ against $x$, with $x$ from −2 to 6 and the density axis from 0 to 0.8: (a) µ = 2, σ = 1/2; (b) µ = 2, σ = 2.]

(24)

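Linear Transformation

$X \in N(\mu_X, \sigma_X^2) \Rightarrow Y = aX + b$ is $N(a\mu_X + b, a^2\sigma_X^2)$. Thus

$$Z = \frac{X - \mu_X}{\sigma_X} \in N(0, 1)$$

and

$$P(X \le x) = P\left(\frac{X - \mu_X}{\sigma_X} \le \frac{x - \mu_X}{\sigma_X}\right),$$

or

$$F_X(x) = P\left(Z \le \frac{x - \mu_X}{\sigma_X}\right) = \Phi\left(\frac{x - \mu_X}{\sigma_X}\right).$$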
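Here Φ denotes the CDF of N(0, 1). One common way to evaluate it uses the error function, Φ(z) = (1 + erf(z/√2))/2. The sketch below applies the standardization step with that identity; the function names are chosen for this example.

```python
import math

def std_normal_cdf(z):
    """Phi(z) for N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """F_X(x) for X in N(mu, sigma^2), via Z = (X - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)

# P(X <= 3) for mu = 2, sigma = 1/2, i.e. Phi(2).
p = normal_cdf(3.0, 2.0, 0.5)
print(p)
```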



(25)

Normal (Gaussian) One-dimensional RVs

If $X \in N(\mu, \sigma^2)$, then the moment generating function is

$$\psi_X(t) = E\left[e^{tX}\right] = e^{t\mu + \frac{1}{2}t^2\sigma^2},$$

and the characteristic function is

$$\varphi_X(t) = E\left[e^{itX}\right] = e^{it\mu - \frac{1}{2}t^2\sigma^2},$$

as found in previous lectures.

(26)

Multivariate Normal Def. I

Definition

An $n \times 1$ random vector $\mathbf{X}$ has a normal distribution iff for every $n \times 1$ vector $\mathbf{a}$ the one-dimensional random vector $\mathbf{a}^T\mathbf{X}$ has a normal distribution.

We write $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$ when $\boldsymbol{\mu}$ is the mean vector and $\Lambda$ is the covariance matrix.

(27)

Consequences of Def. I (1)

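An $n \times 1$ vector $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$ iff the one-dimensional random vector $\mathbf{a}^T\mathbf{X}$ has a normal distribution for every $n$-vector $\mathbf{a}$.

Now we know that (take $B = \mathbf{a}^T$ in the preceding)

$$E\left[\mathbf{a}^T\mathbf{X}\right] = \mathbf{a}^T\boldsymbol{\mu}, \qquad \mathrm{Var}\left[\mathbf{a}^T\mathbf{X}\right] = \mathbf{a}^T\Lambda\mathbf{a}.$$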

(28)

Consequences of Def. I (2)

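Hence, if $Y = \mathbf{a}^T\mathbf{X}$, then $Y \in N\left(\mathbf{a}^T\boldsymbol{\mu}, \mathbf{a}^T\Lambda\mathbf{a}\right)$ and the moment generating function of $Y$ is

$$\psi_Y(t) = E\left[e^{tY}\right] = e^{t\mathbf{a}^T\boldsymbol{\mu} + \frac{1}{2}t^2\mathbf{a}^T\Lambda\mathbf{a}}.$$

Therefore

$$\psi_{\mathbf{X}}(\mathbf{a}) = E\left[e^{\mathbf{a}^T\mathbf{X}}\right] = \psi_Y(1) = e^{\mathbf{a}^T\boldsymbol{\mu} + \frac{1}{2}\mathbf{a}^T\Lambda\mathbf{a}}.$$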

(29)

Consequences of Def. I (3)

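Hence we have shown that if $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$, then

$$\psi_{\mathbf{X}}(\mathbf{t}) = E\left[e^{\mathbf{t}^T\mathbf{X}}\right] = e^{\mathbf{t}^T\boldsymbol{\mu} + \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}}$$

is the moment generating function of $\mathbf{X}$.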

(30)

Consequences of Def. I (4)

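In the same way we can find that

$$\varphi_{\mathbf{X}}(\mathbf{t}) = E\left[e^{i\mathbf{t}^T\mathbf{X}}\right] = e^{i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}}$$

is the characteristic function of $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$.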

(31)

Consequences of Def. I (5)

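Let $\Lambda$ be a diagonal covariance matrix with $\lambda_i^2$ on the main diagonal, i.e.,

$$\Lambda = \begin{pmatrix} \lambda_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^2 \end{pmatrix}.$$

Proposition

If $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$, then $X_1, X_2, \ldots, X_n$ are independent normal variables.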

(32)

Consequences of Def. I (6)

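Pf: Since $\Lambda$ is diagonal, the quadratic form becomes a single sum of squares:

$$\varphi_{\mathbf{X}}(\mathbf{t}) = e^{i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}} = e^{i\sum_{i=1}^{n}\mu_i t_i - \frac{1}{2}\sum_{i=1}^{n}\lambda_i^2 t_i^2} = e^{i\mu_1 t_1 - \frac{1}{2}\lambda_1^2 t_1^2}\, e^{i\mu_2 t_2 - \frac{1}{2}\lambda_2^2 t_2^2} \cdots e^{i\mu_n t_n - \frac{1}{2}\lambda_n^2 t_n^2}$$

is the product of the characteristic functions of $X_i \in N\left(\mu_i, \lambda_i^2\right)$, which are thus seen to be independent $N\left(\mu_i, \lambda_i^2\right)$.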
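The factorization can be confirmed numerically by evaluating both sides at one point. This is only a sketch: the three-dimensional parameters, the evaluation point `t` and the function names are invented for the example.

```python
import cmath

def char_fn_normal(t, mu, Lam):
    """exp(i t^T mu - 0.5 t^T Lam t) for vectors t, mu and matrix Lam."""
    n = len(t)
    lin = sum(t[k] * mu[k] for k in range(n))
    quad = sum(t[j] * t[k] * Lam[j][k] for j in range(n) for k in range(n))
    return cmath.exp(1j * lin - 0.5 * quad)

def char_fn_normal_1d(t, mu, var):
    """Characteristic function of N(mu, var) at the scalar t."""
    return cmath.exp(1j * t * mu - 0.5 * t * t * var)

mu = [1.0, -2.0, 0.5]
variances = [4.0, 0.25, 1.0]  # the lambda_i^2 on the main diagonal
Lam = [[variances[j] if j == k else 0.0 for k in range(3)] for j in range(3)]
t = [0.3, -1.2, 0.7]

joint = char_fn_normal(t, mu, Lam)
product = 1.0 + 0.0j
for k in range(3):
    product *= char_fn_normal_1d(t[k], mu[k], variances[k])

print(abs(joint - product))  # zero up to rounding
```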

(33)

Kac’s theorem: Thm 8.1.3. in LN

Theorem

$\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$. The components $X_1, X_2, \ldots, X_n$ are independent if and only if

$$\phi_{\mathbf{X}}(\mathbf{s}) = E\left[e^{i\mathbf{s}^T\mathbf{X}}\right] = \prod_{i=1}^{n} \phi_{X_i}(s_i),$$

where $\phi_{X_i}(s_i)$ is the characteristic function for $X_i$.

(34)

Further properties of the multivariate normal

X ∈ N ( µ, Λ )

Every component $X_k$ is one-dimensional normal. To prove this we take

$$\mathbf{a} = (0, 0, \ldots, \underbrace{1}_{\text{position } k}, 0, \ldots, 0)^T$$

and the conclusion follows by Def. I.

$X_1 + X_2 + \cdots + X_n$ is one-dimensional normal. Note: The terms in the sum need not be independent.
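Both statements are instances of Def. I with a particular vector a, so the parameters of the resulting one-dimensional normal are aᵀµ and aᵀΛa. A small sketch, with hypothetical bivariate parameters (σ₁ = 2, σ₂ = 1, ρ = −0.5) and an illustrative function name:

```python
def linear_combination_params(a, mu, Lam):
    """Mean a^T mu and variance a^T Lam a of the scalar a^T X."""
    n = len(a)
    mean = sum(a[i] * mu[i] for i in range(n))
    var = sum(a[i] * a[j] * Lam[i][j] for i in range(n) for j in range(n))
    return mean, var

# Hypothetical parameters: the components are negatively correlated.
mu = [1.0, 3.0]
Lam = [[4.0, -1.0],
       [-1.0, 1.0]]

mean_k, var_k = linear_combination_params([0.0, 1.0], mu, Lam)  # X_2 alone
mean_s, var_s = linear_combination_params([1.0, 1.0], mu, Lam)  # X_1 + X_2
print(mean_k, var_k)
print(mean_s, var_s)
```

The variance of the sum is smaller than the sum of the variances here because the covariance term enters with a negative sign.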

(35)

Properties of multivariate normal

X ∈ N ( µ, Λ )

Every marginal distribution of $k$ variables ($1 \le k < n$) is normal. To prove this we consider any $k$ variables $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$, take $\mathbf{a}$ such that $a_j = 0$ for $j \ne i_1, \ldots, i_k$, and then apply Def. I.

(36)

Properties of multivariate normal

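Proposition

$\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$ and $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$. Then

$$\mathbf{Y} \in N\left(B\boldsymbol{\mu} + \mathbf{b}, B\Lambda B^T\right).$$

Pf:

$$\psi_{\mathbf{Y}}(\mathbf{s}) = E\left[e^{\mathbf{s}^T\mathbf{Y}}\right] = E\left[e^{\mathbf{s}^T(\mathbf{b} + B\mathbf{X})}\right] = e^{\mathbf{s}^T\mathbf{b}}\, E\left[e^{\mathbf{s}^T B\mathbf{X}}\right] = e^{\mathbf{s}^T\mathbf{b}}\, E\left[e^{(B^T\mathbf{s})^T\mathbf{X}}\right],$$

$$E\left[e^{(B^T\mathbf{s})^T\mathbf{X}}\right] = \psi_{\mathbf{X}}\left(B^T\mathbf{s}\right).$$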

(37)

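Properties of multivariate normal

Since $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$,

$$\psi_{\mathbf{X}}\left(B^T\mathbf{s}\right) = e^{(B^T\mathbf{s})^T\boldsymbol{\mu} + \frac{1}{2}(B^T\mathbf{s})^T\Lambda(B^T\mathbf{s})}.$$

Here

$$\left(B^T\mathbf{s}\right)^T\boldsymbol{\mu} = \mathbf{s}^T B\boldsymbol{\mu}, \qquad \left(B^T\mathbf{s}\right)^T\Lambda\left(B^T\mathbf{s}\right) = \mathbf{s}^T B\Lambda B^T\mathbf{s},$$

so that

$$e^{(B^T\mathbf{s})^T\boldsymbol{\mu} + \frac{1}{2}(B^T\mathbf{s})^T\Lambda(B^T\mathbf{s})} = e^{\mathbf{s}^T B\boldsymbol{\mu} + \frac{1}{2}\mathbf{s}^T B\Lambda B^T\mathbf{s}}.$$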

(38)

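Properties of multivariate normal

$$\psi_{\mathbf{X}}\left(B^T\mathbf{s}\right) = e^{\mathbf{s}^T B\boldsymbol{\mu} + \frac{1}{2}\mathbf{s}^T B\Lambda B^T\mathbf{s}}.$$

$$\psi_{\mathbf{Y}}(\mathbf{s}) = e^{\mathbf{s}^T\mathbf{b}}\,\psi_{\mathbf{X}}\left(B^T\mathbf{s}\right) = e^{\mathbf{s}^T\mathbf{b}}\, e^{\mathbf{s}^T B\boldsymbol{\mu} + \frac{1}{2}\mathbf{s}^T B\Lambda B^T\mathbf{s}},$$

$$\psi_{\mathbf{Y}}(\mathbf{s}) = e^{\mathbf{s}^T(\mathbf{b} + B\boldsymbol{\mu}) + \frac{1}{2}\mathbf{s}^T B\Lambda B^T\mathbf{s}},$$

which proves the claim as asserted.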

(39)

PART 3: Multivariate normal, Def. II: characteristic function, Def. III: density

(40)

Multivariate normal, Def. II: char. fnctn

Definition

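A random vector $\mathbf{X}$ with mean vector $\boldsymbol{\mu}$ and covariance matrix $\Lambda$ is $N(\boldsymbol{\mu}, \Lambda)$ if its characteristic function is

$$\varphi_{\mathbf{X}}(\mathbf{t}) = E\left[e^{i\mathbf{t}^T\mathbf{X}}\right] = e^{i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}}.$$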

(41)

Multivariate normal, Def. II implies Def. I

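We need to show that the one-dimensional random vector $Y = \mathbf{a}^T\mathbf{X}$ has a normal distribution.

$$\varphi_Y(t) = E\left[e^{itY}\right] = E\left[e^{it\sum_{i=1}^{n} a_i X_i}\right] = E\left[e^{it\mathbf{a}^T\mathbf{X}}\right] = \varphi_{\mathbf{X}}(t\mathbf{a}) = e^{it\mathbf{a}^T\boldsymbol{\mu} - \frac{1}{2}t^2\mathbf{a}^T\Lambda\mathbf{a}},$$

and this is the characteristic function of $N\left(\mathbf{a}^T\boldsymbol{\mu}, \mathbf{a}^T\Lambda\mathbf{a}\right)$.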

(42)

Multivariate normal, Def. III: joint PDF

Definition

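A random vector $\mathbf{X}$ with mean vector $\boldsymbol{\mu}$ and an invertible covariance matrix $\Lambda$ is $N(\boldsymbol{\mu}, \Lambda)$ if the density is

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Lambda)}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T\Lambda^{-1}(\mathbf{x} - \boldsymbol{\mu})}.$$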

(43)

Multivariate normal

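It can be checked by a computation (complete the square) that

$$e^{i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}} = \int_{\mathbb{R}^n} e^{i\mathbf{t}^T\mathbf{x}}\, \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Lambda)}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T\Lambda^{-1}(\mathbf{x} - \boldsymbol{\mu})}\, d\mathbf{x}.$$

Hence Def. III implies the property in Def. II. The three definitions are equivalent in the case where the inverse of the covariance matrix exists.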

(44)

PART 4: Bivariate normal with density

(45)

Multivariate Normal: the bivariate case

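As soon as $\rho^2 \ne 1$, the matrix

$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

is invertible, and the inverse is

$$\Lambda^{-1} = \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.$$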

(46)

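Multivariate Normal: the bivariate case

If $\rho^2 \ne 1$ and $\mathbf{X} = (X_1, X_2)^T$, then

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det\Lambda}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})^T\Lambda^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2}Q(x_1, x_2)}$$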

(47)

Multivariate Normal: the bivariate case

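where

$$Q(x_1, x_2) = \frac{1}{1 - \rho^2} \left[\left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2\right].$$

For this, invert the matrix $\Lambda$ and expand the quadratic form!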
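As a numerical cross-check of that expansion, the sketch below evaluates the density once through the matrix form with the closed-form inverse of Λ and once through Q. Parameter values, the evaluation point and the function names are assumptions made for the example.

```python
import math

def density_matrix_form(x, mu, s1, s2, rho):
    """Def. III with n = 2, using the closed-form inverse of Lambda."""
    det = s1 * s1 * s2 * s2 * (1.0 - rho * rho)
    inv = [[s2 * s2 / det, -rho * s1 * s2 / det],
           [-rho * s1 * s2 / det, s1 * s1 / det]]
    d = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def density_q_form(x, mu, s1, s2, rho):
    """The same density written with the expanded quadratic form Q."""
    u = (x[0] - mu[0]) / s1
    v = (x[1] - mu[1]) / s2
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

args = ([1.0, 1.2], [0.0, 1.0], 2.0, 0.5, 0.9)  # x, mu, sigma_1, sigma_2, rho
f_matrix = density_matrix_form(*args)
f_q = density_q_form(*args)
print(f_matrix, f_q)  # the two values should agree up to rounding
```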

(48)

ρ = 0

[Figure: plot of the bivariate normal density for this value of ρ; both variable axes run from −3 to 3 and the density axis from 0 to 0.35.]

(49)

ρ = 0.9

[Figure: plot of the bivariate normal density for this value of ρ; both variable axes run from −3 to 3 and the density axis from 0 to 0.35.]

(50)

ρ = − 0.9

[Figure: plot of the bivariate normal density for this value of ρ; both variable axes run from −3 to 3 and the density axis from 0 to 0.35.]

(51)

Conditional densities for the bivariate normal

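Complete the square of the exponent to write

$$f_{X,Y}(x, y) = f_X(x)\, f_{Y|X}(y),$$

where

$$f_X(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma_1^2}(x - \mu_1)^2}, \qquad f_{Y|X}(y) = \frac{1}{\tilde{\sigma}_2\sqrt{2\pi}}\, e^{-\frac{1}{2\tilde{\sigma}_2^2}(y - \tilde{\mu}_2(x))^2},$$

$$\tilde{\mu}_2(x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1), \qquad \tilde{\sigma}_2 = \sigma_2\sqrt{1 - \rho^2}.$$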

(52)

Bivariate normal properties

$E(X) = \mu_1$

Given $X = x$, $Y$ is Gaussian.

Conditional mean of $Y$ given $X = x$:

$$\tilde{\mu}_2(x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1) = E(Y \mid X = x)$$

Conditional variance of $Y$ given $X = x$:

$$\mathrm{Var}(Y \mid X = x) = \sigma_2^2\left(1 - \rho^2\right)$$

(53)

Bivariate normal properties

Check Section 3.7.3 and Exercise 3.8.4.6. From these it is seen that the conditional mean of $Y$ given $X$ in a bivariate normal distribution is also the best LINEAR predictor of $Y$ based on $X$, and the conditional variance is the variance of the estimation error.
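For concreteness, the conditional mean µ₂ + ρ(σ₂/σ₁)(x − µ₁) and the conditional variance σ₂²(1 − ρ²) can be evaluated for one set of parameters. The numbers and the function name below are hypothetical.

```python
def conditional_params(x, mu1, mu2, s1, s2, rho):
    """Mean and variance of Y given X = x for a bivariate normal (X, Y)."""
    cond_mean = mu2 + rho * (s2 / s1) * (x - mu1)
    cond_var = s2 * s2 * (1.0 - rho * rho)
    return cond_mean, cond_var

# Hypothetical parameters chosen for illustration.
m, v = conditional_params(x=3.0, mu1=1.0, mu2=2.0, s1=2.0, s2=1.0, rho=0.6)
print(m, v)
```

The slope ρσ₂/σ₁ equals Cov(X, Y)/Var(X), and the conditional variance does not depend on the observed value x.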

(54)

Marginal PDFs

(55)

Proof of conditional pdf

Consider

$$\frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{\sigma_1\sqrt{2\pi}}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2}Q(x, y) + \frac{1}{2\sigma_1^2}(x - \mu_1)^2}$$

(56)

Proof of conditional pdf

$$-\frac{1}{2}Q(x, y) + \frac{1}{2\sigma_1^2}(x - \mu_1)^2 = -\frac{1}{2}H(x, y),$$

(57)

Proof of conditional pdfs

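$$H(x, y) = \frac{1}{1 - \rho^2} \left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right] - \left(\frac{x - \mu_1}{\sigma_1}\right)^2$$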

(58)

Proof of conditional pdf

$$H(x, y) = \frac{\rho^2}{1 - \rho^2} \cdot \frac{(x - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2(1 - \rho^2)} + \frac{(y - \mu_2)^2}{\sigma_2^2(1 - \rho^2)}$$

(59)

Proof of conditional pdf

$$H(x, y) = \frac{\left(y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right)^2}{\sigma_2^2(1 - \rho^2)}$$

(60)

Conditional pdf

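$$\frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{1}{\sqrt{1 - \rho^2}\,\sigma_2\sqrt{2\pi}}\, \exp\left(-\frac{1}{2} \cdot \frac{\left(y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right)^2}{\sigma_2^2(1 - \rho^2)}\right)$$

This establishes the bivariate normal properties claimed above.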
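The resulting factorization of the joint density into the marginal of X times the conditional density of Y can be spot-checked at a single point. The parameter values, the point (x, y) and the function names in this sketch are illustrative assumptions.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def joint_pdf(x, y, mu1, mu2, s1, s2, rho):
    """Bivariate normal density written with the quadratic form Q."""
    u = (x - mu1) / s1
    v = (y - mu2) / s2
    q = (u * u - 2.0 * rho * u * v + v * v) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm

def conditional_pdf(y, x, mu1, mu2, s1, s2, rho):
    """Density of Y given X = x: normal with the conditional mean and sd."""
    cond_mean = mu2 + rho * (s2 / s1) * (x - mu1)
    cond_sd = s2 * math.sqrt(1.0 - rho * rho)
    return normal_pdf(y, cond_mean, cond_sd)

params = dict(mu1=1.0, mu2=2.0, s1=2.0, s2=1.0, rho=0.6)
x, y = 0.3, 2.5
lhs = joint_pdf(x, y, **params)
rhs = normal_pdf(x, params["mu1"], params["s1"]) * conditional_pdf(y, x, **params)
print(lhs, rhs)  # the two values should agree up to rounding
```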

(61)

Bivariate normal properties : ρ

Proposition

$(X, Y)$ bivariate normal $\Rightarrow \rho = \rho_{X,Y}$

Proof:

$$E[(X - \mu_1)(Y - \mu_2)] = E\left(E\left[(X - \mu_1)(Y - \mu_2) \mid X\right]\right) = E\left((X - \mu_1)\, E\left[(Y - \mu_2) \mid X\right]\right)$$

(62)

Bivariate normal properties : ρ

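$$= E\left((X - \mu_1)\, E\left[(Y - \mu_2) \mid X\right]\right) = E\left((X - \mu_1)\left[E(Y \mid X) - \mu_2\right]\right)$$

$$= E\left((X - \mu_1)\left[\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(X - \mu_1) - \mu_2\right]\right) = \rho\frac{\sigma_2}{\sigma_1}\, E\left((X - \mu_1)(X - \mu_1)\right)$$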

(63)

Bivariate normal properties : ρ

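$$= \rho\frac{\sigma_2}{\sigma_1}\, E\left((X - \mu_1)(X - \mu_1)\right) = \rho\frac{\sigma_2}{\sigma_1}\, E\left[(X - \mu_1)^2\right] = \rho\frac{\sigma_2}{\sigma_1}\,\sigma_1^2 = \rho\sigma_2\sigma_1$$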

(64)

Bivariate normal properties : ρ

In other words we have checked that

$$\rho = \frac{E[(X - \mu_1)(Y - \mu_2)]}{\sigma_2\sigma_1}.$$

$\rho = 0 \Leftrightarrow$ bivariate normal $X$, $Y$ are independent.

(65)

PART 5: Generating a multivariate normal variable

(66)

Standard Normal Vector: definition

Z ∈ N ( 0, I ) is a standard normal vector.

I is the n × n identity matrix.

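$$f_{\mathbf{Z}}(\mathbf{z}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(I)}}\, e^{-\frac{1}{2}(\mathbf{z} - \mathbf{0})^T I^{-1}(\mathbf{z} - \mathbf{0})} = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\mathbf{z}^T\mathbf{z}}$$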

(67)

Distribution of X = AZ + b

If $\mathbf{X} = A\mathbf{Z} + \mathbf{b}$, where $\mathbf{Z}$ is standard Gaussian, then $\mathbf{X} \in N\left(\mathbf{b}, AA^T\right)$ (follows by a rule in the preceding).

(68)

Multivariate Normal: the bivariate case

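If

$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

then $\Lambda = AA^T$, where

$$A = \begin{pmatrix} \sigma_1 & 0 \\ \rho\sigma_2 & \sigma_2\sqrt{1 - \rho^2} \end{pmatrix}.$$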
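Combined with the rule X = AZ + b, this factor gives a simple recipe for simulating a bivariate normal vector from two independent standard normals. The sketch below is one possible implementation using only the standard library; the seed, sample size, parameter values and function names are arbitrary choices for the example.

```python
import math
import random

def cholesky_bivariate(s1, s2, rho):
    """The lower-triangular A with A A^T = Lambda."""
    return [[s1, 0.0],
            [rho * s2, s2 * math.sqrt(1.0 - rho * rho)]]

def sample_bivariate_normal(mu, s1, s2, rho, rng):
    """One draw of X = A Z + mu with Z a standard normal vector."""
    A = cholesky_bivariate(s1, s2, rho)
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return [A[i][0] * z[0] + A[i][1] * z[1] + mu[i] for i in range(2)]

rng = random.Random(2014)  # fixed seed so the run is reproducible
s1, s2, rho = 2.0, 0.5, 0.8
draws = [sample_bivariate_normal([1.0, -1.0], s1, s2, rho, rng)
         for _ in range(20000)]

n = len(draws)
m1 = sum(d[0] for d in draws) / n
m2 = sum(d[1] for d in draws) / n
v1 = sum((d[0] - m1) ** 2 for d in draws) / n
v2 = sum((d[1] - m2) ** 2 for d in draws) / n
c12 = sum((d[0] - m1) * (d[1] - m2) for d in draws) / n
sample_rho = c12 / math.sqrt(v1 * v2)
print(m1, m2, sample_rho)  # should be near 1, -1 and 0.8
```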

(69)

Standard Normal Vector

$\mathbf{X} \in N(\boldsymbol{\mu}_{\mathbf{X}}, \Lambda)$, and $A$ is such that $\Lambda = AA^T$. (An invertible matrix $A$ with this property always exists if $\Lambda$ is positive definite; we need the symmetry of $\Lambda$, too.) Then

$$\mathbf{Z} = A^{-1}(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})$$

is a standard Gaussian vector.

Proof: We give the first idea of this proof, a rule of transformation.

(70)

Rule of transformation

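If $\mathbf{X}$ has density $f_{\mathbf{X}}(\mathbf{x})$, $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$, and $A$ is invertible, then

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{|\det A|}\, f_{\mathbf{X}}\left(A^{-1}(\mathbf{y} - \mathbf{b})\right).$$

Note that if $\Lambda = AA^T$, then

$$\det\Lambda = \det A \cdot \det A^T = \det A \cdot \det A = (\det A)^2,$$

so that $|\det A| = \sqrt{\det\Lambda}$.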
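One way the remaining steps might go, assuming the density of Def. III for X and applying the rule of transformation with the matrix A⁻¹ and the shift −A⁻¹µ_X. This is a sketch and not necessarily the argument given in the lecture notes:

```latex
\begin{aligned}
f_{\mathbf{Z}}(\mathbf{z})
&= \frac{1}{|\det A^{-1}|}\, f_{\mathbf{X}}\left(A\mathbf{z} + \boldsymbol{\mu}_{\mathbf{X}}\right)
 = |\det A| \cdot \frac{1}{(2\pi)^{n/2}\sqrt{\det\Lambda}}\,
   e^{-\frac{1}{2}(A\mathbf{z})^T \Lambda^{-1} (A\mathbf{z})} \\
&= \frac{1}{(2\pi)^{n/2}}\,
   e^{-\frac{1}{2}\mathbf{z}^T A^T (AA^T)^{-1} A\,\mathbf{z}}
 = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\mathbf{z}^T\mathbf{z}}
\end{aligned}
```

The second line uses |det A| = √(det Λ) and Aᵀ(AAᵀ)⁻¹A = Aᵀ(Aᵀ)⁻¹A⁻¹A = I, so the result is the density of a standard normal vector.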

(71)

Johann Carl Friedrich Gauss (30 April 1777 – 23 February 1855)

(72)

Diagonalizable Matrices

An $n \times n$ matrix $A$ is orthogonally diagonalizable if there is an orthogonal matrix $P$ (i.e., $P^T P = PP^T = I$) such that

$$P^T A P = \Lambda,$$

where $\Lambda$ is a diagonal matrix.

(73)

Diagonalizable Matrices

Theorem

If A is an n × n matrix, then the following are equivalent:

(i) A is orthogonally diagonalizable.

(ii) A has an orthonormal set of eigenvectors.

(iii) A is symmetric.

Since covariance matrices are symmetric, we have by the theorem above

that all covariance matrices are orthogonally diagonalizable.

(74)

Diagonalizable Matrices

Theorem

If A is a symmetric matrix, then

(i) Eigenvalues of A are all real numbers.

(ii) Eigenvectors from different eigenspaces are orthogonal.

That is, all eigenvalues of a covariance matrix are real.

(75)

Diagonalizable Matrices

Hence we have for any covariance matrix the spectral decomposition

$$C = \sum_{i=1}^{n} \lambda_i \mathbf{e}_i \mathbf{e}_i^T, \qquad (1)$$

where $C\mathbf{e}_i = \lambda_i \mathbf{e}_i$. Since $C$ is nonnegative definite, and its eigenvectors are orthonormal,

$$0 \le \mathbf{e}_i^T C \mathbf{e}_i = \lambda_i \mathbf{e}_i^T \mathbf{e}_i = \lambda_i,$$

and thus the eigenvalues of a covariance matrix are nonnegative.

(76)

Diagonalizable Matrices

Let now $P$ be an orthogonal matrix such that

$$P^T C_{\mathbf{X}} P = \Lambda,$$

and $\mathbf{X} \in N(\mathbf{0}, C_{\mathbf{X}})$, i.e., $C_{\mathbf{X}}$ is a covariance matrix and $\Lambda$ is diagonal (with the eigenvalues of $C_{\mathbf{X}}$ on the main diagonal). Then if $\mathbf{Y} = P^T\mathbf{X}$, we have that

$$\mathbf{Y} \in N(\mathbf{0}, \Lambda).$$

In other words, $\mathbf{Y}$ is a Gaussian vector and has independent components. This method of producing independent Gaussians has several important applications. One of these is principal component analysis.
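For a 2 × 2 covariance matrix the orthogonal matrix P can be written down as a plane rotation. The sketch below picks the rotation angle so that the off-diagonal entry of PᵀC_X P vanishes; the matrix values and the function names are assumptions for the example, and a general n × n case would need a numerical eigenvalue routine instead.

```python
import math

def diagonalize_symmetric_2x2(C):
    """Rotation P with P^T C P diagonal, for a symmetric 2x2 matrix C."""
    theta = 0.5 * math.atan2(2.0 * C[0][1], C[0][0] - C[1][1])
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]

def matmul2(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(A):
    return [[A[0][0], A[1][0]],
            [A[0][1], A[1][1]]]

# Hypothetical covariance matrix: sigma_1 = 2, sigma_2 = 0.5, rho = 0.8.
C_X = [[4.0, 0.8],
       [0.8, 0.25]]

P = diagonalize_symmetric_2x2(C_X)
Lam = matmul2(matmul2(transpose2(P), C_X), P)
print(Lam)  # diagonal up to rounding; the diagonal holds the eigenvalues
```

The diagonal entries of `Lam` are the variances of the independent components of Y = PᵀX, and their sum equals the trace of `C_X`.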
