SF2940: Probability theory Lecture 8: Multivariate Normal Distribution

(1)

Timo Koski

24.09.2014

(2)

Learning outcomes

Random vectors, mean vector, covariance matrix, rules of transformation

Multivariate normal R.V., moment generating functions, characteristic function, rules of transformation

Density of a multivariate normal RV

Joint PDF of bivariate normal RVs

Conditional distributions in a multivariate normal distribution

(3)

PART 1: Mean vector, Covariance matrix, MGF, Characteristic function

(4)

Vector Notation: Random Vector

A random vector $\mathbf{X}$ is a column vector

$$\mathbf{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = (X_1, X_2, \ldots, X_n)^T.$$

Each $X_i$ is a random variable.

(5)

Sample Value Random Vector

A column vector

$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = (x_1, x_2, \ldots, x_n)^T.$$

We can think of $x_i$ as an outcome of $X_i$.

(6)

Joint CDF, Joint PDF

The joint CDF (= cumulative distribution function) of a continuous random vector $\mathbf{X}$ is

$$F_{\mathbf{X}}(\mathbf{x}) = F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = P(\mathbf{X} \le \mathbf{x}) = P(X_1 \le x_1, \ldots, X_n \le x_n).$$

Joint probability density function (PDF):

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1,\ldots,X_n}(x_1,\ldots,x_n).$$

(7)

Mean Vector

$$\boldsymbol{\mu}_{\mathbf{X}} = E[\mathbf{X}] = \begin{pmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{pmatrix},$$

a column vector of the means (= expectations) of the components of $\mathbf{X}$.

(8)

Matrix, Scalar Product

If $\mathbf{X}^T$ is the transposed column vector (= a row vector), then $\mathbf{X}\mathbf{X}^T$ is an $n \times n$ matrix, and

$$\mathbf{X}^T\mathbf{X} = \sum_{i=1}^{n} X_i^2$$

is a scalar product, a real-valued R.V.

(9)

Covariance Matrix of A Random Vector

Covariance matrix

$$C_{\mathbf{X}} := E\left[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})^T\right],$$

where the element $(i,j)$,

$$C_{\mathbf{X}}(i,j) = E\left[(X_i - \mu_i)(X_j - \mu_j)\right],$$

is the covariance of $X_i$ and $X_j$.
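The definitions of the mean vector and the covariance matrix can be tried out on a small example. The sketch below assumes a hypothetical discrete random vector with three outcomes; the outcomes, their probabilities and the helper names `mean_vector` and `covariance_matrix` are illustrative choices, not part of the lecture.

```python
# Hypothetical discrete random vector X = (X_1, X_2)^T: each outcome below
# occurs with the probability at the same position in `probs`.
outcomes = [(0.0, 0.0), (2.0, 1.0), (4.0, 5.0)]
probs = [0.25, 0.5, 0.25]


def mean_vector(outcomes, probs):
    """Return the mean vector E[X], computed component by component."""
    n = len(outcomes[0])
    return [sum(p * x[i] for x, p in zip(outcomes, probs)) for i in range(n)]


def covariance_matrix(outcomes, probs):
    """Return C_X with entries C_X(i, j) = E[(X_i - mu_i)(X_j - mu_j)]."""
    mu = mean_vector(outcomes, probs)
    n = len(mu)
    return [
        [
            sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in zip(outcomes, probs))
            for j in range(n)
        ]
        for i in range(n)
    ]


mu = mean_vector(outcomes, probs)
C = covariance_matrix(outcomes, probs)
print("mean vector:", mu)
print("covariance matrix:", C)
```

The diagonal entries of `C` are the variances of the two components, and the off-diagonal entries coincide.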

(10)

A Quadratic Form

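$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j C_{\mathbf{X}}(i,j).$$

We see that

$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j E\left[(X_i - \mu_i)(X_j - \mu_j)\right] = E\left[\sum_{i=1}^{n}\sum_{j=1}^{n} x_i x_j (X_i - \mu_i)(X_j - \mu_j)\right]. \qquad (\ast)$$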

(11)

Properties of a Covariance Matrix

A covariance matrix is nonnegative definite, i.e., for all $\mathbf{x}$ we have

$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} \ge 0.$$

Hence

$$\det C_{\mathbf{X}} \ge 0.$$

The covariance matrix is symmetric:

$$C_{\mathbf{X}} = C_{\mathbf{X}}^T.$$

(12)

Properties of a Covariance Matrix

The covariance matrix is symmetric, $C_{\mathbf{X}} = C_{\mathbf{X}}^T$, since

$$C_{\mathbf{X}}(i,j) = E\left[(X_i - \mu_i)(X_j - \mu_j)\right] = E\left[(X_j - \mu_j)(X_i - \mu_i)\right] = C_{\mathbf{X}}(j,i).$$

(13)

Properties of a Covariance Matrix

A covariance matrix is positive definite, i.e., $\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} > 0$ for all $\mathbf{x} \neq \mathbf{0}$, iff

$$\det C_{\mathbf{X}} > 0$$

(i.e., $C_{\mathbf{X}}$ is invertible).
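The definiteness properties can be checked numerically on a toy case. The following sketch assumes a hypothetical three-point distribution for a two-dimensional random vector and compares the quadratic form $\mathbf{x}^T C_{\mathbf{X}} \mathbf{x}$ with the variance of the projection $\mathbf{x}^T\mathbf{X}$; the data, the test vectors and the helper names are illustrative assumptions.

```python
# Hypothetical discrete random vector X = (X_1, X_2)^T with three outcomes.
outcomes = [(0.0, 0.0), (2.0, 1.0), (4.0, 5.0)]
probs = [0.25, 0.5, 0.25]


def expect(f):
    """Expectation E[f(X)] under the discrete distribution above."""
    return sum(p * f(x) for x, p in zip(outcomes, probs))


mu = [expect(lambda x, i=i: x[i]) for i in range(2)]
C = [
    [expect(lambda x, i=i, j=j: (x[i] - mu[i]) * (x[j] - mu[j])) for j in range(2)]
    for i in range(2)
]


def quadratic_form(v):
    """v^T C_X v written out as the double sum over i and j."""
    return sum(v[i] * v[j] * C[i][j] for i in range(2) for j in range(2))


def variance_of_projection(v):
    """E[(sum_i v_i (X_i - mu_i))^2], i.e. the variance of v^T X."""
    return expect(lambda x: sum(v[i] * (x[i] - mu[i]) for i in range(2)) ** 2)


# Arbitrary test vectors; the two columns printed for each should agree.
test_vectors = [(1.0, 0.0), (1.0, -1.0), (-3.0, 2.0)]
for v in test_vectors:
    print(v, quadratic_form(v), variance_of_projection(v))

det_C = C[0][0] * C[1][1] - C[0][1] * C[1][0]
print("det C_X =", det_C)
```

Since the quadratic form equals a variance, it cannot be negative; in this example the determinant is strictly positive, so this particular $C_{\mathbf{X}}$ is invertible.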

(14)

Properties of a Covariance Matrix

Proposition

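$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} \ge 0.$$

Pf: By $(\ast)$ above,

$$\mathbf{x}^T C_{\mathbf{X}} \mathbf{x} = \mathbf{x}^T E\left[(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})^T\right] \mathbf{x} = E\left[\mathbf{x}^T (\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})^T \mathbf{x}\right] = E\left[\mathbf{x}^T \mathbf{w}\,\mathbf{w}^T \mathbf{x}\right],$$

where we have set $\mathbf{w} = \mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}}$. Then by linear algebra $\mathbf{x}^T\mathbf{w} = \mathbf{w}^T\mathbf{x} = \sum_{i=1}^{n} w_i x_i$. Hence

$$E\left[\mathbf{x}^T \mathbf{w}\mathbf{w}^T \mathbf{x}\right] = E\left[\left(\sum_{i=1}^{n} w_i x_i\right)^2\right] \ge 0.$$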

(15)

Properties of a Covariance Matrix

In terms of the entries $c_{i,j}$ of a covariance matrix $C = (c_{i,j})_{i=1,j=1}^{n,n}$ there are the following necessary properties.

1. $c_{i,j} = c_{j,i}$ (symmetry).

2. $c_{i,i} = \operatorname{Var}(X_i) = \sigma_i^2 \ge 0$ (the elements on the main diagonal are the variances, and thus all elements on the main diagonal are nonnegative).

3. $c_{i,j}^2 \le c_{i,i} \cdot c_{j,j}$ (Cauchy-Schwarz inequality).

(16)

Coefficient of Correlation

The coefficient of correlation $\rho$ of $X$ and $Y$ is defined as

$$\rho := \rho_{X,Y} := \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X) \cdot \operatorname{Var}(Y)}},$$

where $\operatorname{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$. This is normalized:

$$-1 \le \rho_{X,Y} \le 1.$$

For random variables $X$ and $Y$, $\operatorname{Cov}(X,Y) = \rho_{X,Y} = 0$ does not always mean that $X$, $Y$ are independent.
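A standard counterexample makes the last remark concrete. The sketch below assumes $X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$ (an example chosen here, not taken from the lecture) and uses exact rational arithmetic to show that the covariance vanishes although $Y$ is a function of $X$.

```python
from fractions import Fraction

# Assumed example: X is uniform on {-1, 0, 1} and Y = X^2.
xs = [-1, 0, 1]
p = Fraction(1, 3)
pairs = [(x, x * x) for x in xs]

EX = sum(p * x for x, _ in pairs)
EY = sum(p * y for _, y in pairs)
cov = sum(p * (x - EX) * (y - EY) for x, y in pairs)

# Independence would require P(X = 0, Y = 0) = P(X = 0) * P(Y = 0).
p_joint = sum(p for x, y in pairs if x == 0 and y == 0)
p_x0 = sum(p for x, _ in pairs if x == 0)
p_y0 = sum(p for _, y in pairs if y == 0)

print("Cov(X, Y) =", cov)
print("P(X=0, Y=0) =", p_joint, " P(X=0) P(Y=0) =", p_x0 * p_y0)
```

The covariance is exactly zero, while the joint probability $1/3$ differs from the product $1/9$, so $X$ and $Y$ are uncorrelated but not independent.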

(17)

Special case: Covariance Matrix of A Bivariate Vector

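$\mathbf{X} = (X_1, X_2)^T$.

$$C_{\mathbf{X}} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

where $\rho$ is the coefficient of correlation of $X_1$ and $X_2$, and $\sigma_1^2 = \operatorname{Var}(X_1)$, $\sigma_2^2 = \operatorname{Var}(X_2)$. $C_{\mathbf{X}}$ is invertible iff $\rho^2 \neq 1$; for the proof we note that

$$\det C_{\mathbf{X}} = \sigma_1^2 \sigma_2^2 \left(1 - \rho^2\right).$$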

(18)

Special case: Covariance Matrix of A Bivariate Vector

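$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$

If $\rho^2 \neq 1$, the inverse exists and

$$\Lambda^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.$$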
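The closed-form inverse can be verified by multiplying it with the covariance matrix. The sketch below assumes $\sigma_1, \sigma_2 > 0$ and arbitrary illustrative parameter values; the function names are hypothetical.

```python
def bivariate_cov(sigma1, sigma2, rho):
    """Covariance matrix of a bivariate vector with the given parameters."""
    s12 = rho * sigma1 * sigma2
    return [[sigma1 ** 2, s12], [s12, sigma2 ** 2]]


def bivariate_cov_inverse(sigma1, sigma2, rho):
    """Closed-form inverse; only defined when rho^2 != 1."""
    if rho ** 2 == 1:
        raise ValueError("Lambda is singular when rho^2 = 1")
    factor = 1.0 / (sigma1 ** 2 * sigma2 ** 2 * (1.0 - rho ** 2))
    s12 = rho * sigma1 * sigma2
    return [
        [factor * sigma2 ** 2, -factor * s12],
        [-factor * s12, factor * sigma1 ** 2],
    ]


def matmul(P, Q):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]


# Arbitrary illustrative parameters.
Lam = bivariate_cov(2.0, 0.5, 0.9)
Lam_inv = bivariate_cov_inverse(2.0, 0.5, 0.9)
product = matmul(Lam, Lam_inv)
print(product)  # should be the 2 x 2 identity up to rounding
```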

(19)

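Y = BX + b

Proposition

$\mathbf{X}$ is a random vector with mean vector $\boldsymbol{\mu}_{\mathbf{X}}$ and covariance matrix $C_{\mathbf{X}}$. $B$ is an $m \times n$ matrix. If $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$, then

$$E\mathbf{Y} = B\boldsymbol{\mu}_{\mathbf{X}} + \mathbf{b}, \qquad C_{\mathbf{Y}} = B C_{\mathbf{X}} B^T.$$

Pf: For simplicity of writing, take $\mathbf{b} = \boldsymbol{\mu}_{\mathbf{X}} = \mathbf{0}$. Then

$$C_{\mathbf{Y}} = E\left[\mathbf{Y}\mathbf{Y}^T\right] = E\left[B\mathbf{X}(B\mathbf{X})^T\right] = E\left[B\mathbf{X}\mathbf{X}^T B^T\right] = B\,E\left[\mathbf{X}\mathbf{X}^T\right] B^T = B C_{\mathbf{X}} B^T.$$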
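The transformation rule can be confirmed on a small example without setting $\mathbf{b}$ or the mean to zero. The sketch below assumes a hypothetical three-point distribution for $\mathbf{X}$ and an arbitrary $3 \times 2$ matrix `B`; it computes the mean and covariance of $\mathbf{Y}$ directly and compares them with $B\boldsymbol{\mu}_{\mathbf{X}} + \mathbf{b}$ and $B C_{\mathbf{X}} B^T$.

```python
# Hypothetical discrete distribution for X and an arbitrary affine map.
outcomes = [(0.0, 0.0), (2.0, 1.0), (4.0, 5.0)]
probs = [0.25, 0.5, 0.25]
B = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # m x n with m = 3, n = 2
b = [1.0, 0.0, -2.0]


def mean_and_cov(points, probs):
    """Mean vector and covariance matrix of a discrete random vector."""
    dim = len(points[0])
    mu = [sum(p * x[i] for x, p in zip(points, probs)) for i in range(dim)]
    cov = [
        [
            sum(p * (x[i] - mu[i]) * (x[j] - mu[j]) for x, p in zip(points, probs))
            for j in range(dim)
        ]
        for i in range(dim)
    ]
    return mu, cov


def affine(x):
    """Apply y = B x + b to a single outcome x."""
    return tuple(
        sum(B[r][k] * x[k] for k in range(len(x))) + b[r] for r in range(len(B))
    )


mu_X, C_X = mean_and_cov(outcomes, probs)
# Direct computation from the transformed outcomes.
mu_Y, C_Y = mean_and_cov([affine(x) for x in outcomes], probs)

# The same quantities via the proposition.
m, n = len(B), len(B[0])
B_mu_plus_b = [sum(B[r][k] * mu_X[k] for k in range(n)) + b[r] for r in range(m)]
B_C_Bt = [
    [
        sum(B[r][k] * C_X[k][l] * B[q][l] for k in range(n) for l in range(n))
        for q in range(m)
    ]
    for r in range(m)
]

print(mu_Y, B_mu_plus_b)
print(C_Y)
print(B_C_Bt)
```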

(20)

Moment Generating and Characteristic Functions

Definition

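The moment generating function of $\mathbf{X}$ is defined as

$$\psi_{\mathbf{X}}(\mathbf{t}) \overset{\text{def}}{=} E\left[e^{\mathbf{t}^T\mathbf{X}}\right] = E\left[e^{t_1X_1 + t_2X_2 + \cdots + t_nX_n}\right].$$

Definition

The characteristic function of $\mathbf{X}$ is defined as

$$\varphi_{\mathbf{X}}(\mathbf{t}) \overset{\text{def}}{=} E\left[e^{i\mathbf{t}^T\mathbf{X}}\right] = E\left[e^{i(t_1X_1 + t_2X_2 + \cdots + t_nX_n)}\right].$$

Special case: take $t_2 = t_3 = \ldots = t_n = 0$ and leave $t_1$ free; then

$$\varphi_{\mathbf{X}}(\mathbf{t}) = \varphi_{X_1}(t_1).$$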

(21)

PART 2: Def I of a multivariate normal distribution

We first recall some of the properties of the univariate normal distribution.

(22)

Normal (Gaussian) One-dimensional RVs

$X$ is a normal random variable if

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2},$$

where $\mu$ is real and $\sigma > 0$.

Notation: $X \in N(\mu, \sigma^2)$

Properties: $E(X) = \mu$, $\operatorname{Var}(X) = \sigma^2$

(23)

Normal (Gaussian) One-dimensional RVs

[Figure: two plots of the density $f_X(x)$ against $x$: (a) µ = 2, σ = 1/2; (b) µ = 2, σ = 2.]

(24)

Linear Transformation

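$X \in N(\mu_X, \sigma_X^2) \Rightarrow Y = aX + b$ is $N(a\mu_X + b, a^2\sigma_X^2)$. Thus

$$Z = \frac{X - \mu_X}{\sigma_X} \in N(0, 1)$$

and

$$P(X \le x) = P\left(\frac{X - \mu_X}{\sigma_X} \le \frac{x - \mu_X}{\sigma_X}\right),$$

or

$$F_X(x) = P\left(Z \le \frac{x - \mu_X}{\sigma_X}\right) = \Phi\left(\frac{x - \mu_X}{\sigma_X}\right).$$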
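The standardization step translates directly into code. The sketch below assumes the usual identity $\Phi(z) = \tfrac12\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, which is not stated in the lecture, and uses hypothetical function names; the parameter values match panel (a) of the figure above.

```python
import math


def std_normal_cdf(z):
    """Phi(z), written via the error function (an identity assumed here)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """F_X(x) for X in N(mu, sigma^2): standardize, then apply Phi."""
    return std_normal_cdf((x - mu) / sigma)


# mu = 2, sigma = 1/2.
print(normal_cdf(2.0, 2.0, 0.5))  # at the mean the CDF equals 0.5
print(normal_cdf(3.0, 2.0, 0.5))  # two standard deviations above the mean
```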

(25)

Normal (Gaussian) One-dimensional RVs

If $X \in N(\mu, \sigma^2)$, then the moment generating function is

$$\psi_X(t) = E\left[e^{tX}\right] = e^{t\mu + \frac{1}{2}t^2\sigma^2},$$

and the characteristic function is

$$\varphi_X(t) = E\left[e^{itX}\right] = e^{it\mu - \frac{1}{2}t^2\sigma^2},$$

as found in previous lectures.

(26)

Multivariate Normal Def. I

Definition

An $n \times 1$ random vector $\mathbf{X}$ has a normal distribution iff for every $n \times 1$ vector $\mathbf{a}$ the one-dimensional random variable $\mathbf{a}^T\mathbf{X}$ has a normal distribution.

We write $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$, where $\boldsymbol{\mu}$ is the mean vector and $\Lambda$ is the covariance matrix.

(27)

Consequences of Def. I (1)

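An $n \times 1$ vector $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$ iff the one-dimensional random variable $\mathbf{a}^T\mathbf{X}$ has a normal distribution for every $n$-vector $\mathbf{a}$.

Now we know that (take $B = \mathbf{a}^T$ in the preceding)

$$E\left[\mathbf{a}^T\mathbf{X}\right] = \mathbf{a}^T\boldsymbol{\mu}, \qquad \operatorname{Var}\left[\mathbf{a}^T\mathbf{X}\right] = \mathbf{a}^T\Lambda\mathbf{a}.$$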

(28)

Consequences of Def. I (2)

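Hence, if $Y = \mathbf{a}^T\mathbf{X}$, then $Y \in N\left(\mathbf{a}^T\boldsymbol{\mu}, \mathbf{a}^T\Lambda\mathbf{a}\right)$ and the moment generating function of $Y$ is

$$\psi_Y(t) = E\left[e^{tY}\right] = e^{t\mathbf{a}^T\boldsymbol{\mu} + \frac{1}{2}t^2\mathbf{a}^T\Lambda\mathbf{a}}.$$

Therefore

$$\psi_{\mathbf{X}}(\mathbf{a}) = E\left[e^{\mathbf{a}^T\mathbf{X}}\right] = \psi_Y(1) = e^{\mathbf{a}^T\boldsymbol{\mu} + \frac{1}{2}\mathbf{a}^T\Lambda\mathbf{a}}.$$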

(29)

Consequences of Def. I (3)

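Hence we have shown that if $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$, then

$$\psi_{\mathbf{X}}(\mathbf{t}) = E\left[e^{\mathbf{t}^T\mathbf{X}}\right] = e^{\mathbf{t}^T\boldsymbol{\mu} + \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}}$$

is the moment generating function of $\mathbf{X}$.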

(30)

Consequences of Def. I (4)

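In the same way we can find that

$$\varphi_{\mathbf{X}}(\mathbf{t}) = E\left[e^{i\mathbf{t}^T\mathbf{X}}\right] = e^{i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}}$$

is the characteristic function of $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$.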

(31)

Consequences of Def. I (5)

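Let $\Lambda$ be a diagonal covariance matrix with the $\lambda_i^2$ on the main diagonal, i.e.,

$$\Lambda = \begin{pmatrix} \lambda_1^2 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2^2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3^2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_n^2 \end{pmatrix}.$$

Proposition

If $\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$, then $X_1, X_2, \ldots, X_n$ are independent normal variables.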

(32)

Consequences of Def. I (6)

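Pf: Since $\Lambda$ is diagonal, the quadratic form becomes a single sum of squares:

$$\varphi_{\mathbf{X}}(\mathbf{t}) = e^{i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}} = e^{i\sum_{i=1}^{n}\mu_i t_i - \frac{1}{2}\sum_{i=1}^{n}\lambda_i^2 t_i^2} = e^{i\mu_1 t_1 - \frac{1}{2}\lambda_1^2 t_1^2}\, e^{i\mu_2 t_2 - \frac{1}{2}\lambda_2^2 t_2^2} \cdots e^{i\mu_n t_n - \frac{1}{2}\lambda_n^2 t_n^2}$$

is the product of the characteristic functions of $X_i \in N\left(\mu_i, \lambda_i^2\right)$, which are thus seen to be independent $N\left(\mu_i, \lambda_i^2\right)$.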
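The factorization for a diagonal covariance matrix is easy to check numerically. The sketch below evaluates the multivariate characteristic function and the product of the univariate ones at one arbitrary point $\mathbf{t}$; the parameter values and function names are illustrative assumptions.

```python
import cmath


def cf_multivariate(t, mu, Lam):
    """exp(i t^T mu - (1/2) t^T Lam t), the characteristic function of N(mu, Lam)."""
    n = len(t)
    linear = sum(t[i] * mu[i] for i in range(n))
    quadratic = sum(t[i] * Lam[i][j] * t[j] for i in range(n) for j in range(n))
    return cmath.exp(1j * linear - 0.5 * quadratic)


def cf_univariate(t, mu, var):
    """exp(i t mu - (1/2) t^2 var), the characteristic function of N(mu, var)."""
    return cmath.exp(1j * t * mu - 0.5 * var * t * t)


# Arbitrary illustrative parameters; `variances` plays the role of the lambda_i^2.
mu = [1.0, -2.0, 0.5]
variances = [0.5, 2.0, 1.5]
Lam = [[variances[i] if i == j else 0.0 for j in range(3)] for i in range(3)]
t = [0.3, -0.7, 1.1]

joint = cf_multivariate(t, mu, Lam)
product = 1.0 + 0.0j
for t_i, mu_i, var_i in zip(t, mu, variances):
    product *= cf_univariate(t_i, mu_i, var_i)

print(joint)
print(product)
```

Agreement at a single point is only a sanity check of the algebra, not a proof of independence; the proof is the identity for all $\mathbf{t}$.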

(33)

Kac’s theorem: Thm 8.1.3. in LN

Theorem

$\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$. The components $X_1, X_2, \ldots, X_n$ are independent if and only if

$$\varphi_{\mathbf{X}}(\mathbf{s}) = E\left[e^{i\mathbf{s}^T\mathbf{X}}\right] = \prod_{i=1}^{n} \varphi_{X_i}(s_i),$$

where $\varphi_{X_i}(s_i)$ is the characteristic function for $X_i$.

(34)

Further properties of the multivariate normal

$\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$

Every component $X_k$ is one-dimensional normal. To prove this we take

$$\mathbf{a} = (0, 0, \ldots, \underbrace{1}_{\text{position } k}, 0, \ldots, 0)^T$$

and the conclusion follows by Def. I.

$X_1 + X_2 + \cdots + X_n$ is one-dimensional normal. Note: The terms in the sum need not be independent.

(35)

Properties of multivariate normal

$\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$

Every marginal distribution of $k$ variables ($1 \le k < n$) is normal. To prove this we consider any $k$ variables $X_{i_1}, X_{i_2}, \ldots, X_{i_k}$, take $\mathbf{a}$ such that $a_j = 0$ for $j \neq i_1, \ldots, i_k$, and then apply Def. I.

(36)

Properties of multivariate normal

Proposition

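$\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$ and $\mathbf{Y} = B\mathbf{X} + \mathbf{b}$. Then

$$\mathbf{Y} \in N\left(B\boldsymbol{\mu} + \mathbf{b}, B\Lambda B^T\right).$$

Pf:

$$\psi_{\mathbf{Y}}(\mathbf{s}) = E\left[e^{\mathbf{s}^T\mathbf{Y}}\right] = E\left[e^{\mathbf{s}^T(\mathbf{b} + B\mathbf{X})}\right] = e^{\mathbf{s}^T\mathbf{b}}\, E\left[e^{\mathbf{s}^TB\mathbf{X}}\right] = e^{\mathbf{s}^T\mathbf{b}}\, E\left[e^{(B^T\mathbf{s})^T\mathbf{X}}\right],$$

$$E\left[e^{(B^T\mathbf{s})^T\mathbf{X}}\right] = \psi_{\mathbf{X}}\left(B^T\mathbf{s}\right).$$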

(37)

Properties of multivariate normal

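$\mathbf{X} \in N(\boldsymbol{\mu}, \Lambda)$:

$$\psi_{\mathbf{X}}\left(B^T\mathbf{s}\right) = e^{(B^T\mathbf{s})^T\boldsymbol{\mu} + \frac{1}{2}(B^T\mathbf{s})^T\Lambda(B^T\mathbf{s})}.$$

$$\left(B^T\mathbf{s}\right)^T\boldsymbol{\mu} = \mathbf{s}^TB\boldsymbol{\mu}, \qquad \left(B^T\mathbf{s}\right)^T\Lambda\left(B^T\mathbf{s}\right) = \mathbf{s}^TB\Lambda B^T\mathbf{s},$$

$$e^{(B^T\mathbf{s})^T\boldsymbol{\mu} + \frac{1}{2}(B^T\mathbf{s})^T\Lambda(B^T\mathbf{s})} = e^{\mathbf{s}^TB\boldsymbol{\mu} + \frac{1}{2}\mathbf{s}^TB\Lambda B^T\mathbf{s}}.$$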

(38)

Properties of multivariate normal

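$$\psi_{\mathbf{X}}\left(B^T\mathbf{s}\right) = e^{\mathbf{s}^TB\boldsymbol{\mu} + \frac{1}{2}\mathbf{s}^TB\Lambda B^T\mathbf{s}}.$$

$$\psi_{\mathbf{Y}}(\mathbf{s}) = e^{\mathbf{s}^T\mathbf{b}}\,\psi_{\mathbf{X}}\left(B^T\mathbf{s}\right) = e^{\mathbf{s}^T\mathbf{b}}\, e^{\mathbf{s}^TB\boldsymbol{\mu} + \frac{1}{2}\mathbf{s}^TB\Lambda B^T\mathbf{s}}$$

$$\psi_{\mathbf{Y}}(\mathbf{s}) = e^{\mathbf{s}^T(\mathbf{b} + B\boldsymbol{\mu}) + \frac{1}{2}\mathbf{s}^TB\Lambda B^T\mathbf{s}},$$

which proves the claim as asserted.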
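The algebra of this proof can be replayed numerically. The sketch below evaluates $e^{\mathbf{s}^T\mathbf{b}}\,\psi_{\mathbf{X}}(B^T\mathbf{s})$ and the moment generating function of $N(B\boldsymbol{\mu} + \mathbf{b}, B\Lambda B^T)$ at one point $\mathbf{s}$; all numerical values and the helper name `mgf_normal` are assumptions made for illustration.

```python
import math


def mgf_normal(t, mu, Lam):
    """exp(t^T mu + (1/2) t^T Lam t), the MGF of N(mu, Lam) at t."""
    n = len(t)
    linear = sum(t[i] * mu[i] for i in range(n))
    quadratic = sum(t[i] * Lam[i][j] * t[j] for i in range(n) for j in range(n))
    return math.exp(linear + 0.5 * quadratic)


# Arbitrary illustrative parameters: X is 2-dimensional, Y = B X + b is 3-dimensional.
mu = [1.0, -1.0]
Lam = [[2.0, 0.6], [0.6, 1.0]]
B = [[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]]
b = [0.5, 0.0, -1.0]
s = [0.2, -0.1, 0.3]

# Left-hand side: e^{s^T b} * psi_X(B^T s).
Bt_s = [sum(B[r][k] * s[r] for r in range(3)) for k in range(2)]
lhs = math.exp(sum(s[r] * b[r] for r in range(3))) * mgf_normal(Bt_s, mu, Lam)

# Right-hand side: MGF of N(B mu + b, B Lam B^T) evaluated at s.
mu_Y = [sum(B[r][k] * mu[k] for k in range(2)) + b[r] for r in range(3)]
Lam_Y = [
    [
        sum(B[r][k] * Lam[k][l] * B[q][l] for k in range(2) for l in range(2))
        for q in range(3)
    ]
    for r in range(3)
]
rhs = mgf_normal(s, mu_Y, Lam_Y)

print(lhs, rhs)
```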

(39)

PART 3: Multivariate normal, Def. II: characteristic function, Def. III: density

(40)

Multivariate normal, Def. II: char. fnctn

Definition

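A random vector $\mathbf{X}$ with mean vector $\boldsymbol{\mu}$ and a covariance matrix $\Lambda$ is $N(\boldsymbol{\mu}, \Lambda)$ if its characteristic function is

$$\varphi_{\mathbf{X}}(\mathbf{t}) = E\left[e^{i\mathbf{t}^T\mathbf{X}}\right] = e^{i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}}.$$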

(41)

Multivariate normal, Def. II implies Def. I

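We need to show that the one-dimensional random variable $Y = \mathbf{a}^T\mathbf{X}$ has a normal distribution.

$$\varphi_Y(t) = E\left[e^{itY}\right] = E\left[e^{it\sum_{i=1}^{n} a_i X_i}\right] = E\left[e^{it\mathbf{a}^T\mathbf{X}}\right] = \varphi_{\mathbf{X}}(t\mathbf{a}) = e^{it\mathbf{a}^T\boldsymbol{\mu} - \frac{1}{2}t^2\mathbf{a}^T\Lambda\mathbf{a}},$$

and this is the characteristic function of $N\left(\mathbf{a}^T\boldsymbol{\mu}, \mathbf{a}^T\Lambda\mathbf{a}\right)$.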

(42)

Multivariate normal, Def. III: joint PDF

Definition

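A random vector $\mathbf{X}$ with mean vector $\boldsymbol{\mu}$ and an invertible covariance matrix $\Lambda$ is $N(\boldsymbol{\mu}, \Lambda)$ if the density is

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Lambda)}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T\Lambda^{-1}(\mathbf{x} - \boldsymbol{\mu})}.$$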

(43)

Multivariate normal

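It can be checked by a computation (complete the square) that

$$e^{i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\Lambda\mathbf{t}} = \int_{\mathbb{R}^n} e^{i\mathbf{t}^T\mathbf{x}}\, \frac{1}{(2\pi)^{n/2}\sqrt{\det(\Lambda)}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T\Lambda^{-1}(\mathbf{x} - \boldsymbol{\mu})}\, d\mathbf{x}.$$

Hence Def. III implies the property in Def. II. The three definitions are equivalent in the case where the inverse of the covariance matrix exists.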

(44)

PART 4: Bivariate normal with density

(45)

Multivariate Normal: the bivariate case

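As soon as $\rho^2 \neq 1$, the matrix

$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

is invertible, and the inverse is

$$\Lambda^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix}.$$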

(46)

Multivariate Normal: the bivariate case

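If $\rho^2 \neq 1$ and $\mathbf{X} = (X_1, X_2)^T$, then

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{2\pi\sqrt{\det\Lambda}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})^T\Lambda^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})} = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x_1, x_2)},$$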

(47)

Multivariate Normal: the bivariate case

where

$$Q(x_1, x_2) = \frac{1}{1-\rho^2} \left[ \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1\sigma_2} + \left(\frac{x_2 - \mu_2}{\sigma_2}\right)^2 \right].$$

For this, invert the matrix $\Lambda$ and expand the quadratic form!
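The equality of the two forms of the bivariate density can be checked numerically. The sketch below implements the density once through $Q(x_1, x_2)$ and once through the general matrix formula with an explicit $2 \times 2$ inverse; the function names and parameter values are hypothetical.

```python
import math


def bivariate_pdf_q(x1, x2, mu1, mu2, s1, s2, rho):
    """Bivariate normal density written with the quadratic form Q(x1, x2)."""
    z1 = (x1 - mu1) / s1
    z2 = (x2 - mu2) / s2
    Q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return math.exp(-0.5 * Q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))


def bivariate_pdf_matrix(x1, x2, mu1, mu2, s1, s2, rho):
    """Same density via (x - mu)^T Lambda^{-1} (x - mu) and det(Lambda)."""
    a, c, d = s1 * s1, rho * s1 * s2, s2 * s2  # entries of Lambda
    det = a * d - c * c
    inv = [[d / det, -c / det], [-c / det, a / det]]
    dx = [x1 - mu1, x2 - mu2]
    quad = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


# Arbitrary illustrative parameters and evaluation point.
args = (0.3, -1.2, 1.0, -1.0, 2.0, 0.5, 0.9)
print(bivariate_pdf_q(*args), bivariate_pdf_matrix(*args))
```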

(48)

ρ = 0

[Figure: plot of the bivariate normal density for ρ = 0; the two horizontal axes run from −3 to 3 and the vertical axis from 0 to 0.35.]

(49)

ρ = 0.9

[Figure: plot of the bivariate normal density for ρ = 0.9; the two horizontal axes run from −3 to 3 and the vertical axis from 0 to 0.35.]

(50)

ρ = −0.9

[Figure: plot of the bivariate normal density for ρ = −0.9; the two horizontal axes run from −3 to 3 and the vertical axis from 0 to 0.35.]

(51)

Conditional densities for the bivariate normal

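Complete the square of the exponent to write $f_{X,Y}(x, y) = f_X(x)\, f_{Y \mid X = x}(y)$, where

$$f_X(x) = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma_1^2}(x - \mu_1)^2}, \qquad f_{Y \mid X = x}(y) = \frac{1}{\tilde{\sigma}_2\sqrt{2\pi}}\, e^{-\frac{1}{2\tilde{\sigma}_2^2}(y - \tilde{\mu}_2(x))^2},$$

$$\tilde{\mu}_2(x) = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1), \qquad \tilde{\sigma}_2 = \sigma_2\sqrt{1 - \rho^2}.$$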

(52)

Bivariate normal properties

$E(X) = \mu_1$

Given $X = x$, $Y$ is Gaussian.

Conditional mean of $Y$ given $X = x$:

$$\tilde{\mu}_2(x) = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1) = E(Y \mid X = x)$$

Conditional variance of $Y$ given $X = x$:

$$\operatorname{Var}(Y \mid X = x) = \sigma_2^2\left(1 - \rho^2\right)$$

(53)

Bivariate normal properties

Check Section 3.7.3 and Exercise 3.8.4.6. From these it is seen that the conditional mean of $Y$ given $X$ in a bivariate normal distribution is also the best LINEAR predictor of $Y$ based on $X$, and the conditional variance is the variance of the estimation error.

(54)

Marginal PDFs

(55)

Proof of conditional pdf

Consider

$$\frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{\sigma_1\sqrt{2\pi}}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\, e^{-\frac{1}{2}Q(x, y) + \frac{1}{2\sigma_1^2}(x - \mu_1)^2}.$$

(56)

Proof of conditional pdf

$$-\frac{1}{2}Q(x, y) + \frac{1}{2\sigma_1^2}(x - \mu_1)^2 = -\frac{1}{2}H(x, y),$$

(57)

Proof of conditional pdfs

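$$H(x, y) = \frac{1}{1-\rho^2} \left[ \left(\frac{x - \mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2} + \left(\frac{y - \mu_2}{\sigma_2}\right)^2 \right] - \left(\frac{x - \mu_1}{\sigma_1}\right)^2$$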

(58)

Proof of conditional pdf

$$H(x, y) = \frac{\rho^2}{1-\rho^2}\,\frac{(x - \mu_1)^2}{\sigma_1^2} - \frac{2\rho(x - \mu_1)(y - \mu_2)}{\sigma_1\sigma_2(1-\rho^2)} + \frac{(y - \mu_2)^2}{\sigma_2^2(1-\rho^2)}$$

(59)

Proof of conditional pdf

$$H(x, y) = \frac{\left(y - \mu_2 - \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right)^2}{\sigma_2^2(1-\rho^2)}$$

(60)

Conditional pdf

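$$\frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{1}{\sqrt{1-\rho^2}\,\sigma_2\sqrt{2\pi}}\, \exp\left( -\frac{1}{2}\, \frac{\left(y - \mu_2 - \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right)^2}{\sigma_2^2(1-\rho^2)} \right).$$

This establishes the bivariate normal properties claimed above.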
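The result of the derivation can be sanity-checked at a single point. The sketch below divides the joint density by the marginal density of $X$ and compares the ratio with the univariate normal density with mean $\mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)$ and standard deviation $\sigma_2\sqrt{1-\rho^2}$; function names and numerical values are assumptions for illustration.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def bivariate_pdf(x, y, mu1, mu2, s1, s2, rho):
    """Joint density f_{X,Y}(x, y) of the bivariate normal, via Q(x, y)."""
    z1 = (x - mu1) / s1
    z2 = (y - mu2) / s2
    Q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / (1.0 - rho * rho)
    return math.exp(-0.5 * Q) / (2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho * rho))


def conditional_mean(x, mu1, mu2, s1, s2, rho):
    """E(Y | X = x) = mu2 + rho * (s2 / s1) * (x - mu1)."""
    return mu2 + rho * s2 / s1 * (x - mu1)


def conditional_sd(s2, rho):
    """Square root of Var(Y | X = x) = s2^2 * (1 - rho^2)."""
    return s2 * math.sqrt(1.0 - rho * rho)


# Arbitrary illustrative parameters and evaluation point.
mu1, mu2, s1, s2, rho = 1.0, -1.0, 2.0, 0.5, -0.9
x, y = 0.4, -0.7

ratio = bivariate_pdf(x, y, mu1, mu2, s1, s2, rho) / normal_pdf(x, mu1, s1)
direct = normal_pdf(y, conditional_mean(x, mu1, mu2, s1, s2, rho), conditional_sd(s2, rho))
print(ratio, direct)
```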

(61)

Bivariate normal properties : ρ

Proposition

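$(X, Y)$ bivariate normal $\Rightarrow \rho = \rho_{X,Y}$

Proof:

$$\begin{aligned} E\left[(X - \mu_1)(Y - \mu_2)\right] &= E\left(E\left[(X - \mu_1)(Y - \mu_2) \mid X\right]\right) \\ &= E\left((X - \mu_1)\, E\left[Y - \mu_2 \mid X\right]\right) \end{aligned}$$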

(62)

Bivariate normal properties : ρ

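$$\begin{aligned} &= E\left((X - \mu_1)\, E\left[Y - \mu_2 \mid X\right]\right) \\ &= E\left((X - \mu_1)\left[E(Y \mid X) - \mu_2\right]\right) \\ &= E\left((X - \mu_1)\left[\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(X - \mu_1) - \mu_2\right]\right) \\ &= \rho\,\frac{\sigma_2}{\sigma_1}\, E\left[(X - \mu_1)(X - \mu_1)\right] \end{aligned}$$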

(63)

Bivariate normal properties : ρ

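$$\begin{aligned} &= \rho\,\frac{\sigma_2}{\sigma_1}\, E\left[(X - \mu_1)(X - \mu_1)\right] \\ &= \rho\,\frac{\sigma_2}{\sigma_1}\, E\left[(X - \mu_1)^2\right] \\ &= \rho\,\frac{\sigma_2}{\sigma_1}\,\sigma_1^2 = \rho\,\sigma_2\sigma_1 \end{aligned}$$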

(64)

Bivariate normal properties : ρ

In other words, we have checked that

$$\rho = \frac{E\left[(X - \mu_1)(Y - \mu_2)\right]}{\sigma_2\sigma_1}.$$

$\rho = 0 \Leftrightarrow$ bivariate normal $X$, $Y$ are independent.

(65)

PART 5: Generating a multivariate normal variable

(66)

Standard Normal Vector: definition

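$\mathbf{Z} \in N(\mathbf{0}, I)$ is a standard normal vector.

$I$ is the $n \times n$ identity matrix.

$$f_{\mathbf{Z}}(\mathbf{z}) = \frac{1}{(2\pi)^{n/2}\sqrt{\det(I)}}\, e^{-\frac{1}{2}(\mathbf{z} - \mathbf{0})^T I^{-1} (\mathbf{z} - \mathbf{0})} = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\mathbf{z}^T\mathbf{z}}$$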

(67)

Distribution of X = AZ + b

If $\mathbf{X} = A\mathbf{Z} + \mathbf{b}$, where $\mathbf{Z}$ is standard Gaussian, then $\mathbf{X} \in N\left(\mathbf{b}, AA^T\right)$ (follows by a rule in the preceding).

(68)

Multivariate Normal: the bivariate case

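If

$$\Lambda = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$

then $\Lambda = AA^T$, where

$$A = \begin{pmatrix} \sigma_1 & 0 \\ \rho\sigma_2 & \sigma_2\sqrt{1-\rho^2} \end{pmatrix}.$$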
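This factorization gives a direct recipe for simulating a bivariate normal vector as $\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}$. The sketch below builds the lower-triangular factor $A$, checks $AA^T = \Lambda$, and draws a few samples with the standard-library generator; the seed, parameter values and function names are illustrative assumptions.

```python
import math
import random


def lower_factor(sigma1, sigma2, rho):
    """Lower-triangular A with A A^T = Lambda for the bivariate case."""
    return [[sigma1, 0.0], [rho * sigma2, sigma2 * math.sqrt(1.0 - rho ** 2)]]


def sample_bivariate_normal(mu, A, rng):
    """Draw one sample X = A Z + mu with Z a standard normal vector."""
    z = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    return tuple(mu[i] + A[i][0] * z[0] + A[i][1] * z[1] for i in range(2))


# Arbitrary illustrative parameters.
sigma1, sigma2, rho = 2.0, 0.5, -0.9
A = lower_factor(sigma1, sigma2, rho)
AAt = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
Lam = [
    [sigma1 ** 2, rho * sigma1 * sigma2],
    [rho * sigma1 * sigma2, sigma2 ** 2],
]
print(AAt)
print(Lam)

rng = random.Random(2014)  # fixed seed only to make the run reproducible
samples = [sample_bivariate_normal((1.0, -1.0), A, rng) for _ in range(5)]
for sample in samples:
    print(sample)
```

A lower-triangular factor is convenient because the first component depends on $Z_1$ only; any other $A$ with $AA^T = \Lambda$ would give the same distribution.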

(69)

Standard Normal Vector

$\mathbf{X} \in N(\boldsymbol{\mu}_{\mathbf{X}}, \Lambda)$, and $A$ is such that $\Lambda = AA^T$. (An invertible matrix $A$ with this property always exists if $\Lambda$ is positive definite; we need the symmetry of $\Lambda$, too.) Then

$$\mathbf{Z} = A^{-1}(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})$$

is a standard Gaussian vector.

Proof: We give the first idea of this proof, a rule of transformation.

(70)

Rule of transformation

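If $\mathbf{X}$ has density $f_{\mathbf{X}}(\mathbf{x})$, $\mathbf{Y} = A\mathbf{X} + \mathbf{b}$, and $A$ is invertible, then

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{1}{|\det A|}\, f_{\mathbf{X}}\left(A^{-1}(\mathbf{y} - \mathbf{b})\right).$$

Note that if $\Lambda = AA^T$, then

$$\det\Lambda = \det A \cdot \det A^T = \det A \cdot \det A = (\det A)^2,$$

so that $|\det A| = \sqrt{\det\Lambda}$.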
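Both the determinant identity and the standardization $\mathbf{Z} = A^{-1}(\mathbf{X} - \boldsymbol{\mu}_{\mathbf{X}})$ can be checked for the bivariate factor. The sketch below uses the earlier rule $C_{\mathbf{Y}} = BC_{\mathbf{X}}B^T$ with $B = A^{-1}$ to compute the covariance matrix of $\mathbf{Z}$; parameter values and helper names are assumptions for illustration.

```python
import math

# Arbitrary illustrative parameters.
sigma1, sigma2, rho = 2.0, 0.5, 0.9
Lam = [
    [sigma1 ** 2, rho * sigma1 * sigma2],
    [rho * sigma1 * sigma2, sigma2 ** 2],
]
A = [[sigma1, 0.0], [rho * sigma2, sigma2 * math.sqrt(1.0 - rho ** 2)]]

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
det_Lam = Lam[0][0] * Lam[1][1] - Lam[0][1] * Lam[1][0]
print(abs(det_A), math.sqrt(det_Lam))  # the two numbers should agree

# Explicit inverse of the 2 x 2 matrix A.
A_inv = [
    [A[1][1] / det_A, -A[0][1] / det_A],
    [-A[1][0] / det_A, A[0][0] / det_A],
]


def matmul(P, Q):
    """Plain matrix product of two 2 x 2 nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def transpose(P):
    """Transpose of a 2 x 2 nested-list matrix."""
    return [[P[j][i] for j in range(2)] for i in range(2)]


# Covariance of Z = A^{-1}(X - mu): C_Z = A^{-1} Lambda (A^{-1})^T.
C_Z = matmul(matmul(A_inv, Lam), transpose(A_inv))
print(C_Z)  # should be the identity matrix up to rounding
```

An identity covariance matrix alone does not make $\mathbf{Z}$ standard Gaussian; normality of $\mathbf{Z}$ follows because it is an affine function of the normal vector $\mathbf{X}$.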

(71)

Johann Carl Friedrich Gauss (30 April 1777 – 23 February 1855)

(72)

Diagonalizable Matrices

An $n \times n$ matrix $A$ is orthogonally diagonalizable, if there is an orthogonal matrix $P$ (i.e., P
