Correlation in Random Variables

(1)

Correlation in Random Variables

Lecture 11

(2)

Correlation in Random Variables

Suppose that an experiment produces two random vari-ables, _X and _Y . What can we say about the relationship be-tween them?

One of the best ways to visu-alize the possible relationship is to plot the (_{X, Y} ) pair that is produced by several trials of the experiment. An example of correlated samples is shown

(3)

Joint Density Function

The joint behavior of _X and _Y is fully captured in the joint probability distribution. For a continuous distribution

E[_Xm_Y n] =

_∞

−∞ x

m_yn_f

XY (x, y)dxdy

For discrete distributions

E[_Xm_Y n] = x∈Sx

y∈Sy

(4)

Covariance Function

The covariance function is a number that measures the common variation of _X and _Y. It is deﬁned as

cov(_{X, Y} ) = _E[(_X − _E[_X])(_Y − _E[_Y ])] = _E[_XY ] − _E[_X]_E[_Y ]

The covariance is determined by the diﬀerence in _E[_XY ] and _E[_X]_E[_Y ]. If _X and _Y were statistically independent then _E[_XY ] would equal

E[_X]_E[_Y ] and the covariance would be zero.

The covariance of a random variable with itself is equal to its vari-ance. cov[_{X, X}] = _E[(_X − _E[_X])2] = var[_X]

(5)

Correlation Coeﬃcient

The covariance can be normalized to produce what is known as the correlation coeﬃcient, _ρ.

ρ = cov(X,Y) var(X)var(Y)

The correlation coeﬃcient is bounded by −1 ≤ _ρ ≤ 1_. It will have value _ρ = 0 when the covariance is zero and value _ρ = ±1 when _X and _Y are perfectly correlated or anti-correlated.

(6)

Autocorrelation Function

The autocorrelation function is very similar to the covariance func-tion. It is deﬁned as

R(_{X, Y} ) = _E[_XY ] = cov(X, Y ) + E[X]E[Y ]

It retains the mean values in the calculation of the value. The random variables are orthogonal if _R(_{X, Y} ) = 0_.

(7)

Normal Distribution

f_XY (_{x, y}) = 1 2_πσ_x_σ_y

1 − _ρ2

exp     −

x−µx

σx

₂

− 2_ρ x−µx σx

_y₋_µ

y

σy

+ y−µy σy

₂ 2(1 − _ρ2)

    

(8)

Normal Distribution

The orientation of the elliptical contours is along the line _y = _x if

ρ > 0 and along the line _y = −_x if _{ρ <} 0_. The contours are a circle, and the variables are uncorrelated, if _ρ = 0_. The center of the ellipse is (_µ_x_{, µ}_y) _.

(9)

Linear Estimation

The task is to construct a rule for the prediction ˆ_Y of _Y based on an observation of _X.

If the random variables are correlated then this should yield a better result, on the average, than just guessing. We are encouraged to select a linear rule when we note that the sample points tend to fall about a sloping line.

ˆ

Y = _aX + _b

where _a and _b are parameters to be chosen to provide the best results. We would expect _a to correspond to the slope and _b to the intercept.

(10)

Minimize Prediction Error

To ﬁnd a means of calculating the coeﬃcients from a set of sample points, construct the predictor error

ε = _E[(_Y − _Yˆ)2]

We want to choose _a and _b to minimize _ε. Therefore, compute the appropriate derivatives and set them to zero.

∂ε

∂a = −2E[(Y −

ˆ

Y )∂Yˆ

∂a ] = 0 ∂ε

∂b = −2E[(Y −

ˆ

Y )∂Yˆ

∂b ] = 0

These can be solved for _a and _b in terms of the expected values. The expected values can be themselves estimated from the sample set.

(11)

Prediction Error Equations

The above conditions on _a and _b are equivalent to

E[(_Y − _Yˆ)_X] = 0

E[_Y − _Yˆ] = 0

The prediction error _Y −_Yˆ must be orthogonal to _X and the expected prediction error must be zero.

Substituting ˆ_Y = _aX + _b leads to a pair of equations to be solved for _a and _b.

E[(_Y − _aX − _b)_X] = _E[_XY ] − _aE[_X2] − _bE[_X] = 0

E[_Y − _aX − _b] = _E[_Y ] − _aE[_X] − _b = 0

E[_X2] _E[_X]

E[_X] 1

a b

=

E[_XY ]

E[_Y ]

(12)

Prediction Error

a = cov(X,Y) var(X)

b = _E[_Y ] − cov(X,Y)

var(X) E[X]

The prediction error with these parameter values is

ε = (1 − _ρ2)var(Y)

When the correlation coeﬃcient _ρ = ±1 the error is zero, meaning that perfect prediction can be made.

When _ρ = 0 the variance in the prediction is as large as the variation in _Y, and the predictor is of no help at all.

For intermediate values of _ρ, whether positive or negative, the pre-dictor reduces the error.

(13)

Linear Predictor Program

The program lp.pro computes the coeﬃcients [_{a, b}] as well as the covariance matrix _C and the correlation coeﬃcient, _ρ. The covari-ance matrix is

C =

var[X] cov[X, Y ] cov[X, Y ] var[Y ]

Usage example: N=100 X=Randomn(seed,N) Z=Randomn(seed,N) Y=2*X-1+0.2*Z p=lp(X,Y,c,rho) print,’Predictor Coefficients=’,p print,’Covariance matrix’ print,c print,’Correlation Coefficient=’,rho

(14)

Program lp

function lp,X,Y,c,rho,mux,muy

;Compute the linear predictor coefficients such that ;Yhat=aX+b is the minimum mse estimate of Y based on X.

;Shorten the X and Y vectors to the length of the shorter. n=n_elements(X) < n_elements(Y)

X=(X[0:n-1])[*] Y=(Y[0:n-1])[*]

;Compute the mean value of each. mux=total(X)/n

muy=total(Y)/n

(15)

Program lp (continued)

;Compute the covariance matrix. V=[[X-mux],[Y-muy]]

C=V##transpose(V)/(n-1)

;Compute the predictor coefficient and constant. a=c[0,1]/c[0,0]

b=muy-a*mux

;Compute the correlation coefficient rho=c[0,1]/sqrt(c[0,0]*c[1,1])

Return,[a,b] END

(16)

Example

Predictor Coefficients= 1.99598 -1.09500 Correlation Coefficient=0.812836

Covariance Matrix

0.950762 1.89770 1.89770 5.73295

(17)

IDL Regress Function

IDL provides a number of routines for the analysis of data. The function REGRESS does multiple linear regression.

Compute the predictor coeﬃcient _a and constant _b by

a=regress(X,Y,Const=b) print,[a,b]

(18)

New Concept

Introduction to Random Processes

(19)

(20)

Random Process

• A random variable is a function _X(_e) that maps the set of ex-periment outcomes to the set of numbers.

• A random process is a rule that maps every outcome _e of an experiment to a function _X(_{t, e})_.

• A random process is usually conceived of as a function of time, but there is no reason to not consider random processes that are functions of other independent variables, such as spatial coordi-nates.

• The function _X(_{u, v, e}) would be a function whose value de-pended on the location (_{u, v}) and the outcome _e, and could be used in representing random variations in an image.

(21)

Random Process

• The domain of _e is the set of outcomes of the experiment. We assume that a probability distribution is known for this set.

• The domain of _t is a set, T _, of real numbers_.

• If T is the real axis then _X(_{t, e}) is a continuous-time random process

• If T is the set of integers then _X(_{t, e}) is a discrete-time random process

• We will often suppress the display of the variable _e and write _X(_t) for a continuous-time RP and _X[_n] or _X_n for a discrete-time RP.

(22)

Random Process

• A RP is a family of functions, _X(_{t, e})_. Imagine a giant strip chart recording in which each pen is identiﬁed with a diﬀerent _e. This family of functions is traditionally called an ensemble.

• A single function _X(_{t, e}_k) is selected by the outcome _e_k_. This is just a time function that we could call _X_k(_t)_. Diﬀerent outcomes give us diﬀerent time functions.

• If _t is ﬁxed, say _t = _t₁_, then _X(_t₁_{, e}) is a random variable. Its value depends on the outcome _e.

(23)

Moments and Averages

X(_t₁_{, e}) is a random variable that represents the set of samples across the ensemble at time _t₁

If it has a probability density function _f_X(_x; _t₁) then the moments are

m_n(_t₁) = _E[_Xn(_t₁)] =

_∞

−∞ x

n_f

X (x;t1) dx

The notation _f_X(_x; _t₁) may be necessary because the probability density may depend upon the time the samples are taken.

The mean value is _µ_X = _m₁_, which may be a function of time. The central moments are

E[(_X(_t₁) − _µ_X(_t₁))n] =

_∞

−∞ (x − µX(t1))

n _f

(24)

Pairs of Samples

The numbers _X(_t₁_{, e}) and _X(_t₂_{, e}) are samples from the same time function at diﬀerent times.

They are a pair of random variables (_X₁_{, X}₂)_.

They have a joint probability density function _f(_x₁_{, x}₂; _t₁_{, t}₂)_.

From the joint density function one can compute the marginal den-sities, conditional probabilities and other quantities that may be of interest.