

Linear Prediction

Say $Y$ and $W_1, \dots, W_n$ are random variables, and we want to predict $Y$ given $W_1, \dots, W_n$. That is, we want a function $f(W_1, \dots, W_n)$ of $W_1, \dots, W_n$ that is in some sense a good predictor of $Y$.

We will restrict our considerations to functions $f$ that are linear in $W_1, \dots, W_n$, i.e., all predictors are of the form $a_0 + a_1 W_n + \dots + a_n W_1$. Let

$a = (a_1, \dots, a_n)^T$,
$\mu_Y = E[Y]$,
$\mu_i = E[W_i]$, $i = 1, \dots, n$,
$\mu_W = (\mu_n, \dots, \mu_1)^T$,
$W = (W_n, \dots, W_1)^T$ (so $\mu_W = E[W]$),
$\gamma = \mathrm{Cov}(Y, W) = (\mathrm{Cov}(Y, W_n), \dots, \mathrm{Cov}(Y, W_1))^T$,
$\Gamma = \mathrm{Cov}(W)$, the $n \times n$ covariance matrix of $W$.

So $\Gamma_{ij} = \mathrm{Cov}(W_{n+1-i}, W_{n+1-j})$, $i, j = 1, \dots, n$.

We will say that the "best" or "optimal" set of coefficients $\{a_0, a_1, \dots, a_n\}$ are those that minimize the mean squared error:

$$E\left[\left(Y - (a_0 + a_1 W_n + \dots + a_n W_1)\right)^2\right] \qquad (*)$$

To find the optimal $a_0, a_1, \dots, a_n$ we differentiate $(*)$ with respect to each $a_j$ and set the derivative to 0, then solve for $a_0, a_1, \dots, a_n$.

Differentiate w.r.t. $a_0$:

$$E\left[Y - (a_0 + a_1 W_n + \dots + a_n W_1)\right] = 0 \qquad \text{(i)}$$

Differentiate w.r.t. $a_j$, $j = 1, \dots, n$:

$$E\left[\left(Y - (a_0 + a_1 W_n + \dots + a_n W_1)\right) W_{n+1-j}\right] = 0 \qquad \text{(ii)}$$

(i) gives $E[Y] = a_0 + a_1 E[W_n] + \dots + a_n E[W_1]$, so then

$$a_0 = \mu_Y - (a_1 \mu_n + \dots + a_n \mu_1).$$

The predictor now in terms of $a_1, \dots, a_n$ is

$$\mu_Y - a^T \mu_W + a_1 W_n + \dots + a_n W_1 = \mu_Y - a^T \mu_W + a^T W = \mu_Y + a^T (W - \mu_W).$$

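Now, plugging this into (ii) gives

$$E\left[\left(Y - \mu_Y - a^T (W - \mu_W)\right) W_{n+1-j}\right] = 0, \quad j = 1, \dots, n.$$

Since $E[Y - \mu_Y - a^T (W - \mu_W)] = 0$ (from (i)), we can also write (ii) as

$$E\left[\left(Y - \mu_Y - a^T (W - \mu_W)\right)\left(W_{n+1-j} - \mu_{n+1-j}\right)\right] = 0,$$

which is the same as

$$\mathrm{Cov}(Y, W_{n+1-j}) - a^T \mathrm{Cov}(W, W_{n+1-j}) = 0,$$

where $\mathrm{Cov}(W, W_{n+1-j})$ is a vector of covariances. Equivalently,

$$\mathrm{Cov}(Y, W_{n+1-j}) - \sum_{i=1}^n a_i \, \mathrm{Cov}(W_{n+1-i}, W_{n+1-j}) = 0$$
$$\Longleftrightarrow \quad \gamma_j - \sum_{i=1}^n a_i \Gamma_{ij} = 0$$
$$\Longleftrightarrow \quad \gamma_j - \sum_{i=1}^n a_i \Gamma_{ji} = 0, \quad j = 1, \dots, n.$$

That is, $\gamma_j = (a^T \Gamma)_j = (\Gamma a)_j$ for every $j$ (using the symmetry of $\Gamma$), so $\gamma = \Gamma a$. That is, the optimal $a$ satisfies $\Gamma a = \gamma$.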

Notation: We let $P(Y \mid W)$ denote the best linear predictor of $Y$ based on $W = (W_n, \dots, W_1)^T$, i.e.,

$$P(Y \mid W) = \mu_Y + a^T (W - \mu_W), \quad \text{where } a \text{ is the solution of } \Gamma a = \gamma.$$
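As a minimal numerical sketch of this recipe (not part of the original notes), the following assumes the moments $\mu_Y$, $\mu_W$, $\Gamma$ and $\gamma$ are known and that $\Gamma$ is nonsingular. The function name `best_linear_predictor` and all numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def best_linear_predictor(mu_y, mu_w, Gamma, gamma):
    """Return (a0, a) such that P(Y|W) = a0 + a^T W, where Gamma a = gamma."""
    Gamma = np.asarray(Gamma, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    mu_w = np.asarray(mu_w, dtype=float)
    a = np.linalg.solve(Gamma, gamma)   # assumes Gamma is nonsingular
    a0 = mu_y - a @ mu_w                # intercept implied by condition (i)
    return a0, a

# Hypothetical moments for n = 2 predictor variables.
mu_y = 1.0
mu_w = np.array([2.0, -1.0])
Gamma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
gamma = np.array([1.0, 0.3])

a0, a = best_linear_predictor(mu_y, mu_w, Gamma, gamma)
w = np.array([2.5, 0.0])                # one observed value of W
prediction = a0 + a @ w                 # same as mu_y + a^T (w - mu_w)
print(a0, a, prediction)
```

Solving the linear system directly, rather than forming the inverse of $\Gamma$, is the usual numerical choice here.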

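The minimum mean squared error is

$$\begin{aligned}
E\left[(Y - P(Y \mid W))^2\right] &= E\left[\left(Y - (\mu_Y + a^T (W - \mu_W))\right)^2\right] \\
&= E\left[\left((Y - \mu_Y) - a^T (W - \mu_W)\right)^2\right] \\
&= E\left[(Y - \mu_Y)^2\right] + E\left[a^T (W - \mu_W)(W - \mu_W)^T a\right] - 2 a^T E\left[(Y - \mu_Y)(W - \mu_W)\right] \\
&= \mathrm{Var}(Y) + a^T \Gamma a - 2 a^T \gamma \\
&= \mathrm{Var}(Y) + a^T \gamma - 2 a^T \gamma \qquad (\text{since } \Gamma a = \gamma) \\
&= \mathrm{Var}(Y) - a^T \gamma.
\end{aligned}$$

The minimum MSE is $\mathrm{Var}(Y) - a^T \gamma$.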
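The minimum-MSE formula can be checked numerically. The sketch below is an illustration with made-up moments (the values of `var_y`, `Gamma` and `gamma` are assumptions): it evaluates the general MSE $\mathrm{Var}(Y) + a^T \Gamma a - 2 a^T \gamma$ at the optimal $a$ and at a few perturbed coefficient vectors.

```python
import numpy as np

# Hypothetical second moments (chosen so the joint covariance is valid).
var_y = 1.5
Gamma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
gamma = np.array([1.0, 0.3])

def mse(a):
    """MSE of mu_Y + a^T (W - mu_W): Var(Y) + a^T Gamma a - 2 a^T gamma."""
    return var_y + a @ Gamma @ a - 2 * a @ gamma

a_opt = np.linalg.solve(Gamma, gamma)   # solves Gamma a = gamma
min_mse = var_y - a_opt @ gamma         # Var(Y) - a^T gamma

# Any other coefficient vector should do no better than a_opt.
rng = np.random.default_rng(0)
perturbed = [mse(a_opt + rng.normal(scale=0.1, size=2)) for _ in range(5)]
print(min_mse, min(perturbed))
```

For a perturbation $d$, the MSE exceeds the minimum by $d^T \Gamma d \ge 0$, which is why no perturbed value falls below `min_mse`.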

Note that when we are predicting a random variable $Y$ using MSE as the criterion, we are assuming that $\mathrm{Var}(Y) < \infty$; otherwise the MSE would be equal to $\infty$ for any predictor.

Also, note that from (i) and (ii) from the lecture, (i) is saying that $P(Y \mid W)$, which is the best linear predictor, gives residuals which are zero mean, and (ii) is saying that $\mathrm{Cov}(Y - P(Y \mid W), W) = 0$; that is, the residuals from the best linear predictor $P(Y \mid W)$ are uncorrelated with the predictor variables $W$.
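A small sketch of these two residual properties, using sample moments in place of the true ones (an assumption made only so the example is self-contained; the simulated data and coefficients are hypothetical). Because the coefficients solve the sample version of $\Gamma a = \gamma$, the residuals have zero sample mean and zero sample covariance with $W$, up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
W = rng.normal(size=(N, 3))                        # each row is one draw of W
Y = 2.0 + W @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=N)

# Sample versions of mu_W, mu_Y, Gamma and gamma.
mu_w, mu_y = W.mean(axis=0), Y.mean()
Wc, Yc = W - mu_w, Y - mu_y
Gamma = Wc.T @ Wc / N
gamma = Wc.T @ Yc / N

a = np.linalg.solve(Gamma, gamma)
resid = Y - (mu_y + Wc @ a)                        # Y - P(Y|W)

print(resid.mean())        # zero up to rounding, as in (i)
print(Wc.T @ resid / N)    # zero vector up to rounding, as in (ii)
```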

Properties of the Best Linear Predictor

We can treat the predictor $P(Y \mid W)$ as an operator $P(\,\cdot \mid W)$ acting on random variables, taking them to their best linear prediction based on $W$. As an operator, $P(\,\cdot \mid W)$ is a linear operator.

Suppose $\alpha_1$, $\alpha_2$ and $\beta$ are real numbers and $U$ and $V$ are random variables. Then we have the following property:

① $P(\alpha_1 U + \alpha_2 V + \beta \mid W) = \alpha_1 P(U \mid W) + \alpha_2 P(V \mid W) + \beta$

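Proof. The LHS is $\alpha_1 E[U] + \alpha_2 E[V] + \beta + a^T (W - \mu_W)$, where $a$ satisfies

$$\Gamma a = \mathrm{Cov}(\alpha_1 U + \alpha_2 V + \beta, W). \qquad \text{(a)}$$

On the RHS, $P(U \mid W) = E[U] + (a^{(1)})^T (W - \mu_W)$ and $P(V \mid W) = E[V] + (a^{(2)})^T (W - \mu_W)$, where $a^{(1)}$ satisfies

$$\Gamma a^{(1)} = \mathrm{Cov}(U, W) \qquad \text{(b)}$$

and $a^{(2)}$ satisfies

$$\Gamma a^{(2)} = \mathrm{Cov}(V, W), \qquad \text{(c)}$$

and the RHS is

$$\alpha_1 E[U] + \alpha_1 (a^{(1)})^T (W - \mu_W) + \alpha_2 E[V] + \alpha_2 (a^{(2)})^T (W - \mu_W) + \beta = \alpha_1 E[U] + \alpha_2 E[V] + \beta + (\alpha_1 a^{(1)} + \alpha_2 a^{(2)})^T (W - \mu_W).$$

So the LHS will equal the RHS if $a = \alpha_1 a^{(1)} + \alpha_2 a^{(2)}$. So we will show that $\alpha_1 a^{(1)} + \alpha_2 a^{(2)}$ satisfies equation (a):

$$\Gamma (\alpha_1 a^{(1)} + \alpha_2 a^{(2)}) = \mathrm{Cov}(\alpha_1 U + \alpha_2 V + \beta, W).$$

But

$$\mathrm{Cov}(\alpha_1 U + \alpha_2 V + \beta, W) = \alpha_1 \mathrm{Cov}(U, W) + \alpha_2 \mathrm{Cov}(V, W) + \mathrm{Cov}(\beta, W) = \alpha_1 \mathrm{Cov}(U, W) + \alpha_2 \mathrm{Cov}(V, W),$$

since $\mathrm{Cov}(\beta, W) = 0$ because $\beta$ is a constant. Thus, we want to show

$$\Gamma (\alpha_1 a^{(1)} + \alpha_2 a^{(2)}) = \alpha_1 \mathrm{Cov}(U, W) + \alpha_2 \mathrm{Cov}(V, W).$$

But from (b) and (c), $\Gamma a^{(1)} = \mathrm{Cov}(U, W)$ and $\Gamma a^{(2)} = \mathrm{Cov}(V, W)$. So we get

$$\Gamma (\alpha_1 a^{(1)} + \alpha_2 a^{(2)}) = \alpha_1 \Gamma a^{(1)} + \alpha_2 \Gamma a^{(2)} = \alpha_1 \mathrm{Cov}(U, W) + \alpha_2 \mathrm{Cov}(V, W).$$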
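The key step of the proof, that the coefficient vector for $\alpha_1 U + \alpha_2 V + \beta$ is the same linear combination of the coefficient vectors for $U$ and $V$, can be illustrated numerically. This is a sketch with hypothetical covariance values, not part of the original argument.

```python
import numpy as np

Gamma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
cov_u_w = np.array([1.0, 0.3])      # Cov(U, W), hypothetical values
cov_v_w = np.array([-0.4, 0.2])     # Cov(V, W), hypothetical values
alpha1, alpha2 = 3.0, -2.0          # beta drops out because Cov(beta, W) = 0

a1 = np.linalg.solve(Gamma, cov_u_w)    # Gamma a1 = Cov(U, W)
a2 = np.linalg.solve(Gamma, cov_v_w)    # Gamma a2 = Cov(V, W)

# Coefficients for alpha1*U + alpha2*V + beta, solved directly.
a = np.linalg.solve(Gamma, alpha1 * cov_u_w + alpha2 * cov_v_w)

print(a, alpha1 * a1 + alpha2 * a2)     # the two vectors agree
```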

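We can extend ① to $\alpha_1 U_1 + \dots + \alpha_n U_n + \beta$, where $\alpha_1, \dots, \alpha_n, \beta$ are constants and $U_1, \dots, U_n$ are random variables. We have

$$\begin{aligned}
P(\alpha_1 U_1 + \dots + \alpha_n U_n + \beta \mid W) &= \alpha_1 P(U_1 \mid W) + P(\alpha_2 U_2 + \dots + \alpha_n U_n \mid W) + \beta \quad \text{by ①} \\
&= \alpha_1 P(U_1 \mid W) + \alpha_2 P(U_2 \mid W) + P(\alpha_3 U_3 + \dots + \alpha_n U_n \mid W) + \beta \quad \text{by ① again} \\
&\;\;\vdots \\
&= \alpha_1 P(U_1 \mid W) + \alpha_2 P(U_2 \mid W) + \dots + \alpha_n P(U_n \mid W) + \beta.
\end{aligned}$$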

If the $U_i$'s are equal to the $W_i$'s, then we have

② $P(\alpha_1 W_1 + \dots + \alpha_n W_n + \beta \mid W) = \alpha_1 W_1 + \dots + \alpha_n W_n + \beta$

By property ①, $P(\alpha_1 W_1 + \dots + \alpha_n W_n + \beta \mid W) = \alpha_1 P(W_1 \mid W) + \dots + \alpha_n P(W_n \mid W) + \beta$. So it is sufficient to show that $P(W_i \mid W) = W_i$ for $i = 1, \dots, n$.

But $P(W_i \mid W) = E[W_i] + a^T (W - \mu_W)$, where $a$ satisfies $\Gamma a = \mathrm{Cov}(W_i, W)$. But $\mathrm{Cov}(W_i, W)$ is just the $(n+1-i)$th column of $\Gamma$. So the solution to $\Gamma a = $ ($(n+1-i)$th column of $\Gamma$) is $a = (0, \dots, 0, 1, 0, \dots, 0)^T$, with the 1 in the $(n+1-i)$th component. So

$$P(W_i \mid W) = E[W_i] + (W_i - \mu_i) = W_i.$$
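The column argument for property ② is easy to confirm numerically. In this sketch (the matrix `Gamma` is a hypothetical nonsingular covariance matrix), solving $\Gamma a = $ ($(n+1-i)$th column of $\Gamma$) returns the unit vector with a 1 in position $n+1-i$.

```python
import numpy as np

# Hypothetical 3x3 covariance matrix of W = (W_3, W_2, W_1)^T.
Gamma = np.array([[2.0, 0.5, 0.1],
                  [0.5, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])
n = Gamma.shape[0]

solutions = {}
for i in range(1, n + 1):
    col = n + 1 - i                   # 1-based position of W_i inside W
    rhs = Gamma[:, col - 1]           # Cov(W_i, W) is that column of Gamma
    solutions[i] = np.linalg.solve(Gamma, rhs)
    print(i, np.round(solutions[i], 12))
```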

If $U$ is uncorrelated with $W$ then we have

③ $P(U \mid W) = E[U]$ if $\mathrm{Cov}(U, W) = 0$ (the vector of $n$ zeroes).

We have $P(U \mid W) = E[U] + a^T (W - \mu_W)$, where $a$ satisfies $\Gamma a = \mathrm{Cov}(U, W) = 0$. The solution to this is $a = 0$.

We have discussed the linear predictor $P(\,\cdot \mid W)$ and its properties. In the context of time series we are mostly interested in forecasting.

Forecasting

In $P(Y \mid W)$, if we take $Y = X_{n+h}$ and $W = (X_n, \dots, X_1)^T$, where $\{X_t\}$ is a stationary time series, then linear prediction in this context is called forecasting. The authors use the notation $P_n X_{n+h}$ to mean $P(X_{n+h} \mid X_n, \dots, X_1)$. Here the subscript $n$ in $P_n$ refers to the number of past values of the time series we are basing the prediction on, and $X_{n+h}$ is the random variable we are trying to predict.

Example: One-step prediction of an AR(1), $|\phi| < 1$. We want $P_n X_{n+1} = P(X_{n+1} \mid X_n, \dots, X_1)$.

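Method 1 (Direct method). We have

$$\Gamma = \begin{pmatrix} \gamma_X(0) & \gamma_X(1) & \cdots & \gamma_X(n-1) \\ \gamma_X(1) & \gamma_X(0) & \cdots & \gamma_X(n-2) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_X(n-1) & \gamma_X(n-2) & \cdots & \gamma_X(0) \end{pmatrix} = \frac{\sigma^2}{1 - \phi^2} \begin{pmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{n-1} & \cdots & \phi^2 & \phi & 1 \end{pmatrix}.$$

Also, $\gamma_j = \mathrm{Cov}(X_{n+1}, X_{n+1-j}) = \gamma_X(j) = \dfrac{\sigma^2 \phi^j}{1 - \phi^2}$, so $\gamma = \dfrac{\sigma^2}{1 - \phi^2} (\phi, \phi^2, \dots, \phi^n)^T$. So $\Gamma a = \gamma$ is given by

$$\frac{\sigma^2}{1 - \phi^2} \begin{pmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{n-1} & \cdots & \phi^2 & \phi & 1 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} = \frac{\sigma^2}{1 - \phi^2} \begin{pmatrix} \phi \\ \phi^2 \\ \vdots \\ \phi^n \end{pmatrix}.$$

Note that the first column of $\Gamma$ multiplied by $\phi$ is $\gamma$. Therefore, we can see by inspection that the solution is $a = (\phi, 0, \dots, 0)^T$. Then the best linear predictor is $P_n X_{n+1} = E[X_{n+1}] + a^T (W - \mu_W)$. Here, $E[X_i] = 0$ for all $i$, and so we get $P_n X_{n+1} = \phi X_n$ as the best linear predictor of $X_{n+1}$.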
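The direct method can be reproduced numerically. The sketch below is illustrative only (the values of `phi`, `sigma2` and `n` are assumptions): it builds $\Gamma$ and $\gamma$ from the AR(1) autocovariance $\gamma_X(h) = \sigma^2 \phi^{|h|} / (1 - \phi^2)$ and solves $\Gamma a = \gamma$.

```python
import numpy as np

phi, sigma2, n = 0.6, 1.0, 5    # hypothetical AR(1) parameters, |phi| < 1

def acvf(h):
    """Autocovariance of a causal AR(1): sigma^2 * phi^|h| / (1 - phi^2)."""
    return sigma2 * phi ** abs(h) / (1.0 - phi ** 2)

# Gamma_ij = gamma_X(i - j); gamma_j = gamma_X(j) for j = 1, ..., n.
Gamma = np.array([[acvf(i - j) for j in range(n)] for i in range(n)])
gamma = np.array([acvf(j) for j in range(1, n + 1)])

a = np.linalg.solve(Gamma, gamma)
print(np.round(a, 10))          # first entry phi, remaining entries zero
```

Only the coefficient on $X_n$ is nonzero, so the one-step forecast depends on the most recent observation alone.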

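Method 2 (use the properties of $P(\,\cdot \mid W)$). We have $X_{n+1} = \phi X_n + Z_{n+1}$, where $\{Z_t\}$ is a zero-mean WN$(0, \sigma^2)$ process. Therefore,

$$\begin{aligned}
P_n X_{n+1} &= P_n (\phi X_n + Z_{n+1}) \\
&= \phi P_n X_n + P_n Z_{n+1}, \quad \text{by linearity of the prediction operator} \\
&= \phi X_n + P_n Z_{n+1}, \quad \text{by property ②} \\
&= \phi X_n + 0, \quad \text{by property ③} \\
&= \phi X_n.
\end{aligned}$$

Property ③ applies because $Z_{n+1}$ has mean zero and, for the causal AR(1) with $|\phi| < 1$, is uncorrelated with $X_n, \dots, X_1$.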

Example: $h$-step prediction of an AR(1) process with $|\phi| < 1$, $h \ge 1$. We want

$$P_n X_{n+h} = P_n (\phi X_{n+h-1} + Z_{n+h}) = \phi P_n X_{n+h-1} = \phi^2 P_n X_{n+h-2} = \dots = \phi^h P_n X_n = \phi^h X_n.$$
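The $h$-step result can also be checked with the direct method. In this sketch (parameter values are assumptions), $\gamma$ now holds $\mathrm{Cov}(X_{n+h}, X_{n+1-j}) = \gamma_X(h + j - 1)$, and the mean squared error is computed from the general formula $\mathrm{Var}(Y) - a^T \gamma$.

```python
import numpy as np

phi, sigma2, n, h = 0.6, 1.0, 5, 3    # hypothetical parameters

def acvf(k):
    """Autocovariance of a causal AR(1): sigma^2 * phi^|k| / (1 - phi^2)."""
    return sigma2 * phi ** abs(k) / (1.0 - phi ** 2)

Gamma = np.array([[acvf(i - j) for j in range(n)] for i in range(n)])
# Entry j (0-based) is Cov(X_{n+h}, X_{n-j}) = gamma_X(h + j).
gamma = np.array([acvf(h + j) for j in range(n)])

a = np.linalg.solve(Gamma, gamma)
mse = acvf(0) - a @ gamma             # Var(X_{n+h}) - a^T gamma
print(np.round(a, 10), mse)
```

With these moments the solution is $a = (\phi^h, 0, \dots, 0)^T$, and the error works out to $\sigma^2 (1 - \phi^{2h}) / (1 - \phi^2)$, which grows toward $\mathrm{Var}(X_t)$ as $h$ increases.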

Example: One-step prediction for an AR($p$) process. Let $\{X_t\}$ be a zero-mean, stationary process satisfying

$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + Z_t,$$

where $\{Z_t\}$ is a zero-mean WN$(0, \sigma^2)$ process. We wish to predict $X_{n+1}$ in terms of $X_n, \dots, X_1$. That is, we wish to compute (for $n \ge p$, so that $X_n, \dots, X_{n+1-p}$ are all among the predictor variables)

$$\begin{aligned}
P_n X_{n+1} &= P_n (\phi_1 X_n + \dots + \phi_p X_{n+1-p} + Z_{n+1}) \\
&= \phi_1 P_n X_n + \dots + \phi_p P_n X_{n+1-p} + P_n Z_{n+1} \\
&= \phi_1 X_n + \dots + \phi_p X_{n+1-p} + P_n Z_{n+1}.
\end{aligned}$$

If $\{X_t\}$ is causal (we have not discussed conditions for an AR($p$) process to be causal), then we get that $P_n Z_{n+1} = 0$ and then

$$P_n X_{n+1} = \phi_1 X_n + \dots + \phi_p X_{n+1-p}.$$
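As an illustration for $p = 2$, the sketch below solves $\Gamma a = \gamma$ directly for a causal AR(2). The parameter values are assumptions, and the autocovariances come from the standard Yule-Walker recursion $\rho(k) = \phi_1 \rho(k-1) + \phi_2 \rho(k-2)$, which is not derived in these notes.

```python
import numpy as np

phi1, phi2, sigma2, n = 0.5, 0.3, 1.0, 6    # hypothetical causal AR(2)

# Autocorrelations via the Yule-Walker recursion (assumed, not from the notes).
rho = np.empty(n + 1)
rho[0] = 1.0
rho[1] = phi1 / (1.0 - phi2)
for k in range(2, n + 1):
    rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]

gamma0 = sigma2 / (1.0 - phi1 * rho[1] - phi2 * rho[2])   # Var(X_t)
acvf = gamma0 * rho                                       # gamma_X(0..n)

Gamma = np.array([[acvf[abs(i - j)] for j in range(n)] for i in range(n)])
gamma = acvf[1:n + 1]                                     # gamma_j = gamma_X(j)

a = np.linalg.solve(Gamma, gamma)
print(np.round(a, 10))    # (phi1, phi2, 0, ..., 0)
```

Only the coefficients on $X_n$ and $X_{n-1}$ are nonzero, matching $P_n X_{n+1} = \phi_1 X_n + \phi_2 X_{n-1}$.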

Note: If $\{Y_t\}$ is a stationary process with nonzero mean, say $\mu$, then we can write $Y_t = X_t + \mu$, where $\{X_t\}$ is stationary and zero-mean. Then

$$P_n Y_{n+h} = P_n (X_{n+h} + \mu) = P_n X_{n+h} + P_n \mu = P_n X_{n+h} + \mu.$$

Here $P_n Y_{n+h} = P(Y_{n+h} \mid Y_n, \dots, Y_1)$. So above we have that

$$P(Y_{n+h} \mid Y_n, \dots, Y_1) = P(X_{n+h} \mid Y_n, \dots, Y_1) + \mu.$$

However, $P(X_{n+h} \mid Y_n, \dots, Y_1) = P(X_{n+h} \mid X_n, \dots, X_1)$, because the set of all possible linear predictors in terms of $Y_n, \dots, Y_1$ is the same as the set of all linear predictors in terms of $X_n, \dots, X_1$. Therefore,

$$P(Y_{n+h} \mid Y_n, \dots, Y_1) = P(X_{n+h} \mid X_n, \dots, X_1) + \mu.$$

That is, the best linear predictor of $Y_{n+h}$ in terms of $Y_n, \dots, Y_1$ is the best linear predictor of $X_{n+h}$ based on $X_n, \dots, X_1$ plus the mean $\mu$. Thus, it is sufficient to restrict attention to only zero-mean processes when doing prediction

fo