Linear Prediction
Say $Y$ and $W_1, \ldots, W_n$ are random variables, and we want to predict $Y$ given $W_1, \ldots, W_n$. That is, we want a function $f(W_1, \ldots, W_n)$ of $W_1, \ldots, W_n$ that is in some sense a good predictor of $Y$. We will restrict our considerations to functions $f$ that are linear in $W_1, \ldots, W_n$, i.e., all predictors are of the form
$$a_0 + a_1 W_n + \cdots + a_n W_1.$$
Let $a = (a_1, \ldots, a_n)^T$, $\mu_Y = E[Y]$, $\mu_i = E[W_i]$ for $i = 1, \ldots, n$, $\mu_W = (\mu_n, \ldots, \mu_1)^T$ and $W = (W_n, \ldots, W_1)^T$ (so $\mu_W = E[W]$). Let
$$\gamma = \mathrm{Cov}(Y, W) = \big(\mathrm{Cov}(Y, W_n), \ldots, \mathrm{Cov}(Y, W_1)\big)^T$$
and let $\Gamma = \mathrm{Cov}(W)$ be the $n \times n$ covariance matrix of $W$, so that
$$\Gamma_{ij} = \mathrm{Cov}(W_{n+1-i}, W_{n+1-j}), \quad i, j = 1, \ldots, n.$$
We will say that the "best" or "optimal" set of coefficients $\{a_0, a_1, \ldots, a_n\}$ are those that minimize the mean squared error
$$E\big[\big(Y - (a_0 + a_1 W_n + \cdots + a_n W_1)\big)^2\big]. \qquad (*)$$
To find the optimal $a_0, a_1, \ldots, a_n$ we differentiate $(*)$ w.r.t. each $a_j$ and set the derivative to 0, then solve for $a_0, a_1, \ldots, a_n$.
Differentiate w.r.t. $a_0$:
$$E\big[Y - (a_0 + a_1 W_n + \cdots + a_n W_1)\big] = 0. \qquad \text{(i)}$$
Differentiate w.r.t. $a_j$, $j = 1, \ldots, n$:
$$E\big[\big(Y - (a_0 + a_1 W_n + \cdots + a_n W_1)\big) W_{n+1-j}\big] = 0. \qquad \text{(ii)}$$
(i) gives
$$E[Y] = a_0 + a_1 E[W_n] + \cdots + a_n E[W_1],$$
so then
$$a_0 = \mu_Y - (a_1 \mu_n + \cdots + a_n \mu_1) = \mu_Y - a^T \mu_W.$$
The predictor, now in terms of $a_1, \ldots, a_n$, is
$$\mu_Y - a^T \mu_W + a_1 W_n + \cdots + a_n W_1 = \mu_Y - a^T \mu_W + a^T W = \mu_Y + a^T (W - \mu_W).$$
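Now, plugging this into (ii) gives
$$E\big[\big(Y - \mu_Y - a^T (W - \mu_W)\big) W_{n+1-j}\big] = 0, \quad j = 1, \ldots, n.$$
Since $E\big[Y - \mu_Y - a^T (W - \mu_W)\big] = 0$ (from (i)), we can also write (ii) as
$$E\big[\big(Y - \mu_Y - a^T (W - \mu_W)\big)\big(W_{n+1-j} - \mu_{n+1-j}\big)\big] = 0,$$
which is the same as
$$\mathrm{Cov}(Y, W_{n+1-j}) - a^T \mathrm{Cov}(W, W_{n+1-j}) = 0,$$
where $\mathrm{Cov}(W, W_{n+1-j})$ is a vector of covariances. Equivalently,
$$\mathrm{Cov}(Y, W_{n+1-j}) - \sum_{i=1}^{n} a_i \,\mathrm{Cov}(W_{n+1-i}, W_{n+1-j}) = 0
\iff \gamma_j - \sum_{i=1}^{n} a_i \Gamma_{ij} = 0
\iff \gamma_j - \sum_{i=1}^{n} \Gamma_{ji} a_i = 0, \quad j = 1, \ldots, n,$$
where the last step uses the symmetry of $\Gamma$, i.e. $(a^T \Gamma)_j = (\Gamma a)_j$. So $\gamma = \Gamma a$. That is, the optimal $a$ satisfies
$$\Gamma a = \gamma.$$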
Notation
We let $P(Y \mid W)$ denote the best linear predictor of $Y$ based on $W = (W_n, \ldots, W_1)^T$, i.e.
$$P(Y \mid W) = \mu_Y + a^T (W - \mu_W),$$
where $a$ is the solution of $\Gamma a = \gamma$.
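As a rough numerical illustration of the derivation above, the sketch below replaces the population moments with sample moments and checks that solving $\Gamma a = \gamma$ gives the same coefficients as minimising the empirical mean squared error directly. The data-generating model, the variable names and the sample size are all assumptions made for the illustration; the ordering of the components of $W$ is arbitrary here and does not affect the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three predictors W with nonzero means, and a response Y.
n_samples = 5_000
mix = np.array([[1.0, 0.5, 0.0],
                [0.0, 1.0, 0.3],
                [0.0, 0.0, 1.0]])
W = rng.normal(size=(n_samples, 3)) @ mix + np.array([2.0, -1.0, 0.5])
Y = 1.5 + W @ np.array([0.7, -0.2, 0.4]) + rng.normal(scale=0.5, size=n_samples)

# Sample versions of mu_W, mu_Y, Gamma = Cov(W) and gamma = Cov(Y, W).
mu_W = W.mean(axis=0)
mu_Y = Y.mean()
Wc = W - mu_W
Yc = Y - mu_Y
Gamma = Wc.T @ Wc / n_samples
gamma = Wc.T @ Yc / n_samples

# Solve Gamma a = gamma, then recover a_0 = mu_Y - a^T mu_W.
a = np.linalg.solve(Gamma, gamma)
a0 = mu_Y - a @ mu_W

def predict(w):
    """Sample version of P(Y | W) = mu_Y + a^T (w - mu_W)."""
    return mu_Y + (w - mu_W) @ a

# Direct minimisation of the empirical MSE over (a_0, a_1, ..., a_n).
design = np.column_stack([np.ones(n_samples), W])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)

print(np.allclose(coef[0], a0), np.allclose(coef[1:], a))
```

The agreement is an algebraic identity for sample moments, so it does not depend on the particular random seed.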
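The minimum mean squared error is
$$\begin{aligned}
E\big[(Y - P(Y \mid W))^2\big]
&= E\big[\big(Y - (\mu_Y + a^T (W - \mu_W))\big)^2\big] \\
&= E\big[\big((Y - \mu_Y) - a^T (W - \mu_W)\big)^2\big] \\
&= E\big[(Y - \mu_Y)^2\big] + E\big[a^T (W - \mu_W)(W - \mu_W)^T a\big] - 2 a^T E\big[(Y - \mu_Y)(W - \mu_W)\big] \\
&= \mathrm{Var}(Y) + a^T \Gamma a - 2 a^T \gamma \\
&= \mathrm{Var}(Y) + a^T \gamma - 2 a^T \gamma \qquad (\text{since } \Gamma a = \gamma) \\
&= \mathrm{Var}(Y) - a^T \gamma.
\end{aligned}$$
The minimum MSE is $\mathrm{Var}(Y) - a^T \gamma$.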
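The MSE formula can be checked with exact covariance algebra rather than simulated data. The sketch below assumes a hypothetical joint covariance matrix for $(Y, W_1, W_2, W_3)$, solves $\Gamma a = \gamma$, and compares $\mathrm{Var}(Y) + b^T \Gamma b - 2 b^T \gamma$ at $b = a$ with $\mathrm{Var}(Y) - a^T \gamma$. It also checks that the left-hand side of (ii) in covariance form, $\gamma - \Gamma a$, vanishes, and that perturbing $a$ increases the MSE. The matrix `A` and the perturbation size are illustrative choices only.

```python
import numpy as np

# Hypothetical joint covariance of (Y, W_1, W_2, W_3), built as A A^T so
# that it is symmetric positive definite.
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.2, 0.3, 1.0, 0.0],
              [0.1, 0.4, 0.6, 1.0]])
Sigma = A @ A.T

var_Y = Sigma[0, 0]
gamma = Sigma[1:, 0]     # Cov(Y, W)
Gamma = Sigma[1:, 1:]    # Cov(W)

a = np.linalg.solve(Gamma, gamma)

def mse(b):
    """E[((Y - mu_Y) - b^T (W - mu_W))^2] for a coefficient vector b."""
    return var_Y + b @ Gamma @ b - 2 * b @ gamma

min_mse = var_Y - a @ gamma

# Covariance between the prediction error and W: gamma - Gamma a.
resid_cov = gamma - Gamma @ a

# MSE at a few randomly perturbed coefficient vectors.
rng = np.random.default_rng(1)
perturbed = [mse(a + 0.1 * rng.normal(size=3)) for _ in range(5)]

print(np.isclose(mse(a), min_mse), np.allclose(resid_cov, 0.0))
```

Because $\Gamma$ is positive definite in this example, every perturbed value in `perturbed` exceeds `min_mse` by $d^T \Gamma d$, where $d$ is the perturbation.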
Note that when we are predicting a random variable $Y$ using MSE as the criterion, we are assuming that $\mathrm{Var}(Y) < \infty$; otherwise the MSE would be equal to $\infty$ for any predictor. Also, note that from (i) and (ii) from the previous lecture, (i) is saying that $P(Y \mid W)$, which is the best linear predictor, gives residuals which are zero mean, and (ii) is saying that
$$\mathrm{Cov}\big(Y - P(Y \mid W),\, W\big) = 0;$$
that is, the residuals from the best linear predictor $P(Y \mid W)$ are uncorrelated with the predictor variables $W$.

Properties of the Best Linear Predictor
We can treat the predictor $P(Y \mid W)$ as an operator $P(\,\cdot \mid W)$ acting on random variables, taking them to their best linear prediction based on $W$. As an operator, $P(\,\cdot \mid W)$ is a linear operator. Suppose $\alpha_1$, $\alpha_2$ and $\beta$ are real numbers and $U$ and $V$ are random variables. Then we have the following property:

① $P(\alpha_1 U + \alpha_2 V + \beta \mid W) = \alpha_1 P(U \mid W) + \alpha_2 P(V \mid W) + \beta$

Proof.
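The LHS is
$$\alpha_1 E[U] + \alpha_2 E[V] + \beta + a^T (W - \mu_W),$$
where $a$ satisfies
$$\Gamma a = \mathrm{Cov}(\alpha_1 U + \alpha_2 V + \beta,\, W). \qquad \text{(a)}$$
On the RHS, $P(U \mid W) = E[U] + (a^{(1)})^T (W - \mu_W)$ and $P(V \mid W) = E[V] + (a^{(2)})^T (W - \mu_W)$, where $a^{(1)}$ satisfies
$$\Gamma a^{(1)} = \mathrm{Cov}(U, W) \qquad \text{(b)}$$
and $a^{(2)}$ satisfies
$$\Gamma a^{(2)} = \mathrm{Cov}(V, W), \qquad \text{(c)}$$
and the RHS is
$$\alpha_1 E[U] + \alpha_1 (a^{(1)})^T (W - \mu_W) + \alpha_2 E[V] + \alpha_2 (a^{(2)})^T (W - \mu_W) + \beta
= \alpha_1 E[U] + \alpha_2 E[V] + \beta + \big(\alpha_1 a^{(1)} + \alpha_2 a^{(2)}\big)^T (W - \mu_W).$$
So the LHS will equal the RHS if $a = \alpha_1 a^{(1)} + \alpha_2 a^{(2)}$. So we will show that $\alpha_1 a^{(1)} + \alpha_2 a^{(2)}$ satisfies (a):
$$\Gamma\big(\alpha_1 a^{(1)} + \alpha_2 a^{(2)}\big) = \mathrm{Cov}(\alpha_1 U + \alpha_2 V + \beta,\, W).$$
But
$$\mathrm{Cov}(\alpha_1 U + \alpha_2 V + \beta,\, W) = \alpha_1 \mathrm{Cov}(U, W) + \alpha_2 \mathrm{Cov}(V, W) + \mathrm{Cov}(\beta, W) = \alpha_1 \mathrm{Cov}(U, W) + \alpha_2 \mathrm{Cov}(V, W),$$
since $\mathrm{Cov}(\beta, W) = 0$, because $\beta$ is a constant. Thus, we want to show
$$\Gamma\big(\alpha_1 a^{(1)} + \alpha_2 a^{(2)}\big) = \alpha_1 \mathrm{Cov}(U, W) + \alpha_2 \mathrm{Cov}(V, W).$$
But from (b) and (c), $\Gamma a^{(1)} = \mathrm{Cov}(U, W)$ and $\Gamma a^{(2)} = \mathrm{Cov}(V, W)$. So we get
$$\Gamma\big(\alpha_1 a^{(1)} + \alpha_2 a^{(2)}\big) = \alpha_1 \Gamma a^{(1)} + \alpha_2 \Gamma a^{(2)} = \alpha_1 \mathrm{Cov}(U, W) + \alpha_2 \mathrm{Cov}(V, W).$$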
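The linearity argument can be mirrored numerically. The sketch below assumes a hypothetical joint covariance matrix for $(U, V, W_1, W_2, W_3)$ and checks that the coefficient vector obtained for $\alpha_1 U + \alpha_2 V + \beta$ equals $\alpha_1 a^{(1)} + \alpha_2 a^{(2)}$. The specific matrix and the values of `alpha1` and `alpha2` are illustrative assumptions; $\beta$ does not appear because $\mathrm{Cov}(\beta, W) = 0$.

```python
import numpy as np

# Hypothetical joint covariance of (U, V, W_1, W_2, W_3).
rng = np.random.default_rng(2)
B = rng.normal(size=(5, 5))
Sigma = B @ B.T + 0.5 * np.eye(5)   # symmetric positive definite

cov_U_W = Sigma[2:, 0]              # Cov(U, W)
cov_V_W = Sigma[2:, 1]              # Cov(V, W)
Gamma = Sigma[2:, 2:]               # Cov(W)

alpha1, alpha2 = 2.0, -3.0          # illustrative constants

a1 = np.linalg.solve(Gamma, cov_U_W)   # coefficients for P(U | W)
a2 = np.linalg.solve(Gamma, cov_V_W)   # coefficients for P(V | W)

# Coefficients for predicting alpha1*U + alpha2*V + beta directly, using
# Cov(alpha1*U + alpha2*V + beta, W) = alpha1*Cov(U, W) + alpha2*Cov(V, W).
cov_lin_W = alpha1 * cov_U_W + alpha2 * cov_V_W
a = np.linalg.solve(Gamma, cov_lin_W)

print(np.allclose(a, alpha1 * a1 + alpha2 * a2))
```

This only restates the proof in floating-point arithmetic; it is meant as a sanity check on the bookkeeping, not as independent evidence.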
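We can extend ① to $\alpha_1 U_1 + \cdots + \alpha_n U_n + \beta$, where $\alpha_1, \ldots, \alpha_n, \beta$ are constants and $U_1, \ldots, U_n$ are random variables. We have
$$\begin{aligned}
P(\alpha_1 U_1 + \cdots + \alpha_n U_n + \beta \mid W)
&= \alpha_1 P(U_1 \mid W) + P(\alpha_2 U_2 + \cdots + \alpha_n U_n \mid W) + \beta \quad \text{by ①} \\
&= \alpha_1 P(U_1 \mid W) + \alpha_2 P(U_2 \mid W) + P(\alpha_3 U_3 + \cdots + \alpha_n U_n \mid W) + \beta \quad \text{by ① again} \\
&= \alpha_1 P(U_1 \mid W) + \alpha_2 P(U_2 \mid W) + \cdots + \alpha_n P(U_n \mid W) + \beta.
\end{aligned}$$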
If the $U_i$'s are equal to the $W_i$'s, then we have

② $P(\alpha_1 W_1 + \cdots + \alpha_n W_n + \beta \mid W) = \alpha_1 W_1 + \cdots + \alpha_n W_n + \beta$

By property ①,
$$P(\alpha_1 W_1 + \cdots + \alpha_n W_n + \beta \mid W) = \alpha_1 P(W_1 \mid W) + \cdots + \alpha_n P(W_n \mid W) + \beta.$$
So it is sufficient to show that $P(W_i \mid W) = W_i$ for $i = 1, \ldots, n$. But
$$P(W_i \mid W) = E[W_i] + a^T (W - \mu_W),$$
where $a$ satisfies $\Gamma a = \mathrm{Cov}(W_i, W)$. But $\mathrm{Cov}(W_i, W)$ is just the $(n+1-i)$th column of $\Gamma$. So the solution to $\Gamma a = \{(n+1-i)\text{th column of } \Gamma\}$ is
$$a = (0, \ldots, 0, 1, 0, \ldots, 0)^T,$$
with the 1 in the $(n+1-i)$th component. So
$$P(W_i \mid W) = E[W_i] + (W_i - \mu_i) = W_i.$$
If $U$ is uncorrelated with $W$ then we have

③ $P(U \mid W) = E[U]$ if $\mathrm{Cov}(U, W) = 0$ (the vector of $n$ zeroes).

We have $P(U \mid W) = E[U] + a^T (W - \mu_W)$, where $a$ satisfies $\Gamma a = \mathrm{Cov}(U, W) = 0$. The solution to this is $a = 0$.
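Properties ② and ③ both come down to solving $\Gamma a = \gamma$ for a special right-hand side, which is easy to see numerically. The sketch below assumes a hypothetical nonsingular $\Gamma$ with $n = 4$ and uses the document's ordering $W = (W_n, \ldots, W_1)^T$, so that $\mathrm{Cov}(W_i, W)$ is column $n + 1 - i$ of $\Gamma$ (column index `n - i` with 0-based indexing). The matrix and the choice `i = 3` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4))
Gamma = B @ B.T + 0.5 * np.eye(4)   # hypothetical nonsingular Cov(W), n = 4
n = Gamma.shape[0]

# Property 2: predicting W_i from W. Cov(W_i, W) is column n + 1 - i of
# Gamma (1-based), which is index n - i with 0-based indexing.
i = 3
col = n - i
a_prop2 = np.linalg.solve(Gamma, Gamma[:, col])
unit = np.zeros(n)
unit[col] = 1.0                     # expected solution: a unit vector

# Property 3: Cov(U, W) is the zero vector, so the right-hand side is 0.
a_prop3 = np.linalg.solve(Gamma, np.zeros(n))

print(np.allclose(a_prop2, unit), np.allclose(a_prop3, 0.0))
```

With `a_prop2` equal to the unit vector, $a^T (W - \mu_W)$ picks out $W_i - \mu_i$, which is the last step of the argument for ②.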
We have discussed the linear predictor $P(\,\cdot \mid W)$ and its properties. In the context of time series we are mostly interested in forecasting.

Forecasting
In $P(Y \mid W)$, if we take $Y = X_{n+h}$ and $W = (X_n, \ldots, X_1)^T$, where $\{X_t\}$ is a stationary time series, then linear prediction in this context is called forecasting. The authors use the notation $P_n X_{n+h}$ to mean $P(X_{n+h} \mid X_n, \ldots, X_1)$. Here the subscript $n$ refers to the number of past values of the time series we are basing the prediction on, and $X_{n+h}$ is the random variable we are trying to predict.

Example
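One-step prediction of an AR(1) process, $|\phi| < 1$. We want $P_n X_{n+1} = P(X_{n+1} \mid X_n, \ldots, X_1)$.

Method 1 (Direct method). We have
$$\Gamma = \begin{pmatrix}
\gamma_X(0) & \gamma_X(1) & \cdots & \gamma_X(n-1) \\
\gamma_X(1) & \gamma_X(0) & \cdots & \gamma_X(n-2) \\
\vdots & \vdots & \ddots & \vdots \\
\gamma_X(n-1) & \gamma_X(n-2) & \cdots & \gamma_X(0)
\end{pmatrix}
= \frac{\sigma^2}{1 - \phi^2}
\begin{pmatrix}
1 & \phi & \cdots & \phi^{n-1} \\
\phi & 1 & \cdots & \phi^{n-2} \\
\vdots & \vdots & \ddots & \vdots \\
\phi^{n-1} & \phi^{n-2} & \cdots & 1
\end{pmatrix}.$$
Also, $\gamma_j = \mathrm{Cov}(X_{n+1}, X_{n+1-j}) = \gamma_X(j) = \dfrac{\sigma^2 \phi^j}{1 - \phi^2}$, so
$$\gamma = \frac{\sigma^2}{1 - \phi^2} (\phi, \phi^2, \ldots, \phi^n)^T.$$
So $\Gamma a = \gamma$ is given by
$$\begin{pmatrix}
1 & \phi & \cdots & \phi^{n-1} \\
\phi & 1 & \cdots & \phi^{n-2} \\
\vdots & \vdots & \ddots & \vdots \\
\phi^{n-1} & \phi^{n-2} & \cdots & 1
\end{pmatrix}
\begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
= \begin{pmatrix} \phi \\ \phi^2 \\ \vdots \\ \phi^n \end{pmatrix}.$$
Note that the first column of $\Gamma$ multiplied by $\phi$ is $\gamma$. Therefore, we can see by inspection that the solution is $a = (\phi, 0, \ldots, 0)^T$. Then the best linear predictor is
$$P_n X_{n+1} = E[X_{n+1}] + a^T (W - \mu_W), \quad W = (X_n, \ldots, X_1)^T.$$
Here, $E[X_i] = 0$ for all $i$, and so we get $P_n X_{n+1} = \phi X_n$ as the best linear predictor of $X_{n+1}$.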
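The "by inspection" step of the direct method can be confirmed by building $\Gamma$ and $\gamma$ and solving the system numerically. The sketch below is a minimal check under assumed parameter values (`phi = 0.6`, `sigma2 = 2.0`, `n = 6`); the function names are hypothetical. As an extra, the helper accepts a horizon `h` and uses $\gamma_j = \mathrm{Cov}(X_{n+h}, X_{n+1-j}) = \gamma_X(h + j - 1)$, which reduces to the one-step case above when `h = 1`.

```python
import numpy as np

def ar1_acvf(h, phi, sigma2):
    """Autocovariance gamma_X(h) = sigma^2 * phi^|h| / (1 - phi^2) of a causal AR(1)."""
    return sigma2 * phi ** np.abs(h) / (1.0 - phi ** 2)

def ar1_forecast_coefficients(phi, sigma2, n, h=1):
    """Solve Gamma a = gamma for predicting X_{n+h} from (X_n, ..., X_1)."""
    lags = np.arange(n)
    # Gamma_ij = gamma_X(i - j): Toeplitz matrix of autocovariances.
    Gamma = ar1_acvf(lags[:, None] - lags[None, :], phi, sigma2)
    # gamma_j = gamma_X(h + j - 1) for j = 1, ..., n.
    gamma = ar1_acvf(h + lags, phi, sigma2)
    return np.linalg.solve(Gamma, gamma)

phi, sigma2, n = 0.6, 2.0, 6        # illustrative parameter values

a_one_step = ar1_forecast_coefficients(phi, sigma2, n)
a_three_step = ar1_forecast_coefficients(phi, sigma2, n, h=3)

print(np.round(a_one_step, 10))
print(np.round(a_three_step, 10))
```

Up to rounding, only the first coefficient (the one multiplying $X_n$) is nonzero, equal to $\phi$ for `h = 1` and to $\phi^3$ for `h = 3`, since $\phi^h$ times the first column of $\Gamma$ reproduces $\gamma$.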
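Method 2 (use the properties of $P(\,\cdot \mid W)$). We have $X_{n+1} = \phi X_n + Z_{n+1}$, where $\{Z_t\}$ is a zero-mean WN$(0, \sigma^2)$ process. Therefore,
$$\begin{aligned}
P_n X_{n+1} &= P_n(\phi X_n + Z_{n+1}) \\
&= \phi P_n X_n + P_n Z_{n+1} \quad \text{by linearity of the prediction operator} \\
&= \phi X_n + P_n Z_{n+1} \quad \text{by property ②} \\
&= \phi X_n + 0 \quad \text{by property ③} \\
&= \phi X_n.
\end{aligned}$$
Property ③ applies because, for $|\phi| < 1$, $Z_{n+1}$ has mean zero and is uncorrelated with $X_n, \ldots, X_1$.

Example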
$h$-step prediction of an AR(1) process with $|\phi| < 1$, $h \ge 1$. We want
$$P_n X_{n+h} = P_n(\phi X_{n+h-1} + Z_{n+h}) = \phi P_n X_{n+h-1} = \phi^2 P_n X_{n+h-2} = \cdots = \phi^h P_n X_n = \phi^h X_n.$$

Example
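One-step prediction for an AR($p$) process. Let $\{X_t\}$ be a zero-mean, stationary process satisfying
$$X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \cdots + \phi_p X_{t-p} + Z_t,$$
where $\{Z_t\}$ is a zero-mean WN$(0, \sigma^2)$ process. We wish to predict $X_{n+1}$ in terms of $X_n, \ldots, X_1$. That is, we wish to compute
$$\begin{aligned}
P_n X_{n+1} &= P_n(\phi_1 X_n + \cdots + \phi_p X_{n+1-p} + Z_{n+1}) \\
&= \phi_1 P_n X_n + \cdots + \phi_p P_n X_{n+1-p} + P_n Z_{n+1} \\
&= \phi_1 X_n + \cdots + \phi_p X_{n+1-p} + P_n Z_{n+1},
\end{aligned}$$
where the last step uses property ② and requires $n \ge p$, so that each of $X_n, \ldots, X_{n+1-p}$ is among $X_n, \ldots, X_1$. If $\{X_t\}$ is causal (we have not discussed conditions for an AR($p$) process to be causal), then we get that $P_n Z_{n+1} = 0$ and then
$$P_n X_{n+1} = \phi_1 X_n + \cdots + \phi_p X_{n+1-p}.$$

Note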
If $\{Y_t\}$ is a stationary process with nonzero mean, say $\mu$, then we can write $Y_t = X_t + \mu$, where $\{X_t\}$ is stationary and zero-mean. Then
$$P_n Y_{n+h} = P_n(X_{n+h} + \mu) = P_n X_{n+h} + P_n \mu = P_n X_{n+h} + \mu.$$
Here $P_n Y_{n+h} = P(Y_{n+h} \mid Y_n, \ldots, Y_1)$. So above we have that
$$P(Y_{n+h} \mid Y_n, \ldots, Y_1) = P(X_{n+h} \mid Y_n, \ldots, Y_1) + \mu.$$
However, $P(X_{n+h} \mid Y_n, \ldots, Y_1) = P(X_{n+h} \mid X_n, \ldots, X_1)$, because the set of all possible linear predictors in terms of $Y_n, \ldots, Y_1$ is the same as the set of all linear predictors in terms of $X_n, \ldots, X_1$. Therefore,
$$P(Y_{n+h} \mid Y_n, \ldots, Y_1) = P(X_{n+h} \mid X_n, \ldots, X_1) + \mu.$$
That is, the best linear predictor of $Y_{n+h}$ in terms of $Y_n, \ldots, Y_1$ is the best linear predictor of $X_{n+h}$ based on $X_n, \ldots, X_1$ plus the mean $\mu$. Thus, it is sufficient to restrict attention to only zero-mean processes when