Time Series Analysis

(1)

Time Series Analysis

Henrik Madsen

[email protected]

Informatics and Mathematical Modelling Technical University of Denmark

(2)

Outline of the lecture

Recursive and adaptive estimation: Introduction to Chapter 11

Recursive LS, Section 11.1 Cursory material:

(3)

Why recursive and adaptive estimation?

0 2000 4000 6000 8000 500 1500 2500 Heat Consumption (GJ/h) 0 2000 4000 6000 8000 30 20 10 0 −10 Temperature (°C)

As time passes we get more information Models are approximations

The best approximation change over time

Makes it possible to produce software which learns as new data becomes available

(4)

Types of models considered

REG: Yt = µ + β1U1_,t + β2U2_,t + . . . + β_mU_m,t + ε_t FIR: Y_t = µ + ω(B)U_t + ε_t = µ + ω0U_t + ω1U_t−1 + +. . . ωsUt−s + εt AR: φ(B)Yt = µ + εt ⇔ Yt = µ − φ1Y_t−1 − φ2Yt−2 − . . . − φpYt−p + εt ARX: φ(B)Yt = µ + ω(B)Ut + εt ⇔ Y_t = µ − φ1Y_t−1 − . . . − φpYt−p + ω0Ut + . . . + ωsUt−s + εt

(5)

Generic form of the models considered

Yt = xT_t θ + εt = θ1x1_,t + θ2x2_,t + . . . + θ_ℓx_ℓ,t + ε_t Example: Yt = µ · _|{z}1 x1_,t + φ2 · (−Y_t−2) | {z } x2_,t + ω1 · U_t−1 |{z} x3_,t + εt

(6)

LS-estimate at time

t

Model:

Yt = xT_t θ + εt

Data (x _{may contain lagged values of the “real” input):}

Y1, Y2, Y3, Y4, . . . , Y_t−1, Yt

x₁_, x₂_, x₄_, x₄_{, . . . ,}x_t₋₁_, x_t

LS-estimate based on t observations:

b θ_t _{= arg min} θ St( θ₎ S_t(θ_{) =} t X s=1 (Y_s − xT s θ) 2

(7)

LS-estimate at time

t

Model:

Yt = xT_t θ + εt

Data (x _{may contain lagged values of the “real” input):}

Y1, Y2, Y3, Y4, . . . , Y_t−1, Yt

x₁_, x₂_, x₄_, x₄_{, . . . ,}x_t₋₁_, x_t

LS-estimate based on t observations:

b θ_t _{= arg min} θ St( θ_{) = (}XTX₎−1 XT Y S_t(θ_{) =} t X s=1 (Y_s − xT s θ) 2

(8)

From one time step to the next (in an easy way)

The trick is to realize that:

R_t ₌ XT X ₌ x₁xT₁ ₊ x₂xT₂ ₊ _{. . .} ₊ x_txT t = t X s=1 x_sxT s h_t ₌ XTY ₌ x_1Y1 ₊ x₂_Y2 ₊ _{. . .} ₊ x_t_Y_t ₌ t X s=1 x_s_Y_s Where: x_txT t =      x1_,tx1_,t x1_,tx2_,t · · · x1_,tx_ℓ,t x2_,tx1_,t x2_,tx2_,t · · · x2_,tx_ℓ,t .. . ... . .. ... x_ℓ,tx1_,t x_ℓ,tx2_,t · · · _x_ℓ,t_x_ℓ,t      xtYt =      x1_,tY_t x2_,tY_t .. . x_ℓ,tY_t     

(9)

From one time step to the next (cont’nd)

b θ_t ₌ R−1 t ht R_t ₌ t X s=1 x_sxt s = xtxTt + t−1 X s=1 x_sxT s = xtxTt + Rt−1 h_t ₌ t X s=1 x_s_Y_s ₌ x_t_Y_t ₊ t−1 X s=1 x_s_Y_s ₌ x_t_Y_t ₊ h_t₋₁ Initialization: R₀ ₌ 0 _{(matrix of zeros)} h₀ ₌ 0 _{(vector of zeros)}

(10)

Other formulations

1. Eliminating h_t_: R_t ₌ x_txT t + Rt−1 b θ_t ₌ θb_t₋₁ ₊ R−1 t xt(Yt − x T t θbt−1)

2. Eliminating h_t _{and avoiding matrix-inversion:}

K_t ₌ Pt−1xt 1 + xT t Pt−1xt b θ_t ₌ θb_t₋₁ ₊ K_t₍_Y_t ₋ xT t θbt−1) P_t ₌ P_t₋₁ ₋ Pt−1xtx T t Pt−1 1 + xT t Pt−1xt

(11)

Example:

HC

_t

=

µ

+

θ

₁

T

_t

+

ε

_t

Hour from start

Temperature 0 2000 4000 6000 8000 −10 0 10 20 30

Hour from start

Consumption

0 2000 4000 6000 8000

500

1500

(12)

Example:

HC

_t

=

µ

+

θ

₁

T

_t

+

ε

_t

Hour from start

Intercept

0 2000 4000 6000 8000

0

500

1000

Hour from start

Slope 0 2000 4000 6000 8000 −60 −20 0 20

(13)

Forgetting old observations

So far we have a way of updating the estimates as the data set grows

If we want a method which forgets old observations we apply weights which start at 1 and goes to 0 when observations gets old: b θ_t _{= arg min} θ St( θ₎ St(θ) = t X s=1 β(t, s)(Ys − xT_s θ) 2

(14)

Forgetting old observations

So far we have a way of updating the estimates as the data set grows

If we want a method which forgets old observations we apply weights which start at 1 and goes to 0 when observations gets old: b θ_t _{= arg min} θ St( θ_{) = (}XT W X₎−1 XTW Y St(θ) = t X s=1 β(t, s)(Ys − xT_s θ) 2 where W ₌ _diag₍_β₍_t,₁₎_{, β}₍_t,₂₎_{, . . . , β}₍_{t, t} ₋ ₁₎_, ₁₎

(15)

Exponential decay of weights

Let’s first consider β(t, s) = λt−s

(0 < λ ≤ 1)

λ = 1_: What we did with the previous algorithms

λ < 1_: We forget in an exponential manner

Observation number

Weight

0 5 10 15 20 25 30

0 1

In the general case it turns out that if the sequence of weights can be written

β(t, s) = λ(t)β(t − 1_{, s}) 1 ≤ _s ≤ _t − 1 β(t, t) = 1

(16)

The Adaptive Recursive LS algorithm

R_t ₌ x_txT t + λ(t)Rt−1 h_t ₌ x_t_Y_t ₊ _λ₍_t₎h_t₋₁ b θ_t ₌ R−1 t ht

(17)

Other formulations

1. Eliminating h_t_: R_t ₌ x_txT t + λ(t)Rt−1 b θ_t ₌ θb_t₋₁ ₊ R−1 t xt(Yt − x T t θbt−1)

2. Eliminating h_t _{and avoiding matrix-inversion:}

K_t ₌ Pt−1xt λ(t) + xT t Pt−1xt b θ_t ₌ θb_t₋₁ ₊ K_t₍_Y_t ₋ xT t θbt−1) P_t ₌ 1 λ(t) P_t₋₁ ₋ Pt−1xtx T t Pt−1 λ(t) + xT t Pt−1xt

(18)

Constant forgetting

If λ(t) = λ we call λ the forgetting factor and define the

memory as T0 = ∞ X i=0 λi = 1 + λ + λ2 + λ3 + λ4 + . . . = 1 1 − _λ

Given a data set an optimal value of λ can be found by “trial

and error”

It is often a good idea to select the values of λ to be

investigated so that the corresponding values of T0 are

approximately equidistant

The criteria to evaluate may depend on the application, but the sum of squared one-step prediction errors is often appropriate The initialization period should be excluded from the evaluation

(19)

Variable forgetting

Many methods exists

One is based on the aim of keeping the criteria functions

defining the estimates constant at S0

Leads to: λ(t) = 1 − ε 2 t S0 1 + xT t P t−1xt

A lower bound λmin on λ(t) should be applied

(20)

Example:

HC

_t

=

µ

+

θ

₁

T

_t

+

ε

_t

Hour from start

Temperature 0 2000 4000 6000 8000 −10 0 10 20 30

Hour from start

Consumption

0 2000 4000 6000 8000

500

1500

(21)

Example:

HC

_t

=

µ

+

θ

₁

T

_t

+

ε

_t

,

λ

= 0

.

995

Hour from start

Intercept

0 2000 4000 6000 8000

0

500

1500

Hour from start

Slope 0 2000 4000 6000 8000 −80 −40 0 20

(22)

Example (cont’nd)

Hour from start

Temperature (blue) 0 2000 4000 6000 8000 −10 0 10 20 30 500 1500 2500 Consumption (red)

Hour from start

Intercept (black) 0 2000 4000 6000 8000 0 500 1500 −80 −40 0 20 Slope (green)