Time Series Analysis 3 DS351

(1)

Time Series Analysis 3

DS351

(2)

Holt’s linear trend method

I Like exponential smoothing but with trend element.

I Use when there is no seasonallity.

I If there is seasonallity, use Holt-Winters instead (next model) Holt’s method

Forecast equation yˆt+h|t=`t+hbt

Level equation `t=αyt+ (1−α)(`t−1+bt−1)

Trend equation bt=β(`t−`t−1) + (1−β)bt−1,

(3)

Holt’s method Forecast equation yˆt+h|t=`t+hbt Level equation `t=αyt+ (1−α)(`t−1+bt−1) Trend equation bt=β(`t−`t−1) + (1−β)bt−1, I lt is the level (estimate of yt). I bt is the slope.

I Suppose that we have

(4)

Holt’s linear trend method

Holt’s method Forecast equation yˆt+h|t=`t+hbt Level equation `t=αyt+ (1−α)(`t−1+bt−1) Trend equation bt=β(`t−`t−1) + (1−β)bt−1, I lt is the “average” between yt and lt−1+bt−1. I Find lt−1+bt−1.

(5)

Holt’s method Forecast equation yˆt+h|t=`t+hbt Level equation `t=αyt+ (1−α)(`t−1+bt−1) Trend equation bt=β(`t−`t−1) + (1−β)bt−1, I lt is the “average” between yt and lt−1+bt−1. I Find lt−1+bt−1. I Then find lt.

(6)

Holt’s method Forecast equation yˆt+h|t =`t+hbt Level equation `t=αyt+ (1−α)(`t−1+bt−1) Trend equation bt=β(`t−`t−1) + (1−β)bt−1, I bt is the “average” between lt−lt−1 and bt−1.

I Start the first forecast ˆ

(7)

ˆ

y

t+h|t

=

`

t

+

hb

t

(8)

Air passengers data

Year Time Observation Level Slope Forecast

t yt `t bt yt|t1 1989 0 15.57 2.102 1990 1 17.55 17.57 2.102 17.67 1991 2 21.86 21.49 2.102 19.68 1992 3 23.89 23.84 2.102 23.59 1993 4 26.93 26.76 2.102 25.94 .. . ... ... ... ... ... 2016 27 72.60 72.50 2.102 72.02 h yˆt+h|t 1 74.60 2 76.70 3 78.80 4 80.91 5 83.01

(9)

Damped Holt’s method

I Linear trend is not realistic in many situations.

I Examples: Total factory output with a fixed number of machine.

Damped Holt’s method(Gardner & McKenzie, 1985) Fix 0≤φ≤1 ˆ y_t₊h|t =`t+ (φ+φ2+· · ·+φh)bt `t =αyt+ (1−α)(`t−1+φbt−1) bt =β∗(`t−`t−1) + (1−β∗)φbt−1. I φ= 1→Holt’s method

I φ= 0→forecast with a constant

(10)

Damped Holt’s method

I Linear trend is not realistic in many situations.

I Examples: Total factory output with a fixed number of machine.

Damped Holt’s method(Gardner & McKenzie, 1985) Fix 0≤φ≤1 ˆ y_t₊h|t =`t+ (φ+φ2+· · ·+φh)bt `t =αyt+ (1−α)(`t−1+φbt−1) bt =β∗(`t−`t−1) + (1−β∗)φbt−1. I φ= 1→Holt’s method

I φ= 0→forecast with a constant

(11)

Air passengers data

φ= 0.9 25 50 75 100 1990 2000 2010 2020 2030 Year Air passengers in A ustr alia (millions) Forecast

Damped Holt's method Holt's method

(12)

Holt-Winters’ seasonal method

I Use this method when there is seasonality.

ˆ

y_t₊h|t=`t+hbt+s

t-`t=α(yt−st−m) + (1−α)(`t−1+bt−1) bt=β(`t−`t−1) + (1−β)bt−1

st=γ(yt−`t−1−bt−1) + (1−γ)st−m,

I Basically Holt’s method + seasonality.

I m is the frequency of seasonality e.g. m= 12 for monthly data.

I lt is the “average” between observation with seasonality removed yt−st−m andlt−1+bt−1.

I st is the “average” between observation with level and trend removedyt−lt−1−bt−1 and the value of previous

(13)

Holt-Winters’ seasonal method

Holt-Winters’ method ˆ yt+h|t=`t+hbt+st− `t=α(yt−st−m) + (1−α)(`t−1+bt−1) bt=β(`t−`t−1) + (1−β)bt−1 st=γ(yt−`t−1−bt−1) + (1−γ)st−m,

I t− inst− is the latest time in the sample that has the same seasonal index ast+h.

I For example, if t= January, 2019 andt+h= March, 2019 then t−= March, 2018.

I There are a lot of parameters now: α,β,γ,`0,b0,s−m+1, s−m+2, . . . ,s0.

(14)

Holt-Winters’ multiplicative method

We can replace

add by st →multiply byst

subtract by st →divide byst. Holt-Winters’ multiplicative method

ˆ yt+h|t = (`t+hbt)st− `t =α yt st−m + (1−α)(`t−1 +bt−1) bt =β∗(`t−`t−1) + (1−β∗)bt−1 st =γ yt (`t−1+bt−1) + (1−γ)st−m.

(15)

International visitors nights in Australia

40 60 80 2005 2010 2015 Year

Visitor nights (millions)

Forecast

HW additive forecasts HW multiplicative forecasts International visitors nights in Australia

(16)

Forecasts using Holt-Winters’ method

t yt `t bt st yt 2004 Q1 -3 9.70 2004 Q2 -2 -9.31 2004 Q3 -1 -1.69 2004 Q4 0 32.26 0.70 1.31 2005 Q1 1 42.21 32.82 0.70 9.50 42.66 2005 Q2 2 24.65 33.66 0.70 -9.13 24.21 .. . ... ... ... ... ... ... 2015 Q4 44 66.06 63.22 0.70 2.35 64.22 h yt+h|t 2016 Q1 1 76.10 2016 Q2 2 51.60 2016 Q3 3 63.97 2016 Q4 4 68.37 2017 Q1 5 78.90 2017 Q2 6 54.41

(17)

AutoRegressive Integrated Moving Average

(ARIMA)

(18)

Backshift operator

I This is where differencing is important.

I Backshift operator B helps us with differencing.

Byt =yt−1.

Thus the first difference can be written as

y_t0 =yt−yt−1 =

The second difference is

y_t00=

(19)

Backshift operator

Byt =yt−1.

y_t0 =yt−yt−1 =

y_t00=

(20)

Backshift operator

Byt =yt−1.

y_t0 =yt−yt−1 =

y_t00=

(21)

Backshift operator

Byt =yt−1.

y_t0 =yt−yt−1 =

y_t00=

(22)

Autoregressive model

An autoregressive model of orderp,AR(p), is

yt =c+φ1yt−1+φ2yt−2+· · ·+φpyt−p+εt, which is a multiple linear regression withyt−p, . . . ,yt−1 as

(23)

Moving average model

A moving average model,MA(q), uses past errorsto forecast.

(24)

ARIMA model

AutoRegressive Integrated Moving Average (ARIMA) is a combination of both AR and MA models:

y_t0 =c+φ1yt−0 1+· · ·+φpyt−p0 +θ1εt−1+· · ·+θqεt−q+εt, (1) wherey_t0 is the differenced series. Thus there are a total of p+q

predictors. We call this anARIMA(p,d,q) model.

The value ofc andd have the following effects on the long-term forecast.

c = 0 d = 0 forecasts go to zero.

c = 0 d = 1 forecasts go to a non-zero constant.

c = 0 d = 2 forecasts follow a straight line.

c 6= 0 d = 0 forecasts go to the mean of the data.

c 6= 0 d = 1 forecasts follow a straight line.

(25)

ARIMA model

AutoRegressive Integrated Moving Average (ARIMA) is a combination of both AR and MA models:

y_t0 =c+φ1yt−0 1+· · ·+φpyt−p0 +θ1εt−1+· · ·+θqεt−q+εt, (1) wherey_t0 is the differenced series. Thus there are a total of p+q

predictors. We call this anARIMA(p,d,q) model.

The value ofc andd have the following effects on the long-term forecast.

c = 0 d = 0 forecasts go to zero.

c = 0 d = 1 forecasts go to a non-zero constant.

c = 0 d = 2 forecasts follow a straight line.

c 6= 0 d = 0 forecasts go to the mean of the data.

c 6= 0 d = 1 forecasts follow a straight line.

(26)

Example: Quarterly US consumption

−1 0 1 2 2000 2005 2010 2015 2020 Time uschange[, "Consumption"]

(27)

Find

p

and

q

from ACF and PACF plots

−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag A CF

Series: uschange[, "Consumption"]

−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag P A CF

ACF and PACF plots of differenced datacan help us find

I p in ARIMA(p,d,0)

I q in ARIMA(0,d,q)

(28)

Find

p

and

q

from ACF and PACF plots

−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag A CF

−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag P A CF

The data may follow an ARIMA(p,d,0) model if

I the ACF is exponentially decaying or sinusoidal

I there is a significant spike at lag p in the PACF, but none beyond lagp.

(29)

Find

p

and

q

from ACF and PACF plots

−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag A CF

−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag P A CF

The data may follow an ARIMA(0,d,q) model if

I the PACF is exponentially decaying or sinusoidal

I there is a significant spike at lag q in the ACF, but none beyond lagq.

(30)

Model selection

Our “score” of ARIMA model is based on thelikelihood

L=P(data|model)

For example: We assume that a coin has probability 0.2 of turning head (the model isP(X =H) = 0.2).

If the results of five coin tosses areH,H,T,H,T, then the likelihood of the data given the model is

P(H,H,T,H,T) = (0.2)3(0.8)2.

(31)

Model selection

Our “score” of ARIMA model is based on thelikelihood

L=P(data|model)

For example: We assume that a coin has probability 0.2 of turning head (the model isP(X =H) = 0.2).

If the results of five coin tosses areH,H,T,H,T, then the likelihood of the data given the model is

P(H,H,T,H,T) = (0.2)3(0.8)2.

(32)

Model selection

Choosep andq, find parameters and compute L=L(data|model). There are three “scores” that we can use

I Akaikes Information Criterion (AIC)

AIC =−2 log(L) + 2(p+q+k+ 1),

wherek = 1 if c 6= 0 andk = 0 ifc = 0.

I corrected AIC

AICc = AIC + 2(p+q+k+ 1)(p+q+k+ 2)

T −p−q−k−2 ,

where T is the number of observations in the data.

I Bayesian Information Criterion

(33)

Model selection

AIC =−2 log(L) + 2(p+q+k+ 1),

I corrected AIC

AICc = AIC + 2(p+q+k+ 1)(p+q+k+ 2)

T −p−q−k−2 ,

whereT is the number of observations in the data.

(34)

Model selection

AIC =−2 log(L) + 2(p+q+k+ 1),

I corrected AIC

AICc = AIC + 2(p+q+k+ 1)(p+q+k+ 2)

T −p−q−k−2 ,

whereT is the number of observations in the data.

(35)

Model selection

These scores follow the same concepts.

I Better model has higher likelihood.

I But model with too many parameters (high p+q) tends to overfit and should be penalized.

We prefer AICc for ARIMA. The higher the score, the better. Also check out seasonal ARIMA (not in scope of this course).