Time Series Analysis 3
DS351
Holt’s linear trend method
I Like exponential smoothing but with trend element.
I Use when there is no seasonallity.
I If there is seasonallity, use Holt-Winters instead (next model) Holt’s method
Forecast equation yˆt+h|t=`t+hbt
Level equation `t=αyt+ (1−α)(`t−1+bt−1)
Trend equation bt=β(`t−`t−1) + (1−β)bt−1,
Holt’s linear trend method
Holt’s method Forecast equation yˆt+h|t=`t+hbt Level equation `t=αyt+ (1−α)(`t−1+bt−1) Trend equation bt=β(`t−`t−1) + (1−β)bt−1, I lt is the level (estimate of yt). I bt is the slope.I Suppose that we have
Holt’s linear trend method
Holt’s method Forecast equation yˆt+h|t=`t+hbt Level equation `t=αyt+ (1−α)(`t−1+bt−1) Trend equation bt=β(`t−`t−1) + (1−β)bt−1, I lt is the “average” between yt and lt−1+bt−1. I Find lt−1+bt−1.Holt’s linear trend method
Holt’s method Forecast equation yˆt+h|t=`t+hbt Level equation `t=αyt+ (1−α)(`t−1+bt−1) Trend equation bt=β(`t−`t−1) + (1−β)bt−1, I lt is the “average” between yt and lt−1+bt−1. I Find lt−1+bt−1. I Then find lt.Holt’s linear trend method
Holt’s method Forecast equation yˆt+h|t =`t+hbt Level equation `t=αyt+ (1−α)(`t−1+bt−1) Trend equation bt=β(`t−`t−1) + (1−β)bt−1, I bt is the “average” between lt−lt−1 and bt−1.I Start the first forecast ˆ
Holt’s linear trend method
ˆ
y
t+h|t=
`
t+
hb
tAir passengers data
Year Time Observation Level Slope Forecast
t yt `t bt yt|t1 1989 0 15.57 2.102 1990 1 17.55 17.57 2.102 17.67 1991 2 21.86 21.49 2.102 19.68 1992 3 23.89 23.84 2.102 23.59 1993 4 26.93 26.76 2.102 25.94 .. . ... ... ... ... ... 2016 27 72.60 72.50 2.102 72.02 h yˆt+h|t 1 74.60 2 76.70 3 78.80 4 80.91 5 83.01
Damped Holt’s method
I Linear trend is not realistic in many situations.
I Examples: Total factory output with a fixed number of machine.
Damped Holt’s method(Gardner & McKenzie, 1985) Fix 0≤φ≤1 ˆ yt+h|t =`t+ (φ+φ2+· · ·+φh)bt `t =αyt+ (1−α)(`t−1+φbt−1) bt =β∗(`t−`t−1) + (1−β∗)φbt−1. I φ= 1→Holt’s method
I φ= 0→forecast with a constant
Damped Holt’s method
I Linear trend is not realistic in many situations.
I Examples: Total factory output with a fixed number of machine.
Damped Holt’s method(Gardner & McKenzie, 1985) Fix 0≤φ≤1 ˆ yt+h|t =`t+ (φ+φ2+· · ·+φh)bt `t =αyt+ (1−α)(`t−1+φbt−1) bt =β∗(`t−`t−1) + (1−β∗)φbt−1. I φ= 1→Holt’s method
I φ= 0→forecast with a constant
Air passengers data
φ= 0.9 25 50 75 100 1990 2000 2010 2020 2030 Year Air passengers in A ustr alia (millions) ForecastDamped Holt's method Holt's method
Holt-Winters’ seasonal method
I Use this method when there is seasonality.
ˆ
yt+h|t=`t+hbt+s
t-`t=α(yt−st−m) + (1−α)(`t−1+bt−1) bt=β(`t−`t−1) + (1−β)bt−1
st=γ(yt−`t−1−bt−1) + (1−γ)st−m,
I Basically Holt’s method + seasonality.
I m is the frequency of seasonality e.g. m= 12 for monthly data.
I lt is the “average” between observation with seasonality removed yt−st−m andlt−1+bt−1.
I st is the “average” between observation with level and trend removedyt−lt−1−bt−1 and the value of previous
Holt-Winters’ seasonal method
Holt-Winters’ method ˆ yt+h|t=`t+hbt+st− `t=α(yt−st−m) + (1−α)(`t−1+bt−1) bt=β(`t−`t−1) + (1−β)bt−1 st=γ(yt−`t−1−bt−1) + (1−γ)st−m,I t− inst− is the latest time in the sample that has the same seasonal index ast+h.
I For example, if t= January, 2019 andt+h= March, 2019 then t−= March, 2018.
I There are a lot of parameters now: α,β,γ,`0,b0,s−m+1, s−m+2, . . . ,s0.
Holt-Winters’ multiplicative method
We can replace
add by st →multiply byst
subtract by st →divide byst. Holt-Winters’ multiplicative method
ˆ yt+h|t = (`t+hbt)st− `t =α yt st−m + (1−α)(`t−1 +bt−1) bt =β∗(`t−`t−1) + (1−β∗)bt−1 st =γ yt (`t−1+bt−1) + (1−γ)st−m.
International visitors nights in Australia
40 60 80 2005 2010 2015 YearVisitor nights (millions)
Forecast
HW additive forecasts HW multiplicative forecasts International visitors nights in Australia
Forecasts using Holt-Winters’ method
t yt `t bt st yt 2004 Q1 -3 9.70 2004 Q2 -2 -9.31 2004 Q3 -1 -1.69 2004 Q4 0 32.26 0.70 1.31 2005 Q1 1 42.21 32.82 0.70 9.50 42.66 2005 Q2 2 24.65 33.66 0.70 -9.13 24.21 .. . ... ... ... ... ... ... 2015 Q4 44 66.06 63.22 0.70 2.35 64.22 h yt+h|t 2016 Q1 1 76.10 2016 Q2 2 51.60 2016 Q3 3 63.97 2016 Q4 4 68.37 2017 Q1 5 78.90 2017 Q2 6 54.41AutoRegressive Integrated Moving Average
(ARIMA)
Backshift operator
I This is where differencing is important.
I Backshift operator B helps us with differencing.
Byt =yt−1.
Thus the first difference can be written as
yt0 =yt−yt−1 =
The second difference is
yt00=
Backshift operator
I This is where differencing is important.
I Backshift operator B helps us with differencing.
Byt =yt−1.
Thus the first difference can be written as
yt0 =yt−yt−1 =
The second difference is
yt00=
Backshift operator
I This is where differencing is important.
I Backshift operator B helps us with differencing.
Byt =yt−1.
Thus the first difference can be written as
yt0 =yt−yt−1 =
The second difference is
yt00=
Backshift operator
I This is where differencing is important.
I Backshift operator B helps us with differencing.
Byt =yt−1.
Thus the first difference can be written as
yt0 =yt−yt−1 =
The second difference is
yt00=
Autoregressive model
An autoregressive model of orderp,AR(p), is
yt =c+φ1yt−1+φ2yt−2+· · ·+φpyt−p+εt, which is a multiple linear regression withyt−p, . . . ,yt−1 as
Moving average model
A moving average model,MA(q), uses past errorsto forecast.
ARIMA model
AutoRegressive Integrated Moving Average (ARIMA) is a combination of both AR and MA models:
yt0 =c+φ1yt−0 1+· · ·+φpyt−p0 +θ1εt−1+· · ·+θqεt−q+εt, (1) whereyt0 is the differenced series. Thus there are a total of p+q
predictors. We call this anARIMA(p,d,q) model.
The value ofc andd have the following effects on the long-term forecast.
c = 0 d = 0 forecasts go to zero.
c = 0 d = 1 forecasts go to a non-zero constant.
c = 0 d = 2 forecasts follow a straight line.
c 6= 0 d = 0 forecasts go to the mean of the data.
c 6= 0 d = 1 forecasts follow a straight line.
ARIMA model
AutoRegressive Integrated Moving Average (ARIMA) is a combination of both AR and MA models:
yt0 =c+φ1yt−0 1+· · ·+φpyt−p0 +θ1εt−1+· · ·+θqεt−q+εt, (1) whereyt0 is the differenced series. Thus there are a total of p+q
predictors. We call this anARIMA(p,d,q) model.
The value ofc andd have the following effects on the long-term forecast.
c = 0 d = 0 forecasts go to zero.
c = 0 d = 1 forecasts go to a non-zero constant.
c = 0 d = 2 forecasts follow a straight line.
c 6= 0 d = 0 forecasts go to the mean of the data.
c 6= 0 d = 1 forecasts follow a straight line.
Example: Quarterly US consumption
−1 0 1 2 2000 2005 2010 2015 2020 Time uschange[, "Consumption"]Find
p
and
q
from ACF and PACF plots
−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag A CFSeries: uschange[, "Consumption"]
−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag P A CF
Series: uschange[, "Consumption"]
ACF and PACF plots of differenced datacan help us find
I p in ARIMA(p,d,0)
I q in ARIMA(0,d,q)
Find
p
and
q
from ACF and PACF plots
−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag A CFSeries: uschange[, "Consumption"]
−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag P A CF
Series: uschange[, "Consumption"]
The data may follow an ARIMA(p,d,0) model if
I the ACF is exponentially decaying or sinusoidal
I there is a significant spike at lag p in the PACF, but none beyond lagp.
Find
p
and
q
from ACF and PACF plots
−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag A CFSeries: uschange[, "Consumption"]
−0.1 0.0 0.1 0.2 0.3 4 8 12 16 20 Lag P A CF
Series: uschange[, "Consumption"]
The data may follow an ARIMA(0,d,q) model if
I the PACF is exponentially decaying or sinusoidal
I there is a significant spike at lag q in the ACF, but none beyond lagq.
Model selection
Our “score” of ARIMA model is based on thelikelihood
L=P(data|model)
For example: We assume that a coin has probability 0.2 of turning head (the model isP(X =H) = 0.2).
If the results of five coin tosses areH,H,T,H,T, then the likelihood of the data given the model is
P(H,H,T,H,T) = (0.2)3(0.8)2.
Model selection
Our “score” of ARIMA model is based on thelikelihood
L=P(data|model)
For example: We assume that a coin has probability 0.2 of turning head (the model isP(X =H) = 0.2).
If the results of five coin tosses areH,H,T,H,T, then the likelihood of the data given the model is
P(H,H,T,H,T) = (0.2)3(0.8)2.
Model selection
Choosep andq, find parameters and compute L=L(data|model). There are three “scores” that we can use
I Akaikes Information Criterion (AIC)
AIC =−2 log(L) + 2(p+q+k+ 1),
wherek = 1 if c 6= 0 andk = 0 ifc = 0.
I corrected AIC
AICc = AIC + 2(p+q+k+ 1)(p+q+k+ 2)
T −p−q−k−2 ,
where T is the number of observations in the data.
I Bayesian Information Criterion
Model selection
Choosep andq, find parameters and compute L=L(data|model). There are three “scores” that we can use
I Akaikes Information Criterion (AIC)
AIC =−2 log(L) + 2(p+q+k+ 1),
wherek = 1 if c 6= 0 andk = 0 ifc = 0.
I corrected AIC
AICc = AIC + 2(p+q+k+ 1)(p+q+k+ 2)
T −p−q−k−2 ,
whereT is the number of observations in the data.
I Bayesian Information Criterion
Model selection
Choosep andq, find parameters and compute L=L(data|model). There are three “scores” that we can use
I Akaikes Information Criterion (AIC)
AIC =−2 log(L) + 2(p+q+k+ 1),
wherek = 1 if c 6= 0 andk = 0 ifc = 0.
I corrected AIC
AICc = AIC + 2(p+q+k+ 1)(p+q+k+ 2)
T −p−q−k−2 ,
whereT is the number of observations in the data.
I Bayesian Information Criterion
Model selection
These scores follow the same concepts.
I Better model has higher likelihood.
I But model with too many parameters (high p+q) tends to overfit and should be penalized.
We prefer AICc for ARIMA. The higher the score, the better. Also check out seasonal ARIMA (not in scope of this course).