ARMA and ARIMA models - ARIMA forecasting models

Time series

Chapter 15 from R in Action, Second Edition by Robert I Kabacoff

15.4 ARIMA forecasting models

15.4.2 ARMA and ARIMA models

In an autoregressive model of order p, each value in a time series is predicted from a linear combination of the previous p values:

AR(p): $Y_t = \mu + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} + \varepsilon_t$

where Yt is a given value of the series, µ is the mean of the series, the βs are the weights, and εt is the irregular component. In a moving average model of order q, each value in the time series is predicted from a linear combination of q previous errors. In this case

MA(q): $Y_t = \mu - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} + \varepsilon_t$

where the εs are the errors of prediction and the θs are the weights. (Note that the moving averages described here aren’t the simple moving averages described in section 15.1.2.)
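One practical point when you move from this notation to R: as documented for arima() in the stats package, R writes the moving-average terms with plus signs, so the ma coefficients it reports have the opposite sign to the θs used here. The following sketch uses simulated data to show the effect. The value θ1 = 0.6, the seed, and the series length are arbitrary choices made for illustration.

```r
# Simulate an MA(1) series and fit it, to illustrate the sign convention.
# theta1 = 0.6, the seed, and n = 2000 are arbitrary illustrative choices.
set.seed(123)
theta1 <- 0.6   # weight in the text's notation: Yt = mu - theta1*e[t-1] + e[t]

# R's convention is Yt = mu + ma1*e[t-1] + e[t], so ma1 = -theta1
y <- arima.sim(model = list(ma = -theta1), n = 2000)

ma_fit  <- arima(y, order = c(0, 0, 1))
ma1_hat <- unname(coef(ma_fit)["ma1"])
print(ma1_hat)  # should land near -theta1, not +theta1
```

Keep this in mind when reading the coefficient tables later in the section: a reported ma1 of -0.73 corresponds to θ1 = 0.73 in the notation above.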

Combining the two approaches yields an ARMA(p, q) model of the form

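$Y_t = \mu + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} + \dots + \beta_p Y_{t-p} - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q} + \varepsilon_t$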

that predicts each value of the time series from the past p values and q residuals. An ARIMA(p, d, q) model is a model in which the time series has been differenced d times, and the resulting values are predicted from the previous p actual values and q previous errors. The predictions are “un-differenced” or integrated to achieve the final prediction.
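To make the differencing and integration steps concrete, here is a minimal sketch in base R using the Nile series that ships with R. It differences the series once and then rebuilds the original values with diffinv(), which is the same "un-differencing" idea applied to observed data rather than to predictions.

```r
# Difference the Nile series once, then integrate it back.
y  <- as.numeric(Nile)   # Nile is in the datasets package that ships with R
dy <- diff(y)            # d = 1: each value is y[t] - y[t-1]

# diffinv() cumulatively sums the differences, starting from the value xi
y_back <- diffinv(dy, xi = y[1])

print(length(y) - length(dy))   # observations lost by differencing once
print(all.equal(y_back, y))     # TRUE if the round trip recovers the series
```

Note that the starting value has to be supplied separately, because differencing discards the level of the series.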

The steps in ARIMA modeling are as follows:

1 Ensure that the time series is stationary.

2 Identify a reasonable model or models (possible values of p and q).

3 Fit the model.

4 Evaluate the model’s fit, including statistical assumptions and predictive accuracy.

5 Make forecasts.

Let’s apply each step in turn to fit an ARIMA model to the Nile time series.

ENSURING THAT THE TIME SERIES IS STATIONARY

First you plot the time series and assess its stationarity (see listing 15.7 and the top half of figure 15.11). The variance appears to be stable across the years observed, so there’s no need for a transformation. There may be a trend, and the ndiffs() function supports this by indicating that the series should be differenced once.

Figure 15.11 Time series displaying the annual flow of the river Nile at Aswan from 1871 to 1970 (top panel, Nile) along with the time series differenced once (bottom panel, diff(Nile)). The differencing removes the decreasing trend evident in the original plot.

> library(forecast)
> library(tseries)
> plot(Nile)
> ndiffs(Nile)
[1] 1
> dNile <- diff(Nile)
> plot(dNile)
> adf.test(dNile)

        Augmented Dickey-Fuller Test

data:  dNile
Dickey-Fuller = -6.5924, Lag order = 4, p-value = 0.01
alternative hypothesis: stationary

The series is differenced once (lag=1 is the default) and saved as dNile. The differenced time series is plotted in the bottom half of figure 15.11 and certainly looks more stationary. Applying the ADF test to the differenced series suggests that it’s now stationary, so you can proceed to the next step.
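If you want a second opinion that doesn't depend on the tseries package, base R's stats package offers the Phillips-Perron unit-root test through PP.test(). This is a different test from the ADF test used in the listing, swapped in here only as a cross-check. As with the ADF test, a small p-value argues against a unit root.

```r
# Phillips-Perron unit-root test on the differenced Nile series.
# PP.test() is in the stats package, so no extra libraries are needed.
dNile <- diff(Nile)

pp_diff <- PP.test(dNile)
print(pp_diff$statistic)
print(pp_diff$p.value)   # a small value argues against a unit root
```

Agreement between two different tests is reassuring, but neither replaces looking at the plot of the differenced series.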

IDENTIFYING ONE OR MORE REASONABLE MODELS

Possible models are selected based on the ACF and PACF plots:

Acf(dNile)
Pacf(dNile)

The resulting plots are given in figure 15.12.

Figure 15.12 Autocorrelation and partial autocorrelation plots for the differenced Nile time series

Listing 15.7 Transforming the time series and assessing stationarity

[Figure 15.12 panels: ACF and Partial ACF, each plotted against Lag (1 to 18)]


The goal is to identify the parameters p, d, and q. You already know that d=1 from the previous section. You get p and q by comparing the ACF and PACF plots with the guidelines given in table 15.6.
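The patterns that table 15.6 describes can be generated from a known model with ARMAacf() in the stats package. The sketch below uses an MA(1) process with a coefficient of -0.7 (in R's sign convention), a value picked only for illustration.

```r
# Theoretical ACF and PACF of an MA(1) process with ma1 = -0.7 (illustrative value).
ma1 <- -0.7

acf_theory  <- ARMAacf(ma = ma1, lag.max = 5)               # lags 0 to 5
pacf_theory <- ARMAacf(ma = ma1, lag.max = 5, pacf = TRUE)  # lags 1 to 5

print(round(acf_theory, 3))    # zero after lag q = 1
print(round(pacf_theory, 3))   # shrinks toward zero without cutting off
```

Swapping ma = ma1 for ar = 0.7 produces the mirror-image pattern for an AR(1) process: the ACF trails off and the PACF is zero after lag 1.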

The results in table 15.6 are theoretical, and the actual ACF and PACF may not match them exactly. But they can be used as a rough guide to reasonable models to try. For the Nile time series in figure 15.12, there appears to be one large autocorrelation at lag 1 (pointing to q=1), and the partial autocorrelations trail off to zero as the lags get bigger (pointing to p=0). This suggests trying an ARIMA(0, 1, 1) model.
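You can also read the same information numerically instead of from the plots. The following sketch uses acf() and pacf() from the stats package (rather than the Acf() and Pacf() functions used above) and flags the lags that fall outside the approximate 95% bounds of ±1.96/√n. Treating those bounds as a cutoff is a common rule of thumb, not a formal model-selection procedure.

```r
# Sample ACF and PACF of the differenced Nile series, compared with the
# approximate 95% bounds (+/- 1.96 / sqrt(n)).
dNile <- diff(Nile)
n     <- length(dNile)
bound <- qnorm(0.975) / sqrt(n)

# stats::acf() reports lag 0 first (always 1), so drop it; pacf() starts at lag 1
acf_vals  <- as.numeric(acf(dNile, lag.max = 10, plot = FALSE)$acf)[-1]
pacf_vals <- as.numeric(pacf(dNile, lag.max = 10, plot = FALSE)$acf)

print(round(bound, 3))
print(which(abs(acf_vals)  > bound))   # lags with ACF outside the bounds
print(which(abs(pacf_vals) > bound))   # lags with PACF outside the bounds
```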

FITTING THE MODEL(S)

The ARIMA model is fit with the arima() function. The format is arima(ts, order=c(p, d, q)). The result of fitting an ARIMA(0, 1, 1) model to the Nile time series is given in the following listing.

> library(forecast)
> fit <- arima(Nile, order=c(0,1,1))
> fit

Series: Nile
ARIMA(0,1,1)

Coefficients:
          ma1
      -0.7329
s.e.   0.1143

sigma^2 estimated as 20600:  log likelihood=-632.55
AIC=1269.09   AICc=1269.22   BIC=1274.28

> accuracy(fit)
                 ME  RMSE   MAE    MPE  MAPE   MASE
Training set -11.94 142.8 112.2 -3.575 12.94 0.8089

Table 15.6 Guidelines for selecting an ARIMA model

Model            ACF                  PACF
ARIMA(p, d, 0)   Trails off to zero   Zero after lag p
ARIMA(0, d, q)   Zero after lag q     Trails off to zero
ARIMA(p, d, q)   Trails off to zero   Trails off to zero

Note that you apply the model to the original time series. By specifying d=1, it calculates first differences for you. The coefficient for the moving averages (-0.73) is provided along with the AIC. If you fit other models, the AIC can help you choose which one is most reasonable. Smaller AIC values suggest better models. The accuracy measures can help you determine whether the model fits with sufficient accuracy. Here the mean absolute percent error is 13% of the river level.
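Comparing candidate models by AIC might look like the following sketch. The two alternative orders are arbitrary choices for illustration, not models proposed in the text, and the code relies only on arima() from the stats package.

```r
# Fit a few candidate orders to Nile and compare their AIC values (smaller is better).
candidates <- list(c(0, 1, 1), c(1, 1, 0), c(1, 1, 1))

fits <- lapply(candidates, function(ord) arima(Nile, order = ord))
aics <- sapply(fits, function(f) f$aic)
names(aics) <- sapply(candidates, function(ord)
  sprintf("ARIMA(%d,%d,%d)", ord[1], ord[2], ord[3]))

print(round(aics, 2))
```

AIC differences of a point or two are small, so when two models are close it's reasonable to prefer the one with fewer parameters.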

EVALUATING MODEL FIT

If the model is appropriate, the residuals should be normally distributed with mean zero, and the autocorrelations should be zero for every possible lag. In other words, the residuals should be normally and independently distributed (no relationship between them). The assumptions can be evaluated with the following code.

> qqnorm(fit$residuals)
> qqline(fit$residuals)
> Box.test(fit$residuals, type="Ljung-Box")

        Box-Ljung test

data:  fit$residuals
X-squared = 1.3711, df = 1, p-value = 0.2416

The qqnorm() and qqline() functions produce the plot in figure 15.13. Normally distributed data should fall along the line. In this case, the results look good.

The Box.test() function provides a test that the autocorrelations are all zero. The results aren’t significant, suggesting that the autocorrelations don’t differ from zero. This ARIMA model appears to fit the data well.
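The checks above can be extended a little with base R alone, as in this sketch. The Shapiro-Wilk test and the 10-lag Ljung-Box test with fitdf=1 are additions for illustration; the text itself uses only the Q-Q plot and the default single-lag test.

```r
fit <- arima(Nile, order = c(0, 1, 1))
res <- as.numeric(fit$residuals)

# Single-lag Ljung-Box test, as in the text
lb1 <- Box.test(res, type = "Ljung-Box")

# Same test over 10 lags; fitdf = 1 accounts for the one estimated MA coefficient
lb10 <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)

# Formal normality test to go alongside the Q-Q plot
sw <- shapiro.test(res)

print(c(lb1     = unname(lb1$p.value),
        lb10    = unname(lb10$p.value),
        shapiro = unname(sw$p.value)))
```

In each case a large p-value means the test found no evidence against the assumption being checked, which isn't the same as proving the assumption holds.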

MAKING FORECASTS

If the model hadn’t met the assumptions of normal residuals and zero autocorrelations, it would have been necessary to alter the model, add parameters, or try a different approach. Once a final model has been chosen, it can be used to make predictions of future values. In the next listing, the forecast() function from the forecast package is used to predict three years ahead.

Listing 15.9 Evaluating the model fit

Figure 15.13 Normal Q-Q plot (sample quantiles plotted against theoretical quantiles) for determining the normality of the time-series residuals


> forecast(fit, 3)

     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
1971       798.3673 614.4307 982.3040 517.0605 1079.674
1972       798.3673 607.9845 988.7502 507.2019 1089.533
1973       798.3673 601.7495 994.9851 497.6663 1099.068

> plot(forecast(fit, 3), xlab="Year", ylab="Annual Flow")

The plot() function is used to plot the forecast in figure 15.14. Point estimates are given by the blue dots, and the 80% and 95% prediction intervals (the Lo and Hi columns in the output) are represented by dark and light bands, respectively.
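Similar numbers can be obtained in base R with predict(), which returns the point forecasts and their standard errors for an arima() fit. The sketch below builds 95% limits from them, assuming normally distributed forecast errors.

```r
fit  <- arima(Nile, order = c(0, 1, 1))
pred <- predict(fit, n.ahead = 3)   # list with $pred (forecasts) and $se

z     <- qnorm(0.975)               # two-sided 95% normal quantile
lower <- pred$pred - z * pred$se
upper <- pred$pred + z * pred$se

print(cbind(forecast = pred$pred, lo95 = lower, hi95 = upper))
```

The point forecast is the same for all three years because an ARIMA(0, 1, 1) model without a drift term has a flat forecast function after the first step; only the width of the interval grows with the horizon.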
