Exponential smoothing
7.1 Simple exponential smoothing
The simplest of the exponential smoothing methods is naturally called “simple exponential smoothing” (SES). (In some books, it is called “single exponential smoothing”.) This method is suitable for forecasting data with no trend or seasonal pattern. For example, the data in Figure 7.1 do not display any clear trending behaviour or any seasonality, although the mean of the data may be changing slowly over time. We have already considered the naïve and the average as possible methods for forecasting such data (Section 2.3).
Listing 7.1: R code
library(fpp)  # assumed source of the `oil` series; load whichever package provides it
oildata <- window(oil, start=1996, end=2007)
plot(oildata, ylab="Oil (millions of tonnes)", xlab="Year")
Using the naïve method, all forecasts for the future are equal to the last observed value of the series,
$$\hat{y}_{T+h|T} = y_T,$$
for $h = 1, 2, \dots$. Hence, the naïve method assumes that the most current observation is the only important one and all previous observations provide no information for the future. This can be thought of as a weighted average where all the weight is given to the last observation.
Figure 7.1: Oil production in Saudi Arabia from 1996 to 2007.
Using the average method, all future forecasts are equal to a simple average of the observed data,
$$\hat{y}_{T+h|T} = \frac{1}{T}\sum_{t=1}^{T} y_t,$$
for $h = 1, 2, \dots$. Hence, the average method assumes that all observations are of equal importance and they are given equal weight when generating forecasts.
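Both benchmark methods can be written as weighted averages of the observations, which makes the contrast between them concrete. The following is a minimal sketch in base R; the series `y` and the variable names are illustrative assumptions, not data from the text.

```r
# Illustrative data only (not the oil series from the text).
y <- c(12, 15, 11, 14, 13, 16)
T_len <- length(y)

# Naive method: all weight on the last observation.
w_naive <- c(rep(0, T_len - 1), 1)
# Average method: equal weight 1/T on every observation.
w_mean <- rep(1 / T_len, T_len)

naive_fc <- sum(w_naive * y)  # equals the last observation
mean_fc  <- sum(w_mean * y)   # equals mean(y)

print(c(naive = naive_fc, average = mean_fc))
```

In both cases the same forecast is used for every horizon h; only the weight vector differs.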
We often want something between these two extremes. For example, it may be sensible to attach larger weights to more recent observations than to observations from the distant past.
This is exactly the concept behind simple exponential smoothing. Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past — the smallest weights are associated with the oldest observations:
$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots, \tag{7.1}$$
where 0 ≤ α ≤ 1 is the smoothing parameter. The one-step-ahead forecast for time T + 1 is a weighted average of all the observations in the series $y_1, \dots, y_T$. The rate at which the weights decrease is controlled by the parameter α.
Table 7.1 shows the weights attached to observations for four different values of α when forecasting using simple exponential smoothing. Note that the sum of the weights, even for a small α, will be approximately one for any reasonable sample size.
Observation   α = 0.2        α = 0.4        α = 0.6        α = 0.8
y_T           0.2            0.4            0.6            0.8
y_{T-1}       0.16           0.24           0.24           0.16
y_{T-2}       0.128          0.144          0.096          0.032
y_{T-3}       0.1024         0.0864         0.0384         0.0064
y_{T-4}       (0.2)(0.8)^4   (0.4)(0.6)^4   (0.6)(0.4)^4   (0.8)(0.2)^4
y_{T-5}       (0.2)(0.8)^5   (0.4)(0.6)^5   (0.6)(0.4)^5   (0.8)(0.2)^5
Table 7.1: Weights attached to observations for four values of α in simple exponential smoothing.
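The entries of the table follow directly from the weight α(1 − α)^j attached to the observation j periods before time T. As a rough check, the sketch below recomputes them in base R and also sums the first 30 weights for each α; the helper name `ses_weights` is our own and not part of any package.

```r
# Weight on y_{T-j} under SES: alpha * (1 - alpha)^j, for j = 0, 1, 2, ...
ses_weights <- function(alpha, n) alpha * (1 - alpha)^(0:(n - 1))

alphas <- c(0.2, 0.4, 0.6, 0.8)
# Rows are y_T, y_{T-1}, ..., y_{T-5}; columns are the four alpha values.
w <- sapply(alphas, ses_weights, n = 6)
dimnames(w) <- list(paste0("j=", 0:5), paste0("alpha=", alphas))
print(round(w, 5))

# The first n weights sum to 1 - (1 - alpha)^n, which approaches 1 as n grows.
print(sapply(alphas, function(a) sum(ses_weights(a, 30))))
```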
For any α between 0 and 1, the weights attached to the observations decrease exponentially as we go back in time, hence the name “exponential smoothing”. If α is small (i.e., close to 0), more weight is given to observations from the more distant past. If α is large (i.e., close to 1), more weight is given to the more recent observations. In the extreme case where α = 1, $\hat{y}_{T+1|T} = y_T$ and the forecasts are equal to the naïve forecasts.
We present three equivalent forms of simple exponential smoothing, each of which leads to the forecast equation (7.1).
Weighted average form
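The forecast at time t + 1 is equal to a weighted average between the most recent observation $y_t$ and the most recent forecast $\hat{y}_{t|t-1}$,
$$\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1},$$
for t = 1, …, T, where 0 ≤ α ≤ 1 is the smoothing parameter. The process has to start somewhere, so we denote the first forecast of $y_1$ by $\ell_0$ (that is, $\ell_0 = \hat{y}_{1|0}$), giving $\hat{y}_{2|1} = \alpha y_1 + (1-\alpha)\ell_0$. Then substituting each equation into the following equation, we obtain
$$\hat{y}_{3|2} = \alpha y_2 + (1-\alpha)\left[\alpha y_1 + (1-\alpha)\ell_0\right] = \alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \ell_0,$$
and, continuing in the same way up to time T,
$$\hat{y}_{T+1|T} = \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T \ell_0. \tag{7.2}$$
So the weighted average form leads to the same forecast equation (7.1).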
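The substitution argument can be checked numerically: running the recursion forward should give the same forecast as the fully unrolled weighted sum, in which the starting value carries weight (1 − α)^T. The sketch below assumes an illustrative series, α = 0.3, and a starting value equal to the first observation; none of these come from the text.

```r
# Illustrative data, alpha and starting value (all hypothetical).
y <- c(12, 15, 11, 14, 13, 16)
alpha <- 0.3
l0 <- y[1]            # first forecast, yhat_{1|0}
n <- length(y)

# Recursive weighted average form:
# yhat_{t+1|t} = alpha * y_t + (1 - alpha) * yhat_{t|t-1}
fc <- l0
for (t in seq_len(n)) fc <- alpha * y[t] + (1 - alpha) * fc

# Unrolled form: sum_{j=0}^{n-1} alpha (1 - alpha)^j y_{n-j} + (1 - alpha)^n * l0
j <- 0:(n - 1)
fc_explicit <- sum(alpha * (1 - alpha)^j * y[n - j]) + (1 - alpha)^n * l0

print(c(recursive = fc, explicit = fc_explicit))
```

The two numbers agree up to floating-point rounding.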
Component form
An alternative representation is the component form. For simple exponential smoothing the only component included is the level, $\ell_t$. (Other methods considered later in this chapter may also include a trend $b_t$ and seasonal component $s_t$.) Component form representations of exponential smoothing methods comprise a forecast equation and a smoothing equation for each of the components included in the method. The component form of simple exponential smoothing is given by:
Forecast equation: $\hat{y}_{t+1|t} = \ell_t$
Smoothing equation: $\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1},$
where $\ell_t$ is the level (or the smoothed value) of the series at time t. The forecast equation shows that the forecasted value at time t + 1 is the estimated level at time t. The smoothing equation for the level (usually referred to as the level equation) gives the estimated level of the series at each period t. Applying the forecast equation for time T gives $\hat{y}_{T+1|T} = \ell_T$, the most recent estimated level. If we replace $\ell_t$ by $\hat{y}_{t+1|t}$ and $\ell_{t-1}$ by $\hat{y}_{t|t-1}$ in the smoothing equation, we will recover the weighted average form of simple exponential smoothing.
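The component form translates almost line by line into code: one loop applying the level equation, with the forecast read off the latest level. What follows is a sketch under our own assumptions; `ses_component` is a hypothetical helper name, and the data, α = 0.5 and the starting level are illustrative.

```r
# Component form of SES: returns levels l_0, ..., l_T and one-step forecasts.
# ses_component is a hypothetical helper, not a function from any package.
ses_component <- function(y, alpha, l0) {
  n <- length(y)
  level <- numeric(n + 1)
  level[1] <- l0                       # level[t + 1] stores l_t
  for (t in seq_len(n)) {
    # Smoothing (level) equation: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
    level[t + 1] <- alpha * y[t] + (1 - alpha) * level[t]
  }
  # Forecast equation: yhat_{t+1|t} = l_t, so fitted[t] = yhat_{t|t-1} = l_{t-1}
  list(level = level, fitted = level[1:n], forecast = level[n + 1])
}

y <- c(12, 15, 11, 14, 13, 16)         # illustrative data only
fit <- ses_component(y, alpha = 0.5, l0 = y[1])
print(fit$level)
print(fit$forecast)
```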
Error correction form
The third form of simple exponential smoothing is obtained by re-arranging the level equation in the component form to get what we refer to as the error correction form
$$\ell_t = \ell_{t-1} + \alpha(y_t - \ell_{t-1}) = \ell_{t-1} + \alpha e_t,$$
where $e_t = y_t - \ell_{t-1} = y_t - \hat{y}_{t|t-1}$ for t = 1, …, T. That is, $e_t$ is the one-step within-sample forecast error at time t. The within-sample forecast errors lead to the adjustment/correction of the estimated level throughout the smoothing process for t = 1, …, T. For example, if the error at time t is negative, then $\hat{y}_{t|t-1} > y_t$ and so the level at time t − 1 has been over-estimated. The new level $\ell_t$ is then the previous level $\ell_{t-1}$ adjusted downwards. The closer α is to one, the “rougher” the estimate of the level (large adjustments take place). The smaller the α, the “smoother” the level (small adjustments take place).
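To see the correction mechanism at work, the sketch below records each one-step error and applies the update "previous level plus α times the error". The data, α = 0.5 and the choice of the first observation as starting level are illustrative assumptions.

```r
y <- c(12, 15, 11, 14, 13, 16)   # illustrative data only
alpha <- 0.5
level <- y[1]                     # l_0 set to y_1 for this sketch
errors <- numeric(length(y))

for (t in seq_along(y)) {
  errors[t] <- y[t] - level             # e_t = y_t - l_{t-1}
  level <- level + alpha * errors[t]    # l_t = l_{t-1} + alpha * e_t
}

print(errors)   # negative entries mark periods where the level was too high
print(level)    # final level l_T
```

With these inputs the final level matches what the level equation of the component form would give, since the two updates are algebraically the same.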
Multi-horizon forecasts
So far we have given forecast equations for only one step ahead. Simple exponential smoothing has a “flat” forecast function, and therefore for longer forecast horizons,
$$\hat{y}_{T+h|T} = \hat{y}_{T+1|T} = \ell_T, \qquad h = 2, 3, \dots.$$
Remember these forecasts will only be suitable if the time series has no trend or seasonal component.
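In code, the flat forecast function amounts to repeating the final level for every horizon. A minimal sketch follows, with `ses_forecast` as a hypothetical helper name and illustrative inputs.

```r
# Flat forecast function: every horizon 1, ..., h gets the final level l_T.
ses_forecast <- function(y, alpha, l0, h) {
  level <- l0
  for (obs in y) level <- alpha * obs + (1 - alpha) * level
  rep(level, h)
}

# Illustrative data only; alpha and l0 are arbitrary choices.
print(ses_forecast(c(12, 15, 11, 14, 13, 16), alpha = 0.5, l0 = 12, h = 3))
```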
Initialisation
The application of every exponential smoothing method requires the initialisation of the smoothing process. For simple exponential smoothing we need to specify an initial value for the level, $\ell_0$, which appears in the last term of equation (7.2). Hence $\ell_0$ plays a role in all forecasts generated by the process. In general, the weight attached to $\ell_0$ is small. However, in the case that α is small and/or the time series is relatively short, the weight may be large enough to have a noticeable effect on the resulting forecasts. Therefore, selecting suitable initial values can be quite important. A common approach is to set $\ell_0 = y_1$ (recall that $\ell_0 = \hat{y}_{1|0}$).
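Unrolling the level recursion shows that the starting level enters the forecast for time T + 1 with weight (1 − α)^T, so its influence can be tabulated directly. The sketch below does this for a few arbitrary combinations of α and series length; the helper name `l0_weight` and the grid values are our own choices.

```r
# Weight carried by l_0 in the forecast yhat_{T+1|T} is (1 - alpha)^T.
l0_weight <- function(alpha, n) (1 - alpha)^n

# Arbitrary grid of smoothing parameters and series lengths.
grid <- expand.grid(alpha = c(0.1, 0.5, 0.9), n = c(12, 100))
grid$weight <- l0_weight(grid$alpha, grid$n)
print(grid)
```

For α = 0.1 and only 12 observations the starting level still carries more than a quarter of the total weight, while for larger α or longer series it is negligible.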
Other exponential smoothing methods that also involve a trend and/or a seasonal component require initial values for these components also. We tabulate common strategies for selecting initial values in Table 7.9.
An alternative approach (see below) is to use optimization to estimate the value of $\ell_0$ rather than set it to some value. Even if optimization is used, selecting appropriate initial values can assist the speed and precision of the optimization process.
Optimization
For every exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter (α), but for the methods that follow there is usually more than one smoothing parameter.
There are cases where the smoothing parameters may be chosen in a subjective manner – the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.
In Section 4.2 we estimated the coefficients of a regression model by minimizing the sum of the squared errors (SSE). Similarly, the unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE. The errors are specified as $e_t = y_t - \hat{y}_{t|t-1}$ for t = 1, …, T (the one-step-ahead within-sample forecast errors). Hence we find the values of the unknown parameters and the initial values that minimize
$$\text{SSE} = \sum_{t=1}^{T} (y_t - \hat{y}_{t|t-1})^2 = \sum_{t=1}^{T} e_t^2.$$
Unlike the regression case (where we have formulae that return the values of the regression coefficients which minimize the SSE), this involves a non-linear minimization problem and we need to use an optimization tool to solve it.
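One possible way to carry out this minimization in base R is with `optim()`, using the box-constrained "L-BFGS-B" method so that α stays between 0 and 1. This is only a sketch: the objective function `ses_sse`, the data and the starting values are illustrative assumptions, and other optimizers would serve equally well.

```r
# SSE of the one-step-ahead within-sample errors for given par = c(alpha, l0).
ses_sse <- function(par, y) {
  alpha <- par[1]
  level <- par[2]
  sse <- 0
  for (obs in y) {
    e <- obs - level              # e_t = y_t - yhat_{t|t-1}
    sse <- sse + e^2
    level <- level + alpha * e    # update the level
  }
  sse
}

y <- c(12, 15, 11, 14, 13, 16, 15, 17, 14, 18)   # illustrative data only

# Box constraints keep alpha inside [0, 1]; l0 is left unconstrained.
opt <- optim(par = c(0.5, y[1]), fn = ses_sse, y = y,
             method = "L-BFGS-B",
             lower = c(0, -Inf), upper = c(1, Inf))
print(opt$par)     # estimated alpha and l0
print(opt$value)   # minimised SSE
```

Starting the search from a sensible initial level, such as the first observation used here, is the kind of choice that can help the optimizer converge quickly.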
Example 7.1 Oil production
Figure 7.2: Simple exponential smoothing applied to oil production in Saudi Arabia (1996–2007).
Listing 7.2: R code
In this example, simple exponential smoothing is applied to forecast oil production in Saudi Arabia. The black line in Figure 7.2 is a plot of the data over the period 1996–2007, which shows a changing level over time but no obvious trending behaviour.
In Table 7.2 we demonstrate the application of simple exponential smoothing. The last three columns show the estimated level for times t = 0 to t = 12, then the forecasts for h = 1, 2, 3, for three different values of α. In the first two of these columns the smoothing parameter α is set to 0.2 and 0.6 respectively, and the initial level $\ell_0$ is set to $y_1$ in both cases. In the third column both the smoothing parameter and the initial level are estimated. Using an optimization tool, we find the values of α and $\ell_0$ that minimize the SSE, subject to the restriction that 0 ≤ α ≤ 1. Note that the SSE value presented in the last row of the table is smaller for this estimated α and $\ell_0$ than for the other two values of α.
Table 7.2: Forecasting total oil production in millions of tonnes for Saudi Arabia using simple exponential smoothing with three different values for the smoothing parameter α.
* α = 0.89 and $\ell_0$ = 447.5 are obtained by minimizing SSE over periods t = 1, 2, …, 12.
The three different sets of forecasts for the period 2008–2010 are plotted in Figure 7.2. Also plotted are one-step-ahead within-sample forecasts alongside the data over the period 1996–2007.
The influence of α on the smoothing process is clearly visible. The larger the α the greater the adjustment that takes place in the next forecast in the direction of the previous data point; smaller α leads to less adjustment and so the series of one-step within-sample forecasts is smoother.
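The effect of α on smoothness can also be quantified rather than judged by eye. The sketch below compares the one-step within-sample forecasts for a large and a small α on an illustrative series, using the mean absolute period-to-period change as a crude roughness measure; the series, the two α values and the measure itself are our own choices, not taken from the example.

```r
# One-step-ahead within-sample forecasts yhat_{t|t-1} for t = 1, ..., T.
ses_fitted <- function(y, alpha, l0 = y[1]) {
  fitted <- numeric(length(y))
  level <- l0
  for (t in seq_along(y)) {
    fitted[t] <- level
    level <- alpha * y[t] + (1 - alpha) * level
  }
  fitted
}

y <- c(12, 15, 11, 14, 13, 16, 15, 17, 14, 18)   # illustrative data only
rough  <- ses_fitted(y, alpha = 0.8)
smooth <- ses_fitted(y, alpha = 0.2)

# Mean absolute change between consecutive forecasts for each alpha.
print(c(alpha_0.8 = mean(abs(diff(rough))),
        alpha_0.2 = mean(abs(diff(smooth)))))
```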