Exponential smoothing

7.1 Simple exponential smoothing

The simplest of the exponential smoothing methods is naturally called “simple exponential smoothing” (SES). (In some books, it is called “single exponential smoothing”.) This method is suitable for forecasting data with no trend or seasonal pattern. For example, the data in Figure 7.1 do not display any clear trending behaviour or any seasonality, although the mean of the data may be changing slowly over time. We have already considered the naïve and the average as possible methods for forecasting such data (Section 2.3).

Listing 7.1: R code

# Assumes the `oil` time series is already loaded (e.g. from the fpp package)
oildata <- window(oil, start=1996, end=2007)

plot(oildata, ylab="Oil (millions of tonnes)", xlab="Year")

Using the naïve method, all forecasts for the future are equal to the last observed value of the series,

$$\hat{y}_{T+h|T} = y_T,$$

for $h = 1, 2, \dots$. Hence, the naïve method assumes that the most recent observation is the only important one and that all previous observations provide no information for the future. This can be thought of as a weighted average where all the weight is given to the last observation.

Figure 7.1: Oil production in Saudi Arabia from 1996 to 2007.

Using the average method, all future forecasts are equal to a simple average of the observed data,

$$\hat{y}_{T+h|T} = \frac{1}{T} \sum_{t=1}^{T} y_t,$$

for $h = 1, 2, \dots$. Hence, the average method assumes that all observations are of equal importance and they are given equal weight when generating forecasts.
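As a quick illustration of these two benchmarks, the following base R sketch computes naïve and average forecasts for a short made-up series. The values in `y` and the horizon `h` are hypothetical and are not the oil data.

```r
# Hypothetical toy series (illustrative values only, not the oil data)
y <- c(12, 15, 14, 16, 13)
h <- 3  # forecast horizon (assumed)

# Naive method: every forecast equals the last observation y_T
naive_fc <- rep(y[length(y)], h)

# Average method: every forecast equals the mean of y_1, ..., y_T
mean_fc <- rep(mean(y), h)

print(naive_fc)  # [1] 13 13 13
print(mean_fc)   # [1] 14 14 14
```

Both forecast functions are flat in `h`; they differ only in how the weight is spread over past observations.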

We often want something between these two extremes. For example, it may be sensible to attach larger weights to more recent observations than to observations from the distant past.

This is exactly the concept behind simple exponential smoothing. Forecasts are calculated using weighted averages where the weights decrease exponentially as observations come from further in the past — the smallest weights are associated with the oldest observations:

$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots, \qquad (7.1)$$

where $0 \le \alpha \le 1$ is the smoothing parameter. The one-step-ahead forecast for time $T+1$ is a weighted average of all the observations in the series $y_1, \dots, y_T$. The rate at which the weights decrease is controlled by the parameter $\alpha$.

Table 7.1 shows the weights attached to observations for four different values of α when forecasting using simple exponential smoothing. Note that the sum of the weights even for a small α will be approximately one for any reasonable sample size.

Observation    α = 0.2         α = 0.4         α = 0.6         α = 0.8
y_T            0.2             0.4             0.6             0.8
y_{T-1}        0.16            0.24            0.24            0.16
y_{T-2}        0.128           0.144           0.096           0.032
y_{T-3}        0.1024          0.0864          0.0384          0.0064
y_{T-4}        (0.2)(0.8)^4    (0.4)(0.6)^4    (0.6)(0.4)^4    (0.8)(0.2)^4
y_{T-5}        (0.2)(0.8)^5    (0.4)(0.6)^5    (0.6)(0.4)^5    (0.8)(0.2)^5

Table 7.1: Weights attached to observations for four values of α in simple exponential smoothing.
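The weights in Table 7.1 follow the pattern α(1 − α)^j for the observation j periods before time T. A minimal base R sketch that reproduces them, and checks that they sum to nearly one, might look as follows; the variable names and the choice of `n = 30` observations are assumptions made for illustration.

```r
alphas <- c(0.2, 0.4, 0.6, 0.8)
lags <- 0:5  # j = 0 corresponds to y_T, j = 5 to y_{T-5}

# weights[j + 1, k] = alpha_k * (1 - alpha_k)^j
weights <- sapply(alphas, function(a) a * (1 - a)^lags)
dimnames(weights) <- list(paste0("y_{T-", lags, "}"), paste0("alpha=", alphas))
print(round(weights, 5))

# The first n weights sum to 1 - (1 - alpha)^n, which approaches 1 as n grows
n <- 30  # assumed sample size
total <- sapply(alphas, function(a) sum(a * (1 - a)^(0:(n - 1))))
print(round(total, 4))
```

The sum falls short of one only by (1 − α)^n, which is why it is already close to one for moderate sample sizes even when α is small.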

For any $\alpha$ between 0 and 1, the weights attached to the observations decrease exponentially as we go back in time, hence the name “exponential smoothing”. If $\alpha$ is small (i.e., close to 0), more weight is given to observations from the more distant past. If $\alpha$ is large (i.e., close to 1), more weight is given to the more recent observations. At the extreme case where $\alpha = 1$, $\hat{y}_{T+1|T} = y_T$ and forecasts are equal to the naïve forecasts.

We present three equivalent forms of simple exponential smoothing, each of which leads to the forecast equation (7.1).

Weighted average form

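The forecast at time $t+1$ is equal to a weighted average between the most recent observation $y_t$ and the most recent forecast $\hat{y}_{t|t-1}$,

$$\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha)\hat{y}_{t|t-1}$$

for $t = 1, \dots, T$, where $0 \le \alpha \le 1$ is the smoothing parameter. The process has to start somewhere, so denote the first forecast of $y_1$ by $\ell_0$ (that is, $\ell_0 = \hat{y}_{1|0}$). Then

$$\hat{y}_{2|1} = \alpha y_1 + (1-\alpha)\ell_0, \qquad \hat{y}_{3|2} = \alpha y_2 + (1-\alpha)\hat{y}_{2|1}, \qquad \dots, \qquad \hat{y}_{T+1|T} = \alpha y_T + (1-\alpha)\hat{y}_{T|T-1}.$$

Substituting each equation into the following equation, we obtain

$$\hat{y}_{3|2} = \alpha y_2 + (1-\alpha)\left[\alpha y_1 + (1-\alpha)\ell_0\right] = \alpha y_2 + \alpha(1-\alpha) y_1 + (1-\alpha)^2 \ell_0,$$

and, continuing in the same way,

$$\hat{y}_{T+1|T} = \sum_{j=0}^{T-1} \alpha(1-\alpha)^j y_{T-j} + (1-\alpha)^T \ell_0. \qquad (7.2)$$

So the weighted average form leads to the same forecast equation (7.1).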
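The equivalence between the recursion and the fully substituted weighted sum can be checked numerically. The sketch below is only illustrative: the series `y`, the value `alpha = 0.4` and the starting value `l0 = y[1]` are assumptions. In the expanded sum the starting value receives weight (1 − α)^T.

```r
y <- c(12, 15, 14, 16, 13)  # hypothetical toy series
alpha <- 0.4                # assumed smoothing parameter
l0 <- y[1]                  # assumed starting forecast, l0 = y_1
n <- length(y)

# Recursive form: yhat_{t+1|t} = alpha * y_t + (1 - alpha) * yhat_{t|t-1}
fc <- l0
for (i in seq_len(n)) {
  fc <- alpha * y[i] + (1 - alpha) * fc
}

# Expanded form: sum_{j=0}^{T-1} alpha (1 - alpha)^j y_{T-j} + (1 - alpha)^T l0
j <- 0:(n - 1)
fc_expanded <- sum(alpha * (1 - alpha)^j * y[n - j]) + (1 - alpha)^n * l0

print(c(recursive = fc, expanded = fc_expanded))
stopifnot(isTRUE(all.equal(fc, fc_expanded)))
```

In practice the recursive form is the one to implement, since it needs only the previous forecast rather than the whole history.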

Component form

An alternative representation is the component form. For simple exponential smoothing the only component included is the level, $\ell_t$. (Other methods considered later in this chapter may also include a trend $b_t$ and seasonal component $s_t$.) Component form representations of exponential smoothing methods comprise a forecast equation and a smoothing equation for each of the components included in the method. The component form of simple exponential smoothing is given by:

Forecast equation: $\hat{y}_{t+1|t} = \ell_t$

Smoothing equation: $\ell_t = \alpha y_t + (1-\alpha)\ell_{t-1}$,

where $\ell_t$ is the level (or the smoothed value) of the series at time $t$. The forecast equation shows that the forecasted value at time $t+1$ is the estimated level at time $t$. The smoothing equation for the level (usually referred to as the level equation) gives the estimated level of the series at each period $t$. Applying the forecast equation for time $T$ gives $\hat{y}_{T+1|T} = \ell_T$, the most recent estimated level. If we replace $\ell_t$ by $\hat{y}_{t+1|t}$ and $\ell_{t-1}$ by $\hat{y}_{t|t-1}$ in the smoothing equation, we will recover the weighted average form of simple exponential smoothing.

Error correction form

The third form of simple exponential smoothing is obtained by re-arranging the level equation in the component form to get what we refer to as the error correction form,

$$\ell_t = \ell_{t-1} + \alpha(y_t - \ell_{t-1}) = \ell_{t-1} + \alpha e_t,$$

where $e_t = y_t - \ell_{t-1} = y_t - \hat{y}_{t|t-1}$ for $t = 1, \dots, T$. That is, $e_t$ is the one-step within-sample forecast error at time $t$. The within-sample forecast errors lead to the adjustment/correction of the estimated level throughout the smoothing process for $t = 1, \dots, T$. For example, if the error at time $t$ is negative, then $\hat{y}_{t|t-1} > y_t$ and so the level at time $t-1$ has been over-estimated. The new level $\ell_t$ is then the previous level $\ell_{t-1}$ adjusted downwards. The closer $\alpha$ is to one, the “rougher” the estimate of the level (large adjustments take place). The smaller the $\alpha$, the “smoother” the level (small adjustments take place).
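To see that the component form and the error correction form produce the same levels, the two updates can be written side by side. This is a sketch in base R; the function names `ses_level` and `ses_level_ec`, the toy series and the parameter values are all hypothetical.

```r
# Component form: l_t = alpha * y_t + (1 - alpha) * l_{t-1}
ses_level <- function(y, alpha, l0) {
  level <- numeric(length(y))
  prev <- l0
  for (i in seq_along(y)) {
    level[i] <- alpha * y[i] + (1 - alpha) * prev
    prev <- level[i]
  }
  level
}

# Error correction form: l_t = l_{t-1} + alpha * e_t, with e_t = y_t - l_{t-1}
ses_level_ec <- function(y, alpha, l0) {
  level <- numeric(length(y))
  prev <- l0
  for (i in seq_along(y)) {
    e <- y[i] - prev  # one-step within-sample forecast error
    level[i] <- prev + alpha * e
    prev <- level[i]
  }
  level
}

y <- c(12, 15, 14, 16, 13)  # hypothetical toy series
a <- ses_level(y, alpha = 0.4, l0 = y[1])
b <- ses_level_ec(y, alpha = 0.4, l0 = y[1])
print(a)
stopifnot(isTRUE(all.equal(a, b)))
```

The last element of either vector is the level at time T, which is also the one-step-ahead forecast for time T + 1.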

Multi-horizon forecasts

So far we have given forecast equations for only one step ahead. Simple exponential smoothing has a “flat” forecast function, and therefore for longer forecast horizons,

$$\hat{y}_{T+h|T} = \hat{y}_{T+1|T} = \ell_T, \qquad h = 2, 3, \dots.$$

Remember these forecasts will only be suitable if the time series has no trend or seasonal component.

Initialisation

The application of every exponential smoothing method requires the initialisation of the smoothing process. For simple exponential smoothing we need to specify an initial value for the level, $\ell_0$, which appears in the last term of equation (7.2). Hence $\ell_0$ plays a role in all forecasts generated by the process. In general, the weight attached to $\ell_0$ is small. However, in the case that $\alpha$ is small and/or the time series is relatively short, the weight may be large enough to have a noticeable effect on the resulting forecasts. Therefore, selecting suitable initial values can be quite important. A common approach is to set $\ell_0 = y_1$ (recall that $\ell_0 = \hat{y}_{1|0}$).
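Repeated substitution of the smoothing recursion gives the initial level a weight of (1 − α)^T in the forecast for time T + 1. The short base R sketch below tabulates that weight for a few values of α and T; the particular values chosen are assumptions, picked only to show when the weight stops being negligible.

```r
# Weight attached to the initial level l0 after T observations: (1 - alpha)^T
alphas <- c(0.05, 0.2, 0.6)  # assumed illustrative smoothing parameters
Ts <- c(5, 12, 50)           # assumed illustrative series lengths

w0 <- outer(alphas, Ts, function(a, n) (1 - a)^n)
dimnames(w0) <- list(paste0("alpha=", alphas), paste0("T=", Ts))
print(signif(w0, 3))
```

With a small α and a short series the initial level still carries most of the weight, whereas with a larger α or a longer series its influence is negligible.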

Other exponential smoothing methods that also involve a trend and/or a seasonal component require initial values for these components also. We tabulate common strategies for selecting initial values in Table 7.9.

An alternative approach (see below) is to use optimization to estimate the value of $\ell_0$ rather than set it to some value. Even if optimization is used, selecting appropriate initial values can assist the speed and precision of the optimization process.

Optimization

For every exponential smoothing method we also need to choose the value for the smoothing parameters. For simple exponential smoothing, there is only one smoothing parameter (α), but for the methods that follow there is usually more than one smoothing parameter.

There are cases where the smoothing parameters may be chosen in a subjective manner – the forecaster specifies the value of the smoothing parameters based on previous experience. However, a more robust and objective way to obtain values for the unknown parameters included in any exponential smoothing method is to estimate them from the observed data.

In Section 4.2 we estimated the coefficients of a regression model by minimizing the sum of the squared errors (SSE). Similarly, the unknown parameters and the initial values for any exponential smoothing method can be estimated by minimizing the SSE. The errors are specified as $e_t = y_t - \hat{y}_{t|t-1}$ for $t = 1, \dots, T$ (the one-step-ahead within-sample forecast errors). Hence we find the values of the unknown parameters and the initial values that minimize

$$\text{SSE} = \sum_{t=1}^{T} (y_t - \hat{y}_{t|t-1})^2 = \sum_{t=1}^{T} e_t^2.$$

Unlike the regression case (where we have formulae that return the values of the regression coefficients which minimize the SSE), this involves a non-linear minimization problem and we need to use an optimization tool to perform this.
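One way to carry out such a minimization is with base R's `optim()`. The sketch below is not necessarily the tool used to produce the results in this chapter: the helper `ses_sse`, the toy series, the starting values and the choice of the "L-BFGS-B" method (used here because it accepts box constraints on α) are all assumptions.

```r
# One-step-ahead within-sample SSE for simple exponential smoothing.
# par[1] is alpha, par[2] is the initial level l0.
ses_sse <- function(par, y) {
  alpha <- par[1]
  level <- par[2]
  sse <- 0
  for (i in seq_along(y)) {
    e <- y[i] - level           # e_t = y_t - yhat_{t|t-1}
    sse <- sse + e^2
    level <- level + alpha * e  # error correction update of the level
  }
  sse
}

y <- c(12, 15, 14, 16, 13, 17, 16, 18)  # hypothetical toy series

fit <- optim(
  par = c(0.5, y[1]),   # assumed starting values for alpha and l0
  fn = ses_sse,
  y = y,
  method = "L-BFGS-B",
  lower = c(0, -Inf),   # restrict alpha to [0, 1], leave l0 unconstrained
  upper = c(1, Inf)
)

alpha_hat <- fit$par[1]
l0_hat <- fit$par[2]
print(c(alpha = alpha_hat, l0 = l0_hat, SSE = fit$value))
```

Starting the search from a sensible initial level such as the first observation follows the advice given under Initialisation.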

Example 7.1 Oil production

Figure 7.2: Simple exponential smoothing applied to oil production in Saudi Arabia (1996–2007).

Listing 7.2: R code

In this example, simple exponential smoothing is applied to forecast oil production in Saudi Arabia. The black line in Figure 7.2 is a plot of the data over the period 1996–2007, which shows a changing level over time but no obvious trending behaviour.

In Table 7.2 we demonstrate the application of simple exponential smoothing. The last three columns show the estimated level for times $t = 0$ to $t = 12$, then the forecasts for $h = 1, 2, 3$, for three different values of $\alpha$. For the first two columns the smoothing parameter $\alpha$ is set to 0.2 and 0.6 respectively and the initial level $\ell_0$ is set to $y_1$ in both cases. In the third column both the smoothing parameter and the initial level are estimated. Using an optimization tool, we find the values $\alpha$ and $\ell_0$ that minimize the SSE, subject to the restriction that $0 \le \alpha \le 1$. Note that the SSE value presented in the last row of the table is smaller for this estimated $\alpha$ and $\ell_0$ than for the other two settings of $\alpha$ and $\ell_0$.

Table 7.2: Forecasting total oil production in millions of tonnes for Saudi Arabia using simple exponential smoothing with three different values for the smoothing parameter α.

* $\alpha = 0.89$ and $\ell_0 = 447.5$ are obtained by minimizing SSE over periods $t = 1, 2, \dots, 12$.

The three different sets of forecasts for the period 2008–2010 are plotted in Figure 7.2. Also plotted are one-step-ahead within-sample forecasts alongside the data over the period 1996–2007.

The influence of α on the smoothing process is clearly visible. The larger the α the greater the adjustment that takes place in the next forecast in the direction of the previous data point; smaller α leads to less adjustment and so the series of one-step within-sample forecasts is smoother.
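The effect of α on the smoothness of the one-step within-sample forecasts can be illustrated with a small base R sketch. It uses a made-up series rather than the oil data, and the helper `ses_fitted` and the roughness measure (the mean absolute change between consecutive forecasts) are assumptions chosen for illustration.

```r
# One-step within-sample forecasts yhat_{t|t-1} = l_{t-1} for t = 1, ..., T
ses_fitted <- function(y, alpha, l0) {
  fc <- numeric(length(y))
  level <- l0
  for (i in seq_along(y)) {
    fc[i] <- level
    level <- level + alpha * (y[i] - level)
  }
  fc
}

y <- c(12, 15, 14, 16, 13, 17, 16, 18)  # hypothetical toy series
f_small <- ses_fitted(y, alpha = 0.2, l0 = y[1])
f_large <- ses_fitted(y, alpha = 0.6, l0 = y[1])

# Mean absolute change between consecutive forecasts (a crude roughness measure)
rough_small <- mean(abs(diff(f_small)))
rough_large <- mean(abs(diff(f_large)))
print(c(alpha_0.2 = rough_small, alpha_0.6 = rough_large))
```

On this toy series the forecasts for α = 0.2 change less from one period to the next than those for α = 0.6, in line with the description above.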

In document Forecasting: principles and practice (Page 127-132)