In their paper summarizing 5 oil price forecasting methodologies commonly used at the moment by industry practitioners, Olsen et al. (2005) pointed out that the method based on Sequential Gaussian Simulation (SGS) is the most rigorous and as a result, the most recommended one. Its major advantage over Bootstrapping method is that it takes into account both the frequency distribution and the semi-variogram of uninflated historical price data. The semi-variogram describes the historically observed temporal variability in the price of oil. Use of the semi-variogram ensures that changes in future oil price with time are consistent with historical changes leading to the most realistic forecasts.
Therefore in our computer model, we decided to use this price forecasting methodology.
In order to implement the SGS method, we followed procedure described by Holmes et al. (2006) First we took the historical monthly average spot price data for West Texas Intermediate (WTI) starting from January 2000 until June 2007. This information is available in the tabular form from the Energy Information Administration web site (Energy Information Administration, 2007). Our price data is spread over considerable period of time (7.5 years). Therefore, inflation should be accounted for.
Consumer Price Index (CPI) data for the same period of time has been taken from U.S.
Department of Labor, Bureau of Labor Statistics web site (U.S. Department of Labor, 2007). Using CPI data, historical price data has been brought to common dollar values,
1st of July, 2007 U.S. dollar has been chosen as the common dollar (Fig. 3.1). Both actual historical prices, as well as inflation adjusted prices are shown on Fig. 3.2.
Fig. 3.1 - Value of July 1, 2007 US dollar
Fig. 3.2 - WTI actual historical prices and prices brought to common July 2007 dollars
0
Jan-00 Jul-00 Jan-01 Jul-01 Jan-02 Jul-02 Jan-03 Jul-03 Jan-04 Jul-04 Jan-05 Jul-05 Jan-06 Jul-06 Jan-07
Time (months)
Oil price (US dollars) in July
2007 dollars actual prices
In order to use SGS, oil price data should be normally distributed. Distribution of our inflation adjusted price data is far from a normal distribution (Fig.3.3). Therefore we needed to apply a normal score transformation to our data. The normal score transformation ranks each data point and assigns it a corresponding position in the standard normal distribution. Fig.3.4 shows inflation adjusted price data after being
Fig. 3.3 - Histogram of inflation adjusted oil price data
Fig. 3.4 – Normal score transformed oil price data
-3
normal score transformed, while Fig.3.5 shows its distribution. As it can be seen, now our data has standard normal distribution.
Fig. 3.5 – Histogram of normal score transformed oil price data
Now using this normally distributed price data, we may implement the temporal continuity model of oil price change in the form of a semivariogram. This semivariogram shows dependence of oil price variance to the lag-time. It is computed as follows. Let z(ta) represent the price value at time ta, and z(ta + h) represent the price value h lag time units apart. All observations h time units apart from each other are included in the number of pairs, N(h). Datum z(ua) is the tail and z(ua + h) is the head of the vector. Means and standard deviations of the head and tail of the vector may be calculated. The semivariogram is
=
where *(h) is the estimated semivariogram for lag h. The semivariogram is half the average squared difference between the values of each pair (King, 2000). The semivariogram for our normal score transformed price data is shown on Fig.3.6 by a dotted line. A solid line represents fit with the spherical model (Houlding, 2000).
Equation of solid line is
where a0 = 32, and c0 = 1.
Fig. 3.6 – Semivariogram of normal score transformed oil price data
After fitting the real data semivariogram with the spherical model and having the equation of a spherical model, we can proceed with SGS. SGS is based on local neighborhood algorithm, which predicts unknown values by randomly sampling a
for h a0
Mean squared price difference ($,bbl)^2
conditional probability distribution. In our case, the conditional probability distribution for a given month was determined by the mean and variance from ordinary kriging (OK) that uses price, either previously predicted or actual, in the preceding months. Randomly sampling this conditional distribution yields a predicted price for a given month. A complete oil price realization is created by predicting prices one month at a time, starting from July 2007, and moving forward sequentially. Since the conditional distribution is based on the historical data, the realization honors both the frequency distribution and the semivariogram of the historical data (Holmes et al. 2006).
The algorithm for SGS is as follows:
1. If the price data is not univariate normal, transform the data to obtain normal scored prices.
2. Construct a model of temporal continuity for the normal scored data.
3. Define a random path through all of the months to be simulated, that is each month exactly once.
4. Use Ordinary Kriging to determine the mean and variance of the Gaussian conditional probability distribution at a given month. Retain a specified number of neighboring data to be used as conditioning data. Both previously simulated price values and original data are included.
5. Draw randomly from conditional probability distribution and assign that value to the node being simulated.
6. Repeat steps 4 and 5 for all simulation months.
7. Back transform the simulated normal values into the original price values using original inflation adjusted price data distribution.
8. Repeat steps 3 – 7 for multiple realizations.
We believe step 4 requires some more explanation. Ordinary kriging is the technique developed in geostatistics to interpolate the value of an attribute at an unobserved location from observations of its value at nearby locations. It assigns weights to each available observation based on its distance from unobserved location using semivariogram, and calculates expected value of attribute at given unobserved location and its error by the following formulas (Kanevski & Maignan, 2004)
′ equal to the distance between locations x1 and xn (see Eq.3.1)
In the case of price forecasting, attribute is the oil price, unobserved locations are months to be simulated, and observations are normal ranked historical monthly data.
Though Holmes et al. used GSLIB© software to generate their forecasts, above
Fig. 3.7 – Uninflated SGS oil price forecasts
Fig. 3.8 –Inflated SGS oil price forecasts
described complex SGS algorithm has been implemented in MATLAB® computational environment. We have generated 50 equiprobable price forecast realizations starting
0 20 40 60 80 100 120
Jul-07 Jul-08 Jul-09 Jul-10 Jul-11 Jul-12 Jul-13 Jul-14 Jul-15 Jul-16 Jul-17 Jul-18 Jul-19 Jul-20 Jul-21 Jul-22 Jul-23 Jul-24 Jul-25 Jul-26 Jul-27
Time (months)
Price (US dollars)
0 20 40 60 80 100 120 140 160
Jul-07 Jul-08 Jul-09 Jul-10 Jul-11 Jul-12 Jul-13 Jul-14 Jul-15 Jul-16 Jul-17 Jul-18 Jul-19 Jul-20 Jul-21 Jul-22 Jul-23 Jul-24 Jul-25 Jul-26 Jul-27
Time (months)
Price (US dollars)
from July 2007 through June 2028. 12 of these realizations are depicted on Fig.3.7.
However these forecasts are in July 2007 U.S. dollars, therefore annual inflation rate of 3% has been added to them, resulting in inflated price forecasts shown on Fig.3.8.
For further convenient access in the model all 50 realizations were arranged in an 252 x 50 array, with each realization being a column and each month a row. Thus we have completed price forecasting module of our model and proceeded to the next one.