Examining Oil Price Dynamics

Erasmus University Rotterdam

Erasmus School of Economics

Examining Oil Price Dynamics

Using Heterogeneous Expectations

Master Thesis

Econometrics & Management Science

In cooperation with: PJK International

by Erik Hessing (336421)

Rotterdam, the Netherlands, March 1, 2014

PJK International Supervisor: P.D. Kulsen

Academic Supervisor: D.J.C. van Dijk

Second Reader: D. Karstanje

Abstract

In this paper we study whether financial speculators influence oil prices and, if so, whether accounting for these speculators in the form of a heterogeneous agents model (HAM) improves predictive accuracy. We find evidence for speculator activity in oil markets and compare the predictive accuracy of the HAM to that of a random walk model and a vector error correction model (VECM), the most promising model of PJK International. Furthermore, we investigate the economic value of the models by determining which model would have been the most lucrative if we had invested based on its predictions. We conclude that, based on predictive accuracy, the HAM and VECM are not able to beat the random walk for very short (one-day) forecast horizons, but we also find indications that the HAM might be able to outperform both the random walk and the VECM for medium (one-month) forecast horizons. We find that the VECM has real potential in predicting the correct sign of price movements on medium investment horizons. The high predictive accuracy of the HAM and the sign prediction potential of the VECM lead to mixed results in economic value for the medium investment horizons, indicating that the choice between HAM and VECM depends on the oil product of interest.

Acknowledgement

I would like to thank my supervisors Dick van Dijk and Patrick Kulsen for their valuable comments and remarks, and my colleagues at PJK International, who have made these past months a great and educational experience. I would also like to thank Dennis Karstanje for taking the time to read this paper. Lastly, I would like to thank my friends and family, who were always there for me and supported me in making the most of this paper.

3.2.6 Stock Indexes

3.2.7 World PMI

3.2.8 Oil Fundamentals

4 Results

4.1 In-Sample Estimation

4.1.1 Daily Data

4.1.2 Monthly Data

4.2 Out-of-Sample Performance

4.2.1 Daily Data

4.2.2 Monthly Data

5 Conclusion

6 Further Research

Appendix A

A.1 Elaborate Description of Future contracts

A.2 Graphs & Figures

A.2.1 Fundamentalist Weight Graphs

A.3 Tables

A.3.1 ADF Test Statistics & Descriptive Statistics

A.3.2 Daily Results

A.3.3 Monthly Results

A.4 Explanatory Variables

A.4.1 Principal Components Analysis

A.4.2 Atlantic- and Crack Spread models

Chapter 1

Introduction

In order to forecast price changes in financial products it is crucial to understand the cause of price movements. According to the Efficient Market Hypothesis, all agents in the market behave rationally and have the same rational expectation. That is, they expect the price to reflect or return to the fundamental value of the asset. In this paper we focus on the oil market. Oil market efficiency is rejected by Gjølberg (1985) and Moosa and Al-Loughani (1994), who assess the efficiency of the market for oil products and crude oil futures, respectively. Sornette et al. (2009) explain this by showing that there are large differences between the fundamental value and the price of crude oil.

The inefficiency of the oil market is often attributed to the existence of speculators. Fattouh et al. (2012) discuss the effect of speculation in oil markets as a possible result of increased financialization of the oil futures market (i.e. the increase in the variety of instruments that permit speculation in oil, such as futures). They find that the main problem in the literature on speculation is that it is rarely clear how speculation is defined, and as a result they find no clear evidence in favor of speculation. However, Fattouh et al. (2012) conclude that the absence of evidence for speculation does not mean that the financialization of oil futures markets does not matter. They suggest using agents with heterogeneous expectations in further research.

Alquist et al. (2011) forecast the price of oil using several models based on economic fundamentals or related financial products. They obtain a 22% increase in forecasting accuracy compared to the random walk model, using a simple model based on percentage changes in the price of non-oil industrial raw materials. They state that there is strong evidence that not all households share the same oil price expectations (see Anderson et al. (2010)), casting doubt on standard rational expectations models with homogeneous agents.

The awareness of heterogeneous behavior in financial markets has led to models that are known as Heterogeneous Agents Models (HAMs). These HAMs use the heterogeneous expectations of different agents to describe and predict price movements. Ter Ellen and Zwinkels (2010) developed a simple and highly stylized heterogeneous agents model for oil prices, in which they assume that two types of speculators exist: fundamentalists and chartists. Fundamentalists expect prices to return to their fundamental value, whereas chartists expect price trends to continue (similar to an AR(1) model). Agents switch between these two speculator groups based on the previous performance of their respective strategies. They show that the HAM is able to outperform the random walk and VAR model on all horizons (i.e. 1 to 12 months), based on statistical performance measures like the mean squared prediction error (MSPE). In this paper we expand on the research of ter Ellen and Zwinkels (2010). Firstly, we add an additional method to estimate the fundamental value of oil. More specifically, in ter Ellen and Zwinkels (2010) the fundamental value is constructed using a moving average of historical and future prices. The new method proposed in this paper creates a fundamental value of oil products based on inventory and manufacturing data. Secondly, we use the HAM to create predictions of price movements for very short as well as medium-length investment horizons (i.e. daily and monthly). Lastly, instead of analyzing and forecasting the spot price of oil, we focus on analyzing and forecasting the price of oil futures contracts, which are more liquid and should therefore be more open to speculation.

This paper is written for and in cooperation with PJK International B.V. PJK International is a market research, analysis and consultancy firm specializing in the petroleum industry of the ARA (Amsterdam-Rotterdam-Antwerp) region. PJK International provides its clients with market information and analysis related to the pricing of oil products and transport costs. Among PJK's clients are major oil companies, traders, financial institutions, importers, distribution companies, transporters and consumers. PJK International does not engage in commodity trading, to safeguard against conflicts of interest. The predictive accuracy of a model for oil spot prices, futures or spreads is of significant importance to PJK International. Accurate forecasts of the prices of oil products will give the clients of PJK International an important edge over their competition.

In order to obtain better forecasts of the price movements of oil products, PJK International has performed extensive internal research on the predictability of various oil products and the forecast accuracy of most basic models. This research includes, but is not limited to: forecasting the spot-future spread using ARX/GARCH models by Akkermans et al. (2010), forecasting the returns of oil futures using Vector Error Correction Models (VECM) by Kulsen (2011), and forecasting oil futures prices using a feed-forward neural network by Tilgenkamp et al. (2012).

However, this research is limited by the fact that most models created in these papers require agents to be rational and to have homogeneous expectations. The HAM model created in this paper has no such limitations and therefore provides a new and different perspective on forecasting the price movements of oil products.

The VECM created in Kulsen (2011) is considered by PJK International to be its most promising model. Kulsen shows that the VECM outperforms an AR model based on the percentage correct sign performance measure. However, it fails to outperform the AR model in predictive accuracy (i.e. MSPE). Kulsen (2011) states that the lack of predictive accuracy could be explained by his particular choice of out-of-sample period, the year 2009, which is just after the collapse of oil prices as a result of the credit crisis. Furthermore, Alquist et al. (2005) find that a suitably designed VAR model, which is similar to a VECM, tends to be more accurate than the random walk model for investment horizons up to six months.

In this paper we examine, using the HAM, whether speculation has a significant effect on oil prices and whether accounting for this possible speculation can result in improved out-of-sample performance. We compare a model based on homogeneous expectations with a highly stylized model based on heterogeneous expectations. More specifically, the out-of-sample performance of a VECM, similar to that in Kulsen (2011), is compared to that of a HAM.

We find that, based on daily data, the HAM results are poor. The HAM either remains in the equilibrium state where fundamentalists and chartists have equal presence in the oil market, making the HAM similar to an equally weighted combination of a model based on a moving average and an AR(1) model, or it immediately converges to a state where fundamentalists have been driven out of the market, making the HAM similar to an AR(1) model. The forecasts made by both HAMs are significantly worse than those of the random walk model. However, based on MSPE, the HAM dominated by chartists is able to outperform the VECM.

The HAM performs significantly better in a monthly setting. We find that the weight distribution between fundamentalists and chartists varies significantly over time. Fundamentalists seem to dominate the market most of the time, but they convert to a chartist strategy after sudden spikes and crashes. In the out-of-sample period one of the HAMs is able to outperform the VECM and the random walk model based on MSPE, although no statistically significant conclusion can be drawn because of the small sample size.

To determine the economic value of our models we calculate the total return obtained during the out-of-sample period if we had invested based on a follow-sign strategy. That is, we take a long position in the contract if the forecast of our model is positive and short the contract in the case of a negative forecast. Based on our daily data sample we obtain disappointing results. The total returns are on average negative, and we find that in some cases it would have been more lucrative to buy the contract and hold on to it for the entire out-of-sample period (i.e. hold a long position throughout). The results in our monthly sample are markedly different. We find mostly positive total returns and conclude that following our models is more lucrative than just buying a contract and holding on to it. Lastly, we see that the HAM has the highest economic value for three out of five oil products, while the VECM is superior for the other two.

Chapter 2 discusses the methodology of this paper: sections 2.1, 2.2 and 2.3 give an outline of the heterogeneous agents model, section 2.4 describes the VECM and section 2.5 describes the performance measures we use to compare the HAM and VECM. Chapter 3 describes the data used in this paper. Section 3.1 describes the oil futures contracts for which we make forecasts using the HAM and VECM. Section 3.2 gives a list of explanatory variables, used in the VECM and for the estimation of a fundamental value, and discusses why these variables influence oil prices. In chapter 4 we discuss the implications of the in-sample results of the HAM with respect to speculators in the oil market and compare the out-of-sample performance of the HAM with that of the VECM for daily and monthly forecasting. Chapter 5 concludes.

Chapter 2

Methodology

This chapter discusses the methods used to make predictions of the price movements of first-month oil futures contracts. The main method of interest is a heterogeneous agents model (HAM). This model is based on the underlying assumption that there are different types of agents with heterogeneous expectations active in the market. Our model combines both real and speculative market participants. We define real participants as companies that are involved in the production or consumption of oil products. Speculators are agents that have no current use for oil products and have no tangible link to the oil industry.

We define two distinct types of speculators: fundamentalists and chartists. Firstly, fundamentalists are agents that base their expectations on economic theory. This group believes that the asset has some intrinsic or fundamental value and expects the market price to revert to this fundamental value. The second type of agents, chartists or technical traders, base their expectations on recent price changes. They expect trends to continue in the same direction. The trading techniques used by fundamentalists are assumed to have a stabilizing (i.e. mean-reverting) effect on market prices, while chartists tend to drive prices away from their fundamental value and as such have a destabilizing effect on market prices. Speculators are able to switch between groups based on recent performance (from fundamentalists to chartists and vice versa).

Sections 2.1, 2.2 and 2.3 outline the heterogeneous agents model. Section 2.4 describes the VECM and section 2.5 describes the performance measures we will use to compare the HAM and VECM.

2.1 Speculators

The oil demand function of fundamentalists is based on the difference between the current price and the expected price, that is,

$$D_t^F = a^F\left[E_t^F(P_{t+1}) - P_t\right] \tag{2.1}$$

where $P_t$ is the log price of the futures contract in period $t$ and $a^F$ is a positive parameter that represents the reaction of demand to an expected price change. The demand of fundamentalists will increase (decrease) if they expect a higher (lower) price in the future. The fundamentalists use, among other things, some fundamental value to determine their price expectations. The price expectations of fundamentalists can be described by the following equation:

$$E_t^F(P_{t+1}) = P_t + b_1^F (P_t - F_t)^+ + b_2^F (P_t - F_t)^- \tag{2.2}$$

where $F_t$ is the log fundamental value of the futures contract in period $t$. Equation 2.2 shows that fundamentalists expect price movements when the current price deviates from its fundamental value. Just like ter Ellen and Zwinkels (2010), we make a distinction between over- and undervaluation: $(P_t - F_t)^+ = (P_t - F_t)$ if $(P_t - F_t) > 0$ and zero otherwise. Similarly, $(P_t - F_t)^- = (P_t - F_t)$ if $(P_t - F_t) < 0$ and zero otherwise. We expect $b_1^F$ and $b_2^F$ to be negative and to lie in $[-1, 0]$, because fundamentalists will expect prices to decrease (increase) if the current price is above (below) the fundamental value. We distinguish between over- and undervaluation because research in behavioral finance and psychology has shown that investors react differently to potential gains and potential losses (Kahneman & Tversky, 1979). According to their results, traders are more hesitant to sell in case of overvaluation than to buy in case of undervaluation.

The oil demand function of chartists is similar to the demand function of fundamentalists:

$$D_t^C = a^C\left[E_t^C(P_{t+1}) - P_t\right] \tag{2.3}$$

in which $a^C$ is a positive reaction parameter, implying that the demand of chartists will increase (decrease) if they expect the future price to be higher (lower) than the current price.

Chartists determine their price expectations based on technical analysis. However, technical analysis comes in many different forms (see Brock et al. (1992)). The characteristics of chartists, described earlier, need to be incorporated in the technical trading rule (i.e. the price expectation formula). The most common and simple technical trading rule that is in line with chartists' characteristics and beliefs is the AR(1) specification, also used by ter Ellen and Zwinkels (2010).

Chartists' expectations according to the AR(1) specification are given by

$$E_t^C(P_{t+1}) = P_t + b_1^C (P_t - P_{t-1})^+ + b_2^C (P_t - P_{t-1})^- \tag{2.4}$$

Similarly to equation 2.2, we make a distinction between an upward and a downward trend. Chartists expect trend movements to continue in the same direction, so we expect $b_1^C$ and $b_2^C$ to be positive. If $b_1^C > b_2^C$, chartists react more to price increases, and vice versa.
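The asymmetric expectation rules of equations 2.2 and 2.4 can be sketched in a few lines of Python. This is only a minimal illustration: the function and argument names are our own hypothetical choices and the parameter values in the usage note are made up, not estimates from this paper.

```python
def positive_part(x):
    """Return x if x > 0 and zero otherwise (the (.)^+ operator)."""
    return x if x > 0 else 0.0


def negative_part(x):
    """Return x if x < 0 and zero otherwise (the (.)^- operator)."""
    return x if x < 0 else 0.0


def fundamentalist_expectation(p_t, f_t, b1_f, b2_f):
    """Equation 2.2: expected next log price of fundamentalists."""
    gap = p_t - f_t
    return p_t + b1_f * positive_part(gap) + b2_f * negative_part(gap)


def chartist_expectation(p_t, p_prev, b1_c, b2_c):
    """Equation 2.4: expected next log price of chartists."""
    change = p_t - p_prev
    return p_t + b1_c * positive_part(change) + b2_c * negative_part(change)
```

With a negative `b1_f`, an overvalued contract (log price above the fundamental value) yields an expected price below the current price, whereas a positive `b1_c` extrapolates the most recent price increase.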

2.1.1 Fundamental value

One of the ways we expand on the model of ter Ellen and Zwinkels (2010) is by determining the fundamental value of oil products using an additional method. We will compare the performance of this method with the method of ter Ellen and Zwinkels. The two methods are:

• We set the fundamental value equal to the 30-day (24-month) moving average, similar to ter Ellen and Zwinkels (2010)¹;

• We determine the fundamental value by regressing the log price of first-month futures contracts on the principal components of the fundamental oil variables described in section 3.2.

¹ ter Ellen and Zwinkels (2010) do not address the look-ahead bias in their results; they calculate all their fundamental values using equation 2.5.

Moving Average

The fundamental value based on a moving average is calculated using the following equation:

$$F_t = \frac{1}{N}\sum_{i=1}^{N} P_{t-\frac{N}{2}+i} \tag{2.5}$$

in which $N$ is equal to the number of days (months) in the moving average and $P_t$ is the log price at time $t$. As is clear from equation 2.5, the fundamental value is an equally weighted moving average of past and future prices. This is done to produce a fundamental value which should be as close as possible to the current "true" value. More specifically, we make the assumption that at any current time the price deviates from the fundamental value because either information has not yet been incorporated in the current price or there has been an overreaction to past information. By taking prices over a certain window including future and past prices we intend to get as close as possible to the fundamental value, that is, the value based on all available information.

This methodology, which is sound for in-sample estimation, would lead to an obvious look-ahead bias in the out-of-sample results. We address this by changing the way we calculate those fundamental values that would otherwise use prices outside our in-sample period. These fundamental values are calculated using the following equation:

$$F_t = \frac{1}{Z+\frac{N}{2}}\sum_{i=1}^{Z+\frac{N}{2}} P_{t-\frac{N}{2}+i} \tag{2.6}$$

in which $Z$ is the number of future observations left in the in-sample period, with a maximum of $N/2$, $N$ is the window length and $P_t$ is the log price at time $t$. For example, if $N = 24$ but the in-sample period (i.e. the sample) ends 10 periods from now, $Z$ will be equal to 10, such that $F_t$ is the mean of 22 observations: the 11 previous prices, the current price and the 10 known upcoming prices. This method of calculating the fundamental value makes sure we do not have a look-ahead bias in our forecasting results. As the last fundamental value of our in-sample period, $F_t$ will only rely on the prices $P_{t-N/2}$ up to $P_t$.
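The truncated window of equation 2.6 can be illustrated with a short Python sketch. The function name, the zero-based indexing and the `last_in_sample` argument are hypothetical conventions introduced for this illustration; the window itself follows the summation limits of equations 2.5 and 2.6 as written, assuming an even window length.

```python
def ma_fundamental_value(log_prices, t, n, last_in_sample):
    """Moving-average fundamental value at index t (equations 2.5 and 2.6).

    log_prices     : list of log prices
    t              : index of the current observation
    n              : even window length N
    last_in_sample : index of the last in-sample observation
    """
    half = n // 2
    # Z: number of future in-sample observations, capped at N/2.
    z = min(half, last_in_sample - t)
    start = t - half + 1  # index of the first summand (i = 1)
    if start < 0 or z < 0:
        raise ValueError("window is not available at this index")
    window = log_prices[start:t + z + 1]  # Z + N/2 observations
    return sum(window) / len(window)
```

Far from the end of the sample the function returns the full centered average of equation 2.5; as `t` approaches `last_in_sample` the window shrinks until only past and current prices remain, so no out-of-sample price enters the fundamental value.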

One could argue that the fundamental value constructed using the method described above is not, theoretically, a fundamental value. A possible argument is that, because the fundamental value is based on previous and future prices, market sentiment is already incorporated in it and as such it is no longer strictly based on fundamentals (i.e. supply and demand). However, for this model to function as expected it is important to put a stabilizing group against a destabilizing group. When using this fundamental value the stabilization occurs towards the moving average instead of a "true" fundamental value, and as such fundamentalism should be interpreted somewhat more broadly. In the next section we discuss the construction of a theoretically correct fundamental value.

Principal Components Analysis

The second fundamental value used in this paper is based on oil fundamentals. More specifically, it is constructed by performing a principal components analysis on the production, supply and demand variables described in section 3.2. The number of explanatory variables is relatively large, and because we are searching for a "true" fundamental value, which can differ from the current price according to our assumptions and research hypothesis (i.e. the existence of heterogeneous agents), we cannot reject variables based on significance in an ordinary least squares or maximum likelihood regression. In order to reduce the number of explanatory variables, and with that the dimensionality and parameter uncertainty, we perform a principal components analysis. We use the number of principal components that accounts for at least 95% of the variance in the original explanatory variables. We estimate parameters by regressing the log price of oil futures on these principal components, after which the log fundamental values are obtained by multiplying the parameters with the principal components.

A principal components analysis is defined as follows. Consider $X$, a $T \times n$ matrix of $n$ explanatory variables, and let $V$ be the correlation matrix of $X$. $W$ is an $n \times n$ orthogonal matrix of eigenvectors of $V$. The principal components of $V$ are the columns of the $T \times n$ matrix $P$, which is defined as

$$P = XW \tag{2.7}$$

The original system of correlated explanatory variables has been transformed to an orthogonal system. $W$ is ordered so that the first column of $W$ is the eigenvector corresponding to the largest eigenvalue of $V$, the second column of $W$ corresponds to the second-largest eigenvalue of $V$, and so on. The total variation of $X$ is the sum of the eigenvalues of $V$, $\lambda_1 + \dots + \lambda_n$. The proportion of the total variation that is explained by the first $k$ principal components is equal to

$$\frac{\lambda_1 + \dots + \lambda_k}{\lambda_1 + \dots + \lambda_n} \tag{2.8}$$

We can now reduce dimensionality by setting $k < n$ for some number of principal components $k$ that explain a large enough proportion of the total variance, resulting in

$$X^* = P^* W^{*\prime} \tag{2.9}$$

where $P^*$ is a $T \times k$ matrix consisting of the first $k$ columns of $P$ and $W^*$ is an $n \times k$ matrix whose columns are the first $k$ eigenvectors.

The fundamental value is then obtained by the following equation:

$$F_t = \alpha + P_t^* \beta \tag{2.10}$$

in which $F_t$ is the log fundamental value, $P^*$ is the $T \times k$ matrix of principal components, with $k$ chosen so that 95% of the variance is captured, and $\alpha$ and $\beta$ ($k \times 1$) are the parameters obtained after performing the following regression:

$$P_t = \alpha + P_t^* \beta + \varepsilon_t \tag{2.11}$$

in which $P_t$ is the log price. This regression is performed over the entire in-sample period to obtain the best fit while excluding the chance of a look-ahead bias. Furthermore, this entire process is repeated when the in-sample period changes. To be more specific, when the in-sample period changes (i.e. expanding window) we calculate new principal components and estimate new parameters.
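The construction of the principal-components fundamental value can be sketched in Python with NumPy. This is a hedged illustration rather than the code used for this paper: the function name is hypothetical, and we assume that the explanatory variables are standardized before projection, which is the usual convention when the eigenvectors are taken from the correlation matrix.

```python
import numpy as np


def pca_fundamental_value(x, log_price, share=0.95):
    """Fitted log fundamental value from the leading principal components.

    x         : (T, n) array of explanatory variables
    log_price : (T,) array of log futures prices
    share     : minimum proportion of variance explained by the kept components
    """
    # Assumption: standardize the columns, consistent with using the
    # correlation matrix V for the eigen-decomposition.
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    v = np.corrcoef(x, rowvar=False)

    eigval, eigvec = np.linalg.eigh(v)   # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]     # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]

    # Smallest k whose cumulative proportion of variance reaches `share`.
    cumulative = np.cumsum(eigval) / eigval.sum()
    k = int(np.searchsorted(cumulative, share) + 1)

    p_star = z @ eigvec[:, :k]           # first k principal components
    design = np.column_stack([np.ones(len(p_star)), p_star])
    coef, *_ = np.linalg.lstsq(design, log_price, rcond=None)
    return design @ coef, k              # fitted values: alpha + P* beta
```

In an expanding-window exercise the function would be called again on each new in-sample window, so that both the components and the regression parameters are re-estimated without using out-of-sample observations.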

2.1.2 Switching between Strategies

Speculators are able to switch between investment strategies. They determine whether they switch or not based on the previous performance of both strategies in forecasting the price movement of oil. The performance of a strategy is measured using the squared forecasting error in the previous K>0 days/months. Optimal values for K are selected empirically by looking at the autocorrelation and partial autocorrelation of the errors and set equal to 9 days or 6 months for daily and monthly data respectively. The performance of a strategy relative to the other strategy is time varying, therefore the distribution of agents changes over time. The performance of both strategies is measured using the following equations,

$$A_t^F = \sum_{k=1}^{K}\left[E_{t-k-1}^F(P_{t-k}) - P_{t-k}\right]^2 \tag{2.12}$$

$$A_t^C = \sum_{k=1}^{K}\left[E_{t-k-1}^C(P_{t-k}) - P_{t-k}\right]^2 \tag{2.13}$$

in which $A_t^F$ is the conditional performance of the fundamentalist investment strategy and $A_t^C$ that of the chartist investment strategy. More specifically, $A_t^F$ and $A_t^C$ represent the sum of squared differences between the expected prices and the realized prices (i.e. the squared errors) in the previous $K$ periods, for the fundamentalist and chartist strategies respectively. The size of $A_t^F$ and $A_t^C$ is inversely related to the performance of the respective investment strategy; that is, if $A_t^F$ is larger than $A_t^C$ we can state that the chartist investment strategy was, on average, more accurate in forecasting the price movements in the previous $K$ periods than the fundamentalist strategy.

The fraction of fundamentalists active in the market depends on the performance of the fundamentalist investment strategy relative to the approach of the chartists. Similar to ter Ellen and Zwinkels (2010), the multinomial switching rule is given by

$$W_t = \left(1 + \exp\left[\phi\,\frac{A_t^F - A_t^C}{A_t^F + A_t^C}\right]\right)^{-1} \tag{2.14}$$

where $W_t$ is the fraction of speculators in period $t$ that invests using the fundamentalist approach, such that $1 - W_t$ is the fraction of chartists. $\phi$ is the intensity of choice parameter, representing the extent to which the performance of a strategy determines whether it is adopted. If $\phi = 0$, agents do not react to differences in performance between the two strategies, such that $W_t = 1/2$. $\phi > 0$ implies that the strategy which is performing better in period $t$ is more broadly applied in period $t + 1$. Therefore, the demand of that group will carry more weight in period $t + 1$.
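The switching rule can be made concrete with a small Python sketch. The function names are hypothetical, and the treatment of the case in which both strategies have zero squared error (returning a weight of one half) is our own assumption, since equation 2.14 is undefined there.

```python
import math


def strategy_performance(expected, realized):
    """Sum of squared forecast errors over the previous K periods.

    expected[i] is the forecast of realized[i] made one period earlier.
    """
    return sum((e - p) ** 2 for e, p in zip(expected, realized))


def fundamentalist_weight(a_f, a_c, phi):
    """Equation 2.14: fraction of speculators following the fundamentalist rule."""
    if a_f + a_c == 0:
        # Assumption: both rules forecast perfectly, so neither is preferred.
        return 0.5
    return 1.0 / (1.0 + math.exp(phi * (a_f - a_c) / (a_f + a_c)))
```

Because the relative performance difference lies between -1 and 1, the weight is bounded between 1/(1 + exp(phi)) and 1/(1 + exp(-phi)) for a non-negative phi. A small phi therefore keeps the market close to an even split, regardless of how much better one strategy performs.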

2.2 Real participants

This section describes the supply and demand functions for agents that are involved in the consumption or production of oil (i.e. they have a tangible link to the oil industry). The reason we can, and should, distinguish between real agents and speculators is that the futures we are interested in are futures on a commodity. Therefore, we distinguish an investment part from the consumption part. The market supply of oil affects the price formation process. Consequently, we add real demand and supply for oil to the model.

The real demand for oil depends on a component which does not depend on the oil price, and a component which represents the negative (positive) effect of a price increase (decrease) on the real demand for oil. Real demand is given by

$$D_t^R = a^R - b^R P_t \tag{2.15}$$

in which $a^R$ is the exogenous demand for the oil product and $b^R$ represents the price sensitivity of the demand. We expect $b^R > 0$ because an increase in price should lead to a lower demand.

The supply function of oil is similar to the demand function in equation 2.15. However, supply of oil is a positive function of the price and an exogenous component which does not depend on the price of oil. The supply function is given by

$$S_t = a^S + b^S P_t \tag{2.16}$$

where $a^S$ is the exogenous supply of the oil product and $b^S$ represents the sensitivity of supply to oil prices. We expect $b^S > 0$ because supply is a positive function of price.

2.3 The market

Combining the equations presented in the previous sections we obtain the total market demand for oil products. The total market demand consists of the real demand plus the weighted average of the demand of fundamentalists and chartists:

$$D_t^M = D_t^R + W_t D_t^F + (1 - W_t) D_t^C \tag{2.17}$$

Finally, the price changes of oil products are a function of excess demand and a noise term, that is,

$$P_{t+1} = P_t + \theta (D_t^M - S_t) + \varepsilon_t \tag{2.18}$$

where $\theta$ is a positive parameter which describes price adjustment according to market frictions.

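Substituting the demand and supply functions into equation 2.18 results in the final model described by 2.19.

$$
\begin{cases}
\Delta P_{t+1} = a + bP_t + W_t\left(\alpha_1 (P_t - F_t)^+ + \alpha_2 (P_t - F_t)^-\right) + (1 - W_t)\left(\beta_1 (P_t - P_{t-1})^+ + \beta_2 (P_t - P_{t-1})^-\right) + \varepsilon_t \\
W_t = \left(1 + \exp\left[\phi\,\dfrac{A_t^F - A_t^C}{A_t^F + A_t^C}\right]\right)^{-1} \\
A_t^F = \sum_{n=1}^{N}\left[\alpha_1 (P_{t-n} - F_{t-n})^+ + \alpha_2 (P_{t-n} - F_{t-n})^- - \Delta P_{t-n+1}\right]^2 \\
A_t^C = \sum_{n=1}^{N}\left[\beta_1 (P_{t-n} - P_{t-n-1})^+ + \beta_2 (P_{t-n} - P_{t-n-1})^- - \Delta P_{t-n+1}\right]^2
\end{cases} \tag{2.19}
$$

where $a = \theta(a^R - a^S)$, $b = \theta(-b^R - b^S)$, $\alpha_1 = \theta a^F b_1^F$, $\alpha_2 = \theta a^F b_2^F$, $\beta_1 = \theta a^C b_1^C$ and $\beta_2 = \theta a^C b_2^C$.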
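The mapping from the structural parameters to the reduced-form coefficients of model 2.19 can be checked numerically. The Python sketch below computes the expected price change once from the structural equations 2.1 to 2.4 and 2.15 to 2.18 (with the noise term set to zero) and once from the first line of 2.19. All function names and dictionary keys are hypothetical labels chosen for this illustration.

```python
def pos(x):
    """Positive part (.)^+."""
    return max(x, 0.0)


def neg(x):
    """Negative part (.)^-."""
    return min(x, 0.0)


def structural_change(p, p_prev, f, w, s):
    """Expected price change from the structural equations, noise set to zero."""
    d_f = s["aF"] * (s["bF1"] * pos(p - f) + s["bF2"] * neg(p - f))
    d_c = s["aC"] * (s["bC1"] * pos(p - p_prev) + s["bC2"] * neg(p - p_prev))
    d_r = s["aR"] - s["bR"] * p
    supply = s["aS"] + s["bS"] * p
    d_m = d_r + w * d_f + (1 - w) * d_c
    return s["theta"] * (d_m - supply)


def reduced_form(s):
    """Map structural parameters to the coefficients of model 2.19."""
    th = s["theta"]
    return {
        "a": th * (s["aR"] - s["aS"]),
        "b": th * (-s["bR"] - s["bS"]),
        "alpha1": th * s["aF"] * s["bF1"],
        "alpha2": th * s["aF"] * s["bF2"],
        "beta1": th * s["aC"] * s["bC1"],
        "beta2": th * s["aC"] * s["bC2"],
    }


def reduced_change(p, p_prev, f, w, r):
    """Expected price change from the first line of model 2.19."""
    fund = r["alpha1"] * pos(p - f) + r["alpha2"] * neg(p - f)
    chart = r["beta1"] * pos(p - p_prev) + r["beta2"] * neg(p - p_prev)
    return r["a"] + r["b"] * p + w * fund + (1 - w) * chart
```

Both functions return the same value for any parameter set. The mapping is not invertible, however: only the products such as $\theta a^F b_1^F$ enter the reduced form, so the structural parameters are not separately identified from it.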

This model is estimated using quasi-maximum likelihood (QML), such that autocorrelation, heteroskedasticity and possible non-normality are controlled for. Quasi-maximum likelihood is similar to regular maximum likelihood; however, instead of maximizing the actual log-likelihood function it often maximizes a simplified form of the log-likelihood function. As long as the quasi-likelihood function is not overly simplified, the quasi-maximum likelihood estimates will be consistent and asymptotically normal. An easy way to perform a QML estimation is to perform a regular maximum likelihood estimation (MLE) and adjust the standard errors. More specifically, after performing an MLE the covariance matrix is created using an equation commonly known as the sandwich formula. That is, instead of setting the covariance matrix equal to the inverse Hessian or the outer product of the gradient, the covariance matrix is created as follows:

$$\mathrm{Cov} = H^{-1} g g' H^{-1} \tag{2.20}$$

in which Cov is the covariance matrix, $H$ is the Hessian and $g$ is the gradient ($p \times 1$ with

2.4 Vector Error Correction Model

A VECM is a VAR model with an extra feature that captures the cointegration between oil futures prices. Cointegration is a relation that can exist between two series which both have unit roots. If two series are cointegrated they have similar stochastic trends, which ensure that they remain relatively close to each other. This cointegration relation can add to the forecasting ability of the model.

As we will show in subsection 3.1.4, the cointegration relation of the futures contracts can be represented by several spreads commonly used in the oil business, more specifically Atlantic and crack spreads. Both these spreads are modeled using an ARX/GARCH model with a Student-t distribution; that is, we use an ARX model for the mean and a GARCH(1,1) model to represent the volatility.

$$
\begin{aligned}
y_t &= c + \alpha_1 y_{t-1} + \dots + \alpha_k y_{t-k} + X\beta + \varepsilon_t \\
\sigma_t^2 &= \omega + \gamma \varepsilon_{t-1}^2 + \delta \sigma_{t-1}^2
\end{aligned} \tag{2.21}
$$
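The variance recursion in the second line of equation 2.21 can be sketched in Python as follows. The function name and the explicit `initial_variance` argument are our own assumptions; in practice the starting value is often set to the sample variance of the residuals.

```python
def garch11_variance(residuals, omega, gamma, delta, initial_variance):
    """Conditional variances of a GARCH(1,1) process.

    Implements sigma_t^2 = omega + gamma * eps_{t-1}^2 + delta * sigma_{t-1}^2.
    Returns one variance per residual; the first element is the supplied
    starting value.
    """
    variances = [initial_variance]
    for eps in residuals[:-1]:
        variances.append(omega + gamma * eps ** 2 + delta * variances[-1])
    return variances
```

A large residual in one period raises the conditional variance in the next period through `gamma`, and this effect decays at rate `delta`, which is how the model captures volatility clustering.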

The Atlantic spreads are modeled using fundamental oil data from both sides of the Atlantic as explanatory variables. The specific data used consists of weekly US EIA data, ARA product stock data and daily freight rate data. Data for the crack spread models consists of a level-shift variable (also used in the Atlantic crude spread), sine and cosine variables and the weekly US EIA data. The level-shift variable is zero before 1 January 2011 and one afterwards; it is added to account for the increase in Crude Light supply after January 2011.

In the case of daily estimation, the weekly EIA and ARA data were transformed to daily data by updating the series when a new observation was published. That is, on the day a new observation is published the time series is updated to the published value, and this value stays the same until the next publication. Furthermore, several differenced series have been created by subtracting a lagged series from the original. The differences are taken at the original frequency of the series, in this case one week. After taking differences, the same procedure is used to convert these differenced series to a daily frequency. The explanatory variables for each model are selected using backward elimination. That is, we start with all explanatory variables in the model, delete the least significant variable and re-estimate the model. We continue this procedure until all variables are significant at the 5% level. This method is also applied to the explanatory variables for the upcoming VECMs.
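The conversion of weekly publications to a daily series, and the weekly differencing that precedes it, can be sketched in Python. The function names and the dictionary-based representation of the publications are hypothetical choices made for this illustration; dates may be any sortable, hashable type.

```python
def weekly_to_daily(daily_dates, publications):
    """Carry each published value forward until the next publication.

    daily_dates  : sorted list of dates
    publications : dict mapping publication date -> published value
    Days before the first publication receive None.
    """
    series, current = [], None
    for day in daily_dates:
        if day in publications:
            current = publications[day]  # update on the publication day
        series.append(current)
    return series


def weekly_differences(publications):
    """Difference each publication with the previous one (original frequency)."""
    dates = sorted(publications)
    return {d: publications[d] - publications[prev]
            for prev, d in zip(dates, dates[1:])}
```

Applying `weekly_to_daily` to the output of `weekly_differences` gives the daily version of a differenced series, so the differencing happens at the weekly frequency and only the resulting values are carried forward.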

The Atlantic and crack spread models are constructed to create two time series that will serve as explanatory variables in the VECMs. Firstly, we will use the spread models to make daily and monthly forecasts of the Atlantic and crack spreads. Because the spreads are directly related to the price of the futures contracts, an accurate forecast of the spreads could improve the forecast of the futures contracts. Secondly, we will also use the estimation residuals as explanatory variables. These residuals are essentially the difference between the two underlying products of the spreads, after accounting for the effects of the explanatory variables. Lastly, a lagged version of the Atlantic and crack spreads is also added as an explanatory variable.

Series which are cointegrated are best estimated by an error correction model. In this paper we will use a VECM with a CCC-GARCH component for volatility. The VECM is given by:

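$$\Delta P_t = c_1 + \theta P_{t-1} + \beta_1 \Delta P_{t-1} + \dots + \beta_n \Delta P_{t-n} + \gamma_1 X_{1,t-1} + \dots + \gamma_k X_{k,t-1} + h_t z_t \tag{2.22}$$

where $c_1$ is a constant, $\theta P_{t-1}$ is the cointegration effect, $\Delta P_{t-1}, \dots, \Delta P_{t-n}$ are lagged log oil future returns, $X_{1,t-1}, \dots, X_{k,t-1}$ are exogenous variables and $h_t z_t$ is the CCC-GARCH model for the residual vector.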

The CCC-GARCH feature captures the conditional heteroskedasticity in the residuals, which is expected to be present in oil futures return data (Kulsen 2011). $Z_t$ is a vector of standardized Student-t residuals and $H_t$ is the CCC-GARCH model specified as

$$H_t = D_t R D_t \tag{2.23}$$

in which $R$ is a $k \times k$ correlation matrix and $D_t$ a $k \times k$ matrix with $\sqrt{h_{ii,t}}$, the conditional standard deviation, on the diagonal. The conditional volatility $h_{ii,t}$ is modeled using a GARCH(1,1) model. The GARCH(1,1) is given by

$$h_{ii,t} = \omega_{ii} + \alpha_{ii} \varepsilon_{i,t-1}^2 + \beta_{ii} h_{ii,t-1} \tag{2.24}$$
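To make equations (2.23) and (2.24) concrete, the sketch below runs the GARCH(1,1) recursion for one series and assembles the conditional covariance matrix for a single date. It is only an illustration under stated assumptions: the parameter values, the starting variance `h0` and the function names are hypothetical, and no estimation is performed.

```python
import math

def garch_variance(residuals, omega, alpha, beta, h0):
    """Conditional variances h_t = omega + alpha * eps_{t-1}^2 + beta * h_{t-1}.

    h0 is an assumed starting value for the first period.
    """
    h = [h0]
    for eps in residuals[:-1]:
        h.append(omega + alpha * eps ** 2 + beta * h[-1])
    return h

def ccc_covariance(h_t, R):
    """H_t = D_t R D_t, with D_t diagonal and sqrt(h_ii,t) on the diagonal."""
    d = [math.sqrt(v) for v in h_t]
    k = len(d)
    return [[d[i] * R[i][j] * d[j] for j in range(k)] for i in range(k)]

# Hypothetical residuals and parameters for a single series
h = garch_variance([1.0, -2.0, 0.5], omega=0.1, alpha=0.1, beta=0.8, h0=1.0)

# Hypothetical conditional variances of two series on one date and a constant correlation
H = ccc_covariance([4.0, 9.0], [[1.0, 0.5], [0.5, 1.0]])
```

Because $R$ is constant over time, only the diagonal matrix $D_t$ has to be updated each period, which is what keeps the CCC specification cheap to evaluate.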

2.5 Performance Measures

This section describes the measures and techniques we will use to assess which model makes the most accurate forecasts. The methods we use to assess predictive accuracy are the mean squared prediction error (MSPE) and the percentage correct sign (PCS) performance measure. Furthermore, we will use Diebold-Mariano statistics (see Diebold and Mariano (1995)) to assess the significance of the difference in the squared prediction errors of the different models.

2.5.1 Mean Squared Prediction Error

The MSPE is calculated using the following equation:

$$MSPE = \frac{\sum_{t=1}^{n} (y_t - \hat{y}_t)^2}{n} \tag{2.25}$$

where $y_t$ is the realized value at period t, $\hat{y}_t$ is a forecast for the same period and n is equal to the total number of forecasts made, that is, n is equal to the size of the out-of-sample period. In our case $y_t$ will be the log price difference of a first month futures contract between periods t and t-1.

We will use the Diebold-Mariano (DM) statistic to compare the predictive accuracy of the different models. This statistic is created by comparing the forecast errors of different models after applying a loss function. The loss function used in this paper is the squared loss function. A forecast error is defined as

$$\varepsilon_{t+1|t} = y_{t+1} - \hat{y}_{t+1|t} \tag{2.26}$$

in which $y_{t+1}$ is the realized value at t+1 and $\hat{y}_{t+1|t}$ is a forecast of that value made with information up to time t. The loss function becomes:

$$L(\varepsilon_{t+1|t}) = (\varepsilon_{t+1|t})^2 \tag{2.27}$$

The DM statistic is calculated under the null hypothesis that the expected loss of the forecast errors of one model is equal to the expected loss of the other model. More specifically:

$$H_0: E[L(\varepsilon^1_{t+1|t})] = E[L(\varepsilon^2_{t+1|t})] \tag{2.28}$$

against the alternative that they are not equal. Finally, the DM statistic is calculated using the loss differential, which can be represented by the following equations:

$$d_t = L(\varepsilon^1_{t+1|t}) - L(\varepsilon^2_{t+1|t}) \tag{2.29}$$

$$DM = \frac{\bar{d}}{\sqrt{\frac{1}{T} \cdot \mathrm{var}_d}} \tag{2.30}$$

In which $\bar{d}$ is the mean of d, T is the number of forecast errors and $\mathrm{var}_d$ is equal to the variance of d (in case of one step ahead forecasts). Diebold and Mariano (1995) show that under the null of equal predictive accuracy the statistic is asymptotically standard normal:

$$DM \sim N(0,1) \tag{2.31}$$

We therefore reject the null hypothesis of equal predictive accuracy at the 10%, 5% and 1% significance level if the absolute value of the DM statistic is greater than 1.65, 1.96 and 2.58, respectively.
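As an illustration of equations (2.25) to (2.30), the following sketch computes the MSPE and the DM statistic under squared loss for one step ahead forecasts. The function names and the toy data are hypothetical, and the use of the sample variance (divisor T-1) for $\mathrm{var}_d$ is an assumption, since the text does not state which variance estimator is used.

```python
import math

def mspe(actual, forecast):
    """Mean squared prediction error, equation (2.25)."""
    return sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)

def diebold_mariano(actual, forecast_1, forecast_2):
    """DM statistic under squared loss for one step ahead forecasts."""
    # Loss differential d_t = L(e1_t) - L(e2_t), equation (2.29)
    d = [(y - f1) ** 2 - (y - f2) ** 2
         for y, f1, f2 in zip(actual, forecast_1, forecast_2)]
    T = len(d)
    d_bar = sum(d) / T
    # Sample variance of d (assumption: divisor T - 1)
    var_d = sum((x - d_bar) ** 2 for x in d) / (T - 1)
    return d_bar / math.sqrt(var_d / T)  # equation (2.30)

# Hypothetical realized values and two competing forecast series
actual = [1.0, 2.0, 3.0, 4.0]
model_1 = [1.0, 2.0, 3.0, 4.0]
model_2 = [0.0, 2.0, 4.0, 2.0]

dm = diebold_mariano(actual, model_1, model_2)
```

A negative value indicates that the first model has the smaller average squared error. With only four observations the normal approximation is of course not reliable; the example only shows the arithmetic.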

2.5.2 Percentage Correct Sign

The percentage correct sign performance measure indicates whether a forecast made by a model has the same direction as the realized value. More specifically, it is the percentage of forecasts which have a positive (negative) sign when the realized value also has a positive (negative) sign. It is calculated using the following equation:

$$PCS = \frac{\sum_{t=1}^{n} \mathbf{1}\left[\mathrm{sign}(y_t) = \mathrm{sign}(\hat{y}_t)\right]}{n} \tag{2.32}$$

In which $\mathrm{sign}(y_t)$ is equal to 1 if $y_t$ is positive and -1 if $y_t$ is negative. Similarly, $\mathrm{sign}(\hat{y}_t)$ is equal to 1 if $\hat{y}_t$ is positive and -1 if $\hat{y}_t$ is negative. The indicator $\mathbf{1}[\mathrm{sign}(y_t) = \mathrm{sign}(\hat{y}_t)]$ is equal to 1 if the signs of $y_t$ and $\hat{y}_t$ are the same and 0 otherwise.
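A minimal sketch of the PCS calculation is given below. The function names and data are hypothetical. The text does not define the sign of an exact zero; here a zero gets sign 0, so a zero forecast only counts as correct when the realized value is also exactly zero, which is an assumption of this sketch.

```python
def sign(x):
    """1 for positive values, -1 for negative values, 0 for an exact zero (assumption)."""
    return (x > 0) - (x < 0)

def percentage_correct_sign(actual, forecast):
    """Share of forecasts whose sign equals the sign of the realized value."""
    hits = sum(1 for y, f in zip(actual, forecast) if sign(y) == sign(f))
    return hits / len(actual)

# Hypothetical realized returns and forecasts
actual = [0.5, -1.0, 0.2, -0.3]
forecast = [0.1, 0.4, 0.3, -0.2]

pcs = percentage_correct_sign(actual, forecast)  # three of four signs match
```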


2.5.3 Economic Value

The economic value performance measure is the most practical of the performance measures used in this paper. That is, it is the performance measure which will matter most to investors, but perhaps less to academics. The economic value performance measure calculates the total return obtained using a certain strategy based on the forecasts made by each model. The strategy we use in this paper can be described as a follow-sign strategy. That is, we take a long position in a futures contract when the one period ahead forecast is positive and a short position when we expect a negative price movement. The total return is calculated over the entire out-of-sample period according to the following equation:

$$\text{Total Return} = \sum_{t=1}^{N} \mathrm{sign}(\hat{y}_t) \cdot y_t \tag{2.33}$$

In which N is equal to the total number of out-of-sample forecasts, $\hat{y}_t$ is the forecast made of $y_t$ with information available at time t-1 and $y_t$ is the realized return (i.e. $\log(p_t) - \log(p_{t-1})$) at time t.

This performance measure values sign prediction as well as predictive accuracy (i.e. minimizing the size of the forecast error). Sign prediction is accounted for by taking a long position when the one period ahead forecast is positive and shorting the futures contract when a negative price movement is expected. Similarly, predictive accuracy is valued because a wrong sign prediction when a large return is realized decreases the total return more than a wrong sign prediction when a small return is realized. This values predictive accuracy because a wrong sign forecast with a large realized return means that the forecast error is larger in an absolute sense than in the case of a small realized return ( given that we ignore the size of the forecast when the sign is incorrect).
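The follow-sign strategy of equation (2.33) can be sketched as below. The names and numbers are hypothetical, and transaction costs are ignored, as they are in the equation. The second example illustrates the point made above: the same wrong sign call costs more when the realized return is large.

```python
def sign(x):
    """1 for positive values, -1 for negative values, 0 for an exact zero (assumption)."""
    return (x > 0) - (x < 0)

def follow_sign_total_return(actual, forecast):
    """Sum of realized log returns, held long (short) when the forecast is positive (negative)."""
    return sum(sign(f) * y for y, f in zip(actual, forecast))

# Hypothetical realized log returns and forecasts
actual = [0.02, -0.01, 0.03, -0.04]
forecast = [0.01, 0.02, -0.01, -0.03]
total = follow_sign_total_return(actual, forecast)

# One wrong sign call: a small versus a large realized return
small_miss = follow_sign_total_return([0.01], [-0.005])
large_miss = follow_sign_total_return([0.10], [-0.005])
```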


Chapter 3

Data

The data used in this paper can be divided into two groups. The first group consists of the first month futures contracts of which we attempt to predict the price movements using the heterogeneous agents model specified in chapter 2. The second group consists of the data we use in the Vector Error Correction Model and the principal component analysis. The VECM will be used as a benchmark and comparison model (see section 2.4). The principal component analysis will be used in order to calculate the fundamental value of a certain futures contract on a specific date (see section 2.1.1).

3.1 First Month Futures Contracts

This paper focuses on predicting the daily and monthly price movements of the first month futures contracts of the following commodities:

• NYMEX crude light

• NYMEX RBOB (gasoline)

• NYMEX Heating oil

• ICE Brent crude

• ICE Gas oil

The crude oil futures, Brent and Crude light, are the most liquid oil futures in the world and are considered crude oil benchmark prices for Europe and the US. RBOB gasoline futures are the only traded liquid gasoline futures. The futures are used to hedge physical gasoline positions and are considered the worldwide benchmark prices for gasoline. Gas oil and Heating oil are the same oil products, traded on the European and US futures markets respectively.

Figure 3.1 shows the price time series of the first month futures contract on NYMEX crude light. The price time series of futures contracts with different underlying oil products will be used in several spreads as possible cointegration relations and explanatory variables for the return series. Figure 3.2 shows the returns time series of NYMEX crude light; the return time series will serve as dependent variables in the VECM. That is, we will attempt to forecast the return series using a Vector Error Correction Model (VECM). A more detailed description of the futures contracts and underlying products can be found in the appendix, together with figures of the price and return time series.


Figure 3.1: NYMEX crude light prices

This figure shows the daily prices of the NYMEX Crude light first month futures contract during a period ranging from the 2nd of January 2007 until the 7th of May 2013 (1600 observations). The prices are in $/bbl (i.e. $ per barrel). Similar graphs for other oil products are displayed in the appendix.

Figure 3.2: NYMEX crude light returns

This figure shows the daily returns on the NYMEX Crude light first month futures contract during a period ranging from the 2nd of January 2007 until the 7th of May 2013 (1600 observations). The returns are in percentages. Similar graphs for other oil products are displayed in the appendix.

3.1.1 Roll-over Effect

The price time series of first month futures contracts are continuation price sequences. This requires that the sequences are rolled over every month, due to the fact that each month a futures contract expires and disappears. For example, if the first month futures contract is the January 2013 contract, valid between the 1st of January 2013 and the 31st of January 2013, then the first month pricing data between these two dates is the price of the January 2013 contract. However, this contract expires on January 31st, so the pricing data on the 1st of February will be based on the February 2013 contract. The switch from basing the price data on the January contract to basing it on the February contract is called a roll-over.

The roll-over has two significant implications. Firstly, the difference in price between two consecutive months (i.e. the calendar spread) leads to a jump in prices when the roll-over occurs. We adjust the return series to counter this effect by combining the appropriate prices to calculate the returns at the time of and one day after the roll-over (see equation 3.1). The problem of the roll-over effect also exists in price time series. To adjust for this we will create a new price time series from the adjusted returns series. This price time series will be equal to the original price time series, except for the two observations each month during which the roll-over occurred.

Secondly, on the days leading up to the expiration of the first month contract, the liquidity of this contract decreases. More specifically the open interest (i.e. number of outstanding contracts) of the first month contract decreases. We will use this decrease in open interest as an indicator for the time of expiration. On the day of expiration the open interest will be at a minimum followed by a sharp rise the next day, this indicates that it is time to perform a roll-over. That is, we will perform a roll-over when the open interest of the first month contract is at a minimum. The returns series are adjusted for the roll-over effect according to the following equation:

$$
r_t =
\begin{cases}
\log\left( F_t^{(2)} / F_{t-1}^{(2)} \right) & \text{for } t = \tau \\
\log\left( F_t^{(1)} / F_{t-1}^{(2)} \right) & \text{for } t = \tau + 1 \\
\log\left( F_t^{(1)} / F_{t-1}^{(1)} \right) & \text{else}
\end{cases}
\tag{3.1}
$$

In which $\tau$ indicates the day of expiration of the first month (i.e. spot) futures contract and $F_t^{(i)}$ is the settlement price of the ith futures contract at date t.
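Equation (3.1) can be sketched in code as follows. The function name, the price lists and the expiry index are hypothetical. It is assumed that `first[t]` and `second[t]` hold the settlement prices of the first and second month contracts on day t, and that on the day after expiry the first month series continues with the contract that was previously the second month contract.

```python
import math

def rollover_adjusted_returns(first, second, expiry_days):
    """Log returns of the first month series, adjusted for roll-overs as in equation (3.1).

    expiry_days: set of day indices tau on which the first month contract expires.
    """
    returns = []
    for t in range(1, len(first)):
        if t in expiry_days:          # t = tau: use the second month contract on both days
            r = math.log(second[t] / second[t - 1])
        elif t - 1 in expiry_days:    # t = tau + 1: new first month versus yesterday's second month
            r = math.log(first[t] / second[t - 1])
        else:                         # ordinary day
            r = math.log(first[t] / first[t - 1])
        returns.append(r)
    return returns

# Hypothetical settlement prices; the first month contract expires on day index 2
first = [100.0, 101.0, 102.0, 105.0, 106.0]
second = [103.0, 104.0, 104.5, 107.0, 108.0]

adjusted = rollover_adjusted_returns(first, second, expiry_days={2})
unadjusted_jump = math.log(first[3] / first[2])  # contains the calendar spread
```

In this toy example the unadjusted return on the day after expiry mixes the price change with the calendar spread between the two contracts, whereas the adjusted return compares prices of the same contract.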

3.1.2 Augmented Dickey-Fuller test

To determine whether time series are stationary we use the Augmented Dickey-Fuller (ADF) test. This test determines whether a time series has a unit root and is therefore not stationary. The presence of a unit root indicates that shocks to a process (e.g. a price time series) have permanent effects and that the variance depends on the time period and diverges to infinity as time increases. These properties of a unit root process can cause problems in statistical inference involving time series models. A common way to make these series stationary is to take first differences or, in the case of financial products, to take returns. The lag order (i.e. the number of lags) used in the ADF test is selected based on the Schwarz Information Criterion with a maximum of 24 lags. The deterministic components used in the ADF test will not be the same for each variable in the upcoming sections and will be described beneath the table with ADF test results. An intercept will be added for series with a non-zero mean. Similarly, a trend variable will be included if a time series has a clear trend.

The ADF test statistic is calculated using the following equations:

$$\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \delta_1 \Delta y_{t-1} + \dots + \delta_{p-1} \Delta y_{t-p+1} + \varepsilon_t \tag{3.2}$$

In which α is a constant, β the coefficient of a time trend and p the lag order of the autoregressive process. The null hypothesis of the ADF test is that γ = 0 against the alternative hypothesis of γ < 0. If a time series has no clear trend and a mean equal or close to 0 we impose the restrictions α = 0 and β = 0. Similarly, if no trend exists but the mean is not equal to 0 we impose the restriction β = 0. Under the null hypothesis this corresponds to modelling a random walk and a random walk with drift, respectively.

The ADF test statistic is then calculated with the following equation:

$$DF = \frac{\hat{\gamma}}{SE(\hat{\gamma})} \tag{3.3}$$

This DF statistic will be compared to the relevant critical value for the ADF test, which depends on the imposed constraints. If the DF statistic is less than the critical value, the null hypothesis of γ = 0 is rejected and we can conclude that the time series doesn’t contain a unit root and as such is stationary.
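The statistic in equation (3.3) can be illustrated with a deliberately simplified sketch. It covers only the most restricted case of equation (3.2), with α = 0, β = 0 and no lagged differences, so it is a plain Dickey-Fuller regression rather than the augmented test with SIC lag selection used in this paper. The function name and the simulated data are hypothetical.

```python
import math
import random

def dickey_fuller_stat(y):
    """t-ratio of gamma in the regression  dy_t = gamma * y_{t-1} + e_t  (no constant, no lags)."""
    lagged = y[:-1]
    diffs = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(x * x for x in lagged)
    gamma = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    resid = [d - gamma * x for x, d in zip(lagged, diffs)]
    s2 = sum(e * e for e in resid) / (len(diffs) - 1)  # residual variance, one estimated parameter
    return gamma / math.sqrt(s2 / sxx)

# Simulated white noise: stationary, so the statistic should be strongly negative
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(500)]
stat_noise = dickey_fuller_stat(noise)

# The statistic is then compared with the relevant critical value, e.g. -1.9416 in table 3.3
reject_unit_root = stat_noise < -1.9416
```

The statistic does not follow a standard t distribution under the null, which is why it is compared with Dickey-Fuller critical values rather than with the usual normal ones.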

3.1.3 Descriptive statistics

This section compares the statistics and time series of the different oil futures contracts and returns series. Figure 3.3 shows the daily price movements of the five different 1st month futures contracts from January 1st 2007 until the 7th of May 2013. The price movements are all very similar, which implies that the returns of the futures contracts are highly correlated. This is confirmed in table 3.2, which shows the correlation and covariance between the returns of the futures contracts. ICE gas oil seems to be the least correlated with the other futures. This can be explained by the fact that trading in ICE gas oil futures closes earlier than trading in the other futures contracts, at 17:30 and 20:30 respectively. However, the smallest correlation is still above 0.5.

Table 3.1 shows the descriptive statistics of the different returns series. It is clear that:

• The mean return is close to 0%.

• The standard deviations of the daily returns are between 1.85% and 2.54%.

• All return distributions have excess kurtosis, which indicates that the distributions of the returns have fat tails and as such are not normally distributed.

• The skewness is on average quite small and negative, except in the case of gas oil (in which it is small and positive).

• The autocorrelations are relatively close to zero, which is often the case in returns series.

• There is significant autocorrelation in the squared daily returns series (see figures 3.4 and A.1).

From figure 3.4 it is clear that there is significant autocorrelation in the volatility of Brent returns (similar patterns are visible for other oil futures time series). More specifically, the returns series seem to suffer from volatility clustering. Autocorrelations and partial autocorrelations have been computed for the squared Brent returns series (see A.1 in the appendix). It turns out that autocorrelations are present between current volatility and lags one through five. To account for these autocorrelations we will use a GARCH(1,1) for volatility (Bollerslev (1986)) when estimating our daily VECM.

Considering that the distribution of returns has excess kurtosis, is close to symmetric and that this remains true for the residuals after accounting for the autocorrelation in volatility we conclude that the distribution of the returns series is relatively similar to a student t distribution.

The monthly returns series doesn’t suffer from significant autocorrelation in the squared returns series, has a very small excess kurtosis and a mean close to one. Therefore the monthly VECM will be modeled using a normal distribution with no GARCH component.

Figure 3.3: Prices of the different first month futures contracts

This figure shows the daily prices of the first month futures contracts of five different oil products, that is, the price movements of NYMEX Crude light, NYMEX RBOB, NYMEX Heating oil, ICE Brent Crude and ICE Gas oil. Firstly, a significant drop in prices can be seen at the start of the financial crisis (i.e. the credit crisis). Secondly, it is clear that the prices of crude oil are on average lower than the prices of refined oil products. This is to be expected, as the difference in price is basically the gross margin on which refineries make their profits.


Table 3.1: Descriptive Statistics of the Returns of Different Oil Products

                NYM Crude L   NYM RBOB   NYM HO    ICE Brent C   ICE GO
Mean              -0.0100      0.0665    0.0102      0.0382      0.0384
Standard Dev       2.5364      2.4717    2.0674      2.2412      1.8597
Kurtosis           7.1101      5.4314    5.2012      6.9163      6.0231
Skewness          -0.0075     -0.2383   -0.2220     -0.2690      0.1010
Autocorr 1        -0.0458     -0.0361   -0.0280     -0.0748      0.0012
Autocorr 2        -0.0058     -0.0250    0.0049     -0.0105      0.0097

This table shows the descriptive statistics of the realized returns series of the different first month futures contracts. It is clear that, in contrast to a normal distribution, these returns series exhibit significant excess kurtosis. Furthermore, the mean is close to zero and there is a relatively small negative skewness.

Table 3.2: Covariance-Correlation Matrix of the Returns

                NYM Crude L   NYM RBOB   NYM HO    ICE Brent C   ICE GO
NYM Crude L        6.4334      4.6721    4.3412      5.0158      2.7340
NYM RBOB           0.7452      6.1093    4.0277      4.5485      2.4187
NYM HO             0.8279      0.7882    4.2743      4.1846      2.4983
ICE Brent C        0.8824      0.8211    0.9031      5.0228      2.5727
ICE GO             0.5796      0.5262    0.6498      0.6173      3.4585

The upper triangle of this table shows the covariances of the different oil products. The bottom triangle displays the correlations; the variance of each product can be found on the diagonal.

Figure 3.4: Squared Brent Returns

This figure shows the squared Brent returns series. This series serves as a proxy for volatility in Brent returns. It is evident from this picture that there is volatility clustering in the Brent returns series.


3.1.4 Stationarity & Cointegration of Oil Prices

This section discusses the stationarity of the price time series of the 1st month futures contracts with the 5 different oil products, described above, as the underlying. Table 3.3 shows the results of the ADF test. The price time series of the 1st month futures contracts all contain unit roots. In an attempt to make these series stationary we calculated the daily returns. The bottom part of table 3.3 shows the results of the ADF test performed on the daily returns series. The returns series are all stationary; therefore we will use the return series as dependent and, if useful, explanatory variables in the VECM.

If two time series have a unit root (i.e. a stochastic trend) but they don’t stray too far from each other because they share similar stochastic trends, these time series are said to be cointegrated. Cointegrated series are best estimated using an error correction model, since the cointegration relation can improve the predictive accuracy of the model. There are four spreads commonly used in the oil business to hedge or speculate that consist of combinations of the above mentioned oil products. We suspect that these spreads might represent cointegration relations. The four spreads are:

• The Atlantic spread of crude oil (ICE Brent Crude - NYMEX Crude Light).

• The Atlantic spread of gas oil (ICE Gas oil - NYMEX Heating oil).

• The Crack spread of gasoline (NYMEX RBOB - NYMEX Crude Light).

• The Crack spread of Heating oil (NYMEX Heating oil - NYMEX Crude Light).

(See section 3.2.1)

The Atlantic spreads represent price differences between relatively similar products, produced and traded in different geographical locations. The Crack spreads can be seen as the added value of refineries to the oil products (i.e. the margin on which refineries make their profits). To determine whether these spreads represent cointegration relations we perform the Augmented Dickey-Fuller test on the above mentioned pairs of price time series. If these spreads represent cointegration relations, the results of the ADF test should indicate that they are stationary, given that the price time series themselves are non-stationary. Table 3.4 shows the results of the ADF test. The results show that the Atlantic gas oil spread and the RBOB crack spread indeed represent cointegration relations between ICE gas oil-NYMEX heating oil and NYMEX RBOB-NYMEX crude light, respectively. However, the results also imply that the Atlantic crude spread and the Heating oil crack spread don’t represent cointegration relations. This can be explained by the level shift in the Atlantic crude spread and Heating oil crack spread which occurs at the start of 2011 (see figure 3.5). If we perform the ADF test on both these spreads with data up to but not including 2011, we find that the ADF test rejects the null hypothesis of non-stationarity, with p-values of 0.0001 and 0.0261 for the Atlantic crude spread and the Heating oil crack spread, respectively. This indicates that these spreads might indeed represent a cointegration relation which is disturbed by the level shift. Therefore, we will adjust the Atlantic crude spread and Heating oil crack spread by regressing on a level shift dummy variable in order to obtain residuals that don’t contain a unit root. This dummy variable is equal to 0 before the start of 2011 and 1 afterwards. After regressing on this dummy we performed the ADF test on the residuals. The results can be found in the last two columns of table 3.4, which indicate that these residuals are stationary and can therefore be seen as the cointegration relation between the used price time series.
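The level shift adjustment can be sketched as a regression of the spread on a constant and the 0/1 dummy. The inclusion of a constant is an assumption, as the text only mentions the dummy, and the function name and spread values are hypothetical. With a constant and a single dummy, the residuals are simply the deviations of the spread from its mean within each regime.

```python
def level_shift_residuals(spread, dummy):
    """OLS of spread on a constant and a 0/1 dummy; returns (residuals, intercept, slope)."""
    n = len(spread)
    mean_x = sum(dummy) / n
    mean_y = sum(spread) / n
    sxx = sum((x - mean_x) ** 2 for x in dummy)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dummy, spread))
    slope = sxy / sxx                      # size of the level shift
    intercept = mean_y - slope * mean_x    # mean level before the shift
    residuals = [y - intercept - slope * x for x, y in zip(dummy, spread)]
    return residuals, intercept, slope

# Hypothetical spread with a shift of about 10 halfway through the sample
spread = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
dummy = [0, 0, 0, 1, 1, 1]  # 0 before the break date, 1 afterwards

residuals, intercept, slope = level_shift_residuals(spread, dummy)
```

The residual series would then be passed to the ADF test in place of the raw spread.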

This level shift can be explained by the increased production of NYMEX crude light in the US and the export ban on crude oil from the US. More specifically, at the start of 2011 the US increased its oil production while demand remained relatively unchanged. This, together with the export ban on crude oil, decreased the price (stagnated the upward price trend) of NYMEX crude light.

As stated above the best way to estimate a cointegrated series is by using an error correction model. The error correction model we use is the Vector error correction model (VECM) with a GARCH component for volatility, this model has been described in section 2.4. The error component (i.e. cointegration component) used in the VECM will be the (adjusted) spreads mentioned above.

Table 3.3: Augmented Dickey-Fuller Test on Oil products

                 NYM Crude L   NYM RBOB   NYM HO     ICE Brent C   ICE GO
Prices
ADF Statistic     -2.1413∗     -2.0439∗   -1.7202∗    -1.7654∗     -1.6295∗
Critical value    -2.8646      -2.8646    -2.8646     -2.8646      -2.8646
P-value            0.2340       0.2772     0.4205      0.4005       0.4606
Unit root          yes          yes        yes         yes          yes
Returns
ADF Statistic    -42.0871     -40.2074   -40.6539    -42.7545     -38.7669
Critical value    -1.9416      -1.9416    -1.9416     -1.9416      -1.9416
P-value            0.0001       0.0001     0.0001      0.0001       0.0001
Unit root          no           no         no          no           no
Observations       1600         1600       1600        1600         1600

∗ Indicates that an intercept was added to the ADF test.

This table displays the results of the Augmented Dickey-Fuller test on the time series consisting of daily prices of first month futures contracts with different underlying oil products and their returns series. The top part displays the results based on the prices of the futures contracts; similarly, the bottom part displays the results of the returns series. The time series consisting of the prices all contain unit roots.


Table 3.4: Augmented Dickey Fuller Test on Oil Spreads

               ATL Crude∗   ATL GO∗    RBOB Crack∗   HO Crack∗   ATL Crude adj∗   HO Crack adj∗
ADF Stat        -2.1623     -15.0290    -3.0709       -2.2339      -5.7032          -3.0485
Critical val    -2.8645      -2.8645    -2.8645       -2.8645      -2.8645          -2.8645
P-value          0.2247       0.0001     0.0292        0.1946       0.0001           0.0310
Unit root        yes          no         no            yes          no               no
# Obs            1600         1600       1600          1600         1600             1600

∗ Indicates that an intercept has been included in the ADF test.

This table shows the results of the Augmented Dickey-Fuller test on the (adjusted) Atlantic and Crack spreads specified above.

3.2 Data for VECM & PCA

This section describes the data used in the VECM and principal component analysis. We will use several fundamental (marked by the *) and financial variables with two distinct goals in mind. Firstly, we will use the fundamental variables, stated below, in a principal component analysis in an attempt to assess the fundamental value (i.e. the intrinsic value) of first month oil futures contracts. More specifically, we attempt to calculate the fundamental value of the underlying oil products, which as such also represents the fundamental value of the first month futures contracts. In order to keep the number of explanatory variables to a minimum, to counteract parameter uncertainty, we will perform a principal component analysis on these fundamental variables. Secondly, we will use all variables mentioned below, that is the fundamental and financial variables, as possible explanatory variables in a vector error correction model (VECM). The VECM will be used to make daily and monthly forecasts of the returns of the first month futures contracts. Furthermore, it will serve as the benchmark model for the HAM. To be more precise, we will create daily and monthly forecasts with the VECM and compare its performance (i.e. predictive accuracy) to the performance of the HAM.

For the prediction of the Atlantic and Crack spreads:

• Periodic functions. Sine and cosine variables with a period of one year, an amplitude of one and zero phase shift.

• Freight rates. Freight rates between USG (US East Gulf)-ARA (North-West Europe) and freight rate ARA-NYH.

• US EIA oil data. Crude inventories, distillate inventories, gasoline inventories, refinery capacity utilization.
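The periodic functions in the list above can be sketched as follows. Measuring time in days from a chosen origin and using a year length of 365.25 days are assumptions of this sketch, as is the function name; the text only specifies a period of one year, an amplitude of one and zero phase shift.

```python
import math

def seasonal_terms(t_days, period_days=365.25):
    """Sine and cosine regressors with unit amplitude, zero phase shift and a one-year period."""
    angle = 2.0 * math.pi * t_days / period_days
    return math.sin(angle), math.cos(angle)

# Values at the origin and a quarter of a year later
sin_0, cos_0 = seasonal_terms(0)
sin_q, cos_q = seasonal_terms(365.25 / 4)
```

Including both terms lets the regression pick up a yearly cycle with any phase, since a shifted sine wave is a linear combination of a sine and a cosine of the same period.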

The explanatory variables of the VECM:

• Calendar spreads. The difference between the prices of futures contracts of 2 consecutive months (2nd-1st), for all dependent variables.

• US dollar index returns. The daily returns on a weighted average of the exchange rates of the world’s main oil importing countries against the US dollar.

• Consolidated stock index returns. That is, daily returns on a weighted average of the stock exchanges of the world’s main oil importing countries.

• ∗ World PMI index (i.e. weighted average of the PMI indexes of the world’s main oil importing countries). One-month lagged differences for the 1st, 2nd and 3rd month of the World PMI index.

• ∗ EIA international crude oil data and OPEC quota data. OECD commercial inventories, total production - total demand (World Supply) and OPEC quota data.

• ∗ ARA oil inventories. ARA gas oil, gasoline and total oil product inventories.

• ∗ US EIA oil data. EIA refinery capacity utilization.

• ∗ Singapore oil inventories. Both light and middle distillate inventories.

• ∗ IEA oil demand data. Change in IEA oil demand growth.

• ∗ Cushing inventories and refinery capacity utilization. Cushing is a major trading hub for crude oil (i.e. Crude Light) in Oklahoma, United States.

A more specific list of explanatory variables for the different models can be found in appendix A.4. In the upcoming sections we will discuss the relation between each of these variables and oil prices, accompanied by an in-depth analysis of the statistics of each time series.

3.2.1 Atlantic & Crack Spreads

The Atlantic spreads represent the price difference between the price of an oil futures contract in Europe and in the USA. The Atlantic crude spread is created by subtracting the price of the 1st month futures contract on NYMEX Crude Light from the price of a similar contract with ICE Brent Crude as the underlying. Similarly, the Atlantic gas oil spread is created by subtracting NYMEX heating oil from ICE gas oil.

Figure 3.5 displays the daily Atlantic crude spread from the 2nd of January 2007 until May 7th 2013 (1600 observations), again clearly showing the level shift at the start of 2011. The figure for the Atlantic gas oil spread can be viewed in the appendix (see figure A.9).

Crack spreads are the difference in price between oil products and crude oil and as such can be interpreted as a profit margin for refiners. Figure 3.6 displays the daily gasoline crack spread between January 2nd 2007 and May 7th 2013 (1600 observations). This crack spread is created by taking the difference in price of the 1st month futures contracts of NYMEX RBOB and NYMEX crude light. Figure 3.6 shows that there is some indication of a seasonal pattern in the gasoline crack spread. An explanation for this can be that the rise in demand as a result of the US "driving season" causes a rise in gasoline prices relative to crude prices. This will be accounted for, when estimating the gasoline crack spread model, by the sine and cosine explanatory variables stated above. Figure A.10 in the appendix shows the heating oil crack spread during the same time period. We clearly see an upward trend in the heating oil crack spread, interrupted by a downward trend at the start of the credit crisis, indicating an increase in heating oil prices relative to crude oil prices.

Section 3.1.4 already gave the most important reason to compute models for these spreads: the so-called cointegration relation between different oil products, which is captured by these models. Another possible use of the Atlantic and crack spread models is to make forecasts of these spreads and use these forecasts as explanatory variables in the models for the returns on the first month futures contracts. Because these series are composed of oil product prices, the movements of these spreads are correlated with movements in oil prices. Table A.2 displays the descriptive statistics of the Atlantic and crack spreads. These statistics confirm the statements made above. The Atlantic crude spread is on average positive, confirming that crude oil is indeed cheaper in the US than in the EU. Both crack spreads have relatively large positive means, representing the added value of refining oil. Lastly, all spreads except for the Atlantic gas oil spread have first and second order autocorrelations higher than 0.97. The lower autocorrelations in the Atlantic gas oil spread can again be attributed to the earlier closing time of ICE gas oil futures trading.

Figure 3.5: Atlantic Crude Spread

The Atlantic crude spread (ICE Brent Crude ($/bbl) - NYMEX Crude Light($/bbl)) from January 2nd 2007 until May 7th 2013


Figure 3.6: Gasoline Crack Spread

The gasoline crack spread (NYMEX RBOB($/bbl) - NYMEX crude light($/bbl)) from January 2nd 2007 until May 7th 2013

3.2.2 Freight rates

The Atlantic spreads are basically a comparison between 2 similar products in different geographical locations. If this spread becomes larger, in an absolute sense, it becomes more attractive to order the cheaper product no matter your geographical location. This makes sense if the price difference in oil products is larger than the cost of transportation (i.e. the freight rates). Therefore, the freight rates can have a significant effect on the Atlantic spreads. Figure 3.7 shows the freight rates for oil tankers traveling from ARA to New York or the US Gulf and from the US Gulf back to the ARA region. It is clear from this figure that the freight rates are highly correlated but do seem to differ in level. In order to assess whether these series of freight rates are stationary we perform the ADF test. Table A.3 shows the results of this test; according to the ADF test all freight rate series are non-stationary. However, because the ADF test can’t account for the economic value of variables and because for two out of the three series the ADF null hypothesis is only barely not rejected, with p-values of 0.0582 and 0.0598, we choose to use level freight rates as explanatory variables. Furthermore, because the freight rates will only be used in the Atlantic spread models, whose main use is to provide a series which captures the cointegration relation between oil futures, the effect of the possible unit root should be negligible in the final VECM. The descriptive statistics of these series are shown in table A.4 in the appendix. The means are relatively close, with the difference possibly being caused by the difference in transporting distance, loading, unloading and other harbor fees. They all have a kurtosis close to -0.1 and high autocorrelations.


Figure 3.7: Freight Rates

The freight rates ($/ton) for oil tankers traveling from the ARA to New York or the US Gulf and from the US Gulf to the ARA region, for the period from the 1st of January 2007 until the 5th of May 2013

3.2.3 EIA Inventories

The current inventories of oil products can be used to make assumptions about short-term supply and demand dynamics. Crude inventories are related to crude supply and refinery demand, whereas the inventories of oil products relate to end consumer demand and refinery supply. These inventories are related to the Atlantic and crack spreads because they provide insight into the current demand and supply in the U.S., and as such into the price movements of oil products.

Another important component in the supply of crude oil, especially in relation to crack spreads, is the availability of operational crude oil refining capacity. The available refining capacity relative to the required refining capacity should, according to economic theory, influence the price of oil products and as such influence the crack spreads. In order to capture this relation we add EIA US refinery capacity utilization data, the only refinery capacity data available, to the list of possible explanatory variables. Figure 3.9 shows the US refinery capacity utilization.

To examine whether the EIA inventory data (Figure 3.8) and the refinery capacity utilization data have unit roots we perform the Augmented Dickey-Fuller test. The results of this test are displayed in Table A.5 in the appendix. According to the ADF test, each series of EIA inventory data contains a unit root; after taking first differences, however, the series become stationary. It is therefore advisable and customary to use the first-difference series of the inventory data as possible explanatory variables in the spread and error correction models. However, the ADF test does not account for economic value: according to economic theory the level of inventory is one of the main driving forces of price changes, because it directly relates to supply and demand. Therefore we will use a level inventory variable as well as the first-difference series (i.e. the change in inventory). The results for the refinery capacity utilization data are different. This data is already stationary, so we will use the original series as an explanatory variable. The descriptive statistics of the EIA inventories and US refinery capacity can be found in Table A.6 in the appendix.
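The unit root check and the first-difference transformation can be sketched in a few lines of Python. This is a simplified illustration and not the procedure used to produce Table A.5. It runs the simple Dickey-Fuller regression with a constant and no lagged difference terms, so it is not the augmented test. The data is simulated, all function names are hypothetical, and the value of -2.86 is the asymptotic 5% critical value for the constant-only case.

```python
import math
import random

# Asymptotic 5% Dickey-Fuller critical value (constant, no trend).
DF_CRITICAL_5PCT = -2.86


def first_difference(series):
    """Return the first-difference series y[t] - y[t-1]."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def dickey_fuller_tstat(series):
    """t-statistic on rho in dy[t] = alpha + rho * y[t-1] + e[t] (OLS).

    Simple Dickey-Fuller regression: no lagged differences are included,
    so this is a simplification of the augmented test.
    """
    dy = first_difference(series)
    lagged = series[:-1]
    n = len(dy)
    mean_x = sum(lagged) / n
    mean_y = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, dy))
    rho = sxy / sxx
    alpha = mean_y - rho * mean_x
    rss = sum((y - alpha - rho * x) ** 2 for x, y in zip(lagged, dy))
    sigma2 = rss / (n - 2)  # two estimated parameters: alpha and rho
    return rho / math.sqrt(sigma2 / sxx)


def looks_stationary(series):
    """Reject the unit root null when the t-statistic is below the critical value."""
    return dickey_fuller_tstat(series) < DF_CRITICAL_5PCT


def simulate_random_walk(n, rng):
    """Simulated stand-in for an inventory level series (not real EIA data)."""
    level, walk = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        walk.append(level)
    return walk


if __name__ == "__main__":
    walk = simulate_random_walk(330, random.Random(0))  # 330 weekly observations
    print("level series t-stat:     ", dickey_fuller_tstat(walk))
    print("differenced series t-stat:", dickey_fuller_tstat(first_difference(walk)))
```

In practice a statistics package that implements the augmented test, including lag selection and finite-sample p-values, would be used instead of this hand-written regression.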

Figure 3.8: EIA Inventory Levels

EIA inventory data: US Crude oil, distillate and gasoline stocks from the 3rd of January 2007 until 24th of April 2013 (330 weekly observations)

Figure 3.9: US Refinery Capacity Utilization

This figure shows the US refinery capacity utilization from the 3rd of January 2007 until 24th of April 2013 (330 weekly observations)

3.2.4 Calendar Spreads

Futures contracts with different physical delivery dates (i.e. time to maturity) are traded on commodity futures exchanges at every point in time. On the NYMEX and ICE oil futures exchanges contracts are being traded with delivery dates for every month in the future, with a maximum of three years. That is, you can buy a futures contract now and get the oil delivered any number of months from now up to three years, depending
