Forecasting crude oil prices

Building a good time series model is the primary requirement for generating good oil price forecasts.

SHANU JAIN and AJAY GUPTA
Reliance Industries Limited

Article Summary

Crude oil is one of the most important commodities in the world, accounting for one-third of global energy consumption. It is a starting material for most of the products that we use in everyday life, ranging from transportation fuels to plastics. Crude oil price fluctuations have a far-reaching impact on global economies, so price forecasting can help to minimise the risks associated with oil price volatility. Price forecasts are very important to various stakeholders: governments, public and private enterprises, policymakers, and investors. According to economic theory, the price of crude oil should be easily predictable from the equilibrium between demand and supply, where demand forecasts are usually made from GDP, exchange rates and domestic prices, and supply is predicted from past production data and reserve data. Predicting demand for oil is usually straightforward; supply, however, is heavily affected by political activity such as cartelisation by OPEC to regulate prices, by technological advances that allow more oil to be extracted, and by wars and other conflicts, all of which can change supply unpredictably.

Models incorporating economic parameters such as supply and demand and their determinants are known as structural models (see Equation 1). Although structural models are the most logical way of modelling the prices of industrial products, the price of crude oil is affected by many other factors. One of these is that the price of crude oil is determined in the futures market, which enables the purchase of a predefined amount of oil at a particular price in the future. Additionally, only 1% of the crude oil traded in futures contracts results in the actual purchase of the physical commodity; the chief purpose of this trading is to make money out of price fluctuations in crude oil. Hence the price of crude oil behaves more like that of a financial asset and reflects the expectations of traders rather than just predictions based on economic theories of supply and demand:
Structural model:     Oil Price = f(supply, demand)     (1)

Time series model:    Oil Price = f(Oil Price(t))       (2)

Another category of models is non-structural and considers only the variation of crude oil prices over time. These are known as time series models and are generally formulated as in Equation 2. It is difficult to obtain reliable data to formulate a structural model, whereas time series data for crude oil prices is readily available, which makes a time series model easier to build. We focus on time series modelling of crude oil prices in this article.

In time series models, it is assumed that the current price of crude oil reflects the effects of all influencing factors and that price forecasting can be done based on the behaviour of past crude oil prices. The main assumption in such models is that the past behaviour of oil prices can explain future prices. Although time series models can capture trends or cyclical patterns in the data, their forecasting capability is limited when trend reversals occur or when the repeating pattern captured in the model is not followed by future prices. Trends in a time series can be classified as increasing, decreasing and periodic patterns (see Figure 1). Time series models are quite useful and forecast reasonably well when the data follows any of these types of trend.

We can easily observe the downtrends, uptrends and repeating patterns in crude oil prices within specific years (see Figure 2). Crude oil monthly price data is obtained from the US Energy Information Administration (EIA) website.¹ Different subsets of crude oil price data are formed to demonstrate the utility of time series modelling and its limitations in some scenarios.

Time series modelling techniques
Several methods have been proposed in the literature for building time series models. They include autoregressive integrated moving average (ARIMA), generalised autoregressive conditional heteroscedasticity (GARCH), Holt-Winters, autoregressive neural networks, and support vector regression.² Various hybrid models have also been suggested, such as combinations of ARIMA and neural networks with support vector regression, genetic algorithms and wavelets.³⁻⁷ Discussions of the various methodologies applied to crude oil price modelling can be found in review articles available in the literature.⁸,⁷ We have used ARIMA and autoregressive neural networks for modelling oil prices, as these techniques cover both linear and non-linear types of modelling. A short description of these methods is given below.

ARIMA
ARIMA, developed by Box and Jenkins, is the most widely used and best-known technique for time series analysis. In an ARIMA model, future values are predicted as a linear combination of previous oil prices and the associated errors. The model consists of three parts: the AR (autoregressive) component is a linear combination of past observations; the MA (moving average) component is a linear combination of lagged error terms; and the I (integrated) component replaces the original series with a differenced series. An ARIMA model is represented in the form of Equation 3:

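$\Delta^D y_t = c + \varphi_1 \Delta^D y_{t-1} + \dots + \varphi_p \Delta^D y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q}$                      (3)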

where y_t is the value of the variable at time step t, φ and θ are the AR and MA coefficients respectively, ε denotes the error terms, and Δ^D y_t is the Dth-differenced time series. To build an ARIMA model, we need to specify three parameters, p, d and q: p is the number of lag observations in the AR model; d is the degree of differencing (written D in Equation 3); and q is the order of the moving average model. Before building a time series model, we need to make the time series stationary. A time series is said to be stationary if its mean and variance are constant in time and its autocovariance depends only on the lag. The Augmented Dickey-Fuller (ADF) test is a formal statistical test for the stationarity of a time series. Most time series data is not stationary, but it can usually be made stationary by data transformations such as differencing and taking logarithms. Parameters p and q in the ARIMA model are determined from autocorrelation plots of the time series. A set of rules guides the interpretation of these correlation plots and the identification of candidate ARIMA models for further testing. For model diagnosis, correlation plots of the residuals and the Ljung-Box test are used. We have used a built-in ARIMA model in R to model the crude oil prices.
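As an illustration only, the building blocks described above can be sketched in a few lines of plain Python. This is not the R workflow used in this article: the function names, the price values and the ARIMA(1,1,1) coefficients below are all hypothetical, chosen to show how differencing, the sample autocorrelation, the Ljung-Box statistic and a one-step forecast from Equation 3 fit together. In practice the coefficients and residuals would come from a fitted model.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing (y_t - y_{t-1})."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelation of the series at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum over k of r_k^2 / (n - k)."""
    n = len(residuals)
    acf = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(acf, start=1))


def arima_one_step(diffed, errors, c, phi, theta):
    """One-step forecast of the differenced series from Equation 3.

    The future error term is unknown, so it is set to its expected value, zero.
    """
    ar = sum(p * diffed[-i] for i, p in enumerate(phi, start=1))
    ma = sum(q * errors[-j] for j, q in enumerate(theta, start=1))
    return c + ar + ma


# Made-up monthly prices, for illustration only (not EIA data).
prices = [50.0, 52.0, 51.0, 55.0, 58.0, 57.0, 60.0]
diffed = difference(prices, d=1)

# Hypothetical ARIMA(1,1,1) coefficients and residuals (not fitted values).
c, phi, theta = 0.5, [0.5], [0.25]
errors = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]

next_diff = arima_one_step(diffed, errors, c, phi, theta)
next_price = prices[-1] + next_diff  # undo the single round of differencing
print("forecast of next price:", next_price)
print("lag-1 and lag-2 autocorrelation:", sample_acf(diffed, 2))
```

With d = 1, the forecast of the differenced series is added back to the last observed price to return to the original price scale. A large Ljung-Box statistic on the residuals, judged against a chi-squared distribution, would suggest that the model has left autocorrelation unexplained.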

Autoregressive neural network
An autoregressive neural network (ANN) is a non-linear model in which future prices are expressed as a non-linear function of lagged prices in the series, in contrast to the linear modelling in ARIMA. Additionally, neural network based models can learn and capture patterns in data sets without the exact model form having to be specified. The multilayer perceptron (MLP) is the most widely used ANN in forecasting problems. Typically, the model is composed of an input layer, a hidden layer and an output layer. The connecting nodes in these layers are called neurons. Each neuron takes a weighted combination of the outputs of the previous layer, maps it through a transfer function and sends the result to the next layer. Several parameters need to be specified for an ANN model: the number of hidden layers, the number of neurons in each layer, the type of transfer function, and the number of lags. The selection of appropriate network parameters is crucial to the fitting and forecast accuracy of an ANN model. We have used the nnetar function in R to build a neural network model.
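To make the structure concrete, the following is a minimal Python sketch of the forward pass of a single-hidden-layer network fed with lagged values. It is not the nnetar implementation: the function names are hypothetical, the tanh transfer function is an assumed choice, and the weights are arbitrary untrained numbers, whereas a real model would estimate them from the data.

```python
import math


def make_lagged(series, n_lags):
    """Build (inputs, target) pairs: the previous n_lags values predict the next one."""
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]


def mlp_forecast(lags, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass through one hidden layer with a linear output neuron.

    Each hidden neuron applies tanh (an assumed transfer function) to a
    weighted sum of the lagged inputs plus a bias.
    """
    hidden = [math.tanh(b + sum(w * x for w, x in zip(ws, lags)))
              for ws, b in zip(hidden_weights, hidden_biases)]
    return output_bias + sum(w * h for w, h in zip(output_weights, hidden))


# Made-up, already scaled prices (illustrative values only).
scaled_prices = [0.10, 0.30, 0.20, 0.50, 0.40, 0.60]
pairs = make_lagged(scaled_prices, n_lags=2)

# Hypothetical untrained weights: 2 lags -> 2 hidden neurons -> 1 output.
hidden_weights = [[0.8, -0.4], [0.3, 0.9]]
hidden_biases = [0.1, -0.2]
output_weights = [0.7, 0.5]
output_bias = 0.05

for inputs, target in pairs:
    prediction = mlp_forecast(inputs, hidden_weights, hidden_biases,
                              output_weights, output_bias)
    print(inputs, "->", round(prediction, 4), "(actual:", target, ")")
```

The number of lags sets the width of the input layer, and the lengths of the weight lists set the number of hidden neurons, which are the network parameters listed above. Training would adjust the weights to reduce the gap between each prediction and its target.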