**Forecasting the Time Series Data Using ARIMA with Wavelet**

**Jatinder Kumar, Amandeep Kaur and Pammy Manchanda**

Department of Mathematics, Guru Nanak Dev University, Amritsar, INDIA.

email: bhatiajkumar@yahoo.com, amanmath437@gmail.com

(Received on: August 18, 2015)

**ABSTRACT**

The wavelet transform has received considerable attention in fields such as mathematics, signal processing, engineering, physics, economics and finance. This paper illustrates an application of the wavelet transform to time series analysis: we propose a forecasting technique that combines the wavelet transform with the ARIMA model. Daily closing prices of two energy sector companies, GAIL and ONGC, are used in this study. We observe that the wavelet transform combined with the ARIMA model forecasts better than the direct use of the ARIMA model on the data.

**Keywords:** Forecasting, GAIL and ONGC data, wavelet transform, ARIMA models.

**1. INTRODUCTION**

Forecasting is the process of predicting future situations from past and present data, and it is therefore helpful in planning for future growth. Time series forecasting is widely used in fields such as economics, business, engineering, the natural sciences and meteorology. Stock market forecasting is important to investors because it informs investment decisions. A number of time series forecasting models describe the nature of the system generating a time series by analyzing its historical data. These models are helpful both for forecasting and for understanding the dynamic relationships between variables. They include basic models such as linear regression, the simple moving average and exponential smoothing, and more advanced models such as the autoregressive moving average (ARMA) model, its extension the autoregressive integrated moving average (ARIMA) model, and neural networks. Time series models can be divided into two groups according to whether the series they describe is stationary or non-stationary. In a stationary series, statistical properties such as the mean and variance remain constant over time, while in a non-stationary series these properties are time-dependent. Most real-world time series are non-stationary, as they contain extreme variations and these fluctuations occur at high frequency. Wavelet methods are therefore well suited to such data. They have advantages over traditional Fourier methods in analyzing physical situations where the signal contains discontinuities.

*Jatinder Kumar, et al., J. Comp. & Math. Sci. Vol.6(8), 430-438 (2015)*

The wavelet transform decomposes data (a signal) in both time and frequency, which allows us to analyze the dominant frequency components effectively and to extract local information from the signal. It gives good results in fields such as signal processing, image coding and compression, and in areas of mathematics such as the solution of partial differential equations and numerical analysis. The wavelet transform is also very effective for time series analysis. Ramsey^{9} describes some important properties of wavelets and discusses their applications in both economics and finance. S. Yousefi et al.^{10} describe a wavelet-based prediction procedure for the oil market and show that, on average, it outperforms the futures market. Gencay et al.^{8} discuss the use of wavelets in economics and finance with many illustrations and examples. Davidson et al.^{5} show that wavelet analysis is effective in describing the unstable variance structure and general features of commodity prices, such as structural breaks. J. Kumar et al.^{6} describe neuro-fuzzy modelling combined with wavelet decomposition for the stock market. A. J. Conejo et al.^{4} discuss the role of ARIMA combined with wavelets in electricity market forecasting. Stochastic models for forecasting oil prices are discussed by Mohammed et al.^{7}.

This paper focuses on one-month-ahead forecasts of the daily prices of GAIL and ONGC, two companies from the energy sector. The gas and oil sectors are about as important to a developed country as agriculture. These sectors play a very significant role in the country's economy, being the biggest contributors to both the central and state treasuries. India is the fourth-largest consumer of oil and gas in the world, and to meet its growing petroleum demand it is investing heavily in oil fields. Fluctuations in oil and gas prices strongly affect the country's economy. Several companies operate in the oil and gas sector; in this paper we consider two of them: Gas Authority of India Limited (GAIL) and Oil and Natural Gas Corporation Limited (ONGC). GAIL is the largest natural gas processing and distribution company in India, and ONGC is an Indian multinational oil and gas company. The wavelet transform converts a time series into constituent series that show more stable variance and no outliers, so they can be predicted more accurately. This is why we use the wavelet transform as a preprocessor in the procedure described in this paper. For this purpose we use daily closing price data from the gas and oil sector.

The paper is structured as follows: Sections 2 and 3 introduce wavelets and the ARIMA model, respectively, and describe their properties. Section 4 explains the forecasting procedure and presents the results. Finally, Section 5 gives the conclusions.

**2. WAVELETS**

The Fourier transform gives an alternative representation of the original time series that summarizes the information in the data as a function of frequency; it therefore does not preserve information in time. This transform works well for stationary time series. However, most financial time series are non-stationary and exhibit quite complicated patterns over time, such as trends and abrupt changes, which the Fourier transform cannot capture efficiently. The wavelet transform overcomes most of these limitations: it combines information from both the time domain and the frequency domain and is also very flexible. Moreover, wavelet analysis typically needs fewer significant coefficients than Fourier analysis to represent such signals.
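The last point can be illustrated with a toy example that is not taken from the paper's data. For a series with a single abrupt level shift, an orthonormal Haar wavelet transform concentrates the information in two non-zero coefficients, while the discrete Fourier transform spreads it over many frequencies. The Haar wavelet is used here only because it is the simplest case; the sketch assumes a series length that is a power of two, and the helper names are illustrative.

```python
import cmath
import math

def haar_dwt(x):
    """Full orthonormal Haar DWT of a signal whose length is a power of two.

    Returns the final approximation coefficient followed by the detail
    coefficients of every level, coarsest level first.
    """
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = range(0, len(approx), 2)
        d = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in pairs]
        approx = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in pairs]
        details = d + details  # prepend so that coarser levels come first
    return approx + details

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def count_nonzero(coeffs, tol=1e-9):
    """Number of coefficients whose magnitude exceeds tol."""
    return sum(1 for c in coeffs if abs(c) > tol)

# A series with one abrupt level shift (a "structural break") halfway through.
signal = [0.0] * 32 + [1.0] * 32

haar_nonzero = count_nonzero(haar_dwt(signal))
dft_nonzero = count_nonzero(dft(signal))
print(haar_nonzero, dft_nonzero)  # prints: 2 33
```

The Haar count is this small only because the break falls on a dyadic boundary; a break at an arbitrary position produces a few more non-zero details (one per level), which is still far fewer than the Fourier count.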

The wavelet transform uses a basis function $\psi \in L^2(\mathbb{R})$, called the wavelet or mother wavelet, which is stretched and shifted to capture features that are local in time and local in frequency. This function satisfies the following properties:

(i)

$$C_\psi = \int_0^\infty \frac{|\Psi(\omega)|^2}{\omega}\, d\omega < \infty,$$

where $\Psi$ is the Fourier transform of $\psi$. This condition, called the admissibility condition, ensures that $\Psi(\omega)$ goes to zero quickly as $\omega \to 0$. In fact, to guarantee that $C_\psi < \infty$ it is necessary that $\Psi(0) = 0$, which is equivalent to

$$\int_{-\infty}^{\infty} \psi(t)\, dt = 0. \tag{2.1}$$

(ii) The wavelet function must have unit energy, that is,

$$\int_{-\infty}^{\infty} |\psi(t)|^2\, dt = 1. \tag{2.2}$$

Equations (2.1) and (2.2) imply that at least some values of the wavelet function must be different from zero and that these departures from zero must cancel out. By considering many combinations of shifts and stretches of the mother wavelet, the wavelet transform is able to capture all the information in the time series and associate it with specific time horizons and locations in time.
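For the db2 wavelet used later in this paper, properties (2.1) and (2.2) have exact discrete counterparts in its filter coefficients: the high-pass (wavelet) filter sums to zero and has unit energy. The sketch below checks this numerically. It uses the closed-form db2 coefficients in one common ordering (software packages may store them reversed or with a different sign convention), and the variable names are illustrative.

```python
import math

# Closed-form Daubechies-2 (db2) scaling (low-pass) filter, one common ordering.
s3 = math.sqrt(3.0)
h = [(1 + s3), (3 + s3), (3 - s3), (1 - s3)]
h = [c / (4 * math.sqrt(2.0)) for c in h]

# Wavelet (high-pass) filter via the quadrature-mirror relation g[n] = (-1)^n h[L-1-n].
L = len(h)
g = [(-1) ** n * h[L - 1 - n] for n in range(L)]

wavelet_sum = sum(g)                      # discrete analogue of (2.1): zero
wavelet_energy = sum(c * c for c in g)    # discrete analogue of (2.2): one
scaling_sum = sum(h)                      # sqrt(2) under the orthonormal normalization
shift2_overlap = h[0] * h[2] + h[1] * h[3]  # orthogonality to shifts by two: zero

print(wavelet_sum, wavelet_energy, scaling_sum, shift2_overlap)
```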

Wavelets can be defined either by a pair of filter sequences or by functions, constructed through splines, that satisfy certain properties. Here we define the 'father' and 'mother' wavelets precisely. Father wavelets generate the scaling coefficients and represent the smooth, very long-scale component of the signal, whereas mother wavelets generate the differencing coefficients and represent deviations from the smooth component. The father wavelet acts as a low-pass filter and the mother wavelet as a high-pass filter. Applying both the father and mother wavelets separates the low-frequency components of a time series from its high-frequency components. For any suitable choice of function $f(\cdot) \in L^2(\mathbb{R})$, we define the corresponding father and mother wavelets:

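$$\phi_{J,k}(t) = 2^{-J/2}\,\phi\left(2^{-J}t - k\right), \quad J \in \mathbb{Z},\ k \in \mathbb{Z}, \qquad \int \phi(t)\, dt = 1,$$

and

$$\psi_{j,k}(t) = 2^{-j/2}\,\psi\left(2^{-j}t - k\right), \quad j = 1, 2, \ldots, J, \qquad \int \psi(t)\, dt = 0.$$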


Here $\phi_{J,k}$ is the father wavelet and $\psi_{j,k}$ is the mother wavelet. Given this family of basis functions, we can define a sequence of coefficients

$$s_{J,k} = \int f(t)\,\phi_{J,k}(t)\, dt \quad\text{and}\quad d_{j,k} = \int f(t)\,\psi_{j,k}(t)\, dt, \qquad j = 1, 2, \ldots, J,$$

where the $s_{J,k}$ are the coefficients for the father wavelet, known as "smooth coefficients", and the $d_{j,k}$ are "detail coefficients" obtained from the mother wavelets.

From these coefficients, the function $f(\cdot)$ can be represented by

$$f(t) = \sum_{k\in\mathbb{Z}} s_{J,k}\,\phi_{J,k}(t) + \sum_{k\in\mathbb{Z}} d_{J,k}\,\psi_{J,k}(t) + \cdots + \sum_{k\in\mathbb{Z}} d_{1,k}\,\psi_{1,k}(t),$$

or, equivalently,

$$f(t) = S_J + D_J + D_{J-1} + \cdots + D_1,$$

where

$$S_J = \sum_{k\in\mathbb{Z}} s_{J,k}\,\phi_{J,k}(t) \quad\text{and}\quad D_j = \sum_{k\in\mathbb{Z}} d_{j,k}\,\psi_{j,k}(t), \qquad j = 1, 2, \ldots, J.$$

Because the discrete wavelet transform (DWT) represents a time series in terms of coefficients associated with particular scales, it is an effective tool for time series analysis. Applying the DWT to a signal $f(t)$ decomposes it into different scales of resolution, and the inverse wavelet transform reconstructs the signal from its wavelet coefficients. Several software packages implement the wavelet transform.

We use the Wavelet Toolbox in MATLAB. Applying the DWT to a signal $s$ yields its wavelet coefficients, from which two parts of the signal are obtained: the approximation, which is the smooth (low-frequency) part of the signal, and the detail, which is the high-frequency part. The approximation retains the overall structure of the signal. A specific wavelet must be chosen for the decomposition.

There is a variety of wavelets, such as Daubechies, Symlet, Meyer and Morlet, and the choice of mother wavelet depends on the characteristics of the data. Daubechies wavelets have been increasingly used by signal processing researchers. In this paper we use the db2 wavelet with decomposition level three, which gives the smallest error among all the wavelets tested on our data series (db2, db3, db4, db5 and db7), as shown in Table 1.
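A three-level db2 decomposition into one approximation and three details, together with the round trip back to the signal, can be sketched in pure Python. This is a minimal sketch under our own assumptions, not the MATLAB routine used in the paper: it uses periodic (circular) boundary handling, requires the series length to be divisible by 2^3, and runs on a synthetic random walk in place of the BSE prices. All helper names are hypothetical, and because MATLAB's toolbox handles the boundaries differently, values near the ends of the series would not match it exactly.

```python
import math
import random

# db2 scaling (low-pass) filter and its quadrature-mirror wavelet (high-pass) filter.
_S3, _R2 = math.sqrt(3.0), math.sqrt(2.0)
H = [(1 + _S3) / (4 * _R2), (3 + _S3) / (4 * _R2),
     (3 - _S3) / (4 * _R2), (1 - _S3) / (4 * _R2)]
G = [(-1) ** n * H[len(H) - 1 - n] for n in range(len(H))]

def analysis_step(x):
    """One level of the periodized orthonormal DWT: returns (approximation, detail)."""
    n = len(x)
    a = [sum(H[m] * x[(2 * k + m) % n] for m in range(len(H))) for k in range(n // 2)]
    d = [sum(G[m] * x[(2 * k + m) % n] for m in range(len(G))) for k in range(n // 2)]
    return a, d

def synthesis_step(a, d):
    """Inverse of analysis_step (the transpose of an orthogonal matrix)."""
    n = 2 * len(a)
    x = [0.0] * n
    for k in range(len(a)):
        for m in range(len(H)):
            x[(2 * k + m) % n] += H[m] * a[k] + G[m] * d[k]
    return x

def dwt_multilevel(x, levels):
    """Returns the coefficient bands [a_J, d_J, ..., d_1] for J = levels."""
    a, details = list(x), []
    for _ in range(levels):
        a, d = analysis_step(a)
        details.insert(0, d)
    return [a] + details

def idwt_multilevel(coeffs):
    """Rebuilds the signal from [a_J, d_J, ..., d_1]."""
    a = coeffs[0]
    for d in coeffs[1:]:
        a = synthesis_step(a, d)
    return a

def additive_components(x, levels):
    """[A_J, D_J, ..., D_1]: inverse DWT of one coefficient band at a time."""
    coeffs = dwt_multilevel(x, levels)
    parts = []
    for i in range(len(coeffs)):
        kept = [c if j == i else [0.0] * len(c) for j, c in enumerate(coeffs)]
        parts.append(idwt_multilevel(kept))
    return parts

# Synthetic random-walk "prices" (64 points, divisible by 2**3); not the BSE data.
rng = random.Random(0)
prices = [300.0]
for _ in range(63):
    prices.append(prices[-1] + rng.gauss(0.0, 2.0))

A3, D3, D2, D1 = additive_components(prices, levels=3)
reconstructed = [a + b + c + d for a, b, c, d in zip(A3, D3, D2, D1)]
mae = sum(abs(p - r) for p, r in zip(prices, reconstructed)) / len(prices)
print(f"reconstruction MAE: {mae:.2e}")  # floating-point round-off only
```

Each component has the same length as the input, and by linearity of the inverse transform the four components add back to the original series up to floating-point round-off.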

**Table 1: Mean Absolute Error of price series with various wavelets**

| Wavelet | GAIL prices | ONGC prices |
|---|---|---|
| db2 | 4.2803e-11 | 3.6550e-11 |
| db3 | 4.5054e-10 | 4.4776e-10 |
| db4 | 8.3617e-11 | 8.4299e-11 |
| db5 | 1.3250e-10 | 1.2977e-10 |
| db7 | 9.1177e-11 | 8.6004e-11 |

**Figure 1: Daubechies wavelet of order 2 (db2)**

**3. ARIMA**

The ARIMA (autoregressive integrated moving average) model, also known as the Box-Jenkins model^{3}, is widely used in time series forecasting because of its flexibility in representing different kinds of time series: pure autoregressive (AR), pure moving average (MA) and combined AR and MA (ARMA) series. The ARIMA model is usually denoted ARIMA(p, d, q). Here p is the number of autoregressive terms, which specifies how many previous values of the series are used to predict the current value. The order of differencing, d, is the number of times the series is differenced before the model is estimated. Series with trends are non-stationary, whereas ARIMA modelling assumes stationarity, so differencing is needed when trends are present in order to remove their effect. The number of moving average terms, q, specifies how many previous random error terms are used to predict the current value. The ARIMA model assumes that the future values of a time series have a functional relationship with past and current observations and white noise. The underlying process that generates the (differenced) time series therefore has the form

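$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}, \tag{3.1}$$

that is, the actual value $y_t$ depends on its $p$ previous values and on $q$ previous random error terms $\varepsilon_{t-j}$. The random disturbance $\varepsilon_t$ is assumed to be white noise, i.e. independently and identically distributed with mean 0 and a common variance $\sigma^2$ across all observations.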
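Differencing and equation (3.1) can both be sketched in a few lines. In this sketch, which uses hypothetical helper names and made-up numbers, a quadratic trend (non-stationary) becomes constant after d = 2 differences, undoing the differences recovers the original series (this is how forecasts of the differenced series are mapped back to the price scale), and the right-hand side of (3.1) without the new shock gives the one-step-ahead forecast.

```python
def difference(series, d=1):
    """Apply d rounds of first differencing, i.e. (1 - B)^d y_t with B the backshift operator."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out

def undifference(diffs, initial_values):
    """Invert d rounds of differencing ("integration").

    initial_values[i] is the first value of the series differenced i times
    (i = 0, ..., d-1); this is exactly the information differencing discards.
    """
    out = list(diffs)
    for start in reversed(initial_values):
        restored = [start]
        for step in out:
            restored.append(restored[-1] + step)
        out = restored
    return out

def arma_conditional_mean(y_past, eps_past, theta0, phi, theta):
    """Right-hand side of (3.1) without the new shock eps_t.

    y_past[-1] is y_{t-1}, eps_past[-1] is eps_{t-1}, and so on. Because eps_t
    has mean zero, this is also the one-step-ahead forecast of y_t.
    """
    ar = sum(phi[i] * y_past[-1 - i] for i in range(len(phi)))
    ma = sum(theta[j] * eps_past[-1 - j] for j in range(len(theta)))
    return theta0 + ar - ma

# A quadratic trend is non-stationary, but two differences make it constant.
trend = [2.0 + 3.0 * t + 0.5 * t * t for t in range(8)]
second_diff = difference(trend, d=2)  # six copies of 1.0
restored = undifference(second_diff, [trend[0], difference(trend)[0]])

# One-step forecast from an ARMA(2,1) with made-up parameters and past values.
forecast = arma_conditional_mean([1.0, 2.0], [0.1, -0.2], 0.5, [0.6, 0.2], [0.3])
print(second_diff, restored == trend, forecast)
```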

In this model, $\phi_i$ ($i = 1, 2, \ldots, p$) and $\theta_j$ ($j = 1, 2, \ldots, q$) are the autoregressive and moving average parameters, respectively. If $q = 0$, (3.1) becomes an AR model of order $p$; when $p = 0$, it reduces to an MA model of order $q$. The main task in building an ARIMA model is to determine the appropriate model order $(p, d, q)$. A suitable ARIMA model for forecasting is built in three iterative steps, explained below:

**(a) Model identification:** At the first stage, the data are plotted to observe trends and check stationarity. If the series is non-stationary, the degree of differencing d is chosen by examining the plot of the data and the plot of the sample autocorrelation function (SACF). The sample autocorrelation and partial autocorrelation functions are the basic tools for identifying the stationarity of a time series. The next step is to determine the AR order p and the MA order q of the differenced series.



**(b) Parameter estimation:** Once the model order has been identified, the next step is to estimate the parameters. Good estimators can be computed by assuming the data to be stationary and maximizing the likelihood with respect to the parameters.

The parameters must satisfy two conditions: stationarity and invertibility. For an ARMA(1,1) model, for example, these are $|\phi_1| < 1$ and $|\theta_1| < 1$, respectively; in general, the roots of the AR and MA polynomials must lie outside the unit circle.

**(c) Diagnostic checking:** After the parameters are estimated, it is necessary to check whether the model assumptions are satisfied. The estimated model is appropriate if the residuals are uncorrelated random variables with zero mean and constant variance.

The Ljung-Box Q statistic is used to check the overall adequacy of the model. The test statistic is

$$Q = n(n+2)\sum_{k=1}^{K}\frac{r_k^2(e)}{n-k},$$

where $r_k(e)$ is the residual autocorrelation at lag $k$, $n$ is the number of residuals and $K$ is the number of time lags.

If the assumptions on the residuals are supported by the test and the plots, the fitted model can be used to forecast the price.
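The three Box-Jenkins steps can be illustrated on a synthetic AR(1) series. In this sketch (our own, with hypothetical helper names) the sample ACF is used for identification; the AR(1) parameters are estimated by conditional least squares, which for Gaussian errors coincides with maximizing the conditional likelihood rather than the exact likelihood; the stationarity condition $|\phi_1| < 1$ is checked; and the Ljung-Box Q defined above is computed on the residuals. In practice Q is compared with a χ² distribution with K − p − q degrees of freedom; for K = 10 and one AR parameter, the 5% critical value is about 16.92.

```python
import random

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series (r_0 = 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def fit_ar1(x):
    """Conditional least squares for y_t = c + phi * y_{t-1} + e_t."""
    lagged, current = x[:-1], x[1:]
    mx = sum(lagged) / len(lagged)
    my = sum(current) / len(current)
    sxx = sum((a - mx) ** 2 for a in lagged)
    phi = sum((a - mx) * (b - my) for a, b in zip(lagged, current)) / sxx
    c = my - phi * mx
    residuals = [b - (c + phi * a) for a, b in zip(lagged, current)]
    return c, phi, residuals

def ljung_box(residuals, max_lag):
    """Q = n(n + 2) * sum_{k=1}^{K} r_k^2 / (n - k), with K = max_lag."""
    n = len(residuals)
    r = sample_acf(residuals, max_lag)
    return n * (n + 2) * sum(r[k] ** 2 / (n - k) for k in range(1, max_lag + 1))

# Synthetic AR(1) series with phi = 0.6 (illustrative, not the price data).
rng = random.Random(1)
x = [0.0]
for _ in range(1999):
    x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))

r = sample_acf(x, 10)          # (a) identification: r_k decays roughly like 0.6**k
c, phi, resid = fit_ar1(x)     # (b) estimation
is_stationary = abs(phi) < 1   # (b) stationarity condition for an AR(1)
q_stat = ljung_box(resid, 10)  # (c) diagnostic check on the residuals
print(f"phi = {phi:.3f}, stationary = {is_stationary}, Q(10) = {q_stat:.2f}")
```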

**4. FORECASTING FRAMEWORK AND RESULTS**

Since the wavelet-based forecasting procedure works best for large samples (around 100 observations or more), we use a long series of GAIL and ONGC daily closing prices. The data were collected from the BSE website for the period 1 January 2012 to 31 January 2014.

Both series are divided into two sets: a training set and a validation set. The model is first fitted on the training set, and the fitted model is then used to forecast over the validation period. The training set covers 1 January 2012 to 31 December 2013, and we predict the values for the following month (1 January 2014 to 31 January 2014). The proposed technique consists of the following steps:

**1.** At the first stage, we decompose the original time series with a wavelet using MATLAB. As the discrete wavelet transform is very effective for time series, we use it for the decomposition. This transform splits the data into coarse and fine parts: the coarse scales exhibit the trend, while the fine scales show seasonal influences and noise. For our procedure we use the Daubechies wavelet of order 2 (shown in Figure 1) with decomposition level 3, so that

$$f(t) = A_3 + D_3 + D_2 + D_1.$$

**Figure 2: x-axis and y-axis represent the number of data points and the closing values, respectively**

**Figure 3: x-axis and y-axis represent the number of data points and the closing values, respectively**

**2.** After the decomposition, an appropriate ARIMA model is fitted to each component. All the iterative steps explained in Section 3 are performed to build a specific model for each decomposed part. Each model is then used to forecast the next month's values of its component.

*[Figure 2 panels: original GAIL signal, reconstructed signal, A3, D3, D2, D1]*

*[Figure 3 panels: original ONGC signal, reconstructed signal, A3, D3, D2, D1]*


**3.** Using these components, extended by their forecasts at the different scales, we reconstruct the signal with the following equation, where $\hat f(t)$ is the estimated price for the next month:

$$\hat f(t) = \hat A_3 + \hat D_3 + \hat D_2 + \hat D_1. \tag{4.1}$$
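Steps 2 and 3 amount to "forecast each component, then add the forecasts". The sketch below assumes the components have already been computed (for example, by a decomposition like the one in Section 2) and uses a least-squares AR(1) model as a hypothetical stand-in for the ARIMA model fitted to each component; any forecaster with the same signature could be substituted. The toy components are invented so that the result can be checked by hand.

```python
def ar1_forecast(series, horizon):
    """Fit y_t = c + phi * y_{t-1} by least squares and iterate it forward.

    A simple stand-in for the per-component ARIMA models of step 2.
    """
    lagged, current = series[:-1], series[1:]
    mx = sum(lagged) / len(lagged)
    my = sum(current) / len(current)
    sxx = sum((a - mx) ** 2 for a in lagged)
    if sxx == 0.0:  # constant component: carry its last value forward
        return [series[-1]] * horizon
    phi = sum((a - mx) * (b - my) for a, b in zip(lagged, current)) / sxx
    c = my - phi * mx
    forecasts, last = [], series[-1]
    for _ in range(horizon):
        last = c + phi * last
        forecasts.append(last)
    return forecasts

def wavelet_arima_forecast(components, horizon, forecaster=ar1_forecast):
    """Equation (4.1): forecast each component separately, then add the forecasts."""
    per_component = [forecaster(comp, horizon) for comp in components]
    return [sum(values) for values in zip(*per_component)]

# Invented components: a decaying part, an oscillating part and a constant level.
components = [
    [100.0 * 0.5 ** t for t in range(10)],
    [(-1) ** t * 3.0 for t in range(10)],
    [2.0] * 10,
]
combined = wavelet_arima_forecast(components, horizon=2)
print(combined)  # approximately [5.09765625, -0.951171875]
```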

The prediction performance is evaluated with the most commonly used statistical error measures: mean absolute error (MAE), root mean square error (RMSE) and mean absolute percentage error (MAPE). They are defined as

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|A_t - F_t\right|,$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(A_t - F_t\right)^2},$$

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{\left|A_t - F_t\right|}{A_t},$$

where $A_t$ and $F_t$ are the actual and forecasted prices, respectively, and $n$ is the total number of observations. The results of the study are given in Tables 2 and 3. Table 2 shows that for the GAIL series the MAE of the wavelet-ARIMA model is 3.8595, much lower than 6.8936, the MAE of the ARIMA model alone. Similarly, the RMSE and MAPE of the wavelet-ARIMA model are lower than those of the ARIMA model. The same result holds for the ONGC series (Table 3). Thus, for the given data sets, the proposed wavelet-ARIMA procedure is superior to the direct use of the ARIMA model: ARIMA gives better results with the wavelet transform as a preprocessor than when used on its own.
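The three error measures translate directly into code. The numbers below are made up for illustration and are unrelated to Tables 2 and 3. MAPE is returned as a fraction, following the formula above; whether a factor of 100 is applied when reporting it is a convention left to the caller.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (multiply by 100 for percent)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

actual = [100.0, 102.0, 98.0, 101.0]    # made-up prices
forecast = [99.0, 104.0, 98.0, 100.0]   # made-up forecasts
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```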

**Table 2: GAIL Data**

| Error measure | Wavelet-ARIMA | ARIMA |
|---|---|---|
| MAE | 3.8595 | 6.8963 |
| RMSE | 4.9310 | 7.9355 |
| MAPE | 0.256 | 0.4552 |

**Table 3: ONGC Data**