382 www.hrmars.com
Modeling an Average Monthly Temperature of Sokoto Metropolis Using Short Term Memory Models
Yakubu Musa
Department of Mathematics (Statistics Unit), Usmanu Danfodiyo University, Sokoto, Nigeria Corresponding author: [email protected]
DOI: 10.6007/IJARBSS/v4-i7/1030 URL: http://dx.doi.org/10.6007/IJARBSS/v4-i7/1030
ABSTRACT
In this paper, the monthly average temperature of Sokoto is modeled using the seasonal autoregressive integrated moving average (SARIMA) approach.
Based on this analysis, we conclude that the best seasonal models among those adequate to describe the seasonal dynamics of Sokoto city temperature are SARIMA (3,0,1)(4,1,0)12, SARIMA (1,0,0)(0,1,1)12 and SARIMA (4,0,2)(5,1,1)12. These are the only models that passed all the diagnostic tests, and thus they can be used for forecasting.
KEY WORDS: Seasonality, SARIMA, Identification, Estimation, Diagnostic tests.
JEL: C2, C22
1. INTRODUCTION
Sokoto is a city located in the extreme northwest of Nigeria, at latitude 13°02′ N and longitude 05°15′ E. Sokoto State lies in the dry Sahel, surrounded by sandy savannah and isolated hills. The area as a whole is very hot. The rainy season lasts from June to October, during which showers are a daily occurrence. From late October to February, during the cold season, the climate is dominated by the Harmattan wind, which blows Sahara dust over the land. The dust dims the sunlight, thereby lowering temperatures significantly, and also causes the inconvenience of dust everywhere in houses.
The purpose of univariate time series modeling is to provide a simple description of the basic features of a single time series in terms of its own past and errors of the process in the past.
The emphasis is purely on making use of the past information in the time series for forecasting its future. In addition to producing forecasts, time series models also produce the distribution of future values conditional upon the past, and can thus be used to evaluate the likelihood of certain events.
Seasonality is the systematic, although not necessarily regular, intra-year movement caused by changes of the weather, the calendar, and the timing of decisions, directly or indirectly through the production and consumption decisions made by the agents of the economy (Hylleberg, 1990).
These decisions are influenced by endowments, the expectations and preferences of the agents, and the production techniques available in the economy. Such seasonal patterns can be observed in many macroeconomic time series such as gross domestic product, unemployment, industrial production or construction, as well as in weather data. Temperature is indispensable for sustaining life. Even a brief rise or fall in temperature can have a serious effect on people and their economic activities. However, the term seasonality is also used in a broader sense to characterize time series that show specific patterns that regularly recur within fixed time intervals (e.g. a year, a month or a week).
1.2 SHORT TERM MEMORY MODELS AND SOME BASIC PROPERTIES
The following models are especially suited to short-memory processes, that is, processes for which there is little or no shock persistence.
AR(p) Processes (Autoregressive Process of Order p)
The general expression for an AR(p) process is
$$y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t \tag{1.01}$$
where $\varepsilon_t$ is white noise with variance $\sigma^2 = E(\varepsilon_t^2)$.
Using the lag operator notation, this process can be written as
$$(1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p)\, y_t = \varepsilon_t \tag{1.02}$$
More compactly,
$$\phi(L)\, y_t = \varepsilon_t \tag{1.03}$$
where $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p$.
The pth order polynomial $\phi(L)$ can be factorized as
$$\phi(L) = (1 - \lambda_1 L)(1 - \lambda_2 L) \cdots (1 - \lambda_p L) \tag{1.04}$$
For the AR(p) process to be stationary, the coefficients in this factorization must satisfy $|\lambda_i| < 1$; that is, all the roots of the polynomial must lie outside the unit circle, meaning that the values $\lambda_i^{-1}$ must be greater than one in absolute value. This is the property of covariance stationarity of an AR(p) process.
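The root condition can be checked numerically. The following is a minimal sketch, assuming the sign convention $\phi(z) = 1 - \phi_1 z - \phi_2 z^2$ and restricted to AR(1) and AR(2) so that the roots have a closed form; the function names `ar_roots` and `is_stationary` are illustrative only.

```python
import cmath

def ar_roots(phi):
    """Roots z of phi(z) = 1 - phi_1 z - phi_2 z^2 (AR(1) or AR(2) only)."""
    phi = list(phi)
    while phi and phi[-1] == 0:      # drop trailing zero coefficients
        phi.pop()
    if len(phi) == 0:
        return []                    # white noise: no roots to check
    if len(phi) == 1:
        return [complex(1.0 / phi[0])]
    if len(phi) == 2:
        # Rewrite as a z^2 + b z + c = 0 and apply the quadratic formula.
        a, b, c = -phi[1], -phi[0], 1.0
        disc = cmath.sqrt(b * b - 4 * a * c)
        return [(-b + disc) / (2 * a), (-b - disc) / (2 * a)]
    raise ValueError("this sketch handles p <= 2 only")

def is_stationary(phi):
    """True when every root lies strictly outside the unit circle."""
    return all(abs(z) > 1.0 for z in ar_roots(phi))
```

For example, `is_stationary([0.5, 0.3])` holds, while `is_stationary([0.5, 0.6])` fails because one root falls inside the unit circle. Higher orders would need a general polynomial root finder.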
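MA(q) Process
The representation
$$X_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} \tag{1.05}$$
is a moving average representation of order q of the time series $X_t$, where $\varepsilon_t$ is white noise with variance $\sigma^2 = E(\varepsilon_t^2)$, that is, $\varepsilon_t \sim \mathrm{IND}(0, \sigma^2)$.
The lag polynomial of the MA(q) process (model) is:
$$\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \dots + \theta_q L^q \tag{1.06}$$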
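The ARMA Process
The representation of a time series $Y_t$ as an ARMA(p, q) process is:
$$Y_t = c + \phi_1 Y_{t-1} + \phi_2 Y_{t-2} + \dots + \phi_p Y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q} \tag{1.07}$$
where $\varepsilon_t \sim \mathrm{IND}(0, \sigma^2)$.
In lag notation, (1.07) can be written as:
$$(1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p)\, Y_t = c + (1 + \theta_1 L + \theta_2 L^2 + \dots + \theta_q L^q)\, \varepsilon_t \tag{1.08}$$
or
$$\phi(L)\, Y_t = c + \theta(L)\, \varepsilon_t \tag{1.09}$$
where
$$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p$$
$$\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \dots + \theta_q L^q$$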
Remarks:
1) $\phi(L) Y_t$ is called the AR component of the process $Y_t$, and $\theta(L) \varepsilon_t$ is the MA component of the process $Y_t$.
2) When $\phi(L) = 1$, that is $p = 0$, $Y_t \sim \mathrm{MA}(q)$.
3) When $\theta(L) = 1$, that is $q = 0$, $Y_t \sim \mathrm{AR}(p)$.
4) $\phi(L)$ is called the AR characteristic polynomial of the AR component of $Y_t$.
5) Similarly, $\theta(L)$ is called the MA characteristic polynomial of the MA component of $Y_t$.
6) p is the order of the AR component and q is the order of the MA component; (p, q) is the order of the ARMA process. In this case we write $Y_t \sim \mathrm{ARMA}(p, q)$ to denote that the process $Y_t$ is an ARMA process whose AR order is p and whose MA order is q.
An ARMA process is said to be invertible if it has an AR representation, and covariance stationary if it has an MA representation; that is, every AR process is invertible and every MA process is stationary.
For ARMA processes we normally impose the condition that they are covariance stationary. But most time series data are not stationary. This means that unless we can incorporate the concept of non-stationarity, we cannot achieve our objective of modeling such systems. This leads us to the ARIMA process.
Invertibility and Covariance Stationarity of an ARMA Process
Definition (invertible ARMA process): an ARMA process is said to be invertible if it has an AR representation.
Definition (covariance stationary ARMA process): an ARMA process is said to be covariance stationary if it has an MA representation. We note that every AR process is invertible and every MA process is covariance stationary. Suppose $Y_t \sim \mathrm{ARMA}(p, q)$ where $p > 0$ and $q > 0$. Let $Y_t$ have the representation:
$$\phi(L)\, Y_t = c + \theta(L)\, \varepsilon_t \tag{1.10}$$
Suppose $\phi^{-1}(L)$ exists. Multiplying (1.10) by $\phi^{-1}(L)$, we have
$$\phi^{-1}(L)\phi(L)\, Y_t = \phi^{-1}(L)\, c + \phi^{-1}(L)\theta(L)\, \varepsilon_t \tag{1.11}$$
or
$$Y_t = c^{*} + \psi(L)\, \varepsilon_t, \qquad \psi(L) = \phi^{-1}(L)\theta(L),$$
which is an MA representation of $Y_t$. Therefore $Y_t$ is covariance stationary if $\phi^{-1}(L)$ exists. However, this inverse exists if and only if the roots of the polynomial $\phi_p(x)$ lie outside the unit circle.
For invertibility, we multiply (1.10) by $\theta^{-1}(L)$. We have
$$\theta^{-1}(L)\phi(L)\, Y_t = \theta^{-1}(L)\, c + \theta^{-1}(L)\theta(L)\, \varepsilon_t \tag{1.12}$$
or
$$\pi(L)\, Y_t = c^{*} + \varepsilon_t, \qquad \pi(L) = \theta^{-1}(L)\phi(L).$$
However, for $\theta^{-1}(L)$ to exist, all the roots of $\theta(L)$ should lie outside the unit circle.
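The MA representation of a stationary ARMA process can be made concrete by computing the first few coefficients of $\psi(L) = \phi^{-1}(L)\theta(L)$. The sketch below assumes the sign convention $\phi(L) = 1 - \phi_1 L - \dots$ and $\theta(L) = 1 + \theta_1 L + \dots$; the function name `psi_weights` is hypothetical.

```python
def psi_weights(phi, theta, n):
    """First n+1 coefficients psi_0..psi_n of psi(L) = theta(L) / phi(L).

    Matching powers of L in phi(L) * psi(L) = theta(L) gives the recursion
    psi_j = theta_j + sum_{i=1..min(j,p)} phi_i * psi_{j-i}, with psi_0 = 1.
    """
    psi = [1.0]
    for j in range(1, n + 1):
        value = theta[j - 1] if j <= len(theta) else 0.0
        for i in range(1, min(j, len(phi)) + 1):
            value += phi[i - 1] * psi[j - i]
        psi.append(value)
    return psi

# ARMA(1,1) with phi_1 = 0.5 and theta_1 = 0.4
weights = psi_weights([0.5], [0.4], 3)
```

When the AR roots lie outside the unit circle, these weights decay towards zero, which is what allows the infinite MA representation to be well defined.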
ARIMA Process
Most time series encountered in applications are not stationary, yet to model a time series we normally impose the condition that it is covariance stationary. This means that unless we can incorporate the concept of non-stationarity, we cannot model such systems. This leads us to ARIMA processes. ARIMA denotes Autoregressive Integrated Moving Average; that is, an ARIMA process is an integrated ARMA process. This is a class of non-stationary processes that become stationary after a finite number of differencing operations. For such a process, let d denote the number of times the process needs to be differenced to become stationary; d is called the order of integration of the process. When the resulting stationary process can be modeled as ARMA(p, q), we call the original process ARIMA(p, d, q): a non-stationary process that must be differenced d times to become stationary and, once stationary, can be modeled as an ARMA(p, q) process.
Representation:
An ARIMA(p, d, q) process $Y_t$, denoted by $Y_t \sim \mathrm{ARIMA}(p, d, q)$, has the representation:
$$(1 - L)^d\, \phi_p(L)\, Y_t = c + \theta_q(L)\, \varepsilon_t \tag{1.13}$$
These polynomials are the same as those defined for the ARMA process. Because of the factor $(1 - L)^d$ in (1.13), we call it an ARIMA process of difference order d; such processes are also known as unit root processes.
In summary, when d is the finite number of differences needed for the time series to become stationary, d is called the order of integration of the process, which is denoted ARIMA(p, d, q).
There are situations where an ARIMA process will not be enough, for example for the monthly average temperature of Sokoto, whose data exhibit seasonality (table: ). In such situations a SARIMA$(p, d, q)(P, D, Q)_s$ model may be appropriate for the series.
2. SARIMA MODELLING
The multiplicative seasonal autoregressive integrated moving average model is denoted by SARIMA$(p, d, q)(P, D, Q)_s$ (Box and Jenkins 1976), where p denotes the number of autoregressive terms, q denotes the number of moving average terms and d denotes the number of times a series must be differenced to induce stationarity. P denotes the number of seasonal autoregressive components, Q denotes the number of seasonal moving average terms and D denotes the number of seasonal differences required to induce stationarity. The seasonal autoregressive integrated moving average model has the following representation:
$$(1 - L)^d (1 - L^s)^D\, \Phi(L^s)\, \phi(L)\, X_t = a + \Theta(L^s)\, \theta(L)\, e_t \tag{2.01}$$
where:
a is a constant,
$\{e_t\}$ is a sequence of uncorrelated normally distributed random variables with the same mean ($\mu$) and the same variance ($\sigma^2$),
L is the lag operator defined by $L^k X_t = X_{t-k}$,
$\theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \dots + \theta_q L^q$,
$\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \dots - \phi_p L^p$,
$\Phi(L^s) = 1 - \Phi_{1s} L^{1s} - \Phi_{2s} L^{2s} - \dots - \Phi_{Ps} L^{Ps}$,
$\Theta(L^s) = 1 + \Theta_{1s} L^{1s} + \Theta_{2s} L^{2s} + \dots + \Theta_{Qs} L^{Qs}$.
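To see what "multiplicative" means in practice, the lag polynomials on one side of (2.01) can be multiplied out. The sketch below is an illustration under the sign convention assumed above (minus signs in AR-type polynomials); `poly_mul` and `seasonal_poly` are hypothetical helper names, and the value 0.24 is the AR(1) estimate reported for SARIMA (1,0,0)(0,1,1)12 in Table 3.02.

```python
def poly_mul(a, b):
    """Multiply two lag polynomials given as coefficient lists (index = power of L)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out

def seasonal_poly(coefs, s, sign):
    """Coefficient list of 1 + sign*c_1 L^s + sign*c_2 L^(2s) + ..."""
    poly = [1.0] + [0.0] * (s * len(coefs))
    for k, c in enumerate(coefs, start=1):
        poly[k * s] = sign * c
    return poly

# Left-hand side of SARIMA(1,0,0)(0,1,1)_12: (1 - 0.24 L)(1 - L^12)
lhs = poly_mul([1.0, -0.24], seasonal_poly([1.0], 12, -1))
nonzero = {power: c for power, c in enumerate(lhs) if c != 0.0}
```

Here `nonzero` holds the powers 0, 1, 12 and 13, so the model relates $X_t$ to $X_{t-1}$, $X_{t-12}$ and $X_{t-13}$ even though only one AR parameter is estimated.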
The selection of the appropriate seasonal ARIMA model for the data is achieved by an iterative procedure based on three steps (Box et al, 1994): model identification, model estimation and diagnostic testing.
2.1 Model Identification
The model identification stage enables us to select a subclass of the family of SARIMA models appropriate to represent a time series. This involves stationary transformation, regular differencing, seasonal differencing and the unit root and stationarity tests (ADF, KPSS and HEGY).
Stationary transformations: Our task is to identify whether the time series could have been generated by a stationary process. First, we use the time plot of the series to analyze whether it is variance stationary. The series departs from this property when the dispersion of the data varies over time. In this case, stationarity in variance is achieved by applying the appropriate Box-Cox transformation
$$X_t^{(\lambda)} = \begin{cases} \dfrac{X_t^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \ln(X_t), & \lambda = 0 \end{cases}$$
and as a result, we get the series $X_t^{(\lambda)}$.
In some cases, especially when variability increases with level, such series can be transformed to stabilize the variance before being modeled with the univariate Box-Jenkins SARIMA method. A common transformation involves taking the natural logarithms of the original series.
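The Box-Cox transformation is straightforward to compute. A minimal sketch follows, assuming strictly positive observations (the function name `box_cox` is illustrative):

```python
import math

def box_cox(x, lam):
    """Box-Cox transform: (x^lam - 1)/lam for lam != 0, ln(x) for lam == 0."""
    if any(v <= 0 for v in x):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]
```

With `lam = 1` the series is only shifted by one unit, and with `lam = 0` the transform reduces to the natural logarithm mentioned above.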
The second part is the analysis of the stationarity in mean. The instruments are the time plot, the sample correlograms (ACF and PACF) and the tests for unit roots and stationarity. The path of a nonstationary series usually shows an upward or downward slope or jumps in the level whereas a stationary series moves around a unique level along time. The sample autocorrelations of stationary processes are consistent estimates of the corresponding population coefficients, so the sample correlograms of stationary processes go to zero for moderate lags.
When the series shows nonstationary patterns, we should take first differences and analyze whether $\Delta X_t^{(\lambda)}$ is stationary in a similar way. This process of taking successive differences continues until a stationary time series is achieved.
Regular differencing: To difference a data series, we define a new variable ($W_t$) which is the change in $Z_t$ from one time period to the next; that is,
$$W_t = (1 - L) Z_t = Z_t - Z_{t-1}, \qquad t = 1, 2, \dots, n \tag{2.02}$$
This "working series" $W_t$ is called the first difference of $Z_t$. If the first differences do not have a constant mean, we might try a new $W_t$, which will be the second differences of $Z_t$, that is:
$$W_t = (Z_t - Z_{t-1}) - (Z_{t-1} - Z_{t-2}) = Z_t - 2 Z_{t-1} + Z_{t-2}.$$
Using the lag operator as shorthand, $(1 - L)$ is the differencing operator, since $(1 - L) Z_t = Z_t - Z_{t-1}$. Then, in general,
$$W_t = (1 - L)^d Z_t$$
is a d-th order regular difference. That is, d denotes the number of nonseasonal differences.
Seasonal differencing: For seasonal models, seasonal differencing is often useful. For example,
$$W_t = (1 - L^{12}) Z_t = Z_t - Z_{t-12} \tag{2.03}$$
is a first-order seasonal difference with period 12, as would be used for monthly data with 12 observations per year. Rewriting (2.03) and using successive resubstitution (i.e., using $W_{t-12} = Z_{t-12} - Z_{t-24}$) gives
$$Z_t = Z_{t-12} + W_t = Z_{t-24} + W_{t-12} + W_t = Z_{t-36} + W_{t-24} + W_{t-12} + W_t$$
and so on. This is a kind of "seasonal integration". In general, $W_t = (1 - L^s)^D Z_t$ is a D-th order seasonal difference with period s, where D denotes the number of seasonal differences.
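Regular and seasonal differencing differ only in the lag used, so both can be sketched with one helper. The code below is illustrative (the names `difference` and `seasonal_integrate` are not from the paper); the second function reverses one seasonal difference, mirroring the resubstitution argument above.

```python
def difference(z, lag=1, order=1):
    """Apply (1 - L^lag) to the series `order` times."""
    w = list(z)
    for _ in range(order):
        w = [w[t] - w[t - lag] for t in range(lag, len(w))]
    return w

def seasonal_integrate(w, start, lag):
    """Rebuild the levels from one seasonal difference and the first `lag` levels."""
    z = list(start)
    for value in w:
        z.append(z[-lag] + value)   # Z_t = Z_{t-lag} + W_t
    return z

z = [1, 2, 3, 4, 5, 6]
first = difference([1, 4, 9, 16])             # regular first difference
second = difference([1, 4, 9, 16], order=2)   # regular second difference
seasonal = difference(z, lag=2)               # "seasonal" difference with s = 2
rebuilt = seasonal_integrate(seasonal, z[:2], 2)
```

For monthly data such as the Sokoto series, `difference(x, lag=12)` would give the first-order seasonal difference of (2.03), at the cost of the first 12 observations.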
Unit Roots and Stationarity: Because the order of integration of a time series is of great importance for the analysis, a number of statistical tests have been developed for investigating it. In this case, we have to test the data to determine whether seasonal or nonseasonal differencing is needed before modeling the data. The tests are:
(i) Augmented Dickey-Fuller (ADF) Test: This test was first introduced by Dickey and Fuller (1979) to test for the presence of unit root(s). The regression model for the test is given as:
$$\Delta X_t = \phi X_{t-1} + \sum_{j=1}^{p-1} \alpha_j^{*} \Delta X_{t-j} + u_t \tag{2.04}$$
where $\phi$ here denotes the coefficient on the lagged level. In this model the pair of hypotheses is $H_0: \phi = 0$ versus $H_1: \phi < 0$.
$H_0$ is rejected if the t-statistic of $\phi$ is smaller than the relevant critical value.
If $\phi = 0$ (that is, under $H_0$) the series $X_t$ has a unit root and is nonstationary, whereas it is regarded as stationary if the null hypothesis is rejected.
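The mechanics of the test statistic can be illustrated with the simplest special case of (2.04): no lagged differences and no deterministic terms. This is a simplified sketch, not the full ADF procedure used in the paper, and the simulated series (seed, sample size, AR coefficient 0.2) are assumptions chosen purely for illustration.

```python
import math
import random

def dickey_fuller_t(x):
    """t-ratio of phi in dX_t = phi * X_{t-1} + u_t (no constant, no lagged differences)."""
    lagged = x[:-1]
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    sxx = sum(v * v for v in lagged)
    phi_hat = sum(a * b for a, b in zip(lagged, dx)) / sxx
    resid = [d - phi_hat * v for d, v in zip(dx, lagged)]
    s2 = sum(r * r for r in resid) / (len(dx) - 1)   # residual variance
    return phi_hat / math.sqrt(s2 / sxx)

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
random_walk, stationary = [0.0], [0.0]
for e in noise:
    random_walk.append(random_walk[-1] + e)        # unit root process
    stationary.append(0.2 * stationary[-1] + e)    # stationary AR(1)

t_walk = dickey_fuller_t(random_walk)
t_stat = dickey_fuller_t(stationary)
```

The stationary series yields a strongly negative t-ratio, while the random walk does not. In practice the statistic must be compared with Dickey-Fuller critical values rather than with the usual normal or Student t tables.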
(ii) KPSS Test
This test (KPSS) was proposed by Kwiatkowski et al (1992); here the hypothesis that the data generating process (DGP) is stationary is tested against a unit root. If there is no linear trend term (that is, the null is level stationarity rather than trend stationarity), the data generating process is given by
$$X_t = y_t + z_t \tag{2.05}$$
where $y_t = A_1 y_{t-1} + \dots + A_p y_{t-p} + v_t$, $v_t \sim \mathrm{i.i.d.}(0, \sigma_v^2)$. They proposed the following statistic:
$$t_k = \frac{1}{T^2} \sum_{t=1}^{T} \frac{S_t^2}{\hat{\sigma}^2}$$
where $S_t$ is the partial sum of the residuals and $\hat{\sigma}^2$ is an estimate of the long run variance. Do not reject $H_0$ when the statistic is less than the critical value, that is, $X_t$ is stationary.
Reject $H_0$ for large values of $t_k$ (i.e. $t_k > c_r$), in which case $X_t$ has a unit root.
(iii) Seasonal Unit Root (HEGY test)
This test was proposed by Hylleberg et al (1990) to check for seasonal unit roots.
For monthly time series, Frances (1990) discussed the test for seasonal unit roots based on the model
$$\begin{aligned}
\Delta_{12} X_t ={} & \pi_1 z_{1,t-1} + \pi_2 z_{2,t-1} + \pi_3 z_{3,t-1} + \pi_4 z_{3,t-2} + \pi_5 z_{4,t-1} + \pi_6 z_{4,t-2} \\
& + \pi_7 z_{5,t-1} + \pi_8 z_{5,t-2} + \pi_9 z_{6,t-1} + \pi_{10} z_{6,t-2} + \pi_{11} z_{7,t-1} + \pi_{12} z_{7,t-2} \\
& + \sum_{j=1}^{p} \alpha_j^{*} \Delta_{12} X_{t-j} + u_t
\end{aligned}$$
The number of lagged seasonal differences $\Delta_{12} X_{t-j}$ has to be chosen before the HEGY test can be performed. The process $X_t$ has a regular (zero frequency) unit root if $\pi_1 = 0$, and it has a seasonal unit root if any one of the other $\pi_i$ ($i = 2, 3, \dots, 12$) is zero. If all the $\pi_i$ ($i = 1, \dots, 12$) are zero, then a stationary model for the monthly seasonal differences of the series is suitable.
2.2 Model Estimation
The parameters of the selected SARIMA$(p, d, q)(P, D, Q)_s$ model can be estimated consistently by least-squares or by maximum likelihood estimation methods. Both estimation procedures are based on the computation of the innovations $\varepsilon_t$ from the values of the stationary variable.
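As an illustration of how innovations are computed from the stationary variable, the sketch below evaluates the conditional sum of squares for one specific model, SARIMA(1,0,0)(0,1,1) with period s, written as $w_t = \phi w_{t-1} + e_t + \Theta e_{t-s}$ with $w_t = X_t - X_{t-s}$. The plus sign on the seasonal MA term, the zero pre-sample values and the omission of a constant are assumptions of this sketch; software packages may use the opposite sign convention or exact likelihood methods.

```python
def css_innovations(x, phi, seasonal_theta, s):
    """Innovations of w_t = phi*w_{t-1} + e_t + Theta*e_{t-s}, where w_t = x_t - x_{t-s}."""
    w = [x[t] - x[t - s] for t in range(s, len(x))]   # seasonal difference
    e = []
    for t, w_t in enumerate(w):
        w_prev = w[t - 1] if t >= 1 else 0.0          # pre-sample values set to zero
        e_seas = e[t - s] if t >= s else 0.0
        e.append(w_t - phi * w_prev - seasonal_theta * e_seas)
    return e

def css(x, phi, seasonal_theta, s):
    """Conditional sum of squares to be minimised over (phi, Theta)."""
    return sum(v * v for v in css_innovations(x, phi, seasonal_theta, s))

toy = [1.0, 2.0, 3.0, 5.0, 4.0, 8.0]
innov = css_innovations(toy, 0.5, 0.5, 2)   # s = 2 keeps the toy example short
```

A least-squares estimator would search for the parameter values that minimise `css`, for example over a grid or with a numerical optimiser.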
2.3 Model Diagnostic test
Once we have identified and estimated the SARIMA models, we assess the adequacy of the selected models for the data. This diagnostic checking step involves both parameter and residual analysis, using ACF and PACF plots of the residuals, the Ljung-Box statistic and a normality test.
If the univariate modeling procedure is used for forecasting purposes, then forecast evaluation can also form an important part of the diagnostic checking. This involves short, medium and long horizon forecast statistics of the fitted models.
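The Ljung-Box statistic tests the first h residual autocorrelations jointly. A minimal sketch of the standard formula $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$ follows; the function name is illustrative.

```python
def ljung_box(resid, h):
    """Ljung-Box Q statistic based on the first h sample autocorrelations."""
    n = len(resid)
    mean = sum(resid) / n
    d = [r - mean for r in resid]
    denom = sum(v * v for v in d)
    q = 0.0
    for k in range(1, h + 1):
        r_k = sum(d[t] * d[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

q1 = ljung_box([1.0, -1.0, 1.0, -1.0], 1)   # r_1 = -0.75 for this toy series
```

Under the null of uncorrelated residuals, Q is compared with a chi-square distribution whose degrees of freedom are h minus the number of estimated ARMA parameters; large values indicate remaining autocorrelation.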
3.0 Modeling
The focus is to use the seasonal autoregressive integrated moving average (SARIMA) technique, based on the Box and Jenkins (1994) methodology, to build models for the monthly average temperature of Sokoto city using data for the period January 1995 to December 2003. The SARIMA model is then used to perform an out-of-sample forecast for January 2004 to December 2004. The data were obtained from the Meteorological Department, Sokoto State International Airport.
Fig 3.01: Time plot of monthly average temperature (X) for Sokoto from 1995:1 to 2003:12.
Fig 3.02: ACF and PACF for the time series, 1995:1 to 2003:12.
Fig 3.03: Range-mean plot for Sokoto monthly average temperature, 1995:1 to 2003:12.
Fig 3.04: Sample periodogram of Sokoto temperature, 1995:1 to 2003:12.
3.1 Identification of the seasonal models.
Time plot
Fig 3.01 displays the time plot of the monthly average temperature series. A noticeable feature is the persistent recurrence of the same pattern of variability in all the periods, suggesting that the series has a pronounced seasonal pattern and hence is not stationary. In this case a formal test has to be carried out to test for the presence or absence of a seasonal unit root.
ACF and PACF
Consider the ACF plot of Fig 3.02, in which the highest spikes always occur at lags 12, 24, 36, etc. This indicates that the series is seasonal with period 12. The series is also highly autocorrelated and the correlation is very persistent. Since the autocorrelations at the seasonal lags are positive, we expect the fitted model to have a seasonal autoregressive (SAR) component. On the other hand, the PACF suggests a mixed model with both AR and MA components.
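The pattern described here can be reproduced with a sample autocorrelation function applied to an artificial series. In the sketch below the input is a pure 12-month cosine of the same length as the data (108 observations), which is an assumption made only to show where the seasonal spikes come from; it is not the Sokoto series.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[t] * d[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

series = [math.cos(2 * math.pi * t / 12) for t in range(108)]
acf = sample_acf(series, 36)
```

The autocorrelations peak at lags 12, 24 and 36 and are strongly negative at lag 6, and the peaks decay slowly, which is the signature of persistent seasonality with period 12.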
Range-mean plot
We observe in Fig 3.03 that the ranges do not tend to increase with the means. This means that there is no strong positive relationship between the sample mean and the sample variability for each period in the data, which indicates that there is no need for a log transformation.
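The points of a range-mean plot are simple to compute: the series is cut into consecutive blocks (here, years) and the mean and range of each block are paired. A minimal sketch, with a hypothetical function name and toy data:

```python
def range_mean_pairs(x, period):
    """(mean, range) of each consecutive block of `period` observations."""
    pairs = []
    for start in range(0, len(x) - period + 1, period):
        block = x[start:start + period]
        pairs.append((sum(block) / period, max(block) - min(block)))
    return pairs

pairs = range_mean_pairs([1, 2, 3, 2, 4, 6], 3)
```

In this toy example the range doubles when the mean doubles, the situation in which a log transformation would be considered. For the monthly temperature data one would use `period=12`.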
Spectral analysis:
Spectral analysis is a useful frequency domain tool for detecting the existence of periodicity in a time series (Hamilton, 1994). This can be achieved by plotting the periodogram or spectral density of the series against either period or the frequency.
It can be seen in Fig 3.04 that there is a large component at a frequency of exactly nine cycles. In this case there were 108 observations (9 years of data). Therefore a frequency of nine means nine cycles every 108 months, or one cycle every 12 months (108/9). There is also another spike at a frequency of 18, which corresponds to a period of 6 months (108/18). The frequency spectrum clearly shows that there are both annual (12-month) and semi-annual (6-month) cycles in the Sokoto temperature data. The height of each spike indicates how much the corresponding spectral component contributes to the original data.
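The relationship between frequency index and period can be checked with a periodogram computed from the discrete Fourier transform. The sketch below uses an artificial series of 108 points with a 12-month cycle and a weaker 6-month cycle; the amplitudes are assumptions chosen only for illustration.

```python
import cmath
import math

def periodogram(x):
    """Periodogram ordinates at k = 1..n/2 cycles per sample length."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    out = []
    for k in range(1, n // 2 + 1):
        coef = sum(d[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append(abs(coef) ** 2 / n)
    return out   # out[k - 1] corresponds to k cycles in n observations

n = 108
series = [math.cos(2 * math.pi * t / 12) + 0.3 * math.cos(2 * math.pi * t / 6)
          for t in range(n)]
pgram = periodogram(series)
peak = max(range(len(pgram)), key=pgram.__getitem__) + 1   # frequency index of the largest spike
```

The largest ordinate sits at nine cycles, giving a period of 108/9 = 12 months, and the next spike is at 18 cycles, or 6 months.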
Unit root test
We use two methods to determine the order of non-seasonal integration of the series: the ADF (Augmented Dickey-Fuller) and KPSS tests. The ADF test checks the null hypothesis of a unit root against the alternative of stationarity for the data generating process. The KPSS test checks the null hypothesis of stationarity against the alternative of a unit root. The results of the ADF and KPSS tests are in Table 3.01. At the 5% significance level, the ADF test rejects the null hypothesis of a unit root and the KPSS test does not reject the null hypothesis of stationarity. We therefore conclude that the time series does not require non-seasonal differencing.
HEGY Test
The HEGY statistic tests the null hypothesis that there is no seasonal unit root against the alternative of a seasonal unit root. The p-value in Table 3.01 is 0.006. Hence the null hypothesis of no seasonal unit root is rejected at the 5% significance level, confirming our expectation that the time series is seasonally integrated.
Table 3.01: Summary results for different tests.
TEST                                t-statistic   p-value
ADF                                 -4.6388       0.00011
KPSS                                0.0189        0.146
HEGY for levels                     0.5567        0.006
HEGY test for seasonal difference   0.0756        0.405

Penalty function criteria
To specify the range of values of the SARIMA$(p, d, q)(P, D, Q)_s$ parameters, note that three of them are already known: s = 12, d = 0 and D = 1. We have shown that the order of nonseasonal integration is zero, the order of seasonal integration is 1 and the period of seasonality is 12.
For the parameter space p = 0,1,2,…,5; q = 0,1,2,…,4; P = 0,1,2,…,6; Q = 0,1,2,…,4, the most parsimonious models given by the two information criteria AIC and BIC using ASTSA are:
1. SARIMA (1,0,0)(0,1,1)12
2. SARIMA (2,0,1)(2,1,0)12
3. SARIMA (2,0,2)(3,1,2)12
4. SARIMA (3,0,1)(4,1,0)12
5. SARIMA (4,0,2)(5,1,1)12
An extension of the search to a wider parameter space produced the same results. This confirms the optimality of the five models above.
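The size of this search and the criteria used to rank candidates can be sketched as follows. The formulas are the textbook definitions of AIC and BIC in terms of the maximised log-likelihood; whether ASTSA uses exactly these forms is not stated in the paper, so they should be read as an assumption, as should the log-likelihood value and sample size in the example.

```python
import itertools
import math

def aic(loglik, k):
    """Akaike information criterion for k estimated parameters."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Bayesian (Schwarz) information criterion for n observations."""
    return -2.0 * loglik + k * math.log(n)

# Candidate orders (p, q, P, Q) with d = 0, D = 1 and s = 12 held fixed.
grid = list(itertools.product(range(6), range(5), range(7), range(5)))
n_candidates = len(grid)

# Hypothetical numbers: log-likelihood -100 with 3 parameters and 96 usable observations.
example_aic = aic(-100.0, 3)
example_bic = bic(-100.0, 3, 96)
```

The grid contains 6 × 5 × 7 × 5 = 1050 candidate models. Because ln(n) exceeds 2 for samples of this size, BIC penalises each extra parameter more heavily than AIC and so tends to favour the more parsimonious specifications.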
3.2 Estimation of Models
The parameter estimation results, judged by the standard errors and p-values, indicate that the model parameters are significant. The table below presents the estimates:
Table 3.02: Model Estimates

Model: SARIMA (1,0,0)(0,1,1)
Predictor   Coefficient   Std. Error   T-ratio   P-value
AR(1)       0.24          0.096        2.53      0.13
SMA(1)      0.78          0.073        10.72     0.000

Model: SARIMA (2,0,1)(2,1,0)
Predictor   Coefficient   Std. Error   T-ratio   P-value
AR(1)       -0.3372       0.1880       -1.7938   0.077
AR(2)       0.112         0.1112       0.1006    0.920
MA(1)       -0.4813       0.2029       -2.3717   0.021
SAR(1)      -0.5769       0.0990       -5.8273   0.000
SAR(2)      -0.4452       0.0933       -4.7695   0.000

Model: SARIMA (2,0,2)(3,1,2)
Predictor   Coefficient   Std. Error   T-ratio   P-value
AR(1)       0.1303        0.0029       45.31     0.000
AR(2)       0.3675        0.0032       115.56    0.000
MA(1)       0.2831        0.0037       76.65     0.000
MA(2)       0.9764        0.0037       263.99    0.000
SAR(1)      0.1137        0.0037       30.65     0.000
SAR(2)      -0.5147       0.0036       -141.43   0.000
SAR(3)      -0.0298       0.0032       -9.42     0.000
SMA(1)      0.9683        0.0037       260.66    0.000
SMA(2)      -0.2426       0.0037       -65.30    0.000

Model: SARIMA (4,0,2)(5,1,1)
Predictor   Coefficient   Std. Error   T-ratio   P-value
AR(1)       0.95          0.0017       546.9     0.000
AR(2)       -0.38         0.0015       -248.6    0.000
AR(3)       0.34          0.0017       198.3     0.000
AR(4)       -0.22         0.0015       -148.6    0.000
MA(1)       -0.36         0.0017       -208.8    0.000
MA(2)       1.33          0.0017       774.3     0.000
SAR(1)      -0.06         0.0017       -38.1     0.000
SAR(2)      -0.52         0.0015       -350.7    0.000
SAR(3)      -0.52         0.0016       -324.3    0.000
SAR(4)      -0.28         0.0017       -162.9    0.000
SAR(5)      -0.17         0.0017       -98.3     0.000
SMA(1)      0.96          0.0017       551.9     0.000

Model: SARIMA (3,0,1)(4,1,0)
Predictor   Coefficient   Std. Error   T-ratio   P-value
AR(1)       0.03          0.0068       3.9       0.000
AR(2)       -0.28         0.0097       -29.3     0.000
AR(3)       0.04          0.0093       4.3       0.000
MA(1)       -1.18         0.0097       -122.5    0.000
SAR(1)      -0.38         0.0082       -46.7     0.000
SAR(2)      -0.65         0.0089       -73.0     0.000
SAR(3)      -0.61         0.0095       -64.7     0.000
SAR(4)      -0.31         0.0095       -32.3     0.000
3.3 Diagnostic checking
We test whether or not the residuals are generated by a white noise process by using (i) the ACF and PACF plots and the Ljung-Box test to check whether the residuals are uncorrelated, and (ii) normal probability plots and the Anderson-Darling test to check the normality of the residuals.
Figure 3.05: Sample ACF and PACF of ARIMA (1,0,0)(0,1,1) residuals
Figure 3.06: Sample ACF and PACF of ARIMA (2,0,1)(2,1,0) residuals*
Figure 3.07: Sample ACF and PACF of ARIMA (2,0,2)(3,1,2) residuals*
Figure 3.08: Sample ACF and PACF of ARIMA (3,0,1)(4,1,0) residuals
Figure 3.09: Sample ACF and PACF of ARIMA (4,0,2)(5,1,1) residuals
Table 3.03 shows the results of the Ljung-Box test. The test reveals that only the residuals of the models SARIMA (2,0,1)(2,1,0)12 and SARIMA (2,0,2)(3,1,2)12 are autocorrelated at the 5% significance level; these two cases are identified by the symbol *.
TABLE 3.03: Ljung-Box statistics (to test the residual autocorrelations as a set rather than individually)
Seasonal ARIMA Models Ljung-Box test
(1