Wavelet Estimation of Time Series Regression with Long Memory Processes

(1)

Wavelet Estimation of Time Series Regression with Long

Memory Processes

Haibin Wu

University of Alberta

Abstract

This paper studies the estimation of time series regression when both regressors and disturbances have long memory. In contrast with the frequency domain estimation as in Robinson and Hidalgo (1997), we propose to estimate the same regression model with discrete wavelet transform (DWT) of the original series. Due to the approximate

de-correlation property of DWT, the transformed series can be estimated using the traditional least squares techniques. We consider both the ordinary least squares and feasible generalized least squares estimator. Finite sample Monte Carlo simulation study is performed to examine the relative efficiency of the wavelet estimation.

* I am really grateful to Yanqin Fan for guidance and comments, and to Robert Taylor and two anonymous referees for valuable comments. All the errors are the author’s.

Citation: Wu, Haibin, (2006) "Wavelet Estimation of Time Series Regression with Long Memory Processes." Economics Bulletin, Vol. 3, No. 33 pp. 1-10

Submitted: December 14, 2005. Accepted: December 6, 2006.

(2)

1. Introduction

Long memory processes play an important role in time series analysis and can be found in many fields such as physics, engineering, environmental sciences and economics. Intuitively, a long memory process is a stochastic process whose autocovariance function decays very slowly as the distance between observations tends to infinity. Due to this property, traditional time series techniques are no longer applicable. In the current literature, estimation and regression analysis about long memory processes is usually done in the frequency domain1.

In this paper, we look at the regression model where both regressors and disturbances have long memory property. Robinson and Hidalgo (1997) examine the asymptotic properties of the estimation of this model mainly in the frequency domain, and they show that feasible generalized least squares estimator using Fourier transform is aymptotically efficient. This paper proposes the estimation of the same model using discrete wavelet transform (DWT) of the independent and explanatory variables. DWT is a linear filter, when it is applied to a time series, the original time series is decomposed into coefficients at different time scales. Compared with Fourier transform in the frequency domain, DWT attains properties of the original series at different time scales, and at the same time gives information about the time locations at different scales. One important property of DWT which makes it ideal for analysis of long memory processes is the so-called de-correlation property. The de-de-correlation property ensures the coefficients obtained by applying DWT to the original series have very weak correlation (see Fan(2001)). So after the DWT, we can use the traditional econometric techniques to analyze long memory processes.

Since generalized least squares in the frequency domain is most efficient, we examine the relative efficiency achieved by the generalized least squares estimator using wavelet transformed series. After the DWT is applied to the dependent and explanatory variables, ordinary least squares is performed, then using the Cochrane-Orcutt method, we perform the generalized least squares regression. Though asymptotic theory cannot be established for the wavelet estimation, Monte Carlo simulation study shows that the relative efficiency of the estimator achieved by using the wavelet transform is reasonably good. This remaining part of this paper is organized as follows: part II gives a brief introduction of DWT and the approximate de-correlation property; part III discusses the frequency domain estimation using Fourier transform in the current literature; part IV displays the estimation procedure using wavelet methods; part V gives the finite sample simulation results of the relative efficiency of the wavelet estimation, and part VI concludes.

2. DWT and the De-correlation Property

We consider the discrete wavelet transform (DWT) for a finite series2. Let T=2J where J is a positive integer. Given a sequence

{ }

X_t T_t₌₁, let Wj,k (j=1,…J; k=0,…,Mj-1) denote the

1

One exception is Robinson and Hidalgo (1997) which also considers the estimation in time domain, I would like to thank the editor and referees for pointing this out.

2

Actually, y can be any finite data set (e.g., cross sectional sample). Since we are focusing on time series regression in this paper, we will use time series hereafter.

(3)

boundary independent DWT coefficients of a series where Mj is the number of boundary

independent DWT coefficients at level j. 2.1 The Discrete Wavelet Transform3

A wavelet filter 1 0 }

{h_l L_l₌− of length L where L is an even integer must have the following three properties: Zero Sum: 0; 1 0 =

∑

− = L l l h Unit Energy; 1; 1 0 2 =

∑

− = L l l h Orthogonality: ₂ ₂ 0 1 0 = = ∞ ₊ −∞ = + − =

∑

l n l l n l L l l h h h

h for all nonzero integers n.

It is clear that a wavelet filter must be a difference filter. The scaling filter { } 01

− = L l l g is the quadrature mirror filter corresponding to the wavelet filter {h_l}_lL₌−₀1 by the relationship

gl=(-1)l+1hL-1-l.

Let {h_l}_lL₌−₀1 be a Daubechies compactly supported wavelet filter, then the transfer function of {h_l}is just the discrete Fourier transform:

( ) exp( 2 ) 1 0 fl i h f H L l l π − =

∑

− = (1) Similarly, let G(f) be the transfer function of {g_l}. We can write the squared gain function of {h_l}as: H(f)=|H(f)|2=DL/2(f)C(f) (2) Where C(f)= ( )cos ( ) 2 1 2 1 2 / 0 1 2 / 1 f l L l l L l L

∑

π − = + −

− is the squared gain function of a high-pass filter,

and D(f)=4sin2(πf) is the squared gain function of a first order backward difference filter. Thus a wavelet filter can be viewed as a two-stage filter. The first stage is a L/2th order backward difference filter, while the second stage is a high-pass filter. Hence we know wavelet filter {h_l}and scaling filter {g_l}are approximately high-pass and low-pass filter, i.e., they preserve information at high or low frequencies of the original series.

The DWT can be performed using the so-called pyramid algorithm. The algorithm works as follows: At first stage, the time series {X_t}T_t₌₁ with T=2J is filtered using the wavelet and scaling filters and downsampled by two. The resulting coefficients consist of a vector of wavelet coefficients W1 and scaling coefficients V1 each with length T/2. At

the second stage, the vector of scaling coefficients are filtered with both wavelet and scaling filtered to obtain a vector of wavelet W2 and scaling coefficients V2 each with

length T/4. The process continues until we reach level J to get a wavelet and scaling coefficients each with length 1.

To see more clearly the effects of DWT of the original series with pyramid algorithm. Consider the level 1 scaling coefficients V1, which is obtained by filtering the original

3

(4)

series with scaling filter {g_l}and downsampled by two. That is, the input is the original series, the operator is the scaling filter and the output is the level 1 scaling filter. At second stage, the level 1 scaling coefficients are filtered again by level 2 wavelet and scaling filters. In other words, the input is V1, the operator is the level 2 wavelet and

scaling filters, and the output is W2 and V2. It is clear that we obtain the level 2 wavelet

coefficients by a scaling filter first and then by a wavelet filter, and we can have the same wavelet coefficients W2 by a new filter which is the convolution of the level 1 scaling

filter and level 2 wavelet filter. Define H2(f) as the transfer function of the new filter

which is the convolution of the level 1 scaling filter and level 2 wavelet filter, we have H2(f)=H(2f)G(f) |f|≤1/2

where the transfer function of level 2 wavelet filter H(2f) captures the idea of downsampling by two since in the frequency domain which is equivalent to scaling the transfer function by two. Similarly, for the transfer function of the convolution of the level 1 scaling filter and level 2 scaling filter G2(f), we have

G2(f)=G(2f)G(f) |f|≤1/2

Continuing in this way, if Hj(f) and Gj(f) are the transfer functions of the equivalent filters

which are the convolution of level 1 through j-1 scaling filters and level j wavelet and scaling filters respectively, we have

( ) (2 ) (2 ) 2 0 1 _f _G _f H f H j i i j j

∏

− = − = |f|≤1/2 and ( ) (2 ) 1 0 f G f G j i i j

∏

− = = |f|≤1/2

2.2 Fractionally Integrated Processes and Long Memory

Before talking about the approximate de-correlation property of DWT, we briefly go over some important properties of long memory processes in this subsection for the purpose of illustrating the approximate de-correlation property of DWT of a long memory process. We focus on the discrete time realization of a fractionally integrated auto-regressive and moving average (FIARMA(0, d, 0)) process

{ }

X_t T_t₌₁ with differencing parameter d and innovation variance σ2:

(1-B)dXt=εt (3)

where B is backward shift operator with BXt=Xt-1. If the differencing parameter satisfies –

1/2≤d<1/2, then the process is stationary and its spectral density can represented as (see Taniguchi and Kakizawa (2000)):

S_X f f 2d 2 | ) sin( 2 | 2 ) ( = π − π σ (4) If 0<d<1/2, the process is a stationary long memory process, and if d≥1/2, the process is non-stationary, but we can difference the series to the [d+1/2]th order to obtain stationarity.

When 0<d<1/2, it is easy to show that

(5)

Compared with stationary short memory processes which have exponential rate of decay for the auto-covariance function. We can see the rate of decay for the long memory is much slower. That is where the name of long memory processes comes from.

2.3 The Approximate De-correlation Property of DWT

One important property of DWT which makes it ideal for long memory processes is the so-called approximate de-correlation property. It states that the DWT coefficients of a long memory process are approximately uncorrelated both with and across scales.

The covariance between DWT coefficients of a long memory process can be expressed as4: cov(W_j_k,W_j _k) exp(i2 f(2j(k' 1) 2j(k 1))H_j(f)H*_j'(f)S_X(f)df 2 / 1 2 / 1 ' ' ,' , =

∫

₋ π + − + (6)

where Hj(f) is the transfer function for the jth level wavelet filter, and SX(f) is the spectral

density function for a long memory process.

Fan (2003) shows formally that for the within scale correlation at level j, we have the following property: cov( , , , ') ([2 | '|] ( 2 )) d L j k j k j W O k k W = − − − , as 2j|k-k’|→∞ (7) The above equation gives the relationship between the rate of decay of auto-covariance and the time periods separating the wavelet coefficients given any level j and length L of the wavelet filter.

While for the across scale correlation, we have

cov(W_j_,_k,W_j_,'_k_')=O(L−3/4) uniformly in k and k’ (8) So when the length of wavelet filter tends to infinity, the across scale correlation is quite small even for fixed separating time periods between two wavelet coefficients.

Note the difference in rate of decay between the original long memory processes, and the DWT coefficients, compared with the original series, the DWT coefficients have much faster rate of decay.

3. DWT Estimation of Long Memory Processes

In this paper, we consider a regression model where both regressors and disturbances have long memory. Specifically, the model takes the following form:

Yt=α+β’Xt+µt (9)

where both Xt and ut possess long memory property, and ut is covariance stationary,

having zero mean and absolute continuous spectral density function.

Robinson and Hidalgo (1997) derived the asymptotic property for the frequency domain estimator. In this section, we describe how to estimate the same model using DWT of the original series. As in Robinson and Hidalgo (1997), the regression model takes the form as in equation (9) where both Xt and ut have long memory with

differencing parameter c and d. Let T=2J be the number of observations, and Tj=T/2j. Let

X k j

W _, denote the level j boundary independent DWT coefficients for X={X1, …XT}

obtained from the level j wavelet filter { ,} 01

− =j L l l j h , that is 4

(6)

∑

− = + −− − = 1 0 1 mod 1 ) 1 ( 2 , , j j j L l k l N l j X k j h X W , j=1, …J, k=0, …, Tj-1

where Lj=(2j-1)(L-1)+1 and Mj is the number of boundary independent coefficients at

level j. Similarly define W_jY,_k and U

k j

W , for Y={Y1, …YT} and U={u1, …uT}.

Consider the least squares regression with the DWT coefficients of Y and X:

W_jY,_k =α +βW_jX,_k +W_jU,_k (11) Due to the de-correlation property, we expect the OLS regression with DWT of the original series should perform reasonably well.

Although DWT has good de-correlation property, it is possible the residuals from the above OLS regression with DWT coefficients have correlation among them. To see how the generalized least squares estimator improves the efficiency of the estimation, we consider a simple feasible generalized least squares estimator using Cochrane-Orcutt method. After the OLS regression with DWT coefficients, we regress the estimated residuals on its own first lag:

Wˆ_jU,_k = ρWˆ_jU,_k−1+ε_j,_k (12) to get an estimated autocorrelation coefficient ρˆ . Then using the estimated autocorrelation coefficient, we transform the DWT coefficients to estimate the following model:

W~_jY_,_k =α +βW~_jX_,_k +W~_jU_,_k

where W~_jY_,_k =W_jY_,_k −ρˆW_jY_,_k₋₁ and similarly for X and U. If the specification in (12) is correct, i.e., there is just first order correlation among the residuals from OLD regression, then asymptotically this method yields efficient estimation.

4. Simulation Result

Simulation study similar to that in Robinson and Hidalgo (1997) is performed to examine the relative efficiency of wavelet estimation. Specifically, we compare the ratio of asymptotic variance 1 11

− −

−

Σ_g

T of the generalized least squares estimator in Robinson and Hidalgo (1997):

Σ_g11 =Γ(1−c+d)2/Γ(1−2c+2d)

−

to the Monte Carlo mean squared errors of DWT estimator, where c and d are the order of fractional integration for independent variable and error term. Since the feasible GLS estimator in Robinson and Hidalgo is quite efficient, we want to see the relative efficiency of DWT estimator to the feasible estimator in Robinson and Hidalgo (1997). In this paper, we use three wavelet filters from the family of the Daubechies compactly supported wavelet filters: Haar, D4 and LA8 wavelet filters. We generate 500 iterations. Table 1 shows the ratio of 1 11

− −

−

Σ_g

T to the mean squared errors of ordinary least squares estimator using level 1 DWT coefficients. Several interesting features can be noticed from Table 1. First, as in Robinson and Hidalgo (1997), the efficiency of OLS estimator of DWT coefficients increases in d and decreases in c. Second, the efficiency of DWT OLS estimator is lower than the feasible GLS estimators in Robinson and Hidalgo

(7)

(1997). One possible reason is that we use only level DWT coefficients which is of length

T/2. Third, the three wavelet filters perform quite similarly with Haar slightly better overall.

Table 2 shows the same ratios as in Table 1 using wavelet coefficients up to level 4. The significant improvement in efficiency with more levels of DWT coefficients is apparent compared with Table 1. The reason is simple, since with level 4 DWT coefficients, we use almost 94% of the complete DWT coefficients, so there is much more information used compared with estimation with only level 1 coefficients. The other general pattern is quite similar to that in Table 1.

Table 1: Ratios of T−1Σ−_g1 to MSE of DWT OLS (level 1) T=128 c/d 0.05 0.15 0.25 0.35 0.45 Haar 0.05 0.15 0.25 0.35 0.45 0.4563 0.3886 0.3923 0.2947 0.1709 0.5327 0.4602 0.4363 0.3591 0.2969 0.5264 0.5265 0.4353 0.4230 0.3909 0.6129 0.6015 0.5128 0.4541 0.4065 0.6395 0.6469 0.5597 0.5349 0.5110 c/d 0.05 0.15 0.25 0.35 0.45 D4 0.05 0.15 0.25 0.35 0.45 0.4364 0.3618 0.3855 0.2711 0.1641 0.5097 0.4555 0.4080 0.3095 0.2900 0.5675 0.4731 0.4532 0.4065 0.3207 0.5736 0.5839 0.5354 0.4778 0.4102 0.8056 0.6251 0.5996 0.5225 0.4597 c/d 0.05 0.15 0.25 0.35 0.45 LA8 0.05 0.15 0.25 0.35 0.45 0.4006 0.3843 0.3317 0.2716 0.1593 0.5672 0.4422 0.4036 0.3326 0.2483 0.5174 0.4800 0.4384 0.3787 0.3325 0.5598 0.5587 0.4995 0.4506 0.3532 0.7085 0.6173 0.5790 0.5346 0.4695

Table 3 gives the ratio of 1 11 − −

−

Σ_g

T to the mean square errors of feasible GLS estimator described above using level 4 wavelet coefficients. Compared with Table 2, an interesting observation is that the feasible GLS estimator performs relatively better compared with OLS estimator only when d is 0.35 or larger. The intuition is simple, since wavelet filter is two stage filter with de-correlation property, the usefulness of Cochrane-Orcutt method is evident only when the degree of long memory of disturbances is large enough.

(8)

5 .

Conclusion

This paper examines the relative efficiency of wavelet estimator of a regression model with both regressors and disturbances having long memory. Though not so efficient as the feasible GLS estimator proposed by Robinson and Hidalgo (1997), the wavelet estimator is quite useful due to its computational convenience. Much can be still done in this direction. Further work includes (1) comparing the efficiency performance for different sample size; (2) studying the performance when regressors or disturbances are nonstationary, i.e., c or d is greater than 0.5.

Table 2: Ratios of T−1Σ−_g1 to MSE of DWT OLS (level 4) T=128 c/d 0.05 0.15 0.25 0.35 0.45 Haar 0.05 0.15 0.25 0.35 0.45 0.9428 0.9110 0.8545 0.6958 0.4618 0.9942 0.9195 0.8331 0.7477 0.7706 0.9328 0.7585 0.8006 0.7223 0.7114 0.9001 0.8006 0.7125 0.7463 0.7527 0.8420 0.7350 0.6800 0.6486 0.5677 c/d 0.05 0.15 0.25 0.35 0.45 D4 0.05 0.15 0.25 0.35 0.45 0.8441 0.7857 0.7428 0.5845 0.4299 0.9582 0.8872 0.7328 0.6922 0.6972 0.8673 0.7457 0.7471 0.7015 0.6904 0.8837 0.7815 0.6764 0.7637 0.7063 0.8725 0.6863 0.6509 0.7070 0.5679 c/d 0.05 0.15 0.25 0.35 0.45 LA8 0.05 0.15 0.25 0.35 0.45 0.6942 0.7036 0.6234 0.4557 0.3547 0.8602 0.7720 0.6841 0.6016 0.5485 0.7942 0.7091 0.6632 0.6209 0.5603 0.8902 0.7333 0.6456 0.6332 0.6146 0.8502 0.7026 0.5671 0.6786 0.5460

(9)

Table 3: Ratios of T−1Σ−_g1 to MSE of DWT FGLS (level 4) T=128 c/d 0.05 0.15 0.25 0.35 0.45 Haar 0.05 0.15 0.25 0.35 0.45 0.9184 0.8806 0.8316 0.6765 0.4524 0.9978 0.8997 0.8030 0.7245 0.7366 0.8844 0.7568 0.7909 0.7029 0.6944 0.8897 0.8233 0.7218 0.7401 0.7725 0.8620 0.7299 0.7062 0.6579 0.5897 c/d 0.05 0.15 0.25 0.35 0.45 D4 0.05 0.15 0.25 0.35 0.45 0.8204 0.7678 0.7317 0.5555 0.4284 0.9550 0.8607 0.7549 0.6765 0.6737 0.8744 0.7483 0.7447 0.6916 0.6701 0.8659 0.7608 0.7047 0.7722 0.7402 0.8613 0.7281 0.7064 0.7198 0.6042 c/d 0.05 0.15 0.25 0.35 0.45 LA8 0.05 0.15 0.25 0.35 0.45 0.6664 0.6681 0.5994 0.4344 0.3351 0.8433 0.7642 0.6592 0.5849 0.5214 0.7869 0.7209 0.6632 0.6321 0.5571 0.9178 0.7384 0.6681 0.6241 0.6202 0.9112 0.7503 0.6237 0.6839 0.5738

(10)

References

Beran, J. (1994). Statistics for Long Memory Processes, Chapman & Hall

Fan, Yanqin (2003). “On the Approximate De-correlation Property of the Discrete Wavelet Transform for Fractionally Differenced Processes”, IEEE Transactions on Information Theory 49, 516-521

__________ and Brandon Whitcher (2001) “A Wavelet Solution to the Spurious Regression of Fractionally Integrated Processes”, working paper, Department of Economics, Vanderbilt University

Percival, D.B. and A.T. Walden (2000). Wavelet Methods for Time Series Analysis, Cambridge Series in Statistical and Probabilistic mathematics, Cambridge University Press

Robinson, P.M. and F.J. Hidalgo (1997) “Time Series Regression with Long Range Dependence”, The Annals of Statistics, 25(1), 77-104

Taniguchi, Masanobu and Yoshihide Kakizawa (2000) Asymptotic Theory of Statistical

Inference for Time Series, Springer Series in Statisitcs

Tasy, Wen-Jen and Ching-Fan Chung (2000) “The Spurious Regression of Fractionally Integrated Processes”, Journal of Econometrics, (96), 155-182