Practical Issues in the Analysis of Univariate GARCH Models

(1)

Practical Issues in the Analysis of Univariate GARCH

Models

∗

Eric Zivot

†

April 18, 2008

Abstract

This paper gives a tour through the empirical analysis of univariate GARCH models for financial time series with stops along the way to discuss various prac-tical issues associated with model specification, estimation, diagnostic evaluation and forecasting.

1 Introduction

There are many very good surveys covering the mathematical and statistical prop-erties of GARCH models. See, for example, [9], [14], [74], [76], [27] and [83]. There are also several comprehensive surveys that focus on the forecasting performance of GARCH models including [78], [77], and [3]. However, there are relatively few surveys that focus on the practical econometric issues associated with estimating GARCH models and forecasting volatility. This paper, which draws heavily from [88], gives a tour through the empirical analysis of univariate GARCH models for financial time series with stops along the way to discuss various practical issues. Multivariate GARCH models are discussed in the paper by [80]. The plan of this pa-per is as follows. Section 2 reviews some stylized facts of asset returns using example data on Microsoft and S&P 500 index returns. Section 3 reviews the basic univariate GARCH model. Testing for GARCH eﬀects and estimation of GARCH models are covered in Sections 4 and 5. Asymmetric and non-Gaussian GARCH models are dis-cussed in Section 6, and long memory GARCH models are briefly disdis-cussed in Section 7. Section 8 discusses volatility forecasting, and final remarks are given Section 91.

(2)

Asset Mean Med Min Max Std. Dev Skew Kurt JB Daily Returns MSFT 0.0016 0.0000 -0.3012 0.1957 0.0253 -0.2457 11.66 13693 S&P 500 0.0004 0.0005 -0.2047 0.0909 0.0113 -1.486 32.59 160848 Monthly Returns MSFT 0.0336 0.0336 -0.3861 0.4384 0.1145 0.1845 4.004 9.922 S&P 500 0.0082 0.0122 -0.2066 0.1250 0.0459 -0.8377 5.186 65.75

Notes: Sample period is 03/14/86 - 06/30/03 giving 4365 daily observations. Table 1: Summary Statistics for Daily and Monthly Stock Returns.

2 Some Stylized Facts of Asset Returns

Let Pt denote the price of an asset at the end of trading day t. The continuously compounded or log return is defined as rt = ln(Pt/Pt−1). Figure 1 plots the daily log returns, squared returns, and absolute value of returns of Microsoft stock and the S&P 500 index over the period March 14, 1986 through June 30, 2003. There is no clear discernible pattern of behavior in the log returns, but there is some per-sistence indicated in the plots of the squared and absolute returns which represent the volatility of returns. In particular, the plots show evidence of volatility clus-tering - low values of volatility followed by low values and high values of volatility followed by high values. This behavior is confirmed in Figure 2 which shows the sample autocorrelations of the six series. The log returns show no evidence of serial correlation, but the squared and absolute returns are positively autocorrelated. Also, the decay rates of the sample autocorrelations of r2_t _{and |r}t| appear much slower, especially for the S&P 500 index, than the exponential rate of a covariance station-ary autoregressive-moving average (ARMA) process suggesting possible long memory behavior. Monthly returns, defined as the sum of daily returns over the month, are illustrated in Figure 3. The monthly returns display much less volatility clustering than the daily returns.

Table 1 gives some standard summary statistics along with the Jarque-Bera test for normality. The latter is computed as

JB = T 6 Ã [ skew2+( dkurt − 3) 2 4 ! , (1)

where [skew denotes the sample skewness and dkurt denotes the sample kurtosis. Under the null that the data are iid normal, JB is asymptotically distributed as chi-square

∗_{The paper was prepared for the Handbook of Financial Time Series, edited by T.G. Andersen,}

R.A. Davis, J-P Kreiss, and T. Mikosch. Thanks to Saraswata Chaudhuri, Richard Davis, Ron Schoenberg and Jiahui Wang for helpful comments and suggestions. Financial support from the Gary Waterman Distinguished Scholarship is greatly appreciated.

†_{Department of Economics, Box 353330, University of Washington. [email protected].} 1_{All of the examples in the paper were constructed using S-PLUS 8.0 and S+FinMetrics 2.0. Script}

(3)

1986 1990 1994 1998 2002 -0 .3 0 0 .1 0 Microsoft Returns 1986 1990 1994 1998 2002 -0 .2 0 0 .0 5 S & P 500 Returns 1986 1990 1994 1998 2002 0. 00 0. 08

Microsoft Squared Returns

1986 1990 1994 1998 2002

0.

000

0.

040

S & P 500 Squared Returns

1986 1990 1994 1998 2002

0.

00

0.

30

Microsoft Absolute Returns

1986 1990 1994 1998 2002

0.

00

0.

30

S & P 500 Absolute Returns

Figure 1: Daily returns, squared returns and absolute returns for Microsoft and the S&P 500 index.

with 2 degrees of freedom. The distribution of daily returns is clearly non-normal with negative skewness and pronounced excess kurtosis. Part of this non-normality is caused by some large outliers around the October 1987 stock market crash and during the bursting of the 2000 tech bubble. However, the distribution of the data still appears highly non-normal even after the removal of these outliers. Monthly returns have a distribution that is much closer to the normal than daily returns.

3 The ARCH and GARCH Model

[33] showed that the serial correlation in squared returns, or conditional heteroskedas-ticity, can be modeled using an autoregressive conditional heteroskedasticity (ARCH) model of the form

yt= Et−1[yt] + t, (2)

t= ztσt, (3)

(4)

Lag AC F 0 5 10 15 20 0.0 0 .6 Microsoft Returns Lag AC F 0 5 10 15 20 0.0 0 .6 S&P 500 Returns Lag AC F 0 5 10 15 20 0.0 0 .6

Lag AC F 0 5 10 15 20 0.0 0 .6

S&P 500 Squared Returns

Lag AC F 0 5 10 15 20 0.0 0 .6

Figure 2: Sample autocorrelations of rt, r2t and |rt| for Microsoft and S&P 500 index. where E_t−1_{[·] represents expectation conditional on information available at time t−1,} and ztis a sequence of iid random variables with mean zero and unit variance. In the basic ARCH model ztis assumed to be iid standard normal. The restrictions a0 > 0 and ai ≥ 0 (i = 1, . . . , p) are required for σ2t > 0. The representation (2) - (4) is convenient for deriving properties of the model as well as for specifying the likelihood function for estimation. The equation for σ2_t can be rewritten as an AR(p) process for 2_t

2

t = a0+ a1 2_t−1+ · · · + ap 2_t−p+ ut, (5)

where ut= 2t− σ2t is a martingale diﬀerence sequence (MDS) since Et−1[ut] = 0 and it is assumed that E( 2_t_{) < ∞. If a}1+ · · · + ap < 1 then t is covariance stationary, the persistence of 2_t and σ2_t is measured by a1+ · · · + ap and ¯σ2 = var( t) = E( 2t) = a0/(1 − a1− · · · − ap).

An important extension of the ARCH model proposed by [12] replaces the AR(p) representation in (4) with an ARMA(p, q) formulation

σ2_t = a0+ p X i=1 ai 2_t−i+ q X j=1 bjσ2_t−j, (6)

(5)

1986 1990 1994 1998 2002 -0 .3 0 .4 Microsoft Returns 1986 1990 1994 1998 2002 -0 .2 0 0 .1 0 S&P 500 Returns 1986 1990 1994 1998 2002 0. 02 0. 18

1986 1990 1994 1998 2002

0.

005

Lag AC F 0 5 10 15 20 0.0 0 .6

Figure 3: Monthly Returns, Squared Returns and Sample Autocorrelations of

Squared Returns for Microsoft and the S&P 500.

positive to ensure that the conditional variance σ2_t is always positive.2 The model in (6) together with (2)-(3) is known as the generalized ARCH or GARCH(p, q) model. The GARCH(p, q) model can be shown to be equivalent to a particular ARCH(∞) model. When q = 0, the GARCH model reduces to the ARCH model. In order for the GARCH parameters, bj (j = 1, · · · , q), to be identified at least one of the ARCH coeﬃcients ai (i > 0) must be nonzero. Usually a GARCH(1,1) model with only three parameters in the conditional variance equation is adequate to obtain a good model fit for financial time series. Indeed, [49] provided compelling evidence that is diﬃcult to find a volatility model that outperforms the simple GARCH(1,1).

Just as an ARCH model can be expressed as an AR model of squared residuals, a GARCH model can be expressed as an ARMA model of squared residuals. Consider the GARCH(1,1) model

σ2_t = a0+ a1 2_t−1+ b1σ2_t−1. (7)

Since E_t−1( 2

t) = σ2t, (7) can be rewritten as 2

t = a0+ (a1+ b1) _t−12 + ut− b1ut−1, (8)

2_{Positive coeﬃcients are suﬃcient but not necessary conditions for the positivity of conditional}

(6)

which is an ARMA(1,1) model with ut = 2t − Et−1( 2t) being the MDS disturbance term.

Given the ARMA(1,1) representation of the GARCH(1,1) model, many of its properties follow easily from those of the corresponding ARMA(1,1) process for 2_t. For example, the persistence of σ2_t is captured by a1+ b1 and covariance stationarity requires that a1 + b1 < 1. The covariance stationary GARCH(1,1) model has an ARCH(∞) representation with ai = a1bi−11 , and the unconditional variance of t is ¯

σ2 = a0/(1 − a1− b1).

For the general GARCH(p, q) model (6), the squared residuals t behave like an ARMA(max(p, q), q) process. Covariance stationarity requires Pp_i=1ai+Pqj=1bi< 1 and the unconditional variance of t is

¯

σ2= var( t) =

a0

1 −³Ppi=1ai+Pqj=1bi

´. (9)

3.1 Conditional Mean Specification

Depending on the frequency of the data and the type of asset, the conditional mean E_t−1[yt] is typically specified as a constant or possibly a low order autoregressive-moving average (ARMA) process to capture autocorrelation caused by market mi-crostructure effects (e.g., bid-ask bounce) or non-trading effects. If extreme or un-usual market events have happened during sample period, then dummy variables associated with these events are often added to the conditional mean specification to remove these effects. Therefore, the typical conditional mean specification is of the form E_t−1[yt] = c + r X i=1 φ_iy_t−i+ s X j=1 θ_{j t−j}+ L X l=0 β0_lx_t−l+ t, (10)

where xtis a k × 1 vector of exogenous explanatory variables.

In financial investment, high risk is often expected to lead to high returns. Al-though modern capital asset pricing theory does not imply such a simple relationship, it does suggest that there are some interactions between expected returns and risk as measured by volatility. Engle, Lilien and Robins (1987) proposed to extend the basic GARCH model so that the conditional volatility can generate a risk premium which is part of the expected returns. This extended GARCH model is often referred to as GARCH-in-the-mean or GARCH-M model. The GARCH-M model extends the conditional mean equation (10) to include the additional regressor g(σt), which can be an arbitrary function of conditional volatility σt. The most common specifications are g(σt) = σ2t, σt, or ln(σ2t).

3.2 Explanatory Variables in the Conditional Variance Equation

Just as exogenous variables may be added to the conditional mean equation, exoge-nous explanatory variables may also be added to the conditional variance formula (6)

(7)

in a straightforward way giving σ2_t = a0+ p X i=1 ai 2_t−i+ q X j=1 bjσ2_t−j+ K X k=1 δ0_kz_t−k,

where zt is a m × 1 vector of variables, and δ is a m × 1 vector of positive coeﬃ-cients. Variables that have been shown to help predict volatility are trading volume, macroeconomic news announcements ([58], [43], [17]), implied volatility from option prices and realized volatility ([82], [11]), overnight returns ([46], [68]), and after hours realized volatility ([21])

3.3 The GARCH Model and Stylized Facts of Asset Returns

Previously it was shown that the daily returns on Microsoft and the S&P 500 ex-hibited the “stylized facts” of volatility clustering as well as a non-normal empirical distribution. Researchers have documented these and many other stylized facts about the volatility of economic and financial time series. [14] gave a complete account of these facts. Using the ARMA representation of GARCH models shows that the GARCH model is capable of explaining many of those stylized facts. The four most important ones are: volatility clustering, fat tails, volatility mean reversion, and asymmetry.

To understand volatility clustering, consider the GARCH(1, 1) model in (7). Usu-ally the GARCH coeﬃcient b1 is found to be around 0.9 for many daily or weekly financial time series. Given this value of b1, it is obvious that large values of σ2_t−1will be followed by large values of σ2_t, and small values of σ2_t−1 will be followed by small values of σ2_t. The same reasoning can be obtained from the ARMA representation in (8), where large/small changes in 2_t−1 will be followed by large/small changes in 2_t.

It is well known that the distribution of many high frequency financial time series usually have fatter tails than a normal distribution. That is, extreme values occur more often than implied by a normal distribution. [12] gave the condition for the existence of the fourth order moment of a GARCH(1, 1) process. Assuming the fourth order moment exists, [12] showed that the kurtosis implied by a GARCH(1, 1) process with normal errors is greater than 3, the kurtosis of a normal distribution. [51] and [52] extended these results to general GARCH(p, q) models. Thus a GARCH model with normal errors can replicate some of the fat-tailed behavior observed in financial time series. A more thorough discussion of extreme value theory for GARCH is given by [24]. Most often a GARCH model with a non-normal error distribution is required to fully capture the observed fat-tailed behavior in returns. These models are reviewed in sub-Section 6.2.

Although financial markets may experience excessive volatility from time to time, it appears that volatility will eventually settle down to a long run level. Recall, the unconditional variance of t for the stationary GARCH(1, 1) model is ¯σ2 = a0/(1 − a1− b1). To see that the volatility is always pulled toward this long run, the ARMA representation in (8) may be rewritten in mean-adjusted form as:

(8)

If the above equation is iterated k times, it follows that ( 2_t+k_{− ¯}σ2) = (a1+ b1)k( 2t − ¯σ2) + ηt+k,

where η_t is a moving average process. Since a1+ b1 < 1 for a covariance stationary GARCH(1, 1) model, (a1+ b1)k→ 0 as k → ∞. Although at time t there may be a large deviation between 2

t and the long run variance, 2t+k − ¯σ2 will approach zero “on average” as k gets large; i.e., the volatility “mean reverts” to its long run level ¯

σ2. The magnitude of a1 + b1 controls the speed of mean reversion. The so-called half-life of a volatility shock, defined as ln(0.5)/ ln(a1 + b1), measures the average time it takes for | 2t − ¯σ2| to decrease by one half. Obviously, the closer a1+ b1 is to one the longer is the half-life of a volatility shock. If a1+ b1> 1, the GARCH model is non-stationary and the volatility will eventually explode to infinity as k → ∞. Similar arguments can be easily constructed for a GARCH(p, q) model.

The standard GARCH(p, q) model with Gaussian errors implies a symmetric dis-tribution for ytand so cannot account for the observed asymmetry in the distribution of returns. However, as shown in Section 6, asymmetry can easily be built into the

GARCH model by allowing t to have an asymmetric distribution or by explicitly

modeling asymmetric behavior in the conditional variance equation (6).

3.4 Temporal Aggregation

Volatility clustering and non-Gaussian behavior in financial returns is typically seen in weekly, daily or intraday data. The persistence of conditional volatility tends to increase with the sampling frequency3. However, as shown in [32], for GARCH models there is no simple aggregation principle that links the parameters of the model at one sampling frequency to the parameters at another frequency. This occurs because GARCH models imply that the squared residual process follows an ARMA type process with MDS innovations which is not closed under temporal aggregation. The practical result is that GARCH models tend to be fit to the frequency at hand. This strategy, however, may not provide the best out-of-sample volatility forecasts. For example, [68] showed that a GARCH model fit to S&P 500 daily returns produces better forecasts of weekly and monthly volatility than GARCH models fit to weekly or monthly returns, respectively.

4 Testing for ARCH/GARCH eﬀects

The stylized fact of volatility clustering in returns manifests itself as autocorrelation in squared and absolute returns or in the residuals from the estimated conditional mean equation (10). The significance of these autocorrelations may be tested using

3

The empirical result that aggregated returns exhibit smaller GARCH eﬀects and approach Gaussian behavior can be explained by the results of [26] who showed that a central limit the-orem holds for standardized sums of random variables that follow covariance stationary GARCH processes.

(9)

the Ljung-Box or modified Q-statistic MQ(p) = T (T + 2) p X j=1 ˆ ρ2_j T − j, (12)

where ˆρ_j denotes the j-lag sample autocorrelation of the squared or absolute returns. If the data are white noise then the MQ(p) statistic has an asymptotic chi-square dis-tribution with p degrees of freedom. A significant value for MQ(p) provides evidence for time varying conditional volatility.

To test for autocorrelation in the raw returns when it is suspected that there are GARCH eﬀects present, [27] suggested using the following heteroskedasticity robust version of (12) MQHC(p) = T (T + 2) p X j=1 1 T − j Ã ˆ σ4 ˆ σ4+ ˆγ_j ! ˆ ρ2_j,

where ˆσ4 is a consistent estimate of the squared unconditional variance of returns, and ˆγ_j is the sample autocovariance of squared returns.

Since an ARCH model implies an AR model for the squared residuals 2_t, [33] showed that a simple Lagrange multiplier (LM) test for ARCH eﬀects can be con-structed based on the auxiliary regression (5). Under the null hypothesis that there are no ARCH eﬀects, a1= a2 = · · · = ap= 0, the test statistic

LM = T · R2 (13)

has an asymptotic chi-square distribution with p degrees of freedom, where T is the sample size and R2 is computed from the regression (5) using estimated residuals. Even though the LM test is constructed from an ARCH model, [61] show that it also has power against more general GARCH alternatives and so it can be used as a general specification test for GARCH eﬀects.

[64], however, argued that the LM test (13) may reject if there is general mis-specification in the conditional mean equation (10). They showed that such misspec-ification causes the estimated residuals ˆt to be serially correlated which, in turn, causes ˆ2_t to be serially correlated. Therefore, care should be exercised in specifying the conditional mean equation (10) prior to testing for ARCH eﬀects.

4.1 Testing for ARCH Eﬀects in Daily and Monthly Returns

Table 2 shows values of MQ(p) computed from daily and monthly squared returns and the LM test for ARCH, for various values of p, for Microsoft and the S&P 500. There is clear evidence of volatility clustering in the daily returns, but less evidence for monthly returns especially for the S&P 500.

5 Estimation of GARCH Models

The general GARCH(p, q) model with normal errors is (2), (3) and (6) with zt ∼ iid N (0, 1). For simplicity, assume that E_t−1[yt] = c. Given that t follows Gaussian

(10)

MQ(p) r_t2 LM Asset p 1 5 10 1 5 10 Daily Returns MSFT 56.81 (0.000) 562.1 (0.000) 206.8 (0.000) 56.76 (0.000) 377.9 (0.000) 416.6 (0.000) S&P 500 87.59 (0.000) 415.5 (0.000) 456.1 (0.000) 87.52 (0.000) 311.4 (0.000) 329.8 (0.000) Monthly Returns MSFT 0.463 (0.496) 17.48 (0.003) 31.59 (0.000) 0.455 (0.496) 16.74 (0.005) 33.34 (0.000) S&P 500 1.296 (0.255) 2.590 (0.763) 6.344 (0.786) 1.273 (0.259) 2.229 (0.817) 5.931 (0.821) Notes: p-values are in parentheses.

Table 2: Tests for ARCH Eﬀects in Daily Stock Returns

distribution conditional on past history, the prediction error decomposition of the log-likelihood function of the GARCH model conditional on initial values is

log L = T X t=1 lt= − T 2 log(2π) − 1 2 T X t=1 log σ2_t ₋1 2 T X t=1 2 t σ2 t , (14)

where lt = −1₂(log(2π) + log σ2t) − 12

2 t

σ2 t

. The conditional loglikelihood (14) is used in practice since the unconditional distribution of the initial values is not known in closed form4_{. As discussed in [69] and [20], there are several practical issues to} consider in the maximization of (14). Starting values for the model parameters c, ai (i = 0, · · · , p) and bj (j = 1, · · · , q) need to be chosen and an initialization of 2t and σ2

t must be supplied. The sample mean of yt is usually used as the starting value for c, zero values are often given for the conditional variance parameters other than a0 and a1, and a0 is set equal to the unconditional variance of yt5. For the initial values of σ2 t, a popular choice is σ2_t = 2_t = 1 T T X s=1 2 s, t ≤ 0,

where the initial values for s are computed as the residuals from a regression of yt on a constant.

Once the log-likelihood is initialized, it can be maximized using numerical op-timization techniques. The most common method is based on a Newton-Raphson iteration of the form

ˆ

θn+1= ˆθn− λnH(ˆθn)−1s(ˆθn),

4

[29] gave a computationally intensive numerical procedure for approximating the exact log-likelihood.

5

Setting the starting values for all of the ARCH coeﬃcients ai (i = 1, . . . , p)to zero may create

an ill-behaved likelihood and lead to a local minimum since the remaining GARCH parameters are not identified.

(11)

where θn denotes the vector of estimated model parameters at iteration n, λn is a scalar step-length parameter, and s(θn) and H(θn) denote the gradient (or score) vector and Hessian matrix of the log-likelihood at iteration n, respectively. The step length parameter λnis chosen such that ln L(θn+1) ≥ ln L(θn). For GARCH models, the BHHH algorithm is often used. This algorithm approximates the Hessian matrix using only first derivative information

−H(θ) ≈ B(θ) = T X t=1 ∂lt ∂θ ∂lt ∂θ0.

In the application of the Newton-Raphson algorithm, analytic or numerical deriva-tives may be used. [41] provided algorithms for computing analytic derivaderiva-tives for GARCH models.

The estimates that maximize the conditional log-likelihood (14) are called the maximum likelihood (ML) estimates. Under suitable regularity conditions, the ML estimates are consistent and asymptotically normally distributed and an estimate of the asymptotic covariance matrix of the ML estimates is constructed from an estimate of the final Hessian matrix from the optimization algorithm used. Unfortunately, verification of the appropriate regularity conditions has only been done for a limited number of simple GARCH models, see [63], [60], [55], [56] and [81]. In practice, it is generally assumed that the necessary regularity conditions are satisfied.

In GARCH models for which the distribution of ztis symmetric and the parame-ters of the conditional mean and variance equations are variation free, the information matrix of the log-likelihood is block diagonal. The implication of this is that the pa-rameters of the conditional mean equation can be estimated separately from those of the conditional variance equation without loss of asymptotic eﬃciency. This can greatly simplify estimation. An common model for which block diagonality of the information matrix fails is the GARCH-M model.

5.1 Numerical Accuracy of GARCH Estimates

GARCH estimation is widely available in a number of commercial software packages (e.g. EVIEWS, GAUSS, MATLAB, Ox, RATS, S-PLUS, TSP) and there are also a few free open source implementations. [41], [69], and [20] discussed numerical ac-curacy issues associated with maximizing the GARCH log-likelihood. They found that starting values, optimization algorithm choice, and use of analytic or numerical derivatives, and convergence criteria all influence the resulting numerical estimates of the GARCH parameters. [69] and [20] studied estimation of a GARCH(1,1) model from a variety of commercial statistical packages using the exchange rate data of [15] as a benchmark. They found that it is often diﬃcult to compare competing software since the exact construction of the GARCH likelihood is not always adequately de-scribed. In general, they found that use of analytic derivatives leads to more accurate estimation than procedures based on purely numerical evaluations.

In practice, the GARCH log-likelihood function is not always well behaved, es-pecially in complicated models with many parameters, and reaching a global max-imum of the log-likelihood function is not guaranteed using standard optimization

(12)

techniques. Also, the positive variance and stationarity constraints are not straight-forward to implement with common optimization software and are often ignored in practice. Poor choice of starting values can lead to an ill-behaved log-likelihood and cause convergence problems. Therefore, it is always a good idea to explore the surface of the log-likelihood by perturbing the starting values and re-estimating the GARCH parameters.

In many empirical applications of the GARCH(1,1) model, the estimate of a1 is close to zero and the estimate of b1 is close to unity. This situation is of some concern since the GARCH parameter b1 becomes unidentified if a1 = 0, and it is well known that the distribution of ML estimates can become ill-behaved in models with nearly unidentified parameters. [66] studied the accuracy of ML estimates of the GARCH parameters a0, a1 and b1 when a1 is close to zero. They found that the estimated standard error for b1 is spuriously small and that the t-statistics for testing hypotheses about the true value of b1 are severely size distorted. They also showed that the concentrated loglikelihood as a function of b1 exhibits multiple maxima. To guard against spurious inference they recommended comparing estimates from pure ARCH(p) models, which do not suﬀer from the identification problem, with estimates from the GARCH(1,1). If the volatility dynamics from these models are similar then the spurious inference problem is not likely to be present.

5.2 Quasi-Maximum Likelihood Estimation

Another practical issue associated with GARCH estimation concerns the correct choice of the error distribution. In particular, the assumption of conditional normality is not always appropriate. However, as shown by [86] and [16], even when normal-ity is inappropriately assumed, maximizing the Gaussian log-likelihood (14) results in quasi-maximum likelihood estimates (QMLEs) that are consistent and asymptot-ically normally distributed provided the conditional mean and variance functions of the GARCH model are correctly specified. In addition, [16] derived an asymptotic covariance matrix for the QMLEs that is robust to conditional non-normality. This matrix is estimated using

H(ˆθQM L)−1B(ˆθQM L)H(ˆθQM L)−1, (15)

where ˆθQM L denotes the QMLE of θ, and is often called the “sandwich” estima-tor. The coeﬃcient standard errors computed from the square roots of the diagonal elements of (15) are sometimes called “Bollerslev-Wooldridge” standard errors. Of course, the QMLEs will be less eﬃcient than the true MLEs based on the correct er-ror distribution. However, if the normality assumption is correct then the sandwich covariance is asymptotically equivalent to the inverse of the Hessian. As a result, it is good practice to routinely use the sandwich covariance for inference purposes.

[35] and [16] evaluated the accuracy of the quasi-maximum likelihood estimation of GARCH(1,1) models. They found that if the distribution of ztin (3) is symmetric, then QMLE is often close to the MLE. However, if zt has a skewed distribution then the QMLE can be quite diﬀerent from the MLE.

(13)

5.3 Model Selection

An important practical problem is the determination of the ARCH order p and the GARCH order q for a particular series. Since GARCH models can be treated as ARMA models for squared residuals, traditional model selection criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) can be used for selecting models. For daily returns, if attention is restricted to pure ARCH(p) models it is typically found that large values of p are selected by AIC and BIC. For GARCH(p, q) models, those with p, q ≤ 2 are typically selected by AIC and BIC. Low order GARCH(p,q) models are generally preferred to a high order ARCH(p) for reasons of parsimony and better numerical stability of estimation (high order GARCH(p, q) processes often have many local maxima and minima). For many applications, it is hard to beat the simple GARCH(1,1) model.

5.4 Evaluation of Estimated GARCH models

After a GARCH model has been fit to the data, the adequacy of the fit can be evaluated using a number of graphical and statistical diagnostics. If the GARCH model is correctly specified, then the estimated standardized residuals ˆt/ˆσt should behave like classical regression residuals; i.e., they should not display serial correla-tion, conditional heteroskedasticity or any type of nonlinear dependence. In addicorrela-tion, the distribution of the standardized residuals ˆt/ˆσt should match the specified error distribution used in the estimation.

Graphically, ARCH eﬀects reflected by serial correlation in ˆ2_t/ˆσ2_t can be uncovered by plotting its SACF. The modified Ljung-Box statistic (12) can be used to test the null of no autocorrelation up to a specific lag, and Engle’s LM statistic (13) can be used to test the null of no remaining ARCH eﬀects6. If it is assumed that the errors are Gaussian, then a plot of ˆt/ˆσtagainst time should have roughly ninety five percent of its values between ±2; a normal qq-plot of ˆt/ˆσt should look roughly linear7; and the JB statistic should not be too much larger than six.

5.5 Estimation of GARCH Models for Daily and Monthly Returns

Table 3 gives model selection criteria for a variety of GARCH(p, q) fitted to the daily returns on Microsoft and the S&P 500. For pure ARCH(p) models, an ARCH(5) is chosen by all criteria for both series. For GARCH(p, q) models, AIC picks a GARCH(2,1) for both series and BIC picks a GARCH(1,1) for both series8.

Table 4 gives QMLEs of the GARCH(1,1) model assuming normal errors for the Microsoft and S&P 500 daily returns. For both series, the estimates of a1 are around

6

These tests should be viewed as indicative, since the distribution of the tests are influenced by the estimation of the GARCH model. For valid LM tests, the partial derivatives of σ2t with respect

to the conditional volatility parameters should be added as additional regressors in the auxiliary regression (5) based on estimated residuals.

7_{If an error distribution other than the Gaussian is assumed, then the qq-plot should be}

con-structed using the quantiles of the assumed distribution.

8_{The low log-likelihood values for the GARCH(2,2) models indicate that a local maximum was}

(14)

(p, q) Asset AIC BIC Likelihood (1,0) MSFT -19977 -19958 9992 S&P 500 -27337 -27318 13671 (2,0) MSFT -20086 -20060 10047 S&P 500 -27584 -27558 13796 (3,0) MSFT -20175 -20143 10092 S&P 500 -27713 -27681 13861 (4,0) MSFT -20196 -20158 10104 S&P 500 -27883 -27845 13947 (5,0) MSFT -20211 -20166 10113 S&P 500 -27932 -27887 13973 (1,1) MSFT -20290 -20264 10149 S&P 500 -28134 -28109 14071 (1,2) MSFT -20290 -20258 10150 S&P 500 -28135 -28103 14072 (2,1) MSFT -20292 -20260 10151 S&P 500 -28140 -28108 14075 (2,2) MSFT -20288 -20249 10150 S&P 500 -27858 -27820 13935

Table 3: Model Selection Criteria for Estimated GARCH(p,q) Models.

0.09 and the estimates of b1 are around 0.9. Using both ML and QML standard er-rors, these estimates are statistically diﬀerent from zero. However, the QML standard errors are considerably larger than the ML standard errors. The estimated volatility persistence, a1 + b1, is very high for both series and implies half-lives of shocks to volatility to Microsoft and the S&P 500 of 15.5 days and 76 days, respectively. The unconditional standard deviation of returns, ¯σ = pa0/(1 − a1− b1), for Microsoft and the S&P 500 implied by the GARCH(1,1) models are 0.0253 and 0.0138, respec-tively, and are very close to the sample standard deviations of returns reported in Table 1.

Estimates of GARCH-M(1,1) models for Microsoft and the S&P 500, where σt is added as a regressor to the mean equation, show small positive coeﬃcients on σt and essentially the same estimates for the remaining parameters as the GARCH(1,1) models.

Figure 4 shows the first diﬀerences of returns along with the fitted one-step-ahead volatilities, ˆσt, computed from the GARCH(1,1) and ARCH(5) models. The ARCH(5) and GARCH(1,1) models do a good job of capturing the observed volatil-ity clustering in returns. The GARCH(1,1) volatilities, however, are smoother and display more persistence than the ARCH(5) volatilities.

Graphical diagnostics from the fitted GARCH(1,1) models are illustrated in Fig-ure 5. The SACF of ˆ2_t/ˆσ2_t does not indicate any significant autocorrelation, but the normal qq-plot of ˆt/ˆσt shows strong departures from normality. The last three columns of Table 4 give the standard statistical diagnostics of the fitted GARCH

(15)

GARCH Parameters Residual Diagnostics Asset a0 a1 b1 MQ(12) LM(12) JB Daily Returns MSFT 2.80e−5 (3.42e−6) [1.10e−5] 0.0904 (0.0059) [0.0245] 0.8658 (0.0102) [0.0371] 4.787 (0.965) 4.764 (0.965) 1751 (0.000) S&P 500 1.72e−6 (2.00e−7) [1.25e−6] 0.0919 (0.0029) [0.0041] 0.8990 (0.0046) [0.0436] 5.154 (0.953) 5.082 (0.955) 5067 (0.000) Monthly Returns MSFT 0.0006 [0.0006] 0.1004 [0.0614] 0.8525 [0.0869] 8.649 (0.733) 6.643 (0.880) 3.587 (0.167) S&P 500 3.7e−5 [9.6e−5_] 0.0675 [0.0248] 0.9179 [0.0490] 3.594 (0.000) 3.660 (0.988) 72.05 (0.000) Notes: QML standard errors are in brackets.

Table 4: Estimates of GARCH(1,1) Model with Diagnostics.

models. Consistent with the SACF, the MQ statistic and Engle’s LM statistic do not indicate remaining ARCH eﬀects. Furthermore, the extremely large JB statistic confirms nonnormality.

Table 4 also shows estimates of GARCH(1,1) models fit to the monthly returns. The GARCH(1,1) models fit to the monthly returns are remarkable similar to those fit to the daily returns. There are, however, some important diﬀerences. The monthly standardized residuals are much closer to the normal distribution, especially for Mi-crosoft. Also, the GARCH estimates for the S&P 500 reflect some of the character-istics of spurious GARCH eﬀects as discussed in [66]. In particular, the estimate of a1 is close to zero, and has a relatively large QML standard error, and the estimate of b1 is close to one and has a very small standard error.

6 GARCH Model Extensions

In many cases, the basic GARCH conditional variance equation (6) under normality provides a reasonably good model for analyzing financial time series and estimating conditional volatility. However, in some cases there are aspects of the model which can be improved so that it can better capture the characteristics and dynamics of a particular time series. For example, the empirical analysis in the previous Section showed that for the daily returns on Microsoft and the S&P 500, the normality assumption may not be appropriate and there is evidence of nonlinear behavior in the standardized residuals from the fitted GARCH(1,1) model. This Section discusses several extensions to the basic GARCH model that make GARCH modeling more flexible.

(16)

1986 1990 1994 1998 2002 -0 .3 0 0 .1 0

Microsoft Daily Returns

1986 1990 1994 1998 2002 -0 .2 0 0 .0 5

S&P 500 Daily Returns

1986 1990 1994 1998 2002

0.

02

0.

10

Conditional Volatility from GARCH(1,1)

1986 1990 1994 1998 2002

0.

01

0.

06

Conditional Volatility from GARCH(1,1)

1986 1990 1994 1998 2002

0.

02

0.

10

Conditional Volatility from ARCH(5)

1986 1990 1994 1998 2002

0.

01

0.

09

Conditional Volatility from ARCH(5)

Figure 4: One-step ahead volatilities from fitted ARCH(5) and GARCH(1,1) models for Microsoft and S&P 500 index.

6.1 Asymmetric Leverage Eﬀects and News Impact

In the basic GARCH model (6), since only squared residuals 2_t−ienter the conditional variance equation, the signs of the residuals or shocks have no effect on conditional volatility. However, a stylized fact of financial volatility is that bad news (negative shocks) tends to have a larger impact on volatility than good news (positive shocks). That is, volatility tends to be higher in a falling market than in a rising market. [10] attributed this effect to the fact that bad news tends to drive down the stock price, thus increasing the leverage (i.e., the debt-equity ratio) of the stock and causing the stock to be more volatile. Based on this conjecture, the asymmetric news impact on volatility is commonly referred to as the leverage effect.

6.1.1 Testing for Asymmetric Eﬀects on Conditional Volatility

A simple diagnostic for uncovering possible asymmetric leverage eﬀects is the sample correlation between r_t2 and r_t−1. A negative value of this correlation provides some evidence for potential leverage eﬀects. Other simple diagnostics, suggested by [39],

(17)

Lag ACF 0 10 20 30 0.0 0 .4 0.8

Microsoft Squared Residuals

Lag ACF 0 10 20 30 0.0 0 .4 0.8

S&P 500 Squared Residuals

Quantiles of Standard Normal

Microsoft S tandardized Residuals -2 0 2 -0.3 -0.1 0.1

Quantiles of Standard Normal

S&P 500 Standardized Residuals

-2 0 2

-0.20

-0.05

0.10

Figure 5: Graphical residual diagnostics from fitted GARCH(1,1) models to Microsoft and S&P 500 returns.

result from estimating the following test regression ˆ

ε2_t = β₀+ β₁wˆ_t−1+ ξt,

where ˆεtis the estimated residual from the conditional mean equation (10), and ˆwt−1 is a variable constructed from ˆε_t−1 and the sign of ˆε_t−1. A significant value of β₁ indicates evidence for asymmetric eﬀects on conditional volatility. Let S_t−1− denote a dummy variable equal to unity when ˆε_t−1 is negative, and zero otherwise. Engle and Ng consider three tests for asymmetry. Setting ˆw_t−1 = S_t−1− gives the Sign Bias test; setting ˆw_t−1 = S_t−1− ˆε_t−1 gives the Negative Size Bias test; and setting

ˆ

w_t−1= S_t−1+ ˆε_t−1 gives the Positive Size Bias test.

6.1.2 Asymmetric GARCH Models

The leverage eﬀect can be incorporated into a GARCH model in several ways. [71] proposed the following exponential GARCH (EGARCH) model to allow for leverage eﬀects ht= a0+ p X i=1 ai| t−i| + γi t−i σ_t−i + q X j=1 bjht−j, (16)

(18)

where ht = log σ2t. Note that when t−i is positive or there is “good news”, the total effect of _t−i is (1 + γ_i_)| _t−i_{|; in contrast, when} _t−iis negative or there is “bad news”, the total effect of _t−i_{is (1 − γ}_i_)| _t−i_{|. Bad news can have a larger impact on} volatility, and the value of γ_i would be expected to be negative. An advantage of the EGARCH model over the basic GARCH model is that the conditional variance σ2_t is guaranteed to be positive regardless of the values of the coefficients in (16), because the logarithm of σ2_t instead of σ2_t itself is modeled. Also, the EGARCH is covariance stationary providedPq_j=1bj < 1.

Another GARCH variant that is capable of modeling leverage eﬀects is the

thresh-old GARCH (TGARCH) model,9 which has the following form

σ2_t = a0+ p X i=1 ai 2_t−i+ p X i=1 γ_iS_t−i 2_t−i+ q X j=1 bjσ2_t−j, (17) where S_t−i= ½ 1 if _t−i < 0 0 if _t−i _{≥ 0} .

That is, depending on whether _t−i is above or below the threshold value of zero, 2

t−i has different effects on the conditional variance σ2t: when t−i is positive, the total effects are given by ai 2_t−i; when t−i is negative, the total effects are given by (ai + γi) 2t−i. So one would expect γi to be positive for bad news to have larger impacts.

[31] extended the basic GARCH model to allow for leverage eﬀects. Their power GARCH (PGARCH(p, d, q)) model has the form

σd_t = a0+ p X i=1 ai(| t−i| + γi t−i)d+ q X j=1 bjσd_t−j, (18)

where d is a positive exponent, and γ_i denotes the coefficient of leverage effects. When d = 2, (18) reduces to the basic GARCH model with leverage effects. When d = 1, the PGARCH model is specified in terms of σtwhich tends to be less sensitive to outliers than when d = 2. The exponent d may also be estimated as an additional parameter which increases the flexibility of the model. [31] showed that the PGARCH model also includes many other GARCH variants as special cases.

Many other asymmetric GARCH models have been proposed based on smooth transition and Markov switching models. See [44] and [83] for excellent surveys of these models.

6.1.3 News Impact Curve

The GARCH, EGARCH, TGARCH and PGARCH models are all capable of modeling leverage eﬀects. To clearly see the impact of leverage eﬀects in these models, [75], and [39] advocated the use of the so-called news impact curve. They defined the news

9_{The original TGARCH model proposed by [87] models σ}

t instead of σ2t. The TGARCH model

(19)

GARCH(1, 1) σ2_t = A + a1(| t−1| + γ1 t−1)2 A = a0+ b1σ¯2 ¯ σ2= a0/[1 − a1(1 + γ21) − b1] TGARCH(1, 1) σ2_t = A + (a1+ γ1St−1) 2t−1 A = a0+ b1σ¯2 ¯ σ2 = a0/[1 − (a1+ γ1/2) − b1] PGARCH(1, 1, 1) σ2_t = A + 2√Aa1(| t−1| + γ1 t−1) +a2₁_(| _t−1_{| + γ}_{1 t−1})2, A = (a0+ b1σ)¯ 2 ¯ σ2 = a2₀_{/[1 − a}1/ p 2/π − b1]2 EGARCH(1, 1) σ2_t _{= A exp{a}1(| t−1| + γ1 t−1)/¯σ} A = ¯σ2b1_exp{a 0} ¯ σ2 _{= exp{(a}0+ a1 p 2/π)/(1 − b1)}

Table 5: News impact curves for asymmetric GARCH processes. ¯σ2 _{denotes the}

unconditional variance.

Asset corr(r2

t, rt−1) Sign Bias Negative Size Bias Positive Size Bias

Microsoft _−0.0315 −0.4417 (0.6587) −6.816 (0.000) 3.174 (0.001) S&P 500 _−0.098 2.457 (0.014) −11.185 (0.000) 1.356 (0.175) Notes: p-values are in parentheses.

Table 6: Tests for Asymmetric GARCH Eﬀects.

impact curve as the functional relationship between conditional variance at time t and the shock term (error term) at time t −1, holding constant the information dated t − 2 and earlier, and with all lagged conditional variance evaluated at the level of the unconditional variance. Table 5 summarizes the expressions defining the news impact curves, which include expressions for the unconditional variances, for the asymmetric GARCH(1,1) models.

6.1.4 Asymmetric GARCH Models for Daily Returns

Table 6 shows diagnostics and tests for asymmetric eﬀects in the daily returns on Microsoft and the S&P 500. The correlation between r2_t and r_t−1 is negative and fairly small for both series indicating weak evidence for asymmetry. However, the Size Bias tests clearly indicate asymmetric eﬀects with the Negative Size Bias test giving the most significant results.

Table 7 gives the estimation results for EGARCH(1,1), TGARCH(1,1) and PGARCH(1,d,1) models for d = 1, 2. All of the asymmetric models show statistically significant

(20)

lever-Model a0 a1 b1 γ1 BIC Microsoft EGARCH −0.7273 [0.4064] 0.2144 [0.0594] 0.9247 [0.0489] −0.2417 [0.0758] -20265 TGARCH 3.01e−5 [1.02e−5] 0.0564 [0.0141] 0.8581 [0.0342] 0.0771 [0.0306] -20291 PGARCH 2 2.87e−5 [9.27e−6] 0.0853 [0.0206] 0.8672 [0.0313] −0.2164 [0.0579] -20290 PGARCH 1 0.0010 [0.0006] 0.0921 [0.0236] 0.8876 [0.0401] −0.2397 [0.0813] -20268 S&P 500 EGARCH −0.2602 [0.3699] 0.0720 [0.0397] 0.9781 [0.0389] −0.3985 [0.4607] -28051 TGARCH 1.7e−6 [7.93e−7] 0.0157 [0.0081] 0.9169 [0.0239] 0.1056 0.0357 -28200 PGARCH 2 1.78e−6 [8.74e−7_] 0.0578 [0.0165] 0.9138 [0.0253] −0.4783 [0.0910] -28202 PGARCH 1 0.0002 [2.56e−6] 0.0723 [0.0003] 0.9251 [8.26e−6] −0.7290 [0.0020] -28253

Notes: QML standard errors are in brackets.

Table 7: Estimates of Asymmetric GARCH(1,1) Models.

age eﬀects, and lower BIC values than the symmetric GARCH models. Model selec-tion criteria indicate that the TGARCH(1,1) is the best fitting model for Microsoft, and the PGARCH(1,1,1) is the best fitting model for the S&P 500.

Figure 6 shows the estimated news impact curves based on these models. In this plot, the range of t is determined by the residuals from the fitted models. The TGARCH and PGARCH(1,2,1) models have very similar NICs and show much larger responses to negative shocks than to positive shocks. Since the EGARCH(1,1) and PGARCH(1,1,1) models are more robust to extreme shocks, impacts of small (large) shocks for these model are larger (smaller) compared to those from the other models and the leverage eﬀect is less pronounced.

6.2 Non-Gaussian Error Distributions

In all the examples illustrated so far, a normal error distribution has been exclusively used. However, given the well known fat tails in financial time series, it may be more appropriate to use a distribution which has fatter tails than the normal distribution. The most common fat-tailed error distributions for fitting GARCH models are: the Student’s t distribution; the double exponential distribution; and the generalized error distribution.

[13] proposed fitting a GARCH model with a Student’s t distribution for the standardized residual. If a random variable ut has a Student’s t distribution with ν degrees of freedom and a scale parameter st, the probability density function (pdf)

(21)

-0.2 -0.1 0.0 0.1 0.2

0.001

0.004

Asymmetric GARCH(1,1) Models for Microsoft

TGARCH PGARCH 1 PGARCH 2 EGARCH -0.2 -0.1 0.0 0.1 0.2 0.0 0 .002 0.005

Asymmetric GARCH(1,1) Models for S&P 500

TGARCH PGARCH 1 PGARCH 2 EGARCH

Figure 6: News impact curves from fitted asymmetric GARCH(1,1) models for Mi-crosoft and S&P 500 index.

of ut is given by f (ut) = Γ[(ν + 1)/2] (πν)1/2_Γ(ν/2) s−1/2_t [1 + u2_t/(stν)](ν+1)/2 , where Γ(·) is the gamma function. The variance of utis given by

var(ut) = stν

ν − 2, v > 2.

If the error term t in a GARCH model follows a Student’s t distribution with ν degrees of freedom and var_t−1( t) = σ2t, the scale parameter st should be chosen to be

st=

σ2_t_{(ν − 2)}

ν .

Thus the log-likelihood function of a GARCH model with Student’s t distributed errors can be easily constructed based on the above pdf.

[71] proposed to use the generalized error distribution (GED) to capture the fat tails usually observed in the distribution of financial time series. If a random variable

(22)

uthas a GED with mean zero and unit variance, the pdf of ut is given by f (ut) = ν exp[−(1/2)|ut/λ| ν_] λ · 2(ν+1)/ν_Γ(1/ν) , where λ = " 2−2/ν_Γ(1/ν) Γ(3/ν) #1/2 ,

and ν is a positive parameter governing the thickness of the tail behavior of the distribution. When ν = 2 the above pdf reduces to the standard normal pdf; when ν < 2, the density has thicker tails than the normal density; when ν > 2, the density has thinner tails than the normal density.

When the tail thickness parameter ν = 1, the pdf of GED reduces to the pdf of double exponential distribution:

f (ut) = 1 √

2e

−√2|ut|_.

Based on the above pdf, the log-likelihood function of GARCH models with GED or double exponential distributed errors can be easily constructed. See to [48] for an example.

Several other non-Gaussian error distribution have been proposed. [42] introduced the asymmetric Student’s t distribution to capture both skewness and excess kurtosis in the standardized residuals. [85] proposed the normal inverse Gaussian distribution. [45] provided a very flexible seminonparametric innovation distribution based on a Hermite expansion of a Gaussian density. Their expansion is capable of capturing general shape departures from Gaussian behavior in the standardized residuals of the GARCH model.

6.2.1 Non-Gaussian GARCH Models for Daily Returns

Table 8 gives estimates of the GARCH(1,1) and best fitting asymmetric GARCH(1,1) models using Student’s t innovations for the Microsoft and S&P 500 returns. Model selection criteria indicated that models using the Student’s t distribution fit better than the models using the GED distribution. The estimated degrees of freedom for Microsoft is about 7, and for the S&P 500 about 6. The use of t-distributed errors clearly improves the fit of the GARCH(1,1) models. Indeed, the BIC values are even lower than the values for the asymmetric GARCH(1,1) models based on Gaussian errors (see Table 7). Overall, the asymmetric GARCH(1,1) models with t-distributed errors are the best fitting models. The qq-plots in Figure 7 shows that the Student’s t distribution adequately captures the fat-tailed behavior in the standardized residuals for Microsoft but not for the S&P 500 index.

7 Long Memory GARCH Models

If returns follow a GARCH(p, q) model, then the autocorrelations of the squared and absolute returns should decay exponentially. However, the SACF of r_t2 _{and |r}t| for

(23)

Model a0 a1 b1 γ1 v BIC Microsoft GARCH 3.39e−5 [1.52e−5] 0.0939 [0.0241] 0.8506 [0.0468] 6.856 [0.7121 -20504 TGARCH 3.44e−5 [1.20e−5] 0.0613 [0.0143] 0.8454 [0.0380] 0.0769 [0.0241] 7.070 [0.7023] -20511 S&P 500 GARCH 5.41e−7 [2.15e−7] 0.0540 [0.0095] 0.0943 [0.0097] 5.677 [0.5571] -28463 PGARCH d = 1 0.0001 [0.0002] 0.0624 [0.0459] 0.9408 [0.0564] −0.7035 [0.0793] 6.214 [0.6369] -28540

Table 8: Estimates of Non Gaussian GARCH(1,1) Models.

Microsoft and the S&P 500 in Figure 2 appear to decay much more slowly. This is evidence of so-called long memory behavior. Formally, a stationary process has long memory or long range dependence if its autocorrelation function behaves like

ρ(k) → Cρk2d−1 as k → ∞,

where Cρ is a positive constant, and d is a real number between 0 and 1₂. Thus the autocorrelation function of a long memory process decays slowly at a hyperbolic rate. In fact, it decays so slowly that the autocorrelations are not summable:

∞ X k=−∞

ρ(k) = ∞.

It is important to note that the scaling property of the autocorrelation function does not dictate the general behavior of the autocorrelation function. Instead, it only specifies the asymptotic behavior when k → ∞ . What this means is that for a long memory process, it is not necessary for the autocorrelation to remain significant at large lags as long as the autocorrelation function decays slowly. [8] gives an example to illustrate this property.

The following subSections describe testing for long memory and GARCH models that can capture long memory behavior in volatility. Explicit long memory GARCH models are discussed in [83].

7.1 Testing for Long Memory

One of the best-known and easiest to use tests for long memory or long range de-pendence is the rescaled range (R/S) statistic, which was originally proposed by [53], and later refined by [67] and his coauthors. The R/S statistic is the range of partial sums of deviations of a time series from its mean, rescaled by its standard deviation. Specifically, consider a time series yt, for t = 1, · · · , T . The R/S statistic is defined

(24)

-10 -5 0 5 -5 0 5 Microsoft -5 0 5 S&P 500

Figure 7: QQ-plots of Standardized Residuals from Asymmetric GARCH(1,1) models with Student’s t errors.

as QT = 1 sT ⎡ ⎣ max 1≤k≤T k X j=1 (yj − ¯y) − min 1≤k≤T k X j=1 (yj− ¯y) ⎤ ⎦ , (19)

where ¯y = 1/TPT_i=1yiand sT = q

1/TPT_i=1(yi− ¯y)2. If ytis iid with finite variance, then

1 √

TQT ⇒ V,

where ⇒ denotes weak convergence and V is the range of a Brownian bridge on the unit interval. [62] gives selected quantiles of V .

[62] pointed out that the R/S statistic is not robust to short range dependence. In particular, if yt is autocorrelated (has short memory) then the limiting distribution of QT/

√

T is V scaled by the square root of the long run variance of yt. To allow for short range dependence in yt, [62] modified the R/S statistic as follows

˜ QT = 1 ˆ σT(q) ⎡ ⎣ max 1≤k≤T k X j=1 (yj− ¯y) − min 1≤k≤T k X j=1 (yj − ¯y) ⎤ ⎦ , (20)

(25)

where the sample standard deviation is replaced by the square root of the Newey-West ([73]) estimate of the long run variance with bandwidth q.10 [62] showed that if there is short memory but no long memory in yt, ˜QT also converges to V , the range of a Brownian bridge. [18] found that (20) is eﬀective for detecting long memory behavior in asset return volatility.

7.2 Two Component GARCH Model

In the covariance stationary GARCH model the conditional volatility will always mean revert to its long run level unconditional value. Recall the mean reverting form of the basic GARCH(1, 1) model in (11). In many empirical applications, the estimated mean reverting rate ˆa1 + ˆb1 is often very close to 1. For example, the estimated value of a1+ b1 from the GARCH(1,1) model for the S&P 500 index is 0.99 and the half life of a volatility shock implied by this mean reverting rate is ln(0.5)/ ln(0.956) = 76.5 days. So the fitted GARCH(1,1) model implies that the conditional volatility is very persistent.

[37] suggested that the high persistence and long memory in volatility may be due to a time-varying long run volatility level. In particular, they suggested decomposing conditional variance into two components

σ2_t = qt+ st, (21)

where qt is a highly persistent long run component, and st is a transitory short run component. Long memory behavior can often be well approximated by a sum of two such components. A general form of the two components model that is based on a modified version of the PGARCH(1, d, 1) is

σd_t = qd_t + sd_t, (22)

qd_t = α1| t−1|d+ β1qt−1d , (23)

sd_t = a0+ α2| t−1|d+ β2sdt−1. (24)

Here, the long run component qt follows a highly persistent PGARCH(1, d, 1) model and the transitory component stfollows another PGARCH(1, d, 1) model. For the two components to be separately identified the parameters should satisfy 1 < (α1+ β1) < (α2+ β2). It can be shown that the reduced form of the two components model is

σd_t = a0+ (α1+ α2)| t−1|d− (α1β2+ α2β1)| t−2|d

+ (β₁+ β₂)σd_t−1_{− β}₁β₂σd_t−2, which is in the form of a constrained PGARCH(2, d, 2) model. However, the two components model is not fully equivalent to the PGARCH(2, d, 2) model because not all PGARCH(2, d, 2) models have the component structure. Since the two compo-nents model is a constrained version of the PGARCH(2, d, 2) model, the estimation of a two components model is often numerically more stable than the estimation of an unconstrained PGARCH(2, d, 2) model.

1 0

(26)

Asset Q˜T r_t2 _|rt|

Microsoft 2.3916 3.4557

S&P 500 2.3982 5.1232

Table 9: Modified R/S Tests for Long Memory.

a0 α1 β1 α2 β2 v BIC Microsoft 2.86e−6 [1.65e−6] 0.0182 [0.0102] 0.9494 [0.0188] 0.0985 [0.0344] 0.7025 [0.2017] -20262 1.75e−6 5.11e−7 0.0121 [0.0039] 0.9624 [0.0098] 0.0963 [0.0172] 0.7416 [0.0526] 6.924 [0.6975] -20501 S&P 500 3.2e−8 [1.14e−8] 0.0059 [0.0013] 0.9848 [0.0000] 0.1014 [0.0221] 0.8076 [0.0001] −28113 1.06e−8 [1.26e−8] 0.0055 [0.0060] 0.9846 [0.0106] 0.0599 [0.0109] 0.8987 [0.0375] 5.787 [0.5329] −28457

Table 10: Estimates of Two Component GARCH(1,1) Models.

7.3 Integrated GARCH Model

The high persistence often observed in fitted GARCH(1,1) models suggests that volatility might be nonstationary implying that a1 + b1 = 1, in which case the GARCH(1,1) model becomes the integrated GARCH(1,1) or IGARCH(1,1) model. In the IGARCH(1,1) model the unconditional variance is not finite and so the model does not exhibit volatility mean reversion. However, it can be shown that the model is strictly stationary provided E[ln(a1zt2+b1)] < 0. If the IGARCH(1,1) model is strictly stationary then the parameters of the model can still be consistently estimated by MLE.

[27] argued against the IGARCH specification for modeling highly persistent volatility processes for two reasons. First, they argue that the observed convergence toward normality of aggregated returns is inconsistent with the IGARCH model. Sec-ond, they argue that observed IGARCH behavior may result from misspecification of the conditional variance function. For example, a two components structure or ignored structural breaks in the unconditional variance ([58] and [70]) can result in IGARCH behavior.

7.4 Long Memory GARCH Models for Daily Returns

Table 9 gives Lo’s modified R/S statistic (20) applied to r2

t and |rt| for Microsoft and the S&P 500. The 1% right tailed critical value for the test is 2.098 ([62] Table 5.2) and so the modified R/S statistics are significant at the 1% level for both series

(27)

providing evidence for long memory behavior in volatility.

Table 10 shows estimates of the two component GARCH(1,1) with d = 2, using Gaussian and Student’s t errors, for the daily returns on Microsoft and the S&P 500. Notice that the BIC values are smaller than the BIC values for the unconstrained GARCH(2,2) models given in Table 3, which confirms the better numerical stability of the two component model. For both series, the two components are present and satisfy 1 < (α1+ β1) < (α2+ β2). For Microsoft, the half-lives of the two components from the Gaussian (Student’s t) models are 21 (26.8) days and 3.1 (3.9) days, re-spectively. For the S&P 500, the half-lives of the two components from the Gaussian (Student’s t) models are 75 (69.9) days and 7.3 (16.4) days, respectively.

8 GARCH Model Prediction

An important task of modeling conditional volatility is to generate accurate forecasts for both the future value of a financial time series as well as its conditional volatility. Volatility forecasts are used for risk management, option pricing, portfolio allocation, trading strategies and model evaluation. Since the conditional mean of the general GARCH model (10) assumes a traditional ARMA form, forecasts of future values of the underlying time series can be obtained following the traditional approach for ARMA prediction. However, by also allowing for a time varying conditional variance, GARCH models can generate accurate forecasts of future volatility, especially over short horizons. This Section illustrates how to forecast volatility using GARCH models.

8.1 GARCH and Forecasts for the Conditional Mean

Suppose one is interested in forecasting future values of yT in the standard GARCH model described by (2), (3) and (6). For simplicity assume that ET[yT +1] = c. Then the minimum mean squared error h− step ahead forecast of yT +h is just c, which does not depend on the GARCH parameters, and the corresponding forecast error is

T +h = yT +h− ET[yT +h]. The conditional variance of this forecast error is then

varT( T +h) = ET[σ2T +h],

which does depend on the GARCH parameters. Therefore, in order to produce

confidence bands for the h−step ahead forecast the h−step ahead volatility forecast ET[σ2T +h] is needed.

8.2 Forecasts from the GARCH(1,1) Model

For simplicity, consider the basic GARCH(1, 1) model (7) where t= ztσt such that zt∼ iid (0, 1) and has a symmetric distribution. Assume the model is to be estimated over the time period t = 1, 2, · · · , T . The optimal, in terms of mean-squared error,

(28)

forecast of σ2

T +kgiven information at time T is ET[σ2T +k] and can be computed using a simple recursion. For k = 1,

ET[σ2T +1] = a0+ a1ET[ 2T] + b1ET[σ2T] (25) = a0+ a1 2T + b1σ2T,

where it assumed that 2_T and σ2_T are known11. Similarly, for k = 2 ET[σ2T +2] = a0+ a1ET[ 2T +1] + b1ET[σ2T +1]

= a0+ (a1+ b1)ET[σ2T +1].

since ET[ 2_{T +1}] = ET[z_{T +1}2 σ2_{T +1}] = ET[σ2_{T +1}]. In general, for k ≥ 2 ET[σ2T +k] = a0+ (a1+ b1)ET[σ2_{T +k−1}] = a0 k−1 X i=0 (a1+ b1)i+ (a1+ b1)k−1(a1 2T + b1σ2T). (26)

An alternative representation of the forecasting equation (26) starts with the mean-adjusted form

σ2_{T +1}_{− ¯}σ2= a1( 2T − ¯σ2) + b1(σ2T − ¯σ2),

where ¯σ2 = a0/(1 − a1− b1) is the unconditional variance. Then by recursive substi-tution

ET[σ2T +k] − ¯σ2 = (a1+ b1)k−1(E[σ2T +1] − ¯σ2). (27) Notice that as k → ∞, the volatility forecast in (26) approaches ¯σ2 if the GARCH process is covariance stationary and the speed at which the forecasts approaches ¯σ2 is captured by a1+ b1.

The forecasting algorithm (26) produces forecasts for the conditional variance σ2_{T +k}. The forecast for the conditional volatility, σT +k, is usually defined as the square root of the forecast for σ2_{T +k}.

The GARCH(1,1) forecasting algorithm (25) is closely related to an exponentially weighted moving average (EWMA) of past values of 2_t. This type of forecast is commonly used by RiskMetrics ([54]). The EWMA forecast of σ2_{T +1} has the form

σ2_{T +1,EW MA}_{= (1 − λ)} ∞ X s=0

λs 2_t−s (28)

for λ ∈ (0, 1). In (28), the weights sum to one, the first weight is 1−λ, and the remain-ing weights decline exponentially. To relate the EWMA forecast to the GARCH(1,1) formula (25), (28) may be re-expressed as

σ2_{T +1,EW MA}_{= (1 − λ)} 2_T + λσ2_{T,EW M A}= 2_T + λ(σ2_{T,EW M A}₋ 2_T),

1 1

In practice, a0, a1, b1, Tand σ2T are the fitted values computed from the estimated GARCH(1,1)

(29)

which is of the form (25) with a0 = 0, a1 = 1 − λ and b1 = λ. Therefore, the EWMA forecast is equivalent to the forecast from a restricted IGARCH(1,1) model. It follows that for any h > 0, σ2_{T +h,EW M A} = σ2_{T,EW M A}. As a result, unlike the GARCH(1,1) forecast, the EWMA forecast does not exhibit mean reversion to a long-run unconditional variance.

8.3 Forecasts from Asymmetric GARCH(1,1) Models

To illustrate the asymmetric eﬀects of leverage on forecasting, consider the TGARCH(1,1) model (17) at time T

σ2_T = a0+ a1 2_{T −1}+ γ1ST −1 2T −1+ b1σ2_{T −1}.

Assume that thas a symmetric distribution about zero. The forecast for T + 1 based on information at time T is

ET[σ2T +1] = a0+ a1 2T + γ1ST 2T + b1σ2T,

where it assumed that 2_T, ST and σ2T are known. Hence, the TGARCH(1,1) forecast for T + 1 will be diﬀerent than the GARCH(1,1) forecast if ST = 1 ( T < 0). The forecast at T + 2 is ET[σ2T +2] = a0+ a1ET[ 2T +1] + γ1ET[ST +1 2T +1] + b1ET[σ2T +1] = a0+ ³γ₁ 2 + a1+ b1 ´ ET[σ2T +1],

which follows since ET[ST +1 2_{T +1}] = ET[ST +1]ET[ 2_{T +1}] = 1₂ET[σ2_{T +1}]. Notice that the asymmetric impact of leverage is present even if ST = 0. By recursive substitution for the forecast at T + h is

ET[σ2T +h] = a0+ ³γ₁

2 + a1+ b1 ´h−1

ET[σ2T +1], (29)

which is similar to the GARCH(1,1) forecast (26). The mean reverting form (29) is ET[σ2T +h] − ¯σ2= ³γ₁ 2 + a1+ b1 ´h−1¡ ET[σ2T +h] − ¯σ2 ¢ where ¯σ2= a0/(1 − γ₂1 − a1− b1) is the long run variance.

Forecasting algorithms for σd

T +h in the PGARCH(1, d, 1) and for ln σ2T +h in the EGARCH(1,1) follow in a similar manner and the reader is referred to [31], and [71] for further details.

8.4 Simulation-Based Forecasts

The forecasted volatility can be used together with forecasted series values to generate confidence intervals of the forecasted series values. In many cases, the forecasted volatility is of central interest, and confidence intervals for the forecasted volatility can be obtained as well. However, analytic formulas for confidence intervals of forecasted

(30)

volatility are only known for some special cases (see [6]). In models for which analytic formulas for confidence intervals are not known, a simulation-based method can be used to obtain confidence intervals for forecasted volatility from any GARCH that can be simulated. To obtain volatility forecasts from a fitted GARCH model, simply simulate σ2_{T +k} from the last observation of the fitted model. This process can be repeated many times to obtain an “ensemble” of volatility forecasts. The point forecast of σ2_{T +k} may then be computed by averaging over the simulations, and a 95% confidence interval may be computed using the 2.5% and 97.5% quantiles of the simulation distribution, respectively.

8.5 Forecasting the Volatility of Multiperiod Returns

In many situations, a GARCH model is fit to daily continuously compounded returns rt = ln(Pt) − ln(Pt−1), where Pt denotes the closing price on day t. The resulting GARCH forecasts are for daily volatility at diﬀerent horizons. For risk management and option pricing with stochastic volatility, volatility forecasts are needed for multi-period returns. With continuously compounded returns, the h−day return between days T and T + h is simply the sum of h single day returns

rT +h(h) = h X j=1

rT +j.

Assuming returns are uncorrelated, the conditional variance of the h−period return is then varT(rT +h(h)) = σ2T(h) = h X j=1 varT(rT +j) = ET[σ2T +1] + · · · + ET[σ2T +h]. (30)

If returns have constant variance ¯σ2, then σ2_T(h) = h¯σ2 and σT(h) = √

h¯σ. This is known as the “square root of time” rule as the h−day volatility scales with√h. In this case, the h−day variance per day, σ2T(h)/h, is constant. If returns are described by a GARCH model then the square root of time rule does not necessarily apply. To see this, suppose returns follow a GARCH(1,1) model. Plugging the GARCH(1,1) model forecasts (27) for ET[σ2T +1], . . . , ET[σ2T +h] into (30) gives

σ2_T(h) = h¯σ2+ (E[σ2_{T +1}_{] − ¯}σ2) ∙

1 − (a1+ b1)h 1 − (a1+ b1)

¸

For the GARCH(1,1) process the square root of time rule only holds if E[σ2_{T +1}] = ¯σ2.

Whether σ2

T(h) is larger or smaller than h¯σ2 depends on whether E[σ2T +1] is larger or smaller than ¯σ2.

8.6 Evaluating Volatility Predictions

GARCH models are often judged by their out-of-sample forecasting ability, see [22] for an overview. This forecasting ability can be measured using traditional forecast

(31)

error metrics as well as with specific economic considerations such as value-at-risk violations, option pricing accuracy, or portfolio performance. Out-of-sample forecasts for use in model comparison are typically computed using one of two methods. The first method produces recursive forecasts. An initial sample using data from t = 1, . . . , T is used to estimate the models, and h−step ahead out-of-sample forecasts are produced starting at time T. Then the sample is increased by one, the models are re-estimated, and h−step ahead forecasts are produced starting at T +1. This process is repeated until no more h−step ahead forecasts can be computed. The second method produces rolling forecasts. An initial sample using data from t = 1, . . . , T is used to determine a window width T, to estimate the models, and to form h−step ahead out-of-sample forecasts starting at time T. Then the window is moved ahead one time period, the models are re-estimated using data from t = 2, . . . , T + 1, and h−step ahead out-of-sample forecasts are produced starting at time T + 1. This process is repeated until no more h−step ahead forecasts can be computed.

8.6.1 Traditional Forecast Evaluation Statistics

Let Ei,T[σ2_{T +h}] denote the h−step ahead forecast of σ2_{T +h} at time T from GARCH model i using either recursive or rolling methods. Define the corresponding forecast error as e_{i,T +h|T} = Ei,T[σ2T +h] − σ2T +h. Common forecast evaluation statistics based on N out-of-sample forecasts from T = T + 1, . . . , T + N are

MSEi = 1 N T +N_X j=T +1 e2_i,j+h|j, MAEi = 1 N T +N_X j=T +1 ¯ ¯e_i,j+h|j¯¯ , MAPEi = 1 N T +N_X j=T +1 ¯ ¯e_i,j+h|j¯¯ σj+h .

The model which produces the smallest values of the forecast evaluation statistics is judged to be the best model. Of course, the forecast evaluation statistics are random variables and a formal statistical procedure should be used to determine if one model exhibits superior predictive performance.

[28] proposed a simple procedure to test the null hypothesis that one model has superior predictive performance over another model based on traditional forecast evaluation statistics. Let {e1,j+h|j}T +NT +1 , and {e2,j+h|j}T +NT +1 denote forecast errors from two diﬀerent GARCH models. The accuracy of each forecast is measured by a particular loss function L(e_{i,T +h|T}), i = 1, 2. Common choices are the squared error loss function L(e_{i,T +h|T}) = ¡e_{i,T +h|T}¢2 and the absolute error loss function L(e_{i,T +h|T}) =¯¯e_{i,T +h|T}¯¯ . The Diebold-Mariano (DM) test is based on the loss diﬀer-ential