HAR model - Methods in volatility modeling and forecasting

4.3 Methods in volatility modeling and forecasting

4.3.1 HAR model

This section discusses in detail the HAR model introduced in Section 2. Long-memory dependence in financial market volatility is a long-established fact, and different models have been proposed to capture this behavior (see, e.g., Section IV of Andersen, Bollerslev and Diebold, 2007, for a list of references). The HAR model is an outgrowth of this literature, in particular of the HARCH-class models (U. A. Müller, Dacorogna, Davé, Olsen et al., 1997), heuristically motivated by the heterogeneous market hypothesis (U. A. Müller, Dacorogna, Davé, Pictet et al., 1993): heterogeneous participants trade over different investment horizons, coexisting and interacting within the same market. For example, high-frequency traders may be thought of as participants trading at intraday horizons, whereas large institutional traders hold their positions over longer horizons. The typical slow decay of volatility autocorrelation, as well as stylized facts about the distributions of returns and volatility, can be reproduced by mixing only three volatility components in a simple linear model, intuitively corresponding to the contributions to total daily volatility from trading at daily, weekly and monthly horizons. Such a model, known as the Heterogeneous Autoregressive (HAR) model (Corsi, 2009), is attractive for its simplicity of estimation and interpretation and for its forecasting ability.

The daily latent volatility process $\tilde\sigma^{(d)}_t$ is modelled under a three-factor stochastic volatility specification, whose factors are the past (realized) volatilities at different frequencies⁹. The three volatility terms identified in the HAR model are a daily component $d$, a weekly component $w$ and a monthly component $m$; these are referred to as partial-volatility terms, since each is specific to a given time horizon. The latent volatility $\tilde\sigma^{(\cdot)}_t$ at any time scale is assumed to be a (linear) function of the past observed realized variance at the same time scale¹⁰ and of the expectation of next period's longer-term partial volatility component¹¹. The hierarchical cascade assumption reads¹²:

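$$
\begin{aligned}
\tilde\sigma^{(d)}_{t+1d} &= c^{(d)} + \phi^{(d)} RV^{(d)}_t + \gamma^{(d)} E_t\big[\tilde\sigma^{(w)}_{t+1w}\big] + \tilde\omega^{(d)}_{t+1d},\\
\tilde\sigma^{(w)}_{t+1w} &= c^{(w)} + \phi^{(w)} RV^{(w)}_t + \gamma^{(w)} E_t\big[\tilde\sigma^{(m)}_{t+1m}\big] + \tilde\omega^{(w)}_{t+1w},\\
\tilde\sigma^{(m)}_{t+1m} &= c^{(m)} + \phi^{(m)} RV^{(m)}_t + \tilde\omega^{(m)}_{t+1m},
\end{aligned}
\tag{4.15}
$$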

where $RV^{(p)}_t$ is obtained by averaging the lagged daily realized variance estimates over the horizon $p$. Specifically,

$$RV^{(w)}_t = \frac{1}{5}\sum_{i=0}^{4} RV^{(d)}_{t-i}, \qquad RV^{(m)}_t = \frac{1}{22}\sum_{i=0}^{21} RV^{(d)}_{t-i}$$

are the weekly and monthly volatility components. Importantly, the error terms $\tilde\omega^{(d)}_{t+1d}$, $\tilde\omega^{(w)}_{t+1w}$ and $\tilde\omega^{(m)}_{t+1m}$ are serially independent and zero-mean, and must be such that the estimates remain positive.
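The construction of the three regressors can be sketched in a few lines of Python. This is a minimal illustration, assuming the daily realized variances are already available as a plain list ordered in time and, as in the definitions above, 5 trading days per week and 22 per month; the function name `har_components` is hypothetical.

```python
from typing import List, Tuple


def har_components(rv: List[float]) -> List[Tuple[float, float, float]]:
    """Return (RV_d, RV_w, RV_m) for each day t with 21 earlier observations.

    rv[t] holds the daily realized variance RV^(d)_t, oldest first.
    The weekly and monthly components are plain averages of the last
    5 and 22 daily values, day t included.
    """
    components = []
    for t in range(21, len(rv)):  # need rv[t-21], ..., rv[t] for the monthly term
        rv_d = rv[t]
        rv_w = sum(rv[t - 4 : t + 1]) / 5.0
        rv_m = sum(rv[t - 21 : t + 1]) / 22.0
        components.append((rv_d, rv_w, rv_m))
    return components


if __name__ == "__main__":
    daily_rv = [float(i) for i in range(1, 31)]  # toy series, not real data
    for row in har_components(daily_rv)[:3]:
        print(row)
```

With the toy input 1, 2, ..., 30 the first row is (22.0, 20.0, 11.5): the weekly term averages days 18 to 22 and the monthly term averages days 1 to 22. The first 21 days are lost because the monthly average needs a full 22-day window.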

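By setting $\tilde\sigma^{(d)}_t = \sigma^{(d)}_t$, with $\sigma^{(d)}_t = \big(\int_{t-1d}^{t} \sigma_s^2\, ds\big)^{1/2}$ being the square root of the integrated volatility¹³, and substituting the weekly and monthly equations into the daily one, eq. (4.15) turns into: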

$$\sigma^{(d)}_{t+1d} = c + \beta^{(d)} RV^{(d)}_t + \beta^{(w)} RV^{(w)}_t + \beta^{(m)} RV^{(m)}_t + \tilde\omega^{(d)}_{t+1d}, \tag{4.16}$$

which corresponds to the three-factor representation mentioned earlier. Introduce an error term $\omega^{(d)}_{t+1d}$ that accounts for both the measurement and the estimation errors associated with using RV as a proxy for $\tilde\sigma^{(d)}_{t+1d}$ (equivalently, recall that $RV^{(d)}_{t+1d}$ is not an error-free measure of $\sigma^{(d)}_{t+1d}$; Barndorff-Nielsen and Shephard, 2002). Then $\sigma^{(d)}_{t+1d}$ can be rewritten as

$$\sigma^{(d)}_{t+1d} = RV^{(d)}_{t+1d} + \omega^{(d)}_{t+1d}. \tag{4.17}$$

By substituting eq. (4.17) into eq. (4.16) and collapsing the respective error terms, the HAR-RV model reads:

$$RV^{(d)}_{t+1d} = c + \beta^{(d)} RV^{(d)}_t + \beta^{(w)} RV^{(w)}_t + \beta^{(m)} RV^{(m)}_t + \omega_{t+1d}, \tag{4.18}$$

where $\omega_{t+1d} = \tilde\omega^{(d)}_{t+1d} - \omega^{(d)}_{t+1d}$. This corresponds to an autoregressive model whose weights take a step-function form, restricted in a parsimonious way such that the three emerging components are economically meaningful and interpretable (Corsi, 2009).

⁹ $RV^{(p)}_t$ denotes the realized variance estimated in $t$ for the time horizon $p$.

¹⁰ This corresponds to an AR-like structure: eq. (4.15) does not involve lagged values of $\tilde\sigma^{(\cdot)}_t$, but rather their respective proxies, i.e. $RV^{(\cdot)}_t$.

¹¹ For $\tilde\sigma^{(m)}_t$ only a linear function of the monthly RV remains, i.e. the AR-like term.

¹² $t+1d$ reads as "(end of) day $t$ plus one day"; similarly, $t+1w$ and $t+1m$ stand for a week and a month ahead of day $t$, respectively. RV denotes the actually observed, ex-post values.

¹³ As pointed out in Section 2, the integrated volatility is the usual quantity of interest, as a synthesis
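To make the step-function form concrete: expanding $RV^{(w)}_t$ and $RV^{(m)}_t$ in eq. (4.18) shows that the HAR-RV model is an AR(22) in the daily RV, with weight $\beta^{(d)} + \beta^{(w)}/5 + \beta^{(m)}/22$ on $RV^{(d)}_t$, weight $\beta^{(w)}/5 + \beta^{(m)}/22$ on $RV^{(d)}_{t-i}$ for $i = 1,\dots,4$, and weight $\beta^{(m)}/22$ for $i = 5,\dots,21$. The sketch below computes these implied weights; the function name and the coefficient values in the demo are hypothetical, chosen for illustration rather than taken from any estimate.

```python
from typing import List


def har_implied_ar_weights(beta_d: float, beta_w: float, beta_m: float) -> List[float]:
    """AR(22) weights on RV^(d)_{t-i}, i = 0..21, implied by HAR coefficients.

    Expanding RV^(w)_t = (1/5) sum_{i=0}^{4} RV^(d)_{t-i} and
    RV^(m)_t = (1/22) sum_{i=0}^{21} RV^(d)_{t-i} in eq. (4.18) gives a
    step function in the lag i.
    """
    weights = []
    for i in range(22):
        w = beta_m / 22.0          # every lag enters the monthly average
        if i <= 4:
            w += beta_w / 5.0      # lags 0..4 also enter the weekly average
        if i == 0:
            w += beta_d            # only lag 0 enters the daily term
        weights.append(w)
    return weights


if __name__ == "__main__":
    # Illustrative coefficients, not estimates from any data set.
    for lag, w in enumerate(har_implied_ar_weights(0.4, 0.3, 0.2)):
        print(f"lag {lag:2d}: {w:.4f}")
```

For example, `har_implied_ar_weights(0.4, 0.3, 0.2)` returns 22 weights that sum to 0.9 and are constant within each of the three blocks, which is the coarse, three-step approximation of a smoothly decaying lag profile referred to in the text.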

The HAR-RV model is typically estimated by OLS. To guarantee non-negativity of the volatility estimates, eq. (4.18) can be written and estimated in logs. To account for serial correlation in the errors, common practice is to use the Newey-West (HAC) covariance estimator for the coefficient standard errors. Note that the HAR model can be built on any preferred volatility measure, e.g. on the realized kernel as in Publication IV.
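A minimal sketch of the estimation step just described, assuming NumPy is available. It builds the HAR regressors (here by averaging daily log-RV over the week and month, which is one convention in use; taking the log of the level averages is an alternative), fits eq. (4.18) by OLS via `numpy.linalg.lstsq`, and computes Newey-West standard errors with a Bartlett kernel written out by hand. The function names, the lag truncation of 5 and the synthetic data generator are illustrative assumptions, not the setup of Publication IV.

```python
import numpy as np


def har_design(series: np.ndarray):
    """Regress series[t+1] on [1, daily, weekly avg, monthly avg] at t."""
    x = np.asarray(series, dtype=float)
    rows, y = [], []
    for t in range(21, len(x) - 1):
        rows.append([1.0, x[t], x[t - 4 : t + 1].mean(), x[t - 21 : t + 1].mean()])
        y.append(x[t + 1])
    return np.array(y), np.array(rows)


def newey_west_se(X: np.ndarray, resid: np.ndarray, lags: int) -> np.ndarray:
    """HAC (Newey-West, Bartlett kernel) standard errors for OLS coefficients."""
    n = X.shape[0]
    u = X * resid[:, None]                 # score contributions x_t * e_t
    S = u.T @ u / n                        # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)  # Bartlett weight
        gamma = u[lag:].T @ u[:-lag] / n   # sum of u_t u_{t-lag}' / n
        S += weight * (gamma + gamma.T)
    Q_inv = np.linalg.inv(X.T @ X / n)
    cov = Q_inv @ S @ Q_inv / n            # sandwich covariance of beta_hat
    return np.sqrt(np.diag(cov))


def fit_har(rv: np.ndarray, log: bool = True, lags: int = 5):
    """OLS fit of eq. (4.18), optionally on log-RV, with Newey-West SEs."""
    rv = np.asarray(rv, dtype=float)
    series = np.log(rv) if log else rv
    y, X = har_design(series)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, newey_west_se(X, resid, lags), resid


def simulate_log_har(T, c, b_d, b_w, b_m, sigma, seed=0):
    """Synthetic RV path whose logs follow a HAR recursion (illustration only)."""
    rng = np.random.default_rng(seed)
    x = np.full(T, c / (1.0 - b_d - b_w - b_m))  # start at the unconditional mean
    for t in range(21, T - 1):
        x[t + 1] = (c + b_d * x[t] + b_w * x[t - 4 : t + 1].mean()
                    + b_m * x[t - 21 : t + 1].mean() + sigma * rng.standard_normal())
    return np.exp(x)


if __name__ == "__main__":
    rv = simulate_log_har(5000, 0.05, 0.4, 0.3, 0.2, 0.3, seed=1)
    beta, se, _ = fit_har(rv, log=True, lags=5)
    for name, b, s in zip(["c", "beta_d", "beta_w", "beta_m"], beta, se):
        print(f"{name:7s} {b: .3f}  (HAC s.e. {s:.3f})")
```

Running the script simulates a synthetic log-HAR series with known coefficients and prints the estimates next to their HAC standard errors. In practice, `statsmodels` offers the same correction through `OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 5})`.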

As pointed out in Section 1.2, the discussion in Publication IV is based on some critical aspects of the HAR model. Here, I summarize them with reference to the construction above.

(i) Linearity of equations (4.15), and thus of the linkage between the components involved in each equation.
(ii) The error terms $\tilde\omega^{(d)}_{t+1d}$, $\tilde\omega^{(w)}_{t+1w}$ and $\tilde\omega^{(m)}_{t+1m}$ are (a) mutually independent, (b) zero-mean, and (c) left-tail truncated to guarantee positivity of the estimates.
(iii) Constancy of the $\beta^{(\cdot)}$ coefficients in eq. (4.16) over time.
(iv) Positivity of the estimates in eq. (4.15) can be achieved, as Corsi (2009) suggests, with an alternative specification of eq. (4.18) in terms of log-RV (a common practice). In doing so, (a) eq. (4.17) is assumed to hold in logs as well, and (b) non-log estimates are retrieved by bootstrapping, i.e. by simulation.
(v) The HAR-RV model corresponds to a reparametrization of an AR model with autoregressive weights taking a step-function form¹⁴: (a) this is a step-like approximation of the typical power-law decay of volatility autocorrelation, and (b) a limited number of volatility terms captures only a portion of the overall long-range correlation, which involves a continuum of time scales.
(vi) (a) The presence of autocorrelation, heteroscedasticity and general non-stationarity in $\omega_{t+1d}$ requires attention under OLS estimation, e.g. by using HAC standard errors; (b) although normality of the residuals is generally not critical for the applicability of OLS, the resulting confidence intervals for the predictions (for either the mean response or individual observations) are symmetric, whereas volatility, for instance, is skewed.

These points are further discussed in Section 3.1 of Publication IV.
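Points (iv)(b) and (vi)(b) can be illustrated together. When the model is estimated in logs with zero-mean residuals, exponentiating the log forecast understates the conditional mean of RV (by Jensen's inequality), so level forecasts are commonly recovered by simulation. The sketch below, assuming NumPy and treating the in-sample log-HAR residuals as i.i.d. (a simplification that point (vi)(a) cautions against), resamples the residuals, maps each draw back to levels, and returns the simulated mean together with a percentile interval, which in levels is asymmetric around the point forecast. The function name, its arguments and the stand-in residuals in the demo are hypothetical.

```python
import numpy as np


def bootstrap_level_forecast(log_forecast, resid, n_draws=10_000, alpha=0.05, seed=0):
    """Residual-bootstrap forecast of RV in levels from a log-HAR forecast.

    log_forecast : one-step-ahead fitted value of log RV
    resid        : in-sample log-HAR residuals, resampled as if i.i.d.
    Returns (simulated mean of RV, (lower, upper) percentile interval).
    """
    rng = np.random.default_rng(seed)
    shocks = rng.choice(np.asarray(resid, dtype=float), size=n_draws, replace=True)
    draws = np.exp(log_forecast + shocks)       # back to levels, draw by draw
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return draws.mean(), (lower, upper)


if __name__ == "__main__":
    # Stand-in residuals for illustration; in practice use the fitted model's residuals.
    resid = np.random.default_rng(3).normal(0.0, 0.5, size=2000)
    mean_rv, (lo, hi) = bootstrap_level_forecast(0.0, resid)
    naive = np.exp(0.0)
    print(f"naive exp(forecast) = {naive:.3f}, bootstrap mean = {mean_rv:.3f}")
    print(f"95% interval: [{lo:.3f}, {hi:.3f}]")
```

The percentile interval is symmetric in logs but stretches further above the point forecast than below it once mapped back to levels, which is the kind of skewness that a symmetric OLS interval on RV levels cannot reproduce.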

Publication IV builds on critical aspects of some of the above key points in the HAR model construction. These should not be read as criticism, but rather as starting points for reasoning about possible limitations of the HAR model and for developing possible extensions. Indeed, the HAR model's simplicity, its ability to reproduce several stylized features empirically observed in markets and its good predictive ability broadly motivate its widespread use.

In document Volatility modeling and limit-order book analytics with high-frequency data (Page 86-89)