2.5 Volatility estimation and modelling in high-frequency settings

2.5.1 Volatility estimation

High-frequency data provide a valuable source of intra-day information. Intuitively, all the intra-day information now available can be exploited to refine and improve volatility estimates over a given period, generally a trading day. This section reviews the volatility estimation problem at high frequencies, i.e. under market microstructure noise.

Rather than on the volatility diffusion term (usually denoted by $\sigma_t$) of a generic price model, which is difficult to capture, the recent econometric literature has focused on a closely related quantity, known as the Integrated Variance (IV). The IV naturally constitutes a volatility measure, synthesizing in a single quantity the complex dynamics of the underlying random and unobserved process $\sigma_t$ over a time window.

To fix ideas, consider a standard price model of the form $dS_t = \mu S_t\,dt + \sigma_t S_t\,dW_t$ (geometric Brownian motion), where $\mu$ is a (negligible) drift term, $\sigma_t$ the volatility diffusion and $W$ a Brownian motion. The integrated variance over an interval $[0,t]$ (generally a day) is thus defined as:

$$IV_t = \int_0^t \sigma_s^2\,ds, \qquad (2.4)$$

with $\sigma_t$ finite and bounded. The stochastic theory of quadratic variation naturally identifies a simple and feasible estimator for the IV (see footnote 7), the so-called Realized Variance (RV):

$$RV_n = \sum_{i=1}^{n} r_i^2, \qquad (2.5)$$

where the price over $[0,t]$ is sampled in $n$ intervals and $r_i$ is the return over the $i$-th interval, $r_i = p_i - p_{i-1}$. As $n$ grows, that is, as prices are sampled more and more frequently, $RV_n$ converges in probability to $IV_t$. The idea of defining a volatility estimator through a sum of squared returns goes back to Merton (1980); the term "Realized", however, was first introduced by Andersen, Bollerslev, Diebold and Labys (2000a) and Andersen, Bollerslev, Diebold and Ebens (2001). The realized variance is therefore an immediate estimator of the IV, our volatility measure, and the importance of high-frequency settings is clear: the higher the sampling frequency, the better the estimate of $IV_t$ that $RV_n$ provides, and so the smaller the error (Barndorff-Nielsen and Shephard, 2002):

$$\operatorname*{plim}_{n \to +\infty} RV_n = IV_t. \qquad (2.6)$$
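The convergence in eq. (2.6) can be illustrated numerically. The following is a minimal sketch, not taken from the source: it assumes a driftless log-price with constant volatility (so that the IV over $[0,1]$ is simply $\sigma^2$) and no microstructure noise. The function names, the value of `sigma` and the seed are hypothetical choices made for illustration.

```python
import numpy as np

def realized_variance(log_prices):
    """Sum of squared log-returns, as in eq. (2.5)."""
    returns = np.diff(log_prices)
    return float(np.sum(returns ** 2))

def simulate_log_price(n_steps, sigma, horizon=1.0, seed=0):
    """Driftless log-price with constant volatility sigma on [0, horizon].

    Assumption for illustration only: no drift, no noise, constant sigma.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    increments = sigma * np.sqrt(dt) * rng.standard_normal(n_steps)
    return np.concatenate(([0.0], np.cumsum(increments)))

sigma = 0.2          # hypothetical constant volatility
iv = sigma ** 2      # integrated variance over [0, 1] in this special case
path = simulate_log_price(2 ** 16, sigma)

# Sample the same path on finer and finer grids and recompute RV each time.
for step in (4096, 256, 16, 1):
    rv = realized_variance(path[::step])
    print(f"n = {2 ** 16 // step:6d}  RV = {rv:.5f}  IV = {iv:.5f}")
```

In this noise-free setting the RV computed on the finest grid should typically be the closest to the IV; the rest of the section explains why this no longer holds for observed high-frequency prices.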

The result in eq. (2.6) generally holds for an underlying log-price process $p_t$ defined as a semi-martingale. At high frequencies, however, the observed log-price $p_t$ follows different dynamics: a disturbance term affects the efficient semi-martingale price.

Footnote 7: Most textbooks on stochastic analysis address this point.

In the high-frequency econometrics literature the observed price $p_t$ is therefore decomposed into two parts, the efficient price $p_t^*$ (which follows a semi-martingale, such as the geometric Brownian motion) and an error term $\varepsilon_t$ (e.g. Hasbrouck, 2007; P. R. Hansen and Lunde, 2006; L. Zhang et al., 2005; Aït-Sahalia and Jacod, 2014):

$$p_t = p_t^* + \varepsilon_t. \qquad (2.7)$$

The error term, negligible at low sampling frequencies, becomes relevant at high frequencies and is broadly attributed to market microstructure noise (Hasbrouck, 2007). It includes (i) frictions, e.g. bid-ask bounce, price discreteness and truncation, and issues related to trading on different networks; (ii) informational effects, e.g. different price responses to block trades or inventory control costs; and (iii) errors, e.g. entries with zero price or misplaced decimal points (Aït-Sahalia, Mykland et al., 2011). As a consequence of the noise term, the estimation of the IV via RV on observed prices becomes biased and inconsistent (e.g. L. Zhang et al., 2005); the convergence in eq. (2.6) still holds for the efficient price $p_t^*$, which is however unobservable. The volatility signature plot (Andersen, Bollerslev, Diebold and Labys, 1999) in Figure 2.4 clearly highlights a bias in RV when approaching higher sampling frequencies.
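The source of the bias can be made explicit with a short calculation. The sketch below is not part of the original text and rests on a deliberately simple assumption: the error term, written $\varepsilon$ here, is i.i.d. with mean zero and variance $\omega^2$, and is independent of the efficient price. The symbols $\omega^2$, $r_i^*$ and $RV_n^*$ are notation introduced only for this illustration.

```latex
% Observed return over the i-th interval, using p_t = p_t^* + \varepsilon_t:
r_i = p_{t_i} - p_{t_{i-1}}
    = \underbrace{p^*_{t_i} - p^*_{t_{i-1}}}_{r_i^*}
      + \varepsilon_{t_i} - \varepsilon_{t_{i-1}}

% Under the i.i.d. and independence assumptions the cross terms vanish in expectation:
\mathbb{E}\left[r_i^2\right]
    = \mathbb{E}\left[(r_i^*)^2\right] + 2\omega^2

% Summing over the n intervals, with RV_n^* the realized variance of the efficient price:
\mathbb{E}\left[RV_n\right]
    = \mathbb{E}\left[RV_n^*\right] + 2 n \omega^2
```

Under these assumptions the noise contribution $2n\omega^2$ grows linearly with the number of sampled intervals, so sampling more finely inflates RV instead of improving it, which is the pattern a volatility signature plot displays.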

At low frequencies the impact of the error on the IV estimation through RV is negligible (e.g. Andersen, Bollerslev, Diebold and Labys, 2000b): its random effect cancels out (the error term is usually taken to have mean zero), and its impact on low-frequency returns is small. At high frequencies, however, the error biases the RV measure. Effects such as the bid-ask bounce inevitably generate sequences of high-frequency returns as wide as the spread: these price movements are not attributable to price volatility but to microstructure effects (a clear statement in the context of Roll, 1984).
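The bid-ask bounce effect can be reproduced in a small simulation. The sketch below is an illustration under stated assumptions, not the source's method: the efficient log-price is a driftless random walk, each trade is a buy or a sell with equal probability and independently of everything else, and the observed price sits half a spread above or below the efficient one. All numerical values and names (`sigma_day`, `half_spread`, `sparse_rv`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 23_400                 # one observation per second over a 6.5-hour day
sigma_day = 0.01           # hypothetical daily volatility of the efficient price
half_spread = 0.0005       # hypothetical half spread, in log-price units

# Efficient log-price: driftless random walk whose daily variance is sigma_day**2.
efficient = np.cumsum(sigma_day / np.sqrt(n) * rng.standard_normal(n + 1))
# Bid-ask bounce: trades hit the ask (+1) or the bid (-1), independently.
trade_sign = rng.choice([-1.0, 1.0], size=n + 1)
observed = efficient + half_spread * trade_sign

def sparse_rv(log_prices, step):
    """Realized variance computed from every step-th observation."""
    returns = np.diff(log_prices[::step])
    return float(np.sum(returns ** 2))

iv = sigma_day ** 2
# RV relative to the IV, from 1-second up to 20-minute sampling.
for step in (1, 5, 60, 300, 1200):
    print(f"{step:5d} s  RV / IV = {sparse_rv(observed, step) / iv:8.2f}")

# First-order autocovariance of observed 1-second returns; in this setup it is
# negative and close to minus the squared half spread.
r = np.diff(observed)
autocov1 = float(np.mean(r[1:] * r[:-1]))
print(f"autocov = {autocov1:.3e}   -half_spread^2 = {-half_spread ** 2:.3e}")
```

In this setup the ratio RV / IV is far above one at the finest sampling step and approaches one only at sparse sampling, while the negative autocovariance of returns is the signature of the bounce between bid and ask.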

Whereas a zero-mean hypothesis on the error term is reasonable, an i.i.d. framework (adopted, among others, by Bandi et al., 2008; L. Zhang et al., 2005) is simplistic. Empirical investigations have shown, for instance, that the variance of the noise is time-varying, that the error process is negatively autocorrelated, and that it is correlated with the efficient price itself (P. R. Hansen and Lunde, 2006; Aït-Sahalia, Mykland et al., 2011). Besides non-stationarity, diurnal effects (Jacod, Y. Li and X. Zheng, 2017; Kalnina and Linton, 2008) and heterogeneities among stocks (P. R. Hansen and Lunde, 2006; Aït-Sahalia and Yu, 2008) further complicate the discussion. The empirical properties of the market microstructure noise, and the theoretical assumptions consequently drawn on it, play a central role in designing noise-robust estimators of the integrated variance with desirable properties (e.g. P. R. Hansen and Lunde, 2006; Barndorff-Nielsen, P. R. Hansen et al., 2008).

A number of estimators have been developed to disentangle the contribution of the microstructure noise in the observed high-frequency returns and thus provide a consistent and unbiased estimator for the IV, our target volatility measure. Besides the choice of the best sampling method, for instance transaction-time sampling (e.g. R. C. Oomen, 2005; R. C. A. Oomen, 2006; Pooter et al., 2008), the presence of noise implies in practice a trade-off between the sampling frequency, which we would like to be as high as possible in virtue of eq. (2.6), and the effect of the microstructure error. A remedy is to sample at moderate frequencies, where the noise impact is small: sparse sampling at e.g. 5 minutes (or higher frequencies) is a commonly adopted scheme (L. Zhang et al., 2005; Aït-Sahalia et al., 2005). Several works have therefore investigated the effect of sampling and the possibility of determining an optimal sub-sampling scheme (e.g. Bandi et al., 2005; Bandi et al., 2008; L. Zhang et al., 2005; Kalnina, 2011). Among these we find the two-scales estimator (L. Zhang et al., 2005) and its multi-scale extension (L. Zhang et al., 2006; Aït-Sahalia, Mykland et al., 2011). Pre-averaging estimators (Podolskij et al., 2009; Jacod, Y. Li, Mykland et al., 2009; Jacod, Podolskij et al., 2010), on the other hand, pre-average a number of returns to reduce the microstructure impact and use them to estimate the IV with a (properly re-scaled) RV-like approach. A further approach to the robust estimation of the IV is the kernel approach (B. Zhou, 1996; P. R. Hansen and Lunde, 2004), leading to the so-called realized kernel of Barndorff-Nielsen, P. R. Hansen et al. (2008). Several other approaches have also been developed, e.g. the Fourier-based approach (Malliavin et al., 2009; Barucci et al., 2002) and pre-filtering methods for the intraday returns (e.g. Bollen et al., 2002; Andersen, Bollerslev, Diebold and Ebens, 2001).
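Two of the noise-robust ideas mentioned above can be sketched in a few lines. The code below is a simplified illustration, not the estimators exactly as specified in the cited papers: `acrv_first_order` adds a first-order autocovariance correction to RV in the spirit of B. Zhou (1996), and `tsrv` follows the basic two-scales construction of L. Zhang et al. (2005), averaging sparse RVs over offset sub-grids and subtracting a noise correction based on the full-grid RV, without small-sample adjustments. The simulated data assume i.i.d. Gaussian noise; the function names, the number of sub-grids `k` and all numerical values are hypothetical.

```python
import numpy as np

def realized_variance(p):
    """Plain RV: sum of squared returns on the given grid."""
    return float(np.sum(np.diff(p) ** 2))

def acrv_first_order(p):
    """RV plus twice the sum of products of adjacent returns."""
    r = np.diff(p)
    return float(np.sum(r ** 2) + 2.0 * np.sum(r[1:] * r[:-1]))

def tsrv(p, k):
    """Two-scales sketch: averaged sparse RV minus a scaled dense RV."""
    n = len(p) - 1
    rv_all = realized_variance(p)                      # fast scale, noise dominated
    rv_avg = np.mean([realized_variance(p[offset::k])  # slow scale, k sub-grids
                      for offset in range(k)])
    n_bar = (n - k + 1) / k                            # average sub-grid sample size
    return float(rv_avg - (n_bar / n) * rv_all)

# Simulated day: efficient random walk plus i.i.d. Gaussian noise (assumption).
rng = np.random.default_rng(2)
n, sigma_day, noise_sd = 23_400, 0.01, 0.0005
efficient = np.cumsum(sigma_day / np.sqrt(n) * rng.standard_normal(n + 1))
observed = efficient + noise_sd * rng.standard_normal(n + 1)

iv = sigma_day ** 2
print(f"IV   = {iv:.3e}")
print(f"RV   = {realized_variance(observed):.3e}")
print(f"ACRV = {acrv_first_order(observed):.3e}")
print(f"TSRV = {tsrv(observed, 100):.3e}")
```

In this simulation the plain RV on the full grid is dominated by noise, whereas both corrected measures land much closer to the IV; the first-order correction remains quite noisy at the highest frequency, which is one motivation for the more refined kernel and multi-scale constructions cited in the text.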
Figure 2.4 shows how some selected estimators behave at increasing sampling frequencies, underlining the impact of the MMS noise as a source of bias. Note that, whereas the exact nature of the observed price process is in truth unknown, realized measures have been developed under precise noise assumptions: considerable deviations of high-frequency measures from the sparsely sampled, noise-free RV are therefore possible.

[Figure 2.4: plot of realized measures of volatility (vertical axis, 16% to 30%) against the sampling frequency in seconds (horizontal axis, 0 to 600). Series: RV, ACRV, PA, RK, 5 min. RV, 20 min. RV.]

Figure 2.4 Volatility signature plot, depicting the behaviour of different realized volatility measures w.r.t. the sampling frequency. Realized measures include the (first-order) autocorrelated realized variance, ACRV (B. Zhou, 1996), the pre-averaging estimator, PA (Jacod, Y. Li, Mykland et al., 2009), and the realized kernel, RK (Barndorff-Nielsen et al., 2009). Realized volatility measures, i.e. square roots of the corresponding realized variance measures, are annualized for convenience. 5 and 20 min. RV are provided for reference (justified e.g. in L. Zhang et al., 2005, and Barndorff-Nielsen et al., 2009, respectively). Average series over twenty days for stock AAPL, corresponding to May 1, 2012; Trades And Quotes data.

In document Volatility modeling and limit-order book analytics with high-frequency data (Page 53-57)