2.5 Volatility estimation and modelling in high-frequency settings
2.5.1 Volatility estimation
High-frequency data provide a valuable source of intra-day information. Intuitively, all the intra-day information now available can be exploited to refine and improve volatility estimates over a given period, generally a trading day. This section reviews the problem of volatility estimation at high frequencies, i.e. under market microstructure noise.
Rather than on the volatility diffusion term (usually denoted by $\sigma_t$) of a generic price model, which is difficult to capture, the recent econometric literature has focused on a closely related quantity, known as the Integrated Variance (IV). The IV is a natural volatility measure: it summarizes in a single quantity the complex dynamics of the underlying random and unobserved process $\sigma_t$ over a time window.
To fix ideas, consider a standard price model of the form $dS_t = \mu S\,dt + \sigma_t\,dW_t$ (geometric Brownian motion), where $\mu$ is a (negligible) drift term, $\sigma_t$ the volatility diffusion and $W$ a Brownian motion. The integrated variance over an interval $[0,t]$ (generally a day) is thus defined as:
$$\mathrm{IV}_t = \int_0^t \sigma_s^2\,ds, \tag{2.4}$$
with $\sigma_t$ finite and bounded. The stochastic theory of quadratic variation naturally identifies a simple and feasible estimator for the IV,⁷ the so-called Realized Variance (RV):
$$\mathrm{RV}_n = \sum_{i=1}^{n} r_i^2, \tag{2.5}$$
where the price over $[0,t]$ is sampled over $n$ intervals and $r_i$ is the return over the $i$-th interval, $r_i = p_i - p_{i-1}$. As $n$ grows, that is, as prices are sampled more and more frequently, $\mathrm{RV}_n$ converges in probability to $\mathrm{IV}_t$. The idea of defining a volatility estimator through a sum of squared returns goes back to Merton (1980); the term “Realized”, however, was first introduced by Andersen, Bollerslev, Diebold and Labys (2000a) and Andersen, Bollerslev, Diebold and Ebens (2001). The realized variance is therefore an immediate estimator of the IV, our volatility measure, and the importance of high-frequency settings is clear: the higher the sampling frequency, the better the estimate of the IV that $\mathrm{RV}_n$ provides, and thus the smaller the error (Barndorff-Nielsen and Shephard, 2002):
$$\operatorname*{plim}_{n\to+\infty} \mathrm{RV}_n = \mathrm{IV}_t. \tag{2.6}$$
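The convergence in eq. (2.6) can be illustrated with a small simulation. The following is a minimal sketch and not part of the original analysis: it assumes a zero-drift log-price with constant volatility, so that the IV over $[0,t]$ equals $\sigma^2 t$ exactly. The function names, the value $\sigma = 0.2$ and the random seed are hypothetical choices made for illustration only.

```python
import math
import random

def simulate_log_prices(n, sigma=0.2, t=1.0, seed=0):
    """Simulate n+1 log-prices on [0, t]: zero drift, constant volatility (assumed)."""
    rng = random.Random(seed)
    dt = t / n
    prices = [0.0]
    for _ in range(n):
        prices.append(prices[-1] + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return prices

def realized_variance(prices):
    """Sum of squared returns r_i = p_i - p_{i-1}, as in eq. (2.5)."""
    return sum((b - a) ** 2 for a, b in zip(prices, prices[1:]))

sigma, t = 0.2, 1.0
iv = sigma ** 2 * t  # integrated variance of eq. (2.4) when sigma is constant
for n in (10, 100, 1000, 10000):
    rv = realized_variance(simulate_log_prices(n, sigma, t))
    print(f"n={n:>6}  RV={rv:.5f}  IV={iv:.5f}")
```

In this noise-free setting the sampling error of RV shrinks as $n$ grows, which is the sense in which a higher sampling frequency gives a better estimate.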
The result in eq. (2.6) generally holds for an underlying log-price process $p_t$ defined as a semi-martingale. At high frequencies, however, the log-price $p$ follows different dynamics: a disturbance term affects the efficient semi-martingale price.
⁷ Most textbooks on stochastic analysis address this point.
In the high-frequency econometrics literature the observed price $p_t$ is therefore decomposed into two parts, the efficient price $p_t^*$ (which follows a semi-martingale, such as the geometric Brownian motion) and an error term $\epsilon_t$ (e.g. Hasbrouck, 2007; P. R. Hansen and Lunde, 2006; L. Zhang et al., 2005; Aït-Sahalia and Jacod, 2014):
$$p_t = p_t^* + \epsilon_t. \tag{2.7}$$
The error term, negligible at low sampling frequencies, becomes relevant at high frequencies. It is broadly attributed to market microstructure noise (Hasbrouck, 2007) and includes (i) frictions, e.g. bid-ask bounce, price discreteness and truncation, and issues related to trading on different networks; (ii) informational effects, e.g. different price responses to block trades or inventory control costs; and (iii) errors, e.g. entries with zero price or misplaced decimal points (Aït-Sahalia, Mykland et al., 2011). As a consequence of the noise term, the RV becomes a biased and inconsistent estimator of the IV (e.g. L. Zhang et al., 2005); consistency still holds for the efficient price $p_t^*$, which is however unobservable. The volatility signature plot (Andersen, Bollerslev, Diebold and Labys, 1999) in Figure 2.4 clearly highlights a bias in RV when approaching higher sampling frequencies.
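The size of this bias can be made concrete under the simplest possible noise assumption. As a sketch only, assume that the error term is i.i.d. Gaussian with standard deviation $\omega$ and independent of the efficient price. Each observed return then carries a noise contribution with variance $2\omega^2$, so the expected RV computed from $n$ returns is $\mathrm{IV} + 2n\omega^2$ and the bias grows with the sampling frequency. All names and parameter values below (1% daily volatility, noise of one basis point, one observation per second over a 6.5-hour session) are hypothetical.

```python
import math
import random

def noisy_log_prices(n, sigma=0.01, omega=1e-4, seed=0):
    """Observed log-prices p_i = p*_i + e_i over one day.

    p* is a zero-drift Brownian motion with daily volatility sigma;
    e_i is i.i.d. Gaussian noise with standard deviation omega (assumed).
    """
    rng = random.Random(seed)
    efficient = 0.0
    observed = [efficient + omega * rng.gauss(0.0, 1.0)]
    for _ in range(n):
        efficient += sigma * math.sqrt(1.0 / n) * rng.gauss(0.0, 1.0)
        observed.append(efficient + omega * rng.gauss(0.0, 1.0))
    return observed

def realized_variance(prices, step=1):
    """RV computed from every `step`-th observation (step=1 uses all of them)."""
    sampled = prices[::step]
    return sum((b - a) ** 2 for a, b in zip(sampled, sampled[1:]))

n, sigma, omega = 23400, 0.01, 1e-4
iv = sigma ** 2                            # IV of the efficient price over the day
prices = noisy_log_prices(n, sigma, omega)

rv_1s = realized_variance(prices)          # all one-second returns
rv_5min = realized_variance(prices, 300)   # sparse sampling: 78 five-minute returns
print(f"IV               = {iv:.3e}")
print(f"RV (1 s)         = {rv_1s:.3e}")
print(f"IV + 2*n*omega^2 = {iv + 2 * n * omega ** 2:.3e}")
print(f"RV (5 min)       = {rv_5min:.3e}")
```

With these illustrative values the noise term $2n\omega^2$ is several times larger than the IV at one-second sampling, while at five minutes it is small relative to the IV.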
At low frequencies the impact of the error term on the estimation of the IV through RV is negligible (e.g. Andersen, Bollerslev, Diebold and Labys, 2000b): its impact on low-frequency returns is small, so its random effect (the error is usually taken to have zero mean) cancels out. At high frequencies, however, it biases the RV measure. Effects such as the bid-ask bounce inevitably generate sequences of high-frequency returns as wide as the spread: these price movements are attributable not to price volatility but to microstructure effects (a clear statement is found in the context of Roll, 1984).
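The bid-ask bounce effect can be sketched in isolation. In the hypothetical example below the efficient log-price is held constant, so the true volatility is zero, and each trade hits the bid or the ask with equal probability, independently of the previous trade (a simplifying assumption in the spirit of the Roll model). Under these assumptions the observed returns take the values $-s$, $0$ or $+s$, where $s$ is the spread, their first-order autocovariance is $-s^2/4$, and the spread can be recovered as $2\sqrt{-\mathrm{cov}}$. The names and the value of the spread are illustrative.

```python
import math
import random

def bounce_log_prices(n, spread=2e-4, seed=0):
    """Observed log-prices when the efficient log-price is constant at 0
    and each trade hits the bid or the ask with equal probability (assumed)."""
    rng = random.Random(seed)
    half = spread / 2.0
    return [half * rng.choice((-1.0, 1.0)) for _ in range(n + 1)]

n, spread = 10000, 2e-4
prices = bounce_log_prices(n, spread)
returns = [b - a for a, b in zip(prices, prices[1:])]

# RV is strictly positive although the efficient price never moves.
rv = sum(r * r for r in returns)

# First-order sample autocovariance of the observed returns.
mean = sum(returns) / len(returns)
autocov = sum((x - mean) * (y - mean)
              for x, y in zip(returns, returns[1:])) / (len(returns) - 1)

# Spread implied by the autocovariance; the max() guards against a
# positive sample autocovariance, for which the square root is undefined.
roll_spread = 2.0 * math.sqrt(max(-autocov, 0.0))

print(f"RV                 = {rv:.3e}")
print(f"autocovariance     = {autocov:.3e}")
print(f"implied spread     = {roll_spread:.3e}  (true spread {spread:.3e})")
```

The whole of the measured RV in this example is produced by microstructure effects, which is why a sum of squared high-frequency returns cannot be read as volatility without a correction.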
Whereas a zero-mean hypothesis on the error term is reasonable, an i.i.d. framework (adopted, among others, by Bandi et al., 2008; L. Zhang et al., 2005) is simplistic. Empirical investigations have shown, for instance, that the variance of the noise is time-varying, that the noise process is negatively autocorrelated, and that it is correlated with the efficient price itself (P. R. Hansen and Lunde, 2006; Aït-Sahalia, Mykland et al., 2011). Besides non-stationarity, diurnal effects (Jacod, Y. Li and X. Zheng, 2017; Kalnina and Linton, 2008) and heterogeneities among stocks (P. R. Hansen and Lunde, 2006; Aït-Sahalia and Yu, 2008) further complicate the discussion. The empirical properties of the market microstructure, and the theoretical assumptions consequently drawn on it, play a central role in designing noise-robust estimators of the integrated variance with desirable properties (e.g. P. R. Hansen and Lunde, 2006; Barndorff-Nielsen, P. R. Hansen et al., 2008).
A number of estimators have been developed to disentangle the contribution of microstructure noise from the observed high-frequency returns and thus to provide a consistent and unbiased estimator of the IV, our target volatility measure. Besides the choice of the best sampling method, for instance transaction-time sampling (e.g. R. C. Oomen, 2005; R. C. A. Oomen, 2006; Pooter et al., 2008), the presence of noise in practice implies a trade-off between the sampling frequency, which we would like to be as high as possible in view of eq. (2.6), and the effect of the microstructure error. A remedy is to sample at moderate frequencies, where the impact of the noise is small: sparse sampling at e.g. 5 minutes (or higher frequencies) is a commonly adopted scheme (L. Zhang et al., 2005; Aït-Sahalia et al., 2005). Several works have therefore investigated the effect of sampling and the possibility of determining an optimal sub-sampling scheme (e.g. Bandi et al., 2005; Bandi et al., 2008; L. Zhang et al., 2005; Kalnina, 2011). Among these we find the two-scales estimator (L. Zhang et al., 2005) and its multi-scale extension (L. Zhang et al., 2006; Aït-Sahalia, Mykland et al., 2011). Pre-averaging estimators (Podolskij et al., 2009; Jacod, Y. Li, Mykland et al., 2009; Jacod, Podolskij et al., 2010), on the other hand, average a number of returns to reduce the microstructure impact and use the averages to estimate the IV with a (properly re-scaled) RV-like approach. A further approach to the robust estimation of the IV is the kernel approach (B. Zhou, 1996; P. R. Hansen and Lunde, 2004), leading to the so-called realized kernel of Barndorff-Nielsen, P. R. Hansen et al. (2008). Several other approaches have also been developed, e.g. the Fourier-based approach (Malliavin et al., 2009; Barucci et al., 2002) and pre-filtering methods for intraday returns (e.g. Bollen et al., 2002; Andersen, Bollerslev, Diebold and Ebens, 2001).
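To give a flavour of how sub-sampling removes the noise contribution, the following is a simplified sketch of the two-scales idea, not the estimator exactly as specified in the cited papers. It averages the RV computed on $K$ offset sub-grids (the slow scale) and subtracts a rescaled RV computed on all observations (the fast scale), which under i.i.d. noise mostly measures the noise. The small-sample adjustment is omitted, $K = 60$ is an arbitrary rather than an optimized choice, and the simulated data, names and parameters are hypothetical.

```python
import math
import random

def noisy_log_prices(n, sigma=0.01, omega=1e-4, seed=0):
    """Efficient Brownian log-price plus i.i.d. Gaussian noise (assumed model)."""
    rng = random.Random(seed)
    efficient = 0.0
    observed = [efficient + omega * rng.gauss(0.0, 1.0)]
    for _ in range(n):
        efficient += sigma * math.sqrt(1.0 / n) * rng.gauss(0.0, 1.0)
        observed.append(efficient + omega * rng.gauss(0.0, 1.0))
    return observed

def realized_variance(prices):
    """Plain RV on all observations (fast scale)."""
    return sum((b - a) ** 2 for a, b in zip(prices, prices[1:]))

def two_scales_rv(prices, k):
    """Sub-sampled average RV minus a rescaled all-observation RV."""
    n = len(prices) - 1
    rv_all = realized_variance(prices)
    # Averaging the RVs of the k offset sub-grids is the same as summing
    # all k-step squared returns and dividing by k.
    rv_avg = sum((prices[i] - prices[i - k]) ** 2 for i in range(k, n + 1)) / k
    n_bar = (n - k + 1) / k            # average number of returns per sub-grid
    return rv_avg - (n_bar / n) * rv_all

sigma, omega = 0.01, 1e-4
iv = sigma ** 2
prices = noisy_log_prices(23400, sigma, omega)
rv_all = realized_variance(prices)
tsrv = two_scales_rv(prices, 60)
print(f"IV = {iv:.3e}   RV (all) = {rv_all:.3e}   two-scales = {tsrv:.3e}")
```

The slow-scale average has expected noise bias $2\bar{n}\omega^2$ and the fast-scale RV has $2n\omega^2$, so the weight $\bar{n}/n$ makes the two noise terms cancel in expectation.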
Figure 2.4 shows how some selected estimators behave at increasing sampling frequencies, underlining the impact of market microstructure noise as a source of bias. Note that, whereas the exact nature of the observed price process is unknown, realized measures have been developed under precise noise assumptions: considerable deviations of high-frequency measures with respect to the sparsely sampled, noise-free RV are therefore possible.
[Figure 2.4. Horizontal axis: sampling frequency (seconds), ticks at 0, 60, 120, 180, 300 and 600. Vertical axis: realized measures of volatility, from 16% to 30%. Series: RV, ACRV, PA, RK, 5 min. RV, 20 min. RV.]
Figure 2.4 Volatility signature plot, depicting the behaviour of different realized volatility measures with respect to the sampling frequency. Realized measures include ACRV, the (first-order) autocorrelated realized variance (B. Zhou, 1996); PA, the pre-averaging estimator (Jacod, Y. Li, Mykland et al., 2009); and RK, the realized kernel (Barndorff-Nielsen et al., 2009). Realized volatility measures, i.e. square roots of the corresponding realized variance measures, are annualized for convenience. 5 and 20 min. RV are provided for reference (justified e.g. in L. Zhang et al., 2005; Barndorff-Nielsen et al., 2009, respectively). Average series over twenty days for stock AAPL, corresponding to May 1, 2012; Trades And Quotes data.
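The numbers behind a signature plot of this kind can be sketched as follows. The example uses simulated data rather than the Trades And Quotes series of the figure, and it covers only RV and a simplified first-order autocovariance correction in the spirit of ACRV (end effects are ignored); the PA and RK measures are not reproduced. Annualization by a factor of 252 trading days is an assumption, since the text does not state the convention used, and all names and parameters are hypothetical.

```python
import math
import random

def noisy_log_prices(n, sigma=0.01, omega=1e-4, seed=0):
    """Efficient Brownian log-price plus i.i.d. Gaussian noise (assumed model)."""
    rng = random.Random(seed)
    efficient = 0.0
    observed = [efficient + omega * rng.gauss(0.0, 1.0)]
    for _ in range(n):
        efficient += sigma * math.sqrt(1.0 / n) * rng.gauss(0.0, 1.0)
        observed.append(efficient + omega * rng.gauss(0.0, 1.0))
    return observed

def returns_at(prices, step):
    """Returns computed from every `step`-th observation."""
    sampled = prices[::step]
    return [b - a for a, b in zip(sampled, sampled[1:])]

def rv(returns):
    """Realized variance: sum of squared returns."""
    return sum(r * r for r in returns)

def acrv(returns):
    """RV plus twice the sum of first-order cross-products (simplified)."""
    return rv(returns) + 2.0 * sum(x * y for x, y in zip(returns, returns[1:]))

def annualized_vol(daily_variance, trading_days=252):
    """Square root of the annualized variance; 252 days is an assumption.
    Negative variance estimates, possible for ACRV, are floored at zero."""
    return math.sqrt(trading_days * max(daily_variance, 0.0))

prices = noisy_log_prices(23400)       # one observation per second (assumed)
for step in (1, 60, 120, 180, 300, 600):
    r = returns_at(prices, step)
    print(f"{step:>4} s   RV: {100 * annualized_vol(rv(r)):6.2f}%"
          f"   ACRV: {100 * annualized_vol(acrv(r)):6.2f}%")
```

Under the i.i.d. noise assumed here, the cross-product term has expectation $-2(n-1)\omega^2$ and removes almost all of the $2n\omega^2$ bias of RV, which is why a correction of this type stays flatter than plain RV at the highest frequencies.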