Minjing Tao, Yazhen Wang, Qiwei Yao and Jian Zou
Large volatility matrix inference via combining low-frequency and high-frequency approaches
Article (Accepted version) (Refereed)
Original citation:
Tao, Minjing, Wang, Yazhen, Yao, Qiwei and Zou, Jian (2011) Large volatility matrix inference via combining low-frequency and high-frequency approaches. Journal of the American Statistical Association, 106 (495). pp. 1025-1040. ISSN 0162-1459
DOI: 10.1198/jasa.2011.tm10276
© 2011 American Statistical Association
This version available at: http://eprints.lse.ac.uk/39321/
Available in LSE Research Online: July 2014
This document is the author’s final accepted version of the journal article. There may be differences between this version and the published version. You are advised to consult the publisher’s version if you wish to cite from it.
Large Volatility Matrix Inference via Combining
Low-Frequency and High-Frequency Approaches
Minjing Tao and Yazhen Wang
University of Wisconsin-Madison
Qiwei Yao
London School of Economics
Jian Zou
National Institute of Statistical Sciences
February 27, 2011
Abstract
It is increasingly important in financial economics to estimate volatilities of asset returns. However, most of the available methods are not directly applicable when the number of assets involved is large, due to the lack of accuracy in estimating high-dimensional matrices. Therefore it is pertinent to reduce the effective size of volatility matrices in order to produce adequate estimates and forecasts. Furthermore, since high-frequency financial data for different assets are typically not recorded at the same time points, conventional dimension-reduction techniques are not directly applicable. To overcome those difficulties we explore a novel approach that combines high-frequency volatility matrix estimation with low-frequency dynamic models. The proposed methodology consists of three steps: (i) estimate daily realized co-volatility matrices directly based on high-frequency data, (ii) fit a matrix factor model to the estimated daily co-volatility matrices, and (iii) fit a vector autoregressive (VAR) model to the estimated volatility factors. We establish the asymptotic theory for the proposed methodology in a framework that allows the sample size, the number of assets, and the number of days to go to infinity together. Our theory shows that the relevant eigenvalues and eigenvectors can be consistently estimated. We illustrate the methodology with high-frequency price data on several hundred stocks traded on the Shenzhen and Shanghai Stock Exchanges over a period of 177 days in 2003. Our approach pools together the strengths of modeling and estimation at both intradaily (high-frequency) and interdaily (low-frequency) levels.
Some key words: dimension reduction; eigen-analysis; factor model; high frequency data; matrix process; realized volatilities; vector autoregressive model.
1 Introduction
Modeling and forecasting the volatilities of financial returns are vibrant research areas in econometrics and statistics. For financial data at daily or longer time horizons, which are often referred to as low-frequency data, there exists extensive literature on direct volatility modeling using GARCH, discrete stochastic volatility, and diffusive stochastic volatility models as well as indirect modeling using implied volatility obtained from option pricing models.
With the availability of intraday financial data, which are called high-frequency data, there is a surging interest in estimating volatilities using high-frequency returns directly. The field of high-frequency finance has evolved rapidly over the past several years. One of the focus points at present is to estimate integrated volatility over a period of time, say, a day. Estimation methods for univariate volatilities include realized volatility (RV), bi-power realized variation (BPRV), two-time scale realized volatility (TSRV), wavelet realized volatility (WRV), realized kernel volatility (KRV), pre-averaging realized volatility, and Fourier realized volatility (FRV). For the cases with multiple assets, a so-called non-synchronized problem arises, which refers to the fact that transactions for different assets often occur at distinct times, and the high-frequency prices of different assets are recorded at mismatched time points. Hayashi and Kusuoka (2005) and Zhang (2011) proposed to estimate the integrated co-volatility of two assets based on overlap intervals and previous ticks, respectively. Barndorff-Nielsen et al. (2010) employed a refresh time scheme to synchronize the data and then applied a realized kernel to the synchronized data for estimating integrated co-volatility. Christensen et al. (2010) studied integrated co-volatility estimation by the pre-averaging approach. Nevertheless, most existing works on volatility estimation using high-frequency data are for a single asset or a small number of assets, and therefore are only directly applicable when the integrated volatility concerned is either a scalar or a small matrix.
In reality we are often faced with scenarios involving a large number of assets. The integrated volatility concerned is then a matrix of large size. In principle, a large volatility matrix may be estimated as follows: estimate each diagonal element, representing an integrated volatility of a single asset, by univariate methods such as RV and BPRV, and estimate each off-diagonal element, representing an integrated co-volatility of two assets, by the method of Hayashi and Kusuoka (2005) or Zhang (2011). However, due to the large number of elements in the volatility matrix, such a naive estimator often behaves poorly. It is widely known that as the dimension (or matrix size) goes to infinity, estimators such as the sample covariance matrix and the usual realized co-volatility estimators are inconsistent in the sense that the eigenvalues and eigenvectors of the matrix estimators are far from the true targets (Johnstone (2001), Johnstone and Lu (2009), and Wang and Zou (2010)). Banding and thresholding were proposed by Bickel and Levina (2008a, b) to yield consistent estimators of large covariance matrices, and a factor model approach is used in Fan et al. (2008) to estimate large covariance matrices. To illustrate this point, we conduct a simulation as follows: consider $p$ assets over the unit time interval with all log prices following independent standard Brownian motions. Observations are taken without noise on the same time grid $t_i = i/n$ for $i = 0, 1, \cdots, n$. Then the true integrated volatility matrix $V$ is the identity matrix $I_p$. The estimator for $V$ based on the RV and the co-RV methods is
$$\hat{V} = (\hat{V}_{jk}), \quad \text{with} \quad \hat{V}_{jk} = \frac{1}{n} \sum_{i=1}^{n} Z_{ij} Z_{ik} \quad \text{for } 1 \le j, k \le p,$$
where $Z_{ij}$, $i = 1, \cdots, n$, $j = 1, \cdots, p$, are effectively independent $N(0,1)$ random variables. Setting $p = 100$, we drew 50 samples of size $n = 100$. For each of the 50 samples, we computed the 100 eigenvalues of $\hat{V}$ and recorded their maximum and minimum. Across the 50 sets of 100 eigenvalues, we found that all sets range approximately from zero to four, with an average minimum eigenvalue of 0.0001 and an average maximum eigenvalue of 3.9. This clearly indicates a serious lack of accuracy in estimating $V$, since all its eigenvalues are equal to 1. The inaccuracy of the estimator $\hat{V}$ is further manifested by the wide range of its eigenvalues displayed in Figure 1. This numerical experiment indicates that it is essential to reduce the number of estimated parameters in such a high-dimensional problem.

Figure 1: Plots of eigenvalues of $\hat{V}$ from a simulation with 50 repetitions (horizontal axis: index; vertical axis: eigenvalues). (a) 50 sets of ordered 100 eigenvalues: each of the 50 curves represents the ordered 100 eigenvalues of one sampled $\hat{V}$. (b) 50 pairs of max and min eigenvalues: the minimum and maximum eigenvalues of $\hat{V}$ across the 50 repetitions.
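The simulation described above is easy to re-create. The following is a minimal sketch in Python with NumPy, not the authors' code; the function name `simulate_extreme_eigenvalues` and the random seed are assumptions made for illustration, so the exact numbers will differ slightly from those reported.

```python
import numpy as np

def simulate_extreme_eigenvalues(p=100, n=100, reps=50, seed=0):
    """Return the (min, max) eigenvalues of V_hat = Z^T Z / n for each repetition."""
    rng = np.random.default_rng(seed)          # seed is an arbitrary choice
    extremes = np.empty((reps, 2))
    for rep in range(reps):
        z = rng.standard_normal((n, p))        # Z[i, j]: i = 1..n, j = 1..p
        v_hat = z.T @ z / n                    # estimator of the identity matrix I_p
        eig = np.linalg.eigvalsh(v_hat)        # ascending eigenvalues (symmetric input)
        extremes[rep] = eig[0], eig[-1]
    return extremes

extremes = simulate_extreme_eigenvalues()
avg_min, avg_max = extremes.mean(axis=0)
print(f"average minimum eigenvalue: {avg_min:.4f}")
print(f"average maximum eigenvalue: {avg_max:.2f}")
```

Although every true eigenvalue equals 1, the estimated eigenvalues spread over roughly the interval from zero to four, which is the behavior summarized in Figure 1.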
This paper considers high-frequency prices observed on a large number of assets over many days. We propose a matrix factor model for daily integrated volatility matrix processes. The matrix factor model facilitates combining high-frequency volatility estimation with low-frequency dynamic models as well as reducing the effective dimension of large volatility matrices. It is important to note that the proposed matrix factor model is directly for integrated volatility matrices. Since prices for different assets are typically observed at different times, it is often impossible to apply an ordinary factor model to the original price data directly. Nevertheless the abundance of information available in high-frequency data should make modeling daily volatilities easier. Indeed the inference for our matrix factor model is more direct than that for the ordinary factor volatility models for price data.
Our estimation procedure consists of three steps. First we estimate the integrated volatility matrix for each day by thresholding average realized volatility matrix (TARVM) estimators. We then perform an eigen-analysis to fit a matrix factor model for the estimated daily integrated volatility matrices and obtain estimated daily factor matrices. Finally we fit a vector autoregressive (VAR) model for the estimated daily volatility factor matrices. The proposed methodology pools together strengths in modeling and estimation at both low-frequency and high-frequency levels. In the univariate case where dimension reduction is not an issue, Andersen, Bollerslev and Diebold (2003) and Corsi (2003) demonstrated that the forecasting of volatilities may be improved by fitting a heterogeneous AR model to RV and BPRV based estimators of integrated volatilities. The approach is termed the HAR-RV model. Our proposal may be viewed as a high-dimensional version of the HAR-RV approach based on a new idea of matrix factor modeling.
We have established novel asymptotic theory for the proposed methodology in a framework that allows $p$ (number of assets), $n$ (average sample size), and $L$ (number of days) all to go to infinity. The established convergence rates for TARVM estimators and the matrix factor model under matrix norm provide a theoretical justification for the proposed methodology. These results indicate that the relevant eigenvalues and eigenvectors in the proposed factor modeling can be consistently estimated for large $p$. We also show that fitting the VAR model with the estimated daily volatility factor matrices from high-frequency data is asymptotically as efficient as that with true daily volatility factor matrices.
The rest of the paper is organized as follows. The proposed methodology is presented in Section 2. Its asymptotic theory is established in Section 3. Numerical illustration is reported in Section 4. Section 5 features conclusions. All proofs are collected in Section 6.
2 Methodology
2.1 Price model and observed data
Suppose that there are $p$ assets and their log price process $X(t) = \{X_1(t), \cdots, X_p(t)\}^T$ obeys an Itô process governed by
$$dX(t) = \mu_t\,dt + \sigma_t\,dW_t, \quad t \in [0, L], \tag{1}$$
where $L$ is an integer, $W_t$ is a $p$-dimensional standard Brownian motion, $\mu_t$ is a drift taking values in $\mathbb{R}^p$, and $\sigma_t$ is a $p \times p$ matrix. Both $\mu_t$ and $\sigma_t$ are assumed to be continuous in $t$.
Let a day be a unit time. The integrated volatility matrix for the $\ell$-th day is defined as
$$\Sigma_x(\ell) = \int_{\ell-1}^{\ell} \sigma_s \sigma_s^T\,ds, \quad \ell = 1, \cdots, L.$$
Suppose that high-frequency prices for the $i$-th asset on the $\ell$-th day are observed at times $t_{ij} \in (\ell-1, \ell]$, $\ell = 1, \cdots, L$. We denote by $Y_i(t_{ij})$ the observed log price of the $i$-th asset at time $t_{ij}$. Due to the so-called non-synchronized problem, typically $t_{i_1 j} \neq t_{i_2 j}$ for any $i_1 \neq i_2$. Furthermore the high-frequency prices are typically masked by some micro-structure noise in the sense that the observed log price $Y_i(t_{ij})$ is a noisy version of the corresponding true log price $X_i(t_{ij})$. A common practice is to assume
$$Y_i(t_{ij}) = X_i(t_{ij}) + \varepsilon_i(t_{ij}), \tag{2}$$
where $\varepsilon_i(t_{ij})$ are i.i.d. noise with mean zero and variance $\eta_i$, and $\varepsilon_i(\cdot)$ and $X_i(\cdot)$ are independent of each other.
Let $n_i(\ell)$ be the sample size for asset $i$ on the $\ell$-th day, i.e. $n_i(\ell)$ is the number of $t_{ij} \in (\ell-1, \ell]$; let $n(\ell) = \sum_{i=1}^{p} n_i(\ell)/p$ be the average sample size of the $p$ assets on the $\ell$-th day, and $n = \sum_{\ell=1}^{L} n(\ell)/L$ the average sample size across the $p$ assets and over all $L$ days.
2.2 Realized volatility matrix estimator
To highlight the basic idea in realized volatility matrix estimation, we first consider estimating $\Sigma_x(1)$, the integrated volatility matrix on day one, by the averaging realized volatility matrix (ARVM) estimator proposed in Wang and Zou (2010). Suppose that $\tau = \{\tau_r, r = 1, \cdots, m\}$ is a pre-determined sampling frequency. For asset $i$, define previous-tick times
$$\tau_{i,r} = \max\{t_{ij} \le \tau_r,\ j = 1, \cdots, n_i(1)\}, \quad r = 1, \cdots, m.$$
Based on $\tau$ we define the realized co-volatility between assets $i_1$ and $i_2$ by
$$\tilde{\Sigma}_y(1, \tau)[i_1, i_2] = \sum_{r=1}^{m} [Y_{i_1}(\tau_{i_1,r}) - Y_{i_1}(\tau_{i_1,r-1})]\,[Y_{i_2}(\tau_{i_2,r}) - Y_{i_2}(\tau_{i_2,r-1})], \tag{3}$$
and the realized volatility matrix by
$$\tilde{\Sigma}_y(1, \tau) = (\tilde{\Sigma}_y(1, \tau)[i_1, i_2])_{1 \le i_1, i_2 \le p}. \tag{4}$$
We take the pre-determined sampling frequency $\tau$ to be the following regular grids. Given a fixed $m$, there are $K = [n(1)/m]$ classes of non-overlapping regular grids given by
$$\tau^k = \{(r-1)/m,\ r = 1, \cdots, m\} + (k-1)/n(1) = \{(r-1)/m + (k-1)/n(1),\ r = 1, \cdots, m\}, \tag{5}$$
where $k = 1, \cdots, K$, and $n(1)$ is the average sample size on day one. For each $\tau^k$, using (3) and (4) we define the realized co-volatility $\tilde{\Sigma}_y(1, \tau^k)[i_1, i_2]$ between assets $i_1$ and $i_2$ and the realized volatility matrix $\tilde{\Sigma}_y(1, \tau^k)$. The ARVM estimator is given by
$$\tilde{\Sigma}_y(1)[i_1, i_2] = \frac{1}{K} \sum_{k=1}^{K} \tilde{\Sigma}_y(1, \tau^k)[i_1, i_2] - 2m\,\hat{\eta}_{i_1}\,1(i_1 = i_2), \tag{6}$$
$$\tilde{\Sigma}_y(1) = (\tilde{\Sigma}_y(1)[i_1, i_2]) = \frac{1}{K} \sum_{k=1}^{K} \tilde{\Sigma}_y(1, \tau^k) - 2m\,\hat{\eta}, \tag{7}$$
where
$$\hat{\eta}_i = \frac{1}{2 n_i(1)} \sum_{j=1}^{n_i(1)} [Y_i(t_{i,j}) - Y_i(t_{i,j-1})]^2 \tag{8}$$
are estimators of the noise variances $\eta_i$, and $\hat{\eta} = \mathrm{diag}(\hat{\eta}_1, \cdots, \hat{\eta}_p)$ is the estimator of $\eta = \mathrm{diag}(\eta_1, \cdots, \eta_p)$. The averaging in (6) and (7) is to reduce the impact of microstructure noise on the realized volatility matrices $\tilde{\Sigma}_y(1, \tau^k)$ and yield a better ARVM estimator.
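The construction in (3)-(8) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, a grid point that precedes an asset's first tick falls back to that first tick, and because each grid in (5) has $m$ points and hence $m-1$ increments, the noise correction subtracts $2(m-1)\hat{\eta}$ rather than the $2m\hat{\eta}$ written in (6)-(7). The simulated data at the end (two correlated Brownian motions, random observation times, small noise) are likewise assumptions chosen only to exercise the code.

```python
import numpy as np

def previous_tick_values(times, log_prices, grid):
    """Observed log price at the last tick time <= each grid point."""
    idx = np.searchsorted(times, grid, side="right") - 1
    idx = np.clip(idx, 0, len(times) - 1)   # assumption: fall back to the first tick
    return log_prices[idx]

def realized_matrix(times_list, prices_list, grid):
    """Realized volatility matrix on one sampling grid, in the spirit of (3)-(4)."""
    increments = np.column_stack([
        np.diff(previous_tick_values(t, y, grid))
        for t, y in zip(times_list, prices_list)
    ])
    return increments.T @ increments

def arvm(times_list, prices_list, m):
    """Averaging realized volatility matrix for one day with times in (0, 1]."""
    n_bar = np.mean([len(t) for t in times_list])   # average sample size n(1)
    K = int(n_bar // m)                              # number of shifted grids
    base = np.arange(m) / m                          # (r - 1)/m, r = 1..m
    avg = sum(realized_matrix(times_list, prices_list, base + k / n_bar)
              for k in range(K)) / K
    # Noise variance estimates as in (8).
    eta_hat = np.array([np.sum(np.diff(y) ** 2) / (2 * len(y)) for y in prices_list])
    n_increments = m - 1                             # assumption: see lead-in
    return avg - 2 * n_increments * np.diag(eta_hat)

# Hypothetical test data: two assets with integrated covariance [[1, 0.5], [0.5, 1]].
rng = np.random.default_rng(1)
n_fine, p = 4000, 2
chol = np.linalg.cholesky(np.array([[1.0, 0.5], [0.5, 1.0]]))
fine_times = np.arange(1, n_fine + 1) / n_fine
x = np.cumsum(rng.standard_normal((n_fine, p)) @ chol.T, axis=0) / np.sqrt(n_fine)

times_list, prices_list = [], []
for i in range(p):
    keep = rng.random(n_fine) < 0.5                  # non-synchronized tick times
    noise = 0.005 * rng.standard_normal(keep.sum())  # micro-structure noise
    times_list.append(fine_times[keep])
    prices_list.append(x[keep, i] + noise)

sigma_tilde = arvm(times_list, prices_list, m=200)
print(np.round(sigma_tilde, 3))
```

The output should be in the neighborhood of the true integrated covariance matrix, with sampling error that shrinks as the number of ticks grows.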
When $p$ is small, $\tilde{\Sigma}_y(1)$ provides a good estimator for $\Sigma_x(1)$. But for large $p$, it is well known that $\tilde{\Sigma}_y(1)$ is inconsistent. In fact, statistical theory for small $n$ and large $p$, or large $n$ but much larger $p$, problems shows that the eigenvalues and the eigenvectors of, for example, a sample covariance matrix or a realized volatility matrix are inconsistent estimators for the corresponding true eigenvalues and eigenvectors. The proposed methodology in this paper relies on consistent estimation of eigenvalues and eigenvectors of large volatility matrices. In order to estimate $\Sigma_x(1)$ consistently for large $p$, we need to impose some sparsity structure on $\Sigma_x(1)$ (see (18) in Section 3) and threshold $\tilde{\Sigma}_y(1)$ by retaining its elements whose absolute values exceed a given value and replacing the others by zero. See Bickel and Levina (2008a,b), Johnstone and Lu (2009), Wang and Zou (2010). We threshold $\tilde{\Sigma}_y(1)$ and obtain an estimator
$$\hat{\Sigma}_y(1) = T_{\varpi}[\tilde{\Sigma}_y(1)] = \left( \tilde{\Sigma}_y(1)[i_1, i_2]\, 1(|\tilde{\Sigma}_y(1)[i_1, i_2]| \ge \varpi) \right), \tag{9}$$
where $\varpi$ is a threshold. The $(i_1, i_2)$-th element of $\hat{\Sigma}_y(1)$ is equal to $\tilde{\Sigma}_y(1)[i_1, i_2]$ if its absolute value is greater than or equal to $\varpi$ and zero otherwise. The thresholded ARVM estimator $\hat{\Sigma}_y(1)$ is referred to as the TARVM estimator.
Similarly, based on high-frequency data on the $\ell$-th day we construct the ARVM estimator $\tilde{\Sigma}_y(\ell)$ and define the TARVM estimator $\hat{\Sigma}_y(\ell)$ to provide an estimator for the integrated volatility matrix $\Sigma_x(\ell)$, $\ell = 2, \cdots, L$.
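The thresholding operator in (9) is an entrywise hard threshold. A minimal sketch follows; the function name `hard_threshold`, the toy matrix and the threshold value 0.05 are assumptions chosen for illustration only.

```python
import numpy as np

def hard_threshold(matrix, threshold):
    """Keep entries with |entry| >= threshold and set all other entries to zero."""
    return np.where(np.abs(matrix) >= threshold, matrix, 0.0)

# Toy 3 x 3 "ARVM" matrix (hypothetical numbers).
sigma_tilde_toy = np.array([[ 1.00, 0.02, -0.30],
                            [ 0.02, 0.80,  0.01],
                            [-0.30, 0.01,  1.20]])
sigma_hat_toy = hard_threshold(sigma_tilde_toy, threshold=0.05)
print(sigma_hat_toy)
```

Small off-diagonal entries such as 0.02 and 0.01 are set to zero, while the entry -0.30 and the diagonal survive; symmetry of the input is preserved.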
2.3 A matrix factor model
To reduce the effective number of entries in $\Sigma_x(\ell)$ and connect high-frequency volatility matrix estimation with low-frequency volatility dynamic models, we propose a factor model as follows,
$$\Sigma_x(\ell) = A\,\Sigma_f(\ell)\,A^T + \Sigma_0, \quad \ell = 1, \cdots, L, \tag{10}$$
where $r$ is a fixed small integer (much smaller than $p$), $\Sigma_0$ is a $p \times p$ positive definite constant matrix, $\Sigma_f(\ell)$ are $r \times r$ positive definite matrices treated as the factor volatility process, and $A$ is a $p \times r$ factor loading matrix. This effectively assumes that the daily dynamical structure of the matrix process $\Sigma_x(\ell)$ is driven by that of a lower-dimensional latent process $\Sigma_f(\ell)$, while $\Sigma_0$ represents the static part of $\Sigma_x(\ell)$. Although the form of the above model is similar to the factor volatility models proposed by, for example, Engle and Rothschild (1990), the key difference here is that we have the 'observations' $\hat{\Sigma}_y(\cdot)$ directly on the volatility process $\Sigma_x(\cdot)$. Since the high-frequency prices are measured at different times for different assets, we cannot apply a factor model directly to the observed high-frequency data $Y_i(t_{ij})$.
The availability of the estimators for $\Sigma_x(\cdot)$ from high-frequency data makes it easier to estimate both the factor loading matrix $A$ and the factor volatility $\Sigma_f(\cdot)$. In fact the estimation problem now reduces to a standard eigen-analysis and can be easily performed for $p$ as large as a few thousand. This is in marked contrast to the more standard circumstances when only the observations on $X_t$ are available; see, for example, Pan and Yao (2008). To fix the idea, let us temporarily assume that we observe $\Sigma_x(\ell)$. Note that there is no loss of generality in assuming that $A$ in (10) satisfies the condition $A^T A = I_r$. The matrix $A$ is not completely identifiable even under this constraint; however, the linear space spanned by the columns of $A$ is. Note that there exists a $p \times (p-r)$ matrix $B$ for which $B^T A = 0$ and $B^T B = I_{p-r}$, i.e. $(A, B)$ is a $p \times p$ orthogonal matrix. Now multiplying both sides of (10) by $B^T$, we obtain
$$B^T \Sigma_x(\ell) = B^T \Sigma_0. \tag{11}$$
Put
$$\bar{\Sigma}_x = \frac{1}{L} \sum_{\ell=1}^{L} \Sigma_x(\ell), \quad \bar{S}_x = \frac{1}{L} \sum_{\ell=1}^{L} \{\Sigma_x(\ell) - \bar{\Sigma}_x\}^2. \tag{12}$$
Equation (11) implies that for all $\ell = 1, \cdots, L$, $B^T \Sigma_x(\ell) = B^T \bar{\Sigma}_x$, and
$$B^T \bar{S}_x B = \frac{1}{L} \sum_{\ell=1}^{L} \{B^T \Sigma_x(\ell) - B^T \bar{\Sigma}_x\}\{\Sigma_x(\ell) B - \bar{\Sigma}_x B\} = 0. \tag{13}$$
This suggests that the columns of $B$ are the $p-r$ orthonormal eigenvectors of $\bar{S}_x$ corresponding to the $(p-r)$-fold eigenvalue 0. The other $r$ orthonormal eigenvectors of $\bar{S}_x$, corresponding to the $r$ non-zero eigenvalues, may be taken as the columns of the factor loading matrix $A$.
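Of course $\Sigma_x(\ell)$ is unknown in practice. We use $\hat{\Sigma}_y(\ell)$ as a proxy. Let
$$\bar{\Sigma}_y = \frac{1}{L} \sum_{\ell=1}^{L} \hat{\Sigma}_y(\ell), \quad \bar{S}_y = \frac{1}{L} \sum_{\ell=1}^{L} \{\hat{\Sigma}_y(\ell) - \bar{\Sigma}_y\}^2, \tag{14}$$
where $\hat{\Sigma}_y(\ell)$ are TARVM estimators computed from high-frequency data; see Section 2.2 above. Then the estimator $\hat{A}$ is obtained using the $r$ orthonormal eigenvectors of $\bar{S}_y$, corresponding to the $r$ largest eigenvalues, as its columns. Consequently the estimated factor volatilities are
$$\hat{\Sigma}_f(\ell) = \hat{A}^T \hat{\Sigma}_y(\ell) \hat{A}, \quad \ell = 1, \cdots, L, \tag{15}$$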
and the estimator forΣ0 in model (10) may be taken as b
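The eigen-analysis in (14)-(15) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `estimate_factor_model` is hypothetical, and the check at the end feeds in exact matrices $\Sigma_x(\ell)$ built from an assumed one-factor model with $\Sigma_0 = I_p$ instead of TARVM estimates.

```python
import numpy as np

def estimate_factor_model(sigma_y, r):
    """Eigen-analysis of S_bar; sigma_y has shape (L, p, p)."""
    sigma_bar = sigma_y.mean(axis=0)
    centered = sigma_y - sigma_bar
    s_bar = np.mean(centered @ centered, axis=0)   # (1/L) sum {Sigma(l) - Sigma_bar}^2
    eigvals, eigvecs = np.linalg.eigh(s_bar)       # ascending order
    a_hat = eigvecs[:, ::-1][:, :r]                # eigenvectors of the r largest eigenvalues
    sigma_f_hat = a_hat.T @ sigma_y @ a_hat        # (15), shape (L, r, r)
    return a_hat, sigma_f_hat, eigvals[::-1]

# Hypothetical one-factor example: Sigma_x(l) = a f_l a^T + I_p.
rng = np.random.default_rng(0)
p, L = 20, 50
a = rng.standard_normal(p)
a /= np.linalg.norm(a)                             # so that a^T a = 1
f = 6.0 + rng.uniform(-1.0, 1.0, size=L)           # positive scalar factor volatilities
sigma_x = f[:, None, None] * np.outer(a, a) + np.eye(p)

a_hat, sigma_f_hat, eigvals = estimate_factor_model(sigma_x, r=1)
print("alignment |a_hat^T a|:", abs(a_hat[:, 0] @ a))
print("largest eigenvalues of S_bar:", np.round(eigvals[:3], 6))
```

In this noise-free setting the leading eigenvector recovers the loading vector up to sign, the remaining eigenvalues of $\bar{S}_x$ are numerically zero, and the recovered factor equals $f_\ell + A^T \Sigma_0 A = f_\ell + 1$, which is the shift by $A^T \Sigma_0 A$ that appears in the asymptotic results of Section 3.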
2.4 VAR modeling for factor volatilities
With the estimated factor volatility matrices in (15), we build up the dynamical structure of $\Sigma_x(\ell)$ by fitting a VAR model to $\hat{\Sigma}_f(\ell)$. One alternative is to adopt more sophisticated multivariate volatility models to fit $\hat{\Sigma}_f(\ell)$ or $\hat{\Sigma}_f^{1/2}(\ell)$ (see Wang and Yao (2005) and Remark 5 after Lemma 6 in Section 6). We opt for a simple VAR model in the spirit of the HAR-RV approach advocated by Andersen, Bollerslev and Diebold (2003) and Corsi (2003). They demonstrate that fitting an AR model to realized (one-dimensional) volatilities may lead to significant improvement in volatility forecasting.
For an $r \times r$ matrix $\Sigma$, let $\mathrm{vech}(\Sigma)$ be the $r(r+1)/2 \times 1$ vector obtained by stacking together the truncated column vectors of $\Sigma$, where truncating means removing all the elements above the main diagonal. Then the VAR model for $\Sigma_f(\ell)$ is of the form
$$\mathrm{vech}\{\Sigma_f(\ell)\} = \alpha_0 + \sum_{j=1}^{q} \alpha_j\,\mathrm{vech}\{\Sigma_f(\ell - j)\} + e_\ell, \tag{17}$$
where $q \ge 1$ is an integer, $\alpha_0$ is a vector, $\alpha_1, \cdots, \alpha_q$ are square matrices, and $e_\ell$ is a vector white noise process with zero mean and finite fourth moments. Since $\Sigma_f(\ell)$ are estimated by $\hat{\Sigma}_f(\ell)$, with a fixed $q$, we adopt the least squares estimators $\hat{\alpha}_j$ for the coefficients $\alpha_j$, which are the minimizer of
$$\sum_{\ell=q+1}^{L} \Big\| \mathrm{vech}\{\hat{\Sigma}_f(\ell)\} - \alpha_0 - \sum_{j=1}^{q} \alpha_j\,\mathrm{vech}\{\hat{\Sigma}_f(\ell - j)\} \Big\|^2,$$
where $\|\cdot\|$ denotes the Euclidean norm of a vector. The order $q$ may be determined by standard criteria such as AIC or BIC.
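The half-vectorization and the least squares fit of (17) can be sketched as below. The helper names `vech` and `fit_var`, and the simulated VAR(1) series used to exercise them, are assumptions for illustration; in the setting of this section the rows of `series` would be $\mathrm{vech}\{\hat{\Sigma}_f(\ell)\}$.

```python
import numpy as np

def vech(matrix):
    """Stack the columns of a square matrix, keeping entries on and below the diagonal."""
    r = matrix.shape[0]
    return np.concatenate([matrix[j:, j] for j in range(r)])

def fit_var(series, q):
    """Least squares fit of a VAR(q) with intercept; series has shape (L, d)."""
    L, d = series.shape
    y = series[q:]                                             # responses, l = q+1..L
    lags = [series[q - j:L - j] for j in range(1, q + 1)]      # lag-j regressors
    x = np.hstack([np.ones((L - q, 1))] + lags)
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    alpha0 = coef[0]
    # Rows of coef hold alpha_j^T, so transpose each block to recover alpha_j.
    alphas = [coef[1 + (j - 1) * d:1 + j * d].T for j in range(1, q + 1)]
    return alpha0, alphas

print(vech(np.array([[1.0, 2.0], [2.0, 3.0]])))               # -> [1. 2. 3.]

# Hypothetical VAR(1) data for a 2 x 2 factor matrix (d = 3 after vech).
rng = np.random.default_rng(0)
L, d = 2000, 3
true_alpha0 = np.array([3.0, 0.0, 2.0])
true_alpha1 = 0.5 * np.eye(d)
series = np.empty((L, d))
series[0] = true_alpha0 / 0.5                                  # start at the stationary mean
for l in range(1, L):
    series[l] = true_alpha0 + true_alpha1 @ series[l - 1] + 0.1 * rng.standard_normal(d)

alpha0_hat, (alpha1_hat,) = fit_var(series, q=1)
print(np.round(alpha1_hat, 2))
```

Given an array `sigma_f_hat` of shape $(L, r, r)$, the input would be built as `np.array([vech(s) for s in sigma_f_hat])`.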
3 Asymptotic Theory
First we introduce some notation. Given a $p$-dimensional vector $x = (x_1, \cdots, x_p)^T$ and a $p$ by $p$ matrix $U = (U_{ij})$, define the matrix norm as follows,
$$\|U\|_2 = \sup\{\|Ux\|_2 : \|x\|_2 = 1\}, \quad \|x\|_2 = \Big( \sum_{i=1}^{p} |x_i|^2 \Big)^{1/2}.$$
Then $\|U\|_2$ is equal to the square root of the largest eigenvalue of $U^T U$, where $U^T$ is the transpose of $U$, and for symmetric $U$, $\|U\|_2$ is equal to its largest absolute eigenvalue.
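The two characterizations of the matrix norm can be checked numerically. The sketch below uses an arbitrary random $5 \times 5$ matrix (an assumption for illustration) and compares both expressions with NumPy's built-in spectral norm.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.standard_normal((5, 5))

# Square root of the largest eigenvalue of U^T U.
norm_from_eig = np.sqrt(np.linalg.eigvalsh(u.T @ u).max())
norm_builtin = np.linalg.norm(u, 2)          # spectral norm (largest singular value)

# For a symmetric matrix the norm is the largest absolute eigenvalue.
s = (u + u.T) / 2
norm_sym = np.abs(np.linalg.eigvalsh(s)).max()

print(norm_from_eig, norm_builtin)
print(norm_sym, np.linalg.norm(s, 2))
```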
Second we state the following assumptions for the asymptotic analysis.
(A1). We assume all row vectors of $A^T$ and $\Sigma_0$ in factor model (10) obey the sparsity condition (18) below. For a $p$-dimensional vector $x = (x_1, \cdots, x_p)^T$, we say it is sparse if it satisfies
$$\sum_{i=1}^{p} |x_i|^{\delta} \le C\,\pi(p), \tag{18}$$
where $\delta \in [0, 1)$, $C$ is a positive constant, and $\pi(p)$ is a deterministic function of $p$ that grows slowly in $p$, with typical examples $\pi(p) = 1$ or $\log p$.
(A2). Assume factor model (10) has a fixed number $r$ of factors, with $A^T A = I_r$, and matrices $\Sigma_0$ and $\Sigma_f$ in (10) satisfy
$$\|\Sigma_0\|_2 < \infty, \quad \max_{1 \le \ell \le L} |\Sigma_f(\ell)[j, j]| = O_P(\log L), \quad j = 1, \cdots, r.$$
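(A3). We impose the following moment conditions on the diffusion drift $\mu_t = (\mu_1(t), \cdots, \mu_p(t))^T$ and diffusion variance $\sigma_t = (\sigma_{ij}(t))_{1 \le i, j \le p}$ in price model (1) and the micro-structure noise $\varepsilon_i(t_{ij})$ in data model (2): for some $\beta \ge 4$,
$$\max_{1 \le i \le p} \max_{0 \le t \le L} E[|\sigma_{ii}(t)|^{\beta}] < \infty, \quad \max_{1 \le i \le p} \max_{0 \le t \le L} E[|\mu_i(t)|^{2\beta}] < \infty, \quad \max_{1 \le i \le p} \max_{0 \le t_{ij} \le L} E[|\varepsilon_i(t_{ij})|^{2\beta}] < \infty.$$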
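(A4). Each of the $p$ assets has at least one observation between $\tau_r^k$ and $\tau_{r+1}^k$. That is, in the construction of the ARVM estimator we assume $m = o(n)$, and
$$C_1 \le \min_{1 \le i \le p} \min_{1 \le \ell \le L} \frac{n_i(\ell)}{n} \le \max_{1 \le i \le p} \max_{1 \le \ell \le L} \frac{n_i(\ell)}{n} \le C_2, \quad \max_{1 \le i \le p} \max_{1 \le \ell \le L} \max_{1 \le j \le n_i(\ell)} |t_{ij} - t_{i,j-1}| = O(n^{-1}).$$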
(A5). The characteristic polynomial of VAR model (17) has no roots in the unit circle so that it is a causal VAR model.
Remark 1. Condition (A1) together with factor model (10) implies that $\Sigma_x(\ell)$ are sparse, which is required to consistently estimate $\Sigma_x(\ell)$ for large $p$ and will be shown by Lemma 2 in Section 6. When $\delta = 0$ in (18), sparsity means that there are at most $C\,\pi(p)$ non-zero coordinates in $x = (x_1, \cdots, x_p)^T$, and matrix sparsity means that each row has at most $C\,\pi(p)$ non-zero elements. Sparsity is often a reasonable assumption for large volatility matrices. We may further improve sparsity for the volatility matrices by transformations such as removing the overall market effect and the sector effect. Condition (A2) imposes realistic bounded eigenvalues on $\Sigma_0$ and a logarithmic temporal growth on $\Sigma_f(\ell)$ over $[0, L]$. As $\Sigma_0$ is a constant matrix and $\Sigma_f(\ell)$ are small matrices of fixed size $r$, Condition (A2) together with factor model (10) guarantees that the maximum eigenvalue of $\Sigma_x(\ell)$ is free of $p$ and has only order $\log L$, which will be proved in Lemma 1 in Section 6. The logarithmic rate in (A2) is rather weak and reasonable, as the maxima of sequences of independent and typically dependent random variables are of a logarithmic order. The assumption relieves us from specifying temporal and cross-section dependence structures on the volatilities over time and across assets. Condition (A3) gives the minimal moment requirements for the price process and microstructure noise. (A4) is a technical condition that ensures an adequate number of observations between grid points, which is needed to establish the asymptotic theory for the proposed methodology. (A5) is a standard condition for stationary AR time series.
We establish the asymptotic theory for the proposed models and the associated estimation methods. Since $p$, $n$ and $L$ stand for the dimension (number of assets), the average number of daily observations, and the number of days, we let $p$, $n$ and $L$ all go to infinity in the asymptotics. The two theorems below give the eigenvalue and eigenvector convergence for the difference between $\bar{S}_x$ and $\bar{S}_y$ defined in (12) and (14), respectively.
Theorem 1 Suppose Models (1), (2) and (10) satisfy Conditions (A1)-(A4). As $n, p, L$ all go to infinity, we have
$$\|\bar{S}_y - \bar{S}_x\|_2 = O_P\left( \pi(p)\,[e_n (p^2 L)^{1/\beta}]^{1-\delta} \log^2 L \right),$$
where $e_n \sim n^{-1/6}$ for the noise case and $e_n \sim n^{-1/3}$ for the no-noise case [i.e. $\varepsilon_i(t_{ij}) = 0$ in (2)], and the threshold $\varpi$ used in (9) is of order $e_n (p^2 L)^{1/\beta} \log L$.
Theorem 2 Suppose Models (1), (2) and (10) satisfy Conditions (A1)-(A4). Denote the ordered eigenvalues of $\bar{S}_x$ by $\lambda_1 \ge \cdots \ge \lambda_p$. Assume that there is a positive constant $c$ such that $\lambda_j - \lambda_{j+1} \ge c$ for $j = 1, \cdots, r$. Let $a_1, \cdots, a_r$ be the eigenvectors of $\bar{S}_x$ corresponding to the $r$ largest eigenvalues $\lambda_1, \cdots, \lambda_r$. Also let $\hat{\lambda}_1 \ge \cdots \ge \hat{\lambda}_r$ be the $r$ largest eigenvalues of $\bar{S}_y$ and $\hat{a}_1, \cdots, \hat{a}_r$ the corresponding eigenvectors. Let $A = (a_1, \cdots, a_r)$ and $\hat{A} = (\hat{a}_1, \cdots, \hat{a}_r)$. Then as $n, p, L$ go to infinity, we have
$$A^T \hat{A} - I_r = O_P\left( \pi(p)\,[e_n (p^2 L)^{1/\beta}]^{1-\delta} \log^2 L \right),$$
$$\hat{\Sigma}_f(\ell) - \Sigma_f(\ell) - A^T \Sigma_0 A = O_P\left( \pi(p)\,[e_n (p^2 L)^{1/\beta}]^{1-\delta} \log^2 L \right),$$
where $e_n$ and $\varpi$ are the same as in Theorem 1, and since the matrices are of fixed size $r$, the convergence holds under any matrix norm.
Remark 2. The factor $e_n (p^2 L)^{1/\beta}$ is a power of $n$, $p$ and $L$, while $\pi(p) \log^2 L$ depends on $p$ and $L$ only through logarithms and is thus negligible in comparison with $[e_n (p^2 L)^{1/\beta}]^{1-\delta}$. So the convergence rate is nearly equal to $[e_n (p^2 L)^{1/\beta}]^{1-\delta}$. In order to consistently estimate the $r$ largest eigenvalues and their corresponding eigenvectors of $\bar{S}_x$ we need to make $e_n (p^2 L)^{1/\beta}$ go to zero. As $e_n \sim n^{-1/3}$ for the noiseless case and $e_n \sim n^{-1/6}$ for the noise case, $e_n (p^2 L)^{1/\beta}$ goes to zero if $p^2 L$ grows more slowly than $n^{\beta/3}$ for the noiseless case and $n^{\beta/6}$ for the noise case. For reasonably large $\beta$ in moment assumption (A3), the consistency requirement can accommodate the scenario where $p$ is comparable to or larger than $n$. Thus, Theorems 1 and 2 establish a valid theoretical foundation for the proposed methodology in the sense that it yields consistent estimators of the $r$ largest eigenvalues and their corresponding eigenvectors for the factor-based analysis under the large $p$ scenario.
Next we establish asymptotic theory for parameter estimation in the VAR model (17) based on high-frequency data.
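Theorem 3 Suppose that $\hat{\alpha}_i$ are least squares estimators of $\alpha_i$ based on data $\hat{\Sigma}_f(\ell)$ from the VAR model (17), and denote by $\tilde{\alpha}_i$ the least squares estimators of $\alpha_i$ based on oracle data $\Sigma_f(\ell)$ from the same VAR model (17). Then under Conditions (A1)-(A5) and the eigenvalue assumption of Theorem 2,
$$\hat{\alpha}_0 - \tilde{\alpha}_0 - \mathrm{vech}\{A^T \Sigma_0 A\} = O_P\left( \pi(p)\,[e_n (p^2 L)^{1/\beta}]^{1-\delta} \log^2 L \right),$$
$$\hat{\alpha}_i - \tilde{\alpha}_i = O_P\left( \pi(p)\,[e_n (p^2 L)^{1/\beta}]^{1-\delta} \log^2 L \right), \quad i = 1, \cdots, q.$$
In particular, as $n, p, L \to \infty$, if $\pi(p)\,[e_n (p^2 L)^{1/\beta}]^{1-\delta} L^{1/2} \log^2 L \to 0$, then
$$L^{1/2}\{\hat{\alpha}_0 - \alpha_0 - \mathrm{vech}(A^T \Sigma_0 A),\ \hat{\alpha}_1 - \alpha_1, \cdots, \hat{\alpha}_q - \alpha_q\}$$
has the same limiting distribution as $L^{1/2}(\tilde{\alpha}_0 - \alpha_0, \tilde{\alpha}_1 - \alpha_1, \cdots, \tilde{\alpha}_q - \alpha_q)$.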
Remark 3. Theorem 3 shows that the proposed data-driven method of model fitting based on $\hat{\Sigma}_f(\ell)$ estimated from high-frequency data can asymptotically achieve the same result as an oracle that uses the true $\Sigma_f(\ell)$ for model fitting. In other words, fitting the VAR model with the estimated daily volatility factor matrices from high-frequency data can be asymptotically as efficient as that with the true daily volatility factor matrices.
Remark 4. We may replace the ARVM estimator used in the first stage by other volatility matrix estimators, for example those in Barndorff-Nielsen et al. (2010), Christensen et al. (2010), Griffin and Oomen (2011), and Zhang (2011). However, these estimators enjoy good properties only for a fixed matrix size $p$ that is very small relative to the sample size. When $p$ is allowed to grow with the sample size and its magnitude is comparable to the sample size, all of these estimators become inconsistent. A regularization adjustment such as thresholding is needed to make them consistent. For example, to improve the convergence rate of the ARVM estimator we may use the multi-scale scheme in Fan and Wang (2007, section 4.3) and Zhang (2006) to construct the following multi-scale realized volatility matrix (MRVM) estimator,
$$\tilde{\Sigma}^*_y(1) = \sum_{m=1}^{\kappa} a_m\,\hat{\Gamma}^{K_m} + \zeta\,(\hat{\Gamma}^{K_1} - \hat{\Gamma}^{K_\kappa}),$$
where $\kappa$ is the integer part of $\sqrt{n}$, and $\hat{\Gamma}^{K_m}$ is defined via (3) and (4) as follows,
$$\hat{\Gamma}^{K_m} = \frac{1}{K_m} \sum_{k=1}^{K_m} \tilde{\Sigma}_y(1, \tau^k) = \left( \frac{1}{K_m} \sum_{k=1}^{K_m} \tilde{\Sigma}_y(1, \tau^k)[i_1, i_2] \right)_{1 \le i_1, i_2 \le p},$$
$$K_m = m + \kappa, \quad a_m = \frac{12 (m + \kappa)(m - \kappa/2 - 1/2)}{\kappa (\kappa^2 - 1)}, \quad \zeta = \frac{(2\kappa)(\kappa + 1)}{(n + 1)(\kappa - 1)}.$$
For fixed $p$ and noisy data, the ARVM estimator $\tilde{\Sigma}_y(1)$ in (7) has convergence rate $n^{-1/6}$, while the MRVM estimator $\tilde{\Sigma}^*_y(1)$ can achieve the optimal convergence rate $n^{-1/4}$ [Tao et al. (2011)]. However, as $p$ goes to infinity and $p$ and $n$ are comparable, $\tilde{\Sigma}^*_y(1)$ becomes inconsistent. Similar to (9) we need to threshold $\tilde{\Sigma}^*_y(1)$ and obtain
$$\hat{\Sigma}^*_y(1) = T_{\varpi}[\tilde{\Sigma}^*_y(1)] = \left( \tilde{\Sigma}^*_y(1)[i_1, i_2]\, 1(|\tilde{\Sigma}^*_y(1)[i_1, i_2]| \ge \varpi) \right),$$
where $\varpi$ is a threshold. Similarly we can define $\hat{\Sigma}^*_y(\ell)$ for $\ell = 2, \cdots, L$. If the daily integrated volatility matrices $\Sigma_x(\ell)$ are estimated by $\hat{\Sigma}^*_y(\ell)$ instead of $\hat{\Sigma}_y(\ell)$ for performing the eigen-analysis and fitting the matrix factor and VAR models described in Sections 2.3 and 2.4, we expect to obtain the same conclusions as in Theorems 1-3 but with $e_n \sim n^{-1/4}$ for the noisy case.
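The multi-scale weights in Remark 4 can be tabulated directly from the displayed formulas. The sketch below is illustrative only: the function name `mrvm_weights` is hypothetical, and $n = 578$ is borrowed from the overall average sample size quoted for the Shenzhen data purely as an example value. It also checks two identities that follow algebraically from the formula for $a_m$ as printed: $\sum_m a_m = 1$ and $\sum_m a_m / K_m = 0$.

```python
import numpy as np

def mrvm_weights(n):
    """Scales K_m, weights a_m and the constant zeta for sample size n."""
    kappa = int(np.sqrt(n))                       # integer part of sqrt(n)
    m = np.arange(1, kappa + 1)
    K = m + kappa                                 # K_m = m + kappa
    a = 12 * (m + kappa) * (m - kappa / 2 - 0.5) / (kappa * (kappa**2 - 1))
    zeta = 2 * kappa * (kappa + 1) / ((n + 1) * (kappa - 1))
    return K, a, zeta

K, a, zeta = mrvm_weights(578)
print("kappa:", len(a))
print("sum of a_m:", a.sum())
print("sum of a_m / K_m:", (a / K).sum())
print("zeta:", zeta)
```

The weights take both signs, with negative values at the smaller scales, which is why the combination is not a simple average of the $\hat{\Gamma}^{K_m}$.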
4 Numerical examples
We illustrate the proposed methodology with two sets of high-frequency data, the tick by tick prices of the 410 stocks traded in Shenzhen Stock Exchange and the 630 stocks traded in Shanghai Stock Exchange over a period of 177 days in 2003. The daily average intraday observations over the 177 days range from 194 to 1384 with overall average 578 for the stocks traded in the Shenzhen market and from 210 to 1620 with overall average 575 for the stocks traded in the Shanghai market.
4.1 Eigen-analysis based on estimated daily integrated volatility matrices
For each of the 177 days, we compute the estimated daily integrated volatility matrices using the TARVM estimator in (9), with grids selected in accordance with 5-minute returns and thresholds set so as to retain the top five percent of the largest absolute entries. This yields a sequence of 177 matrices $\hat{\Sigma}_y(\ell)$, $\ell = 1, \cdots, L = 177$, where the daily integrated volatility matrices for the Shenzhen and Shanghai data sets are of sizes 410 by 410 and 630 by 630, respectively. The eigenvalues and eigenvectors of the sample variance matrix $\bar{S}_y$ are then evaluated, and the 20 largest eigenvalues, multiplied by 1000, are plotted in Figures 2 and 3 for the Shenzhen and Shanghai data sets, respectively. The plots show that the largest eigenvalue for the Shenzhen data and the two largest eigenvalues for the Shanghai data are much larger than the remaining eigenvalues, which are of a much smaller magnitude and decrease slowly.
Figure 2: Plots of the 20 largest eigenvalues of $\bar{S}_y$ for the data set from Shenzhen Stock Exchange (horizontal axis: index; vertical axis: eigenvalue). (a) The plot of all 20 largest eigenvalues. (b) The plot of the second largest to 20th largest eigenvalues.

Figure 3: Plots of the 20 largest eigenvalues of $\bar{S}_y$ for the data set from Shanghai Stock Exchange (horizontal axis: index; vertical axis: eigenvalue). (a) The plot of all 20 largest eigenvalues. (b) The plot of the third largest to 20th largest eigenvalues.

Figure 4: Plots of the 20 largest eigenvalues of $\bar{S}_y$ over 100 simulated samples. The horizontal axis indicates the 100 simulated samples, and the 20 largest eigenvalues of $\bar{S}_y$ for each sample are plotted vertically as 20 points. (a) and (b) correspond to the cases of $r = 1$ and $r = 2$, respectively.
4.2 A simulation study on volatility factor selection
Theorems 1 and 2 imply that the eigenvalue difference between $\bar{S}_y$ and $\bar{S}_x$ converges in probability to zero, where $\bar{S}_x$ has $r$ positive eigenvalues and $p - r$ zero eigenvalues. Thus we may select $r$ such that the smallest $p - r$ eigenvalues of $\bar{S}_y$ are close to 0 while the $r$ largest eigenvalues are significantly larger. Figures 2 and 3 suggest $r = 1$ and $r = 2$ for the data sets from the Shenzhen and Shanghai Exchanges, respectively. We conduct a simulation study below to provide some support for such empirical selection of $r$.
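The selection rule above can be illustrated with a short sketch. The eigenvalue-ratio heuristic used here is an assumption made for illustration only (the text selects $r$ by inspecting the eigenvalue plots), and the function name `select_num_factors` and the sample eigenvalue profiles are hypothetical.

```python
def select_num_factors(eigenvalues, max_r=None):
    """Pick r as the position of the largest ratio between consecutive
    eigenvalues (a common gap heuristic, not the paper's stated rule)."""
    lam = sorted(eigenvalues, reverse=True)
    if max_r is None:
        max_r = len(lam) - 1
    best_r, best_ratio = 1, 0.0
    for r in range(1, max_r + 1):
        if lam[r] <= 0:
            # All remaining eigenvalues vanish, so the gap after r is infinite.
            return r
        ratio = lam[r - 1] / lam[r]
        if ratio > best_ratio:
            best_r, best_ratio = r, ratio
    return best_r


# Hypothetical eigenvalue profiles with one and two dominant eigenvalues.
one_factor = [0.15, 0.012, 0.011, 0.010, 0.009]
two_factor = [0.90, 0.60, 0.030, 0.028, 0.026]
r_one = select_num_factors(one_factor)
r_two = select_num_factors(two_factor)
```

A ratio rule of this kind only formalizes the visual gap; it does not replace the simulation-based check described next.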
In the simulation study we consider two scenarios with $r = 1$ and $r = 2$, where $p = 410$ and $L = 177$. The simulation proceeds as follows. For the case of $r = 1$, we generate $\Sigma_f(\ell)$ from an AR(1) model with mean, AR coefficient and noise variance being (6, 0.65, 0.3), and then simulate $\Sigma_x(\ell)$ from the matrix factor model (10) with loading matrix $A$ formed by the eigenvector corresponding to the largest eigenvalue of $\bar{S}_y$ obtained from the Shenzhen data. For the case of $r = 2$, we take $\Sigma_f(\ell)[1,2] = \Sigma_f(\ell)[2,1] = 0$, and generate $\Sigma_f(\ell)[1,1]$ and $\Sigma_f(\ell)[2,2]$ from two AR(1) models with mean, AR coefficient and noise variance being (6, 0.65, 0.3) and (4, 0.5, 0.3), respectively, and we simulate $\Sigma_x(\ell)$ from the matrix factor model (10) with loading matrix $A$ formed by the two eigenvectors corresponding to the two largest eigenvalues of $\bar{S}_y$ obtained from the Shenzhen data.
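A minimal sketch of the $r = 1$ scenario follows. It assumes the AR(1) model is parametrized as $x_t = \mu + \phi(x_{t-1} - \mu) + e_t$ with $e_t \sim N(0, 0.3)$, omits any additive term in the factor model, and uses a hypothetical four-asset loading vector in place of the Shenzhen eigenvector; the function names are illustrative.

```python
import random


def simulate_ar1(mean, phi, noise_var, length, rng, burn_in=200):
    """Simulate x_t = mean + phi * (x_{t-1} - mean) + e_t, e_t ~ N(0, noise_var)."""
    sd = noise_var ** 0.5
    x = mean
    path = []
    for t in range(burn_in + length):
        x = mean + phi * (x - mean) + rng.gauss(0.0, sd)
        if t >= burn_in:
            path.append(x)
    return path


def factor_model_matrix(loading, sigma_f):
    """Single-factor case: Sigma_x = sigma_f * a a^T for a loading vector a."""
    return [[sigma_f * ai * aj for aj in loading] for ai in loading]


rng = random.Random(0)
L = 177
sigma_f = simulate_ar1(6.0, 0.65, 0.3, L, rng)

# Hypothetical unit-norm loading vector for p = 4 assets; the study itself uses
# the leading eigenvector obtained from the Shenzhen data (p = 410).
a = [0.5, 0.5, 0.5, 0.5]
sigma_x = [factor_model_matrix(a, s) for s in sigma_f]
```

Because the loading vector has unit norm, the trace of each simulated matrix equals the factor volatility of that day, which gives a quick sanity check.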
We simulate high-frequency price data from model (1) with zero drift by discretizing the diffusion equation,

$$X(t_k) = X(t_{k-1}) + \sigma_{t_{k-1}} \left[ W_{t_k} - W_{t_{k-1}} \right],$$

where $t_k = \ell - 1 + k/(3n)$, $k = 1, \cdots, 3n$, $n = 200$, $\ell = 1, \cdots, 177$. During the period of the $\ell$-th day, we take $\sigma_{t_k}$ to be $A[\Sigma_f(\ell) + 0.32 Z_k]^{1/2} A^T$, where $Z_k = (Z_k[j_1, j_2])_{1 \le j_1, j_2 \le r}$ are $r$ by $r$ matrices whose entries $Z_k[j_1, j_2]$ are standard normal random variables with temporal correlation $\mathrm{corr}(Z_k[j_1, j_2], Z_{k'}[j_1, j_2]) = \exp(-|k - k'|)$, and zero correlation for different entries, i.e. $\mathrm{corr}(Z_k[j_1, j_2], Z_{k'}[j_1', j_2']) = 0$ for $(j_1, j_2) \ne (j_1', j_2')$. Finally, data $Y_i(t_k)$ are obtained from model (2) by adding to $X(t_k)$ i.i.d. normal noise with mean zero and standard deviation 0.064.

We generate non-synchronized data as follows. Grouping together three consecutive time points, we divide the 600 time points $t_k$ during each day into 200 groups $\{t_{3j-2}, t_{3j-1}, t_{3j}\}$, $j = 1, \cdots, 200$. For each asset, we select one time point at random from each group; from the simulated 600 values of $Y_i(t_k)$ we choose the 200 values corresponding to the selected time points; and we use the 200 chosen values to form noisy non-synchronized high-frequency data $Y_i(t_j)$.
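The non-synchronized sampling step can be sketched as follows. This is only an illustration under stated assumptions: the helper name `sample_nonsynchronous` is hypothetical, and the price path is replaced by placeholder Gaussian draws rather than the simulated diffusion.

```python
import random


def sample_nonsynchronous(values, group_size=3, rng=None):
    """Keep one randomly chosen observation from each block of `group_size`
    consecutive observations; returns (selected indices, selected values)."""
    if rng is None:
        rng = random.Random()
    indices = []
    for start in range(0, len(values) - group_size + 1, group_size):
        indices.append(start + rng.randrange(group_size))
    return indices, [values[i] for i in indices]


rng = random.Random(1)
n = 200
# Placeholder for one asset's 3n = 600 noisy observations within a day.
y_full = [rng.gauss(0.0, 1.0) for _ in range(3 * n)]
idx, y_obs = sample_nonsynchronous(y_full, rng=rng)
```

Calling the helper separately for each asset yields observation times that differ across assets, which is what makes the data non-synchronized.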
We calculate the ARVM estimator $\widetilde{\Sigma}_y(\ell)$ based on the data in the $\ell$-th day and the threshold estimator $\widehat{\Sigma}_y(\ell)$ as described in Section 2.2. According to the description in Section 2.3 we compute $\bar{S}_y$ from $\widehat{\Sigma}_y(\ell)$ and then the eigenvalues and eigenvectors of $\bar{S}_y$. We repeat the whole simulation procedure 100 times. As in Wang and Zou (2010), the estimators $\widehat{\Sigma}_y(\ell)$ are tuned to minimize their estimated mean squared error based on the 100 repetitions. Figure 4 plots the 20 largest eigenvalues of $\bar{S}_y$ over the 100 simulated samples for the cases of $r = 1$ and $r = 2$. The plots show that for $r = 1$ the largest eigenvalues are clustered around 0.5, and for $r = 2$ the two largest eigenvalues fluctuate around 0.5 and 0.4, respectively. In both cases these large eigenvalues are much larger than the remaining eigenvalues, which are close to zero. Moreover, the clusters in Figure 4 for the 100 simulated samples are quite tight and well separated. The simulation results indicate that the largest eigenvalue (for $r = 1$) and the two largest eigenvalues (for $r = 2$) are significant, and hence the selection of volatility factors based on large eigenvalues matches the true values of $r$ in the corresponding cases.
The daily average numbers of intraday observations over the 177 days for the stocks traded in the Shenzhen and Shanghai markets range from around 200 to over 1000. As the simulation results reported above are for the case with 200 intraday observations, we also increased the number of intraday observations from 200 to 600 and 1000 in the simulation study and found similar cluster patterns for the eigenvalues. In fact, the eigenvalue clusters become tighter as the number of intraday observations increases.
The procedure in Hansen and Lunde (2005) is used to calculate the noise-to-signal ratios for the simulated and real data. The average noise-to-signal ratio over 177 days is found to be 0.009 and 0.002 for the stocks traded in the Shenzhen and Shanghai markets, respectively. The noise standard deviation 0.064 used in the simulation amounts to an average noise-to-signal ratio of 0.009. To replicate the noise-to-signal ratio scenarios in the real data, we reduce the ratio in the simulation study by decreasing the noise standard deviation from 0.064 to 0.02, which lowers the average noise-to-signal ratio from 0.009 to 0.001. Again the eigenvalues exhibit similar patterns. Moreover, we find that the smaller the noise standard deviation is, the tighter the eigenvalue clusters are.
We propose a data-dependent method to select $m$ for the ARVM estimator defined in (6) and (7) as follows. Let $m$ be the grid number of pre-sampling frequencies $\tau_k$ in (5). To denote the dependence on $m$, we add the superscript $m$ to the daily ARVM estimators given by (6) and (7) and denote them by $\widetilde{\Sigma}^m_y(\ell) = (\widetilde{\Sigma}^m_y(\ell)[i_1, i_2])$ for the $\ell$-th day, $\ell = 1, \cdots, L$. Since for each $(i_1, i_2)$, $\widetilde{\Sigma}^m_y(\ell)[i_1, i_2]$ is a daily realized co-volatility between assets $i_1$ and $i_2$, we predict the one-day-ahead daily realized co-volatility by the current daily realized co-volatility and use the prediction errors as a criterion to select $m$. Let

$$\Psi(m) = \frac{1}{p^2 L} \sum_{i_1=1}^{p} \sum_{i_2=1}^{p} \sum_{\ell=2}^{L} \left\{ \widetilde{\Sigma}^m_y(\ell-1)[i_1, i_2] - \widetilde{\Sigma}^m_y(\ell)[i_1, i_2] \right\}^2 .$$

The value of $m$ is selected by minimizing $\Psi(m)$, and we use the selected value to define the ARVM estimator $\widetilde{\Sigma}^m_y(\ell)$ and evaluate the estimated daily integrated volatility matrices.
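The criterion $\Psi(m)$ translates directly into code. The sketch below assumes the daily ARVM estimates for each candidate $m$ are already available as nested lists; the function names and the toy inputs are hypothetical.

```python
def prediction_criterion(daily_matrices):
    """Psi(m) for one value of m. `daily_matrices` is a list of L p-by-p
    matrices (nested lists), one estimated matrix per day."""
    L = len(daily_matrices)
    p = len(daily_matrices[0])
    total = 0.0
    # Squared one-day-ahead prediction errors, using day l-1 to predict day l.
    for prev, curr in zip(daily_matrices[:-1], daily_matrices[1:]):
        for i1 in range(p):
            for i2 in range(p):
                total += (prev[i1][i2] - curr[i1][i2]) ** 2
    return total / (p * p * L)


def select_m(estimates_by_m):
    """Return the candidate m whose estimates minimize Psi(m)."""
    return min(estimates_by_m, key=lambda m: prediction_criterion(estimates_by_m[m]))


# Toy input with p = 1 and L = 3 for two hypothetical candidate values of m.
estimates_by_m = {
    5: [[[1.0]], [[2.0]], [[4.0]]],   # Psi = (1 + 4) / 3
    10: [[[1.0]], [[1.5]], [[2.0]]],  # Psi = (0.25 + 0.25) / 3
}
best_m = select_m(estimates_by_m)
```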
4.3 Matrix factor model and VAR model fitting
The patterns exhibited in Figures 2 and 3 and the simulation study lead us to select $r = 1$ and $r = 2$ for the Shenzhen and Shanghai data sets, respectively. We first analyze the Shenzhen Stock Exchange data with $r = 1$. Let $\widehat{A}$ be the eigenvector of $\bar{S}_y$ corresponding to the largest eigenvalue. We then evaluate the factor volatility sequence $\widehat{\Sigma}_f(\ell) = \widehat{A}^T \widehat{\Sigma}_y(\ell) \widehat{A}$, $\ell = 1, \cdots, L = 177$, which is now a univariate time series. An AR(3) model, selected from the PACF together with AIC and BIC, is fitted to the time series $\widehat{\Sigma}_f(\ell)$. Figure 5 displays the time series plots and the ACF plots of both the original time series $\widehat{\Sigma}_f(\ell)$ and the residuals resulting from the AR(3) fit. It shows that the factor model and the AR(3) model for the factors provide reasonably good fits to the data.
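For the single-factor case, the two computations in this step (leading eigenvector, then projection of each daily matrix) can be sketched without external libraries. Power iteration is used here purely for illustration and is an assumption, since the text does not say how the eigen-analysis was carried out; the function names and the 2 by 2 toy matrices are hypothetical.

```python
def leading_eigenvector(matrix, iterations=500):
    """Power iteration for the unit eigenvector belonging to the largest
    eigenvalue of a symmetric non-negative definite matrix (nested lists).
    Assumes the start vector is not orthogonal to that eigenvector."""
    p = len(matrix)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def factor_volatility(a, sigma_y):
    """Scalar factor volatility a^T Sigma_y a for a single factor (r = 1)."""
    p = len(a)
    return sum(a[i] * sigma_y[i][j] * a[j] for i in range(p) for j in range(p))


# Toy stand-in for the matrix whose leading eigenvector defines the loading.
s_bar = [[4.0, 2.0], [2.0, 1.0]]
a_hat = leading_eigenvector(s_bar)

# Three hypothetical daily matrices of the form s * a a^T.
daily_sigma_y = [
    [[s * a_hat[i] * a_hat[j] for j in range(2)] for i in range(2)]
    for s in (0.04, 0.08, 0.06)
]
sigma_f_hat = [factor_volatility(a_hat, m) for m in daily_sigma_y]
```

The resulting list `sigma_f_hat` is the univariate series to which an autoregressive model would then be fitted.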
Now we move to the analysis of the Shanghai Stock Exchange data with $r = 2$. The estimator $\widehat{A}$ of the factor loadings $A$ is taken to be the $630 \times 2$ matrix consisting of the two eigenvectors of $\bar{S}_y$ corresponding to the two largest eigenvalues. Now the daily factor volatilities $\widehat{\Sigma}_f(\ell) = \widehat{A}^T \widehat{\Sigma}_y(\ell) \widehat{A}$, $\ell = 1, \cdots, L = 177$, form a series of $2 \times 2$ matrices.

Take the two diagonal elements and one off-diagonal element from $\widehat{\Sigma}_f(\ell)$ to form the trivariate time series $\mathrm{vech}\{\widehat{\Sigma}_f(\ell)\}$, which is plotted in Figure 6. We fit a VAR model to $\mathrm{vech}\{\widehat{\Sigma}_f(\ell)\}$ and use the AIC and BIC criteria to select its order $q$.

The fitting yields a VAR model of order $q = 2$ with the estimated coefficients
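$$\widehat{\alpha}_0 = \begin{pmatrix} 0.008 \\ 0.003 \\ 0.008 \end{pmatrix}, \quad \widehat{\alpha}_1 = \begin{pmatrix} 0.016 & 0.099 & 0.162 \\ -0.232 & -0.396 & 0.822 \\ -0.407 & -0.747 & 1.218 \end{pmatrix}, \quad \widehat{\alpha}_2 = \begin{pmatrix} 0.523 & 1.295 & -0.981 \\ 0.109 & 0.262 & -0.203 \\ 0.387 & 0.961 & -0.649 \end{pmatrix}$$

and the estimated innovation covariance matrix

$$\begin{pmatrix} 0.0045 & -0.0011 & 0.0010 \\ -0.0011 & 0.0006 & 0.0002 \\ 0.0010 & 0.0002 & 0.0007 \end{pmatrix}.$$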
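To show how fitted coefficients of this kind would be used, the sketch below iterates a VAR recursion forward. It assumes the model has the form $x_\ell = \alpha_0 + \alpha_1 x_{\ell-1} + \alpha_2 x_{\ell-2} + e_\ell$ with future innovations set to zero, and that the reported coefficient matrices are read row by row; the function name `var_forecast` and the two "recent" observations are hypothetical.

```python
def var_forecast(alpha0, alphas, history, steps):
    """Iterate x_t = alpha0 + sum_j alphas[j-1] x_{t-j} forward `steps` times.
    `history` holds the most recent observations, oldest first."""
    q = len(alphas)
    d = len(alpha0)
    path = [list(x) for x in history[-q:]]
    forecasts = []
    for _ in range(steps):
        nxt = list(alpha0)
        for j, coef in enumerate(alphas, start=1):
            lagged = path[-j]  # observation j steps back
            for row in range(d):
                nxt[row] += sum(coef[row][col] * lagged[col] for col in range(d))
        path.append(nxt)
        forecasts.append(nxt)
    return forecasts


# Coefficients as reported in the text (row-by-row reading is an assumption).
alpha0 = [0.008, 0.003, 0.008]
alpha1 = [[0.016, 0.099, 0.162], [-0.232, -0.396, 0.822], [-0.407, -0.747, 1.218]]
alpha2 = [[0.523, 1.295, -0.981], [0.109, 0.262, -0.203], [0.387, 0.961, -0.649]]

# Hypothetical last two observations of the trivariate factor volatility series.
recent = [[0.03, 0.04, 0.05], [0.03, 0.04, 0.05]]
forecasts = var_forecast(alpha0, [alpha1, alpha2], recent, steps=2)
```

Each forecast vector holds the two diagonal elements and the off-diagonal element of the predicted factor volatility matrix, in that order.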
The ACFs of $\mathrm{vech}\{\widehat{\Sigma}_f(\ell)\}$ plotted in Figure 7 show that the factor volatility series are highly correlated. Figure 8(a-c) displays the residuals resulting from the above model fitting, whose ACFs are plotted in Figure 9. These plots indicate that the VAR(2) model provides an adequate fit to the data.

Figure 5: Fitting Shenzhen data: (a) time plot of factor volatility series, (b) ACF of factor volatility series, (c) PACF of factor volatility series, (d) time plot of the residuals from the AR(3) fitting, (e) ACF of the residuals, and (f) PACF of the residuals.

Figure 6: Time plots for $\mathrm{vech}(\widehat{\Sigma}_f)$ for the Shanghai Stock Exchange data. (a) and (b) correspond to the first and second diagonal elements of $\widehat{\Sigma}_f$, respectively, with (c) for the off-diagonal element of $\widehat{\Sigma}_f$.
Figure 7: ACF plots of the corresponding factor volatility $\mathrm{vech}(\widehat{\Sigma}_f)$ displayed in Figure 6 for the data set from Shanghai Stock Exchange. The three plots on the diagonal correspond to the ACFs of the three factor volatility components, with off-diagonal plots for their cross ACFs.

Figure 8: Time plots of the residuals resulting from a VAR(2) fitting to $\mathrm{vech}(\widehat{\Sigma}_f)$ for the Shanghai Stock Exchange data. (a) and (b) correspond to the first and second diagonal elements of $\widehat{\Sigma}_f$, respectively, and (c) to the off-diagonal element of $\widehat{\Sigma}_f$.

Figure 9: ACF plots of the corresponding three residual components in Figure 8 for the data set from Shanghai Stock Exchange. The three plots on the diagonal correspond to the ACFs of the three residual components, with off-diagonal plots for their cross ACFs.

5 Conclusions

In this paper, we have proposed a novel approach to model the volatility and co-volatility dynamics of daily returns for a large number of financial assets based on high-frequency intraday data. The core of the proposed method is to impose a matrix form of factor model on the sparse versions of realized volatility estimators obtained via thresholding. The fitting of the factor model boils down to an eigen-analysis for a non-negative definite matrix, and therefore is feasible with an ordinary PC when the number of assets is of the order of a few thousand. The asymptotic theory is developed in a framework in which the number of assets, the number of intraday observations and the number of days concerned all go to infinity together. Numerical illustration with intraday prices from both the Shenzhen and Shanghai markets indicates that the factor modeling strategy works effectively, as the daily volatility dynamics of all the assets in those two markets were driven by one (for Shenzhen) or two (for Shanghai) common factors.
As far as we are aware, this work represents the first attempt to use high-frequency data to model ultra-high dimensional volatility matrices and to combine high-frequency volatility matrix estimation with low-frequency volatility dynamic models. While the approach yields new volatility estimation and prediction procedures that are better than methods based only on either high-frequency volatility estimation or low-frequency volatility dynamic modeling, we leave some open issues as well as a number of important future research topics. For example, since volatility factors are important both statistically and economically, it is desirable to have data-driven methods to select the number of significant factors for fitting the VAR model. Since the ARVM estimator is used to estimate daily volatility matrices and perform the eigen-analysis in Sections 2.2 and 2.3, it would be interesting and challenging to investigate the performance of the methodology when other volatility matrix estimators are employed instead. Large volatility matrix prediction is another important research topic. For example, the fitted matrix factor and VAR(2) models obtained from the Shanghai market data can be used to forecast a future integrated volatility matrix by first predicting the $h$-step-ahead factor volatility $\Sigma_f(L+h)$ from the derived VAR(2) model and then using the matrix factor model to obtain the corresponding volatility matrix $\Sigma_x(L+h)$. However, for the prediction of large volatility matrices, we need to properly gauge the prediction error and investigate the impact of matrix size on the prediction.
6 Appendix: Proofs of Theorems
Besides the matrix norm, we need two other $\ell_d$ norms. Given a $p$-dimensional vector $x = (x_1, \cdots, x_p)^T$ and a $p$ by $p$ matrix $U = (U_{ij})$, define their $\ell_d$-norms as follows,

$$\|x\|_d = \left( \sum_{i=1}^{p} |x_i|^d \right)^{1/d}, \quad \|U\|_d = \sup\{ \|U x\|_d, \ \|x\|_d = 1 \}, \quad d = 1, 2, \infty.$$

Note the facts that $\|U\|_2$ is equal to the square root of the largest eigenvalue of $U^T U$,

$$\|U\|_1 = \max_{1 \le j \le p} \sum_{i=1}^{p} |U_{ij}|, \quad \|U\|_\infty = \max_{1 \le i \le p} \sum_{j=1}^{p} |U_{ij}|, \quad \text{and} \quad \|U\|_2^2 \le \|U\|_1 \|U\|_\infty.$$

For symmetric $U$, $\|U\|_2$ is equal to its largest absolute eigenvalue, and $\|U\|_2 \le \|U\|_1 = \|U\|_\infty$. Denote by $C$ a generic constant whose value may change from appearance to appearance.
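The three matrix norms and the inequality $\|U\|_2^2 \le \|U\|_1 \|U\|_\infty$ can be checked numerically on a small example. In this sketch the spectral norm is approximated by power iteration on $U^T U$, which is an illustrative choice; the function names and the test matrix are hypothetical.

```python
def norm_1(U):
    """Maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in U) for j in range(len(U[0])))


def norm_inf(U):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in U)


def norm_2(U, iterations=1000):
    """Spectral norm: square root of the largest eigenvalue of U^T U,
    approximated by power iteration on the Gram matrix."""
    n = len(U[0])
    gram = [[sum(row[i] * row[j] for row in U) for j in range(n)] for i in range(n)]
    v = [1.0 / n ** 0.5] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * x for x in w) ** 0.5  # estimate of the top eigenvalue
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam ** 0.5


U = [[1.0, 2.0], [3.0, 4.0]]
n1, n2, ninf = norm_1(U), norm_2(U), norm_inf(U)
inequality_holds = n2 ** 2 <= n1 * ninf
```

For this matrix the column-sum and row-sum norms are 6 and 7, so the product bound on the squared spectral norm is 42.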
Before proving the theorems we need to establish six lemmas. Lemmas 1 and 2 show that Condition A2 gives an order for $\|\Sigma_x(\ell)\|_2$ while Condition A1 together with A2 guarantees sparsity for all $\Sigma_x(\ell)$.

Lemma 1 Assumption A2 implies that the maximum eigenvalue of $\Sigma_x(\ell)$ is bounded uniformly over $\ell = 1, \cdots, L$, that is,
max
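Proof. From the factor model (10) and the sub-multiplicative property of the norm $\|\cdot\|_2$ (i.e. $\|U V\|_2 \le \|U\|_2 \|V\|_2$ for matrices $U$ and $V$), we have

$$\|\Sigma_x(\ell)\|_2 \le \|A \Sigma_f(\ell) A^T + \Sigma_0\|_2 \le \|A\|_2 \|\Sigma_f(\ell)\|_2 \|A^T\|_2 + \|\Sigma_0\|_2 \le r^2 \sum_{j=1}^{r} \Sigma_f(\ell)[j, j] + \|\Sigma_0\|_2,$$

where we use the facts that $\|A^T\|_2, \|A\|_2 \le \mathrm{trace}(A A^T) = \mathrm{trace}(A^T A) = r$, and $\|\Sigma_f(\ell)\|_2 \le \mathrm{trace}(\Sigma_f(\ell)) = \sum_{j=1}^{r} \Sigma_f(\ell)[j, j]$. The lemma is a direct consequence of Assumption A2.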
Lemma 2 Assumptions A1 and A2 imply sparsity for $\Sigma_x(\ell)$ uniformly over $\ell = 1, \cdots, L$, that is,

$$\sum_{j=1}^{p} |\Sigma_x(\ell)[i, j]|^\delta \le M \pi(p, L), \quad i = 1, \cdots, p, \ \ell = 1, \cdots, L, \qquad (19)$$

where $M$ is a positive random variable, $\pi(p, L) = \pi(p) \log^\delta L$, and $\delta$ and $\pi(p)$ are given as in Assumption A1.
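Proof. First we give an inequality that for any $y_1, \cdots, y_m$,

$$\left( \sum_{j=1}^{m} |y_j| \right)^\delta \le \sum_{j=1}^{m} |y_j|^\delta. \qquad (20)$$

Take $w_j = |y_j| / \sum_{k=1}^{m} |y_k|$. Then $\sum_{j=1}^{m} w_j = 1$, $0 \le w_j \le 1$, and $w_j^\delta \ge w_j$. The inequality is proved as follows,

$$\sum_{j=1}^{m} w_j^\delta \ge \sum_{j=1}^{m} w_j = 1.$$

Inequality (20) indicates that the sum of two sparse matrices is also sparse. Thus with Condition A1 and (10) it is enough to show that $A \Sigma_f(\ell) A^T$ is sparse for $\ell = 1, \cdots, L$.

Let $A = (a_{ij})$, $\Sigma_f(\ell) = (\Sigma_f(\ell)[i, j])$, $U = A \Sigma_f(\ell) A^T = (u_{ij})$, and $G = \max\{ |\Sigma_f(\ell)[i, j]|, \ \ell = 1, \cdots, L, \ 1 \le i, j \le r \}$. Hence,

$$|u_{ij}|^\delta = \left| \sum_{h=1}^{r} \sum_{k=1}^{r} a_{ih} \Sigma_f(\ell)[h, k] a_{jk} \right|^\delta \le \sum_{h=1}^{r} \sum_{k=1}^{r} |a_{ih} \Sigma_f(\ell)[h, k] a_{jk}|^\delta \le G^\delta \sum_{h=1}^{r} \sum_{k=1}^{r} |a_{ih} a_{jk}|^\delta,$$

$$\sum_{j=1}^{p} |u_{ij}|^\delta \le G^\delta \sum_{h=1}^{r} \sum_{k=1}^{r} |a_{ih}|^\delta \sum_{j=1}^{p} |a_{jk}|^\delta \le r^2 C G^\delta \pi(p), \qquad (21)$$

where the last inequality is from the facts that the elements of $A$ are bounded by 1 and the column vectors of $A$ obey (18). As $G = O_P(\log L)$, the bound $r^2 C G^\delta \pi(p)$ on the right hand side of (21) can be expressed as $M \pi(p, L)$.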
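The next lemma derives the summation results under the sparsity established in Lemma 2.

Lemma 3 The sparsity established in Lemma 2 for all $\Sigma_x(\ell)$ implies that for any fixed $a > 0$,

$$\max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} |\Sigma_x(\ell)[i, j]| \, 1(|\Sigma_x(\ell)[i, j]| \le a\varpi) = O_P(\pi(p, L) \varpi^{1-\delta}), \qquad (22)$$

$$\max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1(|\Sigma_x(\ell)[i, j]| \ge a\varpi) = O_P(\pi(p, L) \varpi^{-\delta}). \qquad (23)$$

Proof. With simple algebraic manipulations we obtain

$$\begin{aligned}
\max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} |\Sigma_x(\ell)[i, j]| \, 1(|\Sigma_x(\ell)[i, j]| \le a\varpi)
&\le (a\varpi)^{1-\delta} \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} |\Sigma_x(\ell)[i, j]|^\delta \, 1(|\Sigma_x(\ell)[i, j]| \le a\varpi) \\
&\le (a\varpi)^{1-\delta} \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} |\Sigma_x(\ell)[i, j]|^\delta \\
&\le (a\varpi)^{1-\delta} M \pi(p, L) = O_P(\pi(p, L) \varpi^{1-\delta}),
\end{aligned}$$

which proves (22). (23) is proved as follows,

$$\begin{aligned}
\max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1(|\Sigma_x(\ell)[i, j]| \ge a\varpi)
&\le \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} \left( \frac{|\Sigma_x(\ell)[i, j]|}{a\varpi} \right)^\delta 1(|\Sigma_x(\ell)[i, j]| \ge a\varpi) \\
&\le (a\varpi)^{-\delta} \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} |\Sigma_x(\ell)[i, j]|^\delta \\
&\le (a\varpi)^{-\delta} M \pi(p, L) = O_P(\pi(p, L) \varpi^{-\delta}).
\end{aligned}$$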
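The two elementary bounds used in this proof can be verified on a single matrix row. The sketch below is illustrative only: the function name `lemma3_bounds` and the sample row are hypothetical, `threshold` plays the role of $a\varpi$, and $0 \le \delta < 1$ is assumed.

```python
def lemma3_bounds(row, threshold, delta):
    """For one matrix row, return (small_sum, small_bound, large_count,
    large_bound), where the bounds are threshold^(1-delta) * S and
    threshold^(-delta) * S with S = sum_j |x_j|^delta."""
    # Zero entries contribute nothing to S (this also covers delta = 0).
    sparsity = sum(abs(x) ** delta for x in row if x != 0.0)
    small_sum = sum(abs(x) for x in row if abs(x) <= threshold)
    large_count = sum(1 for x in row if abs(x) >= threshold)
    small_bound = threshold ** (1 - delta) * sparsity
    large_bound = threshold ** (-delta) * sparsity
    return small_sum, small_bound, large_count, large_bound


row = [0.5, 0.01, 0.2, 0.001, 0.0]
small_sum, small_bound, large_count, large_bound = lemma3_bounds(row, 0.1, 0.5)
```

Both inequalities hold entry by entry, which is why taking maxima over rows and days in the proof preserves them.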
Next two lemmas are results about ARVM estimator ˜Σy(ℓ) that we need later to establish
Lemma 4 Under Models (1)-(2) and Conditions A3-A4 we have for all $1 \le i, j \le p$ and $1 \le \ell \le L$,

$$E\left( |\widetilde{\Sigma}_y(\ell)[i, j] - \Sigma_x(\ell)[i, j]|^\beta \right) \le C e_n^\beta, \qquad (24)$$

where $C$ is a generic constant free of $n$, $p$ and $L$, and the convergence rate $e_n$ is specified as $e_n \sim n^{-1/6}$ for the noise case and $e_n \sim n^{-1/3}$ for the noiseless case [i.e. $\varepsilon_i(t_{ij}) = 0$ in (2)].
Proof. The lemma is a consequence of applying Theorem 1 in Wang and Zou (2010) to the current set-up.
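Lemma 5 Under conditions A1-A4, we have

$$\max_{1 \le \ell \le L} \max_{1 \le i, j \le p} |\widetilde{\Sigma}_y(\ell)[i, j] - \Sigma_x(\ell)[i, j]| = O_P\left( e_n (p^2 L)^{1/\beta} \right) = o_P(\varpi), \qquad (25)$$

$$P\left( \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1\{ |\widetilde{\Sigma}_y(\ell)[i, j] - \Sigma_x(\ell)[i, j]| \ge \varpi/2 \} > 0 \right) = o(1), \qquad (26)$$

$$\max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1(|\widetilde{\Sigma}_y(\ell)[i, j]| \ge \varpi, \ |\Sigma_x(\ell)[i, j]| < \varpi) = O_P(\pi(p, L) \varpi^{-\delta}), \qquad (27)$$

where $\varpi$ is as in Theorem 1.

Proof. Taking $d = d_1 e_n (p^2 L)^{1/\beta}$ and applying the Markov inequality and (24), we have

$$P\left( \max_{1 \le \ell \le L} \max_{1 \le i, j \le p} |\widetilde{\Sigma}_y(\ell)[i, j] - \Sigma_x(\ell)[i, j]| > d \right) \le \sum_{\ell=1}^{L} \sum_{i,j=1}^{p} P\left( |\widetilde{\Sigma}_y(\ell)[i, j] - \Sigma_x(\ell)[i, j]| > d \right) \le \frac{C p^2 L e_n^\beta}{d^\beta} = \frac{C}{d_1^\beta} \to 0,$$

as $p, n, L \to \infty$ and then $d_1 \to \infty$. This proves (25), using which we can obtain

$$\begin{aligned}
P\left( \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1\{ |\widetilde{\Sigma}_y(\ell)[i, j] - \Sigma_x(\ell)[i, j]| \ge \varpi/2 \} > 0 \right)
&\le P\left( \max_{1 \le \ell \le L} \max_{1 \le i, j \le p} |\widetilde{\Sigma}_y(\ell)[i, j] - \Sigma_x(\ell)[i, j]| \ge \varpi/2 \right) \\
&\le \frac{2^\beta p^2 L C e_n^\beta}{\varpi^\beta} = \frac{2^\beta C}{\log^\beta L} \to 0,
\end{aligned}$$

as $n, p, L \to \infty$, which proves (26). Then we apply (23) and (26) to show (27) as follows.

$$\begin{aligned}
&\max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1(|\widetilde{\Sigma}_y(\ell)[i, j]| \ge \varpi, \ |\Sigma_x(\ell)[i, j]| < \varpi) \\
&\quad \le \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1(|\widetilde{\Sigma}_y(\ell)[i, j]| \ge \varpi, \ |\Sigma_x(\ell)[i, j]| \le \varpi/2) + \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1(|\widetilde{\Sigma}_y(\ell)[i, j]| \ge \varpi, \ \varpi/2 < |\Sigma_x(\ell)[i, j]| < \varpi) \\
&\quad \le \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1(|\widetilde{\Sigma}_y(\ell)[i, j] - \Sigma_x(\ell)[i, j]| \ge \varpi/2) + \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} 1(|\Sigma_x(\ell)[i, j]| > \varpi/2) \\
&\quad \le o_P(1) + 2^\delta M \pi(p, L) \varpi^{-\delta} = O_P(\pi(p, L) \varpi^{-\delta}).
\end{aligned}$$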
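The next lemma provides the convergence rate for the TARVM estimator $\widehat{\Sigma}_y(\ell)$ under the matrix norm uniformly over all $\ell$.

Lemma 6 Under conditions A1-A4 we have

$$\max_{1 \le \ell \le L} \|\widehat{\Sigma}_y(\ell) - \Sigma_x(\ell)\|_2 = O_P(\pi(p, L) \varpi^{1-\delta}) = O_P\left( \pi(p) \left[ e_n (p^2 L)^{1/\beta} \right]^{1-\delta} \log L \right),$$

where $e_n$ and $\varpi$ are as in Theorem 1.

Proof. Using the relationship between the $\ell_2$ and $\ell_\infty$ norms and the triangle inequality, we have

$$\max_{1 \le \ell \le L} \|\widehat{\Sigma}_y(\ell) - \Sigma_x(\ell)\|_2 \le \max_{1 \le \ell \le L} \|\widehat{\Sigma}_y(\ell) - \Sigma_x(\ell)\|_\infty \le \underbrace{\max_{1 \le \ell \le L} \|\widehat{\Sigma}_y(\ell) - T_\varpi[\Sigma_x(\ell)]\|_\infty}_{I} + \underbrace{\max_{1 \le \ell \le L} \|T_\varpi[\Sigma_x(\ell)] - \Sigma_x(\ell)\|_\infty}_{II}.$$

Lemma 3 implies

$$II = \max_{1 \le \ell \le L} \max_{1 \le i \le p} \sum_{j=1}^{p} |\Sigma_x(\ell)[i, j]| \, 1(|\Sigma_x(\ell)[i, j]| \le \varpi) = O_P(\pi(p, L) \varpi^{1-\delta}).$$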