MIDAS Models - Some applications of mixed data sampling regression models

The model discussed in this section is the natural multivariate extension of the univariate Mixed Data Sampling (MIDAS) regression models proposed by Ghysels, Santa-Clara and Valkanov (2004a), combined with the factor model approach. The MIDAS class of models allows us to extract additional information from the fact that the data of interest usually have a lower frequency than the data available to the researcher. Thus, theoretically, projecting lower-frequency data on higher-frequency data should increase estimation efficiency relative to projecting lower-frequency data on lower-frequency data. However, this approach cannot be implemented directly. Consider the following example. A researcher wants to project the next day's realized volatility (daily squared returns) on one day of five-minute squared returns. "Direct" projection requires estimating 288 parameters, one for each five-minute interval in a 24-hour day. Instead, the authors propose parameterizing the set of parameters by some reasonable function. The idea behind this approach is well known from distributed lag models (see, for example, Judge et al. (1985)). One of the problems arising in this class of models is the proper "superparameterization" of the parameters. Following the original paper, two possible parameterizations are considered: the Beta lag polynomial and the exponential Almon lag. For completeness, their formulae are provided below.

The beta polynomial is given by

$$a_j(a, b) = \frac{f\left(\frac{j}{j^{\max}}, a, b\right)}{\sum_{j=1}^{j^{\max}} f\left(\frac{j}{j^{\max}}, a, b\right)} \tag{4.4}$$

where $f(x, a, b) = x^{a-1}(1-x)^{b-1}/B(a, b)$ and $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$. The exponential Almon lag polynomial is given by

$$a_j(\kappa_1, \kappa_2, \ldots, \kappa_Q) = \frac{e^{\kappa_1 j + \ldots + \kappa_Q j^Q}}{\sum_{j=1}^{j^{\max}} e^{\kappa_1 j + \ldots + \kappa_Q j^Q}} \tag{4.5}$$

Both have features that are important for volatility prediction. Both are strictly positive, which is required for a.s. positive definiteness of the estimated volatility; both allow for equal weights ($a = b = 1$ and $\kappa_1 = \kappa_2 = \ldots = \kappa_Q = 0$, respectively), which corresponds to a rolling sample estimator of the volatility; and both can produce the slowly decaying pattern that is typical of a volatility filter. Specification (4.5) allows more than two superparameters to govern the behavior of the polynomial. Specification (4.4) is more stable.
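The two weighting schemes can be illustrated with a minimal Python sketch. The function names `beta_weights` and `exp_almon_weights` are hypothetical and not taken from the original paper; the code simply evaluates (4.4) and (4.5) on the grid $j = 1, \ldots, j^{\max}$ and normalizes the result.

```python
import math


def beta_weights(a, b, j_max):
    """Normalized Beta lag weights a_j(a, b) for j = 1..j_max, as in Eq. (4.4)."""
    beta_fn = math.gamma(a) * math.gamma(b) / math.gamma(a + b)  # B(a, b)

    def f(x):
        # Beta density kernel; B(a, b) cancels after normalization but is
        # kept here to mirror the formula in the text.
        return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn

    # Note: the last grid point is x = 1, where the weight is zero for b > 1
    # and undefined (division by zero) for b < 1.
    raw = [f(j / j_max) for j in range(1, j_max + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def exp_almon_weights(kappas, j_max):
    """Normalized exponential Almon weights, Eq. (4.5); kappas = (k_1, ..., k_Q)."""
    raw = [
        math.exp(sum(k * j ** (q + 1) for q, k in enumerate(kappas)))
        for j in range(1, j_max + 1)
    ]
    total = sum(raw)
    return [w / total for w in raw]


# Equal weights: a = b = 1 for the Beta lag, all kappas = 0 for the Almon lag.
flat_beta = beta_weights(1.0, 1.0, 50)
flat_almon = exp_almon_weights((0.0, 0.0), 50)

# A slowly decaying pattern, e.g. a = 1 and b = 4 (illustrative values only).
decaying = beta_weights(1.0, 4.0, 50)
```

The values $a = 1$, $b = 4$ and the truncation at 50 lags are chosen only for illustration; any estimated superparameters would replace them in practice.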

4.3.1 Diagonal Factor MIDAS

The model proposed in this section combines a factor autoregressive conditional heteroskedasticity (FARCH) model with the idea of mixed frequencies proposed by Ghysels, Santa-Clara and Valkanov (2004a). It aims, first, to be parsimonious, with a small number of parameters, and second, to incorporate the high-frequency data now available to the researcher. The proposed model assumes that the returns on $k$ assets between time $t$ and $t+h$ are conditionally multivariate normal with mean zero and conditional covariance matrix $H_{t+h,t}$. The zero-mean assumption can also be satisfied by prefiltering the returns of interest.

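$$r_{t+h,t} \mid \mathcal{F}_{t-1} \sim N(0, H_{t+h,t})$$

$$r_{t+h,t} = \Lambda f_{t+h,t} + \varepsilon_{t+h,t}$$

$$H_{t+h,t} \equiv \Lambda F_{t+h,t} \Lambda' + \Sigma$$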

where $F_{t+h,t}$ is the $m \times m$ conditional covariance of observable factors, $\Lambda$ is the $n \times m$ factor loading matrix, and $\Sigma$ is the $n \times n$ constant covariance of idiosyncratic noise. Here we assume that $\Sigma$ is diagonal. The log-likelihood function can be written as

$$L = -\frac{1}{2} \sum_{t = h\tau}^{T} \left( k \log(2\pi) + \log\left|H_{t+h,t}\right| + r_{t+h,t}' H_{t+h,t}^{-1} r_{t+h,t} \right) \tag{4.6}$$
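As a rough illustration of how one term of this log-likelihood could be evaluated, the following pure-Python sketch builds $H = \Lambda F \Lambda' + \Sigma$ for diagonal $F$ and diagonal $\Sigma$ and computes the Gaussian term through a Cholesky factorization. The function names and the list-of-lists matrix layout are assumptions made for this example, and the return dimension is taken from the length of the return vector.

```python
import math


def covariance(loadings, factor_var, idio_var):
    """H = Lambda F Lambda' + Sigma, with diagonal F and diagonal Sigma.

    loadings:   n x m list of lists (Lambda)
    factor_var: length-m list, the diagonal of F
    idio_var:   length-n list, the diagonal of Sigma
    """
    n, m = len(loadings), len(factor_var)
    H = [
        [sum(loadings[i][k] * factor_var[k] * loadings[j][k] for k in range(m))
         for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        H[i][i] += idio_var[i]
    return H


def cholesky(H):
    """Lower-triangular L with L L' = H; fails if H is not positive definite."""
    n = len(H)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = H[i][j] - sum(L[i][p] * L[j][p] for p in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def gaussian_loglik_term(r, H):
    """One summand of Eq. (4.6): -(1/2) [n log(2 pi) + log|H| + r' H^{-1} r]."""
    n = len(r)
    L = cholesky(H)
    # Forward substitution solves L z = r, so that r' H^{-1} r = z' z.
    z = []
    for i in range(n):
        z.append((r[i] - sum(L[i][p] * z[p] for p in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    quad = sum(v * v for v in z)
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + quad)


# Illustrative numbers only: two assets, one factor.
H = covariance([[1.0], [2.0]], [0.5], [0.1, 0.2])
term = gaussian_loglik_term([0.01, -0.02], H)
```

Summing `gaussian_loglik_term` over the non-overlapping observation dates would give the full sample log-likelihood; a production implementation would more likely rely on a numerical linear algebra library.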

The main departure from the "standard" factor model is the use of different frequencies at the estimation stage. In the basic setup, the vector of return weights $s_k$ that defines factor $k$ is assumed to be known. The first factor is assumed to be "the market", i.e., $s_1 = \iota/k$, where $\iota$ is a $1 \times k$ vector of ones. The other factors are constructed using the errors from the linear projection of the individual returns on the market factor. All factors are constructed to be mutually orthogonal in the unconditional sense. Further, orthogonality is assumed to hold for the conditional variance-covariance matrix of the factors. The proposed procedure is as follows:

1. Construct high-frequency factors

$$f^{k,hf}_{t+j/l} = s_k' \, r^{hf}_{t+j/l}$$

2. Construct the daily realized variation of the factors, $Q^{k}_{t+1,t}$

$$Q^{k}_{t+1,t} = \sum_{i=1}^{l} \left( f^{k,hf}_{t+i/l} \right)^2$$

3. Estimate the model by quasi-maximum likelihood (Eq. 4.6) under the assumption that $F_{t+h,t}$ is diagonal with diagonal elements

$$\{F_{t+h,t}\}_k = \mu^h_k + \phi^h_k \sum_{j=1}^{j^{\max}} b(j, \theta_k) \, Q^{k}_{t-j+1,t-j}$$

So, in matrix form,

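$$F_{t+h,t} = \begin{pmatrix} \mu^h_1 & & 0 \\ & \ddots & \\ 0 & & \mu^h_m \end{pmatrix} + \sum_{j=1}^{j^{\max}} \begin{pmatrix} \phi^h_1 \, b(j, \theta^h_1) \, Q^{1}_{t-j+1,t-j} & & 0 \\ & \ddots & \\ 0 & & \phi^h_m \, b(j, \theta^h_m) \, Q^{m}_{t-j+1,t-j} \end{pmatrix}$$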

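Since $F_{t+h,t}$ is diagonal, the lag weights are positive, and the realized variations are nonnegative, $F_{t+h,t}$ is positive definite whenever $\mu^h_k > 0$ and $\phi^h_k > 0$ for all $k$.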
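The three-step procedure can be sketched in a few lines of Python. All function names and the data layout (one list of asset returns per intraday interval, realized variations ordered from oldest to most recent) are assumptions made for this illustration; the lag weights are taken as given rather than estimated.

```python
def factor_returns(weights, hf_returns):
    """Step 1: high-frequency factor returns s_k' r^{hf} for each intraday interval.

    weights:    portfolio weights s_k, one per asset
    hf_returns: list of intervals, each a list of asset returns
    """
    return [sum(w * r for w, r in zip(weights, interval)) for interval in hf_returns]


def realized_variation(hf_factor_returns):
    """Step 2: daily realized variation, the sum of squared intraday factor returns."""
    return sum(f * f for f in hf_factor_returns)


def midas_variance(mu, phi, lag_weights, daily_rv):
    """Step 3: one diagonal element, mu + phi * sum_j b_j * Q_{t-j+1,t-j}.

    lag_weights[0] applies to the most recent day (j = 1);
    daily_rv is ordered from oldest to most recent.
    """
    j_max = len(lag_weights)
    if len(daily_rv) < j_max:
        raise ValueError("need at least j_max daily realized variations")
    recent_first = daily_rv[::-1][:j_max]
    return mu + phi * sum(b * q for b, q in zip(lag_weights, recent_first))


# Illustrative numbers only: two assets, an equally weighted "market" factor,
# and two intraday intervals.
market = factor_returns([0.5, 0.5], [[0.02, 0.0], [-0.01, 0.01]])
q_today = realized_variation(market)
variance = midas_variance(0.1, 2.0, [0.5, 0.3, 0.2], [1.0, 2.0, 3.0])
```

With positive `mu` and `phi`, nonnegative lag weights, and nonnegative realized variations, each diagonal element returned by `midas_variance` is positive, which is the condition for positive definiteness of the diagonal factor covariance.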

It has long been recognized that volatility tends to react more strongly to negative returns than to positive returns. Nelson (1991a) and Engle and Ng (1993) show that GARCH models that incorporate this asymmetry forecast the market variance better. In addition, as pointed out in Ghysels, Santa-Clara and Valkanov (2006a), an asymmetric MIDAS specification allows us to test the persistence of negative and positive price shocks. The natural extension for the multivariate Factor MIDAS is therefore

\begin{align}
\{F^{asy}_{t+h,t}\}_k ={} & \left[ \mu^{h+}_k + \phi^{h+}_k \sum_{j=1}^{j^{\max}} b(j, \theta^{+}_k) \, Q^{k}_{t-j+1,t-j} \right] I_{\{f^{1}_{t,t-1} > 0\}} \; + \tag{4.7} \\
& \left[ \mu^{h-}_k + \phi^{h-}_k \sum_{j=1}^{j^{\max}} b(j, \theta^{-}_k) \, Q^{k}_{t-j+1,t-j} \right] I_{\{f^{1}_{t,t-1} \le 0\}} \tag{4.8}
\end{align}
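The asymmetric specification amounts to switching between two parameter sets according to the sign of the most recent market factor return. A minimal sketch, assuming each regime is supplied as a dictionary with hypothetical keys `mu`, `phi`, and `weights` (the lag weights already evaluated at the regime's superparameters):

```python
def asymmetric_midas_variance(pos, neg, daily_rv, market_return):
    """One diagonal element of the asymmetric factor variance, Eqs. (4.7)-(4.8).

    pos, neg:      dicts with keys "mu", "phi", "weights" for the two regimes
    daily_rv:      realized variations of factor k, oldest to most recent
    market_return: latest daily return of the first ("market") factor
    """
    # The indicator selects the "+" regime for a strictly positive market
    # return and the "-" regime otherwise (including a zero return).
    regime = pos if market_return > 0 else neg
    weights = regime["weights"]
    if len(daily_rv) < len(weights):
        raise ValueError("need at least as many realized variations as lag weights")
    recent_first = daily_rv[::-1][:len(weights)]
    weighted_sum = sum(b * q for b, q in zip(weights, recent_first))
    return regime["mu"] + regime["phi"] * weighted_sum


# Illustrative parameter values only.
pos = {"mu": 0.1, "phi": 1.0, "weights": [0.5, 0.5]}
neg = {"mu": 0.2, "phi": 2.0, "weights": [0.8, 0.2]}
after_gain = asymmetric_midas_variance(pos, neg, [1.0, 3.0], 0.01)
after_loss = asymmetric_midas_variance(pos, neg, [1.0, 3.0], -0.01)
```

Comparing the estimated weight profiles of the two regimes is what would allow the persistence of positive and negative shocks to be contrasted.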

4.4 Data

Our empirical analysis is based on 22 stocks that were included in the DJ index from April 1993 until December 2002.³ These data are partially used in Ghysels, Santa-Clara and Valkanov (2004a). The number of observations for these stocks is sufficient to make use of high-frequency data and to assess the performance of the MF-MIDAS in a relatively large portfolio analysis. All returns are recorded from 9:30 am to 4:00 pm every trading day. The returns for some days are removed from the sample to avoid including regular and predictable market closures, which affect the volatility dynamics. In constructing the dataset, we follow the methods of Andersen, Bollerslev, Diebold and Labys (2001a), who use a similar five-minute dataset of returns from the foreign exchange market. The final dataset contains 2260 trading days with 79 observations per day, for a total of 178,540 observations. Daily returns are constructed from the same dataset by summing intradaily returns, i.e., $r_{t,t-1} = \sum_{i=1}^{m} r_{t-1+\frac{i}{m},\, t-1+\frac{i-1}{m}}$. By the logic of mixed-frequency regressions, we estimate conditional variance models for two time horizons, 5 days and 10 days. The factors follow univariate MIDAS regressions with the restricted Beta lag structure. We use a truncation of 50 daily lags in the estimation.⁴ The models are estimated by quasi-maximum likelihood. Non-overlapping prediction horizons are used to eliminate the autocorrelation in the residuals that overlapping prediction horizons would induce.

³ The stocks considered are: AT&T Corporation (T), The Coca-Cola Company (KO), E.I. DuPont de Nemours (DD), Eastman Kodak Company (EK), General Electric Company (GE), General Motors Corporation (GM), International Business Machines Corp. (IBM), Altria Group, Inc. (MO), United Technologies Corporation (UTX), The Procter & Gamble Co. (PG), Caterpillar Inc. (CAT), The Boeing Company (BA), International Paper Company (IP), 3M Company (MMM), Merck & Co., Inc. (MRK), JPMorgan Chase & Co. (JPM), Alcoa Inc. (AA), The Walt Disney Company (DIS), McDonald's Corporation (MCD), American Express Company (AXP), Honeywell International (HON), and Exxon Mobil Corporation (XOM).
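The aggregation described in this section, from intraday returns to daily returns and then to non-overlapping multi-day horizons, can be sketched as follows. The function names are hypothetical, and the input is assumed to be a flat list of intraday returns with the same number of observations every day.

```python
def daily_returns(intraday, per_day):
    """Sum consecutive blocks of per_day intraday returns into daily returns."""
    if len(intraday) % per_day != 0:
        raise ValueError("intraday sample is not a whole number of days")
    return [sum(intraday[i:i + per_day]) for i in range(0, len(intraday), per_day)]


def non_overlapping_sums(daily, h):
    """Sum daily returns over consecutive, non-overlapping h-day blocks.

    Any incomplete block at the end of the sample is dropped.
    """
    n_blocks = len(daily) // h
    return [sum(daily[b * h:(b + 1) * h]) for b in range(n_blocks)]


# Sample-size arithmetic from the text: 2260 days x 79 observations per day.
total_observations = 2260 * 79

# Tiny illustrative example: 2 days of 3 intraday returns, then 1-day blocks.
days = daily_returns([1, 2, 3, 4, 5, 6], 3)
blocks = non_overlapping_sums(days, 1)
```

Ignoring the days lost to the 50-lag truncation, 2260 trading days would allow at most 452 non-overlapping 5-day blocks and 226 non-overlapping 10-day blocks.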
