Post-Estimation Diagnostics - Markov-Switching ACI Model

3.3 Markov-Switching ACI Model

3.3.4 Post-Estimation Diagnostics

The residual series $\{\hat{\varepsilon}_i\}_{i=1:T}$ is often used to perform diagnostic tests for the original ACI model. The main property exploited here is that, according to the RTCT, $\{\hat{\varepsilon}_i\}_{i=1:T}$ follows an i.i.d. unit exponential process if the model is correctly specified. As a result, diagnostic tests of the original ACI model usually involve testing the unit exponentiality of $\{\hat{\varepsilon}_i\}_{i=1:T}$ and the presence of autocorrelation in the series. Since the RTCT applies to the intensity process of general point processes, it applies to the MS-ACI model as well, and similar tests can be constructed accordingly. These tests, however, are not directly applicable to the MS-ACI model given only the parameter estimates $\hat{\vartheta}$ from the SAEM algorithm. This is because a state vector $S$ is required to calculate the residual series: without conditioning on a particular realization of the state vector, $\{\hat{\varepsilon}_i\}_{i=1:T}$ cannot be obtained and the diagnostic tests cannot be performed. Thanks to the estimation of the most probable state sequence, we are able to obtain the conditional residual series $\{\hat{\varepsilon}_i \mid \hat{S}\}_{i=1:T}$ by plugging the estimated (most probable) state sequence $\hat{S}$ into the calculation of the residual series. Similar tests, such as the Ljung-Box test for autocorrelation or some empirical density function tests for unit exponentiality, can then be performed on $\{\hat{\varepsilon}_i \mid \hat{S}\}_{i=1:T}$, which renders model diagnostics and comparison possible for our MS-ACI model. Certainly, one can perform the diagnostic tests on a residual series conditioning on any arbitrary $S$, but the estimated state vector $\hat{S}$ ensures that the corresponding conditional residual series $\{\hat{\varepsilon}_i \mid \hat{S}\}_{i=1:T}$ has the largest complete likelihood among the state vectors drawn for its estimation. Thus, the tests are likely to perform better on the residual series conditional on the estimated state vector than on a series conditional on an arbitrary $S$.

For the original ACI model, the maximized log-likelihood also provides important insights for model comparison and can be used to construct a likelihood ratio test for nested models. In the MS-ACI case, however, the maximized observed log-likelihood is not available because the SAEM algorithm does not evaluate it directly, and the integral over all realizations of $S$ is intractable due to the dimensionality of $S$. The conditional log-likelihood of the MS-ACI model, $\ln L(\hat{\vartheta}; Y \mid \hat{S})$, does provide some information on the goodness-of-fit of the model, but it is not decisive because there is no theoretical foundation for constructing a likelihood-based test from it.
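The two residual checks mentioned above (autocorrelation and unit exponentiality) can be illustrated with a minimal sketch. It is not the thesis' implementation: the function names are hypothetical, the residuals are simulated stand-ins for the conditional residual series, the Kolmogorov-Smirnov distance is used as one possible distributional check, and the p-value and critical value ignore any adjustment for estimated parameters.

```python
import math
import random


def ljung_box_q(x, lags):
    """Ljung-Box Q statistic: n(n+2) * sum_k rho_k^2 / (n-k), k = 1..lags."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    acc = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        acc += rho_k ** 2 / (n - k)
    return n * (n + 2) * acc


def chi2_sf_even(x, df):
    """Chi-square survival function; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return min(1.0, math.exp(-half) * total)


def ks_unit_exponential(x):
    """Kolmogorov-Smirnov distance between the empirical CDF of x
    and the unit exponential CDF F(v) = 1 - exp(-v)."""
    xs = sorted(x)
    n = len(xs)
    d = 0.0
    for i, v in enumerate(xs):
        f = 1.0 - math.exp(-v)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Simulated stand-in for the conditional residuals (an assumption:
# a correctly specified model would give i.i.d. unit exponential draws).
random.seed(0)
residuals = [random.expovariate(1.0) for _ in range(2000)]

lags = 10  # even, so the closed-form chi-square tail applies
q_stat = ljung_box_q(residuals, lags)
p_value = chi2_sf_even(q_stat, lags)  # no adjustment for estimated parameters

d_stat = ks_unit_exponential(residuals)
d_crit = 1.358 / math.sqrt(len(residuals))  # asymptotic 5% critical value

print("Ljung-Box Q:", q_stat, "p-value:", p_value)
print("KS distance:", d_stat, "5% critical value:", d_crit)
```

In practice the list `residuals` would be replaced by the residual series computed under the estimated state sequence. A small Ljung-Box p-value points to remaining autocorrelation, and a KS distance above the critical value points to a departure from unit exponentiality.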

The residual tests described above assess the appropriateness of applying the ACI framework. To be specific, the autocorrelation tests examine whether the autoregressive structure $\tilde{\Phi}_i$ is correctly specified, and the empirical density tests assess the goodness-of-fit of the baseline specification. These tests, however, do not provide a method to evaluate the validity of the regime-switching structure. Methods to assess the contribution of an extra regime are important because including another regime can lead to over-parametrization, which reduces the efficiency of parameter estimation when the data generating process (DGP) does not possess the regime-switching property assumed by the model. In a Bayesian framework, this assessment can be performed by computing the Bayes factor from the marginal likelihood (Bauwens, Dufays, and Rombouts (2014) develop the algorithm for the MS-GARCH model, which can be applied to our case if a Bayesian approach is pursued). In the frequentist approach, it is usually accomplished by a likelihood ratio test for nested models, which is not available here due to the intractable observed likelihood. To provide a descriptive statistic that reflects the validity of the existence of $M$ regimes, we focus on the $T \times M$ posterior probability matrix $P^{\Delta}$ conditioning on the estimated parameters $\hat{\vartheta}$ and the estimated state vector $\hat{S}$. The element at the intersection of row $i$ and column $m$ in $P^{\Delta}$, denoted by $P^{\Delta}_{i,m} = p^{\Delta}(s_i = m \mid \hat{s}_{1:i-1}, \hat{s}_{i+1:T}, Y, \hat{\vartheta})$, is the posterior probability of the $i$-th state being classified as state $m$ conditioning on $\hat{s}_{1:i-1}$, $\hat{s}_{i+1:T}$, $Y$ and $\hat{\vartheta}$, calculated similarly as in (C.7) with the truncation lag $\Delta$ being the last adapted $\Delta$ in the estimation. Based on this matrix, we construct a statistic named the ‘Significance of Regimes’ (SoR hereafter), which serves as an indicator of the overall significance of the regime-switching structure. It is calculated as follows:

$$\mathrm{SoR} = T^{-1} \sum_{i=1}^{T} \max_{m \in M} P^{\Delta}_{i,m}. \tag{3.42}$$
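As a minimal sketch of equation (3.42), the statistic is the mean of the row-wise maxima of the posterior probability matrix. The function name and the two example matrices below are hypothetical and chosen only to show the two limiting cases; they are not estimates from the MS-ACI model.

```python
def significance_of_regimes(post_prob):
    """Mean of the row-wise maxima of a T x M posterior probability matrix,
    given as a list of T rows with M probabilities each (eq. 3.42)."""
    if not post_prob:
        raise ValueError("the probability matrix must have at least one row")
    return sum(max(row) for row in post_prob) / len(post_prob)


# Hypothetical matrices with T = 4 observations and M = 2 regimes.
# Well-separated regimes: each row puts nearly all mass on one state.
separated = [[0.98, 0.02], [0.03, 0.97], [0.99, 0.01], [0.05, 0.95]]
# Indistinguishable regimes: every entry equals 1/M.
identical = [[0.5, 0.5]] * 4

sor_separated = significance_of_regimes(separated)
sor_identical = significance_of_regimes(identical)

print("SoR, separated regimes:", sor_separated)
print("SoR, identical regimes:", sor_identical)
```

The first matrix gives a value close to one, and the second gives exactly $M^{-1} = 0.5$, the lower end of the range of the statistic.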

Intuitively, SoR is the average of the largest probability in every row of $P^{\Delta}$. It measures the average (conditional) probability of each state being classified into its most probable state. The rationale behind this statistic is that, assuming the DGP consists of $M$ distinct regimes with densities far apart from each other (which is often assumed in the MS-GARCH literature), the probability of any observation being classified into its corresponding true state will be close to one. On the contrary, when all $M$ densities are identical, all the elements in the matrix $P^{\Delta}$ reduce to $M^{-1}$. This gives the range of SoR, which measures the significance of the existing regimes, with 1 being very significant for all the regimes and $M^{-1}$ being not significant at all. The SoR allows easy comparisons across models with different numbers of regimes and baseline specifications. Moreover, we can calculate SoR for each regime to compare their relative significance. The SoR for the $l$-th regime is defined as:

$$\mathrm{SoR}(l) = T^{-1} \sum_{i=1}^{T} \max_{m \in M} P^{\Delta}_{i,m} \, \mathbb{1}\{\arg\max_{m \in M} (P^{\Delta} \ldots$$

In document Point process based high frequency volatility estimation:theory and applications (Page 129-131)