5 Simulation Study





Σ_X(i)Σ^>_X(i)K_Y,M^>

Σ_X(i)Σ^>_X(i)K_Z,1^> (i)







The explicit form of conditional mean and variance allows to sample from the filtering distribution of Xt+1directly. Finally, resample particles Xt+∆:t+(M −1)∆ using the same weights and piece it together with X_t+1to form the smoothed distribution of X_t+∆:t+1. Given the smoothed state paths, sufficient statistics can be updated via

s^X_t+1 = S_X(s^X_{t+(M −1)∆}, Xt, Xt+1, λ^X_t+1), s^Y_t+1 = S_Y(s^Y_t , X_t+∆:t+1),

s^Z_t+1 = S_Z(s^Z_{t+(M −1)∆}, Xt+1, λt+1), s^λ_t+1 = S_λ(s^λ_{t+(M −1)∆}, λt+1), s_t+1 = (s^X_t+1, s^λ_t+1, s^Y_t+1, s^Z_t+1),

and the parameter posterior p(Θ|s_t+1) follows immediately.

We perform a simulation study to explore the performance of our mixed-frequency particle filter and learning and to compare it to naive approaches to fix the mixed frequency problem such as, e.g., temporal aggregation.

5.1 Model Specification

For our analysis, we choose a simplified MFCDLMRS specification, which incorporates mixed-frequency observations and exhibits latent states with regime shifts. In particular, we consider a two-state regime-switching model with a latent state vector (X, λ). The auxiliary state λ is a two-state Markov chain with state space I = {1, 2} and transition probability (π_ij)_i,j∈I. At each state i ∈ I, the re-alizations of X are independent and normally distributed with mean K_X(i) and variance Σ²_X(i). At the end of each unit period, we observe the sum of X per unit time, defined as Y . Each quarter, we also observe a noisy signal of X, denoted by Z. The measurement error ^Z is independent, identically distributed with zero mean and variance Σ²_Z. Summarizing our model specification, which we label by M₀, we have:

where M = 1/∆ = 4 is the sampling frequency.

[Table 1 about here.]

5.2 Estimation Results

For the data generating process we use M₀ with model inputs and parameters reported in Table1.⁹ We assume the initial state λ₀ to be equal to 1 and latent. We simulate a joint sample path of, say, yearly data up to time T = 50. Hence, the dataset for estimation consists of 50 yearly observations of Y and of 200 quarterly observations of Z. We consider both particle filtering (PF) and particle learning (PL), and forward smoothing (FS) and backward smoothing (BS) for both methods. We denote them by PFFS, PFBS, and PLFS. Particle learning via the backward smoothing utilizes

9Note that when Σ is a scalar, the normal-inverse-Wishart distribution, which we assumed in Section4.1, becomes a normal-inverse-gamma distribution (up to a slight variation of the degree of freedom), which we denote by N IG.

Also, when λ takes only two values, the Dirichlet distribution degenerates to a Beta distribution.

an additional mixed-frequency particle filter via either forward or backward smoothing. Hence, we denote them by PLBSFS and PLBSBS, respectively. For comparison, we also implement particle learning under the conditional independence assumption as inCarvalho et al.(2010), which we refer to by CJLP. The conditional independence assumption leads to an efficient, but potentially biased estimator.¹⁰

For the particle filters, we use a maximum likelihood estimator (MLE). In particular, we maximize the joint marginal likelihood

Θ = argmaxb

L(Θ; Y^T),

with the marginal likelihood computed using the filtered particles as

L(Θ; Y^T) = p(Y^T|Θ) = p(Y_∆|Θ) ·

In Table1we summarize the MLE estimates.¹¹ PFFS and PFBS perform reasonably well. PFFS provides more accurate estimates of the transition probability (π_ij)_i,j∈I and variances Σ²_X(1) and Σ²_Z, while PFBS performs better in estimating the remaining parameters, although it uses a smaller sample size than PFFS. However, both methods fail to accurately estimate Σ²_X(1).

We next explore the performance of particle learning. For comparison, we also consider analytic learning (AL), which assumes the latent state vector (X, λ) to be observable and, hence, can be conducted without particle approximations. Table 2, Panel B reports the estimation results under the prior specifications listed in Panel A. We observe that all methods perform reasonably well in

10Under conditional independence, p^N(Xt+(M −1)∆|Θ⁽ⁱ⁾, Y^{t+(M −1)∆}) and p^N(Xt+(M −1)∆|Y^{t+(M −1)∆}) are identical.

As p^N(X_{t+(M −1)∆}|Y^{t+(M −1)∆}) is an immediate result from particle learning, the forward filter previously necessary to compute p^N(Xt+(M −1)∆|Θ⁽ⁱ⁾, Y^{t+(M −1)∆}) is no longer needed.

11MLE via particle filtering iterates over parameters in search for an optimum. Both PLBSFS and PLBSBS require a forward filter to complete the backward smoothing. Therefore, we set the number of particles for these algorithms to be relatively small.

identifying both the transition probabilities (π_ij)i,j∈I and the mean K_X. However, some notable differences exist for the estimates of the variances. Unsurprisingly, AL generates the most accurate estimates, but even AL struggles to match the true value for Σ²_X(1). All the other methods fail to match particularly Σ²_Z and Σ²_X(1), but estimate Σ²_X(2) reasonably well. Estimates for CJLP are more biased by comparison. Obviously, the assumption of conditional independence comes at a price in that CJLP fails to provide accurate estimates for all the variances.

[Table 2 about here.]

[Figure 1 about here.]

In Panels A and B of Figure 1, we plot the true states λ^true_t and X_t^true against the filtered states when we use PLFS. Clearly, learning about the auxiliary states λ and X is highly accurate.

Furthermore, in Panels C and D we observe that also the posterior distribution of the transition probabilities converge quickly to their true values.

[Figure 2 about here.]

In Figure 2we plot the posterior distribution for K_X and Σ²_X using PLFS.¹² While the posterior distributions for K_X and also for Σ²_X(2) converge reasonably fast to their true values, they do not for Σ²_X(1). Convergence is slow with a relatively flat and potentially biased distribution.¹³ Our explanation of this observation is as follows. First, given that Y is sampled at a low frequency, particle learning may not lead to an exact disentanglement of X from ^Z. Second, the normal-inverse-gamma prior is heavily tailed and the outliers from the parameter posterior potentially give rise to a misidentification of latent states, which further leads to a larger variance estimate. Indeed, by inspection of Panels A and B in Figure 1, we find that occasionally the states are not exactly identified. Johannes et al. (2014) interpreted this problem as “confounded” learning. Uncertainty about one parameter or state may lead to an impediment to learning about others and such an uncertainty may only dissipate slowly over time.

12We do not present the posterior distribution for Σ²Zas a similar conclusion holds as it does for Σ²X.

13We get similar results from PLBSFS, PLBSBS, and CJLP. Also, a change of random seeds in MATLAB, sample size, particle size, etc., yields analogous results. They can be obtained from the authors.

5.3 Aggregation Effect

When dealing with mixed-frequency space models, it is a common practice to specify the state-space evolution as having the same frequency as observations, which can be achieved by simply aggregating higher frequency observations. Such an approach fails to fully exploit all available data and may severely impact estimation performances and identifiability. To illustrate the impact of aggregation, we start from a simplified model M₁, in which we switch off the noisy signal Z from model M₀. Hence, we compare the baseline model M₁,

M1 :

to its time-aggregated version, which we denote by M₂:

M2 :

M2 differs from M₁ only in that M₂ is assumed to evolve at the sampling frequency of Y , while in M₁ we assume that the unobservable state evolves at a quarterly frequency. For particle learning we use PLFS with observations up to time T = 200 and with the same prior as in Table1.

[Table 3 about here.]

Table 3 report the estimates for M₁ and M₂ using PLFS. For model M₁, we also report the estimates using AL. We observe that when we increase sample size from T = 50 to T = 200, the estimation errors for model M₁ are decreased significantly more than those for model M₂. This finding is confirmed when we look at the convergence of the different parameters. Figure3illustrates the convergence for transition probabilities for M₁ and M₂. The low frequency of the observed process Y leads to very high estimation errors, particularly for small sample sizes. In contrast, M₁ converges gradually to its true value with increasing sample size. The same behavior is observed for the posteriors of K_X, K_Y and the variances Σ_X, Σ_Y. Especially for the variances, as illustrated in

Figure 4, the problem of slow convergence and confounded learning is aggravated in that even with a sample size of T = 200 the 90% confidence bound remains very broad.

[Figure 3 about here.]

[Figure 4 about here.]

As a first comparison of models M₁ and M₂, we can compute the relative likelihood ratio of M₁ being true, i.e.,

LR_t(M = M₁|i ∈ {1, 2}) = p(Y^t|M₁) P

i∈{1,2}p(Y^t|M_i) ∈ [0, 1].

Using numerical simulations, we find that the small-sample behavior of the likelihood ratio largely depends on the prior specification, which provides no hints on the model specification analysis. Nev-ertheless, already with T = 50, we find LR_{T =50}(M = M1|i ∈ {1, 2}) = 0.909. The likelihood of M1 becomes dominant as new observations are available and this feature is invariant under a change of prior. At T = 200, we get LR_{T =200}(M = M1|i ∈ {1, 2}) = 0.999. Hence, model M₂ is clearly rejected in favor of model M₁.

As an additional comparison, we investigate whether particle learning correctly identifies regime shifts and study their performances for out-of-sample prediction of Y . M₁ has more latent variables compared to M₂, which leads to a rather weak identifiability of the state process λ. Hence, to make these models comparable, we define the accuracy ratio of state identification for each model in the following way. If the filtered mean for each state is closer to the true state, then we view this state as correctly identified. Under this assumption, we define the accuracy ratio for the three models as:

AR_t(Mi) = 1

Clearly, AR_t(Mi) ∈ [0, 1]. For the out-of-sample prediction of Y , we define the predictive mean

squared error (MSE) as

MSE_t(M_i) = 1 t ×

l=1

EMi[Y_l+1|Y^l] − Y_l+1^true2

, for i ∈ {0, 1, 2}. (44)

From Figure 5 we find that M₀ outperforms for both criteria, implying that adding observations, even if they are noisy, improves state identification and out-of-sample prediction. In addition, M₁ outperforms M₂, even at the cost of introducing much more latent variables. Therefore, we conclude that a model consistent with the inherent evolution frequency of the data generating process has better model performances than its the time-aggregated version.

[Figure 5 about here.]

Hence, in summary, we find that temporal aggregation tends to smooth out the sharp regime-switching behavior within a unit time interval and further leads to a potential model misspecification.

This conclusion is not a particularity of our model specification, it also holds for other specifications.¹⁴

6 Conclusion

This paper develops a general particle filtering and learning framework for mixed-frequency state-space models. Built on the auxiliary particle filter, our mixed-frequency particle methods use a smoother so as to draw the Bayesian inference from low-frequency observations. For particle learning, we track parameter posterior through the updating of sufficient statistics. The sufficient statistics for parameter posterior can be viewed as additional states that drive the state-space models. Though only a particular class of mixed-frequency models is studied, the idea of using smoothing to facilitate particle filtering and learning can be extended to a variety of other mixed-frequency settings.

To illustrate our approach to mixed-frequency particle learning, we use a mixed-frequency condi-tional dynamic linear model with regime switching. Assuming a conjugate prior, we derive closed-form procedures for likelihood-based importance resampling, propagation, and sufficient statistics recur-sion. Our approach allows us to study in a unified framework state filtering and regime switching,

14In particular, we also analyzed the estimation performance of a time-aggregated conditional dynamic linear model.

Our results remained robust. The results can be obtained by the authors on request.

parameter uncertainty and parameter learning, and the impact of mixed-frequency observations. We do such study in a simulation exercise. The estimation results for both mixed-frequency particle filtering and learning are rather striking. First of all, the estimates are overall reasonably well, given such a weak identifiability from the mixed-frequency nature of the observations. Second, our results indicate that when estimating a state-space model, we must take the sampling frequency inherent in the data generating processes into account. Third, we must fully exploit all data available. Ignoring the mixed-frequency nature of the data may lead to potentially wrong inference.

References

Bai, Jennie, Eric Ghysels, and Jonathan H Wright, 2013, State space models and midas regressions, Econometric Reviews 32, 779–813.

Bouwman, Kees E., and Jan P.A.M. Jacobs, 2011, Forecasting with real-time macroeconomic data:

The ragged-edge problem and revisions, Journal of Macroeconomics 33, 784 – 792.

Briers, Mark, Arnaud Doucet, and Simon Maskell, 2010, Smoothing algorithms for state–space mod-els, Annals of the Institute of Statistical Mathematics 62, 61–89.

Carlin, Bradley P, Nicholas G Polson, and David S Stoffer, 1992, A monte carlo approach to nonnormal and nonlinear state-space modeling, Journal of the American Statistical Association 87, 493–500.

Carter, Chris K, and Robert Kohn, 1994, On gibbs sampling for state space models, Biometrika 81, 541–553.

Carvalho, Carlos, Michael S Johannes, Hedibert F Lopes, and Nick Polson, 2010, Particle learning and smoothing, Statistical Science 25, 88–106.

Chen, Rong, and Jun S Liu, 2000, Mixture kalman filters, Journal of the Royal Statistical Society:

Series B (Statistical Methodology) 62, 493–508.

Chopin, Nicolas, Alessandra Iacobucci, Jean-Michel Marin, Kerrie Mengersen, Christian P Robert, Robin Ryder, and Christian Schäfer, 2010, On particle learning, arXiv preprint arXiv:1006.0554 . Christensen, Bent Jesper, Olaf Posch, and Michel van der Wel, 2016, Estimating dynamic equilibrium

models using mixed frequency macro and financial data, Journal of Econometrics 194, 116–137.

Collin-Dufresne, Pierre, Michael Johannes, and Lars A Lochstoer, 2016, Parameter learning in general equilibrium: The asset pricing implications, The American Economic Review 106, 664–698.

De Jong, Piet, and Neil Shephard, 1995, The simulation smoother for time series models, Biometrika 82, 339–350.

Eraker, Bjørn, Ching Wai Jeremy Chiu, Andrew T Foerster, Tae Bong Kim, and Hernán D Seoane, 2015, Bayesian mixed frequency vars, Journal of Financial Econometrics 13, 698–721.

Fearnhead, Paul, David Wyncoll, and Jonathan Tawn, 2010, A sequential smoothing algorithm with linear computational cost, Biometrika 97, 447–464.

Foroni, Claudia, and Massimiliano Giuseppe Marcellino, 2013, A survey of econometric methods for mixed-frequency data, Available at SSRN 2268912 .

Geweke, John, and Hisashi Tanizaki, 2001, Bayesian estimation of state-space models using the metropolis–hastings algorithm within gibbs sampling, Computational Statistics & Data Analysis 37, 151–170.

Ghysels, Eric, 2016, Macroeconomics and the reality of mixed frequency data, Journal of Economet-rics .

Ghysels, Eric, Pedro Santa-Clara, and Rossen Valkanov, 2004, The midas touch: Mixed data sampling regression models, Finance .

Godsill, Simon J, Arnaud Doucet, and Mike West, 2012, Monte carlo smoothing for nonlinear time series, Journal of the american statistical association .

Gordon, Neil J, David J Salmond, and Adrian FM Smith, 1993, Novel approach to nonlinear/non-gaussian bayesian state estimation, in IEE Proceedings F-Radar and Signal Processing , volume 140, 107–113, IET.

Hamilton, James Douglas, 1994, Time series analysis, volume 2 (Princeton university press Prince-ton).

Harvey, Andrew C, 1990, Forecasting, structural time series models and the Kalman filter (Cambridge university press).

Johannes, Michael S, Lars A Lochstoer, and Yiqun Mou, 2014, Learning about consumption dynam-ics, Journal of Finance, forthcoming .

Kalman, Rudolph Emil, 1960, A new approach to linear filtering and prediction problems, Journal of basic Engineering 82, 35–45.

Kim, Chang-Jin, Charles R Nelson, et al., 1999, State-space models with regime switching: classical and Gibbs-sampling approaches with applications, volume 2 (MIT press Cambridge, MA).

Kitagawa, Genshiro, 1996, Monte carlo filter and smoother for non-gaussian nonlinear state space models, Journal of computational and graphical statistics 5, 1–25.

Kitagawa, Genshiro, 1998, A self-organizing state-space model, Journal of the American Statistical Association 1203–1215.

Liu, Jane, and Mike West, 2001, Combined parameter and state estimation in simulation-based filtering, in Sequential Monte Carlo methods in practice, 197–223 (Springer).

Marcellino, Massimiliano, and Vasja Sivec, 2016, Monetary, fiscal and oil shocks: Evidence based on mixed frequency structural favars, Journal of Econometrics .

Mariano, Roberto S, and Yasutomo Murasawa, 2010, A coincident index, common factors, and monthly real gdp, Oxford Bulletin of Economics and Statistics 72, 27–46.

Niemi, Jarad, and Mike West, 2010, Adaptive mixture modeling metropolis methods for bayesian analysis of nonlinear state-space models, Journal of Computational and Graphical Statistics 19, 260–280.

Pitt, Michael K, and Neil Shephard, 1999, Filtering via simulation: Auxiliary particle filters, Journal of the American statistical association 94, 590–599.

Polson, Nicholas G, Jonathan R Stroud, and Peter Müller, 2008, Practical filtering with sequential parameter learning, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70, 413–428.

Qian, Hang, 2016, A computationally efficient method for vector autoregression with mixed frequency data, Journal of Econometrics .

Schorfheide, Frank, and Dongho Song, 2015, Real-time forecasting with a mixed-frequency var, Jour-nal of Business & Economic Statistics 33, 366–380.

Schorfheide, Frank, Dongho Song, and Amir Yaron, 2014, Identifying long-run risks: A bayesian mixed-frequency approach, Technical report, National Bureau of Economic Research.

Shephard, Neil, and Michael K Pitt, 1997, Likelihood analysis of non-gaussian measurement time series, Biometrika 84, 653–667.

Storvik, Geir, 2002, Particle filters for state-space models with the presence of unknown static pa-rameters, IEEE Transactions on signal Processing 50, 281–289.

Stroud, Jonathan R, Peter Müller, and Nicholas G Polson, 2011, Nonlinear state-space models with state-dependent variances, Journal of the American Statistical Association .

Viefers, Paul, 2011, Bayesian inference for the mixed-frequency var model .

West, M, and J Harrison, 1998, Bayesian forecasting and dynamic models (2nd edn), Journal of the Operational Research Society 49, 179–179.

Yang, Biao, Jonathan R Stroud, and Gabriel Huerta, 2016, Sequential monte carlo smoothing with parameter estimation, arXiv preprint arXiv:1604.05658 .

In document Particle Filtering, Learning, and Smoothing for Mixed-Frequency State-Space Models (Page 23-35)