Nowcasting GDP using dynamic factor model: A Bayesian approach


2020


Yixiao Zhang Iowa State University


Recommended Citation

Zhang, Yixiao, "Nowcasting GDP using dynamic factor model: A Bayesian approach" (2020). Graduate Theses and Dissertations. 17858.

https://lib.dr.iastate.edu/etd/17858

This Thesis is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected].


by

Yixiao Zhang

A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Major: Statistics

Program of Study Committee: Cindy Yu, Major Professor

Huaiqing Wu Sergio Lence Ulrike Genschel Wolfgang Kliemann

The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred.

Iowa State University Ames, Iowa

2020


TABLE OF CONTENTS

Page

LIST OF TABLES . . . v

LIST OF FIGURES . . . vii

ACKNOWLEDGMENTS . . . xi

ABSTRACT . . . xii

CHAPTER 1. INTRODUCTION . . . 1

CHAPTER 2. NOWCASTING GDP USING DYNAMIC FACTOR MODEL WITH KNOWN NUMBER OF FACTORS . . . 6

2.1 Structure of Dataset and the GRS Approach . . . 6

2.1.1 Working With Unbalanced Data . . . 6

2.1.2 The GRS Approach . . . 9

2.2 Alternative Approach to Nowcasting: Bayesian Markov Chain Monte Carlo Method . . . 10

2.2.1 Model Modifications . . . 11

2.2.2 Estimating Dynamic Factor Models via a Bayesian MCMC approach . . . 12

2.2.3 Nowcasting GDP $y_{K+1}$ . . . 15

2.3 Bayesian Approach in Nowcasting: Simulation Evidence . . . 16

2.3.1 In-sample Estimation of Latent Factors $F_t$ . . . 17

2.3.2 Out-of-sample Nowcasting Performance . . . 21

2.4 Bayesian Approach in Nowcasting: Empirical Evidence . . . 26

CHAPTER 3. NOWCASTING GDP USING DYNAMIC FACTOR MODEL WITH UNKNOWN NUMBER OF FACTORS . . . 34

3.1 Structure of Dataset and Model Set-ups . . . 34

3.1.1 DFM with Constant Volatility . . . 34

3.1.2 Working With Unbalanced Data . . . 36

3.2 Bayesian MCMC Estimation Method and Nowcasting . . . 39

3.2.1 Estimating DFM With Constant Volatility via MCMC . . . 39

3.2.2 Nowcasting GDP $y_{K+1}$ and Estimating the Number of Factors . . . 43

3.3.1 Estimating the Number of Latent Factors . . . 46

3.3.3 Estimation of Latent Variables . . . 50


CHAPTER 4. NOWCASTING GDP USING DYNAMIC FACTOR MODEL WITH UNKNOWN NUMBER OF FACTORS AND STOCHASTIC VOLATILITY . . . 64

4.1 DFM with Stochastic Volatility . . . 64

4.2 Estimating Stochastic Volatility via MCMC . . . 65

4.3.1 Estimating the Number of Latent Factors . . . 69

4.3.3 Estimation of Latent Variables . . . 71

4.4 Bayesian Approach in Nowcasting: Empirical Evidence . . . 72

CHAPTER 5. SUMMARY AND DISCUSSION . . . 85

BIBLIOGRAPHY . . . 87

APPENDIX A. CHAPTER 2 APPENDIX . . . 91

A.1 Identification Assumptions . . . 91

A.2 Posterior Distributions . . . 91

A.2.1 Sampling the mean of monthly series $\mu$ . . . 92

A.2.2 Sampling the factor loadings matrix $\theta$ . . . 92

A.2.3 Sampling the covariance in monthly series $\Omega$ . . . 92

A.2.4 Sampling the AR(1) coefficients $a_j$ . . . 93

A.2.5 Sampling the variance in factor equations $\sigma_j^2$ . . . 93

A.2.6 Sampling the coefficients associated with factors $\beta$ . . . 94

A.2.7 Sampling the variance in GDP equation $\eta^2$ . . . 94

A.2.8 Sampling the latent factors $F_t$ . . . 94

APPENDIX B. CHAPTER 2 SUPPLEMENTAL MATERIAL . . . 97

B.1 Burn-in Period For Simulation Study . . . 97

B.2 Inference for Simulation Study . . . 97

B.3 Inference for Empirical Study . . . 99

B.4 Information of Monthly Series in Empirical Study . . . 100

APPENDIX C. CHAPTER 3 APPENDIX . . . 114

C.1 Sampling the mean of monthly series $\mu$ . . . 114

C.2 Sampling the factor loadings matrix $\theta$ . . . 114

C.3 Sampling the covariance in monthly series $\Omega$ . . . 115

C.4 Sampling the AR(1) coefficients $a_j$ . . . 115

C.5 Sampling the variance in factor equations $\sigma_j^2$ . . . 116

C.6 Sampling the coefficients associated with factors $\beta$ . . . 116

C.7 Sampling the variance in GDP equation $\eta^2$ . . . 116

C.8 Sampling the latent factors $F_t$ . . . 116

C.9 Sampling the binary indicator $z_j$ . . . 118

C.10 Sampling the binary probability $p_j$ . . . 119


APPENDIX D. CHAPTER 4 APPENDIX . . . 121

D.1 Sampling the correlation matrix $\Psi$ . . . 121

D.2 Sampling the variance in volatility equation $\tau^2$ . . . 121

D.3 Sampling the volatility parameters $\omega_t$ using Particle Gibbs with backward simulation . . . 122

D.4 An illustration of conditional auxiliary particle filter with backwards simulation . . . 123


LIST OF TABLES

Page

Table 2.1 Data Releasing Format in Current Quarter. Dark gray cells mean $x_i \in v_{1,t^*}$, gray cells mean $x_i \in v_{2,t^*}$, light gray cell means $x_i \in v_{3,t^*}$, white cells mean unreleased data series at $(q, t)$. . . 7

Table 2.2 Overall picture of the entire dataset available at the first release date of the second month in the current quarter. Cells highlighted in gray indicate that the corresponding variables are available. . . 8

Table 2.3 Relative in-sample fit errors of GDP $RISFE$, and relative in-sample estimation error $RISEE(j)$ ($j = 1, 2, 3$) in 3 factors, for both simulations. . . . 21

Table 2.4 This table reports the percentages of reduction in MANE’s of both methods relative to RW, i.e. $(MANE_m - MANE_{RW})/MANE_{RW} \times 100$ (in %), where $m \in \{BAY, GRS\}$, for both simulation studies. . . 26

Table 2.5 Data Used For Nowcasting. Each column, from left to right, represents No. of release, name of data block, release date, No. of series released in each block, and total number of series for the corresponding release. RL stands for release. . . 29

Table 2.6 Percentage of reduction in MANE’s of both methods relative to RW, i.e. $(MANE_m - MANE_{RW})/MANE_{RW} \times 100$ (in %), where $m \in \{BAY, GRS\}$, for empirical studies. M1/M2/M3 stands for the first/second/third nowcasting month. . . 31

Table 3.1 Overall picture of the entire dataset available at one release date of the second month in the current quarter. Cells highlighted in gray indicate that the corresponding variables are available. . . 38

Table 3.2 Data Releasing Format in Simulation study when nowcasting quarter $K+1$’s GDP in month $3K+1$, $3K+2$, and $3K+3$. RL stands for release. Cells in red represent data released in the first month $3K+1$. Cells in green represent data released in the second month $3K+2$. Cells in blue represent data released in the third month $3K+3$. . . 45

Table 3.3 This table reports the percentages of reduction in MANE’s relative to RW, i.e. $(MANE_R - MANE_{RW})/MANE_{RW} \times 100$ (in %). Panel (a) is for $R = 3$, panel (b) is for $R = 6$ . . . 50

Table 3.4 Data Releasing Format of month T for empirical data. RL stands for Release, with release 1 colored in red, release 2 colored in green, and release 3 colored in blue. The number in parentheses is the number of series for that particular release. Sets 1 to 6 are for notational purposes. . . 52

Table 3.5 Data Transformation. $x^*_{it}$ denotes the raw data and $x_{it}$ denotes the transformed data. . . 52

Table 3.6 Detailed information of monthly series used in empirical study. Series names are adopted from Federal Reserve Bank of St. Louis. . . 60


Table 3.7 This table reports the percentages of reduction in MANE’s of both methods relative to RW, i.e. $(MANE - MANE_{RW})/MANE_{RW} \times 100$ (in %) using the US market data. . . 61

Table 4.1 This table reports the percentages of reduction in MANE’s relative to RW, i.e. $(MANE - MANE_{RW})/MANE_{RW} \times 100$ (in %). . . 71

Table 4.2 This table reports the percentages of reduction in MANE’s of both methods relative to RW, i.e. $(MANE - MANE_{RW})/MANE_{RW} \times 100$ (in %) using the US market data. . . 73

Table B.1 95% CIs for β₄ based on all releases in all three months of the quarter. . . . 99

Table B.2 Information of the monthly data series. Transformations 1, 2, and 3 stand for 12-month growth rate, 12-month difference, and no transformation, respectively. . . 101

Table D.1 Graphical representation of a run of the conditional auxiliary particle filter with backwards simulation when $t = 6$ and $P = 5$. The left figure represents all ancestral trajectories with grey lines. In the right figure, the black lines represent the ancestral lineage for $\{\omega_6^p : p = 1, \ldots, 5\}$. The blue path represents a sample that could be taken by the backwards simulator. . . 125


LIST OF FIGURES

Page

Figure 2.1 Simulated GDP and latent factors. The top row is the GDP, and the bottom three rows are simulated first, second, and third latent factors respectively. The left panel is for Simulation 1 and the right panel is for Simulation 2. . . 18

Figure 2.2 Absolute values of estimated latent factors from BAY (green dashed line) and GRS (red dotted line) approaches versus the truth (black solid line), with the left panel being Simulation 1 and the right panel being Simulation 2. 20

Figure 2.3 Nowcasting performance for Simulation 1, the left panel plots BAY nowcasts and the right panel plots GRS nowcasts. The first, second, and third row are nowcasts in the first, second, and third month of the given quarter respectively. In each cell, the curves colored in red, green and blue with different knot types represent the nowcast results based on the first, second, and third release dates in a given month of a given quarter respectively. . . 22

Figure 2.4 Nowcasting performance for Simulation 2, the left panel plots BAY nowcasts and the right panel plots GRS nowcasts. The first, second, and third row are nowcasts in the first, second, and third month of the given quarter respectively. In each cell, the curves colored in red, green and blue with different knot types represent the nowcast results based on the first, second, and third release dates in a given month of a given quarter respectively. . . 23

Figure 2.5 Mean absolute nowcasting error ratios for Simulation 1. The left panel is for BAY, the right panel is for GRS, and the horizontal line is 100% representing the baseline for RW. The first, second, and third release are colored from dark to light. . . 24

Figure 2.6 Mean absolute nowcasting error ratios for Simulation 2 (b). The left panel is for BAY, the right panel is for GRS, and the horizontal line is 100% representing the baseline for RW. The first, second, and third release are colored from dark to light. . . 25

Figure 2.7 Aggregated standard deviations for Simulation 1 and 2. The left panel is for Simulation 1, the right panel is for Simulation 2, with first, second, and third release colored from dark to light. . . 27

Figure 2.8 China’s quarterly nominal GDP growth rate for 1999Q1 to 2010Q2. Quarters to the left of the dashed line are reserved as in-sample data, quarters to the right of the dashed line are out-of-sample nowcasting targets. . . 28

Figure 2.9 Nowcasting performance in the empirical study, the left panel being BAY approach, the right panel being GRS approach, and the 3 rows being 3 nowcasting months. In each subplot, the solid curve is the real GDP, while 8 other curves in different colors and line types represent nowcasting results from 8 different release dates. . . 32


Figure 2.10 Relative ratios of MANE’s for both GRS (1st row) and BAY (2nd row) approaches using RW as the baseline. Three columns are for 3 nowcasting months, and 8 bars in each subplot represent 8 different dates. RL stands for release. . . 33

Figure 2.11 Estimates of 3 latent factors from both GRS and BAY methods in the empirical study. . . 33

Figure 3.1 Distribution of estimated number of latent factors by month when the true number of factors $r = 4$. First row is the results for $R = 3$, second row is the results for $R = 6$. The first, second, and third columns represent nowcasting in the first, second, and third month of the quarter respectively. In each subplot, the x-axis is the estimated number of factors, and the height of the bars represents the proportion of the estimated number of factors. . . 47

Figure 3.2 95% confidence intervals of estimated number of latent factors by month, evolving over the last 20 quarters. Left column is the results for $R = 3$, right column is the results for $R = 6$. The first, second, and third column are nowcasting in the first, second, and third month of the quarter respectively. The solid flat line represents the true number of factors, the red dashed line represents the mean of estimated number of factors for that quarter. The gray shaded area is the 95% confidence intervals calculated using normal approximation based on 100 estimates. . . 55

Figure 3.4 Averaged MANE ratios for $R = 3$ . . . 56

Figure 3.5 Averaged MANE ratios for $R = 6$ . . . 56

Figure 3.6 Averaged mean absolute nowcasting error ratios (relative to RW). Panel (a) is for $R = 3$, panel (b) is for $R = 6$. The first, second, and third release are colored as dark gray, gray, and light gray. . . 56

Figure 3.7 Nowcasting performance over the last 20 quarters by 3 releases in each month. Each row represents nowcasting in each month of the quarter. Black solid line represents the true GDP value, and dashed lines with different knot types represent the GDP nowcasts with red, green, and blue as release 1, 2, and 3. . . 57

Figure 3.8 In-sample fit of latent factors for $R = 6$. Absolute value is used for the true latent factors and in-sample fits. Black solid line represents the true value and red dashed line represents the in-sample fitted value. . . 58

Figure 3.9 Quarterly real GDP growth rate in US. The nowcasting horizon lies to the right of the dashed line. . . 59

Figure 3.10 Distribution of estimated number of latent factors by month for the US market data. The left, middle, and right columns are nowcasting in the first, second, and third month of the quarter respectively. . . 61

Figure 3.11 Nowcasting over 2003Q1 to 2016Q4 by 3 releases in each month for the US market data. Black solid line represents the true GDP value, dashed lines with different knot types represent the GDP nowcasts with red, green, and blue as release 1, 2, and 3. . . 62

Figure 3.12 Averaged mean absolute nowcasting error ratios (relative to RW). The horizontal line is 100% representing the baseline for RW. The first, second, and third release are colored as dark gray, gray, and light gray for each month. . 63


Figure 3.13 Absolute values of estimated first latent factors of in-sample analysis in the US market data. . . 63

Figure 4.1 Distribution of estimated number of latent factors by month when the true number of factors $r = 4$. The first, second, and third columns represent nowcasting in the first, second, and third month of the quarter respectively. In each subplot, the x-axis is the estimated number of factors, and the height of the bars represents the proportion of the estimated number of factors. . . . 70

Figure 4.2 95% confidence intervals of estimated number of latent factors by month, evolving over the last 20 quarters. The first, second, and third column are nowcasting in the first, second, and third month of the quarter respectively. The solid flat line represents the true number of factors, the red dashed line represents the mean of estimated number of factors for that quarter. The gray shaded area is the 95% confidence intervals calculated using normal approximation based on 100 estimates. . . 75

Figure 4.3 Averaged mean absolute nowcasting error ratios (relative to RW). The first, second, and third release are colored as dark gray, gray, and light gray. . . . 76

Figure 4.4 Nowcasting performance over the last 20 quarters by 3 releases in each month. Each row represents nowcasting in each month of the quarter. Black solid line represents the true GDP value, and dashed lines with different knot types represent the GDP nowcasts with red, green, and blue as release 1, 2, and 3. . . 77

Figure 4.5 In-sample fit of latent factors for $R = 6$. Absolute value is used for the true latent factors and in-sample fits. Black solid line represents the true value and red dashed line represents the in-sample fitted value. . . 78

Figure 4.6 Heat map of the difference between estimated correlation matrix and the true value for in-sample analysis, i.e. $(\hat{\Psi} - \Psi)$ where $\hat{\Psi}$ is the posterior mean of correlation matrix. The x-axis is the row index, the y-axis is the column index. Positive differences are colored in red, negative differences are colored in blue, with lighter color representing smaller difference. . . 79

Figure 4.7 Estimated stochastic volatility ($\hat{\omega}_{it}$) for each individual series of in-sample analysis. In each subplot, the x-axis is the month of the in-sample period, the y-axis is the value of $\hat{\omega}_{it}$. Black line represents the true value and red dashed line represents the estimate. Index of the monthly series is on top of each subplot. . . 80

Figure 4.8 Distribution of estimated number of latent factors by month for the US market data. The left, middle, and right columns are nowcasting in the first, second, and third month of the quarter respectively. . . 81

Figure 4.9 Nowcasting over 2003Q1 to 2016Q4 by 3 releases in each month for the US market data. Black solid line represents the true GDP value, dashed lines with different knot types represent the GDP nowcasts with red, green, and blue as release 1, 2, and 3. . . 82

Figure 4.10 Averaged mean absolute nowcasting error ratios (relative to RW). The horizontal line is 100% representing the baseline for RW. The first, second, and third release are colored as dark gray, gray, and light gray for each month. . 83

Figure 4.11 Absolute values of estimated first latent factors of in-sample analysis in the US market data. . . 83


Figure 4.12 Estimated stochastic volatility ($\hat{\Omega}_{t[i,i]} = \exp(2\hat{\omega}_{it})$) for each individual series of in-sample analysis in the US market data. Index of the monthly series is on top of each subplot. . . 84

Figure B.1 Time-series plot for posterior samples of $A$ for Simulation 1. . . 105

Figure B.2 Time-series plot for posterior samples of β0, β4 for Simulation 1. . . 106

Figure B.3 Time-series plot for posterior samples of β₁,β₂,β₃ for Simulation 1. . . 107

Figure B.4 Posterior samples of $F_t$ from 1st, 50th, 500th, 5000th iteration for Simulation 1. . . 108

Figure B.5 $F_t$ estimation based on three releases in all months of the quarter for GRS approach. From top to bottom are estimations for the first, second, and third latent factors. From left to right are estimations in the first, second, and third month of the quarter. Three releases are colored in red, green, and blue with different node types. . . 109

Figure B.6 $F_t$ estimation based on three releases in all months of the quarter for BAY approach. From top to bottom are estimations for the first, second, and third latent factors. From left to right are estimations in the first, second, and third month of the quarter. Three releases are colored in red, green, and blue with different node types. . . 110

Figure B.7 95% CIs for β₁,β₂,β₃ estimation based on different releases in the first month. CIs based on different releases are color coded. RL stands for release, dotted line represents 0. . . 111

Figure B.8 95% CIs for β₁,β₂,β₃ estimation based on different releases in the second month. CIs based on different releases are color coded. RL stands for release, dotted line represents 0. . . 112

Figure B.9 95% CIs for β₁,β₂,β₃ estimation based on different releases in the third month. CIs based on different releases are color coded. RL stands for release, dotted line represents 0. . . 113


ACKNOWLEDGMENTS

I would like to take this opportunity to express my thanks to those who helped me with various aspects of conducting research and writing this dissertation. First and foremost, I thank Dr. Cindy Yu for her guidance, patience, and support throughout this research and the writing of this dissertation. Her insights and words of encouragement have often inspired me and renewed my hopes for completing my graduate education. I would additionally like to thank Dr. Haitao Li for his guidance throughout the research.


ABSTRACT

Real-time nowcasting is an assessment of current economic conditions from timely released economic series (such as monthly macroeconomic data) before the direct measure (such as the quarterly GDP figure) is disseminated. Dynamic factor models (DFMs) are widely used in econometrics to bridge series with different frequencies and achieve a reduction in dimensionality. However, most research using DFMs assumes the number of factors is known. In this dissertation, we first develop a Bayesian approach that handles the unbalanced feature of the data set and estimates latent common factors when the number of factors is assumed to be fixed and known. We then extend our method so that it can identify the unknown number of factors and estimate the latent dynamic factors of DFMs accurately in a real-time nowcasting framework. The proposed method can deal with unbalanced data, which is typical of a real-time nowcasting analysis. We demonstrate the validity of our approach through simulation studies and explore its applicability through empirical studies nowcasting China’s GDP and US GDP using monthly data series of several categories in each country’s market. The simulation and empirical studies indicate that our Bayesian approach is a viable option for real-time nowcasting of China’s and the US’s quarterly GDP.


CHAPTER 1. INTRODUCTION

Real-time nowcasting is the process of assessing or reconstructing current-quarter GDP from timely released economic and financial series before the official figure is disseminated, in order to gauge overall macroeconomic conditions in real time. This is of interest because most data are released with a lag and are subsequently revised. In principle, any release, no matter at what frequency, may affect current-quarter estimates and their precision. Both forecasting and nowcasting are important tasks for central banks, for at least two reasons. First, many policy decisions, including monetary policy, need to be made in real time and are based on assessments of current and future economic conditions. Second, central banks often use estimated current-quarter GDP figures as inputs for model-based longer-term forecasting exercises.

Real-time nowcasting faces several challenges. The first is how to bridge information contained in monthly data with quarterly GDP. Baffigi et al. (2004), Rünstler and Sédillot (2003), and Kitchen and Monaco (2003) study bridge equations, which use small models to “bridge” the information contained in one or a few key monthly data series with the quarterly growth rate of GDP. However, these involve judgmental nowcasts and deal with only a few monthly data series. How to deal with a large number of monthly data series is therefore the second challenge. The use of factor models (FMs) for macroeconomic forecasting is now standard at central banks and other institutions. Many authors, such as Boivin and Ng (2005), Forni et al. (2005), and D’Agostino and Giannone (2006), have shown that these models are successful in this regard. But FMs have not been used specifically for the problem of nowcasting in real time. The third challenge is that a large number of monthly data series are released at different times and with different lags, causing unbalanced data at the end of the sample. In real time, some data are released at the beginning of the month, some in the middle, and some at the end. Consequently, the underlying data sets are unbalanced at the end of the sample (i.e. at a real time when a new release happens). Some authors, including Croushore and Stark (2001), Koenig et al. (2003), and Orphanides (2002), discussed this issue but focused on data revisions and their implications rather than on statistical estimation. Appropriately dealing with this “jagged edge” feature of the data is the key to producing a nowcast that exploits information in the most recent releases and has a chance to compete with judgmental nowcasts.

Giannone et al. (2008) provided a frequentist inference framework for parametric dynamic factor models that takes advantage of the different data releases throughout the month and updates the nowcast with each new release. They use dynamic factor models (DFMs) to bridge monthly information with quarterly GDP and achieve a reduction in the dimensionality of the monthly data. The framework also formalizes the updating of the GDP nowcast as monthly data are released throughout the quarter. They combine principal component analysis (PCA) with a modified Kalman Filter (KF) to deal with the jagged edge feature of the data. Hereafter, we call the method proposed in Giannone et al. (2008) the GRS approach.

Since its introduction, the GRS approach has been implemented in many applications. Yiu and Chow (2011) nowcasted Chinese GDP using the GRS approach and found that interest rate data is the single most important category of economic series in estimating current-quarter GDP in China. Chernis and Sekkel (2017) showed that, in a pseudo-real-time setting, the DFM outperformed univariate benchmarks as well as other commonly used nowcasting models, such as mixed-data sampling and bridge regressions, when nowcasting Canada’s GDP growth. For the US market, the Federal Reserve Bank of New York publishes a platform called the New York Fed Staff Nowcast, which has been estimating US GDP growth for the current and subsequent quarter, based on data released over the course of each week, since April 2016. The methodology behind the platform is built on the GRS approach, and details can be found in Aarons et al. (2016) and Bok et al. (2018).

Based on the DFMs in Giannone et al. (2008), in Chapter 2 we propose a Bayesian Markov Chain Monte Carlo (MCMC) based inference framework that provides a more natural way to deal with the “jagged edge” feature of the data and generates timely nowcasts of quarterly GDP. Hereafter, we refer to our Bayesian approach as the BAY method. The BAY method differs from the GRS method in two ways. First, the GRS approach parameterizes the variance matrix of the monthly data series as a diagonal matrix, which makes the unbalanced data issue convenient to solve, while we allow non-zero cross-sectional correlations. Second, the GRS approach estimates parameters and latent factors in multiple steps, i.e. first using PCA to obtain parameter estimates and then using KF to obtain latent factor estimates, while we estimate all parameters and factors jointly. We integrate everything into one single estimation framework so that the uncertainty of parameters and latent variables is taken into account simultaneously and inferences are readily made from posterior draws after the burn-in period. Through simulation studies, we evaluate our BAY approach based on the accuracy of estimated latent factors and nowcasting results. We also investigate the applicability of our approach by applying it to nowcast annual growth rates of China’s quarterly GDP using monthly released data series in several categories, including industrial production, fixed asset investment, external sector, money market, and financial market in China.

All the above applications of the GRS approach, along with the BAY approach proposed in Chapter 2, rest on two fundamental assumptions. First, they assume the number of factors of the DFM is fixed and given, while in reality it is unknown. In practice, the number of factors is often determined by looking at the cumulative proportions of variance explained by the first few principal components from PCA, which is a rather subjective choice. The second key assumption is that the volatilities of macroeconomic series are constant over time. In the past decade, the empirical macroeconomics literature has paid close attention to time-varying parameters. Many papers, such as Primiceri (2005), Cogley and Sargent (2005), and Benati and Surico (2008), noted that characterizing macroeconomic data with constant-parameter models is deficient, and that models whose parameters vary slowly and continuously over time are much more desirable. Negro and Otrok (2008) bridged the literature on factor models with the literature on parameter instability by considering DFMs with time-varying factor loadings and stochastic volatility (SV). Clark (2011) focused on adding SV to density forecasts of US GDP growth, unemployment, inflation, and the federal funds rate from Bayesian Vector Autoregression (BVAR) analyses and found material improvements in the real-time accuracy of density forecasts and the accuracy of point forecasts when SV was taken into consideration.
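The PCA-based rule of thumb mentioned above for choosing the number of factors can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of any cited paper: the simulated panel, the function names `cumulative_variance_explained` and `n_factors_by_threshold`, and the thresholds are all hypothetical. The point is only that the selected number of factors depends on a threshold the analyst must pick.

```python
import numpy as np

def cumulative_variance_explained(X):
    """Cumulative share of variance explained by the ordered principal components."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize each series
    eigvals = np.linalg.eigvalsh(np.cov(Xs, rowvar=False))[::-1]  # descending order
    eigvals = np.clip(eigvals, 0.0, None)                # guard against tiny negatives
    return np.cumsum(eigvals) / eigvals.sum()

def n_factors_by_threshold(X, threshold):
    """Smallest number of components whose cumulative share reaches `threshold`."""
    cum = cumulative_variance_explained(X)
    k = int(np.searchsorted(cum, threshold)) + 1
    return min(k, X.shape[1])

# Hypothetical panel: 200 months, 20 series driven by 3 common factors plus small noise.
rng = np.random.default_rng(1)
factors = rng.standard_normal((200, 3))
loadings = rng.standard_normal((3, 20))
X = factors @ loadings + 0.05 * rng.standard_normal((200, 20))

# The answer changes with the (arbitrary) threshold, which is the subjectivity at issue.
choices = {thr: n_factors_by_threshold(X, thr) for thr in (0.50, 0.80, 0.95)}
print(choices)
```

Because the chosen count can only grow as the threshold rises, two analysts using different cut-offs may settle on different numbers of factors for the same panel.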

We relax the above two assumptions, allowing the DFM to have an unknown number of factors in Chapter 3 and the volatility of macroeconomic data to vary over time in Chapter 4. We modify and improve our BAY approach to handle these two changes. Time-varying factor loadings are beyond the scope of this dissertation. The improved BAY approach addresses the unknown number of latent factors and SV by consolidating ideas from prior work. Zhang et al. (2013) proposed a Bayesian method for estimating covariance matrices in the form of a factor model with an unknown number of latent factors by introducing binary indicators for factor selection. We adopt this idea and attach a binary indicator to each member of a set of candidate latent factors. If the data suggest that a certain factor should be selected, the binary indicator returns 1 for that factor, and 0 otherwise. Counting the number of 1’s gives an estimate of the number of latent factors. Follett and Yu (2019) introduced an SV model in the framework of vector autoregression (VAR) that captures the comovement of the time-changing variances across series. The time-varying volatility is achieved by combining a static correlation matrix generated from the Lewandowski, Kurowicka, and Joe (LKJ) prior proposed in Lewandowski et al. (2009) with a random walk process for the variance of each individual time series. They also proposed using the particle Gibbs with backward simulation algorithm to estimate the time-varying volatility parameters effectively. This algorithm is used directly in our MCMC algorithm to generate posterior samples for SV.
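The counting step described above can be sketched in a few lines. This is only an illustration of turning binary indicators into an estimate of the number of factors: the indicator draws below are made-up placeholders rather than output of the sampler, the function name `estimate_num_factors` is hypothetical, and summarizing the counts by their most frequent value is an assumption made here for illustration.

```python
import numpy as np

def estimate_num_factors(z_draws):
    """Summarize binary factor-selection indicators across posterior draws.

    z_draws : (n_draws, R) array of 0/1 values, one column per candidate factor.
    Returns the most frequent count of selected factors and the distribution
    of counts over 0..R.
    """
    z = np.asarray(z_draws, dtype=int)
    counts = z.sum(axis=1)                         # selected factors in each draw
    freq = np.bincount(counts, minlength=z.shape[1] + 1)
    return int(freq.argmax()), freq / freq.sum()

# Placeholder draws for R = 6 candidate factors (not real sampler output).
z_draws = np.array([
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
])
r_hat, count_dist = estimate_num_factors(z_draws)
print(r_hat, count_dist)
```

Keeping the whole distribution of counts, rather than a single number, also shows how strongly the draws agree on the number of factors.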

We are not the first to consider estimating the number of factors. Bai and Ng (2002, 2006) proposed a class of information criteria and showed that the number of factors in DFMs can be consistently estimated using those criteria in a large-panel setting. Alternative methods that involve calculating the eigenvalues of the sample covariance matrix are available, including Onatski (2009, 2010) and Ahn and Horenstein (2013), who considered the estimation of the number of factors in approximate factor models and generalized dynamic factor structures. Our BAY approach is fundamentally different from these approaches in the following aspects. First, we resolve the problem of the unknown number of factors using a Bayesian framework, while they approached the question from a frequentist view. Second, we estimate the number of factors within a nowcasting set-up, while they treated the question in isolation and did not consider forecasting or nowcasting. Third, we consider time-varying volatility, while they assumed constant volatility. Through simulation studies, we evaluate our BAY approach based on its accuracy in estimating the unknown number of factors and the latent state variables (dynamic factors and SV), and its effectiveness in producing reliable nowcasts in real time. We also investigate the applicability of our approach by applying it to nowcast US quarterly GDP growth using monthly released data series in several categories in the US market.

The rest of this dissertation is organized as follows. In Chapter 2, we present the results from the simulation and empirical studies using Chinese market data when the number of latent factors is assumed to be fixed and known. In Chapter 3, we show the results from the simulation and empirical studies using US market data when the number of latent factors is unknown. In Chapter 4, we extend the model in the previous chapter to allow time-varying volatility and show results from the simulation and empirical studies using US market data based on the extended SV model. Chapter 5 concludes the dissertation.


CHAPTER 2. NOWCASTING GDP USING DYNAMIC FACTOR MODEL WITH KNOWN NUMBER OF FACTORS

In this chapter, we use dynamic factor models to bridge monthly information with quarterly GDP and to reduce the dimensionality of the monthly data, assuming the number of factors is fixed and known. We develop a Bayesian approach that deals with the unbalanced feature of the dataset and estimates the latent common factors. We demonstrate the validity of our approach through simulation studies, and explore its applicability through an empirical study in nowcasting China's GDP using 117 monthly data series from several categories in the Chinese market.

2.1 Structure of Dataset and the GRS Approach

In this section, we will first describe the problem in a stylized way. The goal is to evaluate the current quarter nowcast of GDP based on the flow of information that becomes available during the quarter. Then we will review the frequentist framework proposed by Giannone et al. (2008) in order to build a foundation for our BAY approach.

2.1.1 Working With Unbalanced Data

In real time at a particular release date, some series have observations through the current month, whereas for others the most recent observations are from the previous month. Consequently, the underlying datasets are unbalanced. Appropriately dealing with this unbalanced feature of the data is key for nowcasting.

Let $t$ be the index for month and $k$ be the index for quarter. Let $\mathbf{x}_t = (x_{1,t}, \dots, x_{n,t})'$ be an $n \times 1$ vector denoting $n$ monthly data series at month $t$, and $y_k$ be quarterly GDP at quarter $k$. Assume there are $Q$ release dates in each month, and let $(q, t)$ denote the $q$th release date in month $t$, where $q = 1, \dots, Q$. The releasing set $v^*_{q,t}$ contains the indexes of the $n^*_q$ monthly series that are released at the release date $(q, t)$, where $n^*_q = \|v^*_{q,t}\|$ denotes the number of newly released monthly series. Let $v_{q,t}$ denote the set collecting the indexes of all $x_{i,t}$'s that have been released at or before the release date $(q, t)$, that is $v_{q,t} = \bigcup_{i \le q} v^*_{i,t}$, and let $n_q = \sum_{i \le q} n^*_i$ be the number of available series at $(q, t)$. Thus $x_{i \in v_{q,t}}$ represents the monthly series that are available at the release date $(q, t)$. Without loss of generality, we assume the release dates for all series are fixed across months.

Table 2.1 illustrates how the data are released in a given quarter using a toy example. Suppose there are $n = 6$ monthly series, $\mathbf{x}_t = (x_{1,t}, x_{2,t}, x_{3,t}, x_{4,t}, x_{5,t}, x_{6,t})'$, released at three different dates (i.e. $Q = 3$). Without loss of generality, $x_{i,t}$ can always be arranged such that the index $i$ follows the order of the releasing dates. Suppose $\{x_{1,t}, x_{2,t}, x_{3,t}\}$, with the dark gray background, are released at the first releasing date $(1, t)$, so the releasing set is $v^*_{1,t} = \{1,2,3\}$. Since this is the very first releasing date, the available set $v_{1,t}$ is also $\{1,2,3\}$ and the available monthly series are $x_{i \in v_{1,t}} = \{x_{1,t}, x_{2,t}, x_{3,t}\}$. Then at the second releasing date $(2, t)$, $\{x_{4,t}, x_{5,t}\}$, with the gray background, are released, hence the releasing set is $v^*_{2,t} = \{4,5\}$. The available set $v_{2,t}$ is $\{1,2,3,4,5\}$, so $x_{i \in v_{2,t}} = \{x_{1,t}, x_{2,t}, x_{3,t}, x_{4,t}, x_{5,t}\}$. At the last releasing date $(3, t)$, the last series $x_{6,t}$, with the light gray background, becomes available, so $v^*_{3,t} = \{6\}$, $v_{3,t} = \{1,2,3,4,5,6\}$, and $x_{i \in v_{3,t}} = \{x_{1,t}, x_{2,t}, x_{3,t}, x_{4,t}, x_{5,t}, x_{6,t}\}$.

Table 2.1: Data Releasing Format in Current Quarter. Dark gray cells mean $x_{i \in v^*_{1,t}}$, gray cells mean $x_{i \in v^*_{2,t}}$, light gray cell means $x_{i \in v^*_{3,t}}$, white cells (NA) mean unreleased data series at $(q, t)$.

(q, t)         | (1, t) | (2, t) | (3, t)
x_{i∈v_{q,t}}  | x1,t   | x1,t   | x1,t
               | x2,t   | x2,t   | x2,t
               | x3,t   | x3,t   | x3,t
               | NA     | x4,t   | x4,t
               | NA     | x5,t   | x5,t
               | NA     | NA     | x6,t
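The bookkeeping behind the toy example can be sketched in a few lines of Python. This is only an illustration: the names `releasing_sets`, `available_set`, and `n_available` are hypothetical and do not come from the dissertation.

```python
# Toy example from Table 2.1: n = 6 series, Q = 3 release dates.
# Series indexes are 1-based to match the notation in the text.
releasing_sets = {1: {1, 2, 3}, 2: {4, 5}, 3: {6}}   # v*_{q,t} for q = 1, 2, 3


def available_set(q, releasing_sets):
    """Return v_{q,t}, the union of the releasing sets v*_{i,t} for i <= q."""
    available = set()
    for i in range(1, q + 1):
        available |= releasing_sets[i]
    return available


available_sets = {q: available_set(q, releasing_sets) for q in releasing_sets}
n_available = {q: len(v) for q, v in available_sets.items()}   # n_q

for q in sorted(available_sets):
    print(q, sorted(available_sets[q]), n_available[q])
```

Because the release schedule is assumed fixed across months, the same sets can be reused for every month $t$.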


Table 2.2 gives an overall picture of the entire dataset available when nowcasting. In particular, we consider the first release date of the second month in the current quarter as an example. Suppose we want to nowcast GDP in the current quarter, $y_{K+1}$, using all monthly information up through month $T$ (the end of the sample), where $T = 3K+1$ (the first month nowcast), $T = 3K+2$ (the second month nowcast), or $T = 3K+3$ (the third month nowcast). The observations available at the release date $(q, t)$, highlighted in gray in Table 2.2, include $\{y_1, y_2, \dots, y_K\}$ and $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{T-1}, x_{i \in v_{q,T}}\}$. The goal is to nowcast $y_{K+1}$ using all information available at every release date $(q, t)$ in the current quarter, i.e. real-time nowcasting. Note that the factors $\{\mathbf{F}_1, \dots, \mathbf{F}_T\}$ are not observed and need to be estimated using the adjusted Kalman filter (e.g. the GRS approach) or Bayesian smoothing techniques (e.g. our BAY approach). At every new release date $(q, t)$ ($q = 1, \dots, Q$), the model parameters and $\{\mathbf{F}_1, \dots, \mathbf{F}_T\}$ are updated with the additional information from the new release, and the nowcast of $y_{K+1}$ is re-produced. Therefore there are $3Q$ nowcast results in the current quarter.

Table 2.2: Overall picture of the entire dataset available at the first release date of the second month in the current quarter. Cells highlighted in gray indicate that the corresponding variables are available; NA marks monthly observations that are not yet released.

k    | 1    |      |      | 2    |      |      | ... | K      |        |        | K+1    |      |
t    | 1    | 2    | 3    | 4    | 5    | 6    | ... | T-4    | T-3    | T-2    | T-1    | T    | T+1
xi,t | x1,1 | x1,2 | x1,3 | x1,4 | x1,5 | x1,6 | ... | x1,T-4 | x1,T-3 | x1,T-2 | x1,T-1 | x1,T | NA
     | x2,1 | x2,2 | x2,3 | x2,4 | x2,5 | x2,6 | ... | x2,T-4 | x2,T-3 | x2,T-2 | x2,T-1 | x2,T | NA
     | x3,1 | x3,2 | x3,3 | x3,4 | x3,5 | x3,6 | ... | x3,T-4 | x3,T-3 | x3,T-2 | x3,T-1 | x3,T | NA
     | x4,1 | x4,2 | x4,3 | x4,4 | x4,5 | x4,6 | ... | x4,T-4 | x4,T-3 | x4,T-2 | x4,T-1 | NA   | NA
     | x5,1 | x5,2 | x5,3 | x5,4 | x5,5 | x5,6 | ... | x5,T-4 | x5,T-3 | x5,T-2 | x5,T-1 | NA   | NA
     | x6,1 | x6,2 | x6,3 | x6,4 | x6,5 | x6,6 | ... | x6,T-4 | x6,T-3 | x6,T-2 | x6,T-1 | NA   | NA
yk   | y1   |      |      | y2   |      |      | ... | yK     |        |        | yK+1   |      |
Ft   | F1   | F2   | F3   | F4   | F5   | F6   | ... | FT-4   | FT-3   | FT-2   | FT-1   | FT   |

How to handle the unbalanced feature of the monthly data series will be discussed in Section


2.1.2 The GRS Approach

Since there are numerous series in the information set, modeling $y$ directly on all $x$ would involve too many parameters, and the model would therefore perform poorly in nowcasting and forecasting because of the large uncertainty in parameter estimation ("the curse of dimensionality"). The fundamental idea of Giannone et al. (2008) is to exploit the collinearity of the series by summarizing all available information into a few common factors. Because of collinearity, a linear combination of the common factors is able to approximate the dynamic interactions among the series, which yields a parsimonious model that works well in nowcasting and forecasting. The GRS approach formulates the model as follows.

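First, they assume the monthly data series is a linear function of a few unobserved common factors $\mathbf{F}_t$,

$$\mathbf{x}_t = \boldsymbol{\mu} + \boldsymbol{\Theta}\mathbf{F}_t + \boldsymbol{\epsilon}_t, \tag{2.1}$$

where $\mathbf{x}_t = (x_{1,t}, \dots, x_{n,t})'$ is the $n \times 1$ vector of monthly data series at month $t$, for $t = 1, \dots, T$, $\mathbf{F}_t = (f_{1t}, \dots, f_{rt})'$ is the $r \times 1$ vector of monthly common factors at month $t$, $\boldsymbol{\Theta}$ is the $n \times r$ factor loading matrix, $\boldsymbol{\mu}$ is the mean vector, and $\boldsymbol{\epsilon}_t \sim N(\mathbf{0}, \boldsymbol{\Omega}_{n \times n})$. The number of latent factors, $r$, is assumed to be known and $r \ll n$. Then they further specify the dynamics of the common factors as follows,

$$\mathbf{F}_t = \mathbf{A}\mathbf{F}_{t-1} + \mathbf{u}_t, \tag{2.2}$$

where $\mathbf{A}$ is an $r \times r$ matrix, all roots of $\det(\mathbf{I}_r - \mathbf{A}z)$ lie outside the unit circle, and $\mathbf{u}_t \sim N(\mathbf{0}, \boldsymbol{\Sigma}_{r \times r})$. Finally, the quarterly GDP is assumed to be a linear function of the common factors in the third month of the quarter,

$$y_k = \beta_0 + \boldsymbol{\beta}_1'\mathbf{F}_{3k} + \nu_k, \tag{2.3}$$

where $\beta_0$ is a scalar, $\boldsymbol{\beta}_1$ is an $r \times 1$ vector, $\nu_k \sim N(0, \eta^2)$, and $k = 1, \dots, K$. We assume $3K+1 \le T \le 3K+3$. The dynamic factor model specified this way not only bridges the information contained in the monthly data series with the quarterly GDP, but also reduces the dimension of the parameters, thus increasing the degrees of freedom.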


If the complete set of monthly data series is observed, the unobserved common factors $\mathbf{F}_t$ ($t = 1, \dots, T$) can be consistently estimated by PCA, as recently shown by several authors in the literature. However, in real-time nowcasting the dataset is unbalanced and we want to exploit the additional information from a newly released set, which requires dealing with missing data at the end of the sample. To overcome this difficulty, the GRS approach estimates the parameters and common factors $\mathbf{F}_t$ ($t = 1, \dots, T$) in the following three stages. The first stage uses an OLS regression on principal components (PCs) extracted from a balanced panel truncated at the previous month $T-1$, i.e. the balanced panel is $\{\mathbf{x}_1, \dots, \mathbf{x}_{T-1}\}$ in Table 2.2. The second stage adjusts the Kalman smoother, based on the estimated parameters from the first stage, in order to deal with the unbalanced data. More specifically, at the release date $(q, T)$, the variance-covariance matrix used in the Kalman filter is defined as $\tilde{\boldsymbol{\Omega}}_{v_{q,T}} = \mathrm{diag}(\tilde{w}^2_{11}, \dots, \tilde{w}^2_{nn})$, where

$$\tilde{w}^2_{ii} = \begin{cases} w^2_{ii} & \text{if } i \in v_{q,T} \\ \infty & \text{if } i \notin v_{q,T} \end{cases}, \quad \text{for } i = 1, \dots, n, \tag{2.4}$$

and $w^2_{ii}$ is the $i$th diagonal element of the covariance matrix of $\boldsymbol{\epsilon}_t$, estimated from the balanced panel data in the first stage. The Kalman filter algorithm is then applied to the entire (unbalanced) data, i.e. $\{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{T-1}, x_{i \in v_{q,T}}\}$, using $\tilde{\boldsymbol{\Omega}}_{v_{q,T}}$. In this way, Giannone et al. (2008) argue that the filter, through its implicit signal extraction process, will put no weight on missing observations in the computation of the factors. The third stage estimates the coefficients in equation (2.3) by an OLS regression of GDP on the latent factors estimated by the Kalman filter. Readers can refer to Giannone et al. (2008) for more details.
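The effect of the infinite variances in (2.4) can be checked numerically. The sketch below is not the GRS implementation; it is a single textbook Kalman measurement update under assumed toy values, where `kalman_update`, the dimensions, and the large finite constant `1e8` (standing in for $\infty$) are all illustrative choices. It compares the update that uses the inflated variances with the update that simply drops the unreleased rows.

```python
import numpy as np


def kalman_update(f_pred, P_pred, x, mu, Theta, Omega):
    """One Kalman measurement update for x = mu + Theta f + eps, eps ~ N(0, Omega)."""
    S = Theta @ P_pred @ Theta.T + Omega          # innovation covariance
    K = np.linalg.solve(S, Theta @ P_pred).T      # Kalman gain P Theta' S^{-1}
    f_upd = f_pred + K @ (x - mu - Theta @ f_pred)
    P_upd = P_pred - K @ Theta @ P_pred
    return f_upd, P_upd


rng = np.random.default_rng(0)
n, r = 6, 2                                       # toy dimensions (assumed)
Theta = rng.normal(size=(n, r))
mu = np.zeros(n)
w2 = np.full(n, 0.5)                              # diagonal of Omega (assumed)
f_pred, P_pred = rng.normal(size=r), np.eye(r)
x_T = mu + Theta @ rng.normal(size=r) + rng.normal(scale=np.sqrt(0.5), size=n)

avail = np.array([0, 1, 2])                       # released series, 0-based
missing = np.setdiff1d(np.arange(n), avail)

# (a) Inflate the variances of the unreleased series, as in (2.4).
w2_tilde = w2.copy()
w2_tilde[missing] = 1e8                           # large finite stand-in for infinity
x_filled = x_T.copy()
x_filled[missing] = 0.0                           # arbitrary placeholder values
f_big, P_big = kalman_update(f_pred, P_pred, x_filled, mu, Theta, np.diag(w2_tilde))

# (b) Drop the unreleased rows altogether.
f_sub, P_sub = kalman_update(f_pred, P_pred, x_T[avail], mu[avail],
                             Theta[avail], np.diag(w2[avail]))

print(np.max(np.abs(f_big - f_sub)))              # difference between the two updates
```

Under these assumptions the two updates agree up to a negligible numerical difference, which is the sense in which the filter puts no weight on the missing observations.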

2.2 Alternative Approach to Nowcasting: Bayesian Markov Chain Monte Carlo Method

In this section, we discuss some restrictions imposed on the model to avoid the non-identifiability of the common factors $\mathbf{F}_t$, introduce our BAY approach to estimate the model parameters and latent factors $\mathbf{F}_t$, and provide formulas for nowcasting using both the GRS and BAY approaches.


2.2.1 Model Modifications

It is known that dynamic factor models suffer from a non-identifiability issue. Following Stock and Watson (2002), two sets of assumptions, Assumption F and Assumption M, are constructed (details are in Appendix A.1). Specifically in this chapter, some restrictions are imposed on the matrices $\mathbf{A}$ and $\boldsymbol{\Sigma}$ in the dynamics of the latent factors in equation (2.2) as follows: $\mathbf{A} = \mathrm{diag}(a_1, a_2, \dots, a_r)$ and $\boldsymbol{\Sigma} = \mathrm{diag}(\sigma^2_1, \sigma^2_2, \dots, \sigma^2_r)$, where $|a_j| < 1$ ($j = 1, \dots, r$) and $\sigma^2_i/(1 - a^2_i) > \sigma^2_j/(1 - a^2_j)$, $\forall i < j$. These restrictions, together with the prior specification for the loading matrix $\boldsymbol{\Theta}$ (discussed later), satisfy the identification assumption F1 in Appendix A.1. Stock and Watson (2002) also show that this assumption identifies the factors up to a change of sign.
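Since $\sigma^2_j/(1 - a^2_j)$ is the stationary variance of an AR(1) factor, the ordering restriction says the factors are sorted by decreasing unconditional variance. A minimal check, using the values of $\mathbf{A}$ and $\boldsymbol{\Sigma}$ that appear later in Simulation 1 (the variable names are illustrative only):

```python
import numpy as np

# Diagonal elements of A and Sigma as used in Simulation 1.
a = np.array([0.7, -0.65, 0.6])
sigma2 = np.array([5.0, 3.5, 2.0])

# Stationary variance of each AR(1) factor: sigma_j^2 / (1 - a_j^2).
stationary_var = sigma2 / (1.0 - a**2)

is_stationary = bool(np.all(np.abs(a) < 1))              # |a_j| < 1 for all j
is_ordered = bool(np.all(np.diff(stationary_var) < 0))   # strictly decreasing in j

print(stationary_var, is_stationary, is_ordered)
```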

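For comparison with the random walk approach, we also consider allowing the quarterly GDP $y_k$ to depend on lags of the latent factors and on GDP itself. The model considered here can be summarized as follows:

$$\begin{aligned} \mathbf{x}_t &= \boldsymbol{\mu} + \boldsymbol{\Theta}\mathbf{F}_t + \boldsymbol{\epsilon}_t, && \text{for } t = 1, \dots, T \\ \mathbf{F}_t &= \mathbf{A}\mathbf{F}_{t-1} + \mathbf{u}_t, && \text{for } t = 1, \dots, T \\ y_k &= \beta_0 + \boldsymbol{\beta}_1'\mathbf{F}_{3k} + \boldsymbol{\beta}_2'\mathbf{F}_{3k-1} + \boldsymbol{\beta}_3'\mathbf{F}_{3k-2} + \beta_4 y_{k-1} + \nu_k, && \text{for } k = 1, \dots, K \end{aligned} \tag{2.5}$$

where $\boldsymbol{\epsilon}_t \sim N(\mathbf{0}, \boldsymbol{\Omega}_{n \times n})$, $\mathbf{u}_t \sim N(\mathbf{0}, \boldsymbol{\Sigma}_{r \times r})$, $\nu_k \sim N(0, \eta^2)$, and the matrices $\mathbf{A}$ and $\boldsymbol{\Sigma}$ have the restrictions discussed at the beginning of this subsection, which implies $f_{j,t} = a_j f_{j,t-1} + \sigma_j u_{j,t}$ for $j = 1, \dots, r$ and $u_{j,t} \sim N(0, 1)$.

At the release date $(q, T)$, we have observations $\mathbf{Y} = \{y_1, y_2, \dots, y_K\}$ and $\mathbf{X}_{(q,T)} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{T-1}, x_{i \in v_{q,T}}\}$; latent variables $\mathbf{F} = \{\mathbf{F}_1, \mathbf{F}_2, \dots, \mathbf{F}_T\}$; and model parameters $\boldsymbol{\Psi} = \{\boldsymbol{\mu}, \boldsymbol{\Theta}, \boldsymbol{\Omega}, \mathbf{A}, \boldsymbol{\Sigma}, \beta_0, \boldsymbol{\beta}_1, \boldsymbol{\beta}_2, \boldsymbol{\beta}_3, \beta_4, \eta^2\}$. The goals are to estimate $\boldsymbol{\Psi}$ and $\mathbf{F}$ using the observables and to nowcast the current quarter GDP $y_{K+1}$ at every release date $(q, T)$ for $q = 1, \dots, Q$. The original GRS approach is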


2.2.2 Estimating Dynamic Factor Models via a Bayesian MCMC approach

We face a couple of challenges in estimating the above model. First, it is computationally infeasible to integrate out the high-dimensional latent variable $\mathbf{F}$ to obtain the likelihood based only on observables. Second, the observations in the panel $\mathbf{X}_{(q,T)} = \{\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_{T-1}, x_{i \in v_{q,T}}\}$ are not balanced. To overcome these difficulties, we develop a computational Bayesian Markov Chain Monte Carlo (MCMC) approach for estimating the above dynamic factor model in real time. MCMC conducts inference by simulating efficiently from the (potentially complicated) posterior distributions of the model parameters and latent variables given the observables. MCMC samples from the typically high-dimensional and complex posterior distributions by generating a Markov chain over parameters and latent variables whose equilibrium distribution is the desired posterior distribution. The Monte Carlo method then uses these samples for numerical integration in parameter and state estimation.

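In order to facilitate the derivation of the joint posterior distribution, the dynamics of $\mathbf{x}_t$ in equation (2.5) are rewritten as

$$\mathbf{x}_t = \boldsymbol{\mu} + [\mathbf{I}_{n \times n} \otimes \mathbf{F}_t']\,\boldsymbol{\theta} + \boldsymbol{\epsilon}_t,$$

where $\boldsymbol{\theta} = \mathrm{vec}(\boldsymbol{\Theta}) = (\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_n)'$ if $\boldsymbol{\Theta} = (\boldsymbol{\theta}_1', \dots, \boldsymbol{\theta}_n')'$, such that $\boldsymbol{\theta}_i$ ($i = 1, \dots, n$) is a $1 \times r$ vector representing the $i$th row of $\boldsymbol{\Theta}$. Thus the conditional density of $\mathbf{x}_t$ in equation (2.5) is

$$\mathbf{x}_t \mid \mathbf{F}_t, \boldsymbol{\Theta}, \boldsymbol{\Omega} \sim N(\boldsymbol{\mu} + [\mathbf{I}_{n \times n} \otimes \mathbf{F}_t']\,\boldsymbol{\theta}, \boldsymbol{\Omega}), \quad \text{for } t = 1, \dots, T-1, \tag{2.6}$$

the conditional density of $\mathbf{F}_t$ in equation (2.5) is

$$\mathbf{F}_t \mid \mathbf{F}_{t-1}, \mathbf{A}, \boldsymbol{\Sigma} \sim N(\mathbf{A}\mathbf{F}_{t-1}, \boldsymbol{\Sigma}), \quad \text{for } t = 2, \dots, T, \tag{2.7}$$

and the conditional density of $y_k$ in equation (2.5) is

$$y_k \mid \boldsymbol{\beta}, \mathbf{F}_{3k}, \mathbf{F}_{3k-1}, \mathbf{F}_{3k-2}, y_{k-1}, \eta^2 \sim N(\beta_0 + \boldsymbol{\beta}_1'\mathbf{F}_{3k} + \boldsymbol{\beta}_2'\mathbf{F}_{3k-1} + \boldsymbol{\beta}_3'\mathbf{F}_{3k-2} + \beta_4 y_{k-1}, \eta^2), \tag{2.8}$$

for $k = 1, \dots, K$. The joint posterior distribution, $p(\boldsymbol{\Psi}, \mathbf{F} \mid \mathbf{Y}, \mathbf{X}_{(q,T)})$, can be decomposed into products of individual conditionals, where $p(\mathbf{x}_t \mid \mathbf{F}_t, \boldsymbol{\theta}, \boldsymbol{\Omega})$, $p(\mathbf{F}_t \mid \mathbf{F}_{t-1}, \mathbf{A}, \boldsymbol{\Sigma})$, and $p(y_k \mid \boldsymbol{\beta}, \mathbf{F}_{3k}, \mathbf{F}_{3k-1}, \mathbf{F}_{3k-2}, y_{k-1}, \eta^2)$ are given according to the distributions in equations (2.6), (2.7) and (2.8) respectively. Here $\pi(\boldsymbol{\Psi})$ is the prior distribution for $\boldsymbol{\Psi}$, which will be specified later.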
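The Kronecker form used above is the same linear map as $\boldsymbol{\Theta}\mathbf{F}_t$, with $\boldsymbol{\theta}$ stacking the rows of $\boldsymbol{\Theta}$. A small numerical check, with arbitrary illustrative dimensions and random values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 4, 3                                    # illustrative dimensions
Theta = rng.normal(size=(n, r))                # loading matrix
F_t = rng.normal(size=r)                       # factor vector at one month

theta = Theta.reshape(-1)                      # stack the rows of Theta (length n*r)
Z_t = np.kron(np.eye(n), F_t.reshape(1, r))    # I_n kron F_t', shape (n, n*r)

lhs = Theta @ F_t                              # Theta F_t
rhs = Z_t @ theta                              # [I_n kron F_t'] theta

print(np.allclose(lhs, rhs))
```

Writing the mean as a linear function of $\boldsymbol{\theta}$ is what makes a normal prior on the loadings conditionally conjugate.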

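To deal with the missing data in $x_{i \in v_{q,T}}$ at the end of the sample, we define an indicator matrix $\mathbf{1}_{v_{q,T}}$ as the $n_q \times n$ matrix obtained by deleting the $i$th row from the identity matrix $\mathbf{I}_{n \times n}$ if the corresponding $x_{i,T}$ is missing at time $(q, T)$, where $i$ could be any index from $i = 1, \dots, n$. For the toy example discussed in Section 2.1.1, we have

$$\mathbf{1}_{v_{1,T}} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \end{pmatrix}, \quad \mathbf{1}_{v_{2,T}} = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}, \quad \mathbf{1}_{v_{3,T}} = \mathbf{I}_{6 \times 6}.$$

Then we can write $x_{i \in v_{q,T}}$ as $x_{i \in v_{q,T}} = \mathbf{1}_{v_{q,T}}\mathbf{x}_T$, thus the conditional density of $x_{i \in v_{q,T}}$ in equation (2.9) is

$$x_{i \in v_{q,T}} = \mathbf{1}_{v_{q,T}}\mathbf{x}_T \mid \mathbf{F}_T, \boldsymbol{\Theta}, \boldsymbol{\Omega} \sim N\big(\mathbf{1}_{v_{q,T}}(\boldsymbol{\mu} + [\mathbf{I}_{n \times n} \otimes \mathbf{F}_T']\,\boldsymbol{\theta}),\ \mathbf{1}_{v_{q,T}}\boldsymbol{\Omega}\mathbf{1}_{v_{q,T}}'\big). \tag{2.10}$$

Note that this remedy for the unbalanced data issue is different from GRS, which restricts the matrix $\tilde{\boldsymbol{\Omega}}$ to be diagonal with $\infty$ variances for the missing series in order to implement the Kalman filtering algorithm. We complete the model specification by assigning prior distributions for the parameter set $\boldsymbol{\Psi}$ in the Bayesian framework.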
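As a brief illustration of the indicator matrix, the sketch below builds it for the toy example and confirms that pre- and post-multiplying a covariance matrix by it picks out the sub-block for the released series. The helper name `selection_matrix` and the random covariance are hypothetical.

```python
import numpy as np


def selection_matrix(available, n):
    """Rows of the n x n identity kept for the released series (1-based indexes)."""
    idx = np.array(sorted(available)) - 1
    return np.eye(n)[idx, :]


n = 6
S1 = selection_matrix({1, 2, 3}, n)             # first release date
S2 = selection_matrix({1, 2, 3, 4, 5}, n)       # second release date
S3 = selection_matrix({1, 2, 3, 4, 5, 6}, n)    # third release date (identity)

rng = np.random.default_rng(3)
B = rng.normal(size=(n, n))
Omega = B @ B.T + n * np.eye(n)                 # an arbitrary full covariance matrix
x_T = rng.normal(size=n)

x_avail = S2 @ x_T                              # released part of x_T
Omega_avail = S2 @ Omega @ S2.T                 # covariance of the released part

print(S1.shape, S2.shape, S3.shape)
```

Unlike the diagonal matrix in (2.4), the selected covariance block keeps the cross-series correlations among the released series.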


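We set the prior for $\boldsymbol{\Theta}$ as

$$\boldsymbol{\Theta}_{n \times r} \sim \text{Matrix Normal}(\mathbf{0}_{n \times r}, \mathbf{I}_{n \times n}, \mathbf{I}_{r \times r}). \tag{2.11}$$

In this way, the identification assumption F1 in Appendix A.1 is satisfied by the law of large numbers. The prior for $\boldsymbol{\Omega}$ is

$$\boldsymbol{\Omega}_{n \times n} \sim \text{Inverse Wishart}\left(\frac{1}{n}\mathbf{I}_{n \times n}, \nu_\theta\right), \tag{2.12}$$

where $\nu_\theta$ is a pre-specified scalar. This prior allows $\boldsymbol{\epsilon}_t$ to be correlated across series, satisfying assumption M1 in Appendix A.1.

The prior for $\mathbf{A}$ is a standard normal truncated to $(-1, 1)$, for $j = 1, \dots, r$, that is

$$\pi(a_j) = \begin{cases} 0 & \text{if } a_j \le -1 \\ \dfrac{\phi(a_j)}{\Phi(1) - \Phi(-1)} & \text{if } -1 < a_j < 1 \\ 0 & \text{if } a_j \ge 1 \end{cases}, \tag{2.13}$$

where $\phi(\cdot)$ and $\Phi(\cdot)$ are the PDF and CDF of the standard normal distribution respectively, and $\pi(\mathbf{A}) = \prod_{j=1}^{r} \pi(a_j)$.

The prior for $\boldsymbol{\Sigma}$ is, for $j = 1, \dots, r$,

$$\sigma^2_j \overset{iid}{\sim} \text{Inverse Gamma}(\alpha_s, \beta_s), \tag{2.14}$$

where $\alpha_s, \beta_s$ are pre-specified scalars, and $\pi(\boldsymbol{\Sigma}) = \prod_{j=1}^{r} \pi(\sigma^2_j)$. The prior for $\boldsymbol{\beta} = (\beta_0, \boldsymbol{\beta}_1', \boldsymbol{\beta}_2', \boldsymbol{\beta}_3', \beta_4)'$ is

$$\boldsymbol{\beta}_{(3r+2) \times 1} \sim N(\mathbf{0}_{(3r+2) \times 1}, \mathbf{I}_{(3r+2) \times (3r+2)}). \tag{2.15}$$

The prior for $\eta^2$ is

$$\eta^2 \sim \text{Inverse Gamma}(\alpha_h, \beta_h), \tag{2.16}$$

where $\alpha_h, \beta_h$ are pre-specified scalars.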
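To make the prior specification concrete, the following sketch draws one value of each parameter block from (2.11) to (2.16) using NumPy only. Several points are assumptions rather than statements from the text: the Inverse Gamma is taken in its shape/scale form, the Inverse Wishart draw inverts a Wishart matrix built from integer degrees of freedom, the truncated normal is sampled by simple rejection, and the dimensions and hyper-parameter values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 10, 3                        # illustrative dimensions
nu_theta = n + 2                    # degrees of freedom for Omega (placeholder)
alpha_s, beta_s = 2.0, r + 2.0      # Inverse Gamma hyper-parameters for sigma_j^2
alpha_h, beta_h = 2.0, 1e-4         # Inverse Gamma hyper-parameters for eta^2


def draw_inverse_wishart(scale, df, rng):
    """Draw from IW(scale, df) by inverting a Wishart(df, scale^{-1}) draw."""
    p = scale.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(scale))
    Z = rng.normal(size=(df, p)) @ L.T          # rows are N(0, scale^{-1})
    Omega = np.linalg.inv(Z.T @ Z)
    return (Omega + Omega.T) / 2.0              # remove round-off asymmetry


def draw_truncated_std_normal(size, rng, low=-1.0, high=1.0):
    """Rejection sampler for a standard normal restricted to (low, high)."""
    out = np.empty(size)
    filled = 0
    while filled < size:
        cand = rng.normal(size=size)
        cand = cand[(cand > low) & (cand < high)]
        take = min(size - filled, cand.size)
        out[filled:filled + take] = cand[:take]
        filled += take
    return out


def draw_inverse_gamma(alpha, beta, size, rng):
    """Inverse Gamma(shape=alpha, scale=beta) as the reciprocal of a Gamma draw."""
    return 1.0 / rng.gamma(shape=alpha, scale=1.0 / beta, size=size)


# Matrix normal with identity row and column covariances: iid N(0, 1) entries.
Theta = rng.normal(size=(n, r))
Omega = draw_inverse_wishart(np.eye(n) / n, nu_theta, rng)
a = draw_truncated_std_normal(r, rng)
sigma2 = draw_inverse_gamma(alpha_s, beta_s, r, rng)
beta = rng.normal(size=3 * r + 2)
eta2 = draw_inverse_gamma(alpha_h, beta_h, 1, rng)[0]
```

Note that a draw of $(a_j, \sigma^2_j)$ from these priors need not satisfy the ordering restriction of Section 2.2.1; how that restriction is enforced within the sampler is not covered by this sketch.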

We assume all priors are independent. Following the standard MCMC procedure, we derive the complete conditional distributions for each parameter and latent variable, and obtain posterior samples by simulating from these individual complete conditionals iteratively. More specifically, we obtain the posterior distribution $p(\Psi_i \mid \boldsymbol{\Psi}_{-i}, \mathbf{Y}, \mathbf{X}_{(q,T)}, \mathbf{F})$, where $\Psi_i$ is the $i$th element of $\boldsymbol{\Psi}$ and $\boldsymbol{\Psi}_{-i}$ contains all the parameters except for $\Psi_i$, and the posterior distribution for the factors $p(\mathbf{F}_t \mid \boldsymbol{\Psi}, \mathbf{Y}, \mathbf{X}_{(q,T)})$ for all $t$. In estimation, we draw posterior samples from the above complete conditional distributions and use the means of the posterior samples as parameter estimates and the standard deviations of the posterior samples as standard errors of the parameter estimates. Appendix A.2 provides the posterior distributions for all model parameters and latent factors.
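The iterate-over-complete-conditionals idea can be illustrated on one block of the model. The sketch below is a Gibbs sampler for the GDP equation only, treating the regressors as known and using the textbook conjugate conditionals for a normal prior on $\boldsymbol{\beta}$ and an Inverse Gamma (shape/scale) prior on $\eta^2$. It is not the sampler of Appendix A.2, which is not reproduced here; the function name, the simulated data, and the run lengths are all assumptions.

```python
import numpy as np


def gibbs_gdp_block(y, X, n_iter, burn_in, alpha_h, beta_h, rng):
    """Gibbs sampler for y = X beta + nu, nu ~ N(0, eta2 I),
    with priors beta ~ N(0, I) and eta2 ~ Inverse Gamma(alpha_h, beta_h)."""
    K, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    eta2 = 1.0
    beta_draws, eta2_draws = [], []
    for it in range(n_iter):
        # beta | eta2, y ~ N(V X'y / eta2, V), with V = (X'X / eta2 + I)^{-1}
        V = np.linalg.inv(XtX / eta2 + np.eye(p))
        V = (V + V.T) / 2.0
        beta = V @ Xty / eta2 + np.linalg.cholesky(V) @ rng.normal(size=p)
        # eta2 | beta, y ~ Inverse Gamma(alpha_h + K/2, beta_h + SSR/2)
        resid = y - X @ beta
        eta2 = 1.0 / rng.gamma(alpha_h + K / 2.0,
                               1.0 / (beta_h + 0.5 * resid @ resid))
        if it >= burn_in:                       # keep draws after the burn-in
            beta_draws.append(beta)
            eta2_draws.append(eta2)
    return np.array(beta_draws), np.array(eta2_draws)


# Simulated data for the illustration (all values are placeholders).
rng = np.random.default_rng(4)
K_obs = 200
X = np.column_stack([np.ones(K_obs), rng.normal(size=(K_obs, 2))])
beta_true = np.array([0.5, 4.0, 1.0])
y = X @ beta_true + rng.normal(scale=0.5, size=K_obs)

beta_draws, eta2_draws = gibbs_gdp_block(y, X, n_iter=3000, burn_in=1000,
                                         alpha_h=2.0, beta_h=1e-4, rng=rng)

beta_hat = beta_draws.mean(axis=0)              # posterior means as estimates
beta_se = beta_draws.std(axis=0, ddof=1)        # posterior sds as standard errors
eta2_hat = eta2_draws.mean()
```

In the full model the regressors are functions of the latent factors, so each sweep would also redraw $\mathbf{F}$ and the remaining parameters before returning to this block.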

The GRS approach estimates parameters and latent factors in multiple stages, i.e. first using PCA to obtain parameter estimates, then using the Kalman filter to obtain latent factor estimates, and lastly using OLS to estimate the coefficients $\beta$'s in the dynamics of quarterly GDP. In contrast, we integrate the estimation of all parameters and factors into a single framework, so that the uncertainty of parameters and latent variables is taken into account simultaneously and inferences are readily made based on posterior draws after the burn-in period. Also note that in the GRS approach, only information from $\mathbf{X}$ enters into the estimation of $\mathbf{F}$ in its PCA and Kalman filter stages, while our BAY approach uses information from both the monthly series $\mathbf{X}$ and the quarterly GDP $\mathbf{Y}$ to update $\mathbf{F}$ (see details in Appendix A.2).

2.2.3 Nowcasting GDP yK+1

Recall that we have complete $\mathbf{x}_t$ for $t = 1, \dots, T-1$, and the associated $y_k$ for $k = 1, \dots, K$ are also available. Suppose we are at $(q, T)$ in month $T$ ($q = 1, \dots, Q$); the task is to nowcast $y_{K+1}$ based on $\{\mathbf{x}_1, \dots, \mathbf{x}_{T-1}, x_{i \in v_{q,T}}\}$ and $\{y_1, \dots, y_K\}$. Here $T$ can be the first ($T = 3K+1$), second ($T = 3K+2$) or even third ($T = 3K+3$) month of quarter $K+1$. Let $\hat{\beta}_0$, $\hat{\boldsymbol{\beta}}_i$ ($i = 1,2,3$), $\hat{\beta}_4$ and $\hat{\mathbf{F}}_t$ ($t = 1, \dots, T$) be the estimated parameters and latent common factors from the GRS approach, and let $\beta_0^{(g)}$, $\boldsymbol{\beta}_i^{(g)}$ ($i = 1,2,3$), $\beta_4^{(g)}$ and $\mathbf{F}_t^{(g)}$ ($t = 1, \dots, T$) be the $g$th posterior draws of the parameters and latent factors after the burn-in period in the BAY approach, where $g = 1, \dots, G$. The nowcast can be calculated as follows.


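• In the first month, i.e. when $T = 3K+1$, the nowcasts of the quarter $K+1$ GDP using BAY and GRS are given by:

$$\hat{y}^{BAY}_{K+1} = \frac{1}{G}\sum_{g=1}^{G}\Big[\beta_0^{(g)} + (\boldsymbol{\beta}_1^{(g)})'(\mathbf{A}^{(g)})^2\mathbf{F}_T^{(g)} + (\boldsymbol{\beta}_2^{(g)})'\mathbf{A}^{(g)}\mathbf{F}_T^{(g)} + (\boldsymbol{\beta}_3^{(g)})'\mathbf{F}_T^{(g)} + \beta_4^{(g)} y_K\Big], \tag{2.17a}$$

$$\hat{y}^{GRS}_{K+1} = \hat{\beta}_0 + \hat{\boldsymbol{\beta}}_1'\hat{\mathbf{A}}^2\hat{\mathbf{F}}_T + \hat{\boldsymbol{\beta}}_2'\hat{\mathbf{A}}\hat{\mathbf{F}}_T + \hat{\boldsymbol{\beta}}_3'\hat{\mathbf{F}}_T + \hat{\beta}_4 y_K. \tag{2.17b}$$

• In the second month, i.e. when $T = 3K+2$, the nowcasts of the quarter $K+1$ GDP using BAY and GRS are given by:

$$\hat{y}^{BAY}_{K+1} = \frac{1}{G}\sum_{g=1}^{G}\Big[\beta_0^{(g)} + (\boldsymbol{\beta}_1^{(g)})'\mathbf{A}^{(g)}\mathbf{F}_T^{(g)} + (\boldsymbol{\beta}_2^{(g)})'\mathbf{F}_T^{(g)} + (\boldsymbol{\beta}_3^{(g)})'\mathbf{F}_{T-1}^{(g)} + \beta_4^{(g)} y_K\Big], \tag{2.18a}$$

$$\hat{y}^{GRS}_{K+1} = \hat{\beta}_0 + \hat{\boldsymbol{\beta}}_1'\hat{\mathbf{A}}\hat{\mathbf{F}}_T + \hat{\boldsymbol{\beta}}_2'\hat{\mathbf{F}}_T + \hat{\boldsymbol{\beta}}_3'\hat{\mathbf{F}}_{T-1} + \hat{\beta}_4 y_K. \tag{2.18b}$$

• In the third month, i.e. when $T = 3K+3$, the nowcasts of the quarter $K+1$ GDP using BAY and GRS are given by:

$$\hat{y}^{BAY}_{K+1} = \frac{1}{G}\sum_{g=1}^{G}\Big[\beta_0^{(g)} + (\boldsymbol{\beta}_1^{(g)})'\mathbf{F}_T^{(g)} + (\boldsymbol{\beta}_2^{(g)})'\mathbf{F}_{T-1}^{(g)} + (\boldsymbol{\beta}_3^{(g)})'\mathbf{F}_{T-2}^{(g)} + \beta_4^{(g)} y_K\Big], \tag{2.19a}$$

$$\hat{y}^{GRS}_{K+1} = \hat{\beta}_0 + \hat{\boldsymbol{\beta}}_1'\hat{\mathbf{F}}_T + \hat{\boldsymbol{\beta}}_2'\hat{\mathbf{F}}_{T-1} + \hat{\boldsymbol{\beta}}_3'\hat{\mathbf{F}}_{T-2} + \hat{\beta}_4 y_K. \tag{2.19b}$$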

Note that all of these $\hat{\mathbf{F}}_t$'s for GRS, or $\boldsymbol{\beta}_i^{(g)}$'s and $\mathbf{F}_t^{(g)}$'s for BAY, are updated at every release date within a month, and $\hat{y}^{GRS}_{K+1}$ and $\hat{y}^{BAY}_{K+1}$ are re-produced for each additional release date. The superscript $(q, T)$ in both $\hat{y}^{GRS}_{K+1}$ and $\hat{y}^{BAY}_{K+1}$ has been suppressed to simplify notation.
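The three cases differ only in which factors are already estimated and which must be projected forward with $\mathbf{A}$. A compact sketch of the nowcast rules, where the function names, the dictionary layout of a draw, and the numerical values are all hypothetical:

```python
import numpy as np


def nowcast_one_draw(d, y_K, month):
    """Nowcast of y_{K+1} from one set of parameters and factors.

    d holds beta0, beta1, beta2, beta3, beta4, A, and the factors
    F_T, F_Tm1 (F_{T-1}), F_Tm2 (F_{T-2}); month is 1, 2 or 3.
    """
    A, F_T = d["A"], d["F_T"]
    if month == 1:        # T = 3K+1: project F_T two and one months ahead
        f3, f2, f1 = A @ A @ F_T, A @ F_T, F_T
    elif month == 2:      # T = 3K+2: project F_T one month ahead
        f3, f2, f1 = A @ F_T, F_T, d["F_Tm1"]
    elif month == 3:      # T = 3K+3: all three months are in the sample
        f3, f2, f1 = F_T, d["F_Tm1"], d["F_Tm2"]
    else:
        raise ValueError("month must be 1, 2 or 3")
    return (d["beta0"] + d["beta1"] @ f3 + d["beta2"] @ f2
            + d["beta3"] @ f1 + d["beta4"] * y_K)


def nowcast_bay(draws, y_K, month):
    """Average the nowcast over posterior draws, as in the BAY formulas."""
    return np.mean([nowcast_one_draw(d, y_K, month) for d in draws])


# A single hand-checkable "draw" with r = 2 factors.
draw = {
    "beta0": 0.5, "beta4": 0.5,
    "beta1": np.array([1.0, 0.0]),
    "beta2": np.array([0.0, 1.0]),
    "beta3": np.array([1.0, 1.0]),
    "A": np.diag([0.5, 0.5]),
    "F_T": np.array([2.0, 4.0]),
    "F_Tm1": np.array([1.0, 1.0]),
    "F_Tm2": np.array([0.0, 2.0]),
}
nowcasts = [nowcast_bay([draw], y_K=2.0, month=m) for m in (1, 2, 3)]
print(nowcasts)
```

Passing a single set of point estimates in place of the list of draws gives the corresponding GRS-style nowcast.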

2.3 Bayesian Approach in Nowcasting: Simulation Evidence

In this section, through numerical simulations, we investigate two questions on the Bayesian analysis of dynamic factor models. The first question is whether it can identify the latent factors $\mathbf{F}_t$ accurately. The second question is whether it can produce reliable nowcasting results. Specifically, we compare the in-sample estimation of $\mathbf{F}_t$ and the out-of-sample nowcasting performance of the BAY and GRS approaches when addressing these two questions.


In each of the two simulation studies below, we generate data according to the model in (2.5), where $T = 210$ (months), $K = 70$ (quarters), and $n = 60$ (monthly data series) with $Q = 3$ releases in each month. We set the first, second and third release sets as $v^*_{1,t} = \{1, \dots, 20\}$, $v^*_{2,t} = \{21, \dots, 40\}$, and $v^*_{3,t} = \{41, \dots, 60\}$, i.e. $n^*_1 = n^*_2 = n^*_3 = 20$.

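Simulation 1. In this simulation, we assume there is no correlation among the blocks of the three releases. The following parameters are used to generate the data: $\mathbf{A} = \mathrm{diag}(0.7, -0.65, 0.6)$; $\boldsymbol{\Sigma} = \mathrm{diag}(5, 3.5, 2)$; $\beta_0 = 0.5$, $\boldsymbol{\beta}_1 = \boldsymbol{\beta}_2 = \boldsymbol{\beta}_3 = (4, 1, 0.5)'$, and $\beta_4 = 0.65$; $\eta^2 = 2$; $\boldsymbol{\Omega} = \mathrm{diag}(\boldsymbol{\Omega}_{11}, \boldsymbol{\Omega}_{22}, \boldsymbol{\Omega}_{33})$, where $\boldsymbol{\Omega}_{qq}$ ($q = 1,2,3$) is the $n^*_q \times n^*_q$ covariance matrix for the $q$th release dataset, generated from an Inverse-Wishart distribution $IW(n^*_q, \mathbf{I}_{n^*_q \times n^*_q}/60)$; and $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \dots, \boldsymbol{\theta}_{60})'$, where $\boldsymbol{\theta}_i = (\theta_{i1}, \theta_{i2}, \theta_{i3})$ is the $i$th row of $\boldsymbol{\Theta}$ and $\theta_{ij}$ is the coefficient for $f_{j,t}$. For $i = 1, \dots, 20$, $\theta_{i1}$ is simulated from $N(0, 1)$, while $\theta_{i2}, \theta_{i3}$ are simulated from $N(0, 0.0025)$. For $i = 21, \dots, 40$, $\theta_{i1}, \theta_{i2}$ are simulated from $N(0, 1)$, while $\theta_{i3}$ is simulated from $N(0, 0.0025)$. For $i = 41, \dots, 60$, $\theta_{i1}, \theta_{i2}, \theta_{i3}$ are simulated from $N(0, 1)$. This set-up assumes that the first release mainly contains information from the first latent factor, the second release mainly contains information from the first two latent factors, and the last release contains information from all three latent factors.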
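A data-generating sketch for this design is given below. It follows the parameter values stated above, but several details are assumptions made only so the code runs: $\boldsymbol{\mu} = \mathbf{0}$, $\mathbf{F}_0 = \mathbf{0}$, $y_0 = 0$, the second argument of $N(0, 0.0025)$ read as a variance, the Inverse-Wishart draw obtained by inverting a Wishart matrix, and the seed. It is not the code used to produce the reported results.

```python
import numpy as np

rng = np.random.default_rng(2020)
T, K, n, r = 210, 70, 60, 3
a = np.array([0.7, -0.65, 0.6])                 # diagonal of A
sigma2 = np.array([5.0, 3.5, 2.0])              # diagonal of Sigma
beta0, beta4, eta2 = 0.5, 0.65, 2.0
beta1 = beta2 = beta3 = np.array([4.0, 1.0, 0.5])


def draw_inverse_wishart(df, scale, rng):
    """IW(df, scale) draw obtained by inverting a Wishart(df, scale^{-1}) draw."""
    p = scale.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(scale))
    Z = rng.normal(size=(df, p)) @ L.T
    Omega = np.linalg.inv(Z.T @ Z)
    return (Omega + Omega.T) / 2.0


# Block-diagonal Omega: no correlation across the three release blocks.
Omega = np.zeros((n, n))
for q in range(3):
    block = slice(20 * q, 20 * (q + 1))
    Omega[block, block] = draw_inverse_wishart(20, np.eye(20) / 60.0, rng)

# Loadings: later releases load on more factors.
small_sd = np.sqrt(0.0025)                      # assumes 0.0025 is a variance
Theta = rng.normal(size=(n, r))
Theta[:20, 1:] = rng.normal(scale=small_sd, size=(20, 2))
Theta[20:40, 2] = rng.normal(scale=small_sd, size=20)

# Factors: f_{j,t} = a_j f_{j,t-1} + sigma_j u_{j,t}, started from F_0 = 0.
F = np.zeros((T, r))
f_prev = np.zeros(r)
for t in range(T):
    f_prev = a * f_prev + np.sqrt(sigma2) * rng.normal(size=r)
    F[t] = f_prev

# Monthly series with mu = 0; a symmetric eigen-root gives eps ~ N(0, Omega).
w, V = np.linalg.eigh(Omega)
root = V * np.sqrt(np.clip(w, 0.0, None))
X = F @ Theta.T + rng.normal(size=(T, n)) @ root.T

# Quarterly GDP; row 3k-1 of F (0-based) holds F_{3k}.
y = np.zeros(K)
y_prev = 0.0
for k in range(1, K + 1):
    m3 = 3 * k - 1
    y_prev = (beta0 + beta1 @ F[m3] + beta2 @ F[m3 - 1] + beta3 @ F[m3 - 2]
              + beta4 * y_prev + np.sqrt(eta2) * rng.normal())
    y[k - 1] = y_prev
```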

Simulation 2. In this simulation, we allow nonzero correlation among the blocks of the three releases, that is, $\boldsymbol{\Omega}$ is simulated from $IW(60, \mathbf{I}_{60 \times 60}/60)$. The other parameters have the same set-up as in Simulation 1.

Figure 2.1 plots the simulated GDP $y_k$ and the three common factors $\mathbf{F}_t$ from Simulation 1 and Simulation 2. The first 38 quarters and the corresponding 114 months of data are used as in-sample data, and the out-of-sample nowcasting performance is assessed on the remaining 32 quarters and 96 months of data.

2.3.1 In-sample Estimation of Latent Factors Ft

We first focus on simulation evidence that the BAY method can accurately estimate the latent factors $\mathbf{F}_t$. Obtaining good estimates of the latent factors is an important task if one is interested in using the same set of common factors to explain the movements of different economic or financial time series. For the BAY MCMC, we use the results from PCA and the Kalman filter in the GRS approach as the initial values of the parameters and latent factors. The hyper-parameter values are set to be: $\nu_\theta = n + 2$ for the prior of $\boldsymbol{\Omega}$ in (2.12), $(\alpha_s, \beta_s) = (2, r + 2)$ for the prior of $\boldsymbol{\Sigma}$ in (2.14), and $(\alpha_h, \beta_h) = (2, 0.0001)$ for the prior of $\eta^2$ in (2.16).

[Figure 2.1 panels: Simulation 1 (left) and Simulation 2 (right); GDP against quarter in the top row, and f1t, f2t, f3t against month in the three rows below.]

Figure 2.1: Simulated GDP and latent factors. The top row is the GDP, and the bottom three rows are simulated first, second, and third latent factors respectively. The left panel is for Simulation 1 and the right panel is for Simulation 2.

We run our MCMC procedure for 7,000 iterations, discarding the first 5,000 and using the last 2,000 iterations for posterior summaries. Investigation (not reported here) is conducted to check how fast the posterior draws of parameters and latent factors converge. It is found that a burn-in period of 5,000 iterations is enough. See more details in the supplemental file B.

Figure 2.2 plots the estimated latent factors from both the BAY and GRS approaches, together with the true latent factors, for the two simulations in the in-sample period $t = 1, \dots, 114$. Absolute values are used in the pictures because the restrictions based on Stock and Watson (2002) can identify the factors only up to a change of sign. Figure 2.2 shows that the estimated latent factors from both the BAY and GRS approaches are close to the true factors, which confirms that by introducing Assumption F and Assumption M, the latent factors are identifiable for both approaches.

In addition to graphical illustrations, we also compare the in-sample fit errors of GDP and the in-sample estimation errors of the latent factors between the two approaches. The relative in-sample fit error for GDP is calculated as $RISFE = \sum_{k=1}^{38} |\hat{y}^{fit}_k - y_k| / \sum_{k=1}^{38} |y_k|$, where $\hat{y}^{fit}_k = \hat{\beta}_0 + \hat{\boldsymbol{\beta}}_1'\hat{\mathbf{F}}_{3k} + \hat{\boldsymbol{\beta}}_2'\hat{\mathbf{F}}_{3k-1} + \hat{\boldsymbol{\beta}}_3'\hat{\mathbf{F}}_{3k-2} + \hat{\beta}_4 y_{k-1}$, and the $\hat{\beta}$'s and $\hat{\mathbf{F}}_t$'s are estimated using the in-sample data by either the GRS or the BAY method. For BAY, posterior means are used as the estimated parameters and latent factors. The relative in-sample estimation error for $\mathbf{F}_t$ is calculated as $RISEE(j) = \sum_{t=1}^{114} \big||\hat{f}_{jt}| - |f_{jt}|\big| / \sum_{t=1}^{114} |f_{jt}|$, where $\hat{f}_{jt}$ ($j = 1,2,3$) is the estimated $j$th factor at time $t$ from either the GRS or the BAY method. Table 2.3 reports both $RISFE$ and $RISEE(j)$ ($j = 1,2,3$) for both methods and both simulation studies. Table 2.3 again shows that the averages of the in-sample estimation errors of the latent factors for both the BAY and GRS methods are small for all three factors. For Simulation 1, the BAY approach produces a better RISEE for the second and third latent factors and a close RISEE for the first latent factor. For Simulation 2, the BAY approach outperforms GRS in terms of RISEE only for the third latent factor, and has a close RISEE for the first two latent factors.
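Both error measures are simple ratios of absolute sums and can be written as two short functions. The function names and the toy inputs below are illustrative only.

```python
import numpy as np


def risfe(y_fit, y):
    """Relative in-sample fit error: sum |y_fit - y| / sum |y|."""
    y_fit, y = np.asarray(y_fit, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(y_fit - y)) / np.sum(np.abs(y))


def risee(f_hat, f):
    """Relative in-sample estimation error of one factor, using absolute
    values so that a sign flip of the estimated factor is not penalized."""
    f_hat, f = np.asarray(f_hat, dtype=float), np.asarray(f, dtype=float)
    return np.sum(np.abs(np.abs(f_hat) - np.abs(f))) / np.sum(np.abs(f))


y_true = np.array([1.0, -2.0, 3.0])
y_fitted = np.array([1.5, -2.0, 2.0])
f_true = np.array([0.5, -1.0, 2.0])

fit_error = risfe(y_fitted, y_true)             # (0.5 + 0 + 1) / 6
flip_error = risee(-f_true, f_true)             # sign-flipped estimate
print(fit_error, flip_error)
```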

[Figure 2.2 panels: Simulation 1 (left) and Simulation 2 (right); F1t, F2t, F3t against month (0 to 100) in the three rows; legend: True, BAY, GRS.]

Figure 2.2: Absolute values of estimated latent factors from BAY (green dashed line) and GRS (red dotted line) approaches versus the truth (black solid line), with the left panel being Simulation 1 and the right panel being Simulation 2.

Table 2.3: Relative in-sample fit errors of GDP $RISFE$, and relative in-sample estimation error $RISEE(j)$ ($j = 1,2,3$) in 3 factors, for both simulations.

Simulation | RISFE GRS | RISFE BAY | RISEE(1) GRS | RISEE(1) BAY | RISEE(2) GRS | RISEE(2) BAY | RISEE(3) GRS | RISEE(3) BAY
1          | 0.448     | 0.446     | 1.098        | 1.124        | 0.382        | 0.360        | 0.309        | 0.277
2          | 0.336     | 0.325     | 0.923        | 0.934        | 0.684        | 0.707        | 0.804        | 0.767

The in-sample fit results in this subsection show that our MCMC method provides accurate identification of latent factors, which can help shed light on comovements of latent factors and some of the economic/financial series.

2.3.2 Out-of-sample Nowcasting Performance

The out-of-sample nowcasting performance of the GRS and BAY methods is assessed based on 32 one-step-ahead nowcasts. For both methods, in each additional quarter, the model parameters and latent factors are updated at each release date within each of the 3 nowcasting months of the current quarter. The nowcast results are then produced according to (2.17) - (2.19), giving 32×3×3 = 288 nowcast results in total.

Figures 2.3 and 2.4 show the nowcasting performance for Simulations 1 and 2. The left panel plots BAY nowcasts and the right panel plots GRS nowcasts. The first, second, and third rows are nowcasts in the first, second, and third month of the given quarter respectively. In each cell, the curves colored in red, green and blue with different knot types represent the nowcast results based on the first, second, and third release dates in a given month of a given quarter respectively. Both methods give excellent nowcasts from the very first release in the first month all the way to the last release in the third month of a quarter. There are no distinguishable changes from release to release within the same month for either the GRS or the BAY approach, but some improvement can be spotted when moving from nowcasts in the first month to nowcasts in the third month.

[Figure 2.3 panels: BAY (left) and GRS (right) first, second, and third month nowcasts; GDP against quarter (38 to 68); legend: True, RL1, RL2, RL3.]

Figure 2.3: Nowcasting performance for Simulation 1, the left panel plots BAY nowcasts and the right panel plots GRS nowcasts. The first, second, and third row are nowcasts in the first, second, and third month of the given quarter respectively. In each cell, the curves colored in red, green and blue with different knot types represent the nowcast results based on the first, second, and third release dates in a given month of a given quarter respectively.

[Figure 2.4 panels: BAY (left) and GRS (right) first, second, and third month nowcasts; GDP against quarter (38 to 68); legend: True, RL1, RL2, RL3.]

Figure 2.4: Nowcasting performance for Simulation 2, the left panel plots BAY nowcasts and the right panel plots GRS nowcasts. The first, second, and third row are nowcasts in the first, second, and third month of the given quarter respectively. In each cell, the curves colored in red, green and blue with different knot types represent the nowcast results based on the first, second, and third release dates in a given month of a given quarter respectively.

We use the mean absolute nowcasting error (MANE) as a measure of nowcasting accuracy. For either GRS or BAY, let $\hat{y}^{(q,T)}_{K+1}$ be the nowcast calculated according to (2.17) - (2.19) at the $q$th release date of month $T$, where $q = 1,2,3$ and $T = 3K+1$ (first month nowcast), $T = 3K+2$ (second month nowcast), or $T = 3K+3$ (third month nowcast) of the current quarter $K+1$. Then

$$MANE(q, T) = \frac{1}{32}\sum_{K+1=39}^{70} \left|\hat{y}^{(q,T)}_{K+1} - y_{K+1}\right|.$$

In order to compare the nowcasting performance of the two methods with the random walk (RW) approach, which takes GDP from the previous quarter as the nowcast for the current quarter GDP, relative MANEs (relative to the MANE of RW) are used. Figures 2.5 and 2.6 show the ratios (in percentages) of the MANEs of GRS or BAY to the MANE of RW in the nine combinations of 3 releases and 3 nowcasting months for Simulations 1 and 2. The left panel is for BAY, the right panel is for GRS, and the horizontal line at 100% represents the baseline for RW, with the first, second, and third releases colored from dark to light. Bars shorter than the reference line indicate that the nowcasts are better than RW; taller bars indicate that they are worse. Figures 2.5 and 2.6 tell the same story as Figures 2.3 and 2.4: moving from the first month to the third month, there are significant reductions in MANE for both the GRS and BAY approaches. Comparing the nowcasting errors between releases in the same month, GRS nowcasting slightly improves, but there is no significant change in MANE for BAY. Both methods beat RW in terms of nowcasting errors.
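The MANE, its ratio to the random walk benchmark, and the corresponding percentage reduction can be computed as sketched below. The function names and the five-quarter toy series are made up for illustration and are unrelated to the reported results.

```python
import numpy as np


def mane(nowcasts, actual):
    """Mean absolute nowcasting error over the out-of-sample quarters."""
    return np.mean(np.abs(np.asarray(nowcasts) - np.asarray(actual)))


def random_walk_nowcasts(y, first_out):
    """RW benchmark: the nowcast for each quarter is the previous quarter's GDP.

    y is the full GDP series; first_out is the 0-based index of the first
    out-of-sample quarter.
    """
    y = np.asarray(y, dtype=float)
    return y[first_out - 1:-1]


y = np.array([1.0, 2.0, 4.0, 3.0, 5.0])         # toy GDP series
first_out = 2                                   # last three quarters are out of sample
actual = y[first_out:]
model_nowcasts = np.array([3.5, 3.5, 4.0])      # hypothetical model nowcasts

mane_model = mane(model_nowcasts, actual)
mane_rw = mane(random_walk_nowcasts(y, first_out), actual)

ratio_pct = 100.0 * mane_model / mane_rw                    # bar height in the ratio plots
reduction_pct = 100.0 * (mane_model - mane_rw) / mane_rw    # negative means better than RW
print(ratio_pct, reduction_pct)
```

A ratio below 100% and a negative reduction are two views of the same comparison: the model's MANE is smaller than that of the random walk.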

Table2.4reports the percentages of reduction in MANE’s of both methods relative to RW, i.e.

(M AN Em −M AN ERW)/M AN ERW ×100 (in %), where m ∈ {BAY, GRS}, for Simulation 1

and 2. The more negative the percentage is, the more reduction in nowcasting errors it represents. When comparing the averages of percentages over 3 releases for each month, we see BAY in general has more reduction in terms of nowcasting errors than GRS. Even though there is no noticeable change across releases in BAY, the percentages of reduction in MANE of BAY in the first release

(37)

[Figure 2.5: two panels (BAY, GRS); x-axis: First Month, Second Month, Third Month; y-axis: Percentage (0 to 120); legend: RL1, RL2, RL3, RW.]

Figure 2.5: Mean absolute nowcasting error ratios for Simulation 1. The left panel is for BAY, the right panel is for GRS, and the horizontal line at 100% represents the baseline for RW. The first, second, and third releases are colored from dark to light.

are larger in magnitude than those of GRS in the third release for the same month in most cases, the exceptions being the second and third months in Simulation 2.
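As a minimal sketch of how one entry of Table 2.4 could be computed, the Python snippet below implements the relative reduction $(\mathrm{MANE}_m - \mathrm{MANE}_{RW})/\mathrm{MANE}_{RW} \times 100$. It assumes that MANE is the mean absolute nowcasting error averaged over the evaluation quarters. The function names and the toy numbers are hypothetical and are not taken from the thesis.

```python
def mane(nowcasts, truths):
    """Mean absolute nowcasting error over the evaluation quarters.

    Assumption: MANE is the plain average of |nowcast - truth|.
    """
    if len(nowcasts) != len(truths) or not truths:
        raise ValueError("nowcasts and truths must be non-empty and equal in length")
    return sum(abs(n - t) for n, t in zip(nowcasts, truths)) / len(truths)


def pct_reduction_vs_rw(mane_method, mane_rw):
    """Percentage change in MANE relative to the RW benchmark.

    Negative values mean the method has a smaller error than RW.
    """
    if mane_rw <= 0:
        raise ValueError("the RW benchmark MANE must be positive")
    return (mane_method - mane_rw) / mane_rw * 100.0


# Hypothetical toy values, for illustration only.
toy_truths = [2.0, 1.0]
toy_method_nowcasts = [1.5, 1.5]   # absolute errors 0.5 and 0.5
toy_rw_nowcasts = [0.0, 3.0]       # absolute errors 2.0 and 2.0

toy_mane_method = mane(toy_method_nowcasts, toy_truths)   # 0.5
toy_mane_rw = mane(toy_rw_nowcasts, toy_truths)           # 2.0
print(pct_reduction_vs_rw(toy_mane_method, toy_mane_rw))  # -75.0
```

Under this convention, a value of -75% means that the method's MANE is one quarter of the RW benchmark's, which is how the negative entries in Table 2.4 should be read.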

We investigate why there is so little change across releases for the BAY method. First, in the results reported in supplemental file B, we find no significant changes in the estimated parameters and factors across releases, based on several tests and graphical displays. This is not surprising, given that little extra information becomes available at a new release date. For example, when nowcasting $y_{39}$ based on the very first release in the first month, the number of data points in hand is $60 \times 114 + 20 = 6860$, and when the second set of releases becomes available, the number of data points becomes $6860 + 20 = 6880$, which adds only roughly 0.29% new information. However, the estimated factors in GRS change considerably more across releases. Because the BAY method results in estimated factors that are already very accurate (even at the first release) as shown in Section


[Figure 2.6: two panels (BAY, GRS); x-axis: First Month, Second Month, Third Month; y-axis: Percentage (0 to 120); legend: RL1, RL2, RL3, RW.]

Figure 2.6: Mean absolute nowcasting error ratios for Simulation 2 (b). The left panel is for BAY, the right panel is for GRS, and the horizontal line at 100% represents the baseline for RW. The first, second, and third releases are colored from dark to light.

2.3.1, it is much more difficult for the BAY-estimated factors to improve further than for the GRS-estimated factors. Second, theoretically, adding more series on the left-hand side of the dynamics of $x_t$ in (2.1) will not change the asymptotic consistency of the estimator for $F_t$, but will help gain efficiency. To see this efficiency gain, we calculate the posterior standard deviations of the nowcasts for different releases. Define the standard deviation of the nowcast for quarterly GDP at release $q$ in month $T$ for quarter $K+1$, where $q = 1, 2, 3$, $T = 1, 2, 3$, and $K+1 = 39, \ldots, 70$, as

$$SD^{(q,T)}_{K+1} = \sqrt{(G-1)^{-1} \sum_{g=1}^{G} \left( \left(\hat{y}^{(q,T)}_{K+1}\right)^{(g)} - \bar{y}^{(q,T)}_{K+1} \right)^{2}},$$

where $\bar{y}^{(q,T)}_{K+1} = G^{-1} \sum_{g=1}^{G} \left(\hat{y}^{(q,T)}_{K+1}\right)^{(g)}$. Furthermore, define the aggregated level standard deviation for release $q$ in month $T$, averaged over quarters $K+1 = 39, \ldots, 70$, as

$$SD^{(q,T)} = 32^{-1} \sum_{K+1=39}^{70} SD^{(q,T)}_{K+1}.$$

Figure 2.7 plots the aggregated standard deviations of the BAY nowcasts side by side for Simulations 1 and 2. The left panel is for Simulation 1 and the right panel is for Simulation 2, with the first, second, and third releases colored from dark to light. A downward stepwise


Table 2.4: This table reports the percentages of reduction in MANE of both methods relative to RW, i.e. $(\mathrm{MANE}_m - \mathrm{MANE}_{RW})/\mathrm{MANE}_{RW} \times 100$ (in %), where $m \in \{\mathrm{BAY}, \mathrm{GRS}\}$, for both simulation studies.

(a) Percentage of reductions in MANE relative to RW in Simulation 1

Release    BAY         BAY         BAY         GRS         GRS         GRS
           1st Month   2nd Month   3rd Month   1st Month   2nd Month   3rd Month
1st        -35.473%    -72.928%    -93.946%    -31.294%    -67.197%    -88.106%
2nd        -35.519%    -73.089%    -93.860%    -33.847%    -71.211%    -92.905%
3rd        -35.355%    -72.855%    -93.850%    -34.753%    -72.865%    -93.946%
Average    -35.782%    -72.857%    -93.885%    -33.298%    -70.424%    -91.652%

(b) Percentage of reductions in MANE relative to RW in Simulation 2

Release    BAY         BAY         BAY         GRS         GRS         GRS
           1st Month   2nd Month   3rd Month   1st Month   2nd Month   3rd Month
1st        -51.869%    -71.435%    -93.742%    -36.531%    -59.402%    -85.088%
2nd        -51.997%    -72.006%    -93.699%    -48.905%    -71.240%    -94.285%
3rd        -52.264%    -72.034%    -94.590%    -50.923%    -72.620%    -95.400%
Average    -52.043%    -71.825%    -94.010%    -45.453%    -67.754%    -91.591%

trend can be detected. Therefore, for the BAY approach, although additional releases do not improve the nowcasts in terms of MANE, more monthly data series do improve the precision of the BAY nowcasts.

In the simulation studies considered in this article, we evaluate our BAY method based on its estimation accuracy for the latent factors and its nowcasting performance for quarterly GDP. In terms of estimation accuracy for the factors, both the BAY and GRS approaches produce accurate estimated factors. In terms of nowcasting performance, the two methods are comparable, with BAY being slightly better in the sense of producing smaller nowcasting errors. Our simulation results suggest that the BAY method has the potential to estimate dynamic factor models well and produce reliable nowcasting results.
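The posterior standard deviations used in the precision comparison above can be sketched in Python as follows. This is only an illustration of the two definitions, assuming that $G$ denotes the number of retained MCMC draws of a nowcast and that the draws for each quarter are stored in a plain dictionary. The function names, the data layout, and the toy draws are hypothetical and are not taken from the thesis.

```python
import math


def nowcast_posterior_sd(draws):
    """Standard deviation of the G posterior draws of one nowcast.

    Uses the divisor G - 1, matching the definition of SD_{K+1}^{(q,T)}.
    """
    g = len(draws)
    if g < 2:
        raise ValueError("at least two posterior draws are required")
    mean = sum(draws) / g  # posterior mean of the nowcast
    return math.sqrt(sum((d - mean) ** 2 for d in draws) / (g - 1))


def aggregated_sd(draws_by_quarter):
    """Average of the per-quarter standard deviations.

    In the text the average runs over the 32 quarters K+1 = 39, ..., 70;
    here it runs over whatever quarters are supplied.
    """
    sds = [nowcast_posterior_sd(d) for d in draws_by_quarter.values()]
    return sum(sds) / len(sds)


# Hypothetical toy draws for two quarters, for one release q and month T.
toy_draws = {
    39: [1.0, 2.0, 3.0],  # per-quarter SD is 1.0
    40: [2.0, 2.0, 2.0],  # per-quarter SD is 0.0
}
print(aggregated_sd(toy_draws))  # 0.5
```

Computing this quantity separately for each release and month gives one aggregated value per bar in a plot such as Figure 2.7.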

2.4 Bayesian Approach in Nowcasting: Empirical Evidence

Despite the advantages that the BAY method demonstrated in the numerical simulations, it is not immediately clear that it can outperform the RW or the GRS approach in empirical applications. In