Bayesian and Indirect Inference Estimation

Chapter 2 Shadow Banking System and Related Literature

2.6 Bayesian and Indirect Inference Estimation

2.6.1 Bayesian Approach

Conventional statistical estimations always assume no relationship between variables. Thus, the null hypothesis normally indicates no relationship and no prior knowledge of variables. However, it is often the case that researchers do have some understanding of the relationship between variables, which may be based on earlier research and investigations. It is different from the conventional approach (frequentist framework) that relies on the notion of repeating the same experiment many times. Instead, with the Bayesian technique, researchers can encompass the background knowledge and take it into the process of estimation of parameters. Hence, the key difference between Bayesian and conventional statistics, for example, maximum likelihood, is the different views of unknown parameters in a model.

For example, consider a regression 𝑦 = 𝛼 + 𝛽1𝑥1+ 𝛽2𝑥2+ 𝜀, where 𝑦 is the dependent variable, 𝑥1 and 𝑥2 are the independent variables, 𝜀 is the residual, 𝛼, 𝛽₁ and 𝛽₂ are the unknown parameters that we need to estimate. Conventional approaches assume that all parameters have only one true fixed value but unknown before estimation. Bayesian methods do not provide on value but rather a probability distribution, which implies each parameter is estimated to have a distribution that includes uncertainty about the value. Such uncertainty is specified before taking the model to the data and is called prior distribution. Then these prior distributions of all

estimated parameters are combined with the observed data that is expressed in terms of the likelihood function to obtain the posterior distribution, which is the estimated results of the parameter values. Thus, these three ingredients, i.e. prior distribution, data and posterior distribution, constitute the Bayes’ theorem (Van de Schoot and

Depaoli, 2014).

Prior distribution of Bayesian statistics reflects prior knowledge of the underlying parameters, which can be stemmed from previous studies and investigations (O’Hagan et al., 2006). The variance of the prior distribution implies the level of uncertainty about the value of the parameters, the smaller the variance, the more certain about the value of the parameter. There are three types of prior distribution regarding the level of certainty of the parameter value, non-informative priors, weakly-informative priors and informative priors. Non-informative priors simply imply a great deal of uncertainty or have no prior knowledge about the value. Weakly-informative priors contain some useful information but typically have limited influence on the final parameter estimate. Finally, the priors that include the most amount of information about the values are informative priors. The last type of prior has a large impact on the final estimates. After specifying the priors, Bayes’ theorem then takes it to the data that contain new and true information and obtain the posterior distribution, which reflects one’s updated knowledge about the estimated parameters (Van de Schoot and

Comparing with the conventional frequentist approach, the major difference is that only Bayes can incorporate background knowledge into the estimation and allow for updating the previous understanding after analysing with the new data. Another advantage of Bayesian statistic is that it does not require testing the same null hypothesis repeatedly. One can pick up the theory from prior literature and conduct further analysis. In addition to theoretical advantages, one practical advantage of using Bayesian methods is that it can deal with small sample size, which is not based on the central limit theorem as in the frequentist approach. The prior distribution only reflects the background knowledge of the theory and is not based on sample size. The maximum likelihood function of the data is scaled by the size of the sample. With more data in the sample, the likelihood function contains more information and may have a larger influence on the final estimation. For a small sample, prior distribution plays heavier role in the estimation, while with large sample, data have a larger impact on the posterior distribution. Many papers have shown the benefits of using Bayesian methods when large data set is not available (Zhang et al., 2007; Lee and Song, 2004; Hox et al., 2012).

Bayesian estimation has been substantially applied in macroeconomics research in the last three decades. Querron-Quintana and Nason (2013) explain the reason that Bayesian method becomes popular is that it offers researcher the chance to estimate and evaluate macro models where frequentist approach often finds challenge to implement, especially for DSGE models. Another attractive aspect of the popularity of the Bayesian approach is the increasing computational power to estimate medium

and large scale DSGE frameworks. In addition, frequentist econometricians argue that DSGE models are misspecified versions of the true model, even though these models are always seen as abstractions of economies. Hence, Bayes’ theorem is favoured in estimating DSGE models as it eschews the existence of such true model and claims that all models are false, but one can be better than another.

2.6.2 Indirect Inference Estimation

Berger and Wolpert (1988) state that Bayes’ theorem, based on the likelihood principle, does not assume the existence of a true or correctly specified DSGE model. The likelihood principle implies that all evidence about a DSGE model is contained in its likelihood conditional on the data. Therefore, from a Bayesian economist perspective, one model can be more likely to be better compared with the benchmark model. Indeed, there is no model can be literally true since the ‘real world’ is too complex to be explained by one model, which implies all DSGE models are false or ‘misspecified’. Nevertheless, as argued in Le et al. (2015), an abstract model with implied residuals may still be able to mimic the data. For example, a model with the assumption of perfect competition may never exist in reality, but can still be able to predict the behaviour of industries with a high degree of competition. Thus, although the DSGE model maybe just a simplified version of a complex reality, it should be tested on its explanatory power against the real data.

Bayesian estimation can evaluate the model by creating a likelihood ratio, which essentially states that one model is better or worse than the other. It does not directly test the model against the real data. Thus, indirect inference technique is applied in this thesis for judging whether the model is partially rejected or not rejected by the data. Indirect inference estimation has been widely used in history (see Smith, 1993; Gregory and Smith, 1991; Gourieroux et al., 1993; Gourieroux and Monfort, 1996; Canova, 2007). Le et al. (2011) refine the method with Monte Carlo simulation, which is adopted in this research. The basic idea of indirect inference estimation is to compare the simulated data generated from the DSGE model with the actual data. Specifically, a different set of model parameters would generate different simulated moments from the same model. If the simulated moments are sufficiently close to the moments generated by the actual data, the model can be viewed as to pass the indirect inference test, and the set of model parameter that passes the test is the final estimated parameters.

To obtain the moments of the actual data, we need to choose the auxiliary model. Meenagh et al. (2009) demonstrate that the Vector Autoregressive model (VAR) can be used as an approximation of the reduced form of the DSGE models. Hence, we use the VAR model as the auxiliary model and incorporate with the chosen actual data to calculate the moments of the real data. The simulated moments from the model are used to compare with the moments from the auxiliary model. To determine whether two sets of moments are close to each other, we need to compute the Wald statistic. According to Le et al. (2011), there are two different types of Wald statistics: the ‘Full

Wald’ which considers all endogenous variables in the DSGE model and the ‘Directed Wald’ which mainly focuses on some aspects of the model performance. One should notice that the more variables and lags are included in the auxiliary model, the higher power of the indirect inference test and the higher chance that the model can be rejected. Therefore, it is arguably sufficient that if we mainly focus on the key endogenous variables in our model, which including output, inflation and interest rate.

The purpose of the indirect inference testing is to find out whether a certain set of model parameters can compute the Wald statistics that pass the critical value, while the indirect inference estimation is aimed to find out at least one set of model parameters that can finally pass the test. This implies to simulate the model and test different sets of parameters a hundred or even a thousand times.

In document Modelling shadow banking system and housing market in China (Page 84-89)