2.7 EMPIRICAL ILLUSTRATION

The regression model outlined in this chapter is probably too simple to be used for any serious empirical work. For one thing, to simplify the algebra, we have not included an intercept in the model. Furthermore, virtually any serious application will involve several explanatory variables. Hence, to illustrate the basic concepts discussed in this chapter, we will work with a data set artificially generated by the computer. Specifically, we set $N = 50$. We begin by generating values of the explanatory variable, $x_i$, which are i.i.d. draws from the $N(0, 1)$ distribution for $i = 1, \ldots, 50$. We then generate values for the errors, $\varepsilon_i$, which are $N(0, h^{-1})$. Finally, we use the explanatory variables and errors to generate the dependent variable $y_i = \beta x_i + \varepsilon_i$. We set $\beta = 2$ and $h = 1$. We use two priors, the noninformative one given in (2.23) and the informative natural conjugate prior given in (2.7) with $\underline{\beta} = 1.5$, $\underline{V} = 0.25$, $\underline{\nu} = 10$ and $\underline{s}^{-2} = 1$. The choices of data generating process and prior hyperparameter values are purely illustrative.
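The data generating process just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `generate_data` and the seed are our own choices, not part of the book's MATLAB programs, and the draws will differ from the ones used to produce the tables in this section.

```python
import random


def generate_data(n=50, beta=2.0, h=1.0, seed=0):
    """Simulate y_i = beta * x_i + eps_i with x_i ~ N(0, 1) and eps_i ~ N(0, 1/h).

    The name, defaults and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    error_sd = (1.0 / h) ** 0.5  # h is a precision, so the error variance is 1/h
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [rng.gauss(0.0, error_sd) for _ in range(n)]
    y = [beta * xi + ei for xi, ei in zip(x, eps)]
    return x, y


x, y = generate_data()
print(len(x), len(y))
```

Fixing the seed makes the artificial data set reproducible, which is convenient when comparing results under different priors.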

Tables 2.1 and 2.2 present prior and posterior properties of the model parameters, $\beta$ and $h$, respectively, using (2.7)–(2.22). Figure 2.1 plots posteriors for $\beta$ under the informative and noninformative priors as well as the informative prior itself (the noninformative prior for $\beta$ is simply a flat line). From (2.13) it follows that the plotted p.d.f.s are all t-densities. Posterior properties based on the noninformative prior reflect only likelihood function information and are equivalent to frequentist OLS quantities (see (2.19)–(2.22)). For this reason, the marginal posterior for $\beta$ under the noninformative prior is labeled ‘Likelihood’ in Figure 2.1.

Table 2.1 Prior and Posterior Properties of $\beta$

                 Prior            Posterior                 Posterior
                 (Informative)    (Noninformative Prior)    (Informative Prior)
Mean             1.50             2.06                      1.96

Table 2.2 Prior and Posterior Properties of $h$

                 Prior            Posterior                 Posterior
                 (Informative)    (Noninformative Prior)    (Informative Prior)
Mean             1.00             1.07                      1.04
St. Deviation    0.45             0.21                      0.19

[Figure: probability density plotted against $\beta$ over the range 1 to 3, with curves labeled Prior, Posterior and Likelihood]

Figure 2.1 Marginal Prior and Posteriors for $\beta$

The tables and figure show clearly how Bayesian inference involves combining prior and data information to form a posterior. For instance, in Figure 2.1, it can be seen that the posterior based on the informative prior looks to be an average of the prior density and the likelihood function. Tables 2.1 and 2.2 show that the posterior means of both parameters, $E(\beta | y)$ and $E(h | y)$, using the informative prior lie between the relevant prior mean and the likelihood-based quantity (i.e. the posterior mean using the noninformative prior). The prior we have selected contains less information than the data. This can be seen either in the figure (i.e. the prior p.d.f. is more dispersed than the likelihood) or in the tables (i.e. the prior standard deviations are larger than the likelihood-based quantities).
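The way prior and data information are combined can be made concrete with a short Python sketch of the Normal-Gamma updating formulas. We assume here that these are the standard results the chapter numbers (2.9)–(2.12) and (2.19)–(2.22); the function name `posterior_summary`, the tuple layout of `prior` and the simulated data are our own illustrative choices. Because the draws differ from the book's, the output will not reproduce Tables 2.1 and 2.2 exactly.

```python
import math
import random


def posterior_summary(x, y, prior=None):
    """Posterior means and standard deviations of beta and h.

    prior is (beta0, V0, s0_inv2, nu0) for a Normal-Gamma prior, or None
    for the noninformative prior. Names and layout are assumptions.
    """
    n = len(x)
    sum_x2 = sum(xi * xi for xi in x)
    beta_hat = sum(xi * yi for xi, yi in zip(x, y)) / sum_x2  # OLS estimate
    nu = n - 1
    s2 = sum((yi - beta_hat * xi) ** 2 for xi, yi in zip(x, y)) / nu

    if prior is None:
        v_post, beta_post = 1.0 / sum_x2, beta_hat
        nu_post, nu_s2_post = n, nu * s2
    else:
        beta0, v0, s0_inv2, nu0 = prior
        v_post = 1.0 / (1.0 / v0 + sum_x2)
        # Precision-weighted average of the prior mean and the OLS estimate.
        beta_post = v_post * (beta0 / v0 + beta_hat * sum_x2)
        nu_post = nu0 + n
        nu_s2_post = (nu0 / s0_inv2 + nu * s2
                      + (beta_hat - beta0) ** 2 / (v0 + 1.0 / sum_x2))

    s2_post = nu_s2_post / nu_post
    return {
        "beta_mean": beta_post,
        "beta_sd": math.sqrt(nu_s2_post / (nu_post - 2) * v_post),
        "h_mean": 1.0 / s2_post,
        "h_sd": math.sqrt(2.0 / nu_post) / s2_post,
    }


# Hypothetical simulated data in the spirit of the illustration (arbitrary seed).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]
y = [2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

informative = posterior_summary(x, y, prior=(1.5, 0.25, 1.0, 10.0))
noninformative = posterior_summary(x, y)
print(informative)
print(noninformative)
```

Since the posterior mean of $\beta$ is a weighted average with positive weights, it necessarily lies between the prior mean and the OLS estimate, which is the pattern visible in Table 2.1.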

Remember that, since the data set has been artificially created, we know that the true parameter values are $\beta = 2$ and $h = 1$. You would, of course, never expect a point estimate like a posterior mean or an OLS quantity to be precisely equal to the true value. However, the posterior means are quite close to their true values relative to their posterior standard deviations. Note also that the posterior standard deviations using the informative prior are slightly smaller than those using the noninformative prior. This reflects the intuitive notion that, in general, more information allows for more precise estimation. That is, it is intuitively sensible that a posterior which combines both prior and data information will be less dispersed than one which uses a noninformative prior and is based only on data information. In terms of the formulae, this intuitive notion is captured through (2.9) being smaller than (2.19) if $\underline{V} > 0$. Note, however, that this intuition is not guaranteed to hold in every case since, if prior and data information are greatly different from one another, then (2.12) can become much bigger than (2.22). Since both $\overline{V}$ and $\overline{\nu}\,\overline{s}^2$ enter the formula for the posterior standard deviation of $\beta$, it is possible (although unusual) for the posterior standard deviation using an informative prior to be larger than that using a noninformative prior.
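To see the two opposing effects side by side, it may help to write out the relevant quantities. The display below assumes that (2.9) and (2.12) are the usual Normal-Gamma updating formulas, with $\hat{\beta}$, $\nu = N - 1$ and $s^2$ denoting the OLS quantities; the notation is our reconstruction rather than a quotation of the numbered equations.

```latex
\begin{align*}
\overline{V} &= \frac{1}{\underline{V}^{-1} + \sum x_i^2}
  \;<\; \frac{1}{\sum x_i^2}, \\
\overline{\nu}\,\overline{s}^2 &= \underline{\nu}\,\underline{s}^2 + \nu s^2
  + \frac{(\hat{\beta} - \underline{\beta})^2}{\underline{V} + \left(\sum x_i^2\right)^{-1}}
  \;\geq\; \nu s^2, \\
\operatorname{var}(\beta \mid y) &= \frac{\overline{\nu}\,\overline{s}^2}{\overline{\nu} - 2}\,\overline{V}.
\end{align*}
```

The first line shrinks the posterior variance whenever the prior is informative. The last term of the second line grows with the squared distance between the prior mean and the OLS estimate, which is how strong disagreement between prior and data can inflate the posterior standard deviation.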

To illustrate model comparison, let us suppose we are interested in comparing the model we have been discussing to another linear regression model which contains only an intercept (i.e. in this second model $x_i = 1$ for $i = 1, \ldots, 50$). For both models, we use the same informative prior described above (i.e. both priors are $NG(1.5, 0.25, 1, 10)$). Assuming a prior odds ratio equal to one, (2.34) can be used to calculate the posterior odds ratio comparing these two models. Of course, we know our first model is the correct one and, hence, we would expect the posterior odds ratio to indicate this. This does turn out to be the case, since we find a posterior odds ratio of 3749.7. In words, we are finding overwhelming support for our correct first model. It is almost 4000 times more likely to be true than the second model. In terms of posterior model probabilities, the posterior odds ratio implies that $p(M_1 | y) = 0.9997$ and $p(M_2 | y) = 0.0003$. If we were to do Bayesian model averaging using these two models, we would attach 99.97% weight to results from the first model and only 0.03% weight to results from the second (see (2.41)).
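The step from the posterior odds ratio to the posterior model probabilities is simple arithmetic: with only two models the probabilities must sum to one, so $p(M_1 | y) = PO / (1 + PO)$. A minimal Python check, with a function name of our own choosing:

```python
def posterior_model_probs(posterior_odds):
    """Convert a two-model posterior odds ratio into (p(M1|y), p(M2|y))."""
    p1 = posterior_odds / (1.0 + posterior_odds)
    return p1, 1.0 - p1


p1, p2 = posterior_model_probs(3749.7)
print(round(p1, 4), round(p2, 4))  # 0.9997 0.0003
```

These two probabilities are also the weights that Bayesian model averaging would attach to results from each model.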

Predictive inference can be carried out using (2.40). We illustrate how this is done by selecting the point $x^* = 0.5$. Using the informative prior, it turns out that

$y^* | y \sim t(0.98, 0.97, 60)$

Using the noninformative prior, it turns out that

$y^* | y \sim t(1.03, 0.95, 50)$

Either of these probability statements can be used to present point predictions, predictive standard deviations, or any other predictive function of interest you may wish to calculate.
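As a sketch of how a point prediction and predictive standard deviation can be read off such a statement, the Python snippet below assumes the convention that $t(\mu, \Sigma, \nu)$ has mean $\mu$ and variance $\frac{\nu}{\nu - 2}\Sigma$, i.e. that the second argument is a scale parameter rather than the variance itself. That convention, and the function name, are assumptions that should be checked against the definition of the t distribution in the appendix.

```python
import math


def t_mean_and_sd(location, scale, df):
    """Mean and standard deviation of t(location, scale, df).

    Assumes variance = df / (df - 2) * scale, which requires df > 2.
    """
    if df <= 2:
        raise ValueError("the variance of a t distribution requires df > 2")
    return location, math.sqrt(df / (df - 2.0) * scale)


# Predictive densities reported in the text for x* = 0.5.
print(t_mean_and_sd(0.98, 0.97, 60))  # informative prior
print(t_mean_and_sd(1.03, 0.95, 50))  # noninformative prior
```

Under this convention the predictive standard deviations come out at about 1.00 and 0.99 respectively, so the two priors give very similar predictions at this point.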

2.8 SUMMARY

In this chapter, we have gone through a complete Bayesian analysis (i.e. likelihood, prior, posterior, model comparison and prediction) for the Normal linear regression model with a single explanatory variable and a so-called natural conjugate prior. For the parameters of this model, $\beta$ and $h$, this prior has a Normal-Gamma distribution. The natural conjugate nature of the prior means that the posterior also has a Normal-Gamma distribution. For this prior, posterior and predictive inference and model comparison can be done analytically and no posterior simulation is required. Other themes introduced in this chapter include the concept of a noninformative prior and Bayesian model averaging.

2.9 EXERCISES

2.9.1 Theoretical Exercises

1. Prove the result in (2.8). Hint: This is a standard derivation proved in many other textbooks such as Poirier (1995, p. 527) or Zellner (1971, pp. 60–61), and you may wish to examine some of these references if you are having trouble.

2. For this question, assume the likelihood function is as described in Section 2.2 with known error precision, $h = 1$, and $x_i = 1$ for $i = 1, \ldots, N$.

(a) Assume a Uniform prior for $\beta$ such that $\beta \sim U(\alpha, \gamma)$. Derive the posterior $p(\beta | y)$.

(b) What happens to $p(\beta | y)$ as $\alpha \to -\infty$ and $\gamma \to \infty$?

(c) Use the change-of-variable theorem (Appendix B, Theorem B.21) to derive the prior for a one-to-one function of the regression coefficient, $g(\beta)$, assuming that $\beta$ has the Uniform prior given in (a). Sketch the implied prior for several choices of $g(\cdot)$ (e.g. $g(\beta) = \log(\beta)$, $g(\beta) = \frac{\exp(\beta)}{1 + \exp(\beta)}$, $g(\beta) = \exp(\beta)$, etc.).

(d) Consider what happens to the priors in part (c) as $\alpha \to -\infty$ and $\gamma \to \infty$.

(e) Given your answers to part (d), discuss whether a prior which is ‘noninformative’ when the model is parameterized in one way is also ‘noninformative’ when the model is parameterized in a different way.

2.9.2 Computer-Based Exercises

Remember that some data sets and MATLAB programs are available on the website associated with this book.

3. Generating artificial data sets. This is an important skill since artificial data sets can be used to understand the properties of models and investigate the performance of a particular computer algorithm. Since you have chosen the values for the parameters, you know roughly what answer your econometric methods should give.

(a) Generate several artificial data sets from the Normal linear regression model by using the following steps: (i) Choose values for $\beta$, $h$ and $N$ (e.g. $\beta = 2$, $h = 1$ and $N = 100$); (ii) Generate $N$ values for the explanatory variable from a distribution of your choice (e.g. take $N = 100$ draws from the $U(0, 1)$ distribution); (iii) Generate $N$ values of the errors by taking $N$ i.i.d. draws from the $N(0, h^{-1})$ distribution; and (iv) Construct data on the dependent variable using your chosen value for $\beta$ and the data generated in steps (ii) and (iii) (i.e. use $y_i = \beta x_i + \varepsilon_i$ for $i = 1, \ldots, N$).

(b) Make XY-plots of each data set to see how your choices of $\beta$, $h$ and $N$ are reflected in the data.

4. Bayesian inference in the Normal linear regression model: prior sensitivity.

(a) Generate an artificial data set with $\beta = 2$, $h = 1$ and $N = 100$ using the $U(0, 1)$ distribution to generate the explanatory variable.

(b) Assume a prior of the form $\beta, h \sim NG(\underline{\beta}, \underline{V}, \underline{s}^{-2}, \underline{\nu})$ with $\underline{\beta} = 2$, $\underline{V} = 1$, $\underline{s}^{-2} = 1$, $\underline{\nu} = 1$, and calculate the posterior means and standard deviations of $\beta$ and $h$. Calculate the Bayes factor comparing the model with $\beta = 0$ to that with $\beta \neq 0$. Calculate the predictive mean and standard deviation for an individual with $x = 0.5$.

(c) How does your answer to part (b) change if $\underline{V} = 0.01$? What if $\underline{V} = 0.1$? What if $\underline{V} = 10$? What if $\underline{V} = 100$? What if $\underline{V} = 1\,000\,000$?

(d) How does your answer to part (b) change if $\underline{\nu} = 0.01$? What if $\underline{\nu} = 0.1$? What if $\underline{\nu} = 10$? What if $\underline{\nu} = 100$? What if $\underline{\nu} = 1\,000\,000$?

(e) Set the prior mean of $\beta$ different from the value used to generate the data (e.g. $\underline{\beta} = 0$) and repeat part (c).

(f) Set the prior mean of $h$ far from its true value (e.g. $\underline{s}^{-2} = 100$) and repeat part (d).

(g) In light of your findings in parts (b) through (f) discuss the sensitivity of posterior means, standard deviations and Bayes factors to changes in the prior.

(h) Repeat parts (a) through (g) using more informative (e.g. $N = 1000$) and less informative (e.g. $N = 10$) data sets.

(i) Repeat parts (a) through (h) using different values for $\beta$ and $h$ to generate artificial data.

In document Koop - Bayesian Econometrics 2003 (Page 39-44)