The linear regression model - Bayesian Econometric Methods

The linear regression model is the workhorse of econometrics. In addition to being im-portant in its own right, this model is an imim-portant component of other, more complicated models. The linear regression model posits a linear relationship between the dependent variabley_iand a1 × k vector of explanatory variables, x_i, wherei= 1, . . . , N indexes the relevant observational unit (e.g., individual, ﬁrm, time period, etc.). In matrix notation, the linear regression model can be written as

y= Xβ + , (10.1)

wherey = [y1 y₂· · · yN]is anN vector,

⎡

⎢⎣ x₁

... x_N

⎤

⎥⎦

is anN × k matrix, and = [₁₂ · · · _N] is anN vector of errors. Assumptions about

and X deﬁne the likelihood function. Unless explicitly noted otherwise, the questions in this chapter assume that and X satisfy what are often referred to as the classical assump-tions. With regards to the explanatory variables, we assume that either they are not random or, if they are random variables, it is acceptable to proceed conditionally on them. That is, the distribution of interest isy|X [see Poirier (1995), pp. 448–458 for a more detailed discussion of the random explanatory variable case]. For simplicity of notation, we will not include X as a conditioning argument in most equations and leave this source of condi-tioning implicit. Finally, in this chapter and those that follow we typically reserve capital letters such asX to denote matrices and lowercase letters such as y and x_ito denote scalar or vector quantities. When the distinction between scalars and vectors is important, this will be made clear within the exercise. We also drop the convention of using capital letters

107

108 10 The linear regression model

to denote random variables, since our focus throughout the remainder of the book is almost exclusively ex post. This notation is primarily needed when considering ex ante proper-ties and their contrast with those obtained from the ex post viewpoint (e.g., in Chapter 4).

These issues are not taken up in the following chapters and thus this notational distinction is decidedly less critical.

With regard to the errors in the regression model, we assume they are independently normally distributed with mean zero and common varianceσ². That is,

∼ N

0_N, σ²I_N

, (10.2)

where0_N is anN vector of zeroes, I_N is theN × N identity matrix, and N(·, ·) denotes the multivariate normal distribution (see Appendix Deﬁnition 3). In this chapter we choose to work with the error precision,h ≡ σ⁻², and, thus, the normal linear regression model depends on the parameter vector [βh]. By using the properties of the multivariate nor-mal distribution, it follows that p(y|β, h) = φ

y; Xβ, h⁻¹I_N

, and thus the likelihood function is given by

In an analogous fashion to expressions given in Exercises 2.3 and 2.14, it is often convenient to write the likelihood function in terms of ordinary least squares (OLS) quantities

β= This can be done by using the fact that

(y − Xβ)(y − Xβ) = SSE +

Exercise 10.1 (Posterior distributions under a normal-gamma prior) For the normal linear regression model under the classical assumptions, use a normal-Gamma prior [i.e., the prior forβ and h is N G

β, Q, s⁻², ν

; see Appendix Deﬁnition 5]. Derive the poste-rior forβ and h and thus show that the normal-gamma prior is a conjugate prior for this model.

10 The linear regression model 109 Solution

Using Bayes’ theorem and the properties of the normal-gamma density, we have p(β, h|y) ∝ p (β, h) L (β, h)

where f_γ() denotes the gamma density (see Appendix Deﬁnition 2) and φ () the multi-variate normal p.d.f. This question can be solved by plugging in the forms for each of the densities in these expressions, rearranging them (using some theorems in matrix algebra), and recognizing that the result is the kernel of normal-gamma density. (Details are pro-vided in the following.) Since the posterior and prior are both normal-gamma, conjugacy is established. The steps are elaborated in the remainder of this solution.

Begin by writing out each density (ignoring integrating constants not involving the parameters) and using the expression for the likelihood function in (10.4):

p(β, h|y) ∝

whereν = ν + N. In this expression, β enters only in the terms in the exponent,

β− β

110 10 The linear regression model

Thus,β enters only through the term

β− β Q⁻¹

β− β

and we can establish that the kernel ofβ|y, h is given by

Since this is the kernel of a normal density we have established thatβ|y, h ∼ N

β, h⁻¹Q⁻¹

. We can derivep(h|y) by using the fact that

p(h|y) =

p(β, h|y) dβ =

p(h|y) p (β|y, h) dβ.

Since p.d.f.s integrate to one we can integrate out the component involving normal density forp(β|y, h) and we are left with

We recognize the ﬁnal expression forp(h|y) as the kernel of a gamma density, and thus we have established thath|y ∼ γ

s⁻², ν .

Sinceβ|y, h is normal and h|y is gamma, it follows immediately that the posterior for β andh is N G

β, Q, s⁻², ν

. Since prior and posterior are both normal-gamma, conjugacy has been established.

Exercise 10.2 (A partially informative prior) Suppose you have a normal linear re-gression model with partially informative natural conjugate prior where prior information is available only onJ ≤ k linear combinations of the regression coefﬁcients and the prior forh is the standard noninformative one: p(h) ∝ 1/h. Thus, Rβ|h ∼ N(r, h⁻¹V_r), where R is a known J × k matrix with rank(R) = J, r is a known J vector, and V_r is aJ × J positive deﬁnite matrix. Show that the posterior is given by

β, h|y ∼ NG

10 The linear regression model 111 model in (10.1) can be written as

y= Zγ + ,

. In words, we have transformed the model so that an informative prior is employed forγ₂ = Rβ whereas a noninformative prior is employed forγ₁. The remainder of the proof is essentially the same as for Exercise 10.1. Assuming a normal-gamma natural conjugate prior forγ and h leads to a normal-gamma posterior. Tak-ing noninformative limitTak-ing cases for the priors forγ₁andh and then transforming back to the original parameterization (e.g., usingβ = A⁻¹γ and X = ZA) yields the expressions given in the question.

Exercise 10.3 (Ellipsoid bound theorem) Consider the normal linear regression model with natural conjugate prior,β, h∼ NG

β, Q, s⁻², ν

. Show that ifβ = 0 then, for any choice ofQ, the posterior mean β must lie in the ellipsoid:

Whenβ= 0, the formula for the posterior mean β is given in Exercise 10.1 as

β= QXX β, (10.5)

where

Q⁻¹+ XX₋₁

. (10.6)

112 10 The linear regression model

Multiplying (10.5) byQ⁻¹(using the expression in Equation 10.6) we obtain

Q⁻¹+ XX

β= XX β, rearranging and multiplying both sides of the equation byβyields

βQ⁻¹β= βXX

But the ellipsoid bound equation given in the question can be rearranged to be identical to (10.7). In particular, expanding the quadratic form on the left-hand side of the ellipsoid bound equation yields

Thus, (10.7) is equivalent to the ellipsoid bound theorem.

You may wish to read Leamer (1982) or Poirier (1995, pp. 526–537) for more details about ellipsoid bound theorems.

*Exercise 10.4 (Problems with Bayes factors using noninformative priors: Leamer [1978, p. 111]) Suppose you have two normal linear regression models:

M_j : y = Xjβ_j+ j,

wherej= 1, 2, Xjis anN×kjmatrix of explanatory variables,β_j is ak_j vector of regres-sion coefﬁcients, and_j is anN vector of errors distributed as N

0_N, h⁻¹_j I_N given in the solution to Exercise 10.1) and the Bayes factor comparingM₂ toM₁is given by

10 The linear regression model 113

The casek₁= k2corresponds to two nonnested models with the same number of explana-tory variables. cancel out in the Bayes factor. Furthermore, under the various assumptions about Q⁻¹

j it

114 10 The linear regression model

In part (b)|Q⁻¹_j | = c and hence

|Q⁻¹₂ |/|Q⁻¹₁ |

= 1 for all c regardless of what kj is.

Thus, the result in part (b) is established.

In part (c) |Q⁻¹_j | = c |X_jX_j| and hence

|Q⁻¹₂ |/|Q⁻¹₁ |

= |X₂X₂|/|X₁X₁| for all c regardless of what k_j is. Using this result, simpliﬁes the Bayes factor to the expression given in part (c).

Exercise 10.5 (Multicollinearity) Consider the normal linear regression model with nat-ural conjugate prior: N G

β, Q, s⁻², ν

. Assume in addition thatXc = 0 for some non-zero vector of constantsc. Note that this is referred to as a case of perfect multicollinearity.

It implies that the matrixX is not of full rank and(XX)⁻¹does not exist.

(a) Show that, despite this pathology, the posterior exists ifQ is positive deﬁnite.

(b) Deﬁne

α= cQ⁻¹β.

Show that, givenh, the prior and posterior distributions of α are identical and equal to N

cQ⁻¹β, h⁻¹cQ⁻¹c .

Hence, although prior information can be used to surmount the problems caused by perfect multicollinearity, there are some combinations of the regression coefﬁcients about which (conditional) learning does not occur.

Solution

(a) The solution to this question is essentially the same as to Exercise 10.1. The key thing to note is that, sinceQ is positive deﬁnite, Q⁻¹ exists and, hence,Q⁻¹ exists despite the fact thatXX is rank deﬁcient. In the same manner, it can be shown that the posterior for β and h is N G

(b) The properties of the normal-gamma distribution imply β|h ∼ N

β, h⁻¹Q . The properties of the normal distribution imply

α|h ≡ cQ⁻¹β|h ∼ N

cQ⁻¹β, h⁻¹cQ⁻¹c , which establishes that the prior has the required form.

10 The linear regression model 115 Using the result from part (a) gives the relevant posterior of the form

β|y, h ∼ N

β, h⁻¹Q , which implies

α|y, h ∼ N

cQ⁻¹β, h⁻¹cQ⁻¹QQ⁻¹c . The mean can be written as

cQ⁻¹β= c

Q⁻¹− XX

Q⁻¹β+ Xy

= cQ⁻¹β+ cXy− cXXQ

Q⁻¹β+ Xy

= cQ⁻¹β, sinceXc= 0.

The variance can be written as

h⁻¹cQ⁻¹QQ⁻¹c= h⁻¹cQ⁻¹

Q⁻¹+ XX₋₁ Q⁻¹

= h⁻¹cQ⁻¹

I−

Q⁻¹+ XX₋₁ XX

= h⁻¹cQ⁻¹c,

sinceXc= 0. Note that this derivation uses a standard theorem in matrix algebra that says, ifG and H are k× k matrices and (G + H)⁻¹exists, then

(G + H)⁻¹H= I_k− (G + H)⁻¹G.

Combining these derivations, we see that the posterior is also α|y, h ∼ N

cQ⁻¹β, h⁻¹cQ⁻¹c , as required,

11

In document Bayesian Econometric Methods (Page 131-141)