The linear regression model is the workhorse of econometrics. In addition to being im-portant in its own right, this model is an imim-portant component of other, more complicated models. The linear regression model posits a linear relationship between the dependent variableyiand a1 × k vector of explanatory variables, xi, wherei= 1, . . . , N indexes the relevant observational unit (e.g., individual, firm, time period, etc.). In matrix notation, the linear regression model can be written as
y= Xβ + , (10.1)
wherey = [y1 y2· · · yN]is anN vector,
X=
⎡
⎢⎣ x1
... xN
⎤
⎥⎦
is anN × k matrix, and = [12 · · · N] is anN vector of errors. Assumptions about
and X define the likelihood function. Unless explicitly noted otherwise, the questions in this chapter assume that and X satisfy what are often referred to as the classical assump-tions. With regards to the explanatory variables, we assume that either they are not random or, if they are random variables, it is acceptable to proceed conditionally on them. That is, the distribution of interest isy|X [see Poirier (1995), pp. 448–458 for a more detailed discussion of the random explanatory variable case]. For simplicity of notation, we will not include X as a conditioning argument in most equations and leave this source of condi-tioning implicit. Finally, in this chapter and those that follow we typically reserve capital letters such asX to denote matrices and lowercase letters such as y and xito denote scalar or vector quantities. When the distinction between scalars and vectors is important, this will be made clear within the exercise. We also drop the convention of using capital letters
107
108 10 The linear regression model
to denote random variables, since our focus throughout the remainder of the book is almost exclusively ex post. This notation is primarily needed when considering ex ante proper-ties and their contrast with those obtained from the ex post viewpoint (e.g., in Chapter 4).
These issues are not taken up in the following chapters and thus this notational distinction is decidedly less critical.
With regard to the errors in the regression model, we assume they are independently normally distributed with mean zero and common varianceσ2. That is,
∼ N
0N, σ2IN
, (10.2)
where0N is anN vector of zeroes, IN is theN × N identity matrix, and N(·, ·) denotes the multivariate normal distribution (see Appendix Definition 3). In this chapter we choose to work with the error precision,h ≡ σ−2, and, thus, the normal linear regression model depends on the parameter vector [βh]. By using the properties of the multivariate nor-mal distribution, it follows that p(y|β, h) = φ
y; Xβ, h−1IN
, and thus the likelihood function is given by
In an analogous fashion to expressions given in Exercises 2.3 and 2.14, it is often convenient to write the likelihood function in terms of ordinary least squares (OLS) quantities
β= This can be done by using the fact that
(y − Xβ)(y − Xβ) = SSE +
Exercise 10.1 (Posterior distributions under a normal-gamma prior) For the normal linear regression model under the classical assumptions, use a normal-Gamma prior [i.e., the prior forβ and h is N G
β, Q, s−2, ν
; see Appendix Definition 5]. Derive the poste-rior forβ and h and thus show that the normal-gamma prior is a conjugate prior for this model.
10 The linear regression model 109 Solution
Using Bayes’ theorem and the properties of the normal-gamma density, we have p(β, h|y) ∝ p (β, h) L (β, h)
where fγ() denotes the gamma density (see Appendix Definition 2) and φ () the multi-variate normal p.d.f. This question can be solved by plugging in the forms for each of the densities in these expressions, rearranging them (using some theorems in matrix algebra), and recognizing that the result is the kernel of normal-gamma density. (Details are pro-vided in the following.) Since the posterior and prior are both normal-gamma, conjugacy is established. The steps are elaborated in the remainder of this solution.
Begin by writing out each density (ignoring integrating constants not involving the parameters) and using the expression for the likelihood function in (10.4):
p(β, h|y) ∝
whereν = ν + N. In this expression, β enters only in the terms in the exponent,
β− β
110 10 The linear regression model
Thus,β enters only through the term
β− β Q−1
β− β
and we can establish that the kernel ofβ|y, h is given by
Since this is the kernel of a normal density we have established thatβ|y, h ∼ N
β, h−1Q−1
. We can derivep(h|y) by using the fact that
p(h|y) =
p(β, h|y) dβ =
p(h|y) p (β|y, h) dβ.
Since p.d.f.s integrate to one we can integrate out the component involving normal density forp(β|y, h) and we are left with
We recognize the final expression forp(h|y) as the kernel of a gamma density, and thus we have established thath|y ∼ γ
s−2, ν .
Sinceβ|y, h is normal and h|y is gamma, it follows immediately that the posterior for β andh is N G
β, Q, s−2, ν
. Since prior and posterior are both normal-gamma, conjugacy has been established.
Exercise 10.2 (A partially informative prior) Suppose you have a normal linear re-gression model with partially informative natural conjugate prior where prior information is available only onJ ≤ k linear combinations of the regression coefficients and the prior forh is the standard noninformative one: p(h) ∝ 1/h. Thus, Rβ|h ∼ N(r, h−1Vr), where R is a known J × k matrix with rank(R) = J, r is a known J vector, and Vr is aJ × J positive definite matrix. Show that the posterior is given by
β, h|y ∼ NG
10 The linear regression model 111 model in (10.1) can be written as
y= Zγ + ,
. In words, we have transformed the model so that an informative prior is employed forγ2 = Rβ whereas a noninformative prior is employed forγ1. The remainder of the proof is essentially the same as for Exercise 10.1. Assuming a normal-gamma natural conjugate prior forγ and h leads to a normal-gamma posterior. Tak-ing noninformative limitTak-ing cases for the priors forγ1andh and then transforming back to the original parameterization (e.g., usingβ = A−1γ and X = ZA) yields the expressions given in the question.
Exercise 10.3 (Ellipsoid bound theorem) Consider the normal linear regression model with natural conjugate prior,β, h∼ NG
β, Q, s−2, ν
. Show that ifβ = 0 then, for any choice ofQ, the posterior mean β must lie in the ellipsoid:
Whenβ= 0, the formula for the posterior mean β is given in Exercise 10.1 as
β= QXX β, (10.5)
where
Q=
Q−1+ XX−1
. (10.6)
112 10 The linear regression model
Multiplying (10.5) byQ−1(using the expression in Equation 10.6) we obtain
Q−1+ XX
β= XX β, rearranging and multiplying both sides of the equation byβyields
βQ−1β= βXX
But the ellipsoid bound equation given in the question can be rearranged to be identical to (10.7). In particular, expanding the quadratic form on the left-hand side of the ellipsoid bound equation yields
Thus, (10.7) is equivalent to the ellipsoid bound theorem.
You may wish to read Leamer (1982) or Poirier (1995, pp. 526–537) for more details about ellipsoid bound theorems.
*Exercise 10.4 (Problems with Bayes factors using noninformative priors: Leamer [1978, p. 111]) Suppose you have two normal linear regression models:
Mj : y = Xjβj+ j,
wherej= 1, 2, Xjis anN×kjmatrix of explanatory variables,βj is akj vector of regres-sion coefficients, andj is anN vector of errors distributed as N
0N, h−1j IN given in the solution to Exercise 10.1) and the Bayes factor comparingM2 toM1is given by
10 The linear regression model 113
The casek1= k2corresponds to two nonnested models with the same number of explana-tory variables. cancel out in the Bayes factor. Furthermore, under the various assumptions about Q−1
j it
114 10 The linear regression model
In part (b)|Q−1j | = c and hence
|Q−12 |/|Q−11 |
= 1 for all c regardless of what kj is.
Thus, the result in part (b) is established.
In part (c) |Q−1j | = c |XjXj| and hence
|Q−12 |/|Q−11 |
= |X2X2|/|X1X1| for all c regardless of what kj is. Using this result, simplifies the Bayes factor to the expression given in part (c).
Exercise 10.5 (Multicollinearity) Consider the normal linear regression model with nat-ural conjugate prior: N G
β, Q, s−2, ν
. Assume in addition thatXc = 0 for some non-zero vector of constantsc. Note that this is referred to as a case of perfect multicollinearity.
It implies that the matrixX is not of full rank and(XX)−1does not exist.
(a) Show that, despite this pathology, the posterior exists ifQ is positive definite.
(b) Define
α= cQ−1β.
Show that, givenh, the prior and posterior distributions of α are identical and equal to N
cQ−1β, h−1cQ−1c .
Hence, although prior information can be used to surmount the problems caused by perfect multicollinearity, there are some combinations of the regression coefficients about which (conditional) learning does not occur.
Solution
(a) The solution to this question is essentially the same as to Exercise 10.1. The key thing to note is that, sinceQ is positive definite, Q−1 exists and, hence,Q−1 exists despite the fact thatXX is rank deficient. In the same manner, it can be shown that the posterior for β and h is N G
(b) The properties of the normal-gamma distribution imply β|h ∼ N
β, h−1Q . The properties of the normal distribution imply
α|h ≡ cQ−1β|h ∼ N
cQ−1β, h−1cQ−1c , which establishes that the prior has the required form.
10 The linear regression model 115 Using the result from part (a) gives the relevant posterior of the form
β|y, h ∼ N
β, h−1Q , which implies
α|y, h ∼ N
cQ−1β, h−1cQ−1QQ−1c . The mean can be written as
cQ−1β= c
Q−1− XX
Q
Q−1β+ Xy
= cQ−1β+ cXy− cXXQ
Q−1β+ Xy
= cQ−1β, sinceXc= 0.
The variance can be written as
h−1cQ−1QQ−1c= h−1cQ−1
Q−1+ XX−1 Q−1
c
= h−1cQ−1
I−
Q−1+ XX−1 XX
c
= h−1cQ−1c,
sinceXc= 0. Note that this derivation uses a standard theorem in matrix algebra that says, ifG and H are k× k matrices and (G + H)−1exists, then
(G + H)−1H= Ik− (G + H)−1G.
Combining these derivations, we see that the posterior is also α|y, h ∼ N
cQ−1β, h−1cQ−1c , as required,