Hierarchical models - Bayesian Econometric Methods

In this chapter we take up issues relating to estimation in hierarchical models. Many models belong in this class, and indeed, several exercises appearing in later chapters (e.g., mixture models and panel probit models) involve specifications with hierarchical structures that could be included as exercises here. In this chapter, however, we confine our attention to linear specifications and pay particular attention to normal hierarchical linear regression models. In econometrics, these are commonly used with panel (also called longitudinal) data and several of the questions in this chapter use such data. With the exception of the first exercise, all questions involve posterior simulation using Gibbs samplers. Introductory exercises on Gibbs sampling were provided in Chapter 11. Matlab code for all the empirical work is also provided on the Web site associated with this book.

Exercise 12.1 (General results for normal hierarchical linear models) Let y be an N vector and θ₁, θ₂, and θ₃ be parameter vectors of length k₁, k₂, and k₃, respectively. Let X, W, and Z be known N × k₁, k₁ × k₂, and k₂ × k₃ matrices, and let C₁, C₂, and C₃ be N × N, k₁ × k₁, and k₂ × k₂ known positive definite matrices. Assume

$$y \mid \theta_1 \sim N(X\theta_1, C_1), \tag{12.1}$$

$$\theta_1 \mid \theta_2 \sim N(W\theta_2, C_2), \tag{12.2}$$

and

$$\theta_2 \mid \theta_3 \sim N(Z\theta_3, C_3). \tag{12.3}$$

(a) Show that

$$y \mid \theta_2 \sim N\left(XW\theta_2,\; C_1 + XC_2X'\right) \tag{12.4}$$

and thus that this normal linear regression model with hierarchical prior can be written as a different normal linear regression model with nonhierarchical prior.

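(b) Derive p(θ₂|y, θ₃).

(c) Show that

$$\theta_1 \mid y, \theta_3 \sim N(Dd, D),$$

where

$$D^{-1} = X'C_1^{-1}X + \left(C_2 + WC_3W'\right)^{-1}$$

and

$$d = X'C_1^{-1}y + \left(C_2 + WC_3W'\right)^{-1}WZ\theta_3.$$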

Solution

(a) Equation (12.1) is equivalent to

$$y = X\theta_1 + \varepsilon, \quad \text{where } \varepsilon \sim N(0, C_1).$$

Similarly, (12.2) can be written as

$$\theta_1 = W\theta_2 + v, \quad \text{where } v \sim N(0, C_2).$$

Substituting, we obtain

$$y = XW\theta_2 + Xv + \varepsilon.$$

Using the properties of the normal distribution leads immediately to (i) y|θ₂ is normal, (ii) its mean is XWθ₂, and (iii) its covariance is

$$E\left[(Xv + \varepsilon)(Xv + \varepsilon)'\right] = C_1 + XC_2X'$$

(since the assumptions of the question imply that ε and v are independent of one another). Hence, the required result is proven.
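Part (a) can also be checked by simulation. The following is a minimal Python/NumPy sketch (an independent illustration, not the Matlab code that accompanies the book): it fixes θ₂, draws v and ε many times, forms y = XWθ₂ + Xv + ε, and compares the sample mean and covariance of y with XWθ₂ and C₁ + XC₂X′. The dimensions and the matrices X, W, C₁, C₂ are arbitrary illustrative choices.

```python
import numpy as np

# Arbitrary illustrative dimensions and matrices (not from the book).
rng = np.random.default_rng(0)
N, k1, k2 = 4, 3, 2
X = rng.normal(size=(N, k1))
W = rng.normal(size=(k1, k2))
C1 = np.diag([1.0, 2.0, 0.5, 1.5])
C2 = np.array([[1.0, 0.3, 0.0],
               [0.3, 2.0, 0.2],
               [0.0, 0.2, 0.7]])
theta2 = np.array([1.0, -0.5])

# Simulate the two-stage model R times: theta1 = W theta2 + v, y = X theta1 + eps.
R = 200_000
v = rng.multivariate_normal(np.zeros(k1), C2, size=R)     # (R, k1)
eps = rng.multivariate_normal(np.zeros(N), C1, size=R)    # (R, N)
theta1 = W @ theta2 + v                                   # broadcasts to (R, k1)
y = theta1 @ X.T + eps                                    # (R, N)

# Moments implied by (12.4): mean X W theta2 and covariance C1 + X C2 X'.
mean_formula = X @ W @ theta2
cov_formula = C1 + X @ C2 @ X.T
mean_sim = y.mean(axis=0)
cov_sim = np.cov(y, rowvar=False)
print("max |mean error|:", np.max(np.abs(mean_sim - mean_formula)))
print("max |cov error| :", np.max(np.abs(cov_sim - cov_formula)))
```

Both printed discrepancies should be small relative to the entries of the formula covariance and should shrink as the number of draws R grows.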

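(b) By Bayes’ theorem,

$$p(\theta_2 \mid y, \theta_3) \propto p(\theta_2 \mid \theta_3)\, p(y \mid \theta_2, \theta_3).$$

The first term on the right-hand side is simply the normal prior in (12.3), and the second term is the normal likelihood in (12.4) (given θ₂, the distribution of y does not depend on θ₃). The derivation of the resulting posterior is standard (see Exercises 10.1 and 13.1 for related derivations). It can be verified that

$$\theta_2 \mid y, \theta_3 \sim N(Hh, H),$$

where

$$H = \left[W'X'\left(C_1 + XC_2X'\right)^{-1}XW + C_3^{-1}\right]^{-1}$$

and

$$h = W'X'\left(C_1 + XC_2X'\right)^{-1}y + C_3^{-1}Z\theta_3.$$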
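The posterior in part (b) can be cross-checked numerically. In this minimal Python/NumPy sketch, the helper `posterior_theta2` (a hypothetical name) evaluates H = [W′X′(C₁ + XC₂X′)⁻¹XW + C₃⁻¹]⁻¹ and h = W′X′(C₁ + XC₂X′)⁻¹y + C₃⁻¹Zθ₃, and the result is compared with the mean and covariance obtained by conditioning the joint normal distribution of (θ₂, y) directly, using Cov(θ₂, y) = C₃W′X′ and Cov(y) = XWC₃W′X′ + C₁ + XC₂X′. All matrices are arbitrary test values.

```python
import numpy as np

def posterior_theta2(y, X, W, Z, C1, C2, C3, theta3):
    """Mean H h and covariance H of theta2 | y, theta3 (Exercise 12.1(b))."""
    V_inv = np.linalg.inv(C1 + X @ C2 @ X.T)   # inverse covariance of y | theta2, from (12.4)
    C3_inv = np.linalg.inv(C3)
    H = np.linalg.inv(W.T @ X.T @ V_inv @ X @ W + C3_inv)
    h = W.T @ X.T @ V_inv @ y + C3_inv @ Z @ theta3
    return H @ h, H

# Arbitrary test dimensions and matrices.
rng = np.random.default_rng(1)
N, k1, k2, k3 = 5, 3, 2, 2
X = rng.normal(size=(N, k1))
W = rng.normal(size=(k1, k2))
Z = rng.normal(size=(k2, k3))
C1, C2, C3 = np.eye(N), 0.5 * np.eye(k1), 2.0 * np.eye(k2)
theta3 = np.array([0.5, -1.0])
y = rng.normal(size=N)

mean_H, cov_H = posterior_theta2(y, X, W, Z, C1, C2, C3, theta3)

# Cross-check: condition the joint normal of (theta2, y) directly.
A = X @ W
m2 = Z @ theta3                              # prior mean of theta2, from (12.3)
S_yy = A @ C3 @ A.T + C1 + X @ C2 @ X.T      # Cov(y)
S_ty = C3 @ A.T                              # Cov(theta2, y)
mean_J = m2 + S_ty @ np.linalg.solve(S_yy, y - A @ m2)
cov_J = C3 - S_ty @ np.linalg.solve(S_yy, S_ty.T)
print(np.allclose(mean_H, mean_J), np.allclose(cov_H, cov_J))   # True True
```

The two computations agree because of the matrix inversion (Woodbury) identity; the precision-based form used for H is usually the more convenient one inside a Gibbs sampler.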

(c) Again, Bayes’ theorem implies p(θ₁|y, θ₃) ∝ p(θ₁|θ₃) p(y|θ₁, θ₃). The assumptions of the question imply that p(y|θ₁, θ₃) = p(y|θ₁), which is given in (12.1). A proof analogous to that for part (a), with (12.2) and (12.3) used in place of (12.1) and (12.2), establishes that the prior for θ₁ with θ₂ integrated out takes the form

$$\theta_1 \mid \theta_3 \sim N\left(WZ\theta_3,\; C_2 + WC_3W'\right). \tag{12.5}$$

Hence, just as in part (b), we have a normal likelihood (12.1) times a normal prior (12.5).

The derivation of the resulting posterior is standard (see Exercises 2.3 and 10.1 for closely related derivations). It can be verified that the mean and covariance of the normal posterior, p(θ₁|y, θ₃), are as given in the question.
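The same kind of numerical check works for part (c). This minimal Python/NumPy sketch uses a hypothetical helper `posterior_theta1` that evaluates D = [X′C₁⁻¹X + (C₂ + WC₃W′)⁻¹]⁻¹ and d = X′C₁⁻¹y + (C₂ + WC₃W′)⁻¹WZθ₃, and compares Dd and D with the result of conditioning the joint normal of (θ₁, y), where θ₁ has the marginal prior (12.5). The matrices are arbitrary test values.

```python
import numpy as np

def posterior_theta1(y, X, W, Z, C1, C2, C3, theta3):
    """Mean D d and covariance D of theta1 | y, theta3 (Exercise 12.1(c))."""
    P_inv = np.linalg.inv(C2 + W @ C3 @ W.T)   # inverse prior covariance of theta1, from (12.5)
    C1_inv = np.linalg.inv(C1)
    D = np.linalg.inv(X.T @ C1_inv @ X + P_inv)
    d = X.T @ C1_inv @ y + P_inv @ W @ Z @ theta3
    return D @ d, D

# Arbitrary test dimensions and matrices.
rng = np.random.default_rng(3)
N, k1, k2, k3 = 6, 3, 2, 2
X = rng.normal(size=(N, k1))
W = rng.normal(size=(k1, k2))
Z = rng.normal(size=(k2, k3))
C1, C2, C3 = 1.5 * np.eye(N), 0.5 * np.eye(k1), 2.0 * np.eye(k2)
theta3 = np.array([1.0, 0.5])
y = rng.normal(size=N)

mean_D, cov_D = posterior_theta1(y, X, W, Z, C1, C2, C3, theta3)

# Cross-check: condition the joint normal of (theta1, y) directly.
m1 = W @ Z @ theta3                   # prior mean of theta1
P = C2 + W @ C3 @ W.T                 # prior covariance of theta1
S_yy = X @ P @ X.T + C1               # Cov(y)
S_ty = P @ X.T                        # Cov(theta1, y)
mean_J = m1 + S_ty @ np.linalg.solve(S_yy, y - X @ m1)
cov_J = P - S_ty @ np.linalg.solve(S_yy, S_ty.T)
print(np.allclose(mean_D, mean_J), np.allclose(cov_D, cov_J))   # True True
```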

Exercise 12.2 (Hierarchical modeling with longitudinal data) Consider the following longitudinal data model:

$$y_{it} = \alpha_i + \varepsilon_{it}, \qquad \varepsilon_{it} \overset{iid}{\sim} N(0, \sigma^2), \tag{12.6}$$

$$\alpha_i \overset{iid}{\sim} N(\alpha, \sigma_\alpha^2), \tag{12.7}$$

where y_it refers to the outcomes for individual (or, more generally, group) i at time t and α_i is a person-specific random effect. We assume i = 1, 2, . . . , N and t = 1, 2, . . . , T (i.e., a balanced panel).

(a) Comment on how the presence of the random effects accounts for correlation patterns within individuals over time.

(b) Derive the conditional posterior distribution p(α_i|α, σ², σ²_α, y).

(c) Obtain the mean of the conditional posterior distribution in (b). Comment on its relationship to a shrinkage estimator [see, e.g., Poirier (1995), Chapter 6]. How does the mean change as T and σ²/σ²_α change?

Solution

(a) Conditional on the random effects α_i, i = 1, 2, . . . , N, the y_it are independent. However, marginalized over the random effects, outcomes are correlated within individuals over time. To see this, note that we can write (12.6) equivalently as

$$y_{it} = \alpha + u_i + \varepsilon_{it},$$

where we have rewritten (12.7) as

$$\alpha_i = \alpha + u_i, \qquad u_i \overset{iid}{\sim} N(0, \sigma_\alpha^2),$$

and substituted this result into (12.6). Thus, for t ≠ s,

$$\operatorname{Cov}(y_{it}, y_{is} \mid \alpha, \sigma^2, \sigma_\alpha^2) = \operatorname{Cov}(u_i + \varepsilon_{it}, u_i + \varepsilon_{is}) = \operatorname{Cov}(u_i, u_i) = \operatorname{Var}(u_i) = \sigma_\alpha^2,$$

so that outcomes are correlated over time within individuals. However, the random effects do not permit any correlation between the outcomes of different individuals.
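The covariance structure derived in part (a) can be illustrated by simulating (12.6) and (12.7). This minimal Python/NumPy sketch uses hypothetical values α = 1, σ² = 1, σ²_α = 0.5. The sample covariance between two periods for the same individual should be close to σ²_α, the implied correlation close to σ²_α/(σ²_α + σ²) = 1/3, and the covariance between outcomes of different individuals close to zero.

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
rng = np.random.default_rng(2)
N, T = 50_000, 4
alpha, sigma2, sigma2_alpha = 1.0, 1.0, 0.5

# Simulate (12.6)-(12.7) written as y_it = alpha + u_i + eps_it.
u = rng.normal(0.0, np.sqrt(sigma2_alpha), size=N)
eps = rng.normal(0.0, np.sqrt(sigma2), size=(N, T))
y = alpha + u[:, None] + eps                        # shape (N, T)

# Same individual, periods 1 and 2: covariance should be near sigma2_alpha.
within = np.cov(y[:, 0], y[:, 1])[0, 1]
within_corr = np.corrcoef(y[:, 0], y[:, 1])[0, 1]   # near sigma2_alpha / (sigma2_alpha + sigma2)
# Different individuals (i paired with i + 1), same period: covariance should be near 0.
between = np.cov(y[:-1, 0], y[1:, 0])[0, 1]
print(f"within cov {within:.3f}, within corr {within_corr:.3f}, between cov {between:.3f}")
```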

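(b) Since the α_i are conditionally independent across individuals, only the T observations on individual i and the prior (12.7) involve α_i:

$$p(\alpha_i \mid \alpha, \sigma^2, \sigma_\alpha^2, y) \propto \exp\left[-\frac{1}{2\sigma^2}\sum_{t=1}^{T}(y_{it} - \alpha_i)^2\right] \exp\left[-\frac{1}{2\sigma_\alpha^2}(\alpha_i - \alpha)^2\right].$$

Completing the square in α_i gives

$$\alpha_i \mid \alpha, \sigma^2, \sigma_\alpha^2, y \sim N\left(\frac{T\sigma_\alpha^2\,\bar{y}_i + \sigma^2\alpha}{T\sigma_\alpha^2 + \sigma^2},\; \frac{\sigma^2\sigma_\alpha^2}{T\sigma_\alpha^2 + \sigma^2}\right), \qquad \bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}.$$

(c) The mean of this conditional posterior distribution is easily obtained from our solution in (b):

$$E(\alpha_i \mid \alpha, \sigma^2, \sigma_\alpha^2, y) = w\,\bar{y}_i + (1 - w)\,\alpha, \qquad w \equiv \frac{T}{T + \sigma^2/\sigma_\alpha^2}.$$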

This is in the form of a shrinkage estimator, where the conditional posterior mean of α_i is a weighted average (with w serving as the weight) of the averaged outcomes for individual i, ȳ_i, and the common mean for all individuals, α. As T → ∞ (holding all else constant), w → 1, so the conditional posterior mean approaches the average of the individual's outcomes. Intuitively, this makes sense: as T → ∞ we acquire more and more information on individual i, so the data from individual i dominate any information provided by the outcomes of other individuals in the sample. As σ²/σ²_α → ∞, w → 0, so the posterior mean collapses to the common mean α. In this case the error variability in the outcome equation grows relative to the variability in the random effects equation, and thus the posterior mean reduces to the common effect for all individuals, α.
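The shrinkage behavior can be made concrete with a short function. This minimal Python/NumPy sketch (the function name `conditional_posterior_alpha_i` is hypothetical) returns the conditional posterior mean wȳ_i + (1 − w)α, the variance σ²σ²_α/(Tσ²_α + σ²), and the weight w = T/(T + σ²/σ²_α) for one individual; the outcome values are made up for illustration.

```python
import numpy as np

def conditional_posterior_alpha_i(y_i, alpha, sigma2, sigma2_alpha):
    """Mean, variance and shrinkage weight of alpha_i | alpha, sigma2, sigma2_alpha, y."""
    T = len(y_i)
    w = T / (T + sigma2 / sigma2_alpha)              # weight on the individual's own average
    mean = w * np.mean(y_i) + (1.0 - w) * alpha      # shrinks ybar_i toward the common mean alpha
    var = sigma2 * sigma2_alpha / (T * sigma2_alpha + sigma2)
    return mean, var, w

# Hypothetical outcomes for one individual (average 3) and common mean alpha = 0.
y_i = np.array([2.0, 3.0, 4.0, 3.0])
for reps in (1, 10, 100):                            # T = 4, 40, 400
    m, v, w = conditional_posterior_alpha_i(np.tile(y_i, reps), 0.0, 1.0, 1.0)
    print(f"T={4 * reps:4d}  w={w:.3f}  mean={m:.3f}")
for ratio in (1.0, 10.0, 100.0):                     # sigma2 / sigma2_alpha
    m, v, w = conditional_posterior_alpha_i(y_i, 0.0, ratio, 1.0)
    print(f"sigma2/sigma2_alpha={ratio:6.1f}  w={w:.3f}  mean={m:.3f}")
```

Repeating the same outcomes to increase T pushes w toward 1 and the mean toward ȳ_i = 3, while raising σ²/σ²_α pushes w toward 0 and the mean toward α = 0.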

Exercise 12.3 (A posterior simulator for a linear hierarchical model: Gelfand et al. [1990]) We illustrate Bayesian procedures for estimating hierarchical linear models using a data set that has become something of a “classic” in the MCMC literature. These data come from the study of Gelfand, Hills, Racine-Poon, and Smith (1990). In this exercise, we derive in full detail the complete posterior conditionals. In the following exercise we fit this model and provide some diagnostic checks for convergence. In subsequent examples, complete derivations of the conditionals will not typically be provided, as these derivations will often follow similarly to those described in this exercise.

In the rat growth model of Gelfand et al. (1990), 30 different rats are weighed at five different points in time. We denote the weight of rat i at measurement j as y_ij and let x_ij denote the age of the ith rat at the jth measurement. Since each rat was weighed at exactly the same number of days since birth, we have

$$x_{i1} = 8, \quad x_{i2} = 15, \quad x_{i3} = 22, \quad x_{i4} = 29, \quad x_{i5} = 36 \qquad \forall i.$$

The data used in the analysis are provided in Table 12.1.

Table 12.1: Rat growth data from Gelfand et al. (1990).

Weight measurements (columns y_i1, . . . , y_i5 correspond to ages 8, 15, 22, 29, and 36 days)

Rat i   y_i1   y_i2   y_i3   y_i4   y_i5
1       151    199    246    283    320
2       145    199    249    293    354
3       147    214    263    312    328
4       155    200    237    272    297
5       135    188    230    280    323
6       159    210    252    298    331
7       141    189    231    275    305
8       159    201    248    297    338
9       177    236    285    340    376
10      134    182    220    260    296
11      160    208    261    313    352
12      143    188    220    273    314
13      154    200    244    289    325
14      171    221    270    326    358
15      163    216    242    281    312
16      160    207    248    288    324
17      142    187    234    280    316
18      156    203    243    283    317
19      157    212    259    307    336
20      152    203    246    286    321
21      154    205    253    298    334
22      139    190    225    267    302
23      146    191    229    272    302
24      157    211    250    285    323
25      132    185    237    286    331
26      160    207    257    303    345
27      169    216    261    295    333
28      157    205    248    289    316
29      137    180    219    258    291
30      153    200    244    286    324
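For computation, Table 12.1 can be stored as a 30 × 5 array together with a common 5 × 2 design matrix whose columns are a constant and the ages 8, 15, 22, 29, 36. The minimal Python/NumPy sketch below does this and, purely as a descriptive check, computes a separate least-squares intercept and slope for each rat; these per-rat fits are an illustration only and are not part of the Bayesian analysis in this exercise.

```python
import numpy as np

# Table 12.1: rows are rats i = 1..30, columns are weights at ages 8, 15, 22, 29, 36 days.
Y = np.array([
    [151, 199, 246, 283, 320], [145, 199, 249, 293, 354], [147, 214, 263, 312, 328],
    [155, 200, 237, 272, 297], [135, 188, 230, 280, 323], [159, 210, 252, 298, 331],
    [141, 189, 231, 275, 305], [159, 201, 248, 297, 338], [177, 236, 285, 340, 376],
    [134, 182, 220, 260, 296], [160, 208, 261, 313, 352], [143, 188, 220, 273, 314],
    [154, 200, 244, 289, 325], [171, 221, 270, 326, 358], [163, 216, 242, 281, 312],
    [160, 207, 248, 288, 324], [142, 187, 234, 280, 316], [156, 203, 243, 283, 317],
    [157, 212, 259, 307, 336], [152, 203, 246, 286, 321], [154, 205, 253, 298, 334],
    [139, 190, 225, 267, 302], [146, 191, 229, 272, 302], [157, 211, 250, 285, 323],
    [132, 185, 237, 286, 331], [160, 207, 257, 303, 345], [169, 216, 261, 295, 333],
    [157, 205, 248, 289, 316], [137, 180, 219, 258, 291], [153, 200, 244, 286, 324],
], dtype=float)
ages = np.array([8.0, 15.0, 22.0, 29.0, 36.0])
X_i = np.column_stack([np.ones(5), ages])            # 5 x 2 design, identical for every rat

# Descriptive per-rat least-squares intercepts and slopes (one column of Y.T per rat).
coef = np.linalg.lstsq(X_i, Y.T, rcond=None)[0].T    # shape (30, 2)
print("average intercept and slope:", coef.mean(axis=0))
```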

In our model, we want to permit unit-specific variation in initial birth weight and growth rates. This leads us to specify the following model:

$$y_{ij} \mid \alpha_i, \beta_i, \sigma^2, x_{ij} \overset{ind}{\sim} N(\alpha_i + \beta_i x_{ij}, \sigma^2), \qquad i = 1, 2, \ldots, 30, \quad j = 1, 2, \ldots, 5,$$

so that each rat possesses its own intercept α_i and growth rate β_i.


We also assume that the rats share some degree of “commonality” in their weight at birth and rates of growth, and thus model the intercept and slope parameters as draws from the same normal population:

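$$\theta_i = \begin{bmatrix} \alpha_i \\ \beta_i \end{bmatrix} \overset{iid}{\sim} N(\theta_0, \Sigma), \qquad i = 1, 2, \ldots, 30.$$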

We complete our Bayesian analysis by specifying the following priors:

$$\sigma^2 \mid a, b \sim IG(a, b), \qquad \theta_0 \mid \eta, C \sim N(\eta, C), \qquad \Sigma^{-1} \mid \rho, R \sim W\left([\rho R]^{-1}, \rho\right),$$

with W denoting the Wishart distribution (see Appendix Definition 6 for details).

Derive the complete conditionals for this model and comment on any intuition behind the forms of these conditionals.

Solution

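Given the assumed conditional independence across observations, the joint posterior distribution for all the parameters of this model can be written as

$$p(\Gamma \mid y) \propto \left[\prod_{i=1}^{30} p(y_i \mid X_i, \theta_i, \sigma^2)\, p(\theta_i \mid \theta_0, \Sigma)\right] p(\theta_0 \mid \eta, C)\, p(\sigma^2 \mid a, b)\, p(\Sigma^{-1} \mid \rho, R),$$

where Γ ≡ [{θ_i}, θ₀, Σ⁻¹, σ²] denotes all the parameters of the model, and we use the notation Γ_−x to denote all parameters other than x. We have stacked the observations over time for each individual rat so that

$$y_i = \begin{bmatrix} y_{i1} \\ y_{i2} \\ \vdots \\ y_{i5} \end{bmatrix}, \qquad X_i = \begin{bmatrix} 1 & x_{i1} \\ 1 & x_{i2} \\ \vdots & \vdots \\ 1 & x_{i5} \end{bmatrix}.$$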

Complete posterior conditional for θ_i

We first note that this complete conditional is proportional to the aforementioned joint posterior. Thus, all of the terms in the product that do not involve θ_i are absorbed into the normalizing constant of this conditional. Hence,

$$p(\theta_i \mid \Gamma_{-\theta_i}, y) \propto p(y_i \mid X_i, \theta_i, \sigma^2)\, p(\theta_i \mid \theta_0, \Sigma).$$

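This fits directly into the framework of Exercise 12.1 [see also Lindley and Smith (1972)], and thus we find

$$\theta_i \mid \Gamma_{-\theta_i}, y \sim N\left(D_{\theta_i} d_{\theta_i},\, D_{\theta_i}\right),$$

where

$$D_{\theta_i} = \left(X_i'X_i/\sigma^2 + \Sigma^{-1}\right)^{-1}, \qquad d_{\theta_i} = X_i'y_i/\sigma^2 + \Sigma^{-1}\theta_0.$$

Because the θ_i are conditionally independent of one another given the remaining parameters, we can draw each θ_i in turn by sampling from the corresponding complete conditional.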
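A single θ_i step of the Gibbs sampler can be sketched in Python/NumPy as follows. The sketch assumes the complete conditional has the normal form N(Dd, D) with D = (X_i′X_i/σ² + Σ⁻¹)⁻¹ and d = X_i′y_i/σ² + Σ⁻¹θ₀, which is what the results of Exercise 12.1 give with C₁ = σ²I₅ and prior N(θ₀, Σ). The function names and the current values of σ², θ₀, and Σ in the usage lines are hypothetical.

```python
import numpy as np

def theta_i_conditional(y_i, X_i, sigma2, theta0, Sigma_inv):
    """Mean D d and covariance D of theta_i | rest, assuming
    D = (X_i'X_i / sigma2 + Sigma^{-1})^{-1} and d = X_i'y_i / sigma2 + Sigma^{-1} theta0."""
    D = np.linalg.inv(X_i.T @ X_i / sigma2 + Sigma_inv)
    D = 0.5 * (D + D.T)                          # symmetrize against rounding error
    d = X_i.T @ y_i / sigma2 + Sigma_inv @ theta0
    return D @ d, D

def draw_theta_i(y_i, X_i, sigma2, theta0, Sigma_inv, rng):
    """One Gibbs draw of theta_i = (alpha_i, beta_i)'."""
    mean, cov = theta_i_conditional(y_i, X_i, sigma2, theta0, Sigma_inv)
    return rng.multivariate_normal(mean, cov)

# Rat 1 from Table 12.1; sigma2, theta0 and Sigma below are hypothetical current values.
y_1 = np.array([151.0, 199.0, 246.0, 283.0, 320.0])
X_1 = np.column_stack([np.ones(5), [8.0, 15.0, 22.0, 29.0, 36.0]])
theta0 = np.array([106.0, 6.2])
Sigma_inv = np.linalg.inv(np.diag([100.0, 0.25]))
rng = np.random.default_rng(4)
draw = draw_theta_i(y_1, X_1, 36.0, theta0, Sigma_inv, rng)
print(draw)
```

With a very diffuse Σ (Σ⁻¹ near zero), the conditional mean approaches the least-squares fit for that rat; with a very tight Σ, it approaches θ₀. Looping this step over i = 1, . . . , 30 gives the θ_i block of a sweep.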

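Complete posterior conditional for θ₀

For θ₀, a similar argument shows

$$p(\theta_0 \mid \Gamma_{-\theta_0}, y) \propto \left[\prod_{i=1}^{30} p(\theta_i \mid \theta_0, \Sigma)\right] p(\theta_0 \mid \eta, C).$$

In this form, the results of Exercise 12.1 again apply, with

$$\theta_0 \mid \Gamma_{-\theta_0}, y \sim N\left(D_{\theta_0} d_{\theta_0},\, D_{\theta_0}\right),$$

where

$$D_{\theta_0} = \left(30\,\Sigma^{-1} + C^{-1}\right)^{-1}, \qquad d_{\theta_0} = 30\,\Sigma^{-1}\bar{\theta} + C^{-1}\eta, \qquad \bar{\theta} = \frac{1}{30}\sum_{i=1}^{30}\theta_i.$$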
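The θ₀ step can be sketched the same way. This minimal Python/NumPy sketch assumes the conditional N(Dd, D) with D = (nΣ⁻¹ + C⁻¹)⁻¹ and d = Σ⁻¹∑θ_i + C⁻¹η for n = 30 rats (note Σ⁻¹∑θ_i = nΣ⁻¹θ̄). The function names, the current θ_i values, and the hyperparameters η and C are hypothetical.

```python
import numpy as np

def theta0_conditional(thetas, Sigma_inv, eta, C_inv):
    """Mean D d and covariance D of theta0 | rest, assuming
    D = (n Sigma^{-1} + C^{-1})^{-1} and d = Sigma^{-1} sum_i theta_i + C^{-1} eta."""
    n = thetas.shape[0]
    D = np.linalg.inv(n * Sigma_inv + C_inv)
    D = 0.5 * (D + D.T)                          # symmetrize against rounding error
    d = Sigma_inv @ thetas.sum(axis=0) + C_inv @ eta
    return D @ d, D

def draw_theta0(thetas, Sigma_inv, eta, C_inv, rng):
    """One Gibbs draw of the population mean theta0."""
    mean, cov = theta0_conditional(thetas, Sigma_inv, eta, C_inv)
    return rng.multivariate_normal(mean, cov)

# Hypothetical current (alpha_i, beta_i) values for 30 rats and a diffuse prior for theta0.
rng = np.random.default_rng(5)
thetas = np.column_stack([rng.normal(106.0, 10.0, 30), rng.normal(6.2, 0.5, 30)])
Sigma_inv = np.linalg.inv(np.diag([100.0, 0.25]))
eta, C_inv = np.zeros(2), np.linalg.inv(np.diag([1e6, 1e6]))
print(draw_theta0(thetas, Sigma_inv, eta, C_inv, rng))
```

As C⁻¹ → 0 the conditional mean tends to the average θ̄ of the current θ_i and the covariance to Σ/30.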

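Complete posterior conditional for σ²

For σ², we find

$$p(\sigma^2 \mid \Gamma_{-\sigma^2}, y) \propto p(\sigma^2 \mid a, b) \prod_{i=1}^{30} p(y_i \mid X_i, \theta_i, \sigma^2),$$

so that, since there are 30 × 5 = 150 observations in total,

$$\sigma^2 \mid \Gamma_{-\sigma^2}, y \sim IG\left(75 + a,\; \left[b^{-1} + \tfrac{1}{2}\sum_{i=1}^{30}(y_i - X_i\theta_i)'(y_i - X_i\theta_i)\right]^{-1}\right).$$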
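A σ² step can be sketched in Python/NumPy, but it depends on how the inverted gamma is parameterized. This sketch assumes IG(a, b) has density proportional to x^(−(a+1)) exp[−1/(bx)], so that 1/σ² follows a gamma distribution with shape a and scale b. Under that assumption the conditional has shape a + n/2 (with n = 150 observations) and scale [b⁻¹ + ½∑(y_i − X_iθ_i)′(y_i − X_iθ_i)]⁻¹; under a different convention the scale argument would need adjusting. Function names, hyperparameters, and the synthetic data are hypothetical.

```python
import numpy as np

def sigma2_conditional_params(Y, X_i, thetas, a, b):
    """Shape and scale of the IG conditional for sigma2, ASSUMING IG(a, b) has
    density proportional to x^{-(a+1)} exp(-1/(b x))."""
    resid = Y - thetas @ X_i.T                    # y_ij - alpha_i - beta_i x_ij, shape (30, 5)
    shape = a + resid.size / 2.0
    scale = 1.0 / (1.0 / b + 0.5 * np.sum(resid ** 2))
    return shape, scale

def draw_sigma2(Y, X_i, thetas, a, b, rng):
    """One Gibbs draw of sigma2: under the assumed convention 1/sigma2 ~ Gamma(shape, scale)."""
    shape, scale = sigma2_conditional_params(Y, X_i, thetas, a, b)
    return 1.0 / rng.gamma(shape, scale)

# Synthetic data from hypothetical current values (true sigma2 = 36).
rng = np.random.default_rng(6)
X_i = np.column_stack([np.ones(5), [8.0, 15.0, 22.0, 29.0, 36.0]])
thetas = np.tile([106.0, 6.2], (30, 1))
Y = thetas @ X_i.T + rng.normal(0.0, 6.0, size=(30, 5))
draws = np.array([draw_sigma2(Y, X_i, thetas, 3.0, 1.0 / 40.0, rng) for _ in range(5000)])
print("mean of sigma2 draws:", draws.mean())
```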

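Complete posterior conditional for Σ⁻¹

We find

$$p(\Sigma^{-1} \mid \Gamma_{-\Sigma^{-1}}, y) \propto p(\Sigma^{-1} \mid \rho, R) \prod_{i=1}^{30} p(\theta_i \mid \theta_0, \Sigma),$$

so that

$$\Sigma^{-1} \mid \Gamma_{-\Sigma^{-1}}, y \sim W\left(\left[\sum_{i=1}^{30}(\theta_i - \theta_0)(\theta_i - \theta_0)' + \rho R\right]^{-1},\; 30 + \rho\right).$$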
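NumPy's random `Generator` has no Wishart sampler (SciPy provides `scipy.stats.wishart`), so this minimal Python/NumPy sketch draws W(S, ν) with the Bartlett decomposition. It assumes the convention in which W(S, ν) has mean νS, the one under which the prior W([ρR]⁻¹, ρ) has mean R⁻¹, and it takes the Σ⁻¹ conditional to be W([∑(θ_i − θ₀)(θ_i − θ₀)′ + ρR]⁻¹, 30 + ρ). Function names, the current θ_i and θ₀ values, and ρ and R are hypothetical.

```python
import numpy as np

def draw_wishart(scale, df, rng):
    """Draw from a Wishart W(scale, df) with E[W] = df * scale, via the Bartlett decomposition."""
    p = scale.shape[0]
    L = np.linalg.cholesky(scale)
    A = np.zeros((p, p))
    for i in range(p):
        A[i, i] = np.sqrt(rng.chisquare(df - i))   # chi-square with df - i degrees of freedom
        A[i, :i] = rng.standard_normal(i)           # standard normals below the diagonal
    LA = L @ A
    return LA @ LA.T

def draw_Sigma_inv(thetas, theta0, rho, R, rng):
    """Draw Sigma^{-1} | rest ~ W([sum_i (theta_i - theta0)(theta_i - theta0)' + rho R]^{-1}, n + rho)."""
    dev = thetas - theta0                           # rows are (theta_i - theta0)'
    scale = np.linalg.inv(dev.T @ dev + rho * R)
    scale = 0.5 * (scale + scale.T)                 # symmetrize before the Cholesky factorization
    return draw_wishart(scale, thetas.shape[0] + rho, rng)

# Hypothetical current values and prior hyperparameters.
rng = np.random.default_rng(7)
thetas = np.column_stack([rng.normal(106.0, 10.0, 30), rng.normal(6.2, 0.5, 30)])
theta0 = np.array([106.0, 6.2])
rho, R = 2.0, np.diag([100.0, 0.1])
Sigma_inv_draw = draw_Sigma_inv(thetas, theta0, rho, R, rng)
print(np.linalg.inv(Sigma_inv_draw))                # implied draw of Sigma
```

The Bartlett construction works for non-integer degrees of freedom as long as ν is at least the dimension p, which always holds here because ν = 30 + ρ.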
