In this chapter we take up issues relating to estimation in hierarchical models. Many models belong in this class, and indeed, several exercises appearing in later chapters (e.g., mixture models and panel probit models) involve specifications with hierarchical structures that could be included as exercises here. In this chapter, however, we confine our attention to linear specifications and pay particular attention to normal hierarchical linear regression models. In econometrics, these are commonly used with panel (also called longitudinal) data and several of the questions in this chapter use such data. With the exception of the first exercise, all questions involve posterior simulation using Gibbs samplers. Introductory exercises on Gibbs sampling were provided in Chapter 11. Matlab code for all the empirical work is also provided on the Web site associated with this book.
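Exercise 12.1 (General results for normal hierarchical linear models) Let $y$ be an $N$-vector and let $\theta_1$, $\theta_2$, and $\theta_3$ be parameter vectors of lengths $k_1$, $k_2$, and $k_3$, respectively. Let $X$, $W$, and $Z$ be known $N\times k_1$, $k_1\times k_2$, and $k_2\times k_3$ matrices, and let $C_1$, $C_2$, and $C_3$ be known positive definite matrices of dimensions $N\times N$, $k_1\times k_1$, and $k_2\times k_2$, respectively. Assume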
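$$y\mid\theta_1 \sim N\left(X\theta_1,\, C_1\right), \tag{12.1}$$
$$\theta_1\mid\theta_2 \sim N\left(W\theta_2,\, C_2\right), \tag{12.2}$$
and
$$\theta_2\mid\theta_3 \sim N\left(Z\theta_3,\, C_3\right). \tag{12.3}$$
We make conditional independence assumptions such that $p(y\mid\theta_1,\theta_2) = p(y\mid\theta_1)$ and $p(\theta_1\mid\theta_2,\theta_3) = p(\theta_1\mid\theta_2)$.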
170 12 Hierarchical models (a) Show that
$$y\mid\theta_2 \sim N\left(XW\theta_2,\; C_1 + XC_2X'\right) \tag{12.4}$$
and thus that this normal linear regression model with hierarchical prior can be written as a different normal linear regression model with a nonhierarchical prior.
(b) Derive $p(\theta_2\mid y,\theta_3)$.
(c) Show that
$$\theta_1\mid y,\theta_3 \sim N(Dd,\, D),$$
where
$$D^{-1} = X'C_1^{-1}X + \left(C_2 + WC_3W'\right)^{-1}$$
and
$$d = X'C_1^{-1}y + \left(C_2 + WC_3W'\right)^{-1}WZ\theta_3.$$
Solution
(a) Equation (12.1) is equivalent to
$$y = X\theta_1 + \varepsilon,$$
where $\varepsilon \sim N(0, C_1)$. Similarly, (12.2) can be written as
$$\theta_1 = W\theta_2 + v,$$
where $v \sim N(0, C_2)$. Substituting, we obtain
$$y = XW\theta_2 + Xv + \varepsilon.$$
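Since $Xv + \varepsilon$ is a linear combination of mean-zero normal random vectors, the properties of the normal distribution lead immediately to the conclusions that (i) $y\mid\theta_2$ is normal, (ii) its mean is $XW\theta_2$, and (iii) its covariance is
$$E\left[(Xv + \varepsilon)(Xv + \varepsilon)'\right] = C_1 + XC_2X',$$
since the assumptions of the question imply that $\varepsilon$ and $v$ are independent of one another, so the cross-product terms have zero expectation. Hence, the required result is proven.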
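Result (12.4) can also be seen by simulation. The NumPy sketch below uses arbitrary illustrative matrices and an arbitrary value of $\theta_2$ (none of these come from the text): it draws $\theta_1$ from (12.2), then $y$ from (12.1), and compares the Monte Carlo mean and covariance of $y$ with $XW\theta_2$ and $C_1 + XC_2X'$.

```python
import numpy as np

# Illustrative inputs (assumptions, not from the text): N = 3, k1 = 2, k2 = 2.
rng = np.random.default_rng(0)
X = np.array([[1.0, 0.5], [0.0, 1.0], [1.0, -1.0]])   # N x k1
W = np.array([[1.0, 0.0], [0.5, 1.0]])                # k1 x k2
C1 = np.diag([1.0, 0.5, 2.0])                         # N x N, positive definite
C2 = np.array([[1.0, 0.3], [0.3, 0.8]])               # k1 x k1, positive definite
theta2 = np.array([1.0, -2.0])                        # value being conditioned on

n_draws = 200_000
# theta1 | theta2 ~ N(W theta2, C2), as in (12.2)
theta1 = rng.multivariate_normal(W @ theta2, C2, size=n_draws)
# y | theta1 ~ N(X theta1, C1), as in (12.1)
y = theta1 @ X.T + rng.multivariate_normal(np.zeros(3), C1, size=n_draws)

mc_mean, mc_cov = y.mean(axis=0), np.cov(y, rowvar=False)
exact_mean = X @ W @ theta2          # X W theta2
exact_cov = C1 + X @ C2 @ X.T        # C1 + X C2 X'
print(np.round(mc_mean - exact_mean, 3))
print(np.round(mc_cov - exact_cov, 3))
```

Both printed differences should be close to zero, up to Monte Carlo error.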
(b) By Bayes’ theorem,
$$p(\theta_2\mid y,\theta_3) \propto p(\theta_2\mid\theta_3)\,p(y\mid\theta_2,\theta_3).$$
The first term on the right-hand side is simply the normal prior in (12.3) and the second term is the normal likelihood in (12.4). The derivation of the resulting posterior is standard (see Exercises 10.1 and 13.1 for related derivations). It can be verified that
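$$\theta_2\mid y,\theta_3 \sim N(Hh,\, H),$$
where
$$H = \left[W'X'\left(C_1 + XC_2X'\right)^{-1}XW + C_3^{-1}\right]^{-1}$$
and
$$h = W'X'\left(C_1 + XC_2X'\right)^{-1}y + C_3^{-1}Z\theta_3.$$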
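As a numerical sanity check on these formulas, the sketch below draws random illustrative inputs (the dimensions and the helper `random_pd` are our own assumptions), computes $Hh$ and $H$, and compares them with the mean and covariance obtained by directly conditioning the joint normal distribution of $(\theta_2, y)$ given $\theta_3$. The two routes should agree to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k1, k2, k3 = 6, 3, 2, 2          # illustrative dimensions
inv = np.linalg.inv

def random_pd(k):
    """Random k x k positive definite matrix (hypothetical helper)."""
    A = rng.normal(size=(k, k))
    return A @ A.T + k * np.eye(k)

X, W, Z = rng.normal(size=(N, k1)), rng.normal(size=(k1, k2)), rng.normal(size=(k2, k3))
C1, C2, C3 = random_pd(N), random_pd(k1), random_pd(k2)
theta3, y = rng.normal(size=k3), rng.normal(size=N)

# Closed form from part (b): theta2 | y, theta3 ~ N(H h, H)
V_inv = inv(C1 + X @ C2 @ X.T)                  # (C1 + X C2 X')^{-1}
H = inv(W.T @ X.T @ V_inv @ X @ W + inv(C3))
h = W.T @ X.T @ V_inv @ y + inv(C3) @ Z @ theta3

# Direct route: condition the joint normal of (theta2, y) given theta3
A = X @ W                                        # y = A theta2 + noise, Var(noise) = C1 + X C2 X'
S_yy = A @ C3 @ A.T + C1 + X @ C2 @ X.T          # Var(y | theta3)
S_ty = C3 @ A.T                                  # Cov(theta2, y | theta3)
gain = S_ty @ inv(S_yy)
cond_mean = Z @ theta3 + gain @ (y - A @ Z @ theta3)
cond_cov = C3 - gain @ S_ty.T

print(np.allclose(H @ h, cond_mean), np.allclose(H, cond_cov))
```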
(c) Again, Bayes’ theorem implies $p(\theta_1\mid y,\theta_3) \propto p(\theta_1\mid\theta_3)\,p(y\mid\theta_1,\theta_3)$. The assumptions of the question imply that $p(y\mid\theta_1,\theta_3)$ is given by (12.1). A proof analogous to that of part (a), but using (12.2) and (12.3) in place of (12.1) and (12.2), establishes that the prior for $\theta_1$ with $\theta_2$ integrated out takes the form
$$\theta_1\mid\theta_3 \sim N\left(WZ\theta_3,\; C_2 + WC_3W'\right). \tag{12.5}$$
Hence, just as in part (b), we have a normal likelihood (12.1) times a normal prior (12.5).
The derivation of the resulting posterior is standard (see Exercises 2.3 and 10.1 for closely related derivations). It can be verified that the mean and covariance of the normal posterior, p(θ1|y, θ3), are as given in the question.
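The result in part (c) can be checked by a different route: form the joint normal posterior of $(\theta_1,\theta_2)$ given $(y,\theta_3)$ directly from (12.1) to (12.3), then read off the $\theta_1$ block, which is exactly what integrating $\theta_2$ out does for a normal distribution. The sketch below uses random illustrative inputs (the helper `random_pd` is hypothetical) and compares that block with $Dd$ and $D$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, k1, k2, k3 = 6, 3, 2, 2          # illustrative dimensions
inv = np.linalg.inv

def random_pd(k):
    """Random k x k positive definite matrix (hypothetical helper)."""
    A = rng.normal(size=(k, k))
    return A @ A.T + k * np.eye(k)

X, W, Z = rng.normal(size=(N, k1)), rng.normal(size=(k1, k2)), rng.normal(size=(k2, k3))
C1, C2, C3 = random_pd(N), random_pd(k1), random_pd(k2)
theta3, y = rng.normal(size=k3), rng.normal(size=N)

# Closed form from part (c): theta1 | y, theta3 ~ N(D d, D)
P = C2 + W @ C3 @ W.T                            # prior covariance in (12.5)
D = inv(X.T @ inv(C1) @ X + inv(P))
d = X.T @ inv(C1) @ y + inv(P) @ W @ Z @ theta3

# Joint posterior precision Q and linear term b for (theta1, theta2) given (y, theta3)
C1i, C2i, C3i = inv(C1), inv(C2), inv(C3)
Q = np.block([
    [X.T @ C1i @ X + C2i, -C2i @ W],
    [-W.T @ C2i,          W.T @ C2i @ W + C3i],
])
b = np.concatenate([X.T @ C1i @ y, C3i @ Z @ theta3])
joint_cov = inv(Q)
joint_mean = joint_cov @ b

# The theta1 block of the joint posterior is the marginal posterior of theta1
print(np.allclose(D @ d, joint_mean[:k1]), np.allclose(D, joint_cov[:k1, :k1]))
```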
Exercise 12.2 (Hierarchical modeling with longitudinal data) Consider the following longitudinal data model:
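$$y_{it} = \alpha_i + \varepsilon_{it}, \qquad \varepsilon_{it} \overset{iid}{\sim} N(0,\sigma^2), \tag{12.6}$$
$$\alpha_i \overset{iid}{\sim} N(\alpha,\sigma^2_\alpha), \tag{12.7}$$
where $y_{it}$ refers to the outcome for individual (or, more generally, group) $i$ at time $t$ and $\alpha_i$ is a person-specific random effect. We assume $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$ (i.e., a balanced panel).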
(a) Comment on how the presence of the random effects accounts for correlation patterns within individuals over time.
(b) Derive the conditional posterior distribution $p(\alpha_i\mid\alpha,\sigma^2,\sigma^2_\alpha,y)$.
(c) Obtain the mean of the conditional posterior distribution in (b). Comment on its relationship to a shrinkage estimator [see, e.g., Poirier (1995), Chapter 6]. How does the mean change as $T$ and $\sigma^2/\sigma^2_\alpha$ change?
Solution
(a) Conditional on the random effects $\{\alpha_i\}_{i=1}^N$, the $y_{it}$ are independent. However, marginalized over the random effects, outcomes are correlated within individuals over time. To see this, rewrite (12.7) as
$$\alpha_i = \alpha + u_i, \qquad u_i \overset{iid}{\sim} N(0,\sigma^2_\alpha),$$
and substitute this result into (12.6) to obtain the equivalent form
$$y_{it} = \alpha + u_i + \varepsilon_{it}.$$
Thus, for $t \neq s$,
$$\mathrm{Cov}(y_{it}, y_{is}\mid\alpha,\sigma^2,\sigma^2_\alpha) = \mathrm{Cov}(u_i + \varepsilon_{it},\, u_i + \varepsilon_{is}) = \mathrm{Cov}(u_i, u_i) = \mathrm{Var}(u_i) = \sigma^2_\alpha,$$
so that outcomes are correlated over time within individuals. However, the random effects do not permit any correlation between the outcomes of different individuals.
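A short simulation makes this covariance calculation concrete. Under the illustrative values $\alpha = 1$, $\sigma^2 = 1$, and $\sigma^2_\alpha = 0.5$ (assumptions, not from the text), the sample covariance of two outcomes for the same individual should be close to $\sigma^2_\alpha = 0.5$, while outcomes for different individuals should be essentially uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 100_000, 2
alpha, sigma2, sigma2_alpha = 1.0, 1.0, 0.5      # illustrative values

alpha_i = alpha + np.sqrt(sigma2_alpha) * rng.standard_normal(N)        # (12.7)
y = alpha_i[:, None] + np.sqrt(sigma2) * rng.standard_normal((N, T))    # (12.6)

within = np.cov(y[:, 0], y[:, 1])[0, 1]           # same individual, t != s
between = np.cov(y[:-1, 0], y[1:, 0])[0, 1]       # individuals i and i + 1
print(round(within, 3), round(between, 3))
```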
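(b) Conditional on $\alpha$, $\sigma^2$, and $\sigma^2_\alpha$, the only terms of the posterior that involve $\alpha_i$ are the $T$ likelihood contributions for individual $i$ from (12.6) and the prior (12.7). Combining these normal terms and completing the square gives
$$\alpha_i\mid\alpha,\sigma^2,\sigma^2_\alpha,y \sim N\left(D_i d_i,\, D_i\right), \qquad D_i = \left(\frac{T}{\sigma^2} + \frac{1}{\sigma^2_\alpha}\right)^{-1}, \qquad d_i = \frac{\sum_{t=1}^T y_{it}}{\sigma^2} + \frac{\alpha}{\sigma^2_\alpha}.$$
(c) The mean of this conditional posterior distribution is easily obtained from our solution in (b):
$$E(\alpha_i\mid\alpha,\sigma^2,\sigma^2_\alpha,y) = D_i d_i = w\,\bar{y}_i + (1-w)\,\alpha, \qquad w = \frac{T\sigma^2_\alpha}{T\sigma^2_\alpha + \sigma^2} = \frac{T}{T + \sigma^2/\sigma^2_\alpha},$$
where $\bar{y}_i = T^{-1}\sum_{t=1}^T y_{it}$.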
This is in the form of a shrinkage estimator: the conditional posterior mean of $\alpha_i$ is a weighted average (with $w$ serving as the weight) of the averaged outcomes for individual $i$, $\bar{y}_i$, and the common mean for all individuals, $\alpha$. As $T \to \infty$ (holding all else constant), $w \to 1$, so the conditional posterior mean approaches the average of the individual outcomes. Intuitively, this makes sense: as $T \to \infty$ we acquire more and more information on individual $i$, and thus the data information from individual $i$ dominates any information provided by the outcomes of other individuals in the sample. As $\sigma^2/\sigma^2_\alpha \to \infty$, $w \to 0$, so the posterior mean collapses to the common mean $\alpha$. In this case the error variability in the outcome equation grows relative to the variability in the random effects equation, and thus the posterior mean reduces to the common effect for all individuals, $\alpha$.
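The shrinkage formula is easy to compute directly. The function below, `conditional_posterior_alpha_i` (a hypothetical helper, not from the text), returns the conditional posterior mean and variance of $\alpha_i$ together with the weight $w$; the usage lines illustrate the baseline case and the two limits discussed above, with illustrative input values.

```python
import numpy as np

def conditional_posterior_alpha_i(y_i, alpha, sigma2, sigma2_alpha):
    """Mean, variance, and weight w for alpha_i | alpha, sigma2, sigma2_alpha, y
    in the model (12.6)-(12.7). y_i holds the T outcomes of individual i."""
    T = len(y_i)
    var = 1.0 / (T / sigma2 + 1.0 / sigma2_alpha)        # D_i
    w = T * sigma2_alpha / (T * sigma2_alpha + sigma2)   # weight on the individual average
    mean = w * np.mean(y_i) + (1.0 - w) * alpha          # equals D_i * d_i
    return mean, var, w

# Baseline: T = 2, sigma2 = sigma2_alpha = 1, so w = 2/3.
mean, var, w = conditional_posterior_alpha_i(np.array([2.0, 4.0]), 0.0, 1.0, 1.0)
# Many periods: w -> 1, so the mean approaches the individual average.
_, _, w_bigT = conditional_posterior_alpha_i(np.full(10_000, 3.0), 0.0, 1.0, 1.0)
# Noisy outcomes relative to the random-effect spread: w -> 0, mean -> alpha.
mean_noisy, _, w_noisy = conditional_posterior_alpha_i(np.array([2.0, 4.0]), 0.0, 1e6, 1.0)
print(mean, var, w, w_bigT, w_noisy)
```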
Exercise 12.3 (A posterior simulator for a linear hierarchical model: Gelfand et al. [1990]) We illustrate Bayesian procedures for estimating hierarchical linear models using a data set that has become something of a “classic” in the MCMC literature. These data come from the study of Gelfand, Hills, Racine-Poon, and Smith (1990). In this exercise, we derive the complete posterior conditionals in full detail. In the following exercise we fit this model and provide some diagnostic checks for convergence. In subsequent examples, complete derivations of the conditionals will not typically be provided, as these derivations will often follow along lines similar to those described in this exercise.
In the rat growth model of Gelfand et al. (1990), 30 different rats are weighed at five different points in time. We denote the weight of rat $i$ at measurement $j$ as $y_{ij}$ and let $x_{ij}$ denote the age of the $i$th rat at the $j$th measurement. Since each of the rats was weighed at exactly the same number of days since birth, we have
$$x_{i1} = 8,\quad x_{i2} = 15,\quad x_{i3} = 22,\quad x_{i4} = 29,\quad x_{i5} = 36 \qquad \forall i.$$
The data used in the analysis are provided in Table 12.1.
Table 12.1: Rat growth data from Gelfand et al. (1990).
Rat   Weight measurements            Rat   Weight measurements
i     yi1 yi2 yi3 yi4 yi5            i     yi1 yi2 yi3 yi4 yi5
1 151 199 246 283 320 16 160 207 248 288 324
2 145 199 249 293 354 17 142 187 234 280 316
3 147 214 263 312 328 18 156 203 243 283 317
4 155 200 237 272 297 19 157 212 259 307 336
5 135 188 230 280 323 20 152 203 246 286 321
6 159 210 252 298 331 21 154 205 253 298 334
7 141 189 231 275 305 22 139 190 225 267 302
8 159 201 248 297 338 23 146 191 229 272 302
9 177 236 285 340 376 24 157 211 250 285 323
10 134 182 220 260 296 25 132 185 237 286 331
11 160 208 261 313 352 26 160 207 257 303 345
12 143 188 220 273 314 27 169 216 261 295 333
13 154 200 244 289 325 28 157 205 248 289 316
14 171 221 270 326 358 29 137 180 219 258 291
15 163 216 242 281 312 30 153 200 244 286 324
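For readers who want to work with Table 12.1 directly, the sketch below enters the table as a NumPy array and, purely as a descriptive first look (not part of the Bayesian analysis), fits a separate least-squares line to each rat. Using such rat-by-rat estimates as starting values for a posterior simulator is our suggestion, not something prescribed in the text.

```python
import numpy as np

# Table 12.1 (Gelfand et al., 1990): row i holds y_i1, ..., y_i5 for rat i = 1, ..., 30.
Y = np.array([
    [151, 199, 246, 283, 320], [145, 199, 249, 293, 354], [147, 214, 263, 312, 328],
    [155, 200, 237, 272, 297], [135, 188, 230, 280, 323], [159, 210, 252, 298, 331],
    [141, 189, 231, 275, 305], [159, 201, 248, 297, 338], [177, 236, 285, 340, 376],
    [134, 182, 220, 260, 296], [160, 208, 261, 313, 352], [143, 188, 220, 273, 314],
    [154, 200, 244, 289, 325], [171, 221, 270, 326, 358], [163, 216, 242, 281, 312],
    [160, 207, 248, 288, 324], [142, 187, 234, 280, 316], [156, 203, 243, 283, 317],
    [157, 212, 259, 307, 336], [152, 203, 246, 286, 321], [154, 205, 253, 298, 334],
    [139, 190, 225, 267, 302], [146, 191, 229, 272, 302], [157, 211, 250, 285, 323],
    [132, 185, 237, 286, 331], [160, 207, 257, 303, 345], [169, 216, 261, 295, 333],
    [157, 205, 248, 289, 316], [137, 180, 219, 258, 291], [153, 200, 244, 286, 324],
], dtype=float)
x = np.array([8.0, 15.0, 22.0, 29.0, 36.0])        # common ages x_ij

X = np.column_stack([np.ones_like(x), x])           # 5 x 2 design, same for every rat
coef, *_ = np.linalg.lstsq(X, Y.T, rcond=None)      # all 30 fits at once; shape (2, 30)
alpha_hat, beta_hat = coef                          # per-rat intercepts and slopes
print(alpha_hat.round(1))
print(beta_hat.round(2))
```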
In our model, we want to permit unit-specific variation in initial birth weight and growth rates. This leads us to specify the following model:
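$$y_{ij}\mid\alpha_i,\beta_i,\sigma^2,x_{ij} \overset{ind}{\sim} N\left(\alpha_i + \beta_i x_{ij},\, \sigma^2\right), \qquad i = 1,2,\ldots,30,\quad j = 1,2,\ldots,5,$$
so that each rat possesses its own intercept $\alpha_i$ and growth rate $\beta_i$.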
We also assume that the rats share some degree of “commonality” in their weight at birth and rates of growth, and thus we assume that the intercept and slope parameters are drawn from the same normal population:
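$$\theta_i = \begin{bmatrix}\alpha_i\\ \beta_i\end{bmatrix}\,\Big|\,\theta_0,\Sigma \overset{ind}{\sim} N(\theta_0,\Sigma), \qquad i = 1,2,\ldots,30.$$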
We complete our Bayesian analysis by specifying the following priors:
$$\sigma^2\mid a,b \sim IG(a,b), \qquad \theta_0\mid\eta,C \sim N(\eta,C), \qquad \Sigma^{-1}\mid\rho,R \sim W\left([\rho R]^{-1},\rho\right),$$
with $W$ denoting the Wishart distribution (see Appendix Definition 6 for details).
Derive the complete conditionals for this model and comment on any intuition behind the forms of these conditionals.
Solution
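Given the assumed conditional independence across observations, the joint posterior distribution for all the parameters of this model can be written as
$$p(\Gamma\mid y) \propto \left[\prod_{i=1}^{30} p(y_i\mid X_i,\theta_i,\sigma^2)\,p(\theta_i\mid\theta_0,\Sigma)\right] p(\theta_0\mid\eta,C)\,p(\sigma^2\mid a,b)\,p(\Sigma^{-1}\mid\rho,R),$$
where $\Gamma$ denotes all the parameters of the model, $\{\theta_i\}_{i=1}^{30}$, $\theta_0$, $\Sigma^{-1}$, and $\sigma^2$, and we use the notation $\Gamma_{-x}$ to denote all parameters other than $x$. We have stacked the observations over time for each individual rat so that
$$y_i = \begin{bmatrix} y_{i1}\\ y_{i2}\\ \vdots\\ y_{i5}\end{bmatrix}, \qquad X_i = \begin{bmatrix} 1 & x_{i1}\\ 1 & x_{i2}\\ \vdots & \vdots\\ 1 & x_{i5}\end{bmatrix}.$$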
Complete posterior conditional for $\theta_i$
We first note that this complete conditional is proportional to the aforementioned joint posterior. Thus, all of the terms in the product that do not involveθi are absorbed into the normalizing constant of this conditional. Hence,
$$p(\theta_i\mid\Gamma_{-\theta_i},y) \propto p(y_i\mid X_i,\theta_i,\sigma^2)\,p(\theta_i\mid\theta_0,\Sigma).$$
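This fits directly into the framework of Exercise 12.1 [see also Lindley and Smith (1972)], and thus we find
$$\theta_i\mid\Gamma_{-\theta_i},y \sim N\left(D_{\theta_i}d_{\theta_i},\, D_{\theta_i}\right),$$
where
$$D_{\theta_i} = \left(X_i'X_i/\sigma^2 + \Sigma^{-1}\right)^{-1}, \qquad d_{\theta_i} = X_i'y_i/\sigma^2 + \Sigma^{-1}\theta_0.$$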
Because of the conditional independence, we can draw each of the $\theta_i$ in turn by sampling from its corresponding complete conditional.
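A single draw from this complete conditional can be sketched as follows, assuming the normal form $N(D_{\theta_i}d_{\theta_i}, D_{\theta_i})$ with $D_{\theta_i} = (X_i'X_i/\sigma^2 + \Sigma^{-1})^{-1}$ and $d_{\theta_i} = X_i'y_i/\sigma^2 + \Sigma^{-1}\theta_0$ implied by Exercise 12.1. The function name `draw_theta_i` and the "current" parameter values in the usage lines are hypothetical; the data are rat 1 from Table 12.1.

```python
import numpy as np

def draw_theta_i(y_i, X_i, sigma2, theta0, Sigma_inv, rng):
    """One draw from theta_i | rest, y ~ N(D d, D), with
    D = (X_i'X_i / sigma2 + Sigma^{-1})^{-1} and d = X_i'y_i / sigma2 + Sigma^{-1} theta0."""
    D = np.linalg.inv(X_i.T @ X_i / sigma2 + Sigma_inv)
    d = X_i.T @ y_i / sigma2 + Sigma_inv @ theta0
    L = np.linalg.cholesky(D)                        # D = L L'
    return D @ d + L @ rng.standard_normal(len(d))   # mean + L z, z ~ N(0, I)

# Usage with rat 1 of Table 12.1; the current parameter values are hypothetical.
x = np.array([8.0, 15.0, 22.0, 29.0, 36.0])
X_1 = np.column_stack([np.ones(5), x])
y_1 = np.array([151.0, 199.0, 246.0, 283.0, 320.0])
rng = np.random.default_rng(4)
theta_1 = draw_theta_i(y_1, X_1, sigma2=36.0, theta0=np.array([100.0, 6.0]),
                       Sigma_inv=np.linalg.inv(np.diag([100.0, 0.25])), rng=rng)
print(theta_1)   # one draw of (alpha_1, beta_1)
```

Drawing via the Cholesky factor of $D_{\theta_i}$ avoids a general-purpose multivariate normal routine inside the sampler loop.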
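Complete posterior conditional for $\theta_0$
For $\theta_0$, a similar argument shows
$$p(\theta_0\mid\Gamma_{-\theta_0},y) \propto \left[\prod_{i=1}^{30} p(\theta_i\mid\theta_0,\Sigma)\right] p(\theta_0\mid\eta,C).$$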
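In this form, the results of Exercise 12.1 again apply with
$$\theta_0\mid\Gamma_{-\theta_0},y \sim N\left(D_{\theta_0}d_{\theta_0},\, D_{\theta_0}\right),$$
where
$$D_{\theta_0} = \left(30\,\Sigma^{-1} + C^{-1}\right)^{-1}, \qquad d_{\theta_0} = 30\,\Sigma^{-1}\bar{\theta} + C^{-1}\eta, \qquad \bar{\theta} = \frac{1}{30}\sum_{i=1}^{30}\theta_i.$$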
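Correspondingly, a draw of $\theta_0$ can be sketched as below, assuming the normal complete conditional with $D_{\theta_0} = (n\Sigma^{-1} + C^{-1})^{-1}$ and $d_{\theta_0} = \Sigma^{-1}\sum_i\theta_i + C^{-1}\eta$, where $n = 30$ for the rat data. The function name `draw_theta0` and the usage values are hypothetical.

```python
import numpy as np

def draw_theta0(thetas, Sigma_inv, eta, C_inv, rng):
    """One draw from theta0 | rest, y ~ N(D d, D), with n = thetas.shape[0],
    D = (n Sigma^{-1} + C^{-1})^{-1} and d = Sigma^{-1} sum_i theta_i + C^{-1} eta."""
    n = thetas.shape[0]
    D = np.linalg.inv(n * Sigma_inv + C_inv)
    d = Sigma_inv @ thetas.sum(axis=0) + C_inv @ eta
    L = np.linalg.cholesky(D)
    return D @ d + L @ rng.standard_normal(len(d))

# Usage: with a vague prior (small C^{-1}) and tightly clustered theta_i
# (large Sigma^{-1}), the draw should sit almost exactly at the average theta_i.
rng = np.random.default_rng(5)
thetas = np.array([[1.0, 2.0], [3.0, 4.0]])
draw = draw_theta0(thetas, 1e8 * np.eye(2), np.zeros(2), 1e-8 * np.eye(2), rng)
print(draw)
```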
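Complete posterior conditional for $\sigma^2$
For $\sigma^2$, we find
$$p(\sigma^2\mid\Gamma_{-\sigma^2},y) \propto p(\sigma^2\mid a,b)\prod_{i=1}^{30}p(y_i\mid X_i,\theta_i,\sigma^2),$$
so that, with $IG(a,b)$ denoting the inverted gamma density proportional to $(\sigma^2)^{-(a+1)}\exp[-1/(b\sigma^2)]$,
$$\sigma^2\mid\Gamma_{-\sigma^2},y \sim IG\left(75 + a,\; \left[b^{-1} + \tfrac{1}{2}\sum_{i=1}^{30}(y_i - X_i\theta_i)'(y_i - X_i\theta_i)\right]^{-1}\right),$$
where $75 = (30 \times 5)/2$ is half the total number of observations.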
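Complete posterior conditional for $\Sigma^{-1}$
We find
$$\Sigma^{-1}\mid\Gamma_{-\Sigma^{-1}},y \sim W\left(\left[\sum_{i=1}^{30}(\theta_i - \theta_0)(\theta_i - \theta_0)' + \rho R\right]^{-1},\; 30 + \rho\right).$$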
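Cycling through the four complete conditionals gives a Gibbs sampler. The sketch below is illustrative, not the book's Matlab implementation, and rests on stated assumptions: $IG(a,b)$ has density proportional to $(\sigma^2)^{-(a+1)}\exp[-1/(b\sigma^2)]$, so $1/\sigma^2$ is drawn from a gamma distribution with shape $a + nm/2$ and scale $[b^{-1} + \frac{1}{2}\sum_i (y_i - X_i\theta_i)'(y_i - X_i\theta_i)]^{-1}$; $W(A,\nu)$ has scale matrix $A$ and $\nu$ degrees of freedom; and $\rho$ is an integer, so the Wishart draw can be formed as a sum of $n + \rho$ outer products of normal vectors. The function `gibbs_rats` and its starting values (rat-by-rat least squares) are our own choices.

```python
import numpy as np

def gibbs_rats(Y, x, n_iter, a, b, eta, C, rho, R, seed=0):
    """Gibbs sampler sketch for y_ij ~ N(alpha_i + beta_i x_j, sigma2),
    theta_i = (alpha_i, beta_i)' ~ N(theta0, Sigma), with priors
    sigma2 ~ IG(a, b), theta0 ~ N(eta, C), Sigma^{-1} ~ W([rho R]^{-1}, rho).
    Assumes IG(a, b) has density proportional to s^-(a+1) exp(-1/(b s)),
    W(A, nu) has scale matrix A, and rho is an integer."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    X = np.column_stack([np.ones(m), x])               # common design matrix X_i
    XtX, C_inv = X.T @ X, np.linalg.inv(C)
    # Starting values (our choice): rat-by-rat least squares.
    thetas = np.linalg.lstsq(X, Y.T, rcond=None)[0].T  # (n, 2)
    theta0 = thetas.mean(axis=0)
    Sigma_inv = np.linalg.inv(np.cov(thetas, rowvar=False))
    sigma2 = np.mean((Y - thetas @ X.T) ** 2)
    out = {"theta0": [], "sigma2": [], "Sigma": []}
    for _ in range(n_iter):
        # theta_i | rest: D is identical across units because every X_i = X.
        D = np.linalg.inv(XtX / sigma2 + Sigma_inv)
        L = np.linalg.cholesky(D)
        for i in range(n):
            d = X.T @ Y[i] / sigma2 + Sigma_inv @ theta0
            thetas[i] = D @ d + L @ rng.standard_normal(2)
        # theta0 | rest
        D0 = np.linalg.inv(n * Sigma_inv + C_inv)
        d0 = Sigma_inv @ thetas.sum(axis=0) + C_inv @ eta
        theta0 = D0 @ d0 + np.linalg.cholesky(D0) @ rng.standard_normal(2)
        # sigma2 | rest: 1/sigma2 ~ Gamma(shape a + n m / 2, scale [1/b + SSR/2]^{-1})
        ssr = np.sum((Y - thetas @ X.T) ** 2)
        sigma2 = 1.0 / rng.gamma(a + n * m / 2, 1.0 / (1.0 / b + 0.5 * ssr))
        # Sigma^{-1} | rest ~ W([sum_i (theta_i - theta0)(theta_i - theta0)' + rho R]^{-1}, n + rho)
        dev = thetas - theta0
        scale = np.linalg.inv(dev.T @ dev + rho * R)
        Zw = rng.standard_normal((n + rho, 2)) @ np.linalg.cholesky(scale).T
        Sigma_inv = Zw.T @ Zw                           # sum of n + rho outer products
        out["theta0"].append(theta0.copy())
        out["sigma2"].append(sigma2)
        out["Sigma"].append(np.linalg.inv(Sigma_inv))
    return {k: np.array(v) for k, v in out.items()}
```

Because every rat shares the same design matrix, $D_{\theta_i}$ is the same for all $i$ within a sweep, so the sketch computes it once per iteration. Passing the Table 12.1 weights as `Y` and $x = (8, 15, 22, 29, 36)'$ would run it on the rat data; the prior hyperparameters, run length, and burn-in are left to the user.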