From the Bayesian standpoint, given a region C ⊂ Θ and the data y, it is meaningful to ask the following: What is the probability that θ lies in C? The answer is direct:
1 − α ≡ Pr(θ ∈ C|y) = ∫_C p(θ|y) dθ, (5.1)
where 0 < α < 1 is defined implicitly in (5.1). The region C is known as a 1 − α Bayesian credible region. There is no need to introduce the additional frequentist concept of
“confidence.”
This chapter focuses on the case in which α is first set at some preassigned value (say α = .10, .05, or .01), fixing the posterior probability content at 1 − α, and then the “smallest” credible region that attains this content is sought. This leads to the following definition.
Definition 5.1 Let p(θ|y) be a posterior density function. Let Θ∗ ⊂ Θ satisfy:
(a) Pr(θ ∈ Θ∗|y) = 1 − α,
(b) for all θ1 ∈ Θ∗ and θ2 ∉ Θ∗, p(θ1|y) ≥ p(θ2|y).
Then Θ∗ is defined to be a highest posterior density (HPD) region of content (1 − α) for θ.
Given a probability content of 1 − α, the HPD region Θ∗ has the smallest possible volume in the parameter space Θ of any 1 − α Bayesian credible region. If p(θ|y) is not uniform over Θ, then the HPD region of content 1 − α is unique. Hereafter, we focus on the case in which there is a single parameter of interest and all other parameters have been integrated out of the posterior.
Constructing an HPD interval is conceptually straightforward. Part (b) of Definition 5.1 implies that, if [a, b] is an HPD interval, then p(a|y) = p(b|y). This suggests a graphical approach in which a horizontal line is gradually moved vertically downward across the posterior density; where it intersects the posterior density, the corresponding abscissa values are noted, and the posterior is integrated between these points. Once the desired posterior probability is reached, the process stops. If the posterior density is symmetric and unimodal with mode θm, then the resulting 1 − α HPD interval is of the form [θm − δ, θm + δ], for suitable δ, and it cuts off equal probability α/2 in each tail.

Figure 5.1: A hypothetical posterior density. [Plot omitted: horizontal axis from −4 to 4; vertical axis (density) from 0 to 0.4.]
In (5.1) we started with the region and then found its posterior probability content. In Definition 5.1 we started with the posterior probability content and then found the smallest region with that content. From a pure decision theoretic perspective, it is interesting to start with a cost function measuring the undesirability of large α and large volume of a region, and then to pick both α and C to minimize expected posterior cost [see Casella, Hwang, and Robert (1993)]. For a thorough discussion of both Bayesian and frequentist interval estimation, see Casella and Berger (2002).
Exercise 5.1 (Graphical determination of an HPD interval) Consider the posterior density in Figure 5.1. Use the graphical approach described earlier to obtain three sets of HPD intervals.
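Figure 5.2: Three different HPD intervals. [Plot omitted: the posterior density of Figure 5.1 on the same axes, cut by three horizontal lines at successively lower heights; their intersections with the density mark the endpoints a1, a2 (highest line), b1, b2, b3, b4 (middle line), and c1, c2 (lowest line).]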
Solution
In Figure 5.2, we present three different HPD intervals associated with the posterior in Figure 5.1. The highest line in the figure produces the interval [a1, a2]. This interval is narrow and contains rather low posterior probability. When the line is lowered a second time, it produces the disjoint intervals [b1, b2] and [b3, b4], which contain more posterior probability, but still relatively little. If these two intervals, however, contain the desired posterior probability, then the process stops and the HPD region consists of the union of these two disjoint intervals. If a higher posterior content is desired, the process repeats until, say, the third horizontal line, which produces the interval [c1, c2]. If even more posterior content is desired, the process can be repeated again.
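The line-lowering procedure can be imitated numerically on a grid. The sketch below is only an illustration: the bimodal density `posterior_pdf` is a hypothetical two-component normal mixture chosen for the example (the density behind Figure 5.1 is not given in the text), and the function name `hpd_region` and the grid settings are likewise assumptions. Grid points are visited from highest to lowest density, which mimics moving the horizontal line downward, and the retained points are then grouped into intervals.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posterior_pdf(theta):
    # Hypothetical bimodal posterior (an assumption, NOT the density of Figure 5.1).
    return 0.6 * normal_pdf(theta, 1.0, 0.6) + 0.4 * normal_pdf(theta, -1.5, 0.6)


def hpd_region(pdf, lower, upper, content, n_grid=8001):
    """Approximate HPD region of the given content as a list of (left, right) intervals."""
    step = (upper - lower) / (n_grid - 1)
    grid = [lower + i * step for i in range(n_grid)]
    dens = [pdf(t) for t in grid]
    total = sum(dens) * step  # normalizing constant on the grid

    # Lower the horizontal line: take grid points from highest to lowest density
    # until the accumulated posterior mass reaches the requested content.
    order = sorted(range(n_grid), key=lambda i: dens[i], reverse=True)
    keep = [False] * n_grid
    mass = 0.0
    for i in order:
        keep[i] = True
        mass += dens[i] * step / total
        if mass >= content:
            break

    # Group the retained grid points into maximal runs (the disjoint intervals).
    intervals = []
    start = None
    for i, flag in enumerate(keep):
        if flag and start is None:
            start = i
        if start is not None and (not flag or i == n_grid - 1):
            end = i if flag else i - 1
            intervals.append((grid[start], grid[end]))
            start = None
    return intervals


for content in (0.20, 0.80, 0.99):
    print(content, hpd_region(posterior_pdf, -5.0, 5.0, content))
```

For this assumed mixture, a low content yields a single interval around the taller mode, an intermediate content yields two disjoint intervals, and a high content yields one long interval, which mirrors the three cases discussed in the solution.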
Exercise 5.2 (Calculating a posterior interval probability) Consider a random sample of size T = 2 from an exponential distribution with mean θ^{-1} and a gamma prior distribution G(α̲, β̲) with α̲ = 2 and β̲ = 4. Suppose the resulting sample mean is ȳ = .125. Find the posterior probability of the credible interval [3.49, 15.5].
Solution
From Exercise 2.10 it follows that the posterior distribution of θ is G(ᾱ, β̄), where ᾱ = 2 + 2 = 4 and β̄ = [4^{-1} + 2(.125)]^{-1} = 2. Since β̄ = 2, the gamma distribution reduces to a chi-square distribution with 2ᾱ = 8 degrees of freedom [see Poirier (1995, p. 100)]. Consulting a standard chi-square table [e.g., Poirier (1995, p. 662)], we see that Pr(χ²(8) ≤ 3.49) = .10 and Pr(χ²(8) ≤ 15.5) = .95. Therefore, the posterior probability of [3.49, 15.5] is .95 − .10 = .85.
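The table lookup can be cross-checked numerically. The sketch below assumes the shape/scale reading of G(α, β) implied by the arithmetic in the solution (shape 4, scale 2, i.e., chi-square with eight degrees of freedom) and uses the closed-form c.d.f. available for integer shape; the helper name `erlang_cdf` is an illustrative choice, not part of the text.

```python
import math


def erlang_cdf(x, shape, scale):
    """CDF of a gamma distribution with integer shape and the given scale (closed form)."""
    u = x / scale
    partial = sum(u ** j / math.factorial(j) for j in range(shape))
    return 1.0 - math.exp(-u) * partial


# Prior hyperparameters and data summary from the exercise.
alpha_prior, beta_prior = 2, 4.0
T, ybar = 2, 0.125

# Posterior update as used in the solution: shape adds T, inverse scale adds T * ybar.
alpha_post = alpha_prior + T
beta_post = 1.0 / (1.0 / beta_prior + T * ybar)

prob = erlang_cdf(15.5, alpha_post, beta_post) - erlang_cdf(3.49, alpha_post, beta_post)
print(alpha_post, beta_post, round(prob, 4))
```

The computed probability agrees with the tabulated value of .85 to about three decimal places.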
Exercise 5.3 (HPD intervals under normal random sampling with known variance) Consider a random sample Y1, Y2, . . . , YT from an N(θ1, θ2^{-1}) distribution with known variance θ2^{-1} and an N(µ̲, θ2^{-1}/T̲) prior for the unknown mean θ1.
(a) Find the 1 − α HPD interval for θ1.
(b) Suppose α = .05, θ2 = .01, µ̲ = 8, T̲ = 5, T = 25, and ȳ = 2. Find the 1 − α HPD interval for θ1.
(c) Consider the limiting prior as T̲ → 0 in Exercise 4.6. Find the 1 − α HPD interval for θ1.
(d) Suppose α = .05, θ2 = .01, T = 25, and ȳ = 2. Find a 95 percent confidence interval and interpret your result.
Solution
(a) From Exercise 2.6(b) it follows that the posterior density for θ1 is
p(θ1|y) = φ(θ1; µ̄, θ2^{-1}/T̄),
where the posterior hyperparameters T̄ and µ̄ are given by (2.38) and (2.39), respectively. Since the posterior density is symmetric and unimodal with mode µ̄, the resulting HPD interval of content 1 − α cuts off equal probability α/2 in each tail. Thus, a 1 − α HPD interval for θ1 is
µ̄ − z_{α/2}(θ2^{-1}/T̄)^{1/2} < θ1 < µ̄ + z_{α/2}(θ2^{-1}/T̄)^{1/2}, (5.2)
where z_{α/2} is the standard normal critical value that cuts off α/2 probability in the right-hand tail. From the standpoint of subjective probability, (5.2) implies that the posterior probability of θ1 lying between µ̄ ± z_{α/2}(θ2^{-1}/T̄)^{1/2} is 1 − α (.95 when α = .05).
(b) The given values correspond to an N(8, 20) prior distribution for θ1. Using a standard normal table, it follows that z_{α/2} = 1.96. Plugging into (2.38) and (2.39) we obtain T̄ = T̲ + T = 5 + 25 = 30 and µ̄ = (T̲µ̲ + Tȳ)/T̄ = [5(8) + 25(2)]/30 = 3. Also note that θ2^{-1}/T̄ = 100/30 = 10/3. Hence the posterior distribution for θ1 is N(3, 10/3) and (5.2) implies the .95 HPD interval 3 ± 1.96(10/3)^{1/2} or −.58 < θ1 < 6.58.
(c) From Exercise 2.6 it is seen that the posterior hyperparameters simplify to µ̄ = ȳ and T̄ = T. Therefore, HPD interval (5.2) reduces to
ȳ − z_{α/2}(θ2^{-1}/T)^{1/2} < θ1 < ȳ + z_{α/2}(θ2^{-1}/T)^{1/2}, (5.3)
which is numerically identical to the standard non-Bayesian confidence interval. The interpretation of (5.3) is, however, in terms of “probability” and not “confidence”: Given Y = y, the ex post probability of θ1 falling in (5.3) is .95 when α = .05. The common misinterpretation of a classical confidence interval in terms of final precision is fortuitously correct in this special case.
(d) Plugging into (5.3) implies the .95 confidence interval
2 ± 1.96(100/25)^{1/2} or −1.92 < θ1 < 5.92.
The interpretation of this confidence interval is that it is a realization of a procedure that, upon repeated sampling, yields random intervals [Ȳ − 3.92, Ȳ + 3.92] that have an ex ante sampling probability of .95 of capturing θ1. The realized interval −1.92 < θ1 < 5.92 either does or does not capture the unknown constant θ1. The frequentist “confidence” lies in the procedure used to obtain the realized interval.
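The arithmetic in parts (b) and (d) can be reproduced in a few lines. This is a minimal sketch: the variable names are illustrative, and the posterior update simply restates the expressions used in part (b) (posterior sample size equal to prior plus actual sample size, posterior mean equal to the correspondingly weighted average).

```python
import math
from statistics import NormalDist

alpha = 0.05
theta2 = 0.01                   # known precision, so the variance is 1/theta2 = 100
mu_prior, T_prior = 8.0, 5.0    # prior mean and prior "sample size"
T, ybar = 25, 2.0               # sample size and sample mean

z = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96

# Part (b): posterior hyperparameters and the HPD interval (5.2).
T_post = T_prior + T
mu_post = (T_prior * mu_prior + T * ybar) / T_post
sd_post = math.sqrt(1.0 / (theta2 * T_post))
hpd = (mu_post - z * sd_post, mu_post + z * sd_post)

# Part (d): classical confidence interval (5.3), centered at the sample mean.
se = math.sqrt(1.0 / (theta2 * T))
ci = (ybar - z * se, ybar + z * se)

print("HPD interval:", hpd)
print("confidence interval:", ci)
```

The two intervals differ only because the informative prior shifts the center from ȳ = 2 to the posterior mean 3 and adds the prior's five pseudo-observations to the precision.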
Exercise 5.4 (HPD intervals under normal random sampling with unknown variance) Consider Exercise 5.3 but with θ2 unknown. Suppose the joint prior distribution for θ = [θ1 θ2] is the normal-gamma distribution, denoted θ ∼ NG(µ̲, q̲, s̲^{-2}, ν̲) with density given in (2.26).
(a) Find the 1 − α HPD interval for θ1.
(b) Suppose µ̲ = 8, q̲ = .20, s̲^{-2} = .25, and ν̲ = 4. Find the marginal prior density for θ1. How does it compare to the prior in Exercise 5.3(b)?
(c) Suppose α = .05, T = 25, ȳ = 2, and s² = 102.5. Find the 1 − α HPD interval for θ1.
(d) Using the values in (c), find the 1 − α HPD interval for θ2.
Solution
(a) The posterior distribution for θ is NG(µ̄, q̄, s̄^{-2}, ν̄), as given in the solution to Exercise 2.4. By using the solution to Exercise 2.8, it follows immediately that the marginal posterior distribution of θ1 is t(θ1; µ̄, s̄²q̄, ν̄). Hence, analogous to (5.2), a 1 − α HPD interval for θ1 is
µ̄ − t_{ν̄,α/2}(s̄²q̄)^{1/2} < θ1 < µ̄ + t_{ν̄,α/2}(s̄²q̄)^{1/2}, (5.4)
where t_{ν̄,α/2} cuts off α/2 probability in the right-hand tail of a Student t-distribution with ν̄ degrees of freedom. From the standpoint of subjective probability, (5.4) implies that the posterior probability of θ1 lying between µ̄ ± t_{ν̄,α/2}(s̄²q̄)^{1/2} is 1 − α (.95 when α = .05).
(b) Again using the solution to Exercise 2.8, the given hyperparameter values imply that the marginal prior distribution for θ1 is t(θ1; 8, 20, 4). This is similar to the prior in Exercise 5.3(b): It has the same location but fatter tails.
(c) Plugging the given hyperparameter values into (2.27) and (2.29)–(2.31), we obtain
ν̄ = 4 + 25 = 29,
q̄ = [.20^{-1} + 25]^{-1} = 1/30,
µ̄ = [(.20)^{-1}(8) + (25)(2)]/30 = 3,
and
s̄² = 29^{-1}[4(.25) + 24(102.5) + (.20 + 25^{-1})^{-1}(2 − 8)²] = 90.03.
Hence, the posterior distribution for θ1 is t(θ1; 3, 3.00, 29). Using a t-table gives t29(.975) = 2.045. Therefore, (5.4) implies the .95 HPD interval
3 ± 2.045[3.00]^{1/2} or −.54 < θ1 < 6.54,
similar to the solution of Exercise 5.3(b).
(d) From the definition of the normal-gamma distribution, the marginal posterior distribution for the precision θ2 is γ(s̄^{-2}, ν̄) = γ(.011, 29). Table 5.1 provides values of the p.d.f. and c.d.f. of a γ(.011, 29) distribution. From inspection we see that the interval [a, b] with equal p.d.f. ordinates and that contains .95 posterior probability is [.005736, .01693]: the p.d.f. equals 20.15 at both endpoints, and the c.d.f. values give .9648 − .0148 = .95.
Table 5.1: p.d.f. and c.d.f. values of a γ(.011, 29) random variable.
θ2 p.d.f. c.d.f.
.00200 .0018 .0000
.00270 .0405 .0000
.00300 .1136 .0000
.00390 1.212 .0005
.00550 15.55 .0106
.005736 20.15 .0148
.00600 26.21 .0209
.00800 93.59 .1368
.00900 124.4 .2468
.01030 140.9 .4225
.01080 139.1 .4927
.01300 96.14 .7586
.01500 48.75 .9015
.01666 23.02 .9591
.01693 20.15 .9648
.01720 17.50 .9700
.02000 3.466 .9948
.02160 1.213 .9983
.02620 .0405 .9999
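The intervals in parts (c) and (d) can also be obtained numerically rather than by table inspection. The sketch below starts from the posterior values reported in the solution (µ̄ = 3, q̄ = 1/30, s̄² = 90.03, ν̄ = 29) and takes the critical value 2.045 from the t-table as given. For part (d) it assumes that γ(mean, degrees of freedom) corresponds to a gamma density with shape ν̄/2 and scale 2/(ν̄s̄²), so that ν̄s̄²θ2 is chi-square with ν̄ degrees of freedom; this reading reproduces the ordinates in Table 5.1. The function names, the bisection scheme, and the Simpson-rule integration are illustrative choices.

```python
import math

# Posterior values taken from the solution text.
mu_post, q_post, s2_post, nu_post = 3.0, 1.0 / 30.0, 90.03, 29

# Part (c): Student-t HPD interval (5.4); 2.045 is the tabulated critical value.
t_crit = 2.045
half_width = t_crit * math.sqrt(s2_post * q_post)
theta1_hpd = (mu_post - half_width, mu_post + half_width)

# Part (d): gamma posterior for the precision (assumed shape/scale reading).
shape = nu_post / 2.0
scale = 2.0 / (nu_post * s2_post)
mode = (shape - 1.0) * scale


def pdf(x):
    """Gamma density evaluated on the log scale for numerical stability."""
    log_density = ((shape - 1.0) * math.log(x) - x / scale
                   - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_density)


def solve(level, lo, hi):
    """Bisection for pdf(x) = level on a bracket where the density is monotone."""
    increasing = pdf(lo) < pdf(hi)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if (pdf(mid) < level) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simpson(a, b, n=2000):
    """Composite Simpson rule for the posterior probability of [a, b]."""
    h = (b - a) / n
    total = pdf(a) + pdf(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(a + i * h)
    return total * h / 3.0


def hpd_interval(content):
    """Move a horizontal line until the equal-ordinate interval has the given content."""
    low_level, high_level = 0.0, pdf(mode)
    for _ in range(60):
        level = 0.5 * (low_level + high_level)
        a = solve(level, 1e-9, mode)   # left crossing, below the mode
        b = solve(level, mode, 1.0)    # right crossing, above the mode
        if simpson(a, b) > content:
            low_level = level          # interval too wide: raise the line
        else:
            high_level = level         # interval too narrow: lower the line
    return a, b


theta2_hpd = hpd_interval(0.95)
print("theta1 HPD:", theta1_hpd)
print("theta2 HPD:", theta2_hpd)
```

Because the gamma density is skewed, the resulting interval for θ2 is not an equal-tail interval: the endpoints are matched on density height, not on tail probability.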
Exercise 5.5 (Berger and Wolpert [1988, pp. 5–6]) To further clarify the difference between a Bayesian posterior interval and a confidence interval, consider the following.
Given an unknown parameter θ, −∞ < θ < ∞, suppose Yi (i = 1, 2) are iid binary random variables with probabilities of occurrence equally distributed over points of support θ − 1 and θ + 1. That is, Pr(Yi = θ − 1|θ) = 1/2 and Pr(Yi = θ + 1|θ) = 1/2.
Suppose the prior for θ is constant. Such an “improper” prior will be discussed in Chapter 8.
(a) Suppose we observe y1 ≠ y2. Find the posterior distribution of θ.
(b) Suppose we observe y1 = y2. Find the posterior distribution of θ.
(c) Consider
θ̂ = (1/2)(Y1 + Y2) if Y1 ≠ Y2,
θ̂ = Y1 − 1 if Y1 = Y2. (5.5)
In repeated sampling, what is the ex ante probability that (5.5) contains θ? Note that the answer is the same if the second line of (5.5) is changed to Y1 + 1 if Y1 = Y2.
Solution
(a) If y1 ≠ y2, then one of the values must equal θ − 1 and the other equals θ + 1. Ex post, averaging these two values it is absolutely certain that θ = (1/2)(y1 + y2); that is, Pr[θ = (1/2)(y1 + y2)|y1, y2] = 1.
(b) If y1 = y2 = y, say, then the common value y is either θ − 1 or θ + 1. Since the prior does not distinguish among values of θ, ex post, it is equally uncertain whether θ = y − 1 or θ = y + 1; that is, Pr(θ = y − 1|y1, y2) = Pr(θ = y + 1|y1, y2) = 1/2.
(c) Parts (a) and (b) suggest the ex post (i.e., posterior) probability that (5.5) equals θ is either 1 or 1/2, depending on whether y1 ≠ y2 or y1 = y2. Given the data, we can, of course, determine which posterior probability applies. From the ex ante perspective, however, Pr(Y1 ≠ Y2) = Pr(Y1 = Y2) = 1/2. Therefore, ex ante, the probability that (5.5) contains θ is an equally weighted average of our two ex post coverage probabilities, and thus the ex ante sampling probability of (5.5) containing θ is (1/2)(1) + (1/2)(1/2) = .75.
The embarrassing question for the pure frequentist is this: Why use the realized value of (5.5) to estimate θ and then report the ex ante confidence level 75 percent instead of the appropriate ex post measure of uncertainty?
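The ex ante coverage of .75 can be checked by repeated sampling. The following Monte Carlo sketch is illustrative only: the true value of θ, the number of draws, the seed, and the function name `simulate_coverage` are arbitrary assumptions made for the example.

```python
import random


def simulate_coverage(theta=3.0, n_draws=100_000, seed=0):
    """Fraction of repeated samples in which the estimator in (5.5) equals theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Each observation is theta - 1 or theta + 1 with probability 1/2.
        y1 = theta + rng.choice((-1.0, 1.0))
        y2 = theta + rng.choice((-1.0, 1.0))
        # Estimator (5.5): midpoint if the observations differ, else y1 - 1.
        estimate = 0.5 * (y1 + y2) if y1 != y2 else y1 - 1.0
        hits += (estimate == theta)
    return hits / n_draws


coverage = simulate_coverage()
print(coverage)
```

The simulated frequency settles near .75 whatever value of θ is used, since the sampling distribution of Y1 − θ and Y2 − θ does not depend on θ.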
Exercise 5.6 (Conditional frequentist reasoning and ancillarity) The preceding exercise demonstrates the important difference between frequentist ex ante and Bayesian ex post reasoning. The latter is consistent with the likelihood principle [see Berger and Wolpert (1988)], which, loosely speaking, states that two experiments involving the same unknown parameter θ that give rise to proportional likelihood functions contain the same evidence about θ. Likelihood principle proponents condition on all the data, whereas pure frequentist reasoning averages unconditionally over all the data that could have possibly been observed. Conditional frequentists lie somewhere between likelihood principle proponents and pure frequentists, arguing that inference should be conditional on ancillary statistics (statistics whose distribution does not depend on θ), but otherwise be unconditional. Conditional inference goes a long way toward eliminating some of the embarrassing problems in the pure frequentist approach. Demonstrate this observation using the design of Exercise 5.5 in terms of the statistic Z = |Y1 − Y2|.
Solution
First, note that Z is ancillary since its p.m.f. is Pr(Z = 0) = Pr(Z = 2) = 1/2, which does not depend on θ. The coverage probabilities of (5.5), conditioned on Z, are the appealing ex post probabilities of .5 and 1, respectively, for Z = 0 and Z = 2. Therefore, the conditional frequentist inference in this case is identical to the Bayesian posterior probabilities.
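Because the design has only four equally likely outcomes, the conditional coverage probabilities can be enumerated exactly. The sketch below fixes θ = 0 purely for convenience (an arbitrary assumption; any value gives the same probabilities), and the names `covers`, `conditional`, and `unconditional` are illustrative.

```python
from fractions import Fraction
from itertools import product

theta = 0  # arbitrary true value, used only to enumerate the sample space

# The four equally likely outcomes (y1, y2), each component being theta - 1 or theta + 1.
outcomes = list(product((theta - 1, theta + 1), repeat=2))


def covers(y1, y2):
    """True when the estimator in (5.5) equals theta for this outcome."""
    estimate = Fraction(y1 + y2, 2) if y1 != y2 else y1 - 1
    return estimate == theta


# Tally hits and counts separately for each value of the ancillary Z = |Y1 - Y2|.
by_z = {}
for y1, y2 in outcomes:
    z = abs(y1 - y2)
    hits, count = by_z.get(z, (0, 0))
    by_z[z] = (hits + covers(y1, y2), count + 1)

conditional = {z: Fraction(h, n) for z, (h, n) in by_z.items()}
pr_z = {z: Fraction(n, len(outcomes)) for z, (h, n) in by_z.items()}
unconditional = sum(pr_z[z] * conditional[z] for z in by_z)

print(conditional, pr_z, unconditional)
```

Averaging the two conditional coverages with the weights Pr(Z = 0) = Pr(Z = 2) = 1/2 recovers the unconditional coverage of 3/4 from Exercise 5.5.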
Exercise 5.7 (HPD intervals and reparameterization) Given the likelihood L(θ) and the prior p.d.f. p(θ), let γ = g(θ) be a one-to-one transformation of θ. Also, let A be a 1 − α HPD region for θ and define B = {γ : γ = g(θ), θ ∈ A}. Is B a 1 − α HPD region for γ?
Solution
No, not in general. Although condition (a) in Definition 5.1 holds, condition (b) does not. To see this, consider the following. The fact that A is a 1 − α HPD interval for θ implies that if θ1 ∈ A and θ2 ∉ A, then p(θ1|y) ≥ p(θ2|y). By definition of B, γ1 ∈ B iff θ1 = g^{-1}(γ1) ∈ A, and γ2 ∉ B iff θ2 = g^{-1}(γ2) ∉ A.
Suppose γ1 ∈ B and γ2 ∉ B. By a change of variables,
p(γ1|y) = |∂g^{-1}(γ1)/∂γ| p_{θ|y}[g^{-1}(γ1)],
p(γ2|y) = |∂g^{-1}(γ2)/∂γ| p_{θ|y}[g^{-1}(γ2)].
Therefore, although p_{θ|y}[g^{-1}(γ1)] ≥ p_{θ|y}[g^{-1}(γ2)], the Jacobian terms cannot be ordered. In other words, the posterior probability content is maintained under reparameterization, but the minimal length is not.
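A small numerical illustration makes the role of the Jacobian concrete. The sketch below assumes, purely for the example, a standard normal posterior for θ and the transformation γ = exp(θ); neither choice comes from the text. The image B of the HPD interval A keeps its posterior content, but the density of γ differs sharply at the two endpoints of B, and a point outside B can have higher density than a point inside it.

```python
import math
from statistics import NormalDist

posterior = NormalDist(0.0, 1.0)   # hypothetical posterior for theta
alpha = 0.05

# HPD interval A for theta: symmetric about the mode of the normal posterior.
z = posterior.inv_cdf(1.0 - alpha / 2.0)
a_lo, a_hi = -z, z

# Image B of A under the one-to-one map gamma = exp(theta).
b_lo, b_hi = math.exp(a_lo), math.exp(a_hi)


def pdf_gamma(g):
    """Posterior density of gamma = exp(theta): normal density times Jacobian 1/g."""
    return posterior.pdf(math.log(g)) / g


# Condition (a): the posterior content of B is still 1 - alpha.
content_B = posterior.cdf(math.log(b_hi)) - posterior.cdf(math.log(b_lo))

# Condition (b) fails: a point outside B has higher density than a point inside B.
outside_point = 0.1                      # lies below b_lo, hence outside B
inside_point = b_hi                      # right endpoint of B
print(content_B, pdf_gamma(outside_point), pdf_gamma(inside_point))
```

Under these assumptions the ordinates of the θ density are equal at the endpoints of A, yet the ordinates of the γ density at the endpoints of B are not, so B cannot be the shortest region of its content.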