Interval estimation - Bayesian Econometric Methods

From the Bayesian standpoint, given a region C ⊂ Θ and the data y, it is meaningful to ask the following: What is the probability that θ lies in C? The answer is direct:

1 − α ≡ Pr(θ ∈ C|y) = ∫_C p(θ|y) dθ, (5.1)

where 0 < α < 1 is defined implicitly in (5.1). The region C is known as a 1 − α Bayesian credible region. There is no need to introduce the additional frequentist concept of “confidence.”

This chapter focuses on the case in which the posterior probability content is first set at some preassigned α (say α = .10, .05, or .01) and then the “smallest” credible region that attains posterior probability content of 1 − α is sought. This leads to the following definition.

Definition 5.1 Let p(θ|y) be a posterior density function. Let Θ* ⊂ Θ satisfy:

(a) Pr(θ ∈ Θ*|y) = 1 − α,

(b) for all θ₁ ∈ Θ* and θ₂ ∉ Θ*, p(θ₁|y) ≥ p(θ₂|y).

Then Θ* is defined to be a highest posterior density (HPD) region of content (1 − α) for θ.

Given a probability content of 1 − α, the HPD region Θ* has the smallest possible volume in the parameter space Θ of any 1 − α Bayesian credible region. If p(θ|y) is not uniform over Θ, then the HPD region of content 1 − α is unique. Hereafter, we focus on the case in which there is a single parameter of interest and all other parameters have been integrated out of the posterior.

Constructing an HPD interval is conceptually straightforward. Part (b) of Definition 5.1 implies that, if [a, b] is an HPD interval, then p(a|y) = p(b|y). This suggests a graphical approach in which a horizontal line is gradually moved vertically downward across the posterior density, and where it intersects the posterior density, the corresponding abscissa values are noted, and the posterior is integrated between these points. Once the desired posterior probability is reached, the process stops. If the posterior density is symmetric and unimodal with mode θ^m, then the resulting 1 − α HPD interval is of the form [θ^m − δ, θ^m + δ], for suitable δ, and it cuts off equal probability α/2 in each tail.

Figure 5.1: A hypothetical posterior density (horizontal axis from −4 to 4, density values from 0 to 0.4).
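
The graphical procedure described above can be mimicked numerically. The following is a minimal sketch, not a definitive implementation: it evaluates a density on an evenly spaced grid and admits grid points in order of decreasing density until the requested content is reached. The function name `hpd_region`, the grid size, and both example densities are assumptions introduced here for illustration.

```python
from statistics import NormalDist

def hpd_region(pdf, lo, hi, content=0.95, n=20001):
    """Approximate an HPD region of the given content on a grid over [lo, hi].

    Admitting grid points from highest to lowest density plays the role of
    lowering a horizontal line across the density. The result is a list of
    (left, right) intervals; a multimodal density can yield several.
    """
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    dens = [pdf(x) for x in xs]
    total = sum(dens) * step                      # normalizing constant on the grid
    order = sorted(range(n), key=lambda i: dens[i], reverse=True)
    inside = [False] * n
    mass = 0.0
    for i in order:
        if mass >= content:
            break
        inside[i] = True
        mass += dens[i] * step / total
    # Collect maximal runs of adjacent admitted grid points.
    intervals, start = [], None
    for i, flag in enumerate(inside):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append((xs[start], xs[i - 1]))
            start = None
    if start is not None:
        intervals.append((xs[start], xs[-1]))
    return intervals

# Symmetric unimodal case: a standard normal density (hypothetical posterior).
one_mode = hpd_region(NormalDist().pdf, -6.0, 6.0, content=0.95)

# Bimodal case: an equal-weight mixture of two normals (hypothetical posterior).
left, right = NormalDist(-2.0, 0.5), NormalDist(2.0, 0.5)

def mixture(x):
    return 0.5 * left.pdf(x) + 0.5 * right.pdf(x)

two_modes = hpd_region(mixture, -6.0, 6.0, content=0.50)
print(one_mode)
print(two_modes)
```

For the standard normal the sketch should return a single interval close to [−1.96, 1.96], in line with the symmetric unimodal result stated above. For the mixture it should return two disjoint intervals, one around each mode. The accuracy is limited by the grid spacing.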

In (5.1) we started with the region and then found its posterior probability content. In Definition 5.1 we started with the posterior probability content and then found the smallest region with that content. From a pure decision-theoretic perspective, it is interesting to start with a cost function measuring the undesirability of large α and large volume of a region, and then to pick both α and C to minimize expected posterior cost [see Casella, Hwang, and Robert (1993)]. For a thorough discussion of both Bayesian and frequentist interval estimation, see Casella and Berger (2002).

Exercise 5.1 (Graphical determination of an HPD interval) Consider the posterior density in Figure 5.1. Use the graphical approach described earlier to obtain three sets of HPD intervals.

Figure 5.2: Three different HPD intervals (endpoints marked on the horizontal axis: a₁, a₂; b₁, b₂, b₃, b₄; c₁, c₂).

Solution

In Figure 5.2, we present three different HPD intervals associated with the posterior in Figure 5.1. The highest line in the figure produces the interval [a₁, a₂]. This interval is narrow and contains rather low posterior probability. If this process is repeated a second time, the horizontal line produces the disjoint intervals [b₁, b₂] and [b₃, b₄], which contain more posterior probability, but still relatively little. If these two intervals, however, contain the desired posterior probability, then the process stops and the HPD region consists of the union of these two disjoint intervals. If a higher posterior content is desired, the process repeats again until, say, the third horizontal line, which produces the interval [c₁, c₂]. If even more posterior content is desired, the process can be repeated again.

Exercise 5.2 (Calculating a posterior interval probability) Consider a random sample of size T = 2 from an exponential distribution with mean θ⁻¹ and a gamma prior distribution G(α̲, β̲) with α̲ = 2 and β̲ = 4. Suppose the resulting sample mean is ȳ = .125. Find the posterior probability of the credible interval [3.49, 15.5].

Solution

From Exercise 2.10 it follows that the posterior distribution of θ is G(ᾱ, β̄), where ᾱ = 2 + 2 = 4 and β̄ = [4⁻¹ + 2(.125)]⁻¹ = 2. The value β̄ = 2 implies that the gamma distribution reduces to a chi-square distribution with 2ᾱ = 8 degrees of freedom [see Poirier (1995, p. 100)]. Consulting a standard chi-square table [e.g., Poirier (1995, p. 662)], we see that Pr(χ²(8) ≤ 3.49) = .10 and Pr(χ²(8) ≤ 15.5) = .95. Therefore, the posterior probability of [3.49, 15.5] is .95 − .10 = .85.
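
The table lookup can be checked without tables. The sketch below relies on the closed-form c.d.f. of a chi-square variable with an even number of degrees of freedom (a gamma with integer shape); the helper name `chi2_cdf_even` and the variable names are assumptions made for this illustration.

```python
import math

def chi2_cdf_even(x, df):
    """CDF of a chi-square variable with a positive even number of degrees of freedom.

    For df = 2k the variable is a gamma with integer shape k and scale 2, so
    Pr(X <= x) = 1 - exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    k = df // 2
    tail = math.exp(-half) * sum(half**j / math.factorial(j) for j in range(k))
    return 1.0 - tail

T, ybar = 2, 0.125
alpha_prior, beta_prior = 2, 4

# Posterior hyperparameters as computed in the solution.
alpha_post = alpha_prior + T
beta_post = 1.0 / (1.0 / beta_prior + T * ybar)

# The chi-square reduction is only valid because the posterior scale equals 2.
assert abs(beta_post - 2.0) < 1e-12

lower = chi2_cdf_even(3.49, 2 * alpha_post)
upper = chi2_cdf_even(15.5, 2 * alpha_post)
prob = upper - lower
print(round(lower, 3), round(upper, 3), round(prob, 3))
```

The computed probabilities should agree with the tabulated .10, .95, and .85 to about three decimal places.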

Exercise 5.3 (HPD intervals under normal random sampling with known variance) Consider a random sample Y₁, Y₂, . . . , Y_T from an N(θ₁, θ₂⁻¹) distribution with known variance θ₂⁻¹ and an N(µ̲, θ₂⁻¹/T̲) prior for the unknown mean θ₁.

(a) Find the 1 − α HPD interval for θ₁.

(b) Suppose α = .05, θ₂ = .01, µ̲ = 8, T̲ = 5, T = 25, and ȳ = 2. Find the 1 − α HPD interval for θ₁.

(c) Consider the limiting case T̲ → 0, in which the prior for θ₁ becomes noninformative. Find the 1 − α HPD interval for θ₁.

(d) Suppose α = .05, θ₂ = .01, T = 25, and ȳ = 2. Find a 95 percent confidence interval and interpret your result.

Solution

(a) From Exercise 2.6(b) it follows that the posterior density for θ₁ is

p(θ₁|y) = φ(θ₁; µ̄, θ₂⁻¹/T̄),

where T̄ and µ̄ are given by (2.38) and (2.39), respectively. Since the posterior density is symmetric and unimodal with mode µ̄, the resulting HPD interval of content 1 − α cuts off equal probability α/2 in each tail. Thus, a 1 − α HPD interval for θ₁ is

µ̄ − z_{α/2}(θ₂⁻¹/T̄)^{1/2} < θ₁ < µ̄ + z_{α/2}(θ₂⁻¹/T̄)^{1/2}, (5.2)

where z_{α/2} is the standard normal critical value that cuts off α/2 probability in the right-hand tail. From the standpoint of subjective probability, (5.2) implies that the posterior probability of θ₁ lying between µ̄ ± z_{α/2}(θ₂⁻¹/T̄)^{1/2} is 1 − α.

(b) The given values correspond to an N(8, 20) prior distribution for θ₁. Using a standard normal table, it follows that z_{α/2} = 1.96. Plugging into (2.38) and (2.39) we obtain T̄ = T̲ + T = 5 + 25 = 30 and µ̄ = (T̲µ̲ + Tȳ)/T̄ = [5(8) + 25(2)]/30 = 3. Also note that θ₂⁻¹/T̄ = 100/30 = 10/3. Hence the posterior distribution for θ₁ is N(3, 10/3), and (5.2) implies the .95 HPD interval 3 ± 1.96(10/3)^{1/2}, or −.58 < θ₁ < 6.58.
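
The arithmetic in part (b) can be reproduced in a few lines. This is a minimal sketch using the updating formulas as they are displayed in the solution; the variable names are assumptions introduced here, and the critical value comes from Python's `statistics.NormalDist` rather than a table.

```python
from statistics import NormalDist

alpha = 0.05
theta2 = 0.01                 # known precision, so the variance is 1/theta2 = 100
mu_prior, T_prior = 8.0, 5    # prior mean and prior "sample size"
T, ybar = 25, 2.0             # sample size and sample mean

# Posterior hyperparameters: sizes add, and the means are size-weighted.
T_post = T_prior + T
mu_post = (T_prior * mu_prior + T * ybar) / T_post
var_post = (1.0 / theta2) / T_post

# Symmetric unimodal posterior: cut off alpha/2 in each tail.
z = NormalDist().inv_cdf(1 - alpha / 2)
half_width = z * var_post ** 0.5
hpd = (mu_post - half_width, mu_post + half_width)

# Posterior probability content of the interval, as a check.
posterior = NormalDist(mu_post, var_post ** 0.5)
content = posterior.cdf(hpd[1]) - posterior.cdf(hpd[0])
print(T_post, mu_post, hpd, content)
```

Up to rounding of the critical value, the endpoints should match the interval −.58 < θ₁ < 6.58 reported above.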

(c) From Exercise 2.6 it is seen that the posterior hyperparameters simplify to µ̄ = ȳ and T̄ = T, which corresponds to letting T̲ → 0 in (2.38) and (2.39). Therefore, HPD interval (5.2) reduces to

ȳ − z_{α/2}(θ₂⁻¹/T)^{1/2} < θ₁ < ȳ + z_{α/2}(θ₂⁻¹/T)^{1/2}, (5.3)

which is numerically identical to the standard non-Bayesian confidence interval. The interpretation of (5.3) is, however, in terms of “probability” and not “confidence”: Given Y = y, the ex post probability of θ₁ falling in (5.3) is 1 − α. The common misinterpretation of a classical confidence interval in terms of final precision is fortuitously correct in this special case.

(d) Plugging into (5.3) implies the .95 confidence interval

2 ± 1.96(100/25)^{1/2}, or −1.92 < θ₁ < 5.92.

The interpretation of this confidence interval is that it is a realization of a procedure that, upon repeated sampling, yields random intervals [Ȳ − 3.92, Ȳ + 3.92] that have an ex ante sampling probability of .95 of capturing θ₁. The realized interval −1.92 < θ₁ < 5.92 either does or does not capture the unknown constant θ₁. The frequentist “confidence” lies in the procedure used to obtain the realized interval.
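
The repeated-sampling interpretation can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the "true" value `theta1_true = 3.0`, the number of replications, and the seed are arbitrary choices made here, and the sample mean is drawn directly from its sampling distribution N(θ₁, θ₂⁻¹/T) instead of simulating each observation.

```python
import random
from statistics import NormalDist

def coverage(theta1_true, sigma2=100.0, T=25, alpha=0.05, reps=20000, seed=0):
    """Share of simulated intervals ybar +/- z * se that capture theta1_true."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = (sigma2 / T) ** 0.5              # standard error of the sample mean
    hits = 0
    for _ in range(reps):
        ybar = rng.gauss(theta1_true, se)  # one draw of the sample mean
        if ybar - z * se < theta1_true < ybar + z * se:
            hits += 1
    return hits / reps

rate = coverage(theta1_true=3.0)   # hypothetical true mean, for illustration only
print(rate)
```

The coverage rate should be close to .95 whatever true value is chosen, which is a statement about the procedure across samples and not about any single realized interval.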

Exercise 5.4 (HPD intervals under normal random sampling with unknown variance) Consider Exercise 5.3 but with θ₂ unknown. Suppose the joint prior distribution for θ = [θ₁ θ₂] is the normal-gamma distribution, denoted θ ∼ NG(µ̲, q̲, s̲⁻², ν̲), with density given in (2.26).

(a) Find the 1 − α HPD interval for θ₁.

(b) Suppose µ̲ = 8, q̲ = .20, s̲⁻² = .25, and ν̲ = 4. Find the marginal prior density for θ₁. How does it compare to the prior in Exercise 5.3(b)?

(c) Suppose α = .05, T = 25, ȳ = 2, and s² = 102.5. Find the 1 − α HPD interval for θ₁.

(d) Using the values in (c), find the 1 − α HPD interval for θ₂.

Solution

(a) The posterior distribution for θ is NG(µ̄, q̄, s̄⁻², ν̄), as given in the solution to Exercise 2.4. By using the solution to Exercise 2.8, it follows immediately that the marginal posterior distribution of θ₁ is t(θ₁; µ̄, s̄²q̄, ν̄). Hence, analogous to (5.2), a 1 − α HPD interval for θ₁ is

µ̄ − t_{ν̄,α/2}(s̄²q̄)^{1/2} < θ₁ < µ̄ + t_{ν̄,α/2}(s̄²q̄)^{1/2}, (5.4)

where t_{ν̄,α/2} cuts off α/2 probability in the right-hand tail of a Student t-distribution with ν̄ degrees of freedom. From the standpoint of subjective probability, (5.4) implies that the posterior probability of θ₁ lying between µ̄ ± t_{ν̄,α/2}(s̄²q̄)^{1/2} is 1 − α.

(b) Again using the solution to Exercise 2.8, the given hyperparameter values imply that the marginal prior distribution for θ₁ is t(θ₁; 8, 20, 4). This is similar to the prior in Exercise 5.3(b): It has the same location but fatter tails.

(c) For the given values, the posterior hyperparameters are ν̄ = 29,

q̄ = [.20⁻¹ + 25]⁻¹ = 1/30, µ̄ = [(.20)⁻¹(8) + (25)(2)]/30 = 3, and

s̄² = 29⁻¹[4(.25) + 24(102.5) + (.20 + 25⁻¹)⁻¹(2 − 8)²] = 90.03.

Hence, the posterior distribution for θ₁ is t(θ₁; 3, 3.00, 29). Using a t-table gives t₂₉(.975) = 2.045. Therefore, (5.4) implies the .95 HPD interval

3 ± 2.045[3.00]^{1/2}, or −.54 < θ₁ < 6.54,

similar to the solution of Exercise 5.3(b).
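
The following sketch simply retraces the arithmetic displayed in part (c). The variable names are assumptions made here, the prior term is entered as 4(.25) exactly as it appears in the displayed expression, and the critical value 2.045 is the tabulated figure quoted in the text because Python's standard library has no Student t quantile function.

```python
T, ybar, s2 = 25, 2.0, 102.5
mu_prior, q_prior, nu_prior = 8.0, 0.20, 4
prior_term = 4 * 0.25          # entered exactly as displayed in the solution

# Posterior hyperparameters, following the displayed expressions term by term.
nu_post = nu_prior + T
q_post = 1.0 / (1.0 / q_prior + T)
mu_post = (mu_prior / q_prior + T * ybar) * q_post
s2_post = (prior_term
           + (T - 1) * s2
           + (ybar - mu_prior) ** 2 / (q_prior + 1.0 / T)) / nu_post

scale = s2_post * q_post       # squared scale of the marginal t posterior
t_crit = 2.045                 # t-table value for 29 degrees of freedom (from the text)
half_width = t_crit * scale ** 0.5
hpd = (mu_post - half_width, mu_post + half_width)
print(nu_post, q_post, mu_post, round(s2_post, 2), round(scale, 2), hpd)
```

The output should reproduce the values 1/30, 3, 90.03, and 3.00, and an interval that rounds to −.54 < θ₁ < 6.54.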

(d) From the definition of the normal-gamma distribution, the marginal posterior distribution for the precision θ₂ is γ(s̄⁻², ν̄) = γ(.011, 29). Table 5.1 provides values of the p.d.f. and c.d.f. of a γ(.011, 29) distribution. From inspection we see that the interval [a, b] with equal p.d.f. ordinates and that contains .95 posterior probability is [.005736, .01693].

Table 5.1: p.d.f. and c.d.f. values of a γ(.011, 29) random variable.

θ₂         p.d.f.    c.d.f.
.00200     .0018     .0000
.00270     .0405     .0000
.00300     .1136     .0000
.00390     1.212     .0005
.00550     15.55     .0106
.005736    20.15     .0148
.00600     26.21     .0209
.00800     93.59     .1368
.00900     124.4     .2468
.01030     140.9     .4225
.01080     139.1     .4927
.01300     96.14     .7586
.01500     48.75     .9015
.01666     23.02     .9591
.01693     20.15     .9648
.01720     17.50     .9700
.02000     3.466     .9948
.02160     1.213     .9983
.02620     .0405     .9999
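
The two defining properties of the HPD interval read from Table 5.1 (equal ordinates at the endpoints and .95 content between them) can be checked numerically. The sketch below assumes that γ(µ, ν) denotes a gamma distribution with mean µ and ν degrees of freedom, that is, shape ν/2 and rate ν/(2µ), and it uses s̄² = 90.03 for the inverse of the mean. This parameterization is an assumption adopted here; it does, however, reproduce the tabulated ordinates. The helper names are likewise hypothetical.

```python
import math

nu = 29
s2_post = 90.03                # posterior s-bar squared, so the mean is 1 / 90.03
shape = nu / 2.0               # assumed gamma shape
rate = nu * s2_post / 2.0      # assumed gamma rate; mean = shape / rate

def pdf(x):
    """Gamma density evaluated on the log scale for numerical stability."""
    log_f = (shape * math.log(rate) + (shape - 1.0) * math.log(x)
             - rate * x - math.lgamma(shape))
    return math.exp(log_f)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

a, b = 0.005736, 0.01693       # endpoints read from Table 5.1
content = simpson(pdf, a, b)
print(round(pdf(a), 2), round(pdf(b), 2), round(content, 4))
```

The two ordinates should agree to within about one percent (the endpoints are rounded), and the integrated content should be close to .95, consistent with the difference .9648 − .0148 in the table.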

Exercise 5.5 (Berger and Wolpert [1988, pp. 5–6]) To further clarify the difference between a Bayesian posterior interval and a confidence interval, consider the following. Given an unknown parameter θ, −∞ < θ < ∞, suppose Y_i (i = 1, 2) are iid binary random variables with probabilities of occurrence equally distributed over points of support θ − 1 and θ + 1. That is, Pr(Y_i = θ − 1|θ) = 1/2 and Pr(Y_i = θ + 1|θ) = 1/2. Suppose the prior for θ is constant. Such an “improper” prior will be discussed in Chapter 8.

(a) Suppose we observe y₁ ≠ y₂. Find the posterior distribution of θ.

(b) Suppose we observe y₁ = y₂. Find the posterior distribution of θ.

(c) Consider the estimator

θ̂ = (1/2)(Y₁ + Y₂) if Y₁ ≠ Y₂,
θ̂ = Y₁ − 1 if Y₁ = Y₂. (5.5)

In repeated sampling, what is the ex ante probability that (5.5) contains θ? Note that the answer is the same if the second line of (5.5) is changed to Y₁ + 1 if Y₁ = Y₂.

Solution

(a) If y₁ ≠ y₂, then one of the values must equal θ − 1 and the other equals θ + 1. Ex post, averaging these two values it is absolutely certain that θ = (1/2)(y₁ + y₂); that is, Pr[θ = (1/2)(y₁ + y₂)|y₁, y₂] = 1.

(b) If y₁ = y₂ = y, say, then the common value y is either θ − 1 or θ + 1. Since the prior does not distinguish among values of θ, ex post, it is equally uncertain whether θ = y − 1 or θ = y + 1; that is, Pr(θ = y − 1|y₁, y₂) = Pr(θ = y + 1|y₁, y₂) = 1/2.

(c) Parts (a) and (b) suggest the ex post (i.e., posterior) probability that (5.5) equals θ is either 1 or 1/2, depending on whether y₁ ≠ y₂ or y₁ = y₂. Given the data, we can, of course, determine which posterior probability applies. From the ex ante perspective, however, Pr(Y₁ = Y₂) = Pr(Y₁ ≠ Y₂) = 1/2. Therefore, ex ante, the probability that (5.5) contains θ is an equally weighted average of our two ex post coverage probabilities, and thus the ex ante sampling probability of (5.5) containing θ is (1/2)(1) + (1/2)(1/2) = .75.

The embarrassing question for the pure frequentist is this: Why use the realized value of (5.5) to estimate θ and then report the ex ante confidence level 75 percent instead of the appropriate ex post measure of uncertainty?
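
Because the sample space has only four equally likely outcomes, the ex ante coverage of (5.5) can be enumerated exactly. This is a minimal sketch; the function names and the particular integer values of θ used below are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

def estimator(y1, y2):
    """Rule (5.5): the midpoint if the observations differ, otherwise y1 - 1."""
    return Fraction(y1 + y2, 2) if y1 != y2 else y1 - 1

def ex_ante_coverage(theta):
    """Exact sampling probability that rule (5.5) returns theta."""
    half = Fraction(1, 2)
    prob = Fraction(0)
    # Each Y_i equals theta - 1 or theta + 1 with probability 1/2, independently.
    for y1, y2 in product((theta - 1, theta + 1), repeat=2):
        if estimator(y1, y2) == theta:
            prob += half * half
    return prob

print(ex_ante_coverage(7), ex_ante_coverage(-2))
```

The result is 3/4 for any integer θ supplied, matching (1/2)(1) + (1/2)(1/2) = .75: the rule succeeds in both outcomes with Y₁ ≠ Y₂ and in exactly one of the two outcomes with Y₁ = Y₂.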

Exercise 5.6 (Conditional frequentist reasoning and ancillarity) The preceding exercise demonstrates the important difference between frequentist ex ante and Bayesian ex post reasoning. The latter is consistent with the likelihood principle [see Berger and Wolpert (1988)], which, loosely speaking, states that two experiments involving the same unknown parameter θ that give rise to proportional likelihood functions contain the same evidence about θ. Likelihood principle proponents condition on all the data, whereas pure frequentist reasoning averages unconditionally over all the data that could have possibly been observed. Conditional frequentists lie somewhere between likelihood principle proponents and pure frequentists, arguing that inference should be conditional on ancillary statistics (statistics whose distribution does not depend on θ), but otherwise be unconditional. Conditional inference goes a long way toward eliminating some of the embarrassing problems in the pure frequentist approach. Demonstrate this observation using the design of Exercise 5.5 in terms of the statistic Z = |Y₁ − Y₂|.

Solution

First, note that Z is ancillary since its p.m.f. is Pr(Z = 0) = Pr(Z = 2) = 1/2, which does not depend on θ. The coverage probabilities of (5.5), conditioned on Z, are the appealing ex post probabilities of .5 and 1, respectively, for Z = 0 and Z = 2. Therefore, the conditional frequentist inference in this case is identical to the Bayesian posterior probabilities.
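
A short simulation illustrates the conditioning argument. The sketch below is an illustration only: the value θ = 0, the number of replications, and the seed are arbitrary assumptions, and the function name is hypothetical. It tabulates how often rule (5.5) returns θ separately for Z = 0 and Z = 2.

```python
import random

def conditional_coverage(theta=0, reps=20000, seed=1):
    """Simulate rule (5.5) and report coverage and frequency for each value of Z."""
    rng = random.Random(seed)
    hits = {0: 0, 2: 0}
    counts = {0: 0, 2: 0}
    for _ in range(reps):
        y1 = theta + rng.choice((-1, 1))
        y2 = theta + rng.choice((-1, 1))
        z = abs(y1 - y2)                           # ancillary statistic, 0 or 2
        est = (y1 + y2) / 2 if y1 != y2 else y1 - 1
        counts[z] += 1
        if est == theta:
            hits[z] += 1
    coverage = {z: hits[z] / counts[z] for z in counts}
    frequency = {z: counts[z] / reps for z in counts}
    return coverage, frequency

coverage_by_z, freq_by_z = conditional_coverage()
print(coverage_by_z, freq_by_z)
```

Coverage given Z = 2 is exactly 1, coverage given Z = 0 should be near .5, and each value of Z should occur in roughly half of the samples. Changing `theta` leaves the frequencies of Z unaffected, which is what ancillarity means here.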

Exercise 5.7 (HPD intervals and reparameterization) Given the likelihood L(θ) and the prior p.d.f. p(θ), let γ = g(θ) be a one-to-one transformation of θ. Also, let A be a 1 − α HPD region for θ and define B = {γ : γ = g(θ), θ ∈ A}. Is B a 1 − α HPD region for γ?

Solution

No. Although condition (a) in Definition 5.1 holds, condition (b) does not. To see this, consider the following. The fact that A is a 1 − α HPD interval for θ implies that if θ₁ ∈ A and θ₂ ∉ A, then p(θ₁|y) ≥ p(θ₂|y). By definition of B, γ₁ ∈ B iff θ₁ = g⁻¹(γ₁) ∈ A, and γ₂ ∉ B iff θ₂ = g⁻¹(γ₂) ∉ A. Suppose γ₁ ∈ B and γ₂ ∉ B. By a change of variables,

p(γ₁|y) = |∂g⁻¹(γ₁)/∂γ| p_{θ|y}[g⁻¹(γ₁)], p(γ₂|y) = |∂g⁻¹(γ₂)/∂γ| p_{θ|y}[g⁻¹(γ₂)].

Therefore, although p_{θ|y}[g⁻¹(γ₁)] ≥ p_{θ|y}[g⁻¹(γ₂)], the Jacobian terms cannot be ordered. In other words, the posterior probability content is maintained under reparameterization, but the minimal length is not.
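
A concrete case makes the role of the Jacobian visible. The sketch below is an illustration under assumptions that are not part of the exercise: the posterior for θ is taken to be standard normal and the transformation is γ = exp(θ), so that g⁻¹(γ) = ln γ and the Jacobian term is 1/γ.

```python
import math
from statistics import NormalDist

post = NormalDist(0.0, 1.0)    # hypothetical posterior for theta
alpha = 0.05
z = post.inv_cdf(1 - alpha / 2)

A = (-z, z)                              # HPD interval for theta
B = (math.exp(A[0]), math.exp(A[1]))     # image of A under gamma = exp(theta)

def pdf_gamma(g):
    """Posterior density of gamma via change of variables: p_theta(ln g) / g."""
    return post.pdf(math.log(g)) / g

# Condition (a): the probability content carries over to B.
content_B = post.cdf(math.log(B[1])) - post.cdf(math.log(B[0]))

# Condition (b): an HPD interval needs equal ordinates at its endpoints.
print(post.pdf(A[0]), post.pdf(A[1]))    # equal at the endpoints of A
print(pdf_gamma(B[0]), pdf_gamma(B[1]))  # unequal at the endpoints of B
print(content_B)
```

The content of B is still .95, but the density of γ at the lower endpoint of B is many times larger than at the upper endpoint. Points just above the upper endpoint of B therefore have lower density than points just below its lower endpoint, so a shorter interval with the same content exists and B is not an HPD interval for γ.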
