Interval estimation - The empirical Bayes approach


3.5 Interval estimation

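Obtaining an empirical Bayes confidence interval (EBCI) is, in principle, very simple. Given an estimated posterior $p(\theta_i \mid y_i, \hat\eta)$, we could use it as we would any other posterior distribution to obtain an HPD or equal tail credible set for $\theta_i$. That is, if $q_\alpha(y_i, \eta)$ is such that

$$P\left(\theta_i \le q_\alpha(y_i, \eta) \mid \theta_i \sim p(\theta_i \mid y_i, \eta)\right) = \alpha,$$

then a $100(1-\alpha)\%$ equal tail naive EBCI for $\theta_i$ is

$$\left( q_{\alpha/2}(y_i, \hat\eta),\; q_{1-\alpha/2}(y_i, \hat\eta) \right).$$

Why is this interval "naive?" From introductory mathematical statistics we have that

$$\mathrm{Var}(\theta_i \mid \mathbf{y}) = E_{\eta \mid \mathbf{y}}\left[\mathrm{Var}(\theta_i \mid y_i, \eta)\right] + \mathrm{Var}_{\eta \mid \mathbf{y}}\left[E(\theta_i \mid y_i, \eta)\right]. \qquad (3.33)$$

In the Gaussian/Gaussian case, a 95% naive EBCI would be

$$E(\theta_i \mid y_i, \hat\eta) \pm 1.96 \sqrt{\mathrm{Var}(\theta_i \mid y_i, \hat\eta)}.$$

The term under the square root approximates the first term in (3.33), since the EB approach simply replaces integration with maximization. But this corresponds to only the first term in (3.33); the naive EBCI is ignoring the posterior uncertainty about $\eta$ (the second term in (3.33)). Hence, the naive interval may be too short, and have lower than advertised coverage probability.

To remedy this, we must first define the notion of "EB coverage".

Definition 3.1 $C_\alpha(\mathbf{y})$ is a $(1-\alpha)100\%$ unconditional EB confidence set for $\theta_i$ if and only if for each $\eta$,

$$P_{\mathbf{y}, \theta_i \mid \eta}\left(\theta_i \in C_\alpha(\mathbf{y})\right) \approx 1 - \alpha.$$

So we are evaluating the performance of the EBCI over the variability inherent in both $\theta_i$ and the data. But is this too weak a requirement? Many authors have suggested instead conditioning on some data summary, $b(\mathbf{y})$. For example, taking $b(\mathbf{y}) = \mathbf{y}$ produces fully conditional (Bayesian) coverage. Many likelihood theorists would take $b(\mathbf{y})$ equal to an appropriate ancillary statistic instead (see Hill, 1990). Or we might simply take $b(\mathbf{y}) = y_i$, on the grounds that $y_i$ is sufficient for $\theta_i$ when $\eta$ is known.

Definition 3.2 $C_\alpha(\mathbf{y})$ is a $100(1-\alpha)\%$ conditional EB confidence set for $\theta_i$ given $b(\mathbf{y})$ if, for each $b(\mathbf{y}) = b$ and $\eta$,

$$P\left(\theta_i \in C_\alpha(\mathbf{y}) \mid b(\mathbf{y}) = b, \eta\right) \approx 1 - \alpha.$$

Naive EB intervals typically fail to attain their nominal coverage probability, in either the conditional or unconditional EB sense (or in the frequentist sense). One cannot usually show this analytically, but it is often easy to check via simulation. We devote the remainder of this section to outlining methods that have been proposed for "correcting" the naive EBCI.

3.5.1 Morris' approach

For the Gaussian/Gaussian model (3.9), Morris (1983a) suggests basing the EBCI on the modified estimated posterior that uses the "naive" mean, but inflates the variance to capture the second term in equation (3.33). This distribution is

$$N\left( \hat{B}\hat\mu + (1-\hat{B})\, y_i,\;\; \sigma^2\left(1 - \frac{k-1}{k}\hat{B}\right) + \frac{2}{k-3}\hat{B}^2 (y_i - \bar{y})^2 \right), \qquad (3.34)$$

where $\hat\mu = \bar{y}$ and $\hat{B} = \sigma^2/(\sigma^2 + \hat\tau^2)$. Morris actually proposes estimating shrinkage by

$$\hat{B} = \frac{k-3}{k-1} \cdot \frac{\sigma^2}{\sigma^2 + \hat\tau^2},$$

where

$$\hat\tau^2 = \max\left\{0,\; \frac{1}{k-1}\sum_{i=1}^{k}(y_i - \bar{y})^2 - \sigma^2 \right\},$$

which is the result of several "ingenious adhockeries" (see Lindley, 1983) designed to better approximate (3.33).

The first term in the variance in (3.34) approximates the first term in (3.33), and is essentially the naive EB variance estimate. The second term in (3.34) approximates the second term in (3.33), and thus serves to "correct" the naive EB interval by widening it somewhat. Notice the amount of correction decreases to zero as $k$ increases, as the estimated shrinkage factor $\hat{B}$ decreases, or as the data-value $y_i$ approaches the estimated prior mean $\hat\mu$.

It is important to remember that this entire derivation assumes $\sigma^2$ is known; if it is unknown, a data-based estimate may be substituted. Morris offers evidence (though not a formal proof) that his intervals do attain the desired nominal coverage. Extension of these ideas to non-Gaussian and higher dimensional settings is possible but awkward; see Morris (1988) and Christiansen and Morris (1997).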
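To make the comparison concrete, the following is a minimal sketch of the naive and Morris-style intervals for the Gaussian/Gaussian model with known $\sigma^2$. It assumes the inflated variance takes the form $\sigma^2(1 - \frac{k-1}{k}\hat{B}) + \frac{2}{k-3}\hat{B}^2(y_i - \bar{y})^2$ with $\hat{B} = \frac{k-3}{k-1}\,\sigma^2/(\sigma^2 + \hat\tau^2)$, and that $\hat\tau^2$ is the truncated moment estimate; the function name `gaussian_ebci` and the data values are hypothetical and purely illustrative.

```python
import numpy as np
from statistics import NormalDist


def gaussian_ebci(y, sigma2, level=0.95):
    """Naive and Morris-style EB intervals for Y_i ~ N(theta_i, sigma2),
    theta_i ~ N(mu, tau2), with sigma2 assumed known.

    Returns two (k, 2) arrays of (lower, upper) limits.
    """
    y = np.asarray(y, dtype=float)
    k = y.size
    if k < 4:
        raise ValueError("the variance correction needs k > 3")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    ybar = y.mean()                                # estimated prior mean
    tau2_hat = max(0.0, y.var(ddof=1) - sigma2)    # truncated moment estimate
    b_naive = sigma2 / (sigma2 + tau2_hat)         # plug-in shrinkage factor
    b_morris = (k - 3) / (k - 1) * b_naive         # assumed Morris shrinkage

    def interval(b, var):
        centre = b * ybar + (1.0 - b) * y          # shrink y_i toward ybar
        half = z * np.sqrt(var)
        return np.column_stack([centre - half, centre + half])

    # Naive variance: the estimated posterior variance only.
    naive_var = np.full(k, (1.0 - b_naive) * sigma2)
    # Inflated variance: adds a term for the uncertainty in the prior mean
    # and in the shrinkage factor.
    morris_var = (sigma2 * (1.0 - (k - 1) / k * b_morris)
                  + 2.0 / (k - 3) * b_morris**2 * (y - ybar) ** 2)
    return interval(b_naive, naive_var), interval(b_morris, morris_var)


if __name__ == "__main__":
    data = [1.2, -0.4, 3.1, 0.8, 2.5, -1.0, 4.2, 1.9]   # made-up observations
    naive, morris = gaussian_ebci(data, sigma2=1.0)
    for row_n, row_m in zip(naive, morris):
        print(np.round(row_n, 2), np.round(row_m, 2))
```

Under these assumptions the corrected interval is always wider than the naive one, and the extra width is largest for observations far from $\bar{y}$.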

3.5.2 Marginal posterior approach

This approach is similar to Morris' in that we mimic a fully Bayesian calculation, but a bit more generally. Since $\eta$ is unknown, we place a hyperprior $h(\eta)$ on $\eta$, and base inference about $\theta_i$ on the marginal posterior,

$$l_h(\theta_i \mid \mathbf{y}) = \int p(\theta_i \mid \mathbf{y}, \eta)\, h(\eta \mid \mathbf{y})\, d\eta,$$

where $h(\eta \mid \mathbf{y}) \propto m(\mathbf{y} \mid \eta)\, h(\eta)$.

Several simplifications to this procedure are available for many models. First, if $\hat\eta = \hat\eta(\mathbf{y})$ is sufficient for $\eta$ in the marginal family $m(\mathbf{y} \mid \eta)$, then we can replace $h(\eta \mid \mathbf{y})$ by $h(\eta \mid \hat\eta)$, which conditions only on a univariate statistic. Second, note that

$$p(\theta_i \mid \mathbf{y}, \eta) \propto f(y_i \mid \theta_i)\, \pi(\theta_i \mid \eta), \qquad (3.35)$$

the product of the sampling model and the prior. Hence $p(\theta_i \mid \mathbf{y}, \eta)$ depends only on $y_i$ and $\eta$, and the marginal posterior can be written as

$$l_h(\theta_i \mid y_i, \hat\eta) = \int p(\theta_i \mid y_i, \eta)\, h(\eta \mid \hat\eta)\, d\eta. \qquad (3.36)$$

Appropriate percentiles of the marginal posterior $l_h$ (instead of the estimated posterior) determine the EBCI. This is an intuitively reasonable approach, since $l_h$ accounts explicitly for the uncertainty in $\eta$. In other words, mixing $p(\theta_i \mid y_i, \eta)$ with respect to $h$ should produce wider intervals.

As indicated by expression (3.35), the first term in the integral in equation (3.36) will typically be known due to the conjugate structure of the hierarchy. However, two issues remain. First, will $h$ be available? And second, even if so, can the integral be computed analytically?

Deely and Lindley (1981) were the first to answer both of these questions affirmatively, obtaining closed-form results in the Poisson/gamma and "signal detection" (i.e., where $\theta_i$ is a 0-1 random variable) cases. They referred to the approach as "Bayes empirical Bayes," since placing a hyperprior on $\eta$ is essentially a fully Bayesian solution to the EB problem. However, the first general method for implementing the marginal posterior approach is due to Laird and Louis (1987), who used the bootstrap. Their idea was to take $h(\eta \mid \hat\eta) = g(\hat\eta \mid \eta)$, the sampling density of $\hat\eta$ given $\eta$ with the arguments interchanged. We then approximate $l_h$ by observing

$$l_h(\theta_i \mid y_i, \hat\eta) \approx \frac{1}{N} \sum_{j=1}^{N} p(\theta_i \mid y_i, \eta_j^*), \qquad (3.37)$$

where $\eta_j^* \overset{iid}{\sim} g(\cdot \mid \hat\eta)$, $j = 1, \ldots, N$. Notice that this estimator converges to $l_h$ as $N \to \infty$ by the Law of Large Numbers. The values $\eta_j^*$ are easily generated via what the authors refer to as the Type III Parametric Bootstrap: given $\hat\eta$, we draw $\theta_i^* \sim \pi(\theta \mid \hat\eta)$ and then draw $y_i^* \sim f(y \mid \theta_i^*)$ for $i = 1, \ldots, k$. From the bootstrapped data sample $\mathbf{y}^* = (y_1^*, \ldots, y_k^*)$ we may compute $\eta^* = \hat\eta(\mathbf{y}^*)$. Repeating this process $N$ times, we obtain $\eta_j^*$, $j = 1, \ldots, N$ for use in equation (3.37), the quantiles of which, in turn, provide our corrected EBCI.

Figure 3.3 Diagram of the Laird and Louis Type III Parametric Bootstrap.

Figure 3.3 provides a graphical illustration of the parametric bootstrap process: it simply mimics the process generating the data, replacing the unknown $\eta$ by its estimate $\hat\eta$. The "parametric" name arises because the process is entirely generated from a single parameter estimate $\hat\eta$, rather than by resampling from the data vector $\mathbf{y}$ itself (a "nonparametric bootstrap"). Hence we can easily draw from $g(\cdot \mid \hat\eta)$ even when this function is not available analytically (e.g., when $\hat\eta$ itself must be computed numerically, as in many marginal MLE settings).

Despite this computational convenience, the reader might well question the wisdom of taking $h(\eta \mid \hat\eta) = g(\hat\eta \mid \eta)$. An ostensibly more sensible approach would be to choose a reasonable $h(\eta)$, compute $h(\eta \mid \hat\eta)$, and obtain an estimator of the corresponding $l_h$. This marginal posterior would match a prespecified hyperprior Bayes solution, and, unlike the bootstrap-based version, would not be sensitive to the precise choice of $\hat\eta$. However, if EB coverage is the objective, then the bootstrap-based choice may be as good as $h$. After all, taking quantiles of $l_h$ will lengthen the naive EBCI, but need not "correct" the interval to level $1-\alpha$ as we desire. Still, Carlin and Gelfand (1990) show how to use the Type III parametric bootstrap to match any $l_h$, provided the sampling density of $\hat\eta$ is available in closed form.
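The Type III parametric bootstrap can be sketched in a few lines for the Gaussian/Gaussian model with known $\sigma^2$, where $\eta = (\mu, \tau^2)$. This is only an illustration under assumptions of our own: the hyperparameters are estimated by simple marginal moments, and the quantiles of the mixture in (3.37) are approximated by sampling from it rather than by inverting its cdf. The name `laird_louis_ebci` and the data are hypothetical.

```python
import numpy as np


def laird_louis_ebci(y, sigma2, i, level=0.95, n_boot=400,
                     draws_per_boot=50, seed=0):
    """Bootstrap-corrected EB interval for theta_i (Gaussian/Gaussian sketch).

    Each replicate redraws theta* from the estimated prior and y* from the
    sampling model, re-estimates eta* from y*, and then samples theta_i from
    the posterior p(theta_i | y_i, eta*) using the observed y_i.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    k = y.size

    def estimate(data):
        # Marginal moment estimates of (mu, tau2); an assumed choice of eta-hat.
        return data.mean(), max(0.0, data.var(ddof=1) - sigma2)

    mu_hat, tau2_hat = estimate(y)
    theta_draws = []
    for _ in range(n_boot):
        theta_star = rng.normal(mu_hat, np.sqrt(tau2_hat), size=k)  # prior at eta-hat
        y_star = rng.normal(theta_star, np.sqrt(sigma2))            # bootstrap data
        mu_s, tau2_s = estimate(y_star)                             # eta*_j
        b = sigma2 / (sigma2 + tau2_s)
        post_mean = b * mu_s + (1.0 - b) * y[i]
        post_sd = np.sqrt((1.0 - b) * sigma2)
        theta_draws.append(rng.normal(post_mean, post_sd, size=draws_per_boot))

    # Quantiles of the equally weighted mixture of the N posteriors.
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(np.concatenate(theta_draws), [tail, 1.0 - tail])
    return lower, upper


if __name__ == "__main__":
    data = [1.2, -0.4, 3.1, 0.8, 2.5, -1.0, 4.2, 1.9]   # made-up observations
    print(laird_louis_ebci(data, sigma2=1.0, i=6))
```

Sampling from the mixture keeps the sketch short; with a closed-form posterior cdf one could instead average the $N$ cdfs and solve for the quantiles with a rootfinder.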

3.5.3 Bias correction approach

A problem with the marginal posterior approach is that it is difficult to say how to pick a good hyperprior $h$ (i.e., one that will result in quantiles that actually achieve the nominal EB coverage rate). In fact, if $\hat\eta$ is badly biased, the naive EBCI may not be too short, but too long. Thus, in general, what is needed is not a method that will widen the naive interval, but one that will correct it.

Suppose we attempt to tackle the problem of EB coverage directly. Recall that $q_\alpha(y_i, \eta)$ is the $\alpha$ quantile of $p(\theta_i \mid y_i, \eta)$. Define

$$r(\hat\eta, \eta, y_i, \alpha) = P\left(\theta_i \le q_\alpha(y_i, \hat\eta) \mid \theta_i \sim p(\theta_i \mid y_i, \eta)\right)$$

and

$$R(\eta, y_i, \alpha) = E_{\hat\eta \mid y_i, \eta}\left\{ r(\hat\eta, \eta, y_i, \alpha) \right\}.$$

Hence $R$ is the true EB coverage, conditional on $y_i$, of the naive EB tail area. Usually the naive EBCI is too short, i.e., for a lower tail area, $R(\eta, y_i, \alpha) > \alpha$. If we solved $R(\eta, y_i, \alpha') = \alpha$ for $\alpha'$, then using this $\alpha'$ with $\hat\eta$ would conditionally "correct the bias" in our naive procedure and give us intervals with the desired conditional EB coverage. But of course $\eta$ is unknown, so instead we might solve

$$\hat{R}(y_i, \alpha') \equiv R(\hat\eta, y_i, \alpha') = \alpha \qquad (3.38)$$

for $\alpha'$, and take the naive interval with $\alpha$ replaced by $\alpha'$ as our corrected confidence interval. We refer to this interval as a conditionally bias corrected EBCI. For unconditional EB correction, we can replace $\hat{R}$ by its average over the marginal distribution of $y_i$,

$$\hat{R}^*(\alpha') = E_{y_i \mid \hat\eta}\left\{ R(\hat\eta, y_i, \alpha') \right\},$$

and solve $\hat{R}^*(\alpha') = \alpha$ for $\alpha'$. The naive interval with $\alpha$ replaced by this $\alpha'$ is called the unconditionally bias corrected EBCI.

Implementation. If the sampling density of $\hat\eta$ is available in closed form (e.g., the Gaussian/Gaussian and exponential/inverse gamma models), then solving (3.38) requires only traditional numerical integration and a rootfinding algorithm. If it is not available (e.g., if $\hat\eta$ itself is not available analytically, as in the Poisson/gamma and beta/binomial models), then solving (3.38) must be done via Monte Carlo methods. In particular, Carlin and Gelfand (1991a) show how the Type III parametric bootstrap may be used in this regard. Notice that in either case, since the prior $\pi(\theta_i \mid \eta)$ is typically chosen to be conjugate with $f(y_i \mid \theta_i)$, $r$ is available as

$$r(\hat\eta, \eta, y_i, \alpha) = P_\eta\left(q_\alpha(y_i, \hat\eta) \mid y_i\right),$$

where $P_\eta(\cdot \mid y_i)$ is the cdf corresponding to $p(\theta_i \mid y_i, \eta)$. So even when Monte Carlo methods must be employed, mathematical subroutines can be employed at this innermost step.

To evaluate whether bias correction is truly effective, we must check whether

$$E_{\hat\eta \mid y_i, \eta}\left\{ r\left(\hat\eta, \eta, y_i, \alpha'(\hat\eta, y_i)\right) \right\} \approx \alpha, \qquad (3.39)$$

where $\alpha'(\hat\eta, y_i)$ denotes the solution to (3.38). That is, since $\alpha'$ is itself a function of $\hat\eta$, the left-hand side of (3.39) is the true coverage of the bias corrected EB tail area. If $\hat\eta \to \eta$ as $k \to \infty$ (i.e., $\hat\eta$ is consistent for $\eta$), then (3.39) would certainly hold for large $k$, but, in this event, the naive EB interval would do fine as well! For fixed $k$, Carlin and Gelfand (1990) provide conditions (basically, stochastic ordering of the distributions involved) under which the true coverage of the bias corrected EB tail area lies in an interval containing the nominal coverage level $\alpha$.

Example 3.3 We now consider the exponential/inverse gamma model, for which the marginal distribution for $y_i$ is available in closed form, as is the marginal MLE $\hat\eta$. Consider bias correcting the lower tail of the EBCI. Solving for $\alpha'$ (unconditional bias correction), and plotting the values obtained as a function of $\hat\eta$, we see that $\alpha'$ is close to $\alpha$ for $\hat\eta$ near 1, but decreases steadily as $\hat\eta$ increases. The behavior for the upper tail is similar, except that $\alpha'$ is now an increasing function of $\hat\eta$ (recall $\alpha$ is now near 0.95).

To evaluate the ability of the various methods considered in this section to achieve nominal EB coverage in this example, we simulated their unconditional EB coverage probabilities and average interval lengths for two nominal coverage levels (90% and 95%), where we have set the true value of $\eta = 2$ and $k = 5$. Our simulation used 3000 replications, and set $N = 400$ in those methods that required a parametric bootstrap. In addition to the methods presented in this section, we included the classical (frequentist) interval and two intervals that arise from matching a specific hyperprior Bayes solution as in equation (3.36). The two hyperpriors we consider are both noninformative and improper.

Table 3.4 Comparison of simulated unconditional EB coverage probabilities, exponential/inverse gamma model.

Looking at the results in Table 3.4, we see that the classical method faithfully produces intervals with the proper coverage level, but which are extremely long relative to all the EB methods. The naive EB intervals also perform as expected, having coverage probabilities significantly below the nominal level. The intervals based on bias correction and the Laird and Louis bootstrap approach both perform well in obtaining nominal coverage, with the former being somewhat shorter for nominal coverage .90 and the latter a bit shorter for nominal coverage .95. Finally, of the two hyperprior matching intervals, only one performs well, with the intervals based on the other being too short. This highlights the difficulty in choosing hyperpriors that have the desired result with respect to empirical Bayes coverage.

On a related note, the proper choice of marginal hyperparameter estimate $\hat\eta$ can also have an impact on coverage, and this impact is difficult to predict. Additional simulations (not shown) suggest that the best choice for $\hat\eta$ is not the marginal MLE, but the marginal uniformly minimum variance unbiased estimate (UMVUE). MLE-based intervals were, in general, a bit too long for the bias correction method, but too short for the Laird and Louis method.
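A coverage simulation of the kind described above is straightforward to sketch. The version below is not the exponential/inverse gamma study of Example 3.3; as a simpler stand-in it uses the Gaussian/Gaussian model with known $\sigma^2$ and compares the naive interval with a variance-inflated interval of the Morris type (the inflation formula and shrinkage estimate are assumptions carried over from that approach). The function name `simulate_coverage` and all default settings are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def simulate_coverage(mu=0.0, tau2=1.0, sigma2=1.0, k=5, level=0.95,
                      reps=3000, seed=1):
    """Simulated unconditional EB coverage and average length.

    Both theta and y are redrawn in every replication, so coverage is
    averaged over the variability in the parameters and in the data.
    Returns {method: (coverage, mean_length)}.
    """
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)

    theta = rng.normal(mu, np.sqrt(tau2), size=(reps, k))   # true parameters
    y = rng.normal(theta, np.sqrt(sigma2))                  # observed data

    ybar = y.mean(axis=1, keepdims=True)
    tau2_hat = np.maximum(0.0, y.var(axis=1, ddof=1, keepdims=True) - sigma2)
    b_naive = sigma2 / (sigma2 + tau2_hat)
    b_morris = (k - 3) / (k - 1) * b_naive                  # assumed correction

    shrink = {"naive": b_naive, "morris": b_morris}
    variance = {
        "naive": (1.0 - b_naive) * sigma2 * np.ones_like(y),
        "morris": (sigma2 * (1.0 - (k - 1) / k * b_morris)
                   + 2.0 / (k - 3) * b_morris**2 * (y - ybar) ** 2),
    }

    results = {}
    for name in shrink:
        centre = shrink[name] * ybar + (1.0 - shrink[name]) * y
        half = z * np.sqrt(variance[name])
        covered = np.abs(theta - centre) <= half
        results[name] = (covered.mean(), (2.0 * half).mean())
    return results


if __name__ == "__main__":
    for method, (coverage, length) in simulate_coverage().items():
        print(f"{method:7s} coverage={coverage:.3f} mean length={length:.3f}")
```

With $k$ this small the estimated prior variance is frequently truncated at zero, which collapses the naive interval to a point; that is one concrete way in which the naive procedure falls short of its nominal level.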

In document Bayes and Empirical Bayes Methods for Data Analysis - Carlin Louis (Page 92-99)