Performance of Bayes procedures
4.3 Frequentist performance: Confidence intervals
They summarize the results of their investigation by concluding that, at least for the models considered, the Bayesian approach is quite robust in that it is preferred to the frequentist approach so long as either the assumed prior is approximately correct or relatively little weight is put on the prior.
The Bayesian will get in trouble if she or he is overly confident about an incorrect assumption.
Inequality (4.6) can be explored quite generally to see when the Bayesian approach is superior. For example, if
term is 0 and the comparison depends on the relation between assumed prior mean
This special case structures the foregoing evaluations of the beta/binomial and Gaussian/Gaussian models.
Samaniego and Reneau further investigate the situation when the true and assumed prior means are equal. Then inequality (4.6) becomes
(4.7) and the weight put on the prior mean
(1-the then the first
Before considering frequentist performance,
(4.8) Properties of the HPD interval
is degenerate at
4.3.1 Beta/Binomial model
a monotone decreasing function. Thus the HPD Bayesian credible interval is one-sided. From (4.8) it is easy to show that
so letting
by solving the equation S(t) _ interval as
More generally, it is easy to show that when X = 0, the Bayesian interval is one-sided so long as
X = n, and is left as an exercise.
Unlike the case of the Gaussian/Gaussian model, however, if one uses the zero-precision prior (M = 0), technical problems occur if X equals either 0 or n. Hence the prior must contain at least a small amount of information. Fortunately, we have seen that using a small value of M > 0 actually leads to point estimates with better frequentist performance than the usual maximum likelihood estimate. We investigate the properties of the corresponding interval estimates below.
As an interesting sidelight, consider representing the upper limit in (4.9) as a pseudo number of events one might expect in a future experiment divided by the sample size (n). Then
Thus as the sample size gets large, the numerator of the upper limit of the confidence interval converges to a fixed number of pseudo-events. For example, with
- log(.05)
value is the upper limit for most individuals when asked: "In the light of zero events in the current experiment, what is the largest number of events you think is tenable in a future replication of the experiment?"
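The convergence of the numerator to roughly three pseudo-events can be checked numerically. The sketch below assumes a uniform Beta(1, 1) prior, which is our choice for illustration and not necessarily the prior behind (4.9). Under that assumption the posterior given X = 0 is Beta(1, n + 1), and the one-sided upper limit has the closed form 1 - α^(1/(n+1)). The function names are hypothetical.

```python
from math import log

def upper_limit_zero_events(n, alpha=0.05):
    """One-sided 100(1 - alpha)% upper limit for theta when X = 0,
    assuming a uniform Beta(1, 1) prior (posterior Beta(1, n + 1))."""
    # The posterior survival function is (1 - t)**(n + 1); set it equal to alpha.
    return 1.0 - alpha ** (1.0 / (n + 1))

def pseudo_events(n, alpha=0.05):
    """Upper limit re-expressed as a number of events in n future trials."""
    return n * upper_limit_zero_events(n, alpha)

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(pseudo_events(n), 4))
    print("limit -log(.05):", round(-log(0.05), 4))
```

Under this assumed prior the pseudo-event count increases with n and approaches -log(.05) from below.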
Frequentist performance Based on the
100(1
obtained by taking the upper and lower α/2-points of the posterior distribution. But as we have just seen, this approach would be silly for
X = 0. This is because the monotone decreasing nature of the posterior in this case implies that the
higher posterior density than any
comment applies to the situation where b = (1
-A possible remedy would be to define the "corrected equal-tail" interval for t. Doing this, we obtain the HPD the upper endpoint of the HPD interval may be found
A similar result holds for the case
3. As reported by Louis (1981a) and Mann et al. (1988), this this number of pseudo-events is
posterior distribution for a simple is the "equal-tail" interval ) % Bayesian credible interval for
when values excluded on the lower end would have values within the interval! A similar
) M 1 and X = n.
as follows:
(4.10) where
and again
and removes the most glaring deficiency of the equal-tail procedure in this case, it will still be overly broad when the posterior is two-tailed but skewed to the left or right.
Of course, the HPD interval provides a remedy to this problem, at some computational expense. Since the one-tailed intervals in equation (4.10) are already HPD, we focus on the two-tailed case. The first step is to find the two roots
not too difficult numerically, since polynomial in
where F is the cdf of our beta posterior. Finally, we adjust
down depending on whether the posterior coverage of our interval was too large or too small, respectively. Iterating this procedure produces the HPD interval.
Notice that the above algorithm amounts to finding the roots of the equation
wherein each step requires a rootfinder and a beta cdf routine. The latter is now readily available in many statistical software packages (such as S), but the former may require us to do some coding on our own. Fortunately, a simple bisection (or interval halving) technique will typically be easy to program and provide adequate performance. For a description of this technique, the reader is referred to a calculus or elementary numerical methods text, such as Burden et al. (1981, Section 2.1). Laird and Louis (1982) give an illustration of the method's use in an HPD interval setting.
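A minimal sketch of the bisection approach follows. It is one possible implementation under stated assumptions, not the code used for the results reported here. It assumes a Beta(p, q) posterior with integer p, q > 1 (for example, a uniform prior, giving p = X + 1 and q = n - X + 1 with 0 < X < n), so that the beta cdf can be written as a finite binomial sum rather than taken from a statistical package. The inner bisections find the two points where the posterior density equals a given height; the outer bisection adjusts that height until the posterior coverage is 1 - α. All function names are hypothetical.

```python
from math import comb, lgamma, log

def beta_cdf(x, p, q):
    """Beta(p, q) cdf for positive integers p and q (binomial-sum identity)."""
    m = p + q - 1
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) for j in range(p, m + 1))

def log_beta_pdf(x, p, q):
    """Log density of Beta(p, q) at 0 < x < 1."""
    return (lgamma(p + q) - lgamma(p) - lgamma(q)
            + (p - 1) * log(x) + (q - 1) * log(1 - x))

def bisect(f, lo, hi, tol=1e-12):
    """Interval halving; assumes f changes sign exactly once on [lo, hi]."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid   # root lies to the right of mid
        else:
            hi = mid              # root lies to the left of mid
    return 0.5 * (lo + hi)

def hpd_beta(p, q, alpha=0.05, eps=1e-9):
    """100(1 - alpha)% HPD interval for a unimodal Beta(p, q) posterior."""
    assert p > 1 and q > 1, "two-tailed case only"
    mode = (p - 1) / (p + q - 2)

    def endpoints(log_c):
        # Two roots of log p(theta | x) = log_c, one on each side of the mode.
        g = lambda t: log_beta_pdf(t, p, q) - log_c
        return bisect(g, eps, mode), bisect(g, mode, 1 - eps)

    def excess_coverage(log_c):
        lo, hi = endpoints(log_c)
        return beta_cdf(hi, p, q) - beta_cdf(lo, p, q) - (1 - alpha)

    # Search heights between the density near the boundary and at the mode.
    floor = max(log_beta_pdf(eps, p, q), log_beta_pdf(1 - eps, p, q))
    top = log_beta_pdf(mode, p, q)
    return endpoints(bisect(excess_coverage, floor, top))

if __name__ == "__main__":
    print(hpd_beta(3, 9))   # e.g. X = 2, n = 10 under a uniform prior
```

For non-integer posterior parameters, `beta_cdf` would need to be replaced by a general incomplete beta routine; the rest of the sketch is unchanged.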
We now examine the frequentist coverage of our Bayesian credible intervals. Adopting the notation
note carefully that θ
possible effect of posterior skewness, we first compute the lower and upper non-coverages
(4.11)
so that coverage(θ) = 1 - lower(θ) - upper(θ). These coverages are easily is held fixed in this calculation. To allow for the
this is defined as up or We next find
and of the equation for a given This is Though this interval is still straightforward to compute
quantile of the posterior,
denotes the
in this case is simply a (a+b+n)-degree
We begin by reporting coverages of the corrected equal tail intervals defined in equation (4.10) based on three beta priors that are symmetric about
Jeffreys prior, the uniform prior, and an informative Beta(5, 5) prior. The table also lists the prior mean
of the weight given to the prior by the posterior when n = 10, namely, Since the Beta(5, 5) prior gives equal posterior weight to the data and the prior, this is clearly a strongly informative choice.
For n = 10, Figure 4.6 plots the coverages of the Bayesian intervals with nominal coverage probability
"true"
obtained by summing binomial probabilities. For example
Figure 4.6 Coverage of "corrected" equal tail 95% credible interval under three symmetric priors with n = 10.
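Coverage curves of this kind can be reproduced by direct summation of binomial probabilities. The sketch below rests on several assumptions of ours: a uniform Beta(1, 1) prior (so that the posterior parameters are integers and the beta cdf reduces to a binomial sum), a corrected equal-tail rule that uses a one-sided interval when X = 0 or X = n and the α/2 and 1 - α/2 posterior quantiles otherwise, and the convention that lower(θ) is the probability that the interval lies entirely above θ while upper(θ) is the probability that it lies entirely below θ. Equations (4.10) and (4.11) should be consulted for the exact definitions; the function names are hypothetical.

```python
from math import comb

def beta_cdf(x, p, q):
    """Beta(p, q) cdf for positive integers p and q (binomial-sum identity)."""
    m = p + q - 1
    return sum(comb(m, j) * x**j * (1 - x)**(m - j) for j in range(p, m + 1))

def beta_quantile(prob, p, q, tol=1e-12):
    """Invert the beta cdf by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, p, q) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def corrected_equal_tail(x, n, alpha=0.05):
    """Assumed form of the corrected equal-tail interval, uniform prior."""
    p, q = x + 1, n - x + 1
    if x == 0:
        return 0.0, beta_quantile(1 - alpha, p, q)
    if x == n:
        return beta_quantile(alpha, p, q), 1.0
    return beta_quantile(alpha / 2, p, q), beta_quantile(1 - alpha / 2, p, q)

def noncoverage(theta, n, alpha=0.05):
    """Return (lower, upper) non-coverage at a fixed true theta."""
    lower = upper = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * theta**x * (1 - theta)**(n - x)
        lo, hi = corrected_equal_tail(x, n, alpha)
        if theta < lo:        # interval lies entirely above theta
            lower += prob
        elif theta > hi:      # interval lies entirely below theta
            upper += prob
    return lower, upper

if __name__ == "__main__":
    for theta in (0.05, 0.25, 0.5, 0.75, 0.95):
        lo_nc, up_nc = noncoverage(theta, 10)
        print(theta, round(lo_nc, 4), round(up_nc, 4), round(1 - lo_nc - up_nc, 4))
```

Note that θ is held fixed inside `noncoverage`; only X varies over its binomial distribution.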
values from 0.05 to 0.95. Since the symmetry of the priors implies
= .95 across a grid of possible the prior precision M, and the percentage
= 0.5. These are the first three listed in Table 4.1, namely, the (4.12)
Table 4.1 Beta prior distributions that lower
interpretation of the plot, horizontal reference lines are included at the 0.95 and 0.025 levels. We see that the two noninformative priors produce values for coverage
Figure 4.7 Coverage of "corrected" equal tail 95% credible interval under three symmetric priors with n = 40.
= upper(1 - ), the latter quantity is not shown. To enhance
that are reasonably stable at the nominal level for
4.3.2 Fieller-Creasy problem
and so endpoints for a 100(1-by solving the quadratic equation
for
While appearing reasonable, this approach suffers from a serious defect:
the quadratic equation (4.13) may have no real roots. That is, the discrim-We now take up the problem of estimating the ratio of two normal means, a problem originally considered by Fieller (1954) and Creasy (1954). Suppose
assumed independent given The parameter of interest is the mean ratio
frequentist approach might begin by observing that all
irregularities in the pattern. However, the informative prior performs poorly for extreme
course, the narrowness of its intervals should be borne in mind as well). Of particular interest is the 0% coverage provided if
of the fact that the informative prior will not allow the intervals to stray this far from 0.5 - even if the data encourage such a move (i.e., X = 0 or n). As in the case of point estimation, we see that informative priors enable more precise estimation, but carry the risk of disastrous performance when their informative content is in error.
Figure 4.7 repeats the calculations of Figure 4.6 in the case where n = 40.
Again both noninformative priors produce intervals with good frequentist coverage, with the uniform prior now performing noticeably better for the most extreme
still poorly for
sample size, the prior now contributes only 20% of the posterior information content, instead of 50%.
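The percentages quoted for the prior's share of the posterior information follow from the ratio M/(M + n), where M = a + b is the prior precision of a Beta(a, b) prior. A minimal sketch of this arithmetic (the function name is hypothetical):

```python
def prior_weight(a, b, n):
    """Fraction of posterior information contributed by a Beta(a, b) prior
    when n binomial trials are observed: M / (M + n) with M = a + b."""
    m = a + b
    return m / (m + n)

if __name__ == "__main__":
    print(prior_weight(5, 5, 10))   # Beta(5, 5), n = 10
    print(prior_weight(5, 5, 40))   # Beta(5, 5), n = 40
    print(prior_weight(1, 3, 10))   # Beta(1, 3), n = 10
```

These calls correspond to the 50%, 20%, and 4/14 = 29% figures quoted in the text.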
Finally, Figure 4.8 reconsiders the n = 10 case using an asymmetric Beta(1, 3) prior, listed last in Table 4.1. This prior is mildly informative in this case (contributing 4/14 = 29% of the posterior information) and favors smaller
a mirror image of lower
values of the former shown for large informative priors must be used with care.
) % confidence interval for and
values. Again, the message is clear:
is no longer and this is evident in the unacceptably large values over larger ones. As a result, upper
This is due to the fact that, with the increased values. The informative prior also performs better, though
= 0.05 or 0.95, a result values, and only slightly better for
though the discrete nature of the binomial distribution causes some values near 0.5 (though of
We notate the roots of this equation by
could be derived (4.13)
inant of the quadratic equation,
Figure 4.8 Coverage of "corrected" equal tail 95% credible interval under an asymmetric Beta(a = 1, b = 3) prior with n = 10.
may be negative. To remedy this, we might try the following more naive approach. Write
the last line arising from a one-term Taylor expansion of the denominator (4.14)
on the previous line. Hence
and so it makes sense to simply take proximation (4.14) again to obtain
so a sensible variance estimate is given by
Thus an alternate (1-by
While this procedure will never fail to produce an answer, its accuracy is certainly open to question.
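The two frequentist procedures can be sketched as follows. This is an illustration under assumptions of ours, not a transcription of (4.13) and (4.14): we take two independent observations y1 ~ N(θ1, σ²) and y2 ~ N(θ2, σ²) with σ known, and define the ratio of interest as θ1/θ2. The exact interval then comes from the standard Fieller pivot (y1 - δ y2) / (σ sqrt(1 + δ²)), and the approximate interval from the usual first-order (delta method) variance of y1/y2. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def fieller_interval(y1, y2, sigma=1.0, alpha=0.05):
    """Exact interval for theta1/theta2 from the Fieller pivot.
    Returns None when no bounded interval exists."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    k = (z * sigma) ** 2
    # (y1 - d*y2)^2 = k*(1 + d^2) rearranged as a*d^2 + b*d + c = 0.
    a, b, c = y2**2 - k, -2.0 * y1 * y2, y1**2 - k
    disc = b * b - 4.0 * a * c
    # disc < 0: no real roots. a <= 0 with disc >= 0: the roots bound the
    # excluded region instead, so there is still no bounded interval.
    if disc < 0 or a <= 0:
        return None
    root = sqrt(disc)
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)

def delta_interval(y1, y2, sigma=1.0, alpha=0.05):
    """Approximate interval from a first-order Taylor expansion of y1/y2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    est = y1 / y2
    se = sigma * sqrt(1.0 + est**2) / abs(y2)
    return est - z * se, est + z * se

if __name__ == "__main__":
    print(fieller_interval(10.0, 5.0), delta_interval(10.0, 5.0))
    print(fieller_interval(0.5, 0.5), delta_interval(0.5, 0.5))
```

The second call illustrates the contrast drawn in the text: with both observations close to zero the exact method returns nothing, while the approximate method always returns an interval, though possibly a very wide one.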
By contrast, the Bayesian procedure is quite straightforward. Stephens and Smith (1992) show how to obtain samples from the joint posterior distribution of
using a Monte Carlo method called the weighted bootstrap (see Subsection 5.3.2). We stay within the simpler
conjugate priors
be made noninformative by setting
then obtain the independent posterior distributions
)100% frequentist confidence interval for is given
and under the noninformative prior
known setting, and adopt the
and (recall that these may
From Example 2.1 we Next, we may use
ap-We could now derive the distribution of
the familiar Jacobian approach, but, like Stephens and Smith (1992), we find it far easier to use a sampling-based approach. Suppose we draw
and then define
stitute a sample of size G from the posterior distribution of (1
-the
element in the sample, and ple. (Note that
or else this calculation may require interpolation.)
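The sampling-based calculation can be sketched in a few lines. The sketch assumes known common variance σ² and flat priors on both means, so that the two posteriors are independent N(y1, σ²) and N(y2, σ²) distributions; it also assumes the ratio of interest is θ1/θ2. These are our assumptions for illustration, and the function name and the fixed seed are hypothetical. The index Gα/2 is rounded to the nearest integer, as the text suggests, rather than interpolated.

```python
import random

def ratio_credible_interval(y1, y2, sigma=1.0, alpha=0.05, G=10000, seed=0):
    """Equal-tail 100(1 - alpha)% credible interval for theta1/theta2,
    from G draws of the two independent normal posteriors."""
    rng = random.Random(seed)
    # Draw theta1 and theta2 from their posteriors and form the ratio.
    draws = sorted(rng.gauss(y1, sigma) / rng.gauss(y2, sigma) for _ in range(G))
    k = max(int(round(G * alpha / 2)), 1)
    # k-th smallest and k-th largest elements of the sorted sample.
    return draws[k - 1], draws[G - k]

if __name__ == "__main__":
    print(ratio_credible_interval(10.0, 5.0))
```

Because the interval is read off the sorted draws, it is always defined, in contrast to an interval obtained from a quadratic that may have no real roots.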
To compare the frequentist performance of these three approaches to the Fieller-Creasy problem, we perform a simulation study. After choosing some "true" values for
and apply each of the three procedures to each of the generated data pairs (Note that to implement our sampling-based Bayesian procedure, this means we must generate
simulated data pair. Thus we effectively have a simulation within a simulation, albeit one that is very easily programmed.) For each simulated data pair, we record the lengths of the three resulting confidence intervals and whether the true
a
method with good frequentist performance will have a coverage probability of (1 - ) are not too long on the average.
Table 4.2 gives the resulting average interval lengths and estimated probabilities that the true
dence interval, where we take
the exact and approximate frequentist approaches, we also evaluate two Bayesian methods, both of which take the two prior means
equal to 3. However, the first method takes highly informative prior. The second instead takes
vague prior specification. The four portions of the table correspond to the four true
in all cases.
Looking at the table, one is first struck by the uniformly poor performance of the approximate frequentist method. While it typically produces intervals that meet or exceed the desired coverage level of 95%, they are unacceptably wide - ludicrously so when
based on the exact frequentist method (derived from solving the quadratic equation (4.13)) are much narrower, but in no case do they attain the nom-analytically. perhaps using
g = 1, . . ., G. These values then con-Hence a could be obtained by sorting
as the -smallest
)100% equal-tail credible interval for sample, and subsequently taking
as the
may first have to be rounded to the nearest integer, largest element in the
sam-and we generate
g = 1,...,G} for each
value fell below, within, or above each interval. Clearly or better, while at the same time producing intervals that
lies below, within, or above the computed
confi-= .05 and N = G = 10,000. Along with
= 1, resulting in a and
= 50, a rather combinations (0,1), (0,3), (3,1), and (3,3), where
is close to 0. The intervals
inal 95% coverage level. This is no doubt attributable in part to the fact that the coverage probabilities in the table are of necessity based only on those cases where the discriminant
cases ("failures") were simply ignored when computing the summary information in the table. With the exception of the final
fail rates (shown just below the method's label in the leftmost column) are quite substantial, suggesting that whatever the method's properties, it will often be unavailable in practice.
By contrast, the performance of the two Bayesian methods is quite impressive. The intervals based on the informative prior have both very high
Table 4.2 Simulation results, Fieller-Creasy problem
was positive; the remaining scenario, these