
Performance of Bayes procedures

4.2 Frequentist performance: Point estimates

Besides having good Bayesian properties, estimators derived using Bayesian methods can have excellent frequentist and EB properties, and produce improvements over estimates generated by frequentist or likelihood-based approaches. We start with two basic examples and then a generalization, still in a basic framework. These show that the Bayesian approach can be very effective in producing attractive frequentist properties, and suggest that this performance advantage holds for more complex models.

4.2.1 Gaussian/Gaussian model

The familiar Gaussian/Gaussian model provides a simple illustration of how the Bayesian formalism can produce procedures with good frequentist properties, in many situations better than those based on maximum likelihood or unbiasedness theory. We consider the basic model wherein

$$Y \mid \theta \sim N(\theta, 1) \quad \text{and} \quad \theta \sim N(0, \tau^2)$$

(that is, the setting of Example 2.1 where $\sigma^2 = 1$ and $\mu = 0$). We compare the two decision rules

$$\delta_1(y) = y \quad \text{and} \quad \delta_2(y) = (1 - B)y, \quad \text{where } B = 1/(1 + \tau^2).$$

Here, $\delta_1$ is the MLE and UMVUE, while $\delta_2$ is the posterior mode, median, and mean (the latter implying it is the Bayes rule under squared error loss). We then have

$$\mathrm{MSE}(\delta_1) = E_{Y \mid \theta}(Y - \theta)^2 = 1$$

and also

$$\mathrm{MSE}(\delta_2) = E_{Y \mid \theta}[(1 - B)Y - \theta]^2 = (1 - B)^2 + B^2\theta^2,$$

which is exactly the variance of $\delta_2$ plus the square of its bias. It is easy to show that $\mathrm{MSE}(\delta_2) < \mathrm{MSE}(\delta_1)$ if and only if

$$\theta^2 < \frac{2 - B}{B} = 1 + 2\tau^2.$$

That is, the Bayes rule has smaller frequentist risk provided the true mean, $\theta$, is close to the prior mean, 0.

This situation is illustrated for the case where $B = 0.5$ in Figure 4.2, where the dummy argument $t$ is used in place of $\theta$. $\mathrm{MSE}(\delta_1)$ is given by the solid horizontal line at 1, while $\mathrm{MSE}(\delta_2)$ is shown as the dotted parabola centered at 0. For comparison, a dashed line corresponding to the MSE for a third rule, $\delta_3(y) = 2$, is also given. This rather silly rule always estimates $\theta$ to be 2, ignoring the data completely. Clearly its risk is given by

$$\mathrm{MSE}(\delta_3) = (\theta - 2)^2,$$

which is 0 if $\theta$ happens to actually be 2, but increases much more steeply than $\mathrm{MSE}(\delta_2)$ as $\theta$ moves away from 2. This rule is admissible, since no other rule could possibly have lower MSE for all $\theta$ (thanks to the 0 value at $\theta = 2$), but the large penalty paid for other $\theta$ values makes it unattractive for general use. The Bayes rule $\delta_2$ can also be shown to be admissible, but notice that it would be inadmissible if $B < 0$, since then $\mathrm{MSE}(\delta_2) > \mathrm{MSE}(\delta_1)$ for all $\theta$. Fortunately, $B$ cannot be negative since $\tau^2$ cannot be; the point here is only to show that crossing (shrinking past the prior mean) is a poor idea in this example.

Figure 4.2 MSE (risk under squared error loss) for three estimators in the Gaussian/Gaussian model.

For the more general setting of Example 2.2, the Bayes rule will have smaller MSE than the MLE when

$$\theta \in \mu \pm \sqrt{\sigma^2/n + 2\tau^2}.$$

For the special case where $\tau^2 = \sigma^2$ (i.e., the prior is "worth" one observation), the limits simplify to $\mu \pm \sigma\sqrt{2 + 1/n}$, a broad region of superiority.

4.2.2 Beta/binomial model

As a second example, consider again the estimation of the event probability $\theta$ in a binomial distribution. For the Bayesian analysis we use the conjugate Beta(a, b) prior distribution, and follow Subsection 3.3.2 in reparametrizing from $(a, b)$ to $(\mu, M)$, where $\mu = a/(a + b)$, the prior mean, and $M = a + b$, a measure of prior precision (i.e., increasing $M$ implies decreasing prior variance). Based on squared-error loss, the Bayes estimate is the posterior mean, namely,

$$\hat{\theta}_M = \frac{M}{M + n}\,\mu + \frac{n}{M + n}\left(\frac{X}{n}\right). \qquad (4.4)$$

Hence $\hat{\theta}_M$ is a weighted average of the prior mean and the maximum likelihood estimate, $X/n$. Irrespective of the value of $\mu$, the MLE of $\theta$ is $\hat{\theta}_0 = X/n$. The MSE of the Bayes estimate is

$$\mathrm{MSE}(\hat{\theta}_M) = \left(\frac{n}{M + n}\right)^2 \frac{\theta(1 - \theta)}{n} + \left(\frac{M}{M + n}\right)^2 (\mu - \theta)^2.$$

This equation shows the usual variance plus squared bias decomposition of mean squared error.

Figure 4.3 shows the risk curve for $n = 1$, $\mu = 0.5$, and $M = 0, 1, 2$. If one uses the MLE with $n = 1$, the MLE must be either 0 or 1; no experienced data analyst would use such an estimator. Not surprisingly, the MSE is 0 for $\theta = 0$ or 1, but it rises to .25 at $\theta = .5$. The Bayes rule with $M = 1$ (dotted line) has lower MSE than the MLE for $\theta$ in the interval (.067, .933). When $M = 2$ (dashed line), the region where the Bayes rule improves on the MLE shrinks toward 0.5, but the amount of improvement is greater. This suggests that adding a little bias to a rule in order to reduce variance can pay dividends.

Figure 4.3 MSE for three estimators in the beta/binomial model with n = 1 and $\mu$ = 0.5.

Next, look at Figure 4.4, where $\mu$ is again 0.5 but now $n = 20$ and $M = 0, 1, 5$. Due to the increased sample size, all three estimators have smaller MSE than for $n = 1$, and the MLE performs quite well. The Bayes rule with $M = 1$ produces modest benefits for $\theta$ near 0.5, with little penalty for $\theta$ near 0 or 1, but it takes a larger $M$ (i.e., more weight on the prior mean) for the Bayes rule to be very different from the MLE. This is demonstrated by the curve for $M = 5$, which shows a benefit near 0.5, purchased by a substantial penalty for $\theta$ near 0 or 1. The data analyst (and others who need to be convinced by the analysis) would need to be quite confident that $\theta$ is near 0.5 for this estimator to be attractive.

Figure 4.4 MSE for three estimators in the beta/binomial model with n = 20 and $\mu$ = 0.5.

Using the Bayes rule with "fair" prior mean $\mu = 0.5$ and small precision $M = 1$ pays big dividends when $n = 1$ and essentially reproduces the MLE for $n = 20$. Most would agree that the MLE needs an adjustment if $n = 1$, a smaller adjustment if $n = 2$, and so on. Bayes estimates with diffuse priors produce big benefits for small $n$ (where variance reduction is important).

Finally, Figure 4.5 shows the costs and benefits of using a Bayes rule with an asymmetric prior having $\mu = 0.1$ when $n = 20$. As in the symmetric prior case ($\mu = 0.5$), setting $M = 1$ essentially reproduces the MLE. With $M = 5$, modest additional benefits accrue for $\theta$ below about 0.44 (and a little above 0), but performance is disastrous for $\theta$ greater than 0.6. Such an estimator might be attractive in some application settings, but would require near certainty that $\theta < 0.5$. We remark that the Bayes pre-posterior risk is the integral of these curves with respect to the prior distribution. These integrals produce the pre-posterior performance of various estimates, a subject to which we return in Section 4.4.

Figure 4.5 MSE for three estimators in the beta/binomial model with n = 20 and $\mu$ = 0.1.

In summary, Bayesian point estimators based on informative priors can be risky from a frequency standpoint, showing that there is no statistical "free lunch." However, even for univariate analyses without compound sampling, the Bayesian formalism with weak prior information produces benefits for the frequentist.

A note on robustness

In both the binomial and Gaussian examples, either an empirical Bayes or hierarchical Bayes approach produces a degree of robustness to prior misspecification, at least within the parametric family, by tuning the hyperparameters to the data. In addition, for exponential sampling distributions with quadratic variance functions, Morris (1983b) has shown broad robustness outside of the conjugate prior family.

4.2.3 Generalization

The foregoing examples address two exponential family models with conjugate prior distributions. Samaniego and Reneau (1994) evaluate these situations more generally for linear Bayes estimators (those where the estimate is a convex combination of the MLE and the prior mean, even if G is not conjugate) based on (iid) data when the MLE is also unbiased (and therefore MVUE). With $\hat{\theta}$ the MLE, these estimates have the form

(4.5)

If the true prior is $G_0$, this Bayes estimate has smaller pre-posterior MSE than the MLE so long as

(4.6)

Since $G_0$ can be a point mass, this relation provides a standard, frequentist evaluation.
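The point-mass (frequentist) comparisons of Subsections 4.2.1 and 4.2.2 can be checked numerically. The following is a minimal sketch, not code from the text: the function names are illustrative, the Gaussian model is assumed to have unit sampling variance and prior mean 0, and the region where the Bayes rule beats the MLE in the beta/binomial model is located by a simple grid search rather than by solving the quadratic.

```python
import math


def mse_gaussian_bayes(theta, B):
    """MSE of delta_2(y) = (1 - B) * y when Y | theta ~ N(theta, 1).

    Variance is (1 - B)^2 and bias is -B * theta (prior mean assumed 0).
    """
    return (1.0 - B) ** 2 + (B * theta) ** 2


def gaussian_crossover(B):
    """|theta| below which delta_2 beats the MLE (whose MSE is 1)."""
    return math.sqrt((2.0 - B) / B)


def mse_beta_binomial(theta, n, mu, M):
    """MSE of the posterior mean under a Beta prior with mean mu, precision M.

    M = 0 gives the MLE X/n, whose MSE is theta * (1 - theta) / n.
    """
    w = n / (M + n)  # weight placed on the MLE
    variance = w ** 2 * theta * (1.0 - theta) / n
    bias_sq = ((1.0 - w) * (mu - theta)) ** 2
    return variance + bias_sq


def binomial_superiority_interval(n, mu, M, grid=10001):
    """Grid approximation to the theta range where the Bayes rule beats the MLE."""
    thetas = [i / (grid - 1) for i in range(grid)]
    better = [
        t for t in thetas
        if mse_beta_binomial(t, n, mu, M) < mse_beta_binomial(t, n, mu, 0)
    ]
    return min(better), max(better)


if __name__ == "__main__":
    # B = 0.5 is the case plotted in Figure 4.2.
    print("Gaussian crossover |theta|:", round(gaussian_crossover(0.5), 3))
    # n = 1, mu = 0.5, M = 1 is the dotted curve in Figure 4.3.
    print("n=1, M=1:", binomial_superiority_interval(1, 0.5, 1))
    # n = 20, mu = 0.1, M = 5 is the asymmetric-prior case of Figure 4.5.
    print("n=20, mu=0.1, M=5:", binomial_superiority_interval(20, 0.1, 5))
```

For n = 1, $\mu$ = 0.5, and M = 1 the grid search recovers the interval (.067, .933) quoted above, and for B = 0.5 the Gaussian crossover is $\sqrt{3} \approx 1.73$. Integrating these MSE curves against a prior, rather than evaluating them at a single $\theta$, would give the pre-posterior risk.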

4.3 Frequentist performance: Confidence intervals

In addition to producing point estimates with very good frequentist performance, the Bayesian approach can also be used to develop interval estimation procedures with excellent frequentist properties in both basic and complicated modeling scenarios. As in the previous subsection, our strategy will be to use noninformative or minimally informative priors to produce the posterior distribution, which will then be used to produce intervals via either HPD or simpler "equal-tail" methods (see Subsection 2.3.2).

In complicated examples, the Bayesian formalism may be the only way to develop confidence intervals that reflect all uncertainties. We leave intervals arising from the Gaussian/Gaussian model as an exercise, and begin instead with the beta/binomial model.

4.3.1 Beta/binomial model

Again using the notation of Subsection 4.2.2, we recall that the posterior in this case is

$$\theta \mid x \sim \text{Beta}(a + x,\ b + n - x).$$

We examine how data are processed by the 100(1 - 2α)% Bayesian credible interval. Suppose we adopt the uniform Beta(a = 1, b = 1) prior, and consider the case where X = 0. We have the posterior distribution $\theta \mid X = 0 \sim \text{Beta}(1,\ n + 1)$.
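As a small illustration of the equal-tail method in the beta/binomial setting, consider the uniform Beta(1, 1) prior with X = 0 successes in n trials, for which the posterior is Beta(1, n + 1). This sketch relies on the fact that a Beta(1, b) distribution has CDF $1 - (1 - \theta)^b$, so its quantiles are available in closed form and no statistics library is needed. The function names and the values n = 10 and α = 0.025 are illustrative assumptions, not taken from the text.

```python
def beta_1_b_quantile(p, b):
    """Quantile of a Beta(1, b) distribution, whose CDF is 1 - (1 - theta)**b."""
    return 1.0 - (1.0 - p) ** (1.0 / b)


def equal_tail_interval_x0(n, alpha):
    """100(1 - 2*alpha)% equal-tail credible interval for theta when X = 0.

    Assumes a uniform Beta(1, 1) prior, so the posterior is Beta(1, n + 1).
    """
    b = n + 1
    lower = beta_1_b_quantile(alpha, b)
    upper = beta_1_b_quantile(1.0 - alpha, b)
    return lower, upper


if __name__ == "__main__":
    # Hypothetical inputs: n = 10 trials, alpha = 0.025 (a 95% interval).
    lower, upper = equal_tail_interval_x0(n=10, alpha=0.025)
    print(f"equal-tail interval: ({lower:.4f}, {upper:.4f})")
```

The lower limit from this construction is strictly positive even though no events were observed. Because the Beta(1, n + 1) density is decreasing in $\theta$, an HPD interval would instead begin at 0.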

In document Bayes and Empirical Bayes Methods for Data Analysis - Carlin Louis (Page 106-112)