The empirical Bayes approach

3.2 Nonparametric EB (NPEB) point estimation

3.2.1 Compound sampling models

Since, for EB analysis, $G(\theta)$ is not completely known and at least some of its features must be estimated, repeated draws from $G$ are required to obtain information. Consider then the compound sampling model,

$$\theta_i \overset{iid}{\sim} G(\theta), \qquad Y_i \mid \theta_i \overset{ind}{\sim} f_i(y_i \mid \theta_i), \tag{3.4}$$

where $i = 1, \ldots, k$. Compound sampling arises in a wide variety of applications including multi-site clinical trials, estimation of disease rates in small geographic areas, longitudinal studies, laboratory assays, and meta-analysis. Model (3.4) can be extended by allowing regression and correlation structures in $G$, or allowing the $Y_i$ to be correlated given the $\theta_i$.

Suppose we seek point estimates for the $\theta_i$, and that the prior cdf $G$ has corresponding density function $g$. Writing $Y = (Y_1, \ldots, Y_k)$ and $\theta = (\theta_1, \ldots, \theta_k)$, it is left as an exercise to show that the marginal density of $Y$ satisfies the equation

$$m(y \mid G) = \prod_{i=1}^{k} m_i(y_i \mid G), \tag{3.5}$$

where $m_i(y_i \mid G) = \int f_i(y_i \mid \theta_i)\, g(\theta_i)\, d\theta_i$. That is, the $Y_i$ are marginally independent (and identically distributed as well if $f_i = f$ for all $i$).

3.2.2 Simple NPEB (Robbins' method)

Consider the basic compound model (3.4) where $G$ is completely unspecified, and $Y_i \mid \theta_i$ is Poisson($\theta_i$). That is,

$$f(y_i \mid \theta_i) = \frac{e^{-\theta_i}\, \theta_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$

As shown in Appendix B, the Bayes estimate of $\theta_i$ under squared error loss is the posterior mean,

$$E(\theta_i \mid y_i) = \frac{(y_i + 1)\, m_G(y_i + 1)}{m_G(y_i)}, \tag{3.6}$$

where $m_G(y_i)$ is the marginal distribution of $y_i$. The identity holds because, for the Poisson density, $\theta f(y \mid \theta) = (y + 1) f(y + 1 \mid \theta)$. Thus, thanks to the special structure of this model, we have written the Bayes rule in terms of the data and the marginal density, which is directly estimable from the observed data.

Taking advantage of this structure, Robbins (1955) proposed the completely nonparametric estimate computed by estimating the marginal probabilities by their empirical frequencies, namely,

$$\hat\theta_i = \frac{(y_i + 1)\, \#(y\text{'s equal to } y_i + 1)}{\#(y\text{'s equal to } y_i)}. \tag{3.7}$$

This formula exemplifies borrowing information. Since the estimate of the marginal density depends on all of the $y_i$, the estimate for each component $\theta_i$ is influenced by data from other components.

The foregoing is one of several models for which the analyst can make empirical Bayes inferences with no knowledge of or assumptions concerning $G$. Maritz and Lwin (1989) refer to such estimators as simple EB estimators. The added flexibility and robustness of such nonparametric procedures is very attractive, provided they can be obtained with little loss of efficiency.

The Robbins estimate was considered a breakthrough, in part because he showed that (3.7) is asymptotically optimal in that, as $k \to \infty$, its Bayes risk (see (B.6) in Appendix B) converges to the Bayes risk for the true Bayes rule for known $G$. However, in Section 3.2.3 we show that this estimator actually performs poorly, even when $k$ is large. This poor performance can be explained in part by the estimator's failure to incorporate constraints imposed by the hierarchical structure. For example, though the Bayes estimator (3.6) is monotone in $y$, the Robbins estimate (3.7) need not be.

Equation (3.6) also imposes certain convexity conditions not accounted for by (3.7). Several authors (van Houwelingen, 1977; Maritz and Lwin, 1989, Subsection 3.4.5) have developed modifications of the basic estimator that attempt to smooth it. One can also generalize the Robbins procedure to include models where the sampling distribution f is continuous.
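As a minimal sketch of rule (3.7), the Python snippet below computes the Robbins estimate for each observed count. The function name `robbins_estimates` and the toy sample are hypothetical, chosen only for illustration; they are not from the text.

```python
from collections import Counter


def robbins_estimates(ys):
    """Robbins' rule (3.7): (y + 1) * #(y's equal to y + 1) / #(y's equal to y)."""
    counts = Counter(ys)  # a Counter returns 0 for values that were never observed
    return {y: (y + 1) * counts[y + 1] / counts[y] for y in sorted(counts)}


# HYPOTHETICAL toy sample of k = 20 counts (illustration only, not from the text).
toy = [0] * 10 + [1] * 5 + [2] * 1 + [3] * 3 + [4] * 1
est = robbins_estimates(toy)
for y, theta_hat in est.items():
    print(y, round(theta_hat, 3))
```

On this toy sample the estimates are 0.5, 0.4, 9.0, 1.333 and 0.0 for y = 0, ..., 4. The rule is visibly non-monotone, and it returns 0 at the largest observed count because no larger count was seen.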

Large-sample success of the Robbins approach and its refinements depends strongly on the model form and use of the posterior mean (the Bayes rule under squared error loss). A far more general and effective approach adopts the parametric EB template of first estimating $G$ by $\hat G$ and then using (3.6) with $G$ replaced by $\hat G$. That is, we use $m_{\hat G}$ in place of $m_G$. This approach accommodates general models and loss functions, imposes all necessary constraints (such as monotonicity and convexity), and provides a unified approach for discrete and continuous $f$. To be fully nonparametric with respect to the prior, the approach requires a fully nonparametric estimate of $G$. Laird (1978) proved the important result that the $\hat G$ which maximizes the likelihood (3.5) is a discrete distribution with at most $k$ mass points; $\hat G$ is discrete even if $G$ is continuous. We introduce this nonparametric maximum likelihood (NPML) estimate in Subsection 3.4.3, and consider it in more detail in Section 7.1.

3.2.3 Example: Accident data

We consider counts of accident insurance policies reporting $y_i$ claims during a particular year (Table 3.1, taken from Simar, 1976). The data are discrete, and the Poisson likelihood with individual accident rates drawn from a prior distribution (producing data exhibiting extra-Poisson variation) is a good candidate model.

We compute the empirical Bayes, posterior mean estimates of the rate parameters $\theta_i$ using several priors estimated from the data and the "Robbins rule" that directly produces the estimates. The table shows the Robbins simple EB estimate, as well as the NPEB estimate obtained by plugging the NPML for $G$ into the Bayes rule. As mentioned in Section 3.1, another option is to use a parametric form for $G$, say $G(\theta \mid \eta)$, and estimate only the parameter $\eta$. Given our Poisson likelihood, the conjugate gamma prior is the most natural choice. Proceeding in the manner of Example 2.5, we can then estimate the prior parameters from the marginal distribution, plug them into the formula for the Bayes rule, and produce a parametric EB estimate. The specifics of this approach are left as Exercise 9. (The student may well wish to study Section 3.3 before attempting this exercise.)

Table 3.1 Simar (1976) Accident Data: Observed counts and empirical Bayes posterior means for each number of claims per year for k = 9461 policies issued by La Royal Belge Insurance Company. The table gives the observed frequencies and the observed relative frequencies; "Robbins" is the Robbins NPEB rule, "Gamma" is the PEB posterior mean estimate based on the Poisson/gamma model, and "NPML" is the posterior mean estimate based on the EB rule for the nonparametric prior.

Table 3.1 reports the empirical relative frequencies for the various observed $y$ values, as well as the results of modeling the data using the Robbins rule, the PEB Poisson/gamma model, and the NPML approach. Despite the size of our dataset (k = 9461), the Robbins rule performs erratically for all but the smallest observed values of $y$. It imposes no restrictions on the estimates, and fails to exploit the fact that the marginal distribution of $y$ is generated by a two-stage process with a Poisson likelihood. By first estimating the prior, either parametrically or nonparametrically, constraints imposed by the two-stage process (monotonicity and convexity of the posterior mean for the Poisson model) are automatically imposed. Furthermore, the analyst is not restricted to use of the posterior mean.

The marginal method of moments estimated mean and variance for the gamma prior are 0.2144 and 0.0160, respectively. Because the gamma distribution is conjugate, the EB estimate is a weighted average of the data $y$ and the prior mean, with a weight of 0.7421 on the prior. For small values of $y$, all three estimates of $\theta$ in Table 3.1 are in close agreement. The predicted accident rate for those with no accidents ($y = 0$) is approximately 0.16, whereas the estimate based on the MLE is 0, since the MLE of $\theta_i$ is $y_i$ itself.

The Poisson/gamma and NPML methods part company for $y$ values greater than 3. Some insight into the plateau in the NPML estimates for larger $y$ is provided by the estimated prior $\hat G$ displayed in Table 3.2. Virtually all of the mass is on "safe" drivers (the first two mass points).

Table 3.2 NPML prior estimate for the Simar Accident Data

As y increases, the posterior loads mass on the highest mass point (3.669), but the posterior mean can never go beyond this value.
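The plateau can be illustrated with a short Python sketch of the plug-in step: the posterior mean under a discrete prior and a Poisson likelihood. This is not the NPML fit itself, only the Bayes rule evaluated at a given discrete prior. The function name `posterior_mean`, the mass points, and the weights are hypothetical; only the largest mass point, 3.669, comes from the text.

```python
import math


def posterior_mean(y, points, weights):
    """Posterior mean of theta given y for a discrete prior and Poisson likelihood."""
    # The y! term of the Poisson pmf cancels between numerator and denominator.
    post = [w * math.exp(-t) * t ** y for t, w in zip(points, weights)]
    return sum(t * p for t, p in zip(points, post)) / sum(post)


# HYPOTHETICAL mass points and weights (illustration only); only the
# largest point, 3.669, is taken from the text.
points = [0.1, 0.6, 3.669]
weights = [0.76, 0.236, 0.004]

means = [posterior_mean(y, points, weights) for y in range(12)]
for y, m in enumerate(means):
    print(y, round(m, 3))
```

Because the posterior mean is a weighted average of the mass points, it increases with y but stays below the largest point, whatever weights are used.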

Of course, Table 3.1 provides no information on the inferential performance of these rules. Maritz and Lwin (1989, p. 86) provide simulation comparisons for the Robbins, Poisson/gamma and several "improved Robbins" rules (they do not consider the NPML), showing that the EB approach is very effective. Tomberlin (1988) uses empirical Bayes to predict accident rates cross-classified by gender, age, and rating territory. He shows that the approach is superior to maximum likelihood in predicting the next year's rates, especially for data cells with a small number of accidents.
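To make the comparison in this example concrete, the following Python sketch computes the Robbins and Poisson/gamma estimates from a frequency table. The frequencies are an assumption: they are the counts commonly reproduced from Simar (1976) and are not listed in this excerpt, although they do sum to k = 9461 and give the marginal mean of about 0.2144 quoted above. The sketch also assumes the sample variance uses a k - 1 divisor, so it reproduces the quoted weight only approximately.

```python
# ASSUMPTION: claim frequencies commonly reproduced from Simar (1976);
# they are not listed in this excerpt. List index = number of claims y.
counts = [7840, 1317, 239, 42, 14, 4, 4, 1]

k = sum(counts)
mean = sum(y * n for y, n in enumerate(counts)) / k
# ASSUMPTION: sample variance with a k - 1 divisor.
var = sum(n * (y - mean) ** 2 for y, n in enumerate(counts)) / (k - 1)

# Robbins rule (3.7); the count beyond the largest observed y is taken as 0.
padded = counts + [0]
robbins = [(y + 1) * padded[y + 1] / padded[y] for y in range(len(counts))]

# Poisson/gamma PEB by marginal moments: the marginal mean equals the prior
# mean, and the marginal variance equals prior mean + prior variance, so the
# weight on the prior mean is B = mean / var.
B = mean / var
gamma = [B * mean + (1 - B) * y for y in range(len(counts))]

for y in range(len(counts)):
    print(y, counts[y], round(robbins[y], 3), round(gamma[y], 3))
```

Under these assumptions the weight B comes out near 0.742 and the Poisson/gamma estimate at y = 0 near 0.16, in line with the values quoted above. The Robbins estimate jumps to 6.0 at y = 5 and falls back to 1.75 at y = 6, which is the erratic behavior described in the text.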

In document Bayes and Empirical Bayes Methods for Data Analysis - Carlin Louis (Page 72-76)