
The Bayes approach

2.3 Bayesian inference

Having specified the prior distribution using one of the techniques in the previous section, we may now use Bayes' Theorem (2.1) to obtain the posterior distribution of the model parameters, or perhaps equation (2.5) to obtain the predictive density of a future observation. Since these distributions summarize our current state of knowledge (arising from both the observed data and our prior opinion, if any), we might simply graph the corresponding density (or cumulative distribution) functions and report them as the basis for all posterior inference. However, these functions can be difficult to interpret and, in many cases, may tell us more than we want to know. Hence in this section, we discuss common approaches for summarizing such distributions. In particular, we develop Bayesian analogues for the common frequentist techniques of point estimation, interval estimation, and hypothesis testing. Throughout, we work with the posterior distribution of the model parameters, though most of our methods apply equally well to the predictive distribution; after all, future data values and parameters are merely different types of unknown quantities. We illustrate how Bayesian data analysis is typically practiced and, in the cases of interval estimation and hypothesis testing, point out the philosophical advantages inherent in the Bayesian approach.

2.3.1 Point estimation

For ease of notation, consider first the univariate case. To obtain a point estimate $\hat{\theta}(y)$ of $\theta$, we simply need to select a summary feature of the posterior $p(\theta \mid y)$, such as its mean, median, or mode. The mode will typically be the easiest to compute, since no standardization of the posterior is then required: we may work directly with the numerator of (2.1). Also, note that when the prior $\pi(\theta)$ is flat, the posterior mode will be equal to the maximum likelihood estimate of $\theta$. For this reason, the posterior mode is sometimes referred to as the generalized maximum likelihood estimate of $\theta$.

For symmetric posterior densities, the mean and the median will of course be identical; for symmetric unimodal posteriors, all three measures will coincide. For asymmetric posteriors, the choice is less clear, though the median is often preferred since it is intermediate to the mode (which considers only the value corresponding to the maximum value of the density) and the mean (which often gives too much weight to extreme outliers). These differences are especially acute for one-tailed densities. Figure 2.2 provides an illustration using a G(.5, 1) distribution. Here, the modal value (0) seems a particularly poor choice of summary statistic, since it is far from the "middle" of the density; in fact, the density is not even finite at this value.

Figure 2.2 Three point estimates arising from a Gamma(.5, 1) posterior (mode = 0, median = 0.227, mean = 0.5).

In order to obtain a measure of the accuracy of a point estimate $\hat{\theta}(y)$ of $\theta$, we might use the posterior variance with respect to $\hat{\theta}(y)$,

$$V_{\hat{\theta}}(y) = E_{\theta \mid y}\left[\theta - \hat{\theta}(y)\right]^2 .$$

Writing the posterior mean $E_{\theta \mid y}(\theta) = \mu(y)$ simply as $\mu$ for the moment, we have

$$\begin{aligned}
V_{\hat{\theta}}(y) &= E_{\theta \mid y}(\theta - \hat{\theta})^2 \\
&= E_{\theta \mid y}(\theta - \mu + \mu - \hat{\theta})^2 \\
&= E_{\theta \mid y}(\theta - \mu)^2 + 2(\mu - \hat{\theta})\,E_{\theta \mid y}(\theta - \mu) + (\mu - \hat{\theta})^2 \\
&= E_{\theta \mid y}(\theta - \mu)^2 + (\mu - \hat{\theta})^2 ,
\end{aligned}$$

since the middle term on the third line is identically zero. Hence we have shown that the posterior mean $\mu(y)$ minimizes the posterior variance with respect to $\hat{\theta}(y)$ over all point estimators $\hat{\theta}(y)$. Furthermore, this minimum value is the posterior variance of $\theta$ (usually referred to simply as the posterior variance). This is a possible argument for preferring the posterior mean as a point estimate, and also partially explains why historically only the posterior mean and variance were reported as the results of a Bayesian analysis.
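The three summaries in Figure 2.2, and the claim that the posterior mean minimizes the posterior variance, can be checked numerically. The following is a minimal sketch, assuming that G(.5, 1) denotes a gamma density with shape 0.5 and scale 1 (in which case its CDF is erf(sqrt(theta))). All function names are illustrative and do not come from the text; the Monte Carlo draws simply stand in for a posterior sample.

```python
import math
import random

SHAPE, SCALE = 0.5, 1.0  # assumed shape/scale reading of G(.5, 1)

def gamma_half_cdf(theta):
    """CDF of Gamma(shape=0.5, scale=1), which equals erf(sqrt(theta))."""
    return math.erf(math.sqrt(theta))

def median_by_bisection(cdf, lo=0.0, hi=50.0, tol=1e-12):
    """Solve cdf(m) = 0.5 on [lo, hi]; cdf must be increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def point_estimates():
    """Return (mode, median, mean) of the Gamma(0.5, 1) posterior."""
    # For shape < 1 the density is unbounded at 0, so the mode sits at 0.
    mode = max(SHAPE - 1.0, 0.0) * SCALE
    median = median_by_bisection(gamma_half_cdf)
    mean = SHAPE * SCALE
    return mode, median, mean

def expected_squared_error(draws, estimate):
    """Monte Carlo estimate of E[(theta - estimate)^2 | y]."""
    return sum((t - estimate) ** 2 for t in draws) / len(draws)

def simulate_posterior(n_draws=100_000, seed=0):
    """Draw from Gamma(shape=0.5, scale=1) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gammavariate(SHAPE, SCALE) for _ in range(n_draws)]

if __name__ == "__main__":
    draws = simulate_posterior()
    mu_hat = sum(draws) / len(draws)
    post_var = expected_squared_error(draws, mu_hat)
    for name, est in zip(("mode", "median", "mean"), point_estimates()):
        direct = expected_squared_error(draws, est)
        decomposed = post_var + (mu_hat - est) ** 2
        print(f"{name:6s} estimate={est:.3f}  V={direct:.4f}  "
              f"var+(mu-est)^2={decomposed:.4f}")
```

For each estimate the directly computed value of V and the decomposed value agree up to rounding, and V is smallest at the mean and largest at the mode.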

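Turning briefly to the multivariate case, we might again take the posterior mode as our point estimate $\hat{\boldsymbol{\theta}}(\mathbf{y})$, though it will now be numerically harder to find. When the mode exists, a traditional maximization method such as a grid search, golden section search, or Newton-type method will often succeed in locating it (see e.g., Thisted, 1988, Chapter 4 for a description of these and other maximization methods). The posterior mean $\boldsymbol{\mu}(\mathbf{y})$ is also a possibility, since it is still well-defined and its accuracy is captured nicely by the posterior covariance matrix,

$$V = E_{\boldsymbol{\theta} \mid \mathbf{y}}\left[(\boldsymbol{\theta} - \boldsymbol{\mu})(\boldsymbol{\theta} - \boldsymbol{\mu})^T\right].$$

Similar to the univariate case, one can show that

$$E_{\boldsymbol{\theta} \mid \mathbf{y}}\left[(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^T\right] = V + (\hat{\boldsymbol{\theta}} - \boldsymbol{\mu})(\hat{\boldsymbol{\theta}} - \boldsymbol{\mu})^T ,$$

so that the posterior mean "minimizes" the posterior covariance matrix with respect to $\hat{\boldsymbol{\theta}}(\mathbf{y})$. The posterior median is more difficult to define in multiple dimensions, though it is still often used in one-dimensional subspaces of such problems, i.e., as a summary statistic for $p(\theta_i \mid \mathbf{y})$.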
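Golden section search, one of the maximization methods named above, can be sketched in one dimension. The example below is illustrative only: it assumes a hypothetical binomial likelihood (7 successes in 10 trials) with a Beta(2, 2) prior, neither of which appears in the text, and it works with the log of the unnormalized posterior, since no normalizing constant is needed to locate a mode. The function names are likewise assumptions.

```python
import math

def log_unnormalized_posterior(theta, x=7, n=10, a=2.0, b=2.0):
    """Log of likelihood times prior for a hypothetical beta-binomial model."""
    return (x + a - 1) * math.log(theta) + (n - x + b - 1) * math.log(1 - theta)

def golden_section_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden section search."""
    invphi = (math.sqrt(5) - 1) / 2          # 1/phi, about 0.618
    a, b = lo, hi
    c = b - invphi * (b - a)                 # left interior point
    d = a + invphi * (b - a)                 # right interior point
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:                          # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:                                # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

if __name__ == "__main__":
    # Stay strictly inside (0, 1) so that log() is defined.
    mode = golden_section_max(log_unnormalized_posterior, 1e-9, 1 - 1e-9)
    closed_form = (7 + 2 - 1) / (10 + 2 + 2 - 2)   # Beta(9, 5) mode = 8/12
    print(f"numerical mode = {mode:.6f}, closed form = {closed_form:.6f}")
```

In this conjugate case the mode is available in closed form, which makes it a convenient check; the same search applies unchanged when no closed form exists, provided the posterior is unimodal on the bracketing interval.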

Example 2.7 Perhaps the single most important contribution of statistics to the field of scientific inquiry is the general linear model, of which regression and analysis of variance models are extremely widely-used special cases. A Bayesian analysis of this model was first presented in the landmark paper by Lindley and Smith (1972), which we summarize here.

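Suppose that $\mathbf{Y} \mid \boldsymbol{\theta}_1 \sim N(A_1 \boldsymbol{\theta}_1, C_1)$, where $\mathbf{Y}$ is an $n \times 1$ data vector, $\boldsymbol{\theta}_1$ is a $p_1 \times 1$ parameter vector, $A_1$ is an $n \times p_1$ known design matrix, and $C_1$ is an $n \times n$ known covariance matrix. Suppose further that we adopt the prior distribution $\boldsymbol{\theta}_1 \sim N(A_2 \boldsymbol{\theta}_2, C_2)$, where $\boldsymbol{\theta}_2$ is a $p_2 \times 1$ parameter vector, $A_2$ is a $p_1 \times p_2$ design matrix, $C_2$ is a $p_1 \times p_1$ covariance matrix, and $\boldsymbol{\theta}_2$, $A_2$, and $C_2$ are all known. Then the marginal distribution of $\mathbf{Y}$ is

$$\mathbf{Y} \sim N\left(A_1 A_2 \boldsymbol{\theta}_2,\; C_1 + A_1 C_2 A_1^T\right),$$

and the posterior distribution of $\boldsymbol{\theta}_1$ is

$$\boldsymbol{\theta}_1 \mid \mathbf{y} \sim N(D\mathbf{d},\, D), \qquad (2.14)$$

where

$$D^{-1} = A_1^T C_1^{-1} A_1 + C_2^{-1} \qquad (2.15)$$

and

$$\mathbf{d} = A_1^T C_1^{-1} \mathbf{y} + C_2^{-1} A_2 \boldsymbol{\theta}_2 . \qquad (2.16)$$

Thus $E(\boldsymbol{\theta}_1 \mid \mathbf{y}) = D\mathbf{d}$ provides a point estimate for $\boldsymbol{\theta}_1$, with variability captured by the posterior covariance matrix $Var(\boldsymbol{\theta}_1 \mid \mathbf{y}) = D$.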
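One way to see where a normal posterior with mean $D\mathbf{d}$ and covariance $D$ comes from is to complete the square in the exponent of the likelihood times the prior. The following is a sketch of that standard argument, assuming a $N(A_1\boldsymbol{\theta}_1, C_1)$ likelihood and a $N(A_2\boldsymbol{\theta}_2, C_2)$ prior; it is not a reproduction of the derivation in Lindley and Smith (1972).

```latex
\begin{align*}
p(\boldsymbol{\theta}_1 \mid \mathbf{y})
 &\propto \exp\Big\{-\tfrac12\Big[(\mathbf{y} - A_1\boldsymbol{\theta}_1)^T C_1^{-1} (\mathbf{y} - A_1\boldsymbol{\theta}_1)
   + (\boldsymbol{\theta}_1 - A_2\boldsymbol{\theta}_2)^T C_2^{-1} (\boldsymbol{\theta}_1 - A_2\boldsymbol{\theta}_2)\Big]\Big\} \\
 &\propto \exp\Big\{-\tfrac12\Big[\boldsymbol{\theta}_1^T \big(A_1^T C_1^{-1} A_1 + C_2^{-1}\big)\boldsymbol{\theta}_1
   - 2\,\boldsymbol{\theta}_1^T \big(A_1^T C_1^{-1}\mathbf{y} + C_2^{-1} A_2\boldsymbol{\theta}_2\big)\Big]\Big\} \\
 &= \exp\Big\{-\tfrac12\Big[\boldsymbol{\theta}_1^T D^{-1} \boldsymbol{\theta}_1 - 2\,\boldsymbol{\theta}_1^T \mathbf{d}\Big]\Big\} \\
 &\propto \exp\Big\{-\tfrac12 (\boldsymbol{\theta}_1 - D\mathbf{d})^T D^{-1} (\boldsymbol{\theta}_1 - D\mathbf{d})\Big\}.
\end{align*}
```

The last step uses the symmetry of $D$: expanding the final quadratic form gives $\boldsymbol{\theta}_1^T D^{-1}\boldsymbol{\theta}_1 - 2\boldsymbol{\theta}_1^T\mathbf{d} + \mathbf{d}^T D\,\mathbf{d}$, and the last term does not involve $\boldsymbol{\theta}_1$.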

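As a more concrete illustration, we consider the case of linear regression. Using the usual notation, we would set $\boldsymbol{\theta}_1 = \boldsymbol{\beta}$, $A_1 = X$, and $C_1 = \sigma^2 I$. A noninformative prior is provided by taking $C_2^{-1} = 0$, i.e., setting the prior precision matrix equal to a matrix of zeros. Then from equations (2.15) and (2.16), we have

$$D^{-1} = \sigma^{-2} X^T X \quad \text{and} \quad \mathbf{d} = \sigma^{-2} X^T \mathbf{y},$$

so that the posterior mean is given by

$$D\mathbf{d} = (X^T X)^{-1} X^T \mathbf{y} = \hat{\boldsymbol{\beta}},$$

the usual least squares estimate of $\boldsymbol{\beta}$. From (2.14), the posterior distribution of $\boldsymbol{\beta}$ is $\boldsymbol{\beta} \mid \mathbf{y} \sim N(\hat{\boldsymbol{\beta}}, \sigma^2 (X^T X)^{-1})$. Recall that the distribution of the least squares estimate is given by $\hat{\boldsymbol{\beta}} \mid \boldsymbol{\beta} \sim N(\boldsymbol{\beta}, \sigma^2 (X^T X)^{-1})$, so that classical and Bayesian inferences regarding $\boldsymbol{\beta}$ will be formally identical in this example. If $\sigma^2$ is unknown, a closed-form analytic expression for the posterior mean of $\boldsymbol{\beta}$ will typically be unavailable. Fortunately, sampling-based computational methods are available to escape this unpleasant situation, as we shall illustrate later in Example 5.6.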
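The regression special case can be checked on a small data set. The sketch below assumes a simple linear regression with an intercept and one slope, a known error variance, and a normal prior on the coefficients whose mean plays the role of the prior mean in (2.16). The data values, the function name `posterior_normal_regression`, and its arguments are made up for illustration. With a zero prior precision matrix the posterior mean should match the least squares fit.

```python
def posterior_normal_regression(x, y, sigma2, prior_precision=None, prior_mean=None):
    """Posterior N(Dd, D) for beta = (intercept, slope) with known sigma2.

    D^{-1} = X'X / sigma2 + P  and  d = X'y / sigma2 + P m,
    where P is the prior precision matrix and m the prior mean of beta.
    """
    if prior_precision is None:              # flat prior: precision matrix of zeros
        prior_precision = [[0.0, 0.0], [0.0, 0.0]]
    if prior_mean is None:
        prior_mean = [0.0, 0.0]
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(u * v for u, v in zip(x, y))
    xtx = [[n, sx], [sx, sxx]]               # X'X for design matrix X = [1, x]
    xty = [sy, sxy]                          # X'y
    d_inv = [[xtx[i][j] / sigma2 + prior_precision[i][j] for j in range(2)]
             for i in range(2)]
    d_vec = [xty[i] / sigma2
             + sum(prior_precision[i][j] * prior_mean[j] for j in range(2))
             for i in range(2)]
    # Invert the 2 x 2 matrix D^{-1} explicitly.
    det = d_inv[0][0] * d_inv[1][1] - d_inv[0][1] * d_inv[1][0]
    cov = [[d_inv[1][1] / det, -d_inv[0][1] / det],
           [-d_inv[1][0] / det, d_inv[0][0] / det]]
    mean = [cov[i][0] * d_vec[0] + cov[i][1] * d_vec[1] for i in range(2)]
    return mean, cov

if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]            # hypothetical covariate values
    y = [1.1, 2.9, 5.2, 7.1, 8.8]            # hypothetical responses
    flat_mean, flat_cov = posterior_normal_regression(x, y, sigma2=0.25)
    print("flat prior:        ", flat_mean, flat_cov)
    # An informative prior centered at zero pulls the estimate toward zero.
    shrunk_mean, _ = posterior_normal_regression(
        x, y, sigma2=0.25,
        prior_precision=[[4.0, 0.0], [0.0, 4.0]], prior_mean=[0.0, 0.0])
    print("informative prior: ", shrunk_mean)
```

The 2 x 2 inverse is written out by hand only to keep the sketch dependency-free; for larger designs a linear algebra library would be the natural choice.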

2.3.2 Interval estimation

The Bayesian analogue of a frequentist confidence interval (CI) is usually referred to as a credible set, though we will often use the term "Bayesian confidence interval" or simply "confidence interval" in univariate cases for consistency and clarity. More formally,

Definition 2.1 A 100 x (1