The Bayes approach
2.3 Bayesian inference
Having specified the prior distribution using one of the techniques in the previous section, we may now use Bayes' Theorem (2.1) to obtain the posterior distribution of the model parameters, or perhaps equation (2.5) to obtain the predictive density of a future observation. Since these distributions summarize our current state of knowledge (arising from both the observed data and our prior opinion, if any), we might simply graph the corresponding density (or cumulative distribution) functions and report them as the basis for all posterior inference. However, these functions can be difficult to interpret and, in many cases, may tell us more than we want to know. Hence in this section, we discuss common approaches for summarizing such distributions. In particular, we develop Bayesian analogues for the common frequentist techniques of point estimation, interval estimation, and hypothesis testing. Throughout, we work with the posterior distribution of the model parameters, though most of our methods apply equally well to the predictive distribution; after all, future data values and parameters are merely different types of unknown quantities. We illustrate how Bayesian data analysis is typically practiced and, in the cases of interval estimation and hypothesis testing, highlight the philosophical advantages inherent in the Bayesian approach.
2.3.1 Point estimation
For ease of notation, consider first the univariate case. To obtain a point estimate $\hat{\theta}(y)$ of $\theta$, we simply need to select a summary feature of the posterior $p(\theta \mid y)$, such as its mean, median, or mode. The mode will typically be the easiest to compute, since no standardization of the posterior is then required: we may work directly with the numerator of (2.1). Also, note that when the prior $\pi(\theta)$ is flat, the posterior mode will be equal to the maximum likelihood estimate of $\theta$. For this reason, the posterior mode is sometimes referred to as the generalized maximum likelihood estimate of $\theta$.
For symmetric posterior densities, the mean and the median will of course be identical; for symmetric unimodal posteriors, all three measures will coincide. For asymmetric posteriors, the choice is less clear, though the median is often preferred since it is intermediate to the mode (which considers only the value corresponding to the maximum value of the density) and the mean (which often gives too much weight to extreme outliers). These differences are especially acute for one-tailed densities. Figure 2.2 provides an illustration using a G(.5, 1) distribution. Here, the modal value (0) seems a particularly poor choice of summary statistic, since it is far from the "middle" of the density; in fact, the density is not even finite at this value.
Figure 2.2 Three point estimates arising from a Gamma(.5, 1) posterior.
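As a rough numerical check of the three summaries discussed for the G(.5, 1) example, the following sketch uses only the Python standard library. It assumes the shape-scale reading of G(.5, 1) (with the second parameter equal to 1, the shape-rate reading gives the same density) and relies on the fact that if Z is standard normal then Z²/2 has this distribution, so its CDF is erf(√x). The function names are illustrative and do not come from the text.

```python
import math

SHAPE = 0.5  # alpha in G(alpha, 1); assumption: second parameter is a scale of 1


def gamma_half_pdf(x: float) -> float:
    """Density of Gamma(shape=0.5, scale=1) for x > 0."""
    return x ** (SHAPE - 1.0) * math.exp(-x) / math.gamma(SHAPE)


def gamma_half_cdf(x: float) -> float:
    """CDF of Gamma(shape=0.5, scale=1): P(Z**2 / 2 <= x) = erf(sqrt(x))."""
    return math.erf(math.sqrt(x)) if x > 0.0 else 0.0


def quantile(p: float, lo: float = 0.0, hi: float = 50.0, tol: float = 1e-12) -> float:
    """Invert the CDF by bisection (the CDF is increasing on [lo, hi])."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gamma_half_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


posterior_mean = SHAPE * 1.0       # shape * scale
posterior_median = quantile(0.5)   # solves erf(sqrt(x)) = 0.5
posterior_mode = 0.0               # density decreases on (0, inf), unbounded near 0

print(f"mode = {posterior_mode}, median = {posterior_median:.3f}, mean = {posterior_mean}")
# The density grows without bound as x approaches 0 from the right:
print([gamma_half_pdf(x) for x in (1e-8, 1e-4, 1.0)])
```

Bisection is used here only to avoid a dependency on a statistics library; any root finder or a library quantile function would serve equally well.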
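(Values marked in Figure 2.2: mode = 0, median = 0.227, mean = 0.5.)
In order to obtain a measure of the accuracy of a point estimate $\hat{\theta}(y)$ of $\theta$, we might use the posterior variance with respect to $\hat{\theta}(y)$,
$$E_{\theta \mid y}\left[(\theta - \hat{\theta}(y))^2\right].$$
Writing the posterior mean $E_{\theta \mid y}(\theta) = \mu(y)$ simply as $\mu$ for the moment, we have
$$
\begin{aligned}
E_{\theta \mid y}\left[(\theta - \hat{\theta})^2\right]
&= E_{\theta \mid y}\left[(\theta - \mu + \mu - \hat{\theta})^2\right] \\
&= E_{\theta \mid y}\left[(\theta - \mu)^2 + 2(\theta - \mu)(\mu - \hat{\theta}) + (\mu - \hat{\theta})^2\right] \\
&= E_{\theta \mid y}\left[(\theta - \mu)^2\right] + 2(\mu - \hat{\theta})\,E_{\theta \mid y}(\theta - \mu) + (\mu - \hat{\theta})^2 \\
&= E_{\theta \mid y}\left[(\theta - \mu)^2\right] + (\mu - \hat{\theta})^2 ,
\end{aligned}
\tag{2.14}
$$
since the middle term on the third line is identically zero (because $E_{\theta \mid y}(\theta - \mu) = 0$). Hence we have shown that the posterior mean $\mu(y)$ minimizes the posterior variance with respect to $\hat{\theta}(y)$ over all point estimators $\hat{\theta}(y)$. Furthermore, this minimum value is the posterior variance of $\theta$ (usually referred to simply as the posterior variance). This is a possible argument for preferring the posterior mean as a point estimate, and also partially explains why historically only the posterior mean and variance were reported as the results of a Bayesian analysis.
Turning briefly to the multivariate case, we might again take the posterior mode as our point estimate $\hat{\boldsymbol{\theta}}(\mathbf{y})$, though it will now be numerically harder to find. When the mode exists, a traditional maximization method such as a grid search, golden section search, or Newton-type method will often succeed in locating it (see e.g., Thisted, 1988, Chapter 4 for a description of these and other maximization methods). The posterior mean $\boldsymbol{\mu} = E_{\boldsymbol{\theta} \mid \mathbf{y}}(\boldsymbol{\theta})$ is also a possibility, since it is still well-defined and its accuracy is captured nicely by the posterior covariance matrix,
$$V = E_{\boldsymbol{\theta} \mid \mathbf{y}}\left[(\boldsymbol{\theta} - \boldsymbol{\mu})(\boldsymbol{\theta} - \boldsymbol{\mu})'\right].$$
Similar to the univariate case, one can show that
$$E_{\boldsymbol{\theta} \mid \mathbf{y}}\left[(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})(\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})'\right] = V + (\hat{\boldsymbol{\theta}} - \boldsymbol{\mu})(\hat{\boldsymbol{\theta}} - \boldsymbol{\mu})',$$
so that the posterior mean "minimizes" the posterior covariance matrix with respect to $\hat{\boldsymbol{\theta}}(\mathbf{y})$. The posterior median is more difficult to define in multiple dimensions, though it is still often used in one-dimensional subspaces of such problems, i.e., as a summary statistic for the marginal posterior of a single component of $\boldsymbol{\theta}$.
Example 2.7 Perhaps the single most important contribution of statistics to the field of scientific inquiry is the general linear model, of which regression and analysis of variance models are extremely widely used special cases. A Bayesian analysis of this model was first presented in the landmark paper by Lindley and Smith (1972), which we summarize here.
Suppose that $\mathbf{Y} \mid \boldsymbol{\theta}_1 \sim N(A_1 \boldsymbol{\theta}_1, C_1)$, where $\mathbf{Y}$ is an $n \times 1$ data vector, $\boldsymbol{\theta}_1$ is a $p_1 \times 1$ parameter vector, $A_1$ is an $n \times p_1$ known design matrix, and $C_1$ is an $n \times n$ known covariance matrix. Suppose further that we adopt the prior distribution $\boldsymbol{\theta}_1 \sim N(A_2 \boldsymbol{\theta}_2, C_2)$, where $\boldsymbol{\theta}_2$ is a $p_2 \times 1$ parameter vector, $A_2$ is a $p_1 \times p_2$ design matrix, $C_2$ is a $p_1 \times p_1$ covariance matrix, and $\boldsymbol{\theta}_2$, $A_2$, and $C_2$ are all known. Then the marginal distribution of $\mathbf{Y}$ is
$$\mathbf{Y} \sim N\left(A_1 A_2 \boldsymbol{\theta}_2,\; C_1 + A_1 C_2 A_1'\right),$$
and the posterior distribution of $\boldsymbol{\theta}_1$ is
$$\boldsymbol{\theta}_1 \mid \mathbf{y} \sim N(D\mathbf{d},\, D),$$
where
$$D^{-1} = A_1' C_1^{-1} A_1 + C_2^{-1} \tag{2.15}$$
and
$$\mathbf{d} = A_1' C_1^{-1} \mathbf{y} + C_2^{-1} A_2 \boldsymbol{\theta}_2 . \tag{2.16}$$
Thus $E(\boldsymbol{\theta}_1 \mid \mathbf{y}) = D\mathbf{d}$ provides a point estimate for $\boldsymbol{\theta}_1$, with variability captured by the posterior covariance matrix $\mathrm{Var}(\boldsymbol{\theta}_1 \mid \mathbf{y}) = D$.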
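To make the roles of (2.15) and (2.16) concrete, here is a minimal sketch of the simplest special case, written under stated assumptions rather than taken from the text: a single unknown mean ($p_1 = p_2 = 1$), so that $A_1$ is a column of ones, $C_1 = \sigma^2 I_n$, $A_2 = 1$, $\boldsymbol{\theta}_2$ is a known prior mean, and $C_2$ is a known prior variance. In that case the standard Lindley and Smith forms $D^{-1} = A_1' C_1^{-1} A_1 + C_2^{-1}$ and $\mathbf{d} = A_1' C_1^{-1}\mathbf{y} + C_2^{-1} A_2 \boldsymbol{\theta}_2$ reduce to scalars. The function name and the data values are hypothetical.

```python
def normal_mean_posterior(y, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of theta_1 when y_i | theta_1 ~ N(theta_1, sigma2)
    independently and theta_1 ~ N(prior_mean, prior_var), all variances known.

    Scalar special case of the hierarchical linear model:
    A_1 = column of ones, C_1 = sigma2 * I, A_2 = 1, theta_2 = prior_mean, C_2 = prior_var.
    """
    n = len(y)
    d_inv = n / sigma2 + 1.0 / prior_var            # (2.15): A1' C1^{-1} A1 + C2^{-1}
    d = sum(y) / sigma2 + prior_mean / prior_var    # (2.16): A1' C1^{-1} y + C2^{-1} A2 theta2
    big_d = 1.0 / d_inv                             # posterior variance D
    return big_d * d, big_d                         # posterior mean D d, posterior variance D


# Hypothetical data, chosen only for illustration.
y = [1.2, 0.8, 1.9, 1.4]
post_mean, post_var = normal_mean_posterior(y, sigma2=1.0, prior_mean=0.0, prior_var=4.0)
sample_mean = sum(y) / len(y)

print("posterior mean:", post_mean)
print("posterior variance:", post_var)
print("sample mean:", sample_mean)
```

In this scalar case the posterior mean is a precision-weighted average of the prior mean and the sample mean, so it falls between the two, and the posterior variance is smaller than the sampling variance $\sigma^2 / n$ of the sample mean.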
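As a more concrete illustration, we consider the case of linear regression. Using the usual notation, we would set $\boldsymbol{\theta}_1 = \boldsymbol{\beta}$, $A_1 = X$, and $C_1 = \sigma^2 I_n$. A noninformative prior is provided by taking $C_2^{-1} = 0$, i.e., setting the prior precision matrix equal to a matrix of zeroes. Then from equations (2.15) and (2.16), we have
$$D^{-1} = \frac{1}{\sigma^2} X'X$$
and
$$\mathbf{d} = \frac{1}{\sigma^2} X'\mathbf{y},$$
so that the posterior mean is given by
$$D\mathbf{d} = (X'X)^{-1} X'\mathbf{y} = \hat{\boldsymbol{\beta}},$$
the usual least squares estimate of $\boldsymbol{\beta}$, with posterior covariance matrix $D = \sigma^2 (X'X)^{-1}$. Recall that the sampling distribution of the least squares estimate is given by $\hat{\boldsymbol{\beta}} \mid \boldsymbol{\beta} \sim N\left(\boldsymbol{\beta},\, \sigma^2 (X'X)^{-1}\right)$, so that classical and noninformative Bayesian inferences regarding $\boldsymbol{\beta}$ are formally identical in this example. If $\sigma^2$ is unknown, a closed-form analytic solution for the posterior mean of $\boldsymbol{\beta}$ will typically be unavailable. Fortunately, sampling-based computational methods are available to escape this unpleasant situation, as we shall illustrate later in Example 5.6.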
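The known-variance regression case can be checked numerically. The sketch below is an illustration under assumptions, not the text's own computation: it takes simple linear regression with an intercept and one slope, sets the prior precision to zero so that $D^{-1} = X'X / \sigma^2$ and $\mathbf{d} = X'\mathbf{y} / \sigma^2$, and compares $D\mathbf{d}$ with the textbook least squares formulas. The data, the value of $\sigma^2$, and the function name are hypothetical, and the 2 × 2 inverse is written out by hand to keep the example dependency-free.

```python
def flat_prior_posterior(x, y, sigma2):
    """Posterior mean D d and covariance D for (intercept, slope) under a flat prior,
    with known error variance sigma2. The design matrix X has rows (1, x_i)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))

    # D^{-1} = X'X / sigma2 = [[a, b], [b, c]] and d = X'y / sigma2 = (d0, d1)
    a, b, c = n / sigma2, sx / sigma2, sxx / sigma2
    d0, d1 = sy / sigma2, sxy / sigma2

    # Invert the symmetric 2 x 2 matrix D^{-1} explicitly.
    det = a * c - b * b
    cov = [[c / det, -b / det], [-b / det, a / det]]
    mean = [cov[0][0] * d0 + cov[0][1] * d1, cov[1][0] * d0 + cov[1][1] * d1]
    return mean, cov


# Hypothetical data, chosen only for illustration.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 7.1, 8.8]
sigma2 = 0.25
beta_mean, beta_cov = flat_prior_posterior(x, y, sigma2)

# Classical least squares via the usual centered-sum formulas.
xbar = sum(x) / len(x)
ybar = sum(y) / len(y)
s_xx = sum((v - xbar) ** 2 for v in x)
slope = sum((u - xbar) * (v - ybar) for u, v in zip(x, y)) / s_xx
intercept = ybar - slope * xbar

print("posterior mean (intercept, slope):", beta_mean)
print("least squares  (intercept, slope):", [intercept, slope])
print("posterior variance of slope:", beta_cov[1][1], "vs sigma2 / S_xx:", sigma2 / s_xx)
```

The agreement between the two sets of numbers is the formal identity noted in the text; it depends on $\sigma^2$ being treated as known.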
2.3.2 Interval estimation
The Bayesian analogue of a frequentist confidence interval (CI) is usually referred to as a credible set, though we will often use the term "Bayesian
confidence interval" or simply "confidence interval" in univariate cases for consistency and clarity. More formally,
Definition 2.1 A 100 x (1