Bayesian Estimation of the Binomial Parameter
N. R. DRAPER AND I. GUTTMAN*
Department of Statistics, University of Wisconsin
Madison, Wisconsin
1. INTRODUCTION
A recent paper by Feldman and Fox (1968) discussed the estimation of the parameter n in the binomial distribution when the other parameter p is known. In spite of their statement that “the real interest is in estimates that do well for r moderate relative to n” (see below for notation), their paper is mostly concerned with asymptotic results and so does not seem to be particularly relevant to the “real” problem as they state it.
In this note we give a Bayesian treatment of the problem both when p is known and when p is unknown. The Bayesian approach is a very practical one and is especially easy to apply in “small” problems because less computing is involved. It also does not require any appeal to asymptotic distributional results. (Asymptotic expansions of factorials can however be used if desired to evaluate posterior distributions approximately and, for samples where truncation of the posterior is slight, asymptotic normality can be used for approximate representations.)
The problem is as follows. Suppose $X_1, \ldots, X_r$ are r independent B(n, p) random variables. Given observations $x_1, \ldots, x_r$, estimate n.
How would an estimate of a binomial parameter n be useful in practice? Suppose for example, that the Apex Appliance Company wishes to estimate the number of a certain type of appliance in use in a certain service area. Suppose further that the company believes that the weekly total of defective appliances sent in for repair (irrespective of age) arises with a binomial probability p about whose value they have some prior knowledge. Then a count x of the number of defective appliances received during a routine week could be used to cast light on the population size n. In general then, if we have a characteristic with binomial behaviour and only the successes (or failures) become apparent, we can use these alone to provide information on the population size.
2. BAYESIAN SOLUTION
For a discussion of Bayesian methods we refer the reader to, for example, Lindley (1965). The likelihood is
$$\ell(n, p \mid \mathbf{x}) \propto p^{t}(1-p)^{rn-t}\prod_{i=1}^{r}\frac{n!}{x_i!\,(n-x_i)!}, \tag{2.1}$$
where $\mathbf{x} = (x_1, x_2, \ldots, x_r)'$ and $t = x_1 + \cdots + x_r$. We now discuss the cases p known and p unknown separately.

* Now at Centre de Recherches Mathématiques, University of Montreal. Received May 1969; revised Feb. 1970.
2.1 p known. Let $p_0(n)$ be a suitable prior distribution for n. One sensible form for $p_0(n)$ is
$$p_0(n) = 1/N, \qquad 1 \le n \le N, \tag{2.2}$$
where N is a large preselected integer. (For example if n were the number of local people of a certain type involved in a certain binomial process, N could be the local population.) The posterior distribution for n is then
$$p(n \mid \mathbf{x}, p) \propto (1-p)^{rn}\,p_0(n)\prod_{i=1}^{r}\frac{n!}{(n-x_i)!}. \tag{2.3}$$
The range of n is the intersection of $\{n \ge \max x_i\}$ and any restriction implied by the prior. The mode of the posterior is at $\hat{n}$, where
$$p(\hat{n}-1 \mid \mathbf{x}, p) \le p(\hat{n} \mid \mathbf{x}, p) \ge p(\hat{n}+1 \mid \mathbf{x}, p). \tag{2.4}$$
If (2.2) holds, we find that $\hat{n}$ is the integer solution of
$$\prod_{i=1}^{r}(\hat{n}-x_i) \le \{\hat{n}(1-p)\}^{r} \quad\text{and}\quad \prod_{i=1}^{r}(\hat{n}+1-x_i) \ge \{(\hat{n}+1)(1-p)\}^{r}, \tag{2.5}$$
which is identical to the maximum likelihood solution (Feldman and Fox, 1968), as expected.

In general, the mode of (2.3) provides an estimate of n but, in addition, the distribution (2.3) can be examined to cast light on the precision of the estimate. This should be done numerically for specific examples. Analytical results, such as a closed form for the normalizing constant c of (2.3), do not seem feasible. However, it is easy to obtain c numerically in specific examples.

2.2 p unknown. It seems reasonable to assume that n and p are independent a priori. (Appropriate amendments need to be made otherwise.) Let $p_0(n)$, $p_0(p)$ be the priors. One sensible form for $p_0(p)$ is
$$p_0(p) \propto p^{v_1-1}(1-p)^{v_2-1}, \qquad 0 \le p \le 1, \tag{2.6}$$
which can represent a uniform prior if $v_1 = v_2 = 1$ or a conjugate prior representing information from a prior sample otherwise. The joint posterior is then
$$p(n, p \mid \mathbf{x}) \propto p_0(n)\,p^{t+v_1-1}(1-p)^{rn-t+v_2-1}\prod_{i=1}^{r}\frac{n!}{(n-x_i)!}. \tag{2.7}$$
In the case where (2.2) applies, we can integrate p out from (2.7) to give the marginal distribution for n, which is
$$p(n \mid \mathbf{x}) \propto \frac{(rn-t+v_2-1)!}{(rn+v_1+v_2-1)!}\prod_{i=1}^{r}\frac{n!}{(n-x_i)!}, \qquad \max x_i \le n \le N. \tag{2.8}$$
As in the previous case, we can use the modal value $\hat{n}$ as an estimate of n, and examine the distribution for conclusions on precision. Again analytical results do not seem feasible, but numerical results are easy to obtain, using a computer if necessary.
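The step from (2.7) to (2.8) is a Beta integral. Written out under the uniform prior (2.2), and assuming $v_1$ and $v_2$ are positive integers so that factorials can be used, a sketch of the calculation is:

```latex
\begin{aligned}
p(n \mid \mathbf{x})
  &\propto \prod_{i=1}^{r}\frac{n!}{(n-x_i)!}
     \int_0^1 p^{\,t+v_1-1}(1-p)^{\,rn-t+v_2-1}\,dp \\
  &= \prod_{i=1}^{r}\frac{n!}{(n-x_i)!}\;
     \frac{(t+v_1-1)!\,(rn-t+v_2-1)!}{(rn+v_1+v_2-1)!} \\
  &\propto \frac{(rn-t+v_2-1)!}{(rn+v_1+v_2-1)!}
     \prod_{i=1}^{r}\frac{n!}{(n-x_i)!}.
\end{aligned}
```

The factor $(t+v_1-1)!$ does not depend on n and is absorbed into the constant of proportionality. With r = 1 and $v_1 = v_2 = 1$ the last line reduces to $1/(n+1)$.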
Clearly the distributions are such that consideration of small values of r is less difficult than consideration of large values. We now provide two examples.
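The numerical evaluation referred to above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function names are hypothetical, the uniform prior (2.2) is assumed, and the inputs x = 10, p = 0.8, N = 15 are chosen only for illustration. Working on the log scale with `math.lgamma` avoids overflow when the factorials are large.

```python
import math

def log_post_known_p(n, xs, p):
    """Unnormalized log of (2.3) under the uniform prior (2.2)."""
    r = len(xs)
    return r * n * math.log1p(-p) + sum(
        math.lgamma(n + 1) - math.lgamma(n - x + 1) for x in xs
    )

def log_post_unknown_p(n, xs, v1=1, v2=1):
    """Unnormalized log of (2.8); note lgamma(k + 1) = log k!."""
    r, t = len(xs), sum(xs)
    return (
        math.lgamma(r * n - t + v2)        # log (rn - t + v2 - 1)!
        - math.lgamma(r * n + v1 + v2)     # log (rn + v1 + v2 - 1)!
        + sum(math.lgamma(n + 1) - math.lgamma(n - x + 1) for x in xs)
    )

def normalize(log_weight, xs, N):
    """Return {n: posterior probability} for max(xs) <= n <= N."""
    support = range(max(xs), N + 1)
    logs = [log_weight(n) for n in support]
    top = max(logs)                        # subtract the max to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    c = sum(weights)                       # the normalizing constant, found numerically
    return {n: w / c for n, w in zip(support, weights)}

known = normalize(lambda n: log_post_known_p(n, [10], 0.8), [10], 15)
unknown = normalize(lambda n: log_post_unknown_p(n, [10]), [10], 15)
mode_known = max(known, key=known.get)
mode_unknown = max(unknown, key=unknown.get)
print(mode_known, mode_unknown)
```

The mode is read off directly from the normalized ordinates, so no separate solution of (2.5) is needed.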
3. EXAMPLES
We shall assume that v1 = v2 = 1 and that (2.2) holds, in both examples.
Example 1 (r = 1).
$$p(n \mid x, p) \propto (1-p)^{n}\binom{n}{x}, \qquad x \le n \le N,$$
$$p(n \mid x) \propto 1/(n+1), \qquad x \le n \le N.$$
When p is known, $\hat{n}$ = integer part of x/p, which is clearly sensible, and we obtain

  n        p(n | x, p) proportional to
  x        $(1-p)^{x}$
  x+1      $(x+1)(1-p)^{x+1}$
  x+2      $(x+1)(x+2)(1-p)^{x+2}/2!$
  ...      ...
  x+k      $(x+1)\cdots(x+k)(1-p)^{x+k}/k!$
  ...      ...
  N        ...
So, to take a specific case, if N = 15, p = 0.8, x = 10, so that $\hat{n}$ = 12, we have

  n                 10     11     12     13     14     15
  p(n | 10, 0.8)   .094   .206   .247   .214   .150   .090
Clearly, although $\hat{n}$ = 12 gives maximum posterior probability, both 11 and 13 are not unlikely and none of the values of n is improbable. The Bayesian posterior distribution thus provides a much better picture of the situation than $\hat{n}$ alone (or even $\hat{n}$ and the asymptotic variance as $r \to \infty$).
The selected value of N may, or may not, affect the distribution much when p is known. Table 1 shows posterior distributions of n for several values of N, for our example. The posterior distribution of n is essentially unchanged (to three decimal places) for $N \ge 21$ while for $N \le 20$, the effect of reducing N is to truncate the right-hand tail (and adjust the other ordinates so that they sum to unity). It would usually be a surprise to the investigator to find that his N did not exceed $\hat{n}$, and such a discovery would usually cause him to reexamine his given value of p as well as his prior beliefs about n.
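The effect of N described here can be checked directly. The following is a minimal sketch using exact rational arithmetic. The helper name `posterior_known_p` and the comparison value N = 40 are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def posterior_known_p(x, p, N):
    """Posterior of n for one observation x, known p, uniform prior on 1..N."""
    q = 1 - Fraction(p)          # exact arithmetic; pass p as a string such as "0.8"
    weights = {n: comb(n, x) * q**n for n in range(x, N + 1)}
    c = sum(weights.values())
    return {n: w / c for n, w in weights.items()}

table = {N: posterior_known_p(10, "0.8", N) for N in (15, 16, 18, 21, 40)}

# Print the ordinates for N = 15, 16, 18, 21, leaving blanks where n > N
for n in range(10, 22):
    row = [f"{float(table[N][n]):.3f}" if n in table[N] else "     "
           for N in (15, 16, 18, 21)]
    print(n, *row)

# Largest change in any ordinate for n <= 21 when N grows from 21 to 40
max_shift = max(abs(float(table[40][n] - table[21][n])) for n in range(10, 22))
print(max_shift)
```

Truncating at a smaller N only removes the upper tail and rescales the remaining ordinates, which is why the columns differ by a common factor.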
[If, to take another example, we knew p = 0.2, and x = 10, then $\hat{n}$ = 50, and we would have a similar situation with a more stretched-out distribution, higher on the n-scale.]
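The bracketed case can be examined in the same way. The sketch below assumes an illustrative bound N = 200, which the text does not specify, and uses the posterior standard deviation as one simple measure of how stretched out the distribution is. In exact arithmetic the ordinates at n = 49 and n = 50 tie, because the ratio of neighbouring ordinates, n(1 − p)/(n − x), equals one at n = x/p when x/p is an integer.

```python
from fractions import Fraction
from math import comb, sqrt

def posterior(x, p, N):
    """Exact posterior of n for one observation x, known p, uniform prior on 1..N."""
    w = {n: comb(n, x) * (1 - p) ** n for n in range(x, N + 1)}
    c = sum(w.values())
    return {n: v / c for n, v in w.items()}

def modes_and_sd(post):
    peak = max(post.values())
    modes = [n for n, v in post.items() if v == peak]   # exact Fraction comparison
    mean = sum(n * v for n, v in post.items())
    var = sum((n - mean) ** 2 * v for n, v in post.items())
    return modes, sqrt(var)

N = 200  # assumed bound, large enough that truncation is negligible here
modes_02, sd_02 = modes_and_sd(posterior(10, Fraction(1, 5), N))
modes_08, sd_08 = modes_and_sd(posterior(10, Fraction(4, 5), N))
print(modes_02, round(sd_02, 1))
print(modes_08, round(sd_08, 1))
```

The larger standard deviation for p = 0.2 is the numerical counterpart of the "more stretched-out" description.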
When p is unknown, and if N = 15 and x = 10, we have

  n             10     11     12     13     14     15
  p(n | 10)    .201   .184   .170   .158   .148   .138

Clearly, lack of information on p makes for a rather cautious inference with $\hat{n}$ = 10, the number observed.
TABLE 1
Posterior Distributions for Example 1, p Known.

              N
   n     15     16     18     21
  10   .094   .089   .086   .086
  11   .206   .196   .190   .189
  12   .247   .236   .228   .227
  13   .214   .204   .198   .197
  14   .150   .143   .138   .138
  15   .090   .086   .083   .083
  16          .046   .044   .044
  17                 .022   .021
  18                 .010   .010
  19                        .004
  20                        .002
  21                        .001
As N increases, the tail of the distribution grows longer and longer and the ordinates become smaller and smaller. Some representative distributions are given in Table 2. The value of $\hat{n}$ remains the same ($\hat{n}$ = 10) as N changes and the basic nature of the inference made does not alter except that as N increases more possibilities are admitted. In other words, the choice of N is not really crucial here when we assume a uniform prior on p, except in so far as it limits the length of the upper tail.
TABLE 2
Posterior for Example 1, p Unknown

              N
   n     15     16     18     21
  10   .201   .178   .147   .119
  11   .184   .163   .135   .109
  12   .170   .151   .124   .101
  13   .158   .140   .115   .094
  14   .148   .131   .108   .088
  15   .138   .122   .101   .082
  16          .115   .095   .077
  17                 .090   .073
  18                 .085   .069
  19                        .066
  20                        .063
  21                        .060
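For r = 1 and a uniform prior on p, the posterior is proportional to 1/(n + 1), so the entries of Table 2 are simple to regenerate. A minimal sketch follows. The helper name is hypothetical.

```python
from fractions import Fraction

def posterior_unknown_p(x, N):
    """p(n | x) proportional to 1/(n + 1) on x <= n <= N (r = 1, v1 = v2 = 1, prior (2.2))."""
    weights = {n: Fraction(1, n + 1) for n in range(x, N + 1)}
    c = sum(weights.values())
    return {n: w / c for n, w in weights.items()}

for N in (15, 16, 18, 21):
    post = posterior_unknown_p(10, N)
    mode = max(post, key=post.get)
    print(N, mode, [round(float(v), 3) for v in post.values()])

p15 = posterior_unknown_p(10, 15)
p21 = posterior_unknown_p(10, 21)
```

Because 1/(n + 1) is strictly decreasing, the mode stays at the smallest admissible value, n = x, whatever N is chosen.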
[Comment. In view of the great difference in inference between the two cases of our example where p is known and unknown, we find the remark of Feldman and Fox (1968) that “Preliminary work indicates that many of the same results are possible for unknown p.” a puzzling one.]

It is interesting to examine a case between the two extreme ones above. Suppose we assume $v_1 = 5$, $v_2 = 2$ so that our prior is proportional to
$$p^{4}(1-p), \qquad 0 \le p \le 1. \tag{3.1}$$
The shape of this distribution is shown in Figure 1(b) where we see the mode at p = 0.8. (It always is when $v_1 + 3 = 4v_2$.) The expression (3.1) thus represents a situation where p = 0.8 is most likely a priori and surrounding values are less likely, but not completely excluded. The “p known” case is represented in Figure 1(a) and the uniform prior knowledge case in Figure 1(c) for comparison. Some representative posterior distributions for n for the case when x = 10 and (2.2) holds, for various values of N, are shown in Table 3. A double mode now occurs at n = 11 and 12, and the distribution tails off slightly faster than in the uniform prior case. If we let $v_1$ and $v_2$ increase while keeping $v_1 + 3 = 4v_2$, we gradually move into the case shown in Figure 1(a) and so obtain the corresponding posterior distributions. We see that the prior (2.6) allows a great variety of prior feelings to be treated.
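The double mode and the limiting behaviour can be checked with exact arithmetic. The sketch below evaluates (2.8) for r = 1 with integer $v_1$, $v_2$. The function name is hypothetical, and the pair $v_1 = 197$, $v_2 = 50$ is an assumed example of larger values that still satisfy $v_1 + 3 = 4v_2$.

```python
from fractions import Fraction
from math import factorial

def marginal_posterior(x, N, v1, v2):
    """(2.8) with r = 1 and integer v1, v2, normalized over x <= n <= N."""
    def weight(n):
        # (n - x + v2 - 1)! / (n + v1 + v2 - 1)!  times  n! / (n - x)!
        return Fraction(factorial(n - x + v2 - 1) * factorial(n),
                        factorial(n + v1 + v2 - 1) * factorial(n - x))
    weights = {n: weight(n) for n in range(x, N + 1)}
    c = sum(weights.values())
    return {n: w / c for n, w in weights.items()}

# Prior (3.1): v1 = 5, v2 = 2
post = marginal_posterior(10, 15, v1=5, v2=2)
peak = max(post.values())
modes = [n for n, v in post.items() if v == peak]   # exact comparison of Fractions
print(modes, [round(float(v), 3) for v in post.values()])

# A much sharper prior with the same mode p = 0.8 (assumed values for illustration)
sharp = marginal_posterior(10, 15, v1=197, v2=50)
print(max(sharp, key=sharp.get))
```

With the sharper prior the mode moves to n = 12, the value obtained when p = 0.8 is treated as known.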
In situations where large factorials are involved, we can make use of Stirling’s approximation (see, for example, Kendall and Stuart, 1958, p. 81) to obtain an approximate posterior distribution. For purposes of illustration, we have done this for our example in the situation where p = 0.8 is known. The approximate posterior distribution then becomes proportional to
$$(1-p)^{n}\,\frac{(n+1)^{n+\frac{1}{2}}}{(n-x+1)^{n-x+\frac{1}{2}}}, \qquad x \le n \le N.$$

[Figure 1. Prior distributions for p, each plotted on $0 \le p \le 1$: (a) p known; (b) prior (3.1); (c) uniform prior.]
TABLE 3
Posterior for Example 1, Prior (3.1)

              N
   n     15     16     18     21
  10   .150   .134   .116   .102
  11   .194   .174   .150   .132
  12   .194   .174   .150   .132
  13   .177   .159   .137   .121
  14   .154   .139   .120   .106
  15   .132   .119   .103   .091
  16          .101   .087   .077
  17                 .074   .065
  18                 .062   .055
  19                        .046
  20                        .039
  21                        .033
Representative approximate posterior distributions are given in Table 4 and we can compare these with the exact distributions of Table 1. We see that, even for these relatively small values of n and N, excellent approximation is obtained.
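The comparison between the exact and the Stirling-based ordinates can be sketched as follows for N = 15. The function names are hypothetical. The approximation used replaces n! by a multiple of $(n+1)^{n+1/2}e^{-(n+1)}$, and the factors that do not depend on n are dropped.

```python
import math

def exact_weight(n, x, p):
    """Exact unnormalized ordinate: C(n, x) * (1 - p)^n."""
    return math.comb(n, x) * (1 - p) ** n

def stirling_weight(n, x, p):
    """Stirling-based ordinate; constants and the factor e^(-x) are dropped."""
    return (1 - p) ** n * (n + 1) ** (n + 0.5) / (n - x + 1) ** (n - x + 0.5)

def normalize(weight, x, p, N):
    w = {n: weight(n, x, p) for n in range(x, N + 1)}
    c = sum(w.values())
    return {n: v / c for n, v in w.items()}

exact = normalize(exact_weight, 10, 0.8, 15)
approx = normalize(stirling_weight, 10, 0.8, 15)
max_error = max(abs(exact[n] - approx[n]) for n in exact)

for n in exact:
    print(n, round(exact[n], 3), round(approx[n], 3))
print(max_error)
```

The largest discrepancy occurs at n = x, where the second factorial being approximated is 0!.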
In cases (such as in our example) where the posterior is severely truncated and skew, use of a (truncated) Normal approximation would not be wise. However, if truncation were small, a Normal approximation derived from use of the posterior distribution’s mean and variance would provide an adequate representation of the whole posterior and our work would be close to that of Feldman and Fox. For the “small” problem case however, the actual posterior (or a “Stirlingized” form of it) needs to be examined to appreciate what inferences can correctly be made.

TABLE 4
Approximate Posterior Distributions for Example 1, p Known (Compare with Table 1.)

              N
   n     15     16     18     21
  10   .098   .094   .092   .090
  11   .208   .198   .194   .191
  12   .246   .235   .230   .226
  13   .212   .202   .198   .195
  14   .148   .141   .138   .136
  15   .088   .084   .083   .081
  16          .045   .044   .043
  17                 .021   .021
  18                        .009
  19                        .004
  20                        .002
  21                        .001
Example 2 (r = 2).
$$p(n \mid \mathbf{x}, p) \propto (1-p)^{2n}\binom{n}{x_1}\binom{n}{x_2}, \qquad \max x_i \le n \le N,$$
$$p(n \mid \mathbf{x}) \propto \frac{(2n-t)!}{(2n+1)!}\binom{n}{x_1}\binom{n}{x_2}, \qquad \max x_i \le n \le N.$$
Suppose, to take a specific case, N = 15, $x_1$ = 10, $x_2$ = 12. Let us first suppose p is known and equal to 0.8 as before; then
$$p(n \mid 10, 12, 0.8) \propto (1-p)^{2n}\binom{n}{10}\binom{n}{12}, \qquad 12 \le n \le 15,$$

  n                      12      13      14      15
  p(n | 10, 12, 0.8)   0.147   0.332   0.325   0.195

If p is unknown but other figures are the same we obtain
$$p(n \mid 10, 12) \propto \frac{(2n-22)!}{(2n+1)!}\binom{n}{10}\binom{n}{12}, \qquad 12 \le n \le 15,$$

  n                 12      13      14      15
  p(n | 10, 12)   0.276   0.266   0.241   0.217
Comments similar to those made in Example 1 apply, and similar further analysis can be undertaken.
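As a check on Example 2, both posteriors can be recomputed exactly from the general forms with r = 2. This is a minimal sketch with hypothetical variable names, assuming the uniform priors used throughout the examples.

```python
from fractions import Fraction
from math import comb, factorial

xs, N = (10, 12), 15
support = range(max(xs), N + 1)
t, r = sum(xs), len(xs)

def normalize(weights):
    c = sum(weights.values())
    return {n: float(w / c) for n, w in weights.items()}

# p known (p = 0.8): (1 - p)^(rn) * C(n, x1) * C(n, x2)
q = Fraction(1, 5)
known = normalize({
    n: q ** (r * n) * comb(n, xs[0]) * comb(n, xs[1]) for n in support
})

# p unknown, uniform prior on p: (rn - t)! / (rn + 1)! * C(n, x1) * C(n, x2)
unknown = normalize({
    n: Fraction(factorial(r * n - t), factorial(r * n + 1))
       * comb(n, xs[0]) * comb(n, xs[1])
    for n in support
})

print([round(v, 3) for v in known.values()])
print([round(v, 3) for v in unknown.values()])
```

As in Example 1, the unknown-p posterior is flatter and has its mode at the smallest admissible value, here n = 12.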
ACKNOWLEDGMENTS
We are grateful to the referees for comments which led to an improved presentation. N. R. Draper was partially supported by the United States Navy through the Office of Naval Research, under contract Nonr-1202, Project NR 042 222. I. Guttman was partially supported by the Wisconsin Alumni Research Foundation.
REFERENCES
FELDMAN, D. and FOX, M. (1968). Estimation of the parameter n in the binomial distribution. J. Amer. Statist. Assoc. 63, 150-158.
KENDALL, M. G. and STUART, A. (1958). The Advanced Theory of Statistics. Volume I. Griffin, London.
LINDLEY, D. V. (1965). Probability and Statistics from a Bayesian Viewpoint: Parts I and II. Cambridge University Press.