
The Binomial Distribution

Suppose an event can have only binary outcomes (eg, yes and no, or positive and negative), denoted A and B. The probability of A is denoted by π, or P(A) = π, and this probability stays the same each time the event occurs. The probability of B must therefore be 1 – π, because B occurs if A does not. If an experiment involving this event is repeated n times and the outcome is independent from one trial to another, what is the probability that outcome A occurs exactly X times? Or equivalently, what proportion of the n outcomes will be A? These questions frequently are of interest, especially in basic science research, and they can be answered with the binomial distribution.

Basic principles of the binomial distribution were developed by the 17th century Swiss mathematician Jakob Bernoulli, who made many contributions to probability theory. He was the author of what is generally acknowledged as the first book devoted to probability, published in 1713. In fact, in his honor, each trial involving a binomial probability is sometimes called a Bernoulli trial, and a sequence of trials is called a Bernoulli process. The binomial distribution gives the probability that a specified outcome occurs a given number of times in a fixed number of independent trials. The binomial distribution can be used to model the inheritability of a particular trait in genetics, to estimate the occurrence of a specific reaction (eg, the single packet, or quantal release, of acetylcholine at the neuromuscular junction), or to estimate the death of a cancer cell in an in vitro test of a new chemotherapeutic agent.

Box 4-1. Estimated rates of no biochemical recurrence according to pretreatment prostate-specific antigen values.

Number of Patients at Risk by Pretreatment PSA Valuesᵃ

Total   1607   1600   1552   1176   804   448   237   104   39   14
1        799    795    775    587   387   233   119    57   23    6
2        419    418    407    303   209    98    49    20    4    1
3        163    162    158    126    93    50    25    10    5    3
4        226    225    212    160   115    67    44    17    7    4

ᵃ Data represent 1607 patients with stage T1b, T1c, T2, and NX tumors; P < 0.001 for all groups. PSA = prostate-specific antigen.

Source: Reproduced, with permission, from Shipley WU, Thomas HD, Sandler HM, Hanks GE, Zietman AL, Perez CA, et al: Radiation therapy for clinically localized prostate cancer: A multiinstitutional pooled analysis. JAMA 1999;281:1598–1604.

We use the information collected by Shipley and colleagues (1999) in Presenting Problem 3 to illustrate the binomial distribution.

Assume, for a moment, that the entire population of men with a localized prostate tumor and a pretreatment PSA < 10 has been studied, and the probability of 5-year survival is equal to 0.8 (we use 0.8 for computational convenience, rather than 0.81 as reported in the study). Let S represent the event of 5-year survival and D represent death before 5 years; then, π = P(S) = 0.8 and 1 – π = P(D) = 0.2. Consider a group of n = 2 men with a localized prostate tumor and a pretreatment PSA < 10. What is the probability that exactly two men live 5 years? That exactly one lives 5 years? That none lives 5 years? These probabilities are found by using the multiplication and addition rules outlined earlier in this chapter.

The probability that exactly two men live 5 years is found by using the multiplication rule for independent events. We know that P(S) = 0.8 for patient 1 and P(S) = 0.8 for patient 2. Because the survival of one patient is independent from (has no effect on) the survival of the other patient, the probability of both surviving is

$$P(S \text{ and } S) = P(S) \times P(S) = (0.8)(0.8) = 0.64$$

The event of exactly one patient living 5 years can occur in two ways: patient 1 survives 5 years and patient 2 does not, or patient 2 survives 5 years and patient 1 does not. These two events are mutually exclusive; therefore, after using the multiplication rule to obtain the probability of each event, we can use the addition rule for mutually exclusive events to combine the probabilities as follows:

$$P(\text{exactly one survives}) = P(S)P(D) + P(D)P(S) = (0.8)(0.2) + (0.2)(0.8) = 0.32$$

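These computational steps are summarized in Table 4-6. Note that the total probability is

$$(0.8)^2 + 2(0.8)(0.2) + (0.2)^2 = 0.64 + 0.32 + 0.04 = 1$$

where (0.2)(0.2) = 0.04 is the probability that neither patient survives. You may recognize this as the binomial formula, $(a + b)^2 = a^2 + 2ab + b^2$.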
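The two-patient reasoning can be checked by brute force. The following is a minimal sketch, not part of the original text: it lists every sequence of survival (S) and death (D), applies the multiplication rule to each sequence, and applies the addition rule to sequences with the same number of survivors. The function name `outcome_probabilities` is chosen here for illustration only.

```python
from itertools import product

P_SURVIVE = 0.8  # pi, the 5-year survival probability used in the text


def outcome_probabilities(n, p=P_SURVIVE):
    """Return {number of survivors: probability} by listing every S/D sequence."""
    totals = {k: 0.0 for k in range(n + 1)}
    for outcome in product("SD", repeat=n):
        # Multiplication rule: patients are independent, so multiply
        # the per-patient probabilities along the sequence.
        prob = 1.0
        for result in outcome:
            prob *= p if result == "S" else 1 - p
        # Addition rule: distinct sequences are mutually exclusive, so
        # sequences with the same survivor count are summed.
        totals[outcome.count("S")] += prob
    return totals


two_patients = outcome_probabilities(2)
for survivors, prob in sorted(two_patients.items()):
    print(survivors, round(prob, 4))
```

Enumeration is only practical for small n, because the number of sequences doubles with each additional patient.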

The same process can be applied for a group of patients of any size or for any number of trials, but it becomes quite tedious. An easier technique is to use the formula for the binomial distribution, which follows. The probability of X outcomes in a group of size n, if each outcome has probability π and is independent from all other outcomes, is given by

$$P(X) = \frac{n!}{X!\,(n - X)!}\,\pi^{X}(1 - \pi)^{n - X}$$

where ! is the symbol for factorial; n! is called n factorial and is equal to the product n(n – 1)(n – 2)…(3)(2)(1). For example, 4! = (4)(3)(2)(1) = 24. The number 0! is defined as 1. The symbol $\pi^{X}$ indicates that the probability is raised to the power X, and $(1 - \pi)^{n - X}$ means that 1 minus the probability is raised to the power n – X. The expression n!/[X!(n – X)!] is sometimes referred to as the formula for combinations because it gives the number of combinations (or assortments) of X items possible among the n items in the group.
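The formula translates directly into a few lines of code. This is a minimal sketch, assuming Python 3.8 or later, where `math.comb(n, x)` computes n!/[x!(n – x)!]. The function name `binomial_pmf` is an illustrative choice, not a name from the text.

```python
from math import comb, factorial


def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials,
    each with success probability pi."""
    if not 0 <= x <= n:
        return 0.0
    # comb(n, x) is the combinations term n! / (x! (n - x)!)
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


# The combinations term agrees with the factorial definition.
assert comb(4, 2) == factorial(4) // (factorial(2) * factorial(2))

# Exactly one survivor among two patients when pi = 0.8.
print(round(binomial_pmf(1, 2, 0.8), 4))
```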


Table 4-6. Summary of probabilities for two patients.

To verify that the probability that exactly X = 1 of n = 2 patients survives 5 years is 0.32, we use the formula:

$$P(X = 1) = \frac{2!}{1!\,(2 - 1)!}\,(0.8)^{1}(0.2)^{1} = (2)(0.8)(0.2) = 0.32$$

To summarize, the binomial distribution is useful for answering questions about the probability of X number of occurrences in n independent trials when there is a constant probability π of success on each trial. For example, suppose a new series of men with prostate tumors is begun with ten patients. We can use the binomial distribution to calculate the probability that any particular number of them will survive 5 years. For instance, the probability that all ten will survive 5 years is

$$P(X = 10) = \frac{10!}{10!\,0!}\,(0.8)^{10}(0.2)^{0} = (0.8)^{10} = 0.1074$$

Similarly, the probability that exactly eight patients will survive 5 years is

$$P(X = 8) = \frac{10!}{8!\,2!}\,(0.8)^{8}(0.2)^{2} = (45)(0.1678)(0.04) = 0.3020$$

Table 4-7 lists the probabilities for X = 0, 1, 2, 3, …, 10; a plot of the binomial distribution when n = 10 and π = 0.8 is given in Figure 4-2. The mean of the binomial distribution is nπ; so (10)(0.8) = 8 is the mean number of patients surviving 5 years in this example. The standard deviation is

$$\sqrt{n\pi(1 - \pi)} = \sqrt{(10)(0.8)(0.2)} = \sqrt{1.6} = 1.265$$
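The mean and standard deviation can also be computed directly from the individual probabilities, which is a useful check on the shortcut formulas nπ and √[nπ(1 – π)]. The sketch below is illustrative only and assumes Python 3.8 or later for `math.comb`; the variable names are not from the text.

```python
from math import comb, sqrt

n, pi = 10, 0.8

# Probability of each possible number of survivors, X = 0, 1, ..., 10.
pmf = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]

# Mean and standard deviation from their definitions as weighted sums.
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))
sd = sqrt(variance)

print(round(mean, 3), round(n * pi, 3))                   # definition vs n*pi
print(round(sd, 3), round(sqrt(n * pi * (1 - pi)), 3))    # definition vs formula
```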


Table 4-7. Probabilities for binomial distribution with n = 10 and π = 0.8.

Number of Patients Surviving (X)   n!/[X!(n – X)!]   π^X     (1 – π)^(n – X)   P(X)
 0                                   1               1       0.0000001         0
 1                                  10               0.8     0.0000005         0
 2                                  45               0.64    0.0000026         0.0001
 3                                 120               0.512   0.0000128         0.0008
 4                                 210               0.410   0.000064          0.0055
 5                                 252               0.328   0.00032           0.0264
 6                                 210               0.262   0.0016            0.0881
 7                                 120               0.210   0.008             0.2013
 8                                  45               0.168   0.04              0.3020
 9                                  10               0.134   0.2               0.2684
10                                   1               0.107   1                 0.1074

Thus, the only two pieces of information needed to define a binomial distribution are n and π, which are called the parameters of the binomial distribution. Studies involving dichotomous, or binary, variables often use a proportion rather than a number (eg, the proportion of patients surviving a given length of time rather than the number of patients). When a proportion is used instead of a number of successes, the same two pieces of information (n and π) are needed. Because the proportion is found by dividing X by n, however, the mean of the distribution of the proportion becomes π, and the standard deviation becomes

$$\sqrt{\frac{\pi(1 - \pi)}{n}}$$
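To see why dividing X by n gives a mean of π and a standard deviation of √[π(1 – π)/n], the same weighted sums can be applied to the proportion X/n. This is a sketch under the chapter's running values (n = 10, π = 0.8); the function name `proportion_mean_sd` is hypothetical.

```python
from math import comb, sqrt


def proportion_mean_sd(n, pi):
    """Mean and standard deviation of the proportion X/n for a binomial X."""
    pmf = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]
    mean = sum((x / n) * p for x, p in enumerate(pmf))
    variance = sum((x / n - mean) ** 2 * p for x, p in enumerate(pmf))
    return mean, sqrt(variance)


mean_p, sd_p = proportion_mean_sd(10, 0.8)
print(round(mean_p, 4))                               # compare with pi
print(round(sd_p, 4), round(sqrt(0.8 * 0.2 / 10), 4))  # compare with the formula
```

With ten patients the standard deviation of the proportion is about 0.126, which is the standard deviation of the count (1.265) divided by n.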

Even using the formula for the binomial distribution becomes time-consuming, especially if the numbers are large. Also, the formula gives the probability of observing exactly X successes, and interest frequently lies in knowing the probability of X or more successes or of X or fewer successes. For example, to find the probability that eight or more patients will survive 5 or more years, we must use the formula to find the separate probabilities that eight will survive, nine will survive, and ten will survive and then sum these results; from Table 4-7, we obtain P(X ≥ 8) = P(X = 8) + P(X = 9) + P(X = 10) = 0.3020 + 0.2684 + 0.1074 = 0.6778. Tables giving probabilities for the binomial distribution are presented in many elementary texts. Much research in the health field is conducted with sample sizes large enough to use an approximation to the binomial distribution; this approximation is discussed in Chapter 5.
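Summing the separate probabilities is easy to automate. The following minimal sketch computes "X or more" and "X or fewer" probabilities by adding the exact probabilities over the relevant range. The helper names `prob_at_least` and `prob_at_most` are illustrative and not from the text, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def prob_at_least(k, n, pi):
    """P(X >= k): sum the exact probabilities for k, k + 1, ..., n."""
    return sum(binomial_pmf(x, n, pi) for x in range(k, n + 1))


def prob_at_most(k, n, pi):
    """P(X <= k): sum the exact probabilities for 0, 1, ..., k."""
    return sum(binomial_pmf(x, n, pi) for x in range(0, k + 1))


# Eight or more of ten patients surviving 5 years when pi = 0.8.
eight_or_more = prob_at_least(8, 10, 0.8)
print(round(eight_or_more, 4))
```

Because the two events are complementary, `prob_at_most(7, 10, 0.8)` equals 1 minus `prob_at_least(8, 10, 0.8)`, which offers a quick consistency check.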