16. THE NORMAL APPROXIMATION TO THE BINOMIAL DISTRIBUTION
It is sometimes difficult to directly compute probabilities for a binomial (n, p) random variable, X. We need a different table for each pair of values (n, p). If we don't have a table, direct calculations can get cumbersome very quickly.
Eg: Compute P(X ≤ 100) for n = 150, p = 0.35.
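To see what the direct calculation involves, here is a minimal sketch that sums the binomial probabilities term by term. The function name binom_cdf is an assumption made for this illustration and is not part of the notes.

```python
from math import comb

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ binomial(n, p), by summing the pmf term by term."""
    q = 1.0 - p
    # Each term is C(n, x) * p^x * q^(n - x); there are k + 1 of them.
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(k + 1))

# The example from the notes: n = 150, p = 0.35 requires 101 terms.
print(binom_cdf(100, 150, 0.35))
```

Since 100 lies far above the mean np = 52.5, the printed value is very close to 1. The point is that 101 terms with large binomial coefficients are needed, which is impractical by hand.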
For normal random variables, on the other hand, probability calculations are extremely easy; just one table is required.
Fortunately, we can approximate the binomial distribution by a normal distribution, with an appropriate choice of µ and σ.
To get a feel for why this might work, let's study the Quincunx.
The Quincunx is a device invented by Sir Francis Galton in the 1800s. It shows empirically that binomial random variables, observed repeatedly, produce a histogram that looks bell-shaped, as long as the number of trials is not too small.
See Quincunx website at:
http://www.rand.org/methodology/stat/applets/clt.html
• In general, the distribution of a binomial random variable may be accurately approximated by that of a normal random variable,
as long as np ≥ 5, nq ≥ 5, and assuming that a “continuity correction” is made to account for the fact that we are using a continuous distribution (the normal) to approximate a discrete one (the binomial).
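The rule of thumb above can be checked mechanically. The sketch below assumes q = 1 − p and uses a hypothetical helper name, normal_approx_ok; the threshold of 5 is the one stated in the notes.

```python
def normal_approx_ok(n, p, threshold=5):
    """Return True if both np and nq meet the rule-of-thumb threshold."""
    q = 1 - p  # probability of failure on a single trial
    return n * p >= threshold and n * q >= threshold

print(normal_approx_ok(150, 0.35))  # np = 52.5, nq = 97.5
print(normal_approx_ok(10, 0.1))    # np = 1, too small
```

The first call satisfies the rule and the second does not, because np = 1 is below 5.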
• For approximating the distribution of X, we will use the normal distribution with mean µ = np, variance σ² = npq, where q = 1 − p. Why are these reasonable choices of µ, σ²? To study the quality of this approximation, visit the Normal Approximation to the Binomial website at:
http://www.stat.sc.edu/~west/applets/binomialdemo2.html
This draws a bar chart of the binomial distribution for a given n, p, and superimposes the approximating normal distribution. Note how skewness increases as p moves away from 0.5.
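One way to see why µ = np and σ² = npq are natural choices is sketched below. It assumes the standard representation of X as a sum of n independent trials; the indicator variables X_i are notation introduced here for illustration only.

```latex
% X_i = 1 if trial i is a success, 0 otherwise, with P(X_i = 1) = p.
X = X_1 + X_2 + \cdots + X_n, \qquad
E[X_i] = p, \qquad
\mathrm{Var}(X_i) = p(1 - p) = pq .

% Means always add; variances add because the trials are independent.
E[X] = \sum_{i=1}^{n} E[X_i] = np, \qquad
\mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(X_i) = npq .
```

So the approximating normal distribution is simply the one whose mean and variance match those of X.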
See histograms of the number of dark M&Ms and orange M&Ms from the M&M Lab. (Separate handout).
• If p(x) is the binomial distribution and f(x) is the density of the normal, the approximation is:
$$p(a) \approx \int_{a-1/2}^{a+1/2} f(x)\,dx$$
Thus, the binomial probability p(a) is approximately equal to the probability that a normal RV with mean np and variance npq lies between x = a − 1/2 and x = a + 1/2.
Also, P(a ≤ X ≤ b) is approximately equal to the area under the normal curve between x = a − 1/2 and x = b + 1/2:
$$\sum_{x=a}^{b} p(x) \approx \int_{a-1/2}^{b+1/2} f(x)\,dx$$
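The continuity-corrected approximation can be compared with the exact binomial sum numerically. This is a minimal sketch: the function names and the range a = 50, b = 55 are illustrative assumptions, and the normal CDF is computed from the standard library's erf.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(Y <= x) for a normal RV Y with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def binom_range_exact(a, b, n, p):
    """Exact P(a <= X <= b): sum of the binomial pmf from x = a to x = b."""
    q = 1.0 - p
    return sum(comb(n, x) * p**x * q**(n - x) for x in range(a, b + 1))

def binom_range_normal(a, b, n, p):
    """Normal approximation: area under the curve from a - 1/2 to b + 1/2."""
    mu = n * p
    sigma = sqrt(n * p * (1.0 - p))  # sigma^2 = npq
    return normal_cdf(b + 0.5, mu, sigma) - normal_cdf(a - 0.5, mu, sigma)

# Illustrative values (assumed): n = 150, p = 0.35, range 50..55.
n, p = 150, 0.35
exact = binom_range_exact(50, 55, n, p)
approx = binom_range_normal(50, 55, n, p)
print(exact, approx)
```

Setting a = b gives the single-value case p(a). Dropping the ± 1/2 terms in binom_range_normal shows how much the continuity correction contributes.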