Paul Segal manages the pig-farming division of the agricultural company Bowman-Lyons-Centerville. A rumored outbreak of Pulluscular Pig Disorder (PPD) in one of Paul's herds is on the verge of causing a public relations disaster.
The main symptom of PPD is a shrinking brain, and the only certain way to diagnose PPD is by measuring brain size post-mortem.
Paul needs to know if his herd is affected by PPD, but he does not want to have to slaughter hundreds of swine to find out. At the preliminary stage, he can offer no more than 5 prime porkers to be slaughtered and diagnosed.
For the pigs slaughtered, the mean brain weight was 0.18 lbs, with a standard deviation of 0.06 lbs. With 95% confidence, in what range does the herd's average brain weight lie?
a. [0.127 lbs, 0.233 lbs]
This is not the correct answer. For a sample this small, a t-value, not a z-value, is appropriate.
b. [0.123 lbs, 0.237 lbs]
This is not the correct answer. Make sure you are finding the t-value for a 95% confidence level, not a 90% confidence level.
c. [0.117 lbs, 0.243 lbs]
This is not the correct answer. Make sure you are finding the t-value using n - 1 = 4 degrees of freedom, not n = 5 degrees of freedom.
d. [0.106 lbs, 0.254 lbs]
This is the correct answer. The right t-value for a 95% confidence level and 4 degrees of freedom is 2.78.
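The arithmetic behind the correct answer can be sketched in Python. This is only an illustration, not part of the original exercise: the function name `t_interval` is made up here, and the t-value 2.776 (95% confidence, 4 degrees of freedom, which the feedback rounds to 2.78) is assumed from a t-table rather than computed.

```python
import math

def t_interval(mean, sd, n, t_value):
    """Return (lower, upper) for mean +/- t * sd / sqrt(n)."""
    margin = t_value * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Assumed t-table entry: 95% confidence, n - 1 = 4 degrees of freedom.
T_95_DF4 = 2.776

lower, upper = t_interval(mean=0.18, sd=0.06, n=5, t_value=T_95_DF4)
print(f"[{lower:.3f} lbs, {upper:.3f} lbs]")  # [0.106 lbs, 0.254 lbs]
```

Swapping in 1.96 for the t-value reproduces the narrower interval in option a, which is why a z-value understates the uncertainty for so small a sample.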
Proportions
The Customer Response Problem
The next morning, you and Alice are about to head off to the hotel pool when Leo calls you.
I'm sorry to disturb you, but I have another problem, and I think you might be able to help.
The Kahana is a very popular resort during the summer tourist season. But the number of leisure visitors drops significantly during the off-season, from September through February and then April through May. We usually have quite a few room vacancies during that period of time. We expect to have about 200 rooms vacant for weeklong periods during the slow season this year.
I've developed a new program that rewards our best guests with a special discount if they book a weeklong stay during our slow period. They won't have complete date flexibility of course, but the steep discount should make the offer attractive for them.
To see how many of our past guests would accept such an offer, I sent promotional brochures to 100 of them. The deadline by which they had to respond to the offer has passed. Ten guests responded with the required room deposit prior to the deadline — that's a solid 10 percent.
I figure if we send out 2,000 promotions, we'll get about 200 responses.
This is a nice idea, Leo, but I'm concerned it could backfire. If more than 10% respond to this offer, you might end up disappointing some of the very guests you're trying to reward. Or, if too many respond and you give them all the discount, you'll have to turn away customers willing to pay full price.
That is exactly my concern. I wonder how accurate the 10% response rate is. Just because it held for 100 guests, will it hold for 2,000? What if 11% actually respond to the promotions?
Imagine what would happen if 220 guests responded. I don't want to anger 20 loyal customers by telling them the offer is not valid, but I also don't want to turn away full-paying guests to accommodate the extra 20 guests at a discount.
I'm willing to reserve 200 rooms for these discount weeklong stays during the slow season. To how many return guests can I safely send the discount offer and be confident that no more than 200 will respond?
You can tell that Leo is growing quite comfortable with relying on your statistical methods. He seems almost as interested in them as he is in your results.
Confidence Intervals And Proportions
Sometimes, the question we pose to members of a sample calls for a yes or no answer.
We might survey people in a target market and ask if they plan to buy a new car this year. Or survey voters and ask if they plan to vote for the incumbent candidate for office. Or we might take a sample of the products our plant produced yesterday and count how many are defective.
In each case, we know what values our data can take (yes or no), but we don't know how often each response will be given.
In these cases, we usually convey the survey results by reporting the percentage of yes responses as a proportion, p-bar. This is our best estimate of p, the true percentage of "yes" responses in the underlying population.
Suppose, for example, that we have posted advertisements in the subway cars on Boston's "Red Line," and want to know what percentage of all passengers remembers seeing our ad.
We create a proper survey, and ask randomly selected Red Line passengers if they remember seeing our ad. 300 passengers respond to our survey, of which 100 passengers report remembering the ad.
Then p-bar is simply 33%, which is the number of people that remember the ad, 100, divided by the number of respondents, 300.
The remaining 200 passengers, or 67% of the sample, report not remembering the ad. The two proportions always add up to 1 because survey respondents report either remembering the ad or not.
Once we know the proportion of the sample, we can draw conclusions about all Red Line passengers. Our best estimate, or point estimate, for p, the percentage of all passengers who remember seeing our ad, is 33%.
As managers, we typically want more than this simple point estimate — we want to know how accurate the estimate is. How far from 33% might the true percentage be? Can we say confidently that it is between 30% and 36%, for example?
When we work with proportions, how do we find a confidence interval around our point estimate?
The process for creating a confidence interval around a proportion is nearly identical to the process we've used before. The only difference is that we can approximate the standard deviation of the population with a simple formula rather than calculating it directly from the raw data.
Based on our sample, our best estimate of the true population proportion is p-bar, the percentage of "yes" responses in our survey. Statistical theory tells us that our best estimate of the standard deviation of the true population proportion is the square root of [(p-bar)*(1 - (p-bar))]. We can use this approximate standard deviation to determine a confidence interval for the proportion.
For our Red Line ad, we approximate the standard deviation with the square root of 0.33 times 0.67, or 0.47. A 95% confidence interval is 0.33 plus or minus 1.96 times 0.47 divided by the square root of 300. This is equal to 0.33 plus or minus 0.053, or 27.7% to 38.3%.
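The Red Line calculation can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `proportion_interval` is hypothetical, the z-value comes from the standard library's `NormalDist` rather than a printed z-table, and the call uses the rounded value 0.33 so that the output matches the figures in the text.

```python
import math
from statistics import NormalDist

def proportion_interval(p_bar, n, confidence=0.95):
    """Confidence interval for a proportion (normal approximation)."""
    # Two-sided z-value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Approximate standard deviation: sqrt(p_bar * (1 - p_bar)).
    sd = math.sqrt(p_bar * (1 - p_bar))
    margin = z * sd / math.sqrt(n)
    return p_bar - margin, p_bar + margin

# Red Line survey, with p-bar rounded to 0.33 as in the text.
low, high = proportion_interval(p_bar=0.33, n=300)
print(f"{low:.1%} to {high:.1%}")  # 27.7% to 38.3%
```

Passing the unrounded proportion 100/300 instead of 0.33 shifts both bounds up by roughly a third of a percentage point.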
Unfortunately, there is one catch when we calculate confidence intervals around proportions...
Sample Size
Sample size matters, particularly when dealing with very small or very large proportions. Suppose we are sampling New Yorkers for Amyotrophic Lateral Sclerosis, commonly known as Lou Gehrig's Disease. In the U.S., the odds of having the disease are less than 1 in 10,000. Would our sample be useful if we surveyed 100 people?
No. We probably wouldn't find a single person with the disease in our sample. Since the true proportion is very small, we need to have a large enough sample to make sure we find at least a few people with the disease. Otherwise, we will not have enough data to get a good estimate of the true proportion.
There is a guideline we must meet to make sure that our sample is large enough when estimating proportions. Two conditions must be met: First, the product of the sample size and the proportion must be at least 5. Second, the product of the sample size and 1 minus the proportion must also be at least 5.
If both these requirements are met, we can use the sample. Essentially, this guideline guarantees that our sample contains a reasonable number of "yes" and a reasonable number of "no" answers. Our sample will not be useful otherwise.
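As a rough illustration, the two-part guideline can be written as a single check. The function name `sample_is_large_enough` is made up for this sketch, and the rate of exactly 1 in 10,000 is an assumption (the text says only that the odds are less than that).

```python
def sample_is_large_enough(n, p, minimum=5):
    """Check the guideline: n*p >= 5 and n*(1 - p) >= 5."""
    return n * p >= minimum and n * (1 - p) >= minimum

# Assumed rate: 1 in 10,000, the upper bound quoted in the text.
als_rate = 1 / 10_000

# 100 people: we expect about 0.01 "yes" answers, far fewer than 5.
print(sample_is_large_enough(100, als_rate))      # False
# 100,000 people: we expect about 10 "yes" answers.
print(sample_is_large_enough(100_000, als_rate))  # True
```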
To avoid an invalid sample, we need to create a large enough sample size to satisfy the requirements. However, since we don't know the proportion p-bar before sampling, we don't know if the two conditions are met before setting the sample size. How can we get around this problem?
Finding a Preliminary Estimate of p-bar
We can obtain a preliminary estimate of p-bar using either of two methods: first, we can use past experience. For example, to estimate the rate of Lou Gehrig's disease, we can research the rate of occurrence in the general population. This is a reasonable first estimate for p-bar.
In many cases, however, we are sampling for the first time. Without past experience, we don't know what p-bar might be. In this case, it may well be worth our time to take a small test sample to estimate the proportion, p-bar.
For example, if the proportion of yes answers in our small test sample is 3%, then we can use 3% as our preliminary estimate of p-bar.
Substituting 3% for p-bar in our two requirements, n(p-bar) ≥ 5 and n(1 - (p-bar)) ≥ 5, tells us that n must satisfy n*0.03 ≥ 5 and n*0.97 ≥ 5. Thus the sample size we need for our real sample must be at least 167.
We would then use a real sample — with at least 167 respondents — to find an actual sample value of p-bar to create a confidence interval for the population proportion.
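The step from a preliminary estimate to a minimum sample size can be sketched as follows. The function name `minimum_sample_size` is hypothetical; the logic simply solves both guideline inequalities for n and rounds up.

```python
import math

def minimum_sample_size(p_estimate, minimum=5):
    """Smallest n with n*p >= minimum and n*(1 - p) >= minimum."""
    # The rarer of the two answers is the binding constraint.
    rarer = min(p_estimate, 1 - p_estimate)
    return math.ceil(minimum / rarer)

# Preliminary estimate of 3% from the small test sample.
print(minimum_sample_size(0.03))  # 167
```

With a 3% estimate, only the first condition matters: n*0.97 reaches 5 with just 6 respondents, while n*0.03 needs 167.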
Proportions are often used to indicate the frequency of some characteristic in a population. The sample proportion p-bar is the number of occurrences of the characteristic in the sample divided by the number of respondents, the sample size. It is our best estimate of the true proportion in the population. We can construct a confidence interval for the population proportion. Two guidelines for the sample size must be met for a valid confidence interval: n(p-bar) and n(1 - (p-bar)) must each be at least five.
Solving the Customer Response Problem
Creating confidence intervals around proportions is not much different from creating them around means. Finding the right number of Leo's promotional brochures to mail should be easy.
Leo needs to know how accurate the 10 percent response rate of his 100-customer sample is. Will this response rate hold for 2,000 guests? To how many guests can he send the discount offer for his 200 rooms?
First, you calculate a 95% confidence interval for the response rate.
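The 95% confidence interval for the proportion estimate is 0.0412 to 0.1588, or 4.12% to 15.88%. You obtain that answer by using the sample data and applying the familiar formula: p-bar plus or minus z times the square root of [(p-bar)*(1 - (p-bar))] divided by the square root of n. Here that is 0.10 plus or minus 1.96 times 0.30 divided by 10, or 0.10 plus or minus 0.0588.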
Then after giving Leo's questions some thought, you recommend to him that he send the mailing to a specific number of guests.
Based on the confidence interval for the proportion, the maximum percentage of people who are likely to respond to the discount offer (at the 95% confidence level) is 15.88%. So, if 15.88% of recipients were to respond and fill the 200 rooms, to how many people should Leo send the offer? Simply divide 200 by 0.1588 and round down: Leo should send the offer to at most 1,259 past guests.
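Both steps of Leo's problem, the interval and the mailing size, can be reproduced with a short script. This is a sketch only: the variable names are invented for illustration, and the z-value is taken from Python's `NormalDist` instead of the z-table.

```python
import math
from statistics import NormalDist

responses, mailed, rooms = 10, 100, 200

p_bar = responses / mailed                # 0.10
z = NormalDist().inv_cdf(0.975)           # about 1.96
margin = z * math.sqrt(p_bar * (1 - p_bar)) / math.sqrt(mailed)
lower, upper = p_bar - margin, p_bar + margin

# Largest mailing for which even the upper-bound response rate
# yields no more than 200 bookings (round down, not to nearest).
max_mailing = math.floor(rooms / upper)

print(f"{lower:.4f} to {upper:.4f}")  # 0.0412 to 0.1588
print(max_mailing)                    # 1259
```

Rounding down matters here: rounding up would allow slightly more than 200 expected responses at the upper bound.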
Leo is pleased with your work. He tells you to relax and enjoy the resort.
Exercise 1: GMW Automotive
GMW is a German auto manufacturer that has regional sales subsidiaries throughout the world. Arturo Lopez heads the Mexican sales division of the company's Latin American subsidiary.
GMW earns additional profit when customers choose to finance their car purchase with a GMW financing package. Arturo has been asked to submit a report to the GMW CEO in Germany about the percentage of GMW customers who opt for financing.
Arturo has asked you, a new member of the division sales team, to devise a way to estimate this percentage. You take a random sample of 64 cars sold in the Mexican sales division, and find that the buyers of 13 of them, or about 20.3%, opted for GMW financing.
If you want to be 95% confident in your report to Mr. Lopez, you should tell him that the percentage of all Mexican customers opting for GMW financing falls in the range:
a. from 12.0% to 28.6%
This is not the correct answer. The appropriate z-value for a 95% confidence interval is 1.96.
b. from 10.4% to 30.2%
This is the correct answer.
c. from 15.1% to 25.5%
This is not the correct answer. The appropriate standard deviation for the sample is the square root of [p*(1 - p)] = square root of [0.203*(1 - 0.203)] = 0.40.
d. You do not have sufficient information to solve this problem
This is not the correct answer. All the data needed to solve this problem are present: the sample size, the sample proportion, and the desired confidence level. Random sampling and the leeway granted by the confidence interval allow us to infer facts about an entire population from a small sample.