Chapter 18
Experiment – Holiday Nerds
Each member of the class will receive a 1
teaspoon sample of holiday Nerd candies.
Find the proportion (percentage) of green Nerds
in you sample.
Mark your proportion on the dot plot on the board.
Questions…
Modeling the Distribution of
Sample Proportions (cont.)
It turns out that the histogram is unimodal,
symmetric, and centered at p.
More specifically, it’s an amazing and fortunate
fact that a Normal model is just the right one for the histogram of sample proportions.
To use a Normal model, we need to specify its
Modeling the Distribution of
Sample Proportions (cont.)
When working with proportions, knowing the
mean automatically gives us the standard
deviation as well—the standard deviation we will use is
So, the distribution of the sample proportions is
modeled with a probability model that is
, pq
N p
Modeling the Distribution of
Sample Proportions (cont.)
How Good Is the Normal Model?
The Normal model gets better as a good model
for the distribution of sample proportions as the sample size gets bigger.
Just how big of a sample do we need? This will
Assumptions and Conditions
Most models are useful only when specific
assumptions are true.
There are two assumptions in the case of the
model for the distribution of sample proportions:
1. The sampled values must be independent of
each other.
Assumptions and Conditions (cont.)
Assumptions are hard—often impossible—to check.
That’s why we assume them.
Still, we need to check whether the assumptions are
reasonable by checking conditions that provide information about the assumptions.
The corresponding conditions to check before using the
Normal to model the distribution of sample proportions are
Assumptions and Conditions (cont.)
1. 10% condition: If sampling has not been made
with replacement, then the sample size, n, must be no larger than 10% of the population.
2. Success/failure condition: The sample size has
to be big enough so that both and are greater than 10.
So, we need a large enough sample that is not too large.
ˆ
The Sampling Distribution Model
for a Proportion (cont.)
Provided that the sampled values are
independent and the sample size is large
enough, the sampling distribution of is modeled by a Normal model with
Mean:
Standard deviation:
ˆ
p
( )
p
ˆ
p
Example 1: Cars on the Interstate
Of all cars on the interstate, 80% exceed the speed limit. What proportion of speeders might we see among the next 50 cars?
Check 10% condition:
Example continued…
1. 50 is less than 10% of all cars on the highway. 2. np = 50(.8) = 40 and nq = 50(.2) = 10
(This model is approximately Normal) 3.
Example 2: Harley-Davidson Motorcycles
Example 3: Lefties
Suppose that about 13% of the population is left-handed. A 200 seat school auditorium has been built with 15 “lefty seats”, seats that have the built-in desk on the left rather than the right arm of the chair. In a class of 90 students, what’s the
Example 3 continued….
You are trying to find the probability that more than 15 out of 90 students will be lefties. (This is 16.7%)
1. 90 is less than 10% of the student body (think BHS) 2. np = 90(.13) = 11.7 ≥ 10 nq = 90(.87) = 78.3 ≥ 10
(approximately Normal)
3. N(13,.035)
4. P(p-hat>.167) = P(z>1.06) = .1446
What About Quantitative Data?
Proportions summarize categorical variables.
The Normal sampling distribution model looks like
it will be very useful.
Can we do something similar with quantitative
data?
We can indeed. Even more remarkable, not only
Simulating the Sampling Distribution of a Mean
Like any statistic computed from a random
sample, a sample mean also has a sampling distribution.
We can use simulation to get a sense as to what
Means – The “Average” of One Die
Let’s start with a simulation of 10,000 tosses of a
Means – Averaging More Dice
Looking at the average of
two dice after a simulation of 10,000 tosses:
The average of three dice
after a simulation of
Means – Averaging Still More Dice
The average of 5 dice
after a simulation of
10,000 tosses looks like:
The average of 20 dice
after a simulation of
Means – What the Simulations Show
As the sample size (number of dice) gets larger,
each sample average is more likely to be closer to the population mean.
So, we see the shape continuing to tighten
around 3.5
And, it probably does not shock you that the
The Fundamental Theorem of Statistics
The sampling distribution of any mean becomes
Normal as the sample size grows.
All we need is for the observations to be
independent and collected with randomization.
We don’t even care about the shape of the
population distribution!
The Fundamental Theorem of Statistics is called
The Fundamental Theorem of Statistics (cont.)
The CLT is surprising and a bit weird:
Not only does the histogram of the sample
means get closer and closer to the Normal model as the sample size grows, but this is true regardless of the shape of the population distribution.
The CLT works better (and faster) the closer the
The Fundamental Theorem of Statistics (cont.)
The Central Limit Theorem (CLT)
The mean of a random sample has a sampling
distribution whose shape can be approximated by a Normal model. The larger the sample, the
But Which Normal?
The CLT says that the sampling distribution of
any mean or proportion is approximately Normal.
But which Normal model?
For proportions, the sampling distribution is
centered at the population proportion.
For means, it’s centered at the population
mean.
But Which Normal? (cont.)
The Normal model for the sampling distribution of
the mean has a standard deviation equal to
where σ is the population standard deviation.
SD y
n
But Which Normal? (cont.)
The Normal model for the sampling distribution of
the proportion has a standard deviation equal to
ˆ pqSD p
n
Assumptions and Conditions
The CLT requires remarkably few assumptions, so there
are few conditions to check:
1. Random Sampling Condition: The data values must
be sampled randomly or the concept of a sampling distribution makes no sense.
2. Independence Assumption: The sample values must
be mutually independent. (When the sample is drawn without replacement, check the 10% condition…)
3. Large Enough Sample Condition: There is no
Diminishing Returns
The standard deviation of the sampling
distribution declines only with the square root of the sample size.
While we’d always like a larger sample, the
Example 4: SAT Scores
SAT scores should have a mean of 500 and a standard deviation of 100. What about the mean of random
samples of 20 students? The college board assures us that SAT results are Normal.
Check random sampling condition. Check 10% condition.
Example continued…
1. The 20 students are selected randomly.
2. 20 students represents less than 10% of all students.
3. Assumption: students SAT scores are independent, as long as they
weren’t all from the same university.
Discuss!!!!!!!
What is the probability that your sample of students will have a mean
Example 5: Cars again
Speeds of cars on a highway have mean 52 mph and
standard deviation of 6 mph, and are likely to be skewed to the right (a few very fast drivers). Describe what we might see in random samples of 50 cars.
Check random sampling condition. Check 10% condition.
Example continued…
1. The 50 speeds were sampled randomly. 2. 50 cars is less than 10% of all cars.
3. It’s reasonable to think that the speeds of the cars are mutually
independent.
4. The central Limit Theorem applies, because n is large.
What is the probability that the mean speed of your sample of cars will
Example 6: Babies at Birth
At birth, babies average 7.8 pounds, with a standard
Example continued…
1. 34 babies were sampled randomly.
2. As long as 340 babies were born to mothers living in the vicinity of
the factory, 34 is less than 10%.
3. It’s reasonable to assume that the weights of the babies are
mutually independent.
4. The central limit theorem applies since the sample size is big.
Is 7.2 pounds unusually low? z = -1.67. This is within 2 standard
Standard Error
Both of the sampling distributions we’ve looked at
are Normal.
For proportions
For means
ˆ pqSD p
n
SD y
n
Standard Error (cont.)
When we don’t know p or σ, we’re stuck, right?
Nope. We will use sample statistics to estimate
these population parameters.
Whenever we estimate the standard deviation of
Standard Error (cont.)
For a sample proportion, the standard error is
For the sample mean, the standard error is
ˆ pqˆ ˆSE p
n
sSE y
n