Lecture 18: Simple Random Sampling
Tessa L. Childers-Day UC Berkeley
24 July 2014
By the end of this lecture...
You will be able to:
Draw box models for real-world scenarios
Find EVs and SEs for percentages (as opposed to sums)
Explain the difference between drawing with and without
replacement (and how that affects EVs, SEs, and use of the
normal)
Recap: Box Models
Box models are useful in analyzing games of chance
Draw a box model to describe a process (sums or classify/count)
Draw randomly, with replacement, from the box
Calculate EV for sum and SE for sum
Use normal curve to calculate probabilities of certain
outcomes. Why?
Example: Roulette
You are playing roulette, a game in which a ball has equal chances of landing on one of 18 red spaces, 18 black spaces, or 2 green spaces. What does the box look like?
Now, I want to know what the chance is of having the ball land on a red space 20 or more times in 35 plays.
Draw a box model to describe the game (sums or classify/count)
Draw randomly, with replacement, from the box
Calculate EV for sum and SE for sum
Use normal curve to calculate probability
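The first two steps above (build the box, then draw with replacement) can also be checked by brute force. The sketch below is only an illustration, not part of the lecture's method: it classifies each roulette space as 1 (red) or 0 (not red), replays 35 spins many times, and reports how often 20 or more reds occur. The seed and the number of repetitions are arbitrary assumptions.

```python
import random

# The roulette box: 18 red, 18 black, 2 green tickets.
# To count reds, classify each ticket as 1 (red) or 0 (not red).
box = [1] * 18 + [0] * 20

def count_reds(num_plays, rng):
    """Sum of num_plays draws made at random, with replacement, from the box."""
    return sum(rng.choices(box, k=num_plays))

rng = random.Random(18)   # arbitrary fixed seed so the run is repeatable
repetitions = 20_000      # arbitrary; more repetitions give a steadier estimate

hits = sum(count_reds(35, rng) >= 20 for _ in range(repetitions))
estimated_chance = hits / repetitions
print(f"Estimated chance of 20 or more reds in 35 plays: {estimated_chance:.3f}")
```

The estimate varies a little from seed to seed; the normal curve gives a quick approximation to the same chance without any simulation.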
Other Box Models
Box models can also be useful in analyzing the ways that other chance processes work
Surveys, studies, experiments, etc.
What is our “box”?
How do we draw from it?
How is this different from the other box models we’ve seen?
Example: Loan Debt
You are interested in estimating the percentage of students at Cal who have taken out student loans. Here, and in general:
Draw a box model to describe the population
Draw randomly, without replacement, from the box
Calculate EV for percent and SE for percent
Use normal curve to calculate probability
Chance Variability
Say that the population is divided evenly among class rank.
Will my sample reflect this?
Why or why not?
If I take one sample, put every ticket back, and draw another sample, will they match?
Is this a problem?
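One way to explore these questions is to draw two samples from the same box and compare them. The sketch below assumes a hypothetical population of 30,000 students split evenly among four class ranks; the rank labels, the sample size of 100, and the seed are illustrative assumptions only.

```python
import random
from collections import Counter

# Hypothetical population: 30,000 students, evenly divided among four ranks.
ranks = ["freshman", "sophomore", "junior", "senior"]
population = [rank for rank in ranks for _ in range(7_500)]

rng = random.Random(2014)  # arbitrary fixed seed so the run is repeatable

# Each call draws 100 tickets without replacement from the full box,
# which mimics putting every ticket back before taking the second sample.
first_sample = Counter(rng.sample(population, 100))
second_sample = Counter(rng.sample(population, 100))

print("rank       first  second")
for rank in ranks:
    print(f"{rank:<10} {first_sample[rank]:>5} {second_sample[rank]:>7}")
```

Neither sample is guaranteed to hold exactly 25 students of each rank, and the two samples need not agree with each other. That is chance variability.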
The Expected Value
Recall that when drawing with replacement,
EV for sum = # of draws × average of box.
But our box is made of only “0”s and “1”s, so
EV for # of “1”s = number of draws × proportion of “1”s.
The Expected Value (cont.)
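A percentage is
\[ \text{percentage} = \frac{\text{\# of something}}{\text{total \#}} \times 100, \]
so
\[
\begin{aligned}
\text{EV for percent of “1”s} &= \frac{\text{EV for \# of “1”s}}{\text{\# of draws}} \times 100 \\
&= \frac{\text{\# of draws} \times \text{proportion of “1”s}}{\text{\# of draws}} \times 100 \\
&= \text{proportion of “1”s} \times 100 \\
&= \text{percentage of “1”s}
\end{aligned}
\]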
This is when we draw with replacement.
The Expected Value (cont.)
When drawing without replacement
EV for percent of “1”s = percentage of “1”s.
Intuitively, this makes sense: simple random sampling gives every ticket the same chance of being drawn, so we expect the sample to contain a representative share of “1”s
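The claim that the EV for the percent equals the percentage of “1”s in the box, with or without replacement, can be checked by simulation. The sketch below assumes a hypothetical box of 1,000 tickets with 65% “1”s, samples of 100 draws, and an arbitrary seed; none of these numbers come from the lecture.

```python
import random

# Hypothetical 0-1 box: 1,000 tickets, 65% of them "1"s.
box = [1] * 650 + [0] * 350
num_draws = 100
rng = random.Random(7)  # arbitrary fixed seed so the run is repeatable

def percent_with_replacement():
    """Percent of "1"s among num_draws draws made with replacement."""
    return 100 * sum(rng.choices(box, k=num_draws)) / num_draws

def percent_without_replacement():
    """Percent of "1"s among num_draws draws made without replacement."""
    return 100 * sum(rng.sample(box, num_draws)) / num_draws

repetitions = 5_000
avg_with = sum(percent_with_replacement() for _ in range(repetitions)) / repetitions
avg_without = sum(percent_without_replacement() for _ in range(repetitions)) / repetitions

print(f"Box percentage of 1s:         65.00")
print(f"Average, with replacement:    {avg_with:.2f}")
print(f"Average, without replacement: {avg_without:.2f}")
```

Both long-run averages should land close to 65, the percentage of “1”s in the box.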
The Standard Error
Recall that when drawing with replacement,
\[ \text{SE for sum} = \sqrt{\text{\# of draws}} \times \text{SD of box}. \]
But our box is made of only “0”s and “1”s, so
\[ \text{SE for \# of “1”s} = \sqrt{\text{\# of draws}} \times (1 - 0) \times \sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}. \]
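As a quick check on the 0-1 shortcut for the SD of the box, the sketch below compares it with the SD computed directly from the tickets, using the roulette box (18 red tickets marked 1, 20 others marked 0) and the 35 plays from the earlier example. The variable names are illustrative.

```python
import math
import statistics

# Roulette box for counting reds: 18 tickets marked 1, 20 marked 0.
box = [1] * 18 + [0] * 20

prop_ones = box.count(1) / len(box)
prop_zeros = 1 - prop_ones

# Shortcut for a 0-1 box: (1 - 0) * sqrt(proportion of 1s * proportion of 0s).
shortcut_sd = (1 - 0) * math.sqrt(prop_ones * prop_zeros)

# Direct calculation: the SD of the list of tickets (population SD).
direct_sd = statistics.pstdev(box)

num_draws = 35
se_for_count = math.sqrt(num_draws) * shortcut_sd

print(f"SD of box (shortcut): {shortcut_sd:.4f}")
print(f"SD of box (direct):   {direct_sd:.4f}")
print(f"SE for # of 1s in {num_draws} draws: {se_for_count:.3f}")
```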
The Standard Error (cont.)
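A percentage is
\[ \text{percentage} = \frac{\text{\# of something}}{\text{total \#}} \times 100, \]
so
\[
\begin{aligned}
\text{SE for percent of “1”s} &= \frac{\text{SE for \# of “1”s}}{\text{\# of draws}} \times 100 \\
&= \frac{\sqrt{\text{\# of draws}}}{\text{\# of draws}} \times \sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}} \times 100 \\
&= \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\text{\# of draws}}} \times 100
\end{aligned}
\]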
This is when we draw with replacement.
The Standard Error (cont.)
\[ \text{SE for percent of “1”s} = \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\text{\# of draws}}} \times 100 \]
We can see that:
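Multiplying the number of draws by some factor:
multiplies the SE for the sum by the square root of that factor
divides the SE for the percent by the square root of that factor
For example, quadrupling the number of draws doubles the SE for the sum and halves the SE for the percent
As with our previous SEs, this tells us about how far off the observed percent is likely to be from the EV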
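To see the square-root effect numerically, the sketch below tabulates both SEs for a hypothetical 0-1 box that is half “1”s, quadrupling the number of draws at each step. The function names and the 50% box are assumptions made for illustration.

```python
import math

def se_for_sum(num_draws, sd_of_box):
    """SE for the sum of draws made with replacement."""
    return math.sqrt(num_draws) * sd_of_box

def se_for_percent(num_draws, prop_ones):
    """SE for the percent of 1s in draws made with replacement from a 0-1 box."""
    sd_of_box = math.sqrt(prop_ones * (1 - prop_ones))
    return sd_of_box / math.sqrt(num_draws) * 100

prop_ones = 0.5                                    # hypothetical box: half 1s
sd_of_box = math.sqrt(prop_ones * (1 - prop_ones))

print("draws  SE for sum  SE for percent")
for num_draws in (100, 400, 1600):                 # each step quadruples the draws
    print(f"{num_draws:>5}  {se_for_sum(num_draws, sd_of_box):>10.2f}"
          f"  {se_for_percent(num_draws, prop_ones):>14.2f}")
```

Each quadrupling of the draws doubles the SE for the sum and halves the SE for the percent.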
The Standard Error (cont.)
When drawing without replacement,
\[ \text{SE for percent of “1”s without replacement} = \text{correction factor} \times \text{SE for percent of “1”s with replacement}, \]
where
\[ \text{correction factor} = \sqrt{\frac{\text{population size} - \text{sample size}}{\text{population size} - 1}} \]
Intuitively, it makes sense that we must relate the sample size and
the population size
The Correction Factor
\[ \text{correction factor} = \sqrt{\frac{\text{population size} - \text{sample size}}{\text{population size} - 1}} \]
What happens if our sample is small compared to the population?
What if it is large?
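A small calculation can help answer both questions. The sketch below evaluates the correction factor for a sample that is small relative to the population (1,000 out of 30,000, the sizes used in the loan debt example later in the lecture) and for one that is large (900 out of 1,000, a hypothetical case). The function name is illustrative.

```python
import math

def correction_factor(population_size, sample_size):
    """sqrt((population size - sample size) / (population size - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

small_sample = correction_factor(30_000, 1_000)   # sample is about 3% of the population
large_sample = correction_factor(1_000, 900)      # hypothetical: sample is 90% of the population

print(f"1,000 drawn from 30,000: {small_sample:.4f}")
print(f"900 drawn from 1,000:    {large_sample:.4f}")
```

When the sample is a small fraction of the population, the factor is close to 1 and drawing without replacement is nearly the same as drawing with replacement. When the sample is most of the population, the factor shrinks toward 0, and so does the SE.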
The Correction Factor (cont.)
[Figure: correction factor (vertical axis, 0.0 to 1.0) plotted against sample size (horizontal axis, 0 to 4,000), with one curve for each population size: 100,000; 10,000; 4,000; 3,000; 2,000; 1,000]
Comparison
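Sum of draws from a box, with replacement
\[ \text{EV for sum} = \text{\# of draws} \times \text{avg. of box} \]
\[ \text{SE for sum} = \sqrt{\text{\# of draws}} \times \text{SD of box} \]
Percent of “1”s from a 0-1 box, with replacement
\[ \text{EV for \%} = \text{percent of “1”s in the box} \]
\[ \text{SE for \%} = \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\text{\# of draws}}} \times 100 \]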
Comparison (cont.)
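Percent of “1”s from a 0-1 box, with replacement
\[ \text{EV for \%} = \text{percent of “1”s in the box} \]
\[ \text{SE for \%} = \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\text{\# of draws}}} \times 100 \]
Percent of “1”s from a 0-1 box, without replacement
\[ \text{EV for \%} = \text{percent of “1”s in the box} \]
\[ \text{SE for \%} = \text{correction factor} \times \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\text{\# of draws}}} \times 100 \]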
Examples
Say we want to find the chance of having a roulette ball land on a red space 20 or more times in 35 plays.
Draw a box, indicate number and kind of tickets, number of draws, kind of draws
Calculate EV for sum
Calculate SE for sum
Use normal curve
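The four steps above can be sketched in code as follows. This is one possible working, not necessarily the one intended in lecture: it reports the normal-curve area both with the cutoff at exactly 20 and with a continuity correction (cutoff at 19.5), since the slides do not say which is expected. The helper `normal_upper_tail` is an illustrative name.

```python
import math

def normal_upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Box: 18 tickets marked 1 (red), 20 marked 0; 35 draws with replacement.
num_draws = 35
prop_red = 18 / 38

ev_sum = num_draws * prop_red
sd_of_box = (1 - 0) * math.sqrt(prop_red * (1 - prop_red))
se_sum = math.sqrt(num_draws) * sd_of_box

# Convert "20 or more" to standard units, with and without a continuity correction.
z_plain = (20 - ev_sum) / se_sum
z_corrected = (19.5 - ev_sum) / se_sum
chance_plain = normal_upper_tail(z_plain)
chance_corrected = normal_upper_tail(z_corrected)

print(f"EV for sum = {ev_sum:.2f}, SE for sum = {se_sum:.2f}")
print(f"Chance (cutoff 20):   {chance_plain:.3f}")
print(f"Chance (cutoff 19.5): {chance_corrected:.3f}")
```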
Examples (cont.)
Say we want to find the chance of having a roulette ball land on a red space 57% or more of the time, in 35 plays.
Draw a box, indicate number and kind of tickets, number of draws, kind of draws
Calculate EV for percent
Calculate SE for percent
Use normal curve
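The same steps for a percent, rather than a sum, might look like the sketch below. It uses the with-replacement formulas, since roulette plays are draws with replacement, and applies no continuity correction (an assumption, as the slides do not specify). The helper `normal_upper_tail` is an illustrative name.

```python
import math

def normal_upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Box: 18 tickets marked 1 (red), 20 marked 0; 35 draws with replacement.
num_draws = 35
prop_red = 18 / 38

ev_percent = prop_red * 100
se_percent = math.sqrt(prop_red * (1 - prop_red)) / math.sqrt(num_draws) * 100

z = (57 - ev_percent) / se_percent
chance = normal_upper_tail(z)

print(f"EV for percent = {ev_percent:.2f}%, SE for percent = {se_percent:.2f}%")
print(f"Chance of 57% or more reds: {chance:.3f}")
```

Since 20 out of 35 is about 57%, this is essentially the previous question restated in percent, and the answer is close to the uncorrected answer for the sum.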
Examples (cont.)
Assume that there are 30,000 students at UC Berkeley, and that 65% of them have some student loan debt. We want to find the chance that 67% or more of the students in a sample of size 1,000 have student loan debt.
Draw a box, indicate number and kind of tickets, number of draws, kind of draws
Calculate EV for percent
Calculate SE for percent
Use normal curve
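Because a survey draws without replacement, this example needs the correction factor. The sketch below works through the steps with the numbers given above (30,000 students, 65% with debt, a sample of 1,000, a cutoff of 67%); the variable and helper names are illustrative, and no continuity correction is applied.

```python
import math

def normal_upper_tail(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Box: 30,000 tickets, 65% marked 1 (has loan debt); 1,000 draws without replacement.
population_size = 30_000
sample_size = 1_000
prop_debt = 0.65

ev_percent = prop_debt * 100

# SE as if drawing with replacement, then shrink it by the correction factor.
se_with = math.sqrt(prop_debt * (1 - prop_debt)) / math.sqrt(sample_size) * 100
factor = math.sqrt((population_size - sample_size) / (population_size - 1))
se_without = factor * se_with

z = (67 - ev_percent) / se_without
chance = normal_upper_tail(z)

print(f"EV for percent = {ev_percent:.1f}%")
print(f"SE with replacement = {se_with:.3f}%, correction factor = {factor:.4f}")
print(f"SE without replacement = {se_without:.3f}%")
print(f"Chance of 67% or more: {chance:.3f}")
```

The sample is only about 3% of the population, so the correction factor is close to 1 and changes the SE very little.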
Examples (cont.)
I’m interested in looking at student loan debt at other colleges, besides UC Berkeley. So, I expand my survey to include UCLA and Pomona College (a small, liberal arts university). In both places, I will take a sample of 2% of the students, in order to estimate the percentage of students with loan debt. Other things being equal:
1. The accuracy to be expected at UCLA is about the same as the accuracy to be expected at Pomona.
2. The accuracy to be expected at UCLA is quite a bit higher than at Pomona.
3