Stat 20: Intro to Probability and Statistics

(1)

Lecture 18: Simple Random Sampling

Tessa L. Childers-Day UC Berkeley

24 July 2014

(2)

By the end of this lecture...

You will be able to:

Draw box models for real-world scenarios

Find EVs and SEs for percentages (as opposed to sums)

Explain the difference between drawing with and without replacement (and how that affects EVs, SEs, and use of the normal curve)

(3)

Recap: Box Models

Box models are useful in analyzing games of chance

Draw a box model to describe a process (sums or classify/count)

Draw randomly, with replacement, from the box

Calculate EV for sum and SE for sum

Use normal curve to calculate probabilities of certain outcomes. Why?

(4)

Example: Roulette

You are playing roulette, a game in which a ball is equally likely to land on any one of 38 spaces: 18 red, 18 black, and 2 green. What does the box look like?

Now, I want to know what the chance is of having the ball land on a red space 20 or more times in 35 plays.

Draw a box model to describe the game (sums or classify/count)

Draw randomly, with replacement, from the box

Calculate EV for sum and SE for sum

Use normal curve to calculate probability

(5)

Other Box Models

Box models can also be useful in analyzing the ways that other chance processes work

Surveys, studies, experiments, etc.

What is our “box”?

How do we draw from it?

How is this different from the other box models we’ve seen?

(6)

Example: Loan Debt

You are interested in estimating the percentage of students at Cal who have taken out student loans. Here, and in general:

Draw a box model to describe the population

Draw randomly, without replacement, from the box

Calculate EV for percent and SE for percent

Use normal curve to calculate probability

(7)

Chance Variability

Say that the population is divided evenly among class rank.

Will my sample reflect this?

Why or why not?

If I take one sample, put every ticket back, and draw another sample, will they match?

Is this a problem?
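A small simulation can make these questions concrete. The sketch below assumes a hypothetical population of 4,000 students split evenly across four class ranks; the population size, rank names, sample size of 100, and seed are all illustrative choices, not values from the lecture. It draws two simple random samples, returning every ticket to the box between them.

```python
import random
from collections import Counter

# Hypothetical population: 4,000 students split evenly across four class ranks.
RANKS = ["freshman", "sophomore", "junior", "senior"]
population = [rank for rank in RANKS for _ in range(1000)]


def sample_percentages(rng, sample_size):
    """Draw a simple random sample (without replacement) and return the
    percentage of the sample that falls in each class rank."""
    sample = rng.sample(population, sample_size)
    counts = Counter(sample)
    return {rank: 100 * counts[rank] / sample_size for rank in RANKS}


rng = random.Random(20)  # fixed seed so the run is repeatable
first = sample_percentages(rng, 100)
second = sample_percentages(rng, 100)  # every ticket is back in the box

for rank in RANKS:
    print(f"{rank:10s} first: {first[rank]:5.1f}%   second: {second[rank]:5.1f}%")
```

Neither sample should be expected to show exactly 25% in every rank, and the two samples will generally differ from each other. That is chance variability, not a flaw in the sampling method.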

(8)

The Expected Value

Recall that when drawing with replacement,

EV for sum = # of draws × average of box.

But our box is made of only “0”s and “1”s, so

EV for # of “1”s = number of draws × proportion of “1”s.

(9)

The Expected Value (cont.)

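A percentage is a count divided by a total, times 100:

\[
\text{percentage} = \frac{\#\text{ of something}}{\text{total }\#} \times 100
\]

so

\[
\begin{aligned}
\text{EV for percent of “1”s}
  &= \frac{\text{EV for }\#\text{ of “1”s}}{\#\text{ of draws}} \times 100 \\
  &= \frac{\#\text{ of draws} \times \text{proportion of “1”s}}{\#\text{ of draws}} \times 100 \\
  &= \text{proportion of “1”s} \times 100 \\
  &= \text{percentage of “1”s}
\end{aligned}
\]

This is when we draw with replacement.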

(10)

The Expected Value (cont.)

When drawing without replacement

EV for percent of “1”s = percentage of “1”s.

Intuitively, it makes sense that we expect to see a representative number of “1”s, since this is a good sampling method.
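The claim that the EV for the percent of “1”s equals the box percentage, with or without replacement, can be checked exactly on a small box. The sketch below assumes a hypothetical box of 20 tickets with 13 “1”s (65%) and 5 draws; these numbers are illustrative only. It uses the standard binomial and hypergeometric chance formulas, which are not part of the slides, purely as a check.

```python
from fractions import Fraction
from math import comb

# Hypothetical 0-1 box: 20 tickets, 13 of them marked "1" (65%), and 5 draws.
N_TICKETS, N_ONES, N_DRAWS = 20, 13, 5
p = Fraction(N_ONES, N_TICKETS)

# With replacement: chances for the number of "1"s come from the binomial formula.
ev_count_with = sum(
    k * comb(N_DRAWS, k) * p**k * (1 - p) ** (N_DRAWS - k)
    for k in range(N_DRAWS + 1)
)

# Without replacement: chances come from the hypergeometric formula.
ev_count_without = sum(
    k * Fraction(
        comb(N_ONES, k) * comb(N_TICKETS - N_ONES, N_DRAWS - k),
        comb(N_TICKETS, N_DRAWS),
    )
    for k in range(N_DRAWS + 1)
)

# Convert the expected counts to expected percentages.
ev_pct_with = ev_count_with / N_DRAWS * 100
ev_pct_without = ev_count_without / N_DRAWS * 100
print(ev_pct_with, ev_pct_without)  # prints: 65 65
```

Exact fractions are used instead of floating point so that the two results match the box percentage exactly rather than approximately.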

(11)

The Standard Error

Recall that when drawing with replacement,

\[
\text{SE for sum} = \sqrt{\#\text{ of draws}} \times \text{SD of box}.
\]

But our box is made of only “0”s and “1”s, so

\[
\text{SE for }\#\text{ of “1”s} = \sqrt{\#\text{ of draws}} \times (1 - 0)\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}.
\]
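The shortcut for the SD of a 0-1 box can be checked against a direct SD calculation. The sketch below uses the roulette box from earlier in the lecture (18 red tickets marked “1” and 20 non-red tickets marked “0”) with 35 draws; the variable names are illustrative.

```python
import math
import statistics

# Roulette box for counting reds: 18 tickets marked 1, 20 tickets marked 0.
box = [1] * 18 + [0] * 20

p_ones = sum(box) / len(box)
p_zeros = 1 - p_ones

# Shortcut: (big number - small number) * sqrt(proportion of 1s * proportion of 0s)
sd_shortcut = (1 - 0) * math.sqrt(p_ones * p_zeros)

# Direct calculation: population SD of the tickets in the box.
sd_direct = statistics.pstdev(box)

n_draws = 35
se_count = math.sqrt(n_draws) * sd_shortcut

print(f"SD by shortcut: {sd_shortcut:.4f}, SD directly: {sd_direct:.4f}")
print(f"SE for # of 1s in {n_draws} draws: {se_count:.3f}")
```

The two SD values agree up to floating-point rounding, so the shortcut saves work without changing the answer.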

(12)

The Standard Error (cont.)

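A percentage is a count divided by a total, times 100:

\[
\text{percentage} = \frac{\#\text{ of something}}{\text{total }\#} \times 100
\]

so

\[
\begin{aligned}
\text{SE for percent of “1”s}
  &= \frac{\text{SE for }\#\text{ of “1”s}}{\#\text{ of draws}} \times 100 \\
  &= \frac{\sqrt{\#\text{ of draws}}}{\#\text{ of draws}} \times \sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}} \times 100 \\
  &= \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\#\text{ of draws}}} \times 100
\end{aligned}
\]

This is when we draw with replacement.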

(13)

The Standard Error (cont.)

\[
\text{SE for percent of “1”s} = \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\#\text{ of draws}}} \times 100
\]

We can see that multiplying the number of draws by some factor:

multiplies the SE for the sum by the square root of that factor

divides the SE for the percent by the square root of that factor

As with our previous SEs, this tells us about how far off the sample percentage is likely to be from its EV
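The opposite effects of the number of draws on the two SEs can be tabulated. The sketch below reuses the roulette box for counting reds (18 tickets marked “1”, 20 marked “0”); the draw counts 100, 400, and 1,600 are illustrative choices, picked so that each step multiplies the number of draws by 4.

```python
import math

# SD of the roulette 0-1 box: sqrt(proportion of 1s * proportion of 0s).
sd_box = math.sqrt((18 / 38) * (20 / 38))

rows = []
for n_draws in (100, 400, 1600):
    se_sum = math.sqrt(n_draws) * sd_box        # grows like sqrt(# of draws)
    se_pct = sd_box / math.sqrt(n_draws) * 100  # shrinks like 1 / sqrt(# of draws)
    rows.append((n_draws, se_sum, se_pct))
    print(f"draws = {n_draws:5d}   SE for sum = {se_sum:6.2f}   SE for % = {se_pct:5.2f}%")
```

Each fourfold increase in the number of draws doubles the SE for the sum and halves the SE for the percent.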

(14)

The Standard Error (cont.)

When drawing without replacement,

\[
\text{SE for percent of “1”s without replacement} = \text{correction factor} \times \text{SE for percent of “1”s with replacement},
\]

where

\[
\text{correction factor} = \sqrt{\frac{\text{population size} - \text{sample size}}{\text{population size} - 1}}.
\]

Intuitively, it makes sense that we must relate the sample size and the population size.

(15)

The Correction Factor

\[
\text{correction factor} = \sqrt{\frac{\text{population size} - \text{sample size}}{\text{population size} - 1}}
\]

What happens if our sample is small compared to the population?

What if it is large?

(16)

The Correction Factor (cont.)

[Figure: correction factor (vertical axis, 0.0 to 1.0) plotted against sample size (horizontal axis, 0 to 4,000), with one curve for each population size: 100,000; 10,000; 4,000; 3,000; 2,000; 1,000.]
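The correction factor can be tabulated directly from its formula. The sketch below uses the six population sizes listed in the figure legend; the function name `correction_factor` and the three sample sizes are illustrative choices, not taken from the slides.

```python
import math


def correction_factor(population_size, sample_size):
    """sqrt((population size - sample size) / (population size - 1))."""
    if population_size < 2 or not 1 <= sample_size <= population_size:
        raise ValueError("need population_size >= 2 and 1 <= sample_size <= population_size")
    return math.sqrt((population_size - sample_size) / (population_size - 1))


population_sizes = (1_000, 2_000, 3_000, 4_000, 10_000, 100_000)
sample_sizes = (100, 500, 1_000)  # illustrative sample sizes

print("pop size   " + "  ".join(f"n={n:<5d}" for n in sample_sizes))
for population_size in population_sizes:
    factors = [correction_factor(population_size, n) for n in sample_sizes]
    print(f"{population_size:>8,}   " + "  ".join(f"{f:7.3f}" for f in factors))
```

When the sample is a small part of the population, the factor is close to 1 and drawing without replacement behaves almost like drawing with replacement. When the sample is the whole population, the factor is 0, because there is no chance error left.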

(17)

Comparison

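Sum of draws from a box, with replacement

\[
\text{EV for sum} = \#\text{ of draws} \times \text{avg. of box}
\]

\[
\text{SE for sum} = \sqrt{\#\text{ of draws}} \times \text{SD of box}
\]

Percent of “1”s from a 0-1 box, with replacement

\[
\text{EV for \%} = \text{percent of “1”s in the box}
\]

\[
\text{SE for \%} = \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\#\text{ of draws}}} \times 100
\]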

(18)

Comparison (cont.)

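Percent of “1”s from a 0-1 box, with replacement

\[
\text{EV for \%} = \text{percent of “1”s in the box}
\]

\[
\text{SE for \%} = \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\#\text{ of draws}}} \times 100
\]

Percent of “1”s from a 0-1 box, without replacement

\[
\text{EV for \%} = \text{percent of “1”s in the box}
\]

\[
\text{SE for \%} = \text{correction factor} \times \frac{\sqrt{\text{proportion of “1”s} \times \text{proportion of “0”s}}}{\sqrt{\#\text{ of draws}}} \times 100
\]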

(19)

Examples

Say we want to find the chance of having a roulette ball land on a red space 20 or more times in 35 plays.

Draw a box, indicate number and kind of tickets, number of draws, kind of draws

Calculate EV for sum

Calculate SE for sum

Use normal curve
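The steps for this example can be sketched numerically as follows. The slides do not say whether to use a continuity correction, so starting the normal-curve area at 19.5 is an assumption made here; the exact binomial sum at the end is likewise an added check, not part of the lecture's method.

```python
import math
from statistics import NormalDist

N_DRAWS = 35
P_RED = 18 / 38  # box: 18 tickets marked 1 (red), 20 tickets marked 0

ev_sum = N_DRAWS * P_RED                 # EV for sum = # of draws * avg of box
sd_box = math.sqrt(P_RED * (1 - P_RED))  # SD of a 0-1 box
se_sum = math.sqrt(N_DRAWS) * sd_box     # SE for sum = sqrt(# of draws) * SD of box

# Assumption: continuity correction, so "20 or more" starts at 19.5.
z = (19.5 - ev_sum) / se_sum
normal_chance = 1 - NormalDist().cdf(z)

# Added check: exact chance of 20 or more reds from the binomial formula.
exact_chance = sum(
    math.comb(N_DRAWS, k) * P_RED**k * (1 - P_RED) ** (N_DRAWS - k)
    for k in range(20, N_DRAWS + 1)
)

print(f"EV for sum = {ev_sum:.2f}, SE for sum = {se_sum:.2f}, z = {z:.2f}")
print(f"normal approximation: {normal_chance:.3f}   exact: {exact_chance:.3f}")
```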

(20)

Examples (cont.)

Say we want to find the chance of having a roulette ball land on a red space 57% or more of the time, in 35 plays.

Draw a box, indicate number and kind of tickets, number of draws, kind of draws

Calculate EV for percent

Calculate SE for percent

Use normal curve
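The same box gives the answer in percentage terms. The sketch below converts 57% directly to standard units without a continuity correction, which is an assumption; note that 57% of 35 plays is 19.95 plays, so the event is in effect "20 or more reds" stated as a percent.

```python
import math
from statistics import NormalDist

n_draws = 35
p_red = 18 / 38  # proportion of tickets marked 1 in the box

ev_pct = p_red * 100  # EV for percent = percent of 1s in the box
se_pct = math.sqrt(p_red * (1 - p_red)) / math.sqrt(n_draws) * 100

# Convert 57% to standard units, then take the area to the right.
z = (57 - ev_pct) / se_pct
chance = 1 - NormalDist().cdf(z)

print(f"EV for % = {ev_pct:.2f}%, SE for % = {se_pct:.2f}%, z = {z:.2f}")
print(f"chance of 57% or more reds: about {chance:.3f}")
```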

(21)

Examples (cont.)

Assume that there are 30,000 students at UC Berkeley, and that 65% of them have some student loan debt. We want to find the chance of having 67% or more of those sampled have student loan debt, in a sample of size 1,000.

Draw a box, indicate number and kind of tickets, number of draws, kind of draws

Calculate EV for percent

Calculate SE for percent

Use normal curve
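Because this sample is drawn without replacement, the SE needs the correction factor. A minimal sketch of the calculation, using the numbers stated in the example (variable names are illustrative, and no continuity correction is applied):

```python
import math
from statistics import NormalDist

population_size = 30_000
sample_size = 1_000
p_debt = 0.65  # proportion of tickets marked 1 (has student loan debt)

ev_pct = p_debt * 100

# SE as if drawing with replacement, then shrink it by the correction factor.
se_pct_with = math.sqrt(p_debt * (1 - p_debt)) / math.sqrt(sample_size) * 100
correction = math.sqrt((population_size - sample_size) / (population_size - 1))
se_pct_without = correction * se_pct_with

z = (67 - ev_pct) / se_pct_without
chance = 1 - NormalDist().cdf(z)

print(f"correction factor = {correction:.4f}")
print(f"SE for % = {se_pct_without:.2f}% (would be {se_pct_with:.2f}% with replacement)")
print(f"chance of 67% or more: about {chance:.3f}")
```

The sample is only about 3% of the population, so the correction factor is close to 1 and changes the SE only slightly.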

(22)

Examples (cont.)

I’m interested in looking at student loan debt at other colleges, besides UC Berkeley. So, I expand my survey to include UCLA and Pomona College (a small, liberal arts university). In both places, I will take a sample of 2% of the students, in order to estimate the percentage of students with loan debt. Other things being equal:

1. The accuracy to be expected at UCLA is about the same as the accuracy to be expected at Pomona.

2. The accuracy to be expected at UCLA is quite a bit higher than at Pomona.

3. The accuracy to be expected at UCLA is quite a bit lower than at Pomona.
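One way to reason about this question is to plug numbers into the SE formula. The sketch below uses placeholder enrollments (40,000 for the large school, 1,600 for the small one) and a placeholder loan proportion of 0.5 at both schools; none of these figures come from the lecture or from real enrollment data, and the helper `se_percent_srs` is a hypothetical name.

```python
import math


def se_percent_srs(population_size, sample_size, proportion_ones):
    """SE for the sample percent when drawing without replacement."""
    se_with = (
        math.sqrt(proportion_ones * (1 - proportion_ones))
        / math.sqrt(sample_size)
        * 100
    )
    correction = math.sqrt((population_size - sample_size) / (population_size - 1))
    return correction * se_with


# Placeholder enrollments and loan proportion: assumptions, not real figures.
schools = {"UCLA": 40_000, "Pomona": 1_600}
P_DEBT = 0.5

results = {}
for school, enrollment in schools.items():
    sample_size = enrollment // 50  # 2% of the students
    se_pct = se_percent_srs(enrollment, sample_size, P_DEBT)
    results[school] = (sample_size, se_pct)
    print(f"{school:7s} sample size = {sample_size:4d}   SE for % = {se_pct:.2f}%")
```

Sampling the same 2% at both schools makes the correction factor nearly identical, so the difference in SE comes almost entirely from the absolute sample size. Under these assumptions the larger school's estimate has the smaller SE.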

(23)

Important Takeaways

Box models can be used for real world problems, not just gambling

Box = population, Draws from box without replacement = sample (SRS)

EV and SE for percent of “1”s drawn from a 0-1 box

Correction factor for drawing without replacement

Probability histogram still approximately follows the normal curve

We can combine the correction factor with the normal curve to approximate probabilities from the probability histogram
