Stat 20: Intro to Probability and Statistics


(1)

Lecture 18: Simple Random Sampling

Tessa L. Childers-Day UC Berkeley

24 July 2014

(2)

By the end of this lecture...

You will be able to:

Draw box models for real-world scenarios

Find EVs and SEs for percentages (as opposed to sums)

Explain the difference between drawing with and without replacement (and how that affects EVs, SEs, and use of the normal)

(3)

Recap: Box Models

Box models are useful in analyzing games of chance

Draw a box model to describe a process (sums or classify/count)

Draw randomly, with replacement, from the box

Calculate EV for sum and SE for sum

Use normal curve to calculate probabilities of certain outcomes. Why?

(4)

Example: Roulette

You are playing roulette, a game in which a ball has equal chances of landing on one of 18 red spaces, 18 black spaces, or 2 green spaces. What does the box look like?

Now, I want to know what the chance is of having the ball land on a red space 20 or more times in 35 plays.

Draw a box model to describe the game (sums or classify/count)

Draw randomly, with replacement, from the box

Calculate EV for sum and SE for sum

Use normal curve to calculate probability

(5)

Other Box Models

Box models can also be useful in analyzing the ways that other chance processes work

Surveys, studies, experiments, etc.

What is our “box”?

How do we draw from it?

How is this different from the other box models we’ve seen?

(6)

Example: Loan Debt

You are interested in estimating the percentage of students at Cal who have taken out student loans. Here, and in general:

Draw a box model to describe the population

Draw randomly, without replacement, from the box

Calculate EV for percent and SE for percent

Use normal curve to calculate probability

(7)

Chance Variability

Say that the population is divided evenly among the class ranks.

Will my sample reflect this?

Why or why not?

If I take one sample, put every ticket back, and draw another sample, will they match?

Is this a problem?
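Chance variability can be seen in a small simulation. This is only a sketch: the population size (4,000), the four class-rank labels, the sample size (100), and the seed are assumptions made for illustration, not values from the lecture.

```python
import random
from collections import Counter

# Hypothetical population: 4,000 students split evenly among four class ranks.
RANKS = ["freshman", "sophomore", "junior", "senior"]
population = [rank for rank in RANKS for _ in range(1000)]


def sample_percentages(population, sample_size, rng):
    """Draw a simple random sample (without replacement) and
    return the percentage of the sample falling in each rank."""
    sample = rng.sample(population, sample_size)
    counts = Counter(sample)
    return {rank: 100 * counts[rank] / sample_size for rank in RANKS}


rng = random.Random(20)  # fixed seed (arbitrary) so the run is repeatable
first = sample_percentages(population, 100, rng)
second = sample_percentages(population, 100, rng)

# Each rank makes up 25% of the box, but the samples will usually
# be off by a few percentage points, and off by different amounts.
print("first sample: ", first)
print("second sample:", second)
```

Each sample should land near 25% per rank without matching it exactly, and the two samples should generally not match each other.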

(8)

The Expected Value

Recall that, when drawing with replacement,

EV for sum = # of draws × average of box.

But our box is made of only “0”s and “1”s, so

EV for # of “1”s = number of draws × proportion of “1”s.

(9)

The Expected Value (cont.)

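A percentage = (# of something / total #) × 100, so

$$
\begin{aligned}
\text{EV for percent of “1”s}
&= \frac{\text{EV for \# of “1”s}}{\text{\# of draws}} \times 100 \\
&= \frac{\text{\# of draws} \times \text{proportion of “1”s}}{\text{\# of draws}} \times 100 \\
&= \text{proportion of “1”s} \times 100 \\
&= \text{percentage of “1”s}
\end{aligned}
$$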

This is when we draw with replacement.
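The cancellation of the number of draws can be checked with a few lines of arithmetic. The function names below are made up for this sketch; the roulette numbers (18 red tickets out of 38, 35 plays) come from the earlier example.

```python
def ev_count(num_draws, prop_ones):
    """EV for the number of "1"s when drawing from a 0-1 box."""
    return num_draws * prop_ones


def ev_percent(num_draws, prop_ones):
    """EV for the percent of "1"s: EV for the count, divided by the
    number of draws, times 100."""
    return ev_count(num_draws, prop_ones) / num_draws * 100


prop_red = 18 / 38  # proportion of "1"s (red) in the roulette box

# The number of draws cancels, so the EV for the percent is the same
# for 35 plays as for 3,500 plays.
print(ev_percent(35, prop_red))
print(ev_percent(3500, prop_red))
```

Both lines should print the percentage of "1"s in the box, about 47.4%.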

(10)

The Expected Value (cont.)

When drawing without replacement

EV for percent of “1”s = percentage of “1”s.

Intuitively, it makes sense that we expect to see a representative number of “1”s, since this is a good sampling method.

(11)

The Standard Error

Recall that, when drawing with replacement,

$$\text{SE for sum} = \sqrt{\text{\# of draws}} \times \text{SD of box}.$$

But our box is made of only “0”s and “1”s, so

$$\text{SE for \# of “1”s} = \sqrt{\text{\# of draws}} \times (1 - 0)\sqrt{(\text{proportion of “1”s})(\text{proportion of “0”s})}.$$
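The shortcut for the SD of a 0-1 box can be compared against the SD computed directly from the tickets. This sketch reuses the roulette box (18 "1"s, 20 "0"s) and 35 draws from the earlier example; the variable names are illustrative.

```python
import math
import statistics

# Roulette box for "red or not": 18 tickets marked 1, 20 tickets marked 0.
box = [1] * 18 + [0] * 20
prop_ones = sum(box) / len(box)
prop_zeros = 1 - prop_ones

# SD of the box computed directly (population SD of the tickets).
sd_direct = statistics.pstdev(box)

# SD from the 0-1 shortcut: (1 - 0) * sqrt(prop of 1s * prop of 0s).
sd_shortcut = (1 - 0) * math.sqrt(prop_ones * prop_zeros)

# SE for the number of "1"s in 35 draws with replacement.
se_count = math.sqrt(35) * sd_shortcut

print(sd_direct, sd_shortcut, se_count)
```

The two SDs should agree up to rounding, at a little under 0.5.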

(12)

The Standard Error (cont.)

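A percentage = (# of something / total #) × 100, so

$$
\begin{aligned}
\text{SE for percent of “1”s}
&= \frac{\text{SE for \# of “1”s}}{\text{\# of draws}} \times 100 \\
&= \frac{\sqrt{\text{\# of draws}}}{\text{\# of draws}} \times \sqrt{(\text{proportion of “1”s})(\text{proportion of “0”s})} \times 100 \\
&= \sqrt{\frac{(\text{proportion of “1”s})(\text{proportion of “0”s})}{\text{\# of draws}}} \times 100
\end{aligned}
$$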

This is when we draw with replacement.

(13)

The Standard Error (cont.)

$$\text{SE for percent of “1”s} = \sqrt{\frac{(\text{proportion of “1”s})(\text{proportion of “0”s})}{\text{\# of draws}}} \times 100$$

We can see that:

Increasing the number of draws by some factor:

multiplies the SE for the sum by the square root of that factor

divides the SE for % by the square root of that factor

As with our previous SEs, this tells us about how far the observed percentage is likely to be from its EV
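The opposite effects on the two SEs show up when the number of draws is quadrupled. This is a minimal sketch for draws with replacement; the box (half "1"s) and the draw counts (100 and 400) are assumptions chosen to keep the arithmetic clean.

```python
import math


def se_count(num_draws, prop_ones):
    """SE for the number of "1"s, drawing with replacement from a 0-1 box."""
    sd_box = math.sqrt(prop_ones * (1 - prop_ones))
    return math.sqrt(num_draws) * sd_box


def se_percent(num_draws, prop_ones):
    """SE for the percent of "1"s: SE for the count, divided by the
    number of draws, times 100."""
    return se_count(num_draws, prop_ones) / num_draws * 100


# Hypothetical box with half "1"s; compare 100 draws with 400 draws.
for n in (100, 400):
    print(n, se_count(n, 0.5), se_percent(n, 0.5))
```

Quadrupling the draws should double the SE for the count (5 to 10) and halve the SE for the percent (5 to 2.5 percentage points).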

(14)

The Standard Error (cont.)

When drawing without replacement,

$$\text{SE for percent of “1”s without replacement} = \text{correction factor} \times \text{SE for percent of “1”s with replacement},$$

where

$$\text{correction factor} = \sqrt{\frac{\text{population size} - \text{sample size}}{\text{population size} - 1}}.$$

Intuitively, it makes sense that we must relate the sample size and the population size.

(15)

The Correction Factor

$$\text{correction factor} = \sqrt{\frac{\text{population size} - \text{sample size}}{\text{population size} - 1}}$$

What happens if our sample is small compared to the population?

What if it is large?
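One way to explore these questions is to tabulate the correction factor for a few sample sizes. In this sketch the population size of 30,000 and the list of sample sizes are illustrative choices.

```python
import math


def correction_factor(population_size, sample_size):
    """Correction factor for drawing without replacement:
    sqrt((population size - sample size) / (population size - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))


POPULATION_SIZE = 30_000  # illustrative population size

for sample_size in (1, 100, 1_000, 15_000, 29_000, 30_000):
    cf = correction_factor(POPULATION_SIZE, sample_size)
    print(f"sample size {sample_size:>6}: correction factor {cf:.4f}")
```

When the sample is small relative to the population, the factor stays close to 1, so drawing without replacement behaves almost like drawing with replacement. As the sample approaches the whole population, the factor falls toward 0.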

(16)

The Correction Factor (cont.)

[Figure: correction factor (vertical axis, 0.0 to 1.0) plotted against sample size (horizontal axis, 0 to 4,000), with one curve for each population size: 1,000; 2,000; 3,000; 4,000; 10,000; 100,000.]

(17)

Comparison

Sum of draws from a box, with replacement

$$\text{EV for sum} = \text{\# of draws} \times \text{avg. of box}$$

$$\text{SE for sum} = \sqrt{\text{\# of draws}} \times \text{SD of box}$$

Percent of “1”s from a 0-1 box, with replacement

$$\text{EV for \%} = \text{percent of “1”s in the box}$$

$$\text{SE for \%} = \sqrt{\frac{(\text{proportion of “1”s})(\text{proportion of “0”s})}{\text{\# of draws}}} \times 100$$

(18)

Comparison (cont.)

Percent of “1”s from a 0-1 box, with replacement

$$\text{EV for \%} = \text{percent of “1”s in the box}$$

$$\text{SE for \%} = \sqrt{\frac{(\text{proportion of “1”s})(\text{proportion of “0”s})}{\text{\# of draws}}} \times 100$$

Percent of “1”s from a 0-1 box, without replacement

$$\text{EV for \%} = \text{percent of “1”s in the box}$$

$$\text{SE for \%} = \text{correction factor} \times \sqrt{\frac{(\text{proportion of “1”s})(\text{proportion of “0”s})}{\text{\# of draws}}} \times 100$$

(19)

Examples

Say we want to find the chance of having a roulette ball land on a red space 20 or more times in 35 plays.

Draw a box, indicate number and kind of tickets, number of draws, kind of draws

Calculate EV for sum

Calculate SE for sum

Use normal curve
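The four steps can be sketched in code. Assumptions in this sketch: the box has 18 tickets marked "1" (red) and 20 marked "0", draws are made with replacement, and no continuity correction is applied. The slide does not say whether one should be used, so the result is a rough approximation.

```python
import math
from statistics import NormalDist

NUM_DRAWS = 35
PROP_RED = 18 / 38  # 18 "1" tickets (red) out of 38 tickets in the box

# EV and SE for the number of reds (the sum of draws from the 0-1 box).
ev_count = NUM_DRAWS * PROP_RED
sd_box = math.sqrt(PROP_RED * (1 - PROP_RED))
se_count = math.sqrt(NUM_DRAWS) * sd_box

# Convert 20 reds to standard units and take the area to the right.
# Assumption: no continuity correction.
z = (20 - ev_count) / se_count
chance = 1 - NormalDist().cdf(z)

print(f"EV = {ev_count:.2f}, SE = {se_count:.2f}, z = {z:.2f}")
print(f"approximate chance of 20 or more reds: {chance:.3f}")
```

Under these assumptions the chance should come out to roughly 12%.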

(20)

Examples (cont.)

Say we want to find the chance of having a roulette ball land on a red space 57% or more of the time, in 35 plays.

Draw a box, indicate number and kind of tickets, number of draws, kind of draws

Calculate EV for percent

Calculate SE for percent

Use normal curve
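The same steps can be carried out on the percent scale. As before, this is a sketch that assumes draws with replacement from a box with 18 "1"s and 20 "0"s, and uses no continuity correction.

```python
import math
from statistics import NormalDist

NUM_DRAWS = 35
PROP_RED = 18 / 38  # proportion of "1"s (red) in the box

# EV and SE for the percent of reds.
ev_percent = PROP_RED * 100
se_percent = math.sqrt(PROP_RED * (1 - PROP_RED) / NUM_DRAWS) * 100

# Convert 57% to standard units and take the area to the right.
# Assumption: no continuity correction.
z = (57 - ev_percent) / se_percent
chance = 1 - NormalDist().cdf(z)

print(f"EV = {ev_percent:.1f}%, SE = {se_percent:.1f} percentage points")
print(f"approximate chance of 57% or more reds: {chance:.3f}")
```

Since 20 out of 35 is about 57.1%, the result should be close to the chance of 20 or more reds in 35 plays.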

(21)

Examples (cont.)

Assume that there are 30,000 students at UC Berkeley, and that 65% of them have some student loan debt. We want to find the chance of having 67% or more of those sampled have student loan debt, in a sample of size 1,000.

Draw a box, indicate number and kind of tickets, number of draws, kind of draws

Calculate EV for percent

Calculate SE for percent

Use normal curve
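The loan debt example uses a simple random sample, so the correction factor comes into play. The sketch below takes the population size, percentage, and sample size from the slide; the variable names are illustrative, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

POPULATION_SIZE = 30_000
SAMPLE_SIZE = 1_000
PROP_DEBT = 0.65  # proportion of "1"s (students with loan debt) in the box

# EV for the percent is the percentage of "1"s in the box.
ev_percent = PROP_DEBT * 100

# SE for the percent as if drawing with replacement...
se_with = math.sqrt(PROP_DEBT * (1 - PROP_DEBT) / SAMPLE_SIZE) * 100

# ...then shrink it by the correction factor for drawing without replacement.
cf = math.sqrt((POPULATION_SIZE - SAMPLE_SIZE) / (POPULATION_SIZE - 1))
se_without = cf * se_with

# Convert 67% to standard units and take the area to the right.
z = (67 - ev_percent) / se_without
chance = 1 - NormalDist().cdf(z)

print(f"correction factor = {cf:.4f}")
print(f"SE = {se_without:.2f} percentage points, z = {z:.2f}")
print(f"approximate chance of 67% or more: {chance:.3f}")
```

The correction factor should be close to 1 here, because the sample is only about 3% of the population.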

(22)

Examples (cont.)

I’m interested in looking at student loan debt at other colleges, besides UC Berkeley. So, I expand my survey to include UCLA and Pomona College (a small liberal arts college). In both places, I will take a sample of 2% of the students, in order to estimate the percentage of students with loan debt. Other things being equal:

1. The accuracy to be expected at UCLA is about the same as the accuracy to be expected at Pomona.

2. The accuracy to be expected at UCLA is quite a bit higher than at Pomona.

3. The accuracy to be expected at UCLA is quite a bit lower than at Pomona.
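One way to reason about the three options is to compute the SE for the percent at each school. The enrollment figures below (30,000 and 1,500) and the 65% debt rate are hypothetical values chosen only for illustration; they are not from the lecture and are not actual enrollment data.

```python
import math


def se_percent_srs(population_size, sample_size, prop_ones):
    """SE for the percent of "1"s in a simple random sample:
    correction factor times the with-replacement SE for the percent."""
    se_with = math.sqrt(prop_ones * (1 - prop_ones) / sample_size) * 100
    cf = math.sqrt((population_size - sample_size) / (population_size - 1))
    return cf * se_with


# Hypothetical enrollments (assumptions, not real figures).
LARGE_SCHOOL = 30_000  # stand-in for UCLA
SMALL_SCHOOL = 1_500   # stand-in for Pomona
PROP_DEBT = 0.65       # "other things being equal": same proportion at both

for name, size in (("large school", LARGE_SCHOOL), ("small school", SMALL_SCHOOL)):
    sample_size = round(0.02 * size)  # a 2% sample
    se = se_percent_srs(size, sample_size, PROP_DEBT)
    print(f"{name}: sample of {sample_size}, SE = {se:.2f} percentage points")

se_large = se_percent_srs(LARGE_SCHOOL, round(0.02 * LARGE_SCHOOL), PROP_DEBT)
se_small = se_percent_srs(SMALL_SCHOOL, round(0.02 * SMALL_SCHOOL), PROP_DEBT)
```

With a 2% sample at both schools the correction factors are nearly identical, so the difference in SE comes almost entirely from the absolute sample sizes (600 versus 30 under these assumptions).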

(23)

Important Takeaways

Box models can be used for real world problems, not just gambling

Box = population, Draws from box without replacement = sample (SRS)

EV and SE for percent of “1”s drawn from a 0-1 box

Correction factor for drawing without replacement

Probability histogram still approximately follows the normal curve

We can use the correction factor, combined with the normal curve, to find approximate probabilities (areas under the normal curve approximate areas in the probability histogram)
