MATH 10: Elementary Statistics and Probability Chapter 7: The Central Limit Theorem

(1)

Chapter 7: The Central Limit Theorem

Tony Pourmohamad

Department of Mathematics De Anza College

(2)

Objectives

By the end of this set of slides, you should be able to:

1 Understand what the central limit theorem is

2 Recognize problems that call for the central limit theorem

(3)

The Central Limit Theorem

• The Central Limit Theorem (CLT) is one of the most powerful and useful ideas in all of statistics

• For this class, we will consider two applications of the CLT:

1 CLT for means (or averages) of random variables

2 CLT for sums of random variables

• Let’s start with an example, courtesy of Professor Mo Geraghty

http://nebula2.deanza.edu:16080/~mo/holistic/clt.swf

• Try exploring the following website to better understand the CLT

http://spark.rstudio.com/minebocek/CLT_mean/

(4)

The Central Limit Theorem

• So what is happening in the CLT video?

[Figure: histograms of sample means for 10, 100, 1,000, and 10,000 samples (y-axis: Frequency)]

(5)

The Central Limit Theorem -- Basic Idea

• Imagine there is some population with a mean $\mu$ and standard deviation $\sigma$

• We can collect samples of size $n$, where the value of $n$ is "large enough"

• We can then calculate the mean of each sample

• If we create a histogram of those means, the resulting histogram will look close to a normal distribution

• It does not matter what the distribution of the original population is, or whether you even know it. The important fact is that the distribution of the sample means tends to follow the normal distribution
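The basic idea can be checked with a short simulation. This is a minimal Python sketch under stated assumptions: the exponential population (mean 1), the sample size $n = 40$, and the 10,000 repetitions are hypothetical choices, not taken from the slides. It compares how much of the data falls within 1 and 2 standard deviations of the center, which for a normal distribution is about 68% and 95%.

```python
import random
import statistics

def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of their own mean."""
    center = statistics.fmean(values)
    spread = statistics.stdev(values)
    return sum(abs(v - center) < k * spread for v in values) / len(values)

def draw(rng):
    """One value from a hypothetical right-skewed population: exponential, mean 1."""
    return rng.expovariate(1.0)

rng = random.Random(0)  # fixed seed so the run is reproducible

# Individual values from the skewed population, for comparison.
individuals = [draw(rng) for _ in range(10_000)]

# 10,000 samples of size n = 40; keep only the mean of each sample.
n = 40
means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(10_000)]

# A normal distribution has about 68% within 1 SD and about 95% within 2 SD.
print("individuals within 1 SD: ", round(fraction_within(individuals, 1), 3))
print("sample means within 1 SD:", round(fraction_within(means, 1), 3))
print("sample means within 2 SD:", round(fraction_within(means, 2), 3))
```

The individual values are far from the 68% figure (roughly 86% for this population), while the sample means land close to 68% and 95%, as a normal distribution would.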

(6)

The Central Limit Theorem -- More Formally

• Suppose that we have a large population with mean $\mu$ and standard deviation $\sigma$

• Suppose that we select random samples of $n$ items from this population

• Each sample taken from the population has its own average $\bar{X}$

• The sample average for any specific sample may not equal the population average $\mu$

(7)

The Central Limit Theorem -- More Formally Continued

• The sample averages $\bar{X}$ follow a probability distribution of their own

• The average of the sample averages is the population average: $\mu_{\bar{x}} = \mu$

• The standard deviation of the sample averages equals the population standard deviation divided by the square root of the sample size: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

• The distribution of the sample averages $\bar{X}$ is approximately normal if the sample size is large enough

• The larger the sample size, the closer the shape of the distribution of sample averages becomes to the normal distribution

• This is the Central Limit Theorem!
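A small simulation can check the first two results numerically. In this Python sketch the population (rolls of a fair six-sided die, with $\mu = 3.5$ and $\sigma \approx 1.708$), the sample size $n = 25$, and the 20,000 repetitions are hypothetical choices for illustration.

```python
import math
import random
import statistics

# Hypothetical population: rolls of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
mu = statistics.fmean(faces)        # population mean, 3.5
sigma = statistics.pstdev(faces)    # population SD, about 1.708

n = 25
rng = random.Random(42)  # fixed seed so the run is reproducible

# Draw 20,000 samples of size n (with replacement) and keep each sample's mean.
means = [statistics.fmean(rng.choices(faces, k=n)) for _ in range(20_000)]

print("mean of sample means:", round(statistics.fmean(means), 3), "vs mu =", mu)
print("SD of sample means:  ", round(statistics.stdev(means), 3),
      "vs sigma/sqrt(n) =", round(sigma / math.sqrt(n), 3))
```

The mean of the simulated sample means sits near $\mu$, and their spread sits near $\sigma/\sqrt{n} \approx 0.342$, which is much smaller than $\sigma$ itself.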

(8)

The Central Limit Theorem -- Case 1

• IF a random sample of any size $n$ is taken from a population with a normal distribution with mean $\mu$ and standard deviation $\sigma$

• THEN the distribution of the sample mean is normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, that is, $\bar{X} \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}})$

(9)

The Central Limit Theorem -- Case 1

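[Figure: the population distribution $X \sim N(10, 2)$ and the narrower distribution of the sample mean $\bar{X} \sim N(10, 2/\sqrt{50})$, both centered at $\mu$]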
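The parameters in the Case 1 figure can be computed directly. This sketch uses Python's standard-library `statistics.NormalDist`; $n = 50$ is read from the figure label, and the interval from 9 to 11 is a hypothetical choice used only to show how much narrower the distribution of $\bar{X}$ is than that of $X$.

```python
import math
from statistics import NormalDist

mu, sigma, n = 10, 2, 50   # values read off the Case 1 figure

population = NormalDist(mu, sigma)                   # X ~ N(10, 2)
sample_mean = NormalDist(mu, sigma / math.sqrt(n))   # X-bar ~ N(10, 2/sqrt(50))

print("sigma_xbar =", round(sample_mean.stdev, 4))   # about 0.2828

# Hypothetical interval: how likely is a value within 1 unit of mu?
p_individual = population.cdf(11) - population.cdf(9)
p_mean = sample_mean.cdf(11) - sample_mean.cdf(9)
print("P(9 < X < 11)     =", round(p_individual, 4))
print("P(9 < X-bar < 11) =", round(p_mean, 4))
```

Because the population is normal, this result holds for any $n$; the only effect of a larger $n$ is a smaller $\sigma_{\bar{x}}$.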

(10)

The Central Limit Theorem -- Case 2

• IF a random sample of sufficiently large size $n$ is taken from a population with ANY distribution with mean $\mu$ and standard deviation $\sigma$

• THEN the distribution of the sample mean is approximately normal with $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, that is, approximately $\bar{X} \sim N(\mu_{\bar{x}}, \sigma_{\bar{x}})$
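Case 2 can be illustrated with a population that is clearly not normal. This Python sketch assumes a Uniform(0, 1) population ($\mu = 0.5$, $\sigma = 1/\sqrt{12}$), samples of size $n = 36$, and the cutoff 0.55; all of these are hypothetical. It compares the CLT approximation of $P(\bar{X} < 0.55)$ with the fraction of simulated sample means below 0.55.

```python
import math
import random
import statistics
from statistics import NormalDist

# Hypothetical non-normal population: Uniform(0, 1).
mu, sigma = 0.5, 1 / math.sqrt(12)
n = 36

# CLT approximation: X-bar is approximately N(mu, sigma / sqrt(n)).
approx = NormalDist(mu, sigma / math.sqrt(n))
p_clt = approx.cdf(0.55)

# Check the approximation by simulating many sample means directly.
rng = random.Random(7)  # fixed seed so the run is reproducible
means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(20_000)]
p_sim = sum(m < 0.55 for m in means) / len(means)

print(f"CLT approximation: {p_clt:.4f}, simulation: {p_sim:.4f}")
```

The two numbers agree closely even though individual values from this population are flat, not bell-shaped.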

(11)

The Central Limit Theorem -- Case 2

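[Figure: a population distribution (labeled $X \sim N(10, 2)$) and the sampling distribution $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$, both centered at $\mu$]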

(12)

The Central Limit Theorem -- Recap

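• Three important results for the distribution of $\bar{X}$:

1 The mean stays the same: $\mu_{\bar{x}} = \mu$

2 The standard deviation gets smaller: $\sigma_{\bar{x}} = \sigma/\sqrt{n}$

3 If $n$ is sufficiently large, $\bar{X}$ has an approximately normal distribution, where $\bar{X} \sim N(\mu, \sigma/\sqrt{n})$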

(13)

What is Large n?

• How large does the sample size $n$ need to be in order to use the Central Limit Theorem?

• The value of $n$ needed for a "large enough" sample size depends on the shape of the original distribution of the individuals in the population

• If the individuals in the original population follow a normal distribution, then the sample averages will have a normal distribution, no matter how small or large the sample size is

• If the individuals in the original population do not follow a normal distribution, then the sample averages $\bar{X}$ become more normally distributed as the sample size grows larger. In this case the sample averages $\bar{X}$ do not follow the same distribution as the original population

(14)

What is Large n? Continued

• The more skewed the original distribution of individual values, the larger the sample size needed

• If the original distribution is symmetric, the sample size needed can be smaller

• Many statistics textbooks use the rule of thumb $n \geq 30$, treating 30 as the minimum sample size for using the Central Limit Theorem. In reality, however, there is no universal minimum sample size that works for all distributions; the sample size needed depends on the shape of the original distribution

• In this class, we will assume the sample size is large enough for the
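One way to see why skewed populations need larger samples is to measure the skewness of simulated sample means. This Python sketch assumes two hypothetical populations, a symmetric Uniform(0, 1) and a right-skewed exponential with mean 1. For a population with skewness $\gamma$, the mean of $n$ independent values has skewness $\gamma/\sqrt{n}$; the exponential has $\gamma = 2$, so even at $n = 30$ its sample means keep a skewness of about $2/\sqrt{30} \approx 0.37$, while the uniform sample means are symmetric at every $n$.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: the average cubed z score (about 0 for symmetric data)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)

def mean_skewness(draw, n, reps=10_000, seed=3):
    """Skewness of the simulated distribution of sample means for samples of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return skewness(means)

def symmetric(rng):
    return rng.random()            # hypothetical Uniform(0, 1) population

def skewed(rng):
    return rng.expovariate(1.0)    # hypothetical exponential population, mean 1

for n in (5, 30, 100):
    print(f"n = {n:3d}: uniform {mean_skewness(symmetric, n):+.2f}, "
          f"exponential {mean_skewness(skewed, n):+.2f}")
```

The exponential column shrinks toward 0 only slowly as $n$ grows, which is why a single cutoff such as 30 cannot suit every population.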

(15)

Calculating Probabilities from a Normal Distribution

• Here is the general procedure to calculate probabilities from the distribution of the sample mean $\bar{X}$:

1 You are given an interval in terms of $\bar{x}$, e.g. $P(\bar{X} < \bar{x})$

2 Convert to a z score by using $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

3 Look up the probability in the z table that corresponds to the z score, i.e. $P(Z < z)$

• This is just the same idea we used in Chapter 6!
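The three steps can be written as a small function. This is a Python sketch, not part of the course materials; it uses `statistics.NormalDist().cdf` in place of the z table, and the example values ($\mu = 100$, $\sigma = 15$, $n = 36$, $\bar{x} = 104$) are hypothetical.

```python
import math
from statistics import NormalDist

def prob_mean_below(xbar, mu, sigma, n):
    """P(X-bar < xbar) for samples of size n, using the CLT normal approximation."""
    z = (xbar - mu) / (sigma / math.sqrt(n))   # step 2: standardize using sigma / sqrt(n)
    return NormalDist().cdf(z)                 # step 3: P(Z < z), the z-table lookup

# Hypothetical example: z = (104 - 100) / (15 / 6) = 1.6
print(round(prob_mean_below(104, 100, 15, 36), 4))  # 0.9452
```

Note that step 2 divides by $\sigma/\sqrt{n}$, not by $\sigma$; using $\sigma$ alone would give the probability for a single individual instead of for a sample mean.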

(16)

Examples

(17)

Percentile Calculations Based on the Normal Distribution

• Here is the general procedure to calculate the value $\bar{x}$ that corresponds to the $P$th percentile:

1 You are given the desired probability or percentile

2 Look up the z score in the z table that corresponds to the probability

3 Convert to $\bar{x}$ with the formula $\bar{x} = \mu + z\,\dfrac{\sigma}{\sqrt{n}}$

• Examples: Look at Handout #5 on the website
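The percentile procedure can be sketched the same way. In this Python sketch `statistics.NormalDist().inv_cdf` plays the role of the reverse z-table lookup; the function name and the example values ($\mu = 100$, $\sigma = 15$, $n = 36$, 90th percentile) are hypothetical.

```python
import math
from statistics import NormalDist

def mean_percentile(p, mu, sigma, n):
    """Value of x-bar at percentile p (written as a decimal) of the distribution of X-bar."""
    z = NormalDist().inv_cdf(p)              # step 2: z score with P(Z < z) = p
    return mu + z * sigma / math.sqrt(n)     # step 3: convert z back to x-bar

# Hypothetical example: 90th percentile of X-bar when mu = 100, sigma = 15, n = 36
print(round(mean_percentile(0.90, 100, 15, 36), 2))  # 103.2
```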

(18)

Using Your Calculator

• If you have a graphing calculator, it can calculate all of these probabilities without a z table

• If you want to calculate $P(a < \bar{X} < b)$, follow these steps:

1 Push 2nd, then DISTR

2 Select normalcdf() and then push ENTER

3 Then enter the following: normalcdf(a,b, µ, σ/√n)

• Question: If $\bar{X} \sim N(0, 1)$, what is the probability $P(-1 < \bar{X} < 1)$?

• Solution: normalcdf(−1, 1, 0, 1) = 0.6827 ≈ 68%

• Question: If $\bar{X} \sim N(10, 2)$, what is $P(7 < \bar{X} < 9)$?
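Without a graphing calculator, the same interval probabilities can be computed in Python. The function name `normalcdf` below is a hypothetical stand-in chosen to mirror the calculator command; it is built on the standard-library `statistics.NormalDist`. As on the calculator, the last argument for $\bar{X}$ should be $\sigma/\sqrt{n}$.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """Hypothetical stand-in for the calculator's normalcdf(lower, upper, mean, sd)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

print(round(normalcdf(-1, 1, 0, 1), 4))   # 0.6827, the first question on this slide
print(round(normalcdf(7, 9, 10, 2), 4))   # 0.2417, the second question
```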

(19)

Using Your Calculator

• If you want to calculate $P(\bar{X} < a)$, follow these steps:

1 Push 2nd, then DISTR

2 Select normalcdf() and then push ENTER

3 Then enter the following: normalcdf(−10^99, a, µ, σ/√n)

• Question: If $\bar{X} \sim N(10, 2)$, what is $P(\bar{X} < 8)$?

• Solution: normalcdf(−10^99, 8, 10, 2) = 0.158655

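• If you want to calculate $P(\bar{X} > a)$, follow these steps:

1 Push 2nd, then DISTR

2 Select normalcdf() and then push ENTER

3 Then enter the following: normalcdf(a, 10^99, µ, σ/√n)

• Question: If $\bar{X} \sim N(10, 2)$, what is $P(\bar{X} > 9)$?

• Solution: normalcdf(9, 10^99, 10, 2) = 0.691462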
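In Python the one-sided cases do not need a stand-in for infinity such as $\pm 10^{99}$: a lower-tail probability is the CDF itself, and an upper-tail probability is one minus the CDF. This sketch uses the standard-library `statistics.NormalDist` and reproduces the two questions on this slide.

```python
from statistics import NormalDist

x_bar = NormalDist(10, 2)   # X-bar ~ N(10, 2), as in the questions on this slide

p_below_8 = x_bar.cdf(8)        # P(X-bar < 8): lower tail is the CDF
p_above_9 = 1 - x_bar.cdf(9)    # P(X-bar > 9): upper tail is 1 minus the CDF

print(round(p_below_8, 6))   # 0.158655
print(round(p_above_9, 6))   # 0.691462
```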

(20)

Using Your Calculator

• If you want to calculate the value of $\bar{X}$ that gives you the $P$th percentile, follow these steps:

1 Push 2nd, then DISTR

2 Select invNorm() and then push ENTER

3 Then enter the following: invNorm(percentile, µ, σ/√n), with the percentile written as a decimal

• Question: If $\bar{X} \sim N(10, 2)$, what value of $\bar{X}$ gives us the 25th percentile?

• Solution: invNorm(0.25, 10, 2) = 8.65102

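• Recall: We used the formula $\bar{x} = \mu + z\,\sigma/\sqrt{n}$. Here $\sigma/\sqrt{n} = 2$ and the z score for the 25th percentile is about $-0.67$, so $\bar{x} = 10 + (-0.67)(2) = 8.66$, which matches invNorm up to the rounding of $z$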
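The calculator's invNorm has a direct counterpart in Python's standard library. This sketch uses `statistics.NormalDist.inv_cdf` both directly and through the formula $\bar{x} = \mu + z\,\sigma/\sqrt{n}$; carrying $z$ at full precision (about $-0.6745$ rather than $-0.67$) reproduces 8.65102 rather than 8.66.

```python
from statistics import NormalDist

x_bar = NormalDist(10, 2)   # X-bar ~ N(10, 2), as in the question on this slide

# Direct equivalent of invNorm(0.25, 10, 2)
print(round(x_bar.inv_cdf(0.25), 5))      # 8.65102

# Two-step version: z for the 25th percentile, then x-bar = mu + z * (sigma / sqrt(n)),
# where sigma / sqrt(n) = 2 here
z = NormalDist().inv_cdf(0.25)            # about -0.6745
print(round(10 + z * 2, 5))               # 8.65102
```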