Statistics 11 Lecture 18 Sampling Distributions (Chapter 6-2, 6-3)


1. Definitions again

Review the definitions of POPULATION, SAMPLE, PARAMETER and STATISTIC.

STATISTICAL INFERENCE: the situation in which the population parameters are unknown, and we use sample outcomes (statistics) to draw conclusions about the values of the population parameters. (The text, p. 194, describes this as measuring the reliability, or trustworthiness, of a sample.)

When random samples are drawn from a population of interest to represent the whole population, they are generally unbiased and representative.

The key to understanding why samples behave this way is a difficult concept: THE SAMPLING DISTRIBUTION. The sampling distribution is a theoretical/conceptual/ideal probability distribution of a statistic.

A theoretical probability distribution describes what the outcomes (i.e., statistics) of some random process (e.g., drawing a sample from a population) would look like if you could repeat the random process over and over again and had information (that is, the statistics) from every possible sample.

Note that a sampling distribution is the theoretical probability distribution of a statistic. The sampling distribution shows how a statistic varies from sample to sample and the pattern of possible values the statistic takes. We do not actually see sampling distributions in real life; we approximate them by simulation.

2. Sampling Distributions for Means

Generally, the objective in sampling is to estimate a population mean µ from sample information. Let's suppose that the 178,455 or so households in this example are a population.

Here are the mean µ_X (mu) and standard deviation σ_X (sigma) of household income (HINC) in our population:

                            HINC
    -------------------------------------------------------------
          Percentiles      Smallest
     1%         2500             20
     5%        10000             90
    10%        16970            400       Obs              178455
    25%        36090            450       Sum of Wgt.      178455

    50%        63000                      Mean           78163.53
                            Largest       Std. Dev.      63610.55
    75%       103000         409000
    90%       153200         428000       Variance       4.05e+09
    95%       190100         437740       Skewness       2.105389
    99%       330000         634000       Kurtosis       10.17414

Suppose we draw a simple random sample of size n from a large population. A simple random sample is a sample where (1) each member of the population has the same chance of being selected (unbiased), and (2) the selection of one member has no effect on the probability of another member being selected (independent). Because the observations are independent and all come from the same population, we say that they are independent and identically distributed (i.i.d.). For the samples in this class, you should assume this condition holds.

Let us call the observed values from the sample X1, X2, ..., Xn. An example: draw a simple random sample (SRS) of 25 from the 178,455 households with measured household income.

Measure the average from the sample of size 25 and compare it to the population average.

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            hinc |        25     82523.6    72275.65       1220     385000
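To make the sampling step concrete, here is a minimal Python sketch of drawing an SRS of 25 and comparing its mean with the population mean. The lecture's 178,455 household incomes are not reproduced here, so the sketch assumes a hypothetical right-skewed stand-in population (lognormal incomes with made-up parameters); only the sampling logic carries over, not the numbers.

```python
import random
import statistics

# HYPOTHETICAL stand-in population: the lecture's real household incomes are
# not available, so we generate 178,455 right-skewed (lognormal) incomes.
rng = random.Random(18)
population = [round(rng.lognormvariate(11, 0.7)) for _ in range(178_455)]

# Simple random sample of n = 25 households, drawn without replacement:
# every household has the same chance of being picked.
sample = rng.sample(population, 25)

mu = statistics.fmean(population)   # population mean (a parameter)
x_bar = statistics.fmean(sample)    # sample mean (a statistic)

print(f"population mean = {mu:,.2f}")
print(f"sample mean     = {x_bar:,.2f}")
print(f"difference      = {x_bar - mu:,.2f}")
```

Rerunning with a different seed gives a different sample mean each time, which is exactly the sample-to-sample variation the sampling distribution describes.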

[Figure: density histogram of household income (HINC) for the population, x-axis 0 to 600,000]


A statistic: the mean of the sample of 25, $82,523.60, is just the plain old mean (Chapter 2, page 34). Here are the household incomes of the 25 households that were sampled:

    hinc
     1. 124900    2. 33000     3. 130000    4. 385000    5. 62040
     6. 47300     7. 51000     8. 115000    9. 56030    10. 102000
    11. 115000   12. 60000    13. 42200    14. 37640    15. 56500
    16. 33700    17. 26500    18. 42000    19. 104500   20. 104390
    21. 51700    22. 81220    23. 1220     24. 114250   25. 86000
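As a check, the summary line reported for this sample can be recomputed from the 25 listed incomes. This is a small Python sketch using the standard `statistics` module; `statistics.stdev` divides by n − 1, matching the sample standard deviation in the output.

```python
import statistics

# The 25 sampled household incomes listed above.
hinc = [124900, 33000, 130000, 385000, 62040, 47300, 51000, 115000,
        56030, 102000, 115000, 60000, 42200, 37640, 56500, 33700,
        26500, 42000, 104500, 104390, 51700, 81220, 1220, 114250, 86000]

n = len(hinc)
mean = statistics.mean(hinc)   # plain old mean
sd = statistics.stdev(hinc)    # sample standard deviation (divides by n - 1)

print(n, mean, round(sd, 2), min(hinc), max(hinc))
```

The printed values should agree with the Obs, Mean, Std. Dev., Min and Max columns of the summary table.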

A. The Expected Value of the Sample Mean

We certainly would have liked to do better: a sample mean of $82,523.60 is not the same as the population mean of $78,163.53. Is the sample mean a good estimator of the true population mean µ?

Theory says, "Let's think of a sample from a population as a set of random variables." In other words, while we might know what is possible with respect to household incomes, we don't know what the sample will look like until it is actually drawn. The sample mean (page 196 of your text) is defined as a combination of random variables:

$$\bar{X} \equiv \frac{1}{n}\left[X_1 + X_2 + \cdots + X_n\right]$$

The sample mean, being a linear combination of random variables, is itself a random variable. So now we ask: what are the expected value and the variance (or standard deviation) of the sample mean?

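$$E(\bar{X}) \equiv \frac{1}{n}\left[E(X_1) + E(X_2) + \cdots + E(X_n)\right]$$

But each random variable X_i comes from the same population distribution p(x), which has mean µ. So E(X_1) = E(X_2) = ... = E(X_n) = µ, and then

$$E(\bar{X}) = \frac{1}{n}\left[\mu + \mu + \cdots + \mu\right] = \frac{1}{n}\left[n\mu\right] = \mu$$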

The interpretation: on average, the sample mean $\bar{X}$ is expected to equal µ.

RULE 1: The mean of all possible sample means (all possible $\bar{x}$'s from samples of the same size drawn from the same population) is denoted $\mu_{\bar{X}}$, which in theory equals $\mu$ (the true population mean).
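Rule 1 can be checked exactly, without simulation, when the population is tiny enough to list every possible sample. The Python sketch below assumes a hypothetical population of four values (not from the lecture) and samples of size 2 drawn with replacement, so all 4² = 16 ordered samples are equally likely; exact `Fraction` arithmetic avoids any rounding.

```python
from fractions import Fraction
from itertools import product

# HYPOTHETICAL tiny population, small enough to enumerate every sample.
population = [2, 4, 6, 9]
n = 2  # sample size; samples are drawn with replacement

mu = Fraction(sum(population), len(population))  # true population mean

# Every ordered sample of size n (with replacement) is equally likely.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
mean_of_means = sum(sample_means) / len(sample_means)

print(f"population mean mu       = {mu}")
print(f"mean of all sample means = {mean_of_means}")
```

Individual sample means range from 2 to 9, yet their average lands exactly on µ, which is what Rule 1 asserts.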

We define the mean of a single sample as

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

(this is from Chapter 2), and we define the standard deviation of a single sample as

$$s_x = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$

(also from Chapter 2). This $\bar{x}$ can be thought of as the mean of a single sample of size 25 selected at random from all possible samples of size 25 that could have been generated from the population.
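The two Chapter 2 formulas translate directly into code. This Python sketch writes them out by hand (the helper names `sample_mean` and `sample_sd` are illustrative, not from the text) and applies them to a hypothetical four-value data set; the result can be compared against the standard library's `statistics.stdev`, which uses the same n − 1 divisor.

```python
import math
import statistics


def sample_mean(xs):
    """x-bar = (x_1 + x_2 + ... + x_n) / n"""
    return sum(xs) / len(xs)


def sample_sd(xs):
    """s_x = sqrt( sum of (x_i - x-bar)^2 / (n - 1) )"""
    n = len(xs)
    x_bar = sample_mean(xs)
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))


# HYPOTHETICAL small data set, for illustration only.
data = [2, 4, 6, 9]
print(sample_mean(data), sample_sd(data), statistics.stdev(data))
```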


In other words, the mean of the sample means calculated from all possible samples of the same size from the same population equals the true population mean. We can check this with a simulation. If I draw 10,000 samples of size 25 (with replacement) from our population of 178,455 (with a mean income of $78,163.53), the mean of the 10,000 sample means should, in theory, be very close to our true population mean.

                          r(mean)
    -------------------------------------------------------------
          Percentiles      Smallest
     1%        52905        41012.4
     5%      59254.4          41048
    10%      62887.2        41397.6       Obs               10000
    25%      69445.4        41722.8       Sum of Wgt.       10000

    50%      77510.6                      Mean           78209.64
                            Largest       Std. Dev.      12437.97
    75%      86195.2       132154.8
    90%      94498.4         133668       Variance       1.55e+08
    95%        99679         134684       Skewness       .3766731
    99%     110537.2       137485.6       Kurtosis       3.281289

This is the overall average of 10,000 sample means from samples of size 25 drawn with replacement from our original population of 178,455. We got $78,209.64 as the mean of the 10,000 sample means, which is very close to the true population mean of $78,163.53 (we are off by about 0.059%). Here is a graph of the 10,000 sample means from our 10,000 samples of size 25:

[Figure: density histogram of the 10,000 sample means, r(mean), x-axis 40,000 to 140,000]

Does it look familiar?

The sample mean $\bar{X}$ is considered an unbiased estimator of µ_X (the true population mean) when it comes from a random sample: the mean of all possible sample means, $\mu_{\bar{X}}$, equals µ. If your samples are not random, this relationship will not hold. For our first sample of 25 households, the sample mean is $82,523.60, but the mean of all 10,000 sample means is $78,209.64, which is not too different from the true population mean of $78,163.53.
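The simulation described above can be sketched in Python. Because the actual household data are not reproduced here, this sketch assumes a hypothetical right-skewed stand-in population of 178,455 lognormal incomes (made-up parameters), then draws 10,000 samples of size 25 with replacement, as in the lecture. The exact numbers will differ from the Stata output, but the pattern should be the same: the mean of the sample means sits close to µ, and their standard deviation sits close to σ/√n.

```python
import math
import random
import statistics

# HYPOTHETICAL right-skewed stand-in for the 178,455 household incomes.
rng = random.Random(2024)
population = [rng.lognormvariate(11, 0.7) for _ in range(178_455)]

mu = statistics.fmean(population)
# Population SD: divide by N (not N - 1), computed directly.
sigma = math.sqrt(statistics.fmean((x - mu) ** 2 for x in population))

n, reps = 25, 10_000
# Each replication: a sample of size 25 drawn WITH replacement, then its mean.
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mu = {mu:,.2f}   mean of sample means = {mean_of_means:,.2f}")
print(f"sigma/sqrt(n) = {sigma / math.sqrt(n):,.2f}   SD of sample means = {sd_of_means:,.2f}")
```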


B. The (Variance and) Standard Deviation of the Sample Mean

Recall that when we talk about means, we also need to talk about standard deviations, because they give us a sense of the typical distances between values. For the sampling distribution, we need to recognize that a different sample would give us a different result; the question becomes "how different?" The answer is found by calculating the variance of the sampling distribution. Recall that variances add nicely if the random variables are independent:

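$$\mathrm{Var}(\bar{X}) \equiv \mathrm{Var}\left[\frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)\right] = \left(\frac{1}{n}\right)^2\left[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\right] = \frac{1}{n^2}\left[\sigma^2 + \sigma^2 + \cdots + \sigma^2\right]$$

This reduces to

$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\left[n\sigma^2\right] = \frac{\sigma^2}{n}$$

(page 197 of your text)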
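The result Var(X̄) = σ²/n can also be verified exactly by enumeration. This Python sketch assumes the same kind of hypothetical four-value population (not from the lecture), lists every ordered sample of size n drawn with replacement, and compares the variance of the sample means with σ²/n for n = 1, 2, 3. Here σ² is the population variance (dividing by N).

```python
from fractions import Fraction
from itertools import product

# HYPOTHETICAL tiny population, small enough to enumerate every sample.
population = [2, 4, 6, 9]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance

results = {}
for n in (1, 2, 3):
    # All N**n equally likely ordered samples of size n, with replacement.
    means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
    var_of_means = sum((m - mu) ** 2 for m in means) / len(means)
    results[n] = (var_of_means, sigma2 / n)
    print(n, var_of_means, sigma2 / n)
```

Each row should print two identical fractions, and the variance shrinks as n grows.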

RULE 2. The theoretical standard deviation of all possible $\bar{x}$'s from all possible samples of size n is called the STANDARD ERROR, or SE (to distinguish it from the standard deviation), and it is:

$$SE = \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

This is paired with the mean of all sample means above, where σ is the standard deviation of the population. In our population data, σ is 63610.55, so the theoretical standard deviation for a distribution of all possible sample means from samples of size 25 should be

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{63610.55}{\sqrt{25}} = 12722.11$$

We can check whether this holds by examining the simulation output above: the standard deviation of our 10,000 sample means (from our samples of size 25) is 12437.97, again very close to the theoretical value of 12722.11 (we are off by about 2%).
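The arithmetic behind this comparison is short enough to write out. This Python sketch uses only the numbers reported in the lecture: σ = 63610.55, n = 25, and the simulated standard deviation 12437.97.

```python
import math

sigma = 63610.55          # population SD of HINC (from the summary output)
n = 25                    # sample size
se_theory = sigma / math.sqrt(n)
se_simulated = 12437.97   # SD of the 10,000 simulated sample means

pct_off = 100 * (se_theory - se_simulated) / se_theory
print(f"theoretical SE = {se_theory:.2f}")
print(f"simulated SE   = {se_simulated:.2f}  ({pct_off:.1f}% below theory)")
```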

This rule is approximately correct as long as your sample is no larger than 5% of your population. So please make a note of this:

o A sample has a mean $\bar{x}$, a standard deviation s, and a variance s².

o A population has a mean µ, a standard deviation σ, and a variance σ².

o A sampling distribution, or a distribution of all possible values of a sample statistic (in this case the sample mean), also has a mean, denoted $\mu_{\bar{X}}$, which in theory equals µ, but its standard deviation (called the STANDARD ERROR) is $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

Your sample (or any real-life sample) is just one realization among all the possible samples that could be drawn from the population.

The standard error $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ of all the SAMPLE MEANS will be smaller than the standard deviation of the population (for any n > 1), and typically smaller than the standard deviation of a single sample as well. In other words, it is easier to predict the mean of many observations than it is to predict the value of a single observation (or to predict the average of small samples). What is causing this? Examine the formula for the standard error of the sampling distribution and note the effect of sample size: the bigger the sample size gets, the smaller $\sigma_{\bar{X}} = \sigma/\sqrt{n}$ becomes.
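To see how quickly the standard error shrinks, this Python sketch evaluates σ/√n for the lecture's σ = 63610.55 at several sample sizes (the particular values of n are chosen for illustration). Because of the square root, quadrupling n only halves the standard error.

```python
import math

sigma = 63610.55  # population SD of household income from the lecture
ns = (1, 4, 25, 100, 400, 2500)

# Standard error of the sample mean, sigma / sqrt(n), for growing n.
se = {n: sigma / math.sqrt(n) for n in ns}
for n in ns:
    print(f"n = {n:>5}   SE = {se[n]:>10,.2f}")
```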

3. Normal Distributions and The Central Limit Theorem

Given a simple random sample of size n from a population having mean µ and standard deviation σ, the sample mean $\bar{x}$ will come from a sampling distribution of all possible sample means with mean $\mu_{\bar{X}} = \mu$ and standard deviation (called the standard error to make a distinction) $\sigma_{\bar{X}} = \sigma/\sqrt{n}$.

A. Basic Distributional Result

If the original population had a normal distribution, then the distribution of the sample mean will also be normally distributed. This is good, because it means we can use the normal table to make inferences about a particular sample with a statement of probability or chance.

Example. IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. A sample of 25 persons is drawn. How likely is it to get a sample average of 108 or more? (Z = (108 − 100)/(15/√25) = 2.67; from Table IV, about 0.4% or .004.) How likely is it for the very first score to be 108 or more? (Z = (108 − 100)/15 = 0.53; from Table IV, about 29.8% or .298.)
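The two table look-ups can be reproduced with Python's `statistics.NormalDist`. This is a sketch, not the textbook's method: it computes the normal areas directly instead of reading Table IV, so its answers can differ from the table values in the third decimal place, because the table rounds Z to two decimals.

```python
import math
from statistics import NormalDist

mu, sigma, n = 100, 15, 25

# Sampling distribution of the mean: normal with SD sigma / sqrt(n) = 3.
mean_dist = NormalDist(mu, sigma / math.sqrt(n))
p_mean = 1 - mean_dist.cdf(108)                 # P(sample average >= 108)

# A single score comes from the population distribution itself.
p_single = 1 - NormalDist(mu, sigma).cdf(108)   # P(one score >= 108)

print(f"P(x-bar >= 108) = {p_mean:.4f}")
print(f"P(X >= 108)     = {p_single:.4f}")
```

The contrast is the point of the example: an average of 25 scores is far less variable than a single score.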

B. The Central Limit Theorem (p. 201)

No matter what the distribution of the original population (recall that ours is highly right-skewed), if the sample size is "large," the distribution of the possible sample means will be close to the normal distribution (often 10 to 20 is large enough). It is a very powerful theorem, and it is the reason why the normal distribution is so well studied: we are interested in estimating means, and the CLT helps us understand what to expect.
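A quick simulation illustrates the CLT. This Python sketch assumes a hypothetical right-skewed population (an exponential distribution with mean 1, whose skewness is 2) and measures the skewness of 20,000 simulated sample means for n = 1, 5 and 25. Skewness near 0 indicates a symmetric, more normal-looking shape; it should fall steadily as n grows. The helper `skewness` is written here for illustration.

```python
import random
import statistics


def skewness(xs):
    """Average cubed z-score (population-style moment estimate)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)


# HYPOTHETICAL skewed population: exponential with mean 1.
rng = random.Random(11)
reps = 20_000
skews = {}
for n in (1, 5, 25):
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    skews[n] = skewness(means)
    print(f"n = {n:>2}: skewness of sample means = {skews[n]:.2f}")
```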

C. Normal Approximation Rule (p. 201)

In random samples of size n, the sample mean will fluctuate around the population mean with a standard error of $\sigma/\sqrt{n}$. Therefore, as n increases, the sampling distribution of the sample means concentrates more and more around the population mean (this is why bigger samples are better: they increase accuracy), and the sampling distribution becomes more and more normal.

Let's go back to our first sample of 25, with its mean of 82523.60. The chance of getting a mean that high or higher is found in two steps. (1) Calculate

$$Z = \frac{82523.60 - 78163.53}{63610.55/\sqrt{25}} = \frac{4360.07}{12722.11} \approx .34$$

(2) Look up Z = .34 in the standard normal table: the area beyond Z is .367. So the chance (probability) of drawing a sample of size 25 with an average of 82523.60 or higher, when you were expecting the average to be 78,163.53, was about 36.7%. Your interpretation is that about 37% of the time you would get a sample average as high as the one you got. This suggests that it is not unusual to be this far from the true average even though you have done everything correctly (e.g., drawn a random sample).
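The same two steps in Python, using the lecture's population values and `statistics.NormalDist` in place of the printed table. Since the code does not round Z to .34 before the look-up, its tail area can differ slightly from .367.

```python
import math
from statistics import NormalDist

mu = 78163.53      # population mean of HINC
sigma = 63610.55   # population SD of HINC
n = 25
x_bar = 82523.60   # mean of our first sample

se = sigma / math.sqrt(n)            # standard error of the mean
z = (x_bar - mu) / se                # step (1): standardize
p_upper = 1 - NormalDist().cdf(z)    # step (2): area beyond z

print(f"z = {z:.2f}, P(sample mean >= {x_bar:,.2f}) = {p_upper:.3f}")
```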

NOTE: The Central Limit Theorem applies only to the distribution of possible sample averages (i.e., the sampling distribution); it says nothing about the distribution of individual scores in either the sample or the population.

For example, here is a graph of our household income variable for the population, followed by a graph of our initial sample of size 25 (from the beginning of this lecture). Note that neither is normal, but the sampling distribution of the means of all possible samples of size 25 is approximately normal.


The Original Population of all households (178,455) with mean 78,163.53

[Figure: density histogram of HINC for the population, x-axis 0 to 600,000]

Our one sample of size 25 with these statistics

        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
            hinc |        25     82523.6    72275.65       1220     385000

[Figure: density histogram of hinc for the sample of 25, x-axis 0 to 400,000]
