Distribution of X̄
The Standard Normal Distribution, or Z Distribution, is the distribution of a random variable Z ∼ N(0, 1²).
The distribution of any other normal random variable, X ∼ N(µ, σ²), can be converted to a Z using Z = (X − µ)/σ.
Probabilities for these variables are areas under the curve, but since we don’t use calculus in this course, we can use software or a Z table to find probabilities.
The random variable Z is continuous, which means the probability at any exact point is always 0. Thus, we will find probabilities for ranges of values.
First, some general characteristics of the Z distribution.
The area under the entire curve is 1 since it represents all possible values.
Because it is symmetric, the mean = the median, so the area under the curve to the left of 0 is 0.5 (as is the area to the right).
We say, “The probability that Z is less than 0 is 0.5.”
This is written as P(Z < 0) = 0.5.
Again, since Z is continuous, P(Z ≤ 0) = P(Z < 0) = 0.5.
We will use the Z table found on the Stat30X webpage - http://www.stat.tamu.edu/stat30x/zttables.php
Notice that the only entry that appears on both pages of the table is z = 0.00, where the probability is 0.5000.
The rows of the table are the z-scores to the first decimal place, with the columns indicating the second decimal place.
The body of the table contains the probabilities to the left of any particular z-score, z = z.zz.
For example, P(Z < 0.00) = 0.5000 and P(Z < 0.07) = 0.5279.
Examples of reading the table:
P(Z < 1.25) = 0.8944
P(Z < 0.50) = 0.6915
P(Z < −0.75) = 0.2266
P(Z < −2.01) = 0.0222
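For readers who prefer software to the table, the same left-tail areas can be checked with a short sketch. It assumes Python 3.8 or later, whose standard library provides statistics.NormalDist; the helper name left_area is our own and not part of the course materials.

```python
from statistics import NormalDist

# Standard normal distribution, Z ~ N(0, 1^2)
Z = NormalDist(mu=0, sigma=1)

def left_area(z):
    """Return P(Z < z), rounded to four decimals like a printed Z table."""
    return round(Z.cdf(z), 4)

# The four table look-ups from the examples above
for z in (1.25, 0.50, -0.75, -2.01):
    print(f"P(Z < {z:5.2f}) = {left_area(z):.4f}")
```

The printed values should agree with the table entries, since the table is simply this function evaluated on a grid of z-scores.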
The Z-table only gives probabilities to the left of a value. If we want to get probabilities to the right, we use the complement rule,
P(Z > z) = 1 − P(Z < z).
P(Z > 1.25) = 1 − 0.8944 = 0.1056
P(Z > 0.50) = 1 − 0.6915 = 0.3085
P(Z > −0.75) = 1 − 0.2266 = 0.7734
P(Z > −2.01) = 1 − 0.0222 = 0.9778
To find probabilities between two numbers, find the larger area (using the larger value) first and then subtract the smaller area. Remember, a probability can never be negative, so check your work!
P(−2.01 < Z < 2.01) = P(Z < 2.01) − P(Z < −2.01)
= 0.9778 − 0.0222 = 0.9556
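The complement rule and the "larger area minus smaller area" rule can both be written as small functions. This is a minimal sketch assuming Python's statistics.NormalDist; the names right_area and between_area are illustrative only.

```python
from statistics import NormalDist

Z = NormalDist()  # defaults to mean 0 and standard deviation 1

def right_area(z):
    """P(Z > z), using the complement rule 1 - P(Z < z)."""
    return 1 - Z.cdf(z)

def between_area(lo, hi):
    """P(lo < Z < hi): the larger left area minus the smaller left area."""
    if lo > hi:
        raise ValueError("lo must not exceed hi, or the result would be negative")
    return Z.cdf(hi) - Z.cdf(lo)

print(f"P(Z > 1.25)          = {right_area(1.25):.4f}")
print(f"P(Z > -0.75)         = {right_area(-0.75):.4f}")
print(f"P(-2.01 < Z < 2.01)  = {between_area(-2.01, 2.01):.4f}")
```

Because the software does not round intermediate areas to four decimals, a result can occasionally differ from the table-based answer in the last digit.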
Now suppose we have a non-standard normal, X ∼ N(µ, σ²), and we want to know the probability that X is less than some value. We must first convert the X to a Z and then use the probabilities from the Z-table. Recall that if X ∼ N(µ, σ²), then
Z = (X − µ)/σ ∼ N(0, 1²), so
P(X < x) = P(Z < (x − µ)/σ)
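Beware: the Z-table cannot be used with the raw value x, so P(X > x) ≠ 1 − P(Z < x) unless X is already standard normal. You must convert x to a z-score before using the table and the complement rule: P(X > x) = 1 − P(Z < (x − µ)/σ).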
Suppose X ∼ N(2, 3²). Given a value x, find the corresponding z and then the probability.
Find P(X > 5) = P((X − µ)/σ > (5 − 2)/3)
= P(Z > 1) = 1 − P(Z < 1) = 1 − 0.8413 = 0.1587.
Find P(−4 < X < 8) = P((−4 − 2)/3 < (X − µ)/σ < (8 − 2)/3)
= P(−2 < Z < 2) = P(Z < 2) − P(Z < −2)
= 0.9772 − 0.0228 = 0.9544.
Find P(X < −4 or X > 8) = P((X − µ)/σ < (−4 − 2)/3 or (X − µ)/σ > (8 − 2)/3)
= P(Z < −2 or Z > 2) = P(Z < −2) + P(Z > 2)
= 0.0228 + (1 − 0.9772) = 0.0456.
Note: Since the two areas are the same size, you could have just doubled the lower tail.
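The three X ∼ N(2, 3²) calculations can be reproduced by standardizing first and then using the standard normal, which mirrors the table method. The sketch below assumes Python's statistics.NormalDist; z_score is a helper name of our own, and the second distribution object X is included only as a cross-check.

```python
from statistics import NormalDist

MU, SIGMA = 2, 3             # X ~ N(2, 3^2), as in the example
Z = NormalDist()             # standard normal
X = NormalDist(MU, SIGMA)    # the same X, used only to cross-check

def z_score(x, mu=MU, sigma=SIGMA):
    """Convert a value of X into the corresponding z-score."""
    return (x - mu) / sigma

p_right = 1 - Z.cdf(z_score(5))                             # P(X > 5)
p_between = Z.cdf(z_score(8)) - Z.cdf(z_score(-4))          # P(-4 < X < 8)
p_outside = Z.cdf(z_score(-4)) + (1 - Z.cdf(z_score(8)))    # P(X < -4 or X > 8)

print(f"P(X > 5)            = {p_right:.4f}")
print(f"P(-4 < X < 8)       = {p_between:.4f}")
print(f"P(X < -4 or X > 8)  = {p_outside:.4f}")
print(f"cross-check P(X > 5) without standardizing: {1 - X.cdf(5):.4f}")
```

The last two results may differ from the table answers by 0.0001, because the table rounds each area to four decimals before subtracting.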
Reverse use of Z-table: Finding z-scores given probabilities.
Find the z∗ such that P(Z < z∗) = 0.8485, where 0.8485 is some probability. Answer: z∗ = 1.03.
Find z∗ such that P(Z < z∗) = 0.2981. Answer: z∗ = −0.53.
Find z∗ such that P(Z > z∗) = 0.1056 = 1 − P(Z < z∗).
Answer: P(Z < z∗) = 1 − P(Z > z∗) = 1 − 0.1056 = 0.8944, so z∗ = 1.25.
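Reading the table in reverse corresponds to the inverse of the cumulative distribution function. A minimal sketch, assuming Python's statistics.NormalDist and its inv_cdf method; the two helper names are ours.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def z_for_left_area(p):
    """Find z* such that P(Z < z*) = p."""
    return Z.inv_cdf(p)

def z_for_right_area(p):
    """Find z* such that P(Z > z*) = p, by first converting to a left area."""
    return Z.inv_cdf(1 - p)

print(f"left area 0.8485  -> z* = {z_for_left_area(0.8485):.2f}")
print(f"left area 0.2981  -> z* = {z_for_left_area(0.2981):.2f}")
print(f"right area 0.1056 -> z* = {z_for_right_area(0.1056):.2f}")
```

Rounding to two decimals matches the precision of the table, which only lists z-scores to the second decimal place.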
Finding Centered Probabilities
Suppose P(−z∗ < Z < z∗) = 0.85, where 0.85 is a central area under the Z curve (if it’s not central, we can’t do this). Since the total area under the curve is 1 and 1 − 0.85 = 0.15, there is 0.15 of the area outside −z∗ and z∗. And since the Z curve is symmetric about 0, half of this area (0.075) is below −z∗ and the other half is above z∗.
This means
P(−z∗ < Z < z∗) = 0.85
= P(Z < z∗) − P(Z < −z∗)
= (1 − 0.075) − 0.075
We can now find z∗ such that P(Z < −z∗) = 0.075. Answer: z∗ = 1.44.
If we call the central area 1 − α (we’ll discover why later), then the outside area is α and the area to look up is α/2.
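The α/2 recipe translates directly into code. This sketch assumes Python's statistics.NormalDist; central_z is a hypothetical helper name, and it looks up the upper cutoff z∗ (left area 1 − α/2), which by symmetry is the same number found from the lower tail.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def central_z(central_area):
    """Return z* such that P(-z* < Z < z*) equals central_area."""
    if not 0 < central_area < 1:
        raise ValueError("central_area must be strictly between 0 and 1")
    alpha = 1 - central_area          # total area in the two tails
    return Z.inv_cdf(1 - alpha / 2)   # alpha/2 sits above z*

for area in (0.85, 0.95):
    z_star = central_z(area)
    check = Z.cdf(z_star) - Z.cdf(-z_star)
    print(f"central area {area:.2f}: z* = {z_star:.2f} (check: {check:.4f})")
```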
Standard Normal 5 Number Summary
We know from Chapter 1 that the IQR = Q3 − Q1 covers the middle 50% of a distribution. So what are z_Q1 and z_Q3?
P(z_Q1 < Z < z_Q3) = 0.50
= P(Z < z_Q3) − P(Z < z_Q1)
= 0.75 − 0.25
or P(Z < z_Q1) = 0.25 and P(Z < z_Q3) = 0.75. Answer: z_Q1 = −0.675 and z_Q3 = 0.675.
Adding these numbers to the Empirical Rule numbers, we have estimates for the middle 50%, 68%, 95%, and 99.7% as easy references.
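The quartiles and the Empirical Rule percentages can be checked together. A small sketch, assuming Python's statistics.NormalDist; the helper central_area is our own name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Quartiles of Z: left areas of 0.25 and 0.75
z_q1 = Z.inv_cdf(0.25)
z_q3 = Z.inv_cdf(0.75)
print(f"z_Q1 = {z_q1:.3f}, z_Q3 = {z_q3:.3f}")

def central_area(z):
    """Area between -z and z under the standard normal curve."""
    return Z.cdf(z) - Z.cdf(-z)

# Middle 50% plus the Empirical Rule cutoffs of 1, 2 and 3 standard deviations
for z in (z_q3, 1, 2, 3):
    print(f"within ±{z:.3f}: {central_area(z):.4f}")
```

The areas for 1, 2 and 3 standard deviations come out slightly different from the rounded 68%, 95% and 99.7% figures, which are approximations meant to be easy to remember.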
Non-standard Normal Example
Suppose the sample proportion of 100 students who think that there is insufficient parking is normally distributed with a mean of 0.8 and a standard deviation of 0.04. As long as we know the distribution is normal, and µ and σ, we can find any probability!
p ∼ N(µ_p = 0.8, σ_p² = 0.04²)
How often would we get a sample proportion of 0.75 or less?
P(p ≤ 0.75) = P((p − µ_p)/σ_p ≤ (0.75 − 0.8)/0.04)
= P(Z ≤ −1.25) = 0.1056
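With software the standardizing step is optional, because the distribution can be given its own mean and standard deviation. The sketch below assumes Python's statistics.NormalDist and shows both routes for the parking example; the variable names are illustrative.

```python
from statistics import NormalDist

MU_P, SIGMA_P = 0.8, 0.04   # mean and standard deviation of the sample proportion

# Route 1: standardize, then use the standard normal (the table method)
z = (0.75 - MU_P) / SIGMA_P
prob_via_z = NormalDist().cdf(z)

# Route 2: ask the N(0.8, 0.04^2) distribution directly
prob_direct = NormalDist(MU_P, SIGMA_P).cdf(0.75)

print(f"z = {z:.2f}")
print(f"P(p <= 0.75) via Z   = {prob_via_z:.4f}")
print(f"P(p <= 0.75) direct  = {prob_direct:.4f}")
```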
So what good are these probabilities?
Recall from the Introduction, an important area of statistics is inference:
drawing a conclusion based on data and making decisions based on how likely something is to occur.
Since probabilities tell us how often things occur, we can use them to make our decisions.
But probabilities come from the whole population, which would mean we need a census, a complete listing of all of the data.
We need to be able to make our decisions based on samples, or even one sample.
General Idea of Inferential Statistics
We take a sample from the whole population.
We summarize the sample using important statistics.
We use those summaries to make inference about the whole population.
We realize there may be some error involved in making inference.
Example: (1988, the Steering Committee of the Physicians’ Health Study Research Group)
Question: Can Aspirin reduce the risk of heart attack in humans?
Sample: Sample of 22,071 male physicians between the ages of 40 and 84, randomly assigned to one of two groups. One group took an ordinary aspirin tablet every other day (headache or not). The other group took a placebo every other day. This group is the control group.
Summary statistic: The rate of heart attacks in the group taking aspirin was only 55% of the rate of heart attacks in the placebo group.
Inference to population: Taking aspirin causes a lower rate of heart attacks in humans.
Samples should not be biased: no favoring of any individual in the population.
Examples of biased samples: selecting goldfish from a particular store, polling your neighbor rather than the whole city
The selection of an individual in the population should not affect the selection of the next individual: independence.
Example of non-independent sample: when taking a survey on the cost of a college education, we ask both the mother and the father of a student
Samples should be large enough to adequately cover the population.
Example of a small sample: suppose only 20 physicians were used in the aspirin study.
Samples should have the smallest variability possible.
We know that there are many different samples, so we want to make sure our statistics are consistent.
The larger the sample we use, the less the different sample statistics will vary.
Although there are many types of samples, we will only discuss the simplest, a simple random sample (SRS).
Every sample of a particular size, n, from the population has an equal chance of being selected.
An SRS produces an unbiased statistic.
Since statistics vary from sample to sample, there is a distribution of them called a sampling distribution, which is the distribution of all of the values taken by the statistic in all possible samples of the same size, n, from the same population.
We can then examine the shape, center, and spread of the sampling distribution.
We know that there are many statistics that we can calculate from a sample, but we’re going to start with the sample mean, X̄.
Bias concerns the center of the sampling distribution. A statistic used to estimate a parameter is unbiased if the mean of the sampling distribution is equal to the true value of the parameter being estimated.
For the sample mean, this says that the mean of the sample mean is the same as the mean of the population sampled, µ_X̄ = µ_X.
To reduce bias, we use a random sample.
Variability is described by the spread of the sampling distribution.
To reduce the variability of a statistic, use a larger sample; the larger the sample size, n, the smaller the variance of the statistic.
This is true because the variance of the sample mean gets smaller as the sample size increases: σ_X̄² = σ_X²/n, or σ_X̄ = σ_X/√n.
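A quick simulation can make the σ_X/√n relationship concrete. This is only a sketch: the population N(10, 2²), the seed, and the number of repetitions are hypothetical choices made for illustration, and the function name sd_of_sample_means is our own.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)      # fixed seed so the run is repeatable
MU, SIGMA = 10.0, 2.0       # hypothetical normal population, N(10, 2^2)

def sd_of_sample_means(n, reps=2000):
    """Draw `reps` samples of size n and return the SD of their sample means."""
    means = [mean(rng.gauss(MU, SIGMA) for _ in range(n)) for _ in range(reps)]
    return stdev(means)

for n in (4, 16, 64):
    simulated = sd_of_sample_means(n)
    theory = SIGMA / sqrt(n)
    print(f"n = {n:2d}: simulated SD = {simulated:.3f}, sigma/sqrt(n) = {theory:.3f}")
```

Each time the sample size is multiplied by 4, the spread of the sample means should be cut roughly in half, which is what dividing by √n predicts.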
Summary
Population Distribution of a random variable
The distribution of all the members of the population.
Parameters help describe the distribution, for example, µ and σ.
Sampling Distribution of a sample statistic
This is not the distribution of the sample!
The sampling distribution is the distribution of a statistic.
If we take many, many samples and calculate the statistic for each of those samples, the distribution of all those statistics is the sampling distribution.
We will start with the sampling distribution of the sample mean, X̄.
We already know that if we take random samples the sample mean is unbiased, µ_X̄ = µ_X, so we know the center.
We can reduce the variance by using a large sample size, n, since σ_X̄ = σ_X/√n, so we know the spread.
Since the sample mean of a normal random variable is also normal, we know the shape.
So, if X is normal, the distribution of the sample mean, or sampling distribution of the sample mean, is
X̄_n ∼ N(µ, (σ/√n)²)
where the subscript on X̄ indicates the sample size.
There has been some concern that young children are spending too much time watching television.
A study in Columbia, South Carolina recorded the number of cartoon shows watched per child from 7:00 a.m. to 1:00 p.m. on a particular Saturday morning for 28 different children.
The results were as follows: 2, 2, 1, 3, 3, 5, 7, 5, 3, 8, 1, 4, 0, 4, 2, 0, 4, 2, 7, 3, 6, 1, 3, 5, 6, 4, 4, 4. (Adapted from Intro. to Statistics, Milton, McTeer and Corbet, 1997)
Suppose the true average for all of South Carolina is 3.4 with a standard deviation of 2.1, and that the data is normal.
What is the population mean? µ = 3.4
What is the sample mean? x̄ = 99/28 ≈ 3.536
What is the approximate sampling distribution (of the sample mean)?
X̄_28 ∼ N(3.4, (2.1/√28)²) = N(3.4, 0.4²)
Again, what does this mean?
Suppose we take many, many samples (each sample of size 28), then we find the sample mean for each sample.
Those sample means (2.9, 3.4, 4.1, . . . ) follow the sampling distribution N(3.4, 0.4²).
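The numbers in the cartoon example can be recomputed from the listed data. A minimal sketch, assuming Python's standard library; the variable names are ours, while the data, µ = 3.4 and σ = 2.1 come from the example.

```python
from math import sqrt
from statistics import mean

# Cartoon shows watched by each of the 28 children (data from the example)
shows = [2, 2, 1, 3, 3, 5, 7, 5, 3, 8, 1, 4, 0, 4,
         2, 0, 4, 2, 7, 3, 6, 1, 3, 5, 6, 4, 4, 4]

MU, SIGMA = 3.4, 2.1          # population mean and SD assumed in the example

n = len(shows)
x_bar = mean(shows)           # sample mean
sd_x_bar = SIGMA / sqrt(n)    # SD of the sampling distribution of the mean

print(f"n = {n}, total = {sum(shows)}, sample mean = {x_bar:.3f}")
print(f"sampling distribution of the mean: N({MU}, {sd_x_bar:.2f}^2)")
```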
What if the original data (parent population) is not normal?
The Central Limit Theorem states that for any population with mean µ and standard deviation σ, the sampling distribution of the sample mean, X̄_n, is approximately normal when n is large:
X̄_n ∼ N(µ, (σ/√n)²), approximately
The central limit theorem is a very powerful tool in statistics.
Remember, the central limit theorem works for any distribution.
Let us see how well it works for the years on pennies.
Penny Population Distribution (276)
Note from the previous slide, the distribution is highly left skewed.
The mean of the 276 pennies is 1992.9.
The standard deviation of the 276 pennies is 8.7.
Let us take 50 samples of size 10.
According to the Central Limit Theorem, the sampling distribution of the sample means should be approximately normal with mean 1992.9 and standard deviation 8.7/√10 = 2.75.
That is, the sampling distribution, the distribution of the x̄’s, should be approximately a normal distribution.
Suppose we took 50 samples of size 10 from these pennies and plotted the sample means.
[Figure: distribution of the means of the 50 samples]
Notice that x̄_X̄, the mean of the 50 sample means, is close to µ = 1992.9, and s_X̄, their standard deviation, is not far from σ/√n = 2.75.
The previous slide shows the distribution of the means of the 50 samples is slightly skewed but closer to the normal distribution.
So, n = 10 isn’t large enough and taking larger samples would produce a more normal distribution of sample means.
So what is large enough? A common rule of thumb is at least n = 30, but sometimes more is needed.
So in general:
The mean of the sample means is the mean of the data, µ_X̄ = µ_X. The standard deviation of the sample means is the standard deviation of the data divided by the square root of the sample size, σ_X̄ = σ_X/√n. If the data is normal, then the distribution of the sample means is exactly normal.
But even if the distribution of the data isn’t known, we can say the distribution of the sample means is approximately normal as long as we take a large sample.
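The Central Limit Theorem can be explored by simulation. The sketch below does not use the penny data, which is not listed here; instead it builds a hypothetical left-skewed population of "years" and compares sample means for n = 10 and n = 40. The seed, population size, helper names, and the skewness measure (average cubed z-score) are all our own choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical left-skewed population (NOT the penny data from the text):
# most values sit near 2010, with a long tail toward earlier years.
population = [2010 - rng.expovariate(1 / 9) for _ in range(5000)]

def skewness(values):
    """Average cubed z-score: near 0 if symmetric, negative if left skewed."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

def sample_means(n, reps=1000):
    """Means of `reps` simple random samples of size n from the population."""
    return [mean(rng.sample(population, n)) for _ in range(reps)]

mu, sigma = mean(population), pstdev(population)
print(f"population: mean = {mu:.1f}, SD = {sigma:.2f}, skewness = {skewness(population):.2f}")

for n in (10, 40):
    means = sample_means(n)
    print(f"n = {n}: mean = {mean(means):.1f}, SD = {pstdev(means):.2f} "
          f"(sigma/sqrt(n) = {sigma / sqrt(n):.2f}), skewness = {skewness(means):.2f}")
```

In a run like this, the sample means should center on the population mean, their spread should track σ/√n, and their skewness should move toward 0 as n grows, which is the sense in which larger samples give a "more normal" sampling distribution.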
Example: Suppose past studies indicate it takes an average of 15 minutes with a standard deviation of 5 minutes to memorize a short passage of 100 words. A psychologist claims a new method of memorization will reduce the average time. A random sample of 40 people use the new method and the average time required to memorize the passage is found to be 12.5 minutes.
12.5 minutes is obviously less than 15, but is it small enough to say that the new method actually reduces the average time or is it just random chance that produced such a small sample mean? How likely is x̄ ≤ 12.5 if µ = 15?
First, X̄ ∼ N(15, (5/√40)²) = N(15, 0.79²)
P(X̄ < 12.5) = P(Z < (12.5 − 15)/0.79) = P(Z < −3.16) = 0.0008
So, even though 12.5 minutes isn’t much different from 15 minutes, a sample mean this small should rarely if ever happen if µ is really 15.
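The memorization calculation follows the same three steps as before: find the standard deviation of the sample mean, standardize, and look up the left area. A minimal sketch assuming Python's statistics.NormalDist; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

MU_0, SIGMA, N = 15, 5, 40    # old mean, population SD, sample size
X_BAR = 12.5                  # observed sample mean with the new method

sd_x_bar = SIGMA / sqrt(N)            # SD of the sample mean
z = (X_BAR - MU_0) / sd_x_bar         # z-score of the observed mean
p = NormalDist().cdf(z)               # P(sample mean <= 12.5) if mu = 15

print(f"SD of sample mean = {sd_x_bar:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean <= 12.5) = {p:.4f}")
```

Note that the comparison uses σ/√n, not σ: a difference of 2.5 minutes is only half a standard deviation for one person, but more than three standard deviations for the mean of 40 people.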