Chapter 21
More about tests and intervals
What I will know and be able to do
Apply confidence intervals, critical values, alpha levels
and power to complete a hypothesis test for a population
proportion.
Assignment:
Read Chapter 21
How to Think About P-Values
• A P-value is a conditional probability: the probability of observing a statistic at least as extreme as the one we actually observed, given that the null hypothesis is true.
• The P-value is NOT the probability that the null hypothesis is true.
• It’s not even the conditional probability that the null hypothesis is true given the data.
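The "given that H0 is true" part can be made concrete with an exact Binomial calculation. The sketch below assumes a one-sided alternative (p > p0) and uses a hypothetical coin example (8 heads in 10 flips of a coin assumed fair under H0) that is not from the chapter; the function name `upper_tail_p_value` is illustrative. Every term in the sum is computed as if p really equals p0, which is why the result is P(data at least this extreme | H0) and not P(H0 | data).

```python
from math import comb

def upper_tail_p_value(x_obs: int, n: int, p0: float) -> float:
    """P(X >= x_obs | H0: p = p0) for X ~ Binomial(n, p0).

    The calculation conditions on H0: each term assumes p = p0.
    """
    return sum(comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(x_obs, n + 1))

# Hypothetical example: 8 heads in 10 flips of a coin that is fair under H0.
p_value = upper_tail_p_value(8, 10, 0.5)
print(round(p_value, 4))  # 0.0547
```

Nothing in this number says how likely it is that the coin is fair; it only says how surprising 8 or more heads would be if it were.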
Example 1: P-value
A medical researcher has tested a new treatment for poison
ivy against the traditional ointment. With a P-value of 0.047,
he concludes the new treatment is more effective. Explain
what the P-value means in this context.
Alpha Levels
• Sometimes we need to make a firm decision about whether or not to reject the null hypothesis.
• When the P-value is small, it tells us that our data are rare given the null hypothesis.
Alpha Levels (cont.)
• We can define “rare event” arbitrarily by setting a threshold for our P-value.
• If our P-value falls below that point, we’ll reject H0. We call such results statistically significant.
Alpha Levels (cont.)
• Common alpha levels are 0.10, 0.05, and 0.01.
• You have the option (almost the obligation) to consider your alpha level carefully and choose an appropriate one for the situation.
• The alpha level is also called the significance level.
Example 2: Alpha
A researcher developing scanners to search for hidden weapons at airports has concluded that a new device is significantly better than the current scanner. He made this decision based on a test using α = .05. Would he have made the same decision at α = .10? How about at α = .01? Explain.
Example 2: Alpha (cont.)
At α = .10, he would have made the same decision. We know his P-value was less than .05, so it must also be less than .10.
To reject H0 at α = .01, the P-value must be less than .01, which is not necessarily the case. So he might not have made the same decision at α = .01.
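Example 2’s reasoning reduces to comparing a single P-value against several thresholds. The researcher’s actual P-value is not given, so the sketch below tries two hypothetical values below .05; the helper name `decide` is illustrative. It shows that α = .10 always agrees with α = .05, while α = .01 may or may not.

```python
def decide(p_value: float, alpha: float) -> str:
    """Reject H0 exactly when the P-value falls below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Hypothetical P-values, both below .05 as in Example 2.
for p_value in (0.03, 0.004):
    for alpha in (0.10, 0.05, 0.01):
        print(f"P = {p_value}, alpha = {alpha}: {decide(p_value, alpha)}")
```

With P = 0.03 the decision flips at α = .01; with P = 0.004 it does not.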
Critical Values Again
• When the alternative is one-sided, the critical value puts all of α in one tail.
• When the alternative is two-sided, the critical value splits α equally between the two tails.
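A minimal sketch of how the critical value z* depends on α and on the number of tails, assuming a z-test (a Normal sampling model). It uses `statistics.NormalDist` from Python’s standard library; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_critical(alpha: float, two_sided: bool) -> float:
    """Critical value z* for a z-test at significance level alpha."""
    tail_area = alpha / 2 if two_sided else alpha  # two-sided tests split alpha
    return STD_NORMAL.inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: one-sided z* = {z_critical(alpha, False):.3f}, "
          f"two-sided z* = {z_critical(alpha, True):.3f}")
```

Notice that the one-sided z* at α = .05 equals the two-sided z* at α = .10 (both about 1.645), because each places 5% in the upper tail.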
Confidence Intervals and Hypothesis Tests
• Confidence intervals and hypothesis tests are built from the same calculations.
• They have the same assumptions and conditions.
• You can approximate a hypothesis test by examining a confidence interval.
• Just ask whether the null hypothesis value is consistent with a confidence interval for the parameter at the appropriate confidence level.
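One common rule of thumb for “the appropriate confidence level” is sketched below: a two-sided test at level α matches a (1 − α) interval, and a one-sided test at level α matches a (1 − 2α) interval, since only one of the interval’s two tails is relevant. The function name is hypothetical.

```python
def matching_confidence_level(alpha: float, two_sided: bool) -> float:
    """Confidence level whose interval agrees with a test at level alpha.

    Two-sided test: C = 1 - alpha (alpha/2 in each tail).
    One-sided test: C = 1 - 2*alpha (alpha in the tail that matters,
    plus an equal, unused alpha in the other tail).
    """
    return 1 - alpha if two_sided else 1 - 2 * alpha

print(matching_confidence_level(0.05, two_sided=True))   # 0.95
print(matching_confidence_level(0.05, two_sided=False))  # 0.9
```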
Example 3: Click It or Ticket
Teens are at greatest risk of being killed or injured in traffic crashes.
According to the National Highway Traffic Safety Administration, 65% of young people killed were not wearing a safety belt. Because many deaths could easily be prevented by the use of safety belts, several states have begun “Click It or Ticket” campaigns. In 2005, a local newspaper reported that a roadblock resulted in 23 tickets to drivers who were unbelted out of 134 stopped for inspection. Does this provide evidence that more than 80% of drivers are wearing their safety belts?
Example 3 continued…
Hypothesis:
H0: p = .80
HA: p > .80
Here p is the proportion of all drivers who wear their safety belts. The null hypothesis is that 80% of the drivers will be wearing their safety belts. The alternative hypothesis is that more than 80% will be wearing their safety belts.
Example 3 continued…
Model:
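1. Independence: I will assume that the drivers are not likely to influence each other about wearing their seatbelts, so their behaviors are mutually independent.
2. Randomization: This isn’t a random sample, but I assume that these drivers are representative of the driving public.
3. 10% condition: 134 drivers are certainly fewer than 10% of all drivers.
4. Success/Failure: There were 111 belted drivers (successes) and 23 unbelted drivers (failures), both at least 10, so the sample is large enough. Using the null value instead gives np0 = 134(.80) = 107.2 ≥ 10 and nq0 = 134(.20) = 26.8 ≥ 10, so the condition holds either way.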
Example 3 cont…
Mechanics: We have to create a confidence interval at the confidence level that corresponds to the alpha level of the test. If α = .05, we should create a 90% confidence interval because this is a one-sided test. A 90% interval leaves 5% in each tail; for HA: p > .80 only one tail matters, and its 5% matches α.
Example 3 continued…
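I am 90% confident that between 77.4% and 88.2% of all drivers wear their safety belts. Because 80% lies inside this interval, it is a plausible value for p, so I fail to reject H0 at α = .05. These data do not provide convincing evidence that more than 80% of drivers wear their safety belts.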
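The interval above can be reproduced with a short one-proportion z-interval, sketched here with the standard library only. It assumes the conditions checked in the Model step and uses z* for a one-sided α = .05 (about 1.645). Carried at full precision the interval comes out near (0.7748, 0.8819), about 77.5% to 88.2%; the 77.4% lower bound likely reflects rounding p̂ to .828 before computing.

```python
from math import sqrt
from statistics import NormalDist

n = 134
unbelted = 23
belted = n - unbelted            # 111 drivers wearing belts
p_hat = belted / n               # observed proportion, about 0.828
p0 = 0.80                        # null value from H0
alpha = 0.05                     # one-sided test, so use a 90% interval

# Success/Failure check on the observed counts.
assert belted >= 10 and unbelted >= 10

z_star = NormalDist().inv_cdf(1 - alpha)   # about 1.645
se = sqrt(p_hat * (1 - p_hat) / n)         # standard error of p-hat
lower, upper = p_hat - z_star * se, p_hat + z_star * se

print(f"90% CI: ({lower:.4f}, {upper:.4f})")
if lower > p0:
    print("Reject H0: the whole interval lies above p0.")
else:
    print("Fail to reject H0: p0 is a plausible value.")
```

Only the lower bound matters for HA: p > .80; the test would reject only if the entire interval sat above .80.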
Making Errors
• Here’s some shocking news for you: nobody’s perfect. Even with lots of evidence we can still make the wrong decision.
• When we perform a hypothesis test, we can make mistakes in two ways:
I. The null hypothesis is true, but we mistakenly reject it. (Type I error)
II. The null hypothesis is false, but we fail to reject it. (Type II error)
Analogy to medicine…
• In medical disease testing, the null hypothesis is usually the assumption that a person is healthy. The alternative is that he or she has the disease we’re testing for.
• A Type I error is a false positive: a healthy person is diagnosed with the disease.
• A Type II error is a false negative: a person who has the disease is diagnosed as healthy.
Another analogy:
In a Statistics final exam (with H0: the student has learned only 60% of the material):
What is a Type I error? (Hint: false positive)
What is a Type II error?
Making Errors (cont.)
• Which type of error is more serious depends on the situation at hand. In other words, the gravity of each error is context dependent.
Example 4: Alzheimer’s
Testing for Alzheimer’s disease can be a long and expensive process, consisting of lengthy tests and medical diagnosis. Recently a group of researchers (Solomon et al., 1998) devised a 7-minute test to serve as a quick screen for the disease for use in the general population of senior citizens. A patient who tested positive would then go through the more expensive battery of tests and medical diagnosis. The authors reported a false positive rate of 4% and a false negative rate of 8%.
a. Put this in the context of a hypothesis test. What are the null and alternative hypotheses?
b. What would a Type I error mean?
c. What would a Type II error mean?
Example 4 continued…
a. The null hypothesis is that a person is healthy. The alternative is that they have Alzheimer’s disease.
b. A Type I error is deciding a person has Alzheimer’s when he or she doesn’t.
c. A Type II error is failing to diagnose Alzheimer’s disease when the person has it.
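The reported rates map directly onto the error probabilities: the false positive rate plays the role of α (Type I), the false negative rate plays the role of β (Type II), and 1 − β is the screen’s power, its chance of flagging someone who really has the disease. The group sizes in this sketch are hypothetical, chosen only to turn the rates into counts.

```python
# Error rates reported for the 7-minute screen (Solomon et al., 1998).
false_positive_rate = 0.04  # P(screen positive | healthy)
false_negative_rate = 0.08  # P(screen negative | has Alzheimer's)

alpha = false_positive_rate  # Type I error: rejecting H0 ("healthy") when it is true
beta = false_negative_rate   # Type II error: failing to reject H0 when it is false
power = 1 - beta             # P(screen positive | has Alzheimer's)

# Hypothetical group sizes, chosen only to make the rates concrete.
healthy, affected = 1000, 100
print(f"alpha = {alpha}, beta = {beta}, power = {power:.2f}")
print(f"Expected false positives among {healthy} healthy seniors: {healthy * alpha:.0f}")
print(f"Expected missed cases among {affected} affected seniors: {affected * beta:.0f}")
```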
Making Errors (cont.)
• How often will a Type I error occur?
• Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level.
• When H0 is false and we reject it, we have done the right thing.
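One way to check that the Type I error rate is about α is to simulate many samples in which H0 really is true and count how often the test rejects anyway. The sketch below assumes a one-sided, one-proportion z-test with hypothetical settings (n = 100, p0 = .50, α = .05) and a fixed random seed; the function name is illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(n: int, p0: float, alpha: float, trials: int, seed: int = 1) -> float:
    """Simulate samples with H0 TRUE and count how often a one-sided z-test rejects it."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - alpha)
    sd0 = sqrt(p0 * (1 - p0) / n)          # SD of p-hat when p = p0
    rejections = 0
    for _ in range(trials):
        successes = sum(rng.random() < p0 for _ in range(n))  # one sample drawn under H0
        z = (successes / n - p0) / sd0
        if z > z_star:
            rejections += 1                 # a Type I error: H0 was true
    return rejections / trials

rate = type_i_error_rate(n=100, p0=0.5, alpha=0.05, trials=10_000)
print(f"Estimated P(Type I error): {rate:.3f}")
```

The estimate should land a little under .05 (roughly .044 for these settings) because a Binomial count is discrete, so the rejection region cannot hit exactly 5%.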
Making Errors (cont.)
• When H0 is false and we fail to reject it, we have made a Type II error.
• We assign the letter β to the probability of this mistake.
• It’s harder to assess the value of β because we don’t know what the value of the parameter really is.
Making Errors (cont.)
• One way to focus our attention on a particular β is to think about the effect size.
• Ask “How big a difference would matter?”
• We could reduce β for all alternative parameter values by increasing α.
• This would reduce β but increase the chance of a Type I error.
• With the sample size held fixed, this tension between Type I and Type II errors is inevitable.
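The trade-off can be made concrete for a one-sided, one-proportion z-test using the Normal approximation. The setting below (p0 = .50, a true p of .60, n = 100) is hypothetical, and `power_one_prop` is an illustrative helper, not a library function. Raising α moves the rejection cutoff toward p0, which lowers β and raises power.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power_one_prop(p0: float, p_true: float, n: int, alpha: float) -> float:
    """Normal-approximation power of a one-sided test of H0: p = p0 vs HA: p > p0."""
    z_star = Z.inv_cdf(1 - alpha)
    cutoff = p0 + z_star * sqrt(p0 * (1 - p0) / n)   # reject H0 when p-hat > cutoff
    sd_true = sqrt(p_true * (1 - p_true) / n)        # SD of p-hat if p = p_true
    return 1 - Z.cdf((cutoff - p_true) / sd_true)

# Hypothetical setting: p0 = 0.50, true p = 0.60, n = 100.
for alpha in (0.01, 0.05, 0.10):
    pw = power_one_prop(0.50, 0.60, 100, alpha)
    print(f"alpha = {alpha:.2f}: beta = {1 - pw:.3f}, power = {pw:.3f}")
```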
Power
• The power of a test is the probability that it correctly rejects a false null hypothesis. Because β is the probability of failing to reject a false H0, power = 1 − β.
• When the power is high, we can be confident that we’ve looked hard enough at the situation.
Power (cont.)
• Whenever a study fails to reject its null hypothesis, the test’s power comes into question.
• When we calculate power, we imagine that the null hypothesis is false.
• The value of the power depends on how far the truth lies from the null hypothesis value.
• The distance between the null hypothesis value, p0, and the truth, p, is called the effect size.
A Picture Worth a Thousand Words
• The larger the effect size, the easier it should be to see it.
• A larger sample size decreases the probability of a Type II error, so it increases the power.
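Both claims can be checked numerically with a Normal-approximation power calculation for a one-sided, one-proportion z-test at α = .05. The values of p0, the true p, and n below are hypothetical, and `power_one_prop` is an illustrative helper rather than a library function.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power_one_prop(p0: float, p_true: float, n: int, alpha: float = 0.05) -> float:
    """Normal-approximation power of a one-sided test of H0: p = p0 vs HA: p > p0."""
    cutoff = p0 + Z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)  # rejection cutoff for p-hat
    sd_true = sqrt(p_true * (1 - p_true) / n)                      # SD of p-hat if p = p_true
    return 1 - Z.cdf((cutoff - p_true) / sd_true)

p0 = 0.50
print("Larger effect size (n = 100):")
for p_true in (0.55, 0.60, 0.65):
    print(f"  p - p0 = {p_true - p0:.2f}: power = {power_one_prop(p0, p_true, 100):.3f}")

print("Larger sample size (p - p0 = 0.10):")
for n in (50, 100, 200):
    print(f"  n = {n}: power = {power_one_prop(p0, 0.60, n):.3f}")
```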
A Picture Worth a Thousand Words (cont.)
Reducing Both Type I and Type II Error
• The previous figure seems to show that if we reduce Type I error, we must automatically increase Type II error.
• But we can reduce both types of error by making both curves (the sampling distributions of p̂ under H0 and under the true p) narrower.
• How do we make the curves narrower?
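Reducing Both Type I and Type II Error (cont.)
• Increase the sample size. The standard deviation of the sampling distribution of p̂, √(pq/n), shrinks as n grows, so both curves get narrower and overlap less. At a fixed α, that lowers β and raises the power.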
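Because the curves narrow as n grows, we can solve for a sample size that holds α fixed while pushing β below a target at a chosen alternative. The sketch uses the standard Normal-approximation sample-size formula for a one-sided, one-proportion z-test; the setting (p0 = .50, true p = .60, α = .05, β = .10) and both helper names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()

def n_for_alpha_and_beta(p0: float, p_true: float, alpha: float, beta: float) -> int:
    """Smallest n with Type I rate alpha and Type II rate at most beta at p_true (one-sided)."""
    z_a = Z.inv_cdf(1 - alpha)   # cutoff that fixes the Type I error rate
    z_b = Z.inv_cdf(1 - beta)    # how far past the cutoff p_true must sit
    root_n = (z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p_true * (1 - p_true))) / (p_true - p0)
    return ceil(root_n ** 2)

def power_one_prop(p0: float, p_true: float, n: int, alpha: float) -> float:
    """Normal-approximation power of a one-sided test of H0: p = p0 vs HA: p > p0."""
    cutoff = p0 + Z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    return 1 - Z.cdf((cutoff - p_true) / sqrt(p_true * (1 - p_true) / n))

n = n_for_alpha_and_beta(0.50, 0.60, alpha=0.05, beta=0.10)
print(n, round(power_one_prop(0.50, 0.60, n, 0.05), 3))
```

Rounding n up guarantees the power target is met, since power only increases with n.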
Example 5: Equal opportunity?
A company is sued for job discrimination because only 19% of the newly hired candidates were minorities when 27% of all applicants were minorities. Is this strong evidence that the company’s hiring practices are discriminatory?
a. Is this a one-tailed or a two-tailed test? Why?
b. In this context, what would a Type I error be?
c. In this context, what would a Type II error be?
d. In this context, describe what is meant by the power of the test.
e. If the hypothesis is tested at the 5% level of significance instead of 1%, how will this affect the power of the test?
Example 5 continued…
a. One-tailed. The company wouldn’t be sued if “too many” minorities were hired.
b. Deciding the company is discriminating when it is not.
c. Deciding the company is not discriminating when it is.
d. The probability of correctly detecting discrimination when it exists.
e. It increases the power. A larger α makes H0 easier to reject, which lowers the probability of a Type II error.
Example 6: Hoops
A basketball player with a poor foul-shot record practices intensively during the off-season. He tells the coach that he has raised his proficiency from 60% to 80%. Dubious, the coach asks him to take 10 shots, and is surprised when the player hits 9 out of 10. Did the player prove that he has improved?
a. Suppose the player really is no better than before, still a 60% shooter. What’s the probability he could hit at least 9 out of 10 shots? (Hint: Use a Binomial model.)
b. If that is what happened, now the coach thinks the player has improved when he has not. Which type of error is that?
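Part (a) is a Binomial tail probability, sketched below with the standard library. The second calculation goes slightly beyond the exercise: it assumes the coach uses the rule “believe him only after 9 or more hits” and asks how often that rule would succeed if the player really were an 80% shooter, which is the power of that rule. The helper name `prob_at_least` is illustrative.

```python
from math import comb

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

# (a) Player is still a 60% shooter: chance of 9 or more hits in 10 shots.
p_lucky = prob_at_least(9, 10, 0.60)
print(f"P(X >= 9 | p = 0.60) = {p_lucky:.4f}")  # 0.0464

# Assumed decision rule "believe him only after 9+ hits": its power if p really is 0.80.
power = prob_at_least(9, 10, 0.80)
print(f"P(X >= 9 | p = 0.80) = {power:.4f}")    # 0.3758
```

A lucky streak like this happens under 5% of the time, yet even a genuinely improved shooter would pass this 10-shot test only about 38% of the time.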