Chapter 21
How to Think About P-Values
A P-value is a conditional probability: the probability of the observed statistic (or one even more extreme) given that the null hypothesis is true.
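To make the definition concrete, here is a minimal Python sketch that estimates a P-value by simulation. It assumes hypothetical data (60 successes in 100 trials, H0: p = 0.5, upper-tail alternative), and the function name `simulated_p_value` is illustrative, not from the chapter. Every simulated sample is drawn as if H0 were true, which is exactly the conditioning in the definition: the result says how unusual the data would be under H0, not how likely H0 is.

```python
import random

def simulated_p_value(n, p0, observed_successes, trials=20_000, seed=1):
    """Estimate a one-sided (upper-tail) P-value by simulation.

    Each trial draws a sample of size n ASSUMING H0 IS TRUE (success
    probability p0) and checks whether the simulated count is at least
    as large as the observed count.
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        successes = sum(rng.random() < p0 for _ in range(n))
        if successes >= observed_successes:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5
p_value = simulated_p_value(n=100, p0=0.5, observed_successes=60)
print(f"Estimated P-value: {p_value:.3f}")
```

With these inputs the estimate should land near the exact binomial tail probability of about 0.028.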
The P-value is NOT the probability that the null hypothesis is true.
It’s not even the conditional probability that the null hypothesis is true given the data.
Example 1: P-value
Example 1 continued…
If there is no difference in effectiveness, the chance of seeing an observed difference this large or larger by natural sampling variation alone is 4.7%. This is quite low, so he has evidence that his ointment is more effective, and his null hypothesis can be rejected.
Alpha Levels
Sometimes we need to make a firm decision about whether or not to reject the null hypothesis.
When the P-value is small, it tells us that our data are rare given the null hypothesis.
Alpha Levels (cont.)
We can define “rare event” arbitrarily by setting a threshold for our P-value.
If our P-value falls below that point, we’ll reject H0. We call such results statistically significant.
The threshold is called an alpha level, denoted α.
Alpha Levels (cont.)
Common alpha levels are 0.10, 0.05, and 0.01.
You have the option (almost the obligation) to consider your alpha level carefully and choose an appropriate one for the situation.
The alpha level is also called the significance level.
When we reject the null hypothesis, we say that the test is “significant at that level.”
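The decision rule can be sketched in a few lines of Python. The `decide` helper is hypothetical; it simply compares a P-value with α, here using the 4.7% P-value from Example 1.

```python
def decide(p_value, alpha):
    """Return the decision for a test at significance level alpha."""
    # Reject H0 only when the P-value falls below the chosen threshold.
    return "reject H0" if p_value < alpha else "fail to reject H0"

# P-value from Example 1 (4.7%), checked against the common alpha levels
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(0.047, alpha)}")
```

Under these assumptions, a P-value of 0.047 is significant at the .10 and .05 levels but not at the .01 level, which is why the choice of α has to be made before looking at the data.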
Example 2: Alpha
A researcher developing scanners to search for hidden weapons at airports has concluded that a new device is significantly better than the current scanner. He made this decision based on a test using α = .05. Would he have made the same decision at α = .10? At α = .01?
Example 2 continued…
At α = .10, he would have made the same decision. We know his P-value was less than .05, so it must also be less than .10.
To reject H0 at α = .01, the P-value must be less than .01, which is not necessarily the case. So he might not have made the same decision at α = .01.
Critical Values Again
When the alternative is one-sided, the critical value puts all of α on one side:
When the alternative is two-sided, the critical value splits α equally into two tails:
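Assuming a z-test (Normal model), the critical values can be computed with Python's standard-library `statistics.NormalDist`. This is a sketch: the helper `critical_z` is illustrative. It puts all of α in one tail for a one-sided alternative, but α/2 in each tail for a two-sided one.

```python
from statistics import NormalDist

def critical_z(alpha, two_sided):
    """Critical value z* for a z-test at significance level alpha."""
    # One-sided: all of alpha goes in one tail.
    # Two-sided: alpha is split equally, alpha/2 in each tail.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: one-sided z* = {critical_z(alpha, False):.3f}, "
          f"two-sided z* = {critical_z(alpha, True):.3f}")
```

This reproduces the familiar values 1.645 (one-sided, α = .05) and 1.960 (two-sided, α = .05).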
Confidence Intervals and Hypothesis Tests
Confidence intervals and hypothesis tests are built from the same calculations.
They have the same assumptions and conditions.
You can approximate a hypothesis test by examining a confidence interval.
Just ask whether the null hypothesis value is consistent with a confidence interval for the parameter at the corresponding confidence level.
Example 3: Click It or Ticket
Teens are at greatest risk of being killed or injured in traffic crashes. According to the National Highway Traffic Safety Administration, 65% of young people killed were not wearing a safety belt. Because many deaths could easily be prevented by the use of safety belts, several states have begun “Click It or Ticket” campaigns. In 2005, a local newspaper reported that a roadblock resulted in 23 tickets to drivers who were unbelted out of 134 stopped for inspection. Does this provide evidence that the goal of more than 80% of drivers wearing safety belts has been met?
Example 3 continued…
Hypotheses: H0: p = .80; HA: p > .80
The null hypothesis is that 80% of the drivers will be wearing their safety belts. The alternative hypothesis is that more than 80% will be wearing their safety belts due to the “Click It or Ticket” campaign.
Example 3 continued…
Model:
1. I will assume that the drivers are not likely to influence each other about wearing their seatbelts, making them mutually independent.
2. This isn’t a random sample, but I assume that these drivers are representative of the driving public.
3. 10% condition: 134 drivers are certainly less than 10% of all drivers.
4. Success/Failure: 111 drivers were belted and 23 were not, and both counts are at least 10. (Using the null value instead: np0 = 134(.8) = 107.2 ≥ 10 and nq0 = 134(.2) = 26.8 ≥ 10.) Therefore, the sample is large enough.
Since all of the conditions are met, a Normal model is appropriate.
Example 3 continued…
Mechanics: We have to create a confidence interval whose confidence level corresponds to the alpha level of the test. So if α = .05, we should create a 90% confidence interval because this is a one-sided test. That leaves 5% in each tail, so the one tail that matters for HA matches α = .05.
Example 3 continued…
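A minimal Python sketch of the mechanics step, using the counts from the example (23 unbelted drivers out of 134). It assumes the usual one-proportion z-interval, p̂ ± z*·√(p̂q̂/n); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 134
unbelted = 23
p_hat = (n - unbelted) / n   # proportion of stopped drivers wearing belts
p0 = 0.80                    # null hypothesis value
alpha = 0.05                 # one-sided test, H_A: p > 0.80

# A one-sided test at alpha corresponds to a 1 - 2*alpha (90%) interval,
# which leaves alpha in each tail.
z_star = NormalDist().inv_cdf(1 - alpha)
se = sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(f"p-hat = {p_hat:.3f}, {1 - 2 * alpha:.0%} CI: ({lower:.3f}, {upper:.3f})")

# For H_A: p > 0.80, reject H0 only if the whole interval lies above 0.80.
if p0 < lower:
    print("Interval lies entirely above 0.80: reject H0")
else:
    print("0.80 is still a plausible value: fail to reject H0")
```

With these counts the 90% interval is roughly (0.775, 0.882). Because 0.80 lies inside it, these data do not give convincing evidence that more than 80% of drivers were wearing safety belts.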
Making Errors
Here’s some shocking news for you: nobody’s perfect. Even with lots of evidence we can still make the wrong decision.
When we perform a hypothesis test, we can make mistakes in two ways:
I. The null hypothesis is true, but we mistakenly reject it. (Type I error)
II. The null hypothesis is false, but we fail to reject it. (Type II error)
Analogy to medicine…
In medical disease testing, the null hypothesis is usually the assumption that a person is healthy. The alternative is that he or she has the disease we’re testing for.
A Type I error is a false positive: a healthy person is diagnosed with the disease.
A Type II error is a false negative: an infected person is diagnosed as disease free.
Another analogy:
In a Statistics final exam (with H0: the student has learned only 60% of the material):
What is a Type I error? (Hint: false positive.) What is a Type II error?
Making Errors (cont.)
Which type of error is more serious depends on the situation at hand. In other words, the gravity of the error is context dependent.
Here’s an illustration of the four situations in a hypothesis test:
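The four situations can be encoded as a small truth table. This Python sketch is illustrative only (the `outcome` function is hypothetical): in practice we never know whether H0 is true, which is exactly why errors are possible. In the medical analogy, “H0 true” means the person is healthy and “reject H0” means a positive test.

```python
def outcome(h0_true, reject_h0):
    """Classify a test decision against the (unknown in practice) truth."""
    if h0_true and reject_h0:
        return "Type I error"        # false positive
    if not h0_true and not reject_h0:
        return "Type II error"       # false negative
    return "correct decision"

for h0_true in (True, False):
    for reject_h0 in (True, False):
        truth = "H0 true" if h0_true else "H0 false"
        decision = "reject H0" if reject_h0 else "fail to reject H0"
        print(f"{truth:<8} | {decision:<17} -> {outcome(h0_true, reject_h0)}")
```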
Example 4: Alzheimer’s
Testing for Alzheimer’s disease can be a long and expensive process, consisting of lengthy tests and medical diagnosis. Recently a group of researchers (Solomon et al., 1998) devised a 7-minute test to serve as a quick screen for the disease for use in the general population of senior citizens. A patient who tested positive would then go through the more expensive battery of tests and medical diagnosis. The authors reported a false positive rate of 4% and a false negative rate of 8%.
a. Put this in the context of a hypothesis test. What are the null and alternative hypotheses?
b. What would a Type I error mean?
c. What would a Type II error mean?
Example 4 continued…
a. The null hypothesis is that a person is healthy. The alternative is that they have Alzheimer’s disease.
b. A Type I error is deciding a person has Alzheimer’s when he or she doesn’t.
c. A Type II error is failing to diagnose Alzheimer’s disease when the person has it.
d. A Type I error would require more testing, resulting in
Making Errors (cont.)
How often will a Type I error occur?
Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level.
When H0 is false and we reject it, we have done the right thing.
A test’s ability to detect a false hypothesis is called the power of the test.
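A simulation can check the claim that the Type I error rate equals α. This Python sketch assumes a hypothetical one-sided z-test of H0: p = 0.5 with samples of size 100; the function name and settings are illustrative. Because H0 is true in every simulated sample, every rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_one_error_rate(n=100, p0=0.5, alpha=0.05, trials=10_000, seed=2):
    """Simulate many samples with H0 TRUE and record how often an
    upper-tail z-test rejects it. Every rejection is a Type I error."""
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(1 - alpha)
    sd0 = sqrt(p0 * (1 - p0) / n)      # SD of p-hat when H0 is true
    rejections = 0
    for _ in range(trials):
        p_hat = sum(rng.random() < p0 for _ in range(n)) / n
        if (p_hat - p0) / sd0 > z_star:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The rejection rate should come out close to α = .05. It lands slightly below .05 here (the exact value is about .044) because a count of successes is a whole number, so the Normal-model cutoff cannot be matched exactly.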
Making Errors (cont.)
When H0 is false and we fail to reject it, we have made a Type II error.
We assign the letter β to the probability of this mistake.
It’s harder to assess the value of β because we don’t know what the value of the parameter really is.
There is no single value for β; we can think of a whole collection of β’s, one for each incorrect parameter value.
Making Errors (cont.)
One way to focus our attention on a particular β is to think about the effect size.
Ask “How big a difference would matter?”
We could reduce β for all alternative parameter values by increasing α.
This would reduce β but increase the chance of a Type I error.
This tension between Type I and Type II errors is inevitable.
The only way to reduce both types of errors is to collect more data.
Power
The power of a test is the probability that it correctly rejects a false null hypothesis.
When the power is high, we can be confident that we’ve looked hard enough at the situation.
Power (cont.)
Whenever a study fails to reject its null hypothesis, the test’s power comes into question.
When we calculate power, we imagine that the null hypothesis is false.
The value of the power depends on how far the truth lies from the null hypothesis value.
The distance between the null hypothesis value, p0, and the truth, p, is called the effect size.
A Picture Worth a Thousand Words
The larger the effect size, the easier it should be to see it.
Obtaining a larger sample size decreases the probability of a Type II error, so it increases the power.
It also makes sense that the more we’re willing to accept a Type I error, the less likely we will be to make a Type II error.
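Effect size, sample size, and α all change a test’s power, and a Normal-model power calculation makes this visible. This Python sketch assumes an upper-tail one-proportion z-test; the null value p0 = 0.80, the alternative values, and the sample sizes are hypothetical choices for illustration, and `power` is not a library function. Since the Type II error probability is β = 1 - power, each increase in power is a decrease in β.

```python
from math import sqrt
from statistics import NormalDist

def power(p0, p_true, n, alpha):
    """Power of an upper-tail one-proportion z-test (H0: p = p0 vs H_A: p > p0),
    using the Normal model for p-hat. The Type II error rate is 1 - power."""
    z = NormalDist()
    # Reject H0 when p-hat exceeds this cutoff, which is set assuming H0 is true.
    cutoff = p0 + z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # In reality p-hat varies around the TRUE proportion.
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - z.cdf((cutoff - p_true) / sd_true)

p0 = 0.80  # hypothetical null value
print("Bigger effect size:", [round(power(p0, p, 134, 0.05), 3) for p in (0.82, 0.85, 0.88)])
print("Bigger sample size:", [round(power(p0, 0.85, n, 0.05), 3) for n in (134, 300, 600)])
print("Bigger alpha:      ", [round(power(p0, 0.85, 134, a), 3) for a in (0.01, 0.05, 0.10)])
```

Each printed list increases from left to right. Only the middle row improves power without accepting more Type I risk, which is why collecting more data is the way to reduce both kinds of error.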
A Picture Worth a Thousand Words (cont.)
This diagram shows the relationship between these concepts:
Reducing Both Type I and Type II Error
The previous figure seems to show that if we reduce Type I error, we must automatically increase Type II error.
But we can reduce both types of error by making both curves narrower.
How do we make the curves narrower? Increase the sample size.
Reducing Both Type I and Type II Error (cont.)
This figure has means that are just as far apart as in the previous figure, but the sample sizes are larger, the standard deviations are smaller, and the error rates are reduced.
Reducing Both Type I and Type II Error (cont.)
Original comparison of errors:
Comparison of errors with a larger sample size:
Example 5: Equal opportunity?
A company is sued for job discrimination because only 19% of the newly hired candidates were minorities when 27% of all applicants were minorities. Is this strong evidence that the company’s hiring practices are discriminatory?
a. Is this a one-tailed or a two-tailed test? Why?
b. In this context, what would a Type I error be?
c. In this context, what would a Type II error be?
d. In this context, describe what is meant by the power of the test.
e. If the hypothesis is tested at the 5% level of significance instead of 1%, how will this affect the power of the test?
f. The lawsuit is based on the hiring of 37 employees. Is the power of
Example 5 continued…
a. One-tailed. The company wouldn’t be sued if “too many” minorities were hired.
b. Deciding the company is discriminating when it is not.
c. Deciding the company is not discriminating when it is.
d. The probability of correctly detecting discrimination when it exists.
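As a hedged illustration of question e, this Python sketch computes the power of a lower-tail one-proportion z-test of H0: p = 0.27 with n = 37 hires. The true hiring rate of 19% is an assumption made only for the sketch, the helper `power_lower_tail` is hypothetical, and the Normal model is rough with so few hires.

```python
from math import sqrt
from statistics import NormalDist

def power_lower_tail(p0, p_true, n, alpha):
    """Power of a lower-tail one-proportion z-test (H0: p = p0 vs H_A: p < p0)."""
    z = NormalDist()
    # Reject H0 when p-hat falls below this cutoff (computed assuming H0 is true).
    cutoff = p0 - z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # p-hat actually varies around the (assumed) true proportion.
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return z.cdf((cutoff - p_true) / sd_true)

# 27% of applicants were minorities; 37 hires. ASSUMPTION for illustration:
# the company's true minority hiring rate is 19%.
for alpha in (0.01, 0.05):
    print(f"alpha = {alpha:.2f}: power = {power_lower_tail(0.27, 0.19, 37, alpha):.3f}")
```

Raising α from .01 to .05 moves the rejection cutoff closer to 27%, so the test rejects more often and its power rises (from roughly 0.08 to roughly 0.27 under these assumptions).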
Example 6: Hoops
A basketball player with a poor foul-shot record practices intensively during the off-season. He tells the coach that he has raised his proficiency from 60% to 80%.
Dubious, the coach asks him to take 10 shots and is surprised when the player hits 9 out of 10. Did the player prove that he has improved?
a. Suppose the player really is no better than before: still a 60% shooter. What’s the probability he could hit at least 9 out of 10 shots? (Hint: Use a Binomial model.)
b. If that is what happened, now the coach thinks the player has improved when he has not. Which type of error is that?
Example 6 continued…
a. P(X ≥ 9) = 1-binomcdf(10, .6, 8) ≈ 0.0464
b. Type I
c. Power = 1-binomcdf(10, .8, 8) ≈ 0.376, or 37.6%
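The calculator commands translate directly into Python. This sketch uses only the standard-library `math.comb`; the helper `binom_upper_tail` is illustrative. Note that 1-binomcdf(10, p, 8) is P(X ≥ 9).

```python
from math import comb

def binom_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p); same as 1 - binomcdf(n, p, k - 1)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

# a. Chance a 60% shooter hits at least 9 of 10 (the Type I error probability)
alpha = binom_upper_tail(10, 0.6, 9)
# c. Chance an 80% shooter hits at least 9 of 10 (the power of the coach's test)
power = binom_upper_tail(10, 0.8, 9)
print(f"P(X >= 9 | p = 0.6) = {alpha:.4f}")   # prints 0.0464
print(f"P(X >= 9 | p = 0.8) = {power:.4f}")   # prints 0.3758
```

Requiring 9 of 10 keeps the Type I error probability small, but the low power shows that a genuinely improved 80% shooter would fail this test most of the time.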
What have we learned?
There’s a lot more to hypothesis testing than a simple yes/no decision.
And we’ve learned about the two kinds of errors we might make: Type I and Type II.