
(1)

(2)

Chapter 21

(3)

How to Think About P-Values

 A P-value is a conditional probability: the probability of getting a statistic at least as extreme as the one observed, given that the null hypothesis is true.

 The P-value is NOT the probability that the null

hypothesis is true.

 It’s not even the conditional probability that the null hypothesis is true given the data.

(4)

Example 1: P-value

(5)

Example 1 continued…

If there is no difference in effectiveness, the chance of seeing an observed difference this large or

larger is 4.7% by natural sampling variation. This is very low, so most likely he has evidence that his ointment is more effective, and his null

(6)

Alpha Levels

 Sometimes we need to make a firm decision

about whether or not to reject the null hypothesis.

 When the P-value is small, it tells us that our data

are rare given the null hypothesis.

(7)

Alpha Levels (cont.)

 We can define “rare event” arbitrarily by setting a

threshold for our P-value.

 If our P-value falls below that point, we’ll reject

H₀. We call such results statistically significant.

 The threshold is called an alpha level, denoted α.

(8)

Alpha Levels (cont.)

 Common alpha levels are 0.10, 0.05, and 0.01.

 You have the option—almost the obligation—to

consider your alpha level carefully and choose an appropriate one for the situation.

 The alpha level is also called the significance

level.

 When we reject the null hypothesis, we say that

(9)

Example 2: Alpha

A researcher developing scanners to search for

hidden weapons at airports has concluded that a new device is significantly better than the current scanner. He made this decision based on a test using α=.05. Would he have made the same

(10)

Example 2: Alpha

At α=.10, he would have made the same decision. We know his P-value was less than .05, so it must also be less than .10.

To reject H₀ at α=.01, the P-value must be less than .01, which is not necessarily the case. So he
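The decision rule behind this answer can be sketched in a few lines. The P-values below are hypothetical (the example only tells us the real one was below .05), and `decide` is an illustrative helper, not something from the slides.

```python
def decide(p_value: float, alpha: float) -> str:
    """Reject H0 when the P-value falls below the alpha level."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Two hypothetical P-values, both below .05 as in the scanner example.
for p_value in (0.03, 0.004):
    for alpha in (0.10, 0.05, 0.01):
        print(f"P = {p_value}, alpha = {alpha}: {decide(p_value, alpha)}")
```

Both hypothetical P-values lead to rejection at α=.10 and α=.05, but only the smaller one does at α=.01, which is why the decision at .01 cannot be determined from the information given.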

(11)

Critical Values Again

 When the alternative is one-sided, the critical value puts all of α on one side:

 When the alternative is

two-sided, the critical
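As a rough illustration of one-sided versus two-sided critical values, the sketch below assumes a Normal model (z*). The helper name `z_critical` is made up for this example.

```python
from statistics import NormalDist

def z_critical(alpha: float, two_sided: bool) -> float:
    """Critical value z* for a Normal-model test at the given alpha level."""
    # A two-sided alternative splits alpha equally between the two tails.
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: one-sided z* = {z_critical(alpha, False):.3f}, "
          f"two-sided z* = {z_critical(alpha, True):.3f}")
```

At α=.05 this gives the familiar values of about 1.645 (one-sided) and 1.96 (two-sided).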

(12)

Confidence Intervals and Hypothesis Tests

 Confidence intervals and hypothesis tests are

built from the same calculations.

 They have the same assumptions and

conditions.

 You can approximate a hypothesis test by

examining a confidence interval.

 Just ask whether the null hypothesis value is

(13)

Example 3: Click It or Ticket

Teens are at greatest risk of being killed or injured in traffic crashes. According to the National Highway Traffic Safety Administration, 65% of young people killed were not

wearing a safety belt. Because many deaths could easily be prevented by the use of safety belts, several states

have begun “Click It or Ticket” campaigns. In 2005, a local newspaper reported that a roadblock resulted in 23 tickets to drivers who were unbelted out of 134 stopped for inspection. Does this provide evidence that the goal of

(14)

Example 3 continued…

Hypothesis: H₀: p = .80 H_A: p > .80

The null hypothesis is that 80% of the drivers will be wearing their safety belts. The alternative

hypothesis is that more than 80% will be wearing their safety belts due to the “Click It or Ticket”

(15)

Example 3 continued…

Model:

1. I will assume that the drivers are not likely to influence each other about wearing their seatbelt, making them mutually independent.

2. This isn’t a random sample, but I assume that these

drivers are representative of the driving public.

3. 10% condition: 134 drivers is certainly less than 10% of all drivers.

4. Success/Failure: np = 134(.8) = 107.2 ≥ 10 and nq = 134(.2) = 26.8 ≥ 10; therefore, the sample is large enough. Since all of the conditions are met, the model is

(16)

Example 3 cont…

Mechanics: We have to create a confidence interval whose confidence level corresponds to the alpha level of the test. If α=.05, we should create a 90% confidence interval because this is a one-sided test. That leaves 5% in each tail outside the interval.
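The mechanics can be sketched as follows, using the roadblock counts (111 belted out of 134) and assuming α=.05 as above. The variable names are illustrative. Note that the test uses the null value p₀ in its standard deviation, while the interval uses the observed proportion in its standard error.

```python
from math import sqrt
from statistics import NormalDist

belted, n = 134 - 23, 134      # 111 of the 134 stopped drivers were belted
p0, alpha = 0.80, 0.05         # null value; alpha as assumed in the slide
p_hat = belted / n

# One-sided z-test: the standard deviation is based on the null value p0.
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)

# Matching 90% interval: the standard error is based on p_hat.
z_star = NormalDist().inv_cdf(1 - alpha)
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - margin, p_hat + margin)

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, P-value = {p_value:.3f}")
print(f"90% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print("0.80 inside interval:", ci[0] <= p0 <= ci[1])
```

With these numbers the interval comes out to roughly (0.77, 0.88), which contains 0.80, and the one-sided P-value is about 0.21. Under these assumptions the data would not let us reject H₀.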

(17)

Example 3 continued…

(18)

Making Errors

 Here’s some shocking news for you: nobody’s

perfect. Even with lots of evidence we can still make the wrong decision.

 When we perform a hypothesis test, we can

make mistakes in two ways:

I. The null hypothesis is true, but we

mistakenly reject it. (Type I error)

II. The null hypothesis is false, but we fail to

(19)

Analogy to medicine…

 In medical disease testing, the null hypothesis is

usually the assumption that a person is healthy. The alternative is that he or she has the disease we’re testing for.

 A Type I error is a false positive: a healthy person is diagnosed with the disease.

 A Type II error is a false negative – an infected

(20)

Another analogy:

In a Statistics final exam (with H₀: the student has learned only 60% of the material):

What is a Type I error? (hint: false positive)

What is a Type II error?

(21)

Making Errors (cont.)

 Which type of error is more serious depends on the

situation at hand. In other words, the gravity of the error is context dependent.

 Here’s an illustration of the four situations in a hypothesis

(22)

Example 4: Alzheimer’s

Testing for Alzheimer’s disease can be a long and expensive process, consisting of lengthy tests and medical diagnosis. Recently a group of researchers (Solomon et al., 1998) devised a 7-minute test to

serve as a quick screen for the disease for use in the general

population of senior citizens. A patient who tested positive would then go through the more expensive battery of tests and medical diagnosis. The authors reported a false positive rate of 4% and a false negative rate of 8%.

a. Put this in the context of a hypothesis test. What are the null and

alternative hypotheses?

b. What would a Type I error mean? c. What would a Type II error mean?

(23)

Example 4 continued…

a. The null hypothesis is that a person is healthy. The

alternative is that they have Alzheimer’s disease.

b. A Type I error is deciding a person has Alzheimer’s

when he or she doesn’t.

c. A Type II error is failing to diagnose Alzheimer’s disease

when the person has it.

d. A Type I error would require more testing, resulting in

(24)

Making Errors (cont.)

 How often will a Type I error occur?

 Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level.

 When H₀ is false and we reject it, we have done the right thing.

 A test’s ability to detect a false hypothesis is

(25)

Making Errors (cont.)

 When H₀ is false and we fail to reject it, we have made a Type II error.

 We assign the letter β to the probability of this mistake.

 It’s harder to assess the value of β because we don’t know what the value of the parameter really is.

 There is no single value for β--we can think of

(26)

Making Errors (cont.)

 One way to focus our attention on a particular β is to think about the effect size.

 Ask “How big a difference would matter?”

 We could reduce β for all alternative parameter values by increasing α.

 This would reduce β but increase the chance of a Type I error.

 This tension between Type I and Type II errors is

inevitable.

 The only way to reduce both types of errors is to collect

(27)

Power

 The power of a test is the probability that it

correctly rejects a false null hypothesis.

 When the power is high, we can be confident that

we’ve looked hard enough at the situation.

(28)

Power (cont.)

 Whenever a study fails to reject its null

hypothesis, the test’s power comes into question.

 When we calculate power, we imagine that the

null hypothesis is false.

 The value of the power depends on how far the

truth lies from the null hypothesis value.

 The distance between the null hypothesis

value, p₀, and the truth, p, is called the effect size.
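To see how power depends on the effect size, here is a minimal sketch for an upper-tail one-proportion test, assuming the Normal approximation holds. The function name, the null value of 0.80, the sample size of 134, and the "true" proportions are all chosen purely for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(p0: float, p_true: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of the upper-tail test H0: p = p0 vs HA: p > p0."""
    std = NormalDist()
    # Smallest sample proportion that leads to rejecting H0.
    critical = p0 + std.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Chance that p_hat exceeds that cutoff when the truth is p_true.
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return 1 - std.cdf((critical - p_true) / sd_true)

# Hypothetical truths at increasing distance from the null value 0.80.
for p_true in (0.82, 0.85, 0.90):
    power = power_one_sided(0.80, p_true, 134)
    print(f"effect size {p_true - 0.80:.2f}: power = {power:.2f}")
```

The same function can be called with a larger `n` or a larger `alpha` to see that either change also raises the power.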

(29)

A Picture Worth a Thousand Words

 The larger the effect size, the easier it should be

to see it.

 Obtaining a larger sample size decreases the

probability of a Type II error, so it increases the power.

 It also makes sense that the more we’re willing to

(30)

A Picture Worth a Thousand Words (cont.)

 This diagram shows the relationship between

(31)

Reducing Both Type I and Type II Error

 The previous figure seems to show that if we

reduce Type I error, we must automatically increase Type II error.

 But, we can reduce both types of error by making

both curves narrower.

 How do we make the curves narrower? Increase

(32)

Reducing Both Type I and Type II Error

(cont.)

 This figure has means that are just as far apart as in the

previous figure, but the sample sizes are larger, the

(33)

Reducing Both Type I and Type II Error

(cont.)

 Original comparison of

errors:

 Comparison of errors with

(34)

Example 5: Equal opportunity?

A company is sued for job discrimination because only 19% of the newly hired candidates were minorities when 27% of all applicants were minorities. Is this strong evidence that the company’s hiring

practices are discriminatory?

a. Is this a one-tailed or a two-tailed test? Why?

b. In this context, what would a Type I error be?

c. In this context, what would a Type II error be?

d. In this context, describe what is meant by the power of the test.

e. If the hypothesis is tested at the 5% level of significance instead of 1%, how will this affect the power of the test?

f. The lawsuit is based on the hiring of 37 employees. Is the power of

(35)

Example 5 continued…

a. One-tailed. The company wouldn’t be sued if “too many” minorities were hired.

b. Deciding the company is discriminating when it is not.

c. Deciding the company is not discriminating when it is.

d. The probability of correctly detecting discrimination when it exists.

(36)

Example 6: Hoops

A basketball player with a poor foul-shot record practices

intensively during the off-season. He tells the coach that he has raised his proficiency from 60% to 80%.

Dubious, the coach asks him to take 10 shots, and is surprised when the player hits 9 out of 10. Did the player prove that he has improved?

a. Suppose the player really is no better than before- still a 60% shooter. What’s the probability he could hit at least 9 out of 10 shots? (Hint: Use a Binomial model)

b. If that is what happened, now the coach thinks the

(37)

Example 6 continued…

a. .0464 1-binomcdf(10, .6, 8)

b. Type I

c. 37.6% 1-binomcdf(10, .8, 8)
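For readers without a calculator that has `binomcdf`, the two tail probabilities can be reproduced with a short sketch. The helper `binom_tail` is a made-up name. Reading part c as the test's power (the chance of hitting at least 9 of 10 when he really is an 80% shooter) is an assumption, since the question text for part c is not shown.

```python
from math import comb

def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p); same as 1 - binomcdf(n, p, k - 1)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

p_value = binom_tail(10, 0.6, 9)   # part a: still a 60% shooter
power = binom_tail(10, 0.8, 9)     # part c: truly an 80% shooter

print(f"a. P(at least 9 of 10 | p = .6) = {p_value:.4f}")  # 0.0464
print(f"c. P(at least 9 of 10 | p = .8) = {power:.3f}")    # 0.376
```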

(38)

What have we learned?

 There’s a lot more to hypothesis testing than a

simple yes/no decision.

 And, we’ve learned about the two kinds of errors