Chapter 21 More about tests and intervals What I will know and be able to do Apply confidence intervals, critical values, alpha levels

Academic year: 2020

(1)

Chapter 21

More about tests and intervals

What I will know and be able to do

Apply confidence intervals, critical values, alpha levels

and power to complete a hypothesis test for a population

proportion.

Assignment:

Read Chapter 21

(2)

How to Think About P-Values

• A P-value is a conditional probability: the probability of getting a statistic at least as extreme as the one observed, given that the null hypothesis is true.

• The P-value is NOT the probability that the null hypothesis is true.

• It’s not even the conditional probability that the null hypothesis is true given the data.

(3)

Example 1: P-value

A medical researcher has tested a new treatment for poison

ivy against the traditional ointment. With a P-value of 0.047,

he concludes the new treatment is more effective. Explain

what the P-value means in this context.

(4)

Alpha Levels

Sometimes we need to make a firm decision about

whether or not to reject the null hypothesis.

When the P-value is small, it tells us that our data are

rare given the null hypothesis.

(5)

Alpha Levels (cont.)

We can define “rare event” arbitrarily by setting

a threshold for our P-value.

If our P-value falls below that point, we’ll reject H0.

We call such results statistically significant.

(6)

Alpha Levels (cont.)

Common alpha levels are 0.10, 0.05, and 0.01.

You have the option, almost the obligation, to consider
your alpha level carefully and choose an appropriate one
for the situation.

The alpha level is also called the significance level.
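As a minimal sketch of how an alpha level turns a P-value into a decision, the snippet below compares the P-value from Example 1 (0.047) with the three common alpha levels. The helper name `decisions` is made up for this illustration.

```python
# Hypothetical helper: compare one P-value against several alpha levels.
def decisions(p_value, alphas=(0.10, 0.05, 0.01)):
    """Return a dict mapping each alpha to True when H0 is rejected."""
    return {alpha: p_value < alpha for alpha in alphas}

# P-value taken from Example 1 (the poison ivy study).
result = decisions(0.047)
for alpha, reject in result.items():
    verdict = "reject H0" if reject else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {verdict}")
```

The same data can be statistically significant at one alpha level and not at another, which is why the level should be chosen before looking at the results.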

(7)

Example 2: Alpha

A researcher developing scanners to search for hidden

weapons at airports has concluded that a new device is

significantly better than the current scanner. He made

this decision based on a test using α=.05. Would he

(8)

Example 2: Alpha (cont.)

At α=.10, he would have made the same decision. We

know his P-value was less than .05, which also has to

be less than .10.

To reject H0 at α=.01, the P-value must be less than .01,

which is not necessarily the case. So he might not

(9)

Critical Values Again

When the alternative is
one-sided, the critical value
puts all of α
on one side:

When the alternative is

two-sided, the critical value splits
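A rough sketch of the one-sided and two-sided critical values for a Normal model, using Python's standard library. The function name `critical_values` is an assumption for this illustration, and the Normal model is assumed because the chapter deals with proportions.

```python
from statistics import NormalDist

def critical_values(alpha):
    """Return (one-sided z*, two-sided z*) for a standard Normal model."""
    z = NormalDist()                      # mean 0, standard deviation 1
    one_sided = z.inv_cdf(1 - alpha)      # all of alpha in one tail
    two_sided = z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    return one_sided, two_sided

for alpha in (0.10, 0.05, 0.01):
    one, two = critical_values(alpha)
    print(f"alpha = {alpha:.2f}: one-sided z* = {one:.3f}, two-sided z* = {two:.3f}")
```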

(10)

Confidence Intervals and Hypothesis Tests

Confidence intervals and hypothesis tests are built

from the same calculations.

They have the same assumptions and conditions.

You can approximate a hypothesis test by examining a

confidence interval.

Just ask whether the null hypothesis value is consistent

with a confidence interval for the parameter at the

(11)

Example 3: Click It or Ticket

Teens are at greatest risk of being killed or injured in traffic crashes.

According to the National Highway Traffic Safety Administration, 65% of young people killed were not wearing a safety belt. Because many

deaths could easily be prevented by the use of safety belts, several states have begun “Click It or Ticket” campaigns. In 2005, a local

newspaper reported that a roadblock resulted in 23 tickets to drivers who were unbelted out of 134 stopped for inspection. Does this

(12)

Example 3 continued…

Hypothesis:

H0: p = .80

HA: p > .80

The null hypothesis is that 80% of the drivers will be

wearing their safety belts. The alternative hypothesis is

that more than 80% will be wearing their safety belts

(13)

Example 3 continued…

Model:

1. I will assume that the drivers are not likely to influence each other about wearing their seatbelt, making them mutually independent.

2. This isn’t a random sample, but I assume that these drivers are representative of the driving public.

3. 10% condition: 134 drivers is certainly less than 10% of all drivers.

4. Success/Failure: np0 = 134(.80) = 107.2 ≥ 10 and nq0 = 134(.20) = 26.8 ≥ 10. The observed counts, 111 belted and 23 unbelted, are also both at least 10. Therefore, the sample is large enough.

(14)

Example 3 cont…

Mechanics: We have to create a confidence interval whose confidence level corresponds to the alpha level of the test. Because this is a one-sided test with α=.05, we should create a 90% confidence interval. That leaves 5% in each tail of the interval, so the one tail that matters for the test matches α.
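The mechanics can be sketched in a few lines of Python. This is an illustration under the Normal model, using the counts from the example (23 unbelted out of 134) and α = .05. The variable names are chosen here, and the matching one-sided z-test is included only to show that it agrees with the interval.

```python
from math import sqrt
from statistics import NormalDist

n = 134                 # drivers stopped
unbelted = 23           # tickets issued
belted = n - unbelted   # drivers wearing a belt
p_hat = belted / n      # observed proportion belted
p0 = 0.80               # null hypothesis value
alpha = 0.05

# One-sided test at alpha = .05 pairs with a 90% interval (5% in each tail).
z_star = NormalDist().inv_cdf(1 - alpha)
se_hat = sqrt(p_hat * (1 - p_hat) / n)   # standard error uses p_hat
lower = p_hat - z_star * se_hat
upper = p_hat + z_star * se_hat

# Matching one-sided z-test: the standard deviation uses p0, not p_hat.
sd0 = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / sd0
p_value = 1 - NormalDist().cdf(z)

print(f"90% CI: ({lower:.3f}, {upper:.3f})")
print(f"z = {z:.2f}, one-sided P-value = {p_value:.3f}")
print("80% is inside the interval:", lower < p0 < upper)
```

The interval comes out close to the one quoted in the conclusion, and because .80 lies inside it, the test at α = .05 would not reject H0.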

(15)

Example 3 continued…

I am 90% confident that between 77.4% and 88.2% of all

(16)

Making Errors

Here’s some shocking news for you: nobody’s perfect.

Even with lots of evidence we can still make the wrong

decision.

When we perform a hypothesis test, we can make
mistakes in two ways:

I. The null hypothesis is true, but we mistakenly reject it. (Type I error)

II. The null hypothesis is false, but we fail to reject it. (Type II error)

(17)

Analogy to medicine…

In medical disease testing, the null hypothesis is

usually the assumption that a person is healthy. The

alternative is that he or she has the disease we’re

testing for.

A Type I error is a false positive: a healthy person is
diagnosed with the disease.

(18)

Another analogy:

In a Statistics final exam (with H0: the student has
learned only 60% of the material):

What is Type I error?

(hint: false positive)

What is a Type II error?

(19)

Making Errors (cont.)

Which type of error is more serious depends on the

situation at hand. In other words, the gravity of the error is

context dependent.

(20)

Example 4: Alzheimer’s

Testing for Alzheimer’s disease can be a long and expensive process, consisting of

lengthy tests and medical diagnosis. Recently a group of researchers (Solomon et al., 1998) devised a 7-minute test to serve as a quick screen for the disease for use in the general population of senior citizens. A patient who tested positive would then go through the more expensive battery of tests and medical diagnosis. The authors reported a false positive rate of 4% and a false negative rate of 8%.

a. Put this in the context of a hypothesis test. What are the null and alternative hypotheses?

b. What would a Type I error mean?

c. What would a Type II error mean?

(21)

Example 4 continued…

a. The null hypothesis is that a person is healthy. The alternative is that they have Alzheimer’s disease.

b. A Type I error is deciding a person has Alzheimer’s when he or she doesn’t.

c. A Type II error is failing to diagnose Alzheimer’s disease when the person has it.

(22)

Making Errors (cont.)

How often will a Type I error occur?

• Since a Type I error is rejecting a true null hypothesis, the probability of a Type I error is our α level.

When H0 is false and we reject it, we have done the right
thing.

(23)

Making Errors (cont.)

When H0 is false and we fail to reject it, we have
made a Type II error.

We assign the letter β to the probability of this mistake.

It’s harder to assess the value of β because we don’t
know what the value of the parameter really is.

(24)

Making Errors (cont.)

• One way to focus our attention on a particular β is to think about the effect size.

• Ask “How big a difference would matter?”

• We could reduce β for all alternative parameter values by increasing α.

• This would reduce β but increase the chance of a Type I error.

• This tension between Type I and Type II errors is inevitable.
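The trade-off in the bullets above can be sketched numerically. The snippet assumes a one-sided (upper-tail) test of a proportion under the Normal model, borrows p0 = .80 and n = 134 from Example 3, and picks a true proportion of .88 purely for illustration. The function name `type_two_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_two_error(alpha, p0, p_true, n):
    """Beta for an upper-tail one-proportion z-test (Normal approximation)."""
    z = NormalDist()
    # Reject H0 when the sample proportion exceeds this cutoff.
    cutoff = p0 + z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    # Beta is the chance of landing at or below the cutoff when p_true holds.
    sd_true = sqrt(p_true * (1 - p_true) / n)
    return z.cdf((cutoff - p_true) / sd_true)

# Assumed values: p0 and n from Example 3, p_true chosen for illustration.
for alpha in (0.01, 0.05, 0.10):
    beta = type_two_error(alpha, p0=0.80, p_true=0.88, n=134)
    print(f"alpha = {alpha:.2f} -> beta = {beta:.3f}")
```

With the sample size held fixed, raising α lowers β and lowering α raises it.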

(25)

Power

The power of a test is the probability that it correctly
rejects a false null hypothesis.

When the power is high, we can be confident that

we’ve looked hard enough at the situation.

(26)

Power (cont.)

Whenever a study fails to reject its null hypothesis, the

test’s power comes into question.

When we calculate power, we imagine that the null

hypothesis is false.

The value of the power depends on how far the truth lies

from the null hypothesis value.

• The distance between the null hypothesis value, p0, and the truth, p, is called the effect size.

(27)

A Picture Worth a Thousand Words

The larger the effect size, the easier it should be to see it.

Obtaining a larger sample size decreases the probability of a

Type II error, so it increases the power.
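Since the figure for this slide is not available, here is a small numerical sketch of the same two ideas. It assumes an upper-tail one-proportion z-test under the Normal model with α = .05 and p0 = .80. The true proportions and sample sizes are made up for illustration, and `power_one_sided` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided(p0, p_true, n, alpha=0.05):
    """Power of an upper-tail one-proportion z-test (Normal approximation)."""
    z = NormalDist()
    # Reject H0 when the sample proportion exceeds this cutoff.
    cutoff = p0 + z.inv_cdf(1 - alpha) * sqrt(p0 * (1 - p0) / n)
    sd_true = sqrt(p_true * (1 - p_true) / n)
    # Power is the chance of exceeding the cutoff when p_true holds.
    return 1 - z.cdf((cutoff - p_true) / sd_true)

# Larger effect size (p_true farther above p0) gives more power.
for p_true in (0.82, 0.85, 0.90):
    print(f"n = 134, p = {p_true:.2f}: power = {power_one_sided(0.80, p_true, 134):.3f}")

# Larger sample size gives more power for the same effect size.
for n in (50, 134, 500):
    print(f"n = {n}, p = 0.85: power = {power_one_sided(0.80, 0.85, n):.3f}")
```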

(28)

A Picture Worth a Thousand Words (cont.)

(29)

Reducing Both Type I and Type II Error

The previous figure seems to show that if we reduce

Type I error, we must automatically increase Type II

error.

But, we can reduce both types of error by making both

curves narrower.

How do we make the curves narrower?

(30)

Reducing Both Type I and Type II Error (cont.)

(31)

Reducing Both Type I and Type II Error (cont.)

(32)

Example 5: Equal opportunity?

A company is sued for job discrimination because only 19% of the newly hired candidates were minorities when 27% of all applicants were minorities. Is this strong evidence that the company’s hiring practices are discriminatory?

a. Is this a one-tailed or a two-tailed test? Why?

b. In this context, what would a Type I error be?

c. In this context, what would a Type II error be?

d. In this context, describe what is meant by the power of the test.

e. If the hypothesis is tested at the 5% level of significance instead of 1%, how will that affect the power of the test?

(33)

Example 5 continued…

a. One-tailed. The company wouldn’t be sued if “too many”

minorities were hired.

b. Deciding the company is discriminating when it is not.

c. Deciding the company is not discriminating when it is.

d. The probability of correctly detecting discrimination when

it exists.

e. Increases the power.

(34)

Example 6: Hoops

A basketball player with a poor foul-shot record practices intensively during the off-season. He tells the coach that he has raised his

proficiency from 60% to 80%. Dubious, the coach asks him to take 10 shots, and is surprised when the player hits 9 out of 10. Did the player prove that he has improved?

a. Suppose the player really is no better than before, still a 60% shooter. What’s the probability he could hit at least 9 out of 10 shots? (Hint: Use a Binomial model)

b. If that is what happened, now the coach thinks the player has improved, when he has not. Which type of error is that?

(35)

Example 6 continued…

a. P(X ≥ 9) = 1-binomcdf(10, .6, 8) = .0464

b. Type I

c. P(X ≥ 9) = 1-binomcdf(10, .8, 8) = .376, or 37.6%
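The calculator commands above can be checked with a short Binomial sum in Python. The helper `prob_at_least` is a name chosen for this sketch. Part c of the question is not shown in the source, so reading the second value as the power of the coach's test against p = .8 (with 9 hits as the threshold) is an assumption.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), the same as 1 - binomcdf(n, p, k - 1)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Part a: a 60% shooter hits at least 9 of 10 (the chance of a Type I error).
still_60 = prob_at_least(9, 10, 0.6)

# Part c: an 80% shooter hits at least 9 of 10.
now_80 = prob_at_least(9, 10, 0.8)

print(f"P(X >= 9 | p = 0.6) = {still_60:.4f}")
print(f"P(X >= 9 | p = 0.8) = {now_80:.4f}")
```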

(36)

What have we learned?

There’s a lot more to hypothesis testing than a simple

yes/no decision.
