Types of errors for a Hypothesis Test

Chapter 8 Hypothesis Testing


8.1 Introduction

In this chapter we will study another method of inference-making: hypothesis testing. The procedures to be discussed are useful in situations where we are interested in making a decision about a parameter value, rather than obtaining an estimate of its value. It is often desirable to know whether some characteristic of a population is larger than a specified value, or whether the obtained value of a given parameter is less than a value hypothesized for the purpose of comparison.

8.2 Formulating Hypotheses

When we set out to test a new theory, we first formulate a hypothesis, or a claim, which we believe to be true. For example, we may claim that the mean number of children born to urban women is less than the mean number of children born to rural women.

Since the value of the population characteristic is unknown, the information provided by a sample from the population is used to answer the question of whether or not the population quantity is larger than the specified or hypothesized value. In statistical terms, a statistical hypothesis is a statement about the value of a population parameter. The hypothesis that we try to establish is called the alternative hypothesis and is denoted by Ha. Paired with the alternative hypothesis is the null hypothesis, which is the "opposite" of the alternative hypothesis and is denoted by H0. In this way, the null and alternative hypotheses, both stated in terms of the appropriate parameters, describe two possible states of nature that cannot simultaneously be true. When a researcher begins to collect information about the phenomenon of interest, he or she generally tries to present evidence that lends support to the alternative hypothesis. As you will subsequently learn, we take an indirect approach to obtaining support for the alternative hypothesis: instead of trying to show that the alternative hypothesis is true, we attempt to produce evidence to show that the null hypothesis is false.

It should be stressed that researchers frequently put forward a null hypothesis in the hope that they can discredit it. For example, consider an educational researcher who designed a new way to teach a particular concept in science, and wanted to test experimentally whether this new method worked better than the existing method. The researcher would design an experiment comparing the two methods. Since the null hypothesis would be that there is no difference between the two methods, the researcher would be hoping to reject the null hypothesis and conclude that the method he or she developed is the better of the two.

The null hypothesis is typically a hypothesis of no difference, as in the above example where it is the hypothesis that there is no difference between population means. That is why the word "null" is used in "null hypothesis": it is the hypothesis of no difference.

Example 8.1 Formulate appropriate null and alternative hypotheses for testing the demographer's theory that the mean number of children born to urban women is less than the mean number of children born to rural women.

Solution The hypotheses must be stated in terms of a population parameter or parameters. We will thus define

µ1 = Mean number of children born to urban women

µ2 = Mean number of children born to rural women

The demographer wants to support the claim that µ1 is less than µ2; therefore, the null and alternative hypotheses, in terms of these parameters, are

H₀: (µ1 - µ2) = 0 (i.e., µ1 = µ2; there is no difference between the mean numbers of children born to urban and rural women)

Ha: (µ1 - µ2) < 0 (i.e., µ1 < µ2; the mean number of children born to urban women is less than that for the rural women)

Example 8.2 For many years, cigarette advertisements have been required to carry the following statement: "Cigarette smoking is dangerous to your health." However, this warning is often located in inconspicuous corners of the advertisements and printed in small type. Consequently, a researcher believes that over 80% of those who read cigarette advertisements fail to see the warning. Specify the null and alternative hypotheses that would be used in testing the researcher's theory.

Solution The researcher wants to make an inference about p, the true proportion of all readers of cigarette advertisements who fail to see the warning. In particular, he wishes to collect evidence to support the claim that p is greater than .80; thus, the null and alternative hypotheses are

H0: p = .80

Ha: p > .80

Observe that the statement of H0 in these examples, and in general, is written with an equality (=) sign. In Example 8.2, you may have been tempted to write the null hypothesis as H0: p ≤ .80. However, since the alternative of interest is that p > .80, any evidence that would cause you to reject the null hypothesis H0: p = .80 in favor of Ha: p > .80 would also cause you to reject H0: p = p', for any value of p' that is less than .80. In other words, H0: p = .80 represents the worst possible case, from the researcher's point of view, when the alternative hypothesis is not correct.

Thus, for mathematical ease, we combine all possible situations for describing the opposite of Ha into one statement involving equality.
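
The "worst case" argument can be checked numerically. The following is only a sketch: it assumes a large-sample z test for a proportion (a procedure this section has not yet introduced) and a hypothetical sample of 400 readers, 340 of whom failed to see the warning. The function name p_value_upper and the sample figures are illustrative assumptions, not part of the original example.

```python
from math import sqrt
from statistics import NormalDist


def p_value_upper(successes, n, p0):
    """Upper-tailed p-value for H0: p = p0 vs Ha: p > p0.

    Assumes a large-sample z test, so the sample proportion is treated
    as approximately normal under H0.
    """
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 1 - NormalDist().cdf(z)


# Hypothetical data: 340 of 400 readers failed to see the warning.
n, successes = 400, 340
for p0 in (0.80, 0.75, 0.70):
    print(f"H0: p = {p0:.2f}  p-value = {p_value_upper(successes, n, p0):.6f}")
```

Under these assumptions the evidence against H0 is weakest at p = .80 and only grows stronger for smaller hypothesized values, which is why testing the boundary value alone is sufficient.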

Example 8.3 A metal lathe is checked periodically by quality control inspectors to determine if it is producing machine bearings with a mean diameter of .5 inch. If the mean diameter of the bearings is larger or smaller than .5 inch, then the process is out of control and needs to be adjusted. Formulate the null and alternative hypotheses that could be used to test whether the bearing production process is out of control.

Solution We define the following parameter:

µ = True mean diameter (in inches) of all bearings produced by the lathe

If either µ > .5 or µ < .5, then the metal lathe's production process is out of control. Since we wish to be able to detect either possibility, the null and alternative hypotheses would be

H0: µ = .5 (i.e., the process is in control)

Ha: µ ≠ .5 (i.e., the process is out of control)

An alternative hypothesis may hypothesize a change from H₀ in a particular direction, or it may merely hypothesize a change without specifying a direction. In Examples 8.1 and 8.2, the researcher is interested in detecting a departure from H0 in one particular direction. In Example 8.1, the interest focuses on whether the mean number of children born to urban women is less than the mean number of children born to rural women. In Example 8.2, the interest focuses on whether the proportion of cigarette advertisement readers who fail to see the warning is greater than .80. These two tests are called one-tailed tests. In contrast, Example 8.3 illustrates a two-tailed test, in which we are interested in whether the mean diameter of the machine bearings differs in either direction from .5 inch, i.e., whether the process is out of control.
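
The difference between one-tailed and two-tailed tests shows up in which tail area counts as evidence against H0. The sketch below assumes a test statistic z that is approximately standard normal when H0 is true; that statistic, the function name tail_probability, and the labels "less", "greater", and "two-sided" are illustrative assumptions rather than notation from the text.

```python
from statistics import NormalDist


def tail_probability(z, alternative):
    """Tail area of a standard normal z statistic for the chosen alternative.

    alternative: "less"      (Ha: parameter < hypothesized value, as in Example 8.1)
                 "greater"   (Ha: parameter > hypothesized value, as in Example 8.2)
                 "two-sided" (Ha: parameter != hypothesized value, as in Example 8.3)
    """
    std_normal = NormalDist()
    if alternative == "less":
        return std_normal.cdf(z)
    if alternative == "greater":
        return 1 - std_normal.cdf(z)
    if alternative == "two-sided":
        # Departures in either direction count, so both tails are included.
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


for alt in ("less", "greater", "two-sided"):
    print(f"z = 1.80, alternative = {alt:9s}: {tail_probability(1.80, alt):.4f}")
```

The same z value yields a different tail area under each alternative, which is why the direction of Ha must be fixed before the data are examined.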

8.3 Types of errors for a Hypothesis Test

The goal of any hypothesis test is to make a decision. In particular, we will decide whether to reject the null hypothesis, H0, in favor of the alternative hypothesis, Ha. Although we would always like to make a correct decision, we must remember that the decision will be based on sample information, and thus we are liable to make one of two types of error, as defined in the accompanying boxes.

Definition 8.1

A Type I error is the error of rejecting the null hypothesis when it is true. The probability of committing a Type I error is usually denoted by α.

Definition 8.2

A Type II error is the error of failing to reject the null hypothesis when it is false. The probability of making a Type II error is usually denoted by β.

The null hypothesis can be either true or false; further, we will conclude either to reject or not to reject the null hypothesis. Thus, there are four possible situations that may arise in testing a hypothesis (see Table 8.1).

Table 8.1 Conclusions and consequences for testing a hypothesis

                                           Conclusions
"State of Nature"               Do not reject          Reject
                                Null Hypothesis        Null Hypothesis

Null Hypothesis True            Correct conclusion     Type I error
Alternative Hypothesis True     Type II error          Correct conclusion

The kind of error that can be made depends on the actual state of affairs (which, of course, is unknown to the investigator). Note that we risk a Type I error only if the null hypothesis is rejected, and we risk a Type II error only if the null hypothesis is not rejected. Thus, we may make no error, or we may make either a Type I error (with probability α) or a Type II error (with probability β), but not both. Because we do not know which state of nature actually holds, we would like to keep the probabilities of both types of error small. There is an intuitively appealing relationship between the probabilities for the two types of error: for a fixed sample size, as α increases, β decreases; similarly, as β increases, α decreases. The only way to reduce α and β simultaneously is to increase the amount of information available in the sample, i.e., to increase the sample size.
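
The trade-off between α and β, and the effect of sample size, can be illustrated with a small calculation. The sketch below assumes an upper-tailed z test of H0: µ = µ0 against Ha: µ > µ0 with a known standard deviation, a setting chosen purely for illustration; the function name type_ii_error, the shift of 0.3, and the value sigma = 1 are assumptions and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, n, effect, sigma=1.0):
    """beta for an upper-tailed z test of H0: mu = mu0 vs Ha: mu > mu0.

    effect is the assumed true shift mu - mu0 (positive);
    sigma is the population standard deviation, assumed known.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # H0 is rejected when z > z_crit
    shift = effect * sqrt(n) / sigma         # mean of z when the alternative holds
    return std_normal.cdf(z_crit - shift)    # P(z <= z_crit) under the alternative


for n in (25, 100):
    for alpha in (0.01, 0.05, 0.10):
        beta = type_ii_error(alpha, n, effect=0.3)
        print(f"n = {n:3d}  alpha = {alpha:.2f}  beta = {beta:.3f}")
```

Within each sample size, a larger α gives a smaller β; moving from n = 25 to n = 100 lowers β at every α, which is the sense in which more data reduces both error probabilities together.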

Example 8.4 Refer to Example 8.3. Specify what Type I and Type II errors would represent, in terms of the problem.

Solution A Type I error is the error of incorrectly rejecting the null hypothesis. In our example, this would occur if we conclude that the process is out of control when in fact the process is in control, i.e., if we conclude that the mean bearing diameter is different from .5 inch, when in fact the mean is equal to .5 inch. The consequence of making such an error would be that unnecessary time and effort would be expended to repair the metal lathe.

A Type II error, that of failing to reject the null hypothesis when it is false, would occur if we conclude that the mean bearing diameter is equal to .5 inch when in fact the mean differs from .5 inch. The practical significance of making a Type II error is that the metal lathe would not be repaired, when in fact the process is out of control.

The probability of making a Type I error (α) can be controlled by the researcher (how to do this will be explained in Section 8.4). The value of α is often used as a measure of the reliability of the conclusion and is called the level of significance (or significance level) for a hypothesis test.
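
What it means for the researcher to control α can be seen in a small simulation. The sketch below is not the procedure of Section 8.4; it assumes a two-tailed z test for the lathe of Example 8.3 with a known standard deviation of .01 inch and samples of 30 bearings, all of which are hypothetical values, as is the function name estimate_type_i_rate.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def estimate_type_i_rate(alpha, n=30, sigma=0.01, trials=20000, seed=1):
    """Fraction of samples that reject H0: mu = .5 when H0 is in fact true.

    n, sigma, trials and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    mu0 = 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a process that really is in control.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                         # a Type I error
    return rejections / trials


for alpha in (0.01, 0.05, 0.10):
    print(f"chosen alpha = {alpha:.2f}  observed rate = {estimate_type_i_rate(alpha):.4f}")
```

Under these assumptions the observed proportion of false rejections should land close to whichever α was chosen, since the cutoff is derived directly from α.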

You may note that we have carefully avoided stating a decision in terms of "accept the null hypothesis H0." Instead, if the sample does not provide enough evidence to support the alternative hypothesis Ha, we prefer a decision "not to reject H0." This is because, if we were to "accept H0," the reliability of the conclusion would be measured by β, the probability of Type II error. However, the value of β is not constant, but depends on the specific alternative value of the parameter and is difficult to compute in most testing situations.

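The dependence of β on the specific alternative value can be made concrete for the lathe of Example 8.3. The sketch below assumes a two-tailed z test with a known standard deviation of .01 inch, samples of 30 bearings, and α = .05; these values and the function name beta_two_tailed are hypothetical and serve only to illustrate the point.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_tailed(mu_true, mu0=0.5, sigma=0.01, n=30, alpha=0.05):
    """P(fail to reject H0: mu = mu0) when the true mean is mu_true.

    sigma, n and alpha are illustrative assumptions.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    delta = (mu_true - mu0) * sqrt(n) / sigma   # standardized true shift
    # z falls inside (-z_crit, z_crit) with this probability.
    return std_normal.cdf(z_crit - delta) - std_normal.cdf(-z_crit - delta)


for mu_true in (0.501, 0.503, 0.505, 0.510):
    print(f"true mean = {mu_true:.3f}  beta = {beta_two_tailed(mu_true):.3f}")
```

There is no single β for the test: a true mean close to .5 inch is very likely to go undetected, while a large departure is almost always caught. This is one reason the text prefers the conclusion "do not reject H0" to "accept H0."
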
In summary, we recommend the following procedure for formulating hypotheses and stating conclusions.

Formulating hypotheses and stating conclusions

1. State the hypothesis you wish to establish as the alternative hypothesis Ha.

2. The null hypothesis, H0, will be the opposite of Ha and will contain an equality sign.

3. If the sample evidence supports the alternative hypothesis, the null hypothesis will be rejected and the probability of having made an incorrect decision (when in fact H₀ is true) is α, a quantity that can be manipulated to be as small as the researcher wishes.

4. If the sample does not provide sufficient evidence to support the alternative hypothesis, then conclude that the null hypothesis cannot be rejected on the basis of your sample. In this situation, you may wish to collect more information about the phenomenon under study.

Example 8.5 The logic used in hypothesis testing has often been likened to that used in the courtroom in which a defendant is on trial for committing a crime.

a. Formulate appropriate null and alternative hypotheses for judging the guilt or innocence of the defendant.

b. Interpret the Type I and Type II errors in this context.

c. If you were the defendant, would you want α to be small or large? Explain.

Solution

a. Under a judicial system, a defendant is "innocent until proven guilty." That is, the burden of proof is not on the defendant to prove his or her innocence; rather, the court must collect sufficient evidence to support the claim that the defendant is guilty. Thus, the null and alternative hypotheses would be

H0: Defendant is innocent

Ha: Defendant is guilty

b. The four possible outcomes are shown in Table 8.2. A Type I error would be to conclude that the defendant is guilty, when in fact he or she is innocent; a Type II error would be to conclude that the defendant is innocent, when in fact he or she is guilty.

Table 8.2 Conclusions and consequences in Example 8.5

                                      Decision of Court
True State of Nature        Defendant is innocent     Defendant is guilty

Defendant is innocent       Correct decision          Type I error
Defendant is guilty         Type II error             Correct decision

c. Most would probably agree that the Type I error in this situation is by far the more serious. Thus, we would want α, the probability of committing a Type I error, to be very small indeed.

A convention that is generally observed when formulating the null and alternative hypotheses of any statistical test is to state H₀ so that the possible error of incorrectly rejecting H₀ (Type I error) is considered more serious than the possible error of incorrectly failing to reject H₀ (Type II error). In many cases, the decision as to which type of error is more serious is admittedly not as clear-cut as that of Example 8.5; experience will help to minimize this potential difficulty.
