There are two kinds of errors that can be made in significance testing:
(1) a true null hypothesis can be incorrectly rejected and (2) a false null hypothesis can fail to be rejected.
The former error is called a Type I error and the latter error is called a Type II error.
These two types of errors are defined in the table.
The probability of a Type I error is designated by the Greek letter alpha (α) and is called the Type I error rate; the probability of a Type II error is designated by the Greek letter beta (β), and is called the Type II error rate.
A Type II error is only an error in the sense that an opportunity to reject the null hypothesis correctly was lost. It is not an error in the sense that an incorrect conclusion was drawn, since no conclusion is drawn when the null hypothesis is not rejected. A Type I error, on the other hand, is an error in every sense of the word: a conclusion is drawn that the null hypothesis is false when, in fact, it is true. Therefore, Type I errors are generally considered more serious than Type II errors. The probability of a Type I error (α) is called the significance level and is set by the experimenter.
There is a tradeoff between Type I and Type II errors. The more an experimenter protects himself or herself against Type I errors by choosing a low α level, the greater the chance of a Type II error. Requiring very strong evidence to reject the null hypothesis makes it very unlikely that a true null hypothesis will be rejected. However, it increases the chance that a false null hypothesis will not be rejected, thus lowering power. The Type I error rate is almost always set at .05 or at .01, the latter being more conservative since it requires stronger evidence to reject the null hypothesis at the .01 level than at the .05 level.
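The tradeoff can be made concrete with a small calculation. The sketch below assumes a two-tailed z test (a normal sampling distribution with known standard error) and a hypothetical true effect equal to 2.5 standard errors; neither assumption comes from the text. It computes β as the probability that the test statistic still lands in the non-rejection region when the null hypothesis is false.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def type_ii_error_rate(alpha: float, shift: float) -> float:
    """Beta for a two-tailed z test when the true mean lies `shift`
    standard errors above the value specified in the null hypothesis."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    # Under the alternative, the z statistic is normal with mean `shift`;
    # beta is the probability it still falls between -z_crit and +z_crit.
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


for alpha in (0.05, 0.01):
    beta = type_ii_error_rate(alpha, shift=2.5)  # hypothetical effect size
    print(f"alpha = {alpha:.2f}: beta = {beta:.3f}, power = {1 - beta:.3f}")
```

In this hypothetical case, lowering α from 0.05 to 0.01 raises β from about 0.29 to about 0.53, so power drops from roughly 0.71 to roughly 0.47.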
One- and Two-Tailed Tests
In the section on "Steps in hypothesis testing", the fourth step involves calculating the probability that a statistic would differ from the parameter specified in the null hypothesis by as much as or more than the statistic obtained in the experiment does. This statement implies that a difference in either direction would be counted. That is, if the null hypothesis were H0: μ1 - μ2 = 0, and the value of the statistic M1 - M2 were +5, then the probability of M1 - M2 differing from zero by five or more (in either direction) would be computed.
In other words, the probability value would be the probability that either M1 - M2 ≥ 5 or M1 - M2 ≤ -5. Assume that the figure shown below is the sampling distribution of M1 - M2.
The figure shows that the probability of a value of +5 or more is 0.036 and that the probability of a value of -5 or less is .036. Therefore the probability of a value either greater than or equal to +5 or less than or equal to -5 is 0.036 + 0.036 = 0.072.
A probability computed considering differences in both directions is called a "two-tailed" probability. The name makes sense since both tails of the sampling distribution are considered. There are situations in which an experimenter is concerned only with differences in one direction. For example, an experimenter may be concerned with whether or not μ1 - μ2 is greater than zero. However, if μ1 - μ2 is not greater than zero, the experimenter may not care whether it equals zero or is less than zero. For instance, if a new drug treatment is developed, the main issue is whether or not it is better than a placebo. If the treatment is not better than a placebo, then it will not be used. It does not really matter whether or not it is worse than the placebo.
When only one direction is of concern to an experimenter, then a "one-tailed" test can be performed. If an experimenter were only concerned with whether or not μ1 - μ2 is greater than zero, then the one-tailed test would involve calculating the probability of obtaining a statistic as great as or greater than the one obtained in the experiment.
In the example, the one-tailed probability would be the probability of obtaining a value of M1 - M2 greater than or equal to five given that the difference between population means is zero.
The shaded area in the figure represents values of five or more; the figure shows that this one-tailed probability is 0.036.
It is easier to reject the null hypothesis with a one-tailed than with a two-tailed test as long as the effect is in the specified direction. Therefore, one-tailed tests have lower Type II error rates and more power than do two-tailed tests. In this example, the one-tailed probability (0.036) is below the conventional significance level of 0.05 whereas the two-tailed probability (0.072) is not. For a symmetric sampling distribution such as this one, probability values for one-tailed tests are one half the value for two-tailed tests as long as the effect is in the specified direction.
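The figure's numbers can be approximated in a few lines. The figure does not state the standard error, so the sketch below assumes a normal sampling distribution centered at zero with a standard error of 2.78, a value chosen only because it makes the upper-tail area come out near 0.036. Because the distribution is symmetric, the two-tailed value is exactly twice the one-tailed value.

```python
from statistics import NormalDist


def tail_probabilities(observed: float, se: float) -> tuple[float, float]:
    """One- and two-tailed probability values for an observed M1 - M2 when,
    under H0, M1 - M2 is normal with mean 0 and standard deviation se."""
    null_dist = NormalDist(mu=0.0, sigma=se)
    # One-tailed, predicting mu1 - mu2 > 0: P(M1 - M2 >= observed).
    one_tailed = 1 - null_dist.cdf(observed)
    # Two-tailed: P(M1 - M2 >= |observed|) + P(M1 - M2 <= -|observed|).
    two_tailed = 2 * (1 - null_dist.cdf(abs(observed)))
    return one_tailed, two_tailed


one, two = tail_probabilities(observed=5.0, se=2.78)  # se is an assumption
print(f"one-tailed = {one:.3f}, two-tailed = {two:.3f}")
# one-tailed = 0.036, two-tailed = 0.072
```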
One-tailed and two-tailed tests have the same Type I error rate. One-tailed tests are sometimes used when the experimenter predicts the direction of the effect in advance.
This use of one-tailed tests is questionable because the experimenter can only reject the
null hypothesis if the effect is in the predicted direction. If the effect is in the other direction, then the null hypothesis cannot be rejected no matter how strong the effect is.
A skeptic might question whether the experimenter would really fail to reject the null hypothesis if the effect were strong enough in the wrong direction. Frequently the most interesting aspect of an effect is that it runs counter to expectations. Therefore, an experimenter who committed himself or herself to ignoring effects in one direction may be forced to choose between ignoring a potentially important finding and using the techniques of statistical inference dishonestly. One-tailed tests are not used frequently.
Unless otherwise indicated, a test should be assumed to be two-tailed.
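The claim that one- and two-tailed tests share the same Type I error rate can be checked by simulation. This sketch assumes a z test of H0: μ = 0 with known σ = 1, samples of 25 drawn from a population in which the null hypothesis is true, and α = 0.05; the sample size, seed, and number of simulations are arbitrary choices. Both rejection rates should land near 0.05.

```python
import random
from statistics import NormalDist


def type_i_error_rates(n_sims: int = 20_000, n: int = 25,
                       alpha: float = 0.05, seed: int = 1) -> tuple[float, float]:
    """Fraction of simulated samples in which a true H0 (mu = 0) is rejected
    by a one-tailed (upper) and by a two-tailed z test."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    one_tailed_crit = std_normal.inv_cdf(1 - alpha)      # about 1.645
    two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)  # about 1.960
    one_rejections = two_rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 is true
        z = (sum(sample) / n) / (1.0 / n ** 0.5)          # sigma = 1 is known
        if z >= one_tailed_crit:
            one_rejections += 1
        if abs(z) >= two_tailed_crit:
            two_rejections += 1
    return one_rejections / n_sims, two_rejections / n_sims


one_rate, two_rate = type_i_error_rates()
print(f"one-tailed: {one_rate:.3f}, two-tailed: {two_rate:.3f}")
```

The two tests reject in different samples (one only in the upper tail, the other in both tails), but when the null hypothesis is true each rejects it about 5% of the time.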
Confidence Intervals & Hypothesis Testing
There is an extremely close relationship between confidence intervals and hypothesis testing. When a 95% confidence interval is constructed, all values in the interval are considered plausible values for the parameter being estimated. Values outside the interval are rejected as relatively implausible. If the value of the parameter specified by the null hypothesis is contained in the 95% interval then the null hypothesis cannot be rejected at the 0.05 level. If the value specified by the null hypothesis is not in the interval then the null hypothesis can be rejected at the 0.05 level. If a 99% confidence interval is constructed, then values outside the interval are rejected at the 0.01 level.
Imagine a researcher wishing to test the null hypothesis that the mean time to respond to an auditory signal is the same as the mean time to respond to a visual signal. The null hypothesis therefore is: μ visual – μ auditory = 0.
Ten subjects were tested in the visual condition and their scores (in milliseconds) were:
355, 421, 299, 460, 600, 580, 474, 511, 550, and 586.
Ten subjects were tested in the auditory condition and their scores were: 275, 320, 278, 360, 430, 520, 464, 311, 529, and 326.
The 95% confidence interval on the difference between means is: 9 ≤ μ visual – μ auditory ≤ 196.
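The interval, and the probability value of 0.034 reported for this example, can be reproduced from the raw scores. The sketch below assumes the usual pooled-variance (equal-variance) t interval with 18 degrees of freedom, which matches the reported limits; the text does not name the method. Because Python's standard library has no t distribution, t_cdf and t_ppf are hypothetical helper names for a Simpson's-rule integration of the t density and a bisection search; scipy.stats.t would serve the same purpose where SciPy is available.

```python
import math
from statistics import mean, variance


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about zero."""
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_ppf(p: float, df: int) -> float:
    """Inverse of t_cdf for 0.5 < p < 1, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


visual = [355, 421, 299, 460, 600, 580, 474, 511, 550, 586]
auditory = [275, 320, 278, 360, 430, 520, 464, 311, 529, 326]

n1, n2 = len(visual), len(auditory)
df = n1 + n2 - 2
diff = mean(visual) - mean(auditory)  # M_visual - M_auditory
# Pooled variance, then the standard error of the difference between means.
pooled_var = ((n1 - 1) * variance(visual) + (n2 - 1) * variance(auditory)) / df
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_crit = t_ppf(0.975, df)  # two-tailed 95% critical value, about 2.101
lower, upper = diff - t_crit * se, diff + t_crit * se
t_stat = diff / se
p_value = 2 * (1 - t_cdf(abs(t_stat), df))  # two-tailed probability value

print(f"95% CI: {lower:.0f} to {upper:.0f}, t({df}) = {t_stat:.2f}, p = {p_value:.3f}")
# 95% CI: 9 to 196, t(18) = 2.30, p = 0.034
```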
Therefore only values in the interval between 9 and 196 are retained as plausible values for the difference between population means. Since zero, the value specified by the null hypothesis, is not in the interval, the null hypothesis of no difference between auditory and visual presentation can be rejected at the 0.05 level. The probability value for this example is 0.034. Any time the parameter specified by a null hypothesis is not contained in the 95% confidence interval estimating that parameter, the null hypothesis can be rejected at the 0.05 level or less. Similarly, if the 99% interval does not contain the parameter then the null hypothesis can be rejected at the 0.01 level. The null hypothesis is not rejected if the parameter value specified by the null hypothesis is in the interval since the null hypothesis would still be plausible.
However, since the null hypothesis would be only one of an infinite number of values in the confidence interval, accepting the null hypothesis is not justified. There are many arguments against accepting the null hypothesis when it is not rejected.
The null hypothesis is usually a hypothesis of no difference. Thus null hypotheses such as
μ1 - μ2 = 0 and π1 - π2 = 0,
in which the hypothesized value is zero, are most common. When the hypothesized value is zero, there is a simple relationship between hypothesis testing and confidence intervals:
If the interval contains zero then the null hypothesis cannot be rejected at the stated level of confidence. If the interval does not contain zero then the null hypothesis can be rejected.
This is just a special case of the general rule stating that the null hypothesis can be rejected if the interval does not contain the hypothesized value of the parameter and cannot be rejected if the interval contains the hypothesized value. For example, consider a 95% confidence interval on μ1 - μ2 whose lower limit is negative and whose upper limit is positive. Since zero is contained in this interval, the null hypothesis that μ1 - μ2 = 0 cannot be rejected at the 0.05 level, because zero is one of the plausible values of μ1 - μ2. The interval contains both positive and negative numbers, and therefore μ1 may be either larger or smaller than μ2. None of the three possible relationships between μ1 and μ2:
μ1 - μ2 = 0, μ1 - μ2 > 0, and μ1 - μ2 < 0
can be ruled out. The data are very inconclusive. Whenever a significance test fails to reject the null hypothesis, the direction of the effect (if there is one) is unknown.
Now, consider the 95% confidence interval:
6 ≤ μ1 - μ2 ≤ 15
Since zero is not in the interval, the null hypothesis that μ1 - μ2 = 0 can be rejected at the 0.05 level. Moreover, since all the values in the interval are positive, the direction of the effect can be inferred: μ1 > μ2.
Whenever a significance test rejects the null hypothesis that a parameter is zero, the confidence interval on that parameter will not contain zero. Therefore either all the values in the interval will be positive or all the values in the interval will be negative. In either case, the direction of the effect is known.
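The rules in this section reduce to a simple check on the interval's limits. In the sketch below, interpret_interval is a hypothetical helper name, and the interval from -4 to 7 is a made-up example of an interval that contains zero; the other two intervals come from the text.

```python
def interpret_interval(lower: float, upper: float, null_value: float = 0.0) -> str:
    """Read a confidence interval as a significance test of H0: parameter = null_value.

    The test's level is 1 minus the interval's confidence level
    (e.g., a 95% interval gives a test at the 0.05 level).
    """
    if lower <= null_value <= upper:
        # The null value is still plausible, so the direction of any effect is unknown.
        return "cannot reject H0; direction unknown"
    if lower > null_value:
        return "reject H0; parameter is greater than the null value"
    return "reject H0; parameter is less than the null value"


print(interpret_interval(6, 15))   # all values positive: mu1 > mu2
print(interpret_interval(9, 196))  # visual minus auditory interval
print(interpret_interval(-4, 7))   # hypothetical interval containing zero
```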
Define the Decision Rule and the Region of Acceptance
The decision rule consists of two parts: (1) a test statistic and (2) a range of values, called the region of acceptance. The decision rule determines whether a null hypothesis is accepted or rejected. If the test statistic falls within the region of acceptance, the null hypothesis is accepted; otherwise, it is rejected.
We define the region of acceptance in such a way that the chance of making a Type I error is equal to the significance level. Here is how that is done:
♦ Given the significance level α, find the upper limit (UL) of the region of acceptance.
There are three possibilities, depending on the form of the null hypothesis:
i. If the null hypothesis is μ ≤ M: The upper limit of the region of acceptance will be equal to the value for which the cumulative probability of the sampling distribution is equal to one minus the significance level. That is, P(x < UL) = 1 - α.
ii. If the null hypothesis is μ = M: The upper limit of the region of acceptance will be equal to the value for which the cumulative probability of the sampling distribution is equal to one minus the significance level divided by 2. That is, P(x < UL) = 1 - α/2.
iii. If the null hypothesis is μ ≥ M: The upper limit of the region of acceptance is equal to plus infinity.
♦ In a similar way, we find the lower limit (LL) of the region of acceptance. Again, there are three possibilities, depending on the form of the null hypothesis.
i. If the null hypothesis is μ ≤ M: The lower limit of the region of acceptance is equal to minus infinity.
ii. If the null hypothesis is μ = M: The lower limit of the region of acceptance will be equal to the value for which the cumulative probability of the sampling distribution is equal to the significance level divided by 2. That is, P(x < LL) = α/2.
iii. If the null hypothesis is μ ≥ M: The lower limit of the region of acceptance will be equal to the value for which the cumulative probability of the sampling distribution is equal to the significance level. That is, P(x < LL) = α.
The region of acceptance is defined by the range between LL and UL.
Accept or Reject the Null Hypothesis
Once the region of acceptance is defined, the null hypothesis can be tested against sample data. Simply compute the test statistic. In this case, the test statistic is the sample mean. If the sample mean falls within the region of acceptance, the null hypothesis is accepted; if not, it is rejected.
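The recipe for LL and UL, together with the accept/reject step, can be sketched as follows. It assumes a normal sampling distribution centered at M (the large-sample case discussed under Other Considerations); the function names and the string codes for the three forms of the null hypothesis are hypothetical, and the numbers in the usage line are made up.

```python
import math
from statistics import NormalDist


def region_of_acceptance(m: float, se: float, alpha: float, form: str) -> tuple[float, float]:
    """Return (LL, UL) for H0: mu <= M (form "<="), mu = M ("="), or mu >= M (">=")."""
    sampling_dist = NormalDist(mu=m, sigma=se)  # centered at the null value M
    if form == "<=":  # one-tailed: reject only if the sample mean is too big
        return -math.inf, sampling_dist.inv_cdf(1 - alpha)
    if form == "=":   # two-tailed: split alpha between the two tails
        return sampling_dist.inv_cdf(alpha / 2), sampling_dist.inv_cdf(1 - alpha / 2)
    if form == ">=":  # one-tailed: reject only if the sample mean is too small
        return sampling_dist.inv_cdf(alpha), math.inf
    raise ValueError(f"unknown form of null hypothesis: {form!r}")


def decide(sample_mean: float, region: tuple[float, float]) -> str:
    """Compare the test statistic (the sample mean) with the region of acceptance."""
    ll, ul = region
    return "accept H0" if ll <= sample_mean <= ul else "reject H0"


region = region_of_acceptance(m=100, se=2, alpha=0.05, form="=")  # hypothetical values
print(region, decide(104.5, region))
```

Returning infinite limits for the one-tailed forms lets the same decide function handle all three cases without special branches.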
Other Considerations
When one tests a hypothesis in the real world, other issues may come into play. Here are some suggestions that may be helpful.
♦ You will need to make an assumption about the sampling distribution of the mean score. If the sample is relatively large (i.e., greater than or equal to 30), you can assume, based on the central limit theorem, that the sampling distribution will be roughly
normal. On the other hand, if the sample size is small (less than 30) and if the population random variable is approximately normally distributed (i.e., has a bell-shaped curve), you can transform the mean score into a t-score. The t-score will have a t-distribution.
♦ Assume that the mean of the sampling distribution is equal to the test value M specified in the null hypothesis.
♦ In some situations, you may need to compute the standard deviation of the sampling distribution, sx. If the standard deviation of the population σ is known, then sx = σ × sqrt[(1/n) - (1/N)], where n is the sample size and N is the population size. On the other hand, if the standard deviation of the population σ is unknown, then sx = s × sqrt[(1/n) - (1/N)], where s is the sample standard deviation.
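The two formulas differ only in which standard deviation is plugged in, so one helper covers both. In this sketch, standard_error is a hypothetical name; passing N = math.inf drops the finite-population term, since 1/N is then zero, which is the assumption Example 1 makes. The usage lines apply the formula to the figures given in Examples 1 and 2.

```python
import math


def standard_error(sd: float, n: int, N: float = math.inf) -> float:
    """sx = sd * sqrt[(1/n) - (1/N)], where sd is sigma if known, otherwise s."""
    if n > N:
        raise ValueError("sample size n cannot exceed population size N")
    return sd * math.sqrt(1 / n - 1 / N)  # 1 / math.inf == 0.0


print(round(standard_error(20, 50), 2))       # Example 1, N treated as very large: 2.83
print(round(standard_error(10, 20, 300), 2))  # Example 2 figures, N = 300: 2.16
```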
Example 1
An inventor has developed a new, energy-efficient lawn mower engine. He claims that the engine will run continuously for 5 hours (300 minutes) on a single gallon of regular gasoline. Suppose a random sample of 50 engines is tested. The engines run for an average of 295 minutes, with a standard deviation of 20 minutes. Test the null hypothesis that the mean run time is 300 minutes against the alternative hypothesis that the mean run time is not 300 minutes. Use a 0.05 level of significance.
Solution
There are four steps in conducting a hypothesis test, as described in the previous sections. We work through those steps below:
1. Formulate hypotheses
The first step is to state the null hypothesis and an alternative hypothesis.
Null hypothesis: μ = 300 minutes
Alternative hypothesis: μ ≠ 300 minutes
Note that these hypotheses constitute a two-tailed test. The null hypothesis will be rejected if the sample mean is too big or if it is too small.
2. Identify the test statistic
In this example, the test statistic is the mean run time of the 50 engines in the sample: 295 minutes.
3. Define the decision rule
The decision rule consists of two parts: (1) a test statistic and (2) a range of values, called the region of acceptance. We already know that the test statistic is a sample mean equal to 295. All that remains is to describe the region of acceptance; that is, to define the lower limit and the upper limit of the region. Here is how that is done.
a. Specify the sampling distribution. Since the sample size is large (greater than or equal to 30), we assume that the sampling distribution of the mean is normal, based on the central limit theorem.
b. Define the mean of the sampling distribution. We assume that the mean of the sampling distribution is equal to the mean value that appears in the null
hypothesis - 300 minutes.
c. Compute the standard deviation of the sampling distribution. Since the population standard deviation is unknown, the standard deviation of the sampling distribution sx is:
sx = s × sqrt[(1/n) - (1/N)]
sx = 20 × sqrt[1/50] = 2.83
where s is the sample standard deviation, n is the sample size, and N is the population size. In this example, we assume that the population size N is very large, so that the value 1/N is about zero.
d. Find the lower limit of the region of acceptance. Given a two-tailed hypothesis, the lower limit (LL) will be equal to the value for which the cumulative probability of the sampling distribution is equal to the significance level divided by 2. That is, P(x < LL) = α/2 = 0.05/2 = 0.025. To find this lower limit, we use the Normal Distribution table: the z-score with a cumulative probability of 0.025 is -1.96. With mean = 300 and standard deviation = 2.83, the lower limit is LL = 300 - 1.96 × 2.83 = 294.45.
e. Find the upper limit of the region of acceptance. Given a two-tailed hypothesis, the upper limit (UL) will be equal to the value for which the cumulative probability of the sampling distribution is equal to one minus the significance level divided by 2. That is, P(x < UL) = 1 - α/2 = 1 - 0.025 = 0.975. From the Normal Distribution table, the z-score with a cumulative probability of 0.975 is 1.96, so the upper limit is UL = 300 + 1.96 × 2.83 = 305.55.
f. Thus, we have determined that the region of acceptance is defined by the values between 294.45 and 305.55.
4. Accept or reject the null hypothesis
The sample mean in this example was 295 minutes. This value falls within the region of acceptance. Therefore, we cannot reject the null hypothesis that a new engine runs for 300 minutes on a gallon of gasoline.
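The steps of Example 1 can be checked numerically. This sketch uses Python's statistics.NormalDist in place of the Normal Distribution table and keeps sx unrounded, so the limits come out as 294.46 and 305.54 rather than 294.45 and 305.55; the difference in the last digit comes only from rounding sx to 2.83 in the text.

```python
import math
from statistics import NormalDist

null_mean = 300.0   # minutes, from H0: mu = 300
s, n = 20.0, 50     # sample standard deviation and sample size
sample_mean = 295.0
alpha = 0.05

sx = s * math.sqrt(1 / n)  # population assumed very large, so 1/N is about 0
sampling_dist = NormalDist(mu=null_mean, sigma=sx)
ll = sampling_dist.inv_cdf(alpha / 2)      # P(x < LL) = 0.025
ul = sampling_dist.inv_cdf(1 - alpha / 2)  # P(x < UL) = 0.975

decision = "cannot reject H0" if ll <= sample_mean <= ul else "reject H0"
print(f"sx = {sx:.2f}, region = ({ll:.2f}, {ul:.2f}): {decision}")
# sx = 2.83, region = (294.46, 305.54): cannot reject H0
```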
Example 2
Bon Air Elementary School has 300 students. The principal of the school thinks that the average IQ of students at Bon Air is at least 110. To prove her point, she administers an IQ test to 20 randomly selected students. Among the sampled students, the average IQ is 108 with a standard deviation of 10. Based on these results, should the principal accept or reject her original hypothesis? Assume a significance level of 0.01.
Solution
There are four steps in conducting a hypothesis test, as described in the previous sections. We work through those steps below:
1. Formulate hypotheses.
The first step is to state the null hypothesis and an alternative hypothesis.
Null hypothesis: μ ≥ 110
Alternative hypothesis: μ < 110
Note that these hypotheses constitute a one-tailed test. The null hypothesis will be rejected if the sample mean is too small.
2. Identify the test statistic.
In this example, the test statistic is the mean IQ score of the 20 students in the sample.
Thus, the test statistic is the mean IQ score of 108.
3. Define the decision rule. The decision rule consists of two parts: (1) a test statistic and (2) a range of values, called the region of acceptance. We already know that the test statistic is a sample mean equal to 108. All that remains is to describe the region of acceptance; that is, to define the lower limit and the upper limit of the region. Here is how that is done.
a. Specify the sampling distribution. Since the sample size is small (less than 30), we assume that IQ scores are approximately normally distributed in the population and transform the sample mean into a t-score, which follows a t-distribution.
b. Define the mean of the sampling distribution. We assume that the mean of the sampling distribution is equal to the mean value that appears in the null
hypothesis - 110.
c. Find the lower limit of the region of acceptance. Given a one-tailed hypothesis, the lower limit (LL) will be equal to the value for which the cumulative
probability of the sampling distribution is equal to the significance level. That is,