Estimation: Hypothesis Testing
Dr. Patrick Toche
References:
Douglas A. Lind, William G. Marchal, Samuel A. Wathen, Statistical Techniques in Business and Economics, The McGraw-Hill/Irwin Series in Operations and Decision Sciences, 17th edition (2017), ISBN 978-1259666360.
Other references may be given from time to time.
Learning Objectives
1. Explain the process of testing a hypothesis.
2. Interpret Type-I and Type-II errors.
3. Distinguish between a one-tailed and a two-tailed test.
4. Test a hypothesis about a population mean.
5. Compute and interpret a z-score and a t-statistic.
6. Compute and interpret a p-value.
7. Compute the probability of a Type-II error.
Hypothesis Testing
Hypothesis Testing
A hypothesis is a statement about a population parameter subject to verification. Hypothesis testing is a procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement.
• Hypothesis testing does not provide definitive mathematical proof, but it does provide levels of confidence about the validity of a statement. Specific rules are followed to gather the evidence.
• The first step in hypothesis testing is to set the null hypothesis H₀ and the alternative H₁. The subscript 0 stands for “no change”; the subscript 1 stands for the negation.
• The null hypothesis H₀ is retained unless the sample evidence would be “improbable” if H₀ were true, that is, less likely than a set threshold: the significance level α.
Six-Step Procedure
1. Set the null hypothesis and the alternate hypothesis.
2. Set the desired significance level.
3. Identify the test distribution.
4. Identify the acceptance/rejection regions.
5. Compute the test statistic from the sample data.
6. Interpret the test results.
1. Set Null & Alternate Hypotheses
Null Hypothesis H₀
A statement about the value of a population parameter that may or may not be true. The validity of the null hypothesis is to be assessed on the basis of sample data. Example: µ = µ₀, where µ is the true population value and µ₀ is the hypothetical value.
Alternate Hypothesis H₁
A statement about the value of a population parameter that complements the null hypothesis. It will be considered true if the evidence suggests the null hypothesis is false. Example: µ ≠ µ₀.
• The conclusion of a test is to either “reject” or “fail to reject” the null hypothesis. A test cannot prove with certainty that the null is true, only that the test failed to reject the null. The value of failing to reject a hypothesis depends on the power of the test.
1. One-Tailed Versus Two-Tailed Hypotheses
• Two-Tailed:
H₀: µ = µ₀
H₁: µ ≠ µ₀
Evidence of µ < µ₀ or µ > µ₀ counts against H₀.
• Left-Tailed:
H₀: µ ≥ µ₀
H₁: µ < µ₀
Evidence of µ < µ₀ counts against H₀, but evidence of µ > µ₀ counts in favor of it. Sometimes H₀ is written as an equality even when H₁ is an inequality, which may be confusing.
• Right-Tailed: The direction of the inequalities is reversed.
2. Set Level of Significance
Significance Level
The probability of rejecting the null hypothesis when it is true.
• The level of significance is designated α. It measures the risk of incorrectly rejecting the null hypothesis, that is, of committing a Type-I error.
• The level of significance is related to the level of confidence:
significance = 1 − confidence
• Significance levels of 0.01, 0.05, and 0.10 are popular.
– α = 0.001: high-risk medical procedures.
– α = 0.01: quality assurance.
– α = 0.05: consumer research projects.
– α = 0.10 or higher: political polling.
2. Set Level of Significance
Type-I Error
Incorrectly rejecting the null hypothesis H₀ when the null is true. Its probability is denoted α. Example: medical false positive.
Type-II Error
Incorrectly not rejecting the null hypothesis H₀ when the null is actually false. Its probability is denoted β. Example: medical false negative.
• Example of Type-I Error: A medical test suggests a patient has a disease when, in fact, the patient does not have the disease.
• Example of Type-II Error: A medical test suggests a patient does not have a disease when, in fact, the patient does.
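The meaning of α as a Type-I error rate can be illustrated by simulation. The sketch below is not part of the original slides: it assumes a hypothetical standard normal population for which H₀ is true by construction, runs many two-tailed z-tests at α = 0.05, and counts how often the true null is rejected. The function name, sample size, number of trials, and seed are all illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, n=30, trials=5000, seed=0):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    mu0, sigma = 0.0, 1.0                         # hypothetical population: H0 holds
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:                       # falls in the rejection region
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"estimated Type-I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen α, which is what "the probability of rejecting the null hypothesis when it is true" means in practice.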
3. Identify the Test Distribution
Test Statistic
A value, determined from sample information, used to determine whether to reject the null hypothesis. The sampling distribution of the test statistic, under the null hypothesis, is used to calculate critical values and p-values that form the basis of a probability-based decision to reject (or not) the null hypothesis.
• Two classes of test statistics used to test hypotheses about the population mean are:
– Standard Normal z-statistic: used when the sampling distribution of the sample mean is normal and the population standard deviation σ is known and constant.
– Student t-statistic: used when the sampling distribution of the sample mean is normal and the standard deviation is estimated from a single sample by $s_{n-1}$.
4. Identify Acceptance/Rejection Region
Critical value
A value that separates the acceptance and rejection regions. For symmetric, two-tailed tests, there are two critical values set at equal distance from the center of the distribution, and the rejection region consists of the left and right tails.
• If the null hypothesis is true, the probability of sampling values in the rejection region is equal to the chosen level of significance (or at most that level, when the null is stated as an inequality).
• The size of the regions depends on the selected significance level. The smaller the significance level, the smaller the rejection region and the more “demanding” the test is. For instance, a significance level of α = 0.01 (confidence level of 99%) is associated with a greater critical value, and a smaller and more distant rejection region, than a level of α = 0.10 (confidence level of 90%).
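As a quick check on how the critical value grows as α shrinks, the two-tailed critical values of the standard normal distribution can be computed with Python's standard library. This is only a sketch; the helper name `two_tailed_critical_value` is an illustrative choice, not something from the slides.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def two_tailed_critical_value(alpha):
    """Positive critical value z such that P(|Z| > z) = alpha under N(0, 1)."""
    return std_normal.inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    z = two_tailed_critical_value(alpha)
    print(f"alpha = {alpha:.2f}  reject H0 if |z| > {z:.3f}")
```

Smaller significance levels push the critical values further from zero, so the rejection region shrinks and moves outward.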
4. Identify Acceptance/Rejection Region
[Figure: Acceptance/rejection region for the standard normal z-distribution, two-tailed, α = 0.05. Critical values at −1.960 and 1.960. H₀ is rejected in the left tail (Type-I error, α/2 = 0.025) and in the right tail (Type-I error, α/2 = 0.025), and cannot be rejected in between. Horizontal axis: z from −3 to 3; vertical axis: density.]
Example: A two-tailed test for the standard normal z-distribution: The null hypothesis H₀ is rejected if the sample statistic falls into either one of the tails. Each tail contains half of the possible Type-I errors, α/2. The critical values for a 95% confidence level for the z-distribution are $z_{\alpha/2} \approx -1.960$ and $z_{1-\alpha/2} \approx 1.960$.
5. Compute the Test Statistic
• Example: H₀: µ = µ₀, H₁: µ ≠ µ₀, σ is known, α = 0.05. This is a two-tailed test. Under H₀, the sample mean x̄ follows the normal distribution with mean µ₀ and standard error σ/√n. The test statistic is the z-score:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
and follows the z-distribution N(0, 1).
• A test statistic of z = 4.0 lies in the rejection region. If the null is true, the probability of drawing a z-score at least as large is very small. Thus, if the z-score were z = 4.0, we would reject H₀.
• A test statistic of z = 0.5 lies outside the rejection region. In this case, we cannot reject H₀: maybe the null is true, maybe our test lacks power.
6. Interpret the Results
• Humility is the key to a correct interpretation.
• If the sample data leads you to reject the null hypothesis, there are several possible explanations:
1. The null is indeed false.
2. The null is true, but unfortunately the sample used was an outlier.
• If the sample data fails to reject the null hypothesis, there are several possible explanations, including:
1. The null is indeed true.
2. The null is false, but the test does not have enough power. Example: The sample size is too small.
• Another possibility is that the selected significance level is too small (less likely to reject) or too large (more likely to reject).
• Yet another possibility is that the sampling procedure was flawed, e.g. dependent samples, finite population (sampling without replacement), sampling bias.
Practice: Two-Tailed Hypothesis Tests
• Example: Output at a plant typically follows a normal probability distribution with a mean of 200 and a standard deviation of 16. New techniques are adopted that could have improved or disrupted production. A sample of size 50 is drawn with a mean of 203.5. Test whether mean output has changed, at the 0.01 significance level.
1. State the null & alternate hypotheses:
H₀: µ = 200
H₁: µ ≠ 200
This is a two-tailed test because the alternate hypothesis does not explicitly state a direction: any evidence that mean production is significantly greater than 200 or significantly less than 200 will stack up against the null hypothesis.
Practice: Two-Tailed Hypothesis Tests
2. Set the level of significance:
α = 0.01
This is the probability of committing a Type-I error, that is, of incorrectly rejecting a true null hypothesis. The significance level α = 0.01 corresponds to a 99% confidence level.
3. Identify the test distribution: The standard deviation is known: σ = 16. Because output is normally distributed (and, for large samples, by the central limit theorem), the test statistic z has a standard normal distribution N(0, 1) under H₀.
4. Identify the rejection region:
Half of the probability α is located in each tail. With α = 0.01, the area where H₀ is not rejected is 0.99. For a two-tailed test, the z-distribution yields the critical values $\pm z_{1-\alpha/2} \approx \pm 2.576$.
Confidence level, c: 80% 90% 95% 98% 99% 99.9%
One-tailed level of significance, α: 0.10 0.05 0.025 0.01 0.005 0.0005
Two-tailed level of significance, α: 0.20 0.10 0.05 0.02 0.01 0.001
df (degrees of freedom), followed by the critical value of t for each column:
71 1.294 1.667 1.994 2.380 2.647 3.433
72 1.293 1.666 1.993 2.379 2.646 3.431
73 1.293 1.666 1.993 2.379 2.645 3.429
74 1.293 1.666 1.993 2.378 2.644 3.427
75 1.293 1.665 1.992 2.377 2.643 3.425
76 1.293 1.665 1.992 2.376 2.642 3.423
77 1.293 1.665 1.991 2.376 2.641 3.421
78 1.292 1.665 1.991 2.375 2.640 3.420
79 1.292 1.664 1.990 2.374 2.640 3.418
80 1.292 1.664 1.990 2.374 2.639 3.416
81 1.292 1.664 1.990 2.373 2.638 3.415
82 1.292 1.664 1.989 2.373 2.637 3.413
83 1.292 1.663 1.989 2.372 2.636 3.412
84 1.292 1.663 1.989 2.372 2.636 3.410
85 1.292 1.663 1.988 2.371 2.635 3.409
86 1.291 1.663 1.988 2.370 2.634 3.407
87 1.291 1.663 1.988 2.370 2.634 3.406
88 1.291 1.662 1.987 2.369 2.633 3.405
89 1.291 1.662 1.987 2.369 2.632 3.403
90 1.291 1.662 1.987 2.368 2.632 3.402
91 1.291 1.662 1.986 2.368 2.631 3.401
92 1.291 1.662 1.986 2.368 2.630 3.399
93 1.291 1.661 1.986 2.367 2.630 3.398
94 1.291 1.661 1.986 2.367 2.629 3.397
95 1.291 1.661 1.985 2.366 2.629 3.396
96 1.290 1.661 1.985 2.366 2.628 3.395
97 1.290 1.661 1.985 2.365 2.627 3.394
98 1.290 1.661 1.984 2.365 2.627 3.393
99 1.290 1.660 1.984 2.365 2.626 3.392
100 1.290 1.660 1.984 2.364 2.626 3.390
120 1.289 1.658 1.980 2.358 2.617 3.373
140 1.288 1.656 1.977 2.353 2.611 3.361
160 1.287 1.654 1.975 2.350 2.607 3.352
180 1.286 1.653 1.973 2.347 2.603 3.345
200 1.286 1.653 1.972 2.345 2.601 3.340
∞ 1.282 1.645 1.960 2.326 2.576 3.291
Student’s t Distribution (concluded); rows are indexed by degrees of freedom (df).
Areas under the Normal Curve
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
Example:
If z = 1.96, then P(0 to z) = 0.4750.
Practice: Two-Tailed Hypothesis Tests
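[Figure: Acceptance/rejection region for the standard normal z-distribution, significance level α = 0.01. Critical values at −2.576 and 2.576; reject in the two tails (1% of the probability in total), accept in between (99%). Horizontal axis: z from −4 to 4; vertical axis: density.]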
Practice: Two-Tailed Hypothesis Tests
5. Compute the test statistic from the sample data:
The z-score is:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{203.5 - 200}{16/\sqrt{50}} \approx 1.547$$
6. Interpret the results:
The z-score lies inside the “acceptance” region:
−2.576 < 1.547 < 2.576
We cannot reject the null H₀ that µ = 200 against the alternate µ ≠ 200. There isn’t enough evidence to conclude that the population mean has changed: the observed difference between the hypothesized population mean of 200 and the sample mean of 203.5 could simply be due to chance.
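The plant example can be reproduced in a few lines of Python. This is a minimal sketch using only the numbers stated in the example (µ₀ = 200, σ = 16, n = 50, x̄ = 203.5, α = 0.01); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 200, 16     # hypothesized mean and known population standard deviation
n, x_bar = 50, 203.5     # sample size and sample mean
alpha = 0.01             # significance level

standard_error = sigma / sqrt(n)
z = (x_bar - mu0) / standard_error             # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value

reject_null = abs(z) > z_crit
print(f"z = {z:.3f}, critical values = ±{z_crit:.3f}, reject H0: {reject_null}")
```

The computed z-score falls between the two critical values, so the sketch reaches the same conclusion as the text: H₀ is not rejected.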
Practice: One-Tailed Hypothesis Tests
• Example: Test for an increase in the population mean.
• State the null & alternate hypotheses:
H₀: µ ≤ 200
H₁: µ > 200
• Identify the distribution & acceptance region: The standard deviation is known, σ = 16, so the test statistic follows the standard normal z-distribution. The critical value for α = 0.01 is $z_{1-\alpha} \approx 2.326$.
• Interpret the results: Since z ≈ 1.547 < 2.326, we cannot reject the null H₀ that µ ≤ 200 against the alternate H₁ that µ > 200.
• One-Tailed Versus Two-Tailed Tests: In a one-tailed test, the entire rejection region lies in one tail. In a two-tailed test, the rejection region is split equally between the two tails, with significance α/2 in each tail. The critical values for one-tailed and two-tailed tests are therefore different, and could lead to different conclusions. In this example, however, both tests fail to reject the null.
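To see how the two kinds of test can disagree, the sketch below compares the right-tailed and two-tailed decisions at α = 0.01. The z-score 1.547 comes from the plant example; the value 2.4 is a hypothetical z-score chosen only because it falls between the two critical values. The function name `decisions` is illustrative.

```python
from statistics import NormalDist

alpha = 0.01
one_tailed_crit = NormalDist().inv_cdf(1 - alpha)      # right-tailed test
two_tailed_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed test

def decisions(z):
    """Return whether each test rejects H0 for the given z-score."""
    return {
        "right_tailed": z > one_tailed_crit,
        "two_tailed": abs(z) > two_tailed_crit,
    }

print(decisions(1.547))  # z-score from the plant example
print(decisions(2.4))    # hypothetical z-score between the two critical values
```

For z = 1.547 neither test rejects, while for the hypothetical z = 2.4 only the right-tailed test rejects, because its critical value is smaller.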
Practice: One-Tailed Hypothesis Tests
[Figure: Acceptance/rejection region for the standard normal z-distribution, one-tailed, significance level α = 0.01. Critical value at 2.326; reject in the right tail (1%), accept elsewhere (99%). Horizontal axis: z from −4 to 4; vertical axis: density.]
Population Variance Unknown
• Identify the test distribution: When the population standard deviation is unknown, the sample estimate $s_{n-1}$ is used to estimate the standard error. Under the null hypothesis, the test statistic has a Student t-distribution with n − 1 degrees of freedom.
• Compute the test statistic: The t-statistic is:
$$t = \frac{\bar{x}_n - \mu_0}{s_{n-1}/\sqrt{n}}$$
• The Student t-distribution is continuous, hump-shaped, and symmetrical. It has thicker tails than the standard normal distribution: the density is lower in the middle of the distribution and declines less rapidly towards the tails. The number of degrees of freedom, n − 1, is a parameter that defines a family of t-distributions.
• While the sample mean x̄ is indexed by n, the sample standard deviation s is indexed by n − 1 to emphasize the importance of the Bessel correction and the role of degrees of freedom.
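For reference, the standard definitions behind this notation are written out below, with $x_1, \dots, x_n$ denoting the sample observations (a symbol not used in the slides). The Bessel correction is the division by n − 1 rather than n in the sample variance.

```latex
\bar{x}_n = \frac{1}{n} \sum_{i=1}^{n} x_i ,
\qquad
s_{n-1} = \sqrt{ \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x}_n \right)^2 }
```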
Practice: Population Variance Unknown
• Example: The Claims Department reported the mean cost to process a claim as $60. After a restructuring, a random sample of 26 claims is taken to test for a reduction in mean cost:
45 49 62 40 43 61 48 53 67 63 78 64 48
54 51 56 63 69 58 51 58 59 56 57 38 76
• State the null & alternate hypotheses:
H₀: µ ≥ 60
H₁: µ < 60
• Set the level of significance: α = 0.01
• Identify the test distribution: The population variance is unknown, so the sample estimate will be used to compute the test statistic. The test statistic has a t-distribution with ν = n − 1 = 25 degrees of freedom. The critical value is $t_{n-1,\,1-\alpha} \approx 2.485$, so for this left-tailed test the rejection region is t < −2.485.
Practice: Population Variance Unknown
• Compute the test statistic: The sample mean and standard deviation are:
$\bar{x}_{26} = 56.423$, $s_{25} = 10.041$
The t-statistic is therefore:
$$t = \frac{\bar{x}_{26} - \mu_0}{s_{25}/\sqrt{n}} = \frac{56.423 - 60}{10.041/\sqrt{26}} \approx -1.816$$
• Interpret the results: The t-statistic lies inside the (one-sided) “acceptance” region:
−2.485 < −1.816
We therefore cannot reject the null H₀ that µ ≥ 60 against the alternate H₁ that µ < 60.
• You may check that we cannot reject µ = 60 against µ ≠ 60.
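The claims example can be checked from the raw data with Python's standard library. Two assumptions are made in this sketch: the value printed as "5 8" in the data listing is read as 58 (which gives 26 observations and reproduces the reported mean), and the critical value −2.485 is taken from the text rather than computed, since the standard library has no Student t quantile function.

```python
from math import sqrt
from statistics import mean, stdev

# Claim-processing costs; "5 8" in the listing is read as 58 (assumption).
costs = [45, 49, 62, 40, 43, 61, 48, 53, 67, 63, 78, 64, 48,
         54, 51, 56, 63, 69, 58, 51, 58, 59, 56, 57, 38, 76]

mu0 = 60          # hypothesized mean cost
t_crit = -2.485   # left-tail critical value, df = 25, alpha = 0.01 (from the text)

n = len(costs)
x_bar = mean(costs)
s = stdev(costs)                    # Bessel-corrected: divides by n - 1
t = (x_bar - mu0) / (s / sqrt(n))   # t-statistic with n - 1 degrees of freedom

reject_null = t < t_crit            # left-tailed test
print(f"n = {n}, mean = {x_bar:.3f}, s = {s:.3f}, t = {t:.2f}, reject H0: {reject_null}")
```

The third decimal of t depends on whether the mean and standard deviation are rounded before dividing, which is why the sketch prints t to two decimals only. Either way the statistic stays above −2.485 and H₀ is not rejected.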
Practice: Population Variance Unknown
[Figure: Acceptance/rejection region for the Student t-distribution (df = 25), one-tailed, α = 0.01. Critical value at −2.485; H₀ is rejected in the left tail (Type-I error, α = 0.01) and cannot be rejected elsewhere. Horizontal axis: t from −4 to 4; vertical axis: density.]
p-values in Hypothesis Testing
p-value
The probability of drawing a sample value as extreme as, or more extreme than, the observed sample value, under the assumption that the null hypothesis is true.
• In traditional hypothesis testing, we compare the test statistic to a critical value and either reject or fail to reject the null hypothesis.
• If the p-value is smaller than the significance level, the null is rejected. In addition, the p-value measures the strength of the evidence against the null. A very small p-value means the observed sample would be very unlikely if the null were true: either a rare event has happened or the null hypothesis is false. A large p-value means the sample is consistent with the null and provides little evidence against it.
• For a symmetric test distribution, the two-tailed p-value is twice as large as the one-tailed p-value (taken in the direction of the observed statistic).
p-values in Hypothesis Testing
The p-value gives a way to express the strength of the evidence against H₀:
p < 0.10: some evidence that H₀ is false
p < 0.05: strong evidence that H₀ is false
p < 0.01: very strong evidence that H₀ is false
p < 0.001: extremely strong evidence that H₀ is false
The two-tailed p-value, for an observed standardized statistic $z_n$ and a standardized statistic $Z$ distributed as under H₀, is:
$$P(Z \le -|z_n|) + P(Z \ge |z_n|)$$
If the null hypothesis is actually false, Type-I errors cannot occur and the interpretation of the p-value is unclear.
p-values in Hypothesis Testing
Suppose a sample z-score under the null hypothesis H₀: µ = µ₀ is:
$$z_n = \frac{\bar{x}_n - \mu_0}{\sigma/\sqrt{n}} \approx -1.282$$
for known σ. The p-value associated with this sample is the probability, under H₀, that another sample mean would be located at least as far out in the tails:
[Figure: Two-sided p-value for the standard normal. Marks at −1.282 and 1.282; p-value in the left tail, p/2 = 0.10; p-value in the right tail, p/2 = 0.10; expect 80% of samples to be less extreme. Horizontal axis: standardized sample mean, from −3 to 3; vertical axis: density.]
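The tail probabilities for the z-score of −1.282 can be verified numerically. This is a small sketch using the standard normal distribution from Python's standard library; the function name `p_values` is an illustrative choice.

```python
from statistics import NormalDist

def p_values(z):
    """Left-tailed, right-tailed and two-tailed p-values for a z-score under N(0, 1)."""
    left = NormalDist().cdf(z)         # P(Z <= z)
    right = 1 - left                   # P(Z >= z)
    two_tailed = 2 * min(left, right)  # both tails, using symmetry
    return left, right, two_tailed

left, right, two_tailed = p_values(-1.282)
print(f"left-tailed p = {left:.3f}, two-tailed p = {two_tailed:.3f}")
```

The left-tailed p-value is about 0.10 and the two-tailed p-value about 0.20, so roughly 80% of samples would be less extreme under H₀.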
Power in Hypothesis Testing
• Type-I error α:
The null hypothesis H₀ is true and, incorrectly, rejected.
• Type-II error β:
The null hypothesis H₀ is false and, incorrectly, not rejected.
• Confidence 1 − α:
The probability of correctly not rejecting a true H₀, that is, of not making a Type-I error.
• Power 1 − β:
The probability of correctly rejecting a false H₀, that is, of not making a Type-II error.
In practice, the power of a test cannot be computed exactly, since the true distribution is unknown; it can only be computed for an assumed true mean µ₁. Nevertheless, the following analysis shows that a one-tailed test (if appropriately directed) is more powerful than a two-tailed test: under a one-tailed test, the area associated with β is reduced and thus power 1 − β is increased.
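The comparison can be made concrete once a true mean is assumed. The sketch below reuses the plant example's numbers (µ₀ = 200, σ = 16, n = 50, α = 0.01) and assumes, purely for illustration, a true mean of µ₁ = 206. That value is hypothetical and does not appear in the slides. Both functions compute β as the probability, under the assumed true distribution of the sample mean, of landing in the acceptance region.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed(mu0, mu1, sigma, n, alpha):
    """Power of the right-tailed z-test when the true mean is mu1."""
    se = sigma / sqrt(n)
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # reject if x_bar > cutoff
    beta = NormalDist(mu1, se).cdf(cutoff)               # P(fail to reject | mu = mu1)
    return 1 - beta

def power_two_tailed(mu0, mu1, sigma, n, alpha):
    """Power of the two-tailed z-test when the true mean is mu1."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z * se, mu0 + z * se            # acceptance region for x_bar
    true_dist = NormalDist(mu1, se)
    beta = true_dist.cdf(upper) - true_dist.cdf(lower)   # P(fail to reject | mu = mu1)
    return 1 - beta

mu0, sigma, n, alpha = 200, 16, 50, 0.01
mu1 = 206  # hypothetical true mean, assumed for illustration only

power_one = power_right_tailed(mu0, mu1, sigma, n, alpha)
power_two = power_two_tailed(mu0, mu1, sigma, n, alpha)
print(f"one-tailed power = {power_one:.3f}, two-tailed power = {power_two:.3f}")
```

Under this assumed µ₁, the right-tailed test has the higher power, because its critical value sits closer to µ₀ and leaves less of the true distribution in the acceptance region.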
Power in Hypothesis Testing
[Figure: Power of a Test. Fact: µ = µ₁ (unknown). Hypothesis: H₀: µ = µ₀ vs H₁: µ ≠ µ₀. Significance: α (probability of a Type-I error), with α/2 in each tail beyond the critical values −z and z. Power: 1 − β (1 − probability of a Type-II error). Vertical axis: density.]
In blue, the hypothetical distribution inferred from a sample. In red, the true (unknown) distribution. Any sample mean between the critical values −z and z would lead to not rejecting H₀. The left tail of the true distribution measures the probability of a Type-II error, β. The power of the test is the complement, 1 − β.
Power in Hypothesis Testing
[Figure: Power of a Test. Fact: µ = µ₁ (unknown). Hypothesis: H₀: µ ≤ µ₀ vs H₁: µ > µ₀. Significance: α (probability of a Type-I error), in the right tail beyond the critical value z. Power: 1 − β (1 − probability of a Type-II error). Vertical axis: density.]
In blue, the hypothetical distribution. In red, the true distribution. Any sample mean smaller than the critical value z would lead to not rejecting H₀. The left tail of the true distribution measures the probability of a Type-II error, β. The power of the test is the complement, 1 − β. One-tailed tests can be more powerful than two-tailed tests.
Summary
1. The objective of hypothesis testing is to verify the validity of a statement about a population parameter.
2. Hypothesis testing can be broken down into steps:
(1) State the null hypothesis (H₀) and the alternate hypothesis (H₁).
(2) Set the desired significance level.
(3) Identify the test distribution.
(4) Identify the acceptance/rejection regions.
(5) Compute the test statistic from the sample.
(6) Interpret the results.
3. In a two-tailed test, the rejection region is evenly split between the left and right tails. In a one-tailed test, the entire rejection region is either on the left or on the right.
4. A p-value measures the probability that, under the null hypothesis H₀, a sample could be drawn that would be at least as extreme as the observed sample value.
Summary
5. To test a hypothesis about a population mean using a single sample, two cases arise: If the population standard deviation σ is known, the z-score follows the standard normal distribution:
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
6. If the population standard deviation is not known, the Bessel-corrected sample estimate s is used to compute a t-statistic, which follows the Student t-distribution with n − 1 degrees of freedom:
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
Summary
7. The major characteristics of the Student t-distribution are:
a. It is a continuous distribution.
b. It is hump-shaped and symmetrical.
c. It has denser tails than the standard normal distribution.
d.