Introduction to Hypothesis Testing
Hypothesis Testing
A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population
The hypothesis is stated in terms of the population
Predict sample statistics based on population parameters (e.g. sample mean ≈ µ)
Select a random sample from the population
Compare the observed sample data with the predicted values
Step 1: State the Hypotheses
The null hypothesis, H0, states that in the population there is no change, no difference, or no relationship
H0: µtreatment = constant (e.g. µ)
e.g. H0: µtreatment = 100
This is read as: “The null hypothesis is that the population mean of people receiving the treatment equals 100”
H0 is that the treatment had no effect
H0
The null hypothesis must contain an equal sign of some sort (=, ≥, ≤)
Statistical tests are designed to reject H0, never to accept it
H1: The Alternative Hypothesis
The alternative hypothesis usually takes the following form:
H1: µtreatment ≠ constant (e.g. µ)
e.g. H1: µtreatment ≠ 100
This is read as: “The alternative hypothesis states that the population mean of people receiving the treatment does not equal 100”
H1 is that the treatment had an effect
H0 and H1
Together, the null and alternative hypotheses must be mutually exclusive and exhaustive
Mutually exclusive implies that H0 and H1 cannot both be true at the same time
Exhaustive implies that each of the possible outcomes of the experiment must make either H0 or H1 true
Step 2: Set the Decision Criteria
What sample means are consistent with H0 and what sample means are consistent with H1?
Separate the distribution of sample means into two sets of regions: one whose means are consistent with H0 and one whose means are consistent with H1
[Figure: distribution of sample means for n = 25, µ = 100, σ = 15 (horizontal axis 90 to 110). Sample means close to the H0 value are high-probability values if H0 is true; both tails hold extreme, low-probability values if H0 is true.]
α Level
The α level (alpha level; level of significance) is a probability value that is used to define the very unlikely sample outcomes if H0 is true
Psychologists usually adopt α = 0.05, although α = 0.01 and α = 0.001 are sometimes used
The critical region is composed of the extreme sample values that are very unlikely (as specified by the α level) to be obtained if H0 is true
Critical Regions
Since we can reject H0 in two ways (extremely small or extremely large sample means), the α level is divided across the two tails of the distribution
Find the z-score whose area above equals α / 2
z = 1.96 for α = 0.05
Find the raw scores that correspond to that z-score, using the standard error σ/√n = 15 / √25 = 3
X = 100 + 1.96 · 3 = 105.9
X = 100 – 1.96 · 3 = 94.1
[Figure: distribution of sample means (horizontal axis 90 to 110) with the critical region marked. Sample means close to the H0 value are high-probability values if H0 is true; the tails beyond z = -1.96 and z = 1.96 hold extreme, low-probability values if H0 is true.]
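The critical-region arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `critical_bounds` is made up for illustration, and the inputs are the slide's values (µ = 100, σ = 15, n = 25, α = 0.05).

```python
from math import sqrt
from statistics import NormalDist

def critical_bounds(mu, sigma, n, alpha=0.05):
    """Return (z_crit, lower, upper) for a two-tailed z test.

    Sample means below `lower` or above `upper` fall in the critical region.
    """
    # z-score whose upper-tail area equals alpha / 2
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # standard error of the mean: sigma / sqrt(n)
    se = sigma / sqrt(n)
    return z_crit, mu - z_crit * se, mu + z_crit * se

z_crit, lower, upper = critical_bounds(mu=100, sigma=15, n=25)
print(f"z = {z_crit:.2f}, reject H0 if mean < {lower:.1f} or mean > {upper:.1f}")
```

Changing `alpha` to 0.01 or 0.001 shows how a stricter α level pushes both boundaries further into the tails.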
Step 3: Collect Data & Compute Sample Statistics
Randomly sample from the population
In this example, n = 25
Give the sample the treatment
Measure the dependent variable
Calculate the z-score of the sample mean in the sampling distribution
In this example the sample statistics are: sample mean = 107, s = 14; the population parameters are those of the Step 2 graph (IQs): µ = 100, σ = 15
Step 4: Make a Decision
If the sample mean’s z-score is in the extreme tails of the sampling distribution (i.e. in the critical region), reject H0; otherwise, fail to reject H0
The critical region is z > 1.96 or z < -1.96 for α = 0.05
The example z is (107 – 100) / 3 = 2.33, which is in the critical region
Therefore, reject H0
It is likely that the treatment had an effect
[Figure: distribution of sample means (horizontal axis 90 to 110) with critical regions beyond z = -1.96 and z = 1.96; the sample mean of 107 (z = 2.33) falls in the upper critical region.]
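Steps 3 and 4 can be sketched together as a small function. The name `z_test_two_tailed` is an illustrative assumption, not part of any library; the inputs are the example's values (sample mean 107, µ = 100, σ = 15, n = 25).

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu, sigma, n, alpha=0.05):
    """Return the z-score of the sample mean and the resulting decision."""
    se = sigma / sqrt(n)                      # standard error of the mean
    z = (sample_mean - mu) / se               # location in the sampling distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Only two decisions are possible: reject H0 or fail to reject H0
    decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
    return z, decision

z, decision = z_test_two_tailed(sample_mean=107, mu=100, sigma=15, n=25)
print(f"z = {z:.2f}: {decision}")
```

Note that the function uses the population σ, not the sample s = 14, because the z test assumes σ is known.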
Reject H0 or Fail to Reject H0
The only decisions you ever make in hypothesis testing are
Reject H0, or
Fail to reject H0
No other decisions are possible
Never reject H1
Never accept H1
Never accept H0
Type I (α) Error
A type I (or α) error occurs when a researcher rejects H0 when H0 is really true
The researcher concludes that the treatment had an effect when it did not
This should happen with a probability equal to α
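One way to see that the Type I error rate equals α is to simulate many samples for which H0 really is true and count how often the test rejects. The sketch below is illustrative only: the function name, the seed, and the number of trials are arbitrary assumptions, and scores are assumed to be normally distributed with the IQ parameters used elsewhere in these slides.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_i_error_rate(mu=100, sigma=15, n=25, alpha=0.05, trials=20_000, seed=1):
    """Estimate how often a two-tailed z test rejects H0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Sample from the untreated population, so H0 is true by construction
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / se
        if abs(z) > z_crit:
            rejections += 1  # every rejection here is a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Proportion of true-H0 samples rejected: {rate:.3f}")
```

The estimated proportion should land close to 0.05, with the remaining gap due to simulation noise.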
Type II (β) Errors
A type II (or β) error occurs when a researcher fails to reject H0 when H0 is really false
The researcher concludes that there is insufficient evidence to suggest that the treatment had an effect when in fact it does have an effect
This should happen with a probability equal to β
β
Unlike α, β is not directly set by the researcher
β depends on the sample size (n)
β depends on how much the treatment affects the dependent variable
β depends on the variability of the data
β depends on α
Type-I and Type-II Errors
Ideally, we would like to minimize both Type-I and Type-II errors
This is not possible for a given sample size
When we lower the α level to minimize the probability of making a Type-I error, the β level will rise
When we lower the β level to minimize the probability of making a Type-II error, the α level will rise
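The trade-off can be made concrete by computing β at several α levels while holding everything else fixed. This is a sketch under stated assumptions: a two-tailed z test, a hypothetical true treatment effect of 3 points, and σ = 15 with n = 25. The function name `beta_two_tailed` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def beta_two_tailed(effect, sigma, n, alpha):
    """Probability of a Type II error for a two-tailed z test."""
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    # How far the treatment shifts the sampling distribution, in standard errors
    shift = effect / (sigma / sqrt(n))
    # beta = probability that z lands between the two critical values
    return std.cdf(z_crit - shift) - std.cdf(-z_crit - shift)

alphas = (0.10, 0.05, 0.01, 0.001)
betas = [beta_two_tailed(effect=3, sigma=15, n=25, alpha=a) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:<5} beta = {b:.3f}")
```

Each step down in α produces a larger β, which is the trade-off described above for a fixed sample size.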
Factors that Influence a Hypothesis Test
The size of the mean difference
The larger the mean difference is, the more likely you are to reject H0
The variability of the scores
The more variable the scores are, the less likely you are to reject H0
The number of scores in the sample
The larger the sample size, the more likely you are to reject H0
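All three factors enter through the z statistic, z = mean difference / (σ / √n). The sketch below varies one factor at a time; the baseline is the slides' example (difference 7, σ = 15, n = 25) and the alternative values are arbitrary illustrations.

```python
from math import sqrt

def z_statistic(mean_difference, sigma, n):
    """z-score of a sample mean that differs from mu by `mean_difference`."""
    return mean_difference / (sigma / sqrt(n))

baseline = z_statistic(mean_difference=7, sigma=15, n=25)
bigger_difference = z_statistic(mean_difference=10, sigma=15, n=25)
more_variability = z_statistic(mean_difference=7, sigma=30, n=25)
bigger_sample = z_statistic(mean_difference=7, sigma=15, n=100)

print(f"baseline:                {baseline:.2f}")
print(f"larger mean difference:  {bigger_difference:.2f}")
print(f"more variable scores:    {more_variability:.2f}")
print(f"larger sample:           {bigger_sample:.2f}")
```

A larger z is further into the tail, so it is more likely to fall in the critical region.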
Assumptions of the z-Score Hypothesis Test
Random sampling
If the sample is not selected randomly from the population, it probably will not represent the population
Independent observations
σ does not change as a result of the treatment
Distribution of sample means is normal
Directional vs Non-Directional Hypotheses
The hypotheses we have been talking about are called non-directional hypotheses because they do not specify how the population mean should differ from the constant
That is, they do not say that the population mean should be larger than the constant
They only state that the population mean should differ from the constant
Non-directional hypotheses are sometimes called two-tailed tests
Directional vs Non-Directional Hypotheses
Directional hypotheses include an ordinal relation between the population mean and the constant
That is, they state that the population mean should be larger than the constant
For directional hypotheses, H0 and H1 are written as:
H0: µtreatment ≤ constant
H1: µtreatment > constant
Directional hypotheses are sometimes called one-tailed tests
1 Tailed
When performing a one tailed test, all of the critical region is in one tail of the distribution of sample means
Do not divide α by two when finding the z score for the critical region
This increases statistical power – the probability of correctly rejecting a false H0
1 Tailed vs. 2 Tailed
[Figure: two standard normal curves (horizontal axis z = -3 to 3). 1 Tailed: α = .05, z = 1.65, critical region in one tail. 2 Tailed: α = .05, z = -1.96 and z = 1.96, critical region in two tails.]
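The two critical values in the figure differ only in whether α is split across tails. A minimal sketch, with `critical_z` as an illustrative helper name:

```python
from statistics import NormalDist

def critical_z(alpha, tails):
    """Positive critical z for a one-tailed (tails=1) or two-tailed (tails=2) test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # One-tailed: all of alpha in one tail. Two-tailed: alpha / 2 in each tail.
    return NormalDist().inv_cdf(1 - alpha / tails)

one_tailed = critical_z(0.05, tails=1)
two_tailed = critical_z(0.05, tails=2)
print(f"one-tailed: z = {one_tailed:.3f}")
print(f"two-tailed: z = ±{two_tailed:.3f}")
```

The one-tailed value is 1.645 to three decimals; the slides use the rounded table value 1.65.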
Concerns about Hypothesis Testing
Hypothesis testing focuses on the data, and not the hypothesis
When we reject H0, we should really say “This specific sample mean is very unlikely (p < .05) if the null hypothesis is true”
Statistical significance ≠ practical significance
The effect size can be small but still statistically significant if the sample size is sufficiently large
Effect Size
A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used
Cohen’s d is a measure of effect size
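For a single sample compared against a known population, Cohen's d is commonly computed as the mean difference divided by the standard deviation. The sketch below assumes that definition and borrows illustrative numbers from the power example in these slides (a 3-point difference, σ = 15); it is not necessarily the worked example the slides refer to. It also shows that d stays fixed as n grows, while the z statistic does not.

```python
from math import sqrt

def cohens_d(mean_difference, sigma):
    """Cohen's d: mean difference in standard-deviation units (no n involved)."""
    return mean_difference / sigma

def z_statistic(mean_difference, sigma, n):
    """z-score of the sample mean; unlike d, this grows with sample size."""
    return mean_difference / (sigma / sqrt(n))

d = cohens_d(mean_difference=3, sigma=15)
z_small_n = z_statistic(3, 15, n=25)
z_large_n = z_statistic(3, 15, n=400)
print(f"d = {d:.2f} regardless of n")
print(f"z = {z_small_n:.2f} with n = 25, z = {z_large_n:.2f} with n = 400")
```

With n = 400 the same 3-point difference would be statistically significant, yet d is unchanged, which is why effect size is reported alongside the test result.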
Effect Size
What is the effect size for the example on slide 5?
Magnitude of d    Evaluation of Effect Size
d = 0.2           Small effect
d = 0.5           Medium effect
d = 0.8           Large effect
This is a small effect
Statistical Power
Statistical power is the probability that a statistical test will correctly reject a false H0
Probability that a statistical test will identify a treatment effect if one really exists
Power = 1 – β = 1 – probability of a Type II error
Statistical Power
Calculate before performing the study
Need to know / estimate
How much the treatment changes the dependent variable
Sample size
α
σ, µ
Statistical Power Example
How much the treatment changes the dependent variable
Researchers hypothesize that having proper nutrition during the first two years will increase IQ by 3 points (notice – 1 tailed)
µ = 100, σ = 15
Sample size
n = 25
α = .05
Distribution of Sample Means
If the treatment has no effect, by the central limit theorem, the distribution of sample means will have:
a mean = population mean = 100
a standard deviation = σ/√n = 15 / √25 = 3
If the treatment has the hypothesized effect, the distribution of sample means will have:
a mean = population mean + effect of treatment = 100 + 3 = 103
a standard deviation = σ/√n = 15 / √25 = 3
Adding a constant to all scores does not change the standard deviation
z Score of Critical Region
This is a one-tailed test with α = .05
Consult a table to find the z with an area above equal to .05
z = 1.65
Statistical Power Example
[Figure: distributions of sample means (horizontal axis 91 to 115) with a z-score axis and the critical value z = 1.65 marked.]
Statistical Power Example
Power equals area to right of the z score for the critical region under the treatment distribution of sample means
Areas to the right of the z score for the critical region correspond to rejecting H0
Areas under the treatment distribution of sample means correspond to a false H0
Both combined correspond to rejecting a false H0 = power
Statistical Power Example
Find the z score in the treatment distribution of sample means that is at the same location as the z score for the critical region in the no treatment distribution of sample means
ztreatment = zcritical region – zmean of treatment
zmean of treatment = (103 – 100) / 3 = 1
ztreatment = 1.65 – 1 = 0.65
Power = area above z = 0.65
Power = .26
Only about a 1 in 4 chance of detecting this effect
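The power calculation above can be checked numerically. This sketch follows the same steps (critical z, shift by the treatment effect in standard-error units, upper-tail area) using the standard library; `power_one_tailed` is an illustrative name, and the exact critical value 1.645 is used instead of the table value 1.65.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed z test when the true effect is `effect`."""
    std = NormalDist()
    se = sigma / sqrt(n)                 # 15 / sqrt(25) = 3 in the example
    z_crit = std.inv_cdf(1 - alpha)      # about 1.645 for alpha = .05
    z_mean_of_treatment = effect / se    # (103 - 100) / 3 = 1
    z_treatment = z_crit - z_mean_of_treatment
    # Power is the area above z_treatment under the treatment distribution
    return 1 - std.cdf(z_treatment)

power = power_one_tailed(effect=3, sigma=15, n=25)
print(f"Power = {power:.2f}")
```

The result agrees with the slide's .26 after rounding.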
Factors that Influence Power
Sample size
As sample size increases, power increases
α level
As α decreases (fewer Type I errors), β increases (more Type II errors), and 1 – β (power) decreases
Number of tails (directional vs non-directional)
One-tailed tests have more statistical power than two-tailed tests. Can you explain why?
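Each of these factors can be explored by recomputing power while changing one input. The sketch below reuses the nutrition example (effect = 3, σ = 15) and assumes the true effect is in the predicted (upper) direction; the function name `power` and the alternative values of n and α are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def power(effect, sigma, n, alpha=0.05, tails=1):
    """Power of a z test for a true upward shift of `effect`."""
    std = NormalDist()
    shift = effect / (sigma / sqrt(n))         # effect in standard-error units
    z_crit = std.inv_cdf(1 - alpha / tails)    # alpha is split only if tails == 2
    upper = 1 - std.cdf(z_crit - shift)        # rejections in the upper tail
    if tails == 1:
        return upper
    lower = std.cdf(-z_crit - shift)           # rare rejections in the wrong tail
    return upper + lower

base = power(effect=3, sigma=15, n=25)
larger_n = power(effect=3, sigma=15, n=100)
stricter_alpha = power(effect=3, sigma=15, n=25, alpha=0.01)
two_tailed = power(effect=3, sigma=15, n=25, tails=2)

print(f"one-tailed, n = 25,  alpha = .05: {base:.2f}")
print(f"one-tailed, n = 100, alpha = .05: {larger_n:.2f}")
print(f"one-tailed, n = 25,  alpha = .01: {stricter_alpha:.2f}")
print(f"two-tailed, n = 25,  alpha = .05: {two_tailed:.2f}")
```

The two-tailed test spends half of α on a tail where the effect is not expected, so its critical value in the relevant tail is further out and fewer treatment-distribution means exceed it.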