HUMBEHV 3ST3
Normal Distribution & Hypothesis Testing
Prof. Patrick Bennett
Part 3 Hypothesis Testing
Duguid & Goncalo (2012)
• Examined effect of perceived power on estimate of own height
• Subjects asked to participate in manager-employee scenario
• Randomly assigned to 2 groups:
- group A told they would play role of manager
- group B told they would play role of employee
• All subjects asked to report their height
- Experimenter recorded difference between reported and actual height
• Question: What are independent & dependent variables?
• Independent Variable == role (manager vs employee)
• Dependent Variable == height difference (reported - actual)
Duguid & Goncalo (2012)
•Results:
-Group A: over-estimated height (M = 0.66 in)
-Group B: under-estimated height (M = -0.21 in)
-Difference Between Groups (MA-MB): 0.87 in
•Two possible explanations, or hypotheses, about group difference:
-H0: true difference = 0 [observed difference is due to sampling error (i.e., chance)]
-H1: true difference ≠ 0 [observed difference is due to effect of perceived power/status]
•Null Hypothesis Significance Testing:
-determine if data are unusual if H0 is true
‣ assume H0 is true (true group difference = zero)
‣ calculate probability of obtaining group difference at least as large as the observed difference
‣ if probability is very low, then our observation is unusual (when H0 is true)
‣ then… we might reject H0 in favour of H1 (that true group difference is NOT zero)
When H0 is true (i.e., when true group difference is zero), what is the probability of seeing a group difference at least as large as ±0.87?
To answer this question, we need to know how a true group difference of zero is affected by sampling error.
In other words: What is the sampling distribution of the group difference when the true difference is zero?
Sampling Distribution of Mean
Mean of sample means depends on population mean of scores
MEAN of Sampling Distribution equals mean of scores (mu)
[Figure: two distributions of sample means (n = 50); x-axis: Sample Mean, y-axis: Density. Panel 1: mu = 0.667, sigma = 0.88, n = 50. Panel 2: mu = 1.667, sigma = 0.88, n = 50. Panel titles give the parameters of the distributions of scores.]
Sampling Distribution of Mean
Variance of sampling distribution depends on population variance
VAR = (sigma^2)/n
[Figure: two distributions of sample means (n = 50); x-axis: Sample Mean, y-axis: Density. Panel 1: mu = 0.667, sigma = 0.88, n = 50. Panel 2: mu = 0.667, sigma = 1.8, n = 50. Panel titles give the parameters of the distributions of scores.]
Standard Error of the Mean (SEM)
• Sampling Distribution: probability/frequency distribution of a sample statistic
• Sampling Error refers to variation of a statistic (e.g., Mean) across samples
• Standard Error of the Mean (SEM) is the standard deviation of the sampling distribution (of the mean)
$SEM = \sigma_{\bar{Y}} = \sqrt{\sigma^2_{scores} / n}$
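The SEM formula can be checked by simulation. The sketch below is only an illustration: it assumes the scores are normally distributed, reuses the panel values mu = 0.667, sigma = 0.88, n = 50, and the function name `simulate_sample_means`, the number of samples and the seed are arbitrary choices, not part of the lecture.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, n_samples, seed=0):
    """Draw n_samples samples of size n from N(mu, sigma); return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]


mu, sigma, n = 0.667, 0.88, 50           # values taken from the figure panels
means = simulate_sample_means(mu, sigma, n, n_samples=5000)

sem_theory = sigma / math.sqrt(n)        # SEM = sqrt(sigma^2 / n)
sem_simulated = statistics.stdev(means)  # SD of the simulated sample means

print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"theoretical SEM:      {sem_theory:.3f}")
print(f"simulated SEM:        {sem_simulated:.3f}")
```

The mean of the simulated sample means should sit close to mu, and their standard deviation close to sigma/sqrt(n).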
Duguid & Goncalo experiment examined difference between two group means.
How does the difference between means vary across samples/experiments?
(What is the sampling distribution of the difference between means?)
Sampling Distributions of Sums & Differences
[Figure: four histograms; y-axis: Frequency.]
- Group 1 sample means (mu = 0.25, var = 1)
- Group 2 sample means (mu = -0.25, var = 1)
- Difference Between Group Means (mu = 0.5, var = 2): muDiff = mu1 - mu2, varDiff = var1 + var2
- Sum of Group Means (mu = 0, var = 2): muSum = mu1 + mu2, varSum = var1 + var2
Sampling distributions of sums & differences between group means can be predicted from sampling distributions of individual group means
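The rules for sums and differences can be illustrated with a short simulation. This is a sketch under assumptions: the two groups are treated as independent normal variables with the means and variances shown in the figure (0.25 and -0.25, var = 1), and the seed and number of draws are arbitrary.

```python
import random
import statistics

rng = random.Random(1)
n_draws = 20000

# Independent draws for the two groups (mu and var taken from the figure).
g1 = [rng.gauss(0.25, 1.0) for _ in range(n_draws)]
g2 = [rng.gauss(-0.25, 1.0) for _ in range(n_draws)]

diffs = [a - b for a, b in zip(g1, g2)]
sums = [a + b for a, b in zip(g1, g2)]

mean_diff, var_diff = statistics.fmean(diffs), statistics.variance(diffs)
mean_sum, var_sum = statistics.fmean(sums), statistics.variance(sums)

# Means subtract or add; variances add in both cases.
print(f"difference: mean = {mean_diff:.2f}, var = {var_diff:.2f}")
print(f"sum:        mean = {mean_sum:.2f}, var = {var_sum:.2f}")
```

The variance adds for the difference as well as for the sum because the two groups are independent.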
Sampling Distributions & Hypothesis Testing
• If we know the mean & variance of the population of scores, then we can calculate the mean & variance of the distribution of sample means.
- Often we do not know the population mean & variance, but we can estimate them from our data.
• According to the Central Limit Theorem, the sample means will be distributed approximately normally provided sample size (n) is large enough.
• Hence, our sample provides sufficient information to derive an estimate of how sample means are distributed when the null hypothesis is true.
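A quick way to see the Central Limit Theorem at work is to sample from a clearly non-normal population. The sketch below is hypothetical and not from the lecture: it assumes an exponential population (mean 1, SD 1) and checks how often the sample mean falls within 1.96 SEM of the population mean, which would be about 95% of the time if the sample means were normal.

```python
import math
import random
import statistics

rng = random.Random(42)
n = 50                      # sample size
n_samples = 5000            # number of simulated samples
pop_mean, pop_sd = 1.0, 1.0 # exponential population with rate 1 (assumed)

sem = pop_sd / math.sqrt(n)
means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

# Proportion of sample means within 1.96 SEM of the population mean.
inside = sum(abs(m - pop_mean) <= 1.96 * sem for m in means) / n_samples
print(f"proportion within 1.96 SEM: {inside:.3f}")
```

Even though individual scores are strongly skewed, the proportion should land near 0.95 with n = 50.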
Duguid & Goncalo (2012)
•Sampling Distribution of Group Difference assuming that true group difference is zero: (normal, mean = 0, VAR = 0.031*)
•0.4 & -0.4 correspond to z scores 2.3 & -2.3
- (e.g., 0.4/SEM = 0.4/0.176 = 2.3)
•A group difference greater than 0.4 or less than -0.4 is unusual (i.e., p ≈ 0.02)
•Observed group difference of 0.87 is extremely unusual (p < .001)
[Figure: Sampling Distribution Assuming D = 0; x-axis: Group Difference, y-axis: Frequency.]
* estimated VAR of scores = 0.88^2
* estimated VAR of group means = (0.88^2)/50
* estimated VAR of mean difference = (0.88^2)/50 + (0.88^2)/50 = 0.031
* SEM (group difference) = sqrt(0.031) = 0.176
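The arithmetic in these footnotes can be reproduced in a few lines. This is a minimal sketch using the slide's values (estimated SD of scores 0.88, n = 50 per group); the variable names are illustrative.

```python
import math
from statistics import NormalDist

sd_scores = 0.88   # estimated SD of scores (from the slide)
n = 50             # sample size per group (from the slide)

var_mean = sd_scores**2 / n        # variance of one group mean
var_diff = var_mean + var_mean     # variance of the difference between means
sem_diff = math.sqrt(var_diff)     # SEM of the group difference

# How unusual is a difference beyond +/-0.4 when the true difference is zero?
z_04 = 0.4 / sem_diff
p_04 = 2 * NormalDist().cdf(-z_04) # both tails, by symmetry

print(f"VAR of mean difference: {var_diff:.5f}")
print(f"SEM of mean difference: {sem_diff:.3f}")
print(f"z for 0.4: {z_04:.2f}, two-tailed p: {p_04:.3f}")
```

The two-tailed probability comes out close to the p of about 0.02 quoted on the slide.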
Duguid & Goncalo (2012)
• Evaluate Null Hypothesis with z scores
• If Null Hypothesis is True, group difference distribution:
- Mean (A - B) = 0
- Variance (A - B) = Var(M_A) + Var(M_B)
‣ = (0.88^2)/n + (0.88^2)/n = 2 x {(0.88^2)/50} = 0.03098
- Shape of Distribution is Normal (Central Limit Theorem)
‣ N(μ, σ^2) = N(0, 0.03098)
• Convert observed difference (D=0.87) to z score:
- z = (.87 - 0)/SQRT{0.03098} = (.87 - 0)/0.176 = 4.94
Duguid & Goncalo (2012)
p(z > 4.94) < 0.0001
p(z < -4.94) < 0.0001
The probability of finding a difference at least as extreme as ±0.87, assuming the true difference is zero, is p < 0.0002
Duguid & Goncalo (2012)
• Observed sample difference D = 0.87
• Hypothesis Testing:
- H0: True population D = 0
- H1: True population D ≠ 0
• Using Normal Distribution (z scores):
- {p(D > +0.87 | H0 is True) + p(D < -0.87 | H0 is True)} < 0.0002
• Using a rejection level (significance level) of p < 0.01
- we reject H0 in favour of H1
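The two-tailed test on this slide can be sketched as follows. The numbers (D = 0.87, SD of scores 0.88, n = 50, rejection level 0.01) come from the slides; the variable names are illustrative, and the normal tail areas come from Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

sd_scores, n = 0.88, 50
d_obs = 0.87                 # observed group difference
alpha = 0.01                 # rejection (significance) level

sem_diff = math.sqrt(2 * sd_scores**2 / n)
z = (d_obs - 0) / sem_diff   # H0: true difference = 0

# Two-tailed p: probability of a difference at least as extreme in either direction.
p_two_tailed = 2 * NormalDist().cdf(-abs(z))
reject_h0 = p_two_tailed < alpha

print(f"z = {z:.2f}")
print(f"two-tailed p = {p_two_tailed:.2e}")
print("reject H0" if reject_h0 else "fail to reject H0")
```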
One-tailed vs Two-tailed Tests
• In example, we considered probability of observing group differences in both directions (above and below the mean of zero)
• However, our original hypothesis was directional
- increased power leads to over estimation of height
- group A mean should be greater than group B mean
- M_A > M_B, so difference (M_A - M_B) > 0
• Can we incorporate directional predictions into hypothesis testing?
Duguid & Goncalo (2012)
• Observed sample difference D = 0.87
• Hypothesis Testing:
- H0: True population D ≤ 0
- H1: True population D > 0
- N.B. Only large POSITIVE differences are unlikely if H0 is true
• Using Normal Distribution (z scores):
- p(D > 0.87 | H0 is True) < 0.0001 [p is half of previous value]
• Using a rejection level (significance level) of p < 0.01
- we reject H0 in favour of H1… that true population difference D > 0
• N.B. If we had observed a difference D = -0.87 we would NOT reject H0 in a directional (one-tailed) test
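A one-tailed version of the test can be sketched as below. It assumes the p value is computed at the boundary of H0 (true difference = 0), which is the usual convention; the function name `one_tailed_p` is illustrative.

```python
import math
from statistics import NormalDist

sd_scores, n = 0.88, 50
sem_diff = math.sqrt(2 * sd_scores**2 / n)
alpha = 0.01


def one_tailed_p(d_obs, sem):
    """P(D >= d_obs) when the true difference is 0, for H1: difference > 0."""
    z = d_obs / sem
    return NormalDist().cdf(-z)   # upper-tail area, by symmetry


for d_obs in (0.87, -0.87):
    p = one_tailed_p(d_obs, sem_diff)
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"D = {d_obs:+.2f}: one-tailed p = {p:.2e} -> {decision}")
```

Only the positive difference leads to rejection; a difference of -0.87 gives a p value near 1 because it lies in the direction H0 allows.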
General Strategy
[Figure: two standard normal density curves (x-axis: z, y-axis: density). Left panel: 2-tailed test, critical z values in both tails. Right panel: 1-tailed test, critical z value in one tail.]
reject H0 if z exceeds critical values of z (reject H0 if p is smaller than some critical value)
$z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}}$
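Critical z values for a given significance level can be looked up from the inverse normal CDF. The sketch below uses `statistics.NormalDist().inv_cdf`; the helper name `critical_z` and its `tails` argument are illustrative, not from the lecture.

```python
from statistics import NormalDist


def critical_z(alpha, tails):
    """Positive critical z for a 1- or 2-tailed test at significance level alpha."""
    if tails == 2:
        return NormalDist().inv_cdf(1 - alpha / 2)  # alpha split across both tails
    if tails == 1:
        return NormalDist().inv_cdf(1 - alpha)      # all of alpha in one tail
    raise ValueError("tails must be 1 or 2")


for alpha in (0.05, 0.01):
    print(
        f"alpha = {alpha}: "
        f"2-tailed z = +/-{critical_z(alpha, 2):.2f}, "
        f"1-tailed z = {critical_z(alpha, 1):.2f}"
    )
```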
Null Hypothesis Testing
• 2-Tailed test
- H0: true group difference is zero
- H1: true group difference is not zero
• 1-Tailed test
- H0: true group difference is less than or equal to zero
- H1: true group difference is greater than zero
• N.B. H0 & H1 are mutually exclusive and exhaustive: exactly one of H0 and H1 is true
• Compute probability of group difference at least as large as ours when H0 is true
- p(“our result” given “null hypothesis is true”)… p(A | B) is read as “probability of A given B”
- p(“our result” given “null hypothesis is true”) ≠ p(“null hypothesis is true” given “our result”)
• If p is less than some small value, we may reject H0 in favour of H1
Null Hypothesis Testing: Possible Outcomes
                      Decision Regarding H0
State of the World    Reject H0         Fail to Reject H0
H0 is TRUE            Error (Type I)    Correct
H0 is FALSE           Correct           Error (Type II)
Type I and Type II Errors
• Decisions to reject or not reject H0 inevitably lead to errors
• Type I error: we reject H0 when it really is true
- false alarm, or false positive: we conclude there is an effect or a correlation or a difference between groups when there is no such thing
• Type II error: we fail to reject H0 when it really is false
- a miss, or false negative: we fail to detect a real effect or correlation or group difference
Type I Error
[Figure: standard normal density; x-axis: z, y-axis: density.]
Assuming H0 is true and true group difference is zero, we expect to obtain z scores beyond ±2 approximately 5% of the time
Depends on rejection/significance level
When significance level (alpha) is p=0.05, Type I error occurs 5% of the time
Type I error rate = significance level (alpha)
Type I Error
[Figure: standard normal density; x-axis: z, y-axis: density.]
Assuming H0 is true and true group difference is zero, we expect to obtain z scores beyond ±2.6 approximately 1% of the time
Depends on rejection/significance level
When significance level (alpha) is p=0.01, Type I error occurs 1% of the time
Type I error rate = significance level (alpha)
Type I Error
[Figure: standard normal density; x-axis: z, y-axis: density.]
Assuming H0 is true and true group difference is zero, we expect to obtain z scores beyond +2.6 approximately 0.5% of the time
When using same rejection level, error rate is different for 1-tailed and 2-tailed tests
In a 1-tailed test, extreme scores in only one direction are used to reject H0, and therefore the Type I error rate is 1/2 the rate of a 2-tailed test.
Type I Error
[Figure: standard normal density; x-axis: z, y-axis: density.]
Assuming H0 is true and true group difference is zero, we expect to obtain z scores beyond +2.3 approximately 1% of the time.
H0 rejection criterion is reduced in 1-tailed test to maintain Type I error rate of 1%
To maintain Type I error rate at p=0.01 in a 1-tailed test, the criterion for rejecting H0 is reduced from 2.6 to 2.3
N.B. This makes it easier to reject H0 while keeping Type I error constant.
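The Type I error rates quoted on the last four slides are tail areas of the standard normal distribution. The sketch below computes them for the rounded criteria used on the slides (±2, ±2.6, +2.6, +2.3); the helper name `type1_rate` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()


def type1_rate(z_crit, tails):
    """Probability of |z| (2 tails) or z (1 tail) exceeding z_crit when H0 is true."""
    upper_tail = std_normal.cdf(-z_crit)   # area above +z_crit, by symmetry
    return tails * upper_tail


for z_crit, tails in [(2.0, 2), (2.6, 2), (2.6, 1), (2.3, 1)]:
    rate = type1_rate(z_crit, tails)
    print(f"criterion {z_crit} ({tails}-tailed): Type I error rate = {rate:.4f}")
```

Because the criteria are rounded, the rates only approximate 5%, 1%, 0.5% and 1%.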
Null Hypothesis Testing: Possible Outcomes
                      Decision Regarding H0
State of the World    Reject H0               Fail to Reject H0
H0 is TRUE            Type I Error (p = 𝛼)    Correct (p = 1 - 𝛼)
H0 is FALSE           Correct                 Error (Type II)
Type I and Type II Errors
• Decisions to reject or not reject H0 inevitably lead to errors
• Type I error: we reject H0 when it really is true
- false alarm or false positive: we conclude there is an effect or a correlation or a difference between groups when there is no such thing
• Type II error: we fail to reject H0 when it really is false
- a miss, or false negative: fail to detect a real effect or correlation or group difference
- Beta (β): probability of failing to reject a false H0
‣ N.B. This Beta is NOT the same as Beta in regression
[Figure: Sampling distribution of z for group difference (when true difference is zero), i.e., the distribution of z when H0 is TRUE; x-axis: group difference, y-axis: density. 2-tailed test: 𝛼 = 0.05, critical z values = ±1.96, with 𝛼/2 in each tail.]
Duguid & Goncalo (2012)
•Evaluate Null Hypothesis with z scores
•If Null Hypothesis is True, group difference distribution:
- Mean (A - B) = 0
- Variance (A - B) = Var(M_A) + Var(M_B)
‣ = (0.88^2)/n + (0.88^2)/n = 2 x {(0.88^2)/50} = 0.03098
- Shape of Distribution is Normal (Central Limit Theorem)
‣ N(μ, σ^2) = N(0, 0.03098)
N.B. We can calculate sampling distribution for alternative hypothesis (H1) if we assume that the true group difference is some non-zero value
[Figure: Sampling distributions of group difference (when true difference is zero and when true difference is 0.35); x-axis: group difference, y-axis: density. 2-tailed test: 𝛼 = 0.05, critical z values = ±1.96, with 𝛼/2 in each tail. Curves: sampling distribution when H0 is TRUE; sampling distribution when H0 is FALSE (μD = 0.35).]
When H0 is false and μD = 0.35, the probability of obtaining a z score between z=±1.96 is β≈0.5
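The Type II error rate for a true difference of 0.35 can be estimated by simulating many experiments. This is a sketch under assumptions not stated in the slides: scores are normal with SD 0.88, group B has a true mean of 0 and group A a true mean of 0.35 (only the difference matters), and the seed and number of experiments are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(7)
sd_scores, n = 0.88, 50
true_diff = 0.35                 # assumed true group difference (H0 is false)
z_crit = 1.96                    # 2-tailed test, alpha = 0.05
sem_diff = math.sqrt(2 * sd_scores**2 / n)
n_experiments = 4000

misses = 0
for _ in range(n_experiments):
    group_a = [rng.gauss(true_diff, sd_scores) for _ in range(n)]
    group_b = [rng.gauss(0.0, sd_scores) for _ in range(n)]
    d = statistics.fmean(group_a) - statistics.fmean(group_b)
    z = d / sem_diff             # z score under H0: true difference = 0
    if abs(z) < z_crit:          # fail to reject a false H0: Type II error
        misses += 1

beta_estimate = misses / n_experiments
print(f"estimated beta: {beta_estimate:.2f}")
```

Roughly half of the simulated experiments should fail to reject H0, in line with β≈0.5.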
Beta depends on mean of H1 distribution
[Figure: three panels showing the sampling distribution when H0 is FALSE for different true differences; x-axis: group difference, y-axis: density.]
- μD = 0.35: β≈0.5
- μD = 0.18: β≈0.84
- μD = 0.53: β≈0.16
Type II errors are less likely when true group difference is large
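Beta can also be computed directly as the area of the H1 sampling distribution that falls between the two critical values. The sketch below assumes the SEM of 0.176 from the earlier slides and a 2-tailed test with 𝛼 = 0.05; the function name `beta_for_mean` is illustrative.

```python
import math
from statistics import NormalDist

sem_diff = math.sqrt(2 * 0.88**2 / 50)   # SEM of the group difference (0.176)


def beta_for_mean(mu_d, sem, alpha=0.05):
    """P(fail to reject H0) when the true group difference is mu_d (2-tailed)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    d_crit = z_crit * sem                 # critical group difference
    h1 = NormalDist(mu=mu_d, sigma=sem)   # sampling distribution under H1
    return h1.cdf(d_crit) - h1.cdf(-d_crit)


betas = {mu_d: beta_for_mean(mu_d, sem_diff) for mu_d in (0.18, 0.35, 0.53)}
for mu_d, beta in betas.items():
    print(f"true difference {mu_d}: beta = {beta:.2f}")
```

The computed values differ slightly from the rounded figures on the slide, but show the same pattern: beta falls as the true difference grows.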
Beta depends on alpha
[Figure: three panels, each showing the sampling distribution when H0 is FALSE (μD = 0.35); x-axis: group difference, y-axis: density.]
- 𝛼 = 0.05, z critical = ±1.96: β≈0.5
- 𝛼 = 0.01, z critical = ±2.58: β≈0.72
- 𝛼 = 0.001, z critical = ±3.29: β≈0.90
Type II errors are less likely when alpha is large
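The trade-off between alpha and beta can be tabulated with a short calculation. This sketch holds the true difference at 0.35 and the SEM at 0.176, uses 2-tailed critical values from the inverse normal CDF, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

sem_diff = math.sqrt(2 * 0.88**2 / 50)        # SEM of the group difference
h1 = NormalDist(mu=0.35, sigma=sem_diff)      # sampling distribution when H0 is false

results = {}
for alpha in (0.05, 0.01, 0.001):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # 2-tailed critical z
    d_crit = z_crit * sem_diff                     # critical group difference
    beta = h1.cdf(d_crit) - h1.cdf(-d_crit)        # area between the critical values
    results[alpha] = (z_crit, beta)
    print(f"alpha = {alpha}: z critical = +/-{z_crit:.2f}, beta = {beta:.2f}")
```

A stricter alpha pushes the critical values outward, so more of the H1 distribution falls between them and beta rises.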