Part 3 Hypothesis Testing

HUMBEHV 3ST3

Normal Distribution & Hypothesis Testing

Prof. Patrick Bennett

Duguid & Goncalo (2012)

• Examined effect of perceived power on estimate of own height

• Subjects asked to participate in manager-employee scenario

• Randomly assigned to 2 groups:

- group A told they would play role of manager

- group B told they would play role of employee

• All subjects asked to report their height

- Experimenter recorded diff between actual/reported height

• Question: What are independent & dependent variables?

• Independent Variable == role (manager vs employee)

• Dependent Variable == height difference (reported - actual)

Duguid & Goncalo (2012)

• Results:

- Group A: over-estimated height (M = 0.66 in)

- Group B: under-estimated height (M = -0.21 in)

- Difference Between Groups (MA - MB): 0.87 in

•Two possible explanations, or hypotheses, about group difference:

-H0: true difference = 0 [observed difference is due to sampling error (i.e., chance)]

-H1: true difference ≠ 0 [observed difference is due to effect of perceived power/status]

•Null Hypothesis Significance Testing:

-determine if data are unusual if H0 is true

‣ assume H0 is true (true group difference = zero)

‣ calculate probability of obtaining group difference at least as large as the observed difference

‣ if probability is very low, then our observation is unusual (when H0 is true)

‣ then… we might reject H0 in favour of H1 (that true group difference is NOT zero)

When H0 is true (i.e., when true group difference is zero), what is the probability of seeing a group difference at least as large as ±0.87?

To answer this question, we need to know how a true group difference of zero is affected by sampling error.

In other words: What is the sampling distribution of the group difference when the true difference is zero?

Sampling Distribution of Mean

Mean of sample means depends on population mean of scores

MEAN of Sampling Distribution equals mean of scores (mu)

[Figure: Two distributions of sample means (n=50), Density vs. Sample Mean. Panel titles give the parameters of the distributions of scores: (mu=.667, sigma=.88, n=50) and (mu=1.667, sigma=.88, n=50).]

Sampling Distribution of Mean

Variance of sampling distribution depends on population variance

VAR = (sigma²)/n

[Figure: Two distributions of sample means (n=50), Density vs. Sample Mean. Panel titles give the parameters of the distributions of scores: (mu=.667, sigma=.88, n=50) and (mu=0.667, sigma=1.8, n=50).]

Standard Error of the Mean (SEM)

• Sampling Distribution: probability/frequency distribution of a sample statistic

• Sampling Error refers to variation of a statistic (e.g., Mean) across samples

• Standard Error of the Mean (SEM) is the standard deviation of the sampling distribution (of the mean)

$$\mathrm{SEM} = \sigma_{\bar{Y}} = \sqrt{\sigma^2_{\text{scores}} / n}$$
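The SEM definition can be checked with a small simulation. This is a minimal sketch, assuming normally distributed scores with the parameters from the figure above (mu=.667, sigma=.88, n=50); the function name `simulate_sample_means`, the number of simulated samples, and the seed are illustrative choices, not part of the lecture.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, n_samples, seed=0):
    """Draw n_samples samples of size n from N(mu, sigma) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]

mu, sigma, n = 0.667, 0.88, 50           # parameters of the distribution of scores
means = simulate_sample_means(mu, sigma, n, n_samples=5000)

sem_theory = sigma / n ** 0.5             # sqrt(sigma^2 / n)
sem_simulated = statistics.stdev(means)   # SD of the simulated sample means

print(f"mean of sample means: {statistics.fmean(means):.3f} (mu = {mu})")
print(f"SEM from formula:     {sem_theory:.4f}")
print(f"SEM from simulation:  {sem_simulated:.4f}")
```

The two SEM values should agree closely, and the mean of the sample means should sit near mu.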

The Duguid & Goncalo experiment examined the difference between two group means.

How does the difference between means vary across samples/experiments?

(What is the sampling distribution of the difference between means?)

Sampling Distributions of Sums & Differences

[Figure: four histograms (Frequency vs. Sample Mean or Group Difference). Panels: Group 1 (mu = 0.25, var=1); Group 2 (mu = -0.25, var=1); Difference Between Group Means (mu=0.5, var=2); Sum of Group Means (mu=0, var=2).]

muDiff = mu1 - mu2

varDiff = var1 + var2

muSum = mu1 + mu2

varSum = var1 + var2

Sampling distributions of sums & differences between group means can be predicted from sampling distributions of individual group means
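The rules for the mean and variance of sums and differences can be illustrated by simulation. This is a rough sketch, assuming the two group means are independent and normally distributed with the parameters of the Group 1 and Group 2 panels (mu = 0.25 and -0.25, var = 1); the number of draws and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(1)
N = 20000  # number of simulated pairs of group means (arbitrary)

# Draw each group mean directly from its sampling distribution (independent groups).
group1 = [rng.gauss(0.25, 1.0) for _ in range(N)]   # mu1 = 0.25, var1 = 1
group2 = [rng.gauss(-0.25, 1.0) for _ in range(N)]  # mu2 = -0.25, var2 = 1

diffs = [a - b for a, b in zip(group1, group2)]
sums = [a + b for a, b in zip(group1, group2)]

mean_diff, var_diff = statistics.fmean(diffs), statistics.variance(diffs)
mean_sum, var_sum = statistics.fmean(sums), statistics.variance(sums)

print(f"difference: mean = {mean_diff:.2f}, var = {var_diff:.2f}  (expected 0.5, 2)")
print(f"sum:        mean = {mean_sum:.2f}, var = {var_sum:.2f}  (expected 0.0, 2)")
```

Note that the variances add for both the sum and the difference; this holds because the two groups are sampled independently.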

Sampling Distributions & Hypothesis Testing

• If we know the mean & variance of the population of scores, then we can calculate the mean & variance of the distribution of sample means.

- Often we do not know the population mean & variance, but we can estimate them from our data.

• According to the Central Limit Theorem, the sample means will be approximately normally distributed provided sample size (n) is large enough.

• Hence, our sample provides sufficient information to derive an estimate of how sample means are distributed when the null hypothesis is true.

Duguid & Goncalo (2012)

•Sampling Distribution of Group Difference assuming that true group difference is zero: (normal, mean = 0, VAR = 0.031*)

•0.4 & -0.4 correspond to z scores 2.3 & -2.3

- (e.g., 0.4/SEM = 0.4/.176 = 2.3)

•A group difference greater than 0.4 or less than -0.4 is unusual (i.e., p = 0.02)

•Observed group difference of 0.87 is extremely unusual (p < .001)

[Figure: Sampling Distribution Assuming D=0; Frequency vs. Group Difference.]

* estimated VAR of scores = 0.88²

* estimated VAR of group means = (0.88²)/50

* estimated VAR of mean difference = (0.88²)/50 + (0.88²)/50

* SEM (group difference) = sqrt(0.031) = 0.176
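The footnote arithmetic can be reproduced in a few lines. This is a minimal sketch using the values given above (estimated SD of scores 0.88, n = 50 per group); the variable names are illustrative.

```python
from statistics import NormalDist

sd_scores = 0.88   # estimated SD of scores
n = 50             # subjects per group

var_group_mean = sd_scores ** 2 / n   # variance of one group mean
var_diff = 2 * var_group_mean         # variances add for a difference of means
sem_diff = var_diff ** 0.5            # SEM of the group difference

z_04 = 0.4 / sem_diff                        # z score for a difference of 0.4
p_04 = 2 * (1 - NormalDist().cdf(z_04))      # two-tailed probability

print(f"VAR of mean difference: {var_diff:.3f}")
print(f"SEM of mean difference: {sem_diff:.3f}")
print(f"z for D = 0.4: {z_04:.2f}, two-tailed p = {p_04:.3f}")
```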

Duguid & Goncalo (2012)

• Evaluate Null Hypothesis with z scores

• If Null Hypothesis is True, group difference distribution:

- Mean (A - B) = 0

- Variance (A - B) = Var(MA) + Var(MB)

‣ = (0.88²)/n + (0.88²)/n = 2 x {(0.88²)/50} = 0.03098

- Shape of Distribution is Normal (Central Limit Theorem)

‣ N(μ,σ²) = N(0, 0.03098)

• Convert observed difference (D=0.87) to z score:

- z = (.87 - 0)/SQRT{0.03098} = (.87 - 0)/0.176 = 4.94

Duguid & Goncalo (2012)

p(z > 4.94) < 0.0001

p(z < -4.94) < 0.0001

probability of finding a difference at least as extreme as ±0.87, assuming true difference is zero, is p < 0.0002
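The two-tailed probability can be computed directly from the normal distribution. This is a minimal sketch, assuming the SEM of 0.176 derived earlier; the function name `two_tailed_p` is a hypothetical helper, not something from the lecture.

```python
from statistics import NormalDist

def two_tailed_p(observed_diff, null_diff=0.0, sem=0.176):
    """Return (z, p): p is the probability of a difference at least this extreme, in either direction."""
    z = (observed_diff - null_diff) / sem
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return z, 2 * upper_tail   # add the two equal tails

z, p = two_tailed_p(0.87)
print(f"z = {z:.2f}, two-tailed p = {p:.2e}")
```

The resulting p is far below 0.0002, consistent with the bound stated above.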

Duguid & Goncalo (2012)

• Observed sample difference D = 0.87

• Hypothesis Testing:

- H0: True population D = 0

- H1: True population D ≠ 0

• Using Normal Distribution (z scores):

- {p(D > +0.87 | H0 is True) + p(D < -0.87 | H0 is True)} < 0.0002

• Using a rejection level (significance level) of p < 0.01

- we reject H0 in favour of H1

One-tailed vs Two-tailed Tests

• In example, we considered probability of observing group differences that were greater than and less than the mean

• However, our original hypothesis was directional

- increased power leads to overestimation of height

- group A mean should be greater than group B mean

- MA > MB, so difference (MA - MB) > 0

• Can we incorporate directional predictions into hypothesis testing?

Duguid & Goncalo (2012)

• Observed sample difference D = 0.87

• Hypothesis Testing:

- H0: True population D ≤ 0

- H1: True population D > 0

- N.B. Only large POSITIVE differences are unlikely if H0 is true

• Using Normal Distribution (z scores):

- p(D > 0.87 | H0 is True) < 0.0001 [p is half of previous value]

• Using a rejection level (significance level) of p < 0.01

- we reject H0 in favour of H1… that true population difference D > 0

• N.B. If we had observed a difference D = -0.87, we would NOT reject H0 using this directional (one-tailed) test
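The one-tailed calculation, including the N.B. about a negative difference, can be sketched as follows. This assumes the SEM of 0.176 from the earlier slides and evaluates the probability at the boundary of H0 (true difference = 0); `one_tailed_p` is a hypothetical helper name.

```python
from statistics import NormalDist

SEM_DIFF = 0.176  # SEM of the group difference, from the earlier slides

def one_tailed_p(observed_diff, sem=SEM_DIFF):
    """p(D >= observed_diff | true difference = 0): upper tail only."""
    z = observed_diff / sem
    return 1 - NormalDist().cdf(z)

p_positive = one_tailed_p(0.87)    # difference in the predicted direction
p_negative = one_tailed_p(-0.87)   # same size, opposite direction

print(f"D = +0.87: one-tailed p = {p_positive:.2e}")
print(f"D = -0.87: one-tailed p = {p_negative:.4f}")
```

For D = -0.87 the upper-tail probability is close to 1, so the directional test does not reject H0 no matter how large the negative difference is.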

General Strategy

[Figure: two standard normal curves (density vs. z). Panels: 2-tailed test (critical z values) and 1-tailed test (critical z value).]

reject H0 if z exceeds critical values of z (reject H0 if p is smaller than some critical value)

$$z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}}$$
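The decision rule can be written as a small function. This is a sketch under the assumption that the test statistic is a z score and that the 1-tailed test predicts a positive difference; `critical_z` and `reject_h0` are hypothetical names chosen for illustration.

```python
from statistics import NormalDist

def critical_z(alpha, tails=2):
    """Critical z value for a 1- or 2-tailed test at significance level alpha."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A 2-tailed test splits alpha equally between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)

def reject_h0(z, alpha, tails=2):
    """Reject H0 if z falls beyond the critical value(s)."""
    crit = critical_z(alpha, tails)
    if tails == 2:
        return abs(z) > crit
    return z > crit   # 1-tailed: only large positive z counts against H0

print(f"2-tailed, alpha=0.05: critical z = ±{critical_z(0.05, 2):.2f}")
print(f"1-tailed, alpha=0.05: critical z = {critical_z(0.05, 1):.2f}")
print(reject_h0(4.94, alpha=0.01, tails=2))    # True
print(reject_h0(-4.94, alpha=0.01, tails=1))   # False
```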

Null Hypothesis Testing

• 2-Tailed test

- H0: true group difference is zero

- H1: true group difference is not zero

• 1-Tailed test

- H0: true group difference is less than or equal to zero

- H1: true group difference is greater than zero

• N.B. H0 & H1 are mutually exclusive and exhaustive: either H0 or H1 is true

• Compute probability of group difference at least as large as ours when H0 is true

- p(“our result” given “null hypothesis is true”)… p(A | B) == p(A) given B

- p(“our result” given “null hypothesis is true”) ≠ p(“null hypothesis is true” given “our result”)

• If p is less than some small value, we may reject H0 in favour of H1

Null Hypothesis Testing: Possible Outcomes

                        Decision Regarding H0
State of the World      Reject H0          Fail to Reject H0
H0 is TRUE              Error (Type I)     Correct
H0 is FALSE             Correct            Error (Type II)

Type I and Type II Errors

• Decisions to reject or not reject H0 inevitably lead to errors

• Type I error: we reject H0 when it really is true

- false alarm, or false positive: we conclude there is an effect or a correlation or a difference between groups when there is no such thing

• Type II error: we fail to reject H0 when it really is false

- a miss, or false negative: we fail to detect a real effect or correlation or group difference

Type I Error

[Figure: standard normal curve (density vs. z).]

Assuming H0 is true and true group difference is zero, we expect to obtain z scores beyond ± 2 approximately 5% of the time

Depends on rejection/significance level

When significance level (alpha) is p=0.05, Type I error occurs 5% of the time

Type I error rate = significance level (alpha)

Type I Error

[Figure: standard normal curve (density vs. z).]

Assuming H0 is true and true group difference is zero, we expect to obtain z scores beyond ± 2.6 approximately 1% of the time

Depends on rejection/significance level

When significance level (alpha) is p=0.01, Type I error occurs 1% of the time

Type I error rate = significance level (alpha)
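The claim that the Type I error rate equals alpha can be checked by simulating many experiments in which H0 really is true. This is a rough sketch, assuming two groups of n = 50 drawn from the same normal population with SD 0.88 (known, so a z test applies); the function name `type1_rate`, the number of simulated experiments, and the seed are illustrative.

```python
import random
import statistics
from statistics import NormalDist

def type1_rate(alpha, n_experiments=4000, n=50, sigma=0.88, seed=2):
    """Proportion of simulated experiments that reject H0 when H0 is true (2-tailed z test)."""
    rng = random.Random(seed)
    sem_diff = (2 * sigma ** 2 / n) ** 0.5
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_experiments):
        # Both groups come from the same population, so the true difference is zero.
        mean_a = statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        mean_b = statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        z = (mean_a - mean_b) / sem_diff
        if abs(z) > crit:
            rejections += 1   # a false alarm: H0 rejected although it is true
    return rejections / n_experiments

rate_05 = type1_rate(0.05)
rate_01 = type1_rate(0.01)
print(f"alpha = 0.05: simulated Type I error rate = {rate_05:.3f}")
print(f"alpha = 0.01: simulated Type I error rate = {rate_01:.3f}")
```

The simulated rates should land near 0.05 and 0.01, with some scatter from the finite number of experiments.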

Type I Error

[Figure: standard normal curve (density vs. z).]

Assuming H0 is true and true group difference is zero, we expect to obtain z scores beyond +2.6 approximately 0.5% of the time

When using same rejection level, error rate is different for 1-tailed and 2-tailed tests

In a 1-tailed test, extreme scores in only one direction are used to reject H0, and therefore the Type I error rate is 1/2 the rate of a 2-tailed test.

Type I Error

[Figure: standard normal curve (density vs. z).]

Assuming H0 is true and true group difference is zero, we expect to obtain z scores beyond +2.3 approximately 1% of the time.

H0 rejection criterion is reduced in 1-tailed test to maintain Type I error rate of 1%

To maintain Type I error rate at p=0.01 in a 1-tailed test, the criterion for rejecting H0 is reduced from 2.6 to 2.3

N.B. This makes it easier to reject H0 while keeping Type I error constant.
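The tail areas and criteria quoted on these slides can be recovered from the standard normal distribution. A minimal sketch; the slides round the critical values to 2.6 and 2.3, and the variable names here are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, SD 1

def upper_tail(z):
    """Area under the standard normal curve above z."""
    return 1 - std_normal.cdf(z)

# Same criterion (z = 2.576), different error rates:
two_tailed_rate = 2 * upper_tail(2.576)   # beyond ±2.576
one_tailed_rate = upper_tail(2.576)       # beyond +2.576 only

# Same error rate (alpha = 0.01), different criteria:
crit_two_tailed = std_normal.inv_cdf(1 - 0.01 / 2)
crit_one_tailed = std_normal.inv_cdf(1 - 0.01)

print(f"beyond ±2.576: {two_tailed_rate:.4f};  beyond +2.576: {one_tailed_rate:.4f}")
print(f"alpha = 0.01: 2-tailed criterion = {crit_two_tailed:.2f}, "
      f"1-tailed criterion = {crit_one_tailed:.2f}")
```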

Null Hypothesis Testing: Possible Outcomes

                        Decision Regarding H0
State of the World      Reject H0               Fail to Reject H0
H0 is TRUE              Type I Error (p = 𝛼)    Correct (p = 1 - 𝛼)
H0 is FALSE             Correct                 Error (Type II)

Type I and Type II Errors

• Decisions to reject or not reject H0 inevitably lead to errors

• Type I error: we reject H0 when it really is true

- false alarm or false positive: we conclude there is an effect or a correlation or a difference between groups when there is no such thing

• Type II error: we fail to reject H0 when it really is false

- a miss, or false negative: fail to detect a real effect or correlation or group difference

- Beta (β): probability of failing to reject a false H0

‣ N.B. This Beta is NOT the same as Beta in regression

Sampling distribution of z for group difference (when true difference is zero)

2-tailed test: 𝛼 = 0.05, critical z values = ±1.96

[Figure: distribution of z when H0 is TRUE (density vs. group difference), with 𝛼/2 marked in each tail.]

Duguid & Goncalo (2012)

•Evaluate Null Hypothesis with z scores

•If Null Hypothesis is True, group difference distribution:

- Mean (A - B) = 0

- Variance (A - B) = Var(MA) + Var(MB)

‣ = (0.88²)/n + (0.88²)/n = 2 x {(0.88²)/50} = 0.03098

- Shape of Distribution is Normal (Central Limit Theorem)

‣ N(μ,σ²) = N(0, 0.03098)

N.B. We can calculate sampling distribution for alternative hypothesis (H1) if we assume that the true group difference is some non-zero value

Sampling distributions of group difference (when true difference is zero and true difference is 0.35)

2-tailed test: 𝛼 = 0.05, critical z values = ±1.96

[Figure: two panels (density vs. group difference): sampling distribution when H0 is TRUE, with 𝛼/2 marked in each tail; sampling distribution when H0 is FALSE (μ = 0.35).]

When H0 is false and μ = 0.35, the probability of obtaining a z score between z = ±1.96 is β≈0.5

Beta depends on mean of H1 distribution

[Figure: three panels (density vs. group difference), each showing a sampling distribution when H0 is FALSE. For μD = 0.35, β≈0.5; the other two panels show β≈0.84 and β≈0.16.]

Type II errors are less likely when true group difference is large
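Beta can be computed as the area of the H1 sampling distribution that falls between the two critical values. This is a sketch assuming a 2-tailed test with 𝛼 = 0.05 and the SEM of 0.176 from earlier; the true differences 0.17 and 0.52 are illustrative assumptions, not values stated in the slides, and `beta` is a hypothetical helper name.

```python
from statistics import NormalDist

def beta(true_diff, sem=0.176, alpha=0.05):
    """Probability of failing to reject H0 (2-tailed z test) when the true difference is true_diff."""
    # Critical group differences under H0: ±z_crit * SEM.
    crit_diff = NormalDist().inv_cdf(1 - alpha / 2) * sem
    # Sampling distribution of the group difference when H0 is false.
    h1 = NormalDist(mu=true_diff, sigma=sem)
    # Beta is the area of the H1 distribution inside the non-rejection region.
    return h1.cdf(crit_diff) - h1.cdf(-crit_diff)

for d in (0.17, 0.35, 0.52):   # 0.17 and 0.52 are hypothetical true differences
    print(f"true difference = {d:.2f}: beta = {beta(d):.2f}")
```

Beta shrinks as the assumed true difference moves further from zero, which is the pattern the slide describes.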

Beta depends on alpha

[Figure: three panels (density vs. group difference), each showing the sampling distribution when H0 is FALSE (μD = 0.35).]

𝛼 = 0.05, z critical = ±1.96: β≈0.5

𝛼 = 0.01, z critical = ±2.58: β≈0.72

𝛼 = 0.001, z critical = ±3.29: β≈0.90

Type II errors are less likely when alpha is large
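The trade-off between alpha and beta can be tabulated for a fixed true difference. A minimal sketch, assuming a 2-tailed z test, a true difference of 0.35, and the SEM of 0.176 used throughout; `type2_error` is a hypothetical helper name.

```python
from statistics import NormalDist

TRUE_DIFF = 0.35   # assumed true group difference (H0 is false)
SEM_DIFF = 0.176   # SEM of the group difference

def type2_error(alpha, true_diff=TRUE_DIFF, sem=SEM_DIFF):
    """Return (critical z, beta) for a 2-tailed z test at significance level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    h1 = NormalDist(mu=true_diff, sigma=sem)
    # Beta: chance that the observed difference lands between the critical values.
    b = h1.cdf(z_crit * sem) - h1.cdf(-z_crit * sem)
    return z_crit, b

results = {alpha: type2_error(alpha) for alpha in (0.05, 0.01, 0.001)}
for alpha, (z_crit, b) in results.items():
    print(f"alpha = {alpha}: z critical = ±{z_crit:.2f}, beta = {b:.2f}")
```

Lowering alpha widens the non-rejection region, so more of the H1 distribution falls inside it and beta rises.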