
Introduction to Hypothesis Testing


Hypothesis Testing

A hypothesis test is a statistical procedure that uses sample data to evaluate a hypothesis about a population

Hypothesis is stated in terms of the population

Predict sample statistics based on population parameters (e.g. X̄ ≈ µ)

Select random sample from population

Compare observed sample data with predicted values


Step 1: State the Hypotheses

The null hypothesis, H₀, states that in the population there is no change, no difference, or no relationship

H₀: µ_treatment = constant (e.g. µ)

e.g. H₀: µ_treatment = 100

This is read as: “The null hypothesis is that the population mean of people receiving the treatment equals 100”

H₀ is that the treatment had no effect


H₀

The null hypothesis must contain an equal sign of some sort (=, ≥, ≤)

Statistical tests are designed to reject H₀, never to accept it


H₁: The Alternative Hypothesis

The alternative hypothesis usually takes the following form:

H₁: µ_treatment ≠ constant (e.g. µ)

e.g. H₁: µ_treatment ≠ 100

This is read as: “The alternative hypothesis states that the population mean of people receiving the treatment does not equal 100”

H₁ is that the treatment had an effect


H₀ and H₁

Together, the null and alternative hypotheses must be mutually exclusive and exhaustive

Mutual exclusion implies that H₀ and H₁ cannot both be true at the same time

Exhaustive implies that each of the possible outcomes of the experiment must make either H₀ or H₁ true


Step 2: Set the Decision Criteria

What sample means are consistent with H₀ and what sample means are consistent with H₁?

Separate distribution of sample means into two sets of regions – one whose means are consistent with H₀ and one whose means are consistent with H₁

[Figure: distribution of sample means for n = 25, µ = 100, σ = 15, horizontal axis from 90 to 110. Sample means close to H₀ in the center are high-probability values if H₀ is true; sample means in either tail are extreme, low-probability values if H₀ is true]

α Level

The α level (alpha level; level of significance) is a probability value that is used to define the very unlikely sample outcomes if H₀ is true

Psychologists usually adopt α = 0.05, although α = 0.01 and α = 0.001 are sometimes used

The critical region is composed of the extreme sample values that are very unlikely (as specified by the α level) to be obtained if H₀ is true


Critical Regions

Since we can reject H₀ two ways (extremely small or extremely large sample means), the α level is divided across the two tails of the distribution

Find the z-score whose area above equals α / 2

z = 1.96 for α = 0.05

Find raw scores that correspond to that z score, using the standard error σ/√n = 15 / √25 = 3

X̄ = 100 + 1.96 · 3 = 105.9

X̄ = 100 – 1.96 · 3 = 94.1

[Figure: distribution of sample means, horizontal axis from 90 to 110, with the critical regions in both tails. Sample means beyond z = -1.96 or z = 1.96 are extreme, low-probability values if H₀ is true]
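The critical-region cutoffs can be reproduced numerically. The following is a minimal sketch using `statistics.NormalDist` from Python's standard library in place of a z table; the helper name `critical_region` and its parameter names are illustrative and do not come from the slides. It uses the example values n = 25, µ = 100, σ = 15.

```python
from math import sqrt
from statistics import NormalDist


def critical_region(mu, sigma, n, alpha=0.05):
    """Return (z_crit, lower, upper) for a two-tailed z test.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # z-score with area alpha / 2 above it in the standard normal distribution
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the mean: sigma / sqrt(n)
    se = sigma / sqrt(n)
    return z_crit, mu - z_crit * se, mu + z_crit * se


z_crit, lower, upper = critical_region(mu=100, sigma=15, n=25)
print(f"z = ±{z_crit:.2f}; reject H0 if mean < {lower:.1f} or mean > {upper:.1f}")
```

Rounded to one decimal place, the cutoffs agree with the 94.1 and 105.9 computed by hand.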


Step 3: Collect Data & Compute Sample Statistics

Randomly sample from population

In this example, n = 25

Give the sample the treatment

Measure the dependent variable

Calculate the z score of sample mean in the sampling distribution

In this example the sample statistics are X̄ = 107, s = 14; population parameters from slide 7, the Step 2 graph (IQs: µ = 100, σ = 15)


Step 4: Make a Decision

If the sample mean’s z-score is in the extreme tails of the sampling distribution (i.e. in the critical region), reject H₀; otherwise, fail to reject H₀

Critical region is z > 1.96 or z < -1.96 for α = 0.05

The example z is (107 – 100) / 3 = 2.33. It is in the critical region.

Therefore, reject H₀

It is likely the case that the treatment had an effect

[Figure: distribution of sample means, horizontal axis from 90 to 110, with critical regions beyond z = -1.96 and z = 1.96. The sample mean X̄ = 107 (z = 2.33) falls in the upper critical region]
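Steps 3 and 4 can be sketched in a few lines. This is only an illustration under the example's assumptions (known σ = 15, n = 25, two-tailed α = 0.05); the function name `z_test_two_tailed` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu, sigma, n, alpha=0.05):
    """Return (z, reject) for a two-tailed one-sample z test."""
    se = sigma / sqrt(n)                           # standard error of the mean
    z = (sample_mean - mu) / se                    # sample mean in z units
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return z, abs(z) > z_crit                      # True means "reject H0"


z, reject = z_test_two_tailed(sample_mean=107, mu=100, sigma=15, n=25)
print(f"z = {z:.2f}:", "reject H0" if reject else "fail to reject H0")
```

The function only ever reports one of the two permitted decisions: reject H₀ or fail to reject H₀.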

Reject H₀ or Fail to Reject H₀

The only decisions you ever make in hypothesis testing are

Reject H₀, or

Fail to reject H₀

No other decisions are possible

Never reject H₁

Never accept H₁

Never accept H₀


Type I (α) Error

A type I (or α) error occurs when a researcher rejects H₀ when H₀ is really true

Researcher concludes that the treatment had an effect when it did not

This should happen with a probability equal to α
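The claim that a Type I error happens with probability α can be checked by simulation. The sketch below is not from the slides: it assumes normally distributed scores with µ = 100 and σ = 15, draws many samples of n = 25 from a population where H₀ is true, and counts how often a two-tailed z test rejects. The function name, trial count, and seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(mu=100, sigma=15, n=25, alpha=0.05, trials=20_000, seed=0):
    """Estimate how often a two-tailed z test rejects H0 when H0 is true."""
    rng = random.Random(seed)                  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Sample from the null population: the treatment has no effect
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / se
        if abs(z) > z_crit:
            rejections += 1                    # every rejection here is a Type I error
    return rejections / trials


rate = type_i_error_rate()
print(f"Proportion of true H0 rejected: {rate:.3f}")
```

The estimated proportion should land close to α = 0.05, with some run-to-run variation if the seed is changed.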


Type II (β) Errors

A type II (or β) error occurs when a researcher fails to reject H₀ when H₀ is really false

Researcher concludes that there is insufficient evidence to suggest that the treatment had an effect when in fact it does have an effect

This should happen with a probability equal to β


β

Unlike α, β is not directly set by the researcher

β depends on the sample size (n)

β depends on how much the treatment affects the dependent variable

β depends on the variability of the data

β depends on α
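Each of these dependencies can be seen by computing β directly for a two-tailed z test. The sketch below assumes a known σ and a hypothetical true shift in the population mean (`effect`); the function name and the specific comparison values are illustrative, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def beta_two_tailed(effect, sigma, n, alpha=0.05):
    """Probability of a Type II error for a two-tailed z test.

    `effect` is the assumed true shift in the population mean (hypothetical input).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / sqrt(n))         # true mean in standard-error units
    # beta = P(-z_crit < Z + shift < z_crit): the sample mean misses both critical regions
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)


baseline = beta_two_tailed(effect=3, sigma=15, n=25)
print(f"baseline beta:         {baseline:.2f}")
print(f"larger n (100):        {beta_two_tailed(3, 15, 100):.2f}")
print(f"larger effect (6):     {beta_two_tailed(6, 15, 25):.2f}")
print(f"less variability (10): {beta_two_tailed(3, 10, 25):.2f}")
print(f"stricter alpha (.01):  {beta_two_tailed(3, 15, 25, alpha=0.01):.2f}")
```

A larger sample, a larger effect, or less variable data each lower β, while a stricter α raises it.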


Type-I and Type-II Errors

Ideally, we would like to minimize both Type-I and Type-II errors

This is not possible for a given sample size

When we lower the α level to minimize the probability of making a Type-I error, the β level will rise

When we lower the β level to minimize the probability of making a Type-II error, the α level will rise



Factors that Influence a Hypothesis Test

The size of the mean difference

The larger the mean difference is, the more likely you are to reject H₀

The variability of the scores

The more variable the scores are, the less likely you are to reject H₀

The number of scores in the sample

The larger the sample size, the more likely you are to reject H₀
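All three factors act through the z statistic, z = (X̄ – µ) / (σ/√n). A small sketch with illustrative numbers (the baseline reuses σ = 15 and n = 25 from the running example; the other values are made up for comparison):

```python
from math import sqrt


def z_statistic(mean_difference, sigma, n):
    """z = (sample mean - mu) / (sigma / sqrt(n))."""
    return mean_difference / (sigma / sqrt(n))


print(z_statistic(mean_difference=3, sigma=15, n=25))    # baseline
print(z_statistic(mean_difference=6, sigma=15, n=25))    # larger mean difference
print(z_statistic(mean_difference=3, sigma=30, n=25))    # more variable scores
print(z_statistic(mean_difference=3, sigma=15, n=100))   # larger sample
```

A larger z is more likely to fall in the critical region, so doubling the mean difference or quadrupling n doubles z, while doubling σ halves it.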


Assumptions of the z-Score Hypothesis Test

Random sampling

If the sample is not selected randomly from the population, it probably will not represent the population

Independent observations

σ does not change as a result of the treatment

Distribution of sample means is normal


Directional vs Non-Directional Hypotheses

The hypotheses we have been talking about are called non-directional hypotheses because they do not specify how the population mean should differ from the constant

That is, they do not say that the population mean should be larger than the constant

They only state that the population mean should differ from the constant

Non-directional hypotheses are sometimes called two-tailed tests


Directional vs Non-Directional Hypotheses

Directional hypotheses include an ordinal relation between the population mean and the constant

That is, they state that the population mean should be larger than the constant

For directional hypotheses, the H₀ and H₁ are written as:

H₀: µ_treatment ≤ constant

H₁: µ_treatment > constant

Directional hypotheses are sometimes called one-tailed tests


1 Tailed

When performing a one tailed test, all of the critical region is in one tail of the distribution of sample means

Do not divide α by two when finding the z score for the critical region

This increases statistical power – the probability of correctly rejecting a false H₀
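The effect of not dividing α by two shows up directly in the critical z value. A minimal sketch using the standard library's `NormalDist` in place of a z table (variable names are illustrative):

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()

z_one_tailed = std_normal.inv_cdf(1 - alpha)       # all of alpha in the upper tail
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)   # alpha split across both tails

print(f"one-tailed cutoff: z > {z_one_tailed:.3f}")
print(f"two-tailed cutoff: |z| > {z_two_tailed:.3f}")
```

The one-tailed cutoff is about 1.645, which z tables usually round to 1.65, compared with 1.96 for the two-tailed test.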


1 Tailed vs. 2 Tailed

[Figure: two distributions with z-score axes from -3 to 3. 1 Tailed: α = .05, z = 1.65, critical region in one tail. 2 Tailed: α = .05, z = -1.96 and z = 1.96, critical region in two tails]

Concerns about Hypothesis Testing

Hypothesis testing focuses on the data, and not the hypothesis

When we reject H₀, we should really say “This specific sample mean is very unlikely (p < .05) if the null hypothesis is true”

Statistical significance ≠ practical significance

The effect size can be small, but still be statistically significant if the sample size is sufficiently large
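To illustrate the last point, the sketch below holds a small mean difference fixed and lets the sample size grow. The 1-point difference, σ = 15, and the sample sizes are hypothetical numbers chosen for illustration; the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_p(mean_difference, sigma, n):
    """Two-tailed p-value of a one-sample z test."""
    z = mean_difference / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same 1-point difference (sigma = 15) at growing sample sizes
for n in (25, 400, 2500):
    print(f"n = {n:5d}: p = {two_tailed_p(1, 15, n):.4f}")
```

The difference itself never changes, yet the p-value falls below .05 once the sample is large enough.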


Effect Size

A measure of effect size is intended to provide a measurement of the absolute magnitude of a treatment effect, independent of the size of the sample(s) being used

Cohen’s d is a measure of effect size


Effect Size

What is the effect size for the example on slide 5?

Magnitude of d    Evaluation of Effect Size
d = 0.2           Small effect
d = 0.5           Medium effect
d = 0.8           Large effect


This is a small effect
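The slides do not show the formula for Cohen's d, so the sketch below assumes the standard one-sample form: the mean difference divided by the population standard deviation. The function name is illustrative, and the two inputs are the numbers that appear in this deck's examples (µ = 100, σ = 15, an observed mean of 107, and a hypothesized 3-point shift).

```python
def cohens_d(mean_difference, sigma):
    """Cohen's d for a one-sample design: mean difference in standard-deviation units."""
    return mean_difference / sigma


# Values taken from the examples in this deck (mu = 100, sigma = 15)
print(f"3-point shift:        d = {cohens_d(3, 15):.2f}")
print(f"observed mean of 107: d = {cohens_d(107 - 100, 15):.2f}")
```

Unlike the z statistic, d does not contain n, so it does not grow just because the sample is larger.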

Statistical Power

Statistical power is the probability that a statistical test will correctly reject a false H₀

Probability that a statistical test will identify a treatment effect if one really exists

Power = 1 – β = 1 – probability of a Type II error


Statistical Power

Calculate before performing the study

Need to know / estimate

How much the treatment changes the dependent variable

Sample size

α

σ, µ


Statistical Power Example

How much the treatment changes the dependent variable

Researchers hypothesize that having proper nutrition during the first two years will increase IQ by 3 points (notice – 1 tailed)

µ = 100, σ = 15

Sample size

n = 25

α = .05

Distribution of Sample Means

If the treatment has no effect, by the central limit theorem, the distribution of sample means will have:

a mean = population mean = 100

a standard deviation = σ/√n = 15 / √25 = 3

If the treatment has the hypothesized effect, the distribution of sample means will have

a mean = population mean + effect of treatment = 100 + 3 = 103

a standard deviation = σ/√n = 15 / √25 = 3

Adding a constant to all scores does not change the standard deviation


z Score of Critical Region

This is a one-tailed test with α = .05

Consult a table to find the z with an area above equal to .05

z = 1.65


Statistical Power Example

[Figure: distributions of sample means, horizontal axis from 91 to 115, with a z-score axis (0, 1, 2) and the critical value z = 1.65 marked]

Statistical Power Example

Power equals area to right of the z score for the critical region under the treatment distribution of sample means

Areas to the right of the z score for the critical region correspond to rejecting H₀

Areas under the treatment distribution of sample means correspond to a false H₀

Both combined correspond to rejecting a false H₀ = power


Statistical Power Example

Find the z score in the treatment distribution of sample means that is at the same location as the z score for the critical region in the no treatment distribution of sample means

z_treatment = z_critical region – z_mean of treatment

z_mean of treatment = (103 – 100) / 3 = 1

z_treatment = 1.65 – 1 = 0.65

Power = area above z = 0.65

Power = .26

Only about a 1 in 4 chance of detecting this effect
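The same calculation can be sketched in code, following the steps above for the nutrition example (hypothesized effect of 3 points, σ = 15, n = 25, one-tailed α = .05). The function name `power_one_tailed` is hypothetical, and `NormalDist` stands in for the z table, so the cutoff is 1.645 rather than the table-rounded 1.65.

```python
from math import sqrt
from statistics import NormalDist


def power_one_tailed(effect, sigma, n, alpha=0.05):
    """Power of an upper-tailed z test when the true mean is mu + effect."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # cutoff in the no-treatment distribution
    z_mean_of_treatment = effect / (sigma / sqrt(n))
    z_treatment = z_crit - z_mean_of_treatment      # same cutoff, seen from the treatment distribution
    return 1 - std_normal.cdf(z_treatment)          # area above the cutoff


power = power_one_tailed(effect=3, sigma=15, n=25)
print(f"power = {power:.2f}")
```

Rounded to two decimals this agrees with the table-based result of .26.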


Factors that Influence Power

Sample size

As sample size increases, power increases

α level

As α decreases (fewer Type I errors), β increases (more Type II errors), and 1 – β (power) decreases

Number of tails (directional vs non-directional)

One tailed tests have more statistical power than two tailed tests. Can you explain why?
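The three factors can be compared side by side. The sketch below is an illustration under the power example's assumptions (3-point effect, σ = 15); the function name `power` and its `tails` parameter are hypothetical, and the effect is assumed to be in the predicted (positive) direction.

```python
from math import sqrt
from statistics import NormalDist


def power(effect, sigma, n, alpha=0.05, tails=1):
    """Power of a z test for a positive true effect (tails is 1 or 2)."""
    std_normal = NormalDist()
    shift = effect / (sigma / sqrt(n))              # true mean in standard-error units
    z_crit = std_normal.inv_cdf(1 - alpha / tails)
    upper = 1 - std_normal.cdf(z_crit - shift)      # rejections in the upper tail
    # A two-tailed test can also reject in the lower tail (rare when the effect is positive)
    lower = std_normal.cdf(-z_crit - shift) if tails == 2 else 0.0
    return upper + lower


print(f"one-tailed, n = 25:      {power(3, 15, 25):.2f}")
print(f"two-tailed, n = 25:      {power(3, 15, 25, tails=2):.2f}")
print(f"one-tailed, n = 100:     {power(3, 15, 100):.2f}")
print(f"one-tailed, alpha = .01: {power(3, 15, 25, alpha=0.01):.2f}")
```

The one-tailed test has more power because its cutoff (about 1.65) sits closer to the mean than the two-tailed cutoff (1.96), so more of the treatment distribution lies beyond it.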
