Hypothesis testing.

(1)

Hypothesis testing

(2)

Hypothesis testing

• Null hypothesis is that there is no systematic relationship between independent variables (IVs) and dependent variables (DVs).

• Research hypothesis is that any

(3)

Behavioural Science II 3

Hypothesis testing

• Whereas research hypothesis tends to be imprecise about numerical differences between groups (e.g., difference in reaction times), null hypothesis states

(4)

Null hypothesis versus alternative hypothesis

• The null hypothesis assumes that scores for different levels of the IV are random samples from the same population.

• The alternative hypothesis is that samples come from different

(5)

Null hypothesis versus alternative hypothesis

• For any single experiment, we are bound to see a difference, just as we see a difference between the means of two random samples in a distribution of sample means.

• If the null hypothesis is true, then differences in mean scores are just two random samples from the same

(6)

Testing the null hypothesis

• A statistical test assesses the probability of obtaining a given sample or samples of scores,

(7)

Testing the null hypothesis

• If the probability is low enough (e.g., p < .05), then the null hypothesis is rejected in favour of the alternative (research) hypothesis, and the IV is deemed to have a systematic effect.

• If the probability is not sufficiently low (e.g., p > .05), then the null hypothesis is not rejected but retained, and the IV is deemed to have no effect (i.e., the

(8)

Statistical significance

• Statistical significance refers to the probability of the data obtained, given that the null hypothesis is true.

• A statistically significant result does not mean that the null hypothesis is improbable.

• There is an ongoing gap between

(9)

Hypothesis testing and sampling distributions

• The decision to reject or not reject the null hypothesis usually is made with reference to the sampling distribution of a statistic of some kind (e.g., z-distribution,

(10)

Example of hypothesis testing using z-distribution

• Null hypothesis population parameters:

$\mu = 100$

$\sigma = 15$

• Random sample statistics

(11)

Applying formulae

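• Given that a z-score of 1.96 corresponds to p < .05 (two-tailed), would reject null hypothesis.

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{15}{\sqrt{9}} = \frac{15}{3} = 5$$

$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{110 - 100}{5} = \frac{10}{5} = 2$$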
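The z-test arithmetic on this slide can be checked with a short Python sketch. The function name `z_test` and its parameter names are illustrative assumptions, not part of the slides; the inputs are the values used in the example (sample mean 110, null mean 100, population standard deviation 15, N = 9).

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, null_mean, sigma, n):
    """Return (z, two-tailed p) for a one-sample z-test with known sigma."""
    standard_error = sigma / sqrt(n)  # standard error of the mean: sigma / sqrt(N)
    z = (sample_mean - null_mean) / standard_error
    # Two-tailed p: probability of a z at least this extreme in either tail.
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


z, p = z_test(sample_mean=110, null_mean=100, sigma=15, n=9)
print(f"z = {z:.2f}, two-tailed p = {p:.4f}")
print("reject null" if p < 0.05 else "retain null")
```

With these inputs the obtained z is 2, which lies beyond the two-tailed critical value of 1.96.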

(12)

Example of hypothesis testing using t-distribution

• Null hypothesis population parameters:

$\mu = 100$

• Random sample statistics

Mean = 110

N = 9

(13)

Applying formulae

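Given that a t-score of 2.306 (df = 8) corresponds to p < .05 (two-tailed), would reject the null hypothesis.

$$\tilde{\sigma} = \sqrt{\frac{\sum x^2}{N - 1}} = \sqrt{\frac{960}{9 - 1}} = \sqrt{\frac{960}{8}} = 10.95$$

$$\tilde{\sigma}_{\bar{X}} = \frac{\tilde{\sigma}}{\sqrt{N}} = \frac{10.95}{\sqrt{9}} = \frac{10.95}{3} = 3.65$$

$$t = \frac{\bar{X} - \mu}{\tilde{\sigma}_{\bar{X}}} = \frac{110 - 100}{3.65} = \frac{10}{3.65} = 2.74$$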
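The t-test steps can be sketched the same way. This is a minimal sketch that assumes the 960 in the calculation is the sum of squared deviations from the sample mean; the function name `one_sample_t` is hypothetical. The critical value 2.306 (df = 8) is taken from the slide rather than computed, because the Python standard library has no t-distribution.

```python
from math import sqrt


def one_sample_t(sample_mean, null_mean, sum_sq_dev, n):
    """Return (estimated SD, estimated standard error, t) for a one-sample t-test."""
    est_sd = sqrt(sum_sq_dev / (n - 1))  # population SD estimated from the sample
    est_se = est_sd / sqrt(n)            # estimated standard error of the mean
    t = (sample_mean - null_mean) / est_se
    return est_sd, est_se, t


T_CRIT_DF8 = 2.306  # two-tailed critical t for p < .05 with df = 8 (from the slide)

est_sd, est_se, t = one_sample_t(sample_mean=110, null_mean=100, sum_sq_dev=960, n=9)
print(f"estimated SD = {est_sd:.2f}, standard error = {est_se:.2f}, t = {t:.2f}")
print("reject null" if abs(t) > T_CRIT_DF8 else "retain null")
```

The code carries full precision through each step, so intermediate values can differ slightly in later decimals from the rounded figures on the slide.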

(14)

Hypothesis testing using confidence intervals

• We reject null hypothesis when null population mean lies outside the confidence interval.
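As an illustration, the confidence-interval rule can be applied to the earlier z-distribution example (sample mean 110, population standard deviation 15, N = 9, null mean 100). The 95% level and the function name `z_confidence_interval` are assumptions made for this sketch; the slide does not specify a level.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Return (lower, upper) bounds of a z-based confidence interval for the mean."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    margin = z_crit * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin


null_mean = 100
lower, upper = z_confidence_interval(sample_mean=110, sigma=15, n=9)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
# Reject the null hypothesis when the null mean falls outside the interval.
print("reject null" if not (lower <= null_mean <= upper) else "retain null")
```

Here the null mean of 100 falls just below the lower bound, which agrees with the decision reached from the z-score.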

(15)

Errors in hypothesis testing

• Given the gap between statistical and substantive significance, a decision based on probability to retain or

(16)

When null hypothesis is true (Type I error)

• When null hypothesis is true, and it is rejected, this decision is called a Type I error.

• The probability of making such an

(17)

When null hypothesis is true (Type I error)

• If null hypothesis is true and alpha level is set at .05, then the null hypothesis will be rejected 5% of time even though it is true.

• One way to safeguard against a Type I

(18)

When null hypothesis is false (Type II or III errors)

• When alternative hypothesis is true, and the statistic (mean) from

(19)

Type II error

• Retaining null hypothesis when alternative hypothesis is true is called a Type II error.

• The probability of making a Type II error usually is symbolized as beta (β).

• The probability of beta depends on how much the alternative hypothesis sampling distribution overlaps the retention region of the null hypothesis sampling

(20)

Type III error

• It is also possible to make a Type III error, by rejecting a null hypothesis but inferring the incorrect alternative hypothesis.

• The probability of making a Type III error usually is symbolized as gamma (γ) and is equivalent to whatever percentage of scores in the alternative distribution falls in the far end of the null hypothesis

(21)

The power of a test

• The probability of rejecting a false null hypothesis and correctly inferring the position or direction of the alternative hypothesis with respect to the null hypothesis.

• Factors affecting power and error

(22)

Power is affected by significance (alpha) level

• Setting a less stringent significance level increases the discriminatory power of the statistical test and

(23)

Power is affected by magnitude of difference between sample means

• So, increasing the difference in the size of the mean at differing levels of the IV increases the power of the

(24)

Power is affected by sample size

• An increase in sample size increases the power of the test, if the alternative hypothesis is true.

• This is because as sample size

(25)

Effect size

• In order to gauge the effect of the IV, it makes sense to contrast the difference between the population

(26)

Effect size formula

$$\text{Effect size} = \frac{\mu_0 - \mu_1}{\sigma}$$

• where $\sigma$ is standard deviation of population of dependent measure scores.

(27)

Judging effect sizes

• According to Cohen (1988)

.20 = small effect size

(28)

Do we really need the null hypothesis?

• A significant test of the null hypothesis does not mean the data are not a product of chance.

• The significant result may simply be

(29)

Do we really need the null hypothesis?

• It is better to test the research hypothesis if the size and direction of the effect are known.

(30)

One-tailed versus two-tailed tests

• Conventionally reject null hypothesis if obtained z-score or t-score falls beyond certain values in either tail of the relevant sampling distribution (i.e., a two-tailed test).

• In specific contexts, a one-tailed test

(31)

One-tailed versus two-tailed tests

• Generally, two-tailed tests are preferred to one-tailed tests.
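The practical difference between the two kinds of test can be seen by comparing critical values and p-values for the z-distribution. This is a minimal sketch with hypothetical function names; the one-tailed p-value assumes the obtained z falls in the predicted direction, and the z of 1.8 is an arbitrary illustrative value.

```python
from statistics import NormalDist


def critical_z(alpha=0.05, tails=2):
    """Critical z beyond which the null hypothesis is rejected."""
    return NormalDist().inv_cdf(1 - alpha / tails)


def p_value(z, tails=2):
    """p-value for an obtained z; one-tailed assumes z is in the predicted direction."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return tails * upper_tail


print(f"two-tailed critical z: {critical_z(tails=2):.3f}")
print(f"one-tailed critical z: {critical_z(tails=1):.3f}")

z = 1.8  # arbitrary obtained z-score for illustration
print(f"z = {z}: two-tailed p = {p_value(z, 2):.4f}, one-tailed p = {p_value(z, 1):.4f}")
```

An obtained z between the two critical values is significant under a one-tailed test but not under a two-tailed test, which is one reason the choice has to be made before looking at the data.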