Hypothesis testing
• The null hypothesis is that there is no systematic relationship between independent variables (IVs) and dependent variables (DVs).
• The research hypothesis is that any
Behavioural Science II 3
• Whereas the research hypothesis tends to be imprecise about numerical differences between groups (e.g., a difference in reaction times), the null hypothesis states
Null hypothesis versus
alternative hypothesis
• The null hypothesis assumes that scores for different levels of the IV are random samples from the same population.
• The alternative hypothesis is that samples come from different
• For any single experiment, we are bound to see a difference, just as we see a difference between the means of two random samples in a distribution of sample means.
• If the null hypothesis is true, then differences in mean scores are just two random samples from the same
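The point that two random samples from one population rarely have identical means can be illustrated with a small simulation. This is only a sketch: the population values (mean 100, standard deviation 15), the sample size of 9, and the number of repetitions are assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Assumed population for illustration: mean 100, standard deviation 15.
POP_MEAN, POP_SD, N = 100, 15, 9

def sample_mean():
    """Mean of one random sample of N scores from the population."""
    return statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))

# Both "groups" come from the SAME population, so the null hypothesis is true.
differences = [sample_mean() - sample_mean() for _ in range(5000)]

mean_diff = statistics.fmean(differences)  # close to 0 on average
sd_diff = statistics.stdev(differences)    # but individual differences vary

print(f"average difference between sample means: {mean_diff:.2f}")
print(f"spread (SD) of the differences: {sd_diff:.2f}")
```

Individual differences are almost never exactly zero even though no systematic effect exists, which is why a single observed difference cannot be taken at face value.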
Testing the null hypothesis
• A statistical test assesses the probability of obtaining a given sample or samples of scores,
• If the probability is low enough (e.g., p < .05), then the null hypothesis is rejected in favour of the alternative (research) hypothesis, and the IV is deemed to have a systematic effect.
• If the probability is not sufficiently low (e.g., p > .05), then the null hypothesis is not rejected but retained, and the IV is deemed to have no effect (i.e., the
Statistical significance
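• Statistical significance refers to the probability of obtaining the observed data, given that the null hypothesis is true.
• A statistically significant result does not mean that the null hypothesis is improbable: the test gives the probability of the data given the null hypothesis, not the probability of the null hypothesis given the data.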
• There is an ongoing gap between
Hypothesis testing and
sampling distributions
• The decision to reject or not reject the null hypothesis usually is made with reference to the sampling distribution of a statistic of some kind (e.g., z-distribution,
Example of hypothesis
testing using z-distribution
• Null hypothesis population parameters:
μ = 100
σ = 15
• Random sample statistics:
Mean = 110, N = 9
Applying formulae
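\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}} = \frac{15}{\sqrt{9}} = \frac{15}{3} = 5 \]
\[ z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}} = \frac{110 - 100}{5} = \frac{10}{5} = 2 \]
• Given that a z-score beyond ±1.96 corresponds to p < .05 (two-tailed), the obtained z = 2.00 means we would reject the null hypothesis.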
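The z-test arithmetic can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the value μ = 100 is read off the numerator (110 − 100) in the worked calculation, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 100, 15      # null hypothesis population parameters
sample_mean, n = 110, 9  # random sample statistics

standard_error = sigma / sqrt(n)         # 15 / 3 = 5
z = (sample_mean - mu) / standard_error  # 10 / 5 = 2

# Two-tailed p-value: area beyond |z| in both tails of the standard normal.
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_two_tailed < 0.05
print(f"z = {z:.2f}, two-tailed p = {p_two_tailed:.4f}, reject null: {reject_null}")
```

The p-value comes out just under .05, which matches the decision reached by comparing z with the critical value of 1.96.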
Example of hypothesis
testing using t-distribution
• Null hypothesis population parameters:
μ = 100
• Random sample statistics:
Mean = 110, N = 9
Applying formulae
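\[ \tilde{\sigma} = \sqrt{\frac{\sum x^2}{N - 1}} = \sqrt{\frac{960}{9 - 1}} = \sqrt{\frac{960}{8}} = 10.95 \]
\[ \tilde{\sigma}_{\bar{X}} = \frac{\tilde{\sigma}}{\sqrt{N}} = \frac{10.95}{\sqrt{9}} = \frac{10.95}{3} = 3.65 \]
\[ t = \frac{\bar{X} - \mu}{\tilde{\sigma}_{\bar{X}}} = \frac{110 - 100}{3.65} = \frac{10}{3.65} = 2.74 \]
Here \(\sum x^2 = 960\) is the sum of squared deviations of the sample scores from their mean.
• Given that a t-score beyond ±2.306 (df = 8) corresponds to p < .05 (two-tailed), the obtained t = 2.74 means we would reject the null hypothesis.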
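The same steps for the t-test can be sketched in Python. The critical value 2.306 is taken from the slide rather than computed, because the standard library has no t-distribution; the sum of squared deviations (960) is read from the worked calculation, and the variable names are illustrative.

```python
from math import sqrt

mu = 100                 # null hypothesis population mean
sample_mean, n = 110, 9  # random sample statistics
sum_sq_dev = 960         # sum of squared deviations from the sample mean
t_critical = 2.306       # two-tailed critical value, df = 8, alpha = .05 (from the slide)

est_sd = sqrt(sum_sq_dev / (n - 1))  # estimated population SD, about 10.95
est_se = est_sd / sqrt(n)            # estimated standard error, about 3.65
t = (sample_mean - mu) / est_se      # about 2.74

reject_null = abs(t) > t_critical
print(f"estimated SD = {est_sd:.2f}, standard error = {est_se:.2f}, t = {t:.2f}")
print(f"reject null: {reject_null}")
```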
Hypothesis testing using
confidence intervals
• We reject the null hypothesis when the null population mean lies outside the confidence interval.
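As an illustration, the confidence-interval rule can be applied to the numbers from the t-distribution example in this section (mean 110, N = 9, sum of squared deviations 960, critical t of 2.306). This is a sketch under those assumptions, not a general-purpose routine.

```python
from math import sqrt

mu_null = 100            # null hypothesis population mean
sample_mean, n = 110, 9  # random sample statistics
sum_sq_dev = 960         # sum of squared deviations from the sample mean
t_critical = 2.306       # two-tailed critical value, df = 8, for a 95% interval

est_se = sqrt(sum_sq_dev / (n - 1)) / sqrt(n)  # estimated standard error
margin = t_critical * est_se                   # half-width of the interval

ci_lower = sample_mean - margin
ci_upper = sample_mean + margin

# Reject the null hypothesis if its mean lies outside the interval.
reject_null = not (ci_lower <= mu_null <= ci_upper)
print(f"95% CI: [{ci_lower:.2f}, {ci_upper:.2f}], reject null: {reject_null}")
```

The interval excludes 100, so the decision agrees with the t-test: both methods use the same standard error and the same critical value.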
Errors in hypothesis testing
• Given the gap between statistical and substantive significance, a decision based on probability to retain or
When null hypothesis is
true (Type I error)
• When the null hypothesis is true and it is rejected, this decision is called a Type I error.
• The probability of making such an
• If the null hypothesis is true and the alpha level is set at .05, then the null hypothesis will be rejected 5% of the time even though it is true.
• One way to safeguard against a Type I
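The 5% rejection rate for a true null hypothesis can be checked by simulation. The sketch below assumes a two-tailed z-test with the population values used earlier (mean 100, standard deviation 15, N = 9); the number of trials and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed so the run is repeatable

MU, SIGMA, N = 100, 15, 9  # the null hypothesis is TRUE in this simulation
ALPHA = 0.05
z_critical = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96
standard_error = SIGMA / sqrt(N)

def null_is_rejected():
    """Draw one sample from the null population and run a two-tailed z-test."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU) / standard_error
    return abs(z) > z_critical

TRIALS = 20000
type_i_rate = sum(null_is_rejected() for _ in range(TRIALS)) / TRIALS
print(f"proportion of true null hypotheses rejected: {type_i_rate:.3f}")
```

The proportion should land close to alpha, since every rejection here is by construction a Type I error.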
When null hypothesis is
false (Type II or III errors)
• When the alternative hypothesis is true, and the statistic (mean) from
Type II error
• Retaining the null hypothesis when the alternative hypothesis is true is called a Type II error.
• The probability of making a Type II error usually is symbolized as beta (β).
• The size of beta depends on how much the alternative hypothesis sampling distribution overlaps the retention region of the null hypothesis sampling
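The overlap idea can be made concrete by computing beta for one specific alternative. This is a sketch only: the alternative mean of 110 is a hypothetical value chosen for illustration, and the other numbers follow the z-distribution example (null mean 100, standard deviation 15, N = 9, two-tailed alpha of .05).

```python
from math import sqrt
from statistics import NormalDist

mu_null, sigma, n = 100, 15, 9
mu_alt = 110   # hypothetical true population mean (an assumption)
alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96
standard_error = sigma / sqrt(n)

# Retention region for the sample mean under the null hypothesis.
lower = mu_null - z_crit * standard_error
upper = mu_null + z_crit * standard_error

# Sampling distribution of the mean if the alternative hypothesis is true.
alt_dist = NormalDist(mu_alt, standard_error)

# Beta: the share of the alternative distribution inside the retention region.
beta = alt_dist.cdf(upper) - alt_dist.cdf(lower)
print(f"retention region: [{lower:.1f}, {upper:.1f}], beta = {beta:.3f}")
```

Moving the alternative mean further from 100 shrinks the overlap and therefore beta.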
Type III error
• It is also possible to make a Type III error, by rejecting a null hypothesis but inferring the incorrect alternative hypothesis.
• The probability of making a Type III error usually is symbolized as gamma (γ) and is equivalent to whatever percentage of scores in the alternative distribution falls in the far end of the null hypothesis
The power of a test
• The power of a test is the probability of rejecting a false null hypothesis and correctly inferring the position or direction of the alternative hypothesis with respect to the null hypothesis.
• Factors affecting power and error
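Under this definition, power is what remains after the Type II and Type III error probabilities are taken away. The sketch below splits the alternative sampling distribution into those three parts for a two-tailed z-test; the alternative mean of 110 is a hypothetical value, and the remaining numbers follow the earlier z-distribution example.

```python
from math import sqrt
from statistics import NormalDist

mu_null, sigma, n = 100, 15, 9
mu_alt = 110   # hypothetical true population mean (an assumption)
alpha = 0.05

z_crit = NormalDist().inv_cdf(1 - alpha / 2)
se = sigma / sqrt(n)
lower = mu_null - z_crit * se   # lower rejection boundary
upper = mu_null + z_crit * se   # upper rejection boundary

alt = NormalDist(mu_alt, se)    # sampling distribution under the alternative

# The true mean is ABOVE the null mean here, so the lower tail is the wrong one.
gamma = alt.cdf(lower)                  # Type III: reject, wrong direction
beta = alt.cdf(upper) - alt.cdf(lower)  # Type II: retain a false null
power = 1 - alt.cdf(upper)              # reject, correct direction

print(f"beta = {beta:.4f}, gamma = {gamma:.6f}, power = {power:.4f}")
```

The three quantities sum to 1, so anything that reduces beta or gamma raises power.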
Power is affected by
significance (alpha) level
• Setting a less stringent significance level increases the discriminatory power of the statistical test and
Power is affected by magnitude of
difference between sample means
• So, increasing the difference in the size of the mean at differing levels of the IV increases the power of the
Power is affected by sample size
• An increase in sample size increases the power of the test, if the alternative hypothesis is true.
• This is because as sample size
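The three influences on power (significance level, size of the difference between means, and sample size) can be compared side by side. The helper `power` below is a hypothetical function written for this illustration; it assumes a two-tailed z-test with a null mean of 100 and a standard deviation of 15, and the alternative values are invented.

```python
from math import sqrt
from statistics import NormalDist

def power(mu_alt, n, alpha, mu_null=100, sigma=15):
    """Probability of rejecting the null in the correct tail (two-tailed z-test)."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    alt = NormalDist(mu_alt, se)  # sampling distribution under the alternative
    if mu_alt >= mu_null:
        return 1 - alt.cdf(mu_null + z_crit * se)
    return alt.cdf(mu_null - z_crit * se)

baseline = power(mu_alt=110, n=9, alpha=0.05)
less_stringent_alpha = power(mu_alt=110, n=9, alpha=0.10)
bigger_difference = power(mu_alt=115, n=9, alpha=0.05)
larger_sample = power(mu_alt=110, n=36, alpha=0.05)

print(f"baseline:             {baseline:.3f}")
print(f"alpha .10:            {less_stringent_alpha:.3f}")
print(f"difference of 15:     {bigger_difference:.3f}")
print(f"sample size 36:       {larger_sample:.3f}")
```

Each change raises power relative to the baseline. A larger sample does so by shrinking the standard error, which narrows both sampling distributions and reduces their overlap.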
Effect size
• In order to gauge the effect of the IV, it makes sense to contrast the difference between the population
Effect size formula
\[ \text{Effect size} = \frac{\mu_1 - \mu_0}{\sigma} \]
where
• σ is the standard deviation of the population of dependent measure scores.
Judging effect sizes
• According to Cohen (1988):
.20 = small effect size
.50 = medium effect size
.80 = large effect size
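A short sketch of how an effect size might be computed and labelled. The means (110 versus 100) and standard deviation (15) are borrowed from the z-distribution example purely for illustration; the .50 (medium) and .80 (large) cut-offs are Cohen's other conventional benchmarks; the function names and the "below small" label are hypothetical.

```python
def effect_size(mu_alt, mu_null, sigma):
    """Difference between population means in standard deviation units."""
    return (mu_alt - mu_null) / sigma

def cohen_label(d):
    """Label |d| using Cohen's (1988) conventional benchmarks."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"  # illustrative label, not Cohen's wording

d = effect_size(mu_alt=110, mu_null=100, sigma=15)
print(f"effect size = {d:.2f} ({cohen_label(d)})")
```

Unlike a z-score or t-score, this value does not change with sample size, which is why it is useful for judging how large an effect is rather than whether it is detectable.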
Do we really need the null
hypothesis?
• A significant test of the null hypothesis does not mean the data are not a product of chance.
• The significant result may simply be
• It is better to test the research hypothesis directly, if the size and direction of the effect are known.
One-tailed versus two-tailed
tests
• Conventionally, we reject the null hypothesis if the obtained z-score or t-score falls beyond certain values in either tail of the relevant sampling distribution (i.e., a two-tailed test).
• In specific contexts, a one-tailed test
• Generally, two-tailed tests are preferred to one-tailed tests.
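The practical difference between the two kinds of test shows up in the cut-off values and p-values. The sketch below uses the standard normal distribution; the obtained z-score of 1.80 is a hypothetical value chosen because it falls between the two cut-offs, and the one-tailed test assumes the effect was predicted in the positive direction.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05

# Two-tailed: alpha is split between both tails. One-tailed: all in one tail.
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)      # about 1.645

z = 1.80  # hypothetical obtained z-score

p_two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
p_one_tailed = 1 - std_normal.cdf(z)  # upper tail only

print(f"cut-offs: two-tailed {two_tailed_cutoff:.3f}, one-tailed {one_tailed_cutoff:.3f}")
print(f"p-values: two-tailed {p_two_tailed:.4f}, one-tailed {p_one_tailed:.4f}")
```

The same result is significant one-tailed but not two-tailed. The lower one-tailed cut-off comes at a price: an effect in the unpredicted direction can never be declared significant.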