Hypothesis Testing

(1)

Hypothesis Testing

Introduction to Study Skills & Research Methods (HL10040)

Dr James Betts

(2)

Lecture Outline:

•What is Hypothesis Testing?

•Hypothesis Formulation

•Statistical Errors

•Effect of Study Design

•Test Procedures

•Test Selection.

(3)

Statistics

• Descriptive: organising, summarising & describing data

• Correlational: relationships

• Inferential: generalising, significance

(4)

Sampling Error

Statistics: the dependent variable can be generalised from n to N

Effective sampling is essential to correctly generalise back to our target population
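A minimal sketch of sampling error, assuming a hypothetical population of 10,000 contraction times (mean 18 s, SD 1.7 s; these values are invented for illustration and are not from the lecture). It repeatedly draws samples of size n and shows that the sample means scatter around the population mean, with less scatter for larger n:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical target population (N = 10,000); values are illustrative only.
population = [random.gauss(18.0, 1.7) for _ in range(10_000)]
population_mean = statistics.mean(population)

def spread_of_sample_means(n, repeats=500):
    """Standard deviation of the means of `repeats` random samples of size n."""
    means = [statistics.mean(random.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)

print(f"population mean = {population_mean:.2f}")
print(f"spread of sample means, n = 10:  {spread_small:.3f}")
print(f"spread of sample means, n = 100: {spread_large:.3f}")
```

The spread of the sample means is the sampling error that inferential statistics has to account for when generalising from n to N.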

(5)

What is Hypothesis Testing?

Null Hypothesis: A = B

Alternative Hypothesis: A ≠ B

We also need to establish:

1) How unequal are these observations?

2) Are these observations reflective of the general population?

(6)

Example Hypotheses: Isometric Torque

• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?

Null Hypothesis: ♂ = ♀

Alternative Hypothesis: ♂ ≠ ♀

(7)

Example Hypotheses: Isometric Torque

• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?

Null Hypothesis (H₀)

There is not a significant difference in the DV between males and females.

Alternative Hypothesis (H_A) or experimental hypothesis (H_E)

There is a significant difference in the DV between males and females.

n.b. these are 2-tailed hypotheses, which are the most common and the more recommended.

(8)

Example Hypotheses: Isometric Torque

• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?

Useful analogy- the criminal trial

Imagine you are the prosecutor

H₀ = Defendant not guilty

H_A = Defendant guilty

Your job is to provide sufficient evidence (i.e. ‘beyond reasonable doubt’) that the defendant is not innocent.

Remember: the p-value does NOT tell us the probability that they are innocent, but rather the probability of finding our evidence (or evidence more extreme) assuming they are innocent
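In symbols, the distinction can be sketched as follows for a 2-tailed test. The notation is an assumption made here for illustration and does not appear on the slides: T is the test statistic and t_obs its value calculated from the sample.

```latex
p = P\left( |T| \ge |t_{\text{obs}}| \;\middle|\; H_0 \text{ is true} \right)
\qquad \text{which is not} \qquad
P\left( H_0 \text{ is true} \;\middle|\; \text{data} \right)
```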

(9)

Example Hypotheses: Isometric Torque

• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?

[Figure: number of people plotted against sustained isometric torque (seconds), showing the populations N♂ and N♀ and the samples n♂ and n♀ drawn from them]

n.b. This is why effective sampling is so important...

(10)

Example Hypotheses: Isometric Torque

• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?

[Figure: number of people plotted against sustained isometric torque (seconds), showing the populations N♂ and N♀ and the samples n♂ and n♀ drawn from them]

…poor/insufficient sampling can lead to errors…

(11)

Statistical Errors

• Type 1 Errors

- Rejecting H₀ when it is actually true

- Concluding a difference when one does not actually exist

• Type 2 Errors

- Accepting H₀ when it is actually false (e.g. previous slide)

- Concluding no difference when one does exist

Errors can occur due to biased/inadequate sampling, poor experimental design or the use of inappropriate/non-parametric tests.
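A rough simulation of Type 1 errors, offered as a sketch rather than a definitive demonstration. It assumes a hypothetical population (mean 18 s, SD 1.7 s, invented for illustration) from which both groups are drawn, so H₀ is true by construction. The critical t of 2.011 is the 0.05-level value for df = 48 used elsewhere in this lecture:

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

T_CRITICAL = 2.011  # 2-tailed, 0.05 level, df = 48 (n = 25 per group)

def independent_t(a, b):
    """t statistic for two independent samples via the SEM of the difference."""
    sem_a = statistics.stdev(a) / math.sqrt(len(a))
    sem_b = statistics.stdev(b) / math.sqrt(len(b))
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sem_a**2 + sem_b**2)

trials = 4000
false_positives = 0
for _ in range(trials):
    # Both samples come from the SAME hypothetical population, so H0 is true.
    a = [random.gauss(18.0, 1.7) for _ in range(25)]
    b = [random.gauss(18.0, 1.7) for _ in range(25)]
    if abs(independent_t(a, b)) > T_CRITICAL:
        false_positives += 1  # a difference was concluded when none exists

type1_rate = false_positives / trials
print(f"Type 1 error rate: {type1_rate:.3f}")
```

With a 0.05 significance level, roughly 1 in 20 of these comparisons should be declared significant even though no difference exists.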

(12)

Back to Study Design

• Independent Measures

– Individual scores in each data set are independent of one another

• Repeated Measures

– Individual scores in each data set are

dependent/paired/correlated

(13)

Back to Study Design

• Independent Measures

– Individual scores in each data set are independent of one another

• Repeated Measures

– Individual scores in each data set are dependent/paired/correlated

Pre-Experimental designs.

2 Distinct Groups (independent measures):

T O₁
P O₂

Same individuals tested twice (repeated measures):

O₁ T O₂

(14)

Back to Study Design

• Independent Measures

– Individual scores in each data set are independent of one another

• Repeated Measures

True-Experimental design.

R   O₁ T O₂
    O₃ P O₄

Depends on how equivalent groups were achieved:

– Random Group Assignment (independent measures)

– Cross-Over Design (repeated measures)

(15)

Example Hypotheses: Isometric Torque

• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?

• So the above example is an independent measures design

– Which therefore requires an independent t-test (AKA Student's (Gosset's) t-test).

(16)

Independent t-test: Calculation

[Figure: number of people plotted against sustained isometric torque (seconds) for the samples n♂ and n♀]

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Is this a significant effect?

(17)

Independent t-test: Calculation

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Step 1:

Calculate the Standard Error for Each Mean

SEM♀ = SD/√n = 1.74/5 = 0.348

SEM♂ = SD/√n = 1.72/5 = 0.344

(18)

Independent t-test: Calculation

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Step 2:

Calculate the Standard Error for the difference in means

SEMdiff = √(SEM♀² + SEM♂²) = √0.251 = 0.501

(19)

Independent t-test: Calculation

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Step 3:

Calculate the t statistic

t = (Mean♀ - Mean♂) / SEMdiff = 2.00

(20)

Independent t-test: Calculation

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Step 4:

Calculate the degrees of freedom (df)

df = (n ♀ - 1) + (n ♂ - 1) = 48

(21)

Independent t-test: Calculation

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Step 5:

Determine the critical value for t using a t-distribution table

Degrees of Freedom   Critical t-ratio
44                   2.015
46                   2.013
48                   2.011
50                   2.009

n.b. Use the 0.05 level for a 2-tailed test

(22)

Independent t-test: Calculation

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Step 6:

Compare t calculated with t critical

Calculated t = 2.00 Critical t = 2.01

Therefore, t calculated < t critical

Effect is not significant (n.s.)

(23)

Independent t-test: Calculation

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Interpretation:

P > 0.05: Reject H_A & Accept H₀

Conclusion:

There is not a significant difference in the DV between males and females.

(24)

Independent t-test: Calculation

Mean SD n

♀ 18.5 1.74 25

♂ 17.5 1.72 25

Evaluation:

The wealth of available literature supports that females can sustain isometric contractions longer than males. This may suggest that the findings of the present study represent a Type 2 error.

Possible solution: Increase n

(25)

Independent t-test: SPSS Output

Swim data from SPSS session 8

Group Statistics (SwimTime50m)

Group          N   Mean     Std. Deviation  Std. Error Mean
Control        10  24.7720  1.25246         .39606
Visualisation  10  26.4680  1.92823         .60976

Independent Samples Test (SwimTime50m)

Levene's Test for Equality of Variances: F = 7.842, Sig. = .012

t-test for Equality of Means (95% CI = 95% Confidence Interval of the Difference):

                             t       df      Sig. (2-tailed)  Mean Difference  Std. Error Difference  95% CI Lower  95% CI Upper
Equal variances assumed      -2.333  18      .031             -1.69600         .72710                 -3.22358      -.16842
Equal variances not assumed  -2.333  15.447  .034             -1.69600         .72710                 -3.24188      -.15012

Calculated t: ignore sign

df 18 = critical t 2.101

2.333 > 2.101

So P < 0.05
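The SPSS figures can be reproduced by hand from the group statistics using the same steps as the manual calculation (SEM for each mean, SEM of the difference, t, df). The sketch below assumes only the summary values shown in the output; the function name `independent_t_from_summary` is a hypothetical helper, not an SPSS or library call:

```python
import math

def independent_t_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """SEM per group, SEM of the difference, t statistic and df."""
    sem_a = sd_a / math.sqrt(n_a)                # Step 1: SEM for each mean
    sem_b = sd_b / math.sqrt(n_b)
    sem_diff = math.sqrt(sem_a**2 + sem_b**2)    # Step 2: SEM of the difference
    t = (mean_a - mean_b) / sem_diff             # Step 3: t statistic
    df = (n_a - 1) + (n_b - 1)                   # Step 4: degrees of freedom
    return sem_a, sem_b, sem_diff, t, df

# Group statistics from the SPSS output (Control vs Visualisation, SwimTime50m).
sem_c, sem_v, sem_diff, t, df = independent_t_from_summary(
    24.7720, 1.25246, 10, 26.4680, 1.92823, 10
)

T_CRITICAL_DF18 = 2.101                  # 2-tailed, 0.05 level, as quoted above
significant = abs(t) > T_CRITICAL_DF18   # Steps 5 & 6: ignore the sign of t

print(f"SEM control = {sem_c:.5f}, SEM visualisation = {sem_v:.5f}")
print(f"SEM diff = {sem_diff:.5f}, t = {t:.3f}, df = {df}, significant = {significant}")
```

Because the two groups have equal n, this SEM-based calculation gives the same t as the "equal variances assumed" row.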

(26)

Repeated Measures Designs

• As shown earlier, a repeated measures design implies that the data in each data set can be paired or correlated with one another

• An independent t-test is inappropriate to analyse such data

• Instead, a paired t-test should be used…

(27)

Advantages of using Paired Data

• Data from independent samples is heavily influenced by variance between subjects

[Figure: number of press-ups (0 to 200) in Week 1 and Week 2, annotated "Large SD (variance)"]

i.e. this data would have a large SD associated with an independent t-test simply because some subjects performed better than others

HOWEVER…

(28)

Advantages of using Paired Data

• Data from independent samples is heavily influenced by variance between subjects

[Figure: number of press-ups (0 to 200) in Week 1 and Week 2, with each participant's two scores paired]

…using the same participants on two occasions allows us to pair up the data…

…now we can remove between-subject variance from subsequent analysis…

(29)

Paired t-test: Calculation

Subject  Week 1  Week 2  Diff (D)  Diff² (D²)
1        10      12
2        50      52
3        20      25
4        8       10
5        115     120
6        75      80
7        45      50
8        170     175

∑D =     ∑D² =

Steps 1 & 2: Complete this table

(30)

Paired t-test: Calculation

∑D = ∑D ² =

Step 3:

Calculate the t statistic

t = ∑D / √[ (n × ∑D² - (∑D)²) / (n - 1) ]

(31)

Paired t-test: Calculation

∑D = 31   ∑D² = 137

Step 3:

Calculate the t statistic

t = 31 / √[ (8 × 137 - (31)²) / 7 ] = 7.06
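The table completion and the t statistic can be checked with a short sketch. It assumes the differences are taken as Week 2 minus Week 1 (so they come out positive, matching ∑D = 31); the variable names are illustrative:

```python
import math

# Press-up scores for the eight subjects in the table.
week1 = [10, 50, 20, 8, 115, 75, 45, 170]
week2 = [12, 52, 25, 10, 120, 80, 50, 175]

# Steps 1 & 2: difference and squared difference for each subject.
diffs = [w2 - w1 for w1, w2 in zip(week1, week2)]
sum_d = sum(diffs)
sum_d2 = sum(d * d for d in diffs)

# Step 3: t = sum(D) / sqrt((n * sum(D^2) - (sum(D))^2) / (n - 1))
n = len(diffs)
t = sum_d / math.sqrt((n * sum_d2 - sum_d**2) / (n - 1))
df = n - 1

print(f"sum D = {sum_d}, sum D^2 = {sum_d2}, t = {t:.2f}, df = {df}")
```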

(32)

Paired t-test: Calculation

Steps 4 & 5:

Calculate the df and use a t-distribution table to find t critical

df = n - 1

Degrees of Freedom   Critical t-ratio (0.05 level)   Critical t-ratio (0.01 level)
1                    12.71                           63.657
2                    4.303                           9.925
3                    3.182                           5.841
4                    2.776                           4.604
5                    2.571                           4.032
6                    2.447                           3.707
7                    2.365                           3.499
8                    2.306                           3.355
9                    2.262                           3.250

(33)

Paired t-test: Calculation

Step 6:

Compare t calculated with t critical

Calculated t = 7.06 Critical t = 3.499

Therefore, t calculated > t critical

Effect is significant (sig.)

Mean SD n

Week 1 61.6 56.6 8

Week 2 65.5 57.5 8

(34)

Paired t-test: Calculation

Mean SD n

Week 1 61.6 56.6 8

Week 2 65.5 57.5 8

Interpretation:

P < 0.05: Reject H₀ & Accept H_A

Conclusion:

There is a significant difference in the DV between week 1 and week 2.
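To see why pairing matters, the same press-up scores can be analysed both ways. This is a sketch for illustration only: treating the two weeks as independent groups is deliberately the wrong analysis for these data, and is included just to show how between-subject variance swamps the effect.

```python
import math
import statistics

week1 = [10, 50, 20, 8, 115, 75, 45, 170]
week2 = [12, 52, 25, 10, 120, 80, 50, 175]
n = len(week1)

# Wrong for these data: treat the weeks as independent groups.
# The error term is dominated by differences BETWEEN subjects.
sem1 = statistics.stdev(week1) / math.sqrt(n)
sem2 = statistics.stdev(week2) / math.sqrt(n)
t_independent = (statistics.mean(week2) - statistics.mean(week1)) / math.sqrt(
    sem1**2 + sem2**2
)

# Paired analysis: only the within-subject differences enter the error term.
diffs = [b - a for a, b in zip(week1, week2)]
t_paired = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

print(f"independent t = {t_independent:.3f}")
print(f"paired t      = {t_paired:.3f}")
```

The mean difference is identical in both analyses; only the error term changes, which is what turns a negligible t into a large one.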

(35)

Paired t-test: SPSS Output

Push-up data from lecture 3

Paired Samples Statistics

Pair 1     Mean     N  Std. Deviation  Std. Error Mean
VAR00001   61.6250  8  56.64157        20.02582
VAR00002   65.5000  8  57.54005        20.34348

Paired Samples Test (Pair 1: VAR00001 - VAR00002)

Paired Differences: Mean = -3.87500, Std. Deviation = 1.55265, Std. Error Mean = .54894

95% Confidence Interval of the Difference: Lower = -5.17305, Upper = -2.57695

t = -7.059, df = 7, Sig. (2-tailed) = .000

Calculated t: ignore sign

df 7 = critical t 2.365 (0.05) 3.499 (0.01)

7.059 > 3.499

So P < 0.01

(36)

Parametric versus Non-Parametric

• Both the t-tests just shown are parametric tests

• These test for differences in the mean

• Therefore the mean must be an accurate descriptor

Normal ? Non-normal

(37)

Example Hypotheses: Isometric Torque

• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?

[Figure: number of people plotted against sustained isometric torque (seconds) for two normally distributed groups, with Mean A and Mean B marked]

Normal distribution: the mean is appropriate, so a t-test can be used

(38)

Example Hypotheses: Isometric Torque

• Is there any difference in the length of time that males and females can sustain an isometric muscular contraction?

[Figure: number of people plotted against sustained isometric torque (seconds) for two non-normally distributed groups, with Mean A and Mean B marked]

NON-normal distribution: the mean is INappropriate, risking a Type 2 error

(39)

…assumptions of parametric analyses

• All means and paired differences are normally distributed (ND) (this is the main consideration)

• N acquired through random sampling

• Data must be of at least the interval level of measurement (LOM)

• Data must be continuous.

…but see Norman (2010) Adv. Health Sci. Educ.

(40)

Non-Parametric Tests

• These tests use the median and do not assume anything about distribution, i.e. ‘distribution free’

• Mathematically, value is ignored (i.e. the magnitude of differences is not compared)

• Instead, data are analysed simply according to rank.

(41)

Non-Parametric Tests

• Independent Measures

– Mann-Whitney Test

• Repeated Measures

– Wilcoxon Test

e.g. Exam grades (ordinal) from 14 students in 2 separate schools

(42)

Mann-Whitney U: Calculation

Step 1:

Rank all the data from both groups in one series, then total each

School A                      School B
Student  Grade  Rank          Student  Grade  Rank
J. S.    B-                   T. J.    D
L. D.    B-                   M. M.    C+
H. L.    A+                   K. S.    C+
M. J.    D-                   P. S.    B-
T. M.    B+                   R. M.    E
T. S.    A-                   P. W.    C-
P. H.    F                    A. F.    A-

Median = B-; ∑R_A =           Median = C+; ∑R_B =
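The ranking step can be sketched as below. Two assumptions are made here and are not stated on the slide: the first seven grades listed belong to School A and the last seven to School B (a reading that reproduces the stated medians of B- and C+), and `GRADE_ORDER` is a hypothetical ordering of the grades from lowest to highest, of which only the relative order matters. Tied grades share the mean of the ranks they occupy.

```python
# Hypothetical grade ordering, lowest to highest (only the order matters).
GRADE_ORDER = ["F", "E", "D-", "D", "C-", "C+", "B-", "B+", "A-", "A+"]

# Assumed split of the fourteen grades between the two schools.
school_a = ["B-", "B-", "A+", "D-", "B+", "A-", "F"]
school_b = ["D", "C+", "C+", "B-", "E", "C-", "A-"]

def average_ranks(values):
    """Map each distinct value to its rank (1 = lowest); ties share the mean rank."""
    ordered = sorted(values)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1        # rank position of the first tied value
        count = ordered.count(value)            # how many values are tied
        ranks[value] = first + (count - 1) / 2  # mean of the tied rank positions
    return ranks

scores_a = [GRADE_ORDER.index(g) for g in school_a]
scores_b = [GRADE_ORDER.index(g) for g in school_b]
rank_of = average_ranks(scores_a + scores_b)    # rank both groups in one series

rank_sum_a = sum(rank_of[s] for s in scores_a)
rank_sum_b = sum(rank_of[s] for s in scores_b)

print(f"sum of ranks A = {rank_sum_a}, sum of ranks B = {rank_sum_b}")
```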

(43)

Mann-Whitney U: Calculation

Step 2:

Calculate two versions of the U statistic using:

Median = B-; ∑R_A =           Median = C+; ∑R_B =

U₁ = (n_A × n_B) + [n_A × (n_A + 1)] / 2 - ∑R_A

AND…

U₂ = (n_A × n_B) + [n_B × (n_B + 1)] / 2 - ∑R_B

(44)

Mann-Whitney U: Calculation

Step 2:

Calculate two versions of the U statistic using:

Median = B-; ∑R_A =           Median = C+; ∑R_B =

U₁ = (n_A × n_B) + [n_A × (n_A + 1)] / 2 - ∑R_A

…OR to save time you can calculate U₁ and then U₂ as follows

U₂ = (n_A × n_B) - U₁

(45)

Mann-Whitney U: Calculation

Step 3:

Select the smaller of the two U statistics (U₁ = 17.5; U₂ = 31.5)

…now consult a table of critical values for the Mann-Whitney test

n    0.05   0.01
6    5      2
7    8      4
8    13     7
9    17     11

Calculated U must be less than critical U to conclude a significant difference

Conclusion

Median A = Median B
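Steps 2 and 3 can be sketched as follows, taking the rank sums for the two schools (59.5 and 45.5, as reported in the SPSS output for these data) as given. The function name `mann_whitney_u` is a hypothetical helper for illustration:

```python
def mann_whitney_u(n_a, n_b, rank_sum_a, rank_sum_b):
    """Both versions of U from the group sizes and rank sums."""
    u1 = n_a * n_b + n_a * (n_a + 1) / 2 - rank_sum_a
    u2 = n_a * n_b + n_b * (n_b + 1) / 2 - rank_sum_b
    return u1, u2

n_a, n_b = 7, 7
u1, u2 = mann_whitney_u(n_a, n_b, 59.5, 45.5)

# Time-saving shortcut: U2 can also be obtained from U1.
u2_shortcut = n_a * n_b - u1

u = min(u1, u2)              # Step 3: the smaller U is the test statistic
U_CRITICAL = 8               # n = 7, 0.05 level, from the table of critical values
significant = u < U_CRITICAL # calculated U must be LESS than critical U

print(f"U1 = {u1}, U2 = {u2}, test statistic U = {u}, significant = {significant}")
```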

(46)

Mann-Whitney U: SPSS Output

Ranks (VAR00001 by VAR00002)

VAR00002  N   Mean Rank  Sum of Ranks
1.00      7   8.50       59.50
2.00      7   6.50       45.50
Total     14

Test Statistics (b) for VAR00001

Mann-Whitney U                   17.500
Wilcoxon W                       45.500
Z                                -.900
Asymp. Sig. (2-tailed)           .368
Exact Sig. [2*(1-tailed Sig.)]   .383 (a)

a. Not corrected for ties.
b. Grouping Variable: VAR00002

Calculated U (lower value)

17.5 > 8 So P > 0.05 n.s.

(47)

Non-Parametric Tests

• Independent Measures

– Mann-Whitney Test

• Repeated Measures

– Wilcoxon Test

e.g. One group pre-test post-test, assumed non-normal

(48)

Wilcoxon Signed Ranks: Calculation

Step 1:

Rank all the differences in one series (ignoring signs), then total each

Athlete  Pre-training OBLA (kph)  Post-training OBLA (kph)  Diff.  Signed Rank  -     +
J. S.    15.6                     16.1                      0.5    6                  6
L. D.    17.2                     17.5                      0.3    4.5                4.5
H. L.    17.7                     16.7                      -1     -7           -7
M. J.    16.5                     16.8                      0.3    4.5                4.5
T. M.    15.9                     16.0                      0.1    1.5                1.5
T. S.    16.7                     16.5                      -0.2   -3           -3
P. H.    17.0                     17.1                      0.1    1.5                1.5

Medians = 16.7 (pre-training), 16.7 (post-training)

∑Signed Ranks =

(49)

Wilcoxon Signed Ranks: Calculation

Step 2:

The smaller of the T values is our test statistic (T+ = 18; T- = 10)

…now consult a table of critical values for the Wilcoxon test

n 0.05

6 0

7 2

8 3

9 5

Calculated T must be less than critical T to conclude a significant difference

Conclusion

Median A = Median B
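The signed-rank calculation can be sketched as below, using the pre- and post-training OBLA values from Step 1. Differences are taken as post minus pre and rounded to one decimal place so that floating-point noise does not break ties. This minimal sketch does not handle zero differences, which do not occur in these data.

```python
pre = [15.6, 17.2, 17.7, 16.5, 15.9, 16.7, 17.0]
post = [16.1, 17.5, 16.7, 16.8, 16.0, 16.5, 17.1]

# Differences (post - pre), rounded so that equal differences compare as equal.
diffs = [round(b - a, 1) for a, b in zip(pre, post)]

def rank_absolute(values):
    """Rank values by absolute size (1 = smallest); ties share the mean rank."""
    ordered = sorted(abs(v) for v in values)
    ranks = []
    for v in values:
        first = ordered.index(abs(v)) + 1
        count = ordered.count(abs(v))
        ranks.append(first + (count - 1) / 2)
    return ranks

ranks = rank_absolute(diffs)
signed_ranks = [r if d > 0 else -r for r, d in zip(ranks, diffs)]

t_plus = sum(r for r in signed_ranks if r > 0)
t_minus = -sum(r for r in signed_ranks if r < 0)

t_stat = min(t_plus, t_minus)     # the smaller T is the test statistic
T_CRITICAL = 2                    # n = 7, 0.05 level, from the table above
significant = t_stat < T_CRITICAL # calculated T must be LESS than critical T

print(f"signed ranks = {signed_ranks}")
print(f"T+ = {t_plus}, T- = {t_minus}, significant = {significant}")
```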

(50)

Wilcoxon Signed Ranks: SPSS Output

Ranks (VAR00002 - VAR00001)

                    N      Mean Rank  Sum of Ranks
Negative Ranks      2 (a)  3.00       6.00
Positive Ranks      5 (b)  4.40       22.00
Ties                0 (c)
Total               7

a. VAR00002 < VAR00001
b. VAR00002 > VAR00001
c. VAR00002 = VAR00001

Test Statistics (b) for VAR00002 - VAR00001

Z                        -1.364 (a)
Asymp. Sig. (2-tailed)   .172

a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

10 > 2

So P > 0.05 n.s.

(51)

So which stats test should you use?

Q1. What is the LOM? Nominal / Ordinal / Interval/Ratio

Q2. Are the data ND? Yes / No

Q3. Are the data paired or independent?
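One way to read the three questions is as a small decision function. The routing below is an assumption inferred from the tests covered in this lecture (the flowchart's arrows are not recoverable from the text), and `select_test` is a hypothetical helper name:

```python
def select_test(level_of_measurement, normally_distributed, paired):
    """Pick one of the four tests covered in this lecture (assumed routing)."""
    lom = level_of_measurement.lower()
    if lom == "nominal":
        return None  # no test for nominal data is covered in this lecture
    if lom not in ("ordinal", "interval", "ratio"):
        raise ValueError(f"unknown level of measurement: {level_of_measurement}")

    # Q1 and Q2: parametric tests need at least interval data that are ND.
    parametric = lom in ("interval", "ratio") and normally_distributed

    # Q3: paired or independent data?
    if parametric:
        return "paired t-test" if paired else "independent t-test"
    return "Wilcoxon signed ranks test" if paired else "Mann-Whitney U test"

print(select_test("ratio", True, False))     # isometric torque, males vs females
print(select_test("ordinal", False, False))  # exam grades from two schools
```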

(52)

Why do we use Hypothesis Testing?

• It is easy (i.e. data in → P value out)

• It provides the ‘Illusion of Scientific Objectivity’

• Everybody else does it.

(53)

Problems with Hypothesis Testing?

• P<0.05 is an arbitrary probability (P<0.06?)

• The size of the effect is not expressed

• The variability of this effect is not expressed

• Induction/deduction - reproducibility

• Overall, hypothesis testing ignores ‘judgement’.

(54)

Hypothesis Testing