Reliability and Validity

Academic year: 2021

(1)

Introduction to Study Skills & Research Methods (HL10040)

Dr James Betts

(2)

Lecture Outline:

• Definition of Terms

• Types of Validity

• Threats to Validity

• Types of Reliability

• Threats to Reliability

• Introduction to Measurement Error

(3)

Commonly used terms…

“She has a valid point”

“My car is unreliable”

…in science…

“The conclusion of the study was not valid”

“The findings of the study were not reliable”.

(4)

Some definitions…

• Validity

“The soundness or appropriateness of a test or instrument in measuring what it is designed to measure”

(Vincent 1999)

(5)

Some definitions…

• Validity

“Degree to which a test or instrument measures what it purports to measure”

(Thomas & Nelson 1996)

(6)

Some definitions…

• Reliability

“…the degree to which a test or measure produces the same scores when applied in the same circumstances…”

(Nelson 1997)

(7)

Some definitions…

• Objectivity

“…the degree to which different observers agree on measurements…”

(Atkinson & Nevill 1998)

(8)

Types of Experimental Validity

• Internal

– Is the experimenter measuring the effect of the independent variable on the dependent variable?

• External

– Can the results be generalised to the wider population?

(9)

Validity
– Logical: Face, Content
– Statistical (AKA Criterion): Concurrent, Predictive
– Logical/Statistical: Construct

Consistency
– Reliability
– Objectivity

(10)

Logical Validity

• Face Validity

– Implies that a test is valid by definition

– It is clear that the test measures what it is supposed to

e.g.

If you want to assess reaction time, measuring how long it takes an individual to react to a given stimulus would have face validity. (Externally valid?)

(11)

Assessing face validity is therefore a subjective process.

e.g.

Would assessing 15 m sprint time be a valid means of assessing reaction time?

(12)

Logical Validity

• Content Validity

– Implies that the test measures all aspects contributing to the variable of interest

…also a subjective process.

e.g.

Who is the most physically fit? VO2 max test? Wingate test? 1 RM?

(13)

Overall:

A logically valid test simply appears to measure the right variable in its entirety?

(14)

Statistical Validity

• Concurrent Validity

– Implies that the test produces similar results to a previously validated test

e.g. VO2 max:

Incremental treadmill protocol with expired gas analysis vs. Multi-Stage Fitness (Beep) Test
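Concurrent validity is often summarised as a correlation (a validity coefficient) between scores on the new test and on the previously validated one. The sketch below assumes paired scores from the same participants; the `pearson_r` helper and all numbers are hypothetical, for illustration only. A high r shows that individuals are ranked similarly by both tests, not that the two tests give the same absolute values.

```python
from math import sqrt

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between paired scores x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical VO2 max values (ml·kg⁻¹·min⁻¹) for the same five participants
treadmill_gas = [48.0, 52.5, 55.0, 60.0, 63.5]   # criterion: treadmill + expired gas analysis
beep_test_est = [46.0, 53.0, 54.0, 61.5, 62.0]   # estimate from the Beep test

r = pearson_r(treadmill_gas, beep_test_est)
print(f"validity coefficient r = {r:.2f}")
```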

(15)

Statistical Validity

• Predictive Validity

– Implies that the test provides a valid reflection of future performance using a similar test

e.g.

Can performance during test A be used to predict future performance in test B?

http://www.youtube.com/watch?v=vdPQ3QxDZ1s
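Predictive validity is often examined by fitting a regression line from the earlier test (A) to the later performance (B) and then using that line to predict B for new individuals. The sketch below is a plain ordinary-least-squares fit; `fit_line` and the scores are hypothetical, and ideally the prediction would be checked on participants who were not used to fit the line.

```python
def fit_line(a: list[float], b: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of b = intercept + slope * a."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need at least two paired scores")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    slope = s_ab / s_aa
    intercept = mean_b - slope * mean_a
    return intercept, slope

# Hypothetical scores: test A now, test B some weeks later (same athletes)
test_a = [10.0, 12.0, 14.0, 16.0, 18.0]
test_b = [20.5, 25.5, 28.0, 33.5, 36.0]

intercept, slope = fit_line(test_a, test_b)
predicted_b = intercept + slope * 15.0   # a new athlete who scores 15 on test A
print(f"B = {intercept:.2f} + {slope:.2f} * A; predicted B = {predicted_b:.1f}")
```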

(16)

Overall:

A statistically valid test produces results that agree with other similar tests?

(17)

Logical/Statistical Validity

• Construct Validity

– Implies not only that the test is measuring what it is supposed to, but also that it is capable of detecting what should exist, theoretically

– Therefore relates to hypothetical or intangible constructs

e.g. team rivalry, sportsmanship.

(18)

– This makes assessment difficult: if what should exist cannot be detected, this could mean:

a) Test invalid?
b) Theory incorrect?
c) Sensitivity/specificity issues?

(19)

Interesting Example: Breast Cancer

• Incidence: ~1 % (0.8 %)

(i.e. a positive result should be detected for approximately 1 in every 100 women tested)

• Sensitivity: ~90 % (87 %)

(the mammogram is sensitive enough that approximately 90 in every 100 breast cancer patients will receive a positive result)

• Specificity: ~90 % (93 %)

(the mammogram is specific enough that approximately 90 in every 100 healthy patients will receive a negative result).

Data from Kerlikowske et al. (1996)

(20)

Quick Test

• What is the probability that a patient receiving a positive result actually has breast cancer?
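One way to work through the quick test is Bayes' theorem: P(cancer | positive) = sensitivity × incidence / [sensitivity × incidence + (1 − specificity) × (1 − incidence)]. The sketch below plugs in both the rounded (~1 %, ~90 %, ~90 %) and the more precise figures quoted above; the function name is illustrative.

```python
def positive_predictive_value(incidence: float, sensitivity: float,
                              specificity: float) -> float:
    """Probability that a positive result is a true positive (Bayes' theorem)."""
    true_pos = sensitivity * incidence                 # ill and correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - incidence)  # healthy but flagged
    return true_pos / (true_pos + false_pos)

rounded = positive_predictive_value(0.01, 0.90, 0.90)
precise = positive_predictive_value(0.008, 0.87, 0.93)
print(f"rounded figures: {rounded:.1%}")   # 8.3%
print(f"precise figures: {precise:.1%}")   # 9.1%
```

In natural frequencies, using the rounded figures: of 1,000 women tested, about 10 have breast cancer and 9 of them test positive, while about 99 of the 990 healthy women also test positive, so only around 9 of the roughly 108 positive results are true positives.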

(21)
(22)

Threats to Validity

(and possible solutions?)

(23)

Threats to Internal Validity

• Maturation

– Changes in the DV over time irrespective of the IV

(24)

Threats to Internal Validity

• Maturation

e.g. One-Group Pre-test Post-test Design

O1  T  O2

(O = observation; T = treatment)

(25)

Threats to Internal Validity

• Maturation (possible solution): Time Series Design

O1  O2  O3  T  O4  O5  O6

(26)

Threats to Internal Validity

• Maturation (possible solution)

Pre-test Post-test Randomised Group Comparison

R  O1  T  O2
R  O3  P  O4

(R = random assignment; P = placebo. n.b. this is a randomised controlled trial, RCT)
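A small simulation can make the maturation threat concrete. It is a sketch under stated assumptions: every participant drifts by a hypothetical +3 units between the two measurements regardless of the intervention (maturation), the true treatment effect is a hypothetical +2 units, and measurement noise is Gaussian. The one-group pre-test/post-test estimate absorbs the drift, whereas the randomised comparison subtracts it out.

```python
import random
from statistics import mean

def simulate_maturation(n_per_group=500, maturation=3.0, treatment_effect=2.0,
                        noise_sd=1.0, seed=1):
    """Return (one-group pre/post estimate, randomised-group estimate)."""
    rng = random.Random(seed)

    def change_scores(effect):
        changes = []
        for _ in range(n_per_group):
            baseline = rng.gauss(50.0, 5.0)                  # true starting value
            pre = baseline + rng.gauss(0.0, noise_sd)        # O1 or O3
            post = (baseline + maturation + effect           # O2 or O4
                    + rng.gauss(0.0, noise_sd))
            changes.append(post - pre)
        return changes

    treated = change_scores(treatment_effect)   # R  O1  T  O2
    placebo = change_scores(0.0)                # R  O3  P  O4
    one_group_estimate = mean(treated)          # ignores the placebo group entirely
    randomised_estimate = mean(treated) - mean(placebo)   # drift cancels out
    return one_group_estimate, randomised_estimate

one_group, randomised = simulate_maturation()
print(f"one-group estimate: {one_group:.2f}, randomised estimate: {randomised:.2f}")
```

With these assumptions the one-group design reports an effect near 5 units (drift plus treatment), while the randomised comparison recovers roughly the true 2 units.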

(27)

Threats to Internal Validity

• Maturation (possible solution)

Repeated measures designs can occasionally be an inappropriate solution, even when randomised and counterbalanced

e.g.

Muscle Damage (repeated bout effect)

Vitamin Supplementation (wash-out period)

In such cases, independent measures designs could be used.

(28)

Threats to Internal Validity

• History

– Unplanned events between measurements

(29)

Threats to Internal Validity

• History

O1  T  O2

e.g. exercise?

Therefore, solution = control extraneous variables!

(30)

Threats to Internal/External Validity

• Pre-testing

– Interactive effects due to the pre-test (e.g. learning, sensitisation, etc.)

– Also influences External Validity

(31)

Threats to Internal/External Validity

• Pre-testing

e.g.

R  O1  T  O2
R  O3  P  O4

Assessing muscle mass at the pre-test could make them train harder in both trials…

…but then respond better to the T than the P…

…so it is actually T + O1 that is better than P, not T alone.

(32)

Threats to Internal/External Validity

• Pre-testing (possible solution)

Solomon Four-Group Design

R  O1  T  O2
R  O3  P  O4
R      T  O5
R      P  O6
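The value of the Solomon design is that the treatment effect can be estimated both with and without a pre-test, so the difference between the two indicates whether pre-testing itself changed the response. A minimal sketch with hypothetical post-test group means (the function name is illustrative); a full analysis would usually treat the post-test scores as a 2 × 2 (pre-tested or not × T or P) factorial design rather than comparing means by eye.

```python
def solomon_effects(o2: float, o4: float, o5: float, o6: float) -> dict[str, float]:
    """Compare post-test means from a Solomon four-group design."""
    with_pretest = o2 - o4       # T vs P among the pre-tested groups
    without_pretest = o5 - o6    # T vs P among groups never pre-tested
    return {
        "treatment_effect_with_pretest": with_pretest,
        "treatment_effect_without_pretest": without_pretest,
        # A non-zero value suggests the pre-test interacted with the treatment
        "pretest_by_treatment_interaction": with_pretest - without_pretest,
    }

# Hypothetical post-test means, e.g. muscle mass (kg)
effects = solomon_effects(o2=62.0, o4=58.0, o5=60.5, o6=58.0)
for name, value in effects.items():
    print(f"{name}: {value:.1f}")
```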

(33)

Threats to Internal Validity

• Statistical Regression

– AKA regression to the mean

– An initial extreme score is likely to be followed by less extreme subsequent scores

e.g.

Training has the greatest effect on untrained individuals.

The ‘sophomore slump’ and the Sports Illustrated (SI) ‘cover jinx’

Therefore, solution = effective sampling.
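Regression to the mean can be reproduced without any intervention at all. The sketch below assumes each observed score is a stable true score plus independent random error on each occasion (all parameters hypothetical), selects the top 10 % on the first test, and shows that the same people score closer to the group mean on the retest.

```python
import random
from statistics import mean

def regression_to_mean_demo(n=10_000, top_fraction=0.10, seed=42):
    """Return (overall test 1 mean, selected group's test 1 mean, their test 2 mean)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(100.0, 10.0) for _ in range(n)]
    test1 = [t + rng.gauss(0.0, 5.0) for t in true_scores]   # random error, occasion 1
    test2 = [t + rng.gauss(0.0, 5.0) for t in true_scores]   # fresh random error, occasion 2

    # Select the most extreme performers on test 1 only; no treatment is given.
    k = int(n * top_fraction)
    chosen = sorted(range(n), key=lambda i: test1[i], reverse=True)[:k]

    return (mean(test1),
            mean([test1[i] for i in chosen]),
            mean([test2[i] for i in chosen]))

overall, selected_t1, selected_t2 = regression_to_mean_demo()
print(f"all: {overall:.1f}  selected, test 1: {selected_t1:.1f}  "
      f"selected, test 2: {selected_t2:.1f}")
```

The selected group still scores above average on the retest, but less extremely than on the test that was used to select them.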

(34)

Threats to Internal Validity

• Instrumentation

– A difference in the way 2 comparable variables were measured

e.g.

Uncalibrated equipment

Therefore, solution = calibrate!

(35)

Threats to Internal Validity

• Selection Bias

– The groups for comparison are not equivalent

(36)

Threats to Internal Validity

• Selection Bias

e.g. Groups not randomly assigned

Static Group Comparison

T  O1
P  O2

e.g. Group T were already resistance trained at the start

(37)

Threats to Internal Validity

• Selection Bias (possible solution)

T  O1
P  O2

Either:

– Randomise group assignment,

– Pre-test and analyse the pre-test to post-test difference,

– Use a repeated measures design.

(38)

Threats to Internal/External Validity

• Experimental Mortality

– Missing data due to subject drop-out

– Reduced n = reduced statistical power

– Not only challenges the quality of the data gathered (internal validity) but also our ability to generalise (external validity).

Therefore, solution = recruit sufficient participants (young?)
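Recruiting "sufficient participants" can be made concrete by estimating the sample size needed for adequate power and then inflating it for expected drop-out. A sketch, assuming a two-group comparison with a standardised effect size (Cohen's d) of 0.5, α = 0.05, 80 % power and 20 % drop-out (all hypothetical choices; the function names are illustrative). It uses the normal approximation n = 2(z₁₋α/₂ + z₁₋β)² / d² per group, which gives a slightly smaller n than t-based calculations.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Normal-approximation sample size per group for a two-group comparison."""
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)

def n_to_recruit(n_required: int, expected_dropout: float) -> int:
    """Inflate the target n so that enough participants remain after drop-out."""
    if not 0 <= expected_dropout < 1:
        raise ValueError("expected_dropout must be in [0, 1)")
    return ceil(n_required / (1 - expected_dropout))

needed = n_per_group(0.5)
recruit = n_to_recruit(needed, 0.20)
print(needed, recruit)   # 63 79
```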

(39)

Threats to External Validity

• Inadequate description

– 5th characteristic of research… …should be replicable

If nobody can replicate the methods of a given study, then it is irrefutable and therefore lacks external validity.

Therefore, solution = comprehensive methodology.

(40)

Threats to External Validity

• Biased sampling

– Linked to statistical regression

– Sample does not reflect target population – n ≠ N

e.g. results generalised across gender

Therefore, solution = random sample (of target population).

(41)

Threats to External Validity

• Hawthorne Effect

– DV is influenced by the fact that it is being recorded

e.g.

Fastest sprint when the professor enters the lab

Therefore, solution = control the lab environment.

(42)

Threats to External Validity

• Demand Characteristics

– Participants detect the purpose of the study and behave accordingly

e.g.

Sports Science students already know that the carbohydrate (CHO) drink is supposedly superior to water (H2O)

Therefore, solution = double or single blinding.

(43)

Threats to External Validity

• Operationalisation

– AKA Ecological Validity

– The DV must have some relevance in the

‘real world’

e.g.

TTE (time to exhaustion) has no Olympic equivalent

Therefore, solution = choose your DV carefully.

(44)

Reliability

• Reliability is a pre-requisite of validity

e.g. Direct versus Indirect measures of VO2 max

– Direct: gold standard, expensive, complex

– Indirect: predictive, cheap, easy

(i.e. valid and reliable)

(45)

Reliability

VO2 max (ml·kg⁻¹·min⁻¹):

Subject 1   60   60   60
Subject 2   55   55   55
Subject 3   70   70   70

Valid and Reliable

(46)

Reliability

VO2 max (ml·kg⁻¹·min⁻¹):

Subject 1   60   65   65
Subject 2   55   60   60
Subject 3   70   75   75

Not Valid but Reliable

5 ml·kg⁻¹·min⁻¹ correction?

(47)

Reliability

VO2 max (ml·kg⁻¹·min⁻¹):

Subject 1   60   72   57
Subject 2   55   61   52
Subject 3   70   40   84

Not Valid and not Reliable

i.e. a test can never be valid without being reliable?

(48)

Types of Reliability

• Relative

• Absolute

• Rater reliability (Objectivity)

– Intrarater reliability

– Interrater reliability

(49)

Relative Reliability

VO2 max (ml·kg⁻¹·min⁻¹):

Subject 1   60   63   57
Subject 2   55   56   48
Subject 3   70   65   66

Relatively Reliable

i.e. Individuals maintain position in the group

(50)

Absolute Reliability

VO2 max (ml·kg⁻¹·min⁻¹):

Subject 1   60   63   57
Subject 2   55   56   48
Subject 3   70   65   66

Not Absolutely Reliable

i.e. Test-Retest within individuals
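The two slides above can be checked numerically. A sketch using the slide data: relative reliability is checked informally by asking whether every subject keeps the same rank on each occasion (a formal analysis would more typically use an intraclass correlation), and absolute reliability is summarised as the pooled within-subject standard deviation, one common "typical error" index in the original units. The helper names are illustrative and ties in rank are not handled.

```python
from math import sqrt
from statistics import mean, variance

# VO2 max (ml·kg⁻¹·min⁻¹) from the relative/absolute reliability slides:
# one row per subject, one column per measurement occasion.
scores = [
    [60, 63, 57],  # Subject 1
    [55, 56, 48],  # Subject 2
    [70, 65, 66],  # Subject 3
]

def rank_order(column):
    """Subject indices ordered from lowest to highest score (ties not handled)."""
    return sorted(range(len(column)), key=lambda i: column[i])

def ranks_preserved(data):
    """Relative reliability, informal check: same rank order on every occasion."""
    occasions = list(zip(*data))
    first = rank_order(occasions[0])
    return all(rank_order(occ) == first for occ in occasions[1:])

def within_subject_sd(data):
    """Absolute reliability: pooled within-subject SD, in the same units as the data."""
    return sqrt(mean([variance(row) for row in data]))

print(ranks_preserved(scores))              # True
print(f"{within_subject_sd(scores):.1f}")   # 3.4
```

So the group ranking is stable (relatively reliable), yet an individual's score typically varies by about 3.4 ml·kg⁻¹·min⁻¹ between occasions (not absolutely reliable).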

(51)

Rater Reliability

• Intrarater reliability

– The consistency of a given observer or measurement tool on more than one occasion

(52)

Rater Reliability

• Interrater reliability

– The consistency of a given measurement from more than one observer or measurement tool

e.g.

Score for the American gymnast: British judge = 9.9, French judge = 4.4, Japanese judge = 7.0

(53)

Threats to Reliability

• Fatigue

VO2 max (ml·kg⁻¹·min⁻¹):

            8 am   9 am   10 am
Subject 1   60     55     50

Therefore, solution = increase time between tests.

(54)

Threats to Reliability

• Habituation

VO2 max (ml·kg⁻¹·min⁻¹):

Subject 1   60   65   70

Therefore, solution = familiarise prior to test.

(55)

Threats to Reliability

• Standardisation of Procedures

– Control of extraneous variables

• Precision of Measurements

– e.g. if we are happy to measure VO2 max to the nearest 10 ml·kg⁻¹·min⁻¹, then it could probably be reliably predicted from your training volume and age.
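The precision point can be seen by rounding. A sketch with hypothetical repeated measurements from one person: to the nearest 1 ml·kg⁻¹·min⁻¹ they vary by 6 units, but to the nearest 10 they all read 60, so the test appears perfectly reliable. Values ending in 5 are avoided here because Python's built-in `round` rounds exact halves to the nearest even value.

```python
def spread(values):
    """Range (max - min) of a set of repeated measurements."""
    return max(values) - min(values)

# Hypothetical repeated VO2 max measurements (ml·kg⁻¹·min⁻¹) on one person
trials = [57, 63, 61, 59]
to_nearest_10 = [round(v, -1) for v in trials]   # [60, 60, 60, 60]

print(spread(trials), spread(to_nearest_10))     # 6 0
```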

(56)

Measurement Errors

• Ultimately, reliability is dependent on the degree of measurement error in a given study

• The overall error in any measurement comprises both systematic and random error

• We will address measurement error further next week…
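As a preview, paired test-retest data can be split into these two components: the mean of the differences reflects systematic error (bias) and their standard deviation reflects random error. Together they give 95 % limits of agreement (bias ± 1.96 SD, assuming roughly normally distributed differences), one of the approaches discussed by Atkinson & Nevill (1998). A minimal sketch; the function name is illustrative and the data are taken from the reliability slides above.

```python
from statistics import mean, stdev

def error_components(test, retest):
    """Return (systematic bias, random-error SD, 95 % limits of agreement)."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [b - a for a, b in zip(test, retest)]
    bias = mean(diffs)   # systematic error: average shift between occasions
    sd = stdev(diffs)    # random error: scatter of the individual differences
    return bias, sd, (bias - 1.96 * sd, bias + 1.96 * sd)

# First two columns of the "Not Valid but Reliable" slide: bias = 5, random-error SD = 0
print(error_components([60, 55, 70], [65, 60, 75]))
# First two columns of the relative/absolute reliability slides: small bias, larger random error
print(error_components([60, 55, 70], [63, 56, 65]))
```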

(57)

Literature Search Assignment

• The handout lists 8 questions which can be answered by retrieving the corresponding source articles

• Answer as many as possible and bring them to next week’s lecture

• DO NOT contact author or order articles.

(58)

Selected Reading

• Atkinson, G. & Nevill, A. M. (1998) Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Medicine, 26: 217–238.

• Holmes, T. H. Ten categories of statistical errors: a guide for research in endocrinology and metabolism. American Journal of Physiology, 286: E495–E501.

• Thomas, J. R. & Nelson, J. K. (2001) Research Methods in Physical Activity, 4th edition. Champaign, Illinois: Human Kinetics.
