Rank-Based Non-Parametric Tests

Academic year: 2021


Non-Parametric Tests

Reminder: Student Instructional Rating Surveys

You have until May 8th to fill out the student instructional rating surveys at https://sakai.rutgers.edu/portal/site/sirs

The survey should be available on any device with a full-featured web browser. Please take the time to fill it out. Your answers:

• Will be anonymous

• Will help me to improve my teaching strategies and the structure of the course

• Will help the department in planning and designing future courses

• Will be used by the university in promotion, tenure, and reappointment decisions

Parametric and Nonparametric Tests

• Most of the statistical tests that we have used throughout the semester have relied on specific assumptions about the distribution of the variables involved and/or their means, and have been set up to test hypotheses about specific population parameters

• Such tests (z-tests, t-tests, ANOVAs, Pearson’s correlation) are called parametric tests

• Though these parametric tests are robust to minor violations of their assumptions, they can lead to gross systematic errors when the data strongly violate the underlying assumptions, and they can even be undefined for certain types of data (e.g., nominal or non-numerical data).

• Certain tests do not rely on specific distributional assumptions or test hypotheses about particular population parameters. These tests are generally called nonparametric tests

• The chi-square tests introduced in the last lecture and Spearman’s rank correlation coefficient test were examples of nonparametric tests.

Parametric versus Nonparametric Tests

• The advantages of parametric tests are that

– They are more powerful (i.e., you can detect smaller effect sizes with smaller samples) than comparable non-parametric tests when the parametric assumptions are correct (or approximately correct).

– The hypothesis tests are more specific and easier to interpret.

• The advantages of nonparametric tests are that

– They can be used when the distribution of the population is completely unknown

– They tend to be more robust to ill-behaved (e.g., non-normal, heteroscedastic, & multi-modal) data

– They are less sensitive to outliers


Nonparametric tests

• In the last 70 years or so, statisticians have developed many different nonparametric tests.

• Those most widely used in the behavioral sciences tend to be rank randomization tests

• Rank randomization (or rank-permutation) tests are hypothesis tests based on the theoretical distribution of randomly assigned ranks. As a first step, they all require the conversion of raw scores to ordinal ranks

– This makes them obvious candidates for ordinal data, though these data usually still need to be modified

Rank Randomization Tests

Advantages of rank-based tests:

1. Ranks are simpler, and rank-based tests are easier to compute

2. They are largely insensitive to the particular form of the population distributions and differences between the distributions underlying the scores in different samples

3. They tend to minimize the effects of large sample variances

4. They are insensitive to outlier scores and make it easier to deal with undetermined scores (e.g., time to task completion)

5. The distribution of randomly assigned ranks can be computed

720 permutations

Sample 1 Ranks   R1 = Σr   Sample 2 Ranks   R2 = Σr   Ws = min(R1, R2)
1 2 3            6         4 5 6            15        6
1 2 3            6         4 6 5            15        6
1 2 3            6         5 4 6            15        6
1 2 3            6         5 6 4            15        6
1 2 3            6         6 4 5            15        6
1 2 3            6         6 5 4            15        6
1 2 4            7         3 5 6            14        7
1 2 4            7         3 6 5            14        7
1 2 4            7         5 3 6            14        7
1 2 4            7         5 6 3            14        7
...              ...       ...              ...       ...
6 5 3            14        4 1 2            7         7
6 5 3            14        4 2 1            7         7
6 5 4            15        1 2 3            6         6
6 5 4            15        1 3 2            6         6
6 5 4            15        2 1 3            6         6
6 5 4            15        2 3 1            6         6
6 5 4            15        3 1 2            6         6
6 5 4            15        3 2 1            6         6
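The enumeration in the table can be reproduced directly. The following is a minimal Python sketch, assuming two samples of three scores each and no tied ranks; the variable names are illustrative and do not come from the lecture.

```python
from collections import Counter
from itertools import permutations

# Assign the ranks 1..6 to six positions in every possible order. The first
# three positions are treated as sample 1, the last three as sample 2.
ws_counts = Counter()
for perm in permutations(range(1, 7)):
    r1 = sum(perm[:3])           # rank sum of sample 1
    r2 = sum(perm[3:])           # rank sum of sample 2
    ws_counts[min(r1, r2)] += 1  # Ws = min(R1, R2)

total = sum(ws_counts.values())  # number of permutations enumerated
for ws in sorted(ws_counts):
    # value of Ws, how many permutations produce it, and its relative frequency
    print(ws, ws_counts[ws], round(ws_counts[ws] / total, 3))
```

If every permutation is equally likely, as the null hypothesis implies, these relative frequencies are the exact sampling distribution of Ws for this sample size.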


Rank Randomization Tests

Popular rank-based tests:

1. The Mann-Whitney U (or Wilcoxon rank-sum) test

– Nonparametric analogue to the independent-samples t-test

2. The Wilcoxon signed-rank test

– Nonparametric analogue to the matched-samples t-test

3. The Kruskal-Wallis test

– Nonparametric analogue to the one-way ANOVA (independent meas.)

4. The Friedman test

– Nonparametric analogue to the repeated-measures ANOVA


A Note about Computing Ranks

• All of the rank-based tests require that you compute ranks based on the total number of scores, from lowest to highest

– E.g., if you have 3 samples with 5 scores each, the lowest overall score should be assigned the rank 1 and the highest overall score should be assigned the rank 15

• In case of ties, each tied score should be assigned the mean of the tied ranks

– E.g., if the 3rd and 4th lowest scores have the same value, then you should assign them each a rank of 3.5, with the next highest value receiving a rank of 5.

– If the 7th, 8th, and 9th scores are all tied, then you should assign them each a rank of 8, with the next highest value receiving a rank of 10.
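The tie rule can be sketched in a few lines of Python. This is an illustrative helper (the name `mean_ranks` and the sample scores are assumptions, not part of the lecture) that ranks from lowest to highest and gives tied scores the mean of the ranks they occupy.

```python
def mean_ranks(scores):
    """Rank scores from lowest (rank 1) to highest, averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        # Extend the run while the next sorted score equals the current one.
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        # Positions start..end (0-based) hold ranks start+1..end+1; use their mean.
        tied_rank = (start + 1 + end + 1) / 2
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks

# Hypothetical scores: the 3rd and 4th lowest values are tied.
print(mean_ranks([10, 20, 30, 30, 40]))  # [1.0, 2.0, 3.5, 3.5, 5.0]
```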


Rank Randomization Tests

Aside from converting the raw scores to ranks, the logical steps are similar to those for the parametric hypothesis tests:

1. State the null and alternative hypotheses about the population.

2. Use the null hypothesis to predict the characteristics that the sample ranks should have.

3. Use the samples to compute the test statistic

4. Compare the test statistic with the hypothesis prediction

The Mann-Whitney Test

• This is a test for independent-measures (between subjects) research designs with two groups and is thus an alternative to the independent-measures t-test

• The basic intuition behind the test is that:

– A real difference between the two treatments should cause the scores in one sample to be generally larger than the scores in the other sample

– If all the scores are ranked, the larger ranks should be concentrated in one sample and the smaller ranks should be concentrated in the other sample.

• The null and alternative hypotheses are a bit more vague than for the t-test, but still test for some sort of difference in central tendency

– H0: There is no difference between treatments. Therefore, there is no tendency for ranks in one sample to be systematically higher or lower than in the other sample

– H1: There is a difference between treatments. Therefore, the ranks in one sample should be systematically higher or lower than in the other sample

• When comparing two samples, the Mann-Whitney U statistic for sample 1 represents the sum of the number of scores in sample 2 outranked by scores in sample 1

• The smaller of the U values is looked up in the table

Example

Each score earns one point for every score in the other sample that it outranks.

Treatment A
Raw Score   Rank   Points
27          7      2
2           1      0
9           4      1
48          8      2
6           2      0
15          5      1
Total points: 6

Treatment B
Raw Score   Rank   Points
71          11     6
63          9      6
18          6      4
68          10     6
94          12     6
8           3      2
Total points: 30
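The point counts in this example can be checked with a short Python sketch. The function name `u_points` is illustrative, and the sketch assumes there are no tied scores between the samples, as is the case here.

```python
treatment_a = [27, 2, 9, 48, 6, 15]
treatment_b = [71, 63, 18, 68, 94, 8]

def u_points(sample, other):
    """Count, over all scores in `sample`, how many scores in `other` each outranks."""
    return sum(sum(1 for y in other if x > y) for x in sample)

u_a = u_points(treatment_a, treatment_b)  # 6
u_b = u_points(treatment_b, treatment_a)  # 30
u = min(u_a, u_b)                         # the smaller U is the test statistic
print(u_a, u_b, u)
```

The two counts always add up to the number of cross-sample pairs, here 6 × 6 = 36.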


The Mann-Whitney Test: Steps

In practice, the steps for computing the test statistic (U) are:

1. Rank all the observations from smallest to largest

2. Compute the sum of the ranks R in each sample, then use the following formulas to compute a U statistic for each sample:

$$U_1 = R_1 - \frac{n_1(n_1+1)}{2}, \qquad U_2 = R_2 - \frac{n_2(n_2+1)}{2}$$

3. U is the smaller of $U_1$ and $U_2$

Example

Treatment A
Raw Score   Rank
27          7
2           1
9           4
48          8
6           2
15          5
ΣR1 = 27

Treatment B
Raw Score   Rank
71          11
63          9
18          6
68          10
94          12
8           3
ΣR2 = 51

$$U_1 = R_1 - \frac{n_1(n_1+1)}{2} = 27 - \frac{6(7)}{2} = 27 - 21 = 6$$

With $n_1 = 6$, $n_2 = 6$, and $U = 6$: in this case, $U_{crit} = 5$. Because U is larger than $U_{crit}$, we retain the null hypothesis
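The rank-sum route to U can be sketched in Python as follows. The sketch assumes no tied scores (true for this example), so a simple position-in-sorted-order ranking suffices; variable names are illustrative.

```python
treatment_a = [27, 2, 9, 48, 6, 15]
treatment_b = [71, 63, 18, 68, 94, 8]

# Rank all scores together, lowest = 1 (valid only when there are no ties).
pooled = sorted(treatment_a + treatment_b)
rank_of = {score: position + 1 for position, score in enumerate(pooled)}

n1, n2 = len(treatment_a), len(treatment_b)
r1 = sum(rank_of[x] for x in treatment_a)  # 27
r2 = sum(rank_of[x] for x in treatment_b)  # 51

u1 = r1 - n1 * (n1 + 1) / 2                # 27 - 21 = 6
u2 = r2 - n2 * (n2 + 1) / 2                # 51 - 21 = 30
u = min(u1, u2)
print(r1, r2, u1, u2, u)
```

These values agree with the point-counting approach: 6 for Treatment A and 30 for Treatment B.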


The Mann-Whitney Test: Normal Approximation

When $n_1$ and $n_2$ are sufficiently large (e.g., $n_1, n_2 \ge 10$), the distribution of rank sums becomes roughly normal and you can use a normal approximation, evaluated against a critical z value, to test for significance.

$$\mu_U = \frac{n_1 n_2}{2}, \qquad \sigma_U = \sqrt{\frac{n_1 n_2 (N+1)}{12}}, \qquad z = \frac{U - \mu_U}{\sigma_U}$$

where $N = n_1 + n_2$.
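As a rough illustration of the normal approximation for the Mann-Whitney U, the sketch below evaluates z for hypothetical values (U = 27 with two samples of 10); these numbers and the function name are assumptions chosen for illustration, not data from the lecture.

```python
import math

def mann_whitney_z(u, n1, n2):
    """z score for U under the normal approximation (no tie correction)."""
    mu_u = n1 * n2 / 2
    sigma_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu_u) / sigma_u

# Hypothetical result: U = 27 from two samples of 10 scores each.
z = mann_whitney_z(27, 10, 10)
print(round(z, 3))  # -1.739
```

The resulting z is then compared with the critical z value for the chosen alpha level.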

Wilcoxon’s Signed Ranks Test

• This is a test for repeated-measures (within-subjects) research designs with two treatment conditions and is thus an alternative to the repeated-measures t-test

• The basic intuition behind the test is that:

– A real difference between the two treatments should cause the difference scores to be generally positive or negative

– If all the difference scores are ranked and signed (according to whether they represent increases, +, or decreases, -), the ranks should be concentrated in either the positive or negative set.

• Again, the null and alternative hypotheses are a bit more vague than for the repeated-measures t-test, but test for some sort of difference in central tendency

– H0: There is no difference between treatments. Therefore, there is no tendency for the ranks of difference scores to be generally positive or negative

– H1: There is a difference between treatments. Therefore, the ranks of the difference scores should be systematically positive or negative

Wilcoxon’s Signed Rank Test: Steps

The steps for computing the test statistic (T) are:

1. Compute the difference scores

2. Rank all the difference scores from smallest to largest absolute value and assign them positive or negative signs based on whether they represent an increment or decrement

3. Compute separate sums for the positively and negatively signed sets

4. T is the smaller of the resulting signed-rank-sums
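The steps above can be sketched in Python as follows. The data are hypothetical, the function name is illustrative, and dropping zero differences before ranking is an assumption (a common convention that the steps above do not spell out).

```python
def wilcoxon_t(before, after):
    """Return (T, positive rank sum, negative rank sum) for paired scores."""
    diffs = [a - b for b, a in zip(before, after)]
    diffs = [d for d in diffs if d != 0]          # assumption: ignore zero differences
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank(value):
        # Mean of the 1-based positions that `value` occupies in sorted order.
        first = abs_sorted.index(value)
        count = abs_sorted.count(value)
        return first + (count + 1) / 2

    positive = sum(rank(abs(d)) for d in diffs if d > 0)
    negative = sum(rank(abs(d)) for d in diffs if d < 0)
    return min(positive, negative), positive, negative

# Hypothetical paired scores; the differences are 3, -1, 4, 2, -5, 6.
before = [10, 12, 9, 14, 11, 8]
after = [13, 11, 13, 16, 6, 14]
t, pos_sum, neg_sum = wilcoxon_t(before, after)
print(t, pos_sum, neg_sum)  # 6.0 15.0 6.0
```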


Example

T = 9, N = 15

$T_{crit}$ is 25, which is greater than our test statistic, so we would reject the null hypothesis

Wilcoxon’s Signed Ranks Test: Normal Approximation

Again, when n is sufficiently large (e.g., n ≥ 20), the distribution of T becomes roughly normal and you can use a normal approximation, evaluated against a critical z value, to test for significance.

$$\mu_T = \frac{n(n+1)}{4}, \qquad \sigma_T = \sqrt{\frac{n(n+1)(2n+1)}{24}}$$

$$z = \frac{T - \mu_T}{\sigma_T} = \frac{T - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}}$$
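To illustrate the arithmetic of the normal approximation for the signed-rank statistic, the sketch below plugs in T = 9 and n = 15 from the earlier example. Note that n = 15 is below the n ≥ 20 guideline, so this only demonstrates the computation; the function name is an assumption.

```python
import math

def wilcoxon_z(t, n):
    """z score for the signed-rank statistic T (no tie correction)."""
    mu_t = n * (n + 1) / 4
    sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mu_t) / sigma_t

z = wilcoxon_z(9, 15)   # mu_T = 60, sigma_T = sqrt(310)
print(round(z, 3))      # -2.897
```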

The Kruskal-Wallis One Way ANOVA

• This is a test for independent-measures research designs with more than two groups. As its name suggests, it is a nonparametric alternative to the parametric one-way ANOVA

• The basic intuition behind the test is analogous to that for the parametric one-way ANOVA:

– A real difference among treatments should cause the variability of scores between groups to be greater than the variability of scores within groups

– If all the scores are ranked the variability of rank-sums between groups


• The null and alternative hypotheses are very similar to those in the parametric one-way ANOVA.

– H0: There is no difference between treatments. There is no tendency for ranks in any sample to be systematically higher or lower than in any other condition.

– H1: There are differences between treatments. The ranks in at least one condition are systematically higher or lower than in another treatment condition

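Comparison of Conceptual Formulas for the Parametric one-way ANOVA and Kruskal-Wallis Test:

Parametric ANOVA:

$$F = C \cdot \frac{\sum_{i=1}^{k} n_i \,(M_i - M_T)^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - M_i)^2}, \qquad C = \frac{df_{within}}{df_{between}} = \frac{N-k}{k-1}$$

Kruskal-Wallis:

$$H = C \cdot \frac{\sum_{i=1}^{k} n_i \,(\bar{r}_i - \bar{r}_T)^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (r_{ij} - \bar{r}_T)^2}, \qquad C = N - 1$$

Here $M_i$ and $\bar{r}_i$ are the mean score and mean rank in group i, and $M_T$ and $\bar{r}_T$ are the mean of all scores and the mean of all ranks.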

The Kruskal-Wallis Test: Steps

The steps for computing the test statistic (H) are:

1. Rank all scores from lowest to highest across all samples

2. Compute R, the sum of ranks in each sample

3. Plug into the following formula to solve for H:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$
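The Kruskal-Wallis steps can be sketched in Python as below. The three groups of scores are hypothetical, the function name is illustrative, and the sketch applies no correction for ties beyond assigning mean ranks.

```python
def kruskal_wallis_h(groups):
    """H statistic from a list of samples, using H = 12/(N(N+1)) * sum(R_i^2/n_i) - 3(N+1)."""
    pooled = sorted(x for group in groups for x in group)

    def rank(value):
        # Mean of the 1-based positions that `value` occupies in the pooled order.
        return pooled.index(value) + (pooled.count(value) + 1) / 2

    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    weighted = sum(sum(rank(x) for x in group) ** 2 / len(group) for group in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Hypothetical data: rank sums are 6, 15, and 24, so H = (12/90)(279) - 30 = 7.2.
groups = [[12, 15, 11], [18, 20, 17], [25, 27, 30]]
h = kruskal_wallis_h(groups)
print(round(h, 3))  # 7.2
```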

The Kruskal-Wallis Test: Normal Approximation

• As in the other rank-randomization tests, when N is sufficiently large, the distribution of rank-sums becomes approximately normal

• H is a linear combination (a weighted sum) of squared rank-sums, which means that it can be approximated by the distribution of a sum of squared normal variables

• For this reason, the significance of H is usually evaluated using a chi-squared distribution with k-1 degrees of freedom.

Friedman’s Rank Test

• This is a test for repeated-measures research designs with more than two groups. It is the non-parametric analogue to a one-way repeated-measures ANOVA

• Just as the repeated-measures ANOVA tests for consistent changes between individuals across treatment groups, Friedman’s test looks for consistent rankings between individuals across treatment groups

• It can be used with any repeated-measures data, but is especially useful for measuring inter-rater agreement for rankings

The Friedman Test

• The null and alternative hypotheses are identical to those for the Kruskal-Wallis Test.

– H0: There is no difference between treatments. There is no tendency for ranks in any sample to be systematically higher or lower than in any other condition.

– H1: There are differences between treatments. The ranks in at least one condition are systematically higher or lower than in another treatment condition

The Friedman Test: Steps

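The steps for computing the test statistic ($\chi^2_R$) are:

1. Rank scores across each treatment group for each individual

2. Compute R, the sum of ranks for each group

3. Plug into the following formula to compute the test statistic:

$$\chi^2_R = \frac{12}{nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3n(k+1)$$

where n is the number of individuals and k is the number of treatment groups. The statistic is evaluated with k-1 degrees of freedom.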

Friedman Test Example

Sommelier   Wine 1   Wine 2   Wine 3
A           1        2        3
B           2        1        3
C           2        1        3
D           3        2        1
E           1        2        3
F           2        1        3
G           3        2        1
R           14       11       17

With n = 7, k = 3, and df = k - 1 = 2:

$$\chi^2_R = \frac{12}{nk(k+1)} \sum_{i=1}^{k} R_i^2 - 3n(k+1) = \frac{12}{7(3)(4)}\left(14^2 + 11^2 + 17^2\right) - 3(7)(4) = \frac{12}{84}(196 + 121 + 289) - 84 = 2.571$$
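The Friedman computation for the sommelier rankings can be checked with the short Python sketch below; variable names are illustrative. Evaluating the formula directly for rank sums of 14, 11, and 17 gives approximately 2.571.

```python
# Rankings of three wines by seven sommeliers (one row per sommelier).
rankings = {
    "A": [1, 2, 3],
    "B": [2, 1, 3],
    "C": [2, 1, 3],
    "D": [3, 2, 1],
    "E": [1, 2, 3],
    "F": [2, 1, 3],
    "G": [3, 2, 1],
}

n = len(rankings)  # number of individuals (7)
k = 3              # number of treatment groups (wines)

# Sum of ranks for each wine across sommeliers.
rank_sums = [sum(row[j] for row in rankings.values()) for j in range(k)]

chi2_r = 12 / (n * k * (k + 1)) * sum(r ** 2 for r in rank_sums) - 3 * n * (k + 1)
print(rank_sums, round(chi2_r, 3))  # [14, 11, 17] 2.571
```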

χ² Distribution: Upper Tail Probability

df 0.1 0.05 0.025 0.01

1 2.71 3.84 5.02 6.63

2 4.61 5.99 7.38 9.21

3 6.25 7.81 9.35 11.34

4 7.78 9.49 11.14 13.28

5 9.24 11.07 12.83 15.09

6 10.64 12.59 14.45 16.81

7 12.02 14.07 16.01 18.48

8 13.36 15.51 17.53 20.09

9 14.68 16.92 19.02 21.67

10 15.99 18.31 20.48 23.21

11 17.28 19.68 21.92 24.72

12 18.55 21.03 23.34 26.22

13 19.81 22.36 24.74 27.69

14 21.06 23.68 26.12 29.14

15 22.31 25.00 27.49 30.58

16 23.54 26.30 28.85 32.00

17 24.77 27.59 30.19 33.41

18 25.99 28.87 31.53 34.81

19 27.20 30.14 32.85 36.19

20 28.41 31.41 34.17 37.57

30 40.26 43.77 46.98 50.89

40 51.81 55.76 59.34 63.69

50 63.17 67.50 71.42 76.15

60 74.40 79.08 83.30 88.38

70 85.53 90.53 95.02 100.43

80 96.58 101.88 106.63 112.33

90 107.57 113.15 118.14 124.12

100 118.50 124.34 129.56 135.81

Friedman Test Example

With df = 2, $\chi^2_{crit} = 5.99$. The computed $\chi^2_R$ is smaller than this critical value, so we retain H0.

In this case, we would conclude that there is no significant difference
