The Mann-Whitney U test. Peter Shaw

(1)

The Mann-Whitney U

test

(2)

Introduction

We meet our first inferential test.

You should not get put off by the

messy-looking formulae – it’s

usually run on a PC anyway.

The important bit is to understand

(3)

Imagine..

That you have acquired a set of

measurements from 2 different sites.

Maybe one is alleged to be polluted, the other clean, and you measure residues in the soil.

Maybe these are questionnaire returns from students identified as M or F.

You want to know whether these 2 sets

of measurements genuinely differ. The issue here is that you need to rule out the possibility of the results being

(4)

The formal procedure:

Involves the creation of two competing

explanations for the data recorded.

Idea 1:These are pattern-less random data. Any observed patterns are due to chance. This is the null hypothesis H0

Idea 2: There is a defined pattern in the data. This is the alternative hypothesis

H1

Without the statement of the competing

(5)

Occam’s razor

If competing explanations exist, chose

the simpler unless there is good reason to reject it.

Here, you must assume H0 to be true

until you can reject it.

In point of fact you can never

ABSOLUTELY prove that your

observations are non-random. Any

pattern could arise in random noise, by chance. Instead you work out how likely H0 is to be true.

(6)

Example

Noise complaints 1= no complaint, 5 = very unhappy

Homes near airport Control site

5 3 4 2 4 4 3 1 5 2 4 1 5

You conduct a questionnaire survey of homes in the Heathrow flight path, and also a control population of homes in South west London. Responses to the question “How intrusive is plane noise in your daily life” are

(7)

Stage 1: Eyeball the

data!

These data are ordinal, but not normally

distributed (allowable scores are 1, 2, 3, 4 or

5).

Use Non-parametric statistics

It does look as though people are less happy

under the flightpath, but recall that we must

state our hypotheses H0, H1

H0: There is no difference in attitudes to plane

noise between the two areas – any observed

differences are due to chance.

H1: Responses to the question differed

(8)

Now we assess how

likely it is that this

pattern could occur by

chance:

This is done by performing a calculation.

Don’t worry yet about what the calculation

entails.

What matters is that the calculation gives an

answer (a test statistic) whose likelihood

can be looked up in tables. Thus by means

of this tool - the test statistic - we can work

out an estimate of the probability that the

observed pattern could occur by chance in

random data

(9)

One philosophical

hurdle to go:

The test statistic generates a probability - a

number for 0 to 1, which is the probability of H0

being true.

If p = 0, H0 is certainly false. (Actually this is

over-simple, but a good approximation)

If p is large, say p = 0.8, H0 must be accepted

as true.

(10)

Significance

We have to define a threshold, a boundary, and

say that if p is below this threshold H0 is rejected

otherwise H1 is accepted.

This boundary is called the significance level. By

convention it is set at p=0.05 (1:20), but you can

chose any other number - as long as you specify

it in the write-up of your analyses.

WARNING!! This means that if you analyse 100

sets of random data, the expectance (log-term

average) is that 5 will generate a significant test.

(11)

The procedure:

Data 5 3 4 2 4 4 3 1 5 2 4 1 5 Test statistic U = 15.5 Probability of H0 being true p = 0.03

Set up H0, H1. Decide significance level p=0.05

Is p above critical level? Y N

Reject H0 Accept H0

(12)

This particular test:

The Mann-Whitney U test is a non-parametric

test which examines whether 2 columns of data

could have come from the same population (ie

“should” be the same)

It generates a test statistic called U (no idea why

it’s U). By hand we look U up in tables; PCs

give you an exact probability.

It requires 2 sets of data - these need not be

paired, nor need they be normally distributed,

nor need there be equal numbers in each set.

(13)

How to do it

1

: rank all data into ascending order,

then re-code the data set replacing raw

data with ranks.

Data 5 3 4 2 4 4 3 1 5 2 4 1 5 Data 5 #13 3 #5 4 #10 2 #4 4 #9 4 #7 3 #6 1 #2 5 #12 2 #3 4 #8 1 #1 5 #11 Data 5 #13 = 12 3 #5 = 5.5 4 #10 = 8.5 2 #4 = 3.5 4 #9 = 8.5 4 #7 = 8.5 3 #6 = 5.5 1 #2 = 1.5 5 #12 = 12 2 #3 = 3.5 4 #8 = 8.5 1 #1 = 1.5 5 #11 = 12

2

Harmonize ranks where the same value occurs more than once

(14)

Once data are ranked:

Add up ranks for each column; call these r

_x

and r

_y

(Optional but a good check:

r

_x

+ r

_y

= n2/2 + n/2, or you have an error)

Calculate

Ux = NxNy + Nx(Nx+1)/2 - Rx Uy = NxNy + Ny(Ny+1)/2 - Ry

take the SMALLER of these 2 values and look up in tables. If U

is LESS than the critical value, reject H0

NB This test is unique in one feature: Here low values of the

(15)

In this case:

Data 5 #13 = 12 3 #5 = 5.5 4 #10 = 8.5 2 #4 = 3.5 4 #9 = 8.5 4 #7 = 8.5 3 #6 = 5.5 1 #2 = 1.5 5 #12 = 12 2 #3 = 3.5 4 #8 = 8.5 1 #1 = 1.5 5 #11 = 12 ___ ___ rx=67 ry=24 Check: rx + ry + 91 13*13/2 + 13/2 = 91 CHECK. Ux = 6*7 + 7*8/2 - 67 = 3 Uy = 6*7 + 6*7/2 - 24 = 39 Lowest U value is 3. Critical value of U (7,6) = 4 at p = 0.01. Calculated U is < tabulated U so reject H0.

At p = 0.01 these two sets of data differ.

(16)

Tails.. Generally use

2 tailed tests

Upper tail of distribution Lower tail of distribution

2 tailed test

: These populations DIFFER.

1 tailed test

: Population X is Greater than Y (or Less than Y).

(17)

Kruskal-Wallis:

The U test’s big cousin

When we have 2 groups to compare (M/F, site 1/site 2, etc) the U test is correct applicable and safe.

How to handle cases with 3 or more groups?

The simple answer is to run the Kruskal-Wallis test. This is run on a PC, but behaves very much like the M-W U. It will give one significance

value, which simply tells you whether at least one group differs from one other. Males Females Do males differ from females? Site 1 Site 2 Do results differ between these sites? Site 3

(18)

Your coursework:

I will give each of you a sheet with data collected from 3 sites. (Don’t try copying – each one is different and I know who gets which dataset!). I want you to show me your data processing skills as follows:

1: Produce a boxplot of these data, showing how values differ between

the categories.

2: Run 3 separate Mann-Whitny U tests on them, comparing 1-2, 1-3 and

2-3. Only call the result significant if the p value is < 0.01

3: Run a Kruskal-Wallis anova on the three groups combined, and