• No results found

, NOTES NOTE ON WILCOXON'S TWO-SAMPLE TEST WHEN TIES ARE PRESENT. Mathematical Centre, Amsterdam

N/A
N/A
Protected

Academic year: 2021

Share ", NOTES NOTE ON WILCOXON'S TWO-SAMPLE TEST WHEN TIES ARE PRESENT. Mathematical Centre, Amsterdam"

Copied!
12
0
0

Loading.... (view fulltext now)

Full text

(1)

Reprinted from THE ANNALS OF MATHEMATICAL STATISTtCS Vol. 23, No. 1, March, 1952

Printed in U.S.A.

, NOTES

NOTE ON WILCOXON'S TWO-SAMPLE TEST WHEN TIES ARE PRESENT

BY J. HEMELRIJK Mathematical Centre, Amsterdam

Wilcoxon's parameterfree two-sample test (cf. Wilcoxon [1]; H.B. Mann and

D. R. Whitny [2]) depends on a statistic U with the following definition: If X1 , • • • , Xn and Yr , · · · , Ym are the two samples, U is the number of pairs

(i, j) with Xi

>

Yi. The probability distribution of U, under the hypothesis

that the samples have been drawn independently from the same continuous

pop-ulation, has been derived by Mann and Whitney. The influence of ties on this probability distribution has not been investigated as yet.

It is noteworthy that Wilcoxon's U is closely connected with the quantity S,

which Kendall (cf. e.g. Kendall [3]) introduced in the theory of rank correlation.

When r pairs of numbers (uk, vk) are given, Sis computed by scoring:

-1, if (uh - Uk) (vh - Vk)

<

0, 0, if (uh - Uk) (vh - Vk)

=

0, +1, if (uh - Uk) (vh - vk)

> 0,

and adding the scores for all pairs (h, k) with h

<

k. If, in this definition, we take r

=

n

+

m and substitute the values Xr, · · · , Xn, Y1, ·· · · , Ym in this

order for U1 , • • • , Un , Un+i , • • • , Ur , and O or 1 respectively for vk if uk

=

x;

for some i or '!f'k

=

y i for some j respectively, then the following relation holds:

(1) 2U

+

S

=

nm.

The simplest way to see this is by considering the total score of 2U

+

S for every pair (h, k). This score is equal to

+

1 if Vh

=

0 and vk

=

I, and 0

other-wise. The sum of the scores is therefore nm.

Relation (1) holds if no ties are present among the two samples x1 , • • •,

x,. and Y1, · · · , Ym. It is natural to define U in general by extending (1) to the case when there are ties. Since for a pair (x; , y i) with X;

=

y i the score of

S is equal to zero, the score for U must be taken as

½

for such a pair.

Now Kendall has derived the mean and the standard deviation of S under

the hypothesis that for a given order of the quantities v1 ,· • • • , Vr all the r !

possible permutations of u1 , · · · , Ur are equally probable. This condition is

fulfilled in our case if the samples x1, · · · , Xn and Yr, · · · , Ym have been drawn

at random from the same population (which need not be continuous anymore).

Therefore, the mean and standard deviation of U under the null hypothesis

may be derived from Kendall's formulas.

(2)

134 J. HEMELRIJK

According to Kendall ([4], pp. 56 and 60), we have

(2) and E(S)

=

0 var (S)

= /

13 {r(r - 1)(2r + 5) -

E

t(t - 1)(2t + 5) t (3) -

~

s(s - 1)(2s + 5)

l

+ gr(r _ 1\ r _ 2) {~ t(t - l)(t - 2)

l

• {~ s(s - l)(s - 2)1 + 2r(r

~

l)

q::

t(t - l)} {~ s(s - 1)},

where summation

Li

takes place over the various ties among ui , · · · , Ur , and

E.

over the ties among Vi , • • • , Vr ; t and s respectively indicating the number of

elements in every group of equal numbers among ui , · · · , Ur and v1 , • • · , v,

respectively. From (1) we have

(4)

and

(5)

E(U)

=

½

nm - E(S)

=

½

nm

var (U)

=

¼

var (S).

The group Vi , • • • , Vr consists of n numbers O and m numbers 1; thus s in (3)

takes the values n and m and we have

L

s(s - 1) (2s + 5)

=

n(n - 1) (2n + 5) + m(m - 1) (2m + 5),

E

s(s - 1) (s - 2)

=

n(n - 1) (n - 2) + m(m - 1) (m - 2),

E

s(s - 1)

=

n(n - 1)

+

m(m - 1) .

Substituting in (3) and (5), we obtain after some reduction

var (U)

=

,\nm(n + m + 1) -

-h

E

t(t - 1)(2t +5) (6) I + n(n - 1) (n - 2) + m(m - 1) (m - 2)

E

t(t _ l) (t _ 2) 36(n + m)(n + m - l)(n + m - 2) 1

+

n(n - 1)

+

m(m - 1)

E

t(t _ l) 8(m

+

n)(m

+

n - 1) t '

where

E,

takes place over the ties among the values Xi , • • • , Xn, Yi, • • • , y,,.,

taken together. ·

When no ties are present this reduces to results of Mann and Whitney [2]:

(7) E(U)

= ½nm;var

(U)

=

bnm (n + m + 1).

From (6) and (7) it is easy to prove (e.g., by induction) that var (U) is decreased by the presence of ties among the observations. These results constitute a first _

(3)

NOTE ON WILCOXON'S TWO-SAMPLE TEST 135

step towards the possibility of using Wilcoxon's test for samples from any

population.

REFERENCES

[l] F. WiLCoxoN, "Individual comparisons by ranking methods," Biometrics Bull., Vol. 1 (1945), pp. 80-83.

[2] H.B. MANN AND D.R. WHITNEY, "On a test of whether one of two random variables is stochastically larger than the other," Annals of Math. Stat., Vol. 18 (1947), pp. 50-00.

[3] M. G. KENDALL, "A new measure of rank correlation," Biometrika, Vol. 30 (1938), pp. 81-93.

(4)
(5)

Reprinted from THE ANNALS OF MATHEMATICAL STATISTICS

Vol. 23, No. 1, March, 1952 Printed in U.S.A.

NOTES

NOTE ON WILCOXON'S TWO-SAMPLE TEST WHEN TIES ARE PRESENT

BY J. HEMELRIJK Mathematical Centre, Amsterdam

Wilcoxon's parameterfree two-sample test (cf. Wilcoxon [1]; H.B. Mann and D. R. Whitny [2]) depends on a statistic U with the following definition: If x1, · · · , x,. and Y1, · · • , Ym are the two samples, U is the number of pairs

(i, j) with x,

>

Yi. The probability distribution of U, under the hypothesis

that the samples have been drawn independently from the same continuous

pop-ulation, has been derived by Mann and Whitney. The influence of ties on this probability distribution has not been investigated as yet.

It is noteworthy that Wilcoxon's U is closely connected with the quantity S,

which Kendall (cf. e.g. Kendall [3]) introduced in the theory of rank correlation. When r pairs of numbers (u1c, v1c) are given, Sis computed by scoring:

-1, if (ui. - u1c) (vi. - v1c)

< 0,

0, if (uh - U1c) (vi. - V1c)

=

0,

+1,

if (ui. - u1c) (vi. - vk)

>

0,

and adding the scores for all pairs (h, le) with h

<

le. If, in this definition, we take r

=

n

+

m and substitute the values X1 , · · · , Xn , Y1 , · · · , Ym in this order for U1 , · · • , Un , Un+1 , · · · , u, , and 0 or 1 respectively for V1c if Uk

=

x;

for some i or Uk

=

Yi for some j respectively, then the following relation holds:

(1) 2U

+

S

=

nm.

The simplest way to Se(\ this is by considering the total score of 2U

+

S for

every pair (h, le). This score is equal to

+1

if Vh = 0 and v1c = l, and 0 other-wise. The sum of the scores is therefore nm.

Relation (1) holds if no ties are present among the two samples x1 , • • •,

x,. and Y1, · · · , Ym. It is natural to define U in general by extending (1) to

the case when there are ties. Since for a pair (xi, Yi) with x;

=

Yi the score of Sis equal to zero, the score for U must be taken as

½

for such a pair.

Now Kendall has derived the mean and the standard deviation of S under

the hypothesis that for a given order of the quantities v1 , • • • , v, all the r !

possible permutations of U1 , • • • , u, are equally probable. This condition is

fulfilled in our case if the samples X1, • • • , Xn and Yi, · · · , Ym have been drawn

at random from the same population (which need not be continuous anymore).

Therefore, the mean and standard deviation of U under the null hypothesis

may be derived from Kendall's formulas.

(6)

134 J. HEMELRIJK

According to Kendall ([4], pp. 56 and 60), we have (2) and E(S)

=

0 var (S)

=

-ls

{r(r - 1)(2r + 5) -

L

t(t - 1)(2t + 5) t " 1 . " (3) -

L:

s(s - 1)(2s + 5}} + gr(r _ l)(r _ 2)

{'7'

t(t - l)(t - 2)}

• {L

s(s - l)(s - 2)} + 2 ( 1 l) {L t(t - l)}{L s(s - 1)}, • r r - t

where summation

Lt

takes place over the various ties among u1 , • • • , u, , and

Ls

over the ties among v1 , • • • , Vr ; t and s respectively indicating the number of

elements in every group of equal numbers among u1 , • • • , Ur and v1 , • · • , v,

respectively. From {l) we have (4)

and

(5)

E(U) =

½

nm - E(S) =

½

nm

var (U)

=

¼

var (S).

The group V1 , • • • , Vr consists of n numbers O and m numbers 1; thus s in (3)

takes the values n and m and we have ·

I:

s(s - 1) (2s + 5)

=

n(n - 1) (2n + 5) + m(m - 1) (2m + 5),

L

s(s - 1) (s - 2)

=

n(n - 1) (n - 2) + m(m - 1) (m - 2), •

L

s(s - 1)

=

n(n - 1)

+

m(m - 1). 8

Substituting in (3) and (5), we obtain after some reduction var (U)

=

Anm(n + m + 1) -

-h

L

t(t - 1)(2t +5)

(6) t + n(n - l)(n - 2) + m(m - l)(m - 2)

L

t(t _ l)(t _ 2) 36(n + m)(n + m - l)(n + m - 2) t

+

n(n - 1)

+

m(m - 1)

L

t(t _ l) 8(m

+

n)(m

+

n - 1) t '

where

Lt

takes place over the ties among the values x1 , • • • , Xn, y1 , • • • , y"',

taken together.

When no ties are present this reduces to results of Mann and Whitney [2]:

(7)

E(U)

=

½nm;var (U)

=

A

nm (n + m + 1).

From (6) and (7) it is easy to prove (e.g., by induction) that var (U) is decreased by the presence of ties among the observations. These results constitute a first

(7)

NOTE ON WILCOXON'S TWO-SAMPLE TEST 135

step towards the possibility of using Wilcoxon's test for samples from any

population.

REFERENCES

[1] F. WrwoxoN, "Individual comparisons by ranking methods," Biometrics Bull., Vol. 1 (1945), pp. 80-83.

[2] H.B. MANN AND D.R. WHITNEY, "On a test of whether one of two random variables is stochastically larger than the other," Annals of Math. Stat., Vol. 18 (1947),

pp. 50-60.

(3] M. G. KENDALL, "A new measure of rank correlation," Biometrika, Vol. 30 (1938),

pp. 81-93.

(8)
(9)

~~

(i

...

9..-V

~

Reprinted from THE ANNALS OF MATHEMATICAL STATISTICS

Vol. 23, No. 1, March, 1952 Printed in U.S.A.

NOTES

NOTE ON WILCOXON'S TWO-SAMPLE TEST WHEN TIES ARE PRESENT

BY J. HEMELRIJK Mathematical Centre, Amsterdam

Wilcoxon's parameterfree two-sample test (cf. Wilcoxon [1]; H. B. Mann and

D. R. Whitny [2]) depends on a statistic U with the following definition: If Xi , · • • • , Xn and Yi , • • • , Ym are the two samples, U is the number of pairs (i, j) with x;

>

y1 • The probability distribution of U, under the hypothesis that the samples have been drawn independently from the same continuous pop-ulation, has been derived by Mann and Whitney. The influence of ties on this probability distribution has not been investigated as yet.

It is noteworthy that Wilcoxon's U is closely connected with the quantity S,

which Kendall (cf. e.g. Kendall [3]) introduced in the theory of rank correlation.

When r pairs of numbers (uk, vk) are given, Sis computed by scoring:

-1, if (uh - Uk) (vh - vk)

<

0, 0, if (uh - Uk) (vh - vk)

=

0, +1, if (uh - Uk) (vh - vk)

>

0,

and adding the scores for all pairs (h, k) with h

<

k. If, in this definition, we take r

=

n

+

m and substitute the values Xi , • • • , Xn , Yi , • · • , Ym in this order for ui , · · · , Un , Un+i , · · · , Ur , and 0 or 1 respectively for Vk if Uk

=

X;

for some i or Uk

=

Y; for some j respectively, then the following relation holds:

(1) 2U

+

S

=

nm.

The simplest way to see this is by considering the total score of 2U

+

S for every pair (h, k). This score is equal to

+

1 if vh

=

0 and vk

=

I, and 0 other-wise. The sum of the scores is therefore nm.

Relation (I) holds if no ties are present among the two samples Xi, • • •, Xn and Yi, • • • , Ym. It is natural to define U in general by extending (1) to

the case when there are ties. Since for a pair (x;, y;) with x;

=

y1 the score of

Sis equal to zero, the score for U must be taken as½ for such a pair.

Now Kendall has derived the mean and the standard deviation of S under

the hypothesis that for a given order of the quantities Vi , • • • , Vr all the r !

possible permutations of ui , · · · , Ur are equally probable. This condition is fulfilled in our case if the samples Xi, • • • , Xn and Yi, • • • , Ym have been drawn at random from the same population (which need not be continuous anymore). Therefore, the mean and standard deviation of U under the null hypothesis may be derived from Kendall's formulas.

133

!iliL}OlHEEK MATHEMATISCH CENi'RUM

AMSTERDAM

(10)

134 J. HEMELRIJK

According to Kendall ([4], pp. 56 and 60), we have

(2) and E(S)

=

0 var (S)

=

-h

{r(r - 1)(2r

+

5) -

L

t(t - 1)(2t

+

5) t (3) -

~

s(s - 1)(2s

+

5))

+

gr(r _ 1\(r _ 2) {~ t(t - l)(t - 2)) • {~ s(s - l)(s - 2))

+

2r(r

~

l) {~ t(t - 1)) {~ s(s - l)l, where summation

Lt

takes place over the various ties among u1 , • • • , Ur, and

L,

over the ties among v1 , • • • , Vr ; t and s respectively indicating the number of elements in every group of equal numbers among u1 , • • • , Ur and v1 , · • • , v,

respectively. From (1) we have

(4) E(U)

=

½

nm - E(S)

=

½

nm

and

(5) var (U)

=

¼

var (S).

The group v1 , • • • , Vr consists of n numbers O and m numbers 1; thus s in (3)

takes the values n and m and we have

I:

s(s - 1) (2s + 5)

=

n(n - 1) (2n + 5) + m(m - 1) (2m + 5),

L

s(s - 1) (s - 2)

=

n(n - 1) (n - 2)

+

m(m - 1) (m - 2),

L

s(s - 1)

=

n(n - 1)

+

m(m - 1) .

Substituting in (3) and (5), we obtain after some reduction var (U)

=

-,hnm(n + m + 1) -

-h

L

t(t - 1)(2t +5) (6) t

+

n(n - l)(n - 2)

+

m(m - l)(m - 2)

L

t(t _ l)(t _ 2) 36(n + m)(n + m - l)(n + m - 2) t

+

n(n - 1)

+

m(m - 1)

L

t(t _ l) S(m

+

n)(m

+

n - 1) t '

where

Lt

takes place over the ties among the values X1, · · · , Xn, Yi,···, y,,,,

taken together.

When no ties are present this reduces to results of Mann and Whitney [2]:

(7) E(U)

=

½

nm; var (U)

=

-h

nm (n

+

m

+

1).

From (6) and (7) it is easy to prove (e.g., by induction) that var (U) is decreased by the presence of ties among the observations. These results constitute a first

(11)

NOTE ON WILCOXON'S TWO-SAMPLE TEST 135

step towards the possibility of using Wilcoxon's test for samples from any

population.

REFERENCES

[l] F. WILCOXON, "Individual comparisons by ranking methods," Biometrics Bull., Vol. 1

(1945), pp. 8o-83.

[2] H.B. MANN AND D.R. WHITNEY, "On a test of whether one of two random variables is stochastically larger than the other," Annals of Math. Stat., Vol. 18 (1947),

pp. 50-60.

[3] M. G. KENDALL, "A new measure of rank correlation," Biometrika, Vol. 30 (1938),

pp. 81-93.

(12)

References

Related documents

Some important risk factors that could cause the Group's actual results to differ materially from those expressed in its forward-looking statements include, but are not limited

The effective-one-body (EOB) approach to the general relativ- istic two-body problem [5 –8] is a powerful analytical tool that reliably describes both the dynamics and

In this case the consumption growth effect does not depend on the habits intensity, γ , so that the fall in the expected return of the asset is due solely to the enhanced demand

Therein lies the dilemma that small-deck carriers face: how to juggle their already limited complement of planes between offensive and defensive purposes� The raison d’être of

Pada skenario ini jika robot yang bertugas untuk mencari dan berjalan menuju node posisi tujuan menemui halangan berupa halangan statis ataupun halangan dinamis maka

In this paper, a VBA method is proposed for doing Bayesian computations for inverse problems where a hierarchical prior modeling is used for the unknowns.. In particular, two

To answer whether the irradiated cancer patient who is sched- uled for rehabilitation with osseointegrated implants (OI) would need hyperbaric oxygen therapy (HBO) before surgery,

The screening incidence can be used as a measure of the extent of screening in a defined target group, and the ratio defined above can be used as a measure of the appropriateness