Goodness-of-fit test for discrete and censored data, based on the empirical distribution function

(1)

Pettitt, Anthony (1973) Goodness-of-fit test for discrete and censored data, based on the empirical distribution function. PhD thesis, University of Nottingham.

Access from the University of Nottingham repository:

http://eprints.nottingham.ac.uk/11257/1/468818.pdf

Copyright and reuse:

The Nottingham ePrints service makes this work by researchers of the University of Nottingham available open access under the following conditions.

· Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners.

· To the extent reasonable and practicable the material made available in Nottingham ePrints has been checked for eligibility before being made available.

· Copies of full items can be used for personal research or study, educational, or not-for-profit purposes without prior permission or charge provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.

· Quotations or similar reproductions must be sufficiently acknowledged. Please see our full end user licence at:

http://eprints.nottingham.ac.uk/end_user_agreement.pdf

A note on versions:

The version presented here may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the repository url above for details on accessing the published version and note that access may require a subscription.

(2)

Goodness-of-fit tests for discrete and censored data, based on the empirical

distribution function.

by

ANTHONY PETTITT, B. Sc., M. Sc.

3

Thesis _submitted _to _'the University* _of Nottingham for the degree _of Doctor _of _Philosophy, _October _1973-

(3)

An CrPT) A rl'P

In this thesis two _general _problems _concerning _goodness-

01 f-fit statistics based on the empirical distribution are

cI onsidered. The first _concerns the _problem _of _adapting

Kolmogorov-Smirnov _type _statistics _to _test _for _discrete

Populations. The _significance _points _of the _statistics

are _given _{and various} _power _comparisons _made.

The. second problem concerns testing for goodness-of-fit

with _censored data _using the Crame'r-von Mises type _statistics.

The _small _and _large _sample _{distrýbutions} _are _given _and the

tests _are _modified _{so that} _they _can _{be used} _to _test _for

. the normal and the exponential. distributions. The asymptotic

theory _is _developed. _Percentage _points _for _the _statistics

are given and various small sample and large sample power

(4)

SUMMARY

ACKNOWLEDGMENTS

§1. _INTRODUCTION

Part I Kolmogorov _{- Smirnov} Type _statistics for discrete data. _.

§2. _{SMALL SAMPLE DISTRIBUTION} _{OF THE STATISTICS} 2.1 Introduction

2.2 A result _of Stock

2.3 The distribution _of DD+ _and V for discrete nnn

populations using Steck's result.

2.4 Tbe distribution _of V for _certain _values _of _{n and} q. 2.5 The tables _of D, D+ an dV

nnn

2.6 Power

Study

2.7 Results _of _power _study

ASYMPTOTIC DISTRIBUTION

_{OF THE STATISTICS}

3.1 Introduction

3.2 Asymptotic

distribution

_of

the

_statistics

for

fixed

_q.

A RESULT CONCERNING _{THE ASYMPTOTIC} _{POWER OF THE STATISTICS} FOR THE SIGNIFICANCE _LEVEL _TENDING TO 0

11.1 Introduction

4.2 _{A different}

_formulation

_of

_the

_statistics

4-3 Some extensions

_of

theorems

_{of Hoeffding.}

Qi-'4 Some

_examples

_of

_{the previous}

_results

A COMPARISON _{OF THE ASYMPTOTIC} _POWERS 5-1 Introduction

5.2 Previous _results 5-3 Results

(5)

I

Part II Goodness-of-fit for censored data using Crame'r-von

Mises tne _stafistics

THE I$ STATISTIC FOR TESTING GOODNESS-OF-FIT WITH

CENýORED DATA WITH A SIMPLE HYPOTHESIS

6.1 _Introduction

6.2 _Small

_sample

_propert-Los

_of

PW

nrnq,

alla

p n

le-

6-3

_Exact

_lower-tail

_distribution

_of

_the

_statistic

VP*

pn

6.4 _Theory

_{of the}

_asymptotic

_distribution

_{of the}

_statistics

Ive

_{IN2 and}

v

rnpnq,

p n

6-5 Moments

_{and percentage}

_points

_of

_{We and}

W2

p

q, p

6.6 _Asymptotic

_power

_of

pV

aný its

components

against

shifts

in

location

and scale

THE VA STATISTIC WITH CENSORED DATA FOR A COMPOSITE

HYPOTHESIS

7-1

Introduction

7.2 The _asymptotic form _of the _empirical _process

7-3 The _weighted _)ý _{representation} _of

pV

7-4 Testing

for

_normality

_with

p

U2

7-5 Testing

for

the

_exponential

_distribution

_in

life

testing

_with

pV

THE ANDERSON-DARLING STATISTIC FOR CENSORED DATA

8.1 _Introduction

8.2 _Exact _moments _of _A2

, A2 and A2

pnrnp

8-3 _Asymptotic _distribution _of _A2, _A2

rnpn

8.4 Significance

_points

_of

A2

p

8*5 Asymptotic

_{power of}

_{A2 against}

_shifts

_{in location}

_and

p

scale

8.6 Testing

_for

_normality

_using

P

A2

Part

III

(6)

I

9.1 Introduction

9.2 The statistics

DID+

_{and V}

introduced

in

Part

I

n, q

9-3 The statistics

I$

_and

A2 introduced

_in

Part

II

pp

ý10.

_{RELATED AND FURTHER WORK}

10.1 Introduction

10.2 _The _statistic _U2

10-3 _{Goodness-of-fit} _statistics _fo'r _the _2-sample _problem

and the bivariate sample with censored data

10.4 _{An alternative} _method _of _testing _for _normality _with

censored data

10-5 Conclusion

REFEREIýCES

(7)

ACKNOWLEDGMENTS

I would like to thank Professor M. A. Stephens for his

encou. &agement,

guidance

_{and many helpful}

_suggestions

through-

out the preparation _of this _work, _{and without} _whose

enthusiasin this _work _would _never have been _started.

I am also grateful to Professor C. W. J. Granger for his

assistance in the _submission _of this thesis; _also to the

invitation

_given

conference

'Uses

SUNY at Buffalo,

National

_Science

Many thanks

thesis.

by Professor

E. Parzen

_{to attend}

the

of the

empirical

distribution

function'

at

New York

in

September,

_{1971 and the}

I

Foundation

for

their

_financial

_support.

(8)

1.1

§1. Introduction

In _much _statistical inference _and _model building, there

is _{a need} to test the _validity _of _assumptions _made _about -the

underlying _populations, from which observations have been

drawn. In its _simplest form, this _problem _can be stated

as follows: _{a random} _sample of n observations is taken from

a population with cumulative distribution function F(x), and

it is _wished to test the _null hypothesis.

11 : F(x)

EF0 (x)

where

F0 (x)

is

a completely

speqified

distribution

function.

The classical

solution

to this

problem

is

the

test

of

. Karl Pearson, described in detail in Kendall and Stuart

(1961,.

_p.

_419).

_{The criticisms}

_of

_this

_test

_are

_that

_its

small 'sample distribution is not tabulated and so is applicable

only to large samples, and that if F(x) is assumed. not to be

discrete then there is. _{a necessary} loss _of _power in its _use,

, due to the fact that the observations have to be grouped.

These two _criticisms _have _been _overcome _by _the _{introduction'}

of the Dn statistic by Kolmogorov (1933). The statistic

is based _{on the} _empirical distribution function Fn (X), _where

FnW is defined for _{a random} _sample _{x 1'x2'**"xn} _of _size _n

by

FW= _Proportion _of _observations (xll.... _rx <x

nn-

The _statistic _{Dn is} _given _by

Dn=

_SUPIF

n

(x)-

F0 (x)1.

This _statistic _is _{shoi%m to} _{be distribution-free} _for

continuous F (X) and Kolniogorov (1933) _obtained its _asy totic

0 MP

(9)

1.2 l im P KnD n< x) =1+27G1 )'r exp (- 2r2 x2, )-1-2 exp (- 212 2,2 ) n-fC1D r=l

He also gave rbcurrence _relations for the _probability for

finitý _n, _which _were _used _{by Birnbaum} (1952) _to _tabulate

P(D

n< k/n) for n= 1(l) 100 and k l(l) 15- Previously

Massey (1950) _had _given _these _{probabilities} _but _for _selected

values _of _{n and k.}

Since _D _is _{a two-sided} _statistic, _one-sided _versions _have

n

been _considered _{by Smirnov} (1939) _and _Wald _and _Wolfowitz (1939),

D+ '_- sup(P (x)- F W)

n

11

0 x

D_ = sup(F (X)- F (x)) n0n x

Smirriov

(1939)

_gives

_the

_asymptotic

_distribution

_of

_these

statistics

(they

are

distributed

exactly

the

same):

_X2

lim P (Vn D+< _X) _=1- _{exp ( 'r}

So 4n(D+)2 _has _the _distribution _with _two _degree _of _freedom

n

asymptotically. Birnbaum _{and Tingey} (1951) _give _the _asymptotic

points. D is known _{as the} Kolmogorov _statistic _and _{D+ and D-}

nnn

are kno-wn _{as the} Kolmogorov-Smirnov _statistics.

Other _statistics _based _{on the} _empirical _distribution

function _to _{be proposed} _have _been _of _the _form

GD

Vý =S (F (x) _-F (x))2 * (F (x)) dF (x)

-GD

known

_{as Crame'r-v on Mises}

_{(CVIM) type}

_statistics,

which

are

distribution-free

_for

continuous

F0 (x).

The statistic

_with

*(X)

_{was investigated}

_{by Smirnov}

_(1936):

(30

IV2 =nI (F (x)

-F (X))2 dF (x)

- QD

-who found the _asymptotic _{characteristic} function _{of W2}

(10)

1-3

lim EFexp(itlv'? -

F

(2it

n-4

aD

LL

sin( (2it

This _was inverted by Anderson _and Darling (1952) to _give the

limiting distribution _of _{W2 and selected} _significance _points

n

were given. Marshall (1958) found the small sample distribution

of V

for

_{n uPto}

3 and showed the asymptotic

points

of

n

Anderson _and _Darling (1952)-to _{be adequate} for _{n as small}

as 3. Approximations to the small sample distribution have

been _made _{by Pearson} _and Stephens (1962) _{and Tiku} (1965).

Stephens (1970) _shows _that the _modified _statistic

UA

_{= (VA -0.4/n+O.}

6/n: 2 Y(l. 0+1

_.

'O/n)

nn

can be used

w ith

asymptotic

significance

points

with

very

little

error in the actual critical level for small n.

Anderson _and Darling (1952) introduced the CVM type

1

statistic with *(X) _{= (x(l-x))-}

GD ₁

, &2 = (F W-F (x»2 EF (x)(1-F WM- dF (x) _GD n000

They

found

its

_{characteristic}

function

_{and gave significance}

points, Anderson and Darling (1954). These were shoim to be

fairly

_accurate

_{even for}

_n=1

by Marshall

(1958).

The. K-S type _and CVM type _statistics _were _modified for

tests _of _{goodness-of-fit} _{on the} _circle by Kuiper (1960) _and

Watson (1961). _The _tests _have _to _{be invariant} _of the _origin

chosen _{on the} _circle. Kuiper introduced the _statistic

VD++ D-

nnn

and Watson the _statistic

(ID _(ID

C=f _[F _(x)-F _W-S _(F _(y)-F _(y» _{dF (y)32} _{dF (x)}

n0n00

(ID _{- (ID}

Simple _computing formulae _and _simplified _significance _points

for

the

_statistics

V

_{and Lý are}

_given

in

Stephens

(1970)-

(11)

1 _1.4

The _main _criticisms _of the _statistics _{as they} _stand, is

that they _are dependent-upon the distribution _of the _null

I

hypothesis being _{a continuous} _population _and the _null hypothesis

being _simple. _Darling (1955) _tackled the latter _problem _where

he

gave the limiting distributions _of the _statistics when the

null hypothesis is _of the form

H0: F (x)

_=-

where F (x, Q) is _{a c-4-f-} of specified functional form with 9

01

an unknown _parameter.. Kacj Kiefer and Wolfowitz (1955) invest-

igated _the _problem _of testing for _normality, _when _{11 and 02}

are to be estimated, using the statistic I$ The results

n

of both

these

_papers

have

been extended

by Stephens'(1973)

and significance points given for large sample tests when

testing for _{exponentiality} _and _normality _using the _statistics

W2, L12 and A2. The _small _sample distributions _of the K-S

nnn

and CVM type statistics have been found by Monte Carlo _methods

by various _people _and the _results _summarized _in _Stephens (1970)-

The problem of applying the K-S _and CWI type _goodness-of-

fit _statistics to discrete _and _grouped _data _is _tackled by

Illyasenko (1952) _in _the _asymptotic _case _for _D _leaving _the

n

c-d. f- _{as an} integral _-involving _{the distribution} _{of the order}

statistics from _ak _multivariate _normal distribution. Noether

(1963) _in _{a short} _note _states _the _significance _points _of _the

Dn statistic

_will

_{be conservative}

_{when used with}

discrete

data.

_{The asymptotic}

_{distributions}

_{of the}

_{K-S type}

_statistics

remain intractable, but _are found by Monte Carlo _methods.

Also the _small _sample _{distributions} _of _the _statistics _are

found in this thesis _and _extensively _tabulated. _The _power

of the statistics is investigated both for _small _samples

(12)

1-5

and the latter in two situations: the first where the

significance level is kept constant and the second the

significance level is allowed to go to 0 as the sample size

increases.

In. the _second _part _of _the _thesis _{we look} _at the _problem

of adapting the CVM type _statistics to use for goodness-of-fit

tests _with _censored _data. _The _statistics _considered _are _of

the

_type

S2

pn

Sn pr -y (ID

(F

n

(X)-F

0 (x»2

_{ý (F}

0 (x) ) dF

0 (x)

(x)-P (x»2 _{ý (F} (x» _{dF (x)}

We take

*(x)

_-=

_{1 and *(x)}

_{= (x(l-x))-',}

the

former

_giving

the I$ _statistics _and W2, _and the _latter the _{A2 statistics,}

prn

A2 and A2. The _small _sample _moments _of the _{1$ statistics}

pnrn

are found and approximate percentage given by fitting _a

generalized Xý distribution to the distribution _of We by

equating

the

first

three

moments.

The _asymptotic theory _of the _statistics _is _given _and _a

good _{approximation} to the distribution _of UP and A2 found.

The _asymptotic theory is _also _viven for W2 when used _with

doublely _censored _{observations.} _Asymptotic _percentage _points

are _given _when the _censoring is _symmetric. The _asymptotic

powers _of the tests based _on

p IY2 an dP A2 and statistics derived

from them _are _studied _for _scale _shift _of _the _exponential

distribution,

_{and location}

_{and variance}

_shift

_of

_the

_normal

distribution.

_{The statistics}

_{are modified}

_{so that}

_they

_can

be used to test

_{a composite}

hypothesis

_{and the}

_asymptotic

(13)

1.6 -

of testing for normality and the exponential distribution

with censored data _are _given for the I$ _{and A2 statistics.}

Results _of MoiAe _Carlo _power _studies _are _given _to _compare

the _power _of _the _small _sample _tests.

Finally, _the _statistic _{U2 is} _modified _to _test _for

censored data _and the theory developed.

(14)

1 _2.1

Part I- Kolmogorov Smirnov Tn3e _statistics for discrete data §2. Small-sample _distribution _of _the _statistics

2!. l Introduction

In this _section, _the _exact _{distributions} _of _D+, _D _and nn

Vn _are _found, _and _the _{distributions} _are _tabulated _for _the discrete _uniform _population (which _can _arise _from _grouping

-n observations into-q mutually exclusive groups which are

equally likely) _{on q points} for _n,

-< 30 and q.: ý 12. The

tables _are _given _in _Appendix _1. _{The results} _of _{a power}

, study are also given comparing these statistics with the

statistic _and Davids Empty Cell test, see David (1950),

. showing these latter statistics to be less powerful for

certain alternatives.

2.2. A result _of Steck

The method used is to apply a result of Steck (1971). He

shows that if x(l)'x(2)' _..., _x(n) _are _order _statistics from _a

sample _of n independent uniform. _random _variables then

,

Pr(uj.: ýx(j):

ývjj

i=1,

---,

n) = det R, where

R. j

(v

_-u

3. i

j-i+l ii

=

1 +1

and (X)+

_{= max(O, x).}

R is

_{a matrix}

_{of Hessenberg}

form.

This

_result

_{can be stated}

in

terms

_{of the}

_{e. d. f.}

_{F (X)}

Pr (g (x)

_<F

(X) <h (x) )=det

(R)

where _{g and h are} _non _negative functions _on (0,1)i _with _g

continuous to the left _{and h continuous} to the _right, _and

ui, v _a. are defined by u. ₌

3. 11-

1 (i/ta) _v _9- 1 ((i-l)/n)

(15)

2.2

2-3. The distribution _of Dn, Dn and Vn for discrete populations,

using Steck's result

Consider _{a random} _variable _{y to} be distributed discretely

with q(< aD) mass _points bli < ... <bq Define z= G(y), where

G is the. _{c. d; f.} _of _y, _then _{z is} _{a discrete} _random _variable with

mass points 0< G(b _1)< G(b _2)<... G(b

q-1

)< _G(b

q)=1.

Let

ai= G(b )i=1,..., _q, then Pr(z <a)=ai=l, _---, q-

Consider _now _the _e-d-f- _{Gn (y)} _of _{a random} _sample _yl, _.... _Yn

from the _discrete _population _with _{c. d. f.} G(y), Gn (Y) ₌

Number _of _{y i-: ý y} Gn(y) _will have discontinuities _only _at

nt

the _points bl,..., b

q, where it may have steps-of

height _equal

-to multiples of I/n. Define

number of z<z number of y,:: ýG -1 (Z)

H

') =nn

n

(7

where Q (z) _{= inffy:} G(y)= z) and z.

L Ck

(Y i

Then Hn (z) _will _{be a step} function having _steps _multiples

of 1/n at the points a,.,... _{ta q*} Now Gn (y)- G(y) _=Hn (y)- H(y)

so

SUPIG

n

(y)- _G(y)l _suplH

n

W- _H(z)j,

yz

and it is wished to find the distribution of

suplH (z)- H(z)l _{= maxlH} (a. )- If(a Now at the points

nn3.

a,, ... aHn (z) behaves like the e. d. f. of a sample of n

independent _uniform _random _variablesi _{i. e. nH} (z) is

n

binomially _distributed _Bi(ai, _n) _at _each _a,, i= _{19 ... 9q.}

So Pr(maxlH n (a 3- H(a i )I _{< d)} i Pr(maxlF n

(ai)- _ail _{< d) where} _F

n(x) is the e. d. f. of n

i

(16)

2-3

result can now be used to f ind Pr (maxI F' (a. _ni a, I< d) and

+ 3.

similarly for Dn Pr(tilax(F

n (a i )- aM)< d). Define h(x) _min(l, _{a i +d),} _a, _, :ýx<a where a0 _and 0 9%x) _{max%'O, a} d), a. <x<a. 3.0, q-1 3. - 3. +

then _Pr(maxIF (a. )- _ail _{< d)} _{= Pr(g(x)} <FW< h(x)) i, nin

= det R

'where R _ij _=(i Xv _{i-u i} )+ j-i+1>0

j-i+l

=0 j-i+1<0

where v. and u. are found by taking inverses of g and h. 3.

. Put n(a i +d) and Yj

n(a. _-d). ₌ a and [a. ]

3. i 3.3.

Also if _y,. > n then _Yi _n, _and if 5. <0 then 6 0.

3. Define _ui a,, ₂

a 2'

. -a ql'

yq+ll

...,

n

a 2'

612 aq

q-1

+1,...

96 k

q

2.4. Distribution _of Vn for _certain _values _of _{n and. q}

(17)

2.11

q, -then the only pnn ossible values of D+ are D are multiples

of 1/n. We can then find, for example-, Pr(nD + :ýk, 1 nD _n _n-- <k ₂

by the _above _method by defining _n(a

I +k and

ai n(a, _{-k 2} Now the statistic Vn is defined to be

VD+ _+D-.

nnn

So the _{distribution.} _of Vn is just the _convolution _of

D+ and D-.

Thus

nn

Pr(nV

< k)

_{= Pr(nD}

++

_{nD- < k)}

nn

n-

Now

+k+

Pr(nD _{+ nD-} _{= k)} ₌ 7, Pr(nD _{= i,} _nD _{= k-} i) n. n _i=0 .nn

So the _{p*. d. f.} _of V is _obtained for this _special _case, _since

n Pr(nD += kj, nD- =k Pr(nD+< kl, _nD-< k nn2n- n- 2 Pr(nD+< _{k -1,} _nD-< k n- 1 n- 2 Pr (nD +<k _{nD - <k} _-1), n- _{ii -2}

2-5.

The tables

_{of D,}

_{D+ and V}

n

--

nn

For

the

_statistics

D,

_{D+, V}

_the

nnn

Pr (nS

_{:ý k)}

where S is D, D+ or V is _given for values q up to J. 2 and n

nnnn

up to 30 so that _{q divides} _n. Note that for n=q the

distribution _of _D _and _{D+ is} _given _by _the _continuous _distribution

nn

free _results. _If _{DC denotes} _the _statistic _for _continuous

n

populations _and D

n, q the one for discrete or grouped

populations Ivith _{q mass} _points, then

I

Pr(nD c< k) _{= Pr(nDc<} _k) _{= Pr(nD} _{< k)} _{= Pr(nD} _{< k-} ₁₎

n- _n _{n, n} _{n, n-}

This

follows

directly

_from

_Massey's

_argument

(1950)-

(18)

2-5

grouping

has a large

effect

on the

distribution

of

the-

statistics.

To note the _effect _of _grouping _{on D, note} that

Pr(12-D 12 12. _, ýý5) = . 02115, Pr(12. D _{1 ,}_{2 6.}ýý5) = . 01422

Pr (12. D

12,4

ý' 5)ý*

013L15

Pr(12

D

12112ýý4)

= -10874

Pr(12

D

_{12,6.? ý4) = .}

09064

and

Pr(12

D

_12,4

ýý4)

_{ý -05974}

Pr(20

_D

_{>o6) = . 04307}

_Pr(20

_D>

6)

_{= -03273}

20,20

20,10-

Pr(20

D

2015

> 6) = . 02203

Pr(20

P

20,202ý5)

= -13763,

Pr(20

D

20,10-ýý5)

= *"0910

Pr(20

_{D20,5.?: 5), = -07617}

Also from Stephens (1965), _{we have} _Pr(V _< L) _and _the

nn

percentage points of the statistic Vn We see that the

effect of grouping is much stronger.

For _n= _12, _the _{10% and 1% significance} _points _are _given:

Pr(12 D 12215 -24) = . 10 and Pr(12 V ₁₂ > 6.48)

we have

Pr(12

V

_12,12

> 6) = -01131 Pr(12 V

12,12>

7) = -0010

Pr(12

Vl

2,6

ýý6)

= -001171 Pr(12

V12,6ý'7)

= -0003

Pr W2 V12,12oý5)

ý -0738

Pr (12 V12,62ý5)

_{2-* -0403-}

The values _of _{det R were} _calculated using _{a single}

Precision _program _written in FORTRAN, _using _{a method} _derived

from Wilkinson (1965). _The _tables _of _Dn.

,Dn and V _n. can be

taken to _{be accurate} _to _{5 significant} _figures _for _n< _25,

but become inaccurate _for _larger _values _of _{n especially} _in

(19)

2.6

2.6. Power Study

A power study was under-taken to compare the power of the

I

statistics D, D+ and V, when used correctly with discrete

Innn

data, _with the _power _of the _chi-squared _statistic and Davids

Empty Cell test. The _null hypothesis tested, in terms _of

the _multinomial _{distribution,} _was that the _{q sample} _points

were equally likely (or in terms of grouped data, the q cells

were _equally likely).

1 _(2-1)

Ho : Pl = P2 pq

q

The _alternatives _considered _were.

i _(i) 1. E-i=l, _{-. -, q,} _abbreviated as A (6) j=l q1

i6

j=l

j

-ý2L

pj

>

j=l

2ý

q

2,

Ae (6

I-C I+6 q <R so _pi q 16

A3(6

It

is

_{to be noticed}

_that

_all

_the

_statistics

_are

_discrete.

The distributions

_of

the

first

three

_statistics

_are

tabulated

here

_{and Davids}

Empty

Cell

tabulated

in

David

(1950)-

For

(20)

2-7

where

the

exact

moments

of

the

statistic

Xý are given

for

the

equi-probable cell case. For given n and q, a Pearson curve

was fitted

to the

first

four

moments

of the

statistic

and

percentage points obtained using interpolation of the tables

of Johnson,

Nixon,

Amos and Pearson

(1965).

The actual

chi-

squared _significant _point used was the greatest even integer

<a (P + 2n - -qaq

where

Pa is

the

Pearson

curve

_{approximation}

significant

point,

since

X9

_{-cL 7, xý}

2n +

npi

nIq

and n and q were

chosen

so that

7, x2. takes

only

even values.

3-

The _actual _significance _of _the _test _is _not _known, _but _is _at

least _{adlo in} terms _of the _{approximation,} _which is _reasonably

accurate,

Kompthorne

(1967)-

The _other _statistics _are _discrete, _{so no exact} _value

could be found. This _problem _was _overcome by using the

randomization tecluiique. If S denotes the _statistic _in

quest-ion and it takes _{on integer} _values 0,1,2,..., then

suppose su is the _smallest integer _such that

Pr (S >s)<

CL

'r

and s

_L

the

largest

integer

_such

that

Pr(S

>sL)>

_(x

and

Pr(S

>'S

uaU,

Pr(S

>sLa

L'

Then an cL% test

is

defined

for

the

_statistic

_{S, if}

for

_a

given value _{s of} S we reject if _s>sU _and _accept if _s<s _L*

Also, _reject _or _accept _as _{U is <a} _or _{> (x, where} _{U is a}

random variable distributed _uniformly _on (a

uaL This can

be applied to _{nD , nD}+ _and _nV _since _they _take _only _integer

nnn

(21)

2.8

2.7- Results _of _powen _study

The _results _of the _power _study _are _given in Tables _2.1

and 2.2. For the alternatives which have been considered here,

I

which are _systematic _alternatives to the _null, the Kolmogorov

type _statistics _{do well} _compared _to Davids Empty Cell test (E)

and the Chi-squared test (CH). For the _alternatives A, _and A3,

which _could be considered 'trend' _{alternatives,} (since the

pi. satisfy _either the _ordering _p, _{ýS P2-: ý ...} _{ýS pq or}

P1 > P2 >-> pq ) the statistics D and D! (Di refers to

either D+, for _alternatives _of the _second _ordering, or D for

alterna. tives _of the first _ordering) _are the _most _powerful.

Thus _for _trend _alternatives _it _seems _advantageous _to _use _the

.

statistic D, D+ and D- rather than CH or E.

For 'peaked' _alternatives _which _{we define} _to _be

1: 'I < P2 pr ?ý pr+l 2ý pq

' for some 1<r<q

or conversely Itroughed' _alternatives _p, ? ýp2 >

pr+l <<Pr for some 1 ' 'ý* < q' of _which A2 is _an _example,

V should be _powerful _and _this _is _shown _in _the _results _for _Ag. _V

was originally used for _{goodness-of-fit} _{on the} _circle, _see

Kuiper

(1960)t

_{and so again}

_{V can be used for}

_discrete

_or

grouped data _{on the} circle.

For the _alt,

PrP

r+l

powerful than CH

power _of D in §3

ernative A3, _which is _of the kind _p, _{ý P2 ý*}

p the statistics D, D+, V are much _more

and E. A result concerning the asymptotic

suggests that the power of Dn is _optimal for

this type _of _{Idichotanised'} _alternative.

I Thus for small sample sizes, the D, D+, V statistics

are

of importance

when the

_alternatives

_{of interest}

_are

systematic ones of the kind. _considered. D and D+ are _especially

(22)

2.9

both E and CH are invariant to the ordering of the pils, that

is the _powers _of the tests _will be the _{same for} the

alternative

HI: _p, _=PIIP2 _=PIS _...

9P _qq PI

aI=II...,

_p _=DI

as for

Hi - p, =p

U4 9 P2 p Cýe q (x

where (al, _... _laq ) is _{a permutation} of For

certain _{alternatives,} the

q) permutation was chosen

so that

i

sup IE

_{)I -I}

i

j=i

'i

q

was am. inimum, thus (a', _---, aq ) was selected so that the power

pf D, D+ was at a minimum, taken over

q For these

alternatives (see Table 2-39 called Af' (i = 1,2,3)), the

powers

of

the

statistics

D, D+IV

are

reduced,

but

usually

not by more than half, and often less. Of course the powers

of the statistics E and CH remain the same, subject to

sampling error. However for these _alternatives CH is _most

powerful

For the _alternative Ax- D, _N, _{V lose} _most _power. _However

this is _not _surprising _since _of _all the _alternatives _considered

the _statistics D, D _are _{comparatively} the best for the

(23)

Alternative _D _{D+ D-} _V _E _CH A CIO ₆₄ _{89 +} ₆₁ ₅₈ ₇₁ A1(. 8) ₉₄ ₁₄₅ ₇₁ ₆₂ ₉₃ Al(. 5) ₄₃₃ ₅₇₀ ₃₄₉ ₁₉₉ ₃₉₉ A (. 25) ₉₁₀ ₉₆₀ ₈₅₈ ₅₅₅ ₉₂₃ A (1.25) ₆₅ ₁₁₈ ₇₃ ₆₅ ₈₂ A (1.50) ₁₅₀ ₂₅₉ ₉₅ ₈₈ ₁₀₆ A, (2.00) ₃₇₇ ₅₆₂ ₂₈₁ ₁₉₄ ₂₆₀ Al(3.00) ₈₀₆ ₉₂₉ ₇₀₇ ₄₈₄ ₆₅₅ A (. 8) 2 74ý 64 68 58 82 A2(*5) ₁₇₈ ₁₅₅ ₂₉₆ ₁₆₅ 263 A2(. 25) 449 ₃₃₆ 757 451 706 A (. 1) 2 698 521 978 822 968 A2 (1.5) 48 44 131 90 119 A2 (2.0) 54 79 296 09 253 A2 (3.0) 160 180 700 466 600 A (3.5) 2 211 255 815 579 744 A3(. 033) 77 143 76 64 77 A3(. 066) 221 352 172 119 170 A3(") ₄₉₈ ₆₄₆ ₄₃₀ ₂₆₀ ₃₃₉ (, 5)(1.6ý2.5.3.4) A 294 432 ₂₄₉ 175 379 j A1 (2.0)(6., 1,5,2,493)

₁₀₃

167 +

₁₂₉

₂₁₇

₂₇₈

A2(. 5)(3jl,

2,4j6,5)

85

120

135

255 A* (3.0)(1,3,2,6,5,4)

2

147 177-

236

454

629 A*

_{1)(1.4,295,3,6)}

3

106

147

124

255

346 Table 2.1

Power of Statistics

x 1000, ie number of rejections

in 1000

replications.

denotes D+ (D7) statistic,

Level of test 5%

(24)

Alternative

_D

_D+

9D

V

E

CH

n

nn

_n

A1 (. 9)

₈₂

_{125 +}

₇₈

₅₉

₇₀

A (. 8)

₁₄₆

₂₁₄

₁₀₅

₇₄

₉₇

684

759 ₅₀₉

₃₁₄

₅₅₁

A (. 25)

₉₉₄

₉₉₈

₉₈₁

₈₉₃

₉₉₄

A1 (1.2 5)

₉₈

₁₆₀

₇₆

₅₈

₇₅

A1 (1.50)

₂₆₆

₃₇₁

₁₆₁

₉₄

₁₁₇

A1 (2.00)

₆₄₉

_774.

₄₇₅

₂₃₄

₂₆₆

A1 (3.00)

₉₈₃

₉₉₂

_ý922

₆₃₃

₆₂₇

A2 (*8)

₉₄

₉₀

₁₂₂

₆₇

₁₁₁

2 (. 5)

270

237

537

299

456 A2 (. 25)

₇₉₃

₅₉₁

₉₈₄

₈₈₂

₉₇₄

A2 (* 1

₉₉₇

₈₆₆

₁₀₀₀

A2 (1.5)

_5ý

₈₀

₂₂₅

_ý106

₁₁₆

A2 (2.0»

112 ₁₅₁

₅₈₂

₂₄₉

₂₇₆

A2 (3.0)

379 ₄₁₇

₉₆₉

₆₃₇

₆₅₂

A

₂

(3.5)

545

514 ₉₉₁

₇₈₆

₇₉₂

A3 Gol)

₁₀₅

₁₅₇

₈₈

₇₃

₈₃

A3 (. o2)

333 ₄₄₄

₂₄₇

₉₇

₁₄₃

A3 (. 03)

₇₀₉

₇₉₆

₆₀₆

₂₃₉

₂₇₅

Table. 2.2

Power of statistics,

_{x 1000, ie. number, of rej6cti6tid}

in 1000

replications.

denotes D+(D-) statistic.

_{Level of test 5%}

(25)

3-1

§3. AsYmptotic distribution of the statistics

3-1 Introduction

In this _section, the _asymptotic distributions _of the

s tatistics are given, but it is found they cannot be given

explicitly. A Monte Carlo investigation is made to find them.

3.2 Asymptotic _distribution _of the Statistics for fixed

If

_{n --> aD then}

the

_random

_variables

fn(H

n

(a

i

)_1-1(a

I

(i _li..., _q-1) _by _the _{Multi-variate} _central- limit theorem,

are distributed as Z= (Zj,... _{'Zn_d} where the Z is MU1ti7

variate

normal

with

NO

₌₀

and cov(Z)

=V

where

(V)ij

-

a (1-a i<j. Then

lim Pr (V»n D+ Pr(tnax(Z 0) rr) m n, q-i

.

Pr(in

D

pr(max(Z,

_i-Z

i)<

X)

MM

lim

Pr(in

V

n, q

<0=

Pr'(max(Z,

10)+ max(Z, 10)

Tr+ Co

ii

Hence

the

_asymptotic

distributions

_{of the}

_statistics

involve the _order _statistics _of the _multivariate _normal. With

such a covariance matrix the problem appears unsolvable, so

the _asymptotic distributions have been _simulated. For _each

value _of _q, the _"i _-qi=1,..., _q- 1 and 10,000 independent

values _of the _vector _random _variable Z were _generated _using

the

_{N. A. G. pseudo-random}

_number

_generator,

_{and the}

_statistics

on the right

hand

_side

_of

(3-1)

_calculated.

The results

_are

given in Tables 3-1-3-3 for D+, D, V _{respectively.}

nnn

It is _seen _from _Table (3-1) _that _the _{distributions} _of

the _statistics _{do not} _approach _the _{distributions} _of'the

/

(26)

3-2

for

_{q --> co.}

Nor in fact

do the

distributions

_of

the

statistics for fixed q approach the asymptotic distributions

for fixed _{q particularly} fast. For _example, from Table 3-1 lim Pr(fn D*4' < _.821) _-ý-90-

nj5

However

_from

_Appendix

_{1 we see that}

+4

Pr(D

30,5 :

ýý-O-) = . 906

Pr(V30

D+

< -730) = . 906

30,5 -

so the

10% upper

_significance

_points

are..

821 and -730 for the

asymptotic

_statistics

ýn D+

and P+

respectively

with

n, q

30, q

q

5. The approach

of the

asymptotic

distributions

of the-

other

statistics

is

no better

for

_q5

_{and other}

values

of

(27)

Vn D'I* ,q n q 5 6 7 8 9 10 11 00 15% . 718 . 734 7.5,4 . 769 _.777 _.784 . 1,98 . 973 10% . 821 _.837 _.855 . 873 . 882 . 884 . 898 1.073 5% . 975 _.989 1.008 1.022 1.040 1.047 1.051 1.224 21% ₂ 1.120 _1.129 1.148 1.156 1.176 1.178 1.179 1.358 1% 1.281 _1.290 _1.297 1.319 1.327 1.342 1.335 1.518

D

10 15%

.

885 .

906 .

917 .

938 .

941 .

947 .

957

1.138 10%

.

971 .

990

1.004

1.020

1.030

1.042

1.044

1.224 5%

1.112

1.131

1.140

1.157

1.166

1.179

1.171

1.358 2j%

1.244

1.254

1.264

1.289

1.287

1.311

1.296

1.480. 1%

1.381

1.403

1.430

1.418

1.428

1.455

1.444

1.628

Vn Vn q 68 ₁₀

15%

1.023

1.066

1.095

1.123

1.151

1.167

1.182

1.537 10%

1.111

1.157

1.180

1.214 _1.239

_1.251

1.261

1.620

52

1.250

1.292

1.310

1.341

1.360

1.381

1.394 _1.747

217.

1.371

1.399

1.434

1.456

1.471

1.504

1.511

1.862 1%

1.510

1.560

1.577

1.598

1.644

1.643

1.650

2.001 Table 3.1

Asymptotic _significance _points _{of the statistics}

Vii J)"* q

Vn Dn, q

_,

Vn Vn, q . found by simulation

n

(28)

4.. j

A result

concerning

the

as ymptotic

_power

_{of the}

statistics

for the _significance level tendinj ta 0.

4-1 Introductiýn

ýhis

section

is

based

_{on an extension}

_of

the work of

Hoeffding

(1965).

_{A test}

_of

_the

_simple

_multinomial

_hypothesis

H

0,

:p1=p

_ol'-

P2 = Po2' '* *'Pq

= Poq

based _{on n observations,} _when _{q is} _fixed, _is _considered.

. Hoeffding considers letting the significance level an tend

to _{0 as n ---> co, and considers} _the _power _of _tests _for _testing

H0'. when the alternative is _not very _near to the _null. He shows that _the _likelihood _ratio _test _{(LR test)} _(actually _log _of _the

likelihood _ratio)

q

LxI _log(x

i/Poi

where xI is the number of _observations falling in the i th category, is _more _powerful than the _{X2 test} _where

q

(x I -npo3. \2

except at a particular set of _{alternatives.} We find

alternatives

where

the power of the Kolmogorov-Smirnov

_type

statistics _are _equivalent to the LR test, _and _{so much} _more powerful than the If test.

4.2 _{A different} _formulation of the statistics Consider _the statistics

max

'E

(xl-pl)

ISIE

niE

_1-1

, where n is _{a collection} _of _non-empty _proper _subsets _of _the set (11 _.... _q). The _statistics D, _{D+ and V} _can _{be defined}

nnn

(29)

4.2

(i) _With _TI

i= ((l,..., j), I<j q) then

Z(x, _p) gives the statistic D+

n

max (x, _-p, ) max (x _i-pi

METR i+M j i=l

max n (G xi (a j )-a i i

where

G (a

nx

and a

lfi th MI _q), _1<j _{-ý q)} this _gives the statistic D--

n

(iii) _If _{MI. is} _{MI defined} _in M

and M). . is n defined in

3.11

(ii) _then _n...

= 93. U M. . will give Z corresponding to D

XXX 1 11 _n

(iv) _{Iii th M?}

iv (al ... Jig -it, ...? jig j, ...., q) for I<i<V<jI<j _{:ý q)}

will correspond to Vn It will be recalled that the

statistic Vn is used for goodness-of-fit on the circle. It can be shown that*V is the value of max D, when D is

I. _-nnn

calculated on the circle, the max being with respect to the choice _of _origin, Stephens (1965)- If the data is discrete

and whenthe _circle is transferred to (0,1) _with aO9al, _---, _a

q

(a

0=0, aq= 1) as the mass points, choosing one of the _mass _points _{on the} _circle _{as origin.} Then V will be given by

Z(X, P)

max

(x

_i-Pi

W. ffR...

_r(i)EM

1JL1

where

n

(30)

4.3

rotation of the points of M. Hence 0. _IV consists of nj _3-11_... and all the cyclic rotations of _3.3-1 By a cyclic rotation of a

set M we mean that

r(M) ₌ (ils _{* **Iic)} where iEj+a (modq) m=1,..., c. mm

a an integer _and

mjc

and ive define

_r(i

r(j

mm

(v)

_if

_{PD. contains}

_all

the

_subsets

_of

(1,...,

k)

then

WXIP) ₌ 22' EiXi-Pil

I

see floeffding

(1965)-

-

4-3

Some extensions

of theorems

of Hoeffding

First,

_{some definitions}

_from

_I-loeffding

_are

_quot'ed.

Define

q

xi log(x _i/pi) and f) to be the simplex

n= «xil---IX ); _x1 ýý 0'. 0. ' X>0, x1+... +X = 1), P () =( (xj_ 9 ... _{IX q}); X, > Oi---lx q> 01 X1+---+kq= 1) and n(p) ₌₍ (x ] -9 !..,: ýc q ); _xI=0 _if _pi =

If A is _{a non-empty} _sub-set _of 0, then 7k denotes the _closure of the _set A.

This _work _is _based _{on the} _idea _of _large _deviations _and Sanov's (1961) _work _and _is _summarized _in _Hoeffding's _theorem

2.1.

Let _{A be a proper} _sub-set _of _Define I(A, _p) _{= inf(I(X,} _p); _xE _A)

and I(A, p) _{= co if} A ý0.

Let

_nlj---jn

(31)

4.4

q

with n observations, where n n. Define znI /n then Pr(Z _(n) _{7- z (n)} )= Cn. '/(ni,..., _n ')Ipllll..., _pnq where

qq

(n) (n)

z

(n) z is a random variable distributed as a multinomial random variable with p= (pl,..., p ), and

iq

(n /n,

.... n /n). ' z (n) =q1q

Let A

(n) denote the set of lattice points z(n) containing in A. _{We quote} _{some of} Hoeffdingl-, Theorems, his _numbers

preceded

by an W.

I

Theorem

H2.1.

ife have

For

_{any set}

_{A C: 01 and any point}

_pE0

0 n-

(q-li/2

exp(-nI(A

_(n)"))

< Px-(z (n) E A/p)

n+q-1 :ý( q-1 ) exp (-nI (A _{(n) Ip}

where C0 is a constant not dependent _{on n.} Hence Pr(Z _(n) E, A/P) _{= exp(-nI(A}

(n)'p)) + 0(logn)) uniformly for AcP _and _pE0.

Also

Pr (Z

(n) E A/p) < expf-nI(A, p) + O(logn» uniformly for A c: 0 and p

This

_result

_gives

_{an asymptotic}

_expression

_for

_the

logarithm _of _the _probability Pr(Z _(n) E A/p). This _result is _non-trivial _if _Z

(n) is far from its mean p. Hence if A is taken _{as a critical} _region for _some test this _enables _the power to be compared with the likelihood ratio test.

We quote

Lemmas 4.2

_{and 4-7 of Hoeffding.}

Lemma 114.2

Let

_{A be a non-empty}

_subset

_{of 0.}

(a)

_{Let pE}

_no.

_There

_is

_{at least}

_{one point}

_{y such}

_that

yEK,

I(y, p) = I(A, p).

(4-1. )

(32)

4-5

y=p. If p (ý A, then I(Alp) >0 and any y which satisfies (4-1) _is _in _the _boundary _of A.

(b) _Lot _pr0, _then _I(A, _p) _{< aD if} _and _only if the intersection An _P(p) is _non-empty. If this is the _case

then I(A, _p) _{= I(An} _r(p), _p) _and the _statements _of _{par t} (a) are true with A replaced by An r(p).

Lemma H4-Z Let A= (x

i f(x) > 0) where the function

f(x) is _continuous _{in. P and max f(x)} > 0. Let _{p be a point}

in ()

0. such

that

f(p)

< 0.

Suppose

further

that

the

derivatives

_{f! (x)}

_{= bf(x)/bx.}

i

_{1,. -., q exist}

_{and are}

3.3.

continuous

_{at all}

_{x in 00 for}

_which

f(x)

_{= 0.}

Let

_{y be any Point}

_in

X such that

_{I(y, p)}

I(Ap).

Then

if

_yE

_f)

0 it

is

necessary

that

f(y)

0 and

log(y

_i

/pi)

_=af!

(Y)+

_bi

_{11 ... Iq}

3.

where

a>0

_{and b are}

constants.

Let _pE _and'O _{< E, < max Z(x, p}

0 Consider the X

critical _region A for testing the _simple _hypothesis

H *. p=p1

'where

A(G) ₌ (x; _q(x, _p

0)> E)

A(E)

is

_{the union}

_of

_the

_sets

Am (E)

_{= {x;}

_{Dm (X, p}

0)>

i. e.

A(E)

U

_AM(E)

MGTI

We nowstate _{and prove} _{a Theorem} which follows Hoeffding's Theorem

Theorem

4.1. _Suppose

_that

_p0e00

_{and 0<E<}

_{max Z(x, p}

0 X

then

_{y (: A(E)}

_such

that

I(A(E),

_p

0 I(y,

po)

and

y i=

apoi

bp

oi

(33)

4.6

where (y _i-poi _{-Sý(Y, p0}

E+h*

1_h*_E

a=

h*'

b=

1_h*

and I(Y, p

where

h* = F,

p.

iEM*

Ol

Proof By Lemma H4.2 _{we must} have Wy, _p 0 El _and let ) where M* E TI. (4.2) jD (y, P0 (y _i-Poi By lemma H4-71 _Put _{f(. X) =. Z(X, P} 0 then f(y) ₌₀ 'and so if I(y, p) _{= I(A(E),} p) 0iV,

_M*

los(y _i /P _{S6 .+t} _where 6

(4-3)

oi 3- 3.

s and t are constants and s> 0.

The _set _{M* may or may not} _{be unique,} _however there is _a unique _point y defined by (4-2) _and (4-3) given M*.