Randomization and Von Neumann function variance formula and a problem

(1)

Economic and Social Review Vol. 9 No. 1

Randomisation and the von N e u m a n n

Function: A Variance F o r m u l a and

a Problem

R. C. G E A R Y *

The Economic and Social Research Institute, Dublin

I

N a paper o f m a n y years ago (Geary 1952) w h a t was t e r m e d the contiguity

ratio was i n t r o d u c e d , t o determine whether, i n p r o b a b i l i t y , a statistical

map has a p a t t e r n or whether the mapped statistics are d i s t r i b u t e d at r a n d o m . This r a t i o is really a two-dimensional version o f the v o n N e u m a n n ( 1 9 4 1 ) statistic, m o r e familiar as that tabled for null-hypothesis n o r m a l O L S resi duals b y J . D u r b i n and G. S. Watson. Geary was also concerned w i t h the O L S residual p r o b l e m . He approached i t i n t w o ways, b y randomisation and b y classical O L S regression t h e o r y , his instruments being means and variances o f the c o n t i g u i t y r a t i o .

A d i f f i c u l t y w i t h r a n d o m i s a t i o n treatment was expressed as follows:— " T h e p r o b l e m is t o determine i f there is a c o n t i g u i t y effect, i.e. i f c (the con t i g u i t y r a t i o ) has a significantly l o w value after the e l i m i n a t i o n o f q indepen dent variables b y the least square m e t h o d . As far as randomisation is con cerned, i t w o u l d appear t h a t the test developed i n this section can be applied f o r m a l l y , the z being the remainders after the c o n t r i b u t i o n s o f the indepen dent variables have been removed. T o a certain e x t e n t the w r i t e r shares the misgivings o f some other students about the v a l i d i t y o f the r a n d o m i z a t i o n approach i n its a p p l i c a t i o n t o regression remainders. As each successive i n dependent variable is removed, should n o t thedegreeof freedom be diminished? I t does n o t seem so. What happens is t h a t the variance (or range) o f the remainders d i m i n i s h as the effect o f each independent variable is a l l o w e d for, t h e test b e c o m i n g indeterminate w h e n the n u m b e r o f independent variables (originally w i t h mean zero) is one less t h a n the n u m b e r o f observations n, i.e., w h e n all the remainders are zero. A c c o r d i n g l y the f o r m a l a p p l i c a t i o n o f the r a n d o m i z a t i o n procedure, w i t h o u t d i m i n u t i o n o f the n u m b e r o f degree o f freedom, does n o t result i n obvious inconsistency: we can conceive o f cases

(2)

where c w i l l be significantly l o w even after removal o f the effect o f (n — 2) independent variables. Since doubts r e m a i n , however, the w r i t e r considered i t desirable t o examine the p r o b l e m f r o m the classical sampling aspect. I n any case i t w i l l be interesting t o compare the results o f the t w o approaches. I n the p r a c t i c a l aspect the r a n d o m i s a t i o n m e t h o d has the advantage t h a t i t can be applied w i t h o u t the assumption o f universal n o r m a l i t y i n the n obser-vations,regarded as a r a n d o m sample".

As far as the w r i t e r is aware the degrees o f freedom p r o b l e m has never been discussed i n this a p p l i c a t i o n : the controversy i n another c o n t e x t between K . Pearson and R. A . Fisher is part o f statistical h i s t o r y . One o f the objects o f the present c o m m u n i c a t i o n is t o i n v i t e statisticians t o discuss the p r o b l e m .

T h e c o n t i g u i t y r a t i o c o n t e x t is t o o esoteric f o r a suitable discussion. T h e p r o b l e m arises i n the m u c h simpler single dimension o f the r a t i o . B u t the w r i t e r is unaware o f any r a n d o m i s a t i o n treatment o f the v o n N e u m a n n statistic, so he ventures t o give one here w i t h o u t any claim t o o r i g i n a l i t y . One result is remarkable, as w i l l be seen.

One is given a sample o f n measures o f any k i n d ( t h e y may be raw values, OLS residuals etc.), x1, x2 . . ., xn, ordered i n a particular w a y . F r o m a given

f u n c t i o n (e.g., the v o n N e u m a n n r a t i o ) one wants t o make inferences about the character o f the sample (is i t p r o b a b l y n o n - n o r m a l , autoregressed etc.?) One considers the n ! permutations o f the sample values for each o f w h i c h the test f u n c t i o n has a value. These n ! values are regarded as f o r m i n g a fre quency d i s t r i b u t i o n . I f the single value o f the f u n c t i o n f o u n d for the given ordered sample is near the ends o f the frequency d i s t r i b u t i o n (i.e., b e y o n d the .05, . 0 1 etc., l i m i t s ) one rejects the hypothesis, exactly as i n o r d i n a r y

t h e o r y . A feature o f the test is t h a t n o assumption is made about the fre quency d i s t r i b u t i o n f r o m w h i c h the sample o f n is d r a w n . I n t h e o r y one c o u l d calculate the moments o f the function—or at least the first four moments—and so estimate the frequency d i s t r i b u t i o n using e.g., the Pearson curve system. Here we deal o n l y w i t h the first t w o m o m e n t s , the mean and the variance, w h i c h suffice for most practical purposes.

T h e test f u n c t i o n cannot usefully be s y m m e t r i c a l i n ( x1 ( x2 xn) because t h e n all the n ! values w o u l d be the same. T h e essence o f the v o n N e u m a n n r a t i o d is t h a t i t is n o t s y m m e t r i c a l ( f o r n > 2 ) as i t assumes t h a t the sample elements are arrayed i n a particular w a y . I n fact, assuming, w i t h o u t loss o f generality, that:—

Xx{ = 0 (1)

(3)

as w i l l always be the case w i t h O L S residuals, d is given by—

n n

d = X (xt - x , - _ i ) 2/ ^ ? = N/D

«= 2 «= 1

(2)

The n u m e r a t o r N is assymetrical, the d e n o m i n a t o r D s y m m e t r i c a l , i.e., D has the same value i n all p e r m u t a t i o n s . We need concern ourselves o n l y w i t h the n u m e r a t o r N. I t is the fact o f constant D t h a t makes the calculation o f m o m e n t s o f d exactly calculable. This is also the classical case w h e n the sample is a n o r m a l one because t h e n d is a homogeneous f u n c t i o n o f degree zero, w i t h nr2 = 2 x2 i n the d e n o m i n a t o r . T h e fact t h a t w h e n the sample is

n o r m a l r is independent o f d (Geary, 1933) makes the exact calculation o f the moments o f d, and hence the estimation o f the frequency o f d (as b y Durbin-Watson) possible.

I f / is any p o l y n o m i a l f u n c t i o n o f (xt, x2. . ., xn) ordered i n a particular

way the r a n d o m i s a t i o n mean M (f) o f / is the sum o f / for all the permuta tions divided b y n ! . T o f i n d the mean o f d2 given b y (2) or, i n effect, N2 we

have t o deal w i t h terms i n x*, x3. x , ' , x2 x y x , " and x,- x," x , " x , - " , all sub

scripts d i f f e r e n t . O n t a k i n g means we m a y disregard subscripts and insert mean values o f these terms, having regard o n l y t o exponents. These mean values m a y be w r i t t e n ( i n a n o t a t i o n w h i c h is obvious) ( 4 ) , ( 3 1 ) , ( 2 2 ) , ( 2 1 1 ) , ( 1 1 1 1 ) . N o t e t h a t (31) = (13) etc.

Square (1) and take means. There are n o f t y p e x\ and n (n — l ) / 2 o f t y p e Hence:

As i n ( 4 ) , we can express a l l terms i n t w o o r m o r e variables i n single variable expressions. As an example o f the m e t h o d o f d e r i v a t i o n , we have

n ( 2 ) + (11) [ 2 n (n - l ) ] / 2 = 0 ₍₃₎

or

( l l ) = - ( 2 ) / ( « - l ) (4)

E x ; 2 x ; = 0 ₍₅₎

M u l t i p l y i n g o u t and t a k i n g means:

n (4) + n (n - 1) (31) = 0 ₍₆₎

or

(4)

T h e d e r i v a t i o n o f other r a n d o m i s a t i o n means we need is a l i t t l e m o r e c o m p l i c a t e d . We shall be c o n t e n t t o give the results:

( 2 2 ) = [ n ( 2 )2 _ ( 4 ) ] / ( n - l )

( 2 1 1 ) = [ 2 ( 4 ) - n ( 2 )2] / ( n - l ) ( n - 2 ) (8) ( 1 1 1 1 ) = 3 [ n ( 2 )2 - 2 ( 4 ) ] / ( n - l ) ( n - 2 ) ( n . - 3 )

F r o m ( 2 ) ,

£> = n ( 2 ) (9)

E x p a n d i n g the n u m e r a t o r o f (2)

N = (x2 + x2) + 2 2 x2 - 2

I

x , x , _ ! (10)

• = 2 i = 2

Hence t a k i n g means

M (N) = 2 (2) + 2 (n - 2) (2) + 2 (n - 1) ( 2 ) / ( n - 1), (11)

using ( 4 ) . Hence M (N) = 2n ( 2 ) . T h e n :

M (d) = M (N)/D = 2, (12)

using ( 9 ) .

T h e algebra o f the c a l c u l a t i o n o f M (d2) or, i n effect,7W (N2) is onerous b u t

the result is simple. We regard N, given b y the r i g h t side' o f (10) as three terms

(A + B + C) w i t h square (A2 + 2AB + ... + C2) and aggregate the terms, hav

i n g regard t o coefficients and numbers o f terms o f each k i n d , x?, x? etc->

w h i c h , o n t a k i n g means are replaced b y ( 4 ) , (31) etc. T h e n , gathering terms we f i n d :

M (N2) = 2 (2n — 3) (4) - 8 (2n - 3) (31) + 2 ( 2 n2 - 4 n + 3) (22)

- 8 (n - 2) (n - 3) (211) + 4 (n - 2) (n - 3) ( 1 1 1 1 ) . (13)

Using (7) and (8) and collecting terms:

M (N2 ) = 2n [(2n2 - 3) ( 2 )2 - ( 4 ) ] \{n - 1)." (14)

As M (d2) = M (N2 )/D2 w i t h D = n ( 2 ) :

v*x(d)=M{d2)- [M(d)]2 (15)

(5)

where b2 - ( 4 ) / ( 2 )2 the familiar k u r t i c i t y statistic i n n o r m a l t h e o r y i n w h i c h i n fact its p o p u l a t i o n value 02 is 3.

As a check o n the q u i t e elaborate, i f elementary, algebra, consider the case o f n = 2. There is t h e n b u t a single value o f d given b y ( 2 ) , f o r i n this case d is s y m m e t r i c a l i n ( x1 ( x2) . (4) = ( x j + x £ ) / 2 = x j since Xx + x2 = 0 and

(2) = x\. Hence b2 = 1 . S u b s t i t u t i n g then n = 2 and b2 = 1 i n ( 1 5 ) we f i n d

var (d) = 0, as we s h o u l d .

O f course ( 1 5 ) is O ( n_ 1) , w h i c h m e a n s ' t h a t , w i t h increasing n , d tends i n p r o b a b i l i t y towards 2 (see ( 1 2 ) ) . What, as announced above, is remarkable is that the coefficient o f b2 is O {n1). This implies that the variance is

nearly independent o f the frequency d i s t r i b u t i o n f r o m w h i c h the r a n d o m sample o f n (arrayed i n any order) is d r a w n . As an example take n = 20—one w o u l d scarcely be interested i n e.g., residual a u t o c o r r e l a t i o n f o r fewer obser vations—and 6 = 1, and 6, a range p r o b a b l y covering most d i s t r i b u t i o n s . Values o f standard d e v i a t i o n (= square r o o t variance) o f var (d) given b y ( 1 5 ) are—

T h e difference is o f small i m p o r t a n c e , having regard t o the uses o f the statis t i c d.

There is an interest i n c o m p a r i n g the l o w e r c r i t i c a l p r o b a b i l i t y points derived f r o m r a n d o m i s a t i o n w i t h t h e w e l l - k n o w n dL o f Durbin-Watson

( 1 9 5 1 ) . A s s u m i n g n o r m a l i t y , b2 = 3 a n d , since the mean is 2, l o w e r .05 and

. 0 1 N H P critical points w o u l d be given respectively b y (2 — 1.9600s) = C j and (2 — 2.5759s) = c2, s being the r a n d o m i s a t i o n standard d e v i a t i o n , s2

given b y ( 1 5 ) . F o l l o w i n g are comparisons w i t h di f o r various values o f n , w i t h k', t h e n u m b e r o f regression terms, equalling 1 and 2:—

Value o f 62 1 6

s.d. o f d 0.4353 0.4039

n s

'2

0 . 9 1 1.09 1.21 1.29 1.35 1.40 1.43 1.46 1.49 .01 probability Durbin-Watson di k'=-l

0 . 9 5 1.13 1.25 1.32 1.38 1.43 1.47 1.50 1.53

k' = 2

0 . 8 6 1.07 1.20 1.28 1.35 1.40 1.44 1.47 1.50 2 0 3 0 4 0 5 0 6 0 7 0 8 0 9 0 1 0 0

0 . 4 2 3 0 0 . 3 5 2 3 0 . 3 0 8 0 0 . 2 7 7 0 0 . 2 5 3 8 0 . 2 3 5 6 0 . 2 2 0 7 0 . 2 0 8 4 0 . 1 9 8 0

(6)

The regularity i n relationship is m a r k e d : f r o m n = 30 o n i n n o case do the c figures differ f r o m the D W b y m o r e than 0.04. T h e differences, however, t h o u g h small, are systematic, n o t r a n d o m .

Clearly the correspondence between the c and the dL is good. As is w e l l

k n o w n , i n Durbin-Watson there is a zone o f indecision characterised b y l i m i t s dL and dv. As the previous table shows, the randomisation approach

strongly favours di. The difference between dL, and d\j is the greater the

smaller the value o f n so that i n the f o l l o w i n g table a t t e n t i o n is c o n f i n e d t o

n = 2 0 , 3 0 , 4 0 .

.05 probability .01 probability N k'=l ft'='2 k'=l k' = 2

rfi — c 1 d U - C \ dL ~ c l dU —c\ dL ~c 2 dU ~c2 dL ~c 7 dU -c2

2 0 .03 . 2 4 - . 0 7 .27 . 0 4 . 2 4 - . 0 5 . 3 6 3 0 . 0 4 . 1 8 - . 0 3 . 2 6 . 0 4 . 1 7 - . 0 2 .25 4 0 . 0 4 . 1 4 - . 0 1 . 2 0 . 0 4 . 1 3 - . 0 1 . 1 9

There is n o t a shadow o f d o u b t t h a t , w i t h k' = l,2,dL is closer t o the ran

d o m i s a t i o n value t h a n is dy, t o the extent t h a t there seems no p o i n t i n referr i n g t o djj i n testing for absence o f residual autoregression. As the randomisa t i o n f o r m u l a is almost distribution-free (as regards the disturbances) the same m a y n o w be said o f the dL series. O b v i o u s l y the randomisation l i m i t s

(ci and c2) themselves can c o n f i d e n t l y be used f o r adjudging significance i f t o o m u c h precision be n o t attached t o the null-hypothesis p r o b a b i l i t y (i.e. t o use " n e a r " . 0 5 , . 0 1 etc. instead o f exact .05, . 0 1 etc.). When n is n o t t o o small the foregoing assumption that the n ! values are n o r m a l l y d i s t r i b u t e d is close enough for the purpose o f the approximate p r o b a b i l i t y statement.

I n t h e i r original 1 9 5 1 paper D u r b i n and Watson furnish tables o n l y f o r the l o w e r c r i t i c a l values o f the v o n N e u m a n n r a t i o , namely, di and drj. The upper critical values should also be considered: the n u l l hypothesis (i.e., no residual a u t o c o r r e l a t i o n ) should also be rejected w h e n the actual r a t i o ex ceeds 4 — di (4 being the algebraic m a x i m u m o f the r a t i o ) . The latter situa t i o n is far m o r e rare, i n actual practice, than the f o r m e r . I t can occur o n l y w h e n consecutive values o f the calculated disturbances t e n d t o oscillate i n sign f r o m plus t o minus, w h e n , i n fact the. change i n signs test tau (Geary 1 9 7 0 )1 is near n , the n u m b e r o f observations, instead o f near n / 2 w h e n the disturbance values are i n r a n d o m order.

1 . N o n e o f t h e several p a p e r s o n t h e t a u t e s t q u e s t i o n e d t h e v a l i d i t y o f t h e r a n d o m i s a t i o n t e s t a p p l i e d t o O L S d i s t u r b a n c e s . A l l c o m m e n t s b o r e o n t h e d i s c r i m i n a t o r y p o w e r o f t h e t e s t , c o m p a r e d w i t h t h e D W a n d o t h e r t e s t s . N o n e o f t h e c o m m e n t a t o r s o b j e c t e d

(7)

T h e m a i n p o i n t is, however, t h a t , at the upper rejection l i m i t , the discrep ancies between the n o r m a l t h e o r y r a n d o m i s a t i o n critical values and the D W values are e x a c t l y those s h o w n i n the previous tables.

Comparison between r a n d o m i s a t i o n and D W n u l l hypothesis l i m i t s has been c o n f i n e d t o k' = 1 and k' = 2 , i.e., for estimated O L S disturbances after removal o f one and t w o indvars respectively. Comparison is n o t so satisfac t o r y for regressions w i t h m o r e t h a n t w o indvars. T h e n u b o f the p r o b l e m raised i n this paper and stated at the outset is t h a t we have b u t one r a n d o m i s a t i o n critical l i m i t for given p r o b a b i l i t y ; there are k' values o f dL. W i t h

r a n d o m i s a t i o n the manner b y w h i c h the O L S disturbances are estimated (i.e., w i t h o u t regard t o n u m b e r o f indvars) is disregarded: we s i m p l y have a t i m e sequence o f numbers (positive and negative since t h e i r sum m u s t be zero) and the n u l l hypothesis is t h a t t h e y are p r o b a b l y i n n o n - r a n d o m order. The fact t h a t the estimated disturbances are related because t h e i r sum is zero is o f n o i m p o r t a n c e unless n is small. Basically the approach is the same i n using the t a u test, o n the classical approach, o n the c o n t r a r y , w i t h . k' indvars there are (k' + 1) linear relations between the disturbances so t h a t i n significance-testing the n u m b e r o f independent variables involved is n o t n b u t (n — k' — 1), the n u m b e r o f degrees o f freedom.

A n obvious approach is as f o l l o w s . I f , i n ( 1 5 ) , one regards n as n u m b e r o f degrees o f freedom established i n the o r d i n a r y w a y , w o u l d the resulting values o f c1 and c2 correspond t o the Durbin-Watson values o f di for k' -1,

2, . . ., 5. T h e answer is " N o " . A t t e n t i o n m a y be c o n f i n e d t o the most test i n g case o f n u m b e r o f observations equal t o 2 0 .

W i t h 1 t o 5 indvars, degrees o f freedom i n the disturbance range f r o m 18 t o 14 and using ( 1 5 ) a n d , always assuming n o r m a l i t y , the ct values for n = 18 and 14 are respectively 1.13 and 1.04, range 0.09. B u t the Durbin-Watson range for dL f r o m k' = 1 t o k' = 5 is 0 . 4 1 . This comparison relates t o p r o b 

a b i l i t y . 0 5 . A t . 0 1 p r o b a b i l i t y the contrast is also m a r k e d : for c2 the range is 0.15, for k' 0.35. I f any c o r r e c t i o n for n u m b e r o f degrees o f freedom is necessary using r a n d o m i s a t i o n , i t w i l l n o t be o n these lines.

(8)

REFERENCES

D U R B I N , J a n d G . S. W A T S O N , 1 9 5 1 . " T e s t i n g f o r serial c o r r e l a t i o n i n least squares r e g r e s s i o n 1 1 " , Biometrika, V o l . 3 8 .

G E A R Y , R . C., 1 9 3 3 . " A g e n e r a l e x p r e s s i o n f o r t h e m o m e n t s o f c e r t a i n s y m m e t r i c a l f u n c t i o n s o f n o r m a l s a m p l e s " . Biometrika, V o l . X X V .

G E A R Y , R . C . , 1 9 5 2 . " T h e c o n t i g u i t y r a t i o a n d s t a t i s t i c a l m a p p i n g " , The Incorporated

Statistician, V o l . 5 , N o . 3 .

G E A R Y , R . C , 1 9 7 0 . " R e l a t i v e e f f i c i e n c y o f c o u n t o f sign changes f o r assessing r e s i d u a l a u t o r e g r e s s i o n i n least squares r e g r e s s i o n " . Biometrika, V o l . 5 7 .