• No results found

Improving three layer neural net convergence

N/A
N/A
Protected

Academic year: 2020

Share "Improving three layer neural net convergence"

Copied!
6
0
0

Loading.... (view fulltext now)

Full text

(1)

MURDOCH RESEARCH REPOSITORY

http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=140340&contentType=C

onference+Publications

Alder, M., Sok, G.L., Hadingham, P. and Attikiouzel, Y. (1991)

Improving three layer neural net convergence. In: Second

International Conference on Artificial Neural Networks, 18 - 20

November, Bournemouth, UK, pp. 318 - 322.

http://researchrepository.murdoch.edu.au/20266/

Copyright © 1991 IEEE

Personal use of this material is permitted. However, permission to reprint/republish

this material for advertising or promotional purposes or for creating new collective

(2)

IMPROVING

THREE LAYER NEURAL NET CONVERGENCE

Michael A l d e r , Sok

Gek

Lim.Pau1 Hadingham and Yianni Attikiouzel

T h e University of Western Australia, Australia

I N T R O D U C T I O N

T h e m u l t i - l a y e r Perceptron [l], t h e M A D A L I N E [2],

t h e c o m m i t t e e n e t [3] and t h e feed forward n e t w o r k s t r a i n a b l e b y t h e back-propagation algorithm, 141, are

r e l a t e d in ways w h i c h a r e n o t always e n t i r e l y clear. I t is o n e a i m of t h i s p a p e r t o m a k e s o m e of t h e m a little clearer.

T h e main application of neural n e t w o r k s h a s been as trainable p a t t e r n classifiers. and that is how t h e y will b e viewed in this paper. In t h e first part of this paper we i n v e s t i g a t e t h e r e l a t i o n s h i p b e t w e e n t h r e e l a y e r f e e d f o r w a r d b a c k - p r o p a g a t i o n n e t s ( u s i n g t h e terminology of [4]) and t h e committee net of [3], and s h o w t h a t a s i m p l e m o d i f i c a t i o n t o t h e a l g o r i t h m of t h e l a t t e r m a k e s t h e m , in r e s p e c t of t h e i r p o w e r t o classify d a t a s e t s , e q u i v a l e n t . T w o a l g o r i t h m s may, however, b e e q u i v a l e n t in power h u t differ greatly in their practicality. In t h e second part of t h e p a p e r we c o n d u c t s o m e e x p e r i m e n t s in o r d e r t o d e t e r m i n e w h e t h e r t h e m o d i f i e d c o m m i t t e e a l g o r i t h m c a n c o m p e t e w i t h b a c k - p r o p a g a t i o n i n a v a r i e t y of

applications. I t is found that t h e committee algorithm (a) is a b o u t 10 t i m e s as fast in s o m e a p p l i c a t i o n s and ( b ) i s m u c h less p r o n e t o g e t t i n g t r a p p e d in l o c a l minima.

T h e t h e o r e t i c a l interest in t h e p a p e r s t e m s from t h e e a s e of a n a l y s i n g t h e c o m m i t t e e a l g o r i t h m t o g e t h e r w i t h t h e e q u i v a l e n c e . T h q e x p e r i m e n t a l i n t e r e s t is t h a t t h i s m e t h o d of s p e e d i n g u p b a c k - p r o p a g a t i o n m a y b e u s e d w i t h o t h e r i m p r o v e m e n t s t o r e d u c e training times in sdme applications.

I n what f o l l o w s we restrict attention, without loss of g e n e r a l i t y , t o t h e c a s e of a b i n a r y c l a s s i f i c a t i o n problem.

T H E O R E T I C A L C O N S I D E R A T I O N S

T e r m s not defined in this paper are t o b e found in t h e r e f e r e n c e s , s p e c i f i c a l l y see [ 3 ] f o r d e f i n i t i o n s of terms relating t o committee nets. Briefly, a committee n e t is a ( t r a i n a b l e ) s e t of k o r i e n t e d h y p e r p l a n e s in t h e s p a c e e a c h r e t u r n i n g a ‘vote’ + I or - 1 a c c o r d i n g to which side of t h e hyperplane a datum is. T h e v o t e s are summed by t h e ‘ o u t p u t unit’and compared with a t h r e s h o l d of z e r o if t h e number of units is o d d . By

a d d i n g , if n e c e s s a r y , a unit or units r e m o t e from t h e d a t a or b y g i v i n g a t h r e s h o l d of 1/2, t h e c a s e of an e v e n n u m b e r o f u n i t s c a n b e t r e a t e d , a s c a n t h r e s h o l d s o t h e r than zero.

T h e first proposition is t h e trivial observation, made for completeness:

Proposition. Any b i n a r y d a t a s e t can b e c o r r e c t l y classified by a c o m m i r t e e net.

Proof: T h e argument is by induction on t h e n u m b e r of points of t h e d a t a set. We first observe that if there are only two points, one of each category, then t h e set is linearly separable and a single unit net will serve. N o w s u p p o s e t h a t t h e r e i s a l w a y s a c o r r e c t c l a s s i f i c a t i o n o f a d a t a s e t h a v i n g a t o t a l of r d a t a points, and c o n s i d e r a dat set having r + l points. One of t h e p o i n t s at l e a s t m u s t b e e x t r e m a l , i s . may b e s e p a r a t e d f r o m a l l t h e o t h e r s by a h y p e r p l a n e . T h e o t h e r p o i n t s m a y b e c o r r e c t l y c l a s s i f i e d b y a c o m m i t t e e . W i t h o u t loss of g e n e r a l i t y , s u p p o s e t h e n e w p o i n t i s p o s i t i v e a n d t h e c o m m i t t e e w h i c h c l a s s i f i e s t h e r e s t of t h e d a t a s e t w o u l d a s s i g n it a negative value. T h e n we can insert between t h e old r

p o i n t s a n d t h e e x t r e m a l p o i n t s u f f i t i e n t l y m a n y hyperplanes to make t h e new point positive. This will also add n e g a t i v e v a l u e s t o t h e r p o i n t s which may a l t e r t h e c l a s s i f i c a t i o n . W e t h e r e f o r e t a k e a s many h y p e r p l a n e s again which h a v e a l l t h e p o i n t s on t h e p o s i t i v e s i d e . T h i s r e t u r n s t h e r p o i n t s t o t h e s a m e v a l u e s t h e y had b e f o r e , a n d p r e s e r v e s t h e p o s i t i v e sign of t h e extremal point.

0

R e m a r k . T h e r e a r e b e t t e r w a y s . I n p a r t i c u l a r we n e e d a d d o n l y half t h e n u m b e r of new h y p e r p l a n e s s u g g e s t e d i n t h e a b o v e a r g u m e n t . E v e n s o , t h e a r g u m e n t i s h a r d l y c h e e r i n g , s u g g e s t i n g as it d o e s that t h e n u m b e r of hyperplanes, and hence of hidden units, may increase with the size of t h e data set.

(3)

x = ( X , . X ~ . . . . X ~ ) ~ t h e 3-layer FFBP n e t i m p l e m e n t s a

map:

f(x) = sgn ( bjsgn(Faijxi))

J

w h e r e t h e a.. are t h e weights in t h e first hidden layer from t h e ithlkomponent of t h e input to t h e jth hidden unit and t h e b, are t h e weights to t h e output unit, and t h e sgn f u n c t i o n t a k e s t h e v a l u e 1 if i t s a r g u m e n t i s positive, - 1 if i t is negative and is not defined at zero.

Suppose that s o m e d a t a set is correctly classified b y some such 3 - l a y e r FFBP net. We o b s e r v e t h a t s i n c e t h e d a t a set is f i n i t e and f - ' { l ) is open, a s is f - ' ( - I ) , t h e r e is stability under perturbations of t h e weights. Specifically we may take t h e bj to he rational, which

i s just as well given t h e need for using t h e s e n e t s via computer simulations.

M o r e o v e r , i f a n y of t h e c o e f f i c i e n t s s h o u l d b e negative, then reversing t h e sign of t h e corresponding ai, w e i g h t s will a l l o w u s t o r e v e r s e i t s sign t o s t i l l give a s o l u t i o n . And if a weight s h o u l d b e z e r o , we can simply remove t h e unit altogether. Hence we may take i t that t h e weights are all positive rationals, and Fince multiplying throughout by any positive number w i l l p r e s e r v e t h e s i g n of t h e s o l u t i o n , w e m a y m u l t i p l y b y t h e L e a s t C o m m o n M u l t i p l e of t h e d e n o m i n a t o r s t o m a k e t h e w e i g h t s a l l p o s i t i v e integers, which we can then regard as multiplicities. T h i s is now a c o m m i t t e e net with some of t h e u n i t s replicated by their multiplicities.

T h e a b o v e argument l e a v e s one with t h e impression that for any data set that can be classified by a 3-layer FFRP net there is a committee net which can classify i t , h u t t h e n u m b e r o f u n i t s r e q u i r e d f o r t h i s e q u i v a l e n t c o m m i t t e e c a n h e a r b i t r a r i l y l a r g e . Happily. such is not t h e c a s e . We proceed t o show t h i s by a s e r i e s of p r o p o s i t i o n s m o t i v a t e d by s o m e examples.

E x a m p l e . S u p p o s e w e h a v e a c o m m i t t e e w i t h multiplicities attached to t h e units: I shall write:

( 2 , 11, 19, 2 6 ) t o i n d i c a t e t h a t we h a v e f o u r u n i t s

which we may take i t correctly classify some d a t a set in R" , p r o v i d e d w e h a v e t h e given m u l t i p l i c i t i e s . A l t e r n a t i v e l y , we h a v e 2+11+19+26 u n i t s b u t o n l y four of them are distinct hyperplanes.

I f n is big enough, then t h e r e will b e some region of t h e s p a c e h a v i n g a c o u n t of 2+11+19+26, a n d a s w e m o v e a c r o s s t h e l a s t h y p e r p l a n e w e m o v e i n t o a region having a count of 2+11+19 -26, and so on. Now consider t h e second hyperplane. T h e regions on t h e

t h e negative side. Now t h e s e numbers when added t o + I 1 or -11 a r e from t h e set (-47, -43, -9, - 5 , 5 , 9, 4 3 , 4 7 ) . W e observe that t h e sign of t h e result of a d d i n g

11 or s u b t r a c t i n g 11 f r o m t h e s e n u m b e r s will n o t b e a l t e r e d b y c h a n g i n g t h e n u m b e r 1 1 t o s o m e s u f f i c i e n t l y c l o s e o t h e r n u m b e r . I n p a r t i c u l a r , c h o o s i n g any number t o r e p l a c e 11 which is s t r i c t l y between 9 and 43 will not a l t e r any signs; t h i s i s , of course, that interval of t h e values obtained by taking

s u m s a n d d i f f e r e n c e s of t h e m u l t i p l i c i t i e s w h i c h c o n t a i n s 11. W e m a y t h e r e f o r e r e p l a c e t h e g i v e n q u a r t e t ( 2 , 1 1 , 1 9 , 2 6 ) b y t h e e q u i v a l e n t q u a r t e t (2,19,19,26) which m u s t a l s o c o r r e c t l y classify t h e s a m e d a t a s e t . Similarly, t h e 2 6 may b e changed t o a n y v a l u e b e t w e e n 2 a n d 36 so may a l s o b e m a d e equal t o 19. Finally t h e 2 may take any value between

- 19 a n d 19 i n p a r t i c u l a r may t a k e t h e v a l u e 0. W e therefore h a v e that t h e given quartet is equivalent to the quartet (0,19,19,19). T h i s is of course equivalent to t h e q u a r t e t (0.1.1,l) or t h e t r i p l e (1.1.1). I n o t h e r words, t h e original total of 58 u n i t s can b e reduced to 3 and stilI classify t h e same data set.

E x a m p l e . S u p p o s e w e h a v e t h r e e u n i t s w i t h m u l t i p l i c i t i e s p o s i t i v e i n t e g e r s (a,b,c). W e m a y , without loss of generality, suppose t h e m arranged in non-decreasing order and not all t h e same. Now if c >

a + b . t h e first t w o u n i t s c a n n o t c h a n g e t h e sign of a region by enough t o c o m p e n s a t e for t h e v o t e of t h e t h i r d u n i t a n d m a y b e r e m o v e d ; t h e t r i p l e i s e q u i v a l e n t t o a s i n g l e u n i t in t h e t h i r d l o c a t i o n . A l t e r n a t i v e l y w e may argue t h a t b l i e s b e t w e e n a - c and c - a and can b e put to 0, whereupon a lies between - c and c and can also b e put t o 0, finally divide ( 0 . 0 , ~ )

by c. If c < a + b t h e n b l i e s between c - a and c + a and can b e p u t equal t o c, and finally a lies between 0 and 2 c a n d h e n c e may a l s o b e p u t t o c, g i v i n g (c,c,c)

-

(1.1.1). Finally,the case where c = a+b is easily seen to b e r e d u c i b l e t o a t r i p l e (l,l,2) and requires a total of f o u r u n i t s . S u c h a c a s e c a n c e r t a i n l y a r i s e f r o m a 3-layer FFBP net, and we observe that in this case we need one extra unit in an equivalent committee.

D e f i n i t i o n . L e t a = ( a , , a 2 , . . . a k ) b e a s e t o f non-negative integers. T h e unitary span of t h e set I S

t h e s e t of 2k i n t e g e r s o b t a i n e d by t a k i n g all l i n e a r combinations with coefficients in [-I,+l]. a is said to

be unitarily degenerate iff 0 is in t h e unitary span. We w r i t e u s p a n ( a ) f o r t h e set of 2k i n t e g e r s c o m p r i s i n g t h e unitary span.

D e f i n i t i o n . Let a = ( a l

,...

ak) and b = ( b l

....

b k ) b e t w o k - t u p l e s of n o n - n e g a t i v e integers in n o n - d e c r e a s i n g order. Let w = (w,.

...

w k ) b e a unitary vector i.e. each w' is e i t h e r - 1 or + l . T h e n we write a-b iff for every unitary w, sgn(w*a) J = sgn(w*b) w h e r e

*

d e n o t e s t h e usual inner product

(4)

R e m a r k . C l e a r l y

-

i s an e q u i v a l e n c e r e l a t i o n a n d e q u a l l y c l e a r l y i f t h e k - t u p l e s r e p r e s e n t t h e multiplicities of units in t h e hidden layer, replacing o n e s e t of m u l t i p l i c i t i e s w i t h t h e o t h e r w i l l n o t c h a n g e t h e c l a s i f i c a t i o n of t h e n e t . If a k - t u p l e is d e g e n e r a t e t h e n a n y e q u i v a l e n t k - t u p l e i s a l s o degenerate.

D e f i n i t i o n . A k - t u p l e of n o n - n e g a t i v e i n t e g e r s in n o n - d e c r e a s i n g o r d e r w i l l b e r e f e r r e d t o a s a multiplicity-tuple.

D e f i n i t i o n . If a = (ai.a?, ..., ak) is a n o n - d e g e n e r a t e multiplicity-tuple, we write 8, for t h e sequence with

aj r e m o v e d . T h e n i f a ’ i s t h e s e q u e n c e a w i t h a, changed t o b j we say that the transformation from a t o a ’ i s allowable provided both aj and b j are in t h e same interval of consecutive e l e m e n t s of uspan(;iJ), i s . iff there is n o element of uspan(ij) between aj and bj.

P r o p o s i t i o n . If a ’ i s an allowable transform of a then a ’ a n d a are equivalent.

P r o o f : S u p p o s e n o t . T h e n t h e r e i s s o m e u n i t a r y combination of t h e elements on which t h e sign of t h e result is different. Let t h e unitary vector involved be w, so sgn(w*a) is d i f f e r e n t f r o m sgn(w*a?. Now t h e a b s o l u t e v a l u e of t h e d i f f e r e n c e b e t w e e n w*a a n d w * a ’ i s la.-b.l S u p p o s e , w i t h o u t loss of generality. that wj = 1 and let t h e residue r b e given by r = w*a

- a. = w * a ’ - b . S i n c e t h e sign of r

+

aj d i f f e r s from t h e sign of r

+

b . there must b e some number between a. and b . which i s e q u a l t o r , which is an e l e m e n t of u s p a n ( i j ) . B u t t h i s c o n t r a d i c t s t h e d e f i n i t i o n of allowable transformations.

J J ’

J J

0

R e m a r k . Obviously if a ’ i s an allowable transform of a , t h e n a i s a n a l l o w a b l e t r a n s f o r m o f a’. T h e c o m p u t a t i o n s of our f i r s t e x a m p l e d e m o n s t r a t e t h e ease with which one can reduce k - t u p l e s by means of sequences of allowable transforms, and h e n c e reduce t h e n u m b e r o f u n i t s r e q u i r e d o f a c o m m i t t e e n e t w h i l e still p r e s e r v i n g t h e c l a s s i f y i n g capacity. W e n o w e n q u i r e , how m a n y e q u i v a l e n c e c l a s s e s a r e there? T h i s will allow us, with t h e above proposition, t o find b o u n d s on t h e n u m b e r of u n i t s in a r e d u c e d committee equivalent to a given committee. When k =

3 for i n s t a n c e , we h a v e seen that t h e r e a r e precisely t w o non-degenerate equivalence classes, which h a v e representatives (O,O,!) and (l,l,l).

D e f i n i t i o n . A n y unitary v e c t o r w is positive if f o r e v e r y m u l t i p l i c i t y t u p l e a, w*a > 0, i t is negative if f o r e v e r y u n i t a r y v e c t o r a w * a < 0 , a n d i t i s

nmbiguous if t h e r e a r e some a for which w*a > 0 and others for which w*a < 0.

R e m a r k . It is clear that, say (1.1 ,...I) is positive, and

( - 1,- 1

,...,

- 1) i s n e g a t i v e , a n d t h a t (- 1.1.- 1.1

...

- 1.1) i s positive. while (1,- 1.1,- 1

,...,

- 1,l) is ambiguous. (- 1,- 1.1)

is ambiguous, while (- l,l,l) is positive.

P r o p o s i t i o n . F o r k o d d , t h e n u m b e r of equivalence c l a s s e s of multiplicity- t u p l e s is at least o n e greater than half t h e number of ambiguous unitary vectors of length k.

P r o o f : T h e equivalence classes are. by definition, t h e s e t o f d i f f e r e n t c o n s i s t e n t a s s i g n m e n t s t o t h e 2 k d i f f e r e n t unitary v e c t o r s of t h e v a l u e s + I or - 1. T h e v a l u e s assigned to t h e positive and negative unitary v e c t o r s a r e f o r c e d t o b e + 1 a n d - 1 r e s p e c t i v e l y , by d e f i n i t i o n . C o n s i d e r f i r s t t h e c a s e w h e r e t h e a m b i g u o u s v e c t o r s may b e o r d e r e d by u < v iff for e v e r y m u l t i p l i c i t y - t u p l e a, a*u < a*v. T h e n we may t a k e h a l f t h e a m b i g u o u s v e c t o r s in t h i s o r d e r and o b s e r v e t h a t an e q u i v a l e n c e c l a s s is d e t e r m i n e d by c h o o s i n g w h e r e t o p u t a zero in t h e list. Since t h e r e a r e k / 2 e l e m e n t s i n t h e l i s t t h e r e a r e l + k / 2 equivalence classes. If t h e vectors cannot b e ordered, t h e u s u a l s t a t e of a f f a i r s , t h e r e c a n o n l y b e m o r e equivalence classes than this.

R e m a r k . F o r small ( o d d ) k , t h e comparisons are not unfavourable to t h e committee. For example when k =

3, t h e unitary vectors are (l,l,l),(-l,l,l) and (1,- 1.1) for t h e p o s i t i v e v e c t o r s , t h r e e c o r r e s p o n d i n g n e g a t i v e v e c t o r s , and t h e a m b i g u o u s v e c t o r s a r e (- 1,- 1,l) and (1.1.- 1) w h i c h a r e p a i r e d . W e may i n s e r t t h e z e r o b e f o r e o r a f t e r t h e ( - 1,- 1.1) w h i c h g i v e s t w o equivalence classes, (0.0.1) and (1.1,l).

F o r k = 5 , t e n of t h e unitary vectors are positive, ten n e g a t i v e a n d t h e r e m a i n i n g t w e l v e v e c t o r s fall i n t o six pairs. Among t h e six we find a chain of f i v e and t h e l a s t v e c t o r m a y o c c u p y a n y o f t h e f i r s t t h r e e places. T h e r e are t h u s 8 equivalence classes. It is easy to confirm that t h e following 8 multiplicity-tuples are inequivalent: (O,O,O,O,l), (O,O,l,l,l), (l,l,l,l,l), (0,1,1,1,2), (1,l.I,2,2), (0.1.1.2.3). (1,1,1,1,3) a n d (1,1,2,2,3). We o b s e r v e t h a t t h e m a x i m u m s i z e of t h e e q u i v a l e n t committeee is therefore 9, and t h e average size is 5.5.

(5)

f

T a b l e 2 N e t

G e o m e t r y : 3-3-1 3-5- 1 3-7- 1

M o v e s

S.B.P. B*p*

]

195 132 105

0.21 1.2* 0.01 0.013

T i m e B.P.

2842* 442 1632*

E X P E R I M E N T S

R e m a r k . T h e t r a i n i n g p r o c e d u r e f o r c o m m i t t e e s r e q u i r e s s o m e a n a l y s i s o f t h e F F B P t r a i n i n g procedure; an informal discussion of pertintent issues may b e found in Fahlman [4]. Nilsson describes some p r o c e d u r e s w h i c h h a v e s o m e d e f e c t s . W e b r i e f l y outline t h e crucial issues.

I f t h e c o m m i t t e e correctly classifies a point t h e r e is no need to d o anything. If it is in error, we may take it t h a t s o m e of t h e u n i t s h a v e to b e c o r r e c t e d by t h e u s u a l p r o c e d u r e f o r a s i n g l e u n i t , t h e p e r c e p t r o n convergence procedure. T h e question is, which units is o n e t o modify? Nilsson gives a heuristic argument for modifying t h e ’ l e a s t wrong’, that is t o say, if 0 <

a*x < b*x f o r t h e d a t u m x a n d b o t h a a n d b a r e in e r r o r and if only o n e unit n e e d s t o be a d a p t e d , t h e n o n e a d a p t s a. A r a t i o n a l e f o r t h i s is t h a t minimum c h a n g e s a r e t o b e p r e f e r r e d s i n c e if t h e d a t a s e t i s m o s t l y k n o w n , t h i s i s l e a s t l i k e l y t o d e s t r o y ‘ k n o w l e d g e a b o u t t h e d a t a s e t a l r e a d y a c q u i r e d by t h e net’. A d r a w b a c k of t h i s is t h a t w e may h a v e an initial configuration in which all but o n e of t h e units a r e r e m o t e from all t h e d a t a , and so o n l y o n e unit is r e p e a t e d l y a d a p t e d . t h e n e t is ‘starved’. S t r a t e g i e s s u c h a s m a k i n g r a n d o m c h o i c e s of w h i c h u n i t t o correct, d e r i v e d from annealing algorithm ideas, a r e alternatives.

We employ t h e tactic of correcting all units which are in e r r o r , b u t of adapting t h e m by different a m o u n t s , t h e ‘stepsize’being smaller for t h e more remote data.

I f we t a k e a f u n c t i o n such a s 1/(1+x2) for e x a m p l e , w h i c h d e c r e a s e s f r o m 1 t o z e r o , a n d u s e t h i s a s a s t e p s i z e for t h e case when t h e ‘ d i s t a n c e ’ x i s la*yl ( f o r y t h e a u g m e n t e d i n p u t v e c t o r ) , t h e n t h i s i s precisely equivalent to weighting by the derivative of a s i g m o i d f u n c t i o n a r c t a n ( x ) . T h i s , of c o u r s e , i s p r e c i s e l y w h a t i s a c c o m p l i s h e d i n t h e B a c k Propagation algorithm; t h e actual function arctan(x) a p p e a r s e s s e n t i a l l y o n l y v i a i t s d e r i v a t i v e . W e observe that t h e important thing about t h e derivative of s i g m o i d f u n c t i o n s , i s t h a t t h e i r d e r i v a t i v e f o r p o s i t i v e d i s t a n c e s i s a l w a y s n e g a t i v e , t h e c l o s e n e u r o n s a r e m o v e d m o r e t h a n t h e r e m o t e n e u r o n s , a n d h e n c e t h e d y n a m i c i s n o t a t a n y s t a g e a c o n t r a c t i o n m a p p i n g . T h e r e p e a t e d o p e r a t i o n of d i f f e r e n t d a t a on t h e s y s t e m g i v e s us. of c o u r s e , an iterated function system.

T h e a b o v e o b s e r v a t i o n s s u f f i c e t o s p e c i f y a n e w c o m m i t t e e n e t a l g o r i t h m which we c o m p a r e d w i t h standard back-propagation in a series of experiments c o n d u c t e d o n a S U N S P A R C I - s t a t i o n . F i r s t w e treated t h e parity problem in dimensions 2.3.4 and 5 ,

i s . w e t o o k a u n i t h y p e r c u b e i n Rn a n d a s s i g n e d alternate

+

and - categories t o t h e vertices, for t h e

cases of n = 2 t o 5. In dimension 2 t h i s i s s i m p l y t h e XOR function. We took two 3-layer nets with various n u m b e r s of u n i t s in t h e h i d d e n l a y e r ; f i r s t w a s t h e standard back-propagation algorithm and second was t h e case w h e r e t h e output units were h e l d at weight one and only t h e hidden units corrected according to t h e a b o v e r u l e ; w e r e f e r t o t h i s a s ‘ s i m p l i f i e d back-propagation’ (SBP). Initial assignment of t h e w e i g h t s was at r a n d o m in t h e interval - 3 to 3, and a n u m b e r of r e p e t i t i o n s of t h e training were r u n . T h e

s o c a l l e d ‘ g a i n p a r a m e t e r ’ w a s v a r i e d b u t t h e d e p e n d e n c y w a s n o t s t r o n g o v e r a w i d e r a n g e of values. T h e averaged results a r e shown i n table I.

T a b l e 1.

3 - L a y e r Nets: T h e P a r i t y P r o b l e m

D i m e n s i o n 2 3 4 S

N e t

G e o m e t r y : 2-3-1 3-3-1 4-5-1 5 - 7 - 1

M o v e s

B.P. 215 2842* Y01* 3896* S.B.P 46 195 760 1720

T i m e B.P. 0.03 0.Y* 0.8’ 13.0*

( s e c s ) S.B.P. 0.002 0.016 0.2 1.3

*

indicates that some runs failed to converge within 100,000 moves,

Different n u m b e r s of h i d d e n u n i t s p r o d u c e d l a r g e r variations in n u m b e r s of moves and t i m e s f o r Back- P r o p a g a t i o n t h a n f o r S i m p l i f i e d B.P. a s s h o w n in

table 2, w h e r e t h e d a t a set was t h e parity problem in

dimension 3:

(6)

m a d e on m i n e r a l s a m p l e s was o b t a i n e d . T h e v e c t o r s of l e n g t h e i g h t w e r e o b t a i n e d f r o m a n a p p a r a t u s w h i c h r e f l e c t e d l i g h t f r o m m i n e r a l s a m p l e s a n d b i n n e d i t i n t o i n t e r v a l s of t h e s p e c t r u m . C o l o u r gradings by e y e were assigned to t h e two categories of data. Dividing t h e data into test data and training data h a s e s t a b l i s h e d t h a t t h e a c c u r a c y o f a back-propagation neural net is of t h e order of 78%. A three layer back propagation net was therefore trained u n t i l t h i s l e v e l of e r r o r w a s r e a c h e d . S i m i l a r l y , a c o m m i t t e e n e t was t r a i n e d t o t h e s a m e l e v e l . T h e r e s u l t s for d i f f e r e n t n u m b e r s of units i n t h e h i d d e n layer are shown in t a b l e 3. Here we give run times on

SLN SPARC stations required to obtain different per c e n t a g e s of t h e d a t a c o r r e c t l y c l a s s i f i e d . 2 0 repetitions:

T a b l e 3 .

R e a l Data: T h r e e L a y e r n e t s .

BP

T i m e s . s e c o n d s :

%of d a t a s e t c o r r e c t

7 5 % 7'1% 7 9 %

N e t

G e o m e t r y

8-1-1 4 27 94 7

*

8-3-1 253 1085

*

8-5- 1 160 1037

*

S i m p l i f i e d

BP

T i m e s , s e c o n d s :

% o f d a t a s e t c o r r e c t

7 5 % 77% 7 9 %

Net (;eo m e t ry

8- 1- 1 23 6 5 94

8 - 3 - 1 36 4 3 88

8-5- 1

81%

*

*

*

81%

*

1 7 74 8

- denotes n o data collected,

*

denotes failure to converge.

CONCLUSIONS.

I t m i g h t b e a r g u e d t h a t B a c k P r o p a g a t i o n i s s o inefficient an algorithm that to attempt to improve it m a r g i n a l l y i s t o w a s t e t i m e a n d e n e r g y . Improvements using, for example conjugate gradient and o t h e r types of net a r e known [ 5 ] , [ 6 ] . We feel it is worth investigating t h e algorithm however for four reasons: first i t constitutes something of a standard,

if a l o w o n e ; s e c o n d s o m e of t h e ' i m p r o v e m e n t s ' abandon all claim t o being neural models, while s o m e k i n d o f a c l a i m c a n s t i l l b e m a d e f o r t h e c o m m i t t e e n e t . T h i r d , a s m e n t i o n e d i n t h e i n t r o d u c t i o n , i t i s p o s s i b l e t h a t s o m e of t h e o t h e r i m p r o v e m e n t s c a n b e m a d e i n a d d i t i o n t o o u r modification. I n particular, n e t s with m o r e layers can b e h a n d l e d u s i n g s i m i l a r i d e a s , a n d in a s u b s e q u e n t p a p e r w e s h a l l make s i m i l a r m o d i f i c a t i o n s t o f o u r - layer nets. Finally, t h e committee net is much simpler t o a n a l y s e : B a c k - P r o p a g a t i o n c o n c e a l s e s s e n t i a l features of t h e dynamic.

For nets with a relatively small number of units in t h e h i d d e n i a y e r , i m p r o v e m e n t s in t i m e by a f a c t o r o f

a b o u t 2 0 a r e r e a d i l y a t t a i n a b l e . T h i s i s d u e t o t w o f a c t o r s : f i r s t t h e s i m p l i f i c a t i o n of t h e a l g o r i t h m e n t a i l s less computation and second, t h e reduction in t h e s i z e of t h e search s p a c e m a k e s f o r f e w e r p a s s e s t h r o u g h t h e t r a i n i n g set and r e d u c e s t h e risk of t h e net becoming trapped in a local minimum.

REFERENCES

1.

2.

3.

4 .

5 .

6 .

Rosenblatt, F: 1968,. " T h e Perceptron" Spartan Books

Widrow B, Lehr M: 1990. 30 Years of Adaptive Neural Networks:Perceptron, Madaline,and Back-Propagation Proc. IEEE S e p f e m b e r

14 15- 1442

Nilsson, N. : 1965 "Learning Machines "

Rumelhart D, Hinton G and Williams R., August 1986 Learning Representation by Back-Propagation of Errors Nature, Vol 323 533

Fahlman S. and Lebiere C:1990, T h e Cascade- Correlation Learning A rch itect ure

in: D.S.Touretzky (Ed) "Advances in Neural McGraw-Hill

Information Processing Systems 2 " Morgan

Brent,R, 1990, " Fast Algorithms for Multi-Layer

Neural Nets" T h e Australian National University Computer Sciences Laboratory Technical Report.

References

Related documents

matrices of the multivariate time series data of solar events as adjacency matrices of labeled graphs, and applying thresholds on edge weights can model the solar flare

The tense morphology is interpreted as temporal anteriority: the eventuality described in the antecedent is localised in the past with respect to the utterance time.. Compare this

clinical faculty, the authors designed and implemented a Clinical Nurse Educator Academy to prepare experienced clinicians for new roles as part-time or full-time clinical

For its educators professional liability coverages with policy effective dates occurring during 2001 or prior, the Company has quota share reinsurance for 60% of the first $1,000,000

 TN creates customer, configures Switch number, username, password, e-mail address and SMS to e-mail to enable recipients to reply on SMS’ received?.  TN provides

The power check involves estimation of different power consumed in a design like static power, dynamic power, clock power, latch power, leakage power etc. If any power

Hasil penelitian diharapkan dapat membantu para pemakai laporan keuangan sebagai referensi dan pembanding untuk mengambil keputusan dan untuk menambah pengetahuan dan informasi

A Proposed Causal Model of Effective Returns Management Strategic/ Operational Policies &amp; Practices Functional Integration Supply Chain Orientation External Factors