Boston University
OpenBU
http://open.bu.edu
Cognitive & Neural Systems CAS/CNS Technical Reports
2003-04
Default ARTMAP
https://hdl.handle.net/2144/1905
Boston University
Default ARTMAP
Gail Carpenter
April, 2003
Technical Report CAS/CNS-2003-008
Permission to copy without fee all or part of this material is granted provided that: 1. The copies are not made or distributed for direct commercial advantage; 2. The report title, author, document number, and release date appear, and notice is given that copying is by permission of the BOSTON UNIVERSITY CENTER FOR ADAPTIVE SYSTEMS AND DEPARTMENT OF COGNITIVE AND NEURAL SYSTEMS. To copy otherwise, or to republish, requires a fee and/or special permission.
Copyright
©
2003
Boston University Center for Adaptive Systems and
Department of Cognitive and Neural Systems
677 Beacon Street
Boston, MA 02215
IJCNN'03, Portland CAS/CNS Technical Report TR-2003-008
Default ARTMAP
Gail A. Carpenter
Department of Cognitive and Neural Systems, Boston University 677 Beacon Street, Boston, MA 02215
gail@cns.bu.edu
ABSTRACT - The default ARTMAP algorithm and its parameter values specified here define a ready-to-use general-purpose neural network system for supervised learning and recognition.
I. INTRODUCTION: ART TECHNOLOGY TRANSFER
Adaptive Resonance Theory (ART) neural networks model real-time prediction, search, learning, and recognition. ART networks function both as models of human cognitive information processing [1,2,3] and as neural systems for technology transfer [4]. A neural computation central to both the scientific and the technological analyses is the ART matching rule [5], which models the interaction between top-down expectation and bottom-up input, thereby creating a focus of attention which, in turn, determines the nature of coded memories.
Sites of early and ongoing transfer of ART-based technologies include industrial venues such as the Boeing Corporation [6] and government venues such as MIT Lincoln Laboratory [7]. A recent report on industrial uses of neural networks [8] states:

[The] Boeing Neural Information Retrieval System is probably still the largest-scale manufacturing application of neural networks. It uses [ART] to cluster binary templates of aeroplane parts in a complex hierarchical network that covers over 100,000 items, grouped into thousands of self-organised clusters. Claimed savings in manufacturing costs are in millions of dollars per annum. (p. 4)
At Lincoln Lab, a team led by Waxman developed an image mining system which incorporates several models of vision and recognition developed in the Boston University Department of Cognitive and Neural Systems (BU/CNS). Over the years a dozen CNS graduates (Aguilar, Baloch, Baxter, Bomberger, Cunningham, Fay, Gove, Ivey, Mehanian, Ross, Rubin, Streilein) have contributed to this effort, which is now located at Alphatech, Inc.

This research was supported by grants from the Air Force Office of Scientific Research (AFOSR F49620-01-1-0397 and AFOSR F49620-01-1-0423) and the Office of Naval Research (ONR N00014-01-1-0624). I also wish to thank and acknowledge the many BU/CNS students who have contributed to this work, including: Suhas Chelian, Marin Gjaja, Norbert Kopco, Natalya Markuzon, Siegfried Martens, Tim McKenna, Boriana Milenova, Ogi Ogas, Olga Parsons, John Reynolds, Brad Rhodes, David Rosen, Bill Ross, Byron Shock, Bill Streilein, Ah-Hwee Tan, and Matt Woods.
Web: http://cns.bu.edu/~gail/Default_ARTMAP_2003_.pdf
Customers for BU/CNS neural network technologies have attributed their selection of ART over alternative systems to the model's defining design principles. In listing the advantages of its THOT technology, for example, American Heuristics Corporation (AHC) cites several characteristic computational capabilities of this family of neural models, including fast on-line (one-pass) learning, "vigilant" detection of novel patterns, retention of rare patterns, improvement with experience, "weights [which] are understandable in real world terms," and scalability (www.heuristics.com).

Design principles derived from scientific analyses and design constraints imposed by targeted applications have jointly guided the development of many variants of the basic networks, including fuzzy ARTMAP [9], ART-EMAP [10], ARTMAP-IC [11], Gaussian ARTMAP [12], and distributed ARTMAP [3,13]. Comparative analysis of these systems has led to the identification of a default ARTMAP network, which features simplicity of design and robust performance in many application domains [4]. Selection of one particular ARTMAP algorithm, specified here with a complete set of default parameter settings, is intended to facilitate ongoing technology transfer. A user may start with this version of the system, then, if necessary, adjust parameters to suit individual applications.
II. WINNER-TAKE-ALL VS. DISTRIBUTED CODE REPRESENTATIONS
The default ARTMAP algorithm (Section IV) outlines a procedure for labeling an arbitrary number of output classes in a supervised learning problem. A critical aspect of this algorithm is the distributed nature of its internal code representation, which produces continuous-valued test set predictions distributed across output classes.
The character of their code representations, distributed vs. winner-take-all, is, in fact, a primary factor differentiating various ARTMAP networks. The original models [9,14] employ winner-take-all coding during training and testing, as do many subsequent variations and the majority of ART systems that have been transferred to technology. Default ARTMAP is the same as fuzzy
ARTMAP during training, but uses a distributed code representation during testing. ARTMAP-IC [11] equals default ARTMAP plus instance counting, which biases a category node's test set output by the number of training set inputs coded by that node. Distributed ARTMAP (dARTMAP) employs a distributed code (and instance counting) during both training and testing [3,13]. Versions of these networks [4] form a nested sequence:

fuzzy ARTMAP ⊂ default ARTMAP ⊂ ARTMAP-IC ⊂ distributed ARTMAP

That is, distributed ARTMAP reduces to ARTMAP-IC when coding is set to winner-take-all during training; ARTMAP-IC reduces to default ARTMAP when counting weights are set equal to 1; and default ARTMAP reduces to fuzzy ARTMAP when coding is set to winner-take-all during testing as well as training.
ARTMAP variants with winner-take-all (WTA) coding and discrete target class predictions have shown consistent relative deficits in labeling accuracy and post-processing adjustment capabilities.
III. THE DEFAULT ARTMAP SYSTEM
Default ARTMAP codes the current input as a winner-take-all activation pattern during training and as a distributed activation pattern during testing. For distributed coding, the transformation of the filtered bottom-up input to an activation pattern across a field of nodes is defined by the increased-gradient CAM rule [13]. The default network also implements the MT− search algorithm [11] and sets the baseline vigilance parameter equal to zero, for maximal code compression. Other design choices for default ARTMAP include fast learning, whereby weights converge to asymptote on each learning trial; single-epoch training, which emulates on-line learning; a choice-by-difference signal function [15] from the input field to the coding field; and four-fold cross-validation.

ARTMAP's capacity for fast learning implies that the system can incorporate information from examples that are important but infrequent and can be trained incrementally. Fast learning also causes each network's memory to vary with the order of input presentation during training. Voting across several networks trained with different orderings of a given input set takes advantage of this feature, typically improving performance and reducing variability as well as providing a measure of confidence in each prediction [9].
While the number of voting systems is, in general, a free parameter, five voters have proven to be sufficient for many applications. Default ARTMAP thus trains five voting networks for each training set combination.
Even with the number of voters fixed, other design choices appear in systems where output activations may be distributed. In particular, default ARTMAP, which produces a continuous-valued distribution σk across output classes k for each test set item, presents options for combining weighted predictions across voters to make a final class choice. One strategy sums the σk values of individual networks to produce a net distributed output pattern, which is then used to determine the predicted class. An alternative strategy first lets each voting network choose its own winning output class, then assigns the test set inputs on the basis of these individual votes. In most applications, the first of these two voting strategies produces better results.
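The two combination strategies can be illustrated with a toy example; the σ values below are invented for illustration and are not data from the report:

```python
import numpy as np

# Toy distributed outputs sigma_k for V = 3 voters and 3 output classes
# (values invented for illustration).
sigma = np.array([
    [0.90, 0.05, 0.05],   # voter 1 strongly favors class 0
    [0.40, 0.60, 0.00],   # voters 2 and 3 weakly favor class 1
    [0.45, 0.55, 0.00],
])

# Strategy 1: sum the distributed outputs across voters, then choose.
summed_choice = int(np.argmax(sigma.sum(axis=0)))      # class 0 wins (1.75)

# Strategy 2: each voter picks its own winner; majority vote decides.
votes = np.argmax(sigma, axis=1)                       # [0, 1, 1]
majority_choice = int(np.bincount(votes).argmax())     # class 1 wins
```

The strategies can disagree, as here: summing preserves each voter's graded evidence, while majority voting discards the margin of preference.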
IV. DEFAULT ARTMAP ALGORITHM
Fig. 1 and Table I summarize default ARTMAP notation. Table II lists default parameter values. A user who wishes to explore network variations might begin by varying the baseline vigilance, ρ̄. In some cases, higher values of ρ̄ increase predictive accuracy but may decrease code compression.

[Fig. 1. Default ARTMAP network: a feature vector a is complement coded to form the input A; signals Tj from the input field activate the code y at the coding field; net signals σk predict the actual output class k; the code is reset if |A ∧ wJ| < ρM, producing the matched pattern A ∧ wJ.]
TABLE I
DEFAULT ARTMAP NOTATION

NOTATION   DESCRIPTION
i          input component index
j          coding node index
k          output class index
M          number of input features
a          feature vector (ai), 0 ≤ ai ≤ 1
A          complement coded input vector: A = (a, a^c)
K          actual output class of training input
y          coding field activation pattern (CAM): (yj)
J          chosen coding node (winner-take-all)
C          number of committed coding nodes
Λ, Λ'      committed node subsets
Tj         signal from input field to coding node j
σk         signal from coding field to output node k
wj         coding node weight vector j: (wij)
Wk         output class weight vector k: (Wjk)
ρ          vigilance variable
∧          component-wise minimum (fuzzy intersection): (p ∧ q)i = min(pi, qi)
|·|        vector size (L1-norm): |p| = Σi |pi|
p^c        vector complement: (p^c)i = 1 − pi

TABLE II
DEFAULT PARAMETER VALUES

NAME                 PARAMETER  RANGE     DEFAULT  NOTES
signal rule          α          (0, ∞)    0.01     α → 0+ maximizes code compression
learning fraction    β          [0, 1]    1.0      β = 1 implements fast learning
match tracking       ε          (−1, 1)   −0.001   MT− (ε < 0) codes inconsistent cases
baseline vigilance   ρ̄          [0, 1]    0        ρ̄ = 0 maximizes code compression
CAM rule power       p          (0, ∞]    1.0      Increased Gradient (IG) CAM rule; p → ∞ converges to WTA
training epochs      E          E ≥ 1     1        single epoch simulates on-line learning
data subsets         F          F ≥ 3     4        F-fold cross-validation
voting systems       V          V ≥ 1     5
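For convenience, the Table II defaults can be gathered into a single configuration object; a minimal sketch in which the key names are my own, not notation from the report:

```python
# Table II defaults collected in one place; key names are illustrative.
DEFAULT_PARAMS = {
    "alpha": 0.01,      # signal rule parameter (alpha -> 0+ maximizes compression)
    "beta": 1.0,        # learning fraction (beta = 1: fast learning)
    "epsilon": -0.001,  # match tracking (epsilon < 0, MT-, codes inconsistent cases)
    "rho_bar": 0.0,     # baseline vigilance (0 maximizes code compression)
    "p": 1.0,           # IG CAM rule power (p -> infinity converges to WTA)
    "epochs": 1,        # single-epoch training emulates on-line learning
    "folds": 4,         # F-fold cross-validation
    "voters": 5,        # number of voting networks
}
```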
A. Classification Methodology
This section outlines a canonical classification procedure for training and evaluating supervised learning systems, including ARTMAP.
A.1. List output classes for the supervised learning problem.
A.2. If possible, estimate an a priori distribution of output classes.
A.3. If not provided, create a ground truth set for each class by assigning output labels to a designated set of input vectors.
A.4. Divide the ground truth set into F disjoint subsets.
A.5. In each of the F subsets, designate either all ground truth inputs in that set, or P randomly chosen labeled inputs for each output class (or all inputs in a given class if fewer than P have been labeled). Fix random orderings of designated inputs in each subset.
A.6. Choose one subset for validation, one for testing, and the rest for training.
A.7. Train V systems (voters), each with E presentations of input vectors from one of the ordered training sets (Section IV.B).
A.8. For each voter, choose parameters by validation (if parameter choice is required).
A.9. Present to each voter all test set inputs. Produce an output class prediction σk for each test input (Section IV.C).
A.10. Sum the distributed output class predictions across the V voters.
A.11. Label inputs by one of three methods (breaking ties by random choice):
A.11.a. Baseline: Assign the input to the output class k with the largest summed prediction.
A.11.b. Prior probabilities: Select an output class at random according to the estimated a priori distribution in the data set. Assign that class label to the still-unlabeled input with the largest summed prediction for this class.
A.11.c. Validation: Bias the summed output class distribution, evaluating performance on the validation set. One such method [4] selects decision thresholds for each output class, with an upper bound of 10% set for each false alarm rate. Alternatively, the distributed prediction of each voter (or of the sum) could be weighted by a steepest descent algorithm. Use the biased summed distribution to label the input by the baseline or prior probabilities method.
A.12. Post-training output class adjustments:
A.12.a. Standard post-processing methods: Mapping tasks, for example, may benefit from local image smoothing. Post-processing for speckle removal may be implemented as a simple voting filter which assigns to each pixel the label originally assigned to a majority of its eight neighbors plus three copies of itself.
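A minimal sketch of such a voting filter, assuming a numpy array of integer class labels; boundary pixels are left unchanged and ties fall to an arbitrary label, choices the report does not specify:

```python
import numpy as np

def speckle_filter(labels):
    """Relabel each interior pixel by majority vote of its 8 neighbors
    plus three copies of its own label (11 votes total)."""
    out = labels.copy()
    h, w = labels.shape
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            window = labels[r - 1:r + 2, c - 1:c + 2].ravel().tolist()
            window.remove(labels[r, c])        # drop the center once...
            window += [labels[r, c]] * 3       # ...then add three copies of it
            out[r, c] = max(set(window), key=window.count)
    return out
```

An isolated one-pixel speckle is outvoted 8 to 3 by its neighbors and removed.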
A.12.b. Class distribution adjustment: Starting with the output class predictions produced by any method (Step A.11), target distribution percentages may be adjusted up or down (e.g., based on inspection of resulting classes), and class labels recomputed by the prior probabilities method.
A.12.c. False alarm rate adjustment: A decision threshold for an over-represented class may be increased to reduce the validation set false alarm rate.
A.13. Classifier evaluation: Compute average performance statistics across all combinations of training subsets (each with V voters). Classifier evaluation measures include test set output class distributions, hit and false alarm rates for each class, overall accuracy on the test set, performance variability between tasks, product appearance (e.g., for mapping, overall and by overlays for each class), and degree of improvement by post-processing.

B. Default ARTMAP Training (Winner-Take-All Code)
The default ARTMAP algorithm specified here is a special case of the distributed ARTMAP (dARTMAP) algorithm described in [13].
B.1. Complement code M-dimensional training set feature vectors a to produce 2M-dimensional input vectors A:

A = (a, a^c), with |A| = M
B.2. Set initial values: wij = 1, Wjk = 0, C = 1
B.3. Select the first input vector A, with associated actual output class K
B.4. Set initial weights for the newly committed coding node j = C: wC = A
B.5. Set vigilance ρ to its baseline value (ρ = ρ̄) and reset the code (y = 0)
B.6. Select the next input vector A, with associated actual output class K (until the last input of the last training epoch)
B.7. Calculate signals to committed coding nodes j = 1...C:

Tj = |A ∧ wj| + (1 − α)(M − |wj|)
B.8. Search order: Sort the committed coding nodes with Tj > αM in order of Tj values (max to min)
B.9. Search for a coding node J that meets the matching criterion and predicts the correct output class K, as follows:
B.9.a. Code: For the next sorted coding node (j = J) that meets the matching criterion (|A ∧ wJ| / M ≥ ρ), set yJ = 1 (WTA)
B.9.b. Output class prediction: σk = Σ_{j=1}^{C} Wjk yj = WJk
B.9.c. Correct prediction: If the active code J predicts the actual output class K (σK = WJK = 1), go to Step B.11 (learning)
B.9.d. Match tracking: If the active code J fails to predict the correct output class (σK = 0), raise vigilance:

ρ = |A ∧ wJ| / M + ε

Return to Step B.9.a (continue search).
B.10. After unsuccessfully searching the sorted list, increase C by 1 (add a committed node). Return to Step B.4
B.l l. Learning: Update coding \vcights:
'"'"' f'(A
old)(l
[·i)
old W.t=
J , 1\ WI+ -
WJ .Return to Step B.S (next input).
C. Default ARTMAP Testing (Distributed Code)
C.1. Complement code M-dimensional test set feature vectors a to produce 2M-dimensional input vectors A
C.2. Select the next input vector A, with associated actual output class K
C.3. Reset the code: y = 0
C.4. Calculate signals to committed coding nodes j = 1...C:

Tj = |A ∧ wj| + (1 − α)(M − |wj|)
C.5. Let Λ = {λ = 1...C : Tλ > αM} and Λ' = {λ = 1...C : Tλ = M} = {λ = 1...C : wλ = A}
C.6. Increased Gradient (IG) CAM Rule:
C.6.a. Point box case: If Λ' ≠ ∅ (i.e., wj = A for some j), set yj = 1 / |Λ'| for each j ∈ Λ'
C.6.b. If Λ' = ∅, set

yj = [1 / (M − Tj)]^p / Σ_{λ∈Λ} [1 / (M − Tλ)]^p

for each j ∈ Λ
C.7. Calculate distributed output class predictions:

σk = Σ_{j=1}^{C} Wjk yj
C.8. Until the last test input, return to Step C.2
C.9. Predict output classes from σk values, according to the chosen labeling method (see Step A.11)
REFERENCES
[1] S. Grossberg, "The link between brain, learning, attention, and consciousness," Consciousness and Cognition, vol. 8, pp. 1-44, 1999. ftp://cns-ftp.bu.edu/pub/diana/Gro.concog98.ps.gz
[2] S. Grossberg, "How does the cerebral cortex work? Development, learning, attention, and 3D vision by laminar circuits of visual cortex," Behavioral and Cognitive Neuroscience Reviews, in press, 2003. http://www.cns.bu.edu/Profiles/Grossberg/Gro2003BCNR.pdf
[3] G.A. Carpenter, "Distributed learning, recognition, and prediction by ART and ARTMAP neural networks," Neural Networks, vol. 10, pp. 1473-1494, 1997. http://cns.bu.edu/~gail/115dARTNN1997_.pdf
[4] O. Parsons and G.A. Carpenter, "ARTMAP neural networks for information fusion and data mining: map production and target recognition methodologies," Neural Networks, vol. 16, 2003. http://cns.bu.edu/~gail/ARTMAP_map_2003_.pdf
[5] G.A. Carpenter and S. Grossberg, "A massively parallel architecture for a self-organizing neural pattern recognition machine," Computer Vision, Graphics, and Image Processing, vol. 37, pp. 54-115, 1987.
[6] T.P. Caudell, S.D.G. Smith, R. Escobedo, and M. Anderson, "NIRS: Large scale ART 1 neural architectures for engineering design retrieval," Neural Networks, vol. 7, pp. 1339-1350, 1994. http://cns.bu.edu/~gail/NIRSCaudell.1994.pdf
[7] W. Streilein, A. Waxman, W. Ross, F. Liu, M. Braun, D. Fay, P. Harmon, and C.H. Read, "Fused multi-sensor image mining for feature foundation data," in Proceedings of the 3rd International Conference on Information Fusion, Paris, vol. 1, 2000.
[8] P. Lisboa, "Industrial use of safety-related artificial neural networks," Contract Research Report 327/2001, Liverpool John Moores University, 2001. http://www.hse.gov.uk/research/crr_pdf/2001/crr01327.pdf
[9] G.A. Carpenter, S. Grossberg, N. Markuzon, J.H. Reynolds, and D.B. Rosen, "Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps," IEEE Transactions on Neural Networks, vol. 3, pp. 698-713, 1992. http://cns.bu.edu/~gail/070_Fuzzy_ARTMAP_1992_.pdf
[10] G.A. Carpenter and W.D. Ross, "ART-EMAP: A neural network architecture for object recognition by evidence accumulation," IEEE Transactions on Neural Networks, vol. 6, pp. 805-818, 1995. http://cns.bu.edu/~gail/097ART-EMAP1995.pdf
[11] G.A. Carpenter and N. Markuzon, "ARTMAP-IC and medical diagnosis: Instance counting and inconsistent cases," Neural Networks, vol. 11, pp. 323-336, 1998. http://cns.bu.edu/~gail/117_ARTMAP-IC_1998_.pdf
[12] J.R. Williamson, "Gaussian ARTMAP: A neural network for fast incremental learning of noisy multidimensional maps," Neural Networks, vol. 9, pp. 881-897, 1998. http://cns.bu.edu/~gail/G-ART_Williamson_1998_.pdf
[13] G.A. Carpenter, B.L. Milenova, and B.W. Noeske, "Distributed ARTMAP: a neural network for fast distributed supervised learning," Neural Networks, vol. 11, pp. 793-813, 1998. http://cns.bu.edu/~gail/120.dARTMAP1998_.pdf
[14] G.A. Carpenter, S. Grossberg, and J.H. Reynolds, "ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network," Neural Networks, vol. 4, pp. 565-588, 1991. http://cns.bu.edu/~gail/054ARTMAP.1991.pdf
[15] G.A. Carpenter and M.N. Gjaja, "Fuzzy ART choice functions," Proceedings of the World Congress on Neural Networks (WCNN-94), Hillsdale, NJ: Lawrence