Neural Nets: An Introduction for Mathcad Users
by Eric Edelstein, MathSoft, Inc.
In this article, we will consider modeling a feed-forward network (a special type of weighted directed graph) after the way a brain operates and begin looking at algorithms that teach the network how to learn.
One of the things that makes humans efficient is our ability to change. This manifests itself in many ways. The first is that our brains need not be completely redesigned just to change our lunch order when we find they're out of octopus sukiyaki. This is non-trivial. Consider the advantages our flexibility has over, for example, a microchip. The electronic circuits may be able to perform many different operations, but the number is finite and the abilities don't change over time. If you have an OR gate, it must be taken apart and rebuilt if you want an AND gate. If you want it to add numbers, you have to compile quite a few components.
A human, however, can add to his/her stockpile of abilities without adding brain cells. How does this happen? No one knows exactly, but certain ideas in the theory of learning are getting clearer, and some can be modeled on a computer. We are now in the age where computers can be taught to learn new tricks. That is, a program representing a neural net can be made to learn infinitely many different routines (one at a time). That makes it extremely flexible, and hence, powerful. Neural nets have been created that mimic and anticipate human behavior, run machinery in automated factories, read books aloud, make complex financial decisions, and perform a host of other impressive tasks.
One of the most common tasks required of a neural net is the recognition of patterns and reaction to them in some manner. This will be demonstrated in this article. The reason for this particular emphasis is that once a neural net can find a pattern, it can start predicting. The art of prediction is an old one. There are many and varied statistical techniques to approximate predictions. However, neural nets have been shown to be more accurate on some occasions. Also, unlike a standard statistical program which allows for one set of analyses, the same neural net can learn to do different analyses on different kinds of data. It will just need to be retrained. However, the most fundamental difference is one of action. A neural net will not only predict, but will also act in accordance with this prediction, as we shall see later on.
Now, consider the cellular makeup of a brain: neurons. There are millions of neurons interconnected along axons. The center of a neuron receives stimuli and decides, somehow, whether or not to send a signal to neighboring neurons. If it decides to send out a signal, an electrical burst, it does so through the axons. This is how the brain makes its own predictions and actions. Given this description, the brain can be thought of as a graph with the main neuron body represented by a node or vertex and the edges representing axons.
These graphs (also called networks) contain points, called nodes or vertices; the line segments connecting these nodes are called edges. The endpoints of an edge are its vertices. An orientation of the edges is a choice of starting and ending vertex for the edge. Usually, we draw an arrow on an oriented edge pointing from the initial to the final node. If each edge of the graph has an orientation, the graph is called a directed graph (or digraph, for short).
A graph with this association and the inherent implications that brings is called a neural network (or a neural net, for short). We shall restrict ourselves to the study of neural nets of a certain form: we assume our neural nets are layered and feed-forward. These are weighted, directed graphs with nodes that can be broken up into discrete vertical layers (that is, the nodes lie on vertical slices through the graph). The orientations given to the edges are the same throughout the graph, either left to right or right to left. In this article we will use the convention of left to right. Such a digraph looks like:
With the edge orientation of left to right, the leftmost layer is called the input layer, the rightmost, the output layer, and all those between, the hidden layers. Nodes are often called units, making the leftmost ones, input units, the rightmost, output units, and those in between, hidden units.
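As a rough illustration of these definitions, a layered, feed-forward digraph can be stored as a list of layers plus a dictionary of weighted, oriented edges. This is only one possible representation: the node numbers, the weights, and the helper name `is_feed_forward` are invented for the example, and the check assumes every edge points from one layer to the layer immediately to its right.

```python
# Layers listed left to right: input layer, one hidden layer, output layer.
layers = [[1, 2], [3, 4, 5], [6]]

# Oriented, weighted edges: (initial node, final node) -> weight.
# The weights are arbitrary illustrative values.
edges = {
    (1, 3): 0.5, (1, 4): -1.0,
    (2, 4): 0.25, (2, 5): 2.0,
    (3, 6): 1.0, (4, 6): 1.0, (5, 6): -0.5,
}


def is_feed_forward(layers, edges):
    """Return True if every edge runs from a layer to the next layer on its right."""
    layer_of = {node: idx for idx, layer in enumerate(layers) for node in layer}
    return all(layer_of[end] == layer_of[start] + 1 for (start, end) in edges)


input_units = layers[0]
hidden_units = [node for layer in layers[1:-1] for node in layer]
output_units = layers[-1]

print("input:", input_units, "hidden:", hidden_units, "output:", output_units)
print("feed-forward:", is_feed_forward(layers, edges))
```

Keeping the layers explicit makes the input, hidden, and output units easy to read off, while the edge dictionary carries both the orientation and the weight of each connection.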
As mentioned earlier, we are concerned not only with the choices of edges and nodes, but also with the weighting of them. To determine what the weighting should be, let's return to the brain. If a neuron receives a very small stimulus, it does not fire. Once, however, it does receive a significant enough stimulus, it fires a complete burst. It follows the all-or-none principle. The cut-off value for stimuli is called a threshold. It is the amount of stimulus for a particular neuron below which no reaction signal will be sent.
In modeling graphs after brains, we associate to each node, $\nu$, a threshold value, $\tau_\nu$, rather like a transistor has in a logic gate.
The axon connections may be very strong or weak. That is, the signal sent from one neuron to another via a particular axon may be completely passed on, or it may be impeded. This can be thought of as the strength of the connection. The degree of connection between those two neurons will reflect how interdependent they are. This strength between them is used to define the weights on the edges in the neural net. If the weight is close to zero on an edge between two nodes, then we can think of these two units as having little effect on each other. If, on the other hand, the weights are high in absolute value, then the units' effect on each other is strong. The weight on the edge from vertex $\mu$ to vertex $\nu$ is denoted $w_{\mu\nu}$.
At this point we've completed the fundamental association between a simplified brain and a weighted, directed graph.
It remains only to show how signals are passed along. Let's say that we're in the middle of a neural net at a vertex, $\nu$. It would look something like:
Where $x_1$ through $x_4$ represent the strengths of the impulses that have been sent to this node, $\nu$. The effect of $x_1$ on $\nu$ will be determined by the strength of the connection $W_{1\nu}$. So by defining our weights correctly the effect of $x_1$ on $\nu$ will be the product $W_{1\nu} x_1$. Taking the other incoming impulses into consideration, the total impulse arriving at $\nu$ is

$$\sum_{i=1}^{4} x_i W_{i\nu}$$
The reaction at $\nu$ to this impulse must be determined. First we must see if the incoming signal passes the threshold test. To do this, subtract the threshold from the impulse and determine if the result is positive or negative. Then, a response function of some kind, called the activation function, will act on the impulse, provided it is above the threshold level. We perform these two steps as one by assuming some structure on the function. We will assume that the activation function will treat positive numbers and negative numbers differently. That is, the function values for a positive input will correspond to the neuron firing. The function values for negative input values will correspond to non-firing. With this we find the response at $\nu$ to the stimulus is:

$$f\left(\sum_{i} x_i W_{i\nu} - \tau_\nu\right)$$
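The response of a single node can be sketched in a few lines of Python. This is a minimal illustration, not the worksheet's own code: the impulses, weights, and thresholds are made-up values, and the `step` activation (1 for strictly positive input, 0 otherwise) is an assumed form of an all-or-none response.

```python
def step(x):
    """Assumed all-or-none activation: 1 for strictly positive input, else 0."""
    return 1 if x > 0 else 0


def unit_response(impulses, weights, threshold, activation=step):
    """Response f(sum_i x_i * W_i_nu - tau_nu) of a single node nu."""
    total_impulse = sum(x * w for x, w in zip(impulses, weights))
    return activation(total_impulse - threshold)


# Hypothetical impulses x1..x4 and connection strengths W_1nu..W_4nu.
x = [1.0, 0.0, 2.0, 1.0]
W = [0.5, 1.0, 1.5, -0.5]

# The total impulse here is 0.5 + 0.0 + 3.0 - 0.5 = 3.0.
print(unit_response(x, W, threshold=2.0))  # impulse exceeds the threshold
print(unit_response(x, W, threshold=4.0))  # impulse stays below the threshold
```

Passing the activation as a parameter keeps the threshold test and the response function as one step, as in the text, while still allowing a different response function to be swapped in.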
A typical activation function might be

$$f(x) := (x > 0)$$

where the comparison evaluates to 1 when it is true and 0 when it is false.

[Plot of $f(x)$ against $x$, for $x$ from $-5$ to $5$ in steps of $0.005$.]

This is an example of an all or none response. Note what it would look like when applied in a neural net with a threshold value 3:

$$\tau_\nu := 3 \qquad f(x) := (x - \tau_\nu > 0)$$

[Plot of $f(x)$ against $x$ with the threshold applied.]
To get an idea of what's going on geometrically, let's consider two impulses going to a unit with the same threshold of 3. Let's say one edge has a weight of a half, and the other a quarter.

$$w_1 := 0.5 \qquad w_2 := 0.25$$

$$f(x_1, x_2) := (x_1 w_1 + x_2 w_2 - \tau_\nu > 0)$$

so the unit outputs 1 when the weighted sum exceeds the threshold and 0 otherwise.

$$i := -5 \,..\, 10 \qquad j := -5 \,..\, 10 \qquad M_{i+5,\,j+5} := f(i, j)$$

[Surface plot of $M$.] The $z$-axis describes the node's reaction output to the two stimuli $x_1$ and $x_2$, plotted in the $x$-$y$ plane.
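The same two-input unit can be tabulated without a surface plot. The sketch below assumes the strict comparison (output 1 only when the weighted sum exceeds the threshold) and an integer grid from -5 to 10 in each direction; both are assumptions made for this illustration.

```python
W1, W2, TAU = 0.5, 0.25, 3.0  # weights of a half and a quarter, threshold 3


def f(x1, x2):
    """Reaction of the unit to the two stimuli x1 and x2 (0 or 1)."""
    return 1 if x1 * W1 + x2 * W2 - TAU > 0 else 0


grid = range(-5, 11)  # assumed sampling grid for both stimuli
M = [[f(i, j) for j in grid] for i in grid]

# Print the matrix with x1 down the rows and x2 across the columns.
for i, row in zip(grid, M):
    print(f"{i:3d} ", "".join(str(v) for v in row))
```

The printed block of 0's and 1's is split by the straight line $x_1 w_1 + x_2 w_2 = \tau_\nu$, which is the geometric picture the plot conveys: the unit fires on one side of the line and stays silent on the other.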
The neural net to the left of the node
Now that we know how a single node reacts to stimuli, we can determine the outputs of the output units for a choice of input units. We consider a very simple neural net:

There are two input nodes, 1 and 2, three hidden units, 3, 4, and 5, and one output node, 6.
Let's assign some weights to the edges.

$$(w_{13},\ w_{14},\ w_{24},\ w_{25},\ w_{36},\ w_{46},\ w_{56}) := (1,\ 1,\ 1,\ 1,\ 1,\ -2,\ 1)$$

We must decide upon an activation function. Let's choose the all-or-none response: $f(x) := (x > 0)$, which is 1 for positive input and 0 otherwise.
Pick threshold values:

$$(\tau_3,\ \tau_4,\ \tau_5,\ \tau_6) := (0,\ 1.5,\ 0,\ 0.5)$$

Now pick the input values:

$$(y_1,\ y_2) := (1,\ 1)$$
For the input layer we assume the thresholds are zero and the activation function is the identity, so that the signal put into node 1 is the same as the signal coming out from node 1.
$$y_3 := f(y_1 w_{13} - \tau_3)$$

$$y_4 := f(y_1 w_{14} + y_2 w_{24} - \tau_4)$$

$$y_5 := f(y_2 w_{25} - \tau_5)$$

$$y_6 := f(y_3 w_{36} + y_4 w_{46} + y_5 w_{56} - \tau_6)$$
The output unit for the corresponding input pattern is $y_6 = 0$.

Do you recognize this binary function? (Hint: It's one of the standard logical operations.)
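To explore the question, the six-node net can be evaluated for all four binary input patterns. The sketch below is a hedged reconstruction rather than the worksheet itself: it assumes the step activation fires only for strictly positive input and that the weight on the edge from node 4 to node 6 is -2. The function name `net_output` is made up for the example.

```python
from itertools import product


def f(x):
    """Assumed all-or-none activation: 1 for strictly positive input, else 0."""
    return 1 if x > 0 else 0


# Edge weights w[(start, end)] and thresholds tau[node] from the example.
w = {(1, 3): 1, (1, 4): 1, (2, 4): 1, (2, 5): 1,
     (3, 6): 1, (4, 6): -2, (5, 6): 1}
tau = {3: 0, 4: 1.5, 5: 0, 6: 0.5}


def net_output(y1, y2):
    """Propagate the inputs through the hidden units to the output node 6."""
    y3 = f(y1 * w[1, 3] - tau[3])
    y4 = f(y1 * w[1, 4] + y2 * w[2, 4] - tau[4])
    y5 = f(y2 * w[2, 5] - tau[5])
    return f(y3 * w[3, 6] + y4 * w[4, 6] + y5 * w[5, 6] - tau[6])


for y1, y2 in product((0, 1), repeat=2):
    print(y1, y2, "->", net_output(y1, y2))
```

Reading the four printed rows as a truth table answers the question. Note that the behavior at an input of exactly zero depends on the assumed strict comparison in `f`.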
Let's now consider how to change the net. Thinking of the graph as a brain, it seems clear that as learning goes on, the vertices (neurons) aren't going to go wandering all about. That is, as we learn, the cellular structure of the brain can't move around very much. It was found that as we learn, the chemical structure of the brain does change in small local ways. When we learn to do something, or not to do something else, various connections between the neurons are either strengthened or weakened. This corresponds to a change on the edge weights of our network. We start with the simplest type of neural net, a two layered, feed-forward net. We will show how the weight changes take place. Since there are only two layers, and in every feed-forward net there is both an input and output layer, there can be no hidden units.
We say that a layered graph is fully connected if every node in each layer is connected to every node in the next layer to the right. It generally looks like:

Note that nodes in one layer aren't connected to any other nodes in the same layer. This is always the case in layered neural nets.
There is a routine that we can carry out so that the neural net can figure out what the weights on the edges should be to realize a certain set of fixed reactions. We feed the net specific inputs with known desired outputs. We compare the network's output with the desired output and change weights accordingly. This routine is then repeated until all outputs are correct for all inputs.
Essentially, this can be thought of as a pattern recognition problem. Let's say that we have a 2 layer neural net with two input nodes, and one output. We might want to teach the net to produce the result 1 AND 2 for the output 3, using the following logic table:

    1    2    3
    0    0    0
    0    1    0
    1    0    0
    1    1    1

The net must be trained to recognize the pattern (1,1) as 1 and the other three as 0, in the same way as you apply a name to a face.
For problems of this type it is often convenient to talk about input and output patterns. We've already mentioned that the input can be thought of as a pattern. The output can be thought of as one as well. Consider a big neural net with one input node, some hidden units, and 64 output nodes arranged as an 8 by 8 square. We could train the net so that, given an input of 0, it sends 1's to the outermost units of the square, and 0's to all others. We could, in addition, teach it that, given an input of 1, it should produce outputs of 1's in the fourth and fifth columns of the square of output units, and 0's in the others. It would look like:
The 0's have been left out for clarity. The ellipse in the middle represents the hidden units. The square represents the output units in an 8 by 8 square. As you can see, the output now represents a pattern in the visual sense. The output looks like the numeral for the input (well, sort of).
As far as the computer is concerned, the neural net is a function, $f: \mathbb{R} \to \mathbb{R}^{64}$, with the following property: $f(0)$ is the pattern with 1's on the outermost units of the square, and $f(1)$ is the pattern with 1's in the fourth and fifth columns.
In this way we realize that pattern recognition and learning the action of a fixed function are the same in principle.
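To make the "net as a function" view concrete, the desired input-to-output mapping for the 8 by 8 example can be written down directly. The sketch below is not a trained net. It only spells out the target values such a net would be taught, and the name `target_pattern` and the row-by-row ordering of the 64 outputs are assumptions for the illustration.

```python
def target_pattern(bit):
    """Return the 64 desired outputs (8x8, row by row) for an input of 0 or 1."""
    if bit == 0:
        # 1's on the outermost units of the square, 0's elsewhere.
        grid = [[1 if r in (0, 7) or c in (0, 7) else 0 for c in range(8)]
                for r in range(8)]
    elif bit == 1:
        # 1's in the fourth and fifth columns, 0's elsewhere.
        grid = [[1 if c in (3, 4) else 0 for c in range(8)]
                for r in range(8)]
    else:
        raise ValueError("only the inputs 0 and 1 are defined in this example")
    return [value for row in grid for value in row]


for bit in (0, 1):
    out = target_pattern(bit)
    for r in range(8):
        # Show 1's as "1" and leave the 0's out (as dots) for clarity.
        print("".join("1" if v else "." for v in out[8 * r: 8 * r + 8]))
    print()
```

Training would then mean adjusting weights until the net's 64 outputs agree with `target_pattern(0)` and `target_pattern(1)` for the two inputs.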
With this in mind, there is a learning algorithm which teaches the two layered feed-forward neural net to recognize patterns. It works as follows:
Note: By "binary" in this section, we mean the set {-1,1} (we use -1 instead of the usual 0).

Assume we start with the edges having random weights assigned to them. Then, given an input pattern $I$ (some sequence of -1's and 1's), there is an output pattern $Z$ (a number, though in general, not the correct one) and a corresponding desired output pattern $O$ (also a number). The weights going out from the $v$th input unit must be changed by adding:

$$\Delta w_v = \varepsilon\, O\, I_v\, [1 - (O = Z)]$$

so that

$$w_v := w_v + \varepsilon\, O\, I_v\, [1 - (O = Z)]$$

Here $(O = Z)$ is a truth value, equal to 1 when $O$ equals $Z$ and 0 otherwise, and $\varepsilon$ is a small increment. We find the direction for the change from the $O\, I_v\, (1 - (O = Z))$ part. The step size is given by $\varepsilon$. Note that if the net's output is the ideal desired output (i.e., it has learned to identify that pattern or function correctly), then $O = Z$. In this case $\Delta w = 0$ for the net, so no changes will take place. This follows the "if it ain't broke, don't fix it" principle of higher computer science.
Since a function usually consists of correctly identifying several patterns (one pattern for each point in its domain), we would like to see this net learn several different patterns concurrently. This is one of the real advantages of the neural net model. It can learn several different things without changing its basic structure. You can have a neural net learn the AND function, and then with a change of weights learn the OR function. No new circuitry is needed.
The more patterns we try to make the net learn, the more likely it will incorrectly remember a previously learned pattern. Luckily, the weights won't have changed much (with $\varepsilon$ small), so we keep training and retraining. In certain cases, it has been proven that this method must converge to successful recognition of several patterns in a finite number of steps. This problem is very much like the tent peg problem. It's easy to nail in one peg, but while nailing the second peg, you've loosened the first, which then has to get rehammered.
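The weight update described above can be sketched as a small function. This is an illustrative sketch only: the name `weight_changes` and the sample values are made up, and the net's actual output $Z$ is passed in rather than computed, so that no particular activation function is assumed.

```python
def weight_changes(pattern, desired, actual, eps):
    """Return the list of changes eps * O * I_v * [1 - (O = Z)], one per input unit."""
    if desired == actual:
        # The factor [1 - (O = Z)] is 0: if it ain't broke, don't fix it.
        return [0.0 for _ in pattern]
    # The factor [1 - (O = Z)] is 1: nudge each weight by eps in the
    # direction given by the sign of O * I_v.
    return [eps * desired * i_v for i_v in pattern]


pattern = [1, -1, 1]  # hypothetical input pattern I over {-1, 1}

# Net already gives the desired output: no weight moves.
print(weight_changes(pattern, desired=-1, actual=-1, eps=0.3))

# Net gives +1 where -1 is wanted: every weight moves by 0.3.
print(weight_changes(pattern, desired=-1, actual=1, eps=0.3))
```

Because the inputs and outputs are all ±1, each change has size exactly $\varepsilon$ (or zero), which is why a small $\varepsilon$ keeps previously learned patterns from being disturbed too much.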
One final improvement before continuing. Since we want to be able to change the thresholds as the network learns, we treat them as weights for new edges. To do this we add a new node for each different threshold in the net. When we give the net its input patterns, we make sure the value of 1 goes to the nodes providing threshold values. The weight on an edge connecting such a vertex to the next layer will work as a threshold.
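A short check shows why an extra input clamped to 1 can stand in for a threshold. The sketch assumes a sign convention that the text does not spell out: the weight on the new edge is taken to be minus the threshold, so that both forms give the same net input. The function names and sample numbers are invented for the illustration.

```python
def net_input_with_threshold(inputs, weights, tau):
    """Weighted sum of the inputs with the threshold subtracted explicitly."""
    return sum(x * w for x, w in zip(inputs, weights)) - tau


def net_input_with_extra_node(inputs, weights, tau):
    """Same quantity, with the threshold folded into an extra edge.

    The extra node always receives the value 1; its edge weight is -tau
    (an assumed sign convention) so that the two sums agree.
    """
    extended_inputs = [1] + list(inputs)
    extended_weights = [-tau] + list(weights)
    return sum(x * w for x, w in zip(extended_inputs, extended_weights))


inputs, weights, tau = [1, -1], [0.5, 2.0], 1.5
print(net_input_with_threshold(inputs, weights, tau))
print(net_input_with_extra_node(inputs, weights, tau))
```

Once the threshold is just another weight, the same learning rule that adjusts the edge weights adjusts the threshold as well, with no special case needed.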
Let's try an example: Say we want the computer to come up with a neural net that will produce an AND function. We start with a net that has two input nodes, one output node, and no hidden units. This is only a guess. In general it is a difficult problem to know how many units are needed to solve your problem, and if it's solvable by these methods at all. Let's assume that all thresholds will be the same through the learning. In this case it is sufficient to add only one input node (which will always get an input value of 1). The network looks like this:
With a little foresight and a hunch based on our choice of the binary system as {-1,1} we chose the activation function accordingly:

$$f(0) := 0 \qquad f(x) := \frac{x}{|x|} \quad (x \neq 0)$$

[Plot of $f(x)$ against $x$.]

$$f(5) = 1 \qquad f(-5) = -1$$

The index $k := 0 \,..\, 2$ runs over the three weights.
We start with the weights set randomly. Let's try:

$$w_0 := 1 \qquad w_1 := 0 \qquad w_2 := 2 \qquad \varepsilon := 0.3$$

For this network, the output, $Z$, is given by:

$$Z(\nu_0, \nu_1, \nu_2) := f(\nu_0 w_0 + \nu_1 w_1 + \nu_2 w_2)$$
The first pattern (1,-1,-1). This has an ideal output of -1.

$$I := (1,\ -1,\ -1)^T \qquad O := -1$$

The actual output is: $Z_1 := f(I_0 w_0 + I_1 w_1 + I_2 w_2) = f(1 - 0 - 2) = -1$

The change of weights: $\varepsilon\, O\, I_k\, [1 - (O = Z_1)] = (0,\ 0,\ 0)$

Change the weights: $w_k := w_k + \varepsilon\, O\, I_k\, [1 - (O = Z_1)]$

New weights: $w_0 = 1 \qquad w_1 = 0 \qquad w_2 = 2$

The second pattern (1,1,-1). This has an ideal output of -1.

$$I := (1,\ 1,\ -1)^T \qquad O := -1$$

The actual output is: $Z_2 := f(I_0 w_0 + I_1 w_1 + I_2 w_2) = f(1 + 0 - 2) = -1$

The change of weights: $\varepsilon\, O\, I_k\, [1 - (O = Z_2)] = (0,\ 0,\ 0)$

Change the weights: $w_k := w_k + \varepsilon\, O\, I_k\, [1 - (O = Z_2)]$

New weights: $w_0 = 1 \qquad w_1 = 0 \qquad w_2 = 2$

The third pattern (1,-1,1). This has an ideal output of -1.

$$I := (1,\ -1,\ 1)^T \qquad O := -1$$

The actual output is: $Z_3 := f(I_0 w_0 + I_1 w_1 + I_2 w_2) = f(1 - 0 + 2) = 1$

The change of weights: $\varepsilon\, O\, I_k\, [1 - (O = Z_3)] = (-0.3,\ 0.3,\ -0.3)$

Change the weights: $w_k := w_k + \varepsilon\, O\, I_k\, [1 - (O = Z_3)]$

New weights: $w_0 = 0.7 \qquad w_1 = 0.3 \qquad w_2 = 1.7$

The fourth pattern (1,1,1). This has an ideal output of +1.

$$I := (1,\ 1,\ 1)^T \qquad O := 1$$

The actual output is: $Z_4 := f(I_0 w_0 + I_1 w_1 + I_2 w_2) = f(0.7 + 0.3 + 1.7) = 1$

The change of weights: $\varepsilon\, O\, I_k\, [1 - (O = Z_4)] = (0,\ 0,\ 0)$

Change the weights: $w_k := w_k + \varepsilon\, O\, I_k\, [1 - (O = Z_4)]$

New weights: $w_0 = 0.7 \qquad w_1 = 0.3 \qquad w_2 = 1.7$

At this point we've made a pass through each pattern exactly once. We repeat this procedure several times, until the weights stabilize. To do this, change the initial assignments of the weights to the edges (where the big red arrow is). Then page down to see what the new weights should be.
Eventually, you will see that the matrix of weight changes is zero. At this point the weights stop changing, and the output will be the correctly predicted and desired output for each pattern. This should take six complete passes starting with $w_0 = 1$, $w_1 = 0$, and $w_2 = 2$.
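Instead of paging through the worksheet by hand, the repeated passes can be sketched as a loop. The code below is an illustrative Python version under the assumptions used in the example: the activation is $f(0) = 0$ and $f(x) = x/|x|$ otherwise, the first component of each pattern is the constant 1 feeding the threshold edge, and the four patterns are presented in the order shown. The names `sign`, `PATTERNS`, and `train` are made up here.

```python
def sign(x):
    """Activation for the {-1, 1} convention: f(0) = 0, otherwise x / |x|."""
    return (x > 0) - (x < 0)


# (I0, I1, I2) with I0 = 1 for the threshold edge, paired with the desired
# AND output O in the {-1, 1} convention.
PATTERNS = [
    ((1, -1, -1), -1),
    ((1, 1, -1), -1),
    ((1, -1, 1), -1),
    ((1, 1, 1), 1),
]


def train(weights, eps=0.3, max_passes=100):
    """Repeat full passes until one pass changes no weight.

    Returns the final weights and the number of passes made, counting the
    last pass in which every change is zero.
    """
    w = list(weights)
    for passes in range(1, max_passes + 1):
        changed = False
        for pattern, desired in PATTERNS:
            actual = sign(sum(i_k * w_k for i_k, w_k in zip(pattern, w)))
            if actual != desired:  # the factor [1 - (O = Z)] equals 1
                w = [w_k + eps * desired * i_k for w_k, i_k in zip(w, pattern)]
                changed = True
        if not changed:
            return w, passes
    raise RuntimeError("weights did not stabilize within max_passes")


final_weights, passes = train([1.0, 0.0, 2.0])
print("passes:", passes)
print("weights:", [round(w_k, 2) for w_k in final_weights])
```

With these starting weights the loop reports six passes, the sixth being the first one in which no weight changes, in agreement with the count quoted above. Other starting weights or a different $\varepsilon$ can be tried by changing the arguments to `train`.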
In Future Issues: BIG, Multilayered neural nets, Gradient Descent Learning, and Back Propagation