7.6 Storage capacity


The storage prescription attempts to capture information about the mean correlation of components in the training set. As such, it must induce a weight set that is a compromise as far as any individual pattern is concerned. Clearly, as the number m of patterns increases, the chances of accurate storage must decrease since more trade-offs have to be made between pattern requirements. In some empirical work in his 1982 paper, Hopfield showed that about half the memories were stored accurately in a net of N nodes if m=0.15N. The other patterns did not get stored as stable states. In proving general results of this type rigorously, it is not possible to say anything about particular sets of patterns, so all results deal with probabilities and apply to a randomly selected training set. Thus McEliece et al. (1987) showed that for m<N/(2 log N), where log is the natural logarithm, as N becomes very large, the probability that there is a single error in storing any one of the patterns becomes ever closer to zero. To give an idea of what this implies, for N=100 this result gives m≈11.
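The arithmetic behind the N=100 figure can be checked with a short sketch. The function name `capacity_bound` is hypothetical, and the natural logarithm is assumed because it is the reading that reproduces the quoted value of about 11.

```python
import math


def capacity_bound(n_nodes: int) -> float:
    """Bound N / (2 ln N) on the number of patterns m (natural log assumed)."""
    return n_nodes / (2.0 * math.log(n_nodes))


n = 100
bound = capacity_bound(n)          # roughly 10.86, i.e. about 11 patterns
empirical = 0.15 * n               # Hopfield's empirical figure, 15 patterns

print(f"N={n}: rigorous bound m < {bound:.2f}, empirical m = {empirical:.0f}")
```

The rigorous bound is noticeably stricter than the empirical 0.15N figure, which only required about half the memories to be stored accurately.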

7.7 The analogue Hopfield model

In a second important paper (Hopfield 1984) Hopfield introduced a variant of the discrete time model discussed so far that uses leaky-integrator nodes. The other structural difference is that there is provision for external input. The network dynamics are now governed by the system of equations (one for each node) that define the node dynamics and require computer simulation for their evaluation.

In the previous TLU model, the possible states are vectors of Boolean-valued components and so, for an N-node network, they have a geometric interpretation as the corners of the N-dimensional hypercube. In the new model, because the outputs can take any values between 0 and 1, the possible states now include the interior of the hypercube. Hopfield defined an energy function for the new network and showed that, if the inputs and thresholds were set to zero, as in the TLU discrete time model, and if the sigmoid was quite “steep”, then the energy minima were confined to regions close to the corners of the hypercube and these corresponded to the energy minima of the old model. The use of a sigmoid output function in this model has the effect of smoothing out some of the smaller, spurious minima, in much the same way that the Boolean model can escape spurious minima when a sigmoid is used to make node firing probabilistic.
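The behaviour described above can be illustrated with a minimal simulation sketch. The equations themselves are not reproduced in this section, so the leaky-integrator form used here (rate of change of activation = net input minus activation, integrated with Euler steps) is an assumed standard form, and `gain`, `dt`, `steps`, the six-node pattern and the starting activations are all hypothetical choices.

```python
import math


def sigmoid(activation, gain):
    """Output between 0 and 1; a larger gain gives a steeper sigmoid."""
    return 1.0 / (1.0 + math.exp(-gain * activation))


def hebb_weights(pattern):
    """Symmetric weights from one +/-1 pattern, with no self-connections."""
    n = len(pattern)
    return [[0.0 if i == j else float(pattern[i] * pattern[j])
             for j in range(n)] for i in range(n)]


def simulate(weights, activations, external, gain=10.0, dt=0.1, steps=200):
    """Euler integration of assumed dynamics da_i/dt = -a_i + sum_j w_ij y_j + I_i."""
    n = len(activations)
    a = list(activations)
    for _ in range(steps):
        y = [sigmoid(ai, gain) for ai in a]
        net = [sum(weights[i][j] * y[j] for j in range(n)) + external[i]
               for i in range(n)]
        a = [a[i] + dt * (net[i] - a[i]) for i in range(n)]
    return [sigmoid(ai, gain) for ai in a]


pattern = [1, -1, 1, -1, 1, -1]                # hypothetical stored pattern
start = [0.2, -0.2, 0.1, -0.1, 0.3, -0.3]      # state in the hypercube interior
final_outputs = simulate(hebb_weights(pattern), start, external=[0.0] * 6)
print([round(y, 3) for y in final_outputs])
```

With zero external input and a steep sigmoid, the outputs in this example drift from the interior of the hypercube towards the corner that corresponds to the stored pattern.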

There are two further advantages to the new model. The first is that it is possible to build the new neurons out of simple, readily available hardware. In fact, Hopfield writes the equation for the dynamics as if it were built from such components (operational amplifiers and resistors). This kind of circuit was the basis of several implementations; see, for example, Graf et al. (1987). The second is a more philosophical one in that the use of the sigmoid and time integration makes greater contact with real, biological neurons.

7.8 Combinatorial optimization

Although we have concentrated exclusively on their role in associative recall, there is another class of problems that Hopfield nets can be used to solve, which are best introduced by an example. In the so-called travelling salesman problem (TSP) a salesman has to complete a round trip of a set of cities, visiting each one only once and in such a way as to minimize the total distance travelled. An example is shown in Figure 7.13, in which a set of ten cities has been labelled from A to J and a solution indicated by the linking arrows. This kind of problem is computationally very difficult and it can be shown that the time to compute a solution grows exponentially with the number of cities N.

In 1985, Hopfield and Tank showed how this problem can be solved by a recurrent net using analogue nodes of the type described in the previous section. The first step is to map the problem onto the network so that solutions correspond to states of the network. The problem for N cities may be coded into an N by N network as follows. Each row of the net corresponds to a city and the ordinal position of the city in the tour is given by the node at that place outputting a high value (nominally 1) while the rest are all at very low values (nominally 0). This scheme is illustrated in Figure 7.14 for the tour of Figure 7.13.

Since the trip is a closed one it doesn’t matter which city is labelled as the first and this has been chosen to be A. This corresponds in the net to the first node in the row for A being “on” while the others in this row are all “off”. The second city is F, which is indicated by the second node in the row for F being “on” while the rest are “off”. Continuing in this way, the tour eventually finishes with city I in tenth position.
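The coding scheme can be sketched in a few lines. The four-city tour below is a hypothetical example rather than the ten-city tour of Figure 7.13, and the function names `tour_to_state` and `state_to_tour` are illustrative.

```python
def tour_to_state(tour, cities):
    """Return an N x N matrix of 0/1 outputs: row = city, column = tour position."""
    n = len(cities)
    state = [[0] * n for _ in range(n)]
    for position, city in enumerate(tour):
        state[cities.index(city)][position] = 1
    return state


def state_to_tour(state, cities):
    """Read the tour back by finding the 'on' node in each column."""
    n = len(cities)
    tour = []
    for position in range(n):
        column = [state[row][position] for row in range(n)]
        tour.append(cities[column.index(1)])
    return tour


cities = ["A", "B", "C", "D"]
tour = ["A", "C", "D", "B"]        # hypothetical tour, starting at A
state = tour_to_state(tour, cities)
for city, row in zip(cities, state):
    print(city, row)
```

Each row and each column of the resulting matrix contains exactly one 1, which is the property that the energy function has to enforce.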

The next step is to construct an energy function that can eventually be rewritten in the form of (7.3) and has minima associated with states that are valid solutions. When this is done, we can then identify the resulting coefficients with the weights $w_{ij}$. We will not carry this through in detail here but will give an example of the way in which constraints may be captured directly via the energy. The main condition on solution states is that they should represent valid tours; that is, there should be only one node “on” in each row and column. Let nodes be indexed according to their row and column so that $y_{Xi}$ is the output of the node for city X in tour position i, and consider the sum

$$\sum_{X}\sum_{i}\sum_{j\neq i} y_{Xi}\,y_{Xj} \tag{7.10}$$

Each term is the product of a pair of single city outputs with different tour positions. If all rows contain only a single “on” unit, this sum is zero, otherwise it is positive. Thus, this term will tend to encourage rows to contain at most a single “on” unit. Similar terms may be constructed to encourage single units being “on” in columns, the existence of exactly ten units “on” in the net and, of course, to foster a shortest tour. When all these terms are combined, the resulting expression can indeed be written in the form of (7.3) and a set of weights extracted³. The result is that there are negative weights (inhibition) between nodes in the same row, and between nodes in the same column. The path-length criterion leads to inhibition between adjacent columns (cities in a path) whose strength is proportional to the path length between the cities for those nodes. By allowing the net to run under its dynamics, an energy minimum is reached that should correspond to a solution of the problem. What is meant by “solution” here needs a little qualification. The net is not guaranteed to produce the shortest tour but only one that is close to it; the complexity of the problem does not vanish simply because a neural net has been used.
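The constraint terms can be sketched as plain functions of the network state. Only the row term (products of outputs for the same city at different tour positions) is given explicitly in the text; the column, count and path-length terms below are assumed forms in the same spirit, and the four-city distance matrix is a hypothetical example.

```python
import math


def row_penalty(state):
    """Sum over cities of products of outputs at different tour positions."""
    n = len(state)
    return sum(state[x][i] * state[x][j]
               for x in range(n) for i in range(n) for j in range(n) if j != i)


def column_penalty(state):
    """Assumed analogue for columns: two different cities in the same position."""
    n = len(state)
    return sum(state[x][i] * state[y][i]
               for i in range(n) for x in range(n) for y in range(n) if y != x)


def count_penalty(state):
    """Assumed term that is zero only when exactly N units are 'on'."""
    n = len(state)
    return (sum(sum(row) for row in state) - n) ** 2


def tour_length_term(state, dist):
    """Assumed path-length term: distances between cities in adjacent positions."""
    n = len(state)
    total = 0.0
    for x in range(n):
        for y in range(n):
            if y != x:
                for i in range(n):
                    neighbours = state[y][(i + 1) % n] + state[y][(i - 1) % n]
                    total += dist[x][y] * state[x][i] * neighbours
    return 0.5 * total  # each adjacent pair is counted twice


r2 = math.sqrt(2.0)
dist = [[0, 1, r2, 1], [1, 0, 1, r2], [r2, 1, 0, 1], [1, r2, 1, 0]]  # unit square
valid = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]     # tour A, B, C, D
invalid = [[1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]   # A appears twice

print(row_penalty(valid), row_penalty(invalid), tour_length_term(valid, dist))
```

For the valid state all three constraint terms vanish and the path-length term equals the length of the closed tour; for the invalid state the row term is positive.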

The TSP is an example of a class of problems that are combinatorial in nature since they are defined by ordering a sequence or choosing a series of combinations. Another example is provided from graph theory. A graph is simply a set of vertices connected by arcs; the state transition diagram in Figure 7.6 is a graph, albeit with the extra directional information on the arcs. A clique is a set of vertices such that every pair is connected by an arc. The problem of finding the largest clique in a graph is one of combinatorial optimization and also takes time which is exponentially related to the number of vertices.

Jagota (1995) has demonstrated the possibility of using a Hopfield net to generate near optimal solutions using a network with the same number of nodes as graph vertices.
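To make the clique definition concrete, and to show where the exponential cost comes from, here is a brute-force sketch that tries vertex subsets from largest to smallest. It is not the Hopfield-net method referred to above, and the five-vertex graph is a hypothetical example.

```python
from itertools import combinations


def is_clique(vertices, edges):
    """True if every pair of the given vertices is connected by an arc."""
    return all(frozenset(pair) in edges for pair in combinations(vertices, 2))


def largest_clique(n_vertices, edges):
    """Exhaustive search over subsets; the number of subsets grows as 2**n."""
    for size in range(n_vertices, 0, -1):
        for subset in combinations(range(n_vertices), size):
            if is_clique(subset, edges):
                return set(subset)
    return set()


# Hypothetical undirected graph on vertices 0..4.
edges = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]}
best = largest_clique(5, edges)
print(best)
```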

Figure 7.13 Travelling salesman problem: an example.

7.9 Feedforward and recurrent associative nets

Consider the completely connected, feedforward net shown in Figure 7.15a. It has the same number of inputs and outputs and may therefore be used as an associative memory as discussed in Section 7.2 (for greater realism we might suppose the net has more than three nodes and that the diagram is schematic and indicative of the overall structure). Suppose now that the net processes an input pattern. We imagine that, although the net may not have managed to restore the pattern to its original form (perfect recall), it has managed to produce something that is closer to the stored memory pattern than the original input. Now let this pattern be used as a new input to the net. The output produced now will, in general, be even closer to a stored memory and, iterating in this way, we might expect eventually to restore the pattern exactly to one of the stored templates.

To see how this is related to the recurrent nets discussed in this chapter, consider Figure 7.15b, which shows the network output being fed back to its input (the feedback connections are shown stippled). Making the process explicit in this way highlights the necessity for some technical details: there must be some mechanism to allow either the feedback or external input to be sent to the nodes and, during feedback, we must ensure that new network outputs about to be generated do not interfere with the recurrent inputs. However, this aside, the diagram allows for the iterative recall described above.

Figure 7.15c shows a similar net, but now there are no feedback connections from a node to itself; as a result, we might expect a slightly different performance.

Now, each input node may be thought of as a distribution point for an associated network output node. Part (d) of the figure shows the input nodes having been subsumed into their corresponding output nodes while still showing the essential network connectivity. Assuming a suitable weight symmetry, we now clearly have a recurrent net with the structure of a Hopfield net.

Since, in the supposed dynamics, patterns are fed back in toto, the recurrent net must be using synchronous dynamics (indeed the mechanism for avoiding signal conflict in the “input nodes” above may be supplied by sufficient storage for the previous and next states, a requirement for such nets as discussed in Sect. 7.4.3).

In summary we see that using a feedforward net in an iterated fashion for associative recall may be automated in an equivalent recurrent (feedback) net. Under suitable assumptions (no self-feedback, symmetric weights) this becomes a Hopfield net under synchronous dynamics, since whole patterns are fed back at each step. Alternatively, we could have started with the Hopfield net and “unwrapped” it, as it were, to show how it may be implemented as a succession of forward passes in an equivalent feedforward net.
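The iterated use of a feedforward net can be sketched as a loop of forward passes in which each output pattern becomes the next input. For brevity the sketch uses outputs of +1 and -1 rather than 1 and 0, which is a simplifying assumption; the two stored patterns, the `max_passes` cap and the function names are hypothetical.

```python
def store(patterns):
    """Symmetric weights with no self-feedback, summed over +/-1 patterns."""
    n = len(patterns[0])
    return [[0 if i == j else sum(p[i] * p[j] for p in patterns)
             for j in range(n)] for i in range(n)]


def forward_pass(weights, pattern):
    """One pass through the net: every node updates from the same input pattern."""
    n = len(pattern)
    output = []
    for i in range(n):
        activation = sum(weights[i][j] * pattern[j] for j in range(n))
        output.append(1 if activation >= 0 else -1)
    return output


def iterated_recall(weights, pattern, max_passes=20):
    """Feed each output back as the next input until the pattern stops changing."""
    current = list(pattern)
    for passes in range(1, max_passes + 1):
        output = forward_pass(weights, current)
        if output == current:
            return current, passes
        current = output
    return current, max_passes  # synchronous updates may cycle, hence the cap


p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, 1, -1, -1, 1, 1, -1, -1]
weights = store([p1, p2])
corrupted = [-1, 1, 1, 1, -1, -1, -1, -1]      # p1 with its first component flipped
recalled, passes = iterated_recall(weights, corrupted)
print(recalled, passes)
```

Because every node is updated from the same previous pattern, the loop is exactly a recurrent net run synchronously, with `current` and `output` playing the role of the storage for the previous and next states.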

Figure 7.14 Network state for TSP solution.

Figure 7.15 Relation between feedforward and recurrent associative nets.

Hopfield nets consist of TLUs with zero threshold in which every node takes input from all other nodes (except itself) and where the interunit weights are symmetric.

Under asynchronous operation, each node evaluates its inputs and “fires” accordingly. This induces a state transition and it is possible (in principle) to describe completely the network dynamics by exhaustively determining all possible state transitions. Alternative dynamics are offered by running the net synchronously, so that the net operation is deterministic. The stable states are, however, the same as under asynchronous dynamics. For nets of any significant size the problem of finding the state transition table is intractable but, since we are only interested in equilibrium states, this is not too important. It is here that the energy formalism comes into its own and enables us to demonstrate the general existence of stable states. The energy is defined by thinking of the net as instantiating a series of pairwise constraints (via the weights) on the current network state.
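The role of the energy can be illustrated with a small sketch of asynchronous operation. The quadratic form used for the energy, E = -1/2 Σ w_ij x_i x_j, is an assumption about the expression the chapter refers to as (7.3), which is not reproduced in this section; the four-node weight matrix, the starting state and the random seed are hypothetical.

```python
import random


def energy(weights, state):
    """Assumed energy: E = -1/2 * sum over i, j of w_ij * x_i * x_j."""
    n = len(state)
    return -0.5 * sum(weights[i][j] * state[i] * state[j]
                      for i in range(n) for j in range(n))


def update_node(weights, state, k):
    """Zero-threshold TLU: output 1 or 0 by sign of activation, unchanged at zero."""
    activation = sum(weights[k][j] * state[j] for j in range(len(state)))
    if activation > 0:
        return 1
    if activation < 0:
        return 0
    return state[k]


def run_async(weights, state, seed=0):
    """Update one node at a time in random order until no node changes."""
    rng = random.Random(seed)
    state = list(state)
    energies = [energy(weights, state)]
    changed = True
    while changed:
        changed = False
        order = list(range(len(state)))
        rng.shuffle(order)
        for k in order:
            new_output = update_node(weights, state, k)
            if new_output != state[k]:
                state[k] = new_output
                changed = True
                energies.append(energy(weights, state))
    return state, energies


# Hypothetical symmetric weights with zero diagonal (no self-feedback).
weights = [[0, 1, -2, 1], [1, 0, 1, -1], [-2, 1, 0, 2], [1, -1, 2, 0]]
final_state, energies = run_async(weights, [1, 0, 1, 0])
print(final_state, energies)
```

Whatever order the nodes fire in, each recorded change lowers the energy in this sketch, so the run must end in a state that no single node update can alter.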

If these constraints (the weights) are imposed by the pairwise statistics of the training set, then these will tend to form the stable states of the net and, therefore, constitute stored memories. Although the weights may be calculated directly, they may also be thought of as evolving under incremental learning according to a rule based on a description of biological synaptic plasticity due to Hebb.
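The equivalence between calculating the weights directly and letting them evolve incrementally can be sketched as follows. The product form of the rule, applied to patterns recoded from 0/1 to -1/+1, is assumed here since the prescription itself is not restated in this section; the two training patterns are hypothetical.

```python
def to_bipolar(pattern):
    """Recode 0/1 components as -1/+1."""
    return [2 * x - 1 for x in pattern]


def hebb_direct(patterns):
    """Calculate each weight in one go as a sum of pairwise products."""
    vectors = [to_bipolar(p) for p in patterns]
    n = len(vectors[0])
    return [[0 if i == j else sum(v[i] * v[j] for v in vectors)
             for j in range(n)] for i in range(n)]


def hebb_incremental(patterns):
    """Present patterns one at a time, nudging each weight after every pattern."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        v = to_bipolar(pattern)
        for i in range(n):
            for j in range(n):
                if i != j:
                    # Strengthen the link if the two nodes agree, weaken it otherwise.
                    weights[i][j] += v[i] * v[j]
    return weights


patterns = [[1, 0, 1, 0, 1], [1, 1, 0, 0, 1]]  # hypothetical training set
direct = hebb_direct(patterns)
incremental = hebb_incremental(patterns)
print(direct == incremental)
```

Both routes give the same symmetric weight matrix with a zero diagonal, which is the structure required of a Hopfield net.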

Hopfield nets always store unwanted or spurious states as well as those required by the training set. In most cases this is not a problem since their energy is much higher than that of the training set. As a consequence their basins of attraction are smaller and they may be avoided by using a small amount of noise in the node operation. In order to store a given number of patterns reliably the net must exceed a certain size determined by its storage capacity.

Hopfield developed an analogue version of the binary net and also showed how problems in combinatorial optimization may be mapped onto nets of this type.

7.11 Notes

1. Since the Hopfield net is recurrent, any output serves as input to other nodes and so this notation is not inconsistent with previous usage of x as node input.

2. In this discussion the term “state” refers to the arrangement of outputs of i and j rather than the net as a whole.

3. The analogue net has, in fact, an extra linear component due to the external input.


In document Kevin Gurney - An Introduction to Neural Networks (Page 75-79)