Recurrent networks
2. OTHER RECURRENT NETWORK MODELS
discrete points in time. CTRNNs have become popular in the field of evolutionary robotics, where the weights are normally determined using artificial evolution. Finding proper weights is more complicated than in standard feedforward networks, because the recurrent connections give rise to complex intrinsic dynamics. These intrinsic dynamics can be exploited by the robot: activation that persists in the network can be used to implement a kind of short-term or working memory.
CTRNNs can be formally described as:
\[
\tau \dot{y}_i = -y_i + \sum_{j=1}^{N} w_{ij}\,\sigma(y_j - \theta_j) + \sum_{k=1}^{S} s_{ik} I_k, \qquad i = 1, 2, \ldots, N
\]
where $y_i$ is the state of neuron $i$ (and $\dot{y}_i$ its time derivative), $\tau$ is the time constant of the neuron, $N$ is the total number of neurons, $w_{ij}$ gives the strength of the connection from neuron $j$ to neuron $i$, $\sigma$ is the standard sigmoid activation function, $\theta_j$ is a bias term, $S$ is the number of sensory inputs, $I_k$ is the output of the $k$th sensor, and $s_{ik}$ is the strength of the connection from sensor $k$ to neuron $i$.
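The CTRNN equation can be simulated numerically. The following is a minimal sketch, assuming forward-Euler integration with a step size `dt`; the function names, the random weights and the constant sensor values are illustrative assumptions, not part of any particular CTRNN implementation.

```python
import numpy as np

def sigmoid(x):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-x))

def ctrnn_step(y, inputs, w, s, theta, tau, dt):
    """Advance the neuron states y by one forward-Euler step of size dt.

    y:      (N,)   neuron states y_i
    inputs: (S,)   sensor outputs I_k
    w:      (N, N) recurrent weights w_ij (from neuron j to neuron i)
    s:      (N, S) sensor weights s_ik (from sensor k to neuron i)
    theta:  (N,)   bias terms
    tau:    time constant (scalar, or one value per neuron)
    """
    dydt = (-y + w @ sigmoid(y - theta) + s @ inputs) / tau
    return y + dt * dydt

# Illustrative setup: 3 neurons, 2 sensors, random weights (all assumptions).
rng = np.random.default_rng(0)
n_neurons, n_sensors = 3, 2
w = rng.normal(size=(n_neurons, n_neurons))
s = rng.normal(size=(n_neurons, n_sensors))
theta = np.zeros(n_neurons)

y = np.zeros(n_neurons)
inputs = np.array([1.0, 0.0])  # constant sensor readings
for _ in range(1000):
    y = ctrnn_step(y, inputs, w, s, theta, tau=1.0, dt=0.01)
print(y)
```

The step size should be small relative to the time constant; otherwise the Euler approximation becomes inaccurate or unstable.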
Neurons with the characteristics described here are also called "leaky integrators": they partially decay and partially recycle their activation, and can in this way be used for short-term memory functions.
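The leaky behavior can be made explicit with a short worked example. As a simplifying assumption, consider a single neuron whose recurrent and sensory terms add up to a constant drive $c$ (a symbol introduced here only for illustration). Writing $\dot{y}$ for the time derivative of $y$, the CTRNN equation reduces to a linear first-order equation with a closed-form solution:

\[
\tau \dot{y} = -y + c
\quad\Longrightarrow\quad
y(t) = c + \bigl(y(0) - c\bigr)\, e^{-t/\tau}
\]

With $c = 0$ the activation decays exponentially and has dropped to $1/e \approx 37\%$ of its initial value after a time $\tau$. A larger time constant therefore means that the neuron retains a trace of past activation for longer.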
In a fascinating set of experiments, the computational neuroscientist Randy Beer evolved simulated creatures that had the task of distinguishing between a diamond and a circle using information from "ray sensors", i.e. sensors that measure the presence or absence of an object in a particular direction (Beer's agent had 7 such ray sensors). The agent was equipped with a CTRNN-type neural network: the input layer was attached to the ray sensors, the output layer consisted of nodes telling the agent to move left or right, and the hidden layer consisted of a fully recurrent network. Beer used a genetic algorithm to determine the weights of the recurrent part of the network, because it is very hard to find proper learning rules for recurrent networks.
An object, either a diamond or a circle, was dropped into the scene from the "top". If the object was a diamond, the agent had to move away from it; if it was a circle, it had to move towards it. The best agents, i.e. the ones that could perform the distinction most reliably, were the ones that moved left and right a few times before moving either towards or away from the object. In other words, they engaged in sensory-motor coordination, whose purpose is to generate the additional sensory stimulation needed to make the distinction. This is compatible with the principle of sensory-motor coordination. For more detail, see [Beer, 1996] or [Pfeifer and Scheier, 1999].
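The idea of evolving network weights instead of learning them can be illustrated with a minimal sketch. It is not Beer's actual algorithm, encoding or fitness measure: the selection scheme (keep the best few, mutate copies of them), all parameter values and the function `toy_fitness` are assumptions made for illustration. In an experiment like the one described above, the fitness of a weight vector would instead be obtained by simulating the agent and scoring its behavior.

```python
import numpy as np

def evolve(fitness, n_weights, pop_size=30, n_generations=50,
           n_elite=5, sigma=0.1, seed=0):
    """Evolve a real-valued weight vector.

    Each generation keeps the n_elite best individuals unchanged and
    fills the rest of the population with Gaussian-mutated copies of
    randomly chosen elite parents.
    Returns the best weight vector and the best fitness per generation.
    """
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(pop_size, n_weights))
    history = []
    for _ in range(n_generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]            # best first
        history.append(scores[order[0]])
        elite = pop[order[:n_elite]]
        parent_idx = rng.integers(n_elite, size=pop_size - n_elite)
        children = elite[parent_idx] + sigma * rng.normal(
            size=(pop_size - n_elite, n_weights))
        pop = np.vstack([elite, children])          # elites survive unchanged
    scores = np.array([fitness(ind) for ind in pop])
    history.append(scores.max())
    return pop[np.argmax(scores)], history

# Placeholder fitness (an assumption): closeness to a hidden target vector.
target = np.linspace(-1.0, 1.0, 12)

def toy_fitness(weights):
    return -np.sum((weights - target) ** 2)

best, history = evolve(toy_fitness, n_weights=target.size)
print(f"best fitness: first generation {history[0]:.3f}, final {history[-1]:.3f}")
```

Because the elite individuals are carried over unchanged and the placeholder fitness is deterministic, the best fitness never decreases from one generation to the next.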
For additional applications of CTRNNs see, for example, [Ito and Tani, 2004, Beer, 1996].
2.3. Echo state networks (ESN). Nonlinear dynamical systems are very popular in both science and engineering because the physical systems they are intended to describe are inherently nonlinear. Analytic solutions to nonlinear systems are normally hard to find, so the standard approach relies on qualitative and numerical analysis. Learning in the biological brain is itself an intrinsically nonlinear process. The echo state network therefore models the nonlinear behavior of neurons by means of an artificial recurrent neural network (RNN). What sets it apart from other kinds of artificial RNN is the large number of neurons within the RNN (on the order of 50 to 1000) and the locality of the synapses that are adjusted by learning (only those that link the RNN with the output layer). Due to this structure, the ESN benefits both from the high performance and rich dynamics of an RNN (owing to the large number of neurons) and from linear training complexity (owing to the restricted set of trained synapses). An ESN is depicted in Figure 10, taken from [Jaeger and Haas, 2004].
Figure 10. Two approaches to RNN learning: (A) Schema of previous approaches to RNN learning. (B) Schema of the ESN approach. Solid bold arrows: fixed synaptic connections; dotted arrows: adjustable connections. Both approaches aim at minimizing the error d(n) − y(n), where y(n) is the network output and d(n) is the "teacher" time series observed from the target system.
The underlying mathematics of ESNs consists of: (1) the state equation:
\[
x(n+1) = \tanh\bigl(W x(n) + w^{in} u(n+1) + w^{fb} y(n) + v(n)\bigr),
\]
where $x(n+1)$ is the network state at discrete time $n+1$, $W$ is the $N \times N$ matrix of the RNN's internal weights, $w^{in}$ is the $N$-size vector of input weights, $u(n+1)$ is the input at the current time, $w^{fb}$ is the weight vector from the output to the RNN, $y(n)$ is the output obtained previously, and $v(n)$ is the noise vector;
(2) the output equation:
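\[
y(n) = \tanh\bigl(w^{out}\,(x(n), u(n))\bigr),
\]
where $(x(n), u(n))$ denotes the network state joined with the input, and $w^{out}$ is an $(N+1)$-sized weight vector to the output layer.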
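The two equations, together with the fact that only the output weights are trained, can be sketched in a few lines of NumPy. This is a minimal illustration under several assumptions that are not taken from the text: a single input and a single output, a dense reservoir rescaled to spectral radius 0.9, no noise (v(n) = 0), teacher forcing (the teacher d(n) is fed back in place of y(n) during training), an initial "washout" period that is discarded, and a toy sine-wave task. The output weights are obtained by ordinary least squares on arctanh(d(n)), which is a linear problem.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, washout = 50, 600, 100   # reservoir size, series length, discarded steps

# Fixed random weights: these are never trained.
W = rng.normal(size=(N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # assumption: spectral radius 0.9
w_in = rng.uniform(-1, 1, size=N)
w_fb = rng.uniform(-1, 1, size=N)

# Toy task (an assumption): output a scaled, phase-shifted copy of the input.
steps = np.arange(T)
u = np.sin(0.2 * steps)
d = 0.5 * np.sin(0.2 * steps + 1.0)   # teacher signal, |d| < 1 so arctanh is defined

# State equation with teacher forcing and zero noise.
x = np.zeros((T, N))
for n in range(T - 1):
    x[n + 1] = np.tanh(W @ x[n] + w_in * u[n + 1] + w_fb * d[n])

# Output equation y(n) = tanh(w_out . (x(n), u(n))):
# solve w_out . (x(n), u(n)) ~ arctanh(d(n)) by linear least squares.
Z = np.hstack([x[washout:], u[washout:, None]])   # rows are (x(n), u(n))
w_out, *_ = np.linalg.lstsq(Z, np.arctanh(d[washout:]), rcond=None)

y = np.tanh(Z @ w_out)
mse = np.mean((y - d[washout:]) ** 2)
print(f"training MSE: {mse:.2e}")
```

Only `w_out` is fitted; `W`, `w_in` and `w_fb` keep their random initial values, which is what reduces training to a single linear regression.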
The neurons in the artificial recurrent neural network of the ESN are sparsely connected, with an interconnectivity of about 1%. This decomposes the RNN into loosely coupled subsystems and ensures a rich variation within it. Due to the recurrency and the feedback received from the output neurons, a bidirectional dynamical interplay unfolds between the internal and the external signals. The excitation in the internal layer is viewed as an echo of the signals coming from the output layer.
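A sparsely connected internal weight matrix of this kind can be generated as in the following sketch. The function name `sparse_reservoir`, the uniform weight range and the rescaling to a fixed spectral radius are assumptions made for illustration; only the connectivity of about 1% comes from the text.

```python
import numpy as np

def sparse_reservoir(n_neurons, density=0.01, spectral_radius=0.9, seed=0):
    """Return an (n_neurons, n_neurons) weight matrix in which each
    connection exists independently with probability `density`.

    Nonzero weights are drawn uniformly from [-1, 1] and the matrix is
    then rescaled so that its largest absolute eigenvalue equals
    `spectral_radius` (an assumption, commonly used to keep the
    reservoir dynamics from blowing up).
    """
    rng = np.random.default_rng(seed)
    mask = rng.random((n_neurons, n_neurons)) < density
    weights = np.where(mask, rng.uniform(-1, 1, size=mask.shape), 0.0)
    rho = np.max(np.abs(np.linalg.eigvals(weights)))
    if rho > 0:
        weights *= spectral_radius / rho
    return weights

W = sparse_reservoir(500)
fraction = np.count_nonzero(W) / W.size
print(f"fraction of nonzero weights: {fraction:.4f}")
```

For small networks a density of 1% leaves very few connections (about 25 for 50 neurons), so the sketch uses 500 neurons.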
The ESN represents a powerful tool for time series prediction, inverse modeling (e.g. inverse kinematics in robotics), pattern generation, classification (of time series) and nonlinear control. Its biologically inspired features also make it suitable as a model of prefrontal cortex function in sensory-motor tasks, of birdsong, of the cerebellum, etc.
In this chapter we have looked at recurrent networks. Because of the loops, recurrent networks have intrinsic dynamics, i.e. their activation changes even if there is no input, a characteristic that is fundamentally different from the previously discussed feed-forward networks. These properties lead to highly interesting behavior, and we need to apply the concepts and terminology of complex dynamical systems to describe it. We have only scratched the surface, but this kind of network is highly promising and reflects, at least to some extent, properties of biological networks. Next we will discuss non-supervised learning.
CHAPTER 6