5.6 Reservoir Computing
5.6.2 Mathematical model
Let $n_i$ represent the number of input units, $n_r$ the number of reservoir units and $n_o$ the number of output units, $u(n)$ the $n_i$-dimensional external input, $x(n)$ the $n_r$-dimensional reservoir activation state and $y(n)$ the $n_o$-dimensional target output. The discrete-time dynamics of the ESN are given by the state update equation 5.21, which is similar to the RNN internal activation in equation 5.7.
$$x(n+1) = \tanh\left(W^r_r x(n) + W^r_i u(n) + W^r_o y(n) + W^r_b\right) \qquad (5.21)$$
where the weights $W^{to}_{from}$ are described in table 5.3 and represent the connection weights between the nodes of the complete network (Fig. 5.15). The indices $b$, $i$, $r$ and $o$ denote bias, input, reservoir and output respectively. The output is computed as
$$y(n+1) = g\left(W^o_r x(n+1) + W^o_i u(n) + W^o_o y(n) + W^o_b\right) = g\left(W^{out}(x(n+1), u(n), y(n), 1)\right) = g\left(W^{out} z(n+1)\right)$$
where $g$ is a post-processing activation function and $z(n+1) = (x(n+1), u(n), y(n), 1)$ is the extended reservoir state, which includes the previous input and output vectors and a bias term. The weight matrices that represent the connections to the reservoir, $W^r_\cdot$, are randomly initialized and are represented by solid arrows in figure 5.15. The output weights $W^o_\cdot$ are trained and are represented by dashed arrows in figure 5.15. Output feedback is given by the projection $W^r_o y(n)$ and the bias by $W^\cdot_b$.

[Figure 5.15: block diagram showing the input layer, the reservoir and the output layer; labels: U, a, tanh, X, m, g, Y, 1 and two unit delays z^-1.]
Fig. 5.15.: Echo State Network mapping scheme.

Signals
  u          input signal
  y          output signal
  x          reservoir state
  a          weighted sum for reservoir units
  m          weighted sum for output units
Weights
  $W^r_i$    input to reservoir connection matrix
  $W^r_b$    bias to reservoir connection matrix
  $W^r_r$    reservoir connection matrix
  $W^r_o$    output to reservoir connection matrix
  $W^o_i$    input to output connection matrix
  $W^o_r$    reservoir to output connection matrix
  $W^o_o$    output to output connection matrix
  $W^o_b$    bias to output connection matrix
Tab. 5.3.: Elements of figure 5.15

As stated before, the non-trainable weights $W^r_\cdot$ are generated using a sparse, uniformly distributed random matrix function with a given connectivity, which corresponds to the percentage of non-zero weights in the respective connection matrix $W^{to}_{from}$. A scaling factor is also applied to each connection matrix $W^{to}_{from}$, so that all of its weights are multiplied by that factor. Note that for the reservoir connection matrix $W^r_r$, the rescaling may in some cases leave a few eigenvalues slightly outside the unit circle, while the reservoir should still exhibit rich dynamics. The reservoir should also guarantee the echo state property (ESP) (Jaeger, 2001), which means that the reservoir should have a fading memory. The spectral radius $\rho$ is the largest absolute eigenvalue of the reservoir connection matrix $W^r_r$ and should be less than unity, or else the ESP will be violated. For most applications, the best performance is attained with a reservoir that operates at the edge of stability, $\rho(W^r_r) = 0.99$; different values will be shown in Chapter 6.
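The generation procedure described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this work: the helper names (`sparse_uniform`, `spectral_radius`, `make_reservoir`, `update_state`), the weight range $[-1, 1]$, the 10 % reservoir connectivity and the dense input, feedback and bias matrices are all hypothetical choices. Only the rescaling to $\rho(W^r_r) = 0.99$ and the form of equation 5.21 come from the text.

```python
import numpy as np

def sparse_uniform(rows, cols, connectivity, scale, rng):
    """Uniform weights in [-1, 1] (assumed range), each kept with
    probability `connectivity`, then multiplied by `scale`."""
    w = rng.uniform(-1.0, 1.0, size=(rows, cols))
    mask = rng.random((rows, cols)) < connectivity
    return scale * w * mask

def spectral_radius(w):
    """Largest absolute eigenvalue of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(w)))

def make_reservoir(n_i, n_r, n_o, rng, connectivity=0.1, rho=0.99):
    """Random non-trainable weights; W_rr is rescaled to spectral radius `rho`."""
    w_rr = sparse_uniform(n_r, n_r, connectivity, 1.0, rng)
    w_rr *= rho / spectral_radius(w_rr)
    w_ri = sparse_uniform(n_r, n_i, 1.0, 1.0, rng)       # input -> reservoir
    w_ro = sparse_uniform(n_r, n_o, 1.0, 1.0, rng)       # output -> reservoir
    w_rb = sparse_uniform(n_r, 1, 1.0, 1.0, rng)[:, 0]   # bias -> reservoir
    return w_rr, w_ri, w_ro, w_rb

def update_state(x, u, y, w_rr, w_ri, w_ro, w_rb):
    """State update of equation 5.21."""
    return np.tanh(w_rr @ x + w_ri @ u + w_ro @ y + w_rb)

rng = np.random.default_rng(0)
w_rr, w_ri, w_ro, w_rb = make_reservoir(n_i=1, n_r=50, n_o=1, rng=rng)
x = update_state(np.zeros(50), np.array([0.5]), np.array([0.0]),
                 w_rr, w_ri, w_ro, w_rb)
print(spectral_radius(w_rr), x.shape)
```

Dividing by the current spectral radius before multiplying by the target value is one common way to place the largest eigenvalue exactly at the chosen radius; other scaling schemes are possible.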
5.6.3 Training
With the introduction and mathematical model presented in the previous sections, we proceed to the formal method for training an ESN for the task addressed in this research. For this task we assume that the output units are sigmoid units, that the output layer contains feedback connections, and that training is supervised. The formal process is as follows:
1. Data processing. Create or obtain the input and output data for the training and testing samples; the data must be consistent with the network structure.
2. Reservoir creation. Randomly generate the dynamical reservoir weights $W^{in}$, $W$ and $W^{back}$ (these correspond to $W^r_i$, $W^r_r$ and $W^r_o$ in equation 5.21). The reservoir must satisfy the echo state property, with a spectral radius $\rho_{max} < 1$, and should be sparse with a rich variety of dynamics. The number of neurons $N$ should reflect both the length $T$ of the training data and the difficulty of the task (difficult tasks require a larger $N$). To prevent over-fitting, $N$ should not exceed an order of magnitude of $T/10$ to $T/2$. The spectral radius $\rho$ should be small for fast teacher dynamics and large for slow teacher dynamics.
3. Sample training. Feed the training input and output samples to the network, update the network state using equation 5.23, and collect the concatenated input/reservoir/previous-output states $(u(n), x(n), y(n-1))$ as a new row of a state-collecting matrix $M$. The teacher output $\tanh^{-1} y(n)$ should likewise be saved as a new row of a teacher-collecting matrix $T$.
$$x(n+1) = \tanh\left(W^{in}u(n) + Wx(n) + W^{back}y(n)\right) \qquad (5.23)$$
4. Compute output weights. Calculate the output weights by multiplying the pseudo-inverse of $M$ with $T$, as shown in equation 5.24. The product is $(W^{out})^t$, which must be transposed to obtain the desired output weight matrix $W^{out}$.
5. Usage. The network ($W^{in}$, $W$, $W^{back}$ and $W^{out}$) is now ready to be exploited and can be driven with novel (testing) data sequences using equations 5.25 and 5.26. The MSE on the training and testing data should be calculated to verify that the ESN is working properly; if it is not, the process can be repeated or optimised until the desired MSE is reached on the testing data.
$$x(n+1) = \tanh\left(W^{in}u(n) + Wx(n) + W^{back}y(n)\right) \qquad (5.25)$$
$$y(n+1) = \tanh\left(W^{out}(x(n+1), u(n+1), y(n))\right) \qquad (5.26)$$
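The five steps above can be sketched end to end in Python with NumPy. This is a minimal illustration under several assumptions that are not taken from the text: the toy task (a sinusoidal input with a cubic-distorted teacher), the reservoir size, the spectral radius of 0.8, the weight ranges and the initial washout period are all hypothetical. The pseudo-inverse step is written as $(W^{out})^t = M^{+} T$, following the verbal description in step 4, and the column order of $M$ follows equation 5.26, i.e. $(x, u, y)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_train, n_test, washout = 100, 400, 200, 100

# Step 1 (toy data, an assumption): sinusoidal input, teacher inside (-1, 1).
n = np.arange(n_train + n_test)
u = np.sin(0.2 * n)[:, None]          # inputs, shape (time, 1)
d = 0.5 * u ** 3                      # teacher outputs, shape (time, 1)

# Step 2: sparse random reservoir rescaled to spectral radius 0.8 (assumed).
W = rng.uniform(-1, 1, (n_r, n_r)) * (rng.random((n_r, n_r)) < 0.1)
W *= 0.8 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-1, 1, (n_r, 1))
W_back = rng.uniform(-1, 1, (n_r, 1))

def step(x, u_n, y_n):
    """Equations 5.23 / 5.25: next reservoir state."""
    return np.tanh(W_in @ u_n + W @ x + W_back @ y_n)

# Step 3: teacher-forced run. Rows of M are (x(k), u(k), y(k-1)),
# rows of T are atanh(y(k)).
x = np.zeros(n_r)
M_rows, T_rows = [], []
for k in range(1, n_train):
    x = step(x, u[k - 1], d[k - 1])   # x(k) from u(k-1), x(k-1), teacher y(k-1)
    if k >= washout:                  # discard the initial transient (assumption)
        M_rows.append(np.concatenate([x, u[k], d[k - 1]]))
        T_rows.append(np.arctanh(d[k]))
M, T = np.array(M_rows), np.array(T_rows)

# Step 4: pseudo-inverse of M times T gives (W_out)^t; transpose it.
W_out = (np.linalg.pinv(M) @ T).T     # shape (1, n_r + 2)
train_mse = np.mean((np.tanh(M @ W_out.T) - d[washout:n_train]) ** 2)

# Step 5: free run on the test segment, feeding back the network's own output.
y = d[n_train - 1]
errs = []
for k in range(n_train, n_train + n_test):
    x = step(x, u[k - 1], y)                             # equation 5.25
    y = np.tanh(W_out @ np.concatenate([x, u[k], y]))    # equation 5.26
    errs.append((y - d[k]) ** 2)
test_mse = float(np.mean(errs))
print(f"train MSE = {train_mse:.2e}, test MSE = {test_mse:.2e}")
```

The training MSE here is measured under teacher forcing, while the test MSE is measured in free-running mode, so the two values are not directly comparable; a large gap between them is one indication of the stability problems discussed next.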
If stability problems are encountered when using the trained network, it very often helps to add a small amount of noise during the sampling step. The noise should be a uniform white noise term $\nu(n)$ with a size in the range [0.0001, 0.01]. In an experimental test, the noise was optimised to produce the lowest possible MSE on the testing data. It should be added to the argument of the activation function in equation 5.23. This technique is demonstrated in Jaeger, 2002a. If the system is highly non-linear, performance can be improved by adding augmented network states during both training and usage. The modified, augmented output equation is given in 5.27. This method is presented and used in Jaeger, 2002b.
$$y(n+1) = \tanh\left(W^{out}(x(n+1), u(n+1), y(n), x^2(n+1), u^2(n+1), y^2(n))\right) \qquad (5.27)$$
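Both modifications can be sketched in a few lines of Python with NumPy. The function names `noisy_step` and `augmented_state` are hypothetical, and drawing the noise from the symmetric interval $[-\text{noise}, \text{noise}]$ is an assumption, since the text only gives the size of the noise. The squares in equation 5.27 are taken element-wise.

```python
import numpy as np

def noisy_step(x, u_n, y_n, W_in, W, W_back, rng, noise=0.001):
    """Equation 5.23 with uniform white noise nu(n) added to the argument
    of the activation function (symmetric range is an assumption)."""
    nu = rng.uniform(-noise, noise, size=x.shape)
    return np.tanh(W_in @ u_n + W @ x + W_back @ y_n + nu)

def augmented_state(x_next, u_next, y_prev):
    """Argument of W_out in equation 5.27: (x, u, y, x^2, u^2, y^2)."""
    z = np.concatenate([x_next, u_next, y_prev])
    return np.concatenate([z, z ** 2])

# Small usage example with arbitrary (hypothetical) weights.
rng = np.random.default_rng(2)
n_r = 20
W = rng.uniform(-0.1, 0.1, (n_r, n_r))
W_in = rng.uniform(-1, 1, (n_r, 1))
W_back = rng.uniform(-1, 1, (n_r, 1))
x0, u0, y0 = np.zeros(n_r), np.array([0.3]), np.array([0.1])

x1 = noisy_step(x0, u0, y0, W_in, W, W_back, rng, noise=0.001)
x1_clean = np.tanh(W_in @ u0 + W @ x0 + W_back @ y0)
z = augmented_state(x1, u0, y0)
print(z.shape, np.max(np.abs(x1 - x1_clean)))
```

With augmented states, the state-collecting matrix $M$ and the output weights $W^{out}$ have twice as many columns, so the same augmentation has to be applied in training and in usage.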