Stability Analysis for Real-Time Recurrent Learning

1. Given a weight matrix $W$, iterate the forward dynamics of the network with inputs from the Roessler trajectory for 250 time steps.

2. Clamp the inputs to the next point of the Roessler trajectory and keep them constant.

3. Iterate the network dynamics for 32768 time steps.

4. Iterate the network dynamics for another 32768 time steps and apply a fast Fourier transformation to the time course of the network states, in order to obtain the Fourier coefficients

$$c_i(k) = \sum_{t=0}^{T-1} x_i(t)\, e^{-\frac{2\pi \mathrm{i}}{T} k t}, \qquad k = 0, \ldots, T-1.$$

5. Compute the period length according to the maximal Fourier coefficient

$$T^{*} = \begin{cases} T / k^{*}, \ \text{where } k^{*} = \arg\max_k |c_i(k)| & \text{if } \max_k |c_i(k)| > 10^{-20} \\ 1 & \text{otherwise.} \end{cases}$$

6. Iterate the network for $T^{*}$ time steps and calculate the largest real part $\lambda_{\max} = \max_j \operatorname{Re}(\lambda_j)$ of the eigenvalues of the Jacobi matrix for each state,

$$J(x) = -I + W' \operatorname{diag}\big(f'(x_i)\big),$$

where $W'$ denotes the non-output part of the weight matrix $W$.

7. If $\lambda_{\max} < 0$, the fixed point is stable. Otherwise, the limit behavior is periodic or the network has not settled down to a stable state yet.

Algorithm 6.1: Procedure to trace fixed points during the training process.
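Steps 4 to 7 of Algorithm 6.1 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function names are hypothetical, the activation is assumed to be tanh, the Jacobian is assumed to have the form $-I + W' \operatorname{diag}(f'(x_i))$ with a square matrix $W'$, and the period estimate skips the coefficient at k = 0 and the mirrored upper half of the spectrum, which the algorithm does not state.

```python
import numpy as np

def estimate_period(x, threshold=1e-20):
    """Step 5: period length from the dominant Fourier coefficient of one state component."""
    T = len(x)
    c = np.fft.fft(x)  # c(k) = sum_t x(t) * exp(-2*pi*i*k*t/T), no normalization
    # Assumption: ignore k = 0 (the mean) and the mirrored upper half of the spectrum.
    mags = np.abs(c[1:T // 2 + 1])
    if mags.size == 0 or mags.max() <= threshold:
        return 1.0  # no oscillation detected: treat the state as a fixed point
    k_star = int(np.argmax(mags)) + 1  # +1 because the slice starts at k = 1
    return T / k_star

def tanh_prime(x):
    """Derivative of tanh (assumed activation function)."""
    return 1.0 - np.tanh(x) ** 2

def largest_real_part(W_prime, x, f_prime=tanh_prime):
    """Step 6: largest real part of the eigenvalues of J(x) = -I + W' diag(f'(x_i))."""
    n = len(x)
    J = -np.eye(n) + W_prime @ np.diag(f_prime(x))
    return float(np.max(np.linalg.eigvals(J).real))

def is_stable_fixed_point(W_prime, x, f_prime=tanh_prime):
    """Step 7: the state counts as a stable fixed point if the largest real part is negative."""
    return largest_real_part(W_prime, x, f_prime) < 0.0
```

In use, `estimate_period` would be applied to the recorded time course of a state component, and `largest_real_part` to each state visited during the following $T^{*}$ iterations.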

6.2 Stability Analysis for Real-Time Recurrent Learning

Boxplots of the Largest Real Part of the Eigenvalues

For the evaluation of the stability properties of the weight matrices, a boxplot of the largest real parts of the eigenvalues was generated for each weight matrix. The method of drawing boxplots goes back to Tukey [see Hoaglin, Mosteller, and Tukey, 1983] and works as follows. Three quartiles are defined from the data. The median of the complete data is the “second quartile”, which divides the data into two halves. The “first quartile” is the median of the lower half, i.e. of the data less than or equal to the second quartile. Conversely, the “third quartile” is the median of the upper half of the data. The first and third quartiles are the lower and upper bounds of the box, and the second quartile is drawn as a line through the box. The difference between the third and the first quartile gives the interquartile range. The whiskers above and below the box extend to the most extreme data point that lies within 1.5 times the interquartile range of the upper or lower bound of the box, respectively. Data points beyond the whiskers represent outliers.
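The boxplot construction described above can be written down in a few lines of Python. The following is a minimal sketch using only the standard library; the function name `tukey_box_stats` and the returned dictionary keys are hypothetical, and the halves are taken to include every data point equal to the median, following the wording of the description.

```python
from statistics import median

def tukey_box_stats(values):
    """Quartiles, whisker ends and outliers of a Tukey-style boxplot."""
    data = sorted(values)
    q2 = median(data)
    # Lower and upper halves include points equal to the median (assumption from the text).
    q1 = median([v for v in data if v <= q2])
    q3 = median([v for v in data if v >= q2])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points that still lie inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": q2,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }
```

Applied to the largest real parts of the eigenvalues obtained for one weight matrix, the function yields the numbers from which one box of the plot is drawn.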

Figure 6.2 shows boxplots for training runs with different numbers of learning steps per epoch using real-time recurrent learning.

In 6.2(a) the case of 1000 steps of the Roessler dynamics trained per epoch is shown. In the beginning, the values lie in the range between -0.8 and about -0.4. The boxes are not larger than about 0.1 units, and the median is about -0.6. This means that the largest real parts of the eigenvalues for the 100 inputs are similar and lie, apart from some outliers, close together. The respective fixed points are thus comparably stable. Until the 35th epoch, outliers appear below the box, while the box and the median do not change much. Apparently, some fixed points tend to be more stable in these epochs. In the 35th epoch, all largest real parts of the eigenvalues lie in the range of -1.0 to about -0.6. From the 35th epoch on, the boxes grow, which means that the values disperse. The whiskers above the boxes show that more values tend to become larger, but the median does not increase much. The distribution of the largest real parts of the eigenvalues thus expands, and after 100 epochs it spreads over the range of about -0.9 to -0.2. The results show that the stability properties of the fixed points for the respective inputs become more diverse during learning. Since the largest real parts of the eigenvalues extend to larger values, the corresponding fixed points approach the boundary to instability. This is probably necessary for the network to produce the desired input-output behavior. With varying inputs, the stability of the fixed points changes, and the network probably switches between different basins of attraction. The inputs might be used to push suitable stable points over the boundary to instability.

[Figure 6.2, three boxplot panels; x-axis: Epoch (5 to 95); y-axis: largest real part of eigenvalues (−0.8 to −0.2).]

(a) Boxplot for a trial with 1000 steps per epoch.

(b) Boxplot for a trial with 1430 steps per epoch.

Figure 6.2: Stability analysis for RTRL: Boxplots of the largest real part of the eigenvalues of the

The boxplot in 6.2(b) belongs to a training run with 1430 learning steps per epoch. In this case, the largest real parts of the eigenvalues span a range of about -0.9 to -0.4 in the beginning. The median is about -0.8 and the box is about 0.2 units high. During learning, the box and whiskers do not expand, unlike in the first plot. Instead, a drift can be observed that leads to a median of about -0.3 after 100 epochs. At the end of training, the values cover the range from about -0.6 to -0.2. Only a few outliers appear in this plot, which means that the distribution of the values drifts as a whole towards larger values. The network does not exhibit stability properties as variable as in the previous case. However, here too the tendency is towards eigenvalues with a real part closer to zero. This suggests that the corresponding fixed points must not be too stable if the network is to produce the desired input-output behavior.

Plot 6.2(c) shows a trial with 1019 steps per epoch and looks quite different from the other two. The largest real part of the eigenvalues covers the range of -0.8 to -0.2. The small size of the boxes indicates that most values lie close together. However, a number of outliers are present during the whole course of learning, and the box and the median vary so strongly that no systematic behavior can be identified in this plot. The presence of eigenvalues with real parts of up to -0.2 hints that the network requires a certain variability in the stability of the fixed points. The fluctuating behavior of the distribution of the values corresponds to the training error for this trial (cf. figure 4.1(c) on page 30). The stability of the network and the training error thus appear to be related in some way. In this case, the weight dynamics of the network might operate in a region of the weight space where small changes in the weights have a more drastic effect on the location of the fixed points. The mapping from initial conditions to fixed points is not continuous, and hence this might result in a jump in the error [Pearlmutter, 1995].

The three boxplots show that different stability behavior can arise from learning the same task with RTRL, which indicates that the weight dynamics of RTRL is very flexible. Due to varying initial conditions and different transients, the stability properties evolve differently during training. No straightforward relationship between the formation of fixed points and the task can be identified. The resulting basins of attraction may depend on the initial weight configuration of the network. The results suggest that the fixed points must not be too stable. The network might make use of the inputs to reach the boundary to instability in order to switch between different basins of attraction.

[Figure 6.3, panel (a); x-axis: Input (0 to 100); y-axis: output component of stable state (−3 to 3).]

(a) Output component of stable state.

[Figure 6.3, panel (b); x-axis: Input (0 to 100); y-axis: largest real part of eigenvalues (−0.70 to −0.55).]

(b) Largest real part of eigenvalues.

Figure 6.3: Stability analysis for RTRL: Course of the output component of the stable states of the network and of the largest real part of the eigenvalues for 100 different inputs.

Relation of Inputs to Stability

To investigate the relation between the fixed points and the respective eigenvalues of the Jacobian (6.1) further, both are plotted for each input. Figure 6.3(a) shows the output component of the fixed points for each of the 100 inputs on the selected part of the Roessler trajectory. A comparison with figure 6.1 reveals that the fixed points trace an oscillation similar to that of $x(t)$. Deviations appear at the peak in $z(t)$ between the 60th and the 80th input. The peak also coincides with a small kink in the curve of the fixed points. Apart from the kink, the output component of the stable state of the network changes continuously with the inputs.

The corresponding largest real part of the eigenvalues is plotted in figure 6.3(b). Most values are less than -0.65. The modulation corresponds to that of the stable states, where states with a larger absolute value of the output component tend to be more stable. The real part of the eigenvalues increases as the absolute value of the output component decreases. At the three points where the output component goes through zero, clear peaks occur in the plot of the eigenvalues. A fixed point with an output component close to zero is thus less stable. It is not clear what actually happens at this point. It might be that the origin is unstable and is used for the transition between positive and negative output components. The inputs to the network might be used to determine the direction of the passage through zero.
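The coincidence between zero passages of the output component and peaks in the largest real part of the eigenvalues could be checked numerically along the following lines. This is only a sketch: the helper names `zero_crossings` and `local_maxima` are hypothetical, and the inputs are assumed to be two equally long sequences holding, per input, the output component of the stable state and the largest real part of the eigenvalues.

```python
import numpy as np

def zero_crossings(y):
    """Indices i at which y[i] and y[i + 1] have opposite signs.

    Samples that are exactly zero are not reported by this simple test.
    """
    s = np.sign(np.asarray(y, dtype=float))
    return np.flatnonzero(s[:-1] * s[1:] < 0)

def local_maxima(v):
    """Indices of interior samples that are strictly larger than both neighbours."""
    v = np.asarray(v, dtype=float)
    return np.flatnonzero((v[1:-1] > v[:-2]) & (v[1:-1] > v[2:])) + 1
```

Comparing the indices returned by the two helpers shows whether each zero passage of the output component has a nearby peak in the eigenvalue curve.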

The results indicate that both the output component of the stable state and the largest real part of the eigenvalues are influenced by the input to the network. Their course imitates the modulation of the input trajectory and is perturbed only by the peak in $z(t)$. The passage of the output component through zero is interesting because peaks appear there in the plot of the largest real part of the eigenvalues. The data are not yet sufficient to completely characterize how the stability properties of the network are determined by the inputs. It should be noted that figure 6.3 represents only one example of the possible behavior of a trained network. In other trials, different behavior was observed. However, the passage through zero was always noticeable. Further experiments should therefore sample this passage more finely to reveal which effects play a role here.

In document Analysis and Comparison of Algorithms for Training Recurrent Neural Networks (Page 73-77)