6.3 Stability Analysis for Atiya-Parlos Recurrent Learning
Boxplots of the Largest Real Part of the Eigenvalues
For Atiya-Parlos recurrent learning (APRL), the same analysis as for RTRL was carried out. The resulting boxplots of the largest real parts of the eigenvalues are shown in figure 6.4.
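The quantity shown in the boxplots can be sketched in code. The network equations are not restated in this section, so the following Python sketch assumes continuous-time dynamics dx/dt = -x + W tanh(x) + w_in·u with a constant input u. The helper names `jacobian`, `fixed_point`, `largest_real_part` and `epoch_distribution` are hypothetical. For each input sample, the sketch locates a fixed point with Newton's method, evaluates the Jacobian there and returns the largest real part of its eigenvalues; a negative value means the fixed point is stable, and 0 marks the boundary to instability.

```python
import numpy as np

def jacobian(W, x):
    """Jacobian of f(x) = -x + W @ tanh(x) + b (assumed model).
    W * d with d = tanh'(x) scales column j of W by d[j], i.e. W @ diag(d)."""
    d = 1.0 - np.tanh(x) ** 2
    return -np.eye(len(x)) + W * d

def fixed_point(W, b, x0=None, tol=1e-12, max_iter=100):
    """Newton's method for -x + W @ tanh(x) + b = 0. It converges to a nearby
    fixed point, which need not be stable if the network is multistable."""
    x = np.zeros(W.shape[0]) if x0 is None else np.array(x0, dtype=float)
    for _ in range(max_iter):
        residual = -x + W @ np.tanh(x) + b
        step = np.linalg.solve(jacobian(W, x), residual)
        x = x - step
        if np.linalg.norm(step) < tol:
            return x
    raise RuntimeError("Newton iteration did not converge")

def largest_real_part(W, w_in, u, x0=None):
    """Largest real part of the Jacobian eigenvalues at the fixed point for input u."""
    x_star = fixed_point(W, w_in * u, x0)
    return np.linalg.eigvals(jacobian(W, x_star)).real.max()

def epoch_distribution(W, w_in, inputs):
    """One value per input sample, using the weights after one epoch;
    such an array is what a single box summarizes (assumption)."""
    return np.array([largest_real_part(W, w_in, u) for u in inputs])
```

Calling `epoch_distribution` with the weights after each epoch and the inputs of the training trajectory would yield one box per epoch. For strongly multistable networks, one would rather start Newton's method (`x0`) from the state reached by simulating the network, so that the stable state is analyzed.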
The plot in figure 6.4(a) belongs to a trial in which 1000 steps of the Roessler trajectory were trained per epoch. Two regions can be clearly distinguished. In the first 50 epochs, the boxes are small and the largest real parts of the eigenvalues lie close together, apart from some outliers beyond the whiskers. The values cover the range from about -0.6 to 0.0, and the median varies between -0.5 and -0.2. After 50 epochs, the boxes grow to a height of about 0.2 units, and the median decreases to -0.8. Only a few outliers lie above the upper whiskers.
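The terms box size, whisker and outlier used above can be made precise. The sketch below assumes the common Tukey convention (box from the first to the third quartile, whiskers reaching the most extreme values within 1.5 times the interquartile range, everything beyond drawn as an outlier), which is also matplotlib's default `whis=1.5`; whether the figures were drawn with exactly this rule is an assumption. The function name `box_stats` is hypothetical.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Tukey boxplot statistics for one epoch's largest real parts."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= low_fence) & (v <= high_fence)]
    return {
        "median": median,
        "box_height": iqr,                # what the text calls the box size
        "whisker_low": inside.min(),      # most extreme values that still
        "whisker_high": inside.max(),     # lie within the fences
        "outliers_below": v[v < low_fence],
        "outliers_above": v[v > high_fence],
    }

# Usage with made-up values: the single large value is reported as an outlier.
stats = box_stats([0.0, 1.0, 2.0, 3.0, 4.0, 100.0])
```

Applying `box_stats` to each epoch's values gives the medians and box heights that the description of figure 6.4 refers to.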
The change in the characteristics of the eigenvalues coincides with the occurrence of the error overshoot after 50 epochs. The data suggest that the weight matrix is scaled into a region of the weight space where the network cannot produce a good input-output behavior. The plot shows that the fixed points in this region are more stable, from which it can be concluded that the network cannot implement the desired input-output behavior if its fixed points are too stable. Once the weight matrix has been scaled into such an unfavorable region of the weight space, the distribution of the largest real parts of the eigenvalues hardly changes any further: both the median and the box size remain nearly constant. In this region, the weight changes have little effect on the stability of the fixed points. This might be the reason why the error cannot be improved again once the error overshoot has occurred.
Figure 6.4(b) depicts a trial with 1250 steps per epoch. The dichotomy of the previous plot is absent here; instead, a continuous drift can be observed, similar to figure 6.2(b). At the beginning, the boxes are about 0.1 units high, and the values are spread over the range from -0.7 to -0.1. The median starts at about -0.65 and gradually increases to about -0.3 during training. At the same time, the distribution of the values contracts, as indicated by the shrinking boxes. Some outliers appear below the whiskers, but after the contraction all values lie in the range from about -0.5 to -0.2. In this case, the shift of the largest real parts of the eigenvalues and the contraction of their distribution coincide with a good approximation of the desired input-output behavior (cf. section 4.1). This observation suggests that the network reaches a configuration in which the fixed points are flexible enough for the output to be modulated by the inputs. Again, the boxplot shows that the largest real parts of the eigenvalues are not far from the boundary to instability. The gradual shift of the eigenvalues during learning indicates that APRL scales the reservoir such that the fixed points come sufficiently close to instability.
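Why scaling the reservoir moves the eigenvalues towards the stability boundary can be seen in the simplest case. Assuming dynamics dx/dt = -x + c·W tanh(x) + w_in·u with zero input, the origin is a fixed point for every scale factor c, and since tanh'(0) = 1 the Jacobian there is -I + c·W. Its largest real part is therefore -1 + c·max Re λ(W), which rises linearly with c. This sketch does not model the actual APRL weight updates; the function names and the example matrix are hypothetical.

```python
import numpy as np

def largest_real_part_at_origin(W, c):
    """Largest real part of the Jacobian -I + c*W at the origin, a fixed point of
    the assumed dynamics dx/dt = -x + c*W @ tanh(x) when the input is zero."""
    J = -np.eye(W.shape[0]) + c * W
    return np.linalg.eigvals(J).real.max()

def critical_scale(W):
    """Scale factor c > 0 at which the origin loses stability (inf if never)."""
    r = np.linalg.eigvals(W).real.max()
    return np.inf if r <= 0 else 1.0 / r

# Hypothetical reservoir: scaling it up moves the largest real part towards 0.
W = np.array([[0.3, 0.4], [-0.1, 0.2]])
for c in (0.5, 1.0, 1.5, 2.0):
    print(c, largest_real_part_at_origin(W, c))
print("critical scale:", critical_scale(W))
```

For nonzero inputs, the fixed point moves away from the origin and tanh' < 1 reduces the effective gain, so the inputs shift the eigenvalues as well; this is the modulation discussed for figure 6.5.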
Figure 6.4(c) shows the largest real parts of the eigenvalues for a trial with 1019 steps per epoch. The median starts to drift at the 30th epoch. Although the boxes are only about 0.05 units high, outliers appear both above and below the whiskers. Overall, the values cover the range from about -0.7 to 0.0. In contrast to figure 6.4(b), the distribution does not contract while the median drifts. This example again shows that the fixed points of the network are not far from instability when the network achieves a good input-output behavior. The gradual shift of the distribution hints that APRL might scale the network until a suitable configuration is reached. The inputs to the network can then be used to modulate the fixed point in order to generate the desired output.
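The difference between figures 6.4(b) and 6.4(c), a drifting median with or without a contracting distribution, could be quantified per trial. The hypothetical helper below fits a straight line to the per-epoch medians (optionally starting at a given epoch, e.g. around the 30th for figure 6.4(c)) and compares the last and first interquartile ranges. It is an illustrative measure under these assumptions, not one used in the original analysis.

```python
import numpy as np

def drift_and_contraction(per_epoch_values, start_epoch=0):
    """per_epoch_values: one array of largest real parts per epoch (0-based index).
    Returns (slope of the median per epoch, IQR of last epoch / IQR of start epoch).
    A positive slope means a drift towards instability; a ratio < 1 means contraction."""
    epochs = np.arange(len(per_epoch_values))[start_epoch:]
    values = per_epoch_values[start_epoch:]
    medians = np.array([np.median(v) for v in values])
    q75, q25 = np.array([np.percentile(v, [75, 25]) for v in values]).T
    iqrs = q75 - q25
    median_slope = np.polyfit(epochs, medians, 1)[0]
    return median_slope, iqrs[-1] / iqrs[0]

# Synthetic example: the median rises by 0.01 per epoch, the spread stays constant.
drifting = [np.linspace(-0.7 + 0.01 * e, -0.5 + 0.01 * e, 11) for e in range(20)]
slope, ratio = drift_and_contraction(drifting)
```

A trial like figure 6.4(b) would show a positive slope together with a ratio clearly below 1, whereas a trial like figure 6.4(c) would show a positive slope with a ratio near 1.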
The boxplots for APRL show that the stability properties of the networks subject to adaptation may develop in different ways. Nevertheless, a relation between the fixed points and the network performance might exist, as plot 6.4(a) indicates. This plot is a characteristic example of the trials that show the error overshoot: in almost all of these cases, the error overshoot coincides with a development of the fixed points like the one depicted in the plot. The median of the largest real parts of the eigenvalues becomes very low, and the boxes expand to a larger range. The distribution of the eigenvalues then changes only little during the rest of training. It can be concluded that there is a certain part of the weight space that the weight matrix should not leave if the network is to approximate the desired output well. But since APRL keeps scaling the inner weights as long as the error does not vanish, the weight matrix is eventually scaled into unsuitable configurations. The stability analysis suggests that the fixed points of these configurations are not flexible enough to be controlled by the input to the network. Therefore, the network performance is poorer.
Figure 6.4: Stability analysis for APRL: Boxplots of the largest real part of the eigenvalues of the
(a) Boxplot for a trial with 1000 steps per epoch. (b) Boxplot for a trial with 1250 steps per epoch. (c) Boxplot for a trial with 1019 steps per epoch. Each panel plots the largest real part of the eigenvalues against the epoch.
Figure 6.5: Stability analysis for APRL: Course of the output component of the stable states of the network and of the largest real part of the eigenvalues for 100 different inputs. (a) Output component of stable state. (b) Largest real part of eigenvalues. Both panels are plotted against the input step (0 to 100).
Relation of Inputs to Stability
The relation between the output component of the stable state of the network and the largest real part of the eigenvalues of the corresponding Jacobi matrix is shown in figure 6.5. It was obtained from a weight matrix yielding an acceptable performance. As for RTRL, this case is only one example of how the respective values behave; nevertheless, some characteristic features can be described. Here as well, the output component of the stable state performs an oscillation similar to that of x(t). At the peak of z(t), the curve shows a slight kink, and where it passes through zero, the output component of the stable state jumps. It is not clear whether this jump is due to the discrete steps along the trajectory being too large or whether it is a genuine discontinuity. Apart from these points, the curve appears to depend continuously on the inputs.
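Whether such a jump is a sampling artifact or a genuine discontinuity could be probed by refining the input grid around it: the jump of a continuous but steep curve shrinks with the grid spacing, whereas a discontinuity keeps its size. The sketch treats the stable-state computation as a black box `stable_output(u)`, a hypothetical callable that would, for instance, follow the network's stable state from input to input and return its output component. The names `jump_sizes` and `looks_discontinuous` and the threshold are likewise assumptions.

```python
import numpy as np

def jump_sizes(stable_output, u_lo, u_hi, levels=6, n=11):
    """Zoom repeatedly into the sub-interval with the largest change of
    stable_output(u); return the largest change found at each level."""
    sizes = []
    for _ in range(levels):
        us = np.linspace(u_lo, u_hi, n)
        outs = np.array([stable_output(u) for u in us])
        gaps = np.abs(np.diff(outs))
        k = int(np.argmax(gaps))
        sizes.append(float(gaps[k]))
        u_lo, u_hi = us[k], us[k + 1]   # zoom into the largest jump
    return sizes

def looks_discontinuous(sizes, threshold=0.5):
    """A jump that keeps its size under refinement suggests a discontinuity;
    one that shrinks with the grid spacing suggests undersampling."""
    return sizes[-1] > threshold * sizes[0]

# Toy stand-ins for the stable-state output: a steep but continuous curve
# and a true step, both centered at u = 0.3.
steep = lambda u: np.tanh(50.0 * (u - 0.3))
step = lambda u: float(u > 0.3)
```

If `stable_output` follows the state by simulation, its convergence slows down considerably close to a point where the stable state disappears, so its tolerances must be chosen with care at the finest levels.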
Figure 6.5(b) plots the largest real part of the eigenvalues against the input step. The peaks at the points where the output component of the stable state passes through zero are present here as well. The low values between input steps 60 and 80 coincide with the peak in z(t), which indicates that the fixed points for these inputs are more stable. The rest of the curve is modulated according to the network input: the higher the inputs, the more stable the network. At the passages of x(t) through zero, the peak in the eigenvalue plot suggests that the network becomes less stable. The data do not show whether it actually becomes unstable at some point. Interestingly, the plots for APRL are quite similar to those for RTRL. One may suppose that this reflects a general mechanism by which the networks implement the desired input-output behavior. More data are required to investigate this in detail.
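One mechanism that would link the jumps of the stable state to the peaks of the largest real part is a saddle-node (fold) bifurcation: along a branch of stable fixed points, the largest real part rises to exactly zero where the branch ends, and the state must jump to another branch. The data above do not establish that this happens in the trained networks. The sketch only illustrates the mechanism for a single hypothetical self-excited neuron dx/dt = -x + w tanh(x) + u with w > 1, whose fixed-point branch can be parametrized by the state x itself; `branch` and `fold_point` are hypothetical names.

```python
import numpy as np

def branch(w, x):
    """Fixed points of the single neuron dx/dt = -x + w*tanh(x) + u (hypothetical
    example). For each state x, return the input u that makes x a fixed point and
    the eigenvalue (the Jacobian is 1x1) at that fixed point."""
    u = x - w * np.tanh(x)
    lam = -1.0 + w * (1.0 - np.tanh(x) ** 2)
    return u, lam

def fold_point(w):
    """Saddle-node on the lower branch, where the eigenvalue reaches zero.
    It exists only for self-excitation w > 1."""
    if w <= 1.0:
        raise ValueError("no fold: the fixed point is unique for w <= 1")
    x_f = -np.arctanh(np.sqrt(1.0 - 1.0 / w))
    u_f, lam_f = branch(w, x_f)
    return x_f, u_f, lam_f

# Walk along the lower stable branch towards the fold.
x_f, u_f, lam_f = fold_point(2.0)
xs = np.linspace(-3.0, x_f, 200)
us, lams = branch(2.0, xs)
# As the input rises towards u_f (about 0.533), the eigenvalue climbs to 0;
# for larger inputs this branch no longer exists and the state jumps.
```

In such a case, the eigenvalue peak would touch zero exactly at the jump; a peak that stays clearly negative would instead point to a steep but continuous change.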