This section includes a small example illustrating the different training algorithms used by NeuralFit. If you want examples of different training algorithms of more realistic sizes, see the ones in Chapter 8, Dynamic Neural Networks, or Chapter 12, Application Examples, and change the option Method in the calls to NeuralFit.
Read in the Neural Networks package.
In[1]:= << NeuralNetworks`
Consider the following small example where the network only has two parameters. This makes it possible to illustrate the RMSE being minimized as a surface. To do this, you need the following package.
Read in a standard package for graphics.
In[2]:= << Graphics`Graphics3D`
The “true” function is chosen to be an FF network with one input and one output, no hidden layer and with a sigmoidal nonlinearity at the output. The true parameter values are 2 and -1.
Initialize a network of correct size and insert the true parameter values.
In[3]:= fdfrwrd = InitializeFeedForwardNet[{{1}}, {{1}},
{}, RandomInitialization -> True, OutputNonlinearity -> Sigmoid];
fdfrwrd[[1]] = {{{{2.}, {-1.}}}};
Generate data with the true function.
In[5]:= Ndata = 50;
x = Table[{N[i]}, {i, 0, 5, 10/(Ndata - 1)}];
y = fdfrwrd[x];
A two-parameter function is defined to carry out the RMSE computation. Note that this function makes use of the generated data {x, y} and is needed to generate the plots.
Define the criterion function.
In[8]:= criterion[a_, b_] := (fdfrwrd[[1]] = {{{{a}, {b}}}};
Sqrt[(Transpose[#].#) &[y - fdfrwrd[x]]/Length[x]])[[1, 1]]
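For readers without the package, the same criterion can be sketched in plain Python. This is only an illustration, not the package's code: it assumes the network computes sigmoid(a*x + b), with the first parameter acting as the weight and the second as the bias, and the names `sigmoid`, `xs`, `ys`, and `criterion` are made up for the sketch. The inputs mirror the Table call above, which steps from 0 to 5 in increments of 10/49 and so yields 25 points.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Inputs 0, 10/49, 20/49, ... not exceeding 5, mirroring the Table call.
STEP = 10.0 / 49.0
xs = [i * STEP for i in range(25)]

# Targets from the "true" function.
# ASSUMPTION: 2 is the weight and -1 is the bias of the single sigmoid unit.
ys = [sigmoid(2.0 * x - 1.0) for x in xs]

def criterion(a, b):
    """RMSE between the targets and the model sigmoid(a*x + b)."""
    squared_error = sum((y - sigmoid(a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_error / len(xs))

rmse_at_truth = criterion(2.0, -1.0)   # zero by construction
rmse_at_start = criterion(-0.5, -5.0)  # the starting point used for training
```

Evaluating `criterion` on a grid of (a, b) values gives the kind of surface that Plot3D draws from the Mathematica definition.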
The criterion function can be plotted in a neighborhood of the minimum (2, -1) using Plot3D.
Look at the criterion function.
In[9]:= surf = Plot3D[criterion[a, b], {a, -1, 5}, {b, -5, 3}, PlotPoints -> 20]
Now it is time to test the different training methods. The initial parameters are chosen to be (-0.5, -5). You can repeat the example with different initializations.
Levenberg-Marquardt
Initialize the network and train with the Levenberg-Marquardt method.
In[10]:= fdfrwrd2 = fdfrwrd;
fdfrwrd2[[1]] = {{{{-0.5}, {-5}}}};
{fdfrwrd3, fitrecord} = NeuralFit[fdfrwrd2, x, y];
The parameter values and the criterion values recorded at each iteration of the training process can be combined into one list. Viewed as three-dimensional {x, y, z} points, this list can be shown together with the RMSE surface drawn with Plot3D.
Form a list of the trajectory in the parameter space.
In[13]:= trajectory =
Transpose[Append[Transpose[Map[Flatten, (ParameterRecord /. fitrecord[[2]])]], (CriterionValues /. fitrecord[[2]]) + 0.05]];
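The reshaping done by this expression can be sketched in plain Python. The record below is hypothetical, made-up data that only imitates the shape of a parameter record and a list of criterion values. The 0.05 offset is copied from the expression above, presumably so that the points are drawn slightly above the surface.

```python
def flatten(nested):
    """Flatten an arbitrarily nested list of numbers into a flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat

# HYPOTHETICAL training record: made-up numbers for illustration only.
# One nested parameter structure and one RMSE value per iteration.
parameter_record = [
    [[[-0.5], [-5.0]]],
    [[[0.7], [-3.1]]],
    [[[2.0], [-1.0]]],
]
criterion_values = [0.9, 0.4, 0.0]

OFFSET = 0.05  # same offset as in the Mathematica expression

# Each trajectory point is [a, b, RMSE + OFFSET].
trajectory = [
    flatten(params) + [value + OFFSET]
    for params, value in zip(parameter_record, criterion_values)
]
```

Each element of `trajectory` is then one three-dimensional point: the two parameters followed by the shifted criterion value.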
Form plots of the trajectory and show it together with the criterion surface.
In[14]:= trajectoryplot = ScatterPlot3D[trajectory,
PlotStyle -> AbsolutePointSize[8], DisplayFunction -> Identity];
trajectoryplot2 = ScatterPlot3D[trajectory, PlotJoined -> True, PlotStyle -> [email protected], DisplayFunction -> Identity];
Show[surf, trajectoryplot, trajectoryplot2]
The {x, y, z} iterates of the training process are marked with dots that are connected with straight lines to show the trajectory. The training has converged after about five iterations.
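The idea behind a Levenberg-Marquardt iteration can be sketched in plain Python for this two-parameter problem. This is a textbook-style sketch, not NeuralFit's implementation. It assumes the model sigmoid(a*x + b) with 2 as the true weight and -1 as the true bias, and the initial damping value and the factor of 10 used to adapt it are arbitrary choices for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

xs = [i * 10.0 / 49.0 for i in range(25)]
ys = [sigmoid(2.0 * x - 1.0) for x in xs]  # ASSUMPTION: weight 2, bias -1

def rmse(a, b):
    total = sum((y - sigmoid(a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))

def normal_equations(a, b):
    """Entries of J^T J and J^T r, where r = y - model and J = d(model)/d(a, b)."""
    h11 = h12 = h22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid(a * x + b)
        ds = s * (1.0 - s)  # derivative of the sigmoid at a*x + b
        r = y - s
        h11 += (ds * x) ** 2
        h12 += ds * ds * x
        h22 += ds * ds
        g1 += ds * x * r
        g2 += ds * r
    return h11, h12, h22, g1, g2

def levenberg_marquardt(a, b, lam=1.0, max_iter=200, tol=1e-10):
    """Return the accepted iterates as (a, b, rmse) tuples."""
    history = [(a, b, rmse(a, b))]
    for _ in range(max_iter):
        current = history[-1][2]
        if current < tol:
            break
        h11, h12, h22, g1, g2 = normal_equations(a, b)
        # Solve (J^T J + lam*I) d = J^T r with Cramer's rule.
        m11, m22 = h11 + lam, h22 + lam
        det = m11 * m22 - h12 * h12
        da = (g1 * m22 - h12 * g2) / det
        db = (m11 * g2 - h12 * g1) / det
        trial = rmse(a + da, b + db)
        if trial < current:
            # Accept the step and trust the Gauss-Newton model more.
            a, b = a + da, b + db
            lam = max(lam / 10.0, 1e-12)
            history.append((a, b, trial))
        else:
            # Reject the step and move toward a short gradient step.
            lam = min(lam * 10.0, 1e12)
    return history

history = levenberg_marquardt(-0.5, -5.0)
```

Because a step is kept only when it lowers the RMSE, the recorded values decrease monotonically. How many iterations this sketch needs from (-0.5, -5) depends on the damping schedule and need not match the figure above.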
Gauss-Newton Algorithm
The training of the initial neural network is now repeated with the Gauss-Newton algorithm.
Train the same neural network with the Gauss-Newton algorithm.
In[17]:= {fdfrwrd3, fitrecord} = NeuralFit[fdfrwrd2, x, y, Method -> GaussNewton];
[Plot: RMSE versus Iterations]
Form a list of the trajectory in the parameter space.
In[18]:= trajectory =
Transpose[Append[Transpose[Map[Flatten, (ParameterRecord /. fitrecord[[2]])]], (CriterionValues /. fitrecord[[2]]) + 0.05]];
Form plots of the trajectory and show it together with the criterion surface.
In[19]:= trajectoryplot = ScatterPlot3D[trajectory,
PlotStyle -> AbsolutePointSize[8], DisplayFunction -> Identity];
trajectoryplot2 = ScatterPlot3D[trajectory, PlotJoined -> True, PlotStyle -> [email protected], DisplayFunction -> Identity];
Show[surf, trajectoryplot, trajectoryplot2]
The Gauss-Newton algorithm converges in seven iterations.
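A bare-bones Gauss-Newton iteration for the same two-parameter model can be sketched in plain Python. It is not the package's implementation. The step-halving safeguard is an assumption added for the sketch, and the starting point (1.5, -0.5) is a hypothetical choice closer to the minimum than the (-0.5, -5) used in the text, because this simple version has no safeguards for the flat region far from the minimum.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

xs = [i * 10.0 / 49.0 for i in range(25)]
ys = [sigmoid(2.0 * x - 1.0) for x in xs]  # ASSUMPTION: weight 2, bias -1

def rmse(a, b):
    total = sum((y - sigmoid(a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))

def normal_equations(a, b):
    """Entries of J^T J and J^T r, where r = y - model and J = d(model)/d(a, b)."""
    h11 = h12 = h22 = g1 = g2 = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid(a * x + b)
        ds = s * (1.0 - s)
        r = y - s
        h11 += (ds * x) ** 2
        h12 += ds * ds * x
        h22 += ds * ds
        g1 += ds * x * r
        g2 += ds * r
    return h11, h12, h22, g1, g2

def gauss_newton(a, b, max_iter=50, tol=1e-12):
    """Return the accepted iterates as (a, b, rmse) tuples."""
    history = [(a, b, rmse(a, b))]
    for _ in range(max_iter):
        current = history[-1][2]
        if current < tol:
            break
        h11, h12, h22, g1, g2 = normal_equations(a, b)
        det = h11 * h22 - h12 * h12
        if det <= 0.0:
            break  # J^T J is numerically singular, so this sketch gives up
        # Gauss-Newton direction: solve (J^T J) d = J^T r with Cramer's rule.
        da = (g1 * h22 - h12 * g2) / det
        db = (h11 * g2 - h12 * g1) / det
        t = 1.0
        for _attempt in range(30):
            trial = rmse(a + t * da, b + t * db)
            if trial < current:
                break
            t *= 0.5  # ASSUMPTION: simple step halving as a safeguard
        else:
            break  # no improving step length found
        a, b = a + t * da, b + t * db
        history.append((a, b, trial))
    return history

history = gauss_newton(1.5, -0.5)  # HYPOTHETICAL starting point
final_a, final_b, final_rmse = history[-1]
```

Setting the damping term of a Levenberg-Marquardt step to zero gives exactly this direction, which is why the two methods behave alike close to the minimum.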
Steepest Descent Method
Train the same neural network with SteepestDescent.
In[22]:= {fdfrwrd3, fitrecord} = NeuralFit[fdfrwrd2, x, y, Method -> SteepestDescent];
The training did not converge within the 30 iterations. This is not necessarily a problem, since the parameter values may still be close enough to the minimum.
Form a list of the trajectory in the parameter space.
In[23]:= trajectory =
Transpose[Append[Transpose[Map[Flatten, (ParameterRecord /. fitrecord[[2]])]], (CriterionValues /. fitrecord[[2]]) + 0.05]];
Form plots of the trajectory and show it together with the criterion surface.
In[24]:= trajectoryplot = ScatterPlot3D[trajectory,
PlotStyle -> AbsolutePointSize[8], DisplayFunction -> Identity];
trajectoryplot2 = ScatterPlot3D[trajectory, PlotJoined -> True, PlotStyle -> [email protected], DisplayFunction -> Identity];
Show[surf, trajectoryplot, trajectoryplot2]
Toward the end of the training the convergence is particularly slow. There, the steepest descent method exhibits much slower convergence than either the Levenberg-Marquardt or Gauss-Newton methods.
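A steepest descent iteration with an automatically tuned step can be sketched in plain Python as follows. This is an illustration under assumptions, not the package's algorithm: the criterion whose gradient is followed is taken to be the mean squared error of sigmoid(a*x + b), and the rule of halving the step on failure and doubling it after success is a simple stand-in for whatever step-length control the package uses.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

xs = [i * 10.0 / 49.0 for i in range(25)]
ys = [sigmoid(2.0 * x - 1.0) for x in xs]  # ASSUMPTION: weight 2, bias -1

def rmse(a, b):
    total = sum((y - sigmoid(a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))

def mse_gradient(a, b):
    """Gradient of the mean squared error with respect to (a, b)."""
    grad_a = grad_b = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid(a * x + b)
        common = -2.0 * (y - s) * s * (1.0 - s)
        grad_a += common * x
        grad_b += common
    n = len(xs)
    return grad_a / n, grad_b / n

def steepest_descent(a, b, step=1.0, max_iter=30):
    """Return the accepted iterates as (a, b, rmse) tuples."""
    history = [(a, b, rmse(a, b))]
    for _ in range(max_iter):
        grad_a, grad_b = mse_gradient(a, b)
        current = history[-1][2]
        for _attempt in range(40):
            trial_a, trial_b = a - step * grad_a, b - step * grad_b
            trial = rmse(trial_a, trial_b)
            if trial < current:
                break
            step *= 0.5  # shorten the step until the fit improves
        else:
            break  # no shorter step improved the fit, so stop
        a, b = trial_a, trial_b
        history.append((a, b, trial))
        step *= 2.0  # try a longer step in the next iteration
    return history

history = steepest_descent(-0.5, -5.0)
```

The direction uses only first derivatives, so no curvature information is available to rescale the step near the minimum. That is the usual explanation for the slow final phase described above.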
Backpropagation Algorithm
When you use the backpropagation algorithm, you have to choose the step size and the momentum. Choosing judicious values for these parameters may not be easy. The other methods avoid this issue because they tune the step size automatically. You can repeat the example with different values of these parameters to see their influence.
Train the same neural network with backpropagation.
In[27]:= {fdfrwrd3, fitrecord} = NeuralFit[fdfrwrd2, x, y,
200, Method -> BackPropagation, StepLength -> 0.1, Momentum -> 0.9];
Form a list of the trajectory in the parameter space.
In[28]:= trajectory =
Transpose[Append[Transpose[Map[Flatten, (ParameterRecord /. fitrecord[[2]])]], (CriterionValues /. fitrecord[[2]]) + 0.05]];
Form plots of the trajectory and show it together with the criterion surface.
In[29]:= trajectoryplot = ScatterPlot3D[trajectory,
PlotStyle -> AbsolutePointSize[8], DisplayFunction -> Identity];
trajectoryplot2 = ScatterPlot3D[trajectory, PlotJoined -> True, PlotStyle -> [email protected], DisplayFunction -> Identity];
Show[surf, trajectoryplot, trajectoryplot2]
Because of the momentum term used in the training, the parameter estimate climbs the slope adjacent to the initial parameter values. You can repeat the training with different values of the StepLength and Momentum options to see how they influence the minimization.
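The role of the two options can be illustrated with a plain Python sketch of gradient descent with a momentum term. It is not the package's code. It assumes that each update is the negative gradient of the mean squared error times the step length, plus the momentum times the previous update, applied to the model sigmoid(a*x + b). The names `step_length` and `momentum` only echo the StepLength and Momentum options.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

xs = [i * 10.0 / 49.0 for i in range(25)]
ys = [sigmoid(2.0 * x - 1.0) for x in xs]  # ASSUMPTION: weight 2, bias -1

def rmse(a, b):
    total = sum((y - sigmoid(a * x + b)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(total / len(xs))

def mse_gradient(a, b):
    """Gradient of the mean squared error with respect to (a, b)."""
    grad_a = grad_b = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid(a * x + b)
        common = -2.0 * (y - s) * s * (1.0 - s)
        grad_a += common * x
        grad_b += common
    n = len(xs)
    return grad_a / n, grad_b / n

def momentum_descent(a, b, step_length=0.1, momentum=0.9, iterations=200):
    """Return every iterate as an (a, b, rmse) tuple, including the start."""
    history = [(a, b, rmse(a, b))]
    prev_da = prev_db = 0.0
    for _ in range(iterations):
        grad_a, grad_b = mse_gradient(a, b)
        # ASSUMED update rule: fixed-size gradient step plus a fraction
        # of the previous update.
        da = -step_length * grad_a + momentum * prev_da
        db = -step_length * grad_b + momentum * prev_db
        a, b = a + da, b + db
        prev_da, prev_db = da, db
        history.append((a, b, rmse(a, b)))
    return history

history = momentum_descent(-0.5, -5.0)
```

Unlike the other sketched methods, no step is ever rejected here, so the RMSE is not forced to decrease at every iteration. Rerunning `momentum_descent` with other `step_length` and `momentum` values shows how strongly the path depends on them.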