5.1 Feedforward Network Functions and Options
5.2.1 Function Approximation in One Dimension
Consider a function with one input and one output. First, output data is generated by evaluating the function on a given set of input data. Then the FF network is trained on the input-output data set to approximate the function. You can run the example on different data sets by modifying the commands that generate the data.
Read in the Neural Networks package and a standard add-on package.
In[1]:= << NeuralNetworks`
<< LinearAlgebra`MatrixManipulation`
This example will involve an input-output one-dimensional data set of a sinusoidal function. The variable Ndata indicates the number of training data generated. Change the following data generating commands if you want to rerun the example on other data. It is always a good idea to look at the data. This is especially easy for one-dimensional problems.
Generate and look at the data.
In[3]:= Ndata = 20;
x = Table[10 N[{i/Ndata}], {i, 0, Ndata - 1}];
y = Sin[x];
ListPlot[AppendRows[x, y]]
The training data consist of the input data x and the output y.
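As a sketch of how the data-generating commands might be changed, the following uses only built-in Mathematica functions to produce a different one-dimensional data set in the same matrix form, with one row per sample. The names ndataAlt, xAlt, and yAlt, and the choice of a damped sinusoid, are illustrative assumptions and not part of the package.

```mathematica
(* Hypothetical alternative data set: a damped sinusoid sampled on [0, 10). *)
ndataAlt = 30;

(* Inputs as an ndataAlt x 1 matrix, matching the form of x in the text. *)
xAlt = Table[{10. i/ndataAlt}, {i, 0, ndataAlt - 1}];

(* Outputs as an ndataAlt x 1 matrix; Exp, Sin, and Times thread over the nested list. *)
yAlt = Exp[-xAlt/5] Sin[xAlt];

(* Pair each input with its output and plot the points. *)
ListPlot[Transpose[{Flatten[xAlt], Flatten[yAlt]}]]
```

Passing xAlt and yAlt in place of x and y should then let the rest of the example be rerun on the new data.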
Consider first a randomly initialized FF network, with four hidden neurons in one hidden layer. Although random initialization is not generally recommended, it is used here only for purposes of illustration.
Initialize an FF network with four neurons.
In[7]:= fdfrwrd = InitializeFeedForwardNet[x, y, {4}, RandomInitialization → True]
Out[7]= FeedForwardNet[{{w1, w2}}, {Neuron → Sigmoid, FixedParameters → None,
AccumulatedIterations → 0, CreationDate → {2002, 4, 3, 13, 25, 46}, OutputNonlinearity → None, NumberOfInputs → 1}]
Find some information about the network.
In[8]:= NetInformation[fdfrwrd]
Out[8]= FeedForward network created 2002−4−3 at 13:25.
The network has 1 input and 1 output. It consists of 1 hidden layer with 4 neurons with activation function of Sigmoid type.
The randomly initialized network is a description of a function, and you can look at it before it is trained.
This can be done using NetPlot.
Look at the initialized FF network.
In[9]:= NetPlot[fdfrwrd, x, y]
Fit the network to the data.
In[10]:= {fdfrwrd2, fitrecord} = NeuralFit[fdfrwrd, x, y, 3];
[Plot: RMSE versus Iterations, for iterations 0 through 3.]
Often, a warning appears stating that training was not completed within the specified number of iterations. This is equivalent to saying that the parameters did not converge to a point minimizing the performance index, RMSE. This is not an uncommon occurrence, especially for network models involving a large number of parameters. In such situations, by looking at the performance plot you can decide whether additional training would improve performance.
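To make the performance index concrete, here is a minimal sketch of a root-mean-square error computed with built-in functions. The helper name rmse is hypothetical, and the assumption that the RMSE reported by NeuralFit is defined exactly this way (the square root of the mean squared residual) should be checked against the package documentation.

```mathematica
(* Hypothetical helper: root-mean-square error between predictions and targets,
   both given as matrices with one row per sample. *)
rmse[pred_, target_] := Sqrt[Mean[Flatten[(pred - target)^2]]]

(* Small worked case: the squared residuals are 0, 0, and 4, so the result is Sqrt[4/3]. *)
rmse[{{1.}, {2.}, {3.}}, {{1.}, {2.}, {5.}}]
```

With the trained network from this example, rmse[fdfrwrd2[x], y] would be expected to match the last value in the performance plot, under the assumption stated above.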
The trained FF network can now be used in any way you would like. For example, you can apply it to new input data.
Evaluate the FF model at a new input value.
In[11]:= fdfrwrd2[{1.5}]
Out[11]= {1.21137}
Evaluate the FF model at several new input values.
In[12]:= fdfrwrd2[{{1.5}, {0.3}, {2.5}}]
Out[12]= {{1.21137}, {0.0976301}, {0.478028}}
You can also obtain a Mathematica expression of the FF network by applying the network to a list of symbols.
The list should have one component for each input of the network.
Obtain an expression for the FF network in the symbol xx.
In[13]:= Clear[xx]
fdfrwrd2[{xx}]
Out[14]= {-412.23 + 162.243/(1 + E^(0.693347 - 0.954841 xx)) + 250.898/(1 + E^(0.722195 - 0.737868 xx)) + 82.5833/(1 + E^(0.80038 + 0.780475 xx)) + 356.662/(1 + E^(-0.860144 + 0.831306 xx))}
You can plot the function of the FF network on any interval of your choice.
Plot the FF network on the interval {-2, 4}.
In[15]:= Plot[fdfrwrd2[{x}], {x, -2, 4}]
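Because the network evaluates to an ordinary Mathematica expression, that expression can also be used without the network object. The sketch below copies the coefficients printed in Out[14] into a hypothetical function ffModel and evaluates it at two of the inputs used in In[12]. The printed coefficients are rounded to six digits, so agreement with Out[12] can only be approximate.

```mathematica
(* Hypothetical name; coefficients copied from the rounded values in Out[14]. *)
ffModel[u_] := -412.23 +
   162.243/(1 + Exp[0.693347 - 0.954841 u]) +
   250.898/(1 + Exp[0.722195 - 0.737868 u]) +
   82.5833/(1 + Exp[0.80038 + 0.780475 u]) +
   356.662/(1 + Exp[-0.860144 + 0.831306 u])

(* Evaluate at two inputs from In[12]; compare with 0.0976301 and 0.478028 in Out[12]. *)
ffModel /@ {0.3, 2.5}
```

Each term has the form w/(1 + Exp[-(a u + b)]), that is, an output weight times the sigmoid of one hidden neuron, and the leading constant is the output bias.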
If you use NetPlot then you automatically get the plot in the range of your training data. The command relies on the Mathematica command Plot, and any supported option may be used.
Plot the estimated function and pass on some options to Plot.
In[16]:= NetPlot[fdfrwrd2, x, y, PlotPoints → 5, PlotDivision → 20]
By giving the option DataFormat→NetOutput you obtain a plot of the model output as a function of the given output. If the network fits the data exactly, then this plot shows a straight line with unit slope through the origin. In real applications you always have noise on your measurement, and you can only expect an approximate straight line if the model is good.
Plot the model output versus the data output.
In[17]:= NetPlot[fdfrwrd2, x, y, DataFormat → NetOutput]
[Plot: Model output versus Data output.]
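One way to quantify how close such a plot is to the ideal line is to fit a straight line to the (data, model) pairs and inspect its intercept and slope. The sketch below uses the built-in Fit on synthetic values. The lists target and model are hypothetical stand-ins, not output from the network above.

```mathematica
(* Hypothetical stand-ins: true outputs and a model output with a small deterministic error. *)
k = N[Range[0, 19]];
target = Sin[k/2];
model = target + 0.05 Cos[7 k];

(* Least-squares line model ~ a + b target; a good model gives a near 0 and b near 1. *)
line = Fit[Transpose[{target, model}], {1, u}, u];
{a, b} = CoefficientList[line, u]
```

For the network in this example, target would correspond to Flatten[y] and model to Flatten[fdfrwrd2[x]].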
By giving the option DataFormat→HiddenNeurons in the call to NetPlot, you obtain a plot of the values of the hidden neurons versus the data. Such a plot may indicate if the applied network is unnecessarily large. If some hidden neurons have very similar responses, then it is likely that the network may be reduced to a smaller one with fewer neurons.
Look at the values of the hidden neurons and specify colors.
In[18]:= NetPlot[fdfrwrd2, x, y, DataFormat → HiddenNeurons, PlotStyle → {[email protected], [email protected], [email protected], [email protected]},
PlotLegend → Map["Nr " <> ToString[#] &, Range[Length[fdfrwrd2[[1, 1, 1, 1]]]]]]
[Plot: hidden neuron outputs, Layer: 1 versus data. Legend: Nr 1, Nr 2, Nr 3, Nr 4.]
If some hidden neurons give similar outputs, or if there is a linear relation between them, then you may remove some of them, keeping the approximation quality at the same level. The number of any such neurons can be identified using the legend. This might be of interest in a bias-variance perspective as described in Section 7.5, Regularization and Stopped Search.
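A numerical complement to reading the plot is to compute the correlation between hidden-neuron outputs. The following sketch uses built-in functions on a synthetic matrix hidden, in which rows are data points and columns are neurons. The matrix, the helper sig, and the 0.99 threshold are illustrative assumptions.

```mathematica
(* Synthetic stand-in for hidden-neuron outputs: rows are data points, columns are neurons. *)
t = N[Range[0, 19]/2];
sig[z_] := 1/(1 + Exp[-z]);
hidden = Transpose[{sig[t - 5], sig[3 (2 - t)], 0.3 + 0.5 sig[t - 5]}];

(* Pairwise correlation between neuron outputs; values near +1 or -1 signal redundancy. *)
corr = Correlation[hidden];

(* Neuron pairs {i, j}, i < j, whose outputs are almost perfectly linearly related. *)
redundant = Select[Subsets[Range[Length[corr]], {2}],
  Abs[corr[[#[[1]], #[[2]]]]] > 0.99 &]
```

Here neuron 3 is an affine function of neuron 1, so the pair {1, 3} is flagged. Correlation detects only pairwise linear relations; a relation involving three or more neurons would need a different check, such as examining the rank of the output matrix.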
Remove the second hidden neuron, and look at the neurons and the approximation of the function.
In[19]:= fdfrwrd3 = NeuronDelete[fdfrwrd2, {1, 2}, x];
In[20]:= NetPlot[fdfrwrd3, x, y, DataFormat → HiddenNeurons, PlotStyle → {[email protected], [email protected], [email protected]},
PlotLegend → Map["Nr " <> ToString[#] &, Range[Length[fdfrwrd3[[1, 1, 1, 1]]]]]]
[Plot: hidden neuron outputs, Layer: 1 versus data. Legend: Nr 1, Nr 2, Nr 3.]
In[21]:= NetPlot[fdfrwrd3, x, y]
Note that if you re-evaluate the example, you may have to delete a different neuron because of the randomness in the initialization.
By removing the output of the network you obtain a new network with outputs equal to the hidden neurons of the original network.
In[22]:= NeuronDelete[fdfrwrd2, {2, 1}]
NeuronDelete::NewOutputLayer :
All outputs have been deleted. The second-to-last layer becomes the new output.
Out[22]= FeedForwardNet[{{w1}}, {AccumulatedIterations → 3,
CreationDate → {2002, 4, 3, 13, 25, 46}, Neuron → Sigmoid,
FixedParameters → None, OutputNonlinearity → Sigmoid, NumberOfInputs → 1}]
You can use NetPlot to evaluate the training of the network. This is done by applying it to the training record, which was the second element of the result returned by NeuralFit. Depending on the option DataFormat, the result is presented differently.
Look at how the parameter values change during the training.
In[23]:= NetPlot[fitrecord, x, y, PlotStyle → {[email protected], [email protected], [email protected], [email protected]}]
[Plot: Parameter values versus iterations.]
Often the parameter values increase during the training. From such a plot you may get some insights about why the parameter values did not converge in the training, although the derived network performs well.
Look at the function approximation after each training iteration.
In[24]:= NetPlot[fitrecord, x, y, Intervals → 1, DataFormat → FunctionPlot]
[Plots: Function estimate after Iteration: 0, Iteration: 1, Iteration: 2, and Iteration: 3.]
If you prefer an animation of the training progress, you can load <<Graphics`Animation` and then change the command to Apply[ShowAnimation, NetPlot[fitrecord, x, y, Intervals → 1, DataFormat → FunctionPlot, DisplayFunction → Identity]].