by Douglas G McIlwraith, Haralambos Marmanis, and Dmitry Babenko
6.4 Multilayer perceptrons
6.4.5 MLNN in scikit-learn
Given that you now understand the multilayer perceptron and the theory behind training using backpropagation, let's return to Python to code up an example. Because scikit-learn had no implementation of MLPs at the time of writing, we're going to use PyBrain.2 PyBrain focuses specifically on building and training neural networks. The following listing provides you with the first snippet of code required to build a neural network equivalent to the one presented in figure 6.12. Please refer to the full code distributed with this book for the complete program, including the associated imports.
Listing 6.6 Building a multilayer perceptron using PyBrain

```python
from pybrain.structure import FeedForwardNetwork, LinearLayer, SigmoidLayer, BiasUnit

# Create network modules
net = FeedForwardNetwork()
inl = LinearLayer(2)    # input layer with 2 neurons
hidl = SigmoidLayer(2)  # hidden layer with 2 sigmoid neurons
outl = LinearLayer(1)   # output layer with 1 neuron
b = BiasUnit()          # bias unit (constant output of 1)
```

In listing 6.6 we first create a FeedForwardNetwork object. We also create an input layer (inl), an output layer (outl), and a hidden layer (hidl) of neurons. Note that our input and output layers use a linear activation function (each neuron outputs its input unchanged), whereas our hidden layer uses the sigmoid activation function, which is differentiable and therefore suitable for training with backpropagation, as we discussed earlier. Finally, we create a bias unit. We don't quite have a neural network yet, because we haven't connected the layers. That's what we do in the next listing.

1 Rumelhart et al., "Learning representations by back-propagating errors," 533–36.
2 Tom Schaul et al., "PyBrain," Journal of Machine Learning Research 11 (2010): 743–46.
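To make the roles of these modules concrete, here is a minimal pure-Python sketch (our own illustration, not PyBrain's implementation) of a forward pass through a network with the same shape: a linear input layer that passes its two inputs through unchanged, a two-neuron sigmoid hidden layer, a single linear output neuron, and a bias unit that always outputs 1. Each full connection is stored as a weight matrix with one entry per pair of connected neurons. The helper names (`full_connection`, `random_matrix`, `forward`) and the random starting weights in [-1, 1] are hypothetical choices for illustration.

```python
import math
import random


def linear(z):
    # Linear activation: the neuron outputs its input unchanged
    return z


def sigmoid(z):
    # Logistic sigmoid activation, squashing any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))


def full_connection(inputs, weights):
    # weights[j][i] links source neuron i to target neuron j, so every
    # source neuron contributes to every target neuron
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]


def random_matrix(n_target, n_source, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_source)]
            for _ in range(n_target)]


rng = random.Random(0)
in_to_h = random_matrix(2, 2, rng)      # input (2) -> hidden (2)
h_to_out = random_matrix(1, 2, rng)     # hidden (2) -> output (1)
bias_to_h = random_matrix(2, 1, rng)    # bias unit (1) -> hidden (2)
bias_to_out = random_matrix(1, 1, rng)  # bias unit (1) -> output (1)


def forward(x):
    bias = [1.0]  # the bias unit always outputs 1
    inp = [linear(v) for v in x]
    hidden = [sigmoid(a + b) for a, b in
              zip(full_connection(inp, in_to_h), full_connection(bias, bias_to_h))]
    return [linear(a + b) for a, b in
            zip(full_connection(hidden, h_to_out), full_connection(bias, bias_to_out))]


n_weights = sum(len(m) * len(m[0]) for m in (in_to_h, h_to_out, bias_to_h, bias_to_out))
print("trainable weights:", n_weights)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, forward(x))
```

With connections of sizes 2×2, 2×1, 1×2, and 1×1, this sketch has 9 weights in total; with random weights the outputs are arbitrary, and finding good values for those 9 numbers is the job of training.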
Listing 6.7 Building a multilayer perceptron using PyBrain (2)

```python
from pybrain.structure import FullConnection

# Continues listing 6.6: net, inl, hidl, outl, and b are created there

# Create connections
in_to_h = FullConnection(inl, hidl)
h_to_out = FullConnection(hidl, outl)
bias_to_h = FullConnection(b, hidl)
bias_to_out = FullConnection(b, outl)

# Add modules to net
net.addInputModule(inl)
net.addModule(hidl)
net.addModule(b)
net.addOutputModule(outl)

# Add connections to net and sort
net.addConnection(in_to_h)
net.addConnection(h_to_out)
net.addConnection(bias_to_h)
net.addConnection(bias_to_out)
net.sortModules()
```

In listing 6.7 we now create connection objects and add our previously created neurons (modules) and their connections to the FeedForwardNetwork object. Calling sortModules() completes the instantiation of the network.

Before continuing, let's take a moment to delve into the FullConnection object. Here we create four instances of the object to pass to the network object. Each constructor takes two modules, and the resulting object connects every neuron in the first module to every neuron in the second, so a full connection between layers of m and n neurons holds m × n weights. The final call, sortModules(), sorts the modules within the FeedForwardNetwork object and performs some internal initialization.

Now that we have a neural network equivalent to figure 6.11, we need to learn its weights! To do this we need some data. Listing 6.8 provides the code to generate the data and train the network, and much of it is reproduced from the PyBrain documentation.1

1 PyBrain Quickstart, November 12, 2009, http://pybrain.org/docs/index.html#quickstart (accessed December 22, 2015).

Listing 6.8 Training our neural network

```python
import random

from pybrain.datasets import SupervisedDataSet
from pybrain.supervised.trainers import BackpropTrainer

# Create a data set and associated targets that reflect the XOR function
d = [(0, 0), (0, 1), (1, 0), (1, 1)]
c = [0, 1, 1, 0]  # target class

# Create an empty PyBrain SupervisedDataSet object
data_set = SupervisedDataSet(2, 1)  # 2 inputs, 1 output
random.seed()

# Randomly sample the four data points 1000 times and add them to the training set
for i in range(1000):
    r = random.randint(0, 3)
    data_set.addSample(d[r], c[r])

# Create a new backpropagation trainer object with the network, data set,
# and learning rate
backprop_trainer = BackpropTrainer(net, data_set, learningrate=0.1)

# Perform backpropagation 50 times using the same 1000 data points,
# printing the error after every iteration
for i in range(50):
    err = backprop_trainer.train()
    print("Iteration %d, error: %.4f" % (i, err))
```
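Each call to train() in listing 6.8 runs backpropagation over the whole data set (one epoch), nudging the weights downhill on the error surface. The following pure-Python sketch, our own illustration rather than PyBrain's code, computes the backpropagated gradient of the error 0.5 · (t - y)² for a single training example in the same 2-2-1 architecture and takes one gradient-descent step with a learning rate of 0.1. The flat ordering of the nine weights and their starting values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    # Hypothetical layout of the 9 weights: params[0:4] input->hidden
    # (two per hidden neuron), params[4:6] bias->hidden,
    # params[6:8] hidden->output, params[8] bias->output
    h = [sigmoid(params[2 * j] * x[0] + params[2 * j + 1] * x[1] + params[4 + j])
         for j in range(2)]
    y = params[6] * h[0] + params[7] * h[1] + params[8]  # linear output neuron
    return h, y


def error(params, x, t):
    return 0.5 * (t - forward(params, x)[1]) ** 2


def backprop_gradient(params, x, t):
    """Gradient of 0.5 * (t - y)**2 with respect to all nine weights."""
    h, y = forward(params, x)
    d_out = y - t  # dE/dy for the linear output neuron
    # Send the error back through each hidden-to-output weight and the
    # sigmoid derivative h * (1 - h)
    d_hid = [d_out * params[6 + j] * h[j] * (1.0 - h[j]) for j in range(2)]
    return [d_hid[0] * x[0], d_hid[0] * x[1],
            d_hid[1] * x[0], d_hid[1] * x[1],
            d_hid[0], d_hid[1],
            d_out * h[0], d_out * h[1],
            d_out]


def gradient_step(params, x, t, learning_rate):
    # Move every weight a small distance against its gradient
    grad = backprop_gradient(params, x, t)
    return [p - learning_rate * g for p, g in zip(params, grad)]


params = [0.3, -0.2, 0.1, 0.4, 0.05, -0.1, 0.2, -0.3, 0.0]
x, t = (0, 1), 1
before = error(params, x, t)
after = error(gradient_step(params, x, t, learning_rate=0.1), x, t)
print("error before: %.4f, after: %.4f" % (before, after))
```

For these starting weights the error on this example drops after one step. Repeating such updates over every example, epoch after epoch, is what drives the error down, and the learning rate scales the size of each step.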
As you now know from section 6.4.4, backpropagation traverses the weight space in order to reach a minimum of the error between the network's output and the expected output. Every call to train() updates the weights so that the neural network better represents the function generating the data. This means we're going to need a reasonable amount of data for each call to train() (for our XOR example, four data points is not going to cut it!). To address this problem, we generate many data points drawn from the XOR distribution and use these to train our network with backpropagation. As you'll see, subsequent calls to train() decrease the error between the network output and the specified target. The exact number of iterations required to find the global minimum depends on many factors, one of which is the learning rate. This controls how far the weights move at each update. Smaller rates take longer to converge, that is, to find the global minimum, but they're less likely to result in suboptimal models. Let's take a quick look at the output generated by listing 6.8 and use it to illustrate this concept:
```
Iteration 0, error: 0.1824
Iteration 1, error: 0.1399
Iteration 2, error: 0.1384
Iteration 3, error: 0.1406
Iteration 4, error: 0.1264
Iteration 5, error: 0.1333
Iteration 6, error: 0.1398
Iteration 7, error: 0.1374
Iteration 8, error: 0.1317
Iteration 9, error: 0.1332
…
```
As the output shows, successive calls generally reduce the error of the network, although not monotonically (the error at iteration 3 is higher than at iteration 2, for example). We know that at least one solution exists, but backpropagation is not guaranteed to find it. Under certain circumstances, the error will decrease and then be unable to improve any further, or it may bounce between suboptimal (or local) solutions. The algorithm may get stuck and fail to find the global minimum, that is, the lowest possible error.
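To check that a solution does exist for this architecture, the following sketch hand-sets the weights so that one hidden neuron approximates OR, the other approximates AND, and the linear output subtracts AND from OR, which gives XOR. The values 20, -10, and -30 are hypothetical choices for illustration and only one of many possible solutions. Because a sigmoid never reaches exactly 0 or 1, the error is tiny but not exactly zero.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def xor_net(x1, x2):
    # Hidden neuron 0 behaves like OR: close to 1 unless both inputs are 0
    h_or = sigmoid(20.0 * x1 + 20.0 * x2 - 10.0)
    # Hidden neuron 1 behaves like AND: close to 1 only when both inputs are 1
    h_and = sigmoid(20.0 * x1 + 20.0 * x2 - 30.0)
    # Linear output with weights 1 and -1 and zero bias: OR minus AND
    return 1.0 * h_or - 1.0 * h_and + 0.0


points = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 1, 1, 0]
outputs = [xor_net(*p) for p in points]
# Mean of 0.5 * (target - output)**2 over the four points
mse = sum(0.5 * (t - y) ** 2 for t, y in zip(targets, outputs)) / len(points)
for p, y in zip(points, outputs):
    print(p, round(y, 4))
print("error: %.2e" % mse)
```

Whether backpropagation actually reaches weights like these from a random starting point is a separate question.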
Because this outcome depends on the starting values of the weights, we can't say whether your example will converge quickly, so try running it a few times. Also try experimenting with the learning rate in listing 6.8. How big can you make the rate before the algorithm gets caught in local solutions most of the time? In practice, the choice of learning rate is always a trade-off between speed and the risk of finding suboptimal solutions, so you want to choose the largest rate that still gives you the correct answer. Experiment with this until you're left with a network that has converged to an error close to zero.
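One way to run this experiment systematically is to repeat training from several starting points at several learning rates and count how many runs end with a small error. The sketch below does this in pure Python with online (per-sample) backpropagation on the same 2-2-1 architecture; it is our own illustration, not PyBrain's trainer. The seed controls both the sampled data and the starting weights. The learning rates, the shortened runs (20 epochs of 500 samples), and the 0.01 "solved" threshold are arbitrary choices for illustration.

```python
import math
import random


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow if weights grow large
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(learning_rate, seed, epochs=50, n_samples=1000):
    """Online backpropagation on a 2-2-1 XOR network (sigmoid hidden layer,
    linear output, bias weights in both layers). Returns the final epoch's
    mean error 0.5 * (target - output)**2."""
    rng = random.Random(seed)
    points, targets = [(0, 0), (0, 1), (1, 0), (1, 1)], [0, 1, 1, 0]
    data = [(points[r], targets[r]) for r in (rng.randint(0, 3) for _ in range(n_samples))]
    # Each row holds [weight from x1 or h0, weight from x2 or h1, bias weight]
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    err = 0.0
    for _ in range(epochs):
        total = 0.0
        for (x1, x2), t in data:
            h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
            y = w_o[0] * h[0] + w_o[1] * h[1] + w_o[2]
            # Multiply rather than use ** so a diverging run yields inf, not an exception
            total += 0.5 * (t - y) * (t - y)
            d_o = t - y
            # Hidden deltas use the output weights from before this update
            d_h = [d_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(2)]
            for j in range(2):
                w_o[j] += learning_rate * d_o * h[j]
                w_h[j][0] += learning_rate * d_h[j] * x1
                w_h[j][1] += learning_rate * d_h[j] * x2
                w_h[j][2] += learning_rate * d_h[j]
            w_o[2] += learning_rate * d_o
        err = total / n_samples
    return err


for lr in (0.01, 0.1, 0.3):
    finals = [train_xor(lr, seed, epochs=20, n_samples=500) for seed in range(5)]
    solved = sum(1 for e in finals if e < 0.01)
    print("learning rate %.2f: %d/5 runs below 0.01, final errors %s"
          % (lr, solved, ["%.4f" % e for e in finals]))
```

Because each run is fully determined by its seed, you can rerun a stuck configuration and compare it directly against one that converged, then widen the grid of learning rates to look for the largest rate that still reliably reaches a small error.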