
Make Predictions with the Model

We can get a better idea of how skillful the model is by generating a standalone prediction and plotting it against the expected output sequence. We can call the generate_examples() function to generate one example, then make a prediction using the fit model. The prediction and expected sequence are then plotted for comparison.

# prediction on new data
X, y = generate_examples(length, 1, output)
yhat = model.predict(X, verbose=0)
pyplot.plot(y[0], label='y')
pyplot.plot(yhat[0], label='yhat')
pyplot.legend()
pyplot.show()

Listing 7.19: Example of making predictions with the fit Stacked LSTM model and plotting the results.

Generating the plot shows that, at least for this run and on this specific example, the prediction appears to be a reasonable fit for the expected sequence.

Figure 7.6: Line plot of expected values vs predicted values.
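The visual comparison can be complemented with a single number. The following is a minimal sketch, not part of the original listing, that computes the mean absolute error between an expected and a predicted sequence. The helper name mean_absolute_error and the two short lists are hypothetical; the lists stand in for y[0] and yhat[0] and are made-up values, not results from the model.

```python
# Hypothetical helper: mean absolute error between two equal-length sequences.
def mean_absolute_error(expected, predicted):
    if len(expected) != len(predicted):
        raise ValueError('sequences must have the same length')
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)

# Made-up values standing in for y[0] and yhat[0]; not actual model output.
expected = [0.50, 0.55, 0.60, 0.55, 0.50]
predicted = [0.52, 0.54, 0.57, 0.56, 0.50]

print('MAE: %f' % mean_absolute_error(expected, predicted))
```

A value close to zero would agree with a plot in which the two lines nearly overlap.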

7.7 Complete Example

The complete code example is listed below for your reference.

from math import sin
from math import pi
from math import exp
from random import random
from random import randint
from random import uniform
from numpy import array
from matplotlib import pyplot
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense

# generate damped sine wave in [0,1]

def generate_sequence(length, period, decay):
    return [0.5 + 0.5 * sin(2 * pi * i / period) * exp(-decay * i) for i in range(length)]


# generate input and output pairs of damped sine waves
def generate_examples(length, n_patterns, output):
    X, y = list(), list()
    for _ in range(n_patterns):
        p = randint(10, 20)
        d = uniform(0.01, 0.1)
        sequence = generate_sequence(length + output, p, d)
        X.append(sequence[:-output])
        y.append(sequence[-output:])
    X = array(X).reshape(n_patterns, length, 1)
    y = array(y).reshape(n_patterns, output)
    return X, y

# configure problem and define model
# (length, output, and the Stacked LSTM model are defined as in the earlier listings of this lesson)
model.compile(loss='mae', optimizer='adam')
print(model.summary())

# fit model
X, y = generate_examples(length, 10000, output)
history = model.fit(X, y, batch_size=10, epochs=1)

# evaluate model
X, y = generate_examples(length, 1000, output)
loss = model.evaluate(X, y, verbose=0)
print('MAE: %f' % loss)

# prediction on new data
X, y = generate_examples(length, 1, output)
yhat = model.predict(X, verbose=0)
pyplot.plot(y[0], label='y')
pyplot.plot(yhat[0], label='yhat')
pyplot.legend()
pyplot.show()

Listing 7.20: Complete working example of the Stacked LSTM model on the damped sine wave problem.
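To see how one training example is carved out of a generated sequence, the sketch below reuses the generate_sequence() formula from the listing using only the standard library. The values length = 50 and output = 5 and the fixed random seed are assumptions chosen for illustration. It checks that the values stay in [0,1] and that the last output values become the target.

```python
from math import sin, pi, exp
from random import randint, uniform, seed

# Same damped sine wave formula as in the listing above.
def generate_sequence(length, period, decay):
    return [0.5 + 0.5 * sin(2 * pi * i / period) * exp(-decay * i) for i in range(length)]

# Assumed values for illustration only.
length, output = 50, 5
seed(1)

# Random period and decay drawn from the same ranges as generate_examples().
sequence = generate_sequence(length + output, randint(10, 20), uniform(0.01, 0.1))

# The first `length` values form the input; the last `output` values form the target.
X, y = sequence[:-output], sequence[-output:]

print(len(X), len(y))                                # 50 5
print(min(sequence) >= 0.0, max(sequence) <= 1.0)    # True True
```

Because the sine term is scaled by 0.5 and shifted by 0.5, and the decay factor never exceeds 1, every value stays within [0,1] regardless of the period and decay drawn.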

7.8 Further Reading

This section provides some resources for further reading.


7.8.1 Research Papers

How to Construct Deep Recurrent Neural Networks, 2013.

https://arxiv.org/abs/1312.6026

Training and Analyzing Deep Recurrent Neural Networks, 2013.

Speech Recognition With Deep Recurrent Neural Networks, 2013.

https://arxiv.org/abs/1303.5778

Generating Sequences With Recurrent Neural Networks, 2014.

https://arxiv.org/abs/1308.0850

7.8.2 Articles

Sine Wave on Wikipedia.

https://en.wikipedia.org/wiki/Sine_wave

Damped Sine Wave on Wikipedia.

https://en.wikipedia.org/wiki/Damped_sine_wave

7.9 Extensions

Do you want to dive deeper into the Stacked LSTM? This section lists some challenging extensions to this lesson.

List 5 examples that you believe might be a good fit for Stacked LSTMs.

Tune the number of memory cells, batch size, and number of training samples to further lower the model error (e.g. try 50K samples and a batch size of 1).

Develop a Vanilla LSTM for the problem and compare the performance of the model.

Design and execute an experiment to tease out the required increase in training and/or memory cells with the increased length of the damped sine wave sequences.

Design a new contrived sequence prediction problem tailored for Stacked LSTM and design a Stacked LSTM model to address it skillfully.

Post your extensions online and share the link with me; I’d love to see what you come up with!

7.10 Summary

In this lesson, you discovered how to develop a Stacked LSTM. Specifically, you learned:

The motivation for creating a multilayer LSTM and how to develop Stacked LSTM models in Keras.

The damped sine wave prediction problem and how to prepare examples for fitting LSTM models.

How to develop, fit, and evaluate a Stacked LSTM model for the damped sine wave prediction problem.

In the next lesson, you will discover how to develop and evaluate the CNN LSTM model.

Chapter 8

How to Develop CNN LSTMs

8.0.1 Lesson Goal

The goal of this lesson is to learn how to develop LSTM models that use a Convolutional Neural Network on the front end. After completing this lesson, you will know:

About the origin of the CNN LSTM architecture and the types of problems to which it is suited.

How to implement the CNN LSTM architecture in Keras.

How to develop a CNN LSTM for a Moving Square Video Prediction Problem.

8.0.2 Lesson Overview

This lesson is divided into 7 parts; they are:

1. The CNN LSTM.

2. Moving Square Video Prediction Problem.

3. Define and Compile the Model.

4. Fit the Model.

5. Evaluate the Model.

6. Make Predictions With the Model.

7. Complete Example.

Let’s get started.


8.1 The CNN LSTM

8.1.1 Architecture

The CNN LSTM architecture involves using Convolutional Neural Network (CNN) layers for feature extraction on input data combined with LSTMs to support sequence prediction.

CNN LSTMs were developed for visual time series prediction problems and the application of generating textual descriptions from sequences of images (e.g. videos). Specifically, the problems of:

Activity Recognition: generating a textual description of an activity demonstrated in a sequence of images.

Image Description: generating a textual description of a single image.

Video Description: generating a textual description of a sequence of images.

[CNN LSTMs are] a class of models that is both spatially and temporally deep, and has the flexibility to be applied to a variety of vision tasks involving sequential inputs and outputs

— Long-term Recurrent Convolutional Networks for Visual Recognition and Description, 2015

This architecture was originally referred to as a Long-term Recurrent Convolutional Network or LRCN model, although in this lesson we will use the more generic name CNN LSTM to refer to LSTMs that use a CNN as a front end. This architecture is used for the task of generating textual descriptions of images. The key is the use of a CNN that is pre-trained on a challenging image classification task and then re-purposed as a feature extractor for the caption generating problem.

... it is natural to use a CNN as an image “encoder”, by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences

— Show and Tell: A Neural Image Caption Generator, 2015

This architecture has also been used on speech recognition and natural language processing problems where CNNs are used as feature extractors for the LSTMs on audio and textual input data. This architecture is appropriate for problems that:

Have spatial structure in their input such as the 2D structure or pixels in an image or the 1D structure of words in a sentence, paragraph, or document.

Have a temporal structure in their input such as the order of images in a video or words in text, or require the generation of output with temporal structure such as words in a textual description.

Figure 8.1: CNN LSTM Architecture (Input, CNN Model, LSTM Model, Dense, Output).

8.1.2 Implementation

We can define a CNN LSTM model to be trained jointly in Keras. A CNN LSTM can be defined by adding CNN layers on the front end followed by LSTM layers with a Dense layer on the output.

It is helpful to think of this architecture as defining two sub-models: the CNN Model for feature extraction and the LSTM Model for interpreting the features across time steps. Let’s take a look at both of these sub-models in the context of a sequence of 2D inputs, which we will assume are images.

CNN Model

As a refresher, we can define a 2D convolutional network as comprised of Conv2D and MaxPooling2D layers ordered into a stack of the required depth. The Conv2D will interpret snapshots of the image (e.g. small squares) and the pooling layers will consolidate or abstract the interpretation.

For example, the snippet below expects to read in 10x10 pixel images with 1 channel (e.g. black and white). The Conv2D will read the image in 2x2 snapshots and output one new 10x10 interpretation of the image. The MaxPooling2D will pool the interpretation into 2x2 blocks, reducing the output to a 5x5 consolidation. The Flatten layer will take the single 5x5 map and transform it into a 25-element vector ready for some other layer to deal with, such as a Dense for outputting a prediction.

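from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten

cnn = Sequential()
cnn.add(Conv2D(1, (2,2), activation='relu', padding='same', input_shape=(10,10,1)))
cnn.add(MaxPooling2D(pool_size=(2, 2)))
cnn.add(Flatten())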

Listing 8.1: Example of the CNN part of the CNN LSTM model.

This makes sense for image classification and other computer vision tasks.
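The shape arithmetic described above (10x10 to 10x10 to 5x5 to 25) can be checked by hand. The sketch below is plain Python rather than Keras; the helper names conv_output_size and pool_output_size are hypothetical, and the sketch assumes a convolution stride of 1 and non-overlapping pooling windows.

```python
# Output size along one dimension for a convolution with stride 1.
# 'same' padding preserves the size; 'valid' padding shrinks it by kernel - 1.
def conv_output_size(size, kernel, padding):
    return size if padding == 'same' else size - kernel + 1

# Output size along one dimension for non-overlapping pooling (no padding).
def pool_output_size(size, pool):
    return size // pool

filters, channels, kernel = 1, 1, 2

side = conv_output_size(10, kernel, 'same')   # Conv2D: 10x10 -> 10x10
side = pool_output_size(side, 2)              # MaxPooling2D: 10x10 -> 5x5
features = side * side * filters              # Flatten: 5x5x1 -> 25

# Trainable weights in the convolution: one 2x2 kernel per input channel
# for each filter, plus one bias per filter.
params = (kernel * kernel * channels + 1) * filters

print(side, features, params)                 # 5 25 5
```

With padding set to 'valid' instead of 'same', the same arithmetic would give a 9x9 interpretation before pooling, which is why the snippet uses 'same' to keep the 10x10 size.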

LSTM Model

The CNN model above is only capable of handling a single image, transforming it from input pixels into an internal matrix or vector representation. We need to repeat this operation across multiple images and allow the LSTM to build up internal state and update weights using BPTT across a sequence of the internal vector representations of input images.

The CNN could be fixed in the case of using an existing pre-trained model like VGG for feature extraction from images. Alternatively, the CNN may not be pre-trained, and we may wish to train it by backpropagating error from the LSTM across multiple input images to the CNN model. In both of these cases, conceptually there is a single CNN model and a sequence of LSTM models, one for each time step. We want to apply the CNN model to each input image and pass on the output of each input image to the LSTM as a single time step.

We can achieve this by wrapping the entire CNN input model (one layer or more) in a TimeDistributed layer. This layer achieves the desired outcome of applying the same layer or layers multiple times. In this case, it applies the CNN to each input time step in turn, providing a sequence of image interpretations or image features for the LSTM model to work on.

model.add(TimeDistributed(...))
model.add(LSTM(...))
model.add(Dense(...))

Listing 8.2: Example of the LSTM part of the CNN LSTM model.
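The idea of applying one sub-model to every time step can be illustrated without Keras. The sketch below is a conceptual stand-in in plain Python, not how the TimeDistributed layer is implemented: extract_features and time_distributed are hypothetical helpers, and the "CNN" is reduced to 2x2 max pooling plus flattening with no learned weights.

```python
# Stand-in for the CNN sub-model: 2x2 max pooling followed by flattening.
# (Hypothetical helper; a real CNN would also apply learned convolutions.)
def extract_features(frame):
    pooled = []
    for r in range(0, len(frame), 2):
        for c in range(0, len(frame[0]), 2):
            pooled.append(max(frame[r][c], frame[r][c + 1],
                              frame[r + 1][c], frame[r + 1][c + 1]))
    return pooled

# Apply the same function to every time step, one frame at a time.
def time_distributed(fn, frames):
    return [fn(frame) for frame in frames]

# A made-up sequence of three 4x4 frames with a single bright pixel moving right.
frames = []
for t in range(3):
    frame = [[0] * 4 for _ in range(4)]
    frame[1][t] = 1
    frames.append(frame)

# One 4-element feature vector per time step, ready for a sequence model.
features = time_distributed(extract_features, frames)
for step in features:
    print(step)
# [1, 0, 0, 0]
# [1, 0, 0, 0]
# [0, 1, 0, 0]
```

The result has one feature vector per frame, which mirrors the sequence of image features that the LSTM receives as its time steps.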

We now have the two elements of the model; let’s put them together.

CNN LSTM Model

We can define a CNN LSTM model in Keras by first defining the CNN layer or layers, wrapping them in a TimeDistributed layer and then defining the LSTM and output layers. We have two ways to define the model that are equivalent and only differ as a matter of taste. You can define the CNN model first, then add it to the LSTM model by wrapping the entire sequence of CNN layers in a TimeDistributed layer, as follows:

# define CNN model
cnn = Sequential()
cnn.add(Conv2D(...))
cnn.add(MaxPooling2D(...))
cnn.add(Flatten())
# define CNN LSTM model
model = Sequential()
model.add(TimeDistributed(cnn, ...))
model.add(LSTM(...))
model.add(Dense(...))

Listing 8.3: Example of a CNN LSTM model in two parts.
