Up until now, we've used the default stride of one for our networks. This means that the filter moves one position at a time along each axis (a step size of one). However, when a dataset contains less fine-grained information at the pixel level, we can experiment with larger stride values. With a larger stride, the convolutional layer skips more input positions along each axis, so its output feature maps are smaller. The convolutional filters themselves keep the same number of weights, but the layers that consume the smaller feature maps (in particular a fully connected layer after flattening) need far fewer trainable parameters. This can speed up convergence without too much performance loss.
Another parameter that can be tuned is the padding. The padding defines how the borders of the input data (for example, images) are handled. If no padding is added, the filter is only applied where it fits entirely inside the input, so the border pixels (in the case of an image) contribute to fewer outputs than the central pixels, and the output is smaller than the input. So if you expect the borders to include valuable information, you can try to add padding to your data. This adds a border of dummy data (typically zeros) that can be used while convolving over the data. In Keras, this corresponds to padding='same', whereas padding='valid' adds no padding. A benefit of using padding is that, with a stride of 1, the dimensions of the data are kept the same over each convolutional layer, which means that you can stack more convolutional layers on top of each other. In the following diagram, we can see an example of stride 1 with zero padding, and an example of stride 2 with same padding:
Figure 3.5: Left: an input image of 5 x 5 with stride 1 and zero padding; Right: an input image of 5 x 5 with stride 2 and same padding
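The effect of stride and padding on the output size can be checked with a little arithmetic. The following is a minimal sketch and not part of the recipe: the helper name conv_output_size is hypothetical, and the 3 x 3 filter is an assumption, because the filter size used in the figure is not stated. It applies the standard size rules for padding='valid' (no padding) and padding='same'.

```python
import math

def conv_output_size(input_size, kernel_size, stride, padding):
    """Output length along one axis of a convolution (hypothetical helper)."""
    if padding == 'same':
        # The input is padded so that the output length is ceil(input / stride)
        return math.ceil(input_size / stride)
    if padding == 'valid':
        # No padding: the filter must fit entirely inside the input
        return (input_size - kernel_size) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

# Assumed 3 x 3 filter on a 5 x 5 input
for stride in (1, 2):
    for padding in ('same', 'valid'):
        size = conv_output_size(5, 3, stride, padding)
        print('stride={}, padding={}: {} x {}'.format(stride, padding, size, size))
```

Under these assumptions, same padding with stride 1 keeps the 5 x 5 size, while valid padding shrinks the output to 3 x 3. With stride 2, the output is 3 x 3 for same padding and 2 x 2 for valid padding.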
There is no general rule for which value to choose for the padding and strides. It largely depends on the size and complexity of the data in combination with the potential pre-processing techniques used. Next, we will experiment with different settings for the strides and padding and compare the results of our models. The dataset we will use contains images of cats and dogs, and our task is to classify the animal.
How to do it...
1. Import all necessary libraries as follows:
import glob
import numpy as np
import cv2
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten, Lambda
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping
from keras.layers import Conv2D, MaxPooling2D
SEED = 2017
2. Let's start with loading the filenames and outputting the training set sizes:
# Specify data directory and extract all file names for both classes
DATA_DIR = 'Data/PetImages/'
cats = glob.glob(DATA_DIR + "Cat/*.jpg")
dogs = glob.glob(DATA_DIR + "Dog/*.jpg")
print('#Cats: {}, #Dogs: {}'.format(len(cats), len(dogs)))
# #Cats: 12500, #Dogs: 12500
3. To get a better understanding of the dataset, let's plot three examples of each class:
n_examples = 3
plt.figure(figsize=(15, 15))
i = 1
for _ in range(n_examples):
    # Pick a random cat image and convert it from BGR (OpenCV) to RGB
    image_cat = cats[np.random.randint(len(cats))]
    img_cat = cv2.imread(image_cat)
    img_cat = cv2.cvtColor(img_cat, cv2.COLOR_BGR2RGB)
    plt.subplot(3, 2, i)
    _ = plt.imshow(img_cat)
    i += 1
    # Pick a random dog image and show it next to the cat
    image_dog = dogs[np.random.randint(len(dogs))]
    img_dog = cv2.imread(image_dog)
    img_dog = cv2.cvtColor(img_dog, cv2.COLOR_BGR2RGB)
    plt.subplot(3, 2, i)
    _ = plt.imshow(img_dog)
    i += 1
plt.show()
Figure 3.6: Example images of the labels cat and dog
4. Let's split the dataset into a training set and a validation set as follows:
dogs_train, dogs_val, cats_train, cats_val = train_test_split(dogs, cats, test_size=0.2, random_state=SEED)
5. The training set is relatively large; we will be using a batch generator so that we don't have to load all images in memory:
def batchgen(cats, dogs, batch_size, img_size=50):
    # Custom batch generator that yields batches indefinitely
    while True:
        # Create fresh numpy arrays for every batch, so that batches
        # that are still queued by Keras are not overwritten
        batch_images = np.zeros((batch_size, img_size, img_size, 3))
        batch_label = np.zeros(batch_size)
        n = 0
        while n < batch_size:
            # Randomly pick a dog or cat image
            if np.random.randint(2) == 1:
                i = np.random.randint(len(dogs))
                img = cv2.imread(dogs[i])
                y = 1
            else:
                i = np.random.randint(len(cats))
                img = cv2.imread(cats[i])
                y = 0
            # Skip files that cannot be read and pick another image
            if img is None:
                continue
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            # The images have different dimensions,
            # we resize all to img_size x img_size
            img = cv2.resize(img, (img_size, img_size), interpolation=cv2.INTER_AREA)
            batch_images[n] = img
            batch_label[n] = y
            n += 1
        yield batch_images, batch_label
6. Next, we define a function that creates a model given parameters for the stride and padding:
def create_model(stride=1, padding='same', img_size=100):
    # Define architecture
    model = Sequential()
    # Scale the pixel values from [0, 255] to [-0.5, 0.5]
    model.add(Lambda(lambda x: (x / 255.) - 0.5, input_shape=(img_size, img_size, 3)))
    model.add(Conv2D(32, (3, 3), activation='relu', padding=padding, strides=stride))
    model.add(Conv2D(32, (3, 3), activation='relu', padding=padding, strides=stride))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))
    model.add(Conv2D(64, (3, 3), activation='relu', padding=padding, strides=stride))
    model.add(Conv2D(64, (3, 3), activation='relu', padding=padding, strides=stride))
    model.add(Dropout(0.5))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1, activation='sigmoid'))
    opt = Adam(0.001)
    model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['binary_accuracy'])
    return model
7. Now we can define a model for each setting, and we extract the number of trainable parameters:
img_size = 100
models = []
for stride in [1, 2]:
    for padding in ['same', 'valid']:
        model = create_model(stride, padding, img_size)
        pars = model.count_params()
        models.append({'setting': '{}_{}'.format(stride, padding),
                       'model': model,
                       'parameters': pars})
8. To print a summary of a model's architecture, you can use the following:
models[0]['model'].summary()
9. To use early stopping, we define a callback as follows:
callbacks = [EarlyStopping(monitor='val_binary_accuracy', patience=5)]
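To make the role of the patience argument concrete, here is a simplified, plain-Python sketch of patience-based early stopping on a metric that should increase. It is an illustration only and not the Keras implementation; the function name stopping_epoch and the accuracy values are hypothetical.

```python
def stopping_epoch(values, patience):
    """Return the (0-based) epoch at which training would stop, or None.

    Simplified illustration: training stops once the monitored value
    has not improved on its best value for `patience` epochs in a row.
    """
    best = float('-inf')
    wait = 0
    for epoch, value in enumerate(values):
        if value > best:
            # Improvement: remember the new best and reset the counter
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation accuracies per epoch
val_acc = [0.60, 0.65, 0.70, 0.69, 0.68, 0.70, 0.69, 0.66]
print(stopping_epoch(val_acc, patience=5))
```

In this made-up sequence, the best value is reached in the third epoch, and the five following epochs bring no improvement, so training would stop after the eighth epoch.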
10. In the next step, we will train our models and store the results:
batch_size = 512
n_epochs = 500
validation_steps = round((len(dogs_val) + len(cats_val)) / batch_size)
steps_per_epoch = round((len(dogs_train) + len(cats_train)) / batch_size)
# batchgen expects the cat filenames first and the dog filenames second
train_generator = batchgen(cats_train, dogs_train, batch_size, img_size)
val_generator = batchgen(cats_val, dogs_val, batch_size, img_size)
history = []
for i in range(len(models)):
    print(models[i])
    history.append(
        models[i]['model'].fit_generator(
            train_generator,
            steps_per_epoch=steps_per_epoch,
            epochs=n_epochs,
            validation_data=val_generator,
            validation_steps=validation_steps,
            callbacks=callbacks
        )
    )
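As a quick check of the step counts used above, the following sketch redoes the arithmetic in plain Python. It assumes the 12,500 images per class reported in step 2 and the 80/20 split from step 4; the variable names are only illustrative.

```python
n_per_class = 12500   # number of images per class, from step 2
test_size = 0.2       # validation fraction, from step 4
batch_size = 512

# Images per class that end up in each split
n_val_per_class = round(n_per_class * test_size)
n_train_per_class = n_per_class - n_val_per_class

# Both classes together, divided into batches
steps_per_epoch = round(2 * n_train_per_class / batch_size)
validation_steps = round(2 * n_val_per_class / batch_size)

print('Training images: {}, steps per epoch: {}'.format(
    2 * n_train_per_class, steps_per_epoch))
print('Validation images: {}, validation steps: {}'.format(
    2 * n_val_per_class, validation_steps))
```

Note that the batch generator samples images at random with replacement, so one epoch of this many steps covers roughly as many images as the training set contains, but not necessarily every image exactly once.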
11. Let's visualize the results:
for i in range(len(models)):
    plt.plot(range(len(history[i].history['val_binary_accuracy'])),
             history[i].history['val_binary_accuracy'],
             label=models[i]['setting'])
    print('Max accuracy model {}: {}'.format(
        models[i]['setting'],
        max(history[i].history['val_binary_accuracy'])))
plt.title('Accuracy on the validation set')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend()
plt.show()
Figure 3.7: Performance comparison with different settings for padding and strides
By using a stride of 2, the number of trainable parameters is reduced significantly (from 10,305,697 to 102,561 when using same padding), because the flattened feature map that feeds the first dense layer becomes much smaller. For this dataset, the results show no performance loss from using a stride > 1. This is expected, because we use (resized) 100 x 100 x 3 images as input. Skipping a pixel in each direction shouldn't influence the performance too much.
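The two parameter counts can be reproduced by hand. The following plain-Python sketch walks through the layers of the architecture from step 6 (four 3 x 3 convolutions with 32, 32, 64, and 64 filters, one 2 x 2 max pooling layer after the second convolution, and dense layers with 64 and 1 units). The function names are hypothetical, and the size rules are the standard ones for 'same' and 'valid' padding.

```python
import math

def conv_out(size, kernel, stride, padding):
    # Output length along one axis for 'same' or 'valid' padding
    if padding == 'same':
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

def count_params(stride, padding, img_size=100):
    size, channels, total = img_size, 3, 0
    for index, filters in enumerate([32, 32, 64, 64]):
        # Each filter has 3 x 3 x channels weights plus one bias;
        # this does not depend on the stride or the padding
        total += (3 * 3 * channels + 1) * filters
        size = conv_out(size, 3, stride, padding)
        channels = filters
        if index == 1:
            # 2 x 2 max pooling after the second convolution
            size //= 2
    flat = size * size * channels   # length of the flattened feature map
    total += (flat + 1) * 64        # Dense(64)
    total += 64 + 1                 # Dense(1)
    return total

for stride in (1, 2):
    print('stride {}: {:,} parameters'.format(
        stride, count_params(stride, 'same')))
```

With same padding, this gives 10,305,697 parameters for stride 1 and 102,561 for stride 2. The convolutional layers contribute 65,568 parameters in both cases; the difference comes entirely from the first dense layer, whose input shrinks from 50 x 50 x 64 = 160,000 values to 3 x 3 x 64 = 576 values.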