Implementing a Multi-Class SVM - Tensorflow Machine Learning Cookbook

We can also use SVMs to categorize multiple classes instead of just two. In this recipe, we will use a multi-class SVM to categorize the three types of flowers in the iris dataset.

Getting ready

By design, SVM algorithms are binary classifiers. However, there are a few strategies employed to get them to work on multiple classes. The two main strategies are called one versus all, and one versus one.

One versus one is a strategy where a binary classifier is created for each possible pair of classes. A point is then assigned to the class that receives the most votes. This can be computationally expensive because we must create k(k-1)/2 classifiers for k classes.
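To make the count concrete, the following minimal sketch enumerates the pairwise classifiers needed for a given set of classes. The helper name pairwise_classifiers is hypothetical and is used only for illustration.

```python
from itertools import combinations


def pairwise_classifiers(classes):
    """Return every unordered pair of classes; one binary classifier is needed per pair."""
    return list(combinations(classes, 2))


iris_classes = ["setosa", "versicolor", "virginica"]
pairs = pairwise_classifiers(iris_classes)
print(pairs)       # one entry per binary classifier
print(len(pairs))  # 3 classifiers for k = 3

# The count grows quadratically with the number of classes.
print(len(pairwise_classifiers(range(10))))  # 45
```

For three classes, one versus one and one versus all both need three classifiers; the difference only shows up for larger k.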

Another way to implement multi-class classifiers is to do a one versus all strategy where we create a classifier for each of the classes. The predicted class of a point will be the class that creates the largest SVM margin. This is the strategy we will implement in this section.

Here, we will load the iris dataset and fit a multi-class nonlinear SVM with a Gaussian kernel. The iris dataset is ideal because there are three classes (I. setosa, I. virginica, and I. versicolor). We will create three Gaussian kernel SVMs, one for each class, and predict each point as the class whose SVM gives the highest margin.
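The one versus all decision rule can be sketched in a few lines of NumPy. The score values below are made up for illustration; in the recipe they would come from the three trained SVMs.

```python
import numpy as np

# Hypothetical decision values: one row per classifier, one column per point.
# Row k holds the output of the "class k versus the rest" classifier.
scores = np.array([
    [ 1.2, -0.7, -0.3],   # class 0 vs. rest
    [-0.4,  0.9,  0.1],   # class 1 vs. rest
    [-1.1, -0.2,  0.6],   # class 2 vs. rest
])

# For each point (column), pick the classifier with the largest output.
predicted_class = np.argmax(scores, axis=0)
print(predicted_class)  # [0 1 2]
```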

How to do it…

1. First we load the libraries we need and start a graph, as follows:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from sklearn import datasets

sess = tf.Session()

2. Next, we will load the iris dataset and split apart the targets for each class. We will only be using the sepal length and petal width to illustrate because we want to be able to plot the outputs. We also separate the x and y values for each class for plotting purposes at the end. Use the following code:

iris = datasets.load_iris()

x_vals = np.array([[x[0], x[3]] for x in iris.data])

y_vals1 = np.array([1 if y==0 else -1 for y in iris.target])
y_vals2 = np.array([1 if y==1 else -1 for y in iris.target])
y_vals3 = np.array([1 if y==2 else -1 for y in iris.target])
y_vals = np.array([y_vals1, y_vals2, y_vals3])

class1_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==0]
class1_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==0]
class2_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==1]
class2_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==1]
class3_x = [x[0] for i,x in enumerate(x_vals) if iris.target[i]==2]
class3_y = [x[1] for i,x in enumerate(x_vals) if iris.target[i]==2]
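The three target vectors can also be built in one broadcasted comparison. This is a minimal sketch on a toy label array standing in for iris.target (assumed to hold the values 0, 1, and 2); it is an alternative formulation, not the recipe's own code.

```python
import numpy as np

# Toy class labels standing in for iris.target (assumed values 0, 1, 2).
target = np.array([0, 0, 1, 2, 1, 2])

# Compare a (3, 1) column of class ids against the (6,) labels.
# Row k is +1 where the label equals k and -1 elsewhere.
y_vals = np.where(target == np.arange(3)[:, None], 1, -1)
print(y_vals.shape)  # (3, 6)
print(y_vals)
```

Each column contains exactly one +1, which marks the true class of that point.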

3. The biggest change we have in this example, as compared to the Implementing a Non-Linear SVM recipe, is that a lot of the dimensions will change (we have three classifiers now instead of one). We will also make use of matrix broadcasting and reshaping techniques to calculate all three SVMs at once. Since we are doing this all at once, our y_target placeholder now has the dimensions [3, None] and our model variable, b, will be initialized to be size [3, batch_size]. Use the following code:

batch_size = 50

x_data = tf.placeholder(shape=[None, 2], dtype=tf.float32)
y_target = tf.placeholder(shape=[3, None], dtype=tf.float32)
prediction_grid = tf.placeholder(shape=[None, 2], dtype=tf.float32)

b = tf.Variable(tf.random_normal(shape=[3,batch_size]))

4. Next we calculate the Gaussian kernel. Since this is only dependent on the x data, this code doesn't change from the prior recipe. Use the following code:

gamma = tf.constant(-10.0)

dist = tf.reduce_sum(tf.square(x_data), 1)
dist = tf.reshape(dist, [-1,1])

sq_dists = tf.add(tf.sub(dist, tf.mul(2., tf.matmul(x_data, tf.transpose(x_data)))), tf.transpose(dist))

my_kernel = tf.exp(tf.mul(gamma, tf.abs(sq_dists)))
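The squared-distance expansion used in this step can be checked outside TensorFlow. The following NumPy sketch is an illustration only: the function name gaussian_kernel is hypothetical, and it takes a positive gamma with an explicit minus sign, whereas the recipe stores the sign in the constant itself.

```python
import numpy as np


def gaussian_kernel(a, b, gamma=10.0):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2), using the expansion
    ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2."""
    a_sq = np.sum(a ** 2, axis=1).reshape(-1, 1)   # column vector
    b_sq = np.sum(b ** 2, axis=1).reshape(1, -1)   # row vector
    sq_dists = a_sq - 2.0 * a @ b.T + b_sq
    # abs() guards against tiny negative values caused by round-off.
    return np.exp(-gamma * np.abs(sq_dists))


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
k = gaussian_kernel(x, x)

# Brute-force version: compute every pairwise difference explicitly.
brute = np.exp(-10.0 * np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2))
print(np.allclose(k, brute))  # True
```

The expansion avoids materializing all pairwise difference vectors, which is why the recipe can express the kernel with a single matrix product.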

5. One big change is that we will do batch matrix multiplication. We will end up with three-dimensional matrices and we will want to broadcast matrix multiplication across the third index. Our data and target matrices are not set up for this. For a product such as the outer product of each row of targets with itself to work across an extra dimension, we create a function that expands such a matrix, reshapes the expanded matrix into its transpose, and then calls TensorFlow's batch_matmul across the extra dimension. Use the following code:

def reshape_matmul(mat):
    v1 = tf.expand_dims(mat, 1)
    v2 = tf.reshape(v1, [3, batch_size, 1])
    return(tf.batch_matmul(v2, v1))
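The effect of this function can be reproduced in NumPy, where matrix multiplication on three-dimensional arrays is applied per leading index. This is a sketch with made-up targets and a small batch size, not the recipe's TensorFlow code.

```python
import numpy as np

batch_size = 4
rng = np.random.default_rng(0)
# Hypothetical one-versus-all targets: 3 classifiers, batch_size points, values in {-1, +1}.
y = rng.choice([-1.0, 1.0], size=(3, batch_size))

row = y[:, None, :]   # shape (3, 1, batch_size), like tf.expand_dims(mat, 1)
col = y[:, :, None]   # shape (3, batch_size, 1)
y_cross = col @ row   # batched matmul: (3, batch_size, batch_size)

print(y_cross.shape)  # (3, 4, 4)
# Entry [k, i, j] equals y[k, i] * y[k, j].
print(np.array_equal(y_cross, np.einsum("ki,kj->kij", y, y)))  # True
```

Because the expanded array has a dimension of size 1, reshaping it to [3, batch_size, 1] yields the same values as transposing its last two axes.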

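6. With this function created, we can now compute the dual loss function, as follows:

model_output = tf.matmul(b, my_kernel)
first_term = tf.reduce_sum(b)
b_vec_cross = tf.matmul(tf.transpose(b), b)
y_target_cross = reshape_matmul(y_target)
second_term = tf.reduce_sum(tf.mul(my_kernel, tf.mul(b_vec_cross, y_target_cross)),[1,2])
# loss is required by the optimizer in a later step; the negated dual objective below is an assumed form
loss = tf.reduce_sum(tf.neg(tf.sub(first_term, second_term)))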
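To follow the array shapes in this step, the same operations can be traced in NumPy. This is only a shape-tracing sketch under stated assumptions: the coefficients and targets are random, and an identity matrix stands in for the Gaussian kernel.

```python
import numpy as np

batch_size = 4
rng = np.random.default_rng(0)
b = rng.normal(size=(3, batch_size))               # one row of coefficients per classifier
y = rng.choice([-1.0, 1.0], size=(3, batch_size))  # hypothetical one-versus-all targets
kernel = np.eye(batch_size)                        # placeholder for the kernel matrix

first_term = b.sum()                     # scalar
b_vec_cross = b.T @ b                    # (batch_size, batch_size)
y_cross = y[:, :, None] * y[:, None, :]  # (3, batch_size, batch_size)

# The 2-D arrays broadcast against the leading classifier axis of y_cross,
# and the sum over axes 1 and 2 leaves one value per classifier.
second_term = (kernel * (b_vec_cross * y_cross)).sum(axis=(1, 2))

print(b_vec_cross.shape, y_cross.shape, second_term.shape)
```

Summing over axes 1 and 2 only is what keeps the three classifiers separate in the second term.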

7. Now we can create the prediction kernel. Notice that we have to be careful with the reduce_sum function and not reduce across all three SVM predictions, so we have to tell TensorFlow not to sum everything up with a second index argument. Use the following code:

rA = tf.reshape(tf.reduce_sum(tf.square(x_data), 1),[-1,1])
rB = tf.reshape(tf.reduce_sum(tf.square(prediction_grid), 1),[-1,1])

pred_sq_dist = tf.add(tf.sub(rA, tf.mul(2., tf.matmul(x_data, tf.transpose(prediction_grid)))), tf.transpose(rB))

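pred_kernel = tf.exp(tf.mul(gamma, tf.abs(pred_sq_dist)))

8. When we are done with the prediction kernel, we can create predictions. A big change here is that the predictions are not the sign() of the output. Since we are implementing a one versus all strategy, the prediction is the classifier that has the largest output. To accomplish this, we use TensorFlow's built-in argmax() function, as follows:

# Assumed reconstruction: prediction and accuracy are the names used in the steps below
prediction_output = tf.matmul(tf.mul(y_target, b), pred_kernel)
prediction = tf.argmax(prediction_output - tf.expand_dims(tf.reduce_mean(prediction_output, 1), 1), 0)
accuracy = tf.reduce_mean(tf.cast(tf.equal(prediction, tf.argmax(y_target, 0)), tf.float32))

9. Next, we have to declare our optimizer function and initialize our variables, as follows: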

my_opt = tf.train.GradientDescentOptimizer(0.01)
train_step = my_opt.minimize(loss)

init = tf.initialize_all_variables()
sess.run(init)

10. This algorithm converges relatively quickly, so we won't have to run the training loop for more than 100 iterations. We do so with the following code:

loss_vec = []
batch_accuracy = []
for i in range(100):
    rand_index = np.random.choice(len(x_vals), size=batch_size)
    rand_x = x_vals[rand_index]
    rand_y = y_vals[:,rand_index]
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})

    temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
    loss_vec.append(temp_loss)

    acc_temp = sess.run(accuracy, feed_dict={x_data: rand_x, y_target: rand_y, prediction_grid:rand_x})
    batch_accuracy.append(acc_temp)
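The batch indexing in the loop is worth a closer look, because the data stores points in rows while the targets store them in columns. The sketch below uses random stand-in arrays with the same shapes as the iris features and targets; the values themselves are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, batch_size = 150, 50
x_vals = rng.normal(size=(n_points, 2))           # stand-in for the two iris features
y_vals = rng.choice([-1, 1], size=(3, n_points))  # stand-in for the one-versus-all targets

# Sample indices with replacement, which is also the default of np.random.choice.
rand_index = rng.choice(n_points, size=batch_size)
rand_x = x_vals[rand_index]     # rows are points:    (batch_size, 2)
rand_y = y_vals[:, rand_index]  # columns are points: (3, batch_size)

print(rand_x.shape, rand_y.shape)  # (50, 2) (3, 50)
```

Note that every batch must contain exactly batch_size points, because the model variable b is fixed to the shape [3, batch_size].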

11. We can now create the prediction grid of points and run the prediction function on all of them, as follows:

x_min, x_max = x_vals[:, 0].min() - 1, x_vals[:, 0].max() + 1
y_min, y_max = x_vals[:, 1].min() - 1, x_vals[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
grid_points = np.c_[xx.ravel(), yy.ravel()]
grid_predictions = sess.run(prediction, feed_dict={x_data: rand_x,
                                                   y_target: rand_y,
                                                   prediction_grid: grid_points})
grid_predictions = grid_predictions.reshape(xx.shape)
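The round trip from a mesh grid to a flat list of points and back is easier to see on a small grid. This sketch uses a coarse step and fake predictions (the flat index of each point) purely for illustration.

```python
import numpy as np

# Small grid with a coarse step so the shapes are easy to read.
xx, yy = np.meshgrid(np.arange(0.0, 1.0, 0.25), np.arange(0.0, 1.5, 0.5))
grid_points = np.c_[xx.ravel(), yy.ravel()]  # one (x, y) row per grid cell

print(xx.shape)           # (3, 4): rows follow y, columns follow x
print(grid_points.shape)  # (12, 2)

# Stand-in predictions, one per grid point, reshaped back onto the grid.
fake_predictions = np.arange(len(grid_points))
grid_predictions = fake_predictions.reshape(xx.shape)
print(grid_predictions[1, 2])  # 6
```

Because ravel() and reshape() use the same row-major order, entry [i, j] of the reshaped predictions belongs to the point (xx[i, j], yy[i, j]), which is what contourf expects.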

12. The following is code to plot the results, batch accuracy, and loss function. For succinctness we will only display the end result:

# Decision regions with the data points drawn on top
plt.contourf(xx, yy, grid_predictions, cmap=plt.cm.Paired, alpha=0.8)
plt.plot(class1_x, class1_y, 'ro', label='I. setosa')
plt.plot(class2_x, class2_y, 'kx', label='I. versicolor')
plt.plot(class3_x, class3_y, 'gv', label='I. virginica')
plt.title('Gaussian SVM Results on Iris Data')
plt.xlabel('Sepal Length')
plt.ylabel('Petal Width')
plt.legend(loc='lower right')
plt.show()

# Batch accuracy per generation
plt.plot(batch_accuracy, 'k-', label='Accuracy')
plt.title('Batch Accuracy')
plt.xlabel('Generation')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()

# Loss per generation
plt.plot(loss_vec, 'k-')
plt.title('Loss per Generation')
plt.xlabel('Generation')
plt.ylabel('Loss')
plt.show()

Figure 10: Multi-class (three classes) nonlinear Gaussian SVM results on the iris dataset with gamma = 10.

How it works…

The important point to notice in this recipe is how we changed our algorithm to optimize over three SVM models at once. Our model parameter, b, has an extra dimension to take into account all three models. Here we can see that the extension of an algorithm to multiple similar algorithms was made relatively easy owing to TensorFlow's built-in capabilities to deal with extra dimensions.
