Before you start modeling, go back to your original question: can you predict whether a wine is red or white by looking at its chemical properties, such as volatile acidity or sulphates?
Since you only have two classes, namely white and red, you’re going to do a binary classification. As you can imagine, “binary” means 0 or 1, yes or no. Since neural networks can only work with numerical data, you have already encoded red as 1 and white as 0.
A type of network that performs well on such a problem is a multi-layer perceptron. As you have read in the beginning of this tutorial, this type of neural network is often fully connected. That means that you’re looking to build a fairly simple stack of fully-connected layers to solve this problem.
As for the activation function, it’s best to start with one of the most common ones so that you can get familiar with Keras and neural networks: the relu activation function.
Now how do you start building your multi-layer perceptron? A quick way to get started is to use the Keras Sequential model: it’s a linear stack of layers. You can create the model by passing a list of layer instances to the constructor, or you can start from an empty model, which you set up by running model = Sequential(), and add the layers one at a time.
Next, it’s best to think back to the structure of the multi-layer perceptron as you might have read about it in the beginning of this tutorial: you have an input layer, some hidden layers and an output layer. When you’re making your model, it’s therefore important that your first layer makes the input shape clear. The model needs to know what input shape to expect, and that’s why you’ll always find the input_shape, input_dim, input_length, or batch_size arguments in the documentation of the layers and in practical examples of those layers.
In this case, you will have to use a Dense layer, which is a fully connected layer. Dense layers implement the following operation: output = activation(dot(input, kernel) + bias). Note that without the activation function, your Dense layer would consist only of two linear operations: a dot product and an addition.
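To make the formula concrete, here is a minimal NumPy sketch of what a Dense layer computes. It is not how Keras implements the layer internally; the dense helper, the random weights and the batch of 4 samples are assumptions made purely for illustration.

```python
import numpy as np

def relu(x):
    # relu keeps positive values and replaces negative values with 0
    return np.maximum(x, 0)

def dense(inputs, kernel, bias, activation=relu):
    # Hypothetical helper mirroring: output = activation(dot(input, kernel) + bias)
    return activation(np.dot(inputs, kernel) + bias)

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 12))   # 4 samples with 12 features each (illustrative data)
kernel = rng.normal(size=(12, 12))  # weights matrix of shape (input features, units)
bias = np.zeros(12)                 # one bias value per unit

outputs = dense(inputs, kernel, bias)
print(outputs.shape)
```

Each row of outputs holds the 12 values that the layer produces for one sample. Without the relu call, the result would just be the linear expression dot(input, kernel) + bias.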
In the first layer, the activation argument takes the value relu. Next, you also see that the input_shape has been defined. This is the input of the operation that you have just seen: the model takes as input arrays of shape (12,), or (*, 12). Lastly, you see that the first layer has 12 as the first value for the units argument of Dense(), which is the dimensionality of the output space: the layer has 12 hidden units. This means that this layer will output arrays of shape (*, 12). Don’t worry if you don’t get this entirely just now, you’ll read more about it later on!
The units argument determines the shape of the kernel in the above formula: the kernel is the weights matrix created by the layer, with one weight for every pair of input node and unit, so here it has shape (12, 12). Note that you do include a bias in the example below: the use_bias argument defaults to True, so you would have to pass use_bias=False to leave the bias out.
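As a quick check of how units relates to the size of the kernel, you can count the weights by hand. The sketch below assumes the three layer sizes described in this section (12, 8 and 1 units, with 12 input features) and that every layer keeps its bias, which is the Keras default. dense_param_count is a hypothetical helper, not a Keras function.

```python
def dense_param_count(input_dim, units, use_bias=True):
    # The kernel holds one weight per (input, unit) pair; the bias adds one value per unit
    kernel_weights = input_dim * units
    bias_weights = units if use_bias else 0
    return kernel_weights + bias_weights

layer_units = [12, 8, 1]  # units of the three Dense layers in this section
input_dim = 12            # number of input features (assumption taken from the text)
total = 0
for units in layer_units:
    count = dense_param_count(input_dim, units)
    print(f"Dense({units}): kernel shape ({input_dim}, {units}), {count} parameters")
    total += count
    input_dim = units     # the next layer receives this layer's output
print("Total parameters:", total)
```

Under these assumptions the three layers hold 156, 104 and 9 trainable parameters, or 269 in total.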
The intermediate layer also uses the relu activation function. The output of this layer will be arrays of shape (*,8).
You are ending the network with a Dense layer of size 1. The final layer uses a sigmoid activation function so that your output is actually a probability: a score between 0 and 1 that indicates how likely the sample is to have the target “1”, or how likely the wine is to be red.
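To see why the sigmoid gives you a score between 0 and 1, here is a small standalone sketch in plain Python. The raw scores are made-up values, and the 0.5 cut-off used to turn a probability into a label is a common convention that is assumed here, not something the model itself prescribes.

```python
import math

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

# Made-up raw outputs of the last layer before the activation is applied
for raw_score in (-4.0, 0.0, 4.0):
    probability = sigmoid(raw_score)
    # Assumed decision rule: 0.5 or higher means "red" (target 1)
    label = "red" if probability >= 0.5 else "white"
    print(f"{raw_score:+.1f} -> {probability:.3f} ({label})")
```

Strongly negative scores end up close to 0 (white), strongly positive scores end up close to 1 (red), and a raw score of 0 maps to exactly 0.5.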
script.py
# Import `Sequential` from `keras.models`
from keras.models import Sequential
# Import `Dense` from `keras.layers`
from keras.layers import Dense
# Initialize the constructor
model = Sequential()
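The code above only creates the empty model. Based on the description earlier in this section, adding the three layers could look like the sketch below. It assumes 12 input features, as the text states; adjust input_shape if your data has a different number of columns.

```python
from keras.models import Sequential
from keras.layers import Dense

# Initialize the constructor
model = Sequential()

# First layer: 12 units, relu, expects 12 input features (assumption taken from the text)
model.add(Dense(12, activation='relu', input_shape=(12,)))

# Intermediate layer: 8 units, relu
model.add(Dense(8, activation='relu'))

# Output layer: 1 unit with a sigmoid, which gives a score between 0 and 1
model.add(Dense(1, activation='sigmoid'))
```

Only the first layer needs input_shape; the following layers infer their input size from the layer before them. Recent Keras versions may warn that a separate Input layer is preferred over input_shape, but the argument is still accepted.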
All in all, you see that there are two key architecture decisions that you need to make when building your model: how many layers you’re going to use and how many “hidden units” you will choose for each layer.
In this case, you picked 12 hidden units for the first layer of your model: as you read above, this is the dimensionality of the output space. In other words, you’re setting the amount of freedom that you’re allowing the network to have when it’s learning representations. If you allow more hidden units, your network will be able to learn more complex representations, but it will also be more expensive to train and more prone to overfitting.
Remember that overfitting occurs when the model is too complex: it will describe random error or noise and not the underlying relationship that it needs to describe. In other words, the training data is modeled too well!
Note that when you don’t have that much training data available, you should prefer to use a small network with very few hidden layers (typically only one, like in the example above).
If you want to get some information on the model that you have just created, you can use the output_shape attribute or the summary() function, among others. Some of the most basic ones are listed below.
Try running them to see exactly what results you get back and what they tell you about the model that you have just created:
script.py
# Model output shape
model.___________
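For reference, here is a self-contained sketch of a few of these inspection calls. It rebuilds the model under the same assumption of 12 input features. output_shape, summary(), get_config() and get_weights() are standard members of a Keras model, but the exact printout depends on your Keras version.

```python
from keras.models import Sequential
from keras.layers import Dense

# Rebuild the model described in this section (12 input features is an assumption)
model = Sequential()
model.add(Dense(12, activation='relu', input_shape=(12,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Shape of the model's output; None stands for the batch dimension
print(model.output_shape)

# Layer-by-layer overview, including the number of parameters
model.summary()

# Model configuration (the layers and the arguments they were created with)
config = model.get_config()

# Weight arrays: one kernel and one bias per Dense layer
weights = model.get_weights()
print([w.shape for w in weights])
```

The list of weight shapes shows a kernel followed by a bias for each layer, which confirms that the bias is part of every Dense layer unless you switch it off.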