Artificial Neural Networks - Classification Methods

2.6 Classification Methods

2.6.3 Artificial Neural Networks

Artificial Neural Networks (ANNs), also known as Neural Networks (NNs), are mathematical models inspired by the design and functioning of biological neural networks in the brain [18, 19]. The concept of the ANN was first introduced in the mid-1940s, and ANNs are widely used today in pattern recognition (PR), blind source separation, filtering, image processing, medical diagnosis, process control, etc. An ANN can model complex non-linear input-output relationships that are difficult for other techniques to capture. It is an adaptive system that learns and changes its structure as input is fed to it.

The basic layout of an ANN consists of many neurons linked together according to a specific network structure. Different ANN architectures exist, depending on the number of layers and the flow of information. Based on the learning paradigm, ANNs can be divided into supervised and unsupervised networks; based on the network structure, they can be single-layer or multi-layer. A supervised ANN is used when the desired output is already known, while an unsupervised ANN is used when there are no target outputs. A supervised ANN model is used in this study since the training data is available.

Generally, the operation of an ANN consists of a training phase and a testing phase. Depending on the training methodology used, a validation phase is sometimes also required. During the training phase, the training data is presented to the network and a desired response is set at the output. A training error is calculated as the difference between the desired response and the actual response. This error is fed back to the network, and the parameters of the network are adjusted adaptively (the learning rule) until the output is acceptably close to the desired response. If a validation phase is present, a separate set of validation data is used to check the generality of the classifier at each training iteration. A validation error is calculated using the validation data.

If the validation error continues to increase for a pre-defined period even though the training error is decreasing, the network is said to be overfitting the training data. In this situation, the training process is interrupted and the parameters of the network are reverted to the values that gave the smallest validation error (also called generalisation via early stopping). After the training and validation phases are complete, the performance of the trained network is tested using the test data. A well-trained classifier should perform well on all three datasets (training, validation and testing).
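The early-stopping rule described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `early_stopping_epoch`, the `patience` parameter and the list of validation errors are hypothetical, and the stopping criterion is simplified to "no improvement on the best validation error for `patience` consecutive epochs", which is a common variant rather than necessarily the exact rule used in this study.

```python
def early_stopping_epoch(val_errors, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation errors.

    Training stops once the validation error has failed to improve on its
    best value for `patience` consecutive epochs. The network parameters
    saved at `best_epoch` would then be restored.
    """
    best_epoch = 0
    best_error = float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_error:
            # New smallest validation error: remember where it occurred.
            best_error = err
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: interrupt training.
            return epoch, best_epoch
    # The sequence ended before the patience was exhausted.
    return len(val_errors) - 1, best_epoch


# Hypothetical validation errors, one value per training epoch.
val_errors = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(val_errors, patience=3)
print(stop_epoch, best_epoch)
```

With these made-up numbers the smallest validation error occurs at epoch 3, and training is interrupted three epochs later, at which point the parameters from epoch 3 would be restored.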

Depending on how an ANN partitions the data into different classes, it can be divided into three main types: Multi Layer Perceptron (MLP), Radial Basis Function (RBF) network and Probabilistic Neural Network (PNN). Only the first type is discussed in this study; interested readers can find details of the other types in [18].

The Multi Layer Perceptron

One of the most common realisations of the neural network is the Multi Layer Perceptron (MLP). A typical multi-layer network consists of computing units called neurons, arranged in an input layer, one or more hidden layers of computation nodes and an output layer. Inputs propagate through the network layer by layer, and the MLP produces a non-linear mapping of the inputs at the output layer. Figure 2.6 shows the basic structure of an MLP network. The general formula for any neuron can be written as,

$$y_j = \phi(v(x)) = \phi\left(\sum_{i=1}^{N} (w_{ji} x_i + b_j)\right) \tag{2.15}$$

where

$$v(x) = \sum_{i=1}^{N} (w_{ji} x_i + b_j) \tag{2.16}$$

and $N$ is the number of inputs of the neuron, $w_{ji}$ is the weight connecting the output of neuron $i$ to neuron $j$, $b_j$ is the bias attributed to neuron $j$, $\phi(\cdot)$ is the activation function describing the input-output relationship and $y_j$ is the output of neuron $j$. In Figure 2.6 there is one hidden layer of neurons and one output layer. The activation functions for these two layers are different and can be written as,


Figure 2.6: A simple MLP network.

$$\phi(v) = \frac{e^{v} - e^{-v}}{e^{v} + e^{-v}} \tag{2.17}$$

and

$$\phi(v) = v \tag{2.18}$$

where a tan-sigmoid (hyperbolic tangent) activation function is used for the hidden layer (equation (2.17)) and a linear activation function is used for the output layer (equation (2.18)). The size of the output layer is equal to the number of outputs, while the size of the hidden layer is specified by the user. Choosing the right size for the hidden layer is very important in designing any MLP network, since it has a large impact on the classification performance and the generality of the network. A hidden layer that is too large can result in over-training (the network starts memorising the training dataset), while one that is too small results in poor classification performance.
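As an illustration of equations (2.15) to (2.18), the forward pass of an MLP with one tan-sigmoid hidden layer and a linear output layer might be sketched as below. The function names (`tansig`, `layer`, `mlp_forward`) and all weight values are hypothetical and chosen only for the example. The sketch also assumes that the bias $b_j$ is added once per neuron, which is the usual convention; this is an assumption about how equation (2.16) is meant to be read.

```python
import math


def tansig(v):
    """Hidden-layer activation, eq. (2.17): (e^v - e^-v) / (e^v + e^-v)."""
    return (math.exp(v) - math.exp(-v)) / (math.exp(v) + math.exp(-v))


def linear(v):
    """Output-layer activation, eq. (2.18)."""
    return v


def layer(inputs, weights, biases, activation):
    """Compute y_j = activation(sum_i w_ji * x_i + b_j) for every neuron j.

    Assumption: the bias is added once per neuron.
    """
    outputs = []
    for w_j, b_j in zip(weights, biases):
        v_j = sum(w_ji * x_i for w_ji, x_i in zip(w_j, inputs)) + b_j
        outputs.append(activation(v_j))
    return outputs


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tan-sigmoid hidden layer followed by a linear output layer."""
    hidden = layer(x, w_hidden, b_hidden, tansig)
    return layer(hidden, w_out, b_out, linear)


# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
x = [0.5, -1.0]
w_hidden = [[1.0, 0.5], [-0.5, 1.0]]  # one row of weights per hidden neuron
b_hidden = [0.0, 0.0]
w_out = [[1.0, -1.0]]                 # one row of weights per output neuron
b_out = [0.5]

y = mlp_forward(x, w_hidden, b_hidden, w_out, b_out)
print(y)
```

The number of rows in `w_hidden` plays the role of the user-chosen hidden layer size, while the number of rows in `w_out` equals the number of network outputs.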

Training Algorithm for Multi Layer Perceptron

A back-propagation (BP) algorithm is used for training the MLP. It uses the difference between the actual and the desired output (the error function) as a cost function to adjust the weights of the neurons [20]. The error signal at output node $j$ for iteration $n$ can be written as,


$$e_j(n) = d_j(n) - y_j(n) \tag{2.19}$$

where $d_j(n)$ is the desired output value and $y_j(n)$ is the actual output value. If the instantaneous energy for neuron $j$ is $\tfrac{1}{2} e_j^2(n)$, the total instantaneous energy can be calculated as,

$$E(n) = \frac{1}{2} \sum_{j=1}^{J} e_j^2(n) \tag{2.20}$$
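A small numerical illustration of equations (2.19) and (2.20) is given below. The helper names and the desired and actual output values are hypothetical, and the sum is assumed to run over all output nodes of the network.

```python
def error_signals(desired, actual):
    """Eq. (2.19): e_j(n) = d_j(n) - y_j(n) for every output node j."""
    return [d - y for d, y in zip(desired, actual)]


def instantaneous_energy(errors):
    """Eq. (2.20): E(n) = 1/2 * sum over j of e_j(n)^2."""
    return 0.5 * sum(e * e for e in errors)


# Hypothetical desired and actual outputs for three output nodes.
desired = [1.0, 0.0, 0.0]
actual = [0.8, 0.1, 0.3]

errors = error_signals(desired, actual)
energy = instantaneous_energy(errors)
print(errors, energy)
```

For these values the error signals are roughly 0.2, -0.1 and -0.3, giving a total instantaneous energy of about 0.07.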

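A gradient descent algorithm is used in this study during back-propagation to minimise the cost function. The weights of the neurons can be adjusted using the following steps, which apply the chain rule to the cost function:

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial e_j(n)} \frac{\partial e_j(n)}{\partial \phi_j(n)} \frac{\partial \phi_j(n)}{\partial v_j(n)} \frac{\partial v_j(n)}{\partial w_{ji}(n)} \tag{2.21}$$

$$\frac{\partial E(n)}{\partial e_j(n)} = e_j(n) \tag{2.22}$$

$$\frac{\partial e_j(n)}{\partial \phi_j(n)} = -1 \tag{2.23}$$

$$\frac{\partial \phi_j(n)}{\partial v_j(n)} = \phi'(v_j(n)) \tag{2.24}$$

$$\frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n) \tag{2.25}$$

where $\phi_j(n) = \phi(v_j(n)) = y_j(n)$ is the output of neuron $j$ and $y_i(n)$ is the output of neuron $i$, i.e. the input that reaches neuron $j$ through the weight $w_{ji}$. Putting equations (2.22) to (2.25) in equation (2.21) yields

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = -e_j(n)\,\phi'(v_j(n))\,y_i(n) = -\delta_j(n)\,y_i(n) \tag{2.26}$$

The weight update (correction) of the neuron can be written as,

$$\Delta w_{ji} = -\eta \frac{\partial E(n)}{\partial w_{ji}(n)} \tag{2.27}$$

$$\Delta w_{ji} = \eta\,\delta_j(n)\,y_i(n) \tag{2.28}$$

where $\eta$ is the learning rate and

$$\delta_j(n) = e_j(n)\,\phi'(v_j(n)) \tag{2.29}$$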
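The update rule can be checked numerically with a minimal sketch for a single output neuron. Everything named here (`neuron_output`, `delta_rule_step`, `energy`, the weight values, the learning rate) is hypothetical. The activation is assumed to be the hyperbolic tangent so that $\phi'$ is non-trivial; for the linear output layer of equation (2.18), $\phi'$ would simply be 1. Only the weights are updated, since the equations above do not give a bias update. In the code, `inputs` holds the signals that are multiplied by the weights $w_{ji}$.

```python
import math


def neuron_output(weights, bias, inputs, phi):
    """Return (v_j, y_j): the weighted input sum and the activated output."""
    v = sum(w * s for w, s in zip(weights, inputs)) + bias
    return v, phi(v)


def delta_rule_step(weights, bias, inputs, desired, eta, phi, dphi):
    """One gradient-descent update of the weights of a single output neuron."""
    v, y = neuron_output(weights, bias, inputs, phi)
    e = desired - y          # error signal, eq. (2.19)
    delta = e * dphi(v)      # local gradient, eq. (2.29)
    # Weight correction eta * delta * input, eq. (2.28).
    new_weights = [w + eta * delta * s for w, s in zip(weights, inputs)]
    return new_weights, delta


def energy(weights, bias, inputs, desired, phi):
    """Instantaneous energy of the single neuron, 1/2 * e^2."""
    _, y = neuron_output(weights, bias, inputs, phi)
    return 0.5 * (desired - y) ** 2


def phi(v):
    return math.tanh(v)


def dphi(v):
    # Derivative of tanh(v).
    return 1.0 - math.tanh(v) ** 2


# Hypothetical starting point.
weights, bias = [0.2, -0.4], 0.1
inputs, desired, eta = [1.0, 0.5], 0.8, 0.1

new_weights, delta = delta_rule_step(weights, bias, inputs, desired, eta, phi, dphi)

# Finite-difference check of the gradient in eq. (2.26) for the first weight.
h = 1e-6
bumped = [weights[0] + h, weights[1]]
numeric_grad = (energy(bumped, bias, inputs, desired, phi)
                - energy(weights, bias, inputs, desired, phi)) / h
analytic_grad = -delta * inputs[0]

print(new_weights, numeric_grad, analytic_grad)
```

The finite-difference estimate should agree closely with the analytic gradient, and the energy evaluated at `new_weights` should be lower than at the starting weights, which is what a descent step with a small enough learning rate is expected to achieve.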


In document Pattern recognition using genetic programming for classification of diabetes and modulation data (Page 46-50)