Background to Comparison Classifiers

2. Theoretical Framework

2.6 Background to Comparison Classifiers

The following classifiers are compared against the MIDECC system as a means of judging the system's training time and overall accuracy. The classifiers are trained and evaluated using the Waikato Environment for Knowledge Analysis (WEKA), a Java-based open-source software package developed at the University of Waikato [12].

2.6.1 Naïve Bayes Classifier

The Naïve Bayes technique is a simple process that assigns ‘class labels’ to problem instances. It assumes that, for each class label, the value of a particular feature is independent of the value of any other feature. Although very simplistic, Naïve Bayes classifiers have performed well when compared to other, more complex classifiers [13]. Bayesian classifiers, such as the Naïve Bayes classifier, follow Bayes' theorem [14], below:

$$P(c_j \mid d) = \frac{P(d \mid c_j)\,P(c_j)}{P(d)}$$

Where

$P(c_j \mid d)$ = probability of instance d being in class $c_j$

What is to be computed

$P(d \mid c_j)$ = probability of generating instance d given class $c_j$

Given membership of class $c_j$, the probability of having feature d

$P(c_j)$ = probability of class $c_j$ occurring

$P(d)$ = probability of instance d occurring

Constant value for all classes

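Bayes' theorem, when used with a Naïve Bayes classifier, assumes that all attributes have independent distributions, estimating the following:

$$P(d \mid c_j) = P(d_1 \mid c_j) \times P(d_2 \mid c_j) \times \dots \times P(d_n \mid c_j)$$

where $d_1, \dots, d_n$ are the attribute values of instance d and $c_j$ is the class under consideration.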

By assuming independent distributions, Naïve Bayes is efficient in both storage and computation. The classifier is often seen as a ‘baseline’ for category sorting algorithms such as spam filters, and can operate on more complex data sets in a supervised learning environment, where each training example is labelled with the desired output (the supervisory signal).
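The independence assumption can be illustrated with a minimal sketch. This is not WEKA's implementation: the function names, the two categorical features and the toy ‘ball’/‘grass’ data are all assumptions made for illustration, and no smoothing is applied for unseen feature values.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(class, feature index)][value] = occurrences in training data
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts


def posterior(row, class_counts, value_counts):
    """Return P(c | d) for every class c, normalised so the values sum to 1."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(c)
        for i, value in enumerate(row):
            # Naive assumption: multiply the per-feature likelihoods P(d_i | c).
            score *= value_counts[(c, i)][value] / n_c
        scores[c] = score
    evidence = sum(scores.values())  # P(d), the same constant for every class
    if evidence == 0:
        return scores
    return {c: s / evidence for c, s in scores.items()}


# Hypothetical toy data: (colour, size) -> class label.
rows = [("red", "small"), ("red", "large"), ("green", "small"), ("red", "small"),
        ("green", "large"), ("green", "large"), ("red", "large"), ("green", "small")]
labels = ["ball", "ball", "ball", "ball", "grass", "grass", "grass", "grass"]

class_counts, value_counts = train_naive_bayes(rows, labels)
result = posterior(("red", "small"), class_counts, value_counts)
print(result)  # 'ball' receives a posterior of about 0.9, 'grass' about 0.1
```

Only counts are stored, one per class and per (class, feature, value) combination, which is why the method needs little storage and computation.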

2.6.2 J48 Tree Classifier

J48 is a Java implementation of the C4.5 algorithm developed by Ross Quinlan. It generates a decision tree and is often referred to as a statistical classifier [15]. The decision tree operates on the general algorithm below, splitting the data at the attribute with the highest normalised information gain.

The algorithm attempts to match the following base cases first:

• All the samples in the list belong to the same class.

The end of the branch: the final output value has been found.

• None of the features provide any information gain.

The end of the branch for this part of the data; continue for the remaining data.

• Instance of previously-unseen class encountered.

The end of the branch for this part of the data; continue for the remaining data.

If these cases are not met, the following algorithm is run:

Algorithm 2-1: C4.5 Classification

For each attribute a

    Find the normalised information gain ratio from splitting on a

Let a_best be the attribute with the highest normalised information gain

Create a decision node that splits on a_best

Recur on the sub-data at a_best, adding nodes as the children of node
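The steps of Algorithm 2-1 can be sketched as follows. This is a simplified illustration rather than the J48 code: it assumes categorical attributes only, omits pruning and the unseen-class base case, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain from splitting on attr, normalised by the split information."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    total = len(labels)
    remainder = sum(len(p) / total * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(p) / total * math.log2(len(p) / total)
                      for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    if len(set(labels)) == 1:  # base case: all samples share one class
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attrs:  # nothing left to split on
        return majority
    ratios = {a: gain_ratio(rows, labels, a) for a in attrs}
    a_best = max(ratios, key=ratios.get)
    if ratios[a_best] <= 1e-12:  # base case: no attribute is informative
        return majority
    remaining = [a for a in attrs if a != a_best]
    children = {}
    for value in set(row[a_best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[a_best] == value]
        sub_rows, sub_labels = zip(*subset)
        # Recur on the sub-data, adding the result as a child of this node.
        children[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (a_best, children)


# Hypothetical toy data: attribute 0 is colour, attribute 1 is size.
rows = [("red", "small"), ("red", "large"), ("green", "small"), ("green", "large")]
labels = ["ball", "ball", "grass", "grass"]
tree = build_tree(rows, labels, [0, 1])
print(tree)  # splits on attribute 0 (colour); size carries no information here
```

In this toy data colour separates the classes perfectly, so its gain ratio is 1 while that of size is 0, and the tree needs only a single decision node.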

The C4.5 algorithm and its J48 implementation became popular after C4.5 was ranked first in the ‘Top 10 Algorithms in Data Mining’ paper, published in 2008 [16].

When used in this research, WEKA is instructed to generate a J48 decision tree with a pruning confidence threshold of 0.25 and a minimum of 2 instances per leaf.

2.6.3 Random Tree Classifier

Similar to the J48 classifier, the Random Tree classifier generates a decision tree. Unlike the J48 classifier, Random Tree does not ‘prune’ or optimise the resulting tree. This produces a significantly larger tree, although training takes much less time. A much larger tree may become restrictive when storing and processing the classifier on a small embedded computer system. For this reason, research has been conducted on determining the length of the longest path in a binary search tree [17]. It has been found that a randomly built binary search tree with n nodes will have approximately the following longest path:

$$\frac{1}{\beta} \ln n \approx 4.311 \ln n$$

Where $\beta$ is the unique number in the range $0 < \beta < 1$ satisfying the equation:

$$2 \beta e^{1 - \beta} = 1$$

Nodes are inserted one at a time at random, as opposed to the J48 classifier, which assesses the normalised information gain to narrow down the tree size during training. When used in this research, WEKA is instructed to generate a Random Tree classifier with a minimum of 1 instance per leaf and with the random number generator seeded with ‘1’.
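The longest-path estimate can be explored numerically. The sketch below assumes the result cited from [17] is the standard one for randomly built binary search trees, namely a longest path of roughly (1/β) ln n with β the root of 2βe^(1−β) = 1 in (0, 1). The function names, the tree size and the seed are illustrative assumptions, and the simulated tree is a plain binary search tree, not a WEKA Random Tree.

```python
import math
import random


def solve_beta(tol=1e-12):
    """Bisection for the root of 2*beta*exp(1 - beta) = 1 on the interval (0, 1)."""
    def f(b):
        return 2.0 * b * math.exp(1.0 - b) - 1.0

    lo, hi = 0.0, 1.0  # f(0) = -1 and f(1) = 1, and f is increasing in between
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def random_bst_height(n, rng):
    """Insert n keys in random order into an unbalanced BST; return the longest path in edges."""
    keys = list(range(n))
    rng.shuffle(keys)
    left, right = {}, {}  # child links, keyed by parent key
    root, height = keys[0], 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            depth += 1
            children = left if key < node else right
            if node not in children:
                children[node] = key  # free slot found: attach the new key here
                break
            node = children[node]
        height = max(height, depth)
    return height


beta = solve_beta()
n = 1000
height = random_bst_height(n, random.Random(1))
estimate = math.log(n) / beta  # leading-order estimate of the longest path
print(f"1/beta = {1.0 / beta:.3f}, estimate = {estimate:.1f}, simulated = {height}")
```

The estimate is a leading-order asymptotic term, so a simulated tree of moderate size will generally sit somewhat below it.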

2.6.4 Neural Network Classifier

The Neural Network classifier is a biologically inspired method, modelled in particular on the brain. The system is represented as a network of ‘neurons’ arranged in a chosen number of ‘layers’. Each neuron applies a particular weighting or bias to each of its inputs, outputting a certain value only when certain input conditions are met. A simple single-layer neural network is depicted in Figure 2-9. Five inputs to the network are connected to one output through a single ‘hidden’ layer consisting of three nodes.

Figure 2-9: Single Layer Neural Network. Figure created by Julien Cretel, StackExchange.

The system trains by passing in an input with a known expected value and assessing the output against it. The most common method uses a back-propagation algorithm to update the weights for each node in the network. The rate at which the weights are updated, known as the learning rate, directly affects the trade-off between training speed and training quality.

The following process is used to train a simple neural network with back-propagation:

Algorithm 2-2: Neural Network Back Propagation

Phase 1: Propagation

    Propagate the training signal through the network, activating the output nodes.

    Back propagate the training signal, generating delta ‘error’ values for the hidden and output nodes.

Phase 2: Weight Update

    For each weight:

        Calculate the weight gradient by multiplying the output delta and input activation.

        Subtract the learning rate ratio of the gradient from the weight.
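The two phases of Algorithm 2-2 can be sketched for a network with one hidden layer and one output node. This is an illustrative sketch, not WEKA's multilayer perceptron: it assumes sigmoid activations, a squared-error loss and no momentum term, and the function names, the toy logical-OR data, the initial weight range and the epoch count are assumptions chosen only to make the example self-contained.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Propagate an input forward; every weight row ends with a bias weight."""
    xb = x + [1.0]  # inputs plus a constant bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = hidden + [1.0]  # hidden activations plus a constant bias input
    output = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
    return xb, hb, output


def train_step(x, target, w_hidden, w_out, rate):
    # Phase 1: propagate forward, then back-propagate the error as deltas.
    xb, hb, output = forward(x, w_hidden, w_out)
    delta_out = (output - target) * output * (1.0 - output)
    delta_hidden = [delta_out * w_out[j] * hb[j] * (1.0 - hb[j])
                    for j in range(len(w_hidden))]
    # Phase 2: gradient = delta * input activation; subtract rate * gradient.
    for j in range(len(w_out)):
        w_out[j] -= rate * delta_out * hb[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] -= rate * delta_hidden[j] * xb[i]


def mean_squared_error(data, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[2] - t) ** 2 for x, t in data) / len(data)


# Hypothetical toy problem: logical OR with 2 inputs, 3 hidden nodes, 1 output.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
rng = random.Random(1)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(3)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(4)]

error_before = mean_squared_error(data, w_hidden, w_out)
for _ in range(2000):  # epoch count chosen for the toy data, not taken from the text
    for x, target in data:
        train_step(x, target, w_hidden, w_out, rate=0.3)
error_after = mean_squared_error(data, w_hidden, w_out)
print(error_before, error_after)
```

The hidden deltas are computed before any weight is changed, so both layers are updated from the same forward pass.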

A network with a single hidden layer is limited to ‘learning’ functions which do not require abstract features. Training a neural network to identify a model of car based on colour, size and number of wheels, for example, will require multiple hidden layers.

It is expected that the neural network will be one of the top performers in this research, even as a simple single-layer network. This is due to the binary nature of the classification, something a single-layer network excels at. When used in this research, WEKA is instructed to train the Neural Network at a learning rate of 0.3 with a back-propagation momentum rate of 0.2 for 20 epochs.

In document Colour consistency in computer vision : a multiple image dynamic exposure colour classification system : a thesis presented to the Institute of Natural and Mathematical Sciences in fulfilment of the requirements for the degree of Master of Science in Comp (Page 30-34)