Dealing with uncertainty in classification

7.1 Generalisation and uncertainty in neural networks

7.1.3 Dealing with uncertainty in classification

Regardless of the techniques used to improve the generalisation properties of a neural network, it is still likely that when dealing with real-world data the network system will be unable to correctly identify all of the cases presented to it. In many cases there will be regions of overlap between classes which render 100% classification accuracy an impossibility. Ideally the network would be able to detect the cases in which it is unable to make a correct classification and report these, or failing this provide an indication of its 'confidence' in the result returned.

The standard approach to classification, selecting the output with the highest activation, discards a lot of information that may indicate how likely the classification is to be correct. Either the value of the maximum activation itself, or the amount by which it exceeds the other, lower output activations (the winning margin), may serve as an indication of the 'confidence' of the system in the result, and therefore of the probability of that result being correct.
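The two measures can be illustrated with a minimal sketch. It assumes the outputs are available as one row of activations per example, and it takes the winning margin to be the gap between the highest and second-highest activation, which is only one reading of the definition above. The function name and the sample values are hypothetical.

```python
import numpy as np

def confidence_measures(activations):
    """Return (winner index, maximum activation, winning margin) per example."""
    activations = np.asarray(activations, dtype=float)
    # Sort each row in ascending order so the last two columns
    # hold the two largest output activations.
    ordered = np.sort(activations, axis=1)
    winner = np.argmax(activations, axis=1)
    maximum = ordered[:, -1]
    # Assumption: margin is measured against the runner-up output only.
    margin = ordered[:, -1] - ordered[:, -2]
    return winner, maximum, margin

# Hypothetical activations for two examples over three output classes.
outputs = [[0.10, 0.85, 0.20],   # one output clearly dominates
           [0.45, 0.50, 0.40]]   # near tie between outputs
winner, maximum, margin = confidence_measures(outputs)
```

Both examples select the same output, but the second does so with a much lower maximum and a much smaller margin, which is the information the plain highest-activation rule throws away.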

This was tested using a committee of ten networks, arbitrarily trained for 1,000,000 iterations on the weed seed data. When the data was examined it was found that in general the value of the maxima was higher for examples correctly classified by the network than for examples which were misclassified. However the relationship was not distinct enough to allow correct and incorrect classifications to be completely separated by a simple linear threshold. Also it was found that there was a great degree of variation between the different categories of output, with some averaging far higher maximum activations than others for both correct and incorrect cases. Similar results were discovered when the winning margin was examined, for the same committee and training set.

These results indicated that there was some possibility of indicating whether or not the network was 'certain' of its classification, although the lack of a distinct boundary between the correct and incorrect answers meant that such a technique would not be one hundred percent accurate. Two possible methods of performing this task were implemented. The first was based around applying a simple threshold to the values produced by the committee to classify the results into 'certain' or 'uncertain'. The second involved training a simple back-propagation network to perform the same classification.

As indicated earlier, it was found that there were significant differences between the various seed-type classifications in terms of the average maxima which they produced. Therefore, rather than using a global threshold for all classifications, a different threshold was used depending on which type the committee's initial classification belonged to. This threshold was based on the difference between the mean values of the maxima for incorrectly and correctly classified examples, as observed over the training set. Table 7.4 contains results for several different threshold values.
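A per-class threshold of this kind might be computed as in the following sketch. The text does not state the exact formula, so the code assumes that 'Mean' in Table 7.4 refers to the mean score of the incorrectly classified training examples for a class, and that 'Mean + 50% of difference' adds half of the gap between the correct and incorrect means. The function names and data are hypothetical, and the score may be either the maximum activation or the winning margin.

```python
import numpy as np

def class_thresholds(predicted, correct, score, fraction=0.5):
    """Threshold per predicted class, estimated from training examples.

    Assumed formula: mean(wrong) + fraction * (mean(right) - mean(wrong)).
    """
    predicted = np.asarray(predicted)
    correct = np.asarray(correct, dtype=bool)
    score = np.asarray(score, dtype=float)
    thresholds = {}
    for c in np.unique(predicted):
        in_class = predicted == c
        right = score[in_class & correct]
        wrong = score[in_class & ~correct]
        if right.size == 0 or wrong.size == 0:
            continue  # no basis for a threshold for this class
        thresholds[int(c)] = wrong.mean() + fraction * (right.mean() - wrong.mean())
    return thresholds

def is_certain(predicted, score, thresholds):
    """True where the score exceeds the threshold of the predicted class.

    Assumption: classes without a threshold are always treated as certain.
    """
    return np.array([s > thresholds.get(int(p), -np.inf)
                     for p, s in zip(predicted, score)])

# Hypothetical training-set observations for two classes.
predicted = [0, 0, 0, 0, 1, 1, 1, 1]
correct = [True, True, False, False, True, True, False, False]
score = [0.9, 0.8, 0.5, 0.4, 0.6, 0.5, 0.3, 0.2]

thresholds = class_thresholds(predicted, correct, score, fraction=0.5)
certain = is_certain(predicted, score, thresholds)
```

In this toy data class 1 produces lower scores overall, so its threshold is lower than that of class 0; a single global threshold would reject every class 1 result that class 0's threshold rejects, including correct ones.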

The second approach used a single-hidden-layer back-propagation network with a 10:6:1 topology (the results given for it in Table 7.4 are averages over several trials). The inputs to the network were the outputs of the committee when presented with the seed data, with the target output value being 0.9 if the committee correctly classified that example and 0.1 otherwise. The network was trained on the same training data as used for the thresholding approach, and achieved very similar results. For the purposes of testing, an output value greater than 0.5 was considered a 'certain' classification, with 0.5 or less indicating 'uncertain'.
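The target encoding and the test-time decision rule described above can be sketched as follows. Training of the network itself is omitted, the function names are hypothetical, and the example uses three committee outputs for brevity rather than the ten inputs of the 10:6:1 network.

```python
import numpy as np

def certainty_targets(committee_outputs, true_labels, hi=0.9, lo=0.1):
    """Target of 0.9 where the committee's winning output matches the
    true label, and 0.1 otherwise."""
    committee_outputs = np.asarray(committee_outputs, dtype=float)
    predicted = np.argmax(committee_outputs, axis=1)
    return np.where(predicted == np.asarray(true_labels), hi, lo)

def certainty_decision(network_output, cutoff=0.5):
    """'Certain' only when the output is strictly greater than the cutoff."""
    return np.asarray(network_output, dtype=float) > cutoff

# Hypothetical committee outputs and true labels for two examples.
targets = certainty_targets([[0.1, 0.8, 0.1],
                             [0.6, 0.3, 0.1]], true_labels=[1, 2])

# Hypothetical outputs of the trained certainty network.
decisions = certainty_decision([0.7, 0.5, 0.2])
```

An output of exactly 0.5 falls on the 'uncertain' side, matching the rule stated in the text.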

Table 7.4 Performance of the output thresholding and network techniques at eliminating misclassifications of the weed seed data set.

Method used               Threshold used             % classified   % misclassified

No threshold              -                          100            31
Maxima threshold          Mean                       81             21
                          Mean + 50% of difference   72             11
Winning margin threshold  Mean                       83             21
                          Mean + 25% of difference   73             16
                          Mean + 50% of difference   64             12
                          Mean + 75% of difference   54             9
Network                   -                          86             21

Both the thresholding and network-based techniques are successful in reducing the number of misclassifications, and in increasing the accuracy of the classification on those examples which are classified. However, neither technique is able to completely distinguish between correct and incorrect answers, meaning that some correct classifications are also discarded. Given these results, the question arises as to how to measure the success of these techniques.

The desired performance of the system will depend on the application in which the network is to be used, and specifically on the relative cost of classifying an example incorrectly compared to being unable to classify it at all. If the cost of an incorrect classification is high then it may be preferable to reject all incorrect results, even at the cost of failing to classify a large percentage of cases. Medical diagnosis systems would be an example of this type of situation. Alternatively, if the cost of classifying incorrectly is low relative to the cost of having to classify many cases using an alternative method, then classifying the majority of cases whilst allowing a higher number of incorrect classifications will be the preferred approach. The thresholding technique has an advantage here, as the threshold value can easily be modified to fine-tune the system to the desired level of performance, with higher thresholds producing fewer incorrect results but at the cost of classifying fewer examples. There was very little observed difference between the maxima and winning-margin thresholding techniques.
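The trade-off can be made concrete with a simple cost model, sketched below using the winning-margin rows of Table 7.4. The cost values are hypothetical, and the sketch assumes that both table columns are percentages of all examples presented; if the misclassification figures were instead measured over the classified examples only, that term would need to be scaled by the fraction classified.

```python
def expected_cost(pct_classified, pct_misclassified, cost_wrong, cost_reject):
    """Average cost per example for one operating point.

    Assumption: both percentages are taken over all examples presented.
    """
    pct_rejected = 100.0 - pct_classified
    return (cost_wrong * pct_misclassified + cost_reject * pct_rejected) / 100.0

# Winning-margin operating points from Table 7.4:
# name -> (% classified, % misclassified)
operating_points = {
    "no threshold": (100, 31),
    "mean": (83, 21),
    "mean + 25%": (73, 16),
    "mean + 50%": (64, 12),
    "mean + 75%": (54, 9),
}

def cheapest(points, cost_wrong, cost_reject):
    """Name of the operating point with the lowest expected cost."""
    return min(points,
               key=lambda name: expected_cost(*points[name], cost_wrong, cost_reject))

# Hypothetical costs: an error ten times as costly as a rejection,
# then an error and a rejection equally costly.
high_error_cost = cheapest(operating_points, cost_wrong=10, cost_reject=1)
equal_cost = cheapest(operating_points, cost_wrong=1, cost_reject=1)
```

Under these assumed costs the highest threshold is preferred when errors are expensive, while no threshold at all is preferred when an error costs no more than a rejection.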

Freeman and Adams (1993a, 1993b) extended this work by applying output thresholding to two further data sets (heart disease and mushroom classification), as well as providing a theoretical basis for the technique, grounded in the relationship between neural networks and Bayesian classifiers. Their results confirmed the observation from this research that output thresholding allows a reduction in the number of misclassifications produced by the system, but also discards some correct classifications.

Although output thresholding was not directly used in the SLARTI system, a similar technique was found to be useful in the final classification of signs. This is detailed in Chapter 11.1.7.

In document Recognition of sign language using neural networks (Page 113-116)