Chapter 3 Classification Background

There are many ways of evaluating the performance of a classifier system. A commonly used statistic is accuracy, of which there are several versions. The basic statistic measures the percentage of correct classifications out of all the classifications. It can also be applied to obtain an accuracy per class and per classification.
• Confusion Matrix
Typically, statistics are calculated using a confusion matrix. This records the true and predicted classification of each object. It is an N×N matrix, where N is the number of classes. Sometimes an N×(N+1) matrix is used when a classifier can reject a query pattern, which happens when it is unable to produce a prediction with a high enough confidence value. The accuracy can be calculated as the sum of the diagonal over the total number of classifications made. For two-class problems there are numerous statistics defined (see below). A multi-class confusion matrix can be converted into a two-class confusion matrix for a particular class by marking the required class as positive and all other classes as negative.
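The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names, the integer class labels and the convention that rows hold true classes while columns hold predicted classes are all assumptions made for the example.

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Build an N x N matrix of counts.

    Assumed convention: rows are true classes, columns are predicted classes.
    """
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Sum of the diagonal over the total number of classifications made."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


def to_two_class(matrix, positive):
    """Collapse to [[TP, FN], [FP, TN]], marking `positive` as the positive class."""
    n = len(matrix)
    tp = matrix[positive][positive]
    fn = sum(matrix[positive]) - tp                        # positives predicted as another class
    fp = sum(matrix[r][positive] for r in range(n)) - tp   # other classes predicted as positive
    tn = sum(sum(row) for row in matrix) - tp - fn - fp
    return [[tp, fn], [fp, tn]]


# Hypothetical three-class example with integer labels 0, 1 and 2.
true_labels = [0, 0, 1, 1, 2, 2, 2]
predicted_labels = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(true_labels, predicted_labels, 3)
accuracy = overall_accuracy(cm)
two_class = to_two_class(cm, positive=0)
```

A reject option could be handled in the same way by adding one extra column for rejected query patterns, giving the N×(N+1) form.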
The two-class confusion matrix records four values. The True Positive (TP) value is the number of positive examples correctly classified. Likewise, the True Negative (TN) value is the number of negative examples correctly classified. The False Negative (FN) value is the number of positive examples classified as negative and the False Positive (FP) value is the number of negative examples classified as positive.
The user's accuracy (also known as precision; see Equation 3.1) is the number of correct classifications over all the objects classified as that class.

$$\text{user's accuracy, precision} = \frac{TP}{TP + FP} \tag{3.1}$$
The producer's accuracy (also known as recall and sensitivity; see Equation 3.2) is the number of correct classifications over all the objects of that class.

$$\text{producer's accuracy, recall, sensitivity} = \frac{TP}{TP + FN} \tag{3.2}$$
Specificity (see Equation 3.3) measures the proportion of negative examples correctly classified. The higher the number of false positives, the lower the specificity.

$$\text{specificity} = \frac{TN}{FP + TN} \tag{3.3}$$
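As a worked illustration of Equations 3.1 to 3.3, the following sketch computes the three statistics from the four values of a two-class confusion matrix. The counts are invented for the example, and returning 0.0 when a denominator is zero is an assumption of this sketch, not something the text prescribes.

```python
def precision(tp, fp):
    """User's accuracy (Equation 3.1): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0  # assumption: 0.0 when undefined


def recall(tp, fn):
    """Producer's accuracy, sensitivity (Equation 3.2): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0  # assumption: 0.0 when undefined


def specificity(tn, fp):
    """Equation 3.3: TN / (FP + TN)."""
    return tn / (fp + tn) if (fp + tn) else 0.0  # assumption: 0.0 when undefined


# Hypothetical counts: 50 positive and 50 negative examples.
tp, fn, fp, tn = 40, 10, 5, 45

p = precision(tp, fp)    # 40 / 45
r = recall(tp, fn)       # 40 / 50
s = specificity(tn, fp)  # 45 / 50
```

With these counts, precision is higher than recall: few negatives are mislabelled as positive, but a fifth of the positives are missed.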
• Receiver Operating Characteristics Graphs
A Receiver Operating Characteristics (ROC) graph (Fawcett, 2006) is a visual tool to help evaluate classifier performance. A key feature is that it is invariant to class distribution; however, it is a two-class tool rather than a multi-class tool. Multiple ROC graphs can be generated (one for each class), but this breaks the invariance to class distribution. The ROC graph plots true positive rate (on the y-axis) against false positive rate (on the x-axis). In the ideal situation, a curve on the graph will start at (0,0), progress to (0,1) and finish at (1,1). The diagonal of the graph represents a random classifier. The area under the curve (AUC) can be calculated to allow a single-value comparison between classifiers.
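One way to trace an ROC curve and compute its AUC is sketched below. It assumes the classifier outputs a numerical score per example (higher meaning more likely positive), that labels are 1 for positive and 0 for negative, and that both classes are present. The function names and the use of the trapezoidal rule for the area are choices made for this illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    The decision threshold is swept from the highest score to the lowest.
    Assumes labels are 1 (positive) or 0 (negative) and both classes occur.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    points = []
    tp = fp = 0
    previous_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so tied scores share one point.
        if score != previous_score:
            points.append((fp / neg, tp / pos))
            previous_score = score
        if label == 1:
            tp += 1
        else:
            fp += 1
    points.append((fp / neg, tp / pos))  # final point is always (1, 1)
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical scores for three positive and three negative examples.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

In this example one negative example outscores one positive example, so the curve falls short of the ideal path through (0,1) and the AUC is below 1.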
The above methods calculate the overall accuracy of a classifier; they do not gauge the accuracy of an individual classification. This is a harder task than calculating the overall accuracy of a classifier, as it is dependent on the input pattern. Different classification techniques can give different outputs. Some techniques output a single class label, whereas others output a ranked list. Some techniques can also output a numerical value that can be used to gauge the confidence of the classification (e.g. distance from the decision boundary). If such numerical guidance is available, then it is possible to map the value directly into a confidence value. However, for classifiers outputting only a label, alternative methods of estimating confidences are required.
• The a priori and a posteriori methods
The work by Giacinto and Roli (1999) looks into several such metrics and highlights the a priori and the a posteriori methods as good confidence estimators. These techniques make use of a validation set. If the k nearest objects in a validation set were correctly classified, then it is likely that the query object will also be correctly classified. The a priori method estimates the confidence without requiring the query to be classified. It simply bases the confidence on how many of the neighbouring objects were correctly classified. The a posteriori method requires the query object to be classified first and then bases the confidence on how many of the neighbouring objects predicted as that class were predicted correctly.
Equation 3.4 shows the a priori confidence estimate for a given classifier. For each of the $K$ objects, $X_k$, in the neighbourhood, the probability of it being correctly classified, $P(\omega_i \mid X_k \in \omega_i)$, is calculated (where $i = 1, \ldots, M$, $M$ being the number of classes and $\omega_i$ the label for class $i$) and the result is weighted by $W_k = 1/d_k$, where $d_k$ is the Euclidean distance between $X_k$ and the query pattern. The sum of the weighted correct predictions is then divided by the sum of the weights of all $K$ objects.
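$$\text{a priori confidence} = \frac{\sum_{k=1}^{K} P(\omega_i \mid X_k \in \omega_i) \cdot W_k}{\sum_{k=1}^{K} W_k} \tag{3.4}$$

Equation 3.5 shows the a posteriori confidence estimate for a given classifier predicting a label $\omega_i$. For each of the $K$ objects, $X_k$, in the neighbourhood, the probability of it being assigned label $\omega_i$, $P(\omega_i \mid X_k)$, is calculated and weighted by $W_k$. The sum of the correct predictions is then divided by the sum weighting of all $K$ objects that were predicted label $\omega_i$.

$$\text{a posteriori confidence} = \frac{\sum_{X_k \in \omega_i} P(\omega_i \mid X_k) \cdot W_k}{\sum_{k=1}^{K} P(\omega_i \mid X_k) \cdot W_k} \tag{3.5}$$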
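The two estimates can be sketched as follows for a classifier that outputs only a label. This is an illustrative reading of Equations 3.4 and 3.5, not the authors' implementation: it assumes each probability is 1 when the validation object received the relevant label and 0 otherwise, that the validation set is a list of (features, true label, predicted label) tuples, and that a small constant `EPS` guards against a zero distance. All names are hypothetical.

```python
import math

EPS = 1e-12  # assumption: avoids division by zero when a neighbour coincides with the query


def weighted_neighbours(query, validation, k):
    """Return (W_k, true label, predicted label) for the k nearest validation objects."""
    scored = [(math.dist(query, x), true, pred) for x, true, pred in validation]
    scored.sort(key=lambda item: item[0])  # nearest first, by Euclidean distance
    return [(1.0 / max(d, EPS), true, pred) for d, true, pred in scored[:k]]


def a_priori_confidence(query, validation, k):
    """Equation 3.4 with 0/1 probabilities: weighted share of correctly classified neighbours."""
    nbrs = weighted_neighbours(query, validation, k)
    correct = sum(w for w, true, pred in nbrs if true == pred)
    total = sum(w for w, _, _ in nbrs)
    return correct / total


def a_posteriori_confidence(query, predicted_label, validation, k):
    """Equation 3.5 with 0/1 probabilities, for a query already predicted as `predicted_label`."""
    nbrs = weighted_neighbours(query, validation, k)
    # Denominator: all neighbours that were predicted the same label as the query.
    predicted = sum(w for w, _, pred in nbrs if pred == predicted_label)
    # Numerator: those of them that truly belong to that class.
    correct = sum(w for w, true, pred in nbrs
                  if pred == predicted_label and true == predicted_label)
    return correct / predicted if predicted else 0.0  # assumption: 0.0 if no neighbour has the label


# Hypothetical validation set: (features, true label, predicted label).
validation = [
    ((0.0, 1.0), "a", "a"),
    ((0.0, 2.0), "a", "b"),
    ((2.0, 0.0), "b", "b"),
    ((0.0, 4.0), "b", "b"),
    ((10.0, 10.0), "a", "a"),
]
query = (0.0, 0.0)

prior = a_priori_confidence(query, validation, k=4)
post_b = a_posteriori_confidence(query, "b", validation, k=4)
post_a = a_posteriori_confidence(query, "a", validation, k=4)
```

Here the a priori estimate is the same whatever label the query receives, while the a posteriori estimate differs between "a" and "b" because it only considers neighbours that were given the predicted label.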