2.3 Predictive Systems Evaluation
2.3.1 Accuracy
Depending on the prediction problem, the accuracy of a prediction can be calculated using one of the following measures.
• Numerical measures:
The following points summarise the main measures for calculating the error E between the predicted values $\hat{y}_i$ and the actual values $y_i$ of $N$ examples (Hyndman and Koehler (2006)):
1. Mean Square Error (MSE):
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \tag{2.11}$$
2. Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{2.12}$$
3. Sum of Squared Residuals (SSR):
$$\mathrm{SSR} = \sum_{i=1}^{N} (\hat{y}_i - y_i)^2 \tag{2.13}$$
4. Mean Absolute Error (MAE):
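$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right| \tag{2.14}$$
5. Mean Absolute Percentage Error (MAPE):
$$\mathrm{MAPE} = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{\hat{y}_i - y_i}{y_i} \right| \tag{2.15}$$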
6. Root Mean Square Percentage Error (RMSPE):
$$\mathrm{RMSPE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( 100\% \times \frac{\hat{y}_i - y_i}{y_i} \right)^2} \tag{2.16}$$
Measures 1-4 are scale-dependent: their values are expressed in the units of the data (or in squared units), so they change with the scale of the data. They are mainly used to compare different algorithms applied to the same data set or to data sets of similar scale, and they should not be used to compare predictive systems across data sets of different scales. In contrast, measures 5 and 6 are scale-independent and are often used to compare algorithms across different data sets. However, both are undefined or infinite when any actual output $y_i$ is zero, because the error is then divided by zero.
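As an illustration, measures (2.11)-(2.16) can be computed directly from their definitions. The following is a minimal Python sketch, not the thesis's own implementation; the function name `error_measures` is hypothetical, and returning NaN for MAPE and RMSPE when some $y_i$ is zero is an assumed convention for the undefined case described above.

```python
import math
from typing import Dict, Sequence


def error_measures(y_pred: Sequence[float], y_true: Sequence[float]) -> Dict[str, float]:
    """Compute measures (2.11)-(2.16) for paired predictions and actual values."""
    if len(y_pred) != len(y_true) or len(y_true) == 0:
        raise ValueError("y_pred and y_true must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for p, t in zip(y_pred, y_true)]  # (y_hat_i - y_i)

    ssr = sum(e ** 2 for e in errors)        # (2.13) sum of squared residuals
    mse = ssr / n                            # (2.11)
    rmse = math.sqrt(mse)                    # (2.12)
    mae = sum(abs(e) for e in errors) / n    # (2.14)

    # (2.15) and (2.16) divide by y_i, so they are undefined if any y_i is zero.
    # Returning NaN in that case is an assumed convention.
    if any(t == 0 for t in y_true):
        mape = rmspe = float("nan")
    else:
        rel = [e / t for e, t in zip(errors, y_true)]
        mape = 100.0 * sum(abs(r) for r in rel) / n              # (2.15)
        rmspe = 100.0 * math.sqrt(sum(r ** 2 for r in rel) / n)  # (2.16)

    return {"MSE": mse, "RMSE": rmse, "SSR": ssr, "MAE": mae,
            "MAPE": mape, "RMSPE": rmspe}


# Usage: errors are (0.5, -1.0, 0.0), so SSR = 1.25 and MAE = 0.5.
print(error_measures([2.5, 3.0, 5.0], [2.0, 4.0, 5.0]))
```

Computing SSR first and deriving MSE and RMSE from it mirrors how (2.11)-(2.13) relate to one another.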
• Confusion matrix:
A confusion matrix (or contingency table) is a table that summarises the classification performance of a predictive system. Table 2.1 shows the confusion matrix of a binary classification problem (Alpaydin (2014)).
Table 2.1: Confusion matrix for binary classification problems

| | actual positive | actual negative |
|---|---|---|
| predicted positive | True Positive (TP) | False Positive (FP) |
| predicted negative | False Negative (FN) | True Negative (TN) |
Assuming that there are two classes, a positive class and a negative class, the elements of the above matrix are (Alpaydin (2014) and Fawcett (2006)):
TP (True Positives) is the number of positive examples correctly classified as positives.
TN (True Negatives) is the number of negative examples correctly classified as negatives.
FP (False Positives) is the number of negative examples incorrectly classified as positives.
FN (False Negatives) is the number of positive examples incorrectly classified as negatives.
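The four counts can be obtained by comparing each actual label with the corresponding predicted label. The sketch below is a minimal illustration under stated assumptions: the function name `binary_confusion_counts` is hypothetical, and labels are assumed to be plain values where any value equal to `positive` denotes the positive class.

```python
from typing import Hashable, Sequence, Tuple


def binary_confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                            positive: Hashable = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN); labels equal to `positive` form the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # positive example classified as positive
            else:
                fp += 1  # negative example classified as positive
        else:
            if actual == positive:
                fn += 1  # positive example classified as negative
            else:
                tn += 1  # negative example classified as negative
    return tp, fp, fn, tn


# Usage: three TPs, one FP, one FN and three TNs.
print(binary_confusion_counts([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1]))
```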
For this problem the classification error can be calculated using the following equation:
$$E = \frac{FP + FN}{FP + FN + TP + TN} \tag{2.17}$$
Several metrics can be derived from the confusion matrix, such as:
Precision, also known as the positive predictive value (PPV), is the number of correctly classified positives divided by the total number of examples classified as positive.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2.18}$$
Recall, also known as sensitivity, measures the proportion of positive examples that are correctly identified.
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{2.19}$$
Specificity measures how well the classifier detects negative examples, i.e. the proportion of negative examples that are correctly identified.
$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{2.20}$$
The F-measure (F-score) combines precision and recall into a single value by taking their harmonic mean, as shown in the following equation:
$$F\text{-measure} = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{2.21}$$
The best value a classifier can achieve on this measure is 1 and the worst is 0.
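Equations (2.17)-(2.21) depend only on the four counts of the confusion matrix. The following minimal Python sketch evaluates them; the function name `confusion_metrics` is hypothetical, and reporting 0.0 when a denominator is zero (for example, precision when nothing is classified as positive) is an assumed convention, since the equations themselves are undefined in that case.

```python
from typing import Dict


def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Evaluate (2.17)-(2.21) from binary confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        # Assumed convention: a ratio with a zero denominator is reported as 0.0.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)    # (2.18)
    recall = ratio(tp, tp + fn)       # (2.19)
    specificity = ratio(tn, tn + fp)  # (2.20)
    return {
        "error": ratio(fp + fn, fp + fn + tp + tn),                       # (2.17)
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": ratio(2 * precision * recall, precision + recall),  # (2.21)
    }


# Usage: with TP=8, FP=2, FN=4, TN=6, precision is 0.8 and recall is 2/3.
print(confusion_metrics(8, 2, 4, 6))
```

For these counts the F-measure is 8/11 (about 0.727), which lies between precision and recall but closer to the smaller of the two, as expected of a harmonic mean.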
When the number of classes k exceeds 2, the confusion matrix becomes a k × k matrix (Alpaydin (2014)). The main diagonal of the matrix contains the correctly classified examples, and the off-diagonal elements contain the misclassified examples. Ideally, all off-diagonal elements should be 0.
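A k × k confusion matrix can be accumulated in a single pass over the examples. This is a minimal sketch assuming class labels are the integers 0 to k-1; the function names `confusion_matrix` and `misclassified` are hypothetical. Rows index the predicted class and columns the actual class, following the layout of Table 2.1; note that some libraries, e.g. scikit-learn, use the transposed layout with actual classes on the rows.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], k: int) -> List[List[int]]:
    """k x k matrix: m[predicted][actual] counts examples; labels must be 0..k-1."""
    m = [[0] * k for _ in range(k)]
    for actual, predicted in zip(y_true, y_pred):
        if not (0 <= actual < k and 0 <= predicted < k):
            raise ValueError(f"label out of range 0..{k - 1}")
        m[predicted][actual] += 1
    return m


def misclassified(m: List[List[int]]) -> int:
    """Sum of the off-diagonal elements, i.e. the number of misclassified examples."""
    k = len(m)
    return sum(m[i][j] for i in range(k) for j in range(k) if i != j)


# Usage with three classes: two of the six examples are misclassified.
cm = confusion_matrix([0, 1, 2, 2, 1, 0], [0, 2, 2, 2, 1, 1], k=3)
print(cm, misclassified(cm))
```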
• ROC and AUC:
Receiver Operating Characteristic (ROC) is a two-dimensional graph used to visualise the performance of a classifier in binary classification problems (Kubat et al. (1998)). Figure 2.7 shows an example of an ROC curve, where the y-axis represents the TP rate (recall) and the x-axis represents the FP rate (1 − specificity). For a classifier that outputs scores, each point on the curve corresponds to a different decision threshold, so the curve shows the trade-off between the benefits (true positives) and the costs (false positives). An ideal classifier has a TP rate of 1 and an FP rate of 0; the closer a classifier lies to the upper-left corner, the better its accuracy.
Figure 2.7: The ROC curve for classifying the Virginica class in the Fisher iris data set using logistic regression
The worst case in binary classification is when the classifier's performance lies on the main diagonal, which corresponds to random guessing (equivalent to 50% accuracy on a balanced data set). Classifiers whose performance falls below the main diagonal can be improved by inverting their decisions (Alpaydin (2014)). To obtain a single value that summarises the performance of the classifier, the Area Under the Curve (AUC) is calculated. An ideal classifier has AUC = 1, while a classifier on the main diagonal has AUC = 0.5.
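One common way to build the ROC curve is to sort the examples by classifier score and lower the decision threshold through every distinct score, recording (FP rate, TP rate) at each step; the AUC is then the area under this piecewise-linear curve, computed here with the trapezoidal rule. The sketch below assumes scores where higher means "more likely positive" and labels coded as 1 (positive) and 0 (negative); the function names `roc_points` and `auc` are hypothetical, and it does not reproduce the logistic-regression experiment of Figure 2.7.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """(FP rate, TP rate) pairs obtained by lowering the threshold through each distinct score."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples sharing a score cross the threshold together (a diagonal segment).
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the piecewise-linear ROC curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Usage: 8 of the 9 positive/negative pairs are ranked correctly, so AUC = 8/9.
pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
print(pts, auc(pts))
```

A classifier that gives every example the same score produces only the points (0, 0) and (1, 1), i.e. the main diagonal, and an AUC of 0.5.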
• Weighted accuracy (cost-sensitive) measures:
As explained before, the confusion matrix distinguishes between different types of error. In some applications, these types of error carry different costs. For instance, the cost of misclassifying a simple injury as deadly is much lower than the cost of misclassifying a deadly injury as a simple one (King et al. (1995)). For such applications, a cost matrix is constructed that specifies the cost of each type of prediction. For example, the cost matrix of a binary classification problem is given as:
| | actual positive | actual negative |
|---|---|---|
| predicted positive | Cost(0, 0) | Cost(0, 1) |
| predicted negative | Cost(1, 0) | Cost(1, 1) |
In this case, the optimal prediction is the class i that minimises the expected loss L(c, i), which is calculated using the following equation (Elkan (2001)):
$$L(c, i) = \sum_{j} P(j \mid c)\, \mathrm{Cost}(i, j) \tag{2.22}$$
where L(c, i) is the expected loss of predicting class i for the example c, i is the index of the predicted class, j is the index of the true class, P(j | c) is the probability that the true class of c is j, and Cost(i, j) is the cost of predicting class i when the true class is j.
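Equation (2.22) can be evaluated for every candidate class and the class with the smallest expected loss chosen. The following minimal Python sketch assumes the classifier supplies the probabilities P(j | c) as a list and that `cost[i][j]` stores Cost(i, j) with i the predicted class and j the true class, as in the cost matrix above. The function names and the cost values in the usage example are hypothetical, chosen only to mirror the injury example qualitatively.

```python
from typing import Sequence


def expected_loss(probs: Sequence[float], cost: Sequence[Sequence[float]], i: int) -> float:
    """Equation (2.22): L(c, i) = sum_j P(j | c) * Cost(i, j) for a single example c."""
    return sum(p_j * cost[i][j] for j, p_j in enumerate(probs))


def optimal_prediction(probs: Sequence[float], cost: Sequence[Sequence[float]]) -> int:
    """Return the predicted class i that minimises the expected loss L(c, i)."""
    return min(range(len(cost)), key=lambda i: expected_loss(probs, cost, i))


# Hypothetical costs: class 0 = deadly injury (positive), class 1 = simple injury.
# Predicting "simple" for a deadly injury (Cost(1, 0)) is assumed 10 times worse
# than predicting "deadly" for a simple one (Cost(0, 1)).
cost = [[0, 1],
        [10, 0]]
probs = [0.2, 0.8]  # P(deadly | c) = 0.2, P(simple | c) = 0.8
print(optimal_prediction(probs, cost))  # 0: predict "deadly" despite its lower probability
```

With these costs, predicting "deadly" has expected loss 0.8 and predicting "simple" has expected loss 2.0, so the cost-sensitive decision differs from simply choosing the most probable class.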