4   Techniques and methods

4.2   Evaluating predictive models

4.2.1   Evaluating regression models on skewed data

A very common measure of the predictive fit of a regression model is the root mean squared error (RMSE).

However, the RMSE severely penalizes larger errors. This behavior is not always desirable, especially in problems where the response variable follows a skewed distribution. Data sampled from a skewed distribution will contain a few points whose values are large relative to the average point. An error on one of these points will have a large effect on the value of the RMSE.

There are other metrics that are less susceptible to this problem, such as the mean absolute error (MAE) and the mean relative error (MRE). The problem with these metrics is that they might still be difficult to interpret for skewed datasets, since they are still affected by outliers. Therefore, other measures might be more appropriate.
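The three error metrics mentioned so far can be sketched in a few lines of Python. The data below are invented for illustration, and the definition of the MRE used here (mean of absolute error divided by the absolute true value) is an assumption, since the text does not spell it out.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mre(y_true, y_pred):
    """Mean relative error (assumed definition); requires non-zero true values."""
    n = len(y_true)
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / n


# Invented right-skewed response: four small values and one large one.
y_true = [1.0, 2.0, 3.0, 4.0, 100.0]
# Predictions are exact everywhere except on the large point.
y_pred = [1.0, 2.0, 3.0, 4.0, 80.0]

rmse_value = rmse(y_true, y_pred)  # sqrt(400 / 5)
mae_value = mae(y_true, y_pred)    # 20 / 5
mre_value = mre(y_true, y_pred)    # (20 / 100) / 5
print(rmse_value, mae_value, mre_value)
```

In this toy case a single error on the large point determines all three values, and the RMSE is more than twice the MAE.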

The Pearson correlation coefficient is not affected by outliers, so it can be a more appropriate measure in those cases. The correlation in this context is calculated between the predicted values and the true values. For a perfect model, a scatterplot of these two sets of values forms a 45° straight line passing through the origin. The Pearson correlation coefficient measures the strength of the linear relationship between two variables, so it can provide an estimate of the quality of the fit in this context. However, Pearson’s correlation coefficient does not take into account systematic biases in the predictions. Its value will be 1 for any perfect linear relationship with positive slope, even if this is not a 45° line through the origin.
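The insensitivity of Pearson's correlation to systematic bias can be checked with a small sketch. The `pearson` helper and the data are illustrative assumptions, not part of the original text; the predictions are deliberately constructed as 2 × true + 1.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient computed from 1/n moments."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)


# Invented true values and systematically biased predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_biased = [2.0 * t + 1.0 for t in y_true]

r = pearson(y_true, y_biased)
mean_abs_error = sum(abs(t - p) for t, p in zip(y_true, y_biased)) / len(y_true)
print(r, mean_abs_error)
```

The correlation is 1 (up to floating-point error) although every prediction is wrong, with a mean absolute error of 4.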

Therefore, when the response follows a right-skewed distribution, a metric is needed that assesses whether the predicted and the true values fall on a 45° line through the origin. The concordance correlation coefficient (Lin, 1989) is such a measure. It is defined by (38):

$$\rho_c = \frac{2\rho\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2} \qquad (38)$$

where $\rho$ is the Pearson correlation coefficient between the two variables $x$ and $y$, and $\mu$ and $\sigma^2$ denote their means and variances.

The concordance correlation coefficient can be used to assess the agreement between the predicted values of a statistical model and the actual values. The advantage over Pearson’s correlation coefficient is that Pearson’s correlation coefficient ignores any bias that there might be between the true and the predicted values. Pearson’s correlation coefficient will assign a high value to any relationship 𝑦 = 𝑎𝑥 + 𝑏, while the concordance correlation coefficient will penalize any relationship that deviates from 𝑦 = 𝑥. This ensures a stricter evaluation of the agreement between predicted values and the response.
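A minimal sketch of the concordance correlation coefficient follows. It uses the identity that the numerator 2ρσ_xσ_y equals twice the covariance, and it estimates all moments with 1/n (an assumption; some implementations use 1/(n − 1)). The function name and the data are illustrative.

```python
def concordance_correlation(x, y):
    """Concordance correlation coefficient from 1/n moments.

    The numerator 2 * rho * sigma_x * sigma_y is computed as 2 * cov(x, y).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

ccc_exact = concordance_correlation(y_true, y_true)                    # y = x
ccc_shifted = concordance_correlation(y_true, [t + 3.0 for t in y_true])  # y = x + 3
ccc_scaled = concordance_correlation(y_true, [2.0 * t + 1.0 for t in y_true])  # y = 2x + 1
print(ccc_exact, ccc_shifted, ccc_scaled)
```

All three sets of predictions have a Pearson correlation of 1 with the true values, but only the first lies on the line 𝑦 = 𝑥, and only it reaches a concordance of 1.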

The concordance correlation coefficient and error metrics such as the RMSE or the MAE can deviate from each other. RMSE and MAE can be severely affected by a large error on a single case, while ignoring small improvements over many other cases. The concordance correlation coefficient suffers less from this problem. This is illustrated in Figure 4.7 below:

Figure 4.7. Example of the difference between MAE and the concordance correlation coefficient. The two graphs represent plots of the true versus the predicted values. The black line is the 45° line through the origin. The point on the upper right part has the largest response out of all points. The right plot presents a case where the prediction on this point is worse, but improves for the three points with small response. The RMSE and the MAE can give the exact same error for both cases, because the error on the point with a large value counteracts the improvement on the cases with a smaller response. The concordance correlation coefficient, however, will improve in the second case.
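The situation described in the caption can be imitated numerically. The four points below are invented and are not the data behind Figure 4.7; they are chosen so that the MAE is identical in both cases. The RMSE is not identical for these particular numbers, so the sketch reproduces only the MAE half of the comparison.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def concordance_correlation(x, y):
    """Concordance correlation coefficient from 1/n moments (assumed estimator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)


# Three points with a small response and one with a large response.
y_true = [1.0, 2.0, 3.0, 20.0]
# "Left" case: the large point is predicted exactly, the small ones are off by 10.
pred_left = [11.0, 12.0, 13.0, 20.0]
# "Right" case: the small points are exact, the large point is off by 30.
pred_right = [1.0, 2.0, 3.0, 50.0]

mae_left, mae_right = mae(y_true, pred_left), mae(y_true, pred_right)
ccc_left = concordance_correlation(y_true, pred_left)
ccc_right = concordance_correlation(y_true, pred_right)
print(mae_left, mae_right, ccc_left, ccc_right)
```

Both cases have an MAE of 7.5, while the concordance correlation coefficient rises from about 0.42 in the left case to about 0.59 in the right case.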

4.2.2   Evaluating classification models for data with unbalanced classes

A common way to evaluate the success of a classifier is the accuracy, which is defined as the proportion of instances classified correctly. However, accuracy can be misleading when dealing with skewed data where the classes are unbalanced.

The problem with unbalanced classes is that in many cases a very good accuracy score can be reached by simply guessing the majority class. So, if, for example, 60% of the data belong to class A, with the rest of the data split equally among classes B, C, D and E, then simply guessing A will yield an accuracy score of 60%. This makes it difficult to understand whether the classifier has learned a concept from the data, or is simply guessing the majority class.
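The example in the paragraph above can be reproduced with a short sketch. The label list is a made-up dataset of 100 instances matching the stated proportions, and the `accuracy` helper is illustrative.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Proportion of instances classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# 60% of the instances belong to class A, the rest are split equally.
y_true = ["A"] * 60 + ["B"] * 10 + ["C"] * 10 + ["D"] * 10 + ["E"] * 10

# A "classifier" that has learned nothing and always guesses the majority class.
majority_class = Counter(y_true).most_common(1)[0][0]
y_guess = [majority_class] * len(y_true)

baseline_accuracy = accuracy(y_true, y_guess)
print(majority_class, baseline_accuracy)
```

Any classifier evaluated on such data should therefore be compared against this 60% baseline rather than against zero.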

A metric that can help with this problem is Cohen’s kappa (Cohen, 1960). Cohen’s kappa is defined by (39):

$$\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)} \qquad (39)$$

Cohen’s kappa was originally developed as a measure of inter-rater agreement (Carletta, 1996). Pr(a) is the proportion of cases on which the two raters agree, and Pr(e) is the proportion of agreement that would be expected by chance alone. When measuring the performance of a classification algorithm, the first rater is represented by the predictions of the algorithm and the second rater by the ground truth. Cohen’s kappa takes a maximum value of 1, which indicates perfect agreement, while a value of 0 indicates agreement no better than chance; negative values are possible when agreement is worse than chance. A particular benefit of Cohen’s kappa is that it can also be used for multiclass classification problems.
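As a rough illustration, Cohen's kappa can be computed directly from its definition. In this sketch Pr(e) is estimated as the sum, over classes, of the product of the two raters' class frequencies, which is the usual estimator; the function name and the data are assumptions made for the example.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa between ground truth and predictions.

    Undefined (division by zero) when chance agreement Pr(e) equals 1,
    i.e. when both raters use one and the same single class throughout.
    """
    n = len(y_true)
    # Pr(a): observed proportion of agreement.
    pr_a = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Pr(e): agreement expected if both raters labelled independently
    # with their own observed class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    pr_e = sum((true_counts[c] / n) * (pred_counts[c] / n) for c in labels)
    return (pr_a - pr_e) / (1.0 - pr_e)


# Made-up unbalanced dataset: 60% class A, the rest split equally.
y_true = ["A"] * 60 + ["B"] * 10 + ["C"] * 10 + ["D"] * 10 + ["E"] * 10

kappa_majority = cohens_kappa(y_true, ["A"] * len(y_true))  # always guess A
kappa_perfect = cohens_kappa(y_true, list(y_true))          # every label correct
print(kappa_majority, kappa_perfect)
```

Always guessing the majority class reaches 60% accuracy on these labels but a kappa of 0, whereas perfect predictions give a kappa of 1.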