Performance measurements - Classification of high-dimensional microarray data

2.4 Classification of high-dimensional microarray data

2.4.5 Performance measurements

A number of different measures are commonly used to evaluate the performance of predictive algorithms. Which measure is appropriate depends on whether the classifier outputs a class label from a predefined set of responses or a quantitative score. A binary classification task distinguishes between positive and negative samples, and its performance can be assessed with the following metrics (Shapiro, 1999).

When a diagnostic test classifies samples as positive or negative, the counts of correct and incorrect predictions can be summarized in a 2 × 2 contingency table, or confusion matrix (see Tab. 2.3), of predictions against actual class labels. Most commonly, sensitivity and specificity are used to characterize a classification rule; both can be calculated from the entries of this contingency table.

                      Actual positive   Actual negative
Predicted positive    TP                FP                 PPV
Predicted negative    FN                TN                 NPV
                      Sensitivity       Specificity

Table 2.3.: Contingency table or confusion matrix

TP: true positives (predicted positive, actual positive); TN: true negatives (predicted negative, actual negative); FP: false positives (predicted positive, actual negative); FN: false negatives (predicted negative, actual positive); PPV: positive predictive value, TP/(TP + FP); NPV: negative predictive value, TN/(TN + FN). Sensitivity, TP/(TP + FN), and specificity, TN/(TN + FP), are computed from the actual-positive and actual-negative columns, respectively.
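The four cells of Table 2.3 can be counted from paired vectors of actual and predicted labels. The following is a minimal Python sketch, assuming labels coded as 1 (positive) and 0 (negative); the helpers `confusion_counts` and `predictive_values` and the example labels are hypothetical illustrations.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, int]:
    """Count TP, FP, FN and TN; assumes labels coded 1 = positive, 0 = negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["TP"] += 1  # predicted positive, actual positive
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # predicted positive, actual negative
        elif predicted == 0 and actual == 1:
            counts["FN"] += 1  # predicted negative, actual positive
        else:
            counts["TN"] += 1  # predicted negative, actual negative
    return counts


def predictive_values(c: dict[str, int]) -> tuple[float, float]:
    """Row margins of the table: PPV = TP/(TP+FP), NPV = TN/(TN+FN)."""
    ppv = c["TP"] / (c["TP"] + c["FP"])
    npv = c["TN"] / (c["TN"] + c["FN"])
    return ppv, npv


# Hypothetical labels: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)                     # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 5}
print(predictive_values(counts))  # (0.75, 0.8333333333333334)
```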

Sensitivity (sens), or true positive rate (TPR), measures the probability of a positive prediction given true positive status, i.e. the probability that an actually positive sample is predicted as positive. It is calculated by

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \tag{2.4.18}$$

Specificity (spec), or true negative rate (TNR), describes the test's ability to identify negative samples as the proportion of controls (actual negatives) that test negative. It can be written as

$$\mathrm{Specificity} = \frac{TN}{TN + FP} \tag{2.4.19}$$
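Equations (2.4.18) and (2.4.19) translate directly into code. The following minimal Python sketch uses hypothetical counts (36 TP, 4 FN, 45 TN, 15 FP); the function names are illustrative only.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Eq. (2.4.18): fraction of actual positives that are predicted positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Eq. (2.4.19): fraction of actual negatives that are predicted negative."""
    return tn / (tn + fp)


# Hypothetical counts: 40 actual positives (TP + FN), 60 actual negatives (TN + FP)
tp, fn, tn, fp = 36, 4, 45, 15
print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.75
```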


The accuracy, i.e. the proportion of correctly classified samples among all samples, can be calculated by

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{2.4.20}$$

Accuracy depends on the prior class probabilities and does not fully describe the actual difficulty of the decision problem for highly unbalanced class distributions: if only 5% of the samples are positive, a rule that labels every sample as negative already reaches an accuracy of 95%. The Matthews correlation coefficient (MCC) can therefore be used as a balanced performance measure. It can be interpreted as a correlation coefficient between observed and predicted binary classifications and, as for the Pearson correlation, ranges from −1 to 1: a value of 1 corresponds to perfect agreement, meaning perfect performance, 0 to a prediction no better than random, and −1 to total disagreement. The MCC is defined as

$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{2.4.21}$$
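To illustrate why accuracy can mislead on unbalanced data while the MCC does not, the following minimal Python sketch evaluates Eqs. (2.4.20) and (2.4.21) on hypothetical counts. Returning 0 when a margin of the table is empty (so that the denominator vanishes) is a widely used convention assumed here; Eq. (2.4.21) itself leaves this case undefined.

```python
import math


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Eq. (2.4.20): fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Eq. (2.4.21): Matthews correlation coefficient.

    Assumed convention: return 0.0 if any margin is empty (denominator 0).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical, highly unbalanced test set: 5 positives, 95 negatives.
# A trivial rule that calls every sample negative:
print(accuracy(tp=0, fp=0, fn=5, tn=95))  # 0.95
print(mcc(tp=0, fp=0, fn=5, tn=95))       # 0.0
# A rule that finds 4 of the 5 positives with 3 false alarms:
print(accuracy(tp=4, fp=3, fn=1, tn=92))            # 0.96
print(round(mcc(tp=4, fp=3, fn=1, tn=92), 3))       # 0.656
```

Both rules reach a similar accuracy (0.95 versus 0.96), but the MCC separates the uninformative rule (0.0) from the one that actually detects positives (about 0.66).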

Some classification models produce a continuous output (e.g., an estimate of an instance's class membership probability or another quantitative score) that represents the degree to which an object belongs to a specific class. When the distributions of test results X in the positive and negative groups are compared, their degree of overlap determines the test's discriminatory ability. By introducing a threshold or decision limit c, the samples can be separated into predicted negatives (X ≤ c) and predicted positives (X > c) (see Fig. 2.4.5).

Applying different discrimination thresholds c to predict class membership yields a different 2 × 2 contingency table for each c, from which the true positive rate (TPR, i.e. sensitivity) and the false positive rate (FPR, i.e. 1 − specificity) can be estimated. When c increases, fewer samples are called positive, so specificity increases at the cost of reduced sensitivity; conversely, a decreasing c leads to higher sensitivity with decreasing specificity.
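The effect of moving the threshold can be made concrete with a minimal Python sketch, assuming the convention of Fig. 2.4.5 (samples with X > c are called positive) and hypothetical test results; `sens_spec_at` is an illustrative helper, not a function from any library.

```python
from typing import Sequence


def sens_spec_at(scores: Sequence[float], labels: Sequence[int], c: float) -> tuple[float, float]:
    """Sensitivity and specificity when samples with X > c are called positive."""
    pos = [x for x, y in zip(scores, labels) if y == 1]
    neg = [x for x, y in zip(scores, labels) if y == 0]
    sens = sum(x > c for x in pos) / len(pos)   # TP / (TP + FN)
    spec = sum(x <= c for x in neg) / len(neg)  # TN / (TN + FP)
    return sens, spec


# Hypothetical test results X; positives (label 1) tend to have larger values.
scores = [0.9, 0.8, 0.7, 0.55, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
for c in (0.3, 0.5, 0.75):
    print(c, sens_spec_at(scores, labels, c))
# 0.3 (1.0, 0.6)
# 0.5 (0.8, 0.8)
# 0.75 (0.4, 1.0)
```

Raising c from 0.3 to 0.75 lowers the sensitivity from 1.0 to 0.4 while the specificity rises from 0.6 to 1.0.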

The receiver operating characteristic (ROC) curve is a useful graphical plot to visualize classifier performance over such a varying threshold c (Swets, 1988). It is constructed by plotting, for different values of the threshold c, the sensitivity sens(c) on the y-axis against 1 − spec(c) on the x-axis (see Fig. 2.4.6). The point (0, 1) represents perfect classification with 100% sensitivity and 100% specificity, whereas a random model guessing class labels yields an ROC curve along the diagonal line from (0, 0) to (1, 1).
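The construction of an empirical ROC curve can be sketched in Python as follows. This assumes the convention that X > c is called positive and sweeps c over every observed score, plus one value below the minimum so that the curve ends at (1, 1); `roc_points` and the example data are hypothetical.

```python
from typing import Sequence


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> list[tuple[float, float]]:
    """(1 - spec(c), sens(c)) for each distinct threshold c, calling X > c positive.

    Thresholds are the observed scores in descending order plus one value below
    the minimum, so the curve runs from (0, 0) to (1, 1).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    thresholds = sorted(set(scores), reverse=True)
    thresholds.append(min(scores) - 1.0)  # below every score: all called positive
    points = []
    for c in thresholds:
        tp = sum(1 for x, y in zip(scores, labels) if y == 1 and x > c)
        fp = sum(1 for x, y in zip(scores, labels) if y == 0 and x > c)
        points.append((fp / n_neg, tp / n_pos))  # (FPR, TPR)
    return points


print(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```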

The information in the ROC curve can be reduced to a single scalar summary metric of predictive performance, the area under the ROC curve (AUC) (Bradley, 1997). An AUC value close to 1 indicates excellent classification, whereas a value of 0.5 indicates a prediction no better than random guessing. The AUC is a robust measure of performance and, in contrast to the MCC, it is independent of the choice of the threshold c.
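Given the (FPR, TPR) points of an empirical ROC curve, the AUC can be approximated by the trapezoidal rule. The following minimal Python sketch shows this one common approach, assuming the points include (0, 0) and (1, 1); `auc_trapezoid` and the example points are hypothetical.

```python
from typing import Sequence


def auc_trapezoid(points: Sequence[tuple[float, float]]) -> float:
    """Area under a ROC curve given as (FPR, TPR) points, by the trapezoidal rule.

    Points are sorted by FPR (then TPR); vertical steps have zero width and add no area.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# (FPR, TPR) points of a hypothetical empirical ROC curve
roc = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(roc))                                   # 0.75
print(auc_trapezoid([(0.0, 0.0), (1.0, 1.0)]))              # 0.5, random model (diagonal)
print(auc_trapezoid([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]))  # 1.0, perfect classification
```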

[Figure: probability density of test result X for negatives and positives, with threshold c and shaded areas FNR and FPR]

Figure 2.4.5.: Hypothetical distributions of diagnostic test results X for negative and positive samples. The vertical line at the threshold X = c indicates the decision limit for a positive test. The shaded area under the negative distribution to the right of c is the false positive rate (FPR); the shaded area under the positive distribution to the left of c is the false negative rate (FNR). The figure was modified from Shapiro (1999).

The AUC equals the Wilcoxon–Mann–Whitney U statistic divided by the number of positive–negative sample pairs. It is also the probability that the classifier ranks a randomly drawn positive sample higher than a randomly drawn negative sample, AUC = P(X_pos > X_neg), where X_pos and X_neg denote the test results of the two samples. Furthermore, the AUC represents the average sensitivity over all values of the FPR. ROC curves and the AUC can be estimated under parametric or non-parametric assumptions, as described by Faraggi and Reiser (2002) and Shapiro (1999). The non-parametric approaches include the use of the Mann–Whitney statistic and fitting a smooth ROC curve by kernel smoothing, followed by estimation of the AUC by integration. The parametric approaches either assume that the marker values of negative and positive samples are normally distributed, in which case the AUC can be estimated by parametric methods, or apply a Box–Cox type power transformation to the marker values before using normal theory.
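The two estimation routes can be contrasted in a short Python sketch: a non-parametric estimate that counts correctly ranked positive–negative pairs (ties count 1/2, which equals the Mann–Whitney U statistic divided by n_pos × n_neg), and a parametric estimate under the binormal assumption, AUC = Φ((μ_pos − μ_neg) / √(σ_pos² + σ_neg²)), with means and standard deviations replaced by sample estimates. The data and function names are hypothetical, and the kernel-smoothing and Box–Cox variants are not shown.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Non-parametric AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count 1/2, so the result equals U / (n_pos * n_neg).
    """
    score = 0.0
    for xp in pos:
        for xn in neg:
            if xp > xn:
                score += 1.0
            elif xp == xn:
                score += 0.5
    return score / (len(pos) * len(neg))


def auc_binormal(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Parametric AUC assuming normally distributed markers in both groups:
    Phi((mu_pos - mu_neg) / sqrt(sd_pos^2 + sd_neg^2)) with plug-in sample estimates.
    """
    delta = mean(pos) - mean(neg)
    spread = math.sqrt(stdev(pos) ** 2 + stdev(neg) ** 2)
    return NormalDist().cdf(delta / spread)


# Hypothetical marker values
pos = [0.9, 0.8, 0.7, 0.55, 0.4]
neg = [0.6, 0.35, 0.2, 0.1, 0.05]
print(auc_mann_whitney(pos, neg))  # 0.92 (23 of 25 pairs ranked correctly)
print(auc_binormal(pos, neg))
```

On small samples the two estimates generally differ; the binormal value is only as good as the normality assumption behind it.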

In addition to the AUC, the Youden Index (Youden, 1950) is frequently used in practice. This index is defined as

$$J = \max_c \{\mathrm{sens}(c) + \mathrm{spec}(c) - 1\} \tag{2.4.22}$$

and ranges between 0 and 1. The Youden Index (YI) has an attractive feature

In document Establishment of predictive blood-based signatures in medical large scale genomic data sets : Development of novel diagnostic tests (Page 39-42)