

2.10 Statistical Techniques

2.10.3 Receiver Operating Characteristic (ROC) Analysis

Receiver operating characteristic (ROC) analysis is a common method for assessing the discriminatory ability of a classification model, i.e., a model that produces a probability metric for membership in a given class. For this research, the two classification types are malignant versus healthy tissue or malignant versus benign lesions (Chapter 5), and complete responder versus non-complete responder to chemotherapy (Chapter 3). The example below describes the ROC method in terms of the malignant versus healthy tissue classification probability $P_M$; however, this discussion is readily transferable to the responder versus non-responder model with probability $P_R$.

ROC analysis is predicated on setting some cutoff value $P_{Mc}$ for the probability of malignancy metric $P_M$, such that any data point with a probability of malignancy above the cutoff is predicted to be positive, i.e., malignant tissue; analogously, any point below the cutoff is predicted to be negative, i.e., normal tissue. Thus, all of the points in the test dataset can be divided into four categories:

1. True Positives: actually malignant and predicted to be malignant due to a $P_M > P_{Mc}$

2. True Negatives: actually healthy and predicted to be healthy due to a $P_M < P_{Mc}$

3. False Positives: actually healthy but predicted to be malignant due to a $P_M > P_{Mc}$

4. False Negatives: actually malignant but predicted to be healthy due to a $P_M < P_{Mc}$.
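As an illustration, the four-way split above can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in this work: the function name `confusion_counts`, the toy scores and labels, and the choice to treat a point lying exactly on the cutoff as negative are all assumptions made for the example.

```python
def confusion_counts(probs, labels, cutoff):
    """Count the four ROC categories for one cutoff value.

    probs  : predicted probabilities of malignancy (P_M), one per data point
    labels : True if the point is actually malignant, False if actually healthy
    cutoff : the cutoff value P_Mc
    """
    tp = tn = fp = fn = 0
    for p, is_malignant in zip(probs, labels):
        # Assumption: a point exactly at the cutoff is predicted negative.
        predicted_malignant = p > cutoff
        if predicted_malignant and is_malignant:
            tp += 1
        elif predicted_malignant and not is_malignant:
            fp += 1
        elif is_malignant:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical toy data (not the dataset of Figure 2.16).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

counts = confusion_counts(probs, labels, cutoff=0.5)
print(counts)  # {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```

The four counts always sum to the number of data points, so changing the cutoff only moves points between categories.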

Figure 2.16A contains a graphical representation of these classifications. In that example, there are 15 true positives, 16 true negatives, 5 false positives, and 4 false negatives.

Given these four categories, four different measures of accuracy can also be obtained:

1. True Positive Rate: fraction of actually malignant tissues predicted to be malignant. This is also known as the sensitivity.

$$\mathrm{TPR} = \text{Sensitivity} \equiv \frac{\#\,\text{True Positives}}{\#\,\text{True Positives} + \#\,\text{False Negatives}} \tag{2.197}$$

2. False Positive Rate: fraction of actually healthy tissues predicted to be malignant. The quantity $(1 - \text{False Positive Rate})$ is also known as the specificity.

$$\mathrm{FPR} = 1 - \text{Specificity} \equiv \frac{\#\,\text{False Positives}}{\#\,\text{True Negatives} + \#\,\text{False Positives}} \tag{2.198}$$

3. Positive Predictive Value: fraction predicted to be malignant that are actually malignant.

$$\mathrm{PPV} \equiv \frac{\#\,\text{True Positives}}{\#\,\text{True Positives} + \#\,\text{False Positives}} \tag{2.199}$$

4. Negative Predictive Value: fraction predicted to be healthy that are actually healthy.

$$\mathrm{NPV} \equiv \frac{\#\,\text{True Negatives}}{\#\,\text{True Negatives} + \#\,\text{False Negatives}} \tag{2.200}$$

These quantities are also described graphically in Figure 2.17, in which a cutoff of $P_{Mc} = 0.5$ was chosen. In this example, $\mathrm{TPR}$ (Sensitivity) $= \frac{15}{15+4} = 0.789$, $\mathrm{FPR} = \frac{5}{16+5} = 0.238$ (Specificity $= 0.762$), $\mathrm{PPV} = \frac{15}{15+5} = 0.75$, and $\mathrm{NPV} = \frac{16}{16+4} = 0.8$.
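The arithmetic in this example can be checked with a short Python sketch that applies Equations 2.197 to 2.200 to the four counts quoted for Figure 2.16A. The function name `roc_metrics` is hypothetical, and the sketch assumes every denominator is non-zero.

```python
def roc_metrics(tp, tn, fp, fn):
    """Apply Equations 2.197-2.200 to the four category counts.

    Assumes each denominator is non-zero (no empty class or prediction group).
    """
    return {
        "TPR": tp / (tp + fn),  # sensitivity, Eq. 2.197
        "FPR": fp / (tn + fp),  # 1 - specificity, Eq. 2.198
        "PPV": tp / (tp + fp),  # Eq. 2.199
        "NPV": tn / (tn + fn),  # Eq. 2.200
    }


# Counts quoted in the text for a cutoff of 0.5.
metrics = roc_metrics(tp=15, tn=16, fp=5, fn=4)
specificity = 1 - metrics["FPR"]

for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
print(f"Specificity = {specificity:.3f}")
# TPR = 0.789
# FPR = 0.238
# PPV = 0.750
# NPV = 0.800
# Specificity = 0.762
```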

Varying the cutoff value $P_{Mc}$ alters the number of data points in each of the four groups (see Figure 2.16B-C). For example, increasing $P_{Mc}$ leads to a lower true positive rate and a lower false positive rate. Conversely, decreasing $P_{Mc}$ results in both a higher true positive rate and a higher false positive rate. Thus, for practical use, it is necessary to optimize $P_{Mc}$ to balance these competing effects. One common method is to find the value of $P_{Mc}$ that maximizes the sum of the sensitivity and specificity [115]. For a quality prediction model, this results in a high true positive rate and a low false positive rate. The overall predictive accuracy for a given cutoff can then be defined as

$$\text{Accuracy} \equiv \frac{\#\,\text{True Pos} + \#\,\text{True Neg}}{\#\,\text{True Pos} + \#\,\text{True Neg} + \#\,\text{False Pos} + \#\,\text{False Neg}}. \tag{2.201}$$

For the data shown in Figure 2.17, the overall classification accuracy is $(15 + 16)/(15 + 16 + 5 + 4) = 77.5\,\%$. Figures 2.16 and 2.18 demonstrate how each of these accuracy parameters changes for the model dataset in Figure 2.16 given different $P_{Mc}$ values. In general, higher $P_{Mc}$ cutoffs lead to improvement in specificity and positive predictive value, while lower $P_{Mc}$ values provide better sensitivity and negative predictive value.

Figure 2.16: Receiver Operating Characteristic (ROC) Analysis. An example dataset is graphed such that the y-axis represents the true binomial classification, either positive or negative, of every data point, and the x-axis is the predicted probability of a data point being positive. A cutoff value for the probability, $P_{Mc}$, can be chosen, above which all data points are predicted to be positive, and below which all data points are predicted to be negative. This results in four possible classifications for each data point: 1) True Positives: actually positive and predicted to be positive, 2) True Negatives: actually negative and predicted to be negative, 3) False Positives: actually negative but predicted to be positive, and 4) False Negatives: actually positive but predicted to be negative. Note that the number of data points in each of these groups is determined by the cutoff value $P_{Mc}$ that is chosen. A) displays a cutoff of $P_{Mc} = 0.5$, B) shows a cutoff of $P_{Mc} = 0.25$, and C) has a cutoff of $P_{Mc} = 0.75$. Higher $P_{Mc}$ cutoffs lead to more true negatives and false negatives.

Figure 2.17: ROC Accuracy Parameters. The four most commonly used accuracy parameters for ROC analysis are displayed graphically. Each accuracy parameter can generally be thought of as the fraction of two ROC classification groups, e.g., true positive and false negative, or false positive and true positive, that fall into a single one of those classification groups. Thus, for each parameter in this figure, the denominator of this fraction is the sum of data points across the two highlighted, colorful regions, and the numerator is the number of data points in the region encased by the green rectangle. Four parameters are shown: A) True Positive Rate ($\mathrm{TPR}$): number of true positives divided by all actual positives, B) False Positive Rate ($\mathrm{FPR}$): number of false positives divided by all actual negatives, C) Positive Predictive Value ($\mathrm{PPV}$): number of true positives divided by all predicted positives, and D) Negative Predictive Value ($\mathrm{NPV}$): number of true negatives divided by all predicted negatives.
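The cutoff selection rule mentioned above, maximizing the sum of sensitivity and specificity, can be sketched as a simple search over candidate cutoffs. This is a minimal sketch under stated assumptions: the function name `best_cutoff` and the toy data are hypothetical, the three candidate cutoffs merely mirror the values shown in Figure 2.16, ties are resolved in favor of the first candidate, and both classes are assumed to be present in the data.

```python
def best_cutoff(probs, labels, cutoffs):
    """Return the candidate cutoff that maximizes sensitivity + specificity.

    Also reports the overall accuracy (Eq. 2.201) at that cutoff.
    Assumes at least one positive and one negative label.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    best = None
    for c in cutoffs:
        # Points above the cutoff are predicted positive, all others negative.
        tp = sum(1 for p, y in zip(probs, labels) if p > c and y)
        tn = sum(1 for p, y in zip(probs, labels) if p <= c and not y)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        score = sensitivity + specificity
        if best is None or score > best["score"]:
            best = {
                "cutoff": c,
                "score": score,
                "sensitivity": sensitivity,
                "specificity": specificity,
                "accuracy": (tp + tn) / len(labels),
            }
    return best


# Hypothetical toy data (not the dataset of Figure 2.16).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

best = best_cutoff(probs, labels, cutoffs=[0.25, 0.5, 0.75])
print(best["cutoff"])  # 0.75
```

In practice a finer grid of candidate cutoffs, or every distinct score in the dataset, would be searched; the three-value grid is used here only to keep the example easy to follow by hand.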

Figure 2.18: Example ROC Accuracy Parameters versus Probability Cutoff. The example data here comes from the schematic ROC analysis example in Figure 2.16. A) Plot of the overall accuracy, sensitivity ($\mathrm{TPR}$), and specificity ($1 - \mathrm{FPR}$). Note that higher $P_{Mc}$ cutoff values lead to lower sensitivity and higher specificity, while lower $P_{Mc}$ cutoffs provide higher sensitivity and lower specificity. The overall accuracy is maximized by an intermediate $P_{Mc}$ value. B) Plot of the overall accuracy, positive predictive value ($\mathrm{PPV}$), and negative predictive value ($\mathrm{NPV}$). Note that higher $P_{Mc}$ cutoffs produce better positive predictive values and worse negative predictive values. Lower $P_{Mc}$ cutoffs have the opposite effect.

A more holistic measure of the discriminatory ability of a prediction model is the so-called area under the ROC curve ($AUC$). The ROC curve is created by plotting the true positive rate versus the false positive rate for a range of cutoff values from 0 to 1. When $P_{Mc} = 0$, all data points will be classified as positive. Thus, the true positive rate will be 1, but the false positive rate will also be 1. Similarly, when $P_{Mc} = 1$, all data points will be classified as negative. Therefore, the true and false positive rates will both be zero. A perfect discriminator would achieve a true positive rate of 1 and a false positive rate of 0 at some cutoff between these extremes, so that its ROC curve passes through the point $(\mathrm{FPR}, \mathrm{TPR}) = (0, 1)$. In this scenario, the area under the plotted ROC curve would be 1. For good, but imperfect, predictors, the true positive rate rapidly increases as $P_{Mc}$ decreases while the false positive rate increases slowly. This results in an area under the curve value where $0.5 < AUC < 1$, with better predictors having $AUC$ values closer to 1. A model that randomly classifies data points as either positive or negative produces an $AUC = 0.5$; thus, $AUC$ must be greater than 0.5 for a model to be said to have predictive value. Figure 2.19 contains an example ROC curve using the model data from Figure 2.16. In that case, the $AUC$ is 0.82, indicating good predictive value.
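The construction of the ROC curve and its area can be illustrated with the following Python sketch. It is not the procedure used to produce Figure 2.19: the function names `roc_curve_points` and `auc_trapezoid`, the toy data, and the use of the trapezoidal rule for the area are assumptions made for the example. The sweep is also bracketed by cutoffs of plus and minus infinity (rather than exactly 1 and 0) so that the curve is guaranteed to start at (0, 0) and end at (1, 1) even when a score lies exactly on a boundary.

```python
def roc_curve_points(probs, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the cutoff from high to low.

    Assumes at least one positive and one negative label.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    # Assumption: use every distinct score as a candidate cutoff, bracketed by
    # +/- infinity so the end points (0, 0) and (1, 1) are always included.
    cutoffs = [float("inf")] + sorted(set(probs), reverse=True) + [float("-inf")]
    points = []
    for c in cutoffs:
        tp = sum(1 for p, y in zip(probs, labels) if p > c and y)
        fp = sum(1 for p, y in zip(probs, labels) if p > c and not y)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Integrate TPR over FPR with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data (not the dataset of Figure 2.16).
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

points = roc_curve_points(probs, labels)
auc = auc_trapezoid(points)
print(f"AUC = {auc:.3f}")  # AUC = 0.889

# A perfectly separated toy dataset gives an area of 1.
perfect_auc = auc_trapezoid(
    roc_curve_points([0.9, 0.8, 0.2, 0.1], [True, True, False, False])
)
print(f"AUC = {perfect_auc:.3f}")  # AUC = 1.000
```

For the first toy dataset, 8 of the 9 possible (malignant, healthy) pairs are ranked correctly by the scores, which matches the computed area of 8/9.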