Knowledge Discovery and Data Mining
Lecture 15 - ROC, AUC & Lift
Tom Kelsey
School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk
Testing
A useful tool for investigating model performance is the confusion matrix:
          y=0    y=1
ŷ=0       a      b
ŷ=1       c      d
(rows: predicted class ŷ; columns: true class y)
Contains quantities for the correct prediction of class 0, correct prediction of class 1, and the two ways you may have made incorrect predictions.
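A minimal sketch of how the four cells could be counted from test data, following the layout above literally (rows are predictions, columns are true classes). The function name `confusion_counts` and the labels are made up for illustration, and labels are assumed to be coded 0/1.

```python
def confusion_counts(y_true, y_pred):
    """Return (a, b, c, d) laid out as in the matrix above.

    a: predicted 0, actual 0    b: predicted 0, actual 1
    c: predicted 1, actual 0    d: predicted 1, actual 1
    Assumes every label is either 0 or 1.
    """
    a = b = c = d = 0
    for y, y_hat in zip(y_true, y_pred):
        if y_hat == 0 and y == 0:
            a += 1
        elif y_hat == 0 and y == 1:
            b += 1
        elif y_hat == 1 and y == 0:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Made-up labels, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]
a, b, c, d = confusion_counts(y_true, y_pred)
print(a, b, c, d)
```

Here a and d count the correct predictions; b and c count the two kinds of error.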
Performance Measures
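Reading the matrix above with a = true negatives, b = false negatives, c = false positives and d = true positives:
Accuracy = (a + d) / (a + b + c + d)
Precision = d / (c + d)
Recall (TP rate, sensitivity) = d / (b + d)
True negative rate (specificity) = a / (a + c)
False positive rate = c / (a + c)
False negative rate = b / (b + d)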
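The measures can be sketched as one small function. This assumes the confusion matrix is read as printed (rows are predictions, columns are true classes), so a = TN, b = FN, c = FP, d = TP. The name `performance_measures` and the counts are illustrative only.

```python
def performance_measures(a, b, c, d):
    """Standard measures from the four confusion-matrix cells.

    Assumed cell meanings: a = TN, b = FN, c = FP, d = TP.
    Raises ZeroDivisionError if a class is empty or nothing is
    predicted positive.
    """
    return {
        "accuracy": (a + d) / (a + b + c + d),
        "precision": d / (c + d),    # of predicted positives, how many are real
        "recall": d / (b + d),       # TP rate, sensitivity
        "specificity": a / (a + c),  # TN rate
        "fp_rate": c / (a + c),
        "fn_rate": b / (b + d),
    }


# Made-up counts, for illustration only.
measures = performance_measures(a=4, b=1, c=2, d=3)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that the FP rate is 1 - specificity and the FN rate is 1 - recall, which is why a ROC chart needs only two of these quantities.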
Receiver Operating Characteristic (ROC)
ROC curves
For continuous data with variable cutoff points for the classification
Obese Y/N based on BMI, age, etc.
Cancerous based on percent of abnormal tissue in a slide
Given a tree, some test data and a confusion matrix, it’s easy to generate a point on a ROC chart
x-axis is FP rate, y-axis is TP rate
This point depends on a probability threshold for the classification
Varying this threshold will change the confusion matrix, giving more points on the chart
Use this to tune the model w.r.t. FP and TP rates
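The threshold sweep can be sketched as follows: each distinct predicted probability is used in turn as the cutoff, and each cutoff gives one (FP rate, TP rate) point. The function name `roc_points` and the data are made up for illustration; the rule "predict 1 when score >= threshold" is an assumption.

```python
def roc_points(y_true, scores):
    """Return (FP rate, TP rate) pairs, one per distinct threshold.

    Assumes 0/1 labels with at least one example of each class, and
    that a case is classified as 1 when its score >= the threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    # A threshold above every score classifies nothing as positive.
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up test data: true classes and predicted probabilities of class 1.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FP rate {fpr:.2f}  TP rate {tpr:.2f}")
```

Lowering the threshold can only move the point up or to the right, so the points trace the curve from (0, 0) to (1, 1).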
Example
Goldstein and Mushlin (J. Gen. Intern. Med. 1987 2 20-24)
Effect of Thresholding
How the balance between TP, TN, FP and FN changes:
Area Under Curve
The area measures discrimination – the ability of the test to classify correctly
Useful for comparing ROC curves – standard academic banding:
0.90 – 1.00 = excellent
0.80 – 0.90 = good (0.86 for the example)
0.70 – 0.80 = fair
0.60 – 0.70 = poor
0.50 – 0.60 = fail
Computed by trapezoidal estimates (or the curve can be smoothed, then integrated)
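A minimal sketch of the trapezoidal estimate: sort the ROC points by FP rate and sum the area of the trapezoid between each pair of neighbours. The function name `trapezoid_auc` and the points are made up for illustration.

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear curve through (x, y) points.

    Points are sorted by x (then y), which suits a ROC curve because
    it never decreases in either coordinate.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # width * average height
    return area


# Made-up ROC points (FP rate, TP rate), including both end points.
roc = [(0.0, 0.0), (0.0, 2 / 3), (1 / 3, 2 / 3), (1 / 3, 1.0), (1.0, 1.0)]
auc = trapezoid_auc(roc)
print(f"AUC = {auc:.3f}")
```

For these points the area is 8/9, about 0.89, which falls in the "good" band. The diagonal from (0, 0) to (1, 1) gives 0.5, the value for random guessing.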
Examples
Kelsey et al.
pROC package for R
The Case For
S. Ma & J. Huang, Regularized ROC method for disease classification and biomarker selection with microarray data, Bioinf. (2005) 21 (24)
An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form of the class probability as required for example in the logistic regression method; (2) it accommodates case-control designs and (3) it allows treating false positives and false negatives differently. However, due to computational difficulties, the ROC-based classification has not been used with microarray data.
The Case Against
J.M. Lobo et al., AUC: a misleading measure of the performance of predictive distribution models, Global Ecol. and Biogeog. 17(2); 2008
The ... AUC, is currently considered to be the standard method to assess the accuracy of predictive distribution models. It avoids the supposed subjectivity in the threshold selection process, when continuous probability derived scores are converted to a binary presence-absence variable, by summarizing overall model performance over all possible thresholds... We do not recommend using AUC for five reasons: (1) it ignores the predicted probability values and the goodness-of-fit of the model; (2) it summarises the test performance over regions of the ROC space in which one would rarely operate; (3) it weights omission and commission errors equally; (4) it does not give information about the spatial distribution of model errors; and, most importantly, (5) the total extent to which models are carried out ...
Lift
Measures the degree to which the predictions of a classification model are better than random predictions.
In simple terms, lift is the ratio of the positive rate among the cases the model selects to the positive rate in the test data as a whole
For example, if 40% of patients have been diagnosed (the positive classification) in the past, and 75% of the patients the model flags as positive really are positive, the lift would be
0.75 / 0.4 = 1.875
Lift
Lift charts for a model can be obtained in a similar manner to ROC charts.
For threshold value t, with P and N the numbers of actual positives and negatives in the test data:
x = (TP(t) + FP(t)) / (P + N)
y = TP(t)
The AUC of a lift chart is no smaller than the AUC of the ROC curve for the same model
As before, we can compare lift charts for competing models, and investigate optimal threshold values
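A sketch of generating lift chart points the same way as ROC points: sweep the threshold, and for each value record the fraction of the test set classified positive (x) against the number of true positives found (y). The function name `lift_chart_points`, the data and the rule "predict 1 when score >= threshold" are assumptions made for illustration.

```python
def lift_chart_points(y_true, scores):
    """Return (x, y) pairs with x = (TP + FP) / (P + N) and y = TP.

    Assumes 0/1 labels and that a case is classified as 1 when its
    score >= the threshold.
    """
    total = len(y_true)  # P + N
    # A threshold above every score selects nobody.
    points = [(0.0, 0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append(((tp + fp) / total, tp))
    return points


# Made-up test data: true classes and predicted probabilities of class 1.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
lift_points = lift_chart_points(y_true, scores)
for x, y in lift_points:
    print(f"fraction selected {x:.2f}  true positives {y}")
```

The last point is always (1, P): selecting everyone finds every positive.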
Lift Example
Suppose we have a mailing list of former students, and we want to raise money by mailing an elaborate brochure. We have demographic information that we can relate to the response rate.
Also, from similar mail-out campaigns, we estimated the baseline response rate at 8%.
Sending to everyone would result in a net loss. We build a model based on the data collected. We can select the 10% most likely to respond. If among these the response rate is 16%, then the lift value due to using the predictive model is 16% / 8% = 2.
Analogous lift values can be computed for each percentile of the population. From this we work out the best trade-off between expense and anticipated response.
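The calculation in this example can be sketched as below. The data are synthetic and constructed only to reproduce the numbers above (250 people, 8% responders overall, 16% among the top-scored 10%); the function name `lift_at` is made up.

```python
def lift_at(y_true, scores, fraction):
    """Lift when targeting the top `fraction` of people by score.

    Lift = response rate in the targeted group / overall response rate.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(fraction * len(ranked)))  # size of the targeted group
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Synthetic mailing list: person i has score 1 - i/250, so lower i ranks higher.
n = 250
scores = [1 - i / n for i in range(n)]
responders = {0, 5, 10, 20} | set(range(30, n, 14))  # 4 in the top 25, 20 overall
y_true = [1 if i in responders else 0 for i in range(n)]

print(lift_at(y_true, scores, 0.10))  # top 10%
print(lift_at(y_true, scores, 1.00))  # everyone
```

Calling `lift_at` for fractions 0.1, 0.2, ..., 1.0 gives the values needed for a lift-by-decile chart. Targeting everyone always gives a lift of 1.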
General chart structure
You can think of this as a customer database ordered by predicted probability. As we move from left to right we are penetrating deeper into the database, from high p̂ observations to low p̂ observations.
Lift
Closely associated with the Pareto Principle – 80% of profit comes from 20% of customers. A good model and a lift chart help identify those customers.
Why use these plots?
The utility of these charts is hopefully clear:
if we have a limited budget, we can see what level of response it would buy by targeting the (modelled) most likely responders
we can see how much value our model has brought to the problem (compared to a random sample of customers) - in direct monetary terms if costs are included
perhaps we can do a smaller campaign, as the returns diminish beyond some percentage of customers targeted
we can see where a level of customer targeting becomes unprofitable, if the costs are known.
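When costs are known, the points above can be made concrete with a short calculation: rank customers by predicted probability, then track profit as more of the list is contacted. The cost and value figures, the response pattern and the function name `cumulative_profit` are all hypothetical.

```python
def cumulative_profit(y_ranked, cost_per_contact, value_per_response):
    """Profit after contacting the top 1, 2, ..., n ranked customers.

    y_ranked holds the 0/1 responses in order of decreasing predicted
    probability.
    """
    profits = []
    responders = 0
    for k, y in enumerate(y_ranked, start=1):
        responders += y
        profits.append(responders * value_per_response - k * cost_per_contact)
    return profits


# Hypothetical responses for ten customers, best-scored first.
y_ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
profits = cumulative_profit(y_ranked, cost_per_contact=2.0, value_per_response=5.0)
best_depth = profits.index(max(profits)) + 1
print(profits)
print("most profitable depth:", best_depth)
```

With these made-up figures profit peaks after four contacts and falls to zero if the whole list is mailed.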
Summary
Medics and management use ROC, AUC & Lift whenever possible
Easy to compute
Easy to understand
Simple 2D graphical expression of how Model A compares to Model B
Plus useful threshold cutoff information
Plus important cost-benefit information
You are expected to be able to produce ROC curves.
You are not expected to be able to produce lift charts, but you should be able to explain their design and use.