Knowledge Discovery and Data Mining

(1)

Lecture 15 - ROC, AUC & Lift

Tom Kelsey

School of Computer Science, University of St Andrews, http://tom.home.cs.st-andrews.ac.uk

(2)

Testing

A useful tool for investigating model performance is the confusion matrix:

Rows give the actual class y, columns give the predicted class ŷ:

         ŷ=0   ŷ=1

y=0       a     b

y=1       c     d

The matrix contains the counts of correct predictions of class 0 (a), correct predictions of class 1 (d), and the two kinds of incorrect prediction (b and c).

(3)

Performance Measures

Accuracy: (a + d) / (a + b + c + d)

Precision: d / (b + d)

Recall (TP rate, sensitivity): d / (c + d)

True negative rate (specificity): a / (a + b)

False positive rate: b / (a + b)

False negative rate: c / (c + d)
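As a minimal sketch, the measures above can be evaluated directly from the four cell counts. The function name `performance_measures` and the counts in the usage lines are illustrative assumptions, not data from the lecture. The code transcribes the formulas as written and assumes every denominator is non-zero.

```python
def performance_measures(a, b, c, d):
    """Evaluate the slide's formulas for confusion-matrix cells a, b, c, d.

    Assumes all four arguments are non-negative counts and that no
    denominator (a+b, c+d, b+d, a+b+c+d) is zero.
    """
    total = a + b + c + d
    return {
        "accuracy": (a + d) / total,
        "precision": d / (b + d),
        "recall": d / (c + d),               # TP rate, sensitivity
        "specificity": a / (a + b),          # true negative rate
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
measures = performance_measures(a=50, b=10, c=5, d=35)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Note that specificity and the false positive rate share the denominator a + b, so they always sum to 1; the same holds for recall and the false negative rate.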

(4)

Receiver Operating Characteristic

ROC curves

For continuous data with variable cutoff points for the classification

Obese Y/N based on BMI, age, etc.

Cancerous based on percent of abnormal tissue in a slide

Given a tree, some test data and a confusion matrix, it's easy to generate a point on a ROC chart

x-axis is FP rate, y-axis is TP rate

This point depends on a probability threshold for the classification

Varying this threshold will change the confusion matrix, giving more points on the chart

Use this to tune the model w.r.t FP and TP rates
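The threshold sweep described above might be sketched as follows. The function name `roc_points` and the toy labels and scores are hypothetical. The sketch assumes binary labels (1 = positive), a score where higher means more likely positive, and that a case is classified positive when its score is at least the threshold.

```python
def roc_points(y_true, scores):
    """Return one (FP rate, TP rate) pair per candidate threshold.

    Thresholds are +infinity (nothing classified positive) followed by
    each distinct score in descending order.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        # Recompute the confusion-matrix counts at this threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical test data: 4 positives and 4 negatives.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FP rate = {fpr:.2f}, TP rate = {tpr:.2f}")
```

Each threshold yields one confusion matrix and therefore one point; lowering the threshold can only move the point up or to the right.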

(5)

Example

Goldstein and Mushlin (J. Gen. Intern. Med. 1987 2 20-24)

(6)

Example

(7)

Example

(8)

Example

(9)

Effect of Thresholding

How the balance between TP, TN, FP and FN changes:

(10)

Area Under Curve

The area measures discrimination – the ability of the test to classify correctly

Useful for comparing ROC curves – standard academic banding:

0.90 – 1.00 = excellent

0.80 – 0.90 = good (0.86 for the example)

0.70 – 0.80 = fair

0.60 – 0.70 = poor

0.50 – 0.60 = fail

Computed by trapezoidal estimates (or the curve can be smoothed, then integrated)
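A minimal sketch of the trapezoidal estimate, assuming the ROC curve is supplied as a list of (FP rate, TP rate) points that includes (0, 0) and (1, 1). The function name `trapezoid_auc` and the points in the usage lines are hypothetical, not taken from the lecture's example.

```python
def trapezoid_auc(points):
    """Area under a piecewise-linear curve through (x, y) points.

    Points are sorted by x (then y), and the area of the trapezoid
    between each consecutive pair is accumulated.
    """
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical ROC points for a small test set.
roc = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.5), (0.25, 0.75),
       (0.75, 0.75), (0.75, 1.0), (1.0, 1.0)]
auc = trapezoid_auc(roc)
print(f"AUC = {auc:.2f}")

# The diagonal (random guessing) has area 0.5.
chance = trapezoid_auc([(0.0, 0.0), (1.0, 1.0)])
print(f"Chance-level AUC = {chance:.2f}")
```

Vertical segments contribute zero width, so only the horizontal steps add area.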

(11)

Examples

Kelsey et al.

(12)

Examples

(13)

Examples

pROC package for R

(14)

Examples

(15)

Examples

pROC package for R

(16)

Examples

(17)

Examples

pROC package for R

(18)

Examples

(19)

The Case For

S. Ma & J. Huang, Regularized ROC method for disease classification and biomarker selection with microarray data, Bioinf. (2005) 21 (24)

An important application of microarrays is to discover genomic biomarkers, among tens of thousands of genes assayed, for disease classification. Thus there is a need for developing statistical methods that can efficiently use such high-throughput genomic data, select biomarkers with discriminant power and construct classification rules. The ROC technique has been widely used in disease classification with low-dimensional biomarkers because (1) it does not assume a parametric form of the class probability as required for example in the logistic regression method; (2) it accommodates case-control designs and (3) it allows treating false positives and false negatives differently.

However, due to computational difficulties, the ROC-based classification has not been used with microarray data.

(20)

The Case Against

J.M. Lobo et al., AUC: a misleading measure of the performance of predictive distribution models, Global Ecol. and Biogeog. 17(2); 2008

The ... AUC, is currently considered to be the standard method to assess the accuracy of predictive distribution models. It avoids the supposed subjectivity in the threshold selection process, when continuous probability derived scores are converted to a binary presence-absence variable, by summarizing overall model performance over all possible thresholds... We do not recommend using AUC for five reasons: (1) it ignores the predicted probability values and the goodness-of-fit of the model; (2) it summarises the test performance over regions of the ROC space in which one would rarely operate; (3) it weights omission and commission errors equally; (4) it does not give information about the spatial distribution of model errors; and, most importantly, (5) the total extent to which models are carried out

(21)

Lift

Measures the degree to which the predictions of a classification model are better than random predictions.

In simple terms, lift is the ratio of the proportion of true positives among the cases the model classifies as positive to the proportion of positives in the test data as a whole

For example, if 40% of patients have been diagnosed (the positive classification) in the past, and 75% of the patients the model classifies as positive really are positive, the lift would be

0.75 / 0.4 = 1.875

(22)

Lift

Lift charts for a model can be obtained in a similar manner to ROC charts. For threshold value t

x = (TP(t) + FP(t)) / (P + N)

y = TP(t)

The AUC of a lift chart is no smaller than the AUC of the ROC curve for the same model

As before, we can compare lift charts for competing models, and investigate optimal threshold values
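The lift chart coordinates above can be generated with the same threshold sweep used for a ROC chart. This is a sketch under stated assumptions: TP(t) and FP(t) are taken to be the counts of true and false positives at threshold t, P + N is the size of the test set, and a case is classified positive when its score is at least t. The function name `lift_chart_points` and the toy data are hypothetical.

```python
def lift_chart_points(y_true, scores):
    """Return (x, y) pairs with x = (TP + FP) / (P + N) and y = TP.

    x is the fraction of the test set classified positive at the
    threshold; y is the number of true positives among them.
    """
    total = len(y_true)  # P + N
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append(((tp + fp) / total, tp))
    return points


# Hypothetical test data: 4 positives and 4 negatives.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
lift_points = lift_chart_points(y_true, scores)
for x, y in lift_points:
    print(f"fraction targeted = {x:.3f}, true positives = {y}")
```

Plotting these points for two competing models on the same axes shows which model finds more positives for a given fraction of cases targeted.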

(23)

Lift Example

Suppose we have a mailing list of former students, and we want to raise money by mailing an elaborate brochure. We have demographic information that we can relate to the response rate.

Also, from similar mail-out campaigns, we estimated the baseline response rate at 8%.

Sending to everyone would result in a net loss. We build a model based on the data collected. We can select the 10% most likely to respond. If among these the response rate is 16%, then the lift value due to using the predictive model is 16% / 8% = 2.

Analogous lift values can be computed for each percentile of the population. From this we work out the best trade-off between expense and anticipated response.
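The calculation above might be sketched as follows. The function name `lift_at_fraction` is hypothetical, and the data are synthetic: 250 people with 20 responders (8%), arranged so that 4 of the 25 highest-scored people respond (16%), purely to reproduce the figures in the example.

```python
def lift_at_fraction(y_true, scores, fraction):
    """Lift for the top `fraction` of cases ranked by descending score.

    Lift = response rate in the selected group / overall response rate.
    """
    ranked = [y for _, y in sorted(zip(scores, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    k = max(1, round(fraction * len(ranked)))  # size of the selected group
    top_rate = sum(ranked[:k]) / k
    base_rate = sum(ranked) / len(ranked)
    return top_rate / base_rate


# Synthetic mailing list, already ordered from highest to lowest score.
y_true = [1] * 4 + [0] * 21 + ([1] + [0] * 13) * 16 + [0]
scores = [1 - i / 250 for i in range(250)]

top_decile_lift = lift_at_fraction(y_true, scores, 0.10)
print(f"Lift for the top 10%: {top_decile_lift:.2f}")

# Lift for each further slice of the population, in steps of 10%.
for pct in range(10, 101, 10):
    print(f"top {pct}%: lift = {lift_at_fraction(y_true, scores, pct / 100):.2f}")
```

Selecting the whole population always gives a lift of 1, since the selected group's response rate is then the baseline itself.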

(24)

General chart structure

You can think of this as a customer database ordered by predicted probability - as we move from left to right we are penetrating deeper into the database, from high p̂ observations to low p̂ observations:

(25)

Lift

Closely associated with the Pareto Principle – 80% of profit comes from 20% of customers. A good model and a lift chart help identify those customers.

(26)

Why use these plots?

The utility of these charts is hopefully clear:

if we have a limited budget, we can see what level of response it would buy by targeting the (modelled) most likely responders

we can see how much value our model has brought to the problem (compared to a random sample of customers) - in direct monetary terms if costs are included

perhaps we can run a smaller campaign, as the returns diminish beyond some percentage of customers targeted

we can see where a level of customer targeting becomes unprofitable, if the costs are known
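When costs are known, the points above can be made concrete by tracking cumulative profit as more customers are targeted. The sketch below is illustrative only: the function name `best_campaign_size`, the cost per contact, the value per response and the ranked outcomes are all hypothetical figures, and it assumes customers are already ordered from most to least likely to respond.

```python
def best_campaign_size(ranked_responses, cost_per_contact, value_per_response):
    """Return (k, profit) for the most profitable 'contact the top k' campaign.

    ranked_responses holds 1 (responded) or 0 (did not respond) for each
    customer, ordered by descending predicted probability.
    """
    best_k, best_profit = 0, 0.0
    responders = 0
    for k, responded in enumerate(ranked_responses, start=1):
        responders += responded
        profit = value_per_response * responders - cost_per_contact * k
        if profit > best_profit:
            best_k, best_profit = k, profit
    return best_k, best_profit


# Hypothetical outcomes for ten customers, best prospects first.
ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
best_k, best_profit = best_campaign_size(
    ranked, cost_per_contact=2.0, value_per_response=5.0)
print(f"Contact the top {best_k} customers for a profit of {best_profit:.2f}")
```

With these made-up figures, contacting all ten customers only breaks even, while stopping after the top four gives the highest profit.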

(27)

Summary

Medics and management use ROC, AUC & Lift whenever possible

Easy to compute

Easy to understand

Simple 2D graphical expression of how Model A compares to Model B

Plus useful threshold cutoff information

Plus important cost-benefit information

You are expected to be able to produce ROC curves.

You are not expected to be able to produce lift charts, but you should be able to explain their design and use.