Performance Measures in Data Mining

(1)

Performance

Performance Measures in Data Mining

Common Performance Measures used in Data Mining and Machine Learning Approaches

L. Richter J.M. Cejuela

Department of Computer Science Technische Universität München

Master Lab Course Data Mining, SS 2015, Jul 1st

(2)

Performance

Outline

Item Set and Association Rule Weights Simple Measures

Complex Measures Classification

Basic Performance Measures

Complex Measures – Performance Curves Regression

Assessment Strategies

(3)

Performance

Item Set and Association Rule Weights Simple Measures

Measures for Item Sets

Various algorithms can yield frequent item sets. From frequent item sets c and c ∪ {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule).

I

Support (of an item set):

sup(X → Y ) = sup(Y → X ) = P(X ∧ Y )

how many times the item set is found in the database

I

Confidence (of a rule): conf (X → Y ) = P(Y |X ) =

P(X and Y )/P(X ) = sup(X → Y )/sup(X )

(4)

Performance

Measures for Item Sets

Various algorithms can yield frequent item sets. From frequent item sets c and c ∪ {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule).

I

Support (of an item set):

sup(X → Y ) = sup(Y → X ) = P(X ∧ Y )

how many times the item set is found in the database

I

Confidence (of a rule): conf (X → Y ) = P(Y |X ) =

P(X and Y )/P(X ) = sup(X → Y )/sup(X )

(5)

Performance

Measures for Item Sets

Various algorithms can yield frequent item sets. From frequent item sets c and c ∪ {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule).

I

Support (of an item set):

sup(X → Y ) = sup(Y → X ) = P(X ∧ Y )

how many times the item set is found in the database

I

Confidence (of a rule): conf (X → Y ) = P(Y |X ) =

P(X and Y )/P(X ) = sup(X → Y )/sup(X )

(6)

Performance

Item Set and Association Rule Weights Complex Measures

Measures for Item Sets cont.’d

Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting):

I

Leverage (of an item set):

lev (X → Y ) = P(X and Y ) − (P(X )P(Y ))

I

Lift (of a rule): lift(X → Y ) = lift(Y → X ) = P(X ∧ Y )/(P(X )P(Y )) =

conf (X → Y )/sup(Y ) = conf (Y → X )/sup(X )

I

Conviction (of a rule): Similar to Lift, but directed

Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y.

conviction(X → Y ) = P(X )P(¬Y )/P(X ∧ ¬Y ) =

(1 − sup(Y )/(1 − conf (X → Y ))

(7)

Performance

Measures for Item Sets cont.’d

Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting):

I

Leverage (of an item set):

lev (X → Y ) = P(X and Y ) − (P(X )P(Y ))

I

Lift (of a rule): lift(X → Y ) = lift(Y → X ) = P(X ∧ Y )/(P(X )P(Y )) =

conf (X → Y )/sup(Y ) = conf (Y → X )/sup(X )

I

Conviction (of a rule): Similar to Lift, but directed

Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y.

conviction(X → Y ) = P(X )P(¬Y )/P(X ∧ ¬Y ) =

(1 − sup(Y )/(1 − conf (X → Y ))

(8)

Performance

Measures for Item Sets cont.’d

Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting):

I

Leverage (of an item set):

lev (X → Y ) = P(X and Y ) − (P(X )P(Y ))

I

Lift (of a rule): lift(X → Y ) = lift(Y → X ) = P(X ∧ Y )/(P(X )P(Y )) =

conf (X → Y )/sup(Y ) = conf (Y → X )/sup(X )

I

Conviction (of a rule): Similar to Lift, but directed

Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y.

conviction(X → Y ) = P(X )P(¬Y )/P(X ∧ ¬Y ) =

(1 − sup(Y )/(1 − conf (X → Y ))

(9)

Performance

Supplementary Material

J-Measure

I

empirically observed accuracy of rule

I

Cross-entropy (measuring how good a distribution approximates another distribution) between the binary variables φ and θ with vs. without conditioning on event θ

J(θ ⇒ φ) = p(θ)

p(φ|θ)log p(φ|θ)

p(φ) + (1 − p(φ|θ)) log 1 − p(φ|θ) 1 − p(φ)

(10)

Performance Classification

Basic Performance Measures

Basic Building Blocks for Performance Measures

I

True Positive (TP): positive instances predicted as positive

I

True Negative (TN): negative instances predicted as negative

I

False Positive (FP): negative instances predicted as positive

I

False Negative (FN): positive instances predicted as negative

I

Confusion Matrix:

Predicted a Predicted b

Real a TP FN

Real b FP TN

(11)

Performance Measures

Accuracy , acc = TP + TN TP + FN + FP + TN

= Number of correct predictions Total number of predictions Error rate, err = FN + FP

TP + FN + FP + TN = 1 − acc

= Number of wrong predictions

Total number of predictions

(12)

Performance Measures cont’d

True Positive Rate, TPR, Sensitivity = TP TP + FN True Negative Rate, TNR, Specificity = TN

TN + FP

False Positive Rate, FPR = FP TN + FP False Negative Rate, FNR = FN

TP + FN

(13)

Performance Measures cont’d

True Positive Rate, TPR, Sensitivity = TP TP + FN True Negative Rate, TNR, Specificity = TN

TN + FP False Positive Rate, FPR = FP

TN + FP False Negative Rate, FNR = FN

TP + FN

(14)

Performance Measures cont’d

True Positive Rate, TPR, Sensitivity = TP TP + FN True Negative Rate, TNR, Specificity = TN

TN + FP False Positive Rate, FPR = FP

TN + FP False Negative Rate, FNR = FN

TP + FN

(15)

Performance Measures cont’d

True Positive Rate, TPR, Sensitivity = TP TP + FN True Negative Rate, TNR, Specificity = TN

TN + FP False Positive Rate, FPR = FP

TN + FP False Negative Rate, FNR = FN

TP + FN

(16)

Performance Measures cont’d

Precision, p = TP TP + FP Recall, r = TP

TP + FN

F

₁

measure = 2rp

r + p = 2 × TP

2 × TP + FP + FN

(17)

Performance Measures cont’d

Precision, p = TP TP + FP Recall, r = TP

TP + FN F

₁

measure = 2rp

r + p = 2 × TP

2 × TP + FP + FN

(18)

Performance Measures cont’d

Precision, p = TP TP + FP Recall, r = TP

TP + FN F

₁

measure = 2rp

r + p = 2 × TP

2 × TP + FP + FN

(19)

Complex Measures – Performance Curves

Performance Curves

I

"costs" of different error types are different

I

prediction behaviour changes over the test set

I

performance display in 2D

I

different domains prefer different chart types

(20)

Lift Charts

Taken from

http://www.dmg.org/rfc/pmml-3.2/ModelExplanation.html

I

from marketing area to evaluate mailing success

I

y -axis: number or percentage of responders

I

x -axis: sample

I

red diagonal: random lift

I

green line: optimum lift

(21)

ROC Curves

Taken from http://en.wikipedia.org/wiki/File:Roccurves.png

I

Receiver Operator Characteristics

I

y -axis: TPR

I

x -axis: FPR

I

diagonal: random

guessing (TPR=FPR)

(22)

Sensitivity vs. Specificity

Taken from http://i.stack.imgur.com/fnUd2.png

I

preferred in medicine

I

y -axis: TPR

I

x -axis: TNR (specificity)

I

also frequently as ROC

curve with 1 - specificity

(23)

Recall Precision Curves

Taken from http://scikit-learn.github.io/scikit-learn.org/

I

preferred in information retrieval

I

positives are the documents retrieved in response to a query

I

true positives are

documents really relevant to the query

I

y -axis: precision

I

x -axis: recall

(24)

Performance Regression

Error Measures for Regression

Mean − squared error , MSE = (p

₁

− a

₁

)

²

+ · · · + (p

_n

− a

n

)

²

n

Root mean−squared error , RMSE =

r (p

₁

− a

₁

)

²

+ · · · + (p

n

− a

n

)

²

n

Mean − absolute error , MAE = |p

₁

− a

₁

| + · · · + |p

n

− a

n

|

n

(25)

Error Measures for Regression

Mean − squared error , MSE = (p

₁

− a

₁

)

²

+ · · · + (p

_n

− a

n

)

²

n

Root mean−squared error , RMSE =

r (p

₁

− a

₁

)

²

+ · · · + (p

n

− a

n

)

²

n

Mean − absolute error , MAE = |p

₁

− a

₁

| + · · · + |p

n

− a

n

|

n

(26)

Error Measures for Regression

Mean − squared error , MSE = (p

₁

− a

₁

)

²

+ · · · + (p

_n

− a

n

)

²

n

Root mean−squared error , RMSE =

r (p

₁

− a

₁

)

²

+ · · · + (p

n

− a

n

)

²

n

Mean − absolute error , MAE = |p

₁

− a

₁

| + · · · + |p

n

− a

n

|

n

(27)

Relative Error Measures

Relative − squared error = (p

₁

− a

₁

)

²

+ · · · + (p

_n

− a

_n

)

²

(a

₁

− ¯ a)

²

+ · · · + (a

n

− ¯ a)

²

Root relative − squared error = s

(p

₁

− a

₁

)

²

+ · · · + (p

n

− a

n

)

²

(a

₁

− ¯ a)

²

+ · · · + (a

_n

− ¯ a)

²

Relative − absolute error = |p

1

− a

1

| + · · · + |p

n

− a

n

|

|a

₁

− ¯ a| + · · · + |a

n

− ¯ a|

(28)

Performance

Assessment Strategies

General Problem

I

each algorithm abstracts from observations (instances)

I

the aspects kept and the aspect discarded differ between the learning scheme (inductive bias)

I

this means also: information about individual instances are contained in the model, too

I

individual instance information leads to overfitting

(29)

Performance

Assessment Strategies

Solution Strategies

I

use fresh date, i.e. instances not used for the training

I

for very large numbers of instances: simple split in test and training set

I

most common: 10-fold cross validation

I

Performance Measures in Data Mining

Performance Measures in Data Mining

Common Performance Measures used in Data Mining and Machine Learning Approaches

L. Richter J.M. Cejuela

Master Lab Course Data Mining, SS 2015, Jul 1st

Outline

Item Set and Association Rule Weights Simple Measures

Complex Measures Classification

Basic Performance Measures

Complex Measures – Performance Curves Regression

Assessment Strategies

Measures for Item Sets

Various algorithms can yield frequent item sets. From frequent item sets c and c ∪ {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule).

Support (of an item set):

sup(X → Y ) = sup(Y → X ) = P(X ∧ Y )

how many times the item set is found in the database

Confidence (of a rule): conf (X → Y ) = P(Y |X ) =

P(X and Y )/P(X ) = sup(X → Y )/sup(X )

Measures for Item Sets

Various algorithms can yield frequent item sets. From frequent item sets c and c ∪ {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule).

Support (of an item set):

sup(X → Y ) = sup(Y → X ) = P(X ∧ Y )

how many times the item set is found in the database

Confidence (of a rule): conf (X → Y ) = P(Y |X ) =

P(X and Y )/P(X ) = sup(X → Y )/sup(X )

Measures for Item Sets

Various algorithms can yield frequent item sets. From frequent item sets c and c ∪ {i} you can derive if c then {i}. Typically there is only one item in the RHS (right hand side of the rule).

Support (of an item set):

sup(X → Y ) = sup(Y → X ) = P(X ∧ Y )

how many times the item set is found in the database

Confidence (of a rule): conf (X → Y ) = P(Y |X ) =

P(X and Y )/P(X ) = sup(X → Y )/sup(X )

Measures for Item Sets cont.’d

Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting):

Leverage (of an item set):

lev (X → Y ) = P(X and Y ) − (P(X )P(Y ))

Lift (of a rule): lift(X → Y ) = lift(Y → X ) = P(X ∧ Y )/(P(X )P(Y )) =

conf (X → Y )/sup(Y ) = conf (Y → X )/sup(X )

Conviction (of a rule): Similar to Lift, but directed

Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y.

conviction(X → Y ) = P(X )P(¬Y )/P(X ∧ ¬Y ) =

(1 − sup(Y )/(1 − conf (X → Y ))

Measures for Item Sets cont.’d

Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting):

Leverage (of an item set):

lev (X → Y ) = P(X and Y ) − (P(X )P(Y ))

Lift (of a rule): lift(X → Y ) = lift(Y → X ) = P(X ∧ Y )/(P(X )P(Y )) =

conf (X → Y )/sup(Y ) = conf (Y → X )/sup(X )

Conviction (of a rule): Similar to Lift, but directed

Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y.

conviction(X → Y ) = P(X )P(¬Y )/P(X ∧ ¬Y ) =

(1 − sup(Y )/(1 − conf (X → Y ))

Measures for Item Sets cont.’d

Measures how frequent an item set / how interesting a rule is in comparison to the expected occurrence (interesting):

Leverage (of an item set):

lev (X → Y ) = P(X and Y ) − (P(X )P(Y ))

Lift (of a rule): lift(X → Y ) = lift(Y → X ) = P(X ∧ Y )/(P(X )P(Y )) =

conf (X → Y )/sup(Y ) = conf (Y → X )/sup(X )

Conviction (of a rule): Similar to Lift, but directed

Compares the probability that X appears without Y, if they were independent with the observed frequency of X and Y.

conviction(X → Y ) = P(X )P(¬Y )/P(X ∧ ¬Y ) =

(1 − sup(Y )/(1 − conf (X → Y ))

Supplementary Material

J-Measure

empirically observed accuracy of rule

Cross-entropy (measuring how good a distribution approximates another distribution) between the binary variables φ and θ with vs. without conditioning on event θ

J(θ ⇒ φ) = p(θ)



p(φ|θ)log p(φ|θ)

p(φ) + (1 − p(φ|θ)) log 1 − p(φ|θ) 1 − p(φ)



Basic Building Blocks for Performance Measures

True Positive (TP): positive instances predicted as positive

True Negative (TN): negative instances predicted as negative

False Positive (FP): negative instances predicted as positive

False Negative (FN): positive instances predicted as negative

Confusion Matrix:

Predicted a Predicted b

Real a TP FN

Real b FP TN