Performance
Performance Measures in Data Mining
Common Performance Measures used in Data Mining and Machine Learning Approaches
L. Richter, J.M. Cejuela
Department of Computer Science, Technische Universität München
Master Lab Course Data Mining, SS 2015, Jul 1st
Outline
Item Set and Association Rule Weights
  Simple Measures
  Complex Measures
Classification
  Basic Performance Measures
  Complex Measures – Performance Curves
Regression
Assessment Strategies
Item Set and Association Rule Weights – Simple Measures
Measures for Item Sets
Various algorithms yield frequent item sets. From two frequent item sets c and c ∪ {i} you can derive the rule "if c then {i}". Typically the RHS (right-hand side) of a rule contains only one item.
- Support (of an item set): how often the item set is found in the database, as a fraction of all transactions
  $\mathrm{sup}(X \to Y) = \mathrm{sup}(Y \to X) = P(X \wedge Y)$
- Confidence (of a rule):
  $\mathrm{conf}(X \to Y) = P(Y \mid X) = \frac{P(X \wedge Y)}{P(X)} = \frac{\mathrm{sup}(X \to Y)}{\mathrm{sup}(X)}$
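The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not a frequent item set miner: the toy transaction database and the function names `support` and `confidence` are assumptions made for this example.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """conf(lhs -> rhs) = sup(lhs ∪ rhs) / sup(lhs); assumes sup(lhs) > 0."""
    both = frozenset(lhs) | frozenset(rhs)
    return support(both, transactions) / support(lhs, transactions)


# Hypothetical toy database: each transaction is a set of items.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# {bread, milk} occurs in 2 of 4 transactions.
sup_rule = support({"bread", "milk"}, transactions)
# Of the 3 transactions containing bread, 2 also contain milk.
conf_rule = confidence({"bread"}, {"milk"}, transactions)
print(sup_rule, conf_rule)
```

Support is symmetric in X and Y, while confidence depends on which side is the LHS.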
Item Set and Association Rule Weights – Complex Measures
Measures for Item Sets cont’d
These measures state how frequent an item set is, or how interesting a rule is, compared to the occurrence expected if X and Y were independent:
- Leverage (of an item set):
  $\mathrm{lev}(X \to Y) = P(X \wedge Y) - P(X)P(Y)$
- Lift (of a rule):
  $\mathrm{lift}(X \to Y) = \mathrm{lift}(Y \to X) = \frac{P(X \wedge Y)}{P(X)P(Y)} = \frac{\mathrm{conf}(X \to Y)}{\mathrm{sup}(Y)} = \frac{\mathrm{conf}(Y \to X)}{\mathrm{sup}(X)}$
- Conviction (of a rule): similar to lift, but directed. It compares the probability that X appears without Y if they were independent with the observed frequency of X appearing without Y.
  $\mathrm{conviction}(X \to Y) = \frac{P(X)P(\neg Y)}{P(X \wedge \neg Y)} = \frac{1 - \mathrm{sup}(Y)}{1 - \mathrm{conf}(X \to Y)}$
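As a rough sketch, the three measures can be computed directly from P(X), P(Y) and P(X ∧ Y). The function names and the example probabilities are assumptions chosen for illustration; returning infinity for a rule with confidence 1 is a convention picked here, not something the slides specify.

```python
import math


def leverage(p_x, p_y, p_xy):
    """Observed joint probability minus the one expected under independence."""
    return p_xy - p_x * p_y


def lift(p_x, p_y, p_xy):
    """Ratio of observed joint probability to the one expected under independence."""
    return p_xy / (p_x * p_y)


def conviction(p_x, p_y, p_xy):
    """(1 - sup(Y)) / (1 - conf(X -> Y)); infinite if the rule never fails."""
    conf = p_xy / p_x
    if conf == 1.0:
        return math.inf  # assumed convention for rules without counterexamples
    return (1.0 - p_y) / (1.0 - conf)


# Hypothetical probabilities: P(X) = 0.75, P(Y) = 0.75, P(X and Y) = 0.5.
p_x, p_y, p_xy = 0.75, 0.75, 0.5
lev = leverage(p_x, p_y, p_xy)    # 0.5 - 0.5625 = -0.0625
lft = lift(p_x, p_y, p_xy)        # 0.5 / 0.5625, below 1
conv = conviction(p_x, p_y, p_xy) # 0.25 / (1 - 2/3) = 0.75
print(lev, lft, conv)
```

In this example X and Y occur together less often than independence would predict, so leverage is negative and both lift and conviction fall below 1.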
Supplementary Material
J-Measure
- empirically observed accuracy of the rule
- Cross-entropy (measuring how well one distribution approximates another) of the binary variable φ with vs. without conditioning on the event θ
  $J(\theta \Rightarrow \varphi) = p(\theta) \left[ p(\varphi \mid \theta) \log \frac{p(\varphi \mid \theta)}{p(\varphi)} + (1 - p(\varphi \mid \theta)) \log \frac{1 - p(\varphi \mid \theta)}{1 - p(\varphi)} \right]$
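A minimal sketch of the J-measure formula follows. The slide does not state the base of the logarithm; base 2 (bits) is an assumption here, as are the function name `j_measure` and the convention 0 · log 0 = 0. The prior p(φ) must lie strictly between 0 and 1.

```python
import math


def j_measure(p_theta, p_phi, p_phi_given_theta):
    """J(theta => phi) for a binary phi, in bits (log base 2 is an assumption)."""

    def term(posterior, prior):
        # Convention: 0 * log(0 / prior) is taken to be 0.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    return p_theta * (
        term(p_phi_given_theta, p_phi)
        + term(1.0 - p_phi_given_theta, 1.0 - p_phi)
    )


# If theta tells us nothing about phi, the J-measure is zero.
j_uninformative = j_measure(p_theta=0.5, p_phi=0.5, p_phi_given_theta=0.5)
# If theta makes phi certain and holds half of the time: 0.5 * log2(1 / 0.5).
j_certain = j_measure(p_theta=0.5, p_phi=0.5, p_phi_given_theta=1.0)
print(j_uninformative, j_certain)
```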
Classification
Basic Performance Measures
Basic Building Blocks for Performance Measures
- True Positive (TP): positive instances predicted as positive
- True Negative (TN): negative instances predicted as negative
- False Positive (FP): negative instances predicted as positive
- False Negative (FN): positive instances predicted as negative
- Confusion Matrix (class a is the positive class):

           Predicted a   Predicted b
  Real a   TP            FN
  Real b   FP            TN
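The four building blocks can be counted from paired lists of real and predicted labels. The sketch below assumes a two-class problem; the function name `confusion_counts` and the example labels are made up for illustration.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Return (TP, FN, FP, TN) with `positive` as the positive class."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["TP" if predicted == positive else "FN"] += 1
        else:
            counts["FP" if predicted == positive else "TN"] += 1
    return counts["TP"], counts["FN"], counts["FP"], counts["TN"]


# Hypothetical labels; class "a" is treated as the positive class.
y_true = ["a", "a", "a", "b", "b", "b", "b", "a"]
y_pred = ["a", "b", "a", "b", "a", "b", "b", "a"]

tp, fn, fp, tn = confusion_counts(y_true, y_pred, positive="a")
print(tp, fn, fp, tn)  # first row of the matrix is (TP, FN), second is (FP, TN)
```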
Performance Measures
Accuracy: $\mathrm{acc} = \frac{TP + TN}{TP + FN + FP + TN} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}$
Error rate: $\mathrm{err} = \frac{FN + FP}{TP + FN + FP + TN} = \frac{\text{number of wrong predictions}}{\text{total number of predictions}} = 1 - \mathrm{acc}$
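A short sketch of both formulas, using hypothetical counts, shows that accuracy and error rate always sum to 1. The function names are assumptions for this example.

```python
def accuracy(tp, fn, fp, tn):
    """Share of correct predictions among all predictions."""
    return (tp + tn) / (tp + fn + fp + tn)


def error_rate(tp, fn, fp, tn):
    """Share of wrong predictions among all predictions."""
    return (fn + fp) / (tp + fn + fp + tn)


# Hypothetical confusion-matrix counts.
tp, fn, fp, tn = 40, 10, 5, 45
acc = accuracy(tp, fn, fp, tn)    # 85 correct out of 100
err = error_rate(tp, fn, fp, tn)  # 15 wrong out of 100
print(acc, err)
```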
Performance Measures cont’d
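True Positive Rate (TPR, Sensitivity): $\mathrm{TPR} = \frac{TP}{TP + FN}$
True Negative Rate (TNR, Specificity): $\mathrm{TNR} = \frac{TN}{TN + FP}$
False Positive Rate (FPR): $\mathrm{FPR} = \frac{FP}{TN + FP}$
False Negative Rate (FNR): $\mathrm{FNR} = \frac{FN}{TP + FN}$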
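The four rates can be sketched as one small function. The name `rates` and the counts are assumptions for illustration; the code assumes that both classes occur at least once, so no denominator is zero.

```python
def rates(tp, fn, fp, tn):
    """Per-class rates; TPR/FNR are relative to real positives, TNR/FPR to real negatives."""
    return {
        "TPR": tp / (tp + fn),  # sensitivity
        "TNR": tn / (tn + fp),  # specificity
        "FPR": fp / (tn + fp),
        "FNR": fn / (tp + fn),
    }


# Hypothetical confusion-matrix counts.
r = rates(tp=40, fn=10, fp=5, tn=45)
print(r)
```

TPR and FNR share the same denominator and sum to 1, as do TNR and FPR.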
Performance Measures cont’d
Precision: $p = \frac{TP}{TP + FP}$
Recall: $r = \frac{TP}{TP + FN}$
$F_1$ measure: $F_1 = \frac{2rp}{r + p} = \frac{2 \times TP}{2 \times TP + FP + FN}$
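The following sketch computes precision, recall and F1 from hypothetical counts and checks that both forms of the F1 formula agree. The function name is an assumption; the code assumes TP > 0 so that no denominator is zero.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1); assumes tp > 0."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * r * p / (r + p)  # harmonic mean of recall and precision
    return p, r, f1


# Hypothetical confusion-matrix counts (TN is not needed for these measures).
tp, fp, fn = 40, 5, 10
p, r, f1 = precision_recall_f1(tp, fp, fn)
f1_from_counts = 2 * tp / (2 * tp + fp + fn)  # second form of the formula
print(p, r, f1, f1_from_counts)
```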
Complex Measures – Performance Curves
Performance Curves
- the "costs" of the different error types differ
- prediction behaviour changes over the test set
- performance is displayed in 2D
- different domains prefer different chart types
Lift Charts
Figure taken from http://www.dmg.org/rfc/pmml-3.2/ModelExplanation.html
- originates in marketing, where it is used to evaluate mailing success
- y-axis: number or percentage of responders
- x-axis: size of the sample
- red diagonal: random lift
- green line: optimum lift
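The points of such a chart can be sketched as follows, assuming the chart plots the share of responders reached against the share of the sample contacted, with instances ranked by predicted score. This reading of the axes, the function name `lift_chart_points` and the example data are assumptions for illustration.

```python
def lift_chart_points(scores, responded):
    """Return (share of sample contacted, share of responders reached) pairs.

    Instances are contacted in order of decreasing predicted score.
    `responded` holds 1 for a responder and 0 otherwise.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_responders = sum(responded)
    points = [(0.0, 0.0)]
    found = 0
    for rank, i in enumerate(order, start=1):
        found += responded[i]
        points.append((rank / len(scores), found / total_responders))
    return points


# Hypothetical model scores and observed responses.
scores = [0.9, 0.8, 0.6, 0.4, 0.2]
responded = [1, 1, 0, 1, 0]
points = lift_chart_points(scores, responded)
print(points)
```

A random ordering would on average follow the diagonal; here 40% of the sample already covers two of the three responders.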
ROC Curves
Figure taken from http://en.wikipedia.org/wiki/File:Roccurves.png
- Receiver Operating Characteristic
- y-axis: TPR
- x-axis: FPR
- diagonal: random guessing (TPR = FPR)
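A ROC curve can be traced by lowering the decision threshold one instance at a time and recording (FPR, TPR) after each step. The sketch below assumes distinct scores and at least one instance of each class; the function name `roc_points` and the data are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points; labels are 1 (positive) or 0 (negative).

    Assumes distinct scores and at least one positive and one negative.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for i in order:
        if labels[i] == 1:
            tp += 1  # step up
        else:
            fp += 1  # step right
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [1, 0, 1, 0]
roc = roc_points(scores, labels)
print(roc)
```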
Sensitivity vs. Specificity
Figure taken from http://i.stack.imgur.com/fnUd2.png
- preferred in medicine
- y-axis: TPR (sensitivity)
- x-axis: TNR (specificity)
- also frequently drawn as a ROC curve with 1 − specificity (= FPR) on the x-axis
Recall Precision Curves
Figure taken from http://scikit-learn.github.io/scikit-learn.org/
- preferred in information retrieval
- positives are the documents retrieved in response to a query
- true positives are the documents really relevant to the query
- y-axis: precision
- x-axis: recall
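In the retrieval setting, one point of the curve is obtained for each cut-off k in a ranked result list. The sketch below assumes a ranking by score with at least one relevant document; the function name `pr_points` and the data are hypothetical.

```python
def pr_points(scores, relevant):
    """Return (recall, precision) after retrieving the top k documents, k = 1..n.

    `relevant` holds 1 for a relevant document and 0 otherwise.
    """
    total_relevant = sum(relevant)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    points = []
    for k, i in enumerate(order, start=1):
        tp += relevant[i]  # relevant documents among the k retrieved ones
        points.append((tp / total_relevant, tp / k))
    return points


# Hypothetical ranking scores and relevance judgements.
scores = [0.9, 0.8, 0.7, 0.6]
relevant = [1, 0, 1, 0]
pr = pr_points(scores, relevant)
print(pr)
```

Recall can only grow as more documents are retrieved, while precision may go up or down.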
Regression
Error Measures for Regression
Mean-squared error: $\mathrm{MSE} = \frac{(p_1 - a_1)^2 + \cdots + (p_n - a_n)^2}{n}$
Root mean-squared error: $\mathrm{RMSE} = \sqrt{\frac{(p_1 - a_1)^2 + \cdots + (p_n - a_n)^2}{n}}$
Mean-absolute error: $\mathrm{MAE} = \frac{|p_1 - a_1| + \cdots + |p_n - a_n|}{n}$
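The three error measures can be sketched as below. The slide does not define the symbols; the code assumes that p_i are the predicted and a_i the actual values, and the function names and data are chosen for illustration.

```python
import math


def mse(predicted, actual):
    """Mean of the squared differences between predicted and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Square root of the MSE; same unit as the target variable."""
    return math.sqrt(mse(predicted, actual))


def mae(predicted, actual):
    """Mean of the absolute differences; less sensitive to single large errors."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions and actual values; the errors are 1, 0, -2, 0.
predicted = [2.0, 4.0, 6.0, 8.0]
actual = [1.0, 4.0, 8.0, 8.0]
print(mse(predicted, actual), rmse(predicted, actual), mae(predicted, actual))
```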
Relative Error Measures
Relative squared error: $\frac{(p_1 - a_1)^2 + \cdots + (p_n - a_n)^2}{(a_1 - \bar{a})^2 + \cdots + (a_n - \bar{a})^2}$
Root relative squared error: $\sqrt{\frac{(p_1 - a_1)^2 + \cdots + (p_n - a_n)^2}{(a_1 - \bar{a})^2 + \cdots + (a_n - \bar{a})^2}}$
Relative absolute error: $\frac{|p_1 - a_1| + \cdots + |p_n - a_n|}{|a_1 - \bar{a}| + \cdots + |a_n - \bar{a}|}$
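The relative measures divide the error of the model by the error of a baseline that always predicts the mean of the actual values. A minimal sketch, assuming p_i are the predicted values, a_i the actual values and that the actual values are not all identical; the function names and data are hypothetical.

```python
import math


def relative_squared_error(predicted, actual):
    """Squared error of the model relative to always predicting the mean."""
    mean_a = sum(actual) / len(actual)
    model = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    baseline = sum((a - mean_a) ** 2 for a in actual)
    return model / baseline


def root_relative_squared_error(predicted, actual):
    return math.sqrt(relative_squared_error(predicted, actual))


def relative_absolute_error(predicted, actual):
    """Absolute error of the model relative to always predicting the mean."""
    mean_a = sum(actual) / len(actual)
    model = sum(abs(p - a) for p, a in zip(predicted, actual))
    baseline = sum(abs(a - mean_a) for a in actual)
    return model / baseline


# Hypothetical data: the mean of `actual` is 2.0.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(relative_squared_error(predicted, actual))   # 1 / 2
print(relative_absolute_error(predicted, actual))  # 1 / 2
```

A value of 1 means the model is no better than predicting the mean; values below 1 indicate an improvement over that baseline.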
Assessment Strategies
General Problem
- each algorithm abstracts from the observations (instances)
- which aspects are kept and which are discarded differs between learning schemes (inductive bias)
- this also means that information about individual instances is contained in the model
- information about individual instances leads to overfitting
Solution Strategies
- use fresh data, i.e. instances not used for training
- for very large numbers of instances: a simple split into training and test set
- most common: 10-fold cross-validation
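The splitting step of k-fold cross-validation can be sketched without any library. The function name `k_fold_indices`, the shuffling with a fixed seed and the round-robin fold assignment are assumptions made for this example; in practice each training part would be used to fit a model and each test part to evaluate it.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation over n instances."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set with 25 instances.
splits = list(k_fold_indices(25, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

Every instance is used for testing exactly once and never appears in the training part of the fold it is tested in.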