Binary Decomposition Methods for Multipartite Ranking


Academic year: 2022


(1)


Eyke Hüllermeier, Philipps-Universität Marburg

Johannes Fürnkranz, TU Darmstadt

Stijn Vanderlooy, Maastricht University

(2)

Outline

• Multipartite Ranking
• Evaluation measures
  - C-Index
  - m-AUC / Jonckheere-Terpstra statistic
• Methods for learning multipartite rankings
  - Transformation into a Single Binary Problem
  - Binary Decompositions: ordered, pairwise
  - Complexity of these approaches
• Experimental Results
• Conclusions

(3)

Binary Classification

[Figure: papers labeled with class 0 or 1]

Example: Reviewers divide papers into Accept / Reject

(4)

Binary Classification with Scores

[Figure: the same papers, labeled 0 or 1 and shaded by score]

Scores (shown as different shades of color) indicate the degree of redness or greenness

(5)

Binary Classification with Scores

[Figure: the papers arranged in order of their scores, with their 0/1 labels]

(6)

Binary Classification with Scores

[Figure: the papers arranged in order of their scores, with their 0/1 labels]

= Bipartite Ranking + Partition

(7)

Ordered Classification

[Figure: papers labeled with the ordered classes 0, 1, 2]

Example: Reviewers divide papers into Accept / Borderline / Reject

(8)

Multipartite Ranking

Example: Reviewers sort papers by quality

(9)

Multipartite Ranking

• Task is essentially the same as in bipartite ranking: rank a set of objects in agreement with their class labels
• Main difference: the training information is
  - not one of two (ordered) classes (binary classification)
  - but one of multiple ordered classes (ordinal classification)
  → we need different evaluation metrics
• Multipartite ranking is also known as
  - layered ranking (Waegeman et al. 2008)
  - k-partite ranking (Rajaram, Agarwal 2005)

(10)

Evaluation of Bipartite Rankings

Area under the ROC curve
• the probability that a randomly chosen positive example is ranked before a randomly chosen negative example

Computation:
for each pair (p,n), where class(p) > class(n):
    correct++ if score(p) > score(n)

$\mathrm{AUC}(P,N) = \dfrac{\text{correct}}{\#P \cdot \#N}$

Available information: binary classification
Prediction: ranking scores
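The pair-counting computation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: `auc` is a hypothetical helper name, and ties are not credited, following the pseudocode (a common convention instead counts a tie as one half).

```python
from itertools import product

def auc(pos_scores, neg_scores):
    """AUC(P, N): fraction of (p, n) pairs with score(p) > score(n).

    Ties earn no credit, exactly as in the slide's pseudocode."""
    correct = sum(1 for sp, sn in product(pos_scores, neg_scores) if sp > sn)
    return correct / (len(pos_scores) * len(neg_scores))

# Usage: three positives, two negatives; only the pair (0.4, 0.5) is misordered
print(auc([0.9, 0.8, 0.4], [0.5, 0.1]))  # 5 of 6 pairs correct
```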

(11)

Evaluation of Multipartite Rankings

C-Index
• the probability that a randomly chosen example of class J is ranked before a randomly chosen example of class I < J

Computation:
for each pair (p,n), where class(p) > class(n):
    correct++ if score(p) > score(n)

$\text{C-Index} = \dfrac{\text{correct}}{\sum_{I<J} \#I \cdot \#J}$

Obviously, AUC is the special case of the C-Index with C = 2 classes.

Available information: ordinal classification
Prediction: ranking scores
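A minimal Python sketch of the C-Index computation, assuming class labels are integers where larger means better. `c_index` is a hypothetical name; the denominator it accumulates is exactly $\sum_{I<J} \#I \cdot \#J$, because pairs within one class are skipped.

```python
from itertools import combinations

def c_index(scores, labels):
    """Fraction of cross-class pairs in which the example of the higher
    class receives the strictly higher score (ties earn no credit)."""
    correct = total = 0
    for (s_a, y_a), (s_b, y_b) in combinations(zip(scores, labels), 2):
        if y_a == y_b:
            continue  # pairs within one class impose no order constraint
        total += 1
        # orient the pair so that s_hi belongs to the higher class
        s_hi, s_lo = (s_a, s_b) if y_a > y_b else (s_b, s_a)
        if s_hi > s_lo:
            correct += 1
    return correct / total

# Usage: 5 cross-class pairs; only (class 2: 0.4, class 3: 0.35) is misordered
print(c_index([0.1, 0.4, 0.35, 0.8], [1, 2, 3, 3]))
```

With only two classes the function returns the same value as the AUC of the higher class against the lower one, which is the special case noted above.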

(12)

Evaluation of Multipartite Rankings

C-Index
• the C-index can be rewritten as a weighted sum of pairwise AUCs:

$\text{C-Index} = \dfrac{1}{\sum_{I<J} \#I \cdot \#J} \sum_{I<J} \#I \cdot \#J \cdot \mathrm{AUC}(I,J)$

Jonckheere-Terpstra statistic
• is an unweighted sum of pairwise AUCs:

$\text{m-AUC} = \dfrac{2}{C(C-1)} \sum_{I<J} \mathrm{AUC}(I,J)$

• equivalent to a well-known multi-class extension of AUC

Note: C-Index and m-AUC can both be optimized by optimizing the pairwise AUCs
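To make the two aggregations concrete, here is a small Python sketch that computes every pairwise AUC(I, J), with the higher class J treated as positive, and then forms the weighted (C-Index) and unweighted (m-AUC) averages. All function names are hypothetical, and C is taken to be the number of classes present in the labels.

```python
from itertools import combinations

def auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties earn nothing."""
    return sum(1 for p in pos for n in neg if p > n) / (len(pos) * len(neg))

def pairwise_aucs(scores, labels):
    """Group scores by class and compute AUC(I, J) for all I < J (J positive)."""
    by_class = {}
    for s, y in zip(scores, labels):
        by_class.setdefault(y, []).append(s)
    classes = sorted(by_class)
    aucs = {(i, j): auc(by_class[j], by_class[i]) for i, j in combinations(classes, 2)}
    return by_class, aucs

def c_index_weighted(scores, labels):
    """C-Index as the #I * #J weighted average of the pairwise AUCs."""
    by_class, aucs = pairwise_aucs(scores, labels)
    weights = {(i, j): len(by_class[i]) * len(by_class[j]) for i, j in aucs}
    return sum(weights[k] * aucs[k] for k in aucs) / sum(weights.values())

def m_auc(scores, labels):
    """m-AUC as the plain average over the C(C-1)/2 pairwise AUCs."""
    by_class, aucs = pairwise_aucs(scores, labels)
    c = len(by_class)
    return 2 / (c * (c - 1)) * sum(aucs.values())

scores, labels = [0.1, 0.4, 0.35, 0.8], [1, 2, 3, 3]
# AUC(1,2) = 1, AUC(1,3) = 1, AUC(2,3) = 0.5 with weights 1, 2, 2
print(c_index_weighted(scores, labels), m_auc(scores, labels))
```

The two measures differ only in the weights, so a pair of small classes counts as much as a pair of large ones under m-AUC but much less under the C-Index.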

(13)

Conventional Approach to Multipartite Ranking

Turn the problem into a single, large binary classification problem (Herbrich, Graepel, Obermayer, 2001)

Goal:
learn a (linear) scoring function such that for all pairs (p,n): score(p) > score(n) iff class(p) > class(n)

Approach:
for each constraint class(p) > class(n), construct a new example x = p - n such that

$\mathrm{score}(x) = \mathrm{score}(p - n) = \mathrm{score}(p) - \mathrm{score}(n) > 0$

(the second equality holds because the scoring function is linear)
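The transformation can be sketched as follows. To keep the example short and dependency-free, a plain perceptron on the difference vectors stands in for the large-margin learner; that swap, and all function names, are assumptions for illustration, not the method of Herbrich et al. The perceptron only looks for some w with w · (p - n) > 0 for every constraint.

```python
from itertools import combinations

def difference_examples(X, y):
    """Build one new example x = p - n for every pair with class(p) > class(n)."""
    diffs = []
    for (a, ya), (b, yb) in combinations(zip(X, y), 2):
        if ya == yb:
            continue  # same class: no order constraint
        p, n = (a, b) if ya > yb else (b, a)
        diffs.append([pi - ni for pi, ni in zip(p, n)])
    return diffs

def score(w, x):
    """Linear scoring function score(x) = w . x (no bias term)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_pairwise_perceptron(X, y, epochs=100):
    """Seek w with score(p - n) > 0 for all difference examples.

    No bias is needed: score(p) - score(n) would cancel it. The loop stops
    once every constraint holds, which is guaranteed only for separable data."""
    diffs = difference_examples(X, y)
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        mistakes = 0
        for d in diffs:
            if score(w, d) <= 0:
                w = [wi + di for wi, di in zip(w, d)]
                mistakes += 1
        if mistakes == 0:
            break
    return w

X = [[0.0, 1.0], [0.5, 0.0], [2.0, 0.5], [3.0, 0.2]]
y = [1, 1, 2, 3]
w = train_pairwise_perceptron(X, y)
print(len(difference_examples(X, y)), w)  # 5 constraints, w = [2.0, -0.5]
```

Note that the number of difference examples grows with the number of cross-class pairs, which is what makes this single binary problem large.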

(14)

Binary Decomposition Methods

Solve a classification / ranking problem by decomposing it into a set of binary classification problems

• Ordered (F&H)
  - learn one theory for each possible split point in the order of the classes:
    {1} vs. {2,3,4}
    {1,2} vs. {3,4}
    {1,2,3} vs. {4}
  - C-1 theories, each using all examples
  - proposed for ordinal classification
• Pairwise (LPC)
  - learn one theory for each pair of classes (as for unordered classification):
    {1} vs. {2}, {1} vs. {3}, {1} vs. {4}
    {2} vs. {3}, {2} vs. {4}, {3} vs. {4}
  - C(C-1)/2 theories, each using only some examples
  - is no worse for ordered classification
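Both decompositions can be enumerated mechanically. In the following sketch, a task is a pair (lower classes, higher classes), and `training_set` keeps only the examples of the classes a task involves, labeling the higher group 1. The function names and this labeling convention are assumptions made for illustration.

```python
from itertools import combinations

def ordered_tasks(classes):
    """Ordered (F&H): one task per split point, {1..I} vs. {I+1..C}; C-1 tasks."""
    return [(classes[:i], classes[i:]) for i in range(1, len(classes))]

def pairwise_tasks(classes):
    """Pairwise (LPC): one task per pair of classes; C(C-1)/2 tasks."""
    return [([a], [b]) for a, b in combinations(classes, 2)]

def training_set(X, y, task):
    """Binary training data for one task: drop examples of uninvolved classes,
    label the higher group 1 and the lower group 0."""
    low, high = task
    return [(x, 1 if c in high else 0) for x, c in zip(X, y) if c in low or c in high]

classes = [1, 2, 3, 4]
for task in ordered_tasks(classes):
    print(task)  # each ordered task uses all examples
print(len(pairwise_tasks(classes)))  # 6 tasks, each on a subset of the examples
```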

(15)

Complexity

Assume we have a 2-class learner with complexity $O(N)$ in the number $N$ of training examples

• single binary problem
  trains 1 model with $N^2$ order constraints (one constraint for each pair of examples) → $O(N^2)$
• ordered (F&H)
  trains C-1 models with N training examples each → $O(C \cdot N)$
• pairwise (LPC)
  with an equal class distribution (worst case), trains C(C-1)/2 models with $2N/C$ training examples each → $O((C-1) \cdot N)$
  - the same as the ordered decomposition for linear-time learners
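The counts behind these bounds can be checked with a short Python sketch, assuming the learner's cost is proportional to the number of training instances it receives. `training_sizes` is a hypothetical helper; for the single binary problem it counts only cross-class pairs, which is at most $N^2$.

```python
from itertools import combinations

def training_sizes(class_sizes):
    """Total training instances handed to the binary learner by each approach."""
    n = sum(class_sizes)
    c = len(class_sizes)
    pairs = list(combinations(class_sizes, 2))
    single = sum(a * b for a, b in pairs)    # one constraint per cross-class pair
    ordered = (c - 1) * n                    # C-1 models, all N examples each
    pairwise = sum(a + b for a, b in pairs)  # each model sees only its two classes
    return single, ordered, pairwise

print(training_sizes([20] * 5))      # N = 100, C = 5, equal class sizes
print(training_sizes([10, 30, 60]))  # N = 100, C = 3, skewed class sizes
```

Since every class takes part in exactly C-1 pairs, the pairwise total is $(C-1) \cdot N$ regardless of the class distribution; the distribution only matters once the learner is superlinear in N.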

(16)

Prediction with F&H

• Prediction of binary models
  we have C-1 models $M_I$
  for an example x, each model $M_I$ predicts $p_I(x)$
• Computing a prediction
  - derive estimates for the probabilities P(class(x) = I)
  - straightforward, but not our concern
• Computing a score: $\mathrm{score}_{FH}(x) = \sum_I p_I(x)$
  - intuitive justification:
    high classes have a high probability in all $p_I$
    low classes have a low probability in all $p_I$
    medium classes have high probabilities in the low $p_I$ and low probabilities in the high $p_I$
  - theoretical justification: $p_I(x) = P(\mathrm{class}(x) > I)$

[Figure: the predicted probabilities $p_1$, $p_2$, $p_3$]
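A minimal sketch of both uses of the F&H outputs, assuming classes 1..C and estimates $p_I(x) = P(\mathrm{class}(x) > I)$. Because $\sum_{I=1}^{C-1} P(\mathrm{class}(x) > I) = E[\mathrm{class}(x)] - 1$, the score is the expected class shifted by one whenever the estimates are exact. The class-probability step follows the usual differencing of cumulative probabilities; the input values are hypothetical.

```python
def fh_score(p):
    """score_FH(x) = p_1(x) + ... + p_{C-1}(x)."""
    return sum(p)

def fh_class_probs(p):
    """P(class(x) = I), I = 1..C, from p_I(x) = P(class(x) > I).

    Uses P(class = I) = P(class > I-1) - P(class > I), with P(class > 0) = 1
    and P(class > C) = 0. Independently trained models need not be monotone,
    so some differences may come out negative."""
    bounds = [1.0] + list(p) + [0.0]
    return [bounds[i] - bounds[i + 1] for i in range(len(p) + 1)]

p = [0.9, 0.6, 0.2]       # hypothetical outputs of M_1, M_2, M_3 (C = 4)
probs = fh_class_probs(p)  # approximately [0.1, 0.3, 0.4, 0.2]
expected = sum(i * q for i, q in enumerate(probs, start=1))
print(fh_score(p), expected - 1)  # both approximately 1.7
```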

(17)

Prediction with LPC

• Prediction of binary models
  we have C(C-1)/2 models $M_{I,J}$ (I < J)
  for an example x, each model predicts $p_{I,J}(x)$, with $p_{J,I}(x) = 1 - p_{I,J}(x)$
• Computing a prediction
  - weighted voting: predict $\arg\max_I \mathrm{score}_I(x)$, where $\mathrm{score}_I(x) = \sum_J p_{I,J}(x)$
  - can be shown to minimize the expected Spearman rank correlation loss
• Computing a score: $\mathrm{score}_{LPC\text{-}U}(x) = \sum_{I<J} p_{I,J}(x)$
  - intuitive justification: sum up all predictions "in favor of a higher class"
    examples with low classes will get low probabilities in the models for high classes
    examples with high classes will get high probabilities in the models for high classes
  - motivation: $p_{I,J}(x) = P(\mathrm{class}(x) = I \mid I \lor J)$

[Figure: pairwise predictions for classes 1, 2, 3, 4]
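Weighted voting can be sketched as below, taking $p_{I,J}(x)$ as the probability of class I given I or J, as in the motivation above. The input dictionary holds the upper-triangle outputs for one example, and the lower triangle is filled in with $p_{J,I}(x) = 1 - p_{I,J}(x)$. Function names and input values are hypothetical.

```python
def pairwise_matrix(p_upper, c):
    """Table P[I][J] = p_{I,J}(x) for classes 1..C (index 0 unused).

    p_upper maps (I, J) with I < J to p_{I,J}(x); the lower triangle follows
    from p_{J,I}(x) = 1 - p_{I,J}(x)."""
    P = [[0.0] * (c + 1) for _ in range(c + 1)]
    for (i, j), v in p_upper.items():
        P[i][j] = v
        P[j][i] = 1.0 - v
    return P

def lpc_vote(p_upper, c):
    """Predict argmax_I score_I(x), with score_I(x) = sum over J != I of p_{I,J}(x)."""
    P = pairwise_matrix(p_upper, c)
    votes = {i: sum(P[i][j] for j in range(1, c + 1) if j != i) for i in range(1, c + 1)}
    return max(votes, key=votes.get), votes

# hypothetical outputs of the three pairwise models for one example x (C = 3)
prediction, votes = lpc_vote({(1, 2): 0.2, (1, 3): 0.1, (2, 3): 0.4}, 3)
print(prediction, votes)  # class 3 collects the largest vote
```

Because each pair distributes exactly one unit of vote between its two classes, the votes always sum to C(C-1)/2.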

(18)

Prediction with LPC

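Actually, we tested three variants:

• $\mathrm{score}_{LPC\text{-}U}(x) = \sum_{I<J} p_{I,J}(x)$
  - unweighted sum of the $p_{I,J}$
  - motivated by m-AUC
• $\mathrm{score}_{LPC\text{-}W}(x) = \dfrac{1}{N^2} \sum_{I<J} \#I \cdot \#J \cdot p_{I,J}(x)$
  - each $p_{I,J}$ is weighted by the relative frequencies $\#I/N$ and $\#J/N$ of its classes
  - motivated by the C-index
• $\mathrm{score}_{LPC\text{-}A}(x) = \dfrac{1}{N} \sum_{I<J} (\#I + \#J) \cdot p_{I,J}(x)$
  - each $p_{I,J}$ is weighted by the combined relative frequency $(\#I + \#J)/N$ of its two classes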

(19)

Experimental setup

• Datasets
  - 21 discretized regression datasets, 5 classes each, using equal-frequency binning
  - 4 real ordered classification datasets
• Evaluation procedure
  - average of 5 iterations of 10-fold cross-validation on each dataset
• Linear base classifiers
  - all binary models are trained with logistic regression
  - Rank-SVM is used with a linear kernel (default configuration)
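Equal-frequency discretization of a numeric target into ordered classes can be sketched by rank, as below. This is an assumption about how the binning might be done, not the procedure used for the experiments: `equal_frequency_classes` is a hypothetical name, and tied target values are split by their original position (Python's `sorted` is stable).

```python
def equal_frequency_classes(targets, k=5):
    """Map a numeric target to ordered classes 1..k of (nearly) equal size.

    Examples are ranked by target value; rank r goes to class r * k // n + 1."""
    n = len(targets)
    order = sorted(range(n), key=lambda i: targets[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = rank * k // n + 1
    return labels

targets = [3.2, 1.0, 5.5, 2.1, 4.4, 0.3, 9.9, 7.7, 6.1, 8.0]
print(equal_frequency_classes(targets))  # two examples per class
```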

(20)

Results for Multipartite Ranking

F&H is significantly better than all LPC variants and Rank-SVM

(Nemenyi test, critical rank difference = 0.88)

(21)

Results for Classification

LPC is somewhat better than F&H, certainly no worse (wins/ties/losses = 15/1/9)

(22)

Discussion of Results

• Pairwise (LPC) performs worse than ordered (F&H)
  - this contradicts our expectations (and our results) from classification
• Possible reason: the non-competence problem
  - classifiers have to predict even if the example belongs to neither of their two classes
  - in those cases, the probabilities are estimated in regions in which we have no training examples
  - most probabilities in the score are such "incompetent" probabilities
• Example: assume 5 classes, and example x is from class 1

$\mathrm{score}_{LPC\text{-}U}(x) = p_{1,2} + p_{1,3} + p_{1,4} + p_{2,3} + p_{2,4} + p_{3,4} + p_{1,5} + p_{2,5} + p_{3,5} + p_{4,5}$

  only the 4 terms $p_{1,J}$ come from classifiers trained on class 1; the other 6 are incompetent for x
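The share of competent terms can be counted directly: a model $M_{I,J}$ has seen training examples of x's class only if that class is I or J. The sketch below (hypothetical helper name) splits the C(C-1)/2 pairs accordingly. Exactly C-1 pairs are competent, so their share is $2/C$ and shrinks as the number of classes grows.

```python
from itertools import combinations

def competence_split(true_class, c):
    """Split the pairwise models (I, J), 1 <= I < J <= C, into those trained
    on x's true class (competent) and all others (incompetent)."""
    pairs = list(combinations(range(1, c + 1), 2))
    competent = [pr for pr in pairs if true_class in pr]
    incompetent = [pr for pr in pairs if true_class not in pr]
    return competent, incompetent

competent, incompetent = competence_split(1, 5)
print(len(competent), len(incompetent))  # 4 competent, 6 incompetent
```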

(23)

Conclusions

• Binary decompositions vs. single optimization
  - binary decompositions are much more efficient: essentially, for constraints between two groups of p and n examples, single optimization generates p·n constraints / examples, whereas a binary decomposition trains a classifier with p + n examples
  - predictive performance is no worse: F&H outperformed Rank-SVM
• Ordered decompositions outperform pairwise decompositions
  - this is surprising because
    ordered decomposition was no better on ordered classification tasks
    the evaluation metrics are pairwise
  - but it may be explained by the non-competence problem

