Binary Decomposition Methods for Multipartite Ranking
Eyke Hüllermeier, Philipps-Universität Marburg
Johannes Fürnkranz, TU Darmstadt
Stijn Vanderlooy, Maastricht University
Outline
Multipartite Ranking
Evaluation measures
C-Index
m-AUC / Jonckheere-Terpstra statistic
Methods for learning multipartite rankings
Transformation into a Single Binary Problem
Binary Decompositions
ordered
pairwise
Complexity of these approaches
Experimental Results
Conclusions
Binary Classification
Example: Reviewers divide papers into Accept / Reject
Binary Classification with Scores
Scores (shown in different shades of colors)
indicate the degree of redness or greenness
Binary classification with scores = bipartite ranking + partition
Ordered Classification
Example: Reviewers divide papers into Accept / Borderline / Reject
Multipartite Ranking
Example: Reviewers sort papers by quality
Multipartite Ranking
Task is essentially the same as in bipartite ranking:
Rank a set of objects in
agreement with their class labels
Main Difference:
Training information is
not one of two (ordered) classes (binary classification)
but one of multiple ordered classes (ordinal classification)
→ we need different evaluation metrics
Multipartite Ranking is also known as
layered ranking
(Waegeman et al. 2008)
k-partite ranking
(Rajaram, Agarwal 2005)
Evaluation of Bipartite Rankings
Area under the ROC curve
the probability that a randomly chosen positive example is ranked before a randomly chosen negative example.
Computation:
for each pair (p,n), where class(p) > class(n)
correct++ if score(p) > score(n)
$\mathrm{AUC}(P, N) = \frac{\text{correct}}{\#P \cdot \#N}$
Available information: binary classification
Prediction: ranking scores
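The pair-counting computation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `auc` and the toy data are hypothetical, and ties are counted as incorrect because the slide only counts pairs with score(p) > score(n).

```python
from itertools import product

def auc(scores, labels):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # correct++ for every pair (p, n) with score(p) > score(n); ties count as wrong
    correct = sum(1 for p, n in product(pos, neg) if p > n)
    return correct / (len(pos) * len(neg))

# Toy data (hypothetical): three positives, two negatives
scores = [0.9, 0.8, 0.4, 0.7, 0.2]
labels = [1, 1, 1, 0, 0]
print(auc(scores, labels))  # 5 of the 6 pairs are ordered correctly
```

The quadratic loop mirrors the definition directly; for large data sets a sort-based computation would be the usual choice.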
Evaluation of Multipartite Rankings
C-Index
the probability that a randomly chosen example of class J is ranked before a randomly chosen example of class I < J.
Computation:
for each pair (p,n), where class(p) > class(n)
correct++ if score(p) > score(n)
$\text{C-Index} = \frac{\text{correct}}{\sum_{I<J} \#I \cdot \#J}$
Obviously, AUC is a special case of C-Index with C = 2.
Available information: ordinal classification
Prediction: ranking scores
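The same pair counting extends to more than two classes. The sketch below is an illustration under assumptions: the name `c_index`, the integer class labels, and the toy data are hypothetical, and ties are again counted as incorrect.

```python
def c_index(scores, labels):
    """Fraction of correctly ordered pairs among all pairs from different classes."""
    correct = 0
    total = 0
    for s_hi, y_hi in zip(scores, labels):
        for s_lo, y_lo in zip(scores, labels):
            if y_hi > y_lo:          # one constraint per pair with class(p) > class(n)
                total += 1
                if s_hi > s_lo:      # the higher class must receive the higher score
                    correct += 1
    return correct / total

# Toy data (hypothetical): three ordered classes, two examples each
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
labels = [1, 1, 2, 2, 3, 3]
print(c_index(scores, labels))  # 10 of the 12 cross-class pairs are ordered correctly
```

With only two classes, `total` equals #P · #N and the function reduces to the AUC.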
Evaluation of Multipartite Rankings
C-Index
the C-index can be rewritten as a weighted sum of pairwise AUCs:
$\text{C-Index} = \frac{1}{\sum_{I<J} \#I \cdot \#J} \sum_{I<J} \#I \cdot \#J \cdot \mathrm{AUC}(I, J)$
Jonckheere-Terpstra statistic
is an unweighted sum of pairwise AUCs:
$\text{m-AUC} = \frac{2}{C \cdot (C-1)} \sum_{I<J} \mathrm{AUC}(I, J)$
equivalent to well-known multi-class extension of AUC
Note: C-Index and m-AUC can be optimized by optimization of pairwise AUCs
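To make the difference between the two measures concrete, the sketch below computes both from the same pairwise AUCs. All names (`pairwise_auc`, `c_index_from_aucs`, `m_auc`) and the toy data are hypothetical; the only assumption beyond the formulas is that ties count as incorrect.

```python
from itertools import combinations

def pairwise_auc(scores, labels, lo, hi):
    """AUC restricted to the examples of classes lo < hi."""
    low = [s for s, y in zip(scores, labels) if y == lo]
    high = [s for s, y in zip(scores, labels) if y == hi]
    correct = sum(1 for h in high for l in low if h > l)
    return correct / (len(high) * len(low))

def c_index_from_aucs(scores, labels):
    """Pairwise AUCs weighted by the number of example pairs #I * #J."""
    num = den = 0.0
    for lo, hi in combinations(sorted(set(labels)), 2):
        weight = labels.count(lo) * labels.count(hi)
        num += weight * pairwise_auc(scores, labels, lo, hi)
        den += weight
    return num / den

def m_auc(scores, labels):
    """Unweighted mean of the C(C-1)/2 pairwise AUCs."""
    pairs = list(combinations(sorted(set(labels)), 2))
    return sum(pairwise_auc(scores, labels, lo, hi) for lo, hi in pairs) / len(pairs)

# Toy data (hypothetical) with unequal class sizes, so the two measures differ
scores = [0.1, 0.4, 0.35, 0.8, 0.9]
labels = [1, 1, 2, 2, 3]
print(c_index_from_aucs(scores, labels), m_auc(scores, labels))
```

Here the pair of classes 1 and 2 contributes four example pairs and therefore dominates the C-index, while the m-AUC gives every class pair the same weight.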
Conventional Approach to Multipartite Ranking
Turn the problem into a single, large binary classification problem
(Herbrich, Graepel, Obermayer, 2001)
Goal:
learn a (linear) scoring function, s.t. for all pairs (p,n) score(p) > score(n) iff class(p) > class(n)
Approach:
for each constraint class(p) > class(n)
construct a new example x = p − n such that
$\mathrm{score}(x) = \mathrm{score}(p - n) = \mathrm{score}(p) - \mathrm{score}(n) > 0$
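The construction of difference examples can be sketched as follows. This is only an illustration of the transformation step, not the training procedure of Herbrich et al.; the helper names `pairwise_differences` and `linear_score`, the weight vector, and the toy data are hypothetical.

```python
def pairwise_differences(X, y):
    """One difference vector p - n for every pair with class(p) > class(n)."""
    diffs = []
    for p, yp in zip(X, y):
        for n, yn in zip(X, y):
            if yp > yn:
                diffs.append([a - b for a, b in zip(p, n)])
    return diffs

def linear_score(w, x):
    """Linear scoring function: inner product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy data (hypothetical): three examples from three ordered classes
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 3.0]]
y = [1, 2, 3]
diffs = pairwise_differences(X, y)

# For a linear function, score(p - n) = score(p) - score(n), so a weight
# vector that ranks all pairs correctly scores every difference above zero.
w = [1.0, 0.5]  # assumed weights, chosen by hand for this example
print([linear_score(w, d) for d in diffs])
```

The number of difference examples grows with the number of example pairs, which is what makes this single binary problem large.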
Binary Decomposition Methods
Ordered (F&H)
learn one theory for each possible split point in the order of the
classes
{1} vs. {2,3,4}
{1,2} vs. {3,4}
{1,2,3} vs. {4}
C-1 theories
each using all examples
proposed for ordinal classification
Pairwise (LPC)
learn one theory for each pair of classes (like for unordered
classification)
{1} vs. {2}
{1} vs. {3}
{1} vs. {4}
{2} vs. {3}
{2} vs. {4}
{3} vs. {4}
C(C-1)/2 theories
each using only some examples
is no worse for ordered
Solve a classification / ranking problem by decomposing it into
a set of binary classification problems
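The two decompositions differ in how many binary problems they create and in how many examples each problem sees. The sketch below builds only the binary training targets; the function names and the encoding (1 marks the higher side of a split or pair) are choices of this example, not something the slides prescribe.

```python
from itertools import combinations

def ordered_decomposition(y, classes):
    """One problem per split point: target 1 if class > split, using all examples."""
    return {split: [1 if label > split else 0 for label in y]
            for split in classes[:-1]}

def pairwise_decomposition(y, classes):
    """One problem per class pair (I, J), I < J, using only examples of I and J.

    Each entry holds the indices of the selected examples and their targets.
    """
    problems = {}
    for lo, hi in combinations(classes, 2):
        idx = [i for i, label in enumerate(y) if label in (lo, hi)]
        problems[(lo, hi)] = (idx, [1 if y[i] == hi else 0 for i in idx])
    return problems

# Toy labels (hypothetical): four ordered classes, two examples each
y = [1, 2, 3, 4, 1, 2, 3, 4]
classes = [1, 2, 3, 4]
ordered = ordered_decomposition(y, classes)
pairwise = pairwise_decomposition(y, classes)
print(len(ordered), len(pairwise))                 # C-1 and C(C-1)/2 problems
print(ordered[1], pairwise[(2, 4)])
```

In this example every ordered problem uses all eight examples, while every pairwise problem uses only four.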
Complexity
Assume we have a 2-class learner with complexity $O(N^{\alpha})$
single binary problem
trains 1 model with $N^2$ order constraints (one constraint for each pair of examples) → $O(N^{2\alpha})$
ordered (F&H)
trains C-1 models with N training examples → $O(C \cdot N^{\alpha})$
pairwise (LPC)
equal class distribution (worst case):
trains C(C-1)/2 models with $2N/C$ training examples → $O(C^{2-\alpha} \cdot N^{\alpha})$
the same as ordered decomposition for linear classifiers ($\alpha = 1$)
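The pairwise bound follows from a short calculation. As a sketch, assume a base learner whose training cost on N examples grows like $N^{\alpha}$ (the exponent symbol $\alpha$ is a naming assumption of this example) and C equally sized classes, so that each pairwise model sees $2N/C$ examples:

```latex
\frac{C(C-1)}{2} \cdot \left(\frac{2N}{C}\right)^{\alpha}
  = 2^{\alpha-1}\,(C-1)\,C^{1-\alpha}\,N^{\alpha}
  = O\!\left(C^{2-\alpha} \cdot N^{\alpha}\right)
```

For $\alpha = 1$ this gives $(C-1) \cdot N$, the same total as the C-1 ordered models trained on N examples each; for $\alpha > 1$ the pairwise total grows more slowly in C than the ordered one.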
Prediction with F&H
Prediction of binary models
we have C-1 models $M_I$
for an example x, each model predicts $p_I(x) = P(\mathrm{class}(x) > I)$
Computing a prediction
derive estimates for the probabilities P(class(x) = I)
straightforward, but not our concern
Computing a score:
$\mathrm{score}_{FH}(x) = \sum_I p_I(x)$
intuitive justification:
high classes have a high probability in all $p_I$
low classes have a low probability in all $p_I$
medium classes have high probabilities in the low $p_I$ and low probabilities in high $p_I$
theoretical justification:
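One way to see why summing the model outputs gives a sensible score: if the $p_I$ were exact probabilities P(class(x) > I), their sum would equal the expected class index minus one (the tail-sum identity for expectations). The sketch below checks this numerically; it is an illustration and may differ from the justification on the original slide. All function names and the class distributions are hypothetical.

```python
def tail_probs_from_distribution(class_probs):
    """p_I = P(class > I) for I = 1..C-1, given P(class = 1), ..., P(class = C)."""
    return [sum(class_probs[i:]) for i in range(1, len(class_probs))]

def fh_score(tail_probs):
    """score_FH(x): sum of the C-1 model outputs p_I(x)."""
    return sum(tail_probs)

def expected_class(class_probs):
    """Expected class index, with classes numbered 1..C."""
    return sum(k * p for k, p in enumerate(class_probs, start=1))

# Hypothetical class distributions for an example of a high and of a low class
high = [0.1, 0.2, 0.3, 0.4]
low = [0.7, 0.2, 0.1, 0.0]
for dist in (high, low):
    score = fh_score(tail_probs_from_distribution(dist))
    print(score, expected_class(dist) - 1)  # the two values agree up to rounding
```

In practice the $p_I$ come from C-1 separately trained models and need not be consistent with a single class distribution, so the identity holds only approximately.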
Prediction with LPC
Prediction of binary models
we have C(C-1)/2 models $M_{I,J}$ (I < J)
for an example x, each model predicts $p_{I,J}(x) = P(\mathrm{class}(x) = I \mid I \vee J)$, so that $p_{J,I}(x) = 1 - p_{I,J}(x)$
Computing a prediction
weighted voting: predict $\arg\max_I \sum_{J \neq I} p_{I,J}(x)$
can be shown to be optimal with respect to Spearman rank correlation
Computing a score:
$\mathrm{score}_{LPC\text{-}U}(x) = \sum_{I > J} p_{I,J}(x)$
intuitive justification:
sum up all predictions “in favor of a higher class”
examples with low classes will get low probabilities in the models for high classes
examples with high classes will get high probabilities in the models for high classes
motivation:
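The LPC score and the weighted-voting prediction can both be computed from the same table of pairwise outputs. The sketch below assumes that each stored value is P(class = I | I or J) for the lower class I of the pair, so the mass in favor of the higher class is its complement. The function names, the dictionary layout, and the probabilities are hypothetical.

```python
def lpc_score(pair_probs):
    """Sum, over all class pairs, of the probability in favor of the higher class."""
    return sum(1.0 - p_lo for p_lo in pair_probs.values())

def lpc_weighted_vote(pair_probs, classes):
    """Weighted voting: each class collects the probability mass it receives."""
    votes = {c: 0.0 for c in classes}
    for (lo, hi), p_lo in pair_probs.items():
        votes[lo] += p_lo
        votes[hi] += 1.0 - p_lo
    return max(classes, key=lambda c: votes[c])

classes = [1, 2, 3]
# Hypothetical model outputs: keys are pairs (I, J) with I < J,
# values are the probability assigned to the lower class I.
likely_high = {(1, 2): 0.4, (1, 3): 0.1, (2, 3): 0.2}
likely_low = {(1, 2): 0.9, (1, 3): 0.95, (2, 3): 0.6}

for probs in (likely_high, likely_low):
    print(lpc_score(probs), lpc_weighted_vote(probs, classes))
```

The example that looks like a high class receives the larger score, which is the behavior the intuitive justification describes.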