2.3 Multi-label classification learning

2.3.5 Conditional tree-structured Bayesian network (CTBN)

Conditional tree-structured Bayesian network (CTBN) [Batal et al., 2013, Hong et al., 2014, Hong et al., 2015] is a directed graphical model. Briefly, CTBN learns a directed tree over the labels [Batal et al., 2013] to model the conditional likelihood of all the labels. More formally, we would like to obtain a decomposition of the likelihood $P(\mathbf{y} \mid \mathbf{x}) = \prod_j P(y_j \mid \mathbf{x}, \pi(y_j))$, where $\pi(y_j)$ denotes the parents of label $y_j$ in the directed tree, such that for a new future instance $\mathbf{x}^{(0)}$ with label vector $\mathbf{y}^{(0)}$, $P(\mathbf{y}^{(0)} \mid \mathbf{x}^{(0)}) > P(\mathbf{y} \mid \mathbf{x}^{(0)})$ for any $\mathbf{y} \neq \mathbf{y}^{(0)}$. CTBN can also be combined with ensemble methods [Hong et al., 2014, Hong et al., 2015], which improves predictive performance.
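To make the tree factorization concrete, the following is a minimal sketch in Python. It is not the learning procedure of Batal et al.: the tree `PARENT`, the function `cond_prob`, and its numeric constants are hypothetical stand-ins for a learned structure and learned per-label classifiers, and the feature x is a single number for simplicity. The sketch only shows how P(y|x) is assembled as a product of per-label conditionals and how a prediction can be read off as the most probable label vector.

```python
from itertools import product

# Hypothetical toy tree over three binary labels: y0 is the root, y1 and y2 are its children.
# PARENT[j] is the index of label j's parent in the tree (None for the root).
PARENT = {0: None, 1: 0, 2: 0}

def cond_prob(j, yj, x, parent_value):
    """Toy stand-in for P(y_j | x, pi(y_j)); a real model would use a learned classifier."""
    # Probability that y_j = 1, shifted by the scalar feature x and by the parent's value.
    p_one = 0.2 + 0.5 * x
    if parent_value is not None:
        p_one += 0.2 if parent_value == 1 else -0.1
    p_one = min(max(p_one, 0.01), 0.99)  # keep it a valid probability
    return p_one if yj == 1 else 1.0 - p_one

def joint_prob(y, x):
    """P(y | x) as the product of per-label conditionals along the tree."""
    prob = 1.0
    for j, yj in enumerate(y):
        parent = PARENT[j]
        parent_value = None if parent is None else y[parent]
        prob *= cond_prob(j, yj, x, parent_value)
    return prob

def predict(x, num_labels=3):
    """Brute-force search for the most probable label vector (exponential in num_labels)."""
    return max(product([0, 1], repeat=num_labels), key=lambda y: joint_prob(y, x))
```

The brute-force search in `predict` is only practical for a handful of labels. Tree-structured models generally admit exact inference by dynamic programming over the tree, which is what keeps prediction tractable for larger label sets.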

By modeling the conditional dependencies via undirected or directed networks, multi-label classification models extended from probabilistic graphical models can efficiently capture the hidden dependencies among labels while remaining trainable in polynomial time. For this reason, such models have been gaining popularity in recent years.

2.3.6 Summary

In this section, we gave a brief introduction to five multi-label classification models: binary relevance (BR) [Boutell et al., 2004, Clare and King, 2001], label powerset (LP) [Tsoumakas et al., 2010], conditional random field (CRF) [Lafferty et al., 2001, Bradley and Guestrin, 2010, Naeini et al., 2015], classifier chain (CC) [Read et al., 2009], and conditional tree-structured Bayesian network (CTBN) [Batal et al., 2013, Hong et al., 2014, Hong et al., 2015]. More details about the theory and analysis of these multi-label classification models can be found in the papers cited for each model.

2.4 Learning to rank

Learning to rank [Liu, 2009, Mohri et al., 2012] is a sub-field of machine learning focused on the construction of ranking models for information retrieval or machine learning systems. The training data of ranking models typically consist of instances with some partial or total ordering information specified on the data instances or labels. Such ordering information is typically induced by assigning a numerical or ordinal score to each data instance or label. The ranking model aims to rank future data instances or labels in a way that is consistent with the rankings in the training data. According to the type of ordering information provided by the teacher (labeler), ranking models can be categorized into three sub-categories: instance ranking [Joachims, 2002, Radlinski and Joachims, 2005], label ranking [Vembu and Gärtner, 2011, Zhou et al., 2014], and multi-label ranking [Zhou et al., 2014, Jung and Tewari, 2018, Bucak et al., 2009]. In this section, we give a brief introduction to these three sub-categories.

2.4.1 Instance ranking

In the standard setting of instance ranking models [Joachims, 2002, Radlinski and Joachims, 2005], training data consist of examples and some partial or total ordering information specified on the data examples, which is given by a teacher (labeler). The goal is to learn a model that can accurately order unseen future examples. Formally, the training data are $D = \{X_t, S_t\}$, where $X_t = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N\}$ is the set of instances and $S_t \subset \bigcup_{Z \in \mathcal{P}(X_t)} G_A(Z)$ is the set of partial ordering information on $X_t$, where $\mathcal{P}(\cdot)$ denotes the powerset and $G_A(\cdot)$ denotes the automorphism group. The objective is to learn a mapping function $f : \mathcal{X} \to \mathbb{R}$ such that for new future examples $\mathbf{x}_a$ and $\mathbf{x}_b$, the comparison between $f(\mathbf{x}_a)$ and $f(\mathbf{x}_b)$ follows the partial ordering information regarding these two future examples.
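As an illustration of this objective, the sketch below scores instances with a real-valued function f and measures how many of the given pairwise preferences the scores respect. The linear form of `linear_score`, the toy `instances`, and the helper name `pairwise_agreement` are assumptions made for the example; the cited methods learn f by other means.

```python
def linear_score(weights, x):
    """A hypothetical scoring function f(x) = w . x mapping an instance to a real number."""
    return sum(w * xi for w, xi in zip(weights, x))

def pairwise_agreement(scores, preferences):
    """Fraction of pairs (a, b), read as 'a should rank above b', that the scores respect."""
    if not preferences:
        return 1.0
    correct = sum(1 for a, b in preferences if scores[a] > scores[b])
    return correct / len(preferences)

# Toy data: three instances and a partial ordering given by the labeler.
instances = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0]}
preferences = [("a", "b"), ("b", "c")]  # the pair (a, c) is left unspecified

good_scores = {name: linear_score([2.0, 1.0], x) for name, x in instances.items()}
bad_scores = {name: linear_score([1.0, 2.0], x) for name, x in instances.items()}
```

Here `pairwise_agreement(good_scores, preferences)` evaluates to 1.0 and `pairwise_agreement(bad_scores, preferences)` to 0.0, so a learner in this setting would prefer the first weight vector.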

2.4.2 Label ranking

In the standard setting of label ranking models [Vembu and Gärtner, 2011, Zhou et al., 2014], training data consist of examples and some partial or total ordering information specified on the labels of each example, which is given by a teacher (labeler). The goal is to learn a model that can accurately order all the labels of unseen future examples. Formally, the training data are $D = \{d_1, d_2, \ldots, d_N\}$, where $d_i = \langle \mathbf{x}_i, S_i \rangle$ is a pair, $\mathbf{x}_i$ is the feature vector of the instance, and $S_i \subset \bigcup_{Z \in \mathcal{P}(\mathcal{Y})} G_A(Z)$ is the set of partial ordering information on the label space $\mathcal{Y} = \{1, 2, \ldots, K\}$, where $\mathcal{P}(\cdot)$ denotes the powerset and $G_A(\cdot)$ denotes the automorphism group. The objective is to learn a mapping function $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ such that for a new future example $\mathbf{x}_0$, the comparison between $f(\mathbf{x}_0, y_j)$ and $f(\mathbf{x}_0, y_l)$ follows the partial ordering information regarding label $j$ and label $l$ of this example.
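A minimal sketch of this setting, assuming a hypothetical per-label linear scorer: each label has its own weight vector in `LABEL_WEIGHTS`, f(x, y) is the dot product of that vector with x, and the predicted ranking orders the labels by decreasing score. The weights and the input are invented for illustration.

```python
# Hypothetical weight vector for each label in the label space Y = {1, 2, 3}.
LABEL_WEIGHTS = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [0.5, 0.5]}

def score(x, y):
    """f(x, y): a real-valued score for label y on instance x."""
    return sum(w * xi for w, xi in zip(LABEL_WEIGHTS[y], x))

def rank_labels(x, labels):
    """Order the labels from highest to lowest score for instance x."""
    return sorted(labels, key=lambda y: score(x, y), reverse=True)

def respects(ranking, preferences):
    """Check that every pair (j, l), read as 'j before l', holds in the ranking."""
    position = {y: i for i, y in enumerate(ranking)}
    return all(position[j] < position[l] for j, l in preferences)

x0 = [0.2, 0.9]
ranking = rank_labels(x0, [1, 2, 3])
```

For `x0` the scores are 0.2, 0.9, and 0.55 for labels 1, 2, and 3, so `ranking` is `[2, 3, 1]`, and `respects(ranking, [(2, 1)])` returns `True`.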

2.4.3 Multi-label ranking

Multi-label ranking [Zhou et al., 2014, Jung and Tewari, 2018, Bucak et al., 2009] is a learning problem where the goal is not only to identify the relevant labels from a set of predefined labels, but also to rank them according to their relevance to a data instance [Zhou et al., 2014]. Consequently, multi-label ranking can be considered a generalization of multi-label classification and label ranking. In the standard setting of multi-label ranking models, training data consist of examples and the total ordering information specified on all the relevant labels of each example, which is given by a teacher (labeler). The goal is to learn a model that can accurately find the relevant labels of unseen future examples and order them. Formally, the training data are $D = \{d_1, d_2, \ldots, d_N\}$, where $d_i = \langle \mathbf{x}_i, S_i \rangle$ is a pair, $\mathbf{x}_i$ is the feature vector of the instance, and $S_i \in \bigcup_{Z \in \mathcal{P}(\mathcal{Y})} G_A(Z)$ is the total ordering information on the relevant labels $Z$ over the label space $\mathcal{Y} = \{1, 2, \ldots, K\}$, where $\mathcal{P}(\cdot)$ denotes the powerset and $G_A(\cdot)$ denotes the automorphism group. Typically, the objective is to learn a mapping function $f : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ such that, for a new future example $\mathbf{x}_0$: (1) the comparison between $f(\mathbf{x}_0, y_j)$ and $f(\mathbf{x}_0, y_l)$ follows the total ordering information regarding label $j$ and label $l$ of this example if both are relevant labels; (2) $f(\mathbf{x}_0, y_j) > 0$ holds if label $j$ is a relevant label of this example; (3) $f(\mathbf{x}_0, y_l) < 0$ holds if label $l$ is an irrelevant label of this example. Overall, compared with the label ranking reviewed in Section 2.4.2, multi-label ranking only enforces the orderings among the relevant labels.
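The three conditions can be read as a decoding rule plus a consistency check, sketched below. The score values are invented, and the function names `predict_ranked_relevant` and `satisfies_conditions` are hypothetical; the sketch assumes the scores f(x0, y) have already been computed by some model.

```python
def predict_ranked_relevant(scores):
    """Labels with a positive score are predicted relevant, ordered by decreasing score."""
    relevant = [y for y, s in scores.items() if s > 0]
    return sorted(relevant, key=lambda y: scores[y], reverse=True)

def satisfies_conditions(scores, relevant_order):
    """Check conditions (1)-(3) against a true total order over the relevant labels."""
    relevant = set(relevant_order)
    # (1) Scores strictly decrease along the total order; checking consecutive
    #     pairs is enough because strict inequality is transitive.
    ordered = all(scores[a] > scores[b]
                  for a, b in zip(relevant_order, relevant_order[1:]))
    # (2) Every relevant label has a positive score.
    positive = all(scores[y] > 0 for y in relevant)
    # (3) Every irrelevant label has a negative score.
    negative = all(scores[y] < 0 for y in scores if y not in relevant)
    return ordered and positive and negative

# Invented scores f(x0, y) for a label space of size four.
scores = {1: 1.2, 2: -0.4, 3: 0.3, 4: -2.0}
```

With these scores, `predict_ranked_relevant(scores)` yields `[1, 3]`. `satisfies_conditions(scores, [1, 3])` is `True`, whereas the reversed order `[3, 1]` fails condition (1).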

In this thesis, we propose new multi-label classification models with permutation subsets. We start by defining and formalizing the problem of learning from permutation subsets in multi-label settings. Then, we point out that such multi-label classification models with permutation subsets are identical to multi-label ranking models. After that, we present a two-state algorithm for learning the multi-label ranking model. The details of multi-label classification with permutation subsets as multi-label ranking will be discussed in Chapter 7.

2.4.4 Summary

In this section, we gave a brief introduction to the three sub-categories of ranking models: instance ranking [Joachims, 2002, Radlinski and Joachims, 2005], label ranking [Vembu and Gärtner, 2011, Zhou et al., 2014], and multi-label ranking [Zhou et al., 2014, Jung and Tewari, 2018, Bucak et al., 2009]. More details about the theory and analysis of these ranking models can be found in the papers cited for each model.