Multi-Label Problem Transformation Methods


In this section we review the main methods that transform a multi-label classification problem into one or more single-label classification problems [24].

Table 3.2: Transformed data using PT1
InstanceID  Class 1  Class 2  Class 3
1  X
2  X
3  X
4  X
5  X

Table 3.3: Transformed data using PT2
InstanceID | Class 1 | Class 2 | Class 3
1 | X |   |
2 |   | X |
4 |   | X |

First, some problem transformation methods transform a multi-label classification problem into just one single-label classification problem. One such method, dubbed PT1 or the Label Elimination method, randomly selects one of the multiple labels of each multi-label instance and discards the other labels of that instance. PT1 is illustrated in Table 3.2, which shows a possible result of applying PT1 to the data in Table 3.1. It is also possible to select the labels to be discarded from each multi-label instance using a non-random criterion, such as selecting the label with maximum or minimum frequency in the dataset [10, 92].
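As a minimal sketch of PT1, the Python snippet below keeps a single label per instance, chosen either at random or by label frequency. The label sets are the ones implied by Table 3.6 for the data in Table 3.1. The names `DATA`, `pt1_random` and `pt1_by_frequency`, the fixed seed, the alphabetical tie-breaking, and the reading that the selected label is the one kept are all assumptions made for illustration.

```python
import random
from collections import Counter

# Label sets per instance ID, as implied by Table 3.6 (feature values omitted).
DATA = {
    1: {"Class1"},
    2: {"Class2"},
    3: {"Class1", "Class2", "Class3"},
    4: {"Class2"},
    5: {"Class2", "Class3"},
}


def pt1_random(data, seed=0):
    """Keep one randomly chosen label per instance and discard the rest."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatable runs
    return {i: rng.choice(sorted(labels)) for i, labels in data.items()}


def pt1_by_frequency(data, keep="max"):
    """Keep the most frequent (keep="max") or least frequent label."""
    freq = Counter(label for labels in data.values() for label in labels)
    pick = max if keep == "max" else min
    # Iterating over sorted labels breaks frequency ties alphabetically.
    return {
        i: pick(sorted(labels), key=lambda label: freq[label])
        for i, labels in data.items()
    }


print(pt1_random(DATA))
print(pt1_by_frequency(DATA, keep="max"))
```

On this data the maximum-frequency criterion keeps Class2 for instances 3 and 5, because Class2 is the most frequent label in the dataset.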

Another method, dubbed PT2 (also called the Instance Elimination method), removes all instances that have multiple labels from the dataset and uses the remaining instances for data mining. Table 3.3 shows the result of applying PT2 to the data in Table 3.1. A clear weakness of both PT1 and PT2 is that they lead to information loss: PT1 discards labels and PT2 discards whole instances from the original dataset. These methods were applied in [94, 95].
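PT2 can be sketched in a few lines of Python. The label sets below are the ones implied by Table 3.6 for the data in Table 3.1; the names `DATA` and `pt2` are hypothetical and feature values are omitted.

```python
# Label sets per instance ID, as implied by Table 3.6 (feature values omitted).
DATA = {
    1: {"Class1"},
    2: {"Class2"},
    3: {"Class1", "Class2", "Class3"},
    4: {"Class2"},
    5: {"Class2", "Class3"},
}


def pt2(data):
    """Drop every multi-label instance; keep single-label instances as they are."""
    return {
        i: next(iter(labels))  # the only label of a single-label instance
        for i, labels in data.items()
        if len(labels) == 1
    }


print(pt2(DATA))
```

Only instances 1, 2 and 4 survive, which agrees with the rows listed in Table 3.3.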

Table 3.4: Transformed data using PT3
InstanceID | Class 1 | Class 2 | Class 3 | Class 2 & Class 3 | Class 1 & Class 2 & Class 3
1 | X |   |   |   |
2 |   | X |   |   |
3 |   |   |   |   | X
4 |   | X |   |   |
5 |   |   |   | X |

Table 3.5: Transformed data using PT4
InstanceID | Class1 | ¬Class1
1 | X |
2 |   | X
3 | X |
4 |   | X
5 |   | X

InstanceID | Class2 | ¬Class2
1 |   | X
2 | X |
3 | X |
4 | X |
5 | X |

InstanceID | Class3 | ¬Class3
1 |   | X
2 |   | X
3 | X |
4 |   | X
5 | X |

The PT3 method, or Label Power Set method, also proposed in [112], creates a single label for each element of the power set of the set of labels (i.e., for each possible combination of labels) that is observed in the dataset. Unlike PT1 and PT2, this method does not lead to information loss, but it can lead to a large number of class labels. This is a serious problem particularly when the number of instances is small, since in this case there would be too few instances for some class labels, making it very difficult to reliably predict those labels. This technique is used in [95, 111, 114]. A variation of PT3 is the pruned transformation method, proposed in [94], which prunes away label sets that occur fewer times than a small user-defined threshold. The result of applying the PT3 method to the data in Table 3.1 is shown in Table 3.4.
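The label power set idea and its pruned variation can be sketched as follows. This is an illustrative sketch only: the label sets are the ones implied by Table 3.6, the names `pt3` and `pt3_pruned` and the threshold value 2 are assumptions, and the pruning step simply discards instances with infrequent label sets, which is all the description above specifies (the method in [94] may do more).

```python
from collections import Counter

# Label sets per instance ID, as implied by Table 3.6 (feature values omitted).
DATA = {
    1: {"Class1"},
    2: {"Class2"},
    3: {"Class1", "Class2", "Class3"},
    4: {"Class2"},
    5: {"Class2", "Class3"},
}


def pt3(data):
    """Turn each observed label combination into one combined class label."""
    return {i: " & ".join(sorted(labels)) for i, labels in data.items()}


def pt3_pruned(data, threshold=2):
    """Apply PT3 after dropping label sets seen fewer than `threshold` times."""
    counts = Counter(frozenset(labels) for labels in data.values())
    kept = {
        i: labels
        for i, labels in data.items()
        if counts[frozenset(labels)] >= threshold
    }
    return pt3(kept)


print(pt3(DATA))
print(pt3_pruned(DATA, threshold=2))
```

With this data PT3 yields four distinct combined classes for five instances, three of which have a single training instance each, which illustrates the sparsity problem noted above.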

Table 3.6: Transformed data using PT5
InstanceID | Class
1 | Class1
2 | Class2
3 | Class1
3 | Class2
3 | Class3
4 | Class2
5 | Class2
5 | Class3

PT4, also called Binary Relevance, transforms the original dataset into |L| new datasets (where L is the set of labels). Each new dataset contains all instances of the original dataset. In the i-th dataset, i = 1, ..., |L|, each instance is assigned a single label, which is i if the instance contained the i-th label in the original dataset, and ¬i otherwise. This technique is used in [23] and [111]. Table 3.5 shows the result of applying PT4 to the data in Table 3.1. Note that, for this data, PT4 creates three single-label datasets, so three classifiers need to be trained.
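A minimal Python sketch of the Binary Relevance transformation is given below. The label sets are the ones implied by Table 3.6; the name `pt4` and the string encoding of the negative class (a "¬" prefix) are assumptions made for illustration.

```python
# Label sets per instance ID, as implied by Table 3.6 (feature values omitted).
DATA = {
    1: {"Class1"},
    2: {"Class2"},
    3: {"Class1", "Class2", "Class3"},
    4: {"Class2"},
    5: {"Class2", "Class3"},
}


def pt4(data):
    """Build one binary single-label dataset per label in L."""
    all_labels = sorted(set().union(*data.values()))
    return {
        label: {
            # Positive class if the instance had this label, negative otherwise.
            i: label if label in labels else "¬" + label
            for i, labels in data.items()
        }
        for label in all_labels
    }


for label, dataset in pt4(DATA).items():
    print(label, dataset)
```

Each of the |L| resulting datasets keeps all five instances, so a separate binary classifier would be trained on each of them.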

The last problem transformation method is PT5. This method decomposes each instance into n rows, where n is the number of labels of that instance; these rows have the same attribute values but different classes. The result of applying PT5 to the data in Table 3.1 is shown in Table 3.6. Note that PT5 leads to a large amount of data replication: some instances are duplicated with respect to their features and differ only in their class labels. Such inconsistent examples are a problem for most conventional classification algorithms, so this method is rarely used in practice.
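The row decomposition performed by PT5 can be sketched as below. The label sets are the ones implied by Table 3.6; the feature values in `FEATURES` are invented purely to make the replication visible, and the names `FEATURES`, `LABELS` and `pt5` are hypothetical.

```python
# Hypothetical feature vectors; the source does not list attribute values.
FEATURES = {
    1: (0.1, 0.7),
    2: (0.4, 0.2),
    3: (0.9, 0.5),
    4: (0.3, 0.3),
    5: (0.6, 0.8),
}

# Label sets per instance ID, as implied by Table 3.6.
LABELS = {
    1: {"Class1"},
    2: {"Class2"},
    3: {"Class1", "Class2", "Class3"},
    4: {"Class2"},
    5: {"Class2", "Class3"},
}


def pt5(features, labels):
    """Emit one (instance ID, features, class) row per label of each instance."""
    return [
        (i, features[i], label)
        for i, label_set in labels.items()
        for label in sorted(label_set)
    ]


for row in pt5(FEATURES, LABELS):
    print(row)
```

The five original instances become eight rows, and the three rows produced from instance 3 share one feature vector while carrying three different classes.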

The second group of multi-label classification methods consists of algorithm adaptation methods. These methods modify a conventional single-label classification algorithm to solve a multi-label classification problem. Some of these methods are briefly discussed in Subsection 3.3. In any case, note that these methods are not the focus of this research (which focuses on data preprocessing methods).

A similar taxonomy, using somewhat different terminology, was introduced in [21], which classifies multi-label classification methods into two main types: (1) algorithm independent and (2) algorithm dependent. Algorithm independent methods correspond to the problem transformation methods proposed by [112]. Algorithm independent methods can be used with any type of classification algorithm, whereas algorithm dependent methods use a specific type of algorithm for dealing with multi-label classification problems.

Table 3.7: A comparison of problem transformation methods proposed or discussed in different works.
Methods | Advantages | Disadvantages | Number of classifiers | Number of instances
1) PT1: Label elimination | Simple and easy to implement | Information loss | One | Same as for original data set
2) PT2: Instance elimination | Simple and easy to implement | Information loss | One | Reduced
3) PT3: Label creation or Label power set (LP) | Considers some relationship between labels | A large increase in the number of class labels, increasing the risk of model overfitting | One | Same as for original data set
4) PT4: Label based transformation or Binary Relevance (BR) | Simple and easy to implement | Considers each label separately, ignoring label correlations; slow (leads to many runs of a classification algorithm) | Increased: equal to the number of labels | Increased in total (over all new data sets)
5) PT5 | Simple and easy to implement | Creates instances with duplicated feature values and inconsistent class labels | One | Increased

Table 3.7 compares the problem transformation methods proposed or discussed by different authors. For each method, the first column gives its name, the second and third columns give the advantage(s) and disadvantage(s) of that method, the fourth column indicates the effect of using the method on the number of single-label classifiers that need to be trained after the data has been transformed, and the fifth column indicates the effect of using the method on the number of instances in the data being mined.
