Future research lines - Aggregation and pre-aggregation functions in fuzzy rule-based classific

In this subsection we describe some open research lines based on the methodologies that have proposed in this thesis.

Generalizations of the Choquet integral using different fuzzy measures

The usage of the Choquet integral to deal with classification problems was firstly introduced in [BBF+13]. Furthermore, in this study, the authors have introduced a fuzzy measure that adapts itself for each class of the problem. In our first study [LSPD+16] we considered generalizations that were combined with the same measures used in [BBF+13]. The achieved results confirmed that the proposed fuzzy measure was superior to the remainder, for this reason in this thesis we basically consider the same fuzzy measure in all papers.

We have used a different fuzzy measure only in the paper related to the tunning of the α

parameter. In that study we have considered as a measure a method that is built based on an overlap index, which achieved satisfactory results. However, when the different generalizations of the Choquet integral proposed in this thesis were combined with this method, poorer results were obtained when compared against the generalizations of the Choquet integral using the power measure.

Both fuzzy measures are calculated in execution time, then, it is possible to explore even more the potency of the fired fuzzy rules if we had a fuzzy measure that is able to represent in a better way the relationship among all the rules of each class. Therefore, having a different

fuzzy measure per class as in this thesis. The open problem is how to learn these fuzzy measures in order to use them when classifying new examples. A possible solution could be the usage of the Sugeno fuzzy measure [Sug74], which is a fuzzy measure that only needs to define the values for the sets having an unique element and the remainder are automatically obtained. Therefore, we could use an evolutionary algorithm to learn these values in order to obtain the best possible fuzzy measures for the faced problem.

Analyzing the behavior of the generalizations of the Choquet integral

As can be noticed in the published papers, we have applied different generalizations of the Choquet integral in the FRM of the FARC-HD fuzzy classifier. This generalizations were used to tackle different classification problems by aggregating the information provided by all fired rules in the FRM.

From the obtained results, it is noticeable that some generalizations achieved a better performance than others in determined datasets. Thus, an important open research line is related to a pre-definition of the characteristics of the dataset that allows one to determine in ad- vance the most appropriate generalization to face it. We have made preliminary studies in this field in [LSD+17a, LSD+18b] by analyzing the behavior of a CC-integral in the FRM and by using data complexity measures, respectively. However, despite the interesting results in both cases, we must dig deeper in this subject.

Applying the generalizations with different classifiers

The generalizations of the Choquet integral developed in this thesis were always applied in the FRM of the FARC-HD [AFAH11] fuzzy classifier. However, observe that the first proposal of the usage of the standard Choquet integral in a FRM [BBF+13] was applied in the fuzzy classifier of Chi et al [CYP96b], and, also enhanced the performance of the algorithm.

Then, we could select the different approaches developed in this thesis and apply them in different fuzzy classifiers. In this way, we would research if these generalizations can also improve their performance, which could support the hypothesis that the Choquet integrals and their generalizations are a good option to improve the quality of all the FRBCSs.

The construction of a model based on ensemble methods

There are different techniques to improve the performance of the classification systems. For example, Evolutionary Fuzzy Systems (See subsection 1.2.3) and Ensembles [OM99]. Ensembles of classifiers usually enhance the performance of single classifiers by inducing several classifiers and combining them so that they outperform all the models conforming it [GFB+12].

Therefore, the basic idea of an ensemble method is to construct several classifiers from the original data and, then, aggregate their predictions when unknown instances are presented. It is observable that different generalizations of the Choquet integral achieved different perfor- mances in the same dataset. Thus, another open research line is related to the construction of an ensemble that considers different FRMs, that is, different generalizations of the Cho- quet integral. Then we could made the final decision based on the outputs of the different generalizations. The open problem is to know if this methodology offers diversity enough as it is a key factor when constructing ensembles of classifiers [Kun05]. If this combination is successful we could improve even more the behavior of FRBCSs to deal with classification problems.

The problem of imbalanced datasets

In a classification problem, whenever the classes are not represented equally, we have the so-called, imbalanced data problem [FGH11]. That is, when the number of instances which represent one class is smaller than the ones from the remainder class.

To ease the explanation of why the usage of generalizations of the Choquet integral could be an interesting research line to cope with this kind of problem, consider the following situation: let the elements to be aggregated for class 1, C1 = [0.75,0.8] and the elements of class 2, C2 = [0.3,0.4,0.5,0.6,0.7]. We highlight that class 1 has less examples (positive class) than

class 2 (negative class). As more rules of the negative class are built in runtime it is easier to have many rules fired for class 2 which can alter the classification of the positive class.

For instance, if we aggregated these values using AC the result would beC1= 0.75+0₂_.₅ .8 = 0.62,

and,C2 = 0.3+0.4+0₂._.5+0₅ .6+0.7 = 1. Although the values forC1are better, when performing the

aggregation, C2 becomes the chosen one, for the simple reason that it has more fired rules.

thesis. The result for C1 would be a value between [0.75,0.8], and for C2 a value between

[0.3,0.7]. It could result in a fairness representation for the problem of imbalanced data. For this reason, we consider it an interesting future research line as it may enhance the results of FRBCSs in imbalanced classification problems.

Extension to interval-valued fuzzy sets

Interval-Valued Fuzzy Sets (IVFSs) [Sam75] have proved to be a suitable tool to represent the system uncertainties and the ignorance in the definition of the linguistic fuzzy terms. An IVFS provides an interval, instead of a single number, as the membership degree of each element to this set. To ease this comprehension, we present in Figure 14, an example of a triangular shaped interval-valued fuzzy set4_{. Where} _A₍_u

i) = [A(ui), A(ui)], is the membership degree of the elementui ∈U to the set inferior (A(ui)) and superior (A(ui)).

Figure 14: An example of an interval-valued fuzzy set.

The Interval-Valued fuzzy reasoning method (IV-FRM) with TUning and Rule Selection (IVTURS) [SFBH13] is an state-of-the-art FRBCS that works with IVFSs. It achieves good results dealing with classification problems. Therefore, a natural step of this thesis, and an open research line, is the adaption of the presented generalizations to the context of IVFSs, to take into account aggregations on intervals. In this way, we will be able to analyze if the improvement produced when working with fuzzy sets is also materialized in the interval-valued context.

5 Introducción y Conclusión (Versión en español)

In document Aggregation and pre-aggregation functions in fuzzy rule-based classification systems (Page 68-72)