4.6 Designing MCMLPS: Methodology
4.6.2 Conditional mutual information based LRs
4.6.2.7 Changing the ratio of the features used in the LRs
In the previous experiments, the ratio of features used in the LRs of the MI based MCMLPS was set to 30%. Higher and lower feature ratios were also tested on the same data sets. Lowering the ratio from 30% to 10% decreased the prediction accuracy of the LRs as well as the overall accuracy of the system. Increasing it to 80%, on the other hand, slightly improved the prediction accuracy for some of the data sets and left it unchanged for the rest. The only exception was the ionosphere data set, where the accuracy increased to 92.30%. However, for some data sets the accuracy starts to drop once the ratio exceeds a certain threshold.
4.7 Summary
This chapter introduces a local learning based algorithm for a multi-component, multi-layer architecture. The system divides the data into multiple LRs using the similarity of the features. Inside each LR, a pre-defined number of base models are trained on subsets of the data and/or subsets of the features. The way in which the features are selected and assigned to the individual LRs depends on the similarity between features, measured by either their pairwise squared correlation or their conditional mutual information. The squared correlation method can be applied in supervised as well as unsupervised learning, as it does not consider the output class when splitting the data. The conditional mutual information method, on the other hand, can be applied only in supervised learning, as it uses the output class while splitting the data.
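The squared correlation route can be illustrated with a minimal sketch. It is not the procedure used in this thesis: the function names, the way a seed feature is supplied, and the rule of keeping the top fraction of features per LR are all assumptions made for illustration only.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation between two equal-length numeric columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear similarity
    return (cov * cov) / (var_x * var_y)


def features_for_seed(columns, seed, ratio):
    """Return the indices of the features most similar to a seed feature.

    columns: list of feature columns, each a list of numbers
    seed:    index of the seed feature (hypothetical seeding rule)
    ratio:   fraction of all features to keep in the LR, e.g. 0.3
    """
    k = max(1, round(ratio * len(columns)))
    scored = [(squared_correlation(columns[seed], col), j)
              for j, col in enumerate(columns)]
    # Sort by decreasing similarity; ties are broken by feature index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:k]]


if __name__ == "__main__":
    cols = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2], [1, -1, 1, -1, 1]]
    print(features_for_seed(cols, seed=0, ratio=0.5))
```

No class labels enter the computation, which is why a similarity of this kind is also usable in unsupervised settings. A conditional mutual information variant would additionally need the output class as an argument.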
Investigating the internal performance of the proposed architecture (using either of the similarity metrics) showed that the overall testing accuracy of the architecture exceeded the average internal accuracy of its LR models. The individual LR models are trained on either disjoint sets of data or subsets of features, which limits their accuracy when each is considered alone. However, since the prediction of each LR is weighted by the similarity of the features to the seed of that LR, a higher degree of importance is given to the predictions of the LRs that are most similar to the new data instance.
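The effect of such weighting can be sketched as a weighted vote. The snippet below is an assumed, simplified combiner and not necessarily the one used in the MCMLPS: the function name is hypothetical, the weights are taken as given, and each LR is assumed to output a single class label.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-LR class predictions using non-negative weights."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is needed per LR prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Return the label with the largest accumulated weight;
    # ties are resolved in favour of the label seen first.
    return max(totals, key=totals.get)


if __name__ == "__main__":
    # Two of the three LRs predict "b", but the LR that is most similar
    # to the new instance (weight 0.9) predicts "a" and outweighs them.
    print(weighted_vote(["a", "b", "b"], [0.9, 0.3, 0.4]))
```

In this toy case an unweighted majority vote would return "b". A combined prediction can therefore differ from, and be more accurate than, what the average LR predicts.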
In the proposed architecture, the amount of variation in the internal accuracy depends mainly on the size and dimensionality of the data. When an adequate amount of data is available to train and validate the LR models, the variation is small; otherwise, it is high. The results showed that both the number of LRs and the number of models developed within each LR need to be optimised with respect to the size and dimensionality of the data set.
The high level of complexity in the proposed MCMLPS is due to the use of multiple base models and to the procedure followed to generate the LRs. Its performance is comparable to that of the benchmark algorithms. Even so, the locality and the high level of diversity among the base predictors of the proposed architecture can be beneficial in noisy environments. For example, when noise is applied to only a part of the data, it will not have the same effect on all of the MCMLPS base predictors. The robustness of the proposed architecture to external noise is investigated in the next chapter.
Chapter 5
Multi-Component, Multi-Layer Predictive System in Noisy Environments
5.1 Introduction
This chapter studies the relation between the accuracy, diversity and robustness of the proposed MCMLPS in noisy environments. In ensemble learning, a number of factors have been studied in the literature with the aim of improving prediction accuracy. These factors include: classifier selection (Zhang and Zhang (2009), Ko et al. (2008) and Parvin et al. (2011)), feature selection (Zhang and Zhang (2009), Zhang and Yang (2008) and Freund and Schapire (1996)), diversity creation in ensembles (Kuncheva et al. (2002), Hatami and Ebrahimpour (2007) and Kuncheva and Whitaker (2003)), fusion methods (Zhang and Zhang (2009), Hatami and Ebrahimpour (2007) and Al-Ani and Deriche (2002)) and combining more than one ensemble (Kotsiantis (2011), Panov and Dzeroski (2007) and Kotsiantis and Pintelas (2004)).
Some of these factors were addressed in Chapter 4, where the proposed MCMLPS considered feature selection through correlation based and mutual information based local feature selection. Diversity among the base predictors was encouraged by training the models on subsets of the data and/or the features. In addition, multiple ensembles were combined in the proposed system to obtain the final prediction.
In this chapter, in addition to the previously considered factors, the effect of model selection and of using different combiners on the performance of the proposed system is studied through the introduction of six fusion methods. The chapter also examines the robustness of the proposed system in practice. Both the correlation based and the MI based MCMLPS are tested on data sets with different ratios of noise added to either the training or the testing data. The performance of both systems is compared to that of three well-known ensemble methods, namely Bagging, Boosting and Rotation Forest.
The chapter is organised as follows. Section 5.2 explains different types of noise and examines their effect on the predictions of machine learning methods. Section 5.3 discusses balancing the robustness and the flexibility of machine learning methods. Section 5.4 examines the performance of both the correlation based and the MI based MCMLPS in noisy environments and compares their results to those of the benchmark algorithms. In Section 5.5, six fusion methods are employed to combine the predictions of the base predictors/ensembles, and the effect of these combiners on the overall performance of the system is examined. Finally, Section 5.6 summarises the chapter.