2.4 Ensemble member generation
2.4.3 Multi-objective approach
The members of an ensemble can be generated either in parallel, without knowing the other members' performance, or sequentially, i.e., one by one. For example, Bagging generates members in parallel: it randomly samples the training data into subsets for training the base learners and does not measure any learner's performance before generating the next subset. By contrast, adaptive Boosting (AdaBoost) [83] increases the probability of selecting the patterns that the previous base learner misclassified, thereby improving the performance of the ensemble. AdaBoost therefore falls into the sequential ensemble generation category.
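The contrast between parallel and sequential generation can be made concrete with a small sketch. The Bagging part draws every bootstrap sample independently, while the AdaBoost part shows one reweighting step of the standard discrete (binary) AdaBoost update, in which the next learner's sampling weights depend on the previous learner's mistakes. The function names are hypothetical, and this is a textbook variant rather than necessarily the exact formulation in [83].

```python
import math
import random

def bagging_subsets(n, n_members, seed=0):
    """Parallel generation: each bootstrap sample of n indices is drawn
    independently, with no feedback from previously trained members."""
    rng = random.Random(seed)
    return [[rng.randrange(n) for _ in range(n)] for _ in range(n_members)]

def adaboost_reweight(weights, misclassified):
    """One sequential step of discrete AdaBoost: increase the weights of the
    patterns the previous learner got wrong, so the next learner focuses on them."""
    err = sum(w for w, m in zip(weights, misclassified) if m) / sum(weights)
    err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
    alpha = 0.5 * math.log((1 - err) / err)  # weight of the previous learner
    new = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new], alpha
```

After one step, the misclassified patterns carry half of the total weight, which is what pushes the next learner towards the previous learner's failures.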
Based on the objectives used, multi-objective ensemble member generation methods fall into the following categories:
Different error measures
Inspired by the memetic Pareto artificial neural network (MPANN) [84] and NCL [35] algorithms, Abbass [85] proposed two formulations of ensemble learning. In the first variant, the training data are divided into two equal subsets using stratified sampling, and the errors on the two subsets are used as the two objectives. The second variant also uses two objectives: the error on the entire training set and the error on the same data with Gaussian noise injected into the desired outputs.
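The two objective formulations can be sketched as functions that map a candidate network's predictions to an objective vector. This is a minimal illustration under stated assumptions: mean squared error is used as the error measure, the noise standard deviation `sigma=0.1` is arbitrary, and all function names are hypothetical rather than taken from [85].

```python
import random
from collections import defaultdict

def stratified_halves(labels, seed=0):
    """Split pattern indices into two subsets with (near-)equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    a, b = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        half = len(idx) // 2
        a.extend(idx[:half])
        b.extend(idx[half:])
    return a, b

def mse(pred, target, idx):
    """Mean squared error over the given pattern indices (assumed error measure)."""
    return sum((pred[i] - target[i]) ** 2 for i in idx) / len(idx)

def objectives_split(pred, target, a, b):
    """Variant 1: the errors on the two stratified subsets."""
    return mse(pred, target, a), mse(pred, target, b)

def objectives_noise(pred, target, sigma=0.1, seed=0):
    """Variant 2: the error on the clean targets and on Gaussian-noised targets."""
    rng = random.Random(seed)
    noisy = [t + rng.gauss(0.0, sigma) for t in target]
    idx = range(len(target))
    return mse(pred, target, idx), mse(pred, noisy, idx)
```

Both variants return a pair to be minimised, so a multi-objective optimiser can rank candidate networks by Pareto dominance over these pairs.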
Bhowan et al. [86] proposed using multi-objective genetic programming (MOGP) to evolve the ensemble members for imbalanced classification problems. Within MOGP, the selection methods of NSGA-II and SPEA2 were compared, and the results show that the SPEA2 selection scheme outperforms non-dominated sorting. The reason might be that non-dominated sorting has no strong bias towards any of the objectives, whereas SPEA2 selection is slightly biased towards solutions in the middle region of the Pareto front, which are more likely to offer a good trade-off between accuracy and diversity.
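The two selection schemes rank solutions differently, which a small sketch can show. The first function implements NSGA-II-style non-dominated sorting into fronts; the second computes the SPEA2 fitness (raw fitness from dominator strengths plus a k-th-nearest-neighbour density term, with k = floor(sqrt(N))). This is a simplified sketch assuming minimisation and a single combined population; the full SPEA2 also maintains a separate archive with truncation, and this code is not the implementation used in [86].

```python
import math

def dominates(u, v):
    """True if u Pareto-dominates v (all objectives minimised)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def nondominated_sort(points):
    """NSGA-II style ranking: a list of fronts (lists of indices), best first."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def spea2_fitness(points):
    """SPEA2 fitness F = R + D (lower is better). R sums the strengths of all
    solutions dominating a point; D = 1 / (sigma_k + 2) is a density term."""
    n = len(points)
    strength = [sum(dominates(points[i], points[j]) for j in range(n))
                for i in range(n)]
    k = max(1, int(math.sqrt(n)))
    fitness = []
    for i in range(n):
        raw = sum(strength[j] for j in range(n) if dominates(points[j], points[i]))
        dists = sorted(math.dist(points[i], points[j]) for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        fitness.append(raw + 1.0 / (sigma_k + 2.0))
    return fitness
```

Unlike front ranks, SPEA2 fitness distinguishes between solutions in the same dominated region by how strong their dominators are, and its density term breaks ties among non-dominated solutions.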
The idea of using different error measures as the two objectives has also been adopted in online class imbalance learning. Motivated by the finding that undersampling online bagging (UOB) achieves better classification performance while oversampling online bagging (OOB) is more robust in imbalance learning, Wang et al. [87] used the recalls obtained by OOB and UOB as the two objectives, in order to balance individual accuracy and ensemble robustness. The proposed method creates two sets of training patterns, either by oversampling the minority class or by undersampling the majority class. The non-dominated solutions are then used to construct an ensemble.
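In online bagging, each incoming example is presented to each base learner k times, with k drawn from a Poisson(λ) distribution (λ = 1 in standard online bagging). A minimal sketch of the OOB/UOB idea raises λ for minority-class examples (oversampling) or lowers it for majority-class examples (undersampling). The ratio-of-class-sizes rule below is an assumption for illustration; [87] estimates class sizes online, and the exact rule may differ. The `recall` helper computes the per-class recall that serves as an objective. All names are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Sample k ~ Poisson(lam) with Knuth's method (fine for small lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def online_bagging_k(y, class_size, mode, rng):
    """Number of times one incoming example of class y is shown to one learner.
    OOB raises lam for smaller classes; UOB lowers it for larger classes
    (assumed ratio-based rule)."""
    largest, smallest = max(class_size.values()), min(class_size.values())
    if mode == "OOB":
        lam = largest / class_size[y]
    elif mode == "UOB":
        lam = smallest / class_size[y]
    else:
        lam = 1.0  # plain online bagging
    return poisson(lam, rng)

def recall(y_true, y_pred, cls):
    """Fraction of class-cls examples that were predicted as cls."""
    hits = [p == cls for t, p in zip(y_true, y_pred) if t == cls]
    return sum(hits) / len(hits) if hits else 0.0
```

With 90 majority and 10 minority examples seen so far, OOB presents each minority example about nine times on average, while UOB presents each majority example only about once in nine arrivals.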
Accuracy vs. diversity
Chandra and Yao proposed the diverse and accurate ensemble (DIVACE) [13,88], which also uses MPANN as the EA; its two objectives are training accuracy and the negative correlation diversity measure proposed in NCL. The authors later explored other training algorithms and diversity measures, and proposed a new measure called PFC [15] to replace the negative correlation measure in DIVACE. While negative correlation measures the probability of failure, PFC responds directly to the failures made by each member, which makes it less continuous. As a result, the training accuracy objective had a greater influence, which slightly decreased the test accuracy. However, because each member was more accurate, winner-takes-all (WTA) decision making performed better than before. Compared with other methods, both variants of DIVACE achieved better generalisation performance in most cases.
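The negative correlation penalty from NCL, used as DIVACE's diversity objective, can be written for member i on pattern n as p_i(n) = (F_i(n) - F̄(n)) Σ_{j≠i} (F_j(n) - F̄(n)), where F̄(n) is the ensemble's mean output. A minimal sketch for scalar outputs follows; the function name and the summation over patterns are assumptions for illustration, not DIVACE's exact code.

```python
def ncl_penalty(outputs):
    """outputs[i][n] is the output of member i on pattern n.
    Returns each member's negative correlation penalty summed over patterns;
    more negative values indicate a member deviating more from the ensemble mean."""
    m, n_pat = len(outputs), len(outputs[0])
    pen = [0.0] * m
    for n in range(n_pat):
        mean = sum(outputs[i][n] for i in range(m)) / m
        for i in range(m):
            others = sum(outputs[j][n] - mean for j in range(m) if j != i)
            pen[i] += (outputs[i][n] - mean) * others
    return pen
```

Because the deviations from the mean sum to zero, each term equals -(F_i(n) - F̄(n))², so minimising the penalty rewards members whose outputs differ from the ensemble average.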
Accuracy vs. complexity
Jin et al. [89] proposed complexity as a second objective in addition to accuracy. The complexity of an NN is measured by either the sum of its squared weights or the sum of the absolute values of its weights; the number of connections in the NN was also suggested as a complexity measure. The bi-objective optimisation that maximises accuracy and minimises complexity yields a number of NNs with various structures. The Pareto-optimal NNs near the knee point [90] are then used to construct NN ensembles.
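The two weight-based complexity measures are simple sums over the network's weights, and a knee point can be located on the resulting two-objective front. The sketch below uses one common knee heuristic (the front point farthest from the straight line joining the two extreme points), assuming both objectives are minimised; the method in [90] may differ, and the function names are hypothetical.

```python
import math

def complexity_l2(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)

def complexity_l1(weights):
    """Sum of absolute weight values."""
    return sum(abs(w) for w in weights)

def knee_point(front):
    """Index of the 2-D front point (error, complexity) farthest from the line
    through the two extreme points; an assumed, commonly used knee heuristic."""
    a = min(front, key=lambda p: p[0])  # best on objective 1
    b = min(front, key=lambda p: p[1])  # best on objective 2
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0  # degenerate front: all extremes coincide

    def dist(p):
        # Perpendicular distance from p to the line through a and b
        return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / norm

    return max(range(len(front)), key=lambda i: dist(front[i]))
```

For the front [(0, 10), (1, 3), (5, 1), (10, 0)], the point (1, 3) is the knee: moving away from it in either direction costs much more in one objective than it gains in the other.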
Oliveira et al. [91] minimised the number of selected features as the complexity objective, with accuracy as the other objective. The ensemble members are generated and optimised by sampling features following the ‘overproduce and choose’ paradigm, and are then selected to form the ensemble by optimising diversity.
Tan et al. [92] used the Pareto archived evolution strategy (PAES) to optimise NNs for game problems. The game score (accuracy) and the number of hidden neurones (complexity) were used as the two objectives in the multi-objective generation of NNs for constructing ensembles, and WTA was employed to make the final decision. Their results show a significant performance improvement over a single NN.
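Winner-takes-all combination can be read in more than one way. A minimal sketch of one common reading: every member outputs a score per class, and the class with the single highest score across all members wins. This interpretation is an assumption; [92] and DIVACE may apply WTA differently (e.g., within each network's output layer before voting), and the function name is hypothetical.

```python
def winner_takes_all(member_outputs):
    """member_outputs[i] is the list of class scores produced by member i.
    Returns (winning class, index of the member that produced the top score)."""
    best_member, best_class, best_score = None, None, float("-inf")
    for i, scores in enumerate(member_outputs):
        for c, s in enumerate(scores):
            if s > best_score:  # strictly greater: earliest member wins ties
                best_member, best_class, best_score = i, c, s
    return best_class, best_member
```

Under this rule, the most confident member alone decides the output, which is why more accurate individual members (as noted for DIVACE with PFC) can make WTA decisions better.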
Pangilinan and Janssens [93] used the size of the decision tree as the complexity factor, with either NSGA-II or SPEA2 optimising the objectives. In this way, the authors found accurate classifiers that were not overly complex and achieved better generalisation.
Most recently, Tan et al. [94] used three objectives in multi-objective ensemble generation: two accuracy measures for imbalanced classification problems, namely specificity and sensitivity, and one complexity measure, namely the number of input features. The three members with the minimal generational distance (GD) values are selected from the Pareto-optimal set to construct the ensemble.
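Generational distance is commonly defined as GD = sqrt(Σ_i d_i²) / n, where d_i is the Euclidean distance from the i-th of n solutions to the nearest point of a reference front. A minimal sketch, assuming a reference set is available (e.g., a known or approximated true front) and treating each member's own distance d_i as its "GD value"; how [94] defines the reference and the per-member value is not stated here, and the names are hypothetical.

```python
import math

def distances_to_reference(front, reference):
    """Distance from each solution to its nearest reference point."""
    return [min(math.dist(p, r) for r in reference) for p in front]

def generational_distance(front, reference):
    """GD = sqrt(sum of squared distances) / number of solutions."""
    d = distances_to_reference(front, reference)
    return math.sqrt(sum(x * x for x in d)) / len(d)

def pick_closest(front, reference, k=3):
    """Indices of the k solutions closest to the reference front."""
    d = distances_to_reference(front, reference)
    return sorted(range(len(front)), key=lambda i: d[i])[:k]
```

The same functions work unchanged for three objectives, e.g. points of the form (1 - specificity, 1 - sensitivity, number of features), since `math.dist` accepts points of any matching dimension.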
Other objectives and methods
While most existing multi-objective ensemble approaches involve only two objectives, some researchers have proposed using more. For instance, Trawiński et al. [95] considered accuracy, complexity, and diversity simultaneously. Accuracy is a triplet of training error, error margin, and classification margin; complexity is the number of classifiers in the ensemble; and diversity is the difficulty measure. Their results show that the proposed approach achieves a high accuracy rate as well as a good accuracy-complexity trade-off on highly complex data.
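The difficulty measure, as commonly defined by Kuncheva and Whitaker, is the variance over training examples of the proportion of classifiers that label each example correctly; lower values indicate a more diverse ensemble. A minimal sketch, assuming [95] uses this standard definition; the function name is hypothetical.

```python
def difficulty(correct):
    """correct[i][n] is True if classifier i labels example n correctly.
    Returns the variance of the per-example proportion of correct classifiers."""
    m, n = len(correct), len(correct[0])
    props = [sum(correct[i][k] for i in range(m)) / m for k in range(n)]
    mean = sum(props) / n
    return sum((p - mean) ** 2 for p in props) / n
```

Two classifiers that succeed and fail on the same examples give a difficulty of 0.25 (proportions 1 and 0), while two that fail on different examples give 0 (proportions 0.5 and 0.5), so diversity is an objective to minimise.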
While the above methods measure performance on the training set (labelled data) in supervised learning, they can also be applied to purely unlabelled data in unsupervised learning, or can take the test set (unlabelled data) into account in SSL. Clustering algorithms can find suitable groups in the unlabelled data, and if partially labelled data are available, each group can be assigned to a class that matches the characteristics of the labelled data [96]. However, an arbitrary number of groupings of the unlabelled data may meet the optimisation criteria. Therefore, specific measures such as cluster deviation and cluster connectedness [97] have to be used as ensemble objectives.
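Cluster deviation and connectedness pull in opposite directions, which makes them a natural pair of objectives. A minimal sketch using common definitions from the multi-objective clustering literature, assuming both are minimised: deviation sums each point's distance to its cluster centroid (favouring compact clusters), and connectivity adds a penalty of 1/j whenever a point's j-th nearest neighbour (j ≤ L) lies in a different cluster. The exact definitions in [97] may differ, and `L=3` and the function names are assumptions.

```python
import math

def centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def deviation(points, labels):
    """Sum over clusters of the distances from members to their centroid."""
    total = 0.0
    for k in set(labels):
        members = [p for p, l in zip(points, labels) if l == k]
        mu = centroid(members)
        total += sum(math.dist(p, mu) for p in members)
    return total

def connectivity(points, labels, L=3):
    """Penalty 1/j for each point whose j-th nearest neighbour (j <= L)
    is assigned to a different cluster."""
    total = 0.0
    for i, p in enumerate(points):
        neighbours = sorted((j for j in range(len(points)) if j != i),
                            key=lambda j: math.dist(p, points[j]))[:L]
        for rank, j in enumerate(neighbours, start=1):
            if labels[j] != labels[i]:
                total += 1.0 / rank
    return total
```

Splitting data into many tiny clusters drives deviation towards zero but raises connectivity, while merging everything into one cluster does the reverse; the Pareto front between the two exposes the plausible groupings in between.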