4.4.6 How to Choose a Single Solution

In multi-objective problems, a set of Pareto front (non-dominated) solutions is obtained, which represent trade-offs between the different objectives. However, selecting a single solution from this set is an important issue. In feature selection problems, the two main objectives are minimising the number of features and maximising the classification performance, and the decision is a trade-off between these two objectives. If the Pareto front were “smooth”, in that each additional feature reduced the classification error rate by a small but significant margin, users could weight the trade-off criteria to determine their preferred solutions. However, the results show that this is usually not the case. Adding features beyond a certain number does not increase the classification performance. For example, on the Musk1 dataset in Figure 4.2, the subset with the lowest classification error rate in CMDPSOFS-B has 40 features. Adding more features does not further increase the classification performance because it does not increase the relevance, but increases the redundancy and the dimensionality. Meanwhile, removing features may not lead to a decrease in the classification error rate, as relevant features may be removed. On the Musk1 dataset, the solution that sits at the “elbow” of the CMDPSOFS-B front would be a good choice. Therefore, visually inspecting the possible solutions in the Pareto front assists users in determining their preferred compromises. This is the main reason why solving feature selection problems as multi-objective tasks is important.
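The “elbow” choice described above can also be approximated programmatically. The following is a minimal sketch, not the procedure used in this chapter: it assumes each solution is a pair (number of features, classification error rate), normalises both objectives to [0, 1], and picks the front member lying farthest below the straight line joining the two extreme solutions. The function names and the example values are hypothetical and are not taken from Figure 4.2.

```python
from typing import List, Tuple

# (number of features, classification error rate); both are minimised.
Solution = Tuple[int, float]


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Return the non-dominated solutions, sorted by number of features."""
    front: List[Solution] = []
    for s in solutions:
        # s is dominated if some o is no worse in both objectives
        # and strictly better in at least one.
        dominated = any(
            o[0] <= s[0] and o[1] <= s[1] and (o[0] < s[0] or o[1] < s[1])
            for o in solutions
        )
        if not dominated and s not in front:
            front.append(s)
    return sorted(front)


def elbow(front: List[Solution]) -> Solution:
    """Pick the member farthest below the chord joining the two extremes.

    Assumes `front` is a non-empty, non-dominated set, e.g. the output
    of pareto_front(). After normalisation the extremes sit at (0, 1)
    and (1, 0), so the farthest point below the chord is the one with
    the smallest sum of normalised objectives.
    """
    if len(front) == 1:
        return front[0]
    f_min, f_max = min(s[0] for s in front), max(s[0] for s in front)
    e_min, e_max = min(s[1] for s in front), max(s[1] for s in front)

    def score(s: Solution) -> float:
        return (s[0] - f_min) / (f_max - f_min) + (s[1] - e_min) / (e_max - e_min)

    return min(front, key=score)


# Made-up candidate subsets, for illustration only.
candidates = [(5, 0.30), (10, 0.20), (20, 0.12), (30, 0.15), (40, 0.10), (60, 0.10)]
front = pareto_front(candidates)
print(front)         # [(5, 0.3), (10, 0.2), (20, 0.12), (40, 0.1)]
print(elbow(front))  # (20, 0.12)
```

In this made-up example, moving from 20 to 40 features lowers the error rate by only 0.02, so the 20-feature subset is the elbow. A rule like this is only a starting point; inspecting the plotted front remains the more reliable way to settle on a compromise.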

4.5 Chapter Summary

This chapter conducted the first study on multi-objective PSO for feature selection. Specifically, we investigated two PSO based multi-objective feature selection algorithms, NSPSOFS and CMDPSOFS. Experimental results show that both NSPSOFS and CMDPSOFS can achieve more and better feature subsets than PSOIniPG, which is the best single objective algorithm developed in the previous chapter. In most cases, NSPSOFS achieved similar (or, in some cases, slightly worse) results to approaches based on three other well-known evolutionary multi-objective algorithms, i.e. NSGAII, SPEA2 and PAES. CMDPSOFS outperformed all the other methods mentioned above in terms of both the classification performance and the number of features. In particular, for the datasets with a large number of features, CMDPSOFS achieved better classification performance using fewer features and shorter computational time than the other four multi-objective algorithms.

This chapter finds that, as multi-objective algorithms, NSPSOFS and CMDPSOFS can search the solution space more effectively to obtain a set of non-dominated solutions instead of a single best solution. Examining the Pareto front achieved by the multi-objective algorithms can assist users in choosing their preferred solutions to meet their own requirements. Meanwhile, this chapter also finds that the tendency of NSPSOFS to lose the diversity of the swarm quickly limits its performance for feature selection. More importantly, this chapter highlights the benefits of the strategies for maintaining the diversity of the swarm in CMDPSOFS. A crowding factor together with a binary tournament selection can effectively filter out some crowded non-dominated solutions in the leader set. Different mutation operators in different groups of particles can effectively keep the diversity of the swarm and balance its global and local search abilities. These strategies account for the superior performance of CMDPSOFS over NSPSOFS, NSGAII, SPEA2 and PAES, especially on the datasets with a large number of features.
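To illustrate how a crowding measure and a binary tournament can steer leader selection away from crowded regions of the front, the following is a rough sketch. It is an assumption-laden stand-in, not the CMDPSOFS implementation: the crowding measure used here is the NSGA-II style crowding distance, and the names `crowding_distances` and `binary_tournament` are hypothetical.

```python
import random
from typing import List, Sequence


def crowding_distances(objs: Sequence[Sequence[float]]) -> List[float]:
    """NSGA-II style crowding distance for each objective vector.

    Larger values mean a less crowded neighbourhood; boundary
    solutions in any objective receive infinity.
    """
    n = len(objs)
    dist = [0.0] * n
    if n == 0:
        return dist
    for k in range(len(objs[0])):
        order = sorted(range(n), key=lambda i: objs[i][k])
        lo, hi = objs[order[0]][k], objs[order[-1]][k]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # this objective does not separate the solutions
        for pos in range(1, n - 1):
            gap = objs[order[pos + 1]][k] - objs[order[pos - 1]][k]
            dist[order[pos]] += gap / (hi - lo)
    return dist


def binary_tournament(dist: Sequence[float], rng: random.Random) -> int:
    """Draw two distinct leaders at random and keep the less crowded one."""
    if len(dist) == 1:
        return 0
    a, b = rng.sample(range(len(dist)), 2)
    return a if dist[a] >= dist[b] else b


# Made-up leader set: (number of features, error rate).
leaders = [(5, 0.30), (10, 0.20), (11, 0.19), (40, 0.10)]
distances = crowding_distances(leaders)
chosen = binary_tournament(distances, random.Random(0))
print(distances, leaders[chosen])
```

Because the tournament only compares two randomly drawn leaders, crowded solutions can still be selected occasionally, which keeps some selection pressure without discarding them outright.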

This chapter and the previous chapter (Chapter 3) have shown that PSO can be successfully used for feature selection in classification. However, these two chapters mainly focus on wrapper approaches and no filter approaches are involved. Therefore, in order to further investigate and improve the performance of PSO for feature selection, the next two chapters will focus on using PSO to develop new filter feature selection approaches in classification.

Chapter 5 Filter Based Single Objective Feature Selection

5.1 Introduction

Most of the existing PSO based feature selection algorithms are wrapper approaches, which are argued to be computationally more expensive and less general than filter approaches. However, there are very few studies on using PSO for filter feature selection. In filters, the evaluation measure, which determines the goodness of the selected features, is a key factor influencing the classification performance. Information theory is one of the most important theories capable of measuring the relevance between features and class labels [1]. However, no work has been conducted to investigate the use of information theory in PSO based feature selection.
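For reference, the two standard information-theoretic quantities involved can be written as follows for discrete random variables, where X might be a feature and Y the class label. These are the textbook definitions, given here as background only; the specific measures used by the algorithms in this chapter are defined in its later sections.

```latex
H(X) = -\sum_{x} p(x) \log_2 p(x)

I(X;Y) = \sum_{x} \sum_{y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}
       = H(X) - H(X \mid Y)
```

The mutual information I(X;Y) is zero when X and Y are independent and grows as knowing one reduces the uncertainty about the other, which is why it can serve as a relevance measure between a feature and the class labels.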

5.1.1 Chapter Goals

The overall goal of this chapter is to investigate the use of information theory in PSO for feature selection. To achieve this goal, we develop two new filter feature selection algorithms based on PSO and two information measures with the expectation of selecting a small number of features and maintaining or even improving the classification performance over using all features. Specifically, we will investigate:

• whether PSO using a mutual information based fitness function can reduce the number of features and achieve similar or even better classification performance than using all features, and can outperform traditional feature selection algorithms;

• whether PSO using an entropy based fitness function can select a smaller number of features and obtain similar or even better classification performance than using all features, and achieve better performance than the above mutual information based algorithm; and

• whether the feature subsets selected by the two new algorithms are general in that they enable high classification performance in different classification algorithms.
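To make the idea of a mutual information based fitness function concrete, here is a small sketch. It is not the fitness function developed in this chapter, which is not given in this section. It assumes discrete features, estimates mutual information from sample frequencies, and scores a subset as relevance to the class labels minus pairwise redundancy among the selected features. The weight `alpha` and the names `mi_fitness`, `columns` and `selected` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def mi_fitness(
    columns: Sequence[Sequence[Hashable]],
    labels: Sequence[Hashable],
    selected: Sequence[int],
    alpha: float = 0.5,  # hypothetical relevance/redundancy weight
) -> float:
    """Assumed fitness: weighted relevance minus weighted redundancy.

    Higher is better in this sketch.
    """
    relevance = sum(mutual_information(columns[i], labels) for i in selected)
    redundancy = sum(
        mutual_information(columns[i], columns[j])
        for i, j in combinations(selected, 2)
    )
    return alpha * relevance - (1 - alpha) * redundancy


# Toy data: features 0 and 1 are complementary, feature 2 duplicates feature 0.
labels = [0, 1, 2, 3]
columns = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
print(mi_fitness(columns, labels, [0, 1]))  # complementary pair
print(mi_fitness(columns, labels, [0, 2]))  # redundant pair, scores lower
```

A subset's fitness here depends only on the data, not on any classifier, which is what distinguishes a filter measure of this kind from the wrapper approaches of the previous chapters.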

5.1.2 Chapter Organisation

The remainder of this chapter is organised as follows. The second section describes the two new filter feature selection algorithms. The third section describes the design of the experiments. The results and discussions are presented in the fourth section. The fifth section provides a summary of this chapter.

In document Particle Swarm Optimisation for Feature Selection in Classification (Page 150-154)