Summary - Particle Swarm Optimisation for Feature Selection in Classification

Ke et al. [21] developed a Pareto-based multi-objective ACO for feature selection based on rough set theory. It adopted an elite strategy to speed up convergence, used the non-dominated solutions to add pheromone so as to reinforce exploitation, and applied a crowding comparison operator to maintain the diversity of the solutions. In addition, it aimed to avoid premature convergence by imposing limits on pheromone values. Compared with a modified non-dominated sorting GA, the proposed method obtained competitive solutions for rough set feature selection. However, only three datasets were used in the experiments, which is not enough to confirm the generalisation of the proposed algorithm.

Overall, ACO has been applied to both single objective and multi-objective, filter and wrapper feature selection. However, the datasets used in the papers we could find have a relatively small number of features, and ACO has not yet been used for feature selection on datasets with a relatively large number of features. This is a notable gap, because feature selection is more important and necessary in datasets with a large number of features.

2.8 Summary

This chapter reviewed the main concepts of machine learning, classification, feature selection, evolutionary computation techniques (particularly PSO), multi-objective optimisation, entropy and mutual information. It also reviewed related work on using conventional methods and evolutionary computation algorithms for feature selection.

The limitations of the existing work that form the motivations of this research were also discussed. The overall motivation is that EC techniques have been successfully used to address feature selection problems. Compared with other EC techniques, PSO has the advantages of being computationally less expensive, easier to implement, having fewer parameters and converging more quickly. However, PSO for feature selection has received much less attention and has a much shorter history than other EC algorithms. Further investigation is needed to improve the performance of PSO for feature selection.

Specifically, the limitations of existing work and the motivations of this research can be summarised as follows.

• The performance of PSO can be improved by developing good initialisation strategies and gbest and pbest updating mechanisms. Feature selection problems are difficult tasks. However, there has been no work on proposing new initialisation strategies for feature selection. Although there are works on updating gbest, they are not applied to pbest. Therefore, new initialisation and updating mechanisms in PSO for feature selection need to be investigated, with the expectation of reducing the number of features, increasing the classification performance and reducing the computational time.

• PSO has been successfully used to solve many multi-objective problems and shown promising performance. Feature selection is a multi-objective task. However, there is no existing work investigating the use of PSO for multi-objective (wrapper or filter) feature selection.

• Most of the existing PSO based feature selection algorithms are wrappers and there are very few works using PSO for filter feature selection. Information theory, including entropy and mutual information, can be used to evaluate the relationship between variables. It has been used to develop feature selection algorithms, but the use of information theory and PSO for filter feature selection has never been investigated.

• Wrapper feature selection algorithms are argued to be able to achieve better classification performance than filters, but filter algorithms are computationally less expensive and more general than wrappers. However, no thorough work has been conducted to investigate the differences between the two approaches in terms of the classification performance and the computational cost, and no work has been conducted to investigate the generality of wrappers.
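The third point above refers to entropy and mutual information as measures of the relationship between variables. As a minimal sketch of these two quantities for discrete data, assuming probabilities are estimated from simple frequency counts (the function names `entropy` and `mutual_information` are illustrative and are not taken from this thesis), they could be computed as follows:

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values.

    Probabilities are estimated by relative frequency (an assumption of this sketch).
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """Mutual information I(X; Y) = H(X) + H(Y) - H(X, Y), in bits."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    joint = list(zip(xs, ys))  # each pair is one outcome of the joint variable (X, Y)
    return entropy(xs) + entropy(ys) - entropy(joint)


if __name__ == "__main__":
    feature = [0, 0, 1, 1]
    label = [0, 0, 1, 1]        # fully determined by the feature
    unrelated = [0, 1, 0, 1]    # statistically independent of the feature
    print(mutual_information(feature, label))
    print(mutual_information(feature, unrelated))
```

A feature that shares more information with the class label is more relevant, and two features that share a lot of information with each other are more redundant. A filter method can score feature subsets with such measures without training a classifier.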

Following Chapters

This thesis aims to address the above-mentioned issues. The following chapters will investigate those issues by developing new algorithms. Chapter 3 will develop new initialisation and gbest and pbest updating mechanisms in PSO to propose a new wrapper based single objective feature selection algorithm. Chapter 4 will develop a PSO based multi-objective, wrapper feature selection approach. Chapter 5 will introduce entropy and mutual information to PSO for feature selection to develop a new filter feature selection approach. Chapter 6 will develop a PSO based multi-objective, filter feature selection approach. Chapter 7 will investigate the difference between filters and wrappers in terms of the classification performance and the computational time, and will also examine the generality of wrappers.

Chapter 3 Wrapper Based Single Objective Feature Selection

3.1 Introduction

Feature selection aims to find the minimal feature subset that can achieve similar or even better classification performance than using all features. However, most of the existing feature selection approaches, including PSO based methods, are wrappers and aim to maximise the classification performance only. As a result, the selected features may still have redundancy and the same classification performance can be achieved by a smaller feature subset. Therefore, it is necessary to develop a PSO based feature selection method to optimise both the classification performance and the number of features.

3.1.1 Chapter Goals

The goal of this chapter is to develop a PSO based wrapper feature selection algorithm to maximise the classification performance and minimise the number of features. To achieve this goal, a new fitness function is proposed to combine the two objectives into a single function. Further, PSO is investigated to optimise these two objectives by developing new initialisation and new pbest and gbest updating mechanisms. Specifically, this chapter will investigate:

• Whether the PSO based algorithm with the new fitness function can select a feature subset with a smaller number of features and better classification performance than using all features, and can achieve better performance than PSO with the fitness function considering only the classification performance;

• Whether the new initialisation strategies can improve the performance of PSO for feature selection over the traditional initialisation strategy;

• Whether the new updating mechanisms can improve the performance of PSO for feature selection over the traditional pbest and gbest updating mechanism;

• Whether combining the new initialisation and updating mechanisms can further increase the performance of PSO for feature selection and can outperform all methods mentioned above.
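The chapter goals above mention combining classification performance and the number of features into a single fitness function. The exact function is defined later in the chapter and is not reproduced here. Purely as an illustration of the general idea, the sketch below assumes a weighted sum of the classification error rate and the fraction of features selected. The name `combined_fitness`, the weight `alpha` and its default value are hypothetical choices for this example.

```python
def combined_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Illustrative single-objective fitness to be MINIMISED.

    Assumed form (not the thesis's definition):
        alpha * error_rate + (1 - alpha) * (n_selected / n_total)

    error_rate: classification error in [0, 1] from the wrapped classifier
    n_selected: number of features in the candidate subset
    n_total:    number of available features
    alpha:      hypothetical weight; larger values favour accuracy over subset size
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("require 0 <= n_selected <= n_total and n_total > 0")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


if __name__ == "__main__":
    # Two candidate subsets with the same error rate but different sizes:
    # the smaller subset receives the lower (better) fitness value.
    large = combined_fitness(error_rate=0.10, n_selected=20, n_total=40)
    small = combined_fitness(error_rate=0.10, n_selected=5, n_total=40)
    print(large, small)
```

With a fitness function that considers only the classification performance, the two subsets in this example would be indistinguishable. A combined function breaks the tie in favour of the smaller subset, which is the behaviour the first investigation point sets out to test.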

3.1.2 Chapter Organisation

The remainder of this chapter is organised as follows. The second section describes the new PSO based algorithms. The third section presents the design of the experiments. The results and discussions are presented in the fourth section. The fifth section provides a summary of this chapter.
