
Ke et al. [21] developed a Pareto-based multi-objective ACO for feature selection based on rough set theory. It adopted an elite strategy to speed up convergence, used the non-dominated solutions to add pheromone so as to reinforce exploitation, and applied a crowding comparison operator to maintain the diversity of the solutions. In addition, it aimed to avoid premature convergence by imposing limits on pheromone values. Compared with a modified non-dominated sorting GA, the proposed method obtained competitive solutions for rough feature selection. However, only three datasets were used in the experiments, which is too few to confirm the generalisation of the proposed algorithm.
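To make the notion of non-dominated solutions concrete, the following is a minimal sketch of Pareto dominance for two minimised objectives. It is not the implementation of Ke et al. [21]; the objective pair (classification error, number of features), the function names and the example values are assumptions chosen for illustration only.

```python
from typing import List, Tuple

# Assumed objective vector: (classification error, number of features),
# both to be minimised. Values below are illustrative, not experimental results.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(solutions: List[Objectives]) -> List[Objectives]:
    """Keep only the solutions that no other solution dominates (input order is preserved)."""
    return [
        s for s in solutions
        if not any(dominates(t, s) for t in solutions)
    ]


candidates = [(0.10, 8), (0.12, 5), (0.10, 9), (0.20, 5), (0.08, 12)]
front = non_dominated(candidates)
print(front)
```

In this example, (0.10, 9) is dominated by (0.10, 8) and (0.20, 5) is dominated by (0.12, 5), so the remaining three solutions form the non-dominated set. In a Pareto-based ACO, only solutions of this kind would be used to deposit pheromone.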

Overall, ACO has been applied to both single objective and multi-objective, filter and wrapper feature selection. However, the datasets used in the papers we could find have a relatively small number of features, and ACO has not yet been used for feature selection with a relatively large number of features, even though feature selection is more important and necessary in such datasets.

2.8

Summary

This chapter reviewed the main concepts of machine learning, classification, feature selection, evolutionary computation techniques (particularly PSO), multi-objective optimisation, entropy and mutual information. It also reviewed related work on using conventional methods and evolutionary computation algorithms for feature selection.

The limitations of the existing work that form the motivations of this research were also discussed. The overall motivation is that EC techniques have been successfully used to address feature selection problems. Compared with other EC techniques, PSO has the advantages of being computationally less expensive, being easier to implement, having fewer parameters and converging more quickly. However, the use of PSO for feature selection has received much less attention and has a much shorter history than that of other EC algorithms. Further investigation is needed to improve the performance of PSO for feature selection.

Specifically, the limitations of existing work and the motivations of this research can be summarised as follows.

• The performance of PSO can be improved by developing good initialisation strategies and gbest and pbest updating mechanisms. Feature selection problems are difficult tasks. However, there has been no work on proposing new initialisation strategies for feature selection. Although there are works on updating gbest, they have not been applied to pbest. Therefore, new initialisation and updating mechanisms in PSO for feature selection need to be investigated, with the expectation of reducing the number of features, increasing the classification performance and reducing the computational time.

• PSO has been successfully used to solve many multi-objective problems and shown promising performance. Feature selection is a multi-objective task. However, there is no existing work investigating the use of PSO for multi-objective (wrapper or filter) feature selection.

• Most of the existing PSO based feature selection algorithms are wrappers and there are very few works using PSO for filter feature selection. Information theory, including entropy and mutual information, can be used to evaluate the relationship between variables. It has been used to develop feature selection algorithms, but the use of information theory and PSO for filter feature selection has never been investigated.

• Wrapper feature selection algorithms are argued to be able to achieve better classification performance than filters, but filter algorithms are computationally less expensive and more general than wrappers. However, no thorough work has been conducted to investigate the differences between the two approaches in terms of the classification performance and the computational cost, and no work has been conducted to investigate the generality of wrappers.
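As an illustration of how entropy and mutual information can evaluate the relationship between two variables, the sketch below estimates both quantities for discrete data from observed frequencies, using the identity I(X;Y) = H(X) + H(Y) - H(X,Y). The function names and the toy data are assumptions for illustration; this is not the filter measure developed later in this thesis.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy H(X) in bits, estimated from the observed value frequencies."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equally long discrete sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


# Toy data (hypothetical): one feature that mirrors the class label, one that is unrelated.
labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
print(mi_informative, mi_uninformative)
```

A feature that determines the class label shares one full bit of information with it in this toy example, whereas the unrelated feature shares none. A filter approach can use such a score to rank or evaluate features without training a classifier.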

Following Chapters

This thesis aims to address the above-mentioned issues. The following chapters will investigate those issues by developing new algorithms. Chapter 3 will develop new initialisation and gbest and pbest updating mechanisms in PSO to propose a new wrapper based single objective feature selection algorithm. Chapter 4 will develop a PSO based multi-objective, wrapper feature selection approach. Chapter 5 will introduce entropy and mutual information to PSO for feature selection to develop a new filter feature selection approach. Chapter 6 will develop a PSO based multi-objective, filter feature selection approach. Chapter 7 will investigate the difference between filters and wrappers in terms of the classification performance and the computational time, and will also examine the generality of wrappers.

Chapter 3

Wrapper Based Single Objective Feature Selection

3.1

Introduction

Feature selection aims to find the minimal feature subset that can achieve similar or even better classification performance than using all features. However, most of the existing feature selection approaches, including PSO based methods, are wrappers and aim to maximise the classification performance only. As a result, the selected features may still have redundancy and the same classification performance can be achieved by a smaller feature subset. Therefore, it is necessary to develop a PSO based feature selection method to optimise both the classification performance and the number of features.

3.1.1

Chapter Goals

The goal of this chapter is to develop a PSO based wrapper feature selection algorithm to maximise the classification performance and minimise the number of features. To achieve this goal, a new fitness function is proposed to combine the two objectives into a single function. Further, PSO is investigated to optimise these two objectives by developing new initialisation and new pbest and gbest updating mechanisms. Specifically, this chapter will investigate:

• Whether the PSO based algorithm with the new fitness function can select a feature subset with a smaller number of features and better classification performance than using all features, and can achieve better performance than PSO with the fitness function considering only the classification performance;

• Whether the new initialisation strategies can improve the performance of PSO for feature selection over the traditional initialisation strategy;

• Whether the new updating mechanisms can improve the performance of PSO for feature selection over the traditional pbest and gbest updating mechanism;

• Whether combining the new initialisation and updating mechanisms can further increase the performance of PSO for feature selection and can outperform all methods mentioned above.
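The idea of combining the two objectives into a single function can be sketched as a weighted sum. The form below, the weight `alpha` and its default value are assumptions made for illustration only; the fitness function actually proposed in this chapter is defined in the section that describes the new algorithms.

```python
def combined_fitness(error_rate: float, n_selected: int, n_total: int,
                     alpha: float = 0.8) -> float:
    """Hypothetical weighted-sum fitness to be minimised.

    alpha weights the classification error rate; (1 - alpha) weights the
    fraction of features selected. Both terms lie in [0, 1].
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_selected <= n_total:
        raise ValueError("n_selected must be between 0 and n_total")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


# Two hypothetical subsets with the same error rate but different sizes.
small_subset = combined_fitness(error_rate=0.10, n_selected=5, n_total=20)
large_subset = combined_fitness(error_rate=0.10, n_selected=10, n_total=20)
print(small_subset < large_subset)

# With alpha = 1.0 the subset size is ignored and only the error rate counts.
accuracy_only = combined_fitness(error_rate=0.10, n_selected=10, n_total=20, alpha=1.0)
```

Under this sketch, two subsets with equal error rates are separated by their size, so the smaller subset is preferred. Setting `alpha` to 1.0 recovers a fitness function that considers only the classification performance, which corresponds to the baseline named in the first question above.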

3.1.2

Chapter Organisation

The remainder of this chapter is organised as follows. The second section describes the new PSO based algorithms. The third section presents the design of the experiments. The results and discussions are presented in the fourth section. The fifth section provides a summary of this chapter.