3.5 Super Feature Generation Using GP
3.5.1 Classification Background
Classification is the problem of assigning a test sample to a particular class on the basis of training samples whose class membership is known. It plays a vital role in many fields, including PR, fault detection, image processing, biomedical data processing and data mining. The subject has attracted major interest in the recent literature, and many different approaches to classification problems have been proposed.
Classifier Types
Any algorithm which maps data into corresponding categories is called a classifier. There are two types of classifier algorithms: supervised and unsupervised. In a supervised algorithm, a training dataset with known outputs is available and the algorithm's task is to find patterns in this training data. Once training is complete, the algorithm is evaluated on test data to quantify its learning ability. In unsupervised algorithms there is no known output, and the algorithm is asked to group items based on their inherent similarities. This is also known as clustering; in some cases (e.g. k-means clustering) the number of clusters (groups/classes) is known and the algorithm is asked to form that specific number of clusters. Unsupervised algorithms have much less information to start with, and the problem is more complex than its supervised counterpart. For this reason most past methods have addressed the supervised class of problems, and the same is done in this work.
Two types of approaches are used while designing a classifier: the feature space
division approach and the function estimation approach. In the feature space division approach, each input feature represents one dimension, and the resulting feature space is divided into sections labelled by class. A test sample is classified according to the labelled section in which it falls. A linear classifier divides the feature space with a hyperplane. Although this approach works only if the classes are linearly separable, it provides fast classification and is particularly useful where speed is a major concern. For problems that are not linearly separable, a nonlinear decision boundary can be constructed using SVMs. SVMs are relatively complex but have gained much importance in recent years. Distance-based classification in feature space is achieved using KNN. This classifier places the training (reference) samples in feature space together with their class labels. The distance of the test sample from every reference sample is calculated, the k nearest neighbours are found, and the class with the largest number of neighbours among them is taken as the class of the test sample. KNN is one of the simplest machine learning algorithms. Another simple approach is the decision tree classifier (rule-based classifier), which makes decisions using IF-ELSE logic on feature values. It works well for simple classification problems, but deriving simple decision rules for complex problems is not always possible.
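The KNN procedure described above can be illustrated with a minimal sketch. The function name `knn_classify`, the Euclidean distance measure and the toy data are assumptions made for illustration only; the text does not prescribe a particular distance or implementation.

```python
import math
from collections import Counter


def knn_classify(references, labels, sample, k=3):
    """Classify `sample` by majority vote among its k nearest reference samples.

    Euclidean distance is assumed here; other distance measures could be used.
    """
    # Distance of the test sample from every reference sample.
    distances = [(math.dist(ref, sample), label)
                 for ref, label in zip(references, labels)]
    # Keep the k closest reference samples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # The class holding the most neighbours wins the vote.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-class reference set in a two-dimensional feature space.
references = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
              (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

predicted_near_a = knn_classify(references, labels, (1.1, 1.0), k=3)
predicted_near_b = knn_classify(references, labels, (5.0, 4.8), k=3)
```

Note that every classification requires a distance computation against all reference samples, so the cost grows with the size of the reference set.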
In the function estimation approach the aim is to find a function of the input variables which can separate the classes. ANNs have been widely used for this purpose. An ANN consists of an input layer of neurons that connects the input features to a hidden layer of neurons, which is in turn connected to the output neurons. The ANN is trained to give specific outputs for specific inputs by adjusting the weights of its neurons. Although an ANN is capable of classification where linear separation of the classes is not possible, its computational requirement is a major concern.
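As a rough illustration of the layered structure, the sketch below builds a network with two inputs, two hidden neurons and one output neuron that separates the XOR classes, which no single hyperplane can separate. The weights are chosen by hand for illustration and are not the result of training; in practice a training procedure would adjust them. The names `step`, `neuron` and `tiny_ann` are hypothetical.

```python
def step(x):
    """Threshold activation: the neuron fires (1) when its input is positive."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def tiny_ann(x1, x2):
    """Input layer -> two hidden neurons -> one output neuron.

    Weights are hand-picked (an assumption for illustration), not learned.
    """
    h_or = neuron((x1, x2), (1.0, 1.0), -0.5)    # fires if at least one input is 1
    h_and = neuron((x1, x2), (1.0, 1.0), -1.5)   # fires only if both inputs are 1
    # Output fires when h_or is on and h_and is off, i.e. exactly one input is 1.
    return neuron((h_or, h_and), (1.0, -1.0), -0.5)


xor_outputs = [tiny_ann(a, b) for a in (0, 1) for b in (0, 1)]
```

The hidden layer is what allows the network to represent a separation that is not linear in the original inputs.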
Feature Selection
Features are the most important part of any classification process. Ideally, the feature set should contain enough information to classify the data easily, but this does not necessarily mean that a large feature set is required. A large feature set containing irrelevant features would not only add complexity but would also result in a decrease in performance. Large feature sets therefore require preprocessing of the data to remove redundant and less important features.
Feature selection can be divided into two main approaches: the filter approach and the wrapper approach. In the filter approach, feature selection is a preprocessing step independent of the classification algorithm used later, while in the wrapper approach it is treated as part of the classification algorithm. The filter approach is much simpler and computationally more efficient. A subset of the original features is chosen based on the predictive significance of the features. An ideal subset should contain features that are correlated with class membership but uncorrelated with each other. Normally, statistical tests are carried out to check these criteria. Wrapper methods, on the other hand, are more complex and computationally more demanding, but give better results than filter methods.
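One possible filter-style selection is sketched below. It assumes the Pearson correlation coefficient as the statistical measure (the text does not name a specific test) and uses a simple greedy rule: rank features by their correlation with the class label, then skip any feature that is too strongly correlated with one already selected. The function names, the `max_redundancy` threshold and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def filter_select(features, labels, k, max_redundancy=0.9):
    """Greedy filter selection: relevant to the class, not redundant with each other."""
    # Rank features by absolute correlation with class membership.
    ranked = sorted(features,
                    key=lambda name: abs(pearson(features[name], labels)),
                    reverse=True)
    selected = []
    for name in ranked:
        # Skip a feature that largely duplicates one already chosen.
        redundant = any(abs(pearson(features[name], features[s])) > max_redundancy
                        for s in selected)
        if not redundant:
            selected.append(name)
        if len(selected) == k:
            break
    return selected


# Hypothetical data: "b" is nearly a copy of "a"; "c" carries different information.
features = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, 2, 4, 3, 5, 6],
    "c": [2, 3, 1, 3, 2, 4],
}
labels = [0, 0, 0, 1, 1, 1]

chosen = filter_select(features, labels, k=2)
```

No classifier is involved at any point, which is what makes this a filter rather than a wrapper method.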
Feature generation is sometimes performed instead of feature selection. It transforms a set of features from the original d_o dimensions to a new set of d_n dimensions such that d_n < d_o, which results in a smaller, richer set of features. Popular methods for feature generation include PCA and ICA. Fortunately, GP has an inherent feature selection process: implicit feature selection is performed during evolution.
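To make the reduction from d_o to d_n dimensions concrete, the sketch below projects two-dimensional samples (d_o = 2) onto their first principal component (d_n = 1). It is a simplified illustration of PCA using power iteration on the covariance matrix, not a full implementation; the function names and the toy data are assumptions, and a numerical library would normally be used instead.

```python
import math


def first_principal_component(samples, iterations=100):
    """Return (mean, direction) of the leading principal component.

    Power iteration is used; it assumes the start vector is not orthogonal
    to the leading eigenvector of the covariance matrix.
    """
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(row[i] * row[j] for row in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # all samples identical: no direction of variance
        v = [x / norm for x in w]
    return mean, v


def project(samples, mean, direction):
    """Map each d_o-dimensional sample to a single generated feature."""
    return [sum((s[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for s in samples]


# Hypothetical samples lying on the line y = 2x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
mean, direction = first_principal_component(samples)
generated = project(samples, mean, direction)
```

Each generated feature is a combination of all the original features, in contrast to feature selection, which keeps a subset of the original features unchanged.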