
1.5 Benchmark Tasks with Unbalanced Data

2.1.1 Classification

Classification is a supervised machine learning task in which the system learns from a set of labeled input examples, or instances. Given a set of instances described by attributes (features), together with their corresponding class labels, classification involves learning a model that correctly predicts the class membership of each instance [129][32].

Common to supervised learning problems are the concepts of training and test sets. A training set is a collection of input patterns from which classification rules are induced. A test set is a similar collection of input patterns, except that these are not used during the learning process and remain unseen while the rules are learnt. The purpose of the test set is to evaluate the performance of the learnt rules on unseen instances of the problem. This is important because it verifies that the learnt rules are not over-fitted to the training set. In supervised learning, the learning procedure is two-fold. First, the rules or function describing the input-output mapping are discovered (learnt) using the training set. Second, these rules or functions are applied to the test set to determine how well the learnt concepts perform (or generalise) on unseen problem instances.
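The two-fold procedure can be sketched in a few lines of Python. This is an illustrative sketch only. The 30% hold-out fraction, the helper names (`train_test_split`, `accuracy`, `fit_threshold`) and the one-feature threshold learner are all hypothetical choices made for this example, not the method used in this thesis. The key point is that the learner sees only the training set, and the test set is used only for evaluation.

```python
import random

def train_test_split(instances, labels, test_fraction=0.3, seed=0):
    """Randomly partition labelled instances into disjoint training and test sets."""
    indices = list(range(len(instances)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def subset(idx):
        return [instances[i] for i in idx], [labels[i] for i in idx]

    return subset(train_idx), subset(test_idx)

def accuracy(predict, instances, labels):
    """Fraction of instances whose predicted class matches the true label."""
    return sum(predict(x) == y for x, y in zip(instances, labels)) / len(labels)

def fit_threshold(xs, ys):
    """Hypothetical one-feature learner: threshold halfway between the two class means.

    Assumes both classes occur in the training set and class 1 has the larger mean.
    """
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    cut = (mean0 + mean1) / 2
    return lambda x: 1 if x > cut else 0

# Toy data: a single feature, class 0 for small values and class 1 for large ones.
xs = [i / 10 for i in range(20)]
ys = [0 if i < 10 else 1 for i in range(20)]
(train_x, train_y), (test_x, test_y) = train_test_split(xs, ys)
rule = fit_threshold(train_x, train_y)          # step 1: learn from the training set only
train_acc = accuracy(rule, train_x, train_y)
test_acc = accuracy(rule, test_x, test_y)       # step 2: measure generalisation on unseen data
```

A large gap between `train_acc` and `test_acc` is the usual symptom of over-fitting.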

This thesis focuses on supervised learning. Within this area there are many different learning paradigms. The four paradigms discussed below are used in the experimental results throughout the thesis.

Bayesian Classifiers

Bayesian classifiers use a probabilistic approach to classification based on Bayesian probability principles. Naive Bayes (NB) is a simple but popular Bayesian classifier that uses Bayes' theorem to compute unknown probability estimates (i.e. the class of an unseen instance) from known ones (i.e. the features of known instances) [129]. NB is remarkably effective in practice and can achieve results competitive with more complex learning paradigms [129][178]. However, NB makes strong (naive) assumptions about the conditional independence of the features: the presence (or absence) of one feature is assumed to be completely unrelated to the presence (or absence) of any other feature, given the class. Bayesian belief networks [94] address this conditional-independence issue by representing dependencies between features as a directed graph.
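The following sketch shows how the naive independence assumption turns Bayes' theorem into a simple product. It is a minimal categorical NB in plain Python. The class score is log P(c) plus the sum over features j of log P(x_j | c). Add-one (Laplace) smoothing is an assumption added here so that an unseen feature value does not force a probability to zero. The toy weather-style data is hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(instances, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = number of class-c instances whose feature j equals v
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    feature_values = defaultdict(set)  # distinct values observed for each feature j
    for x, c in zip(instances, labels):
        for j, v in enumerate(x):
            feature_counts[c][j][v] += 1
            feature_values[j].add(v)
    return class_counts, feature_counts, feature_values

def predict_nb(model, x):
    """Return the class maximising log P(c) + sum_j log P(x_j | c)."""
    class_counts, feature_counts, feature_values = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)  # prior P(c)
        for j, v in enumerate(x):
            # Naive assumption: each feature contributes independently given the class.
            # Add-one smoothing (an assumption of this sketch) avoids log(0).
            count = feature_counts[c][j][v]
            score += math.log((count + 1) / (n_c + len(feature_values[j])))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

instances = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
             ("rainy", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = train_nb(instances, labels)
```

For example, `predict_nb(model, ("rainy", "mild"))` scores "yes" at 3/5 × 3/6 × 2/6 = 0.1 against 2/5 × 1/5 × 2/5 = 0.032 for "no", and therefore predicts "yes".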


Statistical Paradigms

Support Vector Machines (SVMs) [167] are statistical supervised learning algorithms. An SVM constructs one or more hyperplanes in the (high-dimensional) feature space that aim to separate the input instances of the two classes. Among the separating hyperplanes, it chooses the one that maximises the distance between the decision hyperplane and the nearest input instances from both classes (this distance is called the margin). The original SVM algorithm was a linear classifier, in which an input instance is assigned a class label depending on which side of the decision hyperplane it lies [167]. Later versions use kernel functions to construct non-linear decision surfaces [44].
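The linear case can be made concrete with a short sketch. The original SVM formulation is solved as a quadratic program. As a swapped-in technique, this example instead minimises the equivalent L2-regularised hinge loss by Pegasos-style stochastic sub-gradient descent. The penalty on the norm of w rewards a wide margin, and the hinge term penalises instances that fall inside the margin. Folding the bias into w as a constant feature, and the values `lam=0.1` and `epochs=500`, are assumptions made for brevity. Kernels are not shown.

```python
import random

def train_linear_svm(X, y, lam=0.1, epochs=500, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    The bias is folded in as an extra constant feature of 1 (so it is regularised too).
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Regulariser step: the L2 penalty favours small ||w||, i.e. a wide margin 1/||w||.
            w = [(1 - eta * lam) * wj for wj in w]
            if margin < 1:
                # Instance is inside the margin or misclassified: the hinge loss is active.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict_svm(w, x):
    """Class label given by the side of the hyperplane w . [x, 1] = 0 on which x lies."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

X = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
```

On this symmetric toy set the maximum-margin hyperplane passes through the origin with w proportional to (1, 1). The support vectors are (2, 2) and (-2, -2), which lie on the margin boundaries.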

Genetic Paradigms

Genetic paradigms comprise a wide range of nature-inspired computational methodologies that incorporate the principles of Darwinian evolution and natural selection into machine learning. Popular genetic paradigms include genetic algorithms [86] and genetic programming [104], the latter being the focus of this thesis. These paradigms and other evolutionary computation methodologies are discussed in more detail in the next section.
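As a rough illustration of selection and variation, here is a minimal genetic algorithm on the toy OneMax problem, where fitness is the number of 1-bits. Everything here is an assumption chosen for illustration: the fitness function, tournament size, mutation rate, population size and the use of elitism. Genetic programming, the focus of this thesis, evolves programs rather than fixed-length bit strings.

```python
import random

def one_max(bits):
    """Toy fitness function (assumed for illustration): number of 1-bits."""
    return sum(bits)

def tournament(population, rng, k=3):
    """Selection: the fittest of k randomly drawn individuals becomes a parent."""
    return max(rng.sample(population, k), key=one_max)

def evolve(n_bits=20, pop_size=30, generations=40, p_mut=0.05, seed=1):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        best = max(population, key=one_max)
        history.append(one_max(best))
        next_gen = [best[:]]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            mum, dad = tournament(population, rng), tournament(population, rng)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Bit-flip mutation: each gene flips independently with probability p_mut.
            child = [1 - b if rng.random() < p_mut else b for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=one_max), history

best, history = evolve()
```

Because of elitism, the best fitness recorded in `history` can never decrease from one generation to the next.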

Ensemble Paradigms

Ensemble methods combine multiple learnt models to obtain better predictive performance than could be obtained from any of the constituent models alone [31][129]. In an ensemble of classifiers, a majority vote is typically used to combine the outputs of the individual members. For a given input, each member votes on the output (e.g. the predicted class label), and the class label with the most votes is chosen as the ensemble output. Ensemble methods can use base learners from different learning paradigms, provided that the base classifiers are accurate and diverse with respect to their outputs [54][31]. Diverse ensemble members should not make the same errors on the same inputs. Otherwise, an error shared by most members cannot be outvoted, and the ensemble will misclassify those inputs every time.
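The effect of diversity on a majority vote can be seen in a small experiment. In this sketch, each hypothetical base classifier is correct everywhere except on a chosen set of inputs. Three members that err on different inputs produce a perfect ensemble. Three members that err on the same input do not, even though every member has the same individual accuracy of 0.75.

```python
from collections import Counter

def majority_vote(classifiers, x):
    """Ensemble output: the label predicted by the most members."""
    votes = Counter(clf(x) for clf in classifiers)
    # Ties are broken by the order in which labels were first seen.
    return votes.most_common(1)[0][0]

truth = {0: "a", 1: "b", 2: "a", 3: "b"}  # hypothetical inputs and true labels

def make_member(wrong_on):
    """Hypothetical base classifier: correct except on the inputs in wrong_on."""
    flip = {"a": "b", "b": "a"}
    return lambda x: flip[truth[x]] if x in wrong_on else truth[x]

def ensemble_accuracy(members):
    return sum(majority_vote(members, x) == y for x, y in truth.items()) / len(truth)

diverse = [make_member({0}), make_member({1}), make_member({2})]  # errors on different inputs
identical = [make_member({0})] * 3                                # errors on the same input
diverse_acc = ensemble_accuracy(diverse)      # 1.0: every single error is outvoted
identical_acc = ensemble_accuracy(identical)  # 0.75: the shared error survives the vote
```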

Other Paradigms

In addition to the learning paradigms above, there are many other learning paradigms. Three important categories are described below. They are not used in this thesis but are included to give the reader a broader view of the field.

Connectionist Paradigms. These include artificial neural networks (ANNs) [23] which are computational models inspired by biological neural networks. An ANN consists of an interconnected group of artificial neurons (called nodes), where information (usually numeric) travels through nodes in different layers of the network. In classification, ANNs can model complex relationships between inputs (e.g. features) and outputs (class membership) to find patterns in data. However, ANNs are typically “black-box” learners as end-users cannot easily interpret the learned concept to understand how an ANN has learned to solve a problem.
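The sketch below shows how values flow layer by layer through a network, using a tiny two-layer network with step activations that computes XOR. No single linear decision boundary can represent XOR. The weights here are hand-chosen for illustration, whereas a real ANN would learn them from data (e.g. by backpropagation). Even in this small case, the meaning of each weight is not obvious without tracing the computation, which hints at the "black-box" problem.

```python
def step(z):
    """Threshold activation: the node fires (1) when its weighted input is positive."""
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """Each node outputs step(weighted sum of its inputs + its bias)."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def xor_network(x1, x2):
    # Hidden layer: the first node acts as OR, the second as AND (hand-chosen weights).
    hidden = layer([x1, x2], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output node: OR and not AND, i.e. exclusive or.
    (out,) = layer(hidden, weights=[[1, -1]], biases=[-0.5])
    return out

truth_table = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```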

Case-Based Reasoning. These include the nearest neighbour algorithm [45], which assigns an unseen instance the class of the closest training instance in feature space. These learning paradigms are lazy in that they do not attempt to learn or generalise a classification model from the training data. Instead, the training instances are stored and consulted only when a prediction is required.
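A 1-nearest-neighbour classifier shows what "lazy" means in practice: there is no training step at all. This sketch assumes numeric features and Euclidean distance, although other distance measures are also used. The toy points are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math

def nearest_neighbour(train_X, train_y, x):
    """1-NN: return the label of the training instance closest to x.

    Nothing is learnt in advance; every prediction scans the stored training set.
    """
    distances = [math.dist(x, xi) for xi in train_X]  # Euclidean distance (assumed)
    return train_y[distances.index(min(distances))]

train_X = [(0, 0), (0, 1), (5, 5), (6, 5)]
train_y = ["a", "a", "b", "b"]
```

For instance, `nearest_neighbour(train_X, train_y, (1, 0))` returns `"a"`, because (0, 0) is only one unit away.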

Induction-Based Reasoning. These include decision tree algorithms, which repeatedly split the training instances on the features that best separate them by class [129]. Decision trees classify an instance by traversing the tree in a top-down manner, starting at the root node and ending at a leaf node that represents the class label. Decision trees are easy to interpret because they represent if-then classification rules. Popular algorithms for building decision trees include ID3 [147] and C4.5 [148].
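To make "the feature that best separates the instances" concrete, the sketch below computes the information-gain criterion used by ID3. It chooses only the root split and does not build a full tree. C4.5 refines this criterion (e.g. with the gain ratio), which is not shown. The tiny dataset and function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(instances, labels, feature):
    """Expected reduction in entropy from splitting on the given feature index."""
    n = len(labels)
    groups = {}
    for x, y in zip(instances, labels):
        groups.setdefault(x[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(instances, labels):
    """Index of the feature with the highest information gain (the root test)."""
    return max(range(len(instances[0])),
               key=lambda f: information_gain(instances, labels, f))

instances = [("sunny", "high"), ("sunny", "normal"), ("rainy", "high"), ("rainy", "normal")]
labels = ["no", "yes", "no", "yes"]
```

Here feature 1 (the second value) separates the classes perfectly, so its gain is 1 bit. Feature 0 gives no gain, so `best_split` selects feature 1 as the root test.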