6.3 GP for Classification of Imbalanced Data
6.3.3 Fitness Functions
The technique discussed above finds the optimum threshold between classes and tells us how to assign class labels to different input samples. Using those class labels, the fitness of a GP individual can be calculated, which tells us how accurately it labels the samples. The true positives (TPs) and true negatives (TNs) are calculated using PDF-based threshold detection and are used to evaluate the different fitness functions. The sensitivity and specificity defined in the previous chapter are also used in evaluating the fitness functions. Five fitness functions are used in this study; they are explained one by one in the following sections.
Accuracy Fitness
Accuracy fitness (AF) takes the overall classification accuracy as the fitness function of a GP individual. The formula for AF is the same as the classification accuracy given in equation (5.5): the ratio of the number of correctly classified samples to the total number of samples in the dataset. Since it treats all misclassified samples equally, irrespective of which class they belong to, the majority class may dominate the training process and produce biased solutions.
Figure 6.3(a) shows the contribution of sensitivity and specificity to AF, where sensitivity is the accuracy on the minority class and specificity is the accuracy on the majority class. The figure was plotted using the diabetes training dataset as an example, with 450 majority class samples and 241 minority class samples. It shows that if the specificity is 1 and the sensitivity is 0, the fitness is 0.65, whereas in the opposite case, where specificity is 0 and sensitivity is 1, the fitness is only 0.35. The classifier could therefore be biased towards specificity, since high specificity alone already yields a high overall fitness.
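The two extreme cases quoted above follow from writing the overall accuracy as a class-size-weighted average of sensitivity and specificity. The short Python sketch below reproduces them. The function name `accuracy_fitness` and its parameters are illustrative assumptions and not part of the thesis implementation; the minority class is assumed to be the positive class.

```python
def accuracy_fitness(sensitivity, specificity, n_minority, n_majority):
    """Overall accuracy expressed through the per-class accuracies.

    Assumption: the minority class is the positive class, so sensitivity
    is the minority-class accuracy and specificity is the majority-class
    accuracy.
    """
    correct = sensitivity * n_minority + specificity * n_majority
    return correct / (n_minority + n_majority)


# Class sizes of the diabetes training set quoted in the text.
N_MAJORITY, N_MINORITY = 450, 241

# Every majority sample correct, every minority sample wrong.
af_majority_only = accuracy_fitness(0.0, 1.0, N_MINORITY, N_MAJORITY)
# Every minority sample correct, every majority sample wrong.
af_minority_only = accuracy_fitness(1.0, 0.0, N_MINORITY, N_MAJORITY)

print(round(af_majority_only, 2), round(af_minority_only, 2))
```

The two values are simply the class proportions 450/691 and 241/691, which is why AF rewards the majority class more strongly.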
Equal Weighted Fitness
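The equal weighted fitness (EWF) takes the average of sensitivity and specificity to favour individuals that perform well on both the majority and the minority class. The equation for calculating EWF is given below,
$$\text{Equal Weighted Fitness (EWF)} = 0.5 \times (\text{Sensitivity} + \text{Specificity}) \tag{6.2}$$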
Figure 6.3(b) shows the graph of EWF. It shows that if either sensitivity or specificity is 0, the maximum fitness an individual can achieve is 0.5. To obtain a high overall fitness value, an individual needs to perform well on both majority and minority class samples.
Weighted Square Fitness
The weighted square fitness (WSF) is the average of the squares of sensitivity and specificity. It is similar to the equal weighted fitness but differs in the rate at which the fitness changes with sensitivity and specificity.
$$\text{Weighted Square Fitness (WSF)} = 0.5 \times (\text{Sensitivity}^2 + \text{Specificity}^2) \tag{6.3}$$
The variation in WSF with respect to sensitivity and specificity is shown in Figure 6.3(c).
Geometric Mean Based Fitness
Geometric mean based fitness (GF) also gives equal weight to sensitivity and specificity.
$$\text{Geometric Mean Based Fitness (GF)} = \sqrt{\text{Sensitivity} \times \text{Specificity}} \tag{6.4}$$
Figure 6.3(d) shows a plot of GF against sensitivity and specificity. One major difference between GF and the other fitness functions is that if either sensitivity or specificity is zero, the fitness is zero. A high fitness value therefore requires both sensitivity and specificity to be high at the same time.
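The difference between the equal weighted, weighted square and geometric mean based fitness functions is easiest to see by evaluating them at a few operating points. The following Python sketch does this under the definitions given in the text; the function names and the sample points are illustrative assumptions, not values taken from the experiments.

```python
import math


def ewf(sens, spec):
    """Equal weighted fitness: average of sensitivity and specificity."""
    return 0.5 * (sens + spec)


def wsf(sens, spec):
    """Weighted square fitness: average of the squared class accuracies."""
    return 0.5 * (sens ** 2 + spec ** 2)


def gf(sens, spec):
    """Geometric mean based fitness."""
    return math.sqrt(sens * spec)


# Hypothetical (sensitivity, specificity) pairs chosen for illustration.
points = [(0.0, 1.0), (0.5, 0.5), (0.9, 0.3), (1.0, 1.0)]

for sens, spec in points:
    print(
        f"sens={sens:.1f} spec={spec:.1f} "
        f"EWF={ewf(sens, spec):.3f} "
        f"WSF={wsf(sens, spec):.3f} "
        f"GF={gf(sens, spec):.3f}"
    )
```

At the point (0, 1) both EWF and WSF still award a fitness of 0.5, whereas GF drops to 0, which is the behaviour described above.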
Variable Weighted Fitness
In variable weighted fitness (VWF), the total fitness is a weighted average of sensitivity and specificity.
$$\text{VWF} = W \times \text{Sensitivity} + (1 - W) \times \text{Specificity} \tag{6.5}$$
where W is the weight factor controlling the contribution of each class to the fitness function. The value of W varies between 0 and 1. When W = 0.5, the VWF reduces to EWF, where the accuracy of both classes is considered equally important. When W > 0.5, the minority class contributes more to the fitness, and when W < 0.5, the main contribution to the fitness comes from the majority class.
[Figure 6.3: (a) AF, (b) EWF, (c) WSF, (d) GF, each plotted against sensitivity and specificity.]
Choosing the right value of W is difficult, and the best value often varies from problem to problem depending on the class imbalance ratio. In this thesis the value of W is varied from 0.1 to 0.9 in increments of 0.1 and its effect on the classification accuracy is investigated. The investigation is based on the receiver operating characteristics (ROC) explained below.
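As a rough illustration of how the weight factor shifts the fitness, the Python sketch below evaluates the variable weighted fitness for W from 0.1 to 0.9 at a single operating point. The function name `vwf` and the sensitivity and specificity values are hypothetical and serve only to show the trend; they are not results from this study.

```python
def vwf(sens, spec, w):
    """Variable weighted fitness: W * sensitivity + (1 - W) * specificity."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("W must lie between 0 and 1")
    return w * sens + (1.0 - w) * spec


# W = 0.1, 0.2, ..., 0.9, as in the sweep described in the text.
weights = [k / 10 for k in range(1, 10)]

# Hypothetical individual that is better on the majority class.
sens, spec = 0.6, 0.9

for w in weights:
    print(f"W={w:.1f} VWF={vwf(sens, spec, w):.3f}")
```

For an individual of this kind the fitness falls as W grows, because more weight is placed on the weaker minority-class accuracy.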
Receiver Operating Characteristics
Receiver operating characteristics (ROC) graphs are commonly used in medical diagnosis. More recently, ROC graphs have been widely adopted in machine learning because they give a more reliable measure of classifier performance on imbalanced datasets than overall accuracy. An ROC graph is a two-dimensional graph with the TP rate along the y-axis and the FP rate along the x-axis.
Figure 6.4(a) shows an ROC graph with the outputs of four discrete classifiers. There are some important points and regions in this graph that need special attention. A classifier at the point (0,0) never makes a false positive mistake but never finds a true positive either. On the other hand, a classifier at the point (1,1) unconditionally labels all the samples in the dataset as positive. The point (0,1) represents perfect classification, with a 100% TP rate and a 0% FP rate.
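The coordinates of a discrete classifier in ROC space can be computed directly from its confusion counts. The Python sketch below shows this for the three special points mentioned above; the function name `roc_point` is an illustrative assumption, and the counts reuse the diabetes class sizes quoted earlier with the minority class taken as positive.

```python
def roc_point(tp, fp, tn, fn):
    """Return (FP rate, TP rate) for one discrete classifier.

    TP rate = TP / (TP + FN), which is the sensitivity.
    FP rate = FP / (FP + TN), which is 1 - specificity.
    """
    tp_rate = tp / (tp + fn)
    fp_rate = fp / (fp + tn)
    return fp_rate, tp_rate


# 241 positive (minority) and 450 negative (majority) samples.
always_negative = roc_point(tp=0, fp=0, tn=450, fn=241)
always_positive = roc_point(tp=241, fp=450, tn=0, fn=0)
perfect = roc_point(tp=241, fp=0, tn=450, fn=0)

print(always_negative, always_positive, perfect)
```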
The dotted line (y = x) in the middle represents random guessing. For example, classifier C in the figure is a random classifier that correctly labels 60% of the positive samples but also mislabels 60% of the negative samples as positive. The performance of a random classifier always lies on this line: if the TP rate increases, so does the FP rate, and vice versa.
Any classifier in the lower left region is said to be conservative. It labels a sample as positive only with strong evidence, making very few FP errors but also achieving a low TP rate. A classifier in the upper right corner is said to be liberal because it labels samples as positive on weak evidence, labelling nearly all the samples as positive and thus producing a high FP rate. In the figure, classifier B is more conservative than A. Since in many real-life problems the TPs are more important than the TNs, any classifier operating in the upper left half of the graph is more interesting.
A classifier performing below the dotted line is worse than random guessing but can still be useful if negated. With negation its TPs become FNs and its FPs become TNs, so that it performs in the upper left triangle of the graph. Such a classifier is said to have useful information, but mainly for TNs.
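The effect of negation on an ROC point can be sketched in a few lines of Python. Inverting every prediction swaps TPs with FNs and FPs with TNs, so both rates are replaced by one minus their original value and the point is reflected to the other side of the diagonal. The function names and the example point are hypothetical.

```python
def negate(fp_rate, tp_rate):
    """ROC point obtained by inverting every prediction of a classifier."""
    return 1.0 - fp_rate, 1.0 - tp_rate


def better_than_random(fp_rate, tp_rate):
    """True if the point lies above the diagonal y = x."""
    return tp_rate > fp_rate


# Hypothetical classifier below the diagonal.
original = (0.7, 0.2)
negated = negate(*original)

print(better_than_random(*original), better_than_random(*negated))
```

A point that lies exactly on the diagonal stays on the diagonal after negation, so a random classifier gains nothing from being inverted.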
Normally a classifier produces a single point in an ROC graph and is called a discrete classifier. For binary classifiers there is a way to get a set of operating
6.4. IMPROVED CPS WITH MULTIPLE OFFSPRING 110