
Firstly, this method is not influenced by the unbalanced classes in these tasks since the output values for the two classes are modelled using two Gaussian distributions. This allows the outputs from both classes to be treated as equally important when calculating the probability density function (φ), as each φ value is calculated relative to the µc and σc values for the minority and majority classes alone. In contrast, both the SDCB and DRS methods are “slot”-based; this means that each slot contains the output values from both classes. When the classes are unbalanced, the larger majority class can influence the class label of each slot, as the class with the largest number of examples in a given slot determines the slot’s class label. Secondly, this probabilistic method requires no extra parameter configuration, whereas both “slot”-based methods require some a priori parameter configuration, e.g., the size of each slot, the total number of slots, and the range values of the slots. These parameters can be problem-specific and require a trial-and-error process to configure. Finally, this non-static classification strategy is relatively fast to compute and does not add a significant cost to the GP training times compared to the ZT strategy. This probabilistic strategy requires only one additional pass through the fitness cases to compute the µc and σc values for the two class distributions during fitness evaluation.
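The idea of fitting one Gaussian per class can be illustrated with a minimal sketch. It assumes that each class distribution is estimated from that class's outputs alone, and that an output is assigned to the class with the larger φ value; the function names, the use of the population standard deviation, and the sample outputs are hypothetical choices for illustration, not details taken from the method itself.

```python
import math

def fit_gaussian(outputs):
    """Return (mu_c, sigma_c) for one class, using that class's outputs only.
    Assumption: population standard deviation (divide by n)."""
    n = len(outputs)
    mu = sum(outputs) / n
    variance = sum((x - mu) ** 2 for x in outputs) / n
    return mu, math.sqrt(variance)

def gaussian_pdf(x, mu, sigma):
    """Probability density (phi) of x under a Gaussian N(mu, sigma^2).
    Note: sigma must be non-zero, i.e. the class outputs must not all be equal."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def classify(output, minority_params, majority_params):
    """Assumed decision rule: pick the class whose Gaussian gives the larger phi."""
    phi_min = gaussian_pdf(output, *minority_params)
    phi_maj = gaussian_pdf(output, *majority_params)
    return "minority" if phi_min > phi_maj else "majority"

# Hypothetical program outputs: 3 minority and 8 majority examples.
minority_outputs = [2.0, 3.0, 4.0]
majority_outputs = [-4.0, -3.0, -2.0, -3.5, -2.5, -3.0, -1.0, -5.0]

# One extra pass over the fitness cases yields the two (mu_c, sigma_c) pairs.
minority_params = fit_gaussian(minority_outputs)
majority_params = fit_gaussian(majority_outputs)

print(classify(3.0, minority_params, majority_params))
print(classify(-3.0, minority_params, majority_params))
```

Because each pair (µc, σc) is estimated from one class only, the number of majority class examples does not change the shape of the minority class distribution, which is the property the paragraph above relies on.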

3.3 Fitness Functions in GP

The static and non-static classification strategies (discussed above) determine how a class label is assigned to a particular input instance. The fitness function is different; this defines a measure to calculate the accuracy of a solution by comparing the predicted class labels (as returned by a particular classification strategy) with the target (or actual) class labels.

This section outlines three typical current approaches in the fitness function and discusses the advantages and limitations of each. The first is the standard fitness function for classification: the overall classification accuracy. The other two are improved fitness functions for classification with unbalanced data: the average accuracy of the minority and majority classes, and the AUC.

3.3.1 Overall Accuracy in Fitness

The traditional measure for classification, Acc shown below, uses the overall classification accuracy in the fitness function (as discussed in Section 2.5.1 in Chapter 2). Using Table 3.1, this corresponds to the number of examples correctly predicted by a classifier as a proportion of the total number of training examples. Note that the same confusion matrix as Table 3.1 is also shown in the previous chapter (Table 2.1 on page 16) but is repeated here for convenience.

Table 3.1: Outcomes of a two-class classification problem.

                         Predicted Positive Class    Predicted Negative Class
Actual Positive Class    True Positive (TP)          False Negative (FN)
Actual Negative Class    False Positive (FP)         True Negative (TN)

$$\mathit{Acc} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3.2}$$

In Acc, fitness values range between 0 and 1, where 0 is very poor overall accuracy and 1 is perfect classification accuracy. As Acc treats all correct predictions as equally important in fitness, the larger number of examples from the majority class can dominate the overall accuracy, rewarding biased solutions with high fitness values [173][179][130].

For example, consider a data set that contains 100 instances where 10 belong to the minority class and the rest (90) belong to the majority class. Using this fitness function, a trivial solution p, which classifies all the instances as belonging to the majority class, can score a relatively high fitness, 0.9 (shown by $\mathit{Acc}_p$ below). An alternative solution q with better discrimination ability between examples from the two classes, e.g., which correctly classifies 8 minority class examples and 72 majority class examples, scores a lower fitness value, 0.8 (shown by $\mathit{Acc}_q$ below). Even though q has good accuracy rates on both classes, its fitness is lower than that of the biased solution p and thus q will have a lower selection probability than p in the evolution.

$$\mathit{Acc}_p = \frac{0 + 90}{0 + 90 + 0 + 10} = \frac{90}{100} = 0.9$$

$$\mathit{Acc}_q = \frac{8 + 72}{8 + 72 + 18 + 2} = \frac{80}{100} = 0.8$$
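The arithmetic for the two solutions can be checked with a short sketch of Eq. (3.2). The function name `overall_accuracy` is a hypothetical helper introduced for illustration; the confusion-matrix counts are those implied by the example (p predicts every instance as majority; q makes 2 minority and 18 majority errors).

```python
def overall_accuracy(tp, tn, fp, fn):
    """Eq. (3.2): correct predictions as a proportion of all examples."""
    return (tp + tn) / (tp + tn + fp + fn)

# Solution p: every instance is predicted as the majority (negative) class.
acc_p = overall_accuracy(tp=0, tn=90, fp=0, fn=10)

# Solution q: 8 of 10 minority and 72 of 90 majority examples are correct.
acc_q = overall_accuracy(tp=8, tn=72, fp=18, fn=2)

print(acc_p, acc_q)  # 0.9 0.8
```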

3.3.2 Average Class Accuracy in Fitness

To promote solutions which have better accuracy on both classes, the fitness measure Ave (Eq. 3.3) uses the average classification accuracy of the minority and majority classes in fitness.

$$\mathit{Ave} = \frac{1}{2} \left( \frac{TP}{TP + FN} + \frac{TN}{TN + FP} \right) \tag{3.3}$$

Figure 3.3: (a) Shaded area is the trapezoid fitted under two points (i and i+1) on an ROC curve (TP rate plotted against FP rate), where w is the width, and h and h′ are the heights of the trapezoid.

In Eq. (3.3), the accuracy of each class is treated separately in the fitness function, where both contribute equally to the final fitness value. Using the example above, the biased classifier p will now have a poorer fitness of 0.5 (shown by $\mathit{Ave}_p$) than solution q, which has a fitness of 0.8 (shown by $\mathit{Ave}_q$). Solution q has a higher fitness because it has a better accuracy across both classes.

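$$\mathit{Ave}_p = \frac{1}{2} \left( \frac{0}{0 + 10} + \frac{90}{90 + 0} \right) = \frac{1}{2} \left( \frac{0}{10} + \frac{90}{90} \right) = \frac{1}{2} (0 + 1) = 0.5$$

$$\mathit{Ave}_q = \frac{1}{2} \left( \frac{8}{8 + 2} + \frac{72}{72 + 18} \right) = \frac{1}{2} \left( \frac{8}{10} + \frac{72}{90} \right) = \frac{1}{2} (0.8 + 0.8) = 0.8$$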
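The same two solutions can be scored with a short sketch of Eq. (3.3). The function name `average_class_accuracy` is a hypothetical helper for illustration; the counts are the same as in the overall accuracy example.

```python
def average_class_accuracy(tp, tn, fp, fn):
    """Eq. (3.3): mean of the minority (TP rate) and majority (TN rate) accuracies."""
    tp_rate = tp / (tp + fn)  # accuracy on the positive (minority) class
    tn_rate = tn / (tn + fp)  # accuracy on the negative (majority) class
    return 0.5 * (tp_rate + tn_rate)

# Solution p: every instance is predicted as the majority class.
ave_p = average_class_accuracy(tp=0, tn=90, fp=0, fn=10)

# Solution q: 8 of 10 minority and 72 of 90 majority examples are correct.
ave_q = average_class_accuracy(tp=8, tn=72, fp=18, fn=2)

print(ave_p, ave_q)  # 0.5 0.8
```

With this measure the ranking of the two solutions is reversed relative to Acc: q now scores higher than p.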

3.3.3 Area under the ROC curve

While Ave can find solutions with better minority class accuracies than the standard Acc, a major limitation of both these measures is that they represent the performance of a solution when it is evaluated using a single class threshold. In contrast, the area under the ROC curve (or AUC) measures the classification performance at multiple class thresholds. The AUC measures the overall quality of a classifier when the threshold parameter biasing the final classification decision is varied [130].

$$\mathit{Auc} = \sum_{i=1}^{N-1} \frac{1}{2} \left( FP_{i+1} - FP_i \right) \left( TP_{i+1} + TP_i \right) \tag{3.4}$$

In Eq. (3.4), N is the number of class thresholds and $TP_i$/$FP_i$ represent the performance of the solution at class threshold i. The equation sums the areas of the individual trapezoids² fitted under the ROC points, as shown in Figure 3.3 for two ROC points. This measure returns values between 0 and 1, where the higher the value, the better the performance. The AUC corresponds to the probability that a minority class example is correctly predicted across different class thresholds [84]. As mentioned, the AUC is a particularly useful and common measure of performance in classification tasks with unbalanced data as it represents how well a learned classifier approximates the trade-off between the minority and the majority classes across multiple classification thresholds.

² The area of a trapezoid is $\frac{1}{2} w (h + h')$, where w is the width, and h and h′ are the heights.

Figure 3.4: (a) Numeric outputs of a GP solution when it is evaluated on the input instances, where + and − denote the positive (minority) class and negative (majority) class outputs, respectively, and Ti and Tj are two different class thresholds; (b) an ROC curve (TP rate plotted against FP rate) with two points, one for each of Ti and Tj.

The following procedure is used to generate an ROC curve for a given GP solution.

a) Evaluate the solution on all the input instances from both classes to obtain the numeric output values (this requires one full pass through the input instances). Store the numeric output values separately for the two classes (e.g. in two separate array structures).

b) For each class, sort the numeric output values (stored in the arrays) in ascending order. For example, Figure 3.4(a) shows the (sorted) numeric output values for the two classes when a GP solution is evaluated on the input instances, where + and - denote the positive (minority) class and negative (majority) class outputs, respectively.

c) Build the ROC curve by computing one ROC point for each classification threshold value T:

1. Initialise T (i.e. the first class threshold) as the lowest output value for the positive (minority) class.

2. Iterate through the positive class outputs to count the number of outputs that are greater than, or equal to, T (i.e. TPs). For example, using Ti as the current threshold in Figure 3.4(a), seven out of eight positive class outputs satisfy this constraint (i.e. ≥ Ti), giving a TP rate of 7/8 (0.875).

3. Similarly, iterate through the negative class outputs to count the number of outputs that are greater than, or equal to, T (i.e. FPs). For example, using Ti as the class threshold in Figure 3.4(a), six out of 26 negative class outputs satisfy this constraint (i.e. ≥ Ti), producing an FP rate of 6/26 (0.23). The TP rate (from the previous step) and this FP rate correspond to one point on the ROC curve, as shown in Figure 3.4(b).

4. Update the threshold T for the next iteration. The new threshold is the lowest output value from either the positive class or negative class output values that is greater than T.

5. Repeat steps (2) to (4) until the largest output value for the positive (minority) class is reached. Each iteration produces one ROC point, e.g., using Tj as the class threshold in Figure 3.4(a) will produce another ROC point, as shown in Figure 3.4(b) for Tj.

d) Use Eq. (3.4) to calculate the final AUC value for the ROC curve.
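Steps (b) to (d) can be sketched as follows. This is an illustrative reading of the procedure, not the implementation used in the thesis: the function names and sample outputs are hypothetical, and two assumptions are made that the procedure does not state. First, the ROC points are ordered by increasing FP rate before Eq. (3.4) is applied, so that each trapezoid width is non-negative. Second, the end points (0, 0) and (1, 1) are added so that the area spans the whole FP-rate axis.

```python
def roc_points(pos_outputs, neg_outputs):
    """Steps (b) and (c): one (FP rate, TP rate) point per class threshold."""
    pos = sorted(pos_outputs)  # step (b)
    neg = sorted(neg_outputs)
    # Steps 1, 4 and 5: thresholds are the distinct output values of either
    # class, from the lowest to the highest positive-class output.
    thresholds = [t for t in sorted(set(pos) | set(neg)) if pos[0] <= t <= pos[-1]]
    points = []
    for t in thresholds:
        tp_rate = sum(1 for v in pos if v >= t) / len(pos)  # step 2
        fp_rate = sum(1 for v in neg if v >= t) / len(neg)  # step 3
        points.append((fp_rate, tp_rate))
    return points

def auc_trapezoid(points):
    """Step (d): Eq. (3.4) summed over consecutive ROC points.
    ASSUMPTION: points are sorted by FP rate and the end points (0, 0)
    and (1, 1) are added; neither is stated in the procedure."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (fp_i, tp_i), (fp_next, tp_next) in zip(pts, pts[1:]):
        area += 0.5 * (fp_next - fp_i) * (tp_next + tp_i)
    return area

# Hypothetical program outputs for 4 minority and 5 majority examples.
pos_outputs = [0.4, 0.6, 0.8, 0.9]
neg_outputs = [0.1, 0.2, 0.3, 0.5, 0.7]
auc = auc_trapezoid(roc_points(pos_outputs, neg_outputs))
print(round(auc, 4))  # 0.85
```

For these sample outputs, 17 of the 20 (minority, majority) pairs are ranked correctly, which agrees with the computed area of 0.85. Each threshold triggers a full scan of both output arrays, which is the repeated iteration cost that makes this measure slower to compute than Acc or Ave.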

A major limitation of Auc in the fitness function is the increased training time, due to the computational overhead required to construct an ROC curve. Once a solution is evaluated on the input instances (i.e. after step (a) in the above procedure), the extra computational overhead is due to two main factors that are not required in the calculation of the Acc and Ave measures. These factors are: sorting the output values for the two classes (i.e. step (b) in the above procedure), and multiple iterations over these output values to obtain the different ROC points (i.e. step (c) in the above procedure). When the distance between class thresholds $T_i$ and $T_{i+1}$ is small, more points on the ROC curve are generated (compared to larger distances), allowing for a highly accurate AUC estimation.

While the above procedure can be optimised using more efficient programming/optimisation techniques to speed up the calculation, this is not done in this thesis. This is because the full procedure (shown above) represents the traditional “out-of-the-box” method to calculate the AUC (as outlined in [27]). As an example of such an optimisation, a more efficient technique to count the TPs and FPs at a given class threshold T (after the output values are sorted) in step (c) would be to only count the number of TPs and FPs in region r where $T_{i-1} < r \le T_i$, and reconcile these with the TPs and FPs from the previous iteration (for the class threshold $T_{i-1}$). In this case, given the number of TPs from the previous iteration (call this $tp_{i-1}$) and the number of TPs in the region r for the current iteration (call this $tp_r$), the final number of TPs in the current iteration (call this $tp_i$) would then be $tp_i = tp_{i-1} - tp_r$ (and likewise for the FPs).
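The incremental counting idea can be sketched as follows. This is a hedged illustration, not code from the thesis: the function names and sample values are hypothetical. The region count is taken here as the number of outputs passed over since the previous threshold, i.e. those that no longer satisfy the "greater than, or equal to, T" test of step (c); this boundary convention is an assumption chosen to match that test.

```python
def counts_incremental(sorted_outputs, thresholds):
    """Count outputs >= T for each threshold (ascending) in a single sweep.
    Each count is the previous count minus the outputs passed over since
    the previous threshold (tp_i = tp_{i-1} - tp_r)."""
    counts = []
    idx = 0                        # first output not yet passed over
    current = len(sorted_outputs)  # all outputs count before any threshold
    for t in thresholds:
        in_region = 0              # tp_r for this iteration
        while idx < len(sorted_outputs) and sorted_outputs[idx] < t:
            idx += 1
            in_region += 1
        current -= in_region
        counts.append(current)
    return counts

def counts_brute_force(outputs, thresholds):
    """The out-of-the-box method: a full scan of the outputs per threshold."""
    return [sum(1 for v in outputs if v >= t) for t in thresholds]

# Hypothetical sorted positive-class outputs and ascending thresholds.
pos_sorted = [0.4, 0.6, 0.8, 0.9]
thresholds = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

print(counts_incremental(pos_sorted, thresholds))  # [4, 3, 3, 2, 2, 1]
```

Each output is visited at most once across all thresholds, so the sweep is linear in the number of outputs plus the number of thresholds, rather than their product. The same function applies unchanged to the sorted negative class outputs to obtain the FP counts.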

However, in the next chapter, several new measures are developed to approximate the AUC in fitness but with faster training times, and these are compared with the traditional AUC measure. One of these is a lower-precision AUC measure where the distance between class thresholds is increased (to speed up the calculation).