6.2 MOGP Approaches for Ensemble Learning
6.2.4 Pairwise Failure Crediting (PFC)
The second diversity measure is PFC [36], as given by Eq. (6.2), which calculates the PFC for solution $p$ with respect to class $c$.
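$$\mathit{PFC}_{c,p} = \frac{1}{T-1} \sum_{q=1,\, q \neq p}^{T} \frac{\sum_{i=1}^{N_c} \mathit{Diff}(gp_i^p, gp_i^q)}{\mathit{Err}_c^p + \mathit{Err}_c^q} \tag{6.2}$$

where

$$\mathit{Diff}(gp^p, gp^q) = \begin{cases} 1 & \text{if } I_{cls}(gp^p) \neq I_{cls}(gp^q) \\ 0 & \text{otherwise} \end{cases}$$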
In Eq. (6.2), $T$ is the population size, $N_c$ is the number of training examples in class $c$, and $gp_i^p$ is the raw output of genetic program $p$ when evaluated on the $i$th example in class $c$. The function $\mathit{Diff}(\cdot)$ is used to compute the Hamming distance (HD) between the outcomes of two solutions ($p$ and $q$) on class $c$. This function returns 1 if the predicted class labels of two genetic program solutions are different for a given input instance, or 0 otherwise. The predicted class label of a solution is determined by the indicator function $I_{cls}$ in Eq. (6.2), which simply returns 1 (minority class) if the raw output value is zero or positive, or 0 (majority class) otherwise, according to the zero-threshold strategy. In Eq. (6.2), $\mathit{Err}_c^p$ is the number of incorrect class predictions by a given solution $p$ on class $c$. An incorrect prediction occurs when the predicted and actual class labels for a given input are different. Unlike NCL, Eq. (6.2) will return values between 0 and 1, where the higher the PFC value, the better the diversity, i.e., the lower the overlap of common errors.
Similar to the NCL, Eq. (6.2) calculates the diversity between examples from the two classes separately to account for the skewed class distributions in the tasks. The average PFC on the minority and the majority classes then represents the final diversity.
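As a minimal sketch of Eq. (6.2), the per-class PFC of one solution could be computed as follows. The function names (`predicted_label`, `pfc_for_class`, `final_diversity`) and the list-of-lists data layout are illustrative assumptions, not the implementation used in this chapter. The source does not say what happens when two solutions both make zero errors on a class (a zero denominator), so this sketch assumes a contribution of 0 in that case.

```python
from typing import Sequence


def predicted_label(raw_output: float) -> int:
    """Zero-threshold strategy: 1 (minority) if output >= 0, else 0 (majority)."""
    return 1 if raw_output >= 0 else 0


def pfc_for_class(
    raw_outputs: Sequence[Sequence[float]], class_label: int, p: int
) -> float:
    """PFC of solution p on one class, following Eq. (6.2).

    raw_outputs[q][i] is the raw output of solution q on the i-th training
    example of the class whose true label is class_label.
    """
    T = len(raw_outputs)
    if T < 2:
        raise ValueError("PFC needs at least two solutions in the population")

    labels = [[predicted_label(v) for v in row] for row in raw_outputs]
    # Err_c^q: number of incorrect predictions of each solution on this class.
    errors = [sum(1 for lab in row if lab != class_label) for row in labels]

    total = 0.0
    for q in range(T):
        if q == p:
            continue
        # Hamming distance between the predicted labels of p and q.
        hd = sum(1 for a, b in zip(labels[p], labels[q]) if a != b)
        denom = errors[p] + errors[q]
        # Assumption: a pair with no errors at all contributes 0 (not in source).
        total += hd / denom if denom > 0 else 0.0
    return total / (T - 1)


def final_diversity(pfc_minority: float, pfc_majority: float) -> float:
    """Average of the per-class PFC values, as described in the text."""
    return 0.5 * (pfc_minority + pfc_majority)
```

Because every example passed to `pfc_for_class` belongs to the same class, two solutions disagree on the predicted label exactly when one of them is correct and the other is not, which is why the Hamming distance can never exceed the sum of the two error counts.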
PFC has two major differences to NCL. Firstly, PFC is a population-level diversity measure. This means that PFC measures the diversity of each solution with respect to all other solutions in the population, whereas NCL compares the outputs of a solution to the ensemble and the ensemble members (not other solutions in the population that are not in the ensemble). This also means that unlike NCL, PFC does not require the ensemble’s output (on a given input) in the PFC equation. Secondly, PFC measures diversity based on the binary-valued outcome of a genetic program solution in terms of 1 or 0 for a correct or incorrect class prediction, respectively, whereas NCL uses the (processed) output values of
the solutions.
Note that the computational overhead required to compute the PFC during fitness evaluation, where each solution is compared to all others in the population, is $T(T-1)$ total comparisons between solutions (where $T$ is the population size). However, the total number of comparisons can be reduced by simultaneously accumulating PFC values between any two solutions in a pairwise manner. For example, the diversity between two solutions $a_1$ and $a_2$ in the population will be the same when $a_1$ is compared to $a_2$ during $a_1$'s fitness evaluation, and when $a_2$ is compared to $a_1$ during $a_2$'s fitness evaluation. By simultaneously accumulating PFC values between solutions in a pairwise manner, only $\frac{1}{2}T(T-1)$ comparisons are required to compute the PFC for the entire population.
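The pairwise accumulation described above can be sketched as below. This is an illustration under stated assumptions rather than the chapter's actual code: `pfc_population` is a hypothetical name, its input is a 0/1 outcome matrix for a single class (1 for a correct prediction, 0 for an incorrect one), and a pair with no errors is assumed to contribute 0. Each unordered pair is visited once and its contribution is credited to both solutions.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def pfc_population(outcomes: Sequence[Sequence[int]]) -> Tuple[List[float], int]:
    """Return the PFC of every solution on one class and the comparison count.

    outcomes[p][i] is 1 if solution p classifies example i correctly, else 0.
    """
    T = len(outcomes)
    if T < 2:
        raise ValueError("PFC needs at least two solutions in the population")

    errors = [sum(1 for o in row if o == 0) for row in outcomes]
    totals = [0.0] * T
    comparisons = 0

    # combinations() yields each unordered pair (p, q) with p < q exactly once.
    for p, q in combinations(range(T), 2):
        hd = sum(1 for a, b in zip(outcomes[p], outcomes[q]) if a != b)
        denom = errors[p] + errors[q]
        # Assumption: a pair with no errors at all contributes 0 (not in source).
        contribution = hd / denom if denom > 0 else 0.0
        totals[p] += contribution  # credited during p's evaluation
        totals[q] += contribution  # and reused for q without recomputing
        comparisons += 1

    return [t / (T - 1) for t in totals], comparisons
```

For a population of $T$ solutions the loop body runs $\frac{1}{2}T(T-1)$ times, which the returned `comparisons` counter makes easy to confirm.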
PFC in MOGP Fitness
As each solution in the population is compared to all others, the PFC aims to make the solutions in the population uncorrelated to all other solutions. This is different to NCL, which aims to minimise the correlation between solutions and the ensemble. As discussed above, Eq. (6.2) does not require the ensemble output in the PFC equation. This allows the PFC measure to be used at any stage in the fitness evaluation. In contrast, NCL requires that the non-dominated set of solutions in the population be known prior to the NCL calculation, as these solutions represent the ensemble and the ensemble's output is used in the NCL equation.
To take advantage of this flexibility, Eq. (6.2) is incorporated into the objective performance of the evolved solutions (alongside the classification accuracy) on the two classes, before the Pareto ranking (using the SPEA2 algorithm) is applied to the population. This gives equal selection preference to accurate and diverse solutions in the population. This is shown by Eq. (6.3), where $(S_p)_c$ is the objective performance of solution $p$ on objective $c$.
$$(S_p)_c = Y\left(1 - \frac{\mathit{Err}_c^p}{N_c}\right) + (1 - Y)\,\mathit{PFC}_{c,p} \tag{6.3}$$

In Eq. (6.3), the weight factor $Y$ specifies the trade-off between accuracy (first component in the equation) and diversity (second component in the equation), where $0 < Y < 1$. The MOGP approach with PFC uses a $Y$ value of 0.5 to treat accuracy and diversity as equally important in fitness. In both the Baseline MOGP and MOGP with NCL, the objective performance $(S_p)_c$ uses only the accuracy of a solution $p$ on class $c$. By incorporating the accuracy and diversity of solutions into the objective performances, the Pareto ranking of the solutions (according to SPEA2) is not solely based on the accuracy of the solutions on the two classes (as is the case for MOGP using NCL). This allows the PFC ensembles to contain more diverse but potentially less accurate solutions.

            Input
Solution    1  2  3  4  5
a1          1  0  1  1  0
a2          1  1  0  1  0
HD          0  1  1  0  0
(a) Diversity is HD(a1, a2) / (err_a1 + err_a2) = 2 / (2 + 2) = 0.5

            Input
Solution    1  2  3  4  5
a1          1  0  1  1  0
a3          0  1  1  0  1
HD          1  1  0  1  1
(b) Diversity is HD(a1, a3) / (err_a1 + err_a3) = 4 / (2 + 2) = 1

Figure 6.2: Pairwise PFC comparisons between three solutions (a1, a2 and a3) on five inputs (in the same class).
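The weighted objective in Eq. (6.3) can be sketched in a few lines. The function name `objective_performance` and its parameter names are assumptions made for illustration. The sketch accepts the closed range 0 to 1 for the weight so that the accuracy-only case can also be expressed; that range check is an assumption of this example rather than something the source specifies.

```python
def objective_performance(
    errors: int, n_examples: int, pfc: float, y: float = 0.5
) -> float:
    """Objective performance (S_p)_c of one solution on one class, Eq. (6.3).

    errors      -- Err_c^p, incorrect predictions of the solution on class c
    n_examples  -- N_c, number of training examples in class c
    pfc         -- PFC_{c,p}, diversity of the solution on class c (0 to 1)
    y           -- weight factor Y trading accuracy against diversity
    """
    if n_examples <= 0:
        raise ValueError("n_examples must be positive")
    # Assumption of this sketch: the weight is accepted on the closed range.
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie between 0 and 1")

    accuracy = 1.0 - errors / n_examples  # first component of Eq. (6.3)
    return y * accuracy + (1.0 - y) * pfc  # second component is the diversity
```

One such value would be computed per class for each solution, and the resulting pair of objective values is what the Pareto ranking then operates on. With `y=1.0` the diversity term vanishes and the objective reduces to the per-class accuracy.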
Other Weighting Coefficients
Other weighting coefficients for $Y$ (e.g. 0.25, 0.75 and 1) have also been explored in Eq. (6.3) to investigate whether these can improve ensemble performance compared to an equal weighting between accuracy and diversity ($Y$ of 0.5) on the tasks. Preliminary results show that an equal weighting between accuracy and diversity in PFC gives the best ensemble performance on the two classes compared to $Y$ values of 0.25, 0.75 and 1 on the tasks. As exploring other weighting coefficients for $Y$ is not one of the main goals of this chapter, these preliminary results are omitted from this chapter but can be seen in Appendix B (Section B.2.2).
Example of PFC Calculation
Figure 6.2 illustrates how the PFC is computed in a pairwise manner for the same three genetic program solutions ($a_1$, $a_2$ and $a_3$) on the same five inputs (as used in the previous example in Figure 6.1(a) for NCL). However, Figure 6.2 shows the outcomes of the solutions on the inputs, where 1 is a correct class prediction and 0 is an incorrect prediction. Figure 6.2(a) compares solutions $a_1$ and $a_2$, while Figure 6.2(b) compares solutions $a_1$ and $a_3$. Although all three solutions generate the same number of errors (two errors on the five inputs), $a_1$ and $a_3$ are more diverse in their outputs (than $a_1$ and $a_2$) as these solutions make different errors on the same inputs. In Figure 6.2(a), the pairwise PFC contribution between $a_1$ and $a_2$ is 0.5, while in Figure 6.2(b), the pairwise PFC contribution between $a_1$ and $a_3$ is 1. Taking the population to consist of only these three solutions, the final PFC value for $a_1$ is 0.75, as shown below.

$$\mathit{PFC}_{a_1} = \frac{1}{T-1}\left(\frac{HD(a_1,a_2)}{err_{a_1}+err_{a_2}} + \frac{HD(a_1,a_3)}{err_{a_1}+err_{a_3}}\right) = \frac{1}{2}(0.5 + 1) = \frac{1}{2}(1.5) = 0.75$$
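The arithmetic for $a_1$ can be checked with a short script that uses the outcome rows from Figure 6.2 directly. The helper name `pairwise_contribution` is hypothetical, and the script assumes that the population consists of only the three solutions shown in the figure.

```python
from typing import Sequence

# Outcome rows from Figure 6.2 (1 = correct prediction, 0 = incorrect).
a1 = [1, 0, 1, 1, 0]
a2 = [1, 1, 0, 1, 0]
a3 = [0, 1, 1, 0, 1]


def pairwise_contribution(x: Sequence[int], y: Sequence[int]) -> float:
    """HD(x, y) divided by the summed error counts of the two solutions."""
    hd = sum(1 for a, b in zip(x, y) if a != b)
    errors = x.count(0) + y.count(0)
    return hd / errors


c12 = pairwise_contribution(a1, a2)  # 2 / (2 + 2)
c13 = pairwise_contribution(a1, a3)  # 4 / (2 + 2)
T = 3  # assumed population size: only a1, a2 and a3
pfc_a1 = (c12 + c13) / (T - 1)

print(c12)     # 0.5
print(c13)     # 1.0
print(pfc_a1)  # 0.75
```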