We applied the filter procedure to the Flea beetle data and the Swiss bank note data (cf. Section 5.5). The following two-step procedure was performed on each of these data sets:
- The smallest enclosing hypersphere was obtained for each of the two classes in the data set using the Gaussian kernel with γ = 1/p. This yielded the support vectors for each class (a subset of the cases in the training data).
- We then performed KFDA on all the data using the Gaussian kernel with γ = 1/p and a hyperparameter of λ = 0.1. The subset was then evaluated using the leave-one-out criteria.
The results for the two data sets are given in the following subsections.
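The first step of the procedure can be illustrated with a minimal Python sketch. It is not the implementation used in this study. It assumes that p denotes the number of input variables, that the hypersphere is the hard-margin version (no slack variables), and that the dual problem is solved with a simple pairwise weight-transfer scheme chosen here only for brevity. The function names `gaussian_kernel_matrix`, `hypersphere_alphas` and `support_vector_indices`, the threshold, and the toy data are all hypothetical.

```python
import math

def gaussian_kernel_matrix(X, gamma):
    """Gram matrix with K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    n = len(X)
    K = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-gamma * d2)
    return K

def hypersphere_alphas(K, tol=1e-10, max_iter=100000):
    """Minimise a'Ka subject to a >= 0 and sum(a) = 1.

    For the Gaussian kernel k(x, x) = 1, so this is the dual of the
    hard-margin smallest enclosing hypersphere (an assumption of this sketch).
    """
    n = len(K)
    alpha = [0.0] * n
    alpha[0] = 1.0                        # start with all weight on the first case
    Ka = [K[m][0] for m in range(n)]      # running value of K times alpha
    for _ in range(max_iter):
        # Case farthest from the current centre (smallest K-alpha value).
        j = min(range(n), key=lambda m: Ka[m])
        # Weighted case closest to the centre (largest K-alpha value).
        active = [m for m in range(n) if alpha[m] > 0.0]
        i = max(active, key=lambda m: Ka[m])
        gap = Ka[i] - Ka[j]
        denom = K[i][i] + K[j][j] - 2.0 * K[i][j]
        if gap <= tol or denom <= 0.0:
            break
        # Exact line search for moving weight from case i to case j,
        # clipped so that alpha[i] stays non-negative.
        step = min(alpha[i], gap / denom)
        alpha[i] -= step
        alpha[j] += step
        for m in range(n):
            Ka[m] += step * (K[m][j] - K[m][i])
    return alpha

def support_vector_indices(X, gamma=None, threshold=1e-8):
    """Indices of cases with non-zero alpha; gamma defaults to 1/p."""
    if gamma is None:
        gamma = 1.0 / len(X[0])           # assumes p = number of input variables
    alpha = hypersphere_alphas(gaussian_kernel_matrix(X, gamma))
    return [i for i, a in enumerate(alpha) if a > threshold], alpha

# Toy class: four corner cases and one case at the centre.
X = [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0), (0.0, 0.0)]
subset, alpha = support_vector_indices(X)
```

In this toy example the four corner cases share the weight equally and the centre case receives zero weight, so only the corner cases would be passed on to the leave-one-out evaluation in the second step.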
6.7.1 Flea beetle data
This data set consisted of 39 cases, with 19 cases in class 1 and 20 cases in class 2. Twelve support vectors were obtained for each class. Thus the subset of the Flea beetle data that had to be evaluated comprised 24 cases (62% of the training data).
In Figure 6.12 the index plot of the α-values shows which observations formed the subset. Among this subset are cases 27 and 38. Similar to what was found in Section 5.5.1, the leave-one-out criteria identified these cases as the most influential. The criteria $d_{(i)}$, $m_{(i)}$, $r_{(i)}$, $v_{(i)}$, $f_{(i)}$ and $h_{(i)}$ identified case 27, while $v_{(i)}$ and $a_{(i)}$ identified case 38. The $R_{\mathrm{emp}(i)}$ criterion did not perform well on this small data set since it did not identify a single most influential case. See Section 5.5.1 for the results on the KFD generalization performance without these cases.
6.7.2 Swiss bank note data
This two-class data set consisted of 200 observations, i.e. 100 cases in each class. For class 1 we obtained 29 support vectors and for class 2 we obtained 31 support vectors. Thus, a subset of size 60 (30% of the training data) was obtained. Figure 6.12 contains an index plot of the α-values indicating which cases form part of the subset. Among this subset is case 70 and, similar to the results in Section 5.5.2, this case was identified as the most influential case by all the criteria. Refer to Section 5.5.2 for further discussion of the KFD generalization performance without these cases.
Based on the examples above, we conclude that the leave-one-out criteria identify the same most influential cases with or without the filter. However, using the hypersphere as a filter saves considerable computing time, since the evaluation is done on only a subset of the training data. The results also suggest that the smallest enclosing hypersphere is less effective as a filter for small data sets. For the Flea beetle data (a small data set), 62% of the training data formed the subset. For the Swiss bank note data (a much larger data set), only 30% of the training data was selected for the subset.
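The subset sizes and percentages quoted for the two data sets follow directly from the support vector counts. A small Python check of this arithmetic is given here; the helper name `subset_share` is hypothetical.

```python
def subset_share(support_vectors_per_class, n_train):
    """Return the subset size and its share of the training data."""
    subset_size = sum(support_vectors_per_class)
    return subset_size, subset_size / n_train

# Counts reported above: 12 + 12 of 39 cases, and 29 + 31 of 200 cases.
flea_size, flea_share = subset_share([12, 12], 39)
swiss_size, swiss_share = subset_share([29, 31], 200)

print(flea_size, round(100 * flea_share))    # 24 62
print(swiss_size, round(100 * swiss_share))  # 60 30
```

With the filter, the leave-one-out evaluation therefore runs over 24 instead of 39 cases for the Flea beetle data, and over 60 instead of 200 cases for the Swiss bank note data.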
FIGURE 6.12: Index plots of the alphas for the Flea beetle and Swiss bank note data. The spikes represent the cases with non-zero alphas, i.e. support vectors. The Flea beetle data have 12 support vectors per class (a subset of size 24). The Swiss bank note data have 29 and 31 support vectors for class 1 and 2 respectively (a subset of size 60).
6.8 Concluding remarks
This chapter focused mostly on a discussion of the smallest enclosing hypersphere and the corresponding support region. We illustrated with examples how the hypersphere in feature space corresponds to a support region in input space. It was demonstrated that the hypersphere can be constructed by using only the support vectors, i.e. the observations that fall on the surface of the hypersphere. The aim of this chapter was to obtain a procedure to reduce the number of computations when calculating the nine criteria on a leave-one-out basis. In Section 6.5 we argued that the smallest enclosing hypersphere should be used as a filter to obtain a subset of observations, i.e. the support vectors from the training data. We then proposed that only this subset should be evaluated using the nine criteria proposed in Chapter 5. Thus, a two-step procedure was given in this chapter where:
- the first step entails constructing the smallest enclosing hypersphere in feature space for each class to find a subset of cases in each class, and
- the second step entails evaluating this subset, using the criteria, on a leave-one-out basis.
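The second step can be sketched in Python as a leave-one-out loop that visits only the cases in the filtered subset. This is an illustration under stated assumptions rather than the procedure's actual implementation: the function name `leave_one_out_on_subset` is hypothetical, and the criterion used in the example (`mean_shift`, the absolute change in the sample mean) is a toy stand-in, not one of the nine criteria of Chapter 5.

```python
def leave_one_out_on_subset(data, subset_idx, criterion):
    """Evaluate criterion(full, reduced) only for the cases in subset_idx.

    `criterion` is any function comparing the full data with the data
    after one case has been removed (a placeholder in this sketch).
    """
    scores = {}
    for i in subset_idx:
        reduced = data[:i] + data[i + 1:]   # leave case i out
        scores[i] = criterion(data, reduced)
    return scores

def mean_shift(full, reduced):
    """Toy criterion: absolute change in the sample mean."""
    return abs(sum(full) / len(full) - sum(reduced) / len(reduced))

# Hypothetical data; suppose the filter from step one returned cases 0 and 3.
data = [1.0, 1.2, 0.9, 5.0]
scores = leave_one_out_on_subset(data, [0, 3], mean_shift)
most_influential = max(scores, key=scores.get)
```

Only two of the four cases are evaluated here, and case 3 is flagged as the most influential. In the setting of this chapter, each call to the criterion would involve refitting KFDA without case i, which is where the saving in computations arises.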
The simulation results showed that this two-step procedure worked effectively. Using the hypersphere as a filter reduced the number of computations, since only a subset of the cases needed to be evaluated. Comparing the simulation results with and without the filter, we found that the decrease in the average error rates was similar in the two situations. The practical applications also showed promising results, indicating that the hypersphere works effectively as a filter on real-world data.
Avenues for further research:
- The analysis in this chapter was carried out only for the Gaussian kernel. Experiments should also be conducted with other kernel functions when applying the hypersphere.
- We have seen that the number of support vectors changes when different γ-values are specified for the Gaussian kernel. Selecting an appropriate γ-value to construct the hypersphere therefore requires further investigation. A similar investigation can be carried out for the hyperparameters of other kernel functions.
CHAPTER 7
---
GENERAL CONCLUSION
“What we know is not much. What we do not know is immense.”
- Pierre-Simon Laplace
This thesis started with a detailed explanation of KFDA in Chapter 2. It was demonstrated that the KFD classifier is a non-linear generalization of the FLD classifier. Several classification methods related to KFDA were discussed as well as some advantages and disadvantages of KFDA. The rest of the thesis focused mainly on the effect of atypical cases on KFDA and the development of criteria to identify such cases. The contributions made in this thesis can be summarized as follows.
We studied the effect of atypical cases on various aspects of KFDA in Chapter 3. The purpose of this study was to determine whether atypical cases have an effect on KFDA, specifically on the KFD classifier’s generalization performance. The simulation results in Section 3.7 showed that atypical cases do have a detrimental effect on the KFD classifier’s generalization performance. The study also revealed that certain other aspects of KFDA (cf. Section 3.3) are affected by atypical cases. This leads to the question of whether these aspects can be used to develop criteria to identify cases having a detrimental effect on the KFD generalization performance.
In Chapter 4 we discussed the estimation of posterior probabilities in KFDA. We showed how certain output from KFDA (the projections and the discriminant scores) can be used to estimate posterior probabilities. In a simulation study we also showed that the projections are not always normally distributed, as stated in Schölkopf and Smola (2002,