iii Performance Acceleration Heuristics
6.4 Conclusions
Instance weight learning is a widely used technique in the field of machine learning [Zuo08, Mod03, Lan09, Qu08]. In order to deal with the problems arising in instance- based learning, a framework is proposed that employs instance-weight modelling to determine a small subset of the original prototypes that can accurately represent the entire dataset. The novelty of the Direct Weight Optimization algorithm lies on the binary weights that control the preservation or removal of instances. These weights are computed by optimizing an objective function (Eq. 6.5) that involves the ratio between the nearest friend and the nearest enemy of every instance. An instance of the training set is correctly classified as long as this ratio is less than one. Hence, the rationale behind the design of DWO is to keep this ratio to a minimum, in order to guarantee that the majority of sample patterns are not misclassified by the nearest neighbour rule. A major advantage of the proposed method is the fact that the optimization procedure is performed by a genetic algorithm. Not only because GAs can find good quality solutions, but also because the various parameters of the GA offer DWO the ability to adjust its performance according to the needs of the user and the application at hand.
As experiments showed the proposed algorithm manages very competitive results as it yields very high reduction rate, the second best between the tested algorithms, without compromising classification accuracy. It is the best performing method in terms of accuracy since it surpasses even accuracy enhancing methods like ENN. It should also be mentioned that the proposed heuristics can efficiently accelerate DWO for large datasets, compensating for the increased complexity of the optimization component. On the downside, although there does not seem to be correlation between classification error and dimensionality, like other methods that use distances between samples, it may fail to remove the right instances in highly dimensional spaces due to the distance concentration
effect. Finally, for small datasets that their size is significantly lower than tmaxxpsize
(number generations and population size), DWO is rather slow despite the accelerating enhancements.
Unlike other methods, DWO models both objectives of classification accuracy and data condensation explicitly in a simple but very effective manner. So this work displays not only high accuracy results, as expected by selection techniques, but also very good condensation results. Experiments showed that DWO deals extremely well with the issue of condensation in selection algorithms, as the reduction rate achieved is equivalent if not better than the ones displayed by abstraction algorithms like CBP and ICPL.
109 its performance in terms of condensation and time requirements. The design of further heuristics, similar to the ones already introduced, can be beneficial in terms of speeding up the optimization process.
C
HAPTER
7:
E
PILOGUE
7.1
Conclusions
This thesis presented a comprehensive survey on instance reduction algorithms, including instance selection techniques as well as prototype abstraction algorithms. It also proposed four original algorithms, namely Class Boundary Preserving algorithm, Spectral Instance Reduction, Instance Seriation for Prototype Abstraction and Direct Weight Optimization algorithm, which can efficiently deal with the problems instance-based learning faces. Apart from the obvious advantages every algorithm displayed as analyzed in the respective chapters (i.e. the accuracy of DWO, the condensation ratio of the CBP algorithm or the overall performance of SIR that exhibited high accuracy, competent condensation as well as very good speed) the major contribution of this thesis is the novelty of the frameworks proposed. All methods designed offer new possibilities and provide new insights in the field of prototype reduction.
First of all, CBP is a powerful heuristic that determines the geometric structure of patterns around every sample, which is represented as the cosine score, in order to determine which instances should be discarded. It aims at identifying border samples, since the major information required to effectively describe the underlying distribution is held by samples close to class boundaries. Qualitatively, CBP exhibits the highest condensation ratio between all the tested algorithms, in the region of 89%, without compromising classification accuracy that is competent at an average of 77.09%.
Compared to other algorithms, DWO outperforms all others in terms of classification accuracy with an average of 79.75%, while it yields the second best reduction rate (83.72%), behind only CBP. But, the most important contribution of DWO is the instance-weight modelling proposed that employs a number of parameters (binary weights) focusing on maintaining misclassifications to a minimum. This is achieved with the appropriate objective function that is based on the ratio of the distances of a sample to
111 its nearest friend over its nearest enemy instance. Another major advantage of this algorithm is the objective function of the model, along with the parameters of the optimizing component that can be modified to fit the needs of the application at hand.
The SIR algorithm displays relatively high accuracies, third best as only DWO and ENN surpass its average of 79.25%, along with competent condensation results, as nearly 60% of the original samples are removed, and very good speeds (lower complexity than DWO and CBP). The novelty, though, of the specific framework is the totally new insight it offers to instance reduction with two major contributions. Firstly, the border discriminating features that capture different types of local geometric characteristics of the data samples, in terms of their friend and enemy profiles. Secondly, the spectral graph theory that is employed in order to partition between instances close to the decision surface, hence, worthwhile maintaining, and internal ones that should be removed. Essentially, samples are projected to a new space, which favors the separation between them.
Even ISPA, which can be consider as an early draft of SIR since it was designed in an effort to create some sort of ordering of sample patterns, has its own slight contribution to the field of prototype reduction. Although seriation has been known for over 50 years, it has only recently been used in machine learning and more specifically, in pattern recognition. So, it should be mentioned that despite the lack of competent results, as ISPA cannot be compared to the other very successful reduction algorithms designed or implemented in this thesis, it is the first method that tried to employ seriation on data reduction. It focused on organizing the training set in a way that favors direct merging of prototypes.
Table 7-I
Comparison of the three proposed algorithms in terms of classification accuracy, condensation ratio and computational complexity. Algorithms are numbered from the best performing (1) to the least (3). The complexity of DWO varies with the parameters of the GA and has lower complexity than CBP for larger datasets and higher for small datasets.
Accuracy Condensation Complexity
CBP 3 1 3 (2)
SIR 2 3 1
Finally, many statistics have been introduced [Dem06, Alp99] to decide which algorithm is better, and various methods have been proposed to determine which feature is most important [Wit08], but every method will conclude on a different result. A simple comparison between the three main schemes in terms of classification accuracy, condensation and computational complexity can be seen in Table 7-I. But, as explained in [Tri11] and [Gar10] for real-world applications, the final judgment often relies on what the application prioritizes and on the needs of the user. For example, if high condensation were required, setting accuracy and speed aside, the best solution would be CBP. On the other hand, if a robust and fast method is needed, then a worse performance in condensation can be traded off against a slightly better accuracy obtained in the lower possible time. So, SIR would be ideal. If accuracy is critical, DWO is the obvious choice. So, the final choice comes down to the application at hand.
To conclude, this thesis provided a thorough analysis of instance selection and prototype abstraction techniques and identified the advantages and disadvantages of each group respectively. Algorithms that generate new prototypes display high condensation ratios along with large time requirements, similar to CBP. On the other hand, selection methods are generally faster and highly accurate, as proven by SIR and other similar algorithms. So this work proposed novel techniques that not only improved the strong points of each group respectively, but also introduced DWO that enhances accuracy of already existing selection algorithms while simultaneously strengthens their weakness in terms of reduction rate.