The following experiments assess the merit of applying the Nyström approximation to some of the previously discussed algorithms. The DeFIMKL and GAMKLp algorithms are modified to utilize the Nyström approximation by applying (2.27) and (2.28) as discussed in Section 2.5.3, where the indices in z are randomly selected; the algorithms also use the LIBLINEAR SVM implementation [78]. The algorithms are evaluated using the datasets shown in Table 2.5 from the KEEL [79] and UCI [77] repositories, in addition to the datasets in Table 2.2. The algorithm parameters discussed in Section 2.6.1 are used again in this experiment; however, the Nyström approximation is applied to the kernel matrices. The experiments are repeated using Nyström sampling quantities ranging from 1% to 100% of the size of the dataset, allowing the effect of the Nyström sampling quantity on the classification accuracy and run-time to be clearly visualized. Furthermore, the sampling and classification are performed 100 times; thus, all results are averages over the 100 trials.
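The sampling step described above can be illustrated with a minimal sketch. It assumes the standard Nyström form, K ≈ K[:, z] K[z, z]^+ K[z, :], which may differ in detail from (2.27) and (2.28); the RBF form exp(-σ‖x − y‖²), the function names, and the synthetic data are likewise assumptions made only for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    # Assumed RBF form: k(x, y) = exp(-sigma * ||x - y||^2).
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sigma * np.maximum(sq, 0.0))

def nystrom_approx(X, sigma, fraction, rng):
    # Randomly select the index set z (without replacement).
    n = X.shape[0]
    m = max(1, int(round(fraction * n)))
    z = rng.choice(n, size=m, replace=False)
    K_nz = rbf_kernel(X, X[z], sigma)   # n x m block of sampled columns
    K_zz = K_nz[z]                      # m x m block among sampled points
    # Standard Nystrom reconstruction using the pseudoinverse of K_zz.
    return K_nz @ np.linalg.pinv(K_zz) @ K_nz.T

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))            # synthetic data, not from the experiments
K_full = rbf_kernel(X, X, 1.0)
K_all = nystrom_approx(X, 1.0, 1.0, rng)    # 100% sampling
K_tenth = nystrom_approx(X, 1.0, 0.1, rng)  # 10% sampling
```

Sampling 100% of the points reproduces the full kernel matrix up to numerical error, which is why the curves in the experiments meet the full-algorithm baseline at the right-hand end of the sampling axis.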
Each MKL algorithm uses 10 RBF kernels of varying widths (i.e., σ in the RBF kernel definition): the first kernel width is always σ_1 = 1/n_f, where n_f is the number of features in the dataset. The remaining kernel widths are linearly spaced in the interval [n_f/10, 10n_f].
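A minimal sketch of this kernel-width schedule follows. It assumes that the first width is the reciprocal of the number of features and that the remaining nine widths include both endpoints of the interval; the function name `rbf_widths` is hypothetical.

```python
import numpy as np

def rbf_widths(n_features, n_kernels=10):
    # First width: assumed to be 1 / n_f.
    first = 1.0 / n_features
    # Remaining widths: linearly spaced on [n_f / 10, 10 * n_f], endpoints included (assumption).
    rest = np.linspace(n_features / 10.0, 10.0 * n_features, n_kernels - 1)
    return np.concatenate(([first], rest))

# Example for a dataset with 13 features.
widths = rbf_widths(13)
```

For a 13-feature dataset this yields one narrow width of 1/13 followed by nine widths running from 1.3 to 130.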
2.7.1 Results
Figures 2.4 through 2.7 show the typical trend of classification accuracy versus the Nyström sampling quantity for the GAMKLp and DeFIMKL algorithms. The figures depict the classification accuracy of the full algorithms (those not using the Nyström approximation) as dashed lines, the trend of the classification accuracy as the Nyström sampling percentage is varied as solid lines, and the points at which the accuracy of the Nyström-based algorithms drops 5% below that of the full algorithms as circles.
terms of accuracy and robustness to the Nyström sampling percentage. The plots show that by applying the Nyström approximation, we are able to use less than 10% of the kernel matrix to achieve results essentially equivalent to those of the full algorithm, which requires the entire kernel matrix. The trend for the algorithms applied to the Ionosphere dataset is very similar.
Figures 2.4 through 2.7 also include a case in which the Nyström approximation has a stronger effect on the classification accuracy. The plots for the Sonar dataset show that the performance decreases much more sharply as the Nyström sampling quantity is reduced. This is typical of a high-dimensional dataset in which a large proportion of the points are far apart from each other in the kernel space, which increases the rank of the kernel matrix. In this case, a larger number of data points is required to accurately approximate the others, and thus the matrix approximation suffers when the number of sampled points is limited.
Table 2.6 summarizes the results of applying the Nyström approximation to GAMKLp and DeFIMKL. The values in the table represent the percentage of the full dataset required by the Nyström approximation to achieve results within 5% of the classification accuracy achieved using the full dataset, i.e., full GAMKLp or DeFIMKL; these points correspond to the circled points in Figures 2.4 through 2.7.
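One way to read such a threshold off an accuracy curve is sketched below. It assumes that "within 5%" means five percentage points of absolute accuracy and that the reported value is the smallest sampled percentage meeting that tolerance; the function name and the accuracy values are hypothetical and are not taken from Table 2.6.

```python
import numpy as np

def min_sampling_within(percentages, accuracies, full_accuracy, tol=5.0):
    # Sort the curve by sampling percentage.
    percentages = np.asarray(percentages, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    order = np.argsort(percentages)
    # Mark points whose accuracy is within `tol` points of the full algorithm.
    ok = accuracies[order] >= full_accuracy - tol
    if not ok.any():
        return None
    # Smallest sampling percentage that satisfies the tolerance.
    return float(percentages[order][np.argmax(ok)])

# Illustrative (made-up) averaged accuracies, in percent.
pcts = [1, 5, 10, 20, 50, 100]
accs = [70.0, 88.0, 93.0, 95.0, 96.0, 96.0]
threshold = min_sampling_within(pcts, accs, full_accuracy=96.0)
```

With these illustrative numbers the tolerance is first met at 10% sampling. A relative tolerance, or a requirement that all larger percentages also satisfy it, would be equally defensible readings.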
Note how similar the performance degradation (with respect to the Nyström sampling percentage) of the different MKL approaches applied to the various datasets
[Plot: Classification Accuracy versus Nyström Sampling Percentage for the Wine, Ionosphere, and Sonar datasets.]
Figure 2.4: Results of using GAMKL1 on the Wine, Ionosphere, and Sonar datasets with the Nyström approximation. Dashed line indicates full sample performance; circle indicates sample percentage at which performance drops 5%.
[Plot: Classification Accuracy versus Nyström Sampling Percentage for the Wine, Ionosphere, and Sonar datasets.]
Figure 2.5: Results of using GAMKL2 on the Wine, Ionosphere, and Sonar datasets with the Nyström approximation. Dashed line indicates full sample performance; circle indicates sample percentage at which performance drops 5%.
is. Table 2.6 shows that we can regularly sample less than 10% of the kernel yet incur negligible performance degradation. This general invariance to datasets or MKL
[Plot: Classification Accuracy versus Nyström Sampling Percentage for the Wine, Ionosphere, and Sonar datasets.]
Figure 2.6: Results of using DeFIMKL1 on the Wine, Ionosphere, and Sonar datasets with the Nyström approximation. Dashed line indicates full sample performance; circle indicates sample percentage at which performance drops 5%.
[Plot: Classification Accuracy versus Nyström Sampling Percentage for the Wine, Ionosphere, and Sonar datasets.]
Figure 2.7: Results of using DeFIMKL2 on the Wine, Ionosphere, and Sonar datasets with the Nyström approximation. Dashed line indicates full sample performance; circle indicates sample percentage at which performance drops 5%.
methods suggests that the performance degradation is mostly, if not entirely, due to the Nyström approximation error; thus, this approximation technique can be applied to
[Plot: Average Prediction Speed Increase (%) versus Nyström Sampling Percentage for the Australian, Bupa, Dermatology, Ecoli, Glass, Haberman, Ionosphere, Pima, Saheart, Sonar, Spectfheart, WDBC, and Wine datasets.]
Figure 2.8: Average speed-up percentages of classifiers using the Nyström approximation.
other MKL approaches to yield similar results. Furthermore, the Nyström approximation makes the MKL approaches more memory-efficient, which makes it feasible to apply them to large datasets.
The prediction time of each experiment was recorded to show how the Nyström approximation can also speed up classifiers. Figure 2.8 shows the average increase in speed of the classifiers on the datasets. Notice that if we use only 20% of the data, which Table 2.6 indicates we can routinely do without sacrificing classifier performance, the prediction speed is increased by 45% to 80%, depending on the dataset. If we choose only 5% of the data, which Table 2.6 shows is possible for most datasets, the prediction speed can be increased by up to 92%. Therefore, in addition to making MKL methods more memory-efficient, the Nyström approximation also