Experiments with the Nyström Approximation

The following experiments assess the merit of applying the Nyström approximation to some of the previously discussed algorithms. The DeFIMKL and GAMKLp algorithms are modified to use the Nyström approximation by applying (2.27) and (2.28) as discussed in Section 2.5.3, where the indices in z are randomly selected; the algorithms also use the LIBLINEAR SVM implementation [78]. The algorithms are evaluated on the data sets shown in Table 2.5, drawn from the KEEL [79] and UCI [77] repositories, in addition to the data sets in Table 2.2. The algorithm parameters discussed in Section 2.6.1 are used again in this experiment; however, the Nyström approximation is applied to the kernel matrices. The experiments are repeated using Nyström sampling quantities ranging from 1% to 100% of the size of the data set, so that the effect of the sampling quantity on classification accuracy and run-time can be seen directly. Furthermore, the sampling and classification are performed 100 times; all reported results are averages over the 100 trials.
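The sampling sweep described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: equations (2.27) and (2.28) are not reproduced here, so the sketch uses the standard Nyström reconstruction K ≈ K[:, z] W⁺ K[z, :] with W = K[z, z]; the data are synthetic; the RBF parameterization, the helper names, and the `rcond` cutoff are hypothetical choices rather than the settings used in the experiments.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF kernel matrix, assuming k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def sample_indices(n, percent, rng):
    """Draw the index set z uniformly at random, without replacement."""
    m = max(1, int(round(n * percent / 100.0)))
    return rng.choice(n, size=m, replace=False)

def nystrom_approximation(K, z):
    """Standard Nystrom reconstruction from the columns indexed by z."""
    C = K[:, z]                 # n x m block of sampled columns
    W = K[np.ix_(z, z)]         # m x m block among the sampled points
    # rcond is an assumed cutoff that guards against tiny singular values.
    return C @ np.linalg.pinv(W, rcond=1e-10) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))   # synthetic stand-in for a data set
K = rbf_kernel(X, gamma=1.0 / X.shape[1])

errors = {}
for percent in (1, 5, 10, 20, 50, 100):
    trial_errors = []
    for _ in range(20):         # the experiments use 100 trials; 20 keeps this quick
        z = sample_indices(K.shape[0], percent, rng)
        K_tilde = nystrom_approximation(K, z)
        trial_errors.append(np.linalg.norm(K - K_tilde) / np.linalg.norm(K))
    errors[percent] = float(np.mean(trial_errors))
    print(f"{percent:3d}% sampled: mean relative error = {errors[percent]:.2e}")
```

The sketch reports the relative Frobenius error of the approximated kernel matrix rather than classification accuracy, since the latter depends on the MKL algorithm and the classifier built on top of the kernels.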

Each MKL algorithm uses 10 RBF kernels of varying widths (i.e., σ in the RBF kernel definition): the first kernel width is always σ₁ = 1/n_f, where n_f is the number of features in the data set. The remaining kernel widths are linearly spaced in the interval [n_f/10, 10n_f].
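As a rough sketch of this kernel bank, the widths can be generated as follows. Several details are assumptions made for illustration: the first width is read as 1/n_f, the nine remaining widths include both endpoints of the interval, and the RBF kernel is taken to be k(x, y) = exp(-σ‖x − y‖²), which may differ from the definition given earlier in the document. The function names are hypothetical.

```python
import numpy as np

def kernel_widths(n_features, n_kernels=10):
    """First width 1/n_f, then n_kernels - 1 widths linearly spaced in [n_f/10, 10*n_f]."""
    first = 1.0 / n_features
    rest = np.linspace(n_features / 10.0, 10.0 * n_features, n_kernels - 1)
    return np.concatenate(([first], rest))

def rbf_kernel(X, sigma):
    """Assumed parameterization: k(x, y) = exp(-sigma * ||x - y||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-sigma * np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 13))       # synthetic data with n_f = 13 features

widths = kernel_widths(X.shape[1])
kernels = [rbf_kernel(X, s) for s in widths]   # one kernel matrix per width
print(np.round(widths, 4))
```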

2.7.1 Results

Figures 2.4 through 2.7 show the typical trend of classification accuracy versus the Nyström sampling quantity for the GAMKL and DeFIMKL algorithms. In each figure, dashed lines give the classification accuracy of the full algorithms (those not using the Nyström approximation), solid lines give the classification accuracy as the Nyström sampling percentage is varied, and circles mark the points at which the accuracy of the Nyström-based algorithms drops 5% below that of the full algorithms.

terms of accuracy and robustness to the Nyström sampling percentage. The plots show that, by applying the Nyström approximation, we can use less than 10% of the kernel matrix and still achieve results essentially equivalent to those of the full algorithm, which requires the entire kernel matrix. The trends for the algorithms applied to the Ionosphere dataset are very similar.

Figures 2.4 through 2.7 also include a case in which the Nyström approximation has a stronger effect on the classification accuracy. The plots for the Sonar dataset show that the performance decreases much more sharply as the Nyström sampling quantity is reduced. This is typical of a high-dimensional dataset in which a large proportion of the points are far apart from each other in the kernel space, which increases the rank of the kernel matrix. In this case, a larger number of data points is required to accurately approximate the others, and thus the matrix approximation suffers when the number of sampled points is limited.
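The link between point separation, kernel rank, and approximation quality can be illustrated on synthetic data. The sketch below is not a reproduction of the Sonar experiment: it assumes the parameterization k(x, y) = exp(-γ‖x − y‖²), uses the standard Nyström reconstruction, and measures an "effective rank" defined here (hypothetically) as the number of leading eigenvalues needed to capture 95% of the trace. A small γ makes all points look close in kernel space, while a large γ makes them look far apart.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """Assumed parameterization: k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def effective_rank(K, energy=0.95):
    """Number of leading eigenvalues needed to reach the given share of the trace."""
    eigvals = np.clip(np.linalg.eigvalsh(K), 0.0, None)[::-1]
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(cumulative, energy) + 1)

def nystrom_relative_error(K, m, rng):
    """Relative Frobenius error of a Nystrom approximation built from m random points."""
    z = rng.choice(K.shape[0], size=m, replace=False)
    C = K[:, z]
    K_tilde = C @ np.linalg.pinv(K[np.ix_(z, z)], rcond=1e-10) @ C.T
    return float(np.linalg.norm(K - K_tilde) / np.linalg.norm(K))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))      # synthetic high-dimensional data

settings = {"near": 1e-4,           # points appear close together in kernel space
            "far": 1.0}             # points appear far apart; K approaches the identity
results = {}
for name, gamma in settings.items():
    K = rbf_kernel(X, gamma)
    results[name] = {
        "effective_rank": effective_rank(K),
        "error": nystrom_relative_error(K, m=20, rng=rng),   # 10% of the points
    }
    print(name, results[name])
```

When the kernel matrix is close to the identity, each sampled column carries almost no information about the unsampled points, which is why the error at a fixed sampling quantity is so much larger in the "far" setting.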

Table 2.6 summarizes the results of applying the Nyström approximation to GAMKLp and DeFIMKL. The values in the table represent the percentage of the full data set required by the Nyström approximation to achieve results within 5% of the classification accuracy achieved using the full data set, i.e., full GAMKL or DeFIMKL; these points correspond to the circled points in Figures 2.4 through 2.7.
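One way such a threshold could be extracted from an accuracy curve is sketched below. The function name is hypothetical, the 5% band is interpreted as five absolute percentage points (a relative band would be an equally plausible reading), and the accuracy values are made up for illustration; they are not taken from Table 2.6.

```python
import numpy as np

def required_sampling_percentage(percentages, accuracies, full_accuracy, tolerance=5.0):
    """Smallest sampling percentage from which accuracy stays within
    `tolerance` points of the full-algorithm accuracy, or None if never."""
    order = np.argsort(percentages)
    pcts = np.asarray(percentages, dtype=float)[order]
    accs = np.asarray(accuracies, dtype=float)[order]
    within = accs >= full_accuracy - tolerance
    required = None
    # Walk from the largest sampling percentage downward until accuracy leaves the band.
    for pct, ok in zip(pcts[::-1], within[::-1]):
        if not ok:
            break
        required = float(pct)
    return required

# Made-up accuracy curve (in %), for illustration only.
percentages = [1, 2, 5, 10, 20, 50, 100]
accuracies = [62.0, 81.0, 90.5, 93.0, 94.2, 94.8, 95.0]
threshold = required_sampling_percentage(percentages, accuracies, full_accuracy=95.0)
print(threshold)
```

Scanning downward from 100% means that an isolated lucky point at a very low sampling percentage is not reported as the threshold, which seems closer to the idea of the point at which performance "drops" than simply taking the first percentage inside the band.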

Note how similar the performance degradation (with respect to the Nyström sampling percentage) of the different MKL approaches applied to the various datasets

Figure 2.4: Results of using GAMKL₁ on the Wine, Ionosphere, and Sonar datasets with the Nyström approximation (classification accuracy versus Nyström sampling percentage). Dashed line indicates full sample performance; circle indicates sample percentage at which performance drops 5%.

Figure 2.5: Results of using GAMKL₂ on the Wine, Ionosphere, and Sonar datasets with the Nyström approximation (classification accuracy versus Nyström sampling percentage). Dashed line indicates full sample performance; circle indicates sample percentage at which performance drops 5%.

is. Table 2.6 shows that we can regularly sample less than 10% of the kernel yet incur negligible performance degradation. This general invariance to datasets or MKL

Figure 2.6: Results of using DeFIMKL₁ on the Wine, Ionosphere, and Sonar datasets with the Nyström approximation (classification accuracy versus Nyström sampling percentage). Dashed line indicates full sample performance; circle indicates sample percentage at which performance drops 5%.

Figure 2.7: Results of using DeFIMKL₂ on the Wine, Ionosphere, and Sonar datasets with the Nyström approximation (classification accuracy versus Nyström sampling percentage). Dashed line indicates full sample performance; circle indicates sample percentage at which performance drops 5%.

methods suggests that the performance degradation is mostly, if not entirely, due to the Nyström approximation error; thus, this approximation technique can be applied to

Figure 2.8: Average speed-up percentages of classifiers using the Nyström approximation (average prediction speed increase, in %, versus Nyström sampling percentage) for the Australian, Bupa, Dermatology, Ecoli, Glass, Haberman, Ionosphere, Pima, Saheart, Sonar, Spectfheart, WDBC, and Wine datasets.

other MKL approaches to yield similar results. Furthermore, the application of the Nyström approximation makes the MKL approaches more memory-efficient, making the application of MKL approaches to large datasets possible.

The prediction time of each experiment was recorded to show how the Nyström approximation can also speed up classifiers. Figure 2.8 shows the average increase in speed of the classifiers on the datasets. Notice that if we use only 20% of the data, which Table 2.6 indicates we can routinely do without sacrificing classifier performance, the prediction speed is increased by 45 to 80%, depending on the dataset. If we use only 5% of the data, which Table 2.6 shows is possible for most datasets, the prediction speed can be increased by up to 92%. Therefore, in addition to making MKL methods more memory-efficient, the Nyström approximation also

2.8 Application of the Nyström Approximation to
