Protein Fold Prediction Experiments - Multiple Kernel Learning Algorithms

We perform experiments on the Protein Fold (PROTEIN) prediction data set from the MKL Repository, composed of 10 different feature representations and two kernels for 694 instances (311 for training and 383 for testing). The properties of these feature representations are summarized in Table 3. We construct a binary classification problem by combining the major structural classes {α, β} into one class and {α/β, α+β} into another class. Due to the small size of this data set, we use 30-fold cross-validation and the paired t test. We run three experiments on this data set, each using a different subset of kernels.
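The label construction and the significance test described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the class-name strings, and the choice of which merged class is labeled +1 are assumptions. The sketch computes only the paired t statistic and its degrees of freedom; comparing it against a critical value is left out.

```python
import math
import statistics

# Assumed string names for the four major structural classes.
CLASS_ONE = {"α", "β"}
CLASS_TWO = {"α/β", "α+β"}


def to_binary_label(structural_class):
    """Map a structural class to +1 or -1 (the sign convention is an assumption)."""
    if structural_class in CLASS_ONE:
        return 1
    if structural_class in CLASS_TWO:
        return -1
    raise ValueError(f"unknown structural class: {structural_class!r}")


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees of freedom) for two algorithms scored on the same folds."""
    if len(acc_a) != len(acc_b):
        raise ValueError("both algorithms must be evaluated on the same folds")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two folds are required")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0.0:
        raise ValueError("differences are constant; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

With 30 folds, the statistic has 29 degrees of freedom. Pairing matters here because both algorithms are evaluated on the same folds, so fold-to-fold variation cancels in the differences.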

Name   Dimension   Data Source
COM    20          Amino-acid composition
SEC    21          Predicted secondary structure
HYD    21          Hydrophobicity
VOL    21          Van der Waals volume
POL    21          Polarity
PLZ    21          Polarizability
L1     22          Pseudo amino-acid composition at interval 1
L4     28          Pseudo amino-acid composition at interval 4
L14    48          Pseudo amino-acid composition at interval 14
L30    80          Pseudo amino-acid composition at interval 30
BLO    311         Smith-Waterman scores with the BLOSUM 62 matrix
PAM    311         Smith-Waterman scores with the PAM 50 matrix

Table 3: Multiple feature representations in the PROTEIN data set.
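As a rough sketch of how the three experiments reported next relate to Table 3, the kernel subsets can be written down explicitly and a linear kernel built for each representation. The subset contents follow the text; the dictionary keys, the function names, and the `features` layout are hypothetical and chosen only for illustration.

```python
# Kernel subsets of the three experiments, named as in Table 3.
BASE = ["COM", "SEC", "HYD", "VOL", "POL", "PLZ"]
PSEUDO = ["L1", "L4", "L14", "L30"]
ALIGNMENT = ["BLO", "PAM"]

EXPERIMENT_KERNELS = {
    "first": BASE,                        # six kernels
    "second": BASE + PSEUDO,              # ten kernels
    "third": BASE + PSEUDO + ALIGNMENT,   # twelve kernels
}


def linear_kernel(X, Z):
    """Gram matrix of inner products between the rows of X and the rows of Z."""
    return [[sum(a * b for a, b in zip(x, z)) for z in Z] for x in X]


def kernels_for_experiment(features, experiment):
    """Build one linear kernel per representation used in the given experiment.

    `features` is a hypothetical dict that maps a representation name to a
    list of per-instance feature vectors.
    """
    return {
        name: linear_kernel(features[name], features[name])
        for name in EXPERIMENT_KERNELS[experiment]
    }
```

Each experiment adds representations to the previous one, so the second and third experiments test how the combination algorithms behave as the number of kernels grows.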

Table 4 lists the performance values of all algorithms on the PROTEIN data set with (COM-SEC-HYD-VOL-POL-PLZ). All combination algorithms except RBMKL (product) and GMKL outperform SVM (best) by more than four per cent in terms of average test accuracy. NLMKL (p=1), NLMKL (p=2), LMKL (softmax), and LMKL (sigmoid) are the only four algorithms that obtain more than 80 per cent average test accuracy and are statistically significantly more accurate than SVM (best), SVM (all), and RBMKL (mean). Nonlinear combination algorithms, namely, RBMKL (product), NLMKL (p=1), and NLMKL (p=2), have the disadvantage that they store statistically significantly more support vectors than all other algorithms. ABMKL (conic) and CABMKL (conic) are the two MKL algorithms that perform kernel selection and use fewer than five kernels on average, while the others use all six kernels, except CABMKL (linear), which uses five kernels in one of the 30 folds. The two-step algorithms, except GMKL, LMKL (softmax), and LMKL (sigmoid), need to solve fewer than 20 SVM problems on average. GLMKL (p=1) and GLMKL (p=2) solve statistically significantly fewer optimization problems than all the other two-step algorithms. LMKL (softmax) and LMKL (sigmoid) solve many SVM problems; the large standard deviations for this performance value are mainly due to the random initialization of the gating model parameters, which makes some folds take longer to converge.
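The active-kernel figures can be read as per-fold counts summarized by a mean and a standard deviation. The sketch below reproduces the CABMKL (linear) entry from the per-fold counts stated in the text (six kernels in 29 folds, five in one). How the paper decides that a kernel is active is not stated in this section, so the tolerance-based `count_active_kernels` helper is an assumption, as is the use of the sample standard deviation.

```python
import statistics


def count_active_kernels(weights, tol=1e-6):
    """Count kernels whose combination weight exceeds `tol` in magnitude.

    The tolerance is an assumed convention, not taken from the paper.
    """
    return sum(1 for w in weights if abs(w) > tol)


def mean_and_std(values):
    """Mean and sample standard deviation across folds."""
    return statistics.mean(values), statistics.stdev(values)


# Per-fold counts implied by the text for CABMKL (linear).
counts = [6] * 29 + [5]
mean, std = mean_and_std(counts)
print(f"{mean:.2f} ± {std:.2f}")  # prints "5.97 ± 0.18"
```

The result agrees with the 5.97±0.18 reported for CABMKL (linear) in Table 4. The population standard deviation rounds to the same value, so this check cannot tell which convention the table uses.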

Table 5 summarizes the performance values of all algorithms on the PROTEIN data set with (COM-SEC-HYD-VOL-POL-PLZ-L1-L4-L14-L30). All combination algorithms except RBMKL (product) outperform SVM (best) by more than two per cent in terms of average test accuracy. NLMKL (p=1) and NLMKL (p=2) are the only two algorithms that obtain more than 85 per cent average test accuracy and are statistically significantly more accurate than SVM (best), SVM (all), and RBMKL (mean). When the number of combined kernels becomes large, as in this experiment, the multiplication in RBMKL (product) produces very small kernel values at the off-diagonal entries of the combined kernel matrix. This causes the classifier to behave like a nearest-neighbor classifier, storing many support vectors and performing badly in terms of average test accuracy. As observed in the previous experiment, the nonlinear combination algorithms, namely, RBMKL (product), NLMKL (p=1), and NLMKL (p=2), store statistically significantly more support vectors than all other algorithms. ABMKL (conic), ABMKL (convex), CABMKL (linear), CABMKL (conic), MKL, SimpleMKL, and GMKL are the seven MKL algorithms that perform kernel selection and use fewer than 10 kernels on average, while the others use all 10 kernels. As in the previous experiment, GLMKL (p=1) and GLMKL (p=2) solve statistically significantly fewer optimization problems than all the other two-step algorithms, and the very high standard deviations for LMKL (softmax) and LMKL (sigmoid) are also observed in this experiment.
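The shrinking off-diagonal entries under a product combination can be illustrated numerically. The sketch below is not run on the PROTEIN data: it uses synthetic random features, and it assumes each kernel is cosine-normalized so that the diagonal equals 1 and every off-diagonal magnitude is at most 1. Under that assumption, multiplying more kernels element-wise leaves the diagonal unchanged and pushes the off-diagonal entries toward zero, so the combined matrix approaches the identity.

```python
import math
import random


def normalized_linear_kernel(X):
    """Cosine-normalized linear kernel, so every diagonal entry equals 1."""
    norms = [math.sqrt(sum(v * v for v in x)) for x in X]
    return [
        [sum(a * b for a, b in zip(x, z)) / (nx * nz) for z, nz in zip(X, norms)]
        for x, nx in zip(X, norms)
    ]


def product_kernel(kernels):
    """Element-wise product of a list of equally sized kernel matrices."""
    n = len(kernels[0])
    return [
        [math.prod(K[i][j] for K in kernels) for j in range(n)] for i in range(n)
    ]


def mean_off_diagonal(K):
    """Average absolute value of the off-diagonal entries of K."""
    n = len(K)
    values = [abs(K[i][j]) for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)


def random_kernels(num_kernels, num_instances=8, dim=21, seed=0):
    """Synthetic stand-ins for the feature representations (not PROTEIN data)."""
    rng = random.Random(seed)
    return [
        normalized_linear_kernel(
            [[rng.random() for _ in range(dim)] for _ in range(num_instances)]
        )
        for _ in range(num_kernels)
    ]


kernels = random_kernels(12)
for count in (1, 6, 10, 12):
    combined = product_kernel(kernels[:count])
    print(count, round(mean_off_diagonal(combined), 4))
```

When the combined matrix is close to the identity, each training instance looks similar only to itself, which is consistent with the 100 per cent support vector figure reported for RBMKL (product).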

Table 6 gives the performance values of all algorithms on the PROTEIN data set with a larger set of kernels, namely, (COM-SEC-HYD-VOL-POL-PLZ-L1-L4-L14-L30-BLO-PAM). All combination algorithms except RBMKL (product) outperform SVM (best) by more than three per cent in terms of average test accuracy. NLMKL (p=1) and NLMKL (p=2) are the only two algorithms that obtain more than 87 per cent average test accuracy. In this experiment, ABMKL (ratio), GMKL, GLMKL (p=1), GLMKL (p=2), NLMKL (p=1), NLMKL (p=2), and LMKL (sigmoid) are statistically significantly more accurate than SVM (best), SVM (all), and RBMKL (mean). As noted in the two previous experiments, the nonlinear combination algorithms, namely, RBMKL (product), NLMKL (p=1), and NLMKL (p=2), store statistically significantly more support vectors than all other algorithms. ABMKL (conic), ABMKL (convex), CABMKL (linear), CABMKL (conic), MKL, SimpleMKL, and GMKL are the seven MKL algorithms that perform kernel selection and use fewer than 12 kernels on average, while the others use all 12 kernels, except GLMKL (p=1), which uses 11 kernels in one of the 30 folds. As in the two previous experiments, GLMKL (p=1) and GLMKL (p=2) solve statistically significantly fewer optimization problems than all the other two-step algorithms, but the very high standard deviations for LMKL (softmax) and LMKL (sigmoid) are not observed in this experiment.

Algorithm         Test Accuracy     Support Vector    Active Kernel     Calls to Solver
SVM (best)        72.06±0.74 bc     58.29±1.00 bc     1.00±0.00 bc      6.00±0.00 bc
SVM (all)         79.13±0.45 ac     62.14±1.04 ac     6.00±0.00         1.00±0.00
RBMKL (mean)      78.01±0.63        60.89±1.02        6.00±0.00         1.00±0.00
RBMKL (product)   72.35±0.95 bc     100.00±0.00 abc   6.00±0.00         1.00±0.00
ABMKL (conic)     79.03±0.92 ac     49.96±1.01 abc    4.60±0.50 abc     1.00±0.00
ABMKL (convex)    76.90±1.17 abc    29.54±0.89 abc    6.00±0.00         1.00±0.00
ABMKL (ratio)     78.06±0.62        56.95±1.07 abc    6.00±0.00         1.00±0.00
CABMKL (linear)   79.51±0.78 abc    49.81±0.82 abc    5.97±0.18 abc     1.00±0.00
CABMKL (conic)    79.28±0.97 ac     49.84±0.77 abc    4.73±0.52 abc     1.00±0.00
MKL               76.38±1.19 abc    29.65±1.02 abc    6.00±0.00         1.00±0.00
SimpleMKL         76.34±1.24 abc    29.62±1.08 abc    6.00±0.00         18.83±4.27 abc
GMKL              74.96±0.50 abc    79.85±0.70 abc    2.37±0.56 abc     37.10±3.23 abc
GLMKL (p=1)       77.71±0.96        55.80±0.95 abc    6.00±0.00         6.10±0.31 abc
GLMKL (p=2)       77.20±0.42 abc    75.34±0.70 abc    6.00±0.00         5.00±0.00 abc
NLMKL (p=1)       83.49±0.76 abc    85.67±0.86 abc    6.00±0.00         17.50±0.51 abc
NLMKL (p=2)       82.30±0.62 abc    89.57±0.77 abc    6.00±0.00         13.40±4.41 abc
LMKL (softmax)    80.24±1.37 abc    27.24±1.76 abc    6.00±0.00         85.27±41.77 abc
LMKL (sigmoid)    81.91±0.92 abc    30.95±2.74 abc    6.00±0.00         103.90±62.69 abc

Table 4: Performances of single-kernel SVM and representative MKL algorithms on the PROTEIN data set with (COM-SEC-HYD-VOL-POL-PLZ) using the linear kernel.

Algorithm         Test Accuracy     Support Vector    Active Kernel     Calls to Solver
SVM (best)        72.15±0.68 bc     47.50±1.25 bc     1.00±0.00 bc      10.00±0.00 bc
SVM (all)         79.63±0.74 ac     43.45±1.00 ac     10.00±0.00        1.00±0.00
RBMKL (mean)      81.32±0.74        61.67±1.31        10.00±0.00        1.00±0.00
RBMKL (product)   53.04±0.21 abc    100.00±0.00 abc   10.00±0.00        1.00±0.00
ABMKL (conic)     80.45±0.68 abc    48.16±1.08 abc    6.90±0.66 abc     1.00±0.00
ABMKL (convex)    77.47±0.62 abc    87.86±0.76 abc    9.03±0.61 abc     1.00±0.00
ABMKL (ratio)     76.22±1.14 abc    35.54±1.01 abc    10.00±0.00        1.00±0.00
CABMKL (linear)   77.15±0.63 abc    73.84±0.80 abc    9.90±0.31 abc     1.00±0.00
CABMKL (conic)    81.02±0.67        48.32±0.86 abc    6.93±0.74 abc     1.00±0.00
MKL               79.74±1.02 ac     56.00±0.85 abc    8.73±0.52 abc     1.00±0.00
SimpleMKL         74.53±0.90 abc    80.22±1.05 abc    4.73±1.14 abc     23.83±7.46 abc
GMKL              74.68±0.68 abc    80.36±0.83 abc    5.73±0.91 abc     29.10±8.47 abc
GLMKL (p=1)       79.77±0.86 ac     55.94±0.93 abc    10.00±0.00        6.87±0.57 abc
GLMKL (p=2)       78.00±0.43 abc    72.49±1.00 abc    10.00±0.00        5.03±0.18 abc
NLMKL (p=1)       85.38±0.70 abc    93.84±0.51 abc    10.00±0.00        14.77±0.43 abc
NLMKL (p=2)       85.40±0.69 abc    93.86±0.51 abc    10.00±0.00        18.00±0.00 abc
LMKL (softmax)    81.11±1.82        36.00±3.61 abc    10.00±0.00        34.40±23.12 abc
LMKL (sigmoid)    81.90±2.01        51.94±2.14 abc    10.00±0.00        31.63±13.17 abc

Table 5: Performances of single-kernel SVM and representative MKL algorithms on the PROTEIN data set with (COM-SEC-HYD-VOL-POL-PLZ-L1-L4-L14-L30) using the linear kernel.

Algorithm         Test Accuracy     Support Vector    Active Kernel     Calls to Solver
SVM (best)        78.37±1.08 bc     93.09±0.73 bc     1.00±0.00 bc      12.00±0.00 bc
SVM (all)         82.01±0.76 ac     89.32±0.99 ac     12.00±0.00        1.00±0.00
RBMKL (mean)      83.57±0.59        65.94±0.93        12.00±0.00        1.00±0.00
RBMKL (product)   53.04±0.21 abc    100.00±0.00 abc   12.00±0.00        1.00±0.00
ABMKL (conic)     83.52±0.94        63.07±1.35 abc    7.30±0.88 abc     1.00±0.00
ABMKL (convex)    83.76±1.02        64.36±1.56 abc    6.87±0.94 abc     1.00±0.00
ABMKL (ratio)     85.65±0.67 abc    57.87±1.24 abc    12.00±0.00        1.00±0.00
CABMKL (linear)   83.48±0.92        68.00±1.48 abc    11.87±0.35 abc    1.00±0.00
CABMKL (conic)    83.43±0.95        62.12±1.63 abc    8.43±0.73 abc     1.00±0.00
MKL               83.55±1.25        81.75±1.06 abc    7.67±0.76 abc     1.00±0.00
SimpleMKL         83.96±1.20        86.41±0.98 abc    9.83±0.91 abc     54.53±9.92 abc
GMKL              85.67±0.91 abc    79.53±2.71 abc    9.93±0.74 abc     47.40±10.81 abc
GLMKL (p=1)       85.96±0.96 abc    79.06±1.04 abc    11.97±0.18 abc    14.77±0.57 abc
GLMKL (p=2)       85.02±1.20 abc    62.06±1.02 abc    12.00±0.00        5.60±0.67 abc
NLMKL (p=1)       87.00±0.66 abc    96.78±0.32 abc    12.00±0.00        4.83±0.38 abc
NLMKL (p=2)       87.28±0.65 abc    96.64±0.32 abc    12.00±0.00        17.77±0.43 abc
LMKL (softmax)    83.72±1.35        37.55±2.54 abc    12.00±0.00        25.97±5.75 abc
LMKL (sigmoid)    85.06±0.83 abc    48.99±1.59 abc    12.00±0.00        25.40±9.36 abc

Table 6: Performances of single-kernel SVM and representative MKL algorithms on the PROTEIN data set with (COM-SEC-HYD-VOL-POL-PLZ-L1-L4-L14-L30-BLO-PAM) using the linear kernel.
