The two-class locally-deep support vector machine

2. Applied Machine Learning Algorithms: Strengths and Pitfalls

2.1. Critical assessment of applied machine learning algorithms for the 1 st research

2.2.2 The two-class locally-deep support vector machine

The MS Azure two-class locally-deep support vector machine creates a two-class, non- linear SVMs classifier that is optimized for efficient prediction. The adjusted algorithm optimizes these predictive models in order to efficiently scale to larger training sets. MS Azure ML uses the kernel function for mapping data points to the feature space. The reason behind this approach is to reduce the time needed for training while maintaining most of the classification accuracy

Jose, Goyal and Aggrwal (2013) define the supervised learning algorithm which learns a non-linear kernel 𝐾(𝑥_𝑖, 𝑥_𝑗) = 𝐾_𝐿(𝑥_𝑖, 𝑥_𝑗) 𝐾_𝐺(𝑥_𝑖, 𝑥_𝑗) as the product of a local kernel 𝐾_𝐿 = 𝜙_𝐿𝑡𝜙_𝐿 and a global kernel 𝐾_𝐺 = 𝜙_𝐺𝑡𝜙_𝐺 leading to the following prediction function as follows: 𝑦(𝑥) = 𝑠ⅈ𝑔𝑛 (∑ 𝛼_𝑖𝑦_𝑖𝐾(𝑥, 𝑥_𝑖) 𝑖 ) = 𝑠ⅈ𝑔𝑛 (∑ 𝛼_𝑖𝑦_𝑖𝜙_𝐺𝑗(𝑥_𝑖)𝜙_𝐺𝑗(𝑥) 𝜙_𝐿𝑘(𝑥_𝑖)𝜙_𝐿𝑘(𝑥) 𝑖𝑗𝑘 ) = 𝑠ⅈ𝑔𝑛 (𝑤𝑡 (𝜙𝐺 (𝑥) ⊕ 𝜙𝐿(𝑥))) = 𝑠ⅈ𝑔𝑛 (𝜙𝐿𝑡(𝑥) 𝑊𝑡 𝜙𝐺(𝑥)) = 𝑠ⅈ𝑔𝑛 (𝑊𝑡_{(𝑥) 𝜙} 𝐺(𝑥))

where 𝑤𝑘 = ∑ 𝛼_𝑖 𝑖𝑦𝑖𝜙𝐿𝑘(𝑥𝑖)𝜙𝐺(𝑥𝑖), 𝜙𝐿𝑘 denotes dimension k of 𝜙𝐿 𝜖 𝑅𝑀, 𝑊 =

[𝑤1, … , 𝑤𝑀], 𝑊 (𝑥) = 𝑊𝜙𝐿(𝑥) and ⊕ is the Kronecker product. Thus, the algorithm can

be thought of as either learning a single fixed linear classifier 𝜙_𝐺 ⊕ 𝜙_𝐿 space or a different classifier for each point in the given global feature space 𝜙_𝐺.

However, their paper focuses on the problem of speeding up SVMs given a single global feature. Therefore, a local-deep kernel learning formulation wasdeveloped to speed up non-linear SVM prediction by learning deep local features.

Huang et al. (2018) define machine learning with maximization (support) of separating margin (vector) as support vector machine learning. The detailed procedure of the two- class support vector machine is described in the previous section. To sum up their characteristics for further comparison, the algorithm tries to find the best separating margin between two classes with a hyperplane (also known as a decision boundary). A kernel is used to measure the similarity between points from each of the classes, and the kernels are applied globally to all points. The closest points to the decision boundary are called support vectors.

The two-class locally-deep SVM is a modification of the two-class SVM where the kernels used for measuring similarity are composed of a local and global kernel. The algorithm tries to learn the local embeddings that are high dimensional, sparse and computationally deep, which allows the two-class SVM to decouple the cost from the

calculation from the number of support vectors. Therefore, two-class locally-deep SVM is much faster than a two-class SVM, but might make a sacrifice for accuracy.

The major difference between a two-class SVM and the two-class locally-deep SVM lies first in how the classifiers are calculated. The algorithm creates non-linear classifiers by using a kernel function to maximum-margin hyperplanes. Therefore, a non-linear classification rule is learned, which corresponds to a linear classification rule for the transformed data points to solve the optimization problem. This approach results in a transformed higher-dimensional feature space, which also increases the generalization error of SVMs, although the algorithm still performs well. It is common knowledge that the larger the margin, the lower the generalization error of the classifier. The cost and performance of calculating the separating hyperplanes based on a two- class SVM is linearly proportional to the training data. Compared to a two-class locally- deep SVM, the cost and performance of calculating the separating hyperplanes does not increase linearly with the training data.

Finally, it is recommended to apply a two-class locally-deep SVM for non-linear datasets and classification problems that need to be optimized. In case of increasing the classification accuracy, the two-class SVM should be a better choice.

The two-class locally-deep SVM is a modification of two-class SVM using the kernel method, which enables us to model higher dimensional, non-linear models. Huang et al. (2018) explained that in a non-linear problem, a kernel function could be used to add additional dimensions to the raw data and thus make it a linear problem in the resulting higher dimensional space. The kernel function will do certain calculations much more quickly, which would need computations in high dimensional space. Figure 2-11 below shows the approach by applying the kernel function to separate and transform the data by a non-linear SVM. The main idea of this approach is to combine kernel activations in non-linear ways (Abdullah, Veltkamp and Wiering, 2009).

Jose et al. (2013) explained that the kernel functions are used to calculate the scalar product between two data points in a higher dimensional space without explicitly calculating the mapping from the input space to the higher dimensional space. Computing the kernel while going to the higher dimensional space is almost trivial compared to computing the inner product of two feature vectors. The two-class locally- deep SVM from MS Azure ML library is based on the Localized Multiple Kernel Learning approach, and contains multiple layers of SVMs instead of a single adjustable

layer of weights (Abdullah, Veltkamp and Wiering, 2009). Thus, the algorithm tries to learn a different kernel for each data point. It is important to mention that the performance of a standard SVM model is affected by the choice of kernel function, among other factors. There is no way to figure out which (parameterized) kernel is the best for our classification problem; however, kernel functions are not flexible. Although, it is not part of current research scope.

Figure 2-11: Visualization of a two-class locally-deep SVM using kernel function The current research will choose the best performing machine learning algorithm to solve the defined classification problems through trials. The research starts to experiment with a variety of MS Azure ML algorithms and then experiment to further optimize the performance of the best supervised learning algorithm. Depending on the nature of the classification problem, it is possible that one algorithm is better than the others. An optimal machine learning algorithm can be selected from a fixed set of MS Azure ML algorithms in a statistically rigorous fashion by using a set of performance measurements.

In document Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery (Page 69-72)