
2. Applied Machine Learning Algorithms: Strengths and Pitfalls

2.1. Critical assessment of applied machine learning algorithms for the 1st research

2.1.1 The multiclass neural network

The aim of the first two research projects is to apply a range of classification models for credit scoring and cross-selling in order to understand which supervised learning algorithm works best. To gain these performance insights, I have run several modified variants of the algorithms. What makes neural networks so special is that a successful net can be created without understanding how it works. Nevertheless, I have examined and explained the characteristics of the model outcomes using various modified neural networks.

One of the research experiments uses a multiclass classification model, which solves the problem of classifying instances into one of three or more classes. The applied multiclass neural network from MS Azure ML solves the multiclass classification problem based on neural networks. The distinguishing feature of that MS Azure ML algorithm is that it has more than one neuron in the output layer. In practice, the final layer of the neural network consists of N logistic classifiers. The adjusted neural network model, “Multiclass Neural Network”, can be used to predict a target that has multiple values. The classification uses a tagged transactional dataset that includes the label column ‘status’ for the first and ‘cardholder’ for the second research project, as illustrated in figure 2-2.

The neural network is primarily used for predictive modelling of the credit scoring and cross-selling cases, in which the adjusted multiclass neural network algorithm is trained on a pre-processed dataset. The output usually lies between 0 and 1. The connections between the input, hidden and output layers are modelled as weights. Negative values reflect an inhibitory connection and positive values reflect an excitatory connection. Building on this concept, the various MS Azure ML neural network algorithms model complex relationships between inputs and outputs (Steinhaeuser, Chawla and Ganguly, 2015). Training a neural network is the process of finding values for its weights and bias terms, which are used in conjunction with the values given in the input layer to generate the outputs and predictions in the output layer. The trained model is then used to make predictions on transactional data with unknown outputs.
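The forward pass described above can be illustrated with a minimal sketch. It is not the MS Azure ML implementation: the layer sizes, the weights and bias terms, and the names `sigmoid`, `layer` and `forward` are hypothetical choices made only for illustration. The sketch shows how weights and bias terms turn input values into one logistic output per class, each lying between 0 and 1.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function; maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each neuron computes sigmoid(w . x + b)."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass: input -> hidden layer -> one logistic output per class."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)

# Hypothetical, hand-picked parameters: 2 inputs, 2 hidden neurons, 3 classes.
# Positive weights act as excitatory connections, negative ones as inhibitory.
HIDDEN_W = [[2.0, -1.0], [-1.5, 2.5]]
HIDDEN_B = [0.0, -0.5]
OUT_W = [[3.0, -3.0], [-3.0, 3.0], [-2.0, -2.0]]
OUT_B = [0.0, 0.0, 1.5]

scores = forward([1.0, 0.2], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
# The predicted class is the index of the output neuron with the largest score.
predicted_class = max(range(len(scores)), key=lambda k: scores[k])
print(scores, predicted_class)
```

Training would replace the hand-picked values with weights and bias terms fitted to the labelled dataset; that step is omitted from the sketch.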

Schetinin et al. (2003) describe how a multiclass neural network can be learned from a large-scale clinical electroencephalogram dataset. The algorithm trains hidden neurons separately to classify all the pairs of classes in order to find the best pairwise classifiers relevant to the classification problem. In the current research project, an n-class model should be learnt from the large-scale transactional Berka dataset to correctly classify the credit scores or cross-sell candidates of the training and test sets.

The idea of the multiclass neural network is to train the hidden neurons of the neural network separately. Each hidden neuron learns to divide the examples from one pair of classes. The aim is to learn n (n - 1) / 2 binary classifiers from n classes. Schetinin et al. (2003) define a multiclass neural network as follows:

Let $f_{i,j}$ be a threshold activation function of a hidden neuron which learns to divide the examples $x$ of the $i$th and $j$th classes $\Omega_i$ and $\Omega_j$, respectively. The output $y$ of the hidden neuron is:

$$y = f_{i,j}(x) = \begin{cases} 1, & \forall x \in \Omega_i, \\ -1, & \forall x \in \Omega_j. \end{cases}$$
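As a small illustration of the definition above, the following sketch enumerates the class pairs that each need one hidden neuron and evaluates a single threshold neuron. The function names `class_pairs` and `threshold_neuron`, as well as the example hyperplane, are hypothetical and not taken from Schetinin et al. (2003).

```python
from itertools import combinations

def class_pairs(n: int):
    """All pairs (i, j) with 1 <= i < j <= n; one hidden neuron per pair."""
    return list(combinations(range(1, n + 1), 2))

def threshold_neuron(w, b, x):
    """Return +1 on one side of the hyperplane w . x + b = 0, else -1."""
    activation = sum(wk * xk for wk, xk in zip(w, x)) + b
    return 1 if activation >= 0 else -1

# The number of pairs equals n * (n - 1) / 2 for every n.
for n in (3, 4, 5):
    assert len(class_pairs(n)) == n * (n - 1) // 2
    print(n, class_pairs(n))

# Hypothetical hyperplane x1 = 2: class i lies to the left, class j to the right.
W, B = [-1.0, 0.0], 2.0
print(threshold_neuron(W, B, [0.5, 1.0]), threshold_neuron(W, B, [3.5, 1.0]))
```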

Figure 2-2: Visualization of a multiclass neural network

Assume an $n = 3$ classification problem with overlapping classes $\Omega_1$, $\Omega_2$ and $\Omega_3$ centred at $C_1$, $C_2$ and $C_3$, as figure 2-3 depicts. The number of hidden neurons for this example is equal to 3. In figures 2-2 and 2-3, the lines $f_{1,2}$, $f_{1,3}$ and $f_{2,3}$ depict the hyperplanes of the hidden neurons trained to divide the examples of the three pairs of classes, which are (1) $\Omega_1$ and $\Omega_2$, (2) $\Omega_1$ and $\Omega_3$, and (3) $\Omega_2$ and $\Omega_3$.

Figure 2-3: The dividing surfaces for the hyperplanes

By combining these hidden neurons into $n = 3$ groups, the algorithm builds new hyperplanes $g_1$, $g_2$ and $g_3$. The first one, $g_1$, is a superposition of the hidden neurons $f_{1,2}$ and $f_{1,3}$, i.e., $g_1 = f_{1,2} + f_{1,3}$. The second and third hyperplanes are $g_2 = f_{2,3} - f_{1,2}$ and $g_3 = -f_{1,3} - f_{2,3}$, respectively. Figure 2-3 above also shows that in the general case of $n > 2$ classes, the neural network consists of $n$ output neurons $g_1, \ldots, g_n$ and $n(n-1)/2$ hidden neurons $f_{1,2}, \ldots, f_{i,j}, \ldots, f_{n-1,n}$, where $i < j = 2, \ldots, n$.
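The combination of pairwise hidden neurons into output neurons can be sketched in a few lines of Python. The three cases given above suggest the pattern that each output adds the neurons in which its class is the first index and subtracts those in which it is the second, i.e. $g_k = \sum_{j>k} f_{k,j} - \sum_{i<k} f_{i,k}$; this general form is read off from the example and is an assumption here. The training rule is also only a stand-in: each hyperplane is placed halfway between the two class means, which is not the procedure of Schetinin et al. (2003). The data are synthetic, and all names and centres are hypothetical.

```python
import random
from itertools import combinations

def mean(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(c) / len(points) for c in zip(*points)]

def fit_pairwise_neuron(xs_i, xs_j):
    """Stand-in training rule (an assumption, not the source's method):
    place the hyperplane halfway between the two class means."""
    m_i, m_j = mean(xs_i), mean(xs_j)
    w = [a - b for a, b in zip(m_i, m_j)]
    mid = [(a + b) / 2 for a, b in zip(m_i, m_j)]
    b = -sum(wk * mk for wk, mk in zip(w, mid))
    return w, b

def f(neuron, x):
    """Threshold activation: +1 on the class-i side, -1 on the class-j side."""
    w, b = neuron
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1

def fit(data):
    """data maps class label -> list of points; returns one neuron per pair."""
    return {(i, j): fit_pairwise_neuron(data[i], data[j])
            for i, j in combinations(sorted(data), 2)}

def outputs(neurons, classes, x):
    """g_k adds f_{k,j} for j > k and subtracts f_{i,k} for i < k."""
    g = {k: 0 for k in classes}
    for (i, j), neuron in neurons.items():
        y = f(neuron, x)
        g[i] += y
        g[j] -= y
    return g

def predict(neurons, classes, x):
    """Pick the class whose output neuron g_k is largest."""
    g = outputs(neurons, classes, x)
    return max(classes, key=lambda k: g[k])

# Synthetic, well-separated stand-in data around three hypothetical centres.
random.seed(0)
CENTRES = {1: (0.0, 0.0), 2: (4.0, 0.0), 3: (2.0, 4.0)}
data = {k: [(cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5))
            for _ in range(50)]
        for k, (cx, cy) in CENTRES.items()}

neurons = fit(data)
classes = sorted(data)
print(len(neurons), [predict(neurons, classes, c) for c in CENTRES.values()])
```

Because every hidden neuron contributes +1 to one output and -1 to another, the outputs $g_1, \ldots, g_n$ always sum to zero in this sketch, and a class reaches the maximum score of $n - 1$ only when all of its pairwise neurons vote for it.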

Learning classification models from a transactional dataset is still a complex problem for the following reasons. First, transactions are generally not static data, because they depend on the individual payment behaviour of bank customers. Second, the transactional dataset can be noisy and incomplete. Third, a given set of transaction attributes may contain attributes that are irrelevant to the classification problem and may degrade the classification results. Fourth, transactional datasets are large-scale data recorded over several time periods, so the learning time is crucial.

A common criticism of neural networks, which also applies to the current research objectives, is that they require a large and diverse set of training examples for real-world transactional datasets. The reason is that any learning machine needs sufficient representative samples in order to identify the hidden behavioural structure that allows it to generalize to new credit scoring cases.

In document Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery (Page 52-55)