1.2 Soft-Computing algorithms and methods used
1.2.1 Neural Computation-based approaches
Neural Computation is a part of Soft-Computing that includes algorithms inspired by how the human brain learns. It is based on algorithms usually known as ANNs. Neural Computation includes a large number of different NNs, which have been mainly used in classification and regression problems. In this work, we consider feed-forward neural networks, a type of neural approach which processes the information in successive layers, with a privileged direction. MLPs are the most widely used feed-forward neural approaches, and are described next, together with ELMs, a kind of feed-forward network with a very fast training scheme, well suited to constructing hybrid algorithms.
Multi-layer perceptrons
An MLP is a particular kind of ANN which is massively parallel. It is considered a distributed information-processing system, which has been successfully applied in modelling a large variety of nonlinear problems [Haykin1998, Bishop1995]. The MLP consists of an input layer, a number of hidden layers, and an output layer, all of which are composed of a number of special processing units called neurons, as Figure 1.3 shows. As important as the processing units themselves is their connectivity, i.e. how the neurons within a given layer are connected to those of other layers by means of weighted links. These weight values are closely related to the learning ability of the MLP, and also to its ability to generalize from a sufficient number of examples. Such a learning process therefore demands a proper database containing a variety of input examples or patterns with the corresponding known outputs (tags). Adequate values of the neuron weights minimize the error between the output generated by the MLP (when fed with the input patterns in the database) and the corresponding expected output in the database. The number of neurons in the hidden layer is a parameter to be optimized when using this type of neural network [Haykin1998, Bishop1995].
Figure 1.3: Scheme of an Artificial Neural Network model.
The input data for the MLP consists of a number of samples arranged as input vectors, $x = \{x_1, \ldots, x_N\}$. Once an MLP has been properly trained, validated and tested using an input vector different from those contained in the database, it is able to generate a proper output $y$. The relationship between the output and the input signals of a neuron is the following:
$$y = \phi\left(\sum_{j=1}^{n} w_j x_j - \theta\right), \tag{1.1}$$
where $y$ is the output signal, $x_j$ for $j = 1, \ldots, n$ are the input signals, $w_j$ is the weight associated with the $j$-th input, and $\theta$ is a threshold [Haykin1998, Bishop1995]. The transfer function $\phi$ is usually taken to be the logistic function,
$$\phi(x) = \frac{1}{1 + e^{-x}}. \tag{1.2}$$
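As an illustration of Eqs. (1.1) and (1.2), the output of a single neuron can be sketched in a few lines of Python. The function names and the numerical values below are hypothetical and chosen only for this example; they do not come from any particular library or from the experiments in this work.

```python
import math


def logistic(x):
    """Logistic transfer function of Eq. (1.2)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, theta):
    """Output of one neuron, Eq. (1.1): phi(sum_j w_j * x_j - theta)."""
    activation = sum(w * x for w, x in zip(weights, inputs)) - theta
    return logistic(activation)


# Illustrative values only (not taken from the text).
x = [0.5, -1.0, 2.0]   # input signals x_j
w = [0.4, 0.3, 0.1]    # weights w_j
theta = 0.1            # threshold
y = neuron_output(x, w, theta)
print(y)
```

With these values the weighted sum minus the threshold is essentially zero, so the output lies at the midpoint of the logistic function, close to 0.5.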
As mentioned before, obtaining an accurate output depends on the training procedure. During the training process, the error between the estimated output and its
corresponding real value in the database will determine to what degree the weights in the network should be adjusted. Hence, the objective of the network training is to find the combination of weights which results in the smallest training error with the best possible generalization of the result. There are different algorithms that can be used to train an MLP. One possible technique is the back-propagation training algorithm [Bishop1995], which uses the procedure known as gradient descent to try to locate the global minimum of the error [Gardner1998]. Another approach is the well-known Levenberg-Marquardt algorithm [Hagan1994].
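To make the weight-adjustment idea concrete, the following is a minimal sketch of back-propagation with batch gradient descent, written with NumPy. It is not the implementation used in this work: the single hidden layer, the linear output unit, the mean squared error loss, the function name `train_mlp` and all hyper-parameter values are assumptions made for illustration. Note also that, in general, gradient descent is only guaranteed to approach a local minimum of the error.

```python
import numpy as np


def logistic(z):
    """Logistic transfer function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(X, y, n_hidden=8, lr=0.05, epochs=3000, seed=0):
    """Batch gradient descent with back-propagation (one hidden layer).

    Hidden units are logistic, the output unit is linear and the loss is
    the mean squared error. All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    W1 = rng.normal(scale=0.5, size=(n_inputs, n_hidden))  # input-to-hidden weights
    theta1 = np.zeros(n_hidden)                             # hidden thresholds
    w2 = rng.normal(scale=0.5, size=n_hidden)               # hidden-to-output weights
    theta2 = 0.0                                            # output threshold
    history = []
    for _ in range(epochs):
        # Forward pass: each neuron computes phi(sum_j w_j x_j - theta).
        h = logistic(X @ W1 - theta1)
        y_hat = h @ w2 - theta2
        err = y_hat - y
        history.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        g_out = 2.0 * err / n_samples                 # dL/dy_hat
        grad_w2 = h.T @ g_out
        grad_theta2 = -np.sum(g_out)
        g_hid = np.outer(g_out, w2) * h * (1.0 - h)   # dL/d(hidden pre-activation)
        grad_W1 = X.T @ g_hid
        grad_theta1 = -np.sum(g_hid, axis=0)
        # Gradient-descent update of weights and thresholds.
        W1 -= lr * grad_W1
        theta1 -= lr * grad_theta1
        w2 -= lr * grad_w2
        theta2 -= lr * grad_theta2
    return (W1, theta1, w2, theta2), history


# Toy regression problem (synthetic data, for illustration only).
X = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
params, history = train_mlp(X, y)
print(f"training MSE: first epoch {history[0]:.4f}, last epoch {history[-1]:.4f}")
```

The `history` list records the training error at every epoch, which makes it easy to check that the error decreases as the weights are adjusted.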
Extreme Learning Machine
An ELM [Huang2015, Huang2006] is a novel and fast training method based on the structure of MLPs, shown in Figure 1.3. The most significant characteristic of the ELM training is that it is carried out just by randomly setting the input weights and biases of the hidden layer, and then obtaining the output weights from the pseudo-inverse of the hidden-layer output matrix. The advantages of this technique are its simplicity, which makes the training algorithm extremely fast, and its outstanding performance when compared to alternative sequential learning methods, usually better than that of other established approaches such as classical MLPs. Moreover, the universal approximation capability of the ELM network, as well as its classification capability, has already been proven [Huang2012].
The ELM algorithm can be summarized as follows: given a training set $T = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n, y_i \in \mathbb{R}, i = 1, \ldots, l\}$, an activation function $g(x)$, for which a sigmoidal function is usually used, and a number of hidden nodes $\tilde{N}$,
1. Randomly assign the input weights $w_i$ and biases $b_i$, $i = 1, \ldots, \tilde{N}$.
2. Calculate the hidden layer output matrix H, defined as
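$$H = \begin{bmatrix} g(w_1 x_1 + b_1) & \cdots & g(w_{\tilde{N}} x_1 + b_{\tilde{N}}) \\ \vdots & \cdots & \vdots \\ g(w_1 x_l + b_1) & \cdots & g(w_{\tilde{N}} x_l + b_{\tilde{N}}) \end{bmatrix}_{l \times \tilde{N}} \tag{1.3}$$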
3. Calculate the output weight vector β as
$$\beta = H^{\dagger} y_t, \tag{1.4}$$
where $H^{\dagger}$ stands for the Moore-Penrose inverse of matrix $H$ [Huang2006], and $y_t$ is the training output vector, $y_t = [y_{t1}, \ldots, y_{tl}]^T$.
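The three steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the reference implementation cited in this work: the function names `elm_train` and `elm_predict`, the uniform sampling range $[-1, 1]$ for the random weights and biases, and the synthetic data are all chosen here for the example.

```python
import numpy as np


def hidden_output(X, W, b):
    """Hidden-layer output matrix H of Eq. (1.3), with a sigmoidal g."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def elm_train(X, y, n_hidden, seed=0):
    """Train an ELM on inputs X (l x n) and targets y (length l)."""
    rng = np.random.default_rng(seed)
    n_inputs = X.shape[1]
    # Step 1: random input weights and biases (sampling range is an assumption).
    W = rng.uniform(-1.0, 1.0, size=(n_inputs, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    # Step 2: hidden-layer output matrix H, of size l x n_hidden.
    H = hidden_output(X, W, b)
    # Step 3: output weights from the Moore-Penrose inverse, Eq. (1.4).
    beta = np.linalg.pinv(H) @ y
    return W, b, beta


def elm_predict(X, W, b, beta):
    """Evaluate a trained ELM on new inputs."""
    return hidden_output(X, W, b) @ beta


# Synthetic one-dimensional regression problem (illustration only).
X = np.linspace(-1.0, 1.0, 100).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
W, b, beta = elm_train(X, y, n_hidden=20)
train_mse = float(np.mean((elm_predict(X, W, b, beta) - y) ** 2))
print("training MSE:", train_mse)
```

Only the output weights are fitted, and they are obtained in a single linear-algebra step rather than by iterative updates, which is where the speed of the method comes from.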
Note that the number of hidden nodes $\tilde{N}$ is a free parameter of the ELM training, and must be estimated to obtain good results. Usually, scanning a range of $\tilde{N}$ values and keeping the best-performing one is the most practical solution. The Matlab extreme learning machine implementation by G. B. Huang, freely available at [ELM2018], is often used in practice.
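A scan over $\tilde{N}$ can be sketched as below. This is an illustrative NumPy example rather than the Matlab implementation mentioned above: the hold-out validation split, the mean squared error criterion, the candidate range, the helper names and the synthetic data are all assumptions made for this sketch.

```python
import numpy as np


def hidden_output(X, W, b):
    """Sigmoidal hidden-layer output matrix."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))


def elm_fit_predict(X_train, y_train, X_val, n_hidden, seed=0):
    """Train an ELM with n_hidden nodes and predict on X_val."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X_train.shape[1], n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    beta = np.linalg.pinv(hidden_output(X_train, W, b)) @ y_train
    return hidden_output(X_val, W, b) @ beta


def scan_hidden_nodes(X_train, y_train, X_val, y_val, candidates):
    """Return the candidate with the lowest validation MSE, and all errors."""
    errors = {}
    for n_hidden in candidates:
        y_pred = elm_fit_predict(X_train, y_train, X_val, n_hidden)
        errors[n_hidden] = float(np.mean((y_pred - y_val) ** 2))
    best = min(errors, key=errors.get)
    return best, errors


# Synthetic noisy data with a simple hold-out split (illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.normal(size=200)
X_train, y_train, X_val, y_val = X[:150], y[:150], X[150:], y[150:]
candidates = range(5, 55, 5)
best_n, val_errors = scan_hidden_nodes(X_train, y_train, X_val, y_val, candidates)
print("best number of hidden nodes:", best_n)
```

Selecting $\tilde{N}$ on data not used for training guards against choosing a network that merely memorizes the training patterns. Because each ELM trains very quickly, such a scan is cheap even over a wide range of values.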