2.7. Neural network based model design by MOGA
The problem of designing a neural network based model can be divided into two sub-problems, as follows [4]:
• Neural network structure: the network inputs, the number of hidden layers, and the number of neurons in each layer.
• Neural network parameters: these depend on the model chosen and are usually determined by a suitable learning algorithm.
Since the RBFNN models considered in this thesis were designed by a MOGA, the remainder of this section details the application of the MOGA to the design of RBFNN models for classification and regression problems.
The output of an RBFNN model is given by Eq. (2.57):
$$y[k] = w_{n+1} + \sum_{i=1}^{n} w_i \, e^{-\frac{\left\| \mathbf{x}[k] - \mathbf{C}(i) \right\|_2^2}{2\sigma_i^2}} \tag{2.57}$$
In Eq. (2.57), $y[k]$ and $x_j[k]$ denote the model output and the $j$th input at time instant $k$, respectively, the inputs being collected in the vector $\mathbf{x}[k]$. $\mathbf{w}$ represents the vector of the linear weights, $\mathbf{C}(i)$ refers to the vector (extracted from the $\mathbf{C}$ matrix) of the center associated with the $i$th hidden neuron, $\sigma_i$ is its corresponding spread, and $\|\cdot\|_2$ represents the Euclidean norm. The network parameters, which will be denoted as the parameter vector $\boldsymbol{\Theta}$, are therefore $\mathbf{C}$, $\boldsymbol{\sigma}$ and $\mathbf{w}$. In order to design an RBFNN model that satisfies a set of defined goals, it is necessary to define a set of quality measures, in the form of objectives, for each sub-problem mentioned above.
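As an illustration of Eq. (2.57), the following is a minimal sketch in plain Python, assuming that the centers are stored as a list of vectors and that the bias $w_{n+1}$ is the last element of the weight list. The function and argument names are hypothetical and are not taken from the thesis.

```python
import math


def rbfnn_output(x, centers, spreads, weights):
    """Evaluate a Gaussian RBFNN for one input vector.

    x       : list of d input values at a given time instant
    centers : list of n center vectors, each of length d
    spreads : list of n positive spreads
    weights : list of n + 1 linear weights; the last one is the bias
    """
    if len(spreads) != len(centers) or len(weights) != len(centers) + 1:
        raise ValueError("expected n centers, n spreads and n + 1 weights")
    y = weights[-1]  # bias term w_{n+1}
    for w_i, c_i, s_i in zip(weights, centers, spreads):
        # squared Euclidean distance between the input and the i-th center
        sq_dist = sum((x_j - c_j) ** 2 for x_j, c_j in zip(x, c_i))
        y += w_i * math.exp(-sq_dist / (2.0 * s_i ** 2))
    return y
```

When the input coincides with a center, the corresponding neuron contributes its full weight, since the exponential evaluates to one.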
Assume that $\mathbf{D} = (\mathbf{X}, \mathbf{Y})$ is a data set composed of $N$ input-output pairs, which is divided into a training set, $\mathbf{D}_t$, a generalization or testing set, $\mathbf{D}_g$, and a validation set, $\mathbf{D}_v$. Assume also that $F$ is the set of all possible input features (delayed values of the modeled and exogenous variables in time-series regression problems). The problem of designing an RBFNN model by MOGA can be expressed as follows:
The data set $\mathbf{D}$, the allowed range $d \in [d_m, d_M]$ of the number of input features from $F$ and the range $n \in [n_m, n_M]$ of the number of hidden neurons are given as design parameters to the MOGA. After the
$\mu_p$ and $\mu_s$ denote sets of objectives related to the RBFNN's parameters $\boldsymbol{\Theta}$ and its structure, respectively. $\mu_s$ includes only one objective,
$$\mu_s = \left[ O(\mu) \right] \tag{2.58}$$
which denotes the model complexity, a function of the number of input features and the number of hidden neurons.
Since the specification of $\mu_p$ is different in the classes of problems considered, the following subsections address the specification of $\mu_p$ for each class.
2.7.1. Specification of $\mu_p$ in classification problems
In classification problems, we are mainly interested in minimizing the $FP$ and $FN$ criteria (see Section 2.5). Hence, the corresponding objectives for $\mu_p$ are:
$$\mu_p = \left[ FP_{\mathbf{D}_t}, FN_{\mathbf{D}_t}, FP_{\mathbf{D}_g}, FN_{\mathbf{D}_g} \right] \tag{2.59}$$
where $FP_{\mathbf{D}_t}$ and $FN_{\mathbf{D}_t}$ denote the $FP$ and $FN$ on the training set $\mathbf{D}_t$, respectively. Similarly, $FP_{\mathbf{D}_g}$ and $FN_{\mathbf{D}_g}$ refer to the $FP$ and $FN$ on the testing set $\mathbf{D}_g$, respectively.
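The objective vector of Eq. (2.59) could be assembled as in the sketch below. It assumes that the two criteria are counts of false positives and false negatives for binary labels coded as 0 and 1. The exact definitions are given in Section 2.5, which is not reproduced here, so rates may be intended instead of counts. All names are hypothetical.

```python
def fp_fn(y_true, y_pred):
    """Count false positives and false negatives for 0/1 labels."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return fp, fn


def classification_objectives(train, test):
    """Return the four objectives, training set first, then testing set.

    train, test : (y_true, y_pred) tuples for each data set
    """
    fp_t, fn_t = fp_fn(*train)
    fp_g, fn_g = fp_fn(*test)
    return [fp_t, fn_t, fp_g, fn_g]
```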
2.7.2. Specification of $\mu_p$ in regression problems
The specification of $\mu_p$ for regression problems relies on the minimization of the error between the model outputs and the desired values. Therefore, the corresponding objectives for $\mu_p$ are defined as:
$$\mu_p = \left[ \varepsilon(\mathbf{D}_t), \varepsilon(\mathbf{D}_g) \right] \tag{2.60}$$
where $\varepsilon(\mathbf{D}_t)$ and $\varepsilon(\mathbf{D}_g)$ denote the Root Mean Square Errors (RMSE) of the model on the training set $\mathbf{D}_t$ and the testing set $\mathbf{D}_g$, respectively.
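A minimal sketch of the two regression objectives of Eq. (2.60) follows, using the standard RMSE definition. The function names and the representation of each data set as a pair of lists (desired values, model outputs) are assumptions made for illustration.

```python
import math


def rmse(targets, outputs):
    """Root mean square error between desired values and model outputs."""
    if not targets or len(targets) != len(outputs):
        raise ValueError("targets and outputs must be non-empty and equally long")
    return math.sqrt(sum((t - y) ** 2 for t, y in zip(targets, outputs)) / len(targets))


def regression_objectives(train, test):
    """Return [RMSE on the training set, RMSE on the testing set].

    train, test : (targets, outputs) tuples for each data set
    """
    return [rmse(*train), rmse(*test)]
```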
2.7.2.1. Specification of $\mu_p$ in time series prediction problems
Regarding time series prediction problems, the basic objectives specified for regression problems are also taken into account. Besides these, an additional objective, $\varepsilon(\mathbf{D}_s, PH)$, is included:
$$\mu_p = \left[ \varepsilon(\mathbf{D}_t), \varepsilon(\mathbf{D}_g), \varepsilon(\mathbf{D}_s, PH) \right] \tag{2.61}$$
To understand $\varepsilon(\mathbf{D}_s, PH)$, assume that $\mathbf{E}(\mathbf{D}_s, PH)$ is an error matrix defined over the simulation set $\mathbf{D}_s$, as expressed in Eq. (2.62), where $\mathbf{D}_s$ is composed of a number of samples that are consecutive in time.
$$\mathbf{E}(\mathbf{D}_s, PH) = \begin{bmatrix} e[1,1] & e[1,2] & \cdots & e[1,PH] \\ e[2,1] & e[2,2] & \cdots & e[2,PH] \\ \vdots & \vdots & \ddots & \vdots \\ e[p-PH,1] & e[p-PH,2] & \cdots & e[p-PH,PH] \end{bmatrix} \tag{2.62}$$
where $e[i,j]$ is the model prediction error taken from instant $i$ of $\mathbf{D}_s$ at step $j$ within the prediction horizon $PH$. Denoting $\rho(\cdot, i)$ as the RMS function operating over the $i$th column of its argument matrix, $\varepsilon(\mathbf{D}_s, PH)$ is defined as:
$$\varepsilon(\mathbf{D}_s, PH) = \sum_{i=1}^{PH} \rho\left(\mathbf{E}(\mathbf{D}_s, PH), i\right) \tag{2.63}$$
This value is proportional to the area below the curve defined by $\rho(\mathbf{E}(\mathbf{D}_s, PH), i)$ for $i$ within the prediction horizon, reflecting the model accuracy over the complete prediction horizon for the data set considered.
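The construction of the error matrix in Eq. (2.62) and the sum of column-wise RMS values in Eq. (2.63) can be sketched as below. The sketch assumes a univariate series, an error defined as target minus prediction, and a hypothetical `predict(history, ph)` callable that returns the `ph` multi-step forecasts launched from the last sample of `history`. None of these names come from the thesis.

```python
import math


def error_matrix(series, predict, ph):
    """Build the (p - ph) x ph matrix of multi-step prediction errors.

    Row i holds the errors of the ph forecasts launched from sample i;
    column j - 1 corresponds to step j within the prediction horizon.
    """
    p = len(series)
    rows = []
    for i in range(p - ph):
        forecasts = predict(series[: i + 1], ph)  # hypothetical model interface
        rows.append([series[i + j] - forecasts[j - 1] for j in range(1, ph + 1)])
    return rows


def column_rms(matrix, i):
    """RMS value of column i (zero-based) of the matrix."""
    column = [row[i] for row in matrix]
    return math.sqrt(sum(e * e for e in column) / len(column))


def horizon_error(matrix):
    """Sum of the column-wise RMS values over the whole prediction horizon."""
    return sum(column_rms(matrix, i) for i in range(len(matrix[0])))
```

For instance, a persistence predictor that repeats the last observed value can be passed as `lambda history, ph: [history[-1]] * ph` to obtain a baseline value for this objective.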
2.7.3. Model representation in MOGA
Each RBFNN model in the population has a chromosome representation consisting of two components. The first corresponds to the number of hidden neurons and the second one to a string of integers, each one representing the index of a particular feature in $F$. The chromosome representation is shown in Fig. 2.17.
Fig. 2.17. Chromosome representation in MOGA.
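The two-component chromosome described above could be represented as in the following sketch. The class and function names are hypothetical, and the choices of drawing distinct feature indices and keeping them sorted are assumptions made for illustration, not details confirmed by the text.

```python
import random
from dataclasses import dataclass


@dataclass
class Chromosome:
    n_neurons: int        # first component: number of hidden neurons
    feature_idx: list     # second component: indices of the selected features in F


def random_chromosome(n_range, d_range, n_features, rng):
    """Draw a random individual within the allowed structure ranges.

    n_range    : (n_m, n_M) bounds on the number of hidden neurons
    d_range    : (d_m, d_M) bounds on the number of input features
    n_features : size of the feature set F
    rng        : random.Random instance, passed in for reproducibility
    """
    n_m, n_M = n_range
    d_m, d_M = d_range
    if d_M > n_features:
        raise ValueError("cannot select more features than F contains")
    d = rng.randint(d_m, d_M)
    features = sorted(rng.sample(range(n_features), d))  # distinct indices (assumption)
    return Chromosome(rng.randint(n_m, n_M), features)
```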
Before being evaluated in the MOGA, each model has its parameters determined by a Levenberg-Marquardt algorithm [32, 33] minimizing the error criterion in Eq. (2.38) that exploits the linear-nonlinear relationship of the RBFNN model parameters [34, 50]. The initial values of the nonlinear parameters ($\mathbf{C}$ and $\boldsymbol{\sigma}$) are chosen randomly or by the use of a clustering algorithm, $\mathbf{w}$ is determined as a linear least-squares solution, and the procedure is terminated using the early-stopping approach [17] within a maximum number of iterations.
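The linear part of this procedure can be illustrated in isolation: once the centers and spreads are fixed, the linear weights follow from an ordinary least-squares problem. The sketch below is not the Levenberg-Marquardt procedure of [32, 33], and it solves the normal equations by Gaussian elimination only to stay dependency-free; an orthogonal factorization would normally be preferred for numerical robustness. All names are hypothetical.

```python
import math


def hidden_outputs(X, centers, spreads):
    """One row per sample: n Gaussian activations plus a constant 1 for the bias."""
    rows = []
    for x in X:
        acts = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, c)) / (2.0 * s ** 2))
                for c, s in zip(centers, spreads)]
        rows.append(acts + [1.0])
    return rows


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: hidden-layer outputs are linearly dependent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def linear_weights(X, y, centers, spreads):
    """Least-squares weights (length n + 1, bias last) for fixed centers and spreads."""
    Phi = hidden_outputs(X, centers, spreads)
    m = len(Phi[0])
    gram = [[sum(row[a] * row[b] for row in Phi) for b in range(m)] for a in range(m)]
    rhs = [sum(row[a] * t for row, t in zip(Phi, y)) for a in range(m)]
    return solve(gram, rhs)
```

Because the weights are recomputed from the nonlinear parameters in this way, an iterative optimizer only needs to search over the centers and spreads.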
2.7.4. Model design cycle
There are three main actions in the model design cycle: problem definition, generation of solutions and analysis of results. In the problem definition stage, the data sets, the ranges of features and neurons, and the objectives are defined. After this stage, the MOGA execution performs a search to obtain models that satisfy the predefined objectives and goals. In the third stage, the set of models obtained by the MOGA that lie in the Pareto front are analyzed. For this purpose, the performance of the models on the validation set (not involved in the training) is also considered and is of paramount importance. If good solutions are found, the process stops. Otherwise, based on the analysis of results, the search space can be reduced, and/or the objectives and goals can be redefined, therefore restricting the trade-off surface coverage. A more detailed description of the application of the MOGA to the design of ANN models can be found, for instance, in [4, 24].
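The selection of the models that lie in the Pareto front, used in the third stage, can be sketched as a simple non-dominated filter over objective vectors that are all to be minimized. This is a generic illustration with hypothetical names, not the MOGA ranking scheme itself, which also handles goals and priorities.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the objective vectors that no other vector dominates (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Each retained vector represents a different trade-off, for example between model complexity and testing error, which is why the validation set is then used to choose among them.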