
2.7. Neural network based model design by MOGA

The problem of designing a neural network based model can be divided into two subproblems [4]:

- Neural network structure: the network inputs, the number of hidden layers, and the number of neurons in each layer.

- Neural network parameters: these depend on the model chosen and are usually determined by a suitable learning algorithm.

Since the RBFNN models considered in this thesis were designed by a MOGA, the remainder of this section details the application of the MOGA to the design of RBFNN models for classification and regression problems.

The output of an RBFNN model is given by Eq. (2.57):

$$o[k] = w_{l+1} + \sum_{m=1}^{l} w_m \exp\left(-\frac{\lVert \mathbf{i}_j[k] - \mathbf{C}^{(m)} \rVert_2^2}{2\sigma_m^2}\right) \tag{2.57}$$

In Eq. (2.57), 𝑜[𝑘] and 𝒊𝑗[𝑘] denote the model output and the 𝑗th input at time instant 𝑘, respectively, and 𝑙 is the number of hidden neurons. 𝒘 represents the vector of linear weights (𝑤𝑙+1 being the constant term), 𝐂(𝑚) refers to the center vector (extracted from the 𝐂 matrix) associated with the 𝑚th hidden neuron, 𝜎𝑚 is its corresponding spread, and ‖·‖₂ denotes the Euclidean norm, so that ‖𝒊𝑗[𝑘] − 𝐂(𝑚)‖₂ is the Euclidean distance between the input and the center. The network parameters, denoted collectively by the parameter vector 𝐩, are therefore 𝐂, 𝛔 and 𝐰. To design an RBFNN model that satisfies a set of defined goals, a set of quality measures must be defined, in the form of objectives, for each of the two subproblems mentioned above.
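Eq. (2.57) can be evaluated directly. The following Python sketch computes 𝑜[𝑘] for a single input pattern with Gaussian basis functions, as in the equation. The function name `rbf_output` and the choice of storing the constant term 𝑤𝑙+1 as the last entry of the weight vector are illustrative assumptions, not the implementation used in the thesis.

```python
import math
from typing import Sequence

def rbf_output(x: Sequence[float], centers: Sequence[Sequence[float]],
               spreads: Sequence[float], weights: Sequence[float]) -> float:
    """Output o[k] of a Gaussian RBFNN for one input pattern, following Eq. (2.57).

    Hypothetical helper: weights[:l] are w_1..w_l and weights[l] is the
    constant term w_{l+1}; this storage layout is an assumption.
    """
    l = len(centers)  # number of hidden neurons
    if len(spreads) != l or len(weights) != l + 1:
        raise ValueError("expected l centers, l spreads and l + 1 weights")
    out = weights[l]  # constant term w_{l+1}
    for m in range(l):
        # Squared Euclidean distance ||x - C(m)||_2^2
        dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centers[m]))
        out += weights[m] * math.exp(-dist2 / (2.0 * spreads[m] ** 2))
    return out
```

For example, with centers at (0, 0) and (1, 0), unit spreads, weights [2, 1, 0.5] and input (0, 0), the output is 0.5 + 2 + e^(−1/2) ≈ 3.107.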

Assume that 𝑫 = (𝑿, 𝒚) is a data set composed of 𝑁 input-output pairs, divided into a training set 𝑫𝑡, a generalization (testing) set 𝑫𝑔 and a validation set 𝑫𝑣. Assume also that 𝐹 is the set of all possible input features (in time-series regression problems, delayed values of the modeled and exogenous variables). The problem of designing an RBFNN model by MOGA can then be stated as follows:

The data set 𝑫, the allowed range 𝑑 ∈ [𝑑𝑚, 𝑑𝑀] for the number of input features selected from 𝐹, and the range 𝑛 ∈ [𝑛𝑚, 𝑛𝑀] for the number of hidden neurons are given as design parameters to the MOGA. After the …

𝜇_𝑝 and 𝜇_𝑠 denote the sets of objectives related to the RBFNN's parameters 𝐩 and to its structure, respectively. 𝜇_𝑠 includes only one objective,

$$\mu_s = O(d, n) \tag{2.58}$$

which denotes the model complexity, a function of the number of input features, 𝑑, and the number of hidden neurons, 𝑛.

Since the specification of 𝜇_𝑝 differs between the classes of problems considered, the following subsections address the specification of 𝜇_𝑝 for each class.

2.7.1. Specification of 𝝁𝒑 in classification problems

In classification problems, we are mainly interested in minimizing the 𝐹𝑃 (false positive) and 𝐹𝑁 (false negative) criteria (see Section 2.5). Hence, the corresponding objectives for 𝜇_𝑝 are:

$$\mu_p = [FP_{\mathbf{D}_t},\ FN_{\mathbf{D}_t},\ FP_{\mathbf{D}_g},\ FN_{\mathbf{D}_g}] \tag{2.59}$$

where 𝐹𝑃_𝑫𝑡 and 𝐹𝑁_𝑫𝑡 denote the 𝐹𝑃 and 𝐹𝑁 on the training set 𝑫𝑡, respectively. Similarly, 𝐹𝑃_𝑫𝑔 and 𝐹𝑁_𝑫𝑔 refer to the 𝐹𝑃 and 𝐹𝑁 on the testing set 𝑫𝑔, respectively.
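A minimal sketch of how the four classification objectives of Eq. (2.59) could be assembled from the labels and model decisions on 𝑫𝑡 and 𝑫𝑔. It assumes binary labels and treats 𝐹𝑃 and 𝐹𝑁 as raw counts; if Section 2.5 defines them as rates, each count would instead be divided by the number of negative or positive samples. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

def fp_fn(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Tuple[int, int]:
    """Count false positives and false negatives (counts, not rates: an assumption)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    return fp, fn

def classification_mu_p(train: Tuple[Sequence[int], Sequence[int]],
                        test: Tuple[Sequence[int], Sequence[int]]) -> List[int]:
    """Assemble mu_p = [FP_Dt, FN_Dt, FP_Dg, FN_Dg] as in Eq. (2.59).

    train, test: (y_true, y_pred) pairs for the training and testing sets.
    """
    fp_t, fn_t = fp_fn(*train)
    fp_g, fn_g = fp_fn(*test)
    return [fp_t, fn_t, fp_g, fn_g]
```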

2.7.2. Specification of 𝝁𝒑 in regression problems

For regression problems, the specification of 𝜇_𝑝 relies on minimizing the error between the model outputs and the desired values. Therefore, the corresponding objectives for 𝜇_𝑝 are defined as:

$$\mu_p = [\varepsilon(\mathbf{D}_t),\ \varepsilon(\mathbf{D}_g)] \tag{2.60}$$

where 𝜀(𝑫𝑡) and 𝜀(𝑫𝑔) denote the Root Mean Square Error (RMSE) of the model on the training set 𝑫𝑡 and on the testing set 𝑫𝑔, respectively.
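The regression objectives of Eq. (2.60) reduce to two RMSE computations. A short sketch, with hypothetical helper names, taking the desired values and the model outputs for each set:

```python
import math
from typing import List, Sequence, Tuple

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between desired values and model outputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def regression_mu_p(train: Tuple[Sequence[float], Sequence[float]],
                    test: Tuple[Sequence[float], Sequence[float]]) -> List[float]:
    """mu_p = [eps(D_t), eps(D_g)] as in Eq. (2.60); each argument is (y_true, y_pred)."""
    return [rmse(*train), rmse(*test)]
```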

2.7.2.1. Specification of 𝝁𝒑 in time series prediction problems

For time series prediction problems, the basic objectives specified for regression problems are also taken into account. In addition, a further objective, 𝜀(𝑫𝑠, 𝑃𝐻), is included:

$$\mu_p = [\varepsilon(\mathbf{D}_t),\ \varepsilon(\mathbf{D}_g),\ \varepsilon(\mathbf{D}_s, PH)] \tag{2.61}$$

To define 𝜀(𝑫𝑠, 𝑃𝐻), let 𝑬(𝑫𝑠, 𝑃𝐻) be the error matrix over the simulation set 𝑫𝑠 given in Eq. (2.62), where 𝑫𝑠 is composed of consecutive samples in time and 𝑃𝐻 is the prediction horizon.

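$$\mathbf{E}(\mathbf{D}_s, PH) = \begin{bmatrix} e[1,1] & e[1,2] & \cdots & e[1,PH] \\ e[2,1] & e[2,2] & \cdots & e[2,PH] \\ \vdots & \vdots & \ddots & \vdots \\ e[m-PH,1] & e[m-PH,2] & \cdots & e[m-PH,PH] \end{bmatrix} \tag{2.62}$$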

where 𝑒[𝑖, 𝑗] is the model prediction error obtained from instant 𝑖 of 𝑫𝑠 at step 𝑗 within the prediction horizon 𝑃𝐻. Denoting by 𝜌(·, 𝑖) the RMS function operating over the 𝑖th column of its argument matrix, 𝜀(𝑫𝑠, 𝑃𝐻) is defined as:

$$\varepsilon(\mathbf{D}_s, PH) = \sum_{i=1}^{PH} \rho\big(\mathbf{E}(\mathbf{D}_s, PH), i\big) \tag{2.63}$$

This value is proportional to the area under the curve defined by 𝜌(𝑬(𝑫𝑠, 𝑃𝐻), 𝑖) for 𝑖 within the prediction horizon, and therefore reflects the model accuracy over the complete prediction horizon for the data set considered.
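The sketch below builds 𝑬(𝑫𝑠, 𝑃𝐻) and evaluates Eq. (2.63). It assumes that multi-step predictions are produced recursively, feeding each prediction back as the next input, and that the model is a callable `predict_one` mapping the past values of the series to a one-step-ahead prediction; exogenous inputs are ignored for brevity. The recursive scheme and all names are assumptions made for illustration. Indices are 0-based in the code and 1-based in the text.

```python
import math
from typing import Callable, List, Sequence

Predictor = Callable[[List[float]], float]  # hypothetical one-step-ahead model

def error_matrix(y: Sequence[float], ph: int, predict_one: Predictor) -> List[List[float]]:
    """Build E(D_s, PH) of Eq. (2.62) for a simulation set y of m samples.

    Row i starts at instant i; column j holds the error of the (j+1)-step-ahead
    prediction, computed by feeding predictions back (assumed recursive scheme).
    """
    m = len(y)
    if not 1 <= ph < m:
        raise ValueError("need 1 <= PH < m")
    rows = []
    for i in range(m - ph):  # m - PH rows, as in Eq. (2.62)
        window = list(y[: i + 1])  # measured data up to instant i
        row = []
        for j in range(1, ph + 1):
            y_hat = predict_one(window)
            row.append(y[i + j] - y_hat)  # e[i, j]
            window.append(y_hat)  # prediction becomes an input for the next step
        rows.append(row)
    return rows

def rms_column(e: List[List[float]], i: int) -> float:
    """rho(E, i): RMS over column i of the error matrix."""
    col = [row[i] for row in e]
    return math.sqrt(sum(v * v for v in col) / len(col))

def epsilon_ph(y: Sequence[float], ph: int, predict_one: Predictor) -> float:
    """eps(D_s, PH) of Eq. (2.63): sum of column RMS values over the horizon."""
    e = error_matrix(y, ph, predict_one)
    return sum(rms_column(e, i) for i in range(ph))
```

With the series 0, 1, 2, 3, 4, 𝑃𝐻 = 2 and a naive persistence predictor that repeats the last value, every row of 𝑬 is [1, 2], so 𝜌 equals 1 and 2 for the two columns and 𝜀(𝑫𝑠, 2) = 3.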

2.7.3. Model representation in MOGA

Each RBFNN model in the population has a chromosome representation consisting of two components. The first corresponds to the number of hidden neurons and the second one to a string of integers, each one representing the index of a particular feature in 𝐹. The chromosome representation is shown in Fig. 2.17.

Fig. 2.17. Chromosome representation in MOGA.
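A possible encoding of the two-component chromosome, with random initialization inside the design ranges [𝑑𝑚, 𝑑𝑀] and [𝑛𝑚, 𝑛𝑀]. Drawing distinct feature indices uniformly at random is an assumption for illustration, as are the class and function names.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Chromosome:
    n_neurons: int       # first component: number of hidden neurons
    features: List[int]  # second component: indices of selected features in F

def random_chromosome(num_features: int, d_range: Tuple[int, int],
                      n_range: Tuple[int, int], rng=random) -> Chromosome:
    """Hypothetical initializer: n in [n_m, n_M], d in [d_m, d_M] distinct features."""
    d_m, d_M = d_range
    n_m, n_M = n_range
    if not 1 <= d_m <= d_M <= num_features:
        raise ValueError("invalid feature range")
    n = rng.randint(n_m, n_M)  # randint is inclusive at both ends
    d = rng.randint(d_m, d_M)
    features = sorted(rng.sample(range(num_features), d))  # distinct indices
    return Chromosome(n, features)
```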

Before being evaluated in the MOGA, each model has its parameters determined by a Levenberg-Marquardt algorithm [32, 33] minimizing the error criterion in Eq. (2.38), which exploits the linear-nonlinear relationship of the RBFNN model parameters [34, 50]. The initial values of the nonlinear parameters (𝑪 and 𝝈) are chosen randomly or by a clustering algorithm, 𝒘 is determined as a linear least-squares solution, and training is terminated by the early-stopping approach [17], subject to a maximum number of iterations.
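Because 𝒘 enters Eq. (2.57) linearly, it can be obtained for fixed 𝑪 and 𝝈 by linear least squares. The sketch below shows only that linear step, solving the normal equations with Gaussian elimination in pure Python; it does not reproduce the Levenberg-Marquardt iteration or the criterion of Eq. (2.38), and the function names are hypothetical. A QR- or SVD-based solver would be numerically preferable in practice.

```python
import math
from typing import List, Sequence

def design_matrix(X: Sequence[Sequence[float]], centers: Sequence[Sequence[float]],
                  spreads: Sequence[float]) -> List[List[float]]:
    """Rows [phi_1(x), ..., phi_l(x), 1]: Gaussian activations plus a constant column."""
    rows = []
    for x in X:
        row = []
        for c, s in zip(centers, spreads):
            dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            row.append(math.exp(-dist2 / (2.0 * s ** 2)))
        row.append(1.0)  # column multiplying w_{l+1}
        rows.append(row)
    return rows

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular system")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def linear_weights(X, y, centers, spreads) -> List[float]:
    """Least-squares w (constant term last) for fixed C and sigma, via normal equations."""
    phi = design_matrix(X, centers, spreads)
    p = len(phi[0])
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(p)] for i in range(p)]
    aty = [sum(row[i] * t for row, t in zip(phi, y)) for i in range(p)]
    return solve(ata, aty)
```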

2.7.4. Model design cycle

There are three main actions in the model design cycle: problem definition, solution generation and analysis of results. In the problem definition stage, the data sets, the ranges of features and neurons, and the objectives are defined. Next, the MOGA execution searches for models that satisfy the predefined objectives and goals. In the third stage, the models obtained by the MOGA that lie on the Pareto front are analyzed. For this purpose, the performance of the models on the validation set (not involved in training) is also considered and is of paramount importance. If good solutions are found, the process stops. Otherwise, based on the analysis of results, the search space can be reduced and/or the objectives and goals redefined, thereby restricting the coverage of the trade-off surface. A more detailed description of the application of the MOGA to the design of ANN models can be found, for instance, in [4, 24].
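The Pareto front consists of the non-dominated models. A minimal sketch of extracting it from a list of objective vectors (for example, the entries of 𝜇_𝑝 followed by 𝜇_𝑠), assuming every objective is minimized. It only illustrates non-dominance and is not the MOGA's own ranking or selection procedure; the names are hypothetical.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(objectives: List[Sequence[float]]) -> List[int]:
    """Indices of the objective vectors not dominated by any other vector."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]
```

Identical vectors do not dominate each other, so duplicates of a non-dominated model are all kept.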

In document Artificial neural network models: data selection and online adaptation (Page 73-77)