ANN Modelling Procedure - Data and Modelling Procedure of the Study

Chapter 4 Methods

4.5 Data and Modelling Procedure of the Study

4.5.3 ANN Modelling Procedure

MATLAB software and two of its toolboxes, Neural Network and Genetic Algorithm and Direct Search, were employed in ANN modelling. Building an ANN model involves several stages, which are summarised in the following order.

4.5.3.1 Scaling Procedure

Just as ARIMA required a stationarity transformation, ANN estimation required scaling. Scaling was helpful because very small or very large values (outliers) in a series could cause underflow or overflow problems. Linear scaling is the most widely used scaling procedure and can map a series onto one of two ranges, [0, 1] or [-1, 1]. To scale a series linearly, we made use of its minimum and maximum values.

For scaling the data, the following equation is applied:

$$X_s = \frac{X_i - X_{\min}}{X_{\max} - X_{\min}} \qquad [4.68]$$

and for de-scaling, the equation is

$$X_i = X_s \left(X_{\max} - X_{\min}\right) + X_{\min} \qquad [4.69]$$

where $X_s$ is the scaled series, $X_i$ is the original series, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum values of the original series, respectively. After scaling, the data were divided into in-sample and out-of-sample data sets to match the divisions made in ARIMA modelling for each series. However, the in-sample data set for ANN was further divided into two subsets, one for training and the other for validation. The validation set was used to stop the training at the right time so that the network was not overtrained. More details are given next.
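The scaling and de-scaling steps of equations 4.68 and 4.69 can be sketched in a few lines. This is only an illustration in Python rather than the MATLAB code used in the study; the function names and the sample values are assumptions made for the example.

```python
def fit_min_max(series):
    """Return (x_min, x_max) of the original series."""
    return min(series), max(series)


def scale(series, x_min, x_max):
    """Equation 4.68: map the original series linearly onto [0, 1]."""
    span = x_max - x_min
    if span == 0:
        raise ValueError("a constant series cannot be scaled")
    return [(x - x_min) / span for x in series]


def descale(scaled, x_min, x_max):
    """Equation 4.69: map scaled values back to the original units."""
    return [s * (x_max - x_min) + x_min for s in scaled]


# Made-up values, for illustration only.
prices = [250.0, 400.0, 325.0, 1000.0]
x_min, x_max = fit_min_max(prices)
scaled = scale(prices, x_min, x_max)
restored = descale(scaled, x_min, x_max)
```

The minimum and maximum are kept so that network outputs can later be converted back to the original units with `descale`.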

4.5.3.2 Network Architecture and the Training Procedure

When using ANN solely, the total number of input neurons for the network was fixed according to the number of lags (the AR terms only) used in the ARIMA model associated with each time series. Network architectures with one output neuron were used throughout this study. It has been proven in many studies, for example, Fu (1994), Masters (1995) and Cybenko (1989), that a network with only one hidden layer is sufficient to approximate any continuous nonlinear function; thus, a network with one hidden layer was used in this study. A number of networks were trained with different transfer functions for the hidden layer. In the hidden layer, sigmoid functions of the logistic and hyperbolic tangent type were used as transfer functions. The ANN modelling results for each of these two transfer functions were later compared and the best network architecture was chosen. A linear transfer function was used for the output layer.
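As a rough illustration of this architecture, the forward pass of a network with lagged inputs, one hidden layer (logistic or hyperbolic tangent) and a single linear output neuron could look as follows. The function name, argument names and weight values are hypothetical and are not taken from the study.

```python
import math


def logistic(z):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


TRANSFER = {"logistic": logistic, "tanh": math.tanh}


def forward(lags, hidden_weights, hidden_biases, output_weights, output_bias,
            transfer="tanh"):
    """One hidden layer with a sigmoid-type transfer, one linear output neuron."""
    f = TRANSFER[transfer]
    hidden = [
        f(sum(w * x for w, x in zip(weights, lags)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    # Linear output layer: weighted sum of hidden activations plus a bias.
    return sum(v * h for v, h in zip(output_weights, hidden)) + output_bias


# Two lagged inputs and three hidden neurons with arbitrary weights.
lags = [0.4, 0.6]
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.7]]
b = [0.0, 0.1, -0.1]
v = [1.0, -1.0, 0.5]
c = 0.2
y_tanh = forward(lags, W, b, v, c, "tanh")
y_logistic = forward(lags, W, b, v, c, "logistic")
```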

The optimum number of neurons in the hidden layer was determined by trial and error, with the number of hidden neurons varied from 1 to 20. Each network was trained 20 times, each time with different initial weights. The weights were initialised randomly using a built-in MATLAB function (net.initFcn). The maximum number of epochs was set at 10,000 to ensure that the network was sufficiently trained. However, training was terminated when there was no further improvement in the validation error. Normally, the training and validation errors decrease until the optimum parameters (weights) are reached, after which the validation error tends to increase as the network over-fits the training set. The weights of the optimum network were then saved along with the optimum number of epochs reached. The training and validation sets were then combined to form the in-sample data set previously used in the ARIMA modelling, which allowed the models to be compared later. Each network previously saved was then retrained up to its optimum number of epochs using the whole in-sample data set.
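The validation-based stopping rule can be sketched as below. This is a generic outline, not the MATLAB routine used in the study: the `patience` argument (how many epochs without improvement are tolerated) and the `FakeNetwork` stand-in are assumptions introduced only to make the example runnable.

```python
def train_with_early_stopping(train_epoch, validation_error,
                              max_epochs=10000, patience=6):
    """Train until the validation error stops improving.

    train_epoch() runs one epoch and returns a snapshot of the weights.
    validation_error() returns the current error on the validation set.
    Returns (best_weights, best_epoch, best_error).
    """
    best_weights, best_epoch, best_error = None, 0, float("inf")
    for epoch in range(1, max_epochs + 1):
        weights = train_epoch()
        error = validation_error()
        if error < best_error:
            best_weights, best_epoch, best_error = weights, epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_weights, best_epoch, best_error


class FakeNetwork:
    """Stand-in for a real network; its validation error is lowest at epoch 30."""

    def __init__(self):
        self.epoch = 0

    def train_epoch(self):
        self.epoch += 1
        return {"epoch": self.epoch}  # placeholder for a weight snapshot

    def validation_error(self):
        return 0.1 + (self.epoch - 30) ** 2 / 1000.0


net = FakeNetwork()
weights, best_epoch, best_error = train_with_early_stopping(
    net.train_epoch, net.validation_error
)
```

The returned `best_epoch` plays the role of the optimum number of epochs, which is the value reused when the network is retrained on the full in-sample data set.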

During the second training phase, which used the full in-sample data set, the RMSE value for each network was calculated. Only the network with the lowest RMSE value in each training phase was saved. For example, a network with, say, one neuron in the hidden layer was trained 20 times with 20 different sets of initial weights, and each time the optimum weights and the optimum number of epochs according to the validation error were saved. The training and validation data sets were then combined to form the in-sample data set. During this training phase (with the full in-sample data set), the RMSE was also calculated for each network, and the networks were ranked by RMSE. The network with the lowest RMSE value was considered the best network. This process was repeated for all the networks, from one neuron up to 20 neurons in the hidden layer. Finally, the ten best-performing networks according to the RMSE were selected and applied to the testing data set. The network with the smallest RMSE in the testing phase was chosen as the final model.

The whole training procedure was applied twice, once for the networks with a logistic function in the hidden layer and once with the hyperbolic tangent function. Each time, the ten best network architectures were chosen and compared according to the lowest RMSE and, finally, the best overall network was taken as our final model.
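The selection procedure described above (20 restarts for each of 1 to 20 hidden neurons, a shortlist of ten networks by in-sample RMSE, and a final choice by test RMSE) can be outlined as follows. The callables `train_and_score` and `test_score` are hypothetical placeholders for the actual training and evaluation code, and the stub versions at the bottom produce made-up scores purely to exercise the logic.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def select_final_model(train_and_score, test_score,
                       max_hidden=20, restarts=20, shortlist=10):
    """train_and_score(n_hidden, seed) -> (model, in_sample_rmse)
    test_score(model) -> RMSE on the testing data set
    """
    best_per_size = []
    for n_hidden in range(1, max_hidden + 1):
        # Train the same architecture from several random initialisations
        # and keep only the run with the lowest in-sample RMSE.
        runs = [train_and_score(n_hidden, seed) for seed in range(restarts)]
        best_per_size.append(min(runs, key=lambda run: run[1]))
    # Shortlist the best architectures, then decide on the testing set.
    top = sorted(best_per_size, key=lambda run: run[1])[:shortlist]
    return min((model for model, _ in top), key=test_score)


# Stub scores (not real results): a "model" is just (n_hidden, seed).
def fake_train_and_score(n_hidden, seed):
    return (n_hidden, seed), 1.0 / n_hidden + 0.01 * seed


def fake_test_score(model):
    n_hidden, _ = model
    return abs(n_hidden - 12) * 0.1


final_model = select_final_model(fake_train_and_score, fake_test_score)
```

Running the same routine once per hidden-layer transfer function and comparing the two winners would mirror the two passes described in the text.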

4.5.3.2.1 Training Algorithm

Based on the learning rules discussed earlier (section 4.3.1.8), the LM algorithm was found to be the best choice of training algorithm. This was because it combines two algorithms (one linear and the other nonlinear, second-order) and converges faster than other algorithms used in ANN modelling, which allowed us to investigate many different network architectures. This widened our search domain for the best model and shortened the time needed to train many networks. Many comparative studies of ANN training algorithms show that the LM algorithm is the best performing of those compared.
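For reference, the LM weight update as it is usually written is given below. The symbols are the conventional ones and are assumed here rather than taken from the thesis: $\mathbf{J}$ is the Jacobian of the network errors with respect to the weights, $\mathbf{e}$ is the vector of errors, and $\mu$ is the Marquardt parameter.

```latex
\Delta \mathbf{w} = -\left(\mathbf{J}^{\top}\mathbf{J} + \mu \mathbf{I}\right)^{-1}\mathbf{J}^{\top}\mathbf{e}
```

When $\mu$ is large the step approaches a small gradient-descent step, and when $\mu$ is close to zero it approaches a Gauss-Newton step, which is how the method blends two algorithms.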

In addition to training the network with LM, a Genetic Algorithm (GA) was also used to optimise the number of neurons in the hidden layer and the weights and biases of the network. This was an attempt to provide an automated ANN training procedure in order to achieve, if possible, better results. The parameters used for LM and GA are presented next.
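To give a feel for how a GA searches over a vector of weights, a minimal sketch follows. It is not MATLAB's implementation: the tournament selection, the elitism count, the gene bounds, the interpretation of the crossover value as a per-child probability, and the toy fitness function are all assumptions made for illustration. In the study's setting the fitness would be the network error for a given set of weights and biases.

```python
import random


def genetic_minimise(fitness, n_genes, bounds=(-1.0, 1.0), population_size=100,
                     generations=1000, crossover_prob=0.7, mutation_rate=0.02,
                     fitness_limit=0.0, elite=2, seed=0):
    """Minimise `fitness` over real-valued vectors; returns (best, history)."""
    rng = random.Random(seed)
    low, high = bounds
    population = [[rng.uniform(low, high) for _ in range(n_genes)]
                  for _ in range(population_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        if history[-1] <= fitness_limit:
            break
        # Elitism (assumed): carry the best individuals over unchanged.
        children = [individual[:] for individual in population[:elite]]
        while len(children) < population_size:
            # Tournament selection of two parents (assumed selection scheme).
            p1 = min(rng.sample(population, 3), key=fitness)
            p2 = min(rng.sample(population, 3), key=fitness)
            if rng.random() < crossover_prob:
                # Scattered crossover: each gene is copied from a randomly
                # chosen parent.
                child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            else:
                child = p1[:]
            # Uniform mutation: a gene is occasionally redrawn from its range.
            child = [rng.uniform(low, high) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return min(population, key=fitness), history


def sphere(weights):
    """Toy fitness: sum of squares, standing in for a network's error."""
    return sum(w * w for w in weights)


best, history = genetic_minimise(sphere, n_genes=5, population_size=30,
                                 generations=50)
```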

4.5.3.2.2 Training Parameters

The training parameters were set according to MATLAB, the software used for ANN modelling. Prior to training, users need to specify two main parameters that terminate the training. First, a goal for the performance function (MSE in MATLAB) must be set; the goal is the minimum error to be achieved. Second, the maximum number of epochs (iterations) must be specified. Since the Levenberg-Marquardt (LM) algorithm was used to train the network, four additional parameters also needed to be specified, namely, an initial value for the Marquardt parameter (mu in MATLAB), a decrease factor (mu_dec in MATLAB), an increase factor (mu_inc in MATLAB) and the maximum step size (mu_max in MATLAB). For more explanation of these parameters, see section 4.3.1.8.8 in chapter 4. The default values set by MATLAB were used in this study. Genetic algorithms (GA) were also used to optimise the weights and the hidden neurons of the MLP network, so some GA parameters also needed to be set before training; GA parameters were discussed in section 4.3.1.8. The main parameters used in this study for modelling ANN are summarised in Table 4.4.
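The role of mu, mu_dec, mu_inc and mu_max, together with the goal and the maximum number of epochs, can be illustrated on a deliberately tiny problem. The sketch below fits the single parameter of y = exp(a x) with a Levenberg-Marquardt style loop; it is an assumption-laden toy written for this explanation, not the MATLAB training routine, and the function name and the data are hypothetical.

```python
import math


def lm_fit_rate(xs, ys, a=0.0, mu=0.001, mu_dec=0.1, mu_inc=10.0,
                mu_max=1e10, max_epochs=10000, goal=0.0):
    """Fit y = exp(a * x) by a Levenberg-Marquardt loop; returns (a, mse)."""

    def mse(rate):
        return sum((y - math.exp(rate * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

    error = mse(a)
    for _ in range(max_epochs):
        if error <= goal:
            break  # performance goal reached
        # Residuals e_i and their derivatives J_i with respect to a.
        e = [y - math.exp(a * x) for x, y in zip(xs, ys)]
        J = [-x * math.exp(a * x) for x in xs]
        jtj = sum(j * j for j in J)
        jte = sum(j * r for j, r in zip(J, e))
        while mu <= mu_max:
            step = -jte / (jtj + mu)
            trial_error = mse(a + step)
            if trial_error < error:
                # Successful step: accept it and decrease mu.
                a, error = a + step, trial_error
                mu *= mu_dec
                break
            # Unsuccessful step: increase mu and try a more cautious step.
            mu *= mu_inc
        else:
            break  # mu exceeded mu_max without finding an improving step
    return a, error


# Synthetic data generated with a = 0.7, for illustration only.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [math.exp(0.7 * x) for x in xs]
a_hat, final_error = lm_fit_rate(xs, ys)
```

With a goal of zero, the loop in practice ends either at the epoch limit or when mu grows past mu_max, which is why both limits need to be set before training.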

Table 4.4: The Parameters Used in ANN and GA Modelling

General ANN Training Parameters
    Maximum number of epochs                10,000
    Goal (minimum error to be achieved)     0
    Performance function                    MSE (modified to RMSE)

Levenberg-Marquardt (LM) Parameters
    Initial Marquardt parameter (mu)        0.001 (default)
    Decrease factor (mu_dec)                0.1 (default)
    Increase factor (mu_inc)                10
    Maximum step size (mu_max)              1e10

Genetic Algorithm Parameters (the main five)
    Generations (iterations)                1000
    Population size                         Yearly: max 100; Daily: max 100; Monthly: max 150
    Crossover probability                   Scattered (0.7)
    Mutation rate                           Uniform (0.02)
    Fitness limit (precision)               0

In document Hybrid computational intelligence systems based on statistical and neural networks methods for time series forecasting: the case of gold price (Page 87-91)