REVIEW OF LITERATURE
2.5 Issues in neural network construction
2.5.4 Neural network validation
Liu and Yang (2005) advised that validation issues generally relate to the capability of neural network models to deal with data outside the training data set and to produce acceptable forecasting performance. Their idea of validating neural network models draws on the notion of generalisation from Kaastra and Boyd (1996), which is defined as “the idea that a model based on a sample of the data is suitable for forecasting the general population” (Kaastra & Boyd 1996, p.229). Generalisation appears to be the goal of using neural network models in real-world applications.
Researchers have sought guidelines for achieving generalisation in neural network models. Two terms, underfitting and overfitting, are used to describe two conditions of neural network models. Mjalli et al. (2006, p.333) defined underfitting as “the condition when a neural network that is not sufficiently complex fails to fully detect the signal in a complicated data set”; and overfitting as “the condition occurs when a network that is too complex may fit noise, in addition to signal” (p.333). Both underfitted and overfitted neural network models have a lower degree of generalisation.
To achieve good performance or a higher degree of generalisation in neural network models, many researchers have sub-divided the data into three sets: a training set, a validation set and a testing set. Such researchers include Kaastra and Boyd (1996), Zhang, Patuwo, and Hu (1998), Yao and Tan (2001a), Yao and Tan (2001b), Kwon et al. (2005), Yusof (2005), Mjalli et al. (2006), Palmer, Montano and Sese (2006), Sospedra et al. (2006), Mabu et al. (2007), Abdelmouez et al. (2007) and Barunik (2008). The training set is used to create neural network models; the validation set is used to evaluate the models so that those delivering the best performance can be selected; and the testing set is used to evaluate the true accuracy of predictions (Sarle, 2002). This method is also known as the hold out method (Bishop, 1995, as cited in Sarle, 2002). However, no precise rule has been found in the literature for the sizes of the training, validation and testing data sets that should be used (Kaastra & Boyd, 1996; Zhang et al., 1998).
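The three-way hold out split described above can be illustrated with a minimal Python sketch. The function name `hold_out_split` and the 60/20/20 proportions are assumptions made purely for illustration; as noted, the literature gives no precise rule for the sizes of the three sets. The split is chronological (no shuffling), which is a common choice for time series but is likewise an assumption here.

```python
def hold_out_split(series, train_frac=0.6, val_frac=0.2):
    """Split a sequence, in order, into training, validation and testing sets.

    The fractions are illustrative assumptions, not recommended values.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a testing set")
    n = len(series)
    train_end = round(n * train_frac)           # end of the training set
    val_end = train_end + round(n * val_frac)   # end of the validation set
    train = series[:train_end]      # used to create (fit) the candidate models
    val = series[train_end:val_end] # used to select the best-performing model
    test = series[val_end:]         # used once, to estimate true accuracy
    return train, val, test


if __name__ == "__main__":
    prices = list(range(10))  # stand-in for a chronologically ordered data set
    train, val, test = hold_out_split(prices)
    print(len(train), len(val), len(test))
```

Keeping the testing set untouched until a model has been chosen on the validation set is what allows it to serve as unseen data.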
Since the main goal of prediction tasks is to obtain results close to the target, and there is no definitive rule for the construction of forecasting models, researchers have adapted methods facilitated by the software or tools they used, or ideas they developed in their own research. For example, the neural network toolbox in Matlab allows the number of epochs after which training stops to be configured. This may permit experiments without a requirement for a validation data set. Some researchers have divided data into two sets, a training data set and a testing data set. They held the testing data set as unseen data for their models, trained the models with the training data set and tested them with the testing data set. Gan and Danai (1999), Jaruszewicz and Mandziuk (2004), Chaigusin et al. (2008b) and Khan et al. (2008) have used this approach. Other researchers divided data into more than two sets and sub-divided each set into a training set and a testing set. For example, Kim and Han (2000) divided a ten-year data set of the Korea stock price index (KSPI) into ten sets before sub-dividing each set in two: a training data set and a hold out data set, which was used for testing. Generally, the relevant economic information for stock prices is provided every three or four months by governments and companies, so training on only part of each year's data may cause a model to miss some patterns, even though the models have been generated by learning from ten data sets.
Besides the hold out method, other methods such as window-moving and cross validation have also been adapted by some researchers. Kim et al. (2005) used the window-moving method in time delay neural networks (TDNN). The performance of their neural network models was not entirely successful, and they recommended further research to gain more knowledge of the limitations of TDNN. Tsang et al. (2007) and Tilakaratne et al. (2007) also used the window-moving method in their studies. In these three studies, the authors did not compare the window-moving method with other methods, as the performance of their models was influenced by many factors, such as the selection of inputs in their application domains, the number of hidden layers and the number of hidden nodes.
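The studies cited above do not specify their window-moving procedures in this review, so the following Python sketch shows only one common interpretation: a fixed-size training window followed by a testing window, both sliding forward in time. The function name `moving_windows` and the parameters `train_size`, `test_size` and `step` are hypothetical and chosen for illustration.

```python
def moving_windows(n_samples, train_size, test_size, step=None):
    """Yield (train_indices, test_indices) pairs for a window sliding through time.

    By default the window advances by `test_size`, so testing windows do not overlap.
    This is an assumed scheme, not the procedure of any cited study.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    if step is None:
        step = test_size
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += step  # slide the whole window forward


if __name__ == "__main__":
    for train_idx, test_idx in moving_windows(10, train_size=4, test_size=2):
        print(train_idx, test_idx)
```

Each testing window contains only observations that come after its training window, so a model is always evaluated on data later than the data it was trained on.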
For the cross validation method, Luu and Kennedy (2006) compared the performance of neural network models using a 10-fold cross validation scheme with that of models using the hold out method in forecasting the performance of Australian listed companies. They found that the best neural network model with the hold out method achieved a prediction accuracy of 58.7 percent, whereas the best neural network model with 10-fold cross validation achieved 50 percent (Luu & Kennedy, 2006).
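For readers unfamiliar with k-fold cross validation, the partitioning step can be sketched in Python as below. This is a generic illustration, not the procedure of Luu and Kennedy (2006): the function name `k_fold_indices` is hypothetical, and the use of contiguous, unshuffled folds is an assumption.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Every sample appears in exactly one testing fold; the remaining
    k - 1 folds form the training set for that round.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # spread the remainder over the first folds
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


if __name__ == "__main__":
    for train_idx, test_idx in k_fold_indices(25, k=10):
        print(len(train_idx), test_idx)
```

A model's cross validation performance is then typically reported as the average over the k testing folds, whereas the hold out method evaluates on a single testing set.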
To summarise, almost all researchers used only one method, rather than two or more, for validating their models. The accuracy or performance of the models may be influenced by many factors. It is therefore difficult to decide whether hold out or cross validation is the better method for prediction tasks. However, Luu and Kennedy (2006) offered some useful guidance for forecasting with neural networks, suggesting that the hold out method is the more appropriate method in forecasting.