
6.5 Accuracy and Generalisation Studies

6.5.1 Hidden Layers Study

Problem Description

[Bodis, 2004] states that:

“The optimum number of hidden layers is highly problem dependent and is a matter for experimentation.”

[Crone, 2005, Kolarik and Rudorfer, 1994, Bodis, 2004, Rudorfer, 1995] all settle on a single hidden layer. However, they all use technical data, whereas this project applies the network to fundamental data. The problem tackled here is therefore different, so a study is warranted.

Justification of Choice

As the number of hidden layers is increased, the network’s ability to learn the input data also increases, because the ANN can replicate more complicated functions. However, as stated in the Literature Review and in the research on pruning algorithms, an ANN that over-fits the learning data loses generality, and its out-of-sample predictions usually deteriorate. Therefore, as with constructive algorithms, the number of hidden layers in this study will be increased incrementally until testing performance deteriorates.

The accuracy measures have so far proved to be good indicators of ANN performance, so they will also be used in this study to assess the accuracy of the test predictions.

There are two reasons why ‘over-fitting’ would be particularly harmful in this domain.

Firstly, companies usually alter their business models and general marketing tactics over time to keep pace with competitive and rapidly changing modern markets, and investors adapt their evaluation strategies in response. As a consequence, the functions mapping the inputs to the target output will change over time, and probably only the general trends and patterns will be maintained. If the network ‘over-fits’ these functions, it will learn detailed patterns and correlations that may exist only over the learning period. The test predictions will then be incorrect, because by that point the learnt mappings are out of date.

Secondly, the share price time-series contains a large amount of random noise. If the network ‘over-fits’ the share price, it will learn the noise as well. Because the noise is random, it follows an unrelated pattern during the testing phase, so the learnt noise can only hamper the test predictions.

As a result, the optimum number of hidden layers is expected to be one, because this should give the network the best generalising ability.

Hypothesis

The optimum number of hidden layers will be one. Two hidden layers will cause the network to ‘over-fit’ during learning, which will have a detrimental effect on test predictions.

Study Framework

For this study the original framework will be re-applied, so five iterations of training and testing will be performed per trial.

The following settings will be applied to the network:

• weight initialisation: at the start of each iteration, the connection weights are re-initialised to random values in the range [-1, 1]

• shuffling: yes

• learning rate: 0.2

• learning algorithm: back-propagation with online weight updates

• epochs: 10000

• topology: fully connected, architecture will be trial dependent

• activation function: logistic
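The settings above can be sketched as a small fully connected network. This is a minimal illustration, not the project's implementation. It assumes squared-error back-propagation and a bias weight on every unit, neither of which the settings specify. The class and method names (`MLP`, `train_online`) are hypothetical.

```python
import math
import random


def logistic(x):
    """Numerically stable logistic (sigmoid) activation."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class MLP:
    """Fully connected feed-forward network with logistic units (hypothetical sketch).

    layer_sizes is e.g. [n_inputs, n_hidden, n_outputs] for one hidden layer,
    or [n_inputs, h1, h2, n_outputs] for two hidden layers.
    """

    def __init__(self, layer_sizes, rng=None):
        rng = rng or random.Random()
        # One weight list per unit; the extra final entry is an (assumed) bias weight.
        # All weights start uniformly at random in [-1, 1].
        self.weights = [
            [[rng.uniform(-1.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
        ]

    def forward(self, x):
        """Return the activations of every layer, starting with the input."""
        acts = [list(x)]
        for layer in self.weights:
            inputs = acts[-1] + [1.0]  # constant bias input
            acts.append([logistic(sum(w * a for w, a in zip(unit, inputs))) for unit in layer])
        return acts

    def predict(self, x):
        return self.forward(x)[-1]

    def train_online(self, samples, learning_rate=0.2, epochs=10000, shuffle=True, rng=None):
        """Back-propagation with the weights updated after every sample (online learning)."""
        rng = rng or random.Random()
        samples = list(samples)
        for _ in range(epochs):
            if shuffle:
                rng.shuffle(samples)  # present the samples in a new order each epoch
            for x, target in samples:
                acts = self.forward(x)
                # Output-layer deltas for squared error with logistic outputs.
                deltas = [(t - o) * o * (1.0 - o) for t, o in zip(target, acts[-1])]
                for li in range(len(self.weights) - 1, -1, -1):
                    layer = self.weights[li]
                    inputs = acts[li] + [1.0]
                    # Back-propagate through this layer's weights before updating them.
                    prev_deltas = [
                        a * (1.0 - a) * sum(layer[j][i] * deltas[j] for j in range(len(layer)))
                        for i, a in enumerate(acts[li])
                    ] if li > 0 else []
                    for j, unit in enumerate(layer):
                        for i in range(len(unit)):
                            unit[i] += learning_rate * deltas[j] * inputs[i]
                    deltas = prev_deltas
```

In this sketch, a trial with one hidden layer would use `MLP([n_inputs, n_hidden, 1])` and a trial with two would use `MLP([n_inputs, h1, h2, 1])`. Each iteration creates a fresh network so that the weights are re-initialised.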

Both visual inspection and accuracy measures will be used to assess testing performance. Visual inspection will be used to identify the median prediction among the five iterations of each trial. Accuracy measures will then be computed on the median predictions to compare the performance of the different trials. When a trial shows deteriorated test predictions, the study will cease.
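The stopping rule can be sketched as a loop. In this study the median prediction is chosen by visual inspection. The sketch instead assumes that each iteration can be summarised by a single scalar test error, where lower is better. `run_iteration` is a hypothetical callback that trains a freshly initialised network with the given number of hidden layers and returns that error.

```python
import statistics
from typing import Callable, Dict, Tuple


def hidden_layer_study(
    run_iteration: Callable[[int, int], float],
    iterations: int = 5,
    max_layers: int = 5,
) -> Tuple[int, Dict[int, float]]:
    """Add hidden layers one at a time until the median test error gets worse.

    run_iteration(hidden_layers, seed) is a hypothetical callback returning a
    scalar test error for one training/testing iteration (lower is better).
    """
    medians: Dict[int, float] = {}
    best_layers = 1
    for layers in range(1, max_layers + 1):
        errors = [run_iteration(layers, seed) for seed in range(iterations)]
        # With an odd number of iterations this is the error of the median iteration.
        medians[layers] = statistics.median(errors)
        if layers > 1 and medians[layers] > medians[best_layers]:
            break  # testing performance deteriorated, so the study ceases
        best_layers = layers
    return best_layers, medians
```

With five iterations, `statistics.median` picks the middle error, which plays the role that visual inspection plays in the study.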

Results

Tables 6.13 and 6.14 show the accuracy measures for testing and training.

Table 6.13: The accuracy measures computed on the median test predictions made during the hidden layers study.

Trial Medians - Test Predictions

Hidden Layers   Absolute   Direction   Mean      Information
1                0.78465   0.65217     0.86260   2.86547
2               -0.67044   0.52174     0.99236   6.59434

Table 6.14: The accuracy measures computed on the median training predictions made during the hidden layers study.

Trial Medians - Training Predictions

Hidden Layers   Absolute   Direction   Mean      Information
1               0.97003    0.5         0.24767   1.11511
2               0.98606    0.55333     0.16846   0.78225
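The definitions of these accuracy measures are not restated in this section. As an illustration of how a measure like Direction could be computed, the sketch below assumes it is the fraction of time steps where the predicted move has the same sign as the actual move. It also assumes both moves are measured from the previous actual value. These are assumptions, not the project's stated definition, and `directional_accuracy` is a hypothetical name.

```python
from typing import Sequence


def sign(x: float) -> int:
    """Return +1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def directional_accuracy(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Fraction of steps where the predicted move has the same sign as the actual move.

    Assumed definition: both moves at step t are measured from the actual value at t - 1.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least two points")
    hits = sum(
        1
        for t in range(1, len(actual))
        if sign(actual[t] - actual[t - 1]) == sign(predicted[t] - actual[t - 1])
    )
    return hits / (len(actual) - 1)
```

Under this definition, a value near 0.5 (as in the training medians) means the predicted direction is right about as often as a coin toss.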

Table 6.15: T-tests on the four accuracy measures, to determine whether the differences between the one and two hidden layer results are statistically significant.

T-test Results. The t Stat is for the two-tailed test.

                 Absolute   Direction   Mean       Information
df               7          8           7          5
t Stat           4.25019    1.04257     -4.02977   -5.25950
Significance     0.0028     0.3318      0.0050     0.0033

This study was concluded after only two variations had been tested. The results show that a single hidden layer provided the best test predictions¹⁰. In the two hidden layers trial, four of the five iterations gave negative absolute error results. All five single hidden layer iterations produced better mean and information error figures than the two hidden layers iterations. Table 6.15 shows the results of applying the t-test. Three of the four accuracy measures (Absolute, Mean and Information) showed differences significant at the 99.5% level (p ≤ 0.005). Only the Direction measure (p = 0.3318) did not. The differences in those three measures are therefore unlikely to be due to chance, which supports the reliability of the study's findings.
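The df values in Table 6.15 (7, 8, 7 and 5) are at most the pooled value of 8 for two samples of five. This suggests, though the text does not say so, an unequal-variance (Welch) t-test. A minimal sketch of the t statistic and Welch-Satterthwaite degrees of freedom is given below. The function name is hypothetical. A p-value would additionally need the t-distribution; for example, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both the statistic and the p-value.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    # Squared standard error of each sample mean.
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With five iterations per trial, `a` and `b` would hold the five per-iteration values of one accuracy measure for the one and two hidden layer trials. Spreadsheet tools typically round the resulting df to an integer.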

In the single hidden layer iterations, the consistency of predictions was visibly better.

This could have been predicted. When a network ‘over-fits’ during learning, it replicates a function that generates the target output over a finite section. Many functions can do this, but they will all be extremely complex because the target is so volatile. Because they are complex, these functions can diverge widely from one another outside the fitted section. Simplifying the mapping functions increases generality: simple functions that produce similar outputs over a finite section lack the capacity to diverge dramatically outside it. Figures 6.10 and 6.11 illustrate this explanation, which accounts for the better consistency of test predictions with a single hidden layer.
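The same argument can be reproduced numerically with a toy example. Polynomial curve fitting stands in for the ANN here; this is a swapped-in technique, not the network used in the study. An exact interpolating polynomial plays the role of an over-fitted network, and a least-squares straight line plays the role of a generalising one. The trend, noise level, learning segment and test point are all hypothetical. Each run stands for one iteration with different noise, and the spread of the predictions at a point outside the learning segment measures consistency.

```python
import random
from typing import Callable, Sequence


def lagrange_eval(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the polynomial that passes exactly through every (xs[i], ys[i])."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        basis = 1.0
        for k, xk in enumerate(xs):
            if k != j:
                basis *= (x - xk) / (xj - xk)
        total += yj * basis
    return total


def line_eval(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the least-squares straight line through the points."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return my + slope * (x - mx)


def extrapolation_spread(fit: Callable, runs: int = 5, noise: float = 0.1, x_test: float = 9.0) -> float:
    """Range of the predictions at x_test across runs with different random noise."""
    xs = list(range(8))  # the hypothetical learning segment
    preds = []
    for seed in range(runs):
        rng = random.Random(seed)
        ys = [0.5 * x + rng.gauss(0.0, noise) for x in xs]  # simple trend plus random noise
        preds.append(fit(xs, ys, x_test))
    return max(preds) - min(preds)


print(extrapolation_spread(lagrange_eval), extrapolation_spread(line_eval))
```

Both fits track the learning segment closely. Outside it, the interpolant's predictions scatter far more widely across runs than the line's, which mirrors the contrast between Figures 6.10 and 6.11.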

¹⁰ See Appendix C.1 for the trial charts and performance results.

Figure 6.10: These demonstration charts give a visual explanation of why prediction consistency varies with the number of hidden layers. The black line is the time-series being learnt. The red and green lines are two functions developed to replicate the black line. The rectangular box on the right is the test prediction segment of the functions. In this example over-fitting has occurred, and outside the learning segment the functions produce dramatically different results.

Figure 6.11: In this example chart the network has learnt the training target only ‘generally’. Because the functions are very simple, they diverge less aggressively outside the training segment.

Conclusions

Figure C.1 shows that, during learning, the network with two hidden layers fitted the target more closely than the network with a single hidden layer. The learning error values also demonstrate this. These findings, together with the t-test results shown in Table 6.15, suggest that the hypothesis was correct. However, it is interesting that the network with two hidden layers still showed some generality during learning. This means that ‘over-fitting’ can be quite loose: a network can show some signs of generality and still be seen as ‘over-fitting’ the learning set. It will be interesting to see how the training set is learned after pruning, since the network already appears to fit the training target loosely and pruning should exaggerate this.

A single hidden layer provided the most accurate predictions and higher consistency between iterations of learning. Henceforth, this project will exclusively implement a single hidden layer.

6.5.2 Accuracy and Generalisation Studies: Part 2 - Online vs

In document An Investigation Into Stock Market Predictions Using Neural Networks Applied To Fundamental Financial Data (Page 79-83)