
4.4 Numerical Results

4.4.3 Preparation of the Neural Network and the Training Patterns

The statements in the previous subsection illustrate that the selection of the data plays an important role in deriving an efficient model for forecasting stock or index prices. In general, the following aspects have to be considered for the choice of the data (see also [19]):

1. Do the data satisfy the requirements regarding quantity and quality?

2. Which data are suitable as input and output data with regard to the problem formulation and are added to the neural network?


3. In which form are the data to be presented to the neural network?

Regarding the quantity, it has to be guaranteed that a sufficient number of data sets is available, since the neural network gains its knowledge from these data. The more extensive the data material is, the better the different situations are covered and the less weight outliers in the measured values carry [19]. However, one has to consider that the preprocessing of the data and the learning process become more expensive with increasing data material. Furthermore, the risk increases that contradictory information is presented to the network, which might worsen the learning result [19].

The quality of the data is the basis for the function built by the neural network, i.e. one has to clarify which quantities have a causal influence on the value to be predicted [87]. The choice of the data is certainly influenced by one's conviction regarding the success of technical or fundamental analysis in forecasting stock or index prices (see also the previous subsection). Here the principle “garbage in – garbage out” also holds, i.e. the quality of the statements of the neural network largely depends on the validity and consistency of the data underlying the training process of the network [19]. However, as already mentioned above, it is often not possible to specify concretely which variables influence the examined object. Hence, it is necessary to include many variables in the model to increase the probability that the problem domain is described comprehensively.

The data which crucially determine the examined problem are to be chosen as the input data. The output of the neural network is determined by the desired information [19]. In the case of stock or index price forecasting, the input vectors are the lagged values of a time series and the output is the prediction for the next value [39].

Moreover, regarding the representation of the data, we have to deal with the aspects of scaling and preprocessing. Scaling means that the data supplied to the network are mapped onto an interval that the network can process [19]. The choice of the interval depends on the activation function. In our case, we have chosen the logistic function (see (4.1.1))

\[
\sigma_L(r) = \frac{1}{1 + e^{-kr}}.
\]

For this function, the input and output data have to be scaled onto the interval [0; 1]. We have chosen the narrower interval [0.1; 0.9], since the end points 0 and 1 are not attained by the logistic function. The preprocessing of the data concerns the form in which the information should be made available to the neural network and the form which the output of the neural network should have. In the field of financial forecasting, the preprocessing functions are often derived from technical analysis [78] in order to capture some of the underlying dynamics of the financial markets (see for instance [86], [109]). Regarding the output of the neural network, the preprocessing prescribes whether point predictions are made or whether the network output indicates the trend of the examined time series.
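The scaling step described above can be sketched as follows. This is a minimal illustration, assuming a linear (min-max) mapping onto [0.1; 0.9]; the text does not state which mapping was actually used, and the function names `logistic`, `scale` and `unscale` as well as the default steepness `k = 1` are hypothetical.

```python
import math


def logistic(r, k=1.0):
    """Logistic activation 1 / (1 + exp(-k r)); k = 1 is an assumed default."""
    return 1.0 / (1.0 + math.exp(-k * r))


def scale(values, lo=0.1, hi=0.9):
    """Map values linearly onto [lo, hi] (assumed min-max scaling).

    Returns the scaled values together with the (min, max) bounds that are
    needed later to map a network output back to the original units.
    """
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot scale a constant series")
    factor = (hi - lo) / (v_max - v_min)
    return [lo + (v - v_min) * factor for v in values], (v_min, v_max)


def unscale(scaled_value, bounds, lo=0.1, hi=0.9):
    """Invert scale() for a single value, e.g. a network output."""
    v_min, v_max = bounds
    return v_min + (scaled_value - lo) * (v_max - v_min) / (hi - lo)
```

With this choice, the smallest value of a series is mapped to 0.1 and the largest to 0.9, so every target lies strictly inside the range of the logistic function. A forecast produced by the network would be passed through `unscale` to obtain a value in the original units.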

Construction of the training pattern

Once the input and output data have been chosen, the patterns for the training of the network have to be arranged. In this connection, we stress again how important the choice of the learning patterns is for the attainable efficiency of a neural network. The patterns in their entirety must represent as well as possible the set of all possible situations in the relevant part of the capital market. On the other hand, the information in the training set must not become arbitrarily large, since the capability of learning, i.e. the storage capacity, of a neural network is limited.

Training:

    Pattern   Input                        Output
    1.        (DAX1, DAX2, $1, $2)         (DAX3)
    2.        (DAX2, DAX3, $2, $3)         (DAX4)
    ...       ...                          ...
    10.       (DAX10, DAX11, $10, $11)     (DAX12)

Forecasting DAX13:

    Pattern   Input                        Output
    1.        (DAX11, DAX12, $11, $12)     unknown

FIGURE 4.5: Arrangement of the training and forecasting patterns

We will explain the arrangement of the training and forecasting patterns by the following example (see also Figure 4.5). Suppose we wish to forecast the German stock index DAX for the month 1/1994, i.e. the closing value of the last day of the month 1/1994 (in Figure 4.5 this value corresponds to DAX13). Furthermore, we assume that lagged values of the time series of the DAX itself and the US–Dollar are to be presented to the network, and that these are the values of the last twelve months each, i.e. the DAX and US–Dollar values of the months 1/1993 until 12/1993 (these are the values DAX1 to DAX12 and $1 to $12, respectively, in Figure 4.5). Each of the input vectors is to contain two scaled DAX values as well as two scaled US–Dollar values. Since each neuron in the input layer of the neural network holds only one value, the first layer of the network consists of four neurons. The output of the pattern consists of one scaled DAX value according to the given problem formulation. With the available data, we are now able to construct ten training patterns. The composition of these training patterns is shown in Figure 4.5. This figure shows that the input vector contains the values of the DAX and US–Dollar which directly precede the output value, i.e. if the output value represents, for instance, the DAX value of the month 5/1993, then the input vector of this pattern contains the DAX and US–Dollar values of the months 3/1993 and 4/1993.

After the training of the neural network has terminated, the current vector of input data, i.e. the DAX and US–Dollar values of the months 11/1993 and 12/1993, is presented to the network in the example (this corresponds to the forecasting pattern in Figure 4.5). Then, the output of the neural network represents the prediction of the DAX value for the month 1/1994.
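The arrangement in Figure 4.5 amounts to a sliding window over the two time series. The following sketch illustrates this under the assumption that both series are already scaled and of equal length; the function name `build_patterns` and its parameters are hypothetical and not taken from the text.

```python
def build_patterns(target, exogenous, lags=2):
    """Arrange training patterns and the forecasting input as in Figure 4.5.

    target    -- scaled values of the series to forecast (e.g. DAX1..DAX12)
    exogenous -- scaled values of the second series (e.g. $1..$12)
    lags      -- number of lagged values per series in each input vector
    """
    if len(target) != len(exogenous):
        raise ValueError("both series must cover the same months")
    if len(target) <= lags:
        raise ValueError("not enough observations for a single pattern")

    training = []
    for t in range(lags, len(target)):
        # Input: the `lags` values of each series directly preceding month t.
        inputs = tuple(target[t - lags:t]) + tuple(exogenous[t - lags:t])
        training.append((inputs, target[t]))

    # Forecasting pattern: the most recent values; its output is unknown.
    forecast_input = tuple(target[-lags:]) + tuple(exogenous[-lags:])
    return training, forecast_input
```

With twelve monthly values per series and `lags=2`, the function returns ten training patterns and one forecasting input, in agreement with the example above.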


Note that the number of values in the input vector which belong to the same time series and the number of training patterns determine the period of observations that is available for the learning process.

The design of the neural network

The question of how many layers, and how many units in each of the layers, yield the best forecast is mathematically unsolved. It turned out that different results were achieved when the numbers of layers and of units in each layer were varied; however, a deterministic method for the choice of the optimal network is not known so far. In principle, neural networks with only one layer of hidden neurons are already able to approximate any structure contained in a data set [78].

Regarding the number of neurons in each of the layers, we note that the numbers of neurons in the input and output layers are already determined by the decision which variables are to be presented to the network and by the number of forecasts to be made with just one network, respectively. To determine the necessary number of units in the hidden layer, one usually starts with a network whose dimension is too large for the given problem and subsequently removes superfluous units.

Thus, an optimal configuration of a neural network can only be found in an extensive trial-and-error process.
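To make the architecture concrete, the following sketch evaluates a network with one hidden layer and logistic activations. It rests on assumptions not stated in the text: every unit has a bias term, the output unit also uses the logistic function (which is consistent with scaling the targets onto [0.1; 0.9]), and the names `forward`, `hidden_weights`, `hidden_biases`, `output_weights` and `output_bias` are hypothetical.

```python
import math


def logistic(r, k=1.0):
    """Logistic activation 1 / (1 + exp(-k r))."""
    return 1.0 / (1.0 + math.exp(-k * r))


def forward(inputs, hidden_weights, hidden_biases,
            output_weights, output_bias, k=1.0):
    """Output of a network with one hidden layer and a single output unit.

    hidden_weights -- one row of input weights per hidden unit
    hidden_biases  -- one bias per hidden unit (assumed to be present)
    output_weights -- one weight per hidden unit
    """
    hidden = [
        logistic(sum(w * x for w, x in zip(row, inputs)) + b, k)
        for row, b in zip(hidden_weights, hidden_biases)
    ]
    return logistic(
        sum(w * h for w, h in zip(output_weights, hidden)) + output_bias, k
    )
```

In the example of Figure 4.5, `inputs` would hold four scaled values, so each row of `hidden_weights` has four entries. Removing a superfluous hidden unit corresponds to deleting one row of `hidden_weights` together with the matching entries of `hidden_biases` and `output_weights`.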

Overfitting

After the training patterns and the neural network architecture are fixed, the training process of the neural network, i.e. the adjustment of the weights in the neural network, is started.

Now, if the training process is carried out until the error between the outputs of the neural network and the target outputs in the training patterns is minimized, then the neural network strives to map the presented data perfectly. However, the structure extracted in this way cannot be used for generalization. Figure 4.6 illustrates that a perfect fit of the training patterns generally leads to poor forecasts. This is the case because time series contain not only deterministic behavior but also a certain portion of noise that should not be modelled by the network. This problem of fitting the noise in addition to the signal is called overfitting [39].

Moreover, the problem of overfitting is aggravated when a large number of parameters faces a small number of training data. In problems like forecasting stock or index prices, there are often only a few training data available compared to the number of network parameters [109], especially when the adaptation of the weights is carried out on the basis of monthly data [44]. This enables the network to fit the training data arbitrarily closely without developing an ability to generalize.
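The imbalance between parameters and training data can be made concrete by counting the weights of a network with one hidden layer. The sketch below assumes a fully connected network with an optional bias per unit; the function name `count_weights` and the choice of three hidden units in the usage note are illustrative assumptions, not values from the text.

```python
def count_weights(n_inputs, n_hidden, n_outputs=1, with_bias=True):
    """Number of adjustable weights in a fully connected network
    with one hidden layer (bias terms are an assumption)."""
    bias = 1 if with_bias else 0
    return (n_inputs + bias) * n_hidden + (n_hidden + bias) * n_outputs
```

For the four input neurons of the example in Figure 4.5 and, say, three hidden units, `count_weights(4, 3)` gives (4 + 1) · 3 + (3 + 1) · 1 = 19 weights, compared with only ten training patterns.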

In forecasting, it is less important how well a model fits the training data. Rather, the aim of the training process is to achieve an optimal ability of the neural network to generalize, i.e. the network should extract regularities from the training examples that transfer to new examples [19].


FIGURE 4.6: Training and forecasting period of a neural network [figure not reproduced; labels: training, forecasting, NN, DAX]

FIGURE 4.7: Error on the training set and the test set [plot not reproduced; horizontal axis: iterations, 0 to 1400; vertical axis: 0 to 0.14; curves: training, test]

Stopping criterion for the training process

One of the best-known techniques for attacking the overfitting problem [78] and achieving the ability of the neural network to generalize is the early stopping procedure, also called the Stopped–Training method.

In this method, the learning process is stopped at the point at which the ability of the network to generalize begins to decrease. To detect this point, the concept of cross validation has proven successful. The idea of this concept is to divide the available patterns of input/output data into two disjoint sets (see e.g. Zimmermann [109]):

1. The set of training data serves for the actual learning process, i.e. the adaptation of the unknown weights.

2. The set of test data is used after each iteration of the learning algorithm to compute the error generated by the network.

The plot in Figure 4.7 of the error in predicting points from the training set and the error in predicting points from the test set shows that the former decreases monotonically during the complete training process, see the line in Figure 4.7 labelled “training”. By contrast, the error on the test set initially decreases as the network starts to learn the interactions of the input data, but then begins to increase once the network starts to learn the noise, see the line in Figure 4.7 labelled “test”. The location of the minimum of this error determines when the effective network complexity is right [39], i.e. this state of the neural network represents the best possible weights and accordingly should be saved and documented.
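The stopping rule can be sketched as a generic training loop that records the weights with the smallest test error. This is a minimal illustration and not the implementation used for the results in this chapter: the callables `update` (one iteration of the learning algorithm on the training set) and `test_error` (error on the test set) are hypothetical placeholders, and the `patience` parameter, which ends training once the test error has not improved for a given number of iterations, is an added assumption.

```python
import copy


def train_with_early_stopping(weights, update, test_error,
                              max_iterations=1000, patience=50):
    """Run the learning algorithm and keep the weights with minimal test error.

    weights    -- initial weights (any object that copy.deepcopy can copy)
    update     -- callable: weights -> weights after one training iteration
    test_error -- callable: weights -> error on the test set
    patience   -- iterations without improvement after which training stops
    """
    best_weights = copy.deepcopy(weights)
    best_error = test_error(weights)
    best_iteration = 0

    for iteration in range(1, max_iterations + 1):
        weights = update(weights)
        error = test_error(weights)
        if error < best_error:
            # New minimum of the test error: save this state of the network.
            best_error = error
            best_weights = copy.deepcopy(weights)
            best_iteration = iteration
        elif iteration - best_iteration >= patience:
            # The test error has stopped improving: end the training.
            break

    return best_weights, best_error, best_iteration
```

The returned iteration corresponds to the minimum of the “test” curve in Figure 4.7. Since a single test-error curve may fluctuate, the loop does not stop at the first increase but waits `patience` iterations before giving up.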