2.3 Technology Methods
2.3.6 Artificial neural networks
Artificial neural networks (ANN) are computational structures inspired by the neural networks of biological brains. The original perceptron model was developed by Rosenblatt in 1958 [122]. The idea was picked up again in 1986, when David Rumelhart, Geoffrey Hinton and Ronald Williams published "Learning Internal Representations by Error Propagation" [123]. They proposed the multilayer neural network with non-linear but differentiable transfer functions and reasonably effective training algorithms. In this paper the term ANN refers to the general technique and includes all the different types discussed and used in this project.
Artificial neural networks consist of a set of interconnected neurons that interact with each other by sending variable strength activation signals. Each connection has an associated weight, i.e. parameter, which modulates the activation signal.
The activation signals from all connections to a neuron are combined by a transfer function. This function is typically just the summation of the individual weighted activation signals. The transfer function's output is passed to an activation function, which fires if the output surpasses a threshold. Some implementations use an S-shaped activation function, such as a sigmoid function, instead of binary activation. The resulting activation function output is then passed along the neuron's outgoing connections.
Figure 2.14: A feedforward neural network with a single hidden layer. Inputs enter the input layer and pass through weighted connections to the hidden layer and then to the output layer.
Figure 2.15: A neuron sums multiple weighted inputs from other neurons (the transfer function) and applies an activation function to the net weighted input. If the net weighted input exceeds the activation threshold, the activation function fires a signal known as the activation output.
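The neuron model described above can be illustrated with a minimal Python sketch. The function names, input values and weights below are illustrative assumptions and are not taken from the system described in this paper.

```python
import math

def net_input(inputs, weights):
    """Transfer function: sum of the weighted activation signals."""
    return sum(x * w for x, w in zip(inputs, weights))

def threshold_activation(net, threshold=0.0):
    """Binary activation: fire (1.0) only if the net input exceeds the threshold."""
    return 1.0 if net > threshold else 0.0

def sigmoid_activation(net):
    """S-shaped alternative to the binary threshold."""
    return 1.0 / (1.0 + math.exp(-net))

inputs = [1.0, 0.5, -1.0]   # illustrative activation signals (assumed values)
weights = [0.4, 0.6, 0.2]   # illustrative connection weights (assumed values)
net = net_input(inputs, weights)  # 0.4 + 0.3 - 0.2 = 0.5

fired = threshold_activation(net, threshold=0.3)  # net exceeds 0.3, so the neuron fires
graded = sigmoid_activation(net)                  # graded output between 0 and 1
```

The threshold variant gives an all-or-nothing signal, while the sigmoid variant gives a graded and differentiable one, which matters for the gradient based training methods discussed later.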
ANN Types
Here the different ANNs relevant for this paper are presented. This is a small subset of the ANNs that exist, chosen for their relevance in this domain. There are four ANN types in use in this project, and they fall into two categories. There are the common traditional multilayer perceptron neural networks (MLPNN), of which the feedforward neural network and the recurrent neural network are members. Then there are the more advanced radial basis function (RBF) networks, of which the probabilistic neural network (PNN) and the general regression neural network (GRNN) are used in this project. For an overview of these and their relationship, see figure 2.16.
Feedforward Networks. Feedforward neural networks (FNN) are thought of as the standard ANNs. They are connected like a directed acyclic graph, with an additional restriction: every neuron belongs to exactly one layer, and the neurons in one layer can only connect to one other layer, creating a chain of layers. For an example with one hidden layer, see figure 2.14.
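As a minimal sketch of the chain-of-layers structure, the following Python code propagates a signal through a small feedforward network. The 3-2-1 layout, the weight values and the use of a bias term per neuron are assumptions made for illustration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: each row of `weights` holds the incoming weights of one neuron."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def feedforward(inputs, layers):
    """Propagate a signal through a chain of layers, one after the other."""
    signal = inputs
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal

# Hypothetical 3-2-1 network: three inputs, two hidden neurons, one output neuron.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
output = feedforward([1.0, 0.0, 1.0], layers)
```

Because each layer only reads the output of the layer before it, the whole network is evaluated in a single pass with no cycles.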
Recurrent Neural Networks. Recurrent neural networks (RNN) are similar to multilayer feedforward networks with the addition of cycles between layers. There are several types depending on where the cycles are, but the common purpose is to provide memory and attractor dynamics in the network [150]. One of the drawbacks of recurrent networks is that they tend to have complex and often obscure behavior and are considered harder to train and understand. Jordan [88] was the first to create such networks, but Elman's [54] adaptation is more commonly used.
Figure 2.16: The hierarchy of the ANNs used in this system. ANN divides into MLPNN and RBF networks; MLPNN comprises FNN and RNN, and RBF comprises GRNN and PNN.
The main difference between feedforward networks and RNNs is that the latter naturally take previous signals into account and can thus achieve memory and a "sense of time". This makes RNNs well suited to discovering temporal patterns [21; 71; 72; 86]. In an Elman network the first hidden layer feeds its output to a context layer as well as to the next layer. This context is then fed back to the input as part of the next datapoint, see figure 2.17.
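The context mechanism can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: a single hidden layer, a tanh activation, a linear output layer and arbitrary weight values, none of which are taken from the system described here. Adding the context contribution to the hidden layer's net input is equivalent to appending the context to the input vector.

```python
import math

def elman_step(x, context, w_in, w_ctx, w_out):
    """One time step of a simple Elman network: returns (output, new_context)."""
    hidden = []
    for j in range(len(w_in)):
        net = sum(w * xi for w, xi in zip(w_in[j], x))      # current datapoint
        net += sum(w * c for w, c in zip(w_ctx[j], context))  # fed-back context
        hidden.append(math.tanh(net))
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    # The context layer stores a copy of the hidden activations for the next step.
    return output, hidden[:]

def run_sequence(xs, w_in, w_ctx, w_out):
    context = [0.0] * len(w_in)  # empty memory before the first datapoint
    outputs = []
    for x in xs:
        out, context = elman_step(x, context, w_in, w_ctx, w_out)
        outputs.append(out)
    return outputs

# Hypothetical weights: one input, two hidden neurons, one output.
w_in = [[0.5], [-0.3]]
w_ctx = [[0.1, 0.2], [0.0, 0.4]]
w_out = [[1.0, -1.0]]
outputs = run_sequence([[1.0], [1.0]], w_in, w_ctx, w_out)
```

Feeding the same datapoint twice gives two different outputs, because the second step also sees the context left behind by the first. This is the "sense of time" that a pure feedforward network lacks.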
RBF Networks. Both general regression neural networks (GRNN) and probabilistic neural networks (PNN) are branches of radial basis function (RBF) networks [109].
RBF networks have been shown to have higher computational precision as well as a faster convergence rate than backpropagation networks [110]. The GRNN and PNN models were proposed by Specht and are capable of estimating any function of historical data [136]. The two have basically the same architecture. Probabilistic networks perform classification, where the target variable is categorical, whereas general regression neural networks perform regression, where the target variable is continuous. The GRNN has four layers, see figure 2.18. GRNNs are used in some papers for time series prediction with good results [98; 152]. PNNs have also shown good results on time series prediction [124].
Figure 2.17: The architecture of an Elman recurrent neural network. It has an input layer, a hidden layer and an output layer connected by weights, plus a context layer.
In the input layer, as with normal ANNs, each neuron corresponds to one predictor variable. The hidden layer has one neuron for each case in the training dataset and applies an RBF to the Euclidean distance between the current case and the neuron's center. For GRNNs the next layer, the summation layer, has only two neurons. The denominator neuron adds up the weight values from all hidden layer neurons. The numerator neuron adds up the weight values multiplied by the actual target value for each hidden neuron. In the last layer the lone decision neuron divides the numerator by the denominator to find the predicted target value.
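The four GRNN layers map directly onto a short computation, sketched below in Python. The Gaussian kernel, the smoothing parameter `sigma` and the toy training data are assumptions chosen for illustration; the text above only specifies that an RBF is applied to the Euclidean distance.

```python
import math

def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Predict a continuous target for case x from the stored training cases."""
    numerator = 0.0
    denominator = 0.0
    # Hidden layer: one neuron per training case, centered on that case.
    for center, target in zip(train_x, train_y):
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, center))
        weight = math.exp(-sq_dist / (2.0 * sigma ** 2))  # Gaussian RBF (assumed kernel)
        # Summation layer: the two running sums.
        denominator += weight
        numerator += weight * target
    # Decision neuron: ratio of the two sums.
    return numerator / denominator

# Toy training set (assumed values): one predictor variable, three cases.
train_x = [[0.0], [1.0], [2.0]]
train_y = [0.0, 1.0, 4.0]
prediction = grnn_predict([1.0], train_x, train_y, sigma=0.5)
```

The prediction is a distance-weighted average of the training targets, so it always lies between the smallest and largest target. A small `sigma` makes the network follow the nearest training case closely, while a large `sigma` smooths across many cases. This sketch does not guard against a zero denominator, which can occur through numerical underflow when a case lies very far from every center.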
All the ANN types mentioned here are in use as time series predictors in trading systems. The MLPNNs are also used as portfolio selectors by mapping the portfolio positions to the output neurons.
Figure 2.18: The architecture of a General Regression Neural Network, consisting of an input layer, a hidden layer, a summation layer and a decision node.
Training the Neural Network
The methods used to train the system are as important as the choice of predictors themselves. Both prediction and portfolio agents need training to be effective. Because their structures differ, some need different learning methods. However, some learning methods, such as evolution, are universally applicable to meta-parameters and may even be used for an initial pass over the learning parameters. For low-level parameter optimization in neural networks, variants of backpropagation can be used. For parameter search, few techniques can compete with the ease and effectiveness of evolution. Evolution performs well at broad search, especially where the search space is complex. Evolution is, however, weak at fine tuning. This is where backpropagation shines, which makes the two techniques ideal for combination into a hybrid solution. Many studies have shown that this hybrid approach performs well [29; 90; 144; 149; 157]. It is both well tested and flexible, and thus a worthy approach to apply. There are, however, several other advanced techniques that could be considered. Alternatives to the initial technique will be considered if the hybrid model does not deliver.
Training a neural network involves adjusting its weights (and thresholds). This can be done with regard to some metric of performance. Typically this is done by minimizing the error between a target output, i.e., a label, and the predicted output. In practice this is usually done with a specialized gradient descent algorithm such as backpropagation. Since gradient descent is a greedy algorithm, backpropagation suffers from the common problems associated with this technique. It may get stuck in a local optimum of a multimodal solution space, and it needs a differentiable error function [153].
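As a minimal sketch of error minimization by gradient descent, the following Python code trains a single sigmoid neuron on the logical OR function. This is the single-layer special case of backpropagation, not a full multilayer implementation, and the learning rate, epoch count, bias term and squared error measure are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, epochs=2000, rate=0.5):
    """Gradient descent on the squared error of one sigmoid neuron."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = predict(weights, bias, inputs)
            # Derivative of E = 0.5 * (out - target)^2 with respect to the net input;
            # out * (1 - out) is the derivative of the sigmoid.
            delta = (out - target) * out * (1.0 - out)
            weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
            bias -= rate * delta
    return weights, bias

# Logical OR as a toy labelled dataset: (inputs, target output).
samples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
weights, bias = train_neuron(samples)
```

The update rule only works because the sigmoid is differentiable. A binary threshold activation has no usable gradient, which illustrates the requirement for a differentiable error function noted above.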
Aside from backpropagation there are conjugate gradient algorithms, also based on gradient descent, with much the same advantages and drawbacks. To overcome the disadvantages of the gradient descent approach, evolution can instead be used on the connection weights. Evolution can effectively find a near optimal set of weights without computing gradient information. A disadvantage of evolution is that it cannot guarantee an optimal solution and often has problems when it comes down to fine-tuned search. Thus a hybrid solution between evolution and a gradient descent based approach is common, and shows promising results [153]. Evolution does the rough search and may then hand over to gradient descent for the last part of the search.
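The division of labour in the hybrid approach can be sketched as follows. The `error` function is a hypothetical stand-in for a network's training error (a simple quadratic bowl, not a multimodal surface), and the population size, mutation strength and learning rate are likewise assumptions.

```python
import random

def error(weights):
    """Hypothetical error surface standing in for a network's training error."""
    return sum((w - 1.0) ** 2 for w in weights)

def gradient(weights):
    """Analytic gradient of the stand-in error function."""
    return [2.0 * (w - 1.0) for w in weights]

def evolve(rng, n_weights=3, pop_size=20, generations=30, sigma=0.5):
    """Rough search: keep the better half, refill with mutated copies."""
    population = [[rng.uniform(-5.0, 5.0) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=error)
        parents = population[: pop_size // 2]
        children = [[w + rng.gauss(0.0, sigma) for w in p] for p in parents]
        population = parents + children
    return min(population, key=error)

def fine_tune(weights, steps=100, rate=0.1):
    """Fine search: plain gradient descent from the evolved starting point."""
    for _ in range(steps):
        weights = [w - rate * g for w, g in zip(weights, gradient(weights))]
    return weights

rng = random.Random(0)      # fixed seed so the sketch is repeatable
rough = evolve(rng)         # evolution uses only error values, no gradients
tuned = fine_tune(rough)    # gradient descent finishes the search
```

Evolution only ever compares error values, so it would work equally well on a non-differentiable error measure. The gradient step is what drives the remaining error towards zero.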
Another problem for classical backpropagation is the degree to which weights are adjusted. Resilient propagation is a modified version of backpropagation that deals with this issue. Contrary to gradient descent, resilient propagation only uses the sign of the gradient, not its magnitude. Resilient propagation keeps an individual delta value for each weight and uses these when the weights are to be adjusted. The sign of the gradient, compared with that of the previous step, is then used to determine how much the deltas should be adjusted [84; 120].
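A minimal sketch of a resilient propagation update is given below. It follows one common variant (often called iRprop-, which skips the weight update directly after a sign change); the growth and shrink factors 1.2 and 0.5 and the delta bounds are conventional defaults assumed here, and the one-dimensional error function is purely illustrative.

```python
def sign(x):
    return (x > 0) - (x < 0)

def rprop_step(weights, grads, prev_grads, deltas,
               eta_plus=1.2, eta_minus=0.5, delta_min=1e-6, delta_max=50.0):
    """One Rprop update; returns (new_weights, new_deltas, grads_to_remember)."""
    new_weights, new_deltas, kept_grads = [], [], []
    for w, g, pg, d in zip(weights, grads, prev_grads, deltas):
        if g * pg > 0:
            # Same sign as last step: the direction is stable, so grow the delta.
            d = min(d * eta_plus, delta_max)
        elif g * pg < 0:
            # Sign flipped: the last step jumped over a minimum, so shrink the delta
            # and skip this update (assumed iRprop- behaviour).
            d = max(d * eta_minus, delta_min)
            g = 0.0
        new_weights.append(w - sign(g) * d)  # only the sign of the gradient is used
        new_deltas.append(d)
        kept_grads.append(g)
    return new_weights, new_deltas, kept_grads

# Illustrative problem: minimise (w - 3)^2, whose gradient is 2 * (w - 3).
weights, deltas, prev = [0.0], [0.1], [0.0]
for _ in range(100):
    grads = [2.0 * (w - 3.0) for w in weights]
    weights, deltas, prev = rprop_step(weights, grads, prev, deltas)
```

Because the step length comes from the per-weight delta rather than from the gradient magnitude, very flat or very steep regions of the error surface do not by themselves lead to tiny or exploding weight changes.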
ANNs are very prone to overfitting, as they will continue to adjust until they perfectly fit any input data [155]. To avoid this problem it is important to stop the training at the point where the generalization ability starts to fall. It is also important to have a large set of data relative to the size of the network.
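One common way to detect the point where generalization starts to fall is to monitor the error on a held-out validation set and stop once it has not improved for a number of epochs. The sketch below assumes such a validation set exists; the `patience` parameter and the error values are hypothetical.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return the index of the epoch whose weights should be kept."""
    best_epoch = 0
    best_error = float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_error, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical validation errors per epoch: improving, then degrading.
errors = [0.9, 0.6, 0.45, 0.40, 0.42, 0.43, 0.47, 0.30]
stop_at = early_stopping_epoch(errors)
```

With a patience of three epochs, training stops before the last value is ever computed, and the weights from the epoch with the lowest validation error seen so far are kept. A larger patience tolerates longer plateaus at the cost of more training time.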
Evolving ANN Architecture
An ANN's architecture comprises the topology and the transfer functions of the network. This includes how many layers are used, how many nodes each layer has and how the neurons are interconnected. The architecture of a network is crucial for its information processing capabilities [153]. Traditionally the architecture design has been a human expert's job.
When presented with a large solution space and no known algorithm to easily create an optimal architecture, it is common to use trial and error. Some research has been done toward automatic design by constructive or destructive algorithms [135]. Angeline et al. [15] point out the weakness of such approaches: they tend to end up at local optima, and they only investigate a restricted topological subset rather than the complete architectural search space [153]. More recent research on automatic architecture generation focuses more on genetic algorithms.
These are less computationally intense, cover a large search space and are less prone to over-design than design by human experts [34]. The search space for a suitable ANN architecture is infinitely large, non-differentiable, complex, deceptive and multimodal [153]. All these characteristics make it hard or impossible to use traditional algorithms like gradient descent. This gives evolutionary algorithms a better chance of finding good solutions to the architecture problem.
Miller et al. show that evolution is well suited for finding neural network architectures, because evolution is adept at search spaces of this type [107]:
• The search surface is infinitely large, as one can always add more neurons.
• The search surface is nondifferentiable, as there is a discrete set of neurons.
• The search surface is noisy, since the mapping between architecture and performance is indirect.
• The search is deceptive, as small changes in architecture can make large differences in performance.
• The search surface is multimodal, since different architectures may have similar performance.
Yao shows in his review of evolutionary ANNs (EANN) that evolution has excellent performance on finding good architectures for ANNs. Furthermore, he recommends evolving the weights in the same process [153]. Backpropagation is the quintessential training method for ANNs, and a combination of these two approaches in hybrid form has, as mentioned, been successfully validated in many studies [29; 90; 144; 149; 157].
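To illustrate what evolving an architecture can look like in its simplest form, the following Python sketch evolves a list of hidden layer sizes. Everything here is a hypothetical illustration: the genome encoding, the mutation operators and especially the `fitness` function, which is an arbitrary stand-in. A real system would build, train and validate a network for each genome.

```python
import random

def mutate(genome, rng):
    """Genome = list of hidden-layer sizes. Apply one random structural change."""
    genome = list(genome)
    choice = rng.choice(["grow", "shrink", "add_layer", "remove_layer"])
    i = rng.randrange(len(genome))
    if choice == "grow":
        genome[i] += 1                      # add a neuron to layer i
    elif choice == "shrink" and genome[i] > 1:
        genome[i] -= 1                      # remove a neuron from layer i
    elif choice == "add_layer":
        genome.insert(i, 1)                 # insert a new one-neuron layer
    elif choice == "remove_layer" and len(genome) > 1:
        del genome[i]                       # drop layer i
    return genome

def fitness(genome):
    """Hypothetical stand-in: favours about 12 neurons in few layers."""
    return -abs(sum(genome) - 12) - 0.5 * len(genome)

def evolve_architecture(rng, generations=50, pop_size=10):
    population = [[1] for _ in range(pop_size)]  # start from the smallest network
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]   # elitist selection
        population = survivors + [mutate(g, rng) for g in survivors]
    return max(population, key=fitness)

best = evolve_architecture(random.Random(1))  # fixed seed for repeatability
```

The search only needs to compare fitness values of discrete structures, so none of the listed properties of the search surface (infinite size, nondifferentiability, noise) prevents it from running.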