
Chapter 7 Training Heuristics and their Implementation 7. 1 8

learnt, this network with learnt weights can be used as the starting point for training on the whole task. For example, a network might be trained to find all edges in images. This pre-trained network would then be used as a starting point for learning to find the edges of apples in images of apple trees.

The error surface for the main task is not modified by pre-training. The effect of pre-training is to start the network in a different position in error space. This may have two non-exclusive benefits: it may help the network to avoid becoming trapped in local minima, and it may provide faster convergence because the network starts in a position which is well within a basin of attraction of the optimal solution.
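The idea that pre-training changes only the starting position, not the error surface, can be illustrated with a minimal sketch. It assumes a single linear unit trained with the delta rule, and the two tasks (`sub_task` and `main_task`) are hypothetical examples chosen for illustration; they are not taken from the experiments in this thesis.

```python
def mean_squared_error(w, b, data):
    """Error of a single linear unit y = w*x + b on (x, target) pairs."""
    return sum((t - (w * x + b)) ** 2 for x, t in data) / len(data)


def train(w, b, data, rate=0.1, epochs=200):
    """Online delta-rule training, starting from the weights passed in."""
    for _ in range(epochs):
        for x, t in data:
            err = t - (w * x + b)
            w += rate * err * x
            b += rate * err
    return w, b


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
sub_task = [(x, 2.0 * x) for x in xs]          # hypothetical simpler task
main_task = [(x, 2.0 * x + 0.5) for x in xs]   # hypothetical main task

# Pre-training: learn the sub-task from an arbitrary (zero) starting point.
w_pre, b_pre = train(0.0, 0.0, sub_task)

# The main-task error function is the same in both cases;
# only the point at which it is first evaluated differs.
error_cold_start = mean_squared_error(0.0, 0.0, main_task)
error_warm_start = mean_squared_error(w_pre, b_pre, main_task)

# Training on the whole task continues from the pre-trained weights.
w_final, b_final = train(w_pre, b_pre, main_task)
```

In this toy case the pre-trained weights already lie close to the main-task solution, so `error_warm_start` is lower than `error_cold_start`. With real networks the benefit depends on how the sub-task relates to the main task.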

This method is most appropriate for an implementation of some of the other heuristics, such as simpler similar tasks. It is perhaps surprising that it works at all for general sub-tasks, as the same network is trained to perform one task and training is then switched to make it perform a different task. The success of this method will clearly depend on the nature of the sub-task. Even when the sub-task and the main task appear quite dissimilar on the surface, the pre-training task may bias the network to start the search for a solution in a more propitious region of weight space. Examples of the use of this method are described in chapter eight.

A dual input network is simply a network in which a second set of inputs is provided. When learning a sub-task the additional inputs allow the sub-task to be handled either by an additional sub-network or by an algorithm. The outputs of the unit performing the sub-task are connected as additional inputs to the main network. A NNWF with an algorithmic or subnet sub-task is shown diagrammatically above.

This method of implementation is similar to that used when coding the inputs to a network, but it provides both the original and the coded inputs. This can be helpful if it is only one aspect of the information available in the inputs which is being coded by the sub-task. For example, a particular task may have two inputs x and y. If the main task involves the product of x and y but also uses the two terms individually, then a dual input network which accepts an xy term in addition to the original x and y inputs may be appropriate. Examples of the use of this method are given in the following chapter.
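The x, y and xy example can be sketched as follows. This is an illustrative sketch only: the function names, the weights and the target function are hypothetical, and the sub-task is handled by an algorithm (a multiplication) rather than by a sub-network.

```python
def product_subtask(x, y):
    """Algorithmic sub-task: code the pair (x, y) as their product."""
    return x * y


def dual_input_vector(x, y):
    """Original inputs followed by the coded input from the sub-task."""
    return [x, y, product_subtask(x, y)]


def linear_unit(weights, bias, inputs):
    """A single linear unit standing in for the main network."""
    return bias + sum(w * v for w, v in zip(weights, inputs))


# Hypothetical main task: t = 2x - y + 3xy. It uses x and y individually
# as well as their product, so with the extra xy input a single linear
# unit is enough to represent it.
weights, bias = [2.0, -1.0, 3.0], 0.0
samples = [(0.5, 2.0), (-1.0, 4.0), (3.0, 0.25)]

outputs = [linear_unit(weights, bias, dual_input_vector(x, y)) for x, y in samples]
targets = [2.0 * x - y + 3.0 * x * y for x, y in samples]
```

Without the xy input, the same linear unit could not represent this target, which is the sense in which the coded input assists the main task.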

7.4.1.5 M4 Dual

The main network is configured to provide a supplementary output which is only used during training. A sub-task may be used to define the target for this supplementary output. By forcing the network to perform the sub-task in addition to the main task, the learning of the main task may be improved. It is hoped that the additional error terms introduced by the second output will influence the learned function of hidden units in such a way as to assist in the performance of the main task. This method of training using dual outputs is shown in the diagram above.
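The way the supplementary output contributes additional error terms to the hidden units can be sketched as below. The sketch assumes sigmoid units, a squared-error measure and the sign convention delta = (target - output) * f'; the weights, targets and hidden activations are hypothetical values used only for illustration.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def output_delta(target, out):
    """Delta for a sigmoid output unit under squared error (assumed convention)."""
    return (target - out) * out * (1.0 - out)


def hidden_deltas(hidden, w_main, w_supp, d_main, d_supp):
    """Each hidden delta sums the error fed back from both outputs."""
    return [(wm * d_main + ws * d_supp) * h * (1.0 - h)
            for h, wm, ws in zip(hidden, w_main, w_supp)]


hidden = [sigmoid(0.3), sigmoid(-0.8)]   # outputs of two hidden units
w_main = [0.5, -0.4]                     # hidden -> main output
w_supp = [0.7, 0.2]                      # hidden -> supplementary output

out_main = sigmoid(sum(w * h for w, h in zip(w_main, hidden)))
out_supp = sigmoid(sum(w * h for w, h in zip(w_supp, hidden)))

d_main = output_delta(1.0, out_main)     # main-task target
d_supp = output_delta(0.0, out_supp)     # target defined by the sub-task

with_supplementary = hidden_deltas(hidden, w_main, w_supp, d_main, d_supp)
main_only = hidden_deltas(hidden, w_main, w_supp, d_main, 0.0)
```

The difference between `with_supplementary` and `main_only` is the extra error term from the second output. After training the supplementary output can simply be ignored, since it has no effect on the forward pass to the main output.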

7.4.1.6 M5 Embedded Sub-Network

This implementation method requires provision of a sub-network which is first trained and then incorporated into the main network. This is shown diagrammatically above for a NNWF. The sub-network must be allocated its own targets during the pre-training. This method can be considered as a generalisation of method M3 which uses a sub-network but requires that it be at the front of the main network.

A sub-network may either process inputs directly or may process the outputs from a number of other sub-nets or sub-tasks. When the sub-network is incorporated into the main network, its weights can be frozen, allowed to adapt slowly, or treated in the same way as the other network weights. One advantage of using a sub-network for a sub-task is that errors can be back-propagated through a sub-network in the same way as they are back-propagated through other sections of the main network. Use of the BP algorithm does not require that all weights be modifiable, only that the delta term of equation 2.7 can be calculated. Each neuron has a delta term which represents the portion of error which has been back-propagated to the neuron and back through its squashing function. The particular package used for the experimentation described in this thesis was Aspirin/Migraines [Leighton 1991]. This did not allow implementation of the second option, a sub-network with slower weight changes than the main network; therefore this aspect of embedded sub-networks has not been investigated. A sub-network with either fixed weights or weights which change at the same rate as the others in the main network was implemented using Aspirin/Migraines. An example is given in the next chapter [§8.2.1].
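The point that back-propagation needs the delta terms but not modifiable weights can be sketched with a chain of three sigmoid units, in which the middle weight stands in for an embedded sub-network. This is an illustrative sketch and not the Aspirin/Migraines implementation: the function `bp_step`, the per-weight `rates` dictionary and all numerical values are assumptions, and the delta convention (target - output) * f' is assumed as well.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def bp_step(weights, rates, x, target):
    """One back-propagation step on a chain x -> u -> s -> o of sigmoid units.

    weights: {'a': input -> u, 'v': u -> s (the sub-network), 'w': s -> o}
    rates:   per-weight learning rates; a rate of 0.0 freezes that weight.
    """
    u = sigmoid(weights['a'] * x)
    s = sigmoid(weights['v'] * u)
    o = sigmoid(weights['w'] * s)
    # Delta terms are calculated for every unit, frozen or not, so the
    # error still reaches the weights in front of the sub-network.
    delta_o = (target - o) * o * (1.0 - o)
    delta_s = weights['w'] * delta_o * s * (1.0 - s)
    delta_u = weights['v'] * delta_s * u * (1.0 - u)
    changes = {'a': delta_u * x, 'v': delta_s * u, 'w': delta_o * s}
    return {name: weights[name] + rates[name] * changes[name] for name in weights}


before = {'a': 0.4, 'v': -0.6, 'w': 0.9}

# Sub-network weight frozen: 'v' is unchanged, 'a' and 'w' still adapt.
frozen = bp_step(before, {'a': 0.5, 'v': 0.0, 'w': 0.5}, x=1.0, target=1.0)

# Sub-network weight adapting slowly (a smaller rate than the main network).
slow = bp_step(before, {'a': 0.5, 'v': 0.05, 'w': 0.5}, x=1.0, target=1.0)
```

Treating the sub-network in the same way as the other weights corresponds to giving `'v'` the same rate as `'a'` and `'w'`.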

The inputs for the sub-network may be taken either from a scanned window or, if non-image operations are being learnt, from a file. In a similar way, the targets for the sub-network may be taken from a target image or from a file. For example, if the sub-task being learnt is the logical AND operation, the inputs and targets would initially have to be provided from a file. Later, when the sub-network is incorporated into the main network, the inputs would come from the scanned window, or possibly from other sub-networks which had already processed the window inputs.

When this method is used to implement the pre-learning of a useful sub-task, a sub-network is trained to perform the sub-task. The advantage of using a sub-network is that it can continue to adapt after incorporation into the final network. If the sub-task can only be defined in an approximate form, this can be used for the pre-training. The exact function of the sub-network can be refined by further training once it is incorporated into the main network. This is particularly useful when the sub-task is defined by a rule which only provides an approximate definition of the required sub-task.

In human training, a cue is a stimulus which already elicits the required response. To use this concept in training ANNs, a possible cue is first identified. It is then necessary to incorporate a sub-task into the network which can elicit the required response when presented with the cue. One method of doing this is to train a sub-network for the sub-task (i.e. cue -> required output). Once this has been achieved, the cue can form a new or additional target for training the main network. For example, improved performance of the NNWF as a Marr Hildreth edge operator was obtained by the training of sub-networks. This is described in chapter five [§5.4.1].

7.4.1.7 M6 Embedded Sub

This method is very similar to method M5, in which a sub-network is used within the main network. However, in this case it is an algorithm, rather than a sub-network, which is embedded in the network. The constraints on a unit which is to be embedded in a neural network require careful consideration, as they will limit the types of functions or algorithms which can be used.

Consider placing a functional unit with multiple inputs and a single output in various positions within a back propagation network. If the unit is placed between the inputs and the first hidden layer, or in the first hidden layer, there are no special constraints as it does not take part in the weight modification calculations. All other positions can be considered as equivalent to each other in terms of constraints. For the forward pass, the function can be evaluated at the same stage at which the output of a neuron in the same position would have been evaluated. The output value from the function can then be used in exactly the same way as the output from a neuron would have been used. In the reverse pass, for back propagation of error, replacement of a neuron by a function block can be considered in the following manner. In order to back-propagate the error to the previous layer to calculate weight changes, the partial derivative of the function output with respect to each input is needed. These partial derivatives will need to be bounded and must not be continuously zero over any range of the input values.
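The role of the partial derivatives in the reverse pass can be sketched as follows. The embedded function `g` here is a hypothetical two-input product, chosen only because its partial derivatives are simple. The helper `numeric_partials` is an illustrative central-difference check, and `step` is an illustrative example of a function which fails the constraint above.

```python
def g(x):
    """Hypothetical embedded function: the product of its two inputs."""
    return x[0] * x[1]


def g_partials(x):
    """Analytic partial derivatives of g with respect to each input."""
    return [x[1], x[0]]


def backprop_through_g(x, error_from_next_layer):
    """Error passed back along each input: incoming error times dg/dx_j."""
    return [error_from_next_layer * p for p in g_partials(x)]


def numeric_partials(func, x, eps=1e-6):
    """Central-difference estimate of the partial derivatives of func at x."""
    estimates = []
    for j in range(len(x)):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        estimates.append((func(hi) - func(lo)) / (2.0 * eps))
    return estimates


def step(x):
    """A threshold function: its derivative is zero almost everywhere."""
    return 1.0 if x[0] + x[1] > 1.0 else 0.0


x = [0.3, 0.7]
passed_back = backprop_through_g(x, error_from_next_layer=0.5)
checked = numeric_partials(g, x)

# The step function passes no error back at this point, so no weight
# in front of it could be trained through it.
blocked = numeric_partials(step, [0.3, 0.3])
```

For the product, both partial derivatives are bounded over bounded inputs and are not identically zero, so the error reaches the previous layer. The threshold function illustrates why a derivative that is continuously zero over a range of inputs is ruled out.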

Figure 7.3: Use of an embedded function [diagram not reproduced; labels: Inputs, Hidden Layer, Hidden Layer, Output Layer k]

The replacement of a single neuron by a general function g, which receives inputs from several of the units in the previous layer and propagates an output to the following layer in the same manner as other neurons in the layer, will only affect the calculation of delta terms. In Figure 7.3 above, the top neuron in hidden layer i has been replaced by a general function g. The delta term for the function will have the same form as in equation 2.7.

For a function g in layer i:

$$\delta_i = \Big( \sum_k w_{ik}\, \delta_k \Big)\, g'(\mathbf{x}) \tag{7.1}$$

Here the input to the function g is represented as the vector x, which consists of outputs from various neurons in the previous layer j; these in turn are calculated from the neurons in the layer before that. The second term, g', in 7.1 is required so that the function's delta term can be evaluated and used to calculate delta terms, and hence weight changes, in the previous layer. The function g plays no direct part in calculating weight changes in the layer in which it is contained or in subsequent layers.

Delta terms in the previous layer j have the form

For hidden unit j:

$$\delta_j = \Big( \sum_i w_{ji}\, \delta_i \Big)\, f'(a_j) \tag{7.2}$$

The summation is over all the units in layer i which contains the special unit g.

The term inside the summation when the connection is to the function g in layer i will have the form
