

7.4 Representing cyclical data

7.4.2 Classification using cyclic input data

To test the performance of the various input encoding schemes on a classification task, two artificial data sets were developed. The experiment was designed to measure the impact of cyclical data encoding on the learning speed and generalisation of a neural network. The artificial data sets were inspired by work on the prediction of wind speed and direction. Eight points were selected in the cartesian co-ordinate system and treated as the central (base) points of eight different classes. Multiple examples of each class were generated by adding various amounts of flat noise to these (x,y) co-ordinate pairs, and the examples were divided into training and test sets of equal size. The data sets contained 200 examples of each class.

Figures 7.3 and 7.4 The box and bullseye data sets. The dots represent the eight base points, and the shaded regions represent the neighbourhood of each base point at a

In the first data set, the base points were distributed as shown in Figure 7.3, and the noise was added to the x and y co-ordinates of the cartesian representation of these points. This will be referred to as the box data set. In the second data set, the base points were distributed as shown in Figure 7.4, and the noise was added to the angle and magnitude components of the polar representation of these points. This will be referred to as the bullseye data set. For each example, the network was presented with the distance and angle of the point from the origin of the co-ordinate system, and was trained to classify the point as belonging to one of the eight classes. In all cases the network had eight units in both the hidden and output layers. The number of input units was determined by the encoding used: the linear and cartesian encodings used two input units, and the trig and sawtooth encodings used three.
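The difference between the two-input and three-input encodings can be illustrated with a short sketch. It is not the code used in the experiments: the function names, the scaling of the angle onto [0, 1) in the linear encoding, and the use of degrees are all assumptions made for illustration. The sawtooth encoding is omitted because its definition is not given in this section.

```python
import math

def encode_linear(r, theta_deg):
    """Two inputs: magnitude and the angle scaled linearly onto [0, 1).
    The [0, 1) scaling is an assumption; any linear scaling behaves alike."""
    return (r, (theta_deg % 360.0) / 360.0)

def encode_cartesian(r, theta_deg):
    """Two inputs: the (x, y) co-ordinates of the point."""
    t = math.radians(theta_deg)
    return (r * math.cos(t), r * math.sin(t))

def encode_trig(r, theta_deg):
    """Three inputs: magnitude plus the sine and cosine of the angle."""
    t = math.radians(theta_deg)
    return (r, math.sin(t), math.cos(t))

def distance(a, b):
    """Euclidean distance between two encoded input vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Two directions only 2 degrees apart, on either side of the wrap-around.
near_wrap = distance(encode_linear(1.0, 359.0), encode_linear(1.0, 1.0))
near_wrap_trig = distance(encode_trig(1.0, 359.0), encode_trig(1.0, 1.0))
```

Under these assumptions the linear encoding places 359° and 1° almost a full unit apart in input space, while the trig encoding keeps them close. This is the discontinuity that the cyclical encodings are intended to remove.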

For both the box and bullseye data, several data sets were created, differing only in the amount of noise added to the base points to produce the examples. For the box data, the noise levels used were 10%, 25%, 50% and 60% of the distance between the base points. The first two levels are low enough to retain complete separation between the classes. At 50% the boundaries of each class touch those of its neighbours, as shown in Figure 7.3, whilst at 60% there is an area of overlap between bordering classes. For the bullseye data, the noise levels used were 25%, 37.5%, 50%, 52.5% and 60% of the distance between the base points (for example, at a noise level of 50% the magnitude noise ranged from -0.25 to 0.25, and the angular noise from -45° to 45°). Again, the first two noise levels maintain separation between the classes; at the third level the class boundaries are touching (as in Figure 7.4), and at the higher levels there is overlap between the classes.
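The bullseye noise scheme can be sketched as follows. Several details are assumptions rather than facts from the text: the base points are placed on two rings (radii 0.5 and 1.0) at four directions 90° apart, which is consistent with the 50% example above (±0.25 and ±45°) but is not stated explicitly; "flat noise" is interpreted as uniformly distributed noise; and the function name and parameters are hypothetical.

```python
import random

# Assumed spacing between neighbouring base points, inferred from the
# 50% example in the text (+/-0.25 in magnitude, +/-45 degrees in angle).
RADIUS_STEP = 0.5
ANGLE_STEP = 90.0

# Assumed layout: two rings by four directions gives eight classes.
BASE_POINTS = [(r, a) for r in (0.5, 1.0) for a in (0.0, 90.0, 180.0, 270.0)]

def make_bullseye(noise_level, per_class=200, seed=0):
    """Return (magnitude, angle_deg, class_label) examples.

    noise_level is a fraction of the base-point spacing, e.g. 0.5 for 50%.
    """
    rng = random.Random(seed)
    half_r = noise_level * RADIUS_STEP   # half-width of the magnitude noise
    half_a = noise_level * ANGLE_STEP    # half-width of the angular noise
    examples = []
    for label, (r0, a0) in enumerate(BASE_POINTS):
        for _ in range(per_class):
            r = r0 + rng.uniform(-half_r, half_r)
            a = (a0 + rng.uniform(-half_a, half_a)) % 360.0
            examples.append((r, a, label))
    return examples

def split_in_half(examples, seed=0):
    """Shuffle and divide into training and test sets of equal size."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```

With this layout, noise levels below 0.5 leave a gap between neighbouring classes, 0.5 makes the class regions touch, and higher levels make them overlap, matching the behaviour described above.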

At each noise level, ten different networks were trained for each encoding method. Each individual network was trained until it reached 100% correct on the training set, or until it timed out (an upper limit of 500,000 pattern presentations, abbreviated PPs in the tables, was set). In either case a record was kept of the maximum classification percentage on the training set, and the corresponding percentage on the test set. It was found that for each noise level the networks either always converged to 100% or always failed to converge. For the noise levels where the networks could converge, the number of pattern presentations is a measure of how quickly this convergence occurs. For the non-converging examples, the number of pattern presentations is a measure of how quickly the network reached its maximum level of performance. The results presented in Tables 7.9 and 7.10 are averages over the ten trials.
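The stopping rule and record keeping described above can be sketched as a small training harness. This is only an illustration: `train_step` and `accuracy` are hypothetical callables standing in for the network's update and evaluation routines, and checking accuracy once per pass through the training set is an assumption, since the text does not say how often performance was measured.

```python
def run_trial(train_step, accuracy, train_set, test_set,
              max_presentations=500_000):
    """Train until 100% on the training set or until the presentation limit.

    train_step(example) -- hypothetical: updates the network on one example.
    accuracy(dataset)   -- hypothetical: returns percentage correct (0-100).
    Returns the best training accuracy seen, the test accuracy at that
    point, and the number of pattern presentations needed to reach it.
    """
    best_train, test_at_best, pps_at_best = -1.0, 0.0, 0
    presentations = 0
    while presentations < max_presentations:
        for example in train_set:
            if presentations >= max_presentations:
                break  # timed out part-way through a pass
            train_step(example)
            presentations += 1
        train_acc = accuracy(train_set)
        if train_acc > best_train:
            # New maximum: record it with the corresponding test accuracy.
            best_train = train_acc
            test_at_best = accuracy(test_set)
            pps_at_best = presentations
        if train_acc >= 100.0:
            break  # converged
    return {"pps": pps_at_best, "train": best_train, "test": test_at_best}

def average_trials(results):
    """Average each recorded quantity over a list of trial results."""
    n = len(results)
    return {key: sum(r[key] for r in results) / n for key in results[0]}
```

Running `run_trial` ten times per encoding and noise level, then passing the results to `average_trials`, would give one column entry of the kind reported in the tables.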

Table 7.9 Summary of the performance of the different cyclical data encoding techniques on the box data set at varying levels of noise

Noise level  Measure      Linear    Cartesian  Trig      Sawtooth
10%          PPs          39,000    6,000      5,000     7,000
             % training   100.0     100.0      100.0     100.0
             % test       100.0     100.0      100.0     100.0
25%          PPs          362,000   9,000      8,000     9,000
             % training   99.7      100.0      100.0     100.0
             % test       99.2      100.0      100.0     100.0
50%          PPs          452,000   464,000    334,000   282,000
             % training   93.5      98.0       98.2      98.2
             % test       92.6      97.5       97.4      97.6
60%          PPs          432,000   339,000    261,000   215,000
             % training   82.7      84.4       84.8      84.7
             % test       79.7      81.7       82.2      82.1

Table 7.10 Summary of the performance of the different cyclical data encoding techniques on the bullseye data set at varying levels of noise

Noise level  Measure      Linear    Cartesian  Trig      Sawtooth
25%          PPs          39,000    52,400     10,000    10,000
             % training   100.0     100.0      100.0     100.0
             % test       100.0     100.0      100.0     100.0
37.5%        PPs          156,000   109,000    14,000    14,000
             % training   100.0     100.0      100.0     100.0
             % test       100.0     100.0      100.0     100.0
50%          PPs          432,000   354,000    377,000   436,000
             % training   98.1      97.0       99.5      99.4
             % test       97.9      97.4       99.2      99.1
52.5%        PPs          400,000   302,000    276,000   300,000
             % training   91.5      91.1       92.3      92.3
             % test       90.3      89.4       90.9      90.9
60%          PPs          323,000   320,000    334,000   322,000
             % training   73.7      74.5       74.4      74.5
             % test       71.6      70.3       71.4      71.7

The main conclusion to be drawn from these results is that the trig and sawtooth encodings performed very similarly regardless of the data set or noise level. It appears that the network was able to adapt to the noise-distorting properties of the trig encoding during training, so these had no impact on the final performance of the net. Overall, these two encoding schemes outperformed both of the other encodings.

The performance of the cartesian encoding varied considerably between the two problems. On the box data set, the cartesian encoding's performance was comparable to that of the trig and sawtooth encodings, although it took longer to train at high noise levels. On the bullseye data, however, the cartesian encoding learnt and generalised less well than either of these techniques at higher noise levels, and took considerably longer to train at lower noise levels.

The linear encoding consistently performed slightly worse than the trig and sawtooth encodings on both data sets, usually obtaining lower classification accuracies on both training and test data. In addition, this encoding scheme trained much more slowly on the examples where convergence to 100% correct was possible.

On the basis of these experiments, it would appear that either the trig or the sawtooth encoding is the best option for encoding cyclical data for input to a classification network.