Applying thresholds to temporal sequences

In signing there exist transitional phases in which the hand moves from one handshape to another. At any point during this transition the hand will be in an intermediate handshape which combines elements of both the previous and the next handshape involved in the sign. In some cases this intermediate handshape may bear a close resemblance to another shape which is not actually involved in this sign (for example, in moving from the fist to the flat handshape the hand will pass through a transitional handshape which is similar to the cup handshape). Using the simulated handshape data, some experiments were run to examine how the networks trained on static handshapes would respond during these transitional periods.

A sequence of 100 randomly chosen handshapes was generated, and random noise in the range of -0.1 to +0.1 was added to the input values of these examples. Linear interpolation was then used to generate the intermediate handshapes formed by the transition of the hand between the shapes in this sequence. Four intermediate handshapes were generated between each pair of genuine handshapes. The noise was added prior to interpolation rather than after as it was hoped this would produce a more realistic simulation of the transition between handshapes.
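The construction of this test sequence can be sketched as follows. This is a minimal illustration rather than the code used in the experiments: the three-element `PROTOTYPES` vectors, the function names and the fixed random seed are all assumptions made for the example, and uniform noise is assumed since the text gives only the range.

```python
import random


def add_noise(shape, amplitude, rng):
    """Return a copy of shape with uniform noise in [-amplitude, +amplitude] added to each input."""
    return [v + rng.uniform(-amplitude, amplitude) for v in shape]


def linear_transition(a, b, n_intermediate):
    """Return n_intermediate evenly spaced handshapes strictly between a and b."""
    steps = []
    for k in range(1, n_intermediate + 1):
        t = k / (n_intermediate + 1)
        steps.append([x + t * (y - x) for x, y in zip(a, b)])
    return steps


def build_sequence(prototypes, length=100, n_intermediate=4, amplitude=0.1, seed=0):
    """Return a list of (inputs, is_genuine) pairs in presentation order."""
    rng = random.Random(seed)
    # Noise is added to the genuine handshapes BEFORE interpolating,
    # so the transitional handshapes inherit it smoothly.
    genuine = [add_noise(rng.choice(prototypes), amplitude, rng) for _ in range(length)]
    sequence = []
    for i, shape in enumerate(genuine):
        sequence.append((shape, True))
        if i + 1 < len(genuine):
            for mid in linear_transition(shape, genuine[i + 1], n_intermediate):
                sequence.append((mid, False))
    return sequence


# Hypothetical handshape prototypes (illustrative only, not the simulated data set).
PROTOTYPES = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.5, 0.0]]
sequence = build_sequence(PROTOTYPES)
```

With 100 genuine handshapes and four intermediate handshapes in each of the 99 gaps, the sketch yields 496 presentations in total.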

Each handshape (both genuine and transitional) in this test sequence was presented to the network in order, and a magnitude threshold was applied to the value of the highest output node. The first time a node's output rose above the threshold value the handshape was classified as the associated class for that node. The sequence was not classified as the same handshape again until the value of that node had fallen back below the threshold level. This was intended to prevent a single handshape from producing multiple classification labels. This sequence was tested on a wide range of threshold values, the results of which are given in Table 7.5.
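The labelling rule described above can be sketched as follows, assuming the network outputs are available as one list of node activations per time step. The function name and the toy activations are hypothetical, and tracking a set of "latched" nodes is one possible reading of the rule, not necessarily the original implementation.

```python
def label_with_magnitude_threshold(outputs, threshold):
    """Return (time_step, class_index) labels for a stream of network outputs.

    A node produces a label the first time it is the highest output and rises
    above the threshold; it cannot produce another label until its output has
    fallen back to or below the threshold.
    """
    labels = []
    latched = set()  # nodes that have already been labelled and are still above threshold
    for t, out in enumerate(outputs):
        # Release any node whose output has dropped back below the threshold.
        latched = {n for n in latched if out[n] > threshold}
        winner = max(range(len(out)), key=out.__getitem__)
        if out[winner] > threshold and winner not in latched:
            labels.append((t, winner))
            latched.add(winner)
    return labels


# Toy two-class stream (illustrative values only): node 0 is held for several
# steps with a brief dip to 0.65, then node 1 becomes active.
demo = [[0.1, 0.2], [0.8, 0.1], [0.9, 0.1], [0.65, 0.2],
        [0.85, 0.1], [0.2, 0.3], [0.1, 0.9]]

print(label_with_magnitude_threshold(demo, 0.7))  # [(1, 0), (4, 0), (6, 1)]
print(label_with_magnitude_threshold(demo, 0.6))  # [(1, 0), (6, 1)]
```

In the toy stream, a threshold of 0.7 labels node 0 twice because its output dips briefly below the threshold while the shape is held, whereas a threshold of 0.6 labels it once.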

The errors have been broken down into two categories in Table 7.5: misses (genuine handshapes for which no classification was produced) and false hits (classifications which were produced but did not correspond to a genuine handshape in the sequence). As would be expected, lower threshold values reduce the number of genuine handshapes which are not classified, but also increase the number of intermediate handshapes which are falsely classified. The performance at a threshold of 0.7 is the best, representing an overall error rate of less than 2.5%.

Table 7.5 Mean results of the networks applied to sequences of transitions between simulated handshapes generated using linear interpolation

Magnitude threshold    Misses    False hits
0.4                    0         232
0.5                    0         113
0.6                    0         55
0.7                    2         10
0.8                    80        2
0.9                    506       0

The linear interpolation used in these experiments is a poor simulation of the movement of the human hand, as it assumes that hand features such as finger joints can change their direction of movement instantaneously. It would be expected that the signer's hand would tend to slow down as a genuine handshape is made, hold that shape momentarily as the sign is performed and then make the transition to the next handshape. To reflect this more accurately, the test sequence was recalculated using quadratic interpolation and a larger number of samples (119) between the genuine handshapes. Three different test sequences were generated at noise levels of 0.1, 0.2 and 0.3. The results of the networks on each of these data sets are recorded in Table 7.6.
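The exact quadratic formula is not given here, so the following is only one plausible form: a piecewise-quadratic ease-in/ease-out curve, chosen as an assumption because it makes the rate of change fall to zero at each genuine handshape. The function names are hypothetical.

```python
def ease_in_out_quadratic(t):
    """Map t in [0, 1] to [0, 1] with zero slope at both ends (piecewise quadratic)."""
    if t < 0.5:
        return 2.0 * t * t
    return 1.0 - 2.0 * (1.0 - t) ** 2


def quadratic_transition(a, b, n_intermediate=119):
    """Return n_intermediate handshapes between a and b along the eased path.

    Samples cluster near a and b, so the simulated hand slows down as it
    approaches and leaves each genuine handshape.
    """
    steps = []
    for k in range(1, n_intermediate + 1):
        s = ease_in_out_quadratic(k / (n_intermediate + 1))
        steps.append([x + s * (y - x) for x, y in zip(a, b)])
    return steps


# One-dimensional toy transition (illustrative only).
path = quadratic_transition([0.0], [1.0])
```

Under this assumption the first intermediate sample moves far less than it would under linear interpolation, which is what gives the genuine handshapes their momentary hold.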

Table 7.6 Mean results of the networks applied to sequences of transitions between simulated handshapes generated using quadratic interpolation

Amount of noise (+/-)           Magnitude threshold
                                0.4    0.5    0.6    0.7    0.8    0.9
0.1                   Missed    0      0      0      0      0      174
                      Extra     1      0      0      0      0      0
0.2                   Missed    -9     -3     -2     0      -1     175
                      Extra     210    78     24     5      0      0
0.3                   Missed    -38    -11    -8     -2     -2     179
Time threshold = 1    Extra     749    320    151    67     29     15
0.3                   Missed    1      1      1      1      1      351
Time threshold = 2    Extra     0      0      0      0      0      0

The performance of the network was much better than on the sequence generated using linear interpolation. The negative numbers in the missed column indicate that the network missed no handshapes but instead produced some 'double hits' where multiple classification labels were output for a single genuine handshape, as the net output temporarily dipped below the magnitude threshold before exceeding it again.

For the highest noise level an extra parameter was added to the algorithm. This was a temporal parameter which specified the amount of time for which the network's output had to remain above the magnitude threshold before a classification would be made. It can be seen from Table 7.6 that using a value of 2 time steps for the time threshold successfully eliminated all of the false hits recorded by the networks, and produced near perfect overall performance over a wide range of magnitude thresholds.
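A minimal sketch of the combined rule is given below. It assumes that "remaining above the threshold" means the same node being the highest output, and above the magnitude threshold, for a run of consecutive time steps; the function name and the toy activations are hypothetical.

```python
def label_with_time_threshold(outputs, magnitude_threshold, time_threshold=1):
    """Return (time_step, class_index) labels using magnitude and time thresholds.

    A label is produced only once the same node has been the highest output,
    and above magnitude_threshold, for time_threshold consecutive steps.
    With time_threshold=1 this reduces to the magnitude-only rule.
    """
    labels = []
    latched = set()    # nodes already labelled and still above the threshold
    candidate = None   # node whose run above the threshold is being timed
    run_length = 0
    for t, out in enumerate(outputs):
        latched = {n for n in latched if out[n] > magnitude_threshold}
        winner = max(range(len(out)), key=out.__getitem__)
        if out[winner] > magnitude_threshold:
            run_length = run_length + 1 if winner == candidate else 1
            candidate = winner
            if run_length >= time_threshold and winner not in latched:
                labels.append((t, winner))
                latched.add(winner)
        else:
            candidate, run_length = None, 0
    return labels


# Toy stream (illustrative values only): node 1 spikes for a single step during
# a transition, then node 0 is held for three steps.
demo = [[0.1, 0.2], [0.2, 0.8], [0.3, 0.3], [0.8, 0.1],
        [0.9, 0.1], [0.85, 0.1], [0.2, 0.2]]

print(label_with_time_threshold(demo, 0.7, time_threshold=1))  # [(1, 1), (3, 0)]
print(label_with_time_threshold(demo, 0.7, time_threshold=2))  # [(4, 0)]
```

In the toy stream the single-step spike on node 1 is reported as a false hit with a time threshold of 1 and suppressed with a time threshold of 2, at the cost of labelling the held handshape one step later.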

The quadratic interpolation used does not include the co-articulatory features of signing described by Peters (1992). However, even taking this into account, it would appear that magnitude and temporal thresholds are of value in applying a static handshape network to a continuous stream of handshapes. The quadratic interpolation possibly exaggerates the extent to which genuine handshapes are held by the user during signing, but the results show that slowing of the hand during the formation of a genuine handshape should aid in distinguishing that handshape from any intermediate handshapes.

The combination of a magnitude and temporal threshold developed during this research was used in the experiments in sign segmentation described in Section 11.1.8.
