Accuracy Testing - Faster upper body pose recognition and estimation using compute unified device architecture

This section describes the tests that compared the accuracy of the four implementations. The aim was to determine whether the modified algorithm improved, or at least sustained, the accuracy of the original, in accordance with the research question. The following subsections discuss the criterion for a correctly recognized sign, which serves as the accuracy metric, followed by the experimental procedure and the accuracy test results and analysis.

6.2.1 Criterion for a Correctly Recognized Sign

Each input frame maps onto an estimated output frame in Blender. The estimated frame is generated by setting the positions of the wrists of the 3D humanoid avatar to the wrist positions predicted from the frame. There are many ways to measure the correctness of the estimated result such as performing a comparison of the joint angles and/or positions between the input and estimated frames. An alternative method is to perform a visual comparison between the input and estimated frames.

The choice between these methods depends on the goal of the SASL project, which is translation between SASL and English. The project is intended to create animations to be viewed by human beings. A visual comparison is therefore considered sufficient to indicate whether a match has been obtained. The estimated outcome is dichotomized as either a match or a non-match on a per-frame basis. Figure 6.2 illustrates a potential match.

Figure 6.2: Visual comparison for a potential match. (a) Input frame. (b) Estimated frame.

Chapter 6. Experimental Setup and Analysis of Results 77

Assessor      Signs
Assessor 1    Bye, Curtains, Eat, Light, Right, We, Wide
Assessor 2    Away, Crackers, Dress, Left, Love, Run, We, Wide

Table 6.2: Signs inspected by the assessors.

6.2.2 Experimental Procedure

The 420 sign videos from the 30 test subjects were used as input to the four implementations. In each case, the output of the system was analyzed using the criterion for a correctly recognized sign. In order to obtain an unbiased result, two independent assessors were used to perform the visual comparisons. Table 6.2 identifies the signs that were assessed by each assessor. Each assessor was instructed to inspect every input frame and the corresponding estimated frame and determine whether there was a match between the frames.

Table A.5 in Appendix A contains the total number of frames in each sign video performed by each subject.

The complete results for the number of matches for OrigCPU and ModCPU are provided in Tables A.6 and A.7, respectively, in Appendix A. It was found that the two implementations running the original algorithm (OrigCPU and OrigCUDA) achieved exactly the same results as each other, as did the two implementations running the modified algorithm (ModCPU and ModCUDA). For this reason, the results of the CUDA implementations have been omitted. The comparison of accuracies is, therefore, carried out between the original algorithm and the modified algorithm, referred to as "Orig" and "Mod" for the purposes of the current experiment. Table 6.3 shows the average number of matches for each sign, across all test subjects, as a percentage of the total number of frames in each case.
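The accuracy figure described above can be sketched as follows. This is a minimal illustration, assuming that the percentage of matched frames is computed per sign video and then averaged across subjects; the tallies and the function names `video_accuracy` and `sign_accuracy` are hypothetical and are not taken from the appendix tables.

```python
from typing import Dict, List, Tuple

# Hypothetical tallies: for each sign, one (matched frames, total frames)
# pair per test subject. Illustrative values only, not data from Appendix A.
tallies: Dict[str, List[Tuple[int, int]]] = {
    "Bye": [(45, 50), (38, 40)],
    "Eat": [(27, 30), (54, 60)],
}


def video_accuracy(matches: int, frames: int) -> float:
    """Percentage of frames in one sign video judged to be a match."""
    if frames <= 0:
        raise ValueError("a video must contain at least one frame")
    return 100.0 * matches / frames


def sign_accuracy(videos: List[Tuple[int, int]]) -> float:
    """Mean per-video accuracy for one sign, averaged across subjects."""
    per_video = [video_accuracy(m, f) for m, f in videos]
    return sum(per_video) / len(per_video)


for sign, videos in tallies.items():
    print(f"{sign}: {sign_accuracy(videos):.2f}%")
```

Averaging per-video percentages weights every subject equally regardless of video length; pooling all frames before dividing would instead weight longer videos more heavily.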

In Table 6.3, the accuracies range from 78.72% to 93.71% for Orig and from 83.61% to 98.80% for Mod. Overall, Mod achieves a higher average accuracy of 93.08% compared to 87.35% for Orig.

A statistical test was used to determine whether the difference in accuracy between Orig and Mod for each sign was significant. The accuracy values are bounded above by 100 and are skewed for most of the signs. For this reason, the non-parametric Signed Rank Test was identified as appropriate for testing for significant differences between the algorithms. Table 6.4 summarizes the results of the test. It can be seen from the table that there are significant differences in favour of Mod (p < 0.01) for 9 out of the 14 signs, highlighted in the table. For the remaining 5 signs, the accuracy of Mod does not differ significantly from that of Orig.
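The paired test described above can be sketched as follows. The source does not state which software or which form of the statistic was used; this sketch assumes the centered form S = T+ - n(n+1)/4, where T+ is the sum of the ranks of the positive differences and n is the number of non-zero differences. That form can take negative values, as some entries in Table 6.4 do, but this remains an assumption. The function name and the sample data are hypothetical, and no p-value is computed.

```python
from typing import List, Sequence


def signed_rank_statistic(orig: Sequence[float], mod: Sequence[float]) -> float:
    """Centered Wilcoxon signed-rank statistic S = T+ - n(n+1)/4.

    Assumed form, not confirmed by the source. Zero differences are
    dropped and tied absolute differences receive their average rank.
    """
    if len(orig) != len(mod):
        raise ValueError("samples must be paired")
    # Paired differences (Mod minus Orig), excluding exact ties between pairs.
    diffs = [m - o for o, m in zip(orig, mod) if m != o]
    n = len(diffs)
    # Indices of the differences ordered by absolute magnitude.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * n
    i = 0
    while i < n:
        # Extend j over the run of equal absolute differences.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    return t_plus - n * (n + 1) / 4


# Hypothetical per-subject accuracies (%) for one sign under each algorithm.
orig_acc = [80.0, 85.0, 90.0, 88.0, 92.0]
mod_acc = [86.0, 89.0, 90.0, 93.0, 91.0]
print(signed_rank_statistic(orig_acc, mod_acc))
```

Under this assumed form, a positive statistic indicates that the larger differences favour Mod, and with 30 subjects and no zero differences the largest possible value would be 30 × 31 / 4 = 232.5.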


Sign        Orig (%)    Mod (%)
Away        90.15       90.41
Bye         87.46       97.33
Crackers    78.72       84.07
Curtains    85.54       95.66
Dress       89.29       90.66
Eat         90.54       95.05
Left        89.69       98.80
Light       92.72       95.82
Love        93.71       93.05
Right       85.23       96.17
Run         83.92       83.61
We          87.39       90.69
Why         82.60       94.68
Wide        85.93       97.10
Average     87.35       93.08

Table 6.3: Mean accuracy of the original and modified algorithms per sign.
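As a quick check on Table 6.3, the short sketch below recomputes the column averages and the per-sign gain of Mod over Orig from the tabulated values. The variable names are illustrative; only the numbers come from the table.

```python
# Per-sign mean accuracy (%) from Table 6.3, stored as (Orig, Mod).
table_6_3 = {
    "Away": (90.15, 90.41),
    "Bye": (87.46, 97.33),
    "Crackers": (78.72, 84.07),
    "Curtains": (85.54, 95.66),
    "Dress": (89.29, 90.66),
    "Eat": (90.54, 95.05),
    "Left": (89.69, 98.80),
    "Light": (92.72, 95.82),
    "Love": (93.71, 93.05),
    "Right": (85.23, 96.17),
    "Run": (83.92, 83.61),
    "We": (87.39, 90.69),
    "Why": (82.60, 94.68),
    "Wide": (85.93, 97.10),
}

# Column averages over the 14 signs.
orig_mean = sum(o for o, _ in table_6_3.values()) / len(table_6_3)
mod_mean = sum(m for _, m in table_6_3.values()) / len(table_6_3)

# Gain of Mod over Orig in percentage points, per sign.
gains = {sign: round(m - o, 2) for sign, (o, m) in table_6_3.items()}
largest_gain_sign = max(gains, key=gains.get)

print(f"Orig mean: {orig_mean:.2f}%  Mod mean: {mod_mean:.2f}%")
print(f"Largest gain: {largest_gain_sign} ({gains[largest_gain_sign]:+.2f} points)")
```

The recomputed averages agree with the Average row of the table to two decimal places.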

The observation that 9 out of the 14 signs register significant increases in accuracy with the modified algorithm, while the remaining 5 do not, is attributed to the difference in the amount of exposed skin between the two groups of signs. The 9 signs that perform better all expose larger amounts of moving skin. The majority of the motion in the remaining 5 signs occurs towards and away from the camera, and therefore exposes only small amounts of moving skin. As explained in the previous chapter, signs that expose larger amounts of moving skin are expected to achieve a higher accuracy due to the optimized skin detection component. However, the frame differencing background subtraction technique was also chosen over GMMs, so further tests are needed to determine which components account for the improvement in accuracy.

Table 6.5 shows the average number of matches for each test subject, across all signs, as a percentage of the total number of frames in each case. In Table 6.5, the accuracies range from 84.45% to 90.86% for Orig and from 87.89% to 99.60% for Mod. The standard deviation of these results is 1.81% for Orig and 2.94% for Mod. These values are small and indicate that both algorithms are robust to variations in test subjects.

In conclusion, the accuracy of the modified algorithm is at least as high as that of the original algorithm, and significantly higher for 9 of the 14 signs (64%). It is also as robust to variations in test subjects as the original algorithm. This result is encouraging and indicates that the optimizations made were appropriate and successful. The accuracy of the original algorithm, which was already high, has been improved significantly.


Sign        Test Statistic    P-Value
Away        −0.5              1.0000
Bye         215.5             <0.0001
Crackers    136.5             0.0017
Curtains    200.5             <0.0001
Dress       41                0.3599
Eat         159.5             0.0001
Left        201               <0.0001
Light       134.5             0.0038
Love        −1                0.9814
Right       198.5             <0.0001
Run         9.5               0.8489
We          54.5              0.2453
Why         220.5             <0.0001
Wide        212.5             <0.0001

Table 6.4: Results of the Signed Rank Test performed between the original and modified algorithms.
