Comparison between hand-detection systems

6.6 Results and discussion

6.6.2 Detection and tracking evaluation

6.6.2.1 Hand detection

6.6.2.1.3 Comparison between hand-detection systems

To determine the best method for hand detection, a statistical analysis was performed to compare the candidate methods. The comparison considered four factors, namely:

1. A comparison between random forests and SVMs;

2. A comparison between extracting features from skin-coloured pixels only (filtering) and extracting features from all pixels in an image (no filtering);

3. A comparison between global LBP histogram features, spatially enhanced LBP histogram features and LBP image features; and

4. A comparison between the original, uniform and extended LBP.
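The four factors above have 2, 2, 3 and 3 levels respectively, so a full crossing gives 2 × 2 × 3 × 3 = 36 combinations. The following is a minimal sketch of that enumeration. The digit assigned to each level is an assumption inferred from the "Combination" codes in Table 6.4 (for example, 1212 for random forest, filtering, global, extended), as is the reading of "Local" as the spatially enhanced histogram features; the names `FACTORS` and `enumerate_methods` are illustrative only.

```python
from itertools import product

# Factor levels as listed in the text. The digit for each level is inferred
# from the "Combination" codes in Table 6.4 and is an assumption.
FACTORS = {
    "classifier": {1: "Random forest", 2: "SVM"},
    "pixels": {1: "No filtering", 2: "Filtering"},
    "features": {1: "Global", 2: "Local", 3: "Image"},
    "lbp": {1: "Original", 2: "Extended", 3: "Uniform"},
}


def enumerate_methods():
    """Return a dict mapping each four-digit code to its tuple of level names."""
    methods = {}
    level_maps = list(FACTORS.values())
    # Cross every level of every factor with every level of the others.
    for digits in product(*(sorted(levels) for levels in level_maps)):
        code = "".join(str(d) for d in digits)
        methods[code] = tuple(levels[d] for levels, d in zip(level_maps, digits))
    return methods


methods = enumerate_methods()
print(len(methods))
print(methods["1212"])
```

Under these assumptions the sketch yields 36 distinct codes, one per method.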

When taking these factors into consideration, 36 different hand-detection methods were created. The statistical analysis was performed using a generalized linear model accounting for the dependencies inherent in having the same image classified by each of the 36 different methods. A compound symmetric structure was used to model the dependencies. The analysis was done using the GENMOD procedure in SAS version 9 (SAS Institute Inc., Cary, NC, USA).
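One way to write down a model of this kind is sketched below. It is an illustration rather than the exact specification used in the analysis: the link function $g$ is not stated in the text (a logit link is assumed here), and the symbols $Y_{ij}$, $\beta_j$ and $\rho$ are introduced only for this sketch.

```latex
% Y_{ij} = 1 if method j classifies image i correctly, 0 otherwise (j = 1, ..., 36).
% Assumed marginal model with a compound symmetric (exchangeable) working correlation.
\begin{align}
  g\bigl(\Pr(Y_{ij} = 1)\bigr) &= \beta_j,
    \qquad g(p) = \log\frac{p}{1 - p} \quad \text{(assumed logit link)}, \\
  \operatorname{Corr}(Y_{ij}, Y_{ik}) &= \rho \quad \text{for all } j \neq k .
\end{align}
```

Under a compound symmetric structure, a single correlation $\rho$ is shared by every pair of methods applied to the same image, which is what allows the repeated classification of each image to be accounted for with one extra parameter.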

Chapter 6. Experimental Results and Analysis 120

Table 6.4: Comparison between different hand-detection methods considering all four factors.

| Observation | Factor 1 | Factor 2 | Factor 3 | Factor 4 | Combination | P-value |
|---|---|---|---|---|---|---|
| 1 | Random forest | Filtering | Global | Extended | 1212 | 0.723 |
| 2 | Random forest | Filtering | Global | Uniform | 1213 | 0.70578 |
| 3 | Random forest | Filtering | Global | Original | 1211 | 0.70302 |
| 4 | SVM | Filtering | Global | Extended | 2212 | 0.69048 |
| 5 | SVM | Filtering | Global | Original | 2211 | 0.67389 |
| 6 | SVM | No filtering | Global | Extended | 2112 | 0.66922 |
| 7 | Random forest | Filtering | Local | Extended | 1222 | 0.65965 |
| 8 | Random forest | Filtering | Local | Original | 1221 | 0.65838 |
| 9 | Random forest | No filtering | Global | Extended | 1112 | 0.65668 |
| 10 | Random forest | Filtering | Local | Uniform | 1223 | 0.65625 |
| 11 | SVM | No filtering | Global | Uniform | 2113 | 0.6469 |
| 12 | Random forest | No filtering | Global | Original | 1111 | 0.64094 |
| 13 | Random forest | No filtering | Global | Uniform | 1113 | 0.64052 |
| 14 | Random forest | No filtering | Local | Uniform | 1123 | 0.63159 |
| 15 | Random forest | No filtering | Local | Extended | 1122 | 0.62946 |
| 16 | Random forest | No filtering | Local | Original | 1121 | 0.6233 |
| 17 | SVM | Filtering | Global | Uniform | 2213 | 0.62245 |
| 18 | SVM | No filtering | Global | Original | 2111 | 0.61777 |
| 19 | SVM | Filtering | Local | Extended | 2222 | 0.60374 |
| 20 | SVM | No filtering | Image | Uniform | 2133 | 0.60332 |
| 21 | Random forest | No filtering | Image | Extended | 1132 | 0.60098 |
| 22 | SVM | Filtering | Local | Uniform | 2223 | 0.59949 |
| 23 | Random forest | Filtering | Image | Extended | 1232 | 0.59779 |
| 24 | Random forest | No filtering | Image | Original | 1131 | 0.59609 |
| 25 | Random forest | Filtering | Image | Original | 1231 | 0.59247 |
| 26 | SVM | Filtering | Image | Uniform | 2233 | 0.59056 |
| 27 | Random forest | No filtering | Image | Uniform | 1133 | 0.58503 |
| 28 | SVM | Filtering | Local | Original | 2221 | 0.57207 |
| 29 | SVM | No filtering | Image | Original | 2131 | 0.56611 |
| 30 | SVM | No filtering | Local | Extended | 2122 | 0.56526 |
| 31 | SVM | No filtering | Image | Extended | 2132 | 0.56441 |
| 32 | SVM | No filtering | Local | Uniform | 2123 | 0.56144 |
| 33 | Random forest | Filtering | Image | Uniform | 1233 | 0.55527 |
| 34 | SVM | Filtering | Image | Original | 2231 | 0.51892 |
| 35 | SVM | No filtering | Local | Original | 2121 | 0.51658 |
| 36 | SVM | Filtering | Image | Extended | 2232 | 0.51594 |

When sorting the methods according to the estimated probability of predicting a hand correctly, several patterns emerged, as shown in Table 6.4. The top six methods all use global LBP histogram features; these are followed by several methods that use spatially enhanced LBP histogram features, while 12 of the last 17 methods employ LBP image features. The results also show that 8 of the top 10 methods extract features from skin-coloured pixels only, while 6 of the last 10 methods extract features from all pixels in the image. In addition, 12 of the top 16 methods use random forests, while 10 of the last 16 methods use SVMs. Moreover, 5 of the top 10 methods use the extended LBP rather than the original or uniform LBP.
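The tallies quoted above can be rechecked directly from the ranking in Table 6.4. The sketch below lists the 36 "Combination" codes in ranked order and counts how often a given digit occurs within a slice of the ranking. The meaning of each digit position (classifier, pixel filtering, feature type, LBP variant) is inferred from the table, and the names `RANKED` and `count_level` are illustrative only.

```python
# "Combination" codes from Table 6.4, in ranked order (observation 1 to 36).
RANKED = [
    "1212", "1213", "1211", "2212", "2211", "2112", "1222", "1221", "1112",
    "1223", "2113", "1111", "1113", "1123", "1122", "1121", "2213", "2111",
    "2222", "2133", "1132", "2223", "1232", "1131", "1231", "2233", "1133",
    "2221", "2131", "2122", "2132", "2123", "1233", "2231", "2121", "2232",
]


def count_level(codes, position, digit):
    """Count the codes whose character at `position` (0-based) equals `digit`."""
    return sum(1 for code in codes if code[position] == digit)


# Digit meanings are inferred from the table:
# position 0: 1 = random forest, 2 = SVM
# position 1: 1 = no filtering, 2 = filtering
# position 2: 1 = global, 2 = local, 3 = image
# position 3: 1 = original, 2 = extended, 3 = uniform
tallies = {
    "global in top 6": count_level(RANKED[:6], 2, "1"),
    "image in last 17": count_level(RANKED[-17:], 2, "3"),
    "filtering in top 10": count_level(RANKED[:10], 1, "2"),
    "no filtering in last 10": count_level(RANKED[-10:], 1, "1"),
    "random forest in top 16": count_level(RANKED[:16], 0, "1"),
    "SVM in last 16": count_level(RANKED[-16:], 0, "2"),
    "extended in top 10": count_level(RANKED[:10], 3, "2"),
}

for label, value in tallies.items():
    print(f"{label}: {value}")
```

Counting from the codes rather than from the level names keeps the check compact and makes it easy to try other cut-offs, such as the top 5 or the last 12.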

In terms of statistical significance, the accuracies of the top three methods do not differ significantly from one another; however, the method using the extended LBP achieves the highest accuracy of the three. A list of the non-significant differences between the methods is given in Appendix D. When comparing the accuracy of random forests and SVMs, both using the extended LBP with global histogram features on skin-coloured pixels, the difference was statistically significant. It was therefore established that random forests using the extended LBP with global histogram features on skin-coloured pixels are a better approach than SVMs for the hand-detection algorithm. Hereafter, this hand-detection algorithm will be referred to as the ELBP-RF hand-detection algorithm.

When comparing related systems to the ELBP-RF hand-detection algorithm used in prototypes DT and DTL, the comparison is restricted to 2D views. For the subjective evaluation, the output was described and compared. For the objective evaluation, the output was compared in terms of its accuracy relative to the dataset.

The hand-detection system of Petersen and Stricker [125] is limited to open hand postures in which the fingers are visible, whereas the ELBP-RF hand-detection algorithm successfully detects hands in any hand posture. According to Thangali and Sclaroff [151], HoG features extracted from the entire image region can be used to detect hands in images. However, a comparison carried out with the ELBP-RF hand-detection algorithm between features extracted from the entire image region and features extracted from the hands only showed that better results are obtained when only features on the hand are used (see Tables D.5 and D.7).

Studies that have used LBPs for hand detection include Xiao et al. [166], who used the original LBP on a small dataset of 482 images, and Nguyen et al. [116], who used a simplified version of the LBP on a small dataset of approximately 673 frames. Both achieved a higher average accuracy on these limited datasets. Furthermore, their systems were trained and tested on a limited set of hand postures, which contributes to the higher accuracy.

In the hand-detection system of Mittal et al. [109], a recall rate of 85,3% was achieved on a larger dataset of 5 628 images, which is comparable to the recall rate of 85,5% achieved using the ELBP-RF hand-detection algorithm. Moreover, Mattheij and Postma [106] used a random forest model based on Haar-like features and obtained accuracy, recall and precision rates of 69,5%, 78,9% and 66,6%, respectively, on a testing sample of 320 images. In comparison, the ELBP-RF hand-detection algorithm achieved higher accuracy and recall rates of 72,25% and 85,5%, respectively, and a comparable precision rate of 65,8%.
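For reference, the three rates compared above are conventionally computed from the counts of true positives, false positives, false negatives and true negatives. The sketch below assumes these standard definitions apply here; the function name `detection_metrics` and the counts passed to it are hypothetical and are not taken from any of the cited studies.

```python
def detection_metrics(tp, fp, fn, tn):
    """Return accuracy, recall and precision from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Share of all windows (hand or non-hand) that were labelled correctly.
        "accuracy": (tp + tn) / total,
        # Share of actual hands that were detected.
        "recall": tp / (tp + fn),
        # Share of reported detections that were actual hands.
        "precision": tp / (tp + fp),
    }


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
example = detection_metrics(tp=6, fp=4, fn=2, tn=8)
for name, value in example.items():
    print(f"{name}: {value:.2%}")
```

A detector can have a high recall and a lower precision at the same time, as in the figures above: most hands are found, but a larger share of the reported detections are false alarms.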

Therefore, compared to related studies, the ELBP-RF hand-detection algorithm can be considered state-of-the-art.

In document Independent hand-tracking from a single two-dimensional view and its application to South African sign language recognition (Page 136-139)