Experiments with Haar-like features to recognise human hands

Figure 5.1: The basic hand images used in the training process.

by as little as 4° were very difficult to detect by classifiers trained at fixed angles. They then trained classifiers with examples rotated between 0° and 15°. They concluded that for examples within 5° of in-plane rotation, there was no significant loss of accuracy for their test set (Kolsch and Turk, 2004a).

The work described above was limited in two respects. First, no separate analysis of the different angles of rotation was carried out. Second, square kernels were used, which include a large portion of the background in the training examples, making it unclear how the background influenced the training process. The training and test images presented similar background patterns, as the images for both sets were acquired directly from the same video sequences. An analysis of the features chosen during the training phase would be needed to find out whether parts of the background were being used.

5.2 Experiments with Haar-like features to recognise human hands

Every gesture made by a hand produces a different 2D pattern. Although there are hand image databases in the literature (for example, Athitsos and Sclaroff (2001) used a 3D model to create images for indexing and estimating pose), our own collection of images was used in order to retain flexibility and control over the samples. In order to analyse the performance and accuracy of classifiers created using the Viola-Jones method, a simple gesture (shown in figure 5.1) was adopted. The hand images used by the training algorithm were acquired from 5 different individuals. The images were acquired under different illumination conditions using artificial light, against a dark background. After a simple skin colour segmentation, the raw segmented images were used to prepare the positive set. The process of collecting examples is discussed in more detail in Dadgostar and Barczak (2005).

5.2.1 Preparing the Positive Set

Based on the intensity and hue of the skin-coloured pixels, an automated process segmented the hand in each image. The segmented hands were then placed on random backgrounds to facilitate the training procedure. Creating positive examples with varied background patterns was an important step, as it avoided the selection of Haar-like features located over the background. The creation process is shown in figure 5.2. An automated process rotated the original set of images to angles from -90° to 90° and introduced a random background for each image. Each segmented image was reused 30 times for each angle, so a total of 149 segmented images generated 4470 images with random backgrounds for each angle. A total of 19 orientations were used: 12° intervals from -84° to 84°, plus the angles 30°, -30°, 90° and -90°.

Figure 5.2: Creating the positive set. (Diagram: each segmented hand is rotated and given a new background, producing the examples for Cascade 0°, Cascade 30°, ..., Cascade 90°.)
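The orientation set and the image counts given above can be checked with a few lines of Python. This is only an arithmetic sketch: the variable names are illustrative and are not taken from the original tooling.

```python
# Orientations described in the text: 12-degree steps from -84 to 84,
# plus four extra angles.
step_angles = list(range(-84, 85, 12))
extra_angles = [-90, -30, 30, 90]
angles = sorted(step_angles + extra_angles)

SEGMENTED_IMAGES = 149        # segmented hand images (from the text)
BACKGROUNDS_PER_IMAGE = 30    # each image is reused 30 times per angle

per_angle = SEGMENTED_IMAGES * BACKGROUNDS_PER_IMAGE
total = per_angle * len(angles)

print(len(angles), per_angle, total)  # prints: 19 4470 84930
```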

5.2.2 Training

The cascades were trained using algorithm 2, presented in section 2.2. Each cascade was capable of detecting hands (with the particular gesture of the samples) within a certain in-plane angle of rotation (figure 5.3). Some rotation tolerance was desirable because it was difficult to align the positive examples perfectly. The alignment tolerance of the base examples was within 2°.

Rather than converting existing cascades to produce another cascade at 90° (as done by Jones and Viola (2003)), all the necessary cascades were trained separately. The justification is that different features might be chosen at each angle. It is not trivial to convert cascades from one angle to another when using tilted features and oblong kernels, and the effects on the feature selection should be observed.
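For contrast, the simplest case of such a conversion can be sketched for a single upright rectangle in an oblong kernel. The code below is an illustration of the geometry only, not the procedure of Jones and Viola (2003): the function names, the example rectangle and the choice of a clockwise rotation are all assumptions. It shows that a 90° rotation swaps the kernel dimensions and remaps each rectangle; tilted features and intermediate angles have no such exact mapping.

```python
def rotate_rect_cw(rect, kernel_h):
    """Map an upright rectangle (x, y, w, h) inside a kernel of height
    kernel_h to its position after a 90-degree clockwise rotation."""
    x, y, w, h = rect
    return (kernel_h - y - h, x, h, w)


def rotate_grid_cw(grid):
    """Rotate a 2D list of pixels by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rect_sum(grid, rect):
    """Sum the pixels covered by rect = (x, y, w, h)."""
    x, y, w, h = rect
    return sum(sum(row[x:x + w]) for row in grid[y:y + h])


# A 24x42 kernel (width x height) filled with distinct values.
KERNEL_W, KERNEL_H = 24, 42
kernel = [[r * KERNEL_W + c for c in range(KERNEL_W)] for r in range(KERNEL_H)]

rect = (3, 5, 6, 10)                       # hypothetical feature rectangle
rotated_rect = rotate_rect_cw(rect, KERNEL_H)
rotated_kernel = rotate_grid_cw(kernel)    # now 42 wide and 24 high

# The remapped rectangle covers exactly the same pixels.
assert rect_sum(kernel, rect) == rect_sum(rotated_kernel, rotated_rect)
print(rotated_rect)  # prints: (27, 3, 10, 6)
```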

A modified version of the Viola-Jones algorithm, built on the OpenCV library (Bradski, 2000), was used for training. Each cascade was trained with 4470 images, so a total of 84930 images were used (4470 images for each of 19 angles). Different kernel sizes were used for each angle, so that most of the area of the input was part of the hand. The kernel size for angle 0° was 24x42 pixels. The size was adjusted for each angle to keep the hands proportional, up to an angle of 90°, for which the kernel was 42x24 pixels. Table 5.1 lists the kernel sizes for each angle.

Table 5.1: The kernel sizes for each cascade.

Angle            Kernel width (pixels)    Kernel height (pixels)
0°               24                       42
12° and -12°     26                       42
24° and -24°     30                       42
30° and -30°     32                       42
36° and -36°     36                       42
48° and -48°     42                       40
60° and -60°     42                       32
72° and -72°     42                       28
84° and -84°     42                       24
90° and -90°     42                       24
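The kernel sizes in Table 5.1 can be held in a small lookup keyed by the absolute angle, since positive and negative angles share the same size. The sketch below only restates the table; the names KERNEL_SIZES and kernel_size are illustrative and do not come from the original implementation.

```python
# Kernel sizes from Table 5.1 as (width, height) in pixels, keyed by |angle|.
KERNEL_SIZES = {
    0: (24, 42), 12: (26, 42), 24: (30, 42), 30: (32, 42), 36: (36, 42),
    48: (42, 40), 60: (42, 32), 72: (42, 28), 84: (42, 24), 90: (42, 24),
}


def kernel_size(angle):
    """Return (width, height) for a trained angle. The sign is ignored
    because Table 5.1 lists the same size for +angle and -angle."""
    try:
        return KERNEL_SIZES[abs(angle)]
    except KeyError:
        raise ValueError(f"no cascade was trained for {angle} degrees") from None


for angle in (0, -36, 90):
    width, height = kernel_size(angle)
    print(angle, width, height)
```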

5.2.3 Detection

Detection was carried out on a test set created with the same camera used to acquire the training images. No colour segmentation was used for the test set. The images were converted to greyscale and had a resolution of 640x480 pixels. The cascades ran concurrently and benefited from the fact that the same summed-area tables (SATs) were shared by all cascades at any angle (figure 5.3). A modified version of the OpenCV detector was used to accommodate more than one cascade running concurrently.

Figure 5.3 (diagram): a greyscale frame is converted to a summed-area table, which is passed to the concurrent cascades (Cascade 0, Cascade 12, Cascade 24, ..., Cascade 84, Cascade 90). If a window is classified as a hand, the rotation angle is returned.
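The idea of sharing one summed-area table among several cascades can be sketched as follows. This is a minimal single-scale illustration under stated assumptions, not the modified OpenCV detector: the function names are hypothetical, and mean_brightness_classifier is a stand-in for a trained cascade (it is not a Haar cascade). The point is that the table is built once per frame and every cascade, each with its own kernel size, reads from it.

```python
def summed_area_table(img):
    """Return an (H+1) x (W+1) table where sat[y][x] is the sum of all
    pixels above and to the left of (x, y)."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def rect_sum(sat, x, y, w, h):
    """Sum of the w x h rectangle at (x, y), using four table lookups."""
    return sat[y + h][x + w] - sat[y][x + w] - sat[y + h][x] + sat[y][x]


def mean_brightness_classifier(kw, kh, threshold):
    """Stand-in for a trained cascade (assumption): accepts a window whose
    mean intensity is at least threshold."""
    def classify(sat, x, y):
        return rect_sum(sat, x, y, kw, kh) >= threshold * kw * kh
    return classify


def detect(img, cascades, step=1):
    """cascades maps angle -> (kernel_w, kernel_h, classify).
    Returns a list of (angle, x, y) for every accepted window."""
    sat = summed_area_table(img)          # built once, shared by all cascades
    height, width = len(img), len(img[0])
    hits = []
    for angle, (kw, kh, classify) in cascades.items():
        for y in range(0, height - kh + 1, step):
            for x in range(0, width - kw + 1, step):
                if classify(sat, x, y):
                    hits.append((angle, x, y))
    return hits


# Toy 6x5 frame with a bright upright block two pixels wide, three high.
frame = [[0] * 6 for _ in range(5)]
for y in range(1, 4):
    for x in range(2, 4):
        frame[y][x] = 255

cascades = {
    0: (2, 3, mean_brightness_classifier(2, 3, 200)),
    90: (3, 2, mean_brightness_classifier(3, 2, 200)),
}
hits = detect(frame, cascades)
print(hits)  # prints: [(0, 2, 1)]
```

Only the upright cascade fires on the upright block, and the angle attached to the hit plays the role of the returned rotation angle. A practical detector would also scan several scales, which this sketch omits.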

5.3 Results and Discussion

In document Feature based rapid object detection : from feature extraction to parallelisation : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Sciences at Massey University, Auckland, New Zealand (Page 86-89)