The Training Process - Feature based rapid object detection : from feature extraction to parall

The training process started by choosing a number of images for the positive images set and the negative images set. The division, simple in principle, is in itself subject to errors on both sets due to alignment problems and unexpected similarities between the object and some negative images.

Positive images were gathered from the FERET database (Phillips et al., 2000). Negative images were gathered from the web, from images acquired with the web camera and from image libraries such as the CorelDraw image dataset. The negative images set needed to be checked for the existence of any object that could have an impact on the training process. False positive objects in the negative images may cause the training process to discard good features. This looks like a simple problem, but errors may appear due to resolution changes in the original set. Figure 4.2 shows an example where part of the background at a lower resolution was wrongly classified as a face by an earlier version of the face classifier. The only way to check for false positives is by searching the negative images set manually.

The next step of the training process was the calculation of the Haar-like features over the positive and negative image examples. The total number of features per frame may far surpass the total number of pixels, making the computation very intensive. Because of this cost, all the training images were reduced to a lower resolution. In these experiments, all the images had a resolution of 24x24 pixels. The classifiers were then trained using the stump-based implementation of AdaBoost available in OpenCV (Bradski, 2002).
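To make the claim about feature counts concrete, the following minimal sketch enumerates every position and scale of five basic upright Haar-like feature shapes inside a 24x24 window. The choice of these five base shapes and the function name are assumptions made for illustration. The experiments here used the complete set including tilted features, so the figure below is only a lower bound on the number actually evaluated.

```python
# Count the upright Haar-like features that fit in a square training window.
# Assumption: five basic base shapes (width, height) in pixels; the extended
# set with tilted features used in the experiments is larger.
BASE_SHAPES = ((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))


def count_upright_features(window=24, base_shapes=BASE_SHAPES):
    total = 0
    for base_w, base_h in base_shapes:
        # Every integer multiple of the base shape that still fits.
        for w in range(base_w, window + 1, base_w):
            for h in range(base_h, window + 1, base_h):
                # Number of top-left positions for a w x h rectangle.
                total += (window - w + 1) * (window - h + 1)
    return total


pixels = 24 * 24
features = count_upright_features(24)
print(pixels, features)
```

Even with this restricted set, the count for a 24x24 window is several hundred times the 576 pixels it contains, which is why the training images were kept so small.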

Figure 4.2: Example of a false positive object within the negative set.

Three training experiments were carried out. In the first one, a web camera was used to acquire images of a single person with and without occlusions and shadows (as an example, see the first image to the left of figure 4.3). In the second experiment, FERET images were used, all quasi-frontal faces without any occlusions. In the third experiment, modified FERET frontal face images were used to train a classifier with occlusions. Gentle AdaBoost with the complete set of Haar-like features (upright and tilted) was used in all experiments. The classifiers produced by these training experiments had 30 layers each. Table 4.1 shows the parameters used in the training processes.

Table 4.1: The training parameters for the classifiers.

| Parameter | Classifier 1 | Classifier 2 | Classifier 3 |
| Number of positive images | 976 | 4767 | 13566 |
| Training images | web-cam, 1 person | FERET | FERET |
| Presence of occluded images? | yes | no | yes |
| Number of negative images | 1134 | 1134 | 1134 |
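Each trained classifier is a cascade of layers built from boosted stumps. The sketch below shows, under stated assumptions, how such a cascade might evaluate one sub-window: every stump contributes one of two real values depending on a feature threshold, each layer sums its stumps and compares the sum with a layer threshold, and the sub-window is rejected at the first layer that fails. The dictionary layout, the field names and the toy numbers are hypothetical and do not reproduce OpenCV's internal format; feature values are assumed to be already computed.

```python
# Hypothetical in-memory layout for a cascade of boosted stumps.
# Each stump: which feature to read, a threshold, and two output values.
# Each layer: a list of stumps plus a layer threshold.

def layer_passes(layer, feature_values):
    score = 0.0
    for stump in layer["stumps"]:
        value = feature_values[stump["feature"]]
        # Real-valued stump output, as produced by Gentle AdaBoost.
        score += stump["left"] if value < stump["threshold"] else stump["right"]
    return score >= layer["threshold"]


def cascade_accepts(layers, feature_values):
    # all() stops at the first failing layer, so most sub-windows
    # are rejected after evaluating only a few stumps.
    return all(layer_passes(layer, feature_values) for layer in layers)


# Toy two-layer cascade (illustrative numbers only).
toy_cascade = [
    {"threshold": 0.0,
     "stumps": [{"feature": 0, "threshold": 10.0, "left": -1.0, "right": 1.0}]},
    {"threshold": 0.5,
     "stumps": [{"feature": 1, "threshold": 5.0, "left": 1.0, "right": -1.0},
                {"feature": 2, "threshold": 0.0, "left": -0.5, "right": 0.5}]},
]

print(cascade_accepts(toy_cascade, [12.0, 3.0, 1.0]))
print(cascade_accepts(toy_cascade, [4.0, 3.0, 1.0]))
```

The classifiers described here would have 30 such layers rather than two.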

In the first experiment, 976 images of a person were acquired by a web camera at an initial resolution of 352x288 pixels. Random portions of the face were manually occluded with background pixels obtained randomly from images that did not contain any face. The marking process was manual, according to the process described in section 3.1.3. The resulting classifier of this training process was called Classifier 1.

The second experiment used 4767 FERET faces for the positive images set. This classifier was trained mainly for reproducibility and to compare its false detection rates with those of the other two classifiers. It was expected that this classifier would not be as generic as the original OpenCV sample classifier, because it was trained with frontal faces only. The resulting classifier of the second training process was called Classifier 2.

In the third experiment, 1938 of the FERET frontal images were partially occluded with random pixels instead of background pixels. By providing images with occlusions, one would expect the AdaBoost algorithm to yield a more generic classifier, because partial occlusion and shading effects would be regarded as part of the object. A hit would be expected whenever at least half of the face was visible. The position of each face was marked as before. Based on the positions and sizes found in this step, 6 different partial occlusions were calculated, varying from one quarter of the area of the sub-window up to half of the area. As the objective was to produce final 24x24 pixel images for the training process, each image was covered by either a 12x12 or a 12x24 occlusion patch. The pixel values of every patch were drawn at random from 0 to 255. The initial set of 1938 images therefore yielded a total of 13566 positive examples (each original plus its six occluded copies, 1938 x 7). Figure 4.3 shows an example of how the six additional positive images were created. It is important to stress that images of this person were not used in the training process; only images belonging to the FERET database were used as part of the positive set. The resulting classifier of the third training process was called Classifier 3.

Figure 4.3: The occlusion process creates 6 additional positive examples for each frontal face.
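The occlusion step for the third experiment can be sketched as follows. The text fixes only the patch sizes (12x12 and 12x24 on a 24x24 image) and the number of occluded copies (six); the particular layout used here, four quadrants plus the left and right halves, is an assumption, as are the function names and the plain list-of-rows image representation.

```python
import random

SIZE = 24  # final training resolution in pixels

# (x, y, width, height) of each occlusion patch.
# Assumed layout: four 12x12 quadrants and two 12x24 halves.
PATCHES = [
    (0, 0, 12, 12), (12, 0, 12, 12), (0, 12, 12, 12), (12, 12, 12, 12),
    (0, 0, 12, 24), (12, 0, 12, 24),
]


def occlude(face, patch, rng):
    """Return a copy of `face` with `patch` filled by random grey levels."""
    x0, y0, width, height = patch
    out = [row[:] for row in face]
    for y in range(y0, y0 + height):
        for x in range(x0, x0 + width):
            out[y][x] = rng.randint(0, 255)  # inclusive on both ends
    return out


def expand_face(face, rng=None):
    """Return the original face followed by its six occluded copies."""
    rng = rng or random.Random()
    return [face] + [occlude(face, patch, rng) for patch in PATCHES]


face = [[128] * SIZE for _ in range(SIZE)]  # placeholder for a real face crop
variants = expand_face(face, random.Random(0))
print(len(variants), 1938 * len(variants))
```

With seven images per face, 1938 source images give the 13566 positive examples listed in table 4.1.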

The number of negative images stated in table 4.1 is only an indication of how many negative images were used in each layer. The total number of negative images used depended on each layer's result. Initially, sub-windows of 24x24 pixels were used to train the first layer. As the training proceeded, new negative examples were acquired from the same set by scanning it for new sub-windows that were classified as positives.
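The way negative examples are replenished between layers can be illustrated with a short sketch. It slides a 24x24 window over each negative image and keeps the sub-windows that the classifier trained so far still accepts. The function name, the `accepts` callable, the fixed step and the single-scale scan are assumptions for illustration; the scanning done by the actual training tool may differ.

```python
def mine_false_positives(negative_images, accepts, needed, window=24, step=4):
    """Collect up to `needed` sub-windows that `accepts` wrongly classifies
    as positives. Images are lists of rows of grey levels."""
    mined = []
    for image in negative_images:
        height, width = len(image), len(image[0])
        for y in range(0, height - window + 1, step):
            for x in range(0, width - window + 1, step):
                patch = [row[x:x + window] for row in image[y:y + window]]
                if accepts(patch):
                    mined.append(patch)
                    if len(mined) == needed:
                        return mined
    return mined


# Toy stand-in for the partially trained cascade: accept bright patches.
def toy_accepts(patch):
    return sum(map(sum, patch)) > 200 * 24 * 24


# 48x48 negative image with one bright 24x24 corner.
image = [[255 if (x < 24 and y < 24) else 0 for x in range(48)]
         for y in range(48)]

hard_negatives = mine_false_positives([image], toy_accepts, needed=10)
print(len(hard_negatives))
```

Because only sub-windows that survive the existing layers are kept, each new layer is trained on progressively harder negatives, and the number of images scanned to fill the quota grows with depth.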

In document Feature based rapid object detection : from feature extraction to parallelisation : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Sciences at Massey University, Auckland, New Zealand (Page 77-79)