
Chapter 2 Review of video-based facial feature detection methods

2.3 Remote camera-based systems

2.3.2 Passive face detection methods

The performance of most passive eye detection systems depends on reliable localization of the face region of interest (fROI) in an image. Passive fROI localization methods based on colour segmentation (Singh & Papanikolopoulos, 1999; Sirohey & Rosenfeld, 2001; Sirohey et al., 2002; Smith et al., 2003), difference imaging (Morris et al., 2002), and Haar-face detection algorithms were reviewed. The colour segmentation method localizes the fROI by locating and segmenting pixels that match a pre-defined facial skin colour. It is therefore likely to produce a high rate of false positive detections when objects of a similar colour are present in the background. The difference imaging method localizes the fROI in a video by detecting head motion, which is found by subtracting consecutive frames of a digital video (Morris et al., 2002). Any background movement is therefore likely to result in false positive detections. Changes in illumination also create motion artifacts in the difference image, which can be falsely identified as head movements. Both the skin colour segmentation and difference imaging methods are affected by varying background conditions and are better suited to constrained environments.
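The weakness of difference imaging under changing illumination can be illustrated with a minimal sketch. It is not the implementation of Morris et al. (2002); the function names, the threshold of 25 grey levels, and the tiny frames are all assumptions chosen for illustration.

```python
def difference_mask(prev_frame, curr_frame, threshold=25):
    """Mark pixels whose intensity changed by more than `threshold` (assumed value)."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def bounding_box(mask):
    """Return (left, top, right, bottom) of the changed pixels, or None if none changed."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, v in enumerate(row) if v]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# Hypothetical 4x6 grayscale frames: a bright "head" block shifts one pixel right.
prev_frame = [
    [10, 10, 10, 10, 10, 10],
    [10, 200, 200, 10, 10, 10],
    [10, 200, 200, 10, 10, 10],
    [10, 10, 10, 10, 10, 10],
]
curr_frame = [
    [10, 10, 10, 10, 10, 10],
    [10, 10, 200, 200, 10, 10],
    [10, 10, 200, 200, 10, 10],
    [10, 10, 10, 10, 10, 10],
]
motion_box = bounding_box(difference_mask(prev_frame, curr_frame))

# No motion at all, only a uniform brightening of 40 grey levels.
brighter_frame = [[v + 40 for v in row] for row in prev_frame]
illumination_box = bounding_box(difference_mask(prev_frame, brighter_frame))

print(motion_box)        # region around the moving block
print(illumination_box)  # the whole frame is flagged as "motion"
```

In the second case nothing has moved, yet every pixel exceeds the threshold, so the whole frame is reported as a candidate region.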

The Haar-face detection algorithm is an attractive choice for fROI localization due to its robustness and accuracy under varying illumination and background conditions (Lienhart et al., 2002). A Haar-face detection application is available under a free open source licence as part of the OpenCV project (OpenCV, 2001). The performance of the Haar-face detection algorithm is comparable to that of other face detection methods (Viola & Jones, 2001) and is being continually improved with the release of new versions of OpenCV. For these reasons, Haar-face detection was used for fROI localization in this project.

2.3.2.1 Face localization using Haar-object detection method

The Haar-object detection algorithm uses a trained object classifier to localize the object of interest in an image (Lienhart & Maydt, 2002; OpenCV, 2001). It is a generic algorithm that can be used to detect any object defined in the object classifier; in this project the object of interest is a face. For example, a face classifier that defines what a face should look like in an image can be used with the Haar-object detection algorithm to systematically search for the regions in the image that best fit the face classification. The algorithm returns the co-ordinates of the square regions within the image that are most likely to contain the object of interest.

The Haar-object detection algorithm was introduced by Papageorgiou et al. (1998), who used Haar-like features to form object classifiers. The method was improved by Viola & Jones (2001) to enable it to operate in real time for face detection with accuracy comparable to that of other face detection methods. Lienhart et al. (2002) further improved the algorithm by extending the Haar-like features and optimizing the object classifier training algorithm. The Haar-face detection application in OpenCV uses the cascade of face classifiers trained by Lienhart et al. (2002).

Training the object classifiers

The cascade of boosted object classifiers is trained by applying samples of positive and negative images to the modified Adaboost machine learning algorithm (Viola & Jones, 2001). The positive and negative samples are selected images with and without the object of interest, respectively. The images are scaled to the same size during training of the object classifier. The cascade of frontal face classifiers used in OpenCV was trained with 5000 positive and 3000 negative sample images (Lienhart et al., 2002). These images were taken from unconstrained environments with varying lighting and background conditions.
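The idea of boosting can be sketched with the plain discrete form of AdaBoost over threshold "stumps". This is a simplification and not the modified variant used to train the OpenCV cascade. Each sample here is reduced to a single hypothetical feature value, standing in for one Haar-like feature response, and the toy data and all names are assumptions.

```python
import math


def stump_predict(value, threshold, polarity):
    """Weak classifier: vote `polarity` when value >= threshold, else the opposite."""
    return polarity if value >= threshold else -polarity


def train_stump(values, labels, weights):
    """Return (weighted_error, threshold, polarity) of the lowest-error stump."""
    best = None
    for threshold in sorted(set(values)):
        for polarity in (1, -1):
            error = sum(
                w for v, y, w in zip(values, labels, weights)
                if stump_predict(v, threshold, polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(values, labels, rounds):
    """Train `rounds` weighted stumps; labels are +1 (object) or -1 (no object)."""
    n = len(values)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(values, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified samples, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(v, threshold, polarity))
            for v, y, w in zip(values, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boosted_predict(ensemble, value):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(value, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical feature responses and labels; no single threshold separates them.
feature_values = list(range(10))
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_round = adaboost(feature_values, labels, rounds=1)
three_rounds = adaboost(feature_values, labels, rounds=3)
errors_after_one = sum(
    boosted_predict(one_round, v) != y for v, y in zip(feature_values, labels)
)
errors_after_three = sum(
    boosted_predict(three_rounds, v) != y for v, y in zip(feature_values, labels)
)
print(errors_after_one, errors_after_three)
```

On this toy set a single stump misclassifies three samples, while the weighted vote of three stumps classifies all ten correctly, which is the effect boosting relies on.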

The sample images are used to form basic decision-tree object classifiers with a subset of Haar-like features. Haar-like features encode the contrasts exhibited by an object of interest and their spatial relationships in the image. Examples of a subset of Haar-like features that are used in OpenCV for face detection are shown in Figure 2-6. Haar-like features are an extension of the 2D Haar wavelet and encode the difference in average intensity between different regions of the image. The Haar-like features are calculated in a similar manner to the coefficients of the Haar wavelet transform (Papageorgiou et al., 1998), hence the name.

Figure 2-6. Extended Haar-like features used by Lienhart et al. (2002) for face classifier.

The object classifier encodes the shape, position, orientation, and scale of a particular subset of Haar-like features within a region of interest to represent the object in an image. Figure 2-7 shows an example of how Haar-like features are used by a face classifier to represent various features of the face (Viola & Jones, 2001). In this example, the horizontal and vertical Haar-like features are used to represent the difference in intensity between the darker regions of the eyes and the lighter regions of the upper cheeks and the bridge of the nose, respectively.

Figure 2-7. Example of Haar-like features used to capture differences in regional average intensity of the face (Viola & Jones, 2001).
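A two-rectangle Haar-like feature of the kind described above can be sketched as the difference between the mean intensities of two adjacent regions. The sketch uses an integral image (summed-area table), the device Viola & Jones (2001) use to obtain rectangle sums in constant time. The function names and the 4 x 4 patches are assumptions for illustration, not values taken from the OpenCV classifier.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over all rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y), from four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Mean of the lower half minus mean of the upper half of the rectangle.

    Large and positive when a dark band (e.g. eyes) lies above a bright band
    (e.g. upper cheeks). Assumes h is even.
    """
    half = h // 2
    upper = rect_sum(ii, x, y, w, half) / (w * half)
    lower = rect_sum(ii, x, y + half, w, half) / (w * half)
    return lower - upper


# Hypothetical patch: dark "eye" rows above bright "cheek" rows.
eye_cheek_patch = [
    [40, 40, 40, 40],
    [40, 40, 40, 40],
    [180, 180, 180, 180],
    [180, 180, 180, 180],
]
flat_patch = [[100] * 4 for _ in range(4)]

face_like_response = two_rect_feature(integral_image(eye_cheek_patch), 0, 0, 4, 4)
flat_response = two_rect_feature(integral_image(flat_patch), 0, 0, 4, 4)
print(face_like_response, flat_response)
```

The patch with a dark band above a bright band gives a strong response, while a uniform patch gives none. A classifier thresholds many such responses.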

The object classifiers are used to build more complex boosted classifiers using various boosting techniques (Lienhart et al., 2002). These boosted classifiers are then combined to form the stages of the cascade structure. The cascade of frontal face classifiers used in OpenCV is made up of 20 stages. A Haar-training application implemented within the OpenCV project allows various parameters to be adjusted when forming a cascade of boosted object classifiers (OpenCV, 2001). This training application applies the positive and negative sample images to the machine learning algorithms to form the cascade of boosted object classifiers.
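The cascade structure can be sketched as a list of stages that a candidate window must pass in order, with rejection as soon as one stage fails. The two stages below are hypothetical stand-ins for boosted classifiers (the OpenCV frontal face cascade has 20 stages), and the scores, thresholds, and names are assumptions.

```python
def run_cascade(stages, window):
    """Return (accepted, stages_evaluated); stop at the first stage that rejects."""
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, index
    return True, len(stages)


def contrast(window):
    """Hypothetical cheap first stage: overall intensity range of the window."""
    return max(window) - min(window)


def dark_then_bright(window):
    """Hypothetical second stage: mean of second half minus mean of first half."""
    half = len(window) // 2
    return sum(window[half:]) / half - sum(window[:half]) / half


# Each stage is (score function, minimum score needed to pass).
stages = [(contrast, 50), (dark_then_bright, 30)]

flat_window = [100] * 8                                      # no contrast
face_like_window = [40, 40, 40, 40, 180, 180, 180, 180]      # dark then bright
inverted_window = [180, 180, 180, 180, 40, 40, 40, 40]       # bright then dark

print(run_cascade(stages, flat_window))
print(run_cascade(stages, face_like_window))
print(run_cascade(stages, inverted_window))
```

Most background windows resemble the flat case and are discarded by the first, cheapest stage, so the later stages run only on the few windows that remain plausible.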

Haar-object detection algorithm

The Haar-object detection method uses the trained cascade of boosted object classifiers to detect the object in any given image. A search window is moved pixel-by-pixel over the entire image to search for the object of interest. The size of the search window is defined by the object classifier. At each pixel, the sub-image within the search window is either rejected at some stage of the cascade of object classifiers or accepted if it passes all of the stages. The classifier is designed so that it can be rescaled. To detect objects of unknown size, the image is scanned several times with the classifier at different scales, in increments of 10% of the classifier size. The Haar-face detection application returns either the coordinates of the square regions in the image most likely to contain the face or ‘0’ if no face is found in the image. The Haar-face detection application developed by Lienhart & Maydt (2002) produced only 24 false positive detections at a hit rate of 82.3% when applied to a frontal face test set of 510 different frontal faces in 130 grayscale images with 320 x 240 pixel resolution. The hit rate improved as the number of false positive detections was allowed to increase.
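The multi-scale scan described above can be sketched as a generator of candidate windows. Two assumptions are made here: the 10% increment is interpreted as a multiplicative factor of 1.1 per pass, and the base window size of 20 pixels is a hypothetical value. The function name and the small image are likewise illustrative.

```python
def scan_windows(image_w, image_h, base_size, scale_step=1.1, stride=1):
    """Yield (x, y, size) for every square search window at every scale.

    The window starts at `base_size` and grows by `scale_step` per pass
    until it no longer fits inside the image.
    """
    scale = 1.0
    while True:
        size = int(round(base_size * scale))
        if size > min(image_w, image_h):
            break
        for y in range(0, image_h - size + 1, stride):
            for x in range(0, image_w - size + 1, stride):
                yield x, y, size
        scale *= scale_step


# Hypothetical 24x24 image scanned with a 20x20 base window.
windows = list(scan_windows(24, 24, base_size=20))
sizes = sorted({size for _, _, size in windows})
print(len(windows), sizes)
```

Each yielded window would then be passed through the cascade, and the windows that survive every stage are the returned detections. Even this tiny image produces dozens of candidates, which is why early rejection in the cascade matters for real-time operation.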

In document Automated video based measurement of eye closure using a remote camera for detecting drowsiness and behavioural microsleeps (Page 36-40)