Find Head Method - Normalising Method - South African sign language recognition using feature v

3.3 Normalising Method

3.3.1 Find Head Method

The Find Head Method is important for three reasons. First, we need to determine if a signer is present in the video. Second, the coordinates of the signer’s position within the video frame is needed in the Centring Method. Third, the dimensions of the signer’s head are essential in the Construct Grid Method.

To find the head in a given frame, the method described by Viola and Jones (Viola- Jones method) is used [52, 53]. The OpenCV library’s implementation of the Viola Jones method is used by our system [54–56]. The Viola-Jones face detector method uses a Haar Cascade classifier. Viola and Jones combined four fundamental concepts in the face detector method:

• Haar features,

• An integral image for rapid feature detection, • The AdaBoost machine-learning method, and • A cascade classifier to combine many features.

The method is based on Haar wavelets. Haar wavelets are single wavelength square waves, i.e one high interval and one low interval. An example of these Haar wavelets used in the original Viola-Jones cascade is shown in Figure 3.12. A square wavelet is a

Figure 3.12: _{The first two Haar features in the original Viola-Jones cascade [53].}

pair of adjacent rectangles, one light and the other dark. The Haar feature is determined by differencing the average dark region pixel value from the average light region pixel

value. Using a threshold determined in the training phase, the resulting value determines the presence or absence of a Haar feature. This produces many Haar features at every image location over multiple scales. To find all the Haar features in an image efficiently, Viola and Jones implemented an integral image technique. The integral value for each pixel is the sum of all the pixels to its left and above it. The entire image can be integrated with a few operations per pixel.

For the training method Viola and Jones used a machine learning method called Ad- aBoost. The idea behind AdaBoost is to combine many weak classifiers to create a

strong one. A weak classifier is defined as a classifier that correctly recognises a feature more often than random guessing would. By combining many weak classifiers, and each classifier ultimately boosting the final outcome, a strong classifier is built. AdaBoost first selects a set of weak classifiers to combine, then each classifier is weighted. The resultant strong classifier consists of a weighted combination of classifiers.

Viola and Jones used an efficient filter chain, shown in Figure 3.13, to classify image regions. Each filter consists of one Haar feature. If a region at any moment fails to pass through the cascade, it is classified as non-face, and the next region is applied to the cascade. The order of filters is determined by the AdaBoost assigned weights. The heavier weighted filters are assigned first in the cascade, in order to eliminate regions as quickly as possible. If a region passes through all of the filters without failing, it is considered to contain a face region. The OpenCV implementation of the Viola-Jones face

Figure 3.13: _{The classifier cascade is a chain of single feature filters that determines} if a region is a face or not.

detector method comes trained with a frontal face pose classifier. We apply the Viola- Jones method to all images in the video sequence. An example of the face detector output is shown in Figure 3.14. Any frames that do not contain a face are discarded. Once the face region is identified, the region is grey-scaled and the Marr/Hildreth and

Figure 3.14: _{A correctly classified face using the Viola-Jones face detector method.}

Canny edge detection algorithm is applied to find the outline of the face [57]. The edge detection method applies a 7 × 7 convolution mask to every pixel and analyses each surrounding pixel’s intensity. A sharp change in intensity, most likely indicates an edge. The entries of the mask are calculated using a variation of the Gaussian equation show in Equation 3.10. ▽2g(x, y) = 1 σ2(2 − ( x2+ y2 σ2 )(e −x2+y2 σ2 )) (3.10)

An example of an image after the edge detection algorithm has been applied is shown in Figure 3.15. The dimensions of the head (in pixels) are measured by our system. The

Figure 3.15: _{Marr/Hildreth and Canny edge detection applied to the face region} classified by Viola-Jones face detector.

height and width dimensions of the head in 3.15 are measured to be 84 and 63 pixels respectively, with the x, y coordinates of the center of the head being x = 158, y = 75. These dimensions are forwarded to the Construct Grid method for further use. The coordinates of the detected face are sent to the Centring method to centre the signer in the frame. In addition, the system averages the colour of the original image within the edge detected regions, producing an average skin model of the signer. This skin information is sent to the Feature Extraction method.

In document South African sign language recognition using feature vectors and Hidden Markov Models (Page 43-45)