Face Description - Gender Classification Methodology

Gender Classification Methodology

2.3 Face Description

Once we have a preprocessed image containing the face area, the information provided by the face should be described so a learning algorithm can learn from it. This information is generally encoded as a set of numerical data (commonly referred to as features). There are many ways of describing the information contained in an image, most of which can be classified either as appearance-based or geometric-based approaches. In appearance-based approaches, the information extracted from the image is based on the values of the pixels in the image. These pixel values can be directly used as features or a transformation can be applied to them to obtain a face description in a different space. In geometric-based approaches, the information extracted is related to geometric characteristic of the face, that is, distances or ratios between particular face parts. For instance, geometric features could be the distance between the eyes, the eyebrows thickness or the ratio of mouth width to thickness of lips. Regarding the extension of the area described, we can have global or local features. Global features are those extracted taking into account the whole image at once, as opposed to local features which are extracted from certain regions of the image. Consequently, global descriptions provide information about the structure of the face (configural information) as well as information about the individual face parts (featural information), while local descriptions only provide information about isolated regions of the face (featural information).

As mentioned above, the description of an image consists in a set of features (numerical data) which is generally handled in the form of an array (known as feature vector ). Let D be the number of features (elements in such array), then a face is represented in a D−dimensional feature space. This could be the original representation space, or it could be a transformed space where the number of dimensions is usually smaller than D.

2.3.1 Grey Levels

The values of the pixels that form the face image can be directly used as features. In that case, the content of the image is described by means of the grey level values of its pixels. A

22 2.3 Face Description

1^stPC

2^ndPC

(a) Data represented in the original space.

1^stPC

2^ndPC

(b) Data represented in the PCA space.

Figure 2.7: Two-dimensional data drawn from two classes before and after applying PCA (PCA basis vectors shown in black).

description using grey levels consists in a vector whose elements are all the pixel values of the image read following a pre-defined order. In the experiments, the process starts from the top left corner pixel to the right and downwards. Even though this seems a very simple and straightforward descriptor, it has successfully been used for addressing face analysis problems, among them gender classification [39].

2.3.2 Principal Component Analysis

Principal Component Analysis (PCA) is a widely adopted technique for transforming the representation space [32]. By using PCA, our goal is to obtain a new representation space that best describes the variance of our data. Additionally, PCA can also be applied to reduce the dimensionality of the feature space. PCA analyses the data, which possibly consist of several correlated features, and searches for a new space whose basis vectors correspond to those directions in the original space with maximum variance. PCA could be thought of as a procedure for revealing the internal structure of the data in a way that best explains its variance. Figure 2.7 shows some data represented in the original space (Figure 2.7(a)) and the PCA space (Figure 2.7(b)) with the basis vectors of the PCA space in black. As can be seen, the greatest variance is given by the first component and the second component provides the greatest possible variance while being orthogonal to the previous component. This will continue to further components in a higher dimensional data. In PCA, lower components always carry more variance than higher ones. Consequently, selecting those components that account for a reasonably high percentage of the variance, we would reduce the dimensionality of the representation space without much loss of information.

In Figure 2.7(b), keeping only the first component would result in a 1-dimensional space where it would be possible to discriminate between both classes.

Chapter 2 ^V Gender Classification Methodology 23

Figure 2.8: The first 20 eigenfaces, that is, the first 20 eigenvectors showed as images. The PCA was calculated with images from FERET database.

Formally, let W be a linear transformation that maps the original D−dimensional space onto a F −dimensional space where D ≪ F . Then, new feature vectors are defined by yi = W^Tx_i where xi ∈ ℜ^D and yi ∈ ℜ^F. The columns in W are the eigenvectors ei

obtained from solving λiei = Aei, where A is the covariance matrix and λiis the eigenvalue associated to the eigenvector e_i. Before obtaining the eigenvectors, we should subtract the average of all vectors to each of the vectors to ensure that the mean of the data is zero.

In our classification system, the transformation W is computed from the training data and then new training and test sets are obtained by applying the transformation to the original data. Usually, only those features containing a certain percentage of the variance of the data (typically, 95% or 99%) are kept.

In face analysis, this method has been widely used, mainly for face recognition [57, 64, 50], but also for gender classification [37], leading to very good performances in both cases.

When the original data are face images, if the eigenvectors are shown as images, we see that they could be identified as faces (in the literature they are referred to as eigenfaces).

Some eigenfaces are shown in Figure 2.8. When calculating the PCA transformation, we are looking for a linear combination of these eigenfaces that produces the input face image.

24 2.3 Face Description it is its grey level value.

(b) If the pixel-value is higher than the centre’s, number 1 is assigned to that pixel, else number 0.

(c) The LBP value associated to the central pixel is 01111000 which is obtained by reading the binary values following the direction of the arrow.

Figure 2.9: Computation of LBP⁸,R.

2.3.3 Local Binary Patterns

Local Binary Patterns (LBPs) were originally defined to characterise image textures [51].

More recently, they have been used as face descriptors [2] which provide information about the texture of the face.

An LBP value associated to a given pixel is a binary number which is calculated consid-ering a neighbourhood around that pixel. To create this binary number, all neighbours are given either value 1, if they are brighter than the central pixel, or value 0 otherwise. The values assigned to the neighbours are read sequentially in the clockwise direction to form the binary pattern which characterises the central pixel. An example of the computation of LBP values is depicted in Figure 2.9. It is worth mentioning that LBPs do not provide information about contrast since the magnitude of the grey level differences is ignored.

To deal with textures at different scales, LBP can use neighbourhoods of different radii.

A local neighbourhood is defined as a set of sampling points spaced in a circle centred at the pixel to be labelled. Hence, the radius of the neighbourhood indicates how far from the centre the pixels considered are. The notation LBPP,Rrefers to LBPs with a neighbourhood of P sample points on a circle of radius R. Figure 2.10 shows which pixels are considered with neighbourhoods of various radii. In case a sample point does not fall in the centre of a pixel, a bilinear interpolation¹ is used.

The LBP operator can be improved by using the so-called uniform LBP [52]. A uniform pattern is one that has at most two one-to-zero or zero-to-one transitions in the circular binary code. The total number of uniform LBP values (LBP^u_P,R), when considering a neighbourhood of 8 points, is 58. Although the number of patterns is significantly reduced from 256 (all possible LBP_8,R) to 58 (LBP_8,R^u ), this smaller set of uniform patterns provides

1In image processing, a bilinear interpolation considers the four pixels of known values surrounding the location of the unknown point. Then, a weighted average of the known values is computed to obtain the interpolated value. The weights are based on the distances from the unknown point to the known pixels.

Chapter 2 ^V Gender Classification Methodology 25

(a) Radius R = 1 (b) Radius R = 2 (c) Radius R = 3

Figure 2.10: LBP8,R neighbourhoods for different values of R.

the majority of patterns of texture, sometimes over 90% [51].

As it has been described to this point, the LBP operator provides a representation sensitive to rotation because several different codes can be obtained depending on which is the first neighbour considered when creating the binary code. In Figure ?? the starting pixel is the one on top, but it could be any of them as long as they are read in a clockwise direction. In order to obtain rotationally invariant LBPs [52], all possible binary numbers that can be obtained by starting the sequence from all neighbours in turn are considered.

Then, the smallest of the constructed numbers is chosen. If the object (in our case, the face) is slightly inclined in the image, the rotation invariant uniform LBP (LBP_P,R^ri,u) is supposed to provide a description which is not affected by that inclination.

Normally, a representation based on LBPs consists in a histogram where the LBP val-ues obtained for each of the pixel in the image (or area of interest) are accumulated. In the histogram, the bins show how many times the corresponding LBP codes have been produced. The number of bins depends on the type of LBP employed. In the literature, the most common number of sample points (neighbours) is P = 8. In such case, if uniform sensitive to rotation LBPs are used, the histogram has 59 bins, that is, 58 possible LBP_8,R^u values and an extra bin for accumulating all the non-uniform LBPs obtained. When using rotation invariant uniform LBP, the number of bins is reduced to 10 bins, since there are 9 possible LBP_8,R^ri,u codes and an extra bin for all the other codes.

2.3.4 Local Contrast Histogram

Describing an image using the grey level values of its pixels implicitly provides information about the contrast. However, shifts in the grey scale, which could be due to changes in illumination, strongly affect such type of features. For instance, two face images of the same person could seem quite different just because one is much darker than the other. In cases where illumination cannot be controlled, Local Contrast Histogram (LCH) is a more suitable descriptor, since it provides information about the contrast while being invariant against shifts in the grey scale. LCHs are usually combined with LBPs to overcome the lack of contrast information of the latter.

26 2.3 Face Description it is its grey level value.

• Brighter pixels add up to 326 and darker ones to 192.

• The contrast value (C) of the central pixel is:

C= ¹₄× 326 −¹4× 192 C≈ 115

Figure 2.11: Computation of LCH8,R.

LCH provides information of the contrast of local areas. For this purpose, neighbour-hoods are defined in a similar way as for LBPs. Then, the contrast value associated to the central pixel is calculated taking into account all its neighbours. To compute the local contrast of a pixel, the average of the grey level values of those neighbours that are brighter than that pixel is subtracted from the average of the grey level values of the darker ones.

An example of the computation of local contrast values is depicted in Figure 2.11.

More formally, let P be the number of sample points in the neighbourhood and g(i) be the grey level value of the i^th pixel in the neighbourhood. The contrast around the central pixel, whose grey level value is g_c, is,

C= 1

Finally, the local contrast values of all neighbourhoods are accumulated in a histogram to obtain the LCHP,R descriptor. This notation means that the neighbourhood used has P sample points on a circle of radius R. The number of possible contrast values can be different for each image, therefore the number of bins in the LCH should be set beforehand.

In order to make LCHs directly comparable to LBPs, two versions of LCH are considered.

In one version, the number of bins is set to 10 (for comparisons with rotation invariant LBPs) and, in the other, it is set to 59 (for comparisons with sensitive to rotation LBPs).

Chapter 2 ^V Gender Classification Methodology 27

In document Face gender classification under realistic conditions. Dealing with neutral, expressive and partially occluded faces (Page 51-57)