Neural Networks and Deep Learning - Overview of 3D Face Recognition Algorithms

Chapter 2 Literature Review

2.2 Overview of 3D Face Recognition Algorithms

2.2.9 Neural Networks and Deep Learning

The traditional hierarchical approaches discussed above are widely used to solve pattern recognition problems and are known as ‘shallow’ methods [106]. Many classic 3D face recognition strategies suffer from a generalisation problem, because 3D captures from different data capture devices may have different characteristics. For example, data captured by laser scanners is of high quality and preserves facial detail well. In comparison, facial information obtained from photometric stereo is likely to suffer from errors introduced by integration during reconstruction, and the captures are also quite smooth, so some fine details may be lost. Feature extraction algorithms may therefore need different implementations or further development for different kinds of data, which makes it hard to design a generalised algorithm that suits all data types. Neural networks have been proposed to address this problem, and [107] provides a literature review of their applications in terms of Probabilistic Neural Networks [108], Radial Basis Function neural networks [109], and convolutional neural networks [110]. However, these traditional algorithms suffer from a high computational cost for training. In recent years, with the development of GPU parallel computing and cloud computing, deep learning algorithms have been applied to face recognition, aiming to address the shortcomings of the shallow methods. In contrast to the shallow algorithms, a deep learning architecture performs feature extraction, selection and classification in each trainable unit. The deep learning strategy resembles automatic feature extraction and selection in the human brain, which uses different layers of neurons to learn features.

However, the drawbacks of such approaches are also apparent, and some research and application problems remain open. In general, deep learning algorithms are much slower to compute than shallow algorithms. Using an inappropriate number of neurons may result in over-fitting or under-fitting, and it is often hard to find layer parameters that produce good training results. Finally, some deep learning architectures achieve good recognition performance only on specific datasets.

2.3 3D Face Recognition under Expressions

The human face, whether in a 2D or a 3D representation, has a high probability of exhibiting different kinds of expressions. It is hard to constrain all facial captures to a neutral expression, or to the same expression, in realistic biometric application scenarios. In fact, compared to other variations caused by pose or occlusions, facial expressions result in facial muscle movements that may significantly deform the face surface. This causes problems for feature space manipulation and reduces both the within-class similarity and the between-class scatter of the subjects. Therefore, matching images with different expressions in the probe and gallery is one of the most challenging problems in 3D face recognition, and the development of recognition techniques that are relatively robust to expression variations has been the subject of much research over the past decades.
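The effect on class separability can be illustrated with a minimal sketch that computes the within-class and between-class scatter of a single scalar feature. The function name `scatter_1d` and the toy feature values are hypothetical and are not taken from any of the cited works. The example only shows how larger within-class variation, such as that introduced by expressions, lowers the ratio of between-class to within-class scatter.

```python
from statistics import mean


def scatter_1d(samples_by_subject):
    """Return (within_class, between_class) scatter for one scalar feature.

    samples_by_subject maps a subject label to a list of feature values.
    """
    all_values = [v for values in samples_by_subject.values() for v in values]
    global_mean = mean(all_values)
    within = 0.0
    between = 0.0
    for values in samples_by_subject.values():
        class_mean = mean(values)
        # Spread of each subject's samples around that subject's own mean.
        within += sum((v - class_mean) ** 2 for v in values)
        # Spread of the subject means around the global mean.
        between += len(values) * (class_mean - global_mean) ** 2
    return within, between


# Hypothetical feature values: neutral captures cluster tightly per subject,
# while expressive captures spread each subject's samples out.
neutral = {"A": [1.0, 1.1, 0.9], "B": [2.0, 2.1, 1.9]}
expressive = {"A": [1.0, 1.6, 0.4], "B": [2.0, 2.6, 1.4]}

sw_neutral, sb_neutral = scatter_1d(neutral)
sw_expressive, sb_expressive = scatter_1d(expressive)
print("separability (neutral):   ", sb_neutral / sw_neutral)
print("separability (expressive):", sb_expressive / sw_expressive)
```

In this toy setting the subject means are unchanged, so the between-class scatter stays the same and the drop in separability comes entirely from the increased within-class scatter.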

Many 3D face recognition algorithms have been proposed to address expression variations, and Smeets et al. provide a comparative study of 3D face recognition under expression variations [111]. In general, expression-robust 3D face recognition algorithms can be categorized into two groups: rigid and non-rigid strategies. Rigid approaches aim to identify facial patches that remain constant under expressions, for example the 3D nasal region. The forehead is also considered a relatively expression-robust region [5], even though it suffers more from hair occlusions. In contrast, the non-rigid strategy attempts to transform facial captures with expressions to the neutral face using a deformable model.

For the rigid strategy, one effective method is to select relatively stable structures on the face for expression-robust discriminative feature extraction. Wang et al. proposed using shape difference boosting to evaluate the deformations caused by expressions in facial patches and to select facial regions that remain stable under expressions [8]. The Bosphorus database is used for facial patch selection and the FRGC v2.0 database is used for recognition performance evaluation. In a similar manner, Li et al. divide the whole face into rectangular patches at different scales and select the most expression-robust patches using a weighted sparse representation [6]. The features for each patch are extracted from the depth and the three components of the surface normals. Mian et al. investigate the recognition performance of the whole face and show that the nasal region and forehead are the most robust structures under expressions [5].
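The general idea of ranking facial patches by how much they deform under expression can be sketched as follows. This is not shape difference boosting [8] or the weighted sparse representation of [6]; it is a deliberately simple stand-in that scores each square patch by the mean absolute depth difference between a neutral and an expressive depth map. The function names, the patch size and the toy 4x4 depth maps are all hypothetical.

```python
def patch_deformation(neutral, expressive, top, left, size):
    """Mean absolute depth difference inside one square patch."""
    total = 0.0
    for r in range(top, top + size):
        for c in range(left, left + size):
            total += abs(neutral[r][c] - expressive[r][c])
    return total / (size * size)


def rank_patches(neutral, expressive, size):
    """Return [(deformation, (top, left)), ...] from most to least stable."""
    rows, cols = len(neutral), len(neutral[0])
    scores = []
    # Non-overlapping patches tiled over the depth map.
    for top in range(0, rows - size + 1, size):
        for left in range(0, cols - size + 1, size):
            score = patch_deformation(neutral, expressive, top, left, size)
            scores.append((score, (top, left)))
    return sorted(scores)


# Toy depth maps: only the lower half (standing in for the mouth area) deforms.
neutral = [[10.0] * 4 for _ in range(4)]
expressive = [row[:] for row in neutral]
expressive[2] = [12.0, 12.0, 13.0, 13.0]
expressive[3] = [12.0, 12.0, 13.0, 13.0]

ranking = rank_patches(neutral, expressive, size=2)
stable = [position for score, position in ranking if score == 0.0]
print(ranking)
print("stable patches:", stable)
```

A multi-scale variant would simply repeat the ranking for several values of `size`; in practice the scores would be accumulated over many neutral and expressive capture pairs rather than a single pair.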

Instead of selecting facial patches, the nasal region, which is considered to be a relatively rigid region of the human face, has been widely utilized in 3D face recognition in recent years. The nasal region has been shown to be more consistent under natural expressions and occlusions. Chang et al. matched multiple overlapping regions covering the 3D nose and its surroundings and obtained good recognition performance [4]. Wang et al. also explored nasal and surrounding regions of different sizes by changing the radius of a sphere centred on the nose tip, which indicated that the performance of the nasal region is equal to that of the whole face [8]. Ballihi et al. found that circular curves around the nasal region produce better recognition performance than other curves [80]. All of these results show the potential of the nasal region as a source of discriminative features and its significant contribution to face recognition. Some algorithms that extract features from the nasal region for expression-invariant face recognition are discussed in Section 2.4.1.
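Cropping a region with a sphere centred on the nose tip, as described above, can be sketched in a few lines. The function name `crop_sphere`, the toy point coordinates and the radii are hypothetical, and the nose tip is assumed to have been located already.

```python
import math


def crop_sphere(points, nose_tip, radius):
    """Keep the 3D points that lie within `radius` of the nose tip."""
    return [p for p in points if math.dist(p, nose_tip) <= radius]


# Toy point cloud (units assumed to be millimetres).
points = [
    (0.0, 0.0, 0.0),
    (10.0, 0.0, 0.0),
    (0.0, 30.0, 5.0),
    (60.0, 0.0, 0.0),
]
nose_tip = (0.0, 0.0, 0.0)

# Growing the radius extends the crop from the nose to its surroundings.
counts = {radius: len(crop_sphere(points, nose_tip, radius))
          for radius in (20.0, 40.0, 80.0)}
print(counts)
```

Varying the radius in this way gives a family of nested regions, so recognition performance can be compared between the nose alone and progressively larger parts of the face.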

Using a non-rigid strategy, Kakadiaris et al. employed a deformable model framework and a pyramid transformation to address expression variations [39]. With the deformable model framework, all 3D facial captures are accurately aligned to fit the model, even in the presence of facial expressions. The 3D difference between the input capture and the facial model is saved as a 2D geometry image, which is then transformed to the wavelet domain. The pyramid transformation is applied to handle the position and rotation changes caused by expressions, as the pyramid wavelet is translation and rotation invariant. To make the features less sensitive to expressions, it has also been proposed to apply the ICP algorithm to the surface normal vectors, instead of the depth representation, both for face registration and for finding corresponding points between a generic reference face and each input face [20].

Inspired by the introduction of sparse representation [74] for face recognition, Deng et al. proposed addressing the problem of expression variations by artificially creating expressions for each capture [105]. The additional expression variations, as well as other variations in terms of illumination and occlusion, are learnt from the given database. The newly added captures increase the number of training samples for each subject, which benefits the classification performance of sparse representation, as the originally proposed method fails if the training samples are insufficient.
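The dependence of sparse representation on the training samples can be seen in a minimal sketch of the classification step: a probe is coded as a sparse combination of all training captures and assigned to the subject whose captures give the smallest reconstruction residual. The sketch below is an assumption-laden illustration and not the method of [74] or [105]. It uses a plain ISTA solver for the l1-regularised least-squares problem, and the function names, the regularisation weight `lam`, the iteration count and the three-dimensional toy features are all hypothetical.

```python
import math


def normalise(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def ista(atoms, y, lam=0.01, iters=500):
    """Approximately minimise 0.5*||y - A x||^2 + lam*||x||_1 with ISTA.

    `atoms` is the list of dictionary columns of A, one per training capture.
    """
    # Step size 1/L, where L is bounded by the squared Frobenius norm of A.
    step = 1.0 / sum(a_i * a_i for a in atoms for a_i in a)
    x = [0.0] * len(atoms)
    dim = len(y)
    for _ in range(iters):
        recon = [sum(x[j] * atoms[j][i] for j in range(len(atoms)))
                 for i in range(dim)]
        resid = [recon[i] - y[i] for i in range(dim)]
        for j, atom in enumerate(atoms):
            grad = sum(atom[i] * resid[i] for i in range(dim))
            z = x[j] - step * grad
            # Soft-thresholding keeps the coefficient vector sparse.
            x[j] = math.copysign(max(abs(z) - step * lam, 0.0), z)
    return x


def classify(atoms, labels, probe):
    """Return (best_label, residuals) using class-wise reconstruction error."""
    y = normalise(probe)
    x = ista(atoms, y)
    residuals = {}
    for label in set(labels):
        recon = [sum(x[j] * atoms[j][i]
                     for j in range(len(atoms)) if labels[j] == label)
                 for i in range(len(y))]
        residuals[label] = math.sqrt(
            sum((y[i] - recon[i]) ** 2 for i in range(len(y))))
    return min(residuals, key=residuals.get), residuals


# Toy dictionary: two training captures for each of two subjects.
atoms = [normalise(v) for v in
         ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1])]
labels = ["s1", "s1", "s2", "s2"]

predicted, residuals = classify(atoms, labels, [1.0, 0.05, 0.0])
print(predicted, residuals)
```

Because each subject is represented only by the span of its own training captures, a probe whose expression is not covered by those captures yields a large residual even for the correct subject, which is why adding synthesised expression variations to the dictionary can help.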

In document 3D Face Recognition Using Multicomponent Feature Extraction from the Nasal Region and its Environs (Page 48-50)