2017 2nd International Conference on Software, Multimedia and Communication Engineering (SMCE 2017) ISBN: 978-1-60595-458-5
Face Identification Using LBP&LDA Methods
Muhammad Salman, Jun LI*, Muhammad Faheem Khakhi and Hassan Hayat
Nanjing University of Science and Technology, Nanjing, China
*Corresponding author
Keywords: LBP, LDA, Template matching, Image processing, Facial recognition.
Abstract. In this paper, we evaluate the feasibility of using a database for driver identification in near-IR videos by means of face recognition techniques. Two texture-based methods are presented to solve this task. The first method uses local binary patterns (LBP) combined with linear discriminant analysis (LDA) to identify subjects, while the second method investigates brute-force template matching for identification. The LBP method is fast, whereas the template matching method proved to be very time-consuming. Future refinements and extensions are proposed for both methods.
Introduction
Automated face recognition is a comparatively new concept, first developed in the 1960s. The first semi-automated system required an operator to locate features such as the eyes, ears, nose, and mouth on the photographs before it computed distances and ratios to a common reference point, which were then compared to reference data. In the 1970s, Goldstein, Harmon, and Lesk [1] used 22 specific features, such as hair color and lip thickness, to automate the recognition. The problem with this approach was that the measurements and locations were still computed manually. In 1987, Kirby and Sirovich [2] applied principal component analysis (PCA), a standard linear algebra technique, to the face recognition problem. Today, face recognition technology is mostly used to combat passport fraud, identify missing children, support law enforcement, reduce identity fraud, etc. Human beings often use faces to recognize each other, and advances in computing over the past few decades now make it possible to perform similar recognition automatically.
The technology has not been used in vehicles so far, even though the auto industry is embracing new technologies such as smart driving and voice-activated software. In this paper, we created an artificial test environment to identify the driver. This can help the owner ensure that only an authorized driver is using the vehicle, and in case of an accident the insurance company or the police may need to know who was driving at the time. Once identification is set up, the owner can register the faces of family members or other people who are allowed to drive the vehicle. If an individual who is not listed in the profiles gets into the driving seat, the vehicle will not start and it will inform the owner.
Face recognition, on the other hand, is a biometric method that can be used in non-cooperative and unconstrained conditions, where the subjects do not know that they are being identified.
In this paper we present two methods that use textural information from the tracking profiles to identify the subjects in pre-recorded near-IR videos. The first method uses LBP combined with LDA to encode face images in a compact and discriminative form, simplifying the process of matching faces between users. LBP is a local texture operator that has previously shown great results when applied to face recognition [3], both by itself and together with LDA [4]. This suggests that a combination of both could also work well for videos. The second method uses brute-force template matching with normalized cross-correlation [5] to identify the users. Both methods treat the videos as a sequence of gray-scale images.
System Design
An artificial test environment was built using three monochrome cameras: one near the speedometer (looking at the driver through the steering wheel), one on the right-hand side (near the gearbox, looking upwards at the driver), and one to the left of the steering wheel (near the rear-view mirror). All videos are stored with each image separated by a description header. With knowledge of the contents of the header, the videos can easily be read in MATLAB.
(a) Camera 1 (b) Camera 2 (c) Camera 3
Figure 1. Images taken with the camera setup.
Method
Local Binary Pattern (LBP)
The methodology for LBP face recognition proposed by T. Ahonen et al. [3] is adopted with some simplifications. The face image is first tiled, and each tile is encoded using the LBP operator, which yields a normalized histogram containing the image information for that specific tile. Its length depends on the settings of the LBP operator, here $LBP_{8,2}^{u2}$. To encode face tiles into LBP histograms, the code provided by M. Heikkila and T. Ahonen is used. All tile-specific histograms are finally concatenated into a single feature vector, which becomes the face descriptor, an encoded representation of the facial image.
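The tiling and histogram steps can be illustrated with a minimal NumPy sketch. It is not the Heikkila and Ahonen code used in the paper: for brevity it assumes a basic 8-neighbour operator at radius 1 without interpolation, rather than the circular $LBP_{8,2}^{u2}$ sampling, and the function names and the 4 x 4 grid are hypothetical choices.

```python
import numpy as np

def uniform_lookup():
    """Map each 8-bit code to one of 59 bins: 58 uniform patterns plus one catch-all."""
    table = np.zeros(256, dtype=np.int64)
    next_bin = 0
    for code in range(256):
        bits = [(code >> k) & 1 for k in range(8)]
        # Count circular 0/1 transitions; "uniform" means at most two.
        transitions = sum(bits[k] != bits[(k + 1) % 8] for k in range(8))
        if transitions <= 2:
            table[code] = next_bin
            next_bin += 1
        else:
            table[code] = 58
    return table

def lbp_codes(image):
    """Basic 8-neighbour LBP at radius 1 (assumption); border pixels are dropped."""
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    center = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (neighbour >= center).astype(np.int64) << bit
    return codes

def face_descriptor(image, grid=(4, 4)):
    """Concatenate one normalized 59-bin histogram per tile into a single vector."""
    bins = uniform_lookup()[lbp_codes(image)]
    rows = np.array_split(np.arange(bins.shape[0]), grid[0])
    cols = np.array_split(np.arange(bins.shape[1]), grid[1])
    hists = []
    for r in rows:
        for c in cols:
            tile = bins[r[0]:r[-1] + 1, c[0]:c[-1] + 1]
            hist = np.bincount(tile.ravel(), minlength=59).astype(np.float64)
            hists.append(hist / hist.sum())
    return np.concatenate(hists)
```

With a 4 x 4 grid the descriptor has 16 x 59 = 944 entries, which illustrates why a dimensionality reduction step is attractive before matching.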
Once a face descriptor is created, it is added to the gallery of all previously stored face descriptors together with a number indicating its class identity. PCA is applied to the database to transform it to the linear subspace defined by the inherent variance of the data. The code used for performing both PCA and LDA was created by D. Cai. The database now contains information suitable for matching. When an unknown face image is found in a video or a sequence of images, it is processed as shown in Figure 2. Instead of storing the reduced descriptor in the gallery, the probe is matched to all descriptors in the gallery using the nearest neighbor classifier (NN-classifier).
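The reduction step can be sketched as follows. This is a textbook PCA followed by Fisher LDA written with NumPy, not D. Cai's code; the function name `fit_pca_lda` and the rule of keeping at most n - c principal components (n samples, c classes) are assumptions made for illustration.

```python
import numpy as np

def fit_pca_lda(X, labels):
    """Return (mean, W) such that (x - mean) @ W is the reduced descriptor."""
    X = np.asarray(X, dtype=np.float64)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    n, c = X.shape[0], classes.size
    mean = X.mean(axis=0)
    Xc = X - mean

    # PCA via SVD; keeping at most n - c components helps keep the
    # within-class scatter matrix well conditioned (assumed rule).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = max(1, min(n - c, Vt.shape[0]))
    W_pca = Vt[:k].T
    Z = Xc @ W_pca

    # Within-class (Sw) and between-class (Sb) scatter in the PCA subspace.
    Sw = np.zeros((k, k))
    Sb = np.zeros((k, k))
    for cls in classes:
        Zc = Z[labels == cls]
        mu = Zc.mean(axis=0)
        Sw += (Zc - mu).T @ (Zc - mu)
        Sb += Zc.shape[0] * np.outer(mu, mu)  # the global mean of Z is zero

    # Directions maximizing between-class over within-class scatter.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(-evals.real)[:min(c - 1, k)]
    W_lda = evecs[:, order].real
    return mean, W_pca @ W_lda
```

A probe descriptor would be reduced with the same `mean` and `W` before it is compared with the gallery, so that both live in the same subspace.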
Neighbors with distances greater than the threshold are considered too far from the subject to be the same person, and the identity is assigned as unknown. Three distance metrics are evaluated for the 1-NN classifier: Euclidean, cosine and chi-square distance.
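Euclidean distance: $d^{e}_{xy} = \sqrt{(x - y)(x - y)^{T}}$ (1)
Cosine distance: $d^{c}_{xy} = 1 - \frac{x y^{T}}{\lVert x \rVert \, \lVert y \rVert}$ (2)
Chi-square distance: $d^{\chi^{2}}_{xy} = \sum_{i=1}^{N} \frac{(x_i - y_i)^{2}}{x_i + y_i}$ (3)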
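The three distances and the thresholded nearest-neighbor decision can be written compactly as below. This is a minimal sketch: the function names are hypothetical, the cosine distance is assumed to use the usual norm-normalized form, and empty histogram bins are skipped in the chi-square sum to avoid division by zero.

```python
import numpy as np

def euclidean(x, y):
    d = x - y
    return float(np.sqrt(d @ d))

def cosine(x, y):
    # Assumes the standard definition with norm normalization.
    return float(1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def chi_square(x, y):
    num = (x - y) ** 2
    den = x + y
    mask = den > 0  # skip bins that are empty in both vectors
    return float(np.sum(num[mask] / den[mask]))

def classify(probe, gallery, ids, metric, threshold):
    """1-NN: return the id of the closest gallery descriptor, or None (unknown)."""
    distances = [metric(probe, g) for g in gallery]
    best = int(np.argmin(distances))
    return ids[best] if distances[best] <= threshold else None
```

Returning `None` corresponds to the "unknown" identity; the threshold has to be chosen per metric, since the three distances live on different scales.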
[Figure 2 diagram: Full image -> Face detector -> Face image found -> Feature detector -> Features -> Face aligned -> Aligned image -> LBP encoding -> descriptor (x_1, ..., x_n) -> PCA + LDA -> reduced descriptor (y_1, ..., y_m) -> NN-classifier -> ID number]
Figure 2. Illustration of the face recognition process: the face is detected, searched for features, aligned, encoded using the LBP operator, and finally assigned an ID using the nearest neighbor method.
Using Template Matching
One drawback of the LBP method is that it requires full frontal images from the video source, and it is not guaranteed that a full frontal image will appear in the image stream. The worst case is a single camera placed in a non-frontal position, where the chances of the driver facing the camera are low. One way to overcome this issue is to apply a method that is not pose-dependent. Since the tracking profiles contain templates of all features in different poses, they have been used to implement a template matching algorithm. Because each profile contains a large number of templates, and all human faces share the same basic facial features, it is assumed that all templates will correlate positively with every face. Therefore the recognition cannot be based on any single template; instead it has to use one of the two classification methods developed in this project.
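To make the cost of brute-force matching concrete, the following sketch slides one template over a gray-scale image and scores every position with zero-mean normalized cross-correlation. The exact variant used in this work is not specified here, so the formula and the function names `ncc` and `best_match` should be read as assumptions.

```python
import numpy as np

def ncc(patch, template):
    """Zero-mean normalized cross-correlation of two equally sized arrays."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return 0.0 if denom == 0 else float((p * t).sum() / denom)

def best_match(image, template):
    """Exhaustively test every position; return (best score, (row, col))."""
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    th, tw = template.shape
    best_score, best_pos = -1.0, (0, 0)
    for y in range(image.shape[0] - th + 1):
        for x in range(image.shape[1] - tw + 1):
            score = ncc(image[y:y + th, x:x + tw], template)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_score, best_pos
```

The double loop runs once per template and per frame, so the total work grows with the product of image size, number of templates and number of frames, which is consistent with the long run times reported for this method.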
Frame score classification (FSC) is a two-step classifier. First, each frame is classified individually, and a frame score vector (FSV) is formed that counts the number of frames assigned to each profile. Then the whole video is classified as the profile with the highest frame score.
Table 1. Frame score vector (FSV) for each video in the pilot database. Rows are videos and columns are profile IDs. The highest score in each row is the correct ID for that video, which gives a 100% recognition rate.
Video/ID    1    2    3    4    5    6    7
1          22    1    0    0    0    0    1
2          13    0    4    0    0    0    8
3           3   11    5    1   21    0    1
4           0   14    4    0   22    0    0
5           0    0   36    0    0    0   33
6           3    0    1   16    0   10    1
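The second FSC step applied to Table 1 can be reproduced in a few lines. The array below copies the table values; the helper names are hypothetical, and the margin between the two highest scores is an extra diagnostic added here for illustration, not something reported in the paper.

```python
import numpy as np

# Rows: videos 1-6, columns: profile IDs 1-7 (values copied from Table 1).
fsv = np.array([
    [22,  1,  0,  0,  0,  0,  1],
    [13,  0,  4,  0,  0,  0,  8],
    [ 3, 11,  5,  1, 21,  0,  1],
    [ 0, 14,  4,  0, 22,  0,  0],
    [ 0,  0, 36,  0,  0,  0, 33],
    [ 3,  0,  1, 16,  0, 10,  1],
])

def frame_score_vector(frame_ids, n_profiles):
    """Step 1: count how many frames were assigned to each 1-based profile ID."""
    return np.bincount(np.asarray(frame_ids) - 1, minlength=n_profiles)

def classify_videos(fsv):
    """Step 2: each video gets the 1-based ID with the highest frame score."""
    return (np.argmax(fsv, axis=1) + 1).tolist()

def margins(fsv):
    """Difference between the best and second-best frame score per video."""
    ordered = np.sort(fsv, axis=1)
    return (ordered[:, -1] - ordered[:, -2]).tolist()

predicted = classify_videos(fsv)  # [1, 1, 5, 5, 3, 4]
```

The margins show how close some decisions are: video 5 is decided by only 3 frames (36 against 33), whereas video 1 wins by 21.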
Accumulative score classification (ASC) is similar to the FSC, but rather than classifying every frame, the template score vector (TSV) is accumulated over all frames. The final classification is then based on the maximum value found in the accumulated TSV. Template matching shows decent results on the pilot database, but the large number of templates increases the time consumption. There are two ways to reduce it: either make the algorithm faster, which remains a challenge for now, or reduce the amount of data processed, which can be done by limiting the image size, the number of templates and the number of frames to analyze.
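A minimal sketch of ASC under stated assumptions: each frame yields one correlation score per template, every template belongs to exactly one profile, and the `frame_step` parameter is a hypothetical way of limiting the number of frames analyzed.

```python
import numpy as np

def accumulative_score_classification(frame_scores, template_owner, frame_step=1):
    """Accumulate per-template scores over frames and classify by the maximum.

    frame_scores:   array of shape (n_frames, n_templates)
    template_owner: profile ID for each template (assumed one owner per template)
    frame_step:     use every frame_step-th frame to reduce the data processed
    """
    scores = np.asarray(frame_scores, dtype=np.float64)[::frame_step]
    tsv = scores.sum(axis=0)  # accumulated template score vector
    best_template = int(np.argmax(tsv))
    return template_owner[best_template], tsv
```

Unlike FSC, no per-frame decision is made, so a few frames with very high scores can outweigh many frames with moderate ones.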
Results and Discussion
The unaligned settings acquire a mean recognition rate of 90% after 60 processed face images, with LDA-Euclidean and LDA-cosine acquiring the highest recognition rate of 91%. The aligned settings acquire a mean recognition rate of 94% after 20 processed face images, with LDA-cosine performing best at a recognition rate of 95%. The alignment procedure therefore increases general performance, which leads to better decisions even when using less data. The clean settings perform worse in both cases, indicating that the dimensionality reduction and class clustering of the PCA+LDA method are beneficial to the performance. However, the chi-square similarity is almost equal in performance despite using the full feature vector, which supports the choice made by T. Ahonen et al. [3] to use the chi-square similarity for LBP face recognition.
Since template matching is currently a time-consuming task, the number of simulations had to be restricted and the amount of data to process reduced in order to make the system work. A parallel simulation was done using the adaptive blob detector (ABD). Each simulation can be run with different parameters, such as the number of frames to process and the amount of template reduction.
[Figure 3 panels: (a) Unaligned settings on the complete video database; (b) Aligned settings on the complete video database. Horizontal axis: number of processed images; vertical axis: recognition rate]
Figure 3. The recognition rate of the LBP-methods on the complete database. Unaligned setting reaches a mean recognition rate of 90%, whereas aligned achieves 94%.
In the classification stage, we first look at the FSC. One of the difficulties is deciding which threshold to use. This is circumvented by sweeping over a range of acceptable thresholds and evaluating the method based on its maximum performance.
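A threshold sweep of this kind might look as follows. The sketch assumes that the threshold acts on the per-frame matching score, so that frames scoring below it do not vote in the frame score vector; this interpretation, the data layout and the function names are all assumptions, since the exact role of the threshold is not spelled out here.

```python
import numpy as np

def classify_video(frame_ids, frame_scores, threshold, n_profiles):
    """FSC where only frames scoring at least `threshold` are allowed to vote."""
    frame_ids = np.asarray(frame_ids)
    keep = np.asarray(frame_scores) >= threshold
    if not keep.any():
        return None  # no frame passed the threshold
    fsv = np.bincount(frame_ids[keep] - 1, minlength=n_profiles)
    return int(np.argmax(fsv)) + 1

def sweep(videos, true_ids, thresholds, n_profiles):
    """Count correctly classified videos per threshold and report the best one."""
    counts = []
    for t in thresholds:
        preds = [classify_video(ids, scores, t, n_profiles) for ids, scores in videos]
        counts.append(sum(p == truth for p, truth in zip(preds, true_ids)))
    best = int(np.argmax(counts))
    return thresholds[best], counts
```

Evaluating at the best threshold gives an optimistic figure, because the threshold is selected on the same videos that are used to measure performance.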
Figure 4. Number of correct classifications as a function of threshold for modified, non-modified and all videos in closed-set identification.
Conclusion
In this paper we evaluated two methods for face recognition in videos recorded in the near-IR range. The data present in the tracking profiles proved sufficient to reach good recognition rates for both methods. Overall, the LBP method is robust both to geometric transformations (scaling and rotation) and to facial occlusions, owing to its ability to capture image information on both a local and a global level. The results showed that the facial alignment and dimensionality reduction procedures were beneficial to the identification process and increased the performance of the LBP method.
References
[1] J. Goldstein, L. D. Harmon, and A. B. Lesk, “Identification of Human Faces,” Proceedings of IEEE, May 1971, Vol. 59, No. 5, 748-760.
[2] L. Sirovich and M. Kirby, “A Low-Dimensional Procedure for the Characterization of Human Faces,” Journal of the Optical Society of America. A, 1987, Vol. 4, No.3, 519-524.
[3] T. Ahonen, A. Hadid, M. Pietikainen, “Face Description with Local Binary Patterns: Application to Face Recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 (12) (DEC, 2006) 2037-2041.
[4] S. Yan, H. Wang, X. Tang, T. S. Huang, “Exploring Feature Descriptors for Face Recognition”, ICASSP (1), IEEE, 2007, 629-632.