A Survey of Techniques for Automatic Facial Expression Recognition

International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 7, Issue 9, September 2017)

Gunavathi H S¹, Dr. Siddappa M²

¹Research Scholar, Dept. of CS&E, Jain University, Bengaluru, India
²Professor and Head, Dept. of CS&E, SSIT, Tumkur, India

Abstract—Facial expression recognition is an active research area with wide applications in human-computer interaction, social interaction, social intelligence, and the detection of autism. In recent years it has also been used in emotion-based advertising, emotion-based music playback, and similar services. This paper summarizes the major challenges in facial expression recognition and the methods available to address them. It also presents a comparative analysis of existing methods for automatic facial expression recognition.

Keywords—Challenges in facial expression recognition, feature extraction, expression classification, datasets.

I. INTRODUCTION

As the proverb says, “Face is the Index of Mind”: the face can reflect a person's inner thoughts. Facial Expression Recognition (FER) therefore plays a major role in improving human-computer interaction, in treating mental disorders by analyzing emotional behavior, and in supporting security by identifying terrorist activities from emotional cues on the face. Examples include video conferencing; social media, where relevant posts can be shown to help people who are depressed cope with their feelings and so help prevent suicide attempts; and robotics, where robots can assist elderly people by understanding their needs from their facial expressions.

Figure 1: Facial expression recognition system overview

Facial expression recognition usually takes place in three main steps: Face Acquisition, Feature Extraction and Feature Classification.

Challenges in Facial Expression Recognition:

Choosing a robust method for extracting and classifying features helps to overcome the following challenges:

Dimensionality: reducing the dimension of the facial image when classifying its expression.

Head pose: images captured from different angles.

Partial occlusion: faces partly hidden by posture, a beard or mustache, sunglasses, a scarf, a mask, or a hand.

Illumination: robustness of the facial features to variations in lighting.

Subject variation: robustness of the facial features across age, sex, and race.

Real-time classification: expression recognition on live video from a webcam or surveillance camera.

This section has discussed the importance of facial expression recognition and the real-time applications that an automated system would enable. For these reasons, many researchers have proposed methods to automate feature extraction and classification. Although many methods are available for FER, none is suitable for all the scenarios listed above. Automating FER therefore remains a challenging research area.

The scope of this paper is to identify a method that overcomes the challenges listed above and works robustly in any scenario.

Section 3 gives a comparative analysis of the techniques and discusses which of them is better suited to overcoming the challenges listed above. Section 4 concludes the paper with the more suitable technique.

II. LITERATURE SURVEY

In this section, we discuss some important methods for feature extraction and feature classification. Feature extraction methods can be categorized as geometric, appearance-based, or hybrid methods that combine geometric and appearance-based features. Some feature classification methods are also discussed.

A. Local Binary Pattern (LBP)

LBP was introduced by T. Ojala et al. in 1994 [10], based on the assumption that texture has two complementary aspects, a pattern and its strength. It was proposed as a two-level version of the texture unit for describing local textural patterns. The basic version of the local binary pattern operator works on a 3 × 3 pixel block of an image. The pixels in this block are thresholded by the value of the center pixel, multiplied by powers of two, and then summed to obtain a label for the center pixel.

As the neighborhood consists of 8 pixels, a total of $2^8 = 256$ different labels can be obtained, depending on the relative gray values of the center pixel and the pixels in its neighborhood. Fig. 2 illustrates the neighborhoods used by the operator. The basic LBP uses 8 pixels in a 3 × 3 pixel block, but the generic formulation of the operator places no limitation on the size of the neighborhood or on the number of sampling points.

Figure 2: Circular (8,1), (16,2) and (8,2) neighborhoods. Pixel values are bilinearly interpolated whenever the sampling point is not in the center of a pixel
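The basic 3 × 3 operator described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the clockwise bit order starting at the top-left neighbor, and the use of `>=` for the threshold are assumptions of this sketch, and the circular, interpolated neighborhoods of Fig. 2 are not covered.

```python
def lbp_label(block):
    """Return the basic LBP label of the center pixel of a 3x3 block."""
    center = block[1][1]
    # Neighbors are visited clockwise from the top-left corner; the starting
    # point and direction are an assumption of this sketch.
    neighbors = [
        block[0][0], block[0][1], block[0][2],
        block[1][2], block[2][2], block[2][1],
        block[2][0], block[1][0],
    ]
    label = 0
    for bit, value in enumerate(neighbors):
        if value >= center:      # threshold by the center pixel value
            label += 1 << bit    # weight by a power of two and sum
    return label


def lbp_image(image):
    """Compute LBP labels for every interior pixel of a 2-D list of gray values."""
    rows, cols = len(image), len(image[0])
    labels = []
    for r in range(1, rows - 1):
        row = []
        for c in range(1, cols - 1):
            block = [image[r + dr][c - 1:c + 2] for dr in (-1, 0, 1)]
            row.append(lbp_label(block))
        labels.append(row)
    return labels


def lbp_histogram(labels):
    """Build the 256-bin histogram of labels that serves as a texture feature."""
    hist = [0] * 256
    for row in labels:
        for label in row:
            hist[label] += 1
    return hist
```

In a typical pipeline the face image is divided into regions, one such histogram is computed per region, and the histograms are concatenated into the feature vector passed to the classifier.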

B. Sparse Representation plus LBP

Sparse Representation-based Classification (SRC) is known for its robustness to occlusion and corruption. Local Binary Patterns (LBP) are a powerful way to describe the texture and shape of images. Huang et al. [6] proposed a method for facial expression recognition based on a sparse representation of LBP features.

They use LBP to extract features from facial expression images and then use SRC to classify the images into categories according to their expressions.

The original goal of the sparse representation algorithm was not classification alone but also the representation or compression of signals. Sparse representation uses a sampling rate lower than the Shannon-Nyquist bound. In sparse theory, a signal can be reconstructed from a small number of linear measurements, so the performance of the algorithm is measured in terms of the sparsity of the representation and its fidelity to the original signal. In SRC, the measurements are simply the training samples.

They conducted extensive experiments on the Japanese Female Facial Expression (JAFFE) database. The algorithm attains an expression recognition rate of 62.86%, which is better than using Sparse Representation-based Classification alone and much better than algorithms such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA).
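The classification step can be written compactly. The block below gives the standard SRC formulation as a sketch; the symbols y, A, x, the class selector δ_k and the tolerance ε are notation assumed here for illustration and are not taken from the surveyed paper.

```latex
% y           : LBP feature vector of the test image
% A           : matrix whose columns are the LBP feature vectors of the training images
% \delta_k(x) : copy of x that keeps only the coefficients belonging to class k
\hat{x} = \arg\min_{x} \|x\|_1
\quad \text{subject to} \quad \|A x - y\|_2 \le \varepsilon

\operatorname{class}(y) = \arg\min_{k} \, \| y - A \, \delta_k(\hat{x}) \|_2
```

The test image is assigned to the expression class whose training samples reconstruct it with the smallest residual.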

C. Atlas Construction by Sparse GroupWise Registration

In this method, longitudinal facial expression atlases are constructed to capture the salient facial feature changes during an expression process. Given K types of facial expressions of interest and C subject image sequences for each expression, denote the image of the i-th subject (i = 1, ..., C) at its j-th time point by $I_{t_{ij}}$. Assume that each image sequence begins at time point 0 and ends at time point 1 (i.e., $t_{ij} \in [0, 1]$). For each expression, the construction of N atlases at given time points $T = \{t_1, \ldots, t_N\}$, where $t_k \in [0, 1]$ (k = 1, ..., N), is formulated as an energy minimization problem. Here $M_t$ is the longitudinal atlas at time point t and $\varphi_i$ is the diffeomorphic growth model that models the facial expression process for subject i. The formulation also uses the warping of subject i's image at its first time point.

The more states (i.e., the number of time points N) are used, the more accurately the atlas sequence can describe the facial expression process, but the computational burden also increases. Finally, images belonging to the same time points are used to initialize and iteratively refine the atlas.

D. Supervised Descent Method (SDM)

Following Xiong and De la Torre [3], 49 facial characteristic points (Fig. 3) are extracted from the face with the Supervised Descent Method. Only 11 of these points are used to calculate six distances (Fig. 4), defined as follows: D1 is the distance between eye and eyebrow, D2 represents the opening of the eyes, D3 is the vertical distance between eye and mouth, D4 is the distance between nose and mouth, D5 represents the opening of the mouth, and D6 is the width of the mouth. Three known distance metrics, Euclidean (1), Manhattan (2), and Minkowski (3), were calculated in order to compare the results given by each one.

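$$d_{\text{Euclidean}}(P, Q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2} \qquad (1)$$

$$d_{\text{Manhattan}}(P, Q) = \sum_{i=1}^{n} |p_i - q_i| \qquad (2)$$

$$d_{\text{Minkowski}}(P, Q) = \left( \sum_{i=1}^{n} |p_i - q_i|^{c} \right)^{1/c} \qquad (3)$$

with $P = [p_1, p_2, \ldots, p_n]$ and $Q = [q_1, q_2, \ldots, q_n]$, where c is a positive scalar exponent. For c = 1 and c = 2, the Minkowski metric becomes equal to the Manhattan and Euclidean metrics respectively. Other values of c can be specified; in the work described, c = 2.25 was chosen experimentally.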

Figure 3: The forty nine fiducial points extracted from the face using SDM

Figure 4: Face descriptive distances used as features
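The three metrics used for the face descriptive distances can be sketched as follows. This is an illustrative sketch only: the function names and the two mouth-corner coordinates are hypothetical, and the landmark extraction itself (SDM) is not shown.

```python
def minkowski(p, q, c):
    """Minkowski distance with exponent c > 0 between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if c <= 0:
        raise ValueError("c must be positive")
    return sum(abs(a - b) ** c for a, b in zip(p, q)) ** (1.0 / c)


def manhattan(p, q):
    """Manhattan distance: the Minkowski metric with c = 1."""
    return minkowski(p, q, 1)


def euclidean(p, q):
    """Euclidean distance: the Minkowski metric with c = 2."""
    return minkowski(p, q, 2)


# Hypothetical (x, y) pixel coordinates of the two mouth corners.
left_mouth_corner = (120.0, 200.0)
right_mouth_corner = (180.0, 204.0)

# D6, the width of the mouth, under each metric; c = 2.25 is the exponent
# reported as chosen experimentally in the text.
d6_euclidean = euclidean(left_mouth_corner, right_mouth_corner)
d6_manhattan = manhattan(left_mouth_corner, right_mouth_corner)
d6_minkowski = minkowski(left_mouth_corner, right_mouth_corner, 2.25)
```

The six distances D1 to D6 computed this way form the geometric feature vector that is passed to the classifier.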

E. Histograms of Oriented Gradients

David G. Lowe proposed the Scale-Invariant Feature Transform (SIFT) algorithm in 1999 [9] and summarized it comprehensively in 2004. SIFT is invariant to rotation and scaling and is highly tolerant of noise, brightness variations, perspective transformation, and affine transformation. It therefore performs very well in scene matching, object recognition, image stitching, face recognition, and other fields.

The major steps of the SIFT algorithm are: 1) scale-space extrema detection, 2) orientation assignment, and 3) descriptor extraction.

The Histogram of Oriented Gradients (HOG) feature descriptor derives from the final step of Lowe's SIFT method for wide-baseline image matching. It summarizes the distribution of measurements within image regions and is particularly useful for detecting textured objects with deformable shapes. The HOG feature descriptor and the LBP operator both extract information in a differential mode, so they share certain similarities. In both, the effect of gray-scale changes caused by linear variations in illumination is weakened.

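HOG features reflect the shape information of the image, and the gradient corresponds to the first derivative of the image. For an image I(x, y), the gradient at an arbitrary pixel is a vector, defined in Equation (4):

$$\nabla I(x, y) = \begin{bmatrix} G_x \\ G_y \end{bmatrix} = \begin{bmatrix} \partial I / \partial x \\ \partial I / \partial y \end{bmatrix} \qquad (4)$$

where $G_x$ is the gradient in the X direction and $G_y$ is the gradient in the Y direction. The magnitude and direction angle of the gradient are given by Equation (5) and Equation (6):

$$|\nabla I(x, y)| = \sqrt{G_x^2 + G_y^2} \qquad (5)$$

$$\theta(x, y) = \arctan\left( \frac{G_y}{G_x} \right) \qquad (6)$$

For a digital image I(x, y), the gradient magnitude can also be defined with finite differences, as in Equation (7):

$$|\nabla I(x, y)| = \sqrt{[I(x+1, y) - I(x, y)]^2 + [I(x, y+1) - I(x, y)]^2} \qquad (7)$$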
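The gradient computation behind HOG can be sketched as follows. This is a simplified illustration, not a full HOG implementation: it assumes forward differences for the discrete gradient, unsigned orientations in [0, 180) degrees, and 9 bins, and it omits the cell and block normalization of a complete descriptor. All function names are hypothetical.

```python
import math


def gradient_at(image, x, y):
    """Forward-difference gradient (Gx, Gy) at column x, row y of a 2-D list."""
    gx = image[y][x + 1] - image[y][x]
    gy = image[y + 1][x] - image[y][x]
    return gx, gy


def magnitude_and_angle(gx, gy):
    """Gradient magnitude and unsigned direction angle in degrees, in [0, 180)."""
    magnitude = math.hypot(gx, gy)
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, angle


def orientation_histogram(image, bins=9):
    """Histogram of gradient orientations, each vote weighted by its magnitude."""
    bin_width = 180.0 / bins
    hist = [0.0] * bins
    # The last row and column are skipped because forward differences
    # need a neighbor to the right and below.
    for y in range(len(image) - 1):
        for x in range(len(image[0]) - 1):
            gx, gy = gradient_at(image, x, y)
            magnitude, angle = magnitude_and_angle(gx, gy)
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

Applying `orientation_histogram` to each small cell of the face image and concatenating the results gives a descriptor in the spirit of HOG.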

F. Hidden Markov Model (HMM)

TABLE I

COMPARATIVE ANALYSIS OF DISCUSSED METHODS FOR FER

| Title | Author | Publication and Date | Method | Data Set | Performance | Drawbacks |
|---|---|---|---|---|---|---|
| A New Method For Facial Expression Recognition Based On Sparse Representation Plus LBP | Ming-Wei Huang, et al. | CISP 2010 | Sparse Representation Plus LBP | JAFFE | 62.9% | Average performance is relatively low for commercial applications |
| Dynamic Facial Expression Recognition With Atlas Construction and Sparse Representation | Yimo Guo, et al. | IEEE 2016 | Atlas construction and sparse representation | Extended Cohn-Kanade, MMI, FERA, and AFEW for DFER; UNBC-McMaster database for SPEM | above 90% | The LDDMM registration algorithm used may not compensate for strong illumination changes |
| Facial Expression Recognition using Decision Trees | Fatima Zahra Salmam, et al. | IEEE 2016 | Geometric approach + Decision tree | JAFFE, Cohn-Kanade | 89.20%, 90.61% | Does not handle face pose or real-time applications |
| Discovering the Best Feature Extraction and Selection Algorithms for Spontaneous Facial Expression Recognition | Ligang Zhang, et al. | IEEE 2012 | SIFT + FAP + mRMR | FEEDTUM and NVIE databases | 63%, 83% | More types of geometric and texture features could be included in the comparison |
| Facial Expression Recognition using Local Binary Patterns and Kullback Leibler Divergence | Anusha Vupputuri, Sukadev Meher | IEEE 2015 | LBP + KL | JAFFE | 95.24% | Confusion between sad and fear |
| Facial Expression Recognition Using Local Binary Patterns | Kannan Subramanian | IJIRCCE 2013 | LBP + SVM | JAFFE | good | Not suitable for occlusion |
| Feature Fusion of HOG and WLD for Facial Expression Recognition | Xiaohua Wang, et al. | IEEE 2013 | WLD with HOG, chi-square distance, and the nearest neighbor | JAFFE, Cohn-Kanade | 93.97% and 95.86% | Machine learning methods can be introduced into the system to achieve more accurate classification |
| Facial expression recognition using Support Vector Machines | Muzammil Abdulrahman, Alaa Eleyan | IEEE 2015 | PCA + SVM | JAFFE, MUFE database | 87%, 77% | Not suitable for occlusion |

An HMM has been developed to identify higher-level emotions such as interested, encouraging, discouraging, unsure, and disagreeing from basic emotions such as neutral, sad, surprise, and joy. It uses an emotional index model to understand the emotional states; the index functions as a database and uses a one-to-one mapping between facial emotions and expressions. The indexer receives symbols, matches them against those already stored, and then chooses symbols to represent concepts in the index. The Hidden Markov Expert Rule (HMER) is used for segmentation and for recognizing emotional states from a set of video sequences. A classification framework is applied to every incoming video frame: the facial expression recognizer identifies head and facial actions, which combine to form displays, so HMER represents both the dynamic displays and the classification framework. The HMER topology is constructed over four emotional states (for example, N represents the neutral state and SU surprise), from which transitions to the higher-level emotional states are possible. The probabilistic framework models time-varying sequences, and the recognition computation converges and runs in real time. The recognition rate is 87% for the emotion unsure and 78% for disagreeing.
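How an HMM scores a sequence of per-frame observations can be illustrated with the forward algorithm. The sketch below is a generic HMM, not the HMER model of the surveyed work: the two states, the observation symbols, and every probability are hypothetical values chosen only for illustration.

```python
def forward_likelihood(observations, states, start_p, trans_p, emit_p):
    """Probability of an observation sequence under an HMM (forward algorithm)."""
    if not observations:
        raise ValueError("observations must not be empty")
    # Initialization: start in each state and emit the first symbol.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    # Induction: move to each state from all previous states, then emit.
    for symbol in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states)
            * emit_p[s][symbol]
            for s in states
        }
    # Termination: sum over the states at the final time step.
    return sum(alpha.values())


# Hypothetical two-state model: N = neutral, SU = surprise.
STATES = ("N", "SU")
START_P = {"N": 0.6, "SU": 0.4}
TRANS_P = {"N": {"N": 0.7, "SU": 0.3}, "SU": {"N": 0.4, "SU": 0.6}}
EMIT_P = {
    "N": {"calm": 0.9, "raised_brows": 0.1},
    "SU": {"calm": 0.2, "raised_brows": 0.8},
}

likelihood = forward_likelihood(
    ["calm", "raised_brows"], STATES, START_P, TRANS_P, EMIT_P
)
```

With one such model per higher-level emotion, a video sequence would be assigned to the emotion whose model gives the highest likelihood.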

G. Neural Networks

Monika Verma et al. [14] combine two methods, feature extraction and a neural network, in a two-stage approach to face detection and classification. Images are first preprocessed to reduce processing time and improve image quality. In the first stage, features are extracted from the face image using a bank of 5 × 8 = 40 Gabor filters. In the second stage, a neural network classifies the facial images from the resulting feature vectors. The neural network is trained with face and non-face images from a database. The images in the dataset are 27 × 18 pixels, grayscale, and in TIFF format. The reported performance rate is 84.4%.
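A bank of 40 Gabor filters can be generated as sketched below. Splitting 5 × 8 into 5 wavelengths and 8 orientations is an assumption, as are the kernel size, the wavelength values, the ratio between sigma and wavelength, and the aspect ratio gamma; none of these parameters is given in the text above. Only the real (cosine) part of each filter is built.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter as a size x size list of lists (size odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size=9, wavelengths=(2.0, 2.83, 4.0, 5.66, 8.0), n_orientations=8):
    """Build len(wavelengths) * n_orientations kernels (5 * 8 = 40 by default)."""
    bank = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            # sigma proportional to the wavelength is an assumed, common choice.
            bank.append(gabor_kernel(size, wavelength, theta, sigma=0.56 * wavelength))
    return bank
```

Convolving a face image with each kernel in the bank and collecting the responses yields the feature vector that is fed to the neural network.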

H. Support Vector Machine

In the system of Littlewort, Bartlett et al. [15], each video frame is first scanned in real time to detect frontal faces. The detected faces are scaled into image patches of equal size and passed through a bank of Gabor energy filters. The filtered image is given as input to a recognition classifier, which codes the expression into different dimensions. Facial features are selected from the Gabor filter outputs using AdaBoost, and a Support Vector Machine is then trained on the selected features. The authors developed an end-to-end system that provides facial expression codes at 24 frames per second and animates a computer-generated character. Fully automated facial action coding is also applied, and the reported recognition rate is 93%.

III. COMPARATIVE ANALYSIS OF DISCUSSED METHODS FOR FER

Table I provides a comparative analysis of the discussed methods for facial expression recognition and their performance.

The analysis covers several types of methods: sparse representation + LBP, atlas construction and sparse representation, geometric approach + decision tree, SIFT + FAP + mRMR, LBP + Kullback-Leibler divergence, LBP + SVM, WLD + HOG, and PCA + SVM.

IV. CONCLUSION

Automating facial expression recognition is a challenging problem, and much research has been done in this area. As a result, many techniques have evolved for extracting features from the face and for classifying the extracted features to analyze the facial expression. The different methods are difficult to compare because of differences in experimental setup. This paper has presented a comprehensive study of the approaches available for facial expression recognition, which should help other researchers choose a suitable solution for a dedicated facial expression recognition task or enhance the existing methods to obtain better and more accurate results. For a system to be robust and practical enough for real-time use, it should be able to detect micro-expressions and to handle occlusions and different head angles. In our future work, we will try to address these problems using hybrid methods.

REFERENCES

[1] Yimo Guo, Guoying Zhao and Matti Pietikäinen, “Dynamic Facial Expression Recognition with Atlas Construction and Sparse Representation”, IEEE 2016.

[2] Fatima Zahra Salmam, Abdellah Madani and Mohamed Kissi, “Facial Expression Recognition using Decision Trees”, IEEE 2016.

[3] X. Xiong and F. De la Torre, “Supervised descent method and its applications to face alignment”, Computer Vision and Pattern Recognition (CVPR), IEEE, 2013.

[4] Kannan Subramanian, “Facial Expression Recognition Using Local Binary Patterns”, IJIRCCE 2013.

[5] Anusha Vupputuri and Sukadev Meher, “Facial Expression Recognition using Local Binary Patterns and Kullback Leibler Divergence”, IEEE 2015.

[6] Ming-Wei Huang, Zhe-wei Wang and Zi-Lu Ying, “A New Method For Facial Expression Recognition Based On Sparse Representation Plus LBP”, CISP 2010.

[7] Ligang Zhang, Dian Tjondronegoro and Vinod Chandran, “Discovering the Best Feature Extraction and Selection Algorithms for Spontaneous Facial Expression Recognition”, IEEE 2012.

[8] Xiaohua Wang, Chao Jin, Wei Liu, Min Hu, Liangfeng Xu and Fuji Ren, “Feature Fusion of HOG and WLD for Facial Expression Recognition”, IEEE 2013.

[10] T. Ojala, M. Pietikäinen and T. Mäenpää, “Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971-987, 2002.

[11] Muzammil Abdulrahman and Alaa Eleyan, “Facial expression recognition using Support Vector Machines”, IEEE 2015.

[12] M. J. Lyons, S. Akamatsu, M. Kamachi and J. Gyoba, “Coding Facial Expressions with Gabor Wavelets”, Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 200-205, 1998.

[13] Teik-Toe Teoh and Siu-Yeung Cho, “Human Emotional States Modeling by Hidden Markov Model”, 2011 Seventh International Conference on Natural Computation.

[14] Monika Verma, Pooja Rani and Harish Kundra, “A Hybrid Approach to Human Face Detection”, International Journal of Computer Applications (0975-8887), vol. 1, no. 13, 2010.

[15] Gwen Littlewort and Marian Stewart Bartlett, “Dynamics of facial expression extracted automatically from video”, Image and Vision Computing, vol. 24, pp. 615-625, 2006.

[16] Z. Zhang, M. Lyons, M. Schuster and S. Akamatsu, “Comparison Between Geometry-Based and Gabor-Wavelets-Based Facial Expression Recognition Using Multi-Layer Perceptron”, Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, pp. 454-459, 1998.

[17] Z. Niu and X. Qiu, “Facial Expression Recognition Based On Weighted Principal Component Analysis And Support Vector Machines”, IEEE 3rd International Conference on Advanced Computer Theory and Engineering, pp. 174-178, 2010.

[18] A. Punitha and M. Kalaiselvi Geetha, “Texture based Emotion Recognition from Facial Expression using Support Vector Machine”, International Journal of Computer Applications (0975-8887), vol. 80, no. 5, October 2013.

[19] Sandeep K. Gupta, ShubhLakshmi Agrwal, Yogesh K. Meena and Neeta Nain, “A Hybrid Method of Feature Extraction for Facial Expression Recognition”, 2011 Seventh International Conference on Signal Image Technology & Internet-Based Systems.

[20] Ziyang Zhang, Xiaomin Mu and Lei Gao, “Recognizing Facial Expressions Based on Gabor Filter Selection”, 2011 4th International Congress on Image and Signal Processing.

[21] Zhiguo Niu and Xuehong Qiu, “Facial Expression Recognition based on weighted principal component analysis and support vector machines”, 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE).

BIOGRAPHIES

Gunavathi H S is an Assistant Professor in the Department of Computer Science and Engineering, Bangalore Institute of Technology, Bengaluru, and a research scholar at Jain University, Bengaluru, Karnataka, India. She obtained her M.Tech and B.E. degrees from VTU Belgaum. She has published research papers in national and international conferences. Her research interests are image and video processing, pattern recognition, and computer networks.