2019 International Conference on Applied Mathematics, Modeling, Simulation and Optimization (AMMSO 2019) ISBN: 978-1-60595-631-2
Facial Expression Recognition via Kernel Sparse Representation
Shi-ping CHEN
1, Xiao-hu WANG
1, Zhi-qiang PENG
1,
Dong-chu CHEN
2and Jun HONG
1,*1School of Electrical and Information Engineering, Hunan Institute of Technology, Hengyang 421002, China
2Hunan Da Tu Surveying and Mapping Technology Co., Ltd., Hengyang 421002, China
*Corresponding author
Keywords: Facial expression recognition, Kernel sparse representation, Feature extraction, Local binary patterns.
Abstract. For seeking sparse representation coefficients in the kernel feature space, we propose a novel method of facial expression recognition based on kernel sparse representation classification to
improve the recognition rate of facial expression. The l1-norm minimization problem was solved in
this paper. The coefficient vector was obtained and the local binary patterns features were extracted as the facial representation features. kernel sparse representation classification was used to perform facial expression classification and compared with sparse representation-based classification, the nearest subspace, support vector machines, K-nearest neighbor, and radial basis function neural network. Test results based on the universal JAFFE database show that the presented kernel sparse representation classification method is more effective than other methods in facial expression recognition.
Introduction
Different facial expressions reflect a person's inner emotional states, intentions or social communications. Facial expressions are the primary way human beings express and interpret emotional states. Facial expression recognition aims to identify the human emotional states by using facial expression. In the past decade, facial expression recognition has become an important research topic in the fields of pattern recognition, artificial intelligence and computer vision, due to its important applications in human-computer interaction, artificial intelligence, security monitoring, social entertainment, etc.
and more attention in the field of facial expression recognition. Li et al. [5] successfully used polytypic multi-block local binary patterns for automatic facial expression recognition. A recent review of LBP could be found in [6].
In terms of facial expression classification, there are many classification methods for facial expression recognition, such as hidden Markov model (HMM), artificial neural network (ANN), support vector machines (SVM), K-nearest neighbor (KNN) and so on. Recently, a new classification method called sparse representation-based classification (SRC)[7] based on compressed sensing [8, 9], has been successfully utilized for facial expression recognition and drawn extensive attentions. In some previous work [10-12], it’s found that SRC could obtain promising performance on facial expression recognition tasks.
It’s worth noting that the recognition ability of SRC depends on the quality of the dictionary.
Ideally, for the l1-norm minimization algorithms, the dictionary atoms corresponding to different
classes should be separated from each other. For another perspective, the validity of SRC is based on the assumption that different types of data points are not distributed in the same radius direction. However, this assumption is invalid when dealing with some actual data, where the data classes are layered along the radius direction. Therefore, if a test sample has the same vector direction as the training samples belonging to two or more classes, SRC would not effectively identify it. In other words, SRC lost its classification ability when dealing with data with the same direction distribution.
To solve the SRC problem mentioned above, some preliminary work has been recently done to develop kernel sparse representation based classification (KSRC) [13]. In essence, with the aid of the kernel trick, KSRC aims to find sparse representation coefficients in a high-dimensional feature space (also called kernel feature space) rather than in the original feature space. On the other hand, KSRC recalls the kernel trick to maps the original data into the kernel feature space by using some nonlinear mapping associated with a kernel function and then performs the SRC algorithm in this kernel feature space. Motivated by little research done on investigating the performance of KSRC on facial expression recognition tasks, we proposed a novel methodology of facial expression recognition using KSRC. Test results on the universal JAFFE database show the validity of the proposed KSRC approach for facial expression recognition.
Local Binary Patterns
In this chapter, we present the descriptions of facial feature extraction based on local binary patterns (LBP). The LBP operator is originally used to label the pixels of an image, which is conducted by thresholding a neighborhood of the 3×3 size with the center value for each pixel. The resulted LBP code of the center pixel could be achieved by changing the binary code into a decimal one. an example of a basic LBP operator is shown in Fig.1. We could label each pixel of an image by using the resulted LBP code based on the LBP operator.
Figure 1. An example of a basic LBP operator.
Figure 2.The process ofthe LBP feature extraction.
The 59-bin LBP operator was used to perform the LBP feature extraction from Ref.[14]. Firstly, the cropped face images of 110×150 pixels were divided into several non-overlapping blocks with a size of 18×21 pixels, yielding a good trade-off between classification accuracy and the size of feature vector. The original cropped facial images could be divided into 42 (6×7) different regions, and the dimensionality of the obtained feature vector produced by the LBP histograms is 2478 (59×42).
Proposed Method Based on KSRC
By combining the kernel technique and SRC method, kernel sparse representation-based classification (KSRC) is developed as a nonlinear extension of SRC. In fact, KSRC aims to find sparse representation coefficients in the kernel feature space rather than in the original feature space.
It is well known that data point xowns a nonlinear kernel mapping :
:Rd , x ( )x
(1)
We can make the input data point d
i
x R map into some high-dimensional feature space based
on the nonlinear mapping . As for the kernel feature space , the inner product , can be defined
for a properly chosen , leading to a reproducing kernel Hilbert space (RKHS). In RHKS, a kernel
function k x x( ,i j) can be defined as
( ,i j) ( ), (i j) ( )i T ( j)
k x x x x x x
(2)
k is a kernel and there are three frequently-used kernel functions: the linear kernel, polynomial
kernel and the Gaussian kernel.
The linear kernel function can be represented as
( ,i j) iT j
k x x x x
(3) The polynomial kernel function is
( ,i j) ( iT j)r
k x x x x
(4)
In above equation, r is the degree for polynomial kernel and positive scalar.
The Gaussian kernel function can be expressed by
2 2
( ,i j) exp( | i j| / 2 )
k x x x x
(5)
Where is parameter of the Gaussian kernel. In SRC, a test sample x could be expressed linearly
by all the training samples in the original feature space. In the kernel feature space the input data
point x is transformed into ( )x . Similarly, in KSRC a test sample ( )x can be expressed by the
training samples in the kernel feature space :
( )x
μα (6)
Where μ[ 1, 2, ,n][ ( ), ( x1 x2), , ( xn)] represents all the training samples in the kernel
2
1 2
min , subject to ( )x
α μα (7)
In order to get the value of coefficient vectorα, the l1-norm minimization problem of Eq. (7) in
KSRC can be reformulated as the following Lasso optimization problem:
2
2 1
min ( )x +
μα α (8)
Where is used to balance the sparsity and the reconstruction error, and is called the
regularization parameter. Generally, a larger denotes a sparser solution. Various iterative methods
are used to solve Eq.(8), such as the feature-sign search algorithm [13], and the gradient-projection algorithm [15]. These related algorithms can be found in a survey [16].
In summary, the facial expression recognition based on KSRC is summarized in detail as shown in Algorithm 1.
Experiment Verification
To demonstrate the performance of the presented KSRC methodology on facial expression recognition tasks, the universal JAFFE [17] facial expression database is employed for experiments. The presented KSRC method is compared with several the-state-of-art methods, including sparse representation-based classification (SRC), the nearest subspace (NS) [18], support vector machines (SVM), K-nearest neighbor (KNN), and the standard radial basis function neural network (RBFNN). Note that, the optimal value of K for KNN was obtained by utilizing an exhaust search within the range [1, 20] with a step of 1, and recorded the best performance of KNN. The LIBSVM package [19] was employed to perform the SVM algorithm with the linear kernel function, one-versus-one
strategy for multi-class classification problem. For SRC, the l1-norm minimization is solved by using
the l1-magic method [20], and =0.001. For KSRC, the feature-sign search method is used to solve
the l1-norm minimization problem of Eq. (8) and =0.001. The typical Gaussian kernel function is
used for KSRC. A 10-fold cross validation scheme is conducted in 7-class facial expression classification experiments, and the average identification results are reported. The test platform is Intel CPU 2.10 GHz, 1G RAM memory, MATLAB 7.0.1 (R14).
Database
The used JAFFE database [17] consists of 7 basic expressions, including neutral, happiness, sadness, surprise, anger, disgust and fear. It contains 10 Japanese women, each of which has 7 expressions. Each expression has about 3, 4 images, a total of 213 images. Each image pixel is 256 × 256.
Experimental Results and Analysis
other words, KSRC aims to seek sparse representation coefficients in the kernel feature space for classification. On the other hand, as a kernel methodology, KSRC is able to capture the non-linear relationships of data, so it performs better than SRC for classification.
Table 1. Recognition accuracy (%) of different methods.
Metho
ds SRC NS SVM KNN RBF
NN KSR
C Accur
acy 84.7
6 81.74 79.88 80.00 68.09 85.7
[image:5.595.104.493.206.339.2]1
Table 2. Confusion matrix of recognition results when KSRC performs best.
Anger (%)
Happiness (%)
Sadness (%)
Surprise (%)
Disgust (%)
Fear (%)
Neutral (%)
Anger 90.00 0 6.67 0 0 0 3.33
Happiness 0 96.67 0 0 0 0 3.33
Sadness 0 3.23 90.32 0 0 3.23 3.22
Surprise 0 6.67 0 93.33 0 0 0
Disgust 17.86 0 3.57 0 71.43 7.14 0
Fear
0 0 6.45 6.45 9.68
77.4
2 0
Neutral 0 3.34 3.33 3.33 0 0 90.00
In order to give the correct recognition rate for each expression, Table 2 lists the fuzzy matrix of seven facial expression recognition results when KSRC perform best (i.e., the obtained accuracy is 85.71%). The diagonal data in Table 2 are the correct recognition rate of each expression. The results in Table 2 show that five expressions, including anger, joy, sadness, surprise and neutral, are classified well, since their correct recognition rates are more than 90%. Among them, happiness is identified with the highest accuracy of 96.67%. In contrast, the correct recognition rates of disgust and fear is slightly lower, i.e., 71.43% and 77.42%, respectively. The main reason is that disgust and fear are easily confused.
Conclusions
This paper develops a novel approach of facial expression recognition based on KSRC and the LBP features. The best performance with an accuracy of 85.71% on the JAFFE database was obtained by KSRC, outperforming SRC, NS, SVM, KNN, and RBFNN. This can be explained by the fact that KSRC integrates the kernel trick and the SRC method, resulting in a good classification performance.
Acknowledgment
This work is supported by Key Project of Hunan Science and Technology Department (No.2017GK2160), Hengyang Key Laboratory of Microwave Photon Application Technology (No.2018KJ119), Hengyang Key Laboratory of Photoelectric Information Detection and Processing Science and Technology(No.2016KF07), Project in Hengyang (No. 2017KJ064), and Key Cultivation Project in Hunan Institute of Technology.
References
[1] Shaohua Wan, J.K. Aggarwal, "Spontaneous facial expression recognition: A robust metric learning approach", Pattern Recognition, vol. 47, no.1, pp. 1859–1868, 2014.
[3] W. Gu, C. Xiang, Y. Venkatesh, D. Huang and H. Lin, "Facial expression recognition using radial encoding of local gabor features and classifier synthesis", Pattern Recognition, vol.45, no.1, pp. 80-91 2012.
[4] Wei Li, Chen Chen, Hongjun Su, Qian Du, "Local Binary Patterns and Extreme Learning Machine for Hyperspectral Imagery Classification", IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 7, pp. 3681-3693, 2015.
[5] Li, Q. Ruan, Y. Jin, G. An, and R. Zhao, "Fully automatic 3D facial expression recognition using polytypic multi-block local binary patterns," Signal Processing, vol. 108, pp. 297-308, 2015.
[6] S. Brahnam, L. C. Jain, A. Lumini, and L. Nanni, "Introduction to Local Binary Patterns: New Variants and Applications," in Local Binary Patterns: New Variants and Applications, Springer, pp. 1-13, 2014.
[7] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry and Y. Ma, "Robust face recognition via sparse representation", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.31, no.2, pp. 210-227, 2009.
[8] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling", IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21-30, 2008.
[9] D. L. Donoho, "Compressed sensing", IEEE Transactions on Information Theory,, vol. 52, no. 4, pp. 1289-1306, 2006.
[10] M. Mohammadi, E. Fatemizadeh, and M. Mahoor, "PCA-based dictionary building for accurate facial expression recognition via sparse representation," Journal of Visual [11] Communication and Image Representation, vol. 25, pp. 1082-1092, 2014.
[11] Y. Fang and L. Chang, "Multi-instance Feature Learning Based on Sparse Representation for Facial Expression Recognition," in MultiMedia Modeling, pp. 224-233, 2015.
[12] Y. Ouyang, N. Sang, and R. Huang, "Accurate and robust facial expressions recognition by fusing multiple sparse representation based classifiers," Neurocomputing, vol. 149, pp. 71-78, 2015.
[13] S. Gao, I. W.-H. Tsang and L.-T. Chia, "Sparse representation with kernels", IEEE Transactions on Image Processing, vol. 22, pp. 423-434 2013.
[14] T. Ahonen, A. Hadid and M. Pietikainen, "Face description with local binary patterns: Application to face recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.28, no.12, pp. 2037-2041, 2006.
[15] M. W. Schmidt, K. P. Murphy, G. Fung and R. Rosales, "Structure learning in random fields for heart motion abnormality detection", in IEEE Conference on Computer Vision and Pattern Recognition (CVPR'08) pp. 1-8, Citeseer, Anchorage, AK 2008.
[16] J. A. Tropp and S. J. Wright, "Computational methods for sparse solution of linear inverse problems", Proceedings of the IEEE, vol. 98, no. 6, pp. 948-958, 2010.
[17] M. J. Lyons, J. Budynek and S. Akamatsu, "Automatic classification of single facial images", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 12, pp. 1357-1362, 1999.
[18] K. C. Lee, J. Ho and D. J. Kriegman, "Acquiring linear subspaces for face recognition under variable lighting", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 684-698, 2005.
[19] C. Chang and C. Lin, "Libsvm: A library for support vector machines, 2001", in Software available at http://www.csie.ntu. edu. tw/cjlin/libsvm, 2001.