Performance Analysis of Similarity Coefficient Feature Vector on Facial Expression Recognition

(1)

Procedia Engineering 144 ( 2016 ) 444 – 451

Peer-review under responsibility of the organizing committee of ICOVP 2015 doi: 10.1016/j.proeng.2016.05.154

ScienceDirect

12th International Conference on Vibration Problems, ICOVP 2015

Performance Analysis of Similarity Coefficient Feature Vector on

Facial Expression Recognition

Tanjea Ane

a,

* , Md. Fazlul Karim Patwary

b a _{IIT, Jahangirnagar University, Saver, Dhaka-1342, Bangladesh} b_{Associate Professor,}_{IIT, Jahangirnagar University,}_{Saver, Dhaka-1342, Bangladesh}

Abstract

Facial expression is an essential and impressive means of human contact. This is important connection of information for knowing emotional case and motive. A facial expression pursues not only emotions, but other creative action, social cooperation and psychological characteristics. Appearance based facial expression recognition systems are analyzed and have pulled widen application. A new study of bit intensity with coefficient feature vector for facial expression recognition proposed in this paper. All the binary patterns from gray color intensity values are grouped into possible number of attributes according to their similarity. Each attributes count the frequency number of similarity from binary patterns. Each image divided into equal sized blocks and extracts 4-bit binary patterns in two distinct directions for a pixel by measuring the gray color intensity values with its neighbouring pixels. For evaluation the proposed descriptors, JAFFE dataset and Support Vector Machine were applied. Proposed method has achieved excellent achievement in terms of efficiency, robustness and lessens execution time.

Peer-review under responsibility of the organizing committee of ICOVP 2015

Keywords: Feature descriptor, coefficient feature vector, similarity index, JAFFE.

1. Introduction

A vital component of human to human communication is facial expression. Facial expression is recognized to be the most meaningful indication. Signalling system composes primary seven candidate emotions [1]. Automatic recognition of facial expression is used for interaction of human to computer in divers area e.g. emotion analysis

* Corresponding author.

E-mail address: [email protected]

(2)

with static or dynamic facial state, indexing and retrieval of video or image information, animation with image sequence etc. as the human faces pick outstanding information about emotion and instinct state of every individual human [2], [3].The 'Mehrabian’s theory' (7%/38%/55%) was practiced in different positions where there was incongruence between speech and emotions. In human communication, only 7% is contributed by the verbal part, 38% by facial movement but 55% by facial expression. This means that facial expression analysis is a necessary part in human-human as well as human to machine interaction [4].Taking facial points from both forward and sideward view points are considered as essential facial features (e.g., face, lip, view, nose) [5]. In [6], face detection process is complex which is based on face and non-faced detection algorithm. Most of researches have been done on dynamic textures [7] – [13]. Advantages of these approaches include less local processing time for validity to monotonic grey-scale changes and simple computation. [14] Paper aimed to admit fine grained diversity in facial expression on the basis of Facial Action Coding System (FACS) and action units (AUs). It was gained that muscle movement of facial expression was bounded with 44 facial Action Units (AUs), every units were different from others. In [15], for making a model of human facial expressions from video arrangements as input image Hidden Markov Models (HMMs) was used and Naïve-Bayes was used to admit the facial emotion. In [16], the local features were considered as vital image features found around the regions of eyes, nose and mouth. [17] A real moment based model is built to detect the facial expression recognition and Support Vector Machine was used for classifier. Along with facial expression recognition both space and time was considered in model [18]. A large set of human facial states were shown in [19]. Rule base reasoning and facial features were used to identify 20 AUs. In [20], proposed candied grid nodes were used to create facial wire frame model for facial emotions detection. All the above papers used geometry-based feature extraction methods. In [21] Facial representation method based on Local Binary Pattern (LBP) was proposed which is on still images. The extension of LBP model was implemented as LBPRIU2, was proposed in [22].It was rotated-invariant system and here feature vector length was reduced. LBP is justified to be an impressive texture descriptor and practiced by many researchers for pattern recognition e.g. facial expression recognition [23], [24]. In [25] proposed a new texture descriptor LPQ (Local Phase Quantization). In [26], both local binary pattern and local phase quantization method were applied but system took more time to build a facial expression recognition system. Facial recognition process includes invariant under shifting and altering [27]. In [28], it was established on the basis of four prototypic expressions. [29], Local Distinctive Gradient Pattern (LDGP) established a pixel relation among the separated blocks in whole image, creates feature vector with decimal number from two binary patterns in each block. The image is divided on equal size 81 blocks.

Proposed feature representation method can capture more texture information from local 5x3 pixels area. It presents unique relations of the referenced pixel (‘c’) with four pixels at level one (‘a1’,‘a2’,‘a3’and‘a4’) and different four pixels at level two (‘b1’,’b2’,’b3’,’b4’) using LDGP (Local Distinctive Gradient Pattern) model for boosting the facial expression recognition. New similarity based feature vector is applied on which the histogram for all features vector in each block is connected to form the global feature vector for the whole image. In LDGP the all the features of binary patterns are converted in decimal number and create feature vector. But gray color intensity does not change more rapidly as like in decimal number, that is not matched with real practice. We proposed a new Jaccard similarity coefficient based feature vector in this paper. All the binary patterns from gray color intensity values are grouped into possible number of attributes according to their similarity. Each attributes count the frequency number of similarity from binary patterns. And each block, for two patterns level require only 32 bins for feature histogram. So feature dimension is less than other geometry or template based facial expression recognition process.

2. System implementation:

At first, The pattern is computed from the 5x3 pixels region as shown in Fig.1is used to calculate binary code for a pixel .Where ,’C’ is the referenced pixel. The gray color intensity values of the pixels are a1, a2, a3, a4, b1, b2, b3 and b4 are used to formulate the binary patterns as shown table 1. A detailed example is given in table 1, where (a) gray color image, (b) gray color intensity value in two levels, (c) represents a(1-4) the corresponding gray color value of the pixels at level 1 and (b) represents b(1-4) the corresponding gray color value of the pixels at level 2.

(3)

The feature binary pattern consists of two different levels 4-bit binary pattern, say pattern-1 (P1) and pattern-2 (P2). P1 is computed using the pixels values of C, a1, a2, a3, a4 and P2 iscalculated using the pixels value of C, b1, b2, b3 and b4 as shown in table 1. Each level 4-bit binary pattern can have at most 24=16 combinations. Therefore, P1 needs 16 bins to represent a single pixel and in the same way, P2 needs 16 bins too. The two values represent the gradient directions of the gray color intensities of the neighbouring pixels with respect to the referenced pixel in two different levels

The feature patterns can be derived for all the pixels where there exists eight neighbouring pixels are a1, a2, a3, a4, b1, b2, b3 and b4 with the reference pixel ‘C’. A detailed example of obtaining feature binary patterns for a pixel is shown in table 1 step by step. Therefore to code a gray facial image using proposed method, 16+16=32 bins are required. For example, for the first level pattern (0000 to 1111) and for the second level pattern same as before (0000 to 1111). Since a facial image is divided into equal sized blocks 32(16+16) bins for counting the numbers of occurrences of the two 4-bit patterns for a given block are needed. Finally, the histograms are derived from binary patterns from all blocks that are integrated to form the feature vector for the whole image table 2.

Table 1:

Detailed example of calculating feature code for a pixel in 5x3. (a) local region with pixel intensities, (b) 4-bit code for pattern- 1=0111,

and (c) 4-bit code for pattern- 2=0110

(a)3x5 pixels region

45 110 66 120 100 180 300 140 100 First bit is 0 as 66<100 45 110 66 120 100 180 300 140 100 second bit is 1 as 120>100 45 110 66 120 100 180 300 140 100

third bit is 1 as 180>100 45 110 66 120 100 180 300 140 100

fourth bit is 1 as 140>100 45 110 66 120 100 180 300 140 100 (b) therefore, pattern(p1)=0111 First bit is 0 as 45<100

45

110

66 120 100 180 300 140 100 second bit is 1 as 110>100 45 110 66 120 100 180 300 140 100 third bit is 1 as 300>100 45 110 66 120 100 180 300 140 100 fourth bit is 0 as 100=100 45 110 66 120 100 180 300 140 100 (c) therefore, pattern(p2)=0110

(4)

Table 2: Building feature vector for a single facial image. The feature histogram for the whole image is the

feature vector representation for this face.

Facial Image

Face region is divided

1

st

block histogram Last block histogram

3. Experiment and result analysis

3.1. Jaccard Coefficient based Feature Vector

We introduced a new process of indexing binary attributes using the concept of Jaccard similarity coefficient which can be used to measure the similarity between two sample binary sets. We partitioned the whole image (table 2) into number of blocks with 5x3 pixels areas. From each block there are a centre or ‘referenced’ pixel to which we compare neighbouring eight pixels in different two levels to make a 8- bit string. Another paper on LDGP [29] converted this binary pattern into decimal value directly to build local feature vector but in our proposed method, each similar probability of binary patterns is compared with a standard 8-bit string that comprises 1’s. Jaccard indexing for feature vector is calculated while the indexing binary attributes into similarity groups according to their coefficient values. Thus model extracts facial attributes from two levels (pattern 1, pattern 2). Each of the two levels has four attributes either 0 or 1. The similarity of each combination in attributes for both p1 and p2 are specified as follows:

The Jaccard similarity coefficient J, is given as:

M11 represents the total number of attributes where p1 and p2 both have a value of 1.

M01 represents the total number of attributes where the attribute of p1 is 0 and the attribute of p2 is 1

M10 represents the total number of attributes where the attribute of p1 is 1 and the attribute of p2 is 0.

M

11

M

00

+M

11

+M

01

+M

10

J=

(5)

M00 represents the total number of attributes where p1 and p2 both have a value of 0.

Patterns p1 and p2 the possible number of similar bit string is eight. Each featuring bit string is compared with respect to its standard featuring bit string to index the similar probability measurements. Capturing feature vector from each block, all the feature vectors are integrated to build global facial feature vector.

3.2. Cross validation performance:

In this paper, a ten-fold none overlapping cross validation was performed. Cross validation divided the whole image set in two segments, one is training and other is testing image set. Among seven facial expressions, the 90% images were taken from each expression for training and rest 10% images were used for testing. Thus features are extracted from each block. Integrating the feature vectors from all the local blocks produced the distinct feature vector for whole image. Within the folds there was no overlapping and it was user choice. For each fold, ten rounds of training and testing were reported. The computation results in average confusion matrix is displayed and compared against the others in table 5, 6.

3.3. Kernel Parameter settings for Classifier:

For multi class classification LIBSVM, the kernel input values were set to: s=0 for SVM type C-Svc (default), t=0/1/2/ for linear, polynomial and RBF kernel function appropriately, c=1 is the cost of SVM, g=1/ (diameter of feature vector dimension), b=1 for probability evaluation [31]. This setting of LIBSVM was found to be suitable for JAFFE dataset with seven classes of data. Matlab “fdlibmex” library was used to detect the frontal face portion of all the JAFFE dataset images. Images are partitioned into 9x9 equal sized blocks and 11x11 pixels are bounded in each local block. Then facial images are masked by red color box to cut out the inessential hair and neck-sides and results a frontal facial image and divide equal 9x9=81 sized blocks.

Newly proposed method for facial expression recognition achieved high accuracy than other recent works on JAFFE dataset [30]: seven expression in 213 images, table 3, 4. Table 7 shows that proposed new method performance is high accurate, less computation time than LBP, LPQ. Table 8 has shown the misclassification result. From experiment result analysis, it is clearly proved that coefficient based feature vector method has obtained best performance with high accuracy.

Table 3: Instances for seven Expression on JAFFE Expression class JAFFE

Anger 30 Disgust 29 Fear 32 Happy 31 Sadness 31 Surprise 30 Neutral 30

(6)

Table 4: Some image samples from JAFFE dataset for the 7 facial expressions, two in each column for each expression.

Angry Disgust Fear Happy Natural Sad Surprise

Table 5: Average confusion matrix for facial expression recognition

Prediction Angry Disgust Fear Happy Neutral Sad Surprise

Angry 100 0.0 0.0 0.0 0.0 0.0 0.0 Disgust 4.9 94.1 1.0 0.0 0.0 0.0 0.0 Fear 0.0 0.0 90.4 9.6 0.0 0.0 0.0 Happy 0.0 0.0 0.0 99.0 1.0 0.0 0.0 Neutral 0.0 0.0 0.0 0.0 100 0.0 0.0 Sad 0.0 0.0 0.0 0.0 0.0 98.7 1.3 Surprise 0.0 0.0 0.0 0.0 0.0 2.4 97.6

Table 6: Average confusion matrix for facial expression recognition using LDGP descriptor

Prediction Angry Disgust Fear Happy Neutral Sad Surprise

Angry 100 0.0 0.0 0.0 0.0 0.0 0.0 Disgust 0.0 93.1 6.9 0.0 0.0 0.0 0.0 Fear 0.0 6.3 84.4 0.0 3.1 3.1 3.1 Happy 0.0 0.0 0.0 96.8 0.0 3.2 0.0 Neutral 0.0 0.0 0.0 0.0 100 0.0 0.0 Sad 3.2 0.0 3.6 6.5 1.6 85.1 0.0 Surprise 0.0 0.0 0.0 3.3 0.0 0.0 96.7

(7)

Table 8: Examples of misclassified expressions

Images in sequences

Truth expressions Angry Sad Disgust Happy

Misclassified- results Sad Angry Happy Disgust

4. Conclusion

This local descriptor provides the gray color intensities values from different and opposite direction pixel relation comparing centre pixel. We applied similarity coefficient based feature vector extraction method among the whole dataset. It executes more accurate classification with top most features similarity among all the image samples so that the resultant features attribute becomes uniform with no more than two changeovers at the binary patterns. Above experiments illustrated our newly proposed feature descriptor method is effective and more accurate than other appearance based local feature method on static image. The proposed method successfully classifies 93.50% of facial expressions accurately with less misclassification result. Future work may include working with motion pictures, real time video, computer vision and images having different viewpoints. So, proposed method obtained high accuracy, better recognition results and improves execution time.

Acknowledgements

We would like to deeply thank several researchers for providing various sources [1]-[31] that made our research work possible.

Table 7: Comparison of facial expression accuracy with recent

appearance-based works Method Classification accuracy Feature extraction time(sec) Classification time(sec) Proposed new coefficient based method 93.50% 0.001 0.003 LPQ 79.11% 0.095 0.070 LBP 90.42% 0.026 0.070 LBPU2 84.43% 0.039 0.025 LBPu2+LPQ 83.82% 0.137 1.00

(8)

References

[1] J. A. Russell and J. M. Fernandez-Dols, The Psychology of Facial .Expression. Cambridge, U.K.: Cambridge Univ. Press, 1997. [2] I. S. Pandzic and R. Forchheimer, Eds., MPEG-4 Facial Animation. New York: Wiley, 2002.

[3] P.Ekman and W.Friesen,Facial Action Coding System,Palo Alto,CA Consulting Psychologists Press,1978. [4] A. Mehrabian, Communication without words, psychology Today. 2(9) pp. 53-56, 1968.

[5] M. Pantic and L. Rothkrantz, “Automatic analysis of facial expressions: The state of thno. 12, pp. 1424–1445, Dec. 2000. [6] Guodong Guo, Stan Z. Li, and Kapluk Chan,” Face Recognition by Support Vector Machines”, School of Electrical and Electronic

Engineering Nanyang Technological University, Singapore 639798.

[7] J. Yu and B. Bhanu, “Evolutionary feature synthesis for facial expression recognition,” Pattern Recog. Lett., vol. 27, no. 11, pp. 1289– 1298,Aug. 2006.

[8] T. Wu, M. Bartlett, and J. Movellan, “Facial expression recognition using Gabor motion energy filters,” in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recog. Workshop Human Commun. Behav. Anal., Jun. 2010, pp. 42–47.

[9] Y. Tong, J. Chen, and Q. Ji, “A unified probabilistic framework for spon-taneous facial action modeling and understanding,” IEEE Trans.Pattern Anal. Mach. Intell., vol. 32, no. 2, pp. 258–273, Feb. 2010.

[10] P. Yang, Q. Liu, and D. Metaxas, “Boosting coded dynamic features for facial action units and facial expression recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2007, pp. 1–6.

[11] Z. Ambadar, J. Schooler, and J. Cohn, “Deciphering the enigmatic face: The importance of facial dynamics to interpreting subtle facial expressions,”Psychol. Sci., vol. 16, no. 5, pp. 403–410, May 2005.

[12] J. N. Bassili, “Emotion recognition: The role of facial movement and the relative importance of upper and lower areas of the face,” Pers. SocialPsychol., vol. 37, no. 11, pp. 2049–2058, Nov. 1979.

[13] Y.L. Tian, T. Kanade and J.F. Cohn, “Recognizing action units for facial expressions analysis,” IEEE Trans. Pattern Analysis Machine Intell.,Vol. 23, No. 2, pp. 97-115, March 2001.]

[14] I. Cohen, N. Sebe, A. Garg, L.S. Chen and T.S. Huang, “Facial expression recognition from video sequences: Temporal and static modeling,” Comput. Vision Image Understanding, Vol. 91, pp. 160-187, August 2003.

[15] Y. Zhang and Q. Ji, “Active and dynamic information fusion for facial expression understanding from image sequences,” IEEE Trans. Pattern Anal. Machine Intelli., vol. 27, pp. 699-714, May 2005.

[16] T. Anderson, A. Handid and M. Pietikainen, “Face description with local binary patterns: Application to face recognition,” IEEE Trans. Pattern Anal. Mach. Intel., Vol. 28, pp. 2037-2041, Dec. 2006.

[17] Z. Yeasin, M. Pantic, G.I. Roisman and T.S. Huang, “A survey of affect recognition methods: Audio, visual and spontaneous expressions,” IEEE Trans. Pattern Anal. Machine Intelli., Vol. 31, pp. 39-58, Jan. 2009.

[18] M. Pantic, M. I. Patras, “Dynamics of facial expression: Recognition of facial actions and their temporal segments from face profile image sequences,” IEEE Trans. Syst. Man Cybernet. Part B: Cybernet., Vol. 36: pp. 433-449, April 2006.

[19] I. Kotsia and I. Pitas, “Facial expression recognition in image sequences using geometric deformation features and support vector machines,”IEEE Trans. Image Process., Vol. 16, pp. 172-187, Jan. 2007.

[20] T. Ahonen, A. Hadid and M. Pietikainen, “Face description with local binary patterns: Application to face recognition”. IEEE Trans. Pattern Anal. Mach. Intel., Vol. 28, pp. 2037-2041, Dec. 2006.

[21] T. Ojala, M. Pietikainen and T. Maenpaa, “Multiresolution Gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Trans. Pattern Analysis Machine Intellig., Vol. 24, pp. 971-987, July 2002.

[22] G. Zhao and M. Pietikainen, “Dynamic texture recognition using local binary patterns with an application to facial expressions,” IEEE Trans.Pattern Anal. Mach. Intell., Vol. 29: pp. 915-928, June 2007.

[23] L. Ma and K. Khorasani, “Facial expression recognition using constructive feed forward neural networks,” IEEE Trans. Syst. Man Cybernet. Part B: Cybernet., Vol. 34, pp. 1588-1595, June 2004.

[24] V. Ojansivu and J. Heikkila, “Blur insensitive texture classification using local phase quantization,” Proceedings of the 3rd International Conference on Image and Signal Processing, July 1-3, 2008, Cherbourg-Octeville, France, pp. 236-243.

[25] S. Yang and B. Bhanu, “Understanding discrete facial expressions in video using an emotion avatar image,” IEEE Trans. Syst. Man Cybern.B Cybern, Vol. 42, pp. 980-992, May 2012.

[26] M. K. Hu, “Visual pattern recognition by moment invariants." Information Theory, IRE Transactions on. Vol. 8, no. 2, pp.179-187, 1962

[27] Y. Zhu, L. C. DE. Silva, C. C. Co, “Using Moment Invariant and HMM for Facial Expression Recognition,” Pattern Recognition Letters Elsevier. Vol. 23, no. 1, pp. 83-91, 2002.

[28] Mohammad Shahidul Islam, Surapong Auwatanamongkol Md. Zahid Hasan,” Boosting Facial Expression Recognition Using LDGP- Local Distinctive Gradient Pattern”, ICEEICT Conference Paper.

[29] M.J.Lyons, M.Kamachi and J.gyoba,The Japanese female facial expression (JAFFE) dataset Available: www.kasrl.org/jaffe_download.html

[30] LIBSVM -- A Library for Support Vector Machines, multi- class classification Online Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm/