 Static Human Facial Expression Recognition Using Facial Components Detection And Hog Features


¹M. Prakash, ²Dr. S. Pannirselvam

¹Ph.D Research Scholar, ²Associate Professor & Head,

¹,²Department of Computer Science,

Erode Arts and Science College (Autonomous), Erode, India.

Abstract

This paper introduces an effective approach to the problem of facial expression recognition. The system detects the face and then extracts the facial components: eyes, brows, nose and mouth. Considering that facial expressions result from muscle movements, which can be seen as deformations, and that Histogram of Oriented Gradients (HOG) features are very sensitive to object deformations, we use HOG to encode these facial components. A linear SVM is then trained to classify the facial expressions. We evaluate the proposed method on the JAFFE dataset and the Extended Cohn-Kanade dataset. The average classification rates on the two datasets are 94.3% and 88.7%, respectively.

Experimental results demonstrate the competitive classification accuracy of the proposed approach.

Keywords : facial component detection, facial expression recognition, HOG features, SVM.

1. Introduction

Human beings can communicate thoughts and desires through nonverbal means such as facial expressions and involuntary signals. Facial expression is among the most effective nonverbal ways of communicating with each other. Facial expression recognition has received increasing attention because it can be widely used in various fields, such as lie detection, medical assessment and human-computer interaction (HCI). Indeed, it is widely accepted that computing should move into the background, weave itself into the fabric of our everyday living spaces, and bring the human user to the foreground [1]. To achieve this goal, computer vision and machine learning must be advanced, and the psychological analysis of emotion must be strengthened. Facial expression recognition, however, is an extremely difficult task. A variety of factors, such as illumination, pose, deformation and in-the-wild conditions, contribute to the difficulty. Moreover, facial expressions are subtle muscle movements, and such slight changes are hard to detect. Facial expression recognition is a long-standing research topic and has seen progress in recent decades. A widely used method for describing facial expressions is the Facial Action Coding System (FACS), which was proposed by Ekman et al. in 1978 [2] and revised in 2002 [3].

FACS aims to break down facial expressions into different action units (AUs). Facial expressions can then be identified from combinations of the action units. Another approach is to recognize facial expressions directly from images.

Two kinds of methods, appearance-based and geometry-based [4], are used to recognize facial expressions directly from images. Appearance-based methods typically apply texture descriptors such as Gabor filters and Local Binary Patterns (LBP) to represent facial expression features. Geometry-based methods aim to capture the shape of the face: a set of fiducial points is located, and geometric features are derived from them. Many researchers have worked on facial expression recognition. Zhang et al. [5] compared two kinds of features for facial expression recognition, geometry-based features and Gabor wavelets.


A two-layer perceptron was applied as the classifier to compare the performance of the two kinds of features. Feng et al. [6] described a coarse-to-fine classification scheme for facial expression recognition. The coarse stage computed the distances between the feature vectors and the model vectors. A K-nearest-neighbor classifier was then used to make the final classification in the fine stage. Khandait et al. [7] based their recognition on the dimensions and heights of the facial components.

Zhang et al. [8], motivated by the facial components and muscle movements, used salient distance features to recognize facial expressions. They extracted 3-D Gabor features, selected "salient" patches and matched the patches to build the salient distance features. Shan et al. [9] considered Local Binary Patterns (LBP) a good texture descriptor and used them to represent facial expressions. They used Boosted-LBP to select the most important LBP features.

The boosted LBP features were used to train an SVM, and the recognition rate was excellent. In this paper we introduce an effective appearance-based technique for the facial expression recognition problem. The system first detects the facial components in the face image. HOG features are then extracted to encode these facial components and are concatenated into a single feature vector. These feature vectors are used to train a linear SVM.

This work is somewhat similar to the earlier work in [10]. Nevertheless, there are differences between the present work and that work. The earlier work applied feature descriptors to the whole face and explored several options, including HOG, LBP and LTP. Our work focuses on the facial components and uses HOG as the feature descriptor of each component. The earlier work targeted facial expression recognition in the presence of face registration errors, whereas the proposed work focuses on the facial components that contribute to facial expression recognition.

Figure 1. Overview of the Proposed System (blocks: Face Detection, Eye Detection, Components Detection, Components Extraction, HOG Encode, SVM Classification)


2. PROPOSED METHOD

The proposed system consists of three blocks. The first block performs face detection and facial component extraction. The second block encodes these components using HOG. The last block is SVM classification. An overview of the proposed facial expression recognition system is shown in Fig. 1.

A. Face Detection and Facial Components Extraction

This stage begins with face detection using the Viola-Jones face detector [11]. Once the face is obtained, the eyes, brows, nose and mouth must be extracted from it. We detect the eyes first and extract the other components with the help of their relative locations. The face images in the datasets we use are all frontal views, and the brows lie above the eyes. We therefore enlarge the detected eye regions to include the brows. As for the nose and mouth, they lie below the eyes, so finding the nose-mouth region is not hard.
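The relative-location step can be illustrated with a small geometric sketch. It assumes the face box and the two eye boxes have already been produced by some detector; the `Box` type, the function names and the padding ratios `up` and `side` are hypothetical choices for illustration, since the text does not give the exact enlargement used.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int


def clip(box: Box, face: Box) -> Box:
    """Restrict a box to the inside of the face box."""
    x0, y0 = max(box.x, face.x), max(box.y, face.y)
    x1 = min(box.x + box.w, face.x + face.w)
    y1 = min(box.y + box.h, face.y + face.h)
    return Box(x0, y0, max(0, x1 - x0), max(0, y1 - y0))


def eye_brow_region(face: Box, left_eye: Box, right_eye: Box,
                    up: float = 0.6, side: float = 0.2) -> Box:
    """Enlarge the union of the eye boxes upward and sideways to cover the brows.

    `up` and `side` are hypothetical padding ratios, not values from the paper.
    """
    x0 = min(left_eye.x, right_eye.x)
    y0 = min(left_eye.y, right_eye.y)
    x1 = max(left_eye.x + left_eye.w, right_eye.x + right_eye.w)
    y1 = max(left_eye.y + left_eye.h, right_eye.y + right_eye.h)
    pad_x = round(side * (x1 - x0))
    pad_up = round(up * (y1 - y0))
    region = Box(x0 - pad_x, y0 - pad_up, (x1 - x0) + 2 * pad_x, (y1 - y0) + pad_up)
    return clip(region, face)


def nose_mouth_region(face: Box, left_eye: Box, right_eye: Box) -> Box:
    """Take the part of the face below the eyes, between the outer eye edges."""
    x0 = min(left_eye.x, right_eye.x)
    x1 = max(left_eye.x + left_eye.w, right_eye.x + right_eye.w)
    eyes_bottom = max(left_eye.y + left_eye.h, right_eye.y + right_eye.h)
    region = Box(x0, eyes_bottom, x1 - x0, face.y + face.h - eyes_bottom)
    return clip(region, face)


face = Box(0, 0, 156, 156)
left_eye, right_eye = Box(30, 50, 30, 20), Box(96, 50, 30, 20)
brows = eye_brow_region(face, left_eye, right_eye)
nose_mouth = nose_mouth_region(face, left_eye, right_eye)
```

The eye-brow region starts above the detected eyes and ends at their lower edge, while the nose-mouth region begins where the eyes end and runs to the bottom of the face box.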

B. Histogram of Oriented Gradients Features

A wide range of features has been designed for facial expression recognition, including SIFT [12], Gabor filters [13], Local Binary Patterns [14] and HOG (Histogram of Oriented Gradients) [15]. Facial expressions arise from muscle movements and can be seen as a kind of deformation. For example, muscle movements open or close the mouth and raise or lower the brows. Such movements are essentially deformations, and HOG features are very sensitive to the deformation of objects.

In this paper, we use HOG features to encode the facial components. HOG was originally proposed by Dalal and Triggs in 2005 [15]. It has been well received by the computer vision community and is widely used in object detection applications, especially pedestrian detection. HOG counts the occurrences of gradient orientations within an image patch. The idea is that the distribution of local gradient intensities and orientations can describe the appearance and shape of a local object [15].

In addition, HOG is very beneficial for facial expression recognition in comparison with other features such as LBP and Gabor filters. HOG can characterize the shapes of the components that matter for facial expressions. Therefore, we use HOG to encode these facial components. In our experiments, we set the cell size to 8x8, the number of orientation bins to 9, and the orientation range to 0-180 degrees.
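To make the idea of counting gradient orientations concrete, the following is a minimal sketch of the histogram for a single 8x8 cell with 9 bins over 0-180 degrees. It is deliberately simplified and is not the full descriptor of [15]: it uses centered differences on interior pixels only, votes into the nearest bin without interpolation, and omits block normalization. The function name `cell_histogram` is illustrative.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations.

    `cell` is a list of rows of gray values. Orientations are folded into
    the range [0, 180) degrees and split into `n_bins` equal bins.
    """
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal gradient
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# A vertical edge (dark left half, bright right half) has purely horizontal
# gradients, so all votes fall into the first orientation bin.
vertical_edge = [[0] * 4 + [255] * 4 for _ in range(8)]
# A horizontal edge has purely vertical gradients (90 degrees, middle bin).
horizontal_edge = [[0] * 8 if r < 4 else [255] * 8 for r in range(8)]

hist_v = cell_histogram(vertical_edge)
hist_h = cell_histogram(horizontal_edge)
```

The two toy cells show why such a histogram reacts to shape changes: rotating the edge moves the votes from one orientation bin to another.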

C. Support Vector Machine

Support Vector Machine (SVM) has been widely used for many pattern recognition tasks. An SVM is believed to separate classes efficiently. In our work, we train SVMs on the features we propose in order to classify facial expressions. In general, an SVM constructs a hyperplane in a high-dimensional space.

A good separation is achieved when the hyperplane has the largest distance (margin) to the nearest training samples of any class. Given a set of labeled samples

$$\{(x_i, y_i)\}_{i=1}^{N}, \quad x_i \in \mathbb{R}^{d}, \quad y_i \in \{-1, +1\},$$

an SVM tries to find a hyperplane $w^{\top} x + b = 0$ that separates the samples with the smallest error.

For an input vector $x_i$, the classification is made by computing its distance to the hyperplane. The original SVM is a binary classifier; we adopt the one-against-rest strategy to perform multi-class classification. We use LIBSVM [16] in our experiments.
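The one-against-rest decision rule can be sketched in a few lines, assuming one linear hyperplane (w, b) has already been trained per expression. The weights below are made-up toy values for illustration, not trained models, and the function names are hypothetical; the sketch does not call LIBSVM.

```python
def decision_value(w, b, x):
    """Signed score of x with respect to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_one_vs_rest(models, x):
    """Return the label whose 'this class vs. the rest' score is largest.

    `models` maps each label to a (w, b) pair.
    """
    return max(models, key=lambda label: decision_value(*models[label], x))


# Toy two-dimensional hyperplanes (hypothetical values).
models = {
    "happy": ([1.0, 0.0], 0.0),
    "sad": ([-1.0, 0.0], 0.0),
    "surprise": ([0.0, 1.0], -0.5),
}

first = predict_one_vs_rest(models, [2.0, 0.1])
second = predict_one_vs_rest(models, [0.0, 3.0])
```

Each binary classifier scores the input independently, and the class with the highest score wins, which is how a binary classifier is extended to the seven-expression problem.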


3. EXPERIMENTAL RESULTS AND DISCUSSION

We use two widely adopted datasets to evaluate the performance of the proposed approach: the Japanese Female Facial Expression (JAFFE) database [17] and the Extended Cohn-Kanade dataset [18].

A. JAFFE Database

The database contains 213 images of ten subjects and seven facial expressions. Each subject has about 20 images, and each expression includes two to three images. The seven expressions are angry, happy, disgust, sadness, surprise, fear and neutral. The seven expressions of one subject are shown in Fig. 2.

The images in this experiment are 256 x 256. After extracting the face region from the image, we resize it to 156 x 156. We then extract the facial components using the method described above and resize them to a common scale.

In our experiments, the eye-brow region is 52x106 and the dimension of its HOG-encoded feature is 1x2160. The nose-mouth region is 78x104 and its HOG-encoded feature is 1x3456. We concatenate the two feature vectors into one, so the final feature is a 1x5616 vector.
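The reported feature lengths can be checked with a short calculation. The sketch below assumes the common layout of 2x2-cell blocks that slide by one cell, which the text does not state explicitly; under that assumption, 8x8 cells and 9 bins reproduce the reported dimensions. The function name `hog_length` is illustrative.

```python
def hog_length(height, width, cell=8, block=2, bins=9):
    """Length of a HOG descriptor for a patch of the given size.

    Assumes square cells of `cell` pixels and blocks of `block` x `block`
    cells that move one cell at a time (an assumption, not stated in the text).
    """
    cells_y, cells_x = height // cell, width // cell
    blocks_y = max(cells_y - block + 1, 0)
    blocks_x = max(cells_x - block + 1, 0)
    return blocks_y * blocks_x * block * block * bins


eye_brow = hog_length(52, 106)     # 5 x 12 blocks of 36 values each
nose_mouth = hog_length(78, 104)   # 8 x 12 blocks of 36 values each
total = eye_brow + nose_mouth
print(eye_brow, nose_mouth, total)  # 2160 3456 5616
```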

We adopt a leave-sample-out scheme to test our method and compare it with other methods.

The database contains ten subjects, each with several images. From each subject we randomly select two or three images as the test set and use the remaining images as the training set. In the end, there are 23 images in the test set and 190 images in the training set. The results are shown in Table 1.

Table 1 : Classification Results for JAFFE Dataset

Method                Classification Rate
Gabor + FSLP          90.2%
LBP                   87.5%
Patch based Gabor     91.6%
Proposed Method       92.3%

Figure 2. Seven Expressions of One Subject (Neutral, Sad, Anger, Disgust, Fear, Happy, Surprise)

In [9], local binary patterns were used to represent the facial expressions and AdaBoost was used to select the best features. The resulting classification rate was 89.1%. In [19], the images were filtered with 18 Gabor filters, and only the amplitudes at selected fiducial points were used as feature vectors. They tested different classifiers, and the best performance was 91%. In [8], Zhang et al. used salient distance features to recognize facial expressions.

B. The Extended Cohn-Kanade Dataset

The dataset consists of 123 subjects and 593 sequences. It covers seven expressions plus neutral. The seven expressions are angry, happy, sad, surprise, contempt, fear and disgust. The eight expressions, each from a different subject, are shown in Fig. 3. Of the 593 sequences, 327 are labeled. We use the peak frame of each labeled sequence as the sample image. Table 2 shows the frequency of each expression.

Table 2 : Frequency of each Expression in the Extended Cohn-Kanade Dataset

Expression    Frequency
Angry         42
Contempt      20
Disgust       62
Happy         26
Surprise      72
Sad           30
Fear          88

Figure 3. The Eight Expressions for Different Subjects in the CK+ Dataset

Note that the neutral expression is omitted from the experiments. We follow the same strategy as in the JAFFE experiments. The original image size is 640x490. We first detect the face and resize it to 256x256. Once the face region is obtained, the eyes are detected and the facial components are extracted. The eye-brow region is 74x150 and the nose-mouth region is 130x128. The extracted facial components are downsampled to reduce the dimensionality before HOG is applied. Finally, the HOG features of the eye-brow part form a 1x864 vector and those of the nose-mouth part form a 1x1764 vector. The final feature is a 1x2628 vector.

In this experiment we divide the images into two sets, a training set and a test set. One fifth of the images of each class are randomly selected for the test set, and the remaining images form the training set. In total, 59 samples are used for testing and 268 for training. To reduce the effect of randomness, we repeat the procedure ten times and record the classification rate each time. We finally obtain an average rate of 88.7% with a deviation of ±2.3%.
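The repeated random split can be sketched as follows. This is an illustration only: the function names are hypothetical, `evaluate` stands in for training and testing a classifier, the per-class test size is rounded down, and the reported deviation is assumed here to be a sample standard deviation, none of which is specified in the text.

```python
import random
import statistics
from collections import defaultdict


def stratified_split(labels, one_in=5, seed=0):
    """Randomly hold out one in `one_in` samples of every class for testing.

    Returns (train_indices, test_indices). Rounding down per class is an
    assumption made for this sketch.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    test = []
    for indices in by_class.values():
        rng.shuffle(indices)
        test.extend(indices[: len(indices) // one_in])
    held_out = set(test)
    train = [i for i in range(len(labels)) if i not in held_out]
    return train, sorted(test)


def repeated_holdout(labels, evaluate, repeats=10):
    """Run `evaluate(train, test)` on several random splits.

    Returns the mean rate and its sample standard deviation.
    """
    rates = [evaluate(*stratified_split(labels, seed=s)) for s in range(repeats)]
    return statistics.mean(rates), statistics.stdev(rates)


# Toy labels: 10 samples of class "a" and 15 of class "b".
toy_labels = ["a"] * 10 + ["b"] * 15
train, test = stratified_split(toy_labels)
```

Passing a different seed for each repetition yields ten different splits, so the averaged rate depends less on any single random draw.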

To calculate the classification rate of each expression, we follow the baseline method and adopt the leave-one-subject-out strategy. With this strategy, each subject is tested once. There are 118 subjects. Each time, the images of one subject are selected for testing and the images of the other subjects are used for training. We repeat this 118 times and take the average. Table 3 shows the results. The diagonal values are the hit rates. We notice that the expression "contempt" has the lowest hit rate; this expression is easily confused with other expressions. The highest hit rates are obtained for "surprise", "happy" and "disgust"; these three expressions are clearly distinct from the others.
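The leave-one-subject-out protocol amounts to one fold per subject. A minimal sketch, assuming each sample carries a subject identifier (the identifiers and the function name below are made up for illustration):

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for every subject.

    All samples of the held-out subject form the test set; the samples of
    all other subjects form the training set.
    """
    for held_out in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train, test


# Toy example with three subjects and six samples (hypothetical identifiers).
subject_ids = ["S1", "S1", "S2", "S3", "S3", "S3"]
folds = list(leave_one_subject_out(subject_ids))
```

Because no subject appears in both the training and the test set of a fold, the measured rates reflect generalization to unseen people.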

Table 3 : Confusion Matrix of the Expressions

      AN     CO     DI     FE     HA     SA     SU
AN    0.81   0.03   0.06   0.01   0.01   0.00   0.03
CO    0.09   0.61   0.01   0.09   0.13   0.13   0.00
DI    0.03   0.00   0.90   0.00   0.04   0.00   0.01
FE    0.05   0.02   0.00   0.68   0.12   0.01   0.05
HA    0.01   0.02   0.01   0.00   0.94   0.00   0.00
SA    0.06   0.03   0.00   0.03   0.01   0.79   0.03
SU    0.00   0.02   0.00   0.00   0.00   0.00   0.97
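As a small illustration of how the hit rates are read from Table 3, the sketch below copies the matrix values, takes the diagonal and ranks the expressions. The variable and function names are illustrative.

```python
LABELS = ["AN", "CO", "DI", "FE", "HA", "SA", "SU"]

# Rows are the true expressions, columns the predicted ones (Table 3).
CONFUSION = [
    [0.81, 0.03, 0.06, 0.01, 0.01, 0.00, 0.03],
    [0.09, 0.61, 0.01, 0.09, 0.13, 0.13, 0.00],
    [0.03, 0.00, 0.90, 0.00, 0.04, 0.00, 0.01],
    [0.05, 0.02, 0.00, 0.68, 0.12, 0.01, 0.05],
    [0.01, 0.02, 0.01, 0.00, 0.94, 0.00, 0.00],
    [0.06, 0.03, 0.00, 0.03, 0.01, 0.79, 0.03],
    [0.00, 0.02, 0.00, 0.00, 0.00, 0.00, 0.97],
]


def hit_rates(labels, matrix):
    """Map every label to its diagonal entry, i.e. its hit rate."""
    return {label: matrix[i][i] for i, label in enumerate(labels)}


rates = hit_rates(LABELS, CONFUSION)
lowest = min(rates, key=rates.get)
top_three = sorted(rates, key=rates.get, reverse=True)[:3]
print(lowest, top_three)  # CO ['SU', 'HA', 'DI']
```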

We also compare our method with the results of three baseline methods, which use different features: SPTS, CAPP and SPTS + CAPP.

The hit rate is increased by almost 40%. For some expressions the hit rates of SPTS and CAPP are higher than those of our system; for the expressions "anger" and "fear", however, they are lower than those of our method. Compared with the baseline methods, our approach achieves good performance under tougher conditions, because our scheme does not use neutral faces as a reference.

Table 4 : Classification rates for each expression in different methods

      SPTS   CAPP   SPTS + CAPP   Proposed Method
AN    0.30   0.81   0.90          0.92
CO    0.29   0.25   0.81          0.81
DI    0.62   0.98   0.91          0.99
FE    0.29   0.23   0.70          0.73
HA    1.00   0.92   0.99          1.00
SA    0.35   0.65   0.69          0.84
SU    1.00   0.98   1.00          1.00

From Table 4, we can see that the performance of our method is much better than that of SPTS and CAPP, especially for the expression "contempt".

4. CONCLUSION

This paper proposes an effective approach to the problem of facial expression recognition.

We detect and extract the facial components rather than processing the whole face. Facial expressions are caused by facial muscle movements, and these movements or subtle changes are represented by HOG features, which are sensitive to object shapes. The encoded features are used to train a linear SVM. Experimental results on two databases, JAFFE and the Extended Cohn-Kanade dataset, show that the proposed method performs well. The classification rates of our method on the two datasets are 94.3% and 88.7 ± 2.3%, respectively. Facial expression recognition remains an extremely difficult problem, and much effort is still needed to improve the classification rate for practical applications. Our future work will focus on improving the performance of the method in the wild and on subtle expressions such as "contempt".

REFERENCES

1. Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, "A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 31, pp. 39-58, 2009.

2. P. Ekman and W. V. Friesen, "Facial Action Coding System: A Technique for the Measurement of Facial Movement," Consulting Psychologists Press, 1978.

3. P. Ekman, W. V. Friesen, and J. C. Hager, "Facial Action Coding System: The Manual on CD ROM. A Human Face," 2002.

4. S. Z. Li and A. K. Jain, Handbook of Face Recognition: Springer, 2011.

5. Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, "Comparison between geometry-based and Gabor-wavelets-based facial expression recognition using multi-layer perceptron," in Automatic Face and Gesture Recognition, Proceedings. Third IEEE International Conference on, 1998, pp. 454-459.

6. X. Feng, A. Hadid, and M. Pietikäinen, "A coarse-to-fine classification scheme for facial expression recognition," in Image Analysis and Recognition, ed: Springer, 2004, pp. 668-675.

7. S. Khandait, R. C. Thool, and P. Khandait, "Automatic facial feature extraction and expression recognition based on neural network," International Journal of Advanced Computer Science and Applications, vol. 2, pp. 113-118, 2012.

8. L. Zhang and D. Tjondronegoro, "Facial expression recognition using facial movement features," Affective Computing, IEEE Transactions on, vol. 2, pp. 219-229, 2011.

9. C. Shan, S. Gong, and P. W. McOwan, "Facial expression recognition based on Local Binary Patterns: A comprehensive study," Image and Vision Computing, vol. 27, pp. 803-816, 2009.

10. T. Gritti, C. Shan, V. Jeanne, and R. Braspenning, "Local features based facial expression recognition with face registration errors," in Automatic Face & Gesture Recognition, 2008. FG'08. 8th IEEE International Conference on, 2008, pp. 1-8.

11. P. Viola and M. Jones, "Robust Real-Time Face Detection," International journal of computer vision, vol. 57, pp. 137-154, 2004.

12. D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International journal of computer vision, vol. 60, pp. 91-110, 2004.

13. H. G. Feichtinger and T. Strohmer, Gabor Analysis and Algorithms: Theory and Applications: Springer, 1998.

14. T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, pp. 971-987, 2002.

15. N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," in Computer Vision and Pattern Recognition, 2005. IEEE Conference on, 2005, pp. 886-893.

16. C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, 2011.

17. M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding Facial Expressions with Gabor Wavelets," in Automatic Face and Gesture Recognition, Proceedings. Third IEEE International Conference on, 1998, pp. 200-205.

18. P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and I. Matthews, "The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression," in Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on, 2010, pp. 94-101.

19. G. Guo and C. R. Dyer, "Learning from examples in the small sample case: face expression recognition," Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, vol. 35, pp. 477-488, 2005.