Facial Expression Recognition


International Journal of Emerging Technology and Advanced Engineering

Website: www.ijetae.com (ISSN 2250-2459, ISO 9001:2008 Certified Journal, Volume 4, Issue 5, May 2014)


Yash Arya¹, Prathamesh Shinde², Sneha Chandwani³, Naveena Chandwani⁴, M. Mani Roja⁵

¹,²,³,⁴ Students, Thadomal Shahani Engineering College, Mumbai, India.

⁵ Associate Professor, Thadomal Shahani Engineering College, Mumbai, India.

Abstract— Research by the eminent psychologist Paul Ekman on the expression of emotions in humans indicates that emotions are biologically determined and universal across human cultures. Research has also shown that even when subjects try to conceal their emotions, these emotions leak out for a fraction of a second in the form of "microexpressions". A microexpression is a brief, involuntary facial expression that reflects a person's emotional state; microexpressions usually last a very short time. "Macroexpressions", on the other hand, are the expressions we see in everyday interactions; since they are not suppressed, they last longer. The aim of this paper is to develop a system that recognizes macroexpressions in humans.

Keywords— Image Processing, Facial Expression, PCA, DCT

I. INTRODUCTION

For much of human history, emotions were considered purely subjective and entirely culture dependent. Before examining the role and development of emotions in humans, however, it is necessary to understand a few things about human psychology and innate human nature. It was considered common knowledge that the human mind is a tabula rasa (literally: blank slate) at birth and can be moulded entirely one way or the other by parents, elders and society. Questions of human development were treated, in essence, as questions of "Nature vs. Nurture". This attitude was expressed explicitly by the Jesuit motto "Give me a child until he is seven and I will give you the man". Are human beings blank slates at birth? Are an individual's personality and development determined only by the way he or she is nurtured, or do they also depend on innate human nature? Whether emotions are culture-bound and subjective or innate and universal is therefore a critical question. If emotions are universal and innate, then human nature is not a tabula rasa; and if human nature is not a tabula rasa, then our development and personalities are shaped by a balance between the forces of nature and nurture.

II. UNIVERSALITY OF EMOTIONS

In 1966, psychologists E. A. Haggard and K. S. Isaacs, while viewing hours of psychotherapy footage as part of an experiment, noticed small, involuntary non-verbal cues being exchanged between therapist and patient. These cues recurred consistently throughout the footage they monitored, and they called them "micromomentary expressions". A few years later, building on this earlier work and observing the same behaviours, Paul Ekman coined the term "microexpressions" while studying deception [1]. A microexpression is a brief, involuntary facial expression that reflects the emotion a person is experiencing. Microexpressions usually occur in high-stakes situations, where people have something to lose or gain. Their discovery provided strong evidence that the expression of emotions in humans is universal and culture independent. Unlike regular facial expressions, microexpressions are difficult to hide. They usually last from about 1/25 to 1/5 of a second, so detecting them with the naked eye is next to impossible for untrained observers.

The aim of this paper is to implement a system that recognizes facial expressions from images. Neuroscientists and behavioural psychologists have identified seven universal emotions that are expressed by all humans, irrespective of race, religion, caste, colour, sex or creed. These expressions are innate to human nature, and each has a special significance. The seven emotions are disgust, anger, fear, sadness, happiness, surprise and contempt [2]. From these, a database comprising five expressions was created: disgust, anger, sadness, happiness and a neutral face. The system can, however, be extended to all seven expressions without any modification to the algorithm, simply by extending the database to cover all seven emotions.

III. PROBLEM FORMULATION


Considering the vital role emotions play in our lives, it is important to develop a system that can objectively detect and monitor them. It is often not possible to deploy human personnel to monitor the expression of emotions in public areas, where such monitoring is, in fact, most essential. For example, it may be extremely valuable to identify a person in a public subway who is expressing extreme rage or hatred together with markers of contempt or anger. Public meetings and election rallies are other situations where detecting such expressions is crucial. Some famous assassinations in history might have been prevented had the relevant expression been identified at the right time; the assassination of Abraham Lincoln, for instance, might have been averted had contempt or hatred been identified on the face of John Wilkes Booth. In the private domain, business meetings can be recorded (with the consent of all parties involved) and analysed for microexpressions of deception. Identifying deception before it is too late to avoid damage can save a company a large amount of money. A similar application is possible in courts of law.

Here, of course, it is crucial to understand and anticipate false-positive results and false associations of emotions with expressions. Other independent systems can be used to assess the truthfulness of a suspect's or witness's testimony, such as polygraph testing, voice-stress analysers and functional MRI. While each of these systems has its own drawbacks, combining them with facial expression analysis gives a reasonably accurate (albeit not always definitive) indication of honesty or deception [3-4]. The three premises established so far serve as necessary (but not sufficient) conditions for the need for a facial expression analyser system:

• The expression of emotions in humans is an important phenomenon that occurs from minute to minute.

• Since emotions are universal and provide a window into a person's inner consciousness, it is important to monitor them in high-population areas and at large events. Even with specialized training, it may not be practical to deploy a human workforce for this task.

• Therefore, it is necessary to develop systems that help detect and measure facial expressions in humans.

IV. IMAGE TRANSFORMS

As the first step in the implementation procedure, several Discrete Image Transforms were reviewed.

These included sinusoidal transforms such as the Discrete Fourier Transform, Discrete Cosine Transform, Discrete Sine Transform and Hartley Transform, and non-sinusoidal transforms such as the Walsh-Hadamard Transform, Fast Hadamard Transform, Haar Transform, Modified Haar Transform, Slant Transform and the Karhunen-Loeve (K-L) or Hotelling Transform (Principal Component Analysis). Of these, Principal Component Analysis [5] and the Discrete Cosine Transform were selected.

A. Principal Component Analysis


The K-L Transform, or Hotelling Transform, is also called the Eigenvector Transform. It is based on the statistical properties of the image data and has several properties that make it useful for image processing. It decorrelates the data, which enables high degrees of image compression, and because of its high energy compaction it is also well suited to image analysis. What distinguishes Principal Component Analysis from the other transforms is that its transform matrix is not fixed; rather, it is constructed from the images under consideration. In other words, the basis set of PCA is image dependent.

Figure 1: The PCA transformation

Figure 1 gives a geometric illustration of the PCA process [6] in two dimensions. Using all the data points, we find the mean values of the variables (µ_x1, µ_x2) and the covariance matrix, which is 2 × 2 in this case. Calculating the (unit-length) eigenvectors of the covariance matrix gives the direction vectors indicated by Φ1 and Φ2. Putting the two eigenvectors as columns in the matrix Φ = [Φ1, Φ2] creates a transformation matrix that takes the data points from the [x1, x2] axis system to the [Φ1, Φ2] axis system with the equation:

$$ p_\Phi = \Phi^{T} (p_x - \mu_x) \qquad (1) $$


where p_x is any point in the [x1, x2] axis system, µ_x = (µ_x1, µ_x2) is the data mean, and p_Φ is the coordinate of the point in the [Φ1, Φ2] axis system. Until recently, the K-L Transform was seldom used in practice: because its basis set is image dependent, generating the transform matrix and then computing the transformed image requires a tremendous amount of computation. However, with advances in computer architecture and ever more powerful processors, it is now feasible to use the K-L Transform comfortably and reap its benefits.
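To make the two-dimensional example concrete, here is a minimal NumPy sketch of the transformation p_Φ = Φ^T(p_x − µ_x) described above. The synthetic data set, the function name pca_2d and the use of numpy.linalg.eigh are illustrative assumptions, not part of the original work. Because each point is stored as a row, the transform is applied in its row-vector form (p_x − µ_x)Φ.

```python
import numpy as np

def pca_2d(points):
    """Mean, eigenvector matrix Phi = [Phi1, Phi2] and transformed points.

    points: array of shape (n, 2), one [x1, x2] point per row.
    """
    mu = points.mean(axis=0)                     # (mu_x1, mu_x2)
    cov = np.cov(points, rowvar=False)           # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric matrix: real eigenpairs, ascending
    order = np.argsort(eigvals)[::-1]            # largest-variance direction first
    phi = eigvecs[:, order]                      # unit-length eigenvectors as columns
    # Row-vector form of p_Phi = Phi^T (p_x - mu_x), applied to every point at once.
    p_phi = (points - mu) @ phi
    return mu, phi, p_phi

# Hypothetical correlated data: x2 depends linearly on x1 plus noise.
rng = np.random.default_rng(0)
x1 = rng.normal(5.0, 3.0, size=500)
x2 = 0.8 * x1 + rng.normal(-2.0, 1.0, size=500)
points = np.column_stack([x1, x2])

mu, phi, p_phi = pca_2d(points)
print(np.round(np.cov(p_phi, rowvar=False), 6))  # off-diagonal entries are ~0
```

After the transform, the covariance matrix of p_Φ is diagonal up to rounding, which is the decorrelation property attributed to the K-L Transform; since Φ is orthogonal, p_Φ Φ^T + µ_x (row form) recovers the original points.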

B. Discrete Cosine Transform

The Discrete Cosine Transform was introduced by N. Ahmed, T. Natarajan and K. R. Rao and has found applications in image compression, face detection and other areas of image processing [7]. Just as the Fourier Transform uses sines and cosines to represent a signal, the DCT uses only cosine waves; hence, for a real-valued image, the DCT is purely real, unlike the DFT, which is complex. For image processing applications, the 2-dimensional DCT is used. The basis set of the DCT does not depend on the image, so the computation time required is low. The DCT is thus a fast transform that also provides excellent energy compaction. The 2D-DCT of an M × N image is given by

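$$ S(u,v) = c(u)\,c(v) \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} s(x,y) \cos\left(\frac{(2x+1)u\pi}{2M}\right) \cos\left(\frac{(2y+1)v\pi}{2N}\right) \qquad (2) $$

where S(u,v) is the DCT of the image s(x,y), and

$$ c(u) = \begin{cases} \sqrt{1/M}, & u = 0 \\ \sqrt{2/M}, & \text{otherwise} \end{cases} \qquad c(v) = \begin{cases} \sqrt{1/N}, & v = 0 \\ \sqrt{2/N}, & \text{otherwise} \end{cases} $$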
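The 2D-DCT expression above can be evaluated for all (u, v) at once as a product of two basis matrices, S = C_M s C_N^T, where row u of C_M holds c(u) cos((2x + 1)uπ / 2M). The sketch below assumes NumPy and uses hypothetical helper names (dct_basis, dct2); it is written for clarity rather than speed, since fast DCT algorithms exist.

```python
import numpy as np

def dct_basis(n):
    """n x n matrix whose row u holds c(u) * cos((2x + 1) u pi / (2n)), x = 0..n-1."""
    u = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    basis = np.cos((2 * x + 1) * u * np.pi / (2 * n))
    basis[0, :] *= np.sqrt(1.0 / n)     # c(0) = sqrt(1/n)
    basis[1:, :] *= np.sqrt(2.0 / n)    # c(u) = sqrt(2/n) for u > 0
    return basis

def dct2(s):
    """2D-DCT of an M x N image s: the double sum evaluated for every (u, v)."""
    m, n = s.shape
    return dct_basis(m) @ s @ dct_basis(n).T

# A smooth (ramp) toy image: most of its energy lands in the low-frequency corner.
img = np.add.outer(np.arange(8.0), np.arange(8.0))
S = dct2(img)
low = np.sum(S[:2, :2] ** 2) / np.sum(S ** 2)
print(f"energy in the 2 x 2 low-frequency block: {low:.3f}")
```

With this choice of c(u) and c(v) the basis is orthonormal, so the transform preserves the total energy of the image and is inverted by C_M^T S C_N; a constant image puts all of its energy into S(0, 0).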

V. IMPLEMENTATION

After reviewing the core concepts of image processing and the discrete image transforms, and studying previously employed approaches [6-7], the implementation roadmap was finalized.

A. Database Creation

The first step in the implementation procedure was to create a database of images showing happiness, anger, disgust, sadness and a neutral face. In creating the database, the following thumb-rules were followed as closely as possible:

1. For any given emotion, the entire spectrum of expressions must be captured. For example, everything from a faint smile to an outburst of laughter should be included under "happiness".

2. Every image should show a clear, distinct facial expression that corresponds to only one emotion. Expressions corresponding to multiple emotions should be avoided.

3. The distance of the camera from the face, the ambient brightness level and other capture parameters must be kept constant.

Adhering to thumb-rule 1 was relatively easy, but in retrospect it proved fairly difficult to satisfy thumb-rule 2 thoroughly; only partial success was achieved for this criterion. Thumb-rule 3 was satisfied to a great extent by using the automatic colour correction [9] available in software such as IrfanView or Microsoft Office Picture Manager. Although the resulting database has limitations, it is by no means impossible, in principle or in practice, to create a database that fully satisfies all three thumb-rules.


A total of 245 images were captured for this purpose. The images whose expressions are to be determined, referred to in this paper as 'Test Images', are matched by the algorithm against these 'Training Images'. The number of Test Images captured solely for testing is 70; however, some Training Images were also occasionally used as Test Images to test the algorithm. A sample of the database, showing one image of each expression included in the study, is given here.


Figure 3: Block Diagram for the Proposed Solution

B. Face Detection


After creating the database, the next step is to detect the face and crop it from the background using image segmentation techniques. All images are also converted to greyscale, discarding the colour component.
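The paper does not specify the segmentation method used for face detection, so the sketch below only illustrates the two surrounding operations: greyscale conversion with the BT.601 luma weights that MATLAB's rgb2gray uses, and cropping to the bounding box of a binary face mask. How the mask is obtained is left open; the threshold in the example is a hypothetical stand-in for a real face detector, and the function names are illustrative.

```python
import numpy as np

def to_grey(rgb):
    """Convert an H x W x 3 RGB array (values in [0, 1]) to greyscale."""
    # Same luma weights as MATLAB's rgb2gray (ITU-R BT.601).
    return rgb[..., :3] @ np.array([0.2989, 0.5870, 0.1140])

def crop_to_mask(grey, mask):
    """Crop a greyscale image to the bounding box of a boolean face mask."""
    rows = np.flatnonzero(mask.any(axis=1))   # rows containing at least one face pixel
    cols = np.flatnonzero(mask.any(axis=0))   # columns containing at least one face pixel
    if rows.size == 0:
        raise ValueError("mask contains no face pixels")
    return grey[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# Toy example: a bright 3 x 4 "face" region on a dark 6 x 8 background.
rgb = np.zeros((6, 8, 3))
rgb[1:4, 2:6] = 1.0
grey = to_grey(rgb)
mask = grey > 0.5          # hypothetical stand-in for a real face detector
face = crop_to_mask(grey, mask)
print(face.shape)          # (3, 4)
```

A boundary detection error of the kind described next would show up here as a mask that extends into the background, so the bounding box, and therefore the crop, would include background pixels.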

Figure 4: Image 1 and Corresponding Detected Face

When face detection was applied to Image 1, a greyscale image containing only the face was obtained. Face detection was then applied to all the other images, and the detected faces were stored in a separate directory for inspection. Some faces were not detected as expected because of boundary detection errors, caused by the lighting in the room not being uniform throughout the photography session. An example is shown below.

Figure 5: Image 2 and Corresponding Detected Face


For Image 2, face detection did not give the expected result: part of the background remained after detection, as seen in Fig. 5. To resolve these boundary detection errors, a procedure known as brightness correction was applied. All images were brightness corrected, and a separate corrected database was formed.

Figure 6: Brightness Corrected Image and Detected Face

After brightness correction, all images that earlier had boundary detection errors could be face-detected properly. For example, Image 2, which could not be detected correctly before, was detected properly after brightness correction.
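The brightness correction procedure itself is not detailed in the text. As one hedged illustration, in the spirit of gamma correction [9] but not a reproduction of that method, a single global gamma can be chosen so that a pixel at the image's mean intensity is mapped to a target level; the target of 0.5, the function name and the toy image are all assumptions.

```python
import numpy as np

def brightness_correct(grey, target_mean=0.5, eps=1e-6):
    """Global gamma correction for a greyscale image with values in [0, 1].

    gamma is chosen so that a pixel at the mean intensity maps to target_mean;
    the output stays in [0, 1] and the ordering of intensities is preserved.
    """
    img = np.clip(grey, eps, 1.0 - eps)                  # keep log() finite and nonzero
    gamma = np.log(target_mean) / np.log(img.mean())
    return img ** gamma                                  # gamma < 1 brightens, gamma > 1 darkens

# An under-exposed toy image with intensities between 0.05 and 0.30.
dark = np.linspace(0.05, 0.30, 16).reshape(4, 4)
corrected = brightness_correct(dark)
print(f"mean before: {dark.mean():.3f}, mean after: {corrected.mean():.3f}")
```

Because the mapping is monotonic, edges between face and background keep their relative ordering, while an under-exposed image is lifted towards mid-grey before segmentation.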

C. Transform and Euclidean Distance Algorithm

Before describing the implementation algorithm, recall that there are two sets of images, the 'Training Images' and the 'Test Images'. They are stored in two separate directories, Train and Test, on which the expression recognition algorithm operates. The training images serve as the reference against which each test image is matched to one of the expressions. In this procedure, PCA was performed on the training images, followed by a 2D-DCT on the principal components obtained. The implementation algorithm for the PCA technique and for PCA cascaded with DCT is given below.

• From the 'Training' images, the faces are extracted as described earlier.

• All the training images are arranged as rows of a single matrix, referred to as the resultant train image.


• PCA is performed on this resultant train image.

• DCT is then performed on the principal components computed in the previous step. 64 DCT coefficients are retained and the rest are discarded.

• The above procedure is then repeated for the test images: a resultant test image, containing all test images as rows, is formed, and PCA and DCT are performed on it.

• Finally, each row of the resultant test image is picked in turn, and its Euclidean distance from every row of the resultant train image is calculated. The training image whose row has the least Euclidean distance determines the expression of the corresponding test image [10-11].
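The steps above can be sketched in NumPy as follows. This is a minimal illustration under assumptions that differ from, or go beyond, the text: test images are projected onto the principal axes learned from the training images (as in the eigenface approach of [10]) instead of running a separate PCA on the test set; the DCT is applied as a 1D transform to each image's vector of PCA coefficients; and the hypothetical parameter n_keep stands in for the 64 retained coefficients. All function names and the toy data are illustrative.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II matrix: row k holds c(k) * cos((2x + 1) k pi / (2n))."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    basis = np.cos((2 * x + 1) * k * np.pi / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis

def fit_pca(train_rows, n_components):
    """Mean image and principal axes (as rows) learned from the training rows only."""
    mean = train_rows.mean(axis=0)
    # Right singular vectors of the centred data = eigenvectors of its covariance matrix.
    _, _, vt = np.linalg.svd(train_rows - mean, full_matrices=False)
    return mean, vt[:n_components]

def features(rows, mean, axes, n_keep=None):
    """PCA coefficients per image, optionally followed by a DCT keeping n_keep terms."""
    proj = (rows - mean) @ axes.T
    if n_keep is None:
        return proj                                   # PCA-only variant
    coeffs = proj @ dct_basis(proj.shape[1]).T        # 1D DCT of each coefficient vector
    return coeffs[:, :n_keep]

def classify(test_rows, train_rows, train_labels, n_components=20, n_keep=None):
    """Label each test row with the label of its nearest training row (Euclidean)."""
    mean, axes = fit_pca(train_rows, n_components)
    f_train = features(train_rows, mean, axes, n_keep)
    f_test = features(test_rows, mean, axes, n_keep)
    dists = np.linalg.norm(f_test[:, None, :] - f_train[None, :, :], axis=2)
    return [train_labels[i] for i in dists.argmin(axis=1)]

# Hypothetical toy data: 64-pixel "images" scattered around two class prototypes.
rng = np.random.default_rng(1)
proto = {"Happy": rng.normal(size=64), "Sad": rng.normal(size=64)}
train_labels = ["Happy"] * 5 + ["Sad"] * 5
train_rows = np.array([proto[lab] + 0.05 * rng.normal(size=64) for lab in train_labels])
test_rows = np.array([proto["Sad"] + 0.05 * rng.normal(size=64),
                      proto["Happy"] + 0.05 * rng.normal(size=64)])
print(classify(test_rows, train_rows, train_labels))            # ['Sad', 'Happy']
print(classify(test_rows, train_rows, train_labels, n_keep=8))  # ['Sad', 'Happy']
```

Passing n_keep=None gives the PCA-only variant, so one routine covers both approaches compared in the results. Projecting the test images onto the training basis keeps train and test features in the same coordinate system, which is what makes their Euclidean distances comparable.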

VI. RESULTS AND ANALYSIS

After executing several carefully chosen test cases, the results were analysed against several criteria. To reduce subjective human bias in evaluating the results, each case was evaluated by at least three human observers. Recognition mismatches were broadly classified into two categories:

• Mismatch due to Inefficiency of Algorithm (MIA)

• Mismatch due to Inefficiency of Database (MID)

While efforts were made to keep the database as efficient as possible, several of its inefficiencies need to be acknowledged. At times, the classification of images into their respective emotions was less than optimal; for example, because of subjective differences, an image labelled as anger might have been better labelled as disgust. Some images also display a combination of two or more emotions instead of a single isolated one. Such errors are classified as MID, while errors clearly due to the algorithm are classified as MIA. For a professional system, a group of experts trained in body language and facial expression recognition would likely produce a far more efficient database than ours, which should considerably increase the efficiency of any facial expression analyser system.

B. Sample Images and Their Nearest Matches

In this section, some Test Images and their nearest matches, as identified by the algorithm, are shown. This gives a direct view of how the algorithm works and how accurately it recognizes expressions.

Expression 1: Happy


Test Image 1 shows the expression 'Happy' and is correctly identified as 'Happy' by the algorithm. Test Image 1 and its best match are shown in Fig. 7.

Figure 7: Test Image 1 and its Best Match

Expression 2: Disgust


Test Image 2 shows the expression 'Disgust' and is correctly identified as 'Disgust' by the algorithm. Test Image 2 and its best matched image are shown in Fig. 8.

Expression 3: Anger

Test Image 3 shows the expression 'Anger' and is correctly identified as 'Anger' by the algorithm. Test Image 3 and its best matched image are shown in Fig. 9.

Expression 4: Sad

Test Image 4 shows the expression 'Sad' but is incorrectly identified as 'Disgust' by the algorithm. Test Image 4 and its best matched image are shown in Fig. 10.


After executing and analysing several runs of the algorithm, the results were compiled expression-wise in Table 1.

Table 1

EXPRESSION-WISE RECOGNITION RATES

Expression | % Accuracy with PCA | % Accuracy with PCA & DCT
Happy | 90.32 | 80.5
Disgust | 61.18 | 81.25
Anger | 75.47 | 70
Sad | 64.81 | 80.3
Neutral | 88.89 | 77.27
Average | 76.134 | 78

As seen in Table 1, with PCA alone the recognition rate is very high for the 'Happy' and 'Neutral' expressions but very low for the 'Disgust' and 'Sad' expressions. When DCT is cascaded with PCA, the recognition rate becomes more uniform across expressions, ranging from 70% for 'Anger' to 81.25% for 'Disgust', with the other three expressions between about 77% and 81%. The average recognition rate of PCA with DCT (78%) is also slightly higher than that of PCA alone (76.134%). The PCA with DCT approach is therefore more reliable than PCA alone. The recognition rate may be improved further in the future by using the standard average-face size approach [12]; however, that is beyond the scope of this paper.
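The averages in Table 1 can be checked directly from the per-expression rates. The short script below (plain Python, values copied from the table) also computes the range between the best and worst expression for each method, which quantifies how much more uniform the PCA with DCT results are.

```python
# Per-expression recognition rates (%) copied from Table 1.
rates = {
    "PCA":       {"Happy": 90.32, "Disgust": 61.18, "Anger": 75.47, "Sad": 64.81, "Neutral": 88.89},
    "PCA & DCT": {"Happy": 80.5,  "Disgust": 81.25, "Anger": 70.0,  "Sad": 80.3,  "Neutral": 77.27},
}

summary = {}
for method, per_expr in rates.items():
    values = list(per_expr.values())
    mean = sum(values) / len(values)
    spread = max(values) - min(values)   # best expression minus worst expression
    summary[method] = (mean, spread)
    print(f"{method:10s} mean = {mean:.3f} %, range = {spread:.2f} points")
```

The PCA average is 76.134%, matching the table; the PCA with DCT average works out to 77.864%, which the table reports rounded to 78%. The range between best and worst expression shrinks from 29.14 points with PCA alone to 11.25 points with PCA and DCT.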

VII. CONCLUSION

Thus, a facial expression recognition system with good recognition rates was developed. MATLAB 2009 was used to carry out the tests on a 2.8 GHz processor, on which the average time taken to load one image was 0.433 seconds. The average time taken to recognize one expression, however, depends on the number of training images included. The recognition rate obtained with the algorithm implementing PCA was 76.134%. When PCA was cascaded with DCT, the recognition rate achieved was 78%.

Since the recognition rate increased when the transforms were cascaded, it may be improved further by using the Discrete Wavelet Transform (DWT). The recognition rate may also increase if the standard average-face size approach [12] is used, although that has not been included in this study. Given the importance and urgency of the various applications of such a system, further research in this area is warranted.

REFERENCES

[1] Paul Ekman, "Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage" (Revised and Updated Edition), W. W. Norton & Company, 2001.

[2] Sridhar Godavarthy, "Microexpression spotting in video using optical strain", Graduate School Theses and Dissertations, University of South Florida.

[3] How Stuff Works Website, http://www.howstuffworks.com/lie-detector1.html/, Dated 21 April 2014.

[4] Applied Communication Services, Inc., http://www.studioacs.com/acs-dark/portfolio-3s.html, Dated 10 April 2013.

[5] Andrew J. Calder, A. Mike Burton, Paul Miller, Andrew W. Young, Shigeru Akamatsu, "A principal component analysis of facial expressions", Vision Research, 41 (2001) 1179–1208.

[6] The Duncan Fyfe Gillies (DFG) Website, http://www.doc.ic.ac.uk/~dfg/ProbabilisticInference/, Dated 28 April 2014.

[7] N. Ahmed, T. Natarajan and K. R. Rao, "Discrete Cosine Transform", IEEE Transactions on Computers, January 1974.

[8] Andrew B. Watson, "Image Compression using Discrete Cosine Transform", Mathematica Journal, 4(1), 1994, pp. 81-88.

[9] Ankit Aggarwal, R.S. Chauhan and Kamaljeet Kaur, "An Adaptive Image Enhancement Technique Preserving Brightness Level using Gamma Correction", Advance in Electronic and Electric Engineering, ISSN 2231-1297, Volume 3, Number 9 (2013), pp. 1097-1108.

[10] Matthew A. Turk and Alex P. Pentland, "Face Recognition using Eigen Faces", Vision and Modeling Group, The Media Laboratory, Massachusetts Institute of Technology.

[11] Iftekhar Tanveer, Program "Eigen face based Facial Expression Classification", MATLAB File Exchange.