Exploring visual features through gabor representations for facial expression detection


This is the author version published as:


QUT Digital Repository:

http://eprints.qut.edu.au/

Chew, Sien Wei and Lucey, Patrick J. and Sridharan, Sridha and Fookes, Clinton B. (2010) Exploring visual features through Gabor representations for facial expression detection. In: International Conference on Auditory-Visual Speech Processing (AVSP 2010), 30 September - 3 October 2010, The Prince Hakone, Hakone, Kanagawa.


Exploring Visual Features Through Gabor Representations for Facial Expression Detection

Sien W. Chew (1), Patrick Lucey (2), Sridha Sridharan (1), Clinton Fookes (1)

(1) Image and Video Research Laboratory, Queensland University of Technology, GPO Box 2424, Brisbane 4001, Australia

(2) Robotics Institute, Carnegie Mellon University / Department of Psychology, University of Pittsburgh, Pittsburgh, USA

[email protected], [email protected], {s.sridharan;c.fookes}@qut.edu.au

Abstract

Gabor representations have been widely used in facial analysis (face recognition, face detection and facial expression detection) due to their biological relevance and computational properties. Two popular Gabor representations used in the literature are: 1) Log-Gabor and 2) Gabor energy filters. Even though these representations are somewhat similar, they also have distinct differences, as the Log-Gabor filters mimic the simple cells in the visual cortex while the Gabor energy filters emulate the complex cells, which causes subtle differences in the responses. In this paper, we analyze the difference between these two Gabor representations and quantify these differences on the task of facial action unit (AU) detection. In our experiments conducted on the Cohn-Kanade dataset, we report an average area underneath the ROC curve (A′) of 92.60% across 17 AUs for the Gabor energy filters, while the Log-Gabor representation achieved an average A′ of 96.11%. This result suggests that the small spatial differences that the Log-Gabor filters pick up on are more useful for AU detection than the differences in contours and edges that the Gabor energy filters extract.

Index Terms: action unit detection, AdaBoost feature selection, Log-Gabor filter, Gabor energy filter

1. Introduction

Facial expressions provide cues about emotion and the cognitive processes associated with them [1, 2]. The decomposition of visible muscular movements in the face using the Facial Action Coding System (FACS) has been the de facto standard in automatic facial expression recognition [3]. Manual coding is laborious: an entire hour of coding can translate to just one minute of ground-truthed video [4], which motivates the automatic coding of action units (AUs). However, automatic facial expression recognition is still deemed a difficult task, plagued by difficulties similar to those experienced in the pattern recognition and face processing communities.

Gabor filters have been frequently applied to facial expression recognition. The superior performance of these filters can be attributed to two positive characteristics: i) their excellent representation of texture, and ii) their invariance to changing illumination conditions. Several instances have been reported of Gabor filters producing good performance [5, 6]. Zhang et al. [7] applied multilayer perceptrons (MLP) to a Gabor filterbank, composed of 3 spatial frequencies and 8 orientations, to recognise the seven facial expressions that are shared universally among people [8] (neutral, happiness, sadness, surprise, anger, disgust and fear). Reduction of the large dimensionality of Gabor features was accomplished through principal component analysis. An accuracy of 80% was achieved overall, but this improved to 85.6% after data containing fear expressions were omitted.

The work of Bashyal and Venayagamoorthy [9] closely paralleled that of Zhang et al. The two procedures were almost identical, the only exception being that their system used learning vector quantization (LVQ) instead of MLP. An accuracy of 90.22% was achieved on the Japanese Female Facial Expression database [10]. The interesting finding from these two studies was that the MLP learning algorithm had difficulty classifying "fear" emotions, but this problem could be circumvented by LVQ. Littlewort et al. [11] were able to achieve a 93% emotion classification accuracy on the Cohn-Kanade dataset using a system based on Gabor filters and support vector machines. Whitehill et al. [12] utilized this same system for the recognition of smiles. A Gabor energy filterbank (5 spatial frequencies, 8 orientations) was applied to distinguish between smiling and non-smiling faces, and achieved an accuracy of 98%.

Across the works described above, little attention has been paid to the various types of Gabor representations that can be used. Two of these are: 1) Log-Gabor and 2) Gabor energy filters. Even though these representations are somewhat similar, they also have distinct differences, as the Log-Gabor filters mimic the simple cells in the visual cortex while the Gabor energy filters emulate the complex cells, which causes subtle differences in the responses. In this paper, we analyze the difference between these two Gabor representations on the task of facial action unit (AU) detection.

2. Gabor Representations: Log-Gabor vs Gabor Energy

The primary visual cortex of primates (commonly abbreviated to V1) has frequently been modeled using Gabor filters. Neurons within V1, which comprise simple and complex cells [13], carry out visual processing tasks by means of edge and line detection. In an ideal sense, Gabor wavelets provide a sufficient representation of these cells [14, 15, 16]. Simple cells are sensitive to both the orientation and the position of a stimulus, whereas complex cells are sensitive only to its orientation. The transfer function of a two-dimensional Gabor filter consists of a Gaussian envelope modulated by a sine carrier,


Figure 1: Computation of Log-Gabor features by convolution of the original image with a spatial Log-Gabor filter. (Panel labels: input image, spatial Log-Gabor filter, Log-Gabor features.)

$$g(x, y) = K \exp\left\{-\pi\left[a^2 (x - x_0)_r^2 + b^2 (y - y_0)_r^2\right]\right\} \exp\left\{j\left[2\pi(u_0 x + v_0 y) + P\right]\right\} \quad (1)$$

where g(x, y) denotes a two-dimensional Gabor filter. The first exponential term in Equation 1 defines the two-dimensional Gaussian envelope, and the second exponential term defines the sinusoid carrier. The constant K determines the scale of the Gaussian envelope; the elongation of the envelope is controlled by the variables a and b. The subscript r stands for a rotation operation on the envelope. The horizontal and vertical spatial frequencies of the sinusoid carrier are determined by u_0 and v_0, and P denotes the phase of the sinusoid carrier. On closer inspection of Equation 1, filtering with a Gabor filter can be seen to be equivalent to spatial bandpass filtering. Since a single Gabor filter is only capable of representing a single cell in V1, it has been common practice to implement multiple Gabor filters as filterbanks. A filterbank becomes necessary in order to detect bars at multiple orientations and positions.
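As an illustration of Equation 1, the following is a minimal sketch that samples the filter on a square grid and assembles a small filterbank. The function names, the grid size and every default parameter value are assumptions made for this example; they are not the settings used in the experiments.

```python
import cmath
import math


def gabor_kernel(size, K=1.0, a=0.1, b=0.1, theta=0.0,
                 u0=0.1, v0=0.0, P=0.0, x0=0.0, y0=0.0):
    """Sample Equation 1 on a grid centred on the origin.

    Returns a list of rows of complex numbers. All default parameter
    values are illustrative assumptions, not the paper's settings.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the envelope coordinates by theta (the "r" subscript).
            xr = (x - x0) * math.cos(theta) + (y - y0) * math.sin(theta)
            yr = -(x - x0) * math.sin(theta) + (y - y0) * math.cos(theta)
            envelope = K * math.exp(-math.pi * (a**2 * xr**2 + b**2 * yr**2))
            carrier = cmath.exp(1j * (2 * math.pi * (u0 * x + v0 * y) + P))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size, frequencies, n_orientations):
    """Build one kernel per (spatial frequency, orientation) pair."""
    bank = []
    for f in frequencies:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            bank.append(gabor_kernel(size, theta=theta,
                                     u0=f * math.cos(theta),
                                     v0=f * math.sin(theta)))
    return bank
```

An odd `size` gives the grid a centre sample. The real and imaginary parts of each kernel form the in-phase and quadrature filters of a pair.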

2.1. Log-Gabor Features

In 1987, Field [17] proposed that the coding of natural images could be improved by modifying the transfer function of the Gaussian envelope. The suggestion was to design the Gabor filter so that the shape of the envelope was retained on the logarithmic frequency scale instead of on the linear frequency scale,

$$g(f) = \exp\left\{-\frac{[\log(f/f_0)]^2}{2[\log(\sigma/f_0)]^2}\right\} \quad (2)$$

where f_0 is the centre frequency of the filter and the ratio σ/f_0 sets its bandwidth.

Extending the limit of the Gaussian envelope from the linear to the logarithmic frequency scale would allow broader spectral ranges to be achieved. In consequence, by the inverse relationship between the frequency and spatial domains, a broader spectral range would facilitate a finer resolution in the spatial domain. In other words, Log-Gabor filters may be designed with arbitrary bandwidth, whereas Gabor filters may not (one octave bandwidth limitation). Figure 1 illustrates how Log-Gabor features are computed from the convolution between an image and a Log-Gabor filter.
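The radial transfer function of Equation 2 can be sketched in a few lines. The name `log_gabor` and the default ratio σ/f_0 = 0.65 are assumptions chosen for illustration only; the value used in the experiments is not stated.

```python
import math


def log_gabor(f, f0, sigma_ratio=0.65):
    """Radial Log-Gabor transfer function of Equation 2.

    f0 is the centre frequency and sigma_ratio stands for sigma / f0.
    The default of 0.65 is an assumed illustrative value.
    """
    if f <= 0.0:
        return 0.0  # the filter has no DC component
    return math.exp(-(math.log(f / f0) ** 2) /
                    (2.0 * math.log(sigma_ratio) ** 2))


# The response is symmetric on a logarithmic frequency axis:
# one octave above and one octave below f0 give the same gain.
gain_above = log_gabor(0.2, 0.1)
gain_below = log_gabor(0.05, 0.1)
```

Moving `sigma_ratio` towards 1 narrows the passband and moving it towards 0 widens it, which is how the bandwidth can be chosen freely.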

Figure 2: Formation of Gabor energy features from Gabor features and their quadrature through Equation 3. a) Gabor features (real component), b) Quadrature Gabor features (imaginary component), c) Gabor energy features

2.2. Gabor Energy Features

Energy mechanisms operate by summing the squares of the outputs of a quadrature pair [15]. The quadrature counterpart of a linear operator is equivalent to its Hilbert transform. Hence, taking the square root of the sum of squares of a Gabor feature and a 90-degree phase-shifted version of itself produces the corresponding Gabor energy feature. The complex cells in V1 can be appropriately represented by these energy mechanisms, because they do not become modulated by drifting sinusoids [18],

$$e(x, y) = \sqrt{g_0^2(x, y) + g_{-\pi/2}^2(x, y)} \quad (3)$$

The advantage of Gabor energy filters over Gabor filters is that the former give a smooth response to an edge or line [19, 20]. This can be observed in Figure 2, where the resultant Gabor energy features are sharper around edges (cheek contours and nose contours) than either the Gabor features or their quadrature counterparts.
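Equation 3 can be sketched as an element-wise operation on the two responses of a quadrature pair. The function name and the list-of-rows data layout are assumptions made for this example.

```python
import math


def gabor_energy(even_response, odd_response):
    """Combine a quadrature pair of filter responses as in Equation 3.

    Both arguments are 2-D lists of equal shape holding the responses of
    the in-phase (g_0) and 90-degree shifted (g_{-pi/2}) filters.
    """
    return [[math.hypot(e, o) for e, o in zip(even_row, odd_row)]
            for even_row, odd_row in zip(even_response, odd_response)]


# A drifting sinusoid shows up as cos(phase) in one response and
# sin(phase) in the other; the energy stays constant as the phase moves.
phases = [0.0, 0.7, 1.9, 3.1]
even = [[math.cos(p) for p in phases]]
odd = [[math.sin(p) for p in phases]]
energy = gabor_energy(even, odd)
```

Here every entry of `energy` is 1 up to rounding, which is the phase invariance attributed to complex cells above.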

3. AdaBoost for Feature Selection

Large feature representations inevitably result from the application of large filterbanks, often necessitating the use of feature selection. This is important because redundant and insignificant features would otherwise prolong the training time of the classifier while offering little performance benefit. In this study, the feature selection capability of the AdaBoost algorithm was adopted.

Boosting [24] is a type of committee method commonly applied to improve the performance of an individual classifier. A multitude of weak learners (also known as weak classifiers) are trained in sequence, each on a reweighted version of the data at each iteration. A common realisation of weak learners is via classification and regression trees [21]. Improvements in classification are obtained at the final stage, after a weighted voting scheme is applied to the weak learners.

Figure 3: Feature selection - determining how discriminative a feature is. (a): ideal case clearly showing that the black curve (more discriminative feature) converges to zero quicker than the red curve (less discriminative feature). (b): practical example where convergence is not so obvious, but still discernible through the curves' numerical gradients.

AdaBoost [22] is one of the most widely used forms of boosting. In a binary classification problem comprising N data points, a set of training data x_1, ..., x_N is prepared to correspond to a set of binary target variables t_1, ..., t_N, each holding either the value +1 or -1. For example, t_1 assigned a value of +1 would label x_1 as a positive instance, and a value of -1 would label x_1 as a negative instance.

During initialisation, each data point is assigned a weighting parameter w_n with a value of 1/N. At each iteration, a new classifier is trained on the data set using weights adjusted according to the performance of the previously trained classifier. A data point classified incorrectly would have its corresponding weight increased. Hence, the weights of the most discriminant features would converge in the fewest iterations. In our experiments, we found ten iterations to be sufficient for the task of feature selection.
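The reweighting loop described above can be sketched as follows. Note the assumptions: this is plain discrete AdaBoost with single-feature threshold rules (decision stumps) as weak learners, not the Gentle AdaBoost variant or the toolbox used in the experiments, and the function names are hypothetical.

```python
import math


def best_stump(X, t, w):
    """Find the single-feature threshold rule with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[j] > thr else -sign for row in X]
                err = sum(wn for wn, p, tn in zip(w, pred, t) if p != tn)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign, pred)
    return best


def adaboost_select(X, t, n_rounds=10):
    """Return the feature index picked in each round and its weight alpha."""
    n = len(X)
    w = [1.0 / n] * n  # initial weights w_n = 1/N
    chosen = []
    for _ in range(n_rounds):
        err, j, thr, sign, pred = best_stump(X, t, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        # Misclassified points gain weight, correct ones lose weight.
        w = [wn * math.exp(-alpha * tn * p) for wn, tn, p in zip(w, t, pred)]
        total = sum(w)
        w = [wn / total for wn in w]
        chosen.append((j, alpha))
    return chosen


# Toy data: feature 1 separates the two classes, feature 0 does not.
X = [[0.3, 0.1], [0.9, 0.2], [0.1, 0.8], [0.7, 0.9]]
t = [-1, -1, 1, 1]
selected = adaboost_select(X, t, n_rounds=3)
```

The features chosen most often, or with the largest weights, are the ones retained for the classifier.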

In Figure 3, |α_t| represents the absolute value of the weight assigned by AdaBoost to a feature. The left-hand side of Figure 3 illustrates an ideal case: observe that the absolute value of the weights |α_t| assigned to the more discriminative feature (denoted by the black curve) converges to zero quicker than that of the less discriminative feature (denoted by the red curve). Shown on the right-hand side of Figure 3 is a practical example where convergence of |α_t| to zero is not so obvious. However, the features' discriminative power can still be determined by calculating the numerical gradients of the two curves, which may be represented compactly through the sum of the numerical gradients. Hence, a smaller sum would indicate that a feature is more discriminative than a feature which outputs a larger sum.

The "GML AdaBoost Matlab Toolbox" [23] was modified to perform feature selection using the Gentle AdaBoost algorithm [24]. For each of the 64 filter outputs, the 60 most discriminant features were selected by AdaBoost. This gave a total of 60 × 8 × 8 = 3840 discriminant features for each image. Image dimensions were 80 × 100 pixels, which produced a total of 80 × 100 × 8 × 8 = 512,000 features per image. Hence, only (60 / 8000) × 100 = 0.75% of the feature space was utilised as discriminant features. No significant differences in performance were observed when the number of discriminant features selected was increased ten-fold from 60 to 600.
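The feature counts quoted above can be checked with a short calculation. The variable names are chosen for this example only.

```python
n_filters = 8 * 8            # 8 spatial frequencies x 8 orientations
pixels_per_output = 80 * 100  # one feature per pixel of each filter output
selected_per_filter = 60

total_features = pixels_per_output * n_filters       # 512,000
selected_features = selected_per_filter * n_filters  # 3,840
percent_used = 100.0 * selected_features / total_features
percent_discarded = 100.0 - percent_used
```

The same ratio is obtained per filter output (60 of 8000 pixels), which is why the fraction used is 0.75% and the reduction in feature space is 99.25%.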

Figure 4: Discriminant features computed by AdaBoost. Left: fear expression. Right: neutral expression. The most discriminant features are highlighted with blue markers.

An example of the AdaBoost feature selection algorithm is illustrated in Figure 4. The image on the left-hand side portrays the “fear” expression of a subject, and the image on the right-hand side portrays the “neutral” expression. In this context, the task of feature selection is to identify the features that differ the most between these two images. The markers highlighted in blue represent the top one thousand most discriminant features between these two images.

3.1. Discriminant Features Depict Different Filter Mechanisms in Action Unit Detection

Visual inspection of the most discriminant features (between a frame containing an AU and a frame containing the neutral expression) permitted further insight into the different mechanisms by which these two filters operate. AUs 6 and 12 are used here for illustration (Figures 5 and 6). Blue markers indicate the top 60 most discriminant features selected by AdaBoost.

The “cheek-raiser” action unit (AU6) is illustrated in Figure 5. The most discriminant Gabor energy features were located in the region near the left-cheek. This region corresponded to an area where distinct changes in texture and curvature occurred. In the case of Log-Gabor features, however, the locations of the most discriminant features were scattered around the mouth region. This region does not correspond to the same landmarks as the Gabor energy features.

This prompted the hypothesis that simple and complex cells do indeed possess different mechanisms of operation. Gabor energy features were adept at representing macro-structural changes as well as variations in texture. On the other hand, Log-Gabor features were particularly good at distinguishing micro-structural changes. On this premise, we propose in Section 5 that action units involving large facial structures are best represented using Gabor energy filters, and action units involving smaller facial structures are best represented using Log-Gabor filters.

Figure 5: Action Unit 6 (cheek-raiser): the top 60 discriminant features per filter are highlighted with blue markers. Texture and contour variation were easily detected by Gabor energy filters; Log-Gabor filters were able to pick out small differences in the spatial domain. (Panel labels: activated AU, neutral expression; original image, Gabor energy features, Log-Gabor features.)

4. Experimental Setup

In situations where data is scarce, the S-fold cross validation technique is suitable for training classifier models. A special case of S-fold cross validation is the leave-one-out strategy, where the number of folds equals the total number of data points. This strategy was employed in our experiments to train 17 selected action unit models on 97 subjects from the Cohn-Kanade [25] dataset. The action units selected were AU 1, 2, 4, 5, 6, 7, 9, 11, 12, 15, 17, 20, 23, 24, 25, 26 and 27. Support vector machines with a linear kernel (using LibSVM [26]) were used for training. Images were rescaled to a size of 100 × 80 pixels. The subjects in this dataset ranged from 18 to 30 years of age; 65% were female, 15% African-American and 3% Asian or Latino. The subjects were instructed to perform various facial displays, which comprised both single action units and combinations of several action units. The six emotions (joy, surprise, anger, fear, disgust and sadness) were a part of these displays. All action units displayed by the subjects were annotated by certified FACS coders.

Further details of the leave-one-out configuration implemented in this study are as follows. To train the AU1 model for held-out subject S010, the frames containing the peak AU intensity from all other subjects in the dataset that contained AU1 were used as positive instances. The negative instances were selected from frames that contained the neutral expression of the corresponding positive sequences. The ratio of positive instances to negative instances was held at 1 : 1. This procedure was repeated for every other subject in the dataset, and for each of the 17 selected action units. A total of 97 × 17 = 1649 action unit models were trained.
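The subject-wise splitting described above can be sketched as a small generator. The tuple layout `(subject_id, features, label)` and the function name are hypothetical choices for this example, and the toy data below is not taken from the dataset.

```python
def leave_one_subject_out(samples):
    """Yield (held_out_subject, train, test) splits, one per subject.

    `samples` is a list of (subject_id, features, label) tuples; this
    layout is a hypothetical one chosen for the sketch.
    """
    subjects = sorted({s[0] for s in samples})
    for held_out in subjects:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


# Toy data: a peak frame (+1) and a neutral frame (-1) for three subjects.
samples = [
    ("S010", [0.2, 0.7], +1), ("S010", [0.1, 0.1], -1),
    ("S011", [0.3, 0.8], +1), ("S011", [0.2, 0.1], -1),
    ("S014", [0.4, 0.9], +1), ("S014", [0.1, 0.2], -1),
]
splits = list(leave_one_subject_out(samples))
```

Each split yields one model per action unit, so 97 subjects and 17 action units give the 1649 models mentioned above.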

Figure 6: Action Unit 12 (lip-corner puller): top 60 discriminant features per filter computed for Log-Gabor and Gabor energy features. Notice that in the Gabor energy features, the single action of the 'lip-corner puller' created actions in other regions of the face (movements in the nose and eyes). (Panel labels: activated AU, neutral expression; original image, Gabor energy features, Log-Gabor features.)

In the testing stage, all frames belonging to a particular subject were used to evaluate the action unit models. To test the AU1 model of subject S010, all peak frames containing AU1 were chosen to be positive instances, and all neutral frames and peak frames containing other AUs were chosen as negative instances.

5. Results and Discussion

We report the performance of our action unit detection system using a metric derived from receiver operating characteristic (ROC) curves. ROC curves are generated by plotting the true-positive rate against the false-positive rate as a decision threshold is varied. We used the area under the ROC curve (denoted A′) as the performance measure. This area equals the probability that a positive instance selected at random will be ranked higher than a negative instance also selected at random. An upper bound on the uncertainty of A′ was included to approximate the reliability of performance. A statistic commonly used for this purpose is

$$s = \sqrt{\frac{A'(1 - A')}{\min\{n_p, n_n\}}}$$

where n_p and n_n are the numbers of positive and negative examples [27]. The system obtained a mean A′ of 0.9504, 0.9611 and 0.9260 for pixels-only, Log-Gabor and Gabor energy features, respectively.
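Both quantities can be sketched directly from their definitions. The function names are assumptions for this example, and the pairwise count below is a simple quadratic-time way of computing the area, not necessarily how it was computed in the experiments.

```python
import math


def area_under_roc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def a_prime_uncertainty(a_prime, n_pos, n_neg):
    """Upper bound s = sqrt(A'(1 - A') / min(n_p, n_n))."""
    return math.sqrt(a_prime * (1.0 - a_prime) / min(n_pos, n_neg))


# Toy classifier scores: 5 of the 6 positive/negative pairs are ordered correctly.
auc = area_under_roc([0.9, 0.8, 0.4], [0.5, 0.3])

# With A' = 0.96 and 142 positives, assuming (hypothetically) that the
# negatives are more numerous, the bound is about 0.016.
s = a_prime_uncertainty(0.96, 142, 1000)
```

Rounded to two decimals, this value of `s` agrees with the ±0.02 reported for AU1 with Log-Gabor features in Table 1, under the stated assumption about the negative count.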

Results shown in Table 1 indicate that the performances obtained from using all three features were very similar (note that with small test sample sizes, small percentage differences carry limited weight). The Gabor features did not offer significant advantages over pixels-only features for two reasons: i) good texture properties and little variation in illumination were already present in the images, and ii) the Gabor features were subjected to a high reduction in dimensionality.

Table 1: Results showing the area under the ROC curve for pixels-only, Log-Gabor and Gabor energy features. N indicates the number of positive examples available.

AU                 N     Pixels-only   Log-Gabor    Gabor Energy
1                  142   0.93±0.02     0.96±0.02    0.91±0.02
2                  103   0.93±0.03     0.94±0.02    0.93±0.03
4                  148   0.92±0.02     0.92±0.02    0.87±0.03
5                  76    0.97±0.02     0.98±0.02    0.94±0.03
6                  106   0.99±0.01     0.99±0.01    0.99±0.01
7                  105   0.94±0.02     0.95±0.02    0.90±0.03
9                  49    0.98±0.02     0.99±0.01    0.95±0.03
11                 33    0.97±0.03     0.96±0.03    0.94±0.04
12                 91    0.99±0.01     0.99±0.01    0.99±0.01
15                 71    0.94±0.03     0.96±0.02    0.90±0.04
17                 147   0.91±0.02     0.94±0.02    0.89±0.03
20                 67    0.96±0.02     0.95±0.03    0.92±0.03
23                 41    0.95±0.03     0.96±0.03    0.91±0.04
24                 41    0.95±0.03     0.98±0.02    0.92±0.04
25                 285   0.93±0.02     0.97±0.01    0.91±0.02
26                 36    0.89±0.05     0.91±0.05    0.89±0.05
27                 75    1.00±0.00     1.00±0.00    0.99±0.01
Mean               -     0.95±0.02     0.96±0.02    0.92±0.03
Variance (×10⁻⁴)   -     9.3           6.39         13.78

An interesting finding from this study was that Log-Gabor and Gabor energy filters employed contrasting modes of operation. From the most discriminant features selected in Figures 5 and 6, we observed that Gabor energy filters detected texture and contour variations. On the other hand, Log-Gabor filters detected small spatial differences. The high resolution of the images enabled finer spatial details of the images to be represented by Log-Gabor filters. As the availability of high-resolution images grows with the advent of modern video and image acquisition equipment, the utilisation of Log-Gabor filters could contribute positively to facial analysis. The experiments conducted in this study showed that a reduction of 99.25% ([1 − 60/(100 × 80)] × 100%) in feature space did not significantly deteriorate classification performance, thus making these filters a fitting candidate for real-time implementation.

Potential exists for using Gabor energy filters to detect action units that involve large facial structures with well-defined contours. Results show that, of all the action units, those involving such facial structures (AU6, AU12 and AU27) were detected best by Gabor energy features. The high variance exhibited by these features indicated an inclination to represent certain facial structures better than others. Despite suffering from the curse of dimensionality (which explains the high variance encountered), classification accuracies on these three action units were on par with pixels-only and Log-Gabor features. This finding agrees well with that of Whitehill et al. [12]. The system that they developed for smile detection compared four different types of features. They concluded that a smile, composed of AU6 and AU12, was represented best by Gabor energy filters.

A very practical conundrum was encountered in this study. From Figure 5, it is clear that the majority of the discriminant Gabor energy features selected represented only the left cheek. The remaining usable information contained in the right cheek was not utilized at all. Instead of benefiting from a larger feature representation, a deterioration in performance was observed. This was the manifestation of the curse of dimensionality, evident from the deteriorated performance of Gabor energy features with respect to the pixels-only features. Increasing the number of discriminant features selected ten-fold did not produce significant differences. The partial representation of Gabor energy features, following the feature selection stage, was unable to surpass the full representation of the pixels-only features. Moreover, it should be noted that the texture properties of the images in this dataset were already quite good, and any improvements to texture brought about by Gabor energy features would be modest at best. The same number of discriminant Log-Gabor features was able to represent more facial structures.

6. Conclusion

This study was conducted to explore different Gabor filter representations for the detection of action units in facial expressions. The roles that simple and complex cells of the visual cortex system play in the detection of action units were examined. We chose Log-Gabor filters for the representation of simple cells, and Gabor energy filters for the representation of complex cells. A filterbank consisting of 8 spatial frequencies and 8 orientations was designed to gain a closer representation of the cells. The resultant large dimensionality in feature space necessitated feature selection, which was realized through the AdaBoost algorithm. The most discriminant features selected by AdaBoost were used to train support vector machines.

The mean area under the ROC curve obtained by the system on the Cohn-Kanade posed facial expression dataset was 96.11% and 92.60% for Log-Gabor and Gabor energy filters, respectively. Our results suggest that simple cells are slightly more adept than complex cells at detecting action units under optimal conditions (i.e. limited image noise). Performance of the Gabor energy filters was inferior to that of pixels-only features. This phenomenon was the result of the curse of dimensionality, in which a larger representation in feature space deteriorates performance instead of bringing about improvement. Another reason was that the images in the dataset were already of high quality, leaving little room for improvement in terms of texture representation.

An interesting outcome from this study was that simple and complex cells had different modes of operation. The simple cells identified small spatial differences, while complex cells identified differences in contours and edges. An alternate interpretation could be that simple cells operate by searching an image for micro-structures, and complex cells operate by searching an image for macro-structures. Another finding was that Log-Gabor filters proved well-suited for the analysis of high-resolution images. No significant deterioration in performance was observed even after a 99.25% reduction in feature space, thus making these filters a fitting candidate for real-time applications.

The diverse functionalities of biological cells in the visual cortex system cannot be contained within this single study. Log-Gabor and Gabor energy filters serve, at best, as an approximation to these cells. Images acquired under more realistic conditions that have been contaminated by varying illumination and pose conditions, the analysis of spontaneous facial expressions, and the use of the information embodied in the time evolution of action units all serve as interesting avenues for further research. Significant effort has been put in by the research community toward understanding these two types of filters. However, there is still much left unknown about these cells in image understanding. A deeper understanding of the behaviour of these cells could lead to a hybrid filter that draws on the benefits of both simple and complex cells, with nature once again serving as the conduit for continual improvement in machine vision.


7. Acknowledgments

This research was supported by the Cooperative Research Centre for Advanced Automotive Technology (AutoCRC).

8. References

[1] M. Bartlett, J. Hager, P. Ekman, and T. Sejnowski, "Measuring facial expressions by computer image analysis," Psychophysiology, vol. 36, pp. 253–263, 1999.

[2] M. Yuki, W. Maddux, and T. Masuda, "Are the windows to the soul the same in the East and West? Cultural differences in using the eyes and mouth as cues to recognize emotions in Japan and the United States," Journal of Experimental Social Psychology, vol. 43, pp. 303–311, 2007.

[3] M. Bartlett, G. Littlewort, M. Frank, C. Lainscsek, I. Fasel, and J. Movellan, "Automatic recognition of facial actions in spontaneous expressions," Journal of Multimedia, vol. 1, no. 6, pp. 22–35, Sep. 2006.

[4] G. Donato, M. Bartlett, J. Hager, P. Ekman, and T. Sejnowski, "Classifying facial actions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 974–989, Oct. 1999.

[5] R. Chellappa, C. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proceedings of the IEEE, vol. 83, no. 5, pp. 705–740, May 1995.

[6] M. Bartlett, G. Littlewort, I. Fasel, and J. Movellan, "Real time face detection and facial expression recognition: development and applications to human computer interaction," in Conference on Computer Vision and Pattern Recognition Workshop, vol. 5, Jun. 2003, pp. 53–53.

[7] Z. Zhang, M. Lyons, M. Schuster, and S. Akamatsu, "Comparison between geometry-based and Gabor wavelets-based facial expression recognition using multi-layer perceptron," in Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998.

[8] P. Ekman, "Emotion in the human face," Cambridge University Press, Tech. Rep., 1982.

[9] S. Bashyal and G. Venayagamoorthy, "Recognition of facial expressions using Gabor wavelets and learning vector quantization," Engineering Applications of Artificial Intelligence, vol. 21, pp. 1056–1064, 2008.

[10] M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in Proceedings of the Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998.

[11] G. Littlewort, M. Bartlett, I. Fasel, J. Susskind, and J. Movellan, "Dynamics of facial expression extracted automatically from video," Image and Vision Computing, vol. 24, pp. 615–625, 2006.

[12] J. Whitehill, G. Littlewort, I. Fasel, M. Bartlett, and J. Movellan, "Toward practical smile detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 11, pp. 2106–2111, Nov. 2009.

[13] D. Hubel and T. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," Journal of Physiology, vol. 160, pp. 106–154, 1962.

[14] D. Pollen and S. Ronner, "Phase relationship between adjacent simple cells in the visual cortex," Science, vol. 212, pp. 1409–1411, 1981.

[15] E. H. Adelson and J. R. Bergen, "Spatiotemporal energy models for the perception of motion," Journal of the Optical Society of America A, vol. 2, pp. 284–299, 1985.

[16] J. P. Jones and L. A. Palmer, "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex," Journal of Neurophysiology, vol. 58, pp. 1233–1257, 1987.

[17] D. Field, "Relations between the statistics of natural images and the response properties of cortical cells," J. Opt. Soc. Am. A, vol. 4, no. 12, pp. 2379–2394, Dec. 1987.

[18] J. Movellan, "Tutorial on Gabor filters."

[19] S. Grigorescu, N. Petkov, and P. Kruizinga, "Comparison of texture features based on Gabor filters," IEEE Trans. on Image Processing, vol. 11, no. 10, Oct. 2002.

[20] P. Kruizinga and N. Petkov, "Non-linear operator for oriented texture," IEEE Trans. on Image Processing, vol. 8, no. 10, 1999, pp. 1395–1407.

[21] L. Breiman, Classification and Regression Trees, L. Breiman, Ed. Chapman & Hall, 1984.

[22] Y. Freund and R. Schapire, "Experiments with a new boosting algorithm," in Proceedings of the Thirteenth International Conference on Machine Learning, Bari, Italy, 1996, pp. 148–156.

[24] J. Friedman, T. Hastie, and R. Tibshirani, "Additive logistic regression: A statistical view of boosting," The Annals of Statistics, vol. 38, no. 2, pp. 337–374, Apr. 2000.

[25] T. Kanade, J. F. Cohn, and Y. Tian, "Comprehensive database for facial expression analysis," in Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition (FG'00), Grenoble, France, 2000, pp. 46–53.

[26] C.-C. Chang and C.-J. Lin, "LIBSVM: a library for support vector machines," 2001, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

[27] C. Cortes and M. Mohri, "Confidence intervals for the area under the ROC curve," Advances in Neural Information Processing Systems, 2004.