2.8 Current Benchmark Datasets
2.8.1 Polikovsky Dataset
One of the first micro-expression datasets was created by Polikovsky et al. [5]. The participants were 10 university students in a laboratory setting and their faces were recorded at 200 fps with a resolution of 640×480. The demographic was reasonably spread but limited in number with 5 Asians, 4 Caucasians and 1 Indian participant.
The laboratory setting was set up to maximise the focus on the face, and followed the recommendations of [114]. To reduce shadowing, lights were placed above, to the left and to the right of the participant. The background consisted of a uniform colour of approximately 18% grey. The camera was also rotated 90 degrees to increase the pixels available for face acquisition.
The micro-expressions in this dataset were posed by participants who were asked to perform the 7 basic emotions. Posed facial expressions have been found to differ significantly from spontaneous expressions [115]; the micro-expressions in this dataset are therefore not representative of natural human behaviour, which highlights the requirement for naturally induced expressions. Further, this dataset is not publicly available for further study.
2.8.2 USF-HD
A similar dataset, USF-HD [6], includes 100 posed micro-expressions recorded at 29.7 fps. The participants were shown various micro-facial expressions and told to replicate them in any order they wished. As with the previously described dataset, posed micro-expressions do not re-create a real-world scenario, and replicating other people’s micro-expressions does not represent how the participants themselves would present these movements.
Recording at almost 30 fps risks losing important information about the movements. In addition, this dataset defined micro-expressions as lasting no longer than 660 ms, which is longer than previously accepted definitions. Moreover, the categories for micro-expressions are smile, surprise, anger and sad, a reduction from the 7 universal expressions that omits disgust, fear and contempt. This dataset has also not been made available for public research use.
2.8.3 YorkDDT
As part of a psychological study, the York Deception Detection Test (YorkDDT), Warren et al. [116] recorded 20 video clips, at 320×240 resolution and 25 fps, where participants truthfully or deceptively described two film clips that were either classed as emotional, or non-emotional. The emotional clip, intended to be stressful, was of an unpleasant surgical operation. The non-emotional clip was meant to be neutral, showing a pleasant landscape scene.
Figure 2.4: An example of a ‘positive’ labelled micro-movement from the SMIC dataset.
The participants watching the emotional clip were asked to describe the non-emotional video, and vice versa for the participants watching the non-emotional clip. Warren et al. [116] reported that some micro-facial expressions occurred during both scenarios; however, these movements were not reported to be available for public use.
During their study into micro-facial expression recognition, Pfister et al. [64] managed to obtain the original scenario videos where 9 participants (3 male and 6 female) displayed micro-expressions. They extracted 18 micro-facial expressions for analysis, 7 from the emotional scenario and 11 from the non-emotional version.
Apart from the very small number of micro-expressions in this dataset, it was obtained through a second source that does not go into much detail about AUs or participant demographics. Because the data cannot be publicly accessed, it is not possible to study these micro-expressions. The low frame rate is also an issue: the largest number of frames available for analysing a single micro-expression would be around 12-13. The lowest reported micro-expression length was 7 frames.
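The frame counts quoted for YorkDDT follow from simple arithmetic on frame rate and duration. The sketch below is illustrative only: the function names are hypothetical, and the 500 ms upper bound is an assumption borrowed from the duration limit quoted for CASME in Section 2.8.5, not a value stated by Warren et al. [116].

```python
def frames_spanned(duration_ms: float, fps: float) -> float:
    """Number of frame intervals covered by a movement of the given duration."""
    return duration_ms * fps / 1000.0


def duration_ms(frames: float, fps: float) -> float:
    """Duration in milliseconds covered by a given number of frame intervals."""
    return frames * 1000.0 / fps


# Assumed upper bound on micro-expression duration (see lead-in).
MAX_DURATION_MS = 500.0

# Frame rates reported for YorkDDT, CASME, SMIC and CASME II respectively.
for fps in (25, 60, 100, 200):
    print(f"{fps:>3} fps: {frames_spanned(MAX_DURATION_MS, fps):.1f} frames")

# The shortest reported YorkDDT micro-expression, 7 frames at 25 fps.
print(f"7 frames at 25 fps: {duration_ms(7, 25):.0f} ms")
```

Under this assumption a 500 ms movement spans 12.5 frames at 25 fps, which matches the 12-13 frames stated above, and the shortest reported movement of 7 frames corresponds to 280 ms.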
2.8.4 SMIC
The SMIC dataset [15] consists of 77 spontaneous micro-facial expressions filmed at 100 fps and was one of the first to include spontaneous micro-expressions obtained through emotional inducement experiments. However, this dataset was not coded using FACS [39] and gives no information on neutral sequences (the participant’s face not moving before onset).
The protocol for the inducement experiment consisted of showing participants videos to react to and asking them to suppress their emotions. However, with no FACS coding, the categorisation of emotion labels was left to participants’ own self-reporting: they were asked to fill out how they felt during each video. Leaving the categorisation to participants introduces subjectivity about the emotional stimuli. The recording quality was also reduced by flickering light in the laboratory setting, and the facial area was 190×230 pixels.
SMIC included a wider demographic of participants, with 6 female and 14 male. Ethnicity was more diverse than in previous datasets, with ten Asians, nine Caucasians and one African participant; however, this still covers only 3 ethnicities and does not provide a good overview of a population. This dataset also includes recordings at 25 fps and near-infrared images, but for consistency the high-speed videos are the only ones used for comparison. An example from the dataset can be seen in Fig. 2.4, which shows a ‘positive’ micro-movement.
2.8.5 CASME
To address the low number of micro-expressions in previous datasets, the CASME dataset [16] captured 195 spontaneous micro-expressions at 60 fps, but the facial area dimensions were lower than SMIC at 150×190 pixels. A further advantage of this dataset is that all expressions are FACS coded and include the onset, apex and offset frame numbers. The duration of any movement did not exceed 500 ms unless the onset duration was less than 250 ms, because fast-onset facial expressions with slow offset durations are classed as micro-expressions [43].
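The duration criterion described for CASME can be written as a small check on the coded frame numbers. This is a minimal sketch rather than the procedure used by the dataset authors: the function name is hypothetical, "onset duration" is assumed to mean the time from the onset frame to the apex frame, and durations are assumed to be measured as frame differences divided by the frame rate.

```python
def meets_duration_criterion(
    onset_frame: int,
    apex_frame: int,
    offset_frame: int,
    fps: float = 60.0,          # CASME frame rate
    max_total_ms: float = 500.0,
    max_onset_ms: float = 250.0,
) -> bool:
    """Return True if a coded movement satisfies the duration rule above.

    A movement qualifies when its total duration is at most 500 ms, or when
    its onset phase (assumed here: onset frame to apex frame) is shorter
    than 250 ms, even if the offset phase is slow.
    """
    total_ms = (offset_frame - onset_frame) * 1000.0 / fps
    onset_ms = (apex_frame - onset_frame) * 1000.0 / fps
    return total_ms <= max_total_ms or onset_ms < max_onset_ms


# 30 frames at 60 fps is exactly 500 ms, so the total-duration rule applies.
print(meets_duration_criterion(0, 10, 30))
# 800 ms in total, but a 200 ms onset: accepted as a fast-onset movement.
print(meets_duration_criterion(0, 12, 48))
# 800 ms in total with a slow onset of about 333 ms: rejected.
print(meets_duration_criterion(0, 20, 48))
```

The second case illustrates why the exception matters: without it, a movement with a fast onset and a slow offset would be discarded on total duration alone.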
Participants watched videos to induce an emotional response. All 17 videos were selected to have either a positive or a negative valence. The authors acknowledge that a single video may elicit several emotions. Video durations ranged from 1 to 4 minutes, and each video was mainly used to elicit 1 particular emotion. Finally, participants rated the emotional content of the videos afterwards on a scale of 0 to 6, where 0 was the weakest and 6 the strongest.
The frame rate of this dataset, 60 fps, does not represent micro-expressions well, as the movements could easily be missed when recording. The categories for classifying a labelled emotion were selected based on the video content, participants’ self-reports and universal emotion theory. Moreover, the dataset adds repression and tense to the categories from universal emotion theory and leaves out contempt and anger.
Figure 2.5: An example micro-movement from the CASME II dataset. In static images, it is not easy to see the movement; however, the participant has been FACS coded with AU12 (lip corner puller).
2.8.6 CASME II
Shortly after, CASME II [17] was created as an extension of the original CASME dataset. The frame rate was increased to 200 fps to analyse muscle movements in more detail, and 247 newly FACS coded micro-expressions from 26 participants were obtained. The facial area used for analysis was larger than in CASME and SMIC at 280×340 pixels. An example from the dataset can be seen in Fig. 2.5, which shows AU12 (lip corner puller) induced from this participant.
However, as with the previous version, this dataset includes only Chinese participants and categorises emotions in the same way. Both CASME and CASME II used 35 participants, mostly students with a mean age of 22.03 (SD = 1.60). Along with using only one ethnicity, both datasets have the disadvantage of using young participants only, restricting analysis to similar-looking participants (based on age features).
Based on the findings from the previous benchmark datasets, much more can be done to address their limitations, for example by providing consistent lighting and a wide demographic; however, it is the lack of datasets for spontaneously induced micro-movements that motivates the creation of this dataset.
2.9 Real-World Applications
Detecting and recognising human faces and facial expressions has attracted increasing attention from psychologists, cognitive scientists and computer scientists due to its applications in human-computer interaction [117, 118], law enforcement and security [119], and computer animation.
Successfully detecting micro-facial movements to a high degree of accuracy has benefits in many areas, including security, psychological assessment, teaching and training [13, 19, 20]. Within airports, such technology could spot a face in the crowd that may be concealing emotion and showing micro-movements. Spotting this early could help prevent a person from causing harm as they try to conceal intent in their facial expressions. Such a system may be a long way off yet due to the technological challenges of spotting such small movements with security cameras in a densely populated area.
It should be emphasised that security systems would not be able to immediately determine whether someone wants to cause harm, but would provide a starting point for further investigation. This links with another application of automatic micro-movement detection: people can be taught to spot the expressions much more easily if they can review situations where they occur, such as interrogations of criminals or experiments to spot liars.
Many applications show promise in security and law enforcement, where most research focuses; however, micro-facial movement detection could also be applied to psychological assessment and neurological problems. A person with mental health problems may try to conceal their true feelings from others while truly being in distress from problems arising in their lives [21]. Mental illnesses show few physical symptoms, but patients yet to be diagnosed with a mental health issue could be helped faster if their concealed distress is uncovered. Further, neurological issues in the brain may cause involuntary twitches on the face that could show early signs of diseases like Parkinson’s [22]. Therefore, even though such twitches are not related to facial expressions, this shows the broader scope of the solution as an application to real-world problems.
Paul Ekman’s research was used by the United States’ Transportation Security Administration (TSA) to train officers on how to analyse behaviour for border protection [14]. It is also known that this research has been used to train large organisations such as the FBI and CIA for crime prevention, showing that it is being recognised as important for future crime prevention.
2.10 Research Direction
Analysing large or macro-facial expressions has been a big part of computer vision for decades, and the work done here seemingly can be replicated for micro-facial expression analysis. However, this field has branched into an area of its own, garnering a multitude of experiments and applications [10, 13, 18–20, 43, 46, 100]. With global threats such as terrorism and flawed science such as the polygraph for deception detection [120], the non-verbal cue of micro-expressions may be key to helping security deal with potential threats. Security and deception detection are not the only applications, though: systems that help detect micro-expressions could support clinical applications such as helping therapists uncover hidden feelings in patients regardless of their awareness of the emotion. Other potential uses would be early facial paresis detection [22], cross-culture studies [23, 24] and computer animation [25].
A controversial point is whether or not detecting these micro-expressions should be allowed, as the theory behind them states that a person attempting to conceal their emotion experiences these movements involuntarily and likely unknowingly. If we are able to detect them with high accuracy, then we are effectively robbing a person of the ability to hide something that is private to them. From an ethical point of view, knowing when someone is being deceptive would be advantageous, but it takes away the freedom a person has over their emotions.
Regardless of current issues surrounding privacy, micro-facial expression analysis systems are still in the early stages of development, with accuracies reaching no more than roughly 70% [5, 16, 17, 109]. Further, the methods have limited data to work with, especially compared to normal facial expression analysis, due to the difficulty of inducing micro-facial expressions. Also, as micro-movements are so fast, normal camera systems that record at 25-30 fps barely capture any useful information for micro-movement analysis. High-speed cameras are therefore used [15, 17], which capture many more frames (100-200 fps) but at the expense of any current system not being able to perform in real-time.
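The real-time cost of high-speed capture can be made concrete with a per-frame time budget. The sketch below is illustrative and its names are hypothetical; it assumes that "real-time" means each frame must be fully processed before the next one arrives, and the 8 ms processing time is a made-up figure rather than a measurement of any reviewed method.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to process one frame before the next one arrives."""
    return 1000.0 / fps


def keeps_up(processing_ms_per_frame: float, fps: float) -> bool:
    """True if per-frame processing fits within the frame interval."""
    return processing_ms_per_frame <= frame_budget_ms(fps)


# Hypothetical per-frame processing cost, for illustration only.
PROCESSING_MS = 8.0

for fps in (25, 30, 100, 200):
    status = "keeps up" if keeps_up(PROCESSING_MS, fps) else "falls behind"
    print(f"{fps:>3} fps: budget {frame_budget_ms(fps):.1f} ms per frame, {status}")
```

Moving from 25 fps to 200 fps shrinks the budget from 40 ms to 5 ms per frame, so a method that comfortably keeps up with a standard camera may fall behind a high-speed one.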
For any micro-movement detection algorithm, there can be many problems to solve to ensure an effective detection system. Based on the review of current literature, a few research challenges are identified. The speed and subtlety of micro-movements make differentiating between micro-movements and non-movements challenging, especially for movements that are genuine and spontaneous. Similarly, image noise can affect detection algorithms considerably. The use of machine learning has been explored; however, classifying micro-movements into distinct classes becomes challenging when micro-movements can be expressed in many small facial muscle motions. Further, machine learning requires training with a large, balanced dataset and is not suitable for real-time processing.
2.11 Summary
This Chapter introduced the theory of universal facial expressions and why they are important in the understanding and interpretation of emotion. It moved on to discuss facial expression analysis, which informs the main review of micro-facial expression analysis. Analysing these subtle expressions is difficult for humans; approaches to representing these movements using computer vision are steadily growing, but have a long way to go before being as well-established as facial expression analysis.
Micro-facial expression analysis tends to focus on the emotional interpretation, meaning assumptions are commonly made. Focusing on micro-facial movements, which describe the facial muscle activations, removes these assumptions. This Chapter described the way machine learning makes many assumptions in the recognition of micro-expressions into distinct categories and, in contrast, described micro-movement analysis and detection as a more suitable approach when using computer vision.
Theories and Techniques
This Chapter focuses on the techniques used throughout the thesis, including detecting the face, alignment of the face into a canonical plane, de-noising, feature extraction and measuring the performance of results. The flow of this Chapter loosely follows the flow that many algorithms take during the analysis of micro-facial movements.