Facial Expression Recognition with Separate Lights

Chapter 7 Evaluating FER under Harsh Lighting with an Enhanced HDR


7.5.3 Facial Expression Recognition with Separate Lights

Tables 7.3 and 7.4 summarise the confusion matrices for FER performance over the six datasets, using the SBS and CNN techniques on the four emotions.

Classification Accuracy Based on SBS and CNN

This section presents FER performance on the HDR database under separate lighting conditions. The experiment tests each lighting condition individually with the four emotions, for comparison with the combined lights in Section 7.5.2. The true positives and average recognition rates in Tables 7.3 and 7.4 are about double those obtained with combined lights. This indicates that combining the lights adds complexity that degrades performance, a result that corroborates Chapter 6. We observe no significant difference between the experiments performed under the different lights, even across the datasets. However, across emotions, Lg_TMO and Re_TMO achieve better average recognition rates than Naive, Opt_exp and the other TMOs with both the SBS and CNN techniques. Naive performs lowest with the SBS technique, but relatively higher with the CNN technique. Overall, CNN performed better than SBS across all datasets. This is because deep neural networks learn to identify the shapes and objects that define facial expressions. The visible layer of a deep CNN takes the image as a matrix, which enables the network to exploit the spatial proximity of the pixels, leading to more robust feature extraction [YCBL14,NNVW15].
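The role of spatial proximity in a convolutional layer can be illustrated with a minimal sketch. The function `conv2d_valid`, the toy image and the kernel below are hypothetical and are not taken from the network used in this thesis; they only show that each output value of a convolution is computed from a small neighbourhood of adjacent pixels.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) without padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Each output value depends only on a kh x kw block of
            # neighbouring pixels, which is how locality is exploited.
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        output.append(row)
    return output


# Toy 4x4 "image" with a vertical edge between columns 1 and 2 (made-up data).
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written horizontal-gradient kernel (illustrative, not a learned filter).
kernel = [
    [-1, 1],
    [-1, 1],
]

response = conv2d_valid(image, kernel)
for row in response:
    print(row)
```

The response is non-zero only where the kernel straddles the edge, so the filter reacts to local structure rather than to individual pixel values. In a trained CNN the kernel weights are learned rather than fixed by hand.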

Table 7.3: Summary of confusion matrix showing true positives and the average recognition rates of FER on separate lights with Naive, Opt_exp, DA_TMO, Dr_TMO, Lg_TMO and Re_TMO datasets using SBS technique (%).

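| Dataset | angry BL | angry LL | angry OL | angry AvR | disgust BL | disgust LL | disgust OL | disgust AvR | happy BL | happy LL | happy OL | happy AvR | neutral BL | neutral LL | neutral OL | neutral AvR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Naive | 68 | 80 | 72 | 73 | 60 | 68 | 84 | 71 | 83 | 70 | 77 | 77 | 75 | 69 | 81 | 75 |
| Opt_exp | 87 | 82 | 91 | 87 | 72 | 64 | 60 | 65 | 87 | 82 | 85 | 85 | 74 | 80 | 82 | 79 |
| DA_TMO | 64 | 64 | 71 | 66 | 80 | 82 | 91 | 85 | 76 | 89 | 82 | 82 | 90 | 88 | 84 | 87 |
| Dr_TMO | 82 | 89 | 89 | 87 | 78 | 80 | 76 | 78 | 80 | 85 | 76 | 80 | 86 | 90 | 92 | 89 |
| Lg_TMO | 80 | 82 | 85 | 82 | 80 | 87 | 89 | 85 | 87 | 89 | 91 | 89 | 88 | 90 | 84 | 87 |
| Re_TMO | 87 | 93 | 80 | 87 | 82 | 76 | 80 | 79 | 93 | 93 | 85 | 90 | 82 | 86 | 86 | 85 |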
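As a worked check of how the AvR columns relate to the per-light columns, the sketch below recomputes the Naive row of Table 7.3. It assumes that AvR is the arithmetic mean of the BL, LL and OL rates rounded to the nearest integer; this definition is an assumption that reproduces the Naive row, and the helper name `average_recognition_rate` is hypothetical.

```python
# True-positive rates (%) for the Naive dataset with SBS, copied from Table 7.3.
naive_sbs = {
    "angry":   {"BL": 68, "LL": 80, "OL": 72},
    "disgust": {"BL": 60, "LL": 68, "OL": 84},
    "happy":   {"BL": 83, "LL": 70, "OL": 77},
    "neutral": {"BL": 75, "LL": 69, "OL": 81},
}


def average_recognition_rate(per_light):
    """Assumed definition: mean of the per-light rates, rounded to an integer."""
    return round(sum(per_light.values()) / len(per_light))


avr = {emotion: average_recognition_rate(rates)
       for emotion, rates in naive_sbs.items()}

for emotion, value in avr.items():
    print(f"{emotion}: {value}")
```

Under this assumption the Naive row gives 73, 71, 77 and 75, matching the tabulated AvR values. A few entries elsewhere in the tables differ by one point from the mean of the rounded per-light rates, which may reflect rounding of the underlying figures.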

Table 7.4: Summary of confusion matrix showing true positives and the average recognition rates of FER on separate lights with Naive, Opt_exp, DA_TMO, Dr_TMO, Lg_TMO and Re_TMO datasets using CNN technique (%).

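| Dataset | angry BL | angry LL | angry OL | angry AvR | disgust BL | disgust LL | disgust OL | disgust AvR | happy BL | happy LL | happy OL | happy AvR | neutral BL | neutral LL | neutral OL | neutral AvR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Naive | 82 | 80 | 91 | 84 | 84 | 87 | 84 | 85 | 86 | 86 | 86 | 86 | 84 | 82 | 78 | 81 |
| Opt_exp | 74 | 82 | 89 | 82 | 72 | 88 | 80 | 80 | 87 | 78 | 85 | 83 | 88 | 88 | 84 | 87 |
| DA_TMO | 59 | 68 | 67 | 65 | 71 | 71 | 78 | 74 | 93 | 85 | 87 | 88 | 82 | 88 | 88 | 86 |
| Dr_TMO | 80 | 84 | 78 | 81 | 76 | 71 | 89 | 79 | 85 | 76 | 76 | 79 | 88 | 80 | 84 | 84 |
| Lg_TMO | 65 | 87 | 80 | 77 | 82 | 82 | 80 | 82 | 87 | 85 | 87 | 86 | 84 | 90 | 84 | 86 |
| Re_TMO | 78 | 89 | 82 | 83 | 85 | 87 | 82 | 85 | 80 | 82 | 80 | 82 | 81 | 88 | 72 | 81 |

Discussion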

The proposed HDR imaging was evaluated on a FER system using six test datasets. The test images cover three different harsh lighting conditions: left light, side light and overhead light. With all three lights combined, the HDR-based FER system achieved an overall performance of 37% with the SURF+BOF+SVM method, as shown in Table 7.1. This low performance is attributed to the BOF technique used for characterising (representing and discriminating) SURF descriptors: the complexity of the combined harsh lights hampers the description of the structural information of an image through the BOF. Among the emotions, happy was the most highly discriminated across all datasets and lighting conditions, as confirmed in the test with the SURF+BOF+SVM method. In contrast, the overall performance with deep CNN in Table 7.2 was 57%, which is above the baseline of 55.6% reported in [NNVW15] for this domain. Moreover, the HDR tone-mapped images (DA_TMO, Dr_TMO, Lg_TMO, Re_TMO) performed much better than Naive and Opt_exp, which is what this experiment set out to show. Nevertheless, compared with other FER methods in the literature, the highest performance of 57%, obtained with deep CNN for the image conditions used, still needs improvement.
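To make the BOF step concrete, the following minimal sketch shows how a set of local descriptors can be turned into a bag-of-features histogram. It is an illustration under stated assumptions, not the implementation used in this chapter: the two-dimensional descriptors and the three-word codebook are made-up toy values (real SURF descriptors have 64 or 128 dimensions), the codebook is assumed to have been learned beforehand, for instance by clustering, and the function names are hypothetical.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codeword closest (Euclidean distance) to the descriptor."""
    distances = [math.dist(descriptor, word) for word in codebook]
    return distances.index(min(distances))


def bof_histogram(descriptors, codebook):
    """Normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    # An image with no descriptors yields an all-zero histogram.
    return [count / total for count in counts] if total else counts


# Hypothetical codebook of three visual words (assumed to be learned elsewhere).
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Hypothetical local descriptors extracted from one face image.
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.8, 0.9), (0.1, 0.9)]

hist = bof_histogram(descriptors, codebook)
print(hist)
```

The resulting fixed-length histogram is the kind of image representation that would be passed to an SVM classifier. Because the histogram discards where each descriptor was found, changes in lighting that alter the detected descriptors directly alter the representation, which is consistent with the sensitivity to combined harsh lights discussed above.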

In Table 7.3, the results with the lights separated across the different emotions show little difference in discriminating angry faces across the lighting conditions and datasets with the SBS method. It was observed that most participants found it difficult to make an angry face. Likewise, in Table 7.4, there is no significant difference in discriminating any of the emotional faces across the lighting conditions and datasets with the deep CNN method. Taking the average recognition rates across the different lights, the left light (LL) recorded the highest recognition rate. This follows from the observation discussed above for Table 7.2. It can therefore be concluded that these results can serve as a benchmark for other studies, with more attention paid to the complexities that arise when images under different harsh lights are used in separate datasets or combined in a single dataset.

7.6 Summary

In this chapter, an HDR solution to the FER problem was investigated using data captured under different harsh lights, each presenting specific challenging conditions. We evaluated the HDR database in six different categories. The performance of the six categories was assessed using two computer vision algorithms: SURF and CNN. We demonstrated that, under different lighting conditions, the tone-mapped versions of the HDR database give high recognition rates, higher than those of the naive and optimal exposures.

Chapter 8

Conclusions

"If I were again beginning my studies, I would follow the advice of Plato and start with Mathematics." Galileo Galilei

The thesis has investigated the performance of a facial expression recognition system using images captured under harsh lighting conditions. Throughout the thesis, the aim has been to address the deficiencies and limitations of the native imaging method (LDR) and to show how these can be overcome by HDR methods. While some progress has already been reported in this direction, this thesis has presented a path towards accurate, informative, robust and real-time facial expression analysis, particularly for images captured under harsh lighting conditions.

The images captured under harsh lighting conditions considered in the thesis are known to affect the performance of FER systems negatively. This challenge has been a major research issue from both an academic and a practical perspective. In reality, most images will not be captured in ideal studio-like lighting environments, where the face of the subject is perfectly lit and stable to ensure flawless capture. Key to the viable application of the methods is the ability to consistently measure the same facial expressions over the full range of changing scene lights. The main goal of the research presented in this thesis was to improve the performance of FER systems by taking advantage of HDR imaging technology.

The thesis first presented image enhancement methods used as pre-processing techniques, investigating how much pixel information is preserved. Chapter 5 described an experiment that uncovered the loss of image information resulting from image enhancement. The chapter also proposed an HDR-based method for improving the performance of facial expression recognition in scenes where harsh lighting conditions are expected and LDR imaging has difficulty capturing the full range of scene light in a single exposure. Chapter 6 introduced an experiment that showed how the use of HDR tone mapping operators for face recognition is beneficial under harsh lighting conditions. Finally, Chapter 7 presented an HDR database as a solution to the facial expression recognition problem under different harsh lights, each presenting specific challenging lighting directions. Furthermore, the contributions of the thesis are given and future work arising from the findings is suggested.

In document Facial expression recognition under harsh lighting using high dynamic range imaging (Page 136-141)