Chapter 8 – Conclusion and Future Directions

8.1. Conclusions

This thesis has focused on three main phases essential for building a classification model to recognise participants’ feelings as expressed through physiological responses while viewing visualisations or various smile videos.

Phase I: In this phase, two visualisations (radial and hierarchical) were presented, and I asked the observers six very similar questions about each visualisation, in a setting where the correct response rates could not be used to differentiate the visualisations. I investigated five parameters (observers’ correct response rate, response time, fixation duration, number of fixations, and saccade duration) to differentiate between them.

Individual analysis of each parameter shows that observers’ correct response rates are very similar for the two visualisations. A two-tailed paired-sample t-test shows that none of the correct response rate, the fixation duration, or the saccade duration is able to differentiate between these two visualisations, whereas the other parameters, response time and number of fixations, are able to do so. This is to be expected in a compliance setting: the high cost of mistakes leads people to make sure they have found the correct answer, so any difference in the quality (or usability) of the visualisation shows up as time or other behaviours.
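The paired comparison described above can be sketched as follows. This is a minimal illustration only: the response times are hypothetical values invented for the example, not measurements from the experiment, and the function name `paired_t_statistic` is an assumption of this sketch. It computes the paired-sample t statistic and its degrees of freedom; significance is then judged against a tabulated two-tailed critical value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, degrees_of_freedom) for a paired-sample t-test.

    Each position in `first` and `second` must belong to the same observer.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical response times in seconds for five observers (illustrative only).
radial_times = [12.0, 15.0, 11.0, 14.0, 13.0]
hierarchical_times = [10.0, 12.0, 10.0, 11.0, 12.0]

t_value, dof = paired_t_statistic(radial_times, hierarchical_times)

# Two-tailed critical value of Student's t at alpha = 0.05 with 4 degrees
# of freedom, taken from a standard t table.
CRITICAL_T_DF4 = 2.776
significant = abs(t_value) > CRITICAL_T_DF4
print(f"t = {t_value:.3f}, df = {dof}, significant = {significant}")
```

With these invented numbers the differences are consistently positive, so the statistic exceeds the critical value; a parameter such as correct response rate, where the paired differences scatter around zero, would not.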

I compared two similar visualisations and demonstrated that, as designed, the users’ correct response rates did not show any statistically significant difference. I showed from eye gaze data that it is still possible to differentiate between these two visualisation examples using simple eye gaze metrics. I developed a neural network model to classify these two visualisations from participants’ pupillary responses and found that the Levenberg-Marquardt algorithm performs better in this task than the other algorithms considered. Further, I demonstrated that the hierarchical visualisation is superior to the radial one in this setting: users were significantly quicker on the hierarchical visualisation even though it was displaying slightly more complex data in graph analysis terms.

Phase II: In this phase, physiological signals recorded while observers watched emotion-containing smile videos were investigated to discriminate real from acted smiles, along with the observers’ judgements recorded via a Likert scale. It was a challenging task, because the recorded physiological signals were highly noisy. Different noise removal techniques with an advanced feature selection method were applied, and the highest classification accuracy was found to be 97.8% by analysing 450 features of the observers’ left-eye pupillary responses. The observers were only 52.7% (individually, on average) to 68.4% (together, by voting) accurate according to their verbal responses. These results are in the normal range reported in the literature over multiple studies for determining real smiles from surveys. A result of 97.8% from pupillary response suggests that at non-conscious levels people are very good at detecting genuine smiles, perhaps reflecting the fact that this identification can feed into emotional responses to others. There may even be a benefit to a relatively low level of conscious identification of genuine smiles: it may be important in social interactions to be able to accept smiles or other expressions at ‘face value’ consciously, while emotionally correctly recognising the veracity of the expression.

In this connection, I investigated both recorded and extracted features from 10 participants’ GSR signals to compute classification accuracies using KNN, SVM, and NN. Various feature selection methods and their outcomes were also investigated. NN showed higher accuracies than the other two classifiers (SVM and KNN), and RSFS showed higher accuracies than the other four feature selection methods (SFS, SFFS, MI, and SD). On the other hand, NN and RSFS are more costly than the other techniques. Finally, the highest accuracy (96.5%) was obtained from extracted features using RSFS and NN.

Phase III: In the third phase, I overcame the bias that training physiological features can introduce into the testing set by using an independent approach, and showed that high accuracies are achievable with this highly robust cross-validation approach. It is more robust than the other two approaches in the literature (leave-one-observer-out and leave-one-video-out), as it does not include any information from the test data in the training data. In the leave-one-observer-out approach, the training data is free from test observer information, but not free from test video information. The converse is true for the leave-one-video-out approach. My independent approach, by contrast, is free of both sources of bias.
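The difference between these splitting schemes can be illustrated with a small sketch. It is one possible reading of the description above, not a restatement of the exact procedure used in the experiments: each sample is assumed to be an (observer, video) pair, the identifiers `o1`, `v1`, and so on are hypothetical, and the independent split is taken to mean that the training set excludes every sample sharing either the test observer or the test video.

```python
from itertools import product


def independent_split(samples, test_observer, test_video):
    """Train on samples that share neither the observer nor the video of the test sample."""
    train = [s for s in samples if s[0] != test_observer and s[1] != test_video]
    test = [s for s in samples if s[0] == test_observer and s[1] == test_video]
    return train, test


def leave_one_observer_out(samples, test_observer):
    """Train on all other observers; the test videos still appear in training."""
    train = [s for s in samples if s[0] != test_observer]
    test = [s for s in samples if s[0] == test_observer]
    return train, test


# Hypothetical design: 3 observers each watching 4 videos (12 samples).
observers = ["o1", "o2", "o3"]
videos = ["v1", "v2", "v3", "v4"]
samples = list(product(observers, videos))

ind_train, ind_test = independent_split(samples, "o1", "v1")
loo_train, loo_test = leave_one_observer_out(samples, "o1")

# Independent split: no training sample involves observer o1 or video v1.
print(len(ind_train), len(ind_test))
# Leave-one-observer-out: video v1 is still seen during training via o2 and o3.
print(any(video == "v1" for _, video in loo_train))
```

The cost of the independent split under this reading is that samples sharing only the observer or only the video with the test sample are discarded for that fold, so each fold trains on less data than in the other two schemes.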

In addition, I considered four classifiers (KNN, SVM, NN, and their ensemble) to distinguish between real and posed smiles from observers’ peripheral physiological features using this independent approach. The ensemble classifier performs better than the other classifiers. It provides accuracies of about 84% from individual physiological features (PR, BVP, or GSR), whereas the other two approaches, called smiler independent and observer independent, show higher accuracies than the independent approach. Feature-level fusion improves the classification accuracy of the independent approach to 96.1% using the ensemble technique.

From the results of the robust independent approach, I found that the parameters I considered had substantial effects on the classification of smiles as real or posed from observers’ physiological features. Lower K values and scaling factors, and higher numbers of hidden nodes, were needed to reach higher classification accuracies, according to the architecture of each classifier. I noted that fusing results obtained with optimised parameter values for each technique led to no improvement, strongly indicating that the errors made by each classifier must be quite similar. Fusing results from cases with less good parameter values led to improved classification results. I believe this would be a practical test for naïve users of these techniques: an improvement from fusion indicates that further parameter tuning should be done.
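The reasoning about fusion and similar errors can be made concrete with a toy example. The sketch below assumes that results are fused by a simple majority vote over the classifiers' predicted labels, which is an assumption of this illustration rather than a statement of the fusion rule used in the experiments; all labels and predictions are invented. When the three classifiers err on the same sample, voting cannot correct the error; when they err on different samples, voting removes every error.

```python
from collections import Counter


def majority_vote(*label_lists):
    """Fuse per-classifier predictions by taking the most common label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*label_lists)]


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical ground truth for six smile videos.
truth = ["real", "posed", "real", "posed", "real", "posed"]

# Case 1: all three classifiers make the same mistake (last sample).
same_knn = ["real", "posed", "real", "posed", "real", "real"]
same_svm = list(same_knn)
same_nn = list(same_knn)

# Case 2: each classifier makes one mistake, each on a different sample.
diff_knn = ["posed", "posed", "real", "posed", "real", "posed"]  # wrong at index 0
diff_svm = ["real", "posed", "posed", "posed", "real", "posed"]  # wrong at index 2
diff_nn = ["real", "posed", "real", "posed", "posed", "posed"]   # wrong at index 4

fused_same = majority_vote(same_knn, same_svm, same_nn)
fused_diff = majority_vote(diff_knn, diff_svm, diff_nn)

print(accuracy(same_knn, truth), accuracy(fused_same, truth))  # no gain from fusion
print(accuracy(diff_knn, truth), accuracy(fused_diff, truth))  # fusion removes the errors
```

In both cases every individual classifier has the same accuracy, so the gain from fusion depends only on whether the errors overlap.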

In summary, I have produced excellent results by using sensors on the viewer of a smile rather than the producer of a smile. This means the results can be extended to historical data and to non-human data, for example to examine the veracity of the smiles of virtual avatars [219]. It is clear that the brain areas controlling expression recognition and creation interact without conscious control [220]; hence, knowing that the smiler is an avatar may not change the physiological signals caused by the recognition/creation pathways.

In this respect, the results of the smile experiments indicated that there is a marked difference between verbal reporting and peripheral physiological signals. The final results show that participants are verbally 52.7% (on average) to 68.4% (by voting) correct, whereas they are physiologically 96.1% correct using an independent approach and the ensemble technique. Even when faces are not involved (E1, visualisation experiment), we can still find this difference in accuracy (verbally 81.0% and physiologically 95.8% correct), which can be revealed by examining the physiological processes rather than conscious conclusions. It should also be noted that the final accuracy figure obtained from observers’ fused physiological features, used to classify smilers’ affective states as real or posed, shows that this system could be applicable in many situations, such as monitoring patients’ mental states, verifying trustworthiness during questioning, relationship management, and so on.
