A feasibility study for an investigation into the effect of emotion on multisensory integration using EEG


Student no. 140191729

Word count: 2748


Introduction

Multisensory integration (MSI) can be described as the “process by which information from different sensory systems is combined to influence perception, decisions, and overt behaviour”¹. Humans receive information through multiple sensory systems which, when engaged simultaneously, converge to heighten the perception of external stimuli. Bimodal or multimodal sensory integration has an important role in the perception of emotion, which in turn is a key aspect of social functioning and relationships²,³.

Facial affect perception is seldom based solely on facial expression but results from combining visual stimuli with contextual sensory information such as sound or voice³,⁴. De Gelder & Vroomen (2000) demonstrated in a behavioural paradigm that facial emotion recognition is biased towards the emotional nature of a voice when the two are presented simultaneously. This bias was present even after participants were instructed to base responses purely on the visual stimuli, suggesting it is a subconscious, automatic behaviour⁵.

While there is substantial research in the area of perceptual processing, it has focused primarily on unimodal stimuli. Relatively little research has incorporated bimodal or multimodal stimuli or studied the behavioural and physiological patterns in multisensory integration.

Most of this research has examined the area of emotional conflict and its effect on MSI. The congruency of contextual sensory information with a primary stimulus is an important factor in the perception of emotion; simultaneous exposure to multiple different emotional inputs can result in emotional conflict⁴. Incongruent conditions and emotional conflict tend to result in test responses with lower accuracy and slower reaction times than congruent conditions⁴. Neurological differences between emotionally congruent and incongruent conditions have also been observed in studies using event-related potentials (ERPs), with facilitated N1 amplitudes when congruent, and longer auditory P2b latencies when incongruent².


Exploring the neurophysiology behind MSI and emotion perception in healthy individuals is an important step in understanding the pathophysiology of schizophrenia. In addition to generalised cognitive impairment, schizophrenics exhibit a deficit in facial emotion recognition (difficulty in the association of a given facial expression with its corresponding emotion) and this deficit has an adverse effect on social functioning⁶. Hence, more research is required into the neural correlates behind these processes. The use of EEG and study of ERPs is most appropriate when investigating the physiological effects of contextual information on facial affect processing⁷.

In such studies, where stimuli are used to express a specific emotion, it is vital that the stimuli convey that emotion effectively and as desired. Therefore, preliminary studies such as the present one can be performed to obtain behavioural responses to stimuli before they are used in larger investigations.

Research Question

The current study aimed to assess the feasibility of an investigation into the effect of emotion on multisensory integration, which will study event-related potentials (ERPs) in response to visual emotional stimuli contextualised with audio stimuli. We aimed to identify visual and audio stimuli, as well as presentation methods, feasible for use within this EEG study by recording behavioural responses to a range of emotional stimuli.

We hypothesised that the behavioural data collected in response to the selected stimuli would be concordant with expected responses and give a clear indication of which stimuli would be most appropriate in the proposed EEG study.

Two experiments were run in which participants were asked to evaluate emotional stimuli, one using visual stimuli and the other using auditory stimuli.


Experiment 1

We were aware of the “ceiling effect” that occurs in tests of facial affect recognition in healthy subjects and that positive facial affect is generally recognised better than other emotions⁸. In experiment 1 we aimed to identify visual stimuli displaying happy, neutral and fearful facial expressions that could be recognised to the same extent as each other.

Method

14 healthy participants, all students aged 19-20, were recruited from the University of Sheffield. Participants confirmed their age and sex immediately before testing commenced. The stimuli used in this task were sourced from the Pictures of Facial Affect (POFA)⁹. From this database 33 black and white photographs were selected, comprising 11 different actors (5 male), each displaying one happy, one neutral and one fearful facial expression. For each individual actor, the happy and neutral photographs were manipulated using Psychomorph software¹⁰ and ‘morphed’ together. This produced a linear continuum between the two original images/expressions in an 11-image sequence in steps of 10%, thus enabling a certain percentage of ‘happy’ or ‘neutral’ to be selected. 5 gradations in these continua were selected for use as stimuli in the trial: 60, 70, 80, 90 and 100% ‘happy’ (100% being the original taken from the POFA database). Therefore the 165 photographs used as test stimuli consisted of 55 ‘happy’ faces (5 gradations for each of the 11 actors used), 55 neutral faces (the original 11 repeated 5 times) and 55 fearful faces (the 11 originals repeated 5 times).
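The composition of the stimulus set described above can be checked with a short sketch. This is only an illustration of the counting, not the script used in the study: the actor numbering, the tuple layout and the function name `build_trial_list` are hypothetical.

```python
from itertools import product

N_ACTORS = 11
MORPH_STEPS = list(range(0, 101, 10))   # 11-image continuum, 0% to 100% happy in 10% steps
HAPPY_LEVELS = [60, 70, 80, 90, 100]    # gradations kept as 'happy' stimuli
REPEATS = len(HAPPY_LEVELS)             # neutral and fearful originals are each repeated 5 times


def build_trial_list(n_actors=N_ACTORS):
    """Return one (actor, emotion, morph_level) tuple per trial (hypothetical layout)."""
    actors = range(1, n_actors + 1)
    happy = [(a, "happy", level) for a, level in product(actors, HAPPY_LEVELS)]
    neutral = [(a, "neutral", None) for a in actors for _ in range(REPEATS)]
    fearful = [(a, "fearful", None) for a in actors for _ in range(REPEATS)]
    return happy + neutral + fearful


trials = build_trial_list()
counts = {
    emotion: sum(1 for trial in trials if trial[1] == emotion)
    for emotion in ("happy", "neutral", "fearful")
}
print(len(MORPH_STEPS), counts, len(trials))
```

Each emotion category contributes 55 trials, giving the 165 test stimuli reported in the text.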

Non-facial features like hair and clothing in the POFA visual stimuli were all obscured⁶. A PC running PsychoPy¹¹ controlled experimental events and recorded data. Task stimuli were presented on a 144Hz monitor (24”, Iiyama, ProLite GB2488HSU-B1), which gave a frame duration of approximately 6.9ms. Stimuli were presented at the centre of the monitor, size: 14.4cm by 21.3cm.

Stimuli were shown for 7 frames, i.e. approximately 50ms (7 x 6.9ms, roughly 48ms at this refresh rate).
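The timing arithmetic can be made explicit with a minimal sketch. The function names are illustrative only; the refresh rate and frame count are the values given in the text.

```python
REFRESH_HZ = 144   # monitor refresh rate stated in the text
N_FRAMES = 7       # number of frames each face was shown for


def frame_duration_ms(refresh_hz):
    """Duration of a single frame in milliseconds."""
    return 1000.0 / refresh_hz


def presentation_ms(n_frames, refresh_hz):
    """Total on-screen time for a stimulus shown for n_frames frames."""
    return n_frames * frame_duration_ms(refresh_hz)


print(round(frame_duration_ms(REFRESH_HZ), 2))          # 6.94
print(round(presentation_ms(N_FRAMES, REFRESH_HZ), 1))  # 48.6
```

Specifying the duration in whole frames rather than milliseconds means the nominal 50ms exposure corresponds to about 48.6ms on this display.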


Participants sat approximately 60cm from a computer monitor with a keyboard connected to the computer. An instruction screen gave an overview of the task, which began immediately once any key was pressed. Each stimulus was shown on the monitor for 50ms before a response screen was displayed. Participants were then asked to identify the emotion displayed by the stimulus. They did this by selecting one of three options, these being happy, fearful or neutral.

Participants made their selection by pressing corresponding keys on the keyboard; the left arrow key indicated happy, the down arrow neutral, and right arrow fearful. These directions were displayed on the response screen after each stimulus until the participant made a selection.

A fixation cross in the centre of the screen signalled the beginning of the next trial to participants. The sequence in which the stimuli were displayed was randomised, with only neutral and fearful faces being repeated.

Results

Stimuli        Average Score (%)
Happy 100%     98.05
Happy 90%      95.45
Happy 80%      97.40
Happy 70%      93.50
Happy 60%      87.01
Neutral        93.77
Fearful        92.86

Average scores were calculated from the test data. A participant scored 1 point for every response that was congruent with the emotion the face was expressing, i.e. if an 80% happy face were shown, the participant would score 1 if they correctly identified it as happy. Incorrect responses scored 0.
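The scoring rule above can be sketched as follows. The record layout (stimulus label, target emotion, response) and the example responses are hypothetical and are not taken from the study data.

```python
def score_responses(trials):
    """Return percentage correct per stimulus label.

    trials: iterable of (label, target_emotion, response) tuples (hypothetical layout).
    A response scores 1 if it matches the target emotion, otherwise 0.
    """
    totals, correct = {}, {}
    for label, target, response in trials:
        totals[label] = totals.get(label, 0) + 1
        correct[label] = correct.get(label, 0) + (1 if response == target else 0)
    return {label: 100.0 * correct[label] / totals[label] for label in totals}


# Made-up responses for illustration only.
example = [
    ("Happy 80%", "happy", "happy"),
    ("Happy 80%", "happy", "neutral"),
    ("Neutral", "neutral", "neutral"),
    ("Fearful", "fearful", "fearful"),
]
print(score_responses(example))
```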


As expected, the 100% happy stimuli were most easily recognised, with participants correctly identifying these faces as happy 98.05% of the time. ‘Happy’ faces were generally recognised less successfully as percentage happiness decreased, with the exception of the 80% happy faces which on average were recognised more successfully than the 90% happy faces. 60% happy faces were recognised least successfully with an average score of 87.01%.

70% happy stimuli provoked a response accuracy of 93.50%, within 1 percentage point of those of the neutral and fearful stimuli. This would suggest that 70% happy stimuli would be most appropriate for use with the neutral and fearful stimuli in the proposed EEG study.

The short exposure time (50ms) to stimuli led to a reduction in the ceiling effect. Hence, we saw a tendency for subjects to recognise the happiest or most positive stimuli more successfully, in accordance with previous research⁸.

[Figure: Recognition of facial emotion. Average score (%), axis range 80-100, by facial expression: 100% Happy, 90% Happy, 80% Happy, 70% Happy, 60% Happy, Neutral, Fearful.]


Experiment 2

In experiment 2 we aimed to find emotional auditory stimuli that were consistently identifiable as either happy, fearful or neutral. The experiment was run under 2 different conditions.

Method

26 healthy participants (10 male, mean age 20.92 years, SD=2.61) were recruited from the University of Sheffield. Using the same software and equipment as in experiment 1, a test paradigm was built around audio stimuli taken from the IADS database¹². Selected audio files were taken from the database and edited using Audacity 2.1.1 software¹³ to reduce their duration from 6 seconds to 3 seconds whilst preserving the main audio event in each clip. 118 sounds were selected from the database according to their emotional category: 40 happy, 37 neutral and 41 fearful sounds were used as stimuli. As in experiment 1, a PC running PsychoPy¹¹ controlled experimental events and recorded data. Instructions and response screens were presented on a 144Hz monitor (24”, Iiyama, ProLite GB2488HSU-B1).

Participants sat approximately 60cm from the computer monitor and entered their responses using the keyboard. Auditory stimuli were played through a set of Sennheiser HD 265 linear headphones, also connected to the computer. Participants confirmed their age and sex before proceeding with the task. An instruction screen gave a description of the task, which began when any key was pressed. Each stimulus was played through the headphones for 3 seconds, after which 3 response screens were displayed.


The first response screen asked participants to rate the valence of the sound they had just heard on a scale of 1-9, where 1 indicated completely unpleasant and 9 indicated completely pleasant.

Participants were then asked to give an arousal rating of the sound on a similar scale, 1 being not arousing at all and 9 being completely arousing.

Finally, participants were asked to place the sound into one discrete emotional category: these were Happiness, Sadness, Fear, Disgust, Anger, Surprise and Neutrality.

The task was run under 2 different conditions: under condition 1, questions were based around how the sounds made participants feel, whereas under condition 2, questions were based around what the participants judged the content of the sound stimulus to be. To accommodate this design, 2 separate tasks were constructed using PsychoPy, both with the same experimental setup as previously mentioned, but with different wording of response screens. 14 individuals (5 male, mean age 20.57 years, SD=2.65) participated under condition 1, and 12 (5 male, mean age 21.33 years, SD=2.61) under condition 2.


Results

Valence

Paired-samples t-tests were conducted to compare the mean valence rating of all stimuli across conditions one and two. The mean valence rating from the feelings condition (M=4.6, SD=1.66) was significantly lower than that of the judging condition (M=4.75, SD=1.85), t(117)=2.54, p=0.012.

In happy stimuli the mean valence rating from the feelings condition (M=6.31) was significantly lower than the mean valence rating from the judging condition (M=6.83, SD=0.65), t(39)=4.47, p<0.001.

In neutral stimuli there was no significant difference between the mean valence ratings of each condition, t(36)=1.62, p=0.12. This was also the case in fearful stimuli, t(40)=1.98, p=0.054.
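The paired-samples comparison can be sketched as below. This assumes that each stimulus contributes one pair of values, its mean rating under each condition, which is consistent with the reported degrees of freedom (117 for 118 stimuli) but is not stated explicitly in the text. The function name `paired_t` and the ratings in the example are made up for illustration.

```python
import math
import statistics


def paired_t(x, y):
    """Paired-samples t statistic and degrees of freedom for matched lists x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    diffs = [b - a for a, b in zip(x, y)]   # per-stimulus difference, y minus x
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation (n - 1)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Hypothetical mean valence ratings for four stimuli, not study data.
feelings = [4.0, 5.0, 6.0, 3.0]
judging = [4.5, 5.0, 6.5, 4.0]
t, df = paired_t(feelings, judging)
print(round(t, 2), df)
```

The p-value would then be read from a t distribution with the returned degrees of freedom, for example using a statistics package.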

[Fig. 1: Valence. Mean valence rating (scale 1-9) by stimuli emotion (Happy, Neutral, Fearful) for Condition 1 and Condition 2.]


Arousal

Paired-samples t-tests were conducted to compare the mean arousal rating of all stimuli across conditions one and two. The mean arousal rating from the feelings condition (M=4.93, SD=0.95) was significantly lower than that of the judging condition (M=5.25, SD=1.26), t(117)=4.10, p<0.001.

In happy stimuli there was no significant difference between the mean arousal ratings of each condition, t(39)=0.58, p=0.565. This was also the case in neutral stimuli, t(36)=0.53, p=0.60.

In fearful stimuli the mean arousal rating from the feelings condition (M=5.49, SD=0.87) was significantly lower than the mean arousal rating from the judging condition (M=6.27, SD=0.99), t(40)=7.22, p<0.001.

Correct responses

Paired-samples t-tests were conducted to compare the percentage of correct responses across conditions one and two. The mean percentage correct from the feelings condition (M=46.97, SD=23.72) was significantly lower than that of the judging condition (M=60.24, SD=26.74), t(117)=6.43, p<0.001.

[Fig. 2: Arousal. Mean arousal rating (scale 1-9) by stimuli emotion for Condition 1 and Condition 2.]


In happy stimuli the mean percentage correct from the feelings condition (M=58.93, SD=20.64) was significantly lower than the mean percentage correct from the judging condition (M=75, SD=20.33), t(39)=4.77, p<0.001.

In neutral stimuli the mean percentage correct from the feelings condition (M = 41.89, SD = 23.64) was significantly lower than the mean percentage correct from the judging condition (M=60.36, SD=25.41), t(36)=5.78, p<0.001.

In fearful stimuli there was no significant difference in the percentage of correct responses between conditions, t(40)=1.53, p=0.134.

Discussion

Experiment 1

As stated previously, behavioural data from experiment 1 suggested that ‘70% happy’ faces were most appropriate for use as visual stimuli in conjunction with the neutral and fearful faces from the POFA database⁹, as these produced the most similar results. The behavioural data we received, albeit from a small sample (n=14), seemed to support the notion that positive facial emotion is recognised more readily than negative emotion⁸.

[Fig. 3: Correct Responses. Mean correct responses (%), axis range 0-100, by stimuli emotion, for Condition 1, Condition 2 and Conditions 1 & 2 (mean).]

Experiment 2

We can conclude from the data that the way in which the question was asked had a significant impact on participants’ responses. The results of the paired-samples t-tests across all audio stimuli show that on average participants gave significantly stronger (higher in arousal, fig. 2) and more positive (higher in valence, fig. 1) responses under condition 2 than under condition 1. Responses to the emotion recognition question also indicated a higher accuracy of emotion recognition under condition 2 (fig. 3).

On average across both conditions, happy stimuli were recognised significantly more successfully (66.97%) than neutral (51.13%) or fearful stimuli (42.82%), as seen in fig. 3. This suggests that in an auditory medium, positive stimuli are more easily recognised than other emotions, just as they have been found to be in a visual context⁸. This may be because happy sounds are less ambiguous than fearful sounds, for example, which may be interpreted as similarly negative emotions like sadness or disgust. In contrast, Vroomen, Collier & Mozziconacci found in a similar behavioural paradigm that it is often harder to differentiate between happiness and other emotions in an auditory setting¹⁴. However, their observation was based on the use of emotional voices, suggesting that the inclusion of non-human stimuli in the present study had a different influence on emotional processing¹⁴.

The nature of condition 1 meant that responses were more subjective. Emotion can be easily influenced and participants’ responses may have been affected by their general mood on the day of testing. Responses may also have been influenced by residual emotion from previous stimuli within the task: if a stimulus evoked a particularly strong emotional response then the response to the stimulus immediately after may have been affected.


A precaution that could have been taken would have been to ask participants to rate their emotional state immediately before the experiment. This would have added context to individuals’ responses and potentially allowed us to be more selective with data i.e. exclude data from participants displaying more extreme emotional pre-test ratings.

The final response screen asked participants to put the sound into one of seven basic emotional categories. However, only 3 of these options could elicit a correct response: happiness, fear and neutrality. On the one hand, this added difficulty to the task, as participants were less likely to select the ‘correct’ option. On the other hand, the wider choice of emotions meant that if a participant chose the correct option, it was more likely to be a true reaction to the stimulus, i.e. they were not choosing an option simply because of a lack of other options. In this respect, the added complexity of having 7 options led to a higher degree of authenticity in the data.

The primary aim of the experiment was to identify the most recognisable stimuli based on the behavioural data with a view to being used in a future EEG study. A threshold of 70% correct responses was suggested to select the stimuli which best conveyed their emotion. It was decided that the data obtained under condition 2 would be the most appropriate to use for this because a significantly lower percentage of correct responses across all stimuli under condition 1 meant that very few neutral and fearful stimuli would have met the threshold if an average had been taken across both conditions. Another factor was the subjective nature of condition 1. However, using only data obtained under condition 2 left a very small sample size (12 participants).
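The selection step can be sketched as a simple filter over per-stimulus accuracy under condition 2. The stimulus names and percentages below are hypothetical, and treating the threshold as inclusive (70% or above) is an assumption, since the text does not say whether a score of exactly 70% qualifies.

```python
THRESHOLD = 70.0   # percent correct; inclusive comparison is an assumption


def select_stimuli(percent_correct, threshold=THRESHOLD):
    """Return the sorted names of stimuli whose percent correct meets the threshold."""
    return sorted(name for name, pct in percent_correct.items() if pct >= threshold)


# Hypothetical per-stimulus accuracy under condition 2, not study data.
condition2 = {"sound_A": 83.3, "sound_B": 66.7, "sound_C": 75.0, "sound_D": 41.7}
print(select_stimuli(condition2))   # ['sound_A', 'sound_C']
```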

Ideally, this experiment would be repeated with a much larger sample size which would add credibility to any findings and would allow a more informed decision to be made over the use of data from condition 1.

To conclude, our results have shown that behavioural responses to emotional auditory stimuli can vary significantly depending on conditions. We were able to see how this presented across the 3 criteria of valence, arousal and % correct responses. The data indicated a higher degree of identifiability for happy stimuli across both audio and visual modalities when compared to neutral and fearful stimuli. The behavioural data successfully provided a numerical indication of identifiability which can be used to select stimuli for future EEG or behavioural studies.


Annotated Bibliography

De Gelder, B., & Vroomen, J. (2000). The perception of emotions by ear and by eye. Cognition & Emotion, 14(3), 289-311.

This study explored the relationship between auditory and visual processing in emotion perception through the use of bimodal stimuli in behavioural paradigms. It was shown that in healthy individuals: 1) sensory information from two different sources can be combined effectively in the process of emotion recognition, with the emotion of a voice heavily influencing the identification of emotion in a simultaneously presented face; 2) this influence is automatic and exists in spite of instructions to base judgement solely on the visual stimuli; 3) this influence is bi-directional. This supports the notion of multisensory integration, the next step being to carry out a similar experiment with bimodal stimuli whilst incorporating EEG to explore the neural correlates behind these processes further.

Müller, V. I., Habel, U., Derntl, B., Schneider, F., Zilles, K., Turetsky, B. I., & Eickhoff, S. B. (2011). Incongruence effects in crossmodal emotional integration. Neuroimage, 54(3), 2257-2266.

This research focused on the effect of emotional congruence vs incongruence, collecting both behavioural and whole brain fMRI data. The former supported the findings of previous behavioural studies into emotion perception such as that of de Gelder and Vroomen (described above): participants’ judgement of facial emotion was biased in the direction of that of simultaneously presented auditory stimuli. The fMRI data identified neural correlates of emotional congruence/incongruence and indicated that in emotionally incongruent conditions in audio-visual integration, a cingulate-fronto-parietal network involved in conflict monitoring and resolution is activated. It also suggested that amygdala responses in audio-visual integration indicate the absence of neutral features, not only the presence of emotional features.

Johnston, P. J., Devir, H., & Karayanidis, F. (2006). Facial emotion processing in schizophrenia: no evidence for a deficit specific to negative emotions in a differential deficit design. Psychiatry research, 143(1), 51-61.

It is recognised that schizophrenia patients exhibit an impairment in emotion recognition. Evidence shows that in basic facial recognition tests patients are significantly worse at recognising negative emotions than positive emotions, pointing towards a negative-specific deficit in facial affect recognition. However, this study found that positive facial emotions are more recognisable than negative emotions irrespective of a schizophrenia diagnosis and that these differences in discriminability are normally masked in healthy subjects due to the “ceiling effect”: in standard facial emotion recognition tests healthy subjects consistently score near maximum or maximum marks across all emotions. However, when the difficulty of these tests is increased by degrading stimuli, the ceiling effect in healthy subjects is much less pronounced and we see a tendency to recognise positive emotion more successfully than negative emotion. The idea of this “ceiling effect” was important to take into account when carrying out experiment 1 in the present study, whose results seemed to support Johnston’s findings.

Müller, V. I., Kellermann, T. S., Seligman, S. C., Turetsky, B. I., & Eickhoff, S. B. (2014). Modulation of affective face processing deficits in schizophrenia by congruent emotional sounds. Social cognitive and affective neuroscience, 9(4), 436-444.

This paper built upon the aforementioned work of V.I. Müller and others in 2011 and explored crossmodal emotion processing in schizophrenia patients. Participants, consisting of a patient group and a control group, rated the emotion of different facial expressions whilst emotional or neutral sounds were simultaneously presented. Event-related potentials were measured throughout the task and revealed significantly decreased P1 and P2 amplitudes in patients, irrespective of visual or auditory stimuli and their congruency. Also, in emotionally congruent conditions there was a significant negative correlation between symptom severity and P1 amplitudes. This comparison of patient and control data suggested that during emotion processing, early visual processing impairment can be observed in patients. However, this impairment can be resolved by presenting emotionally congruent sound, depending on symptom severity.


References

1 - Stein, B. E., Stanford, T. R., & Rowland, B. A. (2009). The neural basis of multisensory integration in the midbrain: its organization and maturation. Hearing research, 258(1), 4-15.

2 - Müller, V. I., Kellermann, T. S., Seligman, S. C., Turetsky, B. I., & Eickhoff, S. B. (2014). Modulation of affective face processing deficits in schizophrenia by congruent emotional sounds. Social cognitive and affective neuroscience, 9(4), 436-444.

3 - Dolan, R. J., Morris, J. S., & de Gelder, B. (2001). Crossmodal binding of fear in voice and face. Proceedings of the National Academy of Sciences, 98(17), 10006-10010.

4 - Müller, V. I., Habel, U., Derntl, B., Schneider, F., Zilles, K., Turetsky, B. I., & Eickhoff, S. B. (2011). Incongruence effects in crossmodal emotional integration. Neuroimage, 54(3), 2257-2266.

5 - De Gelder, B., & Vroomen, J. (2000). The perception of emotions by ear and by eye. Cognition & Emotion, 14(3), 289-311.

6 - Tsoi, D. T., Lee, K. H., Khokhar, W. A., Mir, N. U., Swalli, J. S., Gee, K. A., ... & Woodruff, P. W. (2008). Is facial emotion recognition impairment in schizophrenia identical for different emotions? A signal detection analysis. Schizophrenia research, 99(1), 263-269.

7 - Wieser, M. J., Gerdes, A. B., Büngel, I., Schwarz, K. A., Mühlberger, A., & Pauli, P. (2014). Not so harmless anymore: How context impacts the perception and electrocortical processing of neutral faces. NeuroImage, 92, 74-82.

8 - Johnston, P. J., Devir, H., & Karayanidis, F. (2006). Facial emotion processing in schizophrenia: no evidence for a deficit specific to negative emotions in a differential deficit design. Psychiatry research, 143(1), 51-61.


9 - Ekman, P., & Friesen, W. V. (1976). Pictures of facial affect. Palo Alto, CA: Consulting Psychologists Press.

10 - Chen, J., & Tiddeman, B. (2010). Multi-cue facial feature detection and tracking under various illuminations. International Journal of Robotics and Automation, 25(2).

11 - Peirce, J. W. (2009). Generating stimuli for neuroscience using PsychoPy. Frontiers in Neuroinformatics, 2:10. doi:10.3389/neuro.11.010.2008

12 - Stevenson, R. A., & James, T. W. (2008). Affective auditory stimuli: Characterization of the International Affective Digitized Sounds (IADS) by discrete emotional categories. Behavior Research Methods, 40(1), 315-321.

13 - Audacity®. http://audacityteam.org/ (accessed 13/10/15)

14 - Vroomen, J., Collier, R., & Mozziconacci, S. J. (1993, September). Duration and intonation in emotional speech. In Eurospeech.