6.1. Role of Modality

Humans need to be able to recognise emotions when only one modality is available, such as in darkness or in noisy situations. Across all three behavioural chapters, the present thesis concludes that participants rated emotions from separate modalities with more similarities than dissimilarities. The similarity across modalities became especially evident during the analysis of confusion matrices in Chapters 2 and 3. For each participant, rating patterns for each emotion were highly correlated across modalities, and this correlation became stronger with age, whilst younger children showed a face preference.
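As an illustration of what such a cross-modal comparison of rating patterns could look like, the following minimal sketch correlates, for one participant, each emotion's row of a face confusion matrix with the corresponding row of a voice confusion matrix. It is not the analysis used in the thesis: the choice of Pearson's coefficient, the emotion labels, the function names and all numbers are assumptions made purely for illustration.

```python
from math import sqrt

# Hypothetical label set; the emotions used in the thesis may differ.
EMOTIONS = ["anger", "disgust", "fear", "happiness"]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def modality_similarity(face, voice):
    """Correlate each emotion's response row across two confusion matrices.

    Each matrix maps a presented emotion to the proportion of responses
    given to each label in EMOTIONS (same order in both matrices).
    """
    return {emotion: pearson(face[emotion], voice[emotion]) for emotion in face}


# Made-up response proportions for one hypothetical participant.
face_matrix = {
    "anger":     [0.80, 0.10, 0.05, 0.05],
    "disgust":   [0.15, 0.70, 0.10, 0.05],
    "fear":      [0.05, 0.10, 0.80, 0.05],
    "happiness": [0.00, 0.05, 0.05, 0.90],
}
voice_matrix = {
    "anger":     [0.70, 0.15, 0.10, 0.05],
    "disgust":   [0.20, 0.55, 0.15, 0.10],
    "fear":      [0.10, 0.10, 0.70, 0.10],
    "happiness": [0.05, 0.05, 0.10, 0.80],
}

for emotion, r in modality_similarity(face_matrix, voice_matrix).items():
    print(f"{emotion}: r = {r:.2f}")
```

A high coefficient for an emotion would indicate that the participant's correct responses and confusions for that emotion follow a similar pattern in both modalities, which is the sense of "similarity" discussed above.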

The similar pattern of emotion perception across modalities calls into question the reliance on perceptual features during emotion classification. Instead, the researcher proposes that there may be similar underlying emotion mechanisms which are shared across modalities and guided by cognitive appraisals such as novelty and pleasantness (Scherer, 1984). This may be the case despite the great perceptual difference in sensory inputs from faces and voices. An emotion-general core network that acts independently of modality has previously been proposed by Peelen et al. (2010). In addition, neuronal structures such as the amygdala also seem to process emotions independently of their perceptual features (Adolphs, 2002; Phillips et al., 1998). The present findings suggest that not only person identity from voices and faces (Yovel & Belin, 2013), but also emotion recognition from voices and faces may share common underlying coding mechanisms. Additionally, there may also be a perceptual interdependence of physical features during emotion production. For example, muscular activation of the mouth and tongue, which reflects functionally adaptive behaviours such as vomiting, not only creates the typical facial expression of disgust but also activates musculature in the upper vocal tract, creating typical vocal expressions of disgust (Scherer, 1994). This may explain the similarity of rating patterns of emotions across two separate and independent modalities.

This modality-independent rating pattern may, however, be a feature of adult emotion recognition, whereas children of primary school age relied more on facial than on auditory emotion information. Further, cultural differences may affect the strength of modality similarities. The similarity across modalities was weaker within the German adult sample, as there was a face reliance in emotion recognition, especially for German men. Children, on the other hand, did not seem to differ in their degree of modality similarity across cultures: a culture-independent pattern of early face reliance followed by later modality-independent processing was evident in both countries.

What could be the benefit of modality similarities during emotion recognition? In real life, emotions in faces or voices are not always expressed in isolation; it is very common for emotions to be expressed simultaneously across several modalities. In order to make sense of our environment and everyday life, we need to successfully integrate emotion cues from several independent modalities such as faces and voices. Multimodal emotion recognition is believed to be superior to unimodal emotion recognition from single modalities (Ethofer et al., 2006), and attending to one modality may change the perception of another modality in situations where visual and auditory inputs do not match (Collignon et al., 2008).

The benefit of a possible shared emotion mechanism across faces and voices could be the successful integration of modalities in shared brain regions such as the STS (Kreifelts, Ethofer, Shiozawa, Grodd, & Wildgruber, 2009). Indeed, research into congruent and incongruent emotion cues across modalities hints at a strong link between the processing of emotion signals from faces and voices that cannot simply be switched off, in line with the idea of a shared emotion mechanism. For example, De Gelder and Vroomen (2000) suggested that the recognition of emotions expressed in faces was influenced by the simultaneously presented emotion expressed in voices, and vice versa. Deliberately attempting to focus on one channel alone did not break this mandatory link between the simultaneous processing of emotions from faces and voices. Similarly, Ethofer et al. (2006) suggested a bidirectional link between two separate modalities during congruent emotion perception. So-called cross-modal effects were visible when fearful faces were presented with a simultaneous fearful voice, enhancing the perception of fear through the integration of congruent emotion signals from two separate modalities. This enhanced perception of fear across two modalities was also reflected in an enhanced hemodynamic response in the amygdala in an fMRI study.

In terms of incongruent emotion perception, the recognition of ambiguous face or voice expressions was influenced by the simultaneous presentation of bodily expressions of emotions (van den Stock, Righart, & de Gelder, 2007). Collignon et al. (2008) suggested that for incongruent emotion pairs, visual emotion signals commonly overrode auditory emotion signals, although this was less common if the visual signal was unreliable and noisy. This carry-over effect of incongruent emotions across modalities suggests a link between emotion signals from faces and voices that influences perception in both directions. This in turn is consistent with the idea of a shared emotion network, independent of modality.

The superior temporal sulcus (STS) is believed to be involved in processing dynamic information from faces such as gaze and facial expressions (Bruce & Young, 1986; Haxby et al., 2000). Interestingly, parts of the STS are also believed to include temporal voice areas (Charest et al., 2009), which are active during the processing of auditory voice information such as prosody. This shared use of neural structures during emotion recognition across modalities may enable the integration of information from several modalities. Indeed, the STS is commonly associated with multimodal emotion recognition. For example, Hagan, Woods, Johnson, Calder, Green and Young (2009) reported increased activity in posterior regions of the STS during the combined processing of static faces and non-verbal emotion vocalizations. Kreifelts, Ethofer, Shiozawa, Grodd and Wildgruber (2009) reported a functional segregation of emotion processing in the STS, with the trunk showing voice-selective activation whilst the ascending branch showed face-sensitive activation. Notably, activity during audio-visual emotion recognition was recorded in the middle regions of the STS, where the face- and voice-selective regions spatially overlapped.

Hence, it is possible that the similarities of emotion recognition across separate modalities, as demonstrated by the current behavioural results, smooth the way for the integration of emotions from several modalities in the STS. Alternatively, everyday exposure to multimodal emotion expressions may have strengthened the associations between emotion representations from individual modalities, so that recognition from one isolated modality is associated with recognition from another isolated modality. This could also explain how the degree of modality similarity developed throughout childhood (see Chapter 3), as the associations between emotions expressed in individual modalities become stronger with everyday experience of multimodal emotion expressions.

In document Emotion recognition in the human face and voice (Page 196-198)