5 An ERP study of emotion processing within faces and voices

5.1.1. Faces

5.1.1.1. Early processing: 70 – 200ms

Early visual perception is believed to begin around 50ms after stimulus onset at occipital and posterior sites, reflecting visual processing within the primary and extrastriate visual cortex as well as parts of the fusiform gyrus (Di Russo, Martinez, Sereno, Pitzalis, & Hillyard, 2002). During this early visual processing, the brain is not thought to be sensitive to the emotional content of stimuli (e.g. Krolak-Salmon et al., 2001), and emotion recognition commonly seems to begin between 200 and 400ms after stimulus onset (see below). On the other hand, Eimer and Holmes (2002) showed that fearful and neutral facial expressions were differentiated as early as 120ms after onset at frontocentral sites, with increased positivity for fearful faces. Batty and Taylor (2003) also reported emotion-related waveform differences as early as 90ms after the onset of emotional faces; these, however, did not differ between specific emotions.

A negative peak at around 170ms after stimulus onset (N170) has also previously been associated with emotion processing. This lateral negative peak at occipito-temporal sites between 140 and 200ms is commonly larger for faces than for other visual stimuli and may have a counterpart with positive polarity at more central sites (Bentin, Allison, Puce, Perez, & McCarthy, 1996; Eimer, 2011). Kanwisher et al. (1997) originally reported a brain area, the fusiform face area (FFA), that specialises in processing faces as compared to objects. The N170 is thought to reflect this face-processing activity in lateral occipito-temporal regions such as the occipital and fusiform face areas as well as the superior temporal sulcus (Eimer, 2011).
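To make the definition of such a component concrete, the following minimal sketch shows how a negative peak could be located within the 140 to 200ms window. It is an illustration only and not taken from any of the cited studies: the waveform is synthetic, and the sampling rate, the amplitudes and the helper names `gaussian` and `negative_peak` are assumptions made for this example.

```python
import math

def gaussian(t_ms, centre_ms, width_ms, amplitude_uv):
    """Gaussian-shaped deflection used to build a toy ERP waveform."""
    return amplitude_uv * math.exp(-((t_ms - centre_ms) ** 2) / (2 * width_ms ** 2))

# Hypothetical recording with one sample per millisecond (0 to 399 ms).
times_ms = list(range(400))

# Synthetic waveform (illustrative values only): a positive deflection near
# 100 ms followed by a negative, N170-like deflection near 170 ms.
waveform_uv = [
    gaussian(t, 100, 15, 3.0) + gaussian(t, 170, 15, -5.0) for t in times_ms
]

def negative_peak(times, amplitudes, start_ms, end_ms):
    """Return (latency_ms, amplitude_uv) of the most negative sample in the window."""
    window = [(a, t) for t, a in zip(times, amplitudes) if start_ms <= t <= end_ms]
    amplitude, latency = min(window)  # tuples sort by amplitude first
    return latency, amplitude

latency_ms, amplitude_uv = negative_peak(times_ms, waveform_uv, 140, 200)
print(f"Negative peak at {latency_ms} ms ({amplitude_uv:.2f} uV)")
```

Because the synthetic negativity is centred on 170ms by construction, the detected latency falls inside the N170 window. In real data the peak would typically be measured on averaged waveforms at occipito-temporal electrodes, separately for each stimulus condition.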

The role of the N170 component with regard to emotion processing in faces has not yet been clearly established. Several studies have reported that early face processing in the N170 time range is independent of emotional content. For example, in two studies by Eimer and Holmes (2007) and Eimer, Holmes and McGlone (2003), the N170 did not show emotion-specific responses, and ERP amplitudes and latencies were comparable across emotional and neutral faces.

On the other hand, other studies have demonstrated that early face processing can be affected by emotional expressions. For example, Blau, Maurer, Tottenham and McCandliss (2007) reported an enhanced deflection at 170ms after stimulus onset for fearful compared to neutral faces. Marinkovic and Halgren (1998) also found emotion effects as early as 170ms after stimulus onset at temporal sites for positive versus neutral faces. Batty and Taylor (2003) demonstrated that N170 waveforms even differed depending on the specific emotion displayed. For example, at around 140ms after stimulus onset, responses to fearful and disgusted faces were delayed relative to those for positive emotions such as happy or surprised faces.

In support, Pizzagalli, Lehmann, Hendrick, Regard, Pascual-Marqui and Davidson (2001) used tomographic source localisation and reported early activity within the fusiform gyrus, which has previously been associated with face encoding mechanisms (Bruce & Young, 1986), at around 160ms post stimulus onset during the processing of liked versus disliked faces. Further, recent neuroimaging evidence suggests shared activity in the posterior STS during the processing of facial expression as well as facial identity (Baseler, Harris, Young, & Andrews, 2014). This suggests that the STS may also modulate the perception of emotion to a certain degree. However, it is important to note that other studies argue that the earliest signs of emotion processing are only visible after 200ms (e.g. Krolak-Salmon et al., 2001; Marinkovic & Halgren, 1998) and are hence separated in time from general face perception.

5.1.1.2. Late processing: 300 – 600ms

Neural processes that reflect the processing of individual emotions may only reliably be found during processing stages that include higher cognition, such as situation appraisal. Indeed, previous studies have reported emotion-specific effects at later latencies, starting at around 400ms. For example, Krolak-Salmon et al. (2001) reported different ERP waveforms for happy and fearful versus disgusted faces at around 550 to 750ms after stimulus onset in right posterior-temporal brain regions. Additionally, later waveforms for disgust seemed to be emotion-specific and appeared between 700 and 950ms in more frontal regions. According to Krolak-Salmon et al. (2001), deep subcortical emotion structures such as the amygdala or basal ganglia may be responsible for a more widespread activity distribution on the scalp and may even feed back to the extrastriate cortex in a top-down manner (see also Sato et al., 2001).

However, it is debatable how reliable those apparent emotion-specific ERPs really are. Eimer et al. (2003), for example, did not find any evidence for emotion-specific ERP waveforms. Instead, although all six basic emotions elicited typical ERP waveforms with an early frontocentral deflection (120-180ms after stimulus onset) and a later posterior deflection (250-1000ms after stimulus onset), this pattern was comparable across the six basic emotions. In contrast to the emotion-specific ERP waveforms reported by the studies cited above, the results of Eimer et al. (2003) point towards a more general emotion processing mechanism that is independent of the specific emotion presented.
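The distinction between an emotion-general and an emotion-specific effect can be illustrated with a mean-amplitude comparison in the two time windows mentioned above. The sketch below is a toy example under stated assumptions: the step-shaped waveforms, the amplitude values, the 10ms sampling step and the helper names `mean_amplitude` and `toy_erp` are hypothetical and do not reproduce data from Eimer et al. (2003).

```python
def mean_amplitude(times_ms, amplitudes_uv, start_ms, end_ms):
    """Average amplitude over all samples with start_ms <= t <= end_ms."""
    window = [a for t, a in zip(times_ms, amplitudes_uv) if start_ms <= t <= end_ms]
    return sum(window) / len(window)

# Hypothetical averaged waveforms sampled every 10 ms from 0 to 1000 ms.
times_ms = list(range(0, 1001, 10))

def toy_erp(early_uv, late_uv):
    """Step-shaped toy waveform: early_uv in 120-180 ms, late_uv in 250-1000 ms."""
    return [
        early_uv if 120 <= t <= 180 else late_uv if 250 <= t <= 1000 else 0.0
        for t in times_ms
    ]

# Illustrative values only: both emotions differ from neutral by the same
# amount, which is what an emotion-general effect would look like.
erps = {
    "neutral": toy_erp(1.0, 2.0),
    "fear": toy_erp(2.0, 3.0),
    "disgust": toy_erp(2.0, 3.0),
}

for window in [(120, 180), (250, 1000)]:
    baseline = mean_amplitude(times_ms, erps["neutral"], *window)
    for emotion in ("fear", "disgust"):
        effect = mean_amplitude(times_ms, erps[emotion], *window) - baseline
        print(f"{window[0]}-{window[1]} ms, {emotion} minus neutral: {effect:.1f} uV")
```

In this toy case the emotional-minus-neutral difference is identical for both emotions in both windows. An emotion-specific account would instead predict that the difference varies with the emotion category in at least one window.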

Given these partly conflicting results, the debate about the exact onset of emotion recognition remains open. First, some studies have reported early differentiation which may even occur before the face-specific N170 response (Batty & Taylor, 2003), whilst others report late emotion processing that follows structural encoding in a hierarchical manner (Sato et al., 2001). Secondly, it is not clear when the human brain starts to differentiate between distinct emotion categories. Assuming modality-independent emotion mechanisms, as suggested by the previous behavioural chapters, it is of considerable interest to investigate whether a second, independent modality such as non-verbal affect bursts also shows distinct temporal stages of emotion processing that are comparable to the neural patterns for faces.

5.1.2. Voices

5.1.2.1. Early processing: 70 – 200ms

The auditory signal rapidly travels from the ear to the thalamus and the primary auditory cortex (Goldstein, 2001). Early auditory processing happens within the first 50ms and reflects activity within the medial geniculate nucleus and primary auditory cortex (Luck, 2005). Initial acoustic analysis of sounds is often reported at fronto-central sites from 50ms onwards. In order to specify an ERP time window that is specific to the auditory processing of human voices, Charest and colleagues (2009) compared the temporal dynamics of processing human voices, bird voices and

They reported an increased positivity at frontal sites which was typically accompanied by a posterior negativity that peaked at occipital sites at 164ms after stimulus onset and which was specific to human voices. This early voice-sensitive peak has been referred to as the fronto-temporal positivity to voices (FTPV), and its latency can be compared to the face-sensitive N170 component, which typically occurs in a similar time range (Charest et al., 2009). The FTPV is thought to reflect activity in temporal voice areas within the right anterior superior temporal sulcus (STS; Charest et al., 2009), which may reflect the auditory ‘what’ pathway that connects the anterior superior temporal gyrus with the orbitofrontal cortex (Rauschecker & Tian, 2000).

Similarly to the N170 component during face processing, the important question is whether components in the time window thought to reflect human voice processing are also modulated by emotional content. In support of pre-emotional processing of voices during the first 100-200ms, Chronaki et al. (2012) found a negative deflection between 90 and 180ms specific to human voices in 6 to 11 year old children, which did not differ as a function of emotional prosody.

However, there is also evidence which suggests earlier processing of emotional information in human voices. Iredale, Rusby, Mcdonald, Di Marco and Swift (2013) reported that the early voice processing stage may in fact already be modulated by emotional content: waveforms differed for emotional versus neutral voices in parietal areas as early as 100ms after stimulus onset. At this early stage, however, there was no evidence for differentiation between different classes of emotions. Similarly, Sauter and Eimer (2009) suggested early processing of fear, achievement and disgust versus neutral affect vocalisations, which was reflected in an enhanced positivity at fronto-central electrodes between 150 and 300ms. Overall, as pointed out by Eimer and Holmes (2002), this early processing of emotional voices from 150ms onwards is comparable with data from emotional face processing that show early emotional versus neutral differentiation at around 150ms after stimulus onset.

5.1.2.2. Late processing: 300 – 600ms

Following previous findings, it does not seem likely that differentiation between specific emotion categories such as happiness or anger happens before 200-330ms after stimulus onset. A study that investigated non-verbal affect vocalisations, as compared to semantically or verbally loaded stimuli, suggested differentiation between distressed and joyful exclamations from 300ms onwards (Bostanov & Kotchoubey, 2004). Similarly, in 6 to 11 year old children, Chronaki et al. (2012) found a larger effect between 380 and 500ms for angry versus happy or neutral voices.

After it has been established within the temporal voice area, at around 150-200ms, whether a voice is of emotional significance, the acoustic signal is passed on to more frontal areas such as the prefrontal cortex. This information transmission might reflect travel along the auditory ‘what’ pathway to prefrontal areas for higher cognitive analysis such as context interpretation (Rauschecker & Tian, 2000). Due to conscious analysis of the emotional stimulus, it may now be possible to actively discriminate between several emotional states. Indeed, this has been supported by Iredale and colleagues (2013), who reported an enhanced negative deflection for happy versus angry voices at frontal sites between 400 and 650ms, which possibly indicated a reallocation of cognitive resources.

Following this review of auditory emotion processing, the exact onset of emotion processing within the human brain remains unidentified. Previous studies have suggested early emotion processing that coincides with the general human-voice processing response (Eimer & Holmes, 2007), whilst others have reported that latencies of up to 300ms after stimulus onset are crucial for emotion processing (Paulmann & Kotz, 2008). Further, discrimination of individual emotions did not seem to happen before 300-400ms post stimulus onset (Iredale et al., 2013).
