voice and extended system
Findings from neurotypical populations (Belin et al., 2000; von Kriegstein et al., 2003; Bestelmeyer et al., 2011; Pernet et al., 2015) and the acquired and developmental phonagnosia cases reviewed throughout support the temporal lobe as a key structure in the core-voice system implicated in the stage of perceptual voice-identity analysis and voice-identity recognition. For
instance, lesions in the mid/posterior temporal lobe are associated with a selective impairment in voice-identity recognition (Roswandowitz et al., 2018).
What brain regions support specific stages of the voice-identity processing model? Lesions in the temporal lobe of the core-voice system, predominantly in the right hemisphere, have been consistently associated with impaired apperceptive voice-identity processing abilities, (Assal et al., 1976; Assal et al., 1981; Van Lancker et al., 1988; Van Lancker et al., 1989). Unfortunately, detailed lesion descriptions were not reported in these cases. In the case of AS, a developmental apperceptive phonagnosic, atypical responses in the auditory cortex (Heschl’s gyrus) and posterior part of the temporal lobe (STG) were found (Roswandowitz et al., 2017). Together, these findings highlight the importance of the temporal lobe, in particular the posterior part, and auditory cortex, in supporting the perceptual analysis of the voice (Figure 1A, Stage I, Figure 1B).
Based on findings on neurotypical populations the next stage of voice-identity processing, i.e. voice-identity recognition (Figure 1A, Stage II) is likely supported by the anterior/mid part of the right temporal lobe (Belin and Zatorre, 2003; von Kriegstein et al., 2003; von Kriegstein and Giraud, 2004; Andics et al., 2010) within the core-voice system. The two lesion studies, which have assessed familiarity decision confirm the right-hemispheric involvement in familiarity decisions, but unfortunately do not provide detailed lesion descriptions (Neuner and Schweinberger, 2000; Lang et al., 2009).
The subsequent stage of associative voice-identity processing, i.e. semantic association, may be supported by interactions between the core-voice and the extended system (Figure 1A, Stage III). Studies on acquired and developmental phonagnosia support the view that the anterior temporal lobe (Hailstone et al., 2010, 2011; Roswandowitz et al., 2017), the amygdala (Roswandowitz et al., 2017) and the vmPFC (Xu et al., 2015) may serve as potential candidates of the extended system (for review see Blank et al., 2014). Currently, the evidence that semantic- associative phonagnosia may result from dysfunctional connections between the core-voice and extended system rests on the fMRI findings from the case of SP (Roswandowitz et al., 2017). No other study has to date assessed the structural integrity or functional connectivity between these systems in phonagnosia cases. Note that direct damage to the extended system, rather than altered connectivity between the core-voice and extended system, would likely result in a multi- modal (i.e. non-voice selective) person-recognition disorder.
Dissociations between stages of the voice-identity processing model
Findings from patients suffering brain damage (Van Lancker and Kreiman, 1987; Van Lancker et al., 1988), neurodegenerative diseases (Hailstone et al., 2010; Hailstone et al., 2011), individuals with ASD (Schelinski et al., 2016b), and cases of developmental phonagnosia (Roswandowitz et al., 2014; Roswandowitz et al., 2017) suggest a double dissociation between apperceptive and associative voice-identity processing abilities. There are reports on intact perceptual voice-identity analysis (Figure 1A, Stage I, apperceptive processing) and impaired familiarity decision and semantic association (Figure 1A, Stage II, III, associative processing) and vice versa. Also different lesion locations have been associated with impaired apperceptive and associative processes respectively.
In contrast, there is to-date only limited evidence for a double dissociation between the two stages of associative voice-identity processing — familiarity decision (Figure 39.1A, Stage II) and semantic association (Figure 1A, Stage III). Only a few studies compared both abilities intra- individually in cases with presumed associative voice-identity processing deficits (Garrido et al., 2009, Hailstone et al., 2010, 2011, Roswandowitz et al., 2014). Most of these cases showed marked overlapping impairments in familiarity decision and semantic association (Garrido et al., 2009, Hailstone et al., 2010, 2011, but see case SP, Roswandowitz et al., 2014).
Phonagnosia: Modality specific and cross-modal interactions
Acquired and developmental cases of phonagnosia confirm dissociable processing streams for person identification by voices, face, and names (Assal et al., 1981; Van Lancker and Kreiman, 1987; Neuner and Schweinberger, 2000; Roswandowitz et al., 2018). However,
interacting mechanisms primarily between voice- and face-identity have been found as well. For instance, in acquired phonagnosia a deficit in voice- and face-identity processing tends to co- occur (Van Lancker and Canter, 1982; Hailstone et al., 2010; Hailstone et al., 2011). This could either suggest that the lesions commonly affect regions that process voice and face information independently, or alternatively it could suggest that there is a possible overlap in the neuroanatomical mechanisms, which support face and voice processing. Interestingly, cross- modal interactions between the face and voice regions have been observed in neurotypical
populations (e.g. von Kriegstein et al., 2008; Blank et al., 2011). These interactions appear to be behaviourally relevant as voice recognition is often enhanced when the speaker has been previously learned by face (von Kriegstein and Giraud, 2006; Schweinberger et al., 2007; O'Mahony and Newell, 2012; Schall et al., 2013). Taken together, these findings suggest that phonagnosia can be modality specific. However, for potentially facilitating purposes, some degree of overlap, and or interactions between, the processing of faces and voices in both the typical and atypical brain is evident (for review see Maguinness and von Kriegstein, 2017).
Phonagnosia: Relations within the auditory modality
Phonagnosia findings also confirm separate pathways within the auditory modality. Cases of acquired and developmental phonagnosia have been described with intact speech and vocal- emotion recognition (Assal et al., 1976; Assal et al., 1981; Garrido et al., 2009; Lang et al., 2009; Hailstone et al., 2010; Roswandowitz et al., 2014). However, although such cases have been identified the findings are far from homogeneous. For example, in lesion studies, LBD patients with aphasia showed both intact (Assal et al., 1976; Assal et al., 1981; Lang et al., 2009) and impaired (Van Lancker and Kreiman, 1987; Van Lancker et al., 1988; Van Lancker et al., 1989) voice-identity processing. Also in developmental phonagnosia, cases with impaired (Garrido et al., 2009) and intact (Roswandowitz et al., 2014) speech processing have been described. In neurotypical populations, interacting mechanisms between voice-identity and speech processing have been proposed. For example, voice-identity processing is facilitated when the speaker’s language is familiar, rather than unfamiliar, to the listener (Perrachione et al., 2011; Bregman et al., 2012; Fleming et al., 2014). Also speech content is more easily recognised when the speaker is familiar, rather than unfamiliar, to the listener (Nygaard and Pisoni, 1998; Levi et al., 2011). Currently, whether in phonagnosia voice-identity and speech mechanisms do interact or are dissociable remains somewhat elusive. Anecdotal reports of phonagnosic participants suggest that they can rely on the way of speaking and speech content to recognise voice identity. We speculate that in phonagnosia, for instance, information about the speech content (e.g. reference to a past common situation) could help to recognise a speaker’s voice identity. On the other hand, the facilitative effect of voice familiarity for speech recognition might not be available and could potentially explain difficulties with speech recognition in phonagnosia in certain cases, e.g. in KH for speech in noise (Garrido et al., 2009). Such interactions would be in accordance with recent suggestions on potential neural mechanisms for
interaction between speech and voice processing (von Kriegstein et al., 2010; Kreitewolf et al., 2014).