4.5 Discussion
6.4.1 Neural Oscillations and Speech Processing
It is well established that speech is processed concurrently on multiple timescales (Peelle and Davis 2012; Giraud and Poeppel 2012). Evidence for this account has come from investigations using psychophysical (Chait et al. 2015), invasive electrophysiological (Lakatos et al. 2005), non-invasive electro- and neurophysiological (Gross et al. 2013; Ding et al. 2016; Teng et al. 2017), haemodynamic (Davis and Johnsrude 2003; Obleser et al. 2007), and computational modelling methodologies (Santoro et al. 2014). Initial accounts focussed on the close correspondence between the repetition rates of key linguistic features and prominent neural oscillatory frequency bands (Giraud and Poeppel 2012). Of particular note are the theta and low gamma frequency bands, as they closely correspond to the syllabic (∼4 Hz) and phonemic (∼40 Hz) rates of speech. It was proposed that this correspondence reflects processing with temporal sampling windows of different lengths (Poeppel 2003). Further, this processing was proposed to be asymmetric, with each cerebral hemisphere exhibiting a preference for sampling windows of a different length.
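The link between a repetition rate and the length of a sampling window is simply the reciprocal relation between frequency and period. The sketch below works through the two approximate rates quoted above; the function name and dictionary labels are illustrative and are not taken from the cited work.

```python
def window_length_ms(rate_hz: float) -> float:
    """Return the duration of one cycle, in milliseconds, at a rate given in Hz."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1000.0 / rate_hz


# Approximate rates quoted in the text; the labels are illustrative only.
rates_hz = {"syllabic (theta)": 4.0, "phonemic (low gamma)": 40.0}

# One cycle at each rate gives the corresponding window length.
windows_ms = {label: window_length_ms(rate) for label, rate in rates_hz.items()}

for label, length in windows_ms.items():
    print(f"{label}: ~{length:.0f} ms per cycle")
```

Under these assumed rates, the syllabic window is an order of magnitude longer than the phonemic one, which is the sense in which the two bands are said to sample speech at different temporal grains.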
Since these early accounts, complementary explanations have emerged for the relevance of neural oscillations to the processing of auditory stimuli. It is thought that these oscillations reflect cyclical phases of low and high excitability within neural populations, and that their phase aligns with rhythmic input to facilitate processing (Bishop 1932; Schroeder and Lakatos 2009; Luo and Poeppel 2007). For example, temporal alignment of oscillatory activity in the theta band to the syllabic patterns of speech is thought to play a key role in synchronised processing (Hyafil et al. 2015). At this rate, high complexity syllables are followed by low complexity periods with an average cycle length of ∼250 ms. This phase entrainment to the speech envelope enables the complex components to be processed when neural excitability is highest. Oscillatory activity within other frequency bands has been found to entrain to other aspects of speech, and nested neural oscillations measured during presentation of speech stimuli are a possible substrate for a hierarchical framework that subserves synchronised linguistic processing (Zoefel and VanRullen 2016; Gross et al. 2013; Ghitza 2017; Teng et al. 2017). Ding et al. (2016) measured the electrocorticographic neural response to quantised Chinese sentences and showed that 4 Hz activity aligned with word-level information (monosyllabic words in this case), 2 Hz activity aligned with phrase-level information and 1 Hz activity aligned with sentence-level information. Crucially, this was shown to be dissociable from the acoustic content, as the peaks at 1 Hz and 2 Hz were not present when the same stimuli were presented to non-Chinese speakers. The study reported these findings as evidence for a nested hierarchy of oscillatory activity driven by the linguistic content of speech. This interpretation is controversial, however, as a model with access to only lexical-level information can account for the neural data (Frank and Yang 2018). Further, the artificial nature of the stimuli confounds interpretation, as natural speech is aperiodic and not perfectly quantised, though these oscillatory bands are still reflected in natural speech processing. Despite these limitations, this is just one example of emerging evidence for neuronal oscillations in auditory areas as a functional substrate for the discretisation and multiplexed processing of speech (Meyer 2017; Zoefel et al. 2018).
Amplitude modulations of the speech envelope may have a crucial role to play in this system. Robust tracking of the speech envelope by human auditory areas has been found, and temporal modulations in critical frequency bands are suspected to play a key role in speech processing (Kubanek et al. 2013; Ghitza 2011). This envelope tracking may facilitate processing by enabling phase resets of delta band activity, ensuring entrainment and efficiency (Doelling et al. 2014). However, slow amplitude modulations are just one piece of the system, and their well-characterised nature may have led to an overemphasis of their importance (Obleser et al. 2012). For example, evidence for entrainment to higher level acoustic features of speech suggests that oscillations are not purely driven by low level acoustic features (Zoefel and VanRullen 2016).
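To make the notion of an amplitude envelope concrete, the following minimal sketch extracts the envelope of a synthetic amplitude-modulated tone as the magnitude of its analytic signal, then locates the dominant modulation rate. The signal is a toy stand-in for speech, and every parameter (sampling rate, 200 Hz carrier, 4 Hz modulator, modulation depth) is an assumption chosen for illustration. Envelope extraction for real speech usually adds steps such as band-splitting and low-pass filtering, which are omitted here, and none of this is claimed to be the pipeline of the cited studies.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Analytic signal of a real 1-D array, built by zeroing negative frequencies."""
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC component once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist component once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)


# Toy stimulus (all values are illustrative assumptions).
fs = 1000                                 # sampling rate in Hz
t = np.arange(4 * fs) / fs                # 4 s of samples
mod_rate_hz = 4.0                         # syllable-like modulation rate
carrier_hz = 200.0                        # arbitrary carrier frequency
modulator = 1.0 + 0.8 * np.cos(2 * np.pi * mod_rate_hz * t)
signal = modulator * np.sin(2 * np.pi * carrier_hz * t)

# The envelope is the magnitude of the analytic signal.
envelope = np.abs(analytic_signal(signal))

# Dominant modulation rate: largest peak in the spectrum of the demeaned envelope.
env_spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
freqs = np.fft.rfftfreq(envelope.size, d=1 / fs)
peak_hz = freqs[np.argmax(env_spectrum)]
print(f"dominant envelope modulation rate: {peak_hz:.2f} Hz")
```

For this idealised signal the recovered envelope matches the modulator and its spectrum peaks at the assumed 4 Hz rate. Natural speech envelopes are broadband and aperiodic, so the corresponding peak would be considerably broader.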
It remains controversial whether synchronous entrainment actually plays a crucial role in speech processing or is purely epiphenomenal (Zoefel et al. 2018). A train of phase-locked responses evoked by repetitive rhythmic stimuli would be largely indistinguishable from intrinsic neural oscillations at the stimulus presentation frequency, but it has been suggested that endogenous and exogenous oscillations may be functionally distinct (Meyer et al. 2018). Recent theories posit that this oscillatory activity, as measured by local field potentials or M/EEG, may actually be better characterised as synchronised burst events (van Ede et al. 2018).
Neural oscillations constitute a candidate mechanism for synchronised segmentation and processing of sensory inputs in multiple domains and are not specific to speech (Murphy 2015; Haegens and Zion Golumbic 2018; Ronconi et al. 2017). The notion that temporal sampling windows discretise continuous input is also not speech-specific, and evidence for multiple key rates has also emerged in vision research (Ronconi et al. 2017; Holcombe 2009). Specifically, higher frequency gamma oscillations have been linked with finer sensitivity in both vision and audition, and this is suggested to be a perceptual consequence of a higher sampling rate (Baltus and Herrmann 2015). These similarities across multiple modalities suggest that multiplexed neural coding may be a general underlying principle of continuous segmentation and integration of sensory information.
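Returning to the point that a train of evoked responses can resemble an oscillation at the presentation rate, the toy simulation below builds a signal purely from repeated transient responses, with no ongoing oscillator, and shows that its spectrum still peaks at the stimulation rate. The response kernel (a damped 10 Hz sinusoid lasting 200 ms) and all other parameters are hypothetical choices made for illustration and do not model any particular dataset.

```python
import numpy as np

fs = 1000                      # sampling rate in Hz (assumed)
duration_s = 10                # length of the simulated recording (assumed)
stim_rate_hz = 4.0             # stimulus presentation rate (assumed)
n_samples = fs * duration_s

# Stimulus onsets: one impulse every 250 ms.
onsets = np.zeros(n_samples)
onsets[::int(fs / stim_rate_hz)] = 1.0

# Hypothetical transient evoked response: damped 10 Hz sinusoid, 200 ms long.
kernel_t = np.arange(int(0.2 * fs)) / fs
kernel = np.exp(-kernel_t / 0.05) * np.sin(2 * np.pi * 10.0 * kernel_t)

# The "recording" is nothing more than the same transient repeated at each onset.
evoked_train = np.convolve(onsets, kernel)[:n_samples]

# Amplitude spectrum of the demeaned signal.
spectrum = np.abs(np.fft.rfft(evoked_train - evoked_train.mean()))
freqs = np.fft.rfftfreq(n_samples, d=1 / fs)

# Largest spectral peak below 6 Hz, i.e. around the stimulation rate
# (higher harmonics of the rate also appear above this band).
band = (freqs > 0.5) & (freqs < 6.0)
peak_hz = freqs[band][np.argmax(spectrum[band])]
print(f"largest low-frequency peak: {peak_hz:.2f} Hz")
```

A spectral peak at the presentation rate therefore cannot, on its own, separate entrained endogenous oscillations from repeated evoked responses, which is why the distinction has to be argued on other grounds.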
To summarise, there is a wealth of evidence linking neural oscillatory activity and speech processing. There is mounting evidence for a hierarchical processing architecture with nested oscillations at key frequency bands. Theta and low gamma have been closely linked with speech processing, initially due to their close correspondence to the rates of linguistic aspects of speech. Auditory cortex is able to continuously track the amplitude envelope of stimuli, and this has been exploited in studies of neural entrainment. Current theories posit that exogenous and endogenous oscillations may be functionally distinct and that some neural oscillations may be better characterised as synchronised bursting activity. Evidence for a hierarchical system of cascaded oscillations has been found for multiple sensory modalities, suggesting a general organisational system.