Summary - Understanding and Decoding Imagined Speech using Electrocorticographic Recordings in Humans

This thesis was an exploratory work aimed at better understanding imagined speech using electrocorticographic neural signals recorded in epileptic patients. We investigated various speech representations, such as acoustic sound features, phonemic features, and individual words. We also evaluated how well these speech features can be decoded, with communication devices as the target application. To this end, we performed four studies.

Disclaimer: This chapter is adapted from the following articles – with permissions of all co-authors and journals:

Martin S., Millán J.d.R., Knight R.T., and Pasley B.N. 2016. “The Use of Intracranial Recordings to Decode Human Language: Challenges and Opportunities.” Brain and Language. doi:10.1016/j.bandl.2016.06.003.

In Chapter 2, we reconstructed for the first time continuous acoustic features from high gamma neural activity recorded during imagined speech. For this, we used cross-condition linear regression, and thereby extended the mathematical framework used in Pasley et al. (2012) to imagined speech. Results showed that spectrotemporal features of imagined speech were significantly reconstructed from models built from overt speech data. This highlighted that overt speech and imagined speech share a partially common spectrotemporal neural representation in the motor cortex and perisylvian areas.
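The cross-condition idea in Chapter 2 (fit a linear model on overt speech, then apply it unchanged to imagined speech) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, and the helper names (`add_lags`, `fit_ridge`, `columnwise_correlation`, `simulate`), the ridge penalty, the number of lags, and the array sizes are hypothetical. It is not the thesis's actual pipeline.

```python
import numpy as np

def add_lags(neural, n_lags):
    """Stack time-lagged copies: (time, channels) -> (time, channels * n_lags)."""
    n_times, n_channels = neural.shape
    lagged = np.zeros((n_times, n_channels * n_lags))
    for k in range(n_lags):
        # Column block k holds the neural signal delayed by k samples.
        lagged[k:, k * n_channels:(k + 1) * n_channels] = neural[:n_times - k]
    return lagged

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge solution of min ||X W - Y||^2 + alpha ||W||^2."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)

def columnwise_correlation(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))

def simulate(rng, mixing, n_times, gain, noise_std):
    """Synthetic stand-in data: a spectrogram and noisy 'high gamma' driven by it."""
    spec = rng.standard_normal((n_times, mixing.shape[0]))
    noise = noise_std * rng.standard_normal((n_times, mixing.shape[1]))
    neural = gain * spec @ mixing + noise
    return spec, neural

rng = np.random.default_rng(0)
mixing = rng.standard_normal((8, 32))  # 8 spectral bands, 32 electrodes (made up)

# "Overt" condition: stronger signal, used only for training.
spec_overt, hg_overt = simulate(rng, mixing, n_times=1000, gain=1.0, noise_std=0.5)
# "Imagined" condition: same mapping but weaker signal, used only for testing.
spec_imag, hg_imag = simulate(rng, mixing, n_times=500, gain=0.5, noise_std=1.0)

# Train on overt speech, reconstruct the imagined-speech spectrogram.
W = fit_ridge(add_lags(hg_overt, n_lags=3), spec_overt, alpha=10.0)
spec_imag_hat = add_lags(hg_imag, n_lags=3) @ W
r = columnwise_correlation(spec_imag_hat, spec_imag)
print("mean reconstruction correlation:", r.mean())
```

In this toy setting the reconstruction works only because both conditions were simulated with the same neural-to-acoustic mapping. That shared mapping is the assumption a cross-condition model tests on real recordings.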

In Chapter 3, we decoded continuous phoneme sequences from high gamma neural activity recorded during imagined speech. To label intended phonemes more accurately during imagined speech, we designed a karaoke-like task, in which words scrolling across the screen were divided into their phonemic representations. Until now, isolated phonemes had been successfully decoded during imagined speech (Ikeda et al. 2014; Pei et al. 2011; Brumberg et al. 2011), but these studies failed to decode phoneme sequences during continuous speech. To do so, we used hidden Markov models, which incorporate both a phoneme likelihood model and a language model. This approach has been widely used in the field of speech recognition (Rabiner 1993), and more recently in neural-based speech recognition (Moses et al. 2016; Herff et al. 2015). Here, we replicated these results and extended the approach to imagined speech. Initial results in two patients were promising; nevertheless, the findings need to be extended to a larger pool of participants before conclusions can be drawn.
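How a hidden Markov model combines a phoneme likelihood model with a language model can be sketched with Viterbi decoding. The sketch below assumes one state per phoneme, frame-wise likelihoods that would come from some classifier of high gamma features, and a bigram transition matrix standing in for the language model. The function name `viterbi` and all numbers are hypothetical and are not taken from the thesis.

```python
import numpy as np

def viterbi(log_emission, log_transition, log_prior):
    """Most likely state sequence under an HMM.

    log_emission:   (T, S) log-likelihood of each frame under each phoneme state
    log_transition: (S, S) entry [i, j] is log P(state j at t | state i at t-1)
    log_prior:      (S,)   log-probability of the initial state
    """
    n_frames, n_states = log_emission.shape
    delta = np.empty((n_frames, n_states))       # best log-score ending in each state
    backptr = np.zeros((n_frames, n_states), dtype=int)
    delta[0] = log_prior + log_emission[0]
    for t in range(1, n_frames):
        # scores[i, j]: best path ending in state i at t-1, then moving to state j.
        scores = delta[t - 1][:, None] + log_transition
        backptr[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emission[t]
    # Trace the best path backwards from the final frame.
    path = np.empty(n_frames, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n_frames - 2, -1, -1):
        path[t] = backptr[t + 1, path[t + 1]]
    return path

# Toy example with two phoneme states and five frames (all values made up).
emission = np.array([[0.8, 0.2], [0.8, 0.2], [0.45, 0.55], [0.8, 0.2], [0.8, 0.2]])
transition = np.array([[0.9, 0.1], [0.1, 0.9]])  # "language model": states tend to persist
prior = np.array([0.5, 0.5])

framewise = emission.argmax(axis=1)              # decision from likelihoods alone
decoded = viterbi(np.log(emission), np.log(transition), np.log(prior))
print("frame-wise:", framewise, "viterbi:", decoded)
```

In this toy input, the frame-wise decision flips to the second state at the ambiguous third frame, whereas the decoded path stays in the first state because the transition model penalizes a brief switch. This is the sense in which the language model regularizes noisy frame-level evidence.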

In Chapter 4, we classified for the first time individual words from high gamma neural time features recorded during an imagined speech word repetition task. For this, we proposed a new approach that takes time features as input and deals with speech production irregularities by introducing temporal alignment into the classification framework. Although words have been decoded during overt speech (Blakely et al. 2008), only phonemes had been successfully predicted during imagined speech (Ikeda et al. 2014; Pei et al. 2011; Brumberg et al. 2011). This study represents a proof of concept for basic decoding of speech imagery, but the results to date are not robust enough for a clinical communication device. The major difficulties derive from the weak signal-to-noise ratio and the lack of temporal alignment across trials. For instance, in the overt speech condition, decoding performance increased when trials were extracted at speech onset/offset compared to when trials were extracted at cue onset. Finding behavioral or neural metrics that help define speech onset/offset in the imagined condition would improve performance.
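This summary does not specify how temporal alignment was built into the classifier. As one possible illustration, the sketch below uses dynamic time warping (DTW) as the distance inside a nearest-template classifier. DTW and the 1-nearest-neighbor rule are swapped-in techniques chosen for brevity, and the names `dtw_distance` and `classify_nearest_template` and the toy time courses are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two trials of shape (time, features)."""
    n_a, n_b = len(a), len(b)
    cost = np.full((n_a + 1, n_b + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n_a + 1):
        for j in range(1, n_b + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            # Allow a match, or stretching either trial by repeating a sample.
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    return cost[n_a, n_b]

def classify_nearest_template(trial, templates, labels):
    """Return the label of the template with the smallest DTW distance to the trial."""
    distances = [dtw_distance(trial, template) for template in templates]
    return labels[int(np.argmin(distances))]

# Two made-up "word" templates: one high gamma time course per word, one feature.
template_a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])[:, None]
template_b = np.array([0.0, -1.0, -2.0, -1.0, 0.0])[:, None]

# A trial that follows word A's time course but is produced more slowly.
slow_trial = np.array([0.0, 0.0, 1.0, 2.0, 2.0, 1.0, 0.0])[:, None]

predicted = classify_nearest_template(slow_trial, [template_a, template_b], ["A", "B"])
print("predicted word:", predicted)
```

Because DTW lets samples repeat, the slowed trial matches its own template exactly even though the two differ in length. This is the kind of robustness to production-timing irregularities that the paragraph above describes.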

In these studies, we investigated imagined speech in parallel with overt speech production and/or speech perception. This allowed us to compare speech representations across conditions and to integrate imagined speech into the general speech network. Results revealed complex patterns of brain activity across conditions and tasks. Altogether, the most informative areas for decoding imagined speech units were located in the superior temporal gyrus, inferior frontal gyrus and sensorimotor cortex, areas commonly associated with speech. However, different tasks involve different speech production processes, ranging from lexical retrieval to phonological or even phonetic encoding (Perrone-Bertolotti et al. 2014), making it difficult to draw any conclusion about the specific function of anatomic locations. In addition, the signal was significantly weaker in the imagined speech condition than in the listening or overt speech conditions, where speech stimuli were directly observed. Finally, variability across participants in the imagined condition might reflect the subjective strategy employed by each individual to generate internal speech. In sum, it is still unclear how the content of imagined speech is processed in the human cortex.

In Chapter 5, we investigated the neural encoding of acoustic features during music imagery. This study relied on an extremely rare clinical case in which a patient undergoing neurosurgery for epilepsy treatment was also an adept piano player. Because evidence has shown that music and speech share common brain networks (Schön et al. 2010; Callan et al. 2006), this case also helped in understanding features of inner subjective experiences. While previous brain imaging studies have indicated anatomical regions active during auditory imagery (Zatorre et al. 1996; Griffiths 1999; Halpern and Zatorre 1999; Rauschecker 2001; Halpern et al. 2004; Kraemer et al. 2005), it was unknown how fine-scale neural tuning to sound frequency was represented. This study provided a unique opportunity to apply receptive field modeling techniques to quantitatively study neural encoding during music imagery. Results showed that music perception and imagery share partial neural encoding mechanisms, a feature common to speech neural activity. Furthermore, these findings demonstrate that receptive field and decoding models (typically applied in neuroprosthetics for motor and visual restoration) are now applicable to auditory imagery. This represents a major advance with direct application to the field of neural interfaces for restoration of communication.
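Receptive field modeling of the kind mentioned in Chapter 5 is commonly set up as an encoding model: the response of one electrode is predicted from a time-lagged spectrogram, and the fitted weights form a spectrotemporal receptive field. The sketch below uses ridge regression on synthetic data as a minimal illustration. The helper names (`build_lagged_stimulus`, `fit_receptive_field`), the penalty, and all sizes are assumptions, and the summary does not state which estimator the study used.

```python
import numpy as np

def build_lagged_stimulus(spec, n_lags):
    """(time, freqs) -> (time, n_lags * freqs); row t holds spec[t], spec[t-1], ..."""
    n_times, n_freqs = spec.shape
    X = np.zeros((n_times, n_lags * n_freqs))
    for k in range(n_lags):
        X[k:, k * n_freqs:(k + 1) * n_freqs] = spec[:n_times - k]
    return X

def fit_receptive_field(spec, response, n_lags, alpha=1.0):
    """Ridge estimate of a (n_lags, freqs) receptive field for one electrode."""
    X = build_lagged_stimulus(spec, n_lags)
    weights = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ response)
    return weights.reshape(n_lags, spec.shape[1])

# Synthetic check: simulate an electrode with a known receptive field, then recover it.
rng = np.random.default_rng(0)
true_field = rng.standard_normal((3, 4))         # 3 lags x 4 frequency bands (made up)
spectrogram = rng.standard_normal((2000, 4))     # stand-in for the music spectrogram
response = build_lagged_stimulus(spectrogram, 3) @ true_field.ravel()
response = response + 0.1 * rng.standard_normal(2000)  # additive measurement noise

estimated_field = fit_receptive_field(spectrogram, response, n_lags=3)
similarity = np.corrcoef(estimated_field.ravel(), true_field.ravel())[0, 1]
print("correlation between estimated and true field:", similarity)
```

One way to compare conditions under this setup would be to fit one field per condition (perception and imagery) for the same electrode and measure how similar the two fields are. That comparison is only suggested here as a hypothetical use of the sketch.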
