BCI technologies serve as communication aids in two ways: controlling spellers and decoding imagined speech. Figure 2.5 lists various BCI activities and their use for communication purposes. In BCI-speller applications, non-invasive EEG is used to elicit ERP activity that controls the speller. Some applications instead use motor-imagery activity to control spellers; for example, hand and foot movements are often used for letter selection and for the undo function. In speech-imagination studies, both invasive and non-invasive technologies have been used to measure speech-related brain activity. On the invasive side, several researchers have used ECoG to gain greater insight into the brain areas related to speech. ECoG offers high temporal and spatial resolution, which is promising for the direct translation of brain activity into text or speech without having to average brain signals. Earlier studies used non-invasive BCI (such as EEG) to recognise a limited number of words, syllables, or letters. Some of these studies included other types of signals, such as the aforementioned EMG and EOG, to help in the recognition process.
This chapter has reviewed various approaches to the development of BCI applications. The basic building units of a BCI are the brain-signal measuring unit, the pre-processing and feature-extraction units, the classification algorithms, and the experimental protocols. Among the non-invasive techniques examined for acquiring brain signals, EEG is the most widely used. The chapter has also presented a number of neuro-physiological signals that are commonly employed to drive EEG-based BCIs, with a focus on the signal patterns and techniques used for communication purposes.
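The building units listed above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a method taken from the reviewed studies: the epochs are synthetic (Gaussian noise with an optional 10 Hz rhythm), pre-processing is reduced to mean removal, the feature is log band power in an assumed 8 to 12 Hz band, and the classifier is a simple nearest-centroid rule. The function and class names are hypothetical.

```python
import numpy as np

def preprocess(epochs):
    """Pre-processing unit: remove each channel's mean (DC offset) per epoch."""
    return epochs - epochs.mean(axis=-1, keepdims=True)

def band_power(epochs, fs, low, high):
    """Feature-extraction unit: log mean spectral power per channel in [low, high] Hz."""
    freqs = np.fft.rfftfreq(epochs.shape[-1], d=1.0 / fs)
    power = np.abs(np.fft.rfft(epochs, axis=-1)) ** 2
    band = (freqs >= low) & (freqs <= high)
    return np.log(power[..., band].mean(axis=-1))

class NearestCentroid:
    """Classification unit: assign each sample to the closest class mean."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        dists = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=-1)
        return self.classes_[dists.argmin(axis=1)]

def make_synthetic_epochs(n_per_class=40, n_channels=4, n_samples=256, fs=128, seed=0):
    """Stand-in for the measuring unit: noise epochs, class 1 adds a 10 Hz rhythm."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    X, y = [], []
    for label, amplitude in enumerate((0.0, 1.0)):
        noise = rng.standard_normal((n_per_class, n_channels, n_samples))
        X.append(noise + amplitude * np.sin(2 * np.pi * 10 * t))
        y.append(np.full(n_per_class, label))
    return np.concatenate(X), np.concatenate(y)

fs = 128  # assumed sampling rate in Hz
X, y = make_synthetic_epochs(fs=fs)
features = band_power(preprocess(X), fs, low=8, high=12)

# Interleaved split: even-indexed epochs for training, odd-indexed for testing.
train = np.arange(len(y)) % 2 == 0
clf = NearestCentroid().fit(features[train], y[train])
accuracy = (clf.predict(features[~train]) == y[~train]).mean()
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The sketch only shows how the units connect; real EEG would require artefact handling, a proper experimental protocol, and cross-validated evaluation.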
The research described in this thesis focuses on using EEG as an input technology driven by unspoken speech. The literature related to this area is summarised in the next chapter.
[Figure 2.5: BCI activities and their use for communication purposes. Labels recoverable from the figure: BCI; Invasive (ECoG); Non-Invasive (EEG, fMRI); ERPs; P300; SSVEP; N200; SCPs; ERDs; Speech Imagination (Word, Vowel, and Syllable Imagination); Motor Imagination; Spellers; Speech.]
Brain Computer Interface for Unspoken Speech Recognition
Chapter 2 provided an overview of brain-computer interface (BCI) technologies and described examples of BCI applications for communication purposes. The first category of BCI studies for communication concerns the control of spellers (i.e. computer spelling devices). The second category is unspoken speech recognition (see Section 2.6). Unspoken, imagined, or covert speech can be defined as what occurs when subjects are asked to imagine the pronunciation of words as if they were pronouncing them aloud but without any articulatory movements.
The research reported in this thesis focuses on unspoken speech recognition from EEG signals. Interest in understanding speech from EEG signals dates back to 1997, when Suppes et al. (1997) published the first study of word recognition using EEG and MEG; it provided promising results about the speech-related information contained in brain signals. Between 2006 and 2015, only a few studies examined the possibility of recognising unspoken speech using invasive and non-invasive approaches. Some of these studies were not conducted in English, while others were restricted to a small number of subjects or a limited number of recognised parts of speech. Since 2017, the field has shown renewed interest in exploring different aspects of unspoken speech recognition in terms of the part of speech (vowel,
syllable, or word) that is examined, feature extraction techniques, and classification algorithms, in addition to various experimental factors that are often used to improve recognition rates.
Unspoken speech is very close to the natural way of communicating. The growing body of literature on the subject reflects the importance of examining this type of brain activity and shows the potential for further improvements in recognition rates. This chapter reviews previous studies that have used BCI technologies (both invasive and non-invasive) for speech recognition. It explains the methodologies the various authors followed to conduct their studies and reports on their results. The studies are categorised by the sensors used to measure brain activity and by the type of imagined speech that was performed. The chapter ends with a summary of the state of the art in this field and a discussion of study limitations. Part of this chapter stems from the paper by AlSaleh et al. (2016), which reviews studies on the subject between 2006 and 2016.
3.1 Invasive Electrocorticography (ECoG)
Section 2.4.1 of this thesis described invasive ECoG-BCI technology. The literature contains several studies on understanding spoken and unspoken speech from ECoG signals. These studies have examined ECoG-derived brain patterns to decode audibly pronounced speech (Blakely et al., 2008; Kellis et al., 2010; Mugler et al., 2014; Zhang et al., 2012), to understand the speech-production process in voiced and unspoken speech (Leuthardt et al., 2012; Lotte et al., 2015), and to study speech perception and feedback processing and understanding (Crone et al., 2001; Pasley et al., 2012).
For unspoken speech, in an early study by Guenther and Brumberg (2011), two ECoG electrodes were implanted in one participant’s motor cortex area, which the authors selected based on a previous study the same researchers had conducted in 2004. This area is assumed to be connected to articulatory movement. Following the presentation of a stimulus, the participant attempted to speak, and a synthesiser was used to generate formants, vowels, and transitions. In this way, over the course of 25 sessions, the subject produced patterns and achieved a 70.00% success rate after 15 to 20 trials per session. Guenther and Brumberg’s (2011) results thus indicated that a direct BCI could be used for the direct synthesis of formants.
Pei et al. (2012) examined whether it was possible to determine the vowels and consonants of spoken and imagined words following visual and auditory stimuli using ECoG signals. To answer this question, the authors examined four experimental conditions (visual stimuli/actual speech, auditory/actual, visual/imagined, and auditory/imagined), with four possible vowel sounds (/ɛ/, /æ/, /i:/, or /u:/) and nine consonant pairs (/b_t/, /c_n/, /h_d/, /l_d/, /m_n/, /p_p/, /r_d/, /s_t/, or /t_n/) among 36 words. The findings showed that the brain areas activated during actual speech include the motor cortex, Broca’s area, and the posterior superior temporal gyrus. In imagined speech, in contrast, two small foci in the temporal and frontal regions were found to have been activated. The results were promising, with classification accuracy rates of 55.00% in some cases among the four above-mentioned vowels.
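The size of this stimulus design, and how the reported vowel accuracy compares with chance, can be checked with a short calculation. This is a sketch under stated assumptions, not a reconstruction of the authors' analysis: it assumes that the 36 words are exactly the combinations of the four vowels with the nine consonant pairs, and that the classes are balanced, so that chance level is one over the number of classes. All variable names are illustrative.

```python
from itertools import product

# Counts and labels taken from the description of Pei et al. (2012) above.
N_VOWELS = 4
consonant_pairs = ["b_t", "c_n", "h_d", "l_d", "m_n", "p_p", "r_d", "s_t", "t_n"]
stimulus_modes = ["visual", "auditory"]
speech_modes = ["actual", "imagined"]

# Four experimental conditions: every stimulus mode with every speech mode.
conditions = list(product(stimulus_modes, speech_modes))

# Assumption: each word is one (consonant pair, vowel) combination.
n_words = len(consonant_pairs) * N_VOWELS

# Assumption: balanced classes, so chance level is 1 / number of classes.
vowel_chance = 1 / N_VOWELS
consonant_chance = 1 / len(consonant_pairs)

reported_vowel_accuracy = 0.55  # best-case figure quoted in the text
ratio_to_chance = reported_vowel_accuracy / vowel_chance

print(f"conditions: {len(conditions)}, words: {n_words}")
print(f"vowel chance: {vowel_chance:.2%}, consonant-pair chance: {consonant_chance:.2%}")
print(f"reported vowel accuracy is {ratio_to_chance:.1f}x chance")
```

Under these assumptions, four-way vowel classification has a 25% chance level, so the 55.00% figure quoted above corresponds to a little more than twice chance.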
Martin et al. (2016) used ECoG for the binary classification of words recorded in three different modes: the participants listened to a word, pronounced the word aloud, or imagined the word. The words from each mode were classified independently. Six words were used: “spoon”, “cowboy”, “battlefield”, “swimming”, “python”, and “telephone”. These words were selected to have high variability in terms of semantic and acoustic features and numbers of syllables while still varying word length. The classification algorithm used in the study was improved with a non-linear alignment algorithm to overcome the temporal variations the authors found between two trials of the same word, which may have been caused by delays in starting a task or by differences in pronouncing or imagining the words. The authors’ proposed solution was to classify the high gamma features using an SVM with a dynamic time warping (DTW) kernel, which aligns the features in a non-linear manner. The results were as expected. The classification rates among the listening and voiced speech tasks were high (listening: mean = 89.40%,