
6.7 Discussion and conclusions

This chapter presents the first use of DTW for improving the classification of imagined speech via EEG signals. The classified patterns consisted of five imagined words: “left”, “right”, “up”, “down” and “select”. Data were acquired from 10 subjects using a wireless EEG device with only 14 channels. Each word was recorded using two different trial separation methods: mouse-click separation and fixed-time separation. The proposed DTW-based framework was evaluated against TD features, time-frequency features and CSP, as well as against several modifications to the framework itself. These comparisons were performed using four classification algorithms: support vector machine (SVM), naive Bayes (NB), random forests (RF) and linear discriminant analysis (LDA). The experimental assessments involved discriminating between speech and non-speech and discriminating between the five imagined words.

In summary, the proposed DTW feature extraction framework outperformed the TD features in experiments 1 to 4. In experiment 5, the proposed DTW framework outperformed DDTW in classifying the imagined words with LDA. In experiment 6, the proposed DTW feature extraction framework outperformed the classical DTW used in audio-speech recognition. In experiment 8, the proposed DTW framework outperformed time-frequency features in classifying imagined speech versus silence. In contrast, the average accuracy for classifying the five imagined words was higher with statistics-DWT than with the proposed DTW.

In comparison to the other TD features, DTW matches EEG signals without having to average or remove any parts of the signal. Moreover, the mapping is not one-to-one, so variations in where the pattern starts and ends are taken into account. This makes the resulting distance reflect the level of similarity between the compared EEG patterns. In comparison to RWE-DWT, calculating the relative wavelet energy from the DWT coefficients, as in González-Castañeda et al. (2017), reduced the amount of temporal information. It also reduced the effectiveness of the DWT coefficients, as discussed in Yohanes et al. (2012). By comparison, statistics-DWT (RMS and SD of the DWT coefficients) reflects the important role of frequency information in the classification of imagined words.
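The non-one-to-one alignment attributed to DTW above can be illustrated with a minimal sketch. It is a textbook DTW on two short one-dimensional sequences, not the thesis implementation: the function name `dtw`, the absolute-difference local cost and the toy sequences are all assumptions made for illustration.

```python
def dtw(a, b):
    """Return (distance, warping path) for two 1-D sequences (textbook DTW)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] matched again
                                     cost[i][j - 1],      # b[j-1] matched again
                                     cost[i - 1][j - 1])  # one-to-one step
    # Backtrack from the end to recover which samples were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Hypothetical toy data: the same shape, but delayed and stretched.
template = [0, 1, 2, 1, 0]
delayed = [0, 0, 1, 2, 2, 1, 0]
distance, path = dtw(template, delayed)
print(distance)  # zero: the two shapes match once timing differences are warped away
print(path)      # some template samples are paired with several delayed samples
```

Because the path may pair one sample with several samples of the other sequence, no part of either signal has to be averaged or discarded to compare sequences of different length or onset.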

Three modifications were applied to the proposed DTW framework. The first was using derivative DTW (DDTW). The second was removing outlier training trials. The third was using EEG data from all channels as one input in the computation of the DTW distance. In DDTW (experiment 5), computing the derivative of the signal amplified the noise. Since EEG signals are known to be very noisy, this decreased the classification accuracy for the LDA and SVM classifiers. The removal of outlier trials was suggested as a way to enhance the performance of the proposed framework (experiment 7). The enhancement was clear in the SVM classifier results. However, it was not significant for the other classifiers. The last modification was to use EEG from all the channels as a single input to DTW. The results showed that the proposed framework performed significantly better than this variant, which suggests that the complexity of the generated brain signals cannot be compared with that of speech signals.
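The noise amplification described for DDTW can be seen in a small numerical sketch. It uses the three-point derivative estimate commonly paired with DDTW; whether the thesis used exactly this estimate is an assumption, and the synthetic "slow" and "noise" components are hypothetical stand-ins for an EEG pattern and its noise.

```python
import math


def ddtw_derivative(q):
    """Three-point derivative estimate often used with DDTW (assumed here).

    D[i] = ((q[i] - q[i-1]) + (q[i+1] - q[i-1]) / 2) / 2, with the first and
    last values copied from their neighbours.
    """
    inner = [((q[i] - q[i - 1]) + (q[i + 1] - q[i - 1]) / 2.0) / 2.0
             for i in range(1, len(q) - 1)]
    return [inner[0]] + inner + [inner[-1]]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


n = 200
slow = [math.sin(2 * math.pi * k / n) for k in range(n)]  # smooth component
noise = [0.05 * (-1) ** k for k in range(n)]              # small, fast noise

# The estimate is linear, so signal and noise can be differentiated separately.
snr_raw = rms(slow) / rms(noise)
snr_derivative = rms(ddtw_derivative(slow)) / rms(ddtw_derivative(noise))
print(snr_raw, snr_derivative)
```

In this toy case the smooth component dominates the raw signal, but after differentiation the fast noise dominates, which is consistent with the drop in accuracy reported for DDTW.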

In terms of the differences in the results based on the trial separation method, the mouse-click separated data had better average classification accuracy for speech and non-speech than the fixed-time separated data. However, the fixed-time separated data provided better classification of the five imagined words. This could be due to two factors. First, in the classification of speech and non-speech, the difference between the two classes also includes the differences between the imagined words. The variation in imagination length between speech and non-speech would therefore help improve the classification accuracy. Second, for word classification, increases in the length of imagination time provide extra information (padding) to the signal (see Chapter 5, Section 5.3.2). This helps in distinguishing among the five words. Moreover, the mouse-click separated data included extra patterns beyond the word imagination patterns. These patterns were related to the intention to perform the mouse click at the beginning and end of the task. In addition, muscle movements occurred during the clicks. These movements were very brief; the usual time needed for adults to perform a mouse click is 100 ms (Komandur et al., 2008).

6.8 Summary

The recognition of unspoken speech could be the most intuitive type of brain-computer interface for people with severe speech disabilities. Consequently, researchers are increasingly interested in classifying different types of unspoken speech from EEG signals. However, the time variations of imagination reflected in EEG signals have not been considered in previous studies. These variations, caused by differences in the starting time and duration of the imagined words, could have a detrimental effect on classification accuracy. In this chapter, for the first time, these temporal variations were investigated and minimised using a DTW-based framework. In this technique, the distances between the imagined words after warping by DTW are used as classification features. The proposed DTW framework was evaluated using EEG data collected from 10 subjects who imagined five different words. The evaluation involved discriminating between imagined speech and silence. It also involved discriminating between five imagined words from two data sets based on different trial separation methods (mouse-click separation and fixed-time separation).
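The idea of using DTW distances as classification features can be sketched as follows. This is only an illustration under stated assumptions: the helper names `dtw_distance` and `dtw_features`, the use of the mean distance to each class's reference trials, and the single-channel toy data are hypothetical, and the nearest-class rule at the end stands in for the SVM, NB, RF and LDA classifiers used in the chapter.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (row-by-row update)."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            curr.append(abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


def dtw_features(trial, references):
    """One feature per class: mean DTW distance to that class's reference trials."""
    return [sum(dtw_distance(trial, ref) for ref in refs) / len(refs)
            for _, refs in sorted(references.items())]


# Hypothetical single-channel reference trials for two imagined words.
references = {
    "left": [[0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0]],
    "right": [[0, -1, -2, -1, 0], [0, -1, -2, -2, -1, 0]],
}
trial = [0, 0, 0, 1, 2, 2, 1, 0]  # "left"-like shape with a later onset

features = dtw_features(trial, references)
labels = sorted(references)
predicted = labels[features.index(min(features))]
print(features, predicted)
```

The feature vector stays small (one value per class in this sketch) regardless of trial length, and differences in onset and duration are absorbed by the warping rather than by the classifier.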

Ten experiments were conducted. These compared the classification accuracy of the proposed DTW features with that of state-of-the-art features and of three modifications to the proposed framework (Table 6.1). The results show that the DTW-based framework outperformed all the discussed state-of-the-art feature extraction algorithms in classifying imagined speech versus non-speech. The proposed framework also outperformed the TD features in classifying the five imagined words. The justifications for these results were described in Section 6.7.

Conclusions

This chapter summarises the findings from the research reported in this thesis with respect to the objectives listed in Chapter 1. It then outlines the contributions of the thesis to the imagined speech research domain. Finally, it offers directions for future work inspired by the results of the experiments performed so far.

7.1 Reviewing thesis scope and main findings

The research described in this thesis was motivated by the need to understand and alleviate several limitations in the recognition of imagined speech using EEG signals; it had three primary objectives:

1. Improving the discrimination between speech and non-speech.

2. Optimising a computational model to improve the classification of the imagined words by examining several temporal variations in the recognition model. This involved using EEG pattern separation methods, establishing different time intervals and examining the effect of word length on recognition.

3. Improving imagined speech recognition by reducing the variations between EEG trials using the dynamic time warping (DTW) algorithm.

Several steps (as presented in the related chapters) were taken to achieve these objectives. The first stage in this research was to identify the major challenges and limitations in imagined speech recognition research. Chapter 3 presented a literature survey of studies on imagined speech recognition using brain–computer interface technologies. It concentrated on studies of EEG signals (the technology of interest in this thesis). The main conclusions from this chapter can be summarised as follows:

• Imagined speech recognition using EEG signals is a relatively new research domain. The first study to recognise imagined words was conducted by Wester (2006). The studies conducted from 2006 to 2016 (before the beginning of this research) had limited results due to a lack of available datasets. After this, interest in the research domain increased, and several methods and results emerged.

• Compared to other applications of EEG (motor imagination), prior research had inconsistencies in its experimental design and data collection methods.

• Most of the studies focused on recognising speech stimuli based on phonological differences.

• The studies had a limited understanding of the recognition of imagined speech compared to non-speech tasks.

• Similar to other applications of BCI, there was a limited understanding of the contributions of temporal information to improving BCI recognition.

In Chapter 4, the first objective was achieved in terms of classifying imagined speech versus non-speech tasks. EEG data were collected from nine subjects during the imagination of semantically varying words, as the literature presented evidence of the impact of word semantics on brain signals. The non-speech tasks asked participants to concentrate on visual stimuli on a screen (the presentation of ‘+’ and the presentation of the word) and included silence time. The data analysis involved examining time domain features (statistics of EEG) and spatio-spectral features (filter bank common spatial patterns) at different time intervals using different classifiers. The classification accuracies were examined for each word and for groups of words compared to non-speech tasks. The results showed differences in classification accuracy for different subjects and different features.

To achieve the second objective, Chapter 5 described the examination of important temporal parameters in designing imagined speech recognition experiments. EEG data related to the imagination of five words were collected from 10 subjects. For each subject, each word was recorded with two different trial separation methods: mouse-click separation and specified time frame separation. The experimental aspects examined in the study showed that the specification of long time frames provided distinguishable EEG patterns. The increase in training size also improved classification accuracy, although these improvements lessened after a certain training size. In addition, if the recording was performed at different times in the session, then the training size increased. Finally, the examination of imagination time length showed that this length could be used as a classification feature. The resulting classification accuracy was significantly higher than the chance level.

Chapter 6 presented the development of a novel feature extraction framework based on examining and reducing the temporal variations between EEG trials using DTW. The features extracted by the framework were used to classify the five imagined words and to classify speech versus non-speech tasks. Their classification accuracy was compared against that of time domain features (maximum cross-correlation and EEG statistics) and time-frequency features (relative wavelet energy calculated from discrete wavelet transform coefficients, wavelet transform coefficients and common spatial patterns). The proposed framework outperformed the compared features in the classification of speech versus silence. Further, it outperformed time domain features in classifying the five imagined words. Several modifications to the framework were proposed and compared to the main developed framework. These modifications included examining DTW in the same approach used for audio speech