5.3.2 Study A Preprocessing Reassessment

The Study A feature set used to achieve the current mean phi correlation benchmark of 0.39 (Peiris et al., 2011) was derived from bipolar EEG, which had been converted from the referential recordings on the assumption that it would contain less noise, and was then processed with ICA. However, a similar mean phi correlation of 0.38 was achieved on the same features without ICA (Davidson et al., 2007). Both ICA-processed bipolar EEG and unprocessed referential EEG were therefore examined to determine whether the bipolar features did in fact contain less noise. The "lapse" event was used as the gold standard for both the referential and bipolar EEG, to approximate previous work and to maximize the total number of events. The ICA-processed "clean" bipolar feature set of 544 spectral features, termed Study A Bipolar ICA Spectral Power (SABIS), provided a valuable baseline against which the performance of the other feature sets could be compared.
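The phi correlation used as the benchmark compares a binary per-epoch prediction with the binary gold standard; for two 0/1 sequences it equals the Pearson correlation (and the Matthews correlation coefficient). A minimal sketch of a per-subject phi and its mean across subjects is shown below, assuming phi is computed from the 2×2 contingency table of per-epoch labels and that a degenerate table (for example, no predicted events) is scored as 0; the function names are hypothetical.

```python
import math
from statistics import mean


def phi_correlation(gold, pred):
    """Phi coefficient between two equal-length binary (0/1) sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must be the same length")
    # Cells of the 2x2 contingency table.
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: a degenerate table (a zero row or column total) scores 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_phi(per_subject):
    """Average phi over an iterable of (gold, pred) pairs, one per subject."""
    return mean(phi_correlation(g, p) for g, p in per_subject)
```

For example, `phi_correlation([0, 0, 1, 1], [0, 1, 1, 1])` evaluates to 2/√12 ≈ 0.577.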

5.3.2.1 Alternative Feature Extraction

The other feature set derived from the ICA-preprocessed, artefact-pruned bipolar Study A EEG was the Study A Bipolar ICA Log Power (SABIL) feature set. Artefact pruning, as used in the SABIL and SABIS feature sets, was performed by applying a z-score transform and rejecting any epoch with a z-score greater than 30.0. Although this was not elaborated by Peiris et al. (2011), the spectral power of each band was calculated from the natural logarithm of the entire power spectrum, rather than directly from the power spectrum estimate obtained with the Burg algorithm (Peiris et al., 2011). Because differences in performance between spectral and log-spectral features had not been documented, the initial tests were performed with both the SABIL and SABIS feature sets. Any divergence in performance attributable to the differing feature extraction methods was a gap requiring further examination.
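The z-score artefact pruning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original pipeline: it assumes each feature column is z-scored across all epochs of a subject and that an epoch is rejected when the absolute z-score of any of its values exceeds the threshold of 30.0. Whether the original transform was applied to raw EEG samples or to features is not specified in this section, and `prune_artefacts` is a hypothetical name.

```python
from statistics import mean, pstdev


def prune_artefacts(epochs, threshold=30.0):
    """Drop epochs whose z-scored value exceeds threshold in any column.

    epochs: list of equal-length lists of numbers (one list per epoch).
    Returns (kept_epochs, rejected_indices).
    """
    n_columns = len(epochs[0])
    columns = [[e[j] for e in epochs] for j in range(n_columns)]
    mus = [mean(c) for c in columns]
    # Population SD per column; a constant column (SD 0) uses 1.0 to avoid /0.
    sds = [pstdev(c) or 1.0 for c in columns]
    kept, rejected = [], []
    for i, e in enumerate(epochs):
        z = [(x - m) / s for x, m, s in zip(e, mus, sds)]
        if max(abs(v) for v in z) > threshold:
            rejected.append(i)
        else:
            kept.append(e)
    return kept, rejected
```

A threshold as high as 30.0 only removes extreme outliers: a single spike among otherwise constant values reaches |z| = √(n − 1), so it needs more than 900 epochs before it is rejected.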

5.3.2.2 Raw Referential and Bipolar Feature Sets

The use of ICA eye-blink removal and artefact pruning in the SABIL and SABIS feature sets removed an average of 578 epochs (range 208-1334) from each subject's total of 7200, resulting in a potential ~16% loss of information. Two further feature sets were therefore reconstructed from the raw Study A EEG, with a completely new set of features generated in each case. The first, the "raw referential" or Study A Referential Unprocessed Spectral Power (SARUS) feature set, comprised 34 spectral features from each of 16 referential channels, giving a matrix of 544 features by 3600 observations for each of the two 1-hour sessions per subject. In parallel, the 16 bipolar channels used previously (Peiris et al., 2011) were calculated, and from this "raw bipolar" EEG the Study A Bipolar Unprocessed Spectral Power (SABUS) feature set was computed, with feature matrices of identical dimensions to those of SARUS.
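The two steps in this paragraph, deriving bipolar channels from referential ones and assembling one 544-feature row per observation (16 channels × 34 features), can be sketched as below. This is a hypothetical illustration: the 16 bipolar derivations actually used by Peiris et al. (2011) are not listed in this section, so the channel names and pairs passed to `to_bipolar` are placeholders, and both function names are assumptions.

```python
N_FEATURES_PER_CHANNEL = 34


def to_bipolar(referential, pairs):
    """Derive bipolar signals as sample-wise differences of referential channels.

    referential: dict mapping channel name -> list of samples.
    pairs: list of (first, second) channel names (hypothetical montage).
    """
    return {
        f"{a}-{b}": [x - y for x, y in zip(referential[a], referential[b])]
        for a, b in pairs
    }


def feature_row(per_channel_features, channel_order):
    """Concatenate each channel's 34 spectral features into one observation row."""
    row = []
    for ch in channel_order:
        feats = per_channel_features[ch]
        if len(feats) != N_FEATURES_PER_CHANNEL:
            raise ValueError(f"{ch}: expected {N_FEATURES_PER_CHANNEL} features")
        row.extend(feats)
    return row
```

With 16 channels in `channel_order`, each row has 16 × 34 = 544 entries, and stacking one row per observation gives the 544-feature matrices described above.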

Figure 5.7: Study A feature set variations generation schematic

The SARUS and SABUS feature sets had both 1-hour sessions concatenated, resulting in a matrix of 544 features by 7200 observations per subject. Unlike in prior work (Peiris et al., 2011), no observations were deleted, so as to better approximate a realistic scenario. The initial SARUS and SABUS feature sets were generated using the same feature extraction method as the SABIS features, namely linear spectral power. Variants of the SARUS and SABUS feature sets using the same log power feature extraction method as the SABIL features were additionally generated and compared. These variants were called the Study A Referential Unprocessed Log Power (SARUL) and Study A Bipolar Unprocessed Log Power (SABUL) feature sets.
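The difference between the linear (SARUS/SABUS) and log (SARUL/SABUL) variants, together with the session concatenation, can be sketched as follows. This assumes, hypothetically, that each band value is the mean over the power-spectrum bins falling in the band; the exact aggregation and band edges belong to Section 4.1 and are not given here. Note that the mean of ln(PSD) is not the same as ln of the mean PSD, which is one reason the two variants can behave differently.

```python
import math


def band_power(psd, freqs, band, log=False):
    """Mean PSD over bins in [lo, hi); with log=True, mean of ln(PSD) instead.

    Assumption: a band value is the mean over its bins (not the thesis's
    confirmed definition). PSD values must be > 0 when log=True.
    """
    lo, hi = band
    vals = [p for p, f in zip(psd, freqs) if lo <= f < hi]
    if not vals:
        raise ValueError("no frequency bins fall inside the band")
    if log:
        vals = [math.log(p) for p in vals]  # natural logarithm
    return sum(vals) / len(vals)


def concatenate_sessions(session1, session2):
    """Stack two sessions' observation rows (e.g. 2 x 3600 rows -> 7200)."""
    return list(session1) + list(session2)
```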

5.4 Study C

Study C was examined in tandem with Study A. The Study C dataset originally combined EEG and fMRI data, but only the EEG data were used to test automated microsleep detection. The study originally consisted of 20 subjects, but only the 10 individuals with the largest number of microsleeps were analyzed further. The remaining Study C dataset (N = 10) used a 2D CTT in conjunction with video recording at 25 fps.

Figure 5.8: Study C feature generation schematic

Referential EEG was recorded from 64 channels, although several channels were discarded for each subject during preprocessing. The number of retained channels ranged from 30 to 60 per subject, with 17 channels common to all subjects. Additionally, ICA was performed to remove eye-blink artefacts, and noise from overhead power was filtered out. The primary feature set for Study C was derived from referential EEG: the Study C Referential ICA-Processed Spectral Features (SCRIS). Documentation of events, such as microsleeps, was also more meticulous than in Study A (Poudel et al., 2010).

5.4.1 Study C Gold Standard

Data from Study C were examined using formatting, feature extraction, feature selection/reduction, and classification similar to those used for Study A. Each observation was paired with a binary index indicating the presence or absence of an event. In Study A, flat spots, video microsleeps, and definite microsleeps were all denoted as gold-standard events. In Study C, alert periods were treated as non-events, while events were defined as definite microsleeps and rest periods (sleep > 15 s).
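Pairing each observation with a binary event index can be sketched as below. The rule for deciding when a 2-s window counts as an event is not stated in this section, so this sketch assumes, hypothetically, that an observation is labelled 1 when the midpoint of its window falls inside any gold-standard event interval (e.g. a definite microsleep or rest period in Study C) and 0 otherwise; `label_observations` is a hypothetical name.

```python
def label_observations(n_obs, events, window_s=2.0, step_s=1.0):
    """Binary gold-standard index for each observation window.

    events: list of (start_s, end_s) gold-standard event intervals in seconds.
    Assumption: observation i is an event (1) if its window midpoint lies in
    [start_s, end_s) of any event; otherwise it is a non-event (0).
    """
    labels = []
    for i in range(n_obs):
        midpoint = i * step_s + window_s / 2
        labels.append(int(any(start <= midpoint < end for start, end in events)))
    return labels
```

For an event spanning 2-4 s, the first six observations are labelled `[0, 1, 1, 0, 0, 0]`. A stricter or looser overlap rule would change how many observations around each event are counted as events.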

Behavioural events in Study C were categorized as microsleeps, attention lapses, impaired-responsiveness events, rest periods, and definite microsleeps, as opposed to Study A's less well-defined "lapses of responsiveness." In prior work (Peiris et al., 2011), the performance benchmark was established on behavioural data in which any video microsleep or flat spot was considered an event. The number of events in Study A was thus increased by using the "lapse" criterion, which encompassed both video microsleeps and flat spots, rather than the "definite microsleep" criterion. The distinction between lapses of responsiveness and microsleeps was often less clear in Study A than in later work (Poudel et al., 2010). As such, Study C's gold standard was considered more reliable than Study A's.

5.5 Feature Extraction

5.5.1 Feature Details

In both Study A and Study C, 34 spectral features (described in Section 4.1) were calculated from a 2-s sliding window of EEG with 50% overlap, so that each window shared one second with the previous window. Three feature sets derived from Study A each resulted in a feature matrix of 544 features for two hours of data per subject.
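The 2-s window with 50% overlap (a 1-s step) can be sketched as a list of sample-index ranges. The sampling rate is a parameter here because it is not stated in this section, and `sliding_windows` is a hypothetical helper. This sketch keeps only windows that fit entirely inside the recording, which yields one window fewer than one per second; the 3600 observations per hour reported for Study A presumably reflect an edge convention (such as padding) that is not described here.

```python
def sliding_windows(n_samples, fs, window_s=2.0, overlap=0.5):
    """Return (start, stop) sample indices of overlapping analysis windows.

    fs: sampling rate in Hz. With window_s=2.0 and overlap=0.5 the step is
    1 s, so consecutive windows share one second of EEG. Only windows lying
    fully inside the recording are returned.
    """
    win = int(round(window_s * fs))
    step = int(round(win * (1 - overlap)))
    if win <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + win) for s in range(0, n_samples - win + 1, step)]
```

For 10 s of data at a hypothetical 100 Hz, this gives nine windows: (0, 200), (100, 300), ..., (800, 1000).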

5.5.2 Study C Complications

The Study C data were the same as those used by Poudel et al. (2010) and had undergone substantial preprocessing because they were recorded in an MRI scanner. Channels were rejected on the basis of their electrical impedances, ICA was performed to remove eye blinks, and filtering was applied to remove noise from overhead power. Additional preprocessing was required to remove MRI gradient artefacts. Because the Study C data were far more heavily preprocessed than the Study A data, a conscious decision was made to minimize changes to them and simply to test whether the approaches that were successful on Study A could be applied directly to Study C.

Because the number of channels differed between Study C subjects, null vectors were inserted in place of missing channels, resulting in 2040 features (60 channel slots × 34 features) for 50 min of data per subject.

Substituting zero vectors, constant DC offsets, null events, or "Not a Number" (NaN) values for missing features did not affect intra-subject classification results, so zero vectors were used for simplicity. Interpolation using inverse distance and spherical modelling also failed to improve results. Padding missing channels in this way gave every Study C subject a consistent number of features, enabling inter-subject classification.
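Padding each subject's retained channels into a common channel layout can be sketched as below. This is a hypothetical illustration: the layout is assumed to be a fixed, ordered list of 60 channel slots (consistent with 2040 / 34 = 60), the channel names are placeholders, and `pad_to_layout` is not from the original work. Passing `fill=float("nan")` would give the NaN alternative mentioned above.

```python
N_FEATURES_PER_CHANNEL = 34


def pad_to_layout(subject_features, layout, fill=0.0):
    """Build a fixed-length feature row over a common channel layout.

    subject_features: dict channel -> list of 34 features (retained channels).
    layout: ordered list of every channel slot (e.g. 60 slots -> 2040 values).
    Channels missing for this subject are filled with a constant vector.
    """
    row = []
    for ch in layout:
        feats = subject_features.get(ch, [fill] * N_FEATURES_PER_CHANNEL)
        row.extend(feats)
    return row
```

Because every subject's row follows the same slot order, feature k always refers to the same channel and spectral feature, which is what inter-subject classification requires.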

Potential improvements from taking the logarithm of the power spectrum were also investigated with Study C. The data were stored in referential format, and it was decided not to convert them to bipolar. Given the irregular numbers of channels between subjects and the potential limitations of interpolation, a bipolar conversion of the Study C EEG would have added further layers of complexity, contrary to the decision to limit additional preprocessing of Study C.

Planned Evaluations

In document Detection of microsleeps from the eeg via optimized classification techniques. (Page 67-71)