Experiments with six age groups - Bayesian assessment of newborn brain maturity from sleep electroencephalograms

The first experiments were run on the classification of newborn EEG into six age groups (classes), 40 to 45 weeks PCA. This multiclass problem is expected to be difficult because the EEG of neighbouring age groups are hard to differentiate, so the classes can overlap. For this problem, we first run the Bayesian DT technique described in Chapter 3 and then obtain the posterior probabilities of the EEG features being used in the DT ensemble. Having found the range of the posterior probabilities, we assign threshold probabilities to define features as weak, and test our hypothesis that an ensemble can be refined by discarding the DTs that use weak features without a decrease in performance. We then test the proposed technique of searching for the minimal subset of important features. For comparison, we rerun the Bayesian averaging on the data of reduced dimensionality, having eliminated the weak attributes.

5.3.1 Bayesian classification

The experiments were run with a set of EEG recordings from 686 newborns aged between 40 and 45 weeks, so that the number of age groups was six. Each of these groups (classes) included around 100 recordings. The EEGs were segmented into 10-s intervals, and 72 spectral features, namely the spectral powers and their variances within the standard frequency bands, were computed within these segments. We averaged the segments of each patient to suppress artefacts and transient variations in the EEG, as described in Chapter 4.
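The per-patient averaging step can be sketched as follows. This is a minimal illustration only: the function name `average_segments` and the list-of-lists data layout are assumptions, and the actual preprocessing is the one described in Chapter 4.

```python
from statistics import fmean


def average_segments(segment_features):
    """Average per-segment feature vectors into one vector per recording.

    segment_features: list of equal-length lists, one list of spectral
    features per 10-s segment (hypothetical layout).
    """
    if not segment_features:
        raise ValueError("at least one segment is required")
    n_features = len(segment_features[0])
    if any(len(seg) != n_features for seg in segment_features):
        raise ValueError("all segments must have the same number of features")
    # Column-wise mean: one averaged value per feature.
    return [fmean(seg[j] for seg in segment_features) for j in range(n_features)]
```

With 72 features per segment, each recording would then be represented by a single 72-element vector.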

The Bayesian technique was run with the following settings. In the burn-in phase we collected 200,000 DTs, and in the post burn-in phase 10,000 DTs. During the post burn-in phase every 7th model was collected to reduce the correlation between DT models. The minimal number of data samples allowed in a DT node, pmin, was set to six. The proposal variance was 1.0, and the probabilities of making the birth, death, change-variable, and change-threshold moves were set to 0.15, 0.15, 0.1, and 0.6, respectively. The performance and entropy of the DT ensemble collected in the post burn-in phase were evaluated using five-fold cross-validation.
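The settings above can be written down as a small configuration sketch. The numeric values are the ones quoted in the text; the dictionary layout and the helper names `draw_move` and `kept_indices` are assumptions made for illustration and are not the sampler's actual interface.

```python
import random

# Values quoted in the text; the structure itself is hypothetical.
SETTINGS = {
    "burn_in": 200_000,        # DTs collected in the burn-in phase
    "post_burn_in": 10_000,    # DTs collected in the post burn-in phase
    "thin": 7,                 # every 7th model is collected
    "p_min": 6,                # minimal number of samples in a DT node
    "proposal_variance": 1.0,
}

MOVE_PROBS = {
    "birth": 0.15,
    "death": 0.15,
    "change_variable": 0.1,
    "change_threshold": 0.6,
}


def draw_move(rng):
    """Pick one move type according to the stated move probabilities."""
    moves = list(MOVE_PROBS)
    weights = [MOVE_PROBS[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]


def kept_indices(n_iterations, thin):
    """Indices of the iterations kept when every `thin`-th model is collected."""
    return list(range(thin - 1, n_iterations, thin))
```

The four move probabilities sum to one, so exactly one move is proposed per iteration. If the 10,000 post burn-in DTs are counted after thinning (an assumption), the sampler would run 70,000 post burn-in iterations.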

The rate of acceptance of DT models was around 0.13 in both phases. In the burn-in phase, the log-likelihood as well as the DT size stabilized after 10,000 samples, as seen from Fig. 5.1, so that the remaining samples were drawn from an approximately stationary Markov chain. The average performance of the Bayesian technique (exact match of weeks) was 27.4±3.9% and the entropy was 0.414. This is an almost 3% better performance than that of the single DT, and this result suggests that the sampler has acceptably explored the parameter space.
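As a rough illustration of the two reported quantities, the sketch below averages the class votes of an ensemble, takes the most probable class as the decision, and measures the uncertainty of that decision as Shannon entropy in bits. The exact definitions used in the thesis are not restated in this section, so the voting scheme and all function names here are assumptions.

```python
import math
from collections import Counter


def ensemble_posterior(votes, n_classes):
    """Class probabilities as the fraction of DTs voting for each class."""
    counts = Counter(votes)
    return [counts.get(c, 0) / len(votes) for c in range(n_classes)]


def entropy_bits(probs):
    """Shannon entropy of a class distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def predict(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(true_labels, predicted_labels):
    """Fraction of exact matches between true and predicted classes."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)
```

Under this reading, an ensemble whose DTs all agree has zero entropy, while an even split between two classes gives one bit.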

Figure 5.1: Log-likelihood, number of DT nodes and distribution of DT sizes during the burn-in and post burn-in phases. (Axes: log likelihood and DT nodes against samples; probability against DT nodes.)

5.3.2 Feature importance

According to the proposed technique, we estimated the importance of all 72 attributes in terms of the posterior probabilities of these attributes being used by the DT models collected in the post burn-in phase. The posterior probabilities (frequencies) of using the attributes ranged between 0.0 and 0.048, as shown in Fig. 5.2. Here, the probabilities were averaged over the five folds. We can observe that the three most important features, with probabilities near 0.048, are the mean relative and absolute powers in the Delta range. The probabilities of the mean spectral powers are generally higher than those of their variances: 12 mean powers have probabilities above 0.02, but only seven variance features have probabilities above this threshold. The probabilities of the absolute power variances are generally the lowest, all below 0.02.
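The frequency-based estimate can be sketched as follows, assuming that each DT is summarised by the list of feature indices used in its split nodes and that the probability of a feature is its share of all splits in the ensemble. Both the representation and the normalisation are assumptions; the function name is hypothetical.

```python
from collections import Counter


def feature_usage_probabilities(ensemble, n_features):
    """Estimate how often each feature is used across an ensemble of DTs.

    ensemble: list of trees, each given as a list of the feature indices
    used in its split nodes (hypothetical representation).
    Returns one frequency per feature; the frequencies sum to one.
    """
    counts = Counter()
    for tree in ensemble:
        counts.update(tree)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_features
    return [counts.get(j, 0) / total for j in range(n_features)]
```

With 72 features, a uniform use of all features would give each a frequency of about 0.014, which puts the reported maximum of 0.048 into perspective.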

Figure 5.2: Posterior probabilities of 72 EEG attributes characterising the relative and absolute spectral powers (the upper plot) and their variances (the lower plot).

5.3.3 Refining the ensemble

Having found the range of feature importances, we applied the proposed technique to refine the DT ensemble. Table 5.1 shows the number of weak features, k, versus the threshold values within five-fold cross-validation. At a threshold of 0.001 the average number of weak features, k, was 15, whilst at 0.005 it increased to 30. We found that around 30 weak attributes could be discarded without a significant decrease in performance, P. At the same time, when the threshold was gradually increased from 0.0 to 0.005, the uncertainty in decisions decreased only slightly, from 0.414 to 0.403 in terms of the entropy E of the ensemble.
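The refinement step can be illustrated with a short sketch. It assumes that a feature is weak when its posterior probability is strictly below the threshold and that a DT is discarded if it uses at least one weak feature; the function names and the tree representation (a list of feature indices per DT) are hypothetical.

```python
def weak_features(probs, threshold):
    """Indices of features whose usage probability is below the threshold."""
    return {j for j, p in enumerate(probs) if p < threshold}


def refine_ensemble(ensemble, weak):
    """Keep only the DTs that use none of the weak features."""
    return [tree for tree in ensemble if not weak.intersection(tree)]
```

Raising the threshold enlarges the weak set and therefore shrinks the refined ensemble, which matches the trend of k in Table 5.1.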

Having confirmed that the DT models using weak EEG features can be discarded from the ensemble without a decrease in performance, we test the proposed sequential-forward strategy of finding the minimal subset of important EEG features. Fig. 5.3 shows the training accuracy, performance, ensemble size and p-value of the KS-test calculated within the proposed technique for one of the five folds. We can observe that for k = 29 weak features the p-value becomes lower than 0.5, the given significance level. Further discarding of weak features did not increase the accuracy. Thus we define 28 weak features and select the remaining 44 as the most informative ones.
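One way to read the sequential search is sketched below. This section does not state which quantity the KS-test compares, so the sketch assumes a hypothetical per-DT score (for example, a likelihood or accuracy value) and compares its distribution in the original and refined ensembles. The two-sample KS statistic is standard; the p-value uses a common asymptotic approximation rather than an exact computation. All names and the significance level `alpha` are illustrative parameters.

```python
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value of the two-sample KS statistic (approximation)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:  # the series is numerically unreliable here; p is ~1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def minimal_subset_search(probs, ensemble, scores, alpha):
    """Discard features from the weakest upward until the KS p-value < alpha.

    scores[i] is a per-tree score for ensemble[i] (hypothetical quantity).
    Returns the largest weak set found before the p-value dropped below alpha.
    """
    order = sorted(range(len(probs)), key=lambda j: probs[j])
    weak = set()
    for j in order:
        candidate = weak | {j}
        kept = [s for tree, s in zip(ensemble, scores)
                if not candidate.intersection(tree)]
        if not kept:
            break
        p = ks_pvalue(ks_statistic(scores, kept), len(scores), len(kept))
        if p < alpha:
            break
        weak = candidate
    return weak
```

The loop stops at the first feature whose removal makes the refined ensemble differ from the original at level `alpha`, and returns the weak set from the previous step. This mirrors the text, where the p-value drops at k = 29 and 28 features are declared weak.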

Table 5.2 compares the performance and entropy of the original ensemble with those of the refined ensemble excluding the 28 weak features. The performance, entropy and the number of weak features are computed within the five-fold cross-validation. We can see that after refining, the performance has slightly increased, by 1.8%, and the entropy has slightly decreased.

Fig. 5.4 shows the distributions of performances provided by the original and refined DT ensembles on the test data. We can see that the size of the refined ensemble becomes significantly smaller. Most of the DTs with performance above 32.0% have been kept, whilst most of the DTs with performance below 24.0% have been discarded from the refined ensemble.

Table 5.1: Performance (P), entropy (E) and the number of weak features (k) for the thresholds

Threshold    P, %        E, bits        k
0.001        27.8±4.5    0.414±0.014    15
0.002        26.8±3.6    0.414±0.014    20
0.003        27.6±3.4    0.413±0.009    23
0.004        27.8±6.2    0.409±0.001    28
0.005        27.6±5.0    0.403±0.011    30

Table 5.2: Performance and entropy of the DT ensembles

Ensemble             P, %        E, bits
Original ensemble    27.4±3.9    0.414±0.015
Refined ensemble     29.2±6.9    0.410±0.014
Rerunning            29.3±6.5    0.416±0.024

5.3.4 Rerunning the Bayesian classification with a reduced set of features

Having found a minimal subset of important EEG features, we can rerun the Bayesian classification on a dataset of reduced dimensionality. Table 5.2 shows the performance and entropy of the DT ensemble rerun on the EEG data represented by the features found most important. We can see that the performance is similar to that of the refined ensemble. Compared to the original ensemble, the increase in performance is 1.9%. This result supports our hypothesis that dimensionality reduction provides better conditions for proportional sampling.

Figure 5.3: Finding a minimal feature subset. From top: training accuracy, performance, ensemble size, and p value of KS-test. (All panels are plotted against the number of weak features.)

Figure 5.4: Distributions of performances of DTs included in the original (grey) and refined (black) ensembles. (Axes: count against performance.)
