Multichannel Sleep Stage Classification and Transfer Learning using Convolutional Neural Networks

(1)

Kent Academic Repository

Full text document (pdf)

Copyright & reuse

Content in the Kent Academic Repository is made available for research purposes. Unless otherwise stated all content is protected by copyright and in the absence of an open licence (eg Creative Commons), permissions for further reuse of content should be sought from the publisher, author or other copyright holder.

Versions of research

The version in the Kent Academic Repository may differ from the final published version.

Users are advised to check http://kar.kent.ac.uk for the status of the paper. Users should always cite the published version of record.

Enquiries

For any further enquiries regarding the licence status of this document, please contact: [email protected]

If you believe this document infringes copyright then please contact the KAR admin team with the take-down information provided at http://kar.kent.ac.uk/contact.html

Citation for published version

Andreotti, Fernando and Phan, Huy and Cooray, Navin and Lo, Christine and Hu, Michele T.M.

and De Vos, Maarten (2018) Multichannel Sleep Stage Classification and Transfer Learning

using Convolutional Neural Networks. In: 40th Annual International Conference of the IEEE

Engineering in Medicine and Biology Society (EMBC 2018). IEEE, Honolulu, Hawaii pp. 171-174.

DOI

https://doi.org/10.1109/EMBC.2018.8512214

Link to record in KAR

https://kar.kent.ac.uk/72663/

Document Version

(2)

Multichannel Sleep Stage Classification and Transfer Learning using

Convolutional Neural Networks

Fernando Andreotti

1

, Huy Phan

1

, Navin Cooray

1

, Christine Lo

2

, Michele T.M. Hu

3

and Maarten De Vos

1

Abstract— Current sleep medicine relies on the analysis of polysomnographic measurements, comprising amongst others electroencephalogram (EEG), electromyogram (EMG), and electrooculogram (EOG) signals. This analysis currently re-quires supervision of a trained expert. Convolutional neural networks (CNN) provide an interesting framework to auto-mated classification of sleep epochs based on raw EEG, EOG and EMG waveforms. In this study, we apply CNN approaches from the literature to four databases from pathological and physiological subjects. The best performing model resulted in Cohen’s Kappa of κ= 0.75on healthy subjects andκ= 0.64 on patients suffering from a variety of sleep disorder. Further, we show the advantages of using additional sensor data such as EOG and EMG. Last, to cope with smaller datasets of less prevalent diseases, we propose a transfer learning procedure using large freely available databases for pre-training. This procedure is demonstrated using a private REM Behaviour Disorder database, improving sleep classification by 24.4%.

I. INTRODUCTION

Sleep is a fundamental biological process, widely present in the animal kingdom, that plays a critical role in the maintenance of human mental and physical health [1], [2]. Sleep medicine relies on the analysis of polysomnographic (PSG) recordings, which include EEG, EMG, EOG amongst other physiological signals [3]. In order to understand these signals, guidelines that divide sleep into a handful of stages (e.g. R&K [4] and the AASM [5] norms) have been pro-posed. These subjective definitions have been the focus of criticism over the last 50 years [6], nonetheless manual scoring following these rules remains the gold-standard in clinical practice. In addition to being subjective, visual analysis of recordings is time consuming, tedious, and prone to subject variability. These drawbacks lead to a mounting number of papers published on computerised classification of PSGs [7], [3]. While automated scoring provides a more objective approach, methods usually make use of hundreds of hand-engineered features obtained from physiological signals. Based on these features, traditional classification methods such as support vector machines, decision trees or hidden Markov models are typically applied (for a review the reader is referred to [8]).

1

Department of Engineering Science, Institute of Biomedical Engineer-ing, University of Oxford, UK.

2

Department of Neurology, Sheffield Teaching Hospitals, Sheffield Institute of Translational Neuroscience, Sheffield, UK.

3

Nuffield Department of Clinical Neurosciences, Oxford Parkinsons Disease Centre (OPDC), University of Oxford, UK.

This research was supported by the National Institute for Health Research (NIHR) Oxford Biomedical Research Centre (BRC) and the Engineering and Physical Sciences Research Council (EPSRC – grant EP/N024966/1).

Correspondence author: F. Andreotti ([email protected])

Deep learning methods are increasingly popular due to their ability to automatically generate features at multiple levels of abstraction (i.e. layers). This allows the system to learn complex functions by mapping the input to the output directly from data, not relying on hand-engineered features [9]. These methods have been successfully applied in the field of computer vision and speech analysis, however applications to 1-dimensional biomedical signals (e.g. EEG) are only emerging in literature. Some of the early works [10], [11], [12] proposed the usage of deep learning techniques for sleep staging. However, these studies are limited with respect to sensors used and the fact that none of these investigate sleep staging in patients suffering from sleep disorders.

In this study, we evaluate different deep learning models proposed in the literature for automatic sleep scoring. Three freely available databases, containing physiological subjects and patients with a variety of pathologies were used to assess these methods’ performance. The advantages of using additional sensors such as EOG and EMG is evaluated on multiple databases. At last, a transfer learning procedure is suggested for improving classification when data is scarce. For this purpose, a private modest-sized dataset of REM Behaviour Disorder (RBD) patients is used. As RBD disease has low prevalence and patients sleep is plagued by arousals [13], sleep classification is a challenging task. Therefore, this dataset provides an ideal case for demonstrating the benefits of transfer learning.

II. MATERIALS

In this study, to avoid a saturation on the number of needed channels in a recording setup, a single central EEG lead (C4-A1 or C3-A2, where available), an average EOG (ROC-LOC), and/or an average EMG (CHIN1-CHIN2) channels are used. Although EOG and EMG signals are used by human experts for sleep staging, they are rarely present in automated systems [12]. The choice for these deriva-tions is due to their common usage in literature. Several public sleep databases exist including different groups of healthy/diseased subjects, ages and genders. This work uses 3 openly accessible databases and a private clinical database. All recordings were resampled at 100 Hz and divided in 30 s epochs. Preprocessing was done by using a zero-phase 100th

order FIR filter with 0.1 Hz high-pass cutoff frequency for EEG/EOG signals and 10 Hz for EMG. Annotations using R&K were converted into AASM guidelines by assigningS3

andS4stages toN3, while{S0, S1, S2}were relabelled as {W, N1, N2}, respectively. The datasets here investigated are described in the following sub-sections.

(3)

A. Physionet Sleep-EDF Database (SLPEDF-DB)

The SLPEDF-DB [14], [15] comprises 38 two-night recordings from 19 healthy subjects (recording SC4131E0

was excluded due to a missing second night). Recordings comprise 9 young males aged 28.3 ± 2.3 years and 10 young females (29.1±3.4years). Unlike most studies, EEGs were annotated using Fpz-Cz (or Pz-Oz) derivations and EOG using a horizontal derivation. Signals were originally sampled at 100 Hz except EMG, which was sampled at 1 Hz. A total of 37,147 epochs were produced, being 11.8% W, 20.3% REM, 7.3% N1, 46.0% N2, 14.6% N3.

B. Montreal Archive of Sleep Studies (MASS-DB)

The MASS-DB [16] is a large database comprising 200 healthy participants with ages ranging between 18 and 76 years, including 98 males aged 42.7±19.4 years and 102 females aged38.1±18.9years. The database contains single nights and is divided into 5 cohorts all of which were used in this study. Recordings with 20 s epochs were converted into 30 s by including 5 s before and after each segment. A total of 228,870 epochs were produced, being 13.6% W, 17.6% REM, 8.5% N1, 47.2% N2, 13.3% N3.

C. CAP Sleep Database (CAPSLP-DB)

The CAPSLP-DB [17], [15] consists of 108 single night PSG recordings of 16 healthy and 92 pathological subjects. The dataset includes 66 male (aged 48.4±19.2 years) and 42 female (aged40.0±19.4years). Between the pathologies are periodic leg movements, insomnia, sleep-apnea as well as 22 RBD subjects. The record brux1 was excluded from our analysis due to inconsistent sampling frequency, n04,

n08, andn16were excluded due to absence of either signal modality. A total of 154,094 epochs were produced, being 12.2% W, 11.8% REM, 4.5% N1, 42.5% N2, 28.9% N3.

D. RBD Database (RBD-DB)

The RBD-DB consists of 21 double-night recordings of 20 male (aged61.5±7.0 years) and a female patient aged

69 years all suffering from RBD. Data was acquired by our local partners from the John Radcliffe hospital, Nuffield Department of Clinical Neurosciences at the University of Oxford. This study complies with the requirements of the Department of Health Research Governance Framework for Health and Social Care 2005 and was approved by the Oxford University hospitals NHS Trust (HH/RA/PID 11957). A total of 45,410 epochs were produced, being 24.1% W, 11.1% REM, 12.2% N1, 36.4% N2, 16.1% N3.

III. METHODS

Convolutional and Recurrent Neural Networks (i.e. CNN and RNNs, respectively) are the most used techniques for deep supervised learning. Due to its computationally effi-cient algorithm and properties such as translation invariance, parameter sharing and sparse connectivity, CNNs are often the method of choice for operating over grid-like structures (e.g. images or fixed segment windows) [18]. In its hidden layers, CNNs produce feature maps with a high degree of

abstraction. In this work, we apply CNN architectures from literature to classify individual epochs of sleep data. These methods are described in the following sub-sections. For re-producibility, fully-connected (FC) layers were removed and the CNNs are followed by a single softmax layer. Removing FC layers forces the network to learn good representations in the convolutional layers, potentially leading to better generalisation [19].

A. Two-layer approach [10]

The approach by [10] proposed a two-layer CNN model specifically for sleep scoring and evaluated on the SLPEDF-DB. The resulting filters of the first 1-dimensional convolu-tion were stacked and further processed by a 2D convoluconvolu-tion, which results in 496k parameters. To evaluate the necessity of such a stacking procedure, we also evaluate a simpler network comprising two 1D-CNNs with the same temporal dimensions as in [10] resulting in 97k parameters.

B. DeepSleepNet [11]

In [11] two branches of 4 convolutional layers each were proposed. Each branch operates with different kernel sizes, aiming to generate feature maps with low and high frequency content. The CNN was then followed by a bidirectional Long-short Term Memory (LSTM - a type of RNN). The authors used a single lead which mixes EEG and EOG information and tested their models on a subset of the MASS-DB. In this work we make use of the proposed CNN network, which has 844k parameters.

C. Residual Network (ResNet) [20]

In this work we apply the pre-activation ResNet (also called v2), max-pooling the first two layers to reduce di-mensionality as in [19]. Similarly to [19], a receptive field of 30 samples was used in the first layer to allow the model a higher level of abstraction with a lower number of layers. We evaluated ResNet models with 12, 22 and 34 layers totalling from 608k to 4.7M parameters.

IV. EXPERIMENTS

A. Experiment 1: Input Channels and Performance

In this experiment, we aim to assess if using multiple sen-sors improves the classification accuracy. For this purpose, we chose to apply the DeepSleepNet model [11] due to its reported accuracy. The model is applied to various channel combinations of each dataset, using both nights from the SLPEDF-DB and RBD-DB merged into a single subject. Raw EEG, EOG and EMG waveforms are added to the input as additional features. The results of a 5-fold cross-validation are shown in Table I. Results are reported using Cohen’sκ

coefficient for nominal multi-class agreement, which takes chance into consideration. As a rule of thumb,κ∈[0.6,1]

is considered good whereasκ∈[0,0.4]is fair to poor. The model was trained in 100 epochs, with batch size 256.

From Table I we observe that including both EEG and EOG sensors significantly improves sleep stage classifica-tion. EMG improves the performance on the healthy subjects

(4)

TABLE I

EXPERIMENT1: MEAN AND STANDARD DEVIATION OFCOHEN’SKAPPA COEFFICIENTS FOR5-FOLD CROSS-VALIDATION ON DATABASES AND

DIFFERENT INPUT CHANNELS USING[11].

Input Signals

Dataset

SLPEDF-DB MASS-DB CAPS-DB RBD-DB

EEG 0.65±0.04 0.67±0.02 0.58±0.02 0.46±0.06 EOG 0.58±0.04 0.66±0.01 0.58±0.01 0.43±0.05 EMG 0.07±0.01 0.34±0.02 0.18±0.02 0.13±0.04 EEG+EOG 0.68±0.04 0.72±0.01 0.62±0.02 0.49±0.06

EEG+EOG+EMG 0.67±0.05 0.74±0.01 0.61±0.01 0.48±0.07

of the MASS-DB, while on pathological cases it slightly worsens results. The SLPEDF-DB is an exception since EMG is sampled at 1 Hz, therefore much of the informa-tion is lost. The MASS-DB results agree with [12], where multiple channels of each modality were used. Different from the method presented in [12], in this work all three signals undergo the same pipeline of neural networks, i.e. using non-linear transformations rather than linear on the input and allowing the network to deal with different statistical properties of those signals. The original work by [10] made use of a single EEG channel on the SLPEDF-DB, while [11] proposed combining EEG and EOG leads into one channel of the MASS-DB. Results should be taken with care since the architecture used may have an influence. Despite its slightly worse results, EMG was kept on following experiments since it contains crucial information for pathologies like RBD [13].

B. Experiment 2: Models on Different Databases

To evaluate each model’s performance on the individual databases, we performed a 5-fold cross-validation using all three signals as input. Hyper-parameters were kept the same as in the previous experiment. In Fig. 1, the methods’ performance are depicted in terms of macro-averaged F1 -measure, sensitivity (SE), and specificity (SP) using the largest healthy/disease databases available (i.e. MASS-DB and CAPS-DB). From this figure it is noticeable that the stacking procedure proposed by [10] considerably worsens the performance on the MASS-DB, however performs better on the CAPSLP-DB. As expected, increasing the number of layers on the ResNet increases performance, which differs from [19], who attributed a worsen in accuracy to ResNets overfitting the training set. From both Fig. 1 and 2 we note that the DeepSleepNet [11] consistently outperforms all other models with a modest number of required parameters.

In Fig. 2, classification performance on each database are shown using the best performing variants of each method (based on Fig. 1). It is visible that the model performs well on different datasets, except the RBD-DB which is more challenging especially for statesN1andREM. REM detection is crucial for RBD as the pathology is defined based on muscular atonia occurring during this state [13].

C. Experiment 3: Fine-tuning Model to Subject

Transfer learning is an important strategy when dealing with deep neural networks. Particularly when data is scarce, such as in the cases of less prevalent disorders (e.g. RBD). This strategy allows higher classification accuracy by 1)

F₁ SE SP 0.0 0.2 0.4 0.6 0.8 1.0 Performance (%) (a)MASS-DB

2-layer [10] 2-layer (no stacking) DeepSleepNet [11]

ResNet-12 ResNet-22 ResNet-34 [20]

F₁ SE SP

(b)CAPSLP-DB

Fig. 1. Performance of all methods evaluated in this study on largest databases MASS-DB and CAPSLP-DB. Metrics used are F1-measure,

sensitivity (SE_{) and specificity (}SP_).

pre-training a model in a more readily available data of a similar task; and 2) fine-tuning the network to the specific task at hand. In this study, the MASS-DB and CAPSLP-DB are used to pre-train the best performing methods from previous experiments (i.e. DeepSleepNet [11] and ResNet 34-layers [20]). The pre-trained model is then fine-tuned to each patient’s first night of the RBD-DB and evaluated on the second night. This personalisation procedure is similar to the one performed in [21]. As a baseline method we perform a leave-one-subject-out procedure on the second night of the RBD-DB using the DeepSleepNet, which resulted in

κ= 0.45±0.15.

During fine-tuning, the DeepSleepNet performed best when all layers were adapted, with an average κ= 0.43±

0.21. For the ResNet-34, parameters were only changed from the 5th _{residual block onward, producing} _κ_{= 0}_.₅₆_±₀_.₁₇_. From these results it is evident that pre-training the ResNet using large databases considerably improves classification performance on the RBD-DB. On the other hand, the Deep-SleepNet [11] does not seem to produce transferable feature representations.

V. DISCUSSION

In acoustic signal processing very deep networks are uncommon, instead raw waveforms are converted into spec-trograms and a few convolutional layers produce similar results [19]. Future work should compare both performance and transferability of these models. Moreover, the temporal dependency between epochs was not explored in this study. In order to treat sequences of sleep epochs, i.e. transitions between stages, [11] suggested bi-directional LSTM layers on top of the CNN network. Another limitation of this study is the amount of data on the RBD-DB. Additional subjects may be used to confirm the results here shown. Last, sleep staging is a common method in medicine, however, it only provides a partial understanding of the process itself. End-to-end learning of how to diagnose sleep disorders, on the other hand, may be more clinically relevant than improving sleep staging.

(5)

SLPEDF-DB 56.0% 2628 34.1% 1602 5.6% 261 3.5% 163 0.9% 42 11.8% 994 79.8% 6700 1.4% 119 7.0% 585 0.0% 2 26.5% 812 53.1% 1626 6.8% 207 13.3% 406 0.4% 11 4.2% 797 12.1% 2289 0.8% 150 78.4% 14785 4.4% 831 6.3% 309 4.7% 230 0.7% 34 22.6% 1105 65.7% 3220 Kappa 0.56 W R N1 N2 N3 72.0% 3381 7.6% 357 15.4% 725 4.2% 198 0.7% 35 2.8% 232 82.8% 6951 7.1% 600 7.1% 600 0.2% 17 13.7% 421 24.1% 739 39.8% 1219 21.4% 656 0.9% 27 1.2% 229 6.9% 1299 3.8% 718 81.2% 15315 6.8% 1291 0.5% 23 0.2% 11 0.5% 23 16.3% 800 82.5% 4041 Kappa 0.68 W R N1 N2 N3 88.1% 4135 2.1% 100 8.0% 378 1.3% 62 0.4% 21 10.4% 872 77.1% 6480 7.6% 642 4.7% 392 0.2% 14 30.9% 946 17.0% 521 37.9% 1159 13.8% 422 0.5% 14 2.3% 438 7.4% 1400 5.1% 962 77.7% 14644 7.5% 1408 0.6% 30 0.2% 10 1.0% 51 12.1% 591 86.1% 4216 W R N1 N2 N3 Kappa 0.68 W R N1 N2 N3 MASS-DB 79.8% 25496 14.0% 4472 3.4% 1100 2.5% 793 0.3% 97 1.5% 593 95.4% 36542 1.0% 369 2.1% 789 0.0% 2 20.9% 3930 30.6% 5760 23.8% 4480 24.6% 4623 0.2% 32 1.8% 1940 7.1% 7653 2.8% 2976 79.4% 84987 8.9% 9544 0.5% 144 0.6% 199 0.0% 14 22.4% 6952 76.4% 23719 Kappa 0.68 86.2% 27538 3.0% 954 7.3% 2327 3.3% 1063 0.2% 76 1.8% 695 89.7% 34357 4.1% 1565 4.4% 1672 0.0% 6 12.8% 2413 15.3% 2874 46.8% 8806 25.0% 4708 0.1% 24 0.7% 747 3.0% 3172 4.1% 4360 87.5% 93719 4.8% 5102 0.2% 61 0.0% 13 0.1% 16 26.2% 8133 73.5% 22805 Kappa 0.75 88.2% 28174 2.4% 752 7.7% 2452 1.7% 540 0.1% 40 3.0% 1164 91.3% 34955 4.6% 1745 1.1% 431 0.0% 0 15.5% 2911 14.8% 2783 56.9% 10707 12.8% 2419 0.0% 5 1.0% 1098 5.2% 5578 8.7% 9364 83.1% 88961 2.0% 2099 0.1% 43 0.1% 21 0.3% 106 42.9% 13313 56.5% 17545 W R N1 N2 N3 Kappa 0.71 CAPSLP-DB 99.6% 15839 0.2% 33 0.0% 4 0.1% 10 0.1% 21 66.5% 12149 29.2% 5327 0.3% 60 1.7% 318 2.2% 409 92.6% 6698 2.3% 167 0.4% 27 2.7% 195 2.0% 144 53.1% 36294 2.1% 1460 0.3% 177 18.5% 12667 26.0% 17734 40.8% 18102 0.5% 239 0.0% 7 2.3% 1042 56.3% 24978 Kappa 0.24 83.8% 13325 4.5% 722 4.1% 658 6.8% 1077 0.8% 125 5.3% 961 79.1% 14454 2.4% 431 11.2% 2041 2.1% 376 25.7% 1859 15.4% 1114 29.5% 2132 28.1% 2029 1.3% 97 3.5% 2361 4.8% 3250 2.4% 1608 73.7% 50363 15.7% 10750 0.5% 236 1.0% 433 0.2% 93 18.7% 8310 79.6% 35296 Kappa 0.64 84.7% 13473 3.8% 611 6.2% 984 4.2% 664 1.1% 175 11.8% 2161 70.7% 12904 8.7% 1597 6.1% 1114 2.7% 487 28.1% 2033 10.1% 732 42.1% 3046 17.5% 1264 2.2% 156 4.4% 2988 6.1% 4150 7.2% 4903 61.5% 41994 20.9% 14297 0.8% 375 1.2% 554 1.5% 673 16.9% 7484 79.5% 35282 W R N1 N2 N3 Kappa 0.57 RBD-DB 89.6% 8858 1.7% 164 1.5% 145 4.7% 462 2.6% 261 36.9% 1843 22.6% 1129 5.2% 258 25.5% 1273 9.8% 490 45.9% 2567 5.8% 324 8.4% 467 33.9% 1895 6.0% 334 27.2% 4276 1.7% 263 2.5% 398 51.6% 8108 16.9% 2660 6.5% 417 0.2% 10 0.3% 18 10.9% 692 82.2% 5235 Kappa 0.41 70.3% 6953 6.0% 596 10.8% 1068 12.3% 1212 0.6% 61 8.0% 399 42.4% 2118 12.4% 620 36.0% 1795 1.2% 61 15.0% 837 12.2% 683 19.2% 1071 53.0% 2962 0.6% 34 7.9% 1243 4.6% 721 7.2% 1130 73.8% 11597 6.5% 1014 1.2% 77 0.7% 43 0.5% 29 25.0% 1590 72.7% 4633 Kappa 0.48 80.5% 7957 3.5% 347 10.5% 1036 5.2% 518 0.3% 32 11.9% 595 50.8% 2534 12.9% 644 24.1% 1203 0.3% 17 24.3% 1355 11.0% 613 28.2% 1577 36.3% 2026 0.3% 16 18.9% 2976 2.8% 447 12.5% 1970 63.1% 9911 2.6% 401 9.7% 617 0.7% 42 0.3% 22 27.0% 1719 62.3% 3972 W R N1 N2 N3 Kappa 0.48 T s in a li s e t a l. 2 0 1 6 (n o s ta c ki n g ) S u p r a ta k e t a l. 2 0 1 7 H e e t a l. 2 0 1 5 (3 4 l a y e r s ) P re d ic te d L a b e l Target Label

Fig. 2. Resulting confusion matrices for experiment 2 produced during 5-fold cross-validation over each database and method.

VI. CONCLUSION

In this work, we apply several deep convolutional neural networks to the task of automated sleep staging. Approaches are compared in terms of input sensors (EEG, EOG and/or EMG) with the help of four different databases from patho-logical and physiopatho-logical subjects. At last, the generalisation power of these methods is demonstrated by pre-training a network on a combined large database consisting of both healthy and diseased subjects and fine-tuning this network’s weights so that it improves classification performance on a small, more challenging database (i.e. RBD-DB).

REFERENCES

[1] K. Wulff, S. Gatti, J. G. Wettstein, and R. G. Foster, “Sleep and circa-dian rhythm disruption in psychiatric and neurodegenerative disease,”

Nat. Rev. Neurosci., vol. 11, no. 8, pp. 589–599, 2010.

[2] F. Weber and Y. Dan, “Circuit-based interrogation of sleep control,”

Nature, vol. 538, no. 7623, pp. 51–59, oct 2016.

[3] R. Agarwal and J. Gotman, “Computer-assisted sleep staging,”IEEE Trans. Biomed. Eng., vol. 48, no. 12, pp. 1412–1423, 2001. [4] A. Rechtschaffen and A. Kales,A manual of standardized terminology,

techniques and scoring system for sleep stages of human subjects, 1968.

[5] R. Berry, R. Brooks, C. Gamaldo, S. Harding, R. Lloyd, C. Marcus, and B. Vaughn, The AASM Manual for the Scoring of Sleep and Associated Events: Rules, Terminology and Technical Specifications, v2.2 ed. American Academy of Sleep Medicine, 2015.

[6] S.-L. Himanen and J. Hasan, “Limitations of Rechtschaffen and Kales,”Sleep Med. Rev., vol. 4, no. 2, pp. 149–167, apr 2000. [7] T. Penzel and R. Conradt, “Computer based sleep recording and

analysis,”Sleep Med. Rev., vol. 4, no. 2, pp. 131–148, 2000. [8] S. Motamedi-Fakhr, M. Moshrefi-Torbati, M. Hill, C. M. Hill, and P. R.

White, “Signal processing techniques applied to human sleep EEG signalsA review,”Biomed. Signal Process. Control, vol. 10, no. 1, pp. 21–33, mar 2014.

[9] Y. Bengio, “Learning Deep Architectures for AI,” Found. Trends Machine Learn., vol. 2, no. 1, pp. 1–127, 2009.

[10] O. Tsinalis, P. M. Matthews, Y. Guo, and S. Zafeiriou, “Automatic Sleep Stage Scoring with Single-Channel EEG Using Convolutional Neural Networks,”arXiv preprint arXiv:1610.1683, p. 12, oct 2016. [11] A. Supratak, H. Dong, C. Wu, and Y. Guo, “DeepSleepNet: A Model

for Automatic Sleep Stage Scoring Based on Raw Single-Channel EEG,”IEEE Trans. Neural Syst. Rehabil. Eng., vol. 25, no. 11, pp. 1998–2008, nov 2017.

[12] S. Chambon, M. Galtier, P. Arnal, G. Wainrib, and A. Gramfort, “A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series,” pp. 1–14, 2017.

[13] B. F. Boeve, M. H. Silber, and et al., “Pathophysiology of REM sleep behaviour disorder and relevance to neurodegenerative disease,”Brain, vol. 130, no. 11, pp. 2770–2788, 2007.

[14] B. Kemp, A. H. Zwinderman, B. Tuk, H. A. C. Kamphuisen, and J. J. L. Obery´e, “Analysis of a sleep-dependent neuronal feedback loop: The slow-wave microcontinuity of the EEG,” IEEE Trans. Biomed. Eng., vol. 47, no. 9, pp. 1185–1194, 2000.

[15] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, and H. E. Stanley, “PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals.”Circulation, vol. 101, no. 23, pp. E215—-E220, jun 2000.

[16] C. O’Reilly, N. Gosselin, J. Carrier, and T. Nielsen, “Montreal Archive of Sleep Studies: an open-access resource for instrument benchmark-ing and exploratory research,”J. Sleep Res., vol. 23, no. 6, pp. 628– 635, dec 2014.

[17] M. G. Terzano, L. Parrino, A. Sherieri, R. Chervin, S. Chokroverty, C. Guilleminault, M. Hirshkowitz, M. Mahowald, H. Moldofsky, A. Rosa, R. Thomas, and A. Walters, “Atlas, rules, and recording techniques for the scoring of cyclic alternating pattern (CAP) in human sleep.”Sleep Med., vol. 2, no. 6, pp. 537–53, nov 2001.

[18] I. Goodfellow, Y. Bengio, and A. Courville,Deep Learning, 2016. [19] W. Dai, C. Dai, S. Qu, J. Li, and S. Das, “Very deep convolutional

neural networks for raw waveforms,”2017 IEEE International Con-ference on Acoustics, Speech and Signal Processing (ICASSP), pp. 421–425, 2017.

[20] K. He, X. Zhang, S. Ren, and J. Sun, “Identity Mappings in Deep Residual Networks,”arXiv 1603.05027v3, no. 1, pp. 1–15, mar 2016. [21] K. Mikkelsen and M. De Vos, “Personalizing deep learning models for automatic sleep staging,”arXiv:1801.02645 [q-bio.NC], pp. 1–9.