Scikit-Learn machine learning package - Signal Processing for Early Warning Arrhythmia Detection

The trained models were persisted using the pickle.dumps() function from Python's standard-library pickle module. This function serialised the fitted classifier into a binary byte string, which was saved as a file that could later be deserialised into a working Scikit-Learn model. The file was copied to the BBB, which already had Python and Scikit-Learn installed. The Python program running on the BBB loaded the persisted model using pickle.loads(). Once the model was loaded into memory, it could classify fresh ECG test waveforms using that classifier's predict() method. Using the pickle module meant that the classifier did not need to be retrained on the target machine, which matters especially when the target is a resource-constrained device such as the BBB, with less memory and processing power than a desktop.
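The persist-and-restore step can be sketched as follows. This is a minimal illustration, not the code used in this study: it trains a RandomForestClassifier on synthetic data from make_classification as a stand-in for the ECG feature vectors, and the file name ecg_classifier.pkl is hypothetical. Because unpickling can execute arbitrary code, a pickled model should only be loaded from a trusted source, and the target device needs a Scikit-Learn version compatible with the one that created the file.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Desktop side: train on synthetic stand-in data (not real ECG features).
X, y = make_classification(n_samples=200, n_features=7, n_informative=4,
                           n_classes=3, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ecg_classifier.pkl")  # hypothetical file name

    # Serialise the fitted model to bytes and write them to disk.
    with open(path, "wb") as fh:
        fh.write(pickle.dumps(clf))

    # Target-device side: read the bytes back and rebuild the model
    # without retraining it.
    with open(path, "rb") as fh:
        restored = pickle.loads(fh.read())

# The restored model should reproduce the original model's predictions.
identical = np.array_equal(restored.predict(X), clf.predict(X))
print("identical predictions:", identical)
```

pickle.dump() and pickle.load() on an open file object are equivalent shortcuts for the dumps()/write and read/loads() pairs used here.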

Feature vectors were extracted from the fresh ECG signal over 10-second intervals, following common practice for standard ECG acquisition in clinical environments, where a single ECG strip is 10 seconds long. Because the classifier had already been trained and persisted, it could be loaded back into memory by passing the bytes read from the persisted file to pickle.loads() (or, equivalently, by calling pickle.load() on the open file object). Classification was then performed with the classifier's predict() method, and classification accuracy was analysed using accuracy_score, confusion_matrix and classification_report from the sklearn.metrics module.

Classifier = pickle.loads(model_bytes)  # model_bytes: raw bytes read from the persisted model file
Classifier.predict(feature_vectors)
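The evaluation step can be illustrated with the three sklearn.metrics functions named above. The ten labels below are invented purely for illustration and are not results from this study; passing labels= fixes the row and column order of the confusion matrix to V, A, N.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Invented labels for ten beats: V = PVC, A = PAC, N = normal sinus rhythm.
y_true = ["N", "N", "N", "N", "V", "V", "V", "A", "A", "N"]
y_pred = ["N", "N", "N", "A", "V", "V", "N", "A", "A", "N"]

labels = ["V", "A", "N"]  # rows = true class, columns = predicted class
acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy = {acc:.2f}")
print(cm)
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
```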

ECG Analysis and Arrhythmia Detection

4.1 Introduction

A very important aspect of personalised healthcare is to monitor an individual's health continuously using wearable biomedical devices, to analyse the recorded data and, where possible, to predict potential health hazards ahead of time. Building prediction into the system helps avoid delays in medical treatment, so that care can be provided before an individual reaches a critical condition.

This chapter focuses on detecting and classifying early signs of cardiac arrhythmia using ECG samples obtained from a wearable 3-lead ECG kit. State-of-the-art research makes extensive use of Heart Rate Variability (HRV) analysis for arrhythmia classification; however, HRV analysis depends largely on the morphology of the ECG waveforms and on the sensitivity of the ECG equipment, both of which can introduce classification errors. Wearable 3-lead ECG kits are susceptible to calibration and measurement errors, so classification accuracy has to be addressed in the machine learning phase. The clinical application of HRV analysis and the effectiveness of its adoption are still a matter of research, and results tend to vary with age, gender, medication, health status and physiological variation, among other factors (Voss et al. 2015). Furthermore, outliers due to spurious ectopy and motion artefacts can have major effects on computed HRV values, especially in elderly populations with varying supra-ventricular rhythm.

HRV analysis is based on the RR-intervals of the waveform, so it can only provide time- and frequency-domain measures. However, the subtle differences between normal sinus rhythm and Premature Atrial Contractions (PACs), caused by the overlapping P-wave and QRS complex, do not produce time- or frequency-domain variations large enough to give a clear classification boundary between PACs, Premature Ventricular Contractions (PVCs) and normal sinus rhythm. With the aim of detecting early warning signs of arrhythmia, the objective of this research study was to identify beats containing PACs and PVCs and to separate them from normal beats. PACs and PVCs were distinguished from normal sinus rhythm by the PR-interval portion of the waveform and by the abnormal QRS complex, so features beyond the RR-interval had to be identified to increase classification accuracy.

Because an ECG waveform is a power signal, power spectral analysis could provide Power Spectral Density (PSD) measures of the sub-waves in the ECG waveform. The key hypothesis of the experiments described in this chapter was that these PSD measures, together with the RR-intervals, would provide the features needed to improve classification accuracy for the two early-sign arrhythmia classes (PVCs and PACs). To derive these spectral estimates, a new feature engineering algorithm was developed during this research study and is presented in this chapter. The algorithm implements a finite state machine that takes as input the start and stop locations of the P-wave, QRS complex and T-wave, together with the peak locations of these sub-waves, and calculates their power spectral densities. The PSDs of the P-waves, PR-intervals and QRS complexes were included in the feature vector used for arrhythmia classification.
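The segmentation-and-PSD idea can be sketched with a small finite state machine. This is a hypothetical illustration, not the algorithm developed in this study: the fiducial label names (P_on, P_off, QRS_on, QRS_off, T_on, T_off), the 360 Hz sampling rate, the helper names, and the choice of summarising each segment's periodogram (scipy.signal.periodogram) by its integrated total power are all assumptions. Peak markers are accepted but ignored in this sketch.

```python
import numpy as np
from scipy.signal import periodogram

FS = 360  # Hz; assumed sampling rate (that of MITDB records)

# Allowed (state, fiducial label) -> next state transitions for one beat.
TRANSITIONS = {
    ("IDLE", "P_on"): "IN_P",
    ("IN_P", "P_off"): "PR_SEGMENT",
    ("PR_SEGMENT", "QRS_on"): "IN_QRS",
    ("IN_QRS", "QRS_off"): "ST_SEGMENT",
    ("ST_SEGMENT", "T_on"): "IN_T",
    ("IN_T", "T_off"): "IDLE",
}
BOUNDARY_LABELS = {label for _, label in TRANSITIONS}


def segment_power(x, fs=FS):
    """Integrate the periodogram PSD of a segment over frequency.

    With the default density scaling and mean removal this equals the
    variance of the segment (Parseval's theorem).
    """
    x = np.asarray(x, dtype=float)
    if x.size < 2:
        return 0.0
    f, pxx = periodogram(x, fs=fs)
    return float(np.sum(pxx) * (f[1] - f[0]))


def beat_spectral_features(signal, events, fs=FS):
    """Run (sample_index, label) fiducial events through the state machine.

    Returns one dict per complete beat with the total power of the
    P-wave, PR-interval, QRS complex and T-wave segments.
    """
    beats, state, marks, feats = [], "IDLE", {}, {}
    for idx, label in sorted(events):
        if label not in BOUNDARY_LABELS:
            continue  # peak markers (e.g. "R") are not used in this sketch
        nxt = TRANSITIONS.get((state, label))
        if nxt is None:
            # Unexpected order: drop the partial beat and try to restart.
            state, marks, feats = "IDLE", {}, {}
            nxt = TRANSITIONS.get((state, label))
            if nxt is None:
                continue
        marks[label] = idx
        state = nxt
        if label == "P_off":
            feats["P_PSD"] = segment_power(signal[marks["P_on"]:idx], fs)
        elif label == "QRS_on":
            feats["PR_PSD"] = segment_power(signal[marks["P_on"]:idx], fs)
        elif label == "QRS_off":
            feats["QRS_PSD"] = segment_power(signal[marks["QRS_on"]:idx], fs)
        elif label == "T_off":
            feats["T_PSD"] = segment_power(signal[marks["T_on"]:idx], fs)
            beats.append(feats)
            marks, feats = {}, {}
    return beats


rng = np.random.default_rng(0)
ecg = rng.normal(size=1000)  # placeholder signal, not real ECG
events = [(100, "P_on"), (130, "P_peak"), (160, "P_off"), (200, "QRS_on"),
          (215, "R"), (236, "QRS_off"), (300, "T_on"), (340, "T_peak"),
          (400, "T_off")]
beats = beat_spectral_features(ecg, events)
print(beats)
```

Driving the segmentation with an explicit transition table makes out-of-order or missing fiducials easy to detect: any unexpected label simply discards the partial beat.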
Once the power spectral densities were included in the feature vector, classification accuracy increased to 97%, with the PSD of the PR-interval alone contributing more than 35% of the total feature importance. The consolidated feature extraction algorithm was used to extract the following features: PR_Interval, PR_PSD (PR-interval power spectral density), QRS_Interval, QRS_PSD (QRS-interval power spectral density), RR_Interval, PowerSpectralDensity (power spectral density of the whole PQRST waveform) and SignalToNoiseRatio. The class labels were taken from the annotation set AnnotationType: V, A and N, where V represents PVCs, A represents PACs and N represents normal sinus rhythm.
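The RR_Interval feature, and the heart rate that depends on it, can be derived from consecutive R-peak positions. A minimal sketch, assuming R-peak sample indices are already available (for MITDB these could come from the beat annotations) and a 360 Hz sampling rate; rr_features is a hypothetical helper name. Heart rate in beats per minute is 60 divided by the RR-interval in seconds, which is why the two quantities move in opposite directions.

```python
import numpy as np

FS = 360  # Hz; MITDB records are sampled at 360 Hz


def rr_features(r_peaks, fs=FS):
    """Return RR-intervals (s) and instantaneous heart rate (bpm).

    r_peaks: increasing sample indices of detected R-peaks.
    """
    r = np.asarray(r_peaks, dtype=float)
    rr = np.diff(r) / fs   # time between consecutive R-peaks, in seconds
    hr = 60.0 / rr         # beats per minute
    return rr, hr


rr, hr = rr_features([0, 288, 576, 792])
print("RR (s):", rr)
print("HR (bpm):", hr)
```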

Because heart rate depends on the RR-interval, the two were found to be negatively correlated (Pearson r = -0.359, p < 0.001; F(2, 24187) = 1125.58, p < 0.001, R² = 8.5%; Table 4.2). From each of the 48 records in the MIT-BIH Arrhythmia Database (MITDB), about 650,000 pre-filtered and denoised samples per record were used to train the classifiers to assign each heartbeat sample to the category label (AnnotationType) of an abnormal beat annotation. MITDB contained enough samples to support classification among four major annotation types, V, A, L and R, representing PVCs, PACs, left bundle branch block and right bundle branch block beats respectively, so the initial classification task covered these abnormal annotation types only. The feature vector for V, A, L, R classification consisted of age, gender, RR-interval and ECG signal value (mV). In experiments with several classifiers, the k-Nearest Neighbours (k-NN) classifier yielded 99.4% accuracy. The feature extraction algorithm extracted features for 24,190 annotations of the abnormal V, A, L and R types.

Because abnormal beats were disproportionately rare (approximately 81 abnormal beats per 100,000 beats), the class labels in the dataset were imbalanced. This imbalance was addressed with SMOTE (Synthetic Minority Over-sampling Technique), as the A-type and V-type annotations made up only 5% and 25% of the total annotation count, respectively. Since the A-type and V-type annotations represented early signs of arrhythmia, these two classes were selected for classification, together with the normal N-type annotation. Taking into account the dataset imbalance and the spectral components of the PR-interval and QRS complex, the consolidated feature extraction algorithm was used to extract the features.
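The core of SMOTE is interpolation between a minority-class sample and one of its nearest minority-class neighbours. The NumPy sketch below shows that basic step; it is not the implementation used in this study, and a maintained library version such as SMOTE from the imbalanced-learn package would normally be preferred. The helper names smote and oversample are hypothetical, the class sizes in the example are made up, and in a real experiment oversampling should be applied to training data only, so that synthetic points never leak into test folds.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples for one minority class (basic SMOTE).

    Each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority-class neighbours.
    """
    X_min = np.asarray(X_min, dtype=float)
    if len(X_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]    # k nearest for each sample
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[pick] - X_min[base])


def oversample(X, y, seed=0):
    """Oversample every minority class up to the size of the largest class."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for c, n in zip(classes, counts):
        if n < target:
            X_parts.append(smote(X[y == c], target - n, seed=seed))
            y_parts.append(np.full(target - n, c))
    return np.vstack(X_parts), np.concatenate(y_parts)


rng = np.random.default_rng(1)
X = rng.normal(size=(130, 2))
y = np.array(["N"] * 100 + ["V"] * 25 + ["A"] * 5)  # toy class sizes
X_res, y_res = oversample(X, y)
print(dict(zip(*np.unique(y_res, return_counts=True))))
```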
A classification model was developed using GridSearchCV and RandomForestClassifier, with balanced-accuracy scoring, 500 estimators, StratifiedKFold cross-validation with 5 splits and a 'balanced' class weight as parameters. An overall classification accuracy of 97% was observed. The precision for classification of both V-type and A-type annotations was

In document Signal Processing for Early Warning Arrhythmia Detection and Survival Prediction for Clinical Decision (Page 96-101)