EEE07 - ECG-based Arrhythmia Classification using Machine Learning and Complexity Analysis Techniques


Zhou Xinyan

Raffles Institution

Abstract- The electrocardiogram (ECG) is one of the most widely used diagnostic modalities in the diagnosis and prediction of cardiovascular diseases, most commonly arrhythmias, as it captures irregularities in the electrical signals of the heart. Classification of ECG signals is crucial in identifying the classes of cardiac arrhythmias for more accurate diagnosis and effective treatment. Traditionally, ECG signals are classified manually or via waveform analysis, which is time-consuming and prone to error due to the variability of ECG signals between patients.

This research proposes a novel method of ECG classification using machine learning techniques, where a classification module makes use of feature vectors extracted via complexity analysis of an ECG signal. The optimal classification module was determined empirically using the Classification Learner software from Matlab, and was identified to be the Cubic SVM classifier. It is able to classify ECG signals into 12 discrete classes: AFIB, AFL, APB, Bigeminy, IVR, LBBB, NSR, PVC, RBBB, SVTA, VT, and WPW, with 78.1% accuracy on average.

This classification module can be integrated with the BP estimation approach shown in related works, where BP is estimated from the same complexity-analysis feature vectors fed into a classification module. The result would be a single classifier that predicts both ECG and BP classes from a single ECG signal.

Associate Professor Gwee Bah Hwee

Victor Adrian

Nanyang Technological University

School of Electrical and Electronic Engineering

Keywords- Electrocardiogram (ECG), Arrhythmia, Machine Learning, Complexity Analysis

1. INTRODUCTION

Globally, cardiovascular diseases (CVDs) are the leading cause of death, with over 17 million people losing their lives annually due to CVDs. One of the main sources of CVDs is cardiac arrhythmia, where heartbeats deviate from the normal sinus rhythm (NSR) [1], and such deviations can be classified into various subclasses. As the various subclasses tend to have identifiable characteristics, an accurate model to classify such arrhythmias at an early stage could help in validating the diagnosis and treatment of patients, and could greatly reduce the mortality rate.

The electrocardiogram (ECG) is a widely used non-invasive diagnostic modality that records the electrical activity of the heart. During every cycle of contraction and relaxation, the heart produces electrical impulses that spread through the heart muscle, which are detected via electrodes placed on the skin. [2] [3] [4]


While the ECG is a useful diagnostic tool, classification of ECG signals is a challenging problem. Manual classification is difficult due to the individuality of ECG patterns and the variability in ECG waveforms of patients having the same conditions (making intra-class comparison difficult), its susceptibility to human error, and limitations in real-time analysis. [3] While more sophisticated classifiers such as deep learning techniques and neural networks have been developed recently, many of them require long computation times to optimize the classifiers. Complex classification or preprocessing methods are either unsuitable for online calculations or demand a lot of computational power.

Additionally, they are highly specific to arrhythmia analysis and cannot be implemented with other physiological markers to give a multidimensional health analysis, limiting their applications in the medical field. [4] [5] [6] [7]

Thus, there is a need to develop a fully automatic and fast classifier that can be implemented online, and can also be used as a classifier for other physiological signals such as blood pressure (BP). There have been recent developments in BP estimation using only ECG data, via machine learning techniques applied to feature vectors extracted from complexity analysis of ECG signals. [8] [9] With an ECG classifier that utilises the same feature vectors in its classification, medical professionals and wearable technologies will be able to derive both ECG and BP class from a single ECG signal, improving health monitoring technologies' capabilities and efficacy in this digital age of healthcare wearables.

2. AIMS

This research aims to develop a novel ECG classifier using machine learning techniques, via feature vectors extracted from complexity analysis of an ECG signal. This looks to combat limitations of existing models as explained above, and also to create a classification module that can be integrated to predict blood pressure (BP) class together with ECG class using solely ECG data.

I. CONSTRAINTS

ECG signals are typically collected from 12 leads. [2] However, in this paper, we utilise only data acquired from the MLII lead in the prediction of arrhythmic classes.

3. METHODOLOGY

A schematic representation of the proposed method is illustrated in Fig 1. The ECG signal first undergoes pre-processing to remove artifacts and external interference while preserving the valid ECG signal. Following this, features are extracted from the signal via complexity analysis. The extracted feature vectors are then fed into a classification module which implements a flat machine learning algorithm, after which the best classification model is selected based on the percentage (%) accuracy of its predictions. All relevant diagrams, such as data collection results as well as Matlab codes and data generated, are included in the report.


Fig 1. Proposed methodology for ECG classification

I. DATA PRE-PROCESSING

During data collection, ECG signals may be contaminated by artifacts which distort the signal. Artifacts are unwanted signals that are merged with ECG signals, and are commonly caused by muscular activity and motion, or by external electrical devices, and may result in distortions such as baseline wander. [10] As this project involves feature extraction of these signals, highly accurate measurements are required.

Valid ECG signal information is considered to lie between 0.05 and 100 Hz. [8] [9] [10] To retain this band, the ECG signal is passed through an 18th-order Butterworth bandpass filter with a passband of 0.3-100 Hz, designed using Filter Designer from Matlab®. The lower cutoff of 0.3 Hz was chosen as a reasonable frequency that assures complete baseline removal without deforming the ECG signal [8] [9] [11], while the upper cutoff of 100 Hz was determined empirically via the Spectrum Analyser software from Matlab®, so that the ECG signal is not attenuated while components of the raw recording that lie beyond the passband are removed. The filter order was determined by the Filter Designer software. Its parameters and design interface can be found in Fig 2.

Fig 2. Filter Designer interface depicting a Butterworth bandpass filter of order 18
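To make the 0.3-100 Hz passband concrete, the following Python sketch evaluates the gain of an ideal analog Butterworth band-pass filter at a few frequencies. It is an illustration only and not the Matlab® Filter Designer output: the function name, the use of the analog (rather than digital) response, and the pairing of an 18th-order band-pass filter with a 9th-order low-pass prototype are all assumptions.

```python
import math


def butterworth_bandpass_gain(f_hz, f_low_hz=0.3, f_high_hz=100.0, prototype_order=9):
    """Gain |H(f)| of an ideal analog Butterworth band-pass filter.

    prototype_order is the order of the low-pass prototype; the band-pass
    filter derived from it has twice that order (9 -> 18, an assumption here).
    """
    if f_hz <= 0.0:
        return 0.0  # a band-pass filter blocks DC completely
    bandwidth = f_high_hz - f_low_hz
    centre_squared = f_low_hz * f_high_hz  # square of the geometric centre frequency
    # Low-pass to band-pass frequency mapping; |x| = 1 at both cutoff frequencies.
    x = (f_hz * f_hz - centre_squared) / (bandwidth * f_hz)
    try:
        return 1.0 / math.sqrt(1.0 + x ** (2 * prototype_order))
    except OverflowError:
        return 0.0  # far outside the passband the gain is numerically zero


# Print the gain at a few illustrative frequencies.
for f in (0.05, 0.3, 1.0, 50.0, 100.0, 150.0):
    print(f"{f:7.2f} Hz -> gain {butterworth_bandpass_gain(f):.4f}")
```

By construction the gain is 1/√2 (about -3 dB) at both cutoffs and close to 1 inside the band, which is the behaviour expected of a Butterworth design with a maximally flat passband.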

In this project, data segmentation was not required as the raw data obtained had already been divided into 10 s segments.

II. COMPLEXITY ANALYSIS

It has been observed that a normal and healthy biomedical system is highly complex, and once an abnormality occurs, as in the case of arrhythmias, its complexity drops. [8] [9] [12] [13] Considering the feature vectors used for complexity analysis of ECG signals in related works, four metrics were selected as features to model the complexity of the ECG signals: signal mobility, signal complexity, fractal dimension and entropy. Age, while typically used in related works [8] [9], could not be used as a feature because the required patient data were not available. The features from complexity analysis are invariant to the lead number of the ECG recording [9], and are thus good standards of comparison for all ECG signals. Each feature and its extraction process are described as follows. A summary of all extracted features from the 12 classes of ECG signals can be found in Fig 4.

A. SIGNAL MOBILITY

Signal mobility is defined as the square root of the ratio between the variance of the first derivative of the signal and the variance of the amplitude of the original signal. [15] [16] Using Hjorth mobility, it is calculated as:

$$\mathrm{var}(y(t)) = \frac{1}{N} \sum_{i=1}^{N} y(i)^2$$

$$\mathrm{var}\left(\frac{dy(t)}{dt}\right) = \frac{1}{N-1} \sum_{i=2}^{N-1} \left(\frac{dy(i)}{dt}\right)^2$$

$$\mathrm{Mobility} = \sqrt{\frac{\mathrm{var}\left(\frac{dy(t)}{dt}\right)}{\mathrm{var}(y(t))}}$$

The mobility parameter quantitatively measures the level of variation in the signal and represents the average frequency of the signal. Mobility is observed to be the highest for NSR, and falls for arrhythmias. [17]
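As an illustration of the definition, a minimal Python sketch of Hjorth mobility is given below. It is not the Matlab® code of Fig 3a: the function names are hypothetical, the derivative is approximated by a first difference, and the variance is taken about the sample mean, which is an assumption that matters little once the baseline has been filtered out.

```python
import math


def variance(values):
    """Population variance about the sample mean (assumption: mean is removed)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def first_difference(signal, dt):
    """Finite-difference approximation of dy/dt."""
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]


def hjorth_mobility(signal, dt=1.0):
    """Square root of var(dy/dt) / var(y)."""
    return math.sqrt(variance(first_difference(signal, dt)) / variance(signal))


# Synthetic check signal (not ECG data): a 5 Hz sine, 10 s at 360 Hz.
fs = 360.0
sine = [math.sin(2 * math.pi * 5.0 * i / fs) for i in range(3600)]
mobility = hjorth_mobility(sine, dt=1.0 / fs)
print(mobility, 2 * math.pi * 5.0)
```

For a pure sine wave the mobility is close to its angular frequency 2πf, which is why the parameter can be read as an average frequency of the signal. It is also unaffected by scaling the amplitude.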

B. SIGNAL COMPLEXITY

Signal complexity refers to the ratio between the mobility of the derivative of the time series and the mobility of the series itself. The mobility of the derivative is obtained via the variance of the second-order derivative. [15] [16] Using Hjorth complexity, it is calculated as:

$$\mathrm{Complexity} = \frac{\mathrm{mobility}\left(\frac{dy(t)}{dt}\right)}{\mathrm{mobility}(y(t))}$$

Complexity represents the change in signal frequency by comparing the similarity of the signal to a pure sine wave, with the value approaching 1 when the signal is most similar to a sine wave. Complexity is observed to be the lowest for NSR, and rises for arrhythmias. [17]
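The sine-wave property can be checked with a short Python sketch of Hjorth complexity. As with any such illustration, this is not the Matlab® code of Fig 3b: the helper names are hypothetical, derivatives are approximated by first differences, and the test signals are synthetic rather than ECG recordings.

```python
import math


def variance(values):
    """Population variance about the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def first_difference(signal, dt):
    """Finite-difference approximation of dy/dt."""
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]


def hjorth_mobility(signal, dt=1.0):
    return math.sqrt(variance(first_difference(signal, dt)) / variance(signal))


def hjorth_complexity(signal, dt=1.0):
    """Mobility of the first derivative divided by mobility of the signal."""
    derivative = first_difference(signal, dt)
    return hjorth_mobility(derivative, dt) / hjorth_mobility(signal, dt)


fs = 360.0
t = [i / fs for i in range(3600)]
pure_sine = [math.sin(2 * math.pi * 5.0 * ti) for ti in t]
# Adding a 40 Hz component makes the waveform less sine-like.
two_tone = [math.sin(2 * math.pi * 5.0 * ti) + math.sin(2 * math.pi * 40.0 * ti) for ti in t]

complexity_sine = hjorth_complexity(pure_sine, dt=1.0 / fs)
complexity_two_tone = hjorth_complexity(two_tone, dt=1.0 / fs)
print(complexity_sine, complexity_two_tone)
```

The pure sine gives a value close to 1, while the two-tone signal gives a clearly larger value, in line with the interpretation of complexity as a deviation from a single sinusoid.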

C. FRACTAL DIMENSION

Fractal dimension (FD) is a measure of the self-similarity of a signal (i.e. detail and irregularity) in its time domain, and quantifies a signal’s complexity by relating a change in detail to a change in scale. FD has been widely reported to aid in identifying cardiovascular diseases. [18] [19] [20]

Let L be the total length of the time series, i.e. the sum of the distances between successive points, and let d be the Euclidean (Pythagorean) distance between the first point in the series and the point farthest from it. If a is the mean distance between successive points and n is the number of steps in the curve, then n = L/a. [21] Using Katz's algorithm, FD is calculated as:

$$\mathrm{Katz\ FD} = \frac{\log_{10}(L/a)}{\log_{10}(d/a)}$$

While there are other variations of FD calculations, Katz's algorithm was chosen due to its relative insensitivity to noise [22], which allows for certain margins of error in ECG denoising. Higuchi’s algorithm, while also commonly used in related work, requires a parameter k_max, which may be difficult to determine for all data signals, and also tends to be more sensitive to noise in the signals.
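A minimal Python sketch of Katz's fractal dimension, written directly from the quantities L, a and d, is shown below. It is not the Matlab® code of Fig 3c. The function name is hypothetical, and treating each sample as a point (time, amplitude) in the plane, with the sample spacing `dt` as the horizontal step, is an assumption; some implementations use amplitude differences only.

```python
import math


def katz_fd(signal, dt=1.0):
    """Katz fractal dimension of a waveform treated as points (i * dt, y_i)."""
    if len(signal) < 3:
        raise ValueError("at least three samples are needed")
    points = [(i * dt, y) for i, y in enumerate(signal)]
    steps = [math.dist(p, q) for p, q in zip(points, points[1:])]
    total_length = sum(steps)                # L: total length of the curve
    mean_step = total_length / len(steps)    # a: mean distance between successive points
    # d: largest distance between the first point and any other point
    max_distance = max(math.dist(points[0], p) for p in points[1:])
    return math.log10(total_length / mean_step) / math.log10(max_distance / mean_step)


# Synthetic check signals (not ECG data).
straight_line = [0.5 * i for i in range(100)]
zigzag = [float(i % 2) for i in range(100)]
print(katz_fd(straight_line), katz_fd(zigzag))
```

A straight line has L equal to d, so its fractal dimension is 1, while the zigzag covers more length for roughly the same extent and therefore scores higher.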

D. ENTROPY

Entropy refers to the randomness of a signal, where a decrease in entropy often indicates a cardiovascular disease (e.g. arrhythmias). [8] [9] [23] Given an ECG signal X with possible outcomes x_1, …, x_N, which occur with probabilities P(x_i), Shannon entropy is calculated as:

$$\mathrm{Shannon\ Entropy} = -\sum_{i=1}^{N} P(x_i) \log P(x_i)$$

Shannon entropy was chosen over generalised alternatives (e.g. Tsallis and Rényi entropies), both of which are related to Shannon entropy via their limits, because it remains the most widely used entropy measure in ECG analysis in related works. [23]
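The text does not state how the probabilities P(x_i) are estimated from a sampled signal, so the Python sketch below makes two explicit assumptions: amplitudes are grouped into a fixed number of equal-width histogram bins (`n_bins`, a hypothetical parameter), and the logarithm is taken to base 2 so that the result is in bits. It is an illustration and not the Matlab® code of Fig 3d.

```python
import math
from collections import Counter


def shannon_entropy(signal, n_bins=16):
    """Shannon entropy (in bits) of the amplitude histogram of a signal."""
    low, high = min(signal), max(signal)
    if high == low:
        return 0.0  # a constant signal has a single outcome and no uncertainty
    width = (high - low) / n_bins
    # Map each sample to a bin index; the maximum value falls into the last bin.
    counts = Counter(min(int((v - low) / width), n_bins - 1) for v in signal)
    total = len(signal)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Synthetic check signals (not ECG data).
constant = [1.0] * 50
two_level = [0.0, 1.0] * 50
print(shannon_entropy(constant), shannon_entropy(two_level))
```

With this estimator, a signal that occupies two levels equally often carries 1 bit, and the maximum possible value is log2(n_bins), so entropies are only comparable between signals binned in the same way.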

Fig 3a. (left) Matlab® code to calculate Hjorth Mobility Fig 3b. (right) Matlab® code to calculate Hjorth Complexity

Fig 3c. (left) Matlab® code to calculate Katz Fractal Dimension Fig 3d. (right) Matlab® code to calculate Shannon Entropy


III. MACHINE LEARNING

The labelled ECG signals from the various arrhythmic classes, together with their feature vectors, are fed into the Classification Learner software from Matlab®, where all classifier models from 7 subclasses are trained to predict the 12 arrhythmic classes using the 4 features.

The 7 subclasses are: Decision Trees, based on the path taken from input to predicted class; Discriminant Analysis, to measure the predicted class as a linear combination of feature vectors; Naive Bayes, which assumes strong independence between features; Support Vector Machine (SVM), to identify the most distinguishing features; K-Nearest Neighbour (KNN), to recognise inter-instance similarity; Boosted Trees, where datasets are weighted to correct misrepresentation of classes; and Bagged Trees, to reduce the variance of a decision tree. [8] [9]

As the dataset used is limited in sample size, k-fold cross validation was performed: the dataset is split into k groups, and each group in turn is used as the test set while the remaining groups are used as the training set. As k gets larger, the difference in size between the training set and the resampling subsets gets smaller, decreasing the bias of the machine learning technique. The value of k=10 was chosen for this project, as it has been shown empirically to result in a model skill estimate with low bias and modest variance. [24] [25] The classification model that yields the highest accuracy was chosen as the final classifier.
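The splitting procedure can be sketched in a few lines of Python. This is a generic illustration of k-fold partitioning, not a reproduction of what Classification Learner does internally: the function name, the fixed random seed and the unstratified shuffling are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_fold in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test_fold


# With 219 samples and k = 10, every sample is used for testing exactly once.
splits = list(k_fold_indices(219, k=10))
for train, test in splits:
    print(len(train), len(test))
```

Each model is then trained on the training indices and scored on the held-out fold, and the k scores are averaged to give the reported accuracy.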

4. RESULTS AND FINDINGS

I. DATASET

The dataset was obtained from the open-access MIT-BIH arrhythmia database from PhysioNet. The ECG signals comprise 12 classes: normal sinus rhythm (NSR), atrial premature beat (APB), atrial flutter (AFL), atrial fibrillation (AFIB), supraventricular tachyarrhythmia (SVTA), Wolff-Parkinson-White (WPW), premature ventricular complex (PVC), bigeminy, ventricular tachycardia (VT), idioventricular rhythm (IVR), left bundle branch block (LBBB) and right bundle branch block (RBBB). All ECG signals were recorded at a sampling frequency of 360 Hz and a gain of 200 adu/mV. The signals are segmented into 10-second (3600 samples) fragments, and only signals derived from one lead, the MLII, were used. A total of 219 samples were utilised, with instance information for the respective classes summarised in Table 1.

II. FLAT MACHINE LEARNING RESULTS

Table 2 presents the % accuracy of the ECG class predicted by the 7 classifiers. Ranked by overall accuracy, in decreasing order, the classifiers are: SVM (1), KNN, Discriminant Analysis, Bagged Trees, Boosted Trees, Decision Trees and Naive Bayes (7). Thus, Cubic SVM was chosen as the final classification module for this project.
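For readers unfamiliar with the term, "cubic" refers to a polynomial kernel of degree 3, which lets an SVM draw curved decision boundaries in the 4-dimensional feature space. The Python sketch below shows one common form of that kernel. The exact form and scaling used by the Matlab® preset are not stated in this paper, so the `(1 + x·z)^3` form, the function name and the example feature values are all assumptions for illustration.

```python
def cubic_kernel(x, z):
    """Degree-3 polynomial kernel (1 + x.z)^3 between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, z))
    return (1.0 + dot) ** 3


# Hypothetical feature vectors: [mobility, complexity, fractal dimension, entropy].
signal_a = [0.20, 1.50, 1.10, 3.20]
signal_b = [0.25, 1.40, 1.05, 3.00]
print(cubic_kernel(signal_a, signal_b))
```

The kernel value acts as a similarity score between two signals; the SVM combines these scores over its support vectors to decide the class.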


Table 1. Summary of instances in dataset

Table 2. Summary of prediction accuracy of trained classification modules

From the results, it can be seen that accuracy of prediction is not correlated with sample size, which may be attributed to the k-fold cross validation performed, which helped to lower the classifier bias despite differences in sample sizes for each class. Additionally, the trend in accuracy of prediction for each class tended to be similar for all classifiers, with LBBB ranking amongst the most accurately predicted arrhythmic classes across all classifiers, and APB being classified the least accurately. However, SVM managed to predict APB signals with 52.4% accuracy, ranking the highest amongst all classifiers.

A detailed performance analysis of the Cubic SVM classifier module is presented using a confusion matrix across all 12 classes in Fig 5, where diagonal elements represent accurately classified instances, and anything off the diagonal represents a misclassification.
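How per-class and overall accuracy are read off a confusion matrix can be sketched as follows. The Python functions and the toy labels are hypothetical and are not the data behind the figures; rows are taken to be the true class and columns the predicted class, which is an assumption about the orientation of the plotted matrices.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count matrix with true classes as rows and predicted classes as columns."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_accuracy(matrix):
    """Fraction of each true class that lies on the diagonal."""
    return [row[i] / sum(row) if sum(row) else float("nan") for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total number of instances."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Toy example with two classes (hypothetical labels).
truth = ["NSR", "NSR", "PVC", "PVC", "PVC"]
predicted = ["NSR", "PVC", "PVC", "PVC", "NSR"]
cm = confusion_matrix(truth, predicted, classes=["NSR", "PVC"])
print(cm, per_class_accuracy(cm), overall_accuracy(cm))
```

Column sums, by contrast, show how often each class was predicted, which is how one checks whether any signals were wrongly assigned to a given class.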

Confusion matrices of all other classifiers are shown below in Figures 6a-f. The Cubic SVM classifier has the highest accuracy rate for predicting LBBB (93.3%), with the lowest accuracy in predicting SVTA (44.4%). This may be attributed to higher intra-class variance of features extracted from SVTA signals, making intra-class comparison difficult. Additionally, there were no instances of ECG signals wrongly classified as IVR and VT, but there were a total of 9 misclassified instances of AFIB.

Nevertheless, with an overall accuracy of 78.1%, this is a satisfactory model for a first iteration, and accuracy is expected to improve with further testing and training using larger datasets.


Fig 5. Confusion matrix for Cubic SVM

Fig 6a (left). Confusion matrix for Decision Tree Fig 6b (right). Confusion matrix for Discriminant Analysis


Fig 6c (left). Confusion matrix for Naive Bayes classification Fig 6d (right). Confusion matrix for KNN classification

Fig 6e (left). Confusion matrix for Boosted Trees classification Fig 6f (right). Confusion matrix for Bagged Trees classification

5. CONCLUSION

In conclusion, this research has identified a novel ECG classification module using machine learning techniques involving complexity analysis of ECG signals. ECG signals are classified using the Cubic SVM classifier, based on 4 features: signal mobility, signal complexity, fractal dimension and entropy.

This classification module makes use of ECG signals obtained from only lead II, and can be performed in real-time. Additionally, as the complexity analysis does not involve the morphological features of the ECG and instead focuses on its intrinsic functions that differ between various arrhythmic classes, it is more suited for machine learning and predictions.

The proposed classifier is successful in classifying ECG signals into 12 classes: AFIB, AFL, APB, Bigeminy, IVR, LBBB, NSR, PVC, RBBB, SVTA, VT, and WPW, with 78.1% accuracy on average.

I. LIMITATIONS

Traditionally, ECG signals are obtained from 12 leads attached to the body. However, this project utilises ECG signals obtained only from lead II, thus the method proposed may not give an optimal solution with other or all 12 leads, and further experiments need to be performed to validate the proposed method.

Additionally, while age has been correlated with changes in ECG waveforms and an increase in the incidence of cardiac arrhythmias, the ages of the patients could not be used as a feature for complexity analysis due to insufficient data. A more accurate classification module may have been developed had this data been available.

II. FUTURE WORKS

While a novel classifier has been identified for ECG classification via machine learning techniques, it remains in its testing phase, and improvements in prediction accuracy are necessary before it can be adopted in medical applications. This can be done by extracting a larger number of features for complexity analysis, which can be derived from related works, such as autocorrelation and age. These features have been shown empirically to be effective in classifying ECG signals.

In addition, a stacking machine learning approach could be built on top of the flat approach to yield higher accuracy. This involves aggregating the class probabilities produced by each classifier into new feature vectors that are fed into a single meta-classifier, whose output is a new feature that is appended to the initial feature vector and used for regression analysis to predict the ECG class. [8] [9]

Additionally, for future works, the study can expand its dataset to include a larger number of signals as well as more classes of arrhythmias on top of the current 12 for a more accurate and specific ECG classification module.

This ECG classification module can be combined with recent BP estimation models [8] [9] to yield a high performing physiological sensor that can be used for continuous health monitoring given the continuous nature of ECG signals and the prevalence of ECG-enabled wearables such as Apple Watch and Fitbit, aiding in the transition to the age of digital health.

6. ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to Associate Professor Gwee Bah Hwee and Dr Adrian Victor for their invaluable guidance and support throughout this project, as well as the opportunity to venture into the field of machine learning in translational medicine (diagnostics) research.

REFERENCES

[1] A. Ullah, S. Muhammad Anwar and M. Bilal, "Classification of Arrhythmia by Using Deep Learning with 2-D ECG Spectral Image Representation," Remote Sensing, 25 May 2020.

[2] I. Mykoliuk, Daniel Jancarczyk, Mikolaj Karpinski and Viktor Kifer, "Machine Learning Methods in Electrocardiography Classification," CEUR-WS, vol. 2300, p. 4, 2018.

[3] S. H. Jambukia, Vipul K. Dabhi and Harshadkumar B. Prajapati, "Classification of ECG signals using Machine Learning Techniques: A Survey," pp. 1-9, 2015.

[4] S. Ortín, Miquel Alfaras and Miguel C. Soriano, "A Fast Machine Learning Model for ECG-Based Heartbeat Classification and Arrhythmia Detection," Frontiers in Physics, vol. 7, no. 103, July 2019.

[5] Y. Ji, Sen Zhang and Wendong Xiao, "Electrocardiogram Classification Based on Faster Regions with Convolutional Neural Network," MDPI, Beijing, China, 2019.

[6] Z. Ebrahimi, Mohammad Loni, Masoud Daneshtalab and Arash Gharehbaghi, "A review on deep learning methods for ECG arrhythmia classification," Elsevier Ltd., Shahroud, Iran, June 2020.

[7] F. I. Alarsan and Mamoon Younes, "Analysis and classification of heart diseases using heartbeat features and machine learning algorithms," Journal of Big Data, vol. 6, no. 81, 31 August 2019.

[8] M. Simjanoska, Martin Gjoreski, Ana Madevska Bogdanova, Bojana Koteska, Matjaž Gams and Jurij Tasic, "ECG-derived Blood Pressure Classification using Complexity Analysis-based Machine Learning," Proceedings of the 11th International Joint Conference on Biomedical Engineering Systems and Technologies (BIOSTEC 2018), vol. 5, pp. 282-292, 2018.

[9] M. Simjanoska, Martin Gjoreski, Matjaž Gams and Ana Madevska Bogdanova, "Non-Invasive Blood Pressure Estimation from ECG Using Machine Learning Techniques," Sensors, vol. 18, no. 1160, April 2018.

[10] F. Buendía-Fuentes, "High-Bandpass Filters in Electrocardiography: Source of Error in the Interpretation of the ST Segment," Hindawi, 2012.

[11] R. Rangayyan, Biomedical Signal Analysis, NJ, USA: John Wiley & Sons, 2015.

[12] Y. Luo, Hargraves, R.H., Belle, A, Bai, O, Qi, X, Ward, K.R, Pfaffenberger, M.P. and Najarian, K, "A hierarchical method for removal of baseline drift from biomedical signals: Application in ECG analysis," Science World Journal, pp. 1-10, 2013.

[13] J. Bhattacharya, "Complexity analysis of spontaneous EEG.," Acta Neurobiol. Exp, no. 60, pp. 495-502, 2000.

[14] H. Zhang, Zhu, Y.S. and Wang, Z.M, "Complexity measure and complexity rate information based detection of ventricular tachycardia and fibrillation," Med. Biol. Eng. Comput., no. 38, pp. 553-557, 2000.

[15] J. P. R. Leite and Robson L. Moreno, "Heartbeat classification with low computational cost using Hjorth parameters," IET Signal Processing, vol. 12, no. 4, pp. 431-438, June 2018.

[16] H. Naseri, M. R. Homaeinezhad and H. Pourkhajeh, "An expert electrocardiogram quality evaluation algorithm based on signal mobility factors," Journal of Medical Engineering and Technology, 2013.

[17] I. Wannawijit, Suvimon Kaiwansil, Sutthisak Ruthaisujaritkul and Thaweesak Yingthawornsuk, "ECG Classification with Modification of Higher-Order Hjorth Descriptors," in 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Thailand, 2019.

[18] K. Kiani and Farzane Maghsoudi, "Classification of 7 Arrhythmias from ECG Using Fractal Dimensions," Journal of Bioinformatics and Systems Biology, 2019.

[19] G. G. Dávalos and Pedro Freddy Huamaní Navarrete, "Application of Fractal Algorithms to Identify Cardiovascular Diseases in ECG Signals," Advances in Science, Technology and Engineering Systems Journal, vol. 4, no. 5, pp. 143-150, 2019.

[20] P. Nayyer, "Analyzing Electrocardiography (ECG) Signal using Fractal Method," International Journal of Current Engineering and Technology, vol. 7, no. 2, 2017.

[21] D. M. Garner, Naiara Maria de Souza and Luiz Carlos M. Vanderlei, "Heart Rate Variability Analysis: Higuchi and Katz’s Fractal Dimensions in Subjects with Type 1 Diabetes Mellitus," Ilex Publishing House, Bucharest, Romania, 2018.

[22] C. K. Loo, Andrews Samraj and Gin Chong Lee, "Evaluation of Methods for Estimating Fractal Dimension in Motor Imagery-Based Brain Computer Interface," Hindawi, 2011.

[23] José M. Amigó and Sámuel G. Balogh, "A Brief Review of Generalized Entropies," MDPI, Spain, 2018.

[24] G. James, Daniela Witten, Trevor Hastie and Robert Tibshirani, An Introduction to Statistical Learning: with Applications in R, Springer.

[25] M. Kuhn and Kjell Johnson, Applied Predictive Modeling, Springer, 2018.