Procedia Computer Science 47 ( 2015 ) 222 – 228
1877-0509 © 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014) doi: 10.1016/j.procs.2015.03.201
ScienceDirect
Available online at www.sciencedirect.com
Naïve Bayes Classifier for ECG abnormalities using
Multivariate Maximal Time Series Motif
Dr. S. Padmavathi
a, E. Ramanujam
b1aDepartment of Computer Science and Engineering, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India- 625015. bDepartment of Information Technology, Thiagarajar College of Engineering, Madurai, Tamil Nadu, India- 625015.
Abstract
Analyze or diagnose of Cardiovascular activity under abnormal heart beat is extremely an intricate and vital job to the medical experts, made more complicated to a novice persons. Electrocardiogram is a way to measure or diagnose for research on human beings to spot heart disease by abnormal heart rhythms. These streaming medical signals can be well analyzed or diagnosed only with the prior knowledge. This paper proposes the methodology; Multivariate Maximal Time Series Motif with Naïve Bayes Classifier to classify the ECG abnormalities. The proposed model of predicting Time Series Motif is evaluated with the dataset contains the collection of ECG signals of patients recorded using Holter Monitor. The efficiency of the proposed work is proved by comparing the precision of existing with various Feature extraction Techniques.
© 2015 The Authors. Published by Elsevier B.V.
Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014).
Keywords: Electrocardiogram; Time Series Motif; Naïve Bayes; Cardiovascular;
1. Introduction
The psychoanalysis of the Electrocardiogram (ECG) signal provides a non-invasive and inexpensive technique to examine the core functions for different cardiac conditions. The development of accurate and fast methods for automatic ECG classification is really significant for clinical diagnosis. ECG represents the electrical action of the heart showing the regular contradiction and relaxation of the muscle. In the past decades the computerized analysis of the ECG became a well-established practice, and many systems were achieved to assist cardiologists in the chore of analyzing long-term ECG recordings. One important analysis performed in the ECG is the classification of ECG signals. This classification can be made in terms of classifying the heart beats as like Normal (N), Left Bundle Branch Block (LBBB), Right Bundle Branch Block (RBBB), Premature Ventricular Contradictions (PVC), and Atrial escape beat (e) etc.,. In general, the signals with larger Normal beats are classified as Normal and those with LBBB, RBBB and others are abnormal. Hence the concept of Multivariate Maximal Time Series Motif Mining is proposed with Context Free Grammar (CFG) based classifier to classify the signal as Normal and Abnormal. Many Algorithms for ECG classification were proposed in [1-15].
1 Corresponding Author. Tel.: +91 97881 53765; E-mail address: [email protected]
© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of organizing committee of the Graph Algorithms, High Performance Implementations and Applications (ICGHIA2014)
Automatic Detection of ECG Signals and arrhythmia Classification methods are based on Wavelet Coefficient [1], Autoregressive Modeling [2], Radial Basis Function (RBF) Neural Networks [3], Self Organizing Map (SOM) [4], and fuzzy c-means clustering techniques [5]. Stamkopoulos et al. [6] developed an Ischemia Detection method using Nonlinear Principal Component Analysis (PCA) Neural Networks. Their results produce an approximately 80% classification rate for the normal beats which is higher than 90% for the ischemic beats. Dokur and Olmez proposed a novel hybrid neural network structure for the classification of the electrocardiogram (ECG) beats and achieved 96% of accuracy to classify ten types of ECG beats [7]. Zhou [8] used Quantum Neural Networks for the Detection of Premature Ventricular Contractions (PVC) beat of ECG signal. Inan [9] also proposed a Robust Neural-Network-Based Classification of PVC and the accuracy was 95.16% over 40 files and 96.82% over the 22 files. Jiang and Kong [10] developed a Block-Based Neural Networks for ECG Signal Classification with high average detection accuracy. Melgani and Bazi [11] were classified ECG signals with high accuracy using Support Vector Machine (SVM) and Particle Swarm Optimization (PSO). Jadhav et al. [12], proposed a method of ECG Arrhythmia Classification using Modular Neural Network Model. Llamedo and Martınez [13], used Feature Selection Driven by database Generalization Criteria for Heartbeat Classification. Mai et al. [14], used MultiLayer Perceptron(MLP) and RBF Network for ECG classification and achieved 98% classification accuracy using MLP and 97% using RBF. Homaeinezhad et al. [15] proposed ECG arrhythmia recognition based on neuro-SVM– KNN hybrid classifier. Eduardo José da S. Luz [16] proposed ECG classification based on supervised graph based pattern recognition technique optimum-path forest classifier, compared the performance with three well known experts –SVM, Bayes classifier and Multilayer artificial Neural Network. Tang [17, 18] has proposed ECG signal classification based on RS and Quantum neural Networks and compared the performance against Back Propagation (BP) and RBF for the selected ECG signals from MIT/BIH database.
Most of the work has achieved or increased the performance in terms of either of measures like sensitivity, specificity, Accuracy, precision, Positive Prediction and Harmonic Mean. The significance of this work is to provide an automated Time Series classification with higher performance measures. In this work, ECG signal classification consists of step such as Pre-processing, Feature Extraction, Classification. The signal from the ECG signal database [19] is presented for Pre-processing. The original ECG signal should be pre-processed using Baseline wander removal, symbolic discretization using R-R interval [20]. In Feature extraction phase, the symbolic discretized signal is used to mine the Multivariate Maximal Time Series Motif. Finally NB classifier is applied to predict the classification based on Multivariate Maximal Motif of Time Series Signal. The paper is organized as follows. Section 2 describes the proposed methodology. Section 3 explains the classification using NB classifier with Proposed. Section 4 presents the Experimental results. Finally concludes in section 5.
2. Proposed Methodology
The proposed work classifies the abnormalities in ECG signals through the following steps. 1. Signal Processing 2. Feature Extraction from the Time series dataset.
2.1. Signal Preprocessing
ECG signal is a Time Series sequence say which is an ordered set of real-valued numbers with equal intervals of time t. The signals are pre-processed from noise for better accuracy and further discretized it to symbolic characters for efficient mining. The noise artifacts that generally affects ECG signals is baseline wandering. Normally it appears from respiration and lies between 0.15 and 0.3Hz. The baseline wander of ECG signal is eliminated using moving average window which diminishes the irregularities in beats. For efficient mining of Time Series Motif, the ECG signal is transformed to string representation as proposed by Syed et al[20]. This process segments the original signal into intervals and then assigns an alphabetic label to each token. Task of assigning labels has various approaches; here the clinical information is used to label the partitioned segmented tokens into equivalence classes. It provides a set of symbols that have a fixed meaning in a medical context. ECG signal is decomposed into R-R intervals where each R-R interval corresponds to the period between two successive contractions of the
heart ventricles that can be labelled using existing annotations of electrophysiological activity.
2.2. Feature Extraction
The Multivariate Maximal Time Series Motif detection is the important method in knowledge discovery of the proposed work. Time Series Motifs are the frequently occurring patterns in the discretized Time Series Signal. Given L be the length of the sequence, N be the number of sequences, with a user defined threshold ‘ ’. The pattern is a subset of represented as where
. The Pattern is said to be a Time Series Motif , if and only if the . is the number of sequences containing in the Time Series Records and is the user defined minimum support threshold. The Time Series Motif , is said to be Maximal if and only if no superset is frequent.
Example
Table 1 refers the Time Series database with attributes like Record No. and the discretized sequence, where the user defined threshold value is given as .
Table 1. Sequence Database Record No Sequence 10 AGACTGAACC 20 GACTAGCAGT 30 GCAATTGCAG 40 AACAGTACAA 50 GACTAACGAT
According to the definition of pattern, the candidate patterns of with their supports are shown in Table 2.
Table 2. Pattern with support value
Pattern Pattern A 5 GT 2 G 5 GC 2 T 5 ……. ……. C 5 …… ……. AA 4 AAA 0 AG 4 AAG 0 AC 4 AAT 1 AT 2 AAC 3 GA 3 AGA 1 GG 0 ……… ……….
As per the definition of Time Series Motif, the pattern with the support value greater than is discovered as Time Series Motif. From Table 2 it has been discovered and shown in Table 3.
Table 3. Time Series Motif with support value greater than 3
Pattern Pattern A 5 GA 3 G 5 CA 3 T 5 TA 3 C 5 CT 3 AA 4 CAG 3 AG 4 GAC 3 AC 4 GACT 3 AAC 3 ACT 3
Table 4. Maximal Motif Pattern Pattern AAC 3 GACT 3 ACT 3 TA 3 AG 4 CAG 3 - - CT 3
KDD is a sequential process of mining data from the large database also be defined as the non-trivial extraction of previously unknown or potentially useful information of data stored in the databases. This paper proposes Multivariate Maximal Time Series Motif detection, which integrates Tree construction and Support Based Mining. Tree construction constructs a compact TRIE like data structure to store the commonly occurring frequent patterns from the root to a leaf node. Each level of the TRIE contains the ∑
number of nodes with the positions of the frequent patterns as a pointer points to sequence position in the sequence. The node has been updated to TRIE if and only if the node satisfies the constraint. While constructing the TRIE, the Support Based Mining performs the prepruning of TRIE using a constraint named anti-monotonic property. The anti-monotonic property states that “if a pattern is infrequent, then all of its super pattern must be infrequent”. The property is to restraint the growth of the TRIE to mine the frequent pattern, to reduce the search space of the TRIE. The anti-monotonic property purely depends on the minimum support threshold given by the user. Basically, TRIE construction is an extension to FP-Growth technique [21], as it is easy to mine the short or longer frequent patterns. The Maximal Motifs are extracted from the root to leaf node of the TRIE to predict for classification. The Tree construction with Support Based pruning and Maximal Motif extraction are shown in Fig1, Fig2 respectively.
Fig. 2. Pseudo code for Support Based Pruning
Fig. 1. Pseudo code for TRIE Construction
3. Naïve Bayes Classifier
Classification is a form of data analysis that extracts models describing important data classes. Classification has numerous applications, including fraud detection, target marketing, performance prediction, manufacturing, and medical diagnosis because of high accuracy, prediction rate and automatic
Variable : - TRIE Tr - Node with Name X
- Number of sequences containing
DB- Discretized Sequence of Time Series Signals Initialize // Root Node of TRIE Tr
Procedure: Growth(DB, ,Tr) foreach i in N of DBdo foreachj in L of DBdo if then insert into ( ( end end end if ( then insert into Tr end foreach in Tr do Tr -Growth end Procedure: Tr -Growth foreach i in do if then insert into end end if ( then insert into end
method to search for hypothesis. Algorithms like SVM, NB, decision trees, hiking etc are used for diverse sorts of classification and prediction. In this phase, the Naïve Bayes (NB) classifier is used to classify the ECG signal as Normal and Abnormal implemented in Rapid Miner tool. NB classifiers are statistical classifiers can predict class membership probabilities such as the probability that a given tuple belongs to a particular class. Bayes classifiers have also exhibited high accuracy and speed when applied to large databases. Naïve Bayes classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence. It is made to simplify the computations involved and, in this sense, it is considered as naïve. Table 5 shows the Classification table format.
Table 5. Classification Table Format
Record No Record Value ……. Class 101 XXXXYYYY XX YY XYX A
102
…
..
..
…
.
4. Experimental Results
In this section, the proposed methodology is tested over a MIT–BIH Arrhythmia database. The dataset is taken from Physionet [19,23] and the proposed technique is evaluated using Machine learning classifier Naïve Bayes algorithm in Rapid Miner software package. The source of the ECGs of MIT–BIH Arrhythmia was obtained by the Beth Israel Hospital Arrhythmia Laboratory. This database contains 48 half-hour excerpt of Two-channel, 24-h, ECG recordings partitioned into two parts first one is of 23 files (numbered from 100 to 124 inclusive with some numbers missing) selected at random from this set, and another one contains 25 files (numbered from 200 to 234 inclusive, again with some numbers are absent). Each of the 48 records is slightly over 30 min long [18]. Among 48 recordings, 24 recordings are normal and 21 are abnormal classes and the recordings which have paced beat and 15 records are considered for the work. In which 10 records are Normal and 5 records are abnormal. Classification results are shown in terms of confusion matrix for the Naïve Bayes in Table 6. According to Table 6, all normal and abnormal classes are classified correctly and some misclassifications are also available. The classification performance was achieved with 93.33% accuracy.
Table 6. Confusion matrix of NB Classifier.
Classifier Output/ target Normal class Abnormal class Accuracy %
Naïve Bayes
Normal Class 10 0 100
Abnormal Class 1 4 80
Total 11 4 93.333
The performance of the proposed algorithm and classifier performance for classifying is measured using the standard familiar metrics like Sensitivity (Se), Specificity (Sp), Precision (Pr), Positive Predictive Value(PPV), Negative Predictive Value(NPV) were assessed using True Positive (TP), True Negative(TN), False Negative (FN) and False Positive (FP). Since it is remarkable to estimate the performance of classifiers based on the recognition of abnormal beats, the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) are defined more appropriately as follows:
FP: Normal class classifies as abnormal. TP: Abnormal class classifies as abnormal. FN: Abnormal classifies as normal. TN: Normal class classifies as normal.
The most crucial metric for determining overall system performance is usually accuracy. The definition of the overall accuracy of the classifier is as follows:
Table 7. Performance Measure of NB Classifier
Classifier TP FP Pr Se % Sp % PPV NPV Naïve Bayes 80 0 100 91 100 100 80
This section shows the performance comparison of the proposed work against the [22], [23] and [24]. As mentioned in the existing works, comparison is evaluated with some of records of the MIT–BIH arrhythmia
database. . Existing models are trained from 20
other different waveforms of the MIT-BIH arrhythmia database for each signal. Table 8 shows the performance comparison of the proposed against various Neural Network and Back Propagation implemented in existing work using Precision as performance measure. The algorithm in [22] and [23] proposes Wavelet Transformation (WT) as feature extraction combined with Principal component analysis (PCA) as Feature Selection method to combine with various classifiers. Table 8 clearly shows that the Proposed Time Series Motif detection as Feature extraction when combined with NB classifier it outperforms better than the existing Feature selection with classifiers. Fig 3 shows the performance of NB Classifier when compared to others with the precision rate of 98 when used with the proposed methodology.
Table 8. Experimental Comparison of Proposed with Neural Network Models. Feature extraction Classifier Training % Precision
RR[24] Back Propagation 100 83.4 RR[24] Radial Basis Functions 100 86.6
WT + PCA [22] QNN 100 91.7
WT+PCA[23] SVM 100 92.0
Proposed NB Tree 100 98
Fig 3. Performance comparison
5. Conclusion
In this paper, Multivariate Maximal Time Series Motif with Naïve Bayes Classifier is proposed to classify the ECG abnormalities in absence of prior knowledge. Also it achieves the accuracy of 93.33% and the precision rate of 98% than the existing techniques. In future, it is planned to identify beats in AAMI records for each signals.
References
1. P.de. Chazal, B.G. Celler, R.B. Bei. Using Wavelet coefficients for the classification of the electrocardiogram, in: Proceedi ngs of the 22nd Annual EMBS International Conference, Chicago IL, July 23-28, 2000.
83.4 86.6 91.7 92 98 80 85 90 95 100 BP RBF RS-QNN PD-SVM NB Precision
2. N. Srinivasan, D.F. Ge, S.M. Krishnan. Autoregressive modelling and classification of cardiac arrhythmias, in: Proceedings of the second joint Conference Houston, TX, USA, October 23-26, 2002.
3. Hafizah Hussain, Lai Len Fatt. Efficient ECG Classification using sparsely connected radial basis neural network, in: Proceedings of the 6th WSEAS International Conference on Circuits, Systems, Electronics, Control and signal Processing, December 2007, pp.
412-416.
4. Marcelo R. Risk, Jamil F. Sobh, J. Philip Saul, Beat Detection and Classification of ECG using Self Organizing maps, in: Proceedings – 19th International Conference – IEEEIEMBS, Chicago, IL, USA, October 30- November 2,1997.
5. Yuskel. Ozbay, Rahime. Ceylan, Bekir, Karlik, Integration of type-2 fuzzy clustering and wavelet transform in a neural network based ECG classifier, Expert Syst. Appl. 38 (2011) 1004–1010.
6. Telemachos Stamkopoulos, Nicos Maglaveras, Konstantinos Diamantaras, Michael Strintzis, ECG Analysis using nonlinear PCA neural networks for ischemia detection, IEEE Trans.signal process. 46 (11) (1998).
7. Zumray. Dokur, Tamer. Olmez, ECG beat classification by a novel hybrid neural network, Comput. Methods Prog. Biomedicine 66 (2001) 167–181.
8. Jie Zhou, Automatic detection of premature ventricular contraction using quantum neural networks, in: Proceedings of the Third IEEE Symposium on BioInformatics and BioEngineering, 2003.
9. Omer T. Inan, Laurent Giovangrandi, Gregory T.A. Kovacs, Robust Neural-Network-Based classification of premature ventricular contractions using wavelet transform and timing interval features, IEEE Trans. Biomedical Eng. 53 (12) (2006).
10. Wei Jiang, Seong G. Kong, Block-Based Neural Networks for personalized ECG signal classification, IEEE Trans. Neural Networks 18 (6) (2007).
11. Farid Melgani, Yakoub Bazi, Classification of electrocardiogram signals with support vector machines and particle swarm optimization, IEEE Trans. Inf. Technol. Biomedicine 12 (5) (2008).
12. Shivajirao M. Jadhav, Sanjay L. Nalbalwar, Ashok A. Ghatol, ECG Arrhythmia classification using modular neural network model, in: IEEE EMBS Conference on Biomedical Engineering & Sciences (IECBES 2010), Kuala Lumpur, Malaysia, 30th November–December 2, 2010.
13. Mariano Llamedo, Juan Pablo Martınez, Heartbeat classification using feature selection driven by database generalization criteria, IEEE Trans. Biomedical Eng. 58 (3) (2011).
14. Vu Mai, Ibrahim Khalil, Christopher Meli, ECG biometric using multilayer perceptron and radial basis function neural networks, in: 33rd Annual International Conference of the IEEE EMBS Boston, Massachusetts USA, August 30–September 3, 2011. 15. M.R. Homaeinezhad, S.A. Atyabi, E. Tavakkoli, H.N. Toosi, A. Ghaffari, R. Ebrahimpour, ECG arrhythmia recognition via a
neuro-SVM–KNN hybrid classifier with virtual QRS image-based geometrical features, Expert Syst. Appl. 39 (2012) 2047–2058. 16. Eduardo José da S. Luz, Thiago M. Nunes, Victor Hugo C. de Albuquerque, João P. Papa, David Menotti, ECG arrhythmia
classification based on optimum-path forest, 40, 3561-3573, 2013.
17. X. Tang, L. Shu, A Framework of Automatic Analysis System of Electrocardiogram Signals, Int’l J. of Signal Processing, Image
Processing and Pattern Recognition, 7(2), 211-222, 2014.
18. X. Tang, L. Shu, Classification of Electrocardiogram Signals with RS and Quantum Neural Networks, Int’l J. of Multimedia and
Ubiquitous Engineering, 9(2), 363-372,2014.
19. Physiobank archive index, MIT-BIH arrhythmia database. <http:// www.physionet.org/physiobank/database.>
20. Z. Syed , C. Stultz, M. Kellis, P. Indyk, J. Guttag, Motif Discovery in Phyiological Datasets: A Methodology for inferring predictive elements. ACM Trans. On Knowledge Discovery from Data, 4(1)(2), (2010)
21. Han.J., Pei.J and Yin.Y., “Mining Frequent Patterns without Candidate generation, ACM SIGMD Record, vol. 29, no.2, pp. 1-12.
22. X. Tang, L. Shu, A Framework of Automatic Analysis System of Electrocardiogram Signals, Int’l J. of Signal Processing, Image
Processing and Pattern Recognition, 7(2), 211-222, 2014.
23. X. Tang, L. Shu, Classification of Electrocardiogram Signals with RS and Quantum Neural Networks, Int’l J. of Multimedia and
Ubiquitous Engineering, 9(2), 363-372,2014.
24. J. Feng, “ECG Classification Based on Feature-extraction and Neural Network”, Master's thesis, Sichuan Normal University,