
Methods: Data analysis pipeline for V, A, N classification


The Python Scikit-Learn toolkit with the Anaconda distribution was used for the following data analysis steps. Pandas data frames were used to import the tables containing the target and predictor variables, with the AnnotationType variable being the target variable. As the target variable had three classes, the annotation types V, A and N, one-hot encoding was performed to obtain a binary denotation of the annotation types. The data analysis steps for machine learning based classification were as follows:
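The one-hot encoding of the target variable can be illustrated with a minimal sketch. Only the AnnotationType column name and its three values come from the text; the RR_interval column and all of the values in the small table are hypothetical stand-ins for the imported feature table.

```python
import pandas as pd

# Hypothetical miniature of the imported table; only AnnotationType is from the text.
df = pd.DataFrame({
    "RR_interval": [0.80, 0.62, 0.95, 0.78],
    "AnnotationType": ["N", "V", "A", "N"],
})

# One binary indicator column per annotation type (A, N, V).
one_hot = pd.get_dummies(df["AnnotationType"], prefix="AnnotationType", dtype=int)
print(one_hot)
```

Each row carries exactly one 1, in the column of its annotation type.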

The Data analysis pipeline:

• Step 1: Since the scale of the feature values in the feature vectors affected the bias and the variance of the classifier, the feature vectors had to be scaled and normalised accordingly. The Scikit-Learn StandardScaler could have been used on the data so that each feature had a mean of 0 and a standard deviation of 1. StandardScaler works well when the features only need to be brought into a comparable range of values, e.g. roughly [-1, +1]. However, the predictor variables had outliers, with RR-intervals in excess of 1.1 s, PR-intervals in excess of 300 ms and QRS intervals greater than 190 ms, so scaling that was robust to outliers was required. RobustScaler-based scaling was therefore performed with quantile_range set to (15, 85) and with_scaling set to True, so that the outliers would have minimal effect on the classification tasks. The RobustScaler suppressed the effect of the outliers and limited the QRS PSD and PR PSD to the ranges (-15, 10) dB and (-15, 15) dB respectively, as shown in figure 4.9.
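The difference between the two scalers in Step 1 can be sketched on synthetic data. The feature below is an assumed stand-in for a PR-interval in milliseconds, not the MITDB data; only the quantile_range and with_scaling settings are taken from the text.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for one feature (e.g. a PR-interval in ms) with a few outliers.
x = rng.normal(loc=160.0, scale=15.0, size=(500, 1))
x[:5] = 320.0  # outliers in excess of 300 ms

# StandardScaler centres on the mean and divides by the standard deviation,
# both of which are pulled by the outliers.
standard = StandardScaler().fit_transform(x)

# RobustScaler centres on the median and divides by the 15th-85th percentile range.
robust = RobustScaler(quantile_range=(15, 85), with_scaling=True).fit_transform(x)

print("median after RobustScaler:", np.median(robust))
print("largest value after RobustScaler:", robust.max())
```

Note that RobustScaler does not delete or clip any samples: the outliers are still present after scaling, but they no longer influence the centre and scale applied to the remaining values.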

• Step 2: Having performed robust scaling, and since some features contributed very little, only the 'k' best features were considered for the classification task. The SelectKBest feature selector was used to identify the six highest-scoring features (k=6), figure 4.10. The selector uses the f_classif scoring function, which computes the Analysis Of Variance (ANOVA) F-value of each feature with respect to the class labels.
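A minimal sketch of the selection in Step 2 follows. The ten-feature synthetic dataset is an assumption made for illustration; only SelectKBest, f_classif and k=6 come from the text.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic three-class dataset standing in for the scaled feature vectors.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=4, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Keep the six features with the highest ANOVA F-value.
selector = SelectKBest(score_func=f_classif, k=6)
X_selected = selector.fit_transform(X, y)

kept = selector.get_support(indices=True)
print("kept feature indices:", kept)
print("F-values:", np.round(selector.scores_, 1))
```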

Figure 4.9: Standard and Robust scalers distribution and scatter plot of QRS vs PR intervals power spectral densities over RR-Interval

4.3. Methods: Spectral Analysis of V, A, N annotation types 121

Figure 4.10: Feature importance after extended feature extraction algorithm considering power spectral densities and using RandomForestClassifier

• Step 3: A RandomForestClassifier with 500 estimators (the number of trees in the forest) and 'balanced' class weight was used only to determine feature importance (not for the classification task). The 'balanced' mode uses the values of the target variable to automatically adjust weights inversely proportional to the class frequencies in the input data. Owing to the novel feature extraction algorithm, which considered power spectral densities instead of intervals, the PR_PSD (power spectral density of the PR-interval) alone had the highest importance (41%) in the input dataset. The QRS_PSD (power spectral density of the QRS-interval) was almost as important as the QRS-interval. Although the PR-interval is one of the most important features in any ECG classification task for differentiating between PVC, PAC and normal sinus rhythm, its power spectral density turned out to be a more important feature for differentiating between these arrhythmia types.
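The 'balanced' weighting and the importance ranking in Step 3 can be sketched as follows. The class counts are those reported for AnnotationType in Step 4; the feature matrix and the feature names are synthetic placeholders, so the printed importances are not the thesis results.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# 'balanced' weights are n_samples / (n_classes * class_count).
labels = np.array(["A", "N", "V"])
y_full = np.repeat(labels, [2132, 26362, 9877])
weights = compute_class_weight(class_weight="balanced", classes=labels, y=y_full)
print(dict(zip(labels, np.round(weights, 3))))

# Synthetic, imbalanced stand-in for the feature vectors (placeholder names).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, weights=[0.06, 0.69, 0.25],
    random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(
    n_estimators=500, class_weight="balanced", random_state=0, n_jobs=2
)
forest.fit(X, y)

# Rank the features by impurity-based importance (the values sum to 1).
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1], reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```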

• Step 4: Dataset imbalance removal: The target response class variable, AnnotationType, had the following distribution from a total of 38,371 samples: A 2132, N 26,362, V 9877. Considering a 70-30% training-test dataset split, the training set was 24,941 samples and the validation-test set was 13,430 samples. In order to eliminate, or at least reduce, the imbalance of the dataset, over- and under-sampling along with regularisation techniques were applied to the dataset. Several techniques were available, such as adding more samples to the less represented class, resampling, using penalised models, using a variety of performance metrics, generating synthetic samples, or a combination of these techniques. It was not possible to add more samples for the A-type annotation, as the entire MITDB database had no more than 2132 samples. The N-type annotation had the maximum number of samples. A complete oversampling of the underrepresented class or under-sampling of the most-represented class may have been possible, though these may have introduced synthetic values to just one of these class types. As a solution, and to strike a balance between under- and oversampling, Synthetic Minority Oversampling Technique (SMOTE) balancing was performed. SMOTE oversamples the underrepresented classes by generating synthetic samples, and can be combined with under-sampling of the overrepresented class. The fit_sample() method transformed the training set from the original 24,941 samples to 51,405 samples, which corresponds to 17,135 samples per class if the three classes are balanced equally.
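The idea behind SMOTE can be sketched in a few lines of NumPy. This is a hand-rolled illustration of the interpolation step under stated assumptions (synthetic two-dimensional data, k=5 neighbours, every class raised to the majority count), not the library routine used in the thesis; the fit_sample() call mentioned above presumably belongs to the imbalanced-learn package, where current releases name the same method fit_resample().

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic samples by interpolating between a minority
    sample and one of its k nearest minority neighbours (requires k < len(X_min))."""
    rng = np.random.default_rng(rng)
    # Pairwise squared distances between the minority samples.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)              # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    pick = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))             # position along the connecting segment
    return X_min[base] + gap * (X_min[pick] - X_min[base])

rng = np.random.default_rng(0)
# Synthetic imbalanced data; the class sizes are illustrative only.
X = np.vstack([
    rng.normal(0.0, 1.0, (20, 2)),
    rng.normal(3.0, 1.0, (200, 2)),
    rng.normal(-3.0, 1.0, (80, 2)),
])
y = np.repeat(["A", "N", "V"], [20, 200, 80])

labels, counts = np.unique(y, return_counts=True)
target = counts.max()

parts_X, parts_y = [X], [y]
for label, count in zip(labels, counts):
    if count < target:
        synthetic = smote_oversample(X[y == label], target - count, k=5, rng=rng)
        parts_X.append(synthetic)
        parts_y.append(np.full(len(synthetic), label))

X_bal = np.vstack(parts_X)
y_bal = np.concatenate(parts_y)
print(dict(zip(*np.unique(y_bal, return_counts=True))))
```

Because every synthetic point lies on a segment between two existing samples of the same class, the minority classes grow without duplicating existing rows.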

• Step 5: Classification based on SMOTE balancing: Initially, LogisticRegression with Lasso regularisation was attempted by setting the penalty parameter to 'l1'. The Lasso technique was used as it shrinks the coefficients of the less important features to zero, thus removing, or at least reducing, the effect of the less important features.

LogisticRegression(penalty='l1', multi_class='ovr', solver='liblinear')

The multi_class='ovr' option was set for multiclass classification using a One-vs-Rest classifier, and the 'liblinear' solver was chosen as it supports the L1 penalty and handles multiclass problems through the one-vs-rest scheme. Even with SMOTE balancing and Lasso regularisation, the LogisticRegression model tuned with GridSearchCV showed an overall balanced_accuracy score of 90% using 10-fold cross validation, table 4.8.

Hyper-parameters:
C = [0.1, 10, 100, 1000, 10000]
Gamma = [0.001, 0.0001]
Scoring = 'balanced_accuracy'
Cross validation (cv) = 10
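The grid search over the L1-regularised model can be sketched as below, on an assumed synthetic three-class dataset. Two choices differ from the call quoted in the text and are assumptions of this sketch: the one-vs-rest scheme is made explicit with OneVsRestClassifier, since recent Scikit-Learn releases deprecate the multi_class argument, and Gamma is left out of the grid because LogisticRegression has no such parameter.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier

# Synthetic stand-in for the SMOTE-balanced training set.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# L1 (Lasso) penalty with the liblinear solver, one binary model per class.
base = LogisticRegression(penalty="l1", solver="liblinear")

search = GridSearchCV(
    estimator=OneVsRestClassifier(base),
    param_grid={"estimator__C": [0.1, 10, 100, 1000, 10000]},
    scoring="balanced_accuracy",
    cv=10,  # 10-fold cross validation
)
search.fit(X, y)

print("best C:", search.best_params_["estimator__C"])
print("best balanced accuracy:", round(search.best_score_, 3))
```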

It was observed that the precision, recall and f1-score metrics showed significant improvement over the previous classification after SMOTE balancing was performed, as shown in table 4.8. The classification accuracy for the A-type annotation increased from 39% to 87% after SMOTE balancing; the A-type annotation beats were the most underrepresented response class in the entire dataset.

Comparison of accuracy scores before and after SMOTE balancing

Table 4.8: Comparison of classification accuracy scores before and after SMOTE balancing using LogisticRegression

A 'balanced_accuracy' score seemed more appropriate than plain 'accuracy' scoring, as it takes into consideration the class imbalance of the dataset. This was largely due to the linear separation imposed by the LogisticRegression models.
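The gap between the two metrics can be shown with the class distribution reported in Step 4. The classifier in this sketch is a deliberately trivial, hypothetical one that labels every beat as N; it is not one of the models used in the thesis.

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Class distribution reported for AnnotationType: A 2132, N 26362, V 9877.
y_true = np.repeat(["A", "N", "V"], [2132, 26362, 9877])

# Hypothetical classifier that always predicts the majority class.
y_pred = np.full(y_true.shape, "N")

acc = accuracy_score(y_true, y_pred)           # fraction of correct predictions
bal = balanced_accuracy_score(y_true, y_pred)  # mean of the per-class recalls

print(f"accuracy: {acc:.3f}")
print(f"balanced accuracy: {bal:.3f}")
```

Plain accuracy rewards the model for the sheer size of the N class (26362 / 38371), whereas balanced accuracy averages the recalls of A, N and V (0, 1 and 0) and so exposes that two of the three classes are never detected.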

• Step 6: LogisticRegression works well for linear classification; however, it may not be the most appropriate model for nonlinear feature sets. As the feature importance calculations had already been performed in Step 3, a classification model based on feature importance was chosen: the RandomForestClassifier was used in this step as a classifier, along with GridSearchCV to perform hyper-parameter tuning, table 4.9. In addition, StratifiedKFold with 5 splits was used for cross validation (cross_val_score) with balanced_accuracy scoring, which increased the balanced_accuracy, both overall and for the A-type response class, to more than 95%.

A similar experiment was performed using KNeighborsClassifier (k-NN) with k=5, and an overall balanced_accuracy of more than 95% was obtained.

Both RandomForestClassifier and k-NN are prone to overfitting. In order to guard against overfitting, cross validation with StratifiedKFold and balanced_accuracy scoring was used. Rather than relying only on the precision score, scores such as balanced_accuracy, recall and f1-score were obtained as a classification report. The Scikit-Learn cross_val_score cross validation was used on the training as well as the test data sets with StratifiedKFold cross validation with 5 splits, which made sure that every class was represented in each fold in the same proportion as in the full dataset.
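Step 6 can be sketched as follows on an assumed synthetic dataset. To keep the sketch quick, the grid uses 20 and 50 trees rather than the 200 and 500 listed in table 4.9, and max_features is left at its default because the 'auto' value is no longer accepted by recent Scikit-Learn releases; both are choices of this sketch, not of the thesis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the balanced feature set.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Five stratified folds: each fold keeps the class proportions of the full set.
cv = StratifiedKFold(n_splits=5)

# Hyper-parameter tuning of the random forest (reduced grid, see lead-in).
search = GridSearchCV(
    estimator=RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid={"n_estimators": [20, 50], "max_depth": [4, 8], "criterion": ["gini"]},
    scoring="balanced_accuracy",
    cv=cv,
)
search.fit(X, y)

# Cross-validated balanced accuracy of the tuned forest and of k-NN with k=5.
rf_scores = cross_val_score(
    search.best_estimator_, X, y, scoring="balanced_accuracy", cv=cv
)
knn_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=5), X, y, scoring="balanced_accuracy", cv=cv
)

print("best params:", search.best_params_)
print(f"random forest: {rf_scores.mean():.3f} +/- {rf_scores.std():.3f}")
print(f"k-NN (k=5):    {knn_scores.mean():.3f} +/- {knn_scores.std():.3f}")
```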

4.4 Results: Data analysis pipeline for V, A, N classification

On executing the data analysis pipeline from section 4.3.3, an overall classification accuracy score of 97% was observed. The training and test accuracy scores were more than 97% and the prediction accuracy score was more than 96%. The precision for classification of the V-type and A-type annotations was 100%, and for the N-type annotation the precision was 91%, as shown in table 4.9.

The k-NN classifier and the RandomForestClassifier are known to be quick learners and are quite accurate when the data is skewed. Having reduced the dataset imbalance, the classification models could obtain a classification accuracy of more than 97%. In previous experiments, due to the dataset imbalance, it was not possible to obtain a higher classification accuracy, especially for the A-type annotation. Using the features ranked in the feature importance table obtained earlier with RandomForestClassifier, the classification models could obtain 90% precision and recall in classifying the A-type annotation, which was the most under-represented class type before SMOTE imbalance reduction.


GridSearchCV and RandomForestClassifier based classification parameters and results

Parameter grid for GridSearchCV:
    { 'n_estimators': [200, 500], 'max_features': ['auto'], 'max_depth': [4, 8], 'criterion': ['gini'], 'n_jobs': [2] }
RandomForestClassifier:
    n_estimators=500, class_weight="balanced"
GridSearchCV:
    estimator=RandomForestClassifier, scoring='balanced_accuracy'
Cross validation using cross_val_score and StratifiedKFold validation:
    scoring='balanced_accuracy', cv=StratifiedKFold(n_splits=5)
GridSearchCV RandomForestClassifier best params:
    {'criterion': 'gini', 'max_depth': 8, 'max_features': 'auto', 'n_estimators': 200}
Results:
    GridSearchCV RandomForestClassifier training accuracy: 0.974 +/- 0.001
    GridSearchCV RandomForestClassifier test accuracy: 0.974 +/- 0.001
    GridSearchCV RandomForestClassifier prediction accuracy: 0.967378346158

Table 4.9: GridSearchCV and RandomForestClassifier based classification parameters and classification report following SMOTE imbalance reduction for V, A, N annotation types using the features extracted from the consolidated feature extraction algorithm.

The classification model was persisted in binary format using the Scikit-Learn package, section 3.5, and deployed on the target device for prediction. Signal acquisition and conditioning of fresh ECG samples are discussed in the following section. The denoised and filtered ECG samples were converted to a recognisable MITDB format as described in section 4.5.3. It was this MITDB-compatible signal that was provided as an input to the classification model persisted on the target device. The model could then classify between the V, A, N annotation types in real time.

As the research focused on implementing the classification and prediction of arrhythmia on a wearable, resource-constrained device, it was essential that the trained model could be ported and persisted to the target device to make predictions only, without having to train the classifier on the training set again on the device. As the classification model could be ported and executed on the target device, the feature-set fitting and transformation methods, which normally required greater processing power and had larger memory requirements, did not have to be executed on the target device, nor were any of the regularisation, dataset balancing or cross validation tasks performed again on the target device. Since the model was already trained, tested and cross-validated, it performed its classification tasks on the target device with optimal accuracy, section 4.7.

In order to perform the classification tasks on the test ECG waveforms, feature vectors had to be extracted from ECG signals captured in real time, using the same feature extraction algorithm that was used to extract features from the MITDB arrhythmia database. In order to run the feature extraction algorithm on the test ECG waveforms, these waveforms had to be converted to WFDB-compatible records, and prior to that the signal had to be filtered and denoised. The method of real time signal acquisition is presented in the next subsection 4.5, with the details of the signal processing algorithms and input and output parameters.
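The train-once, predict-on-device workflow described above can be sketched as follows. The text does not state which serialisation routine was used, so joblib (which is installed alongside Scikit-Learn) is an assumption here, as are the synthetic data, the reduced forest size and the file name.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Development side: train the classifier once on (synthetic) feature vectors.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "van_classifier.joblib")  # hypothetical file name
    joblib.dump(model, path)        # persist the fitted model in binary form

    # Target-device side: load the model and predict; no fitting is repeated.
    restored = joblib.load(path)

same = np.array_equal(model.predict(X), restored.predict(X))
print("restored model reproduces the predictions:", same)
```

The file should be loaded with the same Scikit-Learn version that wrote it, which matters when the development machine and the target device are provisioned separately.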