Respiratory pattern analysis using SVM - Breathing pattern characterization in patients with re

The respiratory pattern describes the mechanical function of the pulmonary system. One way to characterize the respiratory pattern is through the respiratory time series extracted from the respiratory flow signal. One approach to finding differences between the GS group (patients who can maintain spontaneous breathing) and the GF group (those who cannot) is to analyse respiratory pattern variability. An SVM-based feature selection algorithm optimizes the feature subset for better classification.

4.2.1 Methodology

4.2.1.1 Respiratory pattern characterization

Several time series are obtained from the respiratory flow signal: inspiratory time (TI), expiratory time (TE), breath duration (TTot), tidal volume (VT), fractional inspiratory time (TI/TTot), mean inspiratory flow (VT/TI) and frequency-tidal volume ratio (f/VT). These time series characterize the respiratory pattern.
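The seven series can be derived from three per-breath measurements. The following is a minimal sketch, assuming that TTot = TI + TE and that f is the breathing frequency in breaths per minute (60/TTot); the `Breath` record, the function name and the units are illustrative assumptions, not part of the original method description.

```python
from dataclasses import dataclass


@dataclass
class Breath:
    """One breath cycle (hypothetical record; units are assumptions)."""
    ti: float  # inspiratory time, seconds
    te: float  # expiratory time, seconds
    vt: float  # tidal volume, litres


def respiratory_series(breaths):
    """Return the seven per-breath time series as a dict of lists."""
    names = ("TI", "TE", "TTot", "VT", "TI/TTot", "VT/TI", "f/VT")
    series = {name: [] for name in names}
    for b in breaths:
        ttot = b.ti + b.te   # breath duration (assumed TI + TE)
        f = 60.0 / ttot      # breathing frequency, breaths per minute
        series["TI"].append(b.ti)
        series["TE"].append(b.te)
        series["TTot"].append(ttot)
        series["VT"].append(b.vt)
        series["TI/TTot"].append(b.ti / ttot)
        series["VT/TI"].append(b.vt / b.ti)
        series["f/VT"].append(f / b.vt)
    return series
```

Each list has one entry per breath cycle, so all seven series stay aligned by breath index.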

Each of the seven time series is processed by a running window of consecutive breath cycles, with a width that ranges from 3 to 100 breaths. The mean (m), standard deviation (s), kurtosis (k), skewness (Sk) and interquartile range (I) of the values are calculated for each window. Thus, 35 new time series are obtained for each patient. The optimal window width is selected from the range 3 to 100 using a Mann-Whitney test; the best width is 15, with p < 0.001 in all cases.
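The windowing step can be sketched as follows. This is an illustration under stated assumptions rather than the original implementation: the window advances one breath at a time, the standard deviation is the sample estimate, kurtosis is the non-excess (Pearson) form, and quartiles use the inclusive interpolation of Python's `statistics.quantiles`. The function names are hypothetical.

```python
import statistics


def window_stats(values):
    """Five descriptive statistics of one window of breath values."""
    n = len(values)
    m = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (assumption)
    # Population central moments, used for skewness and kurtosis.
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    sk = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    k = m4 / m2 ** 2 if m2 > 0 else 0.0  # non-excess kurtosis (assumption)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"m": m, "s": s, "k": k, "Sk": sk, "I": q3 - q1}


def running_features(series, width=15):
    """Slide a window of `width` breaths over the series, one breath per step."""
    return [window_stats(series[i:i + width])
            for i in range(len(series) - width + 1)]
```

Applying `running_features` to each of the seven series and collecting the five statistics gives the 7 x 5 = 35 derived series per patient.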

Once the window width has been selected, the data for each patient are analysed independently by applying an algorithm based on the k-means method, which automatically determines the best number of clusters for all patients. For the patients in this study, one main cluster contains most of the patterns and has considerable internal cohesion (low intra-cluster variance) that corresponds to more than 96% for each group.

This result is exploited to reduce the data, so that a single pattern of 35 features is associated with each patient. This pattern is computed as the mean value of the data points in the patient's main (largest) cluster, as found by the k-means clustering algorithm.
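The reduction to one pattern per patient can be sketched with a plain k-means (Lloyd's algorithm). This sketch assumes the number of clusters `k` is given; the automatic selection of the cluster count described in the text is not reproduced here, and all function names are hypothetical.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Component-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: _dist2(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = _mean(members)
    return labels


def main_cluster_pattern(points, k=2):
    """Mean of the points that fall in the largest cluster."""
    labels = kmeans(points, k)
    largest = max(set(labels), key=labels.count)
    return _mean([p for p, lab in zip(points, labels) if lab == largest])
```

Here each point would be one 35-dimensional window pattern of a patient, and the returned mean is the single pattern kept for that patient.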

The patients are distributed as follows: 80% for training and 20% for testing. The best classification result is obtained by applying leave-one-out cross-validation with the following 8 features: s(TE), m(TTot), m(TI), m(TE), s(TTot), I(TTot), m(f/VT), m(TI/TTot). These features are used for the final SVM-based classification process.
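Leave-one-out cross-validation, as used to score a candidate feature subset, can be sketched as below. To keep the example self-contained, a nearest-centroid rule stands in for the SVM; it is not the classifier used in the study, and any trained model exposing the same `fit` interface could be substituted. All names are hypothetical.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT an SVM): returns a predict function."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        # Assign the label of the closest class centroid.
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2
                                       for a, b in zip(x, centroids[lab])))

    return predict


def leave_one_out_accuracy(X, y, fit=nearest_centroid_fit):
    """Train on all patients but one, test on the held-out one, and repeat."""
    hits = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(X_train, y_train)
        hits += predict(X[i]) == y[i]
    return hits / len(X)


def select_columns(X, columns):
    """Keep only the chosen feature columns of every pattern."""
    return [[row[c] for c in columns] for row in X]
```

A feature subset would be scored with `leave_one_out_accuracy(select_columns(X, subset), y)`, and the subset with the highest score retained.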

4.2.1.2 Histogram equalization

A reduction in the overlap between successful and unsuccessful patients (GS and GF ) may be attained if the variances of the features are similar. However, variances cannot always be expected to be similar. To solve this problem, we propose an equalization of the histograms of the previously selected features, as a nonaffine normalization process [111], [112] and [113].

Histogram equalization or cumulative distribution function (CDF) equalization is a nonparametric method to match the CDF of some given data to a reference distribution. The principle of this method is to find a nonlinear transformation to reduce the mismatch of the two signals. This transformation maps the distribution of a signal back to the distribution of the reference signal, and is defined by means of the CDFs of the signals in the process.

The CDF is estimated over equally spaced intervals to obtain more reliable data. Each interval $x \in [q_i, q_{i+1}[$ is represented by $(x_i, F(x_i))$, which corresponds to the average of the scores $x_{ij}$ and the maximum cumulative distribution value $F(x_i)$, both of which are calculated for each interval of the reference signal, given by

$$x_i = \frac{\sum_{j=1}^{k_i} x_{ij}}{k_i}, \qquad F(x_i) = \frac{K_i}{M} \tag{4.1}$$

where $x_{ij} = x \in [q_i, q_{i+1}[$, $k_i$ is the number of data items in the interval $[q_i, q_{i+1}[$, $K_i$ is the number of data items in the interval $[q_0, q_{i+1}[$, and $M$ is the total number of data items. $F(x_i)$ defines the boundaries of the intervals in the CDF that will be equalized. These boundaries $[q'_i, q'_{i+1}[$ limit the interval of values that fulfil the following expression: $F(q_i) \le F(y) < F(q_{i+1})$. All values of $y$ that are in the interval $[q'_i, q'_{i+1}[$ will be transformed to their corresponding $x_i$ value.
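A minimal sketch of this mapping follows. It assumes that the empirical CDF of the data to be equalized is the fraction of values strictly below y, and that empty reference intervals are skipped; the number of intervals and the function names are illustrative choices that the text does not specify.

```python
from bisect import bisect_left, bisect_right


def reference_table(reference, n_intervals=10):
    """Per-interval mean scores x_i and cumulative values F(x_i) = K_i / M."""
    lo, hi = min(reference), max(reference)
    M = len(reference)
    if hi == lo:  # degenerate case: a single interval holds everything
        return [lo], [1.0]
    width = (hi - lo) / n_intervals
    buckets = [[] for _ in range(n_intervals)]
    for v in reference:
        i = min(int((v - lo) / width), n_intervals - 1)  # maximum goes in last
        buckets[i].append(v)
    x, F, cumulative = [], [], 0
    for bucket in buckets:
        if not bucket:
            continue  # empty interval: no representative point (assumption)
        cumulative += len(bucket)             # K_i
        x.append(sum(bucket) / len(bucket))   # x_i, mean of the k_i scores
        F.append(cumulative / M)              # F(x_i) = K_i / M
    return x, F


def equalize(values, x, F):
    """Map each y to the x_i whose CDF band contains the empirical F(y)."""
    ordered = sorted(values)
    n = len(ordered)
    out = []
    for y in values:
        Fy = bisect_left(ordered, y) / n  # fraction strictly below y
        i = min(bisect_right(F, Fy), len(x) - 1)
        out.append(x[i])
    return out
```

With this convention, equalizing the reference feature against its own table returns each value's interval mean, so the reference distribution is left unchanged up to the interval resolution.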

As its reference, the equalization takes the feature that yields the lowest classification error in the leave-one-out cross-validation process, which is the s(TE) feature. Therefore, the CDF of this feature is the reference distribution.

4.2.2 Results

A grid search is performed to find the optimum penalty parameter, C. The minimum C that provides the best classification accuracy is selected (C = 15). Thus, the cost and the generalization error are reduced. An internal n-fold cross-validation shows that the best value of the parameter σ used in the kernel function is σ = 0.5. When all 35 features are used for each patient, the average correct classification rate is 66.67%. A feature selection process is carried out to select the most discriminative feature subset and to remove the remaining noisy features. Both the computational cost and the classification error are reduced. The histogram equalization technique is then applied to the selected features, to match their CDF to the distribution of the most discriminative feature. This study showed a classification accuracy of 80%, a sensitivity of 86.67%, and a specificity of 73.34%.
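For reference, accuracy, sensitivity and specificity follow from the confusion counts as sketched below. Treating GS as the positive class is an assumption made here for illustration, since the text does not state which group is taken as positive; the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="GS"):
    """Accuracy, sensitivity and specificity from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }
```

The function expects at least one patient of each class, otherwise one of the denominators is zero.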

Orthogonal projections with principal component analysis are used to visualize the high-dimensional input space data on a plane. Fig. 4.1 shows the overlapping position of patients of GS and GF. Figs. 4.2 and 4.3 show the final classification of the training set and test set, respectively.

Figure 4.1 Successful (GS) and unsuccessful (GF) patients before the classification. Orthogonal projections with principal component analysis are used to represent the hyperspace data on a plane before the classification.

Figure 4.2 Training set of successful (GS) and unsuccessful (GF) patients after the classification. Appropriate feature selection and histogram equalization are obtained with the training set before the final classification.

Figure 4.3 Testing set of successful (GS) and unsuccessful (GF) patients after the classification. Appropriate variable selection and histogram equalization are obtained with the training set before the final classification.

In document Breathing pattern characterization in patients with respiratory and cardiac failure (Page 109-114)