Shapelet transform results - Mining time-series data using discriminative subsequences

We compare the accuracy of the ensemble classifier to that of the benchmark TSC algorithm, 1NN with dynamic time warping distance (1NNDTW), over 75 datasets. We also compare our approach to the previous best shapelet-based algorithms.

The previous state-of-the-art shapelet algorithms are the Fast Shapelets algorithm [141] and the Logical Shapelets algorithm [130]. The major weakness of both is that they are limited to an embedded decision-tree classifier (see Chapter 3 for discussion of the shapelet decision tree). In theory, the Fast Shapelets approach could be implemented as a transform; as it stands, it exists only as a tree classifier.

5.3.1 Comparison with 1NNDTW

The 1NNDTW algorithm has proved to be effective at accurately classifying time-series data [51]. Time series are classified significantly more accurately by 1NNDTW than by 1NN with Euclidean distance [117], assuming that the width of the warping window, R, is set using cross validation.
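As a rough illustration of this baseline, the sketch below implements 1NN with DTW restricted to a warping window and picks the window by leave-one-out cross validation on the training set. The function names, the treatment of R as an absolute band width, the squared pointwise cost, and the toy series are all assumptions made for illustration; they are not taken from the experiments reported here.

```python
import math


def dtw_distance(a, b, window):
    """DTW distance between sequences a and b under a band constraint.

    `window` is the largest allowed |i - j| between aligned indices
    (an assumed, absolute-width parameterisation of R).
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # band must be wide enough to reach (n, m)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],       # step in a only
                                 cost[i][j - 1],       # step in b only
                                 cost[i - 1][j - 1])   # step in both
    return math.sqrt(cost[n][m])


def classify_1nn(train, query, window):
    """Return the label of the training series nearest to `query`."""
    best_label, best_dist = None, math.inf
    for series, label in train:
        d = dtw_distance(series, query, window)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


def select_window(train, candidates):
    """Pick the candidate window with the best leave-one-out accuracy."""
    best_w, best_acc = None, -1.0
    for w in candidates:
        correct = 0
        for k, (series, label) in enumerate(train):
            rest = train[:k] + train[k + 1:]  # hold out one series
            if classify_1nn(rest, series, w) == label:
                correct += 1
        acc = correct / len(train)
        if acc > best_acc:  # ties keep the earlier (smaller) candidate
            best_w, best_acc = w, acc
    return best_w


# Toy training set (made-up values, for illustration only).
train = [
    ([0, 0, 1, 2], "up"),
    ([0, 1, 2, 2], "up"),
    ([2, 2, 1, 0], "down"),
    ([2, 1, 0, 0], "down"),
]
R = select_window(train, candidates=[0, 1, 2])
predicted = classify_1nn(train, [0, 1, 1, 2], R)
```

With a window of 0 and equal-length series, this distance reduces to the Euclidean distance, which is why a cross-validated window can do no worse on the training data than 1NN with Euclidean distance when 0 is among the candidates.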

Our first experiment compares the ensemble classifier on shapelet-transformed data to 1NNDTW on the raw data. We set R, the width of the DTW warping window, using cross validation. We compare accuracies over all 75 datasets, and test the differences using a Wilcoxon Signed Rank test at a significance level of 0.01.
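For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Wilcoxon signed-rank test on paired accuracies, using the normal approximation with a tie correction and discarding zero differences. It is an illustration under those assumptions, not the procedure or software used to produce the p values reported in this section; the function name and the example numbers are hypothetical. Library implementations such as `scipy.stats.wilcoxon` would normally be preferred, and an exact test is more appropriate for very small samples.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired samples x and y.

    Returns (w_plus, w_minus, p), where p comes from the normal
    approximation. Pairs with zero difference are dropped.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank |d| from smallest to largest, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, w_minus, p


# Hypothetical paired scores for two classifiers on four datasets.
scores_a = [5, 7, 9, 12]
scores_b = [4, 5, 6, 8]
w_plus, w_minus, p_value = wilcoxon_signed_rank(scores_a, scores_b)
significant = p_value < 0.01  # significance level used in this section
```

The test uses only the signs and ranks of the per-dataset differences, so a single dataset with a very large accuracy gap does not dominate the result.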

The results of our test show that the ensemble classifier on shapelet-transformed data is significantly more accurate than 1NNDTW. The p value is 3.92 × 10^-3. Table 8.1 in the appendix presents the complete set of results. The accuracy results are displayed graphically in Figure 5.2.

[Figure 5.2 plot: accuracy of the ensemble classifier on shapelet-transformed data against accuracy of DTW_Rn_1NN, both axes running from 0.3 to 1.0, with regions labelled "DTW_Rn_1NN better here" and "Ensemble Classifier on Shapelet-transformed Data better here".]

Figure 5.2: Comparison of accuracy over 75 datasets between ensemble classifier on shapelet-transformed data and 1NN with DTW distance, warping window size set by cross validation. The ensemble classifier on shapelet-transformed data is better on 44 datasets, 1NN with DTW is better on 27.

Table 5.2 shows the results broken down by problem type. For all problem types but Image, the ensemble classifier on shapelet-transformed data strongly outperforms 1NNDTW. On the Image problems, 1NNDTW marginally outperforms the ensemble, but the difference is not statistically significant. It seems plausible that this is a case of 1NNDTW performing better on Image problems than it does on the other problem types, rather than the ensemble classifier in the shapelet space performing worse. Intuitively, 1NNDTW is well suited to the Image problems we use, as the start and end points of the series are fixed in most cases, and the unrestricted warping window can correct for rotation in the other cases (at least to some extent). The image-outline datasets we use also tend to be less noisy and involve less variation than, for example, the motion-classification datasets. The motion-classification datasets have noise in both indexing and measurement, and have greater variation between series of the same class. These factors diminish the classification accuracy of the 1NNDTW classifier much more than they affect the accuracy of the ensemble classifier in the shapelet space: the shapelet space intrinsically corrects for noise in indexing, is more robust to noise in measurement (because the local features are shorter, and therefore carry less total noise than the whole series), and is less affected by global within-class variation. Hence, the performance of 1NNDTW on image-classification problems is comparable to that of the ensemble classifier in the shapelet space because those problems have fewer of the factors that diminish the accuracy of the 1NNDTW classifier.

Table 5.2: Counts of wins broken down by problem type for ensemble classifier on shapelet-transformed data and 1NNDTW with warping window size set by cross validation.

Problem type     Wins shapelet   Wins DTW   Total
Image                       12         15      27
Motion                       8          4      12
Sensor                      17          6      23
Human sensor                 4          1       5
Simulated                    3          1       4
Total                       44         27      71
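The totals in Table 5.2 can be checked against the 75 datasets with a few lines of arithmetic. The sketch below copies the counts from the table; the variable names are illustrative only.

```python
# Win counts copied from Table 5.2: (problem type, shapelet wins, DTW wins).
rows = [
    ("Image", 12, 15),
    ("Motion", 8, 4),
    ("Sensor", 17, 6),
    ("Human sensor", 4, 1),
    ("Simulated", 3, 1),
]

shapelet_total = sum(shapelet for _, shapelet, _ in rows)
dtw_total = sum(dtw for _, _, dtw in rows)
decided = shapelet_total + dtw_total

# Datasets (out of the 75 compared) not counted as a win for either side.
not_counted = 75 - decided

print(shapelet_total, dtw_total, decided, not_counted)
```

The two win columns sum to 71, so 4 of the 75 datasets are not counted as a win for either classifier in the table.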

5.3.2 Comparison with Logical Shapelets

The Logical Shapelets algorithm [130] is the current best-performing shapelet-based classifier. It finds exact shapelets and deploys them in a more sophisticated variant of the shapelet tree. We compare the accuracy of the ensemble classifier on shapelet-transformed data to that of the Logical Shapelets algorithm on raw data.

We compare over the 31 datasets used in [141]. We have used every dataset from [141], except for ECG200 and Cricket(Small). Cricket(Small) is not in the UCR Repository [165], and ECG200 is a broken dataset that should no longer be included in such comparisons [7]. The datasets, and the accuracies and ranks, are shown in Table 8.5 (see appendix).

We test the differences in accuracy using a Wilcoxon Signed Rank test at a significance level of 0.01. The test shows that the ensemble classifier is significantly more accurate than the Logical Shapelets algorithm. The p value is 1.875 × 10^-6. Figure 5.3 displays the results graphically.

[Figure 5.3 plot: accuracy of the ensemble classifier on shapelet-transformed data against accuracy of Logical Shapelets, both axes running from 0.5 to 1.0, with regions labelled "Logical Shapelets better here" and "Ensemble Classifier on Shapelet-transformed Data better here".]

Figure 5.3: Comparison of accuracy over 31 datasets between ensemble classifier on shapelet-transformed data and Logical Shapelets. The ensemble classifier on shapelet-transformed data is better on 28 datasets, Logical Shapelets is better on 3.

5.3.3 Comparison with Fast Shapelets

The Fast Shapelets algorithm [141] is a development of the original shapelet algorithm [181] that speeds up the shapelet extraction process by discretising the time series using Symbolic Aggregate Approximation (SAX) [115]. The Fast Shapelets algorithm, in contrast to the Logical Shapelets algorithm, does not find exact shapelets. This results in a large increase in speed, which makes experiments with the Fast Shapelets algorithm more tractable than with Logical Shapelets; hence, accuracy results are available on a wider variety of datasets.
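To make the discretisation step concrete, the sketch below shows the basic SAX idea: z-normalise a series, average it over equal-width segments (piecewise aggregate approximation), and map each segment mean to a symbol using breakpoints that split a standard normal distribution into equiprobable regions. This covers only the discretisation; it is not an implementation of the Fast Shapelets algorithm. The function name, the use of the population standard deviation, and the default alphabet are assumptions made for illustration.

```python
import bisect
import statistics


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word of length n_segments."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")

    # 1. z-normalise (a constant series maps to all zeros).
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        normed = [0.0] * n
    else:
        normed = [(v - mean) / std for v in series]

    # 2. Piecewise aggregate approximation: one mean per segment.
    paa = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        paa.append(statistics.fmean(normed[start:end]))

    # 3. Breakpoints that cut N(0, 1) into len(alphabet) equiprobable bins.
    a = len(alphabet)
    standard_normal = statistics.NormalDist()
    breakpoints = [standard_normal.inv_cdf(i / a) for i in range(1, a)]

    # 4. Map each segment mean to the symbol of the bin it falls into.
    return "".join(alphabet[bisect.bisect_left(breakpoints, v)] for v in paa)


# Made-up series, for illustration only.
word_ramp = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
word_step = sax([0, 0, 0, 0, 10, 10, 10, 10], n_segments=2, alphabet="ab")
```

Because many distinct subsequences collapse to the same short word, candidate shapelets can be compared cheaply in the symbolic space, which is consistent with the speed-for-exactness trade described above.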

We compare the accuracy of the ensemble classifier on shapelet-transformed data to that of the Fast Shapelets algorithm on raw data over 44 datasets. The datasets, accuracies, and ranks are listed in Table 8.6 (see appendix). We test the differences in accuracy using a Wilcoxon Signed Rank test at a significance level of 0.01. The test shows that the ensemble classifier is significantly more accurate than the Fast Shapelets algorithm. The p value is 4.522 × 10^-8. The results are displayed graphically in Figure 5.4.

[Figure 5.4 plot: accuracy of the ensemble classifier on shapelet-transformed data against accuracy of Fast Shapelets, both axes running from 0.5 to 1.0, with regions labelled "Fast Shapelets better here" and "Ensemble Classifier on Shapelet-transformed Data better here".]

Figure 5.4: Comparison of accuracy over 44 datasets between ensemble classifier on shapelet-transformed data and Fast Shapelets. The ensemble classifier on shapelet-transformed data is better on 39 datasets, Fast Shapelets is better on 5.
