
The practical evaluation covers motifs of both similar and variable lengths. Several quality measures are used to analyse the results obtained in the experiments.

The usefulness of the proposed method is demonstrated on data sets from various domains and with different properties, such as the amount of noise, the length of the time series, and the length of the motif.

Moreover, four commonly used motif discovery algorithms, namely Mr. Motif [8], Enumeration motif [5], Mueen-Keogh (MK) [24], and the grammar-based method [6], are applied as benchmarks to evaluate the proposed approach.

Quality Measures. A motif is perfect when it matches all the subsequences in the target class/group and no subsequences outside that class. Various quality measures are applied to examine the results of the proposed motif discovery algorithm. There are four possible outcomes when matching a motif $m$ against a subsequence $S_p$: True Positive rate (TP), False Negative rate (FN), True Negative rate (TN), and False Positive rate (FP), each of which can be obtained per class label. Further quality measures can be derived from these four. Sensitivity ($Sn = \frac{TP}{TP + FN}$) measures the proportion of subsequences of the target class that are correctly matched by the motif, and Specificity ($Sp = \frac{TN}{TN + FP}$) calculates the proportion of subsequences outside the target class that are not matched by the motif. Precision ($Pr = \frac{TP}{TP + FP}$) provides the fraction of all subsequences matched by the motif that belong to the target class. F-Measure ($F\text{-}M = 2 \cdot \frac{Pr \cdot Sn}{Pr + Sn}$) considers both precision and sensitivity. Additionally, we obtain the correct motif discovery rate $CR = \frac{n_{+}}{N}$, where $n_{+}$ is the number of correctly detected motifs and $N$ is the number of all motifs.
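The measures defined above can be sketched in a few lines of Python. This is a minimal illustration rather than the evaluation code used in the experiments: the `Counts` container, the function names, and the example counts are all hypothetical, and the convention of returning 0.0 for an empty denominator is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Match counts for one motif/class (hypothetical container)."""
    tp: int  # target-class subsequences matched by the motif
    fn: int  # target-class subsequences missed by the motif
    tn: int  # other subsequences correctly not matched
    fp: int  # other subsequences wrongly matched


def _ratio(num: float, den: float) -> float:
    # Assumption of this sketch: an empty denominator yields 0.0.
    return num / den if den else 0.0


def sensitivity(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fn)          # Sn = TP / (TP + FN)


def specificity(c: Counts) -> float:
    return _ratio(c.tn, c.tn + c.fp)          # Sp = TN / (TN + FP)


def precision(c: Counts) -> float:
    return _ratio(c.tp, c.tp + c.fp)          # Pr = TP / (TP + FP)


def f_measure(c: Counts) -> float:
    pr, sn = precision(c), sensitivity(c)
    return _ratio(2 * pr * sn, pr + sn)       # F-M = 2 * Pr * Sn / (Pr + Sn)


def correct_rate(n_correct: int, n_total: int) -> float:
    return _ratio(n_correct, n_total)         # CR = n+ / N


if __name__ == "__main__":
    # Made-up counts for illustration only; not taken from the experiments.
    example = Counts(tp=8, fn=2, tn=15, fp=5)
    for name, fn_ in [("Sn", sensitivity), ("Sp", specificity),
                      ("Pr", precision), ("F-M", f_measure)]:
        print(f"{name}: {100 * fn_(example):.1f} %")
    print(f"CR: {100 * correct_rate(3, 4):.1f} %")
```

Keeping the four counts in one container makes it straightforward to evaluate each class label separately before the per-class values are combined.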

Test Cases. The Gun data set [25] is gathered from the video surveillance domain. The data set includes two types of motifs: gun draw and gun point. For each motif, 100 examples are considered.

The Swedish leaf data set [25] comes from a project at Linköping University. The data set contains leaves from 15 different Swedish trees. In our experiments, only 3 types of leaves are considered as motifs.

The AutoSense data is gathered from the ongoing research project "Adaptive energy self-sufficient sensor network for monitoring safety-critical self-service-systems" [26]. The focus of this project is the monitoring of security-critical systems, e.g., the identification of criminal attacks on automated teller machines (ATMs). After several relevant attacks on ATMs were identified, these attacks were tested and the results were gathered from sensors connected to the system. The data consists of 24 signals of different lengths, gathered from 8 sensors in 3 different experiments performed on an ATM.

Results. Based on the quality measures above, the results obtained by the proposed method and the benchmark methods for motifs of equal length are given in Tab. 1-3.

The MK algorithm [24] is able to detect only a single motif pair of similar length. It is therefore useful for data containing one motif/pattern. However, the data sets tested here contain motifs/patterns in more than one class. The grammar-based method is also able to identify most of the motifs of equal length, although one needs to find the best parameter combination. For this reason, only the results of Mr. Motif and Enum. Motif are comparable with our results.

Table 1: Evaluation results of equal-length motifs in Gun Data [25], Sn: Sensitivity, Sp: Specificity, Pr: Precision, F-M: F-Measure, CR: Correct motif discovery rate

Method        Sn (%)   Sp (%)   Pr (%)   F-M (%)   CR (%)
SIMD          90.0     90.0     90.0     90.0      90.0
Mr. Motif     70.0     70.0     70.0     70.0      75.0
Enum. Motif   42.5     65.0     54.8     47.9      50.0

Table 2: Evaluation results of equal-length motifs in Swedish Leaf [25], Sn: Sensitivity, Sp: Specificity, Pr: Precision, F-M: F-Measure, CR: Correct motif discovery rate

Method        Sn (%)   Sp (%)   Pr (%)   F-M (%)   CR (%)
SIMD          94.9     96.8     93.3     94.1      95.0
Mr. Motif     54.2     95.8     86.4     66.6      73.3
Enum. Motif   55.9     85.9     66.0     55.9      55.0

Table 3: Evaluation results of equal-length motifs in AutoSense [26], Sn: Sensitivity, Sp: Specificity, Pr: Precision, F-M: F-Measure, CR: Correct motif discovery rate

Method        Sn (%)   Sp (%)   Pr (%)   F-M (%)   CR (%)
SIMD          80.8     97.4     81.9     81.4      80.0
Mr. Motif     25.0     95.2     42.8     31.5      54.1
Enum. Motif   62.5     95.7     65.2     63.8      62.5

As shown, the SIMD method provides better results than the other algorithms. In all tables the specificity is high due to the large number of TN. Also, the precision of SIMD is higher than that of the other methods, which indicates a larger share of correctly found motifs. The F-measure, considered as the overall accuracy measure, is higher for our proposed method than for the others.
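As a quick arithmetic check of how the F-measure combines precision and sensitivity, the Mr. Motif row of Table 2 can be recomputed from its reported values. This is only a sketch: it assumes that the reported F-M is derived directly from the reported Pr and Sn, whereas the text does not state how the values are averaged over classes.

```latex
F\text{-}M = 2 \cdot \frac{Pr \cdot Sn}{Pr + Sn}
           = 2 \cdot \frac{86.4 \cdot 54.2}{86.4 + 54.2}
           = \frac{9365.76}{140.6}
           \approx 66.6\,\%
```

The result agrees with the tabulated 66.6 % and shows that a low sensitivity pulls the F-measure well below the precision of 86.4 %, since the harmonic mean is dominated by the smaller of the two terms.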

Table 4: Evaluation results of the SIMD method for variable-length motifs, Sn: Sensitivity, Sp: Specificity, Pr: Precision, F-M: F-Measure, CR: Correct motif discovery rate

Data           Sn (%)   Sp (%)   Pr (%)   F-M (%)   CR (%)
Gun            72.5     72.5     72.5     72.5      75.0
Swedish Leaf   94.9     95.8     91.8     93.3      93.3
AutoSense      83.3     97.6     83.3     83.3      73.0

In most cases, the Mr. Motif algorithm also provides reasonable results. In the case of the AutoSense data, however, the results are not acceptable. This is due to the representation method applied in Mr. Motif, which is unable to handle noise in the data.

Additionally, in the second experiment the aforementioned methods are tested on motifs of variable lengths. However, only SIMD is able to discover such motifs. Consequently, we are not able to compare our method with the other algorithms for motifs of variable lengths. Nevertheless, the results of the proposed method for motifs of variable lengths are given in Tab. 4. As shown, only the proposed method detects motifs of variable length, which demonstrates the advantage of our algorithm. It should be mentioned that these methods were examined against more data sets in [25], but due to lack of space only the results for three data sets are given in this paper.
