• No results found

We have seen length-invariant motifs and how resampling is at the core of the solution.

This brings us to our first problem with the algorithms, i.e. if the resampling rates are close to 1, the algorithms produce the same results with a standard Matrix Profile motif search. This happens because of the inaccuracy that creeps in due to the resampling method. Since it is impossible to get the exact resampling this problem shall remain and the standard same length motifs would have a smaller Euclidean distance. We could mitigate this problem by introducing weights corresponding to each resampling rate. Secondly, the resampling algorithm used here can be improved upon as hinted earlier. We can locally model the time series and extract needed data points from the model. Another approach would be to use a statistical interpolation method.

The algorithms use ABjoins that require a fixed length parameter for subsequence length.

This issue can be addressed by using one of the umpteen variable length motif discovery algorithms as a wrapper.

CONCLUSION

Motif discovery has been a key component to many time series data mining tasks. Four ways to compute length invariant motifs were discussed; as opposed to variable length motif discovery that tries to find the best subsequence lengths to look for a motif. These four algorithms, Iterative STOMP, Appended STAMP, Pruned STOMP and Piecewise STAMP were compared for efficiency and completion. Their utility was demonstrated through bird song analysis, power consumption data and through analysis on insect behavior. Limitations and problems with the algorithms were presented. Using the algorithms in conjunction with Matrix Profile can lead to a more efficient extraction of motifs.

REFERENCES

[1] Rakthanmanon, T., Keogh, E.J., Lonardi, S. et al. “MDL-based time series clustering”, Knowl Inf Syst (2012) 33: 371.

[2] Truong, C.D. & Anh, D.T. “A novel clustering-based method for time series motif discovery under time warping measure”, Int J Data Sci Anal (2017) 4: 113.

[3] Keogh, Eamonn J., Li Wei, Xiaopeng Xi, Sang-Hee Lee and Michail Vlachos. “LB_Keogh Supports Exact Indexing of Shapes under Rotation Invariance with Arbitrary Representations and Distance Measures.” VLDB (2006).

[4] C. M. Yeh et al., "Matrix Profile I: All Pairs Similarity Joins for Time Series: A Unifying View That Includes Motifs, Discords and Shapelets," 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, 2016, pp. 1317-1322.

doi: 10.1109/ICDM.2016.0179

[5] Y. Zhu et al., "Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins," 2016 IEEE 16th International conference on Data Mining (ICDM), Barcelona, 2016, pp. 739-748.

doi: 10.1109/ICDM.2016.0085.

[6] Zhu, Yan & Yeh, Chin-Chia Michael & Zimmerman, Zachary & Kamgar, Kaveh & Keogh, Eamonn. (2018). Matrix Profile XI: SCRIMP++: Time Series Motif Discovery at Interactive Speeds. 837-846. 10.1109/ICDM.2018.00099.

[7] N. Begum and E. Keogh. Rare time series motif discovery from unbounded streams.

Proceedings of the VLDB Endowment, 8(2):149–160, 2014.

[8] B. Chiu, E. Keogh, and S. Lonardi. Probabilistic discovery of time series motifs. In Proceedings of the 9th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 493–498, 2003.

[9] J. Meng, J. Yuan, M. Hans, and Y. Wu. Mining motifs from human motion. In Proc. of EUROGRAPHICS, volume 8, 2008.

[10] Y. Zhu, M. Imamura, D. Nikovski and E. Keogh, "Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining (Best Student Paper Award)," 2017 IEEE

International Conference on Data Mining (ICDM), New Orleans, LA, 2017, pp. 695-704. doi:

10.1109/ICDM.2017.79

[11] A. Mueen. Enumeration of time series motifs of all lengths. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 547–556. IEEE, 2013.

[12] P. Nunthanid, V. Niennattrakul and C. A. Ratanamahatana, "Discovery of variable length time series motif," The 8th Electrical Engineering/ Electronics, Computer, Telecommunications

[13] Senin, Pavel & Lin, Jessica & Wang, Xing & Oates, Tim & Gandhi, Sunil & P. Boedihardjo, Arnold & Chen, Crystal & Frankenstein, Susan. (2018). GrammarViz 3.0: Interactive Discovery of Variable-Length Time Series Patterns. ACM Transactions on Knowledge Discovery from Data. 12. 1-28. 10.1145/3051126.

[14] N. Castro and P. J. Azevedo. Multiresolution motif discovery in time series. In SDM, pages 665–676. SIAM, 2010.

[15] Y. Li, J. Lin, and T. Oates. Visualizing variable-length time series motifs. In SDM, pages 895–906. SIAM, 2012.

[16] Gao, Yifeng and Jessica D Lin. “Efficient discovery of time series motifs with large length range in million scale time series.” 2017 IEEE International Conference on Data Mining (ICDM) (2017): 1213-1222.

[17] Linardi, Michele & Zhu, Yan & Palpanas, Themis & Keogh, Eamonn. (2018). Matrix Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data Series. 1053-1066.

10.1145/3183713.3183744.

[18] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang and E. J. Keogh, “Querying and mining of time series data: experimental comparison of representations and distance measures,” PVLDB 1(2): 1542-1552. 2008.

[19] Xeno-canto Website, URL: www.xeno-canto.org

[20]Lei W., Li P., Han Y., Gong S., Yang L., Hou M., 2016. EPG Recordings Reveal

Differential Feeding Behaviors in Sogatella furcifera in Response to Plant Virus Infection and Transmission Success. Scientific reports, 6, p.30240.

[21] Consensus Motif, Citation Pending. Work under review.

[22]Murray, D., Liao, J., Stankovic, L., Stankovic, V., Hauxwell-Baldwin, R., Wilson, C., ...

Firth, S. (2015). A data management platform for personalised real-time energy feedback. In Procededings of the 8th International Conference on Energy Efficiency in Domestic Appliances and Lighting: EEDAL’15

[23] Hoang Anh Dau, Eamonn Keogh, Kaveh Kamgar, Chin-Chia Michael Yeh, Yan Zhu, Shaghayegh Gharghabi , Chotirat Ann Ratanamahatana, Yanping Chen, Bing Hu, Nurjahan Begum, Anthony Bagnall , Abdullah Mueen and Gustavo Batista (2018). The UCR Time Series Classification Archive. URL: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/

[24] Yifeng Gao, Jessica Lin, 2018. Efficient Discovery of Variable-length Time Series Motifs with Large Length Range in Million Scale Time Series.

[25]C. A Ratanamahatana and E. Keogh, "Making Time-series Classification More Accurate Using Learned Constraints," in Proceedings of the 2004 SIAM International Conference on Data

Related documents