Framework for stream clustering of trajectories based on temporal micro clustering technique
FRAMEWORK FOR STREAM CLUSTERING OF TRAJECTORIES BASED ON TEMPORAL MICRO CLUSTERING TECHNIQUE

By

MUSAAB RIYADH ABDULRAZZAQ

Thesis Submitted to the School of Graduate Studies, Universiti Putra Malaysia, in Fulfillment of the Requirements for the Degree of Doctor of Philosophy

March 2018
COPYRIGHT

All material contained within the thesis, including without limitation text, logos, icons, photographs, and all other artwork, is copyright material of Universiti Putra Malaysia unless otherwise stated. Use may be made of any material contained within the thesis for non-commercial purposes from the copyright holder. Commercial use of material may only be made with the express, prior, written permission of Universiti Putra Malaysia.

Copyright © Universiti Putra Malaysia
DEDICATION

Dedicated to my amazing wife Zainab for her love, support and encouragement, my daughter Layan, my parents Riyadh and Badeeah, my brother Saad and my sisters May, Nada, and Dina for their love and affection.
Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfillment of the requirement for the degree of Doctor of Philosophy

FRAMEWORK FOR STREAM CLUSTERING OF TRAJECTORIES BASED ON TEMPORAL MICRO CLUSTERING TECHNIQUE

By

MUSAAB RIYADH ABDULRAZZAQ

March 2018

Chairman: Associate Professor Norwati Mustapha, PhD
Faculty: Computer Science and Information Technology

In recent years, spatio-temporal data has grown rapidly owing to the prevalence of geolocation devices such as Global Positioning System receivers, mobile phones, and motion sensors. In some scenarios, spatio-temporal data are received as a stream. Clustering stream data is challenging because the data can be read only once, the stream is unbounded in size, and processing time and memory space are limited. However, it is vital for many applications such as traffic management, the study of animal behavior, and weather forecasting. Many existing algorithms for clustering trajectory stream data use the time window technique, which partitions the trajectory stream into time-bins and clusters the segments in each time-bin separately. Restarting the clustering from scratch in each time-bin, without considering the relationships between objects in two consecutive time-bins, creates redundant micro clusters concentrated in the border area between two adjacent time-bins. This happens because the time window technique creates new micro clusters for some trajectory segments at the start of each time-bin even though these segments are close (within the distance threshold) to micro clusters at the end of the previous time-bin. Similar micro clusters in two consecutive time-bins can later be merged to reduce memory usage, but creating redundant micro clusters dramatically slows down the clustering task.
Trajectory preprocessing, such as segmentation and noise point filtering, is a vital step that precedes the mining task. It aims to reduce the size of the trajectory in order to minimize clustering time. Most existing algorithms treat noise filtering as a separate step that precedes trajectory segmentation, which means that no preprocessing method partitions a trajectory into a set of segments and removes noise points in real time with low computational cost.
In response to the limitations of the time window technique and of offline preprocessing methods, a framework for stream clustering of trajectories based on the temporal micro clustering technique (SCT-TMC) is proposed. The framework consists of two stages: the preprocessing stage and the stream clustering stage. In the preprocessing stage, the On-Line Noise Filtering Algorithm for Trajectory Segmentation Based on the Minimum Description Length concept (ONF_TRS) is proposed to segment trajectories and remove noise points in real time with low computational cost. Minimum description length is an important concept in information theory and computational learning, in which the best hypothesis for a given data set is the one that leads to the best compression of the data.

The stream clustering stage consists of two phases: the online phase and the offline phase. In the online phase, a stream clustering algorithm for trajectories based on the lifespan of the cluster (CC_TRS) is proposed to overcome the limitations of the time window technique. The clustering algorithm consists of two components: temporal micro cluster generation and temporal micro cluster merging. The temporal micro cluster data structure is proposed in the CC_TRS algorithm to store the summarized information for each group of similar segments. The algorithm assigns a lifetime to each newly created temporal micro cluster so that each new incoming batch of trajectory segments interacts only with the non-expired temporal micro clusters. Expired temporal micro clusters are far from the segments at the current time, either spatially or temporally, so excluding them reduces the time needed to search for the nearest temporal micro cluster. When the total size of the temporal micro clusters exceeds a given memory space, the most similar temporal micro clusters are merged to save memory.
The offline phase, in contrast, is invoked when the user requests to view the overall clustering results. The DBSCAN algorithm is used to perform the macro clustering task by replacing the distance between trajectory segments with the distance between temporal micro clusters. DBSCAN is a density based clustering approach that suits stream clustering for three reasons: firstly, it does not require the number of clusters to be specified in advance. Secondly, the algorithm can find arbitrarily shaped clusters. Finally, it is robust to noise and outliers. The temporal micro cluster data structure introduced in the online phase also extends the functionality of the macro clustering phase. The proposed functions for the offline phase are the detection of spatial and spatio-temporal outliers and the macro clustering. A comprehensive experimental analysis was conducted to evaluate the efficiency and effectiveness of the proposed algorithms (ONF-TRS, CC-TRS). The results show that the ONF-TRS algorithm has low computational cost and a high compression rate compared with existing works. Furthermore, CC-TRS improves and speeds up the clustering task compared with the latest works.
Abstrak

Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfillment of the requirement for the degree of Doctor of Philosophy

FRAMEWORK FOR STREAM CLUSTERING OF TRAJECTORIES BASED ON TEMPORAL MICRO CLUSTERING TECHNIQUE

By

MUSAAB RIYADH ABDULRAZZAQ

March 2018

Chairman: Associate Professor Norwati Mustapha, PhD
Faculty: Computer Science and Information Technology

In recent years, spatio-temporal data has grown rapidly owing to the widespread use of geolocation devices such as the Global Positioning System (GPS), mobile phones, and motion sensors. In some scenarios, spatio-temporal data are received as a stream. Clustering stream data is a challenging task because of the single pass over the data, the unbounded size of the data stream, and the limited processing time and memory space. Nevertheless, it is an important process for many applications such as traffic management, the study of animal behavior, and weather forecasting. Many existing algorithms for stream clustering of trajectory data exploit the time window technique, which partitions the trajectory stream into time-bins and clusters the segments in each time-bin separately. Starting the clustering from scratch in each time-bin, without considering the relationships between objects from two consecutive time-bins, leads to the creation of redundant micro clusters concentrated in the border area between two adjacent time-bins if that area is very dense with trajectory segments. This is because the clustering process that exploits the time window technique creates new micro clusters for some trajectory segments at the start of each time-bin even though these segments are very close (within the distance threshold) to the micro clusters at the end of the previous time-bin.

It is true that the closest micro clusters in consecutive time-bins can be merged to reduce memory space, but creating redundant micro clusters dramatically slows down the clustering task. In addition, trajectory preprocessing such as segmentation and noise point filtering is an important step before the mining task. It aims to reduce the size of the trajectory in order to minimize clustering time. Most existing algorithms place the noise filtering step before the trajectory segmentation step, which means that there is no preprocessing method to partition a trajectory into a set of segments and remove noise points in real time with low computational cost.

In response to the shortcomings of the time window technique and the offline preprocessing methods described above, a framework for stream clustering of trajectories based on the lifespan of the cluster (SCT_TMC) is proposed. The framework consists of two stages: the preprocessing stage and the stream clustering stage. In the preprocessing stage, the On-Line Noise Filtering algorithm for trajectory segmentation based on the Minimum Description Length concept (ONF_TRS) is proposed to achieve trajectory segmentation and noise point removal in real time with low computational cost. Minimum Description Length is an important concept in information theory and computational learning, in which the best hypothesis for a given data set is the one that leads to the best compression of the data.

The stream clustering stage consists of two phases: the online phase and the offline phase. In the online phase, a stream clustering algorithm for trajectories based on the lifespan of the cluster (CC_TRS) is proposed to overcome the weaknesses of the time window technique. The algorithm consists of two components: temporal micro cluster generation and temporal micro cluster merging. The temporal micro cluster data structure is proposed in the CC_TRS algorithm to store the summarized information for each group of similar segments. The algorithm assigns a lifetime to each newly created temporal micro cluster so that each incoming batch of trajectory segments interacts only with the non-expired temporal micro clusters. This is because expired temporal micro clusters are far from the current segments, either spatially or temporally, which minimizes the search time for the nearest temporal micro cluster. When the size of the temporal micro clusters exceeds the given memory space, the closest temporal micro clusters are merged to save memory.

The offline phase, in contrast, is invoked when the user requests to view the overall clustering results. The DBSCAN algorithm is used to perform the macro clustering task by replacing the distance between segments with the distance between temporal micro clusters. DBSCAN is a density based clustering approach that suits stream clustering because of the following properties: firstly, it does not require the number of clusters to be specified in advance. Secondly, the algorithm can find clusters of various shapes. Finally, it is robust to noise and outliers. The temporal micro cluster data structure proposed in the online phase extends the functionality of the macro clustering phase. The functions proposed for the offline phase are the detection of spatial and spatio-temporal outliers and the macro clustering.

A comprehensive experimental analysis was conducted to evaluate the efficiency and effectiveness of the proposed algorithms (ONF-TRS, CC-TRS). The results show that ONF-TRS has low computational cost and a high compression rate compared with existing works. Meanwhile, CC-TRS improves the speed of the clustering task compared with the latest works.
ACKNOWLEDGEMENTS

First, I would like to express my sincere gratitude to my supervisor, Associate Professor Dr. Norwati Mustapha, for giving me the opportunity to start this study. Throughout the course of my study, I have had the great fortune to get to know and interact with her. Her comments and suggestions for further development, as well as her assistance during the writing of this thesis, are invaluable to me.

I would like to express my sincere thanks and appreciation to the supervisory committee members, Associate Professor Dr. Md Nasir Sulaiman and Associate Professor Dr. Nurfadhlina Mohd Sharef, for their guidance, valuable suggestions and advice throughout this work, which helped make it a success.

My deepest appreciation goes to my wife Ms. Zainab and my daughter Layan, who have been supportive and have waited patiently for me to complete my study. Finally, I owe my sincere thanks to my parents, brother, and sisters for their encouragement and affirmation, which made it possible for me to achieve this work.

To the others who have directly or indirectly helped me in the completion of my work, I thank you all.
This thesis was submitted to the Senate of the Universiti Putra Malaysia and has been accepted as fulfillment of the requirement for the degree of Doctor of Philosophy. The members of the Supervisory Committee were as follows:

Norwati Mustapha, PhD
Associate Professor
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Chairman)

MD. Nasir B Sulaiman, PhD
Associate Professor
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Member)

Nurfadhlina Mohd Sharef, PhD
Associate Professor
Faculty of Computer Science and Information Technology
Universiti Putra Malaysia
(Member)

ROBIAH BINTI YUNUS, PhD
Professor and Dean
School of Graduate Studies
Universiti Putra Malaysia

Date:
Declaration by graduate student

I hereby confirm that:
- this thesis is my original work;
- quotations, illustrations and citations have been duly referenced;
- this thesis has not been submitted previously or concurrently for any other degree at any institutions;
- intellectual property from the thesis and copyright of thesis are fully-owned by Universiti Putra Malaysia, as according to the Universiti Putra Malaysia (Research) Rules 2012;
- written permission must be obtained from supervisor and the office of Deputy Vice-Chancellor (Research and innovation) before thesis is published (in the form of written, printed or in electronic form) including books, journals, modules, proceedings, popular writings, seminar papers, manuscripts, posters, reports, lecture notes, learning modules or any other materials as stated in the Universiti Putra Malaysia (Research) Rules 2012;
- there is no plagiarism or data falsification/fabrication in the thesis, and scholarly integrity is upheld as according to the Universiti Putra Malaysia (Graduate Studies) Rules 2003 (Revision 2012-2013) and the Universiti Putra Malaysia (Research) Rules 2012. The thesis has undergone plagiarism detection software.

Signature:
Date:

Name and Matric No: Musaab Riyadh Abdulrazzaq, GS38799
Declaration by Members of Supervisory Committee

This is to confirm that:
- the research conducted and the writing of this thesis was under our supervision;
- supervision responsibilities as stated in the Universiti Putra Malaysia (Graduate Studies) Rules 2003 (Revision 2012-2013) were adhered to.

Signature:
Name of Chairman of Supervisory Committee: Associate Professor Dr. Norwati Mustapha

Signature:
Name of Member of Supervisory Committee: Associate Professor Dr. MD. Nasir B Sulaiman

Signature:
Name of Member of Supervisory Committee: Associate Professor Dr. Nurfadhlina Mohd Sharef
TABLE OF CONTENTS

ABSTRACT
ABSTRAK
ACKNOWLEDGEMENTS
APPROVAL
DECLARATION
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER

1 INTRODUCTION
  1.1 Background
  1.2 Problem statement
  1.3 Research objectives
  1.4 Research scope
  1.5 Research contributions
  1.6 Organization of the thesis

2 LITERATURE REVIEW
  2.1 Introduction
  2.2 Background
  2.3 Challenges and Issues in Trajectory Streams Clustering
  2.4 Trajectory data preprocessing
    2.4.1 Data cleaning
    2.4.2 Noise filtering
    2.4.3 Stay point detection
    2.4.4 Trajectory compression
    2.4.5 Trajectory segmentation
    2.4.6 Map matching
  2.5 Distance/Similarity measure of trajectories
    2.5.1 Global distance measures
    2.5.2 Local distance measures
      2.5.2.1 Minimum Bounding Rectangles (MBR)
      2.5.2.2 Trajectory Hausdorff Distance
      2.5.2.3 Structural SIMilarity Measure (SSIM)
      2.5.2.4 Trajectory segment distance
      2.5.2.5 The tightened distance
  2.6 Trajectories clustering
    2.6.1 Static clustering of trajectories
      2.6.1.1 Model based clustering
      2.6.1.2 Partition and group based clustering
      2.6.1.3 Place of interest based clustering
      2.6.1.4 Moving together pattern
      2.6.1.5 Uncertain trajectory clustering
      2.6.1.6 Semantic based clustering
      2.6.1.7 Optimization strategies
    2.6.2 Stream clustering of trajectories
      2.6.2.1 Window Models
      2.6.2.2 Object based model
      2.6.2.3 Stream clustering algorithms
  2.7 Clustering Validation
    2.7.1 Overall similarity
    2.7.2 Precision and recall
    2.7.3 Internal measures
  2.8 The importance of trajectory data stream clustering application
  2.9 Summary

3 RESEARCH METHODOLOGY
  3.1 Introduction
  3.2 Research steps
  3.3 Experimental Design
    3.3.1 Data sets
      3.3.1.1 Starkey project data set
      3.3.1.2 Atlantic Hurricane data set
    3.3.2 Performance metrics
      3.3.2.1 ONF_TRS algorithm evaluation
      3.3.2.2 The Evaluation of Temporal Micro Clustering Technique
      3.3.2.3 The overall clustering evaluation
  3.4 System requirements (Software and Hardware)
  3.5 Summary

4 STREAM CLUSTERING OF TRAJECTORIES BASED ON TEMPORAL MICRO CLUSTERING TECHNIQUE (SCT-TMC)
  4.1 Introduction
  4.2 Pre-processing stage
    4.2.1 Pre-treatment component
    4.2.2 Pre-processing component
      4.2.2.1 Trajectory Noise Points Filtering
      4.2.2.2 Trajectory Segmentation
      4.2.2.3 Minimum Description Length
      4.2.2.4 ONF-TRS algorithm
  4.3 The Stream Clustering Stage
    4.3.1 The Online phase
      4.3.1.1 The limitations of time window technique
      4.3.1.2 Temporal micro cluster generation
      4.3.1.3 CC-TRS algorithm
      4.3.1.4 Temporal Micro Clustering Merging
    4.3.2 The Offline phase
      4.3.2.1 The DBSCAN algorithm
      4.3.2.2 Spatial Outliers detection
      4.3.2.3 Spatio-temporal Outliers detection
  4.4 Summary

5 RESULTS AND DISCUSSION
  5.1 Introduction
  5.2 ONF-TRS algorithm evaluation (pre-processing stage)
    5.2.1 Compression rate
    5.2.2 Computational Complexity
    5.2.3 Purity and coverage
  5.3 Stream clustering evaluation
    5.3.1 The online clustering evaluation
      5.3.1.1 Temporal Micro Clustering Technique vs Time Window Technique
      5.3.1.2 Parameter Φ effect
      5.3.1.3 Parameter dmax effect
    5.3.2 Macro clustering (the offline phase)
      5.3.2.1 Clustering quality evaluation
      5.3.2.2 Running time evaluation
  5.4 Summary

6 CONCLUSIONS AND FUTURE WORK
  6.1 Conclusions
  6.2 Future work

REFERENCES
BIODATA OF STUDENT
LIST OF PUBLICATIONS
LIST OF TABLES

Table
2.1 Time series distance functions for two trajectories of length m, n
2.2 Traditional and Streaming processing
2.3 Well-Known Methods in Stream Clustering of Trajectories with their main features
3.1 The main variables of the Starkey project
3.2 Number of trajectories for each species for Starkey data
4.1 The (region/length) threshold value for different triangle base length
5.1 The compression ratio of MDL, GRASP-UTS, and ONF-TRS for Elk 93
5.2 The compression ratio of MDL, GRASP-UTS, and ONF-TRS using Deer 95 data set
5.3 Evaluate the generated segments from ONF-TRS, MDL, GRASP-UTS algorithms based on the average purity and coverage
LIST OF FIGURES

Figure
1.1 Time windows technique for k trajectories
2.1 Classes of Spatio-temporal data
2.2 Noise Points in Trajectory
2.3 Stay Points in a Trajectory
2.4 A schematic of the trajectory compression
2.5 Similarity measure based on the overlapping between MMBs
2.6 Area based similarity measure
2.7 MBR distance between trajectories segments
2.8 Trajectory-Hausdorff distance
2.9 Trajectory segment distance a) segment distance b) the tightened distance
2.10 Two semantic trajectories A&B
2.11 Sliding windows model
2.12 Damped windows model
2.13 Landmark window model
2.14 Object-based data stream clustering framework
3.1 Research methodology steps
3.2 Trajectory points Elk 93 data set
3.3 Starkey project data for two species of animal Elk 93 and Deer 95
3.4 Format of Hurricane trajectories data sets
3.5 Tropical data for Atlantic Hurricane year 2000 & 2010
4.1 The proposed Framework SCT-TMC
4.2 Trajectory Characteristics Points
4.3 Formulation of the MDL cost
4.4 MDLnopar and MDLpar for each point in trajectory
4.5 Collinear Points (region/length) = 0
4.6 Non-significant point small (region/length)
4.7 Noisy point, (region/length) is small
4.8 Significant point High value of (region/length)
4.9 The ONF-TRS algorithm steps
4.10 The value of H such that MDLpar=MDLnopar
4.11 Algorithm for estimation of (region/length) threshold for three points
4.12 Time windows technique for k trajectories
4.13 The representative line of TMC
4.14 Three components of the distance function (dcen, dθ, and d‖)
4.15 Temporal Micro Cluster Extent
4.16 Extent value (loose vs tight TMC)
4.17 The temporal micro clustering technique
4.18 The CC_TRS algorithm
4.19 Merging tight temporal micro-clusters
4.20 Merging loose temporal micro clusters
4.21 Distance with extent for tight and loose TMC
4.22 The center distance with extent (δ)
4.23 The angle distance with extent (δ)
4.24 The parallel distance with extent (δ)
4.25 Temporal micro cluster merging
4.26 The Entire stream clustering stage (online and offline phases)
4.27 The output of DBSCAN clustering algorithm
4.28 The common moving trend shared by various objects for Elk 93 data set
4.29 The common moving trend shared by various objects of Deer 95 data set
4.30 The plotting of the Deer 95 data set
4.31 The spatial outlier of Deer 95 data set
5.1 The actual size and the compressed size of MDL, GRASP-UTS and ONF-TRS for Elk 93
5.2 Trajectory noise point in Elk 93 data set
5.3 The actual size and the compressed size of MDL, GRASP-UTS, and ONF-TRS for Deer 95
5.4 Trajectory noise point in Deer 95 data set
5.5 The number of MCs which created by TWT and TMCT for Elk 93 data set
5.6 The clustering quality (SSQ) for TWT and TMCT for Elk 93 data set
5.7 The number of MCs which created by TWT and TMCT for Deer 95 data set
5.8 The clustering quality (SSQ) for TWT and TMCT for Deer 95 data set
5.9 The number of MCs which created by TWT and TMCT for Hurricane data set
5.10 The clustering quality (SSQ) for TWT and TMCT for Hurricane data set
5.11 The number of MCs which created by TWT and TMCT for Elk 93 data set
5.12 The clustering quality (SSQ) for TWT and TMCT for Elk 93 data set
5.13 The number of MCs which created by TWT and TMCT for Deer 95 data set
5.14 The clustering quality (SSQ) for TWT and TMCT for Deer 95 data set
5.15 The number of MCs which created by TWT and TMCT for Hurricane data set
5.16 The clustering quality (SSQ) for TWT and TMCT for Hurricane data set
5.17 The running time of TMCT and TWT for Elk 93 data set
5.18 The running time of TMCT and TWT for Deer 95 data set
5.19 The running time of TMCT and TWT for Atlantic Hurricane data set
5.20 The minimum running time for Elk 93, Deer 95, and Atlantic Hurricane based on different value of transfer time Φ
5.21 The sensitivity of parameter dmax on clustering quality (SSQ)
5.22 The sensitivity of parameter dmax on running time
5.23 The clustering quality (SSQ) for SCT-TMC, TCMM, ConTraClu, and CUTiS on Elk 93 data set
5.24 The clustering quality (SSQ) for SCT-TMC, TCMM, ConTraClu, and CUTiS on Deer 95 data set
5.25 The clustering quality (SSQ) for SCT-TMC, TCMM, ConTraClu, and CUTiS on Hurricane data set
5.26 The running time for SCT-TMC, TCMM, ConTraClu, and CUTiS on Elk 93 data set
5.27 The running time for SCT-TMC, TCMM, ConTraClu, and CUTiS on Deer 95 data set
5.28 The running time for SCT-TMC, TCMM, ConTraClu, and CUTiS on Atlantic Hurricane data set
LIST OF ABBREVIATIONS

CR    Compression Ratio
MBB   Minimal Bounding Boxes
MBR   Minimum Bounding Rectangle
MC    Micro Cluster
MDL   Minimum Description Length
POI   Points of interest
SSQ   Sum of SQuare distance
TD    Trajectory Data
TMC   Temporal Micro Cluster
TMCT  Temporal Micro Clustering Technique
TWT   Time Window Technique
CHAPTER 1

INTRODUCTION

1.1 Background

Rapid advances in hardware technology such as Global Positioning System (GPS) devices, radio-frequency identification (RFID) readers, mobile phones, and navigational systems have generated huge amounts of online trajectory data for different kinds of free moving objects such as people, natural phenomena, and animals (Zheng et al. 2015; Pelekis et al. 2017). This kind of online trajectory data is referred to as trajectory stream data. Stream clustering of trajectory data aims to extract useful knowledge for applications such as locating heavy traffic paths in road networks, weather forecasting, and monitoring the migration behavior of animals. However, stream clustering of trajectory data raises several challenges. Firstly, the stream data is massive while system memory is limited. Secondly, data arrives constantly and processing time is limited; therefore, it is not feasible to process the data efficiently using numerous passes. Finally, sensor data is subject to noise, errors, and missing or incomplete readings because of sensor inaccuracy (Aggarwal et al. 2013; Galić et al. 2016). Therefore, traditional clustering approaches must be redeveloped to mitigate the difficulties associated with stream data clustering.

The challenges of trajectory stream clustering have captured the attention of many researchers, e.g. (Li et al. 2004; Elnekave et al. 2007; Jensen et al. 2007; Zhou et al. 2008; Li et al. 2010; Yu et al. 2013a; Yu et al. 2013b; Gianni et al. 2014; Costa et al. 2014; Jin et al. 2014; Da Silva et al. 2016a; Da Silva et al. 2016b; Da Silva et al. 2016c; J. Mao et al. 2016). Most of the research conducted to date has relied on the object based model, which consists of two phases: the online phase (also called micro clustering) and the offline phase (macro clustering).
In the online phase, a data structure named micro-cluster (MC) is used to keep summarized information for each group of similar sub-trajectories, since the entire stream data cannot be stored in memory. In the macro clustering phase, any density based approach can be used to group the MCs instead of the raw data.

Finding the similarity between a pair of trajectories is the essence of the clustering task. There are two approaches to finding the distance between two trajectories: the global distance approach and the local distance approach. The global distance approach treats the entire trajectory as one unit in the similarity computation. This approach requires significant memory storage and processing time. Moreover, it rarely detects local patterns in a particular area. The local distance approach aims to compute the similarity between two sub-trajectories. Lee et al. (2007) suggested a novel method based on the minimum description length to partition trajectories into a set of sub-trajectories and compute the similarity between them.

Finally, data pre-processing such as segmentation and noise filtering is an important step that precedes the mining task. Various segmentation algorithms have been suggested to partition a trajectory into a set of segments; some of these algorithms have a high computational cost (Panagiotakis et al. 2012; Buchin et al. 2013; Alewijnse et al. 2014), while others are sensitive to noise (Soares et al. 2015). The pre-processing step aims to reduce the size of the trajectory and improve its quality in order to enhance the efficiency of the clustering algorithm.

In this study, a framework for stream clustering of trajectories based on the temporal micro clustering technique is proposed. The framework consists of two stages: the pre-processing stage and the stream clustering stage. In the pre-processing stage, a new algorithm based on minimum description length is proposed to achieve trajectory segmentation and noise point filtering in real time with low computational cost. In the stream clustering stage, the temporal micro clustering technique is proposed to improve the clustering quality and minimize the computational cost of clustering in the online and offline phases.

1.2 Problem statement

Clustering of trajectory stream data requires algorithms that can quickly update clustering results for the continually arriving data. Commonly, stream processing that relies on a time window model (e.g. Li et al. 2010; Yu et al. 2013a; Yu et al. 2013b; Da Silva et al. 2016c) divides the stream data into portions (time-bins) based on time or on the number of processed items and performs the clustering task in each time-bin separately, without taking into consideration the relationships among the objects in consecutive time-bins. Initiating clustering from scratch in each time-bin disturbs the clustering task, and the disturbance concentrates in the border area between two adjacent time-bins, especially if that area is very dense with trajectory segments.
This is because the clustering process applying the time window technique creates redundant micro-clusters MC_n for some trajectory segments S_i at the beginning of a certain time-bin k+2 even though these segments are very close (within the distance threshold) to the micro-clusters MC_m at the end of the preceding time-bin k+1, as illustrated in Figure 1.1.

[Figure 1.1: Time window technique for k trajectories. Axes: X, Y, and Time. Trajectories Tr1, Tr2, ..., Trk cross time-bin k, time-bin k+1, and time-bin k+2; micro-clusters MCm and MCn lie on either side of a time-bin border.]
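The redundancy described above can be reproduced with a toy experiment. The following is a minimal sketch, not the CC_TRS algorithm or any of the cited methods: it assumes each trajectory segment is reduced to a single 2-D point, uses a greedy nearest-centroid rule with a hypothetical threshold `d_max`, and models the time window technique simply by discarding the active micro-clusters whenever a new time-bin starts. The names `assign` and `count_micro_clusters` are illustrative only.

```python
from math import hypot

def assign(point, clusters, d_max):
    """Absorb `point` into the nearest cluster within d_max, else open a new one.

    Returns True when a new micro-cluster had to be created.
    """
    best, best_d = None, d_max
    for c in clusters:
        d = hypot(point[0] - c["x"], point[1] - c["y"])
        if d <= best_d:
            best, best_d = c, d
    if best is None:
        clusters.append({"x": point[0], "y": point[1], "n": 1})
        return True
    # Incremental centroid update for the absorbing cluster.
    best["n"] += 1
    best["x"] += (point[0] - best["x"]) / best["n"]
    best["y"] += (point[1] - best["y"]) / best["n"]
    return False

def count_micro_clusters(stream, d_max, bin_size=None):
    """Count micro-clusters created over a stream of (t, x, y) tuples.

    bin_size=None keeps clusters alive across the whole stream;
    otherwise clustering restarts from scratch in every time-bin.
    """
    clusters, created, current_bin = [], 0, None
    for t, x, y in stream:
        if bin_size is not None:
            b = int(t // bin_size)
            if b != current_bin:
                clusters, current_bin = [], b  # time window: forget the past
        if assign((x, y), clusters, d_max):
            created += 1
    return created

# One slowly moving object: every point stays well within d_max of the others.
stream = [(t, 0.1 * t, 0.0) for t in range(20)]
print(count_micro_clusters(stream, d_max=5.0))              # 1
print(count_micro_clusters(stream, d_max=5.0, bin_size=5))  # 4
```

In this toy stream a single micro-cluster suffices, yet restarting at each of the four time-bins creates four, one per border, which is the kind of redundancy the problem statement attributes to the time window technique.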
It is true that the similar micro-clusters MCn and MCm can be grouped during the merging phase, which is triggered when the total size of the MCs exceeds the given memory space. However, creating redundant micro-clusters significantly affects the efficiency and effectiveness of clustering algorithms in both the online and offline phases, for four reasons:

1. The memory occupied by redundant MCs quickly inflates the overall size of the MCs, which shortens the time period between two consecutive merging operations. Merging is an extremely time-consuming task, since it computes the similarity between all MCs in order to merge the most similar ones.
2. The additional redundant MCs significantly increase the running time of the merging task. This is because the computational complexity of the merging task in the TCMM framework (Li et al. 2010) is O(n!), where n represents the number of MCs.
3. Any clustering approach aims to find an optimal trade-off between two properties: accuracy and conciseness. Accuracy means that the sum of all distances between the cluster centers and their members must be as small as possible, while conciseness means that the number of clusters should be as small as possible. Redundant MCs shift this trade-off toward accuracy at the expense of conciseness. Therefore, the clustering quality is degraded and the clustering task becomes time consuming.
4. In the offline phase, the DBSCAN algorithm is used to deliver the overall clustering results. The computational cost of DBSCAN is O(n log(n)), where n denotes the number of MCs; therefore, redundant MCs also increase the running time of the offline phase.

On the other hand, trajectory pre-processing, such as segmentation and noise point filtering, is a crucial step prior to clustering. It aims to reduce the size of the trajectory and improve its quality. Lee et al. (2007) and Li et al. (2010) proposed methods based on the minimum description length principle to partition a trajectory into a set of characteristic points where the behavior of the trajectory changes rapidly. Two problems can be identified with these methods: firstly, they need at least three points in advance to decide whether the trajectory points are characteristic or not; secondly, they are significantly affected by noisy data. Soares et al. (2015) suggested the GRASP-UTS algorithm, which is based on minimum description length, to partition a trajectory into a set of landmarks. The algorithm modifies the landmark locations (i.e. deleting, inserting, and changing) to gain maximum homogeneity. GRASP-UTS is an iterative algorithm and is noise sensitive. Buchin et al. (2011) and Buchin et al. (2013) specified a set of spatial and temporal features such as shape, speed, and direction to partition trajectories into a minimum number of segments. These algorithms are unsuitable for pre-processing data in real time due to their high computational cost, O(n log n). Thus, to the best of our knowledge, there is currently no pre-processing method that divides a trajectory into a set of segments, is robust to noise, and has low computational cost.
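To make the minimum description length idea concrete, the following sketch partitions a 2-D trajectory in the spirit of the approximate partitioning of Lee et al. (2007). It is a simplified illustration and not the ONF-TR algorithm proposed in this thesis. The function names, the clamping of costs below one unit, and the simplified perpendicular and angular distances (measured against the line through the candidate segment) are assumptions of this sketch. A point is declared characteristic when describing the path with one straight segment costs more bits than keeping the original segments.

```python
import math

def _log2_clamped(x):
    # Assumption of this sketch: costs below one unit count as zero bits.
    return math.log2(max(x, 1.0))

def _perp_and_angle(a, b, p, q):
    """Perpendicular and angular distance of segment p-q from the line through a-b."""
    ab = (b[0] - a[0], b[1] - a[1])
    pq = (q[0] - p[0], q[1] - p[1])
    len_ab, len_pq = math.hypot(*ab), math.hypot(*pq)
    if len_ab == 0 or len_pq == 0:
        return 0.0, 0.0
    # Distances of p and q from the line a-b via the 2-D cross product.
    l1 = abs(ab[0] * (p[1] - a[1]) - ab[1] * (p[0] - a[0])) / len_ab
    l2 = abs(ab[0] * (q[1] - a[1]) - ab[1] * (q[0] - a[0])) / len_ab
    perp = (l1 * l1 + l2 * l2) / (l1 + l2) if l1 + l2 > 0 else 0.0
    cross = abs(ab[0] * pq[1] - ab[1] * pq[0])
    dot = ab[0] * pq[0] + ab[1] * pq[1]
    # len_pq * sin(theta) below 90 degrees, the full length len_pq otherwise.
    angle = cross / len_ab if dot > 0 else len_pq
    return perp, angle

def mdl_par(points, i, j):
    """Cost of replacing points[i..j] by the single segment points[i]-points[j]."""
    a, b = points[i], points[j]
    perp_sum = angle_sum = 0.0
    for k in range(i, j):
        perp, angle = _perp_and_angle(a, b, points[k], points[k + 1])
        perp_sum += perp
        angle_sum += angle
    return _log2_clamped(math.dist(a, b)) + _log2_clamped(perp_sum) + _log2_clamped(angle_sum)

def mdl_nopar(points, i, j):
    """Cost of keeping every original segment between points[i] and points[j]."""
    return sum(_log2_clamped(math.dist(points[k], points[k + 1])) for k in range(i, j))

def characteristic_points(points):
    """Return the characteristic points of a trajectory given as (x, y) tuples."""
    cps = [points[0]]
    start, length = 0, 1
    while start + length < len(points):
        curr = start + length
        if mdl_par(points, start, curr) > mdl_nopar(points, start, curr):
            # The straight-line summary became too costly: cut at the previous point.
            cps.append(points[curr - 1])
            start, length = curr - 1, 1
        else:
            length += 1
    cps.append(points[-1])
    return cps

straight = [(0, 0), (10, 0), (20, 0), (30, 0)]
corner = [(0, 0), (10, 0), (20, 0), (20, 10), (20, 20)]
print(characteristic_points(straight))  # [(0, 0), (30, 0)]
print(characteristic_points(corner))    # [(0, 0), (20, 0), (20, 20)]
```

Note that the sketch can only declare a point characteristic after a later point has arrived, and that a single noisy point would inflate the distance terms and trigger a cut, which reflects the two limitations discussed above.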
1.3 Research objectives

The main objective of this study is to propose an efficient and effective framework for stream clustering of trajectories based on the temporal micro clustering technique. To achieve this objective, the following specific objectives are adopted:

• To propose a pre-processing algorithm based on the minimum description length concept that divides a trajectory into a set of characteristic points and filters noise points in real time. The algorithm must require a minimum number of points in advance and have low computational cost.
• To propose the temporal micro clustering technique for the online phase, which assigns a lifetime to each newly created micro cluster. A new batch of segments interacts only with the non-expired micro clusters, which reduces the running time of the online phase, since its most time-consuming part is finding the nearest micro cluster.
• To propose a new data structure that supports the spatio-temporal merging of two micro clusters in the online phase and expands the functionality of the offline phase.

1.4 Research scope

The scope of this study is focused on mining knowledge from unlabeled trajectory stream data without any previous knowledge about the data. Therefore, an unsupervised learning method such as clustering must be applied. It is important to outline some of the critical assumptions concerning the proposed framework:

• This study is dedicated to free space trajectories, where objects do not follow a path within a road network. This is because free space trajectories have a wide spectrum of crucial applications, such as extracting the migratory trajectories of animals, weather forecasting, and detecting suspicious behaviors in the trajectories of people for security purposes.
• The similarity between a pair of sub-trajectories is based on their geometric properties (length, center, and orientation), due to the complexity of the trajectories of freely moving objects.
• Semantic information such as semantic labels is not taken into consideration in this study, since it cannot be generated in real time.
• The discovery of groups of trajectories that move close to each other for a period of time, such as flocks and swarms, is out of the scope of this study, since it requires monitoring the individual trajectories separately.
• The stream clustering algorithm assigns the same weight to both recent and historical data, which means that no time fading model is considered.
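The lifetime idea stated in the research objectives can be sketched as follows. This is a minimal illustration under stated assumptions and not the data structure proposed in this thesis: segments are reduced to 2-D center points, the lifetime is a fixed duration counted from creation, expired micro clusters are simply kept in the list for later use, and the names `TemporalMicroCluster` and `process_batch` are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class TemporalMicroCluster:
    center: tuple      # running mean of the absorbed segment centers
    count: int         # number of absorbed segments
    created_at: float  # creation timestamp
    lifetime: float    # duration during which new segments may still be absorbed

    def is_expired(self, now):
        return now > self.created_at + self.lifetime

    def absorb(self, point):
        n = self.count
        self.center = tuple((c * n + p) / (n + 1) for c, p in zip(self.center, point))
        self.count = n + 1

def process_batch(batch, clusters, now, threshold, lifetime):
    """Assign a batch of segment centers; return how many micro clusters were searched."""
    # Only non-expired micro clusters take part in the nearest-cluster search.
    active = [mc for mc in clusters if not mc.is_expired(now)]
    for point in batch:
        nearest = min(active, key=lambda mc: math.dist(mc.center, point), default=None)
        if nearest is not None and math.dist(nearest.center, point) <= threshold:
            nearest.absorb(point)
        else:
            fresh = TemporalMicroCluster(tuple(point), 1, now, lifetime)
            clusters.append(fresh)
            active.append(fresh)
    return len(active)

clusters = []
process_batch([(0.0, 0.0), (0.2, 0.0)], clusters, now=0.0, threshold=1.0, lifetime=10.0)
process_batch([(0.1, 0.1)], clusters, now=5.0, threshold=1.0, lifetime=10.0)
searched = process_batch([(0.1, 0.0)], clusters, now=20.0, threshold=1.0, lifetime=10.0)
```

In this run the first micro cluster absorbs three segments while it is alive. At time 20 it has expired, so the last segment is compared only against micro clusters that are still active, which keeps the nearest-cluster search small.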
Collecting the data is the first priority of trajectory data mining. A few real trajectory data sets are available for freely moving objects, such as the GeoLife Trajectory Dataset which comprises Elk 93 (33 trajectories) and Deer 95 (32 trajectories), and the Atlantic Hurricane trajectories (1740 trajectories). These data sets have been adopted in this study due to their high sampling rate (high speed generation), which makes them suitable for stream clustering.

1.5 Research contributions

The main contribution of this study is an efficient framework for stream clustering of trajectory data. The framework aims to reduce the running time and improve the clustering quality compared with existing works that exploit the time window technique. The novel characteristics of the proposed framework in the pre-processing and stream clustering stages are as follows:

1. The ONF-TR algorithm, which is based on the minimum description length concept, partitions a trajectory into a set of segments and filters noise points. The main characteristics of the ONF-TR algorithm are that it requires only three points in advance and has a low computational cost of O(n), where n is the number of points in the trajectory. These characteristics closely match the requirements of stream data clustering.
2. The "temporal micro clustering" technique in the online phase. This technique minimizes the number of micro clusters created by the time window technique, which enhances the efficiency of the clustering algorithm CC-TRS and improves its clustering quality.
3. The temporal micro cluster data structure, which supports the spatio-temporal dimensions when merging two temporal micro clusters. In addition, the data structure helps to detect spatial outliers and spatio-temporal outliers efficiently in the offline phase.

1.6 Organization of the thesis
The thesis is formatted according to the standard thesis and dissertation structure of Universiti Putra Malaysia. Chapter 2 presents background knowledge about trajectories, including trajectory stream mining and its challenges, trajectory pre-processing techniques and methods, and trajectory similarity. In addition, it reviews the literature in three areas related to the concerns of this study: trajectory data clustering, traditional data stream clustering, and trajectory stream clustering. Chapter 3 covers the methodology steps followed in this study. It also highlights the proposed framework, the data sets and performance metrics used to evaluate the framework algorithms, and how the proposed algorithms are assessed. Chapter 4 is dedicated to the proposed framework for trajectory stream clustering and explains in detail the new algorithms and models proposed for the different stages of the framework. Chapter 5 highlights the experimental results obtained from comparing the proposed algorithms with the major recent works related to stream clustering of trajectories. Chapter 6 concludes the thesis by summarizing the findings and sheds light on the open issues that can be pursued in future studies.
REFERENCES

Aggarwal, Charu C. Data mining: the textbook. Springer, 2015.
Aggarwal, Charu C. Data streams: models and algorithms. Vol. 31. Springer Science & Business Media, 2007.
Aggarwal, Charu C., and Chandan K. Reddy, eds. Data clustering: algorithms and applications. CRC Press, 2013.
Aggarwal, Charu C., ed. Managing and mining sensor data. Springer Science & Business Media, 2013.
Aggarwal, Charu C., et al. "A framework for clustering evolving data streams." Proceedings of the 29th international conference on Very large data bases-Volume 29. VLDB Endowment, 2003.
Alewijnse, Sander, Kevin Buchin, Maike Buchin, Andrea Kölzsch, Helmut Kruckenberg, and Michel A. Westenberg. "A framework for trajectory segmentation by stable criteria." In Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 351-360. ACM, 2014.
Alon, Jonathan, Stan Sclaroff, George Kollios, and Vladimir Pavlovic. "Discovering clusters in motion time-series data." In Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, vol. 1, pp. I-I. IEEE, 2003.
Alt, Helmut, Alon Efrat, Günter Rote, and Carola Wenk. "Matching planar maps." Journal of Algorithms 49, no. 2 (2003): 262-283.
Alt, Helmut. "The computational geometry of comparing shapes." In Efficient Algorithms, pp. 235-248. Springer Berlin Heidelberg, 2009.
Alvares, L.O., Loy, A.M., Renso, C. and Bogorny, V., 2011. An algorithm to identify avoidance behavior in moving object trajectories. Journal of the Brazilian Computer Society, 17(3), pp.193-203.
Andersson, Mattias, Joachim Gudmundsson, Patrick Laube, and Thomas Wolle. "Reporting leaders and followers among trajectories of moving point objects." GeoInformatica 12, no. 4 (2008): 497-528.
Andrienko, Natalia V., Gennady L. Andrienko, Nikos Pelekis, and Stefano Spaccapietra. "Basic Concepts of Movement Data." (2008): 15-38.
Bashir, Faisal I., Ashfaq A. Khokhar, and Dan Schonfeld. "Real-time motion trajectory-based indexing and retrieval of video sequences." IEEE Transactions on Multimedia 9, no. 1 (2007): 58-65.
Benkert, Marc, Joachim Gudmundsson, Florian Hübner, and Thomas Wolle. "Reporting flock patterns." Computational Geometry 41, no. 3 (2008): 111-125.
Birch, Zhang T. "an efficient data clustering method for very large databases/T. Zhang, R. Ramakrishnan, M. Livny." In Proceedings of the 1996 ACM SIGMOD international conference on Management of data (SIGMOD'96).-New York: ACM, pp. 103-114. 1996.
Brakatsoulas, Sotiris, Dieter Pfoser, Randall Salas, and Carola Wenk. "On map-matching vehicle tracking data." In Proceedings of the 31st international conference on Very large data bases, pp. 853-864. VLDB Endowment, 2005.
Bu, Yingyi, Lei Chen, Ada Wai-Chee Fu, and Dawei Liu. "Efficient anomaly monitoring over moving object trajectory streams." In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 159-168. ACM, 2009.
Buchin, Maike, Anne Driemel, Marc van Kreveld, and Vera Sacristán Adinolfi. "Segmenting trajectories: A framework and algorithms using spatiotemporal criteria." Journal of Spatial Information Science 3 (2011): 33-63.
Buchin, Maike, Helmut Kruckenberg, and Andrea Kölzsch. "Segmenting trajectories by movement states." In Advances in spatial data handling, pp. 15-25. Springer Berlin Heidelberg, 2013.
Cao, Feng, et al. "Density-based clustering over an evolving data stream with noise." Proceedings of the 2006 SIAM international conference on data mining. Society for Industrial and Applied Mathematics, 2006.
Chawathe, Sudarshan S. "Segment-based map matching." In Intelligent Vehicles Symposium, 2007 IEEE, pp. 1190-1197. IEEE, 2007.
Chen, Jingyu, Qiuyan Huo, Ping Chen, and Xuezhou Xu. "Sketch-based uncertain trajectories clustering." In Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on, pp. 747-751. IEEE, 2012.
Chen, Lei, and Raymond Ng. "On the marriage of edit distance and Lp norms." In Proc. of 30th International Conference on Very Large Data Bases' 04. 2004.
Chen, Lei, M. Tamer Özsu, and Vincent Oria. "Robust and fast similarity search for moving object trajectories." In Proceedings of the 2005 ACM SIGMOD international conference on Management of data, pp. 491-502. ACM, 2005.
Chen, Yunbo, Hongchu Yu, and Lei Chen. "A Spatiotemporal Cluster Method for Trajectory Data." In Proceedings of the 4th International Conference on Computer Engineering and Networks, pp. 3-10. Springer, Cham, 2015.
Civilis, Alminas, Christian S. Jensen, and Stardas Pakalnis. "Techniques for efficient road-network-based tracking of moving objects." IEEE Transactions on Knowledge and Data Engineering 17, no. 5 (2005): 698-712.
Costa, Gianni, Giuseppe Manco, and Elio Masciari. "Dealing with trajectory streams by clustering and mathematical transforms." Journal of Intelligent Information Systems 42.1 (2014): 155-177.
Costa, Gianni, Giuseppe Manco, and Elio Masciari. "Effectively grouping trajectory streams." In International Workshop on New Frontiers in Mining Complex Patterns, pp. 94-108. Springer, Berlin, Heidelberg, 2012.
Da Silva, Ticiana L. Coelho, et al. "CUTiS: optimized online ClUstering of Trajectory data Stream." Proceedings of the 20th International Database Engineering & Applications Symposium. ACM, 2016b.
Da Silva, Ticiana L. Coelho, Karine Zeitouni, and José AF de Macêdo. "Online Clustering of Trajectory Data Stream." Mobile Data Management (MDM), 2016a 17th IEEE International Conference on. Vol. 1. IEEE, 2016.
Da Silva, Ticiana L. Coelho, Karine Zeitouni, José AF de Macêdo, and Marco A. Casanova. "A framework for online mobility pattern discovery from trajectory data streams." In Mobile Data Management (MDM), 2016 17th IEEE International Conference on, vol. 1, pp. 365-368. IEEE, 2016.
Dodge, Somayeh, Patrick Laube, and Robert Weibel. "Movement similarity assessment using symbolic representation of trajectories." International Journal of Geographical Information Science 26, no. 9 (2012): 1563-1588.
Douglas, David H., and Thomas K. Peucker. "Algorithms for the reduction of the number of points required to represent a digitized line or its caricature." Cartographica: The International Journal for Geographic Information and Geovisualization 10, no. 2 (1973): 112-122.
Elnekave, Sigal, Mark Last, and Oded Maimon. "Incremental clustering of mobile objects." Data Engineering Workshop, 2007 IEEE 23rd International Conference on. IEEE, 2007.
Ester, Martin, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. "A density-based algorithm for discovering clusters in large spatial databases with noise." In Kdd, vol. 96, no. 34, pp. 226-231. 1996.
Feng, Zhenni, and Yanmin Zhu. "A Survey on Trajectory Data Mining: Techniques and Applications." IEEE Access 4 (2016): 2056-2067.
G. Yuan, S. X. Xia, L. Zhang, Y. Zhou, and C. Ji, "An efficient trajectory-clustering algorithm based on index tree," Transactions of the Institute of Measurement and Control, vol. 32, no. 7, pp. 850–861, 2012.
G. Yuan, X. Xia, L. Zhang, Y. Zhou, C. Ji, Trajectory outlier detection algorithm based on structural features, J. Comput. Inform. Syst. 7 (11) (2011) 4137–4144.
Gaffney, Scott, and Padhraic Smyth. "Trajectory clustering with mixtures of regression models." In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 63-72. ACM, 1999.
Galić, Zdravko. "Spatio-Temporal Data Stream Clustering." In Spatio-Temporal Data Streams, pp. 71-103. Springer New York, 2016.
Gama, Joao. Knowledge discovery from data streams. CRC Press, 2010.
Giannotti, Fosca, and Dino Pedreschi, eds. Mobility, data mining and privacy: Geographic knowledge discovery. Springer Science & Business Media, 2008.
Greenfeld, Joshua S. "Matching GPS observations to locations on a digital map." In 81th annual meeting of the transportation research board, vol. 1, no. 3, pp. 164-173. 2002.
Grünwald, Peter D., In Jae Myung, and Mark A. Pitt. Advances in minimum description length: Theory and applications. Cambridge, MA: The MIT Press, 2005.
Gudmundsson, Joachim, and Marc van Kreveld. "Computing longest duration flocks in trajectory data." In Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems, pp. 35-42. ACM, 2006.
Guttman, Antonin. R-trees: A dynamic index structure for spatial searching. Vol. 14, no. 2. ACM, 1984.
Har-Peled, Sariel. "Clustering motion." Discrete & Computational Geometry 31, no. 4 (2004): 545-565.
Hyde, Richard, Plamen Angelov, and A. R. MacKenzie. "Fully online clustering of evolving data streams into arbitrarily shaped clusters." Information Sciences 382 (2017): 96-114.
J. Mao, Q. Song, C. Jin, Z. Zhang, and A. Zhou. Tscluwin: trajectory stream clustering over sliding window. In DASFAA, pages 133–148, 2016.
Jensen, Christian S., Dan Lin, and Beng Chin Ooi. "Continuous clustering of moving objects." IEEE Trans. Knowl. Data Eng. 19.9 (2007): 1161-1174.
Jeung, Hoyoung, Man Lung Yiu, and Christian S. Jensen. "Trajectory pattern mining." In Computing with spatial trajectories, pp. 143-177. Springer, New York, NY, 2011.
Jeung, Hoyoung, Man Lung Yiu, Xiaofang Zhou, Christian S. Jensen, and Heng Tao Shen. "Discovery of convoys in trajectory databases." Proceedings of the VLDB Endowment 1, no. 1 (2008): 1068-1080.
Jin, Cheqing, Jeffrey Xu Yu, Aoying Zhou, and Feng Cao. "Efficient clustering of uncertain data streams." Knowledge and Information Systems 40, no. 3 (2014): 509.
Kalnis, Panos, Nikos Mamoulis, and Spiridon Bakiras. "On discovering moving clusters in spatio-temporal data." In SSTD, vol. 3633, pp. 364-381. 2005.
Kang, Jong Hee, William Welbourne, Benjamin Stewart, and Gaetano Borriello. "Extracting places from traces of locations." In Proceedings of the 2nd ACM international workshop on Wireless mobile applications and services on WLAN hotspots, pp. 110-118. ACM, 2004.
Keogh, Eamonn. "Exact indexing of dynamic time warping." In Proceedings of the 28th international conference on Very Large Data Bases, pp. 406-417. VLDB Endowment, 2002.
Khalilian, Madjid, and Norwati Mustapha. "Data stream clustering: Challenges and issues." arXiv preprint arXiv:1006.5261 (2010).
Kisilevich, Slava, Florian Mansmann, Mirco Nanni, and Salvatore Rinzivillo. "Spatiotemporal clustering." In Data mining and knowledge discovery handbook, pp. 855-874. Springer US, 2009.
Kranen, Philipp, Ira Assent, Corinna Baldauf, and Thomas Seidl. "The ClusTree: indexing micro-clusters for anytime stream mining." Knowledge and information systems 29, no. 2 (2011): 249-272.
Krempl, Georg, Indre Žliobaite, Dariusz Brzeziński, Eyke Hüllermeier, Mark Last, Vincent Lemaire, Tino Noack. "Open challenges for data stream mining research." ACM SIGKDD explorations newsletter 16, no. 1 (2014): 1-10.
Lee, Jae-Gil, Jiawei Han, and Kyu-Young Whang. "Trajectory clustering: a partition-and-group framework." Proceedings of the 2007 ACM SIGMOD international conference on Management of data. ACM, 2007.
Lee, Jae-Gil, Jiawei Han, and Xiaolei Li. "Trajectory outlier detection: A partition-and-detect framework." In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, pp. 140-149. IEEE, 2008.
Lee, Wang-Chien, and John Krumm. "Trajectory preprocessing." In Computing with spatial trajectories, pp. 3-33. Springer New York, 2011.
Li, Deren, Shuliang Wang, and Deyi Li. Spatial Data Mining: Theory and Application. Springer, 2016.
Li, Yifan, Jiawei Han, and Jiong Yang. "Clustering moving objects." In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 617-622. ACM, 2004.
Li, Zhenhui, et al. "Incremental clustering for trajectories." International Conference on Database Systems for Advanced Applications. Springer Berlin Heidelberg, 2010.
Li, Zhenhui, Jiawei Han, Ming Ji, Lu-An Tang, Yintao Yu, Bolin Ding, Jae-Gil Lee, and Roland Kays. "Movemine: Mining moving object data for discovery of animal movement patterns." ACM Transactions on Intelligent Systems and Technology (TIST) 2, no. 4 (2011): 37.
Lim, Joasang, Joongjin Kook, and Jinman Kim. "DBSCAN-D: A Density-Based Clustering Method of Directionality." International Journal of Applied Engineering Research 12, no. 13 (2017): 3927-3932.
Lin, Kunhui, Zhentuan Xu, Ming Qiu, Xiaoli Wang, and Tianxiong Han. "Noise filtering, trajectory compression and trajectory segmentation on GPS data." In Computer Science & Education (ICCSE), 2016 11th International Conference on, pp. 490-495. IEEE, 2016.
Liu, Wei, Yu Zheng, Sanjay Chawla, Jing Yuan, and Xie Xing. "Discovering spatiotemporal causal interactions in traffic data streams." In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 1010-1018. ACM, 2011.
Lloyd, Stuart. "Least squares quantization in PCM." IEEE transactions on information theory 28, no. 2 (1982): 129-137.
Lou, Yin, Chengyang Zhang, Yu Zheng, Xing Xie, Wei Wang, and Yan Huang. "Map-matching for low-sampling-rate GPS trajectories." In Proceedings of the 17th ACM SIGSPATIAL international conference on advances in geographic information systems, pp. 352-361. ACM, 2009.
Magdy, Nehal, Mahmoud A. Sakr, Tamer Mostafa, and Khaled El-Bahnasy. "Review on trajectory similarity measures." In Intelligent Computing and Information Systems (ICICIS), 2015 IEEE Seventh International Conference on, pp. 613-619. IEEE, 2015.
Mao, Jiali, Cheqing Jin, Xiaoling Wang, and Aoying Zhou. "Challenges and Issues in Trajectory Streams Clustering upon a Sliding-Window Model." In Web Information System and Application Conference (WISA), 2015 12th, pp. 303-308. IEEE, 2015.
Mazimpaka, Jean Damascène, and Sabine Timpf. "Trajectory data mining: A review of methods and applications." Journal of Spatial Information Science 2016, no. 13 (2016): 61-99.
Meratnia, Nirvana, and R. A. De By. "A new perspective on trajectory compression techniques." In Proc. ISPRS Commission II and IV, WG II/5, II/6, IV/1 and IV/2 Joint Workshop Spatial, Temporal and Multi-Dimensional Data Modelling and Analysis. 2003.
Muckell, Jonathan, Paul W. Olsen, Jeong-Hyon Hwang, Catherine T. Lawson, and S. S. Ravi. "Compression of trajectory data: a comprehensive evaluation and new approach." GeoInformatica 18, no. 3 (2014): 435-460.
Nanni, Mirco, and Dino Pedreschi. "Time-focused clustering of trajectories of moving objects." Journal of Intelligent Information Systems 27.3 (2006): 267-289.
Nock, Richard, and Frank Nielsen. "On weighting clustering." IEEE transactions on pattern analysis and machine intelligence 28, no. 8 (2006): 1223-1235.
Palma, Andrey Tietbohl, Vania Bogorny, Bart Kuijpers, and Luis Otavio Alvares. "A clustering-based approach for discovering interesting places in trajectories." In Proceedings of the 2008 ACM symposium on Applied computing, pp. 863-868. ACM, 2008.
Panagiotakis, Costas, Nikos Pelekis, Ioannis Kopanakis, Emmanuel Ramasso, and Yannis Theodoridis. "Segmentation and sampling of moving object trajectories based on representativeness." IEEE Transactions on Knowledge and Data Engineering 24, no. 7 (2012): 1328-1343.
Parent, C., Spaccapietra, S., Renso, C., Andrienko, G., Andrienko, N., Bogorny, V., Damiani, M.L., Gkoulalas-Divanis, A., Macedo, J., Pelekis, N. and Theodoridis, Y., 2013. Semantic trajectories modeling and analysis. ACM Computing Surveys (CSUR), 45(4), p.42.
Pelekis, Nikos, and Yannis Theodoridis. Mobility data management and exploration. Springer, 2014.
Pelekis, Nikos, Gennady Andrienko, Natalia Andrienko, Ioannis Kopanakis, Gerasimos Marketos, and Yannis Theodoridis. "Visually exploring movement data via similarity-based analysis." Journal of Intelligent Information Systems 38, no. 2 (2012): 343.
Pelekis, Nikos, Ioannis Kopanakis, Evangelos E. Kotsifakos, Elias Frentzos, and Yannis Theodoridis. "Clustering uncertain trajectories." Knowledge and Information Systems 28, no. 1 (2011): 117-147.
Pelekis, Nikos, Panagiotis Tampakis, Marios Vodas, Christos Doulkeridis, and Yannis Theodoridis. "On temporal-constrained sub-trajectory cluster analysis." Data Mining and Knowledge Discovery (2017): 1-37.
Pink, Oliver, and Britta Hummel. "A statistical approach to map matching using road network geometry, topology and vehicular motion constraints." In Intelligent Transportation Systems, 2008. ITSC 2008. 11th International IEEE Conference on, pp. 862-867. IEEE, 2008.
Pires, Telmo JP, and Mário AT Figueiredo. "Shape-based Trajectory Clustering." In ICPRAM, pp. 71-81. 2017.
Potamias, Michalis, Kostas Patroumpas, and Timos Sellis. "Sampling trajectory streams with spatiotemporal criteria." In Scientific and Statistical Database Management, 2006. 18th International Conference on, pp. 275-284. IEEE, 2006.
Quddus, Mohammed A., Robert B. Noland, and Washington Y. Ochieng. "A high accuracy fuzzy logic based map matching algorithm for road transport." Journal of Intelligent Transportation Systems 10, no. 3 (2006): 103-115.
Sacharidis, Dimitris, Kostas Patroumpas, Manolis Terrovitis, Verena Kantere, Michalis Potamias, Kyriakos Mouratidis, and Timos Sellis. "On-line discovery of hot motion paths." In Proceedings of the 11th international conference on Extending database technology: Advances in database technology, pp. 392-403. ACM, 2008.
Sankararaman, Swaminathan, Pankaj K. Agarwal, Thomas Mølhave, and Arnold P. Boedihardjo. "Computing similarity between a pair of trajectories." arXiv preprint arXiv:1303.1585 (2013).
Shekhar, Shashi, Zhe Jiang, Reem Y. Ali, Emre Eftelioglu, Xun Tang, Venkata Gunturi, and Xun Zhou. "Spatiotemporal data mining: a computational perspective." ISPRS International Journal of GeoInformation 4, no. 4 (2015): 2306-2338.
Silva, Jonathan A., Elaine R. Faria, Rodrigo C. Barros, Eduardo R. Hruschka, André CPLF de Carvalho, and João Gama. "Data stream clustering: A survey." ACM Computing Surveys (CSUR) 46, no. 1 (2013): 13.
Soares Júnior, Amílcar, Bruno Neiva Moreno, Valéria Cesário Times, Stan Matwin, and Lucídio dos Anjos Formiga Cabral. "GRASP-UTS: an algorithm for unsupervised trajectory segmentation." International Journal of Geographical Information Science 29, no. 1 (2015): 46-68.
Su, Han, Kai Zheng, Kai Zeng, Jiamin Huang, Shazia Sadiq, Nicholas Jing Yuan, and Xiaofang Zhou. "Making sense of trajectory data: A partition-and-summarization approach." In Data Engineering (ICDE), 2015 IEEE 31st International Conference on, pp. 963-974. IEEE, 2015.
Sumpter, Neil, and Andrew Bulpitt. "Learning spatio-temporal patterns for predicting object behaviour." Image and Vision Computing 18, no. 9 (2000): 697-704.
Sun, Penghui, Shixiong Xia, Guan Yuan, and Daxing Li. "An Overview of Moving Object Trajectory Compression Algorithms." Mathematical Problems in Engineering 2016 (2016).
Tao, Yufei, and Dimitris Papadias. "Efficient historical R-trees." In Scientific and Statistical Database Management, 2001. SSDBM 2001. Proceedings. Thirteenth International Conference on, pp. 223-232. IEEE, 2001.
Tao, Yufei, Dimitris Papadias, and Jimeng Sun. "The TPR*-tree: an optimized spatiotemporal access method for predictive queries." In Proceedings of the 29th international conference on Very large data bases-Volume 29, pp. 790-801. VLDB Endowment, 2003.
van Kreveld, Marc, Maarten Loffler, and Frank Staals. "Central trajectories." arXiv preprint arXiv:1501.01822 (2015).
Vlachos, Michail, George Kollios, and Dimitrios Gunopulos. "Discovering similar multidimensional trajectories." In Data Engineering, 2002. Proceedings. 18th International Conference on, pp. 673-684. IEEE, 2002.
Wai, Khaing Phyo, and Nwe New. "Measuring the distance of moving objects from big trajectory data." In Computer and Information Science (ICIS), 2017 IEEE/ACIS 16th International Conference on, pp. 137-142. IEEE, 2017.
Wang, Shuang, Lina Wu, Fuchai Zhou, Cuicui Zheng, and Haibo Wang. "Group Pattern Mining on Moving Objects' Uncertain Trajectories." International Journal of Computers Communications & Control 10, no. 3 (2015): 428-440.
Wang, Wei, Jiong Yang, and Richard Muntz. "STING: A statistical information grid approach to spatial data mining." In VLDB, vol. 97, pp. 186-195. 1997.
Wang, Xiaofeng, Gang Li, Guang Jiang, and Zhongzhi Shi. "Semantic trajectory-based event detection and event pattern mining." Knowledge and information systems 37, no. 2 (2013): 305-329.
Wang, Yilun, Yu Zheng, and Yexiang Xue. "Travel time estimation of a path using sparse trajectories." In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 25-34. ACM, 2014.
Wu, Fei, Tobias Kin Hou Lei, Zhenhui Li, and Jiawei Han. "MoveMine 2.0: mining object relationships from movement data." Proceedings of the VLDB Endowment 7, no. 13 (2014): 1613-1616.
Xie Xing, and Zheng, Yu. "Learning travel recommendations from user-generated GPS traces." ACM Transactions on Intelligent Systems and Technology (TIST) 2, no. 1 (2011): 2.
Xiao, Xiangye, Yu Zheng, Qiong Luo, and Xing Xie. "Inferring social ties between users with human location history." Journal of Ambient Intelligence and Humanized Computing 5, no. 1 (2014): 3-19.
Yan, Zhixian, Dipanjan Chakraborty, Christine Parent, Stefano Spaccapietra, and Karl Aberer. "Semantic trajectories: Mobility data computation and annotation." ACM Transactions on Intelligent Systems and Technology (TIST) 4, no. 3 (2013): 49.
Yan, Zhixian, Nikos Giatrakos, Vangelis Katsikaros, Nikos Pelekis, and Yannis Theodoridis. "SeTraStream: semantic-aware trajectory construction over streaming movement data." Advances in Spatial and Temporal Databases (2011): 367-385.
Yang, Di, Elke A. Rundensteiner, and Matthew O. Ward. "Neighbor-based pattern detection for windows over streaming data." In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, pp. 529-540. ACM, 2009.
Yang, Guodong, Zhitao Huang, and Xiang Wang. "Comparison study of subtrajectory clustering in data mining." In IOP Conference Series: Earth and Environmental Science, vol. 69, no. 1, p. 012143. IOP Publishing, 2017.
Ye, Yang, Yu Zheng, Yukun Chen, Jianhua Feng, and Xing Xie. "Mining individual life pattern based on location history." In Mobile Data Management: Systems, Services and Middleware, 2009. MDM'09. Tenth International Conference on, pp. 1-10. IEEE, 2009.
Ying, Josh Jia-Ching, Wang-Chien Lee, Tz-Chiao Weng, and Vincent S. Tseng. "Semantic trajectory mining for location prediction." In Proceedings of the 19th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 34-43. ACM, 2011.
Yu, Yanwei, Qin Wang, and Xiaodong Wang. "Continuous clustering trajectory stream of moving objects." China Communications 10, no. 9 (2013a): 120-129.
Yu, Yanwei, Qin Wang, Xiaodong Wang, Huan Wang, and Jie He. "Online clustering for trajectory data stream of moving objects." Computer science and information systems 10, no. 3 (2013b): 1293-1317.
Yuan, Guan, Penghui Sun, Jie Zhao, Daxing Li, and Canwei Wang. "A review of moving object trajectory clustering algorithms." Artificial Intelligence Review 47, no. 1 (2017): 123-144.
Yuan, Guan, Shixiong Xia, Lei Zhang, Yong Zhou, and Cheng Ji. "An efficient trajectory-clustering algorithm based on an index tree." Transactions of the Institute of Measurement and Control 34, no. 7 (2012): 850-861.

Yuan, Jing, Yu Zheng, Chengyang Zhang, Xing Xie, and Guang-Zhong Sun. "An interactive-voting based map matching algorithm." In Mobile Data Management (MDM), 2010 Eleventh International Conference on, pp. 43-52. IEEE, 2010.

Yuan, Jing, Yu Zheng, Chengyang Zhang, Wenlei Xie, Xing Xie, Guangzhong Sun, and Yan Huang. "T-drive: driving directions based on taxi trajectories." In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, pp. 99-108. ACM, 2010.

Yuan, Jing, Yu Zheng, Xing Xie, and Guangzhong Sun. "T-drive: Enhancing driving directions with taxi drivers' intelligence." IEEE Transactions on Knowledge and Data Engineering 25, no. 1 (2013): 220-232.

Yuan, Nicholas Jing, Yu Zheng, Liuhang Zhang, and Xing Xie. "T-finder: A recommender system for finding passengers and vacant taxis." IEEE Transactions on Knowledge and Data Engineering 25, no. 10 (2013): 2390-2403.

Zaghlool, Ehab, Saleh ElKaffas, and Amani Saad. "A Density-Based Clustering of Spatio-Temporal Data." In New Contributions in Information Systems and Technologies, pp. 41-50. Springer, Cham, 2015.

Zeitouni, Karine, José AF de Macêdo, and Marco A. Casanova. "CUTiS*: optimized online ClUstering of Trajectory data Stream."

Zhang, Fuzheng, Nicholas Jing Yuan, David Wilkie, Yu Zheng, and Xing Xie. "Sensing the pulse of urban refueling behavior: A perspective from taxi mobility." ACM Transactions on Intelligent Systems and Technology (TIST) 6, no. 3 (2015): 37.

Zhang, Ji, Hongzhou Li, Qigang Gao, Hai Wang, and Yonglong Luo. "Detecting anomalies from big network traffic data using an adaptive detection approach." Information Sciences 318 (2015): 91-110.

Zhang, Tian, Raghu Ramakrishnan, and Miron Livny. "BIRCH: an efficient data clustering method for very large databases." ACM SIGMOD Record, vol. 25, no. 2. ACM, 1996.

Zheng, Yu, and Xiaofang Zhou, eds. Computing with spatial trajectories. Springer Science & Business Media, 2011.

Zheng, Yu, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. "Mining interesting locations and travel sequences from GPS trajectories." In Proceedings of the 18th International Conference on World Wide Web, pp. 791-800. ACM, 2009.

Zheng, Yu, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. "Understanding mobility based on GPS data." In Proceedings of the 10th International Conference on Ubiquitous Computing, pp. 312-321. ACM, 2008.

Zheng, Yu. "Trajectory data mining: an overview." ACM Transactions on Intelligent Systems and Technology (TIST) 6, no. 3 (2015): 29.
Zhou, Aoying, Feng Cao, Weining Qian, and Cheqing Jin. "Tracking clusters in evolving data streams over sliding windows." Knowledge and Information Systems 15, no. 2 (2008): 181-214.