
5. Conclusions and Future Work

5.1. Conclusions

The mobility profile of an individual user was generated by applying data analytics processes to extract insights from raw data collected in drive tests. Using the HAC method, 29 mobility sequences recorded along five routes were classified accurately into five groups of trajectories, plus one additional group containing the mobility sequences registered with a SIM card of a different operator. The process implemented in this work is scalable: it can discover, learn and categorize new routes as new mobility sequences are added, as opposed to classification methods, where the data categories have to be known in advance.

Dissimilarity calculations between data sequences have to be performed as part of the knowledge discovery process. For instance, when manipulating categorical data, edit distance methods provide better results than numerical distance functions based on Euclidean or Manhattan distances. Moreover, as data collected from mobile networks varies over time, distance calculation methods need to be combined with dynamic programming techniques such as sequence alignment, in order to take into account the changes in velocity and the path uncertainty introduced by events such as road traffic, waiting for public transport, etc.
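As an illustration of why an edit distance suits categorical sequences, the following minimal Python sketch computes the Levenshtein distance between sequences of cell identifiers with dynamic programming. It is not the code used in this work; the function name and the PCI values are hypothetical and serve only as an example.

```python
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences of categorical labels."""
    # prev[j] = cost of turning the items of `a` seen so far into the first j items of `b`
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical PCI sequences (assumed values, not measured data).
# trip_b follows the same cells as trip_a but stays longer on cell 17,
# for example while waiting in traffic; trip_c is a different route.
trip_a = [12, 17, 17, 44, 90]
trip_b = [12, 17, 17, 17, 44, 90]
trip_c = [301, 305, 305, 310, 322]

print(edit_distance(trip_a, trip_b))  # small cost: one extra sample on cell 17
print(edit_distance(trip_a, trip_c))  # large cost: no cell in common
```

The labels are compared only for equality, so the numeric gap between two PCI values, which carries no physical meaning, never enters the cost.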

The number of groups generated depends on the dissimilarity level at which the clustering structure is observed. For this reason, a grouping threshold parameter is proposed in this work to tune the identification of the different groups of trajectories discovered.
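The effect of a grouping threshold can be sketched in a few lines of Python. The example below is a simplified, assumed implementation of agglomerative clustering with average linkage that stops merging once the closest pair of clusters is farther apart than the threshold, which corresponds to cutting the dendrogram at that height. The function name and the toy dissimilarity matrix are hypothetical.

```python
from typing import List


def average_linkage_groups(dist: List[List[float]], threshold: float) -> List[List[int]]:
    """Agglomerative clustering (average linkage) stopped at `threshold`."""
    clusters: List[List[int]] = [[i] for i in range(len(dist))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average of all pairwise distances between the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to be grouped
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Toy dissimilarity matrix for five sequences (assumed values).
D = [
    [0, 1, 8, 9, 10],
    [1, 0, 7, 8, 10],
    [8, 7, 0, 2, 9],
    [9, 8, 2, 0, 9],
    [10, 10, 9, 9, 0],
]

print(average_linkage_groups(D, threshold=3))    # fine-grained grouping
print(average_linkage_groups(D, threshold=8.5))  # coarser grouping
```

Raising the threshold merges more sequences into the same group, so the parameter directly controls how many trajectory groups are reported.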

Finally, the framework for an AI-based knowledge process for radio access optimization and planning proved effective for executing the methodology of this work. It can drive decisions with applications in future 5G networks and enhanced SON proposals, and can support new business use cases such as cognitive intelligence platforms and OTT support.

5.2. Future Work

The algorithm proposed in this work could become more efficient if different distance calculation methods are adapted and evaluated for constructing the dissimilarity matrix between categorical sequences. Dynamic Time Warping is a method for calculating the dissimilarity between time-varying sequences that detects stretches in the data caused by events such as obstacles, minor route changes and other uncertainties in the walking path. However, it is designed for numerical sequences only. Adapting the dynamic time warping algorithm to categorical sequences, such as the mobility vector sequences proposed here, could improve the accuracy of the dissimilarity computation.
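One possible adaptation, shown here only as a sketch and not evaluated in this work, is to keep the standard dynamic time warping recursion and replace the numerical local cost with a 0/1 mismatch indicator between labels. The function name and the PCI sequences below are hypothetical.

```python
from typing import Hashable, Sequence


def categorical_dtw(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """DTW cost where the local cost is 0 for equal labels and 1 otherwise."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.0 if a[i - 1] == b[j - 1] else 1.0
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again (b is compressed here)
                cost[i][j - 1],      # b[j-1] is matched again (b is stretched here)
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical PCI sequences (assumed values, not measured data).
walk_a = [12, 17, 44, 90]
walk_b = [12, 17, 17, 17, 44, 90]  # same path, longer stay on cell 17
walk_c = [12, 23, 44, 90]          # minor route change through cell 23

print(categorical_dtw(walk_a, walk_b))
print(categorical_dtw(walk_a, walk_c))
```

With this local cost, a sequence that is only stretched in time is not penalized, whereas a change of cell along the path still adds to the dissimilarity.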

Once the routes are discovered and categorized into the mobility pattern profile, the profile can be used as a training set for ML classification algorithms. However, HAC by itself also classifies the different groups without perceptible computing effort.

Also, mobility behaviour can be characterized according to the mobility states of the user, for instance, the periods in which the user is moving or in low mobility. In the early stages of the research for this work, an approach to this case using hierarchical clustering functions was studied; it is described in Appendix A and Appendix B. Combined with the results obtained in this work, it can be of interest for resource scheduling and for finding further applications in RAN automation.


Appendices

Appendix A

Hierarchical clustering techniques comparison.

For testing and comparing the hierarchical clustering functions, two datasets, “Dataset 1” and “Dataset 2”, were prepared in which only mobility states are represented.

Building the datasets required data vectors of the same size. Therefore, a time scale of one day divided into time spans of 15 minutes was selected. For each 15-minute time span, the PCI was evaluated according to the following mobility criteria:

• If a PCI was stable for more than 60% of the time span, this PCI was chosen as the dominant PCI. This case was considered a low mobility scenario.

• Otherwise, if no PCI was stable for more than 60% of the time span, the case was considered a medium-high mobility scenario and no dominant PCI was chosen.
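The two criteria above can be sketched in Python as follows. This is an illustrative approximation: it assumes that the PCI samples within a time span are equally spaced, so that the share of samples stands in for the share of time, and the function name, the parameter `min_share` and the sample values are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def dominant_pci(samples: Sequence[Hashable], min_share: float = 0.6) -> Optional[Hashable]:
    """Return the dominant PCI of a time span, or None if there is none.

    A PCI is dominant when it accounts for more than `min_share` of the
    samples (low mobility). None stands for a medium-high mobility span.
    """
    if not samples:
        return None
    pci, count = Counter(samples).most_common(1)[0]
    return pci if count / len(samples) > min_share else None


# Hypothetical 15-minute spans (assumed values, not measured data).
mostly_static = [101] * 10 + [205] * 5   # PCI 101 covers two thirds of the span
moving = [101, 205, 309, 101, 205]       # no PCI exceeds 60%

print(dominant_pci(mostly_static))  # dominant PCI found: low mobility
print(dominant_pci(moving))         # None: medium-high mobility
```

Applying this rule to every 15-minute span of a day yields one fixed-length vector per day, which is what the distance computation requires.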

According to the mobility criteria, a dataset table was built as shown in Fig. 31.

Fig. 31. Extract of Dataset1 pre-processed.

The time scale was arranged as columns. In Dataset 1, the 15-minute frames were labelled and ordered from 0 to 23.45, where 0 corresponds to the time 00:00 and 23.45 to the time 23:45. The R library “stringdist” is used to calculate the edit distance between the mobility sequences. The function stringdistmatrix() builds a dissimilarity matrix by computing the Hamming distance between the different sequence vectors. The resulting distance matrix “D” has the following structure, where each entry is the distance cost between the sequences of the corresponding row and column.
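For readers who want to reproduce the idea outside R, the construction of such a matrix can be sketched in plain Python. This is not the stringdist code used in this work; the function names and the daily sequences are hypothetical, with "M" assumed to mark a span without a dominant PCI.

```python
from typing import Hashable, List, Sequence


def hamming(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(sequences: List[Sequence[Hashable]]) -> List[List[int]]:
    """Symmetric matrix of pairwise Hamming distances."""
    n = len(sequences)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = hamming(sequences[i], sequences[j])
    return d


# Hypothetical daily sequences, shortened to six time spans (assumed values).
day1 = ["101", "101", "M", "205", "205", "101"]
day2 = ["101", "101", "M", "M", "205", "101"]
day3 = ["309", "309", "309", "M", "M", "309"]

for row in distance_matrix([day1, day2, day3]):
    print(row)
```

The Hamming distance is only defined for vectors of the same length, which is why the datasets were built on a fixed grid of 15-minute spans.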

Fig. 32. Edit Distance Matrix from Dataset1

This distance matrix is passed as input to the AGNES and DIANA functions to build the cluster structure. The first cluster structure built is the HAC with the average linkage method:

Fig. 33. AGNES dendrogram of Dataset1 in R.

Fig. 34. DIANA dendrogram of Dataset1 in R.

The outcomes of both hierarchical clustering techniques, AGNES and DIANA, are highly similar. However, DIANA does not offer an option for controlling the linkage method used to join the different cluster objects, which adds uncertainty about how the cluster structure is built. For this project, only AGNES is used to overcome this limitation.

Appendix B

Hierarchical agglomerative clustering for characterizing mobility periods.

Dataset 2, shown in Fig. 35, takes into account only the high mobility sequences along a normalized time scale. For the dataset preparation, only high mobility periods of ten records were considered, labelled from 1 to 10, each one representing a 15-minute span. The same criterion for PCI evaluation and high mobility periods used for the first dataset was applied.

Fig. 35. Dataset2 representing high mobility and non-mobility periods in a normalized time scale.

This case focuses on characterizing the different mobility states described in Dataset 2 by applying the HAC technique through the AGNES function in RStudio.

After the edit distance is calculated on the dataset, the AGNES algorithm separates the different mobility routines into two large groups: morning and afternoon. Within both groups, the algorithm also subclassifies the changes in the summer routine.

Fig. 36. RStudio Hierarchical Clustering dendrogram with average linkage.

The HAC algorithm is capable of classifying the mobility behavior according to the mobility periods of the user and the base station registered in the low mobility state. As Fig. 36 shows, three main groups are obtained: the morning branch, the morning summer branch and the afternoon branch.

However, it can be extended to characterize the mobility periods of full days, helping to identify whether a given day is a working day, a weekend day or part of a vacation season.
