2017 2nd International Conference on Computer Science and Technology (CST 2017)
ISBN: 978-1-60595-461-5
Algorithms for Mining Human Spatial-Temporal Behavior
Pattern from Mobile Phone Trajectories
Pei-ming BAO a,*, Gen-lin JI, Chen-lu WANG and Yi-bo ZHU
School of Computer Science and Technology, Nanjing Normal University, Nanjing, Jiangsu, China
a[email protected]
*Corresponding author
Keywords: Mobile phone trajectories, Movement pattern, Co-occurrence pattern, Spatial-temporal trajectories mining.
Abstract. Research on human spatial-temporal behavior has always been a focus of urban geographic studies. Today, social networks, smart phones and smart cards provide massive data with which researchers can analyze human behavior. This paper presents algorithms for mining human spatial-temporal behavior patterns from mobile phone trajectories, covering not only the regular movement pattern of individuals but also the co-occurrence pattern among user groups. Parallel versions of the algorithms are proposed for big data of user trajectories, and the experimental results show that the proposed algorithms are effective. Research on human spatial-temporal behavior patterns can be applied to urban spatial-temporal behavior analysis, smart city construction and other areas.
Introduction
Human spatial-temporal behavior has long been a focus of urban geography [1-9]. As data collection methods have changed, research on human spatial-temporal behavior has shown different characteristics. In an early phase, research data were mainly obtained through surveys or interviews. With these methods, however, the cost of data collection is high, the sample is small, the time span is short, and the questionnaires are subjective. With the development of GIS and network map technology, GPS and similar methods have become the main means of data acquisition. Using network maps and thematic information integration technology, research on human spatial-temporal behavior can clearly express changes in urban spatial structure and human activities. In recent years, with the rapid development of the Internet and smart devices, social networks, mobile phones, smart cards and so on provide massive data for human behavior research, for example mobile phone call traffic and base station data, taxi trajectory information, and bus or subway smart card data. These massive data have promoted the development of research methods for human spatial-temporal behavior.
Ahas et al. introduce a model for locating places that are meaningful to mobile telephone users, such as home and work anchor points, using passive mobile positioning data [14]. At present, massive data can be obtained from GPS, the Internet, mobile phones and so on, which places new demands on the technical means used in human behavior research. Parallel computing and data mining are important tools in the era of big data [15]. They can be used to discover valuable information in human spatial-temporal behavior. However, there is little research, either domestically or internationally, on mining human spatial-temporal behavior patterns from massive mobile phone base station data.
This paper proposes an algorithm for mining a user's regular movement pattern from mobile base station data. It helps to analyze individual spatial-temporal behavior as well as abnormal behavior. By comparing regular movement patterns, this paper also proposes an algorithm for mining co-occurrence patterns among users. It helps to analyze group spatial-temporal behavior, and can provide data support for research on urban spatial behavior, smart city construction and other fields.
Problem Definition
Definition 1 (Mobile trajectory): A mobile user's trajectory is a path showing how the user moves in space and time. The trajectory can be expressed as a set of base stations with time stamps, denoted as $Tr = \{\langle c_1,t_1\rangle, \langle c_2,t_2\rangle, \ldots, \langle c_n,t_n\rangle\}$, where $c_i$ is the set of base stations during the time interval $t_i$, $i = 1, 2, \ldots, n$.
Definition 2 (Mobile trajectory segment): A mobile trajectory segment reflects a user's track over one day and night, denoted as $tr_M^u = \{\langle c_{m0},0\rangle, \langle c_{m1},1\rangle, \ldots, \langle c_{m23},23\rangle\}$, where $0, 1, \ldots, 23$ represent the 24 time units in a day, $c_{mi}$ is the set of base stations where the user is located during the i-th time unit, $i \in [0, 23]$, u is the user label and M is the date. A mobile trajectory segment is abbreviated as a trajectory segment.
Every trajectory segment has the same dimension, so the time stamps can be omitted and the trajectory segment abbreviated as $tr_M^u = \{c_{m0}, c_{m1}, \ldots, c_{m23}\}$. For example, $tr_M^u$ = {aaaaφaaaabbbbbbbbbbabaφaφa} stands for a trajectory segment, where a and b represent different base stations and φ indicates no sampled data.
Definition 3 (Trajectory segment similarity): For two trajectory segments $tr_M^1 = \{\langle c^1_{m0},0\rangle, \langle c^1_{m1},1\rangle, \ldots, \langle c^1_{m23},23\rangle\}$ and $tr_N^2 = \{\langle c^2_{n0},0\rangle, \langle c^2_{n1},1\rangle, \ldots, \langle c^2_{n23},23\rangle\}$, the similarity is given by Eq. 1.

$$Sim(tr_M^1, tr_N^2) = \left( \sum_{i=0}^{23} \frac{|c^1_{mi} \cap c^2_{ni}|}{|c^1_{mi} \cup c^2_{ni}|} \right) / 24. \tag{1}$$

Obviously, $Sim(tr_M^1, tr_N^2) \in [0,1]$.
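As an illustration of Eq. 1, the following Python sketch computes the similarity of two trajectory segments, each represented as a list of 24 sets of base station labels. The function name `segment_similarity` is hypothetical, and so is the handling of an hour unit that is empty in both segments: the paper does not say how the 0/0 case is treated, and the sketch assumes that such an hour contributes 0.

```python
def segment_similarity(seg1, seg2):
    """Average hourly overlap of two 24-slot trajectory segments (Eq. 1)."""
    if len(seg1) != 24 or len(seg2) != 24:
        raise ValueError("a trajectory segment must have 24 hourly slots")
    total = 0.0
    for c1, c2 in zip(seg1, seg2):
        union = c1 | c2
        # Assumption (not stated in the paper): an hour unit with no data
        # in either segment contributes 0 instead of an undefined 0/0.
        if union:
            total += len(c1 & c2) / len(union)
    return total / 24


# Hypothetical segments: one user at station a all day, another user
# seen at both a and b for the first 12 hours and not sampled afterwards.
seg_a = [{"a"} for _ in range(24)]
seg_b = [{"a", "b"} for _ in range(12)] + [set() for _ in range(12)]
```

Under that assumption the value always stays in [0, 1], in line with the remark after Eq. 1.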
In this paper, the user's movement pattern is discovered according to the regularity of the user's trajectories.
Definition 4 (Movement pattern): The mobile trajectories of a user u are divided by day into many trajectory segments, denoted as $tr^u_{M_1}, tr^u_{M_2}, \ldots, tr^u_{M_d}$. A frequent trajectory segment is extracted as follows.

$$P^u = \{\langle p^u_0,0\rangle, \langle p^u_1,1\rangle, \ldots, \langle p^u_{23},23\rangle\}. \tag{2}$$

If the following conditions are met, then $P^u$ is called a movement pattern of the user u during T, expressed as $\langle P^u, T\rangle$.
(1) $T \subseteq \{M_1, M_2, \ldots, M_d\}$ and $|T| \geq q$, and $\forall t \in T$, $Sim(P^u, tr^u_t) \geq h$, where q and h are thresholds.
(2) $p^u_k$ is the set of base stations that frequently appear in the k-th hour unit during T. If no base station frequently appears in the k-th hour unit, then $p^u_k = \varphi$, $k \in [0,23]$.
A movement pattern indicates the regularity of the daily trajectories of the user u over a period of time. The frequent trajectory segment $P^u$ can also be abbreviated as $P^u = \{p^u_0, p^u_1, \ldots, p^u_{23}\}$.
Definition 5 (Co-occurrence pattern): Let the movement patterns of users A and B be $\langle P^A, T_A\rangle$ and $\langle P^B, T_B\rangle$, where $P^u = \{p^u_0, p^u_1, \ldots, p^u_{23}\}$, $u \in \{A, B\}$. If the following conditions are met, then there is a co-occurrence pattern of user A and user B.
(1) $T_A \cap T_B \neq \varphi$.
(2) $p^{AB}_i = p^A_i \cap p^B_i$, and $\exists\, p^{AB}_i \neq \varphi$, where $p^{AB}_i$ is the common set of base stations in hour unit $i \in [0,23]$ for users A and B.
A co-occurrence pattern is expressed as $\langle \{(p^{AB}_i, i) \mid p^{AB}_i \neq \varphi \text{ and } i \in [0,23]\}, T_A, T_B\rangle$. For example, suppose the movement patterns of users A and B are <{aaaφaa aabbbbbccbbbbbaa aaaaa}, T_A> and <{e,e,e,e,φ,e,e,ef,f,f,f,fc,c,f,f,f,f,fe,e,e,e,e,e,e}, T_B>, where a, b, c, e and f each denote a base station. The co-occurrence pattern of A and B is <{(c,11),(c,12)}, T_A, T_B>: in the eleventh and twelfth hour units, users A and B both frequently appear at base station c.
A movement pattern reflects an individual's spatial-temporal behavior. A co-occurrence pattern reflects the common spatial-temporal behavior of two users; it can also be extended to multiple users, in which case it describes a group pattern.
Algorithms for Mining the Human Spatial-temporal Behavior
In this section, the algorithm for mining a user's movement patterns is described first, and then the algorithm for mining the co-occurrence pattern between two users is proposed. The detailed steps of the two algorithms are given below.
The algorithm for mining a user's movement pattern is based on the principle of frequent item sets. First, the trajectories are converted and merged into a binary sequence. The binary sequence is then repeatedly shifted right by several bits, each new binary sequence is compared with the original one, and the frequent elements are found by the convolution principle. Finally, the movement pattern is computed. The algorithm for Mining Movement Pattern (MMP) is shown in Algorithm 1.
Algorithm 1: Mining Movement Pattern (MMP)
Input: Base station trajectories of one user's mobile phone; threshold q for the number of trajectory segments in a cluster and threshold h for similarity.
Process:
Step 1: The base station trajectories of the user's mobile phone are divided into trajectory segments by day.
Step 2: All of the user's trajectory segments are clustered according to the similarity threshold h, using the similarity measure of Definition 3. When a cluster contains more than q trajectory segments, Steps 3 to 7 are carried out for that cluster.
Step 3: The trajectory segments in a cluster are converted to a binary sequence.
3.1 The number of distinct base stations in the cluster is counted, denoted as δ.
3.2 The δ base stations are sorted, and each is given a position number in order, from 0 to δ-1. Each base station is encoded with δ bits: the bit at the base station's position number is set to "1" and the remaining bits are set to "0".
3.3 If there are two or more base stations in one hour unit, the logical OR of their codes is taken for that hour unit. A trajectory segment contains 24 sets of base stations, so a trajectory segment is represented by δ×24 bits.
3.4 In chronological order, the k trajectory segments in the cluster are concatenated into a binary sequence of k×δ×24 bits.
Step 4: The binary sequence is repeatedly shifted right by δ×24 bits, and each new binary sequence is compared with the original binary sequence. Each position corresponds to a number, and the start bit is numbered 0. If the values at corresponding positions are both "1", the position number in the original binary sequence is recorded. This gives a set of position numbers, denoted as W.
Step 5: W is grouped into sets W_i by base station. Each position number in W is divided by δ, and the remainder is the base station code. The position numbers whose remainder is i are stored in W_i, that is, all position numbers of base station i are placed in W_i.
Step 6: For each W_i, each position number x in W_i is converted to an hour unit l, forming a new set W_i'. For each position number x in W_i, l = (k×24 − ⌊x÷δ⌋ − 1) mod 24 is calculated, where l is the hour unit in which base station i appears in the trajectory segment.
Step 7: The movement pattern is constructed. The hour units that occur frequently are counted in W_i'. Base station i is added to the movement pattern at those hour units, in time order. If there is no frequent item in an hour unit, the base station set of that hour unit is represented by φ.
Table 1. The trajectory segments in M1-M4 day after clustering.
date 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
M1 a a a a 0 0 a a a ab b b 0 0 b b b b ab a a a a a
M2 a 0 a 0 a a a a a ab b b 0 0 b b b b ab a a a a a
M3 a a a a a 0 a a a ab b b 0 0 b b c c abc a a a 0 a
M4 a a a 0 a a a a 0 ab b b b 0 b bc bc b ab a a a a a
Table 2. Binary code of trajectory segments.
date 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Table 1 contains four trajectory segments; every trajectory segment represents the sequence of base station locations over the 24 hour units of a day. Table 2 shows the binary code of the trajectory segments. By Algorithm 1, the movement pattern extracted from the four trajectory segments is: <{a,a,a,φ,a,φ,a,a,a,ab,b,b,φ,φ,b,b,b,b,ab,a,a,a,a,a},{M1,M2,M3,M4}>
The movement pattern over the four days shows a regular activity a→b→a in the user's life. We can assume that the user's home is within the range of base station a and that the user works within the range of base station b.
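Steps 3 to 7 of Algorithm 1 can be sketched in Python on the four trajectory segments of Table 1. This is only an illustration under several assumptions that are not stated in the paper: the clustering of Step 2 is taken as already done, positions are numbered from the least significant bit (which is what the formula in Step 6 implies), "repeatedly shifted" is read as shifting by 1, 2, ..., k-1 days, and a base station counts as frequent in an hour unit when at least `min_pairs` matches are recorded. The names `encode`, `mine_movement_pattern`, `parse` and `min_pairs` are hypothetical.

```python
from collections import Counter


def encode(segments):
    """Steps 3.1-3.4: pack k daily segments (24 sets each) into one integer."""
    stations = sorted({s for seg in segments for cell in seg for s in cell})
    delta = len(stations)
    code = {s: i for i, s in enumerate(stations)}
    bits = 0
    for seg in segments:  # chronological order; the earliest hour ends up in the highest bits
        for cell in seg:
            slot = 0
            for s in cell:
                slot |= 1 << code[s]  # OR of the one-hot station codes (Step 3.3)
            bits = (bits << delta) | slot
    return bits, stations


def mine_movement_pattern(segments, min_pairs):
    """Steps 4-7 for one cluster; min_pairs is an assumed frequency threshold."""
    k = len(segments)
    bits, stations = encode(segments)
    delta = len(stations)
    counts = Counter()
    for j in range(1, k):  # Step 4: shift right by j days and compare
        matched = bits & (bits >> (delta * 24 * j))
        x = 0
        while matched:
            if matched & 1:
                station = x % delta                     # Step 5
                hour = (k * 24 - x // delta - 1) % 24   # Step 6
                counts[(station, hour)] += 1
            matched >>= 1
            x += 1
    pattern = [set() for _ in range(24)]                # Step 7
    for (station, hour), n in counts.items():
        if n >= min_pairs:
            pattern[hour].add(stations[station])
    return pattern


def parse(row):
    """Read one row of Table 1; '0' marks an hour unit without data."""
    return [set() if cell == "0" else set(cell) for cell in row.split()]


table1 = [
    parse("a a a a 0 0 a a a ab b b 0 0 b b b b ab a a a a a"),
    parse("a 0 a 0 a a a a a ab b b 0 0 b b b b ab a a a a a"),
    parse("a a a a a 0 a a a ab b b 0 0 b b c c abc a a a 0 a"),
    parse("a a a 0 a a a a 0 ab b b b 0 b bc bc b ab a a a a a"),
]
pattern = mine_movement_pattern(table1, min_pairs=3)
```

With four days, a base station seen in the same hour unit on n days yields n(n-1)/2 matches over all shifts, so `min_pairs=3` corresponds to appearing on at least three of the four days. Under that choice the sketch returns the same 24 sets as the movement pattern reported for Table 1.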
The movement pattern that Algorithm 1 extracts for one user can be used to analyze individual spatial-temporal behavior. To mine the co-occurrence pattern between two users, the movement patterns of the two users are intersected. The algorithm for Mining Co-occurrence Pattern (MCP) is shown in Algorithm 2.
Algorithm 2: Mining Co-occurrence Pattern (MCP) between two users
Input: Movement patterns of users A and B.
Output: The co-occurrence patterns between A and B, if any exist. There may be more than one.
Process:
Step 1: The total numbers of movement patterns of users A and B are counted, denoted as m and n respectively. Initialize i=1, j=1, Flag=0.
Step 2: Select the i-th movement pattern of user A and the j-th movement pattern of user B, denoted as $\langle \{p^A_0, p^A_1, \ldots, p^A_{23}\}, T_A\rangle$ and $\langle \{p^B_0, p^B_1, \ldots, p^B_{23}\}, T_B\rangle$.
Step 3: If $T_A \cap T_B = \varphi$, then there is no co-occurrence pattern between the two movement patterns; go to Step 5.
Step 4: If $\exists\, p^{AB}_{t_i} = p^A_{t_i} \cap p^B_{t_i} \neq \varphi$, $t_i \in [0,23]$, then output the co-occurrence pattern between the two movement patterns as $\langle \{(p^{AB}_{t_1}, t_1), (p^{AB}_{t_2}, t_2), \ldots, (p^{AB}_{t_k}, t_k)\}, T_A, T_B\rangle$, and set Flag=1.
Step 5: j=j+1. If j≤n, return to Step 2.
Step 6: i=i+1. If i≤m, set j=1 and return to Step 2.
Step 7: If Flag=0, output "no co-occurrence pattern between two users".
Obviously, Algorithm 2 can easily be extended to multiple users.
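The pairwise comparison of Algorithm 2 can be sketched in Python as follows. Each movement pattern is represented as a pair of a 24-element list of base station sets and a set of dates; this representation and the names `mine_cooccurrence` and `parse` are assumptions made for illustration. The pattern of user B is the one given in the example of Definition 5, while the pattern of user A and the date sets are hypothetical.

```python
def mine_cooccurrence(patterns_a, patterns_b):
    """Compare every movement pattern of A with every movement pattern of B."""
    results = []
    for p_a, t_a in patterns_a:          # loop over i = 1..m
        for p_b, t_b in patterns_b:      # loop over j = 1..n
            if not (t_a & t_b):
                continue                 # Step 3: the periods do not overlap
            common = [(p_a[h] & p_b[h], h) for h in range(24) if p_a[h] & p_b[h]]
            if common:                   # Step 4: at least one shared hour unit
                results.append((common, t_a, t_b))
    return results                       # an empty list corresponds to Flag = 0


def parse(row):
    """Read a 24-entry pattern; '0' stands for an empty hour unit."""
    return [set() if cell == "0" else set(cell) for cell in row.split()]


# Pattern of user B as given in the example of Definition 5.
pattern_b = parse("e e e e 0 e e ef f f f fc c f f f f fe e e e e e e")
# Hypothetical pattern for user A that visits station c at hours 11 and 12.
pattern_a = parse("a a a a a a a a a b b c c b b b b b a a a a a a")
# Hypothetical, overlapping sets of dates.
t_a, t_b = {"M1", "M2"}, {"M2", "M3"}

found = mine_cooccurrence([(pattern_a, t_a)], [(pattern_b, t_b)])
```

For these inputs the only shared hour units are 11 and 12, both at base station c, matching the co-occurrence pattern <{(c,11),(c,12)}, T_A, T_B> of the example.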
Parallel Algorithms for Mining Human Spatial-temporal Behavior Pattern
In 2016, the number of mobile phone users in China reached 620 million, far more than the number of PC users. The massive data from mobile phones contain a lot of potentially useful information, and research based on massive data can produce better results than research based on small data. Therefore, analyzing human spatial-temporal behavior on massive data is an inevitable trend. In the era of massive data, research on parallel algorithms is important for mining human spatial-temporal behavior patterns.
MMP and MCP are suitable for massive data analysis and processing, and can be run on the MapReduce programming model. The reasons are as follows.
First, although the total amount of users' trajectories is huge, a single user's trajectory data is limited. MMP mines the movement pattern from a single user's trajectories, which can be processed on a single node. Using a "divide and conquer" strategy, the movement patterns of different users can be mined in parallel on different nodes. Similarly, MCP can also be carried out on a distributed platform.
Finally, trajectory data are easy to express in the key/value form required by the MapReduce programming model.
A parallel algorithm framework for mining movement pattern and co-occurrence pattern in massive data is shown in Figure 1. Before mining the co-occurrence pattern, all mining tasks of movement patterns must be completed, so a synchronization barrier is required. At the same time, it is necessary to collect, sort and distribute the data from the previous stage.
[Figure 1: the data are divided into disjoint trajectory segment sets D1, D2, ..., Dn; Algorithm 1 mines the movement patterns <P1,T>, <P2,T>, ..., <Pn,T> from them; after a synchronization barrier, Algorithm 2 iterates over the movement patterns to produce co-occurrence patterns.]
Figure 1. A parallel algorithm framework for mining movement pattern and co-occurrence pattern.
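The two-stage structure of Figure 1 can be sketched in plain Python. This is not the MapReduce implementation used in the paper: it only imitates the key/value grouping and the synchronization barrier with a thread pool, and the record layout `(user, date, hour, station)` as well as the names `map_records`, `shuffle`, `to_segments` and `run` are assumptions. The per-user and per-pair miners are passed in as functions, so any implementation of Algorithm 1 and Algorithm 2 can be plugged in.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def map_records(records):
    """Map phase: emit key/value pairs keyed by user."""
    for user, date, hour, station in records:
        yield user, (date, hour, station)


def shuffle(pairs):
    """Group the values by key, as the shuffle phase of MapReduce would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def to_segments(values):
    """Build one 24-slot trajectory segment per date for a single user."""
    days = defaultdict(lambda: [set() for _ in range(24)])
    for date, hour, station in values:
        days[date][hour].add(station)
    return dict(days)


def run(records, mine_user, mine_pair, workers=4):
    """Stage 1 mines each user independently; stage 2 compares user pairs."""
    grouped = shuffle(map_records(records))
    users = sorted(grouped)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Collecting all stage-1 results before continuing acts as the barrier.
        mined = list(pool.map(lambda u: mine_user(to_segments(grouped[u])), users))
        patterns = dict(zip(users, mined))
        pairs = list(combinations(users, 2))
        compared = list(pool.map(lambda p: mine_pair(patterns[p[0]], patterns[p[1]]), pairs))
    return patterns, dict(zip(pairs, compared))


# Hypothetical records and trivial stand-ins for the two miners.
records = [("u1", "M1", 0, "a"), ("u1", "M1", 1, "a"),
           ("u2", "M1", 0, "a"), ("u1", "M2", 0, "b")]
patterns, cooccurrence = run(
    records,
    mine_user=lambda segments: segments,                 # stand-in for Algorithm 1
    mine_pair=lambda pa, pb: sorted(set(pa) & set(pb)),  # stand-in: common dates
)
```

Python threads do not speed up CPU-bound work, so the sketch shows only the data flow: per-user tasks are independent, and the pairwise stage starts only after every per-user task has finished.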
Experiments
Data Source
Experiment 1 used the Reality Mining data set, provided by 106 volunteers. The trajectories of 81 users from September to December 2004 were chosen for the experiment. Experiment 2 used synthetic data.
Result of Experiment
In Experiment 1, movement patterns were mined for every user. Only a single movement pattern was found for 40 users, while the other 41 users have two or more movement patterns. Five of the users are graduate students of a laboratory. They appeared within the range of base station 5119 during working hours every day, and within the range of base station 5113 the rest of the time. This can be understood as a regular "home-laboratory-home" pattern.
Table 3. Co-occurrence pattern among 3 users.
user1 | user2 | co-occurrence pattern | time of co-occurrence
4 | 26 | <{(b,8),(b,9),(b,10),(b,11),(b,12),(b,13),(b,14),(b,15),(b,16),(b,17),(b,18),(b,19),(b,20),(b,21)},T1,T3> | Early September ~ Middle of November
4 | 26 | <{(c,0),(c,1),(c,2),(c,3),(c,4),(c,5),(c,6),(c,7),(c,8),(c,10),(c,11),(c,12),(c,13),(c,14),(c,15),(c,16),(c,17),(c,18),(c,19),(c,20),(c,21),(c,22),(c,23)},T2,T4> | Late November
26 | 82 | <{(b,0),(b,1),(b,2),(b,3),(b,4),(b,5),(b,6),(b,7),(b,8),(b,9),(b,10),(b,11),(b,12),(b,17),(b,18),(b,19),(b,20),(b,21),(b,22),(b,23)},T3,T5> | Late September ~ Middle of November
26 | 82 | <{(c,0),(c,1),(c,2),(c,3),(c,4),(c,5),(c,6),(c,7),(c,8),(c,9),(c,10),(c,11),(c,12),(c,13),(c,14),(c,15),(c,16),(c,17),(c,18),(c,19),(c,20),(c,21),(c,22),(c,23)},T4,T6> | Late November
4 | 82 | <{(b,8),(b,9),(b,10),(b,11),(b,12),(b,17),(b,18),(b,19),(b,20),(b,21),(b,22),(b,23)},T1,T5> | Early November
4 | 82 | <{(c,0),(c,1),(c,2),(c,3),(c,4),(c,5),(c,6),(c,7),(c,8),(c,9),(c,10),(c,11),(c,12),(c,13),(c,14),(c,15),(c,16),(c,17),(c,18),(c,19),(c,20),(c,21),(c,22),(c,23)},T
The co-occurrence results reveal a set of 17 users in which any two users share at least two kinds of co-occurrence pattern. Table 3 lists only the co-occurrence patterns among 3 of these users. One kind of pattern is centered on base station 5119 (denoted as b in Table 3), the other on base station 5188 (denoted as c in Table 3). According to the changes in the co-occurrence patterns, these 17 users moved together from the range of base station 5119 to the range of base station 5188 in late November 2004. Combined with the communication between the users, it can be inferred that they form a group of 17 people.
According to the survey report on their identities, user 4 is not a first-year graduate student in the laboratory, while users 26 and 82 are first-year graduate students. Table 3 shows a strong co-occurrence pattern between users 26 and 82: they are within the range of the same base station nearly all day. From early September to the middle of November, user 4 co-occurs with user 26 or 82 only during working hours. In late November, however, they stay together almost all the time. In late November the laboratory's students moved together from base station b to base station c, possibly for a common activity. The co-occurrence pattern thus reflects not only the shared regularity of users' activities but also how that regularity changes.
Efficiency of Parallel Algorithm
In Experiment 2, the environment consists of 18 servers (1 master node and 17 slave nodes), each with the same configuration. The parallel mining algorithm is implemented on the MapReduce platform. The efficiency of the algorithm is shown in Figure 2. As the amount of data increases, the running time of the parallel algorithm grows slowly, while the running time of the serial algorithm rises rapidly.
Figure 2. Running time of parallel algorithm.
Summary
Based on spatial-temporal data from mobile phone base stations, this paper proposed an algorithm for mining the movement pattern of an individual's regular activity. By comparing movement patterns, an algorithm is presented for mining the co-occurrence pattern between users. The experimental data are derived from a public data set. The experimental results show that the work in this paper can effectively find significant relationships among mobile phone users, and can detect changes in the lives of individuals and groups. The algorithms are general and can be applied to other spatial-temporal data on human behavior.
Co-occurrence patterns can provide data support for intelligent transportation, smart security and intelligent urban management, and can broadly serve smart city construction, for example in urban planning, social management and residential services. In addition, co-occurrence patterns can provide a basis for discovering human social relationships.
Acknowledgment
This work is supported by the National Natural Science Foundation of China under Grant No. 41471371.
References
[1] Alexandra Millonig and Georg Gartner. Exploring Human Spatio-Temporal Behaviour Patterns[C]. Proceedings of the 17th International Research Symposium on Computer-based Cartography. 2008.
[2] Günther Sagl, Bernd Resch, Bartosz Hawelka, Euro Beinat. From Social Sensor Data to Collective Human Behaviour Patterns – Analysing and Visualising Spatio-Temporal Dynamics in Urban Environments. Jekel, T., Car, A., Strobl, J. & Griesebner, G. (Eds.) (2012): GI_Forum 2012: Geovizualisation, Society and Learning. Herbert Wichmann Verlag, VDE VERLAG GMBH, Berlin/Offenbach. 2012:54-63.
[3] Xingxing Xing, Man Li, Weisong Hu, Wenhao Huang, Guojie Song, Kunqing Xie. A Spatial-temporal Topic Segmentation Model for Human Mobile Behavior. 15th International Conference, WAIM 2014, Macau, China, June 16-18, 2014:255-267.
[4] Gunarto Sindoro Njoo, Xiao Wen Ruan, Kuo-Wei Hsu, Wen-Chih Peng. Inferring User Activities from Spatial-Temporal Data in Mobile Phones. UbiComp/ISWC'15 Adjunct: Adjunct Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2015 ACM International Symposium on Wearable Computers,2015:65-68.
[5] Jameson L. Toole, Michael Ulm, Marta C. González, Dietmar Bauer. Inferring land use from mobile phone activity. UrbComp '12: Proceedings of the ACM SIGKDD International Workshop on Urban Computing. August 2012:1-8.
[6] Mi Diao, Yi Zhu, Joseph Ferreira Jr and Carlo Ratti. Inferring individual daily activities from mobile phone traces: A Boston example. Environment & Planning B Planning & Design, 2015, 43:1-20.
[7] Huang Xiao Ting, Wu Bi Hu. Intra-attraction Tourist Spatial-Temporal Behaviour Patterns[J]. Tourism Geographies, 2012, 14(4):1-21.
[8] Francesco Calabrese, Mi Diao, Giusy Di Lorenzo, Joseph Ferreira, Jr., Carlo Ratti. (2013) Understanding individual mobility patterns from urban sensing data: A mobile phone trace example. Transportation Research Part C: Emerging Technologies 2013,26(1):301–313.
[10] Yuan Y, Raubal M, Liu Y. 2012. Correlating mobile phone usage and travel behavior: A case study of Harbin, China[J]. Computers, Environment and Urban Systems, 2012,36(2):118-130.
[11] LONG Ying, ZHANG Yu, CUI Chengyin. Identifying Commuting Pattern of Beijing Using Bus Smart Card Data [J]. Acta Geographica Sinica, 2012,67(10):1339-1352.
[12] Farrahi K, Gatica-Perez D.2011.Discovering routines from large-scale human locations using probabilistic topic models[J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2011,2(1):135-136.
[13] Phithakkitnukoon S, Horanont T, et al. 2010. Activity-aware map: Identifying human daily activity pattern using mobile phone data[C]. Salah A A, Gevers T, Sebe N, et al. Human Behavior Understanding. Berlin, Germany: Springer 2010:14-25.
[14] Ahas R, Silm S, Järv O, et al. 2010. Using mobile positioning data to model locations meaningful to users of mobile phones[J]. Journal of Urban Technology, 2010,17(1):3-27.