
Prediction of Next Location of Visit by using GPS Data

With mobile phones now widespread, human mobility data is being captured and stored as never before. This has motivated research into human mobility patterns, which has received considerable attention in recent years as ever larger volumes of detailed mobility data have become available. The advancement and pervasiveness of wireless communication technologies have not only increased the number of users taking part in human mobility studies but also meant that the areas considered in such studies are much larger than ever before. As a result, recent research has produced interesting findings about users’ mobility patterns. It has been established that individuals exhibit high regularity in their mobility patterns regardless of the size of the area in which they move [49, 93]; for instance, users in a city environment oscillate between home and work every weekday, while students on a university campus regularly visit a specific set of rooms to attend classes [33]. Setting aside any strange or unusual visiting habits, the researchers in [92] were able to find universal laws that govern users’ mobility behaviour when visiting new places or revisiting locations that they have been to in the past. In [28], researchers found that members of the same social group exhibit the same mobility behaviours. A similar finding was reported in [33], where users from the same social group are likely to visit the same location when they are in the company of one another. A key benefit of these findings is that a reliable model can be developed for inferring users’ future movement. In the remainder of this section, we provide a critical review of selected works from the technical literature that address the problem of predicting the next location of visit by using GPS data.

1 Global Positioning System

2.2.1 Extracting Locations of Visit From Raw Data

An observed user’s location of visit can be a place that the user frequently visited in the past or a place where s/he stayed for a significant time. Such a location does not have to be a place that the user visits in order to socialise with other people, such as a restaurant. It can be any frequently visited place, such as a petrol station or a busy junction on the user’s daily journey to work. Figure 2.1 highlights the set of locations learned from one individual’s GPS recordings obtained from the Nokia MDC data set [71]. Some of the locations shown in Figure 2.1 correspond to geographically meaningful locations such as “home” or “work place”, but there are equally other locations that do not correspond to such meaningful geographic places, for instance a busy junction on the road. Generally, a mobility trace in a GPS data set is a sequence of latitude and longitude pairs in which each pair is associated with a time-stamp. A host of methods have been proposed over the past few years for extracting locations of visit from such data [5, 18, 31, 84, 118]. For example, in [5] a two-step method was proposed for extracting significant locations of visit, which are later analysed to predict the next location of visit using a Markov model [84]. In step one, the significant locations of visit are detected by using the points where the mobile device loses connection to the GPS satellites. In step two, clusters of locations are formed by using a variant of the K-Means algorithm. At the start of the clustering process, the location clusters are centred at K selected points with a given radius; a cluster with a large radius may correspond to a city, while a cluster with a small one may correspond to a campus or an office building. The drawback of this method is its dependence on loss of signal to detect locations of visit: it would fail to detect locations with continuous signal reception, for example open locations such as an open-air market with stalls, where reception is likely to be uninterrupted. On the other hand, the method would probably succeed in detecting office buildings and other similar locations which are likely to have no GPS signal reception.

The authors of [118] proposed a clustering method called DJ-Cluster, which uses density and joining concepts to extract significant locations of visit. As in other density-based methods, a dense point in this method is a point whose number of neighbours is greater than or equal to a user-defined minimum threshold. Clusters are then created by joining dense points together in the same cluster if they have neighbouring points in common. An improved method, proposed by the same authors, removes a GPS reading if its speed is greater than zero or if its distance from the previous reading is below a given threshold. The test results for the new method indicate an improvement over K-Means in terms of precision and recall, and over DBSCAN [35] in terms of time and memory requirements. In [18], a semantics-enhanced clustering algorithm, called SEM-CLS, was proposed for extracting semantically meaningful locations. This method separates semantically different locations into different clusters and merges locations with similar semantics into the same cluster.
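The density and joining idea described for DJ-Cluster can be illustrated with a minimal Python sketch. It is not the implementation from [118]: the function names, the haversine distance, the default values of `eps_m` and `min_pts`, and the brute-force neighbour search are all assumptions made for illustration, and the speed and distance pre-filtering of the improved method is omitted.

```python
import math


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def density_join_clusters(points, eps_m=50.0, min_pts=3):
    """Return one label per point: a cluster id, or None for noise.

    eps_m and min_pts are illustrative defaults, not values from the literature.
    """
    labels = [None] * len(points)
    clusters = {}  # cluster id -> set of point indices
    next_id = 0
    for i, p in enumerate(points):
        # Neighbourhood of p: every point within eps_m metres, p included.
        neigh = {j for j, q in enumerate(points) if haversine_m(p, q) <= eps_m}
        if len(neigh) < min_pts:
            continue  # p is not dense; it stays noise unless joined later
        # Join step: merge with every existing cluster that shares a point.
        shared = {labels[j] for j in neigh if labels[j] is not None}
        merged = set(neigh)
        for cid in shared:
            merged |= clusters.pop(cid)
        clusters[next_id] = merged
        for j in merged:
            labels[j] = next_id
        next_id += 1
    return labels
```

Points that are never dense and never fall inside a dense neighbourhood keep the label `None`, which corresponds to the noise points shown in grey in Figure 2.1. The quadratic neighbour search is adequate only for small traces; a spatial index would be needed for realistic data volumes.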

2.2.2 Next Place Prediction Models

To decipher the complexity of predicting human mobility, many approaches have been proposed for building models that can accurately predict individuals’ future locations of visit. Generally, these approaches can be divided into three major categories based on the perspective from which the data is considered: spatial, temporal, and joint spatio-temporal approaches. Researchers have investigated users’ spatial patterns in mobile data, and various prediction approaches have been proposed, such as [114]. Other methods rely on the user’s temporal patterns to predict the next place of visit, as shown in [3]. However, discovering the correct temporal patterns in human mobility is challenging, since temporal behaviour involves much more uncertainty than spatial behaviour [94].

Figure 2.1: The locations visited by one of the users as detected by the DBSCAN algorithm [36]. The grey colour shows clusters of GPS points identified as noise while the other colours show the discovered locations of interest.

In [87], Scellato et al. proposed a spatio-temporal framework, called NextPlace, which uses non-linear time series analysis of users’ arrival and residence times to predict temporal behaviour. Chon et al. [23] used fine-grained and continuous mobility data to evaluate several mobility models. They argued that the granularity of the mobility data used in the literature is too coarse to precisely capture users’ daily movement patterns. Vu et al. [103] introduced a framework for building predictive models of people’s movement, although they used joint Wi-Fi/Bluetooth traces rather than GPS data. The proposed framework uses a type-of-day categorisation (such as weekday and weekend) to filter redundant information from users’ historical data. Noulas et al. [79], on the other hand, studied the problem of predicting the next venue that a mobile user will visit by extracting features from the check-in data of Foursquare users. The extracted features exploit information about transitions between types of places, movement between different venues, and spatio-temporal patterns of user check-ins. They proposed two learning models, based on linear regression and M5 model trees, which combine all individual features. With a list of thousands of candidate venues, this supervised methodology offered high levels of prediction accuracy: the M5 model trees ranked the visited venue among the top fifty candidates for one in two user check-ins.
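The result reported for [79] reads as a top-k hit rate: the fraction of check-ins for which the venue actually visited appears among the k highest-ranked candidates. The sketch below shows that measure under this reading. The function name `accuracy_at_k` and its inputs are hypothetical, and it is not the evaluation code used in [79].

```python
def accuracy_at_k(ranked_lists, actual, k):
    """Fraction of check-ins whose true venue is among the top-k ranked venues.

    ranked_lists[i] is the model's ranking (best first) for check-in i and
    actual[i] is the venue that was really visited. Assumes `actual` is
    non-empty and both sequences have the same length.
    """
    hits = sum(1 for ranking, venue in zip(ranked_lists, actual)
               if venue in ranking[:k])
    return hits / len(actual)
```

Under this reading, "one in two user check-ins in the top fifty" corresponds to `accuracy_at_k(..., k=50)` returning roughly 0.5.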

2.2.2.1 A Single-user Model Versus a Multi-user Model

Prediction models of future locations of visit have predominantly been implemented using a one-model-per-user approach. For example, Krumm [68] used a Markov model for making short-term route predictions for vehicle drivers. Ashbrook and Starner [4] suggested a model in which locations are incorporated into a Markov model that can be consulted by a variety of applications in both single-user and collaborative scenarios, where multiple single-user models can be shared. Unfortunately, it is not clear how they evaluated their models, apart from showing that the predictions of their single-user model were compared against “random chance”. They also did not address situations in which the user has no mobility history to exploit when predicting the future location of visit. Moreover, sharing multiple single-user models inevitably raises concerns about the privacy of users’ information; for example, a service provider could gain access to a user’s mobility history embedded in that user’s single-user model. Contrary to the modelling style adopted in [4] and [68], Chapter 3 of this thesis presents a collective (i.e. multi-user) next location prediction model which does not store identifiable mobility records of individual users in order to predict their future locations of visit. This collective model is a principled and scalable implementation of a variable-length Markov model. Furthermore, Chapter 3 presents various models that address situations in which the user has no mobility history to exploit for inferring future locations of visit.
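To make the one-model-per-user approach concrete, the following sketch fits a first-order Markov model to a single user's sequence of visited locations and predicts the most frequent successor of the current location. It is an illustration only: it is neither the model of [4] or [68] nor the variable-length collective model of Chapter 3, and the function names and location labels are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(visits):
    """Count transitions between consecutive locations in one user's visit sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(visits, visits[1:]):
        counts[current][nxt] += 1
    return counts


def predict_next(counts, current):
    """Return the most frequent successor of `current`, or None if it was never seen."""
    successors = counts.get(current)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Hypothetical visit history for a single user.
history = ["home", "work", "gym", "home", "work", "home", "work", "gym"]
model = fit_transitions(history)
```

The `None` returned for an unseen location illustrates the limitation noted above: a single-user model has nothing to draw on when the user has no relevant mobility history.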