
3.3 The ComeTogether Methodology

3.3.2 PoI Network Analysis

Among the many measures available to quantify different phenomena in networks [109], we consider the clustering coefficient and the average shortest path length, as they are useful for distinguishing real networks from random ones.

The clustering coefficient of a node measures the fraction of possible triangles through that node that actually exist, i.e., the transitivity among triples of nodes. It is used to investigate how clustered the network is, i.e., the probability that two nodes A and C are linked given that the links between A and B and between B and C exist. This property is usually evaluated to identify fundamental characteristics of a network: real networks, such as biological and social networks, usually have a larger clustering coefficient than random graphs generated with the same number of nodes.
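As an illustration, the local clustering coefficient of a node can be computed as the fraction of pairs of its neighbours that are themselves connected, and the network-level value as the average over all nodes. The sketch below assumes an undirected, unweighted view of the network stored as a dict mapping each node to the set of its neighbours (a hypothetical representation, since the PoI network itself is directed and weighted); the helper names are hypothetical, and nodes with fewer than two neighbours get a coefficient of 0.

```python
from itertools import combinations

# Assumed representation: undirected, unweighted graph as dict node -> set of neighbours.


def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are linked to each other."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: no pair of neighbours, hence no triangle
    linked_pairs = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return linked_pairs / (k * (k - 1) / 2)


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


# Toy graph: triangle A-B-C plus a pendant node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(local_clustering(adj, "C"))  # 1 of the 3 neighbour pairs of C is linked
print(average_clustering(adj))     # (1 + 1 + 1/3 + 0) / 4
```

In practice, networkx offers the same measure as `nx.average_clustering` (which by default also counts zero-coefficient nodes), and a random baseline with the same number of nodes and edges can be drawn with `nx.gnm_random_graph(n, m)` for comparison.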

The distance between nodes is also important for a first understanding of the network structure. It is usually measured by the average shortest path length over all pairs of nodes, which indicates whether, on average, a node can be reached from another in a few links. Real networks usually exhibit low values of this measure.
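A minimal sketch of the average shortest path length, again on a hypothetical undirected, unweighted adjacency dict: a breadth-first search from every node gives hop distances, which are averaged over all ordered pairs of distinct nodes. As an assumption, unreachable pairs are simply skipped, so on a disconnected graph the value is averaged over reachable pairs only; the helper names are hypothetical.

```python
from collections import deque

# Assumed representation: undirected, unweighted graph as dict node -> set of neighbours.


def bfs_distances(adj, source):
    """Hop distance from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_shortest_path_length(adj):
    """Mean hop distance over ordered pairs (s, t), s != t, with t reachable from s."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(average_shortest_path_length(adj))  # 16 hops over 12 ordered pairs
```

Note that networkx's `nx.average_shortest_path_length` raises an error on a disconnected graph, so on real PoI networks one would typically restrict the computation to the largest connected component.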

We next present a characterization framework for the nodes in our networks, which is used in Section 3.5, together with the above measures, to study the basic properties of the PoI networks that we have built.

From network connectivity to mobility-related measures

Our aim here is to give meaning to the nodes based on properties that represent how mobile users use them. Let A denote the set of attributes assigned to each node in the network. We define it as

$$A = \{A_{users}, A_{stoptime}, A_{movement}\}$$

Algorithm 1: PoI Network Builder

Input: a set of positional observations O; a set of GPOIs V; an assigning function f from candidate stops to GPOIs; a temporal threshold (minimum stop time) ∆t and a spatial threshold (maximum stop area) ∆s for stop detection; a temporal threshold (maximum moving time) ∆m for creating users' trips

Output: PoI network N = (V, E, W)

N.V ← ∅
N.E ← ∅
N.W ← ∅
for each O_u ∈ O do
    // create user's history
    H_u ← userHistory(O_u)
    // identify candidate stops (def. 3)
    S_u ← candidateStop(H_u, ∆t, ∆s)
    // create user's trajectories (def. 5)
    T_u ← createUserTrajectory(f, S_u, V)
    // create user's trips (def. 6)
    Trip_u ← createUserTrip(T_u, ∆m)
    for each t ∈ Trip_u do
        for each ⟨v_i, v_j⟩ ∈ t do
            // create nodes v_i and v_j
            N.V ← N.V ∪ {v_i, v_j}
            // create edge e_ij
            e_ij ← ⟨v_i, v_j⟩
            N.E ← N.E ∪ {e_ij}
            // update the weight w_ij of the edge e_ij
            update w_ij in N.W
        end
    end
end
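The part of Algorithm 1 that turns trips into nodes, edges and weights can be sketched in Python as below. The sketch deliberately skips userHistory, candidateStop and createUserTrajectory: it assumes each user's trajectory is already a time-ordered list of stops mapped to PoIs, given as hypothetical (poi, arrival, departure) tuples in minutes. The helper names are hypothetical, and two further assumptions are marked in the comments: a new trip starts whenever the moving time between consecutive stops exceeds ∆m (one reading of createUserTrip), and the weight w_ij counts how often the transition ⟨v_i, v_j⟩ occurs.

```python
from collections import defaultdict


def split_into_trips(stops, delta_m):
    """Assumed reading of createUserTrip: start a new trip whenever the
    moving time between two consecutive stops exceeds delta_m."""
    trips, current = [], []
    for stop in stops:
        _, arrival, _ = stop
        if current and arrival - current[-1][2] > delta_m:
            trips.append(current)
            current = []
        current.append(stop)
    if current:
        trips.append(current)
    return trips


def build_poi_network(trajectories, delta_m):
    """trajectories: user -> time-ordered list of (poi, arrival, departure)."""
    nodes, edges = set(), set()
    weights = defaultdict(int)
    for stops in trajectories.values():
        for trip in split_into_trips(stops, delta_m):
            # Consecutive PoIs in a trip give a directed edge <v_i, v_j>.
            for (v_i, _, _), (v_j, _, _) in zip(trip, trip[1:]):
                nodes |= {v_i, v_j}
                edges.add((v_i, v_j))
                weights[(v_i, v_j)] += 1  # assumed weight: transition count
    return nodes, edges, dict(weights)


trajectories = {
    "u1": [("home", 0, 60), ("gym", 70, 130), ("mall", 140, 200), ("home", 400, 500)],
    "u2": [("home", 0, 30), ("gym", 40, 90)],
}
nodes, edges, weights = build_poi_network(trajectories, delta_m=60)
# The 200-minute gap before u1 returns home exceeds delta_m, so mall is not linked to home.
print(weights)
```

As in the pseudocode, a PoI becomes a node only when it takes part in at least one transition within a trip.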

Each attribute in A captures specific information about the nodes in the network, as described below.

• users quantifies the number of users associated with the node, that is, the number of users who visited the PoI represented by that node. It indicates the popularity of the PoI in terms of visits;

• stoptime quantifies the duration of visits. It is computed as the average stop time over all trips;

• movement captures the spatial dimension, i.e., how far the node's neighbors are. It reflects how far users are willing to move to reach another PoI in the city, and can be seen as the spatial proximity between two nodes in the network.
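A minimal sketch of how the three attributes could be computed for every node. Since no formulas are given here, the following are assumptions: users counts the distinct users with at least one stop at the PoI, stoptime is the mean stop duration over all stops at the PoI, and movement is the mean Euclidean distance between the PoI and its neighbours in the network (ignoring edge direction), with coordinates in kilometres on a planar projection. The function name and input formats are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def node_attributes(visits, edges, coords):
    """Hypothetical inputs:
    visits: iterable of (user, poi, stop_minutes) tuples
    edges:  iterable of directed (v_i, v_j) pairs of the PoI network
    coords: poi -> (x, y) in km on a planar projection (assumed)"""
    users, stops = defaultdict(set), defaultdict(list)
    for user, poi, minutes in visits:
        users[poi].add(user)
        stops[poi].append(minutes)
    neighbours = defaultdict(set)
    for v_i, v_j in edges:  # direction is ignored for the movement attribute
        neighbours[v_i].add(v_j)
        neighbours[v_j].add(v_i)
    attrs = {}
    for poi in coords:
        # Assumed movement: mean distance from the PoI to its network neighbours.
        dists = [math.dist(coords[poi], coords[n]) for n in neighbours[poi]]
        attrs[poi] = {
            "users": len(users[poi]),
            "stoptime": mean(stops[poi]) if stops[poi] else 0.0,
            "movement": mean(dists) if dists else 0.0,
        }
    return attrs


coords = {"home": (0, 0), "gym": (3, 4), "mall": (3, 10)}
edges = [("home", "gym"), ("gym", "mall")]
visits = [("u1", "home", 60), ("u2", "home", 30), ("u1", "gym", 60),
          ("u2", "gym", 50), ("u1", "mall", 60)]
print(node_attributes(visits, edges, coords)["gym"])
```

A geographic dataset would replace `math.dist` with a great-circle (e.g., haversine) distance on latitude/longitude pairs.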

With this selection of attributes, we can intuitively interpret each node according to the values of its attributes. We qualitatively categorize the value of each attribute as low or high. Combining these values, we propose the following classes: Personal Spot, Popular Local, Popular Global, Hot Spot Local, Hot Spot Global and Undefined. These classes are discussed below and summarized in Figure 3.3.

[Figure: tree of node classes splitting on Users, Stop Time and Movement (High/Low branches); leaves: Popular Local, Popular Global, Hot Spot Local, Hot Spot Global, Personal Spot, Unclassified]

Figure 3.3: Summary of the node classes based on the users, stoptime and movement attributes.

Personal Spot. Nodes with a low number of users but a high stop time. This class stands for PoIs that few users have visited, but where they have spent a long time. Such places are therefore likely to be of personal interest to those users. Examples of PoIs in this class are gyms or some clubs;

Popular Local. Nodes with a high number of users and a high stop time, but a low movement value. These are popular places, since many users visit them for a long time, but only from a local perspective, i.e., places that are popular within their own neighborhood. An example could be a popular supermarket mainly visited by people from the neighborhood;

Popular Global. Nodes with high values for all the attributes. These are popular places where people tend to spend a long time and to which they travel from distant places. An example could be a major shopping mall that attracts people from different parts of the city and where they spend a long time;

Hot Spot Local. Nodes with a high number of users, a low stop time, and a low movement value. This class encompasses places where people spend a short time after moving a short distance. Possible examples are a pharmacy or a bar, where many people who live nearby stop for a few minutes to buy medicine or drink a coffee. This class can be thought of as representing facility places;

Hot Spot Global. Nodes with a high number of users, a low stop time, and a high movement value. This class represents places that receive many people, coming from many areas of the city, for a short time. The airport could be interpreted as a member of this class: people go there to drop off or pick up friends or relatives, and they tend to come from different parts of the city;

Undefined. All the other combinations, i.e., a low number of users together with a low stop time, are considered undefined, since they are not statistically meaningful w.r.t. the attributes in A.

Note that the local/global distinction is determined by the movement attribute, while the hot spot/popular distinction is determined by the stop duration. This classification gives the PoIs in the city a meaning in addition to their standard categories. Indeed, while categories are static labels assigned by domain experts, these labels are derived from the networks, based on where the places are located in the graph.
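The class definitions above amount to a small decision tree over the three low/high labels. A sketch, assuming each attribute has already been turned into a boolean (True for high); the function name is hypothetical. As in the text, Personal Spot does not depend on movement, and a low number of users combined with a low stop time yields Undefined regardless of movement.

```python
def classify(users_high, stoptime_high, movement_high):
    """Hypothetical helper: map low/high labels (False/True) of users,
    stoptime and movement to one of the node classes."""
    if not users_high:
        # Few users: only a long stop time is meaningful (Personal Spot).
        return "Personal Spot" if stoptime_high else "Undefined"
    if stoptime_high:
        # Many users staying long: popular; movement separates local from global.
        return "Popular Global" if movement_high else "Popular Local"
    # Many users staying briefly: hot spot; movement separates local from global.
    return "Hot Spot Global" if movement_high else "Hot Spot Local"


for labels in [(True, True, False), (True, False, True), (False, True, True)]:
    print(labels, classify(*labels))
```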

These attributes can also be used by location-based services that take the PoIs' characteristics into account. For instance, urban agents might identify PoIs that tend to cause traffic congestion (e.g., Popular Global PoIs) or PoIs to which people are willing to travel long distances (e.g., global PoIs), and relate them to possible traffic problems. Location-based recommender systems could likewise exploit the characteristics of PoIs and users to produce meaningful recommendations.

There are several ways to define a threshold that splits the distribution of each attribute into low and high values. In general, there are at least two possibilities: exploiting experts' domain knowledge in a top-down fashion, or using a bottom-up approach in which the thresholds are inferred directly from the data. In our experiments, we chose the bottom-up approach, whose details are presented in Section 3.5. However, our methodology is general and does not depend on the chosen strategy.
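Both strategies can be expressed with the same helper. The sketch below is illustrative only: the bottom-up default, splitting at the median of the observed values, is a hypothetical choice and not necessarily the method described in Section 3.5, while passing an explicit threshold stands for the top-down, expert-driven alternative. The function name is hypothetical.

```python
from statistics import median


def split_low_high(values, threshold=None):
    """Label each node's attribute value as high (True) or low (False).

    values: node -> attribute value (e.g. users, stoptime or movement).
    threshold: expert-given cut-off (top-down); if None, the median of the
    observed values is used instead (a hypothetical bottom-up choice).
    """
    if threshold is None:
        threshold = median(values.values())
    return {node: value > threshold for node, value in values.items()}


stoptime = {"home": 45, "gym": 55, "mall": 60, "pharmacy": 5}
print(split_low_high(stoptime))                # data-driven cut-off: the median, 50
print(split_low_high(stoptime, threshold=30))  # expert-given cut-off
```

The resulting booleans for users, stoptime and movement are exactly the low/high labels that the node classes combine.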

The above definitions relate to research Question 1, namely: can we study urban mobility at a global scale from the perspective of PoIs instead of users? We believe that characterizing PoIs based on people's mobility is one possible way to answer this question.

Once we have identified such characteristics of the PoIs, it is still important to understand how they relate to each other at a global scale, i.e., how the movements among them create structures of groups known as communities. This leads to research Question 2, i.e., are there patterns in such behaviors? Finding communities is one way to find patterns in the PoI network, as presented hereafter.

In document Recommending places based on the wisdom-of-the-crowd (Pages 64-68)