2.2 Trajectory Data Management and Applications

2.2.2 Trajectory Data Queries

In a trajectory database, where objects are non-uniform series of spatial locations with temporal attributes attached, one must take both sequentiality and the temporal dimension into account. Distance-based queries are even more challenging, since they require a distance function to calculate the distance (i.e. similarity) between trajectories, which is not a trivial problem given the non-uniform sequential nature and the temporal dimension of trajectories. Nevertheless, tens of similarity and distance measures for trajectory data have been proposed in the literature [157] [162]; we discuss trajectory distance measures in more detail in Section 2.2.4.

Formally, given an input trajectory dataset T with n records, for any two trajectories Ti, Tj ∈ T, d(Ti, Tj) denotes the distance (or similarity) between them. In this work we consider the following operations over trajectory datasets, due to their wide application in practice [31] [33] [35] [56] [130] [153] [158] [172].

A spatial-temporal selection retrieves all trajectories within a given spatial region and time interval. It is similar to a SELECT/FROM/WHERE clause in relational databases, where the predicate is that the trajectory overlaps both the region and the time interval.

Definition 3. (Spatial-Temporal Selection) Given a trajectory dataset T, a spatial region R, and a time interval from t0 to t1, a spatial-temporal selection, namely ST(T, R, t0, t1), finds all trajectory segments si ∈ T active during [t0, t1] that intersect the region R, that is, ST(T, R, t0, t1) = {si ∈ T | si ⊂ (T ∩ R ∩ [t0, t1])}.

Selection queries are useful for extracting a small sample of a large dataset for a given time interval and spatial predicate (e.g. range selection, intersect, overlap), for example: “select all trajectories from a given neighborhood in New York City that were active yesterday during peak time”. An example of a spatial-temporal selection query region R is given in Figure 2.6.

Figure 2.6: Example of a spatial-temporal selection: given a query region R over the city of Brisbane, we want to retrieve only those trajectories inside R and active during a given time interval [t0, t1].
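As an illustration of Definition 3, the following is a minimal sketch of a spatial-temporal selection. It assumes that a trajectory is a list of (x, y, t) samples and that the region R is an axis-aligned rectangle; the name `st_selection` and both representations are hypothetical choices made for the example only. Each returned segment is a maximal run of consecutive samples lying inside R and inside [t0, t1].

```python
from typing import List, Tuple

Point = Tuple[float, float, float]        # (x, y, t), an assumed sample layout
Trajectory = List[Point]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed region


def st_selection(dataset: List[Trajectory], region: Rect,
                 t0: float, t1: float) -> List[Trajectory]:
    """Return the trajectory segments lying inside `region` during [t0, t1]."""
    xmin, ymin, xmax, ymax = region
    segments: List[Trajectory] = []
    for traj in dataset:
        run: Trajectory = []  # current run of consecutive matching samples
        for (x, y, t) in traj:
            inside = xmin <= x <= xmax and ymin <= y <= ymax and t0 <= t <= t1
            if inside:
                run.append((x, y, t))
            elif run:
                # The object left the region or the time window: close the segment.
                segments.append(run)
                run = []
        if run:
            segments.append(run)
    return segments
```

This sketch scans every sample of every trajectory; it only illustrates the semantics of the operation and makes no attempt at indexing.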

Definition 4. (Topological Selection) Given a trajectory dataset T, a query object Q (e.g. a polygon, a trajectory, a circle), and a topological predicate ⊗ (e.g. intersect, touch, overlap), a topological selection finds all trajectories Ti ∈ T such that Ti ⊗ Q is true.
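A topological selection can be sketched as a filter parameterised by the predicate. The example below assumes that trajectories are lists of (x, y, t) samples joined by straight segments, and it implements only one predicate, "intersect", against a circular query object treated as a closed disk; the names `topological_selection` and `intersects_circle` are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b (2D)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    u = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    u = max(0.0, min(1.0, u))
    return math.hypot(px - (ax + u * dx), py - (ay + u * dy))


def intersects_circle(traj, circle):
    """Predicate: does the polyline through the samples meet the closed disk?"""
    cx, cy, r = circle
    pts = [(x, y) for x, y, _t in traj]
    if len(pts) == 1:
        return math.hypot(pts[0][0] - cx, pts[0][1] - cy) <= r
    return any(point_segment_distance((cx, cy), a, b) <= r
               for a, b in zip(pts, pts[1:]))


def topological_selection(dataset, query, predicate):
    """Return all trajectories Ti for which predicate(Ti, query) holds."""
    return [traj for traj in dataset if predicate(traj, query)]
```

Passing the predicate as a function keeps the selection itself independent of the geometry of Q, so other predicates (touch, overlap) could be plugged in the same way.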

Definition 5. (Distance Selection) Given a trajectory dataset T, a query trajectory Q, a trajectory distance function d(Ti, Tj), and a distance threshold τ, a distance selection finds all trajectories Ti ∈ T such that d(Ti, Q) ≤ τ.
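The following sketch illustrates a distance selection. The definition leaves the distance function d open, so the example plugs in the symmetric Hausdorff distance over sample points purely as a stand-in; this choice, the (x, y) point representation, and the function names are assumptions, not the measure adopted in this work. Trajectories are assumed to be non-empty.

```python
import math


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty lists of (x, y) points."""
    def directed(p, q):
        # Largest distance from a point of p to its closest point in q.
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))


def distance_selection(dataset, query, tau, d=hausdorff):
    """Return all trajectories Ti with d(Ti, Q) <= tau (linear scan)."""
    return [traj for traj in dataset if d(traj, query) <= tau]
```

Any other trajectory distance with the same signature can be passed through the `d` parameter.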

Definition 6. (Shortest Path) Given a trajectory dataset T (as a sequence of spatial points, or in map representation) and two query locations Qi and Qj, the Shortest-Path operation finds the trajectory Ti ∈ T connecting Qi to Qj with the shortest distance.
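A minimal sketch of the Shortest-Path operation over the point-sequence representation is given below. It assumes that a trajectory "connects" Qi to Qj when its first sample lies within a tolerance `eps` of Qi and its last sample within `eps` of Qj, and that "distance" means the travelled path length; both readings, like the function names, are assumptions made for the example.

```python
import math


def path_length(traj):
    """Sum of the Euclidean lengths of consecutive segments of (x, y) points."""
    return sum(math.dist(a, b) for a, b in zip(traj, traj[1:]))


def shortest_path(dataset, qi, qj, eps=1e-9):
    """Return the shortest trajectory starting near qi and ending near qj, or None."""
    candidates = [t for t in dataset
                  if t and math.dist(t[0], qi) <= eps and math.dist(t[-1], qj) <= eps]
    return min(candidates, key=path_length, default=None)
```

With a map representation, the same operation would instead be computed over the road-network graph, which this sketch does not cover.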

The k-Nearest-Neighbors (k-NN) query for trajectories is a distance-based query that returns the k closest (i.e. most similar) trajectories to a given trajectory Q within a given time interval from t0 to t1.

Definition 7. (k-NN Trajectories) Given a trajectory dataset T, a query trajectory Q (Q might be a series of query locations), a time interval from t0 to t1, a trajectory distance function d(Ti, Tj), and an integer k ≥ 1, the k-Nearest-Neighbor trajectories of Q, denoted as k-NN(Q, t0, t1), is a subset of T such that for every trajectory Ti ∈ k-NN(Q, t0, t1) and for every trajectory Tj ∈ T \ k-NN(Q, t0, t1), d(Q, Ti) ≤ d(Q, Tj), where Ti and Tj are active during [t0, t1], and |k-NN(Q, t0, t1)| = k.
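Definition 7 can be illustrated with a brute-force sketch. It assumes trajectories are non-empty, time-ordered lists of (x, y, t) samples, that a trajectory is "active" during [t0, t1] when its time span overlaps the interval, and that d is the symmetric Hausdorff distance over the spatial coordinates; all three are assumptions for the example, as are the function names.

```python
import heapq
import math


def is_active(traj, t0, t1):
    """Assumed reading of 'active': the trajectory's time span overlaps [t0, t1]."""
    return bool(traj) and traj[0][2] <= t1 and traj[-1][2] >= t0


def hausdorff(a, b):
    """Symmetric Hausdorff distance over the (x, y) part of the samples."""
    pa = [(x, y) for x, y, _t in a]
    pb = [(x, y) for x, y, _t in b]

    def directed(p, q):
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(pa, pb), directed(pb, pa))


def knn_trajectories(dataset, query, t0, t1, k, d=hausdorff):
    """Return the k active trajectories closest to `query`, nearest first."""
    candidates = [traj for traj in dataset if is_active(traj, t0, t1)]
    # If fewer than k trajectories are active, all of them are returned.
    return heapq.nsmallest(k, candidates, key=lambda traj: d(query, traj))
```

The scan evaluates d once per active trajectory; index-based pruning is outside the scope of this sketch.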

Definition 8. (k-NN Trajectories Join) Given two trajectory datasets S and R, a time interval from t0 to t1, and an integer k ≥ 1, the k-Nearest-Neighbor trajectories join, denoted as S ⋈kNN R, finds in R the k-NN(si, t0, t1) for all trajectories si ∈ S. This problem is also known in the literature as All-Nearest-Neighbors (ANN) when S = R.
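The k-NN join of Definition 8 can be sketched as a nested loop that runs one k-NN search in R per trajectory of S. The example assumes non-empty, time-ordered (x, y, t) trajectories, an "active" test based on time-span overlap, and the symmetric Hausdorff distance as a stand-in for d; these, and the function names, are assumptions. The result maps the index of each si in S to the indices of its neighbours in R.

```python
import heapq
import math


def is_active(traj, t0, t1):
    """Assumed reading of 'active': the trajectory's time span overlaps [t0, t1]."""
    return bool(traj) and traj[0][2] <= t1 and traj[-1][2] >= t0


def hausdorff(a, b):
    """Symmetric Hausdorff distance over the (x, y) part of the samples."""
    pa = [(x, y) for x, y, _t in a]
    pb = [(x, y) for x, y, _t in b]

    def directed(p, q):
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(pa, pb), directed(pb, pa))


def knn_join(S, R, t0, t1, k, d=hausdorff):
    """Map each index i of S to the indices of its k nearest active trajectories in R."""
    active_r = [j for j, r in enumerate(R) if is_active(r, t0, t1)]
    result = {}
    for i, s in enumerate(S):
        result[i] = heapq.nsmallest(k, active_r, key=lambda j: d(s, R[j]))
    return result
```

Note that when S and R are the same dataset, each trajectory is at distance zero from itself under this distance, so it appears among its own neighbours unless it is skipped explicitly.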

The Nearest-Neighbor (NN) query is a special case of k-NN for k = 1. The problem of identifying similar (or close) trajectories is useful, for instance, for automatic classification and recommendation systems, origin-destination analysis, and identifying objects that move in the same pattern. As an illustrative example, suppose that a subway service is under construction in a big city; it would be of great assistance to the experts in the field to know the similarity between the current public transportation services (e.g. bus lines) and the subway lines under construction, in order to reorganize the public transportation routes and propose timetables and metro stations [56]. Another particular case of distance-based queries for trajectory datasets is the Reverse-Nearest-Neighbors (RNN) query.

Definition 9. (RNN Trajectories) Given a trajectory dataset T, a query trajectory Q, and a time interval [t0, t1], the Reverse-Nearest-Neighbors of Q, denoted as RNN(Q, t0, t1), finds all trajectories Ti ∈ T active during [t0, t1] which have Q as their Nearest-Neighbor (NN); that is, a trajectory Ti ∈ T belongs to RNN(Q, t0, t1) iff 1-NN(Ti, t0, t1) = {Q}.
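A brute-force sketch of Definition 9 follows. It assumes that the nearest neighbour of each Ti is searched among Q and the other active trajectories of T (excluding Ti itself), and that Q must be strictly closer than every other candidate; the handling of ties, the "active" test, the Hausdorff stand-in for d, and the function names are all assumptions for the example.

```python
import math


def is_active(traj, t0, t1):
    """Assumed reading of 'active': the trajectory's time span overlaps [t0, t1]."""
    return bool(traj) and traj[0][2] <= t1 and traj[-1][2] >= t0


def hausdorff(a, b):
    """Symmetric Hausdorff distance over the (x, y) part of the samples."""
    pa = [(x, y) for x, y, _t in a]
    pb = [(x, y) for x, y, _t in b]

    def directed(p, q):
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(pa, pb), directed(pb, pa))


def rnn_trajectories(dataset, query, t0, t1, d=hausdorff):
    """Return the active trajectories whose single nearest neighbour is `query`."""
    active = [traj for traj in dataset if is_active(traj, t0, t1)]
    result = []
    for i, ti in enumerate(active):
        d_q = d(ti, query)
        # Q is the nearest neighbour of Ti if no other trajectory is as close.
        if all(d(ti, tj) > d_q for j, tj in enumerate(active) if j != i):
            result.append(ti)
    return result
```

The sketch compares every pair of active trajectories, so its cost grows quadratically with the dataset size.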

A problem similar to k-NN trajectories, introduced by Chen et al. [33], searches for the k-Best-Connected trajectories (k-BCT) to a given set of query points (i.e. trajectories that are close to all given locations); the k-BCT algorithm can be applied to trip planning, for instance.
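The idea of "trajectories that are close to all given locations" can be illustrated with a simple ranking sketch. It scores each trajectory by the sum, over the query points, of the distance to its closest sample, and returns the k lowest scores. This cost function is an assumption chosen for illustration and is not necessarily the scoring used in [33]; the function names are hypothetical, and trajectories are assumed to be non-empty lists of (x, y) points.

```python
import heapq
import math


def connectivity_cost(traj, query_points):
    """Sum over query points of the distance to the closest sample of `traj`."""
    return sum(min(math.dist(q, p) for p in traj) for q in query_points)


def k_best_connected(dataset, query_points, k):
    """Return the k trajectories with the lowest connectivity cost, best first."""
    return heapq.nsmallest(k, dataset,
                           key=lambda traj: connectivity_cost(traj, query_points))
```

A trajectory that passes near only some of the query points is penalised by the remaining ones, which is what distinguishes this ranking from a k-NN query against a single location.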

Most approaches for trajectory data processing first execute some sort of primitive query over the entire dataset, possibly combined with keyword-based queries [190] [192]. An example is a selection query such as “Retrieve all trajectories in the city center of Brisbane, between March and April this year”, so that one can identify points of interest in the city center for a given season [197] [198] and suggest transportation modes [196].

Another example is a nearest-neighbor query such as “Given the trajectory T of a route between two locations, retrieve the closest (most similar) trajectories to T”, which can be used, for instance, in alternative route suggestion [176] [177], public transportation analysis [34], or outlier detection [20] [82].