

2.4 GKD in Movement Data

2.4.2 GKD Process in Movement Databases

Given the importance and complexity of moving object data, the most remarkable advances in GKD have taken place in spatio-temporal and moving object databases, alongside the increased production of such datasets by positioning technologies and geo-sensor networks (Miller and Han, 2009). In this context, Giannotti and Pedreschi (2008) introduced three main steps for knowledge discovery from movement databases: trajectory reconstruction, knowledge extraction, and knowledge delivery, as shown in Figure 2.5.

Trajectory reconstruction and preprocessing

In the first step, the movement datasets are processed to obtain the trajectories of individual moving objects. The preprocessing stage involves some or all of the following processes, depending on the purpose of the analysis. Their order may vary, and some steps may be revisited, depending on the application domain.

• Filtering, to detect and remove outliers using statistical approaches.

• Resampling, to obtain regularly sampled trajectories (fixed time granularity) using interpolation techniques.

• Smoothing, to remove the effect of noise from tracking data (e.g. GPS data) using approximation techniques such as Kalman filtering, moving average, or kernel-based smoothing (Jun et al., 2006).

• Map matching, to match the position of the individual with the actual map; mostly relevant in transportation and navigation applications (Bernstein and Kornhauser, 1996; Brakatsoulas et al., 2005).
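The first three preprocessing steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the cited literature: a fix is assumed to be a tuple (t, x, y) in seconds and planar metres, a plain speed threshold stands in for the statistical outlier tests, linear interpolation is used for resampling, and a centered moving average is used for smoothing. The function names and the sample data are hypothetical. Map matching is omitted because it needs a road network.

```python
from math import hypot

# Assumption: a fix is (t, x, y), with t in seconds and x, y in planar metres.

def filter_by_speed(fixes, max_speed):
    """Drop fixes that imply an implausible speed from the last kept fix.
    A speed threshold is a simple stand-in for statistical outlier tests."""
    kept = [fixes[0]]
    for t, x, y in fixes[1:]:
        t0, x0, y0 = kept[-1]
        dt = t - t0
        if dt > 0 and hypot(x - x0, y - y0) / dt <= max_speed:
            kept.append((t, x, y))
    return kept

def resample(fixes, step):
    """Linearly interpolate positions at a fixed time step.
    Assumes at least two fixes with strictly increasing timestamps."""
    out, i, t = [], 0, fixes[0][0]
    while t <= fixes[-1][0]:
        while fixes[i + 1][0] < t:      # advance to the segment containing t
            i += 1
        (t0, x0, y0), (t1, x1, y1) = fixes[i], fixes[i + 1]
        w = (t - t0) / (t1 - t0)
        out.append((t, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
        t += step
    return out

def moving_average(fixes, window=3):
    """Smooth x and y with a centered moving average; timestamps are unchanged."""
    half = window // 2
    out = []
    for i, (t, _, _) in enumerate(fixes):
        nb = fixes[max(0, i - half): i + half + 1]
        out.append((t,
                    sum(p[1] for p in nb) / len(nb),
                    sum(p[2] for p in nb) / len(nb)))
    return out

# Illustrative data only: the fix at t=20 is a positional spike.
raw = [(0, 0.0, 0.0), (10, 10.0, 0.0), (20, 500.0, 0.0),
       (30, 30.0, 0.0), (40, 40.0, 0.0)]
cleaned = filter_by_speed(raw, max_speed=5.0)   # removes the spike
regular = resample(cleaned, step=10)            # refills t=20 by interpolation
smooth = moving_average(regular, window=3)
```

In this sketch the steps are chained as filter, resample, smooth. As noted above, a different order may suit other applications.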

Knowledge extraction

This stage refers to the process of exploiting knowledge discovery and data mining techniques, in order to discover patterns and structure in movement data and acquire useful knowledge about the behavior of moving objects. The knowledge extraction process can be carried out using the main data mining techniques such as pattern discovery, classification, clustering, and similarity analysis (Miller and Han, 2009; Giannotti and Pedreschi, 2008) (Figure 2.5). These techniques are briefly expanded on in the following:

a) Movement pattern discovery:

Movement pattern discovery refers to the process of finding interesting patterns in a large movement dataset by applying data mining methods such as exploratory data analysis, descriptive and predictive modeling, association rule mining, and other pattern recognition techniques. In their definition of the KDD process, Fayyad et al. (1996) relate pattern extraction to fitting a model to data, finding structure, or making any high-level description of data.

Definition. A pattern reflects the behavior of a subset of data (Andrienko and Andrienko, 2007), and is defined as non-random properties and relationships that are valid, novel (i.e. nontrivial, unexpected), useful, understandable (i.e. simple, interpretable), and interesting (Fayyad et al., 1996; Laube and Purves, 2006).

Research Paper 1 (page 81) provides a broad overview and a classification of different types of movement patterns from the related literature. Recently, in a comprehensive review Laube (2009) documented the research progress on the development of techniques to formalize, discover, and understand movement patterns.

b) Trajectory classification:

In KDD, classification is denoted as “finding rules or methods to assign data items to pre-existing classes” (Miller and Han, 2009, p.7). Accordingly, trajectory classification is defined as the process of applying model construction, segmentation, and recognition algorithms to identify the class labels (i.e. types) of moving objects based on their movement trajectories (Lee et al., 2008). Trajectory classification is important in real-world applications. For instance, extracting information about the mode of transport (e.g., bicycle, car, train, and boat) from a movement dataset is essential for domains such as travel behavior research, transportation planning, and traffic management. A number of studies have applied classification techniques to model and differentiate moving object trajectories in imagery and video surveillance databases (Fraile and Maybank, 1998), to recognize object activities (Bashir et al., 2007), and to study the behavior of individuals (Blythe and Miller, 1996; Bay and Pazzani, 2001), to name but a few.
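As a rough illustration of assigning trajectories to pre-existing classes, the following sketch labels a trajectory with a transport mode using a single feature (mean speed) and a nearest-centroid rule. It is not the method of any of the cited studies. The feature choice, the class names, the helper `straight`, and all training values are assumptions made up for the example.

```python
from math import hypot

def mean_speed(traj):
    """Average speed over a trajectory of (t, x, y) fixes (metres per second)."""
    dist = sum(hypot(x1 - x0, y1 - y0)
               for (_, x0, y0), (_, x1, y1) in zip(traj, traj[1:]))
    return dist / (traj[-1][0] - traj[0][0])

def train_centroids(labelled):
    """Model construction: the mean feature value of each known class."""
    sums = {}
    for traj, label in labelled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + mean_speed(traj), count + 1)
    return {label: total / count for label, (total, count) in sums.items()}

def classify(traj, centroids):
    """Recognition: assign the class whose centroid is closest to the feature."""
    speed = mean_speed(traj)
    return min(centroids, key=lambda label: abs(centroids[label] - speed))

def straight(speed, n=5, dt=10):
    """Hypothetical helper: a straight-line trajectory at constant speed."""
    return [(i * dt, i * dt * speed, 0.0) for i in range(n)]

# Illustrative training data only; the speeds are invented for the example.
training = [(straight(1.2), "walk"), (straight(1.6), "walk"),
            (straight(5.0), "bicycle"), (straight(6.0), "bicycle"),
            (straight(14.0), "car"), (straight(18.0), "car")]
centroids = train_centroids(training)
label = classify(straight(4.5), centroids)
```

A single speed feature cannot separate modes with overlapping speed ranges, so a realistic classifier would add further features such as acceleration or stop frequency.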

c) Trajectory clustering:

Trajectory clustering is an exploratory data mining technique that facilitates the study of movement data and the understanding of its structure by reducing its complexity. In general, clustering is defined as the process of grouping a set of objects into classes of similar objects. Trajectory clustering is the process of grouping moving object trajectories based on their spatial and/or temporal similarity. It can be applied to identify typical trends in datasets and hence supports deviation analysis to detect outliers and anomalies in data (Miller and Han, 2009). Furthermore, trajectory clustering can support data aggregation in empirical user studies to gain a better understanding of dynamic cognitive processes and for evaluation purposes (Fabrikant et al., 2008; Çöltekin et al., 2010).

Miller and Han (2009) and Kisilevich et al. (2010) provided a survey of the recent progress in the development of trajectory clustering techniques. Overall, trajectory clustering techniques can be classified into two main categories:

i) distance-based clustering approaches, such as hierarchical and K-means clustering (Miller and Han, 2009), where a distance function is required to compute the distance (i.e. dissimilarity) between trajectories in space, or in space and time.

ii) density-based clustering approaches, such as DBSCAN (Ester et al., 1996) and OPTICS (Ankerst et al., 1999), where clusters are identified as dense regions in space based on a density threshold.
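The two ingredients named above, a distance function between trajectories and a density threshold, can be combined in a compact sketch. The code below is a simplified DBSCAN-style procedure written for illustration, not the reference implementation of Ester et al. (1996). It assumes equally sampled trajectories given as lists of (x, y) points, uses the mean point-to-point distance as a hypothetical dissimilarity measure, and runs on invented data.

```python
from math import hypot

def traj_distance(a, b):
    """Mean distance between corresponding points of two equally sampled
    trajectories (an assumed, purely spatial dissimilarity measure)."""
    return sum(hypot(xa - xb, ya - yb)
               for (xa, ya), (xb, yb) in zip(a, b)) / len(a)

def dbscan(items, dist, eps, min_pts):
    """Simplified DBSCAN: returns one label per item, -1 meaning noise."""
    labels = [None] * len(items)            # None means not yet visited

    def neighbors(i):
        # All items within eps of item i, including i itself.
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:            # not dense enough: provisional noise
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:             # noise reached from a core: border item
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:          # j is a core item: keep expanding
                queue.extend(nb)
    return labels

# Illustrative data: three northbound tracks, three eastbound tracks, one stray.
north = [[(off, float(y)) for y in range(5)] for off in (0.0, 0.5, 1.0)]
east = [[(float(x), 20.0 + off) for x in range(5)] for off in (0.0, 0.5, 1.0)]
stray = [(50.0, 50.0)] * 5
trajs = north + east + [stray]
labels = dbscan(trajs, traj_distance, eps=0.6, min_pts=2)
```

With these values the two bundles of parallel tracks form separate clusters and the stray trajectory is labeled as noise, which is the kind of deviation analysis mentioned earlier.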

A large number of the proposed trajectory clustering approaches rely on the similarity of geometric shapes (Fu et al., 2005; Lee et al., 2007; Rinzivillo et al., 2008; Giannotti and Pedreschi, 2008; Miller and Han, 2009; Li et al., 2010). The geometric clustering techniques proposed so far do not necessarily capture the spatio-temporal similarity between the movements of objects. Additional information is required to cluster trajectory data according to the spatio-temporal characteristics of moving objects. In this respect, recent work has focused on developing spatio-temporal clustering techniques for trajectory data (Nanni and Pedreschi, 2006; Etienne et al., 2010). However, this problem has not been fully addressed so far, and effective techniques still need to be developed.
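A small sketch can make the gap between geometric and spatio-temporal similarity concrete. Two objects travel the same road in opposite directions: a purely geometric measure (here the symmetric Hausdorff distance between the sampled points) reports identical paths, whereas a time-aware measure (here the mean distance between positions at shared timestamps) reports that the objects were mostly far apart. Both measures are common choices picked for illustration. The function names and the data are assumptions, not taken from the cited work.

```python
from math import hypot

def hausdorff(a, b):
    """Symmetric Hausdorff distance between the sampled points of two
    trajectories of (t, x, y) fixes. Timestamps are ignored."""
    def directed(p, q):
        return max(min(hypot(x1 - x2, y1 - y2) for _, x2, y2 in q)
                   for _, x1, y1 in p)
    return max(directed(a, b), directed(b, a))

def synchronized(a, b):
    """Mean distance between the two objects at timestamps they share."""
    pos_b = {t: (x, y) for t, x, y in b}
    gaps = [hypot(x - pos_b[t][0], y - pos_b[t][1])
            for t, x, y in a if t in pos_b]
    return sum(gaps) / len(gaps)

# Illustrative data: the same straight road, traveled in opposite directions.
outbound = [(t, float(t), 0.0) for t in range(5)]       # moves east
inbound = [(t, float(4 - t), 0.0) for t in range(5)]    # moves west

geometric_gap = hausdorff(outbound, inbound)        # 0.0: same shape
temporal_gap = synchronized(outbound, inbound)      # 2.4: different movement
```

A clustering based only on the first measure would group these two trajectories together, even though the objects meet only once, at the midpoint of the road.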

d) Movement similarity analysis:

Movement similarity analysis, also called movement similarity assessment, refers to the process of finding similarities in a large dataset, and is a key task in knowledge discovery. In fact, similarity analysis can also be seen as a low-level knowledge extraction technique, since its outcomes can be exploited extensively in the aforementioned data mining techniques (i.e. pattern discovery, classification, and clustering). For instance, most movement patterns, such as flocking and concurrency (Laube, 2005), emerge from similarity in one or several movement parameters. Also, clustering and classification processes rely on existing similarities among objects in datasets. Specifically, similarity assessment is a prerequisite for the first group of clustering approaches (i.e. distance-based clustering). Therefore, it is crucial to develop effective approaches to assess and extract similarities in movement data.
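To illustrate similarity in movement parameters rather than in position, the following sketch derives speed and heading for each segment of a trajectory and compares two trajectories on those parameters alone. It is a simplified example under stated assumptions: fixes are (t, x, y) tuples with matching timestamps, heading is measured in degrees counter-clockwise from the x-axis, and the dissimilarity measure, function names, and data are hypothetical.

```python
from math import atan2, degrees, hypot

def movement_parameters(traj):
    """Per-segment (speed, heading) derived from (t, x, y) fixes.
    Heading is in degrees, counter-clockwise from the x-axis."""
    params = []
    for (t0, x0, y0), (t1, x1, y1) in zip(traj, traj[1:]):
        dx, dy = x1 - x0, y1 - y0
        params.append((hypot(dx, dy) / (t1 - t0), degrees(atan2(dy, dx))))
    return params

def parameter_dissimilarity(a, b):
    """Mean absolute difference in speed and in heading between two
    trajectories sampled at the same timestamps."""
    pa, pb = movement_parameters(a), movement_parameters(b)
    speed_gap = sum(abs(sa - sb) for (sa, _), (sb, _) in zip(pa, pb)) / len(pa)
    # Wrap heading differences so that 350 and 10 degrees differ by 20, not 340.
    heading_gap = sum(abs((ha - hb + 180) % 360 - 180)
                      for (_, ha), (_, hb) in zip(pa, pb)) / len(pa)
    return speed_gap, heading_gap

# Illustrative data: two objects far apart but moving alike, and one heading north.
a = [(t, 2.0 * t, 0.0) for t in range(5)]
b = [(t, 2.0 * t + 100.0, 50.0) for t in range(5)]
c = [(t, 0.0, 2.0 * t) for t in range(5)]

same_motion = parameter_dissimilarity(a, b)       # no gap in speed or heading
different_heading = parameter_dissimilarity(a, c) # equal speed, headings 90 degrees apart
```

Objects a and b are spatially distant yet indistinguishable in speed and heading, which is the kind of parameter-level similarity from which concurrency-type patterns emerge.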

Since the major focus of this thesis is on similarity analysis of movement data, section 2.5 reviews the state of the art in the related literature in detail.

Knowledge delivery

After the knowledge extraction process, it is essential to reason about the detected patterns, and evaluate the reliability, meaningfulness, and interestingness of the outcomes. Effective visualization techniques are required in order to appropriately present the results, support suitable interpretation of the results, and eventually deliver the appropriate knowledge about the subject movement dataset (Giannotti and Pedreschi, 2008).
