
Academic year: 2021


CSCI 8715 Spatial databases

Spatial Data warehouses

A Survey

Group 4:

Nipun Garg 4282567

Surabhi Mithal 4282643


Abstract

The field of spatial data warehouses has been emerging over the past decade, driven by the need to analyze large volumes of spatial data. Data stored in a spatial data warehouse is queried using spatial online analytical processing (SOLAP) systems. Research in the field has focused on conceptual models, materialization of spatial indexes, aggregation operations and SOLAP. In this paper we give an overview of the core concepts of a spatial data warehouse and of recent advancements in the field.

1. Introduction

Spatial data warehouses aim at effective and efficient querying of spatial data. Spatial databases are suited to answering regular transactional queries that involve little historical data or aggregation. The class of queries needed to support the decision-making process is difficult to answer on spatial databases. This gave rise to the field of spatial data warehouses, which combines traditional data warehouses with spatial databases. Spatial data warehouses are based on the concepts of data warehouses and additionally provide support to store, index, aggregate and analyze spatial data [33].

A data warehouse consists of facts and dimensions modeled in a star or snowflake schema. A data cube is a lattice of cuboids, each representing an aggregation of the facts at one combination of hierarchy levels. Cells of the data cube may be precomputed for efficient query processing. Common OLAP operations include slicing, dicing, roll-up and drill-down. These concepts are extended to spatial data in a spatial data warehouse. Integrating spatial data with data cubes makes many interesting spatial aggregation queries possible.
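The OLAP operations named above can be illustrated with a minimal Python sketch. The fact table, its column layout and the helper names (`roll_up`, `slice_`, `dice`) are hypothetical and chosen only for illustration; a real warehouse would run such queries inside the DBMS.

```python
from collections import defaultdict

# Hypothetical fact table: (state, city, year, sales). All values are made up.
FACTS = [
    ("MN", "Minneapolis", 2010, 120),
    ("MN", "St. Paul", 2010, 80),
    ("MN", "Minneapolis", 2011, 150),
    ("WI", "Madison", 2010, 60),
    ("WI", "Madison", 2011, 90),
]


def roll_up(facts, key):
    """Sum the sales measure over groups defined by key(row)."""
    totals = defaultdict(int)
    for row in facts:
        totals[key(row)] += row[3]
    return dict(totals)


def slice_(facts, year):
    """Slice: fix the time dimension to a single value."""
    return [row for row in facts if row[2] == year]


def dice(facts, states, years):
    """Dice: restrict several dimensions to sets of values."""
    return [row for row in facts if row[0] in states and row[2] in years]


# City level, then rolled up along the (assumed) hierarchy city -> state.
by_city = roll_up(FACTS, key=lambda r: (r[0], r[1]))
by_state = roll_up(FACTS, key=lambda r: r[0])

# Slice on year 2010, then aggregate by state.
sales_2010_by_state = roll_up(slice_(FACTS, 2010), key=lambda r: r[0])
```

Drill-down is the inverse of the roll-up shown here: moving from `by_state` back to the finer `by_city` grouping.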

Figure 1: The taxonomy of spatial and spatiotemporal data warehouses

Figure 1 shows the relationship between spatial data warehouses, spatiotemporal data warehouses and conventional data warehouses.

The major characteristics of a spatial data warehouse include:

• Conceptual model: Star and snowflake schemas for spatial attributes.

• Spatially enabled components: Spatial measures, spatial dimensions and spatial hierarchies.

• Spatial OLAP operations: Operations like roll-up and drill-down extended to spatial predicates [36].

• Efficient query processing:

o Indices and materialized views

o Join and aggregation queries.

Spatial data warehouses have been an active topic of research in the past decade, because the popularity of spatial information, such as maps created from satellite images, has increased tremendously. These data sets are huge and have to be analyzed efficiently to make the best use of the information gathered. Research in this field mostly focuses on:

• Spatial multidimensional models: Conceptual models for efficient representation of spatial data warehouses.

• Materialized spatial indexes: Extension of spatial indexes to store aggregated spatial data.

• Aggregate operations: Various aggregation operations over hierarchies.

• SOLAP: Client applications on spatial data warehouses [19].

Some examples of spatial data warehouses are the US census dataset, EOS archives [54], Microsoft TerraServer [17] and Spatial Eye [55].

Figure 2 shows a high-level view of the phases in the implementation of a spatial data warehouse:

1. Spatial data acquisition and consolidation (ETL)

2. Addition of temporal features (optional)

3. Spatial data warehouse (spatial data marts and metadata)

4. Spatial cube formation

5. Presentation through SOLAP tools

Spatial data warehouses have a wide variety of application domains, such as logistics, forecasting, security, detection of environmental changes and health monitoring. These involve spatial data which has to be analyzed over various dimensions at multiple resolutions.

1.1. Contributions and Related work

Related work on spatial data warehouses is in the form of:

• Journal papers and conference proceedings [1-45] [48-52]: These are the basic resources for research in the field of spatial data warehousing and current trends.

• Books [46], which explain in detail the basic concepts of spatial and temporal data warehouses.

• A reference book [47], which explains the terms and terminology of spatial databases.

• Online encyclopedias like Wikipedia [53], which are a good source of basic information and concepts.

Spatial data warehouses and SOLAP are gaining importance due to the capabilities they provide. To our knowledge, there is no recent survey paper in the literature on spatial data warehouses. There exists some literature giving an overview of the general concepts of spatial data warehouses [19] [20] [48], but we could not find a recent detailed survey which covers all the important aspects of a spatial data warehouse. Although [48] covers some aspects of spatial and spatio-temporal data warehouses, important concepts like benchmarks and spatial OLAP tools are not discussed at length.

There are many open research issues in spatial data warehouses and we have classified them and presented them in a consolidated manner. We have also analysed trends like

The table below (Figure 2) shows the classification of topics and the research literature which exists for spatial data warehouses.

Topic | Subcategory | Subcategory | Papers
--- | --- | --- | ---
Conceptual Models | Spatial multidimensional model | | [1], [4], [5], [12], [23]
 | Requirements of a conceptual model for SDW | | [21]
 | Mapping of conceptual model to physical schema | | [3]
 | Spatio-temporal model | | [6], [14]
Storage and Querying | Indexing | Materialized | [7], [8], [10]
 | | Spatiotemporal | [9], [43], [44]
 | | GIST | [27]
 | Selective Materialization | Object-based | [11]
Aggregation | General Concepts and Issues | Pre-aggregation | [12], [28]
 | | Geometric aggregation model | [30]
 | Aggregation Operations | | [15], [22], [47]
Spatial OLAP (SOLAP) | General Concepts | | [20], [30], [32]
 | Tools for SOLAP | | [32], [34], [35], [36]
 | Benchmark | | [25], [37], [38], [39], [40], [41]
 | Extension of OLAP cubes | | [18]
Spatio-temporal DW | General Concepts and Issues | | [13], [16], [48]
 | Trajectory Data Warehousing | | [13], [42], [44]

The broad classification in terms of concepts and algorithms is presented below in Figure 3.

Spatial Data warehouses

• Index

o Materialized: R*a tree, aR tree

o GIST based

o Spatiotemporal: aRB tree, aHRB tree, a3DRB tree

• Conceptual Model

o MultiDimER model

o Other multidimensional models

• Aggregation

o Concepts: Pre-aggregation, Geometric aggregation

o Aggregation operations: Distributive, algebraic and holistic; BigCube aggregation operations

Figure 3: Broad classification presented as hierarchy tree

1.2. Scope

The scope of this paper is to study the broad concepts in a spatial data warehouse and the research needs. The paper discusses conceptual models, indices for spatial data warehouses, materialization and aggregation over hierarchies, SOLAP and benchmarks. It also outlines the broad research areas in these topics. In addition, the latest emerging trend in this field, spatiotemporal data warehouses, is explored.

1.3. Organization

The paper is organized as follows. Section 2 gives a brief overview of conceptual models for spatial data warehouses, with a focus on existing models. Section 3 presents the current storage and indexing techniques for spatial data warehouses (SDWs) and analyzes future research needs. Section 4 describes the important concept of aggregation. Sections 5 and 6 cover the SOLAP tools for efficiently querying spatial data warehouses and the evaluation benchmarks for SDWs, respectively. Finally, Section 7 describes spatio-temporal data warehouses, which are the latest trend.

2. Conceptual models for Spatial Data warehouses

A conceptual model is a representation of concepts and the relationships between them [37]. Its primary purpose is to capture the requirements of decision-making users without worrying about implementation details. Conceptual models exist for relational and spatial databases, but they do not extend well to spatial data warehouses because of the presence of hierarchies, aggregations, measures and dimensions.

2.1. Existing models

There have been various proposals for multidimensional models for spatial data warehouses.

[1] proposes a multidimensional model in which measures and dimensions are modelled as complex objects. It introduces the concepts of entity schemas and entity instances and uses these to define hierarchies, aggregations and the data cube.

The MultiDimER model [4] [5] is a conceptual model for spatial data warehouses which introduces concepts like spatial levels, spatial hierarchies, spatial measures and spatial fact relationships. It is quite flexible in the sense that it does not require spatial dimensions to be present for a spatial fact to exist. It allows real-world hierarchies [5] to be represented in the model. The basic concepts in the MultiDimER multidimensional model are:

• Spatial Level

Spatial levels are levels where spatial characteristics are stored. A topological relationship exists between different spatial levels.

• Spatial Hierarchy

A hierarchy which includes at least one spatial level.

• Spatial dimensions:

Extending the concept of dimensions in a data warehouse, spatial dimensions are dimensions that have at least one spatial hierarchy. In the model, spatial dimensions are of 3 types with the following hierarchies:

o Non-spatial

o Fully spatial

• Spatial Fact Relationship:

A spatial fact relationship is a fact relationship that requires a spatial join between two or more spatial dimensions.

• Spatial Measures:

Spatial measures are either numerical values calculated using topological operators (Length in the figure below) or geometries which can be aggregated along the hierarchies.

Figure 4: MultiDimER model for Highway maintenance [4]

In the MultiDimER model in Figure 4, Length is a spatial measure, and City and Highway Segment are spatial dimensions, as they have spatial hierarchies. An example of a spatial hierarchy here is City and State.

[3] describes the mapping of the MultiDimER conceptual model into a physical model, implemented in Oracle 10g Spatial. The paper discusses the implementation issues of schemas created using conceptual models. A spatial level defined in the MultiDimER model corresponds to a table in the database. The relationships between levels are represented by many-to-one relationships between tables.

The basic requirements for the design of an effective multidimensional model for spatial data warehouses are described in [21].

We have classified the requirements presented in [21] based on the area they belong to. Figure 5 summarizes our classification:

Simplicity

• Easy to understand but capturing all basic elements

• Independence in specifications and implementation

Conceptual Model

• Implementation independent

• Flexible in terms of spatial and non-spatial attributes

Hierarchy

• Multiple as well as explicit

• Handle data with different granularities

• Irregular spatial hierarchies supported

Aggregation

• Support for thematic and geometric aggregation

• User-defined aggregates

• Avoid incorrect aggregation

• Dimensionless and measureless aggregation

• Geospatial aggregation

Data

• Handles changes over time

• OLAP operations including drill-through and drill-across

• Handle uncertainty

Figure 5: Requirements of a spatial multidimensional model

3. Storage and Indexing

3.1. Indexing

Indices form an important part of a data warehouse, spatial or non-spatial. If the right index structures are built on columns of dimensions and facts, the performance of queries, especially ad hoc queries, is greatly enhanced. There have been some proposals [8] [11] [27] for extending the index structures which currently exist to suit the needs of a spatial data warehouse.

[27] describes the extension of the Generalized Search Tree (GIST) framework for efficient OLAP queries on a spatial data warehouse. GIST provides 2 interfaces to extend: the Predicate and the gist interfaces. The search algorithm in GIST uses the predicate “Consistent” to find all the leaf nodes which are consistent with the query predicate. A new state for this predicate, called “Partial true”, is introduced. A new search algorithm is also proposed for efficient results during an OLAP query.

3.1.1. Materialized Indexing

The R*a tree [10] extends the R*-tree for efficient OLAP operations by materializing aggregates in the index structure. The paper shows that storing aggregates in the inner nodes of the index tree improves the response time of OLAP slice and dice queries, as the number of accesses to secondary memory is reduced. The paper presents a modified recursive range query algorithm which uses this precomputation and is particularly useful for range queries. The results also show that the extra space needed for storing the aggregated data is linear in the size of the structure.

While the R*a tree [10] introduces the concept of storing aggregates in the index, it does not consider spatial objects. The aR tree [8] follows the same idea of materializing the index by extending the R tree for spatial data warehouses. OLAP operations on spatial data may need a specific hierarchy which is not defined at design time. The aR tree stores, for each minimum bounding rectangle (MBR), the results of aggregation functions over all the objects enclosed by that MBR.
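The benefit of storing aggregates in index nodes can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of [8] or [10]: the tree is built by hand, objects are points, only COUNT is stored, and all names (`Node`, `range_count`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def contains(outer: Rect, inner: Rect) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    mbr: Rect
    count: int  # pre-aggregated COUNT of the objects below this node
    children: List["Node"] = field(default_factory=list)  # empty for an object


def range_count(node: Node, window: Rect) -> int:
    """Count the objects intersecting `window`, reusing stored aggregates."""
    if not intersects(node.mbr, window):
        return 0
    if contains(window, node.mbr) or not node.children:
        # Either the whole subtree is covered, so the stored aggregate is
        # used without descending, or this is a single object entry.
        return node.count
    return sum(range_count(child, window) for child in node.children)


def point(x: float, y: float) -> Node:
    return Node(mbr=(x, y, x, y), count=1)


# Hand-built two-level tree with made-up data.
a1 = Node(mbr=(1, 1, 2, 2), count=2, children=[point(1, 1), point(2, 2)])
a2 = Node(mbr=(8, 8, 9, 9), count=2, children=[point(8, 8), point(9, 9)])
root = Node(mbr=(1, 1, 9, 9), count=4, children=[a1, a2])

covered = range_count(root, (0, 0, 10, 10))        # answered at the root
partial = range_count(root, (1.5, 1.5, 8.5, 8.5))  # must descend to objects
```

When the query window fully covers a node's MBR, the subtree below it is never visited, which is where the reduction in secondary-memory accesses comes from.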

The example shown in Figure 6 depicts an aR tree which shows 5 MBRs (a1, a2 … a5) and the COUNT of spatial objects within them.

Figure 6: The aR tree [8]

The advantages of this approach are:

• The aR tree defines a hierarchy among MBRs that forms a data cube lattice. This gives scope for selective materialization of the structure.

• The idea can be extended to storing the results of window queries or of other types of aggregate operators.

While the aR tree is considered quite effective for aggregation queries, its effectiveness degrades when the number of dimensions is large [37]; with a significant number of dimensions, the complexity approaches that of sequential scanning. [7] presents an implementation and exploration of aR-trees [8] for spatial data warehouses.

3.1.2. Spatiotemporal Indices

Most indexing approaches for spatial data warehouses focus on either spatial [8] [10] [27] or temporal [43] [44] indexes. Spatio-temporal data warehouses need the integration of spatial and temporal structures for efficiency.

The aggregate R-B-tree (aRB-tree) [9] is an extended R tree in which each entry has a pointer to a B tree that stores historical aggregated data about the MBR. It has been proposed for static spatial dimensions. Figure 7 shows the structure of an aRB tree.
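The combination of a spatial entry with a per-region temporal structure can be sketched as follows. This is a deliberately simplified stand-in, not the structure of [9]: a flat dictionary of static regions replaces the R tree, and a sorted list with prefix sums (searched with `bisect`) replaces each B tree. The class and function names are hypothetical.

```python
import bisect
from itertools import accumulate


class RegionHistory:
    """Stand-in for the per-region B tree: sorted timestamps plus prefix sums."""

    def __init__(self, observations):
        # observations: iterable of (timestamp, count) pairs
        obs = sorted(observations)
        self.times = [t for t, _ in obs]
        self.prefix = [0] + list(accumulate(c for _, c in obs))

    def total(self, t_start, t_end):
        """Sum of the counts with t_start <= timestamp <= t_end."""
        lo = bisect.bisect_left(self.times, t_start)
        hi = bisect.bisect_right(self.times, t_end)
        return self.prefix[hi] - self.prefix[lo] if hi > lo else 0


def inside(window, rect):
    """True if `rect` lies completely inside `window`."""
    return (window[0] <= rect[0] and window[1] <= rect[1]
            and window[2] >= rect[2] and window[3] >= rect[3])


# Static regions (xmin, ymin, xmax, ymax), each pointing to its own history.
# All numbers are made up.
REGIONS = {
    "r1": ((0, 0, 5, 5), RegionHistory([(1, 10), (2, 20), (3, 30)])),
    "r2": ((5, 0, 10, 5), RegionHistory([(1, 5), (3, 15)])),
    "r3": ((20, 20, 30, 30), RegionHistory([(2, 100)])),
}


def window_total(regions, window, t_start, t_end):
    """Aggregate over the regions inside `window` during [t_start, t_end]."""
    return sum(history.total(t_start, t_end)
               for rect, history in regions.values()
               if inside(window, rect))


result = window_total(REGIONS, (0, 0, 10, 5), 2, 3)
```

The spatial test selects the regions, and the temporal lookup then costs a logarithmic search per region rather than a scan of its history.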

Figure 7: an aRB tree [9]

The aggregate historical R-B-tree (aHRB-tree) [9] combines the concepts of the aRB tree and the Historical R tree (HR tree) for indexing dynamic spatial dimensions. Each node stores a time span indicating when it was valid. The other entries of a node are similar to those of the aRB tree. Each time an update happens, a new R tree is created at that timestamp. Figure 8 shows the structure of an aHRB tree.

Figure 8: an aHRB tree [9]

Another proposal for indexing dynamic spatial dimensions is the aggregate three-dimensional R-B-tree (a3DRB-tree) [9], which addresses the size limitation of the aHRB tree. It forms one large R tree for the whole history, as opposed to the many small R trees created in the aHRB tree. The large R tree stores the different versions of all the regions in the same tree.

3.2. Selective materialization

The selective materialization of a data cube has been studied in detail and techniques have been proposed for effectively choosing the set of cuboids to materialize [40].

Figure 9: Example lattice with space cost for selective materialization [40]

Figure 9 shows the lattice model, which is the key to selective materialization. An edge from node 1 down to node 2 indicates that the query for node 2 can be answered from the grouping computed for node 1. The greedy algorithm proposed in [40] outputs the set of nodes to materialize, chosen to minimize space cost.
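A greedy selection over such a lattice can be sketched as below. This is one common benefit-based greedy strategy for view selection, given here as an illustration under stated assumptions; it is not claimed to be the exact algorithm of [40]. The lattice shape, the node names and the sizes are hypothetical and do not reproduce Figure 9.

```python
# Hypothetical lattice: each view maps to the views directly computable from it.
CHILDREN = {
    "a": ["b", "c"], "b": ["d", "e"], "c": ["e", "f"],
    "d": ["g"], "e": ["g", "h"], "f": ["h"], "g": [], "h": [],
}
# Hypothetical space cost (number of rows) of each view.
SIZE = {"a": 100, "b": 50, "c": 75, "d": 20, "e": 30, "f": 40, "g": 1, "h": 10}


def descendants(view):
    """All views answerable from `view`, including `view` itself."""
    seen, stack = set(), [view]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(CHILDREN[v])
    return seen


def greedy_select(top, k):
    """Materialize `top`, then greedily add up to k views of maximal benefit."""
    selected = [top]
    # cost[w]: size of the cheapest materialized view that can answer w
    cost = {w: SIZE[top] for w in descendants(top)}
    for _ in range(k):
        best, best_benefit = None, 0
        for v in SIZE:
            if v in selected:
                continue
            # Benefit: total reduction in answering cost if v is materialized.
            benefit = sum(max(0, cost[w] - SIZE[v]) for w in descendants(v))
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:  # no remaining view improves any query
            break
        selected.append(best)
        for w in descendants(best):
            cost[w] = min(cost[w], SIZE[best])
    return selected


chosen = greedy_select("a", 3)
```

In each round the sketch picks the view whose materialization most reduces the total cost of answering all views below it, then updates the costs before the next round.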


Though the problem of selective materialization extends naturally to spatial data cubes, the difference in the spatial case is the computation cost. If the spatial data cubes are not materialized, online computation becomes very time consuming, because of the computationally expensive joins and other operations on spatial data.

In [11] a finer-granularity approach is suggested for spatial data cubes, which focuses on cell-level materialization instead of cuboid-level materialization. The approach is called object-based materialization and focuses on selecting a few spatial objects. The selective materialization is based on the relative access frequency of the sets of mergeable spatial regions, which are precomputed if they are expected to be accessed frequently. The proposed algorithms assume that the cuboids to precompute have already been identified by the algorithms in [40] or by minor extensions of them.

3.3. Research needs

The index structures discussed above focus on materializing aggregations of spatial measures in the index structure. Most of the existing work is limited to numerical aggregations and other simple operations. There is a need to study the materialization of indexes for supporting spatio-temporal measures, such as the direction in which a movement is happening.

The index selection problem is a widely known problem in the database world. The problem extends naturally to spatial data warehouses, where efficiency of retrieval is of prime importance.

The methods proposed for selective materialization of spatial data cubes assume that there exists information about the access frequencies of a set of selected cuboids. Methods need to be proposed which are independent of this assumption.

4. Aggregation

Aggregation in data warehouses refers to summarizing the properties of data over particular dimensions of interest, most commonly time and geographic location, by applying an aggregation operation to the measure (fact) data.

Aggregation over spatial data warehouses refers to computing aggregate operations on measures over the union of the areas considered for aggregation. An example is computing the total size of the union of a number of areas.

4.1. Aggregation operations and techniques

Aggregation functions for spatial data have been grouped into three categories: distributive, algebraic and holistic [22] [47]. Table 2 groups the spatial aggregate operations according to these three categories.

Aggregate Operations

Data Type | Distributive | Algebraic | Holistic
--- | --- | --- | ---
Set of numbers | Count, Min, Max, Sum | Average, Standard Deviation, MaxN(), MinN() | Median, MostFrequent, Rank
Set of geometries | Minimal Orthogonal Bounding Box, Geometric Union, Geometric Intersection | Centroid, Center of Gravity, Center of Mass | Nearest Neighbor Index, Equi-Partition

Table 2: Set of aggregate operations [47]
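The practical difference between the three categories is whether an aggregate over a parent cell can be computed from the aggregates of its child cells. The following Python sketch illustrates this with made-up values; the variable and function names are hypothetical.

```python
import statistics

# Hypothetical measure values, partitioned into three child cells.
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive (SUM): the parent value is the sum of the child sums.
total = sum(sum(part) for part in partitions)

# Algebraic (AVERAGE): a fixed-size summary (sum, count) per child suffices.
summaries = [(sum(part), len(part)) for part in partitions]
average = sum(s for s, _ in summaries) / sum(n for _, n in summaries)

# Holistic (MEDIAN): no fixed-size summary works in general. Combining the
# child medians gives a different answer than the median of all values.
true_median = statistics.median(all_values)
median_of_medians = statistics.median(statistics.median(p) for p in partitions)


# Distributive on geometries: bounding boxes of children merge into the
# bounding box of the parent.
def bbox(points):
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))


def merge(b1, b2):
    return (min(b1[0], b2[0]), min(b1[1], b2[1]),
            max(b1[2], b2[2]), max(b1[3], b2[3]))


group1, group2 = [(0, 0), (2, 1)], [(5, 3), (4, 7)]
merged_box = merge(bbox(group1), bbox(group2))
```

This is why distributive and algebraic operators are good candidates for precomputation in a data cube, while holistic operators generally need access to the base data.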

[15] describes the BigCube model for multidimensional spatial data. The authors classify aggregation operations as additive, semi-additive and non-additive, and describe how these are incorporated in the multidimensional model. The operations defined are listed in Table 3.

BigCube Aggregate Operators

Type | Operators
--- | ---
Additive | Count, Min (Base), Max, Sum (Apex), Concatenate, Convex Hull, Spatial Union, Spatial Intersection
Semi-additive | Average, Standard Deviation, Variance, MaxN(), MinN(), Centroid, Center of Gravity, Center of Mass
Non-additive | Median, MostFrequent, Rank, LastNonNullValue, FirstNonNullValue, Minimum Bounding Box, Nearest Neighbor, Equi-Partition

Table 3: Aggregate operations for BigCube [15]

4.2. Aggregation concepts

The operators presented in [15] [22] [47] work well with spatial objects, but the aggregation of spatial measures must take into account the topological relationships between them, because of the problem of double counting during aggregation. For example, a building listed both as a bowling alley and as a discotheque would be counted twice when aggregating for entertainment [12].
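The double counting problem can be made concrete with a small Python sketch. The listings and rectangles are made up, and axis-aligned rectangles stand in for general polygons, so this only illustrates the issue and is not the method of any cited paper.

```python
# Hypothetical listings: (building_id, category). Building 7 appears twice.
LISTINGS = [(7, "bowling alley"), (7, "discotheque"), (9, "cinema")]
ENTERTAINMENT = {"bowling alley", "discotheque", "cinema"}

# Naive roll-up counts one row per listing; the correct count is per building.
naive_count = sum(1 for _, category in LISTINGS if category in ENTERTAINMENT)
distinct_count = len({b for b, category in LISTINGS if category in ENTERTAINMENT})


def area(rect):
    """Area of (xmin, ymin, xmax, ymax); zero for an empty rectangle."""
    return max(0, rect[2] - rect[0]) * max(0, rect[3] - rect[1])


def intersection(a, b):
    """Overlap of two rectangles (possibly empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


# The same effect for a spatial measure: two overlapping regions.
r1, r2 = (0, 0, 4, 4), (2, 2, 6, 6)
naive_area = area(r1) + area(r2)                    # overlap counted twice
union_area = naive_area - area(intersection(r1, r2))  # overlap counted once
```

In both cases the naive aggregate is too large because a shared part contributes once per overlapping fact.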

[28] deals with this problem and describes the pre-aggregation of spatial measures. Facts are pre-processed to compute their disjoint parts. The authors propose a classification of the topological relationships between spatial measures. Pre-aggregation works if the spatial properties of the objects are distributive over some aggregate function.

The drawback of the approach in [28] is that it does not address forms other than polygons. [30] describes a formal model for geometric aggregation. It defines three parts, namely the algebraic part, the geometric part and the classical OLAP or application part, each of which maintains separate hierarchies; the parts interact with each other to answer queries. Figure 10 shows an example of the three parts.

Figure 10: Geometric, algebraic and application part [30]

4.3. Research Needs

The multiple representation problem is a widely known problem in spatial databases [52]. The same spatial object may be considered a point in one application and a polygon in another. In yet another scenario, a 3-dimensional representation may be used, which treats the object as a cuboid or polyhedron.

Figure 11 depicts the three different representations possible of the same spatial object which may be a building in this case.

Figure 11: Multiple representations of the same object (point, polygon, cuboid)

The multiple representation problem is particularly problematic in spatial data warehouses for 2 major reasons [12]:

1. Aggregation and consolidation of data from different sources which follow different representations.

2. SOLAP operations: During operations like roll-up and drill-down over hierarchies, the same level may have different representations for the same object, making it difficult to choose one.

Double Counting while Aggregation

Double counting means incorrect aggregation of measures due to some overlapping property. For example, a park that is used for both a concert and a fair may be counted twice when aggregating the objects classified as entertainment.

The problem of double counting has been addressed in [11] by considering the topological relationships between spatial measures and aggregating only over objects which are disjoint, thus avoiding incorrect aggregation. It remains an open problem because of multiple representation and of topological relationships when the objects are represented in 3 dimensions.

5. Spatial Online Analytical Processing (SOLAP)

OLAP is an approach to swiftly answering multi-dimensional analytical (MDA) queries [48]. It is a category of decision-support tools often used to provide efficient and intuitive access to a data warehouse. Examples include Cognos PowerPlay, Business Objects and Oracle Express. OLAP tools are not well suited to analyzing spatial and temporal data. GIS tools help in analyzing spatial data but still do not make full use of spatio-temporal datasets [32]. Therefore, a new approach is to couple OLAP and GIS functionalities. This makes it possible to have decision support tools that are better adapted to the spatio-temporal exploration and analysis of data. These are called Spatial OLAP systems, or SOLAP.

5.1. Concepts

• OLAP supports spatial data, but it treats a spatial dimension like any other dimension and does not pay attention to the cartographic component of the data.

• Data visualization facilitates a better understanding of the structure of the data and supports better decision making [34].

• Maps and graphics do more than make data visible; they can help drive the analysis of historical data.

• Without a cartographic display, OLAP tools lack an essential feature which could help the completion of spatiotemporal exploration and analysis processes [30].

Figure 12: SOLAP is created by combining concepts/features of conventional OLAP and GIS

This creates a need for SOLAP, which has been defined in [24] as a visual platform built especially to support rapid and easy spatio-temporal analysis and exploration of data following a multidimensional approach comprised of aggregation levels available in cartographic displays as well as in tabular and diagram displays.

5.2. Tools for SOLAP

In this section, we summarize the currently available tools for SOLAP. SOLAP tools can be divided into three categories [32] [34] [35] [36]:

• OLAP dominant (Business Objects, Cognos, Knosys), which provide means for aggregation of data.

• GIS dominant, which focus on geometric operations.

• Visual data selections or integrated OLAP and GIS solutions (Geo cube, SOVAT).

Figure 13 shows the classification of SOLAP tools into various categories.

SOLAP tools

• OLAP based: Business Objects, Cognos, Knosys

• GIS based: LGS Group Inc.

• Integrated

Figure 13: An overview of Spatial OLAP tools

6. Benchmarks for Spatial Data warehouses

Benchmarking means evaluating something by comparison with a standard.

o How well does your spatial data warehouse perform?

o Does it need improvement, or is it already good enough?

To answer such questions, it is critical to assess the warehouse's performance relative to an achievable "standard" or "benchmark". Every benchmark should have well-defined success criteria. Before creating a detailed benchmark specification, it is important to decide on the most crucial technical requirements of the data warehouse. This helps in focusing the benchmark along those lines.

6.1. Types of benchmarks

There are 2 types of benchmarks:

o Functional benchmarks – standards to evaluate which functions a system can perform.

o Performance benchmarks – benchmarks that help to determine and compare how fast a system is.

In the past few years, several techniques have been implemented to improve query processing over spatial data warehouses, among them index creation and materialized views. To evaluate how efficient these techniques are, datasets with different properties are used. The benchmarks used for spatial data warehouse query processing should fit the evaluation needs of spatial data warehouses. A benchmark should also be able to analyze the performance of operations such as spatial roll-up and drill-down.

6.2. Overview of existing benchmarks for spatial data

The following benchmarks exist for spatial data:

Benchmark | Description / Limitations
--- | ---
VESPA [37]; Measuring performance by considering spatial joins [38] | Both of these benchmarks focus on spatial predicate computation but are not aimed at assessing the efficiency of SOLAP operations.
TPC-D | Benchmark by the Transaction Processing Performance Council for decision support systems. It supports neither indices nor materialized views [39].
TPC-H [39] | Provides individual queries that are not known in advance. However, its schema differs from the traditional star schema.
TPC-DS [40] | More realistic than the previous ones. It overcomes the schema issue with a snowflake schema, but is aimed at refreshing the warehouse with new and changed data originating from the operational side of the business.
Star Schema Benchmark (SSB) [41] | Extends TPC-H to enable the analysis of historical trends and provides a set of predefined queries to run over its star schema. The SSB queries refer to descriptive locations of suppliers and customers. However, the SSB neither holds spatial attributes nor stores maps that would enable multidimensional queries with spatial predicates.
Spadawan benchmark [25] | The spatial data warehouse benchmark (Spadawan) addresses this problem by using predefined spatial hierarchies. It helps to assess query processing performance on spatial roll-up and drill-down operations. It is a performance benchmark.

Table 3: Various benchmarks and their limitations

Spadawan [25] is considered very effective for spatial data warehouse benchmarking, as it not only generates SDW datasets with points and polygons as spatial attributes but also supports the evaluation of different types of SOLAP queries, covering intersection range queries, containment range queries and enclosure range queries in the spatial predicate. It also enables the evaluation of spatial roll-up and drill-down operations.
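The three range query types can be illustrated with axis-aligned rectangles. The sketch below assumes the conventional meaning of the terms (objects that intersect the query window, objects contained in it, and objects that enclose it); it is not taken from the Spadawan specification, and all names and coordinates are hypothetical.

```python
def intersects(obj, window):
    """The object and the window share at least one point."""
    return (obj[0] <= window[2] and window[0] <= obj[2]
            and obj[1] <= window[3] and window[1] <= obj[3])


def contained_in(obj, window):
    """The object lies completely inside the window."""
    return (window[0] <= obj[0] and window[1] <= obj[1]
            and obj[2] <= window[2] and obj[3] <= window[3])


def encloses(obj, window):
    """The object completely covers the window."""
    return contained_in(window, obj)


def range_query(objects, window, predicate):
    """Names of the objects satisfying `predicate` against the window."""
    return {name for name, rect in objects.items() if predicate(rect, window)}


# Hypothetical objects as (xmin, ymin, xmax, ymax).
OBJECTS = {
    "A": (1, 1, 2, 2),
    "B": (0, 0, 10, 10),
    "C": (4, 4, 7, 7),
    "D": (20, 20, 21, 21),
}
WINDOW = (3, 3, 8, 8)

intersection_result = range_query(OBJECTS, WINDOW, intersects)
containment_result = range_query(OBJECTS, WINDOW, contained_in)
enclosure_result = range_query(OBJECTS, WINDOW, encloses)
```

A benchmark exercising all three predicates stresses an index differently in each case, since containment and enclosure results are both subsets of the intersection result.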

6.3. Research Needs

There is a need for research on developing benchmarks for the evaluation of:

• Spatial data such as lines, polygons with holes and polygons with islands.

• Spatial data generation and SOLAP query processing.

• Additional SOLAP query types to analyze drill-across operations on extended SDW schemas.

7. Trends: Spatiotemporal data warehouses

Data warehousing applications are based on high-performance databases. Many fields deal with data that also has spatial information, such as addresses and locations. If the spatial component of the data is integrated with the data warehouse, the decision-making potential of such organizations grows manifold.

7.1. Introduction to Spatio-temporal data warehouses

Consider the query, “How many objects visit a given area during a given time period?” This query includes both a spatial component and a temporal component. While spatial data warehouses look at many types and dimensions of data, including the spatial context, there is a need to include the temporal aspects as well. This allows applications to see hidden relationships and patterns in the data.
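The example query can be written down directly as a filter over timestamped positions. The sketch below is a brute-force scan over made-up observations, with a hypothetical function name; a spatio-temporal data warehouse would answer it from indexes or pre-aggregated cells instead.

```python
# Hypothetical observations: (object_id, x, y, t). All values are made up.
OBSERVATIONS = [
    ("car1", 1.0, 1.0, 10),
    ("car1", 2.0, 2.0, 20),
    ("car2", 1.5, 1.5, 30),
    ("car3", 9.0, 9.0, 15),
    ("car2", 1.2, 1.8, 50),
]


def count_visitors(observations, area, t_start, t_end):
    """Number of distinct objects seen inside `area` during [t_start, t_end].

    `area` is an axis-aligned rectangle (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = area
    visitors = {
        oid
        for oid, x, y, t in observations
        if xmin <= x <= xmax and ymin <= y <= ymax and t_start <= t <= t_end
    }
    return len(visitors)


early = count_visitors(OBSERVATIONS, (0, 0, 3, 3), 0, 25)
whole_period = count_visitors(OBSERVATIONS, (0, 0, 3, 3), 0, 60)
```

Collecting object identifiers in a set ensures that an object observed several times in the area is counted once, which is the distinct-count issue that makes this query hard to pre-aggregate.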

7.1.1. Challenges

Many applications refer to moving objects and require spatio-temporal modeling for specific analysis. Object motion defines a continuous variation in space and time, and the resulting datasets are huge and very difficult to handle.

7.1.2. Organization of Temporal data

Two concepts of time are involved in the temporal characteristics of geographic entities: world time and system time [13]. World time refers to the time when a change to an entity takes place in reality, whereas system time is the time at which the change is recorded in the database. Depending on the requirement, users might want to use only system time (e.g. GIS) or both (data warehouses), which makes it even harder to model the two types of time dimension in spatio-temporal data warehouses.

7.2.Trajectory data warehousing – Tools and techniques

Trajectory data warehousing is a branch of spatio-temporal data warehousing. Spatio-temporal data cubes are essential to support trajectory data. Such a cube should allow analysis along temporal dimensions, spatial dimensions at different levels of granularity (point, cell, road) and thematic dimensions containing, for instance, demographic data.

• STAU: A spatio-temporal extension for the Oracle DBMS. It is a system extension to the Oracle 10g ORDBMS that provides a data management infrastructure for historical moving object databases (MODs).

• Hermes [13]: Hermes is a database engine for handling objects that change location, shape and size, either discretely or continuously in time. The prototype has been designed as an extension of STAU, and it supports the demands of real-time dynamic applications (e.g. Location-Based Services, LBS).

o It is a robust framework that provides functionality for handling spatio-temporal data.

o It enables the modeling, construction and querying of a database with dynamic objects that change location, shape and size.

o Hermes provides spatio-temporal functionality to state-of-the-art Object-Relational DBMSs (ORDBMS).

• The GeoPKDD trajectory data warehouse [42]: GeoPKDD is a project which aims at extracting user-consumable forms of knowledge from large amounts of raw spatio-temporal geographic data. Figure 14 below illustrates the GeoPKDD architecture.

Figure 14: The GeoPKDD architecture [44]

Description of the architecture

• First, location data is captured and forwarded to a trajectory stream manager, which performs preprocessing operations such as splitting the raw data into trajectories according to some criteria and assigning each trajectory an identifier.

• These trajectories are then loaded into a moving object database (MOD).

• The MOD is managed by the Hermes system. It includes a relation MOD Trajectories with schema (Oid, trajectoryid, trajectory), where trajectory is of type Moving Point.

• In the MOD, appropriate querying and Extract-Transform-Load (ETL) processes are applied to update the trajectory data warehouse (TDW) with trajectory information.

• The trajectory data warehouse model is based on the classic star schema. It has a standard temporal dimension and two spatial dimensions.
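The flow described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the GeoPKDD or Hermes implementation: the splitting criterion (a maximum time gap), the grid cell size, the time slot length, the use of plain lists in place of the Moving Point type, and the choice of an x cell and a y cell as the two spatial dimensions are all assumptions made for the example.

```python
from collections import defaultdict

# Raw location records: (object id, timestamp, x, y). All values are made up.
raw = [
    ("o1", 0, 0.5, 0.5), ("o1", 10, 1.5, 0.5), ("o1", 20, 2.5, 1.5),
    ("o1", 500, 0.5, 0.5), ("o1", 510, 0.5, 1.5),  # long gap: new trajectory
    ("o2", 5, 1.5, 0.5), ("o2", 15, 1.5, 1.5),
]

MAX_GAP = 60     # assumed splitting criterion, in time units
CELL_SIZE = 1.0  # assumed spatial granularity of the grid
TIME_SLOT = 100  # assumed temporal granularity


def split_into_trajectories(records, max_gap):
    """Stream manager step: split raw data and assign trajectory ids.

    Returns rows shaped like the MOD relation (oid, trajectoryid, trajectory),
    where a trajectory is a list of (t, x, y) points.
    """
    by_object = defaultdict(list)
    for oid, t, x, y in sorted(records):
        by_object[oid].append((t, x, y))

    mod, next_id = [], 0
    for oid, points in by_object.items():
        current = [points[0]]
        for prev, point in zip(points, points[1:]):
            if point[0] - prev[0] > max_gap:
                mod.append((oid, next_id, current))
                next_id += 1
                current = []
            current.append(point)
        mod.append((oid, next_id, current))
        next_id += 1
    return mod


def load_fact_table(mod, cell_size, time_slot):
    """ETL step: count distinct trajectories per (time slot, x cell, y cell)."""
    seen = defaultdict(set)
    for _oid, traj_id, trajectory in mod:
        for t, x, y in trajectory:
            key = (t // time_slot, int(x // cell_size), int(y // cell_size))
            seen[key].add(traj_id)
    return {key: len(ids) for key, ids in seen.items()}


mod = split_into_trajectories(raw, MAX_GAP)
facts = load_fact_table(mod, CELL_SIZE, TIME_SLOT)

print(len(mod))          # 3
print(facts[(0, 1, 0)])  # 2
```

Each key of `facts` plays the role of a fact table row addressed by one temporal and two spatial dimension members, and the stored measure is the number of trajectories that pass through that cell in that time slot.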

7.3.Research needs

The motivation for spatio-temporal data warehouses is to make use of valuable information for decision making in applications such as mobile marketing, location-based services and traffic control management.

Trajectory warehousing is an important step in this direction and offers considerable scope for further work. Because this kind of historical data grows to very large volumes, future research should focus on modeling, aggregation and indexing to improve the efficiency of such warehouses.

8. Future Work

Domain-specific applications of spatial data warehouses are widely discussed in the research literature [49][50][51]. Future work in this direction would be to classify the literature that exists in specific domains and to identify the common concepts in each domain. This would not only present a broad example of the use of spatial data warehouses in a domain but also give an idea of the core concepts that are applied in each domain.

Another research direction we would like to include going forward is the treatment of 3-dimensional spatial objects in spatial data warehouses. 3-dimensional queries on spatial data warehouses may be helpful in domains like urban planning and disaster management [12]. The topological relationships for 3-dimensional objects would include relationships like INSIDE and ANYINTERACT [56].

9. Summary

Spatial data warehouses have been an active area of research over the last decade. Concepts like big data are evolving, and with them comes a large amount of spatial information to process, store and analyze. Given these trends, spatial data warehouses can be expected to remain an important part of future research because of their ability to provide decision makers with relevant and concise data. The survey we presented covers the broad topics of spatial data warehouses and gives an overview of trends like spatio-temporal data warehouses. The topics include conceptual models, storage and indexing, aggregations and spatial OLAP. For some of the topics we have identified the areas where future research is needed. We have also summarized the benchmarks that currently exist and compared and contrasted them for spatial data warehouses.

References

[1] Towards a Spatial Multidimensional Model - S. Bimonte, A. Tchounikine and M. Miquel DOLAP’05, November 4–5, 2005, Bremen, Germany

[2] Modelling multiple representations into spatial data warehouses: a UML-based approach Bédard Yvan, Ph.D, Marie-Josée Proulx, M.Sc, Suzie Larrivée B.Sc., Eveline Bernier, M.Sc.

[3] Implementing Spatial Datawarehouse Hierarchies in Object-Relational DBMSs Elzbieta Malinowski and Esteban Zimányi

[4] Representing Spatiality in a Conceptual Multidimensional model

[5] Spatial Hierarchies and Topological Relationships in the Spatial MultiDimER model E. Malinowski and E. Zimányi

[6] Multidimensional Model Representing Continuous Fields in Spatial Data Warehouses Alejandro Vaisman, Esteban Zimányi ACM GIS ’09 November 4-6, 2009. Seattle, WA, USA

[7] Materialized aR-Tree in Distributed Spatial Data Warehouse Marcin Gorawski and Rafal Malczok

[8] Efficient OLAP Operations in Spatial Data Warehouses Dimitris Papadias, Panos Kalnis, Jun Zhang and Yufei Tao

[9] Indexing Spatio-Temporal Data Warehouses Dimitris Papadias, Yufei Tao, Panos Kalnis, and Jun Zhang

[10] The R*a-tree: An improved R*-tree with Materialized Data for Supporting Range Queries on OLAP-Data - Marcus Jürgens, Hans-J. Lenz

[11] Object-Based Selective Materialization for Efficient Implementation of Spatial Data Cubes Nebojsa Stefanovic, Jiawei Han, and Krzysztof Koperski

[12] Spatial Data Warehouses: Some Solutions and Unresolved Problems Elzbieta Malinowski and Esteban Zimányi

[13] RESEARCH ON THE FRAMEWORK OF SPATIO-TEMPORAL DATA WAREHOUSE WANG Jizhou, LI Chengming

[14] Spatio-Temporal Data Warehouse Design for Human Activity Pattern Analysis L. Savary, T. Wan, K. Zeitouni

[15] Viswanathan, G., Schneider, M.: BigCube: A MetaModel for Managing Multidimensional Data. In: Proceedings of the 19th Int. Conf. on Software Engineering and Data Engineering (SEDE). (2010) 237–242

[16] What is Spatio-Temporal Data Warehousing? Alejandro Vaisman and Esteban Zimanyi

[17] Microsoft TerraServer: A Spatial Data Warehouse Tom Barclay, Jim Gray and Don Slutz

[18] Map Cube: A Visualization Tool for Spatial Data Warehouses S. Shekhar, C.T. Lu, X. Tan, S. Chawla, and R. Vatsavai

[19] Fundamentals of spatial data warehousing for geographic knowledge discovery Yvan Bédard, Tim Merrett and Jiawei Han

[20] SOLAP: a new type of user interface to support spatio-temporal multidimensional data exploration and analysis S. Rivest, Y. Bédard, M.J. Proulx, M. Nadeau

[21] On the Requirements for User-Centric Spatial Data Warehousing and SOLAP Ganesh Viswanathan & Markus Schneider

Generalizing Group-by, Cross-tabs and Subtotals. ICDE,1996

[23] Spatial OLAP Modelling: An Overview Base on Spatial Objects Changing over Time Gabriel Pestana, Miguel Mira da Silva, Yvan Bédard

[24] Bédard, Y., S. Larrivée, M.-J. Proulx, P.-Y. Caron and F. Létourneau. 1997. Geospatial Data Warehousing: Positionnement technologique et stratégique. Rapport pour le Centre de recherche pour la defense de Valcartier (CRDV)

[25] Benchmarking Spatial Data Warehouses Thiago Luís Lopes Siqueira, Ricardo Rodrigues Ciferri, Valéria Cesário Times, Cristina Dutra de Aguiar Ciferri

[26] Efficient OLAP Operations for Spatial Data Using Peano Trees Baoying Wang, Fei Pan, Dongmei Ren, Yue Cui, Qiang Ding, William Perrizo

[27] Spatial Hierarchy and OLAP-Favored Search in Spatial Data Warehouse Fangyan Rao, Long Zhang, Xiu Lan Yu, Ying Li, Ying Chen

[28] Pre Aggregation in Spatial Data Warehouses Torben Bach Pedersen and Nektaria Tryfona

[29] Spatial Aggregation: Data Model and Implementation Sofie Haesevoets, Bart Kuijpers and Alejandro Vaisman

[30] Hermes – A Framework for Location-Based Data Management- Nikos Pelekis, Yannis Theodoridis, Spyros Vosinakis and Themis Panayiotopoulos.

[31] Selective Materialization: An Efficient Method for Spatial Data Cube Construction Jiawei Han, Nebojsa Stefanovic, and Krzysztof Koperski

[32] Toward Better Support for Spatial Decision Making: Defining the Characteristics of Spatial On-Line Analytical Processing (SOLAP). GEOMATICA Vol. 55, No. 4, 2001, pp. 539 to 555

[33] MacEachren, A. M. and M.-J. Kraak. 2001. Research challenges in geovisualization. Cartography and Geographic Information Science.

[34] Bimonte, S., Tchounikine, A., Miquel, M.: Geocube, a multidimensional model and navigation operators handling complex measures: Application in spatial olap. Advances in Information Systems (2006) 100–109

[35] Scotch, M., Parmanto, B.: SOVAT: Spatial OLAP visualization and analysis tool. In: Proceedings of the 38th Annual Hawaii Int. Conf. on System Sciences (HICSS), IEEE (2005) 142b

[36] Marchand, P., Brisebois, A., Bédard, Y., Edwards, G.: Implementation and evaluation of a hypercube-based method for spatiotemporal exploration and analysis. ISPRS journal of photogrammetric and remote sensing 59(1-2) (2004) 6–20

[37] Paton, N.W., Williams, M.H., Dietrich, K., Liew, O., Dinn, A., Patrick, A.: "VESPA: a benchmark for vector spatial databases", In BNCOD, pages 81-101, 2000.

[38] Günther, O., Oria, V., Picouet, P., Saglio, J., Scholl, M.: "Benchmarking spatial joins à la carte", In SSDBM, pages 32-41, 1998.

[39] Poess, M., Floyd, C.: "New TPC benchmarks for decision support and web commerce", SIGMOD Record, 29(4):64-71, 2000.

[40] Poess, M., Smith, B., Kollar, L., Larson, P.: "TPC-DS, taking decision support benchmarking to the next level", In SIGMOD, pages 582-587, 2002.

[41] O'Neil, P., O'Neil, E., Chen, X., Revilak, S.: "The star schema benchmark and augmented fact table indexing", In TPCTC, pages 237-252, 2009.

[42] Geographic privacy aware Knowledge Discovery and Delivery, Damiani, Vangenot, Frentzos, Marketos, Theodoridis, Veryklos, and Raffaeta (2007)

[43] Kim, J., Kang, S., Kim, M. Effective Temporal Aggregation using Point-based Trees. DEXA, 1999.

[44] Yang, J., Widom, J. Incremental Computation of Temporal Aggregates. ICDE, 2001.

[45] Harinarayan V., Rajaraman A., Ullman J. Implementing Data Cubes Efficiently. ACM SIGMOD, 1996.

(Data-Centric Systems and Applications) - Elzbieta Malinowski, Esteban Zimányi ; Springer; 1st ed. 2008. Corr. 2nd printing edition (April 6, 2011)

[47] Spatial Databases: A Tour - Shashi Shekhar, Sanjay Chawla; Prentice Hall, 1 edition (June 20, 2003)

[48] Leticia I. Gómez, Bart Kuijpers, Bart Moelans, Alejandro A. Vaisman: A Survey of Spatio-Temporal Data Warehousing; International Journal of Data Warehousing and Mining 2009

[49] Spatial Data Warehousing for Hospital Organizations: An ESRI whitepaper

[50] Octavio Glorio, Jose-Norberto Mazón, Irene Garrigós, Juan Trujillo - Using Web-based Personalization on Spatial Data Warehouses

[51] Michael McGuire, Aryya Gangopadhyay, Anita Komlodi, Christopher Swan - A user-centered design for a spatial data warehouse for data exploration in environmental research

[52] S. Zlatanova, J.E. Stoter and W. Quak: Management of multiple representations in spatial DBMSs

[53] http://en.wikipedia.org/wiki/Online_analytical_processing

[54] http://terra.nasa.gov/

[55] http://www.spatial-eye.com/Engels/Applications/Spatial-DWH/page.aspx/117
