e-gnosis E-ISSN: Universidad de Guadalajara México


e-gnosis@cencar.udg.mx Universidad de Guadalajara México

Vargas S., Genoveva; Zechinelli M., José L.

Spatial data integration from distributed and heterogeneous sources e-Gnosis, núm. 2, 2004, p. 0

Universidad de Guadalajara Guadalajara, México

Available in: http://www.redalyc.org/articulo.oa?id=73000201


ISSN: 1665-5745 - 1/6 - www.e-gnosis.udg.mx/vol2/art1

SPATIAL DATA INTEGRATION FROM DISTRIBUTED AND HETEROGENEOUS SOURCES

INTEGRACIÓN DE DATOS ESPACIALES A PARTIR DE FUENTES HETEROGÉNEAS Y DISTRIBUIDAS

SPIDHERS Project

Genoveva Vargas S.¹, José L. Zechinelli M.²

Genoveva.Vargas@imag.fr / zechinel@mail.udlap.mx

Received: December 4, 2003 / Accepted: January 30, 2004 / Published: February 18, 2004

ABSTRACT. This paper discusses challenges associated with the integration of spatial data and presents the approach adopted in the SPIDHERS³ project (SPatial data Integration from Distributed and HEteRogeneous Sources), based on mediation techniques, efficient retrieval, analysis, and data mining. Finally, the paper shows how the technology produced in SPIDHERS can be used for integrating, analyzing, and making decisions using data about Popocatépetl, a volcano located in the region of Puebla and México City.

KEYWORDS: Electron energy spectrum, Michel parameter, Accelerators of high energy, mass effect.

RESUMEN. El presente trabajo discute los retos asociados a la integración de datos espaciales y presenta el enfoque adoptado en el proyecto SPIDHERS (SPatial data Integration from Distributed and HEteRogeneous Sources – Integración de Datos Espaciales a partir de Fuentes Heterogéneas y Distribuidas) basado en técnicas de mediación, recuperación, análisis y extracción eficientes de datos. Finalmente, este trabajo muestra la forma en que se puede utilizar la tecnología producida en SPIDHERS para integrar, analizar y tomar decisiones utilizando datos del Popocatépetl, volcán ubicado en la región de Puebla y la Ciudad de México.

PALABRAS CLAVE: Espectro de energía del electrón, parámetro de Michel, aceleradores de alta energía, efecto de masa.

1. Introduction

Today's computerized and globalized world makes it possible to obtain an analytical and universal view of the evolution of environmental, administrative, economic, and social situations through access to heterogeneous information sources. Data stored in such sources report the emergence of environmental events (e.g., temperature changes, avalanches, river flow growth, and volcanic eruptions) as well as their causes and consequences. For these reasons, in the last few years, the use and integration of spatial or geographical and environmental data have become a crucial requirement for applications that address transport problems, the environment, economic development, urban planning, and decision making in disaster situations.

¹ Laboratory Logiciels Systèmes Réseaux, LSR-IMAG, 681, rue de la Passerelle, BP. 72, 38402 Saint Martin d'Hères Cedex, France - www-lsr.imag.fr

² Centro de Investigación en Tecnologías de Información y Automatización–UDLAP, Sta. Catarina Mártir. Cholula, Puebla. C.P. 72820. México - www.udlap.mx/~centia

³ The work presented in this paper is conducted in the SPIDHERS project supported by the Franco-Mexican Laboratory on

Within this globalized context, data must be shared and exchanged to build applications that enrich them and generate new ones. The technology (e.g., networks, distributed processes, federated database systems) needed to build information systems on top of heterogeneous and autonomous sources is available. However, for spatial or geographical data integration, the semantic heterogeneity of data stemming from different sources has to be considered (e.g., data can be defined under different geospatial models). Without knowing the semantics of the data, it is impossible to use and integrate them properly.

Few works have tackled spatial or geographical data integration while taking semantics into consideration. The reason is the difficulty of identifying and representing the semantic aspects associated with a set of data or with a particular application. This paper presents SPIDHERS, an infrastructure that provides mechanisms for integrating, exploiting, and visualizing geographical data while considering semantic aspects. This infrastructure is validated through the integration and exploitation of geographical and environmental data about Popocatépetl, an active volcano in the region of Puebla and México City.

The remainder of the paper is organized as follows. Section 2 describes problems and solutions associated with the integration of geographical data. Section 3 introduces the approach adopted in SPIDHERS to address the problem of transparent exploitation and maintenance of geographical data stored in distributed sources. Section 4 describes the experimentation with data about Popocatépetl. Finally, Section 5 concludes the paper; it discusses the originality and main contributions of SPIDHERS.

2. Problems and solutions

Integration of geographical data. Structural and semantic heterogeneity are the major issues in data integration. Structural heterogeneity refers to differences between sources in terms of model and data structures. Semantic heterogeneity denotes differences in the meaning and the interpretation of the same data unit. To overcome these problems, schema and data integration techniques are used [4, 2, 8]. Technology has been proposed for supporting spatial and geographical data integration and management [1, 3]. However, the majority of existing solutions do not consider the semantics of spatial data. Some works [6, 7, 10] have tackled data integration issues by translating the data models supported by different geographical information systems (GIS).

Geographical meta data standards. The Federal Geographic Data Committee (FGDC) proposes the Content Standard for Digital Geospatial Metadata (CSDGM). The standard defines meta data categories, for example, information about the identity, quality, version, and geospatial reference system of the data, and about the organization that acquires them. The U.S. National Committee for Digital Cartographic Data Standards (NCDCDS) proposes a standard for the transfer of spatial data (Spatial Data Transfer Standard, SDTS [12]). The standard describes data quality in terms of alignment, precision, associated descriptive attributes, and logical consistency. Other organizations such as the International Cartographic Association (ICA) extend SDTS with temporal and semantic categories [11].
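To make the notion of meta data categories concrete, the following minimal sketch models a record carrying the categories listed above and checks which of them are still empty. The class, the field names, and the sample values are assumptions made for illustration; they are not the element names defined by CSDGM or SDTS.

```python
from dataclasses import dataclass

# Categories named in the text; the identifiers are hypothetical, not CSDGM element names.
REQUIRED = ["identity", "quality", "version", "reference_system", "acquiring_organization"]


@dataclass
class GeoMetadata:
    """Illustrative meta data record for one geographical data set."""
    identity: str
    quality: str
    version: str
    reference_system: str
    acquiring_organization: str


def missing_categories(record):
    """Return the names of the categories that are empty in the record."""
    return [name for name in REQUIRED if not getattr(record, name).strip()]


# Invented sample values, used only to exercise the check.
record = GeoMetadata(
    identity="risk-zone map",
    quality="",
    version="1.0",
    reference_system="WGS 84",
    acquiring_organization="example agency",
)
print(missing_categories(record))
```

A check of this kind could run before a source is registered with a mediator, so that data sets lacking, say, a reference system are flagged before integration is attempted.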

Exploiting data. Requirements associated with data analysis (biological, environmental, geographical) introduce challenges with respect to querying techniques. Data analysis is a complex querying process that involves several queries. Current proposals concerning data warehouse mediation are not well adapted to analytical query processing, where results are not obtained directly from one query and where the objective is to extract information without a priori knowledge.

• Querying. Important families of database query languages have been defined that simplify document querying and mediation on the Web. Proposed solutions concern partial results [13], dynamic query refinement [5], and knowledge or information discovery. Other languages oriented to non-specialists have also been proposed; for example, LDAP⁴ is adapted to performing search operations on directories. Independently of the language, query evaluation techniques must be integrated. Some of them were proposed in the 1980s; a detailed state of the art can be found in [9].

• Analysis. Data mining provides methods for identifying patterns, regularities in data, or implicit knowledge. Ideas and techniques proposed in different domains (databases, statistics, machine learning, and artificial intelligence) are combined to achieve information (knowledge) discovery. Methods used in data mining can be (i) descriptive, to provide information about general properties of data; or (ii) predictive, to make inferences about data (classification, prediction, diagnosis) and support decision making.

• Visualization of geographic information (virtual maps) results from the interpretation of database contents according to specific application needs. Map generation depends on the context (i.e., user requirements) and on the platform that users employ for visualizing data.
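The distinction between descriptive and predictive methods drawn in the list above can be illustrated with a small sketch. It assumes a series of invented daily readings; the descriptive part summarizes general properties, while the predictive part fits a least-squares line and extrapolates one step. The function names and the data are hypothetical, and a linear trend stands in for whatever mining method an application would actually choose.

```python
from statistics import mean, pstdev


def describe(values):
    """Descriptive method: summarize general properties of the data."""
    return {"mean": mean(values), "spread": pstdev(values),
            "min": min(values), "max": max(values)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predict(xs, ys, x_new):
    """Predictive method: infer an unseen value from the fitted trend."""
    slope, intercept = fit_line(xs, ys)
    return slope * x_new + intercept


# Invented readings (e.g., a sensor value per day), for illustration only.
days = [1, 2, 3, 4, 5]
readings = [10.0, 12.0, 14.0, 16.0, 18.0]

print(describe(readings))
print(predict(days, readings, 6))
```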

3. SPIDHERS: overview of our approach

The SPIDHERS framework provides tools to support spatial or geographical data mediation. Geographical data mediation is specified taking into account the semantics of different data representation models and application requirements. SPIDHERS tools support meta data management, indexing, and exploitation, as described below.

Meta data exploitation adapted to geographical data. Descriptions of data semantics must be managed (stored and maintained), accessed by data integration tools, and enriched to support data analysis.

In SPIDHERS we assume that data are exploited according to different topics that describe application contexts: government, geographical, environmental, social, and economic analysis. Different types of annotations are represented by meta data associated with geographic data. Geographical data exploitation is based on a taxonomy of meta data types that provides a meta data classification. This taxonomy also characterizes views on geographical data associated with different user profiles (e.g., cartographer, geographer, civilians). This approach is based on the idea that it is useful to group objects according to a topic (rivers, urban). A topic describes a "view" on data. For example, in the context of civil protection, three views on geographic and environmental data are considered. A threat is a process or event produced by a natural phenomenon that causes material and personal loss. A risk is the possibility that a particular threat materializes. A disaster is a social phenomenon that happens when a community is altered, in many cases due to natural events or technology failures. For applications that manage contingency plans in case of disasters, high-risk zones are identified and marked on existing maps. The result is a view of existing data (the map) enriched with new information.

⁴ LDAP, Lightweight Directory Access Protocol, is the standard on directories proposed by the IETF (Internet Engineering Task
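The idea that a topic defines a view on geographical data can be sketched as a filter over meta data annotations. The sketch below is only an illustration: the `Feature` class, the topic labels, and the sample features are assumptions, not the SPIDHERS taxonomy itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A geographical object annotated with topic meta data (hypothetical structure)."""
    name: str
    geometry_kind: str      # e.g., "point", "line", "polygon"
    topics: frozenset       # topic labels attached as meta data


def view(features, topic):
    """Return the names of the features that belong to the view defined by a topic."""
    return [f.name for f in features if topic in f.topics]


# Invented sample features for a civil protection context.
features = [
    Feature("lava flow path", "line", frozenset({"threat"})),
    Feature("zone A", "polygon", frozenset({"risk", "disaster"})),
    Feature("shelter 1", "point", frozenset({"disaster"})),
]

print(view(features, "disaster"))
```

Because the same feature can carry several topic labels, one stored object can appear in several views without being duplicated.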

Indexing. The nature of geographical and environmental data (i.e., volume, diversity, and associated contexts of use) impacts the complexity of their associated models and their management. Geographical data are characterized by different dimensions: location with respect to a spatial reference system, granularity, and geometric characteristics (lines, points, polygons). Efficient access to geographical data is a key aspect of supporting decision making based on these data, particularly in contexts where time constraints are important, for example, a volcanic eruption, river overflows, or traffic jams on railways and highways. SPIDHERS takes such aspects into account and provides configurable multidimensional indexing tools for geographical data.
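The text does not state which index structures SPIDHERS uses. As a stand-in, the following sketch shows a uniform grid index over two-dimensional points, one of the simplest multidimensional indexes; the cell size plays the role of the configurable parameter. The class name, the method names, and the sample points are assumptions.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid index over 2D points (a stand-in, not the SPIDHERS index)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size          # configurable resolution of the grid
        self.cells = defaultdict(list)      # (col, row) -> list of (name, x, y)

    def _cell(self, x, y):
        """Map a coordinate pair to the grid cell that contains it."""
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def query(self, xmin, ymin, xmax, ymax):
        """Return names of points inside the rectangle, visiting only overlapping cells."""
        col0, row0 = self._cell(xmin, ymin)
        col1, row1 = self._cell(xmax, ymax)
        hits = []
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                for name, x, y in self.cells.get((col, row), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return hits


# Invented coordinates, for illustration only.
index = GridIndex(cell_size=10)
index.insert("shelter", 12, 7)
index.insert("station", 48, 52)
index.insert("village", 15, 9)
print(sorted(index.query(10, 0, 20, 10)))
```

A range query touches only the cells that overlap the query rectangle, which is what makes such an index attractive when response time matters.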

This approach enables the identification of queries that can be answered by accessing meta data only. This is interesting in terms of query response efficiency but also in terms of communication and access cost. In general, accessing geographical data is cheap neither in time nor in money. In those cases where access to the sources is necessary, meta data based indexes can support efficient geographical data retrieval.

Exploitation. SPIDHERS uses a set of techniques adapted for geographical data analysis, planning, and decision making (agents, data mining, data warehouse). The objective of geographical data analysis is to aggregate the data and define analysis criteria, for example, the growth ratio of the population living around the Popocatépetl zone in the last five years. To this end, we use topological, quantitative, statistical, and generalization methods.
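The population growth example can be written as a small aggregation. The sketch below assumes census-like records with hypothetical field names and invented figures; it sums the population of a zone for two years and computes the relative growth between them.

```python
def growth_ratio(earlier, later):
    """Relative growth between two totals, e.g. 0.10 for a 10% increase."""
    if earlier == 0:
        raise ValueError("growth ratio is undefined for an empty starting population")
    return (later - earlier) / earlier


def zone_population(records, zone, year):
    """Aggregate (sum) the population of all towns of a zone for one year."""
    return sum(r["population"] for r in records
               if r["zone"] == zone and r["year"] == year)


def zone_growth(records, zone, start_year, end_year):
    """Growth ratio of a zone between two years."""
    return growth_ratio(zone_population(records, zone, start_year),
                        zone_population(records, zone, end_year))


# Invented records; field names and figures are assumptions for illustration.
census = [
    {"town": "A", "zone": "high", "year": 1999, "population": 1000},
    {"town": "B", "zone": "high", "year": 1999, "population": 3000},
    {"town": "A", "zone": "high", "year": 2004, "population": 1200},
    {"town": "B", "zone": "high", "year": 2004, "population": 3200},
    {"town": "C", "zone": "low", "year": 1999, "population": 500},
    {"town": "C", "zone": "low", "year": 2004, "population": 500},
]

print(zone_growth(census, "high", 1999, 2004))
```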

Visualization of environmental and geographical data is done through the dynamic generation of cartographies. Cartography generation and visualization are supported by meta data. Given a set of predefined requirements, the data necessary to generate the cartography are searched, retrieved, integrated, and aggregated. Rules describing the semantics of these operations are also represented by meta data. A cartography is the result of a set of processes that integrate an image with a message in response to individual or social requirements, taking into consideration cost and the visualization context. Building a cartography implies converting multidimensional properties of the spatial distribution of a phenomenon into an appropriate and synthetic representation. Geographical information transformation involves selection, classification, simplification, exaggeration, and symbolization processes. SPIDHERS couples data and meta data for producing and visualizing analysis results as graphics, tables, and digital cartographies. Meta data represent descriptive aspects as well as the types of aggregation and analysis operations that can be executed on data. They also represent rules that associate default representations with specific objectives defined by the application context.
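Rules that associate default representations with objectives can be pictured as a lookup keyed by feature type and objective. The following sketch is an assumption about how such rules might be encoded; the rule table, the style attributes, and the names are hypothetical and are not taken from SPIDHERS.

```python
# Fallback representation used when no rule matches (hypothetical values).
DEFAULT_STYLE = {"symbol": "dot", "colour": "grey"}

# Rules stored as meta data: (feature type, objective) -> default representation.
RULES = {
    ("risk_zone", "evacuation"): {"symbol": "hatched polygon", "colour": "red"},
    ("shelter", "evacuation"): {"symbol": "house", "colour": "green"},
    ("road", "evacuation"): {"symbol": "thick line", "colour": "black"},
}


def style_for(feature_type, objective):
    """Select the representation for a feature type given the map objective."""
    return RULES.get((feature_type, objective), DEFAULT_STYLE)


def legend(feature_types, objective):
    """Build the legend entries of a map generated for one objective."""
    entries = []
    for feature_type in feature_types:
        style = style_for(feature_type, objective)
        entries.append(f"{feature_type}: {style['colour']} {style['symbol']}")
    return entries


for entry in legend(["risk_zone", "shelter", "river"], "evacuation"):
    print(entry)
```

Keeping the rules in data rather than in code means that a different audience or platform only requires a different rule table, not a different map generator.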

4. Experimentation: Popocatépetl

At present, the institutions responsible for studying natural risks in México independently manage different types of information about Popocatépetl (images, geological measurements, descriptions of dangerous zones, evacuation paths). Information on analysis, prevention, and the organization of contingency plans is descriptive and is stored on personal computers with restricted access. The description of risky zones and the location of monitoring centres, shelters, and evacuation paths are specified on maps of the volcano region, which are managed by the CENAPRED.

Analysis and decision making. The objective of the analysis and decision making around data of the Popocatépetl is to determine the procedures to be undertaken to evacuate 200,000 inhabitants (distributed around 50 cities) living in high-risk and moderate-risk zones. An evacuation plan specifies how to move the population from a high-risk zone to a less risky one in the fastest and most efficient way. In order to do so, it is necessary to estimate the availability of (i) transport infrastructure (highways and roads) for defining evacuation routes; (ii) vehicles for transporting people; and (iii) shelter, provisions, and services. SPIDHERS contributes data management tools that support observation, analysis, and reasoning on phenomena produced by the volcano's activity.
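Defining an evacuation route over the available road infrastructure can be illustrated with a shortest-path search. The sketch below uses Dijkstra's algorithm on a hypothetical road graph whose edge weights are travel times in minutes; the paper does not say that SPIDHERS computes routes this way, and the graph, node names, and function name are all assumptions.

```python
import heapq


def fastest_route(roads, start, shelters):
    """Dijkstra search from a start node to the nearest shelter.

    roads maps a node to a list of (neighbour, minutes) pairs.
    Returns (total_minutes, path) or None if no shelter is reachable.
    """
    queue = [(0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node in settled:
            continue                     # a cheaper path to this node was already found
        settled.add(node)
        if node in shelters:
            return cost, path
        for neighbour, minutes in roads.get(node, []):
            if neighbour not in settled:
                heapq.heappush(queue, (cost + minutes, neighbour, path + [neighbour]))
    return None


# Invented road network around a town in a high-risk zone.
roads = {
    "town": [("junction", 10), ("bridge", 25)],
    "junction": [("shelter_1", 30), ("bridge", 5)],
    "bridge": [("shelter_2", 10)],
}

print(fastest_route(roads, "town", {"shelter_1", "shelter_2"}))
```

In a fuller treatment the edge weights would reflect road capacity and vehicle availability rather than travel time alone, since the text lists all three as inputs to an evacuation plan.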

It is only through the integration of geographical, environmental, social, and economic data that accurate analysis and decision making can be done. Meta data capture experts’ knowledge about these data (identification of risky zones). They also represent the information requirements of experts who need to exploit data in order to derive evacuation strategies. Having integrated data and meta data provides a general vision of the phenomenon, its implications, and possible solutions.

Evacuation plans. In SPIDHERS, data mining technology is used for "discovering" those social, economic, environmental, and geographical data involved in evacuation plans. Prediction and visualization methods are used for characterizing inherent properties of geographic data.

Contingency plans and digital cartography. Contingency plans are derived from geographic, environmental, and physical data. Plans are materialized as cartographies or textual documents that give more or less detail according to their audience. SPIDHERS explores the generation of virtual maps using meta data (symbols, colours, visualization rules associated with geographical data).

5. Conclusions

Current solutions for geographical data processing and exploitation are based on the definition of data models and meta data standards that model aspects such as geographical and spatial models, scale, geometry, and measurement systems. However, the number of standards (public and private) reintroduces the problem of integration, given the requirement of transparent access. Furthermore, the evolution of application requirements and the need to manage geographical information in order to support decision making call for mechanisms that can support analysis and visualization of large volumes of geographical data. The work proposed in SPIDHERS helps answer these information needs by providing general solutions that can be adapted to different application contexts. The main contribution is a strategy that takes up and generalizes existing solutions and integrates them in the same framework. This mechanism supports the construction of applications oriented to the exploitation of geographical data coming from heterogeneous and distributed sources.

References

1. O. Balovnev, M. Breunig, A. B. Cremers, and S. Shumilov. Geotoolkit: Opening the access to object-oriented geodata stores. In International Conference on Interoperating Geographic Information Systems, Santa Barbara, CA, December 1997.

2. C. Batini, M. Lenzerini, and S.B. Navathe. A comparative analysis of methodologies for database schema integration. ACM Computing Surveys, 18(4), December 1986.

3. C. Behrens, C. Shklar, N. Basu, N. Yeager, and E. Au. The geospatial interoperability problem: Lessons learned from building the GeoLens prototype. In International Conference on Interoperating Geographic Information Systems, Santa Barbara, CA, December 1997.

5. J. Chen, D.J. DeWitt, F. Tian, and Y. Wang. NiagaraCQ: A scalable continuous query system for internet databases. In Proceedings of SIGMOD, Dallas, Texas, USA, 2000.

6. M. Gahegan. Accounting for the semantic differences between various geographic information systems. In International Conference on Interoperating Geographic Information Systems, Santa Barbara, CA, USA, December 1997.

7. M.F. Goodchild. Geographical data modelling. Computers and Geosciences, 18(4), 1992.

8. Z. Kedad and E. Métais. Dealing with semantic heterogeneity during data integration. In Proceedings of the 18th International Conference on Conceptual Modeling (ER'99). Springer, 1999.

9. D. Kossmann. The state of the art in distributed query processing. ACM Computing Surveys, September 2000.

10. W. Kuhn. Defining semantics for spatial data transfers. In Proceedings of the 6th International Symposium on Spatial Data Handling, Edinburgh, 1994.

11. J.L. Morrison. Spatial Data Quality, chapter Elements of Spatial Data Quality. Elsevier Science Ltd, Oxford, 1995.

12. National Institute of Standards and Technology. Federal Information Processing Standard Publication 173 (Spatial Data Transfer Standard Part 1, Version 1.1). U.S. Department of Commerce, 1994.

13. J. Shanmugasundaram, K. Tufte, D. DeWitt, J. Naughton, and D. Maier. Architecting a network query engine for producing partial results. In Proceedings of the Third International Workshop on the Web and Databases, WebDB 2000, Dallas, Texas, USA, 2000.