Web-Services as an Instrument for Data Retrieval and Data Mapping - The DIOXIN-POP Database of Germany



Gerlinde Knetsch¹, Erich Weihs²

Abstract

The contribution describes a Web Service for environmental data on persistent organic pollutants (POPs). This service supports access to these data and their evaluation on the basis of a defined role concept: different groups of users are granted access to the database according to their rights. The concept includes a service for interactive data input via an XML interface, which allows data suppliers, for example from competent Länder authorities, to import additional measurement programmes with metadata and analysis results into the database. The aim of this Web Service is to provide interoperability between different information systems. This concerns, firstly, integration into the metadata information systems of the Federal Government and the Länder (German states), such as PortalU and Bavaria’s Environmental Objects Catalogue (UOK Bayern). Secondly, it seeks to promote networking among such Web Services through Internet search engines such as Google. The technical implementation is based on W3C (World Wide Web Consortium) standards.

1. Introduction

Because environmental data are used in an interdisciplinary fashion by different groups of users in science, politics and the public, this contribution presents a flexible solution for data input, quality assurance, processing and data search. The Dioxins-POPs Database, a Web Service of the German federal government and the German Länder, provides the conditions necessary for publishing metadata and results from monitoring programmes established for different spheres of the environment, health and consumer protection. It is thus an online service providing easy and timely access to the database, as required by the German Environmental Information Act.

The role of metadata grows with increasing integration, and metadata are a key element in the creation of Web Services. Users who are no longer in direct personal contact with data suppliers depend on this descriptive information when interpreting and evaluating measured environmental parameters. Metadata are also a prerequisite for integration with other metadata and environmental information systems and for the creation of a “network” of environmental and health data. A service-oriented architecture has an important role to play here, but it also makes high demands on database quality and therefore on quality assurance procedures.

The development of the Web Service is a joint project designed by the Federal Environment Agency, Berlin, the Bavarian State Ministry of the Environment, Public Health and Consumer Protection, and the Federal Office of Consumer Protection and Food Safety, Berlin. Technical development was in the hands of the Bavaria-based company deborate GmbH.

2. Integration of the Dioxins Database into the Web Services

From a strategic point of view, the use of existing applications within overall E-Government systems is a general ambition. But what exactly does this mean? Do all systems have to be newly developed? Should all data and process controls be transferred into a single database or accessed via a single, obligatory central interface layer?

¹ Gerlinde Knetsch, Federal Environment Agency, Wörlitzer Platz 1, D-06844 Dessau, Germany. Internet: www.umweltbundesamt.de

² Erich Weihs, wedata, Insterburger Strasse 7, D-81929 Munich, Germany.

EnviroInfo 2007 (Warsaw)

Environmental Informatics and Systems Research

Copyright © Shaker Verlag, Aachen 2007. ISBN: 978-3-8322-6397-3

Such a centralised solution is not in keeping with the experience gained with integration systems and would not be feasible, either financially or institutionally, in the given circumstances. The Dioxins Database is operated by the Federal Environment Agency (Umweltbundesamt, UBA for short). Users and data suppliers are the UBA as well as independent federal institutions (e.g. the Federal Office of Consumer Protection and Food Safety, BVL) and the relevant Länder authorities. Under the German Environmental Information Act (UIG), not only professional users but also the public have a right of access. A centralised solution would also be technologically outdated today, since the order of the day is the general networking of largely autonomous systems via Web Services within a service-oriented architecture.

At the relevant government agencies, the term “Web Services” is often understood to mean any kind of access to information (“services”) on the Internet (“Web”). We use the term in the following three senses:

1. In a narrower sense, as defined by the W3C and other relevant standardisation bodies (Web Service): a Web Service is not focused on websites that can be viewed in a browser, but on global interoperability between (information) systems. It communicates via protocols and formats based on HTTP, XML and SOAP, which are extended in particular by a formal description language for services, the “Web Services Description Language” [WSDL 2003]. The WSDL “Usage Scenarios” list three aspects (“viewpoints”) that illustrate quite well what WSDL means:

"The Web Service description defines a contract that the Web Service implements.
The description language is used by tools to generate proper stubs.
The Web Service description captures information that allows one to reason about them semantically."

2. Web Services that are not compliant with the W3C (the normal case today) but can in principle be transformed into compliant ones (community profiles and “tolerant interfaces” [ISO 19115]). These services are good examples of highly flexible solutions. The ISO standard defines no fewer than 409 metadata attributes, but only 22 of them count as “core metadata”, and of these only seven are mandatory.

In Annex C, "Metadata extensions and profiles", all of these attributes are termed "generic", and users are invited to add metadata elements and entities or to override existing attributes, for example by adjusting permitted domain ranges. This possibility has been used extensively for the metadata of the Dioxins Database.

3. Applications that are accessible over the Internet (general case, “Web” services) with which data can be imported, checked, extracted and processed.
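To make the narrower, W3C sense of the term concrete, the following Python sketch wraps an operation call in a SOAP 1.1 envelope and parses it back, using only the standard library. The operation name `getMeasurementProgramme`, the parameter `programmeId` and the namespace `urn:example:dioxin-db` are hypothetical; the paper does not describe the interface at this level, and a real client would normally use stubs generated from the WSDL description rather than hand-built XML.

```python
import xml.etree.ElementTree as ET

# Envelope namespace defined by SOAP 1.1.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace, for illustration only.
SERVICE_NS = "urn:example:dioxin-db"

ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def build_request(operation: str, params: dict[str, str]) -> bytes:
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def parse_request(message: bytes) -> tuple[str, dict[str, str]]:
    """Recover the operation name and parameters on the server side."""
    body = ET.fromstring(message).find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) != 1:
        raise ValueError("expected exactly one operation in the SOAP body")
    call = body[0]
    params = {local_name(child.tag): child.text or "" for child in call}
    return local_name(call.tag), params


request = build_request("getMeasurementProgramme", {"programmeId": "42"})
print(parse_request(request))  # ('getMeasurementProgramme', {'programmeId': '42'})
```

Both sides agree on the message structure only through the envelope and the namespaces, which is exactly the contract that a WSDL description formalises.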

In practice, the realisation differs from the ideal textbook case and from interface to interface. It depends primarily on the conditions described in Section 3. For example, public search engines such as Google do not offer a Web Service interface compliant with the W3C and ISO standards.

3. Conditions and requirements for Web Services

In choosing suitable services for the Dioxins Database, we had to start from the specific data situation and requirements.

3.1 Data Model

The dioxins data are stored in a relational database under Oracle 10g. The most important components are the master data, i.e. descriptive information on the various measurement programmes. These data are compartment-specific and describe administrative and technical characteristics such as the measurement programme, sample collection and treatment, and analysis. They also include information on data protection and on the quality of the analytical parameters measured. The analysis results have a spatial and temporal reference, the basic prerequisite for depicting pollution trends in space and time.
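The split between master data and analysis results can be pictured as two record types. The following Python sketch is illustrative only: the field names are hypothetical and do not reproduce the actual Oracle schema, and the `trends` helper merely shows why the spatial and temporal reference matters, by grouping values per site into date-ordered series.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MeasurementProgramme:
    """Master data: descriptive information on one programme (hypothetical fields)."""
    programme_id: str
    compartment: str        # e.g. "soil", "air", "food"
    sampling_method: str
    analysis_method: str


@dataclass(frozen=True)
class AnalysisResult:
    """One analytical value with its spatial and temporal reference."""
    programme_id: str       # reference to the master data
    parameter: str          # e.g. a congener or a TEQ sum
    value: float
    unit: str
    sampled_on: date        # temporal reference
    easting: float          # spatial reference (e.g. Gauss-Krüger metres)
    northing: float


def trends(results: list[AnalysisResult],
           parameter: str) -> dict[tuple[float, float], list[tuple[date, float]]]:
    """Group the values of one parameter by site and order each series by date."""
    series: dict[tuple[float, float], list[tuple[date, float]]] = defaultdict(list)
    for r in results:
        if r.parameter == parameter:
            series[(r.easting, r.northing)].append((r.sampled_on, r.value))
    return {site: sorted(points) for site, points in series.items()}
```

Keeping the programme description separate from the individual results means the descriptive metadata is stored once and referenced by `programme_id`, which is the property the later XML transformation relies on.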

3.2 Present Users

Because of the classical client-server architecture (Access front end – ODBC – RDBMS), the application was used mainly by professional users at the Federal Environment Agency (UBA) and the Federal Office of Consumer Protection and Food Safety (BVL). Other professional users, in particular data suppliers from Länder authorities, had no direct access to it.

3.3 Environmental Information Act

To come closer to meeting the obligation to actively inform the public in particular, the range of use of the Dioxins Database has to be extended to meet the following requirements:

− Integration into the meta-information systems of federal and Land authorities. If these meta-information systems are the essential “ingredients” of an integration layer, and Web Services are the “steam pot”, so to speak, that was still missing in the 1990s, then one key level was still needed: the transition from integratable metadata to the status data themselves. “The required access to the data itself has been realised in a second stage that takes account of legal use issues.”

− Direct online access to the technical data in accordance with the Environmental Information Act, i.e. not only to metadata and secondary statistical analyses.

− A Web Service interface to read the data.

− An interface to PortalU, the German search engine for public environmental data.

− A closer “proximity” to the technical data already when searching metadata. Since only information covered by the metadata can be searched, a very detailed description is necessary at that point.

− Extension by public access via search engines such as Google, so that access is not offered only via the agencies’ own systems.

3.4 Quality assurance and flagging

Data provision via a Web Service makes higher demands on the quality of the data made available. This is true for both metadata and analytical measurement data. The reliability and validity of the measurement data are closely related to the methods used to generate the relevant value, including the methods for sample collection and treatment, analysis, and validation. Flags are used, for example, to mark a value as an outlier or as uncertain because it is based on too few individual values.
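The flagging idea can be sketched with a bit-flag type, so that one value can carry several flags at once. This Python sketch rests on assumptions: the paper names neither the outlier rule nor the minimum population, so the median/MAD rule and the constants `MIN_POPULATION` and `OUTLIER_FACTOR` are hypothetical placeholders.

```python
import statistics
from enum import Flag, auto


class QualityFlag(Flag):
    NONE = 0
    OUTLIER = auto()     # value deviates strongly from the rest of the series
    UNCERTAIN = auto()   # too few individual values to judge the series


MIN_POPULATION = 3       # hypothetical minimum number of individual values
OUTLIER_FACTOR = 3.0     # hypothetical limit, in multiples of the MAD


def flag_values(values: list[float]) -> list[QualityFlag]:
    """Return one quality flag per value of a measurement series."""
    if len(values) < MIN_POPULATION:
        return [QualityFlag.UNCERTAIN] * len(values)
    med = statistics.median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median(abs(v - med) for v in values)
    flags = []
    for v in values:
        flag = QualityFlag.NONE
        if mad > 0 and abs(v - med) > OUTLIER_FACTOR * mad:
            flag |= QualityFlag.OUTLIER
        flags.append(flag)
    return flags


print(flag_values([1.0, 1.1, 0.9, 1.0, 9.0]))
```

A robust rule is used here because a single extreme value would inflate a mean/standard-deviation test and could hide itself; the actual validation methods of the database may differ.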

3.5 Data protection, security and release

A data protection concept regulates the protection of personal and sensitive data. In particular, it covers the protection of confidential business information, notably in the case of emissions data, and the protection of site data in the case of soil samples. The data supplier and the quality assurer are responsible for the release status of the dioxin data according to their confidentiality level, but it is ultimately the quality assurer who controls the release process. Write access is restricted to data suppliers, each of whom can write only to their own data.
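The role concept and the release rules can be expressed as a few permission checks. The following Python sketch is hypothetical: the role names, the three release levels and the rule that professional users see internally released data are assumptions for illustration, not the database's documented configuration. It encodes only what the text states explicitly: the quality assurer controls release, and write access is limited to each supplier's own data.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PUBLIC = "public"
    PROFESSIONAL = "professional"
    DATA_SUPPLIER = "data supplier"
    QUALITY_ASSURER = "quality assurer"


class Release(Enum):
    UNRELEASED = 0   # visible to its supplier and the quality assurer only
    INTERNAL = 1     # additionally visible to professional users
    PUBLIC = 2       # visible to everyone under the Environmental Information Act


@dataclass
class User:
    name: str
    role: Role


@dataclass
class Dataset:
    supplier: str                        # name of the supplying authority
    release: Release = Release.UNRELEASED


def can_read(user: User, ds: Dataset) -> bool:
    if user.role is Role.QUALITY_ASSURER:
        return True
    if user.role is Role.DATA_SUPPLIER and user.name == ds.supplier:
        return True
    if user.role is Role.PUBLIC:
        return ds.release is Release.PUBLIC
    # Professional users and other suppliers: anything released at all.
    return ds.release is not Release.UNRELEASED


def can_write(user: User, ds: Dataset) -> bool:
    """Only a data supplier may write, and only to its own data."""
    return user.role is Role.DATA_SUPPLIER and user.name == ds.supplier


def set_release(user: User, ds: Dataset, status: Release) -> None:
    """The quality assurer controls the release process."""
    if user.role is not Role.QUALITY_ASSURER:
        raise PermissionError(f"{user.name} may not change the release status")
    ds.release = status
```

Keeping read, write and release decisions in separate functions mirrors the separation of duties in the text: the supplier owns the content, the quality assurer owns its visibility.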


4. Technical implementation

The structure of the interfaces is mainly determined by the requirements of online search engines such as PortalU or Google. The online-controllable release process is not dealt with here, since it has to meet familiar requirements and its implementation only has to satisfy routine, “small craft” criteria.

Searching with public search engines requires an index to be built in the search engine. This means that search engines would first have to crawl the Dioxins Database to produce their indices. At the same time, the engine would have to prepare an abstract that would later appear in the list of results matching the user’s retrieval criteria. Since the database is relational, the engine would have to follow a predefined search process; in XML, this corresponds to a tree structure in the Document Object Model (DOM) with a defined path. Even leaving aside all data protection concerns, this would be impossible for the simple reason that Google does not have such an interface, not to mention any standardisation across other search engines. The procedure chosen is presented in Figure 1. The reasons for this choice are as follows:

Figure 1: Simplified schematic of the UOK Web Service interfaces to the metadata of the Dioxins Database

The basis is an XML data model and an XML schema covering all transmitted data. The XML schema permits an exact definition of data types, including permitted domain ranges, and the derivation of custom data types.

Exact data model definitions at the interface are the price for convenience and reliability. But these exact definitions are also necessary from a semantic viewpoint.
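What “exact definitions including permitted domain ranges” means in XML Schema can be shown with a small restricted type. The Python sketch below embeds a hypothetical schema fragment (the type names `pHValue` and `Compartment` are assumptions, not taken from the Dioxins Database schema) and checks values against only two facet kinds, inclusive ranges and enumerations. It is not a full XSD validator; in practice a complete implementation such as the `XMLSchema` class of the lxml library would be used.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment with two restricted simple types.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="pHValue">
    <xs:restriction base="xs:decimal">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="14"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Compartment">
    <xs:restriction base="xs:string">
      <xs:enumeration value="soil"/>
      <xs:enumeration value="air"/>
      <xs:enumeration value="food"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def load_simple_types(xsd: str) -> dict[str, dict]:
    """Collect the base type and supported facets of each named simpleType."""
    types = {}
    for simple in ET.fromstring(xsd).iter(f"{XS}simpleType"):
        restriction = simple.find(f"{XS}restriction")
        # Note: 'base' is kept as the literal QName, e.g. "xs:decimal".
        facets = {"base": restriction.get("base"), "enum": set()}
        for facet in restriction:
            kind, value = facet.tag.replace(XS, ""), facet.get("value")
            if kind == "enumeration":
                facets["enum"].add(value)
            elif kind in ("minInclusive", "maxInclusive"):
                facets[kind] = Decimal(value)
        types[simple.get("name")] = facets
    return types


def is_valid(types: dict[str, dict], type_name: str, text: str) -> bool:
    """Check a lexical value against enumeration and inclusive-range facets."""
    facets = types[type_name]
    if facets["enum"] and text not in facets["enum"]:
        return False
    if facets["base"] == "xs:decimal":
        try:
            number = Decimal(text)
        except InvalidOperation:
            return False
        if not number.is_finite():
            return False
        if "minInclusive" in facets and number < facets["minInclusive"]:
            return False
        if "maxInclusive" in facets and number > facets["maxInclusive"]:
            return False
    return True
```

Declaring the range once in the schema means every data supplier's import is checked against the same rule, which is the reliability gain the text attributes to exact interface definitions.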

Similar tasks have already been tackled successfully in several cases, at least for metadata, for example in the Environmental Objects Catalogue (UOK) or in the integration of [OGC] Catalog Services and [ISO 19115 and ISO 19119] into GeoIS.Bund. However, the meta-information of the Dioxins Database far exceeds the level of detail covered by these examples. The task was tackled by transforming the relational data model into an XML data model. The XML tree is determined by the master data of the compartment and its individual data, and it is supplemented by linked address trees (e.g. laboratory, sample collection, owner of the data, measurement programme).
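The transformation from relational rows to an XML tree with linked address trees can be sketched as follows. This is a minimal Python illustration under assumptions: the element names (`dioxinData`, `programme`, `sample`, `result`, `addresses`), the `labRef` link attribute and the sample rows are hypothetical. Addresses are written once into their own subtree and referenced by id, and the final lines show navigation along a defined path, as a DOM or XPath consumer would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows as they might come out of the relational tables.
PROGRAMME = {"id": "P-2005-01", "compartment": "soil", "title": "Soil monitoring 2005"}
SAMPLES = [
    {"id": "S-1", "lab_id": "A-7", "results": [("PCDD/F-TEQ", 3.1), ("pH", 6.4)]},
    {"id": "S-2", "lab_id": "A-7", "results": [("PCDD/F-TEQ", 2.2)]},
]
ADDRESSES = {"A-7": "Example laboratory"}


def to_xml(programme: dict, samples: list[dict], addresses: dict[str, str]) -> ET.Element:
    """Build a programme tree plus a separate, linked address tree."""
    root = ET.Element("dioxinData")
    prog = ET.SubElement(root, "programme", id=programme["id"],
                         compartment=programme["compartment"])
    ET.SubElement(prog, "title").text = programme["title"]
    for s in samples:
        # Samples refer to addresses by id instead of repeating them.
        sample = ET.SubElement(prog, "sample", id=s["id"], labRef=s["lab_id"])
        for parameter, value in s["results"]:
            ET.SubElement(sample, "result", parameter=parameter).text = str(value)
    address_tree = ET.SubElement(root, "addresses")
    for address_id in sorted({s["lab_id"] for s in samples}):
        ET.SubElement(address_tree, "address", id=address_id).text = addresses[address_id]
    return root


doc = to_xml(PROGRAMME, SAMPLES, ADDRESSES)
# Navigation along a defined path through the tree.
first = doc.find("programme/sample/result")
lab_ref = doc.find("programme/sample").get("labRef")
lab = doc.find(f"addresses/address[@id='{lab_ref}']")
print(first.get("parameter"), first.text, "|", lab.text)  # PCDD/F-TEQ 3.1 | Example laboratory
```

Both samples point to the same laboratory, yet its address appears only once in the output, which is the benefit of linking address trees instead of nesting copies.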


The data model follows the requirements of ISO 19115, so a set of metadata is available for further processing. Whether these are “true” metadata depends on the perspective from which the data are viewed: for someone interested in analytical data, the pH value of a measurement is a datum, while from the measurement viewpoint it is metadata of a chemical-physical process. The situation is similar with address data: for an address broker they are data, whereas according to ISO 19115 they are metadata. Each object is given a more detailed description and, where available, a spatial and temporal reference.

Figure 2: Screenshot from the dioxin metadata presentation in the Environmental Objects Catalogue (UOK)

Each object receives a link (a parametric servlet request) into the Dioxins Database that points to the specific data being searched for, so the retrieval process does not have to be repeated in the Dioxins Database. This interface enables a direct jump into the Dioxins Database and ensures that the most current data are always shown (in contrast to index data, which depend on the search engine’s last crawler run).
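A parametric servlet request is essentially a URL whose query string carries the search context. The following Python sketch shows the idea with the standard `urllib.parse` functions; the servlet address and the parameter names `type`, `id` and `compartment` are hypothetical, since the actual URL pattern of the Dioxins Database is not given in the text.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical servlet address; the real URL pattern is not documented here.
SERVLET_URL = "https://dioxin-db.example.org/servlet/show"


def deep_link(object_type: str, object_id: str, **filters: str) -> str:
    """Encode the search context as query parameters of a servlet request."""
    params = {"type": object_type, "id": object_id, **filters}
    return f"{SERVLET_URL}?{urlencode(params)}"


link = deep_link("sample", "S-17", compartment="soil")
print(link)  # https://dioxin-db.example.org/servlet/show?type=sample&id=S-17&compartment=soil

# Server side: the servlet decodes the same parameters and queries the live database.
query = parse_qs(urlsplit(link).query)
```

Because the link carries only the search context and not the data, each click triggers a fresh database query, which is why the displayed data are always current.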

For further processing, we imported the metadata into the Bavarian Environmental Objects Catalogue (UOK) and made them searchable. The import process eliminates redundancies in addresses and other metadata. As a result, Bavarian dioxin data can be searched in detail, e.g. by sample collection, analysis or compartment.
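The paper does not say how redundant addresses are recognised during the UOK import. One simple approach, sketched below in Python, is to compute a normalised matching key and keep only the first spelling of each key. The normalisation rules (case-folding, whitespace collapsing, expanding "str.") are hypothetical examples; a production import would need more robust matching.

```python
import re


def normalise(address: str) -> str:
    """Hypothetical matching key: case-folded, whitespace-collapsed, 'str.' expanded."""
    key = " ".join(address.casefold().split())   # casefold also maps 'ß' to 'ss'
    return re.sub(r"str\.", "strasse", key)


def deduplicate(addresses: list[str]) -> tuple[list[str], list[int]]:
    """Keep the first spelling of each address and return index references."""
    index_of: dict[str, int] = {}
    unique: list[str] = []
    refs: list[int] = []
    for address in addresses:
        key = normalise(address)
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(address)
        refs.append(index_of[key])
    return unique, refs


unique, refs = deduplicate([
    "Insterburger Str. 7, München",
    "insterburger  Straße 7, münchen",
    "Wörlitzer Platz 1, Dessau",
])
print(unique, refs)
```

The returned `refs` list lets each imported record point to a single stored address, mirroring the linked address trees of the XML data model.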

Each object (document) of the UOK has a unique address through which it can be found on the Internet (e.g. http://www.uok.bayern.de/portal/view/uok-1169517708924-75933.htm, cf. Figure 2). The address looks like an .htm page but conceals a request to the UOK database. Because this is a database request, data can be found as soon as they are stored in the database. This is generally not the case with search engines, whose up-to-dateness is determined by the time of indexing. The HTML page is generated at the moment the request is made, using server-provided stylesheets. These stylesheets take different data protection requirements into account, e.g. by reducing the 7-digit Gauss-Krüger coordinates of soil samples to prevent identification of the precise location, which according to current legal opinion would constitute personal data. The link to the Dioxins Database is also generated from the metadata at this point.
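The coordinate reduction for soil samples can be illustrated independently of the stylesheet technology used. The Python sketch below stands in for that server-side step; the text does not state how many digits are removed, so keeping the four leading digits (a 1 km grid) is a hypothetical choice.

```python
def generalise(coordinate: int, kept_digits: int = 4) -> int:
    """Zero the trailing digits of a 7-digit Gauss-Krüger coordinate (in metres).

    With the hypothetical default of 4 kept digits, the position is
    reduced to the south-west corner of its 1 km grid cell.
    """
    if not 1_000_000 <= coordinate <= 9_999_999:
        raise ValueError(f"not a 7-digit Gauss-Krüger coordinate: {coordinate}")
    factor = 10 ** (7 - kept_digits)
    return coordinate // factor * factor


def public_view(record: dict, compartment: str) -> dict:
    """Copy a record for display, generalising site coordinates of soil samples."""
    view = dict(record)
    if compartment == "soil":
        view["easting"] = generalise(record["easting"])
        view["northing"] = generalise(record["northing"])
    return view


print(public_view({"easting": 4468123, "northing": 5333987, "teq": 3.1}, "soil"))
# {'easting': 4468000, 'northing': 5333000, 'teq': 3.1}
```

Applying the reduction when the page is rendered, rather than in the stored data, keeps the full-precision coordinates available to authorised roles while the public view is generalised.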


The UOK is linked to PortalU via a server interface (g2k), which can be regarded as a simplified SOAP interface. A search query to PortalU is routed to the UOK via this interface; the search takes place there, and the result is sent back to PortalU as a list. PortalU shows this list along with other results. Clicking on any of the results retrieves the relevant meta-information from the UOK in current and detailed form (in contrast to indexed data), as shown in Figure 2. In addition to geographical information, Figure 2 also shows an excerpt of the meta-information. The link under “Online Datenbezug” (online data access) takes the user to the document or dataset in the Dioxins Database and, depending on the user’s defined role, to further detailed information.
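The routing of a PortalU query to connected catalogues can be sketched as a simple federated search. In the Python sketch below, each backend is a function that receives the query and returns a result list; the real g2k message format is not described in the text, so plain function calls stand in for it, and `uok_backend` and its URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hit:
    title: str
    url: str
    source: str


# A backend takes the user's query and returns a list of hits.
Backend = Callable[[str], list[Hit]]


def federated_search(query: str, backends: dict[str, Backend]) -> list[Hit]:
    """Forward the query to every connected catalogue and merge the result lists."""
    merged: list[Hit] = []
    for search in backends.values():
        try:
            merged.extend(search(query))  # queried live, not from a crawler index
        except OSError:
            # An unreachable catalogue should not break the whole portal.
            continue
    return merged


def uok_backend(query: str) -> list[Hit]:
    # Stand-in for the g2k call to the UOK; the URL is a made-up example.
    return [Hit(f"Dioxin data: {query}", "https://uok.example.org/view/1", "UOK")]


def offline_backend(query: str) -> list[Hit]:
    raise ConnectionError("catalogue not reachable")


print(federated_search("soil", {"uok": uok_backend, "other": offline_backend}))
```

The key design point from the text is that the search runs in the UOK at query time, so results reflect the current catalogue content rather than a crawler's snapshot.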

The metadata have been prepared for a Web Service according to ISO 19119. At present, a map-based or gazetteer-based spatial search is possible, as shown in Figure 3.
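Both kinds of spatial search reduce to selecting objects inside an area: the gazetteer resolves a place name to that area, while the map search takes the area from the current map extent. The following Python sketch assumes rectangular areas and a hypothetical one-entry gazetteer; real gazetteers and map services typically use more precise geometries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BBox:
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


# Hypothetical gazetteer entry: place name -> bounding box in Gauss-Krüger metres.
GAZETTEER = {"Example district": BBox(4460000, 5330000, 4470000, 5340000)}


def spatial_search(objects: list[dict], place: Optional[str] = None,
                   bbox: Optional[BBox] = None) -> list[dict]:
    """Gazetteer search (by place name) or map search (by map extent)."""
    if (place is None) == (bbox is None):
        raise ValueError("give exactly one of place or bbox")
    area = GAZETTEER[place] if place is not None else bbox
    return [o for o in objects if area.contains(o["x"], o["y"])]


samples = [{"id": "S-1", "x": 4468000, "y": 5333000},
           {"id": "S-2", "x": 4500000, "y": 5333000}]
print(spatial_search(samples, place="Example district"))
```

This only works because each object carries a spatial reference, which is why the data model attaches one to every object where available.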

Figure 3: Spatial search of dioxin data following the step shown in Figure 2

5. Outlook

Environmental awareness is growing again among the public, and not only because of climate change. Through environmental information legislation and the Freedom of Information Act, the German legislature has already taken account of the resulting need for environmental information. The initiative for the release of information and data goes back to activities of the European Union. Relevant professional and technical standards have been defined by international standardisation bodies (ISO) and the EU (e.g. INSPIRE, reporting obligations, the Water Framework Directive). Future IT developments have to take these standards into account. It must be noted, however, that in many cases a paradigm change towards the active provision of information must take place within the administration. Too often, the public is denied the competence to interpret data.

With the Web Service presented here, a first step towards the publication of environmental data under the Environmental Information Act has been taken. Taking a pragmatic perspective, we did not follow the pure theory of Web Services but oriented ourselves to the existing possibilities for integrating the existing application into Web Services, guided by cost-benefit considerations.

On the technical side, the future development will focus on the following aspects:

• improving existing graphical evaluation possibilities and reporting

• support of a service-oriented architecture (SOA) with Web Service interfaces according to W3C and ISO specifications

• provision of a map service interface for integration into map services such as that of the UOK

• provision of a W3C-compliant interface to PortalU

• provision of services for specialised information systems such as the Federal Soil Information System (B-BIS)

On the professional side, it is planned to expand the database in keeping with the POP Convention:

• inclusion of further POP substance groups

• extending the range of users and provision of data by further data suppliers

