• No results found

Linked Data Management and INSPIRE Compliant Data Export

N/A
N/A
Protected

Academic year: 2021

Share "Linked Data Management and INSPIRE Compliant Data Export"

Copied!
37
0
0

Loading.... (view fulltext now)

Full text

(1)

Dimitris Kotzinos

FORTH-ICS & University of Cergy - Pontoise

Yannis Roussakis, Kyriakos Kritikos, Athina Kritsotaki, Martin Doerr FORTH-ICS

Linked Data Management and

INSPIRE Compliant Data Export

(2)

Data integration

Data retrieval in a uniform way

Data export to other formats

(3)

The

Web of data

becomes a reality by connecting so far

isolated islands or data silos such as

– Enterprise silos (Disparate DBMS Engines & Development

Frameworks)

– Social Media silos (Social Networking Services, Discussion Forums,

Blogs, Wikis etc.)

– Scientific silos (in biology, physics, earth sciences, etc.)

Linked Open Data

(LOD) is a way of publishing data on

the (Semantic) Web that:

(4)

Data Integration Layer

• Abstraction layer for data access

abstract the applications from the

specific setup of the data management service (such as local vs. remote,

federation, and distribution)

• Beyond Data Access • Enabling automation of

discovery, composition, and use of datasets

• Data Markets

• Online Visualization Services • Data Publishing Solutions • Data Aggregators

• BI / Analytics as a Service

Linked Open Data as Service

4 November 26, 2013 Query Engine Rel DB Rel DB files Excel files files XML files

A P I Query Update Import Export

Linked Data

Extensible Application Pool

Visualization Collaboration Cross Data sets' Querying

(5)

Query results viewed as:

– RDF/XML – JSON – GML – KML – Shapefiles

Transformation services to user provided

(6)

In this project we deal with scientific data described

in different formats and notions from diverse areas

– ground waters, boreholes, chemical analyses, etc. • GEUS, EKBAA, BRGM

– earthquakes, recordings, stations, etc. • EPPO

– Landslides, susceptibility maps, etc. • GEO-ZS, EKBAA

Purpose: integrate, describe and query such

heterogeneous data in a uniform way

Introduction on Available Datasets in the

project

(7)

Approach

– Creation of a Conceptual Model to integrate and cover all the thematic fields

– Map the source relational data into RDF data compliant to the Conceptual Model

– Provide unique URIs for data

– Rely on a scalable RDF Triple Store to enforce the mappings and enable the storage and query of the RDF data

(8)

Models, Mappings and Integration

November 26, 2013

InGeoCloudS Workshop, Brussels 21 November

8 Providers’ Data GSOM mappings query tsv/csv xml json results xml trig trix n3 … export formats Inspire compliant GeoJSON KML Shapefile …. transform

(9)

Our model:

– needs to be a generic, extendable and interoperable, which relies not only to geographic data

– Integrates different datasets created from sources w.r.t. their semantics – developed for the documentation of scientific observation –

measurement and sampling

• An event-centric model that captures complex contexts where

time, place and participants are associated.

• Capable to describe complex relationships – to integrate or connect rich information

(10)

E39 Actor P14 carried out by E53 Place P7 took place at E52 Time-span P4 has time-span E7 Activity E70 Thing P12B was present at E55 Type P2 has type E41 Appellation P1 is identified by E62 String E1 CRM Entity P3 has note

E73 Information Object P67B is referred to by

S18 Map

P101 had as general use

S19 Observable Entity S21 Spatial Distribution Model

P16 used specific object

dc:Agent rdf:Literal dc:coverage dc:date dc:format dc:creator dc:subject dc:LocationPeriodOrJurisdiction dc:MediaTypeOrExtent

E42 Place Appellation

Basic Concepts

Proposed (New) Concepts

Reused Concepts (CIDOC) Reused Concepts (DC, RDF)

(11)

The model in details - Observation,

Measurement, Sample Taking

Proposed (New) Concepts

Reused Concepts (CIDOC) Reused Concepts (DC, RDF)

(12)
(13)
(14)

Data Mappings to Conceptual Model

The data providers’ relational data were informally

mapped into our Conceptual Model

Such mappings can be formally expressed using

R2RML

– R2RML is a W3C Recommendation

– http://www.w3.org/TR/r2rml/

– Enables the automatic creation and synchronization of the Linked Data w.r.t. the Relational Data

(15)

S16.Borehole S17.Borehole_Collar O7.consists_of P1F.is_identified_by P43F.has_ Dimension E42.Identifier BOREHOLENO S15.Acquifer_Concept INTAKE P1F.is_identified_ by P87F.is_identified_by P2F.has_type P87F.is_identified_by O9.contains_or_ confines E41.Appelation BOREHOLENAME E54.Dimension DRILLDEPTH P90F.has_value P2F.has_type E55.Type “Depth” E60.Number E42.Identifier INTAKENO E47.Spatial_Coordinates LATITUDE E47.Spatial_Coordinates E47.Spatial_Coordinates XUTM E47.Spatial_Coordinates YUTM E47.Spatial_Coordinates ELEVATION E55.Type “EPSG:32632”

GEUS data -> Conceptual Model – Borehole

and Location

(16)

P1F.is_identified_by P4F.has_ time-span O4.sampled_at O12.upper_vertical_limit O13.lower_vertical_limit S13.Sample O5.removed P2F.has_type E55.Type “GrWater_Sampling” E53.Place O9.contains_or_ confines E52.Time-Span SAMPLEDATE E47.Spatial_Coordinates TOP E47.Spatial_Coordinates BOTTOM E42.Identifier SAMPLEID S16.Borehole S15.Acquifer_Concept INTAKE P1F.is_identified_ by E42.Identifier INTAKENO S2.Sample_Taking P53F_has_former_or_ current_location P16F.used_specific_ object O9.contains_or_ confines

GEUS data -> Conceptual Model – Ground

Water Sample

(17)

E16.Measurement P1F.is_identified_by P2F.has_type P14F.carried_out_by P40F.observed_dimension P4F.has_time-span O9.contains_or_ confines S16.Borehole S15.Acquifer_Concept INTAKE E52.Time-Span TIMEOFMEAS E42.Identifier WATLEVELNO E55.Type QUALITY E39.Actor CATEGORY E54.Dimension WATERLEVEL P90F.has_value E54.Dimension WATLEVMSL P90F.has_value E54.Dimension WATLEVGRSU P90F.has_value E54.Dimension WATLEVMP P90F.has_value P7F.took_place_at P16F.used_specific_ object E55.Type METHOD P32F.used_general_technique

GEUS data -> Conceptual Model – Water Level

Measurement

(18)

E16.Measurement

S13.Sample P39F.measured

P1F.is_identified_by

P40F.observed_ dimension

E54.Dimension P91F.has_unit E58.Measurement_UnitUNIT

P90F.has_value P1F.is_identified_by P1F.is_identified_by P1F.is_identified_by P2F.has_type P2F.has_type O15.is_bounded_by E42.Identifier ANALYSISNO E55.Type “Chemical_Analysis” E60.Number AMOUNT E55.Type Comp_Descr E60.Number DETECTIONLIMIT E42.Identifier COMPOUNDNO E42.Identifier EUNO E42.Identifier CASNO

GEUS data -> Conceptual Model – Ground

Water Chemical Analysis

(19)

INSPIRE (

Infrastructure for Spatial Information in

Europe) covers 34 Spatial Data Themes. It uses ISO

19100 series standards; oriented to

environmental

and

spatial

data

The Generic conceptual schema is

not event-oriented

(especially the schema that uses the

observation-measurement specification)

Our data and INSPIRE

Data are not INSPIRE-compliant but there is a specific

(20)

INSPIRE is not made for information integration

INSPIRE lacks semantic clarity of concepts in various

places, i.e.

“An observation is an act that results in the estimation of the value of a feature property…

– Confusion of the action (event) aka observation with the result

– “The observation itself is also a feature, since it has properties and identity.

– Observation is the value of a feature property and also a feature

(21)

Geosciml: borehole

Geology core: borehole

Borehole is used by 2

Why do not use

INSPIRE directly?

INSPIRE schemas

(22)

How many observations definitions (structures) INSPIRE includes?

1) ISO 19156 Observations and Measurements/OM Observation

2) GML Observation

3) Observation OGC-WaterML:Observation Process, MeasurementTimeseries `and

TimeseriesObservation: What is their relation with the other observation schemas(ISO 19156)?

1) ISO 19156 2) GML 3) WaterML

(23)
(24)

Model Mapped to INSPIRE

(25)

Virtuoso Universal Service is a hybrid of DB engine &

middleware combining the func. of a RDBMS,

ODBMS, and a RDFStore

Supports various Internet & Web protocols:

– HTTP(s), WebDav, SOAP, UDDI, SPARQL

Implemented various data access APIs:

– ODBC, JDBC, OLE DB, ADO . NET

Provides a Web-based DB Administration UI

Provides an open-source version

(26)
(27)

Reasons for selection:

– Scalable to handle millions of triples + exhibits a very good performance

– Offers API for RDF data management

– Supports SPARQL and SPARUL

– Supports 2-D features and offers spatial operators

– Supports R2RML and provides the respective relational-to-RDF data mapping mechanism

– Offers a cloud-based version

(28)

Scalability / Elasticity

– Planned at least 2 instances of Elastic LD Storage to be available in the platform to handle the query load (cross-provider queries are more demanding)

– The elastic capabilities of Amazon Cloud are exploited to increase/decrease the number of instances when the load surpasses or goes below a specific threshold

– Stress tests are currently conducted to determine these upper and lower thresholds

(29)

• Generated Linked Data for ground waters (EKBAA, BRGM, GEUS)

• Successfully inserted ~1B triples into Virtuoso

• Created specific queries – For each partner’s dataset

– Combining similar notions from cross provider datasets

• Relative small execution times depending on the number of fetched results + query complexity

• At the moment using only 1 instance in the Amazon Cloud

(30)

Conceptual Data Integration & Linking API

• Methods for:

– Querying data/metadata via SPARQL (both provider-specific & cross-provider queries are supported)

– Importing data/metadata in various formats (both blocking and non-blocking depending on the size of meta-data to be imported)

• Direct through respective API method

– Data provider should be responsible of synchronizing his/her relational and RDF data

• Indirect through the API method for the definition of relational-to-RDF mappings

– Apart from the definition of mappings, the data provider does not need to perform any kind of synchronization, as the system will be responsible for performing such a synchronization on his/her behalf

– Exporting data/metadata in various formats (inline in the response or available in a specific URL)

– Updating data/metadata via SPARUL

(31)

Conceptual Data Integration & Linking API

Data Integration Layer

Query Engine

A P I Query Update Import Export

Linked Data V ir tu o s o Different datasets Different formats, models e.g. INSPIRE

Conclusions

• Easy integration

• Easy querying

• Hide complexities

(32)

Cross query over datasets

– Find the boreholes and their locations in all datasets if ground water samples where taken from them after the date 01/01/2005.

(33)

select ?bhole ?date ?latitude ?longitude where { {

select ?bhole ?date ?latitude ?longitude where { ?st sci:O5.removed ?sample;

crm:P4F.has_time-span ?date; sci:O4.sampled_at ?place;

crm:P2F.has_type "GRWATER_SAMPLING". ?bhole sci:O9.contains_or_confines ?place;

a sci:S16.Borehole; sci:O7.consists_of ?bcollar.

?bcollar crm:P87F.is_identified_by ?latitude; crm:P87F.is_identified_by ?longitude. filter (regex(?latitude, "Latitude")). filter (regex(?longitude, "Longitude")). filter(?date > "2005-01-01"^^xsd:date).

} limit 100 }

UNION {

select ?bhole ?date ?latitude ?longitude where { ?anal crm:P2F.has_type "QUALITY_MEASUREMENT";

Demo Example

UNION {

select ?bhole ?date ?latitude ?longitude where { ?st crm:P4F.has_time-span ?date;

sci:O4.sampled_at ?bhole. ?bhole a sci:S16.Borehole;

sci:O7.consists_of ?bcollar.

?bcollar crm:P87F.is_identified_by ?latitude; crm:P87F.is_identified_by ?longitude. filter (regex(?latitude, "Latitude")). filter (regex(?longitude, "Longitude")).

filter(?date > "2005-01-01"^^xsd:date). } limit 100

} } Cont’d

(34)
(35)
(36)

How to publish your data as Linked Open Data using

GSOM?

How to query your data?

How to export in different formats?

How to export your data as INSPIRE compliant data?

17:15 (45min)

Room E “Next Door”

Yannis Roussakis

(37)

References

Related documents