• No results found

Disributed Query Processing KGRAM - Search Engine TOP 10

N/A
N/A
Protected

Academic year: 2021

Share "Disributed Query Processing KGRAM - Search Engine TOP 10"

Copied!
30
0
0

Loading.... (view fulltext now)

Full text

(1)

fédération de données et de ConnaissancEs

Distribuées en Imagerie BiomédicaLE

Data fusion, semantic alignment, distributed queries

Johan Montagnat

CNRS, I3S lab, Modalis team

on behalf of the CrEDIBLE consortium

(2)

Motivations

Biomedical data

– High heterogeneity: images, clinical data, biomarkers, biology....

– Increasing amount / number of (open) sources – Big Data

● Large-scale medical studies

(statistical medical studies, epidemiology...)

– Need for cross-factors analysis – Linked Data

● Data (re)analysis opportunities ● Translational research

Centralized approaches encounter limitations

– Multiple data source kinds

– Large data volumes to transfer / archive / search

– Sensitive patient data / complex access control policies

– Need to adopt uniform data model & format

(3)

Biomedical data mediation & federation

– Data federation through distributed querying and query rewriting

Client

Federator

(query decomposition, planning & results federation)

Mediator

(ETL or query rewriting)

Mediator

(ETL or query rewriting)

Site 1 Site 2

Query-based access to data

(4)

Domain ontology-based federation

Client

Federator

Mediator Mediator

Site 1 Site 2

Query-based access to data

Remote sub-queries

Medical domain ontology (reference model)

Data querying

Data

(5)

Challenges and expertise

Challenges

– Representation of data semantics for heterogeneous

data sources → biomedical ontology building

– Data federation → distributed query engine

– Data mediation → RDB2RDF, ontology alignment

Partnership

I3S/Modalis

INRIA/I3S/Wimmics LTSI/MediCIS

Semantic representation

Distributed query engine

(6)

Scientific networking – annual workshop

● Objectives

– Multidisciplinary workshop gathering experts in biomedical data

representation, semantic, distribution, federation, integration...

● October 2012: data integration, ontologies, data models,

reasoning

– Semantic models are now widely accepted although ontologies

resources are not sufficient

– Existing systems are mostly centralized but strigent need to

support multi-centric studies

● October 2013: data reuse, ontologies, mediation,

federation, graphs

– Both horizontal and vertical data partition schemes need to be

addressed

– Data models in use are often constructed bottom-up

(7)

Bibliography study

Distributed (semantic) Query Processing

– Report CrEDIBLE-12-2, november 2012

– A. Gaignard PhD thesis, march 2013, U. Nice Sophia

Relational data mediation

– Report I3S 2013-04-FR, november 2013

– Submitted to Journal of Web Semantics

Ontology of scientific measures (data acquisition,

(8)

Reference ontology

● 3-levels structure: foundational (DOLCE), core, domain

● Domain-specific rules – Inference abilities Particular Endurant Perdurant Action Dataset acquisitions Conceptualization Expression Inscription Dataset acquisition equipment Studies Examinations Dataset processings Datasets Medical image expressions Medical image files Subjects Investigators Centres Languages Medical Image formats ● DataTop ontology – Current focus on measurements ● Derived relational schema

(9)

Ontology modules

Modularized ontology to improve reuse and

lightweightness

– ONL-MR-DA: MR Dataset Acquisition

– ONL-DP: Data Processing

– ONL-MSA: Mental State Assessment

– OntoVIP: Medical Image Simulation

Wide diffusion

(10)

Data query and federation engine

● KGRAM (Knowledge Graph Abstract Machine) Semantic

query engine:

– Full support of SPARQL1.1

– Generic interface for heterogeneous backends

Flexible architecture facilitating different deployment scenarios

● Mediation interface to access relational data

(11)

Distributed Query Processing

Query federator decoupled from data sources

Asynchronous querying of multiple data sources

Query planning and parallel querying

(12)

Distributed Query Processing

KGRAM query processing

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) }

(13)

Distributed Query Processing

KGRAM query processing

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

(14)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Data source #2 Q

(15)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Q

(16)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Data source #2 Q Q1

(17)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Q Q1

(18)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Data source #2 Q Q1 Q2'

(19)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Q Q1 Q2'

(20)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Data source #2 Q Q1 Q2' Q2''

(21)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Q Q1 Q2' Q2''

(22)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Data source #2 Q Q1 Q2' Q2''

(23)

Distributed Query Processing

KGRAM query processing

Asynchronous execution

Q1

SELECT ?name ?date

WHERE { ?x foaf:name ?name . ?x dbpedia:birthDate ?date . FILTER (CONTAINS (?name, 'Bob')) } Q2

Q Interface KGRAM core Data source #1 Q Q1 Q2' Q2''

(24)

Performance results

● Mixed (relational / semantic) stores querying

(25)

Exploitation in ANR GINSENG

(epidemiology)

● Partnership with GINSENG

consortium and Mnemotix SME

● Federation of heterogeneous

epidemiology repositories

– Multiple epidemiology data

acquisition networks

– Cross-correlation with “external”

data (e.g. demographic: IGN)

(26)

Collaboration with Pitié Salpétrière

CAC database

– Neurodegenerative diseases (esp. Alzheimer's)

– Mediation of relational CAC schema

● Comparative study of different relational-to-RDF

mediation techniques

– R2RML transformation language

(27)

Perspectives

● DataTop ontology development

– Data distribution (spatial and temporal) and data collections

structure

– Alzheimer disease

● Query engine architecture

– Code modularity for alternative optimization strategy

implementation

● Query performance optimization

– Query plan improvement (especially when dealing both with

horizontal and vertical partitioning)

(28)

Perspectives: collaborations

Université de Laval au Québec

– Ontology design: Alzheimer's disease data

ANR CONTINT 2013 BIOMIST

– Industrial system for large databases management

(Product Lifecycle Management - PLM)

– Application to medical databases in the area of brain

functions analysis

France Life Imaging excellence infrastructure

– National-level infrastructure for medical data sharing

(29)

Publications

Peer-reviewed

– Medical domain: iDash image informatics workshop,

sept. 2012; DCICTIA-MICCAI workshop, oct. 2012

– Distributed Query Processing: Web Intelligence, dec.

2012; Journal of Web Semantics (submitted)

– Mediation: Journal of Web Semantics (submitted)

PhD thesis

– A. Gaignard (march 2013, U. Nice Sophia Antipolis)

– N. Cerezo (december 2013, U. Nice Sophia Antipolis)

Research reports

(30)

Conclusions

Query-based data federation grounded on

semantic web standards (SPARQL, RDF, RDFS)

– Emphasis on query language expressivity

– Support for both horizontal and vertical data partitioning

– Broad applicability (given that ontologies are available)

Ontology-based

– Reference model for data alignment and query terms

Currently using Extract-Transform-Load

– Static conversion of data sources in RDF format

– Work on dynamic data mediation on-going

Optimization work on-going

– Flexible software architecture

References

Related documents