S. Benkner, Department of Scientific Computing, University of Vienna. @neurist-NeuroLOG Workshop, Paris, May 18, 2010
Data Services
@neurIST and beyond
Siegfried Benkner
Department of Scientific Computing
Faculty of Computer Science
University of Vienna
http://www.par.univie.ac.at
Parallel Computing / HPC
Programming Models and Languages
Compiler and Runtime Technologies
Programming Environments and Tools
Vienna Fortran, HPF, HPF+, Hybrid Programming, Multicore…
Grid/SOA/Cloud Computing
Parallel Application Services
On-demand supercomputing
Data Virtualization & Integration & Mining
Grid Miner, Vienna Grid Environment, Cloud, …
Service Oriented Architecture
- Compute Services
- Data Services
Virtualization
- HPC-Applications-as-a-Service
- Data-as-a-Service
Service Environment
- Service provisioning, deployment & hosting
Client Framework
- High-level client API; Workflow support
Vienna Grid Environment (VGE)
[Figure: VGE architecture. Labels: clients, registry, compute services, and data services in front of database systems (DBS).]
Capabilities
- Application provisioning
- Data provisioning
- Job handling
- Query handling
- Client APIs
- Security
Virtualization of heterogeneous data sources as services
Data Access Services
access to single data source
Data Mediation Services
integration of multiple data sources
via single virtual schema
Based on standards
- OGSA/DAI, OGSA/DQP
- SQL, XML
[Figure: clients and a registry in front of data access services (over DBS, CSV files and an XML DB) and data mediation services.]
@neuInfo Data Services
[Figure: a data service built on OGSA-DAI and GDMS. Labels: OGSA-DAI perform document, OGSA-DAI response document, GDMS mapping schema, GDMS transformation functions; relational, XML and CSV sources exposed as a virtual DB.]
Transparent access to multiple data sources
– Virtual global schema
– Data stays where it is; always live
– Schema, language & interface transparency
GDMS Mapping Schema
– Global-as-View query reformulation
– Different views of data
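The Global-as-View idea can be illustrated with a minimal sketch: each table of the global schema is defined as a view over source tables, and a query against the global table is unfolded into one sub-query per source. The service names (`das_geneva`, `das_sheffield`), table names and column names are hypothetical, and this is not the GDMS mapping schema format, only an illustration of the unfolding step.

```python
from dataclasses import dataclass


@dataclass
class SourceView:
    source: str    # hypothetical data access service name
    table: str     # table name at that source
    columns: dict  # global column name -> source column name


# Hypothetical GaV mapping: the global table "patient" is a view over two sources.
MAPPING = {
    "patient": [
        SourceView("das_geneva", "pat", {"id": "pat_id", "age": "age_years"}),
        SourceView("das_sheffield", "patients", {"id": "pid", "age": "age"}),
    ]
}


def unfold(global_table, columns, mapping=MAPPING):
    """Rewrite SELECT <columns> FROM <global_table> into one sub-query per source."""
    subqueries = []
    for view in mapping[global_table]:
        select_list = ", ".join(f"{view.columns[c]} AS {c}" for c in columns)
        subqueries.append((view.source, f"SELECT {select_list} FROM {view.table}"))
    return subqueries


for source, sql in unfold("patient", ["id", "age"]):
    print(source, "->", sql)
```

The union of the sub-query results would then form the answer to the global query; renaming source columns to global names in the `AS` clause is what gives clients schema transparency.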
GDMS Transformation Functions
– On-the-fly data transformation via user-defined Java methods
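A rough sketch of on-the-fly transformation follows. GDMS uses user-defined Java methods; the example is written in Python only for brevity, and the function names, column names and value encodings are assumptions made for illustration.

```python
def months_to_years(value):
    """Hypothetical transformation: a source stores age in months."""
    return value // 12


def normalize_sex(value):
    """Hypothetical transformation: map source-specific codes to one vocabulary."""
    return {"m": "male", "f": "female"}.get(str(value).strip().lower(), "unknown")


# Which transformation applies to which column of the global schema (assumed).
TRANSFORMS = {"age": months_to_years, "sex": normalize_sex}


def transform_rows(rows, transforms=TRANSFORMS):
    """Apply the registered functions row by row while results stream through."""
    for row in rows:
        yield {col: transforms.get(col, lambda v: v)(val) for col, val in row.items()}


source_rows = [{"id": 7, "age": 30, "sex": "F "}]
print(list(transform_rows(source_rows)))
```

Because `transform_rows` is a generator, rows are converted as they pass through rather than being copied into a separate store, which matches the "data stays where it is" principle.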
Data Mediation Services
Optimize complex queries using multiple evaluation services on different hosts.
Based on OGSA-DQP
GDMS generates query plan from query against global schema
DQP coordinator service distributes query plan onto evaluation services
Evaluation services execute parts of query plan in parallel.
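The coordinator/evaluator pattern described above can be sketched as follows. This does not use the OGSA-DQP API; threads stand in for evaluation services on different hosts, and the host names, partitions and the `size_mm` attribute are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions held by data services on three hosts.
PARTITIONS = {
    "host1": [{"id": 1, "size_mm": 4.0}, {"id": 2, "size_mm": 9.5}],
    "host2": [{"id": 3, "size_mm": 7.2}],
    "host3": [{"id": 4, "size_mm": 2.1}, {"id": 5, "size_mm": 8.0}],
}


def evaluate(host, predicate):
    """Stand-in for an evaluation service: scan and filter the local partition."""
    return [row for row in PARTITIONS[host] if predicate(row)]


def coordinate(predicate, hosts=("host1", "host2", "host3")):
    """Stand-in for the coordinator: run one plan fragment per host in parallel."""
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        partial_results = pool.map(lambda h: evaluate(h, predicate), hosts)
        # Merge the partial results in host order.
        return [row for rows in partial_results for row in rows]


large = coordinate(lambda row: row["size_mm"] > 7.0)
print([row["id"] for row in large])
```

Pushing the filter to each host means only matching rows travel back to the coordinator, which is the main reason to split a plan across evaluation services.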
Distributed Query Processing
[Figure: a data service (GDMS, OGSA-DQP coordinator) with OGSA-DAI evaluation services; relational and XML data services on Hosts 1 to 3.]
[Figure: clients accessing data services in front of EHR, PACS, PDD and PubMed sources.]
Approach
– Semantic Data Mediation
– Federation of Services
– CRIM, Ontology
– Security, Pseudonymization, …
Hospital information systems
– Sheffield, Geneva, Rotterdam, …
– EHR, PACS, …
Public databases
– Genetic: EBI, NCBI
– Literature: Medline
– etc.
Product design databases
– COTS stents, coils, etc.
@neurIST Data Integration Scenario
Defines all information to be captured for a patient
– clinical information (imaging, diagnostic and treatment data, …)
– administrative information
– research results produced (indicators)
Biomedical data infostructure – two different architectures
- ANO: CIS anonymized DB
- OTF: on-the-fly access to CIS
[Figure: federated biomedical info structure linking the treatment context (data capturing: imaging data, patient record; information acquisition & structuring) and the research context (application suites: processing & analysis, distributed queries, knowledge discovery; information access, analysis & enrichment). Labels between the two contexts: normalization, de-identification, denormalization, re-identification.]
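The slides do not say how de-identification and re-identification are implemented. The following is a minimal sketch under stated assumptions: a keyed hash (HMAC-SHA256) produces the pseudonym, and a lookup table that never leaves the hospital allows re-identification. The class name, field names (`patient_id`, `name`) and the key are all hypothetical.

```python
import hashlib
import hmac


class Pseudonymizer:
    """Illustrative only: keyed pseudonyms plus a hospital-side reverse table."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._reverse = {}  # pseudonym -> original id; stays inside the hospital

    def de_identify(self, record):
        pid = record["patient_id"]
        digest = hmac.new(self._key, pid.encode("utf-8"), hashlib.sha256)
        pseudonym = digest.hexdigest()[:16]
        self._reverse[pseudonym] = pid
        # Drop the directly identifying fields (assumed field names).
        cleaned = {k: v for k, v in record.items() if k not in ("patient_id", "name")}
        cleaned["pseudonym"] = pseudonym
        return cleaned

    def re_identify(self, pseudonym):
        return self._reverse[pseudonym]


p = Pseudonymizer(b"hospital-secret")
research_record = p.de_identify(
    {"patient_id": "GE-001", "name": "Jane Doe", "aneurysm_size_mm": 6.5}
)
print(sorted(research_record))
print(p.re_identify(research_record["pseudonym"]))
```

The same patient always maps to the same pseudonym under one key, so records from repeated visits can still be linked in the research context without exposing the identifier.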
@neurIST Testbed
Data (model) annotation by domain specialist (currently manually)
Data Provider Sites offering DBs behind an OGSA-DAI WS interface
Researchers/Applications exploit annotations to retrieve data through ontology concepts
[Figure labels: Annotations, Ontology]
@neurIST Ontology
– Global “schema” of the disease
– Implemented in OWL-DL
– Incorporates existing ontologies
» FMA (Foundational Model of Anatomy)
» GO (Gene Ontology), DOLCE as Upper Ontology
» Concepts mapped to UMLS (Unified Medical Language System)
Semantic support in @neurIST
– Semantic annotation of services
– Semantic broker (semantic service discovery)
– Semantic query resolver (reduce relational complexity)
– Semantic mediation between data sources (generation of mapping files)
UMLS Map: 1183
Relationship Types: 85
Classes: 2319
@neurIST Semantic Technologies
[Figure: a knowledge base (@neurIST domain ontology, supporting ontologies such as data source relations, semantic data source annotation) together with registered data services, the semantic broker and the semantic query resolver; interaction steps numbered 1 to 6.]
Goal: simplify access to distributed data sources utilizing ontology concepts
Semantic Broker: "What data to combine?"
Semantic Query Resolver: "How to combine?"
– reduces relational complexity
– (semi-)automatic generation of mapping schemes
SQR based on the UNITY framework by the University of British Columbia.
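The broker's question ("What data to combine?") can be illustrated with a small sketch: each registered data service is annotated with the ontology concepts it holds, and the broker selects the services that cover the requested concepts. The service names and concept names are hypothetical, and this is not the @neurIST broker or the UNITY framework, only an illustration of annotation-based discovery.

```python
# Hypothetical semantic annotations: service -> ontology concepts it can deliver.
ANNOTATIONS = {
    "clinical_das": {"Aneurysm", "Patient", "Treatment"},
    "imaging_das": {"Aneurysm", "ImagingStudy"},
    "genetics_das": {"Gene", "Patient"},
}


def broker(requested, annotations=ANNOTATIONS):
    """Return the services holding requested concepts and the concepts nobody covers."""
    requested = set(requested)
    selected = {
        service: sorted(concepts & requested)
        for service, concepts in annotations.items()
        if concepts & requested
    }
    covered = set().union(*selected.values())
    return selected, requested - covered


services, missing = broker({"ImagingStudy", "Gene"})
print(services)
print(missing)
```

A query resolver would then take the selected services and work out how to join them; that step is left out here because it depends on the relations recorded in the supporting ontologies.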
Semantic Query Resolver
Not fully realized within @neurIST!
[Figure: a data mediation service (DMS) with virtual schema and mapping, linked to an ontology, on top of three data access services (DAS), each with its own schema.]
Semantic Data Integration
Developments beyond @neurIST
Optimized Download Mechanisms for Data Services
- SOAP Attachments (standard mechanism)
- Data blocks (speed-up of up to 5x)
- via HTTP URL (speed-up of up to 10x)
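As an illustration of the data-block mechanism, the sketch below delivers a result in fixed-size blocks that the client reassembles. The function names and the block size are assumptions; the actual VGE transfer protocol is not described in the slides, and the sketch says nothing about the reported speed-ups.

```python
def iter_blocks(payload: bytes, block_size: int):
    """Service side (assumed): cut the result into fixed-size blocks."""
    for offset in range(0, len(payload), block_size):
        yield payload[offset:offset + block_size]


def download(payload: bytes, block_size: int = 4):
    """Client side (assumed): pull blocks one at a time and reassemble them."""
    received = bytearray()
    block_count = 0
    for block in iter_blocks(payload, block_size):
        received.extend(block)
        block_count += 1
    return bytes(received), block_count


data, blocks = download(b"0123456789", block_size=4)
print(data, blocks)
```

Block-wise transfer lets the client start processing before the whole result is available and keeps memory use bounded on both sides.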
Support for Cloud Computing
- deployment of compute services and data services within Cloud
- Ubuntu, Eucalyptus
Workflow Services
- Based on WEEP workflow engine; WS-BPEL v. 2.0 compliant
Large-Scale Data services
- based on Hadoop HDFS; Map/Reduce framework
- installation on a 64-core cluster at Vienna
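The Map/Reduce programming model can be sketched in a few lines of plain Python. This does not use the Hadoop API; the record layout and the site names are taken only as illustrative values, and the example computes a mean per site.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run the map, shuffle (group by key) and reduce phases in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical records: one measurement per row.
records = [
    {"site": "Geneva", "size_mm": 4.0},
    {"site": "Sheffield", "size_mm": 6.0},
    {"site": "Geneva", "size_mm": 8.0},
]

mean_by_site = map_reduce(
    records,
    mapper=lambda r: [(r["site"], r["size_mm"])],
    reducer=lambda key, values: sum(values) / len(values),
)
print(mean_by_site)
```

In a Hadoop deployment the map and reduce functions would run on many nodes over blocks stored in HDFS; the single-process version above only shows the shape of the computation.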