
S. Benkner, Department of Scientific Computing, University of Vienna. @neurist-NeuroLOG Workshop, Paris, May 18, 2010

Data Services

@neurIST and beyond

Siegfried Benkner

Department of Scientific Computing

Faculty of Computer Science

University of Vienna

http://www.par.univie.ac.at

Parallel Computing / HPC

Programming Models and Languages

Compiler and Runtime Technologies

Programming Environments and Tools

Vienna Fortran, HPF, HPF+, Hybrid Programming, Multicore…

Grid/SOA/Cloud Computing

Parallel Application Services

On-demand supercomputing

Data Virtualization & Integration & Mining

Grid Miner, Vienna Grid Environment, Cloud, …


Service Oriented Architecture

- Compute Services

- Data Services

Virtualization

- HPC-Applications-as-a-Service

- Data-as-a-Service

Service Environment

- Service provisioning, deployment & hosting

Client Framework

- High-level client API; Workflow support

Vienna Grid Environment (VGE)

[Figure: VGE architecture. Labels: clients, registry, compute services, data services, database systems (DBS)]

Capabilities

- Application provisioning

- Data provisioning

- Job handling

- Query handling

- Client APIs

- Security

Virtualization of heterogeneous data sources as services

Data Access Services

access to single data source

Data Mediation Services

integration of multiple data sources via a single virtual schema

Based on standards

- OGSA-DAI, OGSA-DQP

- SQL, XML

[Figure: data services. Labels: clients, registry, data mediation services, data access services, underlying sources (DBS, CSV file, XML DB)]

@neuInfo Data Services


[Figure: data mediation service stack. Labels: GDMS transformation functions, OGSA-DAI perform document, OGSA-DAI response document, GDMS mapping schema, data service (GDMS on OGSA-DAI), relational / XML / CSV sources, virtual DB]

Transparent access to multiple data sources

Virtual global schema

– Data stays where it is; always live

– Schema, language & interface transparency

GDMS Mapping Schema

– Global-as-View query reformulation

– Different views of data

GDMS Transformation Functions

– On-the-fly data transformation via user-defined Java methods
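The Global-as-View idea above can be sketched in a few lines: each global relation is defined as a set of queries over the local sources, and a query against the global schema is unfolded into those source queries. This is a minimal illustration, not the GDMS implementation. The table names, the `GLOBAL_VIEWS` mapping, and the `years_between` function are all hypothetical. SQLite stands in for the distributed sources, and a Python function stands in for a user-defined Java transformation method.

```python
import sqlite3

# Two hypothetical local sources with different schemas (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site_a_patients (pid TEXT, age_years INTEGER);
    CREATE TABLE site_b_cases (case_id TEXT, birth_year INTEGER, exam_year INTEGER);
    INSERT INTO site_a_patients VALUES ('A1', 54), ('A2', 61);
    INSERT INTO site_b_cases VALUES ('B1', 1950, 2009);
""")

def years_between(start, end):
    """Stand-in for a user-defined, on-the-fly transformation function."""
    return end - start

conn.create_function("years_between", 2, years_between)

# Hypothetical mapping schema: each global relation is a view over the sources.
GLOBAL_VIEWS = {
    "global_patient": [
        "SELECT pid AS patient_id, age_years AS age FROM site_a_patients",
        "SELECT case_id AS patient_id, "
        "years_between(birth_year, exam_year) AS age FROM site_b_cases",
    ]
}

def unfold(global_relation, where):
    """Reformulate a selection on a global relation into source-level SQL."""
    union = " UNION ALL ".join(GLOBAL_VIEWS[global_relation])
    return f"SELECT * FROM ({union}) WHERE {where} ORDER BY patient_id"

rows = conn.execute(unfold("global_patient", "age > 55")).fetchall()
print(rows)
```

The client only ever names `global_patient`. The source tables and the age conversion stay hidden behind the mapping.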

Data Mediation Services

Optimize complex queries using multiple evaluation services on different hosts (based on OGSA-DQP).

– GDMS generates a query plan from a query against the global schema.

– The DQP coordinator service distributes the query plan onto evaluation services.

– Evaluation services execute parts of the query plan in parallel.
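The coordinator/evaluator pattern described above can be imitated in-process. The sketch below is not OGSA-DQP code: the partition data, the plan format, and the function names `make_plan`, `evaluate`, and `coordinate` are assumptions chosen for illustration, and threads stand in for evaluation services on separate hosts.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data partitions held by three evaluation services.
PARTITIONS = {
    "host1": [("p1", 4.2), ("p2", 7.9)],
    "host2": [("p3", 6.1)],
    "host3": [("p4", 9.3), ("p5", 2.0)],
}

def make_plan(threshold):
    """Split a global selection into one sub-plan per host."""
    return [{"host": host, "threshold": threshold} for host in PARTITIONS]

def evaluate(subplan):
    """Evaluation service: run its part of the plan on its local data."""
    rows = PARTITIONS[subplan["host"]]
    return [row for row in rows if row[1] > subplan["threshold"]]

def coordinate(plan):
    """Coordinator: dispatch sub-plans in parallel and merge partial results."""
    with ThreadPoolExecutor(max_workers=len(plan)) as pool:
        partials = list(pool.map(evaluate, plan))
    return sorted(row for part in partials for row in part)

result = coordinate(make_plan(5.0))
print(result)
```

Each sub-plan touches only its own partition, so the merge step is a simple concatenation. A real plan would also contain joins and data movement between evaluators.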

Distributed Query Processing

[Figure: distributed query processing across three hosts. Labels: relational data service (host 1); XML data service (host 2); relational data service and evaluation service (host 3); data service with GDMS, OGSA-DQP coordinator, OGSA-DAI, and evaluation service]


[Figure: clients accessing data services. Source labels: EHR, PACS, PDD, PUBMED]

Approach

– Semantic Data Mediation

– Federation of Services

– CRIM, Ontology

– Security, Pseudonymization, …

Hospital information systems

– Sheffield, Geneva, Rotterdam, …

– EHR, PACS, …

Public databases

– Genetic: EBI, NCBI

– Literature: Medline, etc.

Product design databases

– COTS stents, coils, etc.

@neurIST Data Integration Scenario

Defines all information to be captured for a patient

– clinical information (imaging, diagnostic and treatment data, …)

– administrative information

– research results produced (indicators)

Biomedical data infostructure – two different architectures

- ANO: CIS → anonymized DB

- OTF: on-the-fly access to CIS
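The slides name pseudonymization and de-/re-identification but do not describe the mechanism. One common approach, shown here purely as an assumption, is a keyed hash for de-identification plus a lookup table that never leaves the hospital site for re-identification. The class name, key, and pseudonym length are hypothetical.

```python
import hashlib
import hmac

class Pseudonymizer:
    """Illustrative keyed-hash pseudonymizer (not the @neurIST mechanism)."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        # pseudonym -> original ID; assumed to stay inside the treatment context
        self._reverse = {}

    def de_identify(self, patient_id: str) -> str:
        """Map a patient ID to a stable pseudonym for the research context."""
        digest = hmac.new(self._key, patient_id.encode("utf-8"), hashlib.sha256)
        pseudonym = digest.hexdigest()[:16]
        self._reverse[pseudonym] = patient_id
        return pseudonym

    def re_identify(self, pseudonym: str) -> str:
        """Recover the original ID; only possible where the table is held."""
        return self._reverse[pseudonym]

site = Pseudonymizer(b"hospital-local-secret")
alias = site.de_identify("patient-0042")
print(alias != "patient-0042", site.re_identify(alias))
```

The same ID always yields the same pseudonym, so records from repeated exports can still be joined in the research context.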

[Figure: Federated Biomedical Info Structure between the treatment context and the research context. Labels: data capturing (imaging data, patient record); application suites (processing & analysis, distributed queries, knowledge discovery); normalization, de-identification; denormalization, re-identification; information acquisition & structuring; information access, analysis & enrichment]


@neurIST Testbed

Data (model) annotation by domain specialist (currently manual)

Data provider sites offering DBs behind an OGSA-DAI WS interface

Researchers/applications exploit annotations to retrieve data through ontology concepts
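Retrieval through ontology concepts can be pictured as a lookup from a concept to the annotated columns at each provider site. The sketch below is an assumption about the general shape only: the concept names, site names, tables, and the `queries_for` helper are hypothetical and do not reflect the actual @neurIST annotation format.

```python
# Hypothetical annotations: ontology concept -> (data service, table, column).
ANNOTATIONS = {
    "Aneurysm.size": [
        ("site-a", "aneurysm", "max_diameter_mm"),
        ("site-b", "lesion", "size_mm"),
    ],
    "Patient.sex": [
        ("site-a", "patient", "sex"),
    ],
}

def queries_for(concept):
    """Return one (service, SQL) pair per column annotated with the concept."""
    return [
        (service, f"SELECT {column} FROM {table}")
        for service, table, column in ANNOTATIONS.get(concept, [])
    ]

plan = queries_for("Aneurysm.size")
for service, sql in plan:
    print(service, "->", sql)
```

The researcher asks for a concept and never needs to know that the two sites name the column differently.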

Annotations

Ontology


@neurIST Ontology

– Global “schema” of the disease

– Implemented in OWL-DL

– Incorporates existing ontologies

» FMA (Foundational Model of Anatomy)

» GO (Gene Ontology), DOLCE as Upper Ontology

» Concepts mapped to UMLS (Unified Medical Language System)

Semantic support in @neurIST

– Semantic annotation of services

– Semantic broker (semantic service discovery)

– Semantic query resolver (reduce relational complexity)

– Semantic mediation between data sources (generation of mapping files)

[Figure: ontology statistics. Values as extracted: 1183, 85, 2319; labels as extracted: UMLS Map, Relationship Types, Classes]

@neurIST Semantic Technologies

[Figure: semantic technologies, numbered steps 1 to 6. Labels: knowledge base (supporting ontologies, e.g. data source relations; semantic data source annotation; @neurIST domain ontology), registered data services, semantic broker, semantic query resolver]

Goal: simplify access to distributed data sources utilizing ontology concepts

Semantic Broker: “What data to combine?”

Semantic Query Resolver: “How to combine?”

– reduces relational complexity

– (semi-)automatic generation of mapping schemes

SQR based on the UNITY framework by the University of British Columbia.

Semantic Query Resolver

Not fully realized within @neurIST!


[Figure: semantic data integration. Labels: data access services (DAS), each with its own schema; data mediation services (DMS) with virtual schema and mapping; ontology]

Semantic Data Integration

Developments beyond @neurIST

Optimized Download Mechanisms for Data Services

- SOAP Attachments (standard mechanism)

- Data blocks (speed-up up to 5X)

- via HTTP URL (speed-up up to 10X)

Support for Cloud Computing

- deployment of compute services and data services within Cloud

- Ubuntu, Eucalyptus

Workflow Services

- Based on WEEP workflow engine; WS-BPEL v. 2.0 compliant

Large-Scale Data Services

- based on Hadoop HDFS; Map/Reduce framework

- installation on 64-core cluster at Vienna


[Figure: cloud-enabled VGE stack. Labels: cloud/virtualization platform (Amazon EC2, Eucalyptus, Xen, KVM, …); VGE cloud image; VGE service environment (VGE, Apache Tomcat 6); VGE service; resource]

Resource types: HPC application, BPEL workflow, data source, Hadoop job

Data source types: virtual, relational, XML, files

Cloud-enabled VGE

Hadoop Installation on (virtual machine) cluster

Name Node

– Start Hadoop job

– Distribute Map and Reduce Tasks

Data Nodes

– Execute Map and/or Reduce tasks

HDFS file system

– Replicate Files

– Partition Files
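The division of labour listed above (partitioned input, map tasks, reduce tasks) can be imitated in plain Python. This sketch does not use the Hadoop API: the record layout and the function names `map_task`, `reduce_task`, and `run_job` are assumptions for illustration, and everything runs in one process instead of on name and data nodes.

```python
from collections import defaultdict

def map_task(record):
    """Map: emit (site, 1) for each input record."""
    site, _value = record
    yield (site, 1)

def reduce_task(key, values):
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)

def run_job(partitions):
    """Run map over every partition, group by key, then reduce each group."""
    groups = defaultdict(list)
    for partition in partitions:          # each partition mimics one file block
        for record in partition:
            for key, value in map_task(record):
                groups[key].append(value)
    return dict(reduce_task(key, values) for key, values in sorted(groups.items()))

# Hypothetical input, already split into two partitions.
partitions = [
    [("site_a", 3.1), ("site_b", 4.0)],
    [("site_a", 5.5)],
]
counts = run_job(partitions)
print(counts)
```

On a cluster, each partition would be processed by a map task on the node holding that block, and the grouping step would be the shuffle between map and reduce.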
