• No results found

Grille de calcul pour le biomédical Journée BioSTIC, 10 juillet 2008

N/A
N/A
Protected

Academic year: 2021

Share "Grille de calcul pour le biomédical Journée BioSTIC, 10 juillet 2008"

Copied!
43
0
0

Loading.... (view fulltext now)

Full text

(1)

EGEE-II INFSO-RI-031688

Enabling Grids for E-sciencE

www.eu-egee.org

Grille de calcul pour le biomédical

Journée BioSTIC, 10 juillet 2008

Johan Montagnat Diane Lingrand

(2)

Enabling Grids for E-sciencE

EGEE Grid Infrastructure

Flagship European grid infrastructure project

Now in 3rd phase with more than 100 partners Size of the infrastructure today: > 250 sites in 48 countries

> 70 000 CPU cores

~ 5 PB disk + tape MSS

> 150 000 jobs/day

(3)

DICOM Image Management, J. Montagnat, HealthGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 3

Middleware components

Computing Resources Storage Resources

Site X

Logging, real time monitoring

Workload Management Sites Resources Information Service Dynamic evolution Data Management DataSets info Author. &Authen. queries queries User requests Resour

ces allocation Publication resour

ces info

(4)

Enabling Grids for E-sciencE

gLite middleware

Complex middleware stack

– On top of Globus Toolkit 2, VDT, CONDOR, batch schedulers...

Fundational services

– User authentication and authorization

– Information Index: registration of Computing Elements (CEs), Storage Elements (SEs), etc.

– Workload Management System: workload distribution over sites

– Data Management System: virtual file hierarchy, standard (SRM) interface to storage resources

Batch oriented

– EGEE is a federation of computing centers

(5)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

Why grids for medical imaging?

Sharing computing resources and algorithms

– Research (populations studies, models design, validation, statistics)

– Complex analysis (compute intensive image processing, time constraints...) S e q 1 > d c s c d s s d c s d c d s c bscdsbcbjbfvbfvbvfbvbvbhvb hsvbhdvbhfdbvfd S e q 2 > b v d f v f d v h b d f v b bhvdsvbhvbhdvrefghefgdscg dfgcsdycgdkcsqkc S e q n > b v d f v f d v h b d f v b bhvdsvbhvbhdvrefghefgdscg dfgcsdycgdkcsqkchdsqhfduh dhdhqedezhhezldhezhfehfle zfzejfv Data Algorithms Procedures Computing power 4

(6)

Enabling Grids for E-sciencE

Building on Production Grids

Resources

Communication layer (GEANT, Internet...) (low-level) Middleware

Applications

Production

grid

infrastructure level Resources Resources Resources Resources

Applications Applications Applications

Application-level services

(high-level middleware) Application-level services(high-level middleware)

Applications

(7)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 7

The

Biomed

Virtual Organization

Fostering scientific communities

– EGEE is an international, multi-disciplinary research infrastructure

– Scientific communities are expanding beyond administrative boundaries

Virtual Organizations

– Authentication and Authorization management

– Share resources (computers, data, algorithms...) inside a VO

– Application areas identification and support unit

The Biomed VO now divided in 3 sectors

– Medical imaging

– Bioinformatics

(8)

Enabling Grids for E-sciencE

Biomed Virtual Organization

Size of the infrastructure today:

> 250 sites in 48 countries

> 70 000 CPU cores

~ 5 PB disk + tape MSS

> 150 000 jobs/day

Out of which, Biomed VO: > 9000 registered users

> 100 sites in 30 countries (170 CEs, 130 SEs)

~ 17 000 CPU

(9)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

Irregular use through time

Year 2007 statistics collected per month

(10)

Enabling Grids for E-sciencE

Total production in 2007

~800 CPU years (1000 SpecInt 2000 Normalised CPU time)

(11)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

Total production in 2007

(12)

Enabling Grids for E-sciencE

Computing resources provision

Computing hours per EGEE

(13)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

Life sciences compared to other

sciences

Biomed VO share

Biomed VO Number of jobs CPU time

(14)

Enabling Grids for E-sciencE

Building on Production Grids

Resources

Communication layer (GEANT, Internet...) (low-level) Middleware

Applications

Production

grid

infrastructure level Resources Resources Resources Resources

Applications Applications Applications

Application-level services

(high-level middleware) Application-level services(high-level middleware)

Applications

(15)

Application to life sciences, J. Montagnat, June 18, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

WISDOM In silico Drug Discovery

WISDOM:

http://wisdom.healthgrid.org/

Goal: find new drugs for neglected and emerging

diseases

– Neglected diseases lack R&D

– Emerging diseases require very rapid response time

Need for an optimized environment

– To achieve production in a limited time

– To optimize performances

Method: grid-enabled virtual docking

– Cheaper than in vitro tests

– Faster than in vitro tests

(16)

Enabling Grids for E-sciencE

High Throughput Virtual Docking

Chemical compounds : Chembridge – 500,000 Drug like – 500,000 Targets : Plasmepsin II (1lee, 1lf2, 1lf3)‏ Plasmepsin IV (1ls5)‏ Millions of chemical compounds available in laboratories

High Throughput Screening

1-10$/compound, nearly impossible

Molecular docking (FlexX, Autodock)‏

~80 CPU years, 1 TB data

Computational data challenge

~6 weeks on ~1000/1600 computers

Hits screening using assays performed on living cells

Chemical compounds : ZINC

Molecular docking : FlexX, Autodock

Targets structures : PDB

Grid infrastructure : EGEE

Leads

Clinical testing Drug

(17)

Healthgrid … – March 8th, 2007 – V. Breton

Enabling Grids for E-sciencE

INFSO-RI-508833 17

The new WISDOM architecture

Credit: J. Salzeman, V. Bloch

Major new features:

- improved data management - secure storage

(18)

Enabling Grids for E-sciencE

WISDOM-II: A huge international

effort

1%2% 2% 3% 3% 3% 3% 5% 6% 7% 12% 15% 39%

EGEE Germany Switzerland EGEE Asia Pacific

EGEE Russia Auvergrid EuChinaGrid EELA

EGEE South Western Europe EGEE Central Europe EGEE Northern Europe EGEE Italy

EGEE South Eastern Europe EGEE France

EGEE UKI

Significant contributions from several International grid infrastructures

Over 420 CPU years in 10 weeks

(19)

Application to life sciences, J. Montagnat, June 18, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 19

The grid added value

Univ. Los Andes: Biological targets,

Malaria biology

LPC Clermont-Ferrand:

Biomedical grid Knowledge extraction,SCAI Fraunhofer: Chemoinformatics Univ. Modena: Biological targets, Molecular Dynamics ITB CNR: Bioinformatics, Molecular modelling Univ. Pretoria: Bioinformatics, Malaria biology Academica Sinica: Grid user interface

Biological targets In vitro testing HealthGrid:

Biomedical grid, Dissemination

CEA, Acamba project: Biological targets,

Chemogenomics

Chonnam nat. univ.: In vitro testing New

• The grid provides the centuries of CPU cycles required on demand

• The grid provides the reliable and secure data management services to store and replicate the biochemical inputs and outputs

• The grid offers a collaborative environment for the sharing of data in the research community on avian flu and malaria

(20)

Enabling Grids for E-sciencE

Statistics of deployment

First Data Challenge: July 1st - August 15th 2005

– Target: malaria

– 80 CPU years

– 1 TB of data produced

– 1700 CPUs used in parallel

– 1st large scale docking deployment world-wide on a e-infrastructure

Second Data Challenge: April 15th - June 30th 2006

– Target: avian flu

– 100 CPU years

– 800 GB of data produced

– 1700 CPUs used in parallel

– Collaboration initiated on March 1st: deployment preparation achieved in 45 days

Third Data Challenge: October 1st - 15th December 2006

– Target: malaria

– 400 CPU years

– 1,6 TB of data produced

– Up to 5000 CPUs used in parallel

(21)

Application to life sciences, J. Montagnat, June 18, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 21

GPSA: Bioinformatics Grid Portal

Scientific objectives

–Molecular Bioinformatics: protein sequence analysis

–Analyse data from high-throughput Biology: complete genome projects, EST, complete proteomes, structural biology, ….

–Integration of biological data and tools

Method

–Provide Biologists with an usual Web interface: NPS@

 NPS@ Web portal online since 1998

 46 tools & 12 updated databases

 + 10,000,000 jobs & 5,000 jobs/day

–Ease the access to updated databases and algorithms.

–Protein databases are stored on grid storage as flat files.

–Legacy bioinformatics applications

 Wrapping usual binary in grid environment

 transparent remote access with local

filesystem

–Display results in graphical Web interface.

Status: Production

Contact: Christophe.Blanchet@ibcp.fr

(22)

Enabling Grids for E-sciencE • Grid-enabled bioinformatics resources –9 algorithms –3 protein databases • Bioinformatics descriptors

–XML framework, Web portal,

Web Services

Encryption system: EncFile

–Security: AES, Key sharing,

M-of-N

Transparent access

to grid files: Parrot

http://gpsa-pbil.ibcp.fr

(23)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 23

EGEE Medical imaging sector

Pharmacokinetics

– UPV (University Polytechnic of Valencia)

GATE

– LPC (CNRS – Clermont Ferrand)

SiMRI3D, ThIS, CAVIAR

– CREATIS (CNRS - Lyon)

gPTM3D

– LRI-LAL-LIMSI (CNRS - Orsay)

Bronze Standard

– I3S (CNRS – Sophia), INRIA

SPM Alzheimer's

– U. of Genova / DEST

(24)

Enabling Grids for E-sciencE

Pharmacokinetics:

contrast agent diffusion

Scientific objectives

– Study of contrast agent diffusion

Computations

– Image sequences co-registration (independent computations)

– Parallel contrast agent diffusion modeling

Features

– Application portal for use in clinical environment

– Enable on-line analysis of exams acquired daily  Examination time ~15 minutes

(25)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 25

SiMRI3D: MRI simulator

Scientific objectives

– Better understand MR physics.

– Study MR sequences in-silico.

– Study MR artefacts.

– Validate MR Image processing algorithms on synthetic

– yet realistic images.

Method

– Simulate Bloch's electromagnetism equations.

– Parallel (MPI) implementation to speed-up computations.

10243 3D image simulation:

~100 CPU years.

(26)

Enabling Grids for E-sciencE

Open GATE collaboration

Scientific objectives

Geant4 Application for Tomographic Emission

Medical physics applications: PET camera simulation, radiotherapy, occular brachytherapy treatment

http://www.opengatecollaboration.org

Method

GEANT4 base software to model physics of nuclear medicine.

Use Monte Carlo simulation to

improve accuracy of computations (as compared to the deterministic classical approach)‏

(27)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 27

GATE application to radiotherapy

Scanner slices:

DICOM or DICOMRT

format

(28)

Enabling Grids for E-sciencE

gPTM3D: radiology analysis

Scientific objectives

– User-assisted reconstruction of 3D anatomical structures

– Quantification (e.g. volume measurements)

– Use models for augmented reality

Computations

– Parallleization of segmentation processes

– Large number of short jobs

Features

– Highly interactive (almost real-time)‏

– Used in clinical environment

(29)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Software technologies for integration of process, data and knowledge in medical imaging

NeuroLOG ANR-06-TLOG-024 29

Use case: multiple sclerosis

Segmentation

Classification

Multimodal MRI 4 classes: - White matter - Gray matter - CSF - lesions Non-uniformity correction Intensity normalisation Preprocessing Stereotaxic registration Resampling

Registration and resampling

Brain segmentation Classification Segmentation MR Images collection (T1, T2, PD) Quantification Statistical tests Analysis

(30)

Software technologies for integration of process, data and knowledge in medical imaging

Experiments requirements

Large databases

– 256 3D images (test database)

– 120 3D images (mono-site clinical database)

– 2 400 3D images (multi-site clinical trial, TBs of data)

Various computing tools

– 10 to 15 processing stages in the pipeline

Computing power

– > 2 months sequential execution time

Pipeline description and execution

– Workflow description

(31)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

Application to rigid registration

algorithms evaluation

Unregistered

Registered

O

1

O

2

(32)

Enabling Grids for E-sciencE A Params PFRegister Service GetFromEGEE Yasmina PFMatchICP CrestLines B Baladin

FormatConv GetFromEGEE GetFromEGEE GetFromEGEE FormatConv FormatConv FormatConv MultiTransfoTest Params Params Params Params Params WriteResults WriteResults WriteResults WriteResults Params MethodToTest

Bronze standard workflow

Params Params

~100 image pairs

~800 EGEE jobs

(33)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 33

Application specific requirements

“Embarassingly” parallel problem

Fast turn over of short jobs

Data confidentiality

Interactivity

Fine grain parallelism (MPI)

Workflow-based

(34)

Enabling Grids for E-sciencE

Building on Production Grids

Resources

Communication layer (GEANT, Internet...) (low-level) Middleware

Applications

Production

grid

infrastructure level Resources Resources Resources Resources

Applications Applications Applications

Application-level services

(high-level middleware) Application-level services(high-level middleware)

Applications

(35)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688

WAN medical image exchange

Cross-enterprise exchange of radiology reports and

images

Radiology Enterprise A Radiology Enterprise B Physician Office

Wide-area

Health

Information

Network

PACS B

PACS A

Radiology-to-radiology

Radiology-to-Physicians

34

(36)

Enabling Grids for E-sciencE

WAN medical image exchange

Grid technologies

Radiology Enterprise A Radiology Enterprise B Physician Office

Wide-area

Health

Information

Network

PACS B

PACS A

– Cross-enterprise authentication – Access control – Secured communications

(37)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 37

Medical Data Manager

Objectives

– Expose a standard grid interface (SRM) for medical image servers (DICOM)

– Use native DICOM storage format

– Fulfill medical applications security requirements

– Do not interfere with clinical practice

User Interfaces Worker Nodes

DICOM clients

DICOM Interface SRM

(38)

Enabling Grids for E-sciencE

Medical Data Registration

LCG File Catalog AMGA Metadata Hydra Key store GFAL API 1. Image is acquired

2. Image is stored in DICOM server 3. lcg client

3a. Image is registered (a GUID is associated) 3b. Image key

is produced and registered

4. image metadata

(39)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 39

Medical Data Retrieval

DICOM server SRM-DICOM interface LCG File Catalog AMGA Metadata User Interface Worker Node Hydra Key store 2. lcg client

3. get SURL from GUID 4. request file

5. get file key

6. on-the-fly encryption and anonimyzation return encrypted file

7. get file key and decrypt

file locally File ACL control

Metadata ACL control

Key ACL control

Anonimization & encryption

1. get GUID from metadata

GFAL

(40)

Grid Workflow Efficient Enactment for Data Intensive Applications

Application workflows representation and efficient

enactment on grids

(41)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Grid Workflow Efficient Enactment for Data Intensive Applications

GWENDIA ANR-06-MDCA-009 41

Workflow management

MOTEUR, I3S laboratory, CNRS

– http://egee1.unice.fr/MOTEUR

High level interface

– Hides grid complexity to the user

Service-based approach

– Legacy code service wrapper

Scufl language (myGrid / Taverna)

– Pure data flow approach

Grid submission interfaces

– EGEE (LCG2, gLite)

– Grid5000 (OAR, DIET)

Transparent parallelism exploitation

– Code and data parallelism

(42)

Grid Workflow Efficient Enactment for Data Intensive Applications

Efficient parallel execution

A workflow naturally provides application parallelization

MOTEUR transparently exploits 3 kinds of parallelism

Jobs grouping strategy in sequential branches

in order to reduce grid latency

S2 S3 S1

D

0

,

D

1 S2 S3 S1

D

0

,

D

1 S2 S3 S1

D

0

,

D

1 S2 S3 D1D0 D1 D0

Workflow parallelism Data parallelism Service parallelism

http://egee1.unice.fr/ MOTEUR

(43)

Medical image analysis on EGEE, J. Montagnat, BioGrid, June 2, 2008

Enabling Grids for E-sciencE

EGEE-II INFSO-RI-031688 43

Conclusions

Using the European grid infrastructure for production

– Regular usage in medical imaging community

– Sustained production since 2004

– Closer to clinical end-users

 RSNA'07 demonstration (EGEE-MEDICUS)  Molecular docking for drug discovery

Higher level services

– More data protection (Role-based access control)

 Considering multiple actors involved in clinical procedures (patient, doctor, surgeon, radiologist, nurse, anesthetist..)

 Data semantics

 Metadata associated to data, Ontologies, Content-based data identification, indexing, mining...

– User interfaces

References

Related documents

The end result is a better professional workflow: more accuracy, more readily available information and significant reduction in daily documentation chores that affect your quality

We have developed and tested a Bayesian estimation and prediction methodology for the general SSOE state space model and demonstrated our approach using the local level with constant

For the data set of this study, the volume (equally) weighted market impact relative to the opening price of the trading day is 81.25 bp (38.98 bp), more than twice as much as the

benefit tremendously from this information, we recommend Congress provide federal seed money for 

 Panelist, Western States Petroleum Association Educational Forum, Fall 1996 (mock assessment appeal—new construction, co-generation facility secondary recovery program to

• The model involves the  integration of the event‐ based structure (demand  generation) with a time‐ based controller •

The purpose of the study is to analyse the trends in the interaction between state companies (hereinafter – state-owned companies, in Russia, companies with

To ensure that police restrictions on such peaceful assemblies are only imposed on grounds that are legitimate and necessary under OSCE commitments and international