DTI Image Processing Pipeline and Cloud Computing Environment

(1)

DTI Image Processing Pipeline and Cloud Computing Environment

Kyle Chard

Computation Institute

(2)

Introduction

DTI image analysis requires the use of many tools
– QC, Registration, ROI Marking, Fiber Tracking, ...
Constructing analyses is challenging
– Data & tool discovery, selection, orchestration, ...
We have made huge strides in terms of data
– Data formats, repositories, protocols, metadata, CDEs
We now need infrastructure to reduce the barriers that exist between data providers, tool developers, researchers, and clinicians
– Big Science. Small Labs
o We have exceptional infrastructure for the 1%, what about the 99%?

(3)

Common Approach to Analysis

[Figure: diagram with the labels Modify, (Re)Run, Script, Install Camino]

(4)

How can we improve?

We need a platform where users can easily construct and execute analyses
– Using best-of-breed tools and pipelines
– Abstracting low-level infrastructure and platform heterogeneity
– Supporting automation and parallelism
– Supporting experimentation
=> Make existing tools and common analyses mundane building blocks

(5)

DTI Metric Reproducibility Pipeline

Ultimate Goal: Investigate the feasibility of using DTI in clinical practice
Automatic calculation of DTI metrics (FA, MD) from 48 automatically generated ROIs
– Using existing tools to create a reusable analysis workflow that can be easily repeated
– Investigate the ability to scale analyses over large datasets
Explore the reproducibility over a group of 20

(6)

DTI Processing Pipeline (1)

[Figure: pipeline diagram]
Inputs: DTI, T1, BVEC & BVAL, Template, Atlas Mask
1. ECC DTI (FSL)
2. BET DTI (FSL)
3. BET T1 (FSL)
4. Linear Registration DTI/T1 (FSL FLIRT)
5. DTI Fitting (FSL/Camino)
6. Linear Registration T1/Template (FSL FLIRT)
7. Non-linear Registration T1/Template (FSL FNIRT)
9. Transform FA/MD to MNI space (FSL Applywarp)
8. Calculate ROI Mean FA/MD (AFNI 3dmaskave)

(7)

DTI Processing Pipeline (2)

[Figure: pipeline diagram]
Inputs: DTI, BVEC & BVAL, FA Template, Atlas Mask
1. ECC DTI (FSL)
2. BET DTI (FSL)
3. DTI Fitting (FSL/Camino), producing an FA image and an MD image
Linear Registration, FA (FSL FLIRT)
Non-linear Registration, FA (FSL FNIRT)
Apply Warp coefficient, producing FA in MNI space and MD in MNI space
6. Calculate ROI Mean
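The stages of the second pipeline can be sketched as an ordered list of command lines. This is a minimal illustration, not the speaker's implementation: the function name, every file name, and the choice to register FA to the template are assumptions. It only builds the commands and runs nothing. It also skips details a real run may need, such as extracting a b=0 volume before brain extraction.

```python
from typing import List, Sequence


def build_dti_commands(subject: str, template: str, atlas_mask: str,
                       rois: Sequence[int]) -> List[List[str]]:
    """Build (but do not run) one argv list per pipeline stage, in order.

    Every file name below is a hypothetical placeholder.
    """
    dti = f"{subject}_dti"            # raw 4D diffusion image
    ecc = f"{subject}_ecc"            # eddy-current corrected image
    brain = f"{subject}_brain"        # bet -m also writes <brain>_mask
    fit = f"{subject}_dtifit"         # dtifit writes <fit>_FA and <fit>_MD
    affine = f"{subject}_fa2template.mat"
    warp = f"{subject}_warpcoef"

    commands = [
        # 1. ECC DTI (FSL): correct against volume 0
        ["eddy_correct", dti, ecc, "0"],
        # 2. BET DTI (FSL): brain extraction, also writing a binary mask
        ["bet", ecc, brain, "-m"],
        # 3. DTI Fitting (FSL)
        ["dtifit", "-k", ecc, "-o", fit, "-m", f"{brain}_mask",
         "-r", f"{subject}.bvec", "-b", f"{subject}.bval"],
        # Linear registration of FA to the template (FSL FLIRT)
        ["flirt", "-in", f"{fit}_FA", "-ref", template, "-omat", affine],
        # Non-linear registration, writing warp coefficients (FSL FNIRT)
        ["fnirt", f"--in={fit}_FA", f"--ref={template}",
         f"--aff={affine}", f"--cout={warp}"],
    ]
    for metric in ("FA", "MD"):
        mni = f"{subject}_{metric}_mni"
        # Apply the warp to bring each metric image into template space
        commands.append(["applywarp", f"--in={fit}_{metric}", f"--ref={template}",
                         f"--warp={warp}", f"--out={mni}"])
        for roi in rois:
            # Mean of the metric inside one atlas label (AFNI 3dmaskave);
            # the .nii.gz suffix assumes FSL's default NIFTI_GZ output type
            commands.append(["3dmaskave", "-quiet", "-mask", atlas_mask,
                             "-mrange", str(roi), str(roi), f"{mni}.nii.gz"])
    return commands
```

On a machine with FSL and AFNI installed, each entry could be passed to `subprocess.run(cmd, check=True)` in order. For the 48 ROIs mentioned earlier, `rois` would be `range(1, 49)`, assuming the atlas labels are numbered 1 to 48.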

(8)

Approaches for Implementing Pipelines

Scripts
• Bash scripts written to execute tools on a single computer
• Time consuming, error prone, hard to transfer knowledge
• Little support for parallelization

XNAT Pipeline Engine
• Defined by code (XML + scripts)
• Overhead to include tools, develop interfaces and create pipelines
• Difficult to change tools/pipelines
• Some support for parallelization

Globus Genomics
• SaaS for genomics
• Graphical interface for creation and execution
• Supports on-demand provisioning based on pricing policies
• Tools installed dynamically when required

(9)

DTI Pipeline Platform

[Figure: architecture diagram with the components Globus Endpoints, Globus Transfer, Galaxy & Manager, Shared File System, Dynamic Scheduler (Condor), Dynamic Worker Pool]

(10)

DTI Pipelines in the Cloud

[Figure: deployment diagram with the labels Gluster, NFS, Camino, GridFTP, Condor Schedule]

(11)
(12)

Cloud Computing

• Leverages economies of scale to facilitate utility models
• Pay only for resources used
• 1 * 100 hours == 100 * 1 hour
• On-demand and elastic access to "unlimited" capacity
• Addresses fluctuating requirements
• Software as a Service
• Web access to data through defined interfaces
• Platform as a Service
– No management of hardware or

(13)

Challenges Moving to the Cloud

Resource Selection: Comparing price, capabilities, performance, instance types (EBS, Instance store), tool performance
Tool Selection and Management: Finding tools, installing, configuring and using them in different environments
Analysis/Resource Management: Developing structured and repeatable analyses with different tools
Data transfer: Moving large amounts of data in/out of Cloud environment reliably and efficiently
Scale and Parallelism: Scaling analyses by efficiently parallelizing across elastic infrastructure
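The scale and parallelism point can be illustrated with a small sketch. It assumes each subject's analysis is independent of the others, and it uses a local thread pool as a stand-in for the Condor scheduler and dynamic worker pool described in this talk. The names `run_subjects` and `fake_analysis` are hypothetical, and `fake_analysis` only returns a dummy value.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_subjects(subjects: Iterable[str],
                 analyze: Callable[[str], float],
                 max_workers: int = 4) -> Dict[str, float]:
    """Apply `analyze` to every subject concurrently and key results by subject."""
    subject_list = list(subjects)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as its inputs
        results = list(pool.map(analyze, subject_list))
    return dict(zip(subject_list, results))


def fake_analysis(subject: str) -> float:
    """Hypothetical stand-in for one subject's pipeline run."""
    return float(len(subject))


if __name__ == "__main__":
    print(run_subjects(["sub01", "sub02", "sub03"], fake_analysis))
```

Threads are a reasonable fit when each task mostly waits on an external tool process. Spreading the same map over many machines is the job a batch scheduler takes on.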

(14)

Amazon EC2 Pricing

            System Specifications            Pricing
Instance    CPU Units  CPU Cores  Memory     On-Demand  Spot (Low)  Spot (High)
m1.large    4          2          7.5        0.24       0.026       5.5
m1.xlarge   8          4          15         0.48       0.052       0.64
m3.xlarge   13         4          15         0.5        0.058       0.058
m3.2xlarge  26         8          30         1          0.0115      0.115
m2.xlarge   6.5        2          17.1       0.41       0.035       0.36
m2.2xlarge  13         4          34.2       0.82       0.07        3
m2.4xlarge  26         8          68.4       1.64       0.14        0.14
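The "1 * 100 hours == 100 * 1 hour" point from the Cloud Computing slide can be checked against these prices. The sketch below assumes the prices are in US dollars per instance-hour, which the slide does not state, and uses a flat hourly rate. The name `run_cost` is hypothetical. It ignores startup overhead and changes in the spot price during a run.

```python
import math

# On-Demand and Spot (Low) prices copied from the pricing table.
# Assumption: the unit is USD per instance-hour.
PRICES = {
    "m1.large":   {"on_demand": 0.24, "spot_low": 0.026},
    "m1.xlarge":  {"on_demand": 0.48, "spot_low": 0.052},
    "m2.4xlarge": {"on_demand": 1.64, "spot_low": 0.14},
}


def run_cost(instance_type: str, market: str, instances: int, hours: float) -> float:
    """Cost of running `instances` machines for `hours` each at a flat hourly price."""
    return PRICES[instance_type][market] * instances * hours


# One machine for 100 hours versus 100 machines for one hour
serial = run_cost("m1.large", "on_demand", instances=1, hours=100)
parallel = run_cost("m1.large", "on_demand", instances=100, hours=1)
print(math.isclose(serial, parallel))

# The same 100 instance-hours at the lowest spot price in the table
spot = run_cost("m1.large", "spot_low", instances=100, hours=1)
print(spot < parallel)
```

Under these assumptions the cost is the same either way, so the parallel run finishes sooner at no extra charge, and the spot market lowers the bill further while prices stay near the low end.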

(15)
(16)

Instance Performance and Pricing

[Figure: chart comparing EBS and Instance Store; axes: Time (Minutes) and Cost per Subject ($); series: On-Demand, Spot (Low), Spot (High)]

(17)

Pricing - Multiple Analyses Per Node

[Figure: chart of Cost per Subject ($); series: On-Demand, Spot (Low), Spot (High)]

(18)

Elastic Startup Cost

[Figure: chart of Time from 0:00:00 to 1:15:00, broken down into Spot Price Queue, Contextualize, ECC & Registration, Tensor Fitting, ROI Calculation]

(19)

Data Transfer with Globus Online

• Reliable file transfer, sharing, syncing.
– Easy "fire and forget" file transfers
– Automatic fault recovery
– High performance
– Across multiple security domains
• In place sharing of files with users and groups
• No IT required.
– Software as a Service (SaaS)
o No client software installation
o New features automatically available

(20)
(21)

Summary

Structured pipelines simplify creation, execution and sharing of complex analyses
– Hosted as a service can further reduce barriers
By outsourcing pipeline execution on the Cloud we can reduce overhead and costs

– Previously we took weeks to process ~100 scans

o Using this approach < 5 cents a subject ($5 for 1 hour)

What's next?

– Can we deliver this as a service? 

o Billing, security, paradigm shift, interactive tools …

(22)

Acknowledgements

Mike Vannier, Xia Jiang, Farid Dahi
Globus Online
– Ian Foster, Steve Tuecke, Rachana Ananthakrishnan
Globus Genomics
