DTI Image Processing Pipeline and Cloud Computing Environment
Kyle Chard
Computation Institute
Introduction
• DTI image analysis requires the use of many tools
– QC, Registration, ROI Marking, Fiber Tracking, ..
• Constructing analyses is challenging
– Data & tool discovery, selection, orchestration, ..
• We have made huge strides in terms of data
– Data formats, repositories, protocols, metadata, CDEs
• We now need infrastructure to reduce the barriers that exist between data providers, tool developers, researchers, and clinicians
– Big Science. Small Labs
o We have exceptional infrastructure for the 1%, what about the 99%?
Common Approach to Analysis
[Figure: diagram labeled Install, Script, (Re)Run, Modify, with Camino as the tool shown]
How can we improve?
• We need a platform where users can easily construct and execute analyses
– Using best of breed tools and pipelines
– Abstracting low level infrastructure and platform heterogeneity
– Supporting automation and parallelism
– Supporting experimentation
=> Make existing tools and common analyses mundane building blocks
DTI Metric Reproducibility Pipeline
• Ultimate Goal: Investigate the feasibility of using DTI in clinical practice
• Automatic calculation of DTI metrics (FA, MD) from 48 automatically generated ROIs
– Using existing tools to create a reusable analysis workflow that can be easily repeated
– Investigate the ability to scale analyses over large datasets
• Explore the reproducibility over a group of 20
Processing
Pipeline
(1)
1 ECC DTI (FSL)
DTI T1
BVEC & BVAL
1. ECC DTI (FSL) 2. BET DTI (FSL)
3. BET T1 (FSL) ( )
4. Linear Registration DTI / T1
(FSL FLIRT) T1/T7. Linear Registration l t (FSL FLIRT) (FSL FLIRT)
5. DTI Fitting (FSL/Camino) 7. Non‐linear Registration
T1/Template (FSL FNIRT) Template T1/Template (FSL FLIRT) T1/Template (FSL FNIRT) 9. Transform FA/MD to MNI space (FSL Applywarp)p ( pp y p) 8. Calculate ROI Mean FA/MD (AFNI 3dmaskave) Atlas Mask ( ) Mask
DTI Processing Pipeline (2)
Inputs: DTI, BVEC & BVAL, Template, Atlas Mask
1. ECC DTI (FSL)
2. BET DTI (FSL)
3. DTI Fitting (FSL/Camino), producing an FA image and an MD image
– Linear Registration of FA (FSL FLIRT)
– Non‐linear Registration of FA to Template (FSL FNIRT), producing warp coefficients
– Apply Warp, producing FA in MNI space and MD in MNI space
6. Calculate ROI Mean (Atlas Mask)
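The steps of pipeline (2) can be sketched as a list of command lines for one subject. This is a minimal sketch rather than the pipeline used in the talk: the file names (dti.nii.gz, bvecs, bvals and so on), the function name and the choice of volume 0 as the eddy-correction reference are all assumptions, and the atlas is assumed to be a label image in which ROI i has voxel value i. In practice BET is often run on an extracted b0 volume; that detail is omitted here. The function only builds the commands and does not run them.

```python
from pathlib import Path


def build_pipeline_commands(subject_dir, template, atlas, n_rois=48):
    """Return one subject's command lines in execution order (nothing is run)."""
    d = Path(subject_dir)

    def p(name):
        # Hypothetical file layout: everything lives in the subject directory.
        return str(d / name)

    fa, md = p("dti_FA.nii.gz"), p("dti_MD.nii.gz")
    affine, warp = p("fa2tmpl.mat"), p("fa2tmpl_warp.nii.gz")
    cmds = [
        # 1. ECC: eddy-current correction against volume 0 (assumed reference)
        ["eddy_correct", p("dti.nii.gz"), p("dti_ecc.nii.gz"), "0"],
        # 2. BET: brain extraction; -m also writes dti_brain_mask.nii.gz
        ["bet", p("dti_ecc.nii.gz"), p("dti_brain.nii.gz"), "-m"],
        # 3. Tensor fitting; dtifit writes <out>_FA and <out>_MD among others
        ["dtifit", "-k", p("dti_ecc.nii.gz"), "-o", p("dti"),
         "-m", p("dti_brain_mask.nii.gz"), "-r", p("bvecs"), "-b", p("bvals")],
        # Linear registration of FA to the template (saves the affine matrix)
        ["flirt", "-in", fa, "-ref", template, "-omat", affine],
        # Non-linear registration started from the affine; writes warp coefficients
        ["fnirt", f"--in={fa}", f"--ref={template}",
         f"--aff={affine}", f"--cout={warp}"],
    ]
    mni = {"FA": p("FA_mni.nii.gz"), "MD": p("MD_mni.nii.gz")}
    # Apply the same warp to both metric images
    for src, metric in ((fa, "FA"), (md, "MD")):
        cmds.append(["applywarp", f"--in={src}", f"--ref={template}",
                     f"--warp={warp}", f"--out={mni[metric]}"])
    # Mean of each metric inside each labelled ROI of the atlas
    for metric in ("FA", "MD"):
        for label in range(1, n_rois + 1):
            cmds.append(["3dmaskave", "-quiet", "-mask", atlas,
                         "-mrange", str(label), str(label), mni[metric]])
    return cmds
```

Each entry can be passed to `subprocess.run(cmd, check=True)` on a machine with FSL and AFNI installed, or wrapped as a tool in a workflow system; keeping the commands as data makes the analysis easy to repeat and inspect.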
Approaches for Implementing Pipelines
Scripts
• Bash scripts written to execute tools on a single computer
• Time consuming, error prone, hard to transfer knowledge
• Difficult to change tools/pipelines
• Little support for parallelization
XNAT Pipeline Engine
• Defined by code (XML + scripts)
• Overhead to include tools, develop interfaces and create pipelines
• Some support for parallelization
Globus Genomics
• SaaS for genomics
• Graphical interface for creation and execution
• Supports on‐demand provisioning based on pricing policies
• Tools installed dynamically when required
DTI Pipeline Platform
[Figure: platform architecture with components labeled Globus Endpoints, Globus Transfer, Galaxy & Manager, Shared File System, Condor, Dynamic Scheduler, Dynamic Worker Pool]
DTI Pipelines in the Cloud
[Figure: deployment diagram with components labeled Gluster, NFS, Camino, GridFTP, Condor Schedule]
Cloud Computing
• Leverages economies of scale to facilitate utility models
• Pay only for resources used
• 1 * 100 hours == 100 * 1 hour
• On‐demand and elastic access to “unlimited” capacity
• Addresses fluctuating requirements
Software as a Service
• Web access to data through defined interfaces
• Platform as a Service
– No management of hardware or
Challenges Moving to the Cloud
• Resource Selection: Comparing price, capabilities, performance, instance types (EBS, Instance store), tool performance
• Tool Selection and Management: Finding tools, installing, configuring and using them in different environments
• Analysis/Resource Management: Developing structured and repeatable analyses with different tools
• Data transfer: Moving large amounts of data in/out of Cloud environment reliably and efficiently
• Scale and Parallelism: Scaling analyses by efficiently parallelizing across elastic infrastructure
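Because each subject's analysis is independent, the scale and parallelism challenge reduces to distributing subjects over workers. The sketch below illustrates only that idea on a single machine with a thread pool; it is not the Condor-based dynamic worker pool described in the talk, and `process_subject` is a hypothetical placeholder for one subject's full pipeline run.

```python
from concurrent.futures import ThreadPoolExecutor


def process_subject(subject_id):
    """Hypothetical placeholder for running the whole pipeline on one subject."""
    # A real implementation would launch the external tools here.
    return subject_id, "done"


def run_all(subject_ids, max_workers=4):
    """Run every subject's analysis, at most `max_workers` at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order; each result is a (subject_id, status) pair
        return dict(pool.map(process_subject, subject_ids))


if __name__ == "__main__":
    print(run_all([f"sub{i:02d}" for i in range(1, 21)]))
```

Threads are adequate here on the assumption that the heavy work happens in external processes; on elastic infrastructure the same per-subject unit of work would instead be submitted as separate jobs to a scheduler.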
Amazon EC2 Pricing
             System Specifications               Pricing ($/hour)
Instance     CPU Units  CPU Cores  Memory (GB)   On‐Demand  Spot (Low)  Spot (High)
m1.large     4          2          7.5           0.24       0.026       5.5
m1.xlarge    8          4          15            0.48       0.052       0.64
m3.xlarge    13         4          15            0.5        0.058       0.058
m3.2xlarge   26         8          30            1          0.0115      0.115
m2.xlarge    6.5        2          17.1          0.41       0.035       0.36
m2.2xlarge   13         4          34.2          0.82       0.07        3
m2.4xlarge   26         8          68.4          1.64       0.14        0.14
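The utility model noted earlier (1 * 100 hours == 100 * 1 hour) can be checked against the table with a few lines of arithmetic. This is a minimal sketch: the function name is hypothetical, the prices are the m1.large figures from the table, and the model assumes a constant hourly price with no startup or data transfer overhead.

```python
def total_cost(instances, hours_each, price_per_hour):
    """Total charge for running `instances` machines for `hours_each` hours."""
    return instances * hours_each * price_per_hour


# m1.large prices from the table above, in dollars per hour
M1_LARGE_ON_DEMAND = 0.24
M1_LARGE_SPOT_LOW = 0.026

serial = total_cost(1, 100, M1_LARGE_ON_DEMAND)    # one machine for 100 hours
parallel = total_cost(100, 1, M1_LARGE_ON_DEMAND)  # 100 machines for one hour
spot = total_cost(100, 1, M1_LARGE_SPOT_LOW)       # same work at the low spot price

if __name__ == "__main__":
    print(f"serial=${serial:.2f} parallel=${parallel:.2f} spot=${spot:.2f}")
```

Under these assumptions the serial and parallel runs cost the same, so parallelism buys a shorter time to results at no extra charge, while the gap between on-demand and spot prices shows why resource selection matters.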
Instance Performance and Pricing
[Figure: Time (Minutes) for EBS and Instance Store instances, and Cost per Subject ($) at On‐Demand, Spot (Low) and Spot (High) prices]
Pricing ‐ Multiple Analyses Per Node
[Figure: Cost per Subject ($) at On‐Demand, Spot (Low) and Spot (High) prices]
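The effect of running multiple analyses per node on cost per subject can be illustrated with a small model. It is a sketch under stated assumptions, not the method behind the chart: the function name is hypothetical, instances are assumed to be billed in whole hours, and concurrent analyses are assumed not to slow each other down.

```python
import math


def cost_per_subject(price_per_hour, minutes_per_subject, subjects, concurrent):
    """Cost per subject on one node running `concurrent` analyses at a time."""
    # Subjects are processed in batches of size `concurrent`
    batches = math.ceil(subjects / concurrent)
    wall_hours = batches * minutes_per_subject / 60
    # Assumption: partial hours are billed as full hours
    billed_hours = math.ceil(wall_hours)
    return billed_hours * price_per_hour / subjects


if __name__ == "__main__":
    # Hypothetical workload: 4 subjects, 45 minutes each, at $0.48 per hour
    for k in (1, 2, 4):
        print(k, round(cost_per_subject(0.48, 45, 4, k), 3))
```

In this model, packing more analyses onto a node lowers the cost per subject until the node's cores or memory become the limit, which the model does not capture.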
Elastic Startup Cost
[Figure: stacked Time (0:00:00 to 1:15:00) broken down into Queue, Spot Price, Contextualize, ECC & Registration, Tensor Fitting, ROI Calculation]
Data Transfer with Globus Online
• Reliable file transfer, sharing, syncing.
– Easy “fire and forget” file transfers
– Automatic fault recovery
– High performance
– Across multiple security domains
• In place sharing of files with users and groups
• No IT required.
– Software as a Service (SaaS)
o No client software installation
o New features automatically available
Summary
• Structured pipelines simplify creation, execution and sharing of complex analyses
– Hosting pipelines as a service can further reduce barriers
• By outsourcing pipeline execution on the Cloud we can reduce overhead and costs
– Previously we took weeks to process ~100 scans
o Using this approach < 5 cents a subject ($5 for 1 hour)
• What's next?
– Can we deliver this as a service?
o Billing, security, paradigm shift, interactive tools …
Acknowledgements
• Mike Vannier, Xia Jiang, Farid Dahi
• Globus Online
– Ian Foster, Steve Tuecke, Rachana Ananthakrishnan
•