Building Life Sciences & Genomics Data Workflows with AWS Storage Gateway
Stephen Litster, HPC HCLS Lead GTM, AWS
Michael Leonard, File Gateway Product Manager, AWS
Agenda
• Healthcare & Life Sciences Industry trends & workloads
• How Healthcare & Life Sciences customers are using AWS
• AWS Storage Gateway & clinical workflows
Life Sciences, Genomics, and Drug Discovery
Key Workloads
• Genomics
• Computational chemistry/M&S
• Informatics and ML
• Imaging
Key Industry Trends/Challenges
• Exponential data growth
• Secure, global collaboration platforms
• Inform research efforts with real world data
• Scientific reproducibility
Clinical development
Manufacturing & supply chain
Drug Discovery Pipeline
Challenges we’re hearing from life sciences executives
High volume lab workflows – typical challenges
[Diagram labels: Resource constraints; Skill set; Visualization latency; Application integration; Database/web operation; Latency; Application integration; Policies; Functionality; Data transfer from Lab to NAS; HPC/HPDA Job Scheduling; Enterprise Services; Archive (Object Based); Local I/O and/or network; Experiment Management; Data Analysis; Reporting; Inference]
Planning for disruptive technologies
• X-Ray Crystallography, 1998
• Genomic Sequencing, 2009
• Digital Pathology, 2015
• Light Sheet Microscopy, 2017
• Cryo-Electron Microscopy, 2018
• Value
• Veracity
• Variability
• Velocity
• Volume
The hybrid cloud model:
You have on-premises data and applications…
…that want to use storage and services in the cloud
AWS Storage Gateway overview
File Gateway
Store and access objects in Amazon S3 from file-based applications with local caching
Volume Gateway
Block storage on-premises backed by cloud storage with local caching, Amazon EBS snapshots, and clones, integrated with AWS Backup
Tape Gateway
Drop-in replacement for physical tape infrastructure backed by cloud storage with local caching
File-based applications
Hybrid cloud storage use cases
For all stages of your cloud adoption journey
Migration
Modernization
Continuous Reinvention
• Backup and archive data to AWS
• On-premises file shares backed by cloud storage
• Provide on-premises applications low-latency access to in-cloud data
Use cases
[Diagram labels: On premises; NFS/SMB; File Gateway; HTTPS; AWS Cloud; Amazon S3 lifecycle; Any S3 storage class]
Store and access objects in Amazon S3 from file-based applications with local caching
File Gateway
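The "lifecycle to any S3 storage class" step in this diagram is set on the S3 bucket rather than on the gateway. A minimal sketch of such a rule in Python, assuming boto3 is available for the apply step; the prefix, rule ID, and day thresholds are illustrative assumptions and do not come from the slides:

```python
def build_lifecycle_rules(prefix, ia_after_days=30, glacier_after_days=90):
    """Build an S3 lifecycle configuration for objects under `prefix`.

    The day thresholds are illustrative assumptions, not values from the talk.
    """
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be greater than ia_after_days")
    rule_id = "tier-" + (prefix.strip("/") or "all")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket, prefix):
    """Apply the rules to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above runs without boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_lifecycle_rules(prefix),
    )
```

Keeping the rule builder separate from the AWS call lets the configuration be reviewed or unit-tested before it is applied to a real bucket.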
Move database and file backups into the cloud and free up on-premises storage capacity
Features
• NFS/SMB protocol support, mount shares directly on database and application servers
• Files stored durably in Amazon S3, lifecycle to any S3 storage class
• Local cache up to 64 TB for accessing recent backups
• Windows ACL support to control access to backup files
Benefits
• Reduce on-premises storage for backups
• Easily integrates with SAP, SQL Server, Oracle, HDFS, and other applications
• Restore backups on-premises or in the cloud on EC2 or RDS
[Diagram labels: On premises; Database / Application Server; NFS/SMB; File Gateway; HTTPS; AWS Cloud; Amazon S3 lifecycle; Any S3 storage class]
Access virtually unlimited, highly durable cloud storage using common file protocols
Features
• Supports NFS and SMB protocols, no application changes required
• Files stored durably in Amazon S3, lifecycle to any S3 storage class
• SMB shares up to 64 TB integrate with Active Directory
• Amazon CloudWatch Events for automated workflows
Benefits
• Reduce costs by moving storage to Amazon S3 while still accessing from on-premises
• Virtually unlimited cloud storage, no more running out of capacity
• Eliminate expensive hardware refresh cycles
• Files stored as native S3 objects for further processing in AWS
[Diagram labels: On premises; Sequencers; NAS storage; NFS/SMB; File Gateway; HTTPS; AWS Cloud; Amazon S3 lifecycle; Any S3 storage class]
[Diagram labels: AWS Cloud; Sequencers; two on-premises File Gateways, each with NFS/SMB, HTTPS, and Cache refresh; AWS DataSync; AWS Snowball]
Access files quickly from distributed locations and scale capacity as needed
Features
• Generate data in-cloud or ingest from on-premises using AWS DataSync or AWS Snowball
• Up to 64 TB local cache per gateway
Benefits
• Fully-managed gateway cache provides low-latency access
• Access cloud storage from any on-premises location
• Process data in the cloud and refresh gateway cache for up-to-date results
Provide on-premises apps low-latency access to in-cloud data
In-cloud processing
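When objects are written to the bucket by in-cloud processing rather than through the gateway, the gateway's view of the share has to be refreshed before on-premises clients see the new results. A minimal sketch in Python using the Storage Gateway `RefreshCache` operation; the client is passed in so the function can be exercised without AWS, and the share ARN and folder names are hypothetical:

```python
def refresh_share_cache(sgw_client, file_share_arn, folders=("/",), recursive=True):
    """Ask a File Gateway to re-list `folders` of a file share from S3.

    `sgw_client` is expected to behave like boto3.client("storagegateway").
    Returns the notification ID reported by the service, if any.
    """
    response = sgw_client.refresh_cache(
        FileShareARN=file_share_arn,
        FolderList=list(folders),
        Recursive=recursive,
    )
    return response.get("NotificationId")
```

With boto3 installed, a call might look like `refresh_share_cache(boto3.client("storagegateway"), share_arn, ["/results"])`. The call only starts the refresh; the share may take some time to reflect the new objects.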
“AWS Storage Gateway plays a vital role in many important applications at Bristol Myers Squibb, especially where data transfers from local labs to cloud… Storage Gateway allows us to continue to use existing applications in new cloud platforms, with zero changes.”
Oleg Moiseyenko
Senior Cloud Architect
File shares backed by cloud storage
Problem
• Made a strategic decision to move more computing capabilities to the cloud
• Needed to reduce overall IT costs
• Accelerate move to cloud while closing a primary data center
Solution
• Multiple Storage Gateway appliances (virtual & physical) copy new data to Amazon S3
• Seamlessly access 100s of TBs of data in Amazon S3
Outcome
• Use legacy apps on-premises without changes to apps
• On-prem apps get low-latency access to cloud storage
• Lower cost and simplicity
• Automation of data management
Bristol Myers Squibb data flow
1. Instruments write raw data into File Gateway file share
2. File Gateway transfers files to S3 buckets
3. Data Management system scans S3 buckets regularly
4. Applications request data via Data Management system meta catalog
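Steps 3 and 4 of this flow can be sketched as a periodic bucket scan that feeds a lookup catalog. The Python below is an illustrative sketch only, not the Bristol Myers Squibb system: the function names, the in-memory dict used as the catalog, and the bucket name in the usage note are all assumptions. Only `list_bucket` needs boto3.

```python
def list_bucket(bucket, prefix=""):
    """Yield (key, size, last_modified) for objects in a bucket (needs boto3)."""
    import boto3  # imported lazily so the catalog logic runs without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"], obj["Size"], obj["LastModified"]


def update_catalog(catalog, bucket, listing):
    """Step 3: merge one scan into `catalog`; return keys that are new or changed."""
    changed = []
    for key, size, last_modified in listing:
        entry = {"bucket": bucket, "size": size, "last_modified": last_modified}
        if catalog.get(key) != entry:
            catalog[key] = entry
            changed.append(key)
    return changed


def locate(catalog, key):
    """Step 4: resolve a catalog key to an S3 URI for the requesting application."""
    entry = catalog[key]
    return "s3://" + entry["bucket"] + "/" + key
```

A scheduled job could call `update_catalog(catalog, "lab-raw", list_bucket("lab-raw"))` on each pass; applications then ask the catalog for a location instead of listing the bucket themselves.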
Customer Case Study:
Gritstone Oncology
Gritstone Oncology AWS Storage Gateway Case Study
Enabling automation and scalability for GxP laboratory instrumentation
Challenge
Gritstone Oncology is a pharmaceutical company with a GxP-compliant laboratory that needs a secure and scalable storage solution to accommodate its ever-growing dataset
Solution
Gritstone Oncology replaced its on-premises storage with AWS Storage Gateway, providing a scalable storage solution with Amazon S3
Benefits
With AWS Storage Gateway, Gritstone Oncology was able to reduce operational overhead and leverage AWS Cloud
Genomics Data Transfer, Data Access Patterns, Storage, and Archival
[Diagram labels: Data Sources; Technician; Genomic Mass Spectrometer; File Gateway; Local Cache; AWS Cloud; Amazon S3; Lifecycle policy; Amazon S3 IA; Amazon S3 Glacier; Research tools; Research Scientists; Bioinformatics Scientists; Data Scientists]
1. Genomics Data Sources
2. Process Automation
3. Storage and Archival
4. File-Based Data Access to S3
Process Automation: Object Put, S3 Event, Lambda
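The process automation path (Object Put, S3 Event, Lambda) can be sketched as a Lambda handler that reads the bucket and key from each S3 event record. This is a minimal Python sketch under stated assumptions: the handler name and return shape are illustrative, and the downstream processing step is left as a hypothetical placeholder.

```python
from urllib.parse import unquote_plus


def handler(event, context):
    """Collect the bucket and key of each newly created object in an S3 event."""
    uploaded = []
    for record in event.get("Records", []):
        # Ignore anything that is not an object-creation event (e.g. deletes).
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        uploaded.append({"bucket": bucket, "key": key})
    # A real pipeline would start downstream processing here (hypothetical step).
    return {"uploaded": uploaded}
```

Decoding the key matters for instrument output, where file names often contain spaces or other characters that S3 encodes in the notification.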