integrating Data for Analysis, Anonymization, and SHaring
iDASH Infrastructure to Host Sensitive Data:
HIPAA Cloud Storage and Compute
Outline
Infrastructure Overview
Typical Scientific Workflow
Cloud Challenges
iDASH Cloud & SHADE
Repeatable Results
Enterprise
iDASH Environments
• Firewalls
• Separate VPN pool
• Physical separation
• Redundancy
• Two-factor authentication
• Encryption at rest/in transit
• Centralized logging
• Intrusion detection
• Proxies and filters
• Hardened (secured) system configurations
• Remote backups/DR
[Figure: iDASH environments diagram. Labels: PHI, Non-PHI, Cloud, Virtualization, Hardware, Website, PHI Repo., Miconcur, iCONS, Non-PHI Repo., NLP, Privacy, …, Proj.1, Proj.2, Proj.3, Proj.4, HIPAA]
Typical Scientific Analysis Workflow
SaaS: Biomedical researchers, Clinicians, Other end-users
Examples: Google Docs, Office 365
PaaS: Bioinformatics researchers, Front-end developers
Examples: Heroku, Google App Engine
IaaS: Algorithm developers, Bioinformatics researchers, Sysadmin
Examples: Amazon EC2, Microsoft Azure
Short reads
Index reference
Align to reference
Call variants
Annotate variants
Pick high impact
Deleterious SNPs
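The workflow steps above can be illustrated with a toy, pure-Python sketch. It is not the tooling used on iDASH: the k-mer index, the ungapped seed-and-extend alignment, the `min_support` threshold, the example sequences, and the `impact_by_position` lookup are all assumptions made for illustration. Real pipelines use dedicated aligners, variant callers, and annotation databases.

```python
from collections import defaultdict


def index_reference(reference, k=4):
    """Index reference: map each k-mer to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def align_read(read, reference, index, k=4, max_mismatches=1):
    """Align to reference: seed on the read's first k-mer, then accept the
    first ungapped placement within the mismatch budget."""
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            return pos
    return None


def call_variants(reads, reference, index, k=4, min_support=2):
    """Call variants: keep mismatches seen in at least min_support reads."""
    support = defaultdict(int)
    for read in reads:
        pos = align_read(read, reference, index, k)
        if pos is None:
            continue
        for offset, base in enumerate(read):
            ref_base = reference[pos + offset]
            if base != ref_base:
                support[(pos + offset, ref_base, base)] += 1
    return sorted(v for v, n in support.items() if n >= min_support)


def annotate(variants, impact_by_position):
    """Annotate variants from a (hypothetical) position-to-impact table."""
    return [
        {"pos": p, "ref": r, "alt": a,
         "impact": impact_by_position.get(p, "unknown")}
        for p, r, a in variants
    ]


def pick_high_impact(annotated):
    """Pick high impact: keep only variants annotated as 'high'."""
    return [v for v in annotated if v["impact"] == "high"]


# Made-up example data: two reads support a C>A change at position 10,
# one read matches the reference, one read carries an unsupported error.
reference = "ACGTTGCAAGCTTAGGCTAACG"
reads = ["GCAAGATT", "CAAGATTA", "GGCTAACG", "ACGTTGGA"]

index = index_reference(reference)
variants = call_variants(reads, reference, index)
high_impact = pick_high_impact(annotate(variants, {10: "high"}))
print(high_impact)
```

Each function corresponds to one step of the workflow, so a single step can be swapped for a real tool without changing the others.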
To Cloud or Not to Cloud?
• Typical bioinformatics applications are NOT cloud aware
• Almost nothing exists at the PaaS level; this is not web development
• Most published cloud papers use public Amazon VMs
• Privacy & security are an afterthought
• Data still moves around over unencrypted FTP
• End-to-end analyses need serious work
iDASH Cloud & SHADE Overview
SHADE: Safe HIPAA-compliant Annotated Data deposit box Environment
CLOUD: Compute & storage, eLastic, HIPAA-compliant, On-demand, User-friendly Data analysis environment
HIPAA and non-public data
Public data, tools, recipes
Powered by MIDAS