SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
Utilizing the SDSC Cloud Storage Service
PASIG Conference
January 13, 2012
Richard L. Moore
[email protected]
San Diego Supercomputer Center, University of California, San Diego
Traditional supercomputer center storage systems
Functional Systems
• Tape-based archival system
  • Built for capacity
  • We've extended the archive beyond HPC simulation data to experimental data and other digital assets, and it serves as a node in geographically distributed digital preservation systems (e.g. Chronopolis)
• High-bandwidth parallel file system
  • Built for speed
  • Transient data, single-copy reliability
• Home directory system (e.g. NFS)
  • Built for robustness and reliability
  • Regular backups
Limitations
• Archival data is difficult to access
  - high latency, lower bandwidth, user interfaces
• Archival data is difficult to share among multiple users
• All too often archived data, particularly HPC simulations, is "write-once-read-never"
• Not sustainable, and no incentives for users to retain only high-value data
Adapting to emerging requirements and changing technologies
• Exponential data growth, and analysis of that data, are increasingly important to the research enterprise
• Requires ready access to data, with low latency and high bandwidth
• Collaborative "team science" demands easy data sharing
• Consumer product development drives prices
  • Disk capacities increasing quickly
  • Flash memory becoming more affordable
  • 'Gordon' compute system just now being deployed with 0.25 PB of flash, to fill the "latency gap" between DRAM and spinning disk
• For HPC systems with historical "byte/flop" ratios, storage would be an increasingly significant fraction of total system cost
  • Can't afford open-ended archival storage … must develop methods to
SDSC is deploying a new repertoire of storage systems
SDSC Cloud
• Storage of Digital Data for Ubiquitous Access and High-Durability
• Access: Multi-platform web interface, S3 interfaces, backup SW
Data Oasis (PFS)
• High-Performance Transient Parallel File System for HPC
• Access: Lustre on HPC Systems (Gordon, Trestles, Triton)
Project Storage
• Purpose: Typical Project / User File Server Storage Needs
A Paradigm Shift for Long-Term Storage: Access, Sharing and Collaboration
SDSC Cloud
• http://cloud.sdsc.edu
• Launched September 2011
• Largest, highest-performance known academic cloud
• 5.5 Petabytes (raw), 8 GB/sec
• System can upload 500GB in ~1 min
• Automatic dual-copy and verification
• Capacity and performance scale linearly to 100's of petabytes
• Open source platform based on NASA and RackSpace software
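The "500GB in ~1 min" figure follows from the quoted 8 GB/sec aggregate bandwidth. A quick check in Python, assuming the full bandwidth is sustained for the whole transfer (an idealization; the function name is illustrative only):

```python
def transfer_time_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb at a sustained aggregate bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


# Figures quoted on the slide: 500 GB upload, 8 GB/sec aggregate bandwidth.
seconds = transfer_time_seconds(500, 8)
print(f"{seconds:.1f} s")  # 62.5 s, i.e. roughly one minute
```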
Key Features of SDSC Cloud
• "Always-there" disk-based availability of data
  • Tape latency and multi-user issues addressed
• High reliability
  • Disk RAID; automatic dual-copy; continuous background checksum verification/restoration; offsite replication soon
• Simple user interfaces for data owners to manage their data, control access and set permissions for sharing
• Easy access to shared data for any users with permission, under a range of mechanisms (http, APIs, portals, gateways …)
• Encryption readily incorporated, which addresses issues of storing HIPAA/proprietary data
• Transaction history is logged: track usage, assess utility, support provenance
• Scalable system in both capacity and bandwidth
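The slides name "continuous background checksum verification/restoration" across dual copies but do not describe the mechanism. A minimal Python sketch of the general idea, assuming SHA-256 digests and in-memory copies (both are illustrative choices, as are the function names; this is not the OpenStack Swift implementation):

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex digest used to detect silent corruption of a stored copy."""
    return hashlib.sha256(data).hexdigest()


def verify_and_restore(copies: list[bytes], expected: str) -> list[bytes]:
    """Return the copies with any corrupted one replaced by an intact one.

    `expected` is the digest recorded when the object was first stored.
    """
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise RuntimeError("no intact copy left to restore from")
    return [c if checksum(c) == expected else good for c in copies]


# Hypothetical object with two copies, one of which has been damaged.
original = b"simulation output"
digest = checksum(original)
repaired = verify_and_restore([original, b"simulation 0utput"], digest)
print(all(checksum(c) == digest for c in repaired))  # True
```

With dual copies, a single corrupted copy can be repaired from the other; if both fail verification the sketch raises, which is the case offsite replication is meant to cover.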
Applications of SDSC Cloud
• Shared/published/curated data collections
• HPC simulation data storage and sharing
• Web/portal applications and site hosting
• Application integration using supported APIs
• Serving images/videos
Why OpenStack Swift Cloud Software?
Industry Standard
• More than 100 leading companies from over a dozen countries are participating in OpenStack, including Cisco, Citrix, Dell, Intel and Microsoft.
Proven Software
• The OpenStack cloud operating system is the same software that powers many large public and private clouds, including RackSpace Cloud Storage.
Highly Compatible
• Compatibility with public OpenStack clouds means it's easy to migrate data and apps to public clouds when desired, based on security policies, economics, and other key business criteria.
Control & Flexibility
• Open source platform means not locked to a proprietary vendor, and modular design can integrate with legacy or 3rd-party technologies.
• OpenStack project provided under Apache 2.0 license.
Evaluated Software
OpenStack Swift
• Open Source
• Community Support
• Highly Configurable
Eucalyptus
• Highly Flexible
• Compute Focused
Caringo Castor
• Commercial Software
SDSC Cloud Interfaces
[Diagram: Data Owners and External Users reach the Swift Object Storage Cluster through Load Balanced Proxy Servers, via the interfaces below]
Commercial Products
• Commvault
• Amanda Backup Tools
• Crashplan
Traditional Clients
• GUI Applications
• Command Line
• SDSC Web I/F
Web Services API
• Amazon S3
• Rackspace CloudFiles / OpenStack API
User-Developed Web Portals/Gateways
Rates and Funding Mechanisms
• See https://cloud.sdsc.edu/hp/pricing.php for current pricing; HW costs subject to market volatility; contact [email protected] if interested in service
• "On Demand" Cloud Storage
  • Pay monthly per GB used (water-mark)
  • U California users: $X/TB-year dual-copy + applicable indirect costs
  • + 50% premium for additional off-site copy (when available)
  • Users external to UC: 2*$X/TB-year dual-copy, 3*$X for dual-copy + 1 off-site copy
• "Condo" Cloud Storage
  • Recipient buys HW that is integrated into the storage service and pays annual operating costs for maintenance and system administration
  • Purchase condo HW at $Y market price (pre-configured head node and disk array, currently 2TB drives with 8.5 TB usable dual-copy; space will increase over time)
  • Annual operating cost: $Z/year/condo + applicable indirect costs & UC-external factors
  • User has right to use condo for 5 years; TCO/condo = $Y + 5*$Z over 5 years
* Encryption and HIPAA Compliant Storage is available with both options
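The two funding models reduce to simple arithmetic on the symbolic rates $X, $Y and $Z. A minimal Python sketch, assuming indirect costs and UC-external factors are excluded; the function names and the numeric values are hypothetical placeholders, not actual SDSC rates:

```python
def on_demand_annual_cost(tb, x, external=False, offsite=False):
    """Annual on-demand cost, excluding indirect costs.

    tb: terabytes stored; x: the UC dual-copy rate $X per TB-year.
    """
    multiplier = 1.0
    if offsite:
        multiplier *= 1.5  # +50% premium for the additional off-site copy
    if external:
        multiplier *= 2.0  # users external to UC pay 2*$X (3*$X with off-site)
    return tb * x * multiplier


def condo_tco(y, z, years=5):
    """Total cost of ownership per condo: $Y hardware + years * $Z operations."""
    return y + years * z


# Hypothetical placeholder values, NOT actual SDSC rates.
X, Y, Z = 100.0, 5000.0, 400.0
print(on_demand_annual_cost(10, X))                               # 1000.0
print(on_demand_annual_cost(10, X, external=True, offsite=True))  # 3000.0
print(condo_tco(Y, Z))                                            # 7000.0
```

The external rate with an off-site copy comes out to 3*$X because the 2x external factor and the 50% off-site premium multiply (2 * 1.5 = 3), which matches the figure on the slide.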
Questions?
You can touch the cloud now:
Download this presentation, publicly shared from my personal account via
http://tinyurl.com/SDSC-PASIG
= https://cloud.sdsc.edu/v1/AUTH_rlm/PASIG/PASIG-Moore.pdf
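The long URL follows the OpenStack Swift object path layout, /v1/{account}/{container}/{object}. A small Python sketch that splits it into those parts without touching the network (the helper name is illustrative, and the link may no longer resolve):

```python
from urllib.parse import urlsplit


def split_swift_url(url: str) -> dict:
    """Split a Swift object URL into host, API version, account, container, object."""
    parts = urlsplit(url)
    # The object name may itself contain "/", so split at most three times.
    version, account, container, obj = parts.path.lstrip("/").split("/", 3)
    return {
        "host": parts.netloc,
        "version": version,
        "account": account,
        "container": container,
        "object": obj,
    }


info = split_swift_url("https://cloud.sdsc.edu/v1/AUTH_rlm/PASIG/PASIG-Moore.pdf")
print(info["account"], info["container"], info["object"])
# AUTH_rlm PASIG PASIG-Moore.pdf
```

Here the account is the presenter's personal account, the container is PASIG, and the object is the PDF; marking the container publicly readable is what allows an unauthenticated download.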