SAN DIEGO SUPERCOMPUTER CENTER at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
Utilizing the SDSC Cloud
Storage Service
PASIG Conference
January 13, 2012
Richard L. Moore
[email protected]
San Diego Supercomputer Center
University of California San Diego
Traditional supercomputer center
storage systems
Functional Systems
• Tape-based archival system
• Built for capacity
• We’ve extended the archive beyond HPC simulation data to experimental data and other digital assets, and use it as a node in geographically distributed digital
preservation systems (e.g. Chronopolis)
• High-bandwidth parallel file system
• Built for speed
• Transient data, single-copy reliability
• Home directory system (e.g. NFS)
• Built for robustness and reliability
• Regular backups
Limitations
• Archival data is difficult to access
- high latency, lower bandwidth,
user interfaces
• Archival data is difficult to share among
multiple users
• All too often archived data,
particularly HPC simulations, is
“write-once-read-never”
• Not sustainable and no incentives for users to retain only high-value data
Adapting to emerging requirements and
changing technologies
• Exponential data growth - and analysis of that data - are
increasingly important to the research enterprise
• Requires ready access to data, w/ low latency & high bandwidth
• Collaborative “team science” demands easy data sharing
• Consumer product development drives prices down
• Disk capacities increasing quickly
• Flash memory becoming more affordable
• ‘Gordon’ compute system just now being deployed with 0.25 PB of flash - to fill
the “latency gap” between DRAM and spinning disk
• For HPC systems with historical “byte/flop” ratios, storage
would be an increasingly significant fraction of total system cost
• Can’t afford open-ended archival storage … must develop methods to
place value on data, especially for long-term high-reliability storage
SDSC is deploying a new
repertoire of storage systems
SDSC Cloud
• Storage of Digital Data for Ubiquitous Access and High Durability
• Access: Multi-platform web interface, S3 interfaces, backup SW
Data Oasis (PFS)
• High-Performance Transient Parallel File System for HPC
• Access: Lustre on HPC Systems (Gordon, Trestles, Triton)
Project Storage
• Purpose: Typical Project / User File Server Storage Needs
• Access: NFS/CIFS, iSCSI
A Paradigm Shift for Long-Term Storage:
Access, Sharing and Collaboration
SDSC Cloud
• http://cloud.sdsc.edu
• Launched September 2011
• Largest, highest-performance
known academic cloud
• 5.5 Petabytes (raw), 8 GB/sec
• System can upload 500GB in ~1 min
• Automatic dual-copy and verification
• Capacity and performance scale
linearly to 100’s of petabytes
• Open source platform based on
NASA and RackSpace software
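The upload figure above can be checked against the quoted bandwidth with simple arithmetic. The sketch below is only a back-of-the-envelope calculation; it assumes the full 8 GB/sec aggregate rate is sustained for a single 500 GB transfer, which the slide does not state, and the function name is illustrative.

```python
# Back-of-the-envelope check of the figures quoted on the slide.
# Assumption: the full aggregate bandwidth is available to one transfer.
AGGREGATE_GB_PER_SEC = 8   # from the slide: 8 GB/sec
UPLOAD_GB = 500            # from the slide: 500 GB upload


def transfer_seconds(size_gb, rate_gb_per_sec):
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_gb / rate_gb_per_sec


seconds = transfer_seconds(UPLOAD_GB, AGGREGATE_GB_PER_SEC)
print(f"{UPLOAD_GB} GB at {AGGREGATE_GB_PER_SEC} GB/sec takes {seconds:.1f} s")
```

The result is roughly one minute, consistent with the "~1 min" quoted above.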
Key Features of SDSC Cloud
• “Always-there” disk-based availability of data
• Tape latency and multi-user issues addressed
• High reliability
• Disk RAID; automatic dual-copy; continuous background checksum verification/restoration; offsite replication soon
• Simple interfaces for data owners to manage their data, control access to it, and
set permissions for sharing
• Easy access to shared data for any user with permission, through a range of
mechanisms (http, APIs, portals, gateways …)
• Encryption readily incorporated – and addresses issues of storing
HIPAA/proprietary data
• Transaction history is logged – track usage, assess utility, support provenance
• Scalable system in both capacity and bandwidth
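The dual-copy and checksum verification/restoration feature listed above can be illustrated with a minimal sketch. This is not the SDSC Cloud or OpenStack Swift implementation: the function names are hypothetical, copies are held in memory rather than on disk, and SHA-256 is an assumed choice of hash.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex digest of the data (SHA-256 is an assumption, not the service's choice)."""
    return hashlib.sha256(data).hexdigest()


def verify_and_restore(copies, expected):
    """Return the copies with every corrupted replica replaced by an intact one."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise ValueError("no intact copy available to restore from")
    return [c if checksum(c) == expected else good for c in copies]


# Hypothetical example: two copies of one object, the second one corrupted.
original = b"simulation output"
expected = checksum(original)
copies = [original, b"simulation outpuX"]
repaired = verify_and_restore(copies, expected)
```

With only two copies, restoration depends on at least one of them still matching the recorded checksum, which is one reason an additional off-site replica adds protection.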
Applications of SDSC Cloud
Shared/published/curated data collections
HPC simulation data storage and sharing
Web/portal applications and site hosting
Application integration using supported APIs
Serving images/videos
Why Openstack Swift Cloud Software?
Industry Standard
More than 100 leading companies from over a
dozen countries are participating in OpenStack, including Cisco, Citrix, Dell, Intel
and Microsoft.
Proven Software
The OpenStack cloud operating system is the
same software that powers many large public and private
clouds, including RackSpace Cloud Storage.
Highly Compatible
Compatibility w/ public OpenStack clouds
means it’s easy to migrate data and apps
to public clouds when desired, based on
security policies, economics, and other
key business criteria.
Control & Flexibility
Open source platform means not locked to a proprietary vendor, and
modular design can integrate with legacy or
3rd-party technologies. OpenStack project provided under Apache
2.0 license.
Evaluated Software
OpenStack Swift
• Open Source
• Community Support
• Highly Configurable
Eucalyptus
• Highly Flexible
• Compute Focused
Caringo Castor
• Commercial Software
SDSC Cloud Interfaces
[Diagram: access paths into the SDSC Cloud]
• Users: Data Owners, External Users
• Clients: Commercial Products and Backup Tools (Commvault, Amanda, Crashplan); Traditional Clients (GUI Applications, Command Line); SDSC Web I/F; User-Developed Web Portals/Gateways
• Web Services API: Amazon S3; Rackspace CloudFiles / OpenStack API
• Back end: Load Balanced Proxy Servers; Swift Object Storage Cluster
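As a rough illustration of the OpenStack API path listed above, the sketch below builds (but does not send) an object upload request in the style of the OpenStack Swift object storage API: a PUT to `<storage URL>/<container>/<object>` with an `X-Auth-Token` header. The storage URL, token, container and object names are all hypothetical placeholders, not SDSC endpoints, and the helper name is an assumption.

```python
from urllib.parse import quote
from urllib.request import Request


def build_upload_request(storage_url, token, container, object_name, data):
    """Build a Swift-style PUT request for one object; nothing is sent here."""
    url = f"{storage_url.rstrip('/')}/{quote(container)}/{quote(object_name)}"
    return Request(url, data=data, method="PUT",
                   headers={"X-Auth-Token": token})


# Hypothetical values for illustration only.
req = build_upload_request(
    "https://swift.example.org/v1/AUTH_demo",  # placeholder storage URL
    "hypothetical-token",                      # placeholder auth token
    "results",
    "run 01/output.dat",
    b"payload",
)
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the upload against a real endpoint; a dedicated client library would normally handle authentication and retries as well.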
Rates and Funding Mechanisms
• See https://cloud.sdsc.edu/hp/pricing.php for current pricing; HW costs subject to market volatility; contact [email protected] if interested in service
• “On Demand” Cloud Storage
• Pay monthly per GB used (water-mark)
• U California users: $X/TB-Year dual-copy + applicable indirect costs
• + 50% premium for additional off-site copy (when available)
• Users external to UC: 2*$X/TB-year dual-copy; 3*$X/TB-year for dual-copy + 1 off-site copy
• “Condo” Cloud Storage
• Recipient buys HW that is integrated into the storage service and pays annual operating costs for maintenance and system administration
• Purchase condo HW at $Y market price (pre-configured head node and disk array - currently 2TB drives with 8.5 TB usable dual-copy; space will increase over time)
• Annual operating cost: $Z/year/condo + applicable indirect costs & UC-external factors
• User has right to use condo for 5 years; TCO/condo = $Y + 5*Z over 5 years
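The rate structure above reduces to two small formulas, sketched here with the symbolic $X, $Y and $Z left as parameters. The function names are illustrative, the numbers in the example are placeholders rather than actual SDSC prices, and indirect costs and UC-external factors are deliberately left out.

```python
def on_demand_rate_per_tb_year(x, uc_user=True, offsite_copy=False):
    """Annual on-demand rate per TB, before indirect costs.

    x is the UC dual-copy base rate ($X/TB-year on the slide).
    External users pay 2x; an off-site copy adds a 50% premium.
    """
    rate = x if uc_user else 2 * x
    if offsite_copy:
        rate *= 1.5
    return rate


def condo_tco(y, z, years=5):
    """Condo total cost of ownership: hardware price plus annual operating costs."""
    return y + years * z


# Placeholder figures for illustration only.
print(on_demand_rate_per_tb_year(100.0, uc_user=False, offsite_copy=True))
print(condo_tco(1000.0, 100.0))
```

The 50% premium applied to the external 2*$X rate reproduces the 3*$X figure quoted for external users with an off-site copy.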