Reference
Architectures
for Digital
Libraries
Keith Rajecki
Education Solutions Architect
Sun Microsystems, Inc.
Agenda
•
Challenges
•
Digital Library
•
Solution Architectures
>
Open Storage/Open Archive
>
Cloud Computing
•
Solution Benefits
•
Summary
Value of Reference Architectures
•
Minimizes cost, complexity and deployment
time
>
Lowers administrative costs via automated data
management and migration across storage tiers
>
Cost-effectively matches the value of data with the
appropriately priced media
>
Economical and power-friendly cost of operation
•
Flexibility to build performance, economy or
mixed archive repository
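The idea of matching the value of data with appropriately priced media can be sketched as a small placement policy. This is a minimal illustration only: the tier names, relative costs, and idle-day thresholds below are hypothetical and are not taken from any Sun product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    relative_cost_per_tb: float  # hypothetical relative cost, not a real price
    max_idle_days: int           # data idle longer than this belongs on a cheaper tier


# Hypothetical tiers, ordered from fastest and most expensive to cheapest.
TIERS = [
    Tier("performance-disk", 10.0, 30),
    Tier("capacity-disk", 3.0, 365),
    Tier("tape", 1.0, 10**9),
]


def choose_tier(days_since_access: int) -> Tier:
    """Return the fastest tier whose idle limit still covers the object."""
    for tier in TIERS:
        if days_since_access <= tier.max_idle_days:
            return tier
    return TIERS[-1]  # anything older falls through to the cheapest tier


if __name__ == "__main__":
    for idle in (5, 90, 1000):
        print(idle, "days idle ->", choose_tier(idle).name)
```

A real archive manager would apply a rule like this automatically as data ages, which is where the lower administrative cost comes from.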
Sun Reference Architectures
Develop Collaborative, Replicable Reference Architectures
•
Fedora
•
Fedora/Drupal
•
DSpace
•
EPrints
•
DuraSpace
•
Ex Libris Rosetta
•
VTLS VITAL
Systems Consulting, Storage, and Data Management
Networking Identity Web Services
Content Repositories: VITAL, Rosetta, Fedora, EPrints, DSpace
Digital Objects Long-term Preservation
Data Curation & Collab. Digital Asset Mgt.
Digital Library and Repository
Digital Library
Scalability, Security, Sustainability, Sharability
Integrated Library
System (ILS)
Acquisition & ILL
Students
Librarians
Government
Staff
Faculty
Researchers
eJournals, Serials
Cataloging
Search and Discovery
Business
Museums
Sun PASIG
Community
Sun Open Storage Product Positioning
•
Digital repositories for data and metadata storage
•
Fedora, EPrints, and DSpace communities
•
Ex Libris Rosetta and VTLS VITAL applications
•
Large-scale preservation projects needing
policies
•
Digital asset management
•
eResearch databases
•
Federated repositories
SAM/QFS
StorageTek 7210
Identity Management
and SOA
StorageTek 7410
Virtualized Repository Appliance
Single Virtualized Server
Virtual Machine 1:
●Repository
●Entity Preservation
●Index Creation
●Metadata Management
●Security
●Search Engine
Virtualized Server
Physical Storage
Archive App. Oracle, MySQL
Repository Solaris + ZFS
Digital Objects
Virtual Machine 2:
●Archive DB
●Policies
●Metadata
Virtual Machine 3:
●Open Storage Mgmt
●Storage Preservation
●Physical Storage
Open Repository Tiered Components
Application Server
●Entity Preservation
●Metadata Management
●Relationship connections
●Security
●Search Engine
●Policy Driven
DB Server
●
Digital Asset Policies
●Metadata
Digital Objects
Open Storage
●
Storage Preservation Abstraction
●OpenSolaris, ZFS, SAM
●
Physical Storage Components
Tape Libraries
●
Full Sun portfolio of
supported tape
libraries
Sun’s Infinite Archive System Approach
Factory integrated, solution tested, simple, scalable, and economical
Scalable, Eco-Efficient Tape Tier Options
Tier 1 + Tier 2 Disk IAS Options
Intelligent, Policy-Based Automated Archive
SL8500, SL3000 Libraries
Encryption
Access & Capacity Drives
2500 series SATA
IAS GUI, Storage Archive Manager SAM-QFS, Sun Cluster 3.2, Solaris
IP Communications Protocol
Library
DB
Surveillance
Web
Unstructured
Infinite Archive System (IAS) Models: Value and Midrange
2500 series SAS
X4200 T5220
Emerging Cloud Deployment Patterns
Test and
Development
Functional Offload
(Batch Processes –
TimesMachine)
Functional Offload
(Storage – SmugMug)
Augmentation
(Temporary Load – Animoto)
Storage Service
•
What It is
>
On-demand, API-based access to storage on the network
•
Features
>
Ability to store and retrieve data as objects or files
>
REST API with open, AWS S3-like semantics for
object storage
>
WebDAV for file storage
>
Fast and inexpensive cloning of objects and files
>
High availability
>
Detailed metering of storage used, I/O requests,
bandwidth, etc.
•
Customer Benefit
>
Scalable, highly available storage without big hardware
investments
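The features listed above (object store and retrieve, cheap cloning, detailed metering) can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the actual service API: the class name `ObjectStore`, its method names, and the meter keys are all hypothetical, and no network or REST layer is involved.

```python
from collections import Counter


class ObjectStore:
    """Toy in-memory stand-in for an object storage service (hypothetical API)."""

    def __init__(self):
        self._objects = {}      # maps (bucket, key) -> bytes
        self.meter = Counter()  # request counts and bytes transferred

    def put(self, bucket, key, data):
        """Store an object and record the request and inbound bytes."""
        self._objects[(bucket, key)] = data
        self.meter["put_requests"] += 1
        self.meter["bytes_in"] += len(data)

    def get(self, bucket, key):
        """Retrieve an object and record the request and outbound bytes."""
        data = self._objects[(bucket, key)]
        self.meter["get_requests"] += 1
        self.meter["bytes_out"] += len(data)
        return data

    def clone(self, bucket, src_key, dst_key):
        """Clone by sharing the immutable bytes object, so no data is copied."""
        self._objects[(bucket, dst_key)] = self._objects[(bucket, src_key)]
        self.meter["clone_requests"] += 1

    def bytes_stored(self):
        """Logical size of all objects; clones count at their full size."""
        return sum(len(v) for v in self._objects.values())


if __name__ == "__main__":
    store = ObjectStore()
    store.put("library", "thesis.pdf", b"%PDF-1.4 example")
    store.clone("library", "thesis.pdf", "thesis-copy.pdf")
    print(store.get("library", "thesis-copy.pdf"))
    print(dict(store.meter), store.bytes_stored())
```

In a real deployment the same operations would be issued over HTTP (REST for objects, WebDAV for files), and the meter would feed usage-based billing.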
TACC: World’s Top Supercomputer
•The world’s largest computing system for open science research
•Sun Constellation Linux Cluster and Sun StorageTek Mass Storage Facility
•579.4 Tflops peak performance
•Sun Data Center Switch 3456
>Dual redundant
>110 TB/sec bisectional bandwidth
•Sun Fire X4600
>25 systems
>800 cores
•Sun Blade 6048
>3,936 blades
>15,744 CPUs
>62,976 cores
>125 TB RAM
•Sun Fire X4500
>72 systems
>1.7 petabytes
>64.8 GB/sec total bandwidth
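The system figures above can be cross-checked with simple arithmetic. The sketch below reads the CPU count as 15,744 and treats 1.7 petabytes as 1,700 TB (decimal units); both readings are assumptions, and the per-unit values are derived here rather than stated on the slide.

```python
# Figures from the slide (CPU count read as 15,744; see the note above).
blades = 3_936
cpus = 15_744
cores = 62_976
peak_tflops = 579.4

x4500_systems = 72
x4500_capacity_tb = 1_700      # 1.7 PB, assuming decimal units
x4500_bandwidth_gb_s = 64.8

# Derived per-unit values.
cpus_per_blade = cpus // blades                   # sockets per blade
cores_per_cpu = cores // cpus                     # cores per socket
gflops_per_core = peak_tflops * 1_000 / cores     # peak GFLOPS per core
tb_per_x4500 = x4500_capacity_tb / x4500_systems  # capacity per storage server
gb_s_per_x4500 = x4500_bandwidth_gb_s / x4500_systems

if __name__ == "__main__":
    print("CPUs per blade:", cpus_per_blade)
    print("Cores per CPU:", cores_per_cpu)
    print("Peak GFLOPS per core:", round(gflops_per_core, 2))
    print("TB per X4500:", round(tb_per_x4500, 1))
    print("GB/sec per X4500:", round(gb_s_per_x4500, 2))
```

The blade, CPU, and core counts divide evenly (four CPUs per blade, four cores per CPU), which is why the CPU count is read as 15,744 here.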
Internet Archive
Key Requirements
• Build a server infrastructure to support massive amounts of data: 2 PB of storage, growing by 1 PB per year
• Provide an efficient, reliable, and scalable datacenter
• Keep space, energy, management and maintenance costs low
• Web snapshot of 100 TB of data, approximately 4 billion Web pages
• Support up to 500 user queries per second
Sun Solution
• Sun Modular Datacenter S20
• Sun Fire X4500 Server
• Solaris 10 with ZFS
• Sun Remote Operations Management
Client Results
• Gained a reliable and flexible datacenter that supports multiple PB of storage
• Increased storage capacity of its servers
• Reduced space and energy needs for lower costs
• Superior data integrity to guard against data loss
• Rapid time to deployment: Sun MD unit delivered in less than 45 days
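The Internet Archive sizing requirements (2 PB growing by 1 PB per year, and a 100 TB snapshot of roughly 4 billion pages) lend themselves to a quick back-of-the-envelope calculation. This is a minimal sketch that assumes linear growth and decimal units; the function names are illustrative only.

```python
def projected_capacity_pb(years, initial_pb=2.0, growth_pb_per_year=1.0):
    """Storage needed after a number of years, assuming linear growth."""
    return initial_pb + growth_pb_per_year * years


def average_page_size_kb(snapshot_tb=100, pages=4_000_000_000):
    """Average size per Web page in KB, using decimal units (1 TB = 10**9 KB)."""
    return snapshot_tb * 10**9 / pages


if __name__ == "__main__":
    for year in (1, 3, 5):
        print(f"Year {year}: {projected_capacity_pb(year):.1f} PB")
    print(f"Average page size: {average_page_size_kb():.0f} KB")
```

Under these assumptions the snapshot averages about 25 KB per page, and the archive needs 7 PB after five years.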