
Open Source Long Term Preservation Archives. Richard Matthews Sun Microsystems, Inc.


(1)

Open Source Long Term Preservation Archives

Richard Matthews

(2)


Presentation prepared by:

> Keith Rajecki

> Industry Solutions Architect, Global Education & Research

Presented by:

> Rick Matthews

> Sr. Staff Engineer

(3)

Agenda

Sun, Open Source, and Communities

Preservation Archiving Trends

Sun Archiving Storage Solutions

(4)

Sun Confidential: Internal Only

• Business presence: 100 countries

• Java developers: 5 million

• Java devices: 6 billion

• Annual revenues: $13+ billion

• Worldwide employees: 35,000

• Cash: $4.8 billion

• U.S. patents: 5,000+

• Fortune 211 company

• Solaris 10 licenses: 7 million

• Annual R&D: ~$2 billion

• Annual storage shipped: 410 petabytes

• SPARC embedded processors: 44+ million

(5)

Sun's Open Source Strategy

Developer Preference

User Preference

Value Proposition

● More core developers

● More deploying developers

● More partners

● Business deployment

● Sun's target market

● Binary distribution

● Pay for value

• Free to use

• More platform choice

• More suppliers

• Larger user community

(6)


Sun and Open Source Software

OSS has been part of Sun's DNA for a while

“Every software asset we produce is open source. If it isn't today, it will be pretty damn quickly.”

Jonathan Schwartz, CEO, Sun Microsystems, January 2007

Sun's commitment to OSS communities

> This includes open repository communities

> More than just a storage perspective

(7)

Sun's Open Stack

Flexible and Heterogeneous with Zero Barrier to Exit

Java Enterprise System Composite Application Platform

Sun xVM Operating System Virtualization Architecture Database Platform Application Infrastructure Partners

(8)

OpenSPARC

www.opensparc.net

• 4,300 downloads to date

• 14 million lines of source code

• Community interest: 1 to 1000 core systems

• First derivative design: SimplyRISC S1 core

OpenSolaris + OpenSPARC = Only Truly Open Platform

(9)

OpenSolaris

Innovation Happens Everywhere

www.opensolaris.org

90,000 Members

64 Community Projects, including BrandZ, DTrace, Solaris ZFS, and Zones

53 User Groups Worldwide

260 Code Contributions

2006 Codie “Best Open Source Solution”

2005 Open Source World Editor's Choice

2005 InfoWorld Innovators Award

(10)


Sun is Committed to Developer Communities

Java Infrastructure Ecosystem Community Solaris Building

Open and Free Communities

Building a Vibrant Ecosystem: Sun is the Largest Commercial Contributor to Open Source Communities

SPARC

java.net

The Source for Java Technology Collaboration

(11)

Sun Preservation Archiving Community

• PASIG Meeting May 27-30 San Francisco

www.sun-pasig.org

• Comparison of high-level OAIS architectures, service-oriented architecture, and use cases

• Sharing of best practices and software code

• Cooperation on standard, open, ‘in-a-box’ solutions around repository technologies

• Review of preservation and archiving storage architectures and eResearch data set management

• Discussion of the uses of commercial third-party and

(12)


Trends: Growth and Preservation

Humans created 161 exabytes of data in 2006, approximately 3 million times the information in all the books ever written, according to IDC.

Source: "An Inconvenient IT Truth," by Michael Vizard, eWeek, 06/22/07

80% of all movies made before 1940 are gone.

(13)

Repository Archiving Projects

• Compliance

• Book and Image Digitization and Sharing

• National Heritage Content

• Newspapers

• Replicated, Tiered Repositories for Archived Materials

• Research Data, Applications, and Systems

• Science, Technology, and Medical Journals

(14)


What's The Buzz?

Problem:

> Exponential growth of digital content, now and in the future

> Powerful, flexible infrastructure required to archive:

– Store unstructured, fixed content

– Search that content

– Preserve that content for the long-term

One Proposed Solution:

(15)

SAM-QFS Infinite Archive System

SAM-QFS: World’s best policy-based, multi-tiered archive manager

> Application-transparent dynamic data movement

> Four tiers, local and remote

> “Continuous Archive” = CDP (continuous data protection)

> WORM & retention management

Infinite Archive System

> Scalable, multi-tiered SAM-QFS platform base

> 10-256 TB systems
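Policy-based tiering of this kind can be illustrated with a small placement rule. This is a minimal sketch, assuming an age-since-last-access policy and four invented tier names (`fast_disk`, `capacity_disk`, `local_tape`, `remote_tape`); it is not SAM-QFS configuration syntax or API, and it models only the placement decision, not the application-transparent data movement SAM-QFS performs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tier names, ordered from fastest to slowest/cheapest (assumption).
TIERS = ["fast_disk", "capacity_disk", "local_tape", "remote_tape"]


@dataclass
class ArchivePolicy:
    """Hypothetical age-based policy: demote a file one tier per idle period."""

    idle_per_tier: timedelta

    def choose_tier(self, last_access: datetime, now: datetime) -> str:
        idle = now - last_access
        # Whole idle periods elapsed, clamped to the range [0, last tier].
        steps = max(0, int(idle / self.idle_per_tier))
        return TIERS[min(steps, len(TIERS) - 1)]


policy = ArchivePolicy(idle_per_tier=timedelta(days=30))
now = datetime(2008, 5, 27)
for days in (5, 45, 400):
    print(days, policy.choose_tier(now - timedelta(days=days), now))
```

A real archive manager would typically keep copies on several tiers at once rather than moving a single copy; the single-placement rule here is a simplification.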

(16)


Sun StorageTek 5800 “Honeycomb”

Content-aware Open Storage

• What It Is

> 'Smart', network-attached, clustered, racked storage system

• What It Provides

> 64 TB (raw) per rack: data objects + metadata

> WORM objects

> Metadata awareness built into the design

> Reliability, persistency, currency assurances
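To make the WORM idea concrete, here is a minimal in-memory sketch of write-once semantics, assuming a hypothetical `WormStore` class that assigns random object IDs and refuses deletion before a retention date. It is not the StorageTek 5800 client API; it only shows the contract: stored objects are never updated in place, and deletes are refused while retention is in force.

```python
import uuid
from datetime import datetime


class WormViolation(Exception):
    """Raised when a write-once object would be deleted before its retention date."""


class WormStore:
    """Hypothetical in-memory write-once-read-many object store (illustration only)."""

    def __init__(self):
        self._objects = {}  # oid -> (data, metadata, retain_until)

    def store(self, data: bytes, metadata: dict, retain_until: datetime) -> str:
        # Every store creates a new object; there is no update-in-place method.
        oid = uuid.uuid4().hex
        self._objects[oid] = (bytes(data), dict(metadata), retain_until)
        return oid

    def retrieve(self, oid: str) -> bytes:
        return self._objects[oid][0]

    def delete(self, oid: str, now: datetime) -> None:
        retain_until = self._objects[oid][2]
        if now < retain_until:
            raise WormViolation(f"{oid} is under retention until {retain_until:%Y-%m-%d}")
        del self._objects[oid]


store = WormStore()
oid = store.store(b"page-001", {"dc:title": "Page 1"}, retain_until=datetime(2015, 1, 1))
print(store.retrieve(oid))
```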

(17)

Sun StorageTek 5800 “Honeycomb”

Content-aware Open Storage

• RAIN architecture

> Symmetric cluster: CPU, memory, SATA disks

• Each node

> Opteron-based Sun Fire server

> Solaris 10

> 3 GB RAM

> Dual Gig-E

> 4 x 500 GB SATA

• L2 load-spreading switches

• Service processor

(18)


Sun StorageTek 5800 “Honeycomb”

Content-aware Open Storage

'Half Cell': 8 servers, 16 TB (raw)

'Full Cell': 16 servers, 32 TB (raw)

'Rack' (2 Cells): 32 servers, 64 TB (raw)

'Hive' (N Cells) [Future]: 'Hot Scaling'
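The cell sizes follow directly from the per-node configuration on the previous slide (4 x 500 GB SATA disks per node, so 2 TB raw per node). The small calculation below, using decimal units (1 TB = 1,000 GB), reproduces the raw figures; usable capacity is lower once data-protection overhead is subtracted, and that overhead is not modeled here.

```python
# Per-node raw capacity from the node slide: 4 x 500 GB SATA disks.
DISKS_PER_NODE = 4
GB_PER_DISK = 500


def raw_tb(nodes: int) -> float:
    """Raw capacity in decimal TB (1 TB = 1000 GB) for a given number of nodes."""
    return nodes * DISKS_PER_NODE * GB_PER_DISK / 1000


for name, nodes in [("Half Cell", 8), ("Full Cell", 16), ("Rack (2 Cells)", 32)]:
    print(f"{name}: {nodes} servers, {raw_tb(nodes):.0f} TB raw")
```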

(19)

Why Honeycomb?

Architecture optimized to store and retrieve unstructured, fixed content

Object storage, metadata aware

Extreme data protection via RAID6, data self-healing, and bit-rot detection

> Mean time to data loss (MTTDL): more than 2 million years

A commitment to standards

> Dublin Core metadata

> WebDAV
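Bit-rot detection and self-healing rest on a simple idea: record a checksum when an object is stored, periodically recompute it, and rebuild any object whose checksum no longer matches from redundant data. The sketch below illustrates that loop with SHA-256 from Python's standard `hashlib`. To keep it short, it repairs from a hypothetical full replica rather than from RAID6-style parity, and the `scrub` function is illustrative, not Honeycomb code.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 checksum recorded at ingest time."""
    return hashlib.sha256(data).hexdigest()


def scrub(primary: dict, replica: dict, checksums: dict) -> list:
    """Verify each object against its ingest checksum and heal it from the replica.

    primary/replica map object id -> bytes; checksums maps object id -> SHA-256 hex.
    Returns the ids that were repaired. Hypothetical sketch, not Honeycomb code.
    """
    repaired = []
    for oid, expected in checksums.items():
        if digest(primary[oid]) != expected:        # bit rot detected
            if digest(replica[oid]) != expected:
                raise OSError(f"{oid}: no intact copy available")
            primary[oid] = replica[oid]              # self-heal from the good copy
            repaired.append(oid)
    return repaired


original = b"archival master"
checksums = {"obj-1": digest(original)}
primary = {"obj-1": b"archival mastEr"}   # simulated bit rot
replica = {"obj-1": original}
print(scrub(primary, replica, checksums))  # ['obj-1']
```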

(20)


Why Honeycomb?

Open Source strategy fits with the majority of repository/archive software efforts

Standard Java and C APIs in SDK

Horizontal scaling as storage needs grow

Dublin Core is only the beginning

Platform-agnostic

[Near Future] On-board local data services available (“Storage Beans”)

(21)

Why Fedora + Honeycomb?

Answer: Archival Storage

Custom Monolithic Digital Library Application vs. Digital Library on Fedora Archive Server + Sun STK5800 Systems

To seriously address current and future issues of scalability, persistence, and flexibility, a custom monolithic digital library application will likely not meet the requirements, while a Fedora Archive Server on Sun STK5800 systems clearly will.

(22)


Why Fedora + Honeycomb?


Stack, top to bottom: application clients; content-rich applications (ePublishing, eResearch, HPC, Digital Library); Fedora Archive Server; Sun STK5800 systems

• Designed with the proper intelligence in the proper places

• Metadata integral to storage

• World-class reliability, scalability and persistency

• End-to-end Open Source Software solution

• Automated wide-area backup option

(23)

Storage Beans

Discrete Services inside Honeycomb

What will they do? That's up to you!

Asynchronous (Background)

> Transformations

> Periodic Data Scrubbing

> Duplicate Consolidation

Synchronous (Real-time)

> Audit logs

> Watermarking

> Encryption
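The slide lists what Storage Beans might do but not how they would be written, so the following is only a hypothetical illustration of the asynchronous 'Duplicate Consolidation' service: it groups objects by SHA-256 digest and maps each duplicate to one canonical object. The function name and the choice of the lexicographically first ID as canonical are assumptions; a production service would also compare bytes before merging, to rule out hash collisions.

```python
import hashlib
from collections import defaultdict


def consolidate_duplicates(objects: dict) -> dict:
    """Map each object id to the id of a canonical object with identical content.

    objects maps object id -> bytes. Within each identical-content group, the
    first id in sorted order is kept as canonical; the others could then be
    replaced by references to it. Hypothetical background-service sketch.
    """
    by_digest = defaultdict(list)
    for oid in sorted(objects):
        by_digest[hashlib.sha256(objects[oid]).hexdigest()].append(oid)

    canonical = {}
    for group in by_digest.values():
        for oid in group:
            canonical[oid] = group[0]
    return canonical


print(consolidate_duplicates({"b": b"x", "a": b"x", "c": b"y"}))
```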

(24)


Sun/Fedora Efforts

Fedora now runs on Solaris/OpenSolaris

> Server + storage reference configurations

> Inclusion of Fedora 3 in the OpenSolaris 'Indiana' repository

> Fedora on Solaris

– How does it perform?

– How does it scale?

– What are the advantages of running on Solaris?

– Best practices on Sun

(25)

Proof Point: Fedora/Sun/JHU

A Fedora/Sun Solution for Creation, Management, and Exchange of Durable, Digital e-Research Content at Johns Hopkins University

• User Applications: E-Research, Preservation Archive, Publishing

• Fedora Web Service APIs (SOAP and REST): Manage API, Access API, Registry, Search, SearchRDF

• Fedora Commons Framework, pluggable core modules: Ingest, Validate, Manage, Policy, Access, RDF Index, Store, Registry

• Supporting stores: file system (objects), RDBMS (registry), triplestore (index)

• Abstract Data Mgmt. Layer (pluggable): CMABind

• Content Storage: Fast Disk, Honeycomb, Tape
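The diagram's key point is that Fedora's core modules write through a pluggable abstract data-management layer, so fast disk, Honeycomb, or tape can sit underneath without changing the layers above. The sketch below illustrates that pluggability with a hypothetical Python interface (`StorageBackend`, `DataManagementLayer`, and the storage-class names are invented for illustration); it is not Fedora's storage API, which is implemented in Java, and in-memory backends stand in for real devices.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface for one pluggable storage tier."""

    @abstractmethod
    def put(self, pid: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, pid: str) -> bytes: ...


class InMemoryBackend(StorageBackend):
    """Stand-in for a real device (fast disk, Honeycomb, tape) in this sketch."""

    def __init__(self, name: str):
        self.name = name
        self._blobs = {}

    def put(self, pid, data):
        self._blobs[pid] = data

    def get(self, pid):
        return self._blobs[pid]


class DataManagementLayer:
    """Routes each object to a backend chosen by a caller-supplied storage class."""

    def __init__(self, backends: dict):
        self._backends = backends  # storage class -> StorageBackend
        self._location = {}        # pid -> storage class

    def ingest(self, pid: str, data: bytes, storage_class: str) -> None:
        self._backends[storage_class].put(pid, data)
        self._location[pid] = storage_class

    def retrieve(self, pid: str) -> bytes:
        return self._backends[self._location[pid]].get(pid)


fast, archive, tape = InMemoryBackend("fast disk"), InMemoryBackend("honeycomb"), InMemoryBackend("tape")
layer = DataManagementLayer({"fast": fast, "archive": archive, "tape": tape})
layer.ingest("demo:1", b"dataset", "archive")
print(layer.retrieve("demo:1"))
```

Swapping a backend means registering a different `StorageBackend` under the same storage class; the ingest and retrieve calls above it do not change.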

(26)


Honeycomb Virtual Views Example

Photo Application

• Photo demo application on top of the StorageTek 5800 or the ST5800 emulator

• Leverages metadata to organize and present the content in logical views

• Photo app extracts, stores, and displays embedded EXIF JPEG metadata
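A logical view presents the same stored objects under paths derived from their metadata, without copying data. Here is a minimal sketch, assuming metadata has already been extracted from each photo (for example, capture year and camera model from EXIF) into plain dictionaries; the `virtual_view` function and the field names are hypothetical and are not the ST5800 view-definition format.

```python
from collections import defaultdict


def virtual_view(records: list, *keys: str) -> dict:
    """Group object ids into a path-like view built from metadata fields.

    records: dicts with an 'oid' plus metadata fields; keys: fields that form
    the path levels. Missing fields appear as 'unknown'. Hypothetical sketch.
    """
    view = defaultdict(list)
    for rec in records:
        path = tuple(str(rec.get(k, "unknown")) for k in keys)
        view[path].append(rec["oid"])
    return dict(view)


photos = [
    {"oid": "p1", "camera": "CameraA", "year": 2007},
    {"oid": "p2", "camera": "CameraA", "year": 2008},
    {"oid": "p3", "camera": "CameraB", "year": 2007},
]
print(virtual_view(photos, "year", "camera"))  # view by year, then camera
print(virtual_view(photos, "camera"))          # same objects, different view
```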

(27)

Digital Repository References

• New York U. Digital Media Management

• Stanford U. OAIS Digital Repository

• Johns Hopkins U. eResearch (Fedora)

• Purdue U. eResearch (Fedora)

• Oxford University Google Project (Fedora/VITAL)

• National Library of New Zealand Digital Preservation (Ex Libris)

• California Digital Library Large Scale Digitization

• Swedish Archive for Sound/Recording Digital Media Management

• Southampton U. EPrints Repository

• San Diego Supercomputer Large Dataset Storage

• U. Michigan DSpace Repository

(28)


For More Information

• Storage Archive Manager

http://www.sun.com/storagetek/management_software/data_management/sam/index.xml/

• Honeycomb

http://www.sun.com/storagetek/disk_systems/enterprise/5800/index.xml

• Honeycomb Architecture Document:

http://www.sun.com/storagetek/disk_systems/enterprise/5800/5800-Arch-final-LR.pdf

• Sun Preservation and Archiving Community

http://www.sun-pasig.org

• Open Source Honeycomb Software

(29)

Thank You

Q&A
