
Cloud and Big Data initiatives. Mark O’Connell, EMC


(1)


Object storage systems: the underpinning of Cloud and Big Data initiatives

(2)

SNIA Legal Notice

The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.

Member companies and individual members may use this material in presentations and literature under the following conditions:

Any slide or slides used must be reproduced in their entirety without modification.

The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.

This presentation is a project of the SNIA Education Committee.

Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as, legal advice or an opinion of counsel. If you need legal advice or a legal opinion, please contact your attorney.

The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.

(3)

Abstract

Object storage systems: the underpinning of Cloud and Big Data initiatives

Object storage systems have risen to prominence in the storage industry and underlie both public and private cloud offerings. This talk will cover the needs of a cloud-based storage system, why traditional approaches are insufficient to meet this challenge, the basic paradigms and architectures of object storage systems, how the architecture of an object storage system addresses the needs of cloud-based storage, and the challenges of using object storage systems versus traditional IT systems. Additionally, this talk will compare and contrast the major object storage systems in use today, including Amazon’s S3, Microsoft’s Azure, EMC Atmos, the OpenStack Swift

(4)

This talk will cover

What is Cloud?

What are Object Storage systems?

How do Object Storage systems satisfy cloud use cases?

What is Big Data?

(5)

What is cloud?

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models.

(6)

5 Essential Cloud Characteristics

On-demand self-service.

Broad network access.

Resource pooling.

Rapid elasticity.

Measured service.

(7)

What storage meets these criteria?

On-demand self-service.

Broad network access.

Resource pooling.

Rapid elasticity.

Measured service.

(8)

Object storage: Evolution of storage

(9)

At first there was one – Block devices

Characteristics

Disk, LUN, etc.

Simple, linear block addresses

Fixed-size elements (512-byte blocks)

Atomic access at the element level

Advantages

Speed and performance

Ability to satisfy many use cases

Disadvantages

Difficult to use

Fixed-size LUNs

Caveats: LUNs can be grown; thin LUNs

Fragmentation issues
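To make the block characteristics concrete, here is a minimal Python sketch of a toy in-memory “LUN”. The class and method names are hypothetical, not a real driver interface. Addresses are plain integers (logical block addresses), the unit of access is one whole 512-byte block, and capacity is fixed when the device is created, which is the “fixed size LUN” disadvantage above.

```python
BLOCK_SIZE = 512  # fixed-size element, as on the slide


class ToyBlockDevice:
    """Hypothetical in-memory stand-in for a fixed-size LUN addressed by LBA."""

    def __init__(self, num_blocks: int) -> None:
        # Capacity is fixed at creation time: a "fixed size LUN".
        self.num_blocks = num_blocks
        self._storage = bytearray(num_blocks * BLOCK_SIZE)

    def _check_lba(self, lba: int) -> None:
        if not 0 <= lba < self.num_blocks:
            raise IndexError(f"LBA {lba} outside 0..{self.num_blocks - 1}")

    def read_block(self, lba: int) -> bytes:
        self._check_lba(lba)
        start = lba * BLOCK_SIZE  # linear addressing: byte offset = LBA * block size
        return bytes(self._storage[start:start + BLOCK_SIZE])

    def write_block(self, lba: int, data: bytes) -> None:
        self._check_lba(lba)
        # The unit of access is the whole block; partial-block writes are rejected.
        if len(data) != BLOCK_SIZE:
            raise ValueError(f"block writes must be exactly {BLOCK_SIZE} bytes")
        start = lba * BLOCK_SIZE
        self._storage[start:start + BLOCK_SIZE] = data


dev = ToyBlockDevice(num_blocks=8)
dev.write_block(3, b"\x01" * BLOCK_SIZE)
```

Everything above the block level (which block holds what, and how to find it again) is left to the host, which is why raw block devices are described as difficult to use.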

(10)

Then there were two – Filesystems

Characteristics

Built on block

Space management

Sharing – NFS, CIFS

Byte accessible

OS caching for performance

Advantages

Easy to use, human understandable addressing

Standard tools

Disadvantages

Limitations – path length, directory size, FS size, inodes, etc.

Semantics too heavyweight for some use cases
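A filesystem provides byte-level access on top of whole-block storage. The Python sketch below assumes a list of 512-byte blocks stands in for the device (the function names are hypothetical) and shows the core translation: a byte range maps to a run of block addresses, and a write that covers only part of a block becomes a read-modify-write of that block. Real filesystems add allocation, caching, and crash consistency on top of this.

```python
from typing import List

BLOCK_SIZE = 512


def blocks_for_range(offset: int, length: int) -> range:
    """Block addresses that cover the byte range [offset, offset + length)."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return range(first, last + 1)  # empty when length == 0


def write_bytes(blocks: List[bytearray], offset: int, data: bytes) -> None:
    """Byte-granular write layered on whole-block storage (read-modify-write)."""
    pos = 0  # how much of `data` has been written so far
    for lba in blocks_for_range(offset, len(data)):
        block = bytearray(blocks[lba])  # read the whole block
        block_start = lba * BLOCK_SIZE
        lo = max(offset, block_start) - block_start
        hi = min(offset + len(data), block_start + BLOCK_SIZE) - block_start
        block[lo:hi] = data[pos:pos + (hi - lo)]  # modify only the touched bytes
        pos += hi - lo
        blocks[lba] = block  # write the whole block back


disk = [bytearray(BLOCK_SIZE) for _ in range(4)]
write_bytes(disk, 510, b"abcd")  # straddles blocks 0 and 1
```

The four-byte write at offset 510 crosses a block boundary, so it rewrites two full blocks even though only four bytes change.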

(11)

The Third Amigo – Object Storage

Characteristics

Advantages

Disadvantages

(12)

Defining Characteristics

Single, Flat namespace

Location independent addressing

“Constant time” read performance

Unlimited storage, autoconfiguring

Policies at object or bucket level (policies at user-defined container level)

(13)

Typical Characteristics

Scale out

Distributed

(14)

Common Characteristics

REST/HTTP for internet/mobile access

Multi-tenancy

Self-service

Provisioning

Metering

(15)

Single, Flat Namespace

Key/Value store (Amazon S3, Azure, Swift)

May look like a pathname, e.g. /foo/bar/filename

Unique within a bucket

Ability to list keys by common prefix, up to a delimiter

Otherwise no directory semantics

Typically used with a consistent hashing algorithm

Unique Identifier (Atmos, XAM)

Opaque character string

Must typically be used with an application database to store the identifiers
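The prefix/delimiter listing is what lets a flat key space imitate folders. The following Python function is a local sketch of those semantics, modelled loosely on the prefix and delimiter parameters of S3-style list calls. It is not any service's API. Keys that share the prefix and contain the delimiter further on are rolled up into a single “common prefix”, and nothing else behaves like a directory.

```python
def list_keys(keys, prefix="", delimiter=None):
    """Emulate a prefix/delimiter listing over a flat key space.

    Returns (matching_keys, common_prefixes). No directories exist:
    a "folder" is only a shared prefix of key strings.
    """
    contents, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll everything up to (and including) the next delimiter into one entry.
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common)


bucket = {"/foo/bar/file1", "/foo/bar/file2", "/foo/readme", "/other"}
print(list_keys(bucket, prefix="/foo/", delimiter="/"))
# (['/foo/readme'], ['/foo/bar/'])
```

Without a delimiter the same call simply returns every key that starts with the prefix, which is the “otherwise no directory semantics” point: renaming a “folder” means rewriting every key under it.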

(16)

Location independent addressing


No relationship between a name and the location of the object

Two objects with similar “names” are not necessarily colocated

Block systems: Reading N sequential blocks is faster than reading N random blocks

Flash-based storage narrows this gap

Filesystem: Accessing /foo/bar/file1 may speed up access to /foo/bar/file2

(caching effects)

Filesystem: Reading from /foo/bar/file1 normally speeds up future reads of that same file

“Constant time” read performance

Each object read completes in roughly the same time

Typically achieved via a hash on the object name
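A sketch of hash-based placement in Python, assuming four hypothetical nodes and SHA-256 as the hash. Finding an object costs one hash computation regardless of how many objects exist or what their names look like, and /foo/bar/file1 and /foo/bar/file2 are placed independently of each other. This plain modulo scheme is deliberately naive: changing the node count remaps most objects, which is why a consistent hashing algorithm is typically used instead.

```python
import hashlib
from typing import List

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical storage nodes


def locate(name: str, nodes: List[str] = NODES) -> str:
    """Return the node that stores `name`, computed from the name alone."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # One hash and one modulo: no directory walk, no index lookup,
    # and no dependence on how many objects are already stored.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


for name in ["/foo/bar/file1", "/foo/bar/file2", "/foo/bar/file3"]:
    print(name, "->", locate(name))
```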

(17)

Unlimited storage and policies

Unlimited storage, autoconfiguring

Adding storage to the system automatically expands capacity available for all users

“No limits” on total storage, total objects, etc.

Policies at user defined container level

Amazon – bucket level policies, reduced redundancy at object level

Azure – storage account level policies

Atmos – per object policies
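One way a hashed namespace can absorb new capacity without manual reconfiguration is a consistent-hash ring, the technique mentioned for key/value stores. In this Python sketch the node names, 100 virtual nodes per node, and SHA-256 are all assumptions, and production rings differ in detail. Adding a fifth node relocates only the objects whose ring positions the new node takes over (roughly a fifth of them in expectation), and every relocated object moves to the new node rather than being shuffled among the old ones.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self.points = []  # sorted ring positions
        self.owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node claims many points so load spreads evenly across nodes.
        for i in range(self.vnodes):
            point = ring_hash(f"{node}#{i}")
            if point in self.owner:
                continue  # ignore the (very unlikely) collision with an existing point
            bisect.insort(self.points, point)
            self.owner[point] = node

    def locate(self, key: str) -> str:
        # Owner is the first point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self.points, ring_hash(key)) % len(self.points)
        return self.owner[self.points[idx]]


keys = [f"object-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.locate(k) for k in keys}

ring.add_node("node-e")  # add storage; no per-object configuration
after = {k: ring.locate(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.1%} of objects moved")
```

With the naive modulo placement, going from four to five nodes would instead move most objects, which is the main reason rings like this are used.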

(18)

Per-object user definable metadata

(19)


(20)


Applications embed metadata in files using an application-specific mechanism

What’s different in an object storage system?

Object systems provide standard ways to store metadata

Enables easier processing of the metadata

Allows different applications to manipulate or process the metadata
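A minimal in-memory sketch in Python of what “standard ways to store metadata” means in practice, assuming a hypothetical store with put/get/head calls. User-defined name/value pairs travel with the object but are stored beside the payload, so a second application can read or add metadata without parsing or rewriting the data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StoredObject:
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)  # user-defined name/value pairs


class ToyObjectStore:
    """Hypothetical in-memory store: metadata is kept beside the data, not inside it."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, metadata: Optional[Dict[str, str]] = None) -> None:
        self._objects[key] = StoredObject(data, dict(metadata or {}))

    def get(self, key: str) -> bytes:
        return self._objects[key].data

    def head(self, key: str) -> Dict[str, str]:
        # Metadata only, no payload (compare an HTTP HEAD request).
        return dict(self._objects[key].metadata)

    def set_metadata(self, key: str, name: str, value: str) -> None:
        # A different application can annotate the object without rewriting its data.
        self._objects[key].metadata[name] = value


store = ToyObjectStore()
store.put("photos/beach.jpg", b"\xff\xd8\xff", {"camera": "phone", "taken": "2013-03-01"})
store.set_metadata("photos/beach.jpg", "tags", "vacation")  # added later by another tool
print(store.head("photos/beach.jpg"))
# {'camera': 'phone', 'taken': '2013-03-01', 'tags': 'vacation'}
```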

(21)

Object Storage: Typical Characteristics

Scale out

Distributed

(22)

Object Storage: Typical Characteristics

Eventually consistent

Write an object with all “1”s

Overwrite the object with all “2”s

Read from the object: with strong consistency, you get “2”s

With eventual consistency, you may get “1”s, but will eventually get “2”s

You will not get “5”s (a value that was never written)

Areas that commonly exhibit eventual consistency

Quorum reads

Number of object replicas

Asynchronous replication with reads allowed from replicas

Listing after a write (a new object may not appear in listings immediately)
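The “1”s/“2”s scenario can be reproduced with a toy model of asynchronous replication, sketched in Python under simple assumptions: three replicas, every write accepted by replica 0, and copies to the other replicas queued and applied later in write order. The class is hypothetical. A read from a lagging replica returns the old value, never a value that was never written, and once replication catches up every replica returns the new value.

```python
from collections import deque

PRIMARY = 0  # assumption: all writes are accepted by replica 0


class ToyReplicatedStore:
    """Hypothetical store: a write lands on the primary; copies to the rest are applied later."""

    def __init__(self, n_replicas: int = 3) -> None:
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = deque()  # queued (replica index, key, value) copies, oldest first

    def put(self, key, value) -> None:
        self.replicas[PRIMARY][key] = value
        for i in range(len(self.replicas)):
            if i != PRIMARY:
                self.pending.append((i, key, value))

    def get(self, key, replica: int):
        # Reads are served by whichever replica the client reaches; no quorum.
        return self.replicas[replica].get(key)

    def replicate(self) -> None:
        # Apply queued copies in write order, so each replica ends with the latest write.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value


store = ToyReplicatedStore(n_replicas=3)
store.put("obj", b"1111")
store.replicate()                                # every replica holds the "1"s
store.put("obj", b"2222")                        # only the primary has the "2"s so far
print([store.get("obj", r) for r in range(3)])   # [b'2222', b'1111', b'1111']
store.replicate()                                # asynchronous copies catch up
print([store.get("obj", r) for r in range(3)])   # [b'2222', b'2222', b'2222']
```

A quorum read would instead query several replicas and return the newest value among them, trading extra read latency for fresher results.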

(23)

Common Characteristics

REST/HTTP for internet/mobile access

Multi-tenancy

More than logical data segregation

Separation of users, administrators, etc.

E.g., a per-filesystem administrator, a per-LUN administrator

Self-service

Provisioning

Metering
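Object access over REST maps each operation onto an HTTP verb against a URL that names the bucket and key, and each request carries everything the server needs. The Python sketch below only builds the requests with the standard library's urllib.request.Request and does not send them. The endpoint, the per-tenant bucket name, and the x-meta- metadata header prefix are hypothetical (S3, for example, uses x-amz-meta- and Swift uses X-Object-Meta-), and authentication is omitted.

```python
from urllib.parse import quote
from urllib.request import Request

ENDPOINT = "https://objects.example.com"  # hypothetical service endpoint
META_PREFIX = "x-meta-"  # hypothetical user-metadata header prefix


def object_url(bucket: str, key: str) -> str:
    # quote() leaves "/" unescaped by default, so path-like keys stay readable.
    return f"{ENDPOINT}/{quote(bucket)}/{quote(key)}"


def put_object(bucket: str, key: str, data: bytes, metadata: dict) -> Request:
    headers = {"Content-Type": "application/octet-stream"}
    headers.update({META_PREFIX + name: value for name, value in metadata.items()})
    return Request(object_url(bucket, key), data=data, headers=headers, method="PUT")


def get_object(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="GET")


def delete_object(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="DELETE")


req = put_object("tenant-a-photos", "2013/03/cat.jpg", b"\xff\xd8", {"camera": "phone"})
print(req.get_method(), req.full_url)
# PUT https://objects.example.com/tenant-a-photos/2013/03/cat.jpg
```

Sending a request would go through urllib.request.urlopen(req), plus whatever authentication headers the service requires. Because no session state lives on the server, any node behind the endpoint can handle any request, which fits the scale-out design.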

(24)

Object Storage Examples

Clouds powered by object storage

(25)

Summary: Storage Evolution

With Block Storage:

Data is organized as an array of unrelated blocks

Hosts directly access blocks

With File/NAS Storage:

Data is organized as an array of unrelated blocks

Onboard file system places data on disk

External systems directly access files within the onboard file system

With Object Storage:

Application centric data storage, access, and management model

Stores virtual containers that encapsulate data, data attributes, metadata, and object IDs/keys

With Cloud Storage:

Nearly limitless scalability designed to manage the explosion of unstructured data

Capable of scaling across multiple physical locations, regardless of distance

Advanced, policy-driven data management based on attributes and metadata

Flexible access methods that support traditional object storage models and new web-based application architectures

(26)

Object storage satisfies cloud needs

On-demand self-service.

Unlimited storage, multi-tenancy, self-provisioning

Broad network access.

Scale out architecture, REST/HTTP

Resource pooling.

Single flat namespace, unlimited storage, autoconfiguring, user-defined policy scope, scale out, distributed

Rapid elasticity.

Single flat namespace, autoconfiguring, location independent addressing, scale out

Measured service.

(27)
(28)

Big Data Definition

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

(29)

Where does Big Data come from?

Shopping habits

Web surfing

Social media and picture sharing sites

Car and appliance sensors

(30)

Leveraging Big Data

Massive data collection

Collected across horizontal applications

What do online sites like Netflix, Pandora, Amazon, and others do?

Utilize data across customers

(31)

Big data needs

Broad network access.

Scale out architecture, REST/HTTP

Resource pooling.

Single flat namespace, unlimited storage, autoconfiguring, user-defined policy scope, scale out, distributed

Rapid elasticity.

Single flat namespace, autoconfiguring, location independent addressing, scale out

Commonality with cloud

(32)

Object systems uniquely suit Big Data

Per-object metadata gives context to drive analytics

Medical records from patient X

Records from patients with an asthma diagnosis

Metadata may be added as an agile response to real-world situations
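Metadata-driven selection can be sketched as a filter over each object's user metadata. The Python below uses a hypothetical in-memory index and made-up records; at Big Data scale the same query would run against a metadata index or a distributed job rather than a linear scan. It answers both of the slide's example queries and then adds a new tag after the fact, without touching the records' data.

```python
from typing import Dict, List

# Hypothetical metadata, as it might be gathered from each object's user metadata.
metadata_index: Dict[str, Dict[str, str]] = {
    "records/0001": {"patient": "X", "diagnosis": "asthma"},
    "records/0002": {"patient": "Y", "diagnosis": "asthma"},
    "records/0003": {"patient": "X", "diagnosis": "fracture"},
}


def select(index: Dict[str, Dict[str, str]], **criteria: str) -> List[str]:
    """Keys of objects whose metadata matches every name=value pair given."""
    return sorted(
        key for key, meta in index.items()
        if all(meta.get(name) == value for name, value in criteria.items())
    )


print(select(metadata_index, patient="X"))         # ['records/0001', 'records/0003']
print(select(metadata_index, diagnosis="asthma"))  # ['records/0001', 'records/0002']

# Responding to a new situation: tag a record later, leaving its data untouched.
metadata_index["records/0002"]["followup"] = "required"
print(select(metadata_index, diagnosis="asthma", followup="required"))  # ['records/0002']
```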

(33)
(34)

Object storage systems

Object storage systems build on block and file storage

Automate the management of these devices

Object storage systems are optimized for new usage patterns

Non-local access: Web, mobile, etc.

Stateless access: REST, HTTP

Scale out access across many independent streams

Scalable access: Built for multiple independent use cases

Needs of cloud systems overlap with object systems

Scale, Elasticity, Pooling of resources, Broad network access

(35)

Attribution & Feedback

The SNIA Education Committee thanks the following

individuals for their contributions to this Tutorial.

Authorship History

Mark O’Connell

March 2013

Additional Contributors

Michelle Scardino

March 2013

Joseph White

March 2013
