Cloud Storage: Efficient and Sustainable Solutions for Data Growth


(1)

Cloud Storage:

Efficient and Sustainable Solutions for Data Growth

Dalit Naor, PhD

Manager, Storage Research

IBM Haifa Research Lab

(2)


Who are we?

(3)

IBM Research: “The World is our Lab”

Labs: China, Watson, Almaden, Austin, Tokyo, Haifa, Zurich, India, Dublin, Melbourne, Brazil, Kenya

(4)


IBM Research – Haifa



• Established in 1972

• Largest IBM Research facility outside the US

• Spanning all IBM Research strategy areas

• Working with IBM business units and IBM clients worldwide

• Collaborating with academia and industry

(5)

IBM Research – Haifa: What we do

• Cloud Computing
• Big Data Analytics
• Quality
• Storage
• Optimization
• Collaboration & Social Networking

(6)


EU projects

Valuable projects

• Collaborative innovation - partners sharing their knowledge

• Extensive portfolio of collaborations with over 100 European universities, and 130 clients and partners

(7)

IBM Storage Research: Major Research Initiatives

IBM Storage Today and Tomorrow – major initiatives:

• Solid State Information Systems
• Digital Preservation & Archival Systems
• Scale Out and Cloud Storage
• Autonomic Storage Management
• Exploratory Storage Systems
• Storage for Analytics

(8)


Motivation – why care about Storage?

Or, why is Storage the most relevant topic in IT today?

(9)
(10)


Amazon S3 Growth: The First Trillion Objects (June 2012)

(11)

So what's the problem?

• Data is growing faster than the IT infrastructure investment required to support it – the "data deluge gap"
  – IT budgets are growing only 5%-7% per year
  – Information is growing faster than the investment required to store, transmit, analyze and manage it
  – Energy consumption

• Why?
  – More automation and collection of data
  – More multi-media data (e.g., online video, surveillance, medical imaging)
  – More sharing of data
  – Maintaining duplicates

• What can be done?
  – Store less (expire, compress, …)
  – Store it "in the right place"
  – Optimize existing systems

(12)


(13)

Impact of Storage Technologies on Energy Efficiency

• Capacity optimization
  – Compression and de-duplication

• Data tiers and technologies
  – Disk types
  – Flash / SSD
  – Tape

• Cloud – a disruptive effect on storage
  – Low-cost storage (consumer drives)
  – Data replication

(14)


(15)

Topics

1. Optimizing Storage through Cloud Technologies
   – Object Stores
   – Block storage for cloud compute

2. Data Reduction
   – Data Compression and Deduplication techniques
   – Smart Compression and Deduplication decisions

(16)


Different cloud workloads need different classes of storage

• High-performance, co-located storage for XaaS
  – Blocks/files to support compute (e.g., Amazon EBS, OpenStack Nova)

• General-purpose data center NAS extension
  – Files

• Fixed content depot
  – Objects (e.g., Amazon S3, OpenStack Swift)

(17)

Cloud (Online) Storage

• Networked online storage

• Data is stored in virtualized pools of storage
  – May span multiple data centers

• Typically hosted by a third party

• Customers use it to store files or data objects

→ Cloud Object Storage protocols

(18)


How is it done? – the Internals



Cloud File Systems

(19)

Scalable File Systems

• Different design points than traditional file systems
  – New architecture
  – New, "relaxed" protocols and system operations (I/O and management)
  – New solutions for resiliency and high availability based on replication (e.g., not RAID)
  – Support for computation
  – Designed for new workloads: large streaming, sequential writes, or analytics

• Assumptions
  – Based on commodity hardware
  – Components always fail – need self-monitoring to detect, tolerate, and recover from failures
  – Optimized for large files

(20)


Cloud Object Storage – OpenStack Swift

Source: SwiftStack documentation, http://swiftstack.com/openstack-swift/architecture/

• RESTful APIs

• Swift storage URL: http://swift.company.com/v1/account/container/object (a simple PUT/GET interaction is sketched below)
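As a rough illustration of the RESTful access model, here is a minimal Python sketch using the requests library to create a container and upload/download an object against a Swift-style URL. The endpoint, account, auth token, and local file name are placeholders invented for illustration, not values from the slides.

import requests

# Placeholders: endpoint, account and token are illustrative, not real values.
BASE = "http://swift.company.com/v1/account"
HEADERS = {"X-Auth-Token": "AUTH_tk_example"}

# Create a container, then PUT and GET an object under it.
requests.put(BASE + "/photos", headers=HEADERS).raise_for_status()

with open("pudong.jpg", "rb") as f:
    requests.put(BASE + "/photos/pudong.jpg",
                 headers=dict(HEADERS, **{"Content-Type": "image/jpeg"}),
                 data=f).raise_for_status()

resp = requests.get(BASE + "/photos/pudong.jpg", headers=HEADERS)
resp.raise_for_status()
print(len(resp.content), "bytes retrieved")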

(21)

Cloud Object Storage – OpenStack Swift

Building Blocks
– Proxy Servers: handle all incoming API requests
– Rings: map logical names of data to locations on particular disks
– Zones: failure domains

(22)


Cloud Object Storage – OpenStack Swift

Source: SwiftStack documentation, http://swiftstack.com/openstack-swift/architecture/

• Data model
  – Accounts: "tenants"
  – Containers: sets of objects
  – Objects: the data itself, mapped to files on the local file system
  – Partitions/Containers: manage the locations where data lives in the cluster

• Replication
  – Everything is stored three times (by default)
  – Upon a disk failure, the data is replicated to other zones, ensuring three copies (see the sketch below)
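To make the ring idea concrete, here is a minimal sketch (not Swift's actual code) of how an object path can be hashed to a partition and how a partition's replicas can be spread across distinct zones. The part_power value, device list, and placement loop are invented for illustration.

import hashlib

PART_POWER = 8                      # 2**8 = 256 partitions (illustrative)
DEVICES = [                         # (device, zone) pairs, invented
    ("d1", 1), ("d2", 1), ("d3", 2), ("d4", 2), ("d5", 3), ("d6", 3)]
REPLICAS = 3

def partition_of(path: str) -> int:
    """Hash /account/container/object down to a partition number."""
    h = int(hashlib.md5(path.encode()).hexdigest(), 16)
    return h >> (128 - PART_POWER)

def place_replicas(part: int):
    """Pick REPLICAS devices for a partition, one per zone where possible."""
    chosen, zones_used = [], set()
    for i in range(len(DEVICES)):
        dev, zone = DEVICES[(part + i) % len(DEVICES)]
        if zone not in zones_used:
            chosen.append(dev)
            zones_used.add(zone)
        if len(chosen) == REPLICAS:
            break
    return chosen

part = partition_of("/account/photos/pudong.jpg")
print(part, place_replicas(part))   # partition number and 3 devices in 3 zones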

(23)

VISION Cloud: Virtualized Storage Services Foundation for the Future Internet



• Architect and build the next-generation, scalable, low-cost and secure storage cloud system

• Key Innovations:
  – Raise the abstraction level of storage
  – Computational storage
  – Content-centric storage
  – Advanced capabilities
  – Data mobility and federation

• Four use cases to demonstrate data-intensive services

(24)


The evolution of Object Stores Research at IBM

• Object stores – an active area of research at IBM for the past 5 years

• Research focus evolved over time: how to build a basic scalable object store → advanced capabilities for object stores → Swift

(25)

1. Rich Metadata Support – Metadata is an integral part of objects and can describe the content and how it is handled. Provide queries and indexing over metadata while supporting scalability.

2. Multi-tenancy – Provide secure logical isolation between tenants to enable hosting of many tenants over the same shared infrastructure. A user of one tenant cannot access the storage of another tenant, and a security breach in one tenant cannot be leveraged to breach another tenant.

3. Computational Storage – "Stored procedures" for a storage cloud. Provide the ability to run computations (storlets) safely and securely, close to the data in Swift. Enables extending Swift without changing its code.

4. On-boarding – Provide an on-boarding service. Relationships are set with containers on the old provider, and the client starts working with the new cloud.

(26)


Index and Query of User Metadata is viewed as a critical feature by our VISION Cloud partners

• A catalog maintains, for each object in a container, a list of the attributes and attribute-value pairs
  – A content-centric query requires a look-up in the catalog

• Example (schematic) – list all red objects:

    GET /MyContainer/ HTTP/1.1
    . . .
    Match-md: Attribute='color' x-Value='red'

• Response (schematic):

    HTTP/1.1 200 OK
    Content-Type: application/json
    { "children" : [ "Obj 2", "Obj 3" ] }

• The catalog for this container:

    Attribute   Value      Object
    color       red        Obj 3
    shape       square     Obj 2
    shape       triangle   Obj 1
    color       blue       Obj 1
    color       red        Obj 2
    shape       square     Obj 3
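A minimal in-memory sketch of this kind of catalog lookup; the real VISION Cloud catalog is a distributed, persistent index, and the Python structures here are purely illustrative.

from collections import defaultdict

# (attribute, value) -> set of object names, mirroring the catalog table above
catalog = defaultdict(set)
rows = [("color", "red", "Obj 3"), ("shape", "square", "Obj 2"),
        ("shape", "triangle", "Obj 1"), ("color", "blue", "Obj 1"),
        ("color", "red", "Obj 2"), ("shape", "square", "Obj 3")]
for attr, value, obj in rows:
    catalog[(attr, value)].add(obj)

def query(attribute: str, value: str):
    """Content-centric query: list all objects whose metadata matches."""
    return sorted(catalog[(attribute, value)])

print(query("color", "red"))   # ['Obj 2', 'Obj 3'] -- the "red" objects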

(27)

Computations (storlets) running in the object store save network bandwidth and increase security

• Restricted modules executed in the storage, close to the data
  – Analogous to database stored procedures
  – Dynamically loaded

• Execute in a sandbox

• Triggered synchronously or asynchronously by events
  – Any RESTful request can trigger a storlet
  – Can run in the background or act as a "transform"

• Benefits
  – Locality – avoid network overhead
  – Security – avoid transferring data outside of the cloud
  – Automated execution

[Figure: a PUT of object "Pudong Feb 2012" (mimetype = jpeg, category = vacation picture, location = Shanghai) matches the trigger of a Thumbnail Creator storlet (object-type = storlet; put-object trigger: mimetype = jpeg, category = vacation picture), which produces "Pudong Feb 2012 thumbnail".]
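The sketch below mimics the flow in the figure: a storlet-like callback registered on a trigger condition runs next to the data when a matching object is PUT. The registration API and the thumbnail routine are invented for illustration and are not the VISION Cloud storlet interface.

# Toy event-driven "storlet" engine: run registered computations next to the data.
storlets = []   # list of (trigger_predicate, computation)

def register_storlet(trigger, computation):
    storlets.append((trigger, computation))

def put_object(name, data, metadata, store):
    store[name] = (data, metadata)
    for trigger, computation in storlets:
        if trigger(metadata):                      # synchronous trigger on PUT
            computation(name, data, metadata, store)

def thumbnail_creator(name, data, metadata, store):
    # Placeholder "transform": a real storlet would decode the JPEG and resize it.
    store[name + " thumbnail"] = (data[:1024], {"derived-from": name})

register_storlet(
    trigger=lambda md: md.get("mimetype") == "jpeg"
                       and md.get("category") == "vacation picture",
    computation=thumbnail_creator)

store = {}
put_object("Pudong Feb 2012", b"\xff\xd8...",
           {"mimetype": "jpeg", "category": "vacation picture",
            "location": "Shanghai"}, store)
print(list(store))   # original object plus its thumbnail, created in-store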

(28)


On-Boarding Federation allows a tenant to switch to a new cloud storage provider on-the-fly

[Figure: on-boarding architecture – an Admin Module, a Federator with federation jobs, and direct federation paths (steps 1, 2a, 2b, 3) connect the old cloud and NAS/other storage with the new cloud.]

(29)

VISION Cloud Telco Use Case (FT, Telenor)

Value: easily develop value-add services
(Figure callouts: extract and associate metadata; group by geo-coordinates)

• What is the use case
  – Storage capacity for telco users with value-add applications, including transcoding on-the-fly
  – Cloud provider

• Innovations needed
  – Rich metadata, content-centric access, storlets, secure multi-tenancy

• Status

(30)


(31)

Overview of OpenStack: Key components

[Diagram of key components: Horizon, Nova, Swift, Networking, Glance, Oslo; new in Havana: Metering (Ceilometer), Basic Cloud Orchestration & Service Definition (Heat).]

(32)


Example: Cinder Volume Migration for the Havana Release

• Goal: migrate a volume's data from one location to another in a manner that is as transparent as possible to users and workloads
  – Analogous to VMware's Storage vMotion

• Use cases targeted for Havana:
  – Storage maintenance / decommissioning
  – Modifying volume types (enable/disable compression, Easy Tier, etc.)

• Design tenet:
  – Users today are not aware of a volume's location, and should therefore not be aware of, or directly control, volume migration
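A hedged sketch of how an administrator might drive such a migration from python-cinderclient. The credentials, endpoint, volume ID, and destination host below are placeholders, and the exact Client and migrate_volume signatures are assumptions that may differ between client releases.

from cinderclient import client

# Placeholder credentials and Keystone endpoint (not from the slides).
cinder = client.Client('1', 'admin', 'secret', 'admin',
                       'http://controller:5000/v2.0')

vol = cinder.volumes.get('VOLUME-ID-PLACEHOLDER')
# Ask Cinder to move the volume's data to another back end, transparently to
# the attached workload; force_host_copy=True would force a generic
# host-assisted copy instead of a driver-native migration.
cinder.volumes.migrate_volume(vol, 'newhost@backend', force_host_copy=False)
print(cinder.volumes.get(vol.id).status)   # e.g. 'migrating', then 'available'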

(33)

Topics

1. Optimizing Storage through Cloud Technologies
   – Object Stores
   – Block storage for cloud compute

2. Data Reduction
   – Data Compression and Deduplication techniques
   – Smart Compression and Deduplication decisions

(34)


Compression and Deduplication

As CPU processing power continues to scale, compression and deduplication functions will continue to move into storage controllers. The challenge and the opportunity will be to coordinate effective implementation of the deduplication and/or compression functions with approaches that happen further upstream from the storage.

(35)



• Real-time Compression is seamlessly integrated with the Storwize family GUI
  – Simply select the new volume preset

• Straightforward compression of existing volumes using volume mirroring
  – Convert to compressed and eliminate unused space during conversion

(36)


Typical Compression Rates

• Comprestimator: a host-based utility that can be used to evaluate expected compression benefits for existing environments
  – About 1 minute of analysis per device
  – Very accurate (less than 5% error)
  – Supported on a wide range of clients
  – Evaluates expected compression on any storage system

Typical compression rates by workload:

    Databases                        50-80%
    Server/Desktop Virtualization    45-75%
    Collaboration Data               20-75%
    Engineering Data                 50-80%
    E-mail                           30-80%

(37)

Our scope – Real-Time Compression

• Compression for primary data in enterprise storage systems

• Many benefits to compression. Reduces:
  – Costs
  – Rack space
  – Cooling
  – Can delay the need for additional purchases for existing systems

• The challenge: add "seamless" compression to a storage system with little effect on performance

A bit about compression techniques:

• We focused on zlib – a popular compression engine (zip). It combines LZ77 dictionary matching with Huffman coding.

(38)


Estimating compression ratios – some motivation

• To zip or not to zip?
  – Compression does not come for free: it incurs overheads, sometimes significant
  – Not always worth the effort – depends on the actual data
  – Goal: avoid compressing "incompressible" data

• Other potential benefits: evaluation and sizing
  – Compression ratio → number of disks → money!
  – Evaluation: should I invest in a storage system with compression?
  – Sizing: how many disks should I buy?
  – Especially in a storage system with a high disk to

(39)

Existing solutions

• Rules of thumb
  – Deduce from past experience with similar applications
  – By file extension (e.g., .jpg, .doc, .ppt, .vmdk, .zip)
    – Not always accurate
    – Not always available

• Look at the actual data
  – Scan and compress everything – takes too long
  – Look at a prefix (of a file/chunk) and deduce about the rest – no guarantees on the outcome (a sketch of this heuristic follows)
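For example, the prefix heuristic above can be sketched in a few lines: compress only the first chunk of a file with zlib and skip files whose prefix does not compress well. The chunk size and threshold here are arbitrary choices, and, as the slide notes, the prefix gives no guarantee about the rest of the data.

import zlib

def prefix_looks_compressible(path, prefix_bytes=64 * 1024, threshold=0.9):
    """Heuristic: compress the first prefix_bytes and compare sizes.

    Returns True if the prefix shrinks below threshold * original size,
    i.e. the file is probably worth compressing in full."""
    with open(path, "rb") as f:
        prefix = f.read(prefix_bytes)
    if not prefix:
        return False
    ratio = len(zlib.compress(prefix)) / len(prefix)
    return ratio < threshold

# Usage: already-compressed data such as .jpg or .zip is skipped automatically.
# print(prefix_looks_compressible("/var/log/syslog"))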

(40)




Part I – The Macro

• Input: a large volume of data – a block volume, file system, etc.
• Goal: estimate the overall compression ratio with an accuracy guarantee

The framework:
• Choose m random locations
• Compute the average of the compression ratios of these locations

Open questions:
• What is a "location"?
• What is the "compression ratio of a location"?
• How do we get a guarantee? (a sketch of the sampling loop follows)
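A minimal sketch of the macro framework, assuming a "location" is a fixed-size block at a random offset and its "compression ratio" is measured with zlib. The block size, sample count, and target path are illustrative choices, not the actual tool's parameters.

import os, random, zlib

def estimate_compression_ratio(path, m=1000, block_size=32 * 1024):
    """Average the zlib compression ratio of m randomly chosen blocks."""
    size = os.path.getsize(path)
    ratios = []
    with open(path, "rb") as f:
        for _ in range(m):
            offset = random.randrange(max(size - block_size, 1))
            f.seek(offset)
            block = f.read(block_size)
            ratios.append(len(zlib.compress(block)) / len(block))
    return sum(ratios) / len(ratios)

# Usage (on any large file or raw device the process can read):
# print(estimate_compression_ratio("/dev/sdb", m=2000))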

(41)

The Macro-scale – Sample size and accuracy

Analysis yields a Hoeffding-type bound on the failure probability:

    Pr[ |estimate − true ratio| > Accuracy ] ≤ 2e^(−2m·Accuracy²)

• Accuracy is a bound on the additive error
• Plug the desired confidence and accuracy into the equation to get the required sample size
• The sample size is independent of the volume size!
• The results of an estimator run are normally distributed around the actual compression ratio
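Rearranging that bound gives the required sample size directly; the sketch below computes m for a chosen additive accuracy and failure probability (the example values are invented, not taken from the slides).

import math

def required_samples(accuracy, failure_prob):
    """Smallest m such that 2*exp(-2*m*accuracy**2) <= failure_prob."""
    return math.ceil(math.log(2 / failure_prob) / (2 * accuracy ** 2))

# Example: estimate the compression ratio to within +/-2% with 99% confidence.
print(required_samples(accuracy=0.02, failure_prob=0.01))   # ~6624 samples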

(42)


The Macro-scale – the Actual Tool

• Written in C
• Multi-threaded
• Two implementations:
  1. IBM Real-Time Compression
  2. Zlib compression on full objects
• Tested on real-life data
• Example of a run: 73 seconds on a 3.2 TB volume, error ~0.5%
  – An exhaustive run took almost 4 hours

IBM Comprestimator – the macro-scale estimator for IBM Real-time Compression on Storwize V7000 and SAN Volume Controller (downloadable).

(43)

Topics

1. Optimizing Storage through Cloud Technologies
   – Object Stores
   – Block storage for cloud compute

2. Data Reduction
   – Data Compression and Deduplication techniques
   – Smart Compression and Deduplication decisions

(44)


The Green Grid – DCcE: Data Center Compute Efficiency

• Primary services
  – A server is usually commissioned to provide one or more specific services

• Server Compute Efficiency (ScE)
  – Represents the percentage of time spent doing primary services

• DCcE: Data Center Compute Efficiency
  – Calculated by averaging ScE over all servers

(45)

The Green Grid Data Center Storage Efficiency Metrics (DCsE): The Application of Storage Systems Efficiency Operational Metrics

• The Green Grid is proposing the use of a new family of metrics – Data Center storage Efficiency (DCsE)
• These metrics represent efficiency at the data center level

(46)


Operational Metrics – Capacity

Data Center Storage Efficiency – Capacity (DCsE_CAP)

• Data Center User Capacity in Use is defined as the storage space used by applications and measures, in GB, the total amount of file system space consumed by all applications, as seen from the application point of view.

• Data Center Storage Power Consumption is defined as the average power consumption of the storage system, measured over a long enough period of time that it is representative of the storage system's behavior (for the measured capacity). It is measured in watts.

    DCsE_CAP = Data Center User Capacity in Use / Data Center Storage Power Consumption

(47)

Operational Metrics – Workload

Data Center Storage Efficiency – Workload (DCsE_IO)

• Data Center I/O Throughput is defined as the number of I/O operations per second that the applications execute (i.e., the applications' I/O rates). It is measured in IOPS.

• Data Center Storage Power Consumption is defined as the average power consumed by the storage system while running the I/O workload over a long enough period of time. It is measured in watts.

    DCsE_IO = Data Center I/O Throughput / Data Center Storage Power Consumption

(48)


Operational Metrics – Throughput

Data Center Storage Efficiency – Throughput (DCsE_TP)

• Data Center Data Transfer Throughput is defined as the amount of data transferred per second by the applications (i.e., the applications' data rates), and it is measured in MBps.

• Data Center Storage Power Consumption is defined as the average power consumed by the storage system while running the data transfer workload over a long enough period of time. It is measured in watts.

    DCsE_TP = Data Center Data Transfer Throughput / Data Center Storage Power Consumption
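All three metrics are simple ratios, so a small helper makes their units explicit; the sample numbers in the usage line are invented purely for illustration.

def dcse_cap(capacity_in_use_gb, storage_power_watts):
    """Data Center Storage Efficiency - Capacity, in GB per watt."""
    return capacity_in_use_gb / storage_power_watts

def dcse_io(io_throughput_iops, storage_power_watts):
    """Data Center Storage Efficiency - Workload, in IOPS per watt."""
    return io_throughput_iops / storage_power_watts

def dcse_tp(transfer_throughput_mbps, storage_power_watts):
    """Data Center Storage Efficiency - Throughput, in MBps per watt."""
    return transfer_throughput_mbps / storage_power_watts

# Invented example: 200 TB in use, 40,000 IOPS, 1,500 MBps on a 6 kW system.
print(dcse_cap(200_000, 6_000), dcse_io(40_000, 6_000), dcse_tp(1_500, 6_000))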

(49)

Optimizing Operational Metrics using Energy-Saving Features

• Scenario #1 – Capacity Planning and Data Migration
  – It is well known that storage systems do not operate at peak throughput and capacity at all times. Moreover, storage workloads change over time, due to changes in the applications, how the applications are used, and the storage hardware itself. Monitoring the values of the DCsE operational metrics over time provides a perspective on the current state of efficiency and whether the storage system can be optimized.
  – The DCsE capacity and workload metrics can be optimized by moving data between storage systems or tiers with different performance and energy-consumption characteristics, based on the workload and application classification (e.g., business-critical, high-performance, high-availability, administrative).

• Scenario #2 – Capacity Optimization: Compression and De-duplication
  – Implementing storage capacity optimization techniques, such as de-duplication and compression, allows data centers to store more data on their systems. Data centers will

(50)


Optimizing Operational Metrics using Energy-Saving Features

• Scenario #3 – Advanced RAID Levels and Snapshots
  – Different resiliency (e.g., RAID) levels incur different storage capacity overheads. For example, RAID 10 incurs 50% capacity overhead; RAID 5 and RAID 6 incur different levels according to the number of data and parity (or mirrored) disks. However, in addition to capacity overhead, maintaining the parity incurs additional overhead when writing data. Thus, the choice of resiliency level affects DCsE capacity, DCsE workload, and DCsE throughput. Data centers should take into account that both the application workload and the resiliency level affect the energy efficiency of the storage system.
  – In addition, the number of online snapshots defined for rapidly changing data and the number of backup copies affect the underlying storage capacity requirements and energy consumption. Adjusting the number of snapshots and backups affects the DCsE capacity metric and allows data centers to store more data on the same system.

• Scenario #4 – Low Power Modes for Storage Systems
  – Enabling the storage system to use low power states, including spinning or shutting down components and devices based on application data classification, can yield significant power reductions, but this approach is use-case dependent.

(51)

Collaboration

• We collaborate with universities
• We host students (internships)
• We organize conferences
• We lead EU projects
• Visit our site: IBM Research – Haifa, Storage Research
