Storage Technologies and Solutions


David Hung-Chang Du

Qwest Chair Professor

Computer Science and Engineering, University of Minnesota

[email protected]

CRIS: NSF I/UCRC Center on Intelligent Storage. More information at http://cris.cs.umn.edu


Outline of Talk

• Two Major Changes in Computing & Communication Environment

• Big Data Problem

• Solving the Big Data Problem

– Software Defined Network vs. Software Defined Storage

• Storage Research Projects at NSF I/UCRC Center on Intelligent Storage


Bridge Monitoring, Building Environment Controls, Earthquake Monitoring, Elder Care, Factories, Fire Response, First Responders, Forest Management, Soil Monitoring, Supply Chain, Wind Response, … and many more


Sensors Everywhere (slide credit: Jeannette M. Wing, OOPSLA)

[Photos: Sonoma Redwood Forest; smart buildings (kindly donated by Stewart Johnston); smart bridges (credit: MO Dept. of Transportation); Hudson River Valley]


Digital Explosion: Data Centric

• The digital universe will grow over six-fold, from 281 exabytes in 2007 to 1,773 exabytes in 2011

• > 90% of the information in the digital universe is unstructured, and the absolute number of files is growing faster than the total capacity (TBs)
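The growth figures above can be checked with a little arithmetic. The sketch below computes the overall growth factor and the implied compound annual growth rate (CAGR) over the four years from 2007 to 2011; the CAGR is a derived quantity added here for illustration, not a number stated on the slide.

```python
from __future__ import annotations


def growth_stats(start_eb: float, end_eb: float, years: int) -> tuple[float, float]:
    """Return (overall growth factor, compound annual growth rate)."""
    factor = end_eb / start_eb
    cagr = factor ** (1.0 / years) - 1.0
    return factor, cagr


if __name__ == "__main__":
    factor, cagr = growth_stats(281, 1773, 2011 - 2007)
    print(f"growth factor: {factor:.2f}x")  # growth factor: 6.31x
    print(f"annual growth: {cagr:.1%}")     # annual growth: 58.5%
```

A factor of about 6.3 is consistent with "over six-fold", and it corresponds to the digital universe growing by roughly 58% every year.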


Big Data Problem

• Converting Analog to Digital

• All Data Access Leaves Traces in the Digital World

• How to Gain Information from All Stored Data?

• How to Make Better Decisions?

• What to Keep and What to Preserve?

• Can We Develop Knowledge from All These Data?


Intelligent Storage

Blocks → Files → Objects → Information → Knowledge

• Traditional storage device view: raw bits, no associated semantics.

• Extended-attribute (augmented) view: high-level semantics associated with the data, which can be exploited to store and retrieve data more efficiently with indexing/search capability.

• Need New Architectures & Systems to Capture This [ INTELLIGENCE ]
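To make the extended-attribute view concrete, here is a minimal sketch assuming a hypothetical in-memory object store (`AttributeIndexedStore` is an illustrative name, not a CRIS system). Each object carries key/value attributes, and an inverted index from (attribute, value) pairs to object IDs lets a search skip scanning every object.

```python
from __future__ import annotations

from collections import defaultdict


class AttributeIndexedStore:
    """Hypothetical object store that indexes user-defined attributes."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
        self._attrs: dict[str, dict[str, str]] = {}
        # Inverted index: (attribute name, value) -> IDs of matching objects.
        self._index: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def put(self, oid: str, payload: bytes, **attrs: str) -> None:
        self.delete(oid)  # clear stale index entries if the object is overwritten
        self._data[oid] = payload
        self._attrs[oid] = dict(attrs)
        for key, value in attrs.items():
            self._index[(key, value)].add(oid)

    def get(self, oid: str) -> bytes:
        return self._data[oid]

    def delete(self, oid: str) -> None:
        for key, value in self._attrs.pop(oid, {}).items():
            self._index[(key, value)].discard(oid)
        self._data.pop(oid, None)

    def search(self, **criteria: str) -> set[str]:
        """Return IDs of objects whose attributes match every criterion."""
        result: set[str] | None = None
        for key, value in criteria.items():
            matches = self._index.get((key, value), set())
            result = set(matches) if result is None else result & matches
        return result or set()


if __name__ == "__main__":
    store = AttributeIndexedStore()
    store.put("scan-001", b"...", site="bridge-7", sensor="strain")
    store.put("scan-002", b"...", site="bridge-7", sensor="temperature")
    print(store.search(site="bridge-7", sensor="strain"))  # {'scan-001'}
```

A raw block device could answer the same query only by reading and parsing every object; the semantic metadata is what makes the search cheap.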


28 May 2014

Current Cyber Space

“A domain characterized by the use of electronics and the electromagnetic spectrum to store, modify, and exchange data via networked systems and associated physical infrastructures.”


Inside the ‘Net: A Different Story…

• Closed equipment
– Software bundled with hardware
– Vendor-specific interfaces

• Over-specified
– Slow protocol standardization

• Few people can innovate
– Equipment vendors write the code


Do We Need Innovation Inside?

Many boxes (routers, switches, firewalls, …) with different interfaces, none of them programmable.


Proposed SDN Solution

• Separation of Control Plane and Data Plane

• Standard API to Make the Network Programmable

• Logically Centralized Controller
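The split between a programmable control plane and a simple data plane can be illustrated with a toy model. In this sketch all class and method names are hypothetical (it is not the OpenFlow API): switches only look up match/action rules in their flow tables, while one logically centralized controller decides which rules to install through a single uniform interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Switch:
    """Data plane: forwards packets by looking up its flow table only."""
    name: str
    flow_table: dict[str, int] = field(default_factory=dict)  # dst -> out port

    def forward(self, dst: str) -> int | None:
        # None models a table miss; a real switch would then ask the controller.
        return self.flow_table.get(dst)


class Controller:
    """Control plane: one logically centralized program for all switches."""

    def __init__(self) -> None:
        self.switches: dict[str, Switch] = {}

    def register(self, switch: Switch) -> None:
        self.switches[switch.name] = switch

    def install_rule(self, switch_name: str, dst: str, out_port: int) -> None:
        # One vendor-neutral call stands in for the standard southbound API.
        self.switches[switch_name].flow_table[dst] = out_port

    def route(self, path: list[tuple[str, int]], dst: str) -> None:
        """Install rules along a path given as (switch name, out port) hops."""
        for switch_name, out_port in path:
            self.install_rule(switch_name, dst, out_port)


if __name__ == "__main__":
    ctl = Controller()
    for name in ("s1", "s2"):
        ctl.register(Switch(name))
    ctl.route([("s1", 2), ("s2", 1)], dst="10.0.0.5")
    print(ctl.switches["s1"].forward("10.0.0.5"))  # 2
    print(ctl.switches["s2"].forward("10.0.0.9"))  # None
```

Because every policy decision lives in `Controller`, new network behavior means new controller code rather than new switch firmware.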


Seamless Mobility

• See host sending traffic at new location

• Modify rules to reroute the traffic
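A minimal sketch of these two steps, assuming a hypothetical `MobilityController` that keeps a host-location table and a dictionary of installed rules (neither is a real controller API). For brevity only the rule at the host's edge switch is moved; a real controller would recompute and reinstall the whole path.

```python
from __future__ import annotations


class MobilityController:
    """Hypothetical controller that tracks where each host is attached."""

    def __init__(self) -> None:
        self.location: dict[str, tuple[str, int]] = {}  # host -> (switch, port)
        self.rules: dict[tuple[str, str], int] = {}     # (switch, dst host) -> out port

    def packet_in(self, src_host: str, switch: str, in_port: int) -> bool:
        """Handle traffic seen from src_host; return True if the host has moved."""
        new_loc = (switch, in_port)
        old_loc = self.location.get(src_host)
        if old_loc == new_loc:
            return False                      # nothing changed
        if old_loc is not None:
            # Host seen at a new location: withdraw the stale rule...
            self.rules.pop((old_loc[0], src_host), None)
        self.location[src_host] = new_loc
        # ...and point traffic for this host at its new attachment point.
        self.rules[(switch, src_host)] = in_port
        return old_loc is not None


if __name__ == "__main__":
    ctl = MobilityController()
    ctl.packet_in("h1", "s1", 1)              # first sighting
    moved = ctl.packet_in("h1", "s2", 4)      # h1 now sends from s2, port 4
    print(moved, ctl.rules)                   # True {('s2', 'h1'): 4}
```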


Server Load Balancing

• Pre-install load-balancing policy

• Split traffic based on source IP

Example rules for the service address 1.2.3.4: src=0*, dst=1.2.3.4 → 10.0.0.1; src=1*, dst=1.2.3.4 → 10.0.0.2
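The pre-installed policy can be modeled as a pure function: for traffic addressed to 1.2.3.4, the first bit of the client's source address selects the replica (`0*` → 10.0.0.1, `1*` → 10.0.0.2). This only models the rules' matching logic in Python and assumes IPv4 sources; `pick_replica` is a hypothetical name, and a switch would express the same thing as wildcard match/rewrite rules.

```python
from __future__ import annotations

import ipaddress

SERVICE_IP = "1.2.3.4"                        # public address clients connect to
REPLICAS = {0: "10.0.0.1", 1: "10.0.0.2"}     # src=0* -> 10.0.0.1, src=1* -> 10.0.0.2


def pick_replica(src_ip: str, dst_ip: str) -> str | None:
    """Return the server for this flow, or None if the policy does not apply."""
    if dst_ip != SERVICE_IP:
        return None
    first_bit = int(ipaddress.IPv4Address(src_ip)) >> 31  # leading bit of the source IP
    return REPLICAS[first_bit]


if __name__ == "__main__":
    print(pick_replica("10.1.2.3", "1.2.3.4"))   # 10.0.0.1 (first octet < 128)
    print(pick_replica("192.0.2.7", "1.2.3.4"))  # 10.0.0.2 (first octet >= 128)
```

Matching on a source-address prefix keeps every packet of a flow on the same server without the controller seeing each new connection.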


Example SDN Applications

• Seamless mobility and migration

• Server load balancing

• Dynamic access control

• Using multiple wireless access points

• Energy-efficient networking

• Adaptive traffic monitoring

• Denial-of-Service attack detection

• Network virtualization



Network Function Virtualization (NFV)


Use Case: vWOC (Virtualized WAN Optimization Controller)


What Is SDS?

1. Policy-Driven Storage (IOPS, Latency, Reliability, Fault Tolerance, Provisioning, QoS)

2. Scale-out Architecture

3. Storage as a Seamless Pool of Resources (Storage Virtualization)

4. Integrated Control across Multiple Vendors

5. Heterogeneous Storage Containers
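Item 1 is the core idea: a volume states its requirements and software picks a container from the pool. A minimal sketch under stated assumptions: `Backend`, `Policy` and `provision` are hypothetical names, reliability is approximated by replica count, and ties are broken by free space. Real SDS placement would weigh many more factors.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """One storage container in the pool (hypothetical attributes)."""
    name: str
    kind: str           # e.g. "ssd-array", "hdd-nas", "cloud"
    max_iops: int
    latency_ms: float
    replicas: int       # copies kept, used here as a proxy for reliability
    free_gb: int


@dataclass(frozen=True)
class Policy:
    """What a volume asks for, instead of naming a specific device."""
    min_iops: int
    max_latency_ms: float
    min_replicas: int
    size_gb: int


def provision(policy: Policy, pool: list[Backend]) -> Backend | None:
    """Pick a backend that satisfies the policy, or None if none does.

    Among qualifying backends, prefer the one with the most free space,
    a placeholder for a real placement heuristic.
    """
    candidates = [
        b for b in pool
        if b.max_iops >= policy.min_iops
        and b.latency_ms <= policy.max_latency_ms
        and b.replicas >= policy.min_replicas
        and b.free_gb >= policy.size_gb
    ]
    return max(candidates, key=lambda b: b.free_gb, default=None)


if __name__ == "__main__":
    pool = [
        Backend("flash-1", "ssd-array", 200_000, 0.5, 2, 800),
        Backend("nas-1", "hdd-nas", 5_000, 8.0, 3, 20_000),
        Backend("cloud-1", "cloud", 1_000, 40.0, 3, 1_000_000),
    ]
    fast = Policy(min_iops=50_000, max_latency_ms=1.0, min_replicas=2, size_gb=500)
    archive = Policy(min_iops=500, max_latency_ms=100.0, min_replicas=3, size_gb=10_000)
    print(provision(fast, pool).name)     # flash-1
    print(provision(archive, pool).name)  # cloud-1
```

The caller never names a device type, which is what lets heterogeneous containers from different vendors sit behind one pool.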


Software Defined Storage

[Figure: workload patterns (Web 2.0, J2EE/OLTP, Map/Reduce; i.e., web, transactional and analytics) use data storage and retrieval services, managed through a plan, deploy, optimize cycle. Services: availability (clustering, replication); capacity/performance (storage class, de-duplication/compression/thin provisioning); security & compliance (encryption, archival/WORM). Platforms: legacy high-function (external) storage systems and portable storage software on commodity hardware, deployed on public, private, hybrid or bare-metal clouds]


[Figure: layered view of software defined storage, shown alongside software defined compute and software defined network. A storage services layer offers Platinum, Gold, Silver and Bronze service levels built from security and availability services and from performance and optimization services: authentication/auditing, encryption, mirroring/DR, high availability, striping, clustering, compression, tiering/ILM, backup & recovery, deduplication. Areas: resiliency, capability, optimization, fabric management, heterogeneity. Functions listed: storage abstraction, storage provisioning, storage monitoring; SAN/GPFS/NAS/DAS; FC/FCoE/iSCSI/InfiniBand; zone management; storage replication; disaster recovery; consistency groups; backup; storage tiers; performance-aware placement; continuous optimizations; migration]


SDN vs. SDS

SDN:

• Consensus on Definition
• OpenFlow Switches as De Facto Devices
• Wide Area Networks
• Benefit Big Network Users
• IP Network Focus
• Support Applications

SDS:

• No Clear Definition Yet
• Heterogeneous Types of Storage Containers
• Data Center Deployment
• Ensure QoS & Efficiency
• Virtual Machine Focus
• Integration with SDN and Compute


CRIS Research Summary


Current Sponsor Companies

[Figure: sponsor company logos, grouped by number of memberships (two memberships, one membership)]


• Research on New Storage Technologies: Flash Memory based SSD, PCM, Shingled Write Disks (Seagate, LSI, SGI and Western Digital/HGST)

• Research on New Storage Hierarchies: multi-level caching/prefetching, data allocation/migration, and tiered storage (HP, NetApp and Dell)

• Cloud Storage and Big Data (HP, NetApp, FedCentric and NEC-Labs)

• I/O Workload Characterization and Synthetic Workload Generation (Seagate, Xyratex and NetApp)


New Storage Technologies

• Flash Memory based SSD

• FTL Design

• PCM Prototype

• Shingled Write Disk Design and Layout


Challenges in New Technologies

• Investigating and Understanding Fundamental Properties

• Research of Design Issues

• What are their impacts on applications?

• How to effectively integrate the new technologies into existing memory/storage hierarchies?


Summary of SSD Research Results

• Robust and Reliable Design of SSDs

• Integrating SSDs into Storage Hierarchy

• New FTL Design: A Convertible FTL Design

• Efficient Wear-Leveling Algorithm

• Optimal/Efficient Read/Write Caching

• Hot and Cold Data Classification

• Bloom Filter Design and Key-Value Store Based on Flash Memory

• Using Sampling Technique for Meta-Data Management in FTL
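Two items in this list, Bloom filters and hot/cold data classification, combine naturally. Below is a minimal, generic sketch (not the specific CRIS design) that counts writes per logical block address (LBA) in a counting Bloom filter and periodically halves every counter, so that only recently and frequently written LBAs stay "hot". The class name, filter size, number of hash functions, hotness threshold and decay interval are all arbitrary assumptions.

```python
from __future__ import annotations

import hashlib


class HotDataIdentifier:
    """Counting-Bloom-filter sketch that classifies LBAs as hot or cold."""

    def __init__(self, num_counters: int = 4096, num_hashes: int = 3,
                 hot_threshold: int = 4, decay_interval: int = 1024) -> None:
        if not 1 <= num_hashes <= 8:
            raise ValueError("one SHA-256 digest supplies at most eight 4-byte hashes")
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.hot_threshold = hot_threshold
        self.decay_interval = decay_interval
        self.writes_seen = 0

    def _positions(self, lba: int) -> list[int]:
        # Derive k counter indices from one SHA-256 digest of the LBA.
        digest = hashlib.sha256(lba.to_bytes(8, "little")).digest()
        return [
            int.from_bytes(digest[4 * i:4 * i + 4], "little") % len(self.counters)
            for i in range(self.num_hashes)
        ]

    def record_write(self, lba: int) -> None:
        for pos in self._positions(lba):
            self.counters[pos] += 1
        self.writes_seen += 1
        if self.writes_seen % self.decay_interval == 0:
            # Aging: halve every counter so that old activity fades out.
            self.counters = [c >> 1 for c in self.counters]

    def is_hot(self, lba: int) -> bool:
        # Collisions can only inflate counters, so the minimum over the k
        # positions is the least-inflated estimate of this LBA's recent writes.
        estimate = min(self.counters[p] for p in self._positions(lba))
        return estimate >= self.hot_threshold


if __name__ == "__main__":
    hdi = HotDataIdentifier()
    for lba in [7, 7, 7, 7, 100, 7]:
        hdi.record_write(lba)
    print(hdi.is_hot(7), hdi.is_hot(100))
```

An FTL could use such a classifier to steer hot data into separate blocks so garbage collection copies fewer valid pages; the memory cost is a fixed counter array regardless of how many LBAs the device holds.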


Non-Volatile Memory

• NVM Replaces DRAM as Main Memory

• NVM to Be Used As A Cache

• DRAM+NVM

[Figure: three CPU / main memory / storage hierarchies: NVM main memory over HDD storage; NVM over SSD storage; DRAM + NVM main memory over SSD storage]


New Memory and Storage Hierarchies

• Data Storage

• Data Migration

• Multi-Level Caching

• Data Prefetching

• Tiered Storage
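Multi-level caching can be sketched with two LRU levels. The example below assumes a DRAM level stacked on an NVM level in front of a slower store (class and parameter names are hypothetical). Blocks evicted from DRAM are demoted into NVM, and an NVM hit promotes the block back to DRAM, so each block lives in at most one level. It is a read-only cache; dirty-block write-back is left out, and it is plain LRU rather than the new cache replacement scheme listed among the center's accomplishments.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Callable


class TwoLevelLRUCache:
    """Hypothetical DRAM (level 1) + NVM (level 2) read cache over a slow store."""

    def __init__(self, dram_blocks: int, nvm_blocks: int) -> None:
        self.dram: OrderedDict[int, bytes] = OrderedDict()  # LRU order: oldest first
        self.nvm: OrderedDict[int, bytes] = OrderedDict()
        self.dram_blocks = dram_blocks
        self.nvm_blocks = nvm_blocks
        self.stats = {"dram_hit": 0, "nvm_hit": 0, "miss": 0}

    def _insert_dram(self, block: int, data: bytes) -> None:
        self.dram[block] = data
        if len(self.dram) > self.dram_blocks:
            victim, victim_data = self.dram.popitem(last=False)  # LRU block
            self.nvm[victim] = victim_data                       # demote to NVM
            if len(self.nvm) > self.nvm_blocks:
                self.nvm.popitem(last=False)                     # evict from NVM

    def read(self, block: int, backing_read: Callable[[int], bytes]) -> bytes:
        if block in self.dram:
            self.stats["dram_hit"] += 1
            self.dram.move_to_end(block)
            return self.dram[block]
        if block in self.nvm:
            self.stats["nvm_hit"] += 1
            data = self.nvm.pop(block)   # promote: the block leaves NVM
        else:
            self.stats["miss"] += 1
            data = backing_read(block)
        self._insert_dram(block, data)
        return data


if __name__ == "__main__":
    disk = {n: bytes([n]) for n in range(16)}
    cache = TwoLevelLRUCache(dram_blocks=2, nvm_blocks=4)
    for blk in [1, 2, 3, 1, 4, 2]:
        cache.read(blk, disk.__getitem__)
    print(cache.stats)  # {'dram_hit': 0, 'nvm_hit': 2, 'miss': 4}
```

Keeping the levels exclusive makes the total cache capacity the sum of both levels, which matters when NVM is large but slower than DRAM.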


Possible Methods (Shingled Write Disks)

• “In-place Update”: many small bands
– Protect previously written data by Read-Modify-Write
– Behaves similarly to regular disks

• “Out-of-place Update”: a few large bands
– Maintain data in a circular log structure
– Data is added at the head pointer; data is removed from the tail pointer
– LBA-to-PBA mapping is not fixed
– Transforms random writes into sequential writes
– Compromises sequential read performance

Issues to address: indirect addressing, higher space overhead, defragmentation (garbage collection), write amplification
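The out-of-place method can be sketched as a circular log over one band. This toy model (the `CircularLogBand` class and its sizes are hypothetical, not a real drive layout) appends every write at the head pointer, keeps an LBA-to-PBA map, and reclaims space at the tail pointer, re-appending any still-valid block it finds there. Those relocation copies are what push write amplification, measured here as media writes divided by host writes, above 1.

```python
from __future__ import annotations


class CircularLogBand:
    """Toy out-of-place-update band (hypothetical model, not a real SWD layout)."""

    def __init__(self, num_pbas: int) -> None:
        if num_pbas < 2:
            raise ValueError("need at least two PBAs")
        self.size = num_pbas
        self.slots: list[int | None] = [None] * num_pbas  # PBA -> LBA stored there
        self.lba_to_pba: dict[int, int] = {}               # changes on every write
        self.head = 0          # next PBA to append to
        self.used = 0          # PBAs currently between tail and head
        self.host_writes = 0
        self.media_writes = 0

    @property
    def tail(self) -> int:
        return (self.head - self.used) % self.size

    def _append(self, lba: int) -> None:
        self.slots[self.head] = lba
        self.lba_to_pba[lba] = self.head
        self.head = (self.head + 1) % self.size
        self.used += 1
        self.media_writes += 1

    def _clean_tail(self) -> None:
        """Garbage-collect one PBA at the tail pointer."""
        tail = self.tail
        lba = self.slots[tail]
        self.slots[tail] = None
        self.used -= 1
        if lba is not None:      # still valid: relocate it to the head
            self._append(lba)

    def write(self, lba: int) -> None:
        old = self.lba_to_pba.pop(lba, None)
        if old is not None:
            self.slots[old] = None  # the stale copy becomes garbage
        elif len(self.lba_to_pba) >= self.size - 1:
            raise OSError("band is full of live data")
        self.host_writes += 1
        # Keep at least one free PBA so the head never overruns the tail.
        while self.used >= self.size - 1:
            self._clean_tail()
        self._append(lba)

    def write_amplification(self) -> float:
        return self.media_writes / max(self.host_writes, 1)


if __name__ == "__main__":
    band = CircularLogBand(num_pbas=8)
    for lba in [0, 1, 2, 3] + [0] * 20:  # a few cold LBAs, one hot LBA
        band.write(lba)
    print(f"write amplification: {band.write_amplification():.2f}")
```

Cold blocks that sit at the tail are copied again and again as the hot LBA wraps the log, which is why separating hot and cold data matters for log-structured layouts.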


Current Research Focuses on New Storage Technologies

• How to build large-scale storage systems with SSD or SWD?

• Modeling multi-channel multi-chip SSD

• Investigating SSD reliability and performance with a wide set of metrics

• Investigating the impact of non-volatile memory as main memory

• Revisiting FTL design issues when SSDs are used to build a large storage system instead of serving as caching devices


Storage Layer Management and Caching

[Figure: an SSD layer with read queues (RT), read queues (prefetch) and write queues (offloading); big memory with PCM; cloud storage; key questions: when, where and how much]


Local Storage + Cloud Storage


NAND Flash Package with Integrated ECC and General Purpose Processor

[Figure: a host CPU with DDR memory connects over PCIe to an SSD controller (block management, data buffer, host communication, DDR, wear leveling, garbage collection, …); the controller drives several NAND flash packages, each containing multiple NAND flash dies (…) plus an ECC unit and a processor]

Manufacturers have incorporated hardware (ECC and a processor) into the flash package.


Accelerating Hadoop on SGI UV2000 (In-Memory System)

Hadoop & MapReduce Are for Data-Intensive Applications

How to Speed Up in High


Research Focuses of Cloud Storage + Big Data

• More emphasis on virtual machine environments

• Ensure QoS support for VMs in the cloud (VDI as an application)

• How can data deduplication be applied in cloud + big data (with more focus on primary-storage dedupe)?

• Integration of cloud and local storage

• Integration of various file systems into a federated file system
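For the deduplication question, here is a minimal sketch of the basic mechanism, assuming fixed-size chunks and SHA-256 fingerprints. `DedupStore` is a hypothetical in-memory class: it keeps each unique chunk once and stores every file as a list of fingerprints, and it deliberately omits chunk deletion (reference counting), persistence and content-defined chunking.

```python
from __future__ import annotations

import hashlib


class DedupStore:
    """Hypothetical in-memory store with fixed-size-chunk deduplication."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[bytes, bytes] = {}     # SHA-256 fingerprint -> chunk data
        self.files: dict[str, list[bytes]] = {}  # file name -> list of fingerprints

    def write_file(self, name: str, data: bytes) -> None:
        recipe = []
        for off in range(0, len(data), self.chunk_size):
            chunk = data[off:off + self.chunk_size]
            fp = hashlib.sha256(chunk).digest()
            self.chunks.setdefault(fp, chunk)  # store the chunk only if it is new
            recipe.append(fp)
        self.files[name] = recipe

    def read_file(self, name: str) -> bytes:
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def logical_bytes(self) -> int:
        return sum(len(self.chunks[fp]) for recipe in self.files.values() for fp in recipe)

    def physical_bytes(self) -> int:
        return sum(len(chunk) for chunk in self.chunks.values())

    def dedup_ratio(self) -> float:
        return self.logical_bytes() / max(self.physical_bytes(), 1)


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    store.write_file("vm1.img", b"AAAABBBBCCCC")
    store.write_file("vm2.img", b"AAAABBBBDDDD")  # shares two chunks with vm1.img
    print(store.dedup_ratio())  # 1.5
```

VM images built from the same base are a natural fit, since much of their content is identical; on primary storage the extra fingerprint lookup sits on the write path, which is one reason primary-storage dedupe is harder than backup dedupe.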


Framework of I/O Workload Characterization

[Figure: workload characterization turns an original trace into workload parameters (arrival pattern, file/data access pattern); workload generation turns the parameters into a synthetic trace; a workload replayer replays traces on the same or a different storage system, producing a replayed trace; parameter adjustment yields adjusted parameters that model changes to applications and/or the system (either host or storage); the outputs are checked by three comparisons (Comparison 1, 2 and 3)]
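The characterize, generate, compare loop in this framework can be illustrated with a deliberately simple parameter set. Everything here is an assumption: `Request`, `WorkloadParams`, `characterize` and `generate` are hypothetical, the parameters are only mean inter-arrival time, read ratio, mean request size and offset range, and the generator assumes Poisson arrivals, uniform offsets and exponentially distributed sizes. Characterizing the synthetic trace again and comparing it with the original parameters is the kind of check the comparisons in the figure perform.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Request:
    time: float     # arrival time in seconds
    offset: int     # byte offset
    size: int       # bytes
    is_read: bool


@dataclass(frozen=True)
class WorkloadParams:
    mean_interarrival: float
    read_ratio: float
    mean_size: float
    max_offset: int


def characterize(trace: list[Request]) -> WorkloadParams:
    """Reduce a trace (at least two requests) to a few summary parameters."""
    gaps = [b.time - a.time for a, b in zip(trace, trace[1:])]
    return WorkloadParams(
        mean_interarrival=mean(gaps),
        read_ratio=sum(r.is_read for r in trace) / len(trace),
        mean_size=mean(r.size for r in trace),
        max_offset=max(r.offset for r in trace),
    )


def generate(params: WorkloadParams, n: int, seed: int = 0) -> list[Request]:
    """Synthesize n requests (assumes mean_interarrival and mean_size > 0)."""
    rng = random.Random(seed)
    t = 0.0
    out: list[Request] = []
    for _ in range(n):
        t += rng.expovariate(1.0 / params.mean_interarrival)  # Poisson arrivals
        out.append(Request(
            time=t,
            offset=rng.randint(0, params.max_offset),
            size=max(1, round(rng.expovariate(1.0 / params.mean_size))),
            is_read=rng.random() < params.read_ratio,
        ))
    return out


if __name__ == "__main__":
    original = [Request(i * 0.01, (i * 4096) % 65536, 4096, i % 4 != 0) for i in range(200)]
    params = characterize(original)
    synthetic = generate(params, n=1000, seed=42)
    print(params)
    print(characterize(synthetic))  # should be close to params
```

A real characterization tool would capture far richer structure (burstiness, locality, file-level patterns); the point of the loop is that any parameter set can be validated by regenerating and re-measuring.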


Recent Accomplishments

• Completed a tool for I/O workload characterization and generation for parallel file systems

• Hfplayer v.2 (replay engine) is now available

• Proposed a new cache replacement scheme for non-volatile memory as main memory and disk as storage device

• A detailed design of integrating cloud storage with local storage

• Proposed a journaling-based scheme for SSD


Research Focuses on I/O Workload Characterization and Generation

• Further integration with block I/O, parallel file system I/O and the replay engine

• How to improve the performance of storage systems?

• I/O workload phase detection

• How to apply knowledge of I/O workloads to multi-level caching?


Conclusions

• Storage Research Faces Challenges from Applications (Big Data, Long-Term Data Preservation, Cloud Storage, Scalability)

• It Also Faces Challenges from New Technologies (Emerging Memory/Storage Hierarchies)

• An Integrated Approach That Considers Compute, Storage and Network Systems Together Is a Must (SDS???)


Thank You!

Questions?
