David Hung-Chang Du
Qwest Chair Professor
Computer Science and Engineering, University of Minnesota
CRIS: NSF I/UCRC Center on Intelligent Storage
More information on http://cris.cs.umn.edu
Outline of Talk
• Two Major Changes in Computing & Communication Environment
• Big Data Problem
• Solving Big Data Problem
– Software Defined Network vs. Software Defined Storage
• Storage Research Projects at NSF I/UCRC Center on Intelligent Storage
Bridge Monitoring, Building Environment Controls, Earthquake Monitoring, Elder Care, Factories, Fire Response, First Responders, Forest Management, Soil Monitoring, Supply Chain, Wind Response, … and more
OOPSLA Jeannette M. Wing
Sensors Everywhere
Sonoma Redwood Forest
smart buildings
Kindly donated by Stewart Johnston
smart bridges
Credit: MO Dept. of Transportation
Hudson River Valley
Digital Explosion: Data Centric
• The digital universe will grow over six-fold, from 281 exabytes in 2007 to 1,773 exabytes in 2011
• More than 90% of the information in the digital universe is unstructured
• The absolute number of files is growing faster than the TBs
Big Data Problem
Converting Analog to Digital
All Data Access Traces in Digital World
How to Gain Information from All Stored Data?
How to Make Better Decisions?
What to Keep and What to Preserve?
Can We Develop Knowledge from All These Data?
Blocks, Files, Objects, Information, Knowledge
Traditional storage device view: raw bits, no associated semantics.
Extended-attributes augmented view: high-level semantics associated.
Need New Architectures & Systems to Capture
Exploited to store and retrieve data more efficiently with Indexing/Search capability
[ INTELLIGENCE ]
Intelligent Storage
28 May 2014
Current Cyber Space
“A domain characterized by the use of electronics and the electromagnetic spectrum to store, modify, and exchange data via networked systems and associated physical infrastructures.”
Inside the ‘Net: A Different Story…
• Closed equipment
– Software bundled with hardware
– Vendor-specific interfaces
• Over-specified
– Slow protocol standardization
• Few people can innovate
– Equipment vendors write the code
Do We Need Innovation Inside?
Many boxes (routers, switches, firewalls, …) with different interfaces and not programmable.
Proposed SDN Solution
Control Plane / Data Plane
• Standard API to Enable Programmability
• Separation of Control Plane and Data Plane
• Logically Centralized Controller
Seamless Mobility
• See host sending traffic at new location
• Modify rules to reroute the traffic
Server Load Balancing
• Pre-install load-balancing policy
• Split traffic based on source IP
src=0*, dst=1.2.3.4 → 10.0.0.1
src=1*, dst=1.2.3.4 → 10.0.0.2
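The source-IP split above can be sketched in a few lines of Python. This is only an illustration, not controller code: it assumes "src=0*" means "first bit of the source address is 0" (0.0.0.0/1) and "src=1*" means "first bit is 1" (128.0.0.0/1), and that the two prefixes map to replicas 10.0.0.1 and 10.0.0.2 in that order. The names `RULES` and `pick_replica` are hypothetical.

```python
import ipaddress

# Hypothetical rule table: (source prefix, virtual IP, replica address).
# The prefix interpretation and the prefix-to-replica pairing are assumptions.
RULES = [
    (ipaddress.ip_network("0.0.0.0/1"), ipaddress.ip_address("1.2.3.4"), "10.0.0.1"),
    (ipaddress.ip_network("128.0.0.0/1"), ipaddress.ip_address("1.2.3.4"), "10.0.0.2"),
]


def pick_replica(src, dst):
    """Return the replica for a packet, or None if no pre-installed rule matches."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for src_net, vip, replica in RULES:
        # A rule matches on both the destination (virtual IP) and the source prefix.
        if dst_ip == vip and src_ip in src_net:
            return replica
    return None
```

Because the rules are installed ahead of time, no per-flow decision by the controller is needed; a packet that matches neither rule would fall through to whatever default handling the switch has.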
Example SDN Applications
• Seamless mobility and migration
• Server load balancing
• Dynamic access control
• Using multiple wireless access points
• Energy-efficient networking
• Adaptive traffic monitoring
• Denial-of-Service attack detection
• Network virtualization
Network Function Virtualization (NFV)
Use Case: vWOC (virtualized WAN Optimization Controller)
What is SDS?
1. Policy-Driven Storage (IOPS, latency, reliability, fault tolerance, provisioning, QoS)
2. Scale-out Architecture
3. Storage as a Seamless Pool of Resources (Storage Virtualization)
4. Control Integration from Multi-Vendors
5. Heterogeneous Storage Containers
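As a rough illustration of items 1, 3 and 5, the sketch below treats heterogeneous containers as one pool and selects those that satisfy a policy. The attribute names, the example numbers, and the "least over-provisioned first" ordering are all assumptions made for this example; they are not taken from any SDS product.

```python
from typing import NamedTuple


class Container(NamedTuple):
    """Hypothetical description of one storage container in the pool."""
    name: str
    iops: int
    latency_ms: float
    replicas: int


class Policy(NamedTuple):
    """Hypothetical policy expressed as minimum service requirements."""
    min_iops: int
    max_latency_ms: float
    min_replicas: int


def place(policy, pool):
    """Return names of containers that satisfy the policy, lowest IOPS first."""
    fits = [
        c for c in pool
        if c.iops >= policy.min_iops
        and c.latency_ms <= policy.max_latency_ms
        and c.replicas >= policy.min_replicas
    ]
    # Prefer the container that over-provisions the least (an assumed heuristic).
    return [c.name for c in sorted(fits, key=lambda c: c.iops)]


pool = [
    Container("ssd-array", iops=100_000, latency_ms=0.2, replicas=2),
    Container("hdd-jbod", iops=200, latency_ms=8.0, replicas=3),
    Container("hybrid", iops=5_000, latency_ms=2.0, replicas=2),
]
chosen = place(Policy(min_iops=1_000, max_latency_ms=5.0, min_replicas=2), pool)
```

The caller states what it needs, not which device to use; the pool decides where the data lands.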
Workload patterns: Web 2.0 Pattern, J2EE/OLTP, Map/Reduce Pattern (Transactional, Analytics, Web)
• Availability
– Clustering
– Replication
• Capacity/Performance
– Storage Class
– De-duplication/Compression/Thin Provisioning
• Security & Compliance
– Encryption
– Archival/WORM
Data storage and retrieval services: Plan, Deploy, Optimize
Legacy high-function (external) storage systems; portable storage software on commodity hardware
Public Cloud, Private Cloud, Hybrid Cloud, Bare Metal Cloud
Software Defined Storage
Service levels: Platinum, Gold, Silver, Bronze
Storage Services Layer (Security and Availability; Performance and Opt.): Authentication/Auditing, Encryption, Mirroring/DR, High Availability, Striping, Clustering, Compression, Tiering/ILM, Backup & Recovery, Deduplication
SOFTWARE DEFINED STORAGE: RESILIENCY, CAPABILITY, OPTIMIZATION, FABRIC, MANAGEMENT, HETEROGENEITY
• Storage Abstraction
• Storage Provisioning
• Storage Monitoring
• SAN/GPFS/NAS/DAS
• FC/FCoE/iSCSI/InfiniBand
• Zone management
• Storage replication
• Disaster recovery
• Consistency groups
• Backup
• Storage tiers
• Performance aware placement
• Continuous optimizations
• Migration
SOFTWARE DEFINED COMPUTE, SOFTWARE DEFINED NETWORK
SDN vs. SDS
SDN
• Consensus on Definition
• OpenFlow Switches as De Facto Devices
• Wide Area Networks
• Benefit Big Network Users
• IP Network Focus
• Support Applications
SDS
• No Clear Definition Yet
• Heterogeneous Types of Storage Containers
• Data Center Deployment
• Ensure QoS & Efficiency
• Virtual Machine Focus
• Integration with SDN and Compute
CRIS Research Summary
Current Sponsor Companies
Two
Memberships
One
• Research on New Storage Technologies (Flash Memory based SSD, PCM, Shingled Write Disks): Seagate, LSI, SGI and Western Digital (HGST)
• Research on New Storage Hierarchies (multi-level caching/prefetching, data allocation/migration, and tiered storage): HP, NetApp and Dell
• Cloud Storage and Big Data (HP, NetApp, FedCentric and NEC-Labs)
• I/O Workload Characterization and Synthetic Workload Generation (Seagate, Xyratex and NetApp)
New Storage Technologies
Flash Memory based SSD
FTL Design
PCM Prototype
Shingled Write Disk Design and Layout
Challenges in New Technologies
• Investigating and Understanding Fundamental Properties
• Research of Design Issues
• What are their impacts on applications?
• How to effectively integrate the new technologies into existing memory/storage hierarchies?
Summary of SSD Research Results
• Robust and Reliable Design of SSDs
• Integrating SSDs into Storage Hierarchy
• New FTL Design: A Convertible FTL Design
• Efficient Wear-Leveling Algorithm
• Optimal/Efficient Read/Write Caching
• Hot and Cold Data Classification
• Bloom Filter Design and Key-Value Store Based on Flash Memory
• Using Sampling Technique for Meta-Data Management in FTL
Non-Volatile Memory
• NVM Replaces DRAM as Main Memory
• NVM to Be Used As A Cache
• DRAM+NVM
Figure: hierarchy configurations built from CPU, NVM, DRAM, HDD and SSD, each labeled as Main Memory or Storage
New Memory and Storage Hierarchies
• Data Storage
• Data Migration
• Multi-Level Caching
• Data Prefetching
• Tiered Storage
• “In-place Update”: many small bands
– Protect previously-written data by Read-Modify-Write
– Behaves similarly to regular disks
• “Out-of-place Update”: few large bands
– Maintain data in circular log structure
  • Data addition at head pointer
  • Data removal from tail pointer
– LBA-to-PBA mapping is not fixed
– Turns random writes into sequential writes
– Compromises sequential read performance
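The out-of-place update scheme can be illustrated with a toy model of a single band managed as a circular log. This is a minimal sketch under simplifying assumptions (one band, one block per write, cleaning only when the band is full); the class and method names are hypothetical and do not describe any drive's firmware.

```python
class CircularLogBand:
    """Toy model of one large shingled band written as a circular log."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = [None] * num_blocks  # physical blocks: (lba, data) or None
        self.head = 0                      # next physical block to write
        self.tail = 0                      # oldest block not yet reclaimed
        self.used = 0                      # number of blocks between tail and head
        self.lba_to_pba = {}               # not fixed: updated on every write

    def write(self, lba, data):
        """Append at the head, so random LBAs become sequential physical writes."""
        for _ in range(self.num_blocks):
            if self.used < self.num_blocks:
                break
            self.clean()
        if self.used == self.num_blocks:
            raise RuntimeError("band holds only live data")
        self.blocks[self.head] = (lba, data)
        self.lba_to_pba[lba] = self.head   # any older copy is now stale
        self.head = (self.head + 1) % self.num_blocks
        self.used += 1

    def clean(self):
        """Reclaim the block at the tail; live data is re-appended at the head."""
        if self.used == 0:
            return
        lba, data = self.blocks[self.tail]
        live = self.lba_to_pba.get(lba) == self.tail
        self.blocks[self.tail] = None
        self.tail = (self.tail + 1) % self.num_blocks
        self.used -= 1
        if live:
            self.write(lba, data)          # extra write: write amplification

    def read(self, lba):
        """Follow the current mapping; logically adjacent LBAs may be scattered."""
        return self.blocks[self.lba_to_pba[lba]][1]


band = CircularLogBand(4)
band.write(7, "a")
band.write(9, "b")
band.write(7, "c")   # out-of-place update: LBA 7 now maps to physical block 2
band.write(3, "d")
band.write(5, "e")   # band is full, so the stale copy at the tail is reclaimed
```

The model shows both costs named on the slide: reads must go through the mapping, and cleaning live data at the tail causes additional writes.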
Possible Methods
• Indirect Addressing
• Higher Space Overhead
• Defragmentation (Garbage Collection)
• Write Amplification
• How to build large scale storage systems with SSD or SWD?
• Modeling multi-channel multi-chip SSD
• Investigating SSD reliability and performance with a wide set of metrics
• Investigating the impact of non-volatile memory as main memory
• Revisit FTL design issues for SSDs when they make up a large storage system instead of serving as caching devices
Current Research Focuses on New Storage Technologies
Storage Layer Management and Caching
Figure labels: off, off, on; SSD; Read Queues (RT); Read Queues (Prefetch); Write Queues (Offloading)
Big Memory with PCM
When / where / how much
Cloud Storage
Local Storage + Cloud Storage
NAND Flash Package with Integrated ECC and General Purpose Processor
Figure: Host (CPU, DDR), PCIe, SSD Controller (Block Management, Data buffer, Host communication, DDR, Wear Leveling, Garbage Collection, …), and multiple NAND Flash Packages, each with NAND Flash Dies, ECC and a Processor
Manufacturers incorporated hardware in flash package
Accelerating Hadoop on SGI UV2000 (In-Memory System)
Hadoop & MapReduce Are for Data Intensive Applications
How to Speed Up in High
• Emphasize more on Virtual Machine environment
• Ensure QoS support for VMs in Cloud (VDI as An Application)
• How data deduplication can be applied in cloud + big data (more on primary storage dedupe)?
• Integration of cloud and local storage
• Integration of various file systems with federated file system
Research Focuses of Cloud Storage + Big Data
Framework of I/O Workload Characterization
Figure: workload characterization and replay pipeline
• Original trace → Workload characterization → Workload Parameters (arrival pattern, file/data access pattern in the form of parameters)
• Changes to applications and/or system (either host or storage) → Parameter adjustment → Adjusted Parameters
• Workload generation → Synthetic trace
• Replay by workload replayer on same/different storage system → Replayed trace
• Action, Output; Comparison 1, Comparison 2, Comparison 3
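The characterize-then-generate loop in this framework can be sketched as follows. It is a deliberately small illustration, not the CRIS tool: the trace format `(timestamp_s, op, size_bytes)`, the three parameters, and the exponential inter-arrival model are all assumptions chosen for brevity, and the function names are hypothetical.

```python
import random
from statistics import mean


def characterize(trace):
    """Reduce a trace of (timestamp_s, op, size_bytes) records to parameters."""
    times = [t for t, _, _ in trace]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "mean_interarrival_s": mean(gaps),
        "read_ratio": sum(1 for _, op, _ in trace if op == "R") / len(trace),
        "mean_size_bytes": mean([size for _, _, size in trace]),
    }


def generate(params, n, seed=0):
    """Generate a synthetic trace of n requests from the parameters."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    now = 0.0
    synthetic = []
    for _ in range(n):
        # Assumed arrival model: exponential gaps with the measured mean (> 0).
        now += rng.expovariate(1.0 / params["mean_interarrival_s"])
        op = "R" if rng.random() < params["read_ratio"] else "W"
        synthetic.append((now, op, int(params["mean_size_bytes"])))
    return synthetic


original = [(0.0, "R", 4096), (0.1, "R", 4096), (0.2, "W", 8192), (0.3, "R", 4096)]
params = characterize(original)

# Parameter adjustment: model an application change that doubles the request rate.
adjusted = dict(params, mean_interarrival_s=params["mean_interarrival_s"] / 2)
synthetic = generate(adjusted, n=1000)
```

Running `characterize` again on the synthetic (or replayed) trace gives the numbers needed for the comparison steps in the figure.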
• Completed a tool for I/O workload characterization and generation for parallel file systems
• Hfplayer v.2 (replay engine) is now available
• Proposed a new cache replacement scheme for non-volatile memory as main memory and disk as storage device
• A detailed design of integrating cloud storage with local storage
• Proposed a journaling based scheme for SSD
Recent Accomplishments
• Further Integration with block I/O, parallel file system I/O and replay engine
• How to improve the performance of storage systems?
• I/O workload phase detection
• How to apply knowledge in I/O workload to multi-level caching?
Research Focuses on I/O Workload Characterization and Generation
Conclusions
• Storage Research Faces Challenges from Applications (Big Data, Long-Term Data Preservation, Cloud Storage, Scalability)
• Also Faces Challenges from New Technologies (Emerging Memory/Storage Hierarchies)
• Integrated Approach Including Compute, Storage and Network Systems Consideration Is A Must (SDS???)