Information is Exploding
EVERY MINUTE
120 HOURS VIDEO UPLOADED TO YOUTUBE
50,000 APPS DOWNLOADED
204 MILLION E-MAILS
EVERY DAY
Intel Corporation 2015
The Data is Changing
Attribute | Performance Optimized | Capacity Optimized
Data Type | Structured | Unstructured
Record Size | Kilobytes or less | Megabytes to Terabytes
Data Updates | Frequent | Rare/never
Access Frequency | Heavy | Light
Metadata | Fixed | Variable
Scale Required | Up to Terabytes | Exabytes
Copyright 2014 IDC
“Unstructured Data accounts for 70-80% of storage capacity growth”
Ashish Nadkarni, IDC
1. Scale Out
2. Software Defined
3. Smart Data
Scale-Out Economics
• Start Small – Scale Large
• Start from a single node (TBs) but have the ability to scale to multiple independent nodes (PBs)
• RAIN Architecture
• Granular Resource Scaling: add CPUs and storage independently as needed
• Take advantage of decreasing storage costs and increased storage densities
Software Defined Storage
[Diagram: HyperStore Smart Storage Platform. Locations: Tokyo, New York, London. Workloads: Deep Archive, Analytics, Modern Apps. Interfaces: Object, HDFS, File]
100% S3
Always On
Smart Protect
Multi Datacenter
Smart Policies
Enterprise Grade
The Era of Smart Data Storage
DATA STORAGE = problem
SMART DATA STORAGE = solution
OBJECT STORE (DATA): Passive, Delayed Analytics, Static Data
HYPERSTORE ANALYTICS (INFORMATION): Active, Timely Insight, Meaning, Actionable, Business Value
Smart Data – Analytics in Place
Consumer Activity (Events, GPS, WiFi)
Social Media
Device Tracking and Logs
Result of Analysis
Cloudian HyperStore
INTERNET OF THINGS
BIG DATA
Fast
Efficient
Better business decisions
Event processing platform
Benefits
• Faster time-to-decision
• Analyze more – allows for efficient bulk data analysis in place
• No redundant storage of data
• HyperStore scales out with your data – adding nodes for I/O
• Take advantage of multi-core CPUs – makes sense for MapReduce
• Can feed smarter data to subsequent analytic systems
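The MapReduce pattern mentioned in the benefits above can be sketched in a few lines. This is a minimal single-process illustration in Python, not HyperStore or Hadoop code; the function names and the sample "device,event" records are hypothetical. Each chunk is mapped to partial counts independently (the step that could be spread over cores or nodes), and the partial counts are then reduced into one total.

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """Map step: count event types in one chunk of 'device,event' records."""
    return Counter(line.split(",")[1] for line in lines)

def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right

def count_events(chunks):
    # Each chunk is processed independently of the others.
    partials = [map_chunk(chunk) for chunk in chunks]
    return reduce(reduce_counts, partials, Counter())

# Hypothetical sample data, split into two chunks as if held on two nodes.
chunks = [
    ["dev1,gps", "dev2,wifi", "dev1,gps"],
    ["dev3,wifi", "dev2,gps"],
]
totals = count_events(chunks)
```

Because the map step touches only its own chunk, adding chunks (or nodes) does not change the code, only the amount of work done in parallel.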
Analytics
COST EFFICIENT
Cloudian & Hortonworks
YARN: Data Operating System
Script: Pig
Search: Solr
SQL: Hive/Tez, HCatalog
NoSQL: HBase, Accumulo
Stream: Storm
Others: In-Memory Analytics, ISV engines
Batch: MapReduce
1 ... N
Linux, Windows, On-Premise, Cloud
HDFS
S3 Native File System (URI scheme: s3n)
• HDFS Shell Commands
• File I/O Operations
• Mass Upload
• ETL with Pig
• Standard Map Reduce
• Analysis with Hive
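With the s3n scheme, the URI authority names the bucket and the path names the object key. A minimal Python sketch of that mapping follows; the helper name and the example path are hypothetical and only illustrate how such a URI decomposes.

```python
from urllib.parse import urlparse

def split_s3n_uri(uri):
    """Split an s3n URI into (bucket, key). The key may be empty."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3n":
        raise ValueError(f"expected an s3n:// URI, got {uri!r}")
    if not parsed.netloc:
        raise ValueError(f"missing bucket in {uri!r}")
    # The authority is the bucket; the path (minus the leading slash) is the key.
    return parsed.netloc, parsed.path.lstrip("/")

# Hypothetical object path inside the bucket.
bucket, key = split_s3n_uri("s3n://BUCKET/logs/2015/02/16/events.csv")
```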
• Availability
  – Peer to peer storage system
• Locality – Data Center Locality
  – Can enforce constraints on the location of Hadoop data and maintain locality of reference for Hadoop
  – Hadoop can be run on storage nodes
• Efficiency
  – Erasure Coding for efficient bulk data storage
  – Scale Cluster on demand as needed – dynamic rebalance
  – Multi-part uploading to improve large object uploads
• Rich metadata
  – Example: Pig can load filtered data directly from Cloudian HyperStore without passing through HDFS
      A = LOAD 's3n://BUCKET' USING CloudianStorage();
      B = FILTER A BY (time >= '2015/02/16') AND (time <= '2015/02/20');
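The date-range filter in the Pig example can be mirrored in Python to show what it selects. This is only a sketch: the records and field names are hypothetical, and it assumes the time field is a zero-padded YYYY/MM/DD string, in which case plain string comparison orders dates chronologically.

```python
def in_range(time_str, start="2015/02/16", end="2015/02/20"):
    """Keep records whose time falls in [start, end], both ends inclusive."""
    # Zero-padded YYYY/MM/DD strings sort lexicographically in date order.
    return start <= time_str <= end

# Hypothetical records standing in for objects loaded from the bucket.
records = [
    {"time": "2015/02/15", "event": "gps"},
    {"time": "2015/02/16", "event": "wifi"},
    {"time": "2015/02/18", "event": "gps"},
    {"time": "2015/02/20", "event": "log"},
    {"time": "2015/02/21", "event": "wifi"},
]
filtered = [r for r in records if in_range(r["time"])]
```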
Use Cases
Hadoop for Internet of Things
Clickstream data
Analysis of what people click on: individual web pages and the order in which they are visited.
Clickstream analysis can reveal how users research products and also how they complete their online purchases.
Internet Marketing, Online Commerce

Sentiment data
Unstructured data on opinions, emotions, and attitudes from sources like social media posts, blogs, online product reviews and customer support interactions.
Organizations use sentiment analysis to understand how the public feels about something and track how those opinions change over time.
Retail, Media & Entertainment

Server log data
Large enterprises build, manage and protect their own proprietary, distributed information networks. Server logs are the computer-generated records that report data on the operations of those networks.
When there is a problem, it's one of the first places the IT team looks for a diagnosis.
IT Organizations, Customer Support

Sensor data
From refrigerators and coffee makers to energy-measuring smart meters, sensor data is everywhere. It is created by the machinery that runs assembly lines and the cell towers that route our phone calls. It is net new data that is increasing exponentially in the information age.
Manufacturing, Industrial
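The clickstream use case above (which pages people click on, and in what order) can be illustrated with a small Python sketch. The event layout (user, timestamp, page) and the sample data are hypothetical; a real pipeline would read these events from stored logs rather than an in-memory list.

```python
from collections import Counter, defaultdict

def page_transitions(events):
    """Count page-to-page transitions per user from (user, timestamp, page) events."""
    by_user = defaultdict(list)
    for user, ts, page in events:
        by_user[user].append((ts, page))
    transitions = Counter()
    for visits in by_user.values():
        visits.sort()  # order each user's clicks by timestamp
        pages = [page for _, page in visits]
        transitions.update(zip(pages, pages[1:]))  # consecutive page pairs
    return transitions

# Hypothetical click events; u3's events arrive out of order on purpose.
events = [
    ("u1", 1, "/home"), ("u1", 2, "/product"), ("u1", 3, "/checkout"),
    ("u2", 1, "/home"), ("u2", 2, "/product"),
    ("u3", 5, "/product"), ("u3", 4, "/home"),
]
top = page_transitions(events).most_common(1)
```

The most common transitions give a first view of how users move from research pages toward purchase pages.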
Smart Support
[Diagram labels: CUSTOMER, CLOUDIAN, Cloudian Support, HyperStore Appliances, Hadoop Cluster, S3n://bucket/…, Telemetrics Data, Smart Support Analytics, Cloudian HyperStore Platform]
Multi tenancy & QoS
Requests per Min, Storage Bytes, Storage Objects, Inbound Bytes/Min, Outbound Bytes/Min
HyperStore Software Defined Storage
Tenant A, Tenant B, Tenant C
Storage Policies, Tiering, Access Control
Data Placement, Data Access
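To make the per-tenant QoS idea concrete, here is a minimal Python sketch that compares a tenant's usage against limits for the five metrics listed above. It is not HyperStore's API or configuration format; the class, field names and numbers are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class QosMetrics:
    """Hypothetical container for the five QoS metrics named on the slide."""
    requests_per_min: int
    storage_bytes: int
    storage_objects: int
    inbound_bytes_per_min: int
    outbound_bytes_per_min: int

def exceeded(usage, limits):
    """Return the names of the metrics where usage is above the limit."""
    return [name for name in vars(limits)
            if getattr(usage, name) > getattr(limits, name)]

# Hypothetical limits and usage for one tenant.
limits = QosMetrics(1_000, 10**12, 1_000_000, 10**9, 10**9)
tenant_a = QosMetrics(1_200, 5 * 10**11, 1_000_000, 2 * 10**9, 10**8)
over = exceeded(tenant_a, limits)
```

A usage value equal to its limit is treated as within bounds in this sketch; only strictly higher values are reported.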