1
Impact of Big Data Growth on Transparent Computing
Michael A. Greene
Intel Vice President, Software and Services Group; General Manager, System Technologies and Optimization
2
Transparent Computing (TC)
TC is user-controlled cloud computing.
– Prof. Zhang Yaoxue
3
Transparent Computing Vision
Users can get any information, any application, or any operating system from any device, transparently.
4
Transparent Computing (TC) is facing data growth challenges in the cloud
Data is the key driver behind TC services
5
Transparent Computing (TC) Challenges
TC data challenges come from four areas:
• Data Collection
• Data Process
• Data Storage
• Data Access
6
Billions of connected users sharing
• 5.3B cell phones
• 629M
• 364M Hotmail
• 663M Skype
• 273M Yahoo
8
>1500 Exabytes
of cloud traffic
10
1400 Exabytes
of new integrated systems data
12
690% growth in storage capacity by 2015
[Figure: data volume over time; categories: Corporate Data, Big Corp Data, Big Web Data, Big Sensed Data; Structured Data and Unstructured Data]
14
What insights can we derive?
[Figure: business value vs. complexity; stages: Reporting, Analysis, Monitoring, Prediction]
How are you approaching the opportunity?
Are you looking at Big Data?
• 75% Yes
• 20% No, but on radar
• 5% No
15
The Big Data Platform
Big Data technology can solve TC data challenges
16
Intel Role in Big Data
• Distribute analytics to the edge sensors/devices and drive a standards-based, connected, managed, and secure architecture
• Drive innovation in big data applications by providing optimized software stacks and services
• Foster the growth of big data through partner collaboration, focused on usage model examples and reference deployment architectures
• Invest in solution research and academia collaboration
• Accelerate big data analytics through faster and more effective CPU, storage, I/O, and network architectures
17
Intel® Intelligent Systems Framework: Simplifying the Internet of Things
• Driving secure interoperability: billions of devices that need to share data with each other and the cloud
• Unlocking edge data: edge systems need to react to streaming data in real time
• Filtering data: data volume outpacing network and storage efficiency
Data Collection
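The idea of filtering data at the edge, so that less of it has to cross the network or land in storage, can be sketched with a simple deadband filter. This is only an illustration under stated assumptions: the function name `deadband_filter`, the threshold, and the sample readings are hypothetical and are not part of the Intel Intelligent Systems Framework.

```python
from typing import Iterable, Iterator


def deadband_filter(readings: Iterable[float], threshold: float) -> Iterator[float]:
    """Forward a reading only if it differs from the last forwarded
    reading by more than `threshold` (hypothetical edge-side filter)."""
    last_sent = None
    for value in readings:
        if last_sent is None or abs(value - last_sent) > threshold:
            last_sent = value
            yield value


if __name__ == "__main__":
    # Hypothetical temperature samples arriving at an edge device.
    sensor_stream = [20.0, 20.1, 20.2, 21.5, 21.6, 19.0]
    forwarded = list(deadband_filter(sensor_stream, threshold=1.0))
    print(f"forwarded {len(forwarded)} of {len(sensor_stream)} readings: {forwarded}")
```

Because the filter is a generator, it processes readings one at a time as they stream in, which matches the need for edge systems to react in real time rather than batch data for later upload.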
18
Platform and Software Optimizations for Hadoop
• Up to eight cores
• Up to 20 MB cache
• Up to four channels DDR3 1600 MHz memory
• Integrated PCI Express* 3.0, up to 40 lanes per socket
• Up to 80% Performance Boost vs. Prior Generation
– Intel® AVX - Reduce Compute Time
– Intel Turbo Boost
• Hadoop Optimizations
– Built on Open Source Releases
– Custom Tuning for Data Types and Scaling Approaches
1 Performance comparison using best submitted/published 2-socket server results on the SPECfp*_rate_base2006 benchmark as of 6 March 2012.
2 Source: Intel internal measurements of average time for an I/O device read to local system memory under idle conditions comparing Intel® Xeon® processor E5-2600 product family (230 ns) vs. Intel® Xeon® processor 5500 series (340 ns). See notes in backup for configuration details
* Other names and brands may be claimed as the property of others
Data Process
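The two I/O read latencies quoted in footnote 2 (230 ns and 340 ns) can be turned into a relative reduction with a line of arithmetic. The sketch below only restates those two published figures; the helper name `relative_reduction` is an assumption for illustration, and the result says nothing about the separate "up to 80%" benchmark claim.

```python
def relative_reduction(old: float, new: float) -> float:
    """Return the fractional reduction when going from `old` to `new`."""
    return (old - new) / old


# Latencies taken from footnote 2 of the slide.
xeon_5500_ns = 340.0      # Intel Xeon processor 5500 series
xeon_e5_2600_ns = 230.0   # Intel Xeon processor E5-2600 product family

reduction = relative_reduction(xeon_5500_ns, xeon_e5_2600_ns)
print(f"I/O read latency reduction: {reduction:.1%}")
```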
19
Intel & Cloudera Strategic Partnership
• CDH to be performance-optimized for Intel Architecture
• Support for Intel CPUs, Ethernet, SSD, security & future technologies
• Promote CDH as the Hadoop distribution of choice
• Largest strategic shareholder in Cloudera
Data Process
20
Near Real-time Insight Enabled by In-Memory Solutions
“In-memory analytics” are “game changing”
Architecting for the in-memory model:
• HANA
• TimesTen In-Memory Database
• Business Intelligence Enterprise Edition
• VoltDB: “20 node VoltDB system can do what a 1000 node Hadoop cluster can do …” (Michael Stonebraker)
• Objectivity, GraphDB, SolidDB
Low Cost Memory Technology
[Figure: $/TB for Q4 2010 (DRAM), Q4 2016 (DRAM), and 2016 (CR); 20x reduction]
SAP HANA* Scalability
[Figure: running time (s) vs. socket count (1, 2, 4, 8); series: Customer Workload, Ideal; 8S Glueless]
Near-perfect scaling on Intel Xeon processor E7 family
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Source: http://www.intel.com/content/www/us/en/high-performance-computing/high-performance-computing-xeon-e7-analyze-business-as-it-happens-with-sap-hana-software-brief.html
Data Process
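"Near-perfect scaling" means running time falls almost in proportion to socket count. A minimal sketch of how speedup and parallel efficiency could be computed from such a chart follows. The timings are hypothetical placeholders, not values read from the SAP HANA figure, and `scaling_report` is an illustrative helper name.

```python
def scaling_report(timings):
    """Compute (speedup, efficiency) per socket count.

    `timings` maps socket count -> running time in seconds.
    Speedup is measured against the smallest socket count;
    efficiency is speedup divided by the ideal (linear) speedup.
    """
    base_sockets = min(timings)
    base_time = timings[base_sockets]
    report = {}
    for sockets in sorted(timings):
        speedup = base_time / timings[sockets]
        ideal = sockets / base_sockets
        report[sockets] = (speedup, speedup / ideal)
    return report


if __name__ == "__main__":
    # Hypothetical running times in seconds, for illustration only.
    hypothetical = {1: 4000.0, 2: 2050.0, 4: 1060.0, 8: 560.0}
    for sockets, (speedup, efficiency) in scaling_report(hypothetical).items():
        print(f"{sockets} sockets: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

An efficiency close to 100% at every socket count is what a curve lying near the "Ideal" line would correspond to.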
21
Intelligent Storage Pays Off
• De-duplication
• Intelligent Tiering
• Thin Provisioning
• Real Time Compression
[Figure: de-duplication, before and after]
[Figure: traditional allocation vs. thin provisioning for applications 1 to 3; traditional allocation shows used and allocated-but-free space per application, thin provisioning shows used space per application with system-wide capacity reserved]
• Up to 80% data reduction (note 2)
• 95% smaller backup (note 1)
• Up to 80% reduction in disk expenses (note 3)
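De-duplication, one of the techniques named on this slide, stores each distinct block of data once and keeps references for the repeats. The sketch below is a toy illustration under stated assumptions: fixed-size chunks and SHA-256 digests are choices made for this example, and the function names are hypothetical. It does not describe how any IBM or Dell product implements de-duplication.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4):
    """Split `data` into fixed-size chunks and store each distinct chunk once.

    Returns (store, recipe): `store` maps digest -> chunk bytes,
    `recipe` is the ordered list of digests needed to rebuild `data`.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original bytes from the chunk store and recipe."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    backup = b"ABCDABCDABCDWXYZ"  # hypothetical data with repeated blocks
    store, recipe = deduplicate(backup)
    stored = sum(len(chunk) for chunk in store.values())
    print(f"original: {len(backup)} bytes, unique chunk data: {stored} bytes")
    assert restore(store, recipe) == backup
```

The saving depends entirely on how repetitive the input is, which is why backup workloads, where successive copies share most of their content, tend to be quoted as the best case.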
1 IBM storage simulcast, November 9, 2011
2 IBM storage simulcast, November 9, 2011
3 Dell “Fluid Data Storage: Driving Flexibility in the Data Center”, February 2011
4