SAP HANA, HADOOP
Big Data: Why now?
• 85% of Top 500 enterprises will fail to exploit Big Data (2)
• >30% of enterprises have no formal concept for data management (5)
• x2: digital data globally doubles every two years (1)
• 90% of all data is unstructured and cannot be handled with traditional analytics tools
• 10-50% cost reduction in production through Big Data exploitation (1)
• 70% of all IT investment in 2015 will be Big Data driven (2, 4)
Sources: 1 IDC Predictions 2012; 2 Gartner, Predicts 2012.
The BI Ecosystem according to Forrester
[Figure: database landscape. Source: Forrester Research, Inc.]
• Relational: Relational OLTP, Scale-out relational
• Enterprise data warehouse: Traditional EDW, Column-store EDW, MPP EDW
• NoSQL (nonrelational): Object database, Graph database, Document database, Key-value
• Also shown: Mobile database, In-memory database, Database appliances, Cloud database
• Traditional data sources: CRM, ERP, Legacy apps
• New data sources: Public data, Sensors, Marketplace, Social media, Geo-location
Cost of a Terabyte of Enterprise Disk Storage
• 1990 – in the region of USD 9 million
• 2013 – in the region of USD 100
Cost of a Terabyte of RAM
• 1990 – in the region of USD 106 million
• 2013 – in the region of USD 500
• i.e. over roughly the last 20 years the price ratio of memory to storage has dropped from about 12:1 to 5:1
• In absolute terms, the price of a terabyte of RAM has dropped by a factor of about 200 000
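The ratios quoted here follow directly from the four prices on the slide. The short Python check below is only an illustration of that arithmetic; the variable names are made up for this sketch and the inputs are the slide's own approximate figures.

```python
# Approximate prices per terabyte in USD, as quoted on the slide.
# Variable names are illustrative only.
disk_1990, disk_2013 = 9_000_000, 100
ram_1990, ram_2013 = 106_000_000, 500

# How many times more expensive RAM is than disk, per terabyte.
ratio_1990 = ram_1990 / disk_1990      # about 11.8, i.e. roughly 12:1
ratio_2013 = ram_2013 / disk_2013      # 5.0, i.e. 5:1

# How far the price of RAM itself has fallen.
ram_price_drop = ram_1990 / ram_2013   # 212 000, i.e. roughly 200 000 times

print(f"RAM to disk price ratio, 1990: {ratio_1990:.1f}:1")
print(f"RAM to disk price ratio, 2013: {ratio_2013:.1f}:1")
print(f"RAM price drop factor: {ram_price_drop:,.0f}")
```

The first two values give the memory-to-storage price ratio in each year; the last one is the drop in the RAM price on its own.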
Performance Comparison of Memory to Disk Read
• Enterprise Disk – between 4 and 13 million nanoseconds
• Memory – between 0.4 and 40 nanoseconds
• i.e. between 150 000 and 1 million times faster when the data is already in memory
Positioning Big Data Technologies
November 2013
Approaching and beyond mainstream adoption: Hadoop SQL Interfaces, Hadoop Distribution
Big Data tools complement existing BI investment
They do not replace them, yet
• Existing data sources: Transactional OLTP DBMS; Business Applications (ERP, CRM, etc.)
• Data stores: Data Warehouse/Appliance, Data Mart, Cube
• Business Intelligence tools and analytical applications: Reporting, Dashboard, OLAP, Data & Text Mining
New data sources (structured and unstructured data)
• Static data: Hadoop, NoSQL, Log-Data, In-Memory Database
• Flowing data: Real-time data processing and analysis, Complex event processing
• Operational Intelligence, Predictive Analytics
The 3 V’s of Big Data
Technology solution: Legacy BI
• Business problem: Backward-looking analysis, using data out of business applications
• Selected vendors: SAP Business Objects, IBM Cognos, MicroStrategy
• Data type/scalability: Structured; limited (2 – 3 TB in RAM)
Technology solution: High performance BI
• Business problem: Quasi-real-time, in-memory analysis, using data out of business applications; Complex Event Processing
• Selected vendors: SAP HANA
• Data type/scalability: Structured; limited (1 PB in RAM)
Technology solution: "Hadoop" Ecosystem
• Business problem: Batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources
• Selected vendors: Cloudera Hadoop, Hortonworks Hadoop
• Data type/scalability: Structured or unstructured; quasi unlimited (20 – 30 PB)
HADOOP vs In-Memory analytics
• How fast do you want your delivery made?
• What is being delivered?
• How much do you want to spend?
• Do you have specialist drivers?
HADOOP vs In-Memory analytics
• Hadoop (with Impala) = MPV: good performance, capacity, easy to drive, affordable
• Hadoop (without Impala) = long-haul trucks: excellent capacity, drives overnight, moderate performance, needs a specialist driver's license
• IMA (in-memory analytics) = Ferrari: sexy, very fast
HADOOP vs In-Memory analytics
Some Hadoop improvements
• Cloudera's Hadoop offerings: when you buy the trucks, they throw in the MPVs for free
• Hadoop becomes easier and easier to use, with the ecosystem of contributors and distributions, e.g. Cloudera's Impala, Microsoft's HDInsight, MapR's Drill, Hortonworks' Stinger Initiative
• Hadoop 2.0 brings YARN, graph analysis and stream processing
• Given the speed of improvements in HDFS/HBase/Hive/YARN, the gap between batch and real-time/low-latency processing is going to narrow fairly soon
Use case segmentation drives solution design and technology selection
1. Real-time reporting of SAP OLTP data, including joins and data transformations
   Technology: SAP HANA
2. Summarise unstructured data logs (scheduled)
   Technology: Hadoop MapReduce
3. Real-time reporting of summarised data logs, with joins to other non-OLTP data
   Technology: Impala
4. Near-real-time reporting of social media data
   Technology: Impala + Hadoop MapReduce (scheduled to collect recent social media data)
5. Real-time reporting of recent OLTP data joined with recent social media data
   Technology: HANA + Hadoop MapReduce (scheduled to collect recent social media data and load into HANA)
6. Image analysis processing (scheduled)
   Technology: Hadoop MapReduce (scheduled job runs sophisticated analysis of video files and stores results in a structured file)
7. Image analysis reporting
   Technology: Impala (to report on results file)
8. Predictive analysis reporting (comparing OLTP and non-OLTP data)
   Technology: HANA + Hadoop MapReduce (scheduled to collect and transfer applicable historic or relevant non-OLTP data to HANA)
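The segmentation above is in effect a lookup from use case to a reporting engine, a scheduled batch engine, or both. The Python sketch below encodes it that way. It assumes the eight use cases pair with the eight technology entries in the order they are listed; the short keys, the `Recommendation` class and the function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Recommendation:
    reporting: Optional[str]  # engine answering interactive queries, if any
    batch: Optional[str]      # scheduled engine preparing the data, if any


# Hypothetical short keys for the eight use cases, paired in slide order.
USE_CASES = {
    "oltp_realtime_reporting": Recommendation("SAP HANA", None),
    "log_summarisation": Recommendation(None, "Hadoop MapReduce"),
    "log_summary_reporting": Recommendation("Impala", None),
    "social_media_near_realtime": Recommendation("Impala", "Hadoop MapReduce"),
    "oltp_plus_social_realtime": Recommendation("SAP HANA", "Hadoop MapReduce"),
    "image_analysis_processing": Recommendation(None, "Hadoop MapReduce"),
    "image_analysis_reporting": Recommendation("Impala", None),
    "predictive_reporting": Recommendation("SAP HANA", "Hadoop MapReduce"),
}


def recommend(use_case: str) -> Recommendation:
    """Return the technology pairing for a known use case key."""
    try:
        return USE_CASES[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None


def describe(rec: Recommendation) -> str:
    """Render a recommendation in the slide's 'A + B (scheduled)' style."""
    parts = []
    if rec.reporting:
        parts.append(rec.reporting)
    if rec.batch:
        parts.append(f"{rec.batch} (scheduled)")
    return " + ".join(parts)


print(describe(recommend("oltp_plus_social_realtime")))
```

Separating the reporting engine from the batch engine makes the pattern in the list visible: anything interactive goes to HANA or Impala, and anything scheduled or unstructured is prepared by MapReduce first.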
The NEW Real time analytics with SAP HANA & Hadoop
[Figure: integrate and federate SAP and non-SAP data]
• Sources: SAP ERP/DW, Sybase ASE & IQ, 3rd party DBMS, Sybase ESP
• Integration paths: SLT, DXC, ETL, SAP DS, Smart Access
• In-memory: SAP HANA
• MapReduce/batch computing engine: Hadoop
• UI/front end analytics: SAP LIVE & UI Analytics, Mobile & Embedded Applications, non-SAP BI
Learning some of the language of Big Data
Jaspersoft, Karmasphere Studio, Talend, Pentaho, Continuity
NoSQL, MongoDB, Cassandra, CouchDB, Redis, Riak, Neo4j
Platfora, Tableau, Splunk, Shep
Hadoop, MapReduce, ZooKeeper, Avro, Nutch, HDFS
Matlab, R, Python, JRuby, Ruby, Java, C++
Kafka, InfoChimps, Skytree, Greenplum, Aster, GoPivotal
Hive, Pig, HBase, Chukwa, YARN
The other Big Data tools
Once you have a data store and a means of accessing the data:
• Operational Intelligence Platform
• Video search, audio search and content analytics
• Text search
• Graph databases
• Complex event processing
• In-memory data grid
• Speech recognition
• Pattern recognition
Some new roles in data/analytics
The coming of age of data in the enterprise
• The Data Scientist
• The Chief Data Officer
• Data Explorer
• Campaign Expert
• Data Security Officer
• Business Solution Architect/Domain Expert
• Data Hygienist/Data Steward
Big Data talent gap expected until 2018
The information-driven transport & logistics & retail provider
[Figure: data flows between the provider, its data sources and its customers]
• External online sources: Facebook, Twitter, LinkedIn, Google+, YouTube, TomTom, MarketWatch, Financial Times, Bloomberg
• Existing customer base: High-Tech/Pharma, Manufacturing/FMCG, Commerce Sector, Households/SME
• New customer base: Financial Industry, Public Authorities, Market Research, SME, Retail
• Commercial data services: Address Verification, Market Intelligence, Supply Chain Monitoring, Environmental Statistics
• Internal functions: Marketing and Sales, Product Management, Operations, New Business
• Data flows: order volume, received service quality; customer sentiment and feedback; location, destination, availability; network flow data; real-time incidents; market and customer intelligence; location, traffic density, directions, delivery sequence; continuous sensor data
1. Real-time route optimization
   Delivery routes are dynamically calculated based on delivery sequence, traffic conditions and recipient status.
2. Consolidated pickup and delivery
   Carriers of multiple existing fleets are leveraged to pick up or deliver shipments along routes they would take anyway.
3. Strategic network planning
   Long-term demand forecasts for transport capacity are generated in order to support strategic investments into the network.
4. Operational capacity planning
   Short- and mid-term capacity planning allows optimal utilization and scaling of manpower and resources.
5. Customer loyalty management
   Public customer information is mapped against business parameters in order to predict churn and initiate countermeasures.
6. Service improvement and product innovation
   A comprehensive view on customer requirements and service quality is used to enhance the product portfolio.
7. Risk evaluation and resilience planning
   By tracking and predicting events that lead to supply chain disruptions, the resilience level of transport services is increased.
8. Market intelligence for SME
   Supply chain monitoring data is used to create market intelligence reports for small and medium-sized companies.
9. Financial demand and supply chain analytics
   A micro-economic view is created on global supply chain data that helps financial institutions improve their rating and investment decisions.
10. Address verification
   Fleet personnel verify recipient addresses, which are transmitted to a central address verification service provided to retailers and marketing agencies.
11. Environmental intelligence
   Sensors attached to delivery vehicles produce fine-meshed statistics on pollution, traffic density, noise, parking spot utilization etc.
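To make use case 1 (real-time route optimization) more concrete, here is a deliberately small Python sketch. It is not the method used by any logistics provider: it assumes a greedy nearest-neighbour heuristic on planar coordinates, a per-stop `traffic_factor` standing in for live traffic conditions, and a `recipient_available` flag standing in for recipient status. All names and values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stop:
    name: str
    x: float                          # planar position, arbitrary units (hypothetical)
    y: float
    recipient_available: bool = True  # stand-in for recipient status
    traffic_factor: float = 1.0       # > 1.0 means congestion on the approach


def travel_cost(position, stop):
    """Straight-line distance to the stop, scaled by its traffic factor."""
    distance = math.hypot(stop.x - position[0], stop.y - position[1])
    return distance * stop.traffic_factor


def plan_route(depot, stops):
    """Greedy ordering: always drive to the cheapest remaining reachable stop."""
    remaining = [s for s in stops if s.recipient_available]
    position, route = depot, []
    while remaining:
        nxt = min(remaining, key=lambda s: travel_cost(position, s))
        route.append(nxt.name)
        remaining.remove(nxt)
        position = (nxt.x, nxt.y)
    return route


stops = [
    Stop("A", 1, 0),
    Stop("B", 2, 0, traffic_factor=4.0),           # congested approach
    Stop("C", 0, 3),
    Stop("D", 5, 5, recipient_available=False),    # recipient not at home
]
print(plan_route((0, 0), stops))   # B is postponed, D is skipped

stops[1].traffic_factor = 1.0      # congestion clears, so re-plan
print(plan_route((0, 0), stops))
```

Re-running `plan_route` whenever traffic or recipient status changes is what makes the route "dynamic" in this sketch; a production system would use a proper vehicle-routing solver and road-network travel times.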
smartPORT logistics
developed by T-Systems, Deutsche Telekom Innovation Laboratories, SAP Research and Hamburg Port Authority
• Only location-based information sent to driver, thanks to geo-fencing
• Precise communications thanks to real-time data and smart devices
• Stakeholder integration incl. port authority, forwarding agents, terminal and parking lot operators, plus others as required (sea shipping companies etc.)
• 5-10 minutes saved per tour means one more pick-up per day
• Portal provides transparency for all stakeholders, with role-based access
• Cloud solution collects all relevant real-time information in one place
Greater efficiency for truck and container movements
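The geo-fencing idea mentioned for smartPORT logistics can be illustrated with a simple circular fence: a message reaches a driver only if the truck is within a given radius of the place the message concerns. The Python sketch below is an assumption-laden illustration, not the smartPORT implementation; the coordinates, radii, message texts and function names are all invented for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def messages_for_truck(truck_pos, messages):
    """Keep only messages whose circular geofence contains the truck."""
    lat, lon = truck_pos
    return [
        m["text"]
        for m in messages
        if haversine_km(lat, lon, m["lat"], m["lon"]) <= m["radius_km"]
    ]


# Illustrative values only; not real port locations or notices.
messages = [
    {"text": "Gate 3 congested", "lat": 53.530, "lon": 9.950, "radius_km": 2.0},
    {"text": "Parking lot full", "lat": 53.600, "lon": 10.100, "radius_km": 1.0},
]
print(messages_for_truck((53.535, 9.960), messages))
```

A circular fence keeps the check to a single distance comparison; real deployments often use polygon fences that follow terminal or road boundaries.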
• 100 % compliance with legal requirements
• Up to 20 % lower costs 1)
• Full transparency
• Up to 20 % reduction in HR costs thanks to automation
• Seamless data flow
• Rapid reactions
Patient controlled data distribution
VOLUME, VELOCITY, VARIETY, VALUE
Integration, consolidation, optimization: processing & integrating smart data management
Factor of 5.8: potential growth by 2015 2)
• Secured connection for error-free data transfer
• Optimization and automation of processes
• Pinpointing guzzlers
• Intelligent management of medical care
• Management of devices
• Immediate availability of patient and POC data
Physicians, Specialists, Family Doctors; Insurance; Hospitals & Pharma