SAP HANA, HADOOP and other Big Data Tools

(1)

SAP HANA, HADOOP

(2)

Big Data: Why now?

85% of Top 500 enterprises will fail to exploit Big Data (2)

>30% of enterprises have no formal concept for data management (5)

x2: digital data globally doubles every two years (1)

90% of all data is unstructured and cannot be handled with traditional analytics tools (1)

10-50% cost reduction in production through Big Data exploitation (4)

70% of all IT investment in 2015 will be Big Data driven (2)

Sources: (1) IDC Predictions 2012, (2) Gartner, Predicts 2012.

(3)

The BI Ecosystem according to Forrester

Traditional data sources: CRM, ERP, legacy apps

New data sources: public data, sensors, marketplace, social media, geo-location

Relational: relational OLTP, scale-out relational

Enterprise data warehouse: traditional EDW, column-store EDW, MPP EDW

NoSQL (nonrelational): key-value, document database, graph database, object database

Also shown: mobile database, in-memory database, database appliances, cloud database

Source: Forrester Research, Inc.

(4)

Cost of a Terabyte of Enterprise Disk Storage

1990 – in the region of USD 9 million

2013 – in the region of USD 100

Cost of a Terabyte of RAM

1990 – in the region of USD 106 million

2013 – in the region of USD 500

i.e. over roughly the last 20 years the price ratio of memory to storage has dropped from about 12:1 to 5:1

But in absolute terms the price of a terabyte of RAM has dropped by a factor of around 200 000

Performance Comparison of Memory to Disk Read

Enterprise Disk – between 4 and 13 million nanoseconds

Memory – between 0.4 and 40 nanoseconds

i.e. between 150 000 and 1 million times faster when already in memory
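
The price ratios on this slide can be checked with a few lines of arithmetic. The sketch below uses only the approximate per-terabyte figures quoted above; the function and constant names are illustrative and not part of the original deck.

```python
# Approximate USD prices per terabyte, as quoted on the slide.
DISK_USD_PER_TB = {1990: 9_000_000, 2013: 100}
RAM_USD_PER_TB = {1990: 106_000_000, 2013: 500}


def ram_to_disk_ratio(year):
    """How many times more a terabyte of RAM costs than a terabyte of disk."""
    return RAM_USD_PER_TB[year] / DISK_USD_PER_TB[year]


def price_drop_factor(prices):
    """Factor by which the price fell between 1990 and 2013."""
    return prices[1990] / prices[2013]


if __name__ == "__main__":
    for year in (1990, 2013):
        print(f"{year}: RAM costs about {ram_to_disk_ratio(year):.1f}x disk per TB")
    print(f"RAM price drop:  about {price_drop_factor(RAM_USD_PER_TB):,.0f}x")
    print(f"Disk price drop: about {price_drop_factor(DISK_USD_PER_TB):,.0f}x")
```

With these inputs the RAM-to-disk ratio comes out at roughly 12 in 1990 and 5 in 2013, and the RAM price falls by a factor of roughly 200 000.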

(5)

Positioning Big Data Technologies

November 2013

Approaching and beyond mainstream adoption

Hadoop SQL Interfaces

Hadoop Distribution

(6)

Big Data tools complement existing BI investment

They do not replace them - Yet

Existing data sources: Transactional OLTP DBMS; Business Applications (ERP, CRM, etc.)

Data Warehouse Appliance, Data Mart, Cube

Business Intelligence Tools and analytical applications: Reporting, Dashboard, OLAP, Data & Text Mining

(7)

Big Data tools complement existing BI investment

They do not replace them - Yet

Hadoop, NoSQL, Log-Data

In-Memory Database

Static data

Flowing data

Real-time data processing and analysis

Complex event processing

Structured and unstructured data

New data sources

Operational Intelligence

Predictive Analytics


(8)

The 3 V’s of Big Data

Legacy BI

Business problem: Backward-looking analysis, using data out of business applications

Technology solution, selected vendors: SAP Business Objects, IBM Cognos, MicroStrategy

Data type/scalability: Structured; limited (2 – 3 TB in RAM)

High performance BI

Business problem: Quasi-real-time, in-memory analysis, using data out of business applications; Complex Event Processing

Technology solution, selected vendors: SAP HANA

Data type/scalability: Structured; limited (1 PB in RAM)

"Hadoop" Ecosystem

Business problem: Batch, forward-looking predictive analysis; questions defined in the moment, using data from many sources

Technology solution, selected vendors: Cloudera Hadoop, Hortonworks Hadoop

Data type/scalability: Structured or unstructured; quasi unlimited (20 – 30 PB)

(9)

HADOOP vs In-Memory analytics

How fast do you want your delivery made?

What is being delivered?

How much do you want to spend?

Do you have specialist drivers?

(10)

HADOOP vs In-Memory analytics

Hadoop (with Impala) = MPV: good performance, capacity, easy to drive, affordable

Hadoop (without Impala) = Long Haul Trucks: excellent capacity, drives overnight, moderate performance, needs a specialist driver’s license

IMA (in-memory analytics) = Ferrari: sexy, very fast

(11)

HADOOP vs In-Memory analytics

Some Hadoop improvements

Cloudera’s Hadoop offerings: when you buy the trucks, they throw in the MPVs for free

Hadoop is becoming easier and easier to use, thanks to the ecosystem of contributors and distributions, e.g. Cloudera’s Impala, Microsoft’s HDInsight, MapR’s Drill, Hortonworks’ Stinger Initiative

Hadoop 2.0 brings YARN, graph analysis and stream processing

Given the speed of improvements in HDFS/HBase/Hive/YARN, the gap between batch and real-time/low-latency processing is likely to narrow fairly soon

(12)

Use case segmentation drives solution design and technology selection

1. Real-time reporting of SAP OLTP data, including joins and data transformations → SAP HANA

2. Summarise unstructured data logs (scheduled) → Hadoop MapReduce

3. Real-time reporting of summarised data logs, with joins to other non-OLTP data → Impala

4. Near-real-time reporting of social media data → Impala + Hadoop MapReduce (scheduled to collect recent social media data)

5. Real-time reporting of recent OLTP data joined with recent social media data → HANA + Hadoop MapReduce (scheduled to collect recent social media data and load it into HANA)

6. Image analysis processing (scheduled) → Hadoop MapReduce (scheduled job runs sophisticated analysis of video files and stores results in a structured file)

7. Image analysis reporting → Impala (to report on the results file)

8. Predictive analysis reporting (comparing OLTP and non-OLTP data) → HANA + Hadoop MapReduce (scheduled to collect and transfer applicable historic or relevant non-OLTP data to HANA)
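
To give a feel for what a scheduled MapReduce job that summarises data logs does, here is a minimal sketch of the map, shuffle and reduce steps in plain Python. It does not use the Hadoop API, it runs in a single process, and the log format ("timestamp LEVEL message") and the function names are assumptions made purely for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map step: emit a (log_level, 1) pair for every well-formed log line."""
    for line in lines:
        parts = line.split()
        # Assumed format: "<timestamp> <LEVEL> <message...>"
        if len(parts) >= 2:
            yield parts[1], 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def summarise(lines):
    """Run the three steps in sequence on an iterable of log lines."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    sample = [
        "2013-11-01T10:00:00 INFO order received",
        "2013-11-01T10:00:02 ERROR payment failed",
        "2013-11-01T10:00:05 INFO order shipped",
    ]
    print(summarise(sample))
```

In a real cluster the map and reduce steps run in parallel on many nodes and the small structured result is what a tool such as Impala or HANA would then report on.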

(13)

The NEW Real time analytics with SAP HANA & Hadoop

Integrate and federate SAP and non-SAP data

Computing engine: SAP HANA (In-Memory), Hadoop (MapReduce/Batch)

Data sources: SAP ERP/DW, Sybase ASE & IQ, 3rd party DBMS, Sybase ESP, Hadoop

Integration paths: SLT, DXC, ETL, SAP DS, Smart Access

UI/Front end analytics: SAP LIVE & UI Analytics, Mobile & Embedded Applications, non-SAP BI

(14)

Learning some of the language of Big Data

Jaspersoft

Karmasphere Studio

Talend

Pentaho

Continuity

NoSQL

MongoDB

Cassandra

CouchDB

Redis

Riak

Neo4j

Platfora

Tableau

Splunk

Shep

Hadoop

MapReduce

ZooKeeper

Avro

Nutch

HDFS

Matlab

R

Python

JRuby

Ruby

Java

C++

Kafka

InfoChimps

Skytree

GreenPlum

Aster

GoPivotal

Hive

Pig

HBase

Chukwa

YARN

(15)

The other Big Data tools

Once you have a data store and a means of accessing the data.

Operational Intelligence Platform

Video search, audio search and content analytics

Text search

Graph databases

Complex event processing

In-memory data grid

Speech recognition

Pattern recognition

(16)

Some new roles in data/analytics

The coming of age of data in the enterprise

The Data Scientist

The Chief Data Officer

Data Explorer

Campaign Expert

Data Security Officer

Business Solution Architect / Domain Expert

Data Hygienist / Data Steward

Big Data talent gap expected until 2018

(17)

The information-driven Transport & Logistics & Retail provider

External online sources: Facebook, Twitter, LinkedIn, Google+, YouTube, TomTom, MarketWatch, Financial Times, Bloomberg

Existing customer base: High-Tech / Pharma, Manufacturing / FMCG, Commerce Sector, Households / SME

New customer base: Financial Industry, Public Authorities, Market Research, SME, Retail

Commercial data services: Address Verification, Market Intelligence, Supply Chain Monitoring, Environmental Statistics

Internal functions: Marketing and Sales, Product Management, Operations, New Business

Data flows: order volume and received service quality; customer sentiment and feedback; location, destination, availability; network flow data; real-time incidents; market and customer intelligence; location, traffic density, directions, delivery sequence; continuous sensor data

1. Real-time route optimization: Delivery routes are dynamically calculated based on delivery sequence, traffic conditions and recipient status.

2. Consolidated pickup and delivery: Carriers of multiple existing fleets are leveraged to pick up or deliver shipments along routes they would take anyway.

3. Strategic network planning: Long-term demand forecasts for transport capacity are generated in order to support strategic investments into the network.

4. Operational capacity planning: Short- and mid-term capacity planning allows optimal utilization and scaling of manpower and resources.

5. Customer loyalty management: Public customer information is mapped against business parameters in order to predict churn and initiate countermeasures.

6. Service improvement and product innovation: A comprehensive view on customer requirements and service quality is used to enhance the product portfolio.

7. Risk evaluation and resilience planning: By tracking and predicting events that lead to supply chain disruptions, the resilience level of transport services is increased.

8. Market intelligence for SME: Supply chain monitoring data is used to create market intelligence reports for small and medium-sized companies.

9. Financial demand and supply chain analytics: A micro-economic view is created on global supply chain data that helps financial institutions improve their rating and investment decisions.

10. Address verification: Fleet personnel verify recipient addresses, which are transmitted to a central address verification service provided to retailers and marketing agencies.

11. Environmental intelligence: Sensors attached to delivery vehicles produce fine-meshed statistics on pollution, traffic density, noise, parking spot utilization etc.

(18)

smartPORT logistics

developed by T-Systems, Deutsche Telekom Innovation Laboratories, SAP Research and Hamburg Port Authority

Greater efficiency for truck and container movements

Only location-based information sent to driver, thanks to geo-fencing

Precise communications thanks to real-time data and smart devices

Stakeholder integration, incl. port authority, forwarding agents, terminal and parking lot operators, plus others as required (sea shipping companies etc.)

5-10 minutes saved per tour means one more pick-up per day

Portal provides transparency for all stakeholders, with role-based access

Cloud solution collects all relevant real-time information in one place
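
As an illustration of the geo-fencing idea mentioned on this slide, the sketch below checks whether a truck's position lies inside a circular zone and returns only the messages for zones the truck is currently in. It is not the smartPORT implementation: the circular fence shape, the data layout, the coordinates and the function names are all assumptions for the example. Distances use the standard haversine formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def inside_geofence(position, centre, radius_km):
    """True if position (lat, lon) is within radius_km of the fence centre."""
    return haversine_km(*position, *centre) <= radius_km


def messages_for(position, fences):
    """Return only the messages whose (hypothetical) circular fence contains the truck."""
    return [f["message"] for f in fences
            if inside_geofence(position, f["centre"], f["radius_km"])]


if __name__ == "__main__":
    # Illustrative coordinates only, not real terminal locations.
    fences = [
        {"centre": (53.54, 9.97), "radius_km": 2.0, "message": "Terminal gate 3 is congested"},
        {"centre": (53.00, 9.00), "radius_km": 2.0, "message": "Parking lot B is full"},
    ]
    print(messages_for((53.545, 9.975), fences))
```

A production system would more likely use polygon fences and a spatial index, but the filtering principle is the same: a driver receives a message only while inside the relevant zone.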

(19)

VOLUME, VELOCITY, VARIETY, VALUE

100 % compliance with legal requirements

Up to 20 % lower costs 1)

Full transparency

Up to 20 % reduction in HR costs thanks to automation

Seamless data flow

Rapid reactions

Patient controlled data distribution

Factor of 5.8: potential growth by 2015 2)

Integration, Consolidation, Optimization

Processing & integrating smart data management

Secured connection for error-free data transfer

Optimization and automation of processes

Pinpointing guzzlers

Intelligent management of medical care

Management of devices

Immediate availability of patient and poc data

Physicians, Specialists, Family Doctors; Insurance; Hospitals & Pharma

Health care & Pharmagrids got smart

(20)

Summary

Data volumes are here to stay

In-memory computing is becoming increasingly “affordable”

Hadoop is not your Big Data answer; it is part of your BI and Big Data ecosystem

The BI and Big Data ecosystem will likely benefit from other tools as well

An enterprise data strategy and data governance are critical to success

(21)

Summary

Make sure you have two conversations in your enterprise

1. A Business Conversation about the business value from your BI ecosystem

2. An IT Conversation to ensure your IT Organisation understands the new world of BI and the shortcomings, strengths and roles of the component technologies

(22)

Summary

“What matters is how — and why — vastly more

data leads to vastly greater value creation.

Designing and determining those links is typically

in the province of top management”

but needs to be facilitated by the IT Organisation

in Business terms

(23)

A parting thought: Big Data's 4 V's

VALUE

value comes from knowing more than the rest

ANALYTICS

(26)

HADOOP Innovation #1: Much cheaper storage

$1 Million gets you

SAN Storage (software: HDS, bundled with hardware by HDS): 0.5 Petabytes, 200,000 IOPS, 8 Gbyte/sec

NAS File Servers (software: NetApp, bundled with hardware by NetApp): 1 Petabyte, 200,000 IOPS, 10 Gbyte/sec

Local Storage (software: open source Hadoop ecosystem, hardware self-assembled): 10 Petabytes, 400,000 IOPS, 250 Gbyte/sec
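
The capacity figures on this slide translate directly into a cost per gigabyte. The sketch below does that arithmetic using only the slide's numbers; it assumes decimal units (1 petabyte = 1,000,000 gigabytes) and that the whole USD 1 million budget goes to capacity. The names are illustrative.

```python
BUDGET_USD = 1_000_000
GB_PER_PB = 1_000_000  # decimal units, an assumption of this sketch

# Capacity in petabytes that the budget buys, as quoted on the slide.
CAPACITY_PB = {
    "SAN storage": 0.5,
    "NAS file servers": 1.0,
    "Local storage (Hadoop)": 10.0,
}


def usd_per_gb(capacity_pb, budget_usd=BUDGET_USD):
    """Cost per gigabyte when budget_usd buys capacity_pb petabytes."""
    return budget_usd / (capacity_pb * GB_PER_PB)


if __name__ == "__main__":
    for name, capacity in CAPACITY_PB.items():
        print(f"{name}: about USD {usd_per_gb(capacity):.2f} per gigabyte")
```

On these figures local storage in a Hadoop cluster works out at roughly one twentieth of the per-gigabyte cost of SAN storage.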

(27)

Learning the language of Big Data

Colour coding key

Core Hadoop Kernel/Modules

Hadoop DW Modules

NoSQL DB Platforms

MPP Analytics Platforms

Programming Languages

IDEs

Data Hubs

BI Suite

Analysis and Visualisation

Data Analysis Tool

Data Integration Tool

(28)

How use case segmentation drives solution design and technology selection

(29)

Gartner Hype Cycle for analytic applications
