Big Data
and Trusted Information
CAS Single Point of Truth – 7 May 2012
The Hype
“most enterprise data warehouse (EDW) and BI teams currently lack a clear understanding of big data technologies… They are increasingly asking the question, ‘How can we use big data to deliver new insights?’”
Gartner 2012
Searches for "big data" on Gartner's website increased 981% between March 2011 and October 2011.
"Big Data - We are at a huge inflection point and this opportunity comes only once. We are declaring that IBM is the #1 leader in providing a Big Data platform."
Alyse Passarelli, WW VP IM Sales, Jan 10th 2012
2012 will be the year of 'big data'
BBC Nov 30 2011
“Big Data: The next frontier for innovation,
competition and productivity”
V3: Volume, Velocity, Variety
Volume: Optimize capital investments based on 6 petabytes of information.
Velocity: Analyze 100k records/second to address customer satisfaction in real time.
Variety: Analyze telemetry, fuel consumption, schedule and weather patterns to optimize shipping logistics.
IBM’s Big Data Platform Vision
Big Data Enterprise Engines
IBM Big Data Solutions
Internet Scale Analytics
Streaming Analytics
Developers End Users Administrators
Big Data User Environments
Bringing Big Data to the Enterprise
Client and Partner Solutions
Open Source Foundational Components
Hadoop HBase Pig Lucene Jaql Linux Eclipse UIMA OpenCV
Agents
Integration: Information Server
Marketing: Unica
Warehouse Appliances: Netezza
Data Warehouse: InfoSphere Warehouse
Database: DB2
Content Analytics: ECM
Business Analytics: Cognos & SPSS
Master Data Mgmt: InfoSphere MDM
Data Growth Management: InfoSphere Optim
Forrester Research Study 2012
Requirements for Big Data
• Data volume – 75%
• Analysis driven requirements – 58%
• Data diversity – 52%
Data sources for Big Data
• Existing transactional data – 75%
• Sensor / device data – 58%
• Social media – 52%
Big Data is a key growth adjacency for data warehouse
Data Warehouse (incl. DW Appliance): CGR 2010-15: 8.5%
Big Data: CGR 2010-15: 13.8%
Source: GMV 1H2012 2H2011 and IBM MI estimates
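To put the two growth rates side by side, the following is a minimal sketch that compounds each rate over the five years from 2010 to 2015. It assumes "CGR" means a compound annual growth rate applied once per year; the function name `compound_growth` is illustrative and not taken from the slides.

```python
def compound_growth(rate: float, years: int) -> float:
    """Return the total growth multiple after compounding `rate` for `years` periods."""
    return (1.0 + rate) ** years


YEARS = 5  # assumption: 2010 to 2015 treated as five annual compounding periods

dw_multiple = compound_growth(0.085, YEARS)        # Data Warehouse, 8.5% per year
big_data_multiple = compound_growth(0.138, YEARS)  # Big Data, 13.8% per year

print(f"Data Warehouse grows to x{dw_multiple:.2f} of its 2010 size")
print(f"Big Data grows to x{big_data_multiple:.2f} of its 2010 size")
```

Under these assumptions the data warehouse segment grows by roughly half over the period, while the Big Data segment nearly doubles.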
Merging the Traditional and Big Data Approaches
Traditional Approach: Structured & Repeatable Analysis
• Business Users determine what question to ask
• IT structures the data to answer that question
• Examples: Monthly sales reports, Profitability analysis, Customer surveys
Big Data Approach: Iterative & Exploratory Analysis
• IT delivers a platform to enable creative discovery
• Business explores what questions could be asked
• Examples: Brand sentiment, Product strategy, Maximum asset utilization
Vestas optimizes capital investments based on 2.5 petabytes of information.
• Model the weather to optimize placement of turbines, maximizing power generation and longevity.
• Reduce time required to identify placement of turbine from weeks to hours.
• Incorporate 2.5 PB of structured and semi-structured information flows. Data volume expected to grow to 6 PB.
A Platform to Run In-Motion Analytics on BIG Data
• Millions of events per second
• Microsecond latency
• Traditional / non-traditional data sources
• Real time delivery
• Powerful analytics
Use cases: Algo Trading, Telco churn predict, Smart Grid, Cyber Security, Government / Law enforcement, ICU Monitoring, Environment Monitoring
Volume: Terabytes per second, petabytes per day
Variety: All kinds of data, all kinds of analytics
Velocity: Insights in microseconds
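The idea of in-motion analytics, computing results as events arrive rather than after they are stored, can be sketched with a trailing time window. This is a generic Python illustration, not the API of any IBM product; the function `windowed_average` and the sample readings are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, measured value)


def windowed_average(events: Iterable[Event], window_seconds: float) -> Iterator[Event]:
    """Yield (timestamp, average over the trailing window) as each event arrives."""
    window: Deque[Event] = deque()
    total = 0.0
    for ts, value in events:
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the trailing window.
        while window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


# Hypothetical sensor readings, in timestamp order.
readings = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0), (5.0, 40.0)]
results = list(windowed_average(readings, window_seconds=3.0))
print(results)
```

Each result is available as soon as its event is seen, and memory use is bounded by the number of events inside the window rather than by the length of the stream.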
Enterprise Integration
• Trusted Information & Governance: Companies need to govern what comes in, and the insights that come out
• Data Management: Insights from Big Data must be incorporated into the warehouse
[Diagram: Big Data Platform, Data Warehouse, Enterprise Integration]
One Example - The 360 Multi-Channel Customer Sentiment Analysis
[Diagram: Master Data Management, Business Processes, Big Data Platform, Call Detail Reports (CDRs), Call Behavior and Experience Insight, Data Warehouse, Website Logs, Social Media, Streaming Analytics, Internet Scale Analytics, Web Traffic and Social Media Insight, Events and Alerts, Information Integration, Cognos Consumer Insight, Campaign Management]
© 2011 IBM Corporation
Big Data is an integral part of the Enterprise Data Platform
[Diagram: Big Data Enterprise Engine, IBM Big Data Solutions, Big Data User Environment (Developers, End Users, Administrators), Client and Partner Solutions, Languages, Orchestration, Prioritization, Quality of Service, Optimizations, Storage and Indexing, Operators, Applications, Cognos Applications, InfoSphere Information Server, Cubing Services, InfoSphere Warehouse, Operational Data Store, Traditional data sources (ERP, CRM, databases, etc.)]
Big Data Platform
Source data from every source (Web, sensor, data, network, social, RFID, media)
• Control point for data starting from the instant it enters the enterprise
• High fidelity for all data without changing its original format
• Source data available for new uses, analyses, and integrations
Trusted Information Delivery Architecture
[Diagram: Common Metadata Repository, Information Analyzer, Source Systems, Transformation & Harmonisation, Target Systems, Staging & Error Tables, Business Terms, Specifications, Development, Infrastructure, Reports, DQ Dashboard]
Information Server – Hadoop Integration
Business Value:
Fueling and helping organizations leverage big data analysis across the enterprise.
• Exchange of information with big data sources
  – Move enterprise information into big data sources so it can be included in analytics
  – Take analytical results of Hadoop and apply them into other IT solutions
• Parallelism and scale
  – Support for HDFS provides massive scalability via the Information Server parallel engine
• Lineage of jobs with BigInsights source/target steps
  – Using extensibility feature in Information Server
Information Server - Netezza Integration
Business Value:
Improves performance and accelerates time to value for organizations using the Netezza Data Warehouse Appliance.
• Netezza Next Generation Connector (with migration tool to replace current Netezza Enterprise stage)
  – Scalable, high-performance data exchange for DataStage, QualityStage and Information Analyzer
  – Shared metadata across Information Server
  – Enhanced lookups, statistics, other functions
• Balanced Optimization for Netezza
  – Execute either traditional ETL on the Information Server engine or push parts/all of the processing into the Netezza appliance
  – Maximizes performance where data is already in Netezza
• CDC and CDD for Netezza
  – Enable captured changes to be applied directly to Netezza (available today via User Exit from services, productization planned for next major release)
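The choice described under Balanced Optimization, processing on the integration engine versus pushing work into the database, can be illustrated generically. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for a warehouse; it does not use any Netezza or Information Server API, and the `sales` table and both function names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Stand-in "warehouse" with a small, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("south", 50.0), ("north", 25.0)],
)


def aggregate_in_engine(connection: sqlite3.Connection) -> dict:
    """Engine-side processing: extract every row, then aggregate outside the database."""
    totals = defaultdict(float)
    for region, amount in connection.execute("SELECT region, amount FROM sales"):
        totals[region] += amount
    return dict(totals)


def aggregate_pushed_down(connection: sqlite3.Connection) -> dict:
    """Pushdown: the database aggregates and only the result rows are transferred."""
    rows = connection.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
    return dict(rows)


engine_result = aggregate_in_engine(conn)
pushdown_result = aggregate_pushed_down(conn)
print(engine_result, pushdown_result)
```

Both paths return the same totals; the difference is where the work happens and how many rows leave the database, which is why pushdown tends to pay off when the data already resides in the appliance.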
Conclusions
Big Data enhances the BI portfolio
– Larger data volumes (petabytes compared to terabytes)
– Access to new sources (Internet, unstructured, sensor data)
– Real time analysis of data streams
– Explorative analytics
Businesses already get competitive advantages out of Big Data
However, BI maturity in most companies is low to medium
– Cross domain analysis
– Predictive analysis
– Real-time DWH
– Analytical process support
DWH with Trusted Information remains the base for enterprise analytics
– Integration tools and DWH have adapted to the new technologies