Business Analytics | © TechTarget
Wayne W. Eckerson
Director of Research, TechTarget; Founder, BI Leadership Forum
Architecting for Big Data Analytics and Beyond:
A New Framework for Business Intelligence and Data Warehousing
What comes next?
● Kilobyte (KB) – 10^3 bytes
● Megabyte (MB) – 10^6 bytes
● Gigabyte (GB) – 10^9 bytes
● Terabyte (TB) – 10^12 bytes
● Petabyte (PB) – 10^15 bytes
● Exabyte (EB) – 10^18 bytes
● Zettabyte (ZB) – 10^21 bytes
● Yottabyte (YB) – 10^24 bytes
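As a small illustration of the scale from kilobyte (10^3 bytes) to yottabyte (10^24 bytes), the sketch below expresses a raw byte count in the largest fitting decimal unit. The function name `humanize` and the sample value are assumptions made for this example and do not come from the slides.

```python
# Decimal (SI) byte units, as powers of ten.
UNITS = [
    ("KB", 10**3), ("MB", 10**6), ("GB", 10**9), ("TB", 10**12),
    ("PB", 10**15), ("EB", 10**18), ("ZB", 10**21), ("YB", 10**24),
]

def humanize(num_bytes: int) -> str:
    """Express num_bytes in the largest unit that keeps the value >= 1."""
    label, size = "bytes", 1
    for unit, factor in UNITS:
        if num_bytes >= factor:
            label, size = unit, factor
    return f"{num_bytes / size:.1f} {label}"

print(humanize(2_500_000_000_000))  # 2.5 TB
```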
Information explosion
Every 18 months, non-rich structured and unstructured enterprise data doubles
[Chart: enterprise data volume by year, 2005 to 2012. Series: Unstructured & Content Depot; Structured & Replicated]
Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009
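The doubling claim can be turned into a rough projection. This is a minimal sketch that assumes steady exponential growth with an 18-month doubling period. The starting volume of 100 units is a made-up figure, and `projected_volume` is a name chosen for this example.

```python
def projected_volume(base: float, months: float, doubling_months: float = 18.0) -> float:
    """Volume after `months`, assuming it doubles every `doubling_months`."""
    return base * 2 ** (months / doubling_months)

# Hypothetical: 100 TB today grows to 400 TB after 36 months (two doublings).
print(projected_volume(100.0, 36))  # 400.0
```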
Data deluge
● Structured data
- Call detail records - Point of sale records - Claims data
● Semi-structured data
- Web logs - Sensor data - Email, Twitter
● Unstructured data
- Video, audio - Images, text
“A Sea of Sensors”, The Economist, Nov 4, 2010
Three “Big Data” revolutions
● Data warehousing (1995+)
● Analytical platforms (2005+)
● Hadoop ecosystem (2010+)
First revolution: data warehousing
[Diagram labels: Operational Systems, ETL, Data Warehouse, ETL, Data Mart, BI Server, Reports / Dashboards]
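The flow named in the diagram labels (operational systems, ETL, data warehouse, ETL, data mart, reports) can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse, and the sample rows, table names, and `transform` function are hypothetical.

```python
import sqlite3

# Hypothetical point-of-sale rows extracted from an operational system.
source_rows = [
    ("2012-04-01", "store-1", "20.00"),
    ("2012-04-01", "store-2", "5.00"),
    ("2012-04-02", "store-1", "12.50"),
]

def transform(row):
    """Cast the amount from text to a number; keep the other fields as-is."""
    day, store, amount = row
    return day, store, float(amount)

# First ETL step: load cleaned detail rows into the warehouse (stand-in).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (day TEXT, store TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)", [transform(r) for r in source_rows]
)

# Second ETL step: aggregate warehouse detail into a data mart table.
warehouse.execute(
    "CREATE TABLE mart_daily_sales AS "
    "SELECT day, SUM(amount) AS total FROM sales GROUP BY day"
)

# Report query, as a BI server might issue it against the data mart.
report = warehouse.execute(
    "SELECT day, total FROM mart_daily_sales ORDER BY day"
).fetchall()
print(report)
```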
Second revolution: analytical platforms
1010data
Aster Data (Teradata)
Calpont
Datallegro (Microsoft)
Exasol
Greenplum (EMC)
IBM SmartAnalytics
Infobright
Kognitio
Netezza (IBM)
Oracle Exadata
Paraccel
Pervasive
Sand Technology
SAP HANA
Sybase IQ (SAP)
Teradata
Vertica (HP)
Purpose-built database management systems, designed explicitly for query
processing and analysis, that provide dramatically higher price/performance
and availability than general-purpose solutions.
Deployment Options
- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)
Game-changing technology
● Purpose built
- For analytics in general
- For specific analytic workloads
● Quicker to deploy
- Preconfigured and tuned - Fast ROI
● Faster and more scalable
- Faster query response times - Linear performance
● Built-in analytics
- Libraries of functions - Extensible SDK
● Less costly
- Less power, cooling, space - Fewer people to maintain
Business value of analytic platforms
● Kelley Blue Book – Consolidates millions of auto transactions each week to calculate car valuations
● AT&T Mobility – Tracks purchasing patterns for 80M customers daily to optimize targeted marketing
● CBS Interactive – Analyzes Web visitor behavior to optimize content/ad placement and revenue
[Platform labels: MPP analytical database; Analytical appliance; Hadoop + analytical database]
Third revolution: Hadoop
● Open source projects
● Hosted by Apache Foundation
● Initially developed by Google, Yahoo, etc.
● Offers scale-out architecture on commodity servers with direct-attached storage
Hadoop distilled
[Diagram labels: Open source ($$), MapReduce, “Schema at Read”, Unstructured data, Big data, Distributed file system]
Benefits
- Any data - Agile
- Expressive - Affordable
Drawbacks
- Immature
- Batch oriented
- Security, concurrency, metadata, etc.
- Expertise - TCO?
Data scientist
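Two of the ideas named on this slide, MapReduce and “schema at read”, can be illustrated with a toy example. The sketch runs in a single Python process and does not use any Hadoop API. The log lines, field layout, and function names are assumptions invented for illustration.

```python
from collections import defaultdict

# Hypothetical raw web-log lines, stored without any predefined schema.
raw_lines = [
    "2012-04-01 /home 200",
    "2012-04-01 /pricing 200",
    "2012-04-02 /home 404",
]

def map_phase(line):
    """Apply the schema at read time: split the raw line, emit (page, 1)."""
    _day, page, _status = line.split()
    yield page, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts for one key."""
    return key, sum(values)

pairs = [pair for line in raw_lines for pair in map_phase(line)]
hits = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(hits)  # {'/home': 2, '/pricing': 1}
```

The structure of each line is interpreted only inside `map_phase`, so a different question about the same raw files would need a different map function, not a reload of the data.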
Hadoop hype
Gartner Group – Hype Cycle
Overheard
“Hadoop will replace relational databases.”
“Hadoop will replace data warehouses.”
“Hadoop has a superior query engine compared to analytical platforms.”
“Use Hadoop for any application that requires more than one node.”
Hadoop adoption rates
● No plans – 38%
● Considering – 32%
● Experimenting – 20%
● Implementing – 5%
● In production – 4%
Based on 158 respondents, BI Leadership Forum, April, 2012
Hadoop workloads
Workload                Today   In 18 months
Staging area            92%     92%
Online archive          92%     92%
Transformation engine   83%     92%
Ad hoc queries          58%     67%
Scheduled reports       42%     67%
Visual exploration      25%     67%
Data mining             58%     83%
Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012
Hadoop’s impact on the data warehouse
● Replaces it – 0%
● Offloads existing workloads – 50%
● Handles new workloads – 67%
● Shares existing workloads – 33%
● Shares new workloads – 25%
● Don't know – 8%
Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012
BI Framework 2020
[Diagram: four domains (Business Intelligence, Analytics Intelligence, Continuous Intelligence, Content Intelligence), each described by End-User Tools, Design Framework, and Architecture. Labels: Reporting & Analysis; Exploration; Power Users; Data Warehousing; Reports and Dashboards; MAD Dashboards; Ad hoc query, Spreadsheets, OLAP, Visual Analysis, Analytic Workbenches, Hadoop; Excel, Access, OLAP, Data mining, visual exploration; Ad hoc SQL; Analytic Sandboxes; Event-Driven Alerts and Dashboards; Dashboard Alerts; Event detection and correlation; CEP, Streams; Keyword search, BI tools, XQuery, Hive, Java, etc.; MapReduce, XML schema, Key-value pairs, graph notation, etc.; HDFS, NoSQL databases]
BI Framework
[Diagram: top-down and bottom-up approaches linked in a cycle: “Analysis begets reports; reports beget analysis.”]
● Top-down (“Business Intelligence”) – Corporate objectives and strategy; reporting and monitoring (casual users); predefined metrics; data warehousing architecture; non-volatile data
Pros:
- Alignment - Consistency
Cons:
- Hard to build - Politically charged - Hard to change - Expensive - “Schema heavy”
● Bottom-up – Processes and projects; analysis and prediction (power users); ad hoc queries; analytics architecture; volatile data
Pros:
- Quick to build - Politically uncharged - Easy to change - Low cost
Cons:
- Alignment - Consistency - “Schema light”
The new analytical ecosystem
[Diagram: the analytical ecosystem. Labels:
Sources: Operational systems (structured data), Machine data, Web data, Documents & text, Audio/video data, External data
Data movement: Extract, transform, load (batch, near real-time, or real-time); Streaming/CEP engine
Stores: Hadoop cluster, Data warehouse, Dept data mart, Analytic platform or non-relational database
Sandboxes: Virtual sandboxes, Free-standing sandbox, In-memory sandbox
Access: BI server; Casual user (top-down architecture); Power user (bottom-up architecture)]
www.bileadership.com
Analytical sandboxes
[Diagram: the same analytical ecosystem as on the previous slide, with the sandboxes highlighted: Virtual sandboxes, Free-standing sandbox, In-memory sandbox]
Recommendations
● Your BI architecture is now an “analytical ecosystem”
● Deploy analytical platforms to turbo-charge performance
● Explore Hadoop for “big data”
● Reconcile top-down and bottom-up BI environments
Hadoop ecosystem
Courtesy, Hortonworks, 2012.