Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

(2)

Business Analytics | © TechTarget

Wayne W. Eckerson

Director of Research, TechTarget Founder, BI Leadership Forum

(3)

What comes next?

● Kilobyte (KB) – 10^3 bytes

● Megabyte (MB) – 10^6 bytes

● Gigabyte (GB) – 10^9 bytes

● Terabyte (TB) – 10^12 bytes

● Petabyte (PB) – 10^15 bytes

● Exabyte (EB) – 10^18 bytes

● Zettabyte (ZB) – 10^21 bytes

● Yottabyte (YB) – 10^24 bytes
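
The unit ladder above advances by a factor of 10^3 at each step. As a minimal sketch of that arithmetic (the names `UNITS`, `unit_size`, and `humanize` are illustrative and do not come from the slides), a byte count can be mapped to the largest fitting decimal unit like this:

```python
# Decimal (SI) byte units from the slide; each step is a factor of 10**3.
UNITS = ["KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def unit_size(unit: str) -> int:
    """Return the number of bytes in one decimal unit, e.g. 'TB' -> 10**12."""
    return 10 ** (3 * (UNITS.index(unit) + 1))


def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit whose size it reaches."""
    if num_bytes < 10**3:
        return f"{num_bytes} bytes"
    # Walk from the largest unit down and stop at the first one that fits.
    for unit in reversed(UNITS):
        size = unit_size(unit)
        if num_bytes >= size:
            return f"{num_bytes / size:g} {unit}"
    raise ValueError("unreachable for non-negative integers")
```

For example, `humanize(2 * 10**15)` yields a value in petabytes and `unit_size("EB")` is 10**18. These are decimal units as listed on the slide, not the binary (1024-based) units some tools report.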

(4)

Information explosion

Every 18 months, non-rich structured and unstructured enterprise data doubles.

[Chart: enterprise data volume by year, 2005 to 2012. Series: Unstructured & Content Depot; Structured & Replicated.]

Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009
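
The doubling claim on this slide corresponds to the growth formula V(t) = V0 * 2^(t / 18), with t in months. The sketch below works that arithmetic through; the function name `projected_volume` is illustrative, and applying a steady 18-month doubling across the chart's whole 2005 to 2012 range is an assumption, not something the slide states.

```python
def projected_volume(initial: float, months: float,
                     doubling_months: float = 18.0) -> float:
    """Volume after `months`, assuming it doubles every `doubling_months`."""
    return initial * 2 ** (months / doubling_months)


# One doubling period doubles the volume; two periods quadruple it.
one_period = projected_volume(1.0, 18)
two_periods = projected_volume(1.0, 36)

# Assumption: the rate holds steadily over 2005-2012 (7 years = 84 months).
growth_2005_to_2012 = projected_volume(1.0, 84)
```

Under that assumption, seven years of 18-month doublings multiplies the starting volume by about 25.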

(5)

Data deluge

Structured data

- Call detail records
- Point of sale records
- Claims data

Semi-structured data

- Web logs
- Sensor data
- Email, Twitter

Unstructured data

- Video, audio
- Images, text

“A Sea of Sensors”, The Economist, Nov 4, 2010

(6)

Three “Big Data” revolutions

● Data warehousing (1995+)

● Analytical platforms (2005+)

● Hadoop ecosystem (2010+)

(7)

First revolution: data warehousing

[Diagram: Operational Systems → ETL → Data Warehouse → ETL → Data Mart → BI Server → Reports / Dashboards]

(8)

Second revolution: analytical platforms

1010data
Aster Data (Teradata)
Calpont
Datallegro (Microsoft)
Exasol
Greenplum (EMC)
IBM SmartAnalytics
Infobright
Kognitio
Netezza (IBM)
Oracle Exadata
Paraccel
Pervasive
Sand Technology
SAP HANA
Sybase IQ (SAP)
Teradata
Vertica (HP)

Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability compared to general-purpose solutions.

Deployment Options

- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)

(9)

Game-changing technology

● Purpose built

- For analytics in general
- For specific analytic workloads

● Quicker to deploy

- Preconfigured and tuned
- Fast ROI

● Faster and more scalable

- Faster query response times
- Linear performance

● Built-in analytics

- Libraries of functions
- Extensible SDK

● Less costly

- Less power, cooling, space
- Fewer people to maintain

(10)

Business value of analytic platforms

● Kelley Blue Book – Consolidates millions of auto transactions each week to calculate car valuations

● AT&T Mobility – Tracks purchasing patterns for 80M customers daily to optimize targeted marketing

● CBS Interactive – Analyzes Web visitor behavior to optimize content/ad placement and revenue

MPP analytical database

Analytical appliance

Hadoop + analytical database

(11)

Third Revolution - Hadoop

● Open source projects

● Hosted by the Apache Software Foundation

● Initially developed by Google, Yahoo, etc.

● Offers a scale-out architecture on commodity servers with direct-attached storage

(12)

Hadoop distilled

Open Source $$

MapReduce

“Schema at Read”

Unstructured data

BIG DATA

Distributed File System

Benefits

- Any data
- Agile
- Expressive
- Affordable

Drawbacks

- Immature
- Batch oriented
- Security, concurrency, metadata, etc.
- Expertise
- TCO?

Data scientist
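
Two labels on this slide, "Schema at Read" and MapReduce, can be illustrated with a small single-process sketch. It is not Hadoop code and uses no Hadoop API; the log format, the sample lines, and every function name below are made up for illustration. Raw lines are stored untyped, a schema is imposed only when they are read, and a map, shuffle, and reduce sequence counts hits per URL.

```python
from collections import defaultdict

# Raw, untyped lines as they might sit in a distributed file system.
# The (timestamp, user, url) layout is a hypothetical example format.
RAW_LINES = [
    "2012-04-01T10:00:00 alice /home",
    "2012-04-01T10:00:05 bob /pricing",
    "malformed line",
    "2012-04-01T10:01:00 alice /pricing",
]


def parse(line):
    """Schema at read: impose (ts, user, url) only when a line is read."""
    parts = line.split()
    if len(parts) != 3:
        return None  # lines that do not fit this reader's schema are skipped
    ts, user, url = parts
    return {"ts": ts, "user": user, "url": url}


def map_phase(lines):
    """Map: emit a (url, 1) pair for every line that parses."""
    for line in lines:
        record = parse(line)
        if record is not None:
            yield record["url"], 1


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


hits = reduce_phase(shuffle(map_phase(RAW_LINES)))
```

Because the schema lives in `parse` rather than in the stored data, a different reader could apply a different schema to the same lines. A real cluster would distribute the map and reduce steps across nodes, which this sketch does not attempt.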

(13)

Hadoop hype

Gartner Group – Hype Cycle

Overheard

“Hadoop will replace relational databases.”

“Hadoop will replace data warehouses.”

“Hadoop has a superior query engine compared to analytical platforms.”

“Use Hadoop for any application that requires more than one node.”

(14)

Hadoop adoption rates

Status           Share of respondents
No plans         38%
Considering      32%
Experimenting    20%
Implementing     5%
In production    4%

Based on 158 respondents, BI Leadership Forum, April, 2012

(15)

Hadoop workloads

Workload                 Today    In 18 months
Staging area             92%      92%
Online archive           92%      92%
Transformation engine    83%      92%
Ad hoc queries           58%      67%
Scheduled reports        42%      67%
Visual exploration       25%      67%
Data mining              58%      83%

Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012

(16)

Hadoop’s impact on the data warehouse

Impact                         Share of respondents
Replaces it                    0%
Offloads existing workloads    50%
Handles new workloads          67%
Shares existing workloads      33%
Shares new workloads           25%
Don't know                     8%

Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012

(17)

BI Framework 2020

[Diagram: four domains (Business Intelligence, Analytics Intelligence, Continuous Intelligence, Content Intelligence), with rows labeled End-User Tools, Design Framework, and Architecture. Labels on the diagram:]

- Data Warehousing; Reports and Dashboards; MAD Dashboards
- Ad hoc query, Spreadsheets, OLAP, Visual Analysis, Analytic Workbenches, Hadoop; Analytic Sandboxes; Ad hoc SQL
- Event-Driven Alerts and Dashboards; Dashboard Alerts; Event detection and correlation; CEP, Streams
- Excel, Access, OLAP, Data mining, visual exploration
- Keyword search, BI tools, XQuery, Hive, Java, etc.
- MapReduce, XML schema, key-value pairs, graph notation, etc.
- HDFS, NoSQL databases
- Reporting & Analysis; Exploration; Power Users

(18)

BI Framework

TOP DOWN – “Business Intelligence”

Corporate Objectives and Strategy
Reporting & Monitoring (Casual Users)
Predefined Metrics
Data Warehousing Architecture
Non-volatile Data

Pros:

- Alignment
- Consistency

Cons:

- Hard to build
- Politically charged
- Hard to change
- Expensive
- “Schema Heavy”

BOTTOM UP

Processes and Projects
Analysis and Prediction (Power Users)
Ad hoc queries
Analytics Architecture
Volatile Data

Pros:

- Quick to build
- Politically uncharged
- Easy to change
- Low cost

Cons:

- Alignment
- Consistency
- “Schema Light”

Analysis Begets Reports; Reports Beget Analysis

(19)

The new analytical ecosystem

[Diagram of the analytical ecosystem, with these components:]

- Sources: Operational Systems (structured data), Machine Data, Web Data, Documents & Text, External Data, Audio/Video Data
- Data movement: Extract, Transform, Load (batch, near real-time, or real-time); Streaming/CEP Engine
- Platforms: Hadoop Cluster, Data Warehouse, Dept Data Mart, Analytic platform or non-relational database
- Sandboxes: Free-Standing Sandbox, Virtual Sandboxes, In-Memory Sandbox
- Access: BI Server, Power User, Casual User
- Architectures: Top-down Architecture, Bottom-up Architecture

www.bileadership.com

(20)

Analytical sandboxes

[Diagram: the same analytical ecosystem as on the previous slide, with its three sandbox types: Free-Standing Sandbox, Virtual Sandboxes, and In-Memory Sandbox.]

(21)

Recommendations

● Your BI architecture is now an “analytical ecosystem”

● Deploy analytical platforms to turbo-charge performance

● Explore Hadoop for “big data”

● Reconcile top-down and bottom-up BI environments

(22)

Questions?

● Wayne Eckerson

[email protected]

(23)

Hadoop ecosystem

Courtesy, Hortonworks, 2012.
