(1)

Are You Ready for Big Data?

Jim Gallo

National Director, Business Analytics

February 11, 2013

(2)

Agenda

© 2012, Information Control Corporation 2

• What is Big Data?

• How do you leverage Big Data in your company?

• How do you prepare for a Big Data initiative?

• Summary

(3)

What is Big Data?

(4)

What is “Big Data”?

“Big data” is high-volume, -velocity, -variety and -veracity information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

Model, Predict and Score

Measure and Analyze

Volume (TB to ZB): Twitter, RFID, Click Stream, Facebook

Velocity (streaming & large volume data movement): Monitors, Machine Data, Trades & Transactions, Identity

Variety (relational & non-relational data types): Geospatial, Relational, Text, Video

Cost-effective

Veracity (managing the reliability and predictability of inherently imprecise data types)

(5)

What might a Big Data platform look like?

• Data Warehouse

• Hadoop

• Information Integration

• Stream Computing

• BI/Reporting

• Exploration/Visualization

• Content Analytics

• Functional Apps

(6)

What is Hadoop?

• Open source software project

• Distributed processing of large data sets

• Leverage clusters of commodity servers

• Scale from single server to thousands of machines

• High degree of fault tolerance (detects and handles failures at the application layer)

(7)

What are the benefits of Hadoop?

Scalable

New nodes can be added as needed

Add without needing to change:

data formats

how data is loaded

how jobs are written

the applications

Flexible

Schema-less

Can absorb any type of data, structured or not

Any number of sources

Data from multiple sources can be joined and aggregated in arbitrary ways

Cost effective

Massively parallel computing on commodity servers

Sizeable decrease in the cost per terabyte of storage

Fault tolerant

Redirects work to another location of the data

Continues processing

(8)

What are the key components of Hadoop?

• MapReduce

• Hadoop Distributed File System (HDFS)

• Pig

• Hive

• ZooKeeper

(9)

What is MapReduce?

Programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster.

“Map” step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.

“Reduce” step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output – the answer to the problem it was originally trying to solve.
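
The map and reduce steps can be illustrated with a small word count. This is only a single-process sketch in plain Python, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the explicit grouping step between map and reduce is an assumption added here to make the example complete.

```python
from collections import defaultdict


def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Worker step: emit a (word, 1) pair for every word in one chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key (assumed step, not named on the slide)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Combine the partial answers for each key into the final output."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks: list[str]) -> dict[str, int]:
    """'Master' logic: hand each chunk to map_phase, then combine the results."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    # Each string stands in for the sub-problem given to one worker node.
    chunks = ["big data big insight", "data in motion", "big volume"]
    print(word_count(chunks))
```

In a real cluster each call to `map_phase` would run on a different worker node; here the calls run one after another so the data flow is easy to follow.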

(10)

What is HDFS?

• Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster.

• The map and reduce functions can be executed on smaller subsets of larger data sets, and this provides the scalability that is needed for big data processing.
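
The idea of breaking data into blocks and spreading them across a cluster can be sketched as follows. This is an illustration under stated assumptions, not how HDFS is implemented: the helper names (`split_into_blocks`, `place_blocks`), the tiny block size, the node names, and the round-robin placement are all hypothetical. Real HDFS uses much larger blocks and also stores replicas of each block on several nodes.

```python
from itertools import cycle


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Break the data into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: list[bytes], nodes: list[str]) -> dict[str, list[int]]:
    """Assign block indexes to nodes in round-robin order (assumed policy)."""
    placement: dict[str, list[int]] = {node: [] for node in nodes}
    for index, node in zip(range(len(blocks)), cycle(nodes)):
        placement[node].append(index)
    return placement


if __name__ == "__main__":
    data = b"abcdefghij"
    blocks = split_into_blocks(data, block_size=4)
    print(blocks)
    print(place_blocks(blocks, ["node-a", "node-b"]))
```

Because every node holds only some of the blocks, a map function can run on each node against its local blocks, which is the scalability point made above.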

(11)

What are Pig and Hive?

Pig

• Developed at Yahoo!

• Programming language

• Designed to handle any kind of data

Hive

• Developed at Facebook

• Hive Query Language (HQL) similar to standard SQL

• Allows anyone who is already fluent with SQL to more quickly leverage the Hadoop platform

(12)

What is ZooKeeper?

• Provides a centralized infrastructure and services that enable synchronization across a cluster

• Maintains common objects needed in large cluster environments, such as:

   • configuration information

   • hierarchical naming space, etc.

• Applications can leverage these services to coordinate distributed processing across large clusters

(13)

What does a Big Data platform do?

Analyze a Variety of Information

Novel analytics on a broad set of mixed information that could not be analyzed before.

Analyze Information in Motion

Streaming data analysis

Large volume data bursts and ad hoc analysis

Analyze Extreme Volumes of Information

Cost-efficiently process and analyze petabytes of information

Manage and analyze high volumes of structured, relational data

Discover and Experiment

Ad hoc analytics, data discovery and experimentation

Manage and Plan

(14)

How does a Big Data platform fit?

Traditional Sources

Data Warehouse

Big Data Platform

New Sources

Enterprise Integration

(15)

Is the approach the same?

Traditional Approach: Structured & Repeatable Analysis

• Business Users: Determine what questions to ask

• IT: Structures the data to answer the questions

Big Data Approach: Iterative and Exploratory Analysis

• IT: Delivers a platform to enable creative discovery

• Business Users: Explore what questions could be asked

(16)

Leveraging Big Data

(17)

What can you do with Big Data?

Analyze Information in Motion

Smart Grid management

Multimodal surveillance

Real-time promotions

Cyber security

ICU monitoring

Options trading

Click-stream analysis

CDR processing

IT log analysis

RFID tracking and analysis

Analyze a Variety of Information

Social media/sentiment analysis

Geospatial analysis

Brand strategy

Scientific research

Epidemic early warning system

Market analysis

Video analysis

Audio analysis

Discovery and Experimentation

Sentiment analysis

Brand strategy

Scientific research

Ad hoc analysis

Model development

Hypothesis testing

Transaction analysis to create insight-based product/service offerings

Manage and Plan

Analyze Extreme Volumes of Information

Transaction analysis to create insight-based product/service offerings

Fraud monitoring and detection

Risk modeling and management

(18)

What are some use cases?

Fraud Detection and Modeling

360° View of the Customer

Email, Call Center Transcript Analysis

Call Detail Record Analysis

RFID Tracking and Analysis

Smart Grid / Smarter Utilities

Cyber Security

Risk Modeling & Management

Threat Detection / Multi-modal Surveillance

Geo-marketing

(19)

What are some analytics examples?

Financial Services

Improved risk decisions

Customer sentiment analysis

AML (Anti Money Laundering)

Transportation

Weather and traffic impact on logistics and fuel consumption

Call Centers

Voice-to-text for customer behavior understanding

Telecommunications

Utilities

Weather impact analysis on power generation

Smart meter data analysis

IT

Transaction log analysis for multiple transactional systems

E-Commerce

Internet behavior and buying patterns

Digital asset piracy

Multi-channel Integration

(20)

What are some streaming analytics examples?

Natural Systems

Wild fire management

Water management

Transportation

Intelligent traffic management

Manufacturing

Process control for microchip fabrication

Health & Life Sciences

Neonatal ICU monitoring

Epidemic early warning system

Remote healthcare monitoring

Telephony

CDR processing

Social analysis

Churn prediction

Geomapping

Stock Market

Impact of weather on securities prices

Market analysis at ultra-low latencies

Law Enforcement, Defense & Cyber Security

Real-time multimodal surveillance

Situational awareness

Cyber security detection

Fraud Prevention

Detecting multi-party fraud

Real time fraud prevention

e-Science

Space weather prediction

Detection of transient events

Genomics research

Other

Smart Grid

Text analysis

Who’s talking to whom?

(21)

To what extent is Big Data being adopted?

Three out of four organizations have big data activities underway, and one in four are either in pilot or production.

• Have not begun big data activities: 24%

• Planning big data activities: 48%

• Pilot and implementation of big data activities: 28%

Early days of big data era

Almost half of all organizations surveyed report active discussions about big data plans

Big data has moved out of IT and into business discussions

Getting underway

More than a quarter of organizations have active big data pilots or implementations

Tapping into big data is becoming real

Acceleration ahead

The number of active pilots underway suggests big data implementations will rise exponentially

(22)

What are some trends for Big Data adoption?

Improving the customer experience by better understanding behaviors drives almost half of all active big data efforts.

Source: IBM Institute for Business Value and Saïd Business School, University of Oxford, 2012

(23)

Preparing for a Big Data Initiative

(24)

Five Practical Questions

(25)

What do you want to know?

• Business Objectives

• Improved decision-making

• Better business performance

Needs

Postulates

Questions

Results

Improved customer satisfaction

Increased profit margin

Expanded social awareness

(26)

Big Data or “lots of data”?

(27)

Is there a data source?

Sentiment Analysis

Foursquare

Surveys

Twitter

LinkedIn

Facebook

Blogs

Demographics

Geospatial

Competitors

Weather

Identity

Facial Recognition

License Plate Recognition

RFID

Site Behavior & Experience

Ad Campaigns

Display Media

Sales Effectiveness

Predictive Analytics

(28)

Is it worth it?

ROI

Labor

Sourcing

Options

Hardware & Software

(29)

Will it work?

Model, Predict and Score

Measure and Analyze

Options

Resources (Internal & External)

(30)

Summary

(31)

Summary

Big Data

High-volume, -velocity, -variety and -veracity information assets

Cost-effective, innovative forms of information processing

Enhanced insight and decision making

Uses

Wide applicability

Cross-industry

Iterative and exploratory

Complementary to BI/DW

Be Pragmatic

Business-driven

Provable ROI

Features and Functions

Analyze a variety of information

Analyze information in motion

Analyze extreme volumes of information

Discover and experiment

Manage and plan

(32)

For More Information

Jim Gallo

National Director, Business Analytics

Information Control Corporation

[email protected]

(614) 523-3070 x192
