Are You Ready for Big Data?
Jim Gallo
National Director, Business Analytics
February 11, 2013
Agenda
© 2012, Information Control Corporation 2
• What is Big Data?
• How do you leverage Big Data in your company?
• How do you prepare for a Big Data initiative?
• Summary
What is Big Data?
What is “Big Data”?
“Big data” is high-volume, -velocity, -variety and -veracity information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Model, Predict and Score
Measure and Analyze
Twitter, RFID, Click Stream, Facebook
Volume (TB to ZB)
Monitors, Machine Data, Trades & Transactions, Identity
Velocity (streaming & large-volume data movement)
Geospatial, Relational, Text, Video
Variety (relational & non-relational data types)
Cost-effective
Veracity (managing the reliability and predictability of inherently imprecise data types)
What might a Big Data platform look like?
• Data Warehouse
• Hadoop
• Information Integration
• Stream Computing
• BI/Reporting
• Exploration/Visualization
• Content Analytics
• Functional Apps
What is Hadoop?
• Open source software project
• Distributed processing of large data sets
• Leverage clusters of commodity servers
• Scale from single server to thousands of machines
• High degree of fault tolerance (detects and handles failures at the application layer)
What are the benefits of Hadoop?
Scalable
• New nodes can be added as needed
• Add without needing to change:
data formats
how data is loaded
how jobs are written
the applications
Flexible
• Schema-less
• Can absorb any type of data, structured or not
• Any number of sources
• Data from multiple sources can be joined and aggregated in arbitrary ways
Cost effective
• Massively parallel computing on commodity servers
• Sizeable decrease in the cost per terabyte of storage
Fault tolerant
• Redirects work to another location of the data
• Continues processing
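The redirect-and-continue behavior can be sketched in a few lines of Python. This is a minimal illustration rather than Hadoop's actual scheduler: it assumes each block of data is stored on more than one node, and the names `run_with_failover`, `replica_locations` and `failed_nodes` are hypothetical.

```python
def run_with_failover(block_id, replica_locations, failed_nodes, task):
    """Run task on the first healthy node that holds a copy of the block."""
    for node in replica_locations[block_id]:
        if node in failed_nodes:
            continue  # skip the failed node and try the next copy of the data
        return node, task(block_id)
    raise RuntimeError(f"no healthy node holds {block_id}")


# Hypothetical cluster state: one block stored on three nodes, one node down.
replica_locations = {"block-1": ["node-a", "node-b", "node-c"]}
failed_nodes = {"node-a"}

node, result = run_with_failover(
    "block-1", replica_locations, failed_nodes, lambda block: f"processed {block}"
)
```

Because the work moves to wherever another copy of the data lives, the job continues without the data having to be reloaded.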
What are the key components of Hadoop?
• MapReduce
• Hadoop Distributed File System (HDFS)
• Pig
• Hive
• ZooKeeper
What is MapReduce?
Programming paradigm that allows for massive scalability across hundreds or thousands of servers in a Hadoop cluster.
• “Map” step: The master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node.
• “Reduce” step: The master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve.
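The classic word-count example shows the two steps in miniature. The sketch below runs in a single Python process and does not use the Hadoop API. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative, and the grouping step stands in for the work the framework does between the two phases.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the intermediate values by key between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single answer."""
    return key, sum(values)


def word_count(documents):
    """Run map over every split, group by word, then reduce each group."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Each string plays the role of one sub-problem handed to a worker node.
splits = ["big data is big", "data in motion"]
counts = word_count(splits)
```

In a real cluster each call to `map_phase` would run on a different worker, which is where the scalability comes from.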
What is HDFS?
• Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster.
• The map and reduce functions can be executed on smaller subsets of larger data sets, and this provides the scalability that is needed for big data processing.
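As a rough illustration of the idea, the following Python sketch cuts a byte string into fixed-size blocks and spreads them across nodes. The tiny block size and the round-robin placement are assumptions made for readability, not the placement policy HDFS actually uses, and the function and node names are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut the data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes):
    """Assign block indexes to nodes in round-robin order (illustrative only)."""
    placement = {node: [] for node in nodes}
    for index in range(len(blocks)):
        placement[nodes[index % len(nodes)]].append(index)
    return placement


data = b"x" * 1000                      # stand-in for a large file
blocks = split_into_blocks(data, 256)   # four blocks, the last one partial
placement = place_blocks(blocks, ["node-a", "node-b", "node-c"])
```

Each node can then run map tasks against only the blocks it holds.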
What are Pig and Hive?
Pig
• Developed at Yahoo!
• Programming language
• Designed to handle any kind of data
Hive
• Developed at Facebook
• Hive Query Language (HQL) similar to standard SQL
• Allows anyone who is already fluent with SQL to more quickly leverage the Hadoop platform
What is ZooKeeper?
• Provides a centralized infrastructure and services that enable synchronization across a cluster
• Maintains common objects needed in large cluster environments, such as:
configuration information
hierarchical naming space, etc.
• Applications can leverage these services to coordinate distributed processing across large clusters
What does a Big Data platform do?
Analyze a Variety of Information
Novel analytics on a broad set of mixed information that could not be analyzed before.
Analyze Information in Motion
Streaming data analysis
Large volume data bursts and ad hoc analysis
Analyze Extreme Volumes of Information
Cost-efficiently process and analyze petabytes of information
Manage and analyze high volumes of structured, relational data
Discover and Experiment
Ad hoc analytics, data discovery and experimentation
Manage and Plan
How does a Big Data platform fit?
• Traditional Sources
• Data Warehouse
• Big Data Platform
• New Sources
• Enterprise Integration
Is the approach the same?
Traditional Approach: Structured & Repeatable Analysis
• Business Users: Determine what questions to ask
• IT: Structures the data to answer the questions
Big Data Approach: Iterative and Exploratory Analysis
• IT: Delivers a platform to enable creative discovery
• Business Users: Explore what questions could be asked
Leveraging Big Data
What can you do with Big Data?
Analyze Information in Motion
• Smart Grid management
• Multimodal surveillance
• Real-time promotions
• Cyber security
• ICU monitoring
• Options trading
• Click-stream analysis
• CDR processing
• IT log analysis
• RFID tracking and analysis
Analyze a Variety of Information
• Social media/sentiment analysis
• Geospatial analysis
• Brand strategy
• Scientific research
• Epidemic early warning system
• Market analysis
• Video analysis
• Audio analysis
Discovery and Experimentation
• Sentiment analysis
• Brand strategy
• Scientific research
• Ad hoc analysis
• Model development
• Hypothesis testing
• Transaction analysis to create insight-based product/service offerings
Manage and Plan
Analyze Extreme Volumes of Information
• Transaction analysis to create insight-based product/service offerings
• Fraud monitoring and detection
• Risk modeling and management
What are some use cases?
• Fraud Detection and Modeling
• 360 View of the Customer
• Email, Call Center Transcript Analysis
• Call Detail Record Analysis
• RFID Tracking and Analysis
• Smart Grid / Smarter Utilities
• Cyber Security
• Risk Modeling & Management
• Threat Detection / Multi-modal Surveillance
• Geo-marketing
What are some analytics examples?
Financial Services
• Improved risk decisions
• Customer sentiment analysis
• AML (Anti-Money Laundering)
Transportation
• Weather and traffic impact on logistics and fuel consumption
Call Centers
• Voice-to-text for customer behavior understanding
Telecommunications
Utilities
• Weather impact analysis on power generation
• Smart meter data analysis
IT
• Transaction log analysis for multiple transactional systems
E-Commerce
• Internet behavior and buying patterns
• Digital asset piracy
Multi-channel Integration
What are some streaming analytics examples?
Natural Systems
• Wild fire management
• Water management
Transportation
• Intelligent traffic management
Manufacturing
• Process control for microchip fabrication
Health & Life Sciences
• Neonatal ICU monitoring
• Epidemic early warning system
• Remote healthcare monitoring
Telephony
• CDR processing
• Social analysis
• Churn prediction
• Geomapping
Stock Market
• Impact of weather on securities prices
• Market analysis at ultra-low latencies
Law Enforcement, Defense & Cyber Security
• Real-time multimodal surveillance
• Situational awareness
• Cyber security detection
Fraud Prevention
• Detecting multi-party fraud
• Real time fraud prevention
e-Science
• Space weather prediction
• Detection of transient events
• Genomics research
Other
• Smart Grid
• Text analysis
• Who’s talking to whom?
To what extent is Big Data being adopted?
Three out of four organizations have big data activities underway, and one in four are either in pilot or production:
• Pilot and implementation of big data activities: 28%
• Planning big data activities: 48%
• Have not begun big data activities: 24%
Early days of big data era
Almost half of all organizations surveyed report active discussions about big data plans
Big data has moved out of IT and into business discussions
Getting underway
More than a quarter of organizations have active big data pilots or implementations
Tapping into big data is becoming real
Acceleration ahead
The number of active pilots underway suggests big data implementations will rise exponentially
What are some trends for Big Data adoption?
Improving the customer experience by better understanding behaviors drives almost half of all active big data efforts.
Source: IBM Institute for Business Value and Saïd Business School, University of Oxford, 2012
Preparing for a Big Data Initiative
Five Practical Questions
What do you want to know?
• Business Objectives
• Improved decision-making
• Better business performance
Needs
Postulates
Questions
Results
Improved customer satisfaction
Increased profit margin
Expanded social awareness
Big Data or “lots of data”?
Is there a data source?
Sentiment Analysis
Foursquare
Surveys
Blogs
Demographics
Geospatial
Competitors
Weather
Identity
Facial Recognition
License Plate Recognition
RFID
Site behavior & Experience
Ad Campaigns
Display Media Sales
Effectiveness Predictive
Analytics
Is it worth it?
ROI
Labor
Sourcing Options
Hardware & Software
Will it work?
Model, Predict and Score
Measure and Analyze
Options
Resources (Internal & External)
Summary
Summary
Big Data
• High-volume, -velocity, -variety and -veracity information assets
• Cost-effective, innovative forms of information processing
• Enhanced insight and decision making
Uses
• Wide applicability
• Cross-industry
• Iterative and exploratory
• Complementary to BI/DW
Be Pragmatic
• Business-driven
• Provable ROI
Features and Functions
• Analyze a variety of information
• Analyze information in motion
• Analyze extreme volumes of information
• Discover and experiment
• Manage and plan
For More Information