Are You Ready for Big Data?
Jim Gallo
National Director, Business Analytics
April 10, 2013
Agenda
© 2013, Information Control Corporation 2
• What is Big Data?
• How do you leverage Big Data in your company?
• How do you prepare for a Big Data initiative?
• Summary
What is Big Data?
What is “Big Data”?
“Big data” is high-volume, -velocity, -variety and -veracity information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Model, Predict and Score
Measure and Analyze
Volume (TB to ZB)
• Twitter
• RFID
• Click stream
• Facebook
Velocity (streaming & large-volume data movement)
• Monitors
• Machine data
• Trades & transactions
• Identity
Variety (relational & non-relational data types)
• Geospatial
• Relational
• Text
• Video
Cost-effective
Veracity (managing the reliability and predictability of inherently imprecise data types)
What might a Big Data platform look like?
• Data Warehouse
• Hadoop
• Information Integration
• Stream Computing
• BI/Reporting
• Predictive Analytics
• Exploration/Visualization
• Content Analytics
• Instrumentation
• Functional Apps
• Industry Apps
What is Hadoop?
• Open source software project
• Distributed processing of large data sets
• Leverage clusters of commodity servers
• Scale from single server to thousands of machines
• High degree of fault tolerance (detects and handles failures at the application layer)
What are the benefits of Hadoop?
Scalable
• New nodes can be added as needed
• Add nodes without needing to change data formats, how data is loaded, how jobs are written, or the applications
Flexible
• Schema-less
• Can absorb any type of data, structured or not
• Any number of sources
• Data from multiple sources can be joined and aggregated in arbitrary ways
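As a rough illustration of the "schema-less" point above, the sketch below reads two differently shaped sources and joins them only at read time. It is plain Python rather than Hadoop code, and the sample data, field names, and the join_by_customer helper are all assumptions made up for this example.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample data: two sources in two different formats.
# The JSON records do not share a fixed schema (note the extra "referrer" field).
clickstream_json = """
{"customer_id": "c1", "page": "/home"}
{"customer_id": "c2", "page": "/pricing", "referrer": "ad"}
{"customer_id": "c1", "page": "/pricing"}
"""
orders_csv = "customer_id,amount\nc1,40.0\nc2,15.5\nc1,9.5\n"


def read_clicks(raw):
    """Parse one JSON document per line; no schema is declared up front."""
    for line in raw.strip().splitlines():
        yield json.loads(line)


def read_orders(raw):
    """Parse CSV rows into dictionaries keyed by the header row."""
    yield from csv.DictReader(io.StringIO(raw))


def join_by_customer(clicks, orders):
    """Join both sources on customer_id and aggregate per customer."""
    summary = defaultdict(lambda: {"page_views": 0, "total_spend": 0.0})
    for click in clicks:
        summary[click["customer_id"]]["page_views"] += 1
    for order in orders:
        summary[order["customer_id"]]["total_spend"] += float(order["amount"])
    return dict(summary)


profile = join_by_customer(read_clicks(clickstream_json), read_orders(orders_csv))
print(profile)
```

The structure is applied when the data is read, not when it is stored, which is the property the slide calls schema-less.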
Cost effective
• Massively parallel computing on commodity servers
• Sizeable decrease in the cost per terabyte of storage
Fault tolerant
• If a node fails, work is redirected to another location of the data
• Processing continues
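The fault-tolerance behavior can be pictured with a small simulation. This is a sketch only, not Hadoop's actual scheduler: the node names, the BLOCK_REPLICAS layout, and the helper functions are hypothetical, and the only idea taken from the slide is that work moves to another copy of the data and processing continues.

```python
# Hypothetical cluster layout: each data block is stored on several nodes.
BLOCK_REPLICAS = {"block-1": ["node-a", "node-b", "node-c"]}
# Pretend this node has failed.
FAILED_NODES = {"node-a"}


class NodeUnavailable(Exception):
    """Raised when a task is sent to a node that is down."""


def run_task_on_node(node, block_id):
    """Simulate running a task next to one copy of the data."""
    if node in FAILED_NODES:
        raise NodeUnavailable(node)
    return f"{block_id} processed on {node}"


def run_with_failover(block_id):
    """Try each replica in turn until one of them completes the task."""
    for node in BLOCK_REPLICAS[block_id]:
        try:
            return run_task_on_node(node, block_id)
        except NodeUnavailable:
            continue  # redirect the work to the next copy of the data
    raise RuntimeError(f"no live replica for {block_id}")


result = run_with_failover("block-1")
print(result)
```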
What are the key components of Hadoop?
• MapReduce
• Hadoop Distributed File System (HDFS)
• Pig
• Hive
• ZooKeeper
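To make the MapReduce component more concrete, here is a minimal single-machine sketch of the map, shuffle, and reduce steps applied to a word count. It uses plain Python instead of the Hadoop API, and the function names and sample lines are assumptions chosen for illustration; a real job would run the same phases in parallel across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group the emitted values by key between the two phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


sample_lines = ["big data is big", "data in motion"]
counts = word_count(sample_lines)
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, the work can be split across many machines without changing the logic.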
What does a Big Data platform do?
Analyze a Variety of Information
Novel analytics on a broad set of mixed information that could not be analyzed before
Analyze Information in Motion
Streaming data analysis
Large volume data bursts and ad hoc analysis
Analyze Extreme Volumes of Information
Cost-efficiently process and analyze petabytes of information
Manage and analyze high volumes of structured, relational data
Discover and Experiment
Ad hoc analytics, data discovery and experimentation
Manage and Plan
Enforce data structure, integrity and control to ensure consistency for repeatable queries
How does a Big Data platform fit?
• Traditional Sources
• Data Warehouse
• Enterprise Integration
• Big Data Platform
• New Sources
Is the approach the same?
Traditional Approach: Structured & Repeatable Analysis
• Business Users determine what questions to ask
• IT structures the data to answer the questions
• Examples: monthly sales reports, profitability analysis
Big Data Approach: Iterative and Exploratory Analysis
• IT delivers a platform to enable creative discovery
• Business Users explore what questions could be asked
• Examples: brand sentiment, product strategy
Leveraging Big Data
What can you do with Big Data?
Analyze Information in Motion
• Smart Grid management
• Multimodal surveillance
• Real-time promotions
• Cyber security
• ICU monitoring
• Options trading
• Click-stream analysis
• CDR processing
• IT log analysis
• RFID tracking and analysis
Analyze a Variety of Information
• Social media/sentiment analysis
• Geospatial analysis
• Brand strategy
• Scientific research
• Epidemic early warning system
• Market analysis
• Video analysis
• Audio analysis
Discovery and Experimentation
• Sentiment analysis
• Brand strategy
• Scientific research
• Ad hoc analysis
• Model development
• Hypothesis testing
• Transaction analysis to create insight-based product/service offerings
Manage and Plan
• Operational analytics – BI reporting
• Planning and forecasting analysis
Analyze Extreme Volumes of Information
• Transaction analysis to create insight-based product/service offerings
• Fraud monitoring and detection
• Risk modeling and management
• Social media/sentiment analysis
• Environmental analysis
What are some use cases?
Fraud Detection and Modeling
360° View of the Customer
Email, Call Center Transcript Analysis
Call Detail Record Analysis
RFID Tracking and Analysis
Smart Grid / Smarter Utilities
Cyber Security
Risk Modeling & Management
Threat Detection / Multi-modal Surveillance
Geo-marketing
What are some analytics examples?
Financial Services
• Improved risk decisions
• Customer sentiment analysis
• AML (Anti-Money Laundering)
Transportation
• Weather and traffic impact on logistics and fuel consumption
Call Centers
• Voice-to-text for customer behavior understanding
Telecommunications
• Operations and failure analysis from device, sensor, and GPS
Utilities
• Weather impact analysis on power generation
• Smart meter data analysis
IT
• Transaction log analysis for multiple transactional systems
E-Commerce
• Internet behavior and buying patterns
• Digital asset piracy
Multi-channel Integration
• Integrated customer behavior modeling
What are some streaming analytics examples?
Natural Systems
• Wildfire management
• Water management
Transportation
• Intelligent traffic management
Manufacturing
• Process control for microchip fabrication
Health & Life Sciences
• Neonatal ICU monitoring
• Epidemic early warning system
• Remote healthcare monitoring
Telephony
• CDR processing
• Social analysis
• Churn prediction
• Geomapping
Stock Market
• Impact of weather on securities prices
• Market analysis at ultra-low latencies
Law Enforcement, Defense & Cyber Security
• Real-time multimodal surveillance
• Situational awareness
• Cyber security detection
Fraud Prevention
• Detecting multi-party fraud
• Real-time fraud prevention
e-Science
• Space weather prediction
• Detection of transient events
• Genomics research
Other
• Smart Grid
• Text analysis
• Who’s talking to whom?
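Most of the streaming examples above share one pattern: look at a sliding window of recent readings and react as soon as it crosses a limit. The sketch below shows that pattern in plain Python, loosely modeled on the ICU monitoring example. The rolling_alerts function, the window size, the threshold, and the sample heart rates are all hypothetical, and a production system would use a stream computing engine instead.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def rolling_alerts(
    readings: Iterable[float], window_size: int = 3, threshold: float = 100.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, average) whenever the rolling average exceeds the threshold.

    Readings are consumed one at a time, so the full stream is never held
    in memory; only the last window_size values are kept.
    """
    window = deque(maxlen=window_size)
    for index, value in enumerate(readings):
        window.append(value)
        if len(window) == window_size:
            average = sum(window) / window_size
            if average > threshold:
                yield index, average


# Hypothetical heart-rate readings arriving in order.
heart_rates = [88, 92, 97, 110, 118, 121, 95, 90]
alerts = list(rolling_alerts(heart_rates))
for index, average in alerts:
    print(f"reading {index}: rolling average {average:.1f} is above the threshold")
```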
Preparing for a Big Data Initiative
Five Practical Questions
What do you want to know?
• Business Objectives
• Improved decision-making
• Better business performance
Needs
Postulates
Questions
Results
Improved customer satisfaction
Increased profit margin
Expanded social awareness
Big Data or “lots of data”?
Is there a data source?
• Sentiment Analysis
• Foursquare
• Surveys
• Blogs
• Demographics
• Geospatial
• Competitors
• Weather
• Identity
• Facial Recognition
• License Plate Recognition
• RFID
• Machine
• Site Behavior & Experience
• Ad Campaigns
• Display Media
• Sales Effectiveness
• Predictive Analytics
• Trades & Transactions
Is it worth it?
ROI
Labor
Sourcing Options
Hardware & Software
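One simple way to frame the "Is it worth it?" question is the standard ROI ratio: net benefit divided by total cost. The sketch below applies it to the cost categories on this slide. The simple_roi function and every figure in it are made-up assumptions for illustration, not data from the presentation.

```python
def simple_roi(expected_benefit: float, labor: float, hardware_software: float) -> float:
    """Return ROI as a fraction: (benefit - total cost) / total cost."""
    total_cost = labor + hardware_software
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    return (expected_benefit - total_cost) / total_cost


# Hypothetical figures for a proof of concept.
roi = simple_roi(expected_benefit=500_000, labor=250_000, hardware_software=150_000)
print(f"ROI: {roi:.0%}")
```

Running the same calculation for each sourcing option gives a like-for-like comparison before committing to a platform.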
Will it work?
Model, Predict and Score
Measure and Analyze
Options
Intranet & Extranet
Resources (Internal & External)
Time & Money
Summary
Summary
Big Data
• High-volume, -velocity, -variety and -veracity information assets
• Cost-effective, innovative forms of information processing
• Enhanced insight and decision making
Uses
• Wide applicability
• Cross-industry
• Iterative and exploratory
• Complementary to BI/DW
Be Pragmatic
• Business-driven
• Provable ROI
• Proof of concept
• Not for everyone
Features and Functions
• Analyze a variety of information
• Analyze information in motion
• Analyze extreme volumes of information
• Discover and experiment
• Manage and plan
For More Information