© 2012 IBM Corporation 1
Technology for an Analytics-Driven World
Nagui Halim, IBM Fellow
Information
from Everywhere
Radical
Flexibility
Extreme
Scalability
Businesses are evolving rapidly, ushering in a new era of computing
Volume: 12 terabytes of Tweets created daily
Variety: 100s of video feeds from surveillance cameras
Velocity: trade events per second
New analytic applications require a big data platform
• Integrate and manage the full variety, velocity and volume of data
• Apply advanced analytics to information in its native form
• Visualize all available data for ad-hoc analysis
• Development environment for building new analytic applications
• Workload optimization and scheduling
• Security and Governance
Advanced Analytic Applications
Big Data Platform
Process and analyze any type of data
IBM Big Data Platform
Cost-effectively analyze petabytes of structured and unstructured information
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Hadoop System
IBM Big Data Platform
Analyze streaming data and large data bursts for real-time insights
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Hadoop System, Stream Computing
IBM Big Data Platform
Deliver deep insight with advanced in-database analytics and operational analytics
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Data Warehouse, Hadoop System, Stream Computing
IBM Big Data Platform
Govern data quality and manage information lifecycle
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Information Integration & Governance, Data Warehouse, Hadoop System, Stream Computing
Cloud | Mobile | Security
IBM Big Data Platform
Gather, extract and explore data using spreadsheet metaphor
Speed time to value with analytic and application accelerators
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Systems Management, Application Development, Visualization & Discovery, Accelerators, Information Integration & Governance, Data Warehouse, Hadoop System, Stream Computing
Progressing the IBM Big Data Platform: recent announcements
Accelerators
• Text analytics tool-kit
• Temperature monitoring
• Geospatial accelerator
Integration
• HDFS connector
• Balanced optimization and connectivity
Enterprise Robustness
• Adaptive MapReduce
• Cluster and workload management
• Enhanced user and network security
Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
IBM Big Data Platform: Systems Management, Application Development, Visualization & Discovery, Accelerators, Information Integration & Governance, Data Warehouse, Hadoop System, Stream Computing
© 2011 IBM Corporation
IBM’s Ecosystem
IBM Confidential
Diagram components: Cognos; Spreadsheets; Applications; Info Server; Data Marts; SOA; WebService; Fin Planning; Mashups; InfoSphere Warehouse; InfoSphere Streams; DB2; ERP, CRM and Other Data Sources; Cognos Real Time Monitoring; Analytic Models; Pre-processed Data; CDRs; InfoSphere BigInsights
Big data creates new possibilities for optimized outcomes and
competitive differentiation
T-Mobile (Improving Results)
• Network analysis to improve client experience
• 1.7 billion daily events
• Support 1400+ users with real-time reports
Dublin City Council (New Approaches)
• Public transport optimization
• Analyzes 50 bus location updates per second
• Monitor 1000 buses across 150 routes daily
Brocade, IBM Business Partner (Strategic Advantage)
• Network security intrusion detection
• Sub-millisecond analysis and response
• No impact on network performance
Energy and Utilities
• Smart Meter Analytics • Asset Management
Retail
• Omni-channel Marketing • Real-time promotions
Law Enforcement
• Multimodal surveillance • Cyber security detection
Transportation
• Logistics optimization • Traffic congestion
Financial Services
• Fraud detection • 360° View of the Customer
Digital Media
• Real-time ad targeting • Attribution Analysis
Health & Life Sciences
• Medical Record Analytics • Disease Surveillance
Telecommunications
• Customer Profile Monetization • Network Analytics & Optimization
IBM InfoSphere Streams as Enabler
• Streaming analytic applications
  • Multiple input streams
  • Advanced streaming analytics
• Eclipse based IDE
  • Define sources, apply operators, define intermediary and final output sinks
  • User defined operators in Java or C++
• Optimizing compiler automates deployment and connections
  • Extremely low latency
  • No limits on cluster size
InfoSphere Streams Studio (IDE for Streams)
Source Adapters
Sink Adapters
Operator Repository
Automated, Optimized Deploy and Management (Scheduler)
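The source, operator and sink structure described on this slide can be sketched in a few lines. This is a minimal illustration in plain Python generators, not the Streams programming model or its API; the names `source`, `filter_op`, `map_op`, `sink` and the sample readings are all hypothetical.

```python
from typing import Callable, Iterable, Iterator, List


def source(readings: Iterable[float]) -> Iterator[float]:
    """Source adapter (hypothetical): emits one tuple at a time."""
    for value in readings:
        yield value


def filter_op(stream: Iterator[float],
              predicate: Callable[[float], bool]) -> Iterator[float]:
    """Operator: passes through only the tuples that satisfy the predicate."""
    for item in stream:
        if predicate(item):
            yield item


def map_op(stream: Iterator[float],
           fn: Callable[[float], float]) -> Iterator[float]:
    """Operator: transforms each tuple as it flows past."""
    for item in stream:
        yield fn(item)


def sink(stream: Iterator[float], out: List[float]) -> None:
    """Sink adapter (hypothetical): collects the final output stream."""
    for item in stream:
        out.append(item)


def run_demo() -> List[float]:
    out: List[float] = []
    stream = source([12.0, -1.0, 30.5, 8.0])
    stream = filter_op(stream, lambda v: v >= 0)  # drop invalid readings
    stream = map_op(stream, lambda v: v * 2)      # intermediate transformation
    sink(stream, out)
    return out
```

Each stage pulls tuples lazily from the previous one, so nothing is buffered beyond the tuple in flight. A real deployment would distribute the operators across hosts, which this sketch does not attempt.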
ITS Application Flow-graph (125k GPS/second)
Flow-graph nodes: GPS source; ISO 8601 to timestamp; Time of day, day of week, month of year; Filter taxis and invalid values; GeoMatch; GpsTrackClean; Sort; Adapt; Travel times; Join; Query; 5min Aggr.; KML; Color map; DSP; Inter-quartiles (IQ); week-ends IQ; IQ; Title; Current Title
New opportunities continue to emerge for Telco players due to
hyper-growth in wireless demands
Source: ABI research
Enterprise Wireless
Smart Phone and Mobile Entertainment applications will drive >10 to 30x mobile traffic in the next 6 years
The wireless industry will need to transform its existing voice-oriented network into a content-oriented network
M2M communication has become a fast-growing multi-billion market and will continue to grow 4x in 5 years
Emerging Smart Grid, Public Security and Telematics 2.0 will drive broadband M2M growth
70% of mobile traffic will happen in-building
Femtocells / picocells covering indoor wireless will grow >10x in the next 5 years
The volume of 4G femtocells / picocells will drive down the cost, impacting wireless in the enterprise
Growth in wireless traffic presents new challenges to Telco companies
Effective customer retention
Contextual awareness
Growing Fraud
Newer Govt Regulations
Smarter mktg campaigns
Better asset utilization
MOVE FROM REACTION TO PREDICTION
CREATE VALUE FASTER
GAIN INSIGHT FROM THE INFORMATION EXPLOSION
ENGAGE THE ENTIRE VALUE CHAIN
OPTIMIZED PERFORMANCE
NEW INTELLIGENCE
CONTENT PARTNERS, NETWORK PARTNERS, RETAIL PARTNERS, DISTRIBUTORS, DEVICE PARTNERS
REAL TIME MEDIATION, WAREHOUSE CONSOLIDATION, MASTER DATA MANAGEMENT, REAL-TIME ANALYTICS, PROCESS PERFORMANCE METRICS
CHURN PREDICTION, KEY PERFORMANCE PREDICTORS, BEHAVIORAL & SOCIAL NETWORK ANALYTICS, CASHFLOW ANALYTICS, CUSTOMER EXPERIENCE MANAGEMENT, REAL-TIME CAMPAIGNS, CAMPAIGN ANALYTICS, MARKET BASKET ANALYSIS, REAL TIME FRAUD
New Intelligence is helping wireless carriers sell more services, retain
customers, and operate in a low-cost, highly efficient, agile environment
Customer Experiences...
• A telco implementing a solution to access and analyze call, internet usage and texting detail records (xDRs) in real time.
  • 91% reduction in time to merge data
  • 92% reduction in time to load data (from 95 minutes to 8 minutes)
  • 93% reduction in storage requirements
  • 85% reduction in servers used (80 blades to 12 blades)
• A telco requiring a solution to analyze up to 25M messages per second. At these volumes, in-motion analysis is the only option.
  • Even at these volumes, Streams provided near linear scalability
  • “Streams handled at least an order of magnitude more events per second on the same hardware than competitors.” (Telco’s Chief Architect)
• A government customer required only 1.5 FTE Streams administrators for netflow analysis and video & image analysis across ~15 geographically dispersed data centers
Analysis of Call Detail Records for Customer Retention
Telecom Switch Call Detail Records
InfoSphere Streams Mediation with Churn and Social Analytics
Process CDRs in 1 min vs 12 hours
112 x86 cores vs 384 P5 cores
Deduplication in Streams reduces Warehouse work
Simultaneous summaries and analysis
Network Equipment Providers
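The idea of deduplicating CDRs in the stream while producing summaries at the same time can be sketched as follows. This is an illustrative Python sketch, not the Streams implementation; the record layout `(record_id, caller, duration_seconds)` and the function names are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical CDR layout: (record_id, caller, duration_seconds)
CDR = Tuple[str, str, int]


def deduplicate(cdrs: Iterable[CDR]) -> Iterator[CDR]:
    """Drop records whose id was already seen, so downstream stages
    (for example a warehouse load) never receive the duplicate."""
    seen = set()
    for cdr in cdrs:
        record_id = cdr[0]
        if record_id in seen:
            continue
        seen.add(record_id)
        yield cdr


def summarize(cdrs: Iterable[CDR]) -> Dict[str, int]:
    """Running per-caller total duration, computed in the same pass."""
    totals: Counter = Counter()
    for _, caller, duration in cdrs:
        totals[caller] += duration
    return dict(totals)


records = [
    ("r1", "alice", 40),
    ("r2", "bob", 30),
    ("r1", "alice", 40),  # duplicate of r1, dropped in flight
    ("r3", "alice", 20),
]
summary = summarize(deduplicate(records))
```

The `seen` set grows without bound here; a production pipeline would typically limit it to a time window, which this sketch leaves out.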
Mediation & Revenue Assurance Performance at IDEA
1.01 Billion CDRs in 2 hours for all circles running Telcordia IN, an average rate of 140K per second
• 2 HS22 blade dual CPU quad core servers
• 8 cores each, 2.5 GHz, 64 GB memory (total 16 cores)
• Avg CPU utilization: 75%
• Avg. memory utilization ~6GB

Before         After           Change
68M CDR/hr     505M CDR/hr     740% more!
42 P6 cores    16 x86 cores    62% fewer!
98 Million subscribers
Tier 1 operator in India
Operate in 22 circles
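The headline figures on this slide can be checked with simple arithmetic. The sketch below uses only numbers quoted on the slide; the helper names are hypothetical. The throughput ratio works out to roughly 7.4x, which appears to be what the slide's 740% figure reflects.

```python
def rate_per_second(records: float, hours: float) -> float:
    """Average records per second over the given number of hours."""
    return records / (hours * 3600)


def percent_fewer(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return (before - after) / before * 100


cdrs_per_second = rate_per_second(1.01e9, 2)  # 1.01 billion CDRs in 2 hours
cdrs_per_hour = 1.01e9 / 2                    # matches the 505M CDR/hr "After" figure
throughput_ratio = 505e6 / 68e6               # After vs. Before throughput
core_reduction = percent_fewer(42, 16)        # 42 P6 cores down to 16 x86 cores
```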
Real Time Marketing at Southeast Asian Telco
Figure labels: Insight, Information, Data; prescriptive, active
Figure axes: Business flexibility & responsiveness; Business value
“A moment’s insight is sometimes worth a life’s experience.” Oliver Wendell Holmes
The Pain:
• 100M CDRs per day from SMS from 25M subscribers
• Used to send bills to customers
The Answer:
• InfoSphere Streams to create thousands of concurrent real-time marketing promotions
Social Media Analysis
How many people are talking about the film?
• Do they intend to actually see the film?
• Did the Super Bowl trailers have any impact?
Who are they?
• What is their demographic profile?
• Are they highly influential?
• Are they avid movie-goers?
• Are they comic book fans?
What is their reaction?
• Did they like the trailer?
• What elements (plot, characters, etc.) had the best reaction?
• What elements (plot, characters, etc.) had the worst reaction?
• Why did they feel this way?
How does this compare to the competition?
• Compared to other trailers aired at the same time?
• Compared to other films releasing at the same time?
IBM analyzed over 1B social media posts to determine the
reaction to Disney trailers aired during the Super Bowl
Conversations were collected in real-time providing
to-the-minute insight over a one month period
Timeline labels: Monitoring Period; Jan 1; Golden Globes; NFC Championship; Super Bowl (Feb 5th); 5pm, 6pm, 7pm, 8pm
Data Set
• 1.1B tweets
• 5.7M blog and forum posts
• 3.5M relevant messages
• 97K referencing The Avengers
• 18K referencing John Carter
Information extracted
• Buzz and sentiment
• Gender, Location and Occupation
• Avid movie-goers, comic book fans
• Intent to see specific films
• Specific attributes of the film/trailer
The data set included over 1.1B tweets and 5.7M blog & board
posts, from which 3.5M relevant conversations were identified
- 10% Direct Feed
- Targeted search via PowerTrack
Hundreds of Thousands of Message Boards
Sample Board list: facebook.com, gaiaonline.com, kaskus.us, www.gamefaqs.com, babycenter.com, imdb.com, reddit.com, weightwatchers.com, thebump.com, ar15.com, comicbookresources.com, fanforum.com, uwants.com, reddit.com/r/AskReddit/, city-data.com, sherdog.net, pinoyexchange.com, investorshub.advfn.com, boards.ie, tripadvisor.com
Hundreds of Thousands of Blogs
Sample Blog list: baktan.wordpress.com, sherwinlaranga.com/winieville, ksipnistere.blogspot.com, ualrtoday.wordpress.com, six03.posterous.com, travelpod.com, www.blair-cook.com, doesgrey.wordpress.com, chnp101.wordpress.com, installblogs.com/blog, bleacherreport.com, blogs.forbes.com/network/rss/, merchantyellowpages.net/wordpress/, archiveofmytweets.wordpress.com, halamovie.com, indohr.blogspot.com, bnotizie.com, rockingappuse.wordpress.com, www.iseenews.com, americanbankingnews.com
Questions that Drove the Investigation
Was the Super Bowl campaign effective?
Should Disney adjust creative and messaging?
Should the campaign be tailored around a specific segment or demographic?
Should the advertising campaign be adjusted to deal with emerging threats?
Trailers airing on the Super Bowl resulted in 15 – 20 times the
daily buzz for John Carter and The Avengers
John Carter
• Super Bowl generated roughly 15x more buzz than the daily average
• 6x more buzz than commercials run during the NFC Championship
• Over 6,000 individual conversations on Feb 5th
The Avengers
• Super Bowl generated roughly 20x more buzz than the daily average
• Over 36,000 individual conversations on Feb 5th, 11,000 on February 6th
The Avengers generated over 5 times more buzz than John Carter
Charts: Buzz Volume for The Avengers and for John Carter
Messages indicating intent to actually see the film in theaters were extracted from general conversations to judge a trailer’s impact
The % of conversations indicating a desire to see John Carter dropped rapidly after the Super Bowl trailer aired, indicating the content and message of the trailer received a poor reaction
Charts: Buzz Volume (#) and Intent as % of Buzz, Feb 5th (EST)
The % of conversations indicating a desire to see The Avengers rose with the general level of conversations, indicating the trailer was effective in driving purchasing behavior
Of all film trailers aired during the Super Bowl, The Avengers
was the clear winner in terms of social media reaction
Chart: Buzz by Volume, Feb 5th (EST)
• Clear spikes indicate the time each trailer aired
• The Avengers generated more than double the level of buzz of the next highest competitor, Act of Valor
• Trailers that aired during the pre-game generated a small fraction of the buzz compared to those that aired during the game
Each new trailer resulted in a domination of film-related
share of voice
Chart: Share of Voice of Trailers Aired During Super Bowl, 6pm to 10pm EST (The Dictator, Battleship, John Carter, The Lorax, The Avengers, Act of Valor)
Sentiment was overwhelmingly positive for The Avengers and
mixed for John Carter during the Super Bowl
Of messages indicating a clear positive or negative sentiment (e.g. “This film looks incredible” or “This film looks terrible”), The Avengers maintained a level of between 90-100% positive reactions
Charts: Net Sentiment % and Sentiment by Volume, Feb 5th
The Super Bowl trailer for John Carter generated a substantial amount of negative reactions, pushing the net sentiment level as low as 13% immediately after the trailer aired
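The deck does not state how its sentiment percentage is calculated. One reading consistent with the wording "of messages indicating a clear positive or negative sentiment" is the share of positive messages among those with clear sentiment. The sketch below assumes that reading; the function name and the message counts are hypothetical.

```python
def positive_share(positive: int, negative: int) -> float:
    """Percentage of positive messages among those with clear sentiment.

    Assumed definition: the slides do not give the formula they used.
    """
    total = positive + negative
    if total == 0:
        raise ValueError("no messages with clear sentiment")
    return 100.0 * positive / total


# Hypothetical counts chosen only to illustrate the calculation.
strong_reaction = positive_share(95, 5)
weak_reaction = positive_share(13, 87)
```

Other definitions, such as positive minus negative over the total, would give different values for the same counts.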
Similar titles or titles releasing the same time as The Avengers
generally had a positive reaction, but at a much smaller scale
Charts: Sentiment Volume and Net Sentiment %
Ghost Rider received negative reactions, but the low volumes caused by its trailer airing during the pre-game skew the figures
John Carter received a much more negative reaction than its
competitors
Charts: Sentiment Volume and Net Sentiment %
John Carter received the most negative reactions of any film trailer airing during the Super Bowl
IBM SPSS Modeler – Model Building and Scoring
IBM InfoSphere Streams
InfoSphere Streams + SPSS Product Integration Architecture
Diagram components: IBM SPSS Modeler Solution Publisher; IBM SPSS Collaboration & Deployment Services; In-memory; Repository; File System; SPSS Modeler Scoring Stream
Diagram flows: Model Refresh; Change Notification
Legend: S = SPSS Scoring Operator; R = SPSS Repository Operator; P = SPSS Publish Operator
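The model refresh and change notification flow in this architecture can be sketched as a scoring operator that swaps its model when notified, without interrupting the stream. This is a plain Python illustration, not the SPSS or Streams operator API; the class, method names and toy models are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model type: maps a record to a score.
Model = Callable[[Dict[str, float]], float]


class ScoringOperator:
    """Scores each incoming record with whichever model is currently loaded."""

    def __init__(self, model: Model) -> None:
        self._model = model

    def on_change_notification(self, new_model: Model) -> None:
        """Model refresh: replace the model; later records use the new one."""
        self._model = new_model

    def score(self, record: Dict[str, float]) -> float:
        return self._model(record)


def run_demo() -> List[float]:
    operator = ScoringOperator(lambda r: r["minutes"] * 0.25)
    scores = [operator.score({"minutes": 10.0})]
    # A refreshed model arrives, for example after being republished.
    operator.on_change_notification(lambda r: r["minutes"] * 0.5)
    scores.append(operator.score({"minutes": 10.0}))
    return scores
```

The same record scores differently before and after the refresh, which is the behavior the change notification path is meant to deliver.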
In the new era of computing, outperformers will leverage the Volume,
Velocity and Variety of information for better business outcomes
Volume: Scale from terabytes to petabytes of structured and unstructured data
Velocity: Analyze streaming data and large volume bursts in milliseconds
Variety: Embrace heterogeneous data types, schemas and data sources
• Broadest enterprise big data platform
• Strongest business partner ecosystem
• Leader in each era of computing