Large-Scale Machine Learning at Verizon
Ashok N. Srivastava, Ph.D.
Chief Data Scientist, Verizon
A Transition in Roles
•
Enormous data volumes
•
Massive computing infrastructure
Organizations started to recognize they are
operating with blind spots
1 in 3
business leaders frequently make
critical decisions without the
information they need
53%
don’t have access to the information
across their organization needed to
do their jobs
Factors supporting major decisions
79 %
52 %
62 %
To a little
extent
To a great
extent
Analytics
Personal
Experience
Collective
Experience
200 TB
metadata/day
Mobile in the United States
Social Already Mobile
Smartphone
consumers using
social
70%
58%
of social
media time
4 IN 10
social users
bought after sharing on
Sources: eMarketer, Jumptap & comScore, Vision Critical, Verizon internal data
Data Traffic Exploding
40%-45% growth
projected per year
2,000 Terabytes
per day
1 Terabyte (TB) = ~ 1,000 GB
Consumers are Mobile
will
Game
74%
will listen to
Music
41%
will watch
Videos
45%
of 2013 Black
Friday
ecommerce
21%
Smartphone
penetration
61%
2013
80%
2017
7%
CAGR•
What site?
•
What page?
•
Last site?
•
Next site?
•
From where?
•
What time?
•
What app?
•
From who?
Revenue Streams due to Analytics
The Orion Cluster at Verizon
FiOS
Other 3
rdparty
data
PIP and other
Enterprise Data
VZW Network, Clickstream,
Time, Location Data
Publicly
available data
Data
Feeds
Big Data
Infrastructure
Reporting
and
Advanced
Analytics
Hadoop
Ecosystem
Massive
Parallel
Processing
RDBMS
NoSQL
DBMS
Apache
Projects
BI
Reporting
&
Dashboards
Domain-Specific
Rules Engine
Anomaly &
Pattern
Detection
Prediction
Algorithms
Insight
Discovery
Recommendation
Engine
VZ Management Center
Vertical Solutions
Web Services
& APIs
•
Advertising
•
Managed Network
Services
•
Cybersecurity (unified network)
•
Network Health Management
•
M2M…
8
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
Precision Market Insights
Connecting marketers with consumers, improving
engagement and driving audience response
New Challenges for Marketers
The rise of Big Data & Mobile are
complicating marketing activities
Onslaught of 1
st
and
3
rd
party information
Siloed data stores
across internal
groups
Burgeoning channel for
customer interactions
The amount of data
from mobile is
exploding
Data
Mobile
Lack of standards for
tracking and
targeting
Access to key
information from
channels
Mobile can be
the solution
•
Bridge Digital & Physical
•
Direct customer relationship
•
Context for cross-channel
interactions
•
Increased efficiency of
The Online/Mobile Ad Ecosystem
BRANDS
$1.00
AGENCIES
$0.10
AD NETWORKS
$0.40
PUBLISHERS
$0.35
CONSUMER
AD
EXCHANGES
$0.10
Buy
wholesale
inventory for
advertisers
Streamline
multiple ad
networks for
publishers
ENABLER
(Targeting,
Data provider)
$0.05
Mobile + Big Data Solutions for
Marketing
Precision enables better 1:1 understanding of customers across physical and digital
contexts to drive more relevant and personalized interactions
Location
Demographics
Clickstream
App
Usage
Married 35-40 year old female in DC metro
interested in tennis and luxury brands;
recently browsed Tory Burch shoes at
m.Nordstrom.com and visits their store 3
times per month.
• Known interest in sports
• Previous basic app download
• Have basic app but not premium app
Case Study: Relevant Mobile Ads Drive A Major Sports League
Engage a targeted audience to click on a mobile banner ad,
inviting them to purchase and download the NFL Premium App.
Privacy
OBJECTIVE
AUDIENCE
RESULT
Precision can reach a broad audience across multiple properties and devices,
accurately targeting specific mobile user segments.
PRECISION MEETS THE CHALLENGE
Had 64% higher CTR compared
to other mobile media properties
Precision
Others
0.38%
0.23%
Adaptive Architecture for
Optimal Advertising
13
Data Driven
Customer Model
Automated
Decision Maker
Observed
Consumer
Behavior
Adjustment
Mechanism
Automatic Profile Discovery
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
Large-scale Machine Learning with
Applications to Connected Machines
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
16
Connected Ecosystem Market Size
By 2017, in the U.S., there will
be…
207 Million
Smartphones
Over 426 Million
Ways to Interact
with Customers
Source:
eMarketer, August 2013 and Frost & Sullivan 2013
Healthcare
Finance
Energy
Transportation
Distribution
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
17
Telematics / Fleet Management
Today
OEM & AFTERMARKET TELEMATICS
COMMERCIAL TELEMATICS
ENERGY | SECURITY
EDUCATION
MANUFACTURING
RETAIL
FINANCIAL SERVICES
Others
HEALTHCARE
2012
2013
2014
C O M M E R C I A L
T E L E M A T I C S :
F L E E T
HUGHES
ACQUISITION
C O M M E R C I A L T E L E M A T I C S :
W O R K F O R C E | W O R K - O R D E R
| A S S E T M G T .
C O N S U M E R T E L E M A T I C S :
A F T E R M A R K E T |
I N F O T A I N M E N T
S M A R T T R A N S P O R T A T I O N :
T R A F F I C | P A R K I N G
S M A R T T R A N S P O R T A T I O N :
C A R
S H A R I N G | R A I L | P U B L I C
T R A N S P O R T | A S S E T M G T . | M U N I
T R A N S P O R T
S M A R T B U I L D I N G S
R E T A I L
S M A R T T R A N S P O R T A T I O N :
A I R | M A R I T I M E
C O N S U M E R T E L E M A T I C S :
O E M E M B E D D E D
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
18
Connected Machine Landscape (I)
• Location analytics
• Comparative analytics
across vehicle
makes/models
• Prognostic and diagnostic
vehicle health
management (self-healing
autonomous systems)
Car
250M cars on the road in the
US, with about 20M new cars
sold each year.
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
19
Connected Machine Landscape (II)
• Driver behavior (safety,
remote management)
• Introduces uncertainty
• Need for personalization
• Autonomous control
• Consumer behavior
(infotainment, advertising,
etc.)
• Information retrieval
• Relevance
Car + Consumer
Over 100M mobile
subscribers currently on
Verizon’s network
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
20
Connected Machine Landscape (III)
• Environmental inputs
(Traffic, weather,
emergency services, etc.)
• Data correlation
• Distributive decision
making and
optimization
Car + Consumer + Environment
100s of TB of environmental
data generated daily
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
21
Verizon has core competency in advanced analytics areas
including:
• Anomaly Detection
• Diagnostics
• Predictive Analytics
• Time Series Analysis and Forecasting
• Correlations and Association Analysis
• Optimization
Advanced Analytics for Connected
Machines
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
22
Connected Machine Health Management
3.
Prognosis
“What will
happen next
and when?”
2. Diagnosis
“What is happening?”
1.
Anomaly
Detection
“Is something
different?”
4.
Mitigation
“
What actions
to take?”
Manual intervention or
autonomous and
self-healing (“intelligent”)
systems
Predictive
Analytics
Real- and near-real-
time monitoring and
anomaly detection
Comprehensive and
real-time view of big
data relevant to
diagnosis
Data and domain rules
Domain rules and
expert opinion
Domain rules and
expert opinion
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
23
Multiple Kernel Anomaly Detection
Multiple
Kernel
Anomaly
Detection
(MKAD)
Discrete
Textual
Networks
Continuous
Primary Source:
Switches, routers
and other
machines
Primary Source:
Maintenance
Reports
Primary Source:
Network
connectivity
Optimization problem
j
i
j
i
j
i
K
Q
,
,
min
2
1
i
l
i
1
,
0
0
,
1
,
1
i
i
Subject to:
One class SVMs training algorithms require solving the quadratic problem
Dual form
Linear equality
constraint
Bounds on design
variables
Control parameter
: Lagrange multipliers of the primal QP problem
Anomaly scores
Data points with will
be the support vectors
Value of
h: degree of anomalousness
Sign of
h: if negative – outlier
if positive - normal
Indicator
0
k
i
z
i
i
z
K
f
h
,
,
,
,
Decision boundary is determined only by margin and non-margin support
vectors obtained by solving the QP problem
The Impact of Big Data Innovations…
26
Inc
rea
si
ng
D
egr
ee
of
An
oma
lou
sne
ss
Numeric Inputs Only
Numeric and Text Inputs
The
addition
of Numeric and Text Data in the new
MKAD algorithm
significantly improves
(by as much as 7000 points) the ranking of the monitored anomalies.
Anomaly Detection using NASA MKAD Algorithm
•Aviation Safety Program Annual Review 2012
| SSAT Project
•27
Reported Exceedance
Level 3: Speed low at touchdown
Level 2: Flaps at questionable setting at landing
28
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
29
Telematics
What Can We Build?
W O R K F L O W
W O R K F O R C E
F L E E T
I N V E N T O R Y
C L O U D
AN AL Y T I C S
N E T W O R K
Monitor fleet location, condition
and driver behavior
Align resource capacity with
work and direct the right
resource to the right location at
the right time
Manage work and share updates
both internally and externally
Monitor the location and condition
of cargo
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
30
Connected Car Solutions
After market data collectors (for all modern makes and models)
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
31
1 Billion Miles
Chicago
,
IL
Houston, TX
United States
New York, NY
Dallas, TX
Detroit, MI
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
32
Real-world
MPG
calculations are sampled from
thousands of vehicles, not just a handful.
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.
33
Driver behavior is analyzed at multiple levels.
Driver behavior can
be compared and
contrasted to drivers
from other OEM
vehicles.
Honda
Vehicles:
32.65 mi/day
Nissan
Vehicles:
33.22 mi/day
Mitsubishi
Vehicles:
33.32 mi/day
Ford
Vehicles:
32.08 mi/day
Volkswagen
Vehicles:
33.69 mi/day
Cadillac (GM)
Vehicles:
29.24 mi/day
Toyota
Vehicles:
31.92 mi/day
Confidential and proprietary materials for authorized Verizon personnel and outside agencies only. Use, disclosure or distribution of this material is not permitted to any unauthorized persons or third parties except by written agreement.