NOTICE: Proprietary and Confidential
Opera Solutions, LLC
180 Maiden Lane 17th Floor
New York, NY 10038 +1 (646) 437 2100 telephone +1 (646) 437 2101 facsimile
www.operasolutions.com
Using Opera’s Signal Hub to Derive Profitable Insights
Joe Milana, Global Head of Analytics
From Big Data to Small Data:
Opera Solutions, LLC
12230 El Camino Real, Suite 330 San Diego, California 92130 +1 (858) 480 3750 telephone +1 (858) 480 3727 facsimile
What we do
Make average people extraordinary
Block & Tackling + Machine Intelligence
→ prosthetic to the human mind
Turning Data into “Signals”
EXAMPLES
• External / Environmental
– Broad-based indicators: demographic variables, macro-economic indicators
– Learn what “excites” the customer: World Cup 2014 (Sao Paulo), Facebook comments on new mobile phone
• Purchase History: develop “memory” of each customer’s history
– Data usage
– Product & services purchases
• Product Affinity: discover products that “bond” together
– Video on Demand
– Smartphone apps
• Signals Groups: unearth common DNA markers among similar customers
– Segment “9” has high Video on Demand propensity
– Group 4 has higher games application usage
• Campaign: gauge and describe customer’s interaction
– Customer’s e-campaign open rate over last 3 months
– Top 3 click-rate campaigns
• Usage: other behavioural descriptors/triggers
– Email responses
– Web behaviour
SIGNALS LIBRARY
• Signal Selector
• Signal Generation
• Static/Slow-Moving Signals
• Fast-Moving/Rate of Change Signals
• Customer State Signals
Signal Hubs: A Dynamic Collection of the Strongest Signals for a Specific Domain (e.g., Marketing, Spend, Risk...)
Domain-Specific SIGNAL HUB
• Signal library: Opera’s cumulative knowledge within and across verticals
• Client-specific signal creation: Based on client data and situation
• Continual signal refinement and generation: New, emerging signals based on ongoing monitoring
Key Definitions – Signal Hub
A production system that ingests multiple internal and external data sources, manages Signals, extracts Signal Values, manages and executes analytic services, and exposes all data and services via standard services interfaces.
[Diagram: Signal Hub architecture, spanning PRODUCTION and DISCOVERY]
• Data Sources: Internal Structured, Internal Unstructured, External Structured, External Unstructured
• DISCOVERY: Data Ingestion; Create / Calculate Variables; Signal Detection; Train/Adapt Models; Evaluate and Package “Best” Signals and Models; Signals Library
• PRODUCTION: Intelligent ETL (Connect, Extract, Transform, & Load); History; Context/Profile Storage; Signal Storage; Signal Management; Signal Promotion & Demotion; Simulation Services; Batch Services; Visualization & Control Services; Transaction Services; Application Specific Models & Rules
• Other labeled components: SigGMS; SIP/AppGMS; Man + Machine Interfaces; Feedback Loop
• Applications: Consumer Finance & Insurance; Retail; Marketing Services; Government; Healthcare; Procurement
Summary: Signal Hub Sample Applications
Action categories: Marketing Actions; Operations & Finance Actions; Spend & Sourcing Actions; Industry-Specific Actions; Fraud & Risk Actions
Sample applications shown:
• Fading/Attrition Treatment
• Real-Time 360 Competitor View
• Individualized Customer Offers & Recommendations
• Price Elasticity By SKU
• Liquidity & Margin Optimization
• Real Estate Optimization
• Real-Time FA Service Actions
• Technology Optimization
• Healthcare Revenue Leakage
• Mobiuss™ Portfolio Valuation & Risk
• Automotive Pricing Optimizer
• Casino and Gaming Stimulation
• Government Threat Assessment
• Bust-Out & Compromise
• Collection Targeting & Prioritization; Best Call Times; Optimization
• Early Warning Fraud/Risk
• Anomaly Detection for Credit Granting
• Enterprise-Wide Direct Spend
• Category-Based Indirect Spend
• Spend Control
• Revenue Protection Sourcing
Finding Bust-Out Candidates Earlier, in a Sea of Faint Signals
• Bust-outs were responsible for $350MM+ in losses annually
• Key to loss prevention: more accurate identification and earlier detection (7 days’ advance in prediction could yield savings of $50MM annually for the client)
• Over 90% of bust-outs were identified too late in the process to stop fraud
• Therefore, it is both an Analytic and Business necessity to score accounts in near-real time
[Figure: Current Bust-out Detection Timeliness. Frequency distribution by days, x-axis from < -14 to > 5; regions: bust-out before detection, detection before bust-out; labeled value: 91%]
• Reduce bust-out losses through:
– Predicting bust-out accounts earlier
– Prioritizing predicted cases to increase manual review hit rate and total number of bust-outs detected
A Fortune 50 financial credit card issuer engaged Opera to transform its current approach and methodology in detecting bust-out fraud
CHALLENGE / BACKGROUND
Block and Tackling: Big Data Used to Build Predictive Model
Behavioral patterns indicative of bust-outs and credit abuse were compiled. Variables created to capture these components were utilized in a predictive model.
Transaction Activity
• Proportion of high-risk purchases (jewelry, gift cards, casinos)
• Frequency of whole-value transactions
• Use of convenience checks
• Use of balance transfers
• Transaction velocity
Payment Activity
• Proportion/frequency of payment reversals
• Payment amount
• Payment frequency
Nonmonetary Activity
• Number of trade lines (Bureau data)
• Credit Bureau Scores
• Frequency of payment status inquiry
• Frequency of credit line increase requests
APPROACH
• Customer activity patterns were monitored on a daily basis to identify patterns predictive of bust-outs
• A multitude of new metrics were defined and used in the detection algorithm:
– Transaction activity, e.g. proportion of high-risk purchases, transaction velocity, use of balance transfers (BT)
– Payment activity, e.g. payment frequency, payment amount, payment reversals
– Non-monetary activity, e.g. credit line (CL) increase requests, geographical location, status queries
• A new neural-net-based predictive model significantly improved detection accuracy and flagged accounts 5 days earlier
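As an illustration of how transaction-activity metrics like those above might be derived from raw transactions, here is a minimal Python sketch. The `Txn` record, the category labels, and the `bustout_features` helper are hypothetical names introduced for this example; the slides do not specify the actual variable definitions or the window length.

```python
from dataclasses import dataclass
from datetime import date

# Assumed category labels; the slides name jewelry, gift cards, and casinos as high-risk.
HIGH_RISK = {"jewelry", "gift_card", "casino"}


@dataclass
class Txn:
    day: date
    amount: float
    category: str


def bustout_features(txns, window_days):
    """Compute three illustrative transaction-activity metrics for one account."""
    n = len(txns)
    if n == 0:
        return {"high_risk_share": 0.0, "whole_value_share": 0.0, "velocity": 0.0}
    high_risk = sum(1 for t in txns if t.category in HIGH_RISK)
    # A "whole-value" transaction is taken here to mean an amount with no cents.
    whole_value = sum(1 for t in txns if t.amount == int(t.amount))
    return {
        "high_risk_share": high_risk / n,      # proportion of high-risk purchases
        "whole_value_share": whole_value / n,  # frequency of whole-value transactions
        "velocity": n / window_days,           # transactions per day in the window
    }


sample = [
    Txn(date(2012, 7, 1), 500.00, "jewelry"),
    Txn(date(2012, 7, 2), 37.42, "grocery"),
    Txn(date(2012, 7, 2), 1000.00, "gift_card"),
    Txn(date(2012, 7, 3), 12.99, "restaurant"),
]
features = bustout_features(sample, window_days=4)
```

Recomputing such metrics daily per account would give the kind of time series a downstream model could score; the real feature set is far larger than these three.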
[Figure: Model Lift Curve. x-axis: Population (0% to 100%); y-axis: Bust-out Capture Rate (0% to 100%); series: Random, Legacy Score, Logistic Model, Neural Network]
Impact             Old   New
Lead Time (days)   -     5
Action Rate (%)    7     25
The Neural Network framework yielded a vastly superior tool
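For readers unfamiliar with capture (lift) curves, the sketch below shows one common way to compute the points of such a curve: rank accounts by score, then measure what share of all bust-outs falls in the top x% of the population. The function name `capture_curve` and the toy scores are assumptions for illustration, not data from the engagement.

```python
def capture_curve(scores, labels, steps=10):
    """Return (population share, share of positives captured) at each step.

    scores: model scores, higher meaning more suspicious.
    labels: 1 for a confirmed bust-out, 0 otherwise.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank accounts from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    curve = []
    for k in range(1, steps + 1):
        cutoff = round(len(ranked) * k / steps)
        captured = sum(label for _, label in ranked[:cutoff])
        curve.append((k / steps, captured / total_pos))
    return curve


# Toy data: ten accounts, three of which are bust-outs.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
curve = capture_curve(toy_scores, toy_labels)
```

A random ranking would capture roughly x% of bust-outs in the top x% of accounts, so the further a model's curve sits above that diagonal, the better it prioritizes cases for review.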
RESULTS
A non-linear adaptive analytics approach was used for credit abuse detection to provide better predictive power and the ability to identify accounts earlier in the cycle
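[Figure: Score combination chart. Axes: Legacy Score and Bust-out Score; tick labels shown: 200, >940, 0, 980; regions: Legacy Score Residual Segment, Missing Legacy Score]
Segment Hit Rates (Segment Numbers 1 to 5; values as listed on the chart): 70-80%, 20-30%, 10-15%, 3-5%, 1-2%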
Combining the Model Score and the Legacy Score, the high scoring region was divided into 5 segments, each associated with a suggested treatment strategy. The overall fraud losses were reduced by more than $75MM
IMPLEMENTATION AND IMPACT / SCORE COMBINATION
• Five segments are created based on the joint use of the Bust-out Score (i.e. from the predictive model) and of the Legacy Score
• Segmented treatments are proposed based on the probability of an account being a bust-out:
– Segment 1: Block authorizations
– Segments 2 & 3: Float payments
– Segments 4 & 5: Manual review
• A non-linear approach to bust-out detection identified accounts earlier and re-prioritized higher-value cases; the overall loss was reduced by more than $75MM (of a base of $225MM)
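The segment-to-treatment rules and the stated loss figures can be written down directly. The sketch below is illustrative only: the names `TREATMENTS` and `treatment_for` are hypothetical, and the score boundaries that define each segment are not given in the slides, so they are deliberately left out.

```python
# Treatment per segment, as listed on the slide.
TREATMENTS = {
    1: "Block authorizations",
    2: "Float payments",
    3: "Float payments",
    4: "Manual review",
    5: "Manual review",
}


def treatment_for(segment):
    """Look up the suggested treatment for a segment number (1 to 5)."""
    if segment not in TREATMENTS:
        raise ValueError(f"unknown segment: {segment}")
    return TREATMENTS[segment]


# Loss figures quoted on the slide, in $MM.
BASE_LOSS_MM = 225
REDUCTION_MM = 75  # stated as "more than", so this is a lower bound

# Lower bound on the share of losses avoided: 75 / 225, i.e. one third.
reduction_share = REDUCTION_MM / BASE_LOSS_MM
```

Since the reduction is quoted as more than $75MM, the one-third figure should be read as a floor on the share of base losses avoided.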
Customer Interactions / Social Networks
• “I am not happy with your service”
• “Please bring back the salad”
• “I love this restaurant”
• “Thanks for the coupon :)”
Case Study 2: Plumbing Social Media for Voice of Public
Extracting and aggregating subjective information present in large chunks of social media text, revealing what consumers like, don’t like and prefer
We have developed a process for fast and accurate extraction of sentiment from community forum data
THE PROCESS
• Understand client business vocabulary
• Identify signals in data
• Prepare training data
• Train and run sentiment extractor
Sentiment Analysis
• Customer Perception
• Impact Evaluation
• Customer Reactions
• Brand Popularity Index
TEXT ANALYSIS IN ACTION
• Most popular brand
• Biggest complaint
• % Happy, % Unsatisfied, % Angry
• Most liked feature
• What’s Hated? Why Popular? What’s Preferred?
© 2012 Opera Solutions, LLC. All rights reserved.
Block & Tackling: OSTAP Providing Situational Awareness Needs
A 360 Degree Awareness View:
• Global, Multi-Language Threat and Informational Monitoring of the Blogosphere (Social Media) and Identified Radical Activity Sites: threat monitoring of open and certain private sources for negative sentiment in one or more threat categories, defined as Violent, Non-Violent, Proximity and Event Specific.
• An Unblinking Eye Focused On Threat Detection: 26 major social media portals (Facebook, Twitter, WordPress, etc.) combined with recognized news feeds, message boards and forums, for over 150 million information points continuously monitored, searched and assessed 365x24x7 across 56 languages in increments as little as every 30 minutes.
• Sophia Technology Drives the Linguistic Analytics Used within the “OSTAP platform”:
– Was built for and is in use within the three-letter-agency intelligence community
• Empowering Analysts’ Continuous Monitoring Efficiency and Ad-hoc Searching Capabilities through automated solutions and portal-based tools that can be incorporated within your workstations.
Machine Intelligence OSTAP: Threat Scenario Monitoring
Providing Actionable Intelligence - External & Internal Searching that Identified Future Threat Window Events Requiring An Internal Response
Situational Awareness – Automated E-mail Alerts
• Violent Threat Monitoring – Militant Orgs, Individual threats posted in Social Media
• Non-Violent Threat Monitoring – Civil Disobedience Groups, Geopolitical Unrest
• Ad-Hoc Negative Sentiment Monitoring of Future Company Related Events, Executives, Employees and Key Business Relationships (Shareholder Meetings, Employee Actions, Media Covered Events)
• Event Proximity Disruptions – General Public Events Within Proximity to company assets (G8, NATO Summit, London Olympics, May Day Celebration, etc.)
Automating Ad-Hoc Searching – Web-based Portal Searching
• Manual Continuous Threat Monitoring and searching for related links from Existing Vendor Supplied Information
• Conducts Internet searching from internally developed keywords and linked relationships
Prosthetic: OSTAP Threat Scenario Monitoring
Situational Awareness – Imminent Violent Threat Scenario Affecting US Company Overseas During Spanish EU Crisis Protests, Detected by OSTAP Before Event Occurred From Three Independent Info Sources

1. OSTAP Violent Threat Alert – High Severity
ID 69529184758
CONTENT Estoy bastante segura de que voy a poner una Bomba en el centro de salud de moratalaz
TRANSLATED_CONTENT I'm pretty sure I'll put a bomb in the health center Moratalaz
EXTERNAL_ID 223348160377012225
SOURCE <a href="http://blackberry.com
HOST twitter.com
URL 223348160377012225
MEDIA_PROVIDER TWITTER
MEDIA_TYPE_ID 8
LANGUAGE_ID es
SPAM_RATING 0
PUBLISH_DATE 2012-07-12 15:28:45 -0400
HARVEST_DATE 2012-07-12 15:34:33 -0400
TWITTER_LOCATION Madrid
TWITTER_FOLLOWER_COUNT 104
TWITTER_FRIEND_COUNT 143

2. OSTAP Violent Threat Alert – High Severity
ID 69174679421
CONTENT Mientras que 25.000 personas apoyan en Madrid la Marcha Negra, Company X especula con XXXXX ón en algovamal
TRANSLATED_CONTENT While 25,000 people supporting the Black March in Madrid, Company X speculated XXXXX algovamal

3. OSTAP Violent Threat Alert – High Severity
CONTENT Sucursal de Company X en Madrid: Calle de XXXXXX, 28006 Madrid. Una visitita de cortesía es de bien agradecidos.
TRANSLATED_CONTENT Branch of Company X XXXXX in Madrid: Calle de XXXXXX, 28006 Madrid. A complimentary visitita is grateful.
5:17 AM - 13 Jul 12

Marchers on Way To Madrid’s Financial District
1. The US Company accused in local press of putting protesting trade group employees out of work
2. Bomb Threat Tweet within Proximity of US Company Office
3. US Company’s Foreign Address Posted in “Request” for someone to pay a “Complimentary Visit”
Case Study 3: Wealth Management example
The Challenge: Massive data flow overwhelming human capacity
DAILY DATA FLOW
• Book of business data (transactions, positions, etc.); average book: 200+ accounts, 5,000 positions (hundreds unique)
• Equity ratings: 160k products to select from
• Research ideas: 250/day; 200,000+/year
• Peer activity
Multiplied by:
• 15,000 Advisors
• 3MM Accounts
• 1MM transactions/day
Data volume: 300 GB historical; 9 GB daily
• Limited use of intellectual capital
• High variability in performance
Block & Tackling: Processed every day within 4-hour window
DAILY DATA INPUT
• Generate Master Data (2 hours / 10TB)
• Extract DNA (<30 minutes / 50GB)
• Signal Generation (20 minutes / 10TB to 50GB)
• Scoring (20 minutes / 20GB)
• Output to User Interface (10GB dataset)
Parallel Processing Nodes
• 120+TB Hadoop Distributed File System (HDFS)
• 21 machines; 330+ CPU cores; 1TB RAM
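A quick arithmetic check shows how the quoted stage durations fit the 4-hour window. This sketch assumes the stages run strictly one after another and takes 30 minutes as the upper bound for the DNA extraction stage; the slide states neither, and the stage keys are names chosen for this example.

```python
# Stage durations in minutes, as quoted on the slide.
STAGE_MINUTES = {
    "generate_master_data": 120,  # 2 hours
    "extract_dna": 30,            # quoted as under 30 minutes; upper bound used
    "signal_generation": 20,
    "scoring": 20,
}

WINDOW_MINUTES = 4 * 60

# Assumption: stages run sequentially, so durations simply add up.
total_minutes = sum(STAGE_MINUTES.values())
headroom_minutes = WINDOW_MINUTES - total_minutes
```

Under these assumptions the four stages take at most 190 minutes, leaving about 50 minutes for output to the user interface and any retries.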
Machine Intelligence (1): FA Peer Group Clustering and Performance
Each FA’s performance is composed of various types of accounts that must be clustered into “like peers” prior to comparison
400+ Peer Groups
Built-in filters: Life Stage, Client Age, Discretion Status, Products, Others
Changes Over Time
Calculate performance across multiple time periods while considering changes in
• Objectives
• Discretion status, age, assets
• Account activity, open/closed date, household, etc.
Multiple Metrics
Calculate performance across multiple metrics:
• Net and Gross Return
• Sharpe Ratio
• Information Ratio
• Treynor Ratio
• Others
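The ratios named above have standard textbook definitions, sketched below in Python. This is a minimal per-period, non-annualized version using sample standard deviation; the function names and the toy return series are assumptions for illustration and do not reflect how the production system computes these metrics.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return divided by the volatility of excess returns."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)


def information_ratio(returns, benchmark):
    """Mean active return divided by tracking error (stdev of active returns)."""
    active = [r - b for r, b in zip(returns, benchmark)]
    return mean(active) / stdev(active)


def beta(returns, market):
    """Sensitivity to the market: cov(portfolio, market) / var(market)."""
    mr, mm = mean(returns), mean(market)
    cov = sum((r - mr) * (m - mm) for r, m in zip(returns, market))
    var = sum((m - mm) ** 2 for m in market)
    return cov / var


def treynor_ratio(returns, market, risk_free=0.0):
    """Mean excess return per unit of market risk (beta)."""
    return (mean(returns) - risk_free) / beta(returns, market)


# Toy per-period returns, invented for this example.
portfolio = [0.02, 0.01, 0.03, -0.01]
market = [0.01, 0.00, 0.02, -0.01]

metrics = {
    "sharpe": sharpe_ratio(portfolio),
    "information": information_ratio(portfolio, market),
    "beta": beta(portfolio, market),
    "treynor": treynor_ratio(portfolio, market),
}
```

Comparing advisors on these ratios is only meaningful within a peer group, which is why the clustering into “like peers” comes first.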
Measuring FA performance accurately requires complex data modeling and Big Data processing to
Machine Intelligence (2): From master dataset to refined Signal
Signal Processing
• Signals extracted every day (fast, slow, transient, persistent)
• 12MM+ Recos Created and Matched With Advisors and Accounts
SAMPLE ALGORITHMS
• K-means
• Neural Nets
• Singular Value Decomposition
• Kalman Filters
• Support Vector Machine
• Wavelets
Sentiment (positive/neutral/negative on:)
• Single-name
• Sector
• Markets
• Geography
Behavior
• Market direction, magnitude
• Propensity to transact
• Fading
• Advisors’ performance
Anomalies
• Tail risk detection
• Non-suitable investments
• Portfolio churn
• Abnormal transactions
Clusters
• Trading behavior
• Asset allocation
• Product
• Strategy popularity