(1)

NOTICE: Proprietary and Confidential

From Big Data to Small Data:
Using Opera’s Signal Hub to Derive Profitable Insights

Joe Milana, Global Head of Analytics

Opera Solutions, LLC
180 Maiden Lane, 17th Floor
New York, NY 10038
+1 (646) 437 2100 telephone
+1 (646) 437 2101 facsimile

Opera Solutions, LLC
12230 El Camino Real, Suite 330
San Diego, California 92130
+1 (858) 480 3750 telephone
+1 (858) 480 3727 facsimile

www.operasolutions.com

(2)

What we do

Make average people extraordinary

Block & Tackling + Machine Intelligence → prosthetic to the human mind

(3)

Turning Data into “Signals”

SIGNALS LIBRARY (signal categories, with EXAMPLES)

• External and Environmental
  – Broad-based indicators: demographic variables; macro-economic indicators
  – Learn what “excites” the customer: World Cup 2014 (Sao Paulo); Facebook comments on new mobile phone
• Purchase History: develop “memory” of each customer’s history
  – Data usage
  – Product & services purchases
• Product Affinity: discover products that “bond” together
  – Video on Demand
  – Smartphone apps
• Signals Groups: unearth common DNA markers among similar customers
  – Segment “9” has high Video on Demand propensity
  – Group 4 has higher games application usage
• Campaign: gauge and describe customer’s interaction
  – Customer’s e-campaign open rate over last 3 months
  – Top 3 click-rate campaigns
• Usage: other behavioural descriptors/triggers
  – Email responses
  – Web behaviour

Diagram labels: Signal Selector; Signal Generation; Static/Slow-Moving Signals; Fast-Moving/Rate of Change Signals; Customer State Signals
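The two Campaign examples on this slide (e-campaign open rate over the last 3 months, top 3 click-rate campaigns) can be sketched in a few lines of Python. This is only an illustration: the event layout, the campaign names, the 90-day reading of "3 months" and both function names are assumptions, not Opera's implementation.

```python
from datetime import date, timedelta

# Hypothetical e-campaign events for one customer:
# (campaign name, date sent, opened?, clicked?)
events = [
    ("world_cup", date(2014, 6, 12), True, True),
    ("world_cup", date(2014, 6, 20), True, False),
    ("new_phone", date(2014, 5, 1), False, False),
    ("vod_bundle", date(2014, 6, 1), True, True),
    ("apps_promo", date(2014, 1, 15), True, False),
]

def open_rate(events, as_of, window_days=90):
    """Share of e-campaigns sent within the window that were opened.

    "Last 3 months" is approximated as 90 days (an assumption).
    Returns None when nothing was sent in the window.
    """
    start = as_of - timedelta(days=window_days)
    recent = [e for e in events if start <= e[1] <= as_of]
    if not recent:
        return None
    return sum(1 for e in recent if e[2]) / len(recent)

def top_click_campaigns(events, n=3):
    """Campaign names ranked by click rate (clicks / sends), best first.

    Ties are broken alphabetically so the result is deterministic.
    """
    sends, clicks = {}, {}
    for name, _sent_on, _opened, clicked in events:
        sends[name] = sends.get(name, 0) + 1
        clicks[name] = clicks.get(name, 0) + int(clicked)
    ranked = sorted(sends, key=lambda c: (-clicks[c] / sends[c], c))
    return ranked[:n]

rate = open_rate(events, as_of=date(2014, 6, 30))
top3 = top_click_campaigns(events)
```

Values like these would be recomputed per customer as new events arrive.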

(4)

Signal Hubs: A Dynamic Collection of the Strongest Signals For a Specific Domain (e.g., Marketing, Spend, Risk...)

Domain-Specific SIGNAL HUB

• Signal library: Opera’s cumulative knowledge within and across verticals
• Client-specific signal creation: Based on client data and situation
• Continual signal refinement and generation: New, emerging signals based on ongoing monitoring

(5)

Key Definitions – Signal Hub

A production system that ingests multiple internal and external data sources, manages Signals, extracts Signal Values, manages and executes analytic services, and exposes all data and services via standard services interfaces.

[Diagram: Signal Hub architecture, with PRODUCTION and DISCOVERY sections and a Feedback Loop. Labels shown:]
• Data Sources: Internal Structured; Internal Unstructured; External Structured; External Unstructured
• Processing steps: Data Ingestion; Create / Calculate Variables; Signal Detection; Train/Adapt Models; Evaluate and Package “Best” Signals and Models
• Signal Hub components: Signals Library; Intelligent ETL (Connect, Extract, Transform, & Load); History; Context/Profile Storage; Signal Storage; Signal Promotion & Demotion; Signal Management
• Services: Simulation Services; Batch Services; Visualization & Control Services; Transaction Services
• Applications (Application Specific Models & Rules): Consumer Finance & Insurance; Retail; Marketing Services; Government; Healthcare; Procurement
• Other labels: SIP/AppGMS; SigGMS; Man + Machine Interfaces

(6)

Summary: Signal Hub Sample Applications

Marketing Actions
• Fading/Attrition Treatment
• Real-Time 360 Competitor View
• Individualized Customer Offers & Recommendations
• Price Elasticity By SKU

Operations & Finance Actions
• Liquidity & Margin Optimization
• Real Estate Optimization
• Real-Time FA Service Actions
• Technology Optimization

Industry-Specific Actions
• Healthcare Revenue Leakage
• Mobiuss™ Portfolio Valuation & Risk
• Automotive Pricing Optimizer
• Casino and Gaming Stimulation
• Government Threat Assessment

Fraud & Risk Actions
• Bust-Out & Compromise
• Collection Targeting & Prioritization; Best Call Times; Optimization
• Early Warning Fraud/Risk
• Anomaly Detection for Credit Granting

Spend & Sourcing Actions
• Enterprise-Wide Direct Spend
• Category-Based Indirect Spend
• Spend Control Revenue Protection
• Sourcing

(7)

Finding Bust-Out Candidates Earlier, in a Sea of Faint Signals

A Fortune 50 financial credit card issuer engaged Opera to transform its current approach and methodology in detecting Bust-out fraud

BACKGROUND

• Bust-outs were responsible for $350MM+ in losses annually
• Key to loss prevention: more accurate identification and earlier detection (7 days’ advance in prediction could yield savings of $50MM annually for the client)
• Over 90% of bust-outs were identified too late in the process to stop fraud
• Therefore, it is both an Analytic and Business necessity to score accounts in near-real time

[Figure: Current Bust-out Detection Timeliness. Frequency distribution over days, from < -14 to > 5, split into “Bust-out before detection” (91%) and “Detection before Bust-out”.]

CHALLENGE

• Reduce bust-out losses through:
  – Predicting bust-out accounts earlier
  – Prioritizing predicted cases to increase manual review hit rate and total number of Bust-outs detected

(8)

Block and Tackling: Big Data Used to Build Predictive Model

Behavioral patterns indicative of bust-outs and credit abuse were compiled. Variables created to capture these components were utilized in a predictive model

Transaction Activity
• Proportion of high-risk purchases (jewelry, gift cards, casinos)
• Frequency of whole-value transactions
• Use of convenience checks
• Use of balance transfers
• Transaction velocity

Payment Activity
• Proportion/frequency of payment reversals
• Payment amount
• Payment frequency

Nonmonetary Activity
• Number of trade lines (Bureau data)
• Credit Bureau Scores
• Frequency of payment status inquiry
• Frequency of credit line increase requests
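A few of the transaction-activity variables listed above can be sketched as simple ratios. The snippet below is a minimal illustration under stated assumptions: the transaction layout (amount in cents, merchant category), the category labels, and the function and key names are all hypothetical, and the slide does not say how the actual variables were defined.

```python
# High-risk categories named on the slide; the label spellings are assumptions.
HIGH_RISK = {"jewelry", "gift_cards", "casinos"}

def transaction_features(transactions, days):
    """Compute three illustrative variables for one account.

    transactions: list of (amount_cents, merchant_category) tuples
    days: length of the observation window in days
    """
    n = len(transactions)
    if n == 0 or days <= 0:
        return {"high_risk_share": 0.0, "whole_value_share": 0.0,
                "velocity_per_day": 0.0}
    high_risk = sum(1 for _amt, cat in transactions if cat in HIGH_RISK)
    # A "whole-value" transaction is read here as an amount with no cents.
    whole = sum(1 for amt, _cat in transactions if amt % 100 == 0)
    return {
        "high_risk_share": high_risk / n,   # proportion of high-risk purchases
        "whole_value_share": whole / n,     # frequency of whole-value transactions
        "velocity_per_day": n / days,       # transaction velocity
    }

sample = [(50000, "jewelry"), (1999, "grocery"),
          (20000, "gift_cards"), (4550, "fuel")]
features = transaction_features(sample, days=2)
```

Recomputing such ratios daily for every account yields the kind of inputs a scoring model consumes.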

(9)

A non-linear adaptive analytics approach was used for credit abuse detection to provide better predictive power and the ability to identify accounts earlier in the cycle

APPROACH

• Customer activity patterns were monitored on a daily basis to identify patterns predictive of Bust-outs
• A multitude of new metrics were defined and used in the detection algorithm:
  – Transaction activity, e.g. proportion of high-risk purchases, transaction velocity, use of balance transfers (BT)
  – Payment activity, e.g. payment frequency, payment amount, payment reversals
  – Non-monetary activity, e.g. credit line (CL) increase requests, geographical location, status queries
• A new neural-net-based predictive model significantly improved detection accuracy and identified bust-outs 5 days earlier

RESULTS

[Figure: Model Lift Curve. Bustout Capture Rate (0% to 100%) against Population Capture (0% to 100%), with curves for Random, Legacy Score, Logistic Model and Neural Network.]

Impact              Old   New
Lead Time (days)    -     5
Action Rate (%)     7     25

The Neural Network framework yielded a vastly superior tool
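A lift (capture) curve of the kind shown on this slide answers one question: if accounts are reviewed from the highest score down, what share of all bust-outs falls in the top x% of the population? The sketch below computes a few points on such a curve. The scores and labels are invented for illustration, and `capture_curve` is a hypothetical name.

```python
def capture_curve(scores, labels, depths=(0.1, 0.2, 0.5, 1.0)):
    """Bust-out capture rate at several population depths.

    scores: model scores, higher meaning more suspicious
    labels: 1 if the account turned out to be a bust-out, else 0
    depths: fractions of the population reviewed, highest scores first
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_bustouts = sum(labels)
    curve = {}
    for depth in depths:
        k = round(depth * len(ranked))          # number of accounts reviewed
        caught = sum(label for _score, label in ranked[:k])
        curve[depth] = caught / total_bustouts if total_bustouts else 0.0
    return curve

# Ten made-up accounts, four of which are bust-outs.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
curve = capture_curve(scores, labels)
```

A random ranking captures about x% of bust-outs at depth x%, so a better model's curve sits further above that diagonal.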

(10)

Combining the Model Score and the Legacy Score, the high scoring region was divided into 5 segments, each associated with a suggested treatment strategy. The overall fraud losses were reduced by more than $75MM

SCORE COMBINATION

[Figure: Bust-out Score against Legacy Score, with the high scoring region split into segments numbered 1 to 5, plus a Residual Segment and a Missing Legacy Score region. Axis tick values shown: 200, >940, 0, 980.]

Segment Hit Rates (in the order shown on the figure): 70-80%, 20-30%, 10-15%, 3-5%, 1-2%

IMPLEMENTATION AND IMPACT

• Five segments are created based on the joint use of the Bust-out Score (i.e. from the predictive model) and of the Legacy score
• Segmented treatments are proposed based on the probability of an account being a bust-out:
  – Segment 1: Block authorizations
  – Segments 2 & 3: Float payments
  – Segments 4 & 5: Manual review
• A non-linear approach to bust-out detection identified accounts earlier and re-prioritized higher-value cases – the overall loss was reduced by more than $75MM (of a base of $225MM)
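The segment-to-treatment mapping and the headline loss figure can be written down directly. In the sketch below the three treatments come from the slide, while the fallback for accounts outside the five segments and the function name are assumptions. The score thresholds that define each segment are not given in the deck, so they are not modeled.

```python
# Treatments per segment, as listed on the slide.
TREATMENTS = {
    1: "Block authorizations",
    2: "Float payments",
    3: "Float payments",
    4: "Manual review",
    5: "Manual review",
}

def treatment_for(segment):
    """Suggested treatment for a segment number.

    The fallback string for accounts outside segments 1 to 5 is an
    assumption; the slide does not describe that case.
    """
    return TREATMENTS.get(segment, "No segment treatment")

# Loss reduction as a share of the stated base: more than $75MM of $225MM.
loss_reduction_share = 75 / 225   # at least one third of the base
```

On the figures quoted, the reduction is at least a third of the $225MM base.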

(11)

Case Study 2: Plumbing Social Media for Voice of Public

Extracting and aggregating subjective information present in large chunks of social media text, revealing what consumers like, don’t like and prefer

We have developed a process for fast and accurate extraction of sentiment from community forum data

Inputs: Customer Interactions; Social Networks
• “I am not happy with your service”
• “Please bring back the salad”
• “I love this restaurant”
• “Thanks for the coupon :)”

THE PROCESS
• UNDERSTAND CLIENT BUSINESS VOCABULARY
• IDENTIFY SIGNALS IN DATA
• PREPARE TRAINING DATA
• TRAIN AND RUN SENTIMENT EXTRACTOR

Sentiment Analysis
• Customer Perception
• Impact Evaluation
• Customer Reactions
• Brand Popularity Index

TEXT ANALYSIS IN ACTION
• Most popular brand
• Biggest complaint
• % Happy; % Unsatisfied; % Angry
• Most liked feature
• What’s Hated?
• Why Popular?
• What’s Preferred?

(12)

© 2012 Opera Solutions, LLC. All rights reserved.

Block & Tackling: OSTAP Providing Situational Awareness Needs

A 360 Degree Awareness View:

Global, Multi-Language Threat and Informational Monitoring of the Blogosphere (Social Media) and Identified Radical Activity Sites – Threat monitoring of Open and certain Private sources for negative sentiment in one or more threat categories. The categories are Violent, Non-Violent, Proximity and Event Specific.

An Unblinking Eye Focused On Threat Detection - 26 Major Social Media portals (Facebook, Twitter, WordPress, etc.) combined with Recognized News Feeds, Message Boards and Forums, for over 150 million information points continuously monitored, searched and assessed 365x24x7 across 56 languages, in increments as little as every 30 minutes.

• Sophia Technology Drives the Linguistic Analytics Used within the “OSTAP platform”
• Was built for, and is in use within, the three-letter-agency Intelligence community

Empowering Analysts’ Continuous Monitoring Efficiency and Ad-hoc Searching Capabilities through automated solutions and portal-based tools that can be incorporated within your workstations.

(13)

Machine Intelligence OSTAP: Threat Scenario Monitoring

Providing Actionable Intelligence - External & Internal Searching that Identified Future Threat Window Events Requiring An Internal Response

Situational Awareness – Automated E-mail Alerts
• Violent Threat Monitoring – Militant Orgs, Individual threats posted in Social Media
• Non-Violent Threat Monitoring – Civil Disobedience Groups, Geopolitical Unrest
• Ad-Hoc Negative Sentiment Monitoring of Future company Related Events, Executives, Employees and Key Business relationships – (Shareholder Meetings, Employee Actions, Media Covered Events)
• Event Proximity Disruptions – General Public Events Within Proximity to company assets (G8, NATO Summit, London Olympics, Mayday Celebration, etc.)

Automating Ad-Hoc Searching – Web-based Portal Searching
• Manual Continuous Threat Monitoring and searching for related links from Existing Vendor Supplied Information
• Conducts Internet searching from internally developed keywords and linked relationships

(14)


Prosthetic: OSTAP Threat Scenario Monitoring

Situational Awareness – Imminent Violent Threat Scenario Affecting US Company Overseas During Spanish EU Crisis Protests Detected by OSTAP before Event Occurred From Three Independent Info Sources

1. OSTAP Violent Threat Alert – High Severity

ID 69529184758
CONTENT Estoy bastante segura de que voy a poner una Bomba en el centro de salud de moratalaz
TRANSLATED_CONTENT I'm pretty sure I'll put a bomb in the health center Moratalaz
EXTERNAL_ID 223348160377012225
SOURCE <a href="http://blackberry.com
HOST twitter.com
URL 223348160377012225
MEDIA_PROVIDER TWITTER
MEDIA_TYPE_ID 8
LANGUAGE_ID es
SPAM_RATING 0
PUBLISH_DATE 2012-07-12 15:28:45 -0400
HARVEST_DATE 2012-07-12 15:34:33 -0400
TWITTER_LOCATION Madrid
TWITTER_FOLLOWER_COUNT 104
TWITTER_FRIEND_COUNT 143

2. OSTAP Violent Threat Alert – High Severity

ID 69174679421
CONTENT Mientras que 25.000 personas apoyan en Madrid la Marcha Negra, Company X especula con XXXXX ón en algovamal
TRANSLATED_CONTENT While 25,000 people supporting the Black March in Madrid, Company X speculated XXXXX algovamal

3. OSTAP Violent Threat Alert – High Severity

Sucursal de Company X en Madrid: Calle de XXXXXX, 28006 Madrid. Una visitita de cortesía es de bien agradecidos.
Translated: Branch of Company X XXXXX in Madrid: Calle de XXXXXX, 28006 Madrid. A complimentary visitita is grateful.
5:17 AM - 13 Jul 12

[Photo: Marchers on Way To Madrid’s Financial District]

The three sources:
1. The US Company accused in local press of putting protesting trade group employees out of work
2. Bomb Threat Tweet within Proximity of US Company Office
3. US Company’s Foreign Address Posted in “Request” for someone to pay a “Complimentary Visit”

(15)

Case Study 3: Wealth Management example

The Challenge: Massive data flow overwhelming human capacity

DAILY DATA FLOW
• Book of business data (transactions, positions, etc.). Average book:
  – 200+ accounts
  – 5,000 positions, hundreds unique
• Equity ratings
• 160k products to select from
• Research ideas: 250/day; 200,000+/year
• Peer activity

Multiplied by:
• 15,000 Advisors
• 3MM Accounts
• 1MM transactions/day

Result:
• Limited use of intellectual capital
• High variability in performance

(16)

Block & Tackling: Processed every day within 4-hour window

DAILY DATA INPUT
• 300 GB historical
• 9 GB daily

Processing stages (duration / data volume)
• Generate Master Data (2 hours / 10TB)
• Extract DNA (<30 minutes / 50GB)
• Signal Generation (20 minutes / 10TBs to 50GBs)
• Scoring (20 minutes / 20GB)
• Output to User Interface (10GB dataset)

Parallel Processing Nodes
• 120+TB Hadoop Distributed File System (HDFS)
• 21 machines; 330+ CPU cores; 1TB RAM
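As a quick sanity check, the stage timings on this slide can be added up against the 4-hour window. The sketch assumes the stages run one after another (the slide does not say so) and treats "<30 minutes" as an upper bound of 30. The throughput figures further assume decimal units and an even spread of work across the 21 machines.

```python
# Stage durations in minutes, as quoted on the slide.
stage_minutes = {
    "Generate Master Data": 120,
    "Extract DNA": 30,          # slide says "<30 minutes"; upper bound used
    "Signal Generation": 20,
    "Scoring": 20,
}

WINDOW_MINUTES = 4 * 60

# Assumption: stages run sequentially, so durations simply add.
total_minutes = sum(stage_minutes.values())
slack_minutes = WINDOW_MINUTES - total_minutes
fits_window = total_minutes <= WINDOW_MINUTES

# Rough throughput of the heaviest stage: 10 TB in 2 hours.
# Assumptions: 1 TB = 1000 GB, work spread evenly over 21 machines.
gb_per_second = 10 * 1000 / (2 * 60 * 60)
mb_per_second_per_machine = gb_per_second * 1000 / 21
```

Under those assumptions the four timed stages take at most 190 minutes, leaving about 50 minutes of slack, and the master-data stage needs about 1.4 GB/s in aggregate, or roughly 66 MB/s per machine.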

(17)

Machine Intelligence (1): FA Peer Group Clustering and Performance

Each FA’s performance is composed of various types of accounts that must be clustered into “like peers” prior to comparison

400+ Peer Groups
• Built-in filters: Life Stage; Client Age; Discretion Status; Products; Others

Changes Over Time
Calculate performance across multiple time periods while considering changes in
• Objectives
• Discretion status, age, assets
• Account activity, open/closed date, household, etc.

Multiple Metrics
Calculate performance across multiple metrics:
• Net and Gross Return
• Sharpe Ratio
• Information Ratio
• Treynor Ratio
• Others

Measuring FA performance accurately requires complex data modeling and Big Data processing to
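The three named ratios have standard textbook definitions, sketched below on per-period returns. This is a minimal illustration, not the deck's implementation: the function names and sample returns are made up, nothing is annualized, and a real system would compute these per peer group and time period.

```python
from statistics import mean, stdev

def sharpe_ratio(returns, risk_free=0.0):
    """Mean excess return over the risk-free rate, per unit of its volatility."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess)

def information_ratio(returns, benchmark):
    """Mean active return over a benchmark, per unit of tracking error."""
    active = [r - b for r, b in zip(returns, benchmark)]
    return mean(active) / stdev(active)

def beta(returns, market):
    """Sensitivity to the market: cov(portfolio, market) / var(market)."""
    mean_r, mean_m = mean(returns), mean(market)
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(returns, market))
    var = sum((m - mean_m) ** 2 for m in market)
    return cov / var

def treynor_ratio(returns, market, risk_free=0.0):
    """Mean excess return per unit of market risk (beta)."""
    return (mean(returns) - risk_free) / beta(returns, market)

# Made-up per-period returns for one book of business and for the market.
book = [0.02, 0.04, 0.06]
market = [0.01, 0.02, 0.03]

sharpe = sharpe_ratio(book, risk_free=0.01)
info = information_ratio(book, market)
treynor = treynor_ratio(book, market, risk_free=0.01)
```

These helpers raise an error when the returns, active returns or market returns are constant, so production code would need to guard those cases.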

(18)

Machine Intelligence (2): From master dataset to refined Signal

Signal Processing
• Signals extracted every day (fast, slow, transient, persistent)
• 12MM+ Recos Created and Matched With Advisors and Accounts

SAMPLE ALGORITHMS
• K-means
• Neural Nets
• Singular Value Decomposition
• Kalman Filters
• Support Vector Machine
• Wavelets

Signal types
• Sentiment: positive/neutral/negative on single-name, sector, markets, geography
• Behavior: market direction, magnitude; propensity to transact; fading; advisors’ performance
• Anomalies: tail risk detection; non-suitable investments; portfolio churn; abnormal transactions
• Clusters: trading behavior; asset allocation; product; strategy popularity

(19)

Prosthetic: Rationale for each recommended action for a customer

Machine-Driven Recommendations Based on Signals

Driving Portfolio Performance Lift
