(1)

Big Data and Predictive Analytics:

Fiserv Data Mining Case Study

[CON8631]

Data Warehouse and Big Data

Miguel Barrera - Director, Risk Analytics, Fiserv, Inc. Julia Minkowski - Risk Manager, Fiserv, Inc.

Charlie Berger, MS Eng, MBA, Sr. Director

Product Management, Data Mining and Advanced Analytics

(2)

Agenda


1. Oracle Advanced Analytics Overview

• Charlie Berger, Sr. Director Product Management,

Data Mining and Advanced Analytics, Oracle Corporation

2. Fiserv Data Mining Case Study

• Miguel Barrera - Director, Risk Analytics, Fiserv, Inc.

• Julia Minkowski - Risk Manager, Fiserv, Inc.

(3)

Planning for Future

Growth of Data Exponentially Greater than Growth of Data Analysts!

Conclusion

– Data Analysis platforms need to be

• Extremely Easy to Learn, yet..

• Extremely Powerful and

(4)

Oracle Advanced Analytics Database Evolution

Analytical SQL in the Database

Timeline: 1998, 1999, 2002, 2004, 2005, 2008, 2011, 2014

• 7 Data Mining “Partners”

• Oracle acquires Thinking Machines Corp.’s dev. team + “Darwin” data mining software

• Oracle Data Mining 9.2i launched: 2 algorithms (NB and AR) via Java API

• Oracle Data Mining 10g & 10gR2 introduces SQL dm functions, 7 new SQL dm algorithms and new Oracle Data Miner “Classic” wizards-driven GUI

• ODM 11g & 11gR2 adds AutoDataPrep (ADP), text mining, perf. improvements

• SQLDEV/Oracle Data Miner 3.2 “work flow” GUI launched

• Integration with “R” and introduction/addition of Oracle R Enterprise

• Product renamed “Oracle Advanced Analytics” (ODM + ORE)

• New algorithms (EM, PCA, SVD)

• Predictive Queries

• SQLDEV/Oracle Data Miner 4.1: SQL script generation, JSON Query node & SQL Query node (R integration)

• OAA/ORE 1.3 + 1.4 adds NN, Stepwise, scalable R algorithms

• Oracle Adv. Analytics for Hadoop Connector launched with parallel implementations of R algorithms (NN, LM, NMF)

(5)

Oracle Advanced Analytics Database Option

• Development has spent 15 years “stem-celling analytics” and workhorse machine learning algorithms into Oracle Database

– 1999: Acquisition of Thinking Machines’ “Darwin” data mining software

– Darwin was retired; instead, Oracle developed new in-database implementations of popular and cutting-edge data mining algorithms within the SQL kernel, leveraging the strengths of the database

• counting, conditional probabilities, sort, rank, partition, group-by, collections, etc.

Today, in 12c, the Database has become an “Analytical Database”

– Nearly 20 cutting edge machine learning algorithms and 50+ statistical functions implemented as true SQL functions

• When building models, leverage existing scalable technology (e.g., parallel execution, bitmap indexes, aggregation techniques) and add new core database technology (e.g., recursion within the parallel infrastructure, IEEE float, etc.)

• A data mining model is a schema object in the database, built via a PL/SQL API and scored via built-in SQL functions.

– True power is evident when scoring models using built-in SQL functions e.g. Exadata “smart scan” scoring
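
The in-database scoring idea can be sketched with Python's built-in sqlite3 module as a stand-in for Oracle Database. This is not the Oracle Advanced Analytics API: the transfers table, the fraud_score function and the coefficient values are hypothetical, chosen only to show a model applied through a SQL function so that rows are scored where they are stored.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients (illustrative values only).
INTERCEPT = -4.0
COEF_AMOUNT = 0.002     # weight per dollar transferred
COEF_NEW_PAYEE = 1.5    # weight when the payee has not been seen before


def fraud_score(amount, new_payee):
    """Return a probability-like score from the hypothetical model."""
    z = INTERCEPT + COEF_AMOUNT * amount + COEF_NEW_PAYEE * new_payee
    return 1.0 / (1.0 + math.exp(-z))


def score_in_database(rows):
    """Load rows into an in-memory table and score them with a SQL function."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transfers (id INTEGER PRIMARY KEY, amount REAL, new_payee INTEGER)"
    )
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?)", rows)
    # Register the model as a SQL-callable function (SQLite stand-in for a built-in scorer).
    conn.create_function("fraud_score", 2, fraud_score)
    cursor = conn.execute(
        "SELECT id, fraud_score(amount, new_payee) AS p FROM transfers ORDER BY p DESC"
    )
    scored = cursor.fetchall()
    conn.close()
    return scored


if __name__ == "__main__":
    for transfer_id, probability in score_in_database(
        [(1, 50.0, 0), (2, 2000.0, 1), (3, 500.0, 1)]
    ):
        print(transfer_id, round(probability, 4))
```

The point of the design is that the scoring call sits inside the SELECT, so ranking, filtering and joining on the score happen in the same query as the data access.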

(6)

Data remains in the Database

• Scalable, parallel Data Mining algorithms in SQL kernel

• Fast parallelized native SQL data mining functions, SQL data preparation and efficient execution of R open-source packages

• High-performance parallel scoring of SQL data mining functions and R open-source models

Fastest way to deliver enterprise-wide predictive analytics

• Integrated GUI for Predictive Analytics

• Database scoring engine

Lowest TCO

• Eliminate data duplication

• Eliminate separate analytical servers

• Leverage investment in Oracle IT

Oracle Advanced Analytics

Performance and Scalability with Low Total Cost of Ownership

Figure: Traditional analytics (data extraction, data prep & transformation, data mining model building, data mining model “scoring”) takes hours, days or weeks. Oracle Advanced Analytics (embedded data prep, model building, model “scoring”) takes secs, mins or hours. The difference is labeled as savings.

(7)

OBIEE

Oracle Database Enterprise Edition

Oracle Advanced Analytics Database Architecture

Component of Oracle Database—SQL Functions

Oracle Advanced Analytics

Native SQL Data Mining/Analytic Functions + High-performance R Integration for Scalable, Distributed, Parallel Execution

(8)

• In-database data mining algorithms and open source R algorithms

• SQL, PL/SQL, R languages

• Scalable, parallel in-database execution

• Workflow GUI and IDEs

• Integrated component of Database

• Enables enterprise analytical applications

Key Features

Oracle Advanced Analytics Database Option

(9)

Be Specific in Problem Statement

Poorly Defined Better Data Mining Technique

Predict employees that leave

• Based on past employees that voluntarily left:

• Create new attribute EmplTurnover → 0/1

Predict customers that churn

• Based on past customers that have churned:

• Create new attribute Churn → YES/NO

Target “best” customers

• Recency, Frequency, Monetary (RFM) analysis

• Specific dollar amount over time window:

• Who has spent $500+ in most recent 18 months

How can I make more $$?

• What helps me sell soft drinks & coffee?

• Which customers are likely to buy?

• How much is each customer likely to spend?

• Who are my “best customers”?

• What descriptive “rules” describe “best customers”?

How can I combat fraud?

• Which transactions are the most anomalous?
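
Turning “target best customers” into a concrete 0/1 attribute can be sketched as follows. The function and field names are hypothetical, and the month arithmetic clamps the day to 28 as a simplifying assumption; only the $500 threshold and the 18-month window come from the slide.

```python
from collections import defaultdict
from datetime import date


def months_back(day, months):
    """Shift a date back by whole months (day clamped to 28 to stay valid)."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(day.day, 28))


def best_customer_flags(purchases, as_of, threshold=500.0, window_months=18):
    """Return {customer: 1 or 0}; 1 means spend >= threshold inside the window.

    purchases is an iterable of (customer_id, purchase_date, amount) tuples.
    """
    cutoff = months_back(as_of, window_months)
    spend = defaultdict(float)
    for customer, when, amount in purchases:
        # Purchases outside the window still register the customer, with zero spend.
        spend[customer] += amount if cutoff <= when <= as_of else 0.0
    return {customer: int(total >= threshold) for customer, total in spend.items()}


if __name__ == "__main__":
    history = [
        ("A", date(2014, 1, 15), 300.0),
        ("A", date(2013, 6, 1), 250.0),
        ("B", date(2012, 12, 1), 900.0),
        ("B", date(2014, 5, 5), 100.0),
    ]
    print(best_customer_flags(history, as_of=date(2014, 9, 30)))
```

The resulting flag is the kind of explicit target column a classification model needs; the vague goal alone gives a model nothing to learn from.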

(10)

More Data Variety—Better Predictive Models

Increasing sources of relevant data can boost model accuracy

Chart: responders (0% to 100%) vs. population size, comparing a naïve guess or random selection with:

• Model with 20 variables

• Model with 75 variables

• Model with 250 variables

• Model with “Big Data” and hundreds to thousands of input variables including:

– Demographic data

– Purchase POS transactional data

– “Unstructured data”, text & comments

– Spatial location data

– Long term vs. recent historical behavior

– Web visits

– Sensor data

– etc.
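
A chart of responders against population size is commonly built as a cumulative gains curve. The sketch below shows one way to compute such a curve from model scores; it is an illustration with hypothetical names and data, not the method used to produce the slide.

```python
def cumulative_gains(scores, labels):
    """Return (population fraction, responder fraction) pairs, best scores first.

    scores: model scores, higher means more likely to respond.
    labels: 1 for an actual responder, 0 otherwise.
    """
    total_responders = sum(labels)
    if total_responders == 0:
        raise ValueError("need at least one responder")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    curve = []
    captured = 0
    for position, (_, label) in enumerate(ranked, start=1):
        captured += label
        curve.append((position / len(ranked), captured / total_responders))
    return curve


if __name__ == "__main__":
    for population, responders in cumulative_gains([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]):
        print(f"{population:.0%} of population -> {responders:.0%} of responders")
```

A random selection follows the diagonal (x% of the population captures about x% of responders), so a better model shows up as a curve that rises faster.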

(11)

Predicting Behavior

Identify “Likely Behavior” and their Profiles

Consider:

• Demographics

• Past purchases

• Recent purchases

(12)

Oracle Big Data Management System

Diagram labels:

• Sources

• Big Data Appliance

• Cloudera Hadoop

• Oracle NoSQL Database

• Oracle R Advanced Analytics for Hadoop

• Oracle R Distribution

• Oracle Exadata

• Oracle Database

• Oracle Industry Models

• Oracle Advanced Security

• Oracle Advanced Analytics

• Oracle Spatial & Graph

• Oracle Big Data Connectors

• Oracle Data Integrator

Oracle Big Data SQL

select cust_id from customers

where region = 'US'

(13)

• Strengths

– Powerful & Extensible

– Graphical & Extensive statistics

– Free—open source

• Challenges

– Memory constrained

– Single threaded

– Outer loop—slows down process

– Not industrial strength

R environment

R—Widely Popular

(14)

Oracle Advanced Analytics

• R-SQL Transparency Framework intercepts R functions for scalable in-database execution

• Function intercept for data transforms, statistical functions and advanced analytics

• Interactive display of graphical results and flow control as in standard R

• Submit entire R scripts for execution by database

• Scale to large datasets

• Access tables, views, and external tables, as well as data through DB LINKS

• Leverage database SQL parallelism

• Leverage new and existing in-database statistical and data mining capabilities

Diagram: R (open source) running in three places

1. User R Engine on desktop (R Engine, Oracle R Enterprise packages, other R packages)

2. Database Compute Engine (Oracle Database: user tables, SQL, results)

3. R Engine(s) spawned by Oracle DB (R Engine, Oracle R Enterprise packages, other R packages; R results)

• Database can spawn multiple R engines for database-managed parallelism

• Efficient data transfer to spawned R engines

• Emulate map-reduce style algorithms and applications

• Enables “lights-out” execution of R scripts

(15)

Integrated Business Intelligence

Enhance Dashboards with Predictions and Data Mining Insights

• In-database predictive models “mine” customer data and predict their behavior

• OBIEE’s integrated spatial mapping shows location

• All OAA results and predictions available in Database via OBIEE Admin to enhance dashboards

Oracle Advanced Analytics data mining results available to Oracle BI EE

Oracle BI EE defines results for end user presentation

(16)

• Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics

• OAA’s clustering and predictions available in-DB for OBIEE

• Automatic Customer Segmentation, Churn Predictions, and Sentiment Analysis

Pre-Built Predictive Models

Oracle Communications Industry Data Model

(17)

• Oracle Advanced Analytics factory-installed predictive analytics

• Employees likely to leave and predicted performance

• Top reasons, expected behavior

• Real-time "What if?" analysis

Fusion Human Capital Management Powered by OAA

Fusion HCM Predictive Workforce

(18)

Oracle Advanced Analytics Database Option

Oracle Data Miner/SQLDEV 4.1

(for Oracle Database 11g and 12c)

New Graph node (box, scatter, bar, histograms)

SQL Query node + integration of R scripts

Automatic SQL script generation for deployment

JSON Query node (added in 4.1)

Oracle Advanced Analytics 12c features exposed in Oracle Data Miner

– New SQL data mining algorithms/enhancements

• Expectation Maximization clustering algorithm

• PCA & Singular Value Decomposition algorithms

• Improved/automated Text Mining, Prediction Details and other algorithm improvements

– Predictive SQL Queries—automatic build, apply within SQL query

(19)

12c New Features

Predictive Queries

Immediate build/apply of ODM models in SQL query

• Classification & regression

– Multi-target (nested) problems

• Clustering query

• Anomaly query

• Feature extraction query

New Server Functionality

OAA automatically creates multiple anomaly detection models “Grouped_By” and “scores” by partition via powerful SQL query
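
The idea of building one anomaly model per group and scoring each row against its own group can be illustrated in plain Python. This sketch swaps in a simple per-group z-score rule; it is not the algorithm Oracle uses for predictive queries, and the record layout and threshold are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def anomalies_by_group(records, z_threshold=2.0):
    """Flag values that sit far from their own group's mean.

    records: iterable of (group, record_id, value) tuples.
    Returns a list of (group, record_id, z_score) for flagged records.
    """
    groups = defaultdict(list)
    for group, record_id, value in records:
        groups[group].append((record_id, value))

    flagged = []
    for group, items in groups.items():
        values = [value for _, value in items]
        center, spread = mean(values), pstdev(values)
        if spread == 0:
            continue  # every value identical: nothing stands out in this group
        for record_id, value in items:
            z_score = (value - center) / spread
            if abs(z_score) > z_threshold:
                flagged.append((group, record_id, round(z_score, 2)))
    return flagged


if __name__ == "__main__":
    east = [("east", i + 1, v) for i, v in enumerate([10, 11, 9, 10, 10, 11, 9, 50])]
    west = [("west", i + 1, v) for i, v in enumerate([100, 102, 98, 101, 99])]
    print(anomalies_by_group(east + west))
```

Scoring within each partition matters because a value of 50 is unusual in the first group but would be unremarkable if all rows were pooled with the second group.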


(20)

OAA/ORACLE DATA MINER

QUICK 2 MINUTE DEMO

(21)

Take a Test Drive!

Vlamis Software, Oracle Partner Offers FREE Test Drives on the Amazon Cloud

Step 1—Fill out request

– Go to http://www.vlamis.com/testdrive-registration/

Step 2—Connect

– Connect with Remote Desktop

Step 3—Start Test Drive!

– Oracle Database +

– Oracle Advanced Analytics Option

– SQL Developer/Oracle Data Miner GUI

– Demo data for learning

(22)

New book on Oracle Advanced Analytics available

Book available on Amazon

Predictive Analytics Using Oracle Data Miner: Develop for ODM in SQL &

(23)

OAA Links and Resources

Oracle Advanced Analytics Overview:

– Link to presentation—Big Data Analytics using Oracle Advanced Analytics In-Database Option

– OAA data sheet on OTN

Oracle Internal OAA Product Management Wiki and Workspace

YouTube recorded OAA Presentations and Demos:

– Oracle Advanced Analytics and Data Mining at the YouTube Movies (6 + OAA “live” Demos on ODM’r 4.0 New Features, Retail, Fraud, Loyalty, Overview, etc.)

Getting Started:

– Link to Getting Started w/ ODM blog entry

– Link to New OAA/Oracle Data Mining 2-Day Instructor Led Oracle University course.

– Link to OAA/Oracle Data Mining 4.0 Oracle by Examples (free) Tutorials on OTN

– Take a Free Test Drive of Oracle Advanced Analytics (Oracle Data Miner GUI) on the Amazon Cloud

– Link to SQL Developer Days Virtual Event w/ downloadable VM of Oracle Database + ODM/ODMr and e-training for Hands on Labs

– Link to OAA/Oracle R Enterprise (free) Tutorial Series on OTN

Additional Resources:

– Oracle Advanced Analytics Option on OTN page

– OAA/Oracle Data Mining on OTN page, ODM Documentation & ODM Blog

(24)

BIWA Summit

January 27-29, 2015

Oracle HQ Conference Center

(25)

Faster than a Mouse:

Turn Data Mining Strategy into Action

Miguel Barrera, Director of Risk Analytics, Fiserv Inc. Julia Minkowski, Risk Manager, Fiserv Inc.

(26)

Use Case : Fraud Prevention in Online Payments

Best Practices : Turning Data Mining Strategy into Action

Agenda

(27)

Use Case : Fraud Prevention in

Online Payments

(28)

Risk Analytics @ Fiserv Electronic Payments

• We prevent $200M in losses every year using data to monitor, understand and anticipate fraud

• We manage risk for $24BB in transfers, servicing 2,000+ US financial institutions, including 5 of the top 10 banks

• A department of 5 people, we operated in start-up mode until we were acquired in 2011 by Fiserv

• We build our risk models, supervise their installation & develop the next generation of strategies for risk mitigation

(29)

What is special about Fraud Prevention?

Fraud is performed by organized criminal groups using sophisticated technologies and logistics

Hard to detect: target has low frequency (2 in 10,000)

The cost of mistakes is very high

• $ losses if you fail to detect fraud

• 60% increase in customer attrition if you misclassify

The environment changes fast, so you need to adapt
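
A short worked calculation shows why a 2-in-10,000 target is hard. The base rate comes from the slide; the 90% detection rate and 1% false positive rate are assumed values for illustration only.

```python
def alert_precision(base_rate, detection_rate, false_positive_rate):
    """Share of alerts that are real fraud, by Bayes' rule."""
    true_alerts = base_rate * detection_rate
    false_alerts = (1.0 - base_rate) * false_positive_rate
    return true_alerts / (true_alerts + false_alerts)


if __name__ == "__main__":
    # Base rate from the slide; the other two rates are assumptions.
    precision = alert_precision(2 / 10_000, detection_rate=0.90, false_positive_rate=0.01)
    print(f"{precision:.1%} of alerts are real fraud")
```

Under these assumptions fewer than 2 in 100 alerts are real fraud, which is why both the missed-fraud losses and the cost of bothering good customers have to be weighed together.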

Fraud prevention is a great field for the application of predictive

(30)

Analytics for Fraud Prevention

• Explore & Understand

• Anticipate & Control

• Monitor

(31)

Risk Management: Goals and Constraints

Constraints:

• Build a flexible system that adapts to new fraud patterns

• Service the existing client base

• Minimize the time that the production systems will be off-line or reset

• Build the next generation of strategies with very limited resources

Goals:

• Help the business expand to more profitable markets (on-line booking, real-time payments), while keeping loss rates constant and customers happy

(32)

Data Miner Survey 2013 by Rexer Analytics

While 6 out of 10 data miners report that the data is available for analysis within days of capture, deploying the models takes substantially longer. For 60% of the respondents, deployment time ranges between 3 weeks and 1 year.

(33)

The problem we faced…

We had the Algorithms (SAS + Angoss)

• Decision Trees + Gradient Boosting

• GLM and Logistic Regression

• SVM

• Bayesian Estimates

But implementation took too long…

• 3 months turn-around to estimate + deploy Logistic Regression (2008)

• 1 month to estimate and deploy Trees and GLM (2010)

(34)

In Fraud-Mitigation Speed is the Key

(35)

Why we liked Oracle Advanced Analytics?

Accuracy

• The algorithms' fit is as good as that of more complex algorithms

• The loss reduction from timely deployment (hours) compensates for any shortfall in model fit

Agility

• No data transfer needed (in-database)

• New opportunities to combine structured data with unstructured data

Scalability

• The integration with our DB replication makes re-fit inexpensive

• The same algorithm can scale up for all other clients

(36)

Oracle Advanced Analytics Time Value

(37)

Accuracy + Agility vs. Cost to Deploy

Pick the best combination of:

• Fewer days to deployment

• High model accuracy

• Lower cost

Application    Deploy (Days)    Accuracy    Total Cost
SAS Server     3                0.92        x5
ODM            1                0.90        1
SAS Base       15               0.83        30%
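
The trade-off between accuracy and days to deployment can be made concrete with a toy model. The deployment days and accuracy figures are taken from the table; treating accuracy as the share of fraud stopped per day, and assuming a 30-day fraud attack, are simplifying assumptions of this sketch, and cost is left out.

```python
# Deployment days and accuracy are from the comparison table above.
OPTIONS = {
    "SAS Server": {"deploy_days": 3, "accuracy": 0.92},
    "ODM": {"deploy_days": 1, "accuracy": 0.90},
    "SAS Base": {"deploy_days": 15, "accuracy": 0.83},
}


def protected_days(deploy_days, accuracy, attack_days=30):
    """Accuracy-weighted days of protection during an attack of assumed length."""
    active_days = max(attack_days - deploy_days, 0)
    return accuracy * active_days


def rank_options(options, attack_days=30):
    """Return (name, score) pairs sorted from most to least protection."""
    scored = {
        name: protected_days(option["deploy_days"], option["accuracy"], attack_days)
        for name, option in options.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_options(OPTIONS):
        print(f"{name}: {score:.2f} accuracy-weighted days")
```

With a 30-day window the faster deployment outweighs the two-point accuracy gap. In this toy model the ranking only flips once the assumed attack lasts longer than about 93 days, where 0.92 × (T − 3) overtakes 0.90 × (T − 1).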

(38)

ODM – Oracle Data Miner GUI

Built into the Oracle SQL Developer tool

• Free download on OTN, version 4.0 or later

Easy to use

• GUI; explore data; work-flows

Powerful

• Multiple algorithms and data transformations; 100% in-DB; build, evaluate and apply data mining models

Deployable

• Shared analytical workflows; Generates PMML and SQL scripts for automation

(39)

ODM – Oracle Data Miner GUI

Oracle Data Miner Nodes

(40)

Oracle Data Miner Algorithms

Identify most important risk factors (Attribute Importance)

Predict Fraudsters’ behavior (Classification)

Find profiles of bad transfers (Decision Trees)

Predict Fraud Risk Probability (Regression)

Segment overall population (Clustering)

Find fraudulent transactions (Anomaly detection)

Determine co-occurring items in baskets (Associations)

Reduce a large dataset into representative new attributes (Feature Extraction)

(41)

Turning Data Mining

Strategy into Action

(42)

Best Practices in Analytics

Roles: Business Manager, Data Scientist, IT Manager

1. Identify Benefits & Constraints

Involve Key Stakeholders

• Align your Team’s Incentives

Success Factors and Constraints

• ROI / Cost

• Profitability

• Operations

2. Develop the Strategy

Collect & Process Data

• Run Descriptive Analytics

• Identify patterns

Select the Appropriate Infrastructure

• DB Architecture

• Modeling techniques

Provide Actionable Insights

• Test Predictive Models

• Simulate scenarios (Monte Carlo)

3. Turn Strategy into Action

Select Best Option(s)

• Estimate Impact for the Business

• Score models on KPI

Install into Production

• Run A/B testing

• Start Small and Increase Gradually

Track Benefits and KPI
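
The “Simulate scenarios (Monte Carlo)” item can be illustrated with a small simulation of losses from missed fraud. Everything here is a hypothetical sketch: the function names, the average loss per missed case and the detection rates are assumptions, and only the 2-in-10,000 fraud rate echoes a figure from the deck.

```python
import random


def simulate_missed_fraud(n_transfers, fraud_rate, detection_rate, avg_loss,
                          n_runs=100, seed=42):
    """Simulate the loss from undetected fraud over n_runs independent periods."""
    rng = random.Random(seed)  # fixed seed so the scenario is reproducible
    miss_probability = fraud_rate * (1.0 - detection_rate)
    losses = []
    for _ in range(n_runs):
        # Each transfer is independently a missed fraud with miss_probability.
        missed = sum(1 for _ in range(n_transfers) if rng.random() < miss_probability)
        losses.append(missed * avg_loss)
    return losses


def summarize(losses):
    """Return (mean loss, approximate 95th percentile loss)."""
    ordered = sorted(losses)
    mean_loss = sum(ordered) / len(ordered)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return mean_loss, p95


if __name__ == "__main__":
    for label, detection in (("weaker model", 0.50), ("stronger model", 0.90)):
        mean_loss, p95 = summarize(
            simulate_missed_fraud(10_000, 2 / 10_000, detection, avg_loss=1_000.0)
        )
        print(f"{label}: mean loss {mean_loss:.0f}, 95th percentile {p95:.0f}")
```

Comparing scenarios on the same seed isolates the effect of the detection rate, and reporting a high percentile alongside the mean gives business managers a view of the bad periods as well as the typical one.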

(43)

Involve the Right Stakeholders

Business Manager

Data Scientist

IT Manager

• Preserve Service Level Agreement

• Reduce Operational Risk

• Preserve Budget

(44)

Conflict of Interests?

Cannot agree on success factors?

Wonder why…?

(45)

IT Manager’s Strategy

Preserve Service Level Agreements

Stable systems

• Ease of roll-back

Minimize Operational risk

(46)

Data Scientist’s Mind

Estimate the Best Model Possible

• Improve Detection Rates

• Better Algorithms, Faster Hardware

• Big(ger) Data!

Explore New Algorithms

Put some power behind it !!

(47)

Business Manager’s Mind

Maximize Productivity: Build for specific needs

• What is the cost?

• Why does it take so long?

• What is the impact on customer experience?

And: Don’t talk to me in Tech-Speak!

“First we ran a chi-square test, and then we converted the categorical data to ordinal, next we ran a logistic regression, and then we lagged the economic data by a year…”

(48)
(49)

Managing the Quants

• Define clearly the objective and constraints

• Implement SMART* goal setting

Get familiar with basic analytics concepts

• Establish a time-line for delivery, then multiply x 2

• Make sure you understand enough to explain to other executives…

(50)

Taking Care of Business

(tips for Data Scientists)

Communicate clearly business level information

• When and what is the expected result

• Present the key concept in 2 phrases

• Avoid technical language for communication

If asked for more details, then present the “How”

Provide a Business Dashboard

• Provide the $$ metrics profit/loss reduction

• Show the impact of algorithms deployed / provided

• Current vs. Historical

Pick the right model - the model that maximizes the ROI

(51)

From Paper to Execution…

(52)

© 2014 Fiserv, Inc. or its affiliates.

Success Factors in Fraud Mitigation

Accuracy:

• Low False Negative Rate (How much fraud $ you miss)

• Low False Positive Rates (How many people you bother with additional identity verification)

Agility

• Minimize reaction-time to fraud attacks

• Make your updates easy to implement

Scalability:

• Automate processes to keep variable cost down
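
The two accuracy measures can be computed directly from scored transfers. This is a sketch under stated assumptions: the tuple layout is hypothetical, the false negative rate is weighted by dollars to match “how much fraud $ you miss”, and the false positive rate counts good transfers that were flagged.

```python
def fraud_error_rates(transfers):
    """Return (dollar-weighted false negative rate, false positive rate).

    transfers: iterable of (amount, is_fraud, is_flagged) tuples.
    """
    fraud_dollars = missed_dollars = 0.0
    good_count = bothered_count = 0
    for amount, is_fraud, is_flagged in transfers:
        if is_fraud:
            fraud_dollars += amount
            if not is_flagged:
                missed_dollars += amount   # fraud that slipped through
        else:
            good_count += 1
            if is_flagged:
                bothered_count += 1        # good customer sent to extra verification
    false_negative_rate = missed_dollars / fraud_dollars if fraud_dollars else 0.0
    false_positive_rate = bothered_count / good_count if good_count else 0.0
    return false_negative_rate, false_positive_rate


if __name__ == "__main__":
    sample = [
        (1000.0, True, True),
        (500.0, True, False),
        (200.0, False, True),
        (100.0, False, False),
        (50.0, False, False),
        (75.0, False, False),
    ]
    fnr, fpr = fraud_error_rates(sample)
    print(f"missed fraud $: {fnr:.1%}, good transfers flagged: {fpr:.1%}")
```

Tracking both numbers side by side keeps the trade-off visible: tightening a rule usually lowers the first rate while raising the second.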

(53)

Selecting the Right Tools

Easy to Use and Deploy

Can combine structured data with unstructured data - new trend

Tools that integrate in the DB

• Allow for Fast Model Fitting and Re-estimation

• Minimize data transport across systems 

In-House Algorithms

(54)

Tracking Performance: Dashboard

Our dashboards tracked the key performance metrics:

Historical Trends for Fraud Rates and Losses (Business KPI)

Percentage of Transfers affected by Risk Mitigation (Business KPI)

% of population affected by policy and % of fraud prevented (KPI for Analytics)

(55)

Key Takeaways

On Fraud Modeling

• When dealing with fraud, the speed to implement a new model is the most important factor

• Improvements in accuracy may be lost due to delays in deployment; systems with fast turnaround have better ROI than complex algorithms with long implementation times

• Select the right tools that will enable fast analysis and deployment

Turning Strategy into Action

• Involving the key stakeholders early in the process maximizes your chance for success. Once you have aligned the incentives for the team, selecting the appropriate techniques, tools and infrastructure becomes much simpler

• It is crucial for business managers to correctly define the problems and objectives, asking the right questions and learning the basic analytical concepts

(56)

If you have further questions or comments, please contact:

Julia Minkowski

Lead Risk Analyst, Fiserv Inc.

408-838-3827

(57)
(58)
