Big Data and Predictive Analytics:
Fiserv Data Mining Case Study
[CON8631]
Data Warehouse and Big Data
Miguel Barrera - Director, Risk Analytics, Fiserv, Inc.
Julia Minkowski - Risk Manager, Fiserv, Inc.
Charlie Berger, MS Eng, MBA, Sr. Director
Product Management, Data Mining and Advanced Analytics
Agenda
1. Oracle Advanced Analytics Overview
• Charlie Berger, Sr. Director Product Management,
Data Mining and Advanced Analytics, Oracle Corporation
2. Fiserv Data Mining Case Study
• Miguel Barrera - Director, Risk Analytics, Fiserv, Inc.
• Julia Minkowski - Risk Manager, Fiserv, Inc.
Planning for Future
Growth of Data Exponentially Greater than Growth of Data Analysts!
• Conclusion
– Data Analysis platforms need to be
• Extremely Easy to Learn, yet..
• Extremely Powerful and
Oracle Advanced Analytics Database Evolution
Analytical SQL in the Database
Timeline: 1998, 1999, 2002, 2004, 2005, 2008, 2011, 2014
• 7 Data Mining “Partners”
• Oracle acquires Thinking Machines Corp.’s dev. team + “Darwin” data mining software
• Oracle Data Mining 9.2i launched: 2 algorithms (NB and AR) via Java API
• Oracle Data Mining 10g & 10gR2 introduces SQL dm functions, 7 new SQL dm algorithms and new Oracle Data Miner “Classic” wizard-driven GUI
• ODM 11g & 11gR2 adds AutoDataPrep (ADP), text mining, perf. improvements
• SQLDEV/Oracle Data Miner 3.2 “work flow” GUI launched
• Integration with “R” and introduction/addition of Oracle R Enterprise
• Product renamed “Oracle Advanced Analytics” (ODM + ORE)
• New algorithms (EM, PCA, SVD)
• Predictive Queries
• SQLDEV/Oracle Data Miner 4.1: SQL script generation, JSON Query node & SQL Query node (R integration)
• OAA/ORE 1.3 + 1.4 adds NN, Stepwise, scalable R algorithms
• Oracle Adv. Analytics for Hadoop Connector launched with parallel implementations of R algorithms (NN, LM, NMF)
Oracle Advanced Analytics Database Option
• Development has spent 15 years “stem-celling analytics” and workhorse machine learning algorithms into Oracle Database
– 1999: acquisition of Thinking Machines’ “Darwin” data mining software
– Darwin was retired; instead, Oracle developed new in-database implementations, within the SQL kernel, of popular and cutting-edge data mining algorithms that leverage the strengths of the database
• counting, conditional probabilities, sort, rank, partition, group-by, collections, etc.
• Today, in 12c, the Database has become an “Analytical Database”
– Nearly 20 cutting edge machine learning algorithms and 50+ statistical functions implemented as true SQL functions
• When building models, leverage existing scalable technology (e.g., parallel execution, bitmap indexes, aggregation techniques) and add new core database technology (e.g., recursion within the parallel infrastructure, IEEE float, etc.)
• A data mining model is a schema object in the database, built via a PL/SQL API and scored via built-in SQL functions.
– True power is evident when scoring models using built-in SQL functions e.g. Exadata “smart scan” scoring
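A minimal sketch of the "score inside the SQL query" idea, assuming Python's built-in sqlite3 as a stand-in for Oracle Database. The table `transfers`, the function `fraud_probability` and the coefficients in `MODEL` are hypothetical and are not OAA names or the PL/SQL API; the point is only that the data never leaves the database and the score is computed row by row within the query.

```python
import math
import sqlite3

# Hypothetical logistic-regression coefficients. In OAA the model would be a
# schema object built via the PL/SQL API, not a Python dict.
MODEL = {"intercept": -4.0, "amount": 0.002, "new_payee": 1.5}


def fraud_probability(amount, new_payee):
    """Score one row with the toy logistic model."""
    z = MODEL["intercept"] + MODEL["amount"] * amount + MODEL["new_payee"] * new_payee
    return 1.0 / (1.0 + math.exp(-z))


def score_in_database(rows, threshold=0.5):
    """Load rows into an in-memory table and score them inside a SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transfers (id INTEGER, amount REAL, new_payee INTEGER)")
    conn.executemany("INSERT INTO transfers VALUES (?, ?, ?)", rows)
    # Register the scoring function so SQL can call it like a built-in.
    conn.create_function("fraud_probability", 2, fraud_probability)
    flagged = conn.execute(
        "SELECT id FROM transfers "
        "WHERE fraud_probability(amount, new_payee) > ? ORDER BY id",
        (threshold,),
    ).fetchall()
    conn.close()
    return [row[0] for row in flagged]
```

Calling `score_in_database([(1, 100.0, 0), (2, 2000.0, 1), (3, 500.0, 1)])` returns only the ids whose score exceeds the threshold; no rows are exported to a separate analytical server.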
Data remains in the Database
Scalable, parallel Data Mining algorithms in SQL kernel
Fast parallelized native SQL data mining functions, SQL data preparation and efficient execution of R open-source packages
High-performance parallel scoring of SQL data mining functions and R open-source models
Fastest way to deliver enterprise-wide predictive analytics
Integrated GUI for Predictive Analytics
Database scoring engine
Lowest TCO
Eliminate data duplication
Eliminate separate analytical servers
Leverage investment in Oracle IT
Oracle Advanced Analytics
Performance and Scalability with Low Total Cost of Ownership
[Figure: time to results. Traditional Analytics takes hours, days or weeks across separate steps: data extraction, data prep & transformation, data mining model building, data mining model “scoring”, data prep & transformation. Oracle Advanced Analytics takes secs, mins or hours: data preparation, model building, model “scoring” with embedded data prep. The gap between the two is the savings.]
OBIEE
Oracle Database Enterprise Edition
Oracle Advanced Analytics Database Architecture
Component of Oracle Database—SQL Functions
Oracle Advanced Analytics
Native SQL Data Mining/Analytic Functions + High-performance R Integration for Scalable, Distributed, Parallel Execution
In-database data mining algorithms and open source R algorithms
SQL, PL/SQL, R languages
Scalable, parallel in-database execution
Workflow GUI and IDEs
Integrated component of Database
Enables enterprise analytical applications
Key Features
Oracle Advanced Analytics Database Option
Be Specific in Problem Statement
Columns: Poorly Defined | Better | Data Mining Technique

Poorly defined: Predict employees that leave
Better:
• Based on past employees that voluntarily left:
• Create New Attribute EmplTurnover 0/1

Poorly defined: Predict customers that churn
Better:
• Based on past customers that have churned:
• Create New Attribute Churn YES/NO

Poorly defined: Target “best” customers
Better:
• Recency, Frequency, Monetary (RFM) Analysis
• Specific Dollar Amount over Time Window:
• Who has spent $500+ in most recent 18 months

Poorly defined: How can I make more $$?
Better:
• What helps me sell soft drinks & coffee?
• Which customers are likely to buy?
• How much is each customer likely to spend?
• Who are my “best customers”?
• What descriptive “rules” describe “best customers”?

Poorly defined: How can I combat fraud?
Better:
• Which transactions are the most anomalous?
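As an illustration of turning a vague goal into a specific target attribute, the sketch below derives a 0/1 "best customer" flag from the slide's example rule ($500+ spent in the most recent 18 months). The record layout `(cust_id, purchase_date, amount)` and the function names are assumptions made for this example only.

```python
from datetime import date


def months_before(day, months):
    """Return the date `months` calendar months before `day` (day clamped to 28)."""
    total = day.year * 12 + (day.month - 1) - months
    return date(total // 12, total % 12 + 1, min(day.day, 28))


def best_customer_flags(purchases, as_of, min_spend=500.0, window_months=18):
    """Map each customer id to 1 if spend inside the window reaches min_spend, else 0."""
    cutoff = months_before(as_of, window_months)
    totals = {}
    for cust_id, purchase_date, amount in purchases:
        totals.setdefault(cust_id, 0.0)  # keep customers with no recent spend
        if cutoff <= purchase_date <= as_of:
            totals[cust_id] += amount
    return {cust_id: int(total >= min_spend) for cust_id, total in totals.items()}
```

The resulting flag plays the same role as the EmplTurnover or Churn attributes: a concrete column that a classification model can be trained to predict.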
More Data Variety—Better Predictive Models
• Increasing sources of relevant data can boost model accuracy
[Chart: Responders (0% to 100%) vs. Population Size, with curves for Naïve Guess or Random, Model with 20 variables, Model with 75 variables, Model with 250 variables, and Model with “Big Data”]
• Model with “Big Data” and hundreds to thousands of input variables, including:
– Demographic data
– Purchase POS transactional data
– “Unstructured data”, text & comments
– Spatial location data
– Long-term vs. recent historical behavior
– Web visits
– Sensor data
– etc.
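The chart on this slide compares models by how many responders they capture as more of the population is contacted. A minimal sketch of that calculation follows, assuming each record has a model score and a 0/1 response; the function name and the default fractions are illustrative, not taken from the deck.

```python
def cumulative_gains(scores, responded, fractions=(0.1, 0.2, 0.5, 1.0)):
    """Share of all responders captured in the top-scored fraction of the population."""
    # Rank responses from highest to lowest model score.
    ranked = [r for _, r in sorted(zip(scores, responded), key=lambda p: p[0], reverse=True)]
    total = sum(ranked)
    if total == 0:
        raise ValueError("need at least one responder")
    gains = {}
    for f in fractions:
        top_n = int(round(f * len(ranked)))
        gains[f] = sum(ranked[:top_n]) / total
    return gains
```

A random or naive ordering captures roughly a fraction f of responders in the top fraction f of the population, so a better model shows up as a curve that rises faster than that diagonal.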
Predicting Behavior
Identify “Likely Behavior” and their Profiles
Consider:
• Demographics
• Past purchases
• Recent purchases
Oracle Big Data Management System
[Diagram: SOURCES feeding the following components: Big Data Appliance, Cloudera Hadoop, Oracle NoSQL Database, Oracle R Advanced Analytics for Hadoop, Oracle R Distribution, Oracle Exadata, Oracle Database, Oracle Industry Models, Oracle Advanced Security, Oracle Advanced Analytics, Oracle Spatial & Graph, Oracle Big Data Connectors, Oracle Data Integrator]
Oracle Big Data SQL
select cust_id from customers
where region = 'US'
• Strengths
– Powerful & Extensible
– Graphical & Extensive statistics
– Free—open source
• Challenges
– Memory constrained
– Single threaded
– Outer loop—slows down process
– Not industrial strength
R environment
R—Widely Popular
Oracle Advanced Analytics
1. User R Engine on desktop (R engine, Oracle R Enterprise packages, other R packages)
• R-SQL Transparency Framework intercepts R functions for scalable in-database execution
• Function intercept for data transforms, statistical functions and advanced analytics
• Interactive display of graphical results and flow control as in standard R
• Submit entire R scripts for execution by database
2. Database Compute Engine (Oracle Database, user tables; SQL in, results out)
• Scale to large datasets
• Access tables, views, and external tables, as well as data through DB LINKS
• Leverage database SQL parallelism
• Leverage new and existing in-database statistical and data mining capabilities
3. R Engine(s) spawned by Oracle DB (R engine, Oracle R Enterprise packages, other R packages; R results returned)
• Database can spawn multiple R engines for database-managed parallelism
• Efficient data transfer to spawned R engines
• Emulate map-reduce style algorithms and applications
• Enables “lights-out” execution of R scripts
Integrated Business Intelligence
Enhance Dashboards with Predictions and Data Mining Insights
• In-database predictive models “mine” customer data and predict their behavior
• OBIEE’s integrated spatial mapping shows location
• All OAA results and predictions available in Database via OBIEE Admin to enhance dashboards
Oracle Advanced Analytics data mining results available to Oracle BI EE
Oracle BI EE defines results for end user presentation
• Fastest Way to Deliver Scalable Enterprise-wide Predictive Analytics
• OAA’s clustering and predictions available in-DB for OBIEE
• Automatic Customer Segmentation, Churn Predictions, and Sentiment Analysis
Pre-Built Predictive Models
Oracle Communications Industry Data Model
• Oracle Advanced Analytics factory-installed predictive analytics
• Employees likely to leave and predicted performance
• Top reasons, expected behavior
• Real-time "What if?" analysis
Fusion Human Capital Management Powered by OAA
Fusion HCM Predictive Workforce
Oracle Advanced Analytics Database Option
• Oracle Data Miner/SQLDEV 4.1 (for Oracle Database 11g and 12c)
– New Graph node (box, scatter, bar, histograms)
– SQL Query node + integration of R scripts
– Automatic SQL script generation for deployment
– JSON Query node (added in 4.1)
• Oracle Advanced Analytics 12c features exposed in Oracle Data Miner
– New SQL data mining algorithms/enhancements
• Expectation Maximization clustering algorithm
• PCA & Singular Value Decomposition algorithms
• Improved/automated Text Mining, Prediction Details and other algorithm improvements
– Predictive SQL Queries: automatic build and apply within a SQL query
12c New Features
• Predictive Queries
– Immediate build/apply of ODM models in SQL query
• Classification & regression
– Multi-target (nested) problems
• Clustering query
• Anomaly query
• Feature extraction query
New Server Functionality
OAA automatically creates multiple anomaly detection models “Grouped_By” and “scores” by partition via powerful SQL query
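To make the "one model per partition" idea concrete, here is a small sketch that fits a separate model for each group and scores every row against its own group. A plain z-score stands in for OAA's anomaly detection algorithm, and the input layout `(partition_key, value)` is assumed for illustration; this is not the predictive query syntax.

```python
from collections import defaultdict
from statistics import mean, pstdev


def anomalies_by_partition(rows, z_threshold=2.0):
    """Fit one mean/std 'model' per partition and flag rows far from their own group's mean."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # One tiny model per partition key.
    models = {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}
    flagged = []
    for key, value in rows:
        mu, sigma = models[key]
        if sigma > 0 and abs(value - mu) / sigma > z_threshold:
            flagged.append((key, value))
    return flagged
```

The design point is that a value of 200 can be ordinary in one partition and anomalous in another, which a single global model would miss.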
OAA/ORACLE DATA MINER
QUICK 2 MINUTE DEMO
Take a Test Drive!
Vlamis Software, Oracle Partner Offers FREE Test Drives on the Amazon Cloud
• Step 1: Fill out request
– Go to http://www.vlamis.com/testdrive-registration/
• Step 2: Connect
– Connect with Remote Desktop
• Step 3: Start Test Drive!
– Oracle Database +
– Oracle Advanced Analytics Option
– SQL Developer/Oracle Data Miner GUI
– Demo data for learning
New book on Oracle Advanced Analytics available on Amazon
Predictive Analytics Using Oracle Data Miner: Develop for ODM in SQL &
OAA Links and Resources
• Oracle Advanced Analytics Overview:
– Link to presentation—Big Data Analytics using Oracle Advanced Analytics In-Database Option
– OAA data sheet on OTN
– Oracle Internal OAA Product Management Wiki and Workspace
• YouTube recorded OAA Presentations and Demos:
– Oracle Advanced Analytics and Data Mining on YouTube (6+ OAA “live” demos on ODM’r 4.0 new features, retail, fraud, loyalty, overview, etc.)
• Getting Started:
– Link to Getting Started w/ ODM blog entry
– Link to New OAA/Oracle Data Mining 2-Day Instructor Led Oracle University course.
– Link to OAA/Oracle Data Mining 4.0 Oracle by Examples (free) Tutorials on OTN
– Take a Free Test Drive of Oracle Advanced Analytics (Oracle Data Miner GUI) on the Amazon Cloud
– Link to SQL Developer Days Virtual Event w/ downloadable VM of Oracle Database + ODM/ODMr and e-training for Hands on Labs
– Link to OAA/Oracle R Enterprise (free) Tutorial Series on OTN
• Additional Resources:
– Oracle Advanced Analytics Option on OTN page
– OAA/Oracle Data Mining on OTN page, ODM Documentation & ODM Blog
BIWA Summit
January 27-29, 2015
Oracle HQ Conference Center
Faster than a Mouse:
Turn Data Mining Strategy into Action
Miguel Barrera, Director of Risk Analytics, Fiserv Inc.
Julia Minkowski, Risk Manager, Fiserv Inc.
Agenda
• Use Case: Fraud Prevention in Online Payments
• Best Practices: Turning Data Mining Strategy into Action
Use Case: Fraud Prevention in Online Payments
Risk Analytics @ Fiserv Electronic Payments
• We prevent $200M in losses every year using data to monitor, understand and anticipate fraud
• We manage risk for $24B in transfers, servicing 2,000+ US financial institutions, including 5 of the top 10 banks
• A department of 5 people, we operated in start-up mode until we were acquired by Fiserv in 2011
• We build our risk models, supervise their installation and develop the next generation of risk mitigation strategies
What is special about Fraud Prevention?
• Fraud is performed by organized criminal groups using sophisticated technologies and logistics
• Hard to detect: target has low frequency (2 in 10,000)
• The cost of mistakes is very high
– $ losses if you fail to detect fraud
– 60% increase in customer attrition if you misclassify
• The environment changes fast, so you need to adapt
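A short worked example of why a 2-in-10,000 target is hard: a "model" that labels every transfer as legitimate is almost perfectly accurate and still stops no fraud. The function name and return layout are illustrative assumptions; the base rate is the one quoted on the slide.

```python
def always_legit_baseline(fraud_per_10k=2, population=10_000):
    """Accuracy and missed-fraud count of a 'model' that labels every transfer legitimate."""
    fraud = fraud_per_10k * population // 10_000
    legit = population - fraud
    accuracy = legit / population  # every legitimate transfer is labeled correctly
    missed_fraud = fraud           # every fraudulent transfer is missed
    return accuracy, missed_fraud
```

With the slide's base rate this baseline scores 99.98% accuracy while missing every fraudulent transfer, which is why raw accuracy alone says little for rare targets.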
Fraud prevention is a great field for the application of predictive analytics
Analytics for Fraud Prevention
[Diagram: Explore & Understand, Anticipate & Control, Monitor]
Risk Management: Goals and Constraints
Constraints:
• Build a flexible system that adapts to new fraud patterns
• Service the existing client base
• Minimize the time that the production systems will be off-line or reset
• Build the next generation of strategies with very limited resources
Goals:
• Help the business expand to more profitable markets (on-line booking, real-time payments) while keeping loss rates constant and customers happy
Data Miner Survey 2013 by Rexer Analytics
While 6 out of 10 data miners report that data is available for analysis within days of capture, deploying the models takes substantially longer. For 60% of the respondents, deployment time ranges between 3 weeks and 1 year.
The problem we faced…
• We had the algorithms (SAS + Angoss)
– Decision Trees + Gradient Boosting
– GLM and Logistic Regression
– SVM
– Bayesian Estimates
• But implementation took too long…
– 3 months’ turnaround to estimate + deploy Logistic Regression (2008)
– 1 month to estimate and deploy Trees and GLM (2010)
In Fraud-Mitigation Speed is the Key
Why we liked Oracle Advanced Analytics
Accuracy
• The algorithms’ fit is as good as that of more complex algorithms
• The loss reduction from timely deployment (hours) compensates for any shortfall in model fit
Agility
• No data transfer needed (in-database)
• New opportunities to combine structured data with unstructured data
Scalability
• The integration with our DB replication makes re-fitting inexpensive
• The same algorithm can scale up for all other clients
Oracle Advanced Analytics Time Value
Accuracy + Agility vs. Cost to Deploy
• Pick the best combination of:
– Fewer days to deployment
– High model accuracy
– Lower cost

Application   Deploy (Days)   Accuracy   Total Cost
SAS Server    3               0.92       x5
ODM           1               0.90       1
SAS Base      15              0.83       30%
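One way to read the table is as a trade-off between accuracy and days lost before the model goes live. The sketch below uses a deliberately simple toy model, which is an assumption of this example and not the presenters' method: a new model protects nothing until it is deployed, and after that each day is weighted by its accuracy. The cost column is ignored.

```python
# Deploy times and accuracies copied from the table; cost is left out.
OPTIONS = {
    "SAS Server": {"deploy_days": 3, "accuracy": 0.92},
    "ODM": {"deploy_days": 1, "accuracy": 0.90},
    "SAS Base": {"deploy_days": 15, "accuracy": 0.83},
}


def protected_days(option, horizon_days):
    """Accuracy-weighted days of protection within the horizon (toy model)."""
    live_days = max(0, horizon_days - option["deploy_days"])
    return option["accuracy"] * live_days


def best_option(horizon_days):
    """Name of the option with the most accuracy-weighted protection."""
    return max(OPTIONS, key=lambda name: protected_days(OPTIONS[name], horizon_days))
```

Under these assumptions the faster deployment wins whenever models must be refreshed often (for a 30-day horizon), while the small accuracy edge only pays off if the same model stays in place for a long time (for a 365-day horizon).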
ODM – Oracle Data Miner GUI
• Built into the Oracle SQL Developer tool
– Free download on OTN (version 4.0 or later)
• Easy to use
– GUI; explore data; workflows
• Powerful
– Multiple algorithms and data transformations; 100% in-DB; build, evaluate and apply data mining models
• Deployable
– Shared analytical workflows; generates PMML and SQL scripts for automation
Oracle Data Miner Nodes
Oracle Data Miner Algorithms
• Identify most important risk factors (Attribute Importance)
• Predict fraudsters’ behavior (Classification)
• Find profiles of bad transfers (Decision Trees)
• Predict fraud risk probability (Regression)
• Segment overall population (Clustering)
• Find fraudulent transactions (Anomaly Detection)
• Determine co-occurring items in baskets (Associations)
• Reduce a large dataset into representative new attributes (Feature Extraction)
Turning Data Mining Strategy into Action
Best Practices in Analytics
Roles: Business Manager, Data Scientist, IT Manager
1. Identify Benefits & Constraints
• Success Factors and Constraints: ROI/Cost, Profitability, Operations
• Align your Team’s Incentives
• Involve Key Stakeholders
2. Develop the Strategy
• Collect & Process Data: run descriptive analytics; identify patterns
• Provide Actionable Insights
• Estimate Impact for the Business: test predictive models; simulate scenarios (Monte Carlo)
• Select the Appropriate Infrastructure: DB architecture; modeling techniques
• Select Best Option(s)
3. Turn Strategy into Action
• Install into Production: run A/B testing; start small and increase gradually
• Track Benefits and KPI: score models on KPI
Involve the Right Stakeholders
Business Manager
Data Scientist
IT Manager
• Preserve Service Level Agreement
• Reduce Operational Risk
• Preserve Budget
Conflict of Interests?
Cannot agree on success factors?
Wonder why…?
IT Manager’s Strategy
• Preserve Service Level Agreements
• Stable systems
– Ease of roll-back
• Minimize operational risk
Data Scientist’s Mind
• Estimate the Best Model Possible
• Improve Detection Rates
• Better Algorithms, Faster Hardware
• Big(ger) Data!
• Explore New Algorithms
• Put some power behind it !!
Business Manager’s Mind
Maximize Productivity: Build for specific needs
• What is the cost?
• Why does it take so long?
• What is the impact on customer experience?
And: Don’t talk to me in Tech-Speak!
“First we ran a chi-square test, and then we converted the categorical data to ordinal, next we ran a logistic regression, and then we lagged the economic data by a year…”
Managing the Quants
• Define clearly the objective and constraints
• Implement SMART* goal setting
• Get familiar with basic analytics concepts
• Establish a time-line for delivery then multiply x 2
• Make sure you understand enough to explain to other executives…
Taking Care of Business
(tips for Data Scientists)
• Communicate business-level information clearly
– When to expect the result, and what it will be
– Present the key concept in 2 phrases
– Avoid technical language
– If asked for more details, then present the “How”
• Provide a Business Dashboard
– Provide the $$ metrics: profit/loss reduction
– Show the impact of algorithms deployed / provided
– Current vs. Historical
• Pick the right model: the model that maximizes the ROI
From Paper to Execution…
© 2014 Fiserv, Inc. or its affiliates.
Success Factors in Fraud Mitigation
• Accuracy:
– Low False Negative Rate (how much fraud $ you miss)
– Low False Positive Rate (how many people you bother with additional identity verification)
• Agility:
– Minimize reaction time to fraud attacks
– Make your updates easy to implement
• Scalability:
– Automate processes to keep variable cost down
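The two accuracy measures above can be computed directly from scored transfers. This is a minimal sketch under assumed inputs: each transfer is a tuple `(score, amount, is_fraud)` and a transfer is challenged when its score reaches the threshold; the names and the tuple layout are hypothetical.

```python
def fraud_error_rates(transfers, threshold):
    """Return (share of fraud dollars missed, share of legitimate transfers challenged)."""
    fraud_dollars = missed_dollars = 0.0
    legit = bothered = 0
    for score, amount, is_fraud in transfers:
        flagged = score >= threshold
        if is_fraud:
            fraud_dollars += amount
            if not flagged:
                missed_dollars += amount  # false negative, measured in dollars
        else:
            legit += 1
            if flagged:
                bothered += 1  # false positive: customer gets extra verification
    fn_dollar_rate = missed_dollars / fraud_dollars if fraud_dollars else 0.0
    fp_rate = bothered / legit if legit else 0.0
    return fn_dollar_rate, fp_rate
```

Measuring false negatives in dollars instead of counts matches the slide's framing ("how much fraud $ you miss"), and sweeping the threshold shows the trade-off between the two rates.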
Selecting the Right Tools
Easy to Use and Deploy
Can combine structured data with unstructured data - new trend
Tools that integrate in the DB
• Allow for Fast Model Fitting and Re-estimation
• Minimize data transport across systems
In-House Algorithms
Tracking Performance: Dashboard
Our dashboards tracked the key performance metrics:
• Historical Trends for Fraud Rates and Losses (Business KPI)
• Percentage of Transfers affected by Risk Mitigation (Business KPI)
• % of population affected by policy and % of fraud prevented (KPI for Analytics)
Key Takeaways
On Fraud Modeling
• When dealing with fraud, the speed to implement a new model is the most important factor
• Improvements in accuracy may be lost due to delays in deployment; systems with fast turnaround have better ROI than complex algorithms with long implementation times. Select the right tools that will enable fast analysis and deployment.
Turning Strategy into Action
• Involving the key stakeholders early in the process maximizes your chance for success. Once you have aligned the incentives for the team, selecting the appropriate techniques, tools and infrastructure becomes much simpler
• It is crucial for business managers to correctly define the problems and objectives, asking the right questions and learning the basic analytical concepts
If you have further questions or comments, please contact:
Julia Minkowski
Lead Risk Analyst, Fiserv Inc
[email protected]
408-838-3827