Hexaware Webinar Series Presents:
Data Mining - Seeing the future and knowing the patterns of your business using your organization data
Rajesh Natarajan – Senior Consultant, Hexaware Technologies
Our mission: To build value for clients through innovative use of technology and talent
Business Areas: Transportation, ERP/HRIT, Insurance, BFS
India's Fastest Growing Mid-Sized Company
32 offices worldwide
18 Global locations
17 years of technology outsourcing expertise
55 Global 500 clients
166 Clients served worldwide
USD 187 million in revenues, 2006
6900 Employees worldwide
Strategies and Strengths
Core Competency
Management of business-critical applications offshore
Organization Traits
Consultative approach, Responsive and Result-oriented
Robust Backbone
World-class infrastructure, Flexible delivery models, SEI CMMi Level 5, BS7799
Track Record
88% Repeat Business, Offshore transition expertise
Global Delivery
Leading BFSI service provider with proprietary products (Operational Risk, Collections, Leasing, Wealth Management)
#1 Airlines services provider in India: 8 of top 10 airlines are our clients
#1 provider of HR-IT services in India: 500+ projects, 750+ resources
Specialized Insurance service provider: Content management, Fraud Mgmt, Work flow, SOX, BPO
LEADERSHIP THROUGH FOCUS, ENHANCING VALUE
Data Mining - Seeing the future and knowing
the patterns of your business using your
organization data
Agenda
Introduction: Data Mining and its necessity
Data Mining Vs OLAP/Statistics
A Perspective on Data Mining: Functionalities and Tasks
A Process-oriented view of Data Mining
Important Data Mining Techniques
Regression
Neural Networks
Cluster Analysis
Data Mining Applications across different domains
Banking analytics
Insurance Analytics
Airlines Analytics
Retail Analytics
Data Mining Applications across different business functions
CRM Analytics
HR Analytics
Current Business Landscape
• Increased competition due to globalization
• Barriers to entry reduced due to factors such as the Internet, Outsourcing and other innovative trends
• Advances in technology creating a level playing field resulting in smaller margins
• Lower life-span of product and service models due to dynamic environmental conditions
• Increasing pressure to reduce the time to market.
• Higher customer expectations: quality, cost and customization
• Products catering to individual customers
• Marketing to one in a population of six billion
• Advances in Data capture, processing, storage and retrieval technologies
• Data flood & Information overload
Changing factors of Competitive advantage
• Size: Economies of scale, Economies of scope
• Technology: Process efficiencies, New Services
• Knowledge: Process and Product Innovation
Need for Business Intelligence and Analytics
• Data capture and storage have become cheap and convenient
• Transactional information is captured as part of the business process
• Valuable customer, business and stakeholder information hidden in captured data
• Development of new disciplines that enable extraction of information from captured data
• Extracted information can be used as a competitive advantage to drive business innovations
“As the trend to gather more data from all kinds of processes will increase - so will the competitive pressure to derive value out of it. Analytics have increased in importance as enterprises recognize their potential for alleviating the paralyzing condition known as "info glut": an overwhelming information and data overload. Enterprises may pay for their failure to invest in analytics with decreased productivity and inferior decision making.”
Evolution of Database Technology
• 1960s:
– Data collection, database creation, IMS and network DBMS
• 1970s:
– Relational data model, relational DBMS implementation
• 1980s:
– RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)
• 1990s-2000s:
– Data mining and data warehousing, multimedia databases, and Web databases
Information Dimensions and Data Sources
Information dimensions: Enterprise Operational Information, Business Information, Customer Information, Economic Indicators
• Relational databases
• Data warehouses
• Transactional databases
• Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous and legacy databases
What is Data Mining? (Knowledge Discovery in Databases)
• Data Mining or Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data
• Validity:
– The discovered patterns must be valid on new data with some degree of certainty.
• Novelty:
– The patterns are novel (at least to the system).
– Novelty is measured with respect to changes in data
– Comparison of current values to previous or expected values
• Potentially Useful:
– The discovered patterns should lead to some useful actions
• Ultimately Understandable
[Figure: Data Mining and its related disciplines: Database Management Systems, Machine Learning, Statistics, Pattern Recognition, Visualization, Artificial Intelligence, Expert Systems]
• Some experts look at Data Mining as a stage in the KDD (Knowledge Discovery in Databases) process
• Secondary analysis of data collected for some other purpose
– Customer segmentation using customer transactional and demographic data
• Traditional statistical techniques may not be applicable due to large sizes of the datasets
• Data Mining is an iterative and interactive process
• Emphasis is on automated/semi-automated techniques
• End-user (business-user) involvement is an important goal
• Descriptive and Predictive techniques answer the “why,” “how,” “who,” and “what will happen” type of business questions.
OLAP Vs Data Mining
• On-Line Analytical Processing (OLAP)
– Presentation Tool, Reports on Data
– Designed for faster response
– Metrics can be viewed at any level of detail
– Hypothesis driven
– User makes hypotheses about the data and looks at the data for confirmation of hypothesis
– Problems arise when there are a large number of variables
• Data Mining
– Not a Presentation tool
– Discovers hidden, implicit, significant and actionable patterns in the Data
– Brings out all the hypotheses that fit the data (exploratory analysis)
– No bias: lets the data “talk”
Statistics and Data Mining
• Statistical Analysis
– Analysis of primary data
– Data collected to test specific hypothesis
– Experimental data also collected
• Data Mining
– Secondary data that is collected for other reasons
– Unbiased, but important data could be missing
– Typically, observational data
• Data Mining deals with large databases
• Many databases do not follow a classical form of data organization, for example, data that comes from the Internet
• There should be a link between the results of data mining and business actions
Statistics and Data Mining
• Two main criticisms of Data Mining
• In Data Mining, there is not just one theoretical model but several models in competition with each other
• Model is chosen depending on the data available
• Criticism 1: It is always possible to find a model, however complex, which will adapt well to the data
• Criticism 2: A large amount of data might lead to non-existent relationships being found in the data
• When choosing models, great attention is paid to the possibility of generalizing results.
• Predictive performance is considered and more complex models are penalized
The Primary Tasks of Data Mining
• High-level primary goals of Data mining
– Prediction
– Description
• Prediction
– Using some variables or fields in a database to predict unknown or future values of a variable of interest.
– Regression
– Classification
• Description
– Focus is on finding human-interpretable patterns describing the data
– Clustering
– Summarization
• In the context of Data Mining, description tends to be more important than prediction.
Data Mining Functionalities or Tasks (1)
• Concept description: Characterization and discrimination
– Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
• Association (correlation and causality)
– Multi-dimensional vs. single-dimensional association
– age(X, “20..29”) ^ income(X, “20..29K”) → buys(X, “PC”) [support = 2%, confidence = 60%]
– contains(T, “computer”) → contains(T, “software”) [1%,
Data Mining Functionalities or Tasks (2)
• Classification and Prediction
– Finding models (functions) that describe and distinguish classes or concepts for future prediction
– E.g., classify countries based on climate, or classify cars based on gas mileage
– Presentation: decision-tree, classification rule, neural network
– Prediction: Predict some unknown or missing numerical values
• Cluster analysis
– Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns
– Clustering based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity
Data Mining Functionalities or Tasks (3)
• Outlier analysis
– Outlier: a data object that does not comply with the general
behavior of the data
– It can be considered as noise or exception but is quite useful
in fraud detection, rare events analysis
• Trend and evolution analysis
– Trend and deviation: regression analysis
– Sequential pattern mining, periodicity analysis
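As a rough illustration of the outlier analysis bullet above, the sketch below flags values that lie far from the mean in units of standard deviation (a z-score rule). This is only one simple approach and is not taken from the slides; the function name `z_score_outliers`, the threshold of 2.5 and the claim amounts are assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def z_score_outliers(values: Sequence[float], threshold: float = 2.5) -> List[float]:
    """Return the values whose absolute z-score exceeds the threshold."""
    centre = mean(values)
    spread = pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - centre) / spread > threshold]


# Hypothetical claim amounts (in thousands); the last one is unusually large.
claims = [10, 12, 11, 13, 12, 11, 10, 13, 12, 95]
print(z_score_outliers(claims))
```

In a fraud detection setting the flagged records would typically be passed on for manual review rather than discarded as noise.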
Data Mining: A KDD Process
– Data mining: the core of knowledge discovery process.
[Figure: KDD process stages: Data Cleaning, Data Integration, Databases, Data Warehouse, Task-relevant Data, Selection, Data Mining, Pattern Evaluation]
Important Data Mining Techniques
•Regression
•Neural Networks
•Cluster Analysis
1. Regression: A Prediction Technique
• Linear Regression - Target is continuous or interval based.
• Logistic Regression - Target is discrete.
Regression uses the following variable selection techniques:
– Forward Selection - First selects the best one-variable model, then the best two-variable model, and so on.
– Backward Selection - Begins with the full model and drops the least significant variable one at a time (cutoff: the stay p-value).
– Stepwise Selection - A modification of forward selection in which variables already in the model may be removed depending on their level of significance.
For example:
From a set of variables such as Age, Income, Gender, Location, Occupation, Years of experience, Education and many others, which technique can one use for variable selection?
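A minimal sketch of forward selection, to make the idea above concrete. It is a simplification: at each step it adds the variable that lowers the residual sum of squares the most and stops after a fixed number of variables, whereas statistical packages usually decide entry and removal with p-value cutoffs. All function names and the toy data are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residual_sum_of_squares(columns: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Least-squares fit of y on an intercept plus the given columns; returns the RSS."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def forward_selection(data: Dict[str, Sequence[float]], y: Sequence[float], max_vars: int) -> List[str]:
    """Greedily add the variable that reduces the RSS the most at each step."""
    selected: List[str] = []
    while len(selected) < max_vars:
        candidates = [name for name in data if name not in selected]
        if not candidates:
            break
        best = min(
            candidates,
            key=lambda name: residual_sum_of_squares([data[n] for n in selected + [name]], y),
        )
        selected.append(best)
    return selected


# Toy data (invented): the target depends strongly on income, weakly on age, not on gender.
data = {
    "age": [35, 22, 41, 25, 48, 60],
    "income": [20, 30, 40, 50, 60, 70],
    "gender": [0, 1, 0, 1, 0, 1],
}
y = [3 * inc + 0.5 * age for inc, age in zip(data["income"], data["age"])]
print(forward_selection(data, y, max_vars=2))
```

Backward and stepwise selection follow the same pattern, starting from the full model or allowing removals after each addition.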
2. Neural Networks
1. Neural networks are a class of flexible non-linear models
used for supervised prediction problems.
2. Based on the functioning of the human brain.
3. Enables us to construct, train and validate multilayer feed
forward neural networks.
4. Each hidden unit is a non-linear transformation of a linear
combination of its inputs.
5. Excellent predictive ability but difficult to interpret the
results
6. Learning mechanism: trains itself over time and fresh data
For example,
Neural networks are an industry standard for detecting fraud in the credit card industry.
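The statement that each hidden unit is a non-linear transformation of a linear combination of its inputs can be sketched as a forward pass through a small feed-forward network. The choice of tanh for the hidden units, a sigmoid output, and every weight value below are assumptions for illustration; training (adjusting the weights) is not shown.

```python
import math
from typing import Sequence


def hidden_unit(inputs: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear combination of the inputs followed by a non-linear squashing (tanh)."""
    linear = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(linear)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(
    inputs: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],
    hidden_biases: Sequence[float],
    output_weights: Sequence[float],
    output_bias: float,
) -> float:
    """One hidden layer, one output unit that returns a value between 0 and 1."""
    hidden = [hidden_unit(inputs, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    z = sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
    return sigmoid(z)


# Two scaled inputs (hypothetical) and two hidden units with made-up weights.
probability = forward(
    inputs=[0.4, 0.9],
    hidden_weights=[[0.5, -1.2], [1.5, 0.3]],
    hidden_biases=[0.1, -0.4],
    output_weights=[0.8, -0.6],
    output_bias=0.05,
)
print(probability)
```

The difficulty of interpretation mentioned in point 5 shows up even here: the output depends on the inputs only through nested non-linear functions of the weights.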
Neural Networks: Examples
1. A neural network model can be used to study the effectiveness of a particular campaign by analyzing whether or not customers responded to a recent promotion.
2. This can further be used for prediction in future.
3. Target – Response (Yes = 1, No = 0)
4. Input – Age, Income, Married, FICO, Gender, Own Home, No. of purchases, Value of purchases and others.
5. Weights and graphs of variables are shown as output to study the weight of input variables.
6. The response rate of the campaign can be increased using
the results from these models by targeting the right
3. Cluster Analysis
1. Cluster Analysis is an unsupervised classification method that divides data into classes that are homogeneous with respect to the inputs.
2. Clustering is based on the principle: maximize the intra-class similarity and minimize the inter-class similarity at the same time.
3. It helps in segmenting existing customers into groups and associating a distinct profile with each group to help in future marketing strategies.
4. Common techniques - K-Means Clustering, Hierarchical Clustering
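A minimal K-Means sketch, assuming two numeric attributes per customer and initial centroids supplied by the caller (real tools usually pick starting centroids at random and rescale the inputs first). The function names and the customer values are invented for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(
    points: Sequence[Point],
    initial_centroids: Sequence[Point],
    max_iter: int = 100,
) -> Tuple[List[Point], List[int]]:
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    centroids = list(initial_centroids)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical customers described by (age, income in thousands).
customers = [(25, 30), (27, 32), (26, 29), (55, 80), (58, 85), (60, 82)]
centroids, labels = k_means(customers, initial_centroids=[customers[0], customers[3]])
print(labels)
print(centroids)
```

Each resulting centroid can be read as the profile of its segment, which is how the cluster descriptions in a customer segmentation are usually derived.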
Cluster Analysis: Examples
• Not all customers can be fitted into a single model.
• Clustering helps organize data into clusters and then fit an individual model for each cluster.
• For customer profile analysis for marketing, up-selling or default studies, clustering allows datasets to be classified into groups.
• Clusters can give us results as follows:
• Cluster 1 - Married persons living in Climate Zone 10
• Cluster 2 - Married persons living in Climate Zone 30
• Cluster 3 - Married persons living in Climate Zone 20
• Cluster 4 - Younger unmarried persons, lower FICO score, living in Climate Zone 10
• Cluster 5 - Younger unmarried persons, higher incomes, living in Climate Zone 20
• Cluster 6 - Younger unmarried women living in Climate Zone 20 or 30.
• These clusters can be used for marketing, depending on line of business and campaigns. For example, Cluster 1 may have a very high percentage of people who have taken home loans, so customers with a similar profile can be
Application Areas
• Classification / Categorical Analysis
– Logistic Regression, Support Vector Machine, Naïve Bayes, Adaptive Bayes
• Clustering / Association / Neighborhood Analysis
– K-Means, K-Nearest Neighbor, O-Cluster, Association
• Decision Trees
– CART / CHAID
• Parameter Selection & Improvement
– Factor Selection / Non-Negative Matrix Factorization / Attribute Importance
• Forecasting
• Optimization
• Others
– Regression, ANOVA
Applications and Mining Techniques
• Prospect targeting
• Call planning
• Marketing optimization
• Sales force optimization
• Propensity to churn
• Customer Segmentation
• Performance attribution
• Funds/fees analysis
• Fund benchmarking
• Revenue forecasting
• Demand forecasting
• Probability of default
• Loss given default
• Probability of claim
• Underwriting Scoring
• Fraudulent identification
• ALM/FTP models (core segregation, attrition, matched maturity)
• Economic capital modeling
• Clinical Intelligence
Data Mining applications across different
domains
Use of Decision Trees in Banking
1. Decision trees are used to select the best course of action
in situations where you face uncertainty.
2. A decision tree is a predictive model; that is, a mapping of observations about an item to conclusions about the item's target value.
For example,
What rules should a bank follow so that the response rate of a population to a Personal Loan marketing campaign is 80%?
Case Study for Decision Tree
[Figure: decision tree for bad home loans]
Root: N = 5000, 10% BAD
Split on Debt_Inc < 45:
– Yes: N = 3350, 5% BAD
– No: N = 1650, 21% BAD
Further splits on No. of Delinquent Trade lines < 2 and Income > 500000 p.a. (Yes/No), with nodes: N = 1000, 30% BAD; N = 650, 4% BAD; N = 2500, 1% BAD; N = 1000, 15% BAD
Note: Probability of a bad home loan is higher if Debt_Inc > 45 and Income < 5000000 p.a.
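To show how a split such as the first one in the tree above can be scored, the sketch below computes the reduction in Gini impurity for the Debt_Inc split, taking the slide's rounded node sizes and BAD percentages at face value. Gini impurity is one common criterion (it is the one used by CART); the slide does not state which criterion produced this tree, and the function names are illustrative.

```python
from typing import Sequence, Tuple

Node = Tuple[int, float]  # (number of records, share of BAD records)


def gini(bad_rate: float) -> float:
    """Gini impurity of a two-class node: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    return 2.0 * bad_rate * (1.0 - bad_rate)


def gini_gain(parent: Node, children: Sequence[Node]) -> float:
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    parent_n, parent_bad = parent
    weighted = sum(n / parent_n * gini(bad) for n, bad in children)
    return gini(parent_bad) - weighted


root = (5000, 0.10)                             # N = 5000, 10% BAD
debt_inc_split = [(3350, 0.05), (1650, 0.21)]   # Debt_Inc < 45: Yes / No
gain = gini_gain(root, debt_inc_split)
print(gini(root[1]), gain)
```

A tree-growing algorithm would compute this gain for every candidate variable and cutoff and pick the split with the largest value, then repeat within each child node.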
Logistic Regression in Banking
• Logistic regression can be used to conduct a marketing campaign up-selling a credit card to customers that have accounts in banks.
• Predicted variable - Target in terms of 1 or 0, where Possessing a Credit Card = 1 and Not possessing a Credit Card = 0
• Input variables: Income, Age, Home Owner, Service, Account Age, Account Balance, Other credit cards
Note - This will help us detect the most important predictor variables for effective targeting of Credit Cards.
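A minimal sketch of fitting a logistic regression with a 0/1 target by plain gradient descent. Only two of the listed inputs are used (a scaled income and a home-owner flag), and the data, learning rate and epoch count are invented for illustration; a statistical package would normally be used and would also report significance for each predictor.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Estimated probability that the target is 1 for one customer."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def fit_logistic(
    rows: Sequence[Sequence[float]],
    targets: Sequence[int],
    learning_rate: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Batch gradient descent on the average log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, t in zip(rows, targets):
            error = predict_probability(weights, bias, x) - t
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Toy customers: (income scaled to 0..1, home owner flag); target 1 = has a credit card.
rows = [(0.2, 0), (0.3, 0), (0.4, 1), (0.5, 0), (0.7, 1), (0.8, 1), (0.9, 0), (1.0, 1)]
targets = [0, 0, 0, 0, 1, 1, 1, 1]
weights, bias = fit_logistic(rows, targets)
print(weights, bias)
print(predict_probability(weights, bias, (1.0, 1)), predict_probability(weights, bias, (0.2, 0)))
```

The sign and relative size of the fitted weights give a first indication of which inputs matter for targeting, provided the inputs are on comparable scales.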
Multiple Linear Regression: Example
• Customer payment behavior can be modeled with a multiple regression model.
• Input variables: Income, Age, Gender, Payment Pattern, No. of times delinquent, Homeowner, No. of dependents, Total amount of loans
• Target variable - Ordinal values (1-10) can be used to signify bad to excellent payment patterns.
• Prediction of future payment can be done based on this multiple regression
Analytics in Insurance
Areas: Claims Servicing, Underwriting, Customer Management
• Modeling Probability of Claim
• Optimize Claims Processing
• Detect and Prevent Claims Fraud
• Pure Premium Modeling
• Claim Frequency Modeling
• Claim Severity Modeling
• Claims Estimation Analysis
• Pricing of insurance policies
• Risk assessment and Pricing
• Portfolio Risk Management
• Business Performance Analysis
• Customer Segmentation
• Attrition / Prediction Analytics
• Cross-Sell, Up-sell of Products
• Development of New Products
• Customer Retention Strategy
Techniques: Clustering, Decision Trees, Association Analysis, Multivariate Analysis, Linear Regression, Neural Networks, Logistic Regression, Outlier Detection, Variable Selection, Support Vector Machine
Regression Outcome
Shows the effect of each independent variable on probability of Claim.
Customer Analytics – Predicting Campaign Effectiveness
and targeting the correct customer segments…
Optimize marketing effectiveness by:
• minimizing the cost per campaign per customer
• maximizing revenue per campaign per customer
Business Goal
To achieve the goal of optimizing marketing effectiveness, the following parameters have to be identified so that they yield the best results:
• customer segments
• campaign medium
• offer type
Problem Definition
• Use historic campaign data to statistically determine customer segments
• Analyze response data from past campaigns with respect to revenue achieved from the campaign for a given customer
• Use the above analysis to come up with a model that predicts probability of revenue generation given a combination of the parameters: customer demographics, campaign medium, offer type
• For all future campaigns, use the model to match the right medium to the appropriate customer segment
Campaign Analysis & Segmentation
Customer Segmentation based on Campaign Channel in each Campaign Type
a. Retail Fare Sale
i. E-mail
ii. Direct Mail
iii. Advertisement
iv. Partner
b. Competitive Response sale
i. E-mail
ii. Direct Mail
iii. Advertisement
iv. Partner
c. Frequent Flyer Acquisition Drive
i. E-mail
ii. Direct Mail
iii. Advertisement
iv. Partner
d. Partner Offer
i. E-mail
ii. Direct Mail
iii. Advertisement
iv. Partner
Maximum Response Rate & Profitability In which Channel for which campaign type
Defect Cost Analysis
Market Basket Analysis: Association Rule Mining
• An association rule is a statement of the form (Beer) → (Diapers).
• Support of the rule Beer → Diapers is estimated by:
(number of transactions that contain Beer and Diapers) / (total number of transactions in the database)
• Confidence of an association rule A → B is the conditional probability of a transaction containing item set B given that it contains item set A. For Beer → Diapers it is estimated by:
(number of transactions that contain Beer and Diapers) / (number of transactions that contain Beer)
Case Study for Association Analysis
                        Checking Account: No   Checking Account: Yes   Total
Savings Account: No             500                    3500             4000
Savings Account: Yes           1000                    5000             6000
Total                                                                  10000
Support (SVG → CK) = 50%
Confidence (SVG → CK) = 83%
Lift (SVG → CK) = .83/.85 < 1
Support and confidence alone suggest a strong rule. However, those without a savings account are even more likely to have a checking account (87.5%).
Savings and checking are negatively correlated.
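The support, confidence and lift figures in this case study can be reproduced from the four cell counts of the table (500, 3500, 1000, 5000). The sketch below does the arithmetic; the function and argument names are illustrative.

```python
from typing import Tuple


def rule_metrics(
    both: int,
    antecedent_only: int,
    consequent_only: int,
    neither: int,
) -> Tuple[float, float, float]:
    """Support, confidence and lift of the rule antecedent -> consequent."""
    total = both + antecedent_only + consequent_only + neither
    support = both / total
    confidence = both / (both + antecedent_only)
    consequent_rate = (both + consequent_only) / total
    lift = confidence / consequent_rate
    return support, confidence, lift


# Rule: Savings -> Checking, using the counts from the case study.
support, confidence, lift = rule_metrics(
    both=5000,             # savings and checking
    antecedent_only=1000,  # savings, no checking
    consequent_only=3500,  # checking, no savings
    neither=500,
)
print(support, confidence, lift)

# Share of customers without a savings account who hold a checking account.
print(3500 / (3500 + 500))
```

A lift below 1 means that customers with a savings account are slightly less likely to hold a checking account than customers overall, despite the high confidence.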
Data Mining applications across different
business functions
CRM Analytics
Understanding Customer Profitability
Understanding Customer
Retention/Attrition
Deepening Customer Relationships
• New Business Analysis
• Cross selling / Upselling
• Cross Product Holding
• Acquisition Pattern
• Service Request Analysis
• Customer Behavior
• Targeted Campaign
• Response analysis
• Customer Profiling
• Channel Behavior
• Analytic and Mining Models
• Off the Shelf CRM Tools
[Figure: campaign management process flow. Labels recovered from the diagram:]
• Split Customer List; Assignment Model: Right Product/Right Customer/Right Channel using SAS linear programming
• Campaign Objective; Campaign Description
• Customer Selection; Customer Exclusions Criteria: Response Propensity, Purchase Score, Credit Score, Relationship value, Other Business Criteria
• Cross Sell, Upsell, Churn, Retention, Segmentation, Customer transition to higher segment
• Data Mining Techniques; Model Selection
• Customer Selection Based on other attributes; Customer Attributes: Age/Gender/Relationship details
• Select Customers who did not Respond to past campaign and are poor on other parameters
• Response Propensity Score for maximum likelihood of response
• Tele Marketing, Direct Mail, Walk in to Advisors, Other Medium
• Siebel Schedules & Executes Campaign
• Campaign Effectiveness Evaluation Model (Response & Returns to Cost)
• Response To Multiple Campaigns Stored as history
• Filtered Customer List; Filtered Customer List based on how old the relationship is
• Owner of Campaign; Budget, Costs and Segmentation; Other constraints
• Reporting & Analysis (use SAS BI / MicroStrategy / MS Excel)
HR Analytics - The potential
Hire to Retire
• Demand Vs Supply
• Required Vs Skill
• Aspiration Vs Actual
• Growth Vs Capability
• Measurement Vs Measurable
• Rewards Vs Budgets
• Benefits Vs Cost
• Separation Vs Retention
HR Function Challenges
• Demographics
• Productivity
• Health & Safety
• Relations & Satisfaction
• Turnover & Mobility
• Planning
• Staffing & Recruiting
• Compensation & Benefits
• Training & Development
• Training cost per employee
• Impact on performance
HR Functions: Learning & Skill dev, Compensation & Rewards, Potential & Performance, Workforce Recruitment & Attrition, Succession Planning
• Compensation Analysis
• Rewards & Benefits Administration
• Salary / Compensation information including overtime costs
• Recruiting & Talent Management
• HR Performance Analysis
• Impact of absenteeism and tardiness
• Cost per Hire
• Employee Analysis
• Work Force Profile & Compliance Analysis
• Headcount/Turnover
• Derive Hindsight, Insight and Foresight into your HR to derive Strategic imperatives
• DW ,OLAP & Reporting– Hindsight
• Statistical & Data Mining Techniques – Foresight & Insight
•Data models/Hierarchy/Metrics
•Reports
•DB / KPIs
1. Work Force Analysis
a. Headcount Analysis
b. Turnover Analysis
c. Hiring Trend Analysis
d. Overtime Usage Analysis
e. Affirmative Action Analysis
2. Compensation Analysis
a. Variable Compensation Analysis
b. Merit Distribution Analysis
c. Compa-Ratio Analysis
d. Total Compensation Analysis
e. Year Over Year Analysis
3. Benefits Analysis
a. Plan Participation Analysis
c. Benefits Cost Analysis
d. Disability Fraud Analysis
Hexaware Jumpstart Analytic
Pack
Q & A
You can also reach us at
Thank You For Attending
For a recording of this webinar please visit:
http://www.hexaware.com/webcastarchive1.html
Hexaware Webinar Series
Upcoming Webinars
Download Case Studies & Whitepapers at: www.hexaware.com
Register Today!
How can banks leverage analytics across various perspectives protecting their current investment in technology?
Extend PeopleSoft Enterprise Applications with Oracle Fusion