Hexaware Webinar Series Presents: The Presentation Will Begin Momentarily

(1)


Data Mining - Seeing the future and knowing the patterns of your business using your organization data

Rajesh Natarajan – Senior Consultant, Hexaware Technologies

(2)

Our mission: To build value for clients through innovative use of technology and talent

Business Areas: Transportation, ERP/HRIT, Insurance, BFS

India’s Fastest Growing Mid-Sized Company

32 offices worldwide

18 Global locations

17 years of technology outsourcing expertise

55 Global 500 clients

166 Clients served worldwide

USD 187 million in revenues, 2006

6900 Employees worldwide

(3)

Strategies and Strengths

Core Competency

Management of business-critical applications offshore

Organization Traits

Consultative approach, Responsive and Result-oriented

Robust Backbone

World-class infrastructure, Flexible delivery models, SEI CMMi Level 5, BS7799

Track Record

88% repeat business; offshore transition expertise

Global Delivery

Leading BFSI service provider with proprietary products (Operational Risk, Collections, Leasing, Wealth Management)

#1 airline services provider in India: 8 of the top 10 airlines are our clients

#1 provider of HR-IT services in India: 500+ projects, 750+ resources

Specialized Insurance service provider: Content management, Fraud Mgmt, Work flow, SOX, BPO

LEADERSHIP THROUGH FOCUS

ENHANCING VALUE

(4)

Data Mining - Seeing the future and knowing the patterns of your business using your organization data

(5)

Agenda

Introduction: Data Mining and its necessity

Data Mining Vs OLAP/Statistics

A Perspective on Data Mining: Functionalities and Tasks

A Process-oriented view of Data Mining

Important Data Mining Techniques

Regression

Neural Networks

Cluster Analysis

Data Mining Applications across different domains

Banking analytics

Insurance Analytics

Airlines Analytics

Retail Analytics

Data Mining Applications across different business functions

CRM Analytics

HR Analytics

(6)

Current Business Landscape

• Increased competition due to globalization

• Barriers to entry reduced due to factors such as the Internet, Outsourcing and other innovative trends

• Advances in technology creating a level playing field resulting in smaller margins

• Lower life-span of product and service models due to dynamic environmental conditions

• Increasing pressure to reduce the time to market.

• Higher customer expectations: quality, cost and customization

• Products catering to individual customers

• Marketing to one in a population of six billion

• Advances in Data capture, processing, storage and retrieval technologies

• Data flood & Information overload

Changing factors of competitive advantage:

• Size: Economies of scale, Economies of scope

• Technology: Process efficiencies, New Services

• Knowledge: Process and Product Innovation

(7)

Need for Business Intelligence and Analytics

• Data capture and storage has become cheap and convenient

• Transactional information captured as part of the business process

• Valuable customer, business and stakeholder information hidden in captured data

• Development of new disciplines that enable extraction of information from captured data

• Extracted information can be used as a competitive advantage to drive business innovations

“As the trend to gather more data from all kinds of processes will increase, so will the competitive pressure to derive value out of it. Analytics have increased in importance as enterprises recognize their potential for alleviating the paralyzing condition known as "info glut", an overwhelming information and data overload. Enterprises may pay for their failure to invest in analytics with decreased productivity and inferior decision making.”

(8)

Evolution of Database Technology

• 1960s:

– Data collection, database creation, IMS and network DBMS

• 1970s:

– Relational data model, relational DBMS implementation

• 1980s:

– RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.)

• 1990s-2000s:

– Data mining and data warehousing, multimedia databases, and Web databases

(9)


Information Dimensions and Data Sources

Information dimensions: Enterprise Operational Information, Business Information, Customer Information, Economic Indicators

• Relational databases

• Data warehouses

• Transactional databases

• Advanced DB and information repositories

– Object-oriented and object-relational databases

– Spatial databases

– Time-series data and temporal data

– Text databases and multimedia databases

– Heterogeneous and legacy databases

(10)

What is Data Mining? (Knowledge Discovery in Databases)

• Data Mining or Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data

• Validity:

– The discovered patterns must be valid on new data with some degree of certainty.

• Novelty:

– The patterns are novel (at least to the system).

– Novelty is measured with respect to changes in data

– Comparison of current values to previous or expected values

• Potentially Useful:

– The discovered patterns should lead to some useful actions

• Ultimately Understandable

(11)

[Figure: Data Mining shown in relation to Database Management Systems, Machine Learning, Statistics, Pattern Recognition, Visualization, Artificial Intelligence and Expert Systems]

(12)

• Some experts look at Data Mining as a stage in the KDD (Knowledge Discovery in Databases) process

• Secondary analysis of data collected for some other purpose

– Customer segmentation using customer transactional and demographic data

• Traditional statistical techniques may not be applicable due to large sizes of the datasets

• Data Mining is an iterative and interactive process

• Emphasis is on automated/semi-automated techniques

• End-user (business-user) involvement is an important goal

• Descriptive and Predictive techniques answer the “why,” “how,” “who,” and “what will happen” type of business questions.

(13)

(14)

(15)

OLAP Vs Data Mining

• On-Line Analytical Processing (OLAP)

– Presentation Tool, Reports on Data

– Designed for faster response

– Metrics can be viewed at any level of detail

– Hypothesis driven

– User makes hypotheses about the data and looks at the data for confirmation of hypothesis

– Problems arise when there are a large number of variables

• Data Mining

– Not a Presentation tool

– Discovers hidden, implicit, significant and actionable patterns in the Data

– Brings out all the hypotheses fitting the data (exploratory analysis)

– No bias; lets the data “talk”

(16)

(17)

Statistics and Data Mining

• Statistical Analysis

– Analysis of primary data

– Data collected to test a specific hypothesis

– Experimental data also collected

• Data Mining

– Secondary data that is collected for other reasons

– Unbiased, but important data could be missing

– Typically, observational data

• Data Mining deals with large databases

• Many databases do not follow a classical form of data organization, for example, data that comes from the Internet

• There should be a link between the results of data mining and business actions

(18)

Statistics and Data Mining

• Two main criticisms of Data Mining

• In Data Mining, there is not just one theoretical model but several models in competition with each other

• Model is chosen depending on the data available

• Criticism 1: It is always possible to find a model, however complex, which will adapt well to the data

• Criticism 2: Great amount of data might lead to non-existent relationships being found in the data

• While choosing models great attention is paid to the possibility of generalizing results.

• Predictive performance is considered and more complex models are penalized

(19)


The Primary Tasks of Data Mining

• High-level primary goals of Data Mining

– Prediction

– Description

• Prediction

– Using some variables or fields in a database to predict unknown or future values of a variable of interest.

– Regression

– Classification

• Description

– Focus is on finding human-interpretable patterns describing the data

– Clustering

– Summarization

• In the context of Data Mining, description tends to be more important than prediction.

(20)


Data Mining Functionalities or Tasks (1)

• Concept description: Characterization and discrimination

– Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions

• Association (correlation and causality)

– Multi-dimensional vs. single-dimensional association

– age(X, “20..29”) ^ income(X, “20..29K”) → buys(X, “PC”) [support = 2%, confidence = 60%]

– contains(T, “computer”) → contains(x, “software”) [1%,

(21)


Data Mining Functionalities or Tasks (2)

• Classification and Prediction

– Finding models (functions) that describe and distinguish classes or concepts for future prediction

– E.g., classify countries based on climate, or classify cars based on gas mileage

– Presentation: decision-tree, classification rule, neural network

– Prediction: Predict some unknown or missing numerical values

• Cluster analysis

– Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns

– Clustering based on the principle: maximizing the intra-class similarity and minimizing the interclass similarity

(22)


Data Mining Functionalities or Tasks (3)

• Outlier analysis

– Outlier: a data object that does not comply with the general behavior of the data

– It can be considered as noise or exception but is quite useful in fraud detection, rare events analysis

• Trend and evolution analysis

– Trend and deviation: regression analysis

– Sequential pattern mining, periodicity analysis
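The outlier idea above can be sketched in a few lines. This is only an illustration, assuming a robust "modified z-score" rule based on the median and the median absolute deviation (MAD); the slide does not name a specific method, and the function name, the threshold of 3.5 and the transaction amounts are all hypothetical.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    The median and the median absolute deviation (MAD) are used because
    they are distorted far less by the outliers themselves than the mean
    and standard deviation would be.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # scores are undefined when the MAD is zero, so flag nothing
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]

# Hypothetical card transaction amounts; one is far from the rest.
amounts = [52, 48, 50, 51, 49, 47, 53, 500]
suspicious = mad_outliers(amounts)
```

In a fraud setting the flagged values would be candidates for review, not proof of fraud.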

(23)

(24)


(25)

Data Mining: A KDD Process

– Data mining: the core of the knowledge discovery process.

[Figure: KDD process flow. Stages shown: Databases, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation]

(26)

(27)

Important Data Mining Techniques

•Regression

•Neural Networks

•Cluster Analysis

(28)

1. Regression: A Prediction Technique

• Linear Regression - Target is continuous or interval based.

• Logistic Regression - Target is discrete.

Regression uses the following variable selection techniques:

– Forward Selection - First selects the best one-variable model, then the best two-variable model, and so on.

– Backward Selection - Begins with the full model and drops the least significant variable, one at a time (cutoff: the stay p-value).

– Stepwise Selection - Modification of Forward Selection in which variables already in the model may be removed depending on their level of significance.

For Example:

From a set of variables such as Age, Income, Gender, Location, Occupation, Years of experience, Education and many others, which technique can one use for variable selection?
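As a rough sketch of forward selection for the question above: the code below adds one variable at a time, always the one that reduces the residual sum of squares (RSS) of a least-squares fit the most. Note the assumptions: it stops on a minimum share of explained variation (`min_gain`) instead of the significance (p-value) cutoffs that statistical packages normally use, and the variable names and the ±1 coded data are invented for illustration.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on an intercept plus columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def forward_selection(candidates, y, min_gain=0.05):
    """Add one variable at a time, always the one that lowers the RSS the most."""
    selected = []
    baseline = current = rss([], y)
    if baseline == 0:
        return selected  # y is constant; nothing to explain
    while len(selected) < len(candidates):
        trials = {name: rss([candidates[n] for n in selected] + [col], y)
                  for name, col in candidates.items() if name not in selected}
        best = min(trials, key=trials.get)
        if (current - trials[best]) / baseline < min_gain:
            break  # no remaining variable improves the fit enough
        selected.append(best)
        current = trials[best]
    return selected

# Hypothetical inputs coded as +1 / -1 flags; the target ignores gender_f.
candidates = {
    "age_over_40": [1, 1, -1, -1, 1, 1, -1, -1],
    "high_income": [1, 1, 1, 1, -1, -1, -1, -1],
    "gender_f": [1, -1, 1, -1, 1, -1, 1, -1],
}
y = [10 + 3 * inc + 2 * age
     for inc, age in zip(candidates["high_income"], candidates["age_over_40"])]
chosen = forward_selection(candidates, y)
```

On this toy data the procedure picks `high_income` first, then `age_over_40`, and leaves out `gender_f`, which carries no information about the target.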

(29)

2. Neural Networks

1. Neural networks are a class of flexible non-linear models used for supervised prediction problems.

2. Based on the functioning of the human brain.

3. Enables us to construct, train and validate multilayer feed-forward neural networks.

4. Each hidden unit is a non-linear transformation of a linear combination of its inputs.

5. Excellent predictive ability, but the results are difficult to interpret.

6. Learning mechanism: trains itself over time and on fresh data.

For example,

Neural networks are an industry standard for detecting fraud in the credit card industry.
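To make the hidden-unit description concrete, here is a minimal forward pass through a network with one hidden layer. It is a sketch under assumptions: tanh is chosen as the non-linear transformation and a logistic function as the output, neither of which the slide specifies, and the weights are made-up, untrained numbers.

```python
import math

def hidden_unit(inputs, weights, bias):
    """Non-linear transformation (tanh) of a linear combination of the inputs."""
    return math.tanh(bias + sum(w * x for w, x in zip(weights, inputs)))

def predict(inputs, hidden_layer, output_weights, output_bias):
    """Forward pass: inputs -> hidden units -> probability of the positive class."""
    hidden = [hidden_unit(inputs, w, b) for w, b in hidden_layer]
    z = output_bias + sum(w * h for w, h in zip(output_weights, hidden))
    return 1.0 / (1.0 + math.exp(-z))  # logistic output, always between 0 and 1

# Hypothetical, untrained weights: two inputs feeding two hidden units.
hidden_layer = [([0.8, -0.4], 0.1), ([-0.3, 0.9], 0.0)]
output_weights = [1.2, -0.7]
score = predict([0.5, 1.0], hidden_layer, output_weights, output_bias=-0.2)
```

Training would adjust the weights so that `score` matches known outcomes; that step is omitted here.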

(30)

Neural Networks: Examples

1. A neural network model can be used to study the effectiveness of a particular campaign; it helps us analyze whether or not customers responded to a recent promotion.

2. This can further be used for prediction in future.

3. Target – Response (Yes = 1, No = 0)

4. Input – Age, Income, Married, FICO, Gender, Own Home, No. of purchases, Value of purchases and others.

5. Weights and graphs of variables are shown as output to study the weight of input variables.

6. The response rate of the campaign can be increased using the results from these models by targeting the right

(31)

3. Cluster Analysis

1. Cluster Analysis is an unsupervised classification method that divides data into classes that are homogeneous with respect to the inputs.

2. Clustering is based on the principle: maximize the intra-class similarity and minimize the inter-class similarity at the same time.

3. It helps in segmenting existing customers into groups and associating a distinct profile with each group to help in future marketing strategies.

4. Common techniques - K-Means Clustering, Hierarchical
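A bare-bones K-Means loop shows how the grouping works in practice. This is a sketch, not a production implementation: the first `k` points are used as starting centroids (real tools use smarter initialization), the number of iterations is fixed, and the customer data (age, income in thousands) is hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain K-Means: alternate between assigning points and moving centroids."""
    centroids = [list(p) for p in points[:k]]  # naive, deterministic start
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical customers as (age, income in thousands).
customers = [(25, 30), (27, 32), (24, 28), (55, 90), (58, 95), (60, 88)]
labels, centroids = kmeans(customers, k=2)
```

Here the younger, lower-income customers end up in one cluster and the older, higher-income customers in the other. With real data the inputs would normally be scaled first so that no single variable dominates the distance.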

(32)

Cluster Analysis: Examples

• Not all customers can be fitted into a single model.

• Clustering helps organize data into clusters and then fit an individual model for each cluster.

• For a customer profile analysis for marketing, up-selling or a default study, clustering allows the dataset to be grouped into segments.

• Clusters can give us results as follows:

• Cluster 1 - Married persons living in Climate Zone 10

• Cluster 2 - Married persons living in Climate Zone 30

• Cluster 3 - Married persons living in Climate Zone 20

• Cluster 4 - Younger unmarried persons, lower FICO score, living in Climate Zone 10

• Cluster 5 - Younger unmarried persons, higher incomes, living in Climate Zone 20

• Cluster 6 - Younger, unmarried women living in Climate Zone 20 or 30.

• These clusters can be used for marketing, depending on line of business and campaigns. For example, Cluster 1 may have a very high percentage of people who have taken home loans, so customers with a similar profile can be

(33)

(34)

Application Areas

Mining Techniques

• Classification / Categorical Analysis

– Logistic Regression, Support Vector Machine, Naïve Bayes, Adaptive Bayes

• Clustering / Association / Neighborhood Analysis

– K-Means, K-Nearest Neighbor, O-Cluster, Association

• Decision Trees

– CART / CHAID

• Parameter Selection & Improvement

– Factor Selection / Non-Negative Matrix Factorization / Attribute Importance

• Forecasting

• Optimization

• Others

– Regression, ANOVA

Applications

• Prospect targeting

• Call planning

• Marketing optimization

• Sales force optimization

• Propensity to churn

• Customer Segmentation

• Performance attribution

• Funds/fees analysis

• Fund benchmarking

• Revenue forecasting

• Demand forecasting

• Probability of default

• Loss given default

• Probability of claim

• Underwriting Scoring

• Fraud identification

• ALM / FTP models (core segregation, attrition, matched maturity)

• Economic capital modeling

• Clinical Intelligence

(35)

Data Mining applications across different domains

(36)

(37)

Use of Decision Trees in Banking

1. Decision trees are used to select the best course of action in situations where you face uncertainty.

2. A decision tree is a predictive model; that is, a mapping of observations about an item to conclusions about the item's target value.

For example,

What rules should a bank follow so that the response rate for a population to a Personal Loan marketing is 80%?

(38)

Case Study for Decision Tree

N = 5000, 10% BAD

Debt_Inc < 45?

– Yes: N = 3350, 5% BAD. No. of Delinquent Trade lines < 2?

    Yes: N = 2500, 1% BAD

    No: N = 1000, 15% BAD

– No: N = 1650, 21% BAD. Income > 500000 p.a.?

    Yes: N = 650, 4% BAD

    No: N = 1000, 30% BAD

Note: Probability of a bad home loan is higher if Debt_Inc > 45 and Income < 5000000 p.a.
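One way to see why a tree would split on Debt_Inc first is to measure how much the split purifies the nodes. The sketch below uses Gini impurity, the criterion used by CART; that choice is an assumption, since the slide does not say how the tree was grown. Only the root node (N = 5000, 10% bad) and its two children (N = 3350 at 5% bad, N = 1650 at 21% bad) are taken from the case study, and those percentages are rounded.

```python
def gini(bad_rate):
    """Gini impurity of a node with two classes (good / bad)."""
    return 2 * bad_rate * (1 - bad_rate)

def impurity_reduction(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes.

    Each node is a (count, bad_rate) pair.
    """
    total = sum(n for n, _ in children)
    weighted = sum(n / total * gini(rate) for n, rate in children)
    return gini(parent[1]) - weighted

# Root node and first split as printed in the case study (rounded percentages).
root = (5000, 0.10)
debt_inc_split = [(3350, 0.05), (1650, 0.21)]
gain = impurity_reduction(root, debt_inc_split)
```

A tree-growing algorithm would compute this reduction for every candidate split and keep the largest one, then repeat inside each child node.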

(39)

Logistic Regression in Banking

• Logistic regression can be used to run a marketing campaign that up-sells a credit card to customers who have bank accounts.

• Predicted variable - Target in terms of 1 or 0, where Possessing a Credit Card = 1 and Not possessing a Credit Card = 0

• Input variables: Income, Age, Home Owner, Service, Account Age, Account Balance, Other credit cards

Note - This will help us detect the most important predictor variables for effective targeting of Credit Cards.
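A small sketch of how such a model could be fitted: plain batch gradient descent on the log-loss. Everything specific here is an assumption for illustration, including the two inputs (income scaled to 0..1 and an "other credit cards" flag), the ten made-up customers, the learning rate and the number of epochs; a statistical package would normally do the fitting and also report significance for each input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, targets, learning_rate=0.5, epochs=5000):
    """Fit an intercept and one weight per input by batch gradient descent."""
    weights, intercept = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        # Prediction error (predicted probability minus actual 0/1) per customer.
        errors = [sigmoid(intercept + sum(w * x for w, x in zip(weights, row))) - t
                  for row, t in zip(rows, targets)]
        intercept -= learning_rate * sum(errors) / n
        weights = [w - learning_rate * sum(e * row[j] for e, row in zip(errors, rows)) / n
                   for j, w in enumerate(weights)]
    return intercept, weights

def probability(row, intercept, weights):
    """Predicted probability that the customer holds a credit card."""
    return sigmoid(intercept + sum(w * x for w, x in zip(weights, row)))

# Hypothetical customers: [income scaled to 0..1, holds other credit cards (0/1)].
rows = [[0.2, 0], [0.3, 1], [0.5, 1], [0.5, 1], [0.5, 0],
        [0.5, 0], [0.7, 0], [0.8, 1], [0.9, 0], [0.9, 1]]
has_card = [0, 0, 1, 0, 1, 0, 0, 1, 1, 1]

intercept, weights = fit_logistic(rows, has_card)
p_low = probability([0.2, 0], intercept, weights)
p_high = probability([0.9, 0], intercept, weights)
```

The sign and size of each fitted weight indicate how strongly that input pushes the predicted probability up or down, which is the basis for ranking predictors.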

(40)

Multiple Linear Regression: Example

• Payment behavior of customers can be modeled with a multiple regression model.

• Input variables: Income, Age, Gender, Payment Pattern, No. of times delinquent, Homeowner, No. of dependents, Total amount of loans

• Target variable - Ordinal values (1-10) can be used to signify bad to excellent payment patterns.

• Prediction of future payment can be done based on this multiple regression

(41)

(42)

(43)

(44)

Analytics in Insurance

Areas: Claims Servicing, Underwriting, Customer Management

• Modeling Probability of Claim

• Optimize Claims Processing

• Detect and Prevent Claims Fraud

• Pure Premium Modeling

• Claim Frequency Modeling

• Claim Severity Modeling

• Claims Estimation Analysis

• Pricing of insurance policies

• Risk assessment and Pricing

• Portfolio Risk Management

• Business Performance Analysis

• Customer Segmentation

• Attrition / Prediction Analytics

• Cross-Sell, Up-sell of Products

• Development of New Products

• Customer Retention Strategy

Techniques: Clustering, Decision Trees, Association Analysis, Multivariate Analysis, Linear Regression, Neural Networks, Logistic Regression, Outlier Detection, Variable Selection, Support Vector Machine

(45)

Regression Outcome

Shows the effect of each independent variable on probability of Claim.

(46)

(47)

Customer Analytics – Predicting Campaign Effectiveness and targeting the correct customer segments…

Optimize marketing effectiveness by:

• minimizing the cost per campaign per customer

• maximizing revenue per campaign per customer

Business Goal

To achieve the goal of optimizing marketing effectiveness the following parameters that yield the most optimal results have to be identified:

• customer segments

• campaign medium

• offer type

Problem Definition

• Use historic campaign data to statistically determine customer segments

• Analyze response data from past campaigns with respect to revenue achieved from the campaign for a given customer

• Use the above analysis to come up with a model that predicts probability of revenue generation given a combination of the parameters – customer demographics, campaign medium, offer type

• For all future campaigns use the model to match the right medium to the appropriate customer segment

(48)

Campaign Analysis & Segmentation

Customer Segmentation based on Campaign Channel in each Campaign Type

a. Retail Fare Sale

i. E-mail

ii. Direct Mail

iii. Advertisement

iv. Partner

b. Competitive Response sale

i. E-mail

ii. Direct Mail

iii. Advertisement

iv. Partner

c. Frequent Flyer Acquisition Drive

i. E-mail

ii. Direct Mail

iii. Advertisement

iv. Partner

d. Partner Offer

i. E-mail

ii. Direct Mail

iii. Advertisement

iv. Partner

Which channel gives the maximum response rate and profitability for each campaign type?
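The channel comparison can be sketched as a simple tally of responses per campaign type and channel. The record layout, function name and sample history are hypothetical, and only the response rate is computed; profitability would need revenue and cost per contact, which are not shown in the slides.

```python
from collections import defaultdict

def best_channel_per_campaign(records):
    """Return {campaign_type: (channel, response_rate)} for the best channel."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for campaign_type, channel, did_respond in records:
        sent[(campaign_type, channel)] += 1
        responded[(campaign_type, channel)] += int(did_respond)
    best = {}
    for (campaign_type, channel), n in sent.items():
        rate = responded[(campaign_type, channel)] / n
        if campaign_type not in best or rate > best[campaign_type][1]:
            best[campaign_type] = (channel, rate)
    return best

# Hypothetical campaign history: (campaign type, channel, customer responded?)
history = [
    ("Retail Fare Sale", "E-mail", True),
    ("Retail Fare Sale", "E-mail", False),
    ("Retail Fare Sale", "Direct Mail", False),
    ("Retail Fare Sale", "Direct Mail", False),
    ("Partner Offer", "Partner", True),
    ("Partner Offer", "Partner", True),
    ("Partner Offer", "E-mail", True),
    ("Partner Offer", "E-mail", False),
]
best = best_channel_per_campaign(history)
```

The same tally could be broken down further by customer segment to address the segmentation question on this slide.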

(49)

Defect Cost Analysis

(50)

(51)

Market Basket Analysis: Association Rule Mining

• An association rule is a statement of the form (Beer) ⇒ (Diapers).

• Support of the rule Beer ⇒ Diaper is estimated by

(number of transactions that contain Beer and Diaper) / (total number of transactions in the database)

• Confidence of the association rule Beer ⇒ Diaper is the conditional probability that a transaction contains Diaper given that it contains Beer. It is estimated by

(number of transactions that contain Beer and Diaper) / (number of transactions that contain Beer)
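The two ratios can be computed directly from a list of baskets. This is a minimal sketch with invented transactions; the function names are illustrative and not taken from any particular tool.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical baskets, each a set of purchased items.
baskets = [
    {"Beer", "Diaper", "Chips"},
    {"Beer", "Diaper"},
    {"Beer", "Bread"},
    {"Diaper", "Milk"},
    {"Bread", "Milk"},
]
rule_support = support(baskets, {"Beer", "Diaper"})          # 2 of 5 baskets
rule_confidence = confidence(baskets, {"Beer"}, {"Diaper"})  # 2 of the 3 Beer baskets
```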

(52)

Case Study for Association Analysis

Savings Account | Checking: No | Checking: Yes | Total
No              | 500          | 3500          | 4000
Yes             | 1000         | 5000          | 6000
Total           |              |               | 10000

Support (SVG ⇒ CK) = 50%

Confidence (SVG ⇒ CK) = 83%

Lift (SVG ⇒ CK) = .83/.85 < 1

At first sight this looks like a strong rule. However, those without a savings account are even more likely to have a checking account (87.5%).

Savings and checking are negatively correlated.
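The case-study figures can be reproduced from the four cell counts (500, 3500, 1000, 5000). The sketch below assumes the usual definition of lift as confidence divided by the overall share of customers with a checking account; the variable names are illustrative.

```python
# Counts from the case study: (has savings?, has checking?) -> number of customers
counts = {
    (True, True): 5000,
    (True, False): 1000,
    (False, True): 3500,
    (False, False): 500,
}
total = sum(counts.values())                              # 10000
savings = counts[(True, True)] + counts[(True, False)]    # 6000
checking = counts[(True, True)] + counts[(False, True)]   # 8500

svg_ck_support = counts[(True, True)] / total             # 0.50
svg_ck_confidence = counts[(True, True)] / savings        # about 0.83
expected_confidence = checking / total                    # 0.85
svg_ck_lift = svg_ck_confidence / expected_confidence     # about 0.98, below 1

# Share of customers without savings who still have a checking account.
no_savings_checking = counts[(False, True)] / (total - savings)  # 0.875
```

A lift below 1 means that knowing a customer has a savings account makes a checking account slightly less likely than it is in the customer base as a whole, despite the high confidence.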

(53)

(54)

Data Mining applications across different business functions

(55)

(56)

CRM Analytics

Understanding Customer Profitability

Understanding Customer Retention/Attrition

Deepening Customer Relationships

• New Business Analysis

• Cross selling / Upselling

• Cross Product Holding

• Acquisition Pattern

• Service Request Analysis

• Customer Behavior

• Targeted Campaign

• Response analysis

• Customer Profiling

• Channel Behavior

• Analytic and Mining Models

• Off the Shelf CRM Tools

(57)

[Figure: campaign management process. Recoverable elements: Campaign Objective and Campaign Description; Owner of Campaign; Budget, Costs and Segmentation; other constraints; Customer Selection and Customer Exclusions Criteria (Response Propensity, Purchase Score, Credit Score, Relationship value, other business criteria); objectives (Cross Sell, Upsell, Churn, Retention, Segmentation, Customer transition to higher segment); Data Mining Techniques and Model Selection; Customer Selection based on other attributes (Age / Gender / Relationship details); selection of customers who did not respond to a past campaign and are poor on other parameters; Response Propensity Score for maximum likelihood of response; Filtered Customer List, including filtering based on how old the relationship is; Split Customer List Assignment Model (Right Product / Right Customer / Right Channel, using SAS linear programming); channels (Tele Marketing, Direct Mail, Walk in to Advisors, Other Medium); Siebel schedules and executes the campaign; Campaign Effectiveness Evaluation Model (Response & Returns to Cost); responses to multiple campaigns stored as history; Reporting & Analysis (SAS BI / MicroStrategy / MS Excel)]

(58)

(59)

HR Analytics - The potential

Hire to Retire

• Demand vs Supply

• Required vs Skill

• Aspiration vs Actual

• Growth vs Capability

• Measurement vs Measurable

• Rewards vs Budgets

• Benefits vs Cost

• Separation vs Retention

HR Function Challenges

• Demographics

• Productivity

• Health & Safety

• Relations & Satisfaction

• Turnover & Mobility

• Planning

• Staffing & Recruiting

• Compensation & Benefits

• Training & Development

(60)

•Training cost per employee

•Impact on performance

HR Functions: Learning & Skill dev, Compensation & Rewards, Potential & Performance, Workforce Recruitment & Attrition, Succession Planning

•Compensation Analysis

•Rewards & Benefits Administration

•Salary / Compensation information including overtime costs

•Recruiting & Talent Management

•HR Performance Analysis

•Impact of absenteeism and tardiness

•Cost per Hire

•Employee Analysis

•Work Force Profile & Compliance Analysis

•Headcount/Turnover

• Derive Hindsight, Insight and Foresight into your HR to derive Strategic imperatives

• DW ,OLAP & Reporting– Hindsight

• Statistical & Data Mining Techniques – Foresight & Insight

•Data models/Hierarchy/Metrics

•Reports

•DB / KPIs

(61)

Hexaware Jumpstart Analytic Pack

1. Work Force Analysis

a. Headcount Analysis

b. Turnover Analysis

c. Hiring Trend Analysis

d. Overtime Usage Analysis

e. Affirmative Action Analysis

2. Compensation Analysis

a. Variable Compensation Analysis

b. Merit Distribution Analysis

c. Compa-Ratio Analysis

d. Total Compensation Analysis

e. Year Over Year Analysis

3. Benefits Analysis

a. Plan Participation Analysis

c. Benefits Cost Analysis

d. Disability Fraud Analysis

(62)

(63)

(64)

Q & A

You can also reach us at

(65)

Thank You For Attending

For a recording of this webinar please visit: http://www.hexaware.com/webcastarchive1.html

Hexaware Webinar Series

Upcoming Webinars

Download Case Studies & Whitepapers at: www.hexaware.com

Register Today!

How can banks leverage analytics across various perspectives protecting their current investment in technology?

Extend PeopleSoft Enterprise Applications with Oracle Fusion