An Overview of Predictive Analytics for Practitioners. Dean Abbott, Abbott Analytics


(1)

An Overview of Predictive Analytics for Practitioners

(2)

Thank You Sponsors

Empower users with new insights through familiar tools while balancing the need for IT to monitor and manage user created content. Deliver access to all data types across structured and unstructured sources.

www.microsoft.com/bi

Hortonworks develops, distributes and supports the only 100% Open Source distribution of Apache Hadoop architected, built and tested for enterprise deployments.

(3)

Dean Abbott

Co-founder and Chief Data Scientist at SmarterHQ, based in Indianapolis, Indiana

President of Abbott Analytics in San Diego, California

Internationally recognized data mining and predictive analytics expert with over two decades’ experience

Author of Applied Predictive Analytics (Wiley, 2014), co-author of IBM SPSS Modeler Cookbook (Packt Publishing, 2013).

Advisory board and instructor for UC Irvine

(4)

Speaker Social Media

@deanabb

http://www.linkedin.com/in/deanabbott/

abbottanalytics.blogspot.com/

www.abbottanalytics.com/

(5)

The Analyst’s Journey

Gain critical business and data analytics skills

Uncover insights and provide value to your organization

Put your knowledge to use immediately

(6)

An Overview of Predictive Analytics for Practitioners

(7)

What do Predictive Modelers do?

The CRISP-DM Process Model

CRoss-Industry Standard Process Model for Data Mining

Describes Components of Complete Data Mining Cycle from the Project Manager’s Perspective

Shows Iterative Nature of Data Mining

[Figure: CRISP-DM cycle diagram. Phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment, arranged around the Data.]

(8)

CRISP-DM: Business Understanding Steps

• Ask Relevant Business Questions

• Determine Data Requirements to Answer Business Question

• Translate Business Question into Appropriate Data Mining Approach

• Determine Project Plan for Data Mining Approach

Define Business Objectives: Background, Business Objectives, Business Success Criteria

Assess Situation: Inventory of Resources; Requirements, Assumptions, Constraints; Risks and Contingencies; Terminology; Costs and Benefits

Determine Data Mining Objectives: Data Mining Goals, Data Mining Success Criteria

Produce Project Plan: Project Plan, Initial Assessment of Tools & Techniques

(9)

Objectives

Business objective:

Random test mailing to NRA’s house file achieved an 11% response rate

Need a model that finds a population with a minimum response rate of 13.5% to be profitable

Modeling Objectives:

Develop a binary outcome model that will rank-order the current database based on propensity to respond to traditional mailing, optimizing at a cumulative average response rate of >= 13.5%.
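
The modeling objective above can be sketched in a few lines: rank records by model score, track the cumulative response rate at each mailing depth, and keep the deepest cutoff that still meets the 13.5% threshold. This is a minimal sketch, not the method used in the project; the function names and the tiny example data are hypothetical.

```python
def cumulative_response_rates(scores, responses):
    """Cumulative response rate at each depth when mailing in descending score order."""
    ranked = sorted(zip(scores, responses), key=lambda pair: pair[0], reverse=True)
    rates = []
    responders = 0
    for depth, (_, responded) in enumerate(ranked, start=1):
        responders += responded
        rates.append(responders / depth)
    return rates


def deepest_profitable_depth(scores, responses, min_rate=0.135):
    """Largest number of top-ranked records whose cumulative rate is still >= min_rate."""
    best = 0
    for depth, rate in enumerate(cumulative_response_rates(scores, responses), start=1):
        if rate >= min_rate:
            best = depth
    return best


# Hypothetical scored records (illustration only): model score and 0/1 response.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
example_responses = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(deepest_profitable_depth(example_scores, example_responses))
```

In practice the rates would be computed on validation data rather than on the records used to fit the model.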

(10)

CRISP-DM Step 2: Data Understanding Steps

Collect initial data

• Internal data: historical customer behavior, results from previous experiments

• External data: demographics & census, other studies and government research

• Extract superset of data (rows and columns) to be used in modeling

• Identify form of data repository: multiple vs. single table, flat file vs. database, local copy vs. data mart

Perform Preliminary Analysis

• Characterize Data (describe, explore, verify)

• Condition Data

Collect Initial Data: Initial Data Collection Report

Describe Data: Data Description Report

Explore Data: Data Exploration Report

Verify Data Quality: Data Quality Report

(11)

Source Data

Business partner provided data that summarizes transactional data for every active NRA member: 49 independent variables.

TN Marketing enhanced the database with demographic data: 18 appended variables.

I-Miner was used to derive new variable features and transformations of pre-existing data points: 79 derived variables.

(12)

CRISP-DM Step 3: Data Preparation (Conditioning) Steps

Select Data: Rationale for Inclusion/Exclusion

Clean Data: Data Cleaning Report

Construct Data: Derived Attributes, Generated Records

Integrate Data: Merged Data

Format Data: Reformatted Data

Fix Data Problems

(13)

Data Preparation

Key transformations

Date Features

Filling missing data

• Use “Distribution” when possible for numeric fields

• Use Constant for categoricals

• For numeric data with both “in-house” and third-party versions, use in-house when available,
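
The two fill rules above can be illustrated with a small sketch. It assumes that “Distribution” means replacing a missing numeric value with a random draw from the observed (non-missing) values of the same field, which is one common reading but is not confirmed by the slide; the function names and the `"MISSING"` constant are hypothetical.

```python
import random


def fill_numeric_from_distribution(values, rng):
    """Replace None with a random draw from the observed values (assumed meaning of 'Distribution')."""
    observed = [v for v in values if v is not None]
    if not observed:
        # Nothing to sample from; leave the field untouched.
        return list(values)
    return [v if v is not None else rng.choice(observed) for v in values]


def fill_categorical_with_constant(values, constant="MISSING"):
    """Replace None with a single constant category."""
    return [v if v is not None else constant for v in values]


rng = random.Random(0)  # fixed seed so the fill is repeatable
ages = [34.0, None, 51.0, 47.0, None]
regions = ["West", None, "South", None, "West"]
print(fill_numeric_from_distribution(ages, rng))
print(fill_categorical_with_constant(regions))
```

Sampling from the observed values keeps the shape of the field's distribution, whereas filling with a single mean or median would create a spike at that value.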

(14)

Data Size

Original Data

Data after data cleanup and feature creation

Data after further cleanup, and adding interaction terms

(15)

CRISP-DM Step 4: Modeling Steps

Select Modeling Techniques: Modeling Techniques, Modeling Assumptions

Generate Test Design: Test Design

Build Model: Parameter Settings, Models, Revised Parameter, Model Description

Algorithm Selection

Sampling

Algorithms

Model Ranking

(16)

Sampling

Randomly split the 21,557 records into two data sets, training and validation

Build response model on training data set: 10,778 records

Validate model by scoring validation data set: 10,779 records

Ideally, have a third held out data set to provide final assessment of
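
A random half-and-half split like the one on this slide could look as follows. This is only a sketch: the function name and the seed are hypothetical, and integer record IDs stand in for the real records.

```python
import random


def split_train_validation(records, seed=42):
    """Shuffle a copy of the records and split it into two halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Integer IDs stand in for the 21,557 records mentioned on the slide.
train, validation = split_train_validation(range(21557))
print(len(train), len(validation))  # 10778 10779
```

Fixing the seed makes the split reproducible, so the same records land in training and validation each time the model is rebuilt.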

(17)

Classifiers Find Different Decision Boundaries

[Figure panels: Actual Data, 11-Nearest Neighbor, Neural Network, Naïve Bayes, Logistic Regression, Decision Tree]
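
To make the slide title concrete, the sketch below trains two very simple classifiers on the same points and shows that they disagree on one query. It is an illustration under stated assumptions, not a reproduction of the figure: the data are hypothetical, the neighbor count is 3 rather than 11 because the toy set is tiny, and a nearest-centroid rule stands in for the linear-boundary methods named in the figure.

```python
import math
from collections import Counter

# Hypothetical 2-D training points as (x, y, label). Not data from the talk.
TRAIN = [
    (0.0, 0.0, 0), (1.0, 0.0, 0), (0.0, 1.0, 0), (1.0, 1.0, 0), (0.5, 0.5, 0),
    (4.0, 4.0, 1), (5.0, 4.0, 1), (4.0, 5.0, 1), (5.0, 5.0, 1), (4.5, 4.5, 1),
    (-2.0, -2.0, 1), (-2.2, -2.0, 1), (-2.0, -2.2, 1),  # small pocket of class 1
]


def knn_predict(point, train, k=3):
    """Majority vote among the k closest training points (local, flexible boundary)."""
    nearest = sorted(train, key=lambda row: math.dist(point, row[:2]))[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


def centroid_predict(point, train):
    """Assign the label of the closest class mean (a single straight boundary for two classes)."""
    sums = {}
    for x, y, label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    centroids = {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


for query in [(4.5, 4.0), (-2.1, -2.1)]:
    print(query, knn_predict(query, TRAIN), centroid_predict(query, TRAIN))
```

The nearest-neighbor rule picks up the small pocket of class 1 points, while the centroid rule cannot, because its boundary is a single line between the two class means.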

(18)
(19)

CRISP-DM Step 6: Deployment Steps

How to deploy model?

Software, source code, in database

How often, when to update model

Report results

Lessons learned

Plan Deployment: Deployment Plan

Plan Monitoring and Maintenance: Monitoring & Maintenance Plan

Produce Final Report: Final Report, Final Presentation

Review Project: Experience

(20)

Model Results after Deployment

Scored over 2,100,000 prospects

Actual results from the rollout

Average response rate = 13.67%
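
As a quick check of the rollout figure against the numbers quoted earlier in the deck (11% response for the random test mailing, 13.5% needed to be profitable), the arithmetic can be sketched as below. The variable names are hypothetical; the three rates are the ones stated on the slides.

```python
baseline_rate = 0.11    # random test mailing response rate
required_rate = 0.135   # minimum response rate to be profitable
rollout_rate = 0.1367   # average response rate observed in the rollout

# Lift over random selection, and margin above the break-even threshold.
lift_over_random = rollout_rate / baseline_rate
margin_points = (rollout_rate - required_rate) * 100

print(f"lift over random mailing: {lift_over_random:.2f}")
print(f"margin over threshold: {margin_points:.2f} percentage points")
```

The rollout therefore cleared the profitability threshold, by a small margin, at a lift of roughly 1.24 over random mailing.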

(21)
(22)

What is Predictive Analytics?

Simple Definitions

Data-driven analysis for [large] data sets

Data-driven to discover input combinations

Data-driven to validate models

OR

Discovering interesting patterns in data automatically from the data

Input variables are selected automatically

(23)

Customer Analytics: BI vs. PA

Customer Analytics: Business Intelligence

■ What were the e-mail open, click-through, and response rates?

■ Which regions/states/ZIPs had the highest response rates?

■ Which products had the highest/lowest click-through rates?

■ How many repeat purchasers were there last month?

■ How many new subscriptions to the loyalty program were there?

■ What is the average spend of those who belong to the loyalty program? Those who aren’t a part of the

Customer Analytics: Predictive Analytics

■ What is the likelihood an e-mail will be opened?

■ What is the likelihood a customer will click-through a link in an e-mail?

■ Which product is a customer most likely to purchase if given the choice?

■ How many e-mails should the customer receive to maximize the likelihood of a purchase?

■ What is the best product to up-sell to the customer after they purchase a product?

■ What is the visit volume expected on the website next week?

(24)

Predictive Analytics vs. Data Science

Predictive Analytics and Data Mining have always covered the same ground except for…

Big data-centricity

Advanced database technology (to handle big data)

• Hadoop

• Other NoSQL (MongoDB, Cassandra…)

Programming language-centricity (not listed)

(25)

What Degree Does it Take to Be a Predictive Modeler?

Highest Degree

• 7 PhDs

• 1 Masters

• 2 Bachelors

• You don’t need an advanced degree to be a great practitioner!

Max. Degree        Count
Math               2
Computer Science   2
Social Science     2
Statistics         1

(26)
(27)

PASS Virtual Chapters for Business Analytics

www.sqlpass.org/vc

(28)

Like What You Heard?

Dean will be presenting at BAC 2015!

Pre-Conference (full day): An Overview of Predictive Analytics for Practitioners

Breakout Sessions (60 mins): Starting Your First Predictive Analytics Project

