(1)

Data Mining in Direct Marketing with Purchasing Decisions Data

Randy Collica
Sr. Business Analyst
Database Mgmt. & Business Analysis
Compaq Computer Corp.

(2)

Overview

• Business Problem to Solve.
• Data Layout and Brief Definitions.
• Data Preparation Methods.
• Purchasing Decisions Model Building.
  ◦ Three different models were built.
  ◦ Final model uses data cleansing ensemble methods.
  ◦ Comparison of data cleansing with typical ensemble methods.
• Summary.
• References and Q & A.

(3)

Business Problem to Solve.

• A telemarketing or telesales caller reaches a potential customer site only to find out that the business makes its purchasing decisions only at the parent or headquarters office.
• Time has been wasted calling the wrong site, and the caller must either find out the parent site or skip over to the next business to call.
• If the telemarketer already knew which site to call ahead of time, much time and money would be saved and the opportunity rate would increase.

(4)

Business Problem to Solve.

• A similar issue arises for direct mail campaigns: a direct mailer is sent to a site, but that site does not make the final call on IT purchasing decisions.
• Another example is analyzing customer or prospect data for segmentation. The local vs. parent decision level is a very important component in segmentation.

(5)

Data Assay and Brief Definitions.

• The data set being modeled contained about 170,000 records.
• Other demographic data was appended from syndicated sources, e.g. Dun & Bradstreet™.
• The data was partitioned into 70% for training and 30% for validation.
• The definition of a business site in the original data set, prior to demographic appending, is slightly different from the Dun & Bradstreet™ definition of a business site.
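The 70/30 partition above was done in Enterprise Miner; as a rough illustration only, the same kind of split can be sketched in Python. The function name `partition`, the seed, and the stand-in record list are assumptions made for this example, not part of the original work.

```python
import random

def partition(records, train_fraction=0.70, seed=12345):
    """Shuffle a copy of the records and split it into (training, validation)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Stand-in for the roughly 170,000 site records (hypothetical data).
records = list(range(170_000))
train, valid = partition(records)
print(len(train), len(valid))
```

A simple random split like this ignores the target distribution; a stratified split may be preferable when one decision level is rare.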

(6)

Data Preparation Methods.

• Data preparation was done in five basic stages.
  ◦ First, data was extracted and cross-referenced in order to facilitate merging with a syndicated source, e.g. adding D&B site DUNS numbers.
  ◦ Second, the D&B site DUNS numbers were referenced and added to the data set, and mismatches were noted. These steps were done outside of Enterprise Miner.
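The first two stages, done outside Enterprise Miner, amount to a keyed lookup that also records the sites with no match. A minimal sketch, assuming a hypothetical cross-reference table keyed by an internal `site_id`; the field names and the DUNS values below are invented for illustration.

```python
def append_duns(sites, duns_xref):
    """Attach a site DUNS number to each site record and collect mismatches."""
    matched, mismatches = [], []
    for site in sites:
        duns = duns_xref.get(site["site_id"])
        if duns is None:
            mismatches.append(site["site_id"])  # noted for later review
        else:
            matched.append({**site, "site_duns": duns})
    return matched, mismatches

# Hypothetical records and cross-reference (not real DUNS numbers).
sites = [
    {"site_id": "S001", "name": "Site one"},
    {"site_id": "S002", "name": "Site two"},
    {"site_id": "S003", "name": "Site three"},
]
duns_xref = {"S001": "000000001", "S003": "000000003"}

matched, mismatches = append_duns(sites, duns_xref)
print(len(matched), mismatches)
```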

(7)

Data Preparation Methods.

  ◦ Third, the D&B site DUNS numbers were merged with the more complete D&B database.
  ◦ Fourth, the data was read into Enterprise Miner.
  ◦ Fifth, the data was surveyed.
• The data survey is a rather important component in the data mining process [1].
  ◦ Using the Data Transformation node, the number of site employees and the number of corporate employees needed to be transformed.
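The slides do not say which transformation was applied to the employee counts. Purely as an assumption, the sketch below uses a log transform, a common choice for heavily skewed counts; the function name and sample values are hypothetical.

```python
import math

def log_transform(count):
    """Return log10(count + 1); a missing count (None) is passed through."""
    if count is None:
        return None
    return math.log10(count + 1)

# Hypothetical employee counts spanning several orders of magnitude.
site_employees = [0, 9, 99, 9999, None]
transformed = [log_transform(n) for n in site_employees]
print(transformed)
```

Adding 1 before taking the log keeps sites with zero recorded employees in range.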

(8)

Data Preparation Methods.

(9)

Data Preparation Methods.

(10)

Data Preparation Methods.

• Using the Input Data Source node, the view of basic frequency distributions of fields is very valuable.
• The original data source has three levels in the target or response field:
  ◦ Local
  ◦ Parent
  ◦ Missing
• It was at first thought best to keep the missing level in the target data. The target was then modified to exclude missing values.
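The frequency check and the later exclusion of missing targets can be sketched outside Enterprise Miner as well. The field name `decision_level` and the sample records are assumptions for illustration.

```python
from collections import Counter

def target_frequencies(records, target="decision_level"):
    """Count each level of the target field, reporting absent values as 'Missing'."""
    return Counter(
        r.get(target) if r.get(target) is not None else "Missing"
        for r in records
    )

def drop_missing_target(records, target="decision_level"):
    """Keep only the records whose target level is known."""
    return [r for r in records if r.get(target) is not None]

# Hypothetical records with the three target levels described above.
records = [
    {"site_id": "A1", "decision_level": "Local"},
    {"site_id": "A2", "decision_level": "Parent"},
    {"site_id": "A3", "decision_level": None},
    {"site_id": "A4", "decision_level": "Local"},
]
freq = target_frequencies(records)
modeling_set = drop_missing_target(records)
print(dict(freq), len(modeling_set))
```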

(11)

Purchasing Decisions Model Building.

(12)

Purchasing Decisions Model Building.

• The first modeling attempt used a fairly standard Decision Tree and a single-layer, 4-neuron Neural Network. This produced fair results with the Decision Tree but very poor results with the Neural Network.
• The second attempt used a similar Decision Tree and also a Decision Tree with an Ensemble node, run in Bagging mode with 10 iterations. See reference [3].
• The third attempt uses a new technique called data cleansing, or an Ensemble Filter [2]. These

(13)

Purchasing Decisions Model Building.

• An Ensemble model (Bagging) vs. typical classifier training.

A sample of a single classifier on a data set.
Original Training Set
Training-set 1: 1,2,3,4,5,6,7,8,…

A sample of Boosting on the same data set.
Resampled Training Set
Training-set 1: 2,7,8,3,7,6,3,1
Training-set 2: 1,4,5,4,1,5,6,4
Training-set 3: 7,1,5,8,1,8,1,4
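The resampled sets above are labeled as a Boosting sample. For the Bagging mode used in the second modeling attempt, each training set is instead drawn uniformly with replacement from the original set, whereas boosting skews later draws toward instances that earlier models got wrong. A minimal sketch of the bagging-style draw, with a hypothetical function name and seed:

```python
import random

def bootstrap_samples(training_set, n_samples=10, seed=0):
    """Draw n_samples resampled training sets, each the size of the original,
    sampling uniformly with replacement (bagging-style)."""
    rng = random.Random(seed)
    size = len(training_set)
    return [
        [rng.choice(training_set) for _ in range(size)]
        for _ in range(n_samples)
    ]

original = [1, 2, 3, 4, 5, 6, 7, 8]
samples = bootstrap_samples(original)  # 10 sets, matching the 10 iterations
for i, sample in enumerate(samples[:3], start=1):
    print(f"Training-set {i}: {sample}")
```

Because sampling is with replacement, some instances repeat within a set and others are left out, as in the sets shown above.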

(14)

Purchasing Decisions Model Building.

• Ensemble Filters combine the outputs of base-level classifiers and then take a vote to see which instances should be kept.

Training Instances ==> Filter ==> “Correctly Labeled” Instances ==> Learning Algorithm

• Instances which are not “correctly labeled” are then discarded from model training.

(15)

Purchasing Decisions Model Building.

• How the instances to be used in training are determined.
  ◦ Once different models are fit from x number of sampling methods, one has the predictions of x models.
  ◦ Two methods: Majority vote vs. Consensus vote.
  ◦ Majority Vote: tags an instance as mislabeled if more than half of the x classifier models classify it incorrectly.
  ◦ Consensus Vote: requires that all of the x classifier models fail to classify an instance correctly in order for that instance to be eliminated.
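The two voting rules can be written as small predicates over each instance's per-model error flags. This is a sketch under stated assumptions, not the Enterprise Miner implementation: the function names are hypothetical, and each instance is assumed to come with a list of booleans that are True where a model misclassified it.

```python
def majority_vote_mislabeled(errors):
    """Tag as mislabeled if more than half of the models misclassify it."""
    return sum(errors) > len(errors) / 2

def consensus_vote_mislabeled(errors):
    """Tag as mislabeled only if every model misclassifies it."""
    return len(errors) > 0 and all(errors)

def filter_instances(instances, error_matrix, rule):
    """Keep the instances that the voting rule does not tag as mislabeled."""
    return [
        inst for inst, errors in zip(instances, error_matrix)
        if not rule(errors)
    ]

# Hypothetical error flags for three instances across five models.
instances = ["a", "b", "c"]
error_matrix = [
    [True, True, False, False, False],  # 2 of 5 models wrong
    [True, True, True, False, False],   # 3 of 5 models wrong
    [True, True, True, True, True],     # all 5 models wrong
]
kept_majority = filter_instances(instances, error_matrix, majority_vote_mislabeled)
kept_consensus = filter_instances(instances, error_matrix, consensus_vote_mislabeled)
print(kept_majority, kept_consensus)
```

Consensus voting is the more conservative rule: it discards fewer instances, so it keeps more good data at the cost of keeping more mislabeled data.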

(16)

Purchasing Decisions Model Building.

Majority Vote:
Models from sample sets

X X X X O O O O O O        X X X X X X O O O O
This instance agrees       This instance out

Consensus Vote:

X X X X X X O O            X X X X X X X X
This instance agrees       This instance out

(17)

Purchasing Decisions Model Building.

• Some preliminary results on Purchasing Decisions Model.

(18)

Purchasing Decisions Model Building.

[Figure: % Captured Response]

(19)

Purchasing Decisions Model Building.

[Figure: Ensemble Model]

(20)

Purchasing Decisions Model Building.

[Figure: Data Filtering Method Results]

(21)

Summary

• A Purchasing Decisions Model can be useful in Marketing and Campaign programs.
• Both Customer and Prospect data can be scored with this model and used in subsequent analysis such as segmentation.
• Data Filtering can be used as an Ensemble method to discard instances which are misclassified.
• Data filtering methods will be investigated further for ‘noisy’ data analysis.

(22)

References

• [1] Pyle, Dorian, Data Preparation for Data Mining, Morgan Kaufmann, San Francisco, 1999.
• [2] Brodley, Carla E. and Friedl, Mark A., “Identifying Mislabeled Training Data,” Jou. of AI Research, vol. 11, 1999, pp. 131-167.
• [3] Opitz, David, and Maclin, Richard, “Popular Ensemble Methods: An Empirical Study,” Jou. of AI Research,

(23)

Acknowledgements

• I would like to thank my cohort who worked on the data cleansing method and its implementation in SAS Enterprise Miner™.
• I would also like to thank Mr. Scott Berg and Victor Howard for encouraging me to submit this work.
• Thanks also to Mr. William Sommerfeld and Janice Shineman for inviting me to present this work at SUGI25.
• Lastly, I would like to thank my supervisor, Mr. Gary Alen, who always gives continued support.
