Data Mining in Direct Marketing with Purchasing Decisions Data

Randy Collica
Sr. Business Analyst
Database Mgmt. & Business Analysis
Compaq Computer Corp.
Overview
• Business Problem to Solve.
• Data Layout and Brief Definitions.
• Data Preparation Methods.
• Purchasing Decisions Model Building.
  - Three different models were built.
  - Final model uses data cleansing ensemble methods.
  - Comparison of data cleansing with typical ensemble methods.
• Summary.
• References and Q & A.
Business Problem to Solve.
• A telemarketing or telesales representative calls a potential customer site, only to find out that the business makes its purchasing decisions solely at the parent or headquarters office.
• Time has been wasted calling the wrong site, and the caller either has to find out the parent site or skip over to the next business to call.
• If the telemarketer already knew which site to call ahead of time, much time and money would be saved, as well as increasing the opportunity rate.
• A similar issue arises for direct mail campaigns: a mailer is sent to a site, but that site does not make the final call on IT purchasing decisions.
• Another example arises when analyzing customer or prospect data for segmentation. The local vs. parent decision level is a very important component in segmentation.
Data Assay and Brief Definitions.
• The data set being modeled contained about 170,000 records.
• Other demographic data was appended from syndicated sources, e.g. Dun & Bradstreet™.
• The data partition was set to 70% for training and 30% for validation.
• The definition of a business site in the original data set, prior to demographic appending, is slightly different from the Dun & Bradstreet™ definition of a business site.
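The 70/30 partition above can be illustrated with a minimal sketch. It assumes a simple random split; the slides do not say whether the partition was stratified, and the `partition` function and the stand-in records are hypothetical.

```python
import random

def partition(records, train_fraction=0.7, seed=0):
    """Shuffle a copy of the records and split it into training and validation lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical stand-in for the roughly 170,000 site records.
records = list(range(170_000))
train, validation = partition(records)
print(len(train), len(validation))
```

Fixing the seed keeps the split reproducible, so later models are compared on the same validation records.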
Data Preparation Methods.
• Data preparation was done in five basic stages.
  - First, data was extracted and cross-referenced to facilitate merging with a syndicated source, e.g. adding D&B site DUNS numbers.
  - Second, the D&B site DUNS numbers were referenced and added to the data set, and mismatches were noted. These steps were done outside of Enterprise Miner.
  - Third, the D&B site DUNS numbers were then merged with the more complete D&B database.
  - Fourth, the data was read into Enterprise Miner.
  - Fifth, the data was surveyed.
• The data survey is a rather important component in the data mining process [1].
  - Using the Data Transformation node, the data on the number of site employees and corporate employees needed to be transformed.
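The merge-and-flag stages above, followed by a transformation of the employee counts, might look like the sketch below. The original work was done outside Enterprise Miner and then in its Data Transformation node, so this is only an illustration: the field names, the sample values, and the choice of a log transform are all assumptions, since the slides do not say which transformation was applied.

```python
import math

def append_demographics(sites, reference):
    """Left-join site records to a reference table keyed by DUNS number.

    Returns (matched, mismatched): matched rows carry the appended fields,
    mismatched rows are the site records with no reference entry.
    """
    matched, mismatched = [], []
    for site in sites:
        extra = reference.get(site.get("duns"))
        if extra is None:
            mismatched.append(site)
        else:
            matched.append({**site, **extra})
    return matched, mismatched

# Hypothetical records; field names and values are invented for illustration.
sites = [
    {"site_id": 1, "duns": "000000001"},
    {"site_id": 2, "duns": "000000002"},
    {"site_id": 3, "duns": None},
]
reference = {
    "000000001": {"site_employees": 40, "corp_employees": 1200},
    "000000002": {"site_employees": 5, "corp_employees": 5},
}
matched, mismatched = append_demographics(sites, reference)

# Assumed transformation: log(1 + x) to compress the skewed employee counts.
for row in matched:
    row["log_site_employees"] = math.log1p(row["site_employees"])
    row["log_corp_employees"] = math.log1p(row["corp_employees"])
```

Keeping the mismatched records in a separate list makes it easy to review why a site could not be matched before it is dropped or corrected.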
• In the Input Data Source node, the view of basic frequency distributions of fields is very valuable.
• The original data source has three levels in the target or response field.
  - Local
  - Parent
  - Missing
• It was initially thought best to keep the missing level in the target data. This was later modified to exclude missing values.
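A small sketch of the frequency check on the target and the later exclusion of missing values. The `decision_level` field name and the sample values are hypothetical; in the original work the distributions were viewed in the Input Data Source node.

```python
from collections import Counter

# Hypothetical target values; None stands for a missing decision level.
targets = ["Local", "Parent", None, "Local", "Local", None, "Parent"]

def level_counts(values):
    """Frequency of each target level, with missing values reported as 'Missing'."""
    return Counter("Missing" if v is None else v for v in values)

def drop_missing(rows, target_key="decision_level"):
    """Keep only the rows whose target is present."""
    return [r for r in rows if r.get(target_key) is not None]

rows = [{"site_id": i, "decision_level": t} for i, t in enumerate(targets)]
print(level_counts(targets))

modeling_rows = drop_missing(rows)
print(level_counts(r["decision_level"] for r in modeling_rows))
```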
Purchasing Decisions Model Building.
• The first modeling attempt used a fairly standard Decision Tree and a single-layer, 4-neuron Neural Network. This produced fair results with the Decision Tree but very poor results with the Neural Network.
• The second attempt used a similar Decision Tree and also a Decision Tree with an Ensemble node. The Ensemble node was used in Bagging mode with 10 iterations. See reference [3].
• The third attempt uses a new technique called data cleansing, or an Ensemble Filter [2].
• An Ensemble model (Bagging) vs. typical classifier training.

A sample of a single classifier on a data set.
  Original Training Set
    Training-set 1: 1,2,3,4,5,6,7,8,…

A sample of Boosting on the same data set.
  Resampled Training Set
    Training-set 1: 2,7,8,3,7,6,3,1
    Training-set 2: 1,4,5,4,1,5,6,4
    Training-set 3: 7,1,5,8,1,8,1,4
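The resampled training sets shown above can be imitated with a short sketch. It covers only the uniform sampling with replacement used by Bagging; Boosting instead shifts the sampling probabilities toward instances that earlier classifiers got wrong, which is not shown here. The `bootstrap_sets` function is hypothetical, and the 10 sets mirror the 10 Bagging iterations mentioned earlier.

```python
import random

def bootstrap_sets(instances, n_sets, seed=0):
    """Draw n_sets resampled training sets, each the size of the original, with replacement."""
    rng = random.Random(seed)
    return [rng.choices(instances, k=len(instances)) for _ in range(n_sets)]

original = [1, 2, 3, 4, 5, 6, 7, 8]
training_sets = bootstrap_sets(original, n_sets=10)
for i, sample in enumerate(training_sets, start=1):
    print(f"Training-set {i}: {sample}")
```

Each resampled set typically repeats some instances and omits others, which is what gives the individual classifiers in the ensemble their diversity.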
• Ensemble Filters combine the outputs of base-level classifiers and then take a vote to see which instances should be kept.

  Training Instances ==> Filter ==> “Correctly Labeled” Instances ==> Learning Algorithm

• Instances which are not “correctly labeled” are then discarded from model training.
• How the instances to be used in training are determined.
  - Once different models are fit from x number of sampling methods, one now has predictions from x models.
  - Two methods: Majority vote vs. Consensus vote.
  - Majority Vote: tags an instance as mislabeled if more than half of the x classifier models classify it incorrectly.
  - Consensus Vote: requires that all of the x classifier models fail to classify the instance correctly in order for it to be eliminated.
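The two voting rules above can be written down as a minimal sketch. It assumes the predictions of the x models are already available for each instance; the function names and the three sample instances are hypothetical, and this is not the Enterprise Miner implementation.

```python
def mislabeled_flags(predictions, label):
    """True for each model whose prediction disagrees with the recorded label."""
    return [p != label for p in predictions]

def majority_filter(predictions, label):
    """Tag the instance as mislabeled if more than half of the models misclassify it."""
    flags = mislabeled_flags(predictions, label)
    return sum(flags) > len(flags) / 2

def consensus_filter(predictions, label):
    """Tag the instance as mislabeled only if every model misclassifies it."""
    flags = mislabeled_flags(predictions, label)
    return bool(flags) and all(flags)

def filter_training_set(instances, vote):
    """Keep instances the vote does not tag; each instance is (features, label, predictions)."""
    return [(x, y) for x, y, preds in instances if not vote(preds, y)]

# Hypothetical instances with the predictions of five models each.
instances = [
    ({"id": "a"}, "Local", ["Local", "Local", "Parent", "Local", "Local"]),     # 1 of 5 wrong
    ({"id": "b"}, "Local", ["Parent", "Parent", "Parent", "Local", "Local"]),   # 3 of 5 wrong
    ({"id": "c"}, "Parent", ["Local", "Local", "Local", "Local", "Local"]),     # 5 of 5 wrong
]
kept_by_majority = filter_training_set(instances, majority_filter)
kept_by_consensus = filter_training_set(instances, consensus_filter)
```

The consensus rule is the more conservative of the two: it discards fewer instances, so it is less likely to throw away good data but more likely to retain mislabeled records.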
Majority Vote:
  Models from sample sets
    X X O O O    This instance agrees
    X X X O O    This instance out

Consensus Vote:
    X X X O      This instance agrees
    X X X X      This instance out
• Some preliminary results on the Purchasing Decisions Model.
  [Figure: % Captured Response]
  [Figure: Ensemble Model]
  [Figure: Data Filtering Method Results]
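For readers unfamiliar with the “% Captured Response” measure named in the results, the sketch below shows one common way to compute it. It assumes the cumulative definition: records are ranked by model score, and each bin reports the share of all responders found so far. The function name and the sample data are hypothetical and are not the results from this study.

```python
def cumulative_captured_response(scores, responses, n_bins=10):
    """Cumulative % of all responders found in the top-scored bins.

    scores: model scores, higher meaning more likely to respond.
    responses: 1 for a responder, 0 otherwise.
    """
    ranked = [r for _, r in sorted(zip(scores, responses),
                                   key=lambda pair: pair[0], reverse=True)]
    total = sum(ranked)
    if total == 0:
        raise ValueError("no responders in the data")
    result = []
    for b in range(1, n_bins + 1):
        cutoff = round(len(ranked) * b / n_bins)
        result.append(100.0 * sum(ranked[:cutoff]) / total)
    return result

# Hypothetical scores and outcomes for ten records.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responses = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
captured = cumulative_captured_response(scores, responses, n_bins=5)
print(captured)
```

A model that ranks well captures most responders in the first few bins, so its curve rises faster than the diagonal expected from random ordering.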
Summary
• A Purchasing Decisions Model can be useful in Marketing and Campaign programs.
• Both Customer and Prospect data can be scored with this model and used in subsequent analysis such as segmentation.
• Data Filtering can be used as an Ensemble method to discard instances which are misclassified.
• More work on data filtering methods will be investigated for ‘noisy’ data analysis.
References
[1] Pyle, Dorian, Data Preparation for Data Mining, Morgan Kaufmann, San Francisco, 1999.
[2] Brodley, Carla E. and Friedl, Mark A., “Identifying Mislabeled Training Data,” Jou. of AI Research, vol. 11, 1999, pp. 131-167.
[3] Opitz, David, and Maclin, Richard, “Popular Ensemble Methods: An Empirical Study,” Jou. of AI Research.
Acknowledgements
• I would like to thank my cohort who worked on the data cleansing method and its implementation in SAS Enterprise Miner™.
• I would also like to thank Mr. Scott Berg and Victor Howard for encouraging me to submit this work.
• Mr. William Sommerfeld and Janice Shineman for inviting me to present this work at SUGI25.
• Lastly, I would like to thank my supervisor, Mr. Gary Alen, who always gives continued support.