Survey Analysis: Data Mining versus
Standard Statistical Analysis for Better
Analysis of Survey Responses
By Dean Abbott
Abbott Analytics
http://www.abbottanalytics.com
Salford Systems Data Mining 2006
March 27-31 2006
San Diego, CA
Acknowledgements
• Work done under contract with Seer Analytics
• Subcontractors: Tessar and Associates (now Mobile Foundry), Abbott Consulting (now Abbott Analytics)
Seer Analytics, LLC
518 North Tampa Street
Tampa, FL 33602
we help you see what's there.
About Abbott Analytics
• Abbott Analytics
– Founded in 1999, based in San Diego, CA
– Dedicated to data mining consulting and training
• Principal: Dean Abbott
– Applied Data Mining for 19+ years in Direct Marketing, CRM, Survey Analysis, Tax Compliance, Fraud Detection, Predictive Toxicology, Biological Risk Assessment
• Course Instruction
– Public 2-day Data Mining Courses
– Conference Tutorials
• Customized Training and Knowledge Transfer
– Data mining methodology (CRISP-DM)
– Training services for software products, including CART, Clementine,
Talk Outline
• Member survey
– Survey description
– Results using statistical modeling
– Lessons learned
• Employee survey
– Survey description
– Results using decision trees (CART)
– Lessons learned
Problem Setup:
Member Survey
• Question:
– What are the characteristics of members who indicated the highest overall satisfaction with their Club?
• Data:
– 32,811 records containing survey answers
– No demographic data except what was on survey (marital status, children, age, gender)
• Approach:
– Create supervised learning models with target variable “
Data Preparation
• Begin with 57 candidate inputs to model
• All survey questions are multiple choice
– Treated as categories, not numbers
– Typically 6 categories per question (1-5)
– Unknown initially coded as “0”
• No text comment fields included as inputs to model
• Create new column for target variable
– If overall_satisfaction = 1, variable value = 1; otherwise, variable value = 0
• Data very clean with respect to missing data
– Only needed to recode # children fields
– Number missing: 11,006 children < 6; 10,701 children 6-12; 10,873 children 13-17; 4,936 children (overall)
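The target definition and the categorical treatment described on this slide can be sketched as below. This is a minimal illustration, not the code used in the project: the field name `Q2`, the label format `cat_N`, and the sample records are hypothetical, and only the rule "overall_satisfaction = 1 maps to 1, otherwise 0" and the "unknown coded as 0" convention come from the slide.

```python
UNKNOWN_CODE = 0  # per the slide: unknown answers were initially coded as 0


def binarize_target(overall_satisfaction):
    """Return 1 when the answer is the top category (1), otherwise 0."""
    return 1 if overall_satisfaction == 1 else 0


def as_category(answer):
    """Turn a multiple-choice code into a label so it is not treated as a number."""
    return "unknown" if answer == UNKNOWN_CODE else f"cat_{answer}"


# Hypothetical survey records; Q2 stands in for any multiple-choice question.
responses = [
    {"overall_satisfaction": 1, "Q2": 3},
    {"overall_satisfaction": 2, "Q2": 0},
    {"overall_satisfaction": 0, "Q2": 5},
]

prepared = [
    {"target": binarize_target(r["overall_satisfaction"]), "Q2": as_category(r["Q2"])}
    for r in responses
]
```

Note that an unknown overall satisfaction (code 0) falls into the target = 0 class under this rule, which matches the "otherwise" branch on the slide.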
Member Survey Question
Categories
Sampling
• Begin with 32,811 responses
• Set aside about half for validation (not used during modeling): 16,379 records
– These records will be used to provide final summaries of the segments
• 16,433 records used in creating and scoring model
– 5,059 had overall satisfaction = 1 (30.8%)
– Model 1 splits data into training and testing data: 2/3 for training (creating model), 1/3 for testing (scoring and ranking models)
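A sketch of this sampling scheme follows: hold out roughly half the responses for validation, then split the remainder 2/3 for training and 1/3 for testing. The function name, the fixed seed, and the exact half split are assumptions for illustration; the slide's own partition (16,379 and 16,433 records) was only "about half".

```python
import random


def split_records(records, seed=0):
    """Return (train, test, validation) lists drawn from a shuffled copy."""
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    validation = shuffled[:half]    # set aside, not used during modeling
    modeling = shuffled[half:]      # used to create and score models

    n_train = (2 * len(modeling)) // 3
    return modeling[:n_train], modeling[n_train:], validation


# Record ids stand in for the 32,811 survey responses.
train, test, validation = split_records(range(32811))
```

Keeping the validation half untouched until the end is what lets it provide unbiased final summaries of the segments.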
Relationship of Overall Satisfaction
to Recommend to Friends
[Figure: cross-tabulation plot of Overall satisfaction (OVERALL.RA, 0-4) against Recommend to Friend (RECOMMEND, 0-5)]
• Of the 4912 / 16739 (30.2%) with Overall Satisfaction = 1
– 86% have Recommend to Friends = 1
• Of the 8708 / 16739 (54%) with Recommend to Friends = 1
– 49% have Overall Satisfaction = 1
• 4227 / 16739 (26.0%) have overall satisfaction and recommend to friends both equal to 1
– This is the biggest bin of the cross tab, followed by
– Overall = 2 / recommend = 2 (24%; 3890 / 16739)
– Overall = 2 / recommend = 1 (22%; 3565 / 16739)
• No other bin greater than 5% of records
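The two conditional percentages (86% and 49%) follow from the joint count and the two marginal counts. The sketch below reproduces that arithmetic from a cross-tab held as a `Counter`. The three named cells use counts from the slide; the two "other" cells are derived here by subtraction from the stated marginals (4912 and 8708), so they are an assumption about how the remaining records are lumped, not figures from the survey.

```python
from collections import Counter

# Keys are (overall satisfaction, recommend to friends).
table = Counter({
    (1, 1): 4227,
    (2, 1): 3565,
    (2, 2): 3890,
    (1, "other"): 4912 - 4227,          # rest of the Overall = 1 row (derived)
    ("other", 1): 8708 - 4227 - 3565,   # rest of the Recommend = 1 column (derived)
})


def conditional_share(table, overall, recommend, given):
    """Share of the (overall, recommend) cell within one of its margins."""
    joint = table[(overall, recommend)]
    if given == "overall":
        margin = sum(n for (o, _), n in table.items() if o == overall)
    else:
        margin = sum(n for (_, r), n in table.items() if r == recommend)
    return joint / margin


recommend_given_satisfied = conditional_share(table, 1, 1, given="overall")
satisfied_given_recommend = conditional_share(table, 1, 1, given="recommend")
```

The asymmetry (most highly satisfied members recommend, but only about half of recommenders are highly satisfied) is why the two answers are related but not interchangeable as targets.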
Objective and Data
Challenges
• Project Objective
– Interpret results of survey for large health club (not a predictive model)
• Challenges
– Missing data (some questions either N/A or blank)
– Solution: Impute values that least affect the information communicated by the question (not a mean or median!)
– Answers (target variables) highly correlated with one another
– Multi-collinearity and interpretation of results problematic
– Must reduce dimensionality without losing interpretation of results
– Solution: Factor analysis
• Target variable
– Three questions pointed to the important actionable information (related to how satisfied members were)
Data Preprocessing Approach
• Reduce input data (for understanding)
– Use factor analysis to identify groupings of variables that are interesting.
– Factors can be candidate inputs to models, but didn’t work as well on this data
– Selected as inputs those variables with the highest loadings, as representative of that type of factor
• Also retained key questions in addition to the factor analysis representative questions
• The effect is to remove questions “too highly” correlated with one another, while maintaining relevant information for modeling.
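Picking one representative question per factor can be sketched as follows: for each factor, keep the question with the largest absolute loading, then add the retained key questions. The loading values, the factor names, and the use of absolute value are assumptions for illustration; Q44, Q22, and Q25 are used as the key questions only because they appear alongside the factors in the regression ranking chart.

```python
def representatives(loadings):
    """Map each factor to the question with the largest absolute loading."""
    return {
        factor: max(questions, key=lambda q: abs(questions[q]))
        for factor, questions in loadings.items()
    }


def model_inputs(loadings, key_questions):
    """Key questions first, then factor representatives, without duplicates."""
    inputs = list(key_questions)
    for question in representatives(loadings).values():
        if question not in inputs:
            inputs.append(question)
    return inputs


# Hypothetical loadings for two factors.
loadings = {
    "Factor1": {"Q2": 0.81, "Q3": 0.64, "Q4": -0.35},
    "Factor2": {"Q12": 0.42, "Q13": -0.77},
}

inputs = model_inputs(loadings, key_questions=["Q44", "Q22", "Q25"])
```

Using an original question rather than the factor score keeps each input directly interpretable, which fits the slide's point that the factors themselves did not work as well on this data.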
Predictive Modeling Approach
[Figure: process flow diagram]
60+ Survey Questions
Identify Key Questions: 3 questions with high association with target (3 key questions)
Factor Analysis: 10 factors, or variables that loaded highest on each factor
Regression Model: Find Significant Variables; 13 fields down to 7
Variable ranks
[Figure: factor loadings for Factor1 through Factor10; y-axis: Factor Loading]
[Figure: Factor 1, Top Question Loadings (Q2 through Q12); y-axis: Loading Value]
[Figure: Factor 2, Top Question Loadings (Q12 through Q20, Q23); y-axis: Loading Values]
Factor Analysis:
Member Survey Factor
Analysis Loadings
Reduce Variables using
Regression
• Already beginning with only 13 variables
• Question: how many of these are useful predictors?
• Decided to retain 5 factors for final model
Regression Rankings of Questions/Factors
[Figure: bar chart of regression coefficients (axis 0 to 0.6) by Question/Factor, in order: Q44, Q22, Q25, factor3.2, factor3.9, factor3.1, factor3.4, factor3.3, factor3.8, factor3.10, factor3.6, factor3.5, factor3.7]
Explaining Results Through
Visualization
• Customer was not interested in “techno” solutions
• Customer was interested in what actions could be taken as a result of the data mining models
– Which characteristics are most correlated with best customers?
– What do they like and dislike about the club?
– Is it equipment? relationships? facility? staff?
• Show key contributors, how each club compared with other club locations, and if club is improving
Key: Explaining Results
• Visualization shows key variables in survey associated with “excellence”, and performance metrics for each club
– How well did this club do?
– What is the change over last year’s result?
• Shows which attributes the club needs to improve to improve customer satisfaction.
[Figure: Drivers of Satisfaction (relationships, facility, equipment, Staff 2, Staff 1, goals, value)]
So What’s The Problem with That?
• Regression, Neural Networks are “global” estimators
– They operate over the entire data space
– Descriptors of Regression represent average influence
– Neither technique provides explicit localized characteristics
• Customer would like actionable analytics
– Clear characteristics of subgroups
– Different strategies for subgroups
• Conclusion: In Round 2 (Employee Survey), use another approach
Employee Survey Analysis
Problem Setup
• Very similar to member survey
– 60+ questions
– Few demographics
– Attitudes toward the job
• How to handle questions
– They are ordinal, but CART® supports interval and nominal types
– Treat as categorical, but make sure values aren’t split up
– If you see a split on a question grouping values 1, 2, 4, rebuild as interval variable
Didn
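The check described above, whether a categorical split keeps the ordinal values together, can be sketched as a small helper. The function name and the 1-5 scale default are assumptions, and how the unknown code should be handled is not covered here. One side of a binary split respects the order only if it is a run of values from the low end or the high end of the scale.

```python
def respects_order(left_values, scale=(1, 2, 3, 4, 5)):
    """True if the values sent to one side form a low-end or high-end run of the scale."""
    left = set(left_values)
    if not left or not left <= set(scale):
        raise ValueError("split values must be a non-empty subset of the scale")
    k = len(left)
    return left == set(scale[:k]) or left == set(scale[-k:])


ok_split = respects_order({1, 2})       # 1, 2 versus 3, 4, 5
bad_split = respects_order({1, 2, 4})   # 4 is separated from 3 and 5
```

A split that fails this check is the signal, per the slide, to rebuild the question as an interval variable so the tree can only split on a threshold.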
Employee Survey Question
Groupings
Employee Survey:
Target Variable Definition
• Predict key attitudes that are consequents
– Satisfaction
– Recommend to a Friend
– Intend to Work Next Year at Club
– Club is Good Place to Work
• Exclude these from each other’s models
– They are highly correlated with each other
– Models that predict a target variable with these as inputs are not actionable
• Key Predictors, questions relating to:
– Communications with management
– Quality of supervisors
– Training received
– Effectiveness of club
– Fairness of policies
– Perceived member attitudes
Employee Satisfaction (=1) Model:
Data Information
File: modeling data with binarized dependents w missing.txt
Target Variable: Q1_1
Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65
Class | N Cases | Pct Cases
0 | 4,645 | 76.0%
1 | 1,470 | 24.0%
Employee Satisfaction Model:
Performance
Node | Cases Target Class | % of Node Tgt. Class | % Target Class | Cum % Tgt. Class | Cum % Pop | % Pop | Cases in Node | Cum lift | Lift
8 | 859 | 60.75 | 58.44 | 58.44 | 23.12 | 23.12 | 1,414 | 2.53 | 2.53
4 | 95 | 43.58 | 6.46 | 64.90 | 26.69 | 3.57 | 218 | 2.43 | 1.81
7 | 201 | 42.23 | 13.67 | 78.57 | 34.47 | 7.78 | 476 | 2.28 | 1.76
3 | 30 | 17.44 | 2.04 | 80.61 | 37.29 | 2.81 | 172 | 2.16 | 0.73
5 | 92 | 14.38 | 6.26 | 86.87 | 47.75 | 10.47 | 640 | 1.82 | 0.60
6 | 14 | 13.86 | 0.95 | 87.82 | 49.40 | 1.65 | 101 | 1.78 | 0.58
2 | 124 | 10.12 | 8.44 | 96.26 | 69.44 | 20.03 | 1,225 | 1.39 | 0.42
1 | 55 | 2.94 | 3.74 | 100.00 | 100.00 | 30.56 | 1,869 | 1.00 | 0.12
Class | N Cases | N Misclassified | Pct. Class
0 | 4,645 | 953 | 20.52
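The Lift and Cum lift columns of the gains table can be recomputed from the two count columns alone. The sketch below takes each terminal node's target-class count and node size from the table and divides the node's target rate by the overall rate of 1,470 / 6,115. The function and variable names are illustrative, not CART output fields.

```python
# (node, target-class cases in node, cases in node), in gains-table order
NODES = [
    (8, 859, 1414), (4, 95, 218), (7, 201, 476), (3, 30, 172),
    (5, 92, 640), (6, 14, 101), (2, 124, 1225), (1, 55, 1869),
]


def gains(nodes):
    """Per-node lift and cumulative lift relative to the overall target rate."""
    total_target = sum(t for _, t, _ in nodes)
    total_cases = sum(n for _, _, n in nodes)
    base_rate = total_target / total_cases

    rows, cum_target, cum_cases = [], 0, 0
    for node, target, cases in nodes:
        cum_target += target
        cum_cases += cases
        rows.append({
            "node": node,
            "lift": (target / cases) / base_rate,
            "cum_lift": (cum_target / cum_cases) / base_rate,
        })
    return rows


rows = gains(NODES)
```

Lift above 1 marks nodes richer in highly satisfied employees than the population as a whole; the cumulative lift necessarily ends at 1.00 once every node is included.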
Employee Satisfaction Model:
Splits
• Q8: Feel Welcome
– Surrogate: Q27 (family friendly), Q28 (inclusive environment), Q18 (good working conditions)
– Q18: Good working conditions
– Surrogate: Q17 (necessary support/materials to do job)
• Q3: Feeling of accomplishment
– Surrogates: Q6 (responsibilities good fit with interests/skills)
– Q7: Staff Competent
– Surrogates: Q15 (supervisor lets know work is appreciated), Q33 (trust management to take interests into account), Q5 (good
[Figure: CART tree diagram with splits on Q8, Q18, Q3, Q7, Q32, Q36 and terminal nodes 1 through 8]
Employee Satisfaction:
Q8 Split (root node)
Competitor | Variable | Split | Improvement
winner | Q8 | 1 | 0.1174
1 | Q18 | 1 | 0.1169
2 | Q3 | 1 | 0.0998
3 | Q35 | 1 | 0.0957
4 | Q6 | 1 | 0.0951
5 | Q7 | 1,2 | 0.094
Employee Satisfaction:
Q18 Split (right side of root)
This is the best terminal
node for satisfaction
Strongly agree feel welcome
Competitor | Variable | Split | Improvement
Winner | Q18 | 1 | 0.0271
1 | Q3 | 1 | 0.0203
2 | Q35 | 1 | 0.0195
3 | Q6 | 1 | 0.0177
4 | Q14 | 1,5 | 0.0172
5 | Q13 | 1,5 | 0.0167
Employee Satisfaction Model:
Key Variables
Variable | Score
Q18 | 100
Q8 | 81.02
Q14 | 72.03
Q27 | 55.11
Q26 | 50.53
Q28 | 50.12
Q5 | 17.66
Q3 | 14.14
Q17 | 14.05
Q11 | 13.15
Q7 | 11.89
Q13 | 11.56
Q6 | 11.27
Q33 | 11.03
Q16 | 9.6
Primary splitters only
Variable | Score
Q8 | 100
Q18 | 23.11
Q3 | 17.46
Q7 | 14.68
Q36 | 2.88
Q32 | 2.68
• Q8: Feel Welcome
– Surrogate: Q27 (family friendly), Q28 (inclusive environment), Q18 (good working conditions)
– Q18: Good working conditions
– Surrogate: Q17 (necessary support/materials to do job)
• Q3: Feeling of accomplishment
– Surrogates: Q6 (responsibilities good fit with interests/skills)
– Q7: Staff Competent
– Surrogates: Q15 (supervisor lets know work is appreciated), Q33 (trust management to take interests into account), Q5 (good opportunities for professional growth)
Employee Satisfaction Model: Key Rules
/*Rules for terminal node 8*/
Matches
• 1,414 surveys (23.1%)
• 859 highly satisfied (60.8%)
• 58.4% of all highly satisfied
RULE:
If ( Q18 = 1 and Q8 = 1 ) Then Highly Satisfied
P(0) = 0.39; P(1) = 0.61; Lift 2.5
If strongly agree that there are good working conditions and strongly agree that member
/*Rules for terminal node 7*/
Matches
• 476 surveys (7.8%)
• 201 highly satisfied (42.2%)
• 13.7% of all highly satisfied
RULE:
If ( Q8 = 1 and Q18 <> 1 and Q3 = 1 and Q32 = 1 or 2 ) Then Highly Satisfied
P(0) = 0.58; P(1) = 0.42; Lift 1.8
If strongly agree that feel welcome and strongly agree working at the club gives feeling of personal accomplishment, and agree management will take
/*Rules for terminal node 4*/
Matches
• 218 surveys (3.6%)
• 95 highly satisfied (43.6%)
• 6.5% of all highly satisfied
RULE:
If ( Q8 <> 1 and Q7 = 1 or 2 and Q3 = 1 and Q36 = 1 or 2 ) Then Highly Satisfied
P(0) = 0.56; P(1) = 0.44; Lift 1.8
If agree that I’ll be recognized for doing a good job, and strongly agree working at the club gives feeling of personal accomplishment, and agree
Employee Satisfaction Model: Unsatisfied Rules
/*Rules for terminal node 1*/
Matches
• 1,869 surveys (30.6%)
• 55 highly satisfied (2.9%)
• 3.7% of highly satisfied
• 39.0% of all not highly satisfied
RULE:
If ( Q8 <> 1 and Q7 <> 1 or 2 ) Then not highly satisfied
P(0) = 0.96; P(1) = 0.04; Lift 0.12
If don’t strongly agree that feel welcome and don’t agree that will be properly recognized for a good job, then not highly satisfied.
/*Rules for terminal node 5*/
Matches
• 640 surveys (10.5%)
• 92 highly satisfied (14.4%)
• 6.3% of all highly satisfied
• 11.8% of all not highly satisfied
RULE:
If ( Q8 = 1 and Q18 <> 1 and Q3 <> 1 ) Then not highly satisfied
P(0) = 0.86; P(1) = 0.14; Lift 0.58
If don’t strongly agree that there are good working conditions and work doesn’t give a feeling of accomplishment, even though strongly agree that feel welcome, then not highly satisfied.
/*Rules for terminal node 2*/
Matches
• 1,225 surveys (20.0%)
• 124 highly satisfied (10.1%)
• 8.4% of highly satisfied
• 23.7% of all not highly satisfied
RULE:
If ( Q8 <> 1 and Q7 = 1 or 2 and Q3 <> 1 ) Then not highly satisfied
P(0) = 0.90; P(1) = 0.10; Lift 0.42
If don’t strongly agree that feel welcome and work doesn’t give a feeling of accomplishment, even though I agree that I will be properly recognized for a good job, then not highly satisfied.
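Read together, the terminal-node rules describe a single decision tree, which can be sketched as one function. In the rule notation, "Q7 = 1 or 2" is read here as "Q7 is 1 or 2", and "<> 1 or 2" as "neither 1 nor 2". Nodes 8, 7, 5, 4, 2, and 1 follow the stated rules; nodes 3 and 6 have no rule on the slides, so their placement as the complements of nodes 4 and 7 is an assumption inferred from the gains table.

```python
def terminal_node(a):
    """Assign a survey (dict of question code -> answer) to a terminal node."""
    if a["Q8"] == 1:                      # strongly agree: feel welcome
        if a["Q18"] == 1:                 # strongly agree: good working conditions
            return 8
        if a["Q3"] == 1:                  # strongly agree: feeling of accomplishment
            return 7 if a["Q32"] in (1, 2) else 6   # node 6 inferred
        return 5
    if a["Q7"] in (1, 2):
        if a["Q3"] == 1:
            return 4 if a["Q36"] in (1, 2) else 3   # node 3 inferred
        return 2
    return 1


# Share of highly satisfied employees per node, from the gains table.
P_HIGHLY_SATISFIED = {8: 0.6075, 4: 0.4358, 7: 0.4223, 3: 0.1744,
                      5: 0.1438, 6: 0.1386, 2: 0.1012, 1: 0.0294}

survey = {"Q8": 1, "Q18": 2, "Q3": 1, "Q32": 2, "Q7": 3, "Q36": 4}
node = terminal_node(survey)
```

Writing the rules with explicit membership tests avoids the precedence ambiguity of "and ... = 1 or 2" in the printed rule text.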
Recommend to Friend (=1)
Model: Data Information
File: modeling data with binarized dependents w missing.txt
Target Variable: Q44_1
Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q19, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65
Class | N Cases | Pct
0 | 3,958 | 64.7%
1 | 2,157 | 35.3%
31 © Abbott Analytics, 2000-2006
Recommend to Friend Model
Performance
Class | N Cases | N Misclassified | Pct. Class
0 | 3,958 | 894 | 22.59
1 | 2,157 | 525 | 24.34
Node | Cases Target Class | % of Node Tgt. Class | % Target Class | Cum % Tgt. Class | Cum % Pop | % Pop | Cases in Node | Cum lift | Lift
10 | 1,113 | 71.90 | 51.60 | 51.60 | 25.32 | 25.32 | 1,548 | 2.04 | 2.04
9 | 110 | 58.51 | 5.10 | 56.70 | 28.39 | 3.07 | 188 | 2.00 | 1.66
5 | 198 | 56.57 | 9.18 | 65.88 | 34.11 | 5.72 | 350 | 1.93 | 1.60
4 | 128 | 49.81 | 5.93 | 71.81 | 38.32 | 4.20 | 257 | 1.87 | 1.41
8 | 83 | 45.36 | 3.85 | 75.66 | 41.31 | 2.99 | 183 | 1.83 | 1.29
3 | 215 | 29.49 | 9.97 | 85.63 | 53.23 | 11.92 | 729 | 1.61 | 0.84
7 | 36 | 24.83 | 1.67 | 87.30 | 55.60 | 2.37 | 145 | 1.57 | 0.70
2 | 132 | 15.60 | 6.12 | 93.42 | 69.44 | 13.84 | 846 | 1.35 | 0.44
6 | 12 | 14.12 | 0.56 | 93.97 | 70.83 | 1.39 | 85 | 1.33 | 0.40
Recommend to Friend Model
Splits
• Q19: Treated with respect
– Surrogates: Q18 (good working conditions) and Q8 (feel welcome)
• Q37: Compensation practice is fair
– Surrogates: Q36 (I am paid fairly)
• Q45: How think members rate club
– Surrogates: Q47, Q46, Q60 (member-cleanliness, enough equip., check on progress)
• Q33: Trust management to take interests into account
– Surrogates: Q32 (management keeps promises), Q34 (leaders remove roadblocks to inclusion)
• Q5: Good opportunities for professional growth
– Surrogates: Q4 (responsibilities good fit with interests), Q7 (appropriately recognized)
[Figure: CART tree diagram with splits on Q19, Q37, Q45, Q33, Q5, Q8, Q35, Q50 and terminal nodes 1 through 10]
Recommend to Friend Model
Key Variables
Variable | Score
Q8 | 100.0
Q19 | 99.1
Q18 | 97.4
Q15 | 64.5
Q16 | 63.1
Q14 | 61.3
Q33 | 39.6
Q35 | 33.8
Q32 | 24.7
Q34 | 23.9
Q31 | 23.9
Q9 | 21.5
Q7 | 15.4
Q45 | 14.8
Q37 | 12.9
Q5 | 10.0
Q36 | 9.7
Q4 | 4.3
Q38 | 4.0
Q22 | 1.6
Q50 | 1.4
Q26 | 1.0
Q48 | 0.8
Q47 | 0.7
Q28 | 0.6
Q46 | 0.6
Q11 | 0.3
Q51 | 0.3
Q60 | 0.1
Primary splitters only
Variable | Score
Q19 | 100
Q33 | 32.23
Q45 | 14.94
Q37 | 12.99
Q5 | 8.98
Q8 | 3.03
Q35 | 1.67
Q50 | 1.34
• Q19: Treated with respect
– Surrogates: Q18 (good working conditions) and Q8 (feel welcome)
• Q37: Compensation practice is fair
– Surrogates: Q36 (I am paid fairly)
• Q45: How think members rate club
– Surrogates: Q47, Q46, Q60 (member-cleanliness, enough equip., check on progress)
• Q33: Trust management to take interests into account
– Surrogates: Q32 (management keeps promises), Q34 (leaders remove roadblocks to inclusion)
• Q5: Good opportunities for professional growth
– Surrogates: Q4 (responsibilities good fit with interests), Q7 (appropriately recognized)
• Q8: Feel welcome
– Surrogates: Q7
Recommend to Friend Model:
Key Rules
/*Rules for terminal node 10*/
Matches
• 1,548 surveys (25.3%)
• 1,113 recommend (71.9%)
• 51.6% of all strong recommends
RULE:
If ( Q19 = 1 and Q37 = 1 or 2 ) Then Recommend = 1
P(0) = 0.281; P(1) = 0.719; Lift = 2.0
If strongly agree that supervisors treat me with respect, and agree that compensation practice is fair, then
/*Rules for terminal node 9*/
Matches
• 188 surveys (3.1%)
• 110 recommend (58.5%)
• 5.1% of all strong recommends
RULE:
If ( Q19 = 1 and Q37 <> 1 or 2 and Q45 = 1 ) Then Recommend = 1
P(0) = 0.415; P(1) = 0.585; Lift = 1.7
If strongly agree that supervisors treat me with respect, and believe that members strongly agree they are highly
/*Rules for terminal node 5*/
Matches
• 350 surveys (5.7%)
• 198 recommend (56.6%)
• 9.2% of all strong recommends
RULE:
If ( Q19 <> 1 and Q33 = 1 or 2 and Q45 = 1 ) Then Recommend = 1
P(0) = 0.434; P(1) = 0.566; Lift = 1.6
If agree that trust management will take my interests into account, and believe that members strongly agree they are highly satisfied, even though don’t
Recommend to Friend Model:
Rules for Not Recommending
/*Rules for terminal node 1*/
Matches
• 1,784 surveys (29.2%)
• 130 highly recommend (7.3%), 94% don’t highly rec.
• 6.0% of all highly recommend
RULE:
If ( Q31 <> 1 and Q22 <> 1 ) Then Don’t Strongly Recommend
P(0) = 0.94; P(1) = 0.06
If don’t strongly agree that supervisors treat me with respect, and don’t agree that management will take interests into account, then don’t strongly agree that will recommend to friend.
/*Rules for terminal node 2*/
Matches
• 846 surveys (13.84%)
• 132 highly recommend (15.6%), 84.4% don’t highly rec.
• 6.1% of all highly recommend
RULE:
If ( Q19 <> 1 and Q33 = 1 or 2 and Q45 <> 1 and Q5 <> 1 or 2 ) Then Don’t Strongly Recommend
P(0) = 0.84; P(1) = 0.16
If don’t strongly agree that supervisors treat me with respect, and don’t strongly believe that members are highly satisfied, and don’t agree that there are good opportunities for professional growth, then even though agree that management will take interests into account,
Intend to Continue Working at Club (=1)
Model: Data Information
File: modeling data with binarized dependents w missing.txt
Target Variable: Q39_1
Predictor Variables: Q66, Q67, Q68, Q69, Q3, Q4, Q5, Q6, Q7, Q8, Q9, Q10, Q11, Q12, Q13, Q14, Q15, Q16, Q17, Q18, Q20, Q21, Q22, Q23, Q24, Q25, Q26, Q27, Q28, Q29, Q30, Q31, Q32, Q33, Q34, Q35, Q36, Q37, Q38, Q45, Q46, Q47, Q48, Q49, Q50, Q51, Q52, Q53, Q54, Q55, Q56, Q57, Q58, Q59, Q60, Q61, Q62, Q63, Q64, Q65
Class | N Cases | Pct
0 | 3,030 | 49.6%
1 | 3,085 | 50.4%
Intend to Continue Working at Club:
Model Performance
Class | N Cases | N Misclassified | Pct. Misclass
0 | 3,030 | 868 | 28.65
1 | 3,085 | 849 | 27.52
Node | Cases Target Class | % of Node Tgt. Class | % Target Class | Cum % Tgt. Class | Cum % Pop | % Pop | Cases in Node | Cum lift | Lift
10 | 1,099 | 80.81 | 35.62 | 35.62 | 22.24 | 22.24 | 1,360 | 1.60 | 1.60
9 | 486 | 69.63 | 15.75 | 51.38 | 33.66 | 11.42 | 698 | 1.53 | 1.38
5 | 349 | 67.38 | 11.31 | 62.69 | 42.13 | 8.47 | 518 | 1.49 | 1.34
8 | 100 | 65.36 | 3.24 | 65.93 | 44.63 | 2.50 | 153 | 1.48 | 1.30
4 | 202 | 53.87 | 6.55 | 72.48 | 50.76 | 6.13 | 375 | 1.43 | 1.07
7 | 75 | 43.86 | 2.43 | 74.91 | 53.56 | 2.80 | 171 | 1.40 | 0.87
2 | 224 | 35.33 | 7.26 | 82.17 | 63.93 | 10.37 | 634 | 1.29 | 0.70
3 | 43 | 33.59 | 1.39 | 83.57 | 66.02 | 2.09 | 128 | 1.27 | 0.67
6 | 65 | 30.23 | 2.11 | 85.67 | 69.53 | 3.52 | 215 | 1.23 | 0.60
1 | 442 | 23.73 | 14.33 | 100.00 | 100.00 | 30.47 | 1,863 | 1.00 | 0.47
Intend to Continue Working at Club
Model: Splitters
• Q8: Feel Welcome
– Surrogate: Q27 (family friendly place), Q28 (diverse environment), Q18 (good working conditions)
• Q69: Age
– Surrogate: Q66 (how long worked at Club), Q68 (education)
• Q18: Good Working Conditions
– Q17 (have necessary support and materials to do job)
• Q5: Good Opportunities for Professional Growth
– Q7, Q33 (Management will take my interests into account)
• Q7: Will be Recognized for Good Job
[Figure: CART tree diagram with splits on Q8, Q69, Q18, Q5, Q6, Q7, Q66, Q56 and terminal nodes 1 through 10]
Intend to Continue Working at Club
Model: Key Variables
Variable | Score
Q8 | 100
Q18 | 84.13
Q27 | 63.23
Q11 | 57.03
Q28 | 50.45
Q26 | 48.54
Q7 | 43.43
Q5 | 37.23
Q33 | 32.81
Q31 | 23.56
Q69 | 22.21
Q4 | 21.86
Q9 | 18.79
Q3 | 13.82
Q13 | 9.98
Q14 | 9.46
Q16 | 8.12
Q15 | 6.03
Q66 | 5.26
Q17 | 3.99
Q56 | 2.15
Q6 | 2.03
Q23 | 1.63
Q68 | 1.23
Primary splitters only
Variable | Score
Q8 | 100
Q5 | 37.07
Q69 | 17.48
Q7 | 11.24
Q18 | 10.7
Q66 | 5.19
Q56 | 2.15
Q6 | 2.03
• Q8: Feel Welcome
– Surrogate: Q27 (family friendly place), Q28 (diverse environment), Q18 (good working conditions)
• Q69: Age
– Surrogate: Q66 (how long worked at Club), Q68 (education)
• Q18: Good Working Conditions
– Q17 (have necessary support and materials to do job)
• Q5: Good Opportunities for Professional Growth
– Q7, Q33 (Management will take my interests into account)
• Q7: Will be Recognized for Good Job
Intend to Continue Working at Club
Model: Key Rules
/*Rules for terminal node 10*/
Matches
• 1,360 surveys (22.2%)
• 1,099 intend to continue (80.8%)
• 35.6% of all intend to continue
RULE:
If ( Q8 = 1 and Q69 >= 2.5 ) Then Intend to continue
P(0) = 0.19; P(1) = 0.81; Lift = 1.6
If strongly agree that feel welcome and am 35 years old or older, then strongly agree that
/*Rules for terminal node 9*/
Matches
• 698 surveys (11.4%)
• 486 intend to continue (69.6%)
• 15.8% of all intend to continue
RULE:
If ( Q8 = 1 and Q18 = 1 and Q69 <= 2.5 ) Then Intend to continue
P(0) = 0.30; P(1) = 0.70; Lift = 1.4
If strongly agree that feel welcome and strongly agree that there are good
/*Rules for terminal node 5*/
Matches
• 518 surveys (8.5%)
• 349 intend to continue (67.4%)
• 11.3% of all intend to continue
RULE:
If ( Q8 <> 1 and Q5 = 1 or 2 and Q7 = 1 or 2 and Q66 > 2.5 ) Then Intend to continue
P(0) = 0.32; P(1) = 0.68; Lift = 1.3
If I strongly agree that if I do a good job I’ll be recognized, and I strongly agree that there are good opportunities for professional growth, and I have worked at the club for more than 2
Intend to Continue Working at Club Model:
Rules for Don’t Strongly Intend to Continue
/*Rules for terminal node 1*/
Matches
• 1,863 surveys (30.5%)
• 442 strongly intend to continue working (23.7%)
• 14.3% of all strongly intend to continue working
• 46.9% of all not strongly intending to continue
RULE:
If ( Q8 <> 1 and Q5 <> 1 or 2 ) Then not strongly intending to continue working at club
P(0) = 0.76; P(1) = 0.24; Lift 0.47
If don’t strongly agree that feel welcome and don’t strongly agree that there are good opportunities for professional growth, then don’t strongly agree that intend to continue working at the club.
/*Rules for terminal node 2*/
Matches
• 634 surveys (10.4%)
• 224 strongly intend to continue working (35.3%)
• 7.3% of all strongly intend to continue working
• 13.5% of all not strongly intending to continue working
RULE:
If ( Q8 <> 1 and Q5 = 1 or 2 and Q7 <> 1 or 2 ) Then not strongly intending to continue working at club
P(0) = 0.65; P(1) = 0.35; Lift 0.70
If don’t strongly agree that feel welcome and don’t strongly agree that if I do a good job I’ll be recognized, even though I strongly agree that there are good opportunities for professional growth, then don’t strongly agree that intend to continue working at the club.
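Each rule's "Matches" summary is arithmetic on four counts: cases in the node, target cases in the node, total cases, and total target cases. The sketch below recomputes the terminal node 1 summary from the counts given in this model's tables (1,863 and 442 in the node; 6,115 surveys with 3,085 strongly intending to continue). The function name and dictionary keys are illustrative only.

```python
def rule_summary(node_cases, node_target, total_cases, total_target):
    """Coverage, purity, and lift of one terminal-node rule, in percent where noted."""
    node_rate = node_target / node_cases
    base_rate = total_target / total_cases
    return {
        "pct_surveys": 100 * node_cases / total_cases,
        "pct_target_in_node": 100 * node_rate,
        "pct_of_all_target": 100 * node_target / total_target,
        "pct_of_all_non_target":
            100 * (node_cases - node_target) / (total_cases - total_target),
        "lift": node_rate / base_rate,
    }


# Terminal node 1 of the Intend to Continue model.
node1 = rule_summary(node_cases=1863, node_target=442,
                     total_cases=6115, total_target=3085)
```

For a rule that predicts the non-target class, the share of all non-target cases it captures (46.9% here) is the more telling coverage figure, while lift below 1 confirms the node is depleted of employees who strongly intend to stay.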