Analytical People
ASC September
Proving value in complex analytics
26th September 2014
John McConnell
Information and
Data Management
2 Rivers
Research
Operational/Transactional
Inferential Statistics
Inferring parameter values in a target population based on sample statistics. Often using parametric assumptions.
Data Mining
Applying historical patterns to predict future outcomes. Tested empirically.
DM and PA
Data mining
Discovering previously undetected patterns and relationships in data
Predictive analytics
Applying historical patterns to predict future outcomes
Major Analytical Pillars
People
• Customer Lifecycle
• Acquisition
• Up-sell
• Retention (Churn)
Operational
• Predictive Maintenance
• Supply Chain Forecasting
• Pricing
• Product lifecycle
Threat & Risk
• Fraud
• Risk analysis
• Crime prediction
Processes and events
[Chart labels: Time, Revenue, Loss, Less Loss, Profit]
a) Use the data we hold on the customer up to the start of the last period (e.g. month)
b) Model it against known behaviour (churn or stay) in the last period
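The windowing in a) and b) can be sketched as below. This is a minimal illustration only: the monthly history, the field layout and the derived predictors (`mean_spend_before`, `periods_observed`) are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical monthly history: (customer_id, month, spend, active_at_month_end)
history = [
    ("c1", 1, 30.0, True), ("c1", 2, 28.0, True), ("c1", 3, 0.0, False),
    ("c2", 1, 45.0, True), ("c2", 2, 50.0, True), ("c2", 3, 52.0, True),
]

def build_training_rows(history, last_period):
    """Predictors from periods before `last_period`; target from `last_period`."""
    spend_before = defaultdict(list)
    outcome = {}
    for customer_id, month, spend, active in history:
        if month < last_period:
            # a) data up to the time before the last period
            spend_before[customer_id].append(spend)
        elif month == last_period:
            # b) known behaviour in the last period
            outcome[customer_id] = "stay" if active else "churn"
    rows = []
    for customer_id, spends in sorted(spend_before.items()):
        if customer_id in outcome:
            rows.append({
                "customer_id": customer_id,
                "mean_spend_before": sum(spends) / len(spends),
                "periods_observed": len(spends),
                "target": outcome[customer_id],
            })
    return rows

rows = build_training_rows(history, last_period=3)
for row in rows:
    print(row)
```

Keeping the predictor window strictly before the outcome period stops information from the outcome leaking into the predictors.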
Data Types
Behavioral data (What?)
- Orders
- Transactions
- Payment history
- Usage history
Descriptive data (Who?)
- Attributes
- Characteristics
- Self-declared info
- (Geo)demographics
Attitudinal data (Why?)
- Opinions
- Preferences
- Needs & Desires
Interaction data (How?)
- E-Mail / chat transcripts
- Call center notes
- Web Click-streams
- In person dialogues
People and Roles
Business
• Domain
• Subject Matter
Analytical
• Methodologies
• What to use when
Data
• Understanding
• Management
• Structure
Technology
• Integration
• Building apps
The CRISP-DM process
1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Modelling
5. Evaluation
6. Deployment
1. Business understanding
• Get a clear understanding of the business/research objectives
– To reduce churn rates
– To acquire valuable customers
– To cross-sell/up-sell
– To prevent fraud
• Agree success criteria
– E.g. to reduce our annual churn rate from 5% to 3%
– Reduce acquisition costs by 30%
• Assess the situation
• Translate to analytical objectives (if possible)
• Evaluate the cost/benefit
• Clearly understand how action can be taken based on the likely outcomes
– How to deploy
• Document relevant resources, constraints, systems
CRISP-DM reference: http://crisp-dm.eu/
[Figure: phases 1 to 6 along a Time axis, with 3. Data Preparation labelled]
2. Data understanding – High Level
• Identify the data sources and fields which may have a bearing on the business/analytical objectives
• Review data schemas and any other data documentation
• What looks relevant?
• What are the formats?
– Databases, text files, Excel, etc.
• What are the field names?
– Metadata
• Crucially … what is the likely target field that maps to the business objective, e.g.
– Customers purchasing for the first time
– Customers re-purchasing
– Revenue/Profit/ROI
– Visits to the web site
– Campaign response
– Customers churning
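A first pass over field names and formats can be sketched as below. The extract, its column names and the choice of `churned` as the candidate target field are all hypothetical; in practice the source would be a file or a database table rather than an in-memory string.

```python
import csv
import io

# Hypothetical extract standing in for a real file or table
raw = io.StringIO(
    "customer_id,signup_date,monthly_spend,churned\n"
    "1001,2013-04-02,29.99,0\n"
    "1002,2012-11-17,,1\n"
    "1003,2014-01-09,54.50,0\n"
)

def profile(handle):
    """Return {fieldname: {"filled": n, "missing": n, "numeric": bool}}."""
    reader = csv.DictReader(handle)
    summary = {name: {"filled": 0, "missing": 0, "numeric": True}
               for name in reader.fieldnames}
    for record in reader:
        for name, value in record.items():
            if value == "":
                summary[name]["missing"] += 1
                continue
            summary[name]["filled"] += 1
            try:
                float(value)
            except ValueError:
                # At least one value is not a number, so treat the field as text
                summary[name]["numeric"] = False
    return summary

summary = profile(raw)
for name, stats in summary.items():
    print(name, stats)
```

A summary like this answers "what are the field names?" and "what are the formats?" quickly, and flags gaps in a candidate target field before any modelling starts.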
3. Data Preparation
• Data Understanding effectively designs this step
• Together with Data Understanding this can be more time consuming than expected
– Sometimes 80% of a project
– Especially for new initiatives
• Typically integrates data from different sources
• Aggregate data
• Create composite measures
– E.g. band variables
– Apply formulae, e.g. compute annualised figures and other ratios
• Comparable to ETL (Extract Transform Load)
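The composite measures mentioned above (banding, annualised figures, ratios) can be sketched as below. The cut points, labels and customer fields are hypothetical and chosen only for illustration.

```python
def band(value, edges, labels):
    """Return the label of the first band whose upper edge exceeds `value`."""
    for upper, label in zip(edges, labels):
        if value < upper:
            return label
    return labels[-1]  # at or above the last edge

def annualise(amount, months_covered):
    """Scale an amount observed over `months_covered` months to 12 months."""
    return amount * 12 / months_covered

AGE_EDGES = [25, 35, 50]                       # hypothetical cut points
AGE_LABELS = ["<25", "25-34", "35-49", "50+"]  # one more label than edges

customer = {"age": 41, "spend": 180.0, "months_covered": 9, "orders": 6}
prepared = {
    "age_band": band(customer["age"], AGE_EDGES, AGE_LABELS),
    "annualised_spend": annualise(customer["spend"], customer["months_covered"]),
    "spend_per_order": customer["spend"] / customer["orders"],
}
print(prepared)
```

Annualising puts customers with different observation lengths on the same scale, which matters when long-standing and recent customers are modelled together.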
Integrating Data
Level 1 – Matching IDs. The ideal situation
Level 2 – Similar fields/values. Need to clean or apply “entity” matching
Level 3 – More fuzzy. If possible we approximate, e.g. space/time matches
[Diagram labels: Source Data, Operational]
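The three integration levels can be sketched as a single matching function. This is an illustration under stated assumptions: the record layout, the suffix list used for name cleaning, and the 30-minute and 0.001-degree tolerances are all hypothetical.

```python
import re
from datetime import datetime, timedelta

def normalise_name(name):
    """Level 2 cleaning: lower-case, strip punctuation and common suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in {"ltd", "limited", "plc"}]
    return " ".join(words)

def match_level(a, b, max_gap=timedelta(minutes=30), max_degrees=0.001):
    """Return 1, 2 or 3 for the level at which two records match, else None."""
    if a.get("id") and a.get("id") == b.get("id"):
        return 1  # Level 1: matching IDs
    if normalise_name(a["name"]) == normalise_name(b["name"]):
        return 2  # Level 2: similar values after cleaning
    close_in_time = abs(a["when"] - b["when"]) <= max_gap
    close_in_space = (abs(a["lat"] - b["lat"]) <= max_degrees
                      and abs(a["lon"] - b["lon"]) <= max_degrees)
    if close_in_time and close_in_space:
        return 3  # Level 3: approximate space/time match
    return None

t = datetime(2014, 9, 26, 10, 0)
crm = {"id": "A17", "name": "Acme Ltd.", "when": t, "lat": 51.5010, "lon": -0.1420}
billing = {"id": "A17", "name": "ACME LIMITED", "when": t, "lat": 51.5010, "lon": -0.1420}
web = {"id": None, "name": "acme", "when": t, "lat": 0.0, "lon": 0.0}
gps = {"id": None, "name": "unknown", "when": t + timedelta(minutes=10),
       "lat": 51.5012, "lon": -0.1421}

print(match_level(crm, billing), match_level(crm, web),
      match_level(crm, gps), match_level(web, gps))
```

Trying the levels in order means the cheapest and most reliable match is used whenever it is available, with fuzzier matches only as a fallback.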
Modelling & Evaluation
4. Modelling
• Apply a variety of modelling techniques
• Candidate list identified during understanding phase
– Driven by data types (see later)
– Constrained by available tools
• 2 broad styles:
a) Hypothesis led. Add the fields/predictors that we believe are driving the outcome
b) Data led. Add more fields at the beginning and incrementally reduce (and/or let the algorithms do that)
• The best performing modelling algorithm is a function of the specific data/problem
5. Evaluation
• Essential that the models are tested against unseen data
• Typically the data is partitioned into 2 (or 3) sets at random, e.g. 70%:30%
1. Training (modelling) set
2. Test (holdout) set
3. Evaluation set
• Evaluate against the success criteria agreed in the understanding phase
• Often it is about how well the model performs against a given value criterion, e.g. revenue
– Defined in Data Understanding phase
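A random 70%:30% partition and a value-based check on the holdout set can be sketched as below. The records, the `visits` score standing in for a fitted model, and the "top 20%" cut-off are hypothetical; a real model would be fitted on the training set only.

```python
import random

def partition(records, train_share=0.7, seed=42):
    """Shuffle a copy of `records` and split it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_share))
    return shuffled[:cut], shuffled[cut:]

def value_captured(test_set, score, top_share=0.2):
    """Share of test-set revenue held by the top-scored `top_share` of records."""
    ranked = sorted(test_set, key=score, reverse=True)
    top_n = max(1, int(len(ranked) * top_share))
    total = sum(r["revenue"] for r in ranked)
    return sum(r["revenue"] for r in ranked[:top_n]) / total

# Hypothetical data: revenue rises with visits, so `visits` is a usable score
records = [{"id": i, "visits": i % 10, "revenue": 5.0 * (i % 10)} for i in range(100)]
train, test = partition(records)
captured = value_captured(test, score=lambda r: r["visits"])
print(len(train), len(test), round(captured, 2))
```

Measuring revenue captured rather than raw accuracy ties the evaluation back to the value criterion agreed with the business.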
On-line segmentation in News media
Who visits the site?
What do they do on…
Why do they visit the site and what do they think of it?
Developing the visitor segments
Behavioural segmentation based on content consumption
Segments profiled using other behavioural data and also additional survey and/or customer data
Data sources / integration
Analytical Data Views
• Click Stream (Adobe)
• Registered Customer Data (CRM)
• Advertising revenue (Ad serving)
• Survey (Confirmit)
Daytime online
• The most valuable segment
• View most evenly throughout the day
• Highest visit frequency
• More in the week and to a lesser extent at weekends
• More likely females under 34
• Typically looking after the house/children or alternatively students
• More likely to be offline readers as well… or read one of the other competitive publications
• Likely to look for an article in the publication
• Often interested in certain articles or other specific sections in general
• Broadest repertoire of content read
• Most likely to use search
• Most likely to visit once a day
Our 6 segments – size and value
Segment        Size (% of all visitors)   Average visitor value
Seg 5          40%                        12.7
Seg 3          26%                        51.0
Seg 2          14%                        52.8
First timers   7%                         39.2
Seg 4          9%                         94.5
Seg 6          4%                         29.4
Width shows segment size (% of all visitors). Height shows the average visitor value in each segment (value displayed in block).
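Since width is segment size and height is average value, the area of each block is proportional to that segment's contribution to total value. A minimal sketch of that calculation follows. It assumes the labels, values and percentages on the slide pair up by position, which the extracted text does not confirm.

```python
# (segment, share of visitors, average visitor value), paired by position (assumption)
segments = [
    ("Seg 5", 0.40, 12.7),
    ("Seg 3", 0.26, 51.0),
    ("Seg 2", 0.14, 52.8),
    ("First timers", 0.07, 39.2),
    ("Seg 4", 0.09, 94.5),
    ("Seg 6", 0.04, 29.4),
]

def value_shares(segments):
    """Each segment's share of total value: size x average value, normalised."""
    areas = {name: size * value for name, size, value in segments}
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}

shares = value_shares(segments)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<13}{share:6.1%}")
```

A large segment with low average value can contribute less in total than a small, high-value one, which is why both dimensions are shown together.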
“Optimising” processes in Telco Managed Services
• Can we predict what is needed to fix a fault from the initial call/alarm?
– Save time and money by having the right parts and sending engineers with the right skills
• Can we improve service levels by having the right skills/stock at the right place at the right time?
• Can we predict when failures will happen and perform pro-active maintenance to prevent them?
– “Predictive Maintenance”
• Can we predict faults according to the weather?
Joining Work Force and Tom Tom (GPS) data
Within the Tom Tom data we match sites to trip destinations using latitude and longitude (to 3 decimal places) – approximately within 111 metres
[Diagram labels: Tom Tom, WFM, Sites, Trips, FTs, Dates, Tickets, Work Orders, Weather Stations]
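Matching on coordinates rounded to 3 decimal places can be sketched as below. The site and trip records are hypothetical, and the figure of 111,000 metres per degree of latitude is an approximation used for illustration.

```python
import math

METRES_PER_DEGREE_LAT = 111_000  # approximate; varies slightly with latitude

def grid_key(lat, lon, places=3):
    """Round a coordinate pair so nearby points share the same key."""
    return (round(lat, places), round(lon, places))

def cell_size_metres(lat, places=3):
    """Approximate north-south and east-west size of one rounding cell."""
    step = 10 ** -places
    north_south = step * METRES_PER_DEGREE_LAT
    east_west = north_south * math.cos(math.radians(lat))
    return north_south, east_west

# Hypothetical site and trip-destination records
sites = {"S1": (51.50142, -0.14189), "S2": (53.48095, -2.23743)}
trips = [("T1", 51.50138, -0.14212), ("T2", 52.20530, 0.12180)]

site_by_key = {grid_key(lat, lon): site_id for site_id, (lat, lon) in sites.items()}
matches = {trip_id: site_by_key.get(grid_key(lat, lon)) for trip_id, lat, lon in trips}
print(matches)
print(cell_size_metres(51.5))
```

Rounding turns a distance search into an exact key join, which is cheap, but two points either side of a cell boundary will not match even when they are only a few metres apart. The east-west size of a cell also shrinks away from the equator, so the 111 metre figure applies to the north-south direction.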
Software tools to visualise the data flow
Retaining subscribers
• Annual Magazine Subscription Renewal Modelling
• Predict the likelihood of each customer to renew at their next renewal
• Ensure predictive accuracy
• The model must make sense to the business – it must be usable and ‘deployable’
Data sources and fields
Descriptive
• Company/Individual
• Company size
• Business type
• Job function
• Age
• Association membership
• Gender
• Location
Value
• Lifetime
• Lifetime value
• Annualised value
• Back issue claims
• Payment method
• Time taken to pay
• Amount paid last time
Marketing
• Frequency of contact
• Acquisition channel
• Renewal channel
• Subscription term
• Preferred response method
Campaign Test & Control
• Revenue in the test groups is up 18%
• Profit in the test groups is up 21%
• The success of this test means it is being rolled out across 100% of records for participating titles
• We’re co-developing an on-line (SaaS) application - “PX” - that will enable subscription managers to build and deploy models themselves
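An uplift figure of this kind is normally the relative difference between test and control on a per-record basis. A minimal sketch of that arithmetic follows. The group sizes and totals are invented purely to illustrate the calculation and are not the campaign's actual data.

```python
def uplift(test_total, test_size, control_total, control_size):
    """Relative difference in per-record average between test and control."""
    test_avg = test_total / test_size
    control_avg = control_total / control_size
    return (test_avg - control_avg) / control_avg

# Invented figures, for illustration only
test_group = {"records": 4000, "revenue": 47_200.0, "profit": 19_360.0}
control_group = {"records": 1000, "revenue": 10_000.0, "profit": 4_000.0}

revenue_uplift = uplift(test_group["revenue"], test_group["records"],
                        control_group["revenue"], control_group["records"])
profit_uplift = uplift(test_group["profit"], test_group["records"],
                       control_group["profit"], control_group["records"])
print(f"revenue uplift {revenue_uplift:.0%}, profit uplift {profit_uplift:.0%}")
```

Dividing by group size first matters because test and control groups are rarely the same size; comparing raw totals would overstate or understate the effect.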
Brammer
• A leading automotive parts distributor reduces the cost of carrying surplus stock and improves customer service
• Applications & Benefits
– Predictive analytics helped Brammer to manage its inventory more efficiently, significantly reducing the need to carry surplus stock, resulting in a total inventory reduction of £31.1 million in one year
– Inventory turnover improved from 3.2 times at the end of 2008, to 3.7 times at the end of the first half of 2009
– Greater understanding of patterns and trends in customer purchasing data helps Brammer forecast marginal stock products more accurately and improve customer satisfaction by making a wider product range available for immediate dispatch
– Detailed insight into inventory requirements has helped Brammer develop closer relationships with strategic suppliers leading to further cost benefits
What about “Big Data”?
• We have done some work in true “Big Data”
• Deploying models against Big Data is easier (though not trivial) than modelling against Big Data
• Often the data we need to analyse is a subset of the source data
– “The disappearing Terabyte”
– And sampling works!
• BUT. The data still has to be prepared, hence… Data Preparation
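The slides do not say which sampling method was used. One common option for drawing a uniform random sample from a source too large to hold in memory is reservoir sampling, sketched here as an illustration; the generator standing in for the large table is hypothetical.

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for index, item in enumerate(stream):
        if index < k:
            sample.append(item)           # fill the reservoir first
        else:
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                sample[slot] = item       # replace with probability k / (index + 1)
    return sample

# Hypothetical source: a generator standing in for a table too large to load
big_table = ({"row": i, "value": i % 7} for i in range(100_000))
subset = reservoir_sample(big_table, k=1000, seed=1)
print(len(subset))
```

The source is read once and only `k` rows are ever held in memory, so the modelling data set stays small however large the source grows.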
Summary
• Proving value seems to be more necessary than ever
• Big Data projects need to be evaluated like smaller data projects
• Evaluate the potential upside up-front
– Use external sources where appropriate
  • http://nucleusresearch.com/
  • http://www.predictiveanalyticsworld.com/
• CRISP-DM helps
• Prove it
– With a business case up-front
– With a pilot/proof-of-value project