Session 42 PD, Predictive Analytics for Actuaries: Building an Effective Predictive Analytics Team
Moderator: Courtney Nashan
Presenters:
Ian G. Duncan, FSA, FCIA, FIA, MAAA
Andy Ferris, FSA, MAAA
Christine Irene Hofbeck, FSA, MAAA
Courtney Nashan
Building an Effective Predictive Modeling Team
Christine Hofbeck, FSA, MAAA
Ian Duncan, FSA, FIA, FCIA, FCA, MAAA
Andy Ferris, FSA, FCA, CFA, MAAA
Courtney Nashan
October 12, 2015
Overall Approach
To comprehensively incorporate predictive analytics into a core operational business process, we follow four phases:
1. Phase 1 – Planning
2. Phase 2 – Data Assembly and Model Build
3. Phase 3 – Technical Implementation
4. Phase 4 – Business Implementation
1. Assembling a Team
2. Laying the Foundation
3. Selecting a Project
Phase 1 - Assembling a Team
Consider skillsets both individually and collectively
1. Ability to manipulate large datasets (SAS, R, SQL)
2. Modeling expertise
3. Business acumen
4. Ability to explain highly technical information to a non-technical audience
5. Ability to represent results graphically for ease of communication
6. Consider mix of prior experience
7. Charisma
Those who prepare the data are as important as
those who build the model, who are as important as
Phase 1 - Laying the Foundation
Building a predictive modeling capability is not only about hiring a team. Consider:
1. Technology
2. Legal commitments to customers
3. Data privacy and compliance
4. Objective
5. Change management
6. Cross-functional support
7. Budget
Consider the cultural and political impacts of this change,
not only the strategic.
Phase 1 - Selecting a Project
Your first project will get a lot of attention, so select it wisely:
1. Large enough that it can make a true business impact
2. Not so large that it takes a year or more to build (your colleagues will be anxious to see results!)
3. Available data
4. Problems that could not be solved in the past with the methods then available
5. The business wants to implement (use) it to improve decision making
6. What are my competitors doing? Where should I invest the effort?
Remember that predictive modeling makes an impact when the model is
implemented and better informed decisions are made
Phase 2 – Data Assembly & Model Build
There are two important challenges to keep in mind with
modeling:
1. How to organize the data for efficient interrogation; and
2. How to organize the data for replicability (remember that at some point, your model is going to go into production).
How to organize the data for efficient interrogation
Here is an example of a data management and warehousing problem from healthcare:
• We know that diagnoses are an important contributing factor to illness, health risk and cost.
• There are about 17,000 diagnosis codes currently in use (ICD-9). With ICD-10 this number grows to 140,000 (from October 2015!).
• There are 100,000 CPT (procedure) codes, and the National Drug Code directory contains hundreds of thousands of drug codes (updated daily!).
• Obviously this creates an unmanageable set of codes for analysis purposes.
In healthcare we have solved this problem with the use of “grouper models.” Grouper models group like diagnosis codes into diagnostic categories. Drug codes are similarly grouped into therapeutic classes.
For a lot of analytical work, grouper models are all that is required. The SOA has studied the predictive accuracy of these models in three studies (1994-2007); a fourth study is in preparation.
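As a rough illustration of the grouping idea, here is a minimal Python sketch. It is not an actual commercial or SOA-studied grouper: the three-entry prefix map, the function names `group_diagnosis` and `member_categories`, and the "Other" fallback are all assumptions made for this example. A real grouper covers the full code set and uses clinical logic.

```python
# Toy prefix-to-category map, for illustration only (hypothetical, not a real grouper).
# The three ICD-9 prefixes shown are standard: 250 = diabetes mellitus,
# 401 = essential hypertension, 493 = asthma.
TOY_GROUPER = {
    "250": "Diabetes",
    "401": "Hypertension",
    "493": "Asthma",
}


def group_diagnosis(icd9_code: str) -> str:
    """Map one ICD-9 diagnosis code to a diagnostic category via its 3-character prefix."""
    prefix = icd9_code.replace(".", "")[:3]
    return TOY_GROUPER.get(prefix, "Other")


def member_categories(claims):
    """Collapse (member_id, icd9_code) pairs into a sorted category list per member."""
    grouped = {}
    for member_id, code in claims:
        grouped.setdefault(member_id, set()).add(group_diagnosis(code))
    return {member: sorted(cats) for member, cats in grouped.items()}


if __name__ == "__main__":
    sample_claims = [("A", "250.00"), ("A", "401.9"), ("B", "999.9")]
    print(member_categories(sample_claims))
```

The point of the sketch is that many detailed codes collapse into a small number of categories per member, and it is those categories that are carried into the analysis.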
How to organize the data for replicability
The use of grouper-type models or models that assign a categorical value to a
continuous variable is very valuable in modeling because these models can be built into a warehousing process. They will then be used in the practical application of the model in production.
Another example from Healthcare:
• Body Mass Index is defined as Weight (in kg) / Height² (height in m), so it is obviously a continuous variable. But clinicians have provided categories, as follows, which provide a useful guide to the status of a particular patient:

Category          BMI
Underweight       < 18.0
Normal weight     18.0 – 25.0
Heavy weight      25.0 – 30.0
Obese             30.0 – 40.0
Morbidly Obese    40.0+
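A categorization like this can be written once and reused in both the model build and the production data flow. The sketch below is one possible way to do it in Python, using the cut points from the table above. Two assumptions are made here: height is recorded in centimetres and converted to metres before squaring, and each band includes its lower bound and excludes its upper bound (the table does not say which band a boundary value such as 25.0 falls in). The function names are hypothetical.

```python
from bisect import bisect_right

# Cut points and labels taken from the category table; boundary handling
# (lower bound inclusive, upper bound exclusive) is an assumption.
BMI_CUTS = [18.0, 25.0, 30.0, 40.0]
BMI_LABELS = ["Underweight", "Normal weight", "Heavy weight", "Obese", "Morbidly Obese"]


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body Mass Index: weight in kg divided by the square of height in metres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def bmi_category(bmi_value: float) -> str:
    """Assign the categorical band for a continuous BMI value."""
    return BMI_LABELS[bisect_right(BMI_CUTS, bmi_value)]


if __name__ == "__main__":
    value = bmi(70.0, 175.0)
    print(round(value, 1), bmi_category(value))
```

Keeping the cut points in one place means the same banding is applied when the model is fitted and when it is scored in production.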
A few quotes to keep us grounded:
“The year 1930, as a whole, should prove at least a fairly good year.”
-- Harvard Economic Service, December 1929
“All models are wrong but some are useful.”
-- George E.P. Box, Professor of Statistics, University of Wisconsin-Madison
Frequently-used software:
• SAS
• R
• Internally developed software
• Other commercially available models
Not as popular:
Python, SPSS, Salford Systems
Frequently-used models:
• OLS Regression
• GLM
• Time series
• Decision Trees
• Clustering
Not as popular:
Neural network, Bayes
Phase 3 – Technical Implementation
What we have accomplished:
• We have a mathematical equation:
What we have not accomplished:
• No real time “scoring engine” to enable use of the equation
Objective of this phase:
• A real-time flow of data inputs from multiple internal and external
sources to the “scoring engine”
• A real-time flow of model output (“score”, “reason codes”, etc.) to
business unit operations
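To make the idea of a "scoring engine" concrete, here is a minimal Python sketch under stated assumptions. It assumes the fitted model is a logistic-regression-style equation. The intercept, the coefficients, the input field names and the rule for choosing reason codes (the inputs with the largest positive contributions) are all hypothetical and chosen only for illustration. A production engine would also need the real-time data feeds, validation and logging discussed in this phase.

```python
import math

# Hypothetical fitted equation: all values and field names are illustrative assumptions.
INTERCEPT = -2.0
COEFFICIENTS = {"age_band": 0.4, "prior_claims": 0.8, "smoker": 1.1}


def score(inputs, top_n=2):
    """Apply the equation to one record and return a score plus reason codes."""
    # Contribution of each input to the linear predictor; missing fields count as 0.
    contributions = {
        name: coef * float(inputs.get(name, 0.0))
        for name, coef in COEFFICIENTS.items()
    }
    linear = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-linear))  # logistic link
    # Reason codes: the inputs that pushed the score up the most.
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    reason_codes = [name for name in ranked if contributions[name] > 0][:top_n]
    return {"score": probability, "reason_codes": reason_codes}


if __name__ == "__main__":
    print(score({"age_band": 1, "prior_claims": 2, "smoker": 1}))
```

The function takes one record at a time and returns both the score and the reasons behind it, which matches the two real-time flows described above: inputs into the engine, and output to business unit operations.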
Phase 3 – Technical Implementation
Common Challenges of this phase
• Lack of early engagement of IT staff in planning
• Lack of sufficient dedicated IT resources
• Format of data received (scanned images, etc.) in current environment
• Collecting data fields in real-time business production from multiple internal systems (administrative system, agent licensing system, illustration, etc.)
• Sensitive data fields that prior phase found to be predictive
• Fixed system release dates conflict with desired program rollout
Phase 3 – Technical Implementation
Hints in overcoming common challenges
• Engage IT resources early in the project
• Plan in advance to discover more data challenges than you initially
expect
• Avoid reputational risk by carefully considering how each data field
will be used in new business process
Phase 4 – Business Implementation
What we have accomplished:
• We can deliver model output in real time to a business unit
What we have not accomplished:
• No change yet to any core business operations to take advantage of the model output
Objective of this phase:
• Classic business process change exercise
• Change an existing business process to save time, save money, be
more efficient, etc.
Phase 4 – Business Implementation
Common Challenges of this phase
• Lack of Early Engagement - by business unit in how algorithm will be used; how/why business process will change
• Lack of Sufficient Communication - with business stakeholders (other departments, customers, producers) on changes in operational procedures
• Unrealistic Expectations - by business stakeholders in impact of predictive modeling and associated changes to business processes
• Reputation Risk - Are you comfortable explaining on 60 Minutes the data sources used by your business process in making decisions on individual customers?
• Implementing tools and metrics to monitor the ongoing impact of the new business process
Phase 4 – Business Implementation
Hints in overcoming common challenges
• Engage business unit early to ensure large model development effort
will be deployed in tangible business process change
• Design change management plan, including any impacts to operating
model, org design, as well as communications plan for program rollout
• Manage expectations to communicate what the new process will NOT
do
• Carefully consider how any new data sources may be perceived as
sensitive in future state business process
• Implement tools and metrics to monitor the ongoing impact of the
algorithm on the business process