IHBI@CMU
IHBI@CMU: Snapshot
Not-for-profit consulting and applied research organization with more than a decade of experience
Purpose
• Build, foster, and promote the use of advanced and predictive analytics and big data for competitive advantage in US-based organizations across multiple industries
• Train the next generation of data scientists
Mature team
• PhD and MS level resources – physics, statistics, economics, computer science
• SAS Certified with extensive SAS professional training
• Strengths: predictive modeling, time series forecasting, machine learning, development, and mentoring
Exceptional infrastructure
• Analytics Insight Lab – Greenplum DCA MPP environment
• SAS, ESRI, Tableau, Model Factory (POC pending)
Founded in 2001 as Central Michigan University Research Corporation (CMU-RC) by a group that included David Kepler (The Dow Chemical Company), IBM Watson, Dow Corning, and others.
Analytic Advantage
Analytics capability ladder, from the analytically impaired end upward:
• Reporting: What happened?
• Queries/drill down: Where is the problem?
• Alerts: What actions are needed?
• Statistical analysis: Why is this happening?
• Forecasting: What if these trends continue?
• Predictive modeling: What will happen next?
• Optimization: What is the best that can happen?
Adapted from Competing on Analytics: The New Science of Winning (Davenport and Harris, 2007).
Business Expertise/Services
• Forecasting
• Predictive warranty
• Customer loyalty
• Early warning
• Market segmentation
• Price optimization
• Site location
• New customer identification
• Workforce predictive modeling
• Website monitoring
• Customer intelligence
• Text/unstructured data mining
Sample of Methods
Data Mining and Modeling
• Decision trees
• Forecasting
• Neural networks
• Regression
Optimization / Simulation
• System dynamics
• Agent-based modeling
• Discrete-event simulation
• Optimization with uncertain data
Customers and Partners
Manufacturing
• The Dow Chemical Company
• The Dow Corning Corporation
• Ford Motor Company
• General Motors
• Harley-Davidson
• Monsanto
• Steelcase
• Whirlpool Corporation
Technology
• IBM
• Information Builders
• SAS Institute
• Hewlett-Packard
Banking, Finance, Insurance
• Auto-Owners Insurance
• Comerica Bank
Health and Healthcare
• Central Michigan District Health Dept.
• College of Health Professions, CMU
• College of Medicine, CMU
• Eli Lilly
• Henry Ford Health System
• Michigan Health Information Alliance
• Michigan Health Information Network
• Partners Healthcare (Boston)
• Spectrum Health System
• Synergy Medical
Other
• Procter & Gamble
• DTE Energy
• Domino's Pizza
• Gordon Food Service
• State of Michigan
Services
Innovation Workshop – A structured process to identify high-value analytics opportunities.
Exploratory Data Analysis – A statistical approach to evaluating the relative strengths and weaknesses of data to be used for a specific purpose.
Analytics Proof-of-Concept – Custom projects, usually involving a series of complex models, designed to answer specific questions.
Analytics Staff Augmentation – Get the right help when you need it.
A PERFECT STORM
Ambitious Question
Business Challenge: Dramatically increase the ability to predict demand for products and services along three dimensions:
• Customer segment (age, race, gender)
• Geographic region (ZIP code/census tract)
• Time (3, 5, and 10 years in the future)
Ambitious Data (External)
Scope
• Census, Bureau of Labor Statistics, NOAA, American Community Survey, and more
Time
• 10 years of history
• 10 years of forward projections
Space
• ZIP code and census tract
Ambitious Modeling
• 300 models
  – Driven by the customer segments
• Machine learning approach
  – Artificial neural network
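The one-model-per-segment design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the record layout (segment key, year, demand) and all function names are hypothetical, and a simple least-squares linear trend stands in for the artificial neural network so the example runs with the Python standard library alone.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Hypothetical record layout: (segment_key, year, demand).
Record = Tuple[Hashable, float, float]
Model = Tuple[float, float]  # (intercept a, slope b)


def fit_linear_trend(points: List[Tuple[float, float]]) -> Model:
    """Least-squares fit of demand = a + b * year.

    Stand-in for the neural network named on the slide (an assumption
    made only to keep this sketch dependency-free).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx if sxx else 0.0  # a single point gives a flat trend
    return mean_y - b * mean_x, b


def fit_per_segment(records: Iterable[Record]) -> Dict[Hashable, Model]:
    """Group records by segment key and fit one independent model per segment."""
    by_segment: Dict[Hashable, List[Tuple[float, float]]] = defaultdict(list)
    for segment, year, demand in records:
        by_segment[segment].append((year, demand))
    return {seg: fit_linear_trend(pts) for seg, pts in by_segment.items()}


def forecast(models: Dict[Hashable, Model], segment: Hashable, year: float) -> float:
    """Predict demand for one segment in a given year."""
    a, b = models[segment]
    return a + b * year
```

Replacing the stand-in with a neural network would change only `fit_linear_trend`; the grouping step that yields one independent model per segment stays the same.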
The Pain Point
Data Size
• Terabytes
Loading the data
• 7–10 days to load
Looking at the data
• Traversing a table: hours
Testing a model
Our Solution
EMC/Greenplum – Data Computing Appliance
• Quarter rack
• Scalable
Performance to date – FAST (POC in progress)
• Loading: days to minutes
• Generating a 100K-row sample: hours to minutes
• Sample queries (400-million-row result set): 24 minutes to 1 minute
• Training the model: to be determined
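The slide does not say how the 100K-row sample was drawn. As an illustration only, one standard single-pass way to take a fixed-size uniform random sample from a table too large to load at once is reservoir sampling (Algorithm R). The sketch assumes rows arrive as a Python iterable; `reservoir_sample` is a hypothetical helper, not part of Greenplum or the POC.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k rows drawn uniformly at random from `rows` in one pass (Algorithm R).

    Hypothetical helper for illustration. If fewer than k rows exist,
    all of them are returned in their original order.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Row i (0-based) enters with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 100_000, seed=42)` returns 100,000 distinct values while holding only the reservoir in memory. Sampling can also be done inside the database in SQL; this sketch only shows the underlying idea.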
Better Answers
• Forecasting
• Predictive warranty
• Customer loyalty
• Early warning
• Market segmentation
• Price optimization
• Site location
• New customer identification
• Workforce predictive modeling
• Website monitoring
• Customer intelligence
• Text/unstructured data mining
Big Data/Analytics Sandbox
The Analytics Insight Lab (A-LAB)
A secure platform for leveraging data to solve real-world problems and challenges, and a low-risk way to get started:
• Leading-edge data visualization
• Integration of proprietary and public data
• Advanced mining of structured and unstructured data, including social media
Technical Environment
• EMC/Greenplum DCA
• Remote access provided through virtual machines
• SAS, ESRI, Tableau
Contextual Database
• 18 billion rows of demographic and socioeconomic data
• 20 years of data at the census tract level
Problem Solving/Modeling Support
• As much or as little as you need
Subject Matter Experts for Hire
• Faculty available
Summary
Big data and high-performance computing offer new opportunities to:
• Improve current data-driven problem solving
• Solve completely new problems
This is not a fad, but a fundamental shift in how successful organizations will compete for
Contact
Tracy Irwin Hewitt, Associate Director, 734-837-0279
Map: Opportunity by Postal Code (produced by The Institute for Health & Business Insight, Central Michigan University, Oct. 1, 2012)
This map shows the 2015 "Opportunity Forecast" for Michigan at the postal code level. Each mark represents a postal code point, with a legend ranging from high to low opportunity.
TECHNICAL ENVIRONMENT
ICEBOX Technical Specs
• Hosts:
  – 2x Dell R715 servers with dual AMD Opteron 6136 processors and 96 GB RAM, running VMware ESX 4.1
• Greenplum DCA:
  – 4 Greenplum Database Modules
• Storage:
  – 2 Xio ISE1 FC: 49.6 TB total
  – 1 Drobo: 16 TB
  – 1 Synology: 16 TB
  – 3 EMC Isilon X200: 105 TB total (coming soon)
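As a worked check of the storage figures, the listed capacities sum to 81.6 TB today and 186.6 TB once the Isilon nodes arrive. The sketch assumes the slide's figures are raw (not usable) capacities and that each "total" already covers every unit of that type; `Decimal` is used so the sums are exact.

```python
from decimal import Decimal
from typing import Dict

# Raw capacities in TB as listed on the ICEBOX spec slide.
# Assumption: "49.6 TB total" covers both Xio units and "105 TB total"
# covers all three Isilon nodes.
current_storage_tb: Dict[str, Decimal] = {
    "Xio ISE1 FC (2 units)": Decimal("49.6"),
    "Drobo": Decimal("16"),
    "Synology": Decimal("16"),
}
planned_storage_tb: Dict[str, Decimal] = {
    "EMC Isilon X200 (3 nodes, coming soon)": Decimal("105"),
}


def total_tb(devices: Dict[str, Decimal]) -> Decimal:
    """Sum the listed capacities in terabytes."""
    return sum(devices.values(), Decimal("0"))


current = total_tb(current_storage_tb)
with_planned = current + total_tb(planned_storage_tb)
print(f"Current: {current} TB")          # Current: 81.6 TB
print(f"With Isilon: {with_planned} TB")  # With Isilon: 186.6 TB
```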