
Leveraging Big Data


Academic year: 2021


(1)

IHBI@CMU

(2)

IHBI@CMU: Snapshot

Not-for-profit consulting and applied research organization with more than a decade of experience

Purpose

•  Build, foster and promote the use of advanced and predictive analytics and big data for competitive advantage in US-based organizations across multiple industries

•  Train the next generation of data scientists

Mature team

•  PhD and MS level resources – physics, statistics, economics, computer science

•  SAS Certified with extensive SAS professional training

•  Strengths: predictive modeling, time series forecasting, machine learning, development mentoring

Exceptional infrastructure

•  Analytics Insight Lab – Greenplum DCA MPP environment

•  SAS, ESRI, Tableau, Model Factory (POC pending)

Founded as Central Michigan University Research Corporation (CMU-RC) in 2001 by a group that included David Kepler (The Dow Chemical Company), IBM Watson, Dow Corning and others.

(3)

Analytic Advantage

•  Reporting: What happened?

•  Queries/drill down: Where is the problem?

•  Alerts: What actions are needed?

•  Statistical analysis: Why is this happening?

•  Forecasting: What if these trends continue?

•  Predictive modeling: What will happen next?

•  Optimization: What is the best that can happen?

Analytically impaired

Adapted from Competing on Analytics: The New Science of Winning (Davenport, 2007).

(4)

Business Expertise/Services

Forecasting

Predictive warranty

Customer loyalty

Early warning

Market segmentation

Price optimization

Site location

New customer identification

Workforce predictive modeling

Website monitoring

Customer intelligence

Text/unstructured data mining

(5)

Sample of Methods

Data Mining and Modeling

•  Decision Trees

•  Forecasting

•  Neural Networks

•  Regression

Optimization / Simulation

•  Systems Dynamics

•  Agent-based modeling

•  Discrete-event simulation

•  Optimization with uncertain data

(6)

Customers and Partners

Manufacturing

•  The Dow Chemical Company

•  The Dow Corning Corporation

•  Ford Motor Company

•  General Motors

•  Harley-Davidson

•  Monsanto

•  Steelcase

•  Whirlpool Corporation

Technology

•  IBM

•  Information Builders

•  SAS Institute

•  Hewlett-Packard

Banking, Finance, Insurance

•  Auto-Owners Insurance

•  Comerica Bank

Health and Healthcare

•  Central Michigan District Health Dept.

•  College of Health Professions, CMU

•  College of Medicine, CMU

•  Eli Lilly

•  Henry Ford Health System

•  Michigan Health Information Alliance

•  Michigan Health Information Network

•  Partners Healthcare (Boston)

•  Spectrum Health System

•  Synergy Medical

Other

•  Procter and Gamble

•  DTE Energy

•  Domino's Pizza

•  Gordon Food Service

•  State of Michigan

(7)

Services

Innovation Workshop – A structured process to identify high-value analytics opportunities.

Exploratory Data Analysis – A statistical approach to evaluating the relative strengths and weaknesses of data to be used for a specific purpose.

Analytics Proof-of-Concept – Custom projects, usually involving a series of complex models, designed to answer specific questions.

Analytics Staff Augmentation – Get the right help when you need it.

(8)

A PERFECT STORM

(9)

Ambitious Question

Business Challenge: Dramatically increase the ability to predict demand for products and services:

•  by customer segment (age, race, gender)

•  by geographic region (zip code/census tract)

•  over time (3, 5 and 10 years into the future)

(10)

Ambitious Data (External)

Scope

•  Census, Bureau of Labor Statistics, NOAA, American Community Survey, and more

Time

•  10 years of history

•  10 years of future forward projections

Space

•  Zip code and census tract

(11)

Ambitious Modeling

300 models

Driven by the customer segments

Machine learning approach

Artificial neural network

(12)

The Pain Point

Data Size

•  Terabytes

Loading the data

•  7-10 days to load

Looking at the data

•  Traversing a table -- hours

Testing a model

(13)

Our Solution

EMC/Greenplum – Data Computing Appliance

•  Quarter rack

•  Scalable

Performance to date – FAST (POC in process)

•  Loading went from days to minutes

•  Generate a 100K row sample -- hours to minutes

•  Sample queries -- 24 minutes to 1 minute (400 million row result set)

•  Training the model -- to be determined
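The deck does not say how the 100K-row sample was generated. As a minimal sketch, assuming the rows can be streamed once from a table too large to hold in memory, reservoir sampling (commonly called Algorithm R) draws a fixed-size uniform random sample in a single pass using memory proportional only to the sample size. The function name `reservoir_sample`, its parameters and the one-million-row stand-in table are hypothetical illustrations, not the IHBI pipeline.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw a uniform random sample of up to k items from `rows` in one pass.

    Hypothetical helper illustrating reservoir sampling (Algorithm R):
    memory use is O(k) no matter how many rows are streamed.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep row i with probability k / (i + 1): pick a slot uniformly
            # from 0..i (inclusive); only slots below k are in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = row
    return reservoir


if __name__ == "__main__":
    # Hypothetical stand-in for a large table: one million row IDs.
    sample = reservoir_sample(range(1_000_000), 100_000, seed=42)
    print(len(sample))  # 100000
```

If the source has fewer than `k` rows, the function simply returns all of them. In an MPP database the sampling would more likely be pushed into the database itself; the sketch only shows the single-pass idea.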

(14)
(15)

Better Answers

Forecasting

Predictive warranty

Customer loyalty

Early warning

Market segmentation

Price optimization

Site location

New customer identification

Workforce predictive modeling

Website monitoring

Customer intelligence

Text/unstructured data mining

(16)

Big Data/Analytics Sandbox

The Analytics Insight Lab (A-LAB)

A secure platform to leverage data and solve real-world problems and challenges, providing a low-risk way to get started with:

•  Leading-edge data visualization

•  Integration of proprietary and public data

•  Advanced mining of structured and unstructured data, including social media

(17)

Big Data/Analytics Sandbox

Technical Environment

•  EMC/Greenplum DCA

•  Remote access provided through virtual machines

•  SAS, ESRI, Tableau

Contextual Database

•  18 billion rows of demographic and socioeconomic data

•  20 years of data at the census-tract level

Problem Solving/Modeling Support

•  As much or as little as you need

Subject Matter Experts for Hire

•  Faculty available

(18)

Summary

Big data and high-performance computing offer new opportunities to:

Improve current data-driven problem solving

Solve completely new problems

This is not a fad, but a fundamental shift in how successful organizations will compete for

(19)

Contact

Tracy Irwin Hewitt Associate Director 734-837-0279

[email protected]


Map: Opportunity by Postal Code. 'Opportunity Forecast' for Michigan for 2015 at the postal-code level; each mark represents a postal code point, shaded from high to low opportunity. Produced by the Institute for Health & Business Insight, Central Michigan University, Oct. 1, 2012.

(20)

TECHNICAL ENVIRONMENT

(21)

ICEBOX Technical Specs

Hosts:

–  2x Dell R715 servers with dual AMD Opteron 6136 processors and 96GB RAM, VMware ESX 4.1

Greenplum DCA:

–  4 Greenplum Database Modules

Storage:

–  2 Xio ISE1 FC 49.6 TB total

–  1 Drobo 16 TB

–  1 Synology 16 TB

–  3 EMC Isilon X200 105TB total (coming soon)

Networking:

(22)

Selected Software

Greenplum Database v4.2.2.4

SAS

SAS 9.3

SAS Enterprise Guide 5.1

SAS Enterprise Miner 12.1

JMP 10

Tableau 7
