• No results found

2011 Data Miner Survey Highlights

N/A
N/A
Protected

Academic year: 2021

Share "2011 Data Miner Survey Highlights"

Copied!
11
0
0

Loading.... (view fulltext now)

Full text

(1)

Karl Rexer, PhD

President

Rexer Analytics

www.RexerAnalytics.com

2011 Data Miner Survey Highlights

The Views of 1,319 Data Miners

Predictive Analytics World

New York, NY

(2)

2

2011 Data Miner Survey: Overview

40%

27%

18%

Corporate

Consultants

Academics

NGO / Gov’t (7%)

Vendors (8%)

47%

37%

North America

USA 44%

Canada 3%

Mexico 1%

Europe

Germany 9%

UK 4%

France 4%

Switzerland 3%

Asia Pacific

(10%)

India 4%

Australia 2%

China 1%

Central & South

America

(3%)

Argentina 1%

Brazil 2%

Middle East & Africa

(3%)

Israel 1%

South Africa 1%

5

th

annual survey

52 questions

10,000+ invitations emailed,

plus promoted by newsgroups,

vendors, and bloggers

Respondents:

1,319 data miners

from over 60

countries

Note: Data from software vendors

was excluded from many analyses.

(3)

There’s Strong Demand & We’re Working Hard

Question: How will the number of data mining projects your organization conducts in 2011 compare to what has been typical in the past few years?

Data miner hiring is very strong*.

And company use of data mining is increasing.

• 

78% of data miners foresee increases in the number of data mining projects.

• 

This is on top of similar increases reported last year.

• 

Data miners working in diverse settings share this optimism.

Number of Data Mining Projects Projected in 2011

* Multiple sources: Use of “data mining” in online job ads, KDnuggets job listings, recruiters, salary reports.

(4)

4

Data Miners are Working Everywhere

• 

More data miners report working in CRM / Marketing, Academia and Financial

Services than any other fields.

-

These have been the three most commonly reported fields in each of the five

annual Data Miner Surveys (2007-2011).

• 

Fewer data miners report working in CRM/Marketing this year (42% in 2010).

• 

Many data miners work in several fields.

Question: In what fields do you TYPICALLY apply data mining? (Select all that apply)

10% 10% 11% 11% 12% 12% 13% 14% 14% 27% 31% 33% 0% 10% 20% 30% 40% 50% Manufacturing Pharmaceutical Government Internet-based Medical Retail Telecommunications Technology Insurance Financial Academic CRM/Marketing

Vendors were excluded from this analysis.

Data Mining is everywhere!

Data miners also report working in

Non-profit (6%), Hospitality /

Entertainment / Sports (3%), Military /

Security (3%), and Other (9%).

(5)

We Enjoy our Work

• 

Data miners are generally satisfied with their jobs, with more than a quarter

reporting being “very satisfied”.

• 

68% report being likely to remain with their current employer for the next two years.

Very Unsatisfied

6%

19%

46%

27%

0%

20%

40%

60%

80%

100%

Satisfaction

Unsatisfied

Neutral

Satisfied

Satisfied

Very

Questions: What is your current level of job satisfaction?

How likely are you to remain with your current employer for the next two years?

Very Unlikely

8%

18%

32%

36%

0%

20%

40%

60%

80%

100%

Likelihood

Unlikely

(6)

6 9% 9% 12% 12% 12% 13% 14% 15% 22% 23% 23% 26% 30% 32% 34% 35% 58% 68% 69% 0% 10% 20% 30% 40% 50% 60% 70%

MARS

Uplift Modeling

Link Analysis

Genetic Algorithms

Rule Induction

Social Network Analysis

Anomoly Detection

Survival Analysis

Ensemble Models

Bayesian

Support Vector

Association Rules

Text Mining

Factor Analysis

Neural Nets

Time Series

Cluster Analysis

Decision Trees

Regression

The Algorithms Data Miners are Using

• 

Decision trees, regression, and cluster analysis continue to form a triad of core

algorithms for most data miners. This has been very consistent over time.

• 

However, a wide variety of algorithms are being used.

Question: What algorithms/analytic methods do you TYPICALLY use? (Select all that apply)

Corporate Consultants Academic NGO / Gov’t

10% 15% 3% 6%

Consultants are more likely to use

Ensemble Models

Consultants and corporate data miners are

more likely to use Uplift Modeling

Corporate Consultants Academic NGO / Gov’t

22% 29% 22% 23%

(7)

Survey Questions:

• 

What Data mining/analytic tools did you

use in 2010? (rate each as “never”,

“occasionally”, or “frequently”)

• 

What one Data Mining software

package do you use most frequently?

Overall

Corporate

Consultants

Academics

NGO / Gov’t

• 

The average data miner reports using 4 software tools.

• 

R is used by the most data miners (47%).

• 

STATISTICA is the primary data mining tool chosen most often (17%).

(8)

8

Satisfaction question: Please rate your overall satisfaction with your primary Data Mining software package.

Satisfaction

Continued Use

• 

STATISTICA, KNIME, Rapid Miner and Salford Systems received the

highest satisfaction ratings.

• 

The users of these tools are also the most likely to continue using them

as their primary tools for the next three years.

Continued Use question: What is the likelihood that you will continue to use this tool as your primary Data Mining software package over the next 3 years?

Extremely Dissatisfied Extremely Satisfied Extremely Unlikely Extremely Likely

Tools: Satisfaction & Continued Use

Vendors were excluded from this analysis.

(9)

R Command

Line

R Usage

R Studio

Other

51%

8%

6%

5%

5%

4%

3%

2%

16%

The Popularity of R Software is Growing Fast

• 

The proportion of data miners using R is rapidly growing!

-

R is also the #1 most used data mining tool (in both 2010 & 2011). Up from #5 in 2007.

• 

An increasing number of data miners consider R their primary tool.

-

R is now #2 in primary tool rankings. Up from #7 in 2008.

• 

Half of R data miners use the command line interface. Among the rest, R Studio,

scripts, R Commander, and STATISTICA are popular interfaces.

0%

10%

20%

30%

40%

50%

2007

2008

2009

2010

2011

Use R

Primary Tool

R Interface

R Commander

STATISTICA

Rapid Miner

Rattle

Knime

Scripts

Question: If you use the R software package, what is your primary interface to R?

(10)

10

• 

Analytic capability:

There’s room to improve if we’re going to

“Compete on Analytics”.

• 

Analytic capabilities boost company performance.

Company Performance Question: Which statement best describes

the recent performance of your company / organization?

Analytic Capability Question: In general, with what

degree of sophistication does your company /

organization approach analytic problems?

Room for Improvement

And it Matters!

Low

Moderate

Very Low

High

Very High

Corporate Analytic Sophistication

Company

Pe

rfo

rm

an

ce

Only 12% of corporate

respondents rate their

company as having very

high analytic sophistication.

Companies with

better analytic

capabilities are

outperforming

their peers!

Caution: this is self

report data &

correlation analysis.

But go talk with

Tom Davenport about

the causal direction.

(11)

How to Get More Information

Questions? – Talk with me at PAW

Call or email me if you don’t see me in the hallways

Copy of these slides – Available now

2011 Data Miner Survey Summary Report (Free)

Available later this Fall

Available at PAW website or email me

2010 Data Miner Survey Summary Report (Free)

37 page report, available now

Available at PAW website or email me

Karl Rexer, PhD

[email protected]

www.RexerAnalytics.com

617-233-8185

References

Related documents

● From the Start/Finish at the River House Barn, head south and cross Fig Ave into the River Campground and head EAST and connect with Main Loop trail.. ● The main loop trail will

The American Recovery and Reinvestment Act of 2009 (ARRA, P.L. 111-5) includes a temporary provision that allowed non-itemizing homeowners to claim an additional standard deduction

This study reports on detection, localization, and quanti fication of frequent small rockfalls and infrequent pyroclastic density currents descending the southeast flanks of

This Operations Update is requesting to extend the timeframe by six weeks; and an additional allocation of CHF to 102,770 expand the activities planned into eight additional

Key words: bile duct proliferation, chlordanes, dichlorodiphenyltrichloroethane, dieldrin, East Greenland, HCB, hexacyclohexanes, Ito cells, lipid granulomas, liver, mononuclear

On the whole, further spread of superlarge corporate farms is not probable in the Hungarian cereals market due to evolved market conditions (limited land sizes,

The purpose of this study was to describe the process providing a faculty-guided structured simulation debriefing session grounded in Kolb’s Experiential Theory and Gibb’s

With transparent workflow, validation and real-time compliance checking of new content, and integrated applications to assess and remediate existing content, Sitecore greatly