Copyright © 2013. 100FirstHits.com
www.100FirstHits.com
BIG DATA
MISTAKES REPORT
2013
The Big 3 (53,7%)
1. Lack of Competence
2. Lack of Goals
3. Lack of Strategy and Corporate Support

-Poor Data Quality: 11,4%
-Seek Data Perfection: 9,9%
-Aiming too high: 5,9%
-Lack of Change Management: 5,1%
-Seeking cause over correlation: 4,9%
-Lack of Data Relevance: 4,5%
-Lack of Governance: 2,4%
-Select the right visualizations: 1,5%
-Lack of Permission: 0,8%
Lack of Competence
Big data platforms are relatively new. Most organizations don't have trained teams in place to use the platforms successfully.
Understanding what Big Data technologies are good for is crucial for success. Traditional warehouse solutions could still be a better fit depending on the application.
“Combining Big and Small Data is key to success. Big Data tells what, based on what happened in the past. Small Data can tell why and explain the past, so that we can predict a changing future based on a better understanding of what is causing this change.”
–Albert Fitzgerald
Your team needs to understand how to create training models, identify the best predictors from a wide range of independent variables, and so on.
Lack of Goals
“Many organizations fall into the trap of collecting and analyzing data purely for the sake of doing it. The problem is that without a clear business objective, the data is unlikely to yield any clear benefits or useful business intelligence.”
–Nelson Estrada
If you don’t ask the right questions, the data will ultimately be useless to you. Examples of good questions to ask are:
-Who will buy?
-Who will leave / churn?
-What will break?
-Who will pre-pay?
-What will sell?
-Etc.
Another common mistake is to attempt to set goals that the data does not support.
Lack of Strategy & Corporate Support
A lot of companies consider Big Data a “side-job”. If you go down that road, you might as well not do Big Data at all.
What comes out of Big Data analysis often affects both corporate and operational strategies, and the resulting decisions can be hard and risky.
After all, Big Data is all about unlocking new forms of value, and in the end, it is still up to humans to decide what to go after.
Figure: bar chart of the sum of score per category (scale 0 to 1600). Top 3: 53,7% of the total score.
Seek Data Perfection
“Big data beats sampling, hands down. In the past everyone relied on small data sets, or “samples.” But you needn't settle for samples. Now it’s about using as much data as you can get your hands on, which lights the way to new insights never before available.”
-Daniel Kehrer
“The benefits of using vastly more data of variable quality outweigh the costs of using smaller amounts of very exact data.”
-Cukier and Mayer-Schoenberger
What we see here are two different approaches to data quality, each with its own area of application: correlation vs. predictive modeling.
Poor Data Quality
Shit in. Shit out. Not understanding the quality of existing data in legacy systems is a huge pitfall, and companies often don't spend enough time on it.
“Many data scientists spend most of their time preparing data to ensure that results are not skewed or subject to confirmation bias.”
-Eric A. King
Contradiction?
Figure: bar chart of the average score per category (scale 0 to 100).
Aiming Too High
How do you eat an elephant? -One bite at a time! Don’t try to import, load and link all your data at once. That is costly and time-consuming.
“Too many companies start out with expensive and high-risk big-data initiatives. Big-bang implementations are rarely a path to success.”
-Shira Ovide
Lack of Change Management
Many companies are afraid of running their business on data. People are mourning the loss of creativity and common sense. With proper change management you can instead heighten the need for creativity and common sense.
Seeking Cause Over Correlation
Sometimes it is enough to understand what happens, and not bother too much about why it happens. Google, for instance, correlated certain search queries with geographic locations, and by doing so was able to track flu outbreaks much faster than authorities worldwide.
About the Author
Daniel Garplid, born in 1976
• Founder of the internet research company 100FirstHits.com
http://www.100firsthits.com
• Founder of the Real-time Business Intelligence company Manifact AB
http://www.manifact.com
Contact and Feedback
E. [email protected] Ph. +46 735 10 2770
http://twitter.com/danielgarplid http://www.linkedin.com/in/garplid
Interested in being contacted for:
-Feedback
-Talks and Trainings
-Consultancy
-Business deals
Methodology
I searched for big data mistakes on Google. I then opened the first link, recorded every header/bullet, and took notes on what each one was about. The first header/bullet got a score of 100, the second a score of 99, and so on, until I had collected 100 headers/bullets. I then categorized the data and summed the scores for each category.
Search engine used in this research: Google
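The scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the four sample entries are hypothetical and are not the data collected for this report.

```python
from collections import defaultdict

TOP_SCORE = 100  # the first header/bullet gets 100, the next 99, and so on


def score_by_rank(categories, top_score=TOP_SCORE):
    """Pair each header's category with a score that drops by 1 per rank."""
    return [(category, top_score - rank) for rank, category in enumerate(categories)]


def sum_per_category(scored):
    """Add up the scores of all headers that fall in the same category."""
    totals = defaultdict(int)
    for category, score in scored:
        totals[category] += score
    return dict(totals)


# Hypothetical sample: the category assigned to the first four headers found.
# These entries are made up for illustration, not taken from the research.
sample = [
    "Lack of Competence",
    "Lack of Goals",
    "Lack of Competence",
    "Poor Data Quality",
]

totals = sum_per_category(score_by_rank(sample))
for category, total in sorted(totals.items(), key=lambda item: -item[1]):
    print(category, total)

# With a full list of 100 headers the scores run from 100 down to 1.
full_run_total = sum(score for _, score in score_by_rank(["any"] * 100))
print(full_run_total)
```

Because early headers score higher than late ones, a category can rank high either by appearing often or by appearing early in the search results.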
THE LAST FOUR
9. Lack of Data Relevance: 228
10. Lack of Governance: 121
11. Select the right visualizations: 74
12. Lack of Permission: 42
Scores
Categories                              Sum of Score  % of total  Count of Header  Average of Score
Lack of Competence                      1450          28,7%       27               54
Lack of Goals                           655           13,0%       14               47
Lack of Strategy and Corporate support  608           12,0%       12               51
Poor Data Quality                       575           11,4%       13               44
Seek Data Perfection                    499           9,9%        6                83
Aiming too high                         297           5,9%        8                37
Lack of Change Management               256           5,1%        4                64
Seeking cause over correlation          245           4,9%        3                82
Lack of Data Relevance                  228           4,5%        5                46
Lack of Governance                      121           2,4%        5                24
Select the right visualizations         74            1,5%        1                74
Lack of Permission                      42            0,8%        2                21
Grand Total                             5050          100,0%      100              51
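The derived columns in the Scores table can be re-computed from the sum of score and the header count. The sketch below does that. It assumes that averages are rounded half up (the Grand Total average of 50,5 appears as 51), and the helper names are hypothetical. The output uses a decimal point where the report uses a decimal comma.

```python
# Sum of score and header count per category, copied from the Scores table.
ROWS = [
    ("Lack of Competence", 1450, 27),
    ("Lack of Goals", 655, 14),
    ("Lack of Strategy and Corporate support", 608, 12),
    ("Poor Data Quality", 575, 13),
    ("Seek Data Perfection", 499, 6),
    ("Aiming too high", 297, 8),
    ("Lack of Change Management", 256, 4),
    ("Seeking cause over correlation", 245, 3),
    ("Lack of Data Relevance", 228, 5),
    ("Lack of Governance", 121, 5),
    ("Select the right visualizations", 74, 1),
    ("Lack of Permission", 42, 2),
]


def round_half_up(value):
    """Round a positive number to the nearest integer, with .5 rounding up.

    Assumption: this matches the table; Python's built-in round() would
    turn 50.5 into 50 instead of 51.
    """
    return int(value + 0.5)


grand_total = sum(total for _, total, _ in ROWS)
header_count = sum(count for _, _, count in ROWS)


def share(total):
    """Return a category's percentage of the grand total score."""
    return 100 * total / grand_total


for name, total, count in ROWS:
    average = round_half_up(total / count)
    print(f"{name:<40}{total:>6}{share(total):>7.1f}%{count:>5}{average:>5}")

# Combined share of the first three categories, reported as "The Big 3".
top3_share = share(sum(total for _, total, _ in ROWS[:3]))
print(f"Top 3 share: {top3_share:.1f}%")
```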
“Big Data” Mistakes
Lack of Data Relevance
Just because big data allows you to use huge data sets does not mean you should include all of your data in an analysis.
Lack of Governance
Who has the rights to create, approve, edit, or remove data from the system?
Select the right visualizations
You will dilute the value of even the best statistical models if you don’t choose the right type of visualizations.
Lack of Permission
More and more data is collected from consumers without their explicit knowledge or permission.
Subscribe to our mailing list to hear about updated and new research. No spam, and we will never share or sell your email address to third parties. Sign up at:
http://www.100FirstHits.com/BigData.html