SEBASTIAN LAND - RAPIDMINER GMBH, DORTMUND
BIG DATA
BETWEEN REVOLUTION AND CONFUSION
Don’ts – “Me, too” effect
• Starting a Big Data project at all costs
• But without a purpose
• Likely to result in “scorched earth”
• The topic won’t be touched again because “we tried that already”
Don’ts – Big is Better
• Starting a Big Data project because “big is better”
• Replacing an existing solution
• Doomed to turn out worse and more expensive
Don’ts – Believe marketing
• Starting a Big Data project believing it’s easy
• So existing staff should be sufficient
• Results in badly maintained infrastructure
• Projects will be delayed, expensive expert time wasted
Don’ts – Believe you can see
• Humans are not made for finding their way in huge data sets
• We are evolutionarily optimized to see patterns
• In Big Data it’s even more probable that a pattern we see is pure chance
• Don’t aim for a “simple” visualization of Big Data: it will simply lie to you in a cool way
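The chance-pattern point can be illustrated with a small simulation. This is a minimal sketch on purely random numbers, not data from the talk; the sample size, the number of columns, and the helper name `pearson` are illustrative assumptions. Every column is noise, yet the strongest correlation with the (equally random) target can only grow as more columns are searched.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)   # fixed seed so the run is repeatable
n_samples = 30           # assumed: few observations ...
n_columns = 2000         # ... but many measured attributes

# Target and all columns are independent random noise: there is nothing to find.
target = [rng.gauss(0, 1) for _ in range(n_samples)]
columns = [[rng.gauss(0, 1) for _ in range(n_samples)] for _ in range(n_columns)]
abs_corr = [abs(pearson(col, target)) for col in columns]

# Strongest "pattern" found when searching the first k columns only.
strongest = {k: max(abs_corr[:k]) for k in (10, 100, 2000)}
for k, value in strongest.items():
    print(f"searched {k:>4} random columns, strongest |correlation| = {value:.2f}")
```

A plot of the "best" column would look convincing, although by construction it carries no information.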
Don’ts – Save money
• Starting a Big Data project because it’s cheap
• Software is free, small commodity hardware is sufficient
• Maintenance costs will explode, infrastructure will not be up to the task
• Projects will be delayed, expensive expert time wasted
Don’ts – Ignore legal aspects
• Big Data projects are more likely to infringe privacy regulations
• Ignoring that fact might result in a huge setback during the project
• Suddenly it turns out that a ready-to-go solution is illegal
Why?
• Why do so many projects make the same mistakes?
• Fundamental lack of understanding of “Big Data”
Big Confusion
• Marketing departments across the board contributed to the confusion:
– Oversimplifying (real Big Data vendors)
– Stretching the term (database vendors)
– Stretching it even more (Business Intelligence vendors)
– Hopping on the buzz (consulting companies and all others)
• Nobody wants to listen to the ridiculous
Another approach
• Proposal: approach the topic from another angle
• Define a real problem
• See where Big Data can help us
Typical Problems
• Win new customers
– Send marketing material
– Special offers
• Keep old customers
– Better service
– Special offers
– Send marketing material
Costs
• All three actions cost money
• How to save money?
• If only we knew:
– Would a customer churn without the special offer?
– Would a customer buy the product after receiving the material?
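If answers to these questions were available as probabilities, the saving could be worked out per customer. The following is a rough sketch only: the function name `offer_pays_off`, the probabilities, the customer value, and the offer cost are all made-up assumptions for illustration, not figures from the talk.

```python
def offer_pays_off(p_churn_without, p_churn_with, customer_value, offer_cost):
    """True if the expected value retained by the offer exceeds its cost."""
    expected_gain = (p_churn_without - p_churn_with) * customer_value
    return expected_gain > offer_cost

# Hypothetical customers: (name, churn probability without offer, with offer)
customers = [
    ("loyal", 0.02, 0.01),        # stays anyway, offer is wasted
    ("at_risk", 0.60, 0.20),      # offer changes the outcome
    ("lost_anyway", 0.90, 0.88),  # leaves anyway, offer is wasted
]

# Assumed economics: a kept customer is worth 500, the offer costs 50.
targets = [
    name
    for name, p_without, p_with in customers
    if offer_pays_off(p_without, p_with, customer_value=500.0, offer_cost=50.0)
]
print(targets)
```

Under these assumed numbers only the customer whose behavior the offer actually changes is worth the cost.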
Crystal ball
• All of these are pieces of information about a potential future
• With this information, we can personalize the campaigns and maximize their effectiveness
• We need a crystal ball to look into the future
• Or, more likely to work: Predictive Analytics
Predictive Analytics
• Takes data about customers
• Searches for patterns that influence the probability that a customer
– Churned
– Bought a product
– …
• With these patterns we can predict the same for other customers
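The steps above can be sketched with a tiny logistic regression written from scratch. This is one possible technique, chosen here for illustration and not named in the talk; the two attributes (complaints, years as customer), the toy history, and the names `fit_logistic` and `predict` are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, labels, epochs=2000, lr=0.1):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict(model, x):
    """Predicted churn probability for one customer."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Made-up history: (complaints last year, years as customer) and churned yes/no.
history = [(0, 5), (1, 4), (0, 3), (1, 6), (3, 1),
           (4, 1), (2, 2), (5, 2), (1, 1), (3, 4)]
churned = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

model = fit_logistic(history, churned)

# Apply the learned pattern to customers whose future is still unknown.
for new_customer in [(4, 1), (0, 5)]:
    print(new_customer, round(predict(model, new_customer), 2))
```

The model only reflects the patterns in the history it was given; with other attributes or other customers the learned weights would differ.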
Data = Big Data?
• The number of customers is almost never Big
• We can easily process that on a single computer
• But it’s crucial that the data we have about a customer is related to the decision we
Finally Big Data
• Nowadays we leave a lot of traces with our digital lifestyle:
– Logs of web servers and apps
– Sensor data about our environment
– …
• If we face a problem where such data becomes important for the decision, we should take it into account
Utilizing Big Data
• Therefore we need to extract from the mass of data the part that is relevant for our project
• And connect it to the individual customer
• Once the connection is made, we can push it through our Crystal Ball of predictive analytics
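The extract-and-connect step can be sketched as follows. The log format, the customer table, and the choice of a cancellation help page as the "relevant" signal are all hypothetical; real logs would be far larger and would usually be reduced on a cluster before this join.

```python
from collections import Counter

# Small, ordinary customer table (assumed attributes).
customers = {
    "c1": {"contract_months": 24},
    "c2": {"contract_months": 3},
}

# Stand-in for a huge web server log (assumed format).
web_log = [
    {"customer_id": "c1", "path": "/products"},
    {"customer_id": "c2", "path": "/help/cancel-contract"},
    {"customer_id": "c2", "path": "/help/cancel-contract"},
    {"customer_id": None, "path": "/products"},  # anonymous visitor, no customer to connect to
    {"customer_id": "c2", "path": "/products"},
]

RELEVANT_PATH = "/help/cancel-contract"  # assumed to matter for churn

# Extract: keep only the relevant events of known customers and count them.
cancel_views = Counter(
    event["customer_id"]
    for event in web_log
    if event["customer_id"] in customers and event["path"] == RELEVANT_PATH
)

# Connect: one row per customer, ready for a predictive model.
feature_rows = {
    cid: {**attrs, "cancel_page_views": cancel_views[cid]}
    for cid, attrs in customers.items()
}
print(feature_rows)
```

After this reduction the data is again "number of customers" sized, so it can be processed on a single computer.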
Do’s
• Before starting a project, set a clear goal
• Embrace the new approach: derive tactical decisions on the individual level instead of hunting for global effects
• Use domain knowledge to guide the pattern search
Do’s
• Consider whether you really need Big Data to reach the goal: think big, start small
• Many goals can be achieved to a substantial degree, and much faster, with simpler technology
Do’s
• But if you see reasonable chances, integrate the Big Data approach with standard approaches
• Take your time with tool evaluation, never buy on the marketing message alone. Keep growth and later project phases in mind.
• Hire the experts first, let them help you
Why “Big Data Revolution”?
• For the first time, we leave accessible traces of our daily life
• For the first time, computers exist that can analyze this data
• This will change our lives!
• Computers will learn a lot from our traces and replace us in more and more areas
No revolution without danger
• The combination of Big Data and Predictive Analytics could become the dual-use technology of the 21st century
• Further automation will call into question the foundation of our society: selling your labor to make a living
• As a society we need to find answers to that