© 2013 by The 451 Group. All rights reserved
‘Big Data’ reconsidered
Separating Hype from Reality for Hosting and Cloud Providers
Matthew Aslett
• Research Director, Data Management and Analytics [email protected]
www.twitter.com/maslett
Responsible for data management and analytics research agenda
Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop
Agenda
‘Big Data’: reconsidered
• Definition
• Drivers
• Technologies
• Use-cases
Opportunities for hosting and cloud providers
• Data and the cloud
• Use-cases
• Opportunities and challenges
HCTS preview
• What to expect at HCTS 2013
Big Data: reconsidered
‘Big Data’ has the potential to revolutionize the IT industry by enabling new business insight based on previously-ignored and under-utilized data.
‘Big Data’ is massively over-hyped
The challenge – for users, vendors, investors and service providers alike (not to mention analysts) – is to negotiate towards a successful outcome while avoiding the pitfalls
Image: sezzles on Flickr http://bit.ly/12gbY7j
To understand big data…
There is no such thing as ‘big’ data
‘Big data’ is primarily driven by economics, not data
There is no single ‘big data’ use-case
You don’t need a ‘big data’ strategy
‘Big data’ is not just for Web startups
There is no single ‘big data’ platform
There is no such thing as ‘big data as a service’
There’s no such thing as “big” data
As in “data that is big”. It’s a trend, not a class or type of data
‘Big data’ is a phrase applied to new approaches to storing, processing and analyzing data
But ‘Big data’ had no agreed definition, until June 2013
Good ‘layman’s’ definition, but overly simplistic, and focuses solely on volume
‘Big data’ is used to describe data in relation to its growing:
The three (or more) Vs
• Velocity: The data is being produced at a rate that is beyond the performance limits of traditional systems
• Volume: The volume of data is too large for traditional database software tools to cope with
• Variety: The data lacks the structure to make it suitable for storage and analysis in traditional databases and data warehouses
However, data has always been growing in terms of the 3Vs
And data will always continue to grow in terms of the 3Vs
Plus: Veracity, Variability, Validity, Viability, Value, V…, V…
We stand by our statement that ‘Big data’ has no agreed definition
As a result ‘Big data’ is being (mis)applied to almost anything related to data, whether appropriate or not
And ‘Big data’ as a term is also being used to avoid explaining complex concepts
As such, ‘Big data’ will inevitably face a backlash due to failure to live up to vague and unrealistic expectations
‘Big data’ is primarily driven by economics, not data
It is now more economically feasible to store and process many data sets that were previously ignored, using clusters of commodity servers and advanced data processing software.
Moved from storing 1% of data for 60 days in EDW @ $100,000/TB
To 100% of data for a year in Hadoop @ $900/TB
“Big data is what happened when the cost of keeping information became less than the cost of throwing it away.”
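A rough worked example of the storage economics above, as a minimal Python sketch. The per-terabyte prices, retention periods and fractions kept are the figures quoted on the slide; the daily data volume (`DAILY_TB`) and the function names are hypothetical, introduced only for illustration, and the model ignores compression, replication and operating costs.

```python
# Illustrative only: compares the two storage scenarios quoted on the slide.
# DAILY_TB is a hypothetical assumption, not a figure from the presentation.

DAILY_TB = 1.0               # assumed volume of new data generated per day (TB)
EDW_COST_PER_TB = 100_000    # from the slide: EDW @ $100,000/TB
HADOOP_COST_PER_TB = 900     # from the slide: Hadoop @ $900/TB


def stored_tb(daily_tb, fraction_kept, retention_days):
    """Steady-state volume held: daily volume x fraction kept x days retained."""
    return daily_tb * fraction_kept * retention_days


def storage_cost(tb, cost_per_tb):
    """Simple linear cost model: terabytes held x price per terabyte."""
    return tb * cost_per_tb


# Scenario 1: keep 1% of the data for 60 days in the enterprise data warehouse.
edw_tb = stored_tb(DAILY_TB, 0.01, 60)
edw_cost = storage_cost(edw_tb, EDW_COST_PER_TB)

# Scenario 2: keep 100% of the data for a year in Hadoop.
hadoop_tb = stored_tb(DAILY_TB, 1.00, 365)
hadoop_cost = storage_cost(hadoop_tb, HADOOP_COST_PER_TB)

# Ratio of the two per-terabyte prices.
price_ratio = EDW_COST_PER_TB / HADOOP_COST_PER_TB

print(f"EDW:    {edw_tb:.1f} TB held, ${edw_cost:,.0f}")
print(f"Hadoop: {hadoop_tb:.1f} TB held, ${hadoop_cost:,.0f}")
print(f"Per-TB price ratio: {price_ratio:.0f}x")
```

Under these assumptions Hadoop storage is roughly 111 times cheaper per terabyte, so holding about 608 times more data costs only about 5.5 times as much in total.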
The following trends are what happened when the cost of keeping information became less than the cost of throwing it away:
The processing and analysis of very large data sets in their entirety
Increased adoption of massively parallel processing approaches
Storage and analysis of both structured and multi-structured data
Integration of external (social) and corporate data for more complete perspective
Schema-free and schema-on-read approaches to data storage/analysis
Adoption of exploratory analytic approaches to identify new patterns in data
Predictive analytics as a fundamental component of BI strategies
Machine-learning algorithms automate the reflection of collective intelligence
Increased adoption of in-memory databases for rapid data ingestion
Stream processing of sensor and other machine-generated data/events
Real-time analysis of data prior to storage within the data warehouse/Hadoop
Interactive, native, SQL-based analysis of data in Hadoop and HBase
These trends are what users are concentrating on
There is no single ‘big data’ use case
User requirements: Totality, Exploration, Frequency, Dependency
Characteristics: Volume, Velocity, Variety
Technologies:
• Hadoop
• In-memory data processing
• Stream processing
• NoSQL databases
• Machine learning
• Predictive analytics
• Object storage
Total Data approach
Big Data: The growing volume, velocity and variety of data
Big Data Technologies: New technologies being adopted to store and process that data
Total Data: The user trends driving the adoption of Big Data Technologies
• Processing and analysis of very large data sets in their entirety
• Massively parallel processing approaches
• Both structured and multi-structured data
• External (social) and corporate data
• Schema-free and schema-on-read data storage/analysis
• Predictive analytics as a fundamental BI tool
• Reflection of collective intelligence
• Identification of new patterns in data
• Stream processing of sensor and machine-generated data
• Native, SQL-based analysis of data in Hadoop and HBase
• In-memory databases for rapid data ingestion
• Real-time analysis of data prior to storage
Management alongside existing data technologies
You don’t need a ‘big data’ strategy
“If you tell a vendor that you’re working on your Big Data strategy, you’ve made them very happy. That’s because you’ve told them two things: that you have no idea what your own actual business requirements are, and you’re ready to spend money.”
Google does not have a ‘big data’ strategy
Facebook does not have a ‘big data’ strategy
eBay does not have a ‘big data’ strategy
Yahoo does not have a ‘big data’ strategy
These are the companies that inspired ‘big data’ but they were all simply solving their own business problems
‘Big data’ is not just for Web startups
Web startups: Social media, Web search, Advertising, Gaming, Mobile apps
Enterprises (DBMS adopters): Retail, Financial services, Telecommunications, Healthcare, Insurance, Media, Manufacturing, Technology
Public sector (HPC adopters): Seismology, Bioinformatics, Aerospace, Intelligence, Pharma, Climate, Energy, Academia
HPC + DBMS
‘Big data’ and the cloud
Cloud computing is a logical endpoint for the combination of grid computing, HPC, virtualization and utility computing
As such, cloud computing feels like a potential logical endpoint for ‘big data’
Barriers to adoption
• Data security, privacy and management concerns
• Complexity of large-scale data migration and configuration
• Database software, and licensing, is not designed for the cloud
Opportunities
• ‘Big data’ technologies are complex to configure, deploy and manage, and skilled staff are at a premium
• Enterprises want to make the most of existing tools/skills
• Opportunity for service providers to mask complexity with managed services rather than self-service cloud
Potential for increased revenue for DBaaS providers
For 451 Research clients:
• “DBaaS poised to drive next-generation database growth” https://451research.com/report-short?entityId=78105
• “Next-Generation Operational Databases: 2012-2016” https://451research.com/report-long?icid=2852
Free (registration required) presentation of key findings
• http://www.brainshark.com/the451group/nextgenopDB?tx=maslett
There is no single ‘big data’ platform
Whereas the traditional data management platform is relatively straightforward, a ‘big data’ management platform is made up of a combination of multiple optional components and moving parts
There is no such thing as ‘big data as a service’
But there are opportunities for hosting and cloud providers at key points in the stack
Particularly through
Hadoop vendors:
• Cloudera
• Hortonworks
• MapR
• Pivotal
Hadoop enablers:
• Infochimps
• ZettaSet
• Continuuity
• Qubole
Hadoop-aaS providers:
• TreasureData
• VertiCloud
• Mortar Data
NoSQL vendors:
• Basho
• 10gen
• DataStax
• Couchbase
• Neo4j
• Objectivity
• Sqrrl Data
NoSQL-aaS providers:
• Cloudant
• MongoHQ
• Garantia Data
NewSQL vendors:
• NuoDB
• ParElastic
• GenieDB
• Clustrix
• TransLattice
• ScaleBase
• ScaleArc
• Continuent
DW-aaS providers:
• Kognitio
• 1010data
• BitYota
• TempoDB
451 Research's annual HCTS-North America is the premier forum for executives in the hosting, cloud computing, datacenter and Internet infrastructure sectors.
Creating a Digital Infrastructure Playbook for Next-Generation Datacenters – 24th 9:00 - 9:30am
Data in the Cloud: Best Practices – 24th 1:30 - 2:00pm
Who is doing what with big data in the cloud (public and private) – and how do they balance the two? How do they determine best execution venue decisions etc? And what are service providers doing to help?
The Rise of the Machines – 25th 1:25 - 2:05pm
The Cloud as a Platform for Big Data Analytics of System Traffic and Log Monitoring for Better Systems Management
HCTS-North America – Special Offer for Today’s Webinar Attendees!
$250 Discount for Today’s Webinar Attendees.* Please use discount code: WEBINAR250
For more information, or to register, please visit:
http://na.hostingtransformation.com/