© 2013 by The 451 Group. All rights reserved

‘Big Data’ reconsidered

Separating Hype from Reality for Hosting and Cloud Providers

Matthew Aslett

Matthew Aslett

• Research Director, Data Management and Analytics
• [email protected]
• www.twitter.com/maslett
• Responsible for data management and analytics research agenda
• Focus on operational and analytic databases, including NoSQL, NewSQL, and Hadoop

Agenda

• ‘Big Data’: reconsidered
  • Definition
  • Drivers
  • Technologies
  • Use-cases
• Opportunities for hosting and cloud providers
  • Data and the cloud
  • Use-cases
  • Opportunities and challenges
• HCTS preview
  • What to expect at HCTS 2013

Big Data: reconsidered

• ‘Big Data’ has the potential to revolutionize the IT industry by enabling new business insight based on previously ignored and under-utilized data.
• ‘Big Data’ is massively over-hyped
• The challenge – for users, vendors, investors and service providers alike (not to mention analysts) – is to negotiate towards a successful outcome while avoiding the pitfalls

Image: sezzles on Flickr http://bit.ly/12gbY7j

To understand big data…

• There is no such thing as ‘big’ data
• ‘Big data’ is primarily driven by economics, not data
• There is no single ‘big data’ use-case
• You don’t need a ‘big data’ strategy
• ‘Big data’ is not just for Web startups
• There is no single ‘big data’ platform
• There is no such thing as ‘big data as a service’

There’s no such thing as “big” data

• As in “data that is big”: it’s a trend, not a class or type of data
• ‘Big data’ is a phrase applied to new approaches to storing, processing and analyzing data
• But ‘Big data’ had no agreed definition until June 2013
• The June 2013 definition is a good ‘layman’s’ definition, but it is overly simplistic and focuses solely on volume

The three (or more) Vs

• ‘Big data’ is used to describe data in relation to its growing:
  • Velocity: the data is being produced at a rate that is beyond the performance limits of traditional systems
  • Volume: the volume of data is too large for traditional database software tools to cope with
  • Variety: the data lacks the structure to make it suitable for storage and analysis in traditional databases and data warehouses
  • Plus: Veracity, Variability, Validity, Viability, Value, V…, V…
• However, data has always been growing in terms of the 3Vs
• And data will always continue to grow in terms of the 3Vs
• We stand by our statement that ‘Big data’ has no agreed definition

There’s no such thing as “big” data

• As a result, ‘Big data’ is being (mis)applied to almost anything related to data, whether appropriate or not
• ‘Big data’ as a term is also being used to avoid explaining complex concepts
• As such, ‘Big data’ will inevitably face a backlash when it fails to live up to vague and unrealistic expectations

‘Big data’ is primarily driven by economics, not data

• It is now more economically feasible to store and process many data sets that were previously ignored, using clusters of commodity servers and advanced data processing software.
• Moved from storing 1% of data for 60 days in an enterprise data warehouse (EDW) at $100,000/TB to storing 100% of data for a year in Hadoop at $900/TB
• “Big data is what happened when the cost of keeping information became less than the cost of throwing it away.”
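To make the scale of that shift concrete, here is a rough back-of-the-envelope sketch in Python. It assumes (these assumptions are not from the slides) a steady ingest rate of 1 TB of new data per day, and that the quoted $/TB figures are the cost of each terabyte of provisioned capacity. The function names and the ingest rate are hypothetical.

```python
# Back-of-the-envelope comparison of the two retention policies on the slide.
# ASSUMPTIONS (not from the slide): data arrives at a steady rate, and the
# quoted $/TB figures are the cost of each terabyte of provisioned capacity.

EDW_COST_PER_TB = 100_000   # $/TB in the enterprise data warehouse (from the slide)
HADOOP_COST_PER_TB = 900    # $/TB in Hadoop (from the slide)


def capacity_tb(daily_tb: float, fraction_kept: float, retention_days: int) -> float:
    """Terabytes held at steady state under a given retention policy."""
    return daily_tb * fraction_kept * retention_days


def storage_cost(daily_tb: float, fraction_kept: float,
                 retention_days: int, cost_per_tb: float) -> float:
    """Cost of the capacity that the retention policy requires."""
    return capacity_tb(daily_tb, fraction_kept, retention_days) * cost_per_tb


if __name__ == "__main__":
    daily_tb = 1.0  # hypothetical ingest rate: 1 TB of new data per day

    edw_tb = capacity_tb(daily_tb, 0.01, 60)      # keep 1% for 60 days
    hadoop_tb = capacity_tb(daily_tb, 1.0, 365)   # keep 100% for a year
    edw_cost = storage_cost(daily_tb, 0.01, 60, EDW_COST_PER_TB)
    hadoop_cost = storage_cost(daily_tb, 1.0, 365, HADOOP_COST_PER_TB)

    print(f"EDW:    {edw_tb:.1f} TB retained, ${edw_cost:,.0f}")
    print(f"Hadoop: {hadoop_tb:.1f} TB retained, ${hadoop_cost:,.0f}")
    print(f"Data retained:     {hadoop_tb / edw_tb:.0f}x more")
    print(f"Cost per terabyte: {EDW_COST_PER_TB / HADOOP_COST_PER_TB:.0f}x lower")
```

Under these assumptions the Hadoop policy retains roughly 608 times as much data (365 TB versus 0.6 TB) at a per-terabyte cost about 111 times lower, for about 5.5 times the total spend ($328,500 versus $60,000).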

‘Big data’ is primarily driven by economics, not data

• “Big data is what happened when the cost of keeping information became less than the cost of throwing it away”?
• The processing and analysis of very large data sets in their entirety
• Increased adoption of massively parallel processing approaches
• Storage and analysis of both structured and multi-structured data
• Integration of external (social) and corporate data for a more complete perspective
• Schema-free and schema-on-read approaches to data storage/analysis
• Adoption of exploratory analytic approaches to identify new patterns in data
• Predictive analytics as a fundamental component of BI strategies
• Machine-learning algorithms automate the reflection of collective intelligence
• Increased adoption of in-memory databases for rapid data ingestion
• Stream processing of sensor and other machine-generated data/events
• Real-time analysis of data prior to storage within the data warehouse/Hadoop
• Interactive, native, SQL-based analysis of data in Hadoop and HBase

… are what users are concentrating on
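“Schema-on-read” is only named in the list above. As a minimal illustration, assuming raw records are kept as JSON lines, the Python sketch below stores data untouched and applies a schema only when the data is read. The record fields, `RAW_EVENTS`, and `read_with_schema` are hypothetical, and this is plain Python rather than how Hadoop or any particular product implements the idea.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

# Hypothetical raw events, stored exactly as they arrived (no schema on write).
RAW_EVENTS = [
    '{"user": "alice", "action": "click", "ms": 120}',
    '{"user": "bob", "action": "view"}',
    '{"user": "carol", "action": "click", "ms": "95", "device": "mobile"}',
]

Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Optional[Any]]]:
    """Apply a schema at read time: select fields, coerce types, fill gaps with None."""
    for line in lines:
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = None if value is None else cast(value)
        yield row


# Two different schemas over the same raw data, each chosen only at query time.
CLICK_SCHEMA: Schema = {"user": str, "action": str, "ms": int}
DEVICE_SCHEMA: Schema = {"user": str, "device": str}

if __name__ == "__main__":
    clicks = [r for r in read_with_schema(RAW_EVENTS, CLICK_SCHEMA) if r["action"] == "click"]
    print(clicks)
    print(list(read_with_schema(RAW_EVENTS, DEVICE_SCHEMA)))
```

The same raw records serve two different schemas, and nothing about `ms` or `device` had to be decided when the data was written; a schema-on-write system would have required that decision up front.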

There is no single ‘big data’ use case

• User requirements: Totality, Exploration, Frequency, Dependency
• Characteristics: Volume, Velocity, Variety
• Technologies: Hadoop, In-memory data processing, Stream processing, NoSQL databases, Machine learning, Predictive analytics, Object storage


Total Data approach

• Big Data: the growing volume, velocity and variety of data
• Big Data Technologies: new technologies being adopted to store and process that data
• Total Data: the user trends driving the adoption of Big Data Technologies
  • Processing and analysis of very large data sets in their entirety
  • Massively parallel processing approaches
  • Both structured and multi-structured data
  • External (social) and corporate data
  • Schema-free and schema-on-read data storage/analysis
  • Predictive analytics as a fundamental BI tool
  • Reflection of collective intelligence
  • Identification of new patterns in data
  • Stream processing of sensor and machine-generated data
  • Native, SQL-based analysis of data in Hadoop and HBase
  • In-memory databases for rapid data ingestion
  • Real-time analysis of data prior to storage
• Total Data: management alongside existing data technologies

You don’t need a ‘big data’ strategy

• Google does not have a ‘big data’ strategy
• Facebook does not have a ‘big data’ strategy
• eBay does not have a ‘big data’ strategy
• Yahoo does not have a ‘big data’ strategy
• These are the companies that inspired ‘big data’, but they were all simply solving their own business problems
• “If you tell a vendor that you’re working on your Big Data strategy, you’ve made them very happy. That’s because you’ve told them two things: that you have no idea what your own actual business requirements are, and you’re ready to spend money.”

You don’t need a ‘big data’ strategy

• The processing and analysis of very large data sets in their entirety
• Increased adoption of massively parallel processing approaches
• Storage and analysis of both structured and multi-structured data
• Integration of external (social) and corporate data for a more complete perspective
• Schema-free and schema-on-read approaches to data storage/analysis
• Adoption of exploratory analytic approaches to identify new patterns in data
• Predictive analytics as a fundamental component of BI strategies
• Machine-learning algorithms automate the reflection of collective intelligence
• Increased adoption of in-memory databases for rapid data ingestion
• Stream processing of sensor and other machine-generated data/events
• Real-time analysis of data prior to storage within the data warehouse/Hadoop
• Interactive, native, SQL-based analysis of data in Hadoop and HBase


‘Big data’ is not just for Web startups

• Web startups: Social media, Web search, Advertising, Gaming, Mobile apps
• Enterprises (DBMS adopters): Retail, Financial services, Telecommunications, Healthcare, Insurance, Media, Manufacturing
• Public sector (HPC adopters): Seismology, Bioinformatics, Aerospace, Intelligence, Pharma, Climate, Energy, Academia, Technology
• HPC + DBMS

‘Big data’ and the cloud

• Cloud computing is a logical endpoint for the combination of grid computing, HPC, virtualization and utility computing
• As such, cloud computing feels like a potential logical endpoint for ‘big data’
• Barriers to adoption
  • Data security, privacy and management concerns
  • Complexity of large-scale data migration and configuration
  • Database software and its licensing are not designed for the cloud
• Opportunities
  • ‘Big data’ technologies are complex to configure, deploy and manage, and skilled staff are at a premium
  • Enterprises want to make the most of existing tools/skills
  • Opportunity for service providers to mask complexity with managed services rather than self-service cloud

‘Big data’ and the cloud

• Potential for increased revenue for database-as-a-service (DBaaS) providers
• For 451 Research clients:
  • “DBaaS poised to drive next-generation database growth” https://451research.com/report-short?entityId=78105
  • “Next-Generation Operational Databases: 2012-2016” https://451research.com/report-long?icid=2852
• Free (registration required) presentation of key findings
  • http://www.brainshark.com/the451group/nextgenopDB?tx=maslett

There is no single ‘big data’ platform

• Whereas the traditional data management platform is relatively straightforward, a ‘big data’ management platform is made up of a combination of multiple optional components and moving parts

There is no such thing as ‘big data as a service’

• But there are opportunities for hosting and cloud providers at key points in the stack
• Particularly through

• Hadoop vendors: Cloudera, Hortonworks, MapR, Pivotal
• Hadoop enablers: Infochimps, ZettaSet, Continuuity, Qubole
• Hadoop-aaS providers: TreasureData, VertiCloud, Mortar Data

• NoSQL vendors: Basho, 10gen, DataStax, Couchbase, Neo4j, Objectivity, Sqrrl Data
• NoSQL-aaS providers: Cloudant, MongoHQ, Garantia Data

• NewSQL vendors: NuoDB, ParElastic, GenieDB, Clustrix, TransLattice, ScaleBase, ScaleArc, Continuent
• DW-aaS providers: Kognitio, 1010data, BitYota, TempoDB

451 Research's annual HCTS-North America is the premier forum for executives in the hosting, cloud computing, datacenter and Internet infrastructure sectors.

Creating a Digital Infrastructure Playbook for Next-Generation Datacenters – 24th 9:00 - 9:30am

Data in the Cloud: Best Practices – 24th 1:30 - 2:00pm

Who is doing what with big data in the cloud (public and private), and how do they balance the two? How do they make best-execution-venue decisions? And what are service providers doing to help?

The Rise of the Machines – 25th 1:25 - 2:05pm

The Cloud as a Platform for Big Data Analytics of System Traffic and Log Monitoring for Better Systems Management

HCTS-North America – Special Offer for Today’s Webinar Attendees!


$250 Discount for Today’s Webinar Attendees.* Please use discount code: WEBINAR250

For more information, or to register, please visit:

http://na.hostingtransformation.com/

A recording of this webinar, along with the slides, will be posted on our website within 48 hours:

Questions? Comments?

[email protected]

@maslett
