Barriers
So what is Big Data?!
Big Data is the “modern scale” at which we are defining or data usage challenges. Big Data begins at the point where need to seriously start thinking about the technologies used to drive our information needs.!
While Big Data as a term seems to refer to volume this isnʼt the case. Many existing technologies have little problem physically handling large volumes (TB or PB) of data. Instead the Big Data challenges result out of the combination of volume and our usage demands from that data. And those usage demands are nearly always tied to timeliness.!
Big Data is therefore the push to utilize “modern” volumes of data within “modern” timeframes. The exact definitions are of course are relative & constantly changing, however right now this is somewhere along the path towards the end goal. This is of course the ability to handle an unlimited volume of data, processing all requests in real time.!
So what are Big Data technologies?!
More than at any point in the past, data related technologies are the focus of research & innovation. But Big Data challenges wonʼt be solved anytime soon by a single approach. Keeping in mind all the different platforms that Big Data is having an impact on (web, cloud, enterprise, mobile) combined with all the Big Data domain challenges (transaction processing, analytics, data mining, visualization) as well as many of the Big Data characteristic requirements (volume, timeliness, availability, consistency), it is easy to see how no single technology will provide a cover-all solution for the eclectic mix of needs. Instead a broad set of technologies that are each focused on meeting specific set of needs are improving our ability to manage data at scale. !
A few common areas of innovation that I describe as Big Data technologies include: MPP Analytics, Cloud Data Services, Hadoop & Map/Reduce (and associate technologies such as HBase, Pig & Hive), In-Memory Databases, some Distributed NoSQL databaes and some Distributed Transaction Processing databases.!
So what is the point of Big Data?!
Someone asked me if Big Data was just tools to “try and sell them more relevant crap they donʼt want”. While up-sell & targeted advertising are too major uses of Big Data technologies I hope that mine and others work in this field does result achievements more significant than just these.!
When describing the point of Big Data I like to think about how the Internet has changed my life in general. By having unlimited & timely access to information we are now better informed in all areas of our existence than ever before. However, we are now facing the problem that there is fast becoming too much data for us to digest in its raw form. To move forward in our understanding we will need to rely on technology to provide timely, summarized & relevant data across all aspects of our lives. This is what those working in Big Data are setting out to achieve.!
A few weeks back I attended venture firm Accel Partners' New Data Workshop event and learned quite a bit about the state of what we are now commonly referring to as "big data" and the challenA few weeks back I attended venture firm Accel Partners' New Data Workshop event and learned quite a bit about the state of what we are now commonly referring to as "big data" and the challenges that await the vendors trying to target this new way of slicing and dicing vast amounts of information.
One of the big takeaways for me was the realization that even with all of the processing power available nowadays, the amount of data is growing at such a rapid pace that people are simply looking to cope with the problem, rather than facing it head on.
The issue of processing large amounts of data is not necessarily new--most developers and IT staff can tell you about having too much information to deal with--but, the big difference is that there are new approaches, tools and technologies that can help alleviate the difficult in processing.
Trends in New Data
(Credit: Ping Li, Accel Partners)
Over the course of the last 30 years or so the way that machines process transactions has changed, but so too has the vast amount of data that is being processed and collected, now with an eye toward real-time analysis of information.
This has led to the advent of a number of technologies that allow for data processing to be offloaded and managed in both structured and unstructured ways--examples include open-source projects like Memcached and Hadoop as well as NoSQL data storage mechanisms like Cassandra.
For the moment the blocking factor facing any and all of these technologies is the level of difficulty associated with using the software that wasn't initially designed to be used by less-skilled developers and IT staff.
Importantly, the rise of these big data products has introduced a new wave of vendors includingNorthscale (focused on Memcached and Membase), Cloudera (focused on Hadoop) andRiptano (focused on Cassandra) that have expertise in the technology and appear poised to ride the wave, assuming that enterprises are interested and willing to pay. (Disclosure: I am an advisor to Riptano.)