Conventionally, semantic web systems generate metadata and identified entities explicitly, i.e., by hand or as the output of database values. But as anybody who’s tried to get users to do it will tell you, generating metadata is hard. This is part of why the full semantic web dream isn’t yet realized. Analytical approaches take a different tack: surfacing and classifying the metadata from analysis of the actual content and data itself. (Freely exposing metadata is also controversial and risky, as open data advocates will attest.)
Once big data techniques have been successfully applied, you have identified entities and the connections between them. If you want to join that information up to the rest of the web, or to concepts outside of your system, you need a language in which to do that. You need to organize, exchange and reason about those entities. It’s this framework that has been steadily built up over the last 15 years with the semantic web project.
To give an already widespread example: many data scientists use Wikipedia to help with entity resolution and disambiguation, using Wikipedia URLs to identify entities. This is a classic use of the most fundamental of semantic web technologies: the URI.
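As a rough illustration of that idea, here is a minimal Python sketch in which Wikipedia URLs act as entity identifiers. The alias table and the function names (`ALIASES`, `resolve`, `group_by_entity`) are hypothetical, invented for this example; a real pipeline would build the lookup from Wikipedia data and use context to disambiguate.

```python
from collections import defaultdict

# Hypothetical alias table: lowercase surface form -> Wikipedia URL used as the URI.
ALIASES = {
    "nyc": "https://en.wikipedia.org/wiki/New_York_City",
    "new york city": "https://en.wikipedia.org/wiki/New_York_City",
    "paris": "https://en.wikipedia.org/wiki/Paris",
    "paris, texas": "https://en.wikipedia.org/wiki/Paris,_Texas",
}


def resolve(mention):
    """Return the URI for a mention, or None if the mention is unknown."""
    return ALIASES.get(mention.strip().lower())


def group_by_entity(mentions):
    """Group raw mentions under the URI they resolve to, skipping unknowns."""
    groups = defaultdict(list)
    for mention in mentions:
        uri = resolve(mention)
        if uri is not None:
            groups[uri].append(mention)
    return dict(groups)


if __name__ == "__main__":
    for uri, names in group_by_entity(["NYC", "Paris", "New York City"]).items():
        print(uri, names)
```

Different spellings of the same thing collapse onto one URI, while similar-looking names for different things stay apart. Because the identifier is a URL, any other system that uses the same convention can join its data to yours without a shared database.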
For Strata, as our New York series of conferences approaches, we will be starting to include a little more semantic web, but with a strict emphasis on utility. Strata itself is less about being beholden to big data than about being data-driven, and the ongoing consequences that has for technology, business and society.
Big data: Global good or zero-sum arms race?
It remains to be seen if big data will catalyze exponential growth.
by Jim Stogdill
Last month, Netezza CEO Jim Baum gave a talk at the GigaOM big data event. If I’m honest, I was checking my email and missed most of it, but I do remember tuning in just in time to hear him say something like “big data is going to have a huge economic impact.”
I spend most of my days considering how the component pieces of this big data transformation will impact the corporate enterprise. Baum’s comment got me thinking, though, about a more meta question: Is “big data” a key to some kind of industrial revolution reboot? Or, is it just going to be expensive table stakes for previously simple-to-understand businesses?
For 200-plus years the industrial revolution* has been a kind of Moore’s law of human productivity. Over that period our economic output per person has been growing like clockwork, and whatever you think of the various political -isms that sprang from industrialization, this march of productivity has pulled a lot of people out of poverty and is the cause of the first sustained increase in wealth across human history.
But like Moore’s law in a single core, our industrial revolution in advanced economies is kinda playing out. Our economy has been shifting for some time toward services that are proving to be impervious to order-of-magnitude productivity gains. The thousand-fold increases in productivity we saw on the farm and in the factory just don’t seem likely to happen in health care and other service-intensive sectors.
Of course our economy continues to grow, but at a rate that is staying just a skosh ahead of population growth. And since the top 1% are taking all of that (and perhaps more), for the first time in American history parents are worrying that their kids won’t have opportunities better than their own. Voilà! From there stems the populist anger that feeds the Tea Party.
That’s the U.S.-centric view. Of course on a global basis there is tremendous growth as late-stage industrial revolution innovations are applied with vigor to developing economies. The 8% growth rates many countries are achieving will double their population’s wealth roughly every nine years. But for the U.S., achieving growth requires parallelism. Of course, in this context we call it globalism and it means if we can’t be more productive in one place, we have to take advantage of modern communications to do it in a bunch of other cheaper places. The problem is, after a few decades of those 8% overseas growth rates, there will be less comparative advantage for us to exploit, and if we want continued economic growth, we really will need to find ways to be more productive.
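For readers who want to check the doubling arithmetic at 8% annual growth, here is a small Python sketch. It assumes a constant compound growth rate and ignores population growth, which is a simplification; the function names are illustrative only.

```python
import math


def doubling_time(annual_growth_rate):
    """Years for a quantity to double at a constant compound annual rate.

    Solves (1 + r) ** t == 2 for t. Assumes r > 0 and no year-to-year variation.
    """
    return math.log(2) / math.log(1 + annual_growth_rate)


def rule_of_72(annual_growth_percent):
    """Common back-of-the-envelope approximation of the doubling time."""
    return 72 / annual_growth_percent


if __name__ == "__main__":
    print(f"exact: {doubling_time(0.08):.1f} years")   # exact: 9.0 years
    print(f"rule of 72: {rule_of_72(8):.1f} years")    # rule of 72: 9.0 years
```

At that pace, three decades of growth means a bit more than three doublings, or roughly a tenfold increase, which is why the comparative advantage erodes within a generation.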
So, that’s why Baum’s comment stuck in my head.
At the risk of way overgeneralizing, so far “big data” has mostly been about behavioral analysis to better target ads. Is that what Baum meant? That more effectively matching producer and consumer long tails through precision ad placement is going to fundamentally change the economy? That type of matching can promote economic activity, which is good, but I don’t see the link to fundamentally improved productivity. If this kind of innovation pulls another tranche of the bell curve out of poverty it will do it by putting more people to work doing the same stuff, not by making our economy fundamentally more efficient.
When I heard “huge impact on the economy,” my first thought was maybe it’s just a throw-away comment. Maybe he just meant the economy as seen through the narrow lens of his company revenues. But then I tried to think about this on a deeper level: What’s here that I haven’t considered? Could he somehow be saying that this is a catalyst for the next big phase of productivity growth in our 200-year-old industrial revolution? Is this the industrial revolution equivalent of nanometer chip design, which starts the next decade of doubling? Is it the thing that gets the middle class growing again and eases all this populist anger?
Yeah, that might sound kind of absurd, but that’s how my head works — a daily stream of ADHD-fueled big dreams immediately dashed on the rocks of reality.
(As an aside, halfway through writing this I came across a prediction of the “wine and roses” we’ll all experience with this “New Information Age.” Don’t sweat the death of privacy, the surveillance state is highly unlikely.)
So, back to the question: Is big data an economic driver or just a must-have to be in the game?
As early as the 1950s it was obvious that robotic automation was going to fundamentally change manufacturing. As automobiles increasingly were built by robotic labor, the industry saw incredible productivity gains. The hours of human labor per automobile dropped by orders of magnitude over the next 30 years. Naturally, cars didn’t just get cheaper, they also got more complex and feature-rich. But anyone could understand the return on capital of installing robotic lines. What’s the return on capital look like for a Hadoop cluster?
It’s worth noting that robots didn’t just increase productivity, they also reshaped labor’s relationship with management. If you’re labor, competing with a robot sucks. This was presciently described by Norbert Wiener in his classic “The Human Use of Human Beings: Cybernetics and Society.”
Of course we don’t need history’s warning to know that big data might have a dark side, too. If you don’t see it now, you will when you download a new car stereo software version and it resets all your radio station presets based on Toyota’s notion of people like you. Of course, for a car company to be as obnoxious as your software and search bar providers have long been, they have to learn as much about you as those software guys do, and that’s weird. We aren’t really used to the idea of a manufacturer knowing where we go and who we go there with.
There obviously are places where large-scale data and analysis will improve efficiencies and productivity. Particularly in areas like smart grid, where it will reduce the investment necessary in power plant construction, or financial services, where it promises to help fight fraudulent transactions. What else? Are there big opportunities for order-of-magnitude productivity gains out there that come to mind? Or is most of the value created by this “new information age” going to be in some mushy upper region of Maslow’s hierarchy? A kind of middle class feel-good machine that remains completely irrelevant to the working poor dreaming of their first homes?
Norbert Wiener was concerned that automation-based productivity gains would disrupt the working man and woman’s living. He held that concern in the face of the obvious and compelling productivity gains that were sure to flow through to GDP as wealth.
As we enter the big data era of the information age and give up what’s left of our privacy, I’d like to think that it will be for more than a zero-sum game of musical chairs to decide the next winners.