Web giants pumping us for data

by John Naughton

The Observer

Should you be looking for an example of hucksterish cynicism, then the mantra that “data is the new oil” is as good as they come. Although its first recorded utterance goes as far back as 2006, in recent times it has achieved the status of an approved corporate cliche, though nowadays “data” is generally qualified by the adjective “big.”

And if you want a measure of how deeply the cliche has penetrated the collective unconscious, ponder this: a Google search for “big data” turns up more than 1.5 billion results. And a search for “data mining” turns up 167 million results.

The idea of oil as a metaphor for big data is seductive. It’s also revealing in interesting ways. Given that the oil business is one of the biggest industries in the history of the world, for example, the metaphor hints at untold future riches. But it conveniently skates over the fact that oil wealth overwhelmingly benefits either ruling elites in corrupt and/or authoritarian countries, or huge corporations in democratic states.

But at least oil is a physical, non-renewable resource that is extracted from the earth. Big data, on the other hand, is extracted from the activities of people and machines. As society becomes more and more networked, and as the so-called Internet-of-things evolves, the amounts of data available to be “mined” will increase exponentially. And, unlike fossil fuels, these data reserves are infinitely renewable.

“Big data,” says Kenneth Cukier, co-author of the best book on the subject to appear so far (“Big Data: A Revolution That Will Transform How We Live, Work and Think”), “will transform how we work, how we live and how we think.” He argues that, at least in the case of data, “more is not just more; more is different,” by which he means that quantitative abundance can lead to qualitative change. The availability of huge amounts of data turbocharges machine learning, for example, turning hitherto impossible tasks (such as accurate, instantaneous language translation) into delivered realities.

The key question about any major technological development is: who benefits? The answer in the case of big data is: huge corporations — the Googles, Amazons and Facebooks of this world, which are the only outfits (outside of the U.S. National Security Agency) with the computational resources to mine, analyze and process the data torrents unleashed by us as we go about our networked lives. The companies don’t talk about it this way, of course. Instead they have soothing patter about how their analytical capabilities enable them to serve you better: how the ability to analyze the Web searches conducted by you and your friends enables them to provide better search results, for example; or how analysis of your online behavior enables Amazon to suggest products that you might like; and so on.

All true, of course, but skillfully avoiding the awkward fact that you are the resource that is being mined and that the playing field that is cyberspace is tilted in favor of the corporations that have come to dominate it.

Which brings us to another aspect of the subject: open data. Since 2005, activists have been campaigning for “open government data” initiatives — demanding the publication of public datasets in machine-readable, freely reusable formats. The argument for this is impeccable: the data is collected by public bodies; it should therefore be available to the public that paid for it. The motivations behind the campaigns are likewise admirable: if the data is available, then civic-minded geeks can do useful things with it.

The open government data campaigns have been surprisingly successful in both the United States and Britain. Huge swaths of public data are now available. I can download a vast spreadsheet containing details of every contract worth more than £500 (¥76,300) entered into by my local authority, for example. And in many cases, people have already developed useful services on top of public data. For example, busitlondon.co.uk provides a helpful online tool for planning a journey by bus in London.

There’s lots more in that vein, and it’s all good stuff. At first sight, therefore, open government data looks encouraging. But there are a couple of flies in the ointment. The first is that there is a difference between open data and open government. The current Hungarian administration, for example, has been quite good at publishing public data, but is morphing into one of the most secretive and authoritarian regimes in Europe.

And then there’s that awkward question again: who benefits? Certainly the public, to some extent. But there are signs that open government data favors private companies bidding for local-authority contracts. The companies know what it costs the authority to collect the refuse, for instance; but their own finances are opaque, so it’s impossible to judge whether they would really be more efficient than a public body.

The moral? Be careful what you wish for.