The tension between information creators and information organizers

In 2006, a group of Belgian newspapers sued Google, ostensibly to get snippets of their news stories removed from Google News (full story). In fact, the newspapers were well aware that this could easily be achieved by putting a suitable file (a robots.txt file) on their webservers, instructing Google’s web crawler to ignore those servers. What then was the real purpose of the lawsuit? It’s difficult to know for sure, but it seems likely that it was part of a ploy to pressure Google into paying the newspapers for permission to reuse the newspapers’ content.
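To make the "suitable file" concrete: the usual mechanism is a robots.txt file served from the root of the site, which well-behaved crawlers consult before fetching pages. Below is a minimal sketch using Python's standard `urllib.robotparser`. The file contents, the `news.example.com` URL and the `can_fetch` helper are my own illustrative assumptions, not anything the newspapers actually published, and I'm using `Googlebot` simply as an example crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a newspaper could serve from the root of its site.
# It tells a crawler identifying itself as "Googlebot" to stay away entirely,
# while leaving the site open to every other crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URL only; example.com is reserved for documentation.
    story = "https://news.example.com/2006/story.html"
    print("Googlebot allowed:   ", can_fetch("Googlebot", story))
    print("SomeOtherBot allowed:", can_fetch("SomeOtherBot", story))
```

Note that robots.txt is a voluntary convention: it only works because crawlers choose to honour it, which is part of why the lawsuit looks like it was about something other than removal.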

This story is an example of a growing tension between creators of information, whether it be blogs, books, movies, music, or whatever, and organizers of information, such as Google. This tension is tightening sharply as people develop more services for organizing information, and profits increasingly flow toward the organizers rather than the creators.

As another example, in 2007, Google had advertising revenues of approximately 16 billion dollars(!), most of it from search. Yet, according to one study, approximately twenty-five percent of the number one search results on Google led to Wikipedia. Wikipedia, of course, does not directly benefit from Google’s advertising profits. I bet that at least some of Google’s best sources – e.g., Wikipedia, the New York Times, and some of the top blogs – are not happy that Google reaps what may seem a disproportionately large share of the advertising dollar.

Other examples of new niches in the organization of information include RSS readers (Bloglines, Netvibes); social news sites (Digg, Reddit); even my own Academic Reader. In each case, there is a natural tension between the creators of the underlying information, and the organizing service.

Now, of course, it’s greatly to the public benefit for such organizing services to thrive. However, for this to happen, a great deal of information must be made publicly available, preferably in a machine-readable format, like RSS or OAI. If the information is partially or completely locked up (think, e.g., Facebook’s friendship graph), then that enormously limits the web of value that can be built on top of the information. Yet Facebook is understandably very cautious about opening that information up, fearing that it would harm their business.

The situation is further complicated by the fact that the best people to organize and add value to information are often not the original creators of that information. This is for two reasons. The first is a lack of technical expertise – the New York Times does lots of good reporting, but this doesn’t mean they’ll do a good job of providing a search interface to their archive of old articles. The second is the problem of conflicts of interest – the New York Times would have a much harder time running something like Google News than Google does, since other news organizations would not co-operate with them.

Summing up the problem here in a single sentence, the question is this: to what extent should information be made freely accessible, in order to best serve the public interest?

There has, of course, been a lot of debate about this question, but much of that debate has centered around filesharing of music, movies and so on, where the value being added to the information is often minimal. The question becomes much more interesting when applied to services like Google News, which add new layers of meaning and organization to information.

At present, the legal situation is not clear. As an example, in the Belgian newspaper case, one might ask whether Google’s usage was acceptable under the fair use doctrine for copyright. After all, Google News only excerpted a few lines from the Belgian newspapers. Obviously, the Belgian courts thought this was not fair use, but other jurisdictions are yet to follow suit.

If the situation today has not yet been resolved, then what might we see in an ideal future? On the one hand, it is highly desirable for information to be freely available for other people to add value. This will often mean making use of a large fraction (or all) of the content, a type of reuse not currently recognized as fair use, yet which is clearly in the public’s interest.

On the other hand, it is also highly desirable for content producers to have incentives to produce content. What we’re seeing at present is a migration of value up the chain from content creators like the New York Times to content organizers, like Google. This, in turn, is causing the content creators to erect fences around their data. The net result is not in anybody’s best interest.

I don’t know what the resolution of this problem is. But it is a real problem, and it’s going to get worse, and it worries me that we’ll end up in a world where the balance is too much one way or the other.


  1. This reminds me of Project Xanadu:

    As I understand it, the idea of Xanadu is to build management of copyright, intellectual property rights, ownership, and royalties into the basic underlying protocols of the Xanadu system. The current conflict between information creators, organizers, and consumers might eventually drive the web into becoming something like Xanadu.

  2. Hi Michael:

    That’s an interesting development, thanks for reporting on it. I thought you might be interested in a recent thread I had about the legal vacuum in which search engines operate, though I was focusing on the political aspects, see The Spirits that We Called. Best,


  3. Bee, you might find it interesting to look into the Nutch project, an open source search engine that looks to be much more advanced than Wikia, and perhaps also this post. More generally, have you read Cass Sunstein’s “Republic.com 2.0”? I think you might enjoy it a lot – it’s about the problems caused in a democracy by the advent of widespread electronic communication, and much of what he has to say is closely related to your post. If you’re interested, I can lend it to you.
