The tension between information creators and information organizers
In 2006, a group of Belgian newspapers sued Google, ostensibly to get snippets of their news stories removed from Google News. In fact, the newspapers were well aware that this could be easily achieved by putting a suitable file (robots.txt) on their webservers, instructing Google’s web crawler to ignore those servers. What then was the real purpose of the lawsuit? It’s difficult to know for sure, but it seems likely that it was part of a ploy to pressure Google into paying the newspapers for permission to reuse the newspapers’ content.
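To make the "suitable file" concrete: the usual mechanism is a robots.txt file served from the root of the site. The sketch below is only an illustration, not what any of the newspapers actually deployed. The rules, the domain news.example.com, and the helper name can_fetch are all assumptions, and it uses Python's standard urllib.robotparser to show how a well-behaved crawler would interpret such a file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a newspaper might serve at /robots.txt.
# The rules are illustrative: block Google's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    story = "https://news.example.com/story.html"  # made-up URL
    print("Googlebot allowed:", can_fetch("Googlebot", story))
    print("Other crawler allowed:", can_fetch("SomeOtherBot", story))
```

Note that robots.txt is a voluntary convention: it only works because crawlers such as Google's choose to honor it, which is part of why the lawsuit looks like it was about something other than removal.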
This story is an example of a growing tension between creators of information, whether it be blogs, books, movies, music, or whatever, and organizers of information, such as Google. This tension is tightening sharply as people develop more services for organizing information, and profits increasingly flow toward the organizers rather than the creators.
As another example, in 2007, Google had advertising revenues of approximately 16 billion dollars(!), most of it from search. Yet, according to one study, approximately twenty-five percent of the number one search results on Google led to Wikipedia. Wikipedia, of course, does not directly benefit from Google’s advertising profits. I bet that at least some of Google’s best sources – e.g., Wikipedia, the New York Times, and some of the top blogs – are not happy that Google reaps what may seem a disproportionately large share of the advertising dollar.
Other examples of new niches in the organization of information include RSS readers (Bloglines, Netvibes), social news sites (Digg, Reddit), and even my own Academic Reader. In each case, there is a natural tension between the creators of the underlying information and the organizing service.
Now, of course, it’s greatly to the public benefit for such organizing services to thrive. However, for this to happen, a great deal of information must be made publicly available, preferably in a machine-readable format, like RSS or OAI. If the information is partially or completely locked up (think, e.g., Facebook’s friendship graph), then that enormously limits the web of value that can be built on top of the information. Yet Facebook is understandably very cautious about opening that information up, fearing that it would harm their business.
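To illustrate why a machine-readable format matters to an organizing service, here is a minimal sketch of how little code it takes to pull headlines and links out of an RSS feed. The feed contents, the domain, and the function name headlines are invented for the example; a real service would fetch the feed over HTTP and cope with malformed input.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed standing in for a publisher's real one.
FEED = """\
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>https://news.example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://news.example.com/2</link>
    </item>
  </channel>
</rss>
"""


def headlines(feed_xml: str) -> list[tuple[str, str]]:
    """Extract (title, link) pairs from every <item> in an RSS document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in headlines(FEED):
        print(f"{title}: {link}")
```

When the same information is available only as rendered web pages, or is not exposed at all, every organizer has to negotiate or reverse-engineer access, and far fewer services get built.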
The situation is further complicated by the fact that the best people to organize and add value to information are often not the original creators of that information. This is for two reasons. The first is a lack of technical expertise: the New York Times does lots of good reporting, but this doesn’t mean they’ll do a good job of providing a search interface to their archive of old articles. The second is the problem of conflicts of interest: the New York Times would have a much harder time running something like Google News than Google does, since other news organizations would not co-operate with them.
Summing up the problem here in a single sentence, the question is this: to what extent should information be made freely accessible, in order to best serve the public interest?
There has, of course, been a lot of debate about this question, but much of that debate has centered around filesharing of music, movies and so on, where the additional value being added to the information is often minimal. The question becomes much more interesting when applied to services like Google News which add additional layers of meaning and organization to information.
At present, the legal situation is not clear. As an example, in the Belgian newspaper case, one might ask whether Google’s usage was acceptable under the fair use doctrine for copyright. After all, Google News only excerpted a few lines from the Belgian newspapers. Obviously, the Belgian courts thought this was not fair use, but other jurisdictions are yet to follow suit.
If the situation today has not yet been resolved, then what might we see in an ideal future? On the one hand, it is highly desirable for information to be freely available for other people to add value. This will often mean making use of a large fraction (or all) of the content, a type of reuse not currently recognized as fair use, yet which is clearly in the public’s interest.
On the other hand, it is also highly desirable for content producers to have incentives to produce content. What we’re seeing at present is a migration of value up the chain from content creators, like the New York Times, to content organizers, like Google. This, in turn, is causing the content creators to erect fences around their data. The net result is not in anybody’s best interest.
I don’t know what the resolution of this problem is. But it is a real problem, and it’s going to get worse, and it worries me that we’ll end up in a world where the balance is too much one way or the other.