I’ve recently been reviewing the history of open source software, and one thing I’ve been struck by is the enormous effort many open source projects put into making their development modular. They do this so work can be divided up, making it easier to scale the collaboration, and so get the benefits of diverse expertise and more aggregate effort.
I’m struck by this because I’ve sometimes heard sceptics of open science assert that software has a natural modularity which makes it easy to scale open source software projects, but that difficult science problems often have less natural modularity, and this makes it unlikely that open science will scale.
It looks to me like what’s really going on is that the open sourcers have adopted a posture of conscious modularity. They’re certainly not relying on any sort of natural modularity, but are instead working hard to achieve and preserve a modular structure. Here are three striking examples:
- The open source Apache webserver software was originally a fork of a public domain webserver developed by the US National Center for Supercomputing Applications (NCSA). The NCSA project was largely abandoned in 1994, and the group that became Apache took over. It quickly became apparent that the old code base was far too monolithic for a distributed effort, and the code base was completely redesigned and overhauled to make it modular.
- In September 1998 and June 2002 crises arose in Linux because of community unhappiness at the slow rate at which new code contributions were being accepted into the kernel. In some cases contributions from major contributors were being ignored completely. The problem in both 1998 and 2002 was that an overloaded Linus Torvalds was becoming a single point of failure. The situation was famously summed up in 1998 by Linux developer Larry McVoy, who said simply “Linus doesn’t scale”. This was a phrase repeated in a 2002 call-to-arms by Linux developer Rob Landley. The resolution in both cases was a major reorganization of the project that allowed tasks formerly managed by Torvalds to be split up among the Linux community. In 2002, for instance, Linux switched to an entirely new way of managing code, using a package called BitKeeper, designed in part to make modular development easier.
- One of the Mozilla projects is an issue tracking system (Bugzilla), designed to make modular development easy, which Mozilla uses to organize development of the Firefox web browser. Developing Bugzilla is a considerable overhead for Mozilla, but it’s worth it to keep development modular.
The right lesson to learn from open source software, I think, is that it may be darned hard to achieve modularity in software development, but it can be worth it to reap the benefits of large-scale collaboration. Some parts of science may not be “naturally” modular, but that doesn’t mean they can’t be made modular with conscious effort on the part of scientists. It’s a problem to be solved, not to give up on.
Although not as pronounced as in the open source community, it is apparent that “modularity” has become a cultural trait of computer science in general. In principle, this trait should naturally carry over to the areas of computer science that are closest to mathematics. I read a fair number of theoretical computer science papers and there are many visible cultural differences between TCS papers and pure math papers. I never paid attention to modular thinking per se, but I’m sure it has visible manifestations.
It would be interesting to take an anthropological perspective and look at how this type of thinking has already changed TCS and similar fields. More to the point, it would be of interest to see how collaboration in computer science has evolved with the emergence of massive collaboration in open source software.
The amazing thing about Bugzilla is how quickly it was adopted by a vast range of projects. You rarely, if ever, encounter a project nowadays that doesn’t use Bugzilla or a similar system.
I’ve created so many bugzilla accounts over the last six or seven years that I’ve lost track. As a serial early adopter of open source projects (I’m still not sure why I feel the compulsion to build new versions of software so often — I’m not a programmer, and it’s not a very productive way to spend my time) I try to file as many bugs as possible. This is where systems like bugzilla are *really* useful: they provide a way for thousands of interested users to contribute to a project, even if they don’t know how to code.
I searched for the keyword “Manhattan” in your blog search and came up empty. Maybe the Manhattan Project can teach us some lessons about the practice of collective science.
This reminds me of discussions about modularity and evolution in biological systems and, more generally, the whole issue of “evolvability”. If you haven’t read them already, I suggest having a look at Marc Kirschner and John Gerhart’s paper in PNAS and this review about protein modularity and evolution of signalling. There are many analogies from biological evolution that could be informative.
Pedro – Thanks for the recommendations. I’ll take a look.
Tim O’Reilly seems to be hitting the same themes here (did he read your post?)
http://fyi.oreilly.com/2009/05/an-interview-with-tim-oreilly-.html