“Is massively collaborative mathematics possible?” is the title of a thought-provoking essay by Tim Gowers, which seems to have been stimulated in part by my recent essay on doing science online. What follows are some excerpts from Gowers’ essay, and some thoughts by me:
Of course, one might say, there are certain kinds of problems that lend themselves to huge collaborations. One has only to think of the proof of the classification of finite simple groups, or of a rather different kind of example such as a search for a new largest prime carried out during the downtime of thousands of PCs around the world. But my question is a different one. What about the solving of a problem that does not naturally split up into a vast number of subtasks? Are such problems best tackled by [tex]n[/tex] people for some [tex]n[/tex] that belongs to the set [tex]\{1,2,3\}[/tex]? (Examples of famous papers with four authors do not count as an interesting answer to this question.)
It seems to me that, at least in theory, a different model could work: different, that is, from the usual model of people working in isolation or collaborating with one or two others. Suppose one had a forum (in the non-technical sense, but quite possibly in the technical sense as well) for the online discussion of a particular problem. The idea would be that anybody who had anything whatsoever to say about the problem could chip in. And the ethos of the forum — in whatever form it took — would be that comments would mostly be kept short. In other words, what you would not tend to do, at least if you wanted to keep within the spirit of things, is spend a month thinking hard about the problem and then come back and write ten pages about it. Rather, you would contribute ideas even if they were undeveloped and/or likely to be wrong.
A similar approach is used in the open-source software community – essentially, a dynamic division of labour that is not planned entirely in advance, but rather arises in response to the exigencies of the problem at hand. This dynamic division of labour is typically co-ordinated through one or more online forums. Examples close to this spirit, and also somewhat close in spirit to modern mathematics, include Kasparov versus the World and the Matlab programming competition.
On the subject of the desirable size of contributions, in open source the most frequent contributions change just a single line of code. The second most frequent contributions change two lines of code, and so on. One study suggests that the number [tex]n(l)[/tex] of contributions scales as [tex]n(l) \propto l^{-1.13}[/tex], where [tex]l[/tex] is the number of lines of code changed or added (“committed”) in a single contribution.
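It’s notable that with this distribution the total line count is still dominated by the larger contributions: contributions of size [tex]l[/tex] together account for roughly [tex]l \cdot n(l) \propto l^{-0.13}[/tex] lines, which falls off so slowly that, summed over all sizes, the larger contributions outweigh the far more numerous small ones. Despite this, my guess is that the smaller contributions are still very significant for maintaining momentum and morale, which are so important in creative projects. In this regard, it’s a little like a good creative conversation – not all contributions to the conversation need to be world-shaking, some are simply needed to keep the conversation moving.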
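To make the point about line counts concrete, here is a minimal Python sketch. It assumes the power law holds exactly for every integer size from 1 line up to a hypothetical cutoff of 10,000 lines; the cutoff, the function name `size_shares`, and the 10-line threshold for a “small” contribution are illustrative choices of mine, not taken from the study. It compares the share of contributions that are small with the share of committed lines they account for.

```python
def size_shares(cutoff, max_lines, exponent=1.13):
    """Return (share of contributions, share of committed lines) coming from
    contributions of at most `cutoff` lines, assuming n(l) is proportional to
    l**(-exponent) for l = 1..max_lines (max_lines is a hypothetical cutoff)."""
    count_small = count_total = 0.0
    lines_small = lines_total = 0.0
    for l in range(1, max_lines + 1):
        n = l ** (-exponent)   # relative (unnormalized) number of contributions of size l
        count_total += n
        lines_total += l * n   # each such contribution changes l lines
        if l <= cutoff:
            count_small += n
            lines_small += l * n
    return count_small / count_total, lines_small / lines_total


contrib_share, line_share = size_shares(cutoff=10, max_lines=10_000)
print(f"contributions of <= 10 lines: {contrib_share:.0%} of all contributions, "
      f"but only {line_share:.1%} of all committed lines")
```

Without an upper cutoff the line total would grow without bound, since [tex]l \cdot n(l)[/tex] decays more slowly than [tex]1/l[/tex]. The exact shares therefore depend on the assumed cutoff, but for any reasonably large cutoff the qualitative picture is the same: small contributions are common, yet carry only a small fraction of the lines.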
This suggestion raises several questions immediately. First of all, what would be the advantage of proceeding in this way? My answer is that I don’t know for sure that there would be an advantage. However, I can see the following potential advantages.
(i) Sometimes luck is needed to have the idea that solves a problem. If lots of people think about a problem, then just on probabilistic grounds there is more chance that one of them will have that bit of luck.
(ii) Furthermore, we don’t have to confine ourselves to a purely probabilistic argument: different people know different things, so the knowledge that a large group can bring to bear on a problem is significantly greater than the knowledge that one or two individuals will have. This is not just knowledge of different areas of mathematics, but also the rather harder to describe knowledge of particular little tricks that work well for certain types of subproblem, or the kind of expertise that might enable someone to say, “That idea that you thought was a bit speculative is rather similar to a technique used to solve such-and-such a problem, so it might well have a chance of working,” or “The lemma you suggested trying to prove is known to be false,” and so on—the type of thing that one can take weeks or months to discover if one is working on one’s own.
I think of this as the “annoying little conjecture” problem: many conjectures that arise in the course of research are essentially routine to prove or disprove, but it can take days or weeks to determine which it’s going to be. If you talk to just the right person, they can often cut that down to minutes or hours. Ordinarily, though, finding that right person is often just as laborious as solving the problem yourself, and may be less enlightening. Having a mechanism to find the right person, even if it’s essentially just broadcast search, would be enormously beneficial.
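The probabilistic argument in (i), and the broadcast search just mentioned, can be made a little more concrete with a simple model. Assume, unrealistically, that each of [tex]n[/tex] people independently has the same probability [tex]p[/tex] of having the crucial idea (or of being the right person to ask). The chance that at least one of them does is then

[tex]P_n = 1 - (1-p)^n.[/tex]

For example, with [tex]p = 0.01[/tex] and [tex]n = 100[/tex] this gives [tex]1 - 0.99^{100} \approx 0.63[/tex], compared with [tex]0.01[/tex] for a single person. Real contributors are neither independent nor equally likely to succeed, so this is only an illustration of the direction of the effect, not an estimate of its size.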
The next obvious question is this. Why would anyone agree to share their ideas? Surely we work on problems in order to be able to publish solutions and get credit for them. And what if the big collaboration resulted in a very good idea? Isn’t there a danger that somebody would manage to use the idea to solve the problem and rush to (individual) publication?
Here is where the beauty of blogs, wikis, forums etc. comes in: they are completely public, as is their entire history. To see what effect this might have, imagine that a problem was being solved via comments on a blog post. Suppose that the blog was pretty active and that the post was getting several interesting comments. And suppose that you had an idea that you thought might be a good one. Instead of the usual reaction of being afraid to share it in case someone else beat you to the solution, you would be afraid not to share it in case someone beat you to that particular idea. And if the problem eventually got solved, and published under some pseudonym like Polymath, say, with a footnote linking to the blog and explaining how the problem had been solved, then anybody could go to the blog and look at all the comments. And there they would find your idea and would know precisely what you had contributed. There might be arguments about which ideas had proved to be most important to the solution, but at least all the evidence would be there for everybody to look at.
The open source world demonstrates this in action. You can see every single contribution a person has made to a project – the code, the conversations in online forums, and so on. There are even beautiful visualizations that let you see different people’s contributions to a project. As a result, it’s very difficult to fool people about the extent of your contributions. I’m sure people are sometimes dishonest about this, but I’ll bet they’re a lot more honest than some scientists are about what they contributed to some papers.
True, it might be quite hard to say on your CV, “I had an idea that proved essential to Polymath’s solution of the *** problem,” but if you made significant contributions to several collaborative projects of this kind, then you might well start to earn a reputation amongst people who read mathematical blogs, and that is likely to count for something. (Even if it doesn’t count for all that much now, it is likely to become increasingly important.) And it might not be as hard as all that to put it on your CV: you could think of yourself as a joint author, with the added advantage that people could find out exactly what you had contributed.
And what about the person who tries to cut and run when the project is 85 [percent] finished? Well, it might happen, but everyone would know that they had done it. The referee of the paper would, one hopes, say, “Erm, should you not credit Polymath for your crucial Lemma 13?” And that would be rather an embarrassing thing to have to do.
Now I don’t believe that this approach to problem solving is likely to be good for everything. For example, it seems highly unlikely that one could persuade lots of people to share good ideas about the Riemann hypothesis.
At present, this is undoubtedly true. However, if this sort of approach takes off and comes to be seen as a legitimate and orthodox way of making a contribution to mathematics, the kind of thing valued by (for example) hiring committees, then I think it may eventually be possible to do this for some of the more famous problems. There’s no intrinsic difference between sharing your ideas in a paper and sharing them in a blog comment: a good idea is a good idea. The difference at present is mainly social: one is seen as legitimate, while the other is questionable.
At the other end of the scale, it seems unlikely that anybody would bother to contribute to the solution of a very minor and specialized problem. Nevertheless, I think there is a middle ground that might well be worth exploring, so as an experiment I am going to suggest a problem and see what happens.
I think it is important to do more than just say what the problem is. In order to try to get something started, I shall describe a very preliminary idea I once had for solving a problem that interests me (and several other people) greatly, but that isn’t the holy grail of my area. Like many mathematical ideas, mine runs up against a brick wall fairly quickly. However, like many brick walls, this one doesn’t quite prove that the approach is completely hopeless—just that it definitely needs a new idea.
It may be that somebody will almost instantly be able to persuade me that the idea is completely hopeless. But that would be great—I could stop thinking about it. And if that happens I’ll dig out another idea for a different problem and try that instead.
I’ve been toying for quite some time with doing something similar, though with problems from theoretical physics. I’ll be utterly fascinated to see the result of this experiment, and will certainly follow along. Not sure I’ll have much of mathematical interest to contribute, though – combinatorics is a long way from my expertise.
It’s probably best to keep this post separate from the actual mathematics, so that comments about collaborative problem-solving in general don’t get mixed up with mathematical thoughts about the particular problem I have in mind. So I’ll describe the project in my next post. Actually, make that my next post but one. The next post will say what the problem is and give enough background information about it to make it possible for anybody with a modest knowledge of combinatorics (or more than a modest knowledge) to think about it and understand my preliminary idea. The following post will explain what that preliminary idea is, and where it runs into difficulties. Then it will be over to you, or rather over to us. I’ve already written the background-information post, but will hold it back for a few days in case the responses to this post affect how I decide to do things.
The blog medium is almost certainly not optimal for this purpose, so if a serious discussion starts with lots of worthwhile contributions, then I’ll look into the possibility of migrating it over to some purpose-built site. If anyone has any suggestions for this (apart from the obvious one of using the Tricki — I’m not sure that’s appropriate just yet though) then I’d be delighted to receive them. My feelings at the moment are that blogs are too linear—it would be quite hard to see which comments relate to which, which ones are most worth reading, and so on. A wiki, on the other hand, seems not to be linear enough—it would be quite hard to see what order the comments come in. So my guess is that the ideal forum would probably be a forum: if someone knows an easy way to set up a mathematical forum, I might even do that. But if the discussion is on this blog, then I might from time to time try to assess where it has got to and create new posts if I feel that genuine progress has been made that can be summarized and then built on.
I’ve been thinking of doing this for a long time. The reason I’ve suddenly decided to go ahead is that I followed a couple of links from this post on Michael Nielsen’s blog, and discovered that, unsurprisingly, others have had similar ideas, and some people are already doing research in public. But the idea still seems pretty new, particularly when applied to one single mathematics problem, so I wanted to try it out when it was still fresh. (I would distinguish what I am proposing from what goes on at the n-category café, which is an excellent example of collaborative mathematics, but focused on an entire research programme rather than just one problem.)
To finish, here is a set of ground rules that I hope it will be possible to abide by. At this stage I’m just guessing what will work, so these rules are subject to change. If you can see obvious flaws let me know.
1. The aim will be to produce a proof in a top-down manner. Thus, at least to start with, comments should be short and not too technical: they would be more like feasibility studies of various ideas.
2. Comments should be as easy to understand as is humanly possible. For a truly collaborative project it is not enough to have a good idea: you have to express it in such a way that others can build on it.
Points 3-5 all concern norms of behaviour, and the problem of maintaining a civil tone:
3. When you do research, you are more likely to succeed if you try out lots of stupid ideas. Similarly, stupid comments are welcome here. (In the sense in which I am using “stupid”, it means something completely different from “unintelligent”. It just means not fully thought through.)
4. If you can see why somebody else’s comment is stupid, point it out in a polite way. And if someone points out that your comment is stupid, do not take offence: better to have had five stupid ideas than no ideas at all. And if somebody wrongly points out that your idea is stupid, it is even more important not to take offence: just explain gently why their dismissal of your idea is itself stupid.
5. Don’t actually use the word “stupid”, except perhaps of yourself.
Clay Shirky has pointed out that this problem – the problem of maintaining healthy conduct in online communities – has been around for decades, yet because no-one has synthesized all that is known, the same mistakes keep being made over and over (and over and over) again. The closest thing I know is a short blog post(!) from Teresa Nielsen Hayden, which is nice, but hardly comprehensive. Two suggestions:
- Don’t allow anonymous posting. Forums which do seem inevitably to degenerate. At the least, people should use a consistent handle, and ideally they should be strongly encouraged to use their real name.
- People who want the forum to thrive need to take ownership of social problems. If someone is behaving inappropriately, those people should step up to the plate, and gently (at first) suggest alternative conduct. If someone’s behaving like an ass at a dinner party, you don’t leave it all on the host’s shoulders; you try to help out yourself, in whatever ways seem appropriate.
6. The ideal outcome would be a solution of the problem with no single individual having to think all that hard. The hard thought would be done by a sort of super-mathematician whose brain is distributed amongst bits of the brains of lots of interlinked people. So try to resist the temptation to go away and think about something and come back with carefully polished thoughts: just give quick reactions to what you read and hope that the conversation will develop in good directions.
In a talk last year, Mike Beltzner (who manages the development of the front-end for Firefox) made the case that open-source projects where people went away and coded a lot on their own, only occasionally coming back to add big polished chunks, almost invariably failed.
7. If you are convinced that you could answer a question, but it would just need a couple of weeks to go away and try a few things out, then still resist the temptation to do that. Instead, explain briefly, but as precisely as you can, why you think it is feasible to answer the question and see if the collective approach gets to the answer more quickly. (The hope is that every big idea can be broken down into a sequence of small ideas. The job of any individual collaborator is to have these small ideas until the big idea becomes obvious — and therefore just a small addition to what has gone before.) Only go off on your own if there is a general consensus that that is what you should do.
8. Similarly, suppose that somebody has an imprecise idea and you think that you can write out a fully precise version. This could be extremely valuable to the project, but don’t rush ahead and do it. First, announce in a comment what you think you can do. If the responses to your comment suggest that others would welcome a fully detailed proof of some substatement, then write a further comment with a fully motivated explanation of what it is you can prove, and give a link to a pdf file that contains the proof.
9. Actual technical work, as described in 8, will mainly be of use if it can be treated as a module. That is, one would ideally like the result to be a short statement that others can use without understanding its proof.
If the project thrives, a wiki may be a good place to keep reference materials like this. It seems to be a pretty common pattern for big online collaborations to use a discussion forum to manage the basic conversation, and a wiki for reference materials. Initially, a wiki might not be necessary, and probably shouldn’t be added until there is real demand.
Two pieces of wiki software that seem pretty good for mathematical use are Instiki and TiddlyWiki. Instiki is very well suited for mathematical use; TiddlyWiki wasn’t so much designed for that purpose, but as you can see here seems to work pretty well in practice.
10. Keep the discussion focused. For instance, if the project concerns a particular approach to a particular problem (as it will do at first), and it causes you to think of a completely different approach to that problem, or of a possible way of solving a different problem, then by all means mention this, but don’t disappear down a different track.
11. However, if the different track seems to be particularly fruitful, then it would perhaps be OK to suggest it, and if there is widespread agreement that it would in fact be a good idea to abandon the original project (possibly temporarily) and pursue a new one — a kind of decision that individual mathematicians make all the time — then that is permissible.
I’m not sure what I think about this. It seems rather constraining – why not do some preliminary exploration of the alternate track, if it seems promising? I agree that it would be problematic if it distracted other people too much, but that seems like a problem that could be dealt with, probably in real time, if it comes up.
12. Suppose the experiment actually results in something publishable. Even if only a very small number of people contribute the lion’s share of the ideas, the paper will still be submitted under a collective pseudonym with a link to the entire online discussion.
A couple of final comments.
First, in many ways this (like most open source projects) seems to be primarily a community-building project. If you look at a successful open source-style project – Wikipedia, Linux, the Matlab competition, Kasparov versus the World – at the centre there is always a person who spends a great deal of time simply building and maintaining a healthy community of contributors. I can’t imagine this will be any different.
Second, the systems used need to be easily integrable into people’s workflow. I like the idea of starting on the blog, simply because many people are already in the habit of checking blogs. Migrations to other platforms will need to be handled carefully, to ensure that everyone does start using the new platform successfully. Providing things like RSS feeds or email update services might help greatly with this.
This is another great essay in a wonderful series!
With regard to code contributions, it’s pretty clear that the one-line changes are like single base-pair mutations (they occur in response to the user-community’s vigorous “evolutionary pressure” to fix bugs!)
Large code-block changes more closely resemble Barbara McClintock’s celebrated mechanism of genetic recombination (in which large blocks of genetic code are translocated).
And then (of course) there are software forks (speciation) … moribund projects (extinction) … obviously there are lots of parallels with evolutionary biology. So let’s push this parallel further.
Nowadays evolutionary biologists appreciate that the evolution of individual genes is oftentimes less significant than evolution of gene expression. The parallel to gene expression seemingly is software use …
This is where Michael’s closing remarks are very much to the point: “This (like most open source projects) seems in many ways to be primarily a community-building project.”
Yes, this is an absolutely key point! Community-building is the arena where selection effects are strongest, the resource opportunities greatest, confidence—both technical and social—is most essential, and where complicated issues of justice, enterprise, liberty, and equity interact most dynamically.
And that is why community-building is the open-science and open-information arena that is hardest to reason about, write about, and foresee with clarity.
A few points:
As far as different problems and alternate tracks are concerned, I don’t think there’s any way of preventing this from happening — and when it does, there is absolutely nothing wrong with pursuing these tracks. Who knows — perhaps the new problem is solvable while the original one isn’t; or the new problem is independently worth solving… There is a general need to keep working on problems which are of genuine interest (but if that’s really the case, why do people do mathematics in the first place?)
When these kind of divergences occur, however, it’s reasonable to ask several things. Perhaps most importantly, they shouldn’t be too distracting for the people who are still focused on the original problem; but from a social perspective, it’s not really reasonable to insist that the original problem-setter continue to host a discussion which has gone far away from the original problem. Not an unsolvable problem, but something to think about.
In addition to providing a very useful model for the collaboration, open source seems like the right way to develop the platform to facilitate collaborative thinking.
As far as integration into the workflow is concerned, I’m imagining something that serves a dual social and professional role, a kind of beefed-up online network in which people could keep track of all of their projects and social interactions. “Virtual Tea Time” may be a good name.
I think it’s important to watch out for the dangers of monopolization. This is important both technically (I’d be against falling into an entrenched reliance on the likes of Ning), and socially (we need to encourage capable people who want to create their own communities to do so, and to provide convenient pathways to make it possible).
Regarding anonymous posting, there are two sides which need to be reconciled. I totally agree on the necessity of keeping the people who wish to destroy the community from coming to the party, and in a mathematical or physics context it also seems reasonable to insist that people have real identities. In other contexts (such as Groklaw), it isn’t infrequent that some of the most valuable information is provided by people who can’t afford to provide it under a traceable identity. I’m not sure if there is a solution, but perhaps pseudonymous contributors could be allowed to contribute enough to demonstrate that they are worthy of trust.
Perhaps another approach to the wiki/forum software needed for this would be something like:
http://stackoverflow.com/
Where good contributions get rated as such and float to the top (in principle).
John – Open source seems like a particularly clean-cut version of memetic evolution, doesn’t it? It’d be entertaining to consider code inclusion (i.e., importing as a library) as a type of “genetic success”.
On community-building: there has been an enormous amount written about this outside of science, of course. I recently read and very much enjoyed Peter Block’s “Community”.
Nathaniel,
On the problem-forking issue, yeah, I think simply forking the conversation is probably a good idea, so one community splits into two (possibly overlapping) communities.
There are a lot of networking sites for scientists already, most trying (and failing) to enable this kind of connection. The most successful so far in terms of community seem to be FriendFeed and Nature Network. In particular, some FriendFeed rooms have something of the flavour of good open source communities. Neither is yet really well suited to this kind of open-source approach, although FriendFeed comes close.
Cameron Neylon collected up a partial list of some of the networking sites for scientists:
http://blog.openwetware.org/scienceintheopen/2008/08/01/facebooks-for-scientists-theyre-breeding-like-rabbits/
Jonathan – Yeah, stackoverflow is definitely an interesting model. A lot of pages there seem to have attained a high Google rank quickly (at least, I keep getting them in programming-related Google searches), so they’re doing something right.
Michael, thank you for the pointer to Peter Block’s “Community: The Structure of Belonging”, which I will take a look at.
Somewhat hilariously, all Google searches relating to Block’s work are presently returning the warning “This site may harm your computer.”
Uh-oh … perhaps Google-SkyNet is becoming wary of broadly distributed community-building tools?
Seriously, this major Google bug illustrates the fragility—well known to ecologists—of a search engine monoculture, whose code-base is accessible to only a very tiny fraction of the population.
Viva la PLOS!
To keep this excellent topic from falling silent, perhaps we may inquire as to what Thomas Jefferson had to say about it?
That analysis is mighty impressive for an 18th century farmer … intellects like Jefferson’s raise serious doubt as to the reality of the Flynn Effect!
The above quotation appears on the frontispiece of Steven Johnson’s outstanding new biography of Joseph Priestley, The Invention of Air. Highly recommended. Contrasting Steven Johnson’s ideas with Michael Nielsen’s has helped me to appreciate the existence of two (equally passionate) communities.
On the one hand, we have those persons—generally scientists and pure mathematicians—who build stronger communities with a view toward discovering deeper scientific truths and stronger mathematical theorems.
On the other hand, we have those persons—generally engineers and applied mathematicians, plus quite a few biologists and anthropologists—who perceive scientific truths and mathematical theorems mainly as radically effective tools for community-building.
Steven Johnson’s sympathies (and Priestley’s) are predominantly with the latter camp … it’s not yet clear (to me) where Michael Nielsen’s sympathies lie … possibly in the Obama-like center?
We should all plan to read Michael’s book, to find out!
Tiddlywiki is a remarkable piece of software. But it is not a wiki.
The two defining characteristics, I would contend, of a wiki are:
1) Visitors to a wiki page, published on the web, can edit the content.
2) A version history is maintained, and it is possible to rollback a page to a previous version.
There are variants of TiddlyWiki (notably, ZiddlyWiki, which uses Zope as a backend) which possess these characteristics, and hence, IMHO, can be called wikis. But TiddlyWiki, in its vanilla form (as, say, exemplified by the example you linked to) has neither of these defining characteristics.
In fact, if you go to the TiddlyWiki site, you’ll find TiddlyWiki advertised as “a free reusable, non-linear personal notebook,” not a wiki.
And the site itself? Powered by MediaWiki (an actual piece of Wiki software), not TiddlyWiki.
Jacques,
I used TiddlyWiki for a while, and it did have an option to make things publicly editable. Not all TiddlyWikis use it, of course.
Certainly, a version history and rollback is a great thing to have, and TiddlyWiki doesn’t have it. I’ll be very interested to see if widely-used wikis start to appear that use a less centralized model, like git.
John – The Google bug was affecting all searches. It was apparently the result of a single misplaced “/”. The dangers of a monoculture, indeed!
John – That is, by the way, a very beautiful quote from Jefferson!
How?
Using WebDAV? ZiddlyWiki? Inquiring minds want to know!
You mean using a Git repository as a data-store (instead of flat-files, or a SQLite/MySQL/… database)?
Interesting idea.
I’d worry about performance, but still an interesting idea …
If you’re thinking along those lines, how about putting the database (or flat files) on a WebDAV fileshare?
Jacques,
As I recall, it was as simple as checking an option. I never experimented with it, as I was just using TiddlyWiki on my own computer; I’ve since abandoned it. I did see Garrett Lisi set up a publicly editable TiddlyWiki at the Science in the 21st Century Conference – I made an edit, and can attest that it works.
Re: git – I was thinking more along the lines that git appears to allow a less restrictive versioning model than programs like svn / CVS. It’s not yet something I’ve thought much about; it’s on my mind because I’ve just started to experiment with git.
Just putting a bunch of Tiddlers up on the Web is certainly not sufficient.
A list of the available options is here. That page says, in part:
The only solutions (of the ones listed) that I have played with are the WebDAV solution and (someone else’s installation of) ZiddlyWiki.
The latter was the only one (IMHO) vaguely suitable for multi-user collaboration. But Zope is a bear to set up …
I am intrigued by the idea of the same content being available in multiple places (my laptop, say, and on a publicly accessible website), and being able to merge changes from these multiple sources.
DVCS’s like BZR/Git/HG are tailor-made for that. But there are other operations — desirable in a Wiki application — for which they might be rather slow. (I admit I haven’t really tested this, though.)
If I were collaborating on a book, however, a DVCS would be the way to go!
From http://virtualteatime.blogspot.com/2009/02/introspection.html
“What interests me most about Timothy Gowers’ experiment, is massively collaborative mathematics possible? is that to my knowledge it’s the first example of a collaborative effort that is collectively self-aware. To be a little more concrete, he began by proposing a set of ‘rules of engagement’, and the discussion began by discussing those rules. Every collaborative community has such rules, but typically they aren’t actively discussed.”
Michael, I’d be very interested to hear what you have to say about this assertion of mine.
Nathaniel – I don’t think it’s factually correct. Going back to, e.g., Usenet and many other online communities, it’s been quite common for such communities to have a code of conduct or rules of engagement. Now, these weren’t for the most part collaborative communities, but I think a lot of that spirit has passed over to many collaborative online communities. For example, many wikis are self-aware from quite early on in their lifecycle. One of my favourite examples is the now sadly defunct meatball wiki, which was a wiki about wikis, and which had a very sophisticated self-awareness from very early on. (I’d give you a URL, but I’m having trouble reaching the site at the moment – just Google it, it was still up as of a couple of months ago.)
With that said, I do think this “self-awareness” as you aptly name it is a fascinating phenomenon, and perhaps even necessary for success in a large-scale collaboration. I’ve occasionally thought it’d be fun to write a book about online community-building.
Michael – thanks for the information. I’m planning to find out what I can about online community building (starting with the meatball wiki).
My sense is that a better online index to the available information may be called for. If so, I’ll need to make such an index as an early step in jump-starting the community I’m trying to build, and I think it would be fun to collaborate with you (and others if there is sufficient interest) in the creation of that index.
The meatball wiki was up, but now it’s down again. Does anyone know how to make a more reliable mirror? Would such a mirror be permissible?
Michael, you’re definitely hitting the big time. See this 🙂