Michael Nielsen – Page 17

Bill Thurston on collective progress in mathematics

Apropos the polymath project, a nice quote from Bill Thurston on how progress is made collectively in mathematics (via Cosma and Quomodocumque):

In mathematics, it often happens that a group of mathematicians advances with a certain collection of ideas. There are theorems in the path of these advances that will almost inevitably be proven by one person or another. Sometimes the group of mathematicians can even anticipate what these theorems are likely to be. It is much harder to predict who will actually prove the theorem, although there are usually a few “point people” who are more likely to score. However, they are in a position to prove those theorems because of the collective efforts of the team. The team has a further function, in absorbing and making use of the theorems once they are proven. Even if one person could prove all the theorems in the path single-handedly, they are wasted if nobody else learns them.

There is an interesting phenomenon concerning the “point” people. It regularly happens that someone who was in the middle of a pack proves a theorem that receives wide recognition as being significant. Their status in the community – their pecking order – rises immediately and dramatically. When this happens, they usually become much more productive as a center of ideas and a source of theorems. Why? First, there is a large increase in self-esteem, and an accompanying increase in productivity. Second, when their status increases, people are more in the center of the network of ideas – others take them more seriously. Finally and perhaps most importantly, a mathematical breakthrough usually represents a new way of thinking, and effective ways of thinking can usually be applied in more than one situation.

This phenomenon convinces me that the entire mathematical community would become much more productive if we open our eyes to the real values in what we are doing. Jaffe and Quinn propose a system of recognized roles divided into “speculation” and “proving”. Such a division only perpetuates the myth that our progress is measured in units of standard theorems deduced. This is a bit like the fallacy of the person who makes a printout of the first 10,000 primes. What we are producing is human understanding. We have many different ways to understand and many different processes that contribute to our understanding. We will be more satisfied, more productive and happier if we recognize and focus on this.

Biweekly links for 02/09/2009

RealClimate: On Replication
Creationism Slips Into a Peer-Reviewed Journal | NCSE
- “A strange thing happened in the scientific literature recently. A pair of creationists, who have seemingly legitimate scientific credentials, attempted to publish some creationist assertions in a peer-reviewed journal. Their effort was nearly successful, mostly because they hid their pseudoscience in the middle of the article, surrounded by legitimate scientific discussion of unrelated topics. Luckily, they were caught just in time, and it turned out that they were pretty clumsy. In fact, if they had been just a bit more clever, they might have gotten away with it.”
RealClimate: Antarctic warming is robust
- Fascinating back-and-forth discussion in the comments of the need for reproducible research, and how much disclosure of methods, code, data should be considered full disclosure. You need to skip over a lot of comments (the usual bickering), but it’s worth it.
Inside Google Book Search: 1.5 million books in your pocket
Uncertain Principles: Two Cultures in Beginnings and Endings
- “In the humanities, the whole point of the class is to discuss the books. Nothing useful can be done until and unless the students have had the chance to do the reading. This is why humanities classes tend to let out early on the first day of the term, and have a full class on the last day of the term: the important reading has to be done before class.
  In the sciences, on the other hand, the whole point of class is to give the students enough information to be able to read the textbook and do the problems. The essential step in the learning process is when the students try to apply what they’ve learned to solving problems. This is why science classes tend to have a full class on the first day of the term, and let out early on the last day of the term: the important reading is done after class.”
Twins escape hanging over ID confusion – ABC News
- What DNA testing can’t quite resolve: “A pair of identical twins escaped being convicted and hanged on drugs charges in Malaysia, due to confusion over which one of them was the culprit, reports said Saturday.”

Click here for all of my del.icio.us bookmarks.

Update on the polymath project

A few brief comments on the first iteration of the polymath project, Tim Gowers’ ongoing experiment in collaborative mathematics:

The project is remarkably active, with nearly 300 substantive mathematical comments in just the first week. It shows few signs of slowing down.
It’s perhaps not (yet) a “massively” collaborative project, but many mathematicians are contributing – a quick pass over the comments suggests that so far 14 or so people have made substantive mathematical contributions, and it seems likely that number will rise further. Unsurprisingly, that number already rises considerably if you include people who have made comments on the collaborative process.
Regardless of the outcome of the project, I expect that many beginning research students in mathematics will find this a great resource for understanding what research is about. It’s a way of seeing research mathematicians as they work – trying ideas out, making occasional errors, backtracking, and so on. I suspect many students will find this incredibly enlightening. To pick just one example of why this may be, my experience is that many beginning students assume that the key to research success lies in having great leaps of insight to solve difficult problems. The discussion shows something quite different: you see excellent mathematicians following up every little lead, trying out many different approaches to problems, seeing many, many ideas fail, and gradually aggregating small insights, as a bigger picture only very slowly emerges.
The discussion so far has been courteous and professional in the highest degree. I suspect such courteous and professional behaviour greatly increases the chances of success in such a collaboration. I’m reminded of the famous Hardy-Littlewood rules for collaboration. Tim Gowers’ rules of collaboration have something of the same flavour.
One might say that this courtesy and professionalism is only to be expected, given the many professional mathematicians participating. Unfortunately, it’s not difficult to find excellent blogs run by professional scientists where the comment sections are notably less courteous and professional. I’ll omit examples.
Initially, I wasn’t so sure about the idea of using the linear medium of blog comments to run such a project. It seemed restrictive to use anything less than a multi-threaded forum, if forum software could be found that was geared towards mathematics. (Something like Google Groups would be good, but it doesn’t provide any way to display mathematics, so far as I’m aware.) The linear format has worked much better than I thought it would. Although at times it makes the discussion difficult to follow, the linear format has the benefit of preventing the conversation (and the collaborative community) from fracturing too much. This may be something to think about for future projects.
Many large-scale collaborative projects make it easy for late entrants to make a contribution. For example, in the Kasparov versus the World chess game, new participants could enter late in the game and come up to speed quickly. This was in part because of the nature of chess (only the current board matters, not past positions), but it was also partially because of the public analysis tree maintained for much of the game by Irina Krush. This acted as a key reference point for World Team decisions, and summarized much of the then-current best thinking about the game. In a similar way, many open source projects encourage late entry, with new participants able to jump in after looking at the existing code base (analogous to the state of the chess board), and the project wiki (analogous to the analysis tree). As the polymath project continues, I hope similar points of entry will enable outsiders to follow what is happening, and to contribute, without necessarily having to follow the entire discussion to that point.

Biweekly links for 02/06/2009

Systeme D: ShareAlike considered harmful for geodata
- Describes some problems that arise from using a Creative Commons ShareAlike license for geodata.
What Contracts Can’t Do: The Limits of Private Ordering in Facilitating a Creative Commons by Niva Elkin-Koren
- “Creative Commons is a non-profit U.S. based organization that operates a licensing platform to promote free use of creative works. The idea is to facilitate the release of creative works under generous license terms that would make works available for sharing and reuse. Creative Commons advocates the use of copyrights in a rather subversive way that would ultimately change their meaning.
  The paper expresses a skeptical view of this worthy pursuit. While I share Creative Commons’ concern with copyright fundamentalism, which inevitably leads to the propertization of everything of value, I am more skeptical of its strategy. The paper explores the legal strategy of Creative Commons and analyzes its potential for enhancing the sharing, distribution and (re)use of creative works.”
Quantum Celebration [Tattoo] | The Loom
- Best tattoo ever.
The Crowd-Sourced Reading List | The Loom | Discover Magazine
- Carl Zimmer’s list of great science writing. I’d add Steven Pinker’s “The Language Instinct” to his list of books.
A Clockwork Black: i was trying to avoid this
- Some of the early history of Amazon EC2.
Science in the open » Best practice for data availability – the debate starts…well over there really
- Cameron Neylon summarizes many of the issues around data and licenses.
Bossa
- Developed by the same group that did SETI@Home (Boinc): “Bossa is an open-source software framework for distributed thinking – the use of volunteers on the Internet to perform tasks that use human cognition, knowledge, or intelligence.
  Bossa minimizes the effort of creating and operating a distributed thinking project. It provides a project web site, hosted on your Linux server, where volunteers go to perform tasks and to interact with other volunteers. All you need to supply are PHP scripts to generate, show, and handle tasks.”
Williams Math/Stat blog
- The entire department of mathematics and statistics at Williams College has a blog.
Frank Morgan: blog
- Blog from Frank Morgan, whose book on geometric measure theory I read and enjoyed many years ago.
Education – Change.org: Snark Attack: UCLA Research Dissing Technology Bombs
- Entertaining and thoughtful response to a recent study published in Science: “Is Technology Producing a Decline in Critical Thinking and Analysis?”
E. Kowalski’s blog › Comments on mathematics, mostly.
- Another astonishing mathematical blog.
The Accidental Mathematician
- Blog from Izabella Laba, a mathematician at UBC.
Consensus Protocols: Paxos at Paper Trail
- Useful overview of the Paxos consensus protocol, as used by Google’s Chubby lock system.
Life at Wal-Mart – Boing Boing
- Interesting story of working at Wal-Mart from Charles Platt.
On new modes of mathematical collaboration « What Is Research?
- Points out many of the flaws with online tools as ways of approaching mathematical collaboration.
Questions of procedure « Gowers’s Weblog
- Tim Gowers’ rules for his ongoing experiment in massive collaboration in mathematics, the Polymath Project.
Open Knowledge Foundation Blog » Blog Archive » Open Data Openness and Licensing
- Excellent thoughtful discussion of open data and licensing. Three points where I disagree: (1) the article underrates the problems that may be caused by licensing incompatibilities – witness all the problems this has caused in the open source world, where the commons has fragmented; (2) the article takes for granted that scientists are going to want open licenses – I don’t see that this is necessarily true, certainly if current norms are encoded in the license; and (3) the article implicitly assumes that the license (not the norm) is how enforcement will be handled, yet I think there is little evidence to suggest that this is true in academic science, where norms are far more often the remedy of choice.
How Not to Lose Face on Facebook, for Professors


The polymath project

Tim Gowers’ experiment in massively collaborative mathematics is now underway. He’s dubbed it the “polymath project” – if you want to see posts related to the project, I suggest looking here.

The problem to be attacked can be understood (though probably not solved) with only a little undergraduate mathematics. It concerns a result known as the Density Hales-Jewett theorem. This theorem asks us to consider the set [tex][ 3 ]^n[/tex] of all length [tex]n[/tex] strings over the alphabet [tex]1, 2, 3[/tex]. So, for example, [tex]11321[/tex] is in [tex][3]^5[/tex]. The theorem concerns the existence of combinatorial lines in subsets of [tex][3]^n[/tex]. A combinatorial line is a set of three points in [tex][3]^n[/tex], formed by taking a string with one or more wildcards in it, e.g., [tex]112*1**3\ldots[/tex], and replacing those wildcards by [tex]1[/tex], [tex]2[/tex] and [tex]3[/tex], respectively. In the example I’ve given, the resulting combinatorial line is:

[tex] \{ 11211113\ldots, 11221223\ldots, 11231333\ldots \} [/tex]
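The construction above can be sketched in a few lines of Python. This is only an illustration: the function name `combinatorial_line` and the choice of `*` as the wildcard character are assumptions made for the sketch, and it uses the finite prefix `112*1**3` of the example rather than the open-ended string.

```python
def combinatorial_line(template, alphabet="123", wildcard="*"):
    """Return the points of the combinatorial line defined by `template`.

    Every wildcard is replaced by the same symbol, once for each
    symbol of the alphabet, so a line over {1, 2, 3} has three points.
    """
    if wildcard not in template:
        raise ValueError("a combinatorial line needs at least one wildcard")
    return [template.replace(wildcard, symbol) for symbol in alphabet]


print(combinatorial_line("112*1**3"))
# ['11211113', '11221223', '11231333']
```

Note that all wildcards move together: the positions without a wildcard stay fixed, and the wildcard positions all take the value 1, then all 2, then all 3.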

The Density Hales-Jewett theorem asserts that for any [tex]\delta > 0[/tex], for sufficiently large [tex]n = n(\delta)[/tex], all subsets of [tex][3]^n[/tex] of size at least [tex]\delta 3^n[/tex] contain a combinatorial line.
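To get a feel for the statement, here is a small brute-force sketch in Python that checks whether a given subset of [tex][3]^n[/tex] contains a combinatorial line. The helper names `all_lines` and `contains_line` are hypothetical, and the search enumerates all [tex]4^n - 3^n[/tex] lines, so it is only usable for very small [tex]n[/tex]. It says nothing about how large [tex]n[/tex] must be for a given [tex]\delta[/tex].

```python
from itertools import product


def all_lines(n, alphabet="123"):
    """Yield every combinatorial line in [3]^n as a tuple of three strings."""
    for symbols in product(alphabet + "*", repeat=n):
        if "*" not in symbols:
            continue  # a template needs at least one wildcard
        template = "".join(symbols)
        yield tuple(template.replace("*", s) for s in alphabet)


def contains_line(subset, n):
    """Return True if `subset` (strings of length n) contains a full line."""
    points = set(subset)
    return any(all(p in points for p in line) for line in all_lines(n))


# All of [3]^2 certainly contains a line.
full = {"".join(p) for p in product("123", repeat=2)}
print(contains_line(full, 2))  # True

# Removing the diagonal 11, 22, 33 leaves six points and no line.
off_diagonal = full - {"11", "22", "33"}
print(contains_line(off_diagonal, 2))  # False
```

The second example has density 6/9 and still avoids every line, which illustrates why the theorem only holds once [tex]n[/tex] is large enough relative to [tex]\delta[/tex].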

Apparently, the original proof of the Density Hales-Jewett theorem used ergodic theory; Gowers’ challenge is to find a purely combinatorial proof of the theorem. More background can be found here. Serious discussion of the problem starts here.

Open notebook quantum information

Tobias Osborne has decided to take the plunge, becoming (so far as I know) the first person explicitly taking an open notebook approach to quantum information and related areas. He has three posts up; all three concern quantum analogues to Boolean formulae.

Biweekly links for 02/02/2009

Parallel Scripting with Python
Habitat Chronicles: You can’t tell people anything
- “We all spend a lot of our time talking to bosses or investors or marketing people or press or friends or other developers. I’m totally convinced that a new idea or a new plan or a new technique is never really understood when you just explain it. People will often think they understand, and they’ll say they understand, but then their actions show that it just ain’t so.”
Useful Chemistry: The ChemSpider Journal and ChemMantis
- “The ChemSpider Journal of Chemistry is about to go live. This is not just another chemistry journal. Not only does it boast the option of an open peer-review in addition to Open Access, but it takes us tantalizing closer to the promise of Web3.0: the semantic web. This is achieved by a sophisticated mark-up system generated by ChemMantis. The automatic identification of molecules is impressive enough. But it also marks up functional groups, reactions, spectral data and even biological entities.”
Overcoming Bias: Academic Ideals
- “I suspect most who support and affiliate with academia only care a little about academia’s aspiring to intellectual virtue, and little would change if we had more obvious image-reality contradictions. But I’d like to be wrong. Or are we somehow better off under hypocrisy?”
Cell – New Science, New Features, New Advisors
- “One issue in particular that we at Cell will be focusing on in 2009 is redefining what constitutes a publishable unit in the age of electronic journals and how we can best present the information content of a scientific article online. The vision in our crystal ball is still blurred, but some key elements are beginning to take shape. The scientific article of the future will no longer be tied to the constraints of a printing press and will take advantage of all the opportunities afforded by the web to introduce a hierarchical rather than linear structure, increased graphical representations, and embedded multimedia. Inherent in our thinking about the scientific article of the future is the need to address the current unchecked growth in the amount of supplemental and supporting material and to identify constructive, well-defined guidelines for what is reasonably and appropriately included in a unit of scientific advance.”
Big data: shoot first, ask questions later « What You’re Doing Is Rather Desperate
- “We used to ask questions, then generate the data. Now we generate the data, then think of the questions.”


Is massively collaborative mathematics possible?

This is the title of a thought-provoking essay by Tim Gowers, which seems to have been stimulated in part by my recent essay on doing science online. What follows are some excerpts from Gowers’ essay, and some thoughts by me:

Of course, one might say, there are certain kinds of problems that lend themselves to huge collaborations. One has only to think of the proof of the classification of finite simple groups, or of a rather different kind of example such as a search for a new largest prime carried out during the downtime of thousands of PCs around the world. But my question is a different one. What about the solving of a problem that does not naturally split up into a vast number of subtasks? Are such problems best tackled by [tex]n[/tex] people for some [tex]n[/tex] that belongs to the set [tex]\{1,2,3\}[/tex]? (Examples of famous papers with four authors do not count as an interesting answer to this question.)

It seems to me that, at least in theory, a different model could work: different, that is, from the usual model of people working in isolation or collaborating with one or two others. Suppose one had a forum (in the non-technical sense, but quite possibly in the technical sense as well) for the online discussion of a particular problem. The idea would be that anybody who had anything whatsoever to say about the problem could chip in. And the ethos of the forum – in whatever form it took – would be that comments would mostly be kept short. In other words, what you would not tend to do, at least if you wanted to keep within the spirit of things, is spend a month thinking hard about the problem and then come back and write ten pages about it. Rather, you would contribute ideas even if they were undeveloped and/or likely to be wrong.

A similar approach is used in the open-source software community – essentially, a dynamic division of labour that is not planned entirely in advance, but rather arises in response to the exigencies of the problem at hand. This dynamic division of labour is typically co-ordinated through one or more online forums. Examples close to this spirit, and also somewhat close in spirit to modern mathematics, include Kasparov versus the World and the Matlab programming competition.

On the subject of the desirable size of contributions, in open source the most frequent contributions change just a single line of code. The second most frequent contributions change two lines of code, and so on. One study suggests the number of contributions [tex]n[/tex] scales as [tex]n(l) \propto l^{-1.13}[/tex], where [tex]l[/tex] is the number of lines of code changed or added (“committed”) in a single contribution.

It’s notable that with this distribution the total line count is still dominated by the larger contributions. Despite this, my guess is that the smaller contributions are still very significant for maintaining momentum and morale, which are so important in creative projects. In this regard, it’s a little like a good creative conversation – not all contributions to the conversation need to be world-shaking, some are simply needed to keep the conversation moving.
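A quick way to see why the larger contributions dominate the line count: if [tex]n(l) \propto l^{-1.13}[/tex], then the lines contributed by commits of size [tex]l[/tex] scale as [tex]l \cdot n(l) \propto l^{-0.13}[/tex], which falls off very slowly. The Python sketch below makes this concrete under two assumptions that are not in the study quoted above: the power law is taken to hold exactly for every size, and commit sizes are capped at a hypothetical maximum `max_size`.

```python
def count_share(max_size, cutoff, exponent=1.13):
    """Fraction of all commits that change at most `cutoff` lines."""
    weights = [l ** -exponent for l in range(1, max_size + 1)]
    return sum(weights[:cutoff]) / sum(weights)


def line_share(max_size, cutoff, exponent=1.13):
    """Fraction of all changed lines that come from commits larger than `cutoff`."""
    # A commit of size l contributes l lines, so weight l * l**-exponent.
    weights = [l * l ** -exponent for l in range(1, max_size + 1)]
    return sum(weights[cutoff:]) / sum(weights)


# Hypothetical cap of 1000 lines per commit, small commits = 10 lines or fewer.
print(f"commits of <= 10 lines: {count_share(1000, 10):.0%} of all commits")
print(f"lines from larger commits: {line_share(1000, 10):.0%} of all lines")
```

With these assumed parameters, commits of ten lines or fewer make up a large fraction of all commits, yet nearly all of the changed lines come from the larger ones. The exact percentages depend on the cap chosen.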

This suggestion raises several questions immediately. First of all, what would be the advantage of proceeding in this way? My answer is that I don’t know for sure that there would be an advantage. However, I can see the following potential advantages.

(i) Sometimes luck is needed to have the idea that solves a problem. If lots of people think about a problem, then just on probabilistic grounds there is more chance that one of them will have that bit of luck.

(ii) Furthermore, we don’t have to confine ourselves to a purely probabilistic argument: different people know different things, so the knowledge that a large group can bring to bear on a problem is significantly greater than the knowledge that one or two individuals will have. This is not just knowledge of different areas of mathematics, but also the rather harder to describe knowledge of particular little tricks that work well for certain types of subproblem, or the kind of expertise that might enable someone to say, “That idea that you thought was a bit speculative is rather similar to a technique used to solve such-and-such a problem, so it might well have a chance of working,” or “The lemma you suggested trying to prove is known to be false,” and so on – the type of thing that one can take weeks or months to discover if one is working on one’s own.

I think of this as the “annoying little conjecture” problem: many conjectures that arise in the course of research are essentially routine to prove or disprove, but it can take days or weeks to determine which it’s going to be. If you talk to just the right person, they can often cut that down to minutes or hours. Ordinarily, though, finding that right person is just as laborious as (and may be less enlightening than) solving the problem yourself. Having a mechanism to find the right person, even if it’s essentially just broadcast search, would be enormously beneficial.

The next obvious question is this. Why would anyone agree to share their ideas? Surely we work on problems in order to be able to publish solutions and get credit for them. And what if the big collaboration resulted in a very good idea? Isn’t there a danger that somebody would manage to use the idea to solve the problem and rush to (individual) publication?

Here is where the beauty of blogs, wikis, forums etc. comes in: they are completely public, as is their entire history. To see what effect this might have, imagine that a problem was being solved via comments on a blog post. Suppose that the blog was pretty active and that the post was getting several interesting comments. And suppose that you had an idea that you thought might be a good one. Instead of the usual reaction of being afraid to share it in case someone else beat you to the solution, you would be afraid not to share it in case someone beat you to that particular idea. And if the problem eventually got solved, and published under some pseudonym like Polymath, say, with a footnote linking to the blog and explaining how the problem had been solved, then anybody could go to the blog and look at all the comments. And there they would find your idea and would know precisely what you had contributed. There might be arguments about which ideas had proved to be most important to the solution, but at least all the evidence would be there for everybody to look at.

The open source world demonstrates this in action. You can see every single contribution a person has made to a project – the code, the conversations in online forums, and so on. There are even beautiful visualizations that let you see different people’s contributions to a project. As a result, it’s very difficult to fool people about the extent of your contributions. I’m sure people are sometimes dishonest about this, but I’ll bet they’re a lot more honest than some scientists are about what they contributed to some papers.

True, it might be quite hard to say on your CV, “I had an idea that proved essential to Polymath’s solution of the *** problem,” but if you made significant contributions to several collaborative projects of this kind, then you might well start to earn a reputation amongst people who read mathematical blogs, and that is likely to count for something. (Even if it doesn’t count for all that much now, it is likely to become increasingly important.) And it might not be as hard as all that to put it on your CV: you could think of yourself as a joint author, with the added advantage that people could find out exactly what you had contributed.

And what about the person who tries to cut and run when the project is 85 [percent] finished? Well, it might happen, but everyone would know that they had done it. The referee of the paper would, one hopes, say, “Erm, should you not credit Polymath for your crucial Lemma 13?” And that would be rather an embarrassing thing to have to do.

Now I don’t believe that this approach to problem solving is likely to be good for everything. For example, it seems highly unlikely that one could persuade lots of people to share good ideas about the Riemann hypothesis.

At present, this is undoubtedly true. However, if this sort of approach takes off and comes to be seen as a legitimate and orthodox way of making a contribution to mathematics, the kind of thing valued by (for example) hiring committees, then I think it may eventually be possible to do this for some of the more famous problems. There’s no intrinsic difference between sharing your ideas in a paper, or in a blog comment: a good idea is a good idea. The difference at present is mainly social: one is seen as legitimate, while the other is questionable.

At the other end of the scale, it seems unlikely that anybody would bother to contribute to the solution of a very minor and specialized problem. Nevertheless, I think there is a middle ground that might well be worth exploring, so as an experiment I am going to suggest a problem and see what happens.

I think it is important to do more than just say what the problem is. In order to try to get something started, I shall describe a very preliminary idea I once had for solving a problem that interests me (and several other people) greatly, but that isn’t the holy grail of my area. Like many mathematical ideas, mine runs up against a brick wall fairly quickly. However, like many brick walls, this one doesn’t quite prove that the approach is completely hopeless – just that it definitely needs a new idea.

It may be that somebody will almost instantly be able to persuade me that the idea is completely hopeless. But that would be great – I could stop thinking about it. And if that happens I’ll dig out another idea for a different problem and try that instead.

I’ve been toying for quite some time with doing something similar, though with problems from theoretical physics. I’ll be utterly fascinated to see the result of this experiment, and will certainly follow along. Not sure I’ll have much of mathematical interest to contribute, though – combinatorics is a long way from my expertise.

It’s probably best to keep this post separate from the actual mathematics, so that comments about collaborative problem-solving in general don’t get mixed up with mathematical thoughts about the particular problem I have in mind. So I’ll describe the project in my next post. Actually, make that my next post but one. The next post will say what the problem is and give enough background information about it to make it possible for anybody with a modest knowledge of combinatorics (or more than a modest knowledge) to think about it and understand my preliminary idea. The following post will explain what that preliminary idea is, and where it runs into difficulties. Then it will be over to you, or rather over to us. I’ve already written the background-information post, but will hold it back for a few days in case the responses to this post affect how I decide to do things.

The blog medium is almost certainly not optimal for this purpose, so if a serious discussion starts with lots of worthwhile contributions, then I’ll look into the possibility of migrating it over to some purpose-built site. If anyone has any suggestions for this (apart from the obvious one of using the Tricki – I’m not sure that’s appropriate just yet though) then I’d be delighted to receive them. My feelings at the moment are that blogs are too linear – it would be quite hard to see which comments relate to which, which ones are most worth reading, and so on. A wiki, on the other hand, seems not to be linear enough – it would be quite hard to see what order the comments come in. So my guess is that the ideal forum would probably be a forum: if someone knows an easy way to set up a mathematical forum, I might even do that. But if the discussion is on this blog, then I might from time to time try to assess where it has got to and create new posts if I feel that genuine progress has been made that can be summarized and then built on.

I’ve been thinking of doing this for a long time. The reason I’ve suddenly decided to go ahead is that I followed a couple of links from this post on Michael Nielsen’s blog, and discovered that, unsurprisingly, others have had similar ideas, and some people are already doing research in public. But the idea still seems pretty new, particularly when applied to one single mathematics problem, so I wanted to try it out when it was still fresh. (I would distinguish what I am proposing from what goes on at the n-category café, which is an excellent example of collaborative mathematics, but focused on an entire research programme rather than just one problem.)

To finish, here is a set of ground rules that I hope it will be possible to abide by. At this stage I’m just guessing what will work, so these rules are subject to change. If you can see obvious flaws let me know.

1. The aim will be to produce a proof in a top-down manner. Thus, at least to start with, comments should be short and not too technical: they would be more like feasibility studies of various ideas.

2. Comments should be as easy to understand as is humanly possible. For a truly collaborative project it is not enough to have a good idea: you have to express it in such a way that others can build on it.

Points 3-5 all concern norms of behaviour, and the problem of maintaining a civil tone:

3. When you do research, you are more likely to succeed if you try out lots of stupid ideas. Similarly, stupid comments are welcome here. (In the sense in which I am using “stupid”, it means something completely different from “unintelligent”. It just means not fully thought through.)

4. If you can see why somebody else’s comment is stupid, point it out in a polite way. And if someone points out that your comment is stupid, do not take offence: better to have had five stupid ideas than no ideas at all. And if somebody wrongly points out that your idea is stupid, it is even more important not to take offence: just explain gently why their dismissal of your idea is itself stupid.

5. Don’t actually use the word “stupid”, except perhaps of yourself.

Clay Shirky has pointed out that this problem – the problem of maintaining healthy conduct in online communities – has been around for decades, yet because no-one has synthesized all that is known, the same mistakes keep being made over and over (and over and over) again. The closest thing I know is a short blog post(!) from Teresa Nielsen Hayden, which is nice, but hardly comprehensive. Two suggestions:

Don’t allow anonymous posting. Forums which do seem inevitably to degenerate. At the least, people should use a consistent handle, and ideally they should be strongly encouraged to use their real name.
People who want the forum to thrive need to take ownership of social problems. If someone is behaving inappropriately, they should step up to the plate, and gently (at first) suggest alternate conduct. If someone’s behaving like an ass at a dinner party, you don’t leave it all on the host’s shoulders; you try to help out yourself, in whatever ways seem appropriate.

6. The ideal outcome would be a solution of the problem with no single individual having to think all that hard. The hard thought would be done by a sort of super-mathematician whose brain is distributed amongst bits of the brains of lots of interlinked people. So try to resist the temptation to go away and think about something and come back with carefully polished thoughts: just give quick reactions to what you read and hope that the conversation will develop in good directions.

At a talk last year by Mike Beltzner (who manages the development of the front-end for Firefox), he made a case that open-source projects where people went away and coded a lot on their own, only occasionally coming back to add big polished chunks, almost invariably failed.

7. If you are convinced that you could answer a question, but it would just need a couple of weeks to go away and try a few things out, then still resist the temptation to do that. Instead, explain briefly, but as precisely as you can, why you think it is feasible to answer the question and see if the collective approach gets to the answer more quickly. (The hope is that every big idea can be broken down into a sequence of small ideas. The job of any individual collaborator is to have these small ideas until the big idea becomes obvious – and therefore just a small addition to what has gone before.) Only go off on your own if there is a general consensus that that is what you should do.

8. Similarly, suppose that somebody has an imprecise idea and you think that you can write out a fully precise version. This could be extremely valuable to the project, but don’t rush ahead and do it. First, announce in a comment what you think you can do. If the responses to your comment suggest that others would welcome a fully detailed proof of some substatement, then write a further comment with a fully motivated explanation of what it is you can prove, and give a link to a pdf file that contains the proof.

9. Actual technical work, as described in 8, will mainly be of use if it can be treated as a module. That is, one would ideally like the result to be a short statement that others can use without understanding its proof.

If the project thrives, a wiki may be a good place to keep reference materials like this. It seems to be a pretty common pattern for big online collaborations to use a discussion forum to manage the basic conversation, and a wiki for reference materials. Initially, a wiki might not be necessary, and probably shouldn’t be added until there is real demand.

Two pieces of wiki software that seem pretty good for mathematical use are Instiki and TiddlyWiki. Instiki is very well suited for mathematical use; TiddlyWiki wasn’t so much designed for that purpose, but as you can see here seems to work pretty well in practice.

10. Keep the discussion focused. For instance, if the project concerns a particular approach to a particular problem (as it will do at first), and it causes you to think of a completely different approach to that problem, or of a possible way of solving a different problem, then by all means mention this, but don’t disappear down a different track.

11. However, if the different track seems to be particularly fruitful, then it would perhaps be OK to suggest it, and if there is widespread agreement that it would in fact be a good idea to abandon the original project (possibly temporarily) and pursue a new one – a kind of decision that individual mathematicians make all the time – then that is permissible.

I’m not sure what I think about this. It seems rather constraining – why not do some preliminary exploration of the alternate track, if it seems promising? I agree that it would be problematic if it distracted other people too much, but that seems like a problem that could be dealt with, probably in real time, if it comes up.

12. Suppose the experiment actually results in something publishable. Even if only a very small number of people contribute the lion’s share of the ideas, the paper will still be submitted under a collective pseudonym with a link to the entire online discussion.

A couple of final comments.

First, in many ways this (like most open source projects) seems to be primarily a community-building project. If you look at a successful open source-style project – Wikipedia, Linux, the Matlab competition, Kasparov versus the World – at the centre there is always a person who spends a great deal of time simply building and maintaining a healthy community of contributors. I can’t imagine this will be any different.

Second, the systems used need to be easily integrable into people’s workflow. I like the idea of starting on the blog, simply because many people are already in the habit of checking blogs. Migrations to other platforms will need to be handled carefully, to ensure that everyone does start using the new platform successfully. Providing things like RSS feeds or email update services might help greatly with this.

Biweekly links for 01/30/2009

Datawocky: More data usually beats better algorithms
The Dominance of Small Code Contributions
- Links to a study of a big open-source corpus, showing that small code contributions dominate by number (though not by total volume).
Academic Earth – Video lectures from the world’s top scholars
- “Thousands of video lectures from the world’s top scholars.” – Very interesting. Found a few problems, but this has potential.
BBC NEWS | Calls for open source government
- The new White House has asked Scott McNealy (Sun) to prepare a paper on open source.
Ruby on Rails on Vimeo
- A beautiful and informative visualization of Ruby on Rails commit history. Make sure to watch it in HD, in full-screen mode. After you’ve watched it for a bit, it’s worth skipping forward to 4:45 and watching the unbelievable explosion of activity that takes place when they moved to GitHub.
Open Access, Open Data. Open Research?
- Great summary talk about open science, from Cameron Neylon.
Is massively collaborative mathematics possible? « Gowers’s Weblog
- A fascinating post from Tim Gowers, with a plan for some action.
Winning the Gnu
- Microsoftie Joey deVilla buys a gnu from Richard Stallman. No animals were harmed in the making of this presentation…
Dive into Python 3
- New version of a classic introduction to Python, by Mark Pilgrim, adapted for Python 3. Just the table of contents at present, with the content to be gradually filled in.

Click here for all of my del.icio.us bookmarks.

Connecting scientists to scientists

I’ve been struggling for some time with a writing problem. This is the problem of finding a really sharp way of conveying one of the most powerful ideas of open science: all the untapped creative potential existing in latent connections between scientists, and which could be released using suitable tools to activate the most valuable of those latent connections. I’ve discussed this idea in previous essays, but something was always lacking. In this post I take another shot at it, this time confronting the problem head on.

A fact of any scientist’s life is that you carry a lot of unsolved problems around in your head. Some of those problems are big (“find a quantum theory of gravity”), some of them are small (“where’d that damned minus sign disappear in my calculation?”), but all are grist for future progress. Mostly, it’s up to you to solve those problems yourself. If you’re lucky, you might also have a few supportive colleagues who can sometimes help you out.

Very occasionally, though, you’ll solve a problem in a completely different way. You’ll be chatting with a new acquaintance, when one of your problems (or something related) comes up. You’re chatting away when all of a sudden, BANG, you realize that this is just the right person to be talking to. Maybe they can just outright solve your problem. Or maybe they give you some crucial insight that provides the momentum needed to vanquish the problem.

Every working scientist recognizes this type of serendipitous interaction. The problem is that such interactions occur too rarely.

A few years ago, I started participating in various open source forums. Over time, I noticed something surprising going on in the healthiest of those forums. When people had a problem that was bugging them, rather than keeping silent about it, they’d post a description of the problem to the forum. Often, I’d look at their question and think to myself “yeah, I can see why they posted, that looks like a tough problem.” Then, forty minutes later, someone would come in and say “Oh, that’s easy, you just do X, Y, and Z”. Very often, X, Y and Z were quite ingenious, or at the least relied on knowledge that neither I nor the original questioner possessed. The original problem had been trivial all along.

What’s going on is similar to the fortuitous scientific exchange. A problem that’s difficult or impossible for most people can be trivial or routine to just the right person. But what was interesting and surprising about the open source forums was this: it seemed to be happening all the time. People who I’d never heard of would pop up, ask an interesting question, then someone else I’d never heard of would pop up, and provide an insightful answer. It didn’t happen every time, but it was happening over and over again.

A big “ahah!” moment for me occurred when I understood what was going on. By scaling up the creative conversation, those open source projects were providing a systematic mechanism that enabled people to find other people with just the right expertise to make their problem easy. Most of us spend much of our time stymied by problems that would be routine, if only we could find the right person to help us. As recently as 20 years ago, finding that right person was likely to be difficult. But what open source forums show is that it is possible to scale up conversation in this way, and significantly increase the likelihood of such serendipitous interaction.

Needless to say, scientists mostly don’t work this way. Many skeptics of open science say they never could, that scientists will forever be unwilling to share their problems and ideas in the way necessary to make this work. For the present post, it’s fine if you hold that position, for my purpose here isn’t to discuss the practicality of doing this. That’s a post for another day.

The question I’m concerned with is, instead, what is lost because we don’t do this? How much do we lose because so many scientists waste their time struggling with problems that some other scientist would find entirely routine?

I don’t know how to answer these questions quantitatively. What I do know is that as a practicing scientist, much of my time was spent working on problems that were hard for me, yet which I absolutely knew would be routine for someone else. The time I spent working on such problems was time lost to the whole scientific enterprise. Yet the tools and culture of science were such that I couldn’t easily outsource those problems to a person with a comparative advantage over me. When I talk about topics like restructuring expert attention, collaboration markets and open source research, this is what I’m talking about: tools and norms which allow us to trade in expert attention, and so to concentrate in areas where we have a comparative advantage.

Now, there are many caveats to this story. Most open source projects fail. Many problems – including many of the “big problems” of science – are intrinsically non-routine, and it may be extremely difficult to identify who (if anyone) has a comparative advantage in solving such problems. Furthermore, even for routine problems, there may be considerable intrinsic transaction costs associated with trade in expert attention – finding a common language, coming to a common understanding of the problem, and so on. The market for a problem may be thin (“find the screwdriver yourself!”) – for example, many of the problems facing benchtop experimentalists are problems exclusive to their own laboratories. Finally, finding ways to successfully scale up scientific conversation is not at all trivial. These are all important caveats, deserving extended discussion in their own right. Despite this, I believe the key idea – developing tools to aggregate information about comparative advantage, and to connect people who might benefit from a trade in attention – is worth taking seriously.

I started this post off with a discussion of the difficulty of describing what I believe is a latent potential for discovery within the scientific community. As I finish the post off, I must say that the post falls short of the strength and sharpness I’d like. What’s really needed is a detailed example that shows the mechanics of open source in action: how the dynamic division of labour actually works in a successful open source project. At present, so far as I’m aware there are no really successful examples within science; the culture of science remains too closed. There are, however, some extremely encouraging nascent examples, like open notebook science, and open source biology, and one day hopefully these and others will bloom.

Further reading: