What should a reasonable person believe about the Singularity?

In 1993, the science fiction author Vernor Vinge wrote a short essay proposing what he called the Technological Singularity. Here’s the sequence of events Vinge outlines:

A: We will build computers of at least human intelligence at some time in the future, let’s say within 100 years.

B: Those computers will be able to rapidly and repeatedly increase their own intelligence, quickly resulting in computers that are far more intelligent than human beings.

C: This will cause an enormous transformation of the world, so much so that it will become utterly unrecognizable, a phase Vinge terms the “post-human era”. This event is the Singularity.

The basic idea is quite well known. Perhaps because the conclusion is so remarkable, almost outrageous, it’s an idea that evokes a strong emotional response in many people. I’ve had intelligent people tell me with utter certainty that the Singularity is complete tosh. I’ve had other intelligent people tell me with similar certainty that it should be one of the central concerns of humanity.

I think it’s possible to say something interesting about what range of views a reasonable person might have on the likelihood of the Singularity. To be definite, let me stipulate that it should occur in the not-too-distant future – let’s say within 100 years, as above. What we’ll do is figure out what probability someone might reasonably assign to the Singularity happening. To do this, observe that the probability [tex]p(C)[/tex] of the Singularity can be related to several other probabilities:

[tex] (*) \,\,\,\, p(C) = p(C|B) p(B|A) p(A). [/tex]

In this equation, [tex]p(A)[/tex] is the probability of event [tex]A[/tex], human-level artificial intelligence within 100 years. The probabilities denoted [tex]p(X|Y)[/tex] are conditional probabilities for event [tex]X[/tex] given event [tex]Y[/tex]. The truth of the equation is likely evident – it’s a simple exercise in applying conditional probability, together with the observation that event [tex]C[/tex] can only happen if [tex]B[/tex] happens, and event [tex]B[/tex] can only happen if [tex]A[/tex] happens.
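Spelled out, the nesting of the events means [tex]C \subseteq B \subseteq A[/tex], so that

[tex] p(C) = p(C \cap B \cap A) = p(C|B \cap A)\, p(B|A)\, p(A) = p(C|B)\, p(B|A)\, p(A), [/tex]

where the last equality uses [tex]B \cap A = B[/tex].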

I’m not going to argue for specific values for these probabilities. Instead, I’ll argue for ranges of probabilities that I believe a person might reasonably assert for each probability on the right-hand side. I’ll consider both a hypothetical skeptic, who is pessimistic about the possibility of the Singularity, and also a hypothetical enthusiast for the Singularity. In both cases I’ll assume the person is reasonable, i.e., a person who is willing to acknowledge limits to our present-day understanding of the human brain and computer intelligence, and who is therefore not overconfident in their own predictions. By combining these ranges, we’ll get a range of probabilities that a reasonable person might assert for the probability of the Singularity.

Now, before I get into estimating ranges, it’s worth keeping in mind a psychological effect that has been confirmed over many decades: the overconfidence bias. When asked to estimate the probability of their opinions being correct, people routinely overestimate the probability. For example, in a 1960 experiment subjects were asked to estimate the probability that they could correctly spell a word. Even when people said they were 100 percent certain they could correctly spell a word, they got it right only 80 percent of the time! Similar effects have been reported for many different problems and in different situations. It is, frankly, a sobering literature to read.

This is important for us, because when it comes to both artificial intelligence and how the brain works, even the world’s leading experts don’t yet have a good understanding of how things work. Any reasonable probability estimates should factor in this lack of understanding. Someone who asserts a very high or very low probability of some event happening is implicitly asserting that they understand quite a bit about why that event will or will not happen. If they don’t have a strong understanding of the event in question, then chances are that they’re simply expressing overconfidence.

Okay, with those warnings out of the way, let’s start by thinking about [tex]p(A)[/tex]. I believe a reasonable person would choose a value for [tex]p(A)[/tex] somewhere between [tex]0.1[/tex] and [tex]0.9[/tex]. I can, for example, imagine an artificial intelligence skeptic estimating [tex]p(A) = 0.2[/tex]. But I’d have a hard time taking seriously someone who estimated [tex]p(A) = 0.01[/tex]. It seems to me that estimating [tex]p(A) = 0.01[/tex] would require some deep insight into how human thought works, and how those workings compare to modern computers, the sort of insight I simply don’t think anyone yet has. In short, it seems to me that it would indicate a serious overconfidence in one’s own understanding of the problem.

Now, it should be said that there have, of course, been a variety of arguments made against artificial intelligence. But I believe that most of the proponents of those arguments would admit that there are steps in the argument where they are not sure they are correct, but merely believe or suspect they are correct. For instance, Roger Penrose has speculated that intelligence and consciousness may require effects from quantum mechanics or quantum gravity. But I believe Penrose would admit that his conclusions rely on reasoning that even the most sympathetic would regard as quite speculative. Similar remarks apply to the other arguments I know, both for and against artificial intelligence.

What about an upper bound on [tex]p(A)[/tex]? Well, for much the same reason as in the case of the lower bound, I’d have a hard time taking seriously someone who estimated [tex]p(A) = 0.99[/tex]. Again, that would seem to me to indicate an overconfidence that there would be no bottlenecks along the road to artificial intelligence. Sure, maybe it will only require a straightforward continuation of the road we’re currently on. But, maybe some extraordinarily hard-to-engineer but as yet unknown physical effect is involved in creating artificial intelligence? I don’t think that’s likely – but, again, we don’t yet know all that much about how the brain works. Indeed, to pursue a different tack, it’s difficult to argue that there isn’t at least a few percent chance that our civilization will suffer a major regression over the next one hundred years. After all, historically nearly all civilizations have lasted no more than a few centuries.

What about [tex]p(B|A)[/tex]? Here, again, I think a reasonable person would choose a probability between [tex]0.1[/tex] and [tex]0.9[/tex]. A probability much above [tex]0.9[/tex] discounts the idea that there’s some bottleneck we don’t yet understand that makes it very hard to bootstrap as in step [tex]B[/tex]. And a probability much below [tex]0.1[/tex] again seems like overconfidence: to hold such an opinion would, in my opinion, require some deep insight into why the bootstrapping is impossible.

What about [tex]p(C|B)[/tex]? Here, I’d go for tighter bounds: I think a reasonable person would choose a probability between [tex]0.2[/tex] and [tex]0.9[/tex].

If we put all those ranges together, we get a “reasonable” probability for the Singularity somewhere in the range of 0.2 percent – one in 500 – up to just over 70 percent. I regard both those as extreme positions, indicating a very strong commitment to the positions espoused. For more moderate probability ranges, I’d use (say) [tex]0.2 < p(A) < 0.8[/tex], [tex]0.2 < p(B|A) < 0.8[/tex], and [tex]0.3 < p(C|B) < 0.8[/tex]. So I believe a moderate person would estimate a probability roughly in the range of 1 to 50 percent.

These are interesting probability ranges. In particular, the 0.2 percent lower bound is striking. At that level, it’s true that the Singularity is pretty darned unlikely. But it’s still edging into the realm of a serious possibility. And to get this kind of probability estimate requires a person to hold quite an extreme set of positions, a range of positions that, in my opinion, while reasonable, requires considerable effort to defend. A less extreme person would end up with a probability estimate of a few percent or more. Given the remarkable nature of the Singularity, that’s quite high.

In my opinion, the main reason the Singularity has attracted some people’s scorn and derision is superficial: it seems at first glance like an outlandish, science-fictional proposition. The end of the human era! It’s hard to imagine, and easy to laugh at. But any thoughtful analysis either requires one to consider the Singularity as a serious possibility, or demands a deep and carefully argued insight into why it won’t happen.
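For readers who like to check the arithmetic, here is a minimal sketch of the range calculations above (Python, purely illustrative; the endpoints are exactly those discussed in the text):

```python
# Combine the endpoints of each factor in p(C) = p(C|B) p(B|A) p(A)
# to get the overall "extreme" and "moderate" ranges.
ranges = {
    "extreme":  [(0.1, 0.9), (0.1, 0.9), (0.2, 0.9)],   # p(A), p(B|A), p(C|B)
    "moderate": [(0.2, 0.8), (0.2, 0.8), (0.3, 0.8)],
}
for label, factors in ranges.items():
    lo, hi = 1.0, 1.0
    for a, b in factors:
        lo *= a
        hi *= b
    print(f"{label}: {100 * lo:.1f}% to {100 * hi:.0f}%")
# extreme: 0.2% to 73%
# moderate: 1.2% to 51%
```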

10th Anniversary Edition of “Quantum Computation and Quantum Information”

I’m pleased to say that a 10th anniversary edition of my book with Ike Chuang on quantum computing has just been released by Cambridge University Press (Amazon link).

Apart from expressing some authorial pleasure, the point of this post is to let people who already have a copy of the book know that the book hasn’t been substantially revised. Please don’t buy another copy under the impression that it’s an all-new edition. If you actually see a copy in “real life”, the publisher has gone to some effort to make it clear that the changes to the book are largely cosmetic – there’s a new foreword and afterword, several people kindly contributed new endorsements, and the cover has changed a bit. But I’d hate to think that someone who already owns a copy and who orders their books online will buy a copy under the impression that the book has changed a lot. It hasn’t.

Why release a 10th Anniversary edition? The suggestion came from the publisher. So far as I understand publishing (which is not well), CUP was keen because it lets them make a renewed push on the book with booksellers. Keep in mind that the publisher’s primary customer is the booksellers, not the reader, and it’s to buyers at the booksellers that they are actually selling. I don’t understand the dynamics of sales to booksellers, but it seems that releasing an edition like this, with new marketing materials and endorsements, does result in an uptick in sales. Rather pleasantly, it also drops the recommended retail price substantially (from US $100 to US $75), so if you’ve been put off by the price, now is a good time to buy. (Admittedly, the way Amazon discounts books, it ends up not making that much difference if you buy from Amazon.)

How hard is space travel, in principle?

Our Earth-based intuitions about the difficulty of travelling long distances break down badly in space. It’s tempting to think that just because Saturn, say, is nearly 4,000 times further away than the moon, it must be 4,000 times more difficult to reach. After all, on Earth it takes about 10 times as much work to go 10 kilometers as it does to go 1 kilometer. But in space, where there is no friction, this intuition is entirely wrong. In fact, in this essay I’ll show that with some important caveats the situation is far more favourable, and it doesn’t take all that much more energy to get to the outer planets than it does to get to low-Earth orbit.

The numbers are striking. We’ll see that it takes about 10 times as much energy to get to the moon as it does to get to low-Earth orbit. About 4 times more energy than that will get you to Mars, despite the fact that Mars is more than 200 times further away than the moon. Tripling that energy gets you all the way out to Saturn! So in some sense, the gap between getting to the moon and getting to Saturn isn’t really all that much different from the difference between getting into low-Earth orbit and getting to the moon.

The calculations required to obtain these results are all simple calculations in Newtonian gravitation, and they’re done in the next section. If you want to skip the details of the calculations, you should move to the final section, where I discuss some important caveats. In particular, we’ll see that just because getting to Saturn requires only 12 times as much energy as getting to the moon doesn’t mean that we only need to build rockets twelve times as big. The situation is more complex than that, and depends on the details of the propulsion technologies used.

Calculations

To see why the numbers I quoted above are true, we need to figure out how much energy is required to get into low-Earth orbit. I will assume that the only obstacle to getting there is overcoming the Earth’s gravitation. In actual fact, of course, we need to overcome many other forces as well, most notably atmospheric friction, which adds considerably to the energy cost. I’ll ignore those extra forces – because the cost of getting to low-Earth orbit is the yardstick I’m using to compare with other possibilities this will mean that my analysis is actually quite pessimistic.

Suppose we have a mass [tex]m[/tex] that we want to send into space. Let’s use [tex]m_E[/tex] to denote the mass of the Earth. We suppose the radius of the Earth is [tex]r_E[/tex] and that we want to send our mass to a radius [tex]d r_E[/tex] for some [tex]d > 1[/tex]. We’ll call [tex]d[/tex] the distance parameter – it’s the distance from the center of the Earth that we want to send our mass to, measured in units of the Earth’s radius. For instance, if we want to send the mass to a height of [tex]600[/tex] kilometers above the Earth’s surface, then [tex]d = 1.1[/tex], since the Earth’s radius is about [tex]6,000[/tex] kilometers. The energy cost to do this is [tex]Gm_Em/r_E - Gm_Em/(d r_E)[/tex], where [tex]G[/tex] is Newton’s gravitational constant. We can rewrite this energy cost as:

[tex]e_d = \frac{Gm_Em}{r_E} (1-1/d)[/tex]

We can use this formula to analyse both the energy cost to send an object to low-Earth orbit and also to the moon. (We can do this because in going to the moon the main barrier to overcome is also the Earth’s gravitation). Suppose [tex]d_l[/tex] is the distance parameter for low-Earth orbit, let’s say [tex]d_l = 1.1[/tex], as above, and [tex]d_m[/tex] is the distance parameter for the moon, [tex]d_m = 60[/tex]. Then the ratio of the energy cost to go to the moon as opposed to the energy cost for low-Earth orbit is:

[tex]\frac{e_{d_m}}{e_{d_l}} = \frac{1-1/d_m}{1-1/d_l} \approx 10[/tex].

That is, it takes [tex]10[/tex] times as much energy to get to the moon as to get to low-Earth orbit.
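As a quick numerical check, here’s a short sketch using the same rough distance parameters as above:

```python
# Energy to reach the moon relative to low-Earth orbit, using the rough
# distance parameters from the text: d_l = 1.1 (low-Earth orbit), d_m = 60 (moon).
d_l, d_m = 1.1, 60
print((1 - 1 / d_m) / (1 - 1 / d_l))  # ~10.8, i.e. roughly 10 times low-Earth orbit
```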

What about if we want to go further away, outside the influence of the Earth’s gravitational field, but still within the sun’s gravitational field? For instance, what if we want to go to Mars or Saturn? For those cases we’ll do a similar calculation, but in terms of parameters relevant to the gravitational field of the sun, rather than the Earth – to avoid confusion, I’ll switch to using upper case letters to denote those parameters, where before I used lower-case letters. In particular, let’s use [tex]M_s[/tex] to denote the mass of the sun, and [tex]R_e[/tex] to denote the radius at which the Earth orbits the sun. Let’s suppose we want to send a mass [tex]m[/tex] to a distance [tex]D R_e[/tex] from the sun, i.e., now we’re measuring distance in units of the radius of the Earth’s orbit around the sun, not the radius of the Earth itself. The energy cost to do this is [1]:

[tex]E_D = \frac{GM_sm}{R_e}(1-1/D)[/tex]

The ratio of the energy required to go to a distance [tex]D R_e[/tex] from the sun versus the energy required to reach low-Earth orbit is thus:

[tex]\frac{E_{D}}{e_{d_l}} = \frac{M_s r_E}{m_E R_e}\frac{1-1/D}{1-1/d_l}[/tex]

Now the sun is about [tex]330,000[/tex] times as massive as the Earth. And the radius of the Earth’s orbit around the sun is about [tex]25,000[/tex] times the radius of the Earth. Substituting those values we see that:

[tex]\frac{E_{D}}{e_{d_l}} \approx 13 \,\frac{1-1/D}{1-1/d_l}[/tex]

For Mars, [tex]D[/tex] is roughly [tex]1.5[/tex], and this tells us that the energy cost to get to Mars is roughly [tex]40[/tex] times the energy cost to get to low-Earth orbit (but see footnote [1], below, if you haven’t already). For Saturn, [tex]D[/tex] is roughly [tex]10[/tex], and so the energy cost to get to Saturn is roughly [tex]120[/tex] times the energy cost to get to low-Earth orbit. For the stars, [tex]D[/tex] is infinity, or close enough to make no difference, and so the energy cost to get to the stars is only about 10 percent more than the cost to get to Saturn! Of course, with current propulsion technologies it might take rather a long time to get to the stars.
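Here’s the same kind of rough check for the sun-based ratios, using the prefactor of 13 and [tex]d_l = 1.1[/tex] from above. The raw outputs come out a little higher than the rounded figures quoted in the text (about 48, 129, and 143 times), which is as expected for an order-of-magnitude estimate:

```python
d_l = 1.1        # distance parameter for low-Earth orbit
prefactor = 13   # (M_s r_E) / (m_E R_e), i.e. roughly 330,000 / 25,000

for name, D in [("Mars", 1.5), ("Saturn", 10), ("the stars", float("inf"))]:
    ratio = prefactor * (1 - 1 / D) / (1 - 1 / d_l)
    print(f"{name}: ~{ratio:.0f} times the energy cost of low-Earth orbit")
```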

Caveats

There are important caveats to the results above. Just because sending a payload to Mars only requires giving it about four times more energy than sending it to the moon, it doesn’t follow that if you want to send it on a rocket to Mars then you’ll only need about four times as much rocket fuel. In fact, nearly all of the energy expended by modern rockets goes into lifting the rocket fuel itself, and only a small amount into the payload. An unfortunate consequence of this is that you need a lot more than four times as much fuel – either that, or a much more efficient propulsion system than the rockets we have today. In a way, we’ve been both very lucky and very unlucky with our rockets. We’ve been lucky because our propulsion systems are just good enough to be able to carry tiny payloads into space, using enormous quantities of fuel. And we’ve been unlucky because to give those payloads even a tiny bit more of the energy they need to go further requires a lot more fuel. See, e.g., ref,ref and the pointers therein for more discussion of these points.
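One way to get a feel for how fast the fuel requirement blows up is the ideal (Tsiolkovsky) rocket equation: for a fixed effective exhaust velocity, the ratio of a rocket’s initial mass to its final mass grows exponentially with the velocity change it must deliver. Here’s a minimal sketch, assuming a generous chemical-engine exhaust velocity and ignoring staging, gravity losses, and everything else that matters in a real design:

```python
import math

v_e = 4.5  # km/s, assumed effective exhaust velocity (roughly chemical propulsion)

for delta_v in [5, 10, 15, 20]:  # km/s of velocity change to be delivered
    # Ideal rocket equation: initial mass / final mass = exp(delta_v / v_e)
    mass_ratio = math.exp(delta_v / v_e)
    print(f"delta-v = {delta_v:2d} km/s -> rocket starts ~{mass_ratio:.0f}x heavier than what it delivers")
```

Doubling the delta-v squares the mass ratio, which is why even a modest increase in the energy delivered to the payload can demand a vastly bigger rocket.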

The extent to which any of this is a problem depends on your launch technology. Ideas like space guns and the space elevator don’t require any fuel to be carried along with the payload, and so escape the above problem. Of course, they’re also still pretty speculative ideas at this point! Still, I think it’s an interesting observation that the energy required to get a large mass to a distant part of the solar system is not, in principle, all that far beyond what we’ve already achieved in getting to the moon.

Footnote

[1] Note that the object needs to first escape out of the Earth’s gravitational field, and this imposes an extra energy cost. This extra cost is roughly 10 times the cost of getting to low Earth orbit, by a calculation similar to that we did for getting to the moon. Strictly speaking, this energy cost should be added on to the numbers we’ll derive later for Mars and Saturn. But for the rough-and-ready calculation I’m doing, I won’t worry about it – we’re trying to get a sense for orders of magnitude here, not really detailed numbers!


The mismeasurement of science

Albert Einstein’s greatest scientific “blunder” (his word) came as a sequel to his greatest scientific achievement. That achievement was his theory of gravity, the general theory of relativity, which he introduced in 1915. Two years later, in 1917, Einstein ran into a problem while trying to apply general relativity to the Universe as a whole. At the time, Einstein believed that on large scales the Universe is static and unchanging. But he realized that general relativity predicts that such a Universe can’t exist: it would spontaneously collapse in on itself. To solve this problem, Einstein modified the equations of general relativity, adding an extra term involving what is called the “cosmological constant”, which, roughly speaking, is a type of pressure which keeps a static Universe from collapsing.

Twelve years later, in 1929, Edwin Hubble discovered that the Universe isn’t static and unchanging, but is actually expanding. Upon hearing the news, Einstein quickly realized that if he’d taken his original 1915 theory seriously, he could have used it to predict the expansion that Hubble had observed. That would have been one of the great theoretical predictions of all time! It was this realization that led Einstein to describe the cosmological constant as the “biggest blunder” of his life.

The story doesn’t end there. Nearly seven decades later, in 1998, two teams of astronomers independently made some very precise measurements of the expansion of the Universe, and discovered that there really is a need for the cosmological constant (ref,ref). Einstein’s “biggest blunder” was, in fact, one of his most prescient achievements.

The point of the story of the cosmological constant is not that Einstein was a fool. Rather, the point is that it’s very, very difficult for even the best scientists to accurately assess the value of scientific discoveries. Science is filled with examples of major discoveries that were initially underappreciated. Alexander Fleming abandoned his work on penicillin. Max Born won the Nobel Prize in physics for a footnote he added in proof to a paper – a footnote that explains how the quantum mechanical wavefunction is connected to probabilities. That’s perhaps the most important idea anyone had in twentieth century physics. Assessing science is hard.

The problem of measuring science

Assessing science may be hard, but it’s also something we do constantly. Countries such as the United Kingdom and Australia have introduced costly and time-consuming research assessment exercises to judge the quality of scientific work done in those countries. In just the past few years, many new metrics purporting to measure the value of scientific research have been proposed, such as the h-index, the g-index, and many more. In June of 2010, the journal Nature ran a special issue on such metrics. Indeed, an entire field of scientometrics is being developed to measure science, and there are roughly 1,500 professional scientometricians.

There’s a slightly surreal quality to all this activity. If even Einstein demonstrably made enormous mistakes in judging his own research, why are the rest of us trying to measure the value of science systematically, and even organizing the scientific systems of entire countries around these attempts? Isn’t the lesson of the Einstein story that we shouldn’t believe anyone who claims to be able to reliably assess the value of science? Of course, the problem is that while it may be near-impossible to accurately evaluate scientific work, as a practical matter we are forced to make such evaluations. Every time a committee decides to award or decline a grant, or to hire or not hire a scientist, they are making a judgement about the relative worth of different scientific work. And so our society has evolved a mix of customs and institutions and technologies to answer the fundamental question: how should we allocate resources to science? The answer we give to that question is changing rapidly today, as metrics such as citation count and the h-index take on a more prominent role. In 2006, for example, the UK Government proposed changing their research assessment exercise so that it could be done in a largely automated fashion, using citation-based metrics. The proposal was eventually dropped, but nonetheless the UK proposal is a good example of the rise of metrics.

In this essay I argue that heavy reliance on a small number of metrics is bad for science. Of course, many people have previously criticised metrics such as citation count or the h-index. Such criticisms tend to fall into one of two categories. In the first category are criticisms of the properties of particular metrics, for example, that they undervalue pioneer work, or that they unfairly disadvantage particular fields. In the second category are criticisms of the entire notion of quantitatively measuring science. My argument differs from both these types of arguments. I accept that metrics in some form are inevitable – after all, as I said above, every granting or hiring committee is effectively using a metric every time they make a decision. My argument instead is essentially an argument against homogeneity in the evaluation of science: it’s not the use of metrics I’m objecting to, per se, rather it’s the idea that a relatively small number of metrics may become broadly influential. I shall argue that it’s much better if the system is very diverse, with all sorts of different ways being used to evaluate science. Crucially, my argument is independent of the details of what metrics are being broadly adopted: no matter how well-designed a particular metric may be, we shall see that it would be better to use a more heterogeneous system.

As a final word before we get to the details of the argument, I should perhaps mention my own prejudice about the evaluation of science, which is the probably not-very-controversial view that the best way to evaluate science is to ask a few knowledgeable, independent- and broad-minded people to take a really deep look at the primary research, and to report their opinion, preferably while keeping in mind the story of Einstein and the cosmological constant. Unfortunately, such a process is often not practically feasible.

Three problems with centralized metrics

I’ll use the term centralized metric as a shorthand for any metric which is applied broadly within the scientific community. Examples today include the h-index, the total number of papers published, and total citation count. I use this terminology in part because such metrics are often imposed by powerful central agencies – recall the UK government’s proposal to use a citation-based scheme to assess UK research. Of course, it’s also possible for a metric to be used broadly across science, without being imposed by any central agency. This is happening increasingly with the h-index, and has happened in the past with metrics such as the number of papers published, and the number of citations. In such cases, even though the metric may not be imposed by any central agency, it is still a central point of failure, and so the term “centralized metric” is appropriate. In this section, I describe three ways centralized metrics can inhibit science.

Centralized metrics suppress cognitive diversity: Over the past decade the complexity theorist Scott Page and his collaborators have proved some remarkable results about the use of metrics to identify the “best” people to solve a problem (ref,ref). Here’s the scenario Page and company consider. Suppose you have a difficult creative problem you want solved – let’s say, finding a quantum theory of gravity. Let’s also suppose that there are 1,000 people worldwide who want to work on the problem, but you have funding to support only 50 people. How should you pick those 50? One way to do it is to design a metric to identify which people are best suited to solve the problem, and then to pick the 50 highest-scoring people according to that metric. What Page and company showed is that it’s sometimes actually better to choose 50 people at random. That sounds impossible, but it’s true for a simple reason: selecting only the highest scorers will suppress cognitive diversity that might be essential to solving the problem. Suppose, for example, that the pool of 1,000 people contains a few mathematicians who are experts in the mathematical field of stochastic processes, but who know little about the topics usually believed to be connected to quantum gravity. Perhaps, however, unbeknownst to us, expertise in stochastic processes is actually critical to solving the problem of quantum gravity. If you pick the 50 “best” people according to your metric it’s likely that you’ll miss that crucial expertise. But if you pick 50 people at random you’ve got a chance of picking up that crucial expertise [1]. Richard Feynman made a similar point in a talk he gave shortly after receiving the Nobel Prize in physics (ref):

If you give more money to theoretical physics it doesn’t do any good if it just increases the number of guys following the comet head. So it’s necessary to increase the amount of variety… and the only way to do it is to implore you few guys to take a risk with your lives that you will never be heard of again, and go off in the wild blue yonder and see if you can figure it out.

What makes Page and company’s result so striking is that they gave a convincing general argument showing that this phenomenon occurs for any metric at all. They dubbed the result the diversity-trumps-ability theorem. Of course, exactly when the conclusion of the theorem applies depends on many factors, including the nature of the cognitive diversity in the larger group, the details of the problem, and the details of the metric. In particular, it depends strongly on something we can’t know in advance: how much or what type of cognitive diversity is needed to solve the problem at hand. The key point, though, is that it’s dangerously naive to believe that doing good science is just a matter of picking the right metric, and then selecting the top people according to that metric. No matter what the metric, it’ll suppress cognitive diversity. And that may mean suppressing knowledge crucial to solving the problem at hand.
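To make the selection effect concrete, here is a toy simulation – not Page and company’s model, and with all the numbers invented purely for illustration. Suppose 20 of the 1,000 candidates carry the crucial-but-unfashionable expertise, and suppose the metric, built around the fashionable topics, tends to score them lower than everyone else:

```python
import random

random.seed(0)
POOL, PICK, TRIALS = 1000, 50, 2000

def trial():
    # 20 candidates carry the crucial expertise (think of the stochastic-process
    # experts); the metric scores them two standard deviations lower on average
    # (an arbitrary, illustrative choice).
    people = [{"score": random.gauss(-2 if i < 20 else 0, 1), "crucial": i < 20}
              for i in range(POOL)]
    top = sorted(people, key=lambda p: p["score"], reverse=True)[:PICK]
    rnd = random.sample(people, PICK)
    return any(p["crucial"] for p in top), any(p["crucial"] for p in rnd)

top_hits = rnd_hits = 0
for _ in range(TRIALS):
    t, r = trial()
    top_hits += t
    rnd_hits += r

print(f"top-50-by-metric group contains the crucial expertise in {top_hits / TRIALS:.0%} of trials")
print(f"random-50 group contains the crucial expertise in {rnd_hits / TRIALS:.0%} of trials")
```

Under these made-up assumptions the top-50 group almost never contains the crucial expertise, while the random group contains it roughly two-thirds of the time. The exact numbers mean nothing; the point is simply that ranking by a single metric systematically filters out the kinds of diversity the metric doesn’t know to value.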

Centralized metrics create perverse incentives: Imagine, for the sake of argument, that the US National Science Foundation (NSF) wanted to encourage scientists to use YouTube videos as a way of sharing scientific results. The videos could, for example, be used as a way of explaining crucial-but-hard-to-verbally-describe details of experiments. To encourage the use of videos, the NSF announces that from now on they’d like grant applications to include viewing statistics for YouTube videos as a metric for the impact of prior research. Now, this proposal obviously has many problems, but for the sake of argument please just imagine it was being done. Suppose also that after this policy was implemented a new video service came online that was far better than YouTube. If the new service was good enough then people in the general consumer market would quickly switch to the new service. But even if the new service was far better than YouTube, most scientists – at least those with any interest in NSF funding – wouldn’t switch until the NSF changed its policy. Meanwhile, the NSF would have little reason to change their policy, until lots of scientists were using the new service. In short, this centralized metric would incentivize scientists to use inferior systems, and so inhibit them from using the best tools.

The YouTube example is perhaps fanciful, at least today, but similar problems do already occur. At many institutions scientists are rewarded for publishing in “top-tier” journals, according to some central list, and penalized for publishing in “lower-tier” journals. For example, faculty at Qatar University are given a reward of 3,000 Qatari Rials (US $820) for each impact factor point of a journal they publish in. If broadly applied, this sort of incentive would create all sorts of problems. For instance, new journals in exciting emerging fields are likely to still be establishing themselves, and so to have lower impact factors. So the effect of this scheme would be to disincentivize scientists from participating in new fields; the newer the field, the greater the disincentive! Any time we create a centralized metric, we yoke the way science is done to that metric.

Centralized metrics misallocate resources: One of the causes of the financial crash of 2008 was a serious mistake made by rating agencies such as Moody’s, S&P, and Fitch. The mistake was to systematically underestimate the risk of investing in financial instruments derived from housing mortgages. Because so many investors relied on the rating agencies to make investment decisions, the erroneous ratings caused an enormous misallocation of capital, which propped up a bubble in the housing market. It was only after homeowners began to default on their mortgages in unusually large numbers that the market realized that the ratings agencies were mistaken, and the bubble collapsed. It’s easy to blame the rating agencies for this collapse, but this kind of misallocation of resources is inevitable in any system which relies on centralized decision-making. The reason is that any mistakes made at the central point, no matter how small, then spread and affect the entire system.

In science, centralization also leads to a misallocation of resources. We’ve already seen two examples of how this can occur: the suppression of cognitive diversity, and the creation of perverse incentives. The problem is exacerbated by the fact that science has few mechanisms to correct the misallocation of resources. Consider, for example, the long-term fate of many fashionable fields. Such fields typically become fashionable as the result of some breakthrough result that opens up many new research possibilities. Encouraged by that breakthrough, grant agencies begin to invest heavily in the field, creating a new class of scientists (and grant agents) whose professional success is tied not just to the past success of the field, but also to the future success of the field. Money gets poured in, more and more people pursue the area, students are trained, and go on to positions of their own. In short, the field expands rapidly. Initially this expansion may be justified, but even after the field stagnates, there are few structural mechanisms to slow continued expansion. Effectively, there is a bubble in such fields, while less fashionable ideas remain underfunded as a result. Furthermore, we should expect such scientific bubbles to be more common than bubbles in the financial market, because decision making is more centralized in science. We should also expect scientific bubbles to last longer, since, unlike financial bubbles, there are few forces able to pop a bubble in science; there’s no analogue to the homeowner defaults to correct the misallocation of resources. Indeed, funding agencies can prop up stagnant fields of research for decades, in large part because the people paying the cost of the bubble – usually, the taxpayers – are too isolated from the consequences to realize that their money is being wasted.

One metric to rule them all

No-one sensible would staff a company by simply applying an IQ test and employing whoever scored highest (c.f., though, ref). And yet there are some in the scientific community who seem to want to move toward staffing scientific institutions by whoever scores highest according to the metrical flavour-of-the-month. If there is one point to take away from this essay it is this: beware of anyone advocating or working toward the one “correct” metric for science. It’s certainly a good thing to work toward a better understanding of how to evaluate science, but it’s easy for enthusiasts of scientometrics to believe that they’ve found (or will soon find) the answer, the one metric to rule them all, and that that metric should henceforth be broadly used to assess scientific work. I believe we should strongly resist this approach, and aim instead to both improve our understanding of how to assess science, and also to ensure considerable heterogeneity in how decisions are made.

One tentative idea I have which might help address this problem is to democratize the creation of new metrics. This can happen if open science becomes the norm, so scientific results are openly accessible, online, making it possible, at least in principle, for anyone to develop new metrics. That sort of development will lead to a healthy proliferation of different ideas about what constitutes “good science”. Of course, if this happens then I expect it will lead to a certain amount of “metric fatigue” as people develop many different ways of measuring science, and there will be calls to just settle down on one standard metric. I hope those calls aren’t heeded. If science is to be anything more than lots of people following the comet head, we need to encourage people to move in different directions, and that means valuing many different ways of doing science.

Update: After posting this I Googled my title, out of curiosity to see if it had been used before. I found an interesting article by Peter Lawrence, which is likely of interest to anyone who enjoyed this essay.

Acknowledgements

Thanks to Jen Dodd and Hassan Masum for many useful comments. This is a draft of an essay to appear in a forthcoming volume on reputation systems, edited by Hassan Masum and Mark Tovey.

Footnotes

[1] Sometimes an even better strategy will be a mixed strategy, e.g., picking the top 40 people according to the metric, and also picking 10 at random. So far as I know this kind of mixed strategy hasn’t been studied. It’s difficult to imagine that the proposal to pick, say, one in five faculty members completely at random is going to receive much support at universities, no matter how well founded the proposal may be. We have too much intuitive sympathy for the notion that the best way to generate global optima is to locally optimize. Incidentally, the success of such mixed strategies is closely related to the phenomenon of stochastic resonance, wherein adding noise to a system can sometimes improve its performance.


Cameron Neylon on practical steps toward open science

Cameron Neylon is a scientist at the UK Science and Technology Facilities Council, an open notebook scientist, and one of the most thoughtful advocates of open science (blog, Twitter). In an email interview I asked Cameron a few questions about practical steps that can be taken toward open science:

Q1: Suppose you’ve just been invited by the head of a major funding agency to advise them on open science. They’re asking for two or three practical suggestions for how they can help move us toward a more open scientific culture. What would you tell them?

For me the key first question is to ask what they see as their mission to maximise, and then seek to measure that effectively. I think there are two different main classes of reason why funders support science. One is to build up knowledge, and the other is to support the generation of economic and social outcomes from research and innovation. A third (but often brushed under the carpet) target is prestige – sometimes the implicit target of small research funders, or those from emerging and transitional economies, seeking to find a place on the global research stage. Nothing wrong with this, but if that is the target you should optimise for it. If you want other outcomes you should optimise for those.

Our current metrics and policies largely optimise for prestige rather than knowledge building or social outcomes. On the assumption that most funders would choose one of these two outcomes as their mission I would say that the simple things to do are to actively measure and ask fundees to report on these things.

For knowledge building: Ask about, and measure the use and re-use of research outputs. Has data been re-used, is software being incorporated into other projects, are papers being cited and woven tightly into the networks of influence that we can now start to measure with more sophisticated analysis tools?

For social and economic outcomes: Similar to above but look more explicitly for real measurable outcomes. Evidence of influence over policy, measure of real economic activity generated by outputs (not just numbers of spin out companies), development of new treatment regimes.

Both of these largely seek to measure re-use as opposed to counting outputs. This is arguably not simple, but as the aim is to re-align community attitudes and encourage changes in behaviour it’s not going to be simple. However this kind of approach takes what we are already doing, and the direction it is taking us in terms of measuring “impact”, and makes it more sophisticated.

Asking researchers to report on these, and actively measuring them, will in and of itself lead to greater consideration of these broader impacts and change behaviour with regard to sharing. For some period between 18 months and three years simply collect and observe. Then look at those who are doing best on specific metrics, seek to capture best practice, and implement policies to support it.

Throughout all of this, accept that as research becomes less directed or applied, the measurement becomes harder, the error margins larger, and the picking of winners (already difficult) near impossible. Consider mechanisms to provide baseline funding at some low level, perhaps at the level of 25-50% of a PhD studentship or technician, direct to researchers with no restrictions on use, across disciplines, with the aim of maintaining diversity, encouraging exploration, and maintaining capacity. This is both politically and technically difficult but could have large dividends if the right balance is found. If it drops below an amount which can be useful when combined between a few researchers it is probably not worth it.

Q2: Suppose a chemist early in their career has just approached you. They’re inspired by the idea of open science, but want to know what exactly they can do. How can they get involved in a concrete way?

Any young researcher I speak to today I would say to do three things:

1) Write as much as possible, online and off, in as many different ways as possible. Writing is the transferable skill and people who do it well will always find good employment.

2) Become as good a programmer/software engineer/web developer as possible. A great way to contribute to any project is to be able to take existing tools and adapt them quickly to local needs.

3) Be as open as you can (or as your supervisor will allow you to) about communicating all of the things you are doing. The next stage of your career will depend on who has a favourable impression of what you’ve done. The papers will be important, but not as important as personal connections you can make through your work.

In concrete terms:

1) Start a blog (ask for explicit guidelines from supervisors and institutions about limitations on what you should write about). Contribute to wikipedia. Put your presentations on slideshare, and screencasts and videos of talks online.

2) To the extent that it is possible, maintain your data, outputs, and research records in a way that, when a decision is taken to publish (whether in a paper, informally on the web, or anything in between), it is easy to do so in a useful way. Go to online forums to find out what tools others find useful and see how they work for you. Include links to real data and records in your research talks.

3) Get informed about data licensing and copyright. Find the state of the art in arguments around scooping, data management, and publication, and arm yourself with evidence. Be prepared to raise issues of Open Access publication, data publication, licensing and copyright in group discussions. Expect that you will rarely win these arguments but that you are planting ideas in people’s heads.

Above all, to the extent that you can, walk the walk. Tell stories of successes and failures in sharing. Acknowledge that it’s complicated but provide access to data, tools, software and records where you can. Don’t act unilaterally unless you have the rights to do so but keep asking whether you can act and explaining why you think it’s important. Question the status quo.

Q3: One of the things that has made the technology startup world so vibrant is that there’s an enormous innovation ecosystem around startups – they benefit from venture capital, from angel investors, from open source, from University training of students, and so on. That’s part of the reason a couple of students can start Google in a garage, and then take it all the way to being one of the largest companies in the world. At the moment, there is no comparably successful innovation ecosystem in science. Is there a way we can develop such an innovation ecosystem?

There are two problems with taking the silicon valley model into science. Firstly capital and consumable costs are much higher. Particularly today with on demand consumer services it is cheap and easy to scale a web based startup. Secondly the timeframes are much longer. A Foursquare or a Yelp can be expected to start demonstrating revenue streams in 18-24 months whereas research is likely to take much longer. A related timeframe issue is that the expertise required to contribute across these web based startups is relatively common and widespread in comparison with the highly focussed and often highly localised expertise required to solve specific research problems.

Some research could fit this model, particularly analytical tool development, and data intensive science, and certainly it should be applied where it can. More generally applying this kind of model will require cheap access to infrastructure and technical capacity (instruments and materials). Some providers in the biosciences are starting to appear and Creative Commons’ work on MTAs [materials transfer agreements] may help with materials access in the medium term.

The most critical issue however is rapid deployment of expertise to specific problems. To apply a distributed rapid innovation model we need the means to rapidly identify the very limited number of people with appropriate expertise to solve the problem at hand. We also need to rethink our research processes to make them more modular so that they can be divided up and distributed. Finally we need capacity in the system that makes it possible for expertise to actually be rapidly deployed. It’s not clear to me how we achieve these goals, although things like Innocentive, CoLab, Friendfeed, and others are pointing out potential directions. We are a long way from delivering on the promise and it’s not clear what the practical route there is.

Practical steps: more effective communication mechanisms will be driven by rewarding people for re-use of their work. Capacity can be added by baseline funding. Modularity is an attitude and a design approach which we will probably need to build into training and will be hard to do in a community where everything is bespoke and great pride is taken in eating our own dogfood but never trusting anyone else’s…

“Collective Intelligence”, by Pierre Levy

More from Pierre Levy’s book Collective Intelligence: mankind’s emerging world in cyberspace, translated by Robert Bononno.

One reason the book is notable is that, so far as I know, it was the first to really develop the term “collective intelligence”. Levy was writing in the mid-1990s, and others had, of course, both used the term before, and also developed related notions. But Levy seems to be the first to have really riffed on the term collective intelligence. Here’s Levy’s definition, and some additional commentary:

What is collective intelligence? It is a form of universally distributed intelligence, constantly enhanced, coordinated in real time, and resulting in the effective mobilization of skills… My initial premise is based on the notion of a universally distributed intelligence. No one knows everything, everyone knows something, all knowledge resides in humanity… New communications systems should provide members of a community with the means to coordinate their interactions within the same virtual universe of knowledge. This is not simply a matter of modeling the conventional physical environment, but of enabling members of delocalized communities to interact within a mobile landscape of signification… Before we can mobilize skills, we have to identify them. And to do so, we have to recognize them in all their diversity… The ideal of collective intelligence implies the technical, economic, legal, and human enhancement of a universally distributed intelligence that will unleash a positive dynamic of recognition and skills mobilization.

Here’s Levy on the future of the economy:

What remains after we have mechanized agriculture, industry and messaging technologies? The economy will center, as it does already, on that which can never be fully automated, on that which is irreducible: the production of the social bond, the relational… Those who manufacture things will become scarcer and scarcer, and their labour will become mechanized, augmented, automated to a greater and greater extent…. The final frontier will be the human itself, that which can’t be automated: the creation of sensible worlds, invention, relation, the continuous recreation of the community… What kind of engineering will best meet the needs of a growing economy of human qualities?

It’s a provocative thought, although I don’t find it convincing. It’s true that the social bond is increasing in importance, as some other things become less scarce, but other scarcities remain as well.

I liked the following comment of Levy on democracy – it’s incidental to his main point, but nicely distilled an idea for me:

[Democracy] is favored not because it establishes the domination of a majority over a minority, but because it limits the power of government and provides remedies against the arbitrary use of power.

A final quote:

The greater the number of collective intellects with which an individual is involved, the more opportunities he has to diversify his knowledge and desire.

The downside of this may be a kind of glorified dilettantism. But the upside – as so often, the more interesting aspect of events – is the possibility of becoming deeply familiar with many more communities of practice.

Introduction to the Polymath Project and “Density Hales-Jewett and Moser Numbers”

In January of 2009, Tim Gowers initiated an experiment in massively collaborative mathematics, the Polymath Project. The initial stage of this project was extremely successful, and led to two scientific papers: “A new proof of the density Hales-Jewett theorem” and “Density Hales-Jewett and Moser numbers”. The second of these papers will soon appear in a birthday volume in honour of Endre Szemeredi. The editor of the Szemeredi birthday volume, Jozsef Solymosi, invited me to submit an introduction to that paper, and to the Polymath Project more generally. The following is a draft of my introductory piece. I’d be very interested in hearing feedback. Note that the early parts of the article briefly discuss some mathematics, but if you’re not mathematically inclined the remainder of the article should be comprehensible. Many of the themes of the article will be discussed at much greater length in my book about open science, “Reinventing Discovery”, to be published early in 2011.

At first appearance, the paper which follows this essay appears to be a typical mathematical paper. It poses and partially answers several combinatorial questions, and follows the standard forms of mathematical discourse, with theorems, proofs, conjectures, and so on. Appearances are deceiving, however, for the paper has an unusual origin, a clue to which is in the name of the author, one D. H. J. Polymath. Behind this unusual name is a bold experiment in how mathematics is done. This experiment was initiated in January of 2009 by W. Timothy Gowers, and was an experiment in what Gowers termed “massively collaborative mathematics”. The idea, in brief, was to attempt to solve a mathematical research problem working entirely in the open, using Gowers’s blog as a medium for mathematical collaboration. The hope was that a large number of mathematicians would contribute, and that their collective intelligence would make easy work of what would ordinarily be a difficult problem. Gowers dubbed the project the “Polymath Project”. In this essay I describe how the Polymath Project proceeded, and reflect on similarities to online collaborations in the open source and open science communities. Although I followed the Polymath Project closely, my background is in theoretical physics, not combinatorics, and so I did not participate directly in the mathematical discussions. The perspective is that of an interested outsider, one whose main creative interests are in open science and collective intelligence.

Gowers began the Polymath Project with a description of the problem to be attacked (see below), a list of rules of collaboration, and a list of 38 brief observations he’d made about the problem, intended to serve as starting inspiration for discussion. At that point, on February 1, 2009, other people were invited to contribute their thoughts on the problem. Anyone with an interest and an internet connection could follow along and, if they wished, contribute their ideas in the comment section of Gowers’s blog. In just the first 24 hours after Gowers opened his blog up for discussion, six people offered 24 comments. In a sign of things to come, those contributors came from four countries on three continents, and included a high-school teacher, a graduate student, and four professors of mathematics. A collaboration was underway, a collaboration which expanded in the weeks that followed to involve more than twenty people.

The problem originally posed by Gowers was to investigate a new approach to a special case of the density Hales-Jewett theorem (DHJ). Let me briefly describe the statement of the theorem, before describing the special case Gowers proposed to attack. Let [tex][k]^n[/tex] be the set of all length-[tex]n[/tex] strings over the alphabet [tex]1,2,\ldots,k[/tex]. A combinatorial line is a set of [tex]k[/tex] points in [tex][k]^n[/tex], formed by taking a string with one or more wildcards (“[tex]x[/tex]”) in it, e.g., 14x1xx3, and replacing every wildcard by [tex]1[/tex], then by [tex]2[/tex], and so on up to [tex]k[/tex]. In the example I’ve given, the resulting combinatorial line is: [tex]\{ 1411113, 1421223, \ldots, 14k1kk3 \}[/tex]. The density Hales-Jewett theorem says that as [tex]n[/tex] becomes large, even very low density subsets of [tex][k]^n[/tex] must contain a combinatorial line. More precisely, let us define the density Hales-Jewett number [tex]c_{n,k}[/tex] to be the size of the largest subset of [tex][k]^n[/tex] which does not contain a combinatorial line. Then the density Hales-Jewett theorem may be stated as:

Theorem (DHJ): [tex]\lim_{n\rightarrow \infty} c_{n,k}/k^n = 0[/tex].

DHJ was originally proved in 1991 by Furstenberg and Katznelson, using techniques from ergodic theory. Gowers proposed to find a combinatorial proof of the [tex]k=3[/tex] case of the theorem, using a strategy that he outlined on his blog. As the Polymath Project progressed, that goal gradually evolved. Four days after Gowers opened his blog up for discussion, Terence Tao used his blog to start a discussion aimed at understanding the behaviour of [tex]c_{n,3}[/tex] for small [tex]n[/tex]. This discussion rapidly gained momentum, and the Polymath Project split into two subprojects, largely carried out, respectively, on Gowers’s blog and Tao’s blog. The first subproject pursued and eventually found an elementary combinatorial proof of the full DHJ theorem. The results of the second subproject are described in the paper which follows, “Density Hales-Jewett and Moser Numbers”. As mentioned, this second subproject began with the goal of understanding the behaviour of [tex]c_{n,3}[/tex] for small [tex]n[/tex]. It gradually broadened to consider several related questions, including the behaviour of [tex]c_{n,k}[/tex] for small [tex]n[/tex] and [tex]k[/tex], as well as the behaviour of the Moser numbers, [tex]c_{n,k}'[/tex], defined to be the size of the largest subset of [tex][k]^n[/tex] which contains no geometric line. As for a combinatorial line, a geometric line is defined by taking a string in [tex][k]^n[/tex] with one or more wildcard characters present. But unlike a combinatorial line, there are two distinct types of wildcards allowed (“[tex]x[/tex]” and “[tex]\overline x[/tex]”), with [tex]x[/tex] taken to vary over the range [tex]1,\ldots,k[/tex], and [tex]\overline x = k+1-x[/tex]. So, for example, [tex]13x\overline x2[/tex] generates the geometric line [tex]\{131k2,132(k-1)2,\ldots,13k12\}[/tex].
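If you’d like a concrete feel for these definitions, here is a small brute-force sketch (it has nothing to do with the methods of the paper, and is only feasible for very small [tex]n[/tex] and [tex]k[/tex]) that enumerates combinatorial lines and computes [tex]c_{n,k}[/tex] directly:

```python
from itertools import product, combinations

def combinatorial_lines(n, k):
    """All combinatorial lines in [k]^n: take a template over {1,...,k,'x'} with at
    least one wildcard, and let every wildcard run over 1,...,k simultaneously."""
    lines = set()
    for template in product(list(range(1, k + 1)) + ["x"], repeat=n):
        if "x" in template:
            lines.add(frozenset(tuple(v if c == "x" else c for c in template)
                                for v in range(1, k + 1)))
    return lines

def dhj_number(n, k):
    """c_{n,k}: the largest size of a subset of [k]^n containing no combinatorial line."""
    points = list(product(range(1, k + 1), repeat=n))
    lines = combinatorial_lines(n, k)
    for size in range(len(points), -1, -1):
        for subset in combinations(points, size):
            s = set(subset)
            if not any(line <= s for line in lines):
                return size
    return 0

print(dhj_number(1, 3))  # 2
print(dhj_number(2, 3))  # 6
```

For instance, in [tex][3]^2[/tex] the largest line-free subset has six points: remove 11, 22 and 33, and no row, column, or diagonal combinatorial line survives.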

Both subprojects of the Polymath Project progressed quickly. On March 10, Gowers announced that he was confident that the polymaths had found a new combinatorial proof of DHJ. Just 37 days had passed since the collaboration began, and 27 people had contributed approximately 800 mathematical comments, containing 170,000 words. Much work remained to be done, but the original goal had already been surpassed, and this was a major milestone for the first subproject. By contrast, the goals of the second subproject were more open-ended, and no similarly decisive announcement was possible. Work on both subprojects continued for months thereafter, gradually shifting toward writing up the results for publication.

Although the Polymath Project is unusual from the perspective of current practice in mathematics, there is another perspective from which it does not appear so unusual. That is the tradition of open source software development in the computer programming community. Perhaps the best known example of open source software is the Linux operating system. Begun by Linus Torvalds in 1991 as a hobby project, Linux has since grown to become one of the world’s most popular operating systems. Although not as widely used in the consumer market as Microsoft Windows, Linux is the primary operating system used on the giant computer clusters at companies such as Google, Yahoo! and Amazon, and also dominates in markets such as the movie industry, where it plays a major role at companies such as Dreamworks and Pixar.

A key feature of Linux is that, unlike proprietary software such as Microsoft Windows, the original source code for the operating system is freely available to be downloaded and modified. In his original message announcing Linux, Torvalds commented that “I’ve enjouyed [sic] doing it, and somebody might enjoy looking at it and even modifying it for their own needs. It is still small enough to understand, use and modify, and I’m looking forward to any comments you might have.” Because he had made the code publicly available, other people could add features if they desired. People began emailing code to Torvalds, who incorporated the changes he liked best into the main Linux code base. A Linux kernel discussion group was set up to co-ordinate work, and the number of people contributing code to Linux gradually increased. By 1994, 80 people were named in the Linux credits file as contributors.

Today, nearly twenty years later, Linux has grown enormously. The kernel of Linux contains 13 million lines of code. On an average day in 2007 and 2008, Linux developers added 4,300 lines of code, deleted 1,800 lines, and modified 1,500 lines. The social processes and tools used to create Linux have also changed enormously. In its early days, Linux used off-the-shelf tools and ad hoc social processes to manage development. But as Linux and the broader open source community have grown, that community has developed increasingly powerful tools to share and integrate code, and to manage discussion of development. They have also evolved increasingly sophisticated social structures to govern the process of large-scale open source development. None of this was anticipated at the outset by Torvalds – in 2003 he said “If someone had told me 12 years ago what would happen, I’d have been flabbergasted” – but instead happened organically.

Linux is just one project in a much broader ecosystem of open source projects. Deshpande and Riehle have conservatively estimated that more than a billion lines of open source software have been written, and more than 300 million lines are being added each year. Many of these are single-person projects, often abandoned soon after being initiated. But there are hundreds and perhaps thousands of projects with many active developers.

Although it began in the programming community, the open source collaboration process can in principle be applied to any digital artifact. It’s possible, for example, for a synthetic biologist to do open source biology, by freely sharing their DNA designs for living things, and then allowing others to contribute back changes that improve upon those designs. It’s possible for an architect to do open source architecture, by sharing design files, and then accepting contributions back from others. And, it’s possible to write an open source encyclopedia, by freely sharing the text of articles, and making it possible for others to contribute back changes. That’s how Wikipedia was written: Wikipedia is an open source project.

The Polymath Project is a natural extension of open source collaboration to mathematics. At first glance it appears to differ in one major way, for in programming the open source process aims to produce an artifact, the source code for the desired software. Similarly, in synthetic biology, architecture and the writing of an encyclopedia the desired end is an artifact of some sort. At least in the early stages of the Polymath Project, there was no obviously analogous artifact. It’s tempting to conclude that the two papers produced by the polymaths play this role, but I don’t think that’s quite right. In mathematics, the desired end isn’t an artifact, it’s mathematical understanding. And the Polymath process was a way of sharing that understanding openly, and gradually improving it through the contributions of many people.

The Polymath Project’s open approach to collaboration is part of a broader movement toward open science. Other prominent examples include the human genome project and the Sloan Digital Sky Survey, which use the internet to openly share data with the entire scientific community. This enables other scientists to find ingenious ways of reusing that data, often posing and answering questions radically different to those that motivated the people who originally took the data.

An example which gives the flavour of this reuse is the recent work by Boroson and Lauer, who used a computer algorithm to search through the spectra of 17,000 quasars from the Sloan Digital Sky Survey, looking for a subtle signature that they believed would indicate a pair of orbiting black holes. The result was the discovery of a candidate quasar containing a pair of supermassive black holes, 20 million and 800 million times the mass of the sun, respectively, and a third of a light year apart, orbiting one another roughly once every 100 years. This is just one of more than 3,000 papers to have cited the Sloan data, most of those papers coming from outside the Sloan collaboration.

People practicing open notebook science have carried this open data approach to its logical conclusion, sharing their entire laboratory record in real time. The Polymath Project and the open data and open notebook projects are all examples of scientists sharing information which, historically, has not been openly available, whether it be raw experimental data, observations made in a laboratory notebook, or ideas for the solution of a mathematical problem.

There is, however, a historical parallel to the early days of modern science. For example, when Galileo first observed what would later be recognized as Saturn’s rings, he sent an anagram to the astronomer Kepler so that if Kepler (or anyone else) later made the same discovery, Galileo could disclose the anagram and claim the credit. Such secretive behaviour was common at the time, and other scientists such as Huygens and Hooke also used devices such as anagrams to “publish” their discoveries. Many scientists waited decades before genuine publication, if they published at all. What changed this situation – the first open science revolution – was the gradual establishment of a link between the act of publishing a scientific discovery and the scientist’s prospects for employment. This establishment of scientific papers as a reputational currency gave scientists an incentive to share their knowledge. Today, we take this reputational currency for granted, yet it was painstakingly developed over a period of many decades in the 17th and 18th centuries. During that time community norms around authorship, citation, and attribution were slowly worked out by the scientific community.

A similar process is beginning today. Will pseudonyms such as D. H. J. Polymath become a commonplace? How should young scientists report their role in such collaborations, for purposes of job and grant applications? How should new types of scientific contribution – contributions such as data or blog comments or lab notebook entries – be valued by other scientists? All these questions and many more will need answers, if we are to take full advantage of the potential of new ways of working together to generate knowledge.

Open Architecture Democracy

The singer Avril Lavigne’s third hit was a ballad titled “I’m With You”. Let me pose what might seem a peculiar question: should the second word in her song title – “With” – be capitalized or uncapitalized? This seems a matter of small moment, but to some people it matters a great deal. In 2005 an edit war broke out on Wikipedia over whether “With” should be capitalized or not. The discussion drew in a dozen people, took more than a year to play out, and involved 4,000 words of discussion. During that time the page oscillated madly back and forth between capitalizing and not capitalizing “With”.

This type of conflict is not uncommon on Wikipedia. Other matters discussed at great length in similar edit wars include the true diameter of the Death Star in Return of the Jedi – is it 120, 160 or 900 kilometers? When one says that U2 “are a band” should that really be “U2 is a band”? Should the page for “Iron Maiden” point by default to the band or to the instrument of torture? Is Pluto really a planet? And so on.

Don’t get me wrong. Wikipedia works remarkably well, but the cost of resolving these minor issues can be very high. Let me describe for you an open source collaboration where problems like this don’t occur. It’s a programming competition run by a company called Mathworks. Twice a year, every year since 1999, Mathworks has run a week-long competition involving more than one hundred programmers from all over the world. At the start of the week a programming problem is posed. A typical problem might be something like the travelling salesman problem – given a list of cities, find the shortest tour that lets you visit all of those cities. The competitors don’t just submit programs at the end of the week; they can (and do) submit programs all through the week. The reason they do this is that each submitted program is immediately and automatically scored. This is done by running the program on some secret test inputs that are known only to the competition organizers. So, for example, the organizers might run the program on all the capital cities of the countries in Europe. The score reflects both how quickly the program runs, and how short a tour of the cities it finds. The score is then posted to a leaderboard. Entries come in over the whole week because kudos and occasional prizes go to people at the top of the leaderboard.
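
To give a flavour of how that automatic scoring might work, here is a toy harness of my own invention (the contest’s real scoring formula isn’t given here): it runs a submitted tour-finding function on some “secret” test inputs and folds tour length and running time into a single number for the leaderboard.

```python
# Toy illustration of automatic scoring (invented; not Mathworks' actual formula):
# run a submitted solver on secret inputs and combine tour length with runtime.
import time
from math import dist

def score_submission(solver, secret_tests, time_weight=1.0):
    """Return a score for `solver` -- lower is better.

    `solver(cities)` must return a tour: a permutation of the indices of
    `cities`, which is a list of (x, y) coordinates.  The weighting between
    tour length and running time here is arbitrary.
    """
    total = 0.0
    for cities in secret_tests:
        start = time.perf_counter()
        tour = solver(cities)
        elapsed = time.perf_counter() - start
        # Length of the closed tour, visiting the cities in the returned order.
        length = sum(
            dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
            for i in range(len(tour))
        )
        total += length + time_weight * elapsed
    return total

# A deliberately naive entry: visit the cities in the order given.
naive = lambda cities: list(range(len(cities)))
secret_tests = [[(0, 0), (0, 2), (3, 2), (3, 0)]]
print(score_submission(naive, secret_tests))  # about 10, plus a tiny time penalty
```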

What makes this a collaboration is that programs submitted to the competition are open. Once you submit your program, anyone else can come along and simply download the code you’ve just submitted, tweak a single line, and resubmit it as their own. The result is a spectacular free-for-all. Contestants are constantly “stealing” one another’s code, making small tweaks that let them leapfrog to the top of the leaderboard. Some of the contestants get hooked by the instant feedback, and work all week long. The result is that the winning entry is often fantastically good. After the first contest, in 1999, the contest co-ordinator, Ned Gulley, said: “no single person on the planet could have written such an optimized algorithm. Yet it appeared at the end of the contest, sculpted out of thin air by people from around the world, most of whom had never met before.”

Both Wikipedia and the Mathworks competition use open source patterns of development, but the difference is striking. In the Mathworks competition there is an absolute, objective measure of success that’s immediately available – the score. The score acts as a signal telling every competitor where the best ideas are. This helps the community aggregate all the best ideas into a fantastic final product.

In Wikipedia, no such objective signal of quality is available. What allows Wikipedia to function is that on most issues of contention – like whether “With” should be capitalized – there’s only a small community of interest. A treaty can be hammered out by members of that community, allowing them to reach consensus and move forward. Constructing such treaties takes tremendous time and energy, and sometimes devolves into never-ending flame wars, but most of the time it works okay. But while this kind of treaty-making might scale to tens or even hundreds of people, we don’t yet know how to make it scale to thousands. Agreement doesn’t scale.

Many of the crucial problems of governance have large communities of interest, and it can be very difficult to get even two people to agree on tiny points of fact, much less values. As a result, we can’t simply open source policy documents in a location where they can be edited by millions of people. But, purely as a thought experiment, imagine you had a way of automatically scoring policy proposals for their social utility. You really could set up a Policyworks where millions of people could help rewrite policy, integrating the best ideas from an extraordinarily cognitively diverse group of people.

The question I have is this: how can we develop tools that let us scale such a process to thousands or even millions of people? How can we get the full benefit of cognitive diversity in problem-solving, without reaching deadlock? Are there clever new ways we can devise for signalling quality in the face of incomplete or uncertain information? We know some things about how to do this in small groups: it’s the art of good facilitation and good counselling. Is it possible to develop scalable mechanisms of agreement so we can open source key problems of governance?

Let me conclude by floating a brief, speculative idea for a Policyworks. In the one minute I have left there’s not time to even begin discussing the problems with the idea, let alone potential solutions. But hopefully it contains the kernel of something interesting. The idea is to allow open editing of policy documents, in much the same way the Mathworks competition allows open editing of computer programs. But each time you make an edit, it’s sent to a randomly selected jury of your peers – say 50 of them. They’re invited to score your contribution, and perhaps offer feedback. They don’t all need to score it – just a few (say 3) is enough to start getting useful information about whether your contribution is an improvement or not. And, perhaps with some tweaking to prevent abuse, and to help ensure fair scoring, such a score might be used as a reliable way of signalling quality in the face of incomplete or uncertain information. My suspicion is that – as others have said of Wikipedia – this may be one of those ideas that works better in practice than it does in theory.
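
For concreteness, here is a minimal sketch of that jury mechanism, with every name and threshold invented: draw a random jury, wait for the first few scores, and treat a positive average as a signal that the edit is an improvement.

```python
# Minimal sketch of the jury idea floated above; all names and thresholds invented.
import random
from statistics import mean

def review_edit(edit_id, registered_users, get_score, jury_size=50, quorum=3):
    """Return True/False if jurors judge the edit an improvement, or None
    if too few of them respond.

    `get_score(user, edit_id)` stands in for whatever interface would
    actually solicit a rating; it may return None when a juror ignores the
    request, or a number (positive meaning "this is an improvement").
    """
    jury = random.sample(registered_users, min(jury_size, len(registered_users)))
    scores = []
    for juror in jury:
        rating = get_score(juror, edit_id)
        if rating is not None:
            scores.append(rating)
        if len(scores) >= quorum:
            break  # a few scores are enough to get a useful signal
    if len(scores) < quorum:
        return None  # not enough feedback to decide
    return mean(scores) > 0

# Example: every juror who responds rates the edit +1.
users = [f"user{i}" for i in range(200)]
print(review_edit("edit-42", users, lambda user, edit: 1.0))  # True
```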


This post is based on some brief remarks I made about open architecture democracy at the beginning of a panel on the subject, moderated by Tad Homer-Dixon, and with co-panelists Hassan Masum and Mark Tovey. One day, I hope to expand this into a much more thorough treatment.

From Waterloo to Seattle

I’m deeply engrossed in finishing my book at the moment, but wanted to mention two events which readers of this blog might enjoy hearing about, and perhaps attending.

The first event is a panel on open source democracy that’s being run at the University of Waterloo (just outside Toronto) on February 22. It’s about how and whether ideas like collective intelligence and mass collaboration will have any impact on governance in the 21st century. The panel is being run by Tad Homer-Dixon, and the panelists are Mark Tovey, Hassan Masum, and myself. After some short initial presentations it’s going to be (we hope) very interactive, with people there from a wide variety of backgrounds. I’m looking forward to it!

If you’re interested in open science, Science Commons is organizing a Science Commons Symposium on February 20, in Seattle, at the Microsoft Campus. They’ve organized a great group of speakers, and if I wasn’t chained to my desk writing I’d be on a plane to Seattle!

Update: The open source democracy panel is on Feb 22, not Feb 20, as I originally wrote.