Doing science online

This post is the text for an invited after-dinner talk about doing science online, given at the banquet for the Quantum Information Processing 2009 conference, held in Santa Fe, New Mexico, January 12-16, 2009.

Good evening.

Let me start with a few questions. How many people here tonight know what a blog is?

How many people read blogs, say once every week or so, or more often?

How many people actually run a blog themselves, or have contributed to one?

How many people read blogs, but won’t admit it in polite company?

Let me show you an example of a blog. It’s a blog called What’s New, run by UCLA mathematician Terence Tao. Tao, as many of you are probably aware, is a Fields-Medal winning mathematician. He’s known for solving many important mathematical problems, but is perhaps best known as the co-discover of the Green-Tao theorem, which proved the existence of arbitrarily long arithmetic progressions of primes.

Tao is also a prolific blogger, writing, for example, 118 blog posts in 2008. Popular stereotypes to the contrary, he’s not just sharing cat pictures with his mathematician buddies. Instead, his blog is a firehose of mathematical information and insight. To understand how valuable Tao’s blog is, let’s look at a example post, about the Navier-Stokes equations. As many of you know, these are the standard equations used by physicists to describe the behaviour of fluids, i.e., inside these equations is a way of understanding an entire state of matter.

The Navier-Stokes equations are notoriously difficult to understand. People such as Feynman, Landau, and Kolmogorov struggled for years attempting to understand their implications, mostly without much success. One of the Clay Millenium Prize problems is to prove the existence of a global smooth solution to the Navier-Stokes equations, for reasonable initial data.

Now, this isn’t a talk about the Navier-Stokes equations, and there’s far too much in Terry Tao’s blog post for me to do it justice! But I do want to describe some of what the post contains, just to give you the flavour of what’s possible in the blog medium.

Tao begins his post with a brief statement explaining what the Clay Millenium Problem asks. He shares the interesting tidibt that in two spatial dimenions the solution to the problem is known(!), and asks why it’s so much harder in three dimensions. He tells us that the standard answer is turbulence, and explains what that means, but then says that he has a different way of thinking about the problem, in terms of what he calls supercriticality. I can’t do his explanation justice here, but very roughly, he’s looking for invariants which can be used to control the behaviour of solutions to the equations at different length scales. He points out that all the known invariants give weaker and weaker control at short length scales. What this means is that the invariants give us a lot of control over solutions at long length scales, where things look quite regular, but little control at short length scales, where you see the chaotic variation characteristic of turbulence. He then surveys all the known approaches to proving global existence results for nonlinear partial differential equations — he says there are just three broad approaches – and points out that supercriticality is a pretty severe obstruction if you want to use one of these approaches.

The post has loads more in it, so let me speed this up. He describes the known invariants for the equations, and what they can be used to prove. He surveys and critiques existing attempts on the problem. He makes six suggestions for ways of attacking the problem, including one which may be interesting to some of the people in this audience: he suggests that pseudorandomness, as studied by computer scientists, may be connected to the chaotic, almost random behaviour that is seen in the solutions the Navier-Stokes equations.

The post is filled to the brim with clever perspective, insightful observations, ideas, and so on. It’s like having a chat with a top-notch mathematician, who has thought deeply about the Navier-Stokes problem, and who is willingly sharing their best thinking with you.

Following the post, there are 89 comments. Many of the comments are from well-known professional mathematicians, people like Greg Kuperberg, Nets Katz, and Gil Kalai. They bat the ideas in Tao’s post backwards and forwards, throwing in new insights and ideas of their own. It spawned posts on other mathematical blogs, where the conversation continued.

That’s just one post. Terry Tao has hundreds of other posts, on topics like Perelman’s proof of the Poincare conjecture, quantum chaos, and gauge theory. Many posts contain remarkable insights, often related to open research problems, and they frequently stimulate wide-ranging and informative conversations in the comments.

That’s just one blogger. There are, of course, many other top-notch mathematician bloggers. Cambridge’s Tim Gowers, another Fields Medallist, also runs a blog. Like Tao’s blog, it’s filled with interesting mathematical insights and conversation, on topics like how to use Zorn’s lemma, dimension arguments in combinatorics, and a thought-provoking post on what makes some mathematics particularly deep.

Alain Connes, another Fields Medallist, is also a blogger. He only posts occasionally, but when he does his posts are filled with interesting mathematical tidbits. For example, I greatly enjoyed this post, where he talks about his dream of solving one of the deepest problems in mathematics – the problem of proving the Riemann Hypothesis – using non-commutative geometry, a field Connes played a major role in inventing.

Berkeley’s Richard Borcherds, another Fields Medallist, is also a blogger, although he is perhaps better described as an ex-blogger, as he hasn’t updated in about a year.

I’ve picked on Fields Medallists, in part because at least four of the 42 living Fields Medallists have blogs. But there are also many other excellent mathematical blogs, including blogs from people closely connected to the quantum information community, like Scott Aaronson, Dave Bacon, Gil Kalai, and many others.

Let me make a few observations about blogging as a medium.

It’s informal.

It’s rapid-fire.

Many of the best blog posts contain material that could not easily be published in a conventional way: small, striking insights, or perhaps general thoughts on approach to a problem. These are the kinds of ideas that may be too small or incomplete to be published, but which often contain the seed of later progress.

You can think of blogs as a way of scaling up scientific conversation, so that conversations can become widely distributed in both time and space. Instead of just a few people listening as Terry Tao muses aloud in the hall or the seminar room about the Navier-Stokes equations, why not have a few thousand talented people listen in? Why not enable the most insightful to contribute their insights back?

You can also think of blogs as a way of making scientific conversation searchable. If you type “Navier-Stokes problem” into Google, the third hit is Terry Tao’s blog post about it. That means future mathematicians can easily benefit from his insight, and that of his commenters.

You might object that the most important papers about the Navier-Stokes problem should show up first in the search. There is some truth to this, but it’s not quite right. Rather, insofar as Google is doing its job well, the ranking should reflect the importance and significance of the respective hits, regardless of whether those hits are papers, blog posts, or some other form. If you look at this way, it’s not so surprising that Terry Tao’s blog post is near the top. As all of us know, when you’re working on a problem, a good conversation with an insightful colleague may be worth as much (and sometimes more) than reading the classic papers. Furthermore, as search engines become better personalized, the search results will better reflect your personal needs; in a search utopia, if Terry Tao’s blog post is what you most need to see, it’ll come up first, while if someone else’s paper on the Navier-Stokes problem is what you most need to see, then that will come up first.

I’ve started this talk by discussing blogs because they are familiar to most people. But ideas about doing science in the open, online, have been developed far more systematically by people who are explicitly doing open notebook science. People such as Garrett Lisi are using mathematical wikis to develop their thinking online; Garrett has referred to the site as “my brain online”. People such as chemists Jean-Claude Bradley and Cameron Neylon are doing experiments in the open, immediately posting their results for all to see. They’re developing ideas like lab equipment that posts data in real time, posting data in formats that are machine-readable, enabling data mining, automated inference, and other additional services.

Stepping back, what tools like blogs, open notebooks and their descendants enable is filtered access to new sources of information, and to new conversation. The net result is a restructuring of expert attention. This is important because expert attention is the ultimate scarce resource in scientific research, and the more efficiently it can be allocated, the faster science can progress.

How many times have you been obstructed in your research by the need to prove or disprove a small result that is a little outside your core expertise, and so would take you days or weeks, but which you know, of a certainty, the right person could resolve in minutes, if only you knew who that person was, and could easily get their attention. This may sound like a fantasy, but if you’ve worked on the right open source software projects, you’ll know that this is exactly what happens in those projects – discussion forums for open source projects often have a constant flow of messages posing what seem like tough problems; quite commonly, someone with a great comparative advantage quickly posts a clever way to solve the problem.

If new online tools offer us the opportunity to restructure expert attention, then how exactly might it be restructured? One of the things we’ve learnt from economics is that markets can be remarkably effective ways of efficiently allocating scarce resources. I’ll talk now about an interesting market in expert attention that has been set up by a company named InnoCentive.

To explain InnoCentive, let me start with an example involving an Indian not-for-profit called the ASSET India Foundation. ASSET helps at-risk girls escape the Indian sex industry, by training them in technology. To do this, they’ve set up training centres in several large cities across India. They’ve received many requests to set up training centres in smaller towns, but many of those towns don’t have the electricity needed to power technologies like the wireless routers that ASSET uses in its training centers.

On the other side of the world, in the town of Waltham, just outside Boston, is the company InnoCentive. InnoCentive is, as I said, an online market in expert attention. It enables companies like Eli Lilly and Proctor and Gamble to pose “Challenges” over the internet, scientific research problems theyâ€™d like solved, with a prize for solution, often many thousands of dollars. Anyone in the world can download a detailed description of the Challenge, and attempt to win the prize. More than 160,000 people from 175 countries have signed up for the site, and prizes for more than 200 Challenges have been awarded.

What does InnoCentive have to do with ASSET India? Well, ASSET got in touch with the Rockefeller Foundation, and explained their desire for a low-cost solar-powered wireless router. Rockefeller put up 20,000 in prize money to post an InnoCentive Challenge to design a suitable wireless router. The Challenge was posted for two months at InnoCentive. 400 people downloaded the Challenge, and 27 people submitted solutions. The prize was awarded to a 31-year old Texan software engineer named Zacary Brown, who delivered exactly the kind of design that ASSET was looking for; a prototype is now being built by engineering students at the University of Arizona.

Let’s come back to the big picture. These new forms of contribution – blogs, wikis, online markets and so forth – might sound wonderful, but you might reasonably ask whether they are a distraction from the real business of doing science? Should you blog, as a young postdoc trying to build up a career, rather than writing papers? Should you contribute to Wikipedia, as a young Assistant Professor, when you could be writing grants instead? Crucially, why would you share ideas in the manner of open notebook science, when other people might build on your ideas, maybe publishing papers on the subjects you’re investigating, but without properly giving you credit?

In the short term, these are all important questions. But I think a lot of insight into these questions can be obtained by thinking first of the long run.

At the beginnning of the 17th century, Galileo Galilei constructed the first astronomical telescope, looked up at the sky, and turned his new instrument to Saturn. He saw, for the first time in human history, Saturn’s astonishing rings. Did he share this remarkable discovery with the rest of the world? He did not, for at the time that kind of sharing of scientific discovery was unimaginable. Instead, he announced his discovery by sending a letter to Kepler and several other early scientists, containing a latin anagram, “smaismrmilmepoetaleumibunenugttauiras”. When unscrambled this may be translated, roughly, as “I have discovered Saturn three-formed”. The reason Galileo announced his discovery in this way was so that he could establish priority, should anyone after him see the rings, while avoiding revealing the discovery.

Galileo could not imagine a world in which it made sense for him to freely share a discovery like the rings of Saturn, rather than hoarding it for himself. Certainly, he couldn’t share the discovery in a journal article, for the journal system was not invented until more than 20 years after Galileo died. Even then, journals took decades to establish themselves as a legitimate means of sharing scientific discoveries, and many early scientists looked upon journals with some suspicion. The parallel to the suspicion many scientists have of online media today is striking.

Think of all the knowledge we have, which we do not share. Theorists hoard clever observations and questions, little insights which might one day mature into a full-fledged paper. Entirely understandably, we hoard those insights against that day, doling them out only to trusted friends and close colleagues. Experimentalists hoard data; computational scientists hoard code. Most scientists, like Galileo, can’t conceive of a world in which it makes sense to share all that information, in which sharing information on blogs, wikis, and their descendents is viewed as being (potentially, at least) an important contribution to science.

Over the short term, things will only change slowly. We are collectively very invested in the current system. But over the long run, a massive change is, in my opinion, inevitable. The advantages of change are simply too great.

There’s a story, almost certainly apocryhphal, that the physicist Michael Faraday was approached after a lecture by Queen Victoria, and asked to justify his research on electricity. Faraday supposedly replied “Of what use is a newborn baby?”

Blogs, wikis, open notebooks, InnoCentive and the like aren’t the end of online innovation. They’re just the beginning. The coming years and decades will see far more powerful tools developed. We really will enormously scale up scientific conversation; we will scale up scientific collaboration; we will, in fact, change the entire architecture of expert attention, developing entirely new ways of navigating data, making connections and inferences from data, and making connections between people.

When we look back at the second half of the 17th century, it’s obvious that one of the great changes of the time was the invention of modern science. When historians look back at the early part of the twentyfirst century, they will also see several major changes. I know many of you in this room believe that one of those changes will be related to the sustainability of how humans live on this planet. But I think there are at least two other major historical changes. The first is the fact that this is the time in history when the world’s information is being transformed from an inert, passive, widely separated state, and put into a single, unified, active system that can make connections, that brings that information alive. The world’s information is waking up.

The second of those changes, closely related to the first, is that we are going to change the way scientists work; we are going to change the way scientists share information; we are going to change the way expert attention itself is allocated, developing new methods for connecting people, for organizing people, for leveraging people’s skills. They will be redirected, organized, and amplified. The result will speed up the rate at which discoveries are made, not in one small corner of science, but across all of science.

Quantum information and computation is a wonderful field. I was touched and surprised by the invitation to speak tonight. I have, I think, never felt more honoured in my professional life. But, I trust you can understand when I say that I am also tremendously excited by the opportunities that lie ahead in doing science online.