(Some) garbage in, gold out
It’s a great question. Suppose it’s announced in the next few years that the LHC has discovered the Higgs boson. There will, no doubt, be a peer-reviewed scientific paper describing the result.
How should we regard such an announcement?
The chain of evidence behind the result will no doubt be phenomenally complex. The LHC analyses about 600 million particle collisions per second. The data analysis is done using a cluster of more than 200,000 processing cores, and tens of millions of lines of software code. That code is built on all sorts of extremely specialized knowledge and assumptions about detector and beam physics, statistical inference, quantum field theory, and so on. Whatsmore that code, like any large software package, no doubt has many bugs, despite enormous effort to eliminate bugs.
No one person in the world will understand in detail the entire chain of evidence that led to the discovery of the Higgs. In fact, it’s possible that very few (no?) people will understand in much depth even just the principles behind the chain of evidence. How many people have truly mastered quantum field theory, statistical inference, detector physics, and distributed computing?
What, then, should we make of any paper announcing that the Higgs boson has been found?
Standard pre-publication peer review will mean little. Yes, it’ll be useful as an independent sanity check of the work. But all it will show is that there’s no glaringly obvious holes. It certainly won’t involve more than a cursory inspection of the evidence.
A related situation arose in the 1980s in mathematics. It was announced in the early 1980s that an extremely important mathematical problem had been solved: the classification of the finite simple groups. The proof had taken about 30 years, and involved an effort by 100 or so mathematicians, spread across many papers and thousands of pages of proof.
Unfortunately, the original proof had gaps. Most of them were not serious. But at least one serious gap remained. In 2004, two mathematicians published a two-volume, 1,200 page supplement to the original proof, filling in the gap. (At least, we hope they’ve filled in the gap!)
When discoveries rely on hundreds of pieces of evidence or steps of reasoning, we can be pretty sure of our conclusions, provided our error rate is low, say one part in a hundred thousand. But when we start to use a million or a billion (or a trillion or more) pieces of evidence or steps of reasoning, an error rate of one part in a million
becomes a guarantee of failure, unless we develop systems that can tolerate those errors.
It seems to me that one of the core questions the scientific community will wrestle with over the next few decades is what principles and practices we use to judge whether or not a conclusion drawn from a large body of networked knowledge is correct? To put it another way, how can we ensure that we reliably come to correct conclusions, despite the fact that some of our evidence or reasoning is almost certainly wrong?
At the moment each large-scale collaboration addresses this in their own way. The people at the LHC and those responsible for the classification of finite simple groups are certainly very smart, and I’ve no doubt they’re doing lots of smart things to eliminate or greatly reduce the impact of errors. But it’d be good to have a principled way of understanding how and when we can come to correct scientific conclusions, in the face of low-level errors in the evidence and reasoning used to arrive at those errors.
If you doubt there’s a problem here, then think about the mistakes that led to the Pentium floating point bug. Or think of the loss of the Mars Climate Orbiter. That’s often described as a failure to convert between metric and imperial units, which makes it sound trivial, like the people at NASA are fools. The real problem was deeper. As a NASA official said:
People sometimes make errors. The problem here was not the error [of unit conversion], it was the failure of NASA’s systems engineering, and the checks and balances in our processes to detect the error. That’s why we lost the spacecraft.
In other words, when you’re working at NASA scale, problems that are unlikely at small scale, like failing to do a unit conversion, are certain to occur. It’s foolish to act as though they won’t happen. Instead, you need to develop systems which limit the impact of such errors.
In the context of science, what this means is that we need new methods of fault-tolerant discovery.
I don’t have well-developed answers to the questions I’ve raised above, riffing off David Weinberger’s original question. But I will finish with the notion that one useful source of ideas may be systems and safety engineering, which are responsible for the reliable performance of complex systems such as modern aircraft. According to Boeing, a 747-400 has six million parts, and the first 747 required 75,000 engineering drawings. Not to mention all the fallible human “components” in a modern aircraft. Yet aircraft systems and safety engineers have developed checks and balances that let us draw with very high probability the conclusion “The plane will get safely from point A to B”. Sounds like a promising source of insights to me!
Further reading: An intriguing experiment in distributed verification of a mathematical proof has been done in an article by Doron Zeilberger. Even if you can’t follow the mathematics, it’s stimulating to look through. I’ve taken a stab at some of the issues in this post before, in my essay Science beyond individual understanding. I’m also looking forward to David Weinberger’s new book about networked knowledge, Too Big To Know. Finally, my new book Reinventing Discovery is about the promise and the challenges of networked science.