Science beyond individual understanding

Two years after the breakup of the Soviet Union, British economist Paul Seabright was talking with a senior Russian official who was visiting the UK to learn about the free market. “Please understand that we are keen to move towards a market system,” the official said. “But we need to understand the fundamental details of how such a system works. Tell me, for example: who is in charge of the supply of bread to the population of London?” [1]

The familiar but still astonishing answer to this question is that in a market economy, everyone is in charge. As the market price of bread goes up and down, it informs our collective behaviour: whether to plant a new wheat field or leave it fallow; whether to open the bakery on the corner you’ve been thinking about; or simply whether to buy two or three loaves of bread this week. The price thus aggregates an enormous amount of otherwise hidden knowledge from all the people interested in the production or consumption of bread, that is, nearly everyone. By using prices to aggregate this knowledge and inform further actions, the market produces outcomes superior to those that even the brightest and best-informed individuals could achieve.

Unfortunately, markets don’t always aggregate knowledge accurately. When participants in a market are mistaken in systematic ways, markets don’t so much aggregate knowledge as they aggregate misunderstanding. The result can be an enormous collective error in judgement; when the misjudgement is revealed, the market crashes.

My subject in this essay is not economics, it’s science. So what’s all this got to do with science?

The connection involves the question of what it means to understand something. In economics, many basic facts, such as prices, have an origin which isn’t completely understood by any single person, no matter how bright or well informed, because none of those people have access to all the hidden knowledge that determines those prices.

By contrast, until quite recently the complete justification for even the most complex scientific facts could be understood by a single person.

Consider, for example, astronomer Edwin Hubble’s discovery in the 1920s of the expansion of the Universe. By the standards of the time, this was big science, requiring a complex web of sophisticated scientific ideas and equipment – an advanced telescope, spectroscopic equipment, and even Einstein’s special theory of relativity. To understand all those things in detail requires years of hard work, but a dedicated person like Hubble could master it all, and so in some sense he completely understood his own discovery of the expansion of the Universe.

Science is no longer so simple; many important scientific facts now have justifications that are beyond the comprehension of a single person.

For example, in 1983 mathematicians announced the solution of an important, longstanding mathematical problem: the classification of the finite simple groups. Work on the proof extended from 1955 to 1983 and required approximately 500 journal articles by about 100 mathematicians. Many minor gaps were subsequently found in the proof, as was at least one serious gap, now thought (by some) to have been resolved; the resolution involved a two-volume, 1300-page supplement to the proof. Although mathematicians are working to simplify the proof, even the simplified proof is expected to be exceedingly complex, beyond the grasp of any single person.

The understanding of results from the Large Hadron Collider (LHC) will be similarly challenging, requiring a deep knowledge of elementary particle physics, many clever ideas in the engineering of the accelerator and the particle detectors, and complex algorithms and statistical techniques. No single person understands all of this, except in broad outline. If the discovery of the Higgs particle is announced next year, there won’t be any single person in the world who can say “I understand how we discovered this” in the same way Hubble understood how he discovered the expansion of the Universe. Instead, there will be a large group of people who collectively claim to understand all the separate pieces that go into the discovery, and how those pieces fit together.

Two clarifications are in order. First, when I say that these are examples of scientific facts beyond individual understanding, I’m not saying a single person can’t understand the meaning of the facts. Understanding what the Higgs particle is requires several years of hard work, but there are many people in the world who’ve done this work and who have a solid grasp of what the Higgs is. I’m talking about a deeper type of understanding: the understanding that comes from grasping the justification of the facts.

Second, I don’t mean that to understand something you need to have mastered all the rote details. If we require that kind of mastery, then there’s no one person who understands the human genome, for certainly no-one has memorized the entire DNA sequence. But there are people who understand deeply all the techniques used to determine the human genome; all that is missing from their understanding is the rote work of identifying all the DNA base pairs. The examples of the LHC and the classification of the finite simple groups go beyond this, for in both cases there are many distinct deep ideas involved, too many to be mastered by any single person.

Science as complex as the LHC and the classification of finite simple groups is a recent arrival on the historical scene. But there are two forces that will soon make science beyond individual understanding far more common.

The first of these forces is rapid internet-fueled growth in the number of large-scale scientific collaborations. In the short term, these collaborations will mostly just crowdsource rote work, as is being done, for example, by the galaxy classification project Galaxy Zoo, and so the results will pose no challenge to individual understanding. But as the collaborations get more sophisticated we can expect to see many more online collaborations that delegate large amounts of specialized work, building up to a whole whose details aren’t fully understood by any single person.

The second of these forces is the use of computers to do scientific work. A nascent example is the proof of the four-colour theorem in mathematics. A small group of mathematicians outlined a proof, but to complete the proof, they had to check a large number of cases of the theorem, more than they could check by hand. Instead, a computer was used to check those cases. This isn’t an instance of science beyond individual understanding, though, because mathematicians familiar with the proof feel the computer was simply doing rote work. But the people doing computational science are getting cleverer in how they use computers to make discoveries. Machine learning, data mining and artificial intelligence techniques are being used in increasingly sophisticated ways to produce real insights, not just rote work. As the techniques get better, the number of insights found will increase, and we can expect to see examples of science beyond individual understanding generated this way: “I don’t understand how this discovery was made, but my computer and I do together”.

More powerful than either of these forces will be their combination: large-scale computer-assisted collaboration. The discoveries from such collaborations may well not be understood by any single individual, or even by a group. Instead, the understanding will reside in the combination of the group and their networked computers.

Such scientific discoveries raise challenging issues. How do we know whether they’re right or wrong? The traditional process of peer review and the criterion of reproducibility work well when experiments are cheap, and one scientist can explain to another what was done. But they don’t work so well as experiments get more expensive, when no one person fully understands how an experiment was done, and when experiments and their analyses involve reams of data or ideas.

Might we one day find ourselves in a situation like in a free market where systematic misunderstandings can infect our collective conclusions? How can we be sure the results of large-scale collaborations or computing projects are reliable? Are there results from this kind of science that are already widely believed, maybe even influencing public policy, but are, in fact, wrong?

These questions bother me a lot. I believe wholeheartedly that new tools for online collaboration are going to change and improve how science is done. But such collaborations will be no good if we can’t assess the reliability of the results. And it would be disastrous if erroneous results were to have a major impact on public policy. We’re in for a turbulent and interesting period as scientists think through what’s needed to arrive at reliable scientific conclusions in the age of big collaborations.

Acknowledgements

Thanks to Jen Dodd for providing feedback that greatly improved an early draft of this essay. The essay was stimulated in part by the discussion during Kevin Kelly’s session at Science Foo Camp 2008. Thanks to all the participants in that discussion.

Footnote

[1] “Who is in charge of the supply of bread to the population of London?” – see Paul Seabright’s The Company of Strangers.

12 comments

Bee says:

September 24, 2008 at 3:40 pm

Wise words, Michael. We might already be in such a situation “where systematic misunderstandings can infect our collective conclusions”, even in sciences. How would we find out before the collapse? Any suggestions?

Social issues can become very important in these cases. Just yesterday I talked to somebody who told me about the case of the IKB (one of the German banks that suffered in the mortgage crisis). Apparently, they had mathematicians warning that the models used for the risk analysis were inappropriate. As a result, these people were fired, the argument being ‘everybody does it, it has to be right’. That kind of thinking is a huge problem, which requires that we pay a lot of attention to the way arguments are conducted. It also requires that we take sufficient time to learn what can be learned.

Here is a related concern I have been worried about: suppose we have this growing body of knowledge, and more information is added every day. Now you say we might arrive at a point where no single person can understand it all. But you seem to assume that individuals together still connect it all. What if it falls apart? What if we simply add pieces of information too fast to assemble them into useful and coherent knowledge? Or is this already the case? Consider the unfortunate gap we have between the social and natural sciences, which makes exactly this kind of problem so hard to communicate.
Pingback: CoreEcon » Blog Archive » Understanding Science
hal says:

September 24, 2008 at 7:47 pm

I think collective misunderstanding is already happening and having an impact. In paper reviewing and grant proposal reviewing, I’ve come across other reviewers saying things like “so-and-so tried this and it doesn’t work.” It’s something that you can try to fight against, but it still happens. Similar things happen on the positive side as well. Incorrect or misleading results are published, people remember the takeaway message but not the details, and then contradictory evidence is often ignored (because that’s how humans behave, or so my psychologist friends tell me). I agree that it may become worse in the future because we’ll cease to understand everything individually, but I think it’s already happening because of misleading information.
John Sidles says:

September 27, 2008 at 6:55 am

In general, I am no great fan of business books, but one outstanding exception is Burgelman and Grove Strategy Is Destiny: How Strategy-Making Shapes a Company’s Future (the company referred to is Intel).

By substituting “scientific discipline” for “company”, one can read Burgelman and Grove’s book as an extended case study of models for scientific development that (read with thought) are largely consistent with the ideas of Michael’s essay.

To give one example of the influence this book has on our UW QSE Group’s research strategy, Burgelman and Grove document that Intel took care to maintain internal control of (1) process technology and (2) design tools. This is simply the common-sense strategy that Shaquille O’Neal memorably expressed as “A person has to control their own cartoon.”

From a purely factual point of view, there’s not much in Intel’s process technology and design tools that’s not in the peer-reviewed scientific literature. Where Intel has displayed unmatched skill is (in Michael’s phrase) understanding the literature.

An important question for 21st century science, therefore, is “to what extent (if any) can Intel-style integrated scientific understanding exist as a shared resource, rather than held in private hands?”

Our QSE Group takes the point of view that the LAPACK family of codes is an outstanding example of a shared resource (it embodies an integrated mathematical understanding of linear algebra). It is easy to take this resource for granted, but if LAPACK and its derivatives suddenly disappeared, almost all scientific software would cease to operate.

Are there any 21st century integrative opportunities that are similarly exciting? To use Shaquille O’Neal’s phrase — What “cartoons” are most important for us scientists to control?

Well … we scientists presently have *many* cartoons in development … in pretty much every area of science. The Human Genome Project, the Digital Sky Survey, and the Large Hadron Collider are just the beginning. Broadly speaking, these immense enterprises reflect a trend in which 20th century scientific traditions of experiment and theory are evolving into larger-scale 21st century enterprises centered upon observation and simulation. And there is no obvious bound to the scale of enterprises that humanity can now contemplate attempting.

I don’t often express definite ethical opinions … for me the world has a lot of “grey” in it … but here I will express the personal opinion that the scientific community has an important obligation to keep key process technologies and simulation tools open … because this is a logical extension of our present commitment to keep the scientific literature open.

That is why (after some discussion) our UW QSE Group now releases our QSEPACK simulation tools under the GPL licenses. As with LAPACK in the twentieth century, we take the view that simulation tools in the twenty-first century will be an important venue for expressing our integrated understanding of the scientific literature.

It follows (in our view) that basic simulation tools should continue to be as open as the scientific literature itself.
misanthropope says:

October 27, 2008 at 4:03 am

it is pretty rare when i get to be the voice of optimism.

systematic misunderstandings have _always_ infected our collective conclusions. culture can pretty much be defined as ‘systematic misunderstandings’.

but as you pointed out, being able to apply a theory to make a testable prediction is much easier than fully appreciating the genesis of that theory. working backwards from a falsified prediction to find the failure in the nuts and bolts is likewise fairly approachable.
Pingback: rianjs.net » A little morning reading
Al Gored says:

April 21, 2011 at 1:24 am

“Might we one day find ourselves in a situation like in a free market where systematic misunderstandings can infect our collective conclusions?”

That day arrived some time ago.

And the use of the term “free market” is rather quaint, but useful for this discussion.

Perhaps relevant, the “free market” consensus is often wrong, sometimes spectacularly wrong.
Erl Happ says:

April 21, 2011 at 11:52 am

There is no substitute for a synthesis that arises from a broad overview. So the work of people like Keynes and Marx is vital. Had Keynes persuaded those who set up the system that governs the manner in which international exchange is conducted (after WW2) to be less selfish in their approach, we might have avoided the global financial crisis that is with us now. The overhang of US national debt and the imbalance in the trade account are a product of men with small minds, limited imaginations and utterly selfish viewpoints.

One cannot afford to get bogged down in the detail. The narrowing of the focus of secondary and tertiary education is dangerous.

The story about the supply of bread is apt. Economics is an integrative discipline and we need an education for the best and brightest with that broad focus.
Pingback: (Some) garbage in, gold out | Michael Nielsen
Pingback: The ‘Scarce Talent’ Con « Disseminus
Pingback: Metrics Remixed: The Times They Are A Webby | InTechWeb Blog
Pingback: Hidden knowledge | Climate Etc.
