Biweekly links for 09/29/2008

Click here for all of my del.icio.us bookmarks.

Published

Biweekly links for 09/26/2008


Science beyond individual understanding

Two years after the breakup of the Soviet Union, British economist Paul Seabright was talking with a senior Russian official who was visiting the UK to learn about the free market. “Please understand that we are keen to move towards a market system,” the official said, “But we need to understand the fundamental details of how such a system works. Tell me, for example: who is in charge of the supply of bread to the population of London?” [1]

The familiar but still astonishing answer to this question is that in a market economy, everyone is in charge. As the market price of bread goes up and down, it informs our collective behaviour: whether to plant a new wheat field, or leave it fallow; whether to open that new bakery you’ve been thinking about opening on the corner; or simply whether to buy two or three loaves of bread this week. The price thus aggregates an enormous amount of what would otherwise be hidden knowledge from all the people interested in the production or consumption of bread, that is, nearly everyone. By using prices to aggregate this knowledge and inform further actions, the market produces outcomes superior to those that even the brightest and best-informed individuals could achieve alone.

Unfortunately, markets don’t always aggregate knowledge accurately. When participants in a market are mistaken in systematic ways, markets don’t so much aggregate knowledge as they aggregate misunderstanding. The result can be an enormous collective error in judgement; when the misjudgement is revealed, the market crashes.

My subject in this essay is not economics; it’s science. So what’s all this got to do with science?

The connection involves the question of what it means to understand something. In economics, many basic facts, such as prices, have an origin which isn’t completely understood by any single person, no matter how bright or well informed, because none of those people have access to all the hidden knowledge that determines those prices.

By contrast, until quite recently the complete justification for even the most complex scientific facts could be understood by a single person.

Consider, for example, astronomer Edwin Hubble’s discovery in the 1920s of the expansion of the Universe. By the standards of the time, this was big science, requiring a complex web of sophisticated scientific ideas and equipment – an advanced telescope, spectroscopic equipment, and even Einstein’s special theory of relativity. To understand all those things in detail requires years of hard work, but a dedicated person like Hubble could master it all, and so in some sense he completely understood his own discovery of the expansion of the Universe.

Science is no longer so simple; many important scientific facts now have justifications that are beyond the comprehension of a single person.

For example, in 1983 mathematicians announced the solution of an important longstanding mathematical problem, the classification of the finite simple groups. The work on this mathematical proof extended from 1955 to 1983, and required approximately 500 journal articles by 100 mathematicians. Many minor gaps were subsequently found in the proof, and at least one serious gap, now thought (by some) to be resolved; the resolution involved a two-volume, 1300-page supplement to the proof. Although mathematicians are working to simplify the proof, even the simplified proof is expected to be exceedingly complex, beyond the grasp of any single person.

The understanding of results from the Large Hadron Collider (LHC) will be similarly challenging, requiring a deep knowledge of elementary particle physics, many clever ideas in the engineering of the accelerator and the particle detectors, and complex algorithms and statistical techniques. No single person understands all of this, except in broad outline. If the discovery of the Higgs particle is announced next year, there won’t be any single person in the world who can say “I understand how we discovered this” in the same way Hubble understood how he discovered the expansion of the Universe. Instead, there will be a large group of people who collectively claim to understand all the separate pieces that go into the discovery, and how those pieces fit together.

Two clarifications are in order. First, when I say that these are examples of scientific facts beyond individual understanding, I’m not saying a single person can’t understand the meaning of the facts. Understanding what the Higgs particle is requires several years of hard work, but there are many people in the world who’ve done this work and who have a solid grasp of what the Higgs is. I’m talking about a deeper type of understanding, the kind that comes from grasping the justification of the facts.

Second, I don’t mean that to understand something you need to have mastered all the rote details. If we require that kind of mastery, then there’s no one person who understands the human genome, for certainly no-one has memorized the entire DNA sequence. But there are people who understand deeply all the techniques used to determine the human genome; all that is missing from their understanding is the rote work identifying all the DNA base pairs. The examples of the LHC and the classification of the finite simple groups go beyond this, for in both cases there are many distinct deep ideas involved, too many to be mastered by any single person.

Science as complex as the LHC and the classification of finite simple groups is a recent arrival on the historical scene. But there are two forces that will soon make science beyond individual understanding far more common.

The first of these forces is rapid internet-fueled growth in the number of large-scale scientific collaborations. In the short term, these collaborations will mostly just crowdsource rote work, as is being done, for example, by the galaxy classification project Galaxy Zoo, and so the results will pose no challenge to individual understanding. But as the collaborations get more sophisticated we can expect to see many more online collaborations that delegate large amounts of specialized work, building up to a whole whose details aren’t fully understood by any single person.

The second of these forces is the use of computers to do scientific work. A nascent example is the proof of the four-colour theorem in mathematics. A small group of mathematicians outlined a proof, but to complete the proof, they had to check a large number of cases of the theorem, more than they could check by hand. Instead, a computer was used to check those cases. This isn’t an instance of science beyond individual understanding, though, because mathematicians familiar with the proof feel the computer was simply doing rote work. But the people doing computational science are getting cleverer in how they use computers to make discoveries. Machine learning, data mining and artificial intelligence techniques are being used in increasingly sophisticated ways to produce real insights, not just rote work. As the techniques get better, the number of insights found will increase, and we can expect to see examples of science beyond individual understanding generated this way: “I don’t understand how this discovery was made, but my computer and I do together”.

More powerful than either of these forces will be their combination: large-scale computer-assisted collaboration. The discoveries from such collaboration may well not be understood by any single individual, or even by a group. Instead, the understanding will reside inside a combination of the group and their networked computers.

Such scientific discoveries raise challenging issues. How do we know whether they’re right or wrong? The traditional process of peer review and the criterion of reproducibility work well when experiments are cheap, and one scientist can explain to another what was done. But they don’t work so well as experiments get more expensive, when no one person fully understands how an experiment was done, and when experiments and their analyses involve reams of data or ideas.

Might we one day find ourselves in a situation like that of a free market, where systematic misunderstandings can infect our collective conclusions? How can we be sure the results of large-scale collaborations or computing projects are reliable? Are there results from this kind of science that are already widely believed, maybe even influencing public policy, but are, in fact, wrong?

These questions bother me a lot. I believe wholeheartedly that new tools for online collaboration are going to change and improve how science is done. But such collaborations will be no good if we can’t assess the reliability of the results. And it would be disastrous if erroneous results were to have a major impact on public policy. We’re in for a turbulent and interesting period as scientists think through what’s needed to arrive at reliable scientific conclusions in the age of big collaborations.

Acknowledgements

Thanks to Jen Dodd for providing feedback that greatly improved an early draft of this essay. The essay was stimulated in part by the discussion during Kevin Kelly’s session at Science Foo Camp 2008. Thanks to all the participants in that discussion.

Further reading

This essay is adapted from a book I’m currently working on about “The Future of Science”. The basic thesis is described here, and there’s an extract here. If you’d like to be notified when the book is available, please send a blank email to the.future.of.science@gmail.com with the subject “subscribe book”. You’ll receive an email when the book is to be published; your email address will not be used for any other purpose.

Subscribe to my blog here.

You may enjoy some of my other essays.

Footnote

[1] “Who is in charge of the supply of bread to the population of London?” – see Paul Seabright’s The Company of Strangers.

Published
Categorized as Essays

Biweekly links for 09/22/2008


Biweekly links for 09/19/2008


Biweekly links for 09/16/2008

  • Ellen Roche
    • Heartbreaking: “Ellen Roche was a healthy 24 year old lab technician at the Johns Hopkins (JH) Asthma Center. She volunteered to take part in an experiment to understand the natural defenses of healthy people against asthma. Roche was part of a group that inhaled hexamethonium, a drug which induced a mild asthma attack. Physicians stood by in case of complications and to measure how the subjects responded to the asthma attack. Within 24 hours of inhaling the drug, Roche had lost one-third of her lung capacity. Within a month she was dead… Dr. Alkis Togias, the director of the experiment, apparently limited his hexamethonium research to one contemporary textbook and PubMed… PubMed is a premier example of FOS, a contender for FOS at its best. So does the Ellen Roche case prove that FOS is inadequate, even hazardous? How just is this interpretation? What are the lessons of this case for FOS?”
  • Abū Rayhān Bīrūnī – Wikipedia, the free encyclopedia
    • Extraordinary Persian polymath of the 11th century.
  • Caveat Lector » What do we want from IRs, and what are we doing to repository rats?
    • Dorothea Salo on the future of Institutional Repositories.
  • Confessions of a Science Librarian: Science in the 21st Century reading list
    • Really great list of books to read from John Dupuis.
  • A Blog Around The Clock : ScienceOnline’09 – Registration is Open!
    • Building on the very successful Science Blogging 2007 and 2008 events.
  • John Graham-Cumming: Dear Nature
    • “…you want to sell it [the paper] to me for $32. How do you justify selling a PDF of a 76 year old paper that contains just over 700 words for $32?”
  • Paint your roof white, save the planet – Machinist
    • The effects of making urban surfaces white are, apparently, significant. I’m not sure I buy this – the same argument should show that road surfaces have a major greenhouse effect, but I haven’t run the numbers.
  • Uncertain Principles: A Longitudinal Study of Blogging Traffic
    • Chad compares blog traffic for science vs non-science posts, and how they do over time. Do the science posts have greater staying power or not? The data are ambiguous, but if there’s an effect, it’s not large.
  • BarCamp Africa
  • iamelgringo: Mechanical Turk: Now with 25 percent more Awesome.
    • Using crowdsourcing to do data analysis.
  • gapingvoid: good ideas have lonely childhoods
  • Spore’s Piracy Problem – Forbes.com
    • Well, I was all set to go buy Spore this morning. But judging from the reviews the DRM in the game is getting, I think not.
  • Dancing death
    • “Sometime in mid-July 1518, in the city of Strasbourg, a woman stepped into the street and started to dance. She was still dancing several days later. Within a week about 100 people had been consumed by the same irresistible urge to dance.”
  • Peter Suber: More on the arguments to overturn the NIH [open access] policy
  • Internet Bots: Anatomy of a Stock Selling Frenzy
    • Much more about the massive automated United Airlines stock selloff triggered by Google News. Fascinating.
  • How long would it take the LHC to defrost a pizza?: Scientific American Blog
    • Scientific American tackles the big questions.
  • Google News and United Airlines’ share price
    • UAL lost 75% of its market cap over 15 minutes. It’s unclear what happened, but it looks like Google News may have played some role in driving the crash.
  • Has the Large Hadron Collider destroyed the earth yet?
    • Helpful.
  • Uncertain Principles: Micro-Blogging Conference Talks
    • Chad Orzel on the benefits of multiple people simultaneously micro-blogging conference talks.
  • FriendFeed room for “Science in the 21st Century”
  • Science in the 21st Century Talks
    • Video and slides for the talks.
  • Prospects in Theoretical Physics (PiTP) – 2008 | Video Lectures
  • When Academia Puts Profit Ahead of Wonder – NYTimes.com
    • About the Bayh-Dole act, one of the most important pieces of legislation in the 20th century.
  • Mememoir: Wiki For Science
  • Terry Tao’s blog book
  • Twitter / cern
    • Guess who has a Twitter feed?
  • PLoS ONE: Targeted Development of Registries of Biological Parts
    • An analysis of usage patterns in MIT’s Registry of Standard Biological Parts, which is a prototype for open source science.
  • CIA, FBI push ‘Facebook for spies’ – CNN
    • ‘“It’s a place where not only spies can meet but share data they’ve never been able to share before,” Wertheimer [assistant deputy director of national intelligence for analysis] said. “This is going to give them for the first time a chance to think out loud, think in public amongst their peers…’
  • Cosma Shalizi: Collective Cognition
    • A wonderful collection of links on collective cognition.
  • Mark Newman: The first-mover advantage in scientific publication
    • “Mathematical models of the scientific citation process predict a strong “first-mover” effect under which the first papers in a field will, essentially regardless of content, receive citations at a rate enormously higher than papers published later. Moreover papers are expected to retain this advantage in perpetuity – they should receive more citations indefinitely, no matter how many other papers are published after them. We test this conjecture against data from a selection of fields and in several cases find a first-mover effect of a magnitude similar to that predicted by the theory. Were we wearing our cynical hat today, we might say that the scientist who wants to become famous is better off — by a wide margin — writing a modest paper in next year’s hottest field than an outstanding paper in this year’s. On the other hand, there are some papers…that buck the trend and attract significantly more citations than theory predicts despite having relatively late publication dates…”
  • Rob Carlson :: “Biology is Technology”
    • Draft chapters of Rob Carlson’s book on synthetic biology.
  • Brad DeLong and Michael Froomkin: Speculative Microeconomics for Tomorrow’s Economy
    • “[the paper deconstructs] Adam Smith’s case for the market system. It points out three assumptions about production and distribution technologies that are necessary if the invisible hand is to work as Adam Smith claimed it did. We point out that these assumptions are being undermined more and more by the revolutions currently ongoing in data processing and data communications.”
  • LiveScience: Era of Scientific Secrecy Near End
    • An article on open science for a general audience.
  • Augmented Social Cognition: Long Tail of user participation in Wikipedia
    • The (very well-known) blog post which describes the distribution of user edits in Wikipedia. Based on an academic paper, but this observation seems to have been made _after_ the paper the author wrote was finalized, so it’s not actually in the paper.
  • William James – The PhD Octopus
    • “America is thus a nation rapidly drifting towards a state of things in which no man of science or letters will be accounted respectable unless some kind of badge or diploma is stamped upon him, and in which bare personality will be a mark of outcast estate. It seems to me high time to rouse ourselves to consciousness, and to cast a critical eye upon this decidedly grotesque tendency. Other nations suffer terribly from the Mandarin disease. Are we doomed to suffer like the rest?”
  • The world needs more foxes and fewer hedgehogs
    • “Philip Tetlock, who has spent 20 years asking pundits to predict who will win elections, what countries will acquire nuclear weapons or enter the European Union and how the first Gulf war would end… his respondents are not very good. They do better than a chimp who answers at random, but not much, and worse than simple forecasting rules based on extrapolation. But some pundits are better than others. A little knowledge is helpful. Dilettantes – people with the information you will acquire from diligent reading of this newspaper – do much better than undergraduates who based their judgment on a one-page summary of the issues. But experts have little advantage over dilettantes. The reputation of the experts is a guide to which are worth following. But not in the way you might expect. Bad forecasters are consulted more frequently than good ones. The more famous the expert, the worse his prognostications.”
  • Freebase Parallax
    • Very interesting application capable of extracting complex information from Freebase. The video demo is worth watching.
  • Peering into PLoS One comment stats : Deepak Singh
    • Lots of statistics about PLoS One’s experiment with commenting.
  • Vernor Vinge’s View of the Future – Is Technology That Outthinks Us a Partner or a Master ? – John Tierney
  • Offloading Cognition onto Cognitive Technology: Itiel Dror and Stevan Harnad
    • “Cognitive technology allows cognizers to offload some of the functions they would otherwise have had to execute with their own brains and bodies alone; it also extends cognizers’ performance powers beyond those of brains and bodies alone. Language itself is a form of cognitive technology that allows cognizers to offload some of their brain functions onto the brains of other cognizers. Language also extends cognizers’ individual and joint performance powers, distributing the load through interactive and collaborative cognition. Reading, writing, print, telecommunications and computing further extend cognizers’ capacities. And now the web, with its distributed network of cognizers, digital databases and software agents, has become the Cognitive Commons in which cognizers and cognitive technology can interact globally with a speed, scope and degree of interactivity that yield performance powers inconceivable with unaided individual cognition alone.”
  • American lawbreaking: Tim Wu – Slate Magazine
    • “The importance of understanding why and when we will tolerate lawbreaking cannot be overstated. Lawyers and journalists spend most of their time watching the president, Congress, and the courts as they make law. But tolerance of lawbreaking constitutes one of the nation’s other major—yet most poorly understood—ways of creating social and legal policy. Almost as much as the laws that we enact, the lawbreaking to which we shut our eyes reflects how tolerant U.S. society really is to individual or group difference. It forms a major part of our understanding of how the nation deals with what was once called “vice.” While messy, strange, hypocritical, and in a sense dishonest, widespread tolerance of lawbreaking forms a critical part of the U.S. legal system as it functions.”
  • Dani Rodrik’s weblog: Why the econ-blogosphere is here to stay
    • “one of the unexpected scholarly benefits of having a blog is that it is like keeping an intellectual journal. You get an idea, you jot it down in your blog. Some months later, you vaguely remember having had the idea and you google your own blog to recover it. I am not kidding: I google my own blog all the time…”
  • The Value of Openness in Scientific Problem Solving
    • “Openness and free information sharing…are supposed to be core norms of the scientific community… these norms are not universally followed. Lack of openness and transparency means… problem solving is constrained to a few scientists… who typically fail to leverage the entire accumulation of scientific knowledge… We present evidence of the efficacy of problem solving when disclosing problem information. The method’s application to 166 discrete scientific problems from the research laboratories of 26 firms is illustrated. Problems were disclosed to over 80,000 independent scientists… approach solved one-third of a sample of problems that large…R & D-intensive firms had been unsuccessful in solving internally… success was…associated with the ability to attract specialized solvers with…diverse scientific interests…. successful solvers solved problems at the boundary or outside of their fields of expertise, indicating a transfer of knowledge from one field to others.”
  • The Quantum Pontiff : Self-Correcting Quantum Computers, Part I
    • The first of Dave Bacon’s excellent multi-part series about how quantum computers can correct themselves.
  • Uncommon Knowledge and Open Innovation – john wilbanks’ blog – john wilbanks’ blog on Nature Network
    • “We are seeing the transformation of knowledge from something that is primarily conveyed in paper formats into something else: a computable graph, in which the knowledge is written in formats that computers can understand and interconnect, based on the same technologies that underlie the internet and web. Paper technology simply contains expressions of ideas, but the very technology of paper makes integration of ideas very difficult, if not impossible… the idea of “the paper” as the core container for knowledge is dying, and technology will be the killer. This transformation is happening first, like the transformation of documents to the Web, in the sciences.”
  • PolishMyWriting.com
    • Checks your writing against more than 7000 rules of plain language. I put a couple of draft essays through it, and found about half the suggestions helpful, which is a pretty good batting average.
  • Upton Sinclair: “It is difficult to get a man to understand something when his salary depends upon his not understanding it.”
    • It’s curious that often the last people to really grok that a profession is disappearing are the people within the profession itself. This quote of Sinclair’s summarizes a part of why that is.
  • Dorothea Salo: Innkeeper at the Roach Motel
    • ‘Trapped by faculty apathy and library uncertainty, institutional repositories face a crossroads: adapt or die. The “build it and they will come” proposition has been decisively proven wrong. Citation advantages and preservation have not attracted faculty participants, though current-generation software and services offer faculty little else. Academic librarianship has not supported repositories or their managers. Most libraries consistently under-resource and understaff repositories, further worsening the participation gap. Software and services are wildly out of touch with faculty needs and the realities of repository management. These problems are not insoluble, but they demand serious reconsideration of repository missions, goals, and means.’
