The most remarkable graph in the history of sport

The following graph is a histogram of the career batting averages (runs scored per dismissal) of everyone who’s played cricket for their country [1]. It may not be obvious at first glance, but it’s a remarkable graph, even if you don’t give a fig about cricket, or even about sport.


[Figure: histogram of test batting averages, with Bradman’s 99.94 marked by an arrow at the far right; image and credit in the original post]

What makes it remarkable is the barely noticeable bump at the far right of the graph, which I’ve indicated by a blue arrow. It shows the cricket batting average, 99.94, of one Donald Bradman, an Australian batsman who you could plausibly argue was the most outsized talent in any area of human achievement.

To understand how Bradman’s 99.94 average compares with other batsmen, consider that a typical topflight batsman has an average in the range 45 to 55. Batsmen with averages above 55 are once-in-a-generation phenomena who dominate the entire game. After Bradman, the second highest average in history [2] belongs to South African Graeme Pollock, with 60.97, and the third highest to West Indian George Headley, with 60.83.

It’s tempting to think that the greats of other sports, people like Michael Jordan, Wayne Gretzky, and so on, must stand out just as far as Bradman. But a look at the statistics doesn’t back this up. For example, Jordan scored an average of 30.12 points per game, a monumental achievement, but only a fraction ahead of Wilt Chamberlain’s 30.07, with a somewhat larger gap back to Allen Iverson’s 27.73. Following Iverson there are many others with averages of around 26 or 27 points per game.

For comparison, Bradman could have deliberately thrown his innings away for zero (a “duck” in cricket parlance) one time in every three innings, and he’d still have a career average of nearly 67; he’d still be far and away the greatest batsman ever to live. Even if Bradman had deliberately thrown his innings away one time in two, his average would be about 50, and he’d have been a topflight batsman.
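If you want to check that arithmetic yourself, here’s a minimal sketch in Python. It rests on a simplifying assumption (not Bradman’s actual innings-by-innings record): a batting average is runs scored divided by dismissals, and each deliberately thrown-away innings adds one dismissal and no runs.

```python
# Rough check of the "thrown-away innings" arithmetic.
# Simplifying assumption: batting average = runs / dismissals, and each
# deliberate duck adds one dismissal and zero runs.

bradman_average = 99.94

# One duck for every two real innings: runs unchanged, dismissals up by half,
# so the average is multiplied by 2/3.
one_duck_in_three = bradman_average * 2 / 3   # ~66.6, still the best ever

# One duck for every real innings: dismissals double, so the average halves.
one_duck_in_two = bradman_average / 2         # ~50.0, still a top-flight average

print(f"{one_duck_in_three:.1f}  {one_duck_in_two:.1f}")  # 66.6  50.0
```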

Why yes, I am a cricket fan!

[1] Technically, it’s a graph of batting averages in test cricket, the oldest form of the game played internationally. Until the 1970s, it was also the only form of the game played internationally, and so test averages are the only ones relevant to this post. Note that only players with at least 20 innings are included.

[2] Current Australian batsman Michael Hussey has an average of about 70. It remains to be seen if he can keep this up.


Biweekly links for 10/17/2008

  • In Honor of Paul Krugman: Avinash Dixit
    • “…Paul’s typical modus operandi. He spots an important economic issue coming down the pike months or years before anyone else. Then he constructs a little model of it, which offers some new and unexpected insight. Soon the issue reaches general attention, and Krugman’s model is waiting for other economists to catch up. Their reaction is generally a mixture of admiration and irritation. The model is wonderfully clear and simple. But it leaves out so much, and relies on so many special assumptions including specific functional forms, that they don’t think it could possibly do justice to the complexity of the issue. Armies of well-trained economists go to work on it, and extend and generalize it to the point where it would get some respect from rigorous theorists. In this process they do contribute some new ideas and find some new results. But, as a rule, they find something else… His special assumptions go to the heart of the problem, like a narrow and sharp stiletto.”

Click here for all of my del.icio.us bookmarks.


Biweekly links for 10/10/2008

  • The Collapse of Peer Review « The Scholarly Kitchen
    • “Ellison has painstakingly documented the decline of articles published in top economics journals by authors working in the highest-ranked schools. These authors are continuing to publish, but are seeking other outlets, including unrefereed preprint and working paper servers.”
  • Open Access Day – FriendFeed Room
    • A FriendFeed Room for Open Access Day, October 14.
  • Paul Ginsparg: The global-village pioneers
    • Superb. Many choice quotes, including this one: “If scholarly infrastructure can be upgraded to encourage maximal spontaneous participation, then we can expect not only an increasing availability of materials online for algorithmic harvesting — articles, datasets, lecture notes, multimedia and software — but also qualitatively new forms of academic effort. “
  • Ober, J.: Democracy and Knowledge: Innovation and Learning in Classical Athens.
    • “argues that the key to Athens’s success lay in how the city-state managed and organized the aggregation and distribution of knowledge among its citizens. Ober explores the institutional contexts of democratic knowledge management, including the use of social networks for collecting information, publicity for building common knowledge, and open access for lowering transaction costs. He explains why a government’s attempt to dam the flow of information makes democracy stumble. Democratic participation and deliberation consume state resources and social energy. Yet as Ober shows, the benefits of a well-designed democracy far outweigh its costs.”
  • Editorial: APS now leaves copyright with authors for derivative works
    • There are some significant caveats (read the whole thing!), but the thrust is: “When you submit an article to an APS journal, we ask you to sign our copyright form. It transfers copyright for the article to APS, but keeps certain rights for you, the author. We have recently changed the form to add the right to make ‘derivative works’ that reuse parts of the article in a new work.”
  • Timo Hannay: The Future Is A Foreign Country
    • The text for Timo’s superb presentation about the future of scientific publishing at the recent Science in the 21st Century Workshop.

Click here for all of my del.icio.us bookmarks.


Biweekly links for 10/03/2008

  • The story of the WorldWide Telescope « Jon Udell
    • Quoting Jim Gray in 2002: “Most scientific data will never be directly examined by scientists; rather it will be put into online databases where it will be analyzed and summarized by computer programs. Scientists increasingly see their instruments through online scientific archives and analysis tools, rather than examining the raw data. Today this analysis is primarily driven by scientists asking queries, but scientific archives are becoming active databases that self-organize and recognize interesting and anomalous facts as data arrives. “
  • Nascent: Social Not Working?
    • A stimulating talk by Timo Hannay about Science 2.0.
  • Peter Norvig: Presidential Election 2008 FAQ
    • A great deal of useful information about both campaigns.
  • xkcd – Height
    • xkcd does “Powers of Ten”. Very cool.
  • Jay Walker’s Library
    • Lust.
  • Media Bias: Going beyond Fair and Balanced: Scientific American
    • A clever way to test for bias: “Groeling collected two different data sets: in-house presidential approval polling by ABC, CBS, NBC and FOX News and the networks’ broadcasts of such polls on evening news shows from January 1997 to February 2008. Groeling found that, with varying degrees of statistical significance, CBS, NBC and ABC showed what Groeling calls a pro-Democrat bias. For instance, CBS was 35 percent less likely to report a five-point drop in approval for Bill Clinton than a similar rise in approval and was 33 percent more likely to report a five-point drop than a rise for George W. Bush. Meanwhile FOX News showed a statistically significant pro-Republican bias in the most controlled of the three models Groeling tested: its Special Report program was 67 percent less likely to report a rise in approval for Clinton than a decrease and 36 percent more likely to report the increase rather than the decrease for Bush.”
  • Adding Noughts in Vain: Shock: Global Warming Still Happening!
    • Useful discussion of the last 30 years of data on global temperatures.

Click here for all of my del.icio.us bookmarks.


Biweekly links for 09/29/2008

Click here for all of my del.icio.us bookmarks.


Biweekly links for 09/26/2008

Click here for all of my del.icio.us bookmarks.


Science beyond individual understanding

Two years after the breakup of the Soviet Union, British economist Paul Seabright was talking with a senior Russian official who was visiting the UK to learn about the free market. “Please understand that we are keen to move towards a market system,” the official said. “But we need to understand the fundamental details of how such a system works. Tell me, for example: who is in charge of the supply of bread to the population of London?” [1]

The familiar but still astonishing answer to this question is that in a market economy, everyone is in charge. As the market price of bread goes up and down, it informs our collective behaviour: whether to plant a new wheat field, or leave it fallow; whether to open that new bakery you’ve been thinking about on the corner; or simply whether to buy two or three loaves of bread this week. The price thus aggregates an enormous amount of what would otherwise be hidden knowledge from all the people interested in the production or consumption of bread, that is, nearly everyone. By using prices to aggregate this knowledge and inform further actions, the market produces outcomes superior to those that even the brightest and best-informed individuals could achieve.

Unfortunately, markets don’t always aggregate knowledge accurately. When participants in a market are mistaken in systematic ways, markets don’t so much aggregate knowledge as they aggregate misunderstanding. The result can be an enormous collective error in judgement; when the misjudgement is revealed, the market crashes.

My subject in this essay is not economics, it’s science. So what’s all this got to do with science?

The connection involves the question of what it means to understand something. In economics, many basic facts, such as prices, have an origin that isn’t completely understood by any single person, no matter how bright or well informed, because no one has access to all the hidden knowledge that determines those prices.

By contrast, until quite recently the complete justification for even the most complex scientific facts could be understood by a single person.

Consider, for example, astronomer Edwin Hubble’s discovery in the 1920s of the expansion of the Universe. By the standards of the time, this was big science, requiring a complex web of sophisticated scientific ideas and instruments – an advanced telescope, spectroscopic equipment, and even Einstein’s special theory of relativity. To understand all those things in detail requires years of hard work, but a dedicated person like Hubble could master it all, and so in some sense he completely understood his own discovery of the expansion of the Universe.

Science is no longer so simple; many important scientific facts now have justifications that are beyond the comprehension of a single person.

For example, in 1983 mathematicians announced the solution of an important longstanding mathematical problem, the classification of the finite simple groups. The work on this proof stretched from 1955 to 1983, and spanned approximately 500 journal articles by about 100 mathematicians. Many minor gaps were subsequently found in the proof, and at least one serious gap, now thought (by some) to be resolved; the resolution involved a two-volume, 1300-page supplement to the proof. Although mathematicians are working to simplify the proof, even the simplified proof is expected to be exceedingly complex, beyond the grasp of any single person.

The understanding of results from the Large Hadron Collider (LHC) will be similarly challenging, requiring a deep knowledge of elementary particle physics, many clever ideas in the engineering of the accelerator and the particle detectors, and complex algorithms and statistical techniques. No single person understands all of this, except in broad outline. If the discovery of the Higgs particle is announced next year, there won’t be any single person in the world who can say “I understand how we discovered this” in the same way Hubble understood how he discovered the expansion of the Universe. Instead, there will be a large group of people who collectively claim to understand all the separate pieces that go into the discovery, and how those pieces fit together.

Two clarifications are in order. First, when I say that these are examples of scientific facts beyond individual understanding, I’m not saying a single person can’t understand the meaning of the facts. Understanding what the Higgs particle is requires several years’ hard work, but there are many people in the world who’ve done this work and who have a solid grasp of what the Higgs is. I’m talking about a deeper type of understanding: the understanding that comes from knowing the justification for the facts.

Second, I don’t mean that to understand something you need to have mastered all the rote details. If we require that kind of mastery, then there’s no one person who understands the human genome, for certainly no one has memorized the entire DNA sequence. But there are people who understand deeply all the techniques used to determine the human genome; all that is missing from their understanding is the rote work of identifying all the DNA base pairs. The examples of the LHC and the classification of the finite simple groups go beyond this, for in both cases there are many distinct deep ideas involved, too many to be mastered by any single person.

Science as complex as the LHC and the classification of finite simple groups is a recent arrival on the historical scene. But there are two forces that will soon make science beyond individual understanding far more common.

The first of these forces is the rapid, internet-fueled growth in the number of large-scale scientific collaborations. In the short term, these collaborations will mostly just crowdsource rote work, as is being done, for example, by the galaxy classification project Galaxy Zoo, and so the results will pose no challenge to individual understanding. But as these projects become more sophisticated, we can expect to see many more online collaborations that delegate large amounts of specialized work, building up to a whole whose details aren’t fully understood by any single person.

The second of these forces is the use of computers to do scientific work. A nascent example is the proof of the four-colour theorem in mathematics. A small group of mathematicians outlined a proof, but completing it required checking a large number of cases of the theorem, more than they could check by hand, so a computer was used to check those cases. This isn’t an instance of science beyond individual understanding, though, because mathematicians familiar with the proof feel the computer was simply doing rote work. But the people doing computational science are getting cleverer in how they use computers to make discoveries. Machine learning, data mining and artificial intelligence techniques are being used in increasingly sophisticated ways to produce real insights, not just rote work. As the techniques get better, the number of insights found will increase, and we can expect to see examples of science beyond individual understanding generated this way: “I don’t understand how this discovery was made, but my computer and I do together”.

More powerful than either of these forces will be their combination: large-scale computer-assisted collaboration. The discoveries from such collaborations may well not be understood by any single individual, or even by the group as a whole. Instead, the understanding will reside in a combination of the group and their networked computers.

Such scientific discoveries raise challenging issues. How do we know whether they’re right or wrong? The traditional process of peer review and the criterion of reproducibility work well when experiments are cheap, and one scientist can explain to another what was done. But they don’t work so well as experiments get more expensive, when no one person fully understands how an experiment was done, and when experiments and their analyses involve vast amounts of data and many layers of ideas.

Might we one day find ourselves in a situation like that of a free market, where systematic misunderstandings can infect our collective conclusions? How can we be sure the results of large-scale collaborations or computing projects are reliable? Are there results from this kind of science that are already widely believed, maybe even influencing public policy, but that are, in fact, wrong?

These questions bother me a lot. I believe wholeheartedly that new tools for online collaboration are going to change and improve how science is done. But such collaborations will be no good if we can’t assess the reliability of their results. And it would be disastrous if erroneous results were to have a major impact on public policy. We’re in for a turbulent and interesting period as scientists think through what’s needed to arrive at reliable scientific conclusions in the age of big collaborations.

Acknowledgements

Thanks to Jen Dodd for providing feedback that greatly improved an early draft of this essay. The essay was stimulated in part by the discussion during Kevin Kelly’s session at Science Foo Camp 2008. Thanks to all the participants in that discussion.

Further reading

This essay is adapted from a book I’m currently working on about “The Future of Science”. The basic thesis is described here, and there’s an extract here. If you’d like to be notified when the book is available, please send a blank email to the.future.of.science@gmail.com with the subject “subscribe book”. You’ll be emailed when the book is published; your email address will not be used for any other purpose.

Subscribe to my blog here.

You may enjoy some of my other essays.

Footnote

[1] “Who is in charge of the supply of bread to the population of London?” – see Paul Seabright’s The Company of Strangers.


Biweekly links for 09/22/2008

Click here for all of my del.icio.us bookmarks.
