February 2009 – Michael Nielsen

Biweekly links for 02/27/2009

Paul Graham: What I’ve Learned from Hacker News
- “When a technology is this young, the existing solutions are usually terrible; which means it must be possible to do much better; which means many problems that seem insoluble aren’t. Including, I hope, the problem that has afflicted so many previous communities: being ruined by growth.”
All Consuming: Giving up the Gun: Japan’s Reversion to the Sword, 1543-1879
- “Guns were introduced to Japan early on with the arrival of the first Europeans in 1543. And yet in 1854 when Cmdr Perry “opened up Japan” in his “black ships”, it seemed like nobody knew much about them. In fact, between those years, Japan had taken up the gun, improved it vastly (initially swordsmen triumphed over gunners because guns were susceptible to wet weather and took a long time to prime and reload), and then gave up on them. This book seeks to find out why.”
Geeking with Greg: Jeff Dean keynote at WSDM 2009
- “Jeff gave several examples of how Google has grown from 1999 to 2009. They have x1000 the number of queries now. They have x1000 the processing power (# machines * speed of the machines). They went from query latency normally under 1000ms to normally under 200ms. And, they dropped the update latency by a factor of x10000, going from months to detect a changed web page and update their search results to just minutes.
  The last of those is very impressive. Google now detects many web page changes nearly immediately, computes an approximation of the static rank of that page, and rolls out an index update. For many pages, search results now change within minutes of the page changing. There are several hard problems there — frequency and importance of recrawling, fast approximations to PageRank, and an architecture that allows rapid updates to the index — that they appear to have solved.”

Click here for all of my del.icio.us bookmarks.

Biweekly links for 02/23/2009

Cathemeral Thinking: Blurts: The value of short, rapid, open communication to collective creativity
Parent of gamer asks his son to honor the Geneva Conventions – Boing Boing
- “I asked Evan to google the Geneva Convention. Then he had to read it and then we had to discuss it. This we did. So the deal is that Evan has to fight according to the rules of the Geneva Convention. If his team-mates violate the Convention then play stops and Call of Duty goes away for a while.
  We’ll see how it goes, but Evan keeps his word. Especially about his games.”
Coding Horror: The Bad Apple: Group Poison
- “Groups of four college students were organized into teams and given a task to complete some basic management decisions in 45 minutes. To motivate the teams, they’re told that whichever team performs best will be awarded $100 per person. What they don’t know, however, is that in some of the groups, the fourth member of their team isn’t a student. He’s an actor hired to play a bad apple… Invariably, groups that had the bad apple would perform worse. And this despite the fact that were people in some groups that were very talented, very smart, very likeable. Phelps found that the bad apple’s behavior had a profound effect — groups with bad apples performed 30 to 40 percent worse than other groups.”


Biweekly links for 02/20/2009

Even more open science? « Tobias J. Osborne’s research notes
- “Whether a theoretical weblog can be truly allowed to be called open notebook science has been questioned recently. I’m not sure where I stand here. Wikipedia’s definition reads: “Open Notebook Science is the practice of making the entire primary record of a research project publicly available online as it is recorded.” This is roughly what I’m trying to do here: I have notebooks containing the records of my research projects and instead of letting them collect dust on my filing cabinet I’m typing them up as I go and sharing them here. (So, naturally, this means you’ll get a lot of dead ends and half explored ideas…) If enough people feel passionately that my weblog doesn’t count then I’m happy to go with the flow and accept whatever definition is deemed more appropriate by those more involved in this kind of thing (open theoretical brain dump, or open theoretical posturing perhaps?)”
Disco
- “Disco is an open-source implementation of the Map-Reduce framework for distributed computing. As the original framework, Disco supports parallel computations over large data sets on unreliable cluster of computers.
  The Disco core is written in Erlang, a functional language that is designed for building robust fault-tolerant distributed applications. Users of Disco typically write jobs in Python, which makes it possible to express even complex algorithms or data processing tasks often only in tens of lines of code. This means that you can quickly write scripts to process massive amounts of data.”
Wikipedia in academic studies – Wikipedia


Biweekly links for 02/13/2009

New Kindle Audio Feature Causes a Stir – WSJ.com
- The Authors Guild, in their great wisdom, on the “read-aloud” feature of the Kindle: “They don’t have the right to read a book out loud,” said Paul Aiken, executive director of the Authors Guild. “That’s an audio right, which is derivative under copyright law.”
Continuous Deployment at IMVU: Doing the impossible fifty times a day. « Timothy Fitz
- Deploying code to production 50 times a day: uses very aggressive and complete testing, and (of course) a fully automated deployment cycle.
i9606: non-anonymous peer review
- Benjamin Good: “I spent this afternoon acting as a voluntarily non-anonymous peer reviewer – its scary. I ended up advocating rejection of the article I was reading and I have to say that Vince Smith … was absolutely right that the act of signing your review “keeps you in check”. Knowing from the outset that your words are going to be linked to your name can really change what you have to say – it certainly makes you think about it for a while longer. It is scary though – I hope that I managed to convey enough of my reasoning and suggestions for ways to improve the article that the authors don’t despise me and attempt to ruin my life… I also hope that the editors of the journal manage to acquire at least one additional reviewer for this manuscript – safety in numbers! Or perhaps the editors will strip my name from my comments? Time will tell I guess.”
Wiki Research Bibliography
- A bibliography of publications about wikis and Wikipedia. Many very interesting looking papers, many of which are new to me.
digitalresearchtools
- “This wiki collects information about tools and resources that can help scholars (particularly in the humanities and social sciences) conduct research more efficiently or creatively. Whether you need software to help you manage citations, author a multimedia work, or analyze texts, Digital Research Tools will help you find what you’re looking for. We provide a directory of tools organized by research activity, as well as reviews of select tools in which we not only describe the tool’s features, but also explore how it might be employed most effectively by researchers.”
Seb’s Open Research
- Seb was an early blogger who ran an excellent blog called “Seb’s Open Research”. He’s recently started up blogging again.


Writing to be taken out of context

In a world where it’s becoming easier and easier to reshare information in other forums, where it can be a sign of authorial success to have your words reshared and then reshared again, I’ve noticed that the way I think about what I write is gradually changing.

Increasingly, I write with an eye not only to the immediate way my words will be received, but also with an eye to how they might be read, in the event that they are taken and reshared in other places, perhaps stripped of the context from which they were taken.

This effect has been strongest, by far, in my stream of delicious bookmarks. These are automatically shared not just with people who follow my delicious bookmarks, but also on my friendfeed, and on my blog.

Each of these forums has a slightly different tone and constraints. I find myself sometimes thinking “oh, that description won’t quite work on friendfeed”, when I know that it’s just fine on delicious and on my blog. And I sometimes pause to think about how things might read, when taken and put in a context over which I have no control.

None of this is new, of course – people have always quoted both themselves and each other. Still, I can’t help believing that the frequency and ease of doing this has increased so dramatically over the past few years that it must gradually be coming to strongly affect all those who write on the web.

Bill Thurston on collective progress in mathematics

Apropos the polymath project, a nice quote from Bill Thurston on how progress is made collectively in mathematics (via Cosma and Quomodocumque):

In mathematics, it often happens that a group of mathematicians advances with a certain collection of ideas. There are theorems in the path of these advances that will almost inevitably be proven by one person or another. Sometimes the group of mathematicians can even anticipate what these theorems are likely to be. It is much harder to predict who will actually prove the theorem, although there are usually a few “point people” who are more likely to score. However, they are in a position to prove those theorems because of the collective efforts of the team. The team has a further function, in absorbing and making use of the theorems once they are proven. Even if one person could prove all the theorems in the path single-handedly, they are wasted if nobody else learns them.

There is an interesting phenomenon concerning the “point” people. It regularly happens that someone who was in the middle of a pack proves a theorem that receives wide recognition as being significant. Their status in the community – their pecking order – rises immediately and dramatically. When this happens, they usually become much more productive as a center of ideas and a source of theorems. Why? First, there is a large increase in self-esteem, and an accompanying increase in productivity. Second, when their status increases, people are more in the center of the network of ideas – others take them more seriously. Finally and perhaps most importantly, a mathematical breakthrough usually represents a new way of thinking, and effective ways of thinking can usually be applied in more than one situation.

This phenomenon convinces me that the entire mathematical community would become much more productive if we open our eyes to the real values in what we are doing. Jaffe and Quinn propose a system of recognized roles divided into “speculation” and “proving”. Such a division only perpetuates the myth that our progress is measured in units of standard theorems deduced. This is a bit like the fallacy of the person who makes a printout of the first 10,000 primes. What we are producing is human understanding. We have many different ways to understand and many different processes that contribute to our understanding. We will be more satisfied, more productive and happier if we recognize and focus on this.

Biweekly links for 02/09/2009

RealClimate: On Replication
Creationism Slips Into a Peer-Reviewed Journal | NCSE
- “A strange thing happened in the scientific literature recently. A pair of creationists, who have seemingly legitimate scientific credentials, attempted to publish some creationist assertions in a peer-reviewed journal. Their effort was nearly successful, mostly because they hid their pseudoscience in the middle of the article, surrounded by legitimate scientific discussion of unrelated topics. Luckily, they were caught just in time, and it turned out that they were pretty clumsy. In fact, if they had been just a bit more clever, they might have gotten away with it.”
RealClimate: Antarctic warming is robust
- Fascinating back-and-forth discussion in the comments of the need for reproducible research, and how much disclosure of methods, code, data should be considered full disclosure. You need to skip over a lot of comments (the usual bickering), but it’s worth it.
Inside Google Book Search: 1.5 million books in your pocket
Uncertain Principles: Two Cultures in Beginnings and Endings
- “In the humanities, the whole point of the class is to discuss the books. Nothing useful can be done until and unless the students have had the chance to do the reading. This is why humanities classes tend to let out early on the first day of the term, and have a full class on the last day of the term: the important reading has to be done before class.
  In the sciences, on the other hand, the whole point of class is to give the students enough information to be able to read the textbook and do the problems. The essential step in the learning process is when the students try to apply what they’ve learned to solving problems. This is why science classes tend to have a full class on the first day of the term, and let out early on the last day of the term: the important reading is done after class.”
Twins escape hanging over ID confusion – ABC News
- What DNA testing can’t quite resolve: “A pair of identical twins escaped being convicted and hanged on drugs charges in Malaysia, due to confusion over which one of them was the culprit, reports said Saturday.”


Update on the polymath project

A few brief comments on the first iteration of the polymath project, Tim Gowers’ ongoing experiment in collaborative mathematics:

The project is remarkably active, with nearly 300 substantive mathematical comments in just the first week. It shows few signs of slowing down.
It’s perhaps not (yet) a “massively” collaborative project, but many mathematicians are contributing – a quick pass over the comments suggests that so far 14 or so people have made substantive mathematical contributions, and it seems likely that number will rise further. Unsurprisingly, that number already rises considerably if you include people who have made comments on the collaborative process.
Regardless of the outcome of the project, I expect that many beginning research students in mathematics will find this a great resource for understanding what research is about. It’s a way of seeing research mathematicians as they work – trying ideas out, making occasional errors, backtracking, and so on. I suspect many students will find this incredibly enlightening. To pick just one example of why this may be, my experience is that many beginning students assume that the key to research success lies in having great leaps of insight to solve difficult problems. The discussion shows something quite different: you see excellent mathematicians following up every little lead, trying out many different approaches to problems, seeing many, many ideas fail, and gradually aggregating small insights, as a bigger picture only very slowly emerges.
The discussion so far has been courteous and professional in the highest degree. I suspect such courteous and professional behaviour greatly increases the chances of success in such a collaboration. I’m reminded of the famous Hardy-Littlewood rules for collaboration. Tim Gowers’ rules of collaboration have something of the same flavour.
One might say that this courtesy and professionalism is only to be expected, given the many professional mathematicians participating. Unfortunately, it’s not difficult to find excellent blogs run by professional scientists where the comment sections are notably less courteous and professional. I’ll omit examples.
Initially, I wasn’t so sure about the idea of using the linear medium of blog comments to run such a project. It seemed restrictive to use anything less than a multi-threaded forum, if forum software could be found that was geared towards mathematics. (Something like Google Groups would be good, but it doesn’t provide any way to display mathematics, so far as I’m aware.) The linear format has worked much better than I thought it would. Although at times it makes the discussion difficult to follow, the linear format has the benefit of preventing the conversation (and the collaborative community) from fracturing too much. This may be something to think about for future projects.
Many large-scale collaborative projects make it easy for late entrants to make a contribution. For example, in the Kasparov versus the World chess game, new participants could enter late in the game and come up to speed quickly. This was in part because of the nature of chess (only the current board matters, not past positions), but it was also partially because of the public analysis tree maintained for much of the game by Irina Krush. This acted as a key reference point for World Team decisions, and summarized much of the then-current best thinking about the game. In a similar way, many open source projects encourage late entry, with new participants able to jump in after looking at the existing code base (analogous to the state of the chess board), and the project wiki (analogous to the analysis tree). As the polymath project continues, I hope similar points of entry will enable outsiders to follow what is happening, and to contribute, without necessarily having to follow the entire discussion to that point.

Biweekly links for 02/06/2009

Systeme D: ShareAlike considered harmful for geodata
- Describes some problems that arise from using a Creative Commons ShareAlike license for geodata.
What Contracts Can’t Do: The Limits of Private Ordering in Facilitating a Creative Commons by Niva Elkin-Koren
- “Creative Commons is a non-profit U.S. based organization that operates a licensing platform to promote free use of creative works. The idea is to facilitate the release of creative works under generous license terms that would make works available for sharing and reuse. Creative Commons advocates the use of copyrights in a rather subversive way that would ultimately change their meaning.
  The paper expresses a skeptical view of this worthy pursuit. While I share Creative Commons’ concern with copyright fundamentalism, which inevitably leads to the propertization of everything of value, I am more skeptical of its strategy. The paper explores the legal strategy of Creative Commons and analyzes its potential for enhancing the sharing, distribution and (re)use of creative works.”
Quantum Celebration [Tattoo] | The Loom
- Best tattoo ever.
The Crowd-Sourced Reading List | The Loom | Discover Magazine
- Carl Zimmer’s list of great science writing. I’d add Steven Pinker’s “The Language Instinct” to his list of books.
A Clockwork Black: i was trying to avoid this
- Some of the early history of Amazon EC2.
Science in the open » Best practice for data availability – the debate starts…well over there really
- Cameron Neylon summarizes many of the issues around data and licenses.
Bossa
- Developed by the same group that did SETI@Home (Boinc): “Bossa is an open-source software framework for distributed thinking – the use of volunteers on the Internet to perform tasks that use human cognition, knowledge, or intelligence.
  Bossa minimizes the effort of creating and operating a distributed thinking project. It provides a project web site, hosted on your Linux server, where volunteers go to perform tasks and to interact with other volunteers. All you need to supply are PHP scripts to generate, show, and handle tasks.”
Williams Math/Stat blog
- The entire department of mathematics and statistics at Williams College has a blog.
Frank Morgan: blog
- Blog from Frank Morgan, whose book on geometric measure theory I read and enjoyed many years ago.
Education – Change.org: Snark Attack: UCLA Research Dissing Technology Bombs
- Entertaining and thoughtful response to a recent study published in Science: “Is Technology Producing a Decline in Critical Thinking and Analysis?”
E. Kowalski’s blog › Comments on mathematics, mostly.
- Another astonishing mathematical blog.
The Accidental Mathematician
- Blog from Izabella Laba, a mathematician at UBC.
Consensus Protocols: Paxos at Paper Trail
- Useful overview of the Paxos consensus protocol, as used by Google’s Chubby lock system.
Life at Wal-Mart – Boing Boing
- Interesting story of working at Wal-Mart from Charles Platt.
On new modes of mathematical collaboration « What Is Research?
- Points out many of the flaws with online tools as ways of approaching mathematical collaboration.
Questions of procedure « Gowers’s Weblog
- Tim Gowers’ rules for his ongoing experiment in massive collaboration in mathematics, the Polymath Project.
Open Knowledge Foundation Blog » Blog Archive » Open Data Openness and Licensing
- Excellent thoughtful discussion of open data and licensing. Three points where I disagree: (1) the article underrates the problems that may be caused by licensing incompatibilities – witness all the problems this has caused in the open source world, where the commons has fragmented; (2) the article takes for granted that scientists are going to want open licenses – I don’t see that this is necessarily true, certainly if current norms are encoded in the license; and (3) the article implicitly assumes that the license (not the norm) is how enforcement will be handled, yet I think there is little evidence to suggest that this is true in academic science, where norms are far more often the remedy of choice.
How Not to Lose Face on Facebook, for Professors


The polymath project

Tim Gowers’ experiment in massively collaborative mathematics is now underway. He’s dubbed it the “polymath project” – if you want to see posts related to the project, I suggest looking here.

The problem to be attacked can be understood (though probably not solved) with only a little undergraduate mathematics. It concerns a result known as the Density Hales-Jewett theorem. This theorem asks us to consider the set [tex][3]^n[/tex] of all length [tex]n[/tex] strings over the alphabet [tex]1, 2, 3[/tex]. So, for example, [tex]11321[/tex] is in [tex][3]^5[/tex]. The theorem concerns the existence of combinatorial lines in subsets of [tex][3]^n[/tex]. A combinatorial line is a set of three points in [tex][3]^n[/tex], formed by taking a string with one or more wildcards in it, e.g., [tex]112*1**3\ldots[/tex], and replacing all of the wildcards at once by [tex]1[/tex], then by [tex]2[/tex], and then by [tex]3[/tex], to get the three points. In the example I’ve given, the resulting combinatorial line is:

[tex] \{ 11211113\ldots, 11221223\ldots, 11231333\ldots \} [/tex]
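As a rough illustration of the definition, here is a minimal Python sketch that expands a wildcard template into its combinatorial line. The function name `combinatorial_line` and its parameters are made up for this example, and it uses the finite template `112*1**3` (the example above with the trailing dots dropped).

```python
def combinatorial_line(template, alphabet="123", wildcard="*"):
    """Expand a wildcard template into the points of its combinatorial line.

    Every wildcard is replaced by the same letter, once per letter of the
    alphabet, so a template over {1, 2, 3} yields exactly three points.
    """
    if wildcard not in template:
        raise ValueError("a combinatorial line needs at least one wildcard")
    return [template.replace(wildcard, letter) for letter in alphabet]


if __name__ == "__main__":
    for point in combinatorial_line("112*1**3"):
        print(point)
    # prints:
    # 11211113
    # 11221223
    # 11231333
```

Note that all the wildcards move together: the template never produces a point such as 11211223, where different wildcards take different values.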

The Density Hales-Jewett theorem asserts that for any [tex]\delta > 0[/tex], for sufficiently large [tex]n = n(\delta)[/tex], all subsets of [tex][3]^n[/tex] of size at least [tex]\delta 3^n[/tex] contain a combinatorial line.
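To get a feel for why [tex]n[/tex] has to be large, it can help to experiment with tiny cases. The following is a brute-force sketch in Python, only practical for very small [tex]n[/tex], that searches a subset of [tex][3]^n[/tex] for a combinatorial line. The function name `find_combinatorial_line` is hypothetical, and this is just an exhaustive check over all [tex]4^n[/tex] templates, not anything resembling a proof technique.

```python
from itertools import product


def find_combinatorial_line(subset, n, alphabet="123", wildcard="*"):
    """Return some combinatorial line contained in `subset`, or None.

    Tries every length-n template over alphabet + wildcard that has at
    least one wildcard, and checks whether all of its points are present.
    """
    points = set(subset)
    for template in product(alphabet + wildcard, repeat=n):
        if wildcard not in template:
            continue  # no wildcard means a single point, not a line
        line = [
            "".join(letter if c == wildcard else c for c in template)
            for letter in alphabet
        ]
        if all(p in points for p in line):
            return line
    return None


if __name__ == "__main__":
    n = 2
    everything = ["".join(s) for s in product("123", repeat=n)]
    # The six off-diagonal points of [3]^2: density 2/3, yet no line.
    off_diagonal = [s for s in everything if s[0] != s[1]]

    print(find_combinatorial_line(off_diagonal, n))          # None
    print(find_combinatorial_line(everything, n) is not None)  # True
```

For [tex]n = 2[/tex] the off-diagonal points form a subset of density [tex]2/3[/tex] with no combinatorial line, since every line in [tex][3]^2[/tex] passes through one of [tex]11[/tex], [tex]22[/tex], [tex]33[/tex]. The theorem says that for any fixed density this kind of evasion eventually becomes impossible as [tex]n[/tex] grows.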

Apparently, the original proof of the Density Hales-Jewett theorem used ergodic theory; Gowers’ challenge is to find a purely combinatorial proof of the theorem. More background can be found here. Serious discussion of the problem starts here.