Biweekly links for 11/30/2009

  • How I Hire Programmers (Aaron Swartz’s Raw Thought)
    • Much more interesting than the traditional approach, which seems to be a hybrid of the Microsoft and Google approaches.
  • Wikimedia blog » Blog Archive » Wikipedia’s Volunteer Story
    • “What’s happening to Wikipedia’s volunteer community? Earlier this week, the Wall Street Journal reported that “Volunteers Log Off as Wikipedia Ages”. The article is a comprehensive description of the challenges and opportunities facing the Wikipedia community… A quote from the article: “In the first three months of 2009, the English-language Wikipedia suffered a net loss of more than 49,000 editors, compared to a net loss of 4,900 during the same period a year earlier, according to Spanish researcher Felipe Ortega.”

      Other news stories have further focused on this particular number, some going so far to predict Wikipedia’s imminent demise… It’s understandable that media will look for a compelling narrative. Our job is to arrive at a nuanced understanding of what’s going on. This blog post is therefore an attempt to dig deeper into the numbers and into what’s happening with Wikipedia’s volunteer community, and to describe our big picture strategy.”

Click here for all of my del.icio.us bookmarks.

Biweekly links for 11/27/2009

  • So, Where’s My Robot?
    • Social machine learning blog from Andrea Thomaz
  • Reddit interview with former Director-General of the World Trade Organisation
  • Distilling Free-Form Natural Laws from Experimental Data — Schmidt and Lipson 324 (5923): 81 — Science
    • “A key challenge to finding analytic relations automatically is defining algorithmically what makes a correlation in observed data important and insightful. We propose a principle for the identification of nontriviality. We demonstrated this approach by automatically searching motion-tracking data captured from various physical systems, ranging from simple harmonic oscillators to chaotic double-pendula. Without any prior knowledge about physics, kinematics, or geometry, the algorithm discovered Hamiltonians, Lagrangians, and other laws of geometric and momentum conservation. The discovery rate accelerated as laws found for simpler systems were used to bootstrap explanations for more complex systems, gradually uncovering the “alphabet” used to describe those systems.”
  • Statistical Learning as the Ultimate Agile Development Tool
    • Excellent lecture from Peter Norvig. Interesting ideas include: the phrase “data-driven programming”; evidence that a bad algorithm with lots of data may outperform a good algorithm with less data; the idea that it may be possible to solve very complex problems with incredibly simple programs and lots of data.
  • Rambles at starchamber.com » Blog Archive » Information obesity
    • “It occurred to me that I was suffering from information obesity. Prosperity has caused most of us to go from problems associated too little food to problems associated with too much food. Until you adjust to the change, hoarding and binging can make you fat, sick, and miserable. Once I started thinking about information the same way, I could just picture the greasy fat folds in my brain.”
  • LibriVox
    • “LibriVox provides free audiobooks from the public domain.”
  • Galaxy Zoo Blog » Galaxy Zoo: Understanding Cosmic Mergers
    • A new Galaxy Zoo subproject: “Starting at midnight 11/24, our new site ‘Galaxy Zoo: Understanding Cosmic Mergers’ went on-line as a new project in Galaxy Zoo. In Mergers, we are working to understand the cosmic collisions that lead to galaxy mergers. Every day we will have a new target galaxy that we need your help to model. Based on the basic input parameters that we provide, a Java applet running in your browser will simulate some possible collision scenarios. Computers don’t do a good job comparing simulations and real astronomical images, so we need your help to find out which simulations are the most similar to the real galaxy collision.”
  • The Happiness Project: A Little-Known Occupational Hazard Affecting Writers.
    • Yup: “There’s a very common occupational hazard that affects writers, but I’ve never heard anyone talk about it: the desire to write outside your main field… Of course, you can choose what you write about. You just can’t choose what you want to write about.”
  • Einstein Declines
    • 1952 article in Time Magazine about Einstein declining the offer to become the second President of Israel.


Biweekly links for 11/23/2009

  • Local Bookstores, Social Hubs, and Mutualization « Clay Shirky
    • Shirky on the future of bookstores.
  • Charlie Stross on reading and the book business
    • A very short, interesting tidbit from Stross.
  • bit-player
    • Brian Hayes’ excellent blog on computing and mathematics
  • An Unstoppable Force Meets…: INTERNETPOKERS: Poker Blog
    • Great upheavals in the world of high-stakes online poker.
  • Corrupted Blood incident – Wikipedia, the free encyclopedia
    • “The Corrupted Blood incident was a widely reported virtual plague outbreak and video game glitch found in the … game World of Warcraft… The plague began on September 13, 2005, when an area was introduced in a new update. One boss could cast a spell called Corrupted Blood, which would deal a certain amount of damage over a period of time, and which could be transferred from character to character. It was intended to be exclusive to this area, but players discovered ways to take it out, causing an epidemic across several servers. During the epidemic, some players would help combat the disease by volunteering healing services, while select others would maliciously spread the disease. These people have been compared to real-world disease spreaders, including early AIDS patient Gaëtan Dugas and Typhoid patient Mary Mallon… [World of Warcraft creator] Blizzard [was forced] to do a hard reset of all of its servers for the game.”
  • Access denied? : Article : Nature
    • “Every weekday, thousands of researchers around the world access the Arabidopsis Information Resource (TAIR), which contains the most reliable and up-to-date genomic information available on the most widely used model organism in the plant kingdom. But now, to those users’ horror, TAIR faces collapse: the US National Science Foundation (NSF) is phasing out funding after 10 years as the data resource’s sole supporter (see page 258).

      TAIR’s plight is emblematic of a broader crisis facing many of the world’s biological databases and repositories. Research funding agencies recognize that such infrastructures are crucial to the ongoing conduct of science, yet few are willing to finance them indefinitely. Such agencies tend to support these resources during the development phase, but then expect them to find sustainable funding elsewhere.”


Biweekly links for 11/20/2009

  • Your Looks and Your Inbox « OkTrends
    • Utterly fascinating data-driven look at the dating market from someone who helps run a dating site.
  • Google Scholar now lets you restrict your search to legal opinions and journals
    • I use Scholar’s advanced search pretty often, and only just noticed this – I presume it was added recently. Should be very handy.
  • Damn Cool Pics: Best Hand Painting Art Ever
    • The title appears hyperbolic, but this is remarkable. I often had no idea I was looking at a hand.
  • Steven Pinker on technology
    • “Many of the articles in printed encyclopedias stink — they are incomprehensible, incoherent, and instantly obsolete. The vaunted length of the news articles in our daily papers is generally plumped out by filler that is worse than useless: personal-interest anecdotes, commentary by ignoramuses, pointless interviews with bystanders (“My serial killer neighbor was always polite and quiet”). Precious real-estate in op-ed pages is franchised to a handful of pundits who repeatedly pound their agenda or indulge in innumerate riffing (such as interpreting a “trend” consisting of a single observation). The concept of “science” in many traditional literary-cultural-intellectual magazines… is personal reflections by belletristic doctors. And the policy that a serious book should be evaluated in a publication of record by a single reviewer (with idiosyncratic agendas, hobbyhorses, jealousies, tastes, and blind spots) would be risible if we hadn’t grown up with it.”
  • A Speculative Post on the Idea of Algorithmic Authority « Clay Shirky
    • “when people become aware not just of their own trust but of the trust of others: “I use Wikipedia all the time, and other members of my group do as well.” Once everyone in the group has this realization, checking Wikipedia is tantamount to answering the kinds of questions Wikipedia purports to answer, for that group. This is the transition to algorithmic authority.”
  • Geo Hashing
    • “Geohashing is a method for finding an effectively random location nearby and visiting it: a Spontaneous Adventure Generator. Every day, the algorithm generates a new set of coordinates for each 1°×1° latitude/longitude zone (known as a graticule) in the world. The coordinates can be anywhere — in the forest, in a city, on a mountain, or even in the middle of a lake! Everyone in a given region gets the same set of coordinates relative to their graticule.

      As such, these coordinates can be used as destinations for adventures, à la Geocaching, or for local meetups.”

  • Zeroth Order Approximation: Summary dismissal
    • When is it appropriate to dismiss an idea out of hand? “So I am not opposed in principle to the “summary dismissal” of an idea – a rejection that precedes a full discussion of the factual merits. Such judgments are necessary and inevitable. They are a legitimate part of the practical art of reason. Yet I am uneasy, because this kind of preemptive action carries obvious risks. After all, the idea that I reject might be a good one. If I never grant it a real hearing, how will I ever find out?”
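
The Geohashing entry above says only that "the algorithm" produces one set of coordinates per day. A minimal sketch of how that can work, assuming the commonly published recipe (an MD5 hash of the date joined to that day's Dow Jones opening price, with each half of the hash read as a fraction between 0 and 1). The function names are hypothetical, and the sketch leaves out refinements such as the special handling of regions east of 30°W:

```python
import hashlib
from datetime import date
from typing import Tuple


def geohash_offsets(day: date, dow_open: str) -> Tuple[float, float]:
    """Return (latitude, longitude) fractions in [0, 1) for one day.

    Assumed recipe: MD5 of "YYYY-MM-DD-<Dow opening>", split into two
    64-bit halves, each read as a fraction of 2**64.
    """
    seed = f"{day.isoformat()}-{dow_open}"
    digest = hashlib.md5(seed.encode("ascii")).hexdigest()  # 32 hex digits
    lat_frac = int(digest[:16], 16) / 16 ** 16
    lon_frac = int(digest[16:], 16) / 16 ** 16
    return lat_frac, lon_frac


def geohash_point(graticule: Tuple[int, int], day: date, dow_open: str) -> Tuple[float, float]:
    """Attach the daily fractions to a graticule's integer degrees.

    The fraction extends away from zero, so graticule (37, -122) yields a
    latitude of 37.xxx and a longitude of -122.xxx. The "-0" graticules
    cannot be represented with plain integers and are not handled here.
    """
    lat_frac, lon_frac = geohash_offsets(day, dow_open)
    lat, lon = graticule
    return (
        lat + lat_frac if lat >= 0 else lat - lat_frac,
        lon + lon_frac if lon >= 0 else lon - lon_frac,
    )


if __name__ == "__main__":
    # Illustrative inputs only; the opening price is passed as text.
    print(geohash_point((37, -122), date(2005, 5, 26), "10458.68"))
```

Because the fractions depend only on the date and the opening price, everyone gets the same offsets, which is what lets people in one graticule converge on the same spot.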


Biweekly links for 11/16/2009


The Wikipedia Paradox

To determine whether any given subject deserves an entry, Wikipedia uses the criterion of notability. This leads to an interesting question:

Question 1: What’s the most notable subject that’s not notable enough for inclusion in Wikipedia?

Let’s assume for now that this question has an answer (“The Answer”), and call the corresponding subject X. Now, we have a second question whose answer is not at all obvious.

Question 2: Is subject X notable merely by being The Answer?

If the answer to Question 2 is “no”, then there’s no problem, and we can all go home.

If the answer to Question 2 is “yes”, well, we have a contradiction: X is notable, yet X was chosen precisely because it is not notable enough for inclusion. In a manner similar to the interesting number paradox, it follows that Question 1 must have no answer, and so every conceivable subject must meet Wikipedia’s notability criterion.

Take that, deletionists!

Here’s the amusing thing: whether the answer to Question 2 is yes or no depends on where I publish this analysis. If I publish it on my blog and no-one pays any attention, the answer to Question 2 is, most Wikipedians would likely agree, “no”.

But suppose I went to great trouble to convene a conference series on The Answer, convinced leading logicians and philosophers to take part and write papers about The Answer, persuaded a prestigious journal to publish the proceedings, arranged media coverage, and so on. The Answer would then certainly have exceeded Wikipedia’s notability guidelines, and thus the answer to Question 2 would be “yes”.

In other words, whether this is a paradox or not depends on where it’s been published 🙂

(This line of thought was inspired by a lunchtime conversation two years ago with a group of physicists. I don’t remember who, or I’d spread the blame.)

Update: A number of people have made comments along the lines of “But aren’t you assuming a well-ordering?” / “What if the most notable article isn’t unique?” and so on. It’s easy to modify Question 1 to deal with this: all that’s needed is (a) for the set of non-notable subjects to be well-defined; and (b) for there to be some way to pick out a unique one from that set. Point (a) is, of course, debatable, but outside the scope of the game, which starts by assuming that the Notability policy is well-defined. With that, point (b) follows because the set of possible subjects on Wikipedia is a subset of the set of Unicode strings, and is thus countable: list the strings in some fixed order, and take the first one that qualifies.
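
For the curious, here is a small sketch of point (b). It lists every string over an alphabet in "shortlex" order (shorter strings first, ties broken alphabetically) and returns the first one that passes a test. The function names, the two-letter alphabet, and the toy predicate are all my own stand-ins: nothing here actually evaluates notability, and the search never terminates if no string passes.

```python
from itertools import count, islice, product
from typing import Callable, Iterator


def shortlex(alphabet: str) -> Iterator[str]:
    """Yield every finite string over `alphabet`, shortest first,
    with strings of equal length in alphabet order."""
    for length in count(0):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)


def first_match(alphabet: str, predicate: Callable[[str], bool]) -> str:
    """Return the first string in shortlex order satisfying `predicate`.

    Loops forever if nothing satisfies it, so the candidate set must be
    non-empty, which is exactly what Question 1 assumes.
    """
    return next(s for s in shortlex(alphabet) if predicate(s))


if __name__ == "__main__":
    print(list(islice(shortlex("ab"), 7)))
    # Toy predicate standing in for "is among the tied candidates".
    print(first_match("ab", lambda s: len(s) == 2 and s.startswith("b")))
```

If several subjects tie for "most notable non-notable", taking the earliest of them in this order picks out exactly one.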

Biweekly links for 11/13/2009

  • Polymath and the origin of life « Gowers’s Weblog
    • Tim Gowers has some very interesting ideas for an open science project to come up with a simple theoretical model in which self-replicating organisms are likely to arise spontaneously. In this post he tries to formulate a question or questions such a project could feasibly attack, and discusses what would count as a success.
  • The Law of Unintended Consequences
    • “From 1992 to September 2003, pharmaceutical companies tied up the federal courts with 494 patent suits. That’s more than the number filed in the computer hardware, aerospace, defense, and chemical industries combined. Those legal expenses are part of a giant, hidden “drug tax”–a tax that has to be paid by someone. And that someone, as you’ll see below, is you. You don’t get the tab all at once, of course. It shows up in higher drug costs, higher tuition bills, higher taxes–and tragically, fewer medical miracles.

      So how did we get to this sorry place? It was one piece of federal legislation that you’ve probably never heard of–a 1980 tweak to the U.S. patent and trademark law known as the Bayh-Dole Act. That single law, named for its sponsors, Senators Birch Bayh and Bob Dole, in essence transferred the title of all discoveries made with the help of federal research grants to the universities and small businesses where they were made.”

  • Evil reptilian kitten-eater from another planet – Wikipedia, the free encyclopedia
    • “‘Evil reptilian kitten-eater from another planet (Sorry.)’ was a pejorative used to refer to then Ontario Liberal Party opposition leader Dalton McGuinty in a press release disseminated by the Progressive Conservative Party of Ontario on September 12, 2003, during the provincial election campaign in Ontario, Canada.”
  • Structure+Strangeness: Power laws and all that jazz, redux
    • Aaron Clauset summarizes his review (with Cosma Shalizi and Mark Newman) of how to fit and validate power-law distributions in empirical data. A lot of phenomena that people think follow power laws… well, don’t.
  • bixo
    • Open source Hadoop-based web crawler. Backed by EMI Music and ShareThis.
  • Math Overflow « Combinatorics and more
    • Gil Kalai’s impressions of Math Overflow, the new question and answer site for mathematicians.
  • Charlie’s Diary: Designing society for posterity
    • “So. You, and a quarter of a million other folks, have embarked on a 1000-year voyage aboard a hollowed-out asteroid. What sort of governance and society do you think would be most comfortable, not to mention likely to survive the trip without civil war, famine, and reigns of terror?”
  • History’s greatest comet hunter discovers 1000th comet
    • A fascinating amateur-professional hybrid model for processing data. A wide-field robotic solar observatory takes data, which is then examined by amateurs (and pros), who find comets as they graze the sun. One of the amateurs has discovered more than 150 comets this way, which is an appreciable fraction of all the comets discovered in all of history.
  • Shigeki Murakami: Can comet hunters survive?
    • Despair and elation over new automated telescopes.
  • Barack Obama’s Work in Progress
    • Many interesting tidbits on how Obama writes.
  • Open Knowledge Conference (OKCon) 2010: Call for Proposals
    • “We welcome proposals on any aspect of creating, publishing or reusing content or data that is open in accordance with opendefinition.org.”
  • Scientific software quality: what would it take to convince software engineers?
    • “what would convince you, as a software engineer, that a climate model is of good software quality or not? I asked this question at the CASCON workshop… No one had an answer. In fact, most people just dismissed the question with a laugh. Is it that silly of a question? I think it’s a great one… I’ve asked a few climate scientists the same question in earnest: what convinces you that climate model software is of good quality or not? The answers have been quite varied. Knowing the history of the model, or the development team, the state of the documentation, whether they’ve seen the model code or not, and generally how open the development is, are some of the things that factor into their assessment.”
  • The Sleep Experiment
    • Beautiful virtual choir on YouTube
  • Rough Type: Nicholas Carr’s Blog: Does my tweet look fat?
    • “…it becomes kind of annoying when somebody actually uses the full 140 characters. Jeez, I’m going to skip that tweet. It’s too long.

      The same thing has happened, of course, with texting. Who sends a 160-character text? A 160-character text would feel downright Homeric. And that’s what a 140-character tweet is starting to feel like, too.

      I think our alphabetic system of writing may be doomed. It doesn’t work well with realtime communication. That’s why people are forced to use all sorts of abbreviations and symbols – the alphabet’s just too damn slow. In the end, I bet we move back to a purely hieroglyphic system of writing, with the number of available symbols limited to what can fit onto a smartphone keypad. Honestly, I think that communicating effectively in realtime requires no more than 25 or 30 units of meaning.”

  • To science!
    • Cheers!
  • The original quasar paper: 3C 273 : A Star-Like Object with Large Red-Shift : Nature
    • 3C 273 is approximately 100 times brighter than our Milky Way; it’s probably about the size of our solar system. It’s 2 billion light years away, and can be seen in a good optical amateur telescope. Other quasars had been seen earlier, but this was the paper that nailed what strange objects they are.
  • NYT’s Keller: “What you can do with less, is less” » Nieman Journalism Lab
    • Lengthy, fascinating remarks from the editor of the New York Times, Bill Keller.
  • The problem with data-driven science
    • “However, data-driven science becomes more messy, methodologically and conceptually, when generation and testing of hypotheses are both based on the same, enormous data sets, and when the hypotheses to be tested are products of an automated search for patterns. Thousand-to-one odds in favor of a hypothesis (based on the usual kind of analysis) don’t mean much when a million hypotheses were screened to find it — but the evidence is the same, so what is the problem?

      In other words, What is so special about starting with a human-generated hypothesis? Bayesian methods suggest what I think is the right answer: To get from probabilistic evidence to the probability of something requires combining the evidence with a prior expectation, a “prior probability”, and human hypothesis generation enables this requirement to be ignored with considerable practical success.”

  • An interview with Alain Connes (pdf)
    • Connes on the future of mathematics, the value of freedom in mathematical research, and much else.
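
The quote from "The problem with data-driven science" above turns on a piece of arithmetic that is easy to check. A rough sketch, under assumptions of my own: I read "thousand-to-one odds" as a likelihood ratio of 1000, give a hypothesis thrown up by an automated search over a million candidates a prior of one in a million, and give a hypothesis a person thought worth testing a prior of one half. The function name and all three numbers are illustrative, not taken from the linked post.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back into a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    evidence = 1000.0  # assumed likelihood ratio for "thousand-to-one" evidence
    for label, prior in [("machine-screened", 1e-6), ("human-generated", 0.5)]:
        print(f"{label}: {posterior_probability(prior, evidence):.6f}")
```

On these assumptions the same evidence leaves the machine-screened hypothesis at roughly one chance in a thousand, while the human-generated one ends up near certainty. The evidence is identical; only the prior differs.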


Biweekly links for 11/06/2009

  • Lo and Behold: the Internet
    • “Forty years ago today, a team led by Leonard Kleinrock typed the “Lo” of “Login” into a Stanford computer, which promptly crashed before the command could be entered. But because Kleinrock’s team was sending this message from a UCLA machine, he had just taken part in one of the great milestones in communication history.”
  • iGEM 2009: In the thick of it. – synthesis
    • Rob Carlson on current progress in synthetic biology.
  • The Public Terabyte Dataset project « Elastic Web Mining | Bixolabs
    • “This is a high quality crawl of top web sites, using AWS’s Elastic Map Reduce, Concurrent’s Cascading workflow API, and Bixolab’s elastic web mining platform.

      Hosting for the resulting dataset will be provided by Amazon in S3, and freely available to all EC2 users.

      In addition, the code used to create and process the dataset will be available for download”


Biweekly links for 11/02/2009

  • Charlie’s Diary: How habitable is the Earth?
    • Not very, according to this very interesting article by Charlie Stross.
  • Google Search Guru Singhal: We Will Try Outlandish Ideas – BusinessWeek
    • Informative interview with the head of Google’s search ranking team. They run 6000 experiments per year and make about 500 changes per year to how search ranking works. Academic stemming algorithms don’t really work for Google. They have a very low-friction system for running tests, making it very easy to deploy the large amounts of infrastructure needed to run the tests.
  • The Prime lexicon
    • English words that are prime when interpreted as base 36 numbers – “Animation” is prime, for example. Sentences made up entirely of prime words would be fun, or even a whole book.
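
Checking a word against the Prime lexicon idea takes only a few lines. A minimal sketch, assuming the usual base-36 digits (0-9, then a-z for 10-35, case ignored). The function names are mine, and the primality check is a Miller-Rabin test with a fixed set of bases, which as far as I know is exact for numbers below about 3.3 × 10^24 (words of up to 15 letters) and only a strong probabilistic check beyond that.

```python
_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def word_value(word: str) -> int:
    """Interpret `word` as a base-36 numeral (0-9, then a-z = 10-35)."""
    return int(word.lower(), 36)


def is_prime(n: int) -> bool:
    """Miller-Rabin primality test using the fixed bases in _BASES."""
    if n < 2:
        return False
    for p in _BASES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for a in _BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False  # base a witnesses that n is composite
    return True


def is_prime_word(word: str) -> bool:
    """True if the word's base-36 value is prime."""
    return is_prime(word_value(word))


if __name__ == "__main__":
    for w in ["animation", "prime", "wikipedia"]:
        print(w, word_value(w), is_prime_word(w))
```

Filtering a word list with is_prime_word would give the raw material for those all-prime sentences.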
