  • Miniature Pearl
    • Cosma Shalizi on Judea Pearl on causality. Pearl is one of the leading figures in understanding causal inference. When I hear the old line that “correlation doesn’t imply causation” a little voice inside always asks “so what _exactly_ does imply causation?” Pearl seems to understand the answer to that question as well as anyone.

There is no single future for scientific journals

A question I sometimes hear which I find odd is “What’s the future of scientific journals?” Often – not always, but often – underlying the question is a presumption that there is a single future for journals. The point of view seems to be that we’ve had journals in the past, and now we have this interesting new medium – the internet – so the big question is how journals are going to evolve, or (if slightly more ambitious) what we’re going to replace them with?

This seems to me a peculiar point of view. The origin of the point of view seems to be the fact that paper is a static, relatively inflexible medium. There’s only a limited number of things you can do with paper and a printing press, so scientific publishing to date has ended up concentrated in just a few forms (journals, monographs, textbooks, and a few others). This monolithic character leads to a presumption that scientific communication will continue to evolve in a monolithic way.

The problem with this point of view is that computers and the network are extraordinarily flexible. If you believe AI enthusiasts, computers will eventually end up smarter than us, along pretty much every axis. Imagine a medium that’s smarter, more flexible, and faster than us. What could it be used to do? Of course, the dreams of the AI enthusiasts are quite some ways off. But even now, the internet is an extraordinarily flexible medium. Paper can’t even begin to compare: we’re talking about a single medium that supports World of Warcraft, Intellipedia (collaborative data sharing for spooks), and flash mobs for pillow fighters. We’re not going to have a single future for scientific journals; asking what THE scientific journal of the future will be makes no more sense than asking a programmer what THE program of the future will be. What we will have instead is an increasing number of ways of sharing scientific information, and, in many cases, of doing science. We’re seeing signs of this fragmentation already, from video journals to slide sharing services to all sorts of databases.

There will, of course, be some concentration in particular formats and platforms. Network effects in science are strong – we don’t make discoveries alone, we make them as part of a larger culture of discovery! – and this will drive the broad adoption of shared platforms (and, for that matter, of open standards). But there’s no reason at all to think that there will be just a single platform or standard, not when there’s so much to be gained from multiple approaches.

I should make it clear that I think journals will play a role in all of this. There’s a great deal to be said for having a narrative to explain a new discovery. But we should expect a gradual proliferation in formats and platforms, and (inevitably) for conventional journal articles to recede to be just one of many ways new science is communicated. If that doesn’t happen, then we’re failing to take proper advantage of this new medium. This is what I think successful scientific publishers will do in the future. They’ll be the ones who create the platforms and standards scientists use to communicate science, and, in many cases, to actually do science. But scientific journals don’t have a single future.

  • Marginal Revolution: How should economists integrate their personal and professional lives?
    • “In many ways the core of blogging is a willingness to apply what you know to every problem you encounter, and see how good a job you can do of it in a more or less integrated fashion.”
  • Who Will Determine Who Pays for Equality in Health Care?
    • “Imagine that someone invented a pill… the Dorian Gray pill, after the Oscar Wilde character. Every day that you take the Dorian Gray, you will not die, get sick, or even age…The catch? A year’s supply costs $150,000.

      Anyone who is able to afford this new treatment can live forever. Certainly, Bill Gates can afford it. Most likely, thousands of upper-income Americans…shell out $150,000 a year for immortality.

      Most Americans, however, would not be so lucky. Because the price of these new pills well exceeds average income, it would be impossible to provide them for everyone, even if all the economy’s resources were devoted to producing Dorian Gray tablets.

      So here is the hard question: How should we, as a society, decide who gets the benefits…? Are we going to be health care egalitarians and try to prohibit Bill Gates from using his wealth to outlive Joe Sixpack? Or are we going to learn to live (and die) with vast differences in health outcomes? Is there a middle way?”

  • Amir Ban on Deep Junior « Combinatorics and more
    • Nice short history of computer chess.
  • Is the Internet melting our brains? | Salon Books
    • Apparently not. Who knew?
  • Astronomy Picture of the Day Aug 4 2009: Discussion
    • “In my opinion, your image also highlighted a relatively new variation of human collective intelligence. APOD is not only a picture web site — its readership define perhaps the most collectively intelligent group of sky enthusiasts in history in terms of identifying sky phenomena. The debate that took place over your image — and several other images as well — was amazing. In my opinion, this power intelligence engine zeroed in on the right answer. And your image has helped measure and calibrate this intelligence. In the future, I hope to write a paper about the powerful collective intelligence that APOD has become, and I hope to use your image — given your permission — as one example.”
  • Greg Kroah Hartman on the Linux Kernel
    • Amazing talk by Greg Kroah Hartman on the development process for the Linux Kernel. The rate of change is unbelievable – thousands of lines of code per day, many commits per hour. Loads of details about the technical and social process. All sorts of fault-tolerance in the social process: if someone disappears, the process still grinds on, and produces a reliable product. Well worth watching.
  • Style Guides for Google-originated Open-source Projects
  • A League ladder of PSI openness? | Government 2.0 Taskforce
    • “Why doesn’t Google report on governments’ preparedness to release data. It could produce a methodology and apply it consistently.” Could also be done by a not-for-profit, in a similar way to the reports issued by, e.g., Human Rights Watch, on human rights around the world.
  • Tetris effect – Wikipedia, the free encyclopedia
    • “People who play Tetris for a prolonged amount of time may then find themselves thinking about ways different shapes in the real world can fit together, such as the boxes on a supermarket shelf or the buildings on a street.[1] In this sense, the Tetris effect is a form of habit. They might also see images of falling Tetris shapes at the edges of their visual fields or when they close their eyes.[1] In this sense, the Tetris effect is a form of hallucination. They might also dream about falling Tetris shapes when drifting off to sleep.[2] In this sense, the Tetris effect is a form of hypnagogic imagery.”
  • World’s best Tetris player
    • This guy is to Tetris what Tiger Woods is to golf. Skip to 4:40 and watch the pieces go invisible.
  • Science fiction: The stories of now – 16 September 2009 – New Scientist
    • A letter from Virginia Woolf to Olaf Stapledon about “Star Maker”: “Dear Mr. Stapledon,

      I would have thanked you for your book before, but I have been very busy and have only just had time to read it. I don’t suppose that I have understood more than a small part – all the same I have understood enough to be greatly interested, and elated too, since sometimes it seems to me that you are grasping ideas that I have tried to express, much more fumblingly, in fiction. But you have gone much further and I can’t help envying you – as one does those who reach what one has aimed at.

      Many thanks for giving me a copy,

      yours sincerely,

      Virginia Woolf”

  • Terry Tao: A speech for the American Academy of Arts and Sciences
    • Terry Tao on how the internet is changing science, especially mathematics.

  • Bin Laden’s Reading List for Americans – The Lede Blog
    • “While Oprah’s seal of approval on a book cover is sought after in America, Osama Bin Laden’s is, to put it mildly, not. On Monday, the authors of three books apparently recommended to American readers by the leader of Al Qaeda in his latest communique might be wondering how one goes about returning an unsolicited endorsement to a shadowy militant who has been in hiding for eight years.”
  • Giles Bowkett: There’s No Such Thing As A Good Client
    • “You don’t want to be in the position of having an idiot boss, quitting your job, working for yourself, and discovering that your new boss is an even bigger idiot.”
  • New paper on “Goal Oriented Communication” « Algorithmic Game Theory
    • “An intriguing paper titled “A Theory of Goal-Oriented Communication” by Oded Goldreich, Brendan Juba, and Madhu Sudan has recently been uploaded to the ECCC, expanding a line of work started by the last two authors here and here. The basic issue studied is how is it possible to effectively communicate without agreeing on a language in advance. The basic result obtained is that, as long as the parties can “sense” whether some progress is made toward their goals, prior agreement about a language is not necessary and a “universal” protocol exists. My nerdier side cannot help but thinking about the application to communicating with an alien species (which I bet the authors did not mention on purpose.)”
  • Post-Medium Publishing
    • Excellent essay on the future of publishing, by Paul Graham.
  • Possible future Polymath projects « Gowers’s Weblog
  • Douglas Adams: How to Stop Worrying and Learn to Love the Internet
    • “people complain that there’s a lot of rubbish online…or that you can’t necessarily trust what you read on the web. Imagine [applying] any of those criticisms to what you hear on the telephone. Of course you can’t ‘trust’ what people tell you on the web anymore than you can ‘trust’ what people tell you on megaphones, postcards or in restaurants… For some batty reason we turn off this natural scepticism when we see things in any medium which require a lot of work or resources to work in, or in which we can’t easily answer back – like newspapers, television or granite. Hence ‘carved in stone.’ What should concern us is not that we can’t take what we read on the internet on trust – of course you can’t, it’s just people talking – but that we ever got into the dangerous habit of believing what we read in the newspapers or saw on the TV… One of the most important things you learn from the internet is that there is no ‘them’ out there. It’s just an awful lot of ‘us’.”
  • [0909.2925] Galaxy Zoo: Exploring the Motivations of Citizen Science Volunteers
    • “The Galaxy Zoo citizen science website invites anyone with an Internet connection to participate in research by classifying galaxies from the Sloan Digital Sky Survey. As of April 2009, more than 200,000 volunteers had made more than 100 million galaxy classifications. In this paper, we present results of a pilot study into the motivations and demographics of Galaxy Zoo volunteers, and define a technique to determine motivations from free responses that can be used in larger multiple-choice surveys with similar populations. Our categories form the basis for a future survey, with the goal of determining the prevalence of each motivation.”
  • Are Your Friends Making You Fat?
    • Fascinating discussion of correlations in social networks.
  • Dean Karnazes – Wikipedia, the free encyclopedia
    • “After [running 50 marathons in 50 days], Karnazes decided to run home to San Francisco from New York City.”
  • The Billion Dollar Gram | Information Is Beautiful
    • Nice visualization of the amounts of money required to do different things.
  • Theory Has Bet On P=NP « Gödel’s Lost Letter and P=NP
    • Thoughtful post questioning the conventional wisdom that P is not equal to NP, and the wisdom of completely ignoring the possibility that P = NP.
  • Galaxy Zoo Blog » She’s an Astronomer: Kate Land
    • Lovely quote from one of the Zoo-builders, Kate Land: “The popularity of the site was absolutely heart-warming. I used to get quite emotional reading emails and posts on the forum from zooites who loved the project and were wild about astronomy. So much of an academic’s work can be remote, abstract, and cut off from the ‘real-world’. And it was just brilliant to work on something that touched so many people.”
  • PLoS Biology: Real Lives and White Lies in the Funding of Scientific Research
    • “The peculiar demands of our granting system have favoured an upper class of skilled scientists who know how to raise money for a big group… They have mastered a glass bead game that rewards not only quality and honesty, but also salesmanship and networking.” I agree with much in this article. Some years back I constructed a list of papers I especially admired, and was surprised to discover that with only a few exceptions they were produced from unfunded research. This was sobering, since it suggests that receiving research grants was (at least according to my judgement of scientific quality) anticorrelated with doing work of the highest quality. Grants seem to be good at sustaining an established area, but not very good at all at producing the conceptual innovations that start new subfields.
  • RSS never blocks you or goes down: why social networks need to be decentralized – O’Reilly Radar
    • Broad survey of peer-to-peer services.
  • Fotopedia: Images for Humanity
    • Collaborative photographic encyclopedia, with generous licensing.

  • Paul Buchheit: Evaluating risk and opportunity (as a human)
    • Excellent post from Paul Buchheit, essentially pointing out that (a) it’s often too damn difficult to figure out expected returns from a course of action; (b) we often can get some picture of the tails of the distribution (what’s the best that can happen, what’s the worst); and (c) in iterated situations, we often care a lot more about the tails, anyway. This is an especially valuable heuristic in situations with limited downside. An example that comes to mind is hiring: there is very limited downside in approaching (potential) superstars, and you’re in an iterated situation, so you may as well swing for the fences. Yet most people think about expected value – “he/she would never want to work here” – and so confine themselves to the middle of the bell curve.
  • I Will Not Read Your Fucking Script
  • The Bio-Economist
    • Survey of the cost of gene sequencing, synthesis etc.
  • Would You Bet Your Life? « Gödel’s Lost Letter and P=NP
    • Richard Lipton on how sure people really are that P is not equal to NP. As in, willing to bet money at high odds.
  • Galaxy Zoo Blog » The Hyper-Velocity Stars Project: Serendipity at its Best
    • Hyper-Velocity Stars are stars moving at very high speeds – typically a percent or more of the velocity of light – relative to other local stars. This is the story of how the Galaxy Zoo Hyper-Velocity star project started small, and then snowballed, with more and more people getting involved.
  • The Canon of Medicine – Wikipedia, the free encyclopedia
    • Remarkable 1025 text by Ibn Sina, apparently describing randomized controlled trials, risk factor analysis, and an awe-inspiring range of treatments, diseases, symptoms, methods of surgery and so on.
  • The New Atlantis: Francis Bacon (1627)
    • Fascinating throughout: “We have three that try new experiments, such as themselves think good. These we call Pioneers or Miners.

      “We have three that draw the experiments of the former four into titles and tables, to give the better light for the drawing of observations and axioms out of them. These we call Compilers.

      “We have three that bend themselves, looking into the experiments of their fellows, and cast about how to draw out of them things of use and practise for man’s life, and knowledge… These we call Dowry-men or Benefactors.

      “Then… we have three that take care… to direct new experiments, of a higher light, more penetrating into nature than the former. These we call Lamps.

      “We have three others that do execute the experiments so directed, and report them. These we call Inoculators.

      “Lastly, we have three that raise the former discoveries by experiments into greater observations, axioms, and aphorisms. These we call Interpreters of Nature.”

  • …My heart’s in Accra » Steven Downes, Anders Sandberg on Cloud Intelligence
  • Anders’ Mad Scientist Page
    • Awesome set of links (many gone, but the titles still amuse): “This page is dedicated to all Seekers of Truth, regardless of how warped the truth may be.”
  • AIP UniPHY
    • Social networking site, aimed at physical scientists.
  • Cathemeral Thinking: What is a magazine?
    • Discussion of what a magazine is from David Harris, the founder of Symmetry magazine. I think the post makes a mistake in conceiving of “the magazine” as some sort of platonic ideal – it’s just a tiny little corner of the enormous space of possible ways of connecting readers and writers. But thought-provoking nonetheless.
  • How all Nigerians Became Scammers. | OoTheNigerian
    • A thoughtful post on modern stereotypes and the damage they can cause. The tune may change but the song remains the same.
  • Open Learning Initiative
  • Semantic Web-related Research using Wikipedia
    • Very little of this actually uses the semantic web in any serious way, but it’s still an interesting list of papers. Lots of articles on automated extraction of information, clustering, topic extraction, recommendation systems, and so on.
  • Eric Schadt – Enlisting Computers to Unravel the True Complexity of Disease – Biography
    • New York Times profile of Eric Schadt, and open approaches to innovation in biology. Interesting, although it would have been a lot better with more concrete detail about open innovation.
  • Paul Krugman: How Did Economists Get It So Wrong?
    • Krugman’s version of economic history. I found it informative and stimulating, even if oversimplified in places. Lurking in the background is the question of what it means to understand a phenomenon. The most obvious candidate is the ability to make predictions, but this seems to me to be neither necessary nor sufficient. It’s bothersome that sometimes knowing more actually leads one to make worse predictions.
  • The Open Dinosaur Project
    • An open invitation for people to help construct a database of skeletal measurements for ornithischian dinosaurs. Anyone can help out – they’re trying to do a comprehensive literature survey.
  • Market Design: Federal Judges Law Clerk Hiring
    • Fascinating summary of work on “cheating” (i.e., not obeying prevailing norms) in a market, in this case the hiring of clerks for Federal Judges in the US, as well as many interesting links to other work on the functioning of that market.
  • Nascent: Andrew Savikas visits Nature
    • Timo Hannay’s (head of notes on Andrew Savikas’ (O’Reilly media) talk at Nature. Many fascinating facts: O’Reilly ebooks outsell print by 2:1; ebook sales doubling every 18 months for last 5 years; “free” is much more complicated than you might think; price discrimination as a useful strategy (technically, this is illegal in the US, for reasons I don’t quite get, although there are easy ways around it); nice analogy to the first TV programs being just like radio.
  • The Trouble with Nonprofits (Aaron Swartz’s Raw Thought)
    • I thought this was interesting, and probably contained a kernel of truth: “What distinguishes people who are great at what they do from those who are just mediocre? The answer, it seems, is feedback.” Swartz gives as examples playing chess (rapid incontrovertible feedback) versus making political predictions (slow, vague feedback, easy to discount or ignore). I suspect that what’s going on in the political pundit case is a different kind of feedback, one not based on how correct the pundit is, but rather based on more superficial traits which make a person seem impressive. I wonder to what extent it’s possible to manufacture (and stick to) feedback methods for one’s work?

  • …My heart’s in Accra » Xiao Qiang and Evgeny Morozov with dueling views of digital activism
    • “Evgeny Morozov offers… a healthy dose of skepticism about the possibility of digital activists changing the world via Facebook and Twitter. He begins with the story of Anders Colding-Jørgensen, a Danish psychologist who created a Facebook activism group to protest the dismantling of Stork Fountain in Copenhagen. Of course, the government wasn’t actually planning on dismantling the fountain, a national symbol. But his Facebook group implied that the fountain was under threat, and from his initial 100 invitations to the group, there were 27,500 members of a Facebook group demanding the fountain be saved within three days. At the peak, two people were joining per minute – Jorgensen decided to end the experiment shortly afterwards. (Amusingly enough, there are still more than 26,000 members, even though the fiction has been well exposed.)”
  • Cool Tools: The United States Constitution
    • “The US Constitution is one of our most remarkable inventions of all time. A lot of people in other countries think so too. It is a robust self-correcting legal OS. But it was written in an arcane code long ago. To make any sense from it you need some help.

      This lively graphic novel adaptation of the Constitution is by far the best aid I’ve found to deciphering its code. It is the comic book version, but rather than dumbing it down, it smartens it up. The graphic novel goes through the Constitution article by article, and explains what each bit means, why it is there, and how it came to be. Like the Bible, the Constitution doesn’t say what you thought it did. I was surprised what was not there as well as what was. I learned tons from this annotation, despite studying it in high school. It renewed my respect for it, and in a way, also makes clear its limitation. I feel I can be a slightly better citizen. Best of all, this book does all that with pictures, which makes it a page-turner.”

  • Wine, Physics, and Song
    • Howard Barnum’s blog. True to the promise of the name, he does indeed cover wine, physics and song, with wine currently having a slight edge over physics, and song well behind. Economics occasionally sneaks in.
  • ongoing: interview with Ravelry’s Casey Jones
    • Interview with Casey Jones, the main developer on Ravelry, the amazingly successful site for knitters and crocheters.
  • EtherPad Blog: Saving is Obsolete
    • Very cool: move to any point in the history of a document: “Have you ever forgot to hit “save” and lost work? Ever wished you could go back to an earlier version of a document to see how the document evolved?

      Now you can. EtherPad keeps track of all your typing in realtime. With our new Time-Slider, you can browse the complete history of a document using a familiar user interface.”

  • Story Time : Common Knowledge
    • Excellent, thoughtful piece from John Wilbanks. Takeaway for me: scientific results need a narrative explanation, so humans can understand them, and a structured machine-readable explanation, so computers can understand them. Who will provide the latter? John points to publishers. I’m doubtful. I wonder whether it can’t be baked into the paper preparation process, the same way blogging platforms like WordPress bake machine-readable metadata automatically into RSS feeds. Tough problem, still.
  • Finding and Fixing Errors in Google’s Book Catalog | Freedom to Tinker
    • “There was a fascinating exchange about errors in Google’s book catalog over at the Language Log recently. We rarely see such an open and constructive discussion of errors in large data sets, so this is an unusual opportunity to learn about how errors arise and what can be done about them… What’s most interesting to me is a seeming difference in mindset between critics like Nunberg on the one hand, and Google on the other. Nunberg thinks of Google’s metadata catalog as a fixed product that has some (unfortunately large) number of errors, whereas Google sees the catalog as a work in progress, subject to continual improvement. Even calling Google’s metadata a “catalog” seems to connote a level of completion and immutability that Google might not assert. An electronic “card catalog” can change every day — a good thing if the changes are strict improvements such as error fixes — in a way that a traditional card catalog wouldn’t.”
  • “Open Access” Journals are Advertising « Algorithmic Game Theory
    • Noam Nisan with some thoughtful concerns about author pays open access. Caveats are necessary (see my comment at the post), but the concerns are worth thinking about.
  • Howard Rheingold : Mindful Infotention: Dashboards, Radars, Filters
    • “Knowing what to pay attention to is a cognitive skill that steers and focuses the technical knowledge of how to find information worth your attention. More and more, knowing where to direct your attention involves a third element, together with your own attentional discipline and use of online power tools – other people. Increasingly, most of the recommendations that make it possible to find fresh and useful signals amid the overwhelming noise of the Internet are social media – online networks that make possible social exchange and relationship. Tuning and feeding our personal learning networks is where the internal and the technological meet the social.”
  • Copenhagen’s Living Library
    • Borrow a human from Copenhagen’s Living Library.
  • Second Skin | Savage Minds
    • “So I just watched Second Skin, a documentary—as far as I know, the only documentary—which focuses squarely on the lives of on-line game players.”
  • Stephen Fry: In search of the planet’s most endangered species | Environment | The Guardian
    • “Are the animals worth saving because they hold an important place in the great interconnected web of existence? Are they worth saving because they might one day yield important clues and compounds to help us with medicine or some other useful technology? Or are they worth saving because they are the beautiful achievement of millions of years of natural selection? Extinction is a natural part of creation, this is unquestionably true: yet no matter what one’s views on climate change or global warming, it is impossible, impossible, to deny that man-made alterations to habitat are threatening thousands of plant and animal species across the planet at an unprecedented rate and scale. So the question is perhaps not “Why should we save them?” but “What right do we have to destroy them?””
  • Exploring dangerous neighborhoods: Latent Semantic Analysis and computing beyond the bounds of the familiar
    • Using data mining to evaluate whether a psychiatric patient poses a danger to themselves or others.
  • Do Bugreporters Become Better Over Time? « NetworkLabs
    • Fascinating study of bug reporting patterns for Mozilla. Takeaways: (1) people rapidly improve the quality of their bug reports; (2) there appears to be a sizeable difference in quality between the bug reports of newbies and experienced developers; and (3) there is a small minority of people who have submitted lots of bugs, but who don’t seem to be any better than the newbies.
  • Intervening in the life cycles of scientific knowledge – Don Swanson
    • “the fragmentation that inevitably accompanies the growth of science has created an altogether different set of problems–as well as opportunities. Interrelationships among the fragments, unnoticed because of the insularity of specialties, have been shown to harbor previously unknown solutions to authentic scientific problems, and so to hold a potential for rejuvenating knowledge that might otherwise be considered obsolete. The invisible growth of relatedness probably follows a combinatorial law and so may far exceed even the explosive growth rates that have characterized both the scientific community and the mountains of print it produces.”
  • A Protocol for Packet Network Intercommunication (Cerf and Kahn, pdf)
    • You’re reading these words over the protocol described in this paper.
  • Galaxy Zoo Blog » A Galaxy Zoo – WorldWide Telescope Mashup
    • “Have you ever found yourself staring proudly at the collection of beautiful and exotic galaxies that fill your favourites list? Have you ever wanted to share these objects with a friend or loved one and realized there was just no easy way to do it? Sure, you can click on the image, delve into SkyServer, and copy and paste one image at a time into an email, but… That gets kind of tedious pretty quickly, and if your favourites list is like mine, it’s not a 5-minute copy and paste kind of task.

      Well, now there is an easier way to inflict your favourites on others.”

  • What Kate Saw in Silicon Valley
    • Summary: startups fail, even ones from famous founders (customers don’t care how famous the founder is); startups completely change what they’re doing, on short timescales; it costs almost nothing to start a startup; the founders are scrappy (an asset not appreciated by most of society); founders are not obviously trying to stand out (see above: customers don’t care how impressive the founder seems); founders need mentors; starting a startup is a very solitary activity. “By inverting this list, we can get a portrait of the “normal” world. It’s populated by people who talk a lot with one another as they work slowly but harmoniously on conservative, expensive projects whose destinations are decided in advance, and who carefully adjust their manner to reflect their position in the hierarchy.”
  • A Computational View of Market Efficiency
    • “We propose to study market efficiency from a computational viewpoint. Borrowing from theoretical computer science, we define a market to be efficient with respect to resources S (e.g., time, memory) if no strategy using resources S can make a profit. As a first step, we consider memory-m strategies whose action at time t depends only on the m previous observations at times t-m,…,t-1. We introduce and study a simple model of market evolution, where strategies impact the market by their decision to buy or sell. We show that the effect of optimal strategies using memory m can lead to “market conditions” that were not present initially, such as (1) market bubbles and (2) the possibility for a strategy using memory m’ > m to make a bigger profit than was initially possible. We suggest ours as a framework to rationalize the technological arms race of quantitative trading firms.”
  • Explicit semantic analysis (pdf)
    • Very interesting paper on extracting concepts from a very large corpus of data (e.g., Wikipedia), using ideas based on latent semantic analysis. The rough idea seems to be to do a singular value decomposition (SVD) of the word-frequency matrix, and then to truncate the SVD to a much smaller “concept space”. The isometries appearing in the SVD can then be used to define a rotation into concept space. These rotations can then be used to compare general phrases, e.g., “the author wrote a blog post” versus “the essayist penned an essay”, seeing how closely they overlap in concept space. For more details, see the paper. I wonder how much the results would be improved by starting with a larger corpus (e.g., Google’s cache of the web).
  • The Semantic Web – my personal (unofficial) FAQ: James Hendler
  • Facebook’s Religion Question Prompts Soul-Searching
    • Facebook gives people a free-form text box to describe their religion. Asking such a personal question gives some surprising answers. My favourite was probably the woman who summed up both her Catholicism and her difficulties with Catholicism by describing her religion as “Matthew 25”. “Jedi” comes in at number 10.

Finding Primes: A Fun Subproblem

The ongoing open mathematics project finding primes aims to find a deterministic algorithm to efficiently generate [tex]k[/tex]-digit primes. The fastest known algorithm seems to be a method of Odlyzko which generates a [tex]k[/tex]-digit prime in time [tex]O(10^{k/2})[/tex]. The people working on the project have made some observations which come tantalizingly close to breaking that barrier. The obstruction is a beautiful little problem that I thought many people might enjoy, and which may well be tractable. If you’re interested in participating in the finding primes project, it might be a good entry point. So if you’ve got a good idea to solve the problem, pitch in and help over at the Polymath blog. But please be polite: read some background first, and take a look at some of the research threads to get a feel for how things work, and what’s already known.

One small caveat: the argument that follows is not in any way mine, it’s all the work of other people! A recent comment thread on this argument starts here.

Let [tex]\pi(x)[/tex] denote the number of primes less than or equal to [tex]x[/tex]. [tex]\pi(x)[/tex] has a lot of structure, and there’s a surprising amount that can be said about it. In particular, the people working on finding primes have figured out a clever way of computing the parity of [tex]\pi(x)[/tex] in time about [tex]x^{5/11+o(1)}[/tex].

Suppose you can find two [tex]k[/tex]-digit numbers [tex]x[/tex] and [tex]y[/tex] such that [tex]\pi(x)[/tex] and [tex]\pi(y)[/tex] have different parities. Set [tex]z = \lfloor (x+y)/2 \rfloor[/tex], i.e., take [tex]z[/tex] to be the midpoint between [tex]x[/tex] and [tex]y[/tex], and compute the parity of [tex]\pi(z)[/tex]. It must differ from either the parity of [tex]\pi(x)[/tex] or the parity of [tex]\pi(y)[/tex]. Keep whichever half of the interval has endpoints of differing parity, and repeat. After [tex]O(k)[/tex] steps, this binary search finds adjacent [tex]k[/tex]-digit numbers [tex]p-1[/tex] and [tex]p[/tex] such that [tex]\pi(p-1)[/tex] and [tex]\pi(p)[/tex] have different parity. Since [tex]\pi[/tex] can increase by at most one between adjacent integers, we conclude that [tex]\pi(p) = \pi(p-1)+1[/tex], and so [tex]p[/tex] must be prime. That takes time [tex]O(10^{(5/11 + o(1)) k})[/tex], and so breaks the barrier set by Odlyzko’s method.
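
To make the binary search concrete, here is a minimal Python sketch. It is only an illustration under loud assumptions: the function names are mine, and pi_parity below is a naive sieve-based stand-in, not the fast [tex]x^{5/11+o(1)}[/tex] parity computation from the project, so the sketch shows the search logic and not the running time.

```python
from math import isqrt


def pi_parity(x):
    """Parity of pi(x), the number of primes <= x.

    ASSUMPTION: naive stand-in (sieve of Eratosthenes) for the fast
    parity computation discussed in the text, which is not shown here.
    """
    if x < 2:
        return 0
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, isqrt(x) + 1):
        if sieve[i]:
            # Cross off the multiples of i, starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve) % 2


def find_prime_between(x, y, parity=pi_parity):
    """Given x < y where pi(x) and pi(y) differ in parity,
    return a prime p with x < p <= y, found by binary search."""
    parity_lo = parity(x)
    if parity_lo == parity(y):
        raise ValueError("pi(x) and pi(y) must have different parity")
    lo, hi = x, y
    # Invariant: parity(lo) == parity_lo and parity(hi) != parity_lo.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if parity(mid) == parity_lo:
            lo = mid
        else:
            hi = mid
    # Now hi == lo + 1 and pi(hi) != pi(lo), so hi is prime.
    return hi


if __name__ == "__main__":
    print(find_prime_between(1000, 1020))
```

With x = 1000 and y = 1020 the search returns 1009. Note that the sketch takes the starting pair as given: it does nothing to address the question of how to find such a pair quickly.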

What’s the problem with this algorithm? The problem is finding the initial [tex]k[/tex]-digit numbers [tex]x[/tex] and [tex]y[/tex] such that [tex]\pi(x)[/tex] and [tex]\pi(y)[/tex] have different parity. It would surprise me a great deal if this weren’t possible, but it’s not obvious (at least to me) how to do it quickly. Is there a fast way of doing this?