Open source software at centralized servers?

Does anyone know of examples of open source software projects which are developing software that is run on large centralized servers? I can think of one example off the top of my head – Second Life – but can’t think of any others.

(I am, of course, asking for a reason – I’m interested in whether open source might be a viable development model for tools for scientific collaboration and publication.)

My impression at the moment is that there are few centralized web services which are open source. I can think of a couple of natural reasons why this might be the case.

First, there are security issues. In any software development one needs to be sure the programmers are honest – that they are not slipping back doors into the code, or making unethical use of the database. This is potentially harder to control in an open source software project.

Second, although the software may in some sense be owned by the wider community, it does not follow that the servers are. The group that owns the servers has a much greater incentive to contribute than anyone else, which lessens the advantages to be had by open sourcing the project.

Are there any reasons I’m missing? Centralized services other than Second Life which are open source?

Published

Scientific communication in the 21st century

By guest blogger Peter Rohde

In the last year the number of papers I have read in full can easily be counted on my hands. For the most part I only read abstracts. Why is this? Because for most academic works I’m not especially interested in the details of the calculations or the nitty-gritty fine points of the results. That’s something I’ll refer back to if and when I need it. For the most part I’m only interested in understanding what it is that’s been done, what approaches were used to obtain the results, and what the remaining unanswered questions are. Typically these things can be conveyed much more compactly than via a full scientific paper.

Aside from reading abstracts I gain much of my knowledge by speaking to people. This is a particularly useful way of learning for two reasons. Firstly, it is efficient, unlike verbose papers, and secondly it is interactive. If a particular point is not clear to me, I can grill for more detail. So, for the larger part, verbose scientific papers are far less useful to me than are their abstracts or talking to other people. Both of these points concur with the suggestions made in Robin Blume-Kohout’s contribution to this blog, where he advocates the “choose your own adventure” or hierarchically structured model. Evidently, speaking to other people is an example of this model – we prefer the terse over the verbose, with elaborations only when required. In such a structure, as I would envisage it, the abstract would be the root node of a tree. It would summarize the paper in a condensed but completely self-contained way – a micro-publication in very compact form. Each of the components in the abstract could be folded out to reveal further underlying details. This way the content is tailored to each reader. It means that I can continue doing what I normally do – only reading abstracts – with the bonus that if a particular aspect of the abstract interests me, I can delve into it a little further without having to read the entire paper.
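The tree structure described above can be sketched in a few lines of code. This is a minimal illustration, not a real publishing format: the `Node` class, its fields, and the example paper are all invented for the sketch.

```python
# Minimal sketch of a hierarchically structured paper, as described above.
# The class name, fields, and example content are hypothetical illustrations.

class Node:
    def __init__(self, summary, children=None):
        self.summary = summary          # terse, self-contained statement
        self.children = children or []  # fold-out sub-points with more detail

    def outline(self, depth=0, max_depth=0):
        """Render the tree down to max_depth; max_depth=0 is just the abstract."""
        lines = ["  " * depth + self.summary]
        if depth < max_depth:
            for child in self.children:
                lines.extend(child.outline(depth + 1, max_depth))
        return lines

paper = Node(
    "We show X implies Y.",
    children=[
        Node("Method: reduction of X to Z.",
             children=[Node("Lemma: Z is equivalent to Y under condition C.")]),
        Node("Open question: does Y hold without condition C?"),
    ],
)

print("\n".join(paper.outline()))             # root summary only
print("\n".join(paper.outline(max_depth=2)))  # fold out two levels of detail
```

The reader chooses `max_depth`: an abstract-only reader stops at the root, while an interested reader folds out exactly the branches they care about.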

This type of scientific communication lends itself exclusively to online publication. Indeed, electronic media provide a plethora of new ways to structure and modularize information. Despite this, scientific publication has been stuck in a time warp in which the archaic form of publication has been preserved. Essentially, present day electronic publications are structured and organized in exactly the same way as printed publications were 50 years ago, the only difference being that an LCD replaces paper. This is a sad misuse of resources.

Almost every other aspect of e-society has adopted, to some extent, the ideas advocated here and by Robin. Wikipedia is the obvious, and perhaps most sophisticated, example of this. Here every point in every article cross-references to other articles, creating a highly modularized and hierarchical structure. There are also less obvious examples. These days I never purchase newspapers, and it’s not an issue of saving money; it’s an issue of structural design. If I go to any major online news source, I’m presented with a very elegantly structured, hyperlinked front page. At the top of the page are all the headlines, each with a single-line summary. Below this are divisions for international news, politics, technology, science, etc., each with their own headlines and single-line summaries. In principle I could read just the front page and have a pretty good idea of what’s going on in the world, and if I want more detail I can follow the links. This is much more efficient than the style adopted by many conventional newspapers of having one main story on the front page in addition to a few other headlines crammed at the bottom of the page, and all the rest jammed into separate pullouts.

Another area where the e-world is a step ahead of the paper world is in creating awareness of content. In present day scientific communication awareness of articles is created via two primary means. The first is by speaking with fellow scientists who draw our attention to articles that interested them. The second is by stumbling across things by oneself, for example, by reading the daily arXiv feeds. The trouble is that nowadays there is so much throughput that it becomes increasingly difficult to keep track of it all. A good analogy is the internet itself. Clearly the amount of material becoming available online is far too large for anyone to manage alone. So, to create awareness of things that are of general interest, sites such as Slashdot, reddit and Digg have emerged. All these sites use some voting mechanism to create a list of pages that are of most interest to the online community. I think we are rapidly reaching the point where coping with the massive quantity of scientific communication will necessitate these kinds of approaches.

Another example of awareness creation, which is perhaps more suited to scientific publication, is that of recommendation systems. Some well known examples of recommendation systems are Amazon, iTunes, StumbleUpon and Last.fm. Here users’ preferences for pages/books/music are tracked, but not with the intention of creating a popularity list. Instead the preferences are hidden and only used internally by the service provider, who cross-correlates your preferences with other users’ to suggest pages/books/music that might be of interest to you. This approach to discovering material is clearly much more effective than trawling through the immense amount of material out there on my own. Instead I can exploit the fact that others have done it for me.
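The cross-correlation idea above can be sketched with a toy user-based recommender. Everything here – the users, ratings, and function names – is invented for illustration; real services use far more sophisticated variants of the same idea.

```python
# Toy sketch of preference cross-correlation: recommend items liked by
# users whose ratings correlate with yours. All data is invented.
from math import sqrt

ratings = {  # user -> {item: rating}
    "alice": {"paper_a": 5, "paper_b": 4, "paper_c": 1},
    "bob":   {"paper_a": 4, "paper_b": 5, "paper_d": 4},
    "carol": {"paper_c": 5, "paper_d": 2},
}

def similarity(u, v):
    """Cosine similarity over the items both users rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    num = sum(ratings[u][i] * ratings[v][i] for i in common)
    den = (sqrt(sum(ratings[u][i] ** 2 for i in common)) *
           sqrt(sum(ratings[v][i] ** 2 for i in common)))
    return num / den

def recommend(user):
    """Rank unseen items by similarity-weighted ratings from other users."""
    scores = {}
    for other in ratings:
        if other == user:
            continue
        w = similarity(user, other)
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + w * r
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # ['paper_d']
```

Alice never rated paper_d, but both users whose tastes overlap with hers did, so it surfaces as a recommendation – the "others have done it for me" effect in miniature.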

In summary, the structure of present day scientific communication is inherently archaic. First, it replaces paper with LCD while taking little advantage of the abundance of possibilities for structuring information. Second, the sheer magnitude of scientific communication necessitates new means for creating awareness of material, using, for example, recommendation systems. While it’s very easy for me to sit here and hurl criticism at the current system, it’s not so straightforward to actually effect a transition to a different model. One route would be to convince a major publisher to adopt some of the aforementioned suggestions, and hope that it’s a success. The other would be to set up a new system (e.g. a wiki or the like) and convince a group of reputable scientists to transition to that system. In either case, the success of the pursuit would require a certain critical mass.


Changing fields

After 12 years of work on quantum information and quantum computation, I’ve decided to shift my creative work to a completely new direction.

I’m making this shift because I believe I can contribute more elsewhere.

I became interested in quantum information and computation in 1992, and started working full-time on it in 1995. When I started it was a tiny field with a handful of practitioners around the world. Most scientists hadn’t even heard of quantum computers. Those few who had would often use what they’d heard to pour cold water on the idea of ever being able to build one. Now, in 2007, the field is one of the hottest in physics, and many researchers, myself included, believe it is only a matter of time and concentrated effort before a large-scale quantum computer is built.

To me this seems a propitious time to change direction.

The new direction I’ll be working toward is the development of new tools for scientific collaboration and publication. This is a tremendously exciting area, and it’s also one where my skills and interests seem likely to be useful. I’m a beginner in the area, and so for the next few months, I’ll be doing a “reconnaissance in force”, orienting myself, figuring out what I need to learn, where I might be able to make a contribution, and launching some small projects. It ought to be a blast.


Reinventing scientific papers

By guest blogger Robin Blume-Kohout

In 2005, Slate published twelve essays on “How to reinvent higher education”. The opening paragraphs of one, by Alison Gopnik, still burn in my mind:

I’m a cognitive scientist who is also a university professor. There is a staggering contrast between what I know about learning from the lab and the way I teach in the classroom. … I know that children, and even adults, learn about the everyday world around them in much the way that scientists learn. … Almost none of this happens in the average university classroom, including mine. In lecture classes, the teacher talks and the students write down what the teacher says. In seminars, the students write down what other students say. This is, literally, a medieval form of learning.

In short, we are screwing up — and we should know better.

Scientific publishing — the primary means by which we communicate with other scientists — is in the same boat:

  1. We’re doing it badly,
  2. Our methods are medieval,
  3. We should know better.

Technically, point #2 is unfair. Scientific publishing dates from the 1660s, when Proceedings of the Royal Society emerged from Henry Oldenburg’s voluminous scientific correspondence. If you wanted to show off your research in 1665, you wrote a letter to Henry. When he got it (a month or two later), he forwarded it to someone who could tell him whether it was any good. If the referee liked it, then (after a few more month-long postal delays), Henry read your letter out loud to the Royal Society, and it got recorded in the Proceedings.

These days, it’s quite different. Specifically:

  1. We write letters in LaTeX, and email them,
  2. There are so many journals that nobody reads most of them,
  3. Henry doesn’t read your letter out loud.

The rest of the system is unchanged. This raises a bunch of questions, like “Why does publication take 6 months?”, “Why is it so expensive?”, and “Does anybody read journals, what with the arXiv?” I’m not going to discuss these questions, but if you’re interested, you might try the Wikipedia article on scientific journals. Which is a perfect example of why we should know better.

I’m not talking about the content. I’m talking about the article itself, and how I referenced it — with a hyperlink. I’ve given you incredible power. Quickly and easily, you can:

  • Verify my sources,
  • Find answers to questions I’ve raised — if you’re interested,
  • Get more detailed explanations,
  • Discover and explore related topics.

Enabling you this way is part of the core mission: The purpose of scientific communication is to educate, extensibly and efficiently. Education: After months of research, I publish a paper so that you can learn what I know — without all the hard work. Extensibility: I include proofs, arguments, figures, explanations, and citations — so that you can verify my work and place it in the context of prior work. Efficiency: Writing this way takes more months — but thousands of my colleagues can save months by reading my paper.

We are failing at efficiency, for Wikipedia illustrates a more efficient way of educating — or, if you prefer, a source for more efficient learning. I don’t mean that Wikipedia is The Answer. We need to build a new medium, replacing medieval features with the best features of Wikipedia. For instance,

  • Hypertext revolutionizes scientific writing, by organizing content as a tree instead of a list. Articles and textbooks have a linear structure. To find a specific answer, I have to read (on average) half the text. In a hypertext environment like Wikipedia, I can search through a cluster of ideas for answers — even to questions I haven’t been able to formulate yet. Hyperlinking specifically enables…
  • “Choose your own adventure” approaches to a body of work. Scientific papers represent a cluster of related ideas. Different readers, with different background knowledge, will benefit from different paths. A well-structured (and judiciously hyperlinked) electronic text can become the reader’s personalized guide. Parts of several such texts can be combined by a customized path, to form an entirely new text. This requires…
  • Modular content, dividing a text into bite-sized chunks. Modularity also offers intrinsic benefits. One is reusability; a single explanation can be referenced in many contexts. Current scientific writing is necessarily terse. Hyperlinks and modularity allow the text to be larded with optional explanations, which clarify potential confusion without breaking the flow. Modularity also allows alternative approaches, providing the reader with multiple analyses of the same concept. Such alternatives are particularly useful when combined with…
  • Distributed editing by a large community of contributors. This is a vast can of worms that I shan’t open here, but two things are clear. First, a forum for scientific communication cannot adopt Wikipedia’s “anyone can edit” motto. Second, the potential benefits of post-publication editing, combined with an unlimited pool of “editors”, are too great to ignore. Balancing these imperatives is an outstanding challenge, but a relatively uncontroversial technique is…
  • Attached commentary, either critical or explanatory, by readers. Consider, for example, the Talmud, where post-publication analysis (the Gemara) attempts to clarify the original text (the Mishnah). More recently, commenting systems have proliferated on blogs and (with much, much less intellectual rigor) news-sites like Slashdot. In a scientific publishing context, commentary can
    • correct mistakes, either technical or factual, in the original text,
    • provide an alternative to a module that (the reader feels) could be improved,
    • critique and question the original work,
    • update older work in light of new research.

These points are not a prescription. They are a manifesto (“We can do better, see!”), and a plea (“Help make it better!”). Published scientific communications are the collective memory of scientists. If we cannot access that memory quickly and efficiently, we are effectively brain damaged. Improving our access makes us – quite simply – smarter. All we need to do is to use the computing tools before us intelligently.

We’ve taken first steps — the preprint arXiv, central repositories like PROLA, and online publishing by the likes of Nature. These are baby steps. We’re doing the same old thing a little better with new technology. Sooner or later, scientific communication is going to be restructured to really take advantage of what we can do now… and it’s going to make us (collectively) a lot smarter.

I can’t wait.


How to write consistently boring scientific literature

How to write consistently boring scientific literature, by Kaj Sand-Jensen

Although scientists typically insist that their research is very exciting and adventurous when they talk to laymen and prospective students, the allure of this enthusiasm is too often lost in the predictable, stilted structure and language of their scientific publications. I present here a top-10 list of recommendations for how to write consistently boring scientific publications. I then discuss why we should and how we could make these contributions more accessible and exciting.

Sadly, this is hidden behind a publisher pay wall. I particularly enjoyed the opening quote:

“Hell – is sitting on a hot stone reading your own scientific publications”
– Erik Ursin, fish biologist


Non-abelian money

What would happen if we replaced the current monetary system, which is based on an abelian group [*], with a non-abelian currency system?

[*] If someone gives you x dollars, then y dollars, the result is the same as if you were given y dollars first, then x.

I’ve been puzzling about this for a few years. It raises lots of big questions. How would markets function differently? Might this lead to more efficient allocation of resources, at least in some instances? (At the very least, it’d completely change our notion of what it means to be wealthy!) Might new forms of co-operation emerge? How would results in game theory change if we could use non-abelian payoffs?

More generally, it seems like this sort of idea might be used to look at all of economics through an interesting lens.

A nice toy model in this vein is to work with the group of 2 by 2 invertible matrices, with the group operation being matrix multiplication. By taking matrix logarithms, it can be shown that this model is a generalization of the current monetary system.
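The non-commutativity at the heart of the toy model is easy to exhibit directly. This is a bare-bones sketch in plain Python (the matrices and the helper function are invented for illustration): receiving "payment" A and then B leaves you with a different wealth matrix than receiving B and then A.

```python
# Toy illustration of non-abelian money using invertible 2x2 matrices,
# with matrix multiplication as the group operation. The particular
# matrices are arbitrary examples chosen to show that order matters.

def matmul(a, b):
    """Multiply two 2x2 matrices represented as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1],
     [0, 1]]   # one "payment"
B = [[1, 0],
     [1, 1]]   # another "payment"

print(matmul(A, B))  # [[2, 1], [1, 1]]
print(matmul(B, A))  # [[1, 1], [1, 2]]  -- a different result: non-abelian
```

By contrast, restricting to diagonal matrices with positive entries makes the products commute, and taking logarithms of the entries turns multiplication into addition – which is the sense in which this model generalizes ordinary additive money.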

Electronic implementation of non-abelian money would be a snap. The social implementation might be a bit tougher, however – convincing people that their net wealth should be a matrix would be a tough sell, at least initially. Still, if non-abelian money changed some key results from economics, then in some niches it may be advantageous to make the switch, and possible to convince people that this is a good idea.

(It should, of course, be noted that there are in practice already many effects which make money act in a somewhat non-abelian fashion, e.g., inflation. From the point of view of this post, these are kludges: I’m talking about changing the underlying abstraction to a new one.)


The standard negative referee report

“The work reported in this paper is obvious, and wrong. Besides, I did it all 5 years ago, anyway.”

(I heard this from my PhD supervisor, Carl Caves, about 10 years ago. At the time, I thought it was funny…)


Kasparov versus the World

It is the greatest game in the history of chess. The sheer number of ideas, the complexity, and the contribution it has made to chess make it the most important game ever played.
-Garry Kasparov (World Chess Champion) in a Reuters interview conducted during his 1999 game against the World

In 1999, world chess champion Garry Kasparov, widely acknowledged as the greatest player in the history of the game, agreed to participate in a chess match sponsored by Microsoft, playing against “the World”. One move was made every 24 hours, with the World’s move decided by a vote; anyone at all was allowed to vote on the World Team’s next move.

The game was staggering. After 62 moves of innovative chess, in which the balance of the game changed several times, the World Team finally resigned. Kasparov revealed that during the game he often couldn’t tell who was winning and who was losing, and that it wasn’t until after the 51st move that the balance swung decisively in his favour. After the game, Kasparov wrote an entire book about it. He claimed to have expended more energy on this one game than on any other in his career, including world championship games.

What is particularly amazing is that although the World Team had input from some very strong players, none were as strong as Kasparov himself, and the average quality was vastly below Kasparov’s level. Yet, collectively, the World Team produced a game far stronger than one might have expected from any of the individuals contributing, indeed, one of the strongest games ever played in history. Not only did they play Kasparov at his best, but much of the deliberation about World Team strategy and tactics was public, and so accessible to Kasparov, an advantage he used extensively. Imagine that not only are you playing Garry Kasparov at his best, but that you also have to explain in detail to Kasparov all the thinking that goes into your moves!

How was this remarkable feat achieved?

It is worth noting that another “Grandmaster versus the world” game was played prior to this game, in which Grandmaster and former world champion Anatoly Karpov crushed the World Team. However, Kasparov versus the World used a very different system to co-ordinate the World Team’s efforts. Partially through design, and partially through good luck, this system enabled the World Team to co-ordinate their efforts far better than in the earlier game.

The basic idea used was that anyone in the world could register a vote for their preferred next move. The move taken was whichever garnered the most votes. Microsoft did not release detailed statistics, but claimed that on a typical move more than 5000 people voted. Furthermore, votes came from people at all levels of chess excellence, from chess grandmasters to rank amateurs. On one move, Microsoft reported that 2.4 percent of the votes were cast for moves that were not merely bad, but actually illegal! On other occasions moves regarded as obviously bad by experts obtained up to 10 percent of the vote. Over the course of the match, approximately 50,000 individuals from more than 75 countries participated in the voting.
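The voting scheme described above – plurality voting with illegal ballots discarded – is simple enough to sketch in a few lines. The moves and vote counts below are invented for illustration, not taken from the actual game records.

```python
# Sketch of the World Team's plurality voting: the legal move with the
# most votes is played; votes for illegal moves are discarded.
# All moves and ballots here are invented examples.
from collections import Counter

legal_moves = {"e4", "d4", "Nf3"}
ballots = ["e4", "e4", "d4", "e4", "Qxf7", "Nf3", "d4"]  # "Qxf7" is illegal

valid = [b for b in ballots if b in legal_moves]
tally = Counter(valid)
chosen, count = tally.most_common(1)[0]
print(chosen, count)  # e4 3
```

Even this trivial mechanism reproduces the phenomenon Microsoft reported: a small fraction of ballots (here, one of seven) is cast for illegal moves and simply thrown away before the tally.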

Critical to the experiment were several co-ordinating devices that enabled the World Team to act more coherently.

An official game forum was set up by Microsoft so that people on the World Team could discuss and co-ordinate their ideas.

Microsoft appointed four official advisors to the World Team. These were outstanding teenage chess players, including two ranked as grandmasters, all amongst the best of their age in the world, although all were of substantially lower caliber than Kasparov. These four advisors agreed to provide advice to the World Team, and to make public recommendations on what move to take next.

In addition to these formal avenues of advice, as the game progressed various groups around the world began to offer their own commentary and advice. Particularly influential, although not always heeded, was the GM school, a strong Russian chess club containing several grandmasters.

Most of these experts ignored the discussion taking place on the game forum, and made no attempt to engage with the vast majority of people making up the World Team, i.e., the people whose votes would actually decide the World’s moves.

However, one of the World Team’s advisors did make an effort to engage the World Team. This was an extraordinary young chess player named Irina Krush. Fifteen years old, Krush had recently become the US Women’s chess champion. Although not as highly rated as two of the other World Team advisors, or as some of the grandmasters offering advice to the World Team, Krush was certainly in the international elite of junior chess players.

Unlike her expert peers, Krush focused considerable time and attention on the World Team’s game forum. Shrugging off flames and personal insults, she worked to extract the best ideas and analysis from the forum, as well as building up a network of strong chess-playing correspondents, including some of the grandmasters now offering advice.

Simultaneously, Krush built a publicly accessible analysis tree, showing possible moves and countermoves, and containing the best arguments and refutations for different lines of play, both from the game forum, and from her correspondence with others, including the GM school. This analysis tree enabled the World Team to focus its attention much more effectively, and served as a reference point for discussion, for further analysis, and for voting.

As the game went on, Krush’s role on the World Team gradually became more and more pivotal, despite the fact that according to their relative rankings, Kasparov would ordinarily have beaten Krush easily, unless he made a major blunder.

Part of the reason for this was the quality of Krush’s play. On move 10, Krush suggested a completely novel move that Kasparov called “A great move, an important contribution to chess”, and which all expert analysts agree blew the game wide open, taking it into uncharted chess territory. This raised her standing with the World Team, and helped her assume a coordinating role. Between moves 10 and 50 Krush’s recommended move was always played by the World Team, even when it disagreed with the recommendations of the other three advisors to the World Team, or with influential commentators such as the GM school.

As a result, some people have commented that the game was really Kasparov versus Krush, and Kasparov himself has claimed that he was really playing Smart Chess, Krush’s management team. Krush has repudiated this point of view, commenting on how important many other people’s input was to her recommendations. It seems likely that a more accurate picture is that Krush was at the center of the co-ordination effort for the World Team, and so had a better sense of the best overall recommendation made by the members of the World Team. Other, ostensibly stronger players weren’t as aware of all these different points of view, and so didn’t make as good decisions about what move to make next.

Krush’s coordinating role brought the best ideas of all contributors into a single coherent whole, weeding out bad moves from the good. As the game went on, much stronger players began to channel their ideas through her, including one of the strongest players from the GM school, Alexander Khalifman. The result was that the World Team emerged stronger than any individual player, indeed, arguably stronger than any player in history with the exception of Kasparov at his absolute peak, and with the advantage of being able to see the World “thinking” out loud as they deliberated the best course of action.

Kasparov versus the World is a fascinating case study in the power of collective collaboration. Most encouragingly for us, Kasparov versus the World provides convincing evidence that large groups of people acting in concert can solve creative problems well beyond the reach of any of them alone.

More practically, Kasparov versus the World suggests the value of providing centralized repositories of information which can serve as reference points for decision making and for the allocation of effort. Krush’s analysis tree was critical to the co-ordination of the World Team. It prevented duplication of effort on the part of the World Team, who didn’t have to chase down lines of play known to be poor, and acted as a reference point for discussion, for further analysis, and for voting.

Finally, Kasparov versus the World suggests the value of facilitators who act to channel community opinion. These people must have the respect of the community, but they need not be the strongest individual contributor. If such facilitators are flexible and responsive (without being submissive), they can co-ordinate and focus community opinion, and so build a whole stronger than any of its parts.

Further reading

This essay is an abridged extract from a book I’m writing about “The Future of Science”. If you’d like to be notified when the book is available, please send a blank email to the.future.of.science@gmail.com with the subject “subscribe book”. I’ll email you to let you know in advance of publication. I will not use your email address for any other purpose, nor share it with anyone else.

Subscribe to my blog here.


Links

Konrad Forstner has a very interesting talk on what he sees as the future of scientific communication.

Nature runs a terrific blog, Nascent, which has frequent discussions of the future of science and scientific communication. Most scientific publishers have their heads in the sand about the web. Nature, however, is innovating and experimenting in really interesting ways.

A few more: The Coming Revolution in Scholarly Communications & Cyberinfrastructure, an open access collection of articles by people such as Paul Ginsparg (of arxiv.org), Timo Hannay (Nature), Tony Hey (Microsoft), and many others.

An interesting report by Jon Udell on the use of the web for scientific collaboration. It’s a bit dated in some ways, but in other ways remains very fresh.

Kevin Kelly (founding editor of Wired) speculating on the future of science.

The Django Book, which is a nice example of a book (now published, I believe) that was developed in a very open style, with a web-based commenting system used to provide feedback to the authors as the book was written. I thought about doing something similar with my current book, but concluded that I don’t write in a linear enough style to make it feasible.

An article on open source science from the Harvard Business School.

Fullcodepress, a 24-hour event that’s happening in Sydney as I write. It’s a very cool collaborative project, where two teams are competing to build a fully functional website for a non-profit in 24 hours. Similar in concept to the Startup Weekends that are now springing up all over the place. What, exactly, can a group of human beings achieve when they come together and co-operate really intensively for 24 or 48 hours? Surprisingly much, seems to be the answer.

A thoughtful essay on the problems associated with all the social data people are now putting on the web. Starts from the (common) observation that it would be a lot more useful if it were more publicly available rather than locked up in places like Flickr, Amazon, Facebook, etc, and then makes many insightful observations about how to move to a more open system.

How to read a blog. This is a riff on one of my all-time favourite books, How to read a book, by Mortimer Adler.