SciBarCamp 2

Last year’s SciBarCamp was one of my favourite events ever – here’s a great explanation of why, from Jim Thomas. It’s on again this year, May 8-9, in Toronto. Take a look at the participant list, and sign up (space is limited)! Here’s more, from the organizers:

SciBarCamp is a gathering of scientists, artists, and technologists for a weekend of talks and discussions. The goal is to create connections between science, entrepreneurs and local businesses, and arts and culture.

In the tradition of BarCamps, otherwise known as “unconferences”, the program is decided by the participants at the beginning of the meeting, in the opening reception. SciBarCamp will require active participation; while not everybody will present or lead a discussion, everybody will be expected to contribute substantially – this will help make it a really creative event.

Our venue, Hart House, is a congenial space with plenty of informal areas to work or talk. The space, which made such a wonderful venue for last year’s SciBarCamp, is being made available through a collaboration with Science Rendezvous.

Biweekly links for 04/20/2009

  • singletasking: Caterina Fake
  • Massively Multiplayer Online Game service granted banking license
    • “MMO operator MindArk has been granted a banking license for its virtual world Entropia Universe, by the Swedish Financial Supervisory Authority.

      MindArk says the move will allow it to act as a central bank for all variations of Entropia Universe and integrate the in-game economies with the real world.

      “This is an exciting and important development for the future of all virtual worlds being built using the Entropia Platform,” commented MindArk CEO, Jan Welter Timkrans.

      “Together with our partner planet owner companies we will be in a position to offer real bank services to the inhabitants of our virtual universe.”

      Entropia Universe acts as a platform from which partners can launch virtual worlds within, with the focus being on microtransactions and virtual currency monetisation.”

  • Luis von Blog

Click here for all of my del.icio.us bookmarks.

Biweekly links for 04/17/2009

  • Pooling of Unshared Information in Group Decision Making: Biased Information Sampling During Discussion
    • “Decision-making groups can potentially benefit from pooling members’ information, particularly when members individually have partial and biased information but collectively can compose an unbiased characterization of the decision alternatives. The proposed biased sampling model of group discussion, however, suggests that group members often fail to effectively pool their information because discussion tends to be dominated by (a) information that members hold in common before discussion and (b) information that supports members’ existent preferences. In a political caucus simulation, group members individually read candidate descriptions that contained partial information biased against the most favorable candidate and then discussed the candidates as a group. Even though groups could have produced unbiased composites of the candidates through discussion, they decided in favor of the candidate initially preferred by a plurality rather than the most favorable candidate…”
  • SciBarCamp Toronto 2
    • SciBarCamp Toronto 2 is happening May 8-9, Hart House, Toronto. See the Participant page to register!
  • Killer Bean Forever
    • Feature-length movie animated entirely by one person, Jeff Lew (of The Matrix Reloaded). It will be released on DVD in July (US and Canada).
  • arXiview: A New iPhone App for the arXiv
    • Browse the preprint arXiv from your iPhone.
  • A Comparison of Approaches to Large-Scale Data Analysis
    • “There is currently considerable enthusiasm around the MapReduce (MR) paradigm… Although the basic control flow of this framework has existed in parallel SQL database management systems (DBMS) for over 20 years, some have called MR a dramatically new computing model [8, 17]. In this paper… we evaluate both kinds of systems in terms of performance and development complexity… we define a benchmark consisting of a collection of tasks that we have run on an open source version of MR as well as on two parallel DBMSs. For each task, we measure each system’s performance for various degrees of parallelism on a cluster of 100 nodes… Although the process to load data into and tune the execution of parallel DBMSs took much longer than the MR system, the observed performance of these DBMSs was strikingly better. We speculate about the causes of the dramatic performance difference and consider implementation concepts that future systems should take from both kinds of architectures” (a toy sketch contrasting the two approaches follows this list)
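
To make the paper’s contrast concrete, here is a toy sketch of the two programming styles it compares. This is my own illustration, not code from the benchmark: the MapReduce half is plain Python standing in for a Hadoop job, and the SQL string shows how a parallel DBMS would express the same aggregation declaratively.

    from collections import defaultdict

    # Toy input: (url, visits) records, the kind of log both systems might process.
    log = [("a.com", 1), ("b.com", 1), ("a.com", 1)]

    # "Map" phase: emit a key/value pair for each record.
    mapped = [(url, visits) for url, visits in log]

    # "Shuffle" groups the pairs by key; "reduce" then aggregates each group.
    groups = defaultdict(list)
    for url, visits in mapped:
        groups[url].append(visits)
    totals = {url: sum(v) for url, v in groups.items()}
    print(totals)  # {'a.com': 2, 'b.com': 1}

    # The equivalent declarative query a parallel DBMS would plan, partition,
    # and execute across its nodes:
    query = "SELECT url, SUM(visits) FROM log GROUP BY url;"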

Click here for all of my del.icio.us bookmarks.

Biweekly links for 04/06/2009

Click here for all of my del.icio.us bookmarks.

Biweekly links for 04/03/2009

  • Amazon Elastic MapReduce
    • “Amazon Elastic MapReduce is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data. It utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3).”
  • Data produced, analyzed and consumed. The impact of big science : business|bytes|genes|molecules
    • “The fact remains that today we are moving towards a clear separation between data producers, data consumers and methods developers. There was a time that a small group of people could cover all that ground, but with the industrialization of data production (microarrays are already there, mass specs and sequencers not quite yet), traditional roles, even in an academic setting are not efficient.”
  • Adding Noughts in Vain
    • Andrew Doherty’s wonderful blog about politics, climate, New Zealand, and whatever else strikes his fancy.
  • Mathemata: the blog of Francois Dorais
  • Noam Chomsky on Post-Modernism
    • “There are lots of things I don’t understand — say, the latest debates over whether neutrinos have mass or the way that Fermat’s last theorem was … proven … But from 50 years in this game, I have learned two things: (1) I can ask friends who work in these areas to explain it to me at a level that I can understand, and they can do so…; (2) if I’m interested, I can proceed to learn more so that I will come to understand it. Now Derrida, Lacan, Lyotard, Kristeva, etc. — even Foucault, whom I knew and liked, and who was somewhat different from the rest — write things that I also don’t understand, but (1) and (2) don’t hold: no one who says they do understand can explain it to me and I haven’t a clue as to how to proceed to overcome my failures. That leaves one of two possibilities: (a) some new advance in intellectual life has been made… which has created a form of “theory” that is beyond quantum theory, topology, etc., in depth and profundity; or (b) … I won’t spell it out.”
  • Caveat Lector » Blog preservation
    • “I suggest mildly that this [blog preservation] would be a fantastic problem to tackle for an academic library looking to make a name for itself. If you can’t make the argument for a general blog-preservation program (and that’s hard, because libraries are so inward-looking at times of crisis), dig up the ten or fifteen best blogs published by people at your institution and make an argument about those. Then release the code you write to the rest of us who want to do this!”
  • Preservation for scholarly blogs – Gavin Baker
    • How will we preserve scholarly blogs for the future?
  • A Blog Around The Clock : Defining the Journalism vs. Blogging Debate, with a Science Reporting angle
    • Thoughtful and thought-provoking.
  • Anarchism Triumphant: Free Software and the Death of Copyright (Eben Moglen)
  • Western internet censorship: The beginning of the end or the end of the beginning? – Wikileaks

Click here for all of my del.icio.us bookmarks.

Conscious modularity and scaling open collaboration

I’ve recently been reviewing the history of open source software, and one thing I’ve been struck by is the enormous effort many open source projects put into making their development modular. They do this so work can be divided up, making it easier to scale the collaboration, and so get the benefits of diverse expertise and more aggregate effort.

I’m struck by this because I’ve sometimes heard sceptics of open science assert that software has a natural modularity which makes it easy to scale open source software projects, but that difficult science problems often have less natural modularity, and this makes it unlikely that open science will scale.

It looks to me like what’s really going on is that the open sourcers have adopted a posture of conscious modularity. They’re certainly not relying on any sort of natural modularity, but are instead working hard to achieve and preserve a modular structure. Here are three striking examples:

  • The open source Apache webserver software was originally a fork of a public domain webserver developed by the US National Center for Supercomputing Applications (NCSA). The NCSA project was largely abandoned in 1994, and the group that became Apache took over. It quickly became apparent that the old code base was far too monolithic for a distributed effort, and the code base was completely redesigned and overhauled to make it modular.
  • In September 1998 and June 2002, crises arose in Linux development because of community unhappiness at the slow rate at which new code contributions were being accepted into the kernel. In some cases contributions from major contributors were being ignored completely. The problem in both 1998 and 2002 was that an overloaded Linus Torvalds was becoming a single point of failure. The situation was famously summed up in 1998 by Linux developer Larry McVoy, who said simply “Linus doesn’t scale”, a phrase repeated in a 2002 call-to-arms by Linux developer Rob Landley. The resolution in both cases was a major re-organization of the project that allowed tasks formerly managed by Torvalds to be split up among the Linux community. In 2002, for instance, Linux switched to an entirely new way of managing code, using a package called BitKeeper, designed in part to make modular development easier.
  • One of the Mozilla projects is an issue tracking system (Bugzilla), designed to make modular development easy, and which Mozilla uses to organize development of the Firefox web browser. Developing Bugzilla is a considerable overhead for Mozilla, but it’s worth it to keep development modular.

The right lesson to learn from open source software, I think, is that it may be darned hard to achieve modularity in software development, but it can be worth it to reap the benefits of large-scale collaboration. Some parts of science may not be “naturally” modular, but that doesn’t mean they can’t be made modular with conscious effort on the part of scientists. It’s a problem to be solved, not to give up on.

First Principles

How would you use 100 million dollars if someone asked you to set up and run an Institute for Theoretical Physics?  My friend Howard Burton has written a memoir of his 8 years as the founding Executive Director of the Perimeter Institute, taking it from conception to being one of the world’s best known institutes for theoretical physics. I’ve heard many people theorize about how a scientific institution ideally should be organized (“consider a spherical physicist…”), and I’ve contributed more than a few thoughts of my own to such discussions. What I really liked about this book, and what gives it a unique perspective, is that it’s from someone who was actually in the hot seat, from the get-go.

Biweekly links for 03/30/2009

Click here for all of my del.icio.us bookmarks.

On scaling up the Polymath project

Tim Gowers has an interesting post on the problem of scaling up the Polymath project to involve more contributors. Here are a few comments on the start of Tim’s post. I’ll return to the remainder of the post tomorrow:

“As I have already commented, the outcome of the Polymath experiment differed in one important respect from what I had envisaged: though it was larger than most mathematical collaborations, it could not really be described as massive. However, I haven’t given up all hope of a much larger collaboration, and in this post I want to think about ways that that might be achieved.”

As discussed in my earlier post, I think part of the reason for the limited size was the short time-frame of the project. The history of open source software suggests that building a large community usually takes considerably more time than Polymath had available – Polymath’s community of contributors likely grew faster than the communities of open projects like Linux and Wikipedia. In that sense, Polymath’s limited scale may have been in part a consequence of its own rapid success.

With that said, it’s not clear that the Polymath community could have scaled up much further, even had it taken much longer for the problem to be solved, without significant changes to the collaborative design. The trouble with scaling conversation is that as the number of people participating goes up, the effort required to track the conversation also goes up. The result is that beyond a certain point, participants are no longer working on the problem at hand, but instead simply trying to follow the conversation (cf. Brooks’ law). My guess is that Polymath was near that limit, and, crucially, was beyond that limit for some people who would otherwise have liked to be involved. The only way to avoid this problem is to introduce new social and technical means for structuring the conversation, limiting the amount of attention participants need to pay to each other, and so increasing the scale at which conversation can take place. The trick is to do this without simultaneously destroying the effectiveness of the medium as a means of problem-solving.

(As an aside, it’s interesting to think about what properties of a technological platform make it easy to rapidly assemble and grow communities. I’ve noticed, for example, that the communities in FriendFeed rooms can grow incredibly rapidly, under the right circumstances, and this growth seems to be a result of some very particular and clever features of the way information is propagated in FriendFeed. But that’s a discussion for another day.)

“First, let me say what I think is the main rather general reason for the failure of Polymath1 to be genuinely massive. I had hoped that it would be possible for many people to make small contributions, but what I had not properly thought through was the fact that even to make a small contribution one must understand the big picture. Or so it seems: that is a question I would like to consider here.

“One thing that is undeniable is that it was necessary to have a good grasp of the big picture to contribute to Polymath1. But was that an essential aspect of any large mathematical collaboration, or was it just a consequence of the particular way that Polymath1 was organized? To make this question more precise, I would like to make a comparison with the production of open-source software (which was of course one of the inspirations for the Polymath idea). There, it seems, it is possible to have a large-scale collaboration in which many of the collaborators work on very small bits of code that get absorbed into a much larger piece of software. Now it has often struck me that producing an elaborate mathematical proof is rather like producing a complex piece of software (not that I have any experience of the latter): in both cases there is a clearly defined goal (in one case, to prove a theorem, and in the other, to produce a program that will perform a certain task); in both cases this is achieved by means of a sequence of strings written in a formal language (steps of the proof, or lines of code) that have to obey certain rules; in both cases the main product often splits into smaller parts (lemmas, subroutines) that can be treated as black boxes, and so on.

“This makes me want to ask what it is that the producers of open software do that we did not manage to do.”

Here are two immediate thoughts inspired by that question, both of which concern ways large open-source projects (a) reduce barriers to entry, and (b) limit the amount of attention required from potential contributors.

Clear separation of what is known from how it is known: In some sense, to get involved in an open source project, all you need do is understand the current source code. (In many projects, the code is modular, which means you may only need to understand a small part of the code.) You don’t need to understand all the previous versions of the code, or read through all the previous discussion that led to those versions. By contrast, it was, I think, somewhat difficult to follow the Polymath project without also following a considerable fraction of the discussion along the way.

Bugtracking: One of the common answers to the question “How can I get involved in open source?” is “Submit a bug report to your favourite open source project’s bugtracking system”. The next step up the ladder is: “Fix a bug listed in the bugtracking system”. Bugtracking systems are a great way of providing an entry point for new contributors, because they narrow the scope of problems down, limiting what a new contributor needs to learn, and how many other contributors they need to pay attention to. Of course, many bugs will be beyond a beginning contributor’s ability to fix. But it’s easy to browse through the bug database to find something within your ability to solve. While I don’t think bugtracking is quite the right model for doing mathematics, it’s possible that a similar system for managing problems of limited scope may help in projects like Polymath.
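
To make that last suggestion a little more concrete, here is a purely hypothetical sketch of what an entry in such a “problem tracker” might record. The structure and field names are my own invention, not an existing tool; the point is only that each item should be self-contained enough that a newcomer can pick it up without following the whole discussion.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class OpenProblem:
        title: str
        statement: str                      # self-contained statement of the sub-problem
        prerequisites: List[str] = field(default_factory=list)  # lemmas/definitions assumed
        status: str = "open"                # "open", "claimed", or "resolved"
        notes: List[str] = field(default_factory=list)          # short progress updates

    # A couple of invented entries, standing in for narrowly scoped tasks
    # a Polymath-style project might post.
    tracker = [
        OpenProblem(
            title="Verify the k=3 base case",
            statement="Check the claimed density bound directly for k=3.",
            prerequisites=["Definition of the density increment"],
        ),
        OpenProblem(
            title="Tidy the write-up of Lemma 2",
            statement="Rewrite the proof of Lemma 2 so that it stands alone.",
            status="claimed",
        ),
    ]

    # A newcomer browses for something open and narrow in scope:
    for p in tracker:
        if p.status == "open" and len(p.prerequisites) <= 1:
            print(p.title)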

More tomorrow.