Does anyone know of examples of open source software projects which are developing software that is run on large centralized servers? I can think of one example off the top of my head – Second Life – but can’t think of any others.
(I am, of course, asking for a reason – I’m interested in whether open source might be a viable development model for tools for scientific collaboration and publication.)
My impression at the moment is that there are few centralized web services which are open source. I can think of a couple of natural reasons why this might be the case.
First are security issues. In any software development one needs to be sure the programmers are honest, and not slipping back doors into the code, or making unethical use of the database. This is potentially harder to control in an open source software project.
Second, although the software may in some sense be owned by the wider community, it does not necessarily follow that the server is owned by the wider community. The group that owns the servers has a much greater incentive to contribute, and other people less so, which lessens the advantages to be had by open sourcing the project.
Are there any reasons I’m missing? Centralized services other than Second Life which are open source?
Can you be more clear about what you mean by “centralized web service”? There’s a lot of open source software for running such services: Apache, phpBB, version control software, content management software, etc.
Or maybe you mean that you’re looking for an example of a particular web site that has opened up its entire server code? This seems less useful–better to modularize the code running the site and release the tools individually.
Also, not an important point, but why hasn’t someone started up a Third Life server with a better Linden/dollar exchange rate and undermined Second Life’s business?
On the security issue, have I misunderstood something?
I should have thought that open source would be more secure than closed source for the same reason that bugs are corrected more quickly – the code is available for anyone to read. (“Given enough eyeballs, all bugs are shallow.”) While anyone can modify the code for his own purposes, it is still up to the owner of the project to decide whether to accept or reject the contributions as part of the ‘official’ project.
That quote in parentheses should be a link. Here it is:
http://en.wikipedia.org/wiki/Linus's_Law
WordPress is open source blogging software (as you are certainly aware). WordPress.com is a hosted service built on that open source software. MediaWiki/Wikipedia is another such example.
Those aren’t quite the sort of examples you’re looking for since it seems you want an example of open source software that is only run on centralized servers. I’m not sure why that requirement matters in the end, though. The software was written to do a specific task. Both Wikipedia and smaller, “single server” wikis provide the “physical” resources for that task. They do so at different scales and with different intents, yes, but that just shows the flexibility and widespread adoption of the software, not that it’s any better or worse at accomplishing the task.
I also disagree with your specific concern about security. I don’t see the backdoor risk as any higher in OSS than in non open source software.
Take WordPress again as an example. Just because anyone can see and use the code, doesn’t mean that anyone can alter the core code that WordPress offers for download. Anyone can submit patches, but those patches are reviewed by a relatively small group of core developers who have worked on the project for some time and have proven their skills and trust.
That’s one way to manage contributors and contributions. I’m sure there are others.
More generally, though, in (actively developed) open source software, there are many more people watching the development and reviewing the code, and so more chance that any backdoor or other security problem will be identified and rectified. At least, that is the standard argument. There’s certainly a lot of debate about that argument’s strength.
As to your second point, it’s true that the servers will be owned by a few, but the service will be improvable by anyone willing to help. And if it turns out that those owning the servers cripple the software too much or have undesirable policies, someone else can set up their own servers and offer a competing service based on the familiar software. That takes a lot of work though, and really I agree with this part of your assessment: it’s easy to generate a feeling of “insiders” and “outsiders”. The perceived discrepancy between insiders and outsiders can hopefully be reduced or eliminated with open dialog and good PR 🙂
Slashcode, WordPress, MediaWiki etc are all open source and they mostly run at their flagship “centralized servers”, even though you can download a copy and run it at your own site.
Do they fit? If not, what’s missing from these examples?
I wasn’t clear enough in the post. I’m talking about centralized services with a large server infrastructure. Flickr, Second Life, Google are examples of what I’m talking about. WordPress, Drupal, SVN, Apache etc are not examples of this.
Andy: On the security issue, what concerns me is, for example, someone deliberately inserting a buffer overflow bug or perhaps something more subtle, as part of a much larger “contribution” to the code. Such things can be hard to spot, and an overworked person on the commit team may approve it without noticing the potential for an exploit.
On more reflection, this is presumably equally a problem in all open source projects. So maybe I’m being overly paranoid.
There is a genuine issue in the question of how much access you give outsiders to a central database. E.g., if Flickr were open source, how much access should developers get to the Flickr database?
Maverick: SlashCode and MediaWiki are interesting examples, since in each case the flagship runs on a large centralized server. Do you happen to know how easily they can be deployed to run on a large server infrastructure? Or is that part custom developed?
Ben: Regarding Second Life, I wondered something similar when they made the announcement. I think the answer is probably that setting up a large server infrastructure has historically been both hard and expensive.
Services like Amazon EC2 / S3 are starting to change that, though. It actually might not be all that hard to do!
@8 Michael: it really depends on the load. I assume by “large”, you are thinking the scenario when a single fairly powerful server is not sufficient? In that case, there are many load-balancing solutions, some of which are open-source. For the frontend, Apache reverse proxy or Squid may be used. For the backend database, I am not familiar with the open-source solutions.
Maverick: Yes, I’m talking about instances where large numbers of machines are involved.
I think the WordPress/Drupal/Apache model is probably better for scientific collaboration software. Universities already have a network infrastructure, which you would probably want to make use of. Also, you could run into intellectual property and data protection issues depending on the type of project the software is being used to manage. Universities will probably want to be in charge of the security of their own data.
Having each university run their own server for the software, but communicating with other servers to share data between collaborators is what I would suggest. Of course, this would require persuading several universities to support the project from the word go, which would be a challenge.
Matt: For small projects, the Drupal / WP model would be fine. However, for larger projects existing University infrastructure would not be sufficient (especially in poorer countries). Systems like Amazon S3 / EC2 beat University networks comprehensively on reliability, security, scalability, and price, which is a hard combination to resist.
@11 Michael: BTW, you may also be interested in (at least) the organization structure of PlanetLab. It’s a consortium and institutes join so that their members can use the infrastructure.
Maverick: Thanks for the pointer. I’d never heard of PlanetLab before. It looks quite interesting, and I’ll have to look into it more.
Well, there is the Mugshot project (http://www.mugshot.org). Not quite like Facebook or Orkut in terms of popularity, but it is open source and centralized. The team (from Red Hat) plans on building on this idea and eventually releasing a sort of open source internet-based data service, like .mac, but for Linux/GNOME.
There is also planetmath.org, and related sites. Wikipedia can also be considered open source.
The reason why you don’t see so many centralized server-based open source projects, in my opinion, is that the general approach of the open source community is to de-centralize everything, and make things interoperable so that they can be deployed anywhere and not be tied down to one company. One example of this is Jabber, on which Google Talk was built. The tendency is to develop software, not provide services. The Mugshot team (especially Havoc Pennington) are advocating open source services, but even then I doubt they will be so centralized.
Define “large” then. As best I can tell, Wikipedia is running on something like 350 servers, WordPress.com on about 290 and Second Life on not yet 10,000.
Of course, what those boxes are makes a big difference and I don’t have any info on that.
I don’t think WordPress.com is clearly a non-example, but I certainly won’t be put out if it is.
For the sake of full disclosure, I work for Automattic on WordPress.com (though my roots are in Caltech’s IQI), so my thoughts, be they compatible with your question or not, are from first hand experience.
Michael:
My original question wasn’t very clearly phrased, and I haven’t cleared it up sufficiently in the comments.
Let me start over, and explain a bit more context. What I’m really interested in is products running on large servers (let us say > 100 machines for the sake of definiteness) where there is a corresponding very large centralized database.
Something like Second Life or Wikipedia fits the bill. WordPress.com isn’t quite what I had in mind. There’s certainly no sense in which I’m trying to “exclude” WordPress from some club. It’s just that some of the tools I’m interested in developing seem quite close in requirements to Second Life or Wikipedia, and I’m wondering if open source is a viable development process. The more examples of such projects I have to look at, the better, from my point of view, which is why I asked the question.
Incidentally, I think WordPress.com is a great site, and WordPress is one of my favourite products.
When were you at IQI?
I didn’t mean to say WordPress or WordPress.com was feeling lonely 🙂 I was just curious about what you were looking for (and, in the process, trying to glean some information about your project).
If you’d like to chat about some of the non-WordPress but still open source tools we use, I’d be happy to discuss it or refer you to others that know more about it any time.
Oh, and I’m technically a graduate student at IQI now (entering class of 2003), though I’m on “sabbatical” working for Automattic.