
The role of open licensing in open science

by Michael Nielsen on January 21, 2009

The open science movement encourages scientists to make scientific information freely available online, so other scientists may reuse and build upon that information. Open science has many striking similarities to the open culture movement, developed by people like Lawrence Lessig and Richard Stallman. Both movements share the idea that powerful creative forces are unleashed when creative artifacts are freely shared in a creative commons, enabling other people to build upon and extend those artifacts. The artifact in question might be a set of text documents, like Wikipedia; it might be open source software, like Linux; or open scientific data, like the data from the Sloan Digital Sky Survey, used by services such as Galaxy Zoo. In each case, open information sharing enables creative acts not conceived by the originators of the information content.

The advocates of open culture have developed a set of open content licenses, essentially a legal framework, based on copyright law, which strongly encourages and in some cases forces the open sharing of information. This open licensing strategy has been very successful in strengthening the creative commons, and so moving open culture forward.

When talking to some open science advocates, I hear a great deal of interest and enthusiasm for open licenses for science. This enthusiasm seems prompted in part by the success of open licenses in promoting open culture. I think this is great – with a few minor caveats, I’m a proponent of open licenses for science – but the focus on open licenses sometimes bothers me. It seems to me that while open licenses are important for open science, they are by no means as critical as they are to open culture; open access is just the beginning of open science, not the end. This post discusses to what extent open licenses can be expected to play a role in open scientific culture.

Open licenses and open culture

Let me review the ideas behind the licensing used in the open culture movement. If you’re familiar with the open culture movement, you’ll have heard this all before; if you haven’t, hopefully it’s a useful introduction. In any case, it’s worth getting all this fixed in our heads before addressing the connection to open science.

The obvious thing for advocates of open culture to do is to get to work building a healthy public domain: writing software, producing movies, writing books and so on, releasing all that material into the public domain, and encouraging others to build upon those works. They could then use a moral suasion argument to encourage others to contribute back to the public domain.

The problem is that many people and organizations don’t find this kind of moral suasion very compelling. Companies take products from the public domain, build upon them, and then, for perfectly understandable reasons, fiercely protect the intellectual property they produce. Disney was happy to make use of the old tale of Cinderella, but they take a distinctly dim view of people taking their Cinderella movie and remixing it.

People like Richard Stallman and Lawrence Lessig figured out how to add legal teeth to the moral suasion argument. Instead of relying on goodwill to get people to contribute back to the creative commons, they invented a new type of licensing that compels people to contribute back. There’s now a whole bunch of such open licenses – the various varieties of the GNU General Public License (GPL), Creative Commons licenses, and many others – with various technical differences between them. But there’s a basic idea of viral licensing that’s common to many (though not all) of the open licenses. This is the idea that anyone who extends a product released under such a license must release the extension under the same terms. Using such an open license is thus a lot like putting material into the public domain, in that both result in content being available in the creative commons, but the viral open licenses differ from the public domain in compelling people to contribute back into the creative commons.

The consequences of this compulsion are interesting. In the early days of open licensing, the creative commons grew slowly. As the amount of content with an open license grew, though, things began to change. This has been most obvious in software development, which was where viral open licenses first took hold. Over time it became more tempting for software developers to start development with an existing open source product. Why develop a new product from scratch, when you can start with an existing codebase? This means that you can’t use the most obvious business model – limit distribution to executable files, and charge for them – but many profitable open source companies have shown that alternate business models are possible. The result is that as time has gone on, even the most resolutely closed source companies (e.g., Microsoft) have found it difficult to avoid infection by open source. The result has been a gradually accelerating expansion of the creative commons, an expansion that has enabled extraordinary creativity.

Open licenses and open science

I’m not sure what role licensing will play in open science, but I do think there are some clear signs that it’s not going to be as central a role as it’s played in open culture.

The first reason for thinking this is that a massive experiment in open licensing has already been tried within science. By law, works produced by US Federal Government employees are, with some caveats, automatically put into the public domain. Every time I’ve signed a “Copyright Transfer” agreement with an academic journal, there’s always been in the fine print a clause excluding US Government employees from having to transfer copyright. You can’t give away what you don’t own.

This policy has greatly enriched the creative commons. And it’s led to enormous innovation – for example, I’ve seen quite a few mapping services that build upon US Government data, presumably simply because that data is in the public domain. But in the scientific realm I don’t get the impression that this is doing all that much to promote the growth of the same culture of mass collaboration as open licenses are enabling.

(A similar discussion can be had about open access journals. The discussion there is more complex, though, because (a) many of the journals have only been open access for a few years, and (b) the way work is licensed varies a lot from journal to journal. That’s why I’ve focused on the US Government.)

The second reason for questioning the centrality of open licenses is the observation that the main barriers to remixing and extension of scientific content aren’t legal barriers. They are, instead, cultural barriers. If someone copies my work, as a scientist, I don’t sue them. If I were to do that, it’s in any case doubtful that the courts would do more than slap the violator on the wrist – it’s not as though they’ll directly make money. Instead, there’s a strong cultural prohibition against such copying, expressed through widely-held community norms about plagiarism and acceptable forms of attribution. If someone copies my work, the right way to deal with it is to inform their colleagues, their superiors, and so on – in short, to deal with it by cultural rather than legal means.

That’s not to say there isn’t a legal issue here. But it’s a legal issue for publishers, not individual scientists. Many journal publishers have business models which are vulnerable to systematic large-scale attempts to duplicate their content. Someone could, for example, set up a “Pirate Bay” for scientific journal articles, making the world’s scientific articles freely available. That’s something those journals have to worry about, for legitimate short-term business reasons, and copyright law provides them with some form of protection and redress.

My own opinion is that over the long run, it’s likely that the publishers will move to open access business models, and that will be a good thing for open science. I might be wrong about that; I can imagine a world in which that doesn’t happen, yet certain varieties of open science still flourish. Regardless of what you think about the future of journals, the larger point is that the legal issues around openness are only a small part of a much larger set of issues, issues which are mostly cultural. The key to moving to a more open scientific system is changing scientists’ hearts and minds about the value and desirability of more openly sharing information, not reforming the legal rights under which they publish content.

So, what’s the right approach to licensing? John Wilbanks has argued, persuasively in my opinion, that data should be held in the public domain. I’ve sometimes wondered if this argument shouldn’t be extended beyond data, to all forms of scientific content, including papers, provided (and this is a big “provided”) the publisher’s business interests can be met in a way that adequately serves all parties. After all, if the scientific community is primarily a reputation economy, built around cultural norms, then why not simply remove the complication of copyright from the fray?

Now, I should say that this is speculation on my part, and my thinking is incomplete on this set of issues. I’m most interested to hear what others have to say! I’m especially interested in efforts to craft open research licenses, like the license Victoria Stodden has been developing. But I must admit that it’s not yet clear to me why, exactly, we need such licenses, or what interests they serve.

Further reading

I’m writing a book about “The Future of Science”; this post is part of a series where I try out ideas from the book in an open forum. A summary of many of the themes in the book is available in this essay. If you’d like to be notified when the book is available, please send a blank email to the.future.of.science@gmail.com with the subject “subscribe book”. I’ll email you to let you know in advance of publication. I will not use your email address for any other purpose! You can subscribe to my blog here.

28 Comments
  1. I agree that the scientific culture is probably more important than the legal framework. It seems to me that among the research community reputation is the primary factor, from which research funding, job opportunities and pride are derived. If people could be convinced that in a more collaborative, open research model there were mechanisms that sufficiently illuminated the quality researchers while weeding out the cruft, then it seems there would be an inevitable shift to the new system. It’s clear that both the pace and efficiency of much research would improve with increased sharing across communities, so as long as those sharers knew they were being legitimately “compensated” in reputation for their contributions, it would be hard not to take part.

    I don’t know how to formalize the reputation building mechanisms that arise in an open source project, but they seem to work quite well. Blog posts about new feature additions and bug fixes, mailing list discussions and wikipedia like community pages that describe the state of the art all result in a relatively clear community structure. It’s just that the structure and reputations that compose it are not obviously demonstrated and proven to those outside the community, like journal or conference publications might do. It seems the reputation problem boils down to the evaluation of contributions, and then finding a way to normalize these contribution metrics across projects and communities.

    As for licenses, it seems that as long as people can extend ideas in the research domain indefinitely, without worry of lawsuits, most people will be happy. Maybe there could be a dual licensing option, so that work could simultaneously be patented and commercialized while also being further extended by the community…?

  2. Oh, heck, the above HTML formatting came out all wrong (darn this lack of a “preview” feature) please accept my apologies Michael, and delete the above post.

    … let’s try it again …

    Freedom-Loving Human: These GPL’d scientific tools and open databases are greatly enhancing my personal freedom!

    Locutus of Borg: A limited point of view. These same GPL’d tools are powerfully assisting the Borg assimilation of your culture, by processes of which you are ideologically unconscious.

    To appreciate Locutus’ point, we need only reflect that open-source free (as in “freedom”) tools now enable the creation of what historian Stephen B. Johnson has called “bureaucracies for innovation”:

    “In a hotly contested Cold War race for technical superiority … those funding the Space Race demanded results. In response, development organizations created what few expected and what even fewer wanted — a bureaucracy for innovation.”

    As Ed Wilson has noted, bureaucracies for innovation are becoming the most efficient mode of science not only in aerospace engineering, but in all disciplines of science and technology:

    “The convergence of databases on biodiversity into a few free single-access, on-command systems has begun to benefit biologists and students dramatically. … Now it is the turn of invisible life to be revealed. … New technologies … already fast, will soon be much faster as well as affordable. … As the information comes together online, the big picture of Earth’s biodiversity will emerge as a mosaic at high resolution. … Despite their modest outward appearance, the all-species inventories are in fact collectively “big science” … This will be a moonshot effort that will eventually engage many times the number of professional and citizen scientists now active”

    My own interest in simulation science (especially quantum simulation science) derives largely from an appreciation that bureaucracies for innovation that are founded upon modern simulation science, and that are embedded in open-source free software tools and databases, are *already* proving to be this century’s most potent single tool for large-scale public-private social organization.

    We can regard the HGP, the LHC, the Boeing 787, or UCAR, as examples of bureaucracies for innovation that are founded upon simulation science.

    It is soberingly plausible, though, that if we all embrace a too-simple faith in the free market of scientific ideas—and thereby, we are all collectively derelict in our responsibility to understand the complexities of the world that we are creating—the all-too-likely result may be a global train-wreck of the enterprise of science and technology, which will resemble the chaos of the present-day implosion of financial markets, but on an even larger scale.

    The point being, that the too-simple twentieth century idea that “markets can take care of themselves” has failed utterly. The contrasting modern view is that satisfactory progress is equally likely to arise from in-depth analysis, prudent compromises, and wise checks and balances, as it is to arise from the embrace of a few simple principles of justice.

    America’s Founders did not shrink from in-depth social analysis—as the Federalist Papers testify—and we should not shrink from it either.

    Bottom Line: No previous century of human history has been simple, and it is highly implausible that our present century will be any different.

  3. As someone who works in a medical library and who is frequently dismayed by the appalling restrictions on content that publishers impose at the expense of patient care, I found Michael’s article fascinating and important. I wish he would elaborate on the intriguing point, “…open access is just the beginning of open science, not the end.”

    Jeff’s comments here are excellent:

    “I don’t know how to formalize the reputation building mechanisms that arise in an open source project, but they seem to work quite well. Blog posts about new feature additions and bug fixes, mailing list discussions and wikipedia like community pages that describe the state of the art all result in a relatively clear community structure. It’s just that the structure and reputations that compose it are not obviously demonstrated and proven to those outside the community, like journal or conference publications might do. It seems the reputation problem boils down to the evaluation of contributions, and then finding a way to normalize these contribution metrics across projects and communities.”

    I would argue that the current reputation building structure premised on publication in prestigious journals is bound to break down given that more and more research libraries are under such budgetary stresses that they can’t afford to pay the extortionate prices that Elsevier et al demand for those journals. You can’t build a reputation if your peers lack access to the journals in which you have published.

    Could Jeff and Michael provide specific examples of scientists who have made themselves known via contributions to wikis and by blogging? Such examples of the acquisition of enhanced professional stature via Web 2.0 methods would be powerful arguments for the value to scientists of such activities and bolster them in what can be a lonely endeavor and one fraught with concern that such activities might only drain their energies or indeed lead them to being dismissed as “mere” bloggers.

    Kudos to Michael for the cogent overview of an important topic. I hope this article gets widely cited and read by librarians and heads of tenure panels.

  4. Jeff – Thanks for the thoughtful comment. Agreed, if we move to a more collaborative model – which is where I think things are going – we need scalable ways of assessing contribution, and that’s going to take some effort to work out.

    Your comment at the end points to a problem my analysis misses: the growing interest in commercialization within science, due to the Bayh-Dole act and similar legislation in other countries. I guess my personal view is that Bayh-Dole was a mistake, but it may be one we have to live with permanently. In that case, some sort of dual licensing scheme like that you propose may be the way to go.

  5. John – not sure I agree with you that bureaucracies for innovation are the way to go, although they are certainly a fascinating model to study, and have achieved some major successes. Their problem seems to be that the power structure creates an enormous bottleneck for innovation. Decentralized mass collaboration (still with space for the individual) seems a better model.

  6. Hope – Thanks for your comment. I’m glad you enjoyed the article.

    As regards the elaboration of the statement that open access is just the beginning of open science, I’ve elaborated on this some in my past essays, especially http://michaelnielsen.org/blog/?p=448. I have several more essays I’m planning over the next month or two that will develop this idea much further.

    Regarding the intellectual value of blogging, Google “Terry Tao blog” and “Tim Gowers blog” for two examples of blogging of the highest intellectual caliber. In fact, the American Mathematical Society just published a book containing selections from the first year of Tao’s blog, and year two is apparently in the works. There are many other outstanding examples of mathematical blogs – just look at the blogroll at Tao’s blog. You don’t need to be a mathematician to see that something very interesting is going on in the mathematical blogosphere.

  7. Hilary

    Hi Michael,

    I very much agree with you in your observation that science depends on community norms rather than legal means to enforce appropriate behavior in science (e.g. preventing plagiarism). I recently noticed that some researchers are using CC licenses or the GPL for online (scientific) databases; most tend to use non-commercial licenses. I am curious regarding the motivation for using these licenses, as I suspect it is an attempt to signal community intentions rather than a desire for a legally enforceable license.

    The GPL and CC licenses serve as reminders of community norms and function as signals of appropriate behavior. I doubt many people have read the full legal text of the CC licenses; most people (who are familiar with CC licenses) have read the “human readable summaries” (such as this one) or just seen the logos. I also suspect not many people have read the full legal code of the GPL, although it seems that most people who work in technology are familiar with its general terms and attempt to hew to these.

    Perhaps what the scientific community needs is not a license, per se, but a set of clearly recognizable tools (logos?) for designating appropriate uses of scientific content and data (no plagiarizing, credit/attribution required, etc.)

  8. Echoing Hilary’s comment: “The GPL and CC licenses serve as reminders of community norms and function as signals of appropriate behavior.”

    I think the value of open licenses in open science is symbolic as much as (if not more than) legal. Not that there aren’t legitimate legal challenges — and that there isn’t a cost to the uncertainty caused by illegitimate claims or misunderstanding — that matters, too. But when a researcher chooses CC Zero for his dataset, it’s comparable to (say) putting a Barack Obama sticker on your car: The researcher feels that he has taken an affirmative step to encourage sharing, and has a stake in the outcome; and it’s a signal to the researcher’s peers that they ought to share, too.

  9. Michael says: John — not sure I agree with you that bureaucracies for innovation are the way to go, although they are certainly a fascinating model to study, and have achieved some major successes. Their problem seems to be that the power structure creates an enormous bottleneck for innovation.

    Oh, I dunno … oftentimes, the hardest revolutions to perceive are the ones we are right in the middle of.

    Let’s consider some of the trends that favor simulation-centric innovation (which sounds so much nicer than “bureaucracies for innovation”): (1) 1000X faster, cheaper, computational hardware, (2) 1000X faster, more accurate algorithms, (3) globalized economies and work forces, allowing innovation to roll forward 24×7, (4) planetary-scale databases standing ready to be filled, (5) on the order of one billion human beings who are looking for family-supporting jobs.

    Most of all, there is a growing appreciation of the unprecedented scale of twenty-first century enterprises like the one Ed Wilson envisioned: “a moonshot effort [i.e., a comprehensive biome survey at all scales from Angstrom to planetary] that will eventually engage many times the number of professional and citizen scientists now active”.

    “Moonshot effort?” Doesn’t that understate the scale of Ed Wilson’s program? Isn’t Wilson’s whole-biome, all-scales survey program in fact enormously larger even than the Apollo Program?

    AFAICT, the open science community is not seriously considering programs of this scale, but (arguably) they should be. Because the key element that open science tools are now providing (especially tools for classical and quantum simulation science), is the confidence to undertake these programs.

    There is a ton of wonderful literature on the central role of simulation software tools in large-scale enterprises. A classic is Booton and Ramo’s The development of systems engineering:

    Systems Engineering is the design of the whole as distinguished from the design of the parts. The systems engineer harmonizes optimally an ensemble of subsystems and components—machines, communications networks, humans, space—all related by channeled flows of information, mass, and energy. Of course, the designer of a chair, a watch, or even a necktie deals in the end with the whole; so, in a sense, every designer is partially a systems engineer. But where the whole has many components and many complicated interactions occur when they are connected, real systems engineering is required. Then systems engineering becomes a demanding intellectual discipline

    Nowadays these ideas seem obvious … but they were very far from obvious, to most folks, only a few decades ago.

    Even today, these classical and quantum system engineering tools are (obviously) very far from reaching their fundamental computational or informatic limits. So I think any attempt to foresee the future of science has to take them into account.

    ———–

    By the way, you are right about the excellence, and the broad influence, of Terry Tao’s and Tim Gowers’ blogs — I’m a huge fan! Also of Igor Carron’s Nuit Blanche. And there are *so* many more … it’s an amazing and wonderful phenomenon that is cause for celebration.

    ———–

    @article{Ramo:84,Author = {R. Booton and S. Ramo}, Journal = {{IEEE} Transactions on Aerospace and Electronic Systems},Month = jul,Pages = {306–9},Title = {The development of systems engineering},Volume = {20},Year = 1984}

  10. Hi, all. I liked Michael’s comment to Jeff “…we need scalable ways of assessing contribution, and that’s going to take some effort to work out.”

    And Hilary’s idea here is an excellent one, “Perhaps what the scientific community needs is not a license, per se, but a set of clearly recognizable tools (logos?) for designating appropriate uses of scientific content and data (no plagiarizing, credit/attribution required, etc.)…” Such logos are a great idea. Many articles at BioMed Central are clearly stamped, “Open Access.” Pretty straightforward.

    Thank you, Michael, for your reference to your earlier article, Building A Better Collective Memory. I am about halfway through it and found this interesting, “The problem all these sites have is that while thoughtful commentary on scientific papers is certainly useful for other scientists, there are few incentives for people to write such comments. Why write a comment when you could be doing something more “useful”, like writing a paper or a grant? Furthermore, if you publicly criticize someone’s paper, there’s a chance that that person may be an anonymous referee in a position to scuttle your next paper or grant application.”

    Those comments are spot on. For instance, I am writing these comments on your blog at around 3:40 a.m. I am, in the meantime, not working on assignments for my master’s in library science and also thinking, “Then I will have to proofread what I have just written so as not to look like an idiot.” And, “Let’s see, am I saying anything here that could hurt me in some way in a future job hunt…” What incentive, indeed, is there to comment? I am commenting here because your essays are fascinating reading and because I would like to network with people like you in the Science 2.0 community. The commentator has no way of knowing what concrete results will come of such efforts. Indeed, you are one of the very few bloggers in any field who takes the time to respond to comments with personalized comments and links to earlier essays.

    I particularly liked,

    “It is for the people building the new online tools to also develop and boldly evangelize ways of measuring the contributions made with the tools. To understand what this means, imagine you’re a scientist sitting on a hiring committee that’s deciding whether or not to hire some scientist. Their curriculum vitae reports that they’ve helped build an open science wiki, and also write a blog. Unfortunately, the committee has no easy way of understanding the significance of these contributions, since as yet there are no broadly accepted metrics for assessing such contributions. The natural consequence is that such contributions are typically undervalued.

    To make the challenge concrete, ask yourself what it would take for a description of the contribution made through blogging to be reported by a scientist on their curriculum vitae. How could you measure the different sorts of contributions a scientist can make on a blog – outreach, education, and research? These are not easy questions to answer. Yet they must be answered before scientific blogging will be accepted as a valuable professional scientific contribution.”

    This is useful, too, “FriendFeed is a great service, but it suffers from many of the same problems that afflict the comment sites and Wikipedia. Lacking widely accepted metrics to measure contribution, scientists are unlikely to adopt FriendFeed en masse as a medium for scientific collaboration. And without widespread adoption, the utility of FriendFeed for scientific collaboration will remain relatively low.” So here I am, now a few minutes after four in the morning, wondering if participating in FriendFeed is worth the time required to follow activity in it. And yet without it, I would not have found your valuable essays. But if I had not found your essays, I might be doing my library school homework.

    Thank you for the examples of the prestige that can be won in blogging. But it is significant that the examples are from mathematics, not from medicine. Indeed most of the successful examples of open access and Science 2.0 are from physics. I wish there were examples of breakthroughs in neurodegenerative diseases. That would mean more to the general public vis-à-vis improved quality of life and in engendering public support for such efforts.

    Finally, I would be interested in your comments on the influence that the advocate for research on amyotrophic lateral sclerosis, Augie Nieto, is having. He makes it a condition of the grants his organization makes that the results of the studies he helps fund be made available to other scientists expeditiously. It is ironic that it is a hardheaded businessman like Nieto who is opening up science to the open access model.

  11. Gavin – I’m very leery of this suggestion. At present, my experience is that most scientists are pretty unaware of issues around copyright. If they have a problem with someone copying their work, they resolve it through cultural channels, not legal. This has its problems, but doesn’t seem to work too badly.

    By raising the profile of copyright in the way you suggest, it’s possible you may get an unintended consequence, namely scientists becoming fiercely legally protective of their work, creating an extra barrier to openness, when no extra barrier is necessary. Sure, some people may CC license their work, but others will protect it with all their might.

    A student interested in open data told me that they tried to convince their supervisor to try open data out, explaining CC licenses and all the rest. The supervisor wasn’t interested in the open data suggestion, but apparently became interested in the idea of using viral licenses to legally compel attribution, no matter how small the use of data.

  12. Hi Hilary – Logos are a really interesting idea! As Hope says, simple stamps like “Open Access” are already having a salutary effect, and maybe there’s more that could be done. It’d be nice to see a cross-publisher effort; over the long run, it’s hard to see anything that’s not cross-publisher succeed. I am, as I commented above in response to Gavin, a little leery about possible unintended consequences. But done carefully this might be really valuable: having an “open data” stamp, or “open code”, or whatever.

  13. Hope – Thanks for your long comment. I’m glad you enjoyed the essay. As regards Augie Nieto, I must admit I actually hadn’t heard of him before. Your comment about “hard-headed businessmen” driving some of this is spot on. In general, on the openness issues it’s often politicians, funding agencies and industry who are on the side of the angels, and scientists not always so much.

    On FriendFeed: it’s only worth it, in my opinion, if you really make an effort to become involved in the community there, commenting, liking items, and so on. I was delighted by how quickly people made me feel welcome there when I started participating, despite the fact that I knew very few people when I started using it. There are a lot of great communities there.

  14. PS to Hope: The issue about commenting on blogs versus commenting on sites like Nature’s peer review is an interesting one. I have a very long, complicated answer, which I won’t try to write out here. Basically, though, the two activities are _not_ at all the same, for a whole variety of reasons, and the cost-benefit tradeoff is quite different. I think that’s why blog comments work quite well, but the comment sites don’t. There’s a natural community around a blog, while that’s not true in the same way on the comment sites; in some sense, community is too thinly spread out on those sites. Furthermore, the relationship around comment sites tends to be more intrinsically adversarial (with a paper under discussion, the stakes are high) than the informal discussion common on many blogs.

    Anyways, as I say, this is really a very complex topic. I think comment sites will, one day, succeed, but the barriers are considerably higher than for blogs.

  15. Hi, Michael. Thank you for your interesting, thoughtful responses to all of us. I found your comment here to Gavin both amusing and depressingly indicative of the challenges to open science, “The supervisor wasn’t interested in the open data suggestion, but apparently became interested in the idea of using viral licenses to legally compel attribution, no matter how small the use of data.”

    I agree with you that some sort of universally recognized logo (as is becoming the case with the RSS button) would be a huge boost vis-à-vis the dissemination of open access material. As it stands, often you have to visit the site of a publisher (Springer or Elsevier, say) and either hope to see something that clearly indicates that something is open access (Springer) or just determine that it is (in the case of Elsevier) by clicking on the title of the article.

    Apropos of your comments here, “In general, on the openness issues it’s often politicians, funding agencies and industry who are on the side of the angels, and scientists not always so much…” I would include among the angels patients and the new breed of patient advocate/entrepreneurs such as the brothers Jamie and Ben Heywood of Patients Like Me (with some reservations) and patient/philanthropists such as Augie Nieto. I would be most interested to hear your comments, Michael, on how you regard such new players on the data dissemination/clinical trials fronts as these pioneers. For instance, the large-scale, patient-initiated and led study of lithium in amyotrophic lateral sclerosis and the fascinating, innovative model of the ALS Therapy Development Institute http://www.als.net/

    I am also very impressed with FriendFeed and the quality of the posts there. It is there that I came across your blog. I have learned a lot by browsing through the Science 2.0 room.

    Your comments on characteristics of review sites versus blogs are helpful. Could you provide some examples of such sites? I have visited just Nature Networks. Would you consider LabRoots and Scientist Solutions review sites?

    Also, could you provide examples of blogs that typify a place where comments work well? In my experience, bloggers rarely take the time to respond as patiently and thoughtfully as you do. I have given up almost entirely on commenting on blogs. I lurk, for the most part, and move on to more productive activities.

    Indeed, the quick, telegraphic world of Twitter seems to work best in terms of getting info into our brains. I have used it for less than a month, but am amazed at how quickly I can learn of important articles, read them and move on to the next one thanks to the incredible efficiency of my fellow Twits in posting links.

    As to commenting on blogs, there is the matter of trying to determine how valuable one’s own comments are. I am writing this response to you because I think it is worthwhile, in that it might elicit from you further erudite commentary on the important matter of open science. But I will now have to proofread what I have written, and I have already invested quite a few minutes in this mini-essay (and I got a chuckle out of your use of the word “long” to describe my initial efforts here). There are so many factors to consider: your essay is important and valuable. One hopes that one’s inputs will jog your thinking further and inspire you to keep up your valuable work, but one is, meanwhile, having to proofread one’s writings instead of going on to read something else. So many opportunity costs to weigh!

  16. Hope – I’m much more ignorant than I would like to be of the medical community and what’s going on there. It’s something I’ve just slowly been getting into the past few months. After 15 years as a physicist, I’m finding it quite refreshing to gradually get more in touch with colleagues in other disciplines – but there’s so many disciplines that it’s taking a lot of time! The links you’re sharing related to open medical practice are quite helpful; it’s all new to me.

    The essay I linked to earlier contains four or five examples of review sites, all of which failed. I haven’t looked at Scientist Solutions; I seem to recall that I did look at LabRoots, but to be perfectly honest don’t recall my impression. It seemed like just another social networking site for scientists. I get a bit glazed over – there seem to be a lot of scientific startups, and many don’t have a particularly unique value proposition. With that said, I do get some considerable value out of Nature Network. It’s not really a comment site, though, certainly not in the sense I meant.

    As regards blogs with good comments, the general rule of thumb seems to be that (a) anonymous comments should be strongly discouraged; (b) there shouldn’t be too many (> 100 per post seems to cause problems); and (c) the blogger should take an active role. When all three of those conditions are met, the result is often good. Terry Tao, Scott Aaronson, Chad Orzel, Cameron Neylon, Kevin Kelly, and many others have comments with these qualities. In fact, now that I think about it, a large fraction of the blogs I read do. This may be a community-dependent thing.

  17. Hi, Michael. Thank you so much for yet another series of comments. I found these interesting, “…just another social networking site for scientists. I get a bit glazed over – there seem to be a lot of scientific startups, and many don’t have a particularly unique value proposition. With that said, I do get some considerable value out of Nature Network. It’s not really a comment site, though, certainly not in the sense I meant.”

    My colleague at Next Generation Science, Walter Jessen, has just interviewed Dave Munger of ResearchBlogging.org–I wonder how you would classify ResearchBlogging.org and how you would rate it vis-a-vis particularly unique value. And thanks for the tips on blogs to look at–on Chad Orzel’s I was able to see how ResearchBlogging is employed on a blog by a serious scientist–thanks for the pathway to a real-world instance.

    I have looked at Nature Network and it does seem to have a pretty impressive infrastructure and lively community–certainly when compared to Elsevier’s inert http://www.2collab.com.

    Anyway, thank you again for the work you do in providing a forum for discussing these important matters.

  18. Hi Hope – I’m very interested to see how ResearchBlogging evolves. At the moment, it’s a bit too early to say, but it may be that ways of aggregating “serious” posts about research will become more and more important as the blogosphere grows. One thing I wonder about, though, is the rather arbitrary distinction between peer-reviewed research and gray literature. For example, should Scott Aaronson’s recent post (http://scottaaronson.com/blog/?p=381) about a purported proof of an important conjecture in computer science be included in ResearchBlogging? The work it describes hasn’t yet been peer-reviewed, yet I think Scott’s post is far more interesting and scientifically weighty than most posts about peer reviewed research. Much the same is true of many of the most interesting blog posts in physics, mathematics and computer science.

  19. Michael, thanks for the reply. My response is a combination of (a) it’s inevitable, (b) it happens anyway, and (c) it doesn’t happen as much as you might fear.

    (a) It’s inevitable that researchers will become more aware of copyright and related regulatory issues as they move more into this space, simply by virtue of needing to understand a new space. The only hope is to establish norms so clearly and forcefully that nobody has to think very hard about it. There is a paradox here, because in trying to do so, you bring more people to think about it. I think this can be generalized to any kind of norm-setting, in any context. (Stop me if this doesn’t make sense.)

    (b) It happens anyway, exactly as you point out.

    I hasten to add that it happens even without copyright. In the U.S., at least, if you post a dataset, and someone else derives information from this, I don’t see any copyright issue (or any database right issue, since that regime doesn’t exist here). The only way to, say, try to compel attribution is via contract — by getting your users to agree to whatever terms you set as a condition of viewing your data. This is exactly what the OCLC is trying to do with its WorldCat data, since (given Feist v. Rural) there’s no meaningful copyright hook here.

    (c) Having watched this issue with CC licenses in applications outside science, it doesn’t seem like it happens that frequently. That’s not to say it never happens: just the other day, I commented on a group’s idea to encourage CC licensing in a context where fair use was perfectly sufficient. There’s a similar issue here: because boundaries are fuzzy, and nobody wants to end up in court, there can be a temptation to wear kid gloves and do less than we’re legally allowed, just to be extra sure. But if nobody stands up for all of what they’re allowed — even pushes the boundaries sometimes — then we allow the encroachment of property rights on what is and should be a commons, to the detriment of all. So the drift toward private ordering can be harmful when it moves the discussion from one of rights to be demanded to one of points of negotiation.

    The good news is, it really doesn’t seem to happen that frequently. And as (a) and (b) should suggest, I feel a bit fatalistic that it does and will happen almost no matter what. By focusing on openness, and offering licensing as a solution anyone can get behind, at least we have something to fight back with.

    None of this is to disagree with your original point that the issue of data sharing is more one of the lack of incentives to share than one of disincentives (legal barriers). There has to be more impetus from funders, institutions, societies, journals and publishers, libraries, and researchers and the public generally. There are various levers to pull here, and it’d be interesting and valuable to take a bird’s-eye view of which are most effective.

  20. Hi, Michael and Gavin.

    I was a bit confused by Michael’s comment here, “One thing I wonder about, though, is the rather arbitrary distinction between peer-reviewed research and gray literature. For example, should Scott Aaronson’s recent post (http://scottaaronson.com/blog/?p=381) about a purported proof of an important conjecture in computer science be included in ResearchBlogging? The work it describes hasn’t yet been peer-reviewed, yet I think Scott’s post is far more interesting and scientifically weighty than most posts about peer reviewed research. Much the same is true of many of the most interesting blog posts in physics, mathematics and computer science.”

    To wit, I would not classify blog postings as “gray literature.” I would go with the definition of gray literature I just found in a quick Google search, “…research papers, statistical documents, and other difficult-to-access materials that are not controlled by commercial publishers.” I would not put blog postings under that rubric.

    That brings up the question of how, then, to categorize blog postings. As you say, there is much of genuine scientific value in blog postings that have not yet been (and may never be) peer reviewed. Are you saying that the gatekeepers of ResearchBlogging are too rigid in that respect? Should there be some sort of ranking system within it? Or would that confuse matters and undermine the raison d’être of the ResearchBlogging enterprise? Maybe there should be an entirely new, scientist-controlled service (as opposed to mainstream media wire services) for breaking news in the sciences that would indicate that items are potentially significant but not yet reviewed? I would think such a service would be as valuable as ResearchBlogging is proving to be. Thoughts?

    Gavin’s comments on OCLC are especially interesting in the light of the backlash by libraries against OCLC’s recent foiled (for now) power grab and its subsequent climb-down. We shall see how that all plays out.

  21. Hope – I was unclear. I was referring to the (unpublished) pdf Scott points to, not to the blog post itself. The pdf is just sitting on someone’s webpage, yet (apparently) contains the solution to an important research problem. Similarly, blog posts about arxiv papers don’t appear in ResearchBlogging.org, so far as I’m aware, because those papers typically haven’t been peer-reviewed. That means ResearchBlogging excludes blog posts about a large fraction of the most interesting happenings in physics.

  22. Hi, Michael. Could part of the problem be that we are only now beginning to see search engines that can render the contents of PDFs searchable? I would be very interested in your views on how much of the challenge to open science is related to the simple fact that much of the interesting research is lying untapped in PDFs and in institutional repositories.

    For instance, I am very interested in the subject of amyotrophic lateral sclerosis and also saw a job open at Oregon State University for a digital librarian, so decided to explore its institutional repository here:

    http://ir.library.oregonstate.edu/dspace/index.jsp

    and found this very interesting dissertation on ALS:

    The detection of superoxide and implications for amyotrophic lateral sclerosis

    http://ir.library.oregonstate.edu/dspace/handle/1957/5003

    Now, that is the kind of thing that most search engines would miss and why I am so interested in this subject:

    http://www.osti.gov/ostiblog/home/entry/sophisticated_yet_simple_the_technology

    Sophisticated Yet Simple – The Technology Behind OSTI’s E-print Network

    And the fact that search engines like Mednar and its creator Deep Web Technologies are helping to render more of the Deep Web searchable.

    I’d be most interested to hear your thoughts on the role of search and the file format problem in open science.

  23. Hi, Michael. I just tried to leave a long note, but it seems to have vanished. The gist of it was asking you to comment on the challenge to open science of the fact that we are only now seeing search engines that can render the contents of PDFs searchable. Is part of the problem for open science simply that so much pioneering material is in PDF and thus not easily indexed by search engines?

    Also, what do you think of projects such as WorldWideScience.org and firms such as Deep Web Technologies?

  24. Hi Hope – making pdfs and multiple repositories searchable seems like a relatively straightforward technical problem; what bothers me more is the fact that many organizations don’t want to make their content available, or hide it in the deep web. That social problem seems both very challenging and significant.

  25. Hi Gavin,

    I’m not 100% sure I’m interpreting you correctly – I’m a bit uncertain about the antecedent to “it” in your second sentence. Just so I’m clear: are you referring to the issue of researchers becoming more aware of issues about licenses?

    I guess in a weak sense I agree that this will become so, as a result of OA mandates and so forth. But very few researchers have any sort of deep interest in licenses, and I think quite rightly: it’s not their job to be concerned with such matters, nor should it be.

    As I said earlier, making licenses more of an issue than the absolute minimum necessary strikes me as a path to trouble, with the same disastrous license proliferation we’ve seen with open source software, as well as many researchers starting to use copyright as an additional way of enclosing their work.

Trackbacks and Pingbacks

  1. Open Knowledge Foundation Blog » Blog Archive » Open Data Openness and Licensing
  2. Science in the open » Best practice for data availability – the debate starts…well over there really
  3. CameronNeylon.Net » Blog Archive » Best practice for data availability – the debate starts…well over there really