Cameron Neylon on practical steps toward open science

Cameron Neylon is a scientist at the UK Science and Technology Facilities Council, an open notebook scientist, and one of the most thoughtful advocates of open science (blog, Twitter). In an email interview I asked Cameron a few questions about practical steps that can be taken toward open science:

Q1: Suppose you’ve just been invited by the head of a major funding agency to advise them on open science. They’re asking for two or three practical suggestions for how they can help move us toward a more open scientific culture. What would you tell them?

For me the key first step is to ask what they see as the mission they are trying to maximise, and then to seek to measure that effectively. I think there are two main classes of reason why funders support science. One is to build up knowledge, and the other is to support the generation of economic and social outcomes from research and innovation. A third (but often brushed under the carpet) aim is prestige: sometimes the implicit target of small research funders, or of those from emerging and transitional economies seeking a place on the global research stage. There is nothing wrong with this, but if prestige is the target you should optimise for it; if you want other outcomes, you should optimise for those.

Our current metrics and policies largely optimise for prestige rather than for knowledge building or social outcomes. On the assumption that most funders would choose one of those two outcomes as their mission, I would say the simple thing to do is to actively measure those outcomes and to ask fundees to report on them.

For knowledge building: Ask about, and measure, the use and re-use of research outputs. Has data been re-used? Is software being incorporated into other projects? Are papers being cited and woven tightly into the networks of influence that we can now start to measure with more sophisticated analysis tools?

For social and economic outcomes: Similar to the above, but look more explicitly for real, measurable outcomes: evidence of influence over policy, measures of real economic activity generated by outputs (not just the number of spin-out companies), development of new treatment regimes.

Both of these largely seek to measure re-use, as opposed to counting outputs. This is arguably not simple, but since the aim is to re-align community attitudes and encourage changes in behaviour, it was never going to be simple. This kind of approach does, however, take what we are already doing, and the direction it is taking us in terms of measuring “impact”, and make it more sophisticated.
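
To make “measuring re-use” concrete, here is a minimal sketch of the sort of lookup such analysis tools might build on, assuming Python and the public Crossref REST API (my illustration; Cameron does not name any specific tool or service). It reports how often a given paper has been cited:

    import json
    import urllib.request

    def citation_count(doi):
        """Return the number of recorded citations to a paper, using the
        public Crossref REST API as one possible data source."""
        url = "https://api.crossref.org/works/" + doi
        with urllib.request.urlopen(url) as response:
            record = json.load(response)
        # Crossref reports incoming citations as "is-referenced-by-count".
        return record["message"]["is-referenced-by-count"]

    # Example: the DOI of Watson and Crick's 1953 paper on DNA structure.
    print(citation_count("10.1038/171737a0"))

A funder would of course aggregate such counts across an entire portfolio, and combine them with analogous measures for data and software re-use.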

Asking researchers to report on these things, and actively measuring them, will in itself lead to greater consideration of these broader impacts and change behaviour with regard to sharing. For some period between 18 months and three years, simply collect and observe. Then look at those who are doing best on specific metrics, seek to capture their best practice, and implement policies to support it.

Throughout all of this, accept that as research becomes less directed or applied, the measurement becomes harder, the error margins larger, and the picking of winners (already difficult) near impossible. Consider mechanisms to provide baseline funding at some low level, perhaps 25-50% of a PhD studentship or technician, direct to researchers with no restrictions on use, across disciplines, with the aim of maintaining diversity, encouraging exploration, and maintaining capacity. This is both politically and technically difficult, but could pay large dividends if the right balance is found. If the amount drops below a level that is useful when shared between a few researchers, it is probably not worth it.

Q2: Suppose a chemist early in their career has just approached you. They’re inspired by the idea of open science, but want to know what exactly they can do. How can they get involved in a concrete way?

I would tell any young researcher I speak to today to do three things:

1) Write as much as possible, online and off, in as many different ways as possible. Writing is the transferable skill and people who do it well will always find good employment.

2) Become as good a programmer/software engineer/web developer as possible. A great way to contribute to any project is to be able to take existing tools and adapt them quickly to local needs.

3) Be as open as you can (or as open as your supervisor will allow) in communicating all of the things you are doing. The next stage of your career will depend on who has a favourable impression of what you’ve done. The papers will be important, but not as important as the personal connections you can make through your work.

In concrete terms:

1) Start a blog (ask for explicit guidelines from supervisors and institutions about limitations on what you should write about). Contribute to Wikipedia. Put your presentations on SlideShare, and put screencasts and videos of talks online.

2) To the extent that it is possible, maintain your data, outputs, and research records in such a way that when a decision is taken to publish (whether in a paper, informally on the web, or anything in between) it is easy to do so in a useful form. Go to online forums to find out what tools others find useful, and see how they work for you. Include links to real data and records in your research talks. (One possible way of keeping records publication-ready is sketched after this list.)

3) Get informed about data licensing and copyright. Find the state of the art in arguments around scooping, data management, and publication, and arm yourself with evidence. Be prepared to raise issues of Open Access publication, data publication, licensing and copyright in group discussions. Expect that you will rarely win these arguments but that you are planting ideas in people’s heads.
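
As an illustration of point 2, here is a minimal, hypothetical sketch in Python (my example; Cameron does not prescribe any particular tooling) of keeping a dataset publication-ready: the data is written in an open format alongside a machine-readable metadata file, so that licensing and provenance are settled long before any decision to publish.

    import csv
    import json
    from datetime import date
    from pathlib import Path

    def save_dataset(name, rows, outdir="datasets"):
        """Write rows of data as a CSV file plus a metadata sidecar,
        so the record is easy to publish later in a useful form."""
        folder = Path(outdir) / name
        folder.mkdir(parents=True, exist_ok=True)

        with open(folder / "data.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0]))
            writer.writeheader()
            writer.writerows(rows)

        metadata = {
            "title": name,
            "created": date.today().isoformat(),
            # Decide the licence when the data is recorded, not at publication time.
            "license": "CC0-1.0",
        }
        (folder / "metadata.json").write_text(json.dumps(metadata, indent=2))

    # Hypothetical example: absorbance readings from a lab notebook entry.
    save_dataset("uv_vis_run_042", [
        {"wavelength_nm": 400, "absorbance": 0.12},
        {"wavelength_nm": 450, "absorbance": 0.34},
    ])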

Above all, to the extent that you can, walk the walk. Tell stories of successes and failures in sharing. Acknowledge that it’s complicated, but provide access to data, tools, software, and records where you can. Don’t act unilaterally unless you have the rights to do so, but keep asking whether you can act, and keep explaining why you think it’s important. Question the status quo.

Q3: One of the things that has made the technology startup world so vibrant is that there’s an enormous innovation ecosystem around startups – they benefit from venture capital, from angel investors, from open source, from university training of students, and so on. That’s part of the reason a couple of students can start Google in a garage, and then take it all the way to being one of the largest companies in the world. At the moment, there is no comparably successful innovation ecosystem in science. Is there a way we can develop such an innovation ecosystem?

There are two problems with transplanting the Silicon Valley model into science. First, capital and consumable costs are much higher: particularly today, with on-demand consumer services, it is cheap and easy to scale a web-based startup. Second, the timeframes are much longer. A Foursquare or a Yelp can be expected to start demonstrating revenue streams in 18-24 months, whereas research is likely to take much longer. A related issue is that the expertise required to contribute to these web-based startups is relatively common and widespread, in comparison with the highly focussed and often highly localised expertise required to solve specific research problems.

Some research could fit this model, particularly analytical tool development and data-intensive science, and certainly it should be applied where it can be. More generally, applying this kind of model will require cheap access to infrastructure and technical capacity (instruments and materials). Some providers in the biosciences are starting to appear, and Creative Commons’ work on MTAs [materials transfer agreements] may help with materials access in the medium term.

The most critical issue, however, is the rapid deployment of expertise to specific problems. To apply a distributed rapid-innovation model we need the means to rapidly identify the very limited number of people with the appropriate expertise to solve the problem at hand. We also need to rethink our research processes to make them more modular, so that they can be divided up and distributed. Finally, we need capacity in the system that makes it possible for expertise to actually be deployed rapidly. It’s not clear to me how we achieve these goals, although things like InnoCentive, CoLab, FriendFeed, and others are pointing out potential directions. We are a long way from delivering on the promise, and it’s not clear what a practical route there is.

Practical steps: more effective communication mechanisms will be driven by rewarding people for the re-use of their work. Capacity can be added through baseline funding. Modularity is an attitude and a design approach that we will probably need to build into training, and it will be hard to instil in a community where everything is bespoke and great pride is taken in eating our own dogfood but never trusting anyone else’s…