Micropublication and open source research
This is an extract from my (very early) draft book on the way the internet is changing how science is done.
I would like to legitimize a new kind of proof: ‘The Incomplete Proof’. The reason that the output of mathematicians is so meager is that we only publish that tiny part of our work that ended up in complete success. The rest goes to the recycling bin. [...] Why did [the great mathematician] Paul Cohen stop publishing at the age of 30? My guess is that he was trying, and probably still is, to prove [the Riemann Hypothesis]. I would love to be able to see his ‘failed attempts’. [...] So here is my revolutionary proposal. Publish all your (good) thoughts and ideas, regardless of whether they are finished or not.
- Doron Zeilberger
Imagine you are reading a research article. You notice a minor typo in the article, which you quickly fix, using a wiki-like editing system to create a new temporary “branch” of the article – i.e., a copy of the article, but with some modifications that you’ve made. The original authors of the article are notified of the branch, and one quickly contacts you to thank you for the fix. The default version of the article is now updated to point to your branch, and your name is automatically added to a list of people who have contributed to the article, as part of a complete version history of the article. This latter information is also collected by an aggregator which generates statistics about contributions, statistics which you can put into your curriculum vitae, grant applications, and so on.
Later on while reading, you notice a more serious ambiguity, an explanation that could be interpreted in several inconsistent ways. After some time, you figure out which explanation the authors intend, and prepare a corrected version of the article in a temporary branch. Once again, the original authors are notified. Soon, one contacts you with some queries about your fix, pointing out some subtleties that you’d failed to appreciate. After a bit of back and forth, you revise your branch further, until both you and the author agree that the result is an improvement on both the original article and on your first attempt at a branch. The author-approved default version of the article is updated to point to the improved version, and you are recognized appropriately for your contribution.
Still later, you notice a serious error in the article – maybe a flaw in the logic, or a serious error of omission material to the argument – which you don’t immediately see how to fix. You prepare a temporary branch of the article, but this time, rather than correcting the error, you insert a warning explaining the existence and the nature of the error, and how you think it affects the conclusions of the article.
Once again, the original authors are notified of your branch. This time they aren’t so pleased with your modifications. Even after multiple back and forth exchanges, and some further revisions on your part, they disagree with your assessment that there is an error. Despite this, you remain convinced that they are missing your point.
Believing that the situation is not readily resolvable, you create a more permanent branch of the article. Now there are two branches of the article visible to the public, with slightly differing version histories. Of course, these version histories are publicly accessible, and so who contributed what is a matter of public record, and there is no danger that there will be any ambiguity about the origins of the new material, nor about the origin of the disagreement between the two branches.
Initially, most readers look only at the original branch of the article, but a few look at yours as well. Favourable commentary and a gradual relative increase in traffic to your branch (made suitably visible to potential readers) encourage still more people to read your version preferentially. Your branch gradually becomes more highly visible, while the original fades. Someone else fixes the error you noticed, leading to your branch being replaced by a still further improved version, and still more traffic. After some months, reality sets in and the original authors come around to your point of view, removing their original branch entirely, leaving just the new improved version of the article. Alternatively, perhaps the original authors, alarmed by their diminution, decide to strike back with a revised version of their article, explaining in detail why you are wrong.
These stories illustrate a few uses of micropublication and open source research. These are simple ideas for research publication, but ones that have big consequences. The idea of micropublication is to enable publication in smaller increments and more diverse formats than in the standard scientific research paper. The idea of open source research is to open up the licensing model of scientific publication, providing more flexible ways in which prior work can be modified and re-used, while ensuring that all contributions are fully recognized and acknowledged.
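To make the stories above a little more concrete, here is a minimal sketch of the bookkeeping they assume: an article with named branches, a full version history for each branch, and a per-author contribution count of the kind an aggregator might collect. The names Revision, Article, commit, history and contributions are hypothetical, invented for this illustration; nothing here describes an existing system.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Revision:
    """One immutable step in an article's history (hypothetical model)."""
    author: str
    text: str
    note: str
    parent: Optional["Revision"] = None


@dataclass
class Article:
    """Named branches, each pointing at its most recent revision."""
    branches: dict = field(default_factory=dict)
    default: str = "original"

    def commit(self, branch, author, text, note, base=None):
        # An existing branch extends its own tip; a new branch starts from `base`.
        parent = self.branches.get(branch, self.branches.get(base))
        self.branches[branch] = Revision(author, text, note, parent)
        return self.branches[branch]

    def history(self, branch):
        # Walk parent pointers back to the first revision, then return oldest first.
        revisions, current = [], self.branches[branch]
        while current is not None:
            revisions.append(current)
            current = current.parent
        return revisions[::-1]

    def contributions(self, branch):
        # Who contributed how many revisions to this branch.
        return Counter(rev.author for rev in self.history(branch))


article = Article()
article.commit("original", "authors", "Teh result follows.", "first version")
article.commit("typo-fix", "reader", "The result follows.", "fix typo", base="original")
article.default = "typo-fix"  # the authors accept the fix

for rev in article.history(article.default):
    print(rev.author, "-", rev.note)
print(article.contributions(article.default))
```

Because every revision keeps a pointer to its parent, two competing branches can share the same early history while still recording exactly where, and with whom, they diverged.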
Let’s examine a few more potential applications of micropublication and open source research.
Imagine you are reading an article about the principles of population control. As you read, you realize that you can develop a simulator which illustrates in a vivid visual form one of the main principles described in the article, and provides a sandbox for readers to play with and better understand that principle. After dropping a (favourably received) note to the authors, and a little work, you’ve put together a nice simulation. After a bit of back and forth with the authors, a link to your simulation is now integrated into the article. Anyone reading the article can now click on the relevant equation and will immediately see your simulation (and, if they like, the source code). A few months later, someone takes up your source code and develops the simulation further, improving the reader experience still further.
Imagine reading Einstein’s original articles on special relativity, and being able to link directly to simulations (or, even better, fully-fledged computer games) that vividly demonstrate the effects of length contraction, time dilation, and so on. In mathematical disciplines, this kind of content enhancement might even be done semi-automatically. The tools could gradually integrate the ability to make inferences and connections – “The automated reasoning software has discovered a simplification of Equation 3; would you like to view the simplification now?”
Similar types of content enhancement could, of course, be used in all disciplines. Graphs, videos, explanations, commentary, background material, data sets, source code, experimental procedures, links to Wikipedia, links to other related papers, links to related pedagogical materials, talks, media releases – all these and more could be integrated more thoroughly into research publishing. Furthermore, rather than being second-class add-ons to “real” research publications, a well-designed citation and archival system would ensure that all these forms have the status of first-class research publications, raising their stature, and helping ensure that people put more effort into adding value in these ways.
Another use for open source research is more pedagogical in flavour. Imagine you are a student assigned to rewrite Einstein’s article on general relativity in the language of modern differential geometry. Think of the excitement of working with the master’s original text, fully inhabiting it, and then improving it still further! Of course, such an assignment is technologically possible even now. However, academia has strong cultural inhibitions against making such modifications to original research articles. I will argue that with properly authenticated archival systems these issues could be addressed, the inhibitions could be removed, and a world of new possibilities opened up.
Having discussed micropublication and open source research in concrete terms, let’s now describe them in more abstract terms, and briefly discuss some of the problems that must be overcome if they are to become viable modes of publication. More detailed resolutions to these problems will be discussed in a later post.
Micropublication does three things. First, it decreases the size of the smallest publishable unit of research. Second, it broadens the class of objects considered as first-class publishable objects so that it includes not just papers, but also items such as data, computer code, simulations, commentary, and so on. Third, it eliminates the barrier of peer review, a point we’ll come back to shortly. The consequence is to greatly reduce the friction slowing down the progress of the research community, by lowering the barriers to publication. Although promising, this lowering of the barriers to publication also creates three problems that must be addressed if the research community is to adopt the concept of micropublication.
The first problem is providing appropriate recognition for people’s contributions. This can be achieved through appropriate archival and citation systems, and is described in detail in a later post.
The second problem is quality assurance. The current convention in science is to filter content before publishing it through a system of peer review. In principle, this ensures that only the best research gets published in the top journals. While this system has substantial failures in practice, on the whole it has improved our access to high-quality research. To ensure similar quality, micropublication must use a publish-then-filter model which enables the highest quality research to be accurately identified. We will discuss the development of such filtering systems in a later post. Note, however, that publish-then-filter already works surprisingly well on the web, due to tools such as Google, which is capable of picking out high value webpages. Such filtering systems are far from perfect, of course, and there are serious obstacles to be overcome if this is to be a successful model.
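As a rough illustration of what publish-then-filter can mean in practice, here is a simplified PageRank-style iteration, the kind of link analysis that web search made familiar. It is a stand-in for illustration only, not the filtering system proposed here. The function name rank, the damping value, and the example items are all assumptions.

```python
def rank(links, damping=0.85, iterations=50):
    """Score items by incoming links; `links` maps an item to the items it cites."""
    items = set(links) | {t for targets in links.values() for t in targets}
    n = len(items)
    score = {item: 1.0 / n for item in items}
    for _ in range(iterations):
        # Every item keeps a small baseline score regardless of links.
        new = {item: (1.0 - damping) / n for item in items}
        for item in items:
            targets = links.get(item, [])
            if targets:
                # Pass this item's score on, split evenly among what it cites.
                share = damping * score[item] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # An item that cites nothing spreads its score over everything.
                for other in items:
                    new[other] += damping * score[item] / n
        score = new
    return score


# Hypothetical micropublications: two notes and a simulation all build on one data set.
links = {
    "note-a": ["dataset"],
    "note-b": ["dataset"],
    "simulation": ["dataset", "note-a"],
    "dataset": [],
}
scores = rank(links)
best = max(scores, key=scores.get)
print(best)
```

Everything is published first; the ranking is computed afterwards from how other work links to it. In this toy example the data set comes out on top because everything else builds on it.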
The third problem is providing tools to organize and search through the mass of publication data. This is, in some sense, the flip side of the quality assurance problem, since it is also about organizing information in meaningful and useful ways, and there is considerable overlap in how these tools must work. Once again, we will discuss the development of these tools in a later post.
Open source research opens up the licensing model used in research publication so that people may make more creative reuse of existing work, and thus speed the process of research. It removes the cumbersome quote-and-cite licensing model in current use in science. That model makes sense if one is publishing on paper, but is not necessary in electronic publication. Instead, it is replaced by a trustworthy authenticated archive of publication data which allows one to see the entire version history of a document, so that we can see who contributed what and when. This will allow people to rapidly improve, extend and enhance other people’s work, in all the ways described above.
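One way to picture the “who contributed what and when” record is as a hash chain, in which each entry's digest covers the previous entry's digest, so that altering an old entry invalidates everything after it. This is only a sketch under my own assumptions: the field names, the helper functions append and verify, and the example entries are invented, and a real archive would also need signatures and trusted timestamps.

```python
import hashlib
import json

FIELDS = ("author", "timestamp", "text")


def entry_digest(entry, previous):
    # The digest covers the entry itself plus the digest of the entry before it.
    payload = json.dumps({"prev": previous, **entry}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log, author, timestamp, text):
    entry = {"author": author, "timestamp": timestamp, "text": text}
    previous = log[-1]["digest"] if log else ""
    log.append({**entry, "digest": entry_digest(entry, previous)})


def verify(log):
    # Recompute every digest in order; any altered entry breaks the chain.
    previous = ""
    for record in log:
        entry = {key: record[key] for key in FIELDS}
        if record["digest"] != entry_digest(entry, previous):
            return False
        previous = record["digest"]
    return True


# Made-up history for illustration.
log = []
append(log, "authors", "2000-01-01T09:00:00Z", "Original article text.")
append(log, "reader", "2000-01-02T14:30:00Z", "Article text with a warning inserted.")
intact = verify(log)

log[0]["author"] = "someone else"  # try to rewrite who wrote the original
tampered = verify(log)
print(intact, tampered)
```

The chain by itself only makes tampering detectable to anyone who holds a trusted copy of the latest digest; it says nothing about whether the named authors are who they claim to be.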
Academics have something of a horror of the informal re-use that I may appear to be advocating. The reason is that the principal currency of research is attention and reputation, not (directly) money. In such a system, not properly citing sources is taken very seriously; even very illustrious researchers have fallen from grace over accusations of plagiarism. For these reasons, it is necessary to design the archival system carefully to ensure that one can gain the benefits of a more informal licensing model, while still adequately recognizing people’s contributions.
Overarching and unifying all these problems is one main problem, the problem of migration, i.e., convincing researchers that it is in their best interest to move to the new system. How can this possibly be achieved? The most obvious implementations of micropublication and open source research will require researchers to give up their participation in the standard recognition system of science — the existing journal system. Such a requirement will undoubtedly result in the migration failing. Fortunately, I believe it is possible to find a migratory path which integrates and extends the standard recognition system of science in such a way that researchers have only positive incentives to make the migration. This path does not start with a single jump to micropublication and open source research, but rather involves a staged migration, with each stage integrating support for legacy systems such as citation and peer review, but also building on new systems that can take the place of the legacy systems, and which are better suited for the eventual goals of micropublication and open source research. This process is quite flexible, but involves many separate ideas, which will be described in subsequent posts.