Reinventing scientific papers

By guest blogger Robin Blume-Kohout

In 2005, Slate published twelve essays on “How to reinvent higher education”. The opening paragraphs of one, by Alison Gopnik, still burn in my mind:

I’m a cognitive scientist who is also a university professor. There is a staggering contrast between what I know about learning from the lab and the way I teach in the classroom. … I know that children, and even adults, learn about the everyday world around them in much the way that scientists learn. … Almost none of this happens in the average university classroom, including mine. In lecture classes, the teacher talks and the students write down what the teacher says. In seminars, the students write down what other students say. This is, literally, a medieval form of learning.

In short, we are screwing up — and we should know better.

Scientific publishing — the primary means by which we communicate with other scientists — is in the same boat:

  1. We’re doing it badly,
  2. Our methods are medieval,
  3. We should know better.

Technically, point #2 is unfair. Scientific publishing dates from the 1660s, when the Philosophical Transactions of the Royal Society emerged from Henry Oldenburg’s voluminous scientific correspondence. If you wanted to show off your research in 1665, you wrote a letter to Henry. When he got it (a month or two later), he forwarded it to someone who could tell him whether it was any good. If the referee liked it, then (after a few more month-long postal delays), Henry read your letter out loud to the Royal Society, and it got recorded in the Transactions.

These days, it’s quite different. Specifically:

  1. We write letters in LaTeX, and email them,
  2. There are so many journals that nobody reads most of them,
  3. Henry doesn’t read your letter out loud.

The rest of the system is unchanged. This raises a bunch of questions, like “Why does publication take 6 months?”, “Why is it so expensive?”, and “Does anybody read journals, what with the arXiv?” I’m not going to discuss these questions, but if you’re interested, you might try the Wikipedia article on scientific journals. Which is a perfect example of why we should know better.

I’m not talking about the content. I’m talking about the article itself, and how I referenced it — with a hyperlink. I’ve given you incredible power. Quickly and easily, you can:

  • Verify my sources,
  • Find answers to questions I’ve raised — if you’re interested,
  • Get more detailed explanations,
  • Discover and explore related topics.

Enabling you this way is part of the core mission: The purpose of scientific communication is to educate, extensibly and efficiently. Education: After months of research, I publish a paper so that you can learn what I know — without all the hard work. Extensibility: I include proofs, arguments, figures, explanations, and citations — so that you can verify my work and place it in the context of prior work. Efficiency: Writing this way takes more months — but thousands of my colleagues can save months by reading my paper.

We are failing at efficiency, for Wikipedia illustrates a more efficient way of educating — or, if you prefer, a source for more efficient learning. I don’t mean that Wikipedia is The Answer. We need to build a new medium, replacing medieval features with the best features of Wikipedia. For instance,

  • Hypertext revolutionizes scientific writing, by organizing content as a tree instead of a list. Articles and textbooks have a linear structure. To find a specific answer, I have to read (on average) half the text. In a hypertext environment like Wikipedia, I can search through a cluster of ideas for answers — even to questions I haven’t been able to formulate yet. Hyperlinking specifically enables…
  • “Choose your own adventure” approaches to a body of work. Scientific papers represent a cluster of related ideas. Different readers, with different background knowledge, will benefit from different paths. A well-structured (and judiciously hyperlinked) electronic text can become the reader’s personalized guide. Parts of several such texts can be combined by a customized path, to form an entirely new text. This requires…
  • Modular content, dividing a text into bite-sized chunks. Modularity also offers intrinsic benefits. One is reusability; a single explanation can be referenced in many contexts. Current scientific writing is necessarily terse. Hyperlinks and modularity allow the text to be larded with optional explanations, which clarify potential confusion without breaking the flow. Modularity also allows alternative approaches, providing the reader with multiple analyses of the same concept. Such alternatives are particularly useful when combined with…
  • Distributed editing by a large community of contributors. This is a vast can of worms that I shan’t open here, but two things are clear. First, a forum for scientific communication cannot adopt Wikipedia’s “anyone can edit” motto. Second, the potential benefits of post-publication editing, combined with an unlimited pool of “editors”, are too great to ignore. Balancing these imperatives is an outstanding challenge, but a relatively uncontroversial technique is…
  • Attached commentary, either critical or explanatory, by readers. Consider, for example, the Talmud, where post-publication analysis (the Gemara) attempts to clarify the original text (the Mishnah). More recently, commenting systems have proliferated on blogs and (with much, much less intellectual rigor) news-sites like Slashdot. In a scientific publishing context, commentary can
    • correct mistakes, either technical or factual, in the original text,
    • provide an alternative to a module that (the reader feels) could be improved,
    • critique and question the original work,
    • update older work in light of new research.

These points are not a prescription. They are a manifesto (“We can do better, see!”), and a plea (“Help make it better!”). Published scientific communications are the collective memory of scientists. If we cannot access that memory quickly and efficiently, we are effectively brain damaged. Improving our access makes us — quite simply — smarter. All we need to do is use the computing tools before us intelligently.

We’ve taken first steps — the preprint arXiv, central repositories like PROLA, and online publishing by the likes of Nature. These are baby steps. We’re doing the same old thing a little better with new technology. Sooner or later, scientific communication is going to be restructured to really take advantage of what we can do now… and it’s going to make us (collectively) a lot smarter.

I can’t wait.

Comments

  1. This is a great idea, and I’m glad you’re pushing it, but I’m worried it will be a long time in coming. The reason is that we’re stuck in a sort of local minimum.

    Young scientists are unlikely to be the first to adopt novel publication techniques, because they especially need the recognition (e.g., from faculty hiring committees) that comes with the “old-school” way of doing things. Right now, search committees probably don’t know how to weight an online “modern” publication that isn’t through the standard “medieval” system. In other words, they see the benefit, but can’t afford the risk.

    Older scientists are unlikely to be the first adopters, because, well, they’re old, and, with a few exceptions, older people are more resistant to change.

    That leaves people in the middle: those who have more or less established themselves, but aren’t too old to learn new tricks. In terms of highly visible people in the quantum information field, that’s pretty much you (Michael), Dave Bacon, and Scott Aaronson. So, stop waiting and start trend-setting.

    One more thing: in order to help assuage some of the concerns about the negative aspects of Wikis and participatory publishing, why not allow designated reviewers to nominate and accept a paper for “permanent status”, once it’s reached a certain level of maturity? Readers would see the “permanent” version of the paper by default, with the option to see any new revisions. Authors would be able to tout the fact that their paper had been “accepted”, as a way of showing it wasn’t just some random junk that someone stuck online on a whim.

  2. Hi Robin —

    A good posting and a good parallel. Being me I’m going to quibble a little bit.

    First, like Travis said more delicately: while you may publish papers solely to help others learn what you have learned, some of those others publish them primarily to keep receiving a paycheck. Broken though the system may be for the purposes of efficient communication, it may be meeting the needs of those more concerned with the job than the science. While the replacement system doesn’t need to include this ‘feature’, it needs to take it into account.

    Second, I happen to really like text, and I generally find that a well-written text beats a naive set of hyperlinks any day. Given a FAQ that is broken down question by question, the first thing I do is look for the plain-text version so I can use it more efficiently. Claiming that you need to read half the text on average is flat-out false: if you can’t find the answer in the abstract (or title), you skim until you find a relevant section header.

    Ditto the modularity and the reuse — a well-written paper already has a lot of useful structure, and copy-and-paste is certainly common practice already. But the commentary, the crosslinks, the group editing: yes, yes, and yes. But like Travis says, don’t wait, do it!

  3. Just a small comment from the proprietor: Robin has told me that he’s checking in reasonably regularly to the comment thread, and will be participating.

  4. Travis: “stop waiting and start trend-setting”

    I agree completely, and hopefully will soon have more than vapour to show for it 🙂 Incidentally, Terry Tao and Tim Gowers are doing some very interesting stuff along these lines in maths; they’re not exactly junior.

    Travis: “Young scientists are unlikely to be the first to adopt novel publication techniques, because they especially need the recognition”

    I think there’s a question of “how young”. I’m pretty optimistic that upper-level undergrads and beginning grad students may be more open to trying new approaches. I think the tunnel probably narrows for postdocs and tenure-track folks.

  5. Robin, this is a good idea and a nice post. I’m glad to see it articulated so well in one place. But like Nathan, I’m still going to quibble a bit.

    I think the model you’re using for scientific publication is too restrictive. Specifically, it really describes theoretical papers much more than experimental ones. As you say: I include proofs, arguments, figures, explanations, and citations … but no data. For most experimentalists such as myself, the first reason to read scientific papers is to see the data our colleagues have collected, and the second reason is to understand and evaluate their theoretical analysis — in that order.

    Given that mode of reading papers, what experimentalists really care about are better ways of presenting data. For instance, the online optics journal Optics Express was launched several years ago to allow publication of data in audio and video formats. It’s a good journal, and it has helped set a new standard for how optics data get published. (Incidentally, it also makes its articles freely available online and speeds up the review and publication process significantly, two important near-term goals of improved scientific publication.)

    There are a lot of other issues surrounding publication models for data, such as whether and how to make published data in genetics and other fields available to other researchers for further analysis.

    I don’t argue with the idea that there are significant roles for attached commentary, distributed editing, and modular content, for both theoretical and experimental papers. (I’m not sold on the idea that hypertext is all that useful — like Nathan, I think a well-written paper already has a lot of good structure.) But I would like to see some more thinking about how this model of “Papers 2.0” would apply to data-based, experimental articles. There are an awful lot of us experimentalists, after all…

  6. Hi Robin, great post! I can’t wait either. With regards to what Travis wrote, I’m not sure that career concerns need to be a big hindrance to getting this off the ground. Wiki-style publishing does not have to be incompatible with publishing in journals. If you have written a good journal article, it should not require all that much extra work to modify it and make the material suitable for publishing as modules in a wiki. And conversely, if you have contributed something to a wiki, and it is substantial enough to warrant a journal publication, then it should be fairly easy to convert it to a journal article. The latter even has an extra benefit: Make sure to reference the wiki material in the journal article, and that way anyone who is interested enough in your work to follow references will become aware of the existence of the wiki, and may want to start using it themselves.

    This setup may not be optimal, since it might lead to the wiki at least initially becoming little more than a commentable and editable arXiv, and it would not immediately give us “micropublication” of the kind Michael discussed in an earlier post. But on the other hand, it would be a start, and a significant improvement over the status quo. We don’t need to hit an optimum solution right away (the first web search engines in the early 90s were enormously important to making the www useful, even though nobody uses them anymore). I think the most important thing right now is to get started, once a clear set of ideas has crystallized (Michael, how is that book coming along?). If we get a sizable core community of dedicated young idealists to start making wiki-versions of the work they publish (regardless of whether the wiki version is the main version or a derived one), over time it would hopefully attract a critical mass of other contributors too. And if after a long, long time many of the “non-wikified” professors have grown too old to sit on hiring committees, the need for journal publications may fade, but that does not need to happen anytime soon for the wiki model to be successful.

    So, I think the big question is, does anyone right now have good ideas for actual implementations? If a wiki were around, I would certainly be more than happy to post material from all my future research there, and help maintain and expand it for that matter. Any entrepreneurs around? 🙂

  7. Nate:

    You raise an excellent question: is the “point of publishing” really to convey information, or is it to justify academics’ existence?

    Clearly, it does both. Justifying my existence, however, cannot be the purpose of publication. Why? Because publications justify my existence only because they are of some intrinsic value. Society values them. Scientists are rewarded for publishing because — we presume — those publications are valuable, at least in aggregate.

    I maintain that the intrinsic value of publications is that they convey information — they educate. They provide information that citizens, engineers, and other scientists use, to produce more value. Improve the efficiency with which publications educate, and we increase their value. This is a Good Thing.

    I think research scientists should still be evaluated precisely as they are today — by how much they contribute to the body of scientific knowledge. Measuring “Papers 2.0” contributions may be a bit more complicated… but I think the community is flexible enough to do so. It’s not like we aren’t already dealing with a complex, Byzantine set of implicit rules for evaluating a CV!

    In the end, it’s a matter of trend-setting. If (a) people start publishing Papers 2.0, and (b) it really is as much better as I predict, and (c) a solid minority of the community recognize it, then I think it doesn’t matter whether you’re publishing to advance Science or to advance your career. Either way, you’ll be motivated to publish using the more effective system.

  8. Travis:

    Your point — that there is a lot of inertia keeping us in the current system — is not only right, but also one of the most important issues we have to consider. I think Michael has some very far-reaching and sophisticated ideas about how to address this “incentive structure” issue, but here’s my idea.

    Consider how you deal with water stuck in a local minimum. You dig a little trench, leading to the optimal solution. Then you let the water flow of its own accord, widening the trench as it goes.

    To do this, we start a new open-access journal that implements the core of “Papers 2.0”. Articles are heavily linked, hierarchically structured (i.e., results are summarized, with links to details and explanation), and use review and pedagogical content to supplement research papers.

    However, this journal is also authoritative, publishing discrete articles (consisting of a cluster of linked pages, rather than a sequential list) that can be cited. Content is reliable and fixed — no post-publication modification. We solicit and publish high-quality refereed material.

    Articles published in Journal 2.0 are full publications on your CV. If the structure really has all the advantages I think it does, they will get read — and cited — more than traditional publications. This is the “cash reward” for publishing there… in addition to the appeal of changing the world.

    At the same time, we can build in support for more controversial features — micropublication, distributed editing — and turn them on once Journal 2.0 achieves critical mass. Better yet, just open-source it, and let somebody else turn them on. I think this “gentle seduction” is the way out of our local minimum.

  9. Nate & Colin:

    It’s absolutely true that “a well-written text beats a naive set of hyperlinks any day.” I also agree that “a well-written paper already has a lot of useful structure,” and a lot of work goes into fine-tuning that structure in my papers.

    What I want is to make the structure even richer, and hypertext is a core primitive in doing so. By “hypertext”, I don’t imply any ideology, or stylistic imperative. I just mean the simple ability to mark up a text with interactive, dynamic links to other text.

    My dream paper is this: you take a really great scientific paper, one that’s already as good as a paper can be in linear format, and then you
    * link every piece of jargon to its definition,
    * link every concept to its explanation, and
    * link every theorem proof to an expanded version that lets me follow it without spending an hour wondering “How the heck does step 19 work?”
    Personally, I can’t imagine a paper that couldn’t be improved — at least a little — this way.

    Nate also points out that “copy-and-paste is certainly common practice already.” This really isn’t what I mean by modular/reusable text. For instance, in my latest paper I use the term “quantum process” about 40 times. That term deserves a full-page explanation, and I can’t possibly paste that explanation in every time I use it! In fact, space is at such a premium that I don’t include an explanation at all; I just cite a textbook.

    Modular text lets me do two things. First, I can link every appearance of that jargon word (and its near-synonyms “quantum channel” and “CP map”) to the same explanation. Second — and in the long run this is more important — I don’t have to write that explanation, because I can link directly to the really really good explanation that somebody else spent a month polishing! In a journal publication, even if I _could_ copy and paste that explanation into my paper, it would be plagiarism and/or copyright violation.
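
    To make that concrete, here is a rough sketch of the near-term version of the idea in plain LaTeX (the hyperref package is real; the command name, label, and appendix text are invented for illustration):

      % One command per jargon term; every use site links to one explanation.
      \documentclass{article}
      \usepackage{hyperref} % turns \hyperref and \ref into clickable links
      \newcommand{\qprocess}{\hyperref[sec:qprocess]{quantum process}}

      \begin{document}
      We reconstruct the \qprocess{} from tomographic data; all forty-odd
      appearances of \qprocess{} link to the same explanation below.

      \appendix
      \section{Quantum processes}\label{sec:qprocess}
      % In the full vision, this module would live in its own file, be
      % written (and polished) by somebody else, and be pulled in by \input.
      A quantum process (``quantum channel'', ``CP map'') is ...
      \end{document}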

    Nate also points out that I don’t really have to read half the paper to find any given answer: “if you can’t find the answer in the abstract (or title) you skim until you find a relevant section header.” Unfortunately, my experience with much of the scientific literature is that Sections 1…N-1 are required just to understand what the section header for Section N means.

    To be fair: that statement, like the original one (that I have to read half the paper), is slight hyperbole. But only slight. My honest experience is that in at least 50% of the quantum information literature, the time required to find an answer is linear in the paper length. However, this may just mean that I’m not sufficiently smart.

  10. Colin:

    You’re 103% correct. I wasn’t thinking about experimental papers, I don’t have the background to know how to deal with them, and we absolutely have to figure that out.

    Therefore, you’re hired. Your title is “Official Explainer of How Experimentalists Think,” and you start yesterday.

    🙂

    Having confessed my complete incompetence in that realm, let me hazard a couple of ideas. I think the tools I envision could be useful in a few ways that are specific to experimental papers:

    1. I think you care about the methods as well as the data. There are probably techniques and tools that aren’t old hat to every experimentalist, and it might be nice to add some linked explanation to “…and then we flabulated the beam with a catadioptric widgie hooper…”. (Okay, as a theorist, _I_ would really like this).

    2. Data analysis. A typical paper says “We wondered about X, so we built Y, took some data Z, and concluded that it meant Q.” From what I’ve seen, the analysis between Z and Q often gets very short shrift… and having an explanation of these details could be very useful for verifying that the results are significant.

    3. Numerics. Experiments these days often probe stuff sufficiently nasty that theorists can’t predict it on a chalkboard, so y’all run some pretty cool numerics to get a benchmark. Boy, would I like to have some details on those numerics linked in… not just the code, but an explanation of what’s going on, and what assumptions were made. Since a lot of codes are built on the same foundation, modular text might be handy here.

    Those are just ideas, and they may be entirely wrong-headed. I need to find out what experimentalists need…

  11. I really can’t agree, and I think your use of buzzwords — Papers 2.0 versus the “medieval” current standard — puts your argument on poor footing. The current “medieval” system has proven to work extremely well. You are going to have to come up with better arguments to change it. PDF papers on the arxiv already can have hyperlinking within the paper. (The electronic versions of good journals do, too, although I’m sure they maintain style guidelines.) You can have links pointing outside it, too, but fortunately that is less common.

    Your “attached commentary” remark is the only one I find interesting. As an author, I want to maintain control over my published articles. I think the majority of attachments would be of poor quality, and hence a burden for me. On the other hand, as a reader I could see rare cases when they could be helpful — but primarily for older papers, and to find commentary I just need to read the new ones.

    As to theorem proofs and extended proofs, I don’t want to write two versions of every proof. And I think much of the time, those “how does step 19 work?” moments wouldn’t be fixed by including an extended proof, because the problem is that the authors didn’t foresee your difficulty at all, as they had some different perspective.

    Are page limits such a problem? Myself, I have never had a problem publishing a paper with complete proofs (but I’ve never gone past 50 pages).

    Robin: “For instance, in my latest paper I use the term “quantum process” about 40 times. That term deserves a full-page explanation, and I can’t possibly paste that explanation in every time I use it!”

    Have a definitions section. Standard “medieval” technique, and it works perfectly well. If you want, put (Def. 3) after the term the first time it is used in a section, and hyperlink it to the definition. Standard technique that lets someone read section N and skip section 3, 4, …, N-1. Or, I suppose, you could just hyperlink the word itself for an arxiv paper.
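
    Sketched in today’s LaTeX, that standard technique is only a few lines (amsthm and hyperref are real packages; the definition text and label are invented):

      \documentclass{article}
      \usepackage{amsthm}   % provides \newtheorem
      \usepackage{hyperref} % makes every \ref a clickable link
      \newtheorem{definition}{Definition}

      \begin{document}
      \section{Definitions}
      \begin{definition}\label{def:qprocess}
      A quantum process is a completely positive, trace-preserving map ...
      \end{definition}

      \section{Results}
      % The first use in a section carries a linked back-reference:
      Every quantum process (Def.~\ref{def:qprocess}) admits ...
      \end{document}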

  12. Digital,

    I’m really glad you brought some of these points up — although, unsurprisingly, I’m going to respectfully disagree.

    Regarding buzzwords, I don’t think “medieval” counts as a buzzword, although I plead guilty to provocative language. I can’t take credit for “Papers 2.0”, which was introduced in the comments by Colin, but even so I’m not clear on why you feel it undermines my original post.

    You remark that “[t]he current ‘medieval’ system has proven to work extremely well,” and I have to ask: “What are you comparing it to?” It’s the only game in town, so we have no way of knowing whether it’s working well or not. History is full of confident statements along the lines of “The way we’re doing X right now is clearly the best way,” and it’s amazing how rarely they’re right.

    More importantly, I think many of the things you suggest actually contradict that claim. Publication is evolving rapidly; we publish on the arxiv, in PDF, and those papers can use internal hyperlinks. Is there some reason to expect that we should stop innovating exactly where we are right now?

    As you point out, hyperlinking within papers is already possible. It’s awkward, because PDF is a linear format with hyperlinking grafted on, but it’s there. Using that feature to enhance the usability of your Definitions section is an excellent idea, and very much in line with what I’m proposing.

    In fact, I think it’s such a good idea that I want to take it further. For instance, if internal hyperlinks can save the reader time and confusion, then links to outside material are potentially even more powerful. The problem — and I think this is why you write “fortunately that is less common” — is that the way they’re implemented right now is a kludge. Click on an external link in Acrobat, and you get shunted over to a web browser (if it works at all). This is one of the reasons I want to rethink the format of publication — there are a lot of powerful techniques that don’t graft on to the status quo very well.

    I sympathize with “As an author, I want to maintain control over my published articles,” and to an extent, I agree. However, the author of an idea does not retain control over that idea — other authors are free to take that idea and run with it, in novel directions. I want to enable them to do so in very close proximity to the original idea.

    For instance, you’re quite right that “those “how does step 19 work?” moments wouldn’t be fixed by including an extended proof, because the problem is that the authors didn’t foresee your difficulty at all.” This is one of the best reasons for modular text and attached commentary: the first reader to realize the difficulty can add commentary explaining how step 19 works, and subsequent readers can read that comment if and only if they have the same problem.

    On the other hand, there are definitely times where the author _does_ foresee a potential difficulty. My papers usually shrink by a factor of 2 between rough and final drafts, as I regretfully excise explanatory material because _most_ readers would just be bored and/or distracted by it. I’d like my papers to be completely self-contained for all readers. That can only happen if my “paper” is transparently linked to a vast amount of supporting knowledge… which is the vision that motivates my proposal.

  13. Hi Robin, with regards to your response to Travis, looks like we agree on strategy. But do you think it is even necessary to wait for Papers 2.0 (I happily use that buzzword for lack of any better term) to reach critical mass before turning on “controversial features”? It would definitely be a good idea to let papers published there be citeable in such a way that they can be put on a CV. To make the citations seem even more journal-like, there could even be discrete releases with volume numbers and everything. But wouldn’t it be better to do this by just providing a permalink or other kind of permanent reference which would always link to the author’s original submission, and then let anyone who so wishes go ahead and edit or add to it right away, as long as the new versions have references that are distinct from the one that the original author uses on his/her CV? Or do you think that authors will actually be so possessive of their own work that they will balk at anyone making derived or edited articles (and disregard all the times that they have probably wanted to do the same themselves)? Regardless of this issue though, what nitty-gritty work do you think needs to be done in order to make a “Journal 2.0” see the light of day?

  14. Jan,

    I’m glad you’re enthusiastic about the idea! There’s definitely an argument for leaping in with both feet. As John Wheeler said, “Start ‘er up, and see why she don’t run!”

    However, there’s also a strong argument — especially in anything related to academia — for planning wisely. We’re not precisely operating in the real world here; the markets and incentive structures are murky and sometimes counter-intuitive. I do not want to alienate large parts of the community by leaping before I look!

    Case in point: I mentioned Wikipedia because it demonstrates several terrific features. However, I don’t think Wikipedia is a good model for scientific publication! Permanence (of articles), authoritativeness (of stated facts), and credit (to authors) are really important, and I think the Wiki model is unsuited to doing this right. For this (and a few other) reasons, I don’t want to just fire up a wiki and start soliciting contributions.

    In short, I think we need to do some serious and intense thinking about the key features of the ideal system — and then, relatively quickly, start trying to build it. There’s a happy medium between going off half-cocked and overthinking. I don’t want to start building something before I know what the foundation is, because then I’ll realize that it has to be torn down and restarted.

    A couple of specific responses:
    1. “Wiki-style publishing does not have to be incompatible with publishing in journals.” As I mentioned above, I don’t think a wiki is the right foundation. Also, I don’t think that this “parallel publishing” model is sustainable, because authors do twice the work for less than twice the credit. I’d rather integrate with the current system — i.e., by allowing publications in a new format that exist on equal footing with existing journal publications.

    2. “[I]f you have contributed something to a wiki, and it is substantial enough to warrant a journal publication, then it should be fairly easy to convert it to a journal article.” This is a really good point. I’d like to take it further — I’d like to minimize the divide between collaboration and publication. I’d like to use one tool for collaboration, note-taking, and scratchwork AND for publication… so that publication becomes a matter of changing the permissions on a subset of your notes. But that’s a topic for a different day!

    3. “We don’t need to hit an optimum solution right away (the first web search engines in the early 90s were enormously important to making the www useful, even though nobody uses them anymore).” It’s an appealing analogy, but there’s a difference. The first search engines were filling an empty niche. Bad search was better than no search. Here, there is an existing system. A publication system has to be _clearly_ better than the existing system to motivate folks to switch.

    4. “[W]ouldn’t it be better to do this by just providing a permalink or other kind of permanent reference…then let anyone who so wishes go ahead and edit or add to it right away…?” This is the germ of a solution, but there are important conflicts to be resolved. If you put the original version on your CV, but 2 years later the most popular “front page” version of a paper is substantially different, how do we deal with that? If Alice makes a few changes to Bob’s paper, changing the conclusions so that Bob doesn’t agree with them, whose name goes on the paper? There are a few hundred other similar scenarios to be foreseen and dealt with… which is why I’m loath to leap before I look.

    And — of course — _this_ is part of the discussion that will (hopefully) enable a new system. So, thanks! and keep critiquing!

  15. Jan,

    One more thing, which I forgot to mention. You wrote, “Do you think that authors will actually be so possessive of their own work that they will balk at anyone making derived or edited articles (and disregard all the times that they have probably wanted to do the same themselves)?”

    Honestly, I do think they will — and I’m definitely somewhat sympathetic. Your “improvement” to my work might (from my perspective) be a fatal mistake, or worse. Consider all the concluding paragraphs that say (basically), “This paper conclusively demonstrates that our theory rocks, blah blah”, and consider how the authors might react to a modified concluding paragraph that reads “This paper demonstrates that our theory, while successful in certain areas, has critical limitations…” Frankly, I can think of several papers where I’d like to do exactly that — but I don’t labor under the illusion that the authors would approve!

    The Open Source movement in software is a tempting (and sometimes useful) role model, but there are some bedrock differences between code (which is really defined by what it DOES) and publication (which is partially literature/art).

    Finally, regarding “and disregard all the times that they have probably wanted to do the same themselves”… just to play devil’s advocate, I’ll point out that I occasionally want to hit certain people very hard — but I’m quite glad to live in a society in which neither I nor other people are allowed to do so. We need a better argument for the right to edit other people’s papers than “everybody wants to,” because Western legal philosophy holds (in general) that your freedom of action is trumped by my freedom not to be acted upon.

  16. Hi Robin,

    I’m a bit ‘constrained’ in my time lately… so, i figured i’d do a ‘core dump’ of what i believe i have to say and read the comments later. 😉 I know, not exactly a good practice, not my habit either… but, it’s the best i can do right now; so i apologize beforehand if i repeat some of the comments already made.

    Also, as opposed to praxis, i’ll put some references in the very beginning… for a reason: in this particular case i think we need more contextualizing than “referencing”; and that’s my point in putting them forth right away: if people can skim through them, it’s already good enough… maybe they’ll catch bits’n pieces of my line of reasoning this way (though i hope to make things clear as i continue this comment, below). 🙂

    So, there you have it (alphabetical order): ccLearn, Connotea, Did You Say “Intellectual Property”? It’s a Seductive Mirage, Directory of Open Access Journals, Free Culture Movement, Free Software/Free Science, Open Science or Free Science, Saving Academia from Market Enclosure, Scholarpedia, Science Commons, SciRate, The Academic Reader, The Future of Science is Open, Part 1: Open Access (http://3quarksdaily.blogs.com/3quarksdaily/2006/10/the_future_of_s_1.html), The Future of Science is Open, Part 2: Open Science, The Future of Science is Open, Part 3: An Open Science World, The New Science of Sharing, Zotero. If you guys read Portuguese (pt_BR), i wrote an article for a Brazilian magazine that could be interesting, Democracia e Acesso Livre ao Conhecimento (Democracy and Free Access to Knowledge). Not-so-loosely-related to these links, i keep a list of “Science 2.0” links on my blog (check on the right-hand-side, under “Science 2.0”). And, to sum up this reference list, here are the last ones: Transforming XHTML to LaTeX and BibTeX, XML, XSL and TeX: Room for Cooperation and DocBook to LaTeX XSL stylesheets.

    I hope that, with this set of links, my general point is somewhat clear… but, let me put it in writing, just in case. 😉

    I believe, as some other folks also do, that we’re at a crossroads right now, and here’s the reason: The Free Software movement has brought forward a “revolution” in thought, and it did so using a very simple principle, “freedom”. See, Computer Science was the first arena where the distance between “science” and “technology” singularly collapsed; here’s an example: If you were a graph theorist many moons ago, you were considered nothing but a ‘pure mathematician’ with very little practical use for society. Then, with the advent of the Internet and the massive connection of people worldwide… all of a sudden, this graph theorist became über-important: He can cook up a web browser overnight!

    Hyperboles aside, i’m sure you understand what i mean: our society has always paid dues to ‘technology’, to artifacts that can actually perform some task — and “Science” hardly fits this profile, hence the centuries-long difficulty in funding [basic] science and research. Now, after the Internet has shown us, society as a whole, a brand new world… we are coming to the realization that this brand new world intrinsically needs [basic] research… for the distance between Science and technology just shrank [to a point] in this new “computer age” of ours. That is, a Lattice QCD physicist, now, can be quite an important player in the development of a new supercomputer and new hardware at large… but — and here’s the trick! — nowadays society is ready to “understand” and assimilate what this means… and it’s also ready to value it accordingly. This is the revolution: the networking, connecting and “socializing” that the first Multics/UNIX systems provided for some has now been raised onto a whole new playing field: That’s what the Internet provided by connecting people together — now society knows that it can harness the ‘collective creative power’ via the ‘Net!

    In fact, this is clear to me in this “Web 2.0” wave: the “power of the collective creativity” funneled via technology into “Social News” and sharing of all kinds! (This was the basic idea behind Multics/UNIX; so, in this sense, Web 2.0 is nothing but “UNIX on the Internet”.) And here’s where i’ll close this first argument, in a circle: What the Free Software movement brought forward was the concept of freely sharing ideas! And this has been the actual power of the ‘Net since its inception. However, the important thing to note is that this idea of “sharing freely” is nothing new for Science and Scientists: Without freely sharing our ideas, we simply do not move forward! I don’t care how much people believe in the fairy tale of the ‘lonely genius’, this is absolutely NOT the most frequent case, nor is it the most pleasurable. The creative power of our science is inherently collective… science is a “team work”, it’s a “contact sport”! 🙂

    These are my reasons to think that we’re sitting on a critical point in time: all of our previous [mis]conceptions about “value” will have to change in order for us to move forward, as humans. The “distance” between Science and technology is not what it once was… in this new era of ours, it will be increasingly more difficult to attach ‘value’ to technology without realizing the true value of Science.

    Having said that, we must realize one thing within our own community, the scientific one: Among our values, one of the most basic ones is freedom. Freedom of information, freedom of sharing, freedom of collaboration, freedom of access, etc, etc, etc. Even during the ‘Cold War’ years we had collaborations between Russian and American scientists! Sure, it could have been better and more intensely harnessed… but, my point is that even with all of the difficulties that were presented at that time, Science still needed “freedom” in order to move on.

    The same is true now: We need freedom to access papers, to read them, to access [raw] data, algorithms, to communicate with our peers and expose our ideas! And it is in this last one that the problem with “publishing” comes in.

    I’m sure that, by now, we’ve all heard the stories about the journals “Topology” and “K-Theory”, along with the problems with publishing in the Physics world. Otherwise, John Baez has a good summary (besides, Google is always your friend): What We Can Do About Science Journals.

    What i mean by all of this is the following: This whole shebang is a paradigm shift. And, as such, there will be many non-trivial hurdles ahead, so we need to have pretty concrete and robust ideas and values in our minds… otherwise, we’re doomed to fail from the get-go. These things have to be clear in our heads so we can properly position ourselves and plan our tactics and strategies accordingly. “It’s always good to know what you’re doing.” It’s analogous to knowing the answer before you start calculating… 😉 We’re not gonna go to battle to lose… at least we should put up an honorable fight. 🙂

    As for the ideas you presented on your post, i agree: we really need to reform our current ways. The ways to do it are many, but i think it’s essential to recognize one simple fact: whatever these new ways may be, i’ll bet that they’ll all ‘harness the power of the collective creativity’. Be it in Wiki form (distributed editing, commentaries), be it in Modular Content form (different experts writing “core documents” about their area of expertise), and so on. Note that technologies like Connotea and Zotero (both linked above) already do this. (Also, there’s a PDF Reader for the Mac that allows you to comment on your PDF files! It’s like putting post-its all over your PDF — no different than Zotero…)

    This has clear problems, the signal-to-noise ratio being the one that leaps out most quickly. Minimizing the noise will be a behemoth by itself… maximizing the signal is a whole different ballgame altogether! 😉

    But, in this sense, i think that a scheme like that of Scholarpedia (linked above) can come in quite handy. Their modus operandi can be quite easily generalized.

    And, to finish this comment, let me bring XSL to attention (with its appropriate links above): Note that we can translate HTML, XHTML, XML, etc, all into LaTeX (and, thus, PS or PDF) using XSLs! This is quite a feat to my mind. Imagine this: you can have your articles in any format you so desire, for they can be readily converted into any other format easily with the touch of a button. This is sexy! 😉

    So, here are my quite preliminary views on this topic. I apologize for not having read the previous comments, and also if i went past the ‘polite’ length that a comment should have. But, after all talk and all, i felt i had to make a ‘core dump’. 😉

    []’s!

  17. It might be interesting to think about this problem from a higher level. The scientific publishing world has three deliberate purposes:

    1) Publish information.
    By “publish”, I mean the minimal sense: just making it publicly available in some form. “Real” journals don’t do this any more than the arXiv does, or even a PDF uploaded to a webpage.

    2) Provide metadata.
    Journals provide all sorts of metadata, from the explicit (author, date of submission, etc), to the implicit (the referees’ seal of approval).

    3) Organize information.
    Just by having different journals for different fields, journals help organize information. Citations serve as a medieval form of hyperlinking. Finally, most journals now offer online search facilities.

    To this, we can add a fourth implicit purpose:

    4) Reward authors via bolstered reputations

    All of this is obvious and stuff you already know, of course, but I think it’s helpful to sort it out like this.

    As far as I understand it, your project seeks to change relatively little about the nature of points 1, 2, and 4. Sure, authors may be able to provide a little more information because it can be better organized, but that’s not the big difference.

    What you’re looking to do is re-invent the interface to scientific knowledge.

    As we all know, Google often doesn’t do a very good job at finding scientific knowledge in an organized and complete manner, because it’s not smart enough to really understand scientific citations (keyword proximity doesn’t cut it). Wikipedia is often a good source for introductory material, but by design is not a primary reference. Plenty of systems exist for making discussion forums (such as this one), but they’re not adapted for discussing and augmenting scientific papers.

    Your goal might be accomplished by starting a new kind of journal, or you might solve it by creating a specialized search engine that allows user participation (this is the direction SciRate is going). I don’t know, but I bet the winner will be whoever provides the most functional and efficient interface.

  18. I just re-read my post, and it sounds kind of arrogant, which was not my intention. Robin and Michael: I’m sure you already have all this figured out. I’m just trying to re-frame the discussion here in this forum around thinking about this project as being about interface, first and foremost.

  19. Robin,

    Here’s something that i forgot to put in my comment above… and have forgotten another couple of times that i’ve remembered it: Toward a Higher-Dimensional Wiki and What might an expository mathematical wiki be like?

    I think that both of these posts have intrinsic and interesting overlaps with this discussion we’re having here. 🙂

    Besides, Travis mentioned metadata above… i can’t stress enough how important this is… but i’m sure we’re all aware of this. 😉

    []’s!

  20. For those who are interested in more of what Alison Gopnik has to say on the connection between scientists and children’s learning, I highly recommend “Scientist in the Crib” by Gopnik, Meltzoff, and Kuhl.

  21. Travis,

    I just want to violently disagree with two of your points: “All of this is obvious and stuff you already know, of course,” and “my post… sounds kind of arrogant… I’m sure you already have all this figured out”

    I hope you don’t object to my disagreement! 🙂

    First of all, I think your first post is precisely correct, and I can’t improve on “What you’re looking to do is re-invent the interface to scientific knowledge.” I believe that the interface is hugely important.

    Your post provides (at the very least) a new interface, by refactoring the issue as [publish] x [metadata] x [organize] x [reward]. This is already causing new synapses to fire in my head. I appreciate that!!

    If I had this all figured out, or knew somebody who did, I’d be holed up in my basement making it happen. Instead, I’m stirring up discussion in the hopes of eliciting thought processes exactly like that one.

    BTW, one final comment. How can/should we describe the boundary between “interface” and “content”? It’s not sharp. A new, simplified proof of an old theorem is acknowledged to be worthy… and review articles are generally highly valued. I wonder whether there _is_ a boundary! After all, programs are data too…

  22. Robin,

    With regards to your comment/question about “interface” vs “content”… i think i understand this in much the same way i understand “manifold” and “[its] boundary”: intrinsically, it can be quite hard to tell these things apart… until one comes up with the appropriate tool, e.g., Differential Geometry [that will make intrinsic measurements].

    Here it is, more explicitly: “boundaries” determine ‘global’ aspects, while the ‘manifold’ itself determines ‘local’ aspects.

    So, while you’re correct in saying that it’s hard to see the difference between ‘interface’ and ‘content’ (manifold and boundary), the point is this: just like the boundary of a manifold determines its global structure, the interface of this project will determine the “global” properties of the content that can be delivered.

    See, in a clear way, the arXivs already had to go through some “interface” changes: not only the numbering itself, but the intrinsic metadata and the addition of trackbacks and such (e.g., RSS). 😉

    So, this is a very nontrivial issue, for we (as humans) are not usually comfortable thinking in ‘global’ terms. 🙂

    []’s!

  23. Thanks Robin,

    To me, the boundary between interface and content is actually quite sharp. Imagine that all the world’s scientific information is in a huge database, perfectly organized and tagged with every bit of metadata you could imagine. Furthermore, imagine that any new information (comments, simplified proofs, etc.) is automatically added to that database. That’s the content (+metadata).

    The interface is how you make this content available to the world. If the database is online but people have to write their own SQL queries to get information, well, that counts as an interface (but not a very good one). On the other end of the scale, if the interface is so smart that it allows an inexperienced reader to find whatever he or she wants, then automatically suggests all the relevant content, explanations, etc., that would be a great interface. There’s no difference in content between these two interface scenarios, but they’re worlds apart in usefulness.

    That said, I can understand how content and interface get intertwined. Right now, the interface we have uses PDF (essentially digital paper) as a major component, and this is so restrictive that it actually discourages authors from submitting all the content they might otherwise make available. If I have an easily understandable proof of something, but it’s too long for PRL’s page limit, then the only thing that gets published is some indecipherable condensed mess. It’s a real shame that our current publication model actually rewards authors for degrading the readability of their work. Imagine if PRL had no page limit, but required authors to submit a 2–4 page extended abstract if their work was long. Even better, authors could just submit one long paper, with large chunks tagged as “supplemental”: experts who just wanted to skim could set their browsers to skip the supplemental material, while those who wanted more depth could get it. The tags could even be more specific, allowing authors to tag explanations and definitions that would only be useful to novices in one way, and details that only experts would care about in another way. All this tagging (the author-side interface) could probably be done with a few minor tweaks to RevTeX.
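
    As a sketch of how little machinery the author-side tagging might need (the comment package is real; the “supplemental” environment is invented here, not an actual RevTeX feature):

      \documentclass{article}
      \usepackage{comment} % real package: defines switchable environments

      % Flip these two lines to build the full vs. the skim edition:
      \includecomment{supplemental}   % typeset the tagged material...
      %\excludecomment{supplemental}  % ...or silently drop it

      \begin{document}
      We prove the main theorem by induction on $n$.
      \begin{supplemental}
      In detail: the base case $n = 1$ holds because ...
      \end{supplemental}
      \end{document}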

    Getting back to your example of a simplified proof of an old theorem, I see this as new content. Providing readers viewing the old theorem with a link to the simpler proof is interface. Metadata is what lets your database realize the relevance of the new proof to the old theorem.

    It occurred to me that you can build such a database without involving yourself much in the publication process (this is what Google does). If it were me pursuing this project, I’d be thinking about building a science search engine rather than starting a new journal. For one, starting even just a “conventional” journal requires a heck of a lot of work, and most of that work does nothing to advance your goals. Another issue is that if you start a journal, you can only help one specific field, whereas a search engine can quickly be expanded to cover all areas of science.

    Some key issues with building a science search engine:

    1) In order to succeed, you have to do a better job than anything else that’s out there. The baseline is Google.

    2) Everyone talks about “semantic search” (searching websites using metadata provided by the sites themselves), but none of the major search engines do it because it suffers from a fatal flaw: site owners have too many incentives to provide bad metadata to game the system and drive hits to their sites. Fortunately, this problem is nearly non-existent for scientific search, since authors have no incentive to provide bad metadata. Semantic search is how you can beat Google in providing useful search results.

    3) To make semantic search work, you need at least one well-defined format for the metadata (and hopefully not too many). The Open Archives Initiative (which arXiv.org participates in) is a good start, but you’ll need more than that. Develop an open standard for the metadata you need, and convince a few big journals to adopt it. (A sketch of the kind of record involved follows this list.)

    4) So far, this addresses primary publication, but not things like comments, links to relevant papers, etc. That can all be added on to the search engine structure–when users click a link in the search results, they get the paper, but also a listing of other relevant papers, comments other users have posted, etc.

    5) Don’t be afraid to sell advertising to make money from the site. The targeted nature of the site means it can deliver high-value ads (e.g., if I’m looking at a paper on building high-power UV lasers, I might just want to buy one instead), and advertisers pay a lot for that. If you don’t want to go the route of a for-profit company, donate the money to funding science education or open access journals or whatever. Perhaps you could even share the advertising revenue with journals in return for them providing free full access to their content.
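
    To fix ideas on point 3: BibTeX is already a rough lingua franca for the explicit half of the metadata, and a semantic-search standard could grow out of records like this one (every field value below is invented for illustration):

      % Everything in this record is made up, purely to show the shape.
      @article{doe2007example,
        author  = {Doe, Jane and Roe, Richard},
        title   = {Tomography of quantum processes},
        journal = {Journal 2.0},
        year    = {2007},
        note    = {An open metadata standard would add machine-readable
                   fields here: subject codes, referee status, and links
                   to commentary and derived versions.}
      }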

    This may all sound daunting, but it’s doable. Some friends of mine recently started a search engine for electronics parts (an idea born from the years of frustration that come from being grad students trying to find parts for experiments). Their site is octopart.com — you might want to get in touch with them to see what they’ve learned from their experiences.
