Conference Report: Beyond PDF 2

Startups may be dramatically altering the landscape of trade and reference publishing, but a different and perhaps more vital revolution is emerging in academic publishing. I just attended Beyond PDF 2 in Amsterdam, a meeting of self-described revolutionaries who are helping direct a conversation that highlights how thoroughly and quickly old publishing regimes can fall. Over lunch, one of the statesmen of the open science community, Peter Murray-Rust, helped gather a group of people interested in actively promoting change. The group quickly proclaimed its own hashtag, #scholrev, which is the 2013 technology version of a quinceañera.

Led by the Force 11 initiative, which itself grew out of the first Beyond PDF meeting in San Diego two years ago, the meeting drew an A-list of stakeholders: new academic startups; faculty cajoling their colleagues into the future; existing companies, like the Social Science Research Network (SSRN), that are vigorously taking part in new modes of scholarly communication; and large publishers, such as Elsevier and Springer, that have already committed to significant technology investments. A tidy summary of the excitement generated by the meeting can be found in a blog post from one of its organizers, Paul Groth.

Major Changes

These changes are a direct result of the web's growing maturity. Academic results have been bundled into journals to facilitate their sharing since 1665, and in many ways, as was often noted at Beyond PDF, not much changed for nearly 350 years. Some scholarly articles, particularly in science, technology, and medicine (STM), have become far more complex in their structure and references, with pointers to datasets, embedded graphs, illustrations, and increasingly high-resolution images. But for all of that, the foundational work of the publisher (coordinating peer review, laying out the finished article, and distributing the product) changed little.

But it is changing with a vengeance now. What the web brings the scientist is not merely the capacity to automate peer review, or to make it more or less public at both the pre-publication and post-publication stages; these changes have been widely discussed and are already an established part of the shift taking place. Nor is it merely the ability to embed links to source image files, datasets, or even interactive simulations that can be made to dance before the reader’s eyes in an online journal. Rather, what the web enables is the opportunity to blow academic publishing all to hell.

I’m being glib, but it is true: the web enables scientists and researchers to take the actual data that drive their inquiries and make them directly available to all comers. Datasets can be deposited in academic repositories made for the purpose, or in new services such as figshare, a hot academic startup now ensconced in the Macmillan stable. Concurrently, data citation standards are coming into their own, liberating each piece of research from the single identifier of the journal article in which it was first published. Chemical data and structures; high-energy physics data; sky surveys; genetic analysis; linguistic references; archaeological data—the inspiration for experiment and analysis is being made available to everyone.

Beyond this, two other important things are happening. The first is that these data, charts, experiments, illustrations, and other research products are increasingly described using published data description standards built on formal models such as RDF. This enables researchers to link data and forge insights across different types of information, from different locations, with an ease previously unimaginable. It also lets computers find new associations between research objects, aided by natural language processing and by the discovery of correlations across vast stores of data. The rapid escalation of academic interest in this area, and the copyright and licensing issues it raises, have drawn the attention of a note in Nature.
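To make the linking idea concrete, here is a minimal sketch in Python. It assumes RDF statements can be reduced to plain (subject, predicate, object) tuples, and every identifier in it (`ex:dataset42`, `ex:compoundC`, and so on) is hypothetical; a real project would use full URIs and an RDF library rather than hand-rolled tuple matching. The point is only that two sources publishing in a shared model can be merged and queried as one graph.

```python
# Illustrative only: the "ex:" and "dc:" identifiers below are hypothetical
# stand-ins for the full URIs a real RDF dataset would use.

# Each source publishes statements as (subject, predicate, object) triples.
repository_a = [
    ("ex:dataset42", "dc:creator", "ex:researcherX"),
    ("ex:dataset42", "ex:measures", "ex:compoundC"),
]
repository_b = [
    ("ex:compoundC", "ex:hasStructure", "ex:structure7"),
    ("ex:article9", "ex:cites", "ex:dataset42"),
]


def merge(*sources):
    """Union the triples from several sources into one graph (a set of triples)."""
    graph = set()
    for source in sources:
        graph.update(source)
    return graph


def neighbors(graph, node):
    """Return, sorted, every triple in which node is the subject or the object."""
    return sorted(t for t in graph if node in (t[0], t[2]))


def follow(graph, start, *predicates):
    """Walk a chain of predicates from start and return the set of nodes reached."""
    frontier = {start}
    for predicate in predicates:
        frontier = {o for (s, p, o) in graph if s in frontier and p == predicate}
    return frontier


graph = merge(repository_a, repository_b)
print(neighbors(graph, "ex:dataset42"))
print(follow(graph, "ex:dataset42", "ex:measures", "ex:hasStructure"))
```

Neither repository alone connects the dataset to a chemical structure: calling `follow` on `merge(repository_a)` returns an empty set, while the merged graph returns `{'ex:structure7'}`. In practice this kind of cross-source question would more likely be posed as a SPARQL query against a triple store.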

Second, because web tools enable researchers to work with discrete products of the research process, each product can now be published independently. Instead of bundling the protocols, methods, and findings from many months of research into a single monolithic product (a journal article) that requires months of peer review and editing, often through multiple duplicative rounds before it is accepted, researchers can publish immediately in small chunks. These “nanopublications” can augment or replace the traditional capstone statement represented by a peer-reviewed article. In essence, a nanopublication is a minimum viable product for published academic research.

This is where academic publishing is light-years ahead of trade. What is happening as a result of this adoption of networked tools is far more important than the apparent inevitability of open access (which attendees of Beyond PDF 2, perhaps self-servingly, took as a given) or even the completion of a digital transformation in academic publishing. Instead, academic researchers are beginning to reconsider the underlying, fundamental workflow of research and publication.

What is still experimental in trade narrative fiction is increasingly commonplace in academia: reshaping not merely the product, but the thinking that yields it. And that’s really important. It means that scientists and humanities scholars alike can publish insights, results, and data (on a blog, in figshare, or in a microjournal) whenever they think it appropriate to share them with colleagues. In turn, they receive feedback on ideas far faster, and with far more focus, than when they had to package a much greater range of components into articles.

Ramifications

The web not only subverts peer review, it subverts science itself. It lets investigators rethink how they sequence a research project, when they solicit feedback, and how they collaborate with one another. It also radically broadens their audience. David De Roure, an Oxford University researcher, recently wrote “Pages of History,” a post looking back from the imagined vantage point of 2065, in which he noted, “Trying to bundle the infrastructure into the paper was bound to be problematic. The last two decades have seen a welcome return to narratives so that we can communicate all aspects of research between scientist, citizen and policymaker alike.”

Publishing results in variable-sized chunks, often in open access public forums with direct links to sources, means that science is becoming available to a wider range of the population than ever before. While not everyone can build an atom smasher in their garage, the replication of experiments and data analysis will turn “Don’t try this at home” on its head. Jokes about Ikea-like “Assemble it yourself” instructions aside, this is a transformation with great consequence. It won’t turn everyone into a citizen scientist, but it will turn science into an increasingly public engine of inquiry.

This has broad social and political ramifications; it demonstrates how far the accountability of science can extend beyond the publication of publicly funded results in open access journals. Daylight shining into science, and the potential to share and comment at Internet speed across the entire globe, are critical as we race through the 21st century with rising seas and violent storms. As we help forge a new scientific literacy, we must also forge equal access to knowledge across all of human society. That ramification was not lost on the revolutionaries in Amsterdam.