Taking Digital Scholarship 'Beyond the PDF'

By Tiffany Fox, (858) 246-0353, tfox@ucsd.edu

San Diego, Calif., Feb. 7, 2011 — Academic “white papers” are often the world’s introduction to developments in the domain sciences. But as digital scholarship and high-tech visualization become increasingly important in the academic world, these papers — which almost universally exist in the form of low-tech PDFs — can be as anachronistic in form as they are revolutionary in content. 
Attendees of the 'Beyond the PDF' workshop pose in UCSD's Engineering Courtyard. Instead of relying on static PDFs for digital scholarship, the participants want to 'unleash the full power of the Internet and crowd-source the digital printing press of the 21st century.'

Hoping for a sea change in the way that digital scholarship is conveyed and understood, a group of about 90 scientists, Web developers, publishers, scholars and visionaries gathered at the University of California, San Diego, from Jan. 19 to 21 (with an additional 100-200 participating virtually) to discuss the ways that emergent technologies and notions of open vs. closed access can move academia, in the words of conference organizers, “beyond the PDF.”

“The goal of the workshop was not to produce a white paper,” joked organizer Phil Bourne, a professor of pharmaceutical sciences at UC San Diego and an affiliate of the UCSD division of the California Institute for Telecommunications and Information Technology, where the workshop was held. “Instead, it was to identify a set of requirements for various classes of user — readers, authors, editors, etc. — to unleash the full power of the Internet and crowd-source the digital printing press of the 21st century.”

For Bourne and his colleagues at the workshop, the notion of evolving beyond the PDF was intended to capture a common philosophy, “not necessarily to be taken literally.”

“PDFs have their place, but they’re a bit like a pre-recorded television program – they don’t change, they’re static and preconceived. You know exactly what it will be like when you read it.

“But academic papers could really be more like live TV, where you never know what’s going to happen. Counterpoints are added to the initial thesis, a debate ensues and a new level of understanding is achieved and on it goes.”

PDF, or Portable Document Format, has become the open standard for document exchange because it allows the document to be shared independent of the application software, hardware and operating system. One advantage of the PDF is that it captures a complete ‘snapshot’ of a document’s layout, including text, fonts, images and graphics.

But, noted Bourne, the very nature of the PDF as a static document also limits how the information within it is displayed and consumed. For one, “it does not integrate the raw data associated with the research article, which means the data is computationally unusable.

“We want to make that data available to the reader so that the results then become reproducible,” he explained. “We also want to do things like use rich media — such as video and podcasts — to bring in other senses, enhance the comprehension of the work and make it interactive.”

The impetus for the January workshop came from a meeting of the Public Library of Science (PLOS) in March of last year, where attendees determined that the time had come for “a new kind of research article” (PLOS has been a standard bearer in the open access movement within the biosciences, and Bourne is Editor in Chief of one of the PLOS journals).

The resulting three-day ‘Beyond the PDF’ workshop featured 28 short presentations pertaining to six themes: annotation, data, provenance, new models, writing and reviewing. A discussion session followed each theme, with conversations triggered by memorable quotes from the presentations, the audience response or the workshop’s Twitter feed, which was periodically displayed on the large screen in the auditorium of Calit2’s Atkinson Hall.

Conference participants chat in the lobby of Calit2.
A significant portion of the workshop was devoted to work in small groups, which focused on four themes pertaining to the future of digital scholarship.
A significant portion of the workshop was subsequently devoted to work in small groups, which focused on four main goals whittled down from the workshop themes: 1) drafting a manifesto for the future direction of digital scholarship; 2) determining how the process of writing scientific papers might change as a result of an evolution away from the PDF; 3) determining the implications for attributing, evaluating and archiving scientific documents; and 4) determining the implications for business and intellectual property rights.

The workshop was co-organized by Anita de Waard (Elsevier), Ed Hovy (Information Sciences Institute), Gully Burns (Information Sciences Institute), Cameron Neylon (Science and Technology Facilities Council, UK) and Paul Groth (VU University, Amsterdam). The organizers hope that new standards for digital scholarship will be adopted within two years.

But first, Bourne and his colleagues plan to create a prototype for this new model by developing a set of digital scholarship software tools and applying them to a single research topic: Spinal Muscular Atrophy (SMA), an incurable neuromuscular disease characterized by degeneration of motor neurons. A number of scientific publishers, including Elsevier, have agreed to open up their complete corpus to the project.

“It’s an unprecedented opportunity to mine this information with a set of novel tools, which is very hard to do when you don’t have open access to the literature,” Bourne said. The team’s modernized research arsenal includes an online content manager with tools for semantic tagging (which helps computers better understand the meaning of information and create better connections), tools for writers that create semantic tags as they write, and tools that automatically link the user to related information the moment the user types a key set of terms.

“The scientists of tomorrow will have grown up with these tools and they will become part of the required scientific discourse,” predicted Bourne. “This evolution in digital scholarship will also open up knowledge to the developing world, which right now is at a distinct disadvantage because they don’t have access to this knowledge.

“With the continued trend toward open access and a platform that provides semantically linked information instantaneously,” he added, “the next Einstein might very well come from an unexpected place.”

For details on the workshop, including the ongoing discussion forum, see https://sites.google.com/site/beyondthepdf/.