The PDF Puzzle: Will We Ever Really Move On?

23rd June 2015 | Phill Jones
Leather bound copy of the PDF 1.5 specification. *

Over the past few years, some publishers, librarians and researchers have been trying to develop approaches to supersede the dominant format in online scholarly publishing, the PDF. Adobe began developing the PDF in 1991, during a time when the model for information exchange was one of a singular point of publication, with no interactivity. The PDF is based on a pre-digital paradigm of publishing and, many would argue, is anachronistic.
There’s a lot wrong with the PDF as a format for publishing. It’s flat and horribly two-dimensional. While it’s possible to implement limited interactive features using embedded JavaScript, the format wasn’t originally intended to cover such use cases, and even simple implementations can be challenging compared with much more complex functionality in HTML5. PDF is also painfully static. As a digital version of a published page, PDFs weren’t designed to be editable. While it’s possible to do a little bit of manual touching up and annotation, real changes require the document to be recompiled, adding time and cost to publishing production workflows.

So if they are so horrible, why do the majority of researchers still click on the PDF links when they visit article pages? Is it purely inertia after all these years? Are readers still so accustomed to PDFs that they’ve simply never tried the highly functional, feature-rich HTML environments? I suspect not.

Perhaps, as this Quora question suggests, one of the attractions of the PDF is its linear reading environment and lack of distracting colourful adverts. There’s undoubtedly something to that, but as Marlo Harris explained at the ALPSP Disruption seminar that I moderated last year, Wiley responded to this sort of end-user feedback by creating the Anywhere Article. Despite the clean, linear environment and plenty of white space with no adverts, Wiley received a surprising number of inquiries from people asking where the PDF download link was.

The part of the PDF puzzle that we haven’t been able to address yet is one of its defining features, and so obvious that we sometimes overlook it. The clue is in the name: it’s portable. PDFs are downloaded as self-contained, cross-platform, printable documents, which means that you can keep them on your hard drive, email them, put them in a repository, or whatever else you want to do with them. Downloading a PDF gives a sense of ownership that cannot be replicated with HTML. At least, that’s what people think.

Some people argue that the printability is the unique selling point of the PDF. It’s true that there’s a familiarity to taking a printed article and a highlighter pen with you as you travel, commute or sit in a coffee shop, but the way that people consume information is changing rapidly as a result of the widespread adoption of tablet computing. Today, it’s very common to see people using iPads and Android tablets in previously work-hostile environments, like underground trains, city buses, or economy class on United Airlines. While I don’t think that the desire for printability is going away quickly, we’re in the middle of a steady shift away from paper and towards mobile computing.

The real problem is connectivity. It’s impossible to read the HTML version of an article when you’re not connected to the internet. Some publishers have looked to solve this problem by creating proprietary journal apps, but in many fields, particularly in science, researchers often read from dozens of journals during their literature searches and don’t want to install dozens of apps in order to do so.

The PDF suffers no such problem because it lives locally on the device and isn’t tied to a particular publisher.

WWGD: What would Google do?

I was recently speaking with Carissa Gilman of the American Cancer Society about the issue of offline access and she told me that her new favourite app is Pocket. Pocket is an app with Chrome and Safari extensions that enables you to easily store web pages on your computer or mobile device so that you can load them back up and read them when you’re not online. The Opera and Dolphin mobile browsers have similar functionality, but the workflow is a little clunky and hidden, so they’ve never really gained traction.

At the same time, Google are increasingly making their web apps offline-friendly. Gmail has an offline app and Google Docs implemented offline editing a short time ago. The offline components of these web apps not only let you work without connectivity but also make it possible to work without frustration when traveling in and out of mobile data coverage while on the move, or in a hotel room or conference centre with unreliable wifi. Web apps that sit inside the browser, store information locally and sync automatically when able blur the line between online and offline computing, allowing the user to move back and forth in a consistent environment without even thinking about it.
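
For readers who like to see the idea in concrete terms, the "store locally, sync when able" pattern can be sketched in a few lines of Python. This is only an illustration of the general approach, not how Gmail, Google Docs or Pocket actually work; the class name `OfflineFirstStore`, the `online` flag and the dictionary standing in for a remote server are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class OfflineFirstStore:
    """Hypothetical offline-first store: reads and writes always use the local copy."""
    remote: dict                                   # stand-in for a cloud or publisher server
    online: bool = False                           # assumed connectivity flag
    local: dict = field(default_factory=dict)      # what lives on the device
    pending: list = field(default_factory=list)    # keys changed but not yet synced

    def save(self, key, value):
        # Write locally first so the reader never waits on the network.
        self.local[key] = value
        self.pending.append(key)
        if self.online:
            self.sync()

    def read(self, key):
        # Reading works the same with or without connectivity.
        return self.local[key]

    def sync(self):
        # Push queued changes to the remote copy; does nothing while offline.
        if not self.online:
            return 0
        pushed = 0
        while self.pending:
            key = self.pending.pop(0)
            self.remote[key] = self.local[key]
            pushed += 1
        return pushed


server = {}
store = OfflineFirstStore(remote=server)
store.save("article-123/notes", "Check the methods section")  # offline: kept locally
store.online = True   # connectivity returns
store.sync()          # the queued change is pushed to the server
```

The point of the sketch is that the user's actions never depend on the network: everything is saved on the device, and the connection is only needed, eventually, to bring the remote copy up to date.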

What will the successor to the PDF look like?

If we finally want to go beyond the PDF, we have to stop looking at all the things that we don’t like about it, and look more carefully at why it’s still around. The successor to the PDF will be interchangeably online and offline. It may well look like a PDF and be printable, but it will also have web-enabled features like clickable references, metrics, embedded supplements, data sets and multimedia content. By replicating the look and feel of the PDF and making it truly portable through cloud-based, offline-friendly reading and reference management web apps, we can finally offer researchers something unequivocally better than the static, two-dimensional PDF.

*Image credit: Thanks to Ralph Giles. CC-BY Share alike