Publishers: Why is all your content not accessible?

16th December 2014 | Guest Author

Kaveh Bazargan is a physicist by training. In 1988 he founded River Valley Technologies in London, in order to introduce computer-generated illustrations to UK publishers. The main business is now typesetting for STM publishers, using the only “pure” XML-first system in the industry. In recent years River Valley has been working on cloud-based platforms for publishers, including an end-to-end, XML-based authoring-to-publication platform.

I am privileged to know a man called John Gardner, a distinguished solid state physicist at Oregon State University. In 1988, at the age of 48, he underwent a routine eye operation that he did not react well to. Tragically, and unexpectedly, he lost his eyesight completely. Having come to terms with his new life ahead, he was keen to get back to work and to continue his research and his teaching. But the only way he could read papers was to have his wife read them aloud to him! Nothing was “accessible”. He persevered and even went on to establish a successful company to help others needing assistive technologies.

So here we are, 25 years on. Can John click the DOI link of a paper and have it read out to him automatically? Or access a Braille version? Well, certainly not if there is heavy math in it, as there would be in the case of physics papers. Many publishers are even converting their equations to static bitmap images, thus guaranteeing that they will never be accessible!

Now let’s look at another case where accessibility could be improved. In some forms of dyslexia, it is thought that the visual system interprets letters on a page differently to the average person. Most of us can easily distinguish a “p” from a “b” (or an “n” from a “u”), but it is thought that for dyslexics, the brain unconsciously rotates letters as it is trying to interpret them, and ends up misinterpreting them. All our brains do something similar to a certain extent; think of how we all know that a table placed upside down is still a table! So people with dyslexia have to work harder than average to read text in conventional “symmetric” typefaces. Well, there are at least two typefaces designed to address this problem: OpenDyslexic and Dyslexie. And apparently these “bottom heavy” faces are much easier for dyslexics to read.

So, all publishers: raise your hands if your content is accessible in dyslexic fonts. Hmm, don’t see any hands… I know what you are thinking – you would love to have your content fully accessible, but you just don’t have the manpower in your IT team to have the content in every possible accessible form.

Enter XML

Well, I have good news for you. Most publishers (at least journal publishers) now archive the full XML version of their content. If we think back some 10–15 years, the whole logic behind creating XML was to allow new types of formats to be produced painlessly. If the XML is accurate as well as granular, then it is not technically hard to use it to create accessible content (for the visually impaired, dyslexics, etc.) automatically, with no extra burden on the publisher. A third party can do it. So why is that not being done now? Well, there are technical and legal challenges.
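To give a feel for how little work is involved once accurate XML is available, here is a minimal sketch in Python. It is an illustration only, not any publisher's actual pipeline: the sample XML, its element names (`article`, `title`, `p`) and the function `xml_to_accessible_html` are all made up for this example, and real journal XML is far richer. It also assumes the reader has the OpenDyslexic font installed; otherwise the browser falls back to a generic sans-serif face.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, heavily simplified article XML (element names are assumptions).
SAMPLE_XML = """<article>
  <title>On accessible publishing</title>
  <p>First paragraph of the paper.</p>
  <p>Second paragraph, by Götz &amp; colleagues.</p>
</article>"""


def xml_to_accessible_html(xml_text, font_family="OpenDyslexic"):
    """Turn the simplified article XML into one HTML page set in the given font."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title", default="")
    # itertext() also collects text nested inside inline elements of a paragraph.
    paragraphs = ["".join(p.itertext()) for p in root.iter("p")]
    body = "\n".join(f"<p>{escape(text)}</p>" for text in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en"><head><meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f"<style>body {{ font-family: '{font_family}', sans-serif; line-height: 1.6; }}</style>\n"
        "</head><body>\n"
        f"<h1>{escape(title)}</h1>\n{body}\n"
        "</body></html>"
    )


html_out = xml_to_accessible_html(SAMPLE_XML)
print(html_out)
```

The same XML could just as easily feed a text-to-speech or Braille converter; only the output stage changes, which is exactly the point of keeping the XML accurate and granular.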

Technical challenges

The most basic technical challenge is that with very few exceptions (mainly Open Access publishers) the XML is hidden from public view. This raises the question of why the XML is produced in the first place! A second potential problem is that the XML is generally not as accurate as the PDF. For instance, I just looked at the XML for a paper published four days ago by a well-known (but nameless) OA publisher, where all non-standard characters have been replaced with a question mark, e.g. “Götz” is given in the XML as “G?tz” – clearly a completely useless XML file!
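Errors like “G?tz” are easy to catch automatically. The sketch below is a rough heuristic of my own for illustration, not a complete validator: it flags any word containing a question mark sandwiched between two letters, or the Unicode replacement character (U+FFFD), both of which commonly indicate a character lost in a bad encoding conversion. The function name `find_suspect_words` and the sample sentence are made up for this example.

```python
import re

# A "?" directly between two letters is rarely legitimate prose.
# [^\W\d_] matches any Unicode letter; \ufffd is the replacement character.
SUSPECT = re.compile(r"(?<=[^\W\d_])\?(?=[^\W\d_])|\ufffd")


def find_suspect_words(text):
    """Return the whitespace-delimited words that contain a likely lost character."""
    return [word for word in text.split() if SUSPECT.search(word)]


sample = "Reported by G?tz and Müller. Is this correct? Yes."
print(find_suspect_words(sample))  # prints ['G?tz']
```

Note that the ordinary question mark at the end of “correct?” is not flagged, because it is not followed by a letter. A check of this kind could run over the text content of every XML file before it is archived.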

Legal challenge

Even when the XML is available, it is not clear if a third party is allowed to create a new format from that. This is the case even if one has paid a subscription fee for the article. And for OA articles, the inclusion of “ND” in a CC-BY license explicitly forbids any “derivative works”, including creating a new format.
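A third party could at least check mechanically whether an article's license is one of the ND variants before attempting a conversion. The sketch below assumes the license is given as a standard Creative Commons URL of the form `https://creativecommons.org/licenses/<code>/<version>/`; the function name `forbids_derivatives` is made up for this example, and the check says nothing about non-CC licenses or about what a subscription contract permits.

```python
from urllib.parse import urlparse


def forbids_derivatives(license_url):
    """Return True if a Creative Commons license URL names a NoDerivatives (ND) variant."""
    parts = [p for p in urlparse(license_url).path.lower().split("/") if p]
    # Expected path shape: licenses/<code>/<version>, e.g. licenses/by-nc-nd/4.0
    if len(parts) < 2 or parts[0] != "licenses":
        raise ValueError(f"not a recognised CC license URL: {license_url}")
    # The license code is a hyphen-separated list of elements such as by, nc, nd, sa.
    return "nd" in parts[1].split("-")


print(forbids_derivatives("https://creativecommons.org/licenses/by-nd/4.0/"))
print(forbids_derivatives("https://creativecommons.org/licenses/by/4.0/"))
```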

My advice

In order to make a start at providing accessibility, here is my humble advice to publishers:

  • Publish your XML – the whole point of XML is to allow creation of new formats easily. So if I pay for an article, give me not only the HTML and the PDF, but the XML too.
  • Ensure your XML is correct. Mandate automated XML-first pagination from your suppliers to avoid embarrassing errors like the one pointed out above.
  • Make sure that your licenses allow third parties to use the XML to create new formats, and even encourage them to do so.
  • If you are an OA publisher, please don’t use the ND versions of CC licenses.