Robotics Takes On Scientific Publishing

Artificial intelligence (AI) and our working lives

Artificial intelligence (AI) and robots continue to creep into our working lives. Not only are truck drivers, janitors, and bricklayers facing obsolescence in the near future, but AI is also influencing knowledge work, including paralegal work, medical diagnostics and other professions that require less physical manipulation and more interpretation of data. Job automation has also extended to journalism and the creation of news and other content. It is possible that robots might infiltrate the scientific communication ecosystem in the same way.

Robotics has been employed in newsrooms for several years, starting small but becoming much more sophisticated lately. The automation of news had initial success in the business and sports sections with the creation of brief game synopses and earnings reports. These two- to three-sentence capsules, familiar to many of us, are increasingly written by software that takes machine-readable data, such as a box score for a game or earnings data from corporations, and generates summaries for a general readership.

Sports briefs might include not only the winning team and the score but also the player who scored the game-winner, any exceptional effort from other players, the resulting league or division standings, the number of consecutive victories (or losses), or the total runs, goals or hits for a player who is vying for (or has moved into) the league lead. All of this information is gleaned from the official tabular score of the game and other sources that can be read by software and used to assemble a coherent distillation of the game.
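To make the idea concrete, here is a minimal sketch of how a box score might be turned into a two-sentence brief by filling a template. It is only an illustration and not how any particular newsroom product works; the function name `game_recap`, the field names and the sample teams are all invented for this example.

```python
def game_recap(box):
    """Build a two-sentence recap from a dict of box-score fields.

    The field names used here are hypothetical, chosen for illustration.
    """
    home, away = box["home"], box["away"]
    # Pick winner and loser by comparing final scores (ties are not handled).
    winner, loser = (home, away) if home["score"] > away["score"] else (away, home)

    first = (
        f"{winner['name']} beat {loser['name']} "
        f"{winner['score']}-{loser['score']} on {box['date']}."
    )

    second = f"{box['game_winner']} scored the game-winner"
    # Mention a winning streak only when there is one worth reporting.
    streak = winner.get("win_streak", 0)
    if streak >= 2:
        second += f", extending the team's winning streak to {streak} games"

    return f"{first} {second}."


# Invented sample data standing in for a machine-readable box score.
sample = {
    "date": "Saturday",
    "home": {"name": "Rivertown", "score": 3, "win_streak": 4},
    "away": {"name": "Lakeside", "score": 2},
    "game_winner": "J. Alvarez",
}

print(game_recap(sample))
```

Real systems presumably draw on far more fields and vary their phrasing, but the basic move is the same: structured data in, templated sentences out.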

Similarly, business stories are generated from machine-readable data and include a company’s revenues, expenses (including one-time costs), quarterly sales increase or decrease, and earnings per share, as well as the stock’s closing price and its gain or loss. This information can be collected by software from data available online to create a couple of sentences summarizing the quarterly or annual performance of a company. For more on automated journalism, see coverage of two of the more popular products: Narrative Science and Wordsmith.
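An earnings capsule can be sketched the same way. The example below assumes a simple dictionary of figures and computes the year-over-year revenue change before writing two sentences. The function names, field names, company and numbers are all made up for illustration and do not reflect how Narrative Science or Wordsmith actually work.

```python
def pct_change(current, previous):
    """Percentage change from previous to current."""
    return (current - previous) / previous * 100


def earnings_brief(r):
    """Summarize a (hypothetical) earnings record in two sentences."""
    change = pct_change(r["revenue"], r["prior_revenue"])
    # Choose the verbs from the sign of each change.
    direction = "rose" if change >= 0 else "fell"
    move = "gained" if r["stock_change"] >= 0 else "lost"
    return (
        f"{r['company']} reported {r['period']} revenue of "
        f"${r['revenue'] / 1e6:.1f} million, which {direction} "
        f"{abs(change):.1f}% from a year earlier, "
        f"with earnings of ${r['eps']:.2f} per share. "
        f"Shares closed at ${r['close']:.2f}, having {move} "
        f"${abs(r['stock_change']):.2f}."
    )


# Invented figures standing in for data collected online.
report = {
    "company": "Example Corp",
    "period": "third-quarter",
    "revenue": 125_000_000,
    "prior_revenue": 100_000_000,
    "eps": 1.5,
    "close": 42.0,
    "stock_change": -0.75,
}

print(earnings_brief(report))
```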

For unstructured texts, the NVivo software offers some promise of generating prose that can be worked into an intelligible narrative. It won’t write a publishable piece (yet) but can inform and jump-start the creation of content that can be proofread, formatted and otherwise reviewed prior to release.

Since the early days, robot-generated articles have become a lot more sophisticated. A machine-generated obituary for (appropriately enough) AI pioneer Marvin Minsky in 2016 was widely noted, although an article in Wired marking this milestone stated that there was still a need for human review of these stories.

The need for oversight may not be going away, but the progress of this kind of software marches on, now touching scientific communication. A recent article in Research Information features an electronic lab notebook (ELN), sciNote, which includes a component to generate manuscripts based on laboratory or other research data, presumably in the same way it is done with box scores for sporting news. The author selects one or more experiments for which they have used the ELN and supplies some background information, including keywords describing their work and any DOIs for related papers. According to the sciNote website, the software will then produce “an introduction, materials & methods, results and references.” The scientist can then begin improving and editing the text.

Anyone with intellectual curiosity who works in the scientific research enterprise can’t help but wonder what effect automated text generation from research data will have on scientific publishing. One thought is that if manuscripts become easier to generate, it may signal a shift in the assignment of academic merit: from how many peer-reviewed articles a scientist has written, or how often they were cited, to how many articles were generated from a dataset he or she collected. At the very least, I suspect we may see a greater recognition of data collection as the heavy lifting in science and less emphasis on the number of articles written by a given scientist or cited by others.

I believe Digital Science’s own Daniel Hook may have captured the essence of this change when he said,

“It is really only a matter of time before having a highly-cited dataset is as important in some fields as a paper in Nature, Science, or Cell.”

That may turn out to be an understatement.

About the author

Alvin Hutchinson is the Information Services Librarian at the Smithsonian Libraries’ Digital Programs and Initiatives division. He manages the Smithsonian Research Online program which describes, collects and archives research output of Smithsonian scholars. Although he does not have any formal science training, Alvin was a subject specialist in zoology at the Smithsonian’s National Museum of Natural History and its National Zoological Park for 15 years and has a personal interest in scientific research and publishing in the digital era.