Exploring a Digital Science Workflow for Reproducible Science

Expectations around reproducible research are clear, particularly in the area of computational research. A research paper is more than an account of the research that was undertaken; it is a narrative that surrounds an orchestration of research assets from the raw data and code, to the processed data and visualizations that result. A paper should invite a reader to trace the results back. How was this figure produced? What was the code that produced that particular result? The reader’s transition from narrative to exploring data or code should be as easy as turning the page.

Seen from the researcher’s perspective, the ideal computational paper arises organically from the research: the data that is created is the data that ends up in the paper, and the code as it is written is the code that can be accessed in the paper. As analysis bubbles up from research into images for publication, those images keep their provenance back to the data, and back to the code that produced them.

How close are we to this ideal today? Within the Digital Science family, methods for openly publishing data are ably supported by Figshare. Gigantum provides researchers with productive environments in which they can not only develop their code, but also share their projects along with the provenance of the steps that were run and the environment necessary to execute them. Overleaf allows researchers to write and publish their research collaboratively using LaTeX. As part of a poster presentation for the 2019 VIVO conference, we took a broad research question that could be answered with Dimensions data, and undertook the research using workflows that knit these tools together. In doing so, our project, documented in our white paper, demonstrates an approach to undertaking reproducible computational science that operates on multiple levels. Specifically, it addresses:

  • What does it take to develop reproducible code right from the beginning of a project using Gigantum?
  • How can data assets be structured and organised throughout the life of a project inside Figshare (and not just at the end of a project)?
  • How can Overleaf be integrated with Gigantum so that the act of creating an image or data table is as close as possible to the act of publishing the same object in a paper?
  • What is a good approach to tying code, data, and papers together using identifiers?
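
To make the last question concrete, here is a minimal sketch of one possible way to tie a figure to its data and code using identifiers. It is an illustration only, not the method used in the white paper: the record structure, the function names, the dataset DOI and the code reference are all hypothetical placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FigureProvenance:
    """Links one published figure back to the data and code behind it."""
    figure_name: str
    figure_sha256: str  # hash of the figure file's bytes
    data_doi: str       # hypothetical DOI of the dataset used
    code_ref: str       # hypothetical project name and commit


def doi_url(doi: str) -> str:
    """Turn a bare DOI into its resolver URL."""
    return "https://doi.org/" + doi


def record_figure(name: str, figure_bytes: bytes,
                  data_doi: str, code_ref: str) -> FigureProvenance:
    """Build a provenance record for a figure from its raw bytes."""
    digest = hashlib.sha256(figure_bytes).hexdigest()
    return FigureProvenance(name, digest, data_doi, code_ref)


# All values below are placeholders chosen for illustration.
record = record_figure(
    "figure1.png",
    b"placeholder image bytes",
    "10.0000/example.dataset",
    "example-project@abc1234",
)
print(json.dumps(asdict(record), indent=2))
print(doi_url(record.data_doi))
```

A record like this could be stored alongside the figure, so that a reader who opens the paper can follow the identifiers back to the exact data and code, and use the hash to confirm the figure is the one the code produced.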

In this paper we demonstrate that not only is our poster reproducible, but that the methods we have adopted are useful to others as well. We learnt a lot throughout this project, and hope to keep refining these approaches in our future analysis projects. We would love to hear how you have used research productivity tools in similar ways, from analysis through to publication. Get in touch!

PS – This is the first Digital Science Report to be made entirely in Overleaf…

Technical Report: https://doi.org/10.6084/m9.figshare.9741890

Poster: https://doi.org/10.6084/m9.figshare.9742055

Online Version: https://wdaull.ds-innovation-experiments.com/