Why Scientists Should Share Their Data

16th March 2015 | Guest Author

“What use is finding a cure for the common cold if your idea of being a research scientist is hiding away in a lab and not talking to anyone? How are you going to communicate your brilliant ideas or theories, and who would be able to corroborate or support you?”

That was something I used to argue with my PhD colleagues and other research students on the subject of sharing as a scientist.

The naive view of what scientists are like, sadly reinforced by movies and TV, is the ‘socially inept nerd’: in the lab all day, unable to converse with non-scientists let alone their peers, working fervently on their latest precious discovery. This is an extreme and damaging view of a working scientist. The truth is that research is a busy, sociable environment, and early career scientists are encouraged to play their part within their departments as well as to network with their peers.

Nurturing the ECRs.

Early career researchers (ECRs), the PhD students and postdocs, face a steep learning curve. They are encouraged to integrate themselves professionally within their environment and to discuss their research and ideas with professors and other students. “You never know where that next big idea will come from,” they’re told. This is nurturing for ECRs. What could be better than getting yourself known among your peers and making connections over coffee to further your career?

Sharing outside of the institution is easier now but we still don’t share as much as we could.

Even though it is easier to share data and research outside of institutions, it isn’t being done nearly as much as it should be, and this is, in part, due to attitude. We are in danger of teaching the next generation of researchers to be possessive of their data, to come up with endless excuses, and to refuse outright to discuss the pros and cons because it’s “too complicated”.

Some of these self-imposed restrictions on sharing data come from researchers feeling bound by the laws and restrictions of their funding bodies and learning institutions. The contracts appear convoluted and so are misread as obligations NOT to share data or novel knowledge gained from the research.

“In 2011, 74% of scientists, despite believing their data would benefit other researchers, only 54% were willing to allow others to view their raw data” (Academic executives blog)

The figures above, I think, stem from deep-seated beliefs about what it means to share data, and I agree with physician David Blumenthal that we should be paving the way for new scientists by encouraging an attitude of sharing research data.

“I do think it’s worth teaching an ethic of sharing, because a young scientist’s early approach will likely become their life approach.” Blumenthal 2014

Years before the internet, sharing between researchers outside of institutions happened at conferences or other annual meetings. The World Wide Web itself was invented at CERN by scientists who wanted to share information more easily and quickly with their peers so that their research could progress faster! http://home.web.cern.ch/topics/birth-web

Now we have huge amounts of usable data that we can also share to make research happen more quickly, but some researchers are still holding back. Why?

When I was doing my PhD, I sequenced lots of copies of the same ribosomal gene from different single-celled microbes so I could create a taxonomic tree. I also relied heavily on public-domain sequences from NCBI in my research, adding them as a file to my own sequences, which I later published to the database. I was anxious about the quality of my sequences since not all of them were perfect (a few contained query bases or were not as long as the others). I felt embarrassed and worried that I might do something wrong in the uploading process. My supervisor reassured me that even though some of my sequences were incomplete, the data was still novel and useful.

“You could spend years trying to collect the perfect data set when you should be publishing what you have, which is enough.”

My supervisor was right. After that I couldn’t publish my sequences fast enough. Then I worried that my paper wouldn’t get accepted, that my sequences would be used by someone else, and that I wouldn’t get credit for my novel findings. But I did. Everything was OK.

Learning the system to upload my sequences was a pain and took time the first time I did it, but it was easier each time after that. On an already steep learning curve, making your data available can at first be a long and tricky process, especially for someone who isn’t as computer savvy as others. Luckily, these processes are becoming faster and easier.

There are free repositories for most data these days, which are easy to use and have help standing by. If you’re worried about using pre-publication repositories, it might alleviate your concerns to know of an analysis of journal data policies which showed that >90% of publishers do not consider sharing research output on online platforms as ‘prior’ publication. (Link to F1000 Table).

Journals and funding bodies are coming around to open data sharing more and more. Some journals (F1000, PLoS) even mandate that raw data be shared on submission, and funding bodies are starting to expect the same. It is becoming the researcher’s responsibility to make their data available to other researchers, as well as to make a certain number of publications open-access (hopefully 100% one day).

Aside from the legal obligations slowly coming into place, the reasons why researchers should share their data can be more career-centric and positive for the researcher as well as the wider research community:

Profile of the researcher is raised: Sharing data among your peers is a sure way of getting known.
Efficiency improved: Less time wasted answering requests for data and methods from peers.
Citation rates will go up: A study looked at citation rates of researchers who shared over those who didn’t and found sharing is associated with increased citation rate (Piwowar et al., 2007).
Unpublished data is published with a citable link: Your videos, posters, and full methods can be published and used with full citable links via a permanent DOI (Digital Object Identifier).
Promotion of your subject field: Promoting your field through available data and the training of new researchers.
Easier to work in the subject area: More data means more research. If it’s easier to do research more people will do the work.
Improving data integrity: Sharing data encourages quality and helps to identify errors more quickly.
Reduces fraud: The published papers should reflect the data and interpretation of results.
Duplication avoided: Money and time wasted on unnecessary replicates of data can be avoided via open data banks.
Less wealthy institutions and countries can also do research: Equality throughout research. These institutions are not left behind but instead are able to access data to enhance their projects and their funding proposals for further research.

The hard work of so many researchers should not be wasted and left behind once the papers are published. We still have a way to go when it comes to standardisation of data and well-curated databases for all the different types of data (physical data and computer data). These are not just the problems of the researcher, but of the journals and publishers, the universities and research institutions, and the funding bodies and government institutions, who are already working to find ways of making sharing easier.

Repositories like Figshare, Dryad, and NCBI work continually to give users more control over their data and to have researchers’ hard work recognised; attribution for authorship is carried over in your data via DOIs and proper metadata labelling.

Researchers who don’t share their data will be left behind. Sharing has never been as important as it is now, especially when governments and huge research companies are encouraging it (American OSTP policy, ASA, GSK).

The only question you need ask yourself now is, “Where do I share my data?”

After finishing a BSc at Keele University in Biology with Physical Geography, including a year abroad at the Pasteur Institute in Lille, Jojo stayed with Keele to do her Masters, and travelled to the South of France to do the research at the University Montpellier 1, studying Leishmania genetics. After gaining a distinction for her Masters thesis, Jojo spent half a year applying for PhD projects, got an interview at Oxford in April 2008, and was offered a place starting in October 2008. Jojo completed her PhD in June 2013 and viva’d in October, gaining her Doctorate of Philosophy in Zoology for her thesis titled ‘The Diversity of Silica-Scaled Protists’. No sooner had Jojo passed her viva and tweeted a picture of herself in her academic examination dress than she was offered, over Twitter, a position as a Post-Doctoral fellow at the University of Saskatchewan. After negotiations and applications, Jojo went to Canada and stayed until the end of October 2014, then returned to England to embark on a writing career. Jojo registered in the UK as self-employed in January 2015 and writes a blog of her own called the Online Academic, discussing all aspects of being an academic online and using online tools to better one’s career, especially in research. She also writes scientific blogs and content for an up-and-coming Canadian startup.