
Junior Software Engineer – Data harvesting

Company
Dimensions
Location

Europe | London

Closing Date

No closing date

At Digital Science we are looking for a Junior Software Developer to contribute to our Dimensions product. In this role, you will work on software that harvests data about scientific research from the internet. The software prepares the data for our processing pipeline and runs on a fully automated schedule. Over time, you will also work on the processing pipeline and our self-developed toolset. You will be part of an experienced and highly skilled technical team with a clear vision of its technological and engineering goals, in the exciting setting of an international, agile company.

About us

With Dimensions, Digital Science launched an innovative research data and tool infrastructure, broadening the view of the research landscape after decades of focus on the publication/citation complex.

Dimensions is a research tool that interlinks multiple data sets (grant applications, publications, clinical trials, patent applications and policy documents). Based on these data sets, and using external services, it provides metrics such as attention and citation scores and enables complex analyses of the data. In total, Dimensions today contains more than 400 million documents with more than 4 billion connections between these records. Our customers are researchers, research organizations, publishers, and government and funding organizations from all around the world.

Dimensions has offices in Germany, Romania, the US and the UK, serving clients globally. For more information, please visit dimensions.ai or try the free version of the Dimensions app at app.dimensions.ai.

Your new role

  • Harvest websites by sending raw HTTP requests and by automating browsers, using tools such as Selenium, Scrapy or Zyte (formerly Crawlera)
  • Use all kinds of web-based APIs (REST, SOAP, …)
  • Implement batch jobs to retrieve data via common web protocols such as HTTP, FTP, etc.
  • Extract data from various source formats by implementing heuristics
  • Cope with very different document formats, ranging from poorly formed HTML and PDF documents to standard file formats such as XML, JSON and CSV, using the standard tools to process them (such as XPath and various libraries)
  • Store and map extracted data into SQL databases (mainly PostgreSQL)
  • Write or maintain QA tools
  • Contribute to our self-written data processing pipeline

What you’ll bring to the role

Experience

  • Relevant software development experience (preferably in Python)
  • Basic Linux working experience and willingness to improve it
  • Experience with Amazon Web Services, or eagerness to learn about it
  • Experience with application containers (preferably Docker) is nice to have
  • Experience with distributed version control systems (Git)

Skills

  • Understanding of Agile methodologies
  • Ability to work on intricate details without losing the big picture
  • Self-motivated learner with a natural inquisitiveness
  • Good problem solving and analytical skills
  • Strong interpersonal, communication, and organizational skills

Qualifications

Minimum of a Bachelor's degree in Computer Science or a related field, or equivalent

Additional Information

We invest in, nurture and support innovative businesses and technologies that make all parts of the research process more open, efficient and effective. The people we secure are fundamental to us achieving our vision and our growth plans. The values we live by are:

  • We are brave in the pursuit of better
  • We are collaborative and inclusive
  • We are always open-minded
  • We are from and for the community 

Ready to apply?

We offer a competitive benefits package, the opportunity to work flexibly, and the possibility to really make a difference. If you would like to help shake up the world of research, Digital Science is the place to be. Together, we can make a positive, irreversible change. Please send your CV to careers@digital-science.com.

© 2021 Digital Science & Research Solutions Ltd. All Rights Reserved