Software Engineer – Data Engineering
Europe | London
No closing date
At Digital Science, we are looking for a Software Engineer to contribute to our Dimensions product.
As a Software and Data Engineer, you will help us take scientific publications and other documents and enrich them in a variety of ways: by adding new sources of data while identifying and merging duplicate documents across the sources, by linking different content types together, by normalising entities like people and places to a canonical form, and by discovering and implementing new ways of adding insight to the raw data.
You will be part of an experienced and highly skilled technical team with a clear vision of its technological and engineering goals, within the exciting setting of an international, agile company.
With Dimensions, Digital Science launched an innovative research data and tool infrastructure, broadening the view of the research landscape after decades of focus on the publication/citation complex. Dimensions is a research tool that interlinks multiple data sets (grant applications, publications, clinical trials, patent applications, policy documents). Based on these data sets, and by using external services, it provides metrics such as attention and citation scores and makes it possible to perform complex analyses of the data. In total, Dimensions today contains more than 400 million documents with more than 4 billion connections between these records. Our customers are researchers, research organizations, publishers, and government and funder organizations from all around the world. Dimensions has offices in Germany, Romania, the US and the UK, serving clients globally. For more information, please visit dimensions.ai or try the free version of the Dimensions app at app.dimensions.ai.
Your new role
- Extend and implement rule-based and text-mining-based machine learning tools to disambiguate data sources and find links between data types
- Extract, transform and load data into a variety of data stores such as PostgreSQL and Google BigQuery
- Analyse data and enrichment outputs to find areas for improvement
- Build tools and create reports for performing quality assurance on our work
- Gather and analyse new sources of data which may be valuable to add to Dimensions
- Write well-crafted, well-tested, readable, maintainable code
- Deal with large datasets (100M+ documents with over one billion links)
- Work with Amazon Web Services
- Work with our Kubernetes-based processing pipeline
What you’ll bring to the role
- Several years of professional experience with Python development
- Experience working with medium- to large-scale data processing
- Experience using, creating and working with SQL databases
- Experience using distributed version control systems (Git)
- Understanding of Agile methodologies
- Ability to work on intricate details without losing the big picture
- Self-learner, possessing inherent inquisitiveness
- Good problem-solving and analytical skills
- Strong interpersonal, communication, and organizational skills
- Bachelor's degree (minimum) in Computer Science or a related field, or equivalent
We invest in, nurture and support innovative businesses and technologies that make all parts of the research process more open, efficient and effective. The people we secure are fundamental to us achieving our vision and our growth plans. The values we live by are:
- We are brave in the pursuit of better
- We are collaborative and inclusive
- We are always open-minded
- We are from and for the community
Ready to apply?
We offer a competitive benefits package, the opportunity to work flexibly, and the possibility to really make a difference. If you would like to help shake up the world of research, Digital Science is the place to be. Together, we can make a positive, irreversible change. Please send your CV to firstname.lastname@example.org.