Software Developer – harvesting data (Iasi)
Digital Science | Iasi
At Digital Science we are looking for a Software Developer to harvest science-related data from the internet and contribute it to our Dimensions platform. You will be part of an experienced, highly skilled technical team with a clear vision of its technological and engineering goals, in the exciting setting of an international, agile company. The development language is Python. If you are not (yet) a professional Python developer but have an affinity for data harvesting and are enthusiastic about really getting into Python, your teammates will teach you on the job.
With Dimensions, Digital Science launched an innovative infrastructure of research data and tools, broadening the view of the research landscape after decades of focus on the publication/citation complex. Its guiding principle, delivering context, was to take different datasets out of their silos and create a heavily interlinked, overarching dataset that describes the whole research lifecycle: from funding input (grants), through research outputs (publications) and the translation and application of research results (clinical trials, patents), to attention (altmetrics and citations) and finally policy-level impact (mentions of research results in policy papers).
In total, Dimensions today contains more than 150 million documents with more than 4 billion connections between these records. For more information, visit https://dimensions.ai or try the free version of the Dimensions app at https://app.dimensions.ai. Dimensions has offices in Germany, Romania, the US and the UK, serving clients globally.
What you will do:
- Harvest websites using raw HTTP requests, crawling frameworks such as Scrapy and browser automation tools such as Selenium, and consume all kinds of web-based APIs (REST, SOAP, …)
- Implement batch jobs that retrieve data via common protocols such as HTTP, FTP, …
- Extract data from a wide variety of source formats by implementing extraction heuristics
- Handle very different document formats, ranging from poorly formed HTML and PDF documents to standard file formats such as XML, JSON and CSV, using the standard tools to process them (e.g. XPath)
- Store the extracted data in a generic format in SQL databases (mainly PostgreSQL)
- Integrate your code into the data pipeline that drives our entire data processing infrastructure
We are looking for:
- Relevant software development experience (preferably in Python)
- Basic working experience with Linux and a willingness to deepen it
- Ability to work on intricate details without losing sight of the big picture
- Experience with Amazon Web Services, or eagerness to learn it
- Experience with application containers (preferably Docker) is a plus
- Experience with distributed version control systems (Git)
- Understanding of Agile methodologies
- A self-learner with a natural inquisitiveness
- Good problem solving and analytical skills
- Strong interpersonal, communication, and organizational skills
- At minimum, a Bachelor's degree in Computer Science or a related field, or equivalent
What We Offer
- Be part of an international team distributed all over the globe
- Relaxed work environment that values innovation, initiative, and energy
- On a rainy day you can choose to work remotely; since the team is distributed, most communication happens via video calls on Google Hangouts
- Competitive salary based on experience
- Flexible working hours
- Hand-pick your hardware