In association with Springer Nature, the Digital Science Consultancy and Dimensions teams have worked together to build a classification system for research associated with the United Nations Sustainable Development Goals (SDGs), drawing on the Dimensions publications database. This ongoing work uses machine learning and semi-automatically generated training data.
In the first phase of the project, scholarly articles were categorised into five of the SDGs using supervised machine learning. Curated training data were fed to machine learning algorithms, which built a classification model that was then used to categorise new articles without human involvement.
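To make the supervised step concrete, the sketch below trains a small multinomial naive Bayes text classifier on labelled examples and then assigns a label to an unseen title. It is an illustration only: the source does not say which algorithm or features the project used, and the training sentences, the choice of SDG 7 and SDG 13, and the function names `tokenize`, `train` and `classify` are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    label_counts, word_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)  # log prior
        # Laplace (add-one) smoothing over the shared vocabulary
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data; a real training set would be far larger.
TRAINING = [
    ("solar and wind power for affordable clean electricity", "SDG 7"),
    ("renewable energy storage and grid efficiency", "SDG 7"),
    ("greenhouse gas emissions and global warming mitigation", "SDG 13"),
    ("climate change adaptation and carbon emissions policy", "SDG 13"),
]

model = train(TRAINING)
prediction = classify(model, "wind power and energy storage")
print(prediction)
```

Once the model is trained, `classify` needs no further human input, which mirrors the "without human involvement" step. The quality of the result depends entirely on how well the curated training data represent each goal.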
The second phase widened the project to include all 17 SDGs. Springer Nature and the Digital Science Consultancy and Dimensions teams collaborated in this phase, with Springer Nature subject matter experts manually assessing keywords for the training sets and identifying any false positives. The same automated approach was used for the research classification in Dimensions. As a result, in the next iteration Dimensions will automatically assign an SDG category to publications, grants, etc. when they are added to the Dimensions database.
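One plausible reading of the keyword assessment and false-positive review is a keyword search that proposes training examples, refined by expert-supplied exclusion phrases. The sketch below shows that idea under stated assumptions: the source does not describe how keywords were applied, and the keyword lists, the exclusion phrase and the names `KEYWORDS`, `EXCLUSIONS`, `contains_phrase` and `label_candidates` are hypothetical.

```python
import re

# Hypothetical keyword lists; a real list would be assessed by experts.
KEYWORDS = {
    "SDG 6": ["drinking water", "sanitation"],
    "SDG 14": ["marine pollution", "overfishing"],
}

# Hypothetical phrases that reviewers flagged as false positives.
EXCLUSIONS = {
    "SDG 6": ["data sanitation"],
}


def contains_phrase(text, phrase):
    """Case-insensitive whole-phrase match."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def label_candidates(text):
    """Return the SDG labels whose keywords match and whose exclusions do not."""
    labels = []
    for sdg, phrases in KEYWORDS.items():
        if not any(contains_phrase(text, p) for p in phrases):
            continue  # no keyword hit for this goal
        if any(contains_phrase(text, p) for p in EXCLUSIONS.get(sdg, [])):
            continue  # keyword hit, but known to be a false positive
        labels.append(sdg)
    return labels


print(label_candidates("Access to safe drinking water in rural areas"))
print(label_candidates("Data sanitation methods for log files"))
```

Texts that pass this filter could serve as semi-automatically generated training examples for a supervised classifier, while each false positive spotted by a reviewer becomes a new exclusion phrase.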