Today we launch our blog series on Natural Language Processing, or NLP. A facet of artificial intelligence, NLP is increasingly used in many aspects of our everyday life, and its capabilities are being applied in research innovation to improve the efficiency of many processes.

Over the next few months, we will be releasing a series of articles looking at NLP from a range of viewpoints, showcasing what NLP is, how it is being used, what its current limitations are, and how we can use NLP in the future. If you have any burning questions about NLP in research that you would like us to find answers to, please email us or send us a tweet. As new articles are released, we will add a link to them on this page.

Isabel Thompson

Our first article is an overview from Isabel Thompson, Head of Data Platform at Digital Science. Her day job is also her personal passion: understanding the interplay of emerging technologies, strategy and psychology, to better support science. Isabel is on the Board of Directors for the Society for Scholarly Publishing (SSP), and won the SSP Emerging Leader Award in 2018. She is on Twitter as @IsabelT5000.

NLP is Here, it’s Now – and it’s Useful

I find Natural Language Processing (NLP) to be one of the most fascinating fields in current artificial intelligence. Take a moment to think about everywhere we use language: reading, writing, speaking, thinking – it permeates our consciousness and defines us as humans unlike anything else. Why? Because language is all about capturing and conveying complex concepts using symbols and socially agreed contracts – that is to say: language is the key means of transferring knowledge. It is therefore foundational to science.

We are now at the dawn of a new era. After years of promise and development, the latest NLP algorithms now regularly score higher than humans on structured language analysis and comprehension tests. There are of course limitations, but these should not blind us to the possibilities. NLP is here, it’s now – and it’s useful.

NLP’s new era is already impacting our daily lives: we are seeing much more natural interactions with our computers (e.g. Alexa), better quality predictive text in our emails, and more accurate search and translation. However, this is just the tip of the iceberg. There are many applications beyond this – many areas where NLP makes the previously impossible, possible.

Perhaps most exciting for science at present is the expansion of language processing into big data techniques. Until now, the processing of language has been almost entirely dependent on the human mind – but no longer. Machines may not currently understand language in the same way that we do (and, let’s be clear, they do not), but they can analyse it and extract deep insights from it that are broader in nature and greater in scale than humans can achieve.

For example, NLP offers us the ability to run a semantic analysis on every bit of text written in the last two decades, and to gain insight from it in seconds. This means we can now find relationships in corpora of text that would previously have taken a PhD to discover. To be able to take this approach to science is powerful, and this is but one example – given that so much of science and its infrastructure is rooted in language, NLP opens up the possibility for an enormous range of new tools to support the development of scientific knowledge and insight.

Google’s free NLP sentence parsing tool

NLP is particularly interesting for the research sector because these techniques are – by all historical comparisons – highly accessible. The big players have been making their ever-improving algorithms available to the public, ready for tweaking into specific use cases. Therefore, for researchers, funding agencies, publishers, and software providers, there’s a lot of opportunity to be had without (relatively speaking) much technical requirement.

Stepping back, it is worth noting that we have made such extreme advances in NLP in recent years thanks to the collaborative and open nature of AI research. Unlike in any cutting-edge discipline in science before, we are seeing the most powerful tools open sourced and available for massive and immediate use. This democratises the ability to build upon the work of others and to utilise these tools to create novel insights. This is the power of open science.

Here at Digital Science, we have been investigating and investing in NLP techniques for many years. In this blog series, we will share an overview of what NLP is, examine how its capabilities are developing, and look at specific use cases for research communication – to demonstrate that NLP is truly here. From offering researchers writing support and article summarisation, to assessing reproducibility and spotting new technology breakthroughs in patents, all the way through to the detection and reduction of bias in recruitment: this new era is just getting started – where it can go next is up to your imagination.

Look out for the next article in our series, “What is NLP?”, and follow the conversation using the hashtag #DSreports.