We’re continuing our blog series on Natural Language Processing with a brief guide to what it is, where it is being used, and why it is exciting news for research. Later this week, we will be hearing from Steve Scott, Director of Portfolio Development at Digital Science, to find out a bit more about why Digital Science are excited by NLP.

What is NLP?

Not to be confused with neuro-linguistic programming, natural language processing, or NLP, is the way technology can interact with humans through words. These words could be written, spoken or heard, as input or output. NLP is a branch of artificial intelligence that draws heavily on machine learning, whereby systems are, in this case, able to ‘learn’ the words in a language by analysing a range of input sources, or training data. Through statistical analysis, the system starts to make sense of the patterns in text and dialogue. It does not need to be explicitly programmed with the rules of the language; instead, it picks up the ability to produce a word or sequence of words that seems statistically likely given the contents of the training data and the context of the query.
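To make the idea of ‘statistically likely’ words a little more concrete, here is a minimal sketch in Python. It is a toy illustration rather than how any real product works: it simply counts which word follows which in a tiny, made-up set of training sentences, and the function names (`train_bigram_model`, `most_likely_next`) are our own inventions for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for each word, which words follow it in the training data."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Pair every word with the word that comes directly after it.
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following


def most_likely_next(model, word):
    """Return the word seen most often after `word`, or None if unseen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Made-up training data, purely for illustration.
training_data = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]

model = train_bigram_model(training_data)
print(most_likely_next(model, "sat"))  # on
print(most_likely_next(model, "the"))  # cat
```

Nobody told the program that ‘on’ tends to follow ‘sat’; it picked that up from the examples. Real systems learn from vastly more text and take far more context into account, but the principle of learning patterns from training data is the same.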

Where have I encountered it in my everyday life?

Though you may not have heard the term NLP, you are highly likely to have used it in your everyday life. NLP has the capacity to add a level of efficiency to many tasks. Take good old autocarrot. I mean, autocorrect! The little bit of tech that thinks it knows best can often be a useful tool when you mistype a word or aren’t sure of the spelling. It is, however, widely regarded as a source of great hilarity when it gets things wrong, or just hasn’t yet learned a new word in context; one example being my own name, Suze, which frequently autocorrects to ‘Size’, which is ironic as I am of rather diminutive stature. Spelling and grammar checkers are also based on NLP technology. These programs are constantly reading the words we write and judging how likely each one is to be correct, based on patterns that the same programs have picked up by reading ‘training data’ from a range of sources. Similarly, having learned not only the spelling of words but also their likely order, based on rules of grammar and sentence structure, predictive text and autocomplete are further examples of NLP used in everyday life.
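For the curious, here is a rough sketch in Python of how a very simple spelling corrector might behave. It assumes a tiny, made-up piece of training text and only considers words that are a single edit (one deleted, swapped, replaced or inserted letter) away from what was typed; the names `one_edit_away` and `correct` are hypothetical, and real autocorrect tools are considerably more sophisticated.

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def one_edit_away(word):
    """All strings reachable from `word` with a single small typo fix."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    swaps = [left + right[1] + right[0] + right[2:]
             for left, right in splits if len(right) > 1]
    replaces = [left + ch + right[1:]
                for left, right in splits if right for ch in ALPHABET]
    inserts = [left + ch + right for left, right in splits for ch in ALPHABET]
    return set(deletes + swaps + replaces + inserts)


def correct(word, word_counts):
    """Suggest the most frequent known word within one edit of `word`."""
    word = word.lower()
    if word in word_counts:
        return word  # already a word the system has seen
    candidates = [w for w in one_edit_away(word) if w in word_counts]
    if not candidates:
        return word  # nothing better to offer, so leave it alone
    return max(candidates, key=lambda w: word_counts[w])


# Made-up training text; a real checker would learn from far more.
training_text = "what size is the shirt the size is small she chose the side door"
word_counts = Counter(training_text.split())

print(correct("suze", word_counts))  # size
```

Because ‘suze’ never appears in the training text but ‘size’ does, and is only one letter away, the sketch makes exactly the same mistake as my phone. Add ‘suze’ to the training text and the ‘correction’ stops.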

Beyond the ability to monitor and suggest, NLP also has the ability to translate, whether that is from speech into text, such as the dictation feature on many phones, or from one language into another, as with your favourite translation service. The latter is being used by IFI Claims to ensure that their patent database is as inclusive as possible, by extracting information in languages other than English and indexing that extracted information appropriately. The same application of NLP is used in apps that help you learn a new language. There are some limitations though, as a quick search for ‘funny Duolingo phrases’ will attest; while the sentence structure of some of Duolingo’s best offerings makes sense, the meaning can sometimes be lost in translation, so it certainly wouldn’t pass a Turing test any time soon!

The one where the Duolingo app has been using the sitcom Friends as training data.

Mimicking human conversation is, however, a common application of NLP. If you have recently asked for online help with an issue, you may have been directed to a live chat function that triages your query as best it can. Often these first stages are led entirely by NLP, for example when you are asked what your query is regarding, which order it relates to, and what the problem is. Based on your responses, it will offer up a range of solutions, before asking whether your query has been resolved. Only if you are unsatisfied with the help offered will you be transferred to an actual human assistant, who is often already prepped with the key information about your query, increasing the efficiency of the service offered to you.

This increase in efficiency of processes that can be analysed for expected patterns and routines is where NLP is most commonly applied, but where have we seen NLP being used in research?

Applications of NLP in research

NLP can be applied at many stages of research. Steve Scott, Digital Science’s Director of Portfolio Development, will be diving into some of his favourite case studies from the Digital Science family of portfolios in our next article in the series. Steve will be covering everything from NLP’s ability to pick out keywords in published research and form links between them, as seen within Dimensions, to the way that Ripeta can ‘read’ a research paper and look for key components that indicate the robustness and repeatability of the research carried out.

However, NLP features in many more ways across the Digital Science family, from the IFI Claims patent database, which translates patent information from a range of source languages to create the most inclusive resource possible, to Writefull’s ability to suggest how scientific writing can be improved based on similar text that it has ‘read’. The tool from Catalyst Grant winners Paper Digest can also ‘read’ a journal article and create a paragraph-long abstract of the key points of the paper in layman’s terms, allowing researchers and communicators of research alike to quickly determine whether a paper is relevant to them.

Some of our portfolio’s tools support the research community using NLP-based add-on programs, such as chemRxiv, powered by figshare, which utilised iThenticate to detect plagiarism in submitted articles by ‘reading’ the articles and comparing them to other available resources for matching sentences and paragraphs.
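As a rough illustration of what ‘comparing for matching sentences’ can mean, the Python sketch below measures how many three-word runs in a submission also appear in a source text. This is an assumption-laden toy and not a description of how iThenticate works; the function names `word_ngrams` and `overlap_score` and the example sentences are made up for this post.

```python
def word_ngrams(text, n=3):
    """Split text into overlapping runs of n lowercase words."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    words = [w for w in words if w]
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_score(submission, source, n=3):
    """Fraction of the submission's n-word runs that also occur in the source."""
    submitted = word_ngrams(submission, n)
    if not submitted:
        return 0.0  # too short to compare
    return len(submitted & word_ngrams(source, n)) / len(submitted)


# Made-up example texts.
source = "Natural language processing lets software work with human language."
copied = "As noted, natural language processing lets software work with human language."
original = "Our survey covers recent methods for analysing text."

print(overlap_score(copied, source))    # high: most three-word runs match
print(overlap_score(original, source))  # 0.0: nothing in common
```

A high score does not prove plagiarism, since quotations and common phrases will also match, so in practice a result like this would be a prompt for a human to take a closer look.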

What can NLP do for me in the future?

The brains behind these amazing innovations will be contributing longer pieces to this blog series, in which they dive into the successes and challenges of implementing NLP within their systems. We will also be hearing from Scismic, who will discuss how they hope to implement NLP in their inclusive research recruitment tool to make it even better, while Joris van Rossum will discuss some of the challenges we still face when using NLP, and how we will be able to overcome them.

The ultimate goal of NLP is to make things more efficient, and therefore more productive. In research, that could mean gathering information more inclusively and linking it better, making research information easier to understand quickly, improving the quality of research outputs by checking for repeatability or appropriate use of scientific language, or even checking for plagiarism. However, this is just the start. NLP is already being used as a research tool, to identify patterns and narrow down statistically likely positive results in a range of scenarios. At Digital Science, we can’t wait to learn from, nurture and support the next wave of machine learning innovations, and to share the more productive research that results from them.