NLP Series: Applying Natural Language Processing to a Global Patent Database

The latest article in our blog series on Natural Language Processing is from Catherine Suski, Director of Marketing at IFI CLAIMS Patent Services. Catherine has a passion for technology, and enjoys working in an area where she can see the direct impacts of implementing new tech into existing processes. Here Catherine will be talking about the benefits of using NLP to create an inclusive global patent database.

The role of NLP in inclusive data curation of patent information

CLAIMS Direct is a global patent database created by IFI CLAIMS Patent Services (IFI). NLP allows for the vast amount of information contained in patents to be applied to many situations. Through the curation of data, such as the standardisation of organisations, data can be amalgamated from a range of original sources. Using NLP, this patent information can also be translated into English from over 40 languages. By curating the data in this way, researchers can quickly access information from a broad range of original sources.

IFI receives inquiries from companies that require access to patent information for a range of use cases. From discovering important new invention types for use in investment decisions, to analysing the effects of government programmes on regional economic stimulus, the analysis of patent documents is becoming more widespread.

The growth of inexpensive and ever more powerful computing has led to easier methods for extracting meaningful data from patents, and NLP is a prime example of this. This technology is absolutely vital because, according to the 2019 report from the World Intellectual Property Organization (WIPO), 3.3 million patent applications were filed globally in 2018. This is almost twice the 1.85 million filed in 2008. There are more than 14 million active patents globally. With this many applications, it would be impossible to manually search for relevant information. Enter NLP.

Using NLP to overcome the language barrier of global patent information

With so many global patents that can contain important information, accurate translations are a must. Machine translation, or the use of computer software to perform translations, has been used for decades to translate patents. Recent advances employing NLP are speeding up this process. Early attempts translated each word or phrase in isolation; newer techniques look at the overall context to provide higher-quality results.

CLAIMS Direct, the global patent database and platform from IFI, uses Google Translate to convert documents in 48 languages to English. Based on neural network technology, one of the several driving forces behind NLP, Google Translate offers an exceptional level of accuracy. It overcomes problems found in most older phrase-based machine translation systems that do not sample a large enough segment of text to produce a proper translation. Using a large end-to-end network, this technology translates whole sentences or paragraphs at a time to provide context, and uses machine learning to continually make improvements over time.

Patent documents are often used by organisations and individuals who seek to patent something themselves. To be awarded a patent, the concept cannot infringe on another patent, and must also be a novel idea. Making a mistake by missing an existing publication or previously granted patent can lead to costly infringement lawsuits. The stakes are high and there will always be a big incentive to get it right. It is, therefore, common for many people to be involved in researching previous patent data, often employing multiple search methods.

While the exact format of a patent can vary by region, all patents share a number of structured data elements, including invention title, inventor, submission date, and active or inactive status. This information, stored in named fields, is accessible in databases and is easy to search. However, the body of a patent can contain far more useful free-form text, or unstructured data, that is not parsed into fields and is difficult to search with keywords and legacy search engines.
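
To make the distinction concrete, here is a minimal Python sketch. The PatentRecord class, its field names, and the sample records are hypothetical and are not the CLAIMS Direct schema. It shows why a named field is easy to query, while a naive keyword match on the body can miss a relevant document that uses different wording.

```python
from dataclasses import dataclass


@dataclass
class PatentRecord:
    # Structured data: each value lives in a named field (names are hypothetical).
    title: str
    inventor: str
    submission_date: str  # ISO date string, e.g. "2018-05-01"
    active: bool
    # Unstructured data: the free-text body of the patent.
    body: str


def search_by_field(records, field_name, value):
    """Exact match on a named field."""
    return [r for r in records if getattr(r, field_name) == value]


def search_body_keyword(records, keyword):
    """Naive substring match on the free-text body."""
    kw = keyword.lower()
    return [r for r in records if kw in r.body.lower()]


# Invented sample data, for illustration only.
records = [
    PatentRecord("Rotor blade assembly", "A. Example", "2018-05-01", True,
                 "A turbine that converts moving air into electricity."),
    PatentRecord("Clock spring mechanism", "B. Example", "2017-03-12", False,
                 "A mainspring that is tightened by hand to store energy."),
]

by_inventor = search_by_field(records, "inventor", "A. Example")
by_keyword = search_body_keyword(records, "wind")
```

The field search finds the first record straight away. The keyword search for "wind" returns nothing, even though the first record plainly describes a wind turbine, because the word itself never appears in the body.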

Search tools that use NLP can reveal crucial ideas contained in patent literature more easily than traditional methods which rely on keyword matches. Patent documents can be written using language which is meant to obscure the true nature of the invention, with the aim of keeping the subject matter hidden from competitors. Sometimes even technical subject matter experts cannot clearly see the idea being put forward. With the use of semantic and NLP algorithms, improved accuracy is achieved by ingesting large areas of text, examining the context, and making connections that are not otherwise obvious. The use of synonyms can also uncover new and relevant documents. Search intent of the user is better understood, and uniting all of these capabilities saves a huge amount of time.
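
The synonym idea can be illustrated with a small Python sketch. It assumes a hand-written synonym table, so SYNONYMS and the sample documents below are invented for illustration. A production search tool would more likely derive related terms from a thesaurus or a trained language model, and this is not a description of how any particular product works.

```python
import re

# Hypothetical, hand-written synonym table (an assumption for this sketch).
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "fastener": {"screw", "bolt", "rivet"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand_query(term):
    """Return the search term together with any known synonyms."""
    term = term.lower()
    return {term} | SYNONYMS.get(term, set())


def search(documents, term, use_synonyms=True):
    """Return documents containing the term or, optionally, a synonym."""
    wanted = expand_query(term) if use_synonyms else {term.lower()}
    return [doc for doc in documents if tokenize(doc) & wanted]


docs = [
    "An automobile braking system with regenerative charging.",
    "A threaded bolt for joining two panels.",
    "A car seat with an adjustable headrest.",
]

plain = search(docs, "car", use_synonyms=False)
expanded = search(docs, "car")
```

Here the plain search finds only the car seat document, while the expanded search also surfaces the automobile braking document that a strict keyword match would have missed.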

Traditional use cases for NLP in patent documents

In business activities where intellectual property (IP) traditionally plays a large role, such as engineering and developing new drugs, some very successful new products incorporating NLP are improving the patent search process. Many clients of IFI have used CLAIMS Direct to build features such as:

Integration with other data sources: In addition to patents, searchable indexes can include scientific publications, internal research, websites, and other industry specific knowledge sources.
Text mining with specific vocabulary: This is especially important to the related industries of life sciences, biotechnology, and pharmaceuticals. For example, when developing new therapies, gene and disease target scanning can find research from another company that may be applicable to a new invention.
Clustering and categorisation: While patents from most countries use a common classification system, it is limited, and not industry specific. Some applications use pre-built tools tailored to different business requirements, while others allow users to set up their own requirements. The resulting visualisations provide quick insights about the latest inventions in any given field.
Relevancy scoring: With traditional search tools, results are ranked. Taking this a step further and providing a percentage score for relevancy shows the user a more finely tuned answer.
Results delivered in an interactive framework: Search results can be refined by choosing “more like” in a field of related concepts. For example, when searching for “wind” a semantic application could give results that include wind turbines, wind-up clocks, and wind speed. The user can then select the most relevant category.
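
As a rough illustration of the relevancy scoring idea in the list above, the Python sketch below turns a similarity measure into a percentage. It uses cosine similarity over simple word counts, which is a stand-in technique chosen for brevity and not the method used by any specific product. The function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count how often each lowercase alphabetic word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def relevancy_percent(query, document):
    """Cosine similarity of the word-count vectors, as a percentage."""
    q, d = term_counts(query), term_counts(document)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    if norm == 0:
        return 0.0
    return round(100 * dot / norm, 1)


def rank(query, documents):
    """Return (score, document) pairs, highest score first."""
    scored = [(relevancy_percent(query, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Invented sample texts, for illustration only.
docs = ["wind turbine blade", "wind speed sensor", "clock spring"]
ranked = rank("wind turbine", docs)
```

Instead of a bare ordering, each result now carries a score between 0 and 100, so the user can see how far apart the first and second hits really are.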

New use cases for NLP in patents

Advances in NLP have made it much easier to extract important information that used to be hidden in patent documents. This has led to a range of new use cases.

Patents are making their way onto the trading floor. Fund managers want to know which technologies are on the verge of quick growth, and who owns them, in order to inform investment decisions. Here, well-indexed, easy-to-search patent data is crucial. By adding a data source such as CLAIMS Direct to their fast-moving algorithmic trading systems, they are utilising NLP to find hidden tips, enabling analysts to create better reports.

Management consulting companies are getting in on the action too. They need to keep clients informed about the most up-to-date technology and competitive intelligence across the globe. Knowing when relevant patents have been published or granted can be a game changer. NLP offers consultants the ability to quickly uncover trends important to their clients, while improving efficiency through automated workflows. Clustering visualisations make the information easier to understand.

As the technology continues to evolve, more use cases for patent information will emerge. We look forward to implementing these advances into our processes at IFI CLAIMS Patent Services, to continue to be as inclusive of, and useful to, the wider research community as we can possibly be.
