AI Needs New Facts – The Value of Novel Scientific Research

4th June 2025

At SXSW London, I had the pleasure of seeing DeepMind co-founder Demis Hassabis speak on the future of artificial intelligence. Among many thought-provoking points, two remarks stuck with me. First, he emphasised the importance of understanding the fundamentals. Second, he championed the scientific method as a guiding principle for making meaningful progress in AI. 

DeepMind co-founder Demis Hassabis interviewed at SXSW London 2025. Photo by Mark Hahnel.

As someone who works at the intersection of open research and AI, I’ve found myself returning to a specific question: What kind of content truly matters to AI? I have previously spoken of an AI-powered flywheel effect in research, where each cycle follows this pattern:

  1. Raw data is processed by AI to generate initial research outputs
  2. Knowledge extraction tools mine these outputs for higher-order insights
  3. These insights form a new, refined dataset
  4. AI processes this refined dataset, generating more precise analyses
  5. The cycle continues, with each rotation producing more valuable knowledge
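The five steps above can be sketched as a simple loop. Everything here is an illustrative placeholder (the function names and string transformations are mine, not a real pipeline) — the point is only to show how each rotation's insights become the next rotation's dataset:

```python
# Minimal sketch of the research flywheel described above.
# All function names are illustrative placeholders, not a real AI pipeline.

def generate_outputs(dataset):
    """Stage 1: AI processes the dataset into initial research outputs."""
    return [f"output({d})" for d in dataset]

def extract_insights(outputs):
    """Stage 2: knowledge-extraction tools mine outputs for insights."""
    return [f"insight({o})" for o in outputs]

def flywheel(raw_data, rotations):
    """Stages 3-5: each cycle's insights form the next cycle's refined dataset."""
    dataset = raw_data
    for _ in range(rotations):
        outputs = generate_outputs(dataset)    # stage 1
        insights = extract_insights(outputs)   # stage 2
        dataset = insights                     # stages 3-4: refined dataset
    return dataset                             # stage 5: cycle repeats

print(flywheel(["raw"], rotations=2))
# Each rotation wraps another layer of processing around the raw data.
```

The nesting in the output is the whole argument in miniature: without fresh raw data entering at the bottom, every later rotation is just reprocessing what earlier rotations already said.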

I interpret “understanding the fundamentals” as the base layer: the raw data. Models can mimic almost any writing style and generate endless reams of text, but not all content is created equal. I have witnessed this first-hand as self-declared “academics” from around the globe use generalist repositories to post non-peer-reviewed, LLM-written content that supposedly proves their genius.

AI systems, particularly large language models, rely on data to learn. Not just more data, but better data; data that reveals new structure in the world. Without novel input, AI models will become better at rephrasing the known, but not at understanding the unknown. At its core, science is a method for producing high-quality, structured novelty – a repeatable process for generating new facts, testing them against reality, and sharing them with the world. Basic research, often funded for its long-term potential rather than short-term applications, is the primary engine of this kind of content. AlphaFold succeeded because it was trained on grounded, empirical data from the Protein Data Bank.

If we want AI to continue advancing in a meaningful way that uncovers new knowledge, we need to prioritise access to and support for novel scientific research. That means supporting open science. It means investing in infrastructure that ensures new data and discoveries are FAIR (Findable, Accessible, Interoperable, and Reusable). It means rethinking publication practices to encourage the dissemination of negative results, replication studies, and raw data. And it means funding basic research across the globe.

China has significantly increased its investment in basic research, aiming to reduce reliance on foreign technology and achieve self-sufficiency in foundational sciences. China’s spending on basic research passed 6% of its total R&D in 2023 and continues to rise. China seems to be an outlier, though. Here in the UK, for example, basic research funding through UKRI continues, but it often faces pressure to demonstrate short-term economic impact. In the US, the NSF and NIH continue to support basic research, but federal R&D budgets have shifted toward mission-driven, applied research. The Inflation Reduction Act and the CHIPS and Science Act brought some uplift, but basic science still receives a minority share of total R&D.

The geopolitical landscape is hard to predict. But the narrative is that all countries want to compete on the AI stage. Radical abundance will only happen in your country if you have some control over the inputs to the models.

Former UK Prime Minister Tony Blair and the UK Secretary of State for Science, Innovation and Technology, The Rt Hon Peter Kyle MP, interviewed at SXSW London 2025. Photo by Mark Hahnel.

At a separate SXSW London session, we also heard from former UK Prime Minister Tony Blair and The Rt Hon Peter Kyle MP, UK Secretary of State for Science, Innovation and Technology. Peter Kyle’s plans for integrating AI into the UK government are commendable, and seem to be advancing at a pace that the UK government is not famous for. A comment from Tony Blair however highlighted what is at stake if we don’t fund basic research: “It is amazing to me that we are not feeding all of the NHS data into these AI models.”

Whether you trust this government with your most sensitive data or not, we will have future governments who may not act in your best interests – in the same way that LLM developers are ignoring copyright on academic publications today, they may ignore human ethics when it comes to your medical data. Feeding the NHS into LLMs is not the answer. The easiest way to generate new data for AI models, with a view to advancing science and technology, is to fund more basic research and insist that the outputs be made open in a FAIR manner.

There is so much to gain. The value of novel scientific research has never been higher.
