03 February 2025
King's scientists' plans for AI-ready National Data Library could turbocharge economy
A new paper outlines the framework necessary to help “mainline AI into the veins” of the UK economy safely and responsibly.
![Whitehall and Downing Street](/newimages/nmes/news/elena-national-data-library/whitehall-and-dowing-street.xb5775c40.jpg?f=webp)
Computer scientists have proposed a new road map for establishing a National Data Library, a key element of the recent AI Opportunities Action Plan that promises to use AI to boost the UK economy.
The researchers hope that by incorporating global data standards and cutting-edge semantic technology into their approach, their framework will allow for large datasets that can accurately train AI models at scale, helping to unlock the UK’s AI potential.
The AI Opportunities Action Plan outlines how government can leverage AI to make a positive impact on people’s lives, from cutting admin time for teachers to boosting productivity in the workplace and accelerating the rate of scientific discovery.
However, the accuracy of AI models depends on the quality of data they are trained on. Currently, there are fears amongst many computer scientists that we could soon run out of high-quality data, and issues of data accuracy, standardisation and security raise questions about dangerous AI hallucinations and access to sensitive data like health records.
To deal with this, the Plan posits a publicly available ‘National Data Library’ that brings together widely available public data, like planning permission information, with secured sensitive information like NHS documents, in a machine-readable format to train next-generation AI safely and accurately.
Commenting on the current data landscape, lead author Professor Elena Simperl said, “As they stand, data libraries in this country are fundamentally a pre-AI resource. It will take a lot of work to turn the SharePoints on the screens of civil servants today into the secure, high-quality data that will train the AI that will speed up someone’s planning application or help a researcher find the next historical revelation in the national archives. That’s why a robust and interoperable approach to the National Data Library is needed.”
In ‘How an AI-ready National Data Library would help UK science’, published by the Wellcome Collection, Department of Informatics researchers Professor Elena Simperl and Dr Albert Meroño Peñuela set out the architecture for a National Data Library that can be read by AI models from day one.
The paper is one of the first contributions by academic AI experts in the country, and one of the few plans setting out an implementable AI-ready National Data Library.
The team, alongside collaborators at the Open Data Institute, put forward a system which borrows from existing worldwide data interoperability formats and the latest research on knowledge graphs and security protocols.
Knowledge graphs are machine-readable databases that bring together individual points of data and contextual reasoning to help AI models and other applications display facts alongside their provenance, helping prevent models from displaying dangerous hallucinated fiction as fact.
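As a rough illustration of the idea, and not the researchers' own design, the Python sketch below stores each fact together with the source it came from, so an application can show the fact alongside its provenance or decline to answer when nothing is recorded. The class and function names, the planning application identifiers and the source file name are all made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One statement in the graph, plus where it came from."""
    subject: str
    predicate: str
    obj: str
    source: str  # provenance: hypothetical file or dataset name


class TinyKnowledgeGraph:
    """A toy in-memory store of facts (illustrative only)."""

    def __init__(self):
        self._facts = []

    def add(self, subject, predicate, obj, source):
        self._facts.append(Fact(subject, predicate, obj, source))

    def lookup(self, subject, predicate):
        # Return every recorded fact matching the subject and predicate.
        return [
            f for f in self._facts
            if f.subject == subject and f.predicate == predicate
        ]


def answer(graph, subject, predicate):
    """Answer only from recorded facts, and always cite the source."""
    facts = graph.lookup(subject, predicate)
    if not facts:
        # Nothing recorded: say so instead of inventing an answer.
        return "No recorded fact; declining to guess."
    return "; ".join(f"{f.obj} (source: {f.source})" for f in facts)


graph = TinyKnowledgeGraph()
# Made-up planning record used purely for illustration.
graph.add("application-001", "status", "approved",
          "hypothetical-council-register.csv")

print(answer(graph, "application-001", "status"))
print(answer(graph, "application-002", "status"))
```

A production knowledge graph would use established standards and tooling; the point here is only that every displayed fact carries a traceable source.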
However, even when creating knowledge graphs, there is no standardised method of organising data and files in datasets used to train AI. This means that many datasets are siloed, used only for specific models by specific people, and lacking documentation about what they represent. This prevents the data being used responsibly by others, either limiting the available data to train AI or potentially introducing inaccuracy into the model.
With features like standardised metadata in a knowledge graph format, datasets can responsibly combine data from several different formats, a property known as interoperability, in a way that is also explainable and cuts down hallucinations.
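To make the interoperability point concrete, here is a minimal Python sketch under stated assumptions: two invented datasets use different column names, and each ships with metadata that maps its columns onto a small shared vocabulary. The dataset contents, the `field_map` convention and the `harmonise` function are hypothetical and are not taken from the paper.

```python
# Shared vocabulary that every dataset must map onto (assumed for this sketch).
SHARED_FIELDS = ("identifier", "decision_date")

# Two made-up datasets with different local column names.
council_a = {
    "metadata": {
        "title": "Council A planning decisions (made-up)",
        "field_map": {"ref": "identifier", "decided": "decision_date"},
    },
    "rows": [{"ref": "A-1", "decided": "2024-05-01"}],
}
council_b = {
    "metadata": {
        "title": "Council B planning decisions (made-up)",
        "field_map": {"AppID": "identifier", "DecisionDate": "decision_date"},
    },
    "rows": [{"AppID": "B-7", "DecisionDate": "2024-06-12", "Notes": "n/a"}],
}


def harmonise(dataset):
    """Rename local columns to the shared vocabulary using the metadata."""
    field_map = dataset["metadata"]["field_map"]
    missing = set(SHARED_FIELDS) - set(field_map.values())
    if missing:
        # Undocumented datasets are rejected instead of being guessed at.
        raise ValueError(f"metadata does not map: {sorted(missing)}")
    records = []
    for row in dataset["rows"]:
        # Keep only documented columns; drop anything the metadata omits.
        record = {field_map[k]: v for k, v in row.items() if k in field_map}
        # Record which dataset each row came from.
        record["source_dataset"] = dataset["metadata"]["title"]
        records.append(record)
    return records


combined = harmonise(council_a) + harmonise(council_b)
for record in combined:
    print(record)
```

Because both datasets describe themselves in the same way, their rows can be merged without hand-written, dataset-specific code, and each merged row still names the dataset it came from.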
Reflecting on the need for the latest research expertise in national projects, Professor Simperl said, “Getting the architecture right in the National Data Library is going to be vital if we are to capitalise on the benefits of AI and cut out the well-publicised risks of AI hallucination. Academics like me must be part of this national conversation if the government is to truly deliver a ‘decade of renewal’ using AI.”