We believe there’s a smaller section of the network that does the same job as the entire network but at a lower energy cost. It’s like a holy grail or the winning 'lottery ticket'. Finding it would allow us to prune back lots of the dead weight, thereby reducing computational costs while maintaining performance.
Dr Frederik Mallmann-Trenn, Senior Lecturer in Computer Science (Data Science), Department of Informatics, King’s College London and co-author of the study
23 January 2025
New approach to training AI could significantly reduce time and energy involved
Pruning neural networks during training could make AI more efficient, sustainable and accessible
An approach that could significantly reduce the time and energy required to train AI has been investigated by a team of researchers from King’s College London and the Université Côte d’Azur. The hope is that it will usher in a new era of more efficient AI training.
The approach involves pruning a neural network during training rather than growing it, by identifying and isolating the essential parts that produce accurate results. This could have significant implications for both computational costs and environmental impact, as training large AI models requires vast amounts of energy and data.
Inspired by the brain, Artificial Neural Networks (ANNs) consist of layers of connected neurons, with each connection assigned a weight. Just as synapses in the brain can become stronger or weaker with repeated use, the weights in a neural network strengthen or weaken the connections between artificial neurons.
During training, which can take days or even weeks, the network gradually adjusts these weights by processing input data over many rounds, nudging them towards values that produce accurate responses. This iterative process allows the ANN to refine its connections and improve its ability to recognise patterns and make increasingly accurate predictions. However, ANNs can become enormous, with millions or even billions of weights and connections, making both training and use increasingly computationally costly and energy-intensive.
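To make the idea concrete, the sketch below (illustrative only, not code from the study) shows a single artificial neuron having its weights nudged over repeated rounds by gradient descent; the data, learning rate and number of rounds are invented for the example.

```python
# A minimal sketch of how training nudges weights: one artificial neuron
# fitted by repeated gradient-descent adjustments (illustrative values only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))            # 100 inputs with 3 features each
true_w = np.array([1.5, -2.0, 0.5])      # hypothetical "correct" weights
y = x @ true_w                           # target responses

w = rng.normal(size=3)                   # connection weights start out random
lr = 0.1                                 # learning rate (step size)

for step in range(200):                  # repeated rounds of adjustment
    pred = x @ w                         # the neuron's current responses
    grad = 2 * x.T @ (pred - y) / len(y) # how each weight should change
    w -= lr * grad                       # strengthen or weaken connections

print(np.round(w, 3))                    # weights have moved towards true_w
```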
The boom in AI technologies is driving dramatically increased energy demands. Major tech companies such as Microsoft and Google have reported emissions rising by around 30 and 50 per cent respectively, largely due to the data centres used to train and operate AI models. Estimates suggest training a GPT-3 model consumes 1,300 megawatt hours, equivalent to the annual power consumption of 130 US homes, while a more advanced GPT-4 model can consume up to 50 times more electricity.
“This is where the concept of sparsity – using only a small fraction of the network’s weights – becomes important. Sparsity could make these networks faster and more resource-efficient, allowing us to achieve the same performance while ‘trimming the fat’,” said Dr Frederik Mallmann-Trenn, Senior Lecturer in Computer Science (Data Science), Department of Informatics, King’s College London and co-author of the study.
To show the existence of sparse ANNs, the researchers follow an approach called the Strong Lottery Ticket Hypothesis (SLTH), which suggests that hidden within large networks are subnetworks, or ‘winning tickets’, that already contain all the information needed to perform well – just waiting to be found.
“We believe there’s a smaller section of the network that does the same job as the entire network but at a lower energy cost. It’s like a holy grail or the winning ‘lottery ticket’. Finding it would allow us to prune back lots of the dead weight, thereby reducing computational costs while maintaining performance”, said Dr Frederik Mallmann-Trenn.
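In practical terms, a subnetwork of this kind is obtained by switching individual connections off rather than changing any weight values. The sketch below is a toy illustration, not the paper’s construction: a binary mask keeps only a handful of connections in a random layer, and the masked layer is the candidate ‘ticket’.

```python
# A minimal sketch of what a "winning ticket" means: a subnetwork obtained
# by masking out most of the weights of a large random network.
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 8))             # one layer of a large random network

mask = np.zeros_like(W)                 # 1 = keep this connection, 0 = prune it
keep = rng.choice(W.size, size=6, replace=False)  # keep just 6 of 64 weights
mask.flat[keep] = 1.0

W_sparse = W * mask                     # the pruned subnetwork's weights
x = rng.normal(size=8)
print(W_sparse @ x)                     # only the kept connections contribute
```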
Using a mathematical problem called ‘Random Fixed-Size Subset Sum’, they gained insights into which layers and structures within a network are essential for performance, making network trimming more effective.
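The Random Fixed-Size Subset Sum problem asks how accurately a target value can be matched by the sum of a small, fixed number of values drawn at random. The toy sketch below (illustrative only, with invented numbers, not the paper’s method) searches for a size-three subset of random candidate weights whose sum approximates a single target weight – the flavour of argument used to show that a pruned subnetwork can reproduce the behaviour of a dense one.

```python
# A minimal sketch of the fixed-size subset-sum idea (illustrative only):
# approximate a target weight by the sum of a small subset of random values.
import itertools
import numpy as np

rng = np.random.default_rng(2)
candidates = rng.uniform(-1, 1, size=20)   # random weights available in the big network
target = 0.37                              # a weight the subnetwork should reproduce
k = 3                                      # fixed subset size

best = min(itertools.combinations(candidates, k),
           key=lambda s: abs(sum(s) - target))
print(sum(best), abs(sum(best) - target))  # a size-3 subset summing close to 0.37
```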
Finding them could fundamentally transform how AI models are designed, trained and deployed, potentially leading to more compact, energy-efficient and computationally sustainable AI systems.
Dr Frederik Mallmann-Trenn
While promising, the Strong Lottery Ticket Hypothesis remains just that: a hypothesis. Whilst the researchers have shown that these winning tickets exist, they do not yet know how to find them efficiently. The next step involves developing and testing algorithms to identify these ‘winning tickets’ within complex neural networks. Dr Mallmann-Trenn said, “Finding them could fundamentally transform how AI models are designed, trained and deployed, potentially leading to more compact, energy-efficient and computationally sustainable AI systems.”
More efficient neural networks could enable smarter, more responsive technologies in sectors that rely on energy-efficient solutions, including low-powered devices like smartphones, wearables and smart home systems. They could also bring sophisticated computational capabilities to areas previously limited by power and infrastructure constraints, such as remote locations.
The research, ‘On the Sparsity of the Strong Lottery Ticket Hypothesis’ (Natale et al., 2024), was presented at the 38th Annual Conference on Neural Information Processing Systems. This work was a collaborative effort between Dr Emanuele Natale, Davide Ferré, and Dr Frédéric Giroire (Université Côte d’Azur), and Dr Frederik Mallmann-Trenn and Dr Giordano Giambartolomei (Department of Informatics, King’s College London).