Timing Trick Cuts Energy Used in LLM Training by Up to 14 Percent

2026-06-10 · Source: IEEE Spectrum · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, short

Summary

A research group at the University of Twente has demonstrated a method that reduces energy consumption in large language model (LLM) training by up to 14 percent with almost no loss of speed. The technique, presented at the Computing Frontiers conference, targets the substantial energy footprint of frontier LLMs: training GPT-4 in 2023 is estimated to have consumed 50 gigawatt-hours. Lead author Jeffrey Spaan and his collaborators dynamically adjust GPU clock frequencies using dynamic voltage-frequency scaling (DVFS) at a fine-grained, per-kernel level rather than per iteration. DVFS has been known since the 1990s, but previous applications to LLM training were either too slow or not precise enough. In the team's experiment, training a single layer of GPT-3-xl (a 1.3-billion-parameter model) on an Nvidia RTX 3080 Ti GPU, the method saved 14 percent of the energy while increasing training time by only 0.6 percent. Because the sequence of kernels is known in advance, this manual adjustment outperforms the GPU's automatic DVFS.
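
The two reported figures imply a slightly larger drop in average power than in energy, because the run also takes marginally longer. A minimal Python sketch of that arithmetic, using only the two percentages quoted above (the function name is illustrative, and no absolute wattage is assumed):

```python
def average_power_ratio(energy_saving: float, time_increase: float) -> float:
    """Ratio of average power (tuned / baseline), from energy = power * time."""
    energy_ratio = 1.0 - energy_saving   # 0.86 for 14 percent savings
    time_ratio = 1.0 + time_increase     # 1.006 for a 0.6 percent slowdown
    return energy_ratio / time_ratio


ratio = average_power_ratio(energy_saving=0.14, time_increase=0.006)
print(f"average power is about {ratio:.1%} of baseline")
```

Under these figures the GPU draws roughly 85.5 percent of its baseline average power during the experiment.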

Key takeaway

Machine learning engineers optimizing LLM training costs should investigate dynamic voltage-frequency scaling (DVFS) at the kernel level. The approach can yield up to 14 percent energy savings with minimal performance impact, especially on newer GPUs with faster frequency switching. Consider developing or adopting tools that choose frequencies automatically for your specific workloads.

Key insights

Fine-grained, per-kernel DVFS can cut LLM training energy by up to 14 percent while increasing training time by only 0.6 percent.

Principles

Tune hardware settings to the software's behavior.
Manual DVFS outperforms automatic GPU control.
Energy savings depend on GPU switching speed.
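
Why switching speed matters can be illustrated with a toy model. The sketch below assumes, purely for illustration, that a clock change stalls the GPU for a fixed latency before the next kernel runs; the function names, latencies, kernel durations, and the 1 percent budget are all hypothetical and not taken from the study.

```python
def switch_overhead(kernel_ms: float, switch_ms: float) -> float:
    """Fraction of extra time if one clock switch precedes a kernel of this length."""
    return switch_ms / kernel_ms


def worth_switching(kernel_ms: float, switch_ms: float, budget: float = 0.01) -> bool:
    """True when the switch costs no more than `budget` of the kernel's run time."""
    return switch_overhead(kernel_ms, switch_ms) <= budget


# Hypothetical latencies: a slow-switching GPU versus a fast-switching one.
for switch_ms in (5.0, 0.05):
    eligible = [k for k in (0.5, 10.0, 50.0) if worth_switching(k, switch_ms)]
    print(f"switch latency {switch_ms} ms -> kernels worth retuning (ms): {eligible}")
```

In this model a slow switch leaves no kernel worth retuning, while a fast one makes the longer kernels eligible, which is the intuition behind preferring GPUs that change frequency quickly.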

Method

Adjust GPU core and memory clock frequencies dynamically at the per-kernel level during LLM training, leveraging foresight of kernel execution.
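
The method can be sketched as a planning step that picks a clock for each kernel before it runs. This is a simplified illustration under stated assumptions, not the authors' implementation: the per-kernel profiles are invented numbers, only the core clock is modeled, the names `PROFILES`, `pick_clock`, and `plan` are hypothetical, and actually applying the chosen clock to a GPU is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical per-kernel profiles: core clock (MHz) -> (time in ms, energy in mJ).
# These numbers are invented for illustration, not measurements from the study.
Profile = Dict[int, Tuple[float, float]]

PROFILES: Dict[str, Profile] = {
    # Compute-bound kernel: slows down sharply at lower core clocks.
    "matmul": {1900: (10.0, 3000.0), 1600: (11.8, 2700.0), 1300: (14.5, 2500.0)},
    # Memory-bound kernel: barely slows down, so a lower core clock is nearly free.
    "layernorm": {1900: (2.0, 500.0), 1600: (2.01, 400.0), 1300: (2.015, 330.0)},
}


def pick_clock(profile: Profile, max_slowdown: float) -> int:
    """Return the lowest-energy clock whose time stays within the slowdown budget."""
    base_time = profile[max(profile)][0]  # time at the highest clock
    allowed = {
        mhz: energy
        for mhz, (time_ms, energy) in profile.items()
        if time_ms <= base_time * (1.0 + max_slowdown)
    }
    return min(allowed, key=allowed.get)


def plan(kernels: List[str], max_slowdown: float = 0.01) -> List[Tuple[str, int]]:
    """Choose a clock per kernel ahead of time, in the known execution order."""
    return [(name, pick_clock(PROFILES[name], max_slowdown)) for name in kernels]


schedule = plan(["matmul", "layernorm", "matmul"])
print(schedule)
```

With these made-up profiles the compute-bound kernel stays at the top clock while the memory-bound kernel drops to the lowest one, which is the kind of per-kernel distinction a per-iteration setting cannot make.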

In practice

Apply per-kernel DVFS to reduce LLM training costs.
Prioritize newer GPUs with faster frequency switching.
Develop tools for automated optimal frequency scaling.

Topics

LLM Training
Energy Efficiency
Dynamic Voltage-Frequency Scaling
GPU Optimization
Deep Neural Networks
Kernel Scheduling

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by IEEE Spectrum.