Timing Trick Cuts Energy Used in LLM Training by Up to 14 Percent
Summary
A research group at the University of Twente has demonstrated a method that reduces energy consumption in large language model (LLM) training by up to 14 percent with almost no loss of speed. The technique, presented at the Computing Frontiers conference, targets the substantial energy footprint of frontier LLMs: training GPT-4 in 2023 is estimated to have consumed 50 gigawatt-hours. Lead author Jeffrey Spaan and his collaborators achieved the savings by adjusting GPU clock frequencies with dynamic voltage-frequency scaling (DVFS) at a fine-grained, per-kernel level rather than once per training iteration. DVFS has been known since the 1990s, but previous applications to LLM training were either too slow or not precise enough. In the team's experiment, training a single layer of GPT-3-xl (a 1.3 billion parameter model) on an Nvidia RTX 3080 Ti GPU used 14 percent less energy while taking only 0.6 percent longer. Because the frequencies are set manually with advance knowledge of which kernels will run, the approach outperforms the GPU's automatic DVFS.
Key takeaway
Machine learning engineers optimizing LLM training costs should investigate dynamic voltage-frequency scaling (DVFS) at the kernel level. This approach can yield up to 14 percent energy savings with minimal performance impact, especially on newer GPUs with faster frequency switching. Consider developing or adopting tools that choose frequencies automatically for your specific workloads.
Key insights
Fine-grained, per-kernel DVFS can cut LLM training energy by up to 14 percent with a negligible slowdown (0.6 percent in the reported experiment).
Principles
- Tune hardware settings to the software workload.
- Manual DVFS informed by the kernel schedule can outperform the GPU's automatic control.
- Energy savings depend on GPU switching speed.
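To see why switching speed matters, consider a toy model in which every clock change stalls the GPU for a fixed latency at full power. This is a sketch under stated assumptions, not the researchers' model: the function name, the stall-at-full-power assumption, and all numbers are hypothetical.

```python
def net_saving_mj(kernel_ms, power_hi_w, power_lo_w, slowdown, switch_ms):
    """Energy saved (in millijoules) by running one kernel at a lower clock.

    Toy assumptions, not taken from the article:
    - the kernel draws power_hi_w at the default clock and power_lo_w
      at the reduced clock;
    - the reduced clock lengthens the kernel by the fraction `slowdown`;
    - changing the clock stalls the GPU for switch_ms at power_hi_w.
    """
    baseline = kernel_ms * power_hi_w                  # ms * W = mJ
    scaled = kernel_ms * (1 + slowdown) * power_lo_w   # slower, but lower power
    switch_cost = switch_ms * power_hi_w               # energy spent while stalled
    return baseline - scaled - switch_cost


# Same hypothetical 2 ms kernel, two hypothetical switching latencies.
fast_switch = net_saving_mj(2.0, 250.0, 200.0, 0.005, switch_ms=0.1)
slow_switch = net_saving_mj(2.0, 250.0, 200.0, 0.005, switch_ms=1.0)
print(f"fast switch: {fast_switch:.1f} mJ, slow switch: {slow_switch:.1f} mJ")
```

In this model a switch that takes a large fraction of the kernel's runtime turns the saving into a loss, which is consistent with the point that faster-switching GPUs benefit more.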
Method
Adjust GPU core and memory clock frequencies dynamically at the per-kernel level during LLM training, leveraging foresight of kernel execution.
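One way to picture the per-kernel choice is as a lookup over profiled measurements: for each kernel, pick the clock setting that uses the least energy while staying within a small slowdown budget. The sketch below is illustrative only. The `Profile` type, the kernel names, the 1 percent budget, and every number are made up, it covers the core clock only, and actually applying a chosen clock on a GPU (through a vendor management interface) is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    freq_mhz: int    # hypothetical core clock setting
    time_ms: float   # kernel runtime measured at that clock
    power_w: float   # average power measured at that clock

    @property
    def energy_mj(self) -> float:
        return self.time_ms * self.power_w  # ms * W = mJ


def pick_frequency(profiles, max_slowdown=0.01):
    """Return the lowest-energy profile whose runtime is within
    max_slowdown of the fastest measured runtime for this kernel."""
    fastest = min(p.time_ms for p in profiles)
    allowed = [p for p in profiles if p.time_ms <= fastest * (1 + max_slowdown)]
    return min(allowed, key=lambda p: p.energy_mj)


# Made-up profiles: a compute-bound kernel that slows down sharply at
# lower clocks, and a memory-bound kernel that barely notices them.
KERNEL_PROFILES = {
    "matmul": [
        Profile(1800, 10.0, 300.0),
        Profile(1500, 11.5, 240.0),
        Profile(1200, 14.0, 200.0),
    ],
    "layernorm": [
        Profile(1800, 2.000, 250.0),
        Profile(1500, 2.005, 200.0),
        Profile(1200, 2.010, 170.0),
    ],
}

# Per-kernel plan: kernel name -> chosen profile.
schedule = {name: pick_frequency(p) for name, p in KERNEL_PROFILES.items()}

for name, chosen in schedule.items():
    print(f"{name}: run at {chosen.freq_mhz} MHz")
```

With these invented numbers the compute-bound kernel keeps the highest clock and the memory-bound kernel drops to the lowest, which is the kind of distinction a single per-iteration setting cannot make.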
In practice
- Apply per-kernel DVFS to reduce LLM training costs.
- Prioritize newer GPUs with faster frequency switching.
- Develop tools for automated optimal frequency scaling.
Topics
- LLM Training
- Energy Efficiency
- Dynamic Voltage-Frequency Scaling
- GPU Optimization
- Deep Neural Networks
- Kernel Scheduling
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by IEEE Spectrum.