Implicit Compression Regularization: Concise Reasoning via Internal Shorter Distributions in RL Post-Training
Summary
A new method called Implicit Compression Regularization (ICR) addresses the "overthinking" problem in large language models (LLMs) trained with reinforcement learning, where models generate excessively long reasoning traces. Unlike traditional length penalties or early-exit strategies that can degrade accuracy or truncate valid reasoning, ICR derives its compression signal from a virtual shorter distribution. This distribution is induced by identifying the shortest correct responses within on-policy rollout groups. The method formalizes overthinking as a negative correlation between response length and accuracy, guiding the policy to favor concise yet correct trajectories. Experiments across three reasoning backbones and multiple mathematical and knowledge-intensive benchmarks demonstrate that ICR consistently shortens responses while maintaining or improving accuracy, achieving a superior accuracy-length Pareto frontier.
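The summary says ICR formalizes overthinking as a negative correlation between response length and accuracy. This summary does not give the paper's exact statistic. The minimal Python sketch below assumes a plain Pearson correlation between token length and 0/1 correctness inside one rollout group. The function name `length_accuracy_correlation` is hypothetical.

```python
from math import sqrt
from typing import Optional, Sequence


def length_accuracy_correlation(
    lengths: Sequence[float], correct: Sequence[bool]
) -> Optional[float]:
    """Pearson correlation between response length and 0/1 correctness
    for one group of rollouts sampled for the same prompt.

    A clearly negative value means longer responses tend to be wrong,
    which is the "overthinking" pattern. Returns None when either variable
    is constant (e.g. every rollout is correct), because the correlation
    is undefined there.
    """
    n = len(lengths)
    if n != len(correct) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    acc = [1.0 if c else 0.0 for c in correct]
    mean_len = sum(lengths) / n
    mean_acc = sum(acc) / n
    cov = sum((x - mean_len) * (a - mean_acc) for x, a in zip(lengths, acc))
    var_len = sum((x - mean_len) ** 2 for x in lengths)
    var_acc = sum((a - mean_acc) ** 2 for a in acc)
    if var_len == 0 or var_acc == 0:
        return None
    return cov / sqrt(var_len * var_acc)
```

For example, `length_accuracy_correlation([200, 350, 900, 1200], [True, True, False, False])` is about -0.96. Here the two short rollouts are correct and the two long ones are not, so the group shows strong overthinking under this assumed measure.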
Key takeaway
For AI engineers optimizing LLM inference cost and latency, ICR offers a way to substantially shorten reasoning traces without sacrificing accuracy. Consider integrating ICR into your reinforcement learning post-training pipeline when you need more concise, efficient outputs. This matters most in applications where response length directly affects user experience or compute cost.
Key insights
ICR compresses LLM reasoning by targeting naturally shorter, correct responses within on-policy rollouts.
Principles
- Overthinking shows up as a negative correlation between response length and accuracy.
- Shortest correct responses are natural compression targets.
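The second principle reduces to a simple per-group selection: discard incorrect rollouts, then take the shortest of the remaining ones. The Python sketch below shows that step only. The `Rollout` container and its fields are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Rollout:
    """One sampled response for a prompt (hypothetical container)."""
    text: str
    num_tokens: int
    correct: bool  # e.g. verified against the reference answer


def shortest_correct(group: Sequence[Rollout]) -> Optional[Rollout]:
    """Return the shortest correct rollout in an on-policy group.

    Returns None when no rollout in the group is correct, in which case
    the group offers no compression target.
    """
    correct = [r for r in group if r.correct]
    if not correct:
        return None
    # min() keeps the first of several equally short correct rollouts.
    return min(correct, key=lambda r: r.num_tokens)
```

For example, in a group with lengths 900 (correct), 150 (wrong), and 400 (correct), the 400-token rollout is selected. The shortest response overall is skipped because it is wrong, which is what separates this target from a plain length penalty.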
Method
ICR uses a virtual shorter distribution, derived from the shortest correct responses in rollout groups, to regularize on-policy reinforcement learning and guide models toward concise trajectories.
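This summary does not state the paper's actual ICR objective. The sketch below is therefore a stand-in that only illustrates the general idea of regularizing toward a distribution built from the shortest correct rollouts. It builds a hypothetical target `q` that places uniform mass on correct rollouts within a `slack` margin of the shortest correct length. It then measures KL(q || p), where `p` is the policy's probability renormalized over the group (a softmax of sequence log-probabilities). All names and the `slack` parameter are assumptions. In a real trainer, `seq_logprobs` would be differentiable tensors, and this term would be added to the RL loss with some weight. Plain floats keep the sketch dependency-free.

```python
import math
from typing import List, Sequence


def virtual_shorter_target(
    lengths: Sequence[int], correct: Sequence[bool], slack: float = 0.0
) -> List[float]:
    """Hypothetical target distribution q over one rollout group.

    Uniform mass on correct rollouts whose length is at most
    (1 + slack) * (shortest correct length); zero elsewhere.
    All zeros if no rollout in the group is correct.
    """
    if slack < 0:
        raise ValueError("slack must be non-negative")
    correct_lengths = [n for n, c in zip(lengths, correct) if c]
    if not correct_lengths:
        return [0.0] * len(lengths)
    cutoff = (1.0 + slack) * min(correct_lengths)
    mask = [c and n <= cutoff for n, c in zip(lengths, correct)]
    k = sum(mask)  # >= 1, since the shortest correct rollout always qualifies
    return [1.0 / k if m else 0.0 for m in mask]


def log_softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def compression_regularizer(
    seq_logprobs: Sequence[float],
    lengths: Sequence[int],
    correct: Sequence[bool],
    slack: float = 0.0,
) -> float:
    """KL(q || p): q is the hypothetical virtual shorter target, p is the
    policy's distribution renormalized over the group's own rollouts.
    Lower values mean the policy already favors the short, correct rollouts.
    """
    q = virtual_shorter_target(lengths, correct, slack)
    if sum(q) == 0.0:
        return 0.0  # no correct rollout: this group gives no compression signal
    log_p = log_softmax(seq_logprobs)
    return sum(qi * (math.log(qi) - lp) for qi, lp in zip(q, log_p) if qi > 0)
```

Groups with no correct rollout contribute nothing. As a result, this sketch never pushes the policy toward short but wrong answers, which mirrors the summary's point that the compression signal comes only from correct trajectories.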
In practice
- Apply ICR to reduce LLM reasoning trace length.
- Improve the accuracy-length Pareto frontier of RL-trained LLMs.
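To check whether a method improves the accuracy-length Pareto frontier, you can compute which evaluated checkpoints are non-dominated, preferring shorter mean length and higher accuracy. The Python sketch below does this. The checkpoint names and numbers in the usage line are made up for illustration and are not results from the paper.

```python
from typing import List, Tuple

# (name, mean response length in tokens, accuracy in [0, 1])
Point = Tuple[str, float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by mean length.

    Shorter is better and more accurate is better. A point is dominated
    if another point is no longer and no less accurate, and strictly
    better on at least one of the two axes.
    """
    frontier = []
    for name, length, acc in points:
        dominated = any(
            l2 <= length and a2 >= acc and (l2 < length or a2 > acc)
            for _, l2, a2 in points
        )
        if not dominated:
            frontier.append((name, length, acc))
    return sorted(frontier, key=lambda p: p[1])
```

With hypothetical points `ckpt_a` (1200, 0.70), `ckpt_b` (600, 0.62), `ckpt_c` (650, 0.71), and `ckpt_d` (900, 0.60), the frontier is `ckpt_b` and `ckpt_c`. A method "improves the frontier" when its points dominate or extend the previous non-dominated set.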
Topics
- Implicit Compression Regularization
- Reinforcement Learning
- LLM Reasoning
- Overthinking
- Length-Accuracy Correlation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.