InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning
Summary
InfoDensity is a reward framework for training Large Language Models (LLMs) to produce more efficient, higher-quality reasoning traces. It targets the verbose, redundant outputs of reasoning-focused LLMs, which incur high computational costs. An empirical study found that high-quality reasoning traces converge to low uncertainty and make monotonic progress toward the answer, indicating that they are informationally dense. InfoDensity combines an AUC-based reward, a monotonicity reward, and a length scaling term to measure reasoning quality while favoring conciseness. Experiments on mathematical reasoning benchmarks, including GSM8K and MATH, used models such as Qwen3-0.6B and DeepSeek-R1-Distill-Qwen-1.5B. InfoDensity matches or exceeds state-of-the-art baselines in accuracy while significantly reducing token usage. On DeepSeek-R1-Distill-Qwen-1.5B, for instance, it achieved 64.0% average accuracy while reducing tokens by 30%.
Key takeaway
Machine Learning Engineers optimizing Large Language Models for reasoning tasks should consider integrating information-theoretic reward signals into their reinforcement learning pipelines. InfoDensity rewards convergence to low uncertainty and monotonic entropy reduction, which can reduce token usage significantly (e.g., 30%) without sacrificing accuracy. The result is more efficient and cost-effective deployment of reasoning models, especially for mathematical or otherwise verifiable tasks.
Key insights
High-quality LLM reasoning traces are informationally dense: they converge to low uncertainty and reduce entropy monotonically as the reasoning progresses.
Principles
- Reasoning quality is tied to information density, not just length.
- High-quality traces converge to low uncertainty.
- High-quality traces exhibit monotonic entropy reduction.
Method
For RL training, InfoDensity combines an AUC-based reward, which favors convergence to low uncertainty, with a monotonicity reward, which favors step-by-step entropy reduction. The combination is weighted by a length scaling term.
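To make the shape of such a reward concrete, here is a minimal sketch in Python. It is not the paper's formulation: the normalization by `h_max`, the mixing weight `alpha`, the `token_budget` parameter, and the exact form of each term are assumptions chosen for illustration, and all function names are hypothetical. The input is a list of per-step entropies, i.e. the uncertainty about the final answer after each reasoning step.

```python
from typing import Sequence


def auc_reward(entropies: Sequence[float], h_max: float) -> float:
    """Return 1 minus the normalized area under the entropy curve.

    Assumption: entropies are clipped to [0, h_max] and the area is
    computed with the trapezoidal rule over unit-spaced steps.
    """
    if len(entropies) < 2 or h_max <= 0:
        raise ValueError("need at least two steps and a positive h_max")
    norm = [min(max(h, 0.0), h_max) / h_max for h in entropies]
    area = sum((a + b) / 2.0 for a, b in zip(norm, norm[1:]))
    return 1.0 - area / (len(norm) - 1)


def monotonicity_reward(entropies: Sequence[float], tol: float = 0.0) -> float:
    """Return the fraction of steps where entropy does not increase."""
    if len(entropies) < 2:
        raise ValueError("need at least two steps")
    pairs = list(zip(entropies, entropies[1:]))
    return sum(1 for prev, cur in pairs if cur <= prev + tol) / len(pairs)


def length_scale(num_tokens: int, token_budget: int) -> float:
    """Hypothetical length term: 1.0 within budget, shrinking beyond it."""
    return min(1.0, token_budget / max(num_tokens, 1))


def info_density_reward(
    entropies: Sequence[float],
    num_tokens: int,
    h_max: float,
    token_budget: int,
    alpha: float = 0.5,  # assumed mixing weight between the two quality terms
) -> float:
    """Combine both quality terms, then weight by the length term."""
    quality = alpha * auc_reward(entropies, h_max) + (
        1.0 - alpha
    ) * monotonicity_reward(entropies)
    return length_scale(num_tokens, token_budget) * quality


if __name__ == "__main__":
    trace = [2.0, 1.0, 0.0]  # entropy in bits after each reasoning step
    print(info_density_reward(trace, num_tokens=100, h_max=2.0, token_budget=200))
```

Under these assumptions, a trace whose entropy drops early and never rises scores highest, and the same trace scores lower once its token count exceeds the budget.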
In practice
- Use conditional entropy to quantify reasoning quality.
- Incorporate AUC and monotonicity rewards in RL objectives.
- Consider external judge models for stable reward signals.
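The first bullet can be illustrated with a short sketch. It assumes that, after each reasoning step, some model (for example an external judge) supplies a probability distribution over candidate answers. How those distributions are obtained is outside this example, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Return the Shannon entropy in bits of a distribution over answers.

    The weights are renormalized so they sum to 1; zero-probability
    entries contribute nothing to the sum.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    entropy = 0.0
    for p in probs:
        if p > 0:
            q = p / total
            entropy -= q * math.log2(q)
    return entropy


def entropy_trajectory(step_distributions: Sequence[Sequence[float]]) -> List[float]:
    """Return the answer entropy conditioned on the trace up to each step."""
    return [shannon_entropy(dist) for dist in step_distributions]


if __name__ == "__main__":
    # Illustrative values only: the answer distribution sharpens as steps are added.
    steps = [
        [0.25, 0.25, 0.25, 0.25],
        [0.5, 0.5, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0],
    ]
    print(entropy_trajectory(steps))
```

A trajectory like this one, which falls at every step and ends near zero, is the pattern the AUC and monotonicity rewards are meant to favor.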
Topics
- Large Language Models
- Reinforcement Learning
- Reasoning Traces
- Information Theory
- Entropy Reduction
- Computational Efficiency
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.