InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, long

Summary

InfoDensity is a reward framework for training large language models (LLMs) to produce efficient, high-quality reasoning traces. It targets a common problem with reasoning-focused LLMs: verbose, redundant outputs that raise computational cost. An empirical study found that high-quality reasoning traces converge to low uncertainty and make monotonic progress, which indicates that they are informationally dense. InfoDensity combines an AUC-based reward, a monotonicity reward, and a length scaling term, so that it measures reasoning quality while favoring concise traces. Experiments on mathematical reasoning benchmarks, including GSM8K and MATH, used models such as Qwen3-0.6B and DeepSeek-R1-Distill-Qwen-1.5B. InfoDensity matches or exceeds state-of-the-art baselines in accuracy while substantially reducing token usage, giving a strong accuracy-efficiency trade-off. For instance, on DeepSeek-R1-Distill-Qwen-1.5B it reached 64.0% average accuracy and reduced token usage by 30%.

Key takeaway

Machine learning engineers optimizing LLMs for reasoning tasks should consider adding information-theoretic reward signals to their reinforcement learning (RL) pipelines. InfoDensity rewards convergence to low uncertainty and monotonic entropy reduction; in the reported experiments, this cut token usage substantially (by 30% on DeepSeek-R1-Distill-Qwen-1.5B) without sacrificing accuracy. The result can be cheaper, more efficient deployment of reasoning models, especially for mathematical or other verifiable tasks.

Key insights

High-quality LLM reasoning traces are informationally dense: they converge to low uncertainty and reduce entropy monotonically as they progress.

Principles

Reasoning quality is tied to information density, not just length.
High-quality traces converge to low uncertainty.
High-quality traces exhibit monotonic entropy reduction.
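
To make the two trajectory-level principles concrete, the sketch below scores two hypothetical per-step entropy trajectories (the numbers are invented for illustration, not taken from the paper). It computes a trapezoidal area under the entropy curve, with the step axis rescaled to [0, 1] so that traces with different step counts are comparable (an assumption of this sketch), and checks whether entropy ever rises from one step to the next.

```python
from typing import Sequence


def normalized_auc(entropies: Sequence[float]) -> float:
    """Trapezoidal area under a per-step entropy curve, step axis rescaled to [0, 1].

    Dividing by the number of intervals is an assumption of this sketch so that
    traces with different step counts are comparable.
    """
    if len(entropies) < 2:
        raise ValueError("need at least two entropy values")
    area = sum((a + b) / 2.0 for a, b in zip(entropies, entropies[1:]))
    return area / (len(entropies) - 1)


def is_monotone_nonincreasing(entropies: Sequence[float]) -> bool:
    """True if entropy never rises from one step to the next."""
    return all(b <= a for a, b in zip(entropies, entropies[1:]))


# Hypothetical entropy trajectories in nats, one value per reasoning step.
dense_trace = [2.0, 1.2, 0.6, 0.2, 0.1]
meandering_trace = [2.0, 1.8, 1.9, 1.5, 1.7, 1.1, 0.9, 1.0, 0.4]

if __name__ == "__main__":
    for name, traj in [("dense", dense_trace), ("meandering", meandering_trace)]:
        print(f"{name}: auc={normalized_auc(traj):.3f}, "
              f"monotone={is_monotone_nonincreasing(traj)}, final={traj[-1]}")
```

The dense trace has the smaller normalized area (about 0.76 versus about 1.39) and is monotone; the meandering trace ends at higher uncertainty and backtracks several times.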

Method

For RL training, InfoDensity combines an AUC-based reward, which favors convergence to low uncertainty, with a monotonicity reward, which favors step-by-step entropy reduction, and weights them by a length scaling term.
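
A minimal sketch of how the three terms could be combined into one scalar reward per sampled trace. This summary does not give the exact formulas, so every functional form here is an assumption: the AUC reward is one minus the normalized area divided by the initial entropy, the monotonicity reward is the fraction of steps where entropy does not rise, and a hypothetical `target_tokens` parameter defines a length scale of `min(1, target_tokens / num_tokens)` that multiplies the weighted sum. The weights `w_auc` and `w_mono` are likewise illustrative.

```python
from typing import Sequence


def auc_reward(entropies: Sequence[float]) -> float:
    """Assumed form: 1 - (normalized trapezoidal AUC / initial entropy), clipped to [0, 1].

    A smaller area under the entropy curve (faster convergence to low
    uncertainty) yields a higher reward.
    """
    if len(entropies) < 2 or entropies[0] <= 0:
        return 0.0  # degenerate trace: no usable signal under this sketch
    area = sum((a + b) / 2.0 for a, b in zip(entropies, entropies[1:]))
    normalized = area / (len(entropies) - 1)
    return min(1.0, max(0.0, 1.0 - normalized / entropies[0]))


def monotonicity_reward(entropies: Sequence[float]) -> float:
    """Assumed form: fraction of step-to-step transitions where entropy does not rise."""
    pairs = list(zip(entropies, entropies[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if b <= a) / len(pairs)


def length_scale(num_tokens: int, target_tokens: int) -> float:
    """Assumed form: 1.0 up to target_tokens, then decays as target_tokens / num_tokens."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return min(1.0, target_tokens / num_tokens)


def info_density_reward(
    entropies: Sequence[float],
    num_tokens: int,
    target_tokens: int = 512,  # hypothetical length budget
    w_auc: float = 0.5,        # hypothetical weight
    w_mono: float = 0.5,       # hypothetical weight
) -> float:
    """Hypothetical combination: weighted entropy terms multiplied by the length term."""
    quality = w_auc * auc_reward(entropies) + w_mono * monotonicity_reward(entropies)
    return quality * length_scale(num_tokens, target_tokens)


if __name__ == "__main__":
    trace = [2.0, 1.2, 0.6, 0.2, 0.1]  # hypothetical per-step entropies (nats)
    print(info_density_reward(trace, num_tokens=400))
    print(info_density_reward(trace, num_tokens=1024))
```

With these assumed settings, the trace [2.0, 1.2, 0.6, 0.2, 0.1] scores about 0.81 at 400 tokens and half that, about 0.40, at 1,024 tokens. Applying the length term multiplicatively keeps the reward in [0, 1] when the weights sum to 1 and lets length discount, rather than override, the entropy-based terms; an additive length penalty would be an equally plausible reading of the summary.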

In practice

Use conditional entropy to quantify reasoning quality.
Incorporate AUC and monotonicity rewards in RL objectives.
Consider external judge models for stable reward signals.
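
One way to act on the first and third points is sketched below, under stated assumptions: the conditional entropy H(answer | question, first k steps) is estimated by sampling several final answers after each reasoning prefix and taking the Shannon entropy of their empirical distribution. `sample_answers` and `fake_sampler` are hypothetical; in line with the third point, a real sampler could query a separate judge model rather than the policy being trained, so that the reward signal stays stable as the policy changes. This sampling estimator is an assumption of the sketch, not necessarily the estimator InfoDensity uses.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def empirical_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (nats) of the empirical distribution over sampled answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    total = len(answers)
    counts = Counter(answers)
    # Sum of p * log(1/p); this form never produces a negative zero.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def entropy_trajectory(
    steps: Sequence[str], sample_answers: Callable[[str], List[str]]
) -> List[float]:
    """Estimate H(answer | question, steps[:k]) for k = 0..len(steps).

    `sample_answers` is a hypothetical callback: given a reasoning prefix, it
    returns several final answers sampled from a model (the policy or an
    external judge). The question is assumed to be bound inside the callback.
    """
    trajectory = []
    for k in range(len(steps) + 1):
        prefix = "\n".join(steps[:k])
        trajectory.append(empirical_entropy(sample_answers(prefix)))
    return trajectory


def fake_sampler(prefix: str) -> List[str]:
    """Hypothetical stub standing in for a real model; the answers are made up."""
    n_steps = 0 if not prefix else prefix.count("\n") + 1
    table = {0: ["3", "4", "5", "7"], 1: ["4", "4", "4", "7"], 2: ["4", "4", "4", "4"]}
    return table[min(n_steps, 2)]


if __name__ == "__main__":
    steps = ["Compute 2 + 2 in parts.", "2 + 2 = 4."]
    print(entropy_trajectory(steps, fake_sampler))
```

With the stub, the estimated entropy falls from ln 4 (about 1.39 nats) to about 0.56 and then to 0, the kind of converging, monotone trajectory the principles above describe.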

Topics

Large Language Models
Reinforcement Learning
Reasoning Traces
Information Theory
Entropy Reduction
Computational Efficiency

Code references

anonymous/InfoDensity

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.