InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, long

Summary

InfoDensity is a reward framework for training Large Language Models (LLMs) to generate more efficient, higher-quality reasoning traces. It targets the verbose, redundant outputs of reasoning-focused LLMs, which incur high computational cost. An empirical study found that high-quality reasoning traces share two properties: low uncertainty convergence (uncertainty settles at a low level) and monotonic progress (uncertainty keeps falling from step to step). Together these indicate that the trace is informationally dense. InfoDensity combines an AUC-based reward, a monotonicity reward, and a length scaling term to measure reasoning quality and favor conciseness. Experiments on mathematical reasoning benchmarks, including GSM8K and MATH, used models such as Qwen3-0.6B and DeepSeek-R1-Distill-Qwen-1.5B. InfoDensity matches or exceeds state-of-the-art baselines in accuracy while significantly reducing token usage. For instance, on DeepSeek-R1-Distill-Qwen-1.5B it achieved 64.0% average accuracy while reducing token usage by 30%.

Key takeaway

Machine learning engineers optimizing LLMs for reasoning tasks should consider integrating information-theoretic reward signals into their reinforcement learning pipelines. InfoDensity rewards low uncertainty convergence and monotonic entropy reduction, and the reported result is a substantial cut in token usage (e.g., 30%) without sacrificing accuracy. That can make reasoning models cheaper and more efficient to deploy, especially for mathematical or otherwise verifiable tasks.

Key insights

High-quality LLM reasoning traces are informationally dense: their uncertainty converges to a low level, and entropy decreases monotonically from step to step.
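To make the two properties concrete, here is a minimal sketch that checks them on toy data. It assumes uncertainty is measured as the Shannon entropy of a distribution over candidate answers after each reasoning step; that setup, the function names (`shannon_entropy`, `trace_diagnostics`), and the example distributions are all hypothetical and are not taken from the paper.

```python
import math


def shannon_entropy(probs):
    """Entropy (in nats) of a discrete distribution; zero-probability terms add 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def trace_diagnostics(step_distributions, tol=1e-9):
    """Return (final_entropy, monotonic_fraction) for one reasoning trace.

    step_distributions: one answer distribution per reasoning step (assumed input).
    monotonic_fraction: share of consecutive steps where entropy did not increase.
    """
    entropies = [shannon_entropy(d) for d in step_distributions]
    non_increasing = [b <= a + tol for a, b in zip(entropies, entropies[1:])]
    if not non_increasing:  # a single-step trace has no transitions to violate
        return entropies[-1], 1.0
    return entropies[-1], sum(non_increasing) / len(non_increasing)


# Hypothetical toy traces over four candidate answers.
dense_trace = [
    [0.25, 0.25, 0.25, 0.25],
    [0.50, 0.25, 0.125, 0.125],
    [0.90, 0.05, 0.03, 0.02],
    [0.99, 0.005, 0.003, 0.002],
]
wandering_trace = [
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],  # uncertainty goes back up at this step
    [0.60, 0.20, 0.10, 0.10],
]

dense_final, dense_mono = trace_diagnostics(dense_trace)
wander_final, wander_mono = trace_diagnostics(wandering_trace)
print(f"dense:     final entropy {dense_final:.3f}, monotonic fraction {dense_mono:.2f}")
print(f"wandering: final entropy {wander_final:.3f}, monotonic fraction {wander_mono:.2f}")
```

In this toy setup the dense trace ends at a lower entropy and never backtracks, while the wandering trace ends less certain and increases entropy at one step.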

Method

For RL training, InfoDensity combines an AUC-based reward for low uncertainty convergence with a monotonicity reward for step-by-step entropy reduction, and weights the result by a length scaling term.
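The summary does not give the exact formulas, so the following is only a sketch of how a reward with these three ingredients could be assembled from a per-step entropy trajectory. Every concrete choice here is an assumption made for illustration: the trapezoidal AUC normalized by a maximum entropy `h_max`, the fraction-of-non-increasing-steps monotonicity score, the `token_budget`-based length scale, the equal weights, and all function names.

```python
import math


def auc_reward(entropies, h_max):
    """1 minus the normalized area under the entropy curve (trapezoidal rule).

    Higher values mean uncertainty dropped early and stayed low.
    """
    if len(entropies) < 2:
        return 1.0 - entropies[0] / h_max
    area = sum((a + b) / 2.0 for a, b in zip(entropies, entropies[1:]))
    return 1.0 - area / (h_max * (len(entropies) - 1))


def monotonicity_reward(entropies, tol=1e-9):
    """Fraction of consecutive steps where entropy did not increase."""
    pairs = list(zip(entropies, entropies[1:]))
    if not pairs:
        return 1.0
    return sum(b <= a + tol for a, b in pairs) / len(pairs)


def length_scale(num_tokens, token_budget):
    """Assumed scaling: 1.0 within the budget, shrinking as the trace overshoots it."""
    return min(1.0, token_budget / num_tokens)


def info_density_style_reward(entropies, num_tokens, h_max, token_budget,
                              w_auc=0.5, w_mono=0.5):
    """Quality score (AUC + monotonicity) weighted by the length scaling term."""
    quality = (w_auc * auc_reward(entropies, h_max)
               + w_mono * monotonicity_reward(entropies))
    return length_scale(num_tokens, token_budget) * quality


# Hypothetical entropy trajectories (nats) over four candidate answers.
h_max = math.log(4)
compact = info_density_style_reward(
    [1.386, 0.7, 0.2, 0.05], num_tokens=120, h_max=h_max, token_budget=256)
verbose = info_density_style_reward(
    [1.386, 1.2, 1.3, 1.1, 1.2, 0.9], num_tokens=600, h_max=h_max, token_budget=256)
print(f"compact trace reward: {compact:.3f}")
print(f"verbose trace reward: {verbose:.3f}")
```

Under these assumptions the compact trace scores higher on all three terms. In an RL pipeline a score like this would typically be computed per sampled trace and passed to the policy-gradient update; how the paper combines it with answer correctness is not stated in this summary.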

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.