Efficient Reasoning with Balanced Thinking

2025-11-22 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

ReBalance is a training-free framework that improves the efficiency and accuracy of Large Reasoning Models (LRMs) by addressing both "overthinking" and "underthinking." Overthinking means spending redundant computational steps on simple problems; underthinking means exploring too few reasoning paths. The framework treats confidence as a continuous indicator of reasoning state: high confidence variance flags overthinking, and consistent overconfidence flags underthinking. ReBalance aggregates hidden states from a small-scale dataset into reasoning mode prototypes and computes a steering vector from them. A dynamic control function then modulates the vector's strength and direction based on real-time confidence, pruning redundancy during overthinking and promoting exploration during underthinking. In experiments on four models (0.5B to 32B) across nine benchmarks covering math reasoning, general question answering, and coding, ReBalance reduces output redundancy by up to 52.3% while improving Pass@1 accuracy by up to 7.0 points, making it a general, plug-and-play solution.

Key takeaway

For AI engineers deploying Large Reasoning Models in resource-constrained environments, ReBalance offers a training-free way to improve both inference efficiency and accuracy. By adjusting reasoning dynamically based on real-time confidence, it reduces output redundancy by up to 52.3% and improves Pass@1 accuracy by up to 7.0 points in the reported experiments, without the overhead of auxiliary models or complex fine-tuning. Consider integrating ReBalance so that models neither stop reasoning prematurely nor spend computation they do not need.

Key insights

ReBalance uses confidence and its variance to dynamically steer LRMs, preventing both overthinking and underthinking without retraining.

Principles

High confidence variance indicates overthinking.
Consistent overconfidence signals underthinking.
Mid-to-late layers are optimal for steering interventions.
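
The first two principles can be illustrated with a minimal sketch. It assumes that step confidence is the maximum softmax probability of the next-token distribution, which is a common proxy and not necessarily the paper's exact definition. The function names and both thresholds are hypothetical and chosen only for illustration.

```python
import math
from statistics import fmean, pvariance


def step_confidence(logits):
    """Max softmax probability for one decoding step (assumed confidence proxy)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract peak for numerical stability
    return max(exps) / sum(exps)


def classify_window(confidences, var_threshold=0.02, conf_threshold=0.9):
    """Label a window of step confidences.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    mean = fmean(confidences)
    var = pvariance(confidences)
    if var > var_threshold:
        return "overthinking"   # confidence swings widely between steps
    if mean > conf_threshold:
        return "underthinking"  # confidence is high and stable
    return "balanced"
```

For example, `classify_window([0.95, 0.40])` is labeled as overthinking because the two confidences differ sharply, while `classify_window([0.97, 0.98])` is labeled as underthinking because confidence is uniformly high.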

Method

ReBalance computes a steering vector from hidden states and applies a dynamic control function based on real-time confidence and variance to modulate reasoning trajectories, pruning redundancy or promoting exploration.
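
The method described above can be sketched roughly as follows. This is not the authors' implementation: it assumes the steering vector is the difference between the mean hidden states (prototypes) of overthinking and underthinking examples, and that the control function returns a signed scalar applied additively to a hidden state. All function names, the thresholds, and `max_strength` are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of equally sized vectors (a simple prototype)."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(over_states, under_states):
    """Assumed form: overthinking prototype minus underthinking prototype."""
    p_over = mean_vector(over_states)
    p_under = mean_vector(under_states)
    return [o - u for o, u in zip(p_over, p_under)]


def control_strength(mean_conf, conf_var, max_strength=4.0,
                     var_threshold=0.02, conf_threshold=0.9):
    """Signed steering strength from real-time confidence statistics.

    Negative values steer away from the overthinking prototype (prune
    redundancy); positive values steer toward it (promote exploration).
    All constants are illustrative assumptions.
    """
    if conf_var > var_threshold:
        return -max_strength * min(1.0, conf_var / (2 * var_threshold))
    if mean_conf > conf_threshold:
        return max_strength * (mean_conf - conf_threshold) / (1.0 - conf_threshold)
    return 0.0


def steer(hidden, vector, strength):
    """Apply the scaled steering vector to one hidden state."""
    return [h + strength * v for h, v in zip(hidden, vector)]
```

In a real model, `steer` would typically be applied inside a forward hook on the chosen layers. The list-based version here only shows the arithmetic.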

In practice

Use 500 MATH problems for steering vector extraction.
Target mid-to-late layers for optimal steering.
Employ a window size of 2 for confidence state detection.
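
These settings could be collected in a small configuration object, sketched below. The calibration size (500) and window size (2) come from the text. The layer fractions are assumptions standing in for "mid-to-late", since no exact layer range is given here. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SteeringConfig:
    num_calibration_problems: int = 500  # from the text: 500 MATH problems
    window_size: int = 2                 # from the text: confidence state window
    layer_start_frac: float = 0.5        # assumption: where "mid" begins
    layer_end_frac: float = 0.75         # assumption: where "late" ends

    def layer_indices(self, num_layers):
        """Indices of the layers to steer, given the model depth."""
        start = int(num_layers * self.layer_start_frac)
        end = int(num_layers * self.layer_end_frac)
        return list(range(start, max(end, start + 1)))  # always at least one layer

    def windows(self, confidences):
        """Sliding windows of step confidences for state detection."""
        w = self.window_size
        return [confidences[i:i + w] for i in range(len(confidences) - w + 1)]
```

With the defaults, a 32-layer model would be steered at layers 16 through 23, and a confidence trace of three steps yields two overlapping windows of size 2.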

Topics

Large Reasoning Models
Efficient Reasoning
Confidence-Based Steering
Overthinking Mitigation
Training-Free Methods

Best for: AI Engineer, NLP Engineer, AI Scientist, AI Researcher, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.