SuCo: Sufficiency-guided Continuous Adaptive Reasoning

Source: Artificial Intelligence · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Sufficiency-guided Continuous Adaptive Reasoning (SuCo) is a two-stage training framework that targets a common inefficiency of Large Reasoning Models (LRMs): they often produce overly long chains of thought (CoTs), which inflate computational cost. SuCo introduces the Minimal Sufficient CoT (MSC), defined as the shortest CoT prefix that still suffices for a correct answer; the authors report that targeting MSC empirically reduces reasoning tokens while improving accuracy. The first stage, MSC-Aligned Fine-Tuning (MFT), constructs MSC data using problem-adaptive sufficiency thresholds and fine-tunes the model to internalize concise reasoning patterns. The second stage, Sufficiency-Aware Policy Optimization (SAPO), applies reinforcement learning with dynamic complexity tracking and rewards that penalize both overthinking and underthinking. Extensive experiments on mathematics, code, and science benchmarks show that SuCo consistently improves both accuracy and reasoning efficiency.
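
The MSC definition can be illustrated with a minimal sketch, assuming the CoT is already split into steps and that a hypothetical `sufficiency` scorer estimates the probability of reaching a correct answer from a given prefix (for example, the fraction of sampled completions from that prefix that end correctly). The threshold rule, a fixed fraction `base_ratio` of the full-CoT score, is an illustrative stand-in for SuCo's problem-adaptive thresholds, not the authors' formula:

```python
from typing import Callable, Optional, Sequence


def minimal_sufficient_prefix(
    steps: Sequence[str],
    sufficiency: Callable[[str], float],
    base_ratio: float = 0.9,
) -> Optional[str]:
    """Return the shortest step prefix whose sufficiency score reaches an
    adaptive threshold, or None if the full CoT is never correct.

    Hypothetical: `sufficiency(prefix)` estimates P(correct answer | prefix).
    The threshold is `base_ratio` times the full-CoT score, an assumed
    stand-in for SuCo's problem-adaptive sufficiency thresholds.
    """
    full = "".join(steps)
    full_score = sufficiency(full)
    if full_score <= 0.0:
        return None  # treat problems the full chain cannot solve as having no MSC
    threshold = base_ratio * full_score
    # Scan prefixes from shortest to longest; sufficiency need not be
    # monotone in length, so a binary search could skip the true minimum.
    for k in range(1, len(steps) + 1):
        prefix = "".join(steps[:k])
        if sufficiency(prefix) >= threshold:
            return prefix
    return full  # reached only if base_ratio > 1


# Toy scorer: each whitespace-separated step adds 1/3 to the success estimate.
steps = ["a ", "b ", "c ", "d "]
toy = lambda p: min(1.0, len(p.split()) / 3)
print(repr(minimal_sufficient_prefix(steps, toy)))  # 'a b c '
```

A linear scan costs one scorer call per prefix; with a sampling-based scorer that is the dominant expense, which is why the sketch keeps the scorer pluggable.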

Key takeaway

If you are a machine learning engineer optimizing Large Reasoning Models, consider sufficiency-guided training frameworks such as SuCo. By defining and targeting the Minimal Sufficient CoT, this approach offers a principled way to cut the computational cost of excessively long chains of thought while also improving accuracy. Adopting adaptive reasoning control and sufficiency-aware rewards can help your teams deploy LRMs more efficiently and precisely across diverse tasks.

Key insights

Optimizing Chain-of-Thought length via sufficiency-guided adaptive reasoning improves LRM efficiency and accuracy.

Principles

Method

A two-stage framework: MSC-Aligned Fine-Tuning (MFT) constructs MSC data and fine-tunes for concise patterns, followed by Sufficiency-Aware Policy Optimization (SAPO) using RL with dynamic complexity tracking and sufficiency-aware rewards.
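
The SAPO reward idea can be sketched as follows. This is a hypothetical illustration, not the paper's reward: `ComplexityTracker` approximates dynamic complexity tracking with a per-problem exponential moving average of correct-rollout lengths, and `sufficiency_reward` penalizes overthinking (a correct answer longer than the tracked target) and underthinking (a wrong answer shorter than it). The names, the weights `alpha` and `beta`, and the clipping are all assumptions:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ComplexityTracker:
    """Hypothetical per-problem estimate of how many tokens are sufficient.

    Keeps an exponential moving average (EMA) of the lengths of correct
    rollouts, an assumed stand-in for SAPO's dynamic complexity tracking.
    """

    momentum: float = 0.9
    target: Dict[str, float] = field(default_factory=dict)

    def update(self, pid: str, length: int, correct: bool) -> None:
        if not correct:
            return  # only successful rollouts say how much reasoning suffices
        prev = self.target.get(pid)
        if prev is None:
            self.target[pid] = float(length)
        else:
            self.target[pid] = self.momentum * prev + (1 - self.momentum) * length


def sufficiency_reward(
    length: int,
    correct: bool,
    target: Optional[float],
    alpha: float = 0.5,
    beta: float = 0.5,
) -> float:
    """Hypothetical reward penalizing both overthinking and underthinking."""
    if target is None or target <= 0:
        return 1.0 if correct else -1.0  # no length signal yet
    ratio = length / target
    if correct:
        # Overthinking: correct but longer than needed; the penalty is capped
        # at 1.0 so a correct answer never scores below an incorrect one.
        return 1.0 - min(1.0, alpha * max(0.0, ratio - 1.0))
    # Underthinking: wrong and shorter than the tracked sufficient length.
    return -1.0 - beta * max(0.0, 1.0 - ratio)


tracker = ComplexityTracker()
tracker.update("p1", 100, correct=True)
print(sufficiency_reward(200, True, tracker.target.get("p1")))  # 0.5
print(sufficiency_reward(50, False, tracker.target.get("p1")))  # -1.25
```

Capping the overthinking penalty keeps every correct answer ranked above every incorrect one, so length pressure shapes the policy without ever outweighing correctness.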

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.