SuCo: Sufficiency-guided Continuous Adaptive Reasoning

2026-06-16 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Sufficiency-guided Continuous Adaptive Reasoning (SuCo) is a two-stage training framework that targets a known inefficiency of Large Reasoning Models (LRMs): they often produce overly long chains of thought (CoT), which inflates computational cost. SuCo introduces the Minimal Sufficient CoT (MSC), defined as the shortest CoT prefix that suffices for a correct answer. Empirically, reasoning with MSC uses fewer tokens and improves accuracy. The first stage, MSC-Aligned Fine-Tuning (MFT), constructs MSC data using problem-adaptive sufficiency thresholds and fine-tunes the model to internalize concise reasoning patterns. The second stage, Sufficiency-Aware Policy Optimization (SAPO), applies reinforcement learning with dynamic complexity tracking and rewards that penalize both over- and under-thinking. Experiments on mathematics, code, and science benchmarks show that SuCo consistently improves both accuracy and reasoning efficiency.

Key takeaway

Machine learning engineers optimizing Large Reasoning Models should consider sufficiency-guided training frameworks such as SuCo. By defining and targeting the Minimal Sufficient CoT, this approach offers a principled way to cut the computational cost of excessively long chains of thought while also improving accuracy. Adaptive reasoning control and sufficiency-aware rewards can help teams deploy LRMs that are both more efficient and more accurate across diverse tasks.

Key insights

Optimizing Chain-of-Thought length via sufficiency-guided adaptive reasoning improves LRM efficiency and accuracy.

Principles

Minimal Sufficient CoT (MSC) reduces tokens and improves accuracy.
Problem-adaptive sufficiency thresholds scale with question difficulty.
Penalizing both over- and under-thinking keeps reasoning length matched to what each problem needs.
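
The first two principles can be sketched as a prefix search. This is a minimal illustration under stated assumptions, not the procedure from the original article: `estimate_sufficiency` is a hypothetical callback that returns the estimated probability of a correct answer given a CoT prefix, and the linear mapping from difficulty to threshold (including the `low` and `high` defaults) is an assumed form, since the summary only says that thresholds scale with difficulty.

```python
from typing import Callable, List, Sequence


def adaptive_threshold(difficulty: float, low: float = 0.6, high: float = 0.9) -> float:
    """Map a difficulty score in [0, 1] to a sufficiency threshold.

    Assumption: harder problems get a higher threshold, linearly.
    """
    d = min(max(difficulty, 0.0), 1.0)  # clamp to [0, 1]
    return low + (high - low) * d


def minimal_sufficient_prefix(
    steps: Sequence[str],
    estimate_sufficiency: Callable[[Sequence[str]], float],
    difficulty: float,
) -> List[str]:
    """Return the shortest prefix of `steps` whose estimated sufficiency
    reaches the problem-adaptive threshold.

    `estimate_sufficiency` is hypothetical; in practice it might sample
    answers from the model conditioned on the prefix and measure accuracy.
    """
    threshold = adaptive_threshold(difficulty)
    for k in range(1, len(steps) + 1):
        prefix = list(steps[:k])
        if estimate_sufficiency(prefix) >= threshold:
            return prefix
    # No prefix met the threshold: fall back to the full chain of thought.
    return list(steps)
```

With a toy estimator that grows with prefix length, an easy problem stops earlier than a hard one, which is the behaviour the adaptive threshold is meant to produce.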

Method

A two-stage framework: MSC-Aligned Fine-Tuning (MFT) constructs MSC data and fine-tunes for concise patterns, followed by Sufficiency-Aware Policy Optimization (SAPO) using RL with dynamic complexity tracking and sufficiency-aware rewards.
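
A sufficiency-aware reward of the kind SAPO uses might look like the sketch below. The exact reward is not given in this summary, so everything here is an assumption: over-thinking is modelled as exceeding a target length, under-thinking as an incorrect answer that stopped short of the target, and `target_tokens`, `over_weight`, and `under_weight` are hypothetical parameters.

```python
def sufficiency_aware_reward(
    is_correct: bool,
    num_tokens: int,
    target_tokens: float,
    over_weight: float = 0.5,
    under_weight: float = 0.5,
) -> float:
    """Correctness reward minus penalties for over- and under-thinking.

    Assumed shape, not the published SAPO reward.
    """
    if target_tokens <= 0:
        raise ValueError("target_tokens must be positive")

    base = 1.0 if is_correct else 0.0

    # Over-thinking: relative excess over the target length, capped at 1.
    over = min(max((num_tokens - target_tokens) / target_tokens, 0.0), 1.0)

    # Under-thinking: relative shortfall, counted only when the answer is
    # wrong (a short correct answer is not penalized).
    under = 0.0
    if not is_correct:
        under = max((target_tokens - num_tokens) / target_tokens, 0.0)

    return base - over_weight * over - under_weight * under
```

Capping the over-thinking term keeps a single very long rollout from dominating the policy gradient; that cap is a design choice of this sketch.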

In practice

Construct MSC data using adaptive sufficiency thresholds.
Fine-tune models for concise reasoning patterns.
Apply RL with dynamic complexity tracking and sufficiency-aware rewards.
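
For the third step, one way to read "dynamic complexity tracking" is a per-problem running estimate of how many tokens a correct answer needs, updated as RL rollouts arrive. The summary does not specify the mechanism, so the sketch below is an assumption: the `ComplexityTracker` class, its exponential moving average, and the `momentum` parameter are all hypothetical.

```python
from typing import Dict, Optional, Sequence


class ComplexityTracker:
    """Track a running target length per problem (hypothetical helper)."""

    def __init__(self, momentum: float = 0.9) -> None:
        if not 0.0 <= momentum < 1.0:
            raise ValueError("momentum must be in [0, 1)")
        self.momentum = momentum
        self._targets: Dict[str, float] = {}

    def update(self, problem_id: str, correct_lengths: Sequence[int]) -> Optional[float]:
        """Fold the shortest correct rollout of this batch into the estimate.

        Returns the current target, or None if the problem has never been
        solved correctly.
        """
        if correct_lengths:
            observed = float(min(correct_lengths))
            previous = self._targets.get(problem_id)
            if previous is None:
                self._targets[problem_id] = observed
            else:
                self._targets[problem_id] = (
                    self.momentum * previous + (1.0 - self.momentum) * observed
                )
        return self._targets.get(problem_id)
```

The returned target could then serve as the reference length when scoring rollouts for over- or under-thinking.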

Topics

Large Reasoning Models
Chain-of-Thought Optimization
Sufficiency-guided Reasoning
Reinforcement Learning
Model Efficiency
Adaptive Reasoning

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.