ADaPT: Token-Level Decoupling for Efficient Large Reasoning Models

2026-06-18 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

ADaPT (Adaptive Dual-Process Thinking) is a token-level dual-process framework designed to make large reasoning models more efficient without sacrificing performance. It targets the high computational cost of long chain-of-thought reasoning. Existing approaches that shorten reasoning traces or mix fast and slow strategies often degrade reasoning capability in the process. ADaPT avoids this by explicitly decoupling efficiency and correctness signals during training and by introducing a "mode-selection token" that chooses between fast and slow reasoning. Efficiency-related rewards are applied only to this token, so correct long reasoning is never penalized, while shorter reasoning is still encouraged where it suffices. The framework also gives precise, continuous control over the efficiency-performance trade-off at inference time: by changing the generation probability of the mode-selection token, a single trained model can be moved along the Pareto frontier. Experiments show that ADaPT significantly reduces inference cost while maintaining strong reasoning performance across multiple benchmarks.

Key takeaway

For machine learning engineers optimizing large reasoning models, ADaPT offers a practical answer to the efficiency-performance dilemma. If long chain-of-thought reasoning is driving up your computational costs, consider implementing token-level decoupling with a mode-selection token. The approach lets you set the inference-time trade-off precisely, reducing cost significantly without degrading reasoning capability and making deployment more viable.

Key insights

ADaPT decouples the efficiency and correctness training signals of large reasoning models through a token-level dual-process framework.

Principles

Decouple efficiency and correctness signals.
Apply efficiency rewards to a dedicated token.
Enable continuous trade-off control at inference.
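To make the first two principles concrete, here is a minimal Python sketch of two separately computed signals. The `Rollout` record, the `bonus` value and the shaping rule (reward choosing FAST only when the answer is still correct, penalize FAST when it fails, treat SLOW as neutral) are hypothetical illustrations, not the published ADaPT reward. The point it shows is that the correctness signal never looks at trace length.

```python
from dataclasses import dataclass

FAST, SLOW = "fast", "slow"


@dataclass
class Rollout:
    mode: str         # mode emitted by the (hypothetical) mode-selection token
    is_correct: bool  # verifier outcome on the final answer
    num_tokens: int   # trace length; deliberately ignored by correctness_reward


def correctness_reward(r: Rollout) -> float:
    # Depends only on the answer: a correct 4000-token trace still scores 1.0.
    return 1.0 if r.is_correct else 0.0


def efficiency_reward(r: Rollout, bonus: float = 0.5) -> float:
    # Hypothetical shaping rule (an assumption, not the paper's formula):
    # reward FAST when it still succeeds, penalize FAST when it fails,
    # and leave SLOW neutral so long but correct reasoning is not punished.
    if r.mode == FAST:
        return bonus if r.is_correct else -bonus
    return 0.0
```

With these definitions a correct slow rollout gets `(1.0, 0.0)`, a correct fast rollout gets `(1.0, 0.5)`, and a failed fast rollout gets `(0.0, -0.5)`.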

Method

ADaPT introduces a mode-selection token that chooses between fast and slow reasoning. During training, efficiency rewards are applied exclusively to this token, so correct long reasoning is not penalized.
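The credit-assignment idea can be sketched as building a per-token learning signal for one sampled response. This is a hedged illustration under stated assumptions: the correctness signal is broadcast to every generated token (as in common outcome-reward policy-gradient setups), the efficiency signal is added only at the mode-selection position, and baselines or normalization are omitted. The function name and parameters are hypothetical.

```python
from typing import List


def per_token_signal(
    num_tokens: int,
    mode_token_index: int,
    correctness: float,
    efficiency: float,
) -> List[float]:
    """Spread two decoupled scalar signals over one response's token positions.

    Assumed scheme (illustrative, not ADaPT's exact recipe):
    - correctness is credited to every generated token;
    - efficiency is credited only to the mode-selection token.
    """
    if not 0 <= mode_token_index < num_tokens:
        raise ValueError("mode_token_index must point inside the response")
    signal = [correctness] * num_tokens
    # Only the mode-selection position sees the efficiency term, so the
    # reasoning tokens of a correct long trace receive no length penalty.
    signal[mode_token_index] += efficiency
    return signal
```

For a correct fast response of five tokens with the mode token first, `per_token_signal(5, 0, 1.0, 0.5)` gives `[1.5, 1.0, 1.0, 1.0, 1.0]`; a correct slow response with zero efficiency term gives `1.0` at every position.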

In practice

Implement a mode-selection token for efficiency control.
Adjust the mode-selection token's generation probability to set the inference-time trade-off.
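One way to realize the inference-time knob is logit biasing at the mode-selection position. The sketch below assumes, hypothetically, that the mode is sampled from just two candidate tokens (labelled FAST and SLOW here), so their softmax reduces to a sigmoid of the logit margin. The function names and the bias mechanism are illustrative assumptions, not ADaPT's published interface; in a real decoder the bias would be added only to those two vocabulary logits at that one step.

```python
import math
import random

FAST, SLOW = "fast", "slow"


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fast_probability(logit_fast: float, logit_slow: float, bias: float = 0.0) -> float:
    # A softmax over only the two mode tokens equals a sigmoid of their logit margin.
    return _sigmoid(logit_fast + bias - logit_slow)


def bias_for_target(logit_fast: float, logit_slow: float, target_p_fast: float) -> float:
    """Additive bias on the FAST logit that makes P(FAST) equal target_p_fast."""
    if not 0.0 < target_p_fast < 1.0:
        raise ValueError("target_p_fast must lie strictly between 0 and 1")
    target_margin = math.log(target_p_fast / (1.0 - target_p_fast))  # inverse sigmoid
    return target_margin - (logit_fast - logit_slow)


def sample_mode(logit_fast: float, logit_slow: float, target_p_fast: float,
                rng: random.Random) -> str:
    # Rescale the model's own preference to the requested probability, then sample.
    bias = bias_for_target(logit_fast, logit_slow, target_p_fast)
    return FAST if rng.random() < fast_probability(logit_fast, logit_slow, bias) else SLOW
```

For example, with logits 2.0 (fast) and -1.0 (slow) the unbiased P(FAST) is about 0.95, while repeated calls to `sample_mode(2.0, -1.0, 0.3, rng)` pick FAST roughly 30% of the time. Sweeping `target_p_fast` over values strictly between 0 and 1 is what would trace out a cost-accuracy curve for one trained model.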

Topics

Large Reasoning Models
Chain-of-Thought
Inference Efficiency
Token-Level Decoupling
Dual-Process Thinking
Computational Cost

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.