When to Think Deeply: Inhibitory Deliberation for LLM Reasoning

2026-06-08 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

The IDPR (Inhibitory Deliberative Problem Reasoning) framework aims to optimize Large Language Model (LLM) reasoning by invoking computationally expensive "slow reasoning" only selectively. Unlike traditional input-only routers, IDPR first generates a concise "fast answer" and then uses a response-conditioned inhibition controller to decide whether to release that answer or suppress it in favor of deeper deliberation. The controller bases its decision on the fast answer itself and on "fast-side evidence" such as confidence, logit margin, parseability, and generation cost. On a 5,000-example mathematical reasoning test set, IDPR invoked slow reasoning on only 8.20% of examples and improved accuracy from 47.90% to 48.92%. Under the same slow-call budget, this beat random routing (46.76%) by 2.16 percentage points and confidence-based baselines (48.22%) by 0.70 points, suggesting that the controller identifies the fast answers that benefit most from slow reasoning.
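To put the reported percentages in absolute terms, the short calculation below converts them to example counts. It is a back-of-the-envelope sketch only: it assumes that every percentage is measured over the full 5,000-example test set and that 47.90% is the fast-only accuracy. The variable names are illustrative, not taken from the paper.

```python
# Reported figures from the summary above.
TEST_SET_SIZE = 5000
SLOW_CALL_RATE_PCT = 8.20   # share of examples routed to slow reasoning
FAST_ONLY_ACC_PCT = 47.90   # assumed to be the fast-only accuracy
IDPR_ACC_PCT = 48.92        # accuracy with IDPR routing


def to_count(pct: float, total: int = TEST_SET_SIZE) -> int:
    """Convert a percentage of the test set into a whole number of examples."""
    return round(total * pct / 100)


slow_calls = to_count(SLOW_CALL_RATE_PCT)
fast_correct = to_count(FAST_ONLY_ACC_PCT)
idpr_correct = to_count(IDPR_ACC_PCT)

# Net gain: answers fixed by slow reasoning minus any it made worse.
net_gain = idpr_correct - fast_correct
gain_per_slow_call = net_gain / slow_calls

print(f"slow calls: {slow_calls}")
print(f"net additional correct answers: {net_gain}")
print(f"net gain per slow call: {gain_per_slow_call:.3f}")
```

Under these assumptions the gain is a net figure: it does not reveal how many fast answers were corrected versus how many were made worse by switching.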

Key takeaway

AI architects designing cost-aware LLM systems should consider implementing response-conditioned inhibitory deliberation. The approach can raise accuracy on complex reasoning tasks, such as mathematical problems, by invoking expensive slow reasoning only when a fast answer is predicted to be unreliable. Calibrate the inhibition threshold to balance accuracy gains against increased token costs, especially for harder problem types.

Key insights

LLMs can selectively invoke costly slow reasoning by inhibiting fast answers based on response-conditioned evidence.

Principles

Routing decisions should be response-conditioned.
Control should be recruited selectively.
Estimate slow-over-fast quality gain.

Method

IDPR first generates a fast answer. An inhibition controller then uses "fast-side evidence" (confidence, logit margin, parseability, generation cost) to compute a switch score. If the score exceeds a threshold, the fast answer is suppressed and the problem is handed to slow reasoning; otherwise the fast answer is released.
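The control flow described above can be sketched as follows. This is a minimal illustration, not the paper's controller: the logistic scoring function, the weights, the direction in which each feature pushes the score, and the names `FastEvidence`, `switch_score`, and `route` are all assumptions made for the example. In the paper the controller is presumably learned from data.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class FastEvidence:
    """Fast-side evidence collected while producing the fast answer."""
    confidence: float    # model confidence in the fast answer, in [0, 1]
    logit_margin: float  # gap between the top two candidate logits
    parseable: bool      # whether the fast answer parses as a valid result
    cost_tokens: int     # tokens spent generating the fast answer


# Hypothetical hand-set weights; a real controller would be trained.
BIAS = 2.0
W_CONFIDENCE = 4.0
W_MARGIN = 1.0
W_UNPARSEABLE = 2.0
W_COST_PER_1K_TOKENS = 0.5


def switch_score(ev: FastEvidence) -> float:
    """Return a score in (0, 1); higher means 'suppress the fast answer'."""
    z = (
        BIAS
        - W_CONFIDENCE * ev.confidence
        - W_MARGIN * ev.logit_margin
        + W_UNPARSEABLE * (0.0 if ev.parseable else 1.0)
        + W_COST_PER_1K_TOKENS * ev.cost_tokens / 1000.0  # assumed direction
    )
    return 1.0 / (1.0 + math.exp(-z))


def route(
    question: str,
    fast_fn: Callable[[str], Tuple[str, FastEvidence]],
    slow_fn: Callable[[str], str],
    threshold: float = 0.5,
) -> Tuple[str, bool]:
    """Release the fast answer, or inhibit it and call slow reasoning.

    Returns (answer, used_slow).
    """
    fast_answer, evidence = fast_fn(question)
    if switch_score(evidence) > threshold:
        return slow_fn(question), True
    return fast_answer, False
```

The key design point the sketch preserves is that the decision is made after the fast answer exists, so the controller can inspect the response and its evidence rather than the input alone.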

In practice

Use fast-side evidence for routing decisions.
Calibrate inhibition threshold for accuracy-cost trade-off.
Prioritize slow reasoning for harder problem subsets.
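One simple way to calibrate the threshold is to fix a slow-call budget and pick the cutoff that meets it on a held-out calibration set. The sketch below assumes switch scores have already been computed for that set; `pick_threshold` and `slow_call_rate` are hypothetical helper names, and this quantile-style rule is one possible calibration strategy rather than the procedure used in the paper.

```python
import math
from typing import Sequence


def pick_threshold(scores: Sequence[float], budget: float) -> float:
    """Choose a threshold so that at most `budget` of scores exceed it.

    `budget` is the target fraction of examples sent to slow reasoning,
    for example 0.082 for an 8.2% slow-call rate.
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be between 0 and 1")
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    k = math.floor(budget * n + 1e-9)  # number of slow calls allowed
    if k >= n:
        return float("-inf")  # everything may go to slow reasoning
    ranked = sorted(scores, reverse=True)
    # Scores strictly above the (k+1)-th largest value are routed to slow.
    return ranked[k]


def slow_call_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of examples whose score strictly exceeds the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


calibration_scores = [0.10, 0.90, 0.40, 0.80, 0.30, 0.70, 0.20, 0.60, 0.50, 0.05]
threshold = pick_threshold(calibration_scores, budget=0.2)
print(threshold, slow_call_rate(calibration_scores, threshold))
```

Tied scores at the cutoff are released rather than inhibited, so the realized rate can fall below the budget. Harder problem subsets could be given a larger budget by calibrating a separate threshold per subset.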

Topics

LLM Reasoning
Cost-Aware AI
Inhibitory Deliberation
Response-Conditioned Routing
Mathematical Reasoning
Cognitive Control

Code references

huggingface/open-r1

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.