Reasoning models struggle to control their chains of thought, and that’s good
Summary
OpenAI research published on March 5, 2026, reveals that current frontier reasoning models exhibit low "CoT controllability," meaning they struggle to deliberately shape or obscure their internal chains of thought (CoT) to evade monitoring. This finding is considered a positive for AI safety, as it enhances the reliability of CoT monitoring as a safeguard. The study introduces CoT-Control, an open-source evaluation suite comprising over 13,000 tasks from benchmarks like GPQA and MMLU-Pro, designed to measure a model's ability to follow instructions regarding its CoT properties. Evaluations across 13 reasoning models, including GPT-5.2 Thinking, showed CoT controllability scores ranging from 0.1% to 15.4%. While controllability slightly increases with model size and "monitor awareness," it decreases significantly with longer reasoning chains and additional post-training, such as RL training.
Key takeaway
For research scientists and AI safety engineers developing or deploying advanced reasoning models, the observed low CoT controllability is a reassuring signal for current monitoring effectiveness. You should continue to prioritize CoT monitoring as a robust safeguard, but also track CoT controllability as a "canary eval" in future system cards, starting with GPT-5.4 Thinking, to detect any emergent capabilities in obfuscating reasoning.
Key insights
Low CoT controllability in frontier models enhances the trustworthiness of AI safety monitoring systems.
Principles
- CoT controllability decreases with reasoning length.
- Post-training reduces CoT controllability.
- Monitor awareness slightly increases controllability.
Method
CoT-Control, an open-source evaluation suite, measures CoT controllability by assessing a model's adherence to user instructions affecting its internal reasoning steps, not just its visible output.
In practice
- Use CoT monitoring for AI safety assessments.
- Evaluate models with CoT-Control suite.
- Track CoT controllability in system cards.
Topics
- Chain-of-Thought
- AI Safety
- Reasoning Models
- CoT Monitoring
- CoT-Control Evaluation
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Researcher, AI Scientist, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by OpenAI News.