Measuring no CoT math time horizon (single forward pass)
Summary
An analysis of AI models' "no-CoT" (no chain-of-thought) math problem-solving ability finds that Opus 4.5 reaches a 50%-reliability time horizon of 3.5 minutes when it must answer immediately, in a single forward pass. This opaque reasoning ability, treated here as a proxy for misalignment risk, has been doubling every 9 months. The study used a dataset of 907 easy competition math problems, with human completion times estimated by Opus 4.5. Repeating the problem in the prompt significantly boosts performance: Gemini 3 Pro reaches a 3.8-minute time horizon, and Gemini 2.5 Pro reaches 2.7 minutes with 5 repeats. No-CoT performance lags behind "with-CoT" reasoning, whose time horizon has a shorter doubling time of 4-6 months on tasks such as software engineering (SWE).
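To make the doubling-time figure concrete, here is a minimal Python sketch of what a constant doubling time implies. The function name `extrapolate_horizon` is hypothetical, and the assumption that the 9-month trend simply continues is ours for illustration, not a forecast from the original article.

```python
def extrapolate_horizon(current_minutes: float,
                        doubling_months: float,
                        months_ahead: float) -> float:
    """Project a time horizon forward under a constant doubling time.

    Assumes pure exponential growth: the horizon doubles once every
    `doubling_months` months. This is an illustrative assumption only.
    """
    return current_minutes * 2 ** (months_ahead / doubling_months)


# Starting from the 3.5-minute no-CoT horizon with a 9-month doubling time.
for months in (0, 9, 18):
    print(months, extrapolate_horizon(3.5, 9, months))
```

Under these assumptions the horizon would be 7 minutes after 9 months and 14 minutes after 18 months.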
Key takeaway
For research scientists evaluating AI safety and capabilities: Opus 4.5's 3.5-minute no-CoT time horizon, doubling every 9 months, indicates concerning growth in opaque reasoning. Explicit reasoning is advancing faster, but the steady improvement in immediate, non-transparent problem-solving warrants close monitoring for potential misalignment risks, especially on serial tasks.
Key insights
Opaque reasoning in AI models, measured by no-CoT math performance, is improving rapidly but lags behind explicit (with-CoT) reasoning.
Principles
- Opaque reasoning ability is treated as a proxy for misalignment risk.
- Problem repetition enhances no-CoT math performance.
Method
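The method estimates 50%-reliability no-CoT time horizons, meaning the human completion time at which a model solves about half of the problems without chain-of-thought. It uses a dataset of easy math problems, with human completion times estimated by Opus 4.5, and repeats the problem in the prompt.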
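One common way to turn per-problem results into a 50% time horizon is to fit a logistic curve of success against the logarithm of human completion time and read off where it crosses 0.5. The sketch below does that in plain Python. It is an assumption about the general technique, not the original article's code; the function name `fit_time_horizon`, the gradient-ascent fit, and the toy data are all hypothetical.

```python
import math


def fit_time_horizon(human_minutes, solved, steps=5000, lr=0.2):
    """Fit P(solve) = sigmoid(a + b * log2(minutes)) by gradient ascent
    on the mean log-likelihood, then return the time (in minutes) at
    which the fitted success probability equals 0.5."""
    xs = [math.log2(m) for m in human_minutes]
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, solved):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p
            grad_b += (y - p) * x
        a += lr * grad_a / n
        b += lr * grad_b / n
    if b >= 0:
        raise ValueError("success does not decrease with task length")
    # sigmoid(a + b * x) = 0.5 exactly when a + b * x = 0, so x = -a / b.
    return 2 ** (-a / b)


# Toy data (illustrative only): estimated human minutes and whether the
# model answered correctly without chain-of-thought.
minutes = [1, 1, 2, 2, 4, 4, 8, 8]
solved = [1, 1, 1, 0, 1, 0, 0, 0]
print(round(fit_time_horizon(minutes, solved), 2))
```

Fitting on the log of time reflects that task lengths span orders of magnitude. The fixed learning rate is adequate for this small example but may need tuning, or a library optimizer, on real data.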
In practice
- Use problem repeats to boost no-CoT model performance.
- Evaluate AI on serial tasks to gauge opaque reasoning.
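As an illustration of the problem-repeat idea, the following Python sketch builds a prompt that states the same problem several times before asking for an immediate answer. The helper name `build_repeated_prompt`, the copy labels, and the instruction wording are hypothetical; the exact prompt format used in the original article is not given here.

```python
def build_repeated_prompt(problem: str, repeats: int = 5) -> str:
    """Return a prompt containing `repeats` copies of the problem,
    followed by an instruction to answer without showing reasoning.

    The layout and wording are illustrative assumptions only.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    copies = [
        f"Problem (copy {i + 1} of {repeats}):\n{problem}"
        for i in range(repeats)
    ]
    instruction = (
        "Respond with only the final answer. Do not show any reasoning."
    )
    return "\n\n".join(copies + [instruction])


print(build_repeated_prompt("What is 17 * 23?", repeats=2))
```

Comparing accuracy across several values of `repeats` on the same problem set would show how much repetition helps for a given model.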
Topics
- AI Capability Measurement
- Opaque Reasoning
- Large Language Models
- Math Problem Solving
- Prompt Engineering
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Redwood Research blog.