Measuring no CoT math time horizon (single forward pass)

2024-06-17 · Source: Redwood Research blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, short

Summary

An analysis of AI models' "no-CoT" (no chain-of-thought) math ability finds that Opus 4.5 has a 50%-reliability time horizon of 3.5 minutes when it must answer immediately, in a single forward pass. This opaque reasoning ability, treated as a proxy for misalignment risk, has been doubling every 9 months. The study uses a dataset of 907 easy competition math problems, with human completion times estimated by Opus 4.5. Repeating the problem in the prompt significantly boosts performance: Gemini 3 Pro reaches a 3.8-minute time horizon and Gemini 2.5 Pro reaches 2.7 minutes with 5 repeats. No-CoT performance lags behind "with-CoT" reasoning, whose time horizon doubles faster, every 4-6 months, on tasks such as software engineering (SWE).
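To make the doubling-time claim concrete, here is a minimal arithmetic sketch. It assumes the reported trend (3.5 minutes, doubling every 9 months) simply continues, which the source does not guarantee; the function names are hypothetical and chosen only for illustration.

```python
import math

def extrapolate_horizon(current_minutes, doubling_months, months_ahead):
    """Naive exponential extrapolation: horizon * 2^(elapsed / doubling time)."""
    return current_minutes * 2 ** (months_ahead / doubling_months)

def months_to_reach(current_minutes, doubling_months, target_minutes):
    """Months until the horizon reaches target_minutes under the same trend."""
    return doubling_months * math.log2(target_minutes / current_minutes)

# Two doublings (18 months) from a 3.5-minute horizon at a 9-month doubling time.
print(extrapolate_horizon(3.5, 9, 18))  # 14.0
```

This is an illustration of what "doubling every 9 months" implies, not a forecast from the original article.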

Key takeaway

Research scientists evaluating AI safety and capabilities should note that Opus 4.5's 3.5-minute no-CoT time horizon, doubling every 9 months, indicates a concerning growth in opaque reasoning. Although explicit (with-CoT) reasoning is advancing faster, the rapid improvement in immediate, non-transparent problem-solving warrants close monitoring for potential misalignment risks, especially on serial tasks.

Key insights

AI opaque reasoning, measured by no-CoT math performance, is rapidly improving but lags explicit reasoning.

Principles

Opaque reasoning ability correlates with misalignment risk.
Problem repetition enhances no-CoT math performance.

Method

The method estimates 50% reliability no-CoT time horizons using a dataset of easy math problems, with human completion times estimated by Opus 4.5, and incorporates problem repetition in prompts.
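One common way to turn per-problem results into a 50%-reliability time horizon is to fit a logistic curve of success against the logarithm of human completion time and read off where it crosses 0.5. The sketch below assumes that approach; this summary does not spell out the fitting procedure actually used, and the function name, the plain gradient-ascent fit, and the toy data are all illustrative assumptions rather than the author's code or results.

```python
import math

def fit_time_horizon(minutes, solved, steps=5000, lr=0.1):
    """Fit P(solved) = sigmoid(a + b * log2(minutes)) by gradient ascent on
    the mean log-likelihood, then return the time at which P = 0.5."""
    xs = [math.log2(m) for m in minutes]
    n = len(xs)
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, solved):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            grad_a += y - p          # derivative w.r.t. the intercept
            grad_b += (y - p) * x    # derivative w.r.t. the slope
        a += lr * grad_a / n
        b += lr * grad_b / n
    # P = 0.5 where a + b * log2(t) = 0, so t = 2^(-a / b).
    return 2.0 ** (-a / b)

# Made-up toy data: 20 attempts at each estimated human time (minutes),
# with the number solved falling as problems get longer.
toy_times = [0.5, 1, 2, 4, 8, 16]
toy_solved_counts = [19, 18, 14, 6, 2, 1]
minutes, solved = [], []
for t, k in zip(toy_times, toy_solved_counts):
    minutes += [t] * 20
    solved += [1] * k + [0] * (20 - k)

horizon = fit_time_horizon(minutes, solved)
```

Fitting against log time rather than raw time treats a jump from 1 to 2 minutes the same as a jump from 8 to 16, which matches how doubling times are reported.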

In practice

Use problem repeats to boost no-CoT model performance.
Evaluate AI on serial tasks to gauge opaque reasoning.
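The repetition tip can be sketched as a small prompt builder. The wording of the instruction, the "copy i of n" labels, and the helper name `build_no_cot_prompt` are hypothetical; the source only states that repeating the problem in the prompt helps, not how the prompt was phrased.

```python
def build_no_cot_prompt(problem: str, repeats: int = 5) -> str:
    """Repeat the problem text `repeats` times, then ask for an immediate answer."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    copies = "\n\n".join(
        f"Problem (copy {i + 1} of {repeats}):\n{problem}" for i in range(repeats)
    )
    # Hypothetical instruction text; the real evaluation prompt may differ.
    instruction = (
        "Answer immediately with only the final answer. "
        "Do not show any reasoning."
    )
    return f"{copies}\n\n{instruction}"

prompt = build_no_cot_prompt("What is 17 * 23?", repeats=5)
```

When comparing models, keeping the repeat count fixed across runs avoids confusing a prompt effect with a capability difference.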

Topics

AI Capability Measurement
Opaque Reasoning
Large Language Models
Math Problem Solving
Prompt Engineering

Code references

rgreenblatt/no_cot_math_public

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Redwood Research blog.