AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines

Source: AI Alignment Forum · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation) · Depth: Expert, extended

Summary

The author has significantly shortened their AI timelines, nearly doubling their probability of full AI R&D automation by EOY 2028 to just under 30%. The revision is driven by unexpectedly strong performance from models like Opus 4.5, Opus 4.6, and Codex 5.2, which outperformed expectations on benchmarks and demonstrated the ability to complete massive, easy-to-verify software engineering (SWE) tasks that would take humans months to years. The author now expects AIs to reach a 50%-reliability time horizon of years to decades on "easy-and-cheap-to-verify SWE tasks that don't require much ideation" (ESNI tasks) by EOY 2026. Three factors drive this expectation: observed superexponential progress on ESNI tasks, an anticipated substantial scale-up of training compute in 2026, and a larger-than-expected "scaffolding overhang", meaning that straightforward scaffolding can still yield significant improvements. Faster progress on these tasks is also expected to speed up AI R&D itself.

Key takeaway

If you are a research scientist evaluating AI capabilities and development timelines, you should account for the observed superexponential progress in AI's ability to handle easy-to-verify, iterative software engineering tasks. This shift implies that AI R&D automation may arrive significantly sooner than previously projected: the author now puts just under 30% probability on full AI R&D automation by EOY 2028. Your strategic planning should reflect this accelerating pace, especially in areas where tasks can be decomposed into pieces and verified cheaply.

Key insights

AI progress on easy-to-verify tasks is accelerating superexponentially, shortening timelines for AI R&D automation.
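
To make "superexponential" concrete: under plain exponential growth the time horizon doubles at a constant interval, while under one simple superexponential model each doubling takes a fixed fraction of the time the previous one took. The sketch below compares the two. The function name, the starting horizon, the doubling time, and the shrink factor are all hypothetical values chosen for illustration; none of them come from the original article.

```python
def horizon_after(months, start_hours, doubling_months, shrink=1.0):
    """Time horizon (in hours) after `months` of progress.

    Each successive doubling takes `shrink` times as long as the previous
    one. shrink=1.0 gives plain exponential growth; shrink<1.0 gives a
    simple superexponential model (an illustrative assumption).
    """
    if shrink < 1.0 and months >= doubling_months / (1.0 - shrink):
        # The doubling intervals form a convergent geometric series, so
        # this toy model diverges once that total time has elapsed.
        return float("inf")

    horizon = float(start_hours)
    elapsed = 0.0
    step = float(doubling_months)
    while elapsed + step <= months:
        elapsed += step
        horizon *= 2
        step *= shrink
    # Count the completed fraction of the doubling currently in progress.
    horizon *= 2 ** ((months - elapsed) / step)
    return horizon


# Hypothetical inputs: an 8-hour horizon today, 4-month initial doubling time.
exponential = horizon_after(12, start_hours=8, doubling_months=4)
superexponential = horizon_after(12, start_hours=8, doubling_months=4, shrink=0.8)
```

With these made-up inputs the exponential curve completes three doublings in a year, while the superexponential curve completes four and is partway through a fifth. The gap widens quickly at longer projection windows, which is why the distinction matters for timeline estimates.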

Method

AIs can develop test suites and iteratively optimize solutions against them, enabling progress on well-specified, easy-to-verify tasks even with initial errors or poor judgment.
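
The loop described above can be sketched as follows. This is a minimal illustration, not the article's actual setup: the toy task (integer square root), the test suite, and the three candidate functions standing in for successive model attempts are all hypothetical.

```python
import math


def run_suite(candidate, suite):
    """Return how many (input, expected) cases the candidate passes."""
    passed = 0
    for arg, expected in suite:
        try:
            if candidate(arg) == expected:
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails that case
    return passed


def iterate_against_suite(candidates, suite):
    """Score each attempt in turn, keep the best, stop once all tests pass."""
    best, best_score, history = None, -1, []
    for candidate in candidates:
        score = run_suite(candidate, suite)
        history.append(score)
        if score > best_score:
            best, best_score = candidate, score
        if best_score == len(suite):
            break
    return best, best_score, history


# Hypothetical task: integer square root, specified only by tests.
suite = [(0, 0), (1, 1), (4, 2), (15, 3), (16, 4), (10**6, 1000)]


def attempt_1(n):
    return n // 2  # poor first guess


def attempt_2(n):
    r = 0
    while r * r < n:  # bug: rounds up for non-perfect squares
        r += 1
    return r


def attempt_3(n):
    return math.isqrt(n)  # correct


best, best_score, history = iterate_against_suite(
    [attempt_1, attempt_2, attempt_3], suite
)
```

Here `history` records the pass count of each attempt, so early mistakes show up as low scores rather than as silent failures. The point of the design is that the test suite, not the quality of any single attempt, decides what is kept.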

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.