AIs can now often do massive easy-to-verify SWE tasks and I've updated towards shorter timelines
Summary
The author has significantly updated their AI timelines, nearly doubling their probability of full AI R&D automation by EOY 2028 to just under 30%. This revision is driven by unexpectedly strong performance from models like Opus 4.5, Opus 4.6, and Codex 5.2, which beat expectations on benchmarks and demonstrated the ability to complete massive, easy-to-verify software engineering (SWE) tasks that would take humans months to years. The author now expects AIs to reach a 50%-reliability time horizon of years to decades on these "Easy-and-cheap-to-verify SWE tasks that don't require much ideation" (ESNI tasks) by EOY 2026. Key factors include observed superexponential progress on ESNI tasks, an anticipated substantial scale-up of training compute in 2026, and a larger-than-expected "scaffolding overhang", meaning that straightforward scaffolding yields significant improvements. This faster progress is also expected to speed up AI R&D itself.
Key takeaway
If you are a research scientist evaluating AI capabilities and future development timelines, account for the observed superexponential progress in AI's ability to handle easy-to-verify, iterative software engineering tasks. This shift implies that AI R&D automation may arrive significantly sooner than previously projected, with a probability of just under 30% for full AI R&D automation by EOY 2028. Your strategic planning should consider the accelerating pace of AI progress, especially in areas where tasks can be decomposed into pieces that are easy to verify.
Key insights
AI progress on easy-to-verify tasks appears to be superexponential, which shortens timelines for AI R&D automation.
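To make "superexponential" concrete, here is a minimal Python sketch that compares a time horizon with a fixed doubling time (exponential) against one whose doubling time shrinks after every doubling (superexponential). The function name `horizon_after`, the `shrink` parameter, and all numbers are illustrative assumptions and are not taken from the original article.

```python
def horizon_after(months: float, h0_hours: float, doubling_months: float,
                  shrink: float = 1.0) -> float:
    """Time horizon (in hours) after `months` of progress.

    shrink == 1.0: every doubling takes `doubling_months` (plain exponential).
    shrink <  1.0: each doubling takes `shrink` times as long as the previous
                   one (a simple, assumed model of superexponential growth).
    """
    if shrink < 1.0:
        # Doubling times form a geometric series, so this toy model
        # diverges at a finite time; report infinity at or past that point.
        blowup = doubling_months / (1.0 - shrink)
        if months >= blowup:
            return float("inf")

    horizon = h0_hours
    elapsed = 0.0
    d = doubling_months
    # Apply every full doubling that fits into the elapsed time.
    while elapsed + d <= months:
        elapsed += d
        horizon *= 2.0
        d *= shrink
    # Apply the remaining fraction of the next doubling.
    horizon *= 2.0 ** ((months - elapsed) / d)
    return horizon


# Illustrative comparison: start at a 1-hour horizon, 6-month first doubling.
exponential = horizon_after(9, 1.0, 6.0)            # fixed doubling time
superexponential = horizon_after(9, 1.0, 6.0, 0.5)  # doubling time halves
```

Under these assumed parameters the shrinking-doubling-time curve pulls ahead of the fixed one within months, which is the qualitative pattern the insight describes. The finite-time divergence is an artifact of the toy model, not a forecast.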
Principles
- Iterative optimization drives AI performance on verifiable tasks.
- Scaffolding significantly enhances AI utility for complex tasks.
- AI R&D accelerates as AI becomes more useful for R&D.
Method
AIs can develop test suites and iteratively optimize solutions against them, enabling progress on well-specified, easy-to-verify tasks even with initial errors or poor judgment.
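The loop described above can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the workflow from the original article: the sorting task, the fixed list of candidate functions standing in for successive model attempts, and the names `build_test_suite`, `pass_rate`, and `optimize` are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

TestCase = Tuple[List[int], List[int]]  # (input, expected output)
Candidate = Callable[[List[int]], List[int]]


def build_test_suite(n_random: int, seed: int = 0) -> List[TestCase]:
    """Build a cheap-to-run test suite for a (hypothetical) sorting task."""
    rng = random.Random(seed)
    # One hand-written case with duplicates, plus random cases.
    suite: List[TestCase] = [([3, 1, 2, 1], [1, 1, 2, 3])]
    for _ in range(n_random):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 8))]
        suite.append((xs, sorted(xs)))
    return suite


def pass_rate(candidate: Candidate, suite: List[TestCase]) -> float:
    """Fraction of test cases the candidate passes; crashes count as failures."""
    passed = 0
    for inp, expected in suite:
        try:
            if candidate(list(inp)) == expected:
                passed += 1
        except Exception:
            pass
    return passed / len(suite)


# Stand-ins for successive attempts, including flawed early ones.
def attempt_identity(xs: List[int]) -> List[int]:
    return xs  # wrong: does not sort

def attempt_reverse(xs: List[int]) -> List[int]:
    return xs[::-1]  # wrong: only reverses

def attempt_dedup_sort(xs: List[int]) -> List[int]:
    return sorted(set(xs))  # wrong: drops duplicates

def attempt_sort(xs: List[int]) -> List[int]:
    return sorted(xs)  # correct


def optimize(candidates: List[Candidate], suite: List[TestCase]):
    """Score each attempt against the suite, keep the best, stop at 100%."""
    best, best_score = None, -1.0
    history = []
    for cand in candidates:
        score = pass_rate(cand, suite)
        history.append((cand.__name__, score))
        if score > best_score:
            best, best_score = cand, score
        if best_score == 1.0:
            break
    return best, best_score, history


suite = build_test_suite(20)
best, score, history = optimize(
    [attempt_identity, attempt_reverse, attempt_dedup_sort, attempt_sort], suite
)
```

The point of the sketch is that early attempts can be wrong without derailing the process: because verification is cheap and automatic, each flawed attempt is simply scored and superseded.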
In practice
- Decompose large tasks into easy-to-verify components for AI.
- Implement robust scaffolding to guide AI through complex projects.
- Provide frequent, brief human feedback to correct AI prioritization.
Topics
- AI Timelines
- AI R&D Automation
- Easy-to-Verify SWE Tasks
- AI Model Performance
- Scaffolding & Orchestration
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.