Evidence on AI R&D Progress from NanoGPT
Summary
The NanoGPT speedrun is a public challenge to train a GPT-2-small (124M-parameter) language model on FineWeb as quickly as possible on 8xH100 GPUs. An analysis of its record history shows substantial cumulative progress: from May 2024 to March 2026, 36 human contributors set 77 records and cut training time from 45 minutes to 1.43 minutes, a 31x speedup. The analysis classifies each contribution by optimization depth and provenance, and finds that shallow and moderate improvements drove approximately 21x of the total speedup. Early records mostly imported or adapted existing techniques, while later records increasingly featured newly invented ideas, which account for 33% of contributions between January 2025 and March 2026. Four recent records, set between late 2025 and early 2026, are credited to AI agents such as Hiverge and Station, though their contributions appear relatively shallow. The study also highlights challenges in interpreting this kind of evidence, including data contamination and scale-dependence.
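The headline figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in the summary above. It assumes that the speedups of individual records compose multiplicatively, which the summary does not state, so the "remaining" factor is illustrative only. The function names are hypothetical.

```python
def total_speedup(start_minutes: float, end_minutes: float) -> float:
    """Ratio of the first record's training time to the latest one."""
    return start_minutes / end_minutes


def remaining_factor(total: float, attributed: float) -> float:
    """Speedup left over after removing an attributed factor.

    Assumption: speedups multiply, so total = attributed * remaining.
    """
    return total / attributed


# Figures quoted in the summary: 45 minutes down to 1.43 minutes.
overall = total_speedup(45.0, 1.43)

# Roughly 21x is attributed to shallow and moderate improvements.
other = remaining_factor(overall, 21.0)

print(f"overall speedup: {overall:.1f}x")
print(f"factor not attributed to shallow/moderate records: {other:.2f}x")
```

Under that multiplicative assumption, the deeper contributions would account for a factor of about 1.5x, which is consistent with the summary's point that most of the gain came from shallow and moderate work.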
Key takeaway
Machine learning engineers optimizing LLM pretraining should recognize that shallow and moderate improvements can add up to substantial speedups, as the roughly 21x gain in the NanoGPT speedrun shows. AI agents are beginning to contribute, but so far their impact appears limited to shallower optimizations. Consider public challenges like the NanoGPT speedrun as benchmarks for evaluating agent performance and for identifying where human ingenuity still drives breakthrough innovations.
Key insights
Tracking cumulative progress on a public AI R&D challenge shows how human and agent contributions compare and how the nature of those contributions changes over time.
Principles
- Shallow and moderate optimizations account for most of the speedup.
- Newly invented ideas become more common in later records.
- AI agent contributions are currently shallow.
Method
The study classified 77 NanoGPT speedrun records by "Optimization depth" (Breakthrough, Deep, Moderate, Shallow) and "Provenance" (Invented, Adapted, Imported) using Claude Code to analyze PR diffs and descriptions.
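A minimal sketch of how such a two-axis classification could be tallied is shown below. The category names come from the method description above; the `Record` type, the `provenance_share` helper, and the three example records are hypothetical and are not data from the study.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Depth(Enum):
    BREAKTHROUGH = "Breakthrough"
    DEEP = "Deep"
    MODERATE = "Moderate"
    SHALLOW = "Shallow"


class Provenance(Enum):
    INVENTED = "Invented"
    ADAPTED = "Adapted"
    IMPORTED = "Imported"


@dataclass(frozen=True)
class Record:
    record_id: int  # hypothetical identifier, not a real record number
    depth: Depth
    provenance: Provenance


def provenance_share(records: list[Record], provenance: Provenance) -> float:
    """Fraction of records carrying the given provenance label."""
    if not records:
        return 0.0
    counts = Counter(r.provenance for r in records)
    return counts[provenance] / len(records)


# Made-up records for illustration only.
example_records = [
    Record(1, Depth.SHALLOW, Provenance.IMPORTED),
    Record(2, Depth.MODERATE, Provenance.ADAPTED),
    Record(3, Depth.DEEP, Provenance.INVENTED),
]

invented_share = provenance_share(example_records, Provenance.INVENTED)
print(f"invented share: {invented_share:.0%}")
```

Restricting `records` to a date window before calling `provenance_share` would give a per-period figure comparable in form to the 33% share of invented contributions reported for January 2025 to March 2026.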
In practice
- Use public challenges for agent benchmarking.
- Compare agent progress to human effort.
- Identify gaps in agent scaffolding or tooling.
Topics
- NanoGPT Speedrun
- AI R&D Progress
- LLM Pretraining
- AI Agent Contributions
- Optimization Benchmarking
- Machine Learning Efficiency
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by METR.