Andrej Karpathy Open-Sources ‘Autoresearch’: A 630-Line Python Tool Letting AI Agents Run Autonomous ML Experiments on Single GPUs

2026-03-09 · Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, quick

Summary

Andrej Karpathy has open-sourced "autoresearch," a minimalist Python framework comprising approximately 630 lines of code, designed to enable AI agents to function as autonomous machine learning researchers. This tool adapts the nanochat core for single-GPU operation, allowing agents to conduct iterative training code sprints, each lasting about five minutes. It only commits improvements that result in lower validation bits-per-byte (BPB) scores. Shopify CEO Tobi Lutke reportedly used this loop to achieve a 19% boost in model performance, demonstrating that smaller, agent-optimized models can surpass larger ones through continuous refinement of hyperparameters and architecture. The framework effectively automates the "grad student descent" process, shifting the engineer's focus from manual tuning to crafting optimal research prompts.

Key takeaway

For ML engineers seeking to optimize model performance and efficiency, "autoresearch" offers a novel approach to automate iterative experimentation. You can leverage this 630-line Python tool to offload hyperparameter and architecture tuning to AI agents, freeing your time from manual adjustments. Consider designing precise research prompts to guide the agents, potentially achieving significant performance boosts on single GPUs, as demonstrated by a 19% improvement in one case.

Key insights

Autoresearch enables AI agents to autonomously optimize ML models on single GPUs through iterative, metric-driven refinement.

Principles

Iterative refinement drives performance gains.
Smaller models can outperform larger ones with optimization.

Method

AI agents run five-minute training sprints, committing code changes only if validation BPB scores improve, automating hyperparameter and architecture tuning.

In practice

Automate ML model optimization.
Refine hyperparameters with AI agents.
Boost model performance by 19%.

Topics

Autoresearch
Autonomous ML Experiments
AI Agents
Hyperparameter Optimization
Single-GPU Training

Code references

karpathy/autoresearch

Best for: NLP Engineer, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.