Multi-Agent AutoResearch with Open Source Models
Summary
This article describes a multi-agent setup for AutoResearch, an optimization project by Andrej Karpathy in which a code agent improves the efficiency of a nanoGPT training script. The original AutoResearch used a single agent (Claude Code with Opus 4.6) and reached its lowest bits-per-byte score after approximately 600 experiments. The new implementation uses open-source models and the OpenCode harness, and distributes the work among specialized agents: a researcher agent finds papers on Hugging Face Papers and proposes hypotheses, a planner agent maintains an experiment queue, worker agents define and execute training scripts, and a reporter agent collects results. The repository contains the original training script, result files (research_live_master.json, research_results.tsv), and OpenCode-formatted skill definitions for each sub-agent. The system uses Hugging Face Hub for shared caching and Trackio for tracking metrics such as active jobs, anomalies, and "best delta versus master" to monitor performance improvements.
Key takeaway
For AI engineers optimizing LLM training, splitting an AutoResearch-style loop across specialized agents (researcher, planner, worker, reporter) can make the iterative optimization process easier to manage, lets open-source models do the work, and gives a clearer view of experiment performance through a tracker such as Trackio. Consider this modular approach for your own training-script optimization workflows.
Key insights
Splitting the work across agents with distinct roles makes a complex optimization task easier to execute and makes it practical to use open-source models.
Principles
- Decompose complex tasks into specialized agent roles.
- Use a shared cache (here, Hugging Face Hub) so agents can reuse common assets.
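The shared-cache principle can be illustrated with a minimal sketch. The source does not say how the cache is keyed or stored, so everything here is an assumption: `cache_key`, `SharedCache`, and `get_or_run` are hypothetical names, and an in-memory dictionary stands in for whatever shared storage the agents actually use.

```python
import hashlib
import json
from typing import Callable, Dict, Tuple


def cache_key(config: dict) -> str:
    """Return a stable key for an experiment config (hypothetical scheme)."""
    # sort_keys makes the key independent of dict insertion order.
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class SharedCache:
    """In-memory stand-in for a cache that every agent can read and write."""

    def __init__(self) -> None:
        self._store: Dict[str, dict] = {}

    def get_or_run(
        self, config: dict, run: Callable[[dict], dict]
    ) -> Tuple[dict, bool]:
        """Return (result, was_cached); call run(config) only on a miss."""
        key = cache_key(config)
        if key in self._store:
            return self._store[key], True
        result = run(config)
        self._store[key] = result
        return result, False
```

With this shape, two workers that receive the same configuration trigger only one run; the second call returns the stored result.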
Method
Implement AutoResearch as a multi-agent architecture with researcher, planner, worker, and reporter agents. Use OpenCode for agent orchestration, Hugging Face Hub for shared assets and job execution, and Trackio for metric tracking.
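The division of labor between the four roles can be sketched in plain Python. This is an illustration of the control flow only, not the OpenCode skill definitions from the repository: the function and class names are hypothetical, and `train` is a placeholder for whatever a worker runs to produce a bits-per-byte score.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class Experiment:
    hypothesis: str
    bpb: Optional[float] = None  # bits per byte, filled in by a worker


def researcher(ideas: List[str]) -> List[Experiment]:
    """Turn proposed hypotheses into experiment records."""
    return [Experiment(hypothesis=idea) for idea in ideas]


class Planner:
    """Maintains the experiment queue in first-in, first-out order."""

    def __init__(self) -> None:
        self.queue: Deque[Experiment] = deque()

    def enqueue(self, experiments: List[Experiment]) -> None:
        self.queue.extend(experiments)

    def next(self) -> Optional[Experiment]:
        return self.queue.popleft() if self.queue else None


def worker(exp: Experiment, train: Callable[[str], float]) -> Experiment:
    """Run one experiment and record its score."""
    exp.bpb = train(exp.hypothesis)
    return exp


def reporter(done: List[Experiment]) -> Optional[Experiment]:
    """Return the finished experiment with the lowest bits per byte."""
    finished = [e for e in done if e.bpb is not None]
    return min(finished, key=lambda e: e.bpb) if finished else None


def run_pipeline(
    ideas: List[str], train: Callable[[str], float]
) -> Optional[Experiment]:
    planner = Planner()
    planner.enqueue(researcher(ideas))
    done: List[Experiment] = []
    while (exp := planner.next()) is not None:
        done.append(worker(exp, train))
    return reporter(done)
```

In the described system each role is a separate agent and workers can run as parallel jobs; the sequential loop here only shows how work passes from one role to the next.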
In practice
- Define distinct agent roles for research, planning, execution, and reporting.
- Use OpenCode to manage multi-agent workflows.
- Integrate Trackio to track experiment metrics such as active jobs, anomalies, and best delta versus master.
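A metric like "best delta versus master" can be computed from a tab-separated results file along these lines. The column names `experiment` and `bpb` are assumptions, since the source does not document the layout of research_results.tsv, and the sample rows are made-up values for illustration, not reported results. Because lower bits per byte is better, the best delta is the most negative one.

```python
import csv
import io
from typing import Optional, Tuple


def best_delta_vs_master(
    tsv_text: str, master_bpb: float
) -> Tuple[Optional[str], Optional[float]]:
    """Return (experiment, delta) for the run with the lowest bits per byte.

    delta = run bpb - master bpb, so a negative delta is an improvement.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    best_name: Optional[str] = None
    best_delta: Optional[float] = None
    for row in reader:
        delta = float(row["bpb"]) - master_bpb
        if best_delta is None or delta < best_delta:
            best_name, best_delta = row["experiment"], delta
    return best_name, best_delta


# Illustrative rows only; hypothetical column names and values.
SAMPLE = (
    "experiment\tbpb\n"
    "baseline\t1.000\n"
    "warmup_tweak\t0.990\n"
    "bad_idea\t1.050\n"
)

if __name__ == "__main__":
    name, delta = best_delta_vs_master(SAMPLE, master_bpb=1.000)
    print(name, delta)
```

The resulting value could then be logged to the experiment tracker alongside counts of active jobs and anomalies.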
Topics
- Multi-Agent Systems
- AutoResearch
- Open-Source Models
- OpenCode
- Machine Learning Optimization
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by HuggingFace.