Multi-Agent AutoResearch with Open Source Models

Source: HuggingFace · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering) · Depth: Intermediate, medium

Summary

This content describes a multi-agent setup for AutoResearch, an optimization project by Andrej Karpathy in which a code agent improves the efficiency of a nanoGPT training script. The original AutoResearch ran a single agent (Claude Code with Opus 4.6) and reached its lowest bits-per-byte score after approximately 600 experiments. The new implementation uses open-source models and the OpenCode harness to distribute the work among specialized agents: a researcher agent finds papers through HF papers and proposes hypotheses, a planner agent maintains the experiment queue, worker agents define and execute training scripts, and a reporter agent collects results. The repository contains the original training script, result files (research_live_master.json, research_results.tsv), and OpenCode-formatted skill definitions for each sub-agent. The system uses Hugging Face Hub for shared caching and Trackio for tracking metrics such as active jobs, anomalies, and "best delta versus master" to monitor performance improvements.
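To make the "best delta versus master" metric concrete, here is a minimal Python sketch of how such a value could be derived from a results table. The column names (`experiment`, `val_bpb`, `status`), the sample rows, and the function name are assumptions made for illustration; the source does not describe the actual schema of research_results.tsv or how the tracked metric is computed.

```python
import csv
import io

# Hypothetical rows: the real research_results.tsv schema is not given in the source.
SAMPLE_TSV = (
    "experiment\tval_bpb\tstatus\n"
    "master\t1.000\tok\n"
    "exp-a\t0.990\tok\n"
    "exp-b\t1.020\tok\n"
    "exp-c\t\tcrashed\n"
)


def best_delta_vs_master(tsv_text, baseline_name="master"):
    """Return (experiment, delta) for the best finished experiment.

    delta = experiment_bpb - master_bpb, so a negative delta means the
    experiment beat the baseline (lower bits-per-byte is better).
    Returns None when no finished experiment other than the baseline exists.
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep only runs that finished and reported a score.
    scores = {
        row["experiment"]: float(row["val_bpb"])
        for row in rows
        if row["status"] == "ok" and row["val_bpb"]
    }
    master_bpb = scores.pop(baseline_name)
    if not scores:
        return None
    best = min(scores, key=scores.get)
    return best, scores[best] - master_bpb


if __name__ == "__main__":
    print(best_delta_vs_master(SAMPLE_TSV))
```

With the sample rows, the best experiment is `exp-a`, with a delta of about -0.01 bits per byte relative to master; the crashed run is ignored.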

Key takeaway

For AI engineers optimizing LLM training, splitting AutoResearch across specialized agents (researcher, planner, worker, reporter) can make the iterative optimization loop more efficient and easier to manage. The division of roles lets open-source models do the work, and tools like Trackio give clearer insight into how each experiment performs. Consider adopting this modular approach for your own training-script optimization workflows.

Key insights

Dividing a complex task among agents with distinct roles makes it easier to execute and makes it practical to use open-source models.

Principles

Method

Implement AutoResearch with a multi-agent architecture of researcher, planner, worker, and reporter agents. Use OpenCode for agent orchestration, Hugging Face Hub for shared assets and job execution, and Trackio for metric tracking.
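The division of roles can be sketched in plain Python. This is a simplified, sequential, in-process illustration, not the OpenCode skill definitions or the Hugging Face Hub job setup from the original article: every class and function name here is hypothetical, and the `train` callable stands in for running a modified training script and returning its bits-per-byte score.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class Experiment:
    hypothesis: str
    bpb: Optional[float] = None  # filled in by a worker once the run finishes


class Planner:
    """Maintains the experiment queue (hypothetical stand-in for the planner agent)."""

    def __init__(self) -> None:
        self.queue: Deque[Experiment] = deque()

    def enqueue(self, hypotheses: List[str]) -> None:
        for hypothesis in hypotheses:
            self.queue.append(Experiment(hypothesis))

    def next(self) -> Optional[Experiment]:
        return self.queue.popleft() if self.queue else None


def researcher() -> List[str]:
    # Stand-in for an agent that reads papers and proposes hypotheses.
    return ["lower learning rate", "larger batch size"]


def worker(exp: Experiment, train: Callable[[str], float]) -> Experiment:
    # Stand-in for an agent that edits and runs the training script.
    exp.bpb = train(exp.hypothesis)
    return exp


def reporter(done: List[Experiment]) -> List[Experiment]:
    # Collects finished experiments, best (lowest bits-per-byte) first.
    return sorted(done, key=lambda e: e.bpb)


def run_round(train: Callable[[str], float]) -> List[Experiment]:
    planner = Planner()
    planner.enqueue(researcher())
    done = []
    while (exp := planner.next()) is not None:
        done.append(worker(exp, train))
    return reporter(done)


if __name__ == "__main__":
    fake_scores = {"lower learning rate": 0.98, "larger batch size": 1.01}
    for exp in run_round(fake_scores.__getitem__):
        print(exp.hypothesis, exp.bpb)
```

Keeping each role behind its own small interface is the point of the sketch: the queue, the execution step, and the reporting step can each be swapped for a real agent or remote job without touching the others.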

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by HuggingFace.