Production Sub-agents for LLM Post Training
Summary
Pinterest's growth AI applications team cut machine learning model post-training time from 4-6 weeks to approximately one week by integrating Claude Code and a sub-agent architecture. The traditional training process was linear and highly manual: data definition, model selection, hyperparameter tuning, and extensive evaluation loops ran one after another. The new workflow parallelizes the data generation and training parameter tasks, using Claude Code to inject SQL within a sub-agent structure. The team also explored agent swarm architectures, but these bottlenecked on exponential context window expansion and rigid orchestration, leading to "hot celebrity" problems in which a single agent becomes overwhelmed. The team found sub-agents more effective for post-training and noted that MiniMax 2.5 offers dynamic scaling capabilities at a fraction of Claude Opus's cost. Common production failures such as spec drift, data distribution bias, memory collapse, and tool misuse are each addressed through specific fixes.
Key takeaway
For MLOps engineers optimizing model post-training, adopting a sub-agent architecture with tools like Claude Code can reduce training cycles from 4-6 weeks to about one week. Reinforce agent orchestration with an Agent SDK to gate outputs, use a structured `skills.md` to give agents precise instructions, and customize agent memory with pruning logic to combat issues like spec drift and memory collapse. Consider MiniMax 2.5 for cost-effective dynamic scaling if Claude Opus is too expensive.
Key insights
Sub-agent architectures with Claude Code can drastically reduce ML post-training time by parallelizing tasks and mitigating context limitations.
Principles
- Parallelize ML training data generation.
- Sub-agents avoid swarm mode context limits.
- Structured instructions improve agent alignment.
Method
The method involves breaking down model training into parallelized tasks using Claude Code for data generation and parameter tuning, orchestrated via a sub-agent structure, and reinforced with an Agent SDK for gated decision-making.
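The fan-out-then-gate shape of this method can be sketched in a few lines of Python. This is a minimal illustration only, not the team's implementation and not the Anthropic Agent SDK: the two sub-agent functions, the `SubAgentResult` type, the `gate` check, and the spec fields (`num_rows`, `base_lr`) are all hypothetical stand-ins, and the "agents" are plain functions rather than model calls.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SubAgentResult:
    task: str
    payload: dict
    ok: bool  # result of the sub-agent's own output check


def data_generation_agent(spec: dict) -> SubAgentResult:
    # Hypothetical stand-in for a sub-agent that builds the training set.
    rows = [{"id": i, "label": i % 2} for i in range(spec["num_rows"])]
    return SubAgentResult("data_generation", {"rows": rows}, ok=len(rows) > 0)


def training_params_agent(spec: dict) -> SubAgentResult:
    # Hypothetical stand-in for a sub-agent that proposes training parameters.
    params = {"learning_rate": spec["base_lr"], "epochs": 3}
    return SubAgentResult("training_params", params, ok=params["learning_rate"] > 0)


def gate(results: list[SubAgentResult]) -> bool:
    # Gated decision: proceed only if every sub-agent output passed its check.
    return all(r.ok for r in results)


def orchestrate(spec: dict) -> dict:
    agents = [data_generation_agent, training_params_agent]
    # Each sub-agent receives only the spec, not the other agents' context.
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        results = list(pool.map(lambda agent: agent(spec), agents))
    if not gate(results):
        failed = [r.task for r in results if not r.ok]
        return {"status": "blocked", "failed": failed}
    return {"status": "ready", "outputs": {r.task: r.payload for r in results}}
```

Calling `orchestrate({"num_rows": 4, "base_lr": 0.001})` runs both stand-in tasks in parallel and returns a `"ready"` status, while a spec with `num_rows` set to 0 is stopped at the gate. Keeping the gate in the orchestrator, rather than in each sub-agent, means a bad output is caught before any downstream training step consumes it.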
In practice
- Use Anthropic's Agent SDK for orchestration.
- Implement structured `skills.md` for agents.
- Customize agent memory with pruning logic.
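As one way to picture the memory pruning item above, the sketch below keeps an agent's memory under a fixed budget by dropping the oldest entries while never dropping pinned ones such as the task spec. It is an assumption-laden illustration, not the pruning logic described in the original article: `AgentMemory`, `MemoryEntry`, the `budget` parameter, and the word-count cost estimate are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    text: str
    pinned: bool = False  # pinned entries (e.g. the task spec) are never pruned

    @property
    def cost(self) -> int:
        # Crude size estimate: whitespace-separated words, not real tokens.
        return len(self.text.split())


@dataclass
class AgentMemory:
    budget: int  # assumed maximum total cost before pruning kicks in
    entries: list[MemoryEntry] = field(default_factory=list)

    def total_cost(self) -> int:
        return sum(e.cost for e in self.entries)

    def add(self, text: str, pinned: bool = False) -> None:
        self.entries.append(MemoryEntry(text, pinned))
        self.prune()

    def prune(self) -> None:
        # Drop the oldest unpinned entries until the memory fits the budget.
        while self.total_cost() > self.budget:
            for i, entry in enumerate(self.entries):
                if not entry.pinned:
                    del self.entries[i]
                    break
            else:
                # Only pinned entries remain; nothing more can be pruned.
                return
```

Pinning the spec is aimed at spec drift, since the original instructions stay in memory however long the run gets, and the budget is aimed at memory collapse, since stale intermediate notes are the first thing removed.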
Topics
- LLM Post Training
- Sub-agents
- Claude Code
- Agent Orchestration
- Memory Management
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by MLOps.community.