Training Agents: Live tutorial on how to fine-tune a coding agent for continual learning
Summary
A live tutorial session, "Training Agents, Session 1," details the initial step of an agentic post-training workflow: Supervised Fine-Tuning (SFT). The session demonstrates how to transform public coding-agent traces into prompt/completion training data, execute a small TRL + LoRA fine-tune, and analyze the resulting metrics. Key areas covered include the strategic importance of starting with SFT over GRPO or environment RL, the methodology for converting agent traces into training examples, applying completion-only loss for chat/tool traces, and conducting TRL SFT using Hugging Face Jobs. The tutorial also addresses maintaining experiment reproducibility without committing logs or checkpoints, and critically evaluating what initial metrics reveal. This session is part of a broader "Training Agents" series focused on using coding agents to design, run, monitor, and review post-training experiments for model improvement.
Key takeaway
For AI Engineers developing continually learning coding agents, this tutorial provides a crucial baseline for your post-training workflow. You should prioritize Supervised Fine-Tuning (SFT) using TRL + LoRA on agent traces before moving to more complex reinforcement learning. This approach establishes a robust foundation, ensuring your initial model improvements are measurable and reproducible, even without checking in extensive logs or checkpoints. Implement completion-only loss for chat/tool traces to optimize your training data.
Key insights
The tutorial outlines a supervised fine-tuning (SFT) baseline for coding agents using public traces and TRL + LoRA.
Principles
- SFT is a foundational step before advanced RL methods.
- Agent traces can be systematically converted to training data.
- Reproducibility is key without committing logs.
Method
Convert public coding-agent traces into prompt/completion training data. Perform TRL + LoRA fine-tuning with completion-only loss for chat/tool traces, running on Hugging Face Jobs.
In practice
- Convert agent traces to prompt/completion data.
- Apply TRL + LoRA for coding agent SFT.
- Ensure reproducibility without log/checkpoint commits.
Topics
- Agent Training
- Supervised Fine-Tuning
- LoRA
- TRL
- Hugging Face Jobs
- Continual Learning
- Coding Agents
Code references
Best for: Machine Learning Engineer, AI Engineer, AI Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by HuggingFace.