Training Agents: Live tutorial on how to fine-tune a coding agent for continual learning

2026-06-15 · Source: HuggingFace · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Advanced, quick

Summary

A live tutorial session, "Training Agents, Session 1," details the initial step of an agentic post-training workflow: Supervised Fine-Tuning (SFT). The session demonstrates how to transform public coding-agent traces into prompt/completion training data, execute a small TRL + LoRA fine-tune, and analyze the resulting metrics. Key areas covered include the strategic importance of starting with SFT over GRPO or environment RL, the methodology for converting agent traces into training examples, applying completion-only loss for chat/tool traces, and conducting TRL SFT using Hugging Face Jobs. The tutorial also addresses maintaining experiment reproducibility without committing logs or checkpoints, and critically evaluating what initial metrics reveal. This session is part of a broader "Training Agents" series focused on using coding agents to design, run, monitor, and review post-training experiments for model improvement.

Key takeaway

For AI Engineers developing continually learning coding agents, this tutorial provides a crucial baseline for your post-training workflow. You should prioritize Supervised Fine-Tuning (SFT) using TRL + LoRA on agent traces before moving to more complex reinforcement learning. This approach establishes a robust foundation, ensuring your initial model improvements are measurable and reproducible, even without checking in extensive logs or checkpoints. Implement completion-only loss for chat/tool traces to optimize your training data.

Key insights

The tutorial outlines a supervised fine-tuning (SFT) baseline for coding agents using public traces and TRL + LoRA.

Principles

SFT is a foundational step before advanced RL methods.
Agent traces can be systematically converted to training data.
Reproducibility is key without committing logs.

Method

Convert public coding-agent traces into prompt/completion training data. Perform TRL + LoRA fine-tuning with completion-only loss for chat/tool traces, running on Hugging Face Jobs.

In practice

Convert agent traces to prompt/completion data.
Apply TRL + LoRA for coding agent SFT.
Ensure reproducibility without log/checkpoint commits.

Topics

Agent Training
Supervised Fine-Tuning
LoRA
TRL
Hugging Face Jobs
Continual Learning
Coding Agents

Code references

burtenshaw/training-agents

Best for: Machine Learning Engineer, AI Engineer, AI Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by HuggingFace.