PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents
Summary
PACT, a Privileged trAce Co-Training framework, targets multi-turn tool-use agents that must reason, call tools, and adapt across interactions. Reinforcement learning for these agents suffers from sparse rewards, while supervised fine-tuning on expert traces can over-constrain the model. PACT uses expert traces only as training-time optimization signals; rollouts are still generated from the prompt alone. It combines two complementary signals: a trace-conditioned RL surrogate that evaluates prompt-only rollouts within an expert-trace context, and a component-aware SFT loss that supervises reasoning prefixes and tool calls with annealed strength. To mitigate over-reliance on the training-only trace context, PACT adds prompt-only anchoring. A latent-trace view further explains how expert traces can guide optimization without being used during rollout. Experiments on FTRL, BFCL, and ToolHop show consistent improvements over strong SFT- and RL-based baselines.
Key takeaway
For machine learning engineers building multi-turn tool-use agents, PACT offers a training framework aimed at the sparse rewards of RL and the over-constraining effect of SFT on expert traces. Its privileged trace co-training uses expert traces as optimization signals while keeping rollout generation prompt-only, so the trained agent does not need traces at inference time. The reported result is consistent gains over strong SFT- and RL-based baselines on the FTRL, BFCL, and ToolHop benchmarks.
Key insights
PACT improves multi-turn tool-use agents by using expert traces as training-time signals to optimize rollouts that are generated from the prompt alone.
Principles
- Expert traces can guide optimization without being used during rollout generation.
- Combine RL and SFT signals for robust agent training.
- Annealing the supervision strength helps avoid over-constraining the model.
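The annealing principle above can be illustrated with a simple weight schedule. This is a minimal sketch only: the summary does not state which schedule PACT uses, so the linear decay, the function name `annealed_sft_weight`, and the `start`/`end` defaults are all assumptions made for illustration.

```python
def annealed_sft_weight(step: int, total_steps: int,
                        start: float = 1.0, end: float = 0.0) -> float:
    """Hypothetical linear schedule for the SFT loss weight.

    Assumption: supervision starts strong and decays linearly; the actual
    schedule used by PACT is not given in this summary.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the horizon keep the final weight.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * progress
```

Under this assumed schedule, early training leans on the expert traces and later training leans on the RL signal, which is one way to keep supervision from over-constraining the policy.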
Method
PACT uses a trace-conditioned RL surrogate and a component-aware SFT loss with annealed strength, plus prompt-only anchoring, to optimize prompt-only rollouts using expert traces.
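One way to picture how these pieces might fit into a single objective is the sketch below. It is not the paper's formulation: the policy-gradient form of each term, the way anchoring is expressed as a second term scored under prompt-only context, and every name here (`RolloutStats`, `pact_style_loss`, `anchor_weight`, `sft_weight`) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RolloutStats:
    """Per-rollout quantities for a rollout sampled from the prompt alone."""
    advantage: float         # assumed scalar advantage from the task reward
    logp_trace_ctx: float    # log-prob of the rollout given prompt + expert trace
    logp_prompt_only: float  # log-prob of the same rollout given the prompt only


def pact_style_loss(rollouts: Sequence[RolloutStats],
                    sft_nll: float,
                    sft_weight: float,
                    anchor_weight: float = 0.5) -> float:
    """Hypothetical combination of the three signals named in the summary."""
    if not rollouts:
        raise ValueError("need at least one rollout")
    n = len(rollouts)
    # Trace-conditioned surrogate: score prompt-only rollouts in trace context.
    rl_trace = -sum(r.advantage * r.logp_trace_ctx for r in rollouts) / n
    # Prompt-only anchoring (assumed form): same rollouts, prompt-only context.
    rl_anchor = -sum(r.advantage * r.logp_prompt_only for r in rollouts) / n
    # SFT term enters with a weight that the caller is expected to anneal.
    return rl_trace + anchor_weight * rl_anchor + sft_weight * sft_nll
```

In a real training loop the log-probabilities would be differentiable tensors from the policy model; plain floats are used here only to keep the arithmetic visible.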
In practice
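- Implement a trace-conditioned RL surrogate that scores prompt-only rollouts within an expert-trace context.
- Apply a component-aware SFT loss to reasoning prefixes and tool calls, annealing its strength over training.
- Add prompt-only anchoring to reduce reliance on the training-only trace context.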
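For the component-aware SFT step, a rough sketch of per-component token weighting follows. The summary says only that reasoning prefixes and tool calls are supervised; the component labels, the weight values, and the function name `component_weighted_nll` are hypothetical.

```python
from typing import Dict, Sequence


def component_weighted_nll(token_nll: Sequence[float],
                           components: Sequence[str],
                           weights: Dict[str, float]) -> float:
    """Weighted mean of per-token negative log-likelihoods.

    Assumption: each expert-trace token carries a component label such as
    "reasoning" or "tool_call"; labels missing from `weights` get weight 0
    and are therefore left unsupervised.
    """
    if len(token_nll) != len(components):
        raise ValueError("token_nll and components must have the same length")
    total = 0.0
    norm = 0.0
    for nll, comp in zip(token_nll, components):
        w = weights.get(comp, 0.0)
        total += w * nll
        norm += w
    # Return 0.0 when no token is supervised, to avoid dividing by zero.
    return total / norm if norm > 0 else 0.0


# Example: tool-call tokens weighted more heavily than reasoning tokens.
example = component_weighted_nll(
    [1.0, 2.0, 3.0, 4.0],
    ["reasoning", "tool_call", "other", "tool_call"],
    {"reasoning": 0.5, "tool_call": 1.0},
)
```

Separating the weights by component makes it possible to anneal reasoning and tool-call supervision at different rates, if that turns out to be useful.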
Topics
- Multi-turn Tool-Use Agents
- Reinforcement Learning
- Supervised Fine-Tuning
- Co-Training Frameworks
- Expert Traces
- Agent Training
Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.