PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents

2026-06-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

PACT (Privileged trAce Co-Training) is a training framework for multi-turn tool-use agents, which must reason, call tools, and adapt across interactions. The two standard approaches each fall short: reinforcement learning suffers from sparse rewards, while supervised fine-tuning on expert traces can over-constrain the model. PACT uses expert traces solely as training-time optimization signals and keeps rollout generation prompt-only. It combines two complementary signals: a trace-conditioned RL surrogate that evaluates prompt-only rollouts within an expert-trace context, and a component-aware SFT loss that supervises reasoning prefixes and tool calls with annealed strength. To mitigate over-reliance on the training-only trace context, PACT adds prompt-only anchoring. A latent-trace view further explains how expert traces can guide optimization without being used during rollout. Experiments on FTRL, BFCL, and ToolHop show consistent improvements over strong SFT- and RL-based baselines.

Key takeaway

For machine learning engineers building multi-turn tool-use agents, PACT offers a way around the sparse rewards of RL and the over-constraint of SFT on expert traces. Consider its privileged trace co-training approach if you want to improve agent reasoning and tool calling: expert traces supply optimization signals during training while rollouts remain prompt-only. The reported result is consistent gains over strong SFT- and RL-based baselines on the FTRL, BFCL, and ToolHop benchmarks.

Key insights

PACT enhances multi-turn tool-use agents by optimizing prompt-only rollouts with expert traces during training.

Principles

Expert traces can guide optimization without direct rollout use.
Combine RL and SFT for robust agent training.
Annealed supervision strength prevents over-constraining.
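The annealing principle can be illustrated with a small schedule function. This is a minimal sketch, not the schedule used in PACT: the summary does not say how the SFT strength is annealed, so the linear decay, the function name annealed_sft_weight, and the parameters w_start and w_end are all assumptions chosen for illustration.

```python
def annealed_sft_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Linearly decay the SFT loss weight from w_start to w_end.

    Hypothetical schedule: the source only states that supervision
    strength is annealed, not which decay shape is used.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the horizon keep w_end.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return w_start + (w_end - w_start) * progress


if __name__ == "__main__":
    for step in (0, 25, 50, 100):
        print(step, annealed_sft_weight(step, total_steps=100))
```

Early in training the expert traces dominate the supervised signal; as the weight decays, the supervised term constrains the policy less and the RL signal takes over.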

Method

PACT uses a trace-conditioned RL surrogate and a component-aware SFT loss with annealed strength, plus prompt-only anchoring, to optimize prompt-only rollouts using expert traces.
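One way to picture how these pieces might fit together is a weighted sum of three scalar loss terms. The summary does not give the actual objective, so everything here is an assumption: the additive form, the LossTerms container, the combine_losses function, and the anchor_weight parameter are hypothetical names, and treating prompt-only anchoring as a separate loss term is only one possible reading.

```python
from dataclasses import dataclass


@dataclass
class LossTerms:
    """Scalar loss values for one training batch (hypothetical container)."""
    trace_conditioned_rl: float  # surrogate scored within expert-trace context
    prompt_only_anchor: float    # same rollouts scored without the trace
    component_sft: float         # supervised loss on reasoning and tool calls


def combine_losses(terms, sft_weight, anchor_weight=0.5):
    """Weighted sum of the three terms.

    Assumption: an additive objective. sft_weight is expected to come
    from an annealing schedule supplied by the caller.
    """
    if sft_weight < 0 or anchor_weight < 0:
        raise ValueError("weights must be non-negative")
    return (
        terms.trace_conditioned_rl
        + anchor_weight * terms.prompt_only_anchor
        + sft_weight * terms.component_sft
    )


if __name__ == "__main__":
    batch = LossTerms(trace_conditioned_rl=1.0,
                      prompt_only_anchor=2.0,
                      component_sft=4.0)
    print(combine_losses(batch, sft_weight=0.25))
```

The point of the sketch is the separation of roles: rollouts are always generated from the prompt alone, and the expert trace enters only through how those rollouts are scored and supervised.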

In practice

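Implement a trace-conditioned RL surrogate that scores prompt-only rollouts within an expert-trace context.
Apply a component-aware SFT loss to reasoning prefixes and tool calls, annealing its strength over training.
Add prompt-only anchoring to reduce reliance on training-only trace context.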
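The component-aware SFT step can be sketched as a weighted average of per-token losses, where each token is tagged with the part of the expert trace it belongs to. This is an illustration under stated assumptions, not PACT's loss: the component labels ("reasoning", "tool_call", "observation"), the per-component weights, and the function name component_aware_sft_loss are hypothetical, and the per-token negative log-likelihoods are assumed to be computed elsewhere.

```python
def component_aware_sft_loss(token_nll, components, weights):
    """Weighted mean of per-token negative log-likelihoods.

    token_nll:  per-token NLL values for one expert trace
    components: label for each token, e.g. "reasoning" or "tool_call"
    weights:    weight per component; unlisted components get 0.0
                and are therefore left unsupervised
    """
    if len(token_nll) != len(components):
        raise ValueError("token_nll and components must have equal length")
    total = 0.0
    norm = 0.0
    for nll, component in zip(token_nll, components):
        w = weights.get(component, 0.0)
        total += w * nll
        norm += w
    # With no supervised tokens, the loss contributes nothing.
    return total / norm if norm > 0 else 0.0


if __name__ == "__main__":
    nll = [2.0, 4.0, 6.0, 8.0]
    labels = ["reasoning", "reasoning", "tool_call", "observation"]
    print(component_aware_sft_loss(nll, labels,
                                   {"reasoning": 0.5, "tool_call": 1.0}))
```

Giving tool-call tokens a higher weight than reasoning tokens is one plausible design choice: tool calls must match a strict format, while reasoning can vary without being wrong.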

Topics

Multi-turn Tool-Use Agents
Reinforcement Learning
Supervised Fine-Tuning
Co-Training Frameworks
Expert Traces
Agent Training

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.