PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

PACT (Privileged trAce Co-Training) is a training framework for multi-turn tool-use agents, which must reason, call tools, and adapt across the turns of an interaction. Reinforcement learning (RL) alone suffers from sparse rewards, while supervised fine-tuning (SFT) on expert traces can over-constrain the model. PACT uses expert traces only as training-time optimization signals; rollouts are still generated from the prompt alone. It combines two complementary signals: a trace-conditioned RL surrogate, which evaluates prompt-only rollouts in a context that includes the expert trace, and a component-aware SFT loss, which supervises reasoning prefixes and tool calls with a strength that is annealed over training. To keep the policy from over-relying on trace context that exists only during training, PACT adds prompt-only anchoring. A latent-trace view explains how expert traces can guide optimization without ever being used during rollout. Experiments on FTRL, BFCL, and ToolHop show consistent improvements over strong SFT- and RL-based baselines.
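To make the component-aware SFT signal more concrete, here is a minimal Python sketch under stated assumptions. The summary does not give the exact loss, so everything named below is hypothetical: the `TraceToken` record, the component labels, the reading of "reasoning prefix" as the first `reasoning_prefix_len` tokens of each contiguous reasoning segment, and the linear annealing schedule in `annealed_weight`. It only computes a scalar loss value from per-token log-probabilities of an expert trace.

```python
from dataclasses import dataclass
from typing import Sequence

# Hypothetical component labels for expert-trace tokens.
REASONING = "reasoning"
TOOL_CALL = "tool_call"
OTHER = "other"  # e.g. tool outputs; not supervised in this sketch


@dataclass
class TraceToken:
    logprob: float  # log-probability of this expert-trace token under the current policy
    component: str  # REASONING, TOOL_CALL, or OTHER


def annealed_weight(step: int, total_steps: int,
                    start: float = 1.0, end: float = 0.0) -> float:
    """Linearly interpolate the SFT strength from `start` to `end` (assumed schedule)."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return start + (end - start) * frac


def component_aware_sft_loss(tokens: Sequence[TraceToken], step: int, total_steps: int,
                             reasoning_prefix_len: int = 32,
                             reasoning_weight: float = 1.0,
                             tool_weight: float = 1.0) -> float:
    """Weighted negative log-likelihood over reasoning prefixes and tool-call tokens."""
    total, norm = 0.0, 0.0
    run = 0  # position inside the current contiguous reasoning segment
    for tok in tokens:
        if tok.component == REASONING:
            run += 1
            # Only the first `reasoning_prefix_len` tokens of a segment are supervised.
            w = reasoning_weight if run <= reasoning_prefix_len else 0.0
        else:
            run = 0  # a non-reasoning token ends the current reasoning segment
            w = tool_weight if tok.component == TOOL_CALL else 0.0
        total += -w * tok.logprob
        norm += w
    if norm == 0.0:
        return 0.0  # nothing supervised in this trace
    return annealed_weight(step, total_steps) * total / norm
```

For example, with `reasoning_prefix_len=2`, a trace of three reasoning tokens followed by one tool-call token supervises only the first two reasoning tokens and the tool call, and the loss shrinks to zero as `step` reaches `total_steps`. Leaving the tail of each reasoning segment unsupervised is one way to avoid the over-constraining that plain SFT can cause, but that interpretation is an assumption.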

Key takeaway

For machine learning engineers building multi-turn tool-use agents, PACT offers a training framework that addresses the main limitations of plain RL (sparse rewards) and plain SFT (over-constrained models). If you have expert traces available, consider its privileged trace co-training approach for improving reasoning and tool calling: it uses the traces as optimization signals during training without feeding them into rollout generation, and the reported experiments show consistent gains over strong SFT- and RL-based baselines on FTRL, BFCL, and ToolHop.

Key insights

PACT improves multi-turn tool-use agents by using expert traces during training to optimize rollouts that are generated from the prompt alone.

Principles

Method

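PACT optimizes prompt-only rollouts with three components: a trace-conditioned RL surrogate that scores each rollout with the expert trace in context, a component-aware SFT loss on reasoning prefixes and tool calls whose strength is annealed over training, and prompt-only anchoring, which counters over-reliance on trace context that is unavailable at inference time.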
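How these terms might combine into one objective can be sketched as follows. This is an illustration, not the paper's objective: the group-standardized advantages (a GRPO-style choice), the REINFORCE-style surrogate, the form of the anchoring term, the linear decay of the SFT coefficient, and the names `group_advantages`, `pact_style_objective`, `anchor_coef`, and `sft_coef_start` are all assumptions. The structural point it shows comes from the summary: each rollout is sampled from the prompt alone, but its log-probability is evaluated twice, once with the expert trace in context (surrogate) and once without it (anchoring).

```python
import math
from typing import List, Sequence


def group_advantages(rewards: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one prompt's rollout group (GRPO-style assumption)."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def pact_style_objective(rewards: Sequence[float],
                         logp_trace_ctx: Sequence[float],
                         logp_prompt_ctx: Sequence[float],
                         sft_loss: float, step: int, total_steps: int,
                         anchor_coef: float = 0.5,
                         sft_coef_start: float = 1.0) -> float:
    """Scalar loss for one group of prompt-only rollouts (hypothetical combination).

    rewards[i]         -- task reward of rollout i, sampled from pi(. | prompt)
    logp_trace_ctx[i]  -- log pi(rollout_i | prompt, expert trace): trace-conditioned surrogate
    logp_prompt_ctx[i] -- log pi(rollout_i | prompt): prompt-only anchoring
    In a real trainer these would be differentiable tensors; here they are plain floats,
    so only the loss value is computed.
    """
    n = len(rewards)
    assert n > 0 and n == len(logp_trace_ctx) == len(logp_prompt_ctx)
    adv = group_advantages(rewards)
    # Policy-gradient-style surrogate evaluated with the expert trace in context.
    surrogate = -sum(a * lp for a, lp in zip(adv, logp_trace_ctx)) / n
    # Same advantage-weighted term in the prompt-only context used at inference.
    anchor = -sum(a * lp for a, lp in zip(adv, logp_prompt_ctx)) / n
    # SFT strength decays linearly to zero over training (assumed schedule).
    sft_coef = sft_coef_start * max(0.0, 1.0 - step / max(total_steps, 1))
    return surrogate + anchor_coef * anchor + sft_coef * sft_loss
```

With `rewards=[1.0, 0.0]`, the better rollout gets a positive advantage, so lowering the loss pushes up its trace-conditioned log-probability; the anchoring term applies the same pressure in the prompt-only context, which is the only context available when the agent is deployed.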

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.