EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
Summary
EnvFactory is an automated framework for improving the tool-use capabilities of Large Language Models (LLMs) through Agentic Reinforcement Learning (Agentic RL). It targets two weaknesses of existing methods: execution environments that do not scale and training data that lacks realism. EnvFactory autonomously explores and verifies stateful, executable tool environments built from authentic resources. It then synthesizes natural multi-turn trajectories using topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generated 2,575 Supervised Fine-Tuning (SFT) and RL trajectories. Training on this data improves Qwen3-series models by up to 15% on BFCLv3, 8.6% on MCP-Atlas, and 6% on conversational benchmarks such as $\tau^2$-Bench and VitaBench, demonstrating better training efficiency and downstream performance than prior work that often uses five times more environments.
Key takeaway
For research scientists developing tool-use agents, EnvFactory addresses two recurring obstacles: building environments and finding enough training data. Consider integrating automated environment synthesis and topology-aware trajectory generation to improve training efficiency and model performance. The reported gains reach up to 15% on BFCLv3 and 8.6% on MCP-Atlas, while reducing reliance on costly real-world APIs or hallucination-prone simulators.
Key insights
EnvFactory automates environment and trajectory synthesis for Agentic RL, significantly boosting LLM tool-use performance.
Principles
- Automate environment verification from authentic resources.
- Synthesize multi-turn trajectories with implicit intents.
- Use topology-aware sampling and calibrated refinement to improve synthetic data quality.
Method
EnvFactory autonomously explores and verifies stateful tool environments, then synthesizes natural multi-turn trajectories via topology-aware sampling and calibrated refinement to produce grounded queries with implicit intents.
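To make "topology-aware sampling" more concrete, here is a minimal sketch of one way such sampling could work. It is not EnvFactory's actual algorithm, which this summary does not specify. The sketch assumes tools form a dependency graph in which an edge A -> B means B can consume A's output, and it samples a tool-call chain by walking that graph. The tool names, the `TOOL_GRAPH` structure, and the `sample_tool_chain` helper are all hypothetical.

```python
import random

# Hypothetical tool dependency graph: an edge A -> B means B can consume A's output.
# Tool names are illustrative only and do not come from the paper.
TOOL_GRAPH = {
    "search_flights": ["book_flight"],
    "book_flight": ["send_confirmation", "cancel_booking"],
    "send_confirmation": [],
    "cancel_booking": [],
}


def sample_tool_chain(graph, start, max_len, rng):
    """Random walk along dependency edges, giving a dependency-respecting call order."""
    chain = [start]
    while len(chain) < max_len:
        successors = graph[chain[-1]]
        if not successors:
            break  # reached a tool that nothing depends on
        chain.append(rng.choice(successors))
    return chain


rng = random.Random(0)  # seeded so repeated runs sample the same chain
chain = sample_tool_chain(TOOL_GRAPH, "search_flights", max_len=3, rng=rng)
print(chain)
```

Every sampled chain respects the graph's edges, so a multi-turn query written around it would call tools in an order whose inputs and outputs actually connect, which is the intuition behind sampling by topology rather than picking tools uniformly at random.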
In practice
- Generate diverse training data for tool-use agents.
- Improve LLM performance on conversational benchmarks.
- Scale Agentic RL training efficiently.
Topics
- EnvFactory
- Tool-Use Agents
- Agentic Reinforcement Learning
- Executable Environments
- Trajectory Synthesis
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.