EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL

2026-05-18 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

EnvFactory is an automated framework for improving the tool-use capabilities of Large Language Models (LLMs) through Agentic Reinforcement Learning (Agentic RL). It targets two limitations of existing methods: execution environments that do not scale, and training data that is not realistic. EnvFactory autonomously explores and verifies stateful, executable tool environments built from authentic resources. It then synthesizes natural multi-turn trajectories using topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents. Using only 85 verified environments across 7 domains, EnvFactory generated 2,575 Supervised Fine-Tuning (SFT) and RL trajectories. Training on this data improves Qwen3-series models by up to 15% on BFCLv3, 8.6% on MCP-Atlas, and 6% on conversational benchmarks such as $\tau^2$-Bench and VitaBench, showing better training efficiency and downstream performance than prior work that often uses five times more environments.

Key takeaway

For research scientists developing tool-use agents, EnvFactory offers a robust solution to the challenges of environment creation and data scarcity. You should consider integrating automated environment synthesis and topology-aware trajectory generation to improve training efficiency and model performance. This approach can yield substantial gains on benchmarks like BFCLv3 and MCP-Atlas, reducing reliance on costly real-world APIs or hallucination-prone simulators.

Key insights

EnvFactory automates environment and trajectory synthesis for Agentic RL, significantly boosting LLM tool-use performance.

Principles

Automate environment verification from authentic resources.
Synthesize multi-turn trajectories with implicit intents.
Use topology-aware sampling and calibrated refinement to improve synthetic data quality.
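
To make the first principle concrete, here is a minimal sketch of what verifying a stateful tool environment could look like: replay a sequence of tool calls on a fresh environment and check the resulting state. This is an illustration only, not EnvFactory's implementation. The `CartEnv` class, its tools, and the `verify` helper are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class CartEnv:
    """Hypothetical stateful tool environment: a tiny shopping cart."""
    items: dict = field(default_factory=dict)

    def add_item(self, name: str, qty: int) -> dict:
        # Tool call that mutates state and returns a structured result.
        if qty <= 0:
            return {"ok": False, "error": "qty must be positive"}
        self.items[name] = self.items.get(name, 0) + qty
        return {"ok": True, "items": dict(self.items)}

    def remove_item(self, name: str) -> dict:
        # Fails if the item is absent, so call order matters.
        if name not in self.items:
            return {"ok": False, "error": f"{name} not in cart"}
        del self.items[name]
        return {"ok": True, "items": dict(self.items)}


def verify(env_factory, calls, expected_state) -> bool:
    """Replay tool calls on a fresh environment and compare the final state."""
    env = env_factory()
    for tool, kwargs in calls:
        result = getattr(env, tool)(**kwargs)
        if not result["ok"]:
            return False  # any failed call invalidates the trajectory
    return env.items == expected_state


calls = [
    ("add_item", {"name": "pen", "qty": 2}),
    ("add_item", {"name": "pen", "qty": 1}),
]
print(verify(CartEnv, calls, {"pen": 3}))
```

Because the environment is executable, the check depends on the state the calls actually produce rather than on a simulator's guess about their outcome.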

Method

EnvFactory autonomously explores and verifies stateful tool environments, then synthesizes natural multi-turn trajectories via topology-aware sampling and calibrated refinement to produce grounded queries with implicit intents.
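
The summary does not define topology-aware sampling. One plausible reading is that tool sequences are sampled so that they respect a dependency graph among tools. The sketch below illustrates that reading only. The `DEPS` graph, the tool names, and both functions are assumptions made for this example, not part of EnvFactory.

```python
import random

# Hypothetical dependency graph: tool -> tools that must run before it.
DEPS = {
    "search_flights": set(),
    "get_user_profile": set(),
    "book_flight": {"search_flights", "get_user_profile"},
    "cancel_booking": {"book_flight"},
}


def sample_tool_chain(deps, length, rng):
    """Sample up to `length` distinct tools so each follows its prerequisites."""
    chain, done = [], set()
    while len(chain) < length:
        # A tool is ready once all of its prerequisites have been called.
        ready = sorted(t for t in deps if t not in done and deps[t] <= done)
        if not ready:
            break
        tool = rng.choice(ready)
        chain.append(tool)
        done.add(tool)
    return chain


def respects_deps(deps, chain):
    """Check that every tool in the chain appears after its prerequisites."""
    seen = set()
    for tool in chain:
        if not deps[tool] <= seen:
            return False
        seen.add(tool)
    return True


chain = sample_tool_chain(DEPS, 4, random.Random(0))
print(chain, respects_deps(DEPS, chain))
```

Sampling only from tools whose prerequisites are satisfied keeps every chain executable, while the random choice among ready tools provides variety across trajectories.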

In practice

Generate diverse training data for tool-use agents.
Improve LLM performance on conversational benchmarks.
Scale Agentic RL training efficiently.

Topics

EnvFactory
Tool-Use Agents
Agentic Reinforcement Learning
Executable Environments
Trajectory Synthesis

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.