ISE: An Execution-Grounded Recipe for Multi-Turn OS-Agent Trajectories
Summary
ISE (Intent -> Simulate -> Execute) is a synthesis paradigm that addresses the shortage of training data for OS agents by generating structured user intents, multi-turn task-delegation dialogues, and grounded tool executions. Stage 1 constructs 43,956 unique structured intents using a 4D framework (Persona x Domain x Task x Complexity), achieving a Vendi Score of 61.57. Stage 2 simulates 23,132 multi-turn user-agent interactions grounded in actual execution outcomes, averaging 8.12 user turns and 68.24 total dialogue turns each. Stage 3 runs every tool call in a live, isolated OS workspace to produce authentic failure-recovery dynamics. Fine-tuning Qwen3-8B on the resulting ISETrace data raises ClawEval pass@1 from 19.3 to 37.7, outperforming both zero-shot GPT-4o and the larger Qwen3-32B model.
Key takeaway
For AI engineers building OS agents, ISE offers a validated way to work around data scarcity. Its three stages (intent generation, execution-grounded multi-turn simulation, and live tool execution) yield training data that improved tool use in the reported experiments: fine-tuning Qwen3-8B raised ClawEval pass@1 from 19.3 to 37.7.
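To make the headline metric concrete, here is a minimal sketch of how pass@1 is commonly computed when each task gets exactly one attempt, followed by the arithmetic on the reported scores (19.3 and 37.7). The function name and the toy results are hypothetical, and ClawEval's actual scoring procedure is not described in this summary, so treat this as an illustration only.

```python
def pass_at_1(first_attempt_passed):
    """Percentage of tasks whose single (first) attempt passed."""
    if not first_attempt_passed:
        raise ValueError("need at least one task result")
    return 100.0 * sum(first_attempt_passed) / len(first_attempt_passed)


# Made-up outcomes for four tasks, purely for illustration.
toy_results = [True, False, False, True]
toy_score = pass_at_1(toy_results)

# Scores reported in the summary (Qwen3-8B before and after fine-tuning).
baseline, finetuned = 19.3, 37.7
absolute_gain = round(finetuned - baseline, 1)  # percentage points
relative_gain = round(finetuned / baseline, 2)  # multiplicative factor

print(toy_score, absolute_gain, relative_gain)
```

On the reported numbers this works out to a gain of 18.4 percentage points, or roughly 1.95 times the baseline score.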
Key insights
ISE is a three-stage synthesis paradigm for generating execution-grounded, multi-turn OS agent training data.
Principles
- Synthetic data generation can overcome dataset limitations.
- Execution grounding is crucial for realistic agent training.
- Multi-turn simulation significantly boosts agent performance.
Method
ISE proceeds in three stages. Stage 1 constructs structured intents. Stage 2 simulates multi-turn user-agent interactions grounded in execution outcomes. Stage 3 executes every tool call in a live, isolated OS workspace, so the trajectories capture real failures and the recoveries that follow.
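The idea behind execution grounding can be sketched as follows: each tool call is actually run, and the real exit code and output are recorded as the observation, so a failure in the trajectory is a genuine failure. This is a minimal sketch under loud assumptions: `run_tool_call` and `read_file_cmd` are hypothetical helpers, a temporary directory stands in for the isolated OS workspace (it is not real sandboxing), and the recovery step is hard-coded here, whereas in a real pipeline it would presumably come from the agent model.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_tool_call(argv, workspace, timeout=10):
    """Run one command inside the workspace and return the real observation."""
    try:
        proc = subprocess.run(
            argv, cwd=workspace, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "exit_code": None, "stdout": "", "stderr": "timeout"}
    return {
        "ok": proc.returncode == 0,
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


def read_file_cmd(name):
    """Hypothetical 'read file' tool, implemented with the current interpreter."""
    return [sys.executable, "-c", "import sys; print(open(sys.argv[1]).read())", name]


trajectory = []
with tempfile.TemporaryDirectory() as workspace:
    # First attempt fails for real: the file does not exist yet.
    first = run_tool_call(read_file_cmd("notes.txt"), workspace)
    trajectory.append(first)
    if not first["ok"]:
        # Scripted recovery (assumption): create the file, then retry.
        Path(workspace, "notes.txt").write_text("hello\n")
        trajectory.append(run_tool_call(read_file_cmd("notes.txt"), workspace))

for step in trajectory:
    print(step["ok"], step["exit_code"])
```

Because the observations come from actual process results rather than model-imagined outputs, the recorded error text and the successful retry form the kind of failure-recovery pair the method description refers to.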
In practice
- Use a 4D framework for diverse intent generation.
- Employ role-locked simulators for multi-turn interaction.
- Integrate live OS execution to capture authentic failure-recovery dynamics.
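As an illustration of the first point, a 4D intent space can be sketched as the Cartesian product of the four axes (Persona x Domain x Task x Complexity), sampled without replacement so every intent is unique. All axis values, the `Intent` class, and `build_intents` are hypothetical placeholders; the summary does not list the actual categories or describe how the original intents were sampled or filtered.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    persona: str
    domain: str
    task: str
    complexity: str


# Hypothetical axis values, chosen only to make the sketch runnable.
PERSONAS = ("data analyst", "sysadmin", "student")
DOMAINS = ("file management", "software development")
TASKS = ("organize", "debug", "summarize")
COMPLEXITIES = ("simple", "multi-step")


def build_intents(sample_size, seed=0):
    """Enumerate the full 4D grid, then sample unique intents from it."""
    grid = [
        Intent(*combo)
        for combo in itertools.product(PERSONAS, DOMAINS, TASKS, COMPLEXITIES)
    ]
    rng = random.Random(seed)  # seeded for reproducibility
    return rng.sample(grid, k=min(sample_size, len(grid)))


intents = build_intents(10)
for intent in intents[:3]:
    print(intent)
```

Enumerating the grid first keeps coverage of each axis explicit; a real pipeline would likely expand each combination into a natural-language intent and measure diversity afterwards, for example with a metric such as the Vendi Score mentioned in the summary.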
Topics
- OS Agents
- Synthetic Data Generation
- Multi-Turn Dialogue
- Tool Use
- Agent Training
- Qwen3-8B
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.