OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
Summary
OpenWebRL is an open framework for training visual web agents with online multi-turn reinforcement learning (RL) directly on live websites. It addresses the scalability bottleneck of relying on expensive, static supervised datasets by providing a full training pipeline: scalable live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization. Using this framework, OpenWebRL-4B, a 4B-parameter agent, was trained with only 0.4K initialization trajectories and 2.2K open-ended RL training tasks. It achieved success rates of 67.0% on Online-Mind2Web, 64.0% on DeepShop, and 74.1% on WebVoyager, the leading performance among open-source agents. The work also systematically studies key design choices for effective online RL in visual web agents and analyzes how RL improves agentic reasoning.
Key takeaway
AI scientists and ML engineers developing visual web agents should consider online multi-turn reinforcement learning as a way around data scalability issues. In practice, this means a supervised warm start on a small amount of data, robust browser infrastructure, and multimodal context management to improve agent performance on dynamic websites. This approach can significantly reduce reliance on expensive, static datasets, making agent development more cost-efficient and reproducible.
Key insights
Online multi-turn reinforcement learning on live websites offers a scalable path for training capable visual web agents.
Principles
- A supervised warm start improves exploration.
- Multimodal context management is crucial.
- Trajectory-level judging guides policy optimization.
Method
OpenWebRL's method involves a supervised warm start, an agent harness with multi-tool action execution and textual feedback, and a multimodal multi-turn GRPO objective with trajectory-level judging.
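To make the trajectory-level GRPO idea concrete, here is a minimal sketch of the group-relative advantage computation that GRPO is generally built on. It is not taken from the OpenWebRL code. It assumes that several trajectories are sampled for the same task, that a judge assigns each one a single scalar reward (1.0 for success, 0.0 for failure), and that the resulting advantage is shared by every turn in the trajectory. The function names `group_relative_advantages` and `broadcast_to_turns`, the `eps` constant, and the example rewards are all hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each trajectory's reward against its own sampling group.

    `rewards` holds one judged score per trajectory for a single task.
    `eps` (an assumed constant) avoids division by zero when every
    trajectory in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_turns(advantages, turns_per_trajectory):
    """Assumption: every turn inherits its trajectory's advantage."""
    return [[a] * n for a, n in zip(advantages, turns_per_trajectory)]


# Hypothetical group of four rollouts for one task: two judged successful.
rewards = [1.0, 0.0, 0.0, 1.0]
advs = group_relative_advantages(rewards)
per_turn = broadcast_to_turns(advs, [3, 5, 2, 4])
```

Under these assumptions, a group in which every rollout succeeds or every rollout fails yields zero advantage for all of its trajectories, so only tasks with mixed outcomes contribute a learning signal.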
In practice
- Use a fault-tolerant browser environment.
- Implement multi-tool-call interfaces for efficiency.
- Compress older visual history into text.
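The last recommendation above can be sketched as follows. This is an illustrative assumption about how such compression might work, not the framework's actual implementation: only the most recent screenshots stay in the context, and older turns keep just a text description. The `Turn` dataclass, its fields, the `compress_history` function, and the `keep_last_images` parameter are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    action: str                  # what the agent did on this step
    screenshot: Optional[bytes]  # raw image bytes, or None once dropped
    text_summary: str            # short textual description of the page


def compress_history(turns: List[Turn], keep_last_images: int = 2) -> List[Turn]:
    """Return a copy of the history in which only the most recent
    `keep_last_images` turns still carry a screenshot."""
    cutoff = max(len(turns) - keep_last_images, 0)
    compressed = []
    for i, turn in enumerate(turns):
        if i < cutoff:
            # Older turn: drop the image, keep the action and its summary.
            compressed.append(Turn(turn.action, None, turn.text_summary))
        else:
            compressed.append(turn)
    return compressed


# Hypothetical five-step episode with placeholder image bytes.
history = [Turn(f"click_{i}", b"img", f"page state {i}") for i in range(5)]
context = compress_history(history, keep_last_images=2)
```

The design choice here is that image tokens dominate context length in a visual agent, so bounding the number of retained screenshots keeps the prompt size roughly constant as the episode grows while the text summaries preserve the action trail.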
Topics
- Visual Web Agents
- Online Reinforcement Learning
- Multi-turn RL
- Browser Automation
- OpenWebRL Framework
- GRPO
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.