OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents
Summary
OpenWebRL is an open framework for training visual web agents with online multi-turn reinforcement learning directly on real websites. It targets two scalability bottlenecks: the closed nature of proprietary systems and the data cost of supervised post-training. The framework covers the full training pipeline, including live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization. The resulting model, OpenWebRL-4B, was trained with only 0.4K initialization trajectories and 2.2K RL tasks, and it reaches 67.0% success on Online-Mind2Web and 64.0% on DeepShop. This sets a new open-source state of the art, outperforming open agents of similar or larger size and competing with proprietary systems such as OpenAI CUA and Gemini CUA. The work also systematically studies the key design choices for effective online RL.
Key takeaway
For machine learning engineers building visual web agents, this work offers a practical way around data scarcity. Consider training agents directly on live websites with an online multi-turn reinforcement learning framework such as OpenWebRL, which reduces dependence on expensive static datasets. OpenWebRL-4B's benchmark results, which are competitive with proprietary systems, suggest that a small supervised initialization followed by online RL can produce a strong agent.
Key insights
Online multi-turn reinforcement learning on live websites offers a scalable path for training capable visual web agents.
Principles
- Online RL reduces reliance on costly static datasets.
- Multi-turn optimization enhances agentic reasoning.
- Live-browser infrastructure is crucial for scalability.
Method
OpenWebRL's pipeline includes scalable live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization for visual web agents.
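A minimal sketch of how these stages might fit together in a single episode is shown below. It is not OpenWebRL's code: `ToyBrowser`, `rollout`, `judge`, and `scripted_policy` are hypothetical names, observations are strings standing in for screenshots, and the judge peeks at the toy environment's state, where a real judge would have to score the recorded trajectory itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    observation: str  # stand-in for a screenshot (assumption for this sketch)
    action: str


@dataclass
class Trajectory:
    task: str
    turns: List[Turn] = field(default_factory=list)
    reward: float = 0.0  # set once, after the episode ends


class ToyBrowser:
    """Hypothetical stand-in for a live browser session."""

    def __init__(self, goal_clicks: int = 2):
        self.goal_clicks = goal_clicks
        self.clicks = 0

    def observe(self) -> str:
        return f"page_state:{self.clicks}"

    def step(self, action: str) -> None:
        if action == "click":
            self.clicks += 1


def rollout(task, policy, env, max_turns=5, context_window=3):
    """Run one multi-turn episode and record every (observation, action) pair."""
    traj = Trajectory(task=task)
    for _ in range(max_turns):
        obs = env.observe()
        # Context management: only the most recent turns are shown to the policy.
        history = traj.turns[-context_window:]
        action = policy(task, obs, history)
        traj.turns.append(Turn(obs, action))
        if action == "stop":
            break
        env.step(action)
    return traj


def judge(traj, env) -> float:
    """Trajectory-level judging: one binary score for the whole episode."""
    finished = bool(traj.turns) and traj.turns[-1].action == "stop"
    return 1.0 if finished and env.clicks >= env.goal_clicks else 0.0


def scripted_policy(task, obs, history):
    """Deterministic placeholder for the learned policy."""
    return "stop" if obs.endswith(":2") else "click"


if __name__ == "__main__":
    env = ToyBrowser()
    traj = rollout("demo task", scripted_policy, env)
    traj.reward = judge(traj, env)
    print(len(traj.turns), traj.reward)  # prints: 3 1.0
```

The point of the sketch is the shape of the loop: the reward is assigned once per trajectory, not per step, which is what makes the downstream optimization a multi-turn credit-assignment problem.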
In practice
- Initialize agents with minimal supervised trajectories (0.4K).
- Train with open-ended RL tasks (2.2K) on live sites.
- Evaluate on Online-Mind2Web and DeepShop benchmarks.
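The steps above end in two simple computations: turning trajectory-level judgments into a training signal, and reporting a success rate. The summary does not name the policy-optimization algorithm, so the sketch below assumes a group-relative normalization (in the style of GRPO) purely for illustration; `group_advantages`, `per_turn_advantages`, and `success_rate` are hypothetical helper names.

```python
from statistics import mean, pstdev
from typing import List


def success_rate(rewards: List[float]) -> float:
    """Fraction of trajectories the judge marked as successful (reward 1.0)."""
    return sum(1 for r in rewards if r >= 1.0) / len(rewards)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within a group of rollouts for the same task.

    Assumption: group-relative normalization, not confirmed by the source.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_turn_advantages(traj_advantage: float, num_turns: int) -> List[float]:
    """Broadcast one trajectory-level advantage to every turn of the episode."""
    return [traj_advantage] * num_turns


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]  # four rollouts of one task
    print(success_rate(rewards))  # prints: 0.5
    print([round(a, 3) for a in group_advantages(rewards)])
```

When every rollout in a group gets the same reward, all advantages are zero and the group contributes no gradient, which is one reason task difficulty matters when choosing RL tasks.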
Topics
- Visual Web Agents
- Online Reinforcement Learning
- Multi-turn RL
- Live-Browser Infrastructure
- Agentic Reasoning
- Open-Source Frameworks
Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.