Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Expert, extended

Summary

Weblica (Web Replica) is a framework for building reproducible, scalable training environments for visual web agents, which are otherwise hard to train because live websites are complex and change over time. It combines two mechanisms. The first is HTTP-level caching, which records real website interactions and replays them, capturing stable visual states while preserving interactive behavior. The second is an LLM-based environment synthesis pipeline that generates interactive web environments grounded in real websites and in core web navigation skills. Together they allow Reinforcement Learning (RL) training to scale to thousands of diverse environments and tasks. The resulting model, Weblica-8B, is fine-tuned from the Qwen3-VL family and operates purely on screenshots. It achieves 39.2% pass@1 on Online-Mind2Web with 30 steps, outperforming open-weight baselines of similar size and remaining competitive with API models such as OpenAI computer-use-preview and Gemini computer-use-preview.
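To make the headline metric concrete, here is a minimal sketch of how pass@1 under a step budget could be computed. It assumes one attempt per task, a success flag supplied by some external judge, and that "with 30 steps" means a cap of 30 actions per episode. The `Episode` record and the sample data are hypothetical and are not taken from the paper or from Online-Mind2Web.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Episode:
    """One attempt at one task (hypothetical record layout)."""
    task_id: str
    steps_used: int
    success: bool  # assumed to come from an external judge


def pass_at_1(episodes: List[Episode], max_steps: int = 30) -> float:
    """Fraction of tasks solved on a single attempt within the step budget."""
    if not episodes:
        raise ValueError("need at least one episode")
    solved = sum(1 for e in episodes if e.success and e.steps_used <= max_steps)
    return solved / len(episodes)


# Made-up data for illustration only.
episodes = [
    Episode("t1", 12, True),
    Episode("t2", 30, True),
    Episode("t3", 30, False),
    Episode("t4", 41, True),   # over budget, so it counts as a failure
    Episode("t5", 8, False),
]
print(f"pass@1 = {pass_at_1(episodes):.1%}")  # 2 of 5 tasks -> 40.0%
```

Treating over-budget successes as failures is one possible convention; a benchmark harness may instead stop the agent at the cap so such episodes never occur.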

Key takeaway

For research scientists developing visual web agents, Weblica addresses two recurring obstacles: scarce training data and unstable live-web environments. Consider integrating HTTP-level caching and LLM-based environment synthesis into your training pipeline to obtain diverse, reproducible training environments. This makes large-scale RL training practical and can yield agents that outperform existing open-weight models and approach the capabilities of proprietary API models, even with fewer inference steps.

Key insights

Weblica enables scalable, reproducible web agent training via HTTP caching and LLM-driven synthetic environment generation.
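The synthesis side can be pictured as sampling an environment specification and turning it into an instruction for a code-generating LLM. The sketch below is an illustration under stated assumptions, not the paper's pipeline: the `EnvSpec` type, the three value lists, and the prompt wording are all hypothetical, and no LLM is actually called.

```python
import itertools
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EnvSpec:
    """Hypothetical specification of one synthetic web environment."""
    skill: str
    domain: str
    style: str

    def to_prompt(self) -> str:
        # Illustrative wording only; the real prompts are not given in the source.
        return (
            f"Build a self-contained interactive website for the {self.domain} "
            f"domain, styled as '{self.style}', whose tasks exercise the "
            f"navigation skill: {self.skill}."
        )


# Placeholder axis values, chosen for the example.
SKILLS = ["form filling", "search and filter", "pagination", "date picking"]
DOMAINS = ["e-commerce", "travel booking", "job listings"]
STYLES = ["minimal light", "dense dashboard", "dark theme"]


def sample_specs(n: int, seed: int = 0) -> List[EnvSpec]:
    """Draw n distinct specs from the full skill x domain x style grid."""
    grid = [EnvSpec(*combo) for combo in itertools.product(SKILLS, DOMAINS, STYLES)]
    if n > len(grid):
        raise ValueError(f"only {len(grid)} distinct specs available")
    return random.Random(seed).sample(grid, n)


for spec in sample_specs(3):
    print(spec.to_prompt())
```

Even these short lists give 4 x 3 x 3 = 36 distinct specifications, which hints at how a combinatorial design could reach thousands of environments. The fixed seed keeps the sampled set reproducible across runs.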

Principles

Method

Weblica uses HTTP-level caching with automated rule generation to replay web interactions, and an LLM-based pipeline (Claude Code) to synthesize diverse, interactive web environments with specific capabilities, domains, and visual styles.
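The caching half of this method can be sketched as a record-and-replay store keyed on a normalized request. This is a minimal illustration, not Weblica's implementation: the `ReplayCache` class, the `cache_key` helper, and the hand-written `VOLATILE_PARAMS` rule are assumptions standing in for the automatically generated rules, and the network is replaced by an injected `fetch` function so the example runs offline.

```python
import hashlib
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

Response = Tuple[int, bytes]  # (status code, body)

# Hypothetical normalization rule: query parameters that change on every
# request (timestamps, cache busters) and should not affect cache lookup.
VOLATILE_PARAMS = {"_", "ts", "timestamp", "cb", "nonce"}


def cache_key(method: str, url: str, body: bytes = b"") -> str:
    """Build a stable key from method, normalized URL, and a body hash."""
    parts = urlsplit(url)
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in VOLATILE_PARAMS
    )
    normalized = urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path, urlencode(query), "")
    )
    return f"{method.upper()} {normalized} {hashlib.sha256(body).hexdigest()}"


class ReplayCache:
    """Record responses on first sight, then replay them deterministically."""

    def __init__(self, fetch: Callable[[str, str, bytes], Response],
                 replay_only: bool = False) -> None:
        self._fetch = fetch
        self._store: Dict[str, Response] = {}
        self.replay_only = replay_only

    def request(self, method: str, url: str, body: bytes = b"") -> Response:
        key = cache_key(method, url, body)
        if key in self._store:
            return self._store[key]          # replay the recorded response
        if self.replay_only:
            return (504, b"not recorded")    # never touch the live site
        response = self._fetch(method, url, body)
        self._store[key] = response          # record for later replay
        return response


# Offline demo: a fake fetcher whose output changes on every call.
calls = []

def fake_fetch(method: str, url: str, body: bytes) -> Response:
    calls.append(url)
    return (200, f"page #{len(calls)}".encode())

cache = ReplayCache(fake_fetch)
first = cache.request("GET", "https://example.com/search?q=shoes&ts=1")
cache.replay_only = True
second = cache.request("GET", "https://EXAMPLE.com/search?ts=2&q=shoes")
print(first == second, len(calls))
```

The two requests differ only in a volatile parameter, parameter order, and host casing, so the second is served from the recording and the fake site is contacted once. Returning a 504 for unrecorded requests in replay mode is a design choice made for this sketch, so that training episodes cannot silently depend on live content.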

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.