Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

2026-05-11 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Weblica (Web Replica) is a framework for building reproducible, scalable training environments for visual web agents, which are otherwise hard to train because live websites are complex and change over time. It combines two mechanisms. The first is HTTP-level caching, which records real website interactions and replays them, so the visual state stays stable while the pages remain interactive. The second is an LLM-based environment synthesis pipeline, which generates interactive web environments grounded in real websites and in core web navigation skills. Together these make it possible to scale Reinforcement Learning (RL) training to thousands of diverse environments and tasks. The resulting model, Weblica-8B, is fine-tuned from the Qwen3-VL family and operates purely on screenshots. It achieves 39.2% pass@1 on Online-Mind2Web with 30 steps, outperforming open-weight baselines of similar size and remaining competitive with API models such as OpenAI computer-use-preview and Gemini computer-use-preview.
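
To make the headline metric concrete, here is a minimal sketch of how a pass@1 score under a step budget could be computed. It assumes one attempt per task and treats an attempt as solved only if it succeeds within the budget; the Episode record, its field names, and the pass_at_1 helper are hypothetical and are not taken from the paper or from the Online-Mind2Web evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Episode:
    """One attempt at one task (hypothetical record format)."""
    task_id: str
    succeeded: bool
    steps_used: int


def pass_at_1(episodes: Iterable[Episode], max_steps: int = 30) -> float:
    """Fraction of tasks solved on the single attempt within the step budget.

    Assumption: exactly one episode per task, so pass@1 reduces to a
    plain success rate over tasks.
    """
    episodes = list(episodes)
    if not episodes:
        raise ValueError("at least one episode is required")
    solved = sum(
        1 for e in episodes if e.succeeded and e.steps_used <= max_steps
    )
    return solved / len(episodes)


example_episodes = [
    Episode("book-flight", True, 12),
    Episode("find-recipe", True, 31),   # success, but over the budget
    Episode("track-order", False, 30),
    Episode("compare-laptops", True, 30),
]
example_score = pass_at_1(example_episodes, max_steps=30)
```

In this toy example two of the four tasks count as solved, so the score is 0.5. A real evaluation would also need a judge that decides whether each episode succeeded, which is outside the scope of this sketch.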

Key takeaway

For research scientists developing visual web agents, Weblica offers a way to address two recurring problems: scarce training data and unstable environments. Consider integrating HTTP-level caching and LLM-based environment synthesis into your training pipelines to build diverse, reproducible environments. This makes large-scale RL training practical and can yield agents that outperform existing open-weight models and approach the capabilities of proprietary API models, even with fewer inference steps.

Key insights

Weblica enables scalable, reproducible web agent training via HTTP caching and LLM-driven synthetic environment generation.

Principles

Combine real-world data with synthetic generation for diversity.
HTTP-level caching ensures reproducible web interactions.
LLM-based synthesis scales environment creation.

Method

Weblica uses HTTP-level caching with automated rule generation to replay web interactions, and an LLM-based pipeline (Claude Code) to synthesize diverse, interactive web environments with specific capabilities, domains, and visual styles.
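
The record-and-replay idea can be illustrated with a small, standard-library-only sketch. It is not the paper's implementation: the ReplayCache class, the cache_key function, and the VOLATILE_PARAMS rule set are hypothetical, and the hand-written list of ignored query parameters merely stands in for the automatically generated rules mentioned above.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical matching rule: query parameters that change between visits
# and should be ignored when looking up a recorded response.
VOLATILE_PARAMS = frozenset({"_", "ts", "timestamp", "cache_bust", "session_id"})

Response = Tuple[int, bytes]  # (status code, body)


def cache_key(method: str, url: str, body: bytes = b"") -> str:
    """Build a stable key: drop volatile params, sort the rest, hash the body."""
    parts = urlsplit(url)
    query = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in VOLATILE_PARAMS
    )
    normalized = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), "")
    )
    return f"{method.upper()} {normalized} {hashlib.sha256(body).hexdigest()}"


@dataclass
class ReplayCache:
    """Serve recorded responses; hit the network only in 'record' mode."""
    fetch: Callable[[str, str, bytes], Response]
    mode: str = "record"  # "record" or "replay"
    store: Dict[str, Response] = field(default_factory=dict)

    def request(self, method: str, url: str, body: bytes = b"") -> Response:
        key = cache_key(method, url, body)
        if key in self.store:
            return self.store[key]
        if self.mode == "replay":
            # Replay must be deterministic, so never fall through to the network.
            raise KeyError(f"no recorded response for {key}")
        response = self.fetch(method, url, body)
        self.store[key] = response
        return response
```

A training run would record once against the live site, then switch mode to "replay" so that every episode sees the same responses. Failing loudly on a cache miss, rather than silently fetching, is a design choice in this sketch that keeps replayed episodes reproducible.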

In practice

Use HTTP caching to stabilize dynamic web content for training.
Employ LLMs to generate diverse, interactive web environments.
Train visual agents on raw screenshots for better generalization.
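
One way to get diversity out of LLM-based generation is to vary the environment specification systematically before prompting the model. The sketch below samples combinations of capability, domain, and visual style and turns each into a prompt string. The EnvSpec type, the sample_specs helper, the prompt wording, and the example values are all hypothetical; the paper's actual pipeline and prompts are not reproduced here.

```python
import itertools
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class EnvSpec:
    """One synthetic environment to request from the LLM (hypothetical schema)."""
    capability: str
    domain: str
    style: str

    def to_prompt(self) -> str:
        return (
            f"Build a self-contained interactive website in the '{self.domain}' "
            f"domain that exercises the '{self.capability}' navigation skill, "
            f"using a '{self.style}' visual style."
        )


def sample_specs(
    capabilities: Sequence[str],
    domains: Sequence[str],
    styles: Sequence[str],
    n: int,
    seed: int = 0,
) -> List[EnvSpec]:
    """Draw up to n distinct specs from the full capability x domain x style grid."""
    grid = [
        EnvSpec(c, d, s)
        for c, d, s in itertools.product(capabilities, domains, styles)
    ]
    rng = random.Random(seed)  # fixed seed keeps the sampled set reproducible
    return rng.sample(grid, k=min(n, len(grid)))


example_specs = sample_specs(
    capabilities=["form filling", "search and filter"],
    domains=["travel booking", "online shopping"],
    styles=["minimal", "information-dense"],
    n=5,
    seed=1,
)
example_prompts = [spec.to_prompt() for spec in example_specs]
```

Each prompt would then be handed to the generation model, and the resulting site checked for interactivity before it is added to the training pool. Sampling without replacement from the grid avoids duplicate environments.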

Topics

Weblica Framework
Visual Web Agents
HTTP-level Caching
LLM-based Environment Synthesis
Reinforcement Learning Training

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.