AI2's fully open web agent MolmoWeb navigates the web using only screenshots

Source: The Decoder · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems) · Depth: Intermediate, medium

Summary

The Allen Institute for AI (AI2) has released MolmoWeb, a fully open web agent that navigates websites using only screenshots, without access to the pages' underlying source code. The agent comes in 4-billion- and 8-billion-parameter versions, outperforms existing open models on all tested benchmarks, and approaches the performance of proprietary systems such as OpenAI's o3. MolmoWeb was trained on MolmoWebMix, one of the largest public datasets of its kind, which combines 36,000 human browsing records across more than 1,100 websites, automatically generated runs, and over 2.2 million screenshot-question-answer pairs. Training used supervised fine-tuning on 64 H100 GPUs, with no reinforcement learning and no distillation from proprietary systems. AI2 provides all training data, model weights, and evaluation tools under an Apache 2.0 license.

Key takeaway

For AI architects and research scientists building web automation, MolmoWeb offers a fully open foundation. Its screenshot-only approach and strong benchmark results at small parameter counts suggest a viable alternative to proprietary systems. Consider exploring MolmoWeb's Apache 2.0 licensed resources on Hugging Face and GitHub to build or extend your own web agents, particularly for tasks where UI stability is critical.

Key insights

MolmoWeb is an open web agent that navigates websites using only visual screenshots, outperforming other open models.

Method

MolmoWeb uses a Molmo2 architecture with Qwen3 as the language model and SigLIP2 as the vision encoder, trained via supervised fine-tuning on a mixed dataset of human and auto-generated browsing runs.
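
To make the screenshot-only idea concrete, the following is a minimal sketch of an observe-and-act loop in which the agent sees nothing but pixels. It is not MolmoWeb's code or API: `Action`, `Browser`, `run_episode`, `FakeBrowser`, and `scripted_policy` are all hypothetical names invented for illustration, and the scripted policy merely stands in for the vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass(frozen=True)
class Action:
    """Hypothetical action format; the real agent's action space may differ."""
    kind: str          # e.g. "click", "type", "scroll", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class Browser(Protocol):
    """Assumed browser interface: the agent only ever receives screenshots."""
    def screenshot(self) -> bytes: ...
    def apply(self, action: Action) -> None: ...


# A policy maps (task, current screenshot, past actions) to the next action.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_episode(task: str, browser: Browser, policy: Policy,
                max_steps: int = 10) -> List[Action]:
    """Loop: take a screenshot, ask the policy for an action, execute it."""
    history: List[Action] = []
    for _ in range(max_steps):
        pixels = browser.screenshot()      # no HTML or DOM is consulted
        action = policy(task, pixels, history)
        history.append(action)
        if action.kind == "done":
            break
        browser.apply(action)
    return history


class FakeBrowser:
    """Stand-in for a real browser: each 'screenshot' is just a page label."""
    def __init__(self) -> None:
        self.page = "home"

    def screenshot(self) -> bytes:
        return self.page.encode()

    def apply(self, action: Action) -> None:
        if action.kind == "click" and self.page == "home":
            self.page = "results"


def scripted_policy(task: str, pixels: bytes, history: List[Action]) -> Action:
    """Placeholder for the model: click once on 'home', then stop."""
    if pixels == b"home":
        return Action("click", x=120, y=48)
    return Action("done", text="finished: " + task)


trace = run_episode("find opening hours", FakeBrowser(), scripted_policy)
print([a.kind for a in trace])
```

In a real setup, the policy would be a call to the model and the browser would be driven by an automation tool. The point of the sketch is only that the loop's sole observation is the screenshot.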

Best for: AI Architect, AI Scientist, Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.