Signal-Driven Observation for Long-Horizon Web Agents
Summary
Signal-Driven Observation (SDO) is a novel architectural proposal for long-horizon web agents, addressing the critical issue of "observation over-ingestion." Current web agents routinely ingest raw DOM and accessibility trees, often tens of thousands of tokens, at every action step. This architectural coupling of observation and action frequency causes progressive context degradation, leading to failures like context rot, loop-trapping, and goal drift, even in models with large context windows (e.g., 200K tokens). SDO decouples these frequencies by employing a dedicated sub_RLM that reads the full DOM but returns only compact, task-relevant elements and their selectors. This sub_RLM is triggered only when a lightweight signal detector identifies meaningful page changes, such as URL transitions, newly visible interactive elements, action failures, or exogenous browser events. The paper defines observation over-ingestion, sketches SDO as a concrete solution, and outlines open problems for the community.
Key takeaway
AI engineers designing long-horizon web agents should prioritize architectural solutions that decouple observation frequency from action frequency. In a Signal-Driven Observation (SDO) approach, a sub-LLM (the sub_RLM described in the summary) provides compact, task-relevant observations only when a lightweight signal detector fires. The proposal is intended to mitigate context rot, loop-trapping, and goal drift. Moving from full DOM ingestion at every step to event-driven, filtered observation is presented as key to improving agent reliability and performance on complex web tasks.
Key insights
Decoupling a web agent's observation frequency from its action frequency is proposed as a way to limit context degradation and improve long-horizon task success.
Principles
- Querying documents outperforms wholesale ingestion.
- Observation compression is a core architectural decision.
- Context quality matters more than raw length.
Method
SDO uses a sub_RLM to return compact, task-relevant DOM elements, triggered by a zero-LLM-cost signal detector monitoring URL changes, new ARIA elements, action failures, or exogenous events.
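A minimal sketch of such a zero-LLM-cost signal detector, assuming the agent captures a cheap snapshot of page state after each action. The `PageSnapshot` structure, the `detect_signals` function, and the signal names are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class PageSnapshot:
    """Cheap page state captured after each action (hypothetical structure)."""
    url: str
    interactive_ids: FrozenSet[str]        # ids of currently visible interactive elements
    last_action_ok: bool = True            # False if the last action raised or timed out
    browser_events: Tuple[str, ...] = ()   # e.g. ("dialog",) for an unexpected popup


def detect_signals(prev: Optional[PageSnapshot], curr: PageSnapshot) -> List[str]:
    """Return the names of signals that fired; an empty list means no re-observation."""
    if prev is None:
        # Assumption: the very first step always needs an observation.
        return ["initial_observation"]
    signals = []
    if curr.url != prev.url:
        signals.append("url_transition")
    if curr.interactive_ids - prev.interactive_ids:
        signals.append("new_interactive_elements")
    if not curr.last_action_ok:
        signals.append("action_failure")
    if curr.browser_events:
        signals.append("exogenous_event")
    return signals
```

The detector only compares plain values and sets, so it can run after every action without a model call. The sub_RLM would be invoked only when the returned list is non-empty.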
In practice
- Implement a signal detector for browser events.
- Use a sub-LLM to filter DOM for task relevance.
- Prioritize observation-level failure diagnostics.
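The steps above can be sketched as a gated observation loop. This is an illustration under stated assumptions: `keyword_filter` is a simple keyword-overlap stand-in for the sub-LLM call, and `run_steps`, the `Element` shape, and the `observe_calls` counter are hypothetical names introduced here for the example.

```python
from typing import Callable, Dict, List

Element = Dict[str, str]  # hypothetical shape: {"selector": ..., "text": ...}


def keyword_filter(dom: List[Element], task: str, limit: int = 5) -> List[Element]:
    """Stand-in for the sub-LLM: keep elements whose text shares a word with the task."""
    words = {w.lower() for w in task.split()}
    hits = [el for el in dom if words & {w.lower() for w in el["text"].split()}]
    return hits[:limit]


def run_steps(
    pages: List[List[Element]],
    signal_fired: List[bool],
    task: str,
    observe: Callable[[List[Element], str], List[Element]] = keyword_filter,
) -> Dict[str, object]:
    """Refresh the compact observation only on steps where a signal fired."""
    observation: List[Element] = []
    observe_calls = 0  # simple observation-level diagnostic
    for dom, fired in zip(pages, signal_fired):
        if fired:
            observation = observe(dom, task)
            observe_calls += 1
        # On other steps the agent would act on the cached observation.
    return {"observation": observation, "observe_calls": observe_calls}


# Usage: three action steps, but the full DOM is filtered only twice.
dom1 = [{"selector": "#search", "text": "Search flights"},
        {"selector": "#ad", "text": "Buy shoes"}]
dom2 = [{"selector": "#book", "text": "Book flights now"},
        {"selector": "#footer", "text": "Privacy policy"}]
result = run_steps([dom1, dom1, dom2], [True, False, True], "book flights")
```

Counting observation calls separately from action steps gives a basic diagnostic for how often the agent re-reads the page, which helps when tracing a failure back to a stale or missing observation.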
Topics
- Web Agents
- Observation Compression
- Long-Horizon Tasks
- Context Management
- Signal Detection
- Recursive Language Models
Best for: Research Scientist, AI Architect, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.