ECHO: Prune to act, trace to learn with selective turn memory in agentic RL

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

ECHO is a selective turn-memory framework for long-horizon language agents operating under bounded context windows. It addresses two limitations of existing context-management methods. First, historical observations are progressively removed or collapsed, which hinders fine-grained evidence reuse. Second, once the original turns are no longer source-addressable, there is no explicit path for aligning policy updates with successful outcomes. ECHO compresses each completed environment turn into a compact memory record, reconstructs bounded policy contexts by selecting among these records, and reuses the selected source indices to route positive outcome credit. On the BrowseComp-Plus benchmark, ECHO achieves 43.4% held-out accuracy, 14.5 percentage points above GRPO (28.9%) and 7.3 points above the rolling-summary baseline SUPO (36.1%), while requiring fewer turns and lower trajectory volume than SUPO. The trained policy also shows improved zero-shot generalization across multi-objective QA, code generation, and deep information-seeking benchmarks, with both dense and mixture-of-experts (MoE) backbones.

Key takeaway

For Machine Learning Engineers developing long-horizon language agents, ECHO targets two problems at once: bounded context windows and aligning policy updates with the evidence behind successful outcomes. Consider implementing selective turn memory and source-indexed reconstruction to preserve fine-grained evidence and keep learning traceable to the original turns. In the reported results, ECHO reaches 43.4% held-out accuracy on BrowseComp-Plus (versus 28.9% for GRPO and 36.1% for SUPO), and the trained policy generalizes better zero-shot to tasks such as QA and code generation.

Key insights

ECHO uses selective turn memory and source-indexed reconstruction to enable fine-grained evidence reuse and traceable learning in long-horizon language agents.

Principles

Method

ECHO compresses environment turns into memory records, reconstructs policy contexts by selecting records, and routes positive outcome credit via source indices to align policy updates with supporting evidence and selection actions.
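
The three steps above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the summary does not specify how turns are compressed, how records are selected, or how credit is weighted. The names `MemoryRecord`, `compress_turn`, `select_records`, and `route_credit` are hypothetical, the compressor is a simple truncation, selection is greedy word overlap under a word budget, and credit is split evenly over selection events. A real system would presumably use learned components for each of these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    """Compact stand-in for one completed environment turn (hypothetical schema)."""
    turn_index: int  # source index of the original turn
    text: str        # compressed content of the turn
    cost: int        # context budget consumed, measured in words here


def compress_turn(turn_index, observation, max_words=12):
    """Assumed compressor: keep only the first `max_words` words of the observation."""
    words = observation.split()[:max_words]
    return MemoryRecord(turn_index, " ".join(words), len(words))


def select_records(records, query, budget):
    """Assumed selector: greedily pick records by word overlap with the query
    until the word budget is exhausted; return them in chronological order."""
    query_terms = set(query.lower().split())

    def score(record):
        return len(query_terms & set(record.text.lower().split()))

    chosen, used = [], 0
    for record in sorted(records, key=lambda r: (-score(r), r.turn_index)):
        if score(record) > 0 and used + record.cost <= budget:
            chosen.append(record)
            used += record.cost
    return sorted(chosen, key=lambda r: r.turn_index)


def route_credit(selection_log, reward):
    """Assumed credit rule: split a positive outcome reward evenly across every
    selection event, accumulating per source turn index.

    `selection_log` holds one list of selected turn indices per policy step.
    Non-positive rewards route no credit, mirroring "positive outcome credit".
    """
    total = sum(len(step) for step in selection_log)
    if reward <= 0 or total == 0:
        return {}
    credit = {}
    for step in selection_log:
        for idx in step:
            credit[idx] = credit.get(idx, 0.0) + reward / total
    return credit
```

In this sketch, the indices returned by `select_records` at each policy step would be appended to `selection_log`, so a successful episode can be traced back to the specific turns that were in context when the agent acted.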

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.