ECHO: Prune to act, trace to learn with selective turn memory in agentic RL
Summary
ECHO is a selective turn-memory framework for long-horizon language agents that operate under bounded context windows. It targets two limitations of existing context-management methods. First, they progressively remove or collapse historical observations, which prevents fine-grained reuse of evidence. Second, once the original turns are no longer source-addressable, there is no explicit path for aligning policy updates with successful outcomes. ECHO addresses both by compressing each completed environment turn into a compact memory record, reconstructing bounded policy contexts by selecting from these records, and reusing the selected source indices to route positive outcome credit. On the BrowseComp-Plus benchmark, ECHO reaches 43.4% held-out accuracy, 14.5 points above GRPO (28.9%) and 7.3 points above the rolling-summary baseline SUPO (36.1%), while using fewer turns and lower trajectory volume than SUPO. The trained policy also shows improved zero-shot generalization on multi-objective QA, code generation, and deep information-seeking benchmarks, with both dense and MoE backbones.
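The first two steps of that pipeline (compress each turn into a record, then rebuild a bounded context from selected records) can be sketched as below. This is a minimal illustration under stated assumptions, not ECHO's implementation: `MemoryRecord`, `compress_turn`, and `reconstruct_context` are hypothetical names, "compression" is plain word truncation, token cost is approximated by whitespace-split word count, and the list of selected indices is passed in directly, whereas in ECHO the policy itself makes that selection.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryRecord:
    """Hypothetical compact record for one completed environment turn."""
    source_index: int  # position of the original turn in the trajectory
    summary: str       # compressed content of that turn
    tokens: int        # approximate cost (assumption: word count)


def compress_turn(index: int, observation: str, max_words: int = 20) -> MemoryRecord:
    # Placeholder compressor (assumption): keep the first max_words words.
    # A real system would use the agent or a learned summarizer here.
    words = observation.split()[:max_words]
    return MemoryRecord(index, " ".join(words), len(words))


def reconstruct_context(
    records: list[MemoryRecord], selected: list[int], budget: int
) -> tuple[str, list[int]]:
    """Build a bounded context from the records whose indices were selected.

    Returns the context text and the source indices actually kept, so that
    later credit assignment can point back to the original turns.
    """
    by_index = {r.source_index: r for r in records}
    kept: list[int] = []
    parts: list[str] = []
    used = 0
    for idx in selected:
        rec = by_index.get(idx)
        # Skip unknown indices and records that would overflow the budget.
        if rec is None or used + rec.tokens > budget:
            continue
        kept.append(idx)
        parts.append(f"[turn {idx}] {rec.summary}")
        used += rec.tokens
    return "\n".join(parts), kept


turns = [
    "search returned 10 hits about ECHO",
    "opened page 3 found the answer date",
    "irrelevant ad page",
]
records = [compress_turn(i, t) for i, t in enumerate(turns)]
context, kept = reconstruct_context(records, selected=[1, 0], budget=12)
```

With a 12-word budget, turn 1 (7 words) fits but turn 0 (6 more words) would overflow, so only index 1 is kept. Returning `kept` alongside the text is the design point: the context stays bounded, yet every line in it remains traceable to a source turn.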
Key takeaway
For machine learning engineers building long-horizon language agents, ECHO addresses two problems together: bounded context windows and the difficulty of aligning policy updates with successful outcomes. Consider selective turn memory with source-indexed reconstruction to keep fine-grained evidence available and to make learning traceable to specific turns. On BrowseComp-Plus this approach reached 43.4% accuracy, and the trained policy generalized zero-shot to diverse tasks such as QA and code generation.
Key insights
ECHO uses selective turn memory and source-indexed reconstruction to enable fine-grained evidence reuse and traceable learning in long-horizon language agents.
Principles
- Selective turn memory prevents history collapse.
- Source-indexed reconstruction enables traceable learning.
- Compact memory records improve context efficiency.
Method
ECHO compresses environment turns into memory records, reconstructs policy contexts by selecting records, and routes positive outcome credit via source indices to align policy updates with supporting evidence and selection actions.
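The credit-routing step can be illustrated with a small sketch. It assumes each policy step records the source indices it selected, and that a positive trajectory reward is split evenly across those indices while the selection action at that step receives the full reward. The even split, the function name `route_positive_credit`, and the returned dictionaries are all hypothetical choices for illustration; the source only states that positive outcome credit is routed via the selected source indices.

```python
from __future__ import annotations

from collections import defaultdict


def route_positive_credit(
    step_selections: list[list[int]], reward: float
) -> tuple[dict[int, float], dict[int, float]]:
    """Hypothetical sketch of routing outcome credit through source indices.

    step_selections[t] lists the source turn indices that policy step t
    selected when reconstructing its context.

    Returns (turn_credit, selection_credit):
      turn_credit[i]      -> credit for original turn i (the evidence)
      selection_credit[t] -> credit for the selection action at step t
    Only positive outcomes are routed; otherwise both maps are empty.
    """
    if reward <= 0:
        return {}, {}
    turn_credit: defaultdict[int, float] = defaultdict(float)
    selection_credit: dict[int, float] = {}
    for t, selected in enumerate(step_selections):
        if not selected:
            continue  # a step that selected nothing receives no credit
        selection_credit[t] = reward
        share = reward / len(selected)  # assumption: even split per step
        for idx in selected:
            turn_credit[idx] += share
    return dict(turn_credit), selection_credit


turn_credit, selection_credit = route_positive_credit([[0], [0, 2], [2]], reward=1.0)
```

Here turns 0 and 2 each accumulate 1.5 units of credit because they were reused across steps, while turn 1, never selected, gets nothing. That asymmetry is the intent of source-indexed routing: evidence that the policy actually relied on is what gets reinforced.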
In practice
- Apply selective memory for long-horizon tasks.
- Use source indexing for outcome-based RL.
- Expect better zero-shot generalization from policies trained this way.
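One plausible way to combine source indexing with outcome-based RL is sketched below, using GRPO (one of the baselines above) for the outcome signal. Rewards from a group of rollouts are normalized into group-relative advantages, and a trajectory with a positive advantage then applies that advantage only to the turns named by its selected source indices. This weighting rule, the uniform spreading of non-positive advantages, and the names `group_relative_advantages` and `turn_weights` are assumptions for illustration, not the paper's objective. The normalization uses the population standard deviation; some GRPO implementations use the sample estimate instead.

```python
from __future__ import annotations

import math


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward against its group."""
    if not rewards:
        raise ValueError("need at least one reward")
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def turn_weights(advantage: float, num_turns: int, source_indices: list[int]) -> list[float]:
    """Hypothetical per-turn loss weights for one trajectory.

    Positive advantage: only turns named by the selected source indices
    receive it, so positive credit reaches the supporting evidence.
    Non-positive advantage: spread uniformly over all turns (assumption).
    """
    if advantage > 0:
        chosen = set(source_indices)
        return [advantage if i in chosen else 0.0 for i in range(num_turns)]
    return [advantage] * num_turns


advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
weights = turn_weights(1.0, num_turns=4, source_indices=[1, 3])
```

If every rollout in a group earns the same reward, all advantages are zero and no turn is reinforced, which matches the usual GRPO behavior of learning only from contrast within a group.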
Topics
- Agentic RL
- Context Management
- Language Agents
- Memory Networks
- Policy Alignment
- Zero-shot Generalization
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.