Beyond Semantic Organization: Memory as Execution State Management for Long-Horizon Agents

2026-05-01 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Mage, a new memory framework developed by researchers from the University of Science and Technology of China and Microsoft, redefines memory for LLM-based agents tackling long-horizon tasks. Unlike existing RAG and agent memory systems, which retrieve by semantic similarity, Mage acts as an active execution-state manager that organizes interaction history into a two-layer hierarchical state tree. The agent's state is derived from the active root-to-current path, which combines subgoal summaries, recent traces, and hints from prior branches; this design targets state fragmentation and error propagation. Four operations (Grow, Compress, Maintain, and Revise) manage the tree, which bounds context growth, validates state, and isolates errors. Experiments on MemoryArena show that Mage improves average task success rates by 7.8–20.4 percentage points over baselines and reduces token consumption by 55.1%.
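To make the idea of deriving state from the active path concrete, here is a minimal Python sketch. It is not Mage's implementation: the `PathNode` record, the `derive_state` function, the `recent_k` parameter, and the bracketed line prefixes are all hypothetical names chosen for illustration. It assumes completed subgoals contribute only their summary, the current subgoal contributes its most recent raw traces, and hints left by abandoned branches are shown at the node where they were attached.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PathNode:
    """One subgoal on the active root-to-current path (hypothetical record)."""
    goal: str
    summary: Optional[str] = None                    # set once the subgoal is compressed
    traces: List[str] = field(default_factory=list)  # raw interaction steps
    hints: List[str] = field(default_factory=list)   # lessons from abandoned branches


def derive_state(path: List[PathNode], recent_k: int = 3) -> str:
    """Render the agent's working context from the active path only.

    Earlier subgoals with a summary appear as one line each; the current
    subgoal (or any node not yet summarized) appears with its last
    `recent_k` raw traces. Anything off the path is not included.
    """
    lines: List[str] = []
    for i, node in enumerate(path):
        is_current = i == len(path) - 1
        for hint in node.hints:
            lines.append(f"[hint] {hint}")
        if is_current or node.summary is None:
            lines.append(f"[subgoal] {node.goal}")
            recent = node.traces[-recent_k:] if recent_k > 0 else []
            lines.extend(f"[trace] {t}" for t in recent)
        else:
            lines.append(f"[done] {node.goal}: {node.summary}")
    return "\n".join(lines)
```

Because only summaries and a fixed window of recent traces are rendered, the context in this sketch grows with the number of subgoals on the path rather than with the total number of steps taken.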

Key takeaway

For AI engineers building LLM agents for complex, multi-step tasks, the case made here is to treat memory as execution-state management rather than semantic retrieval alone. RAG or semantic memory systems may fragment state and propagate errors, which lowers task success and raises token costs. Consider hierarchical memory structures with explicit state validation and error isolation, such as Mage's Grow, Compress, Maintain, and Revise operations; on MemoryArena the paper reports success-rate gains of 7.8–20.4 percentage points over baselines and 55.1% lower token consumption.

Key insights

Agent memory should manage execution state hierarchically, not just retrieve semantically similar facts.

Principles

Preserve execution path integrity.
Validate memory writes at subgoal boundaries.
Isolate erroneous segments via branching.

Method

Mage manages the state tree with four operations: Grow appends new traces, Compress produces subgoal summaries, Maintain validates those summaries, and Revise rolls the state back and creates a new branch.
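The four operations can be sketched as methods on a small tree class. This is an illustrative Python sketch under stated assumptions, not the paper's algorithm: the class and method signatures, the `open_subgoal` helper, the chaining of each new subgoal as a child of the previous one, and the choice to drop raw traces only after a summary passes validation are all assumptions made here. The summarizer and validator are passed in as plain callables; in a real agent they would likely be LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SubgoalNode:
    """Upper-layer node (a subgoal); its raw traces form the lower layer."""
    goal: str
    parent: Optional["SubgoalNode"] = None
    traces: List[str] = field(default_factory=list)
    summary: Optional[str] = None
    validated: bool = False
    abandoned: bool = False
    hints: List[str] = field(default_factory=list)
    children: List["SubgoalNode"] = field(default_factory=list)


class StateTree:
    def __init__(self, task: str) -> None:
        self.root = SubgoalNode(task)
        self.current = self.root

    def open_subgoal(self, goal: str) -> SubgoalNode:
        """Hypothetical helper: start a subgoal as a child of the current node."""
        node = SubgoalNode(goal, parent=self.current)
        self.current.children.append(node)
        self.current = node
        return node

    def grow(self, trace: str) -> None:
        """Grow: append a raw interaction step to the current subgoal."""
        self.current.traces.append(trace)

    def compress(self, summarize: Callable[[List[str]], str]) -> str:
        """Compress: write a candidate summary for the current subgoal."""
        self.current.summary = summarize(self.current.traces)
        self.current.validated = False
        return self.current.summary

    def maintain(self, is_consistent: Callable[[str, List[str]], bool]) -> bool:
        """Maintain: accept the summary only if it passes validation.

        On success the raw traces are dropped (bounding context growth);
        on failure the summary is discarded and the traces are kept.
        """
        node = self.current
        if node.summary is None:
            return False
        if is_consistent(node.summary, node.traces):
            node.validated = True
            node.traces = []
            return True
        node.summary = None
        return False

    def revise(self, failed: SubgoalNode, hint: str, new_goal: str) -> SubgoalNode:
        """Revise: roll back to the failed node's parent and branch anew."""
        if failed.parent is None:
            raise ValueError("cannot revise the root node")
        failed.abandoned = True            # isolate the erroneous segment
        self.current = failed.parent       # state rollback
        branch = self.open_subgoal(new_goal)
        branch.hints.append(hint)          # carry a lesson from the old branch
        return branch

    def active_path(self) -> List[SubgoalNode]:
        """Return nodes from the root to the current node."""
        path: List[SubgoalNode] = []
        node: Optional[SubgoalNode] = self.current
        while node is not None:
            path.append(node)
            node = node.parent
        return list(reversed(path))
```

In this sketch an abandoned branch stays in the tree but drops off the active path, so its traces no longer reach the agent's context; only the hint attached to the new branch survives.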

In practice

Implement hierarchical state trees for long-horizon agents.
Incorporate explicit error detection and recovery mechanisms.
Design memory operations around execution boundaries.

Topics

LLM Agents
Memory Management
Execution State
Hierarchical Memory
Error Isolation
Long-Horizon Tasks
RAG Systems

Best for: Research Scientist, AI Architect, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.