Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads
Summary
This paper presents the first systems characterization of agent memory, a capability that is crucial for large language model (LLM) agents performing long-horizon tasks. It introduces a four-axis taxonomy for classifying agent memory systems and a phase-aware profiling harness that attributes costs to individual phases. The study analyzes ten representative systems on MemoryAgentBench and MemoryArena and finds that memory construction, not query-time serving, is the dominant cost: construction energy often exceeds the total query-phase energy of 300 queries. Per-query serving latency varies by two orders of magnitude, from under 0.1 seconds for Mem0 to approximately 38 seconds for long-context baselines. The research identifies construction as a prefill- and embedding-heavy workload, and energy per correct answer differs by more than 47-fold across systems. It also shows that the choice of construction LLM is constrained by the memory algorithm, and that no single system optimizes construction cost, query latency, and accuracy simultaneously. The analysis concludes with ten system recommendations for deployment.
Key takeaway
MLOps engineers deploying long-horizon LLM agents should evaluate agent memory systems on more than accuracy. Rank systems by their full-lifecycle energy, especially construction cost, which often dominates. Run construction as a background throughput workload under admission control so that it does not interfere with latency-sensitive queries. Match a system's cost split to your workload's query arrival pattern. Agentic systems can incur super-linear cost growth, so they need active compaction policies to keep expenses bounded.
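One way to picture the admission-control advice is a small gate that holds construction jobs back while too many queries are in flight. The sketch below is illustrative only: the class name `ConstructionAdmission`, its thresholds, and the job identifiers are assumptions made for this example and do not come from the paper.

```python
from collections import deque


class ConstructionAdmission:
    """Hypothetical gate: start memory-construction jobs only when query load is low."""

    def __init__(self, max_inflight_queries, max_concurrent_builds=1):
        self.max_inflight_queries = max_inflight_queries
        self.max_concurrent_builds = max_concurrent_builds
        self.inflight_queries = 0
        self.running_builds = 0
        self.pending = deque()  # construction jobs waiting for admission

    def query_started(self):
        self.inflight_queries += 1

    def query_finished(self):
        self.inflight_queries = max(0, self.inflight_queries - 1)

    def build_finished(self):
        self.running_builds = max(0, self.running_builds - 1)

    def submit(self, job_id):
        self.pending.append(job_id)

    def admit(self):
        """Return the job ids that may start now, in submission order."""
        started = []
        while (
            self.pending
            and self.running_builds < self.max_concurrent_builds
            and self.inflight_queries <= self.max_inflight_queries
        ):
            started.append(self.pending.popleft())
            self.running_builds += 1
        return started


# Usage: two builds are queued while three queries are being served.
gate = ConstructionAdmission(max_inflight_queries=2)
gate.submit("build-a")
gate.submit("build-b")
for _ in range(3):
    gate.query_started()

blocked = gate.admit()        # query load is above the threshold
gate.query_finished()
first = gate.admit()          # load dropped, one build slot is free
gate.build_finished()
second = gate.admit()         # the slot is free again
```

A production scheduler would more likely gate on measured latency or accelerator utilization than on a raw query count; the count is used here only to keep the sketch self-contained.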
Key insights
Construction dominates the cost of agent memory systems; that cost varies widely across systems and is not reflected in accuracy metrics.
Principles
- Construction energy dominates the LLM agent lifecycle.
- Construction LLM choice has an algorithm-imposed floor.
- No single system optimizes all cost-accuracy axes.
Method
A system-oriented taxonomy classifies agent memory. A phase-aware harness profiles construction, retrieval, and generation costs across ten systems.
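The idea of phase-aware attribution can be sketched with a small context manager that accumulates cost per named phase. This is not the paper's harness: `PhaseProfiler` is a hypothetical helper, it records wall-clock time only (measuring energy would need hardware counters that are out of scope here), and the dictionary lookup stands in for real construction, retrieval, and generation steps.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseProfiler:
    """Hypothetical helper that accumulates wall-clock seconds per named phase."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the phase body raises.
            self.seconds[name] += time.perf_counter() - start
            self.calls[name] += 1

    def share(self, name):
        """Fraction of all recorded time that was spent in `name`."""
        total = sum(self.seconds.values())
        return self.seconds[name] / total if total else 0.0


profiler = PhaseProfiler()

# Construction: build the memory once (a toy dictionary stands in for it).
with profiler.phase("construction"):
    memory = {i: f"fact {i}" for i in range(1000)}

# Query phase: each query is split into retrieval and generation.
answers = []
for key in (3, 42, 7):
    with profiler.phase("retrieval"):
        hit = memory[key]
    with profiler.phase("generation"):
        answers.append(hit.upper())

for name in ("construction", "retrieval", "generation"):
    print(f"{name}: {profiler.calls[name]} call(s), {profiler.share(name):.1%} of time")
```

Keeping construction in its own bucket is what makes it possible to compare a one-time build cost against the cumulative cost of serving queries.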
In practice
- Select agent memory based on system costs, not just accuracy.
- Account for full agent lifecycle energy, especially construction.
- Match the cost split to query arrival patterns; consider the slope of cost growth.
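The points above can be made concrete with a little arithmetic: lifecycle energy is a one-time construction cost plus a per-query cost, and dividing by the number of correct answers gives energy per correct answer. The sketch below assumes a simple linear cost model, and every number in it is invented for illustration; none are measurements from the paper, and `CostProfile` and `break_even_queries` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    """Illustrative linear cost model for one memory system."""
    name: str
    construction_j: float  # one-time energy to build memory, in joules
    per_query_j: float     # energy to serve one query, in joules
    accuracy: float        # fraction of queries answered correctly

    def lifecycle_j(self, n_queries):
        return self.construction_j + n_queries * self.per_query_j

    def joules_per_correct(self, n_queries):
        correct = n_queries * self.accuracy
        if correct <= 0:
            raise ValueError("need at least one correct answer")
        return self.lifecycle_j(n_queries) / correct


def break_even_queries(heavy_build, light_build):
    """Smallest query count at which heavy_build costs no more than light_build.

    Returns None if the heavier construction never pays for itself.
    """
    extra_build = heavy_build.construction_j - light_build.construction_j
    saving_per_query = light_build.per_query_j - heavy_build.per_query_j
    if extra_build <= 0:
        return 0
    if saving_per_query <= 0:
        return None
    return math.ceil(extra_build / saving_per_query)


# Made-up profiles: an indexed memory that is costly to build but cheap to query,
# and a long-context baseline with no build step but expensive queries.
indexed = CostProfile("indexed", construction_j=9000.0, per_query_j=10.0, accuracy=0.5)
long_context = CostProfile("long_context", construction_j=0.0, per_query_j=100.0, accuracy=0.5)

n_star = break_even_queries(indexed, long_context)
print(n_star, indexed.lifecycle_j(300), long_context.lifecycle_j(300))
print(indexed.joules_per_correct(300))
```

Under this model, a workload that issues fewer queries than the break-even count favors the cheap-to-build system, while a long-lived workload favors the cheap-to-query one. A super-linear growth slope would break the linear assumption and push the break-even point further out.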
Topics
- LLM Agents
- Agent Memory Systems
- System Characterization
- Cost Optimization
- Workload Management
- MemoryAgentBench
Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.