Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Software Development & Engineering · Depth: Expert, extended

Summary

This paper presents the first systems characterization of agent memory, which large language model (LLM) agents rely on for long-horizon tasks. It introduces a four-axis taxonomy for classifying agent memory systems and a phase-aware profiling harness that attributes cost to each phase. The study analyzes ten representative systems on MemoryAgentBench and MemoryArena and finds that memory construction, not query-time serving, is the dominant cost: construction energy often exceeds the total query-phase energy of 300 queries. Per-query serving latency varies by two orders of magnitude, from under 0.1 seconds for Mem0 to approximately 38 seconds for long-context baselines. The paper characterizes construction as a prefill- and embedding-heavy workload, and energy per correct answer differs by more than 47x across systems. It also finds that the choice of construction LLM is constrained by the memory algorithm, and that no single system simultaneously optimizes construction cost, query latency, and accuracy. The analysis concludes with ten system recommendations for deployment.
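To make the cost accounting in the summary concrete, here is a minimal sketch of how lifecycle energy, construction share, and energy per correct answer could be computed from per-phase measurements. The `MemoryRun` fields, the function names, and the example numbers are all hypothetical, and the definition of energy per correct answer used here (lifecycle energy divided by correct answers) is an assumption rather than the paper's stated formula.

```python
from dataclasses import dataclass


@dataclass
class MemoryRun:
    """Hypothetical measurements for one memory system on one benchmark."""
    construction_joules: float  # one-time energy spent building memory
    query_joules_each: float    # average energy per served query
    num_queries: int
    num_correct: int


def lifecycle_energy(run: MemoryRun) -> float:
    """Construction energy plus total query-phase energy."""
    return run.construction_joules + run.query_joules_each * run.num_queries


def construction_share(run: MemoryRun) -> float:
    """Fraction of lifecycle energy spent on construction."""
    return run.construction_joules / lifecycle_energy(run)


def energy_per_correct_answer(run: MemoryRun) -> float:
    """Assumed definition: lifecycle energy divided by correct answers."""
    if run.num_correct == 0:
        return float("inf")
    return lifecycle_energy(run) / run.num_correct


# Illustrative numbers only; they are not taken from the paper.
example = MemoryRun(
    construction_joules=50_000.0,
    query_joules_each=100.0,
    num_queries=300,
    num_correct=150,
)
print(lifecycle_energy(example))    # 80000.0
print(construction_share(example))  # 0.625
```

With these made-up inputs, construction accounts for more than half of lifecycle energy even after 300 queries, which is the pattern the summary describes.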

Key takeaway

MLOps engineers deploying long-horizon LLM agents should evaluate agent memory systems on more than accuracy. Rank candidate systems by full-lifecycle energy, paying particular attention to construction cost, which often dominates. Run construction as a background throughput workload behind admission control so that it does not interfere with latency-sensitive queries. Match a system's split between construction and query cost to your workload's query arrival pattern. Agentic memory systems can incur super-linear cost growth, so they need active compaction policies to keep costs bounded.
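One way to read the admission-control advice is as a gate that holds construction jobs in a queue while query load is high. The sketch below is an illustration under stated assumptions, not the paper's mechanism: the class name, the `query_load_limit` threshold, and the cap on concurrent construction jobs are all hypothetical.

```python
from collections import deque


class ConstructionAdmissionController:
    """Admit background construction jobs only while query load is low.

    Hypothetical policy: a job is admitted when the number of in-flight
    queries is below query_load_limit and fewer than
    max_construction_jobs construction jobs are already running.
    """

    def __init__(self, query_load_limit: int, max_construction_jobs: int = 1):
        self.query_load_limit = query_load_limit
        self.max_construction_jobs = max_construction_jobs
        self.inflight_queries = 0
        self.running_jobs = 0
        self.pending = deque()

    def query_started(self) -> None:
        self.inflight_queries += 1

    def query_finished(self) -> None:
        self.inflight_queries = max(0, self.inflight_queries - 1)

    def submit(self, job_id: str) -> None:
        """Queue a construction job; it does not run until admitted."""
        self.pending.append(job_id)

    def job_finished(self) -> None:
        self.running_jobs = max(0, self.running_jobs - 1)

    def admit(self) -> list:
        """Return the job ids that may start now, in FIFO order."""
        admitted = []
        while (
            self.pending
            and self.running_jobs < self.max_construction_jobs
            and self.inflight_queries < self.query_load_limit
        ):
            admitted.append(self.pending.popleft())
            self.running_jobs += 1
        return admitted
```

A serving loop would call `admit()` whenever a query or a construction job finishes. A production scheduler would more likely gate on measured latency or accelerator utilization than on a raw in-flight count.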

Key insights

Construction dominates the cost of agent memory systems; that cost varies widely across systems and is not reflected in accuracy metrics.

Principles

Construction energy dominates the agent memory lifecycle.
Construction LLM choice has an algorithm-imposed floor.
No single system optimizes all cost-accuracy axes.

Method

A system-oriented taxonomy classifies agent memory. A phase-aware harness profiles construction, retrieval, and generation costs across ten systems.
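As a rough illustration of phase-aware attribution, the sketch below accumulates wall-clock time per named phase with a context manager. It is not the paper's harness: `PhaseProfiler` and its methods are hypothetical names, and it measures only elapsed time, whereas attributing energy would require a hardware power-measurement backend that is not shown here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseProfiler:
    """Accumulate wall-clock seconds per named phase (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock              # injectable so tests can fake time
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name: str):
        start = self._clock()
        try:
            yield
        finally:
            # Record the phase even if the wrapped code raises.
            self.seconds[name] += self._clock() - start
            self.calls[name] += 1

    def shares(self) -> dict:
        """Fraction of total measured time spent in each phase."""
        total = sum(self.seconds.values())
        if total == 0:
            return {}
        return {name: s / total for name, s in self.seconds.items()}


if __name__ == "__main__":
    profiler = PhaseProfiler()
    with profiler.phase("construction"):
        sum(i * i for i in range(200_000))  # stand-in for memory building
    with profiler.phase("retrieval"):
        sorted(range(50_000), reverse=True)  # stand-in for a memory lookup
    with profiler.phase("generation"):
        "".join(str(i) for i in range(50_000))  # stand-in for decoding
    print(profiler.shares())
```

Wrapping each phase separately is what lets a one-time construction cost be reported apart from per-query retrieval and generation costs.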

In practice

Select agent memory based on system costs, not just accuracy.
Account for full agent lifecycle energy, especially construction.
Match the cost split to query patterns, and consider the slope of cost growth.
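One simple way to check a growth slope is to fit a power law to cost measured at several memory sizes. The sketch below estimates the exponent k in cost ≈ c · size^k by least squares in log-log space. The function names and the 1.1 threshold are hypothetical, and the paper may quantify growth differently.

```python
import math


def growth_exponent(sizes, costs):
    """Least-squares slope of log(cost) against log(size).

    A slope near 1 suggests linear growth; a slope clearly above 1
    suggests super-linear growth. All inputs must be positive.
    """
    if len(sizes) != len(costs) or len(sizes) < 2:
        raise ValueError("need at least two (size, cost) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in costs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("sizes must not all be equal")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x


def looks_superlinear(sizes, costs, threshold=1.1):
    """Flag workloads whose fitted exponent exceeds an assumed threshold."""
    return growth_exponent(sizes, costs) > threshold


# Illustrative data only: cost that doubles versus quadruples per doubling.
sizes = [1, 2, 4, 8]
print(looks_superlinear(sizes, [3, 6, 12, 24]))  # False
print(looks_superlinear(sizes, [1, 4, 16, 64]))  # True
```

A workload flagged this way is a candidate for a compaction policy, since its cost per added session keeps rising as memory accumulates.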

Topics

LLM Agents
Agent Memory Systems
System Characterization
Cost Optimization
Workload Management
MemoryAgentBench

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.