Agent Memory: Characterization and System Implications of Stateful Long-Horizon Workloads

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Software Development & Engineering) · Depth: Expert, extended

Summary

This paper presents the first systems characterization of agent memory, which is crucial to large language model (LLM) agents performing long-horizon tasks. It introduces a four-axis taxonomy for classifying agent memory systems and a phase-aware profiling harness that attributes cost to individual phases. The study analyzes ten representative systems on MemoryAgentBench and MemoryArena and finds that memory construction, not query-time serving, is the dominant cost: construction energy often exceeds the total query-phase energy of 300 queries. Per-query serving latency varies by two orders of magnitude, from under 0.1 seconds for Mem0 to approximately 38 seconds for long-context baselines. Construction is a prefill- and embedding-heavy workload, and energy per correct answer differs by more than 47× across systems. The choice of construction LLM is constrained by the memory algorithm, and no single system is simultaneously best on construction cost, query latency, and accuracy. The paper concludes with ten system recommendations for deployment.
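The lifecycle accounting described above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that energy per correct answer is total lifecycle energy (construction plus query phase) divided by the number of correct answers, which the summary does not spell out, and every name and number in it is hypothetical rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class LifecycleCost:
    """Energy accounting for one memory system on one workload, in joules."""
    construction_j: float  # one-time energy spent building the memory
    per_query_j: float     # average retrieval + generation energy per query
    num_queries: int
    num_correct: int

    def query_phase_j(self) -> float:
        return self.per_query_j * self.num_queries

    def total_j(self) -> float:
        return self.construction_j + self.query_phase_j()

    def energy_per_correct_answer(self) -> float:
        # Assumed definition: full lifecycle energy amortized over correct answers.
        if self.num_correct == 0:
            return float("inf")
        return self.total_j() / self.num_correct

    def construction_dominates(self) -> bool:
        return self.construction_j > self.query_phase_j()

    def break_even_queries(self) -> float:
        """Queries needed before query-phase energy equals construction energy."""
        if self.per_query_j == 0:
            return float("inf")
        return self.construction_j / self.per_query_j


# Made-up numbers, chosen only to exercise the arithmetic.
example = LifecycleCost(
    construction_j=60_000.0, per_query_j=50.0, num_queries=300, num_correct=150
)
print(example.construction_dominates())     # True
print(example.energy_per_correct_answer())  # 500.0
print(example.break_even_queries())         # 1200.0
```

With these hypothetical inputs, construction still outweighs the query phase after 300 queries, and it would take 1,200 queries for the two to match. This is the kind of comparison an accuracy-only leaderboard hides.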

Key takeaway

MLOps engineers deploying long-horizon LLM agents should evaluate agent memory systems on more than accuracy. Rank candidate systems by full-lifecycle energy, paying particular attention to construction cost, which often dominates. Run construction as a background throughput workload behind admission control so that it does not interfere with latency-sensitive queries. Match the system's split between construction and query cost to the workload's query arrival pattern. Agentic systems can incur super-linear cost growth, so they need active compaction policies to keep expenses bounded.
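One way to read the admission-control advice is as a gate that starts background construction jobs only while foreground query load is low. The following is a minimal sketch under that assumption. The class name, thresholds, and in-flight-count policy are hypothetical and are not the paper's mechanism; a real deployment would more likely gate on measured latency or accelerator utilization.

```python
from collections import deque
from typing import Optional


class ConstructionAdmissionController:
    """Admit background construction jobs only while query load is low."""

    def __init__(self, max_inflight_queries: int, max_inflight_builds: int = 1):
        self.max_inflight_queries = max_inflight_queries
        self.max_inflight_builds = max_inflight_builds
        self.inflight_queries = 0
        self.inflight_builds = 0
        self.pending_builds = deque()  # FIFO queue of job ids

    def query_started(self) -> None:
        self.inflight_queries += 1

    def query_finished(self) -> None:
        self.inflight_queries = max(0, self.inflight_queries - 1)

    def submit_build(self, job_id: str) -> None:
        self.pending_builds.append(job_id)

    def build_finished(self) -> None:
        self.inflight_builds = max(0, self.inflight_builds - 1)

    def admit_next_build(self) -> Optional[str]:
        """Return the next job id to start now, or None if it should wait."""
        if not self.pending_builds:
            return None
        if self.inflight_queries > self.max_inflight_queries:
            return None  # protect latency-sensitive queries
        if self.inflight_builds >= self.max_inflight_builds:
            return None  # cap concurrent construction work
        self.inflight_builds += 1
        return self.pending_builds.popleft()


ctrl = ConstructionAdmissionController(max_inflight_queries=2)
ctrl.submit_build("session-a")
for _ in range(3):
    ctrl.query_started()
print(ctrl.admit_next_build())  # None: three queries in flight exceeds the limit
ctrl.query_finished()
print(ctrl.admit_next_build())  # session-a
```

Deferring rather than dropping construction jobs keeps the memory eventually up to date while treating construction as throughput work, not as part of the query's critical path.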

Key insights

Construction dominates the cost of agent memory systems; that cost varies widely across systems and is not reflected in accuracy metrics.

Principles

Method

A system-oriented taxonomy classifies agent memory. A phase-aware harness profiles construction, retrieval, and generation costs across ten systems.
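To illustrate what phase-aware attribution can look like, here is a minimal sketch of a profiler that charges wall-clock time to named phases. It is not the paper's harness: the class and method names are hypothetical, and it records only elapsed time, whereas the study also reports energy, which would need a hardware power counter that is omitted here.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseProfiler:
    """Accumulate wall-clock seconds per named phase."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock  # injectable so the profiler can be tested
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name: str):
        start = self._clock()
        try:
            yield
        finally:
            # Charge the elapsed time even if the phase raised.
            self.seconds[name] += self._clock() - start
            self.calls[name] += 1

    def share(self, name: str) -> float:
        """Fraction of all recorded time spent in the given phase."""
        total = sum(self.seconds.values())
        return self.seconds.get(name, 0.0) / total if total else 0.0


profiler = PhaseProfiler()
with profiler.phase("construction"):
    memory = [text.lower() for text in ["Turn one", "Turn two"]]  # stand-in for building memory
with profiler.phase("retrieval"):
    hits = [m for m in memory if "two" in m]  # stand-in for a memory lookup
with profiler.phase("generation"):
    answer = " ".join(hits)  # stand-in for an LLM call

for name in ("construction", "retrieval", "generation"):
    print(name, profiler.calls[name], f"{profiler.share(name):.2%}")
```

Wrapping each phase separately is what allows construction cost to be reported apart from per-query retrieval and generation cost, instead of being folded into a single end-to-end number.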

In practice

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.