MemRefine: LLM-Guided Compression for Long-Term Agent Memory

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

MemRefine is an LLM-guided framework designed to address the unbounded growth of memory stores in large language model (LLM) agents operating over long-term interactions. As agent memory accumulates, it becomes filled with redundant entries, increasing storage costs and degrading information retrieval, particularly on resource-constrained platforms. MemRefine tackles this "storage-budgeted memory management" problem by using similarity metrics solely to propose candidate memory pairs. Crucially, it then employs an LLM judge to make delete, merge, or preserve decisions based on factual content, iterating until a fixed memory budget is achieved. The framework consistently meets target budgets across various memory frameworks and long-term conversation benchmarks, preserving downstream performance and outperforming rule-based baselines under tight budget constraints.

Key takeaway

For AI scientists and machine learning engineers building LLM agents with long-term memory, MemRefine offers a way to keep a memory store within a fixed budget while preserving downstream performance. Consider adding an LLM-guided compression step when memory budgets are tight. Deferring delete/merge decisions to an LLM judge, rather than to surface similarity metrics, is what keeps critical factual information from being discarded.

Key insights

MemRefine compresses agent memory by letting an LLM judge decide on the basis of factual content, while similarity metrics only nominate candidate pairs. Under tight budgets this outperforms rule-based baselines.

Principles

Surface similarity poorly reflects factual value.
LLM judges can make content-aware memory decisions.
Iterative compression can meet fixed memory budgets.
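
To illustrate the first principle, here is a minimal sketch using token-set Jaccard overlap as a stand-in for a surface similarity metric. The metric choice and the example sentences are assumptions for illustration, not taken from the paper.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard overlap: a simple surface similarity metric (illustrative choice)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# Near-identical wording, conflicting facts: neither entry is safely redundant.
conflicting = ("Alice's flight departs on May 3", "Alice's flight departs on May 13")

# Different wording, same fact: one entry is redundant.
paraphrase = ("Alice's flight departs on May 3", "The plane Alice is taking leaves May 3rd")

sim_conflicting = jaccard(*conflicting)  # 5 shared tokens out of 7
sim_paraphrase = jaccard(*paraphrase)    # 1 shared token out of 13

print(f"conflicting facts: {sim_conflicting:.2f}")
print(f"same fact, reworded: {sim_paraphrase:.2f}")
```

A rule that deletes whichever pair scores highest would remove one of the two conflicting dates and keep the redundant paraphrase, which is the failure mode a content-aware judge is meant to avoid.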

Method

MemRefine proposes candidate memory pairs via similarity, then an LLM judge decides to delete, merge, or preserve based on factual content, iterating until the predefined memory budget is met.
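
The loop described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the budget is counted in entries, similarity is token-set Jaccard overlap, "delete" drops the second entry of a pair, and `toy_judge` is a hypothetical rule-based stand-in for the LLM call so the example runs offline.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Decision = Tuple[str, str]  # (action, merged_text); merged_text is used only for "merge"

def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def toy_judge(a: str, b: str) -> Decision:
    """Hypothetical stand-in for the LLM judge; a real system would prompt a model here."""
    na, nb = a.lower().rstrip("."), b.lower().rstrip(".")
    if na == nb:
        return ("delete", "")            # same fact stated twice
    if na in nb or nb in na:
        return ("merge", a if len(a) >= len(b) else b)  # keep the more complete entry
    return ("preserve", "")              # distinct facts, keep both

def compress(memories: List[str], budget: int,
             judge: Callable[[str, str], Decision]) -> List[str]:
    store = list(memories)
    preserved = set()  # pairs the judge already chose to keep; not proposed again
    while len(store) > budget:
        # Similarity only proposes candidates; it never decides.
        candidates = [
            (jaccard(store[i], store[j]), i, j)
            for i, j in combinations(range(len(store)), 2)
            if (store[i], store[j]) not in preserved
        ]
        if not candidates:
            break  # sketch limitation: every remaining pair was preserved
        _, i, j = max(candidates)
        a, b = store[i], store[j]
        action, merged = judge(a, b)
        if action == "delete":
            del store[j]
        elif action == "merge":
            store[i] = merged
            del store[j]
        else:
            preserved.add((a, b))
    return store

memories = [
    "User likes green tea.",
    "user likes green tea",
    "User lives in Oslo.",
    "User lives in Oslo with two cats.",
    "User's flight departs on May 3.",
    "User's flight departs on May 13.",
]
result = compress(memories, budget=4, judge=toy_judge)
print(result)
```

In this run the two flight entries are the most similar pair, so they are proposed first, yet the judge preserves both because they state different facts. The budget is then met by deleting the duplicate tea entry and merging the two Oslo entries.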

In practice

Implement LLM-guided memory compression for agents.
Prioritize factual content over surface similarity in memory management.
Apply iterative budget-constrained memory refinement.
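
When wiring a judge into an agent, one practical concern is handling malformed model output. The sketch below shows one possible approach: a prompt builder and a parser that falls back to "preserve" whenever the reply is not usable, so that a bad response never destroys a memory. The prompt wording, the JSON reply format, and the names `build_prompt` and `parse_decision` are all hypothetical and not taken from the paper.

```python
import json
from typing import Tuple

ALLOWED = {"delete", "merge", "preserve"}
SAFE_DEFAULT = ("preserve", "")

def build_prompt(a: str, b: str) -> str:
    """Hypothetical judge prompt; the wording is an assumption."""
    return (
        "Two memory entries follow. Decide based on the facts they state, not their wording.\n"
        f"A: {a}\nB: {b}\n"
        'Reply with JSON: {"action": "delete" | "merge" | "preserve", "merged": "<text or empty>"}'
    )

def parse_decision(raw: str) -> Tuple[str, str]:
    """Parse the judge's reply; any malformed reply falls back to preserving both entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return SAFE_DEFAULT
    if not isinstance(data, dict):
        return SAFE_DEFAULT
    action = data.get("action")
    merged = data.get("merged") or ""
    if not isinstance(action, str) or action not in ALLOWED:
        return SAFE_DEFAULT
    if not isinstance(merged, str):
        return SAFE_DEFAULT
    if action == "merge" and not merged.strip():
        return SAFE_DEFAULT  # a merge without merged text would lose information
    return (action, merged if action == "merge" else "")

print(parse_decision('{"action": "merge", "merged": "User lives in Oslo with two cats."}'))
print(parse_decision("Sure, I would delete the second one."))
```

Defaulting to "preserve" trades some compression for safety: an unparseable reply costs one extra iteration rather than a lost fact.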

Topics

LLM Agents
Memory Management
Memory Compression
Large Language Models
Resource Constraints
Factual Content Preservation

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.