$Mem^{p}$: Exploring Agent Procedural Memory

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

$Mem^{p}$ is a task-agnostic framework designed to equip Large Language Model (LLM)-based agents with learnable, updatable, and lifelong procedural memory. It addresses the brittleness of existing procedural memory in agents by distilling past trajectories into both fine-grained, step-by-step instructions and higher-level, script-like abstractions. The framework systematically explores strategies for building, retrieving, and updating this memory, including dynamic regimens for continuous correction and deprecation. Empirical evaluations on the TravelPlanner and ALFWorld benchmarks demonstrate that as the memory repository is refined, agents achieve consistently higher success rates and greater efficiency on analogous tasks. Notably, procedural memory built from stronger models like GPT-4o can be transferred to weaker models such as Qwen2.5-14B-Instruct, yielding substantial performance gains, including a 5% increase in task completion rate and a 1.6-step reduction on TravelPlanner.

Key takeaway

For research scientists developing LLM-based agents, integrating a dynamic procedural memory system such as $Mem^{p}$ can improve agent robustness and efficiency. Focus on memory construction that combines abstract scripts with concrete trajectories, use semantic-aware retrieval, and prioritize reflection-based updates so the agent keeps learning and adapting, especially on long-horizon, complex tasks.

Key insights

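Dynamically built and updated procedural memory raises LLM agent success rates and reduces the number of steps needed on the TravelPlanner and ALFWorld benchmarks, and memory built by a stronger model still helps when transferred to a weaker one.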

Principles

Method

$Mem^{p}$ distills agent trajectories into fine-grained instructions and high-level scripts. It explores strategies for three stages: memory construction (trajectories, scripts, or both combined), retrieval (query or AveFact), and dynamic updating (vanilla, validation, or adjustment/reflection).
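The build, retrieve, and update loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and function names (`ProceduralMemory`, `MemoryEntry`, `embed`, `cosine`) are hypothetical, the bag-of-words similarity stands in for a real semantic encoder, and the behavior attached to each update mode is an assumption inferred from the mode names, since this summary does not define them. Only query-based retrieval is shown; AveFact is left out because the summary does not specify it.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a stand-in for a real text encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MemoryEntry:
    task: str               # task description, used as the retrieval key
    trajectory: List[str]   # fine-grained, step-by-step record
    script: str             # higher-level, script-like abstraction


class ProceduralMemory:
    """Hypothetical store that keeps a trajectory and a script per task."""

    def __init__(self) -> None:
        self.entries: List[MemoryEntry] = []

    def build(self, task: str, trajectory: List[str], script: str) -> None:
        # "Combined" construction: store both representations together.
        self.entries.append(MemoryEntry(task, trajectory, script))

    def retrieve(self, query: str, k: int = 1) -> List[MemoryEntry]:
        # Query-based retrieval: rank stored tasks by similarity to the new task.
        q = embed(query)
        ranked = sorted(self.entries,
                        key=lambda e: cosine(q, embed(e.task)),
                        reverse=True)
        return ranked[:k]

    def update(self, task: str, trajectory: List[str], script: str,
               success: bool, mode: str = "validation",
               used: Optional[MemoryEntry] = None) -> None:
        # Assumed semantics for the three update modes named in the summary.
        if mode == "vanilla":
            self.build(task, trajectory, script)        # keep every episode
        elif mode == "validation":
            if success:
                self.build(task, trajectory, script)    # keep successes only
        elif mode == "adjustment":
            if success:
                self.build(task, trajectory, script)
            elif used is not None:
                # Placeholder for an LLM reflection step that would rewrite
                # the memory that led to the failure.
                used.script += " [revised after failure on: " + task + "]"
        else:
            raise ValueError("unknown update mode: " + mode)


memory = ProceduralMemory()
memory.build("heat an egg and put it on the table",
             ["go to fridge", "take egg", "heat egg with microwave",
              "put egg on table"],
             "find object -> heat it -> place it")
memory.build("plan a 3-day trip to Chicago",
             ["search flights", "search hotels", "fill daily itinerary"],
             "book transport -> book lodging -> fill itinerary")

best = memory.retrieve("heat a potato and put it on the counter")[0]
print(best.script)
```

In this toy setup the new household task retrieves the egg-heating entry, because its task description shares the most words with the query. A real system would swap `embed` for a sentence encoder and replace the string-append in the adjustment branch with a model-generated revision.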

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.