SWE-MeM: Learning Adaptive Memory Management for Long-Horizon Coding Agents
Summary
SWE-MeM is a training framework that equips long-horizon software engineering agents with adaptive memory management. It targets two problems, limited context budgets and noisy interaction histories, by giving the agent a flexible memory tool. With this tool the agent decides, proactively and on demand, when, what, and how to compress its historical context, based on trajectory state, task progress, and remaining context. The framework combines memory-management trajectory synthesis, a two-stage curriculum fine-tuning process, and Memory-aware GRPO, which jointly optimizes memory decisions and task-solving performance. On SWE-Bench Verified under a 32K context budget, SWE-MeM achieved resolve rates of 43.4% with Qwen3-4B-Instruct and 60.2% with Qwen3-Coder-30B-A3B. It consistently outperformed existing memory management baselines in both resolve rate and token efficiency, which the authors attribute to a cleaner working context and better use of task-relevant information.
Key takeaway
Machine learning engineers building long-horizon software engineering agents should consider adaptive memory management along the lines of SWE-MeM. The reported results indicate that agents resolve more issues and use fewer tokens when they learn to compress irrelevant context proactively. In practice this means implementing a flexible memory tool and training with memory-aware reinforcement learning, so that the agent makes on-demand compression decisions instead of following static rules. The approach prevents context overflow and also gives the agent a cleaner, more focused working context for complex tasks.
Key insights
Long-horizon coding agents can achieve higher resolve rates and better token efficiency through adaptive, proactive memory management.
Principles
- Adaptive memory management improves agent efficiency.
- Jointly optimize memory decisions and task performance.
Method
SWE-MeM trains agents using synthesized memory-management trajectories, curriculum fine-tuning, and Memory-aware GRPO, which employs trajectory splitting and step-level credit assignment for joint optimization.
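As a rough illustration of how trajectory splitting and group-relative credit could fit together, the Python sketch below splits each rollout at its compression calls and gives every step in every segment the rollout's group-normalized advantage. The group normalization (reward minus group mean, divided by group standard deviation) is standard GRPO. Everything else is an assumption made for illustration and not SWE-MeM's actual scheme, which this summary does not specify: the names `Segment`, `split_at_compressions`, and `step_advantages`, the `"compress"` action label, and the uniform broadcast of one advantage to all steps.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Segment:
    rollout_id: int   # index of the rollout this segment came from
    steps: list       # actions taken inside one context window


def split_at_compressions(rollout_id, actions, compress_action="compress"):
    """Split one trajectory into segments, closing a segment after each
    compression call (hypothetical action label)."""
    segments, current = [], []
    for action in actions:
        current.append(action)
        if action == compress_action:
            segments.append(Segment(rollout_id, current))
            current = []
    if current:
        segments.append(Segment(rollout_id, current))
    return segments


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO normalization of rewards within one group of rollouts."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def step_advantages(rollouts, rewards):
    """Pair every segment with one advantage per step.

    Assumption: each step simply inherits its rollout's advantage. A real
    step-level scheme would likely weight memory steps differently.
    """
    advantages = group_relative_advantages(rewards)
    result = []
    for rollout_id, actions in enumerate(rollouts):
        for segment in split_at_compressions(rollout_id, actions):
            result.append((segment, [advantages[rollout_id]] * len(segment.steps)))
    return result


# Two rollouts for the same issue: the first resolves it, the second does not.
rollouts = [["read", "compress", "edit"], ["read", "edit"]]
rewards = [1.0, 0.0]
credited = step_advantages(rollouts, rewards)
```

Splitting at compression points means each training sample fits in one context window, while the shared advantage keeps the compression step tied to the final outcome of the rollout it belongs to.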
In practice
- Implement flexible context compression tools.
- Synthesize memory-aware training trajectories.
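To make the first item concrete, here is a minimal Python sketch of what a context compression tool might look like from the agent's side. It is not SWE-MeM's tool. The message format, the `"memory"` role, the four-characters-per-token estimate, and the function names `remaining_budget` and `compress_history` are all hypothetical. The one idea taken from the summary is that the agent, not a fixed rule, supplies the summary and chooses how much recent history to keep.

```python
def estimate_tokens(text):
    """Crude token estimate (assumption: about 4 characters per token)."""
    return max(1, len(text) // 4)


def context_tokens(history):
    """Total estimated tokens across all messages in the history."""
    return sum(estimate_tokens(message["content"]) for message in history)


def remaining_budget(history, budget=32_000):
    """Tokens left before the context budget is exhausted."""
    return budget - context_tokens(history)


def compress_history(history, summary, keep_last=2):
    """Replace all but the last `keep_last` messages with one summary message.

    The agent passes both arguments, so it controls what is kept verbatim
    and how the discarded part is summarized.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(history) <= keep_last:
        return list(history)  # nothing old enough to compress
    tail = history[len(history) - keep_last:]
    return [{"role": "memory", "content": summary}] + tail


# Four tool outputs of 400 characters each, about 100 estimated tokens apiece.
history = [{"role": "tool", "content": "x" * 400} for _ in range(4)]
compressed = compress_history(history, "Bug is in parse()", keep_last=1)
```

Exposing `remaining_budget` alongside the compression call lets the agent condition its decision on how much context is left, which is one of the signals the summary says the agent uses.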
Topics
- Software Engineering Agents
- Memory Management
- LLM Context Management
- Reinforcement Learning
- Curriculum Learning
- SWE-Bench Benchmark
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.