ReasoningBank: Enabling agents to learn from experience
Summary
ReasoningBank, an agent memory framework developed by Jun Yan and Chen-Yu Lee at Google Cloud, enables AI agents to keep learning from both successful and failed experiences after deployment. Existing memory methods either store exhaustive action records or keep only successful workflows. ReasoningBank instead distills high-level, transferable reasoning patterns from successes and strategic guardrails from mistakes. Each memory item has a title, a description, and distilled content, and the framework operates in a continuous loop of retrieval, extraction, and consolidation. When combined with memory-aware test-time scaling (MaTTS), ReasoningBank gains additional learning signal from parallel and sequential exploration. Evaluated with Gemini-2.5-Flash, ReasoningBank improved success rates over memory-free baselines by 8.3% on WebArena and 4.6% on SWE-Bench-Verified, and it also reduced the number of execution steps.
Key takeaway
For AI architects designing persistent, long-running agents, ReasoningBank offers a way to keep learning from both successes and failures after deployment. Consider integrating it when simple trajectory or workflow memories are not enough: it distills higher-level strategic insights, and in the reported evaluation it raised success rates while reducing execution steps. The result can be more robust and adaptable agents whose reasoning evolves over time.
Key insights
ReasoningBank enables AI agents to learn generalizable strategies from both successes and failures for continuous self-evolution.
Principles
- Distill high-level reasoning patterns, not just actions.
- Learn from both successful and failed experiences.
- Use memory-aware test-time scaling (MaTTS) to enrich learning signals.
Method
ReasoningBank uses a closed-loop workflow of memory retrieval, environmental interaction, LLM-as-a-judge self-assessment, and distillation of insights from trajectories into structured memory items.
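The closed loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `MemoryItem`, `ReasoningBank`, and `run_task` are hypothetical, retrieval uses simple word overlap as a stand-in for whatever similarity search the real system uses, and the `act`, `judge`, and `distill` callables are placeholders for the agent, the LLM-as-a-judge, and the LLM-based distillation step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MemoryItem:
    """One structured memory item: title, description, distilled content."""
    title: str
    description: str
    content: str


class ReasoningBank:
    """Hypothetical in-memory store; the real storage layer is not specified here."""

    def __init__(self) -> None:
        self.items: List[MemoryItem] = []

    def retrieve(self, query: str, k: int = 1) -> List[MemoryItem]:
        # Assumption: word overlap stands in for embedding-based similarity.
        query_words = set(query.lower().split())
        scored = []
        for item in self.items:
            item_words = set(f"{item.title} {item.description}".lower().split())
            overlap = len(query_words & item_words)
            if overlap > 0:
                scored.append((overlap, item))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [item for _, item in scored[:k]]

    def consolidate(self, new_items: List[MemoryItem]) -> None:
        # Simplest possible consolidation: append without merging or pruning.
        self.items.extend(new_items)


def run_task(
    bank: ReasoningBank,
    task: str,
    act: Callable[[str, List[MemoryItem]], List[str]],
    judge: Callable[[str, List[str]], bool],
    distill: Callable[[str, List[str], bool], List[MemoryItem]],
) -> bool:
    """One pass of the loop: retrieve, interact, self-assess, distill, consolidate."""
    memories = bank.retrieve(task)               # memory retrieval
    trajectory = act(task, memories)             # environmental interaction
    success = judge(task, trajectory)            # LLM-as-a-judge self-assessment
    bank.consolidate(distill(task, trajectory, success))  # distillation into memory
    return success
```

Because `distill` receives the judged outcome, the same loop can store a reusable strategy after a success and a guardrail after a failure.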
In practice
- Use LLM-as-a-judge for self-assessment.
- Incorporate counterfactual signals from failures.
- Apply parallel or sequential scaling for richer learning.
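As a rough illustration of the parallel variant, the sketch below runs several independent rollouts of the same task, sorts them by judged outcome, and contrasts the two groups. Everything here is an assumption made for illustration: `parallel_scaling` and `contrast` are hypothetical names, `rollout` and `judge` stand in for the agent and the LLM-as-a-judge, and the set-difference contrast is a toy substitute for whatever LLM-based comparison a real system would perform.

```python
from typing import Callable, Dict, List

Trajectory = List[str]


def parallel_scaling(
    task: str,
    rollout: Callable[[str, int], Trajectory],
    judge: Callable[[str, Trajectory], bool],
    k: int = 3,
) -> Dict[str, List[Trajectory]]:
    """Run k independent rollouts and bucket them by judged outcome."""
    buckets: Dict[str, List[Trajectory]] = {"success": [], "failure": []}
    for seed in range(k):
        trajectory = rollout(task, seed)
        label = "success" if judge(task, trajectory) else "failure"
        buckets[label].append(trajectory)
    return buckets


def contrast(buckets: Dict[str, List[Trajectory]]) -> List[str]:
    """Toy contrast: steps present in every success and absent from all failures."""
    successes, failures = buckets["success"], buckets["failure"]
    if not successes:
        return []
    common = set(successes[0]).intersection(*map(set, successes[1:]))
    failed_steps = set().union(*map(set, failures))
    return sorted(common - failed_steps)
```

The steps returned by `contrast` are candidates for distillation into memory items; failed rollouts supply the counterfactual signal by showing which steps were missing when the task did not succeed.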
Topics
- ReasoningBank
- Agent Memory Framework
- Learning from Failure
- Memory-aware Test-Time Scaling
- WebArena Benchmark
Best for: AI Architect, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by The latest research from Google.