FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast
Summary
FORGE (Failure-Optimized Reflective Graduation and Evolution) is a staged, population-based protocol that improves LLM agent decision-making through self-generated natural-language memory injected into the prompt, with no gradient updates. In its Reflexion-style inner loop, a reflection agent turns failed trajectories into reusable knowledge artifacts: textual heuristics (Rules), few-shot demonstrations (Examples), or a combination of both (Mixed). An outer loop propagates the best-performing agent's memory to the rest of the population across stages and graduates instances that have converged. Evaluated on CybORG CAGE-2, a 30-step stochastic network-defense POMDP, FORGE improved average evaluation return by 1.7-7.7x over zero-shot baselines and by 29-72% over Reflexion baselines across 12 model-representation conditions, and reduced major-failure rates to approximately 1%. The gains held across Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick, and Qwen3-235B.
Key takeaway
For research scientists developing LLM agents for complex, stochastic environments, FORGE offers a way to improve agent performance and reduce failure rates without costly model retraining. Consider a population-based memory evolution system with a broadcast mechanism, particularly for models with high zero-shot failure rates: the approach can narrow capability gaps and improve decision-making in challenging POMDPs such as network defense.
Key insights
FORGE enables LLM agents to self-evolve memory via population broadcast, significantly improving decision-making without weight updates.
Principles
- Population broadcast is critical for performance gains.
- Graduation primarily saves compute rather than improving performance.
- Weaker models benefit disproportionately from FORGE.
Method
FORGE uses a staged, population-based protocol with an inner Reflexion-style loop for memory generation (Rules, Examples, Mixed) and an outer loop for propagating best-performing memory and graduating converged instances.
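The two loops described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `Agent`, `run_stage`, `fail_below`, and `graduate_at` are hypothetical, and the summary does not specify how a failure or convergence is detected, so simple return thresholds are assumed here. The toy environment and reflection function stand in for CybORG CAGE-2 and an LLM reflection agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Memory = List[str]  # natural-language artifacts injected into the prompt


@dataclass
class Agent:
    memory: Memory = field(default_factory=list)
    score: float = float("-inf")
    graduated: bool = False


def run_stage(
    agents: List[Agent],
    run_episode: Callable[[Memory], Tuple[float, str]],
    reflect: Callable[[str, Memory], str],
    trials: int,
    fail_below: float,   # assumption: a trial "fails" if its return is below this
    graduate_at: float,  # assumption: an agent "converges" at this mean return
) -> None:
    """One stage: inner reflection loop per agent, then broadcast and graduation."""
    active = [a for a in agents if not a.graduated]
    if not active:
        return
    for agent in active:
        returns = []
        for _ in range(trials):
            ret, trajectory = run_episode(agent.memory)
            returns.append(ret)
            if ret < fail_below:
                # Inner loop: turn the failed trajectory into a memory artifact.
                agent.memory.append(reflect(trajectory, agent.memory))
        agent.score = sum(returns) / len(returns)
    # Outer loop: broadcast the best agent's memory, graduate converged agents.
    best = max(active, key=lambda a: a.score)
    for agent in active:
        if agent.score >= graduate_at:
            agent.graduated = True
        elif agent is not best:
            agent.memory = list(best.memory)


# Toy stand-ins: return grows with memory size; reflection emits a numbered rule.
def toy_episode(memory: Memory) -> Tuple[float, str]:
    return -10.0 + 4.0 * len(memory), "trajectory"


def toy_reflect(trajectory: str, memory: Memory) -> str:
    return f"rule {len(memory) + 1}"


agents = [Agent(memory=["seed rule"]), Agent(), Agent()]
run_stage(agents, toy_episode, toy_reflect, trials=2, fail_below=-1.0, graduate_at=-3.0)
after_stage_1 = [list(a.memory) for a in agents]
run_stage(agents, toy_episode, toy_reflect, trials=2, fail_below=-1.0, graduate_at=-3.0)
```

In this toy run the agent that starts with a seed rule scores best in the first stage, so the other two agents inherit its memory before the second stage begins. Graduated agents are skipped in later stages, which is where the compute saving would come from.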
In practice
- Use "Examples" for strongest returns with most models.
- Consider "Rules" for a better cost-reliability trade-off (40% fewer tokens).
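To make the choice between representations concrete, here is a rough sketch of how Rules, Examples, or Mixed memory might be injected into an agent prompt. The function `build_prompt`, the section headings, and the sample rule and example text are all hypothetical, and the summary does not describe the actual prompt format. Word count is used only as a crude proxy for token cost.

```python
from typing import List

VALID_MODES = {"rules", "examples", "mixed"}


def build_prompt(task: str, rules: List[str], examples: List[str], mode: str) -> str:
    """Prepend the selected memory representation to the task description."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    sections = []
    if mode in {"rules", "mixed"} and rules:
        bullet_list = "\n".join(f"- {rule}" for rule in rules)
        sections.append("Rules learned from past failures:\n" + bullet_list)
    if mode in {"examples", "mixed"} and examples:
        sections.append("Worked examples:\n" + "\n\n".join(examples))
    sections.append(task)
    return "\n\n".join(sections)


# Invented memory contents, for illustration only.
rules = ["Act on a host that shows signs of compromise before checking others."]
examples = [
    "Observation: suspicious activity reported on one host.\n"
    "Action taken: investigate that host first.\n"
    "Outcome: the intrusion was contained early."
]
task = "Current observation: <observation>. Choose the next defensive action."

rules_prompt = build_prompt(task, rules, examples, mode="rules")
examples_prompt = build_prompt(task, rules, examples, mode="examples")
mixed_prompt = build_prompt(task, rules, examples, mode="mixed")

# Crude size comparison: whitespace-separated words, not real tokens.
sizes = {
    "rules": len(rules_prompt.split()),
    "examples": len(examples_prompt.split()),
    "mixed": len(mixed_prompt.split()),
}
```

Demonstrations tend to be longer than one-line heuristics, so a Rules-only prompt is usually the shortest of the three, which is consistent with the cost advantage noted above.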
Topics
- FORGE Protocol
- LLM Agents
- Agent Memory Evolution
- Population Broadcast
- Reflexion
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.