FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

2026-05-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

FORGE (Failure-Optimized Reflective Graduation and Evolution) is a staged, population-based protocol that improves LLM agent decision-making through self-generated natural-language memory injected into the prompt, with no gradient updates. Its inner loop is Reflexion-style: a reflection agent turns failed trajectories into reusable knowledge artifacts, stored as textual heuristics (Rules), few-shot demonstrations (Examples), or both (Mixed). Its outer loop propagates the best-performing agent's memory to the population across stages and graduates instances that have converged. On CybORG CAGE-2, a 30-step stochastic network-defense POMDP, FORGE improved average evaluation return by 1.7-7.7x over zero-shot baselines and by 29-72% over Reflexion baselines across 12 model-representation conditions, and reduced major-failure rates to approximately 1%. The models tested were Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick, and Qwen3-235B.
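The three memory representations can be pictured as different ways of rendering stored artifacts into the agent's prompt. The sketch below is illustrative only: the `Memory` class, the `render_prompt` function, and the section headings in the prompt are hypothetical names chosen for this example, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memory:
    """Hypothetical container for self-generated memory artifacts."""
    rules: List[str] = field(default_factory=list)      # textual heuristics
    examples: List[str] = field(default_factory=list)   # few-shot demonstrations


def render_prompt(base_prompt: str, memory: Memory, mode: str) -> str:
    """Inject memory into the prompt according to the chosen representation.

    mode is one of "rules", "examples", or "mixed" (assumed labels).
    """
    if mode not in {"rules", "examples", "mixed"}:
        raise ValueError(f"unknown representation: {mode}")

    parts = [base_prompt]
    if mode in {"rules", "mixed"} and memory.rules:
        bullet_list = "\n".join(f"- {rule}" for rule in memory.rules)
        parts.append("Rules learned from past failures:\n" + bullet_list)
    if mode in {"examples", "mixed"} and memory.examples:
        parts.append("Demonstrations:\n" + "\n\n".join(memory.examples))
    return "\n\n".join(parts)
```

Because the memory lives entirely in the prompt, switching representation changes only what text is rendered; the model weights are never touched.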

Key takeaway

For research scientists developing LLM agents for complex, stochastic environments, FORGE offers a way to raise agent returns and cut failure rates without retraining the model. Consider a population-based memory evolution system with a broadcast mechanism, particularly if your model has a high zero-shot failure rate: weaker models benefit disproportionately, so the approach can narrow capability gaps in demanding POMDPs such as network defense.

Key insights

FORGE lets LLM agents evolve their own prompt-injected memory via population broadcast, improving decision-making without weight updates.

Principles

Population broadcast is critical for performance gains.
Graduation mainly saves compute rather than improving performance.
Weaker models benefit disproportionately from FORGE.

Method

FORGE uses a staged, population-based protocol with an inner Reflexion-style loop for memory generation (Rules, Examples, Mixed) and an outer loop for propagating best-performing memory and graduating converged instances.
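The two loops can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Agent` class, the callback signatures (`run_episode`, `reflect`, `evaluate`), and the graduation test (an evaluation score at or above a threshold `graduate_at`) are all hypothetical, since the summary does not specify how convergence is measured.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Agent:
    memory: List[str] = field(default_factory=list)  # natural-language artifacts
    graduated: bool = False


def forge(
    population: List[Agent],
    run_episode: Callable[[List[str]], Tuple[bool, str]],  # -> (failed, trajectory)
    reflect: Callable[[str], str],                         # trajectory -> artifact
    evaluate: Callable[[List[str]], float],                # memory -> eval return
    stages: int,
    inner_iters: int,
    graduate_at: float,  # assumed convergence criterion: score threshold
) -> List[Agent]:
    for _ in range(stages):
        active = [a for a in population if not a.graduated]
        if not active:
            break  # every instance has graduated, so no more compute is spent

        # Inner loop (Reflexion-style): reflect on failures, grow memory.
        for agent in active:
            for _ in range(inner_iters):
                failed, trajectory = run_episode(agent.memory)
                if failed:
                    agent.memory.append(reflect(trajectory))

        # Outer loop: score each agent, graduate converged ones,
        # and broadcast the best memory to the remaining agents.
        scored = [(evaluate(a.memory), a) for a in active]
        _, best = max(scored, key=lambda pair: pair[0])
        for score, agent in scored:
            if score >= graduate_at:
                agent.graduated = True
            elif agent is not best:
                agent.memory = list(best.memory)  # copy, no weight updates
    return population
```

Graduated agents are skipped in later stages, which is where the compute saving comes from; the broadcast step is what lets the rest of the population start each stage from the strongest memory found so far.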

In practice

Use "Examples" for strongest returns with most models.
Consider "Rules" for a better cost-reliability trade-off (40% fewer tokens).

Topics

FORGE Protocol
LLM Agents
Agent Memory Evolution
Population Broadcast
Reflexion

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.