REFLEX: Reflective Evolution from LLM Experience
Summary
REFLEX is a novel, training-free evolutionary framework designed to make LLM-guided policy search more transparent and sample-efficient. Existing methods often entangle the interpretation of visual behavior with code synthesis, which makes the feedback loop opaque and prevents algorithmic insights from being retained. REFLEX addresses this by structurally decoupling visual diagnosis from code generation: a vision-enabled Critic distills task-specific behavioral evidence into auditable diagnoses, and a text-optimized Actor uses those diagnoses to synthesize child policies. The Actor also draws on a persistent, self-evolving Skill Memory of reusable code snippets, which yields transparent mutation traces and transfers programmatic knowledge across runs. Evaluations on the Lunar Lander, Acrobot, and Pendulum control benchmarks, plus a 36-dimensional antenna array synthesis task, show high sample efficiency: REFLEX solves Acrobot and Pendulum in under 10 LLM calls and achieves a Normalized Weighted Score of 1.092 on Lunar Lander, substantially accelerating the early-stage discovery of transparent policies.
Key takeaway
For machine learning engineers building LLM-guided evolutionary systems, REFLEX points to a concrete architectural change: decoupling visual diagnosis from code generation yields more transparent mutation traces and enables cross-run knowledge transfer. Because this design solved Acrobot and Pendulum in under 10 LLM calls, it suggests implementing separate Critic and Actor roles backed by a self-evolving Skill Memory to accelerate the discovery of auditable policies.
Key insights
REFLEX decouples visual diagnosis from code generation in LLM-guided evolutionary search for transparent, efficient policy discovery.
Principles
- Decouple diagnosis from generation.
- Retain algorithmic insights across runs.
- Use structured diagnoses for policy synthesis.
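The cross-run retention principle can be made concrete with a small persistent store. The sketch below is an illustration under stated assumptions, not the paper's implementation: the `Skill` and `SkillMemory` names, the JSON file format, and the success-rate ranking are hypothetical choices. It shows how snippets could be scored by whether they helped produce a better child policy, saved at the end of one run, and reloaded at the start of the next.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class Skill:
    """One reusable code snippet (hypothetical schema)."""
    name: str
    description: str
    code: str      # snippet the Actor could splice into a child policy
    uses: int = 0  # times the snippet was offered to the Actor
    wins: int = 0  # times a child using it beat its parent


class SkillMemory:
    """Hypothetical persistent Skill Memory; not the REFLEX API."""

    def __init__(self) -> None:
        self.skills: dict[str, Skill] = {}

    def add(self, skill: Skill) -> None:
        # A newer version of a skill replaces the older one (the store evolves).
        self.skills[skill.name] = skill

    def record(self, name: str, improved: bool) -> None:
        skill = self.skills[name]
        skill.uses += 1
        skill.wins += int(improved)

    def top(self, k: int) -> list[Skill]:
        # Assumed ranking: Laplace-smoothed success rate, best first,
        # so the Actor's prompt shows proven snippets before untested ones.
        return sorted(
            self.skills.values(),
            key=lambda s: (s.wins + 1) / (s.uses + 2),
            reverse=True,
        )[:k]

    def save(self, path: Path) -> None:
        path.write_text(json.dumps([asdict(s) for s in self.skills.values()], indent=2))

    @classmethod
    def load(cls, path: Path) -> "SkillMemory":
        # Start empty on the first run; otherwise restore the previous run's skills.
        memory = cls()
        if path.exists():
            for entry in json.loads(path.read_text()):
                memory.add(Skill(**entry))
        return memory
```

In use, a run would call `SkillMemory.load(path)` before search, `record(...)` after each evaluated child, and `save(path)` at the end, so the next run starts with the ranked snippets from earlier runs.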
Method
REFLEX uses a vision-enabled Critic for structured behavioral diagnoses. A text-optimized Actor then synthesizes child policies, leveraging these diagnoses and a self-evolving Skill Memory of reusable code snippets.
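The Critic-Actor split can be sketched as a toy evolutionary loop. This is a minimal illustration under stated assumptions, not the REFLEX implementation: both LLMs are replaced by deterministic stubs, the "task" is fitting a linear policy `act(x) = 3 * x`, and the names `critic`, `actor`, `Diagnosis`, and `evolve` are hypothetical. The point is the information flow: the Critic sees only rollout behavior and returns a structured diagnosis, while the Actor sees only the parent policy and that diagnosis, logs each mutation, and records the rule it applied in a skill dictionary.

```python
from dataclasses import dataclass, field

TARGET_GAIN = 3.0  # toy task (assumption): the ideal policy is act(x) = 3 * x


@dataclass
class Diagnosis:
    """Structured, auditable Critic output (hypothetical schema)."""
    failure_mode: str  # "undershoot", "overshoot", or "ok"
    observed: float    # action the policy actually produced
    expected: float    # action the task called for


@dataclass
class Candidate:
    gain: float
    score: float
    trace: list = field(default_factory=list)  # human-readable mutation history


def policy_source(gain: float) -> str:
    # Policies are kept as source code, as in LLM-guided policy search.
    return f"def act(x):\n    return {gain!r} * x\n"


def rollout(source: str) -> list:
    namespace: dict = {}
    exec(source, namespace)  # toy only: never exec untrusted LLM output unsandboxed
    return [(x, namespace["act"](x)) for x in range(1, 4)]


def score(source: str) -> float:
    return -sum((y - TARGET_GAIN * x) ** 2 for x, y in rollout(source))


def critic(behavior: list) -> Diagnosis:
    # Stub for the vision-enabled Critic: it sees behavior, never the code.
    x, y = behavior[-1]
    expected = TARGET_GAIN * x
    if abs(y - expected) < 1e-9:
        return Diagnosis("ok", y, expected)
    return Diagnosis("undershoot" if y < expected else "overshoot", y, expected)


def actor(parent_gain: float, diag: Diagnosis, skills: dict) -> tuple:
    # Stub for the text-only Actor: it sees the parent and the diagnosis, not the rollout.
    # The rule is fixed here; it is stored per failure mode so later calls can reuse it.
    rule = skills.setdefault(diag.failure_mode, "rescale gain by expected/observed")
    child_gain = parent_gain * diag.expected / diag.observed  # assumes observed != 0
    return child_gain, f"{diag.failure_mode}: {rule}, {parent_gain} -> {child_gain}"


def evolve(init_gain: float, max_calls: int, skills: dict) -> tuple:
    best = Candidate(init_gain, score(policy_source(init_gain)))
    calls = 0
    while calls < max_calls:
        diag = critic(rollout(policy_source(best.gain)))
        calls += 1  # one Critic call
        if diag.failure_mode == "ok":
            break
        child_gain, note = actor(best.gain, diag, skills)
        calls += 1  # one Actor call
        child_score = score(policy_source(child_gain))
        if child_score >= best.score:  # elitist selection: keep the better policy
            best = Candidate(child_gain, child_score, best.trace + [note])
    return best, calls


skills: dict = {}
best, calls = evolve(1.0, max_calls=10, skills=skills)
print(best.gain, calls, best.trace)
```

Starting from `act(x) = 1.0 * x`, the Critic reports an undershoot, the Actor rescales the gain, and the second Critic call finds no failure, so the run uses three stub calls and leaves one auditable entry in the mutation trace.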
In practice
- Apply to control benchmarks such as Lunar Lander, Acrobot, and Pendulum.
- Synthesize antenna arrays (e.g., the 36-dimensional array task).
- Accelerate transparent policy discovery.
Topics
- Large Language Models
- Evolutionary Algorithms
- Policy Search
- Multimodal AI
- Code Generation
- Reinforcement Learning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.