ARC Prize 2025 Paper Award 2nd Place SOAR

Source: ARC Prize · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems) · Depth: Expert, extended

Summary

Julian Pel, Cedric Ka, and P.V. from Inria Bordeaux took 2nd place in the ARC Prize 2025 Paper Award for their paper, "Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC AGI." Their work introduces SOAR, an approach that lets Large Language Models (LLMs) improve themselves as evolutionary operators for program synthesis, targeting the ARC AGI benchmark. SOAR alternates between a search phase, in which the LLM generates and refines candidate programs, and a learning phase, in which the model is fine-tuned on both successful and failed attempts using hindsight experience replay. Under hindsight experience replay, a program that fails its intended task is still a correct solution for the outputs it actually produced, so it can be reused as a valid training example for that relabeled task. Learning from these "negative" attempts as well as the successes improves the model's ability to refine programs and to generate more diverse candidates, which raises performance without relying on human-engineered data or task-specific DSLs. The team released their datasets to support further research into the abstractions discovered by the algorithm.
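The relabeling idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `transform(grid)` convention, the helper names `run_program` and `hindsight_relabel`, and the toy task are hypothetical and are not taken from the SOAR code or datasets.

```python
from typing import List, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def run_program(source: str, grid: Grid) -> Grid:
    """Execute a candidate program that defines `transform(grid)`."""
    namespace: dict = {}
    exec(source, namespace)  # acceptable for a toy; real systems sandbox this
    return namespace["transform"](grid)


def hindsight_relabel(source: str, examples: List[Example]) -> List[Example]:
    """Pair each input with the output the program actually produced."""
    return [(inp, run_program(source, inp)) for inp, _ in examples]


# Hypothetical task: flip each row left-to-right.
task = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]

# A failed attempt: it flips the grid top-to-bottom instead.
candidate = "def transform(grid):\n    return grid[::-1]\n"

# The candidate does not solve the original task...
solved = all(run_program(candidate, inp) == out for inp, out in task)

# ...but it is an exact solution for the relabeled task, so the pair
# (relabeled examples, candidate) is a valid synthetic training example.
relabeled = hindsight_relabel(candidate, task)
```

Here `solved` is false for the original task, while `candidate` reproduces every output in `relabeled` by construction. That property is what lets failed search attempts contribute training data.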

Key takeaway

For AI scientists and research scientists working on hard program synthesis tasks such as ARC AGI, self-improving LLM approaches like SOAR are worth evaluating. Consider integrating an iterative search-and-learn loop with hindsight experience replay, so that the model is trained on both successful and failed program generation attempts. According to the paper, this improves performance and generalization without human-engineered data, and it may outperform methods that depend on human-curated DSLs.

Key insights

SOAR enables LLMs to self-improve in program synthesis by learning from both successes and failures through an iterative search and learning loop.

Principles

Method

SOAR alternates between a search phase and a learning phase. The search phase generates candidate solutions. The learning phase fine-tunes the LLM on synthetic data derived from both successful and failed attempts, using hindsight experience replay, so that the model becomes a better evolutionary operator for the next round of search.
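The alternation described above can be sketched as a toy loop. This is a sketch under strong assumptions, not the authors' implementation: the four-entry `PROGRAMS` table stands in for programs an LLM would write, a lookup table (`memory`) stands in for fine-tuning, each task has a single input/output pair, and the names `search`, `learn`, and `soar_like_loop` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]
Task = Tuple[Grid, Grid]  # one (input, expected output) pair, for brevity

# Toy program space: a stand-in for programs an LLM would generate.
PROGRAMS: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: g,
    "flip_rows": lambda g: tuple(row[::-1] for row in g),
    "flip_cols": lambda g: g[::-1],
    "transpose": lambda g: tuple(zip(*g)),
}


def search(memory: Dict[Task, str], task: Task,
           rng: random.Random, budget: int) -> List[str]:
    """Search phase: propose programs, trying what was learned first."""
    proposals = [memory[task]] if task in memory else []
    return proposals + rng.choices(sorted(PROGRAMS), k=budget)


def learn(memory: Dict[Task, str], task: Task, attempts: List[str]) -> None:
    """Learning phase (stand-in for fine-tuning): memorize each attempt
    against the output it actually produced. Successes and hindsight-
    relabeled failures are stored the same way."""
    inp, _expected = task
    for name in attempts:
        produced = PROGRAMS[name](inp)
        memory.setdefault((inp, produced), name)


def soar_like_loop(tasks: List[Task], iterations: int,
                   budget: int, seed: int = 0) -> List[int]:
    """Alternate search and learning; return tasks solved per iteration."""
    rng = random.Random(seed)
    memory: Dict[Task, str] = {}
    history = []
    for _ in range(iterations):
        solved = 0
        for task in tasks:
            attempts = search(memory, task, rng, budget)
            solved += any(PROGRAMS[n](task[0]) == task[1] for n in attempts)
            learn(memory, task, attempts)
        history.append(solved)
    return history


grid: Grid = ((1, 2), (3, 4))
tasks = [(grid, PROGRAMS["flip_rows"](grid)), (grid, PROGRAMS["flip_cols"](grid))]
history = soar_like_loop(tasks, iterations=4, budget=1)
```

In this toy, a wrong guess on one task can become the stored solution for another task that shares the same input, so the per-iteration solve count in `history` never decreases. A real system would replace the lookup table with fine-tuning of the LLM and would have to generalize to unseen tasks rather than memorize.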

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, AI Student

Editorial summary, takeaway, and curation by AIssential. Original article published by ARC Prize.