84.0% on ARC-AGI2 (840/1000) using LLM program synthesis + deterministic verification — no fine-tuning, no neural search
Summary
A student in Kyoto, Japan, scored 84.0% (840/1000 tasks) on the ARC-AGI2 training set by combining 127,000 lines of hand-crafted symbolic solvers with a Claude-powered program synthesis pipeline. The system runs in two stages. First, a set of 30+ specialized Python solvers handles what it can; on its own, this stage plateaued at 24.4% (244/1000 tasks). Second, for each of the remaining 756 tasks, Claude Sonnet 4.5 generates a Python `transform` function, which an external Python script verifies deterministically against all of that task's training examples. Claude Opus 4 orchestrates the process, batching tasks and managing parallel Sonnet sub-agents. This hybrid approach, which avoids fine-tuning and neural search, solved 596 of the 756 previously unsolved tasks (78.8%), and the full pipeline processes all 1000 tasks in approximately 3 hours on a MacBook.
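The headline figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported above and assumes that every task not solved by the symbolic stage was passed to the synthesis stage; the variable names are illustrative.

```python
# Figures as reported in the summary.
total_tasks = 1000
solved_symbolic = 244   # stage 1: hand-crafted solvers (24.4%)
solved_total = 840      # both stages combined (84.0%)

# Assumption: every task the symbolic stage missed went to LLM synthesis.
unsolved_after_symbolic = total_tasks - solved_symbolic
solved_by_synthesis = solved_total - solved_symbolic
synthesis_rate = solved_by_synthesis / unsolved_after_symbolic

print(unsolved_after_symbolic)   # 756
print(solved_by_synthesis)       # 596
print(f"{synthesis_rate:.1%}")   # 78.8%
```

The result matches the reported 78.8% success rate on previously unsolved tasks.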
Key takeaway
For AI Scientists and Research Scientists developing solutions for complex reasoning benchmarks like ARC-AGI2, consider adopting a neurosymbolic architecture. Your team should prioritize using LLMs for program synthesis and pair this with robust, deterministic verification to ensure accuracy and mitigate hallucination, rather than relying on direct model predictions. This approach can substantially improve performance on tasks requiring precise, verifiable outputs, though programs verified only against training examples may still show a generalization gap on evaluation sets.
Key insights
Adding LLM program synthesis with deterministic verification on top of the symbolic solvers raised the ARC-AGI2 training-set score from 24.4% to 84.0%.
Principles
- Deterministic verification catches LLM hallucinations.
- LLMs excel at code generation, not direct grid prediction.
- Hybrid neurosymbolic systems overcome plateaus.
Method
For each unsolved task, an LLM generates Python code, which is then executed and deterministically verified against all of the task's training examples. A candidate is accepted only if its output is pixel-perfect on every example.
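A minimal sketch of what such a verifier might look like is shown below. It is not the author's script: the function names `load_transform` and `verify_candidate` are hypothetical, and the task layout (a list of dicts with `"input"` and `"output"` grids) is an assumption modeled on the public ARC task format. Running untrusted code with `exec` is unsafe; a real pipeline would presumably sandbox candidates and enforce timeouts.

```python
from typing import Callable, Dict, List

Grid = List[List[int]]


def load_transform(source: str) -> Callable[[Grid], Grid]:
    """Execute candidate source and return the `transform` it defines.

    WARNING: exec() runs arbitrary code. This sketch has no sandbox or timeout.
    """
    namespace: Dict[str, object] = {}
    exec(source, namespace)
    fn = namespace.get("transform")
    if not callable(fn):
        raise ValueError("candidate does not define a callable `transform`")
    return fn


def verify_candidate(source: str, train_pairs: List[Dict[str, Grid]]) -> bool:
    """Accept a candidate only if it reproduces every training output exactly."""
    try:
        transform = load_transform(source)
        for pair in train_pairs:
            # Pass a copy so the candidate cannot mutate the reference input.
            predicted = transform([row[:] for row in pair["input"]])
            # Normalize to plain lists of ints before the exact comparison.
            normalized = [[int(cell) for cell in row] for row in predicted]
            if normalized != pair["output"]:
                return False
        return True
    except Exception:
        # Syntax errors, crashes, and malformed outputs all count as rejection.
        return False


# Toy task (hypothetical): mirror each row horizontally.
train_pairs = [
    {"input": [[1, 2], [3, 4]], "output": [[2, 1], [4, 3]]},
    {"input": [[5, 0, 0]], "output": [[0, 0, 5]]},
]
mirror_source = "def transform(grid):\n    return [row[::-1] for row in grid]\n"
identity_source = "def transform(grid):\n    return grid\n"

print(verify_candidate(mirror_source, train_pairs))    # True
print(verify_candidate(identity_source, train_pairs))  # False
```

Because acceptance is an exact equality check rather than a model judgment, a hallucinated or partially correct program cannot pass.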
In practice
- Use LLMs as code generators, not direct solvers.
- Implement strict, deterministic verification for LLM outputs.
- Combine symbolic solvers with LLM synthesis for complex problems.
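The three practices above can be combined in a single fallback loop: try the symbolic solvers first, then fall back to LLM-proposed programs, and accept anything only after it passes exact verification. The sketch below is an illustration under stated assumptions, not the original system. `solve_task`, `passes_all`, and `propose_programs` are hypothetical names, and the LLM call is replaced by an injected function that returns candidate callables.

```python
from typing import Callable, Dict, Iterable, List, Optional

Grid = List[List[int]]
Solver = Callable[[Grid], Grid]
Pairs = List[Dict[str, Grid]]


def passes_all(solver: Solver, train_pairs: Pairs) -> bool:
    """Exact-match check on every training pair; any exception is a failure."""
    try:
        return all(
            solver([row[:] for row in pair["input"]]) == pair["output"]
            for pair in train_pairs
        )
    except Exception:
        return False


def solve_task(
    train_pairs: Pairs,
    symbolic_solvers: Iterable[Solver],
    propose_programs: Callable[[Pairs], Iterable[Solver]],
) -> Optional[Solver]:
    """Stage 1: symbolic solvers. Stage 2: proposed programs. Both verified."""
    for solver in symbolic_solvers:
        if passes_all(solver, train_pairs):
            return solver
    # Stage 2 stands in for the LLM: it only proposes, it never decides.
    for candidate in propose_programs(train_pairs):
        if passes_all(candidate, train_pairs):
            return candidate
    return None  # task stays unsolved


# Toy setup (hypothetical): the task is a transpose, unknown to stage 1.
def identity(grid: Grid) -> Grid:
    return grid


def mirror(grid: Grid) -> Grid:
    return [row[::-1] for row in grid]


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


def fake_llm(train_pairs: Pairs) -> Iterable[Solver]:
    # Stand-in for LLM sub-agents returning candidate programs.
    return [mirror, transpose]


task = [
    {"input": [[1, 2], [3, 4]], "output": [[1, 3], [2, 4]]},
    {"input": [[1, 2, 3]], "output": [[1], [2], [3]]},
]
solved = solve_task(task, [identity], fake_llm)
print(solved.__name__ if solved else None)  # transpose
```

Keeping the proposer behind a plain callable means the same verification gate applies whether a candidate comes from a hand-written solver or from a model.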
Topics
- ARC-AGI2
- LLM Program Synthesis
- Deterministic Verification
- Neurosymbolic AI
- Claude Sonnet
Best for: AI Scientist, Research Scientist, AI Researcher, AI Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.