Building an ARC-2 Solver — From Socratic Panels to a Single Oracle
Summary
The article describes the development of an ARC-2 solver and its transition from a multi-agent Socratic architecture to a single, tool-augmented "Oracle" model. In the initial design, a supervisor agent orchestrated specialist panels that debated hypotheses, but this approach proved inefficient because of flawed visual grounding and high token usage. The new Oracle model, built on LangGraph, achieved 50-70% or higher accuracy on the ARC-2 public evaluation dataset, comparable to top leaderboard entries. The architecture emphasizes structured reflection, experimentation, and verification within a constrained tool environment that includes Python for analysis and a utility library for ARC archetypes. Key to its success were improved visual perception through multiple grid presentation modes, stateless execution, and mandatory verification before submission.
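To make "grid presentation modes" concrete, here is a minimal sketch of showing one ARC grid to a model in several textual forms. The summary does not say which modes the original article uses, so the three below and every function name are illustrative assumptions.

```python
from typing import List

Grid = List[List[int]]  # ARC grids are small 2D arrays of colour codes 0-9


def render_digits(grid: Grid) -> str:
    """Compact mode (hypothetical): one character per cell, one row per line."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def render_with_coords(grid: Grid) -> str:
    """Coordinate mode (hypothetical): label rows and columns explicitly."""
    header = "    " + " ".join(str(c % 10) for c in range(len(grid[0])))
    rows = [
        f"r{r:<2} " + " ".join(str(cell) for cell in row)
        for r, row in enumerate(grid)
    ]
    return "\n".join([header] + rows)


def render_sparse(grid: Grid, background: int = 0) -> str:
    """Sparse mode (hypothetical): list only the non-background cells."""
    cells = [
        f"({r},{c})={value}"
        for r, row in enumerate(grid)
        for c, value in enumerate(row)
        if value != background
    ]
    return "; ".join(cells)
```

Giving the model the same grid in more than one form offers several chances to ground positions and colours correctly, which is the weakness the summary attributes to the earlier multi-agent design.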
Key takeaway
For AI engineers developing reasoning systems, this work suggests that robust perception combined with a disciplined, tool-augmented single-agent architecture can yield better results than complex multi-agent debate. Prioritize clear visual grounding, integrate verified utility functions, and enforce mandatory verification steps to improve accuracy and reduce token costs. Treat "thinking time", the tool-call budget set by MAX_ORACLE_TOOL_CALLS, as a critical, tunable performance parameter.
Key insights
A single, tool-augmented Oracle architecture outperforms multi-agent debate for ARC-2 by prioritizing perception, structured reflection, and verification.
Principles
- Deep reflection beats debate.
- Perceptual scaffolding matters.
- Thinking time scales performance.
Method
The Oracle model runs a single-agent loop built on LangGraph: it reflects, experiments with Python tools, and verifies, and it submits an answer only after confirming correctness across all training examples.
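The verify-before-submit rule and the tool-call budget can be sketched in plain Python. This is not the article's LangGraph implementation. The `propose` callback is a hypothetical stand-in for the model's reflection step, and only the name MAX_ORACLE_TOOL_CALLS comes from the source; its value here is arbitrary.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]          # (input grid, expected output grid)
Transform = Callable[[Grid], Grid]   # a candidate rule expressed as a function

MAX_ORACLE_TOOL_CALLS = 8  # name from the article; the value is an assumption


def verify(transform: Transform, train: List[Example]) -> bool:
    """Return True only if the candidate reproduces every training output."""
    for grid_in, grid_out in train:
        try:
            if transform(grid_in) != grid_out:
                return False
        except Exception:
            # A candidate that crashes on any example counts as unverified.
            return False
    return True


def solve(
    train: List[Example],
    test_input: Grid,
    propose: Callable[[int], Transform],
    budget: int = MAX_ORACLE_TOOL_CALLS,
) -> Optional[Grid]:
    """Try candidates until one verifies or the budget is exhausted."""
    for step in range(budget):
        candidate = propose(step)  # hypothetical stand-in for model reflection
        if verify(candidate, train):
            return candidate(test_input)  # submit only after verification
    return None  # no verified rule within budget: submit nothing


# Usage: two hand-written candidates play the role of model proposals.
candidates = [
    lambda g: [row[:] for row in g],    # identity
    lambda g: [row[::-1] for row in g], # horizontal mirror
]
train_examples = [([[1, 2]], [[2, 1]])]
answer = solve(train_examples, [[3, 4, 5]], lambda i: candidates[i % 2])
```

Raising `budget` gives the loop more attempts before it gives up, which is one simple reading of "thinking time" as a tunable parameter.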
In practice
- Use Python as an analytical microscope, not a solution generator.
- Implement stateless execution for rigor.
- Encode ARC archetypes into utility libraries.
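The practices above can be illustrated with a small sketch: an archetype utility (connected-object extraction) exposed to analysis snippets that each run in a fresh namespace. The names `find_objects` and `run_stateless` are hypothetical, and the article's real utility library and execution environment may differ. Plain `exec` is used for brevity and is not a security sandbox.

```python
from typing import Any, Dict, List, Set, Tuple

Grid = List[List[int]]
Cell = Tuple[int, int]


def find_objects(grid: Grid, background: int = 0) -> List[Set[Cell]]:
    """Archetype utility: 4-connected groups of same-coloured cells."""
    height, width = len(grid), len(grid[0])
    seen: Set[Cell] = set()
    objects: List[Set[Cell]] = []
    for r in range(height):
        for c in range(width):
            if (r, c) in seen or grid[r][c] == background:
                continue
            colour = grid[r][c]
            stack = [(r, c)]
            seen.add((r, c))
            component: Set[Cell] = set()
            while stack:  # iterative flood fill from the seed cell
                y, x = stack.pop()
                component.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (
                        0 <= ny < height
                        and 0 <= nx < width
                        and (ny, nx) not in seen
                        and grid[ny][nx] == colour
                    ):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            objects.append(component)
    return objects


def run_stateless(snippet: str, grid: Grid) -> Dict[str, Any]:
    """Run an analysis snippet in a fresh namespace; nothing persists."""
    reserved = {"grid", "find_objects", "__builtins__"}
    namespace: Dict[str, Any] = {
        "grid": [row[:] for row in grid],  # copy, so snippets cannot mutate the task
        "find_objects": find_objects,
    }
    exec(snippet, namespace)  # illustration only, not a sandbox
    return {k: v for k, v in namespace.items() if k not in reserved}


# Usage: the snippet measures the grid; it does not produce the answer.
demo = [[1, 1, 0], [0, 0, 2], [0, 2, 2]]
first = run_stateless("n = len(find_objects(grid))", demo)
second = run_stateless("leaked = 'n' in globals()", demo)
```

Because each call starts from a clean namespace, a later snippet cannot rely on variables left behind by an earlier one, so every analysis step has to restate what it depends on.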
Topics
- ARC-2 Solver
- Tool-Augmented AI
- Visual Perception
- AI Benchmarks
- Single-Agent Architecture
Best for: AI Engineer, Machine Learning Engineer, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.