Structural Grid Descriptors Predict Within-Task Solver Success on ARC-AGI
Summary
A study by Ayan Pendharkar shows that structural properties of intermediate grid states can predict the success of symbolic ARC-AGI solvers. Across 44,800 runs of beam search and Stochastic DFS (SDFS) solvers on 400 ARC tasks, hand-crafted grid descriptors measured at 50% trajectory completion distinguished successful from failed runs within the same task, achieving a mean within-task best-feature AUC of 0.885 (p < 0.001). The most predictive content aligns with a single grid-complexity axis, and features selected on one solver architecture predict success on the other with AUCs ranging from 0.747 to 0.762. The "n_components_final" feature remained predictive on a held-out set (AUC = 0.765). The predictive signal is independent of solver capacity and only weakly coupled to score trajectories. Practical implications include reducing beam-search compute by 33.6% with 98.9% solve retention via early stopping, and cutting SDFS compute by 65.3% without solve loss through degenerate-trajectory detection. Additionally, 229 of 400 evaluation tasks failed due to DSL primitive library limitations.
Key takeaway
For AI scientists developing or deploying ARC-AGI solvers, structural grid descriptors are worth adding to the evaluation pipeline. The study reports that early stopping at 50% trajectory completion cut beam-search compute by 33.6% while retaining 98.9% of solves, and that degenerate-trajectory detection cut SDFS compute by 65.3% without solve loss. It is also worth auditing your DSL primitive library's coverage, since 229 of 400 evaluation tasks failed because of library limitations rather than search.
Key insights
Structural grid descriptors at 50% trajectory completion reliably predict ARC-AGI solver success, enabling significant compute reductions.
Principles
- Grid complexity is a primary success predictor.
- Predictive features transfer across solver architectures.
- DSL coverage limits task solvability.
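To make the idea of a structural grid descriptor concrete, here is a minimal sketch of one plausible complexity measure: the number of connected components in a grid. The summary names an "n_components_final" feature but does not define it, so the connectivity rule (4-neighbour, same colour) and the background colour (0) used here are assumptions, and `count_components` is a hypothetical helper rather than the study's implementation.

```python
from collections import deque


def count_components(grid, background=0):
    """Count 4-connected regions of same-coloured, non-background cells.

    Assumption: this is one reasonable reading of a component-count
    descriptor; the study's exact definition is not given in the summary.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    n_components = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] == background:
                continue
            # Found an unvisited coloured cell: flood-fill its region.
            n_components += 1
            colour = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and grid[ny][nx] == colour):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return n_components


example_grid = [
    [1, 1, 0],
    [0, 2, 0],
    [1, 0, 2],
]
n = count_components(example_grid)
```

In the example grid the two adjacent 1-cells form one region and the remaining three coloured cells are isolated, so the count is 4.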
Method
The study tests whether the conditional mutual information I(X; Y | task) is greater than 0, where X is a set of hand-crafted grid descriptors measured at 50% trajectory completion and Y is solver success, across 44,800 runs.
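A within-task best-feature AUC of the kind quoted in the summary could be computed roughly as follows. This is a sketch under stated assumptions, not the study's code: the run records, the field names `task` and `solved`, and the choice to score each feature as max(AUC, 1 - AUC) so that direction does not matter are all hypothetical. Tasks where every run succeeded or every run failed are skipped, because AUC is undefined for them.

```python
from collections import defaultdict


def auc(scores, labels):
    """Chance that a random solved run scores above a random failed run.

    Ties count as 0.5. Returns None when one class is empty.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_within_task_best_auc(runs, feature_names):
    """Average, over tasks, of the best direction-free feature AUC."""
    by_task = defaultdict(list)
    for run in runs:
        by_task[run["task"]].append(run)

    best_per_task = []
    for task_runs in by_task.values():
        labels = [r["solved"] for r in task_runs]
        oriented = []
        for name in feature_names:
            a = auc([r[name] for r in task_runs], labels)
            if a is not None:
                # Assumption: either direction of the feature counts.
                oriented.append(max(a, 1.0 - a))
        if oriented:
            best_per_task.append(max(oriented))
    if not best_per_task:
        return None
    return sum(best_per_task) / len(best_per_task)


# Toy, made-up runs purely to exercise the function.
runs = [
    {"task": "A", "solved": True, "n_components": 2},
    {"task": "A", "solved": True, "n_components": 3},
    {"task": "A", "solved": False, "n_components": 5},
    {"task": "A", "solved": False, "n_components": 6},
    {"task": "B", "solved": True, "n_components": 4},
    {"task": "B", "solved": False, "n_components": 4},
    {"task": "B", "solved": True, "n_components": 1},
    {"task": "B", "solved": False, "n_components": 7},
    {"task": "C", "solved": True, "n_components": 2},  # no failures: skipped
]
result = mean_within_task_best_auc(runs, ["n_components"])
```

Picking the best feature on the same runs used to score it gives an optimistic estimate, which is why a held-out check like the one reported for "n_components_final" matters.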
In practice
- Implement early stopping at 50% trajectory completion for beam search.
- Use degenerate-trajectory detection for SDFS.
- Evaluate DSL primitive library coverage.
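The first two items above can be sketched as simple guard functions inside a solver loop. Everything here is an assumption made for illustration: the summary does not give the stopping rule, the threshold, the direction of the descriptor, or the definition of a degenerate trajectory. This sketch assumes a higher descriptor value signals likely failure and treats a trajectory as degenerate when its last few grid states are identical.

```python
def should_stop_early(step, budget, descriptor_value, threshold,
                      checkpoint=0.5):
    """Return True if the run should be abandoned.

    Hypothetical rule: once `checkpoint` of the step budget is used,
    stop when the descriptor exceeds `threshold`. The threshold would
    need to be tuned on held-out runs.
    """
    reached_checkpoint = step >= checkpoint * budget
    return reached_checkpoint and descriptor_value > threshold


def is_degenerate(states, window=3):
    """Return True if the last `window` grid states are all identical.

    Assumption: "degenerate" means the trajectory has stopped changing;
    the study may use a different criterion.
    """
    if len(states) < window:
        return False
    tail = states[-window:]
    return all(state == tail[0] for state in tail)


# Usage with made-up values: budget of 10 steps, threshold of 8 components.
stop_now = should_stop_early(step=5, budget=10, descriptor_value=12,
                             threshold=8)
stuck = is_degenerate([[[1]], [[2]], [[2]], [[2]]])
```

Keeping both checks as pure functions makes it easy to replay them over logged trajectories and measure compute saved against solves lost before enabling them in a live solver.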
Topics
- ARC-AGI
- Symbolic AI Solvers
- Grid Descriptors
- Predictive Analytics
- Early Stopping
- Computational Efficiency
- DSL Limitations
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.