Structural Grid Descriptors Predict Within-Task Solver Success on ARC-AGI

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital - Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

A study by Ayan Pendharkar reports that structural properties of intermediate grid states can predict whether a symbolic ARC-AGI solver will succeed. Across 44,800 runs of beam-search and stochastic DFS (SDFS) solvers on 400 ARC tasks, hand-crafted grid descriptors measured at 50% trajectory completion distinguished successful from failed runs within the same task, with a mean within-task best-feature AUC of 0.885 (p < 0.001). Most of the predictive content aligns with a single grid-complexity axis, and features selected on one solver architecture predict success on the other with AUCs between 0.747 and 0.762. The "n_components_final" feature remained predictive on a held-out set (AUC = 0.765). The signal is independent of solver capacity and only weakly coupled to score trajectories. In practical terms, early stopping reduced beam-search compute by 33.6% while retaining 98.9% of solves, and degenerate-trajectory detection cut SDFS compute by 65.3% with no loss of solves. Separately, 229 of the 400 evaluation tasks failed because of limitations in the DSL primitive library.
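To make the headline metric concrete, here is a minimal sketch of how a "mean within-task best-feature AUC" could be computed. It is an illustration under stated assumptions, not the study's code: the run format (dicts with `task`, `solved`, and feature keys), the function names, and the direction-agnostic `max(a, 1 - a)` step are all hypothetical choices.

```python
from collections import defaultdict


def auc(scores, labels):
    """Rank-based AUC: chance that a random success outscores a random failure.

    Ties count as 0.5. Returns None when only one class is present.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return None
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_within_task_best_auc(runs, features):
    """Average, over tasks, of the best single-feature AUC inside each task.

    runs: iterable of dicts with keys 'task', 'solved', and one key per feature
    (a hypothetical record layout chosen for this sketch).
    """
    by_task = defaultdict(list)
    for run in runs:
        by_task[run["task"]].append(run)

    best_per_task = []
    for task_runs in by_task.values():
        labels = [r["solved"] for r in task_runs]
        aucs = []
        for name in features:
            a = auc([r[name] for r in task_runs], labels)
            if a is not None:
                # Assumption: a feature may predict in either direction.
                aucs.append(max(a, 1.0 - a))
        if aucs:
            best_per_task.append(max(aucs))
    return sum(best_per_task) / len(best_per_task) if best_per_task else None
```

Grouping by task before scoring is what makes the comparison within-task: a descriptor cannot score well merely by separating easy tasks from hard ones. Taking the maximum over features inside each task is optimistic by construction, which is one reason a held-out check such as the one reported for "n_components_final" matters.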

Key takeaway

If you develop or deploy ARC-AGI solvers, consider adding structural grid descriptors to your evaluation pipeline. In the study, early stopping for beam search at 50% trajectory completion saved 33.6% of compute, and degenerate-trajectory detection for SDFS cut compute by 65.3% without losing any solves. It is also worth auditing your DSL primitive library's coverage to identify tasks the solver cannot solve regardless of search budget.
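The early-stopping idea can be sketched as a checkpoint inside the solver loop. This is a hypothetical illustration only: `step`, `descriptor`, `threshold`, and the convention that a higher descriptor value means a less promising run are assumptions made for the example, not details taken from the study.

```python
def run_with_early_stop(step, descriptor, state, budget, threshold, checkpoint=0.5):
    """Advance a solver step by step and abort at a checkpoint if the run looks unpromising.

    step:       hypothetical function mapping a solver state to the next state
    descriptor: hypothetical function mapping a state to a scalar grid descriptor
    threshold:  assumed cut-off; values above it are treated as unpromising
    Returns (final_state, steps_used, stopped_early).
    """
    check_at = max(1, int(budget * checkpoint))
    for used in range(1, budget + 1):
        state = step(state)
        # Single check at the checkpoint (50% of the budget by default).
        if used == check_at and descriptor(state) > threshold:
            return state, used, True
    return state, budget, False


def compute_saved(steps_used, budget):
    """Fraction of the step budget that was not spent."""
    return 1.0 - steps_used / budget
```

In practice the threshold would need to be tuned on separate runs so that the compute saved is traded off against the share of solves retained, which is the trade-off the reported 33.6% and 98.9% figures describe.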

Key insights

Structural grid descriptors at 50% trajectory completion reliably predict ARC-AGI solver success, enabling significant compute reductions.
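As an example of what a structural grid descriptor might look like, the sketch below counts connected components in a grid, in the spirit of the "n_components_final" feature named in the summary. The exact definition used in the study is not given here, so the choices of 4-connectivity, same-colour regions, and colour 0 as background are assumptions.

```python
from collections import deque


def n_components(grid, background=0):
    """Count 4-connected, same-colour regions of non-background cells.

    grid: list of equal-length lists of integer colours.
    Connectivity, colour matching, and the background value are assumptions.
    """
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] == background:
                continue
            # New region found: flood-fill it with breadth-first search.
            count += 1
            colour = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == colour):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count
```

A descriptor like this depends only on the intermediate grid, not on the solver's internal score, which is consistent with the summary's point that the signal is only weakly coupled to score trajectories.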

Method

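The study tested whether the conditional mutual information I(X; Y | task) is greater than zero, where X is a hand-crafted grid descriptor measured at 50% trajectory completion and Y is run success. In other words, it asked whether the descriptors still carry information about success once task identity is accounted for, using the 44,800 runs.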
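Conditioning on the task decomposes as I(X; Y | task) = sum over tasks t of p(t) · I(X; Y | task = t). A minimal plug-in estimate of that quantity is sketched below, assuming the descriptor X has already been discretised (for example into bins) and Y is a success flag. The function names and sample layout are hypothetical, and the study's actual estimator is not described in this summary.

```python
import math
from collections import Counter, defaultdict


def mutual_information(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) samples."""
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def conditional_mutual_information(samples):
    """Plug-in estimate of I(X; Y | task) from (task, x, y) triples.

    Computes the per-task mutual information and weights it by the
    empirical task frequency p(t).
    """
    by_task = defaultdict(list)
    for task, x, y in samples:
        by_task[task].append((x, y))
    n = len(samples)
    return sum(len(pairs) / n * mutual_information(pairs) for pairs in by_task.values())
```

Plug-in estimates are biased upward on finite samples, so a positive value alone does not establish I(X; Y | task) > 0. One common remedy, offered here as a suggestion rather than as the study's procedure, is to compare the estimate against a null distribution obtained by shuffling success labels within each task.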

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.