ARC-AGI-2 Technical Report

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

The ARC-AGI-2 Technical Report describes a transformer-based system for the Abstraction and Reasoning Corpus (ARC) benchmark that scores 27.08% on the semi-private evaluation set with a 200M-parameter LongT5 model. The system recasts ARC reasoning as a sequence modeling problem, using a compact 125-token encoding and a modified LongT5 architecture. Three components drive the result. First, a principled data augmentation framework based on group symmetries, grid traversals, and cellular automata perturbations enforces invariance to changes in representation. Second, test-time training (TTT) with lightweight LoRA adaptation specializes the model to each task. Third, a symmetry-aware decoding and scoring pipeline aggregates likelihoods across augmented task views, performing "multi-perspective reasoning" over candidate solutions. The report presents the three as complementary: augmentations expand the hypothesis space, TTT sharpens local reasoning, and symmetry-based scoring improves solution consistency.
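To make the group-symmetry part of the augmentation concrete, here is a minimal sketch that generates the eight views of a grid under the dihedral group D4 (four rotations, each with an optional mirror) and applies the same transform to every input/output pair of a task. The function names (`rot90`, `flip_h`, `d4_views`, `augment_task`) and the view labels are illustrative assumptions, not taken from the report, and the grid traversals and cellular automata perturbations are not covered.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def rot90(g: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*g[::-1])]


def flip_h(g: Grid) -> Grid:
    """Mirror the grid left-to-right."""
    return [row[::-1] for row in g]


def d4_views(g: Grid) -> Dict[str, Grid]:
    """Return the 8 images of g under D4, keyed by a hypothetical label.

    "r{k}" is k clockwise quarter turns; "r{k}f" adds a left-right mirror.
    """
    views: Dict[str, Grid] = {}
    cur = [list(row) for row in g]  # copy so the caller's grid is untouched
    for k in range(4):
        views[f"r{k}"] = cur
        views[f"r{k}f"] = flip_h(cur)
        cur = rot90(cur)
    return views


def augment_task(pairs: List[Tuple[Grid, Grid]]) -> Dict[str, List[Tuple[Grid, Grid]]]:
    """Apply each D4 element consistently to all (input, output) pairs."""
    names = list(d4_views(pairs[0][0]).keys())
    return {n: [(d4_views(i)[n], d4_views(o)[n]) for i, o in pairs] for n in names}


grid = [[1, 2], [3, 4]]
print(d4_views(grid)["r1"])  # [[3, 1], [4, 2]]
```

Applying the same group element to both the input and the output keeps the underlying rule intact while changing its surface form, which is the sense in which such augmentations encourage invariance to representation.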

Key takeaway

For researchers building models for abstract reasoning, this work shows how much structured data augmentation and test-time adaptation can contribute. Consider integrating symmetry-aware priors and multi-perspective reasoning into model architectures and inference pipelines. In data-sparse settings such as ARC, this approach can improve generalization beyond simple pattern matching by pushing models to learn underlying rules rather than superficial patterns.

Key insights

Combining neural inference with structured priors and online adaptation significantly improves abstract reasoning on ARC-AGI.
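The "online adaptation" here is test-time training with LoRA. As a rough illustration of why LoRA is lightweight, the sketch below adds a low-rank update to a frozen weight matrix, so a layer computes W x + (alpha / r) B A x and only A and B would be trained. The class name `LoRALinear`, the rank, the scaling, and the initialization are assumptions for illustration; this summary does not state the report's actual LoRA configuration, and no training loop is shown.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """A frozen linear map plus a trainable low-rank correction (sketch)."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weights, shape (d_out, d_in)
        # A starts small and random, B starts at zero, so the adapted
        # layer initially reproduces the base layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))  # low-rank path B(Ax)
        return [b + self.scale * d for b, d in zip(base, delta)]

    def num_trainable(self) -> int:
        """Count entries of A and B, the only parameters LoRA would update."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
layer = LoRALinear(identity, rank=1)
print(layer.num_trainable())  # 8, versus 16 entries in the frozen weight
```

Because only the small matrices change, a separate adapter can be fitted per task at inference time without touching the base model's weights.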

Method

The method has two stages. Offline, the model is trained with curriculum learning and UL2 denoising. Online, inference combines test-time training (TTT) with LoRA, beam search decoding, symbolic filtering, and symmetry-aware scoring across the D4 transformations.
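The symmetry-aware scoring step can be sketched as follows: each candidate output is re-expressed in all eight D4 views of the task, a model scores it in each view, and the per-view log-likelihoods are averaged to rank candidates. This is a simplified sketch under stated assumptions, not the report's pipeline. The `log_prob` callable stands in for the fine-tuned model's scoring function, `toy_log_prob` is a hypothetical placeholder that treats the task as "copy the input", and beam search and symbolic filtering are omitted.

```python
from typing import Callable, List, Tuple

Grid = List[List[int]]
Transform = Callable[[Grid], Grid]
Scorer = Callable[[Grid, Grid], float]


def rot90(g: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*g[::-1])]


def flip_h(g: Grid) -> Grid:
    """Mirror the grid left-to-right."""
    return [row[::-1] for row in g]


def d4_transforms() -> List[Transform]:
    """The 8 elements of D4: k quarter turns, optionally followed by a mirror."""
    def make(k: int, flip: bool) -> Transform:
        def t(g: Grid) -> Grid:
            for _ in range(k):
                g = rot90(g)
            return flip_h(g) if flip else g
        return t
    return [make(k, f) for k in range(4) for f in (False, True)]


def rank_candidates(test_input: Grid, candidates: List[Grid],
                    log_prob: Scorer) -> List[Tuple[float, Grid]]:
    """Rank candidates by mean log-likelihood over all D4 views, best first."""
    transforms = d4_transforms()
    scored = []
    for cand in candidates:
        total = sum(log_prob(t(test_input), t(cand)) for t in transforms)
        scored.append((total / len(transforms), cand))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored


def toy_log_prob(inp: Grid, out: Grid) -> float:
    """Hypothetical stand-in scorer: penalize each cell where out differs from inp."""
    return -float(sum(a != b for ri, ro in zip(inp, out) for a, b in zip(ri, ro)))


x = [[1, 0], [0, 2]]
best_score, best = rank_candidates(x, [[[1, 1], [0, 2]], [[1, 0], [0, 2]]], toy_log_prob)[0]
print(best)  # [[1, 0], [0, 2]]
```

The toy scorer gives the same value in every view, so averaging changes nothing here. With a learned model the per-view scores generally differ, and aggregating them favors candidates that remain plausible from every perspective.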

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.