Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

Meta's FAIR team introduces AIRA-Compose and AIRA-Design, two complementary frameworks that use LLM agents to discover and optimize neural architectures autonomously, as a step toward recursive self-improvement. AIRA-Compose employs 11 agents to search a combinatorial space of computational primitives (Attention, MLP, Mamba) within a 24-hour compute budget, yielding 14 novel architectures. These include AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba-based), which, at 1B parameters, outperform Llama 3.2 and Composer-found alternatives by up to 3.8% in accuracy on downstream tasks. AIRA-Compose also identifies architectures such as AIRAformer-C and AIRAhybrid-C that scale 54-71% and 23-37% faster, respectively. AIRA-Design tasks up to 20 agents with writing novel attention mechanisms and optimizing training scripts. On the Long Range Arena (LRA) benchmark, agent-designed architectures reach accuracy within 2.3-2.6% of human SOTA. On the Autoresearch benchmark, Greedy Opus 4.5 beats the published minimum reference, reaching a validation bits-per-byte of 0.968 (lower is better).
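
For readers less familiar with the metric, bits-per-byte divides a model's total negative log-likelihood, expressed in bits, by the byte length of the evaluated text, which makes scores comparable across tokenizers. The sketch below is a minimal illustration, assuming the loss is reported as mean cross-entropy in nats per token. The function name and the sample numbers are illustrative and do not come from the paper.

```python
import math


def bits_per_byte(mean_loss_nats_per_token, num_tokens, num_bytes):
    """Convert a mean per-token cross-entropy (in nats) to bits-per-byte."""
    total_nats = mean_loss_nats_per_token * num_tokens
    total_bits = total_nats / math.log(2)  # 1 nat = 1 / ln(2) bits
    return total_bits / num_bytes


# Hypothetical numbers: 1,000 tokens covering 4,000 bytes of text.
example = bits_per_byte(mean_loss_nats_per_token=2.7, num_tokens=1_000, num_bytes=4_000)
print(f"{example:.3f} bits per byte")
```

Because the denominator counts bytes rather than tokens, a tokenizer that packs more bytes into each token does not by itself change the score.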

Key takeaway

For research scientists working on next-generation foundation model design, these agentic frameworks point to a practical direction: integrating LLM-powered agents into architecture search and optimization workflows to discover novel hybrid models and more efficient scaling behavior, which may in turn accelerate recursive self-improvement efforts. The findings suggest that agent-driven methods can yield designs that rival, and in some cases surpass, human-designed baselines.

Key insights

LLM agents can autonomously discover and optimize neural architectures and training methods, in some settings surpassing human-designed baselines.

Method

A dual-framework approach: AIRA-Compose for high-level architecture search using predefined primitives, and AIRA-Design for low-level mechanistic implementation and training script optimization.
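
To make the idea of a combinatorial space over primitives concrete, here is a rough sketch of enumerating and scoring layer patterns under a fixed evaluation budget. Only the three primitive names come from the summary. The tiling scheme, the `proxy_score` objective, and the random-sampling loop are hypothetical stand-ins, not the paper's agents, search space, or training-based evaluation.

```python
import itertools
import random

# Primitive names come from the summary; everything else here is hypothetical.
PRIMITIVES = ("attention", "mlp", "mamba")


def enumerate_block_patterns(pattern_length):
    """Return an iterator over every ordered pattern of primitives."""
    return itertools.product(PRIMITIVES, repeat=pattern_length)


def build_architecture(pattern, num_repeats):
    """Tile a block pattern into a full layer stack."""
    return list(pattern) * num_repeats


def proxy_score(architecture):
    """Placeholder objective (assumption) standing in for a real training run.

    Rewards stacks that contain a sequence mixer and an MLP, with a small
    bonus for each distinct primitive used.
    """
    has_mixer = any(p in ("attention", "mamba") for p in architecture)
    has_mlp = "mlp" in architecture
    return float(has_mixer) + float(has_mlp) + 0.1 * len(set(architecture))


def search(pattern_length, num_repeats, budget, seed=0):
    """Score up to `budget` randomly ordered patterns and keep the best one."""
    rng = random.Random(seed)
    candidates = list(enumerate_block_patterns(pattern_length))
    rng.shuffle(candidates)
    best_pattern, best_score = None, float("-inf")
    for pattern in candidates[:budget]:
        score = proxy_score(build_architecture(pattern, num_repeats))
        if score > best_score:
            best_pattern, best_score = pattern, score
    return best_pattern, best_score


best_pattern, best_score = search(pattern_length=4, num_repeats=6, budget=81)
print(best_pattern, best_score)
```

Even this toy space grows as 3^n in the pattern length, which is one reason a budgeted, guided search is attractive compared with exhaustive enumeration.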

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.