Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design
Summary
Meta's FAIR team introduces AIRA-Compose and AIRA-Design, two frameworks that use LLM agents to discover and optimize neural architectures autonomously, aiming at recursive self-improvement. AIRA-Compose employs 11 agents to search a combinatorial space of computational primitives (Attention, MLP, Mamba) within a 24-hour compute budget, yielding 14 novel architectures. These include AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba-based), which at 1B parameters outperform Llama 3.2 and Composer-found alternatives by up to 3.8% accuracy on downstream tasks. AIRA-Compose also identifies architectures such as AIRAformer-C and AIRAhybrid-C that scale 54-71% and 23-37% faster, respectively. AIRA-Design tasks up to 20 agents with writing novel attention mechanisms and optimizing training scripts. On the Long Range Arena (LRA) benchmark, agent-designed architectures come within 2.3-2.6% of human SOTA accuracy. On the Autoresearch benchmark, Greedy Opus 4.5 reaches a validation bits-per-byte of 0.968, beating the published minimum reference.
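For readers unfamiliar with the bits-per-byte metric mentioned in the summary, the sketch below shows the standard conversion from a summed cross-entropy loss in nats to bits per byte (lower is better). The function name and arguments are illustrative assumptions and are not taken from the paper or from the Autoresearch benchmark code.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) to bits per byte.

    total_nll_nats: cross-entropy summed over every predicted token of the text.
    total_bytes:    length of the same text in bytes (for example, UTF-8 encoded).
    Hypothetical helper for illustration; not the benchmark's own implementation.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by bytes normalizes per byte.
    return total_nll_nats / (math.log(2) * total_bytes)
```

Because the metric normalizes by bytes rather than tokens, it allows models with different tokenizers to be compared on the same text.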
Key takeaway
Research scientists working on next-generation foundation model design can apply LLM agents in their architecture search and optimization workflows to look for novel hybrid models that perform well and scale more efficiently, which may in turn accelerate recursive self-improvement efforts. The findings suggest that agent-driven methods yield competitive designs that rival, and in some cases surpass, human-designed baselines.
Key insights
LLM agents can autonomously discover and optimize neural architectures and training methods, approaching and in some cases surpassing human-designed baselines.
Principles
- Agent-driven search navigates vast combinatorial design spaces efficiently.
- Hybrid architectures combining Attention, MLP, and SSM (Mamba) blocks can outperform Transformer baselines such as Llama 3.2 at the 1B-parameter scale.
- Iterative refinement is crucial for low-level code generation tasks.
Method
A dual-framework approach: AIRA-Compose for high-level architecture search using predefined primitives, and AIRA-Design for low-level mechanistic implementation and training script optimization.
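To give a feel for the kind of combinatorial space the high-level search operates over, here is a minimal sketch that treats an architecture as an ordered sequence of primitives and runs plain random search over it. This is an assumption-laden stand-in: the summary does not describe the actual search space encoding or the agents' procedure, and `toy_score` is a placeholder, not a real training or evaluation signal.

```python
import random
from typing import Callable, Tuple

# Primitive names follow the summary; the sequence encoding is an assumption.
PRIMITIVES: Tuple[str, ...] = ("attention", "mlp", "mamba")
Architecture = Tuple[str, ...]


def search_space_size(depth: int) -> int:
    """Number of distinct block sequences of a given depth."""
    return len(PRIMITIVES) ** depth


def random_search(
    score: Callable[[Architecture], float],
    depth: int,
    n_trials: int,
    seed: int = 0,
) -> Tuple[Architecture, float]:
    """Baseline random search (NOT the agentic method): sample and keep the best."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    rng = random.Random(seed)
    best_arch: Architecture = ()
    best_score = float("-inf")
    for _ in range(n_trials):
        arch = tuple(rng.choice(PRIMITIVES) for _ in range(depth))
        candidate_score = score(arch)
        if candidate_score > best_score:
            best_arch, best_score = arch, candidate_score
    return best_arch, best_score


def toy_score(arch: Architecture) -> float:
    """Placeholder objective: counts adjacent blocks that differ in type."""
    return float(sum(a != b for a, b in zip(arch, arch[1:])))
```

Even this simplified encoding grows quickly: `search_space_size(16)` is 3**16, about 43 million sequences, which is why exhaustive evaluation is impractical and a guided search becomes attractive.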
In practice
- Explore hybrid Transformer-Mamba architectures for improved scaling.
- Utilize agentic frameworks for automated hyperparameter tuning.
- Implement iterative debugging for complex code generation tasks.
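The iterative-debugging point above can be sketched as a propose, run, and feed-back loop. This is a minimal illustration under stated assumptions: `propose` is a hypothetical callable standing in for an LLM call, `check` is a user-supplied validation function, and nothing here reflects how AIRA-Design itself is implemented.

```python
import traceback
from typing import Callable, Dict, Optional, Tuple


def refine_until_passes(
    propose: Callable[[Optional[str]], str],
    check: Callable[[Dict[str, object]], None],
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Ask for code, execute it, and return the error trace as feedback on failure.

    propose: hypothetical generator; receives the previous traceback (or None)
             and returns Python source as a string.
    check:   raises an exception if the executed namespace is not acceptable.
    Returns the first passing source and the round in which it passed.
    """
    feedback: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        source = propose(feedback)
        namespace: Dict[str, object] = {}
        try:
            # In real use, run generated code in a sandbox, never in-process.
            exec(source, namespace)
            check(namespace)
            return source, round_idx
        except Exception:
            # The full traceback becomes the next round's feedback.
            feedback = traceback.format_exc()
    raise RuntimeError(f"no passing candidate after {max_rounds} rounds")
```

Capping the number of rounds keeps the loop within a fixed budget, which matters when each candidate triggers an expensive training or evaluation run.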
Topics
- Agentic AI Research
- Neural Architecture Search
- Foundation Models
- Hybrid LLM Architectures
- Recursive Self-Improvement
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.