Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation) · Depth: Expert, quick

Summary

Two complementary frameworks, AIRA-Compose and AIRA-Design, let LLM agents autonomously design neural architectures for foundation models, moving beyond the standard Transformer. AIRA-Compose runs 11 agents that search over high-level architectures within a 24-hour budget, evaluating million-parameter candidates and extrapolating the top designs to 350M, 1B, and 3B scales. The search produced 14 architectures, including AIRAformers (Transformer-based) and AIRAhybrids (Transformer-Mamba), which consistently outperformed Llama 3.2 and Composer-found baselines at 1B scale. On downstream tasks, AIRAformer-D and AIRAhybrid-D improved accuracy over Llama 3.2 by 2.4% and 3.8%, respectively. AIRA-Compose also found models that scale efficiently: AIRAformer-C scales 54% faster than Llama 3.2 and 71% faster than Composer's best Transformer. AIRA-Design uses 20 agents to write novel attention mechanisms and training scripts, coming within 2.3% and 2.6% of the human state of the art on the Long Range Arena document matching and text classification benchmarks, respectively.

Key takeaway

Research scientists working on next-generation foundation models should consider integrating agentic discovery frameworks such as AIRA-Compose and AIRA-Design into their workflows. The approach offers a way to autonomously generate and optimize neural architectures that can outperform current hand-designed baselines, which could accelerate recursive self-improvement in AI systems and push out the efficient-scaling frontier for large language models.

Key insights

LLM agents can autonomously discover and optimize neural architectures, surpassing human-designed baselines.

Method

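AIRA-Compose uses 11 agents for high-level architecture search: the agents evaluate million-parameter candidates within a 24-hour budget, and the top designs are extrapolated to 350M, 1B, and 3B scales. AIRA-Design uses 20 agents for low-level mechanistic implementation, focusing on novel attention mechanisms and the training scripts that go with them.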
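
The search-then-extrapolate pattern described above can be illustrated with a toy sketch. It is not the AIRA-Compose implementation. The block vocabulary, the `proxy_score` heuristic, the number of rounds, and every function name here are assumptions made for illustration; only the agent count and the three target scales come from the summary. In a real system, each proposal would come from an LLM agent and each score from training and evaluating a small model.

```python
import random
from dataclasses import dataclass

BLOCK_TYPES = ("attention", "mamba", "mlp")  # hypothetical search vocabulary
TARGET_SCALES = ("350M", "1B", "3B")         # target scales named in the summary


@dataclass(frozen=True)
class Candidate:
    blocks: tuple  # ordered block types forming a high-level architecture
    score: float   # proxy score from a cheap small-scale evaluation


def proxy_score(blocks, rng):
    """Stand-in for training and evaluating a small model (toy heuristic)."""
    diversity = len(set(blocks)) / len(BLOCK_TYPES)
    return diversity + rng.gauss(0.0, 0.05)


def agent_propose(rng, depth=8):
    """Stand-in for one agent proposing a block sequence (random here)."""
    return tuple(rng.choice(BLOCK_TYPES) for _ in range(depth))


def search(num_agents=11, rounds=5, top_k=3, seed=0):
    """Run a fixed number of rounds (a proxy for the time budget), keep top_k."""
    rng = random.Random(seed)
    pool = {}
    for _ in range(rounds):
        for _ in range(num_agents):
            blocks = agent_propose(rng)
            if blocks not in pool:  # skip architectures already evaluated
                pool[blocks] = Candidate(blocks, proxy_score(blocks, rng))
    ranked = sorted(pool.values(), key=lambda c: c.score, reverse=True)
    return ranked[:top_k]


def extrapolation_plan(finalists):
    """Pair every finalist with every target scale for full-size training."""
    return [(c.blocks, scale) for c in finalists for scale in TARGET_SCALES]


if __name__ == "__main__":
    for blocks, scale in extrapolation_plan(search()):
        print(scale, "-".join(blocks))
```

The point of the sketch is the shape of the loop: many cheap evaluations at small scale, followed by a short list of expensive runs at the target scales.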

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.