How RecursiveMAS speeds up multi-agent inference by 2.4x and reduces token usage by 75%

2026-05-15 · Source: VentureBeat · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, medium

Summary

Researchers from the University of Illinois Urbana-Champaign and Stanford University have developed RecursiveMAS, a framework for multi-agent AI systems in which agents communicate through embedding space rather than through generated text. The approach targets three weaknesses of traditional multi-agent systems: high latency, growing token costs, and the difficulty of training the system as a cohesive whole. RecursiveMAS improves accuracy across complex domains including code generation, medical reasoning, and search, while speeding up inference by 1.2x to 2.4x and reducing token usage by up to 75%. It is also considerably cheaper to train than standard full fine-tuning or LoRA: it requires the lowest peak GPU memory and cuts training costs by more than half, which makes it a scalable and cost-effective option for custom multi-agent deployments.

Key takeaway

For AI engineers building multi-agent systems, RecursiveMAS offers an alternative to text-based agent communication that lowers operational costs and improves performance. Consider evaluating it for production workloads where inference speed and token consumption matter, especially reasoning-heavy tasks. The Apache 2.0 licensed code and trained model weights are a practical starting point for more scalable and cost-effective agentic deployments.

Key insights

RecursiveMAS lets the agents in a multi-agent system communicate through embedding space instead of text, which speeds up inference by 1.2x to 2.4x and cuts token usage by up to 75% while improving accuracy.

Principles

Latent space communication reduces overhead.
Recursive architectures deepen reasoning without adding parameters.
Optimizing connective tissue is cheaper than full model fine-tuning.

Method

RecursiveMAS uses specialized RecursiveLink modules to transmit and refine latent states between agents, so the agents can reason iteratively and continuously in embedding space without decoding intermediate text. Only these lightweight modules are trained; the core model parameters stay frozen.
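
The general idea can be sketched in plain Python. This is a toy illustration under stated assumptions, not the RecursiveMAS implementation: the names `FrozenAgent`, `Link`, `run_recursive`, and `trainable_param_count`, the 4-dimensional state, and the residual form of the link are all hypothetical. Each "agent" is a fixed random map standing in for a frozen model, a vector is handed from agent to agent instead of text, and only the links count as trainable.

```python
import math
import random

DIM = 4  # toy latent size (assumption; real hidden states are far larger)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols, scale):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class FrozenAgent:
    """Stand-in for a frozen model: a fixed map from latent state to latent state."""

    def __init__(self, rng):
        self.weights = random_matrix(rng, DIM, DIM, scale=0.5)
        self.trainable = False  # core parameters are never updated

    def forward(self, state):
        return [math.tanh(x) for x in matvec(self.weights, state)]


class Link:
    """Hypothetical lightweight connector: a residual projection between agents."""

    def __init__(self, rng):
        self.weights = random_matrix(rng, DIM, DIM, scale=0.1)
        self.trainable = True  # the only parameters a training loop would touch

    def forward(self, state):
        delta = matvec(self.weights, state)
        return [s + d for s, d in zip(state, delta)]


def run_recursive(agents, links, state, rounds):
    """Pass one latent vector around the agent loop `rounds` times; no text is decoded."""
    for _ in range(rounds):
        for agent, link in zip(agents, links):
            state = link.forward(agent.forward(state))
    return state


def trainable_param_count(modules):
    return sum(len(m.weights) * len(m.weights[0]) for m in modules if m.trainable)


rng = random.Random(0)
agents = [FrozenAgent(rng) for _ in range(3)]
links = [Link(rng) for _ in range(3)]
final_state = run_recursive(agents, links, [1.0, 0.0, 0.0, 0.0], rounds=2)
print(len(final_state), trainable_param_count(agents + links))  # prints: 4 48
```

Raising `rounds` deepens the computation without changing the parameter count, and the trainable total covers the three links only (3 × 4 × 4 = 48). A real system would use a deep learning framework, actual hidden states, and a training loop over the link parameters, none of which are shown here.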

In practice

Implement RecursiveMAS for multi-agent code generation.
Apply RecursiveMAS to enhance medical reasoning tasks.
Use RecursiveMAS for efficient search-based Q&A.

Topics

RecursiveMAS
Multi-Agent Systems
Embedding Space Communication
RecursiveLink Modules
Token Efficiency

Code references

RecursiveMAS/RecursiveMAS

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.