The Illusion of Multi-Agent Advantage
Summary
A systematic evaluation challenges the prevailing assumption that Multi-Agent Systems (MAS) consistently outperform Single-Agent Systems (SAS) such as Chain-of-Thought with Self-Consistency (CoT-SC). Researchers from Salesforce Research and HKUST (Guangzhou) found that automatically generated MAS frameworks, including DyLAN, MAS-Zero, ADAS, AFlow, MaAS, and MAS-Orchestra, frequently underperform CoT-SC across a range of reasoning and interactive tasks while incurring up to 10x higher computational cost. The study introduces the Synthetic Multi-Hop Financial Reasoning (SMFR) dataset, designed to offer explicit opportunities for multi-agent advantages such as task decomposition and parallelization. On SMFR, an expert-architected MAS significantly outperforms automated MAS and reaches up to 96.5% accuracy with GPT-5, against 57.0% for CoT-SC, at comparable cost. Deconstructing the architectures showed that automated MAS suffer from "architectural bloat": superficial complexity, such as redundant agent roles and verifier biases, does not translate into functional utility, and the systems often collapse into basic CoT-SC-like execution.
Key takeaway
AI architects and machine learning engineers considering Multi-Agent Systems for complex reasoning tasks should critically re-evaluate the perceived advantages of those systems. This research indicates that current automated MAS frameworks often add significant computational overhead (up to 10x) without outperforming strong single-agent baselines such as CoT-SC. Focus instead on designing MAS with explicit, human-engineered task decomposition and role specialization, especially for problems with clear parallelization opportunities; on SMFR this approach delivered substantial accuracy gains at a cost comparable to CoT-SC. Avoid black-box automated MAS generation, which frequently leads to architectural bloat and functional redundancy.
Key insights
Automated Multi-Agent Systems often incur high costs for superficial complexity, failing to outperform simpler Single-Agent Systems.
Principles
- Multi-agent systems require explicit architectural design for functional utility.
- Automated MAS often degenerate into redundant ensembling, not true collaboration.
- MAS benefits are contingent on underlying LLM competency and task suitability.
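To make "redundant ensembling" concrete, it helps to see how little machinery the CoT-SC baseline needs: sample several reasoning chains and take a majority vote over their final answers. The sketch below is a minimal illustration, not the study's implementation. `stub_sampler` and its canned answers are hypothetical stand-ins for a real LLM call.

```python
from collections import Counter
from typing import Callable, List


def self_consistency(sample_answer: Callable[[str, int], str],
                     question: str,
                     n_samples: int = 5) -> str:
    """Sample n reasoning chains and return the most common final answer."""
    answers: List[str] = [sample_answer(question, i) for i in range(n_samples)]
    # Majority vote; ties go to the answer that was seen first.
    return Counter(answers).most_common(1)[0][0]


def stub_sampler(question: str, seed: int) -> str:
    """Hypothetical stand-in for one sampled chain-of-thought completion."""
    canned = ["42", "42", "41", "42", "40"]
    return canned[seed % len(canned)]


if __name__ == "__main__":
    print(self_consistency(stub_sampler, "What is 6 * 7?"))
```

A multi-agent pipeline whose agents all receive the same context and whose outputs are merged by a vote is doing essentially this, only with more calls.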
Method
The study systematically evaluates automated MAS against CoT-SC on diverse benchmarks, introduces the SMFR diagnostic dataset, and deconstructs MAS architectures to identify functional failures.
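One way to picture this kind of head-to-head comparison is a small harness that reports a candidate system's accuracy change and cost ratio relative to the single-agent baseline. The sketch below is an assumption about how such bookkeeping could look, not the paper's evaluation code. `RunResult`, `compare_to_baseline`, and every number in the example are hypothetical and are not results from the study.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class RunResult:
    system: str
    correct: int
    total: int
    cost: float  # hypothetical cost unit, e.g. dollars or tokens

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def compare_to_baseline(baseline: RunResult,
                        candidate: RunResult) -> Dict[str, Union[str, float]]:
    """Report accuracy gain and cost multiple of a candidate over the baseline."""
    return {
        "system": candidate.system,
        "accuracy_delta": candidate.accuracy - baseline.accuracy,
        "cost_ratio": candidate.cost / baseline.cost,
    }


if __name__ == "__main__":
    # Illustrative numbers only; they are NOT taken from the paper.
    cot_sc = RunResult("CoT-SC", correct=70, total=100, cost=1.0)
    auto_mas = RunResult("automated-MAS", correct=65, total=100, cost=10.0)
    print(compare_to_baseline(cot_sc, auto_mas))
```

A negative `accuracy_delta` paired with a `cost_ratio` well above 1 is the pattern the summary describes for automated MAS.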
In practice
- Prioritize strong single-agent baselines like CoT-SC for cost-efficiency.
- Design MAS with explicit task decomposition and context separation.
- Avoid automated MAS frameworks that generate "architectural bloat."
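The second recommendation can be sketched as an orchestrator that splits a multi-hop question into independent sub-tasks, gives each worker only the context it needs, runs the workers in parallel, and aggregates the results. This is a toy illustration under stated assumptions, not the expert-architected system from the study. The company names, figures, `growth_agent`, and `orchestrate` are all hypothetical, and a plain function stands in for each LLM-backed agent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical toy data: each company's context is stored separately.
FILINGS: Dict[str, Dict[str, float]] = {
    "AlphaCo": {"revenue_2023": 120.0, "revenue_2024": 150.0},
    "BetaCo": {"revenue_2023": 200.0, "revenue_2024": 220.0},
}


def growth_agent(company: str, context: Dict[str, float]) -> float:
    """Stand-in for a specialised sub-agent; it sees only one company's data."""
    prev, curr = context["revenue_2023"], context["revenue_2024"]
    return (curr - prev) / prev


def orchestrate(companies: List[str]) -> str:
    """Decompose into per-company hops, run them in parallel, then aggregate."""
    with ThreadPoolExecutor(max_workers=len(companies)) as pool:
        futures = {c: pool.submit(growth_agent, c, FILINGS[c]) for c in companies}
        growth = {c: f.result() for c, f in futures.items()}
    # Final hop: pick the company with the highest revenue growth.
    return max(growth, key=growth.get)


if __name__ == "__main__":
    print(orchestrate(["AlphaCo", "BetaCo"]))
```

The design choice to note is that the decomposition and the per-worker context are fixed by the designer up front, so each call does distinct work instead of repeating the same full-context reasoning.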
Topics
- Multi-Agent Systems
- Single-Agent Systems
- Chain-of-Thought
- LLM Evaluation
- Architectural Bloat
- Cost-Efficiency
- Synthetic Benchmarks
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.