The Illusion of Multi-Agent Advantage

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics) · Depth: Expert, extended

Summary

A systematic evaluation challenges the prevailing assumption that Multi-Agent Systems (MAS) consistently outperform Single-Agent Systems (SAS) like Chain-of-Thought with Self-Consistency (CoT-SC). Researchers from Salesforce Research and HKUST (Guangzhou) found that automatically generated MAS frameworks, including DyLAN, MAS-Zero, ADAS, AFlow, MaAS, and MAS-Orchestra, frequently underperform CoT-SC across various reasoning and interactive tasks, despite incurring up to 10x higher computational costs. The study introduces the Synthetic Multi-Hop Financial Reasoning (SMFR) dataset, specifically designed to offer explicit opportunities for multi-agent advantages like task decomposition and parallelization. On SMFR, an expert-architected MAS significantly outperforms automated MAS, achieving up to 96.5% accuracy with GPT-5 compared to 57.0% for CoT-SC, at comparable costs. Architectural deconstruction revealed that automated MAS suffer from "architectural bloat," where superficial complexity, such as redundant agent roles and verifier biases, fails to translate into functional utility, often collapsing into basic CoT-SC-like execution.
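For readers unfamiliar with the single-agent baseline, the following is a minimal sketch of the self-consistency idea behind CoT-SC: sample several independent reasoning paths and keep the most frequent final answer. The names `self_consistency`, `sample_answer`, and `make_stub_sampler` are hypothetical, and the stub sampler stands in for a real model call; this is not the paper's evaluation code.

```python
from collections import Counter
from typing import Callable, List


def self_consistency(
    sample_answer: Callable[[str], str],
    question: str,
    n_samples: int = 5,
) -> str:
    """Sample n_samples reasoning paths and return the most common final answer.

    `sample_answer` is a hypothetical callable that runs one chain-of-thought
    sample and returns only its final answer as a string.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    answers: List[str] = [sample_answer(question) for _ in range(n_samples)]
    # Majority vote over final answers; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_stub_sampler(canned: List[str]) -> Callable[[str], str]:
    """Build a stand-in for a model call that replays canned answers in order."""
    answers = iter(canned)
    return lambda question: next(answers)


if __name__ == "__main__":
    sampler = make_stub_sampler(["42", "41", "42", "42", "40"])
    print(self_consistency(sampler, "What is 6 * 7?", n_samples=5))
```

The cost of this baseline grows linearly with the number of samples, which is what makes it a useful reference point when a multi-agent design spends several times more compute.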

Key takeaway

AI architects and machine learning engineers considering Multi-Agent Systems for complex reasoning tasks should critically re-evaluate the perceived advantages. The research indicates that current automated MAS frameworks often add significant computational overhead (up to 10x) without outperforming strong single-agent baselines like CoT-SC. Where a problem offers clear opportunities for decomposition and parallelization, design the MAS with explicit, human-engineered task decomposition and role specialization instead: on SMFR, this approach delivered substantial accuracy gains at comparable cost. Avoid black-box automated MAS generation, which frequently leads to architectural bloat and functional redundancy.
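As a rough illustration of explicit task decomposition with parallel sub-tasks, the sketch below runs independent sub-task callables concurrently and then combines their results in a single aggregation step. Everything here is an assumption for illustration: `run_decomposed`, the sub-task names, and the toy figures are hypothetical, and the summary does not describe the actual architecture of the paper's expert-designed MAS.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_decomposed(
    subtasks: Dict[str, Callable[[], float]],
    aggregate: Callable[[Dict[str, float]], float],
) -> float:
    """Run independent sub-tasks in parallel, then combine their results.

    Each sub-task stands in for one specialized agent call (hypothetical);
    `aggregate` stands in for the final reasoning step over their outputs.
    """
    # max_workers must be positive, so guard against an empty sub-task dict.
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in subtasks.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    return aggregate(results)


if __name__ == "__main__":
    # Toy multi-hop question: look up two figures independently,
    # then compute a margin from them. The values are made up.
    margin = run_decomposed(
        {"revenue": lambda: 200.0, "cost": lambda: 150.0},
        lambda r: (r["revenue"] - r["cost"]) / r["revenue"],
    )
    print(margin)
```

The design point is that each role has a fixed, non-overlapping responsibility chosen by a person, so no agent duplicates another's work.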

Key insights

Automated Multi-Agent Systems often incur high costs for superficial complexity, failing to outperform simpler Single-Agent Systems.

Method

The study systematically evaluates automated MAS against CoT-SC on diverse benchmarks, introduces the SMFR diagnostic dataset, and deconstructs MAS architectures to identify functional failures.

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.