Evaluating Multi-Agent LLM Architectures for Rare Disease Diagnosis

2026-03-10 · Source: cs.MA updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Multiagent Systems · Depth: Advanced, quick

Summary

A study submitted on March 6, 2026, evaluated four multi-agent large language model (LLM) architectures for rare disease diagnosis across 302 cases spanning 33 rare disease categories. The architectures tested were Control (single agent), Hierarchical, Adversarial, and Collaborative. The Hierarchical topology achieved the highest diagnostic accuracy at 50.0%, narrowly ahead of Collaborative (49.8%) and Control (48.5%). The Adversarial topology performed far worse, reaching only 27.3% accuracy and exhibiting a large "Reasoning Gap": the system surfaced correct diagnoses and then rejected them. Performance was strongest for Allergic diseases and Toxic Effects and weakest for Cardiac Malformation and Respiratory cases. Notably, all multi-agent systems, including the Adversarial one, outperformed the single-agent baseline in the Bone and Thoracic disease categories.
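
To put the headline figures side by side, the short Python sketch below computes each topology's difference from the single-agent Control in percentage points. The accuracy values are the ones reported above; the variable and function names are illustrative only.

```python
# Overall accuracies (%) as reported in the summary above.
ACCURACY = {
    "Control": 48.5,
    "Hierarchical": 50.0,
    "Collaborative": 49.8,
    "Adversarial": 27.3,
}

def delta_vs_control(accuracy, baseline="Control"):
    """Return each topology's accuracy minus the baseline, in percentage points."""
    base = accuracy[baseline]
    return {
        name: round(value - base, 1)
        for name, value in accuracy.items()
        if name != baseline
    }

for name, delta in delta_vs_control(ACCURACY).items():
    print(f"{name}: {delta:+.1f} points vs Control")
```

The best multi-agent design gains 1.5 points over the baseline, while the Adversarial design loses 21.2 points.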

Key takeaway

For AI researchers developing diagnostic LLMs, these findings indicate that simply adding agents does not ensure better performance: the best multi-agent topology beat the single-agent baseline by only 1.5 percentage points overall. Prioritize dynamic topology selection, choosing the design per task rather than assuming one fits all. Favor Hierarchical or Collaborative designs, particularly for the Bone and Thoracic disease categories, and avoid Adversarial configurations, which can introduce artificial doubt and reduce accuracy.
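
One way to read "dynamic topology selection" is as a per-category routing table built from held-out validation accuracy. The Python sketch below is a minimal illustration under that assumption, not the study's method. `build_router`, `select_topology`, and every accuracy number in the example are hypothetical placeholders; the default of Hierarchical reflects only its top overall accuracy in the summary above.

```python
def build_router(validation_accuracy, default="Hierarchical"):
    """Map each disease category to its best-scoring topology.

    validation_accuracy: {category: {topology: accuracy}} from held-out cases.
    Categories missing from the table fall back to `default`.
    """
    table = {
        category: max(scores, key=scores.get)
        for category, scores in validation_accuracy.items()
        if scores  # skip categories with no measurements
    }

    def select_topology(category):
        return table.get(category, default)

    return select_topology

# Placeholder numbers for illustration only; NOT taken from the study.
example_scores = {
    "Bone": {"Control": 0.40, "Hierarchical": 0.55, "Collaborative": 0.50},
    "Cardiac Malformation": {"Control": 0.30, "Hierarchical": 0.28, "Collaborative": 0.29},
}

route = build_router(example_scores)
print(route("Bone"))
print(route("Cardiac Malformation"))
print(route("Unseen category"))
```

Keeping the single-agent Control in the candidate set matters: in categories where extra agents do not help, the router can fall back to the cheaper baseline.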

Key insights

Multi-agent LLM architectures do not guarantee improved diagnostic accuracy; topology selection is critical.

Principles

Increased system complexity does not guarantee better reasoning.
Adversarial topologies can degrade diagnostic performance.

Method

The study introduced a "Reasoning Gap" metric to quantify the difference between internal knowledge retrieval and final diagnostic accuracy in LLM-based diagnostic systems.
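
The summary does not give the exact formula, so the following Python sketch is one plausible reading rather than the study's implementation. It assumes that "internal knowledge retrieval" means the correct diagnosis appeared among the agents' intermediate candidates, and it defines the gap as that retrieval rate minus final accuracy. The `CaseResult` fields and the `reasoning_gap` function are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class CaseResult:
    truth: str             # reference diagnosis for the case
    candidates: list[str]  # diagnoses raised during deliberation (assumed to be logged)
    final: str             # diagnosis the system finally reported

def reasoning_gap(results):
    """Retrieval rate minus final accuracy, both as fractions of all cases.

    Assumed definition: a case counts as 'retrieved' when the true diagnosis
    appears among the intermediate candidates, and as 'correct' when it is
    the final answer.
    """
    if not results:
        raise ValueError("need at least one case")
    n = len(results)
    retrieved = sum(r.truth in r.candidates for r in results)
    correct = sum(r.final == r.truth for r in results)
    return retrieved / n - correct / n

# Toy data for illustration only; NOT from the study.
toy = [
    CaseResult("A", ["A", "B"], "A"),  # retrieved and kept
    CaseResult("C", ["C", "D"], "D"),  # retrieved, then rejected
    CaseResult("E", ["F"], "F"),       # never retrieved
    CaseResult("G", ["G"], "G"),       # retrieved and kept
]
print(reasoning_gap(toy))
```

Under this reading, a large gap means the system often surfaces the right answer and then discards it, which matches the behavior the summary attributes to the Adversarial topology.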

In practice

Consider Hierarchical or Collaborative topologies for diagnosis.
Avoid Adversarial topologies for critical diagnostic tasks.

Topics

Multi-agent LLMs
Rare Disease Diagnosis
LLM Architectures
Diagnostic Accuracy
Reasoning Gap Metric

Best for: AI Researcher, AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.