DPBench: Structural Determinants of Multi-Agent LLM Coordination Under Simultaneous Resource Contention

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering) · Depth: Expert, extended

Summary

DPBench is an open-source benchmark that evaluates how large language models coordinate in multi-agent systems under simultaneous resource contention, using the classic Dining Philosophers problem. Experiments with GPT-5.2, Claude Opus 4.5, and Grok 4.1 show a sharp asymmetry: LLMs coordinate effectively when decisions are sequential, with GPT-5.2 reaching a 0% deadlock rate, but fail when decisions are simultaneous, with deadlock rates exceeding 95% in some conditions. The failure stems from "convergent reasoning": agents independently adopt identical strategies that, when executed concurrently, guarantee deadlock. Counterintuitively, enabling communication between agents does not improve coordination and can even increase deadlock rates, an effect attributed to delayed messages and inconsistent message-action alignment. The findings suggest that multi-agent LLM systems requiring concurrent resource access should rely on external coordination mechanisms rather than emergent coordination.
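The convergent-reasoning failure can be illustrated with a small toy simulation. This is a minimal sketch and not DPBench's code: the function `deadlocks`, the rule inside `decide`, and the one-action-per-round model are all assumptions made for illustration. Every agent follows the same rule (start only if both forks look free, left fork first). When all agents decide from the same snapshot they all grab their left fork and then block each other; when they decide one at a time against the live table, the same rule completes.

```python
def deadlocks(n: int, simultaneous: bool, max_rounds: int = 100) -> bool:
    """Toy model (hypothetical, not DPBench): True if the table deadlocks."""
    owner = [None] * n    # owner[f]: philosopher currently holding fork f
    eaten = [False] * n   # each philosopher needs to eat once

    def decide(i, view):
        """Identical rule for every agent, based on the state it observes."""
        left, right = i, (i + 1) % n
        if view[left] == i:                      # already holding the left fork
            return "eat" if view[right] is None else "wait"
        if view[left] is None and view[right] is None:
            return "grab_left"                   # both look free, so start
        return "wait"

    def apply(i, action):
        """Apply an action to the live table; return True if progress was made."""
        left, right = i, (i + 1) % n
        if action == "grab_left" and owner[left] is None:
            owner[left] = i
            return True
        if action == "eat" and owner[left] == i and owner[right] is None:
            owner[left] = None                   # eat, then put the fork back
            eaten[i] = True
            return True
        return False

    for _ in range(max_rounds):
        if all(eaten):
            return False
        hungry = [i for i in range(n) if not eaten[i]]
        if simultaneous:
            snapshot = list(owner)               # everyone sees the same state
            actions = [(i, decide(i, snapshot)) for i in hungry]
            progress = [apply(i, a) for i, a in actions]
        else:
            # each agent observes the table after the previous agent has acted
            progress = [apply(i, decide(i, owner)) for i in hungry]
        if not any(progress):
            return True                          # nobody can move: deadlock
    return not all(eaten)


for n in (3, 5):
    print(n, "simultaneous:", deadlocks(n, True), "sequential:", deadlocks(n, False))
```

In this toy model the deadlock comes entirely from the timing of decisions, since the decision rule is identical in both modes.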

Key takeaway

AI architects designing multi-agent LLM systems that require concurrent resource access should implement explicit external coordination mechanisms. Relying on emergent coordination or inter-agent communication for simultaneous decisions is likely to produce high failure rates, because current LLMs exhibit "convergent reasoning" that causes deadlocks. Prefer sequential decision protocols where possible, or add explicit arbitration over shared resources to prevent system-wide failures.
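One common form of external coordination is a central arbiter that hands out both resources in a single atomic step. The sketch below is an assumption for illustration, not a mechanism described in the paper: `ForkArbiter` and `dine` are hypothetical names, and Python threads stand in for agents. No agent ever holds one fork while waiting for the other, so the circular wait behind the deadlock cannot form.

```python
import threading


class ForkArbiter:
    """Hypothetical external coordinator: grants both forks at once or makes the caller wait."""

    def __init__(self, n: int):
        self.n = n
        self.in_use = [False] * n
        self.cond = threading.Condition()

    def acquire(self, i: int) -> None:
        left, right = i, (i + 1) % self.n
        with self.cond:
            # Block until both forks are free, then take them together.
            self.cond.wait_for(lambda: not (self.in_use[left] or self.in_use[right]))
            self.in_use[left] = self.in_use[right] = True

    def release(self, i: int) -> None:
        left, right = i, (i + 1) % self.n
        with self.cond:
            self.in_use[left] = self.in_use[right] = False
            self.cond.notify_all()   # wake waiters so they can re-check their forks


def dine(n: int = 5, meals_each: int = 20) -> list:
    """Run n philosopher threads through the arbiter; return meals eaten per philosopher."""
    arbiter = ForkArbiter(n)
    meals = [0] * n

    def philosopher(i: int) -> None:
        for _ in range(meals_each):
            arbiter.acquire(i)
            meals[i] += 1            # "eat"; only philosopher i writes meals[i]
            arbiter.release(i)

    threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return meals


print(dine(5, 20))
```

This design rules out deadlock but does not by itself guarantee fairness; a production arbiter would likely add queueing or priorities.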

Key insights

LLMs struggle with simultaneous multi-agent coordination due to "convergent reasoning," even with communication.

Method

DPBench evaluates LLM coordination using the Dining Philosophers problem across 8 conditions varying decision timing, group size (3 or 5 agents), and communication. It measures deadlock rate, throughput, and fairness.
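The experimental grid and the three metrics can be sketched as follows. Everything here is an assumption for illustration: the summary does not give DPBench's metric definitions or data layout, so the episode fields (`deadlocked`, `meals`, `steps`) are hypothetical, throughput is taken as meals per step, and fairness is computed with Jain's index as a stand-in. The eight conditions are assumed to form a full 2×2×2 grid, which matches the stated count.

```python
from itertools import product

# Assumed full factorial design: 2 timings x 2 group sizes x 2 communication settings.
conditions = [
    {"timing": timing, "agents": agents, "communication": comm}
    for timing, agents, comm in product(
        ("sequential", "simultaneous"), (3, 5), (False, True)
    )
]


def deadlock_rate(episodes):
    """Fraction of episodes that ended in deadlock (hypothetical episode format)."""
    return sum(1 for e in episodes if e["deadlocked"]) / len(episodes)


def throughput(episode):
    """Assumed definition: total meals eaten per simulation step."""
    return sum(episode["meals"]) / episode["steps"]


def jain_fairness(meals):
    """Jain's fairness index, used as a stand-in: 1.0 when all agents eat equally."""
    total = sum(meals)
    if total == 0:
        return 0.0  # convention assumed here: no meals at all counts as unfair
    return total ** 2 / (len(meals) * sum(m * m for m in meals))


episodes = [
    {"deadlocked": False, "meals": [2, 2, 2], "steps": 12},
    {"deadlocked": True, "meals": [6, 0, 0], "steps": 12},
]
print(len(conditions), deadlock_rate(episodes))
print([throughput(e) for e in episodes], [jain_fairness(e["meals"]) for e in episodes])
```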

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.