Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play
Summary
Multi-Agent Fictitious Play (MAFP) is a paradigm for LLM-based multi-agent decision-making that targets "stance entanglement," the situation in which stakeholders' decisions depend on one another. Unlike existing systems that handle "execution complexity," MAFP formulates decision-making as an equilibrium-seeking process: inspired by fictitious play in game theory, it iteratively updates each agent's decision by best-responding to the empirical mixture of the other agents' past decisions. Evaluated across 13 competitive game and negotiation scenarios with Qwen3.5-35B-A3B as the action model, MAFP outperformed single-round and multi-round baselines, including CoT, Self-Reflection, Debate, and Theory-of-Mind. It achieved the highest average tournament strength (0.533) and robustness (0.421), and was notably effective in complex scenarios involving imperfect information or mixed-strategy equilibria.
Key takeaway
AI Architects designing LLM-based multi-agent systems for strategic decision-making should consider implementing Multi-Agent Fictitious Play (MAFP). The framework addresses "stance entanglement" by iteratively refining each agent's policy against the empirical mixture of its opponents' policies, which yields more robust and less exploitable outcomes. In the reported evaluation, the benefit was clearest in complex scenarios involving imperfect information or mixed-strategy equilibria, where single-chain reasoning and traditional divide-and-conquer methods fall short.
Key insights
Iterative fictitious play with LLM agents resolves "stance entanglement" in complex decision-making by decomposing recursive anticipation into sequential best responses.
Principles
- Decision complexity arises from "stance entanglement" among interdependent stakeholders.
- Robust decisions are equilibrium-seeking, maximizing payoff without exploitable weaknesses.
- Fictitious play decomposes recursive anticipation into iterative best responses.
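The third principle can be written down using the textbook form of fictitious play. The notation below is an assumption for illustration and is not taken from the article: $\pi_j^{s}$ is agent $j$'s policy at iteration $s$, $\bar{\pi}_{-i}^{t}$ is the empirical mixture of the other agents' past policies, and $u_i$ is agent $i$'s payoff.

```latex
% Empirical mixture of each opponent j's policies up to iteration t
\bar{\pi}_{j}^{t} = \frac{1}{t} \sum_{s=1}^{t} \pi_{j}^{s}, \qquad j \neq i

% Agent i best-responds to that mixture at the next iteration
\pi_{i}^{t+1} \in \arg\max_{\pi_i} \; u_i\!\left(\pi_i,\ \bar{\pi}_{-i}^{t}\right)
```

Each agent only ever solves a single best-response problem against a fixed mixture, which is how the recursive "I think that you think" chain is replaced by a sequence of simpler steps.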
Method
MAFP initializes each agent with a policy, then repeats two steps: it aggregates the opponents' past policies into an empirical mixture, and it generates a best-response policy against that mixture.
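The loop described above mirrors classical fictitious play, which can be sketched on a small matrix game. The example below is a minimal illustration under stated assumptions, not the article's implementation: it uses rock-paper-scissors with a fixed payoff table in place of LLM-generated policies, and the names `best_response`, `empirical_mixture`, and `fictitious_play` are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def best_response(payoff: Matrix, opponent_mix: Sequence[float]) -> int:
    """Return the action with the highest expected payoff against opponent_mix.

    payoff[i][j] is the payoff for playing action i when the opponent plays j.
    Ties are broken in favour of the lowest action index.
    """
    expected = [sum(p * q for p, q in zip(row, opponent_mix)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)


def empirical_mixture(counts: Sequence[int]) -> List[float]:
    """Turn action counts into the empirical frequency of past play."""
    total = sum(counts)
    return [c / total for c in counts]


def fictitious_play(
    payoff_a: Matrix, payoff_b: Matrix, rounds: int = 3000
) -> Tuple[List[float], List[float]]:
    """Run two-player fictitious play and return both empirical mixtures."""
    # Assumed initial policy: each player has played action 0 once.
    counts_a = [1] + [0] * (len(payoff_a) - 1)
    counts_b = [1] + [0] * (len(payoff_b) - 1)
    for _ in range(rounds):
        # Step 1: aggregate each opponent's past play into a mixture.
        mix_a = empirical_mixture(counts_a)
        mix_b = empirical_mixture(counts_b)
        # Step 2: each player best-responds to the other's mixture.
        action_a = best_response(payoff_a, mix_b)
        action_b = best_response(payoff_b, mix_a)
        counts_a[action_a] += 1
        counts_b[action_b] += 1
    return empirical_mixture(counts_a), empirical_mixture(counts_b)


# Rock-paper-scissors from the acting player's point of view:
# rows are own action (rock, paper, scissors), columns the opponent's.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

mix_a, mix_b = fictitious_play(RPS, RPS, rounds=3000)
print(mix_a, mix_b)
```

In this zero-sum game the empirical mixtures drift toward the uniform mixed-strategy equilibrium even though every individual move is a deterministic best response. In an LLM setting, the table lookup in `best_response` would presumably be replaced by a model call that proposes a policy against a description of the opponents' mixture; that substitution is an assumption here, not a detail given in the summary.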
In practice
- Apply MAFP to competitive games and negotiation scenarios.
- Evaluate policies using "tournament strength" and "robustness" metrics.
- Use Qwen3.5-35B-A3B as the action model for policy execution, as in the reported evaluation.
Topics
- Large Language Models
- Multi-Agent Systems
- Game Theory
- Fictitious Play
- Decision-Making
- Stance Entanglement
- Strategic Reasoning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.