Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Multi-Agent Fictitious Play (MAFP) is a paradigm for LLM-based multi-agent decision-making that targets "stance entanglement": the situation in which each stakeholder's decision depends on what the other stakeholders decide. Existing multi-agent systems mainly handle "execution complexity"; MAFP instead formulates decision-making as an equilibrium-seeking process. Inspired by fictitious play from game theory, it iteratively updates each agent's decision by best-responding to the empirical mixture of the other agents' past decisions. In an evaluation across 13 competitive game and negotiation scenarios, with Qwen3.5-35B-A3B as the action model, MAFP outperformed single-round and multi-round baselines, including CoT, Self-Reflection, Debate, and Theory-of-Mind. It achieved the highest average tournament strength (0.533) and robustness (0.421), showing its strength in complex scenarios that involve imperfect information or mixed-strategy equilibria.

Key takeaway

AI architects designing LLM-based multi-agent systems for strategic decision-making should consider Multi-Agent Fictitious Play (MAFP). The framework addresses "stance entanglement" by iteratively refining each agent's policy against the empirical mixture of its opponents' past policies, which in the reported evaluation led to more robust and less exploitable outcomes. The benefit is clearest in complex scenarios involving imperfect information or mixed-strategy equilibria, where single-chain reasoning and traditional divide-and-conquer methods fall short.

Key insights

Iterative fictitious play with LLM agents resolves "stance entanglement" in complex decision-making by decomposing recursive anticipation into sequential best responses.
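
For reference, the "sequential best responses" idea can be written in the notation of classical fictitious play. This is the standard textbook formulation, not notation taken from the paper: \sigma_i^{s} denotes agent i's policy at iteration s, \sigma_{-i}^{s} the policies of all other agents, and u_i agent i's utility, all of which are symbols assumed here for illustration.

```latex
% Empirical mixture of the other agents' past policies after t iterations
\bar{\sigma}_{-i}^{\,t} = \frac{1}{t} \sum_{s=1}^{t} \sigma_{-i}^{\,s}

% Agent i's next policy is a best response to that mixture
\sigma_i^{\,t+1} \in \arg\max_{\sigma_i} \; u_i\!\left(\sigma_i,\ \bar{\sigma}_{-i}^{\,t}\right)
```

Each agent only has to answer "what is best against what the others have done so far", rather than reasoning recursively about what the others expect it to expect.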

Method

MAFP first initializes each agent with a policy. In every subsequent iteration it aggregates the opponents' past policies into an empirical mixture and generates a best-response policy against that mixture.
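
The loop described above (initialize, aggregate, best-respond) can be illustrated with a minimal sketch of classical fictitious play on rock-paper-scissors, a game whose only equilibrium is mixed. This is not the paper's implementation: in MAFP the best response is generated by an LLM agent, whereas here it is replaced by an exact argmax over a known payoff matrix. The names `RPS_PAYOFF`, `best_response`, and `fictitious_play` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

# Row player's payoff in rock-paper-scissors (assumed toy game, not from the paper).
# Rows are the row player's action, columns the opponent's: rock, paper, scissors.
RPS_PAYOFF: Matrix = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def best_response(payoff: Matrix, opponent_counts: Sequence[int]) -> int:
    """Return the pure action with the highest expected payoff against the
    opponent's empirical mixture (ties go to the lowest action index)."""
    total = sum(opponent_counts)
    mixture = [count / total for count in opponent_counts]
    expected = [sum(u * q for u, q in zip(row, mixture)) for row in payoff]
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(
    payoff_row: Matrix, rounds: int, init: Tuple[int, int] = (0, 0)
) -> Tuple[List[float], List[float]]:
    """Run two-player zero-sum fictitious play and return both players'
    empirical action frequencies."""
    n_row, n_col = len(payoff_row), len(payoff_row[0])
    # Column player's payoffs, indexed [own action][row player's action].
    payoff_col = [[-payoff_row[r][c] for r in range(n_row)] for c in range(n_col)]

    # Step 1: initialize each player with an (arbitrary) starting action.
    row_counts = [0] * n_row
    col_counts = [0] * n_col
    row_counts[init[0]] += 1
    col_counts[init[1]] += 1

    for _ in range(rounds):
        # Step 2: the action counts are the opponent's empirical mixture.
        # Step 3: both players best-respond to that mixture simultaneously.
        r = best_response(payoff_row, col_counts)
        c = best_response(payoff_col, row_counts)
        row_counts[r] += 1
        col_counts[c] += 1

    total = rounds + 1
    return [x / total for x in row_counts], [x / total for x in col_counts]


if __name__ == "__main__":
    row_freq, col_freq = fictitious_play(RPS_PAYOFF, rounds=10_000)
    print("row player frequencies:", row_freq)
    print("column player frequencies:", col_freq)
```

No single round is played "randomly", yet the empirical frequencies move toward the uniform mixture over rock, paper, and scissors, which is the kind of mixed-strategy equilibrium the summary says single-chain reasoning struggles with. Swapping the argmax for an LLM call that proposes a policy against a textual description of the mixture would be one way to approximate the setting described here, though that substitution is an assumption rather than something the summary specifies.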

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.