When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Social Sciences & Behavioral Studies · Depth: Expert, extended

Summary

This study investigates the "solver-sampler mismatch" in multi-agent large language model (LLM) simulations, challenging the assumption that stronger reasoning always improves simulation fidelity. The authors argue that when the objective is to simulate plausible, boundedly rational behavior, reasoning-enhanced models can become better strategic solvers but worse behavioral samplers: they may over-optimize, producing rigid, authority-driven outcomes and reduced behavioral diversity. The authors also identify a failure mode they term "diversity-without-fidelity." The study compares three reflection conditions (no reflection, bounded reflection, and native reasoning) on Gemini 3.1 Flash Lite Preview and DeepSeek V3.2, then extends the comparison to OpenAI's GPT-4.1 and GPT-5.2. Experiments in trading-limits and grid-curtailment scenarios, totaling 495 runs, consistently show that bounded reflection yields more diverse, compromise-oriented trajectories, while native reasoning often produces rigid outcomes and operational noise. For instance, GPT-5.2 with native reasoning produced authority decisions in 45 of 45 runs, whereas GPT-5.2 with bounded reflection recovered compromise outcomes in every environment.
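
The comparison design can be pictured as a grid of (model, reflection condition) cells whose run outcomes are tallied. The sketch below is a hypothetical bookkeeping layer, not the authors' code: the `Run` record, the outcome labels ("authority", "compromise", "no_deal"), and the helper names are assumptions made for illustration, and the toy data only mirrors the reported 45-of-45 count.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Reflection(Enum):
    """The three reflection conditions compared in the study."""
    NONE = "no reflection"
    BOUNDED = "bounded reflection"
    NATIVE = "native reasoning"


@dataclass(frozen=True)
class Run:
    """One simulated negotiation (hypothetical record layout)."""
    model: str              # e.g. "GPT-5.2"
    condition: Reflection
    environment: str        # e.g. "trading-limits" or "grid-curtailment"
    outcome: str            # assumed labels: "authority", "compromise", "no_deal"


def outcome_rates(runs: list[Run], model: str, condition: Reflection) -> dict[str, float]:
    """Share of each outcome label within one (model, condition) cell."""
    cell = [r.outcome for r in runs if r.model == model and r.condition == condition]
    if not cell:
        return {}
    return {label: n / len(cell) for label, n in Counter(cell).items()}


def environments_with(runs: list[Run], model: str, condition: Reflection, label: str) -> set[str]:
    """Environments in which a (model, condition) cell produced `label` at least once."""
    return {
        r.environment
        for r in runs
        if r.model == model and r.condition == condition and r.outcome == label
    }


# Toy data mirroring the reported 45-of-45 count; these are not the paper's run logs.
envs = ("trading-limits", "grid-curtailment")
runs = [Run("GPT-5.2", Reflection.NATIVE, envs[i % 2], "authority") for i in range(45)]
runs += [Run("GPT-5.2", Reflection.BOUNDED, env, "compromise") for env in envs]
print(outcome_rates(runs, "GPT-5.2", Reflection.NATIVE))  # {'authority': 1.0}
print(sorted(environments_with(runs, "GPT-5.2", Reflection.BOUNDED, "compromise")))
```

Keeping each run as a flat record makes it easy to slice the same logs by model, condition, or environment, which is the shape of comparison the summary describes.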

Key takeaway

Machine Learning Engineers building multi-agent simulations should recognize that optimizing LLMs for raw reasoning power can undermine their ability to simulate realistic, boundedly rational human behavior. Rather than relying solely on native reasoning, which often leads to rigid, over-optimized outcomes, integrate bounded reflection mechanisms, such as structured private ledgers, to encourage diverse, compromise-oriented trajectories and improve simulation fidelity.
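
A structured private ledger can be sketched as a small, capped scratchpad that an agent updates each turn and sees in its next prompt. This is a hypothetical illustration, not the paper's implementation: the class name `PrivateLedger`, the fields `goal` and `reservation_value`, the three-note cap, and the rendering format are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PrivateLedger:
    """Hypothetical bounded-reflection ledger: fixed fields plus a capped note history."""
    goal: str
    reservation_value: float
    max_notes: int = 3
    notes: list[str] = field(default_factory=list)

    def record(self, note: str) -> None:
        # Append the newest note, then drop the oldest ones beyond max_notes,
        # so the amount of private reflection stays bounded.
        self.notes.append(note)
        if len(self.notes) > self.max_notes:
            self.notes = self.notes[-self.max_notes:]

    def render(self) -> str:
        # Short, structured text block intended to be prepended to the next-turn prompt.
        lines = [f"GOAL: {self.goal}", f"WALK-AWAY: {self.reservation_value}"]
        lines += [f"NOTE {i + 1}: {n}" for i, n in enumerate(self.notes)]
        return "\n".join(lines)


ledger = PrivateLedger(goal="keep the limit above 50", reservation_value=50.0)
for note in ("opened at 70", "counter at 55", "conceded to 65", "counter at 58"):
    ledger.record(note)
print(ledger.render())
```

With the default cap, recording four notes keeps only the last three, so the rendered block stays the same size however long the negotiation runs. Under this sketch's assumptions, the fixed size is the point: it limits how much private deliberation the agent can accumulate, in contrast to leaving deliberation entirely to the model's native reasoning.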

Key insights

Stronger LLM reasoning can degrade behavioral simulation fidelity by over-optimizing for strategic dominance, leading to rigid outcomes.

Method

The study uses multi-agent negotiation experiments with three reflection conditions (no reflection, bounded reflection, native reasoning) and evaluates behavioral sampler fidelity using action entropy, concession arc rate, and max-turn exhaustion rate.
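
The summary names the three fidelity metrics but not their exact definitions. The sketch below uses plausible operationalizations that are assumptions, not the authors' formulas: action entropy as Shannon entropy (in bits) over action labels, concession arc rate as the share of runs in which an agent's offer ever moves toward the counterpart, and max-turn exhaustion rate as the share of runs that hit the turn cap.

```python
import math
from collections import Counter
from collections.abc import Sequence


def action_entropy(actions: Sequence[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution over action labels."""
    if not actions:
        return 0.0
    n = len(actions)
    return -sum((c / n) * math.log2(c / n) for c in Counter(actions).values())


def concession_arc_rate(offer_traces: Sequence[Sequence[float]]) -> float:
    """Share of runs whose offer sequence contains at least one concession.

    Assumed operationalization: offers are numeric, higher is better for the
    proposer, so any step down counts as a move toward the counterpart.
    """
    if not offer_traces:
        return 0.0
    conceded = sum(any(b < a for a, b in zip(t, t[1:])) for t in offer_traces)
    return conceded / len(offer_traces)


def max_turn_exhaustion_rate(turns_used: Sequence[int], max_turns: int) -> float:
    """Share of runs that reached the turn cap instead of terminating earlier."""
    if not turns_used:
        return 0.0
    return sum(t >= max_turns for t in turns_used) / len(turns_used)
```

For example, the actions offer, offer, accept, reject give an entropy of 1.5 bits, and turn counts of 10, 4, 10, and 7 against a cap of 10 give an exhaustion rate of 0.5. Under these assumed definitions, higher entropy and concession rates indicate more varied, compromise-seeking behavior, while a high exhaustion rate flags runs that stall rather than resolve.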

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.