SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind
Summary
Sotopia-ToM is a new benchmarking framework designed to evaluate how well Large Language Model (LLM) agents manage information asymmetry and privacy in multi-party interactions. Its environment supports both public and private communication channels and includes 160 human-reviewed scenarios across eight industry sectors, each involving 3 to 5 agents with partitioned private knowledge and channel-dependent sharing policies. A multi-dimensional evaluation suite assesses information sharing, detail seeking, coordination efficiency, and privacy protection, and aggregates them into a composite InfoMgmt metric. Empirical results across 6 LLM backbones and several prompting strategies (vanilla, CoT-privacy, and ToM-based interventions) show that even the largest high-reasoning model, GPT-5, achieves only a 62% InfoMgmt score, highlighting persistent deficiencies in information seeking and privacy-aware decision-making. ToM-based interventions such as ToM-Coach consistently improve the coordination-privacy balance: on GPT-4o, for example, critical privacy violations fall from 9.9% to 2.2% and the InfoMgmt score rises from 15% to 40%.
Key takeaway
For research scientists and CTOs developing multi-agent LLM systems, this work indicates that current models, even GPT-5, significantly underperform in complex information management and privacy-aware coordination. You should prioritize research into strategic disclosure planning and advanced inquiry mechanisms, as these remain critical bottlenecks. Consider integrating ToM-based reasoning, specifically ToM-Coach or ToM-Belief, to enhance privacy protection and overall coordination, but be aware that fundamental limitations persist, particularly in proactive information seeking.
Key insights
LLM agents struggle with information management and privacy in multi-party interactions, even with advanced ToM prompting.
Principles
- Information asymmetry requires careful disclosure management.
- Theory of Mind interventions improve LLM coordination-privacy balance.
- Inquiry alignment is a significant bottleneck for LLM agents.
Method
Sotopia-ToM combines three components: a multi-stage pipeline that generates 160 human-reviewed scenarios, an N-agent simulator with public and private channels, and a four-metric evaluation suite (DA, IA, CPV, EFF) whose scores are aggregated into an InfoMgmt score.
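To make the channel setup concrete, here is a minimal sketch of an N-agent message router with one public channel, private channels, and a per-fact sharing policy. It is an illustration under stated assumptions, not the Sotopia-ToM implementation: the names `Fact`, `Message`, `Simulator`, `private_ok`, and `public_ok` are hypothetical, and the leak rule used here (a receiver who is neither the owner nor on the fact's allow-list sees a non-public fact) is a simplified stand-in for the benchmark's actual privacy metric.

```python
from dataclasses import dataclass

PUBLIC = "public"
PRIVATE = "private"


@dataclass(frozen=True)
class Fact:
    """One piece of private knowledge held by a single agent (hypothetical schema)."""
    fact_id: str
    owner: str
    private_ok: frozenset = frozenset()  # agents allowed to learn this fact
    public_ok: bool = False              # True if anyone may learn it


@dataclass(frozen=True)
class Message:
    sender: str
    channel: str                         # PUBLIC or PRIVATE
    fact_ids: tuple = ()                 # facts disclosed by this message
    recipients: frozenset = frozenset()  # only used on the private channel


class Simulator:
    """Routes messages and logs disclosures that break a fact's policy."""

    def __init__(self, agents, facts):
        self.agents = list(agents)
        self.facts = {f.fact_id: f for f in facts}
        self.inbox = {a: [] for a in self.agents}
        self.violations = []  # (fact_id, sender, receiver, channel)

    def audience(self, msg):
        # A public message reaches every other agent; a private one
        # reaches only the named recipients.
        if msg.channel == PUBLIC:
            return [a for a in self.agents if a != msg.sender]
        return [a for a in self.agents
                if a in msg.recipients and a != msg.sender]

    @staticmethod
    def allowed(fact, receiver):
        # Assumed policy: the owner, allow-listed agents, and (for
        # public facts) everyone may see the fact.
        return (fact.public_ok
                or receiver == fact.owner
                or receiver in fact.private_ok)

    def send(self, msg):
        receivers = self.audience(msg)
        for fid in msg.fact_ids:
            fact = self.facts[fid]
            for r in receivers:
                if not self.allowed(fact, r):
                    self.violations.append((fid, msg.sender, r, msg.channel))
        for r in receivers:
            self.inbox[r].append(msg)
        return receivers


# Example: alice may tell bob the budget, but not carol.
budget = Fact("budget", owner="alice", private_ok=frozenset({"bob"}))
sim = Simulator(["alice", "bob", "carol"], [budget])
sim.send(Message("alice", PRIVATE, ("budget",), frozenset({"bob"})))
sim.send(Message("alice", PUBLIC, ("budget",)))
```

In this sketch the private message to bob logs nothing, while the public broadcast logs one violation because carol is outside the fact's allow-list. The channel matters only through the audience it creates, which is one simple way to model channel-dependent sharing.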
In practice
- Use ToM-based prompting to reduce privacy violations.
- Focus on improving LLM inquiry strategies.
- Test agents in multi-party, channel-sensitive environments.
Topics
- Sotopia-ToM Benchmark
- Multi-Agent LLM Systems
- Theory of Mind (ToM) Interventions
- Information Asymmetry
- Privacy Management
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.MA updates on arXiv.org.