CollabBench: Benchmarking and Unleashing Collaborative Ability of LLMs with Diverse Players via Proactive Engagement

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

CollabBench is a benchmark for evaluating and training large language model (LLM) agents to collaborate with diverse human partners in cooperative game environments. Existing studies work at the conversation level and lack grounded interaction; CollabBench addresses this with a Diverse Player Profile Simulation pipeline that models varied player behaviors. It also introduces a Collaborative Agentic Training paradigm that unifies reasoning, communication, and action through agentic rollouts, optimized with a hybrid reward that balances task efficiency and affective adaptation. For systematic evaluation, the benchmark extends classic environments like CWAH-MultiPlayer and Cook-MultiPlayer. In the reported experiments, models trained with CollabBench outperform their base models, with 19.5% higher efficiency and 24.4% better affective performance, and the analysis offers insights into the collaborative limitations of current models. The work was accepted at ICML 2026.

Key takeaway

For AI engineers building collaborative LLM agents, CollabBench offers a way to measure and train collaboration rather than individual task performance alone. Consider adopting its Diverse Player Profile Simulation and Collaborative Agentic Training paradigm: trained models showed 19.5% higher efficiency and 24.4% better affective performance than their base models in cooperative settings with diverse partners.

Key insights

CollabBench provides a benchmark and training paradigm to enhance LLM agents' collaborative ability with diverse human partners in cooperative games.

Principles

LLM agents need grounded interaction for effective collaboration.
Diverse player profiles are crucial for realistic simulation.
Hybrid rewards can balance task efficiency and affective adaptation.

Method

CollabBench uses a Diverse Player Profile Simulation and a Collaborative Agentic Training paradigm, unifying reasoning, communication, and action via agentic rollouts with a hybrid reward.
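
The hybrid reward idea can be illustrated with a minimal sketch. This is not the paper's formulation: the weighted-sum form, the `RolloutOutcome` fields, the `affect_weight` parameter, and the way efficiency is computed are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass
class RolloutOutcome:
    """Hypothetical summary of one agentic rollout (not the paper's schema)."""
    steps_used: int        # environment steps taken before the episode ended
    max_steps: int         # step budget for the episode
    task_completed: bool   # whether the cooperative goal was reached
    affect_score: float    # assumed partner-satisfaction rating in [0, 1]


def hybrid_reward(outcome: RolloutOutcome, affect_weight: float = 0.5) -> float:
    """Blend task efficiency and affective adaptation into one scalar in [0, 1]."""
    if not 0.0 <= affect_weight <= 1.0:
        raise ValueError("affect_weight must lie in [0, 1]")
    if outcome.max_steps <= 0:
        raise ValueError("max_steps must be positive")
    # Efficiency: fraction of the step budget left over, zero if the task failed.
    if outcome.task_completed:
        efficiency = max(0.0, 1.0 - outcome.steps_used / outcome.max_steps)
    else:
        efficiency = 0.0
    # Clamp the affect score so out-of-range ratings cannot dominate the blend.
    affect = min(1.0, max(0.0, outcome.affect_score))
    return (1.0 - affect_weight) * efficiency + affect_weight * affect
```

Under these assumptions, a completed task that used 40 of 100 steps with an affect score of 0.8 scores about 0.7 at the default weight. Raising `affect_weight` shifts the agent's incentive toward partner adaptation at the cost of speed.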

In practice

Evaluate LLM agents in cooperative game environments.
Train agents for improved efficiency and affective performance.
Simulate varied human player behaviors for robust testing.
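
As a rough sketch of what testing against varied simulated partners could look like, the snippet below samples partner profiles and averages scores per profile. The trait names (`skill`, `talkativeness`, `patience`), the uniform sampling, and both helper functions are hypothetical; the paper's Diverse Player Profile Simulation pipeline may define and generate profiles quite differently.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerProfile:
    """Hypothetical partner profile; these traits are illustrative assumptions."""
    name: str
    skill: float          # 0 = novice, 1 = expert
    talkativeness: float  # 0 = silent, 1 = chatty
    patience: float       # 0 = easily frustrated, 1 = very tolerant


def sample_profiles(n: int, seed: int = 0) -> list[PlayerProfile]:
    """Draw n profiles with independent uniform traits, reproducibly."""
    rng = random.Random(seed)
    return [
        PlayerProfile(
            name=f"player_{i}",
            skill=rng.random(),
            talkativeness=rng.random(),
            patience=rng.random(),
        )
        for i in range(n)
    ]


def mean_score_by_profile(scores: dict[str, list[float]]) -> dict[str, float]:
    """Average per-episode scores for each profile name, skipping empty lists."""
    return {name: sum(v) / len(v) for name, v in scores.items() if v}
```

Reporting a per-profile mean rather than a single pooled average makes it easier to see whether an agent does well only with cooperative, high-skill partners.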

Topics

CollabBench
LLM Agents
Cooperative Games
Human-AI Collaboration
Agentic Training
Performance Benchmarking

Best for: Research Scientist, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.