LVLMs and Humans Ground Differently in Referential Communication

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Social Sciences & Behavioral Studies, Research Methodology & Innovation) · Depth: Expert, extended

Summary

A referential communication experiment investigated how Large Vision-Language Models (LVLMs) such as OpenAI's GPT-5.2 compare to humans in establishing common ground during multi-turn interactions. The study crossed the director and matcher roles in a factorial design, yielding human-human, human-AI, AI-human, and AI-AI director-matcher pairs, and analyzed a corpus of 356 dialogues collected over four rounds. Human-human pairs consistently improved in accuracy (from 80% to over 90%) and in efficiency, using fewer words and turns as the rounds progressed. In contrast, AI-AI pairs started with high accuracy (90%) but declined, showing neither efficiency gains nor signs of common ground formation. Mixed pairs also struggled: human-AI pairs had the lowest initial accuracy, and AI-human pairs showed precipitous declines. The findings indicate that the LVLMs tested neither adapt their communication nor track common ground.

Key takeaway

If you are an AI scientist developing collaborative agents, note that current LVLMs such as GPT-5.2 struggle significantly to establish common ground and adapt their communication over multiple turns. This deficit reduces accuracy and efficiency in human-AI interactions, particularly when the AI takes the initiative. Prioritize research into models that genuinely learn from dialogue history and form conceptual pacts, so that real-world applications avoid user frustration and task failures.

Key insights

Unlike humans, LVLMs fail to build common ground and adapt their communication in multi-turn referential tasks.

Method

Director-matcher pairs, with a human or an AI in each role, completed a referential communication task over four rounds in which they had to identify non-lexicalized objects. The study measured accuracy, effort, and lexical entrainment.
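
The three measures named above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the Trial schema, the whitespace tokenizer, the toy dialogues, and the use of Jaccard vocabulary overlap as a stand-in for lexical entrainment. This summary does not specify how the paper computes these quantities.

```python
from dataclasses import dataclass
from statistics import mean

PUNCT = ".,!?;:\"'()"


@dataclass
class Trial:
    """One target object in one round (hypothetical schema, not the paper's)."""
    round_no: int
    target_id: str
    director_utterances: list[str]
    matcher_utterances: list[str]
    correct: bool  # whether the matcher selected the intended object


def tokens(text: str) -> list[str]:
    # Naive tokenizer: split on whitespace, strip punctuation, lowercase.
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]


def round_summary(trials: list[Trial], round_no: int) -> dict[str, float]:
    """Accuracy and effort (words, turns) averaged over one round's trials."""
    rows = [t for t in trials if t.round_no == round_no]
    return {
        "accuracy": mean(1.0 if t.correct else 0.0 for t in rows),
        "words_per_trial": mean(
            sum(len(tokens(u)) for u in t.director_utterances + t.matcher_utterances)
            for t in rows
        ),
        "turns_per_trial": mean(
            len(t.director_utterances) + len(t.matcher_utterances) for t in rows
        ),
    }


def entrainment(trials: list[Trial], target_id: str, earlier: int, later: int) -> float:
    """Jaccard overlap of the director's vocabulary for one target across two rounds.

    This is an assumed proxy for lexical entrainment, not the paper's metric.
    """
    def vocab(round_no: int) -> set[str]:
        return {
            w
            for t in trials
            if t.target_id == target_id and t.round_no == round_no
            for u in t.director_utterances
            for w in tokens(u)
        }

    a, b = vocab(earlier), vocab(later)
    return len(a & b) / len(a | b) if (a | b) else 0.0


# Toy data, invented for illustration only.
trials = [
    Trial(1, "obj1", ["it looks like a dancer with one leg raised"], ["the tall one?"], True),
    Trial(1, "obj2", ["a sort of kneeling figure"], ["ok"], False),
    Trial(2, "obj1", ["the dancer"], ["ok"], True),
    Trial(2, "obj2", ["the kneeling figure"], ["ok"], True),
]

summary_r1 = round_summary(trials, 1)
summary_r2 = round_summary(trials, 2)
overlap_obj2 = entrainment(trials, "obj2", 1, 2)
```

In this toy data the second round is more accurate and uses fewer words per trial, and the director reuses part of the earlier description ("kneeling figure"). That mirrors the human-human pattern the summary describes; under these assumed measures, the AI-AI pattern would appear as flat or rising word counts alongside declining accuracy.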

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.