Implicit vs. Explicit Prompting Strategies for LVLMs in Referential Communication

2026-06-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Social Sciences & Behavioral Studies · Depth: Expert, long

Summary

A recent study investigated conflicting findings about whether large vision-language models (LVLMs) can coordinate on efficient referring expressions, comparing implicit and explicit prompting styles. Replicating prior work, the study found that LVLMs, including GPT-5.2 and GPT-5.5, achieve human-like communicative efficiency when explicitly instructed to do so. For instance, with explicit prompts, GPT-5.5's referring expression length dropped from 58.8 words to 32.7 words over five rounds while it maintained 97.5% accuracy and a lexical overlap of 1.00. Conversely, models given implicit, pragmatically informed prompts remained verbose (GPT-5.2 averaged 1250.7 words per round and GPT-5.5 710.4 words per round) and did not spontaneously shorten their expressions. This indicates that prompting style, rather than model version or task differences, drives the divergence in reported LVLM communicative behavior.
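The efficiency figures above reduce to simple arithmetic. The sketch below computes the relative drop in expression length (about 44% for GPT-5.5's 58.8 to 32.7 words) and a word-level overlap score between consecutive rounds. The study's exact definition of lexical overlap is not given in this summary; the containment measure used here (the share of the current round's words that already appeared in the previous round), the naive tokenization, and all function names are assumptions for illustration.

```python
PUNCT = ".,;:!?\"'"


def relative_reduction(first: float, last: float) -> float:
    """Fraction by which expression length fell from the first to the last round."""
    return (first - last) / first


def word_set(text: str) -> set[str]:
    # Hypothetical tokenization: whitespace split, strip basic punctuation, lowercase.
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


def lexical_overlap(prev: str, curr: str) -> float:
    """Assumed metric: share of the current expression's words reused from the previous one."""
    a, b = word_set(prev), word_set(curr)
    if not b:
        return 0.0
    return len(a & b) / len(b)


if __name__ == "__main__":
    # Reported GPT-5.5 lengths under explicit prompting (round 1 vs. round 5).
    print(f"{relative_reduction(58.8, 32.7):.1%}")  # 44.4%
    # A shortened expression that only reuses earlier words scores 1.0.
    print(lexical_overlap("the tall red vase near the window", "tall red vase"))  # 1.0
```

Under this assumed definition, a speaker can shorten its descriptions and still score 1.00, because trimming words never introduces new ones; a Jaccard-style measure would instead penalize the shrinkage itself.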

Key takeaway

For prompt engineers designing LVLM interactions for efficient communication, you should prioritize explicit, direct instructions over general pragmatic principles. In the study, models such as GPT-5.5 reliably shortened referring expressions and showed human-like entrainment only when forcefully instructed to do so. Be aware that extreme brevity, as observed with GPT-5.2, can come with a slight drop in accuracy, so balance conciseness against task performance.

Key insights

LVLMs require explicit, forceful prompting to achieve human-like communicative efficiency and lexical entrainment, not implicit pragmatic cues.

Principles

Explicit prompts are critical for LVLM brevity.
Implicit pragmatic instructions do not induce entrainment.
LVLM "conceptual pacts" differ from human common ground.

Method

This study used a multi-round, multi-turn collaborative object-matching task, built on the pipeline of Zeng et al. (2026), to compare explicit and implicit prompt designs for GPT-5.2 and GPT-5.5 in AI-AI pairs.
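To make the contrast between the two conditions concrete, here is a minimal sketch of how a per-round prompt for the describing model might be assembled under each design. The instruction wording, the `build_speaker_prompt` function, and its parameters are hypothetical; they are not the prompts used in the study or by Zeng et al. (2026).

```python
from typing import Literal

Condition = Literal["explicit", "implicit"]

# Hypothetical instruction texts; the study's actual prompts are not reproduced here.
EXPLICIT_RULE = (
    "Describe the target object so your partner can pick it out. "
    "In every later round, make your description shorter than before "
    "and reuse the words you have both already used."
)
IMPLICIT_RULE = (
    "Describe the target object so your partner can pick it out. "
    "Communicate as a cooperative human speaker would."
)


def build_speaker_prompt(condition: Condition, round_idx: int,
                         history: list[str]) -> str:
    """Assemble one round's prompt for the model that describes the object."""
    rule = EXPLICIT_RULE if condition == "explicit" else IMPLICIT_RULE
    lines = [rule, f"Round {round_idx}."]
    if history:
        # Earlier descriptions of the same object, so the model can build on them.
        lines.append("Your earlier descriptions of this object:")
        lines.extend(f"- {h}" for h in history)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_speaker_prompt("explicit", 2, ["the tall red vase near the window"]))
```

The only difference between the conditions is the rule line: the explicit version names the desired behavior (shorter, reused wording), while the implicit version leaves it to the model to infer from a pragmatic principle, which is the kind of cue the study found insufficient.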

In practice

Use direct commands for LVLM brevity.
Monitor accuracy-brevity tradeoffs carefully.
Do not rely on implicit pragmatic cues.
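One way to act on the tradeoff advice is to log per-round length and accuracy and flag rounds where brevity comes at a cost. The sketch below assumes per-round statistics are already collected; the `RoundStats` type, the 0.95 accuracy floor, and the 0.02 drop tolerance are illustrative choices, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class RoundStats:
    round_idx: int
    mean_words: float  # mean referring-expression length in words
    accuracy: float    # fraction of correct matches, between 0.0 and 1.0


def flag_tradeoffs(rounds: list[RoundStats], floor: float = 0.95,
                   max_drop: float = 0.02) -> list[int]:
    """Return indices of rounds where expressions got shorter but accuracy
    fell below `floor` or dropped by more than `max_drop` from the previous round."""
    flagged = []
    for prev, curr in zip(rounds, rounds[1:]):
        shorter = curr.mean_words < prev.mean_words
        worse = curr.accuracy < floor or prev.accuracy - curr.accuracy > max_drop
        if shorter and worse:
            flagged.append(curr.round_idx)
    return flagged


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study.
    log = [
        RoundStats(1, 58.0, 0.99),
        RoundStats(2, 40.0, 0.98),
        RoundStats(3, 20.0, 0.93),
    ]
    print(flag_tradeoffs(log))  # [3]
```

Round 2 is not flagged because its small accuracy dip stays within tolerance; round 3 is flagged because it is shorter and falls below the floor.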

Topics

Large Vision-Language Models
Prompt Engineering
Referential Communication
Communicative Efficiency
Lexical Entrainment
Human-AI Interaction

Best for: AI Engineer, Machine Learning Engineer, NLP Engineer, AI Scientist, Research Scientist, Prompt Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.