Implicit vs. Explicit Prompting Strategies for LVLMs in Referential Communication
Summary
A recent study investigated conflicting findings regarding large vision-language models' (LVLMs) ability to coordinate efficient referring expressions, specifically comparing implicit and explicit prompting styles. Replicating prior work, the research found that LVLMs, including GPT-5.2 and GPT-5.5, can achieve human-like communicative efficiency when explicitly instructed. For instance, GPT-5.5's referring expression length dropped from 58.8 words to 32.7 words over five rounds with explicit prompts, maintaining 97.5% accuracy and 1.00 lexical overlap. Conversely, models given implicit, pragmatically informed prompts remained verbose (GPT-5.2 averaged 1250.7 words/round, GPT-5.5 710.4 words/round), failing to spontaneously shorten expressions. This indicates that prompting style, rather than model version or task differences, drives the observed divergence in LVLM communicative behavior.
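To put the explicit-prompt result in proportion, the drop from 58.8 to 32.7 words can be expressed as a relative reduction. The short sketch below does only that arithmetic; the helper name `relative_reduction` is ours, not something from the study.

```python
def relative_reduction(start: float, end: float) -> float:
    """Return the fraction by which a value shrank going from start to end."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (start - end) / start


# Word counts reported in the summary for GPT-5.5 under explicit prompts.
explicit_reduction = relative_reduction(58.8, 32.7)
print(f"{explicit_reduction:.1%}")  # prints 44.4%
```

In other words, the explicitly prompted GPT-5.5 cut its referring expressions by a bit under half across the five rounds.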
Key takeaway
If you design LVLM interactions for efficient communication, prioritize explicit, direct instructions over general pragmatic principles. Models such as GPT-5.5 reliably shorten referring expressions and show human-like entrainment only when explicitly told to do so. Extreme brevity, as seen with GPT-5.2, can come with a slight drop in accuracy, so balance conciseness against task performance.
Key insights
LVLMs reach human-like communicative efficiency and lexical entrainment only under explicit, forceful prompting; implicit pragmatic cues are not enough.
Principles
- Explicit prompts are critical for LVLM brevity.
- Implicit pragmatic instructions do not induce entrainment.
- LVLM "conceptual pacts" differ from human common ground.
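Lexical entrainment is usually quantified by how much vocabulary a speaker carries over from one round to the next. The summary does not say how the study computes its lexical overlap score, so the sketch below uses an assumed definition: the share of distinct words in the current expression that already appeared in the previous one. The function names and the example expressions are illustrative only.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its distinct word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def lexical_overlap(previous: str, current: str) -> float:
    """Assumed metric: fraction of current-round words reused from the previous round."""
    current_tokens = tokens(current)
    if not current_tokens:
        return 0.0
    return len(current_tokens & tokens(previous)) / len(current_tokens)


# Invented expressions, not taken from the study's data.
round_1 = "the figure that looks like an ice skater with one leg raised"
round_2 = "the ice skater"
overlap = lexical_overlap(round_1, round_2)
print(overlap)
```

Under this assumed definition, a score of 1.00 means every word in the later expression was already present in the earlier one, which is the pattern expected when a pair settles on a shortened label.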
Method
The study used a multi-round, multi-turn collaborative object-matching task to compare explicit and implicit prompt designs for GPT-5.2 and GPT-5.5 in AI-AI pairs, building on the pipeline of Zeng et al. (2026).
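The overall shape of such an experiment can be sketched as a loop over rounds that logs expression length and matching accuracy per prompt condition. Everything below is a hypothetical stand-in: the prompt wordings in `PROMPTS` are not the study's prompts, and `stub_describe` and `stub_guess` replace real model calls so the sketch runs offline.

```python
from dataclasses import dataclass

# Hypothetical prompt wordings; the study's actual prompts are not reproduced here.
PROMPTS = {
    "explicit": "Describe the target. Use as few words as possible and reuse earlier labels.",
    "implicit": "Describe the target so your partner can pick it out. Be cooperative.",
}


@dataclass
class RoundLog:
    round_index: int
    mean_words: float
    accuracy: float


def run_condition(describe, guess, targets, n_rounds, system_prompt):
    """Play n_rounds of the matching game and log length and accuracy per round."""
    logs = []
    history = {}  # target -> expressions used for it in earlier rounds
    for round_index in range(1, n_rounds + 1):
        word_counts, correct = [], 0
        for target in targets:
            expression = describe(system_prompt, target, history.get(target, []))
            history.setdefault(target, []).append(expression)
            word_counts.append(len(expression.split()))
            correct += int(guess(expression, targets) == target)
        logs.append(
            RoundLog(round_index, sum(word_counts) / len(word_counts), correct / len(targets))
        )
    return logs


def stub_describe(system_prompt, target, earlier):
    """Stand-in for a director model: shortens only under the explicit wording."""
    if earlier and "few words" in system_prompt:
        return target
    return f"the picture showing {target} in the middle of the grid"


def stub_guess(expression, targets):
    """Stand-in for a matcher model: picks the first target named in the expression."""
    return next((t for t in targets if t in expression), None)


targets = ["skater", "rabbit"]
explicit_logs = run_condition(stub_describe, stub_guess, targets, 3, PROMPTS["explicit"])
implicit_logs = run_condition(stub_describe, stub_guess, targets, 3, PROMPTS["implicit"])
for label, logs in (("explicit", explicit_logs), ("implicit", implicit_logs)):
    print(label, [(log.mean_words, log.accuracy) for log in logs])
```

Swapping the two stubs for real director and matcher calls would leave the logging logic unchanged, which keeps the comparison between prompt conditions on equal footing.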
In practice
- Use direct commands for LVLM brevity.
- Monitor accuracy-brevity tradeoffs carefully.
- Do not rely on implicit pragmatic cues.
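One way to monitor the accuracy-brevity tradeoff is to compare each later round against the first round and flag rounds where expressions got shorter but accuracy fell by more than a tolerance. This is a minimal sketch under assumptions of our own: the function name, the 0.02 tolerance, and the example numbers are all invented for illustration and do not come from the study.

```python
def flag_accuracy_regressions(rounds, max_accuracy_drop=0.02):
    """Return 1-based round numbers that are shorter than round 1 but notably less accurate.

    rounds: ordered list of (mean_words, accuracy) tuples, one per round.
    """
    baseline_words, baseline_accuracy = rounds[0]
    flagged = []
    for round_number, (words, accuracy) in enumerate(rounds[1:], start=2):
        got_shorter = words < baseline_words
        lost_accuracy = baseline_accuracy - accuracy > max_accuracy_drop
        if got_shorter and lost_accuracy:
            flagged.append(round_number)
    return flagged


# Made-up per-round logs, purely to exercise the check.
example_rounds = [(60.0, 0.98), (45.0, 0.97), (20.0, 0.90)]
flagged = flag_accuracy_regressions(example_rounds)
print(flagged)  # prints [3]
```

A flagged round is a cue to loosen the brevity instruction or inspect the failed items, not proof that shorter expressions caused the errors.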
Topics
- Large Vision-Language Models
- Prompt Engineering
- Referential Communication
- Communicative Efficiency
- Lexical Entrainment
- Human-AI Interaction
Best for: AI Engineer, Machine Learning Engineer, NLP Engineer, AI Scientist, Research Scientist, Prompt Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.