The Rise of Verbal Tics in Large Language Models: A Systematic Analysis Across Frontier Models
Summary
A systematic analysis of eight frontier Large Language Models (LLMs) finds that "verbal tics" (repetitive, formulaic linguistic patterns in model outputs) are pervasive. Researchers evaluated GPT-5.4, Claude Opus 4.7, Gemini 3.1 Pro, Grok 4.2, Doubao-Seed-2.0-pro, Kimi K2.5, DeepSeek V3.2, and MiMo-V2-Pro using a custom API-based framework, analyzing 160,000 responses from 10,000 prompts in English and Chinese across 10 task categories. The study introduces the Verbal Tic Index (VTI), a composite metric; Gemini 3.1 Pro has the highest VTI (0.590) and DeepSeek V3.2 the lowest (0.295). Verbal tics accumulate over multi-turn conversations, are amplified in subjective tasks, and show distinct cross-lingual patterns. Human evaluation ($N=120$) found a strong inverse relationship between sycophancy and perceived naturalness ($r=-0.87$, $p<0.001$), pointing to an "alignment tax" in current LLM training.
Key takeaway
AI engineers and product managers building conversational LLM applications should prioritize models with lower Verbal Tic Index (VTI) scores, such as DeepSeek V3.2 or Claude Opus 4.7, to improve user trust and perceived naturalness. Subjective tasks and multi-turn conversations amplify tic rates and can degrade user experience. Plan for mitigating tic accumulation, especially in long interactions, to limit the "alignment tax" on linguistic authenticity.
Key insights
Verbal tics are pervasive in LLM outputs. The study frames them as an "alignment tax": a cost to naturalness and user trust that it attributes to current training paradigms.
Principles
- Verbal tics accumulate in multi-turn conversations.
- Sycophancy inversely correlates with perceived naturalness.
- Cultural norms influence cross-lingual tic patterns.
Method
The Verbal Tic Index (VTI) quantifies tic prevalence using TicRate, normalized Type-Token Ratio, Sycophancy Score, and Repetition Rate, weighted to maximize correlation with human judgments.
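As a rough illustration of how a composite score like this could be assembled, the sketch below combines the four named components into one number. Everything beyond the component names is an assumption: the summary gives neither the weights nor the exact formula, so the equal weights, the requirement that every component lies in [0, 1], and the inversion of the Type-Token Ratio (so that lower lexical diversity raises the score) are placeholders rather than the paper's definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TicComponents:
    """Per-model component scores, each assumed to be scaled to [0, 1]."""
    tic_rate: float
    normalized_ttr: float   # normalized Type-Token Ratio (higher = more diverse)
    sycophancy: float
    repetition_rate: float


# Hypothetical equal weights. The study fits its weights to human judgments;
# those fitted values are not given in this summary.
PLACEHOLDER_WEIGHTS = (0.25, 0.25, 0.25, 0.25)


def verbal_tic_index(c: TicComponents, weights=PLACEHOLDER_WEIGHTS) -> float:
    """Weighted average of the four components (illustrative formula only)."""
    # Assumption: TTR is inverted so that every term grows with "more tics".
    values = (c.tic_rate, 1.0 - c.normalized_ttr, c.sycophancy, c.repetition_rate)
    for v in values:
        if not 0.0 <= v <= 1.0:
            raise ValueError("all components must lie in [0, 1]")
    if len(weights) != len(values) or sum(weights) <= 0:
        raise ValueError("need four weights with a positive sum")
    # Dividing by the weight sum keeps the result in [0, 1] for any weights.
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

With real data, the placeholder weights would be replaced by values fitted against human ratings, which is the step the method describes but does not detail here.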
In practice
- Expect higher tic rates on subjective tasks than on objective ones.
- Monitor tic accumulation in multi-turn interactions.
- Consider model-specific "tic signatures" for deployment.
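One way to act on the monitoring advice above is to track a simple per-turn tic rate over the assistant's replies in a conversation. The sketch below is a minimal illustration, not the study's framework: the phrase list is hypothetical (the summary does not list the tics the authors counted), and the "second half versus first half" comparison is a crude stand-in for a proper trend test.

```python
import re

# Hypothetical example phrases; substitute the tic signature observed
# for the model you deploy.
HYPOTHETICAL_TIC_PHRASES = (
    "great question",
    "i hope this helps",
    "it's important to note",
)


def tic_count(text, phrases=HYPOTHETICAL_TIC_PHRASES):
    """Count case-insensitive occurrences of the listed phrases in text."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(p), lowered)) for p in phrases)


def tics_per_100_words(text, phrases=HYPOTHETICAL_TIC_PHRASES):
    """Tic occurrences per 100 whitespace-separated words (0.0 if empty)."""
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * tic_count(text, phrases) / len(words)


def tic_rate_by_turn(assistant_turns, phrases=HYPOTHETICAL_TIC_PHRASES):
    """Return one tic rate per assistant reply, in conversation order."""
    return [tics_per_100_words(t, phrases) for t in assistant_turns]


def is_accumulating(rates, min_turns=3):
    """Crude check: is the mean rate in the later half above the earlier half?"""
    if len(rates) < min_turns:
        return False
    mid = len(rates) // 2
    first = sum(rates[:mid]) / mid
    second = sum(rates[mid:]) / (len(rates) - mid)
    return second > first
```

Logging `tic_rate_by_turn` for sampled production conversations gives a per-model baseline; `is_accumulating` can then flag sessions where a reset or shorter context might be worth testing.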
Topics
- Large Language Models
- Verbal Tics
- Sycophancy
- RLHF
- Alignment Tax
Best for: AI Engineer, Research Scientist, AI Product Manager, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.