Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A paper submitted on May 2, 2026, titled "Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs," argues that the multilingual capabilities of current large language models (LLMs) are fragile and misleading. The authors, Anjishnu Mukherjee, Chutong Meng, and Antonios Anastasopoulos, contend that LLMs appear multilingual because they are trained on vast, uneven web corpora, not because they were intentionally designed for multilingual competence. This incidental approach leads to unequal, brittle, and opaque behavior across languages, with severe consequences for real-world deployments that require cross-linguistic reasoning. The study empirically compares the languages models claim to support with the languages they actually respond in, and shows how simple language-change attacks expose these failures and the hidden assumptions within LLM systems.

Key takeaway

Research scientists developing or deploying multilingual LLMs should critically evaluate whether a model is multilingual by design rather than relying on its incidental capabilities. Treat equitable multilingual performance and cultural grounding as first-class goals in the model pipeline to avoid brittle cross-lingual behavior and hidden language assumptions in real-world applications.

Key insights

Incidental multilingualism in LLMs leads to brittle, unequal, and opaque cross-lingual performance.

Method

The study empirically investigates LLM self-reported language support versus actual response languages, using language-change attacks to expose failures.
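
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code or evaluation protocol: the function names, the 0.5 threshold, and the toy observations are all hypothetical, and the response-language labels are assumed to come from some external language identifier that is not shown here.

```python
from collections import defaultdict


def language_fidelity(claimed, observations):
    """Compare claimed language support with observed response languages.

    claimed: set of language codes the model says it supports (assumed input).
    observations: iterable of (prompt_lang, response_lang) pairs, where
        response_lang was labeled by an external language identifier.
    Returns {prompt_lang: (is_claimed, share of responses in prompt_lang)}.
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for prompt_lang, response_lang in observations:
        totals[prompt_lang] += 1
        if response_lang == prompt_lang:
            matches[prompt_lang] += 1
    return {
        lang: (lang in claimed, matches[lang] / totals[lang])
        for lang in totals
    }


def claimed_but_unfaithful(report, threshold=0.5):
    """Languages the model claims to support but answers in less often
    than `threshold` (an arbitrary, illustrative cutoff)."""
    return sorted(
        lang
        for lang, (is_claimed, rate) in report.items()
        if is_claimed and rate < threshold
    )


# Toy, made-up observations purely to exercise the functions;
# they are not results from the paper.
claimed = {"en", "bn", "sw"}
observations = [
    ("en", "en"), ("en", "en"),
    ("bn", "bn"), ("bn", "en"),
    ("sw", "en"), ("sw", "en"),
]
report = language_fidelity(claimed, observations)
flagged = claimed_but_unfaithful(report)
```

A language-change attack, as the summary describes it, would feed the same task in a different prompt language; in this sketch that would simply add more `(prompt_lang, response_lang)` pairs, so the same report can be compared before and after the switch.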

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.