G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment

2026-06-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

G-IdiomAlign is a gloss-pivoted benchmark for evaluating cross-lingual idiom alignment in large language models. It comprises 18,785 idiom pairs spanning 36 language pairs, covering nine core languages plus four additional languages, with each idiom anchored by an English gloss from Wiktionary. The benchmark supports two evaluation protocols: a Multiple-Choice Idiom Equivalence task, whose typed distractors allow errors to be attributed to specific causes, and a Gloss-Contrastive Generation task, which compares No-gloss and With-gloss inputs. Experiments with models such as DeepSeek-V3.2, Gemini-2.5-Pro, and Qwen3-8B consistently show a bias towards literal translation, particularly in low-resource languages. Explicit glosses improve generation performance, but overall accuracy remains modest. Attention-based diagnostics on Qwen3-8B indicate that successful gloss-aided generations correlate with stronger gloss anchoring in attention heads.

Key takeaway

If you are an NLP engineer developing cross-lingual LLM applications, recognize that current models exhibit a strong literal translation bias for idioms. To improve performance, supply an explicit semantic pivot, such as an English gloss, with each idiom you ask the model to translate. Your evaluation protocols should include diagnostic benchmarks, such as multiple-choice tasks with typed distractors, to pinpoint specific failure modes and measure how much semantic grounding helps figurative meaning transfer.
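As a minimal sketch of what supplying a gloss can look like, the snippet below builds a matched pair of prompts in the spirit of the No-gloss and With-gloss conditions. The function name `build_prompt_pair`, the prompt wording, and the example idiom and gloss are assumptions made for illustration; they are not taken from the benchmark.

```python
def build_prompt_pair(idiom, gloss, src_lang, tgt_lang):
    """Build two prompts that differ only in whether the gloss is shown.

    Hypothetical helper: the wording is illustrative, not the benchmark's.
    """
    task = (
        f'Give the {tgt_lang} idiom closest in meaning to the '
        f'{src_lang} idiom "{idiom}". Answer with the idiom only.'
    )
    return {
        # Baseline condition: the model sees only the source idiom.
        "no_gloss": task,
        # Grounded condition: same task, plus the English gloss as a pivot.
        "with_gloss": f"{task}\nEnglish gloss of the source idiom: {gloss}",
    }


# Illustrative input; the gloss here is a paraphrase, not a Wiktionary quote.
prompts = build_prompt_pair("kick the bucket", "to die", "English", "German")
```

Keeping the task text identical in both prompts means any difference in output quality can be attributed to the gloss rather than to a change in instructions.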

Key insights

Non-compositionality makes cross-lingual idiom alignment hard for LLMs: glosses supply semantic grounding, but literal bias remains dominant.

Principles

Idioms are non-compositional and culturally grounded.
LLMs exhibit a strong literal translation bias.
English glosses provide a robust semantic pivot.

Method

G-IdiomAlign constructs idiom pairs by extracting Wiktionary glosses, retrieving top-k candidates in an embedding space, enforcing mutual nearest neighbor (MNN) agreement, and applying distribution-aware filtering for high-confidence alignment.
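The retrieval and agreement steps can be sketched as follows, assuming gloss embeddings are already available as NumPy arrays. The function name, the default `k`, and the fixed `min_sim` cutoff are hypothetical; in particular, the plain similarity threshold only stands in for the distribution-aware filtering, whose exact form this summary does not specify.

```python
import numpy as np


def mutual_nearest_neighbors(src_emb, tgt_emb, k=5, min_sim=0.5):
    """Return (i, j, sim) triples where j is in i's top-k and i is in j's top-k."""
    # Normalize rows so that dot products are cosine similarities.
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape: (n_src, n_tgt)

    # Top-k targets for every source row, and top-k sources for every target.
    top_tgt = np.argsort(-sim, axis=1)[:, : min(k, sim.shape[1])]
    top_src = np.argsort(-sim.T, axis=1)[:, : min(k, sim.shape[0])]

    pairs = []
    for i in range(sim.shape[0]):
        for j in top_tgt[i]:
            # Keep the pair only if the choice is mutual and similar enough.
            # (min_sim is an assumed stand-in for distribution-aware filtering.)
            if i in top_src[j] and sim[i, j] >= min_sim:
                pairs.append((i, int(j), float(sim[i, j])))
    return pairs


# Toy 2-D "embeddings": source 2 sits between both targets and is dropped
# because neither target picks it as its nearest source.
src_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tgt_emb = np.array([[0.9, 0.1], [0.1, 0.9]])
pairs = mutual_nearest_neighbors(src_emb, tgt_emb, k=1)
```

Requiring agreement in both directions trades recall for precision: an ambiguous gloss that is close to several candidates tends to be dropped rather than aligned to the wrong idiom.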

In practice

Integrate explicit glosses for idiom translation.
Use typed distractors for fine-grained error analysis.
Apply attention diagnostics to trace gloss anchoring.
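For the typed-distractor item above, error analysis can be as simple as tallying which kind of distractor a model picked when it answered wrong. The sketch below assumes each multiple-choice item records its correct option and a type label for every distractor; the field names and the type labels (`literal`, `related_meaning`, `unrelated`) are hypothetical and may not match the benchmark's actual taxonomy.

```python
from collections import Counter


def attribute_errors(items, predictions):
    """Return accuracy and a count of wrong answers by chosen distractor type."""
    correct = 0
    error_types = Counter()
    for item, pred in zip(items, predictions):
        if pred == item["answer"]:
            correct += 1
        else:
            # Look up the type of the distractor the model selected.
            error_types[item["distractor_types"].get(pred, "unknown")] += 1
    accuracy = correct / len(items) if items else 0.0
    return accuracy, error_types


# Placeholder items with option labels only; type names are assumptions.
items = [
    {"answer": "A", "distractor_types": {"B": "literal", "C": "related_meaning", "D": "unrelated"}},
    {"answer": "C", "distractor_types": {"A": "literal", "B": "related_meaning", "D": "unrelated"}},
    {"answer": "B", "distractor_types": {"A": "literal", "C": "related_meaning", "D": "unrelated"}},
]
predictions = ["A", "A", "C"]
accuracy, error_types = attribute_errors(items, predictions)
```

A tally dominated by one type, for example literal distractors, points to a specific failure mode that a single accuracy number would hide.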

Topics

Cross-Lingual Idiom Alignment
Large Language Models
NLP Benchmarking
Semantic Grounding
Wiktionary
Attention Mechanisms

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.