G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Advanced, extended

Summary

G-IdiomAlign is a gloss-pivoted benchmark for evaluating how well large language models align idioms across languages. It comprises 18,785 idiom pairs spanning 36 language pairs, covering nine core languages and four additional languages, with each idiom anchored by an English gloss from Wiktionary. The benchmark supports two evaluation protocols: a Multiple-Choice Idiom Equivalence task, whose typed distractors allow errors to be attributed to a cause, and a Gloss-Contrastive Generation task, which compares No-gloss and With-gloss inputs. Experiments with models such as DeepSeek-V3.2, Gemini-2.5-Pro, and Qwen3-8B consistently show a bias towards literal translation, particularly in low-resource languages. Explicit glosses improve generation performance, but overall accuracy remains modest. Attention-based diagnostics on Qwen3-8B indicate that successful gloss-aided generations correlate with stronger gloss anchoring in attention heads.
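The No-gloss versus With-gloss comparison can be pictured as two prompts that differ only in whether the English gloss is supplied. The sketch below is a minimal illustration under that assumption. The function name `build_prompts` and the prompt wording are hypothetical and are not taken from the benchmark.

```python
def build_prompts(idiom, src_lang, tgt_lang, gloss):
    """Build a No-gloss and a With-gloss prompt for the same source idiom.

    The wording is a hypothetical template, not the benchmark's own prompt.
    """
    base = (
        f'Give the {tgt_lang} idiom that best matches the {src_lang} idiom '
        f'"{idiom}". Answer with the idiom only.'
    )
    # The With-gloss variant adds the English gloss as a semantic pivot.
    with_gloss = f"{base}\nEnglish gloss of the source idiom: {gloss}"
    return {"no_gloss": base, "with_gloss": with_gloss}


# French "tomber dans les pommes" means "to faint".
prompts = build_prompts("tomber dans les pommes", "French", "German", "to faint")
```

Because the two prompts share everything except the gloss line, any difference in model output can be attributed to the gloss itself.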

Key takeaway

If you are an NLP engineer developing cross-lingual LLM applications, recognize that current models exhibit a strong literal translation bias for idioms. To improve performance, explicitly supply a semantic pivot, such as an English gloss, during translation tasks. Your evaluation protocols should include diagnostic benchmarks, such as multiple-choice tasks with typed distractors, to pinpoint specific failure modes and to measure how much semantic grounding helps figurative meaning transfer.
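One way to use typed distractors for error attribution is to tally, for each wrong answer, the type of distractor the model picked. The sketch below assumes each item records its correct option and a type label per distractor. The field names (`answer`, `distractor_types`) and the type labels (`literal`, `unrelated`) are hypothetical, not the benchmark's schema.

```python
from collections import Counter


def attribute_errors(items, predictions):
    """Return (accuracy, Counter of distractor types chosen on wrong answers)."""
    errors = Counter()
    correct = 0
    for item, pred in zip(items, predictions):
        if pred == item["answer"]:
            correct += 1
        else:
            # Look up which kind of distractor the model fell for.
            errors[item["distractor_types"].get(pred, "unknown")] += 1
    accuracy = correct / len(items) if items else 0.0
    return accuracy, errors


# Hypothetical items: option labels map to assumed distractor types.
items = [
    {"answer": "A", "distractor_types": {"B": "literal", "C": "unrelated"}},
    {"answer": "B", "distractor_types": {"A": "literal", "C": "unrelated"}},
    {"answer": "C", "distractor_types": {"A": "literal", "B": "unrelated"}},
    {"answer": "A", "distractor_types": {"B": "literal", "C": "unrelated"}},
]
predictions = ["A", "A", "C", "B"]
accuracy, errors = attribute_errors(items, predictions)
```

A tally dominated by one distractor type points at a specific failure mode, which a single accuracy number would hide.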

Key insights

Cross-lingual idiom alignment is hard for LLMs because idioms are non-compositional: their meaning cannot be derived from the meanings of their words. Glosses offer semantic grounding, but literal bias remains dominant.

Method

G-IdiomAlign constructs idiom pairs by extracting Wiktionary glosses, retrieving top-k candidates in an embedding space, enforcing mutual nearest neighbor (MNN) agreement, and applying distribution-aware filtering for high-confidence alignment.
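The retrieval and agreement steps can be sketched as follows, assuming the gloss embeddings are already available as NumPy arrays. The function names are hypothetical. The z-score threshold in `filter_by_zscore` is only a stand-in for the distribution-aware filtering, whose exact form is not described here.

```python
import numpy as np


def cosine_matrix(src_emb, tgt_emb):
    """Cosine similarity between every source and target gloss embedding."""
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
    return src @ tgt.T  # shape: (n_src, n_tgt)


def mnn_pairs(sim, k=5):
    """Keep (i, j) only if j is in i's top-k and i is in j's top-k."""
    k_tgt = min(k, sim.shape[1])
    k_src = min(k, sim.shape[0])
    top_tgt = np.argsort(-sim, axis=1)[:, :k_tgt]    # per source row
    top_src = np.argsort(-sim.T, axis=1)[:, :k_src]  # per target column
    pairs = []
    for i in range(sim.shape[0]):
        for j in top_tgt[i]:
            if i in top_src[j]:  # mutual agreement check
                pairs.append((i, int(j), float(sim[i, j])))
    return pairs


def filter_by_zscore(pairs, sim, z=1.0):
    """Assumed filter: keep pairs scoring at least z std devs above the mean."""
    threshold = sim.mean() + z * sim.std()
    return [p for p in pairs if p[2] >= threshold]


# Toy 2-D embeddings standing in for real gloss embeddings.
src_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
tgt_emb = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, -1.0]])
sim = cosine_matrix(src_emb, tgt_emb)
pairs = mnn_pairs(sim, k=1)
kept = filter_by_zscore(pairs, sim, z=1.0)
```

In the toy data, the third source vector prefers the second target, but that target prefers the first source, so the mutual check drops it. This is the kind of one-sided match that MNN agreement is meant to remove.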

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.