G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment
Summary
G-IdiomAlign is a new gloss-pivoted benchmark for evaluating how well large language models (LLMs) perform cross-lingual idiom alignment, a task made difficult by the non-compositionality of idioms: their meaning cannot be derived from their individual words. The benchmark anchors each idiom to an English gloss from Wiktionary and includes a high-confidence reference alignment set for reproducible evaluation. It supports two protocols: a Multiple-Choice Idiom Equivalence (MCIE) task with typed distractors for error attribution, and a Gloss-Contrastive Generation (GCG) task that compares No-gloss and With-gloss inputs to isolate the effect of an explicit semantic pivot. Across diverse LLMs, the dominant failure mode is a bias toward literal translation, particularly for low-resource languages. Glosses consistently improve GCG performance under an embedding-based semantic proxy, but overall performance remains modest, leaving significant room for improvement. Further analysis of Qwen3-8B indicates that differences between the two conditions are concentrated in individual attention heads rather than in whole layers, and that stronger gloss anchoring correlates with better With-gloss generations.
Key takeaway
NLP engineers developing multilingual LLMs should prioritize the persistent literal-translation bias in idiom handling, especially for low-resource languages. Integrate explicit semantic glosses into training or inference pipelines to improve cross-lingual idiom alignment. Use benchmarks such as G-IdiomAlign, with its MCIE and GCG protocols, to evaluate idiom translation rigorously and attribute errors, and look to attention-head mechanisms for targeted improvements.
Key insights
Cross-lingual idiom alignment in LLMs benefits from explicit semantic glosses, yet literal translation bias remains a significant challenge.
Principles
- Idioms' non-compositionality hinders cross-lingual transfer.
- LLMs exhibit a strong bias toward literal idiom translation.
- Explicit semantic glosses can improve idiom alignment.
Method
G-IdiomAlign is a gloss-pivoted benchmark with two protocols: Multiple-Choice Idiom Equivalence (MCIE), whose typed distractors support error attribution, and Gloss-Contrastive Generation (GCG), which compares No-gloss and With-gloss inputs to isolate the effect of the semantic pivot.
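One way to picture error attribution with typed distractors is the minimal Python sketch below. It is an illustration under stated assumptions, not the benchmark's actual data format or scoring code: the `Option` and `MCIEItem` classes, the `attribute_errors` helper, and the distractor type names (`literal`, `other_idiom`) are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    text: str
    # "correct", or a distractor type. The type names used in this sketch
    # ("literal", "other_idiom") are assumptions, not the benchmark's labels.
    kind: str


@dataclass(frozen=True)
class MCIEItem:
    source_idiom: str
    gloss: str       # English gloss that anchors the idiom's meaning
    options: tuple   # tuple of Option, exactly one with kind == "correct"


def attribute_errors(items, predictions):
    """Return (accuracy, Counter of distractor types the model chose).

    predictions[i] is the index of the option chosen for items[i].
    """
    if len(items) != len(predictions):
        raise ValueError("items and predictions must have the same length")
    errors = Counter()
    correct = 0
    for item, choice in zip(items, predictions):
        kind = item.options[choice].kind
        if kind == "correct":
            correct += 1
        else:
            errors[kind] += 1
    accuracy = correct / len(items) if items else 0.0
    return accuracy, errors
```

For example, an item for the English idiom "kick the bucket" (gloss: "to die") might offer the French options "casser sa pipe" (correct), "donner un coup de pied dans le seau" (a literal rendering), and "tomber dans les pommes" (a different idiom, meaning to faint). A model that often picks the second option would show up in the returned counter under `literal`, which is the kind of signal the literal-translation finding rests on.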
In practice
- Anchor idiom translations with semantic glosses.
- Analyze attention heads, rather than whole layers, when targeting cross-lingual idiom improvements.
- Employ MCIE and GCG for idiom alignment evaluation.
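The gloss-contrastive idea behind the points above can be sketched in a few lines of Python. This is a rough sketch under stated assumptions: the prompt wording, the `build_prompt`, `cosine`, and `gloss_gain` helpers, and the use of plain cosine similarity as the semantic proxy are all hypothetical, and the embedding vectors are expected to come from whatever sentence encoder you choose (the summary does not specify the one used).

```python
import math


def build_prompt(idiom, source_lang, target_lang, gloss=None):
    """Build a No-gloss prompt, or a With-gloss prompt when gloss is given.

    The wording is illustrative only, not the benchmark's template.
    """
    lines = [
        f'Give the {target_lang} idiom equivalent to the '
        f'{source_lang} idiom: "{idiom}".'
    ]
    if gloss is not None:
        lines.append(f"English gloss of its meaning: {gloss}")
    lines.append("Answer with the idiom only.")
    return "\n".join(lines)


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def gloss_gain(reference_vec, no_gloss_vec, with_gloss_vec):
    """Similarity to the reference with the gloss, minus without it.

    A positive value means the gloss moved the generation closer to the
    reference under this (assumed) embedding-based proxy.
    """
    return cosine(with_gloss_vec, reference_vec) - cosine(no_gloss_vec, reference_vec)
```

In use, you would generate once from `build_prompt(idiom, "English", "French")` and once from the same call with `gloss="to die"`, embed both outputs and the reference idiom, and pass the three vectors to `gloss_gain`. Keeping everything except the gloss line identical between the two prompts is what lets the difference be attributed to the semantic pivot.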
Topics
- Cross-Lingual Alignment
- Idiom Translation
- Large Language Models
- NLP Benchmarks
- Semantic Glosses
- Low-Resource NLP
Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.