G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment
Summary
G-IdiomAlign is a gloss-pivoted benchmark for cross-lingual idiom transfer in which idioms are anchored by English glosses from Wiktionary. It includes a high-confidence reference alignment set and supports two protocols: Multiple-Choice Idiom Equivalence, which uses typed distractors for error attribution, and Gloss-Contrastive Generation. Across the Large Language Models (LLMs) evaluated, a dominant failure mode is a bias toward literal translation, particularly for low-resource languages. Glosses consistently improve Gloss-Contrastive Generation performance under an embedding-based semantic proxy, though overall performance remains modest. Further analysis on Qwen3-8B indicates that cross-condition differences are concentrated in attention heads more than in layers, and that better gloss-inclusive generations correlate with stronger gloss anchoring.
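To make the "embedding-based semantic proxy" concrete, the following is a minimal sketch of one way such a comparison could be computed: cosine similarity between a reference idiom embedding and the embeddings of generations produced with and without the gloss. The function names, the toy vectors, and the use of plain cosine similarity are assumptions for illustration; the summary does not specify the benchmark's actual scoring.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def gloss_gain(ref_vec, with_gloss_vec, without_gloss_vec):
    """Hypothetical helper: similarity to the reference with the gloss minus without it."""
    return cosine(ref_vec, with_gloss_vec) - cosine(ref_vec, without_gloss_vec)


# Toy 3-dimensional vectors standing in for real sentence embeddings (assumed values).
reference = [1.0, 0.0, 1.0]
with_gloss = [0.9, 0.1, 0.8]
without_gloss = [0.1, 1.0, 0.2]

gain = gloss_gain(reference, with_gloss, without_gloss)
print(f"gloss gain: {gain:.3f}")
```

A positive gain means the gloss-conditioned generation sits closer to the reference in embedding space. In a real evaluation the vectors would come from a sentence encoder rather than being written by hand.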
Key takeaway
NLP engineers developing cross-lingual models should consider supplying semantic glosses to mitigate literal translation bias, especially for low-resource languages. Benchmarks like G-IdiomAlign can be used to evaluate model performance and identify specific failure modes. Strengthening gloss anchoring within attention mechanisms is a promising direction for improving the translation of non-compositional idioms.
Key insights
G-IdiomAlign is a gloss-pivoted benchmark that reveals an LLM bias toward literal idiom translation and shows that semantic glosses improve idiom generation.
Principles
- Idioms are difficult to transfer cross-lingually.
- LLMs exhibit a bias toward literal idiom translation.
- Glosses consistently improve idiom generation, though overall performance remains modest.
Method
G-IdiomAlign uses English glosses from Wiktionary to anchor idioms, supporting Multiple-Choice Idiom Equivalence and Gloss-Contrastive Generation protocols for evaluation.
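The idea of typed distractors for error attribution can be sketched as follows: each multiple-choice option carries a type label, so a wrong answer can be counted under the kind of mistake it represents. The class names, the label set ("literal", "other_idiom"), and the example idioms are hypothetical illustrations, not the benchmark's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    text: str
    kind: str  # "correct" or an assumed distractor type such as "literal"


@dataclass(frozen=True)
class Item:
    source_idiom: str
    gloss: str        # English gloss that anchors the idiom
    options: tuple    # tuple of Option


def attribute_errors(items, predictions):
    """Count how often the chosen option falls under each option type."""
    counts = Counter()
    for item, choice in zip(items, predictions):
        counts[item.options[choice].kind] += 1
    return counts


# Illustrative English-to-French item (not taken from the benchmark).
item = Item(
    source_idiom="kick the bucket",
    gloss="to die",
    options=(
        Option("casser sa pipe", "correct"),
        Option("donner un coup de pied dans le seau", "literal"),
        Option("tomber dans les pommes", "other_idiom"),
    ),
)

# Three hypothetical model answers to the same item: indices into item.options.
report = attribute_errors([item, item, item], [1, 0, 1])
print(dict(report))
```

Keeping the type on the option rather than in a separate lookup means the tally needs nothing beyond the chosen index, and a high "literal" count directly signals literal translation bias.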
In practice
- Use glosses to improve cross-lingual idiom generation.
- Evaluate LLMs for literal translation bias.
- Analyze attention heads for gloss anchoring.
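For the last point, one simple way to quantify gloss anchoring per attention head is to measure how much attention mass the generated tokens place on the gloss tokens. The sketch below assumes the attention weights for one layer are already available as nested lists indexed `[head][query][key]`; the function name, the metric, and the toy matrices are assumptions, and the paper's own analysis on Qwen3-8B may define anchoring differently.

```python
def gloss_anchoring(attn, gloss_positions, query_positions):
    """Per-head mean attention mass that the query tokens place on gloss tokens.

    attn: nested lists indexed [head][query][key], each row summing to 1.
    gloss_positions: key indices covering the gloss in the prompt.
    query_positions: query indices of the generated tokens to inspect.
    """
    scores = []
    for head in attn:
        mass = [sum(head[q][k] for k in gloss_positions) for q in query_positions]
        scores.append(sum(mass) / len(mass))
    return scores


# Toy causal attention for 2 heads over 3 tokens:
# positions 0 and 1 are the gloss, position 2 is a generated token.
attn = [
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.4, 0.4, 0.2]],    # head 0 attends to the gloss
    [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.05, 0.05, 0.9]],  # head 1 mostly attends to itself
]

scores = gloss_anchoring(attn, gloss_positions=[0, 1], query_positions=[2])
strongest_head = max(range(len(scores)), key=lambda h: scores[h])
print(scores, strongest_head)
```

Comparing these per-head scores between gloss-inclusive and gloss-free conditions, and against generation quality, is one way to look for the head-level differences described in the summary.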
Topics
- Cross-Lingual Idiom Alignment
- G-IdiomAlign Benchmark
- Large Language Models
- Semantic Glosses
- Low-Resource Languages
- Qwen3-8B Analysis
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.