G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment

2026-06-17 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

G-IdiomAlign is a gloss-pivoted benchmark for cross-lingual idiom transfer in which idioms are anchored by English glosses from Wiktionary. It includes a high-confidence reference alignment set and supports two protocols: Multiple-Choice Idiom Equivalence, which uses typed distractors for error attribution, and Gloss-Contrastive Generation. Across the Large Language Models (LLMs) evaluated, a dominant failure mode is a bias toward literal translation, particularly for low-resource languages. Glosses consistently improve Gloss-Contrastive Generation performance under an embedding-based semantic proxy, though overall performance remains modest. Further analysis of Qwen3-8B indicates that cross-condition differences are concentrated in attention heads more than in layers, and that better gloss-inclusive generations correlate with stronger gloss anchoring.

Key takeaway

NLP engineers developing cross-lingual models should consider supplying semantic glosses to mitigate literal-translation bias, especially for low-resource languages. Benchmarks such as G-IdiomAlign can be used to evaluate model performance and identify specific failure modes. The Qwen3-8B analysis points to gloss anchoring in attention heads as a place to look when trying to improve the translation of non-compositional idioms.

Key insights

G-IdiomAlign is a gloss-pivoted benchmark that reveals an LLM bias toward literal idiom translation; supplying semantic glosses improves generation.

Principles

Idioms are difficult to transfer cross-lingually.
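LLMs exhibit a bias toward literal idiom translation, particularly for low-resource languages.
Glosses consistently improve idiom generation under an embedding-based semantic proxy, though overall performance remains modest.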
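
The effect of a gloss on generation can be illustrated with a small sketch of an embedding-based comparison. This is not the benchmark's scoring code: the summary does not say which encoder or similarity measure is used, so cosine similarity, the function names, and the toy vectors below are all assumptions. In practice the vectors would come from a sentence encoder of your choice.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def gloss_gain(reference, without_gloss, with_gloss):
    """Similarity to the reference with the gloss minus similarity without it.

    A positive value means the gloss-conditioned generation sits closer to
    the reference idiom under this (assumed) semantic proxy.
    """
    return cosine(reference, with_gloss) - cosine(reference, without_gloss)

# Toy embeddings standing in for encoder outputs (hypothetical values).
reference = [1.0, 0.0]
without_gloss = [0.0, 2.0]   # e.g. a literal rendering, far from the reference
with_gloss = [3.0, 4.0]      # e.g. a gloss-conditioned rendering

gain = gloss_gain(reference, without_gloss, with_gloss)
print(f"gloss gain: {gain:.2f}")
```

Averaging this difference over a test set gives one number per model and language pair, which makes it easy to see where glosses help most.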

Method

G-IdiomAlign uses English glosses from Wiktionary to anchor idioms, supporting Multiple-Choice Idiom Equivalence and Gloss-Contrastive Generation protocols for evaluation.
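
Error attribution with typed distractors can be sketched as a simple tally: each wrong choice is counted under the type of the distractor that was picked. The item layout and the distractor type names ("literal", "unrelated") are hypothetical, chosen for illustration only; the summary does not list the benchmark's actual distractor types or data format.

```python
from collections import Counter

def attribute_errors(items, predictions):
    """Count correct answers and, for each miss, the type of distractor chosen."""
    tally = Counter()
    for item, pred in zip(items, predictions):
        if pred == item["answer"]:
            tally["correct"] += 1
        else:
            # Fall back to "unknown" if the prediction is not a listed option.
            tally[item["distractor_types"].get(pred, "unknown")] += 1
    return tally

# Hypothetical items: one gold option plus distractors labeled by type.
items = [
    {"answer": "A", "distractor_types": {"B": "literal", "C": "unrelated"}},
    {"answer": "B", "distractor_types": {"A": "literal", "C": "unrelated"}},
    {"answer": "C", "distractor_types": {"A": "literal", "B": "unrelated"}},
]
predictions = ["A", "A", "B"]  # hypothetical model choices

tally = attribute_errors(items, predictions)
print(dict(tally))
```

A high share of misses under the literal type would be the signature of the literal-translation bias described in the summary.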

In practice

Use glosses to improve cross-lingual idiom generation.
Evaluate LLMs for literal translation bias.
Analyze attention heads for gloss anchoring.
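
One possible way to look at gloss anchoring per attention head is to measure how much attention mass each head places on the gloss tokens. The summary does not state how the Qwen3-8B analysis defines anchoring, so the metric below is an assumption, as are the function name and the toy attention values. The input is a nested list indexed as attn[head][query][key], with each row summing to 1.

```python
def gloss_attention_mass(attn, gloss_positions):
    """For each head, average over query positions the attention on gloss tokens."""
    gloss = set(gloss_positions)
    masses = []
    for head in attn:
        per_query = [
            sum(weight for key, weight in enumerate(row) if key in gloss)
            for row in head
        ]
        masses.append(sum(per_query) / len(per_query))
    return masses

# Toy attention for 2 heads, 2 query positions, 3 key positions (hypothetical).
attn = [
    [[0.5, 0.25, 0.25], [0.5, 0.25, 0.25]],  # head 0
    [[0.0, 0.5, 0.5], [0.5, 0.5, 0.0]],      # head 1
]
gloss_positions = [1, 2]  # key indices assumed to hold the gloss tokens

masses = gloss_attention_mass(attn, gloss_positions)
print(masses)  # [0.5, 0.75]
```

Comparing these per-head values between gloss and no-gloss conditions, or between good and poor generations, is one way to locate heads worth inspecting more closely.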

Topics

Cross-Lingual Idiom Alignment
G-IdiomAlign Benchmark
Large Language Models
Semantic Glosses
Low-Resource Languages
Qwen3-8B Analysis

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.