G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, medium

Summary

G-IdiomAlign is a gloss-pivoted benchmark for evaluating how well large language models (LLMs) perform cross-lingual idiom alignment, a task made difficult by the non-compositionality of idioms. The benchmark anchors each idiom to an English gloss from Wiktionary and includes a high-confidence reference alignment set for reproducible evaluation. It supports two protocols: a Multiple-Choice Idiom Equivalence (MCIE) task with typed distractors for error attribution, and a Gloss-Contrastive Generation (GCG) task that compares No-gloss and With-gloss inputs to isolate the effect of an explicit semantic pivot. Across diverse LLMs, the dominant failure mode is a bias toward literal translation, particularly for low-resource languages. Glosses consistently improve GCG performance under an embedding-based semantic proxy, but overall performance remains modest, leaving substantial room for improvement. Further analysis of Qwen3-8B indicates that differences between the No-gloss and With-gloss conditions are concentrated in attention heads more than in layers, and that stronger gloss anchoring correlates with better With-gloss generations.

Key takeaway

NLP engineers developing multilingual LLMs should prioritize the persistent literal-translation bias in idiom handling, especially for low-resource languages. Consider integrating explicit semantic glosses into training or inference pipelines to improve cross-lingual idiom alignment. Benchmarks such as G-IdiomAlign, with its MCIE and GCG protocols, can be used to evaluate idiom translation rigorously and to attribute errors by type. The Qwen3-8B analysis suggests that attention heads are a reasonable place to start when looking for targeted improvements.

Key insights

Cross-lingual idiom alignment in LLMs benefits from explicit semantic glosses, yet literal translation bias remains a significant challenge.
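
The gloss-contrastive comparison can be sketched as follows. This is a minimal illustration, not the benchmark's implementation: the prompt wording, the function names `build_prompts` and `gloss_gain`, and the caller-supplied `embed` function are all assumptions. The only idea taken from the summary is that the same idiom is generated under a No-gloss and a With-gloss condition and both outputs are scored against a reference with an embedding-based similarity.

```python
import math
from typing import Callable, Dict, Sequence


def build_prompts(idiom: str, source_lang: str, target_lang: str, gloss: str) -> Dict[str, str]:
    """Build the two GCG conditions. The wording is hypothetical."""
    base = (
        f"Give an idiom in {target_lang} that is equivalent to the "
        f"{source_lang} idiom '{idiom}'."
    )
    return {
        "no_gloss": base,
        "with_gloss": f"{base} Meaning of the idiom: {gloss}",
    }


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def gloss_gain(
    embed: Callable[[str], Sequence[float]],
    reference: str,
    output_no_gloss: str,
    output_with_gloss: str,
) -> float:
    """Similarity to the reference with the gloss minus similarity without it.

    `embed` is any sentence-embedding function supplied by the caller
    (an assumption; the summary does not name the embedding model).
    """
    ref_vec = embed(reference)
    with_score = cosine(embed(output_with_gloss), ref_vec)
    no_score = cosine(embed(output_no_gloss), ref_vec)
    return with_score - no_score
```

A positive `gloss_gain` means the With-gloss generation landed closer to the reference than the No-gloss one. Averaging this difference over items is one plausible way to summarize the effect of the semantic pivot.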

Method

G-IdiomAlign is a gloss-pivoted benchmark with two protocols: Multiple-Choice Idiom Equivalence (MCIE), which uses typed distractors for error attribution, and Gloss-Contrastive Generation (GCG), which isolates the effect of the semantic pivot.
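
Error attribution with typed distractors can be sketched as below. The data layout, the `MCIEItem` class, and the distractor type names (`literal`, `unrelated_idiom`) are hypothetical; the summary states only that distractors are typed and that literal translation is the dominant failure mode. The example item is illustrative and is not taken from the benchmark.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MCIEItem:
    source_idiom: str
    gloss: str                     # English gloss used as the semantic pivot
    options: Dict[str, str]        # option label -> candidate text
    option_types: Dict[str, str]   # option label -> "correct" or a distractor type


def attribute_errors(items: List[MCIEItem], predictions: List[str]) -> Counter:
    """Count how often the model picked each option type."""
    tally = Counter()
    for item, choice in zip(items, predictions):
        tally[item.option_types[choice]] += 1
    return tally


def accuracy(tally: Counter) -> float:
    """Share of predictions that landed on the option typed "correct"."""
    total = sum(tally.values())
    return tally["correct"] / total if total else 0.0


# Illustrative item (assumed, not from G-IdiomAlign): English -> French.
example = MCIEItem(
    source_idiom="kick the bucket",
    gloss="to die",
    options={
        "A": "casser sa pipe",                        # idiomatic equivalent
        "B": "donner un coup de pied dans le seau",   # word-for-word rendering
        "C": "poser un lapin",                        # idiom with another meaning
    },
    option_types={"A": "correct", "B": "literal", "C": "unrelated_idiom"},
)

# Three hypothetical model answers for the same item.
tally = attribute_errors([example] * 3, ["A", "B", "B"])
```

Because every wrong option carries a type, the tally shows which kind of mistake dominates, not just the overall accuracy. Here two of the three answers fall on the `literal` distractor.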

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.