G-IdiomAlign: A Gloss-Pivoted Benchmark for Cross-Lingual Idiom Alignment

2026-06-17 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, medium

Summary

G-IdiomAlign is a gloss-pivoted benchmark that evaluates how well large language models (LLMs) align idioms across languages, a task made hard by non-compositionality: an idiom's meaning cannot be derived from its individual words. The benchmark anchors each idiom with an English gloss from Wiktionary and includes a high-confidence reference alignment set for reproducible evaluation. It supports two protocols: a Multiple-Choice Idiom Equivalence (MCIE) task with typed distractors for error attribution, and a Gloss-Contrastive Generation (GCG) task that compares No-gloss and With-gloss inputs to isolate the impact of an explicit semantic pivot. Across diverse LLMs, the dominant failure mode is a bias toward literal translation, particularly for low-resource languages. Glosses consistently improve GCG performance under an embedding-based semantic proxy, but overall performance remains modest, leaving significant room for improvement. Further analysis on Qwen3-8B indicates that differences between the two conditions are concentrated in attention heads more than in layers, and that stronger gloss anchoring correlates with better With-gloss generations.

Key takeaway

NLP engineers developing multilingual LLMs should prioritize the persistent literal-translation bias in idiom handling, especially for low-resource languages. Supplying explicit semantic glosses is one lever worth testing: in the benchmark, With-gloss inputs consistently improved GCG performance under an embedding-based semantic proxy, although overall performance remained modest. G-IdiomAlign's MCIE and GCG protocols can be used to evaluate idiom alignment and attribute errors, and the Qwen3-8B analysis suggests attention heads are a useful place to look when diagnosing differences between conditions.

Key insights

Cross-lingual idiom alignment in LLMs benefits from explicit semantic glosses, yet literal translation bias remains a significant challenge.

Principles

Idioms' non-compositionality hinders cross-lingual transfer.
LLMs exhibit a strong bias toward literal idiom translation, particularly for low-resource languages.
Explicit semantic glosses can improve idiom alignment.

Method

G-IdiomAlign is a gloss-pivoted benchmark with two protocols: Multiple-Choice Idiom Equivalence (MCIE), whose typed distractors support error attribution, and Gloss-Contrastive Generation (GCG), which compares No-gloss and With-gloss inputs to isolate the effect of the semantic pivot.
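
The error-attribution idea behind MCIE can be illustrated with a minimal sketch. It is not the benchmark's own code or data format: the class names, the `attribute_errors` helper, and the distractor type labels (`literal`, `other_idiom`) are assumptions made for illustration, and the English-to-French item is an invented example. The point is only that tagging each distractor with a type lets wrong answers be counted by the kind of mistake they represent.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    text: str
    kind: str  # "correct" or a distractor type (type names are hypothetical)


@dataclass(frozen=True)
class MCIEItem:
    source_idiom: str
    gloss: str      # English gloss used as the semantic pivot
    options: tuple  # tuple of Option, exactly one with kind == "correct"


def attribute_errors(items, choices):
    """Return (accuracy, Counter of distractor types picked on wrong answers)."""
    if len(items) != len(choices):
        raise ValueError("items and choices must have the same length")
    errors = Counter()
    n_correct = 0
    for item, choice in zip(items, choices):
        kind = item.options[choice].kind
        if kind == "correct":
            n_correct += 1
        else:
            errors[kind] += 1
    accuracy = n_correct / len(items) if items else 0.0
    return accuracy, errors


# Illustrative English -> French item; not taken from the benchmark.
item = MCIEItem(
    source_idiom="kick the bucket",
    gloss="to die",
    options=(
        Option("casser sa pipe", "correct"),
        Option("donner un coup de pied dans le seau", "literal"),
        Option("tomber dans les pommes", "other_idiom"),
    ),
)

# Hypothetical model choices (option indices) over three presentations of the item.
accuracy, errors = attribute_errors([item, item, item], [0, 1, 1])
print(round(accuracy, 2), dict(errors))
```

In this toy run both wrong answers land on the `literal` distractor, which is the kind of signal a typed-distractor design is meant to surface.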

In practice

Anchor idiom translations with semantic glosses.
Inspect attention heads, not only layers, when diagnosing No-gloss versus With-gloss differences.
Employ MCIE and GCG for idiom alignment evaluation.
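
A rough sketch of a gloss-contrastive comparison follows, assuming a simple prompt template and a similarity-based score. The prompt wording, the function names, and the `toy_embed` bag-of-words embedding are all placeholders invented here; the benchmark's actual prompts and embedding model are not specified in this summary. In real use, `toy_embed` would be replaced by a proper multilingual sentence encoder.

```python
import math
from collections import Counter


def build_prompt(idiom, src_lang, tgt_lang, gloss=None):
    """Build a No-gloss prompt, or a With-gloss prompt when `gloss` is given."""
    lines = [
        f"Give the {tgt_lang} idiom equivalent to this {src_lang} idiom: {idiom}"
    ]
    if gloss is not None:
        lines.append(f"English gloss: {gloss}")
    return "\n".join(lines)


def toy_embed(text):
    """Placeholder embedding (bag-of-words counts); swap in a real encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gloss_gain(reference, no_gloss_output, with_gloss_output, embed=toy_embed):
    """Similarity to the reference With-gloss minus similarity No-gloss."""
    ref = embed(reference)
    return cosine(embed(with_gloss_output), ref) - cosine(embed(no_gloss_output), ref)


no_gloss_prompt = build_prompt("kick the bucket", "English", "French")
with_gloss_prompt = build_prompt("kick the bucket", "English", "French", gloss="to die")

# Hypothetical model outputs for the two conditions (not real results).
gain = gloss_gain(
    reference="casser sa pipe",
    no_gloss_output="donner un coup de pied dans le seau",
    with_gloss_output="casser sa pipe",
)
print(with_gloss_prompt)
print(f"gloss gain: {gain:.2f}")
```

A positive gain means the With-gloss output sits closer to the reference than the No-gloss output under the chosen proxy. Because the score is only as good as the embedding behind it, the result should be read as a semantic proxy and not as a direct measure of translation quality.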

Topics

Cross-Lingual Alignment
Idiom Translation
Large Language Models
NLP Benchmarks
Semantic Glosses
Low-Resource NLP

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.