When Similar Means Different: Evaluating LLMs on Arabic–Hebrew Cognates

Source: Computation and Language · Field: Technology & Digital / Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

The authors introduce SemCog Bench, a benchmark of 1,858 Arabic–Hebrew word pairs with sentence-level annotations, to evaluate the cross-lingual semantic understanding of large language models. The benchmark targets cognate identification and semantic disambiguation, addressing the challenges posed by true cognates, false friends, and modern loanwords in these closely related Semitic languages. Open-source and commercial LLMs were evaluated on four input representations (raw, diacritized, Romanized, and phonetic), and the results reveal a significant gap in cross-lingual reasoning. Models perform well on true cognates, but accuracy drops sharply on false friends and loanwords, which indicates a strong dependence on surface-form similarity. Adding sentence-level context yields only modest improvements, suggesting that context alone does not overcome misleading form-based signals. The findings point to a fundamental limitation in how current LLMs resolve cross-lingual conflicts between form and meaning.

Key takeaway

NLP engineers developing or deploying multilingual LLMs for Semitic languages should plan for significant limitations in cross-lingual semantic disambiguation. Models are likely to struggle with false friends and loanwords, even when given sentence-level context, because they over-rely on surface forms. Evaluate rigorously with benchmarks such as SemCog Bench, test specifically for form-meaning conflicts, and focus development on techniques that go beyond surface similarity.

Key insights

Current LLMs struggle with cross-lingual semantic disambiguation, especially false friends and loanwords, due to over-reliance on surface forms.
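To make "surface form" concrete, the following is a minimal sketch of a form-only similarity score, assuming normalized Levenshtein distance over romanized strings. It is not the paper's method. The function names, the ASCII romanizations, and the two example pairs are illustrative assumptions and are not drawn from SemCog Bench.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def surface_similarity(a: str, b: str) -> float:
    """Form-only similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Illustrative pairs (assumed simplified romanizations, not benchmark data):
# (Arabic form, Hebrew form, whether the everyday meanings match).
pairs = [
    ("salam", "shalom", True),   # both mean "peace"
    ("lahm", "lehem", False),    # Arabic "meat" vs Hebrew "bread"
]

for arabic, hebrew, same_meaning in pairs:
    score = surface_similarity(arabic, hebrew)
    print(f"{arabic:6s} {hebrew:7s} form_sim={score:.2f} same_meaning={same_meaning}")
```

A score like this sees only spelling, so it cannot tell a false friend from a true cognate when both pairs look alike. That is the failure pattern the insight above attributes to LLMs.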

Principles

Method

SemCog Bench evaluates LLMs on cognate identification and semantic disambiguation using 1,858 Arabic–Hebrew word pairs with sentence-level annotations. Each pair is presented in four input representations: raw, diacritized, Romanized, and phonetic.
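An evaluation of this shape is most informative when accuracy is reported per pair category and per input representation rather than as one aggregate number. The sketch below shows one way to compute that breakdown. The record layout, the category and representation labels, and the binary "same meaning" judgment are assumptions for illustration, not the benchmark's actual schema, and the toy records are placeholders rather than reported results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    category: str        # assumed labels: "true_cognate", "false_friend", "loanword"
    representation: str  # assumed labels: "raw", "diacritized", "romanized", "phonetic"
    gold: bool           # annotated answer: do the two words share a meaning?
    predicted: bool      # the model's answer to the same question


def accuracy_by_group(preds):
    """Return {(representation, category): accuracy} over the given predictions."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for p in preds:
        key = (p.representation, p.category)
        totals[key] += 1
        correct[key] += int(p.gold == p.predicted)
    return {key: correct[key] / totals[key] for key in totals}


# Placeholder records to show the call pattern only (not real results).
toy_predictions = [
    Prediction("true_cognate", "raw", True, True),
    Prediction("true_cognate", "raw", True, True),
    Prediction("false_friend", "raw", False, True),
    Prediction("false_friend", "raw", False, False),
    Prediction("loanword", "romanized", True, False),
]

for (representation, category), acc in sorted(accuracy_by_group(toy_predictions).items()):
    print(f"{representation:12s} {category:13s} accuracy={acc:.2f}")
```

Keeping the category in the grouping key matters here: an aggregate score dominated by true cognates could hide exactly the false-friend and loanword failures the benchmark is designed to expose.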

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.