When Similar Means Different: Evaluating LLMs on Arabic-Hebrew Cognates

2026-06-11 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Researchers introduce SemCog Bench, a benchmark of 1,858 Arabic-Hebrew word pairs with sentence-level annotations, to evaluate the cross-lingual semantic understanding of large language models. The benchmark targets cognate identification and semantic disambiguation across true cognates, false friends, and modern loanwords in these two closely related Semitic languages. Open-source and commercial LLMs were evaluated on four input representations: raw, diacritized, Romanized, and phonetic. The results reveal a marked gap in cross-lingual reasoning: models perform well on true cognates, but accuracy drops sharply on false friends and loanwords, indicating strong dependence on surface-form similarity. Adding sentence-level context yields only modest improvements, suggesting that context alone is insufficient to overcome misleading form-based signals. The findings point to a fundamental limitation in current LLMs' ability to resolve cross-lingual form-meaning conflicts.

Key takeaway

NLP engineers developing or deploying multilingual LLMs for Semitic languages should account for significant limitations in cross-lingual semantic disambiguation. Models will likely struggle with false friends and loanwords, even when given sentence-level context, because they over-rely on surface forms. Evaluate rigorously with benchmarks such as SemCog Bench, test specifically for form-meaning conflicts, and focus development on techniques that go beyond surface similarity.

Key insights

Current LLMs struggle with cross-lingual semantic disambiguation, especially false friends and loanwords, due to over-reliance on surface forms.

Principles

LLMs rely heavily on surface-form similarity.
Contextual cues alone are often insufficient.
Cross-lingual form-meaning conflicts persist.

Method

SemCog Bench evaluates LLMs on 1,858 Arabic-Hebrew word pairs with sentence-level annotations, covering cognate identification and semantic disambiguation across four input representations (raw, diacritized, Romanized, and phonetic).
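
One way to report results on a benchmark of this shape is to break accuracy down by pair category and input representation, so that a drop on false friends or loanwords is not hidden by strong true-cognate scores. The sketch below is illustrative only: the record fields (category, representation, gold, predicted) and the toy records are assumptions made for this example, not the SemCog Bench schema and not results from the paper.

```python
from collections import defaultdict


def accuracy_by_slice(records):
    """Return {(category, representation): accuracy} for prediction records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        key = (r["category"], r["representation"])
        total[key] += 1
        if r["predicted"] == r["gold"]:
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}


# Hypothetical toy records, invented purely to exercise the function.
toy_records = [
    {"category": "true_cognate", "representation": "raw",
     "gold": "same", "predicted": "same"},
    {"category": "true_cognate", "representation": "raw",
     "gold": "same", "predicted": "same"},
    {"category": "false_friend", "representation": "raw",
     "gold": "different", "predicted": "same"},
    {"category": "false_friend", "representation": "raw",
     "gold": "different", "predicted": "different"},
    {"category": "loanword", "representation": "romanized",
     "gold": "same", "predicted": "different"},
]

scores = accuracy_by_slice(toy_records)
for (category, representation), acc in sorted(scores.items()):
    print(f"{category:>12} | {representation:<10} | {acc:.2f}")
```

Keeping category and representation as a joint key makes it easy to compare, for instance, whether diacritized input helps on false friends specifically rather than only on average.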

In practice

Use SemCog Bench when evaluating multilingual LLMs.
Target form-meaning conflicts (false friends, loanwords) in LLM training and evaluation.
Develop disambiguation methods that do not depend on surface similarity.
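
To find candidate form-meaning conflicts in your own evaluation data, one simple heuristic is to flag word pairs whose Romanized forms look alike but whose meanings differ. The sketch below is an assumption-laden illustration, not the paper's method: it uses Python's standard difflib ratio as a stand-in similarity measure, an arbitrary threshold of 0.6, and simplified ASCII romanizations. The two example pairs are well-known textbook cases (Arabic lahm "meat" vs. Hebrew lehem "bread"; Arabic salam vs. Hebrew shalom, both "peace") and are not claimed to appear in SemCog Bench.

```python
from difflib import SequenceMatcher


def surface_similarity(a, b):
    """Character-level similarity in [0, 1] between two romanized strings."""
    return SequenceMatcher(None, a, b).ratio()


def flag_conflicts(pairs, threshold=0.6):
    """Return pairs that look alike (similarity >= threshold) but differ in meaning."""
    flagged = []
    for arabic, hebrew, same_meaning in pairs:
        if not same_meaning and surface_similarity(arabic, hebrew) >= threshold:
            flagged.append((arabic, hebrew))
    return flagged


# Illustrative pairs: (romanized Arabic, romanized Hebrew, same meaning?).
# The threshold and romanization scheme are assumptions for this sketch.
example_pairs = [
    ("lahm", "lehem", False),   # meat vs. bread: similar form, different meaning
    ("salam", "shalom", True),  # peace vs. peace: similar form, same meaning
]

conflicts = flag_conflicts(example_pairs)
print(conflicts)
```

Pairs flagged this way are the cases where a model leaning on surface form is most likely to be misled, so they make a useful targeted test slice.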

Topics

Large Language Models
Cross-lingual Understanding
Arabic-Hebrew Cognates
Semantic Disambiguation
Benchmarking
Surface-Form Similarity

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.