When Similar Means Different: Evaluating LLMs on Arabic-Hebrew Cognates
Summary
Researchers introduced SemCog Bench, a benchmark of 1,858 Arabic-Hebrew word pairs with sentence-level annotations, to evaluate the cross-lingual semantic understanding of large language models. The benchmark targets cognate identification and semantic disambiguation, covering true cognates, false friends, and modern loanwords between these two closely related Semitic languages. Open-source and commercial LLMs were evaluated on several input representations, including raw, diacritized, Romanized, and phonetic forms, and the results revealed a significant gap in cross-lingual reasoning. Models performed well on true cognates, but accuracy dropped sharply on false friends and loanwords, indicating a strong dependence on surface-form similarity. Adding sentence-level context brought only modest improvements, which suggests that context alone does not overcome misleading form-based signals. The findings point to a fundamental limitation in how current LLMs resolve cross-lingual conflicts between form and meaning.
Key takeaway
NLP engineers developing or deploying multilingual LLMs for Semitic languages should account for significant limitations in cross-lingual semantic disambiguation. Models are likely to struggle with false friends and loanwords, even when given sentence-level context, because they over-rely on surface forms. Evaluate rigorously with benchmarks such as SemCog Bench, test specifically for form-meaning conflicts, and prioritize techniques that go beyond surface similarity.
Key insights
Current LLMs struggle with cross-lingual semantic disambiguation, especially false friends and loanwords, due to over-reliance on surface forms.
Principles
- LLMs rely heavily on surface-form similarity.
- Contextual cues alone are often insufficient.
- Cross-lingual form-meaning conflicts persist.
Method
SemCog Bench evaluates LLMs on 1,858 Arabic-Hebrew word pairs with sentence-level annotations. It covers two tasks, cognate identification and semantic disambiguation, and presents each input in four representations: raw, diacritized, Romanized, and phonetic.
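The breakdown described above (accuracy per pair type and per input representation) can be sketched as follows. This is a minimal illustration, not the benchmark's own evaluation code: the record fields (`category`, `representation`, `gold`, `pred`), the label strings, and the toy records are all assumptions made for the example.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Return accuracy for each (category, representation) pair.

    Each record is a dict with hypothetical keys:
    'category', 'representation', 'gold', 'pred'.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        key = (r["category"], r["representation"])
        totals[key] += 1
        correct[key] += int(r["gold"] == r["pred"])
    return {key: correct[key] / totals[key] for key in totals}

# Toy records, invented for illustration only (not benchmark data).
records = [
    {"category": "true_cognate", "representation": "raw", "gold": "same", "pred": "same"},
    {"category": "true_cognate", "representation": "raw", "gold": "same", "pred": "same"},
    {"category": "false_friend", "representation": "raw", "gold": "different", "pred": "same"},
    {"category": "false_friend", "representation": "raw", "gold": "different", "pred": "different"},
    {"category": "loanword", "representation": "romanized", "gold": "same", "pred": "different"},
]

scores = accuracy_by_group(records)
for (category, representation), acc in sorted(scores.items()):
    print(f"{category:13s} {representation:10s} {acc:.2f}")
```

Reporting accuracy per group rather than as a single aggregate keeps a strong true-cognate score from hiding weak performance on false friends and loanwords.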
In practice
- Use SemCog Bench for multilingual LLM evaluation.
- Focus LLM training on cases where form and meaning conflict, such as false friends and loanwords.
- Develop disambiguation methods that do not depend on surface similarity.
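One way to test for form-meaning conflicts is to compare a model against a form-only baseline that predicts "same meaning" whenever two Romanized strings look alike. The sketch below is an assumption-laden illustration, not part of SemCog Bench: the similarity measure (Python's `difflib.SequenceMatcher`), the 0.6 threshold, the simplified ASCII romanizations, and the two word pairs are all chosen for the example. The pairs are commonly cited ones: Arabic "salam" and Hebrew "shalom" both mean "peace", while Arabic "lahm" means "meat" and Hebrew "lehem" means "bread".

```python
from difflib import SequenceMatcher

def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two romanized strings."""
    return SequenceMatcher(None, a, b).ratio()

def form_only_prediction(a: str, b: str, threshold: float = 0.6) -> str:
    """Predict 'same' meaning purely from surface similarity (hypothetical baseline)."""
    return "same" if surface_similarity(a, b) >= threshold else "different"

# Illustrative pairs with simplified romanization; not taken from the benchmark.
pairs = [
    {"arabic": "salam", "hebrew": "shalom", "gold": "same"},      # peace / peace
    {"arabic": "lahm", "hebrew": "lehem", "gold": "different"},   # meat / bread
]

for p in pairs:
    pred = form_only_prediction(p["arabic"], p["hebrew"])
    status = "correct" if pred == p["gold"] else "wrong"
    print(f'{p["arabic"]} / {p["hebrew"]}: predicted {pred}, gold {p["gold"]} ({status})')
```

A model whose errors line up with this baseline's errors is probably leaning on surface form, which is the failure pattern the summary describes.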
Topics
- Large Language Models
- Cross-lingual Understanding
- Arabic-Hebrew Cognates
- Semantic Disambiguation
- Benchmarking
- Surface-Form Similarity
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.