Ukrainian Visual Word Sense Disambiguation Benchmark
Summary
A new benchmark has been developed for the Visual Word Sense Disambiguation (Visual-WSD) task in Ukrainian. In this task, a system is given an ambiguous word with minimal context and must select, from a set of ten candidate images, the one that depicts the intended meaning. The benchmark follows the construction methodology that Raganato et al. (2023) used for English, Italian, and Farsi, so model performance can be compared across languages. Data was collected semi-automatically and refined by experts in Ukrainian philology, with a focus on high-frequency noun homonyms. Eight multilingual multimodal large language models (MLLMs) were evaluated, and all of them performed worse than the zero-shot CLIP-based baseline used for English Visual-WSD. The analysis revealed a significant gap between Ukrainian and English Visual-WSD performance, highlighting how MLLMs struggle with low-resource languages and how susceptible they are to hallucination.
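A zero-shot CLIP-style baseline is commonly formulated as ranking the candidate images by cosine similarity between a text embedding and each image embedding. The sketch below shows only that ranking step, assuming the embeddings were already produced by some CLIP-like encoder; the function names `cosine` and `rank_candidates` and the toy vectors are hypothetical and are not taken from the paper or from any library.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(text_emb: list[float], image_embs: list[list[float]]) -> list[int]:
    """Return candidate image indices ordered from most to least similar to the text."""
    scores = [cosine(text_emb, img) for img in image_embs]
    return sorted(range(len(image_embs)), key=lambda i: scores[i], reverse=True)


# Toy 3-d embeddings with made-up values: nine unrelated images and one close match.
text = [1.0, 0.0, 0.0]
images = [[0.0, 1.0, 0.0]] * 9 + [[0.9, 0.1, 0.0]]
ranking = rank_candidates(text, images)
print(ranking[0])  # → 9
```

The full ranking, rather than only the top choice, is returned because ranking-based metrics such as MRR need the position of the correct image.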
Key takeaway
For AI scientists developing or deploying multimodal LLMs for low-resource languages, prioritize creating and using language-specific benchmarks such as U-VWSD. The gap between the same models' Ukrainian and English Visual-WSD performance indicates that models trained mainly on high-resource languages cannot simply be applied as-is. Focus on domain adaptation and data augmentation strategies tailored to the semantic nuances of the target language to reduce hallucination and improve accuracy.
Key insights
Multimodal LLMs significantly underperform in Ukrainian Visual-WSD compared to English, revealing a critical language resource gap.
Principles
- Low-resource languages challenge MLLM performance.
- Homonym frequency influences how often models hallucinate.
Method
The benchmark was built by semi-automatically collecting homonyms from a digitized Ukrainian homonym dictionary, having experts refine them, and gathering positive and negative image samples from Wikipedia, together with challenging trigger words.
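One way to picture the result of this pipeline is as one record per ambiguous word. The sketch below is a hypothetical schema, assuming the SemEval-style format of ten candidate images with a single positive; the class name `VWSDInstance`, its field names, and the example values are illustrative and are not the benchmark's actual file format.

```python
from dataclasses import dataclass


@dataclass
class VWSDInstance:
    """Hypothetical benchmark item: an ambiguous noun, a trigger word, and ten candidate images."""

    target_word: str       # ambiguous noun homonym
    trigger_word: str      # short context word that points to one sense
    candidates: list[str]  # image identifiers (e.g. file names of Wikipedia images)
    gold_index: int        # position of the single positive image in `candidates`

    def __post_init__(self) -> None:
        # Assumed format: exactly ten candidates, one positive and nine negatives.
        if len(self.candidates) != 10:
            raise ValueError("expected exactly ten candidate images")
        if not 0 <= self.gold_index < len(self.candidates):
            raise ValueError("gold_index out of range")

    @property
    def gold_image(self) -> str:
        return self.candidates[self.gold_index]


# Made-up example values for illustration only.
item = VWSDInstance(
    target_word="ключ",
    trigger_word="вода",
    candidates=[f"img_{i}.jpg" for i in range(10)],
    gold_index=4,
)
print(item.gold_image)  # → img_4.jpg
```

Validating in `__post_init__` rejects malformed records at load time rather than during evaluation.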
In practice
- Evaluate Visual-WSD with Mean Reciprocal Rank (MRR) and Hit@1.
- Consider language-specific polysemy in model assessment.
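MRR averages 1/rank of the correct image over all N instances, i.e. MRR = (1/N) Σ 1/rank_i, while Hit@1 is the fraction of instances where the correct image is ranked first. A minimal sketch, assuming each model output is a full ranking of candidate indices; the function names are hypothetical, and the toy rankings use three candidates instead of ten for brevity.

```python
def reciprocal_rank(ranking: list[int], gold: int) -> float:
    """1 / (1-based position of the gold image in the ranking)."""
    return 1.0 / (ranking.index(gold) + 1)


def evaluate(rankings: list[list[int]], golds: list[int]) -> dict[str, float]:
    """Compute Mean Reciprocal Rank and Hit@1 over a set of instances."""
    n = len(golds)
    mrr = sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds)) / n
    hit1 = sum(1 for r, g in zip(rankings, golds) if r[0] == g) / n
    return {"MRR": mrr, "Hit@1": hit1}


# Toy rankings: gold ranked 1st, 2nd, and 2nd respectively.
rankings = [[2, 0, 1], [1, 2, 0], [0, 1, 2]]
golds = [2, 2, 1]
print(evaluate(rankings, golds))  # → MRR ≈ 0.667, Hit@1 ≈ 0.333
```

Reporting both metrics is useful because Hit@1 only rewards exact top-1 choices, while MRR also gives partial credit when the correct image is ranked near the top.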
Topics
- Visual Word Sense Disambiguation
- Multimodal Large Language Models
- Ukrainian NLP
- Low-Resource Languages
- AI Benchmarking
Best for: AI Scientist, AI Researcher, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.