NRITYAM: Language Models Meet Art and Heritage of Dance
Summary
NRITYAM is a benchmark for evaluating how well language models understand the cultural context of global dance traditions. It comprises 9,260 curated question-answer pairs across 12 languages and is presented as the largest dataset dedicated to assessing cultural knowledge in dance. Developed in collaboration with native dance artists and speakers, NRITYAM covers traditional dances from 12 countries across 5 continents. The benchmark is used to evaluate large and small language models (LLMs, SLMs) and their multimodal counterparts (MLMs, SMLMs), revealing significant gaps in cultural and contextual reasoning, particularly in low-resource languages. For instance, GPT-5 scored 61.73% overall and DeepSeek-OCR 68.64%, and cross-lingual disparities persist.
Key takeaway
For AI scientists and machine learning engineers building global language models, this research shows that even advanced models such as GPT-5 and DeepSeek-OCR have significant gaps in cultural and cross-lingual reasoning. Prioritize integrating culturally diverse and low-resource language data into your training and evaluation pipelines. Doing so helps mitigate bias and lets your AI systems handle varied socio-cultural contexts rather than only mainstream cultural representations.
Key insights
Language models require specialized benchmarks like NRITYAM to assess and improve their understanding of diverse cultural contexts in traditional arts.
Principles
- Global AI effectiveness depends on nuanced local socio-cultural understanding.
- Current language models exhibit biases due to training on mainstream cultural data.
- Human cultural interpretation significantly surpasses current model capabilities.
Method
NRITYAM was built through a multi-phase manual process involving 36 native dance artists and speakers from 12 countries who authored, translated, and cross-validated 9,260 culturally relevant question-answer pairs from diverse sources.
In practice
- Evaluate AI systems using culturally specific benchmarks like NRITYAM.
- Focus model development on improving low-resource language performance.
- Engage domain experts for culturally sensitive data creation and validation.
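As a rough illustration of the first two points, the sketch below computes per-language accuracy and the spread between the best and worst language for a set of question-answer records. The record keys (`language`, `question`, `answer`), the `predict` callable, and exact-match scoring are all assumptions made for illustration; this summary does not state NRITYAM's actual file format or scoring protocol.

```python
from collections import defaultdict


def accuracy_by_language(records, predict):
    """Return {language: accuracy} using case-insensitive exact match.

    `records` is an iterable of dicts with the hypothetical keys
    'language', 'question', and 'answer'. `predict` is any callable
    that maps a question string to an answer string.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        lang = rec["language"]
        total[lang] += 1
        guess = predict(rec["question"]).strip().lower()
        if guess == rec["answer"].strip().lower():
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


def cross_lingual_gap(scores):
    """Difference between the best and worst per-language accuracy."""
    return max(scores.values()) - min(scores.values())
```

Reporting the gap alongside the overall score makes it harder for strong performance in high-resource languages to mask weak performance in low-resource ones.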
Topics
- NRITYAM Benchmark
- Cultural AI
- Multilingual LLMs
- Multimodal LMs
- Traditional Dance
- AI Bias Mitigation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.