NRITYAM: Language Models Meet Art and Heritage of Dance

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

NRITYAM is a benchmark for evaluating how well language models understand the cultural context of global dance traditions. It comprises 9,260 curated question-answer pairs across 12 languages and is presented as the largest dataset dedicated to assessing cultural knowledge in dance. Developed in collaboration with native dance artists and speakers, NRITYAM covers traditional dances from 12 countries across 5 continents. The benchmark is used to evaluate several model classes (LLMs, SLMs, MLMs, and SMLMs) and reveals significant gaps in their cultural and contextual reasoning, particularly in low-resource languages. For instance, GPT-5 scored 61.73% overall and DeepSeek-OCR 68.64%, and cross-lingual disparities persist.

Key takeaway

For AI scientists and machine learning engineers building language models for a global audience, this research shows that even advanced models such as GPT-5 and DeepSeek-OCR have significant gaps in cultural and cross-lingual reasoning. Prioritize culturally diverse and low-resource language data in your training and evaluation pipelines. Doing so helps mitigate bias and moves systems beyond mainstream cultural representations toward a genuine understanding of varied socio-cultural contexts.

Key insights

Language models require specialized benchmarks like NRITYAM to assess and improve their understanding of diverse cultural contexts in traditional arts.

Principles

Global AI effectiveness depends on nuanced local socio-cultural understanding.
Current language models exhibit biases due to training on mainstream cultural data.
Human cultural interpretation significantly surpasses current model capabilities.

Method

NRITYAM was built through a multi-phase manual process involving 36 native dance artists and speakers from 12 countries who authored, translated, and cross-validated 9,260 culturally relevant question-answer pairs from diverse sources.

In practice

Evaluate AI systems using culturally specific benchmarks like NRITYAM.
Focus model development on improving low-resource language performance.
Engage domain experts for culturally sensitive data creation and validation.
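
The first two recommendations can be made concrete with a per-language accuracy breakdown. The following is a minimal sketch, not the NRITYAM evaluation code: the record fields (`language`, `gold`, `pred`), the language labels, the toy records, and the exact-match scoring rule are all assumptions made for illustration and do not reflect the benchmark's actual schema or results.

```python
from collections import defaultdict


def accuracy_by_language(records):
    """Return {language: accuracy} using case-insensitive exact match.

    Each record is a dict with hypothetical keys 'language', 'gold', 'pred'.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        lang = rec["language"]
        total[lang] += 1
        # Assumed scoring rule: normalized exact match between prediction and gold.
        if rec["pred"].strip().lower() == rec["gold"].strip().lower():
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


def disparity(acc):
    """Gap between the best- and worst-scoring language."""
    return max(acc.values()) - min(acc.values())


# Made-up toy records, for illustration only (not benchmark data).
records = [
    {"language": "lang_a", "gold": "B", "pred": "B"},
    {"language": "lang_a", "gold": "A", "pred": "a "},
    {"language": "lang_b", "gold": "C", "pred": "A"},
    {"language": "lang_b", "gold": "D", "pred": "D"},
]

acc = accuracy_by_language(records)
gap = disparity(acc)
print(acc, gap)
```

Reporting the per-language table alongside the overall score keeps weak low-resource performance from being averaged away by stronger high-resource results.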

Topics

NRITYAM Benchmark
Cultural AI
Multilingual LLMs
Multimodal LMs
Traditional Dance
AI Bias Mitigation

Code references

niladrighosh03/NRITYAM

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.