IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics) · Depth: Expert, extended

Summary

IndicDB is a multilingual Text-to-SQL benchmark that evaluates Large Language Models (LLMs) on cross-lingual semantic parsing in Indian languages, a setting that existing English-centric benchmarks do not cover. It comprises 20 PostgreSQL databases with 237 tables, sourced from India's National Data and Analytics Platform (NDAP) and the India Data Portal (IDP). The schemas reflect complex administrative data structures, with join depths of up to six. The benchmark contains 15,617 tasks across English, Hindi, and five other Indic languages, synthesized with a value-aware, difficulty-calibrated, and join-enforced pipeline. Evaluations of models including DeepSeek v3.2, MiniMax 2.7, Llama 3.3, and Qwen3 show a 9.00% global performance drop from English to the Indic variants. The authors call this drop the "Indic Gap" and attribute it primarily to schema-linking difficulties and structural ambiguity. The benchmark and code are publicly available.
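
To make the reported gap concrete, the sketch below shows one common way such a number could be measured: run the gold and predicted SQL against the same database, compare result sets, and take the difference in accuracy between English and Indic prompts. It is only an illustration under stated assumptions. It uses Python's built-in sqlite3 instead of PostgreSQL so that it runs anywhere, the two-table schema and every query in it are invented toy data, and the summary does not say whether the 9.00% figure is an absolute or a relative drop, so the absolute difference in percentage points used here is an assumption.

```python
import sqlite3


def build_toy_db():
    """Create a tiny in-memory database (invented data, not from IndicDB)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE state (state_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE district (
            district_id INTEGER PRIMARY KEY,
            state_id INTEGER REFERENCES state(state_id),
            name TEXT,
            population INTEGER
        );
        INSERT INTO state VALUES (1, 'Kerala'), (2, 'Goa');
        INSERT INTO district VALUES
            (1, 1, 'Ernakulam', 300),
            (2, 1, 'Idukki', 100),
            (3, 2, 'North Goa', 80);
    """)
    return conn


def run_query(conn, sql):
    """Return the result rows as a sorted list, or None if the query fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Order-insensitive comparison: a simplification that ignores ORDER BY.
    return sorted(rows, key=repr)


def execution_accuracy(conn, pairs):
    """Fraction of (gold_sql, predicted_sql) pairs whose result sets match."""
    hits = 0
    for gold_sql, predicted_sql in pairs:
        gold = run_query(conn, gold_sql)
        predicted = run_query(conn, predicted_sql)
        if gold is not None and gold == predicted:
            hits += 1
    return hits / len(pairs)


def accuracy_gap(english_accuracy, indic_accuracy):
    """Absolute drop in percentage points (assumed definition of the gap)."""
    return (english_accuracy - indic_accuracy) * 100.0


GOLD_TOTAL = (
    "SELECT s.name, SUM(d.population) FROM district d "
    "JOIN state s ON s.state_id = d.state_id GROUP BY s.name"
)
GOLD_COUNT = "SELECT COUNT(*) FROM district WHERE state_id = 1"

# Hypothetical model outputs for the same two tasks asked in two languages.
ENGLISH_PAIRS = [
    (GOLD_TOTAL, GOLD_TOTAL),
    (GOLD_COUNT, "SELECT COUNT(district_id) FROM district WHERE state_id = 1"),
]
INDIC_PAIRS = [
    (GOLD_TOTAL, GOLD_TOTAL),
    # Schema-linking slip: the state name is matched against district.name.
    (GOLD_COUNT, "SELECT COUNT(*) FROM district WHERE name = 'Kerala'"),
]

if __name__ == "__main__":
    conn = build_toy_db()
    english = execution_accuracy(conn, ENGLISH_PAIRS)
    indic = execution_accuracy(conn, INDIC_PAIRS)
    print(f"English: {english:.2f}, Indic: {indic:.2f}, "
          f"gap: {accuracy_gap(english, indic):.1f} points")
```

The second Indic prediction stands in for the kind of schema-linking error the summary describes: the query is valid SQL and executes, but it filters on the wrong column, so its result differs from the gold result and the task counts as a miss.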

Key takeaway

Research scientists developing multilingual LLMs should prioritize schema linking and compositional reasoning for non-Western languages. The observed "Indic Gap" shows that current models struggle with linguistic distance and morphological variation, so targeted development is needed. Consider external evidence augmentation and structured prompting techniques such as DIN-SQL to strengthen cross-lingual grounding and narrow performance disparities in high-cardinality, linguistically diverse environments.

Key insights

IndicDB reveals a significant "Indic Gap" in LLM Text-to-SQL performance for Indian languages compared to English.

Method

IndicDB uses a three-agent judge pattern (Architect, Auditor, Refiner) to transform denormalized government data into complex relational schemas, then synthesizes value-grounded, difficulty-calibrated Text-to-SQL tasks across seven languages.
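
The Architect, Auditor, and Refiner roles can be read as a propose, check, and repair loop. The sketch below illustrates that loop on a toy denormalized table. It is a loose illustration, not the IndicDB pipeline: in the benchmark these roles are presumably model-driven, whereas here each is a plain Python function, and the `Schema` class, the function names, the duplicate-name check, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Toy two-table schema (hypothetical structure, for illustration only)."""
    parents: dict = field(default_factory=dict)   # state name -> state_id
    children: list = field(default_factory=list)  # (district, state_id, value)


def architect(rows):
    """Propose a schema by splitting the repeated state column into a parent table."""
    schema = Schema()
    for row in rows:
        state_id = schema.parents.setdefault(row["state"], len(schema.parents) + 1)
        schema.children.append((row["district"], state_id, row["value"]))
    return schema


def auditor(schema):
    """Return a list of problems; an empty list means the schema passes."""
    issues = []
    canonical = {}
    for name in schema.parents:
        key = name.strip().lower()
        if key in canonical:
            # Same state under two spellings: report (kind, keep, drop).
            issues.append(("duplicate_parent", canonical[key], name))
        else:
            canonical[key] = name
    return issues


def refiner(schema, issues):
    """Merge each duplicate parent into the first spelling that was seen."""
    for _, keep, drop in issues:
        keep_id = schema.parents[keep]
        drop_id = schema.parents.pop(drop)
        # Re-point child rows from the dropped parent to the kept one.
        schema.children = [
            (district, keep_id if state_id == drop_id else state_id, value)
            for district, state_id, value in schema.children
        ]
    return schema


def normalize(rows, max_rounds=3):
    """Run propose -> audit -> refine until the audit is clean or rounds run out."""
    schema = architect(rows)
    for _ in range(max_rounds):
        issues = auditor(schema)
        if not issues:
            break
        schema = refiner(schema, issues)
    return schema


# Invented sample rows; "kerala " is a deliberately inconsistent spelling.
SAMPLE_ROWS = [
    {"state": "Kerala", "district": "Ernakulam", "value": 10},
    {"state": "kerala ", "district": "Idukki", "value": 5},
    {"state": "Goa", "district": "North Goa", "value": 3},
]

if __name__ == "__main__":
    result = normalize(SAMPLE_ROWS)
    print(result.parents)
    print(result.children)
```

Separating the check from the repair keeps the stopping condition explicit: the loop ends as soon as the audit returns no issues, and `max_rounds` bounds it when the repair step cannot resolve a problem.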

Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.