IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages
Summary
IndicDB is a new multilingual Text-to-SQL benchmark for evaluating Large Language Models (LLMs) on cross-lingual semantic parsing in Indian languages, a setting that existing English-centric benchmarks do not cover. It comprises 20 PostgreSQL databases with 237 tables, sourced from India's National Data and Analytics Platform (NDAP) and India Data Portal (IDP), and reflects complex administrative data structures with join depths of up to six. The benchmark includes 15,617 tasks across English, Hindi, and five other Indic languages, synthesized with a value-aware, difficulty-calibrated, and join-enforced pipeline. Evaluations of models such as DeepSeek v3.2, MiniMax 2.7, Llama 3.3, and Qwen3 show a 9.00% global performance drop from English to the Indic variants, termed the "Indic Gap," which is attributed primarily to schema-linking difficulties and structural ambiguity. The benchmark and code are publicly available.
Key takeaway
Research scientists developing multilingual LLMs should prioritize schema linking and compositional reasoning for non-Western languages. The observed "Indic Gap" shows that current models struggle with linguistic distance and morphological variation, which calls for development targeted at these languages. Consider combining external evidence augmentation with structured prompting techniques such as DIN-SQL to strengthen cross-lingual grounding and reduce performance disparities in high-cardinality, linguistically diverse environments.
Key insights
IndicDB reveals an "Indic Gap": LLM Text-to-SQL performance drops by 9.00% globally when moving from English to the Indic-language variants of the same tasks.
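As a rough illustration of how a gap like this could be computed from per-language scores, here is a minimal Python sketch. The summary does not say whether the reported figure is an absolute or a relative difference, so the function returns both. The function name `indic_gap`, the language keys, and all accuracy values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def indic_gap(accuracy_by_language, english_key="en"):
    """Return (absolute_gap, relative_gap) between English accuracy
    and the mean accuracy over all other languages."""
    english = accuracy_by_language[english_key]
    others = [
        score
        for lang, score in accuracy_by_language.items()
        if lang != english_key
    ]
    indic_mean = mean(others)
    absolute = english - indic_mean   # difference in accuracy points
    relative = absolute / english     # difference as a fraction of English accuracy
    return absolute, relative


# Hypothetical accuracies, for illustration only (not results from the paper).
scores = {"en": 0.60, "hi": 0.54, "lang_a": 0.50, "lang_b": 0.49}
abs_gap, rel_gap = indic_gap(scores)
print(f"absolute gap: {abs_gap:.2%}, relative gap: {rel_gap:.2%}")
```

Reporting both numbers side by side avoids ambiguity when comparing models whose English baselines differ.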
Principles
- Real-world data schemas are complex and denormalized.
- Multilingual Text-to-SQL requires robust cross-lingual semantic parsing.
- External evidence augmentation improves LLM grounding in diverse linguistic contexts.
Method
IndicDB uses a three-agent judge pattern (Architect, Auditor, Refiner) to transform denormalized government data into complex relational schemas, then synthesizes value-grounded, difficulty-calibrated Text-to-SQL tasks across seven languages.
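One way to picture the Architect, Auditor, Refiner pattern is as a propose, review, revise loop. The sketch below is an assumption about the control flow, not the authors' implementation: the agents are plain Python callables standing in for LLM calls, and `run_schema_pipeline`, `Review`, the `max_rounds` limit, and the toy agents are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Review:
    approved: bool
    issues: List[str] = field(default_factory=list)


def run_schema_pipeline(flat_columns, architect, auditor, refiner, max_rounds=3):
    """Propose a schema, then alternate audit and refinement until the
    auditor approves or the round limit is reached."""
    schema = architect(flat_columns)
    for _ in range(max_rounds):
        review = auditor(schema)
        if review.approved:
            return schema
        schema = refiner(schema, review.issues)
    return schema  # best effort after max_rounds


# Toy stand-ins for the three agents (hypothetical behaviour).
def toy_architect(columns):
    # Put every column of the flat file into a single table.
    return {"records": list(columns)}


def toy_auditor(schema):
    # Flag any table that lacks an "id" key column.
    issues = [f"{table}: missing primary key"
              for table, cols in schema.items() if "id" not in cols]
    return Review(approved=not issues, issues=issues)


def toy_refiner(schema, issues):
    # Add an "id" column to every table that does not have one.
    return {table: cols if "id" in cols else ["id"] + cols
            for table, cols in schema.items()}


final_schema = run_schema_pipeline(
    ["state", "district", "population"],
    toy_architect, toy_auditor, toy_refiner,
)
print(final_schema)
```

Bounding the loop with a round limit keeps the pipeline from cycling indefinitely when the auditor and refiner disagree.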
In practice
- Use DIN-SQL prompting for improved schema grounding.
- Augment LLM prompts with external evidence files for performance gains.
- Focus error analysis on schema linking and aggregation issues in multilingual settings.
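The first two recommendations can be sketched as a small prompt builder that places the schema and optional evidence ahead of the question and asks for an explicit schema-linking step before SQL generation. This is a loose illustration in the spirit of decomposed prompting, not the actual DIN-SQL prompts or the benchmark's evidence format. The function `build_prompt`, the toy schema, the Hindi question, and the evidence strings are assumptions made for this example.

```python
def build_prompt(question, schema, evidence=None):
    """Assemble a Text-to-SQL prompt with the schema, optional evidence,
    and a two-step instruction (schema linking, then SQL)."""
    lines = ["Database schema:"]
    for table, columns in schema.items():
        lines.append(f"  {table}({', '.join(columns)})")
    if evidence:
        lines.append("Evidence:")
        lines.extend(f"  - {item}" for item in evidence)
    lines.append(f"Question: {question}")
    lines.append("Step 1: List the tables and columns the question refers to.")
    lines.append("Step 2: Write one PostgreSQL query that answers the question.")
    return "\n".join(lines)


# Hypothetical schema, question, and evidence for illustration.
schema = {
    "district": ["id", "name", "state_id"],
    "census": ["district_id", "year", "population"],
}
# Hindi: "Which district has the highest population?"
question = "सबसे अधिक जनसंख्या वाला जिला कौन सा है?"
evidence = [
    "'जिला' refers to district.name",
    "'जनसंख्या' refers to census.population",
]

prompt = build_prompt(question, schema, evidence)
print(prompt)
```

Evidence lines that map Indic terms onto column names target the schema-linking errors that the third recommendation suggests analyzing.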
Topics
- IndicDB Benchmark
- Multilingual Text-to-SQL
- Indian Languages
- Large Language Models
- Semantic Parsing
Best for: Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.