Voice of India: A Large-Scale Benchmark for Real-World Speech Recognition in India
Summary
Voice of India is a new closed-source benchmark for real-world Automatic Speech Recognition (ASR) in India. It addresses two limitations of existing benchmarks: reliance on clean, scripted speech and strict single-reference Word Error Rate (WER) evaluation. The dataset comprises 306,230 utterances, totaling 536 hours of unscripted telephonic conversations from 36,691 speakers across 15 major Indian languages and 139 regional clusters. Its transcripts account for natural spelling variations, including the non-standardized spellings of code-mixed words of English origin. The benchmark also reports performance at the district level, revealing geographical disparities, and breaks results down by audio quality, speaking rate, gender, and device type to pinpoint where current ASR systems are weak.
Key takeaway
For AI engineers developing ASR systems for Indian languages, this benchmark highlights the need to move beyond clean, scripted data. Prioritize training and evaluating models on diverse, unscripted telephonic conversations, with scoring that accounts for natural spelling variations and code-mixing. Tracking performance disparities at the district level and across factors like audio quality should lead to more robust and equitable real-world ASR solutions.
Key insights
Real-world Indic ASR requires benchmarks with unscripted speech, diverse languages, and flexible spelling.
Principles
- Unscripted speech improves real-world ASR robustness.
- Spelling variation must be accommodated in Indic ASR.
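To make the second principle concrete, here is a minimal sketch of a WER that tolerates spelling variants. It is not the benchmark's scoring code, which the summary does not describe. It assumes each reference word comes with a set of acceptable spellings, and the function name `variant_aware_wer` and the romanized example words are invented for illustration.

```python
from typing import Sequence, Set


def variant_aware_wer(reference: Sequence[Set[str]], hypothesis: Sequence[str]) -> float:
    """Word error rate where each reference slot lists its acceptable spellings.

    Illustrative assumption: a hypothesis word matches a reference slot
    at zero cost if it equals any spelling in that slot.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # Standard Levenshtein dynamic program over words, kept to two rows.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            sub_cost = 0 if hypothesis[j - 1] in reference[i - 1] else 1
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + sub_cost,  # match or substitution
            )
        prev = curr
    return prev[m] / n


# Made-up example: two English-origin words with non-standard romanized spellings.
hypothesis = "mera mobail richarj karo".split()
strict_ref = [{"mera"}, {"mobile"}, {"recharge"}, {"karo"}]
flexible_ref = [{"mera"}, {"mobile", "mobail"}, {"recharge", "richarj"}, {"karo"}]

strict_wer = variant_aware_wer(strict_ref, hypothesis)
flexible_wer = variant_aware_wer(flexible_ref, hypothesis)
print(f"strict WER:   {strict_wer:.2f}")
print(f"flexible WER: {flexible_wer:.2f}")
```

With a single reference spelling, the two variant spellings count as substitutions. Once they are listed as acceptable, the same hypothesis scores as fully correct.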
Method
The Voice of India benchmark was built from unscripted telephonic conversations covering 15 Indian languages and 139 regional clusters. Its transcripts account for spelling variations, and its results are analyzed geographically down to the district level.
In practice
- Evaluate ASR systems on unscripted telephonic data.
- Design ASR to handle code-mixed English words.
- Analyze ASR performance by geography and audio quality.
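The third recommendation can be sketched as a simple grouped aggregation. This assumes each evaluated utterance already has a word-level error count and a reference length; the field names (`district`, `audio_quality`, `errors`, `ref_words`) and all values below are hypothetical placeholders, not data from the benchmark.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def wer_by_group(records: Iterable[Mapping], key: str) -> Dict[str, float]:
    """Corpus-level WER per group: total errors divided by total reference words."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for record in records:
        group = record[key]
        errors[group] += record["errors"]
        words[group] += record["ref_words"]
    # Skip groups with no reference words to avoid division by zero.
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Hypothetical per-utterance results (placeholder values only).
records = [
    {"district": "district_a", "audio_quality": "clean", "errors": 2, "ref_words": 20},
    {"district": "district_a", "audio_quality": "noisy", "errors": 6, "ref_words": 20},
    {"district": "district_b", "audio_quality": "noisy", "errors": 9, "ref_words": 30},
]

district_wer = wer_by_group(records, "district")
quality_wer = wer_by_group(records, "audio_quality")

for name, table in [("district", district_wer), ("audio_quality", quality_wer)]:
    for group, wer in sorted(table.items()):
        print(f"{name}={group}: WER {wer:.2f}")
```

Summing errors and reference words before dividing weights each group by its word count, so a handful of very short utterances cannot dominate a district's score. Averaging per-utterance WER instead would be a different design choice.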
Topics
- Voice of India Benchmark
- Indic ASR Systems
- Real-World Speech Data
- Spelling Variation Handling
- Geographical ASR Disparities
Best for: AI Engineer, Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.