Paza: Introducing automatic speech recognition benchmarks and models for low-resource languages
Summary
Microsoft Research has released PazaBench, the first automatic speech recognition (ASR) leaderboard for low-resource languages, together with new Paza ASR models. PazaBench initially covers 39 African languages and benchmarks 52 state-of-the-art ASR and language models. It tracks Character Error Rate (CER), Word Error Rate (WER), and RTFx (inverse real-time factor, a measure of transcription speed). The Paza ASR models are three models fine-tuned from Phi-4 Multimodal-Instruct, Meta's MMS-1B-All, and OpenAI's Whisper-Large-v3-Turbo. They target six Kenyan languages (Swahili, Dholuo, Kalenjin, Kikuyu, Maasai, and Somali). They were developed through a human-centered pipeline that included real-world testing with farmers on mobile devices. The initiative aims to bridge the digital and AI divides for under-represented languages.
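A minimal Python sketch can illustrate the three metrics, assuming their standard definitions. CER and WER are the Levenshtein edit distance over characters or whitespace-separated words, divided by the reference length. RTFx is seconds of audio transcribed per second of processing time. This is not PazaBench's scoring code. Benchmark harnesses typically also normalize case and punctuation before scoring, and this sketch omits that step.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    # prev[j] holds the distance between the reference prefix so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: audio duration / processing time (higher is faster)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

Consider the illustrative Swahili reference `watakapokuja kesho` and a hypothesis with one wrong letter, `watakapokuya kesho`. Here `wer` returns 0.5 because one of the two words is wrong, while `cer` returns 1/18 ≈ 0.056. That gap is why CER is more informative for languages that pack many morphemes into a single word. For speed, `rtfx(60.0, 0.5)` gives 120.0, meaning one minute of audio transcribed in half a second.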
Key takeaway
NLP engineers and AI scientists building ASR systems for diverse global users should prioritize human-centered design and real-world validation. Models need to perform well in low-resource settings, handling local languages and accents on the mobile devices people actually use. Consider using, or contributing to, platforms like PazaBench to assess model performance and identify gaps in underserved languages, so that your systems work for the people they are meant to serve.
Key insights
Human-centered design and evaluation are crucial for effective speech models in low-resource, real-world environments.
Principles
- Co-create speech technology with user communities.
- Evaluation must prioritize real-world usability, not just benchmarks.
Method
The Paza method involves benchmarking low-resource languages with PazaBench, fine-tuning ASR models with minimal data, and evaluating with community testers on real devices in real contexts.
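The benchmarking step compares many models across many languages and exposes where coverage is weak. It can be sketched as a simple aggregation over per-language scores. The ranking rule here (macro-averaged CER, lower is better), the `threshold` for flagging a language as a gap, and the model and language names are all hypothetical. The summary does not describe how PazaBench actually ranks models.

```python
from collections import defaultdict
from statistics import mean


def rank_models(scores: dict[tuple[str, str], float]) -> list[tuple[str, float]]:
    """Rank models by macro-averaged CER across languages (lower is better).

    `scores` maps (model, language) -> CER on that language's test set.
    Assumes every model was scored on the same set of languages.
    """
    per_model: dict[str, list[float]] = defaultdict(list)
    for (model, _language), value in scores.items():
        per_model[model].append(value)
    averages = {model: mean(values) for model, values in per_model.items()}
    return sorted(averages.items(), key=lambda item: item[1])


def coverage_gaps(scores: dict[tuple[str, str], float], threshold: float) -> list[str]:
    """Languages where even the best-scoring model's CER exceeds `threshold`."""
    best: dict[str, float] = {}
    for (_model, language), value in scores.items():
        best[language] = min(value, best.get(language, float("inf")))
    return sorted(lang for lang, value in best.items() if value > threshold)


# Made-up toy values for illustration only; these are not PazaBench results.
example_scores = {
    ("model-a", "lang-1"): 0.10,
    ("model-a", "lang-2"): 0.40,
    ("model-b", "lang-1"): 0.20,
    ("model-b", "lang-2"): 0.35,
}
```

With the toy data, `rank_models(example_scores)` puts `model-a` first (mean CER 0.25 versus 0.275). `coverage_gaps(example_scores, 0.3)` returns `["lang-2"]`, because no model gets below the threshold there. Macro-averaging gives every language equal weight, so a model cannot top the table by doing well only on one language. The trade-off is that it assumes a complete model-by-language grid.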
In practice
- Test ASR models with end-users on everyday mobile devices.
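- Prefer CER over WER for morphologically rich languages, where a single wrong character makes an entire long word count as an error under WER.
- Address Whisper hallucinations (output text that is not supported by the audio) with post-processing.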
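The summary does not say how Paza post-processes Whisper output. The following is an assumed approach built from two common heuristics for one frequent failure mode, repetition loops. `collapse_repeats` keeps a single copy of any word n-gram repeated `min_repeats` or more times in a row. `looks_looped` flags text whose zlib compression ratio exceeds a threshold, in the spirit of the `compression_ratio_threshold` (default 2.4) that openai-whisper's `transcribe` uses to trigger decoding fallback. The function names and defaults are illustrative.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size / zlib-compressed size; repetitive text compresses well."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data)) if data else 0.0


def looks_looped(text: str, threshold: float = 2.4) -> bool:
    """Heuristic flag for repetition-loop output (threshold is an assumption)."""
    return compression_ratio(text) > threshold


def collapse_repeats(text: str, max_ngram: int = 4, min_repeats: int = 3) -> str:
    """Keep one copy of any word n-gram repeated at least `min_repeats` times in a row."""
    words = text.split()
    out: list[str] = []
    i = 0
    while i < len(words):
        collapsed = False
        for n in range(1, max_ngram + 1):
            gram = words[i:i + n]
            if len(gram) < n:
                break  # not enough words left for an n-gram of this size
            count = 1
            # Count consecutive copies of `gram` starting at position i
            while words[i + count * n:i + (count + 1) * n] == gram:
                count += 1
            if count >= min_repeats:
                out.extend(gram)  # keep a single copy of the looped n-gram
                i += count * n
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

Requiring three or more consecutive copies leaves legitimate reduplication intact. Swahili `pole pole` ("slowly") passes through unchanged, while a looped `asante sana asante sana asante sana` collapses to `asante sana`. Neither check catches hallucinations that invent fluent but unrelated text, for example during silence.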
Topics
- Low-Resource ASR
- PazaBench Leaderboard
- Human-Centered AI
- African Languages
- Speech Technology Evaluation
Best for: NLP Engineer, AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, AI Product Manager
Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Research.