Paza: Introducing automatic speech recognition benchmarks and models for low resource languages

2026-02-05 · Source: Microsoft Research · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation · Depth: Intermediate, medium

Summary

Microsoft Research has released PazaBench, the first automatic speech recognition (ASR) leaderboard for low-resource languages, along with new Paza ASR models. PazaBench initially covers 39 African languages and benchmarks 52 state-of-the-art ASR and language models, tracking Character Error Rate (CER), Word Error Rate (WER), and RTFx (Inverse Real-Time Factor, a throughput measure). The Paza ASR models are three fine-tuned models built on Microsoft's Phi-4 Multimodal-Instruct, Meta's MMS-1B-All, and OpenAI's Whisper-Large-v3-Turbo. They target six Kenyan languages (Swahili, Dholuo, Kalenjin, Kikuyu, Maasai, and Somali) and were developed with a human-centered pipeline that included real-world testing with farmers on mobile devices. The initiative aims to narrow the digital and AI divides for under-represented languages.
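
The three tracked metrics can be made concrete with a short sketch. This is an illustration, not PazaBench's scoring code. It computes WER and CER as Levenshtein edit distance (substitutions + deletions + insertions) divided by the reference length, over words and characters respectively. It computes RTFx as seconds of audio transcribed per second of processing. The function names, the example strings, and the choice to count spaces as characters in CER are assumptions. Real leaderboards usually also normalize text (case, punctuation) before scoring.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length (spaces counted)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: higher means faster than real time."""
    return audio_seconds / processing_seconds


ref = "wanafunzi wanasoma vitabu"
hyp = "wanafunzi anasoma vitabu"  # one dropped letter in the middle word
print(f"WER={wer(ref, hyp):.3f} CER={cer(ref, hyp):.3f}")  # → WER=0.333 CER=0.040
print(f"RTFx={rtfx(60.0, 2.0):.1f}")                        # → RTFx=30.0
```

In the example, a single dropped letter costs a whole word under WER (1 of 3 words) but only one character under CER (1 of 25). For RTFx, 60 seconds of audio transcribed in 2 seconds of compute gives 30.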

Key takeaway

For NLP engineers and AI scientists building ASR systems for diverse global users, prioritize human-centered design and real-world validation. Your models must perform well in low-resource settings, handling local languages and accents on common mobile devices. Consider using or contributing to platforms like PazaBench to assess model performance and identify gaps in underserved languages, so that your systems work for the people they are meant to serve and not only on benchmarks.

Key insights

Human-centered design and evaluation are crucial for effective speech models in low-resource, real-world environments.

Principles

Co-create speech technology with user communities.
Evaluation must prioritize real-world usability, not just benchmarks.

Method

The Paza method has three stages: benchmarking existing ASR models on low-resource languages with PazaBench, fine-tuning ASR models on limited training data, and evaluating the results with community testers on real devices in real-world contexts.
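
The benchmarking stage can be pictured as a simple aggregation step. The sketch below is hypothetical. The (model, language, reference, hypothesis) record format, the model names, the language labels, and the placeholder strings are all invented for illustration, and PazaBench's actual pipeline may normalize text or aggregate scores differently. The sketch computes corpus-level CER for each model and language (total character edits divided by total reference characters). It then sorts the models from best to worst within each language.

```python
from collections import defaultdict
from typing import Iterable


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[-1]


def leaderboard(
    records: Iterable[tuple[str, str, str, str]],
) -> dict[str, list[tuple[float, str]]]:
    """Rank models per language by corpus-level CER (lower is better)."""
    edits: dict[tuple[str, str], int] = defaultdict(int)
    ref_chars: dict[tuple[str, str], int] = defaultdict(int)
    for model, lang, ref, hyp in records:
        # Pool edits and reference lengths across all utterances.
        edits[(model, lang)] += edit_distance(ref, hyp)
        ref_chars[(model, lang)] += len(ref)
    by_lang: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for (model, lang), n_edits in edits.items():
        by_lang[lang].append((n_edits / ref_chars[(model, lang)], model))
    # Sorting (score, model) tuples puts the lowest CER first.
    return {lang: sorted(rows) for lang, rows in by_lang.items()}


# Hypothetical scored transcripts: (model, language, reference, hypothesis)
records = [
    ("model-a", "lang-1", "abcd", "abcd"),
    ("model-b", "lang-1", "abcd", "abxd"),
    ("model-a", "lang-2", "abc", "xyz"),
    ("model-b", "lang-2", "abc", "abc"),
]
for lang, rows in leaderboard(records).items():
    print(lang, [(model, round(score, 3)) for score, model in rows])
```

Pooling edits across utterances, rather than averaging per-utterance CER, keeps very short clips from dominating a score. This is one common convention, not necessarily PazaBench's.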

In practice

Test ASR models with end-users on everyday mobile devices.
Favor CER over WER for morphologically rich languages, where one wrong character can cost an entire long word.
Address Whisper hallucination with post-processing.
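
The summary does not say how Paza post-processes Whisper output. The sketch below shows one widely used heuristic as an assumption, not Paza's method. It targets a familiar Whisper failure mode in which the decoder loops on the same word or phrase. The hypothetical collapse_repeats function trims any word n-gram (up to max_ngram words long) that repeats more than max_repeats times in a row down to max_repeats copies. It prefers the longest repeating n-gram at each position.

```python
def collapse_repeats(text: str, max_ngram: int = 4, max_repeats: int = 2) -> str:
    """Hypothetical post-processor: trim runs of an n-gram repeated > max_repeats times."""
    words = text.split()
    out: list[str] = []
    i = 0
    while i < len(words):
        collapsed = False
        for n in range(max_ngram, 0, -1):  # try longer n-grams first
            gram = words[i:i + n]
            if len(gram) < n:
                continue  # not enough words left for an n-gram of this size
            # Count consecutive copies of gram starting at position i.
            count = 1
            while words[i + count * n:i + (count + 1) * n] == gram:
                count += 1
            if count > max_repeats:
                out.extend(gram * max_repeats)  # keep only max_repeats copies
                i += count * n
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)


print(collapse_repeats("asante asante asante asante sana"))  # → asante asante sana
```

Keeping max_repeats at 2 or more leaves legitimate reduplication, such as Swahili "pole pole", intact. Hallucinations that produce fluent but unrelated text are not caught by this check and need other safeguards.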

Topics

Low-Resource ASR
PazaBench Leaderboard
Human-Centered AI
African Languages
Speech Technology Evaluation

Best for: NLP Engineer, AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, AI Product Manager

Editorial summary, takeaway, and curation by AIssential. Original article published by Microsoft Research.