BaltiVoice: A Speech Corpus and Fine-tuned Whisper ASR System for the Balti Language
Summary
BaltiVoice introduces a 16.8-hour read-speech corpus for Balti (ISO 639-3: bft), a Tibetic language that previously had no public Automatic Speech Recognition (ASR) resources. The corpus comprises 10,060 validated utterances in native Nastaliq script, sourced from Mozilla Common Voice recordings. The authors fine-tuned OpenAI Whisper-small on the dataset and report a Word Error Rate (WER) of 30.07% on a 538-utterance validation set, down from Whisper-small's zero-shot baseline of 182.18% on Balti. The dataset, the fine-tuned model, and a live transcription demo are publicly available on Hugging Face.
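A WER above 100% can look like a mistake, but the metric divides the number of word-level substitutions, deletions, and insertions by the number of reference words, so a hypothesis full of spurious words can exceed 100%. The following is a minimal sketch of that calculation in plain Python, not the evaluation code used in the original work. The function name `word_error_rate` and the placeholder strings are illustrative assumptions, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens, not real Balti text: three inserted words against a
# two-word reference give a WER of 150%.
print(f"{word_error_rate('a b', 'a x y z b'):.0%}")
```

Corpus-level WER is commonly computed by summing errors and reference words over all utterances rather than averaging per-utterance scores; the summary does not state which convention the authors used.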
Key takeaway
For NLP or machine learning engineers building ASR systems for low-resource languages, this work demonstrates a viable path: build a focused read-speech corpus, even a relatively small one (here, 16.8 hours), and fine-tune a pre-trained model such as Whisper-small. In this case the approach turned a non-functional zero-shot baseline (182.18% WER) into a practical system for the target language (30.07% WER).
Key insights
Fine-tuning pre-trained ASR models with modest, newly created corpora can establish functional speech recognition for low-resource languages.
Principles
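- Publicly available resources, such as Mozilla Common Voice recordings and pre-trained models like Whisper-small, enable ASR development for underserved languages.
- Fine-tuning can sharply reduce ASR error rates on a new language: here, WER fell from 182.18% zero-shot to 30.07%.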
Method
A 16.8-hour read-speech corpus was created from Mozilla Common Voice, then used to fine-tune OpenAI Whisper-small, achieving a 30.07% WER.
In practice
- Use Mozilla Common Voice to collect and validate read speech for a low-resource language.
- Fine-tune Whisper-small as a first baseline ASR system.
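As a rough sketch of the corpus-building step, the snippet below counts usable utterances and total audio hours from Common Voice style metadata. It assumes a `validated.tsv` with `path` and `sentence` columns and a `clip_durations.tsv` with `clip` and `duration[ms]` columns; that layout should be checked against the release you actually download. The function name `summarize_corpus` and the sample rows are hypothetical, and the TSV contents are passed as strings to keep the example self-contained.

```python
import csv
import io


def _read_tsv(text: str) -> list[dict[str, str]]:
    # Treat quote characters as literal text, since sentences may contain them.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def summarize_corpus(validated_tsv: str, durations_tsv: str) -> tuple[int, float]:
    """Return (utterance count, total hours) for validated clips with known duration."""
    durations_ms = {
        row["clip"]: int(row["duration[ms]"]) for row in _read_tsv(durations_tsv)
    }
    count = 0
    total_ms = 0
    for row in _read_tsv(validated_tsv):
        clip = row["path"]
        sentence = (row["sentence"] or "").strip()
        # Skip rows with no transcript or no recorded duration.
        if not sentence or clip not in durations_ms:
            continue
        count += 1
        total_ms += durations_ms[clip]
    return count, total_ms / 3_600_000  # milliseconds to hours


# Hypothetical sample rows standing in for real metadata files.
validated = "client_id\tpath\tsentence\nc1\ta.mp3\tone two\nc2\tb.mp3\tthree\n"
durations = "clip\tduration[ms]\na.mp3\t6000\nb.mp3\t3000\nc.mp3\t9999\n"
print(summarize_corpus(validated, durations))
```

For scale, the reported 16.8 hours across 10,060 utterances works out to an average of about 6 seconds per clip.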
Topics
- Balti Language
- Automatic Speech Recognition
- Speech Corpus
- Whisper Model
- Low-Resource Languages
- Fine-tuning
- Nastaliq Script
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.