Improving low-resource ASR using bilingual fine-tuning with language identification: a cross-linguistic evaluation
Summary
This study evaluates bilingual fine-tuning as a way to improve Automatic Speech Recognition (ASR) for low-resource languages. During training, the target transcription is prepended with a language identification token. At inference, the model then jointly predicts the language and the transcription from the speech input. The evaluation covers nine linguistically and geographically diverse language pairs. A key finding is that ASR performance is poor when the language is misidentified. To address this, a follow-up experiment supplied the language identification token during both training and inference. The results indicate that bilingual fine-tuning is beneficial when language identification accuracy is high. They also show that supplying the language identification token at inference significantly improves ASR performance when the model's own language identification is poor.
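The key finding above, that transcription quality drops when the language is misidentified, comes from comparing errors across LID outcomes. Below is a minimal sketch of that kind of analysis. It computes word error rate (WER) separately for utterances whose language was identified correctly and for those where it was not. The `Utterance` record and its field names are hypothetical and are not taken from the study. WER here is the standard word-level Levenshtein distance divided by the number of reference words.

```python
from dataclasses import dataclass


def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein DP over words: prev[j] = distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (1) or match (0)
            )
        prev = curr
    return prev[-1], len(ref)


@dataclass
class Utterance:
    # Hypothetical record: true vs. predicted language code plus texts.
    true_lang: str
    pred_lang: str
    reference: str
    hypothesis: str


def wer_by_lid_outcome(utts: list[Utterance]) -> dict[str, float]:
    """Corpus-level WER, split by whether the language was identified correctly."""
    totals = {True: [0, 0], False: [0, 0]}  # lid_correct -> [errors, reference words]
    for u in utts:
        errs, n = word_errors(u.reference, u.hypothesis)
        bucket = totals[u.true_lang == u.pred_lang]
        bucket[0] += errs
        bucket[1] += n
    # An empty bucket yields NaN rather than a misleading 0.0.
    return {
        ("lid_correct" if ok else "lid_wrong"): (e / n if n else float("nan"))
        for ok, (e, n) in totals.items()
    }
```

Pooling errors and reference words per bucket gives a corpus-level WER rather than an average of per-utterance WERs, which is the usual way ASR results are reported.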
Key takeaway
For NLP engineers building ASR systems for low-resource languages, bilingual fine-tuning with language identification tokens is worth considering. If your model identifies the language accurately, this approach can significantly improve ASR performance. If language identification is less reliable, supplying the language identification token explicitly at inference is a key step toward better transcription accuracy.
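The takeaway amounts to a simple decision rule. Below is a hedged sketch of one way to encode it. First measure language identification accuracy on a held-out set. Then force the language token at inference only when that accuracy is low and the language is known from metadata. The function names, the `<|xx|>` token spelling, and the 0.95 threshold are all hypothetical illustrations. The study does not prescribe a threshold.

```python
from typing import Optional, Sequence


def lid_accuracy(true_langs: Sequence[str], pred_langs: Sequence[str]) -> float:
    """Fraction of utterances whose predicted language matches the true language."""
    if len(true_langs) != len(pred_langs) or not true_langs:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(true_langs, pred_langs)) / len(true_langs)


def inference_prefix(
    dev_lid_accuracy: float,
    known_lang: Optional[str],
    threshold: float = 0.95,  # hypothetical cut-off, tune on your own data
) -> list[str]:
    """Decoder prefix to force at inference (hypothetical policy).

    - LID reliable (>= threshold) or language unknown: return an empty prefix
      so the model predicts the language token itself.
    - LID unreliable and language known: supply its token explicitly.
    """
    if known_lang is None or dev_lid_accuracy >= threshold:
        return []
    return [f"<|{known_lang}|>"]  # hypothetical token format
```

For example, a dev-set LID accuracy of 0.75 for a known Swahili stream would yield `["<|sw|>"]`, so decoding starts after the supplied token instead of relying on the model's own guess.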
Key insights
Bilingual fine-tuning enhances low-resource ASR, particularly when language identification is accurate or explicitly guided during inference.
Principles
- Bilingual fine-tuning aids low-resource ASR.
- High language ID accuracy is crucial.
- Explicit language ID at inference improves ASR when identification is poor.
Method
Prepend a language identification token to the target transcription during training, so that the model learns to predict the language and the transcription jointly. When language ID accuracy is low, supply the token at inference to improve ASR.
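The token mechanics in this method can be sketched at the string level. Training targets get the language token prepended. A jointly decoded output is then split back into a predicted language and a transcription. The `<|xx|>` token spelling and the helper names are assumptions for illustration, since the summary does not give the exact format. In a real model the token would typically be a single entry in the tokenizer vocabulary rather than a text pattern.

```python
import re
from typing import Optional, Tuple

# Hypothetical token format; the exact spelling used in the study is not given.
LANG_TOKEN = "<|{}|>"
# Matches a leading token such as "<|sw|>" plus any following whitespace.
_LANG_RE = re.compile(r"^<\|([a-z]{2,3})\|>\s*")


def make_training_target(lang: str, transcript: str) -> str:
    """Prepend the language ID token to the text the decoder learns to emit."""
    return f"{LANG_TOKEN.format(lang)} {transcript}"


def split_joint_output(decoded: str) -> Tuple[Optional[str], str]:
    """Split a jointly decoded string into (predicted language, transcription).

    Returns None for the language if the output does not start with a token.
    """
    m = _LANG_RE.match(decoded)
    if m is None:
        return None, decoded.strip()
    return m.group(1), decoded[m.end():].strip()
```

Because the language token is part of the target sequence, the same decoder that transcribes also performs language identification. That is why a wrong first token can derail the rest of the transcription, and why supplying it at inference helps.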
In practice
- Implement language ID tokens.
- Prioritize language ID accuracy.
- Supply language ID at inference.
Topics
- Low-Resource ASR
- Bilingual Fine-tuning
- Language Identification
- Speech Recognition
- Cross-Linguistic Evaluation
- Inference Optimization
Best for: AI Engineer, Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.