Combining Real and Synthetic Speech for ASR Adaptation in Brazilian Portuguese
Summary
This work introduces GARAGEM (General Automotive Real and Artificial speech corpus for Garage Environments and Maintenance), a domain-specific Automatic Speech Recognition (ASR) dataset for Brazilian Portuguese. The dataset focuses on automotive repair terminology and combines real speech collected from online sources with synthetic speech generated from curated technical terms. The authors outline a reproducible methodology covering real data acquisition, domain-guided synthetic data generation, dataset consolidation, and ASR model fine-tuning. Experiments with Whisper, Wav2vec 2.0, and Conformer models show that synthetic data substantially improves ASR performance when it complements real recordings. Both quantitative and qualitative analyses show reductions in Word Error Rate (WER) and Character Error Rate (CER), along with better recognition of specialized terms that are absent from the original real training set.
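WER and CER are both normalized edit distances: the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length in words (WER) or characters (CER). The Python sketch below is a minimal version under stated assumptions: it applies no text normalization (no lowercasing or punctuation removal) and counts spaces as characters for CER. The summary does not say which normalization the GARAGEM experiments used, and the example sentence is hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, deletions, and insertions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance / reference length (spaces included)."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


# Hypothetical reference/hypothesis pair ("replace the brake pad").
ref = "trocar a pastilha de freio"
hyp = "trocar a pastilha de frio"
print(wer(ref, hyp))            # 0.2
print(round(cer(ref, hyp), 3))  # 0.038
```

In this hypothetical pair, one wrong word out of five gives a WER of 0.2, while the single missing character out of 26 gives a CER of about 0.038. This is why CER is the more forgiving metric for near-miss spellings of technical terms.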
Key takeaway
For AI engineers and research scientists building ASR systems for specialized, low-resource domains such as automotive repair in Brazilian Portuguese, consider augmenting real recordings with domain-guided synthetic speech. In the GARAGEM experiments, this approach reduced Word Error Rate (WER) and Character Error Rate (CER) and improved recognition of critical domain-specific terms. Following a reproducible pipeline for data acquisition, synthetic data generation, and model fine-tuning makes it easier to adapt models to niche environments.
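One way to check gains on specialized terms is to measure term recall on the curated terms that never occur in the real training transcripts. The Python sketch below is an illustration under assumptions, not the paper's evaluation protocol. It uses simple lowercase word tokenization and matches multi-word terms as exact token sequences. It credits a term occurrence only when the aligned hypothesis contains the term at least as many times as the reference. All terms and sentences are hypothetical.

```python
import re
from typing import Iterable, List, Sequence, Set


def tokenize(text: str) -> List[str]:
    """Assumed normalization: lowercase, keep runs of word characters (accented letters included)."""
    return re.findall(r"\w+", text.lower())


def count_phrase(tokens: Sequence[str], phrase: Sequence[str]) -> int:
    """Count occurrences of a token phrase inside a token sequence."""
    n = len(phrase)
    if n == 0:
        return 0
    return sum(1 for i in range(len(tokens) - n + 1) if list(tokens[i:i + n]) == list(phrase))


def unseen_terms(terms: Iterable[str], real_train_texts: Iterable[str]) -> Set[str]:
    """Curated terms that never appear in the real training transcripts."""
    train_tokens = [tokenize(t) for t in real_train_texts]
    return {term for term in terms
            if not any(count_phrase(toks, tokenize(term)) for toks in train_tokens)}


def term_recall(terms: Iterable[str], references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Share of term occurrences in the references that the aligned hypotheses reproduce."""
    phrases = [tokenize(t) for t in terms]
    hits = total = 0
    for ref, hyp in zip(references, hypotheses):
        ref_toks, hyp_toks = tokenize(ref), tokenize(hyp)
        for phrase in phrases:
            expected = count_phrase(ref_toks, phrase)
            total += expected
            hits += min(expected, count_phrase(hyp_toks, phrase))
    return hits / total if total else 0.0


# Hypothetical curated terms and transcripts.
terms = ["pastilha de freio", "virabrequim", "junta do cabeçote"]
real_train = ["o carro está com barulho na pastilha de freio"]
unseen = unseen_terms(terms, real_train)
print(sorted(unseen))  # ['junta do cabeçote', 'virabrequim']

refs = ["verificar o virabrequim e a junta do cabeçote"]
hyps = ["verificar o vira brequim e a junta do cabeçote"]
print(term_recall(unseen, refs, hyps))  # 0.5
```

Here the hypothetical recognizer splits "virabrequim" into two words. As a result, only one of the two unseen-term occurrences counts as recognized.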
Key insights
Domain-guided synthetic speech effectively augments real data for ASR adaptation in specialized, low-resource scenarios.
Principles
- Synthetic data improves ASR performance when it complements real recordings.
- Curated domain-specific terminology steers synthetic data generation toward the vocabulary the model needs to learn.
Method
The proposed methodology has four stages: acquiring real speech data, generating domain-guided synthetic speech, consolidating both sources into a single dataset, and fine-tuning ASR models such as Whisper, Wav2vec 2.0, and Conformer.
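The consolidation stage can be sketched as follows. This is a hypothetical design, not the GARAGEM split procedure. It assumes that evaluation should use only real recordings, so synthetic utterances are added to the training split only. Any synthetic utterance whose transcript matches a test transcript is dropped to avoid leaking test sentences. The `Utterance` record, its fields, and the file paths are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Utterance:
    audio_path: str  # illustrative path to an audio file
    text: str        # transcript
    source: str      # "real" or "synthetic"


def consolidate(real: Iterable[Utterance], synthetic: Iterable[Utterance],
                test_fraction: float = 0.2, seed: int = 0) -> Tuple[List[Utterance], List[Utterance]]:
    """Hold out part of the real data for testing; add synthetic data to training only."""
    rng = random.Random(seed)
    real = list(real)
    rng.shuffle(real)
    n_test = round(len(real) * test_fraction)
    test, train_real = real[:n_test], real[n_test:]

    # Leakage guard: skip synthetic utterances that duplicate a test transcript.
    test_texts = {u.text.strip().lower() for u in test}
    train_synth = [u for u in synthetic if u.text.strip().lower() not in test_texts]

    train = train_real + train_synth
    rng.shuffle(train)
    return train, test


real = [Utterance(f"real/{i}.wav", f"frase {i}", "real") for i in range(10)]
synthetic = [Utterance(f"synth/{i}.wav", f"frase {i}", "synthetic") for i in range(5)]
train, test = consolidate(real, synthetic)
print(len(test))  # 2
```

Keeping the test split real-only means that any WER or CER gain reflects performance on recorded speech rather than on the TTS voices.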
In practice
- Generate synthetic speech from curated technical terms.
- Combine real and synthetic data for ASR fine-tuning.
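Generating synthetic speech from curated terms starts with turning each term into several natural sentences that a text-to-speech (TTS) system can read. The Python sketch below covers only the text side, and it rests on assumptions. The carrier templates, the voice identifiers, and the setting of two sentences per term are all hypothetical. The TTS call itself is omitted because the summary does not name the engine. Each term includes its article so that the Portuguese templates stay grammatical.

```python
import itertools
import random
from typing import Dict, List, Sequence

# Hypothetical carrier sentences; "{term}" is filled with a curated term (article included).
TEMPLATES = (
    "o mecânico precisa trocar {term} hoje",
    "verifique {term} antes de liberar o carro",
    "qual é o preço para consertar {term}",
)
# Hypothetical TTS voice identifiers, cycled to vary the synthetic speakers.
VOICES = ("voice_a", "voice_b")


def build_prompts(terms: Sequence[str], templates: Sequence[str] = TEMPLATES,
                  voices: Sequence[str] = VOICES, per_term: int = 2,
                  seed: int = 0) -> List[Dict[str, str]]:
    """Create TTS input records: each term is placed into `per_term` distinct templates."""
    rng = random.Random(seed)
    voice_cycle = itertools.cycle(voices)
    prompts = []
    for term in terms:
        for template in rng.sample(list(templates), k=min(per_term, len(templates))):
            prompts.append({"term": term,
                            "text": template.format(term=term),
                            "voice": next(voice_cycle)})
    return prompts


terms = ["a pastilha de freio", "o virabrequim", "a junta do cabeçote"]
for record in build_prompts(terms):
    print(record["voice"], "|", record["text"])
```

Each record would then be passed to the chosen TTS engine. The resulting audio would be combined with the real recordings for fine-tuning.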
Topics
- Automatic Speech Recognition
- Brazilian Portuguese
- Synthetic Speech
- GARAGEM Corpus
- Domain Adaptation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.