Combining Real and Synthetic Speech for ASR Adaptation in Brazilian Portuguese

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, medium

Summary

GARAGEM (General Automotive Real and Artificial speech corpus for Garage Environments and Maintenance) is a new domain-specific Automatic Speech Recognition (ASR) dataset for Brazilian Portuguese. It focuses on automotive repair terminology and combines real speech collected from online sources with synthetic speech generated from curated technical terms. The authors outline a reproducible methodology covering real data acquisition, domain-guided synthetic data generation, dataset consolidation, and ASR model fine-tuning. Experiments with Whisper, Wav2vec 2.0, and Conformer models show that synthetic data significantly improves ASR performance when it complements real recordings. Quantitative and qualitative analyses showed lower Word Error Rate (WER) and Character Error Rate (CER), along with better recognition of specialized terms absent from the original real training set.
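WER and CER are both normalized edit distances. Each is the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. WER counts these operations over words, and CER counts them over characters. The minimal Python sketch below illustrates this for a single utterance. It assumes no text normalization (lowercasing, punctuation stripping), even though practical evaluations usually apply some normalization before scoring. The example sentence is illustrative and not taken from the corpus.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of ref[i-1]
                curr[j - 1] + 1,         # insertion of hyp[j-1]
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate for one utterance; assumes a non-empty reference."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate for one utterance; spaces count as characters."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


ref = "troca da junta do cabeçote"
hyp = "troca da junta do cabeço"
print(wer(ref, hyp))            # 0.2
print(round(cer(ref, hyp), 3))  # 0.077
```

Here a single truncated term ("cabeço" for "cabeçote") costs one full word in WER (1/5 = 0.2) but only two characters in CER (2/26 ≈ 0.077). This is why CER is a useful companion metric when technical terms are only partially recognized.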

Key takeaway

For AI Engineers and Research Scientists building ASR systems for specialized, low-resource domains, such as automotive repair in Brazilian Portuguese, consider supplementing real recordings with domain-guided synthetic speech. In the GARAGEM experiments, this reduced Word Error Rate (WER) and Character Error Rate (CER) and improved recognition of critical domain-specific terms. Follow a reproducible pipeline of data acquisition, synthetic data generation, dataset consolidation, and model fine-tuning so that the approach can be replicated and adapted to other niche domains.

Key insights

Domain-guided synthetic speech effectively augments real data for ASR adaptation in specialized, low-resource scenarios.

Principles

Synthetic data improves ASR when complementing real recordings.
Curated domain-specific terminology guides which synthetic speech is generated.
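The summary highlights better recognition of terms absent from the real training set. One plausible way to steer synthesis toward such gaps is to check which curated terms never occur in the real transcripts. This approach is an assumption and is not taken from the paper. The function names and example terms below are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of word characters; in Python 3, \w also
    # matches accented letters such as "ç", "ã", and "í".
    return re.findall(r"\w+", text.lower())


def uncovered_terms(terms: list[str], transcripts: list[str]) -> list[str]:
    """Return curated terms (single- or multi-word) found in no real transcript."""
    # Pad each normalized transcript with spaces so matches respect word boundaries,
    # and search transcripts one at a time so no match spans two transcripts.
    padded = [f" {' '.join(tokenize(t))} " for t in transcripts]
    missing = []
    for term in terms:
        needle = f" {' '.join(tokenize(term))} "
        if not any(needle in doc for doc in padded):
            missing.append(term)
    return missing


terms = ["junta do cabeçote", "bomba de combustível", "velas de ignição"]
transcripts = [
    "Hoje vamos trocar a junta do cabeçote.",
    "Verifique as velas de ignição antes de ligar o motor.",
]
print(uncovered_terms(terms, transcripts))  # ['bomba de combustível']
```

Terms returned by such a check are natural candidates for synthetic generation, because the real recordings give the model no example of them.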

Method

The methodology has four stages: acquiring real speech from online sources, generating domain-guided synthetic speech from curated technical terms, consolidating both into a single dataset, and fine-tuning ASR models such as Whisper, Wav2vec 2.0, and Conformer.
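Dataset consolidation can be sketched as merging both sources into one manifest with a source label. This minimal Python sketch assumes that evaluation uses real recordings only and that all synthetic utterances go to training. That split is a common practice, not a detail the summary confirms. `Utterance` and `consolidate` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    audio_path: str
    text: str
    source: str  # "real" or "synthetic"


def consolidate(real, synthetic, test_fraction=0.2, seed=0):
    """Merge real and synthetic utterances into (train, test) lists.

    Assumption: the test split is drawn from real recordings only, so scores
    reflect performance on genuine speech; all synthetic data goes to training.
    """
    rng = random.Random(seed)
    real = list(real)
    rng.shuffle(real)
    n_test = int(len(real) * test_fraction)
    test = real[:n_test]
    train = real[n_test:] + list(synthetic)
    rng.shuffle(train)  # interleave real and synthetic examples for training
    return train, test


real = [Utterance(f"real/{i}.wav", f"frase {i}", "real") for i in range(10)]
synth = [Utterance(f"synth/{i}.wav", f"termo {i}", "synthetic") for i in range(5)]
train, test = consolidate(real, synth)
print(len(train), len(test))  # 13 2
```

Keeping the `source` field lets you compare real-only and real+synthetic training runs on the same real test set. If several clips come from the same online video, splitting by video rather than by clip avoids leaking a speaker or recording into both splits.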

In practice

Generate synthetic speech from curated technical terms.
Combine real and synthetic data for ASR fine-tuning.
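Generating synthetic speech from curated terms typically starts on the text side. Each term is embedded in carrier sentences, and the resulting text is sent to a text-to-speech (TTS) system. The summary does not specify the templates or the TTS engine. The Portuguese templates and `build_tts_prompts` below are therefore hypothetical, and the TTS call itself is omitted.

```python
import random

# Hypothetical carrier sentences in Brazilian Portuguese; {term} is replaced by a curated term.
TEMPLATES = [
    "Próximo item da revisão: {term}.",
    "O cliente pediu orçamento para {term}.",
    "Precisamos verificar {term} ainda hoje.",
    "Anotei {term} na ordem de serviço.",
]


def build_tts_prompts(terms, templates=TEMPLATES, per_term=2, seed=0):
    """Return one prompt dict per (term, template) pair, sampling templates without replacement."""
    rng = random.Random(seed)
    prompts = []
    for term in terms:
        chosen = rng.sample(templates, k=min(per_term, len(templates)))
        for template in chosen:
            prompts.append({"term": term, "text": template.format(term=term)})
    return prompts


if __name__ == "__main__":
    for p in build_tts_prompts(["junta do cabeçote", "sonda lambda"]):
        # Each p["text"] would be passed to a TTS engine (not specified in the source).
        print(p["text"])
```

Sampling several templates per term places each term in varied sentence contexts. Keeping the `term` field makes it easy to measure per-term recognition after fine-tuning.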

Topics

Automatic Speech Recognition
Brazilian Portuguese
Synthetic Speech
GARAGEM Corpus
Domain Adaptation

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.