ROMPAR: Morphological Completion and Demographic Unlearning for Romanian-Accented Speech Recognition

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

ROMPAR (ROManian PARliamentary Speech Corpus) is a 17.80-hour collection of Romanian and Moldavian parliamentary speech, introduced to address three challenges in automated transcription: demographic bias, dialectal variation, and utterance truncation. The corpus features double-annotated ground truth and explicit labels for reconstructed word fragments. To build a robust Automatic Speech Recognition (ASR) system, the authors propose a multi-task adversarial training framework designed to enforce demographic invariance across age, gender, and dialect. The framework counters the instability of adversarial objectives in generative architectures by applying exponential decay to the adversarial coefficients. In addition, an LLM-guided decoding strategy with position-dependent weighting supports morphological completion of truncated terminal words. The system is reported to significantly reduce Word Error Rate (WER) and to achieve an F1-score of 96.6% in morphological reconstruction.

Key takeaway

For Machine Learning Engineers developing ASR systems for diverse or accented speech, this research offers an approach to mitigating demographic bias and improving transcription accuracy. Consider pairing multi-task adversarial training with exponentially decaying adversarial coefficients, which the work uses to keep the adversarial objective stable in generative architectures. An LLM-guided decoding strategy can additionally complete truncated words morphologically, which in turn lowers your system's overall Word Error Rate.
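
To make the decaying coefficient concrete, here is a minimal Python sketch. The summary does not give the exact schedule or loss form, so everything here is an assumption: the function names, the defaults `lambda_0` and `decay_rate`, and the gradient-reversal-style objective in which the adversaries' summed loss enters with a negative sign.

```python
import math
from typing import Mapping


def adversarial_coefficient(step: int, lambda_0: float = 1.0,
                            decay_rate: float = 1e-3) -> float:
    """Assumed schedule: lambda(step) = lambda_0 * exp(-decay_rate * step)."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return lambda_0 * math.exp(-decay_rate * step)


def encoder_objective(asr_loss: float, adversary_losses: Mapping[str, float],
                      step: int, lambda_0: float = 1.0,
                      decay_rate: float = 1e-3) -> float:
    """Loss seen by the recognizer (assumed gradient-reversal-style form).

    The adversaries try to predict age, gender, and dialect; the recognizer
    is rewarded when they fail, so their summed loss is subtracted.
    """
    lam = adversarial_coefficient(step, lambda_0, decay_rate)
    return asr_loss - lam * sum(adversary_losses.values())


# Illustrative values only, not results from the paper.
losses = {"age": 0.5, "gender": 0.25, "dialect": 0.25}
early = encoder_objective(2.0, losses, step=0)     # adversarial term at full strength
late = encoder_objective(2.0, losses, step=5000)   # adversarial term almost faded out
```

Under this schedule the adversarial term dominates early training and fades as the coefficient decays, leaving the transcription loss in control of later updates. That is one plausible reading of how decay could limit instability.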

Key insights

A multi-task adversarial framework combined with LLM-guided decoding improves ASR for accented speech by reducing demographic bias and completing the morphology of truncated words.

Method

The proposed method combines two components: multi-task adversarial training whose adversarial coefficients decay exponentially, and an LLM-guided decoding strategy whose position-dependent weighting helps complete truncated utterance-final words.
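
One way to picture position-dependent weighting is a shallow-fusion-style score in which the LLM's influence grows toward the end of the utterance, where truncation occurs. The sketch below is an assumption, not the paper's formula: the linear ramp, the parameters `max_weight` and `ramp_start`, the function names, and the candidate scores are all hypothetical.

```python
from typing import Dict, Tuple


def position_weight(position: int, length: int,
                    max_weight: float = 0.5, ramp_start: float = 0.8) -> float:
    """Assumed weighting: zero early on, then a linear ramp up to max_weight
    at the final word. Expects 0 <= ramp_start < 1."""
    if length <= 0 or not 0 <= position < length:
        raise ValueError("position must lie inside the utterance")
    if length == 1:
        return max_weight
    relative = position / (length - 1)  # 0.0 at first word, 1.0 at last word
    if relative <= ramp_start:
        return 0.0
    return max_weight * (relative - ramp_start) / (1.0 - ramp_start)


def pick_completion(candidates: Dict[str, Tuple[float, float]],
                    position: int, length: int) -> str:
    """Return the candidate with the best fused score.

    Each candidate maps to (asr_logprob, llm_logprob); the fused score is
    asr_logprob + weight * llm_logprob (shallow-fusion style, assumed).
    """
    weight = position_weight(position, length)
    return max(candidates,
               key=lambda word: candidates[word][0] + weight * candidates[word][1])


# Hypothetical log-probabilities for a clipped word and its completed form.
candidates = {
    "parlamen": (-1.0, -9.0),     # matches the audio, implausible to the LLM
    "parlamentul": (-1.5, -2.0),  # slightly worse acoustically, plausible to the LLM
}
first_word_choice = pick_completion(candidates, position=0, length=10)
last_word_choice = pick_completion(candidates, position=9, length=10)
```

With these made-up scores the acoustic evidence decides at the start of the utterance, while at the final position the LLM term tips the choice toward the completed form.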

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.