Contrastive Training with LLM-generated Near-Misses for Robust Code-Switching Speech Recognition
Summary
A new Point-of-Interest (POI)-aware contrastive training framework has been developed to enhance Automatic Speech Recognition (ASR) for code-switching (CS) speech. This method addresses the challenge of alternating languages within single utterances by focusing on CS-critical regions. The framework identifies CS spans using POI detection, then generates acoustically plausible "near-miss" hypotheses by perturbing POIs in ASR N-best outputs and expanding candidates with a large language model (LLM). These hard but plausible negatives are filtered using acoustic, phonemic, and textual constraints. Finally, Whisper-small is fine-tuned with LoRA, incorporating a POI-weighted cross-entropy anchor objective and a multi-negative contrastive ranking loss. Experiments on CS-FLEURS (cmn-eng) and ViMedCSS (vie-eng) datasets demonstrated consistent reductions of over 2% in both general and CS-aware error rates compared to standard LoRA fine-tuning.
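The summary names the two training objectives but does not give their exact form. Below is one common way to write them. It is an illustrative sketch, not the paper's definitions. Here x is the input audio, y⁺ is the reference transcript, and y⁻ₖ are the K filtered near-miss hypotheses. The weight w_t up-weights reference tokens inside POI spans by a factor α. The score s_θ is a length-normalized log-likelihood, τ is a temperature, and λ balances the two terms. All of these symbols, and the InfoNCE-style form of the ranking loss, are assumptions. A margin-based ranking loss would be an equally plausible reading.

```latex
\begin{align*}
\mathcal{L}_{\text{anchor}} &= -\frac{\sum_{t=1}^{T} w_t \log p_\theta(y^{+}_t \mid y^{+}_{<t}, x)}{\sum_{t=1}^{T} w_t},
  \qquad w_t = \begin{cases} \alpha, & y^{+}_t \in \text{POI span} \\ 1, & \text{otherwise} \end{cases} \\
s_\theta(x, y) &= \frac{1}{|y|} \sum_{t=1}^{|y|} \log p_\theta(y_t \mid y_{<t}, x) \\
\mathcal{L}_{\text{rank}} &= -\log \frac{e^{s_\theta(x, y^{+})/\tau}}{e^{s_\theta(x, y^{+})/\tau} + \sum_{k=1}^{K} e^{s_\theta(x, y^{-}_{k})/\tau}} \\
\mathcal{L} &= \mathcal{L}_{\text{anchor}} + \lambda\, \mathcal{L}_{\text{rank}}
\end{align*}
```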
Key takeaway
For Machine Learning Engineers building ASR systems for multilingual environments, POI-aware contrastive training is worth evaluating. This approach combines LLM-generated near-misses with a multi-negative ranking loss. In the reported experiments, it reduced both general and code-switching-aware error rates by more than 2% compared with standard LoRA fine-tuning. It may make models more robust on mixed-language utterances and improve the experience of users who switch languages mid-sentence.
Key insights
Contrastive training with LLM-generated near-misses improves code-switching ASR by targeting critical regions.
Principles
- POI detection can pinpoint code-switching critical regions.
- LLMs can generate plausible "near-miss" hypotheses for training.
- Filtering candidate negatives with acoustic, phonemic, and textual constraints keeps them hard but plausible, which makes contrastive learning more effective.
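The summary does not describe how POI detection works. The sketch below is a rough stand-in for the cmn-eng case only. It treats any boundary where the token script changes between Han characters and Latin letters as a switch point, and it marks the tokens on both sides as POIs. The functions `token_script` and `detect_poi_spans` and the script heuristic are hypothetical, not the paper's detector. Vietnamese-English (vie-eng) would need a different cue, because both languages use Latin script.

```python
def token_script(token):
    """Tag a token as 'han', 'latin', or 'other' from its first letter-like character.

    Hypothetical heuristic for Mandarin-English text; not the paper's POI detector.
    """
    for ch in token:
        if "\u4e00" <= ch <= "\u9fff":  # CJK Unified Ideographs (basic block)
            return "han"
        if ch.isascii() and ch.isalpha():
            return "latin"
    return "other"  # punctuation, digits, etc.


def detect_poi_spans(tokens, context=0):
    """Return sorted token indices around language-switch boundaries.

    A boundary is any pair of consecutive content tokens (ignoring 'other')
    whose scripts differ. Both tokens, anything between them, and `context`
    extra tokens on each side are marked as POIs.
    """
    content = [(i, token_script(t)) for i, t in enumerate(tokens)]
    content = [(i, s) for i, s in content if s != "other"]
    poi = set()
    for (i, s1), (j, s2) in zip(content, content[1:]):
        if s1 != s2:
            lo = max(0, i - context)
            hi = min(len(tokens) - 1, j + context)
            poi.update(range(lo, hi + 1))
    return sorted(poi)
```

For the token sequence 我 今天 要 开 meeting 然后 写 report, the function returns POI indices [3, 4, 5, 6, 7]. These cover both switches into English and the switch back. Widening with `context` trades precision for coverage when switch boundaries are fuzzy.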
Method
Identify CS spans via POI detection. Generate near-misses by perturbing N-best ASR outputs and expanding with an LLM. Filter negatives. Fine-tune Whisper-small with LoRA using POI-weighted cross-entropy and multi-negative contrastive ranking loss.
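The generation and filtering steps can be sketched as follows. This is a simplified illustration under our own assumptions, not the authors' pipeline:

- N-best hypotheses are used only when their token count matches the reference, so positions line up without an alignment step.
- Each near-miss changes exactly one POI token.
- `llm_alternatives` is a hypothetical dictionary standing in for LLM-proposed confusions; no model is called.
- Only the textual constraint, a character edit-distance window, is implemented. The acoustic and phonemic filters are not modeled.

All function names are hypothetical.

```python
def levenshtein(a, b):
    """Character-level edit distance via standard dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def nbest_alternatives(reference, nbest, poi_idx):
    """Collect alternative tokens at POI positions from same-length N-best hypotheses."""
    alts = {i: set() for i in poi_idx}
    for hyp in nbest:
        if len(hyp) != len(reference):  # simplifying assumption: skip unaligned hypotheses
            continue
        for i in poi_idx:
            if hyp[i] != reference[i]:
                alts[i].add(hyp[i])
    return alts


def generate_near_misses(reference, poi_idx, nbest, llm_alternatives=None):
    """Build single-token POI substitutions from N-best and (hypothetical) LLM suggestions."""
    alts = nbest_alternatives(reference, nbest, poi_idx)
    for i, extra in (llm_alternatives or {}).items():
        if i in alts:  # ignore suggestions outside POI positions
            alts[i].update(extra)
    candidates = []
    for i in sorted(alts):
        for tok in sorted(alts[i]):
            cand = list(reference)
            cand[i] = tok
            candidates.append(cand)
    return candidates


def filter_near_misses(reference, candidates, max_char_dist=3, sep=" "):
    """Keep unique candidates within 1..max_char_dist character edits of the reference."""
    ref_text = sep.join(reference)
    kept, seen = [], set()
    for cand in candidates:
        text = sep.join(cand)
        if text in seen:
            continue
        seen.add(text)
        if 1 <= levenshtein(ref_text, text) <= max_char_dist:
            kept.append(cand)
    return kept
```

Take the reference tokens 我 要 开 meeting, with POIs at positions 2 and 3. Suppose the N-best list supplies the alternatives 该 and meting, and the LLM suggests meetings and seating. A one-edit window then keeps the first three variants and drops 我 要 开 seating, which is two character edits away. The lower bound of 1 edit excludes any candidate identical to the reference.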
In practice
- Apply POI detection to identify language switch points.
- Use LLMs to create diverse, challenging negative samples.
- Integrate multi-negative contrastive loss for ASR fine-tuning.
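The two fine-tuning losses can be illustrated with a forward-only computation in plain Python. The summary does not give the exact formulas, so this sketch assumes:

- an InfoNCE-style softmax over one positive and K negatives,
- a length-normalized log-likelihood as the sequence score,
- a fixed up-weighting factor `poi_weight` for POI tokens,
- a mixing weight `lam`.

These names and defaults are hypothetical. In an actual LoRA fine-tuning loop, the same arithmetic would run on tensors so that gradients reach the adapter weights.

```python
import math


def poi_weighted_ce(token_logprobs, poi_mask, poi_weight=2.0):
    """Weighted mean negative log-likelihood of the reference (anchor) transcript.

    token_logprobs: log p(y_t | y_<t, x) for each reference token.
    poi_mask: True where the token lies in a POI span (up-weighted by poi_weight).
    """
    weights = [poi_weight if m else 1.0 for m in poi_mask]
    total = sum(w * -lp for w, lp in zip(weights, token_logprobs))
    return total / sum(weights)


def sequence_score(token_logprobs):
    """Length-normalized log-likelihood, used here as the ranking score (assumption)."""
    return sum(token_logprobs) / len(token_logprobs)


def multi_negative_ranking_loss(pos_score, neg_scores, temperature=1.0):
    """-log softmax probability of the positive among the positive and all negatives."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for numerical stability (log-sum-exp)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]


def total_loss(ref_logprobs, poi_mask, neg_logprobs_list,
               lam=0.5, poi_weight=2.0, temperature=1.0):
    """Anchor cross-entropy plus lam times the multi-negative ranking loss."""
    anchor = poi_weighted_ce(ref_logprobs, poi_mask, poi_weight)
    pos = sequence_score(ref_logprobs)
    negs = [sequence_score(n) for n in neg_logprobs_list]
    return anchor + lam * multi_negative_ranking_loss(pos, negs, temperature)
```

The ranking term approaches zero as the reference scores increasingly higher than every near-miss. The anchor term is meant to preserve ordinary transcription accuracy, with extra emphasis on POI tokens.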
Topics
- Code-Switching ASR
- Contrastive Training
- Large Language Models
- Whisper Model
- LoRA Fine-tuning
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.