Contrastive Training with LLM-generated Near-Misses for Robust Code-Switching Speech Recognition

2026-06-05 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Audio and Speech Processing · Depth: Expert, quick

Summary

A Point-of-Interest (POI)-aware contrastive training framework targets Automatic Speech Recognition (ASR) for code-switching (CS) speech, where speakers alternate languages within a single utterance. Rather than treating every token equally, the method concentrates training on CS-critical regions. It first identifies CS spans using POI detection, then generates acoustically plausible "near-miss" hypotheses by perturbing POIs in ASR N-best outputs and expanding the candidate set with a large language model (LLM). These hard but plausible negatives are filtered using acoustic, phonemic, and textual constraints. Finally, Whisper-small is fine-tuned with LoRA using two objectives: a POI-weighted cross-entropy anchor objective and a multi-negative contrastive ranking loss. In experiments on the CS-FLEURS (cmn-eng) and ViMedCSS (vie-eng) datasets, the method consistently reduced both general and CS-aware error rates by over 2% compared with standard LoRA fine-tuning.

Key takeaway

Machine learning engineers building ASR systems for multilingual environments should consider POI-aware contrastive training. The approach combines LLM-generated near-misses with a multi-negative ranking loss, and in the reported experiments it reduced both general and CS-aware error rates by over 2% compared with standard LoRA fine-tuning. It is worth evaluating if your models struggle with mixed-language utterances.

Key insights

Contrastive training with LLM-generated near-misses improves code-switching ASR by targeting critical regions.

Principles

POI detection can pinpoint code-switching critical regions.
LLMs can generate plausible "near-miss" hypotheses for training.
Filtering hard negatives improves contrastive learning effectiveness.
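
To make the filtering principle concrete, here is a minimal sketch of a textual constraint only. It keeps candidates that differ from the reference but stay within a token-level edit-distance budget. The function names, the whitespace tokenization, and the `max_ratio` threshold are assumptions for illustration. The summary does not describe the paper's actual filter, which also applies acoustic and phonemic constraints.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def filter_near_misses(reference, candidates, max_ratio=0.4):
    """Keep candidates that are wrong, but only slightly wrong.

    max_ratio is a hypothetical threshold: the largest allowed edit
    distance as a fraction of the reference length in tokens.
    """
    ref_tokens = reference.split()
    kept, seen = [], set()
    for cand in candidates:
        if cand in seen:
            continue  # drop verbatim duplicates
        seen.add(cand)
        dist = edit_distance(ref_tokens, cand.split())
        ratio = dist / max(len(ref_tokens), 1)
        # dist == 0 means the candidate equals the reference (not a negative);
        # a large ratio means it is too far off to be a hard negative.
        if dist > 0 and ratio <= max_ratio:
            kept.append(cand)
    return kept
```

Under these assumptions, a candidate with a single substituted word in a five-word reference passes (ratio 0.2), while an exact copy of the reference or a fully different sentence is discarded.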

Method

Identify CS spans via POI detection. Generate near-misses by perturbing POIs in N-best ASR outputs and expanding the candidates with an LLM. Filter the negatives using acoustic, phonemic, and textual constraints. Fine-tune Whisper-small with LoRA using a POI-weighted cross-entropy objective and a multi-negative contrastive ranking loss.
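
The two training objectives can be sketched in plain Python to show how they might fit together. This is an illustration under stated assumptions, not the paper's formulation: the summary does not give the exact loss, so the ranking term below uses a common softmax (InfoNCE-style) form over one positive and several negative sequence scores. The names `poi_weight`, `rank_weight`, and `temperature` are hypothetical hyperparameters.

```python
import math


def poi_weighted_ce(token_logprobs, poi_mask, poi_weight=2.0):
    """Weighted mean of per-token negative log-likelihoods.

    token_logprobs: log-probability the model assigns to each reference token.
    poi_mask: True where the token lies inside a POI (code-switch) span.
    """
    if len(token_logprobs) != len(poi_mask) or not token_logprobs:
        raise ValueError("need equal-length, non-empty inputs")
    weights = [poi_weight if is_poi else 1.0 for is_poi in poi_mask]
    total = sum(-lp * w for lp, w in zip(token_logprobs, weights))
    return total / sum(weights)


def multi_negative_ranking_loss(pos_score, neg_scores, temperature=1.0):
    """Softmax ranking loss: -log p(positive | positive + negatives).

    Scores are assumed to be sequence-level model scores, for example
    length-normalized log-likelihoods of each transcript given the audio.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[0]


def combined_loss(token_logprobs, poi_mask, pos_score, neg_scores,
                  poi_weight=2.0, rank_weight=0.5, temperature=1.0):
    """Anchor objective plus the weighted ranking term."""
    anchor = poi_weighted_ce(token_logprobs, poi_mask, poi_weight)
    rank = multi_negative_ranking_loss(pos_score, neg_scores, temperature)
    return anchor + rank_weight * rank
```

In this form the ranking term shrinks as the reference transcript outscores the near-misses, and the POI weight makes errors inside switch regions cost more than errors elsewhere. In an actual fine-tuning run the same quantities would be computed on tensors so that gradients flow to the LoRA parameters.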

In practice

Apply POI detection to identify language switch points.
Use LLMs to create diverse, challenging negative samples.
Integrate multi-negative contrastive loss for ASR fine-tuning.

Topics

Code-Switching ASR
Contrastive Training
Large Language Models
Whisper Model
LoRA Fine-tuning

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.