Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, long

Summary

This research investigates whether code-switching Automatic Speech Recognition (CS-ASR) capabilities generalize from seen to unseen language pairs, motivated by the scarcity of multilingual CS speech resources. Using Whisper-medium as the backbone, the study fine-tuned models on English-centric pairs (Korean-English, Japanese-English, German-English) and evaluated their transfer to unseen non-English pairs, namely Korean-Japanese and Korean-German. New evaluation datasets were constructed for these unseen pairs, comprising 450 Korean-Japanese and 387 Korean-German utterances. Experiments covered model merging (Task Arithmetic, TIES, DARE) and domain generalization (Fish, Fishr, GGA) techniques. Fine-tuning and these generalization methods yield only modest improvements: the average Mixed Error Rate (MER) on unseen pairs is 0.32, still far from the sub-0.2 MER on seen pairs. Layer-wise analysis showed that CS adaptation occurs primarily in the higher encoder and decoder layers.
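Because the results above are reported in MER, a minimal sketch of how such a score can be computed may help. It assumes a common convention: Hangul, kana, and CJK ideographs are scored per character, everything else per whitespace-delimited word, and errors are pooled over the whole corpus. The summary does not state the paper's exact tokenization or normalization, so the function names and rules below are illustrative assumptions only.

```python
import unicodedata

def is_char_level(ch):
    """True for scripts scored per character in this sketch (assumed rule)."""
    name = unicodedata.name(ch, "")
    return name.startswith(("HANGUL", "HIRAGANA", "KATAKANA", "CJK"))

def mixed_tokens(text):
    """One token per Hangul/kana/CJK character, one lowercased token per other word."""
    tokens, word = [], []
    for ch in text:
        if is_char_level(ch) or ch.isspace():
            if word:  # flush the pending Latin-script word
                tokens.append("".join(word).lower())
                word = []
            if not ch.isspace():
                tokens.append(ch)
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word).lower())
    return tokens

def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]

def mixed_error_rate(references, hypotheses):
    """Corpus-level MER: total edits divided by total reference tokens."""
    errors = sum(edit_distance(mixed_tokens(r), mixed_tokens(h))
                 for r, h in zip(references, hypotheses))
    total = sum(len(mixed_tokens(r)) for r in references)
    if total == 0:
        raise ValueError("references contain no tokens")
    return errors / total
```

With this rule, a Korean-English reference such as "나는 coffee 를 마셔요" has seven units (six Hangul characters and one English word), so a single misrecognized English word costs one error out of seven.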

Key takeaway

For AI scientists developing multilingual ASR systems: current model merging and domain generalization techniques transfer code-switching ability to unseen language pairs only to a limited degree. Prioritize CS-ASR architectures and adaptation strategies designed specifically for robust cross-pair generalization rather than relying on existing general-purpose methods. Consider contributing to, or using, more diverse and higher-quality multilingual code-switching speech datasets to move toward practical deployment.

Key insights

Code-switching ASR generalization to unseen language pairs remains limited despite model merging and domain generalization efforts.

Principles

Code-switching ASR capabilities transfer modestly across language pairs.
CS adaptation primarily modifies the higher encoder and decoder layers, suggesting it acts on higher-level linguistic representations.
Naive application of general-purpose domain generalization methods is insufficient for CS-ASR.
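One way to probe the layer-wise principle above is to compare a fine-tuned checkpoint with its base and measure how far each layer moved. The sketch below is not the paper's analysis, which the summary does not describe. It assumes parameters are available as a name-to-flat-list mapping and that names contain a pattern like `encoder.layers.<i>.` or `decoder.layers.<i>.`; both the naming convention and the `layerwise_drift` helper are hypothetical.

```python
import math
import re
from collections import defaultdict

# Assumed naming convention for per-layer parameters (hypothetical).
LAYER_PATTERN = re.compile(r"(encoder|decoder)\.layers\.(\d+)\.")

def layerwise_drift(base, finetuned):
    """Relative L2 change per (stack, layer index) between two checkpoints.

    `base` and `finetuned` map parameter names to flat lists of floats.
    Parameters whose names do not match LAYER_PATTERN are ignored.
    """
    delta_sq = defaultdict(float)
    base_sq = defaultdict(float)
    for name, base_values in base.items():
        match = LAYER_PATTERN.search(name)
        if match is None:
            continue
        key = (match.group(1), int(match.group(2)))
        tuned_values = finetuned[name]
        delta_sq[key] += sum((t - b) ** 2 for t, b in zip(tuned_values, base_values))
        base_sq[key] += sum(b ** 2 for b in base_values)
    return {key: math.sqrt(delta_sq[key] / base_sq[key])
            for key in delta_sq if base_sq[key] > 0}
```

A drift profile that rises with the layer index in both stacks would be consistent with adaptation being concentrated in the higher layers, although weight change alone does not show what those layers represent.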

Method

The study fine-tuned Whisper-medium on seen bilingual CS datasets, then applied model merging (Task Arithmetic, TIES, DARE) and domain generalization (Fish, Fishr, GGA) to evaluate performance on newly constructed unseen CS language pair datasets.
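The merging step in this method can be illustrated with a toy sketch of Task Arithmetic, optionally preceded by DARE-style sparsification. Each fine-tuned model contributes a task vector (fine-tuned weights minus base weights); Task Arithmetic adds the scaled sum of task vectors back to the base, and DARE randomly drops entries of a task vector and rescales the survivors. Checkpoints are represented here as plain dicts of flat float lists, and the `scale` and `drop_prob` values are hypothetical; the summary does not give the paper's hyperparameters.

```python
import random

def task_vector(base, finetuned):
    """Per-parameter difference between a fine-tuned checkpoint and the base."""
    return {name: [f - b for f, b in zip(finetuned[name], base[name])]
            for name in base}

def dare(vector, drop_prob, rng):
    """DARE-style sparsification: drop each delta with probability
    drop_prob and rescale the kept ones by 1 / (1 - drop_prob)."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep = 1.0 - drop_prob
    return {name: [d / keep if rng.random() < keep else 0.0 for d in deltas]
            for name, deltas in vector.items()}

def task_arithmetic(base, vectors, scale):
    """Merged weights: base + scale * (sum of task vectors)."""
    merged = {}
    for name, base_values in base.items():
        summed = [sum(column) for column in zip(*(v[name] for v in vectors))]
        merged[name] = [b + scale * s for b, s in zip(base_values, summed)]
    return merged

# Toy checkpoints standing in for the base and two pair-specific models.
base = {"w": [0.0, 1.0, 2.0]}
korean_english = {"w": [0.5, 1.0, 2.5]}
japanese_english = {"w": [0.0, 1.5, 1.5]}

vectors = [task_vector(base, korean_english),
           task_vector(base, japanese_english)]
merged = task_arithmetic(base, vectors, scale=1.0)
```

In the toy example the two task vectors disagree in sign on the last weight, so their contributions cancel there. That kind of interference is what sign-aware methods such as TIES are designed to reduce.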

In practice

Construct small-scale evaluation datasets for under-resourced language pairs.
Consider TIES-Merging for combining language-pair-specific CS-ASR models.
Focus CS-ASR adaptation on the higher encoder and decoder layers.
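For the TIES-Merging suggestion above, a minimal sketch of the three TIES steps on flat weight lists may be useful: trim each task vector to its largest-magnitude entries, elect a sign per parameter from the summed trimmed values, then average only the entries that agree with the elected sign. The `density` and `scale` values are hypothetical, and this toy version is not the paper's implementation.

```python
def trim(vector, density):
    """Keep roughly the top `density` fraction of entries by magnitude.
    Entries tied with the threshold are all kept."""
    k = max(1, round(len(vector) * density))
    threshold = sorted((abs(x) for x in vector), reverse=True)[k - 1]
    return [x if abs(x) >= threshold else 0.0 for x in vector]

def ties_merge(base, task_vectors, density=0.5, scale=1.0):
    """TIES-style merge of task vectors into flat base weights."""
    trimmed = [trim(v, density) for v in task_vectors]
    merged = []
    for base_value, column in zip(base, zip(*trimmed)):
        # Elect the sign carrying the larger total magnitude.
        sign = 1.0 if sum(column) >= 0 else -1.0
        # Disjoint mean: average only the entries agreeing with that sign.
        agreeing = [x for x in column if x * sign > 0]
        delta = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base_value + scale * delta)
    return merged

# Toy task vectors for two language-pair-specific models.
base = [0.0, 0.0, 0.0, 0.0]
pair_a = [0.8, -0.1, 0.4, 0.0]
pair_b = [-0.5, 0.6, 0.4, 0.1]
merged = ties_merge(base, [pair_a, pair_b], density=0.5)
```

On the first weight the two task vectors conflict (0.8 against -0.5). TIES elects the positive sign and keeps 0.8, whereas a plain average of the two deltas would shrink the update to 0.15.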

Topics

Code-Switching ASR
Multilingual ASR
Model Merging
Domain Generalization
Whisper-medium
Speech Datasets

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.