Towards Truly Multilingual ASR: Generalizing Code-Switching ASR to Unseen Language Pairs

Source: cs.CL updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, long

Summary

This research investigates whether code-switching Automatic Speech Recognition (CS-ASR) capabilities generalize from seen to unseen language pairs, a question motivated by the scarcity of multilingual CS speech resources. Using Whisper-medium as the backbone, the study fine-tuned models on English-centric pairs (Korean-English, Japanese-English, German-English) and evaluated how well they transfer to unseen non-English pairs such as Korean-Japanese and Korean-German. For these unseen pairs, the authors constructed new evaluation datasets comprising 450 Korean-Japanese and 387 Korean-German utterances. Experiments explored model merging (Task Arithmetic, TIES, DARE) and domain generalization (Fish, Fishr, GGA) techniques. Results indicate that fine-tuning and these generalization methods yield only modest improvements: the average Mixed Error Rate (MER) on unseen pairs is 0.32, still far from the sub-0.2 MER on seen pairs. Layer-wise analysis showed that CS adaptation occurs primarily in the higher encoder and decoder layers.
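To make the MER figures above concrete, here is a minimal sketch of how a mixed error rate can be computed. The summary does not specify the tokenization used in the study, so this sketch assumes a common convention: each Hangul, kana, or CJK ideograph character counts as one token, and all other text is split into whitespace-delimited words. The function names (`mixed_tokens`, `mixed_error_rate`) are hypothetical and chosen for illustration only.

```python
import re

# Assumption: characters in these Unicode ranges (kana, CJK ideographs,
# Hangul syllables) are scored one character per token.
_CHAR_LEVEL = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]")


def mixed_tokens(text):
    """Split text into character tokens (CJK/Hangul/kana) and word tokens (other scripts)."""
    tokens = []
    for chunk in text.split():
        buffer = ""
        for ch in chunk:
            if _CHAR_LEVEL.match(ch):
                if buffer:
                    tokens.append(buffer)
                    buffer = ""
                tokens.append(ch)
            else:
                buffer += ch
        if buffer:
            tokens.append(buffer)
    return tokens


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(reference, hypothesis):
    """Edit distance over mixed tokens, normalized by reference length."""
    ref_tokens = mixed_tokens(reference)
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, mixed_tokens(hypothesis)) / len(ref_tokens)
```

Under this assumed tokenization, a Korean-English reference such as "나는 meeting 에 갔다" has six tokens (five Hangul characters and one English word), so a single wrong character in the hypothesis gives an MER of 1/6.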

Key takeaway

If you are an AI scientist developing multilingual ASR systems, recognize that current model merging and domain generalization techniques offer only limited transferability for code-switching to unseen language pairs. Prioritize CS-ASR architectures and adaptation strategies designed specifically for robust cross-pair generalization rather than relying on existing general-purpose methods. Consider contributing to or using more diverse, higher-quality multilingual code-switching speech datasets to advance practical deployment.

Key insights

Code-switching ASR generalization to unseen language pairs remains limited despite model merging and domain generalization efforts.

Method

The study fine-tuned Whisper-medium on seen bilingual CS datasets (Korean-English, Japanese-English, German-English), applied model merging (Task Arithmetic, TIES, DARE) and domain generalization (Fish, Fishr, GGA) techniques, and evaluated the resulting models on newly constructed datasets for unseen CS language pairs.
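As an illustration of the simplest of the merging methods named above, the following is a minimal sketch of Task Arithmetic: each fine-tuned model contributes a task vector (its weights minus the base weights), and the scaled sum of task vectors is added back to the base. This is not the study's implementation. Parameters are represented as plain dicts of float lists rather than real Whisper checkpoints, and the `scale` value and the names `ko_en` and `ja_en` are assumptions for the example. TIES and DARE additionally prune or rescale the task vectors before summing, which is not shown here.

```python
def task_vector(base, finetuned):
    """Per-parameter difference between a fine-tuned model and the base model."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def task_arithmetic_merge(base, finetuned_models, scale=1.0):
    """Add the scaled sum of task vectors to the base parameters."""
    merged = {name: list(values) for name, values in base.items()}
    for finetuned in finetuned_models:
        for name, delta in task_vector(base, finetuned).items():
            merged[name] = [m + scale * d for m, d in zip(merged[name], delta)]
    return merged


# Toy example with one two-element parameter (hypothetical values).
base = {"w": [1.0, 2.0]}
ko_en = {"w": [1.5, 2.0]}  # stand-in for a Korean-English fine-tune
ja_en = {"w": [1.0, 1.0]}  # stand-in for a Japanese-English fine-tune

merged = task_arithmetic_merge(base, [ko_en, ja_en], scale=0.5)
print(merged)  # {'w': [1.25, 1.5]}
```

With real checkpoints the same arithmetic would be applied tensor by tensor over the model's parameter dictionary. The scaling coefficient is a hyperparameter that would normally be tuned on held-out data.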

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.