Towards Personalized Federated Learning for Dysarthric Speech Recognition

2026-06-11 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

Personalized Federated Learning (FL) for dysarthric speech recognition addresses two challenges at once: high variability between speakers and privacy concerns around their speech data. FL-based Automatic Speech Recognition (ASR) protects privacy, but it struggles with speaker heterogeneity, which makes a single set of shared model components suboptimal for individual speakers. This research explores two personalized aggregation strategies: parameter-based averaging and embedding-based averaging. In experiments on the UASpeech and TORGO datasets, the proposed methods outperform a regularized FedAvg baseline, with statistically significant Word Error Rate (WER) reductions of up to 0.99% absolute (3.15% relative) on UASpeech and 0.56% absolute (4.73% relative) on TORGO. The work highlights personalization as a promising direction for improving ASR for dysarthric speakers.
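To make the absolute and relative figures easier to compare, the short sketch below back-computes the baseline WER they imply, using relative reduction = absolute reduction / baseline WER. It assumes each absolute/relative pair refers to the same experimental configuration; the resulting baseline values are derived here and are not quoted from the paper. The function name is illustrative.

```python
def implied_baseline_wer(absolute_reduction, relative_reduction):
    """Return the baseline WER (in percent) implied by a reported
    absolute reduction and relative reduction, both given in percent.

    relative = absolute / baseline  =>  baseline = absolute / relative
    """
    return absolute_reduction / (relative_reduction / 100.0)


# Reported (absolute %, relative %) WER reductions from the summary above.
reported = {"UASpeech": (0.99, 3.15), "TORGO": (0.56, 4.73)}

for dataset, (abs_red, rel_red) in reported.items():
    baseline = implied_baseline_wer(abs_red, rel_red)
    personalized = baseline - abs_red
    # Derived estimates only; assumes each pair describes one configuration.
    print(f"{dataset}: baseline ~{baseline:.2f}% WER, "
          f"personalized ~{personalized:.2f}% WER")
```

Under that assumption, the implied baseline is roughly 31.4% WER on UASpeech and roughly 11.8% WER on TORGO, which explains why the smaller absolute gain on TORGO corresponds to the larger relative gain.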

Key takeaway

Machine Learning Engineers building Automatic Speech Recognition (ASR) systems for diverse or impaired speech should consider personalized federated learning. Standard federated learning struggles with speaker heterogeneity, whereas personalized aggregation through parameter-based or embedding-based averaging yielded statistically significant Word Error Rate reductions: up to 3.15% relative on UASpeech and 4.73% relative on TORGO. The gains are under one point of WER in absolute terms, but they come while keeping the system privacy-preserving.

Key insights

Personalization significantly improves federated learning-based Automatic Speech Recognition for dysarthric speakers by mitigating heterogeneity.

Principles

Shared model components are suboptimal under speaker heterogeneity.

Method

The method explores two aggregation strategies for personalized federated learning: parameter-based averaging and embedding-based averaging, applied to ASR models.
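The summary does not spell out how either strategy computes its averages, so the following is only a minimal sketch of one plausible reading, not the paper's algorithm. It assumes each client receives its own aggregate, with weights given by a softmax over cosine similarity between client descriptors: the descriptors are the parameter vectors themselves in the parameter-based variant and per-speaker embeddings in the embedding-based variant. The function names, the `temperature` parameter, and the toy values are all hypothetical, and the `fedavg` baseline shown is plain FedAvg without the regularization term used in the paper's baseline.

```python
import math


def fedavg(client_params, client_sizes):
    """Plain FedAvg: one shared vector, weighted by local data size."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * p[i] for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def personalized_average(client_params, descriptors, temperature=0.1):
    """Return one aggregate per client (assumed scheme, not the paper's).

    Client k weights client j by softmax(cosine(d_k, d_j) / temperature),
    so similar clients contribute more to each other's aggregate.
    """
    dim = len(client_params[0])
    aggregates = []
    for d_k in descriptors:
        scores = [math.exp(cosine(d_k, d_j) / temperature) for d_j in descriptors]
        norm = sum(scores)
        weights = [s / norm for s in scores]
        aggregates.append(
            [sum(w * p[i] for w, p in zip(weights, client_params)) for i in range(dim)]
        )
    return aggregates


# Toy data: clients 0 and 1 are similar, client 2 is an outlier.
params = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.5]]
embeddings = [[0.2, 0.9], [0.25, 0.85], [0.9, 0.1]]  # hypothetical speaker embeddings

shared = fedavg(params, [10, 10, 10])
by_params = personalized_average(params, descriptors=params)         # parameter-based
by_embedding = personalized_average(params, descriptors=embeddings)  # embedding-based

print("shared:", shared)
print("parameter-based:", by_params)
print("embedding-based:", by_embedding)
```

In this toy setup the shared average is pulled toward the outlier, while both personalized variants keep clients 0 and 1 close to each other and leave client 2 near its own parameters. The two variants differ only in which descriptor supplies the similarity signal.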

In practice

Apply personalized FL to ASR for diverse user groups.
Investigate parameter-based averaging for model customization.

Topics

Federated Learning
Dysarthric Speech Recognition
Automatic Speech Recognition
Personalization
Model Aggregation
Speaker Heterogeneity

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.