Towards Data-free and Training-free Compression for Speech Foundation Models Using Parameter Clustering

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, long

Summary

The paper introduces a data-free and training-free compression approach for speech foundation models based on channel-wise parameter clustering with k-means. It also explores mixed-sparsity pruning, in which the number of parameter clusters varies from layer to layer. On LibriSpeech, applying the method to HuBERT-large at 50% sparsity gave absolute Word Error Rate (WER) reductions of 27.73% on test-clean and 18.61% on test-other relative to magnitude-based pruning before fine-tuning. After 3 epochs of fine-tuning, the reductions were 0.19% and 0.79%, respectively. For Whisper-large-v3 at 10% sparsity, absolute WER reductions of 2.86% and 5.02% were observed against magnitude-based pruning, with no significant WER increase relative to the uncompressed baseline. The resulting compressed models are coarse-grained and hardware-friendly.

Key takeaway

Machine learning engineers optimizing speech foundation models for resource-constrained environments should consider parameter clustering. The approach needs neither data nor training, and at the same sparsity it yields a lower Word Error Rate than magnitude-based pruning. Reported results cover HuBERT-large at 50% sparsity and Whisper-large-v3 at 10% sparsity. Because the compression is coarse-grained, the resulting models can be deployed on standard hardware without specialized libraries.

Key insights

Parameter clustering offers data-free, training-free, and hardware-friendly compression for speech foundation models, outperforming magnitude-based pruning.

Principles

Method

Apply k-means clustering to the structured units of a layer (attention heads, FFN units), then replace the original units with the K cluster centroids so that similar components are merged. Use variance-based mixed sparsity to assign K adaptively per layer.
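
The method above can be sketched for a single FFN block as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation. The function names, the choice to describe each hidden unit by its input row, bias, and output column, the cluster-size scaling of output columns, and the proportional-to-variance rule for assigning K are all hypothetical choices made for this example. The farthest-point initialisation is a deterministic stand-in for a standard k-means seeding.

```python
import numpy as np


def farthest_point_init(points, k):
    """Deterministic seeding: start at point 0, then keep adding the point
    farthest from all centroids chosen so far (assumption, not from the paper)."""
    chosen = [0]
    min_d = ((points - points[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, ((points - points[nxt]) ** 2).sum(axis=1))
    return points[chosen].copy()


def kmeans_labels(points, k, n_iter=50):
    """Plain Lloyd's k-means; returns the cluster label of every point."""
    centroids = farthest_point_init(points, k)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = points[labels == j].mean(axis=0)
    return labels


def compress_ffn(w_in, b_in, w_out, k):
    """Merge the hidden units of y = act(x @ w_in.T + b_in) @ w_out.T into
    at most k units. Shapes: w_in (d_ff, d_model), b_in (d_ff,), w_out (d_model, d_ff)."""
    # One feature vector per hidden unit: input row, bias, output column.
    feats = np.concatenate([w_in, b_in[:, None], w_out.T], axis=1)
    labels = kmeans_labels(feats, k)
    kept = np.unique(labels)  # skip clusters that ended up empty
    new_w_in = np.stack([w_in[labels == j].mean(axis=0) for j in kept])
    new_b_in = np.array([b_in[labels == j].mean() for j in kept])
    # Assumption: sum the output columns (centroid times cluster size) so the
    # merged unit contributes about as much as the units it replaces.
    new_w_out = np.stack([w_out[:, labels == j].sum(axis=1) for j in kept], axis=1)
    return new_w_in, new_b_in, new_w_out


def allocate_clusters(layer_variances, total_k, min_k=1):
    """Hypothetical mixed-sparsity rule: split a total cluster budget across
    layers in proportion to each layer's parameter variance."""
    v = np.asarray(layer_variances, dtype=float)
    spare = total_k - min_k * len(v)
    raw = spare * v / v.sum()
    ks = np.floor(raw).astype(int)
    # Hand the units lost to rounding to the largest fractional parts.
    order = np.argsort(-(raw - ks), kind="stable")
    ks[order[: spare - ks.sum()]] += 1
    return ks + min_k


def ffn_forward(x, w_in, b_in, w_out):
    """Reference forward pass with ReLU (the activation is an assumption)."""
    return np.maximum(x @ w_in.T + b_in, 0.0) @ w_out.T
```

No data or gradient step is involved: `compress_ffn` only reads the weights, and the returned matrices are smaller dense matrices, which is what makes this kind of compression coarse-grained. The pairwise distance computation materialises an array of size (units × K × features), so a real model would need a chunked or library-based k-means.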

In practice

Topics

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.