Towards Data-free and Training-free Compression for Speech Foundation Models Using Parameter Clustering
Summary
The paper introduces a data-free and training-free compression approach for speech foundation models based on channel-wise k-means clustering of parameters. It also explores mixed sparsity pruning, in which the number of parameter clusters varies from layer to layer. In experiments on the LibriSpeech dataset, applying the method to HuBERT-large at 50% sparsity yielded absolute Word Error Rate (WER) reductions of 27.73% on test-clean and 18.61% on test-other relative to magnitude-based pruning, before any fine-tuning. After 3 epochs of fine-tuning, the WER reductions were 0.19% and 0.79%, respectively. For Whisper-large-v3 at 10% sparsity, absolute WER reductions of 2.86% and 5.02% were observed against magnitude-based pruning, with no significant WER increase relative to the uncompressed baseline. The approach produces hardware-friendly, coarse-grained compressed models.
Key takeaway
Machine Learning Engineers optimizing speech foundation models for resource-constrained environments should consider parameter clustering. The approach needs neither data nor training, and in the reported experiments it gives lower Word Error Rate than traditional magnitude-based pruning at the same sparsity (50% for HuBERT-large, 10% for Whisper-large-v3). Because the compression is structured, the resulting models can be deployed on standard hardware without specialized libraries.
Key insights
Parameter clustering offers data-free, training-free, and hardware-friendly compression for speech foundation models, outperforming magnitude-based pruning.
Principles
- Merging similar parameters preserves collective information.
- Higher parameter variance indicates more complex information.
- Structured compression is compatible with general-purpose hardware.
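The second principle suggests one way to give each layer its own number of clusters: layers whose parameters vary more keep more clusters. The sketch below is a hypothetical illustration of that idea, not the paper's allocation rule. The function name `allocate_clusters`, the proportional-to-variance split, and the clamping to at least one cluster per layer are all assumptions.

```python
from statistics import pvariance

def allocate_clusters(layer_weights, units_per_layer, sparsity):
    """Assign a cluster count K to each layer (hypothetical rule).

    layer_weights:   one flat list of floats per layer
    units_per_layer: number of structured units (heads or FFN units) per layer
    sparsity:        global fraction of units to remove, e.g. 0.5
    """
    # Global budget: how many units survive across all layers.
    budget = round((1.0 - sparsity) * sum(units_per_layer))
    variances = [pvariance(w) for w in layer_weights]
    total_var = sum(variances)
    ks = []
    for var, n in zip(variances, units_per_layer):
        # Assumption: a layer's share of the budget is proportional to its variance.
        share = var / total_var if total_var > 0 else 1.0 / len(variances)
        k = round(budget * share)
        # Clamp to [1, n]; after clamping the counts need not sum to the budget.
        ks.append(max(1, min(n, k)))
    return ks

# Toy example: three layers of 8 units each, with increasing weight variance.
layer_weights = [
    [0.0, 0.1, -0.1, 0.0],
    [0.5, -0.5, 0.5, -0.5],
    [1.0, -1.0, 1.0, -1.0],
]
ks = allocate_clusters(layer_weights, [8, 8, 8], sparsity=0.5)
print(ks)
```

In this toy setup the low-variance layer is compressed most aggressively and the high-variance layer least. A real allocation would likely need to redistribute any budget left over after clamping.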
Method
Apply k-means clustering to structured units (attention heads, FFN units), merging similar units into K centroids that replace the originals. Use variance-based mixed sparsity to assign K adaptively per layer.
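The merging step can be sketched for a single feed-forward block as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the helper names (`kmeans_assign`, `merge_ffn_units`, `ffn`), the farthest-first initialisation, the ReLU activation, and the choice to average input weights while summing output weights within a cluster are all made up for the example.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_assign(points, k, iters=20):
    """Lloyd's k-means; returns the cluster index of each point."""
    # Assumption: deterministic farthest-first initialisation.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each non-empty cluster's centroid becomes its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def merge_ffn_units(w_in, b_in, w_out, k):
    """Merge hidden units of an FFN into at most k units.

    w_in[i]  : input weights of hidden unit i
    b_in[i]  : bias of hidden unit i
    w_out[i] : output weights leaving hidden unit i
    """
    # Cluster units on their input weights plus bias.
    feats = [row + [b] for row, b in zip(w_in, b_in)]
    assign = kmeans_assign(feats, k)
    new_w_in, new_b_in, new_w_out = [], [], []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if not idx:
            continue  # skip empty clusters
        n = len(idx)
        # Input side: the centroid (mean) replaces the member units.
        new_w_in.append([sum(w_in[i][j] for i in idx) / n
                         for j in range(len(w_in[0]))])
        new_b_in.append(sum(b_in[i] for i in idx) / n)
        # Output side (assumption): sum the members' output weights so that
        # identical units merge without changing the block's output.
        new_w_out.append([sum(w_out[i][j] for i in idx)
                          for j in range(len(w_out[0]))])
    return new_w_in, new_b_in, new_w_out

def ffn(x, w_in, b_in, w_out):
    """Two-layer feed-forward block with ReLU (chosen for simplicity)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_in, b_in)]
    return [sum(h * row[j] for h, row in zip(hidden, w_out))
            for j in range(len(w_out[0]))]

# Toy block: 4 hidden units forming two pairs with identical input weights.
w_in = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
b_in = [0.5, 0.5, -1.0, -1.0]
w_out = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
x = [0.3, 0.9]

small_w_in, small_b_in, small_w_out = merge_ffn_units(w_in, b_in, w_out, k=2)
y_full = ffn(x, w_in, b_in, w_out)
y_small = ffn(x, small_w_in, small_b_in, small_w_out)
print(len(small_b_in), y_full, y_small)
```

Because the toy units come in exact duplicates, the 2-unit block reproduces the 4-unit block's output. With merely similar units the match is approximate, which is where the choice of K per layer matters. The compressed block is simply a smaller dense layer, so it needs no sparse kernels.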
In practice
- Compress HuBERT-large or Whisper-large-v3 without data or training.
- Achieve lower WER than magnitude-based pruning at the same sparsity.
- Deploy compressed models on standard hardware platforms.
Topics
- Speech Foundation Models
- Model Compression
- Parameter Clustering
- Automatic Speech Recognition
- HuBERT
- Whisper
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.