QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement
Summary
The Quaternion Conformer GAN (QC-GAN) is a new parameter-efficient speech enhancement framework that combines a Quaternion Conformer generator with MetricGAN-based training. Its layers use the Hamilton product for structured weight sharing. Magnitude and phase information are encoded jointly across the four quaternion components, which substantially reduces per-layer parameters while preserving the interdependencies between those components. A metric discriminator, trained to approximate perceptual evaluation scores, guides the generator toward higher perceptual quality. On the VoiceBank+DEMAND dataset, QC-GAN (Base) achieved a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, delivering performance comparable to leading models at less than half their size. An ultra-compact 35K-parameter variant, QC-GAN (Tiny), achieved a PESQ score of 3.23, surpassing conventional lightweight methods. Evaluation on the DNS-Challenge 3 dataset further confirmed its generalization to real-world conditions: QC-GAN (Base) achieved the highest DNSMOS OVRL (2.73), BAK (3.79), and P.808 MOS (3.37) scores among the compared models. Ablation studies showed that quaternion representations improved phase reconstruction, reducing group delay error by approximately 9%.
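The ablation result is stated in terms of group delay error, which the summary does not define. Below is a minimal NumPy sketch that rests on two assumptions. First, group delay is approximated by the negated, wrapped phase difference between adjacent frequency bins. Second, the error is the mean absolute difference between enhanced and clean group delay. The function names and this exact metric are hypothetical and are not taken from the paper.

```python
import numpy as np

def wrap_phase(x):
    """Wrap angles into the interval [-pi, pi)."""
    return (x + np.pi) % (2 * np.pi) - np.pi

def group_delay(phase):
    """Approximate group delay -d(phase)/d(omega) for phase of shape (frames, bins).

    Uses a wrapped first difference along the frequency axis, so the result
    is in radians per frequency bin rather than seconds.
    """
    return -wrap_phase(np.diff(phase, axis=-1))

def group_delay_error(est_phase, ref_phase):
    """Hypothetical metric: mean absolute (wrapped) group delay difference."""
    diff = wrap_phase(group_delay(est_phase) - group_delay(ref_phase))
    return float(np.mean(np.abs(diff)))

rng = np.random.default_rng(0)
clean = rng.uniform(-np.pi, np.pi, size=(10, 257))
noisy = wrap_phase(clean + rng.normal(0.0, 0.3, size=clean.shape))
print(group_delay_error(clean, clean))                    # 0.0
print(group_delay_error(wrap_phase(clean + 1.0), clean))  # ~0: a constant phase offset cancels
print(group_delay_error(noisy, clean))                    # > 0 for perturbed phase
```

Group delay depends only on phase differences across frequency, so a constant phase offset leaves it unchanged. This makes it a way to probe phase structure rather than absolute phase.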
Key takeaway
For machine learning engineers developing lightweight speech enhancement solutions, QC-GAN presents a compelling architecture. You should consider combining quaternion neural networks with MetricGAN-based training to achieve high perceptual quality with significantly fewer parameters. This approach can cut model size by more than half while keeping PESQ scores comparable to leading models. To further reduce inference latency for real-time CPU deployment, explore fused quaternion kernels or linear attention.
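The takeaway names linear attention as one way to cut latency but does not name a specific variant. As an illustration only, the sketch below uses the kernelized linear attention of Katharopoulos et al. (2020) with the feature map elu(x) + 1. It summarizes keys and values in a d × d_v matrix instead of building a T × T attention matrix, so cost grows linearly with sequence length T. This is a swapped-in technique, not QC-GAN's attention module.

```python
import numpy as np

def elu_plus_one(x):
    """Feature map phi(x) = elu(x) + 1, which keeps all features strictly positive."""
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention. q, k: (T, d); v: (T, d_v)."""
    phi_q, phi_k = elu_plus_one(q), elu_plus_one(k)
    kv = phi_k.T @ v            # (d, d_v) key-value summary, no T x T matrix
    z = phi_k.sum(axis=0)       # (d,) normalizer summary
    num = phi_q @ kv            # (T, d_v)
    den = phi_q @ z             # (T,)
    return num / (den[:, None] + eps)

def quadratic_reference(q, k, v):
    """Same kernel computed through an explicit T x T similarity matrix."""
    sim = elu_plus_one(q) @ elu_plus_one(k).T   # (T, T)
    return (sim @ v) / sim.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
T, d = 64, 16
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
out = linear_attention(q, k, v)
print(out.shape)                                     # (64, 16)
print(np.allclose(out, quadratic_reference(q, k, v)))  # True
```

This non-causal form suits offline processing of a whole utterance. A streaming variant would instead accumulate the key-value summary with running sums over past frames.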
Key insights
Quaternion Conformer GANs achieve high-fidelity speech enhancement with significantly fewer parameters by jointly modeling magnitude and phase.
Principles
- The Hamilton product reduces layer parameters by 75% relative to an equivalent real-valued layer via structured weight sharing.
- Quaternion algebra provides an inductive bias for coupled magnitude-phase encoding.
- MetricGAN training optimizes directly for perceptual quality metrics such as PESQ.
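To make the 75% figure concrete, here is a minimal NumPy sketch of a quaternion dense layer in the formulation commonly used for quaternion networks. Four real matrices W_r, W_i, W_j, W_k of shape (out/4, in/4) are shared across all four output components through the Hamilton product. The class and its layout are assumptions for illustration; the summary does not give QC-GAN's layer code.

```python
import numpy as np

def hamilton(p, q):
    """Hamilton product of quaternions p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

class QuaternionLinear:
    """Hypothetical quaternion dense layer computing y = W (Hamilton product) x, without bias."""

    def __init__(self, in_features, out_features, rng):
        if in_features % 4 or out_features % 4:
            raise ValueError("feature sizes must be divisible by 4")
        shape = (out_features // 4, in_features // 4)
        # Only four (out/4, in/4) matrices are learned; each is reused in every component.
        self.wr, self.wi, self.wj, self.wk = (rng.standard_normal(shape) for _ in range(4))

    def num_params(self):
        return sum(w.size for w in (self.wr, self.wi, self.wj, self.wk))

    def __call__(self, x):
        # Split the real input vector into its r, i, j, k components.
        xr, xi, xj, xk = np.split(x, 4)
        wr, wi, wj, wk = self.wr, self.wi, self.wj, self.wk
        return np.concatenate([
            wr @ xr - wi @ xi - wj @ xj - wk @ xk,
            wr @ xi + wi @ xr + wj @ xk - wk @ xj,
            wr @ xj - wi @ xk + wj @ xr + wk @ xi,
            wr @ xk + wi @ xj - wj @ xi + wk @ xr,
        ])

rng = np.random.default_rng(0)
layer = QuaternionLinear(256, 256, rng)
print(layer.num_params(), 256 * 256)   # 16384 65536, i.e. 75% fewer weights

# Sanity check: with 1x1 blocks the layer reduces to a single Hamilton product.
small = QuaternionLinear(4, 4, rng)
x = rng.standard_normal(4)
w = [m[0, 0] for m in (small.wr, small.wi, small.wj, small.wk)]
print(np.allclose(small(x), hamilton(w, x)))   # True
print(hamilton((0, 1, 0, 0), (0, 0, 1, 0)))    # i * j = k -> [0 0 0 1]
```

A real-valued layer of the same size needs one in × out matrix. The quaternion layer learns 4 × (in/4) × (out/4) = (in × out)/4 weights, which is where the 75% reduction comes from.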
Method
QC-GAN pairs a Quaternion Conformer generator with a metric discriminator and optimizes a multi-task loss that includes a differentiable PESQ term.
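In MetricGAN-style training, the discriminator D learns to regress a normalized quality score, so it acts as a differentiable surrogate for a metric such as PESQ. The generator is trained to push D's prediction toward the maximum score of 1. A minimal plain-Python sketch of those least-squares objectives follows. The PESQ normalization range, the loss weights, and the `reconstruction` placeholder are hypothetical, since the summary does not list QC-GAN's multi-task weights.

```python
def normalize_pesq(score, lo=1.0, hi=4.5):
    """Min-max map a PESQ score into [0, 1]; the (lo, hi) range is an assumption."""
    return min(max((score - lo) / (hi - lo), 0.0), 1.0)

def discriminator_loss(d_clean, d_enhanced, q_enhanced):
    """Least-squares MetricGAN-style discriminator loss.

    d_clean:    D's predicted score for a (clean, clean) pair; target is 1.0
    d_enhanced: D's predicted score for an (enhanced, clean) pair
    q_enhanced: true normalized metric of the enhanced signal (e.g. normalize_pesq)
    """
    return (d_clean - 1.0) ** 2 + (d_enhanced - q_enhanced) ** 2

def generator_loss(d_enhanced, reconstruction, w_adv=0.05, w_rec=1.0):
    """Adversarial term pushes D's predicted score toward 1.0.

    `reconstruction` stands in for the other multi-task terms (e.g. spectral
    losses); w_adv and w_rec are hypothetical weights.
    """
    return w_adv * (d_enhanced - 1.0) ** 2 + w_rec * reconstruction

# Usage with made-up discriminator outputs (not results from the paper):
q = normalize_pesq(3.48)
print(round(q, 3))                    # 0.709
print(discriminator_loss(1.0, q, q))  # 0.0: D already predicts both scores exactly
print(generator_loss(1.0, 0.0))       # 0.0: D rates the output as clean
```

The discriminator only needs true metric values during training. That is why a non-differentiable score like PESQ can still steer the generator through D's gradients.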
In practice
- Apply quaternion layers for parameter-efficient audio processing.
- Integrate MetricGAN for perceptually driven model training.
- Map STFT magnitude and phase directly to quaternion components.
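The last practice item maps STFT magnitude and phase onto quaternion components. The summary does not specify the exact assignment, so the sketch below assumes a hypothetical one. The real part carries a power-compressed magnitude |X|^c, the i and j parts carry cos φ and sin φ, and the k part is left at zero. It uses a plain NumPy STFT (Hann window, `np.fft.rfft`) and checks that the mapping is invertible. The frame size, hop, and c = 0.3 are illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Minimal STFT: Hann-windowed frames -> one-sided spectrum, shape (frames, n_fft//2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop: t * hop + n_fft] * window for t in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def to_quaternion(spec, c=0.3):
    """Hypothetical mapping (|X|^c, cos(phase), sin(phase), 0) on a trailing axis of size 4."""
    mag, phase = np.abs(spec), np.angle(spec)
    return np.stack([mag ** c, np.cos(phase), np.sin(phase), np.zeros_like(mag)], axis=-1)

def from_quaternion(q, c=0.3):
    """Invert the mapping above back to a complex spectrum."""
    mag = q[..., 0] ** (1.0 / c)
    phase = np.arctan2(q[..., 2], q[..., 1])
    return mag * np.exp(1j * phase)

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)   # 1 s of noise at an assumed 16 kHz rate
spec = stft(audio)
q = to_quaternion(spec)
print(spec.shape, q.shape)                    # (122, 257) (122, 257, 4)
print(np.allclose(from_quaternion(q), spec))  # True
```

The unused k slot keeps the four components that quaternion layers expect. Whether QC-GAN fills it with another feature is not stated in the summary.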
Topics
- Speech Enhancement
- Quaternion Neural Networks
- Conformer
- MetricGAN
- Parameter Efficiency
- Phase Reconstruction
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.