QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

Source: stat.ML updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Speech Processing · Depth: Expert, extended

Summary

The Quaternion Conformer GAN (QC-GAN) is a parameter-efficient speech enhancement framework that combines a Quaternion Conformer generator with MetricGAN-based training. Its quaternion layers use the Hamilton product for structured weight sharing: magnitude and phase information are encoded jointly, so each layer needs significantly fewer parameters while the interdependencies between those components are preserved. A metric-learning discriminator is trained to approximate perceptual evaluation scores, and the generator is optimized against it to maximize perceptual quality. On the VoiceBank+DEMAND dataset, QC-GAN (Base) reaches a Perceptual Evaluation of Speech Quality (PESQ) score of 3.48 with only 0.89M parameters, which is comparable to leading models at less than half their size. An ultra-compact 35K-parameter variant, QC-GAN (Tiny), reaches a PESQ score of 3.23, surpassing conventional lightweight methods. Evaluation on the DNS-Challenge 3 dataset supports generalization to real-world conditions, with QC-GAN (Base) achieving the highest DNSMOS OVRL (2.73), BAK (3.79), and P.808 MOS (3.37) scores. Ablation studies indicate that quaternion representations improve phase reconstruction, reducing group delay error by approximately 9%.
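
To make the weight-sharing idea concrete, here is a minimal sketch of a quaternion linear layer built on the Hamilton product, written in plain Python. It is not taken from the paper: the function names (`hamilton_product`, `quaternion_linear`, `param_counts`) are hypothetical, bias terms are ignored, and how QC-GAN actually maps magnitude and phase features onto the four quaternion components is not stated in this summary. The sketch only illustrates why one quaternion weight, reused across all four components, needs about a quarter of the real-valued parameters of an equally wide dense layer.

```python
def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (r, i, j, k) tuples."""
    pr, pi, pj, pk = p
    qr, qi, qj, qk = q
    return (
        pr * qr - pi * qi - pj * qj - pk * qk,  # real part
        pr * qi + pi * qr + pj * qk - pk * qj,  # i part
        pr * qj - pi * qk + pj * qr + pk * qi,  # j part
        pr * qk + pi * qj - pj * qi + pk * qr,  # k part
    )


def quaternion_linear(weights, inputs):
    """Bias-free quaternion linear layer (illustrative, not the paper's code).

    weights: list of rows, each row a list of quaternion weights.
    inputs:  list of quaternion inputs, one per column of `weights`.
    Each output quaternion is the sum of Hamilton products along its row,
    so the same four real numbers of a weight mix all four input components.
    """
    outputs = []
    for row in weights:
        acc = (0.0, 0.0, 0.0, 0.0)
        for w, x in zip(row, inputs):
            prod = hamilton_product(w, x)
            acc = tuple(a + b for a, b in zip(acc, prod))
        outputs.append(acc)
    return outputs


def param_counts(in_features, out_features):
    """Real-valued weight counts for a dense layer vs. a quaternion layer.

    Assumes both feature sizes are divisible by 4 and ignores biases.
    """
    real = in_features * out_features
    quat = 4 * (in_features // 4) * (out_features // 4)
    return real, quat
```

Under these assumptions, `param_counts(256, 256)` compares 65,536 real weights against 16,384 quaternion-layer weights, a fourfold reduction for the same feature width.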

Key takeaway

For machine learning engineers building lightweight speech enhancement systems, QC-GAN is a compelling architecture. Consider combining quaternion neural networks with MetricGAN-based training to achieve high perceptual quality with significantly fewer parameters: in the reported results, QC-GAN (Base) is less than half the size of leading models while delivering comparable PESQ scores. To further reduce inference latency for real-time CPU deployment, explore fused quaternion kernels or linear attention.

Key insights

Quaternion Conformer GANs achieve high-fidelity speech enhancement with significantly fewer parameters by jointly modeling magnitude and phase.

Method

QC-GAN pairs a Quaternion Conformer generator with a metric discriminator and optimizes a multi-task loss that includes a differentiable PESQ term.
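
The following is a minimal sketch of how a MetricGAN-style objective is commonly set up, to illustrate what a metric discriminator contributes to the loss. It is an assumption-laden illustration, not QC-GAN's actual loss: the function names, the loss weights, the single scalar `recon_loss` standing in for the other multi-task terms, and the normalization of PESQ from a nominal range of -0.5 to 4.5 are all hypothetical choices made here. In this setup the discriminator learns to predict the normalized metric score, which gives the generator a differentiable stand-in for PESQ.

```python
def normalize_pesq(pesq):
    """Map PESQ from an assumed nominal range [-0.5, 4.5] onto [0, 1]."""
    return (pesq + 0.5) / 5.0


def discriminator_loss(d_clean, d_enhanced, pesq_enhanced):
    """Train the discriminator to predict the normalized metric score.

    d_clean:       discriminator output for the (clean, clean) pair; target 1.
    d_enhanced:    discriminator output for the (enhanced, clean) pair.
    pesq_enhanced: measured PESQ of the enhanced signal against the clean one.
    """
    target = normalize_pesq(pesq_enhanced)
    return (d_clean - 1.0) ** 2 + (d_enhanced - target) ** 2


def generator_loss(d_enhanced, recon_loss, metric_weight=1.0, recon_weight=1.0):
    """Push the predicted score toward the maximum, plus a reconstruction term.

    recon_loss is a placeholder for whatever signal-level losses
    (e.g. magnitude or waveform terms) a multi-task objective combines.
    """
    metric_term = (d_enhanced - 1.0) ** 2
    return metric_weight * metric_term + recon_weight * recon_loss
```

Because the generator only ever sees the discriminator's prediction, gradients flow through the learned score rather than through the non-differentiable PESQ computation itself.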

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.