Neuromorphic Speech Enhancement with Dual-Branch Spiking Neural Networks

Source: Artificial Intelligence · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Audio and Speech Processing · Depth: Expert, quick

Summary

A novel dual-branch spiking neural network (SNN) architecture, GSU-DBNet, has been developed to advance neuromorphic speech enhancement. Published on 2026-06-22, the model targets the performance gap between energy-efficient SNNs and traditional artificial neural networks (ANNs), a gap attributed to the limits of binary activations and of existing SNN architectural design. GSU-DBNet incorporates a gated spiking unit (GSU) and processes the speech magnitude spectrum and the complex spectrum in parallel, predicting a mask for each. It also uses a dual-path GSU module to exploit temporal and frequency information, which strengthens spatiotemporal feature representation. Experimental results on a benchmark dataset show that GSU-DBNet achieves a PESQ score of 3.04 with only 394K parameters, surpassing existing SNN-based methods. Notably, it uses only 4.5% to 10.6% of the parameters required by representative ANN-based models.
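To make the "binary activations" limitation concrete, here is a minimal sketch of a generic leaky integrate-and-fire (LIF) layer in NumPy. It is not the gated spiking unit from the paper, whose internals this summary does not describe; the function name `lif_forward` and the parameters `tau`, `v_threshold`, and `v_reset` are illustrative assumptions.

```python
import numpy as np

def lif_forward(currents, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Run a generic leaky integrate-and-fire layer over time.

    currents: array of shape (T, N), input current per time step and neuron.
    Returns a (T, N) array whose entries are 0.0 (silent) or 1.0 (spike).
    """
    currents = np.asarray(currents, dtype=float)
    v = np.full(currents.shape[1], v_reset)   # membrane potential per neuron
    spikes = np.zeros_like(currents)
    for t in range(currents.shape[0]):
        # Leaky integration: the potential decays toward v_reset while
        # accumulating the input current.
        v = v + (currents[t] - (v - v_reset)) / tau
        fired = v >= v_threshold
        spikes[t] = fired.astype(float)        # the output is binary
        v = np.where(fired, v_reset, v)        # hard reset after a spike
    return spikes

# A constant input of 1.5 makes the neuron fire on every second step.
spikes = lif_forward(np.full((6, 1), 1.5))
print(spikes[:, 0])  # [0. 1. 0. 1. 0. 1.]
```

Each neuron emits only 0 or 1 per time step, so the layer carries far less information per activation than a real-valued ANN layer. That is the kind of constraint a gating mechanism would aim to relax.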

Key takeaway

For machine learning engineers developing speech enhancement solutions, GSU-DBNet offers a compelling alternative to traditional ANNs. If you prioritize energy efficiency and a small parameter budget, consider this dual-branch SNN: it reports a PESQ score of 3.04 with only 394K parameters, which is 4.5% to 10.6% of the parameters required by representative ANN-based models. This suggests a viable path for deploying high-quality speech enhancement on edge devices or in power-sensitive applications.
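As a rough check on scale, the quoted percentages can be inverted to estimate how large the ANN baselines are. The sketch below uses only the figures stated above (394K parameters, 4.5% to 10.6%). The baseline sizes it prints are back-of-the-envelope estimates from rounded percentages, not numbers reported in this summary, and the helper `implied_total` is a hypothetical name.

```python
snn_params = 394_000                    # GSU-DBNet parameter count from the summary
share_low, share_high = 0.045, 0.106    # stated share of ANN parameters

def implied_total(params, share):
    """Size of a model of which `params` is the given fraction."""
    return params / share

largest_ann = implied_total(snn_params, share_low)    # the 4.5% case
smallest_ann = implied_total(snn_params, share_high)  # the 10.6% case

print(f"{smallest_ann / 1e6:.2f}M to {largest_ann / 1e6:.2f}M")
# 3.72M to 8.76M
```

On these figures, the ANN baselines used for comparison would hold roughly 3.7 to 8.8 million parameters.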

Key insights

GSU-DBNet, a dual-branch SNN with gated spiking units, outperforms existing SNN-based speech enhancement methods while using a small fraction (4.5% to 10.6%) of the parameters of representative ANN-based models.

Principles

Method

GSU-DBNet uses a dual-branch architecture: one branch models the speech magnitude spectrum and the other models the complex spectrum, and each predicts a corresponding mask. A dual-path GSU module exploits temporal and frequency information to produce richer spatiotemporal features.
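The two kinds of mask act on a noisy spectrogram differently, which the following NumPy sketch illustrates. The masks here are random placeholders rather than network outputs, and the averaging rule in `fuse_branches` is a hypothetical choice: the summary does not say how GSU-DBNet combines its branches. All function names are illustrative.

```python
import numpy as np

def apply_magnitude_mask(noisy_spec, mag_mask):
    """Scale each time-frequency bin by a real, non-negative mask.

    Only the magnitude changes; the noisy phase is kept as-is.
    """
    return mag_mask * noisy_spec

def apply_complex_mask(noisy_spec, mask_real, mask_imag):
    """Multiply by a complex mask, which can alter magnitude and phase."""
    return (mask_real + 1j * mask_imag) * noisy_spec

def fuse_branches(mag_estimate, complex_estimate):
    """Hypothetical fusion rule (an assumption): average both estimates."""
    return 0.5 * (mag_estimate + complex_estimate)

rng = np.random.default_rng(0)
shape = (257, 100)  # (frequency bins, frames), an arbitrary example size
noisy_spec = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Placeholder masks standing in for the two branches' predictions.
mag_mask = rng.uniform(0.0, 1.0, shape)
mask_real = rng.uniform(-1.0, 1.0, shape)
mask_imag = rng.uniform(-1.0, 1.0, shape)

enhanced = fuse_branches(
    apply_magnitude_mask(noisy_spec, mag_mask),
    apply_complex_mask(noisy_spec, mask_real, mask_imag),
)
```

A magnitude mask can only attenuate or amplify each bin, while a complex mask can also correct phase. That difference is one plausible motivation for modeling both spectra in parallel.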

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.