Lung-SRAD: Spectral-Aware Regularized Audio DASS with Dual-Axis Patch-Mix Contrastive Learning for Respiratory Sound Classification

2026-06-10 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, AI in Healthcare · Depth: Expert, quick

Summary

Lung-SRAD is an approach to Respiratory Sound Classification (RSC) that uses State Space Models (SSMs) in place of CLS-token-driven self-attention architectures such as the Audio Spectrogram Transformer (AST). AST models often act as low-pass filters, which reduces their sensitivity to localized abnormal respiratory patterns. SSMs, by contrast, better preserve mid-to-high spatial-frequency components in their intermediate representations. The method adds spectral-aware layer regularization via Gaussian convolution and proposes Dual-Axis Patch-Mix contrastive learning designed for SSM-based audio models. The combined strategy reaches a score of 64.48% on the ICBHI benchmark, surpassing the AST baseline by 5%. Code is publicly available.

Key takeaway

For machine learning engineers building respiratory sound classification systems, Lung-SRAD offers an alternative to AST models. If your current models miss localized abnormal patterns because of low-pass filtering, consider combining State Space Models with spectral-aware regularization. The approach scored 64.48% on ICBHI, which suggests that better preserving mid-to-high spatial-frequency detail can improve classification performance. The released code is a starting point for adapting these techniques.

Key insights

Lung-SRAD uses State Space Models and spectral-aware regularization to improve respiratory sound classification by preserving mid-to-high spatial-frequency detail.

Principles

SSMs preserve mid-to-high spatial frequencies in intermediate representations.
Spectral-aware regularization enhances sensitivity to localized abnormal patterns.
Contrastive learning improves representation robustness.
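
The first principle above can be checked empirically on any model's intermediate feature maps. The sketch below is an illustration and is not taken from the Lung-SRAD code: it measures what share of a 2D feature map's spectral energy lies above a chosen spatial-frequency cutoff. The function name high_freq_energy_ratio and the default cutoff of 0.25 cycles per sample are assumptions made for this example.

```python
import numpy as np

def high_freq_energy_ratio(feature_map, cutoff=0.25):
    """Share of spectral energy above `cutoff` (cycles/sample, Nyquist = 0.5).

    Hypothetical helper for illustration; not part of any released code.
    """
    f = np.asarray(feature_map, dtype=np.float64)
    f = f - f.mean()                        # remove the DC component
    power = np.abs(np.fft.fft2(f)) ** 2     # 2D power spectrum
    fy = np.fft.fftfreq(f.shape[0])[:, None]
    fx = np.fft.fftfreq(f.shape[1])[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)     # radial spatial frequency
    total = power.sum()
    if total == 0.0:                        # constant map: no energy at all
        return 0.0
    return float(power[radius > cutoff].sum() / total)
```

A map that has been low-pass filtered yields a ratio near 0, while one that keeps fine detail yields a larger value, so comparing this number across layers of two models gives a rough view of how much detail each one retains.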

Method

Lung-SRAD employs Distilled Audio State Space (DASS) models, applies Gaussian convolution for spectral-aware layer regularization, and integrates Dual-Axis Patch-Mix contrastive learning for robust representations.
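
The summary does not say how the Gaussian convolution enters the training loss, so the following is only one plausible reading, not the authors' formulation. It assumes that a Gaussian blur isolates the low-frequency part of a layer's feature map and that the share of energy surviving the blur is used as a penalty, which would discourage over-smoothed representations. The names gaussian_kernel1d, gaussian_blur2d, and low_pass_penalty, and the default sigma of 1.0, are hypothetical.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=None):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    if radius is None:
        radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def gaussian_blur2d(feature_map, sigma=1.0):
    """Separable Gaussian blur with reflect padding; output keeps input shape."""
    k = gaussian_kernel1d(sigma)
    pad = len(k) // 2
    f = np.pad(np.asarray(feature_map, dtype=np.float64), pad, mode="reflect")
    # Convolve along one axis, then the other (the Gaussian is separable).
    f = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, f)
    f = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, f)
    return f

def low_pass_penalty(feature_map, sigma=1.0, eps=1e-12):
    """Assumed regularizer: share of (mean-removed) energy that survives the blur.

    Close to 1 for smooth maps, close to 0 for maps dominated by fine detail.
    """
    f = np.asarray(feature_map, dtype=np.float64)
    f = f - f.mean()
    low = gaussian_blur2d(f, sigma)
    return float((low ** 2).sum() / ((f ** 2).sum() + eps))
```

In a training loop, a term like this would be computed per regularized layer and added to the classification loss with a small weight; which layers to regularize and how to weight them are design choices the summary leaves open.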

In practice

Use SSMs for audio tasks that depend on fine, localized spectrogram detail.
Apply Gaussian convolution for spectral regularization.
Implement Dual-Axis Patch-Mix contrastive learning.
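
The summary names Dual-Axis Patch-Mix but does not define it. As a rough sketch under stated assumptions: Patch-Mix style augmentation swaps some spectrogram patches of each sample with patches at the same positions from another sample in the batch, and "dual-axis" is assumed here to mean that whole frequency rows and whole time columns of the patch grid are swapped. The function dual_axis_patch_mix, its parameters, and its return values are hypothetical and may differ from the released code.

```python
import numpy as np

def dual_axis_patch_mix(patches, time_ratio=0.25, freq_ratio=0.25, rng=None):
    """Mix patches across a batch along both axes of the patch grid.

    patches: array of shape (batch, n_freq, n_time, dim).
    Returns (mixed, perm, lam, mask), where `perm` gives each sample's mixing
    partner, `mask` marks replaced grid positions, and `lam` is the fraction
    of replaced patches (usable as a soft target weight in a contrastive loss).
    Assumed interpretation for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    b, n_freq, n_time, _ = patches.shape
    perm = rng.permutation(b)                    # partner sample per item
    mask = np.zeros((n_freq, n_time), dtype=bool)
    n_f = int(round(freq_ratio * n_freq))
    n_t = int(round(time_ratio * n_time))
    mask[rng.choice(n_freq, size=n_f, replace=False), :] = True  # frequency rows
    mask[:, rng.choice(n_time, size=n_t, replace=False)] = True  # time columns
    # Take the partner's patch where the mask is set, else keep the original.
    mixed = np.where(mask[None, :, :, None], patches[perm], patches)
    lam = float(mask.mean())
    return mixed, perm, lam, mask
```

With an 8 by 8 patch grid and both ratios at 0.25, two rows and two columns are swapped, so 28 of 64 patches come from the partner sample. A contrastive objective could then treat each mixed sample as a partial positive for both its source and its partner, weighted by lam.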

Topics

Respiratory Sound Classification
State Space Models
Contrastive Learning
Spectral-Aware Regularization
Audio Spectrogram Transformer
ICBHI Benchmark

Code references

RSC-Toolkit/Lung-SRAD

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.