Lung-SRAD: Spectral-Aware Regularized Audio DASS with Dual-Axis Patch-Mix Contrastive Learning for Respiratory Sound Classification
Summary
Lung-SRAD approaches Respiratory Sound Classification (RSC) with State Space Models (SSMs) as an alternative to CLS-token-driven self-attention architectures such as the Audio Spectrogram Transformer (AST). AST models tend to behave like low-pass filters, which reduces their sensitivity to localized abnormal respiratory patterns. SSMs, by contrast, preserve more of the mid-to-high spatial-frequency components in their intermediate representations. The method adds spectral-aware layer regularization based on Gaussian convolution and proposes Dual-Axis Patch-Mix contrastive learning designed for SSM-based audio models. The combined strategy reaches a score of 64.48% on the ICBHI benchmark, surpassing the AST baseline by 5%. Code is publicly available.
Key takeaway
For machine learning engineers building respiratory sound classification systems, Lung-SRAD offers an alternative to AST models. If your current models miss localized abnormal patterns because of low-pass filtering, consider State Space Models combined with spectral-aware regularization. The reported 64.48% on ICBHI suggests that better preservation of mid-to-high-frequency audio detail can improve classification performance. The publicly available code is a starting point for adapting these techniques.
Key insights
Lung-SRAD combines State Space Models with spectral-aware regularization to improve respiratory sound classification by preserving mid-to-high-frequency detail.
Principles
- SSMs preserve mid-to-high spatial frequencies.
- Spectral-aware regularization enhances sensitivity to localized abnormal patterns.
- Contrastive learning improves representation robustness.
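The first principle is a claim about how much mid-to-high spatial-frequency energy survives in a model's intermediate representations. The sketch below shows one way to quantify that, assuming a single feature channel read along one axis of the patch grid. The function name `high_freq_energy_ratio` and the `cutoff_fraction` parameter are hypothetical and not taken from the paper; a plain DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def high_freq_energy_ratio(signal, cutoff_fraction=0.5):
    """Share of non-DC spectral energy above cutoff_fraction of the Nyquist bin.

    Hypothetical diagnostic, not the paper's analysis: values near 0 mean the
    sequence is dominated by slow variation (low-pass behaviour), values near
    1 mean most of the energy sits in the upper part of the spectrum.
    """
    n = len(signal)
    nyquist = n // 2
    energies = []
    for k in range(1, nyquist + 1):  # skip the DC bin (k = 0)
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        energies.append(abs(coeff) ** 2)
    total = sum(energies)
    if total == 0.0:
        return 0.0
    cutoff = int(cutoff_fraction * nyquist)
    # energies[i] holds bin k = i + 1, so this sums bins strictly above cutoff
    return sum(energies[cutoff:]) / total
```

Comparing this ratio layer by layer for two models is one rough way to check whether one of them attenuates fine-grained structure more than the other.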
Method
Lung-SRAD employs Distilled Audio State Space (DASS) models, applies Gaussian convolution for spectral-aware layer regularization, and integrates Dual-Axis Patch-Mix contrastive learning for more robust representations.
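This summary does not say how the Gaussian convolution enters the training objective, so the sketch below covers only the building block such a regularizer presumably relies on: a separable Gaussian low-pass filter over a 2D patch grid, and the high-frequency residual it leaves behind. The grid layout (rows as frequency bins, columns as time frames), the edge handling, and all function names are assumptions made for illustration.

```python
import math


def gaussian_kernel(radius, sigma):
    """Normalized 1D Gaussian weights of length 2 * radius + 1."""
    weights = [
        math.exp(-(i * i) / (2.0 * sigma * sigma))
        for i in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve with edge replication so the output length matches the input."""
    radius = len(kernel) // 2
    last = len(values) - 1
    return [
        sum(w * values[min(max(i + j - radius, 0), last)]
            for j, w in enumerate(kernel))
        for i in range(len(values))
    ]


def blur_2d(grid, kernel):
    """Separable blur: first along each row (time), then each column (frequency)."""
    rows = [blur_1d(row, kernel) for row in grid]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def high_freq_residual(grid, kernel):
    """Original grid minus its blurred copy, i.e. what the low-pass filter removed."""
    smooth = blur_2d(grid, kernel)
    return [[x - s for x, s in zip(row, srow)] for row, srow in zip(grid, smooth)]
```

A flat grid leaves a residual of roughly zero, while a checkerboard pattern leaves a large one, which makes the residual a simple proxy for the fine-grained detail that a layer-level regularizer could monitor.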
In practice
- Use SSMs for audio tasks that depend on mid-to-high-frequency detail.
- Apply Gaussian convolution for spectral regularization.
- Implement Dual-Axis Patch-Mix contrastive learning.
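The summary names Dual-Axis Patch-Mix without describing the operation. The sketch below follows one possible reading: patches from a second sample replace a contiguous band of the first sample's patch grid, once along the time axis and once along the frequency axis, and the kept fraction `lam` is returned so that a contrastive loss could weight its targets accordingly. The function names, the contiguous-band choice, and the grid layout (rows as frequency, columns as time) are all assumptions, not the authors' implementation.

```python
import random


def axis_patch_mix(grid_a, grid_b, axis, ratio, rng):
    """Replace a contiguous band of grid_a with patches from grid_b.

    axis="time" swaps whole columns, axis="freq" swaps whole rows.
    Returns (mixed_grid, lam), where lam is the fraction of patches
    kept from grid_a. Inputs are not modified.
    """
    if axis not in ("time", "freq"):
        raise ValueError("axis must be 'time' or 'freq'")
    n_rows, n_cols = len(grid_a), len(grid_a[0])
    size = n_cols if axis == "time" else n_rows
    width = round(ratio * size)
    start = rng.randrange(size - width + 1)
    band = range(start, start + width)
    mixed = [
        [
            grid_b[r][c] if (c if axis == "time" else r) in band else grid_a[r][c]
            for c in range(n_cols)
        ]
        for r in range(n_rows)
    ]
    return mixed, 1.0 - width / size


def dual_axis_patch_mix(grid_a, grid_b, ratio, rng):
    """Build one time-mixed and one frequency-mixed view of the same pair."""
    return {
        "time": axis_patch_mix(grid_a, grid_b, "time", ratio, rng),
        "freq": axis_patch_mix(grid_a, grid_b, "freq", ratio, rng),
    }
```

Passing a seeded `random.Random` instance keeps the mixing reproducible across runs.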
Topics
- Respiratory Sound Classification
- State Space Models
- Contrastive Learning
- Spectral-Aware Regularization
- Audio Spectrogram Transformer
- ICBHI Benchmark
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.