Pretrained Neural Audio Models for Asthma Detection from Voice and Speech
Summary
Researchers investigated the use of pretrained neural audio models for detecting asthma from short mobile-recorded Brazilian Portuguese voice and speech audio. The study utilized transfer learning with convolutional architectures (PANNs) trained on large-scale audio datasets. Two recording types were evaluated: sustained vowel phonation and read speech, with models trained for a binary classification task and assessed at both segment and patient levels. Read speech demonstrated superior performance compared to sustained vowels. The optimal configuration, CNN14 on speech, achieved a 0.85 patient-level balanced accuracy, a ROC-AUC of 0.93, and a PR-AUC of 0.98, performing similarly to CNN10. Fine-tuning pretrained models outperformed training from scratch, highlighting the benefit of pretraining with limited data. The study also noted performance variations across different age groups, indicating demographic sensitivity.
Key takeaway
For NLP engineers developing biomedical diagnostic tools, consider integrating pretrained neural audio models for conditions like asthma. Your team should prioritize using read speech over sustained vowel phonation for data collection, as it yields significantly better classification performance. Leveraging transfer learning with models like CNN14 can achieve high accuracy (0.85 balanced accuracy) even with limited disease-specific audio data, accelerating development and deployment of mobile-based diagnostic applications.
Key insights
Pretrained neural audio models can effectively detect asthma from mobile-recorded speech, outperforming scratch training.
Principles
- Transfer learning improves performance with limited data.
- Read speech provides more diagnostic cues than sustained vowels.
Method
The method involves fine-tuning pretrained convolutional neural networks (PANNs) on mobile-recorded Brazilian Portuguese audio, specifically read speech, for binary asthma classification, evaluated at patient and segment levels.
In practice
- Prioritize read speech over sustained vowels for audio analysis.
- Utilize pretrained models for biomedical audio tasks.
Topics
- Asthma Detection
- Neural Audio Models
- Transfer Learning
- Speech Biomarkers
- PANNs
Best for: NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.