Pretrained Neural Audio Models for Asthma Detection from Voice and Speech

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Health & Wellbeing — Artificial Intelligence & Machine Learning, Health & Medical Research, Medical Devices & Health Technology · Depth: Expert, medium

Summary

Researchers investigated the use of pretrained neural audio models for detecting asthma from short mobile-recorded Brazilian Portuguese voice and speech audio. The study utilized transfer learning with convolutional architectures (PANNs) trained on large-scale audio datasets. Two recording types were evaluated: sustained vowel phonation and read speech, with models trained for a binary classification task and assessed at both segment and patient levels. Read speech demonstrated superior performance compared to sustained vowels. The optimal configuration, CNN14 on speech, achieved a 0.85 patient-level balanced accuracy, a ROC-AUC of 0.93, and a PR-AUC of 0.98, performing similarly to CNN10. Fine-tuning pretrained models outperformed training from scratch, highlighting the benefit of pretraining with limited data. The study also noted performance variations across different age groups, indicating demographic sensitivity.

Key takeaway

For NLP engineers developing biomedical diagnostic tools, consider integrating pretrained neural audio models for conditions like asthma. Your team should prioritize using read speech over sustained vowel phonation for data collection, as it yields significantly better classification performance. Leveraging transfer learning with models like CNN14 can achieve high accuracy (0.85 balanced accuracy) even with limited disease-specific audio data, accelerating development and deployment of mobile-based diagnostic applications.

Key insights

Pretrained neural audio models can effectively detect asthma from mobile-recorded speech, outperforming scratch training.

Principles

Transfer learning improves performance with limited data.
Read speech provides more diagnostic cues than sustained vowels.

Method

The method involves fine-tuning pretrained convolutional neural networks (PANNs) on mobile-recorded Brazilian Portuguese audio, specifically read speech, for binary asthma classification, evaluated at patient and segment levels.

In practice

Prioritize read speech over sustained vowels for audio analysis.
Utilize pretrained models for biomedical audio tasks.

Topics

Asthma Detection
Neural Audio Models
Transfer Learning
Speech Biomarkers
PANNs

Best for: NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.