Decoding Insect Song: A Multitask Semisupervised Orthoptera Bioacoustic Classifier
Summary
PULSE is a semi-supervised, multi-task framework for Orthoptera bioacoustic classification and ecological inference. It is designed to address the narrow training data and poor transferability of existing passive acoustic monitoring tools by combining weakly-supervised species classification, self-supervised learning on unlabelled field audio, and knowledge distillation from the BirdNET model. The framework was trained on publicly available labelled Orthoptera data and nearly 150 GB of unlabelled UK field recordings from 10 Oxfordshire sites. PULSE outperformed a leading general bioacoustic model, Perch 2.0, on macro F1 (0.21 versus 0.07), AUC (0.74 versus 0.45), and AP (0.32 versus 0.19). Active learning further raised its F1 to 0.34 and AUC to 0.84. Beyond classification, PULSE's learned embeddings reveal ecologically meaningful structure and enable the unmixing of overlapping calls, which can inform conservation and habitat management.
Key takeaway
Machine Learning Engineers building bioacoustic classifiers in data-scarce settings should consider multi-task, semi-supervised frameworks. As PULSE shows, this approach can improve model performance and transferability over purely supervised training. Combine knowledge distillation from a general model with self-supervised learning on unlabelled data to improve adaptability to new recording conditions. Active learning can then expand the labelled dataset efficiently, reducing manual annotation effort while improving classification accuracy for ecological monitoring.
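The summary does not say which active learning strategy PULSE uses. As a minimal sketch of the general idea, assuming plain uncertainty sampling on multi-label sigmoid outputs, the snippet below ranks unlabelled clips by how close any species probability is to 0.5 and returns the most ambiguous ones for annotation. The function name `select_for_annotation` and the `budget` parameter are hypothetical.

```python
import numpy as np

def select_for_annotation(probs, budget):
    """Pick the `budget` most uncertain clips (hypothetical helper).

    probs: array of shape (n_clips, n_species) holding sigmoid outputs.
    Assumption: uncertainty sampling; the paper's criterion may differ.
    """
    probs = np.asarray(probs, dtype=float)
    # Per-species uncertainty: 1 at p = 0.5, 0 at p = 0 or p = 1.
    uncertainty = 1.0 - 2.0 * np.abs(probs - 0.5)
    # A clip is as uncertain as its most ambiguous species.
    clip_score = uncertainty.max(axis=1)
    # Sort clips from most to least uncertain (stable for ties).
    order = np.argsort(-clip_score, kind="stable")
    return order[:budget]

# Toy example: three clips, two species.
example_probs = np.array([
    [0.99, 0.01],  # confident on both species
    [0.55, 0.10],  # fairly ambiguous on species 0
    [0.90, 0.48],  # very ambiguous on species 1
])
chosen = select_for_annotation(example_probs, budget=2)
```

Clips returned by such a function would be sent to an annotator, and the new labels added to the supervised training set before retraining.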
Key insights
Combining semi-supervised learning, multi-task learning, and knowledge distillation improves bioacoustic classification when labels are limited.
Principles
- Unsupervised tasks are crucial when labelled bioacoustic data is scarce.
- Multi-task training mitigates data scarcity and domain shift.
- Distill general audio representations for downstream tasks.
Method
PULSE uses a VGGish backbone on mel spectrograms and optimizes three objectives: supervised classification with a binary cross-entropy loss, BirdNET embedding matching with an L2 loss, and BYOL for self-supervision. The three losses are optimized jointly, and active learning is used to label additional field data.
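The three objectives can be illustrated with a small NumPy sketch. This is not the authors' implementation: it only computes loss values (no gradients or backbone), the weights `w_cls`, `w_distill`, and `w_byol` are hypothetical since the summary does not state how the terms are balanced, and the BYOL term uses the standard normalized-MSE form (2 minus twice the cosine similarity), which is assumed to apply here.

```python
import numpy as np

def bce_loss(logits, targets):
    """Mean binary cross-entropy on raw logits (numerically stable form)."""
    return np.mean(
        np.maximum(logits, 0.0) - logits * targets
        + np.log1p(np.exp(-np.abs(logits)))
    )

def distill_l2_loss(student_emb, teacher_emb):
    """Mean squared L2 distance between student and teacher embeddings."""
    return np.mean(np.sum((student_emb - teacher_emb) ** 2, axis=1))

def byol_loss(online_pred, target_proj):
    """BYOL-style loss: 2 - 2 * cosine similarity, averaged over the batch.

    In a real training loop the target projection carries no gradient.
    """
    p = online_pred / np.linalg.norm(online_pred, axis=1, keepdims=True)
    z = target_proj / np.linalg.norm(target_proj, axis=1, keepdims=True)
    return np.mean(2.0 - 2.0 * np.sum(p * z, axis=1))

def joint_loss(logits, targets, student_emb, teacher_emb,
               online_pred, target_proj,
               w_cls=1.0, w_distill=1.0, w_byol=1.0):
    """Weighted sum of the three objectives (weights are assumptions)."""
    return (w_cls * bce_loss(logits, targets)
            + w_distill * distill_l2_loss(student_emb, teacher_emb)
            + w_byol * byol_loss(online_pred, target_proj))

# Toy batch: 2 clips, 3 species, 2-dimensional embeddings.
toy_logits = np.zeros((2, 3))
toy_targets = np.ones((2, 3))
toy_emb = np.array([[1.0, 2.0], [3.0, 4.0]])
toy_total = joint_loss(toy_logits, toy_targets,
                       toy_emb, toy_emb, toy_emb, toy_emb)
```

In this toy batch the distillation and BYOL terms vanish because the embeddings match exactly, so the total reduces to the classification term.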
In practice
- Utilize VGGish backbones for audio spectrogram processing.
- Implement time and frequency masking for spectrogram augmentation.
- Apply NNLS on prototype vectors to unmix multi-species audio.
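The last two items can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: `mask_spectrogram` applies one frequency mask and one time mask filled with zeros (the fill value and mask widths are assumptions), and `unmix` treats each species prototype as a column of a mixing matrix and solves for non-negative abundances with `scipy.optimize.nnls`. The function names and the toy prototypes are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Return a copy of `spec` (n_mels, n_frames) with one frequency band
    and one time span set to zero. Zero fill is an assumption."""
    out = spec.copy()
    n_mels, n_frames = out.shape
    # Frequency mask: random width, random start row.
    f = rng.integers(0, min(max_freq_width, n_mels) + 1)
    f0 = rng.integers(0, n_mels - f + 1)
    out[f0:f0 + f, :] = 0.0
    # Time mask: random width, random start column.
    t = rng.integers(0, min(max_time_width, n_frames) + 1)
    t0 = rng.integers(0, n_frames - t + 1)
    out[:, t0:t0 + t] = 0.0
    return out

def unmix(embedding, prototypes):
    """Estimate non-negative species abundances for one embedding.

    prototypes: (n_species, dim), one prototype vector per species.
    Solves min ||prototypes.T @ a - embedding||_2 subject to a >= 0.
    """
    abundances, residual = nnls(prototypes.T, embedding)
    return abundances, residual

rng = np.random.default_rng(0)
spec = rng.random((64, 100))            # stand-in for a mel spectrogram
masked = mask_spectrogram(spec, 8, 20, rng)

# Toy prototypes for three species in a 4-dimensional embedding space.
prototypes = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 1.0]])
mixture = 0.7 * prototypes[0] + 0.3 * prototypes[2]
abundances, residual = unmix(mixture, prototypes)
```

With real embeddings the prototypes would typically be per-species averages of labelled clip embeddings, and the residual gives a rough indication of how well the known species explain a recording.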
Topics
- Multitask Learning
- Semisupervised Learning
- Orthoptera Classification
- Bioacoustic Monitoring
- Knowledge Distillation
- Active Learning
- Ecological Inference
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.