Decoding Insect Song: A Multitask Semisupervised Orthoptera Bioacoustic Classifier

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Science & Research — Artificial Intelligence & Machine Learning, Life Sciences & Biology, Environmental Science & Earth Systems · Depth: Expert, extended

Summary

PULSE is a semi-supervised, multi-task framework for Orthoptera bioacoustic classification aimed at ecological inference. Designed to address the narrow training data and poor transferability of existing passive acoustic monitoring tools, PULSE combines weakly supervised species classification, self-supervised learning on unlabelled field audio, and knowledge distillation from the BirdNET model. It was trained on publicly available labelled Orthoptera data and nearly 150 GB of unlabelled UK field recordings from 10 Oxfordshire sites. PULSE outperformed a leading general bioacoustic model, Perch 2.0, with a macro F1 of 0.21 versus 0.07, an AUC of 0.74 versus 0.45, and an AP of 0.32 versus 0.19. Active learning raised its F1 further to 0.34 and its AUC to 0.84. Beyond classification, PULSE's learned embeddings reveal ecologically meaningful structure and enable the unmixing of overlapping calls, providing information useful for conservation and habitat management.

Key takeaway

Machine learning engineers developing bioacoustic classifiers in data-scarce environments should adopt multi-task, semi-supervised frameworks. This approach, exemplified by PULSE, improves model performance and transferability compared to purely supervised methods. Integrate knowledge distillation from general models and self-supervised learning on unlabelled data to improve adaptability. Also consider active learning to expand labelled datasets efficiently, reducing the manual annotation burden while improving classification accuracy for ecological monitoring.

Key insights

Combining semi-supervised learning, multi-task learning, and knowledge distillation improves bioacoustic classification when labels are limited.

Principles

Unsupervised tasks are crucial when labelled bioacoustic data is scarce.
Multi-task training mitigates data scarcity and domain shift.
Distill general audio representations for downstream tasks.

Method

PULSE uses a VGGish backbone on mel spectrograms and optimizes three objectives: supervised classification with binary cross-entropy, BirdNET embedding matching via an L2 loss, and BYOL for self-supervision. The three losses are optimized jointly, and active learning is used to label field data.
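
The arithmetic of such a three-part objective can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the equal default weights (w_cls, w_distill, w_ssl), and the assumption that the student embedding has already been projected to the same dimension as the BirdNET embedding are all hypothetical, since the summary does not state them. A real training loop would compute the same quantities in an autodiff framework.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over all (clip, species) pairs, computed stably from logits."""
    per_item = np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))
    return float(np.mean(per_item))

def l2_distillation(student_emb, teacher_emb):
    """Mean squared L2 distance between student and teacher (e.g. BirdNET) embeddings."""
    return float(np.mean(np.sum((student_emb - teacher_emb) ** 2, axis=1)))

def byol_loss(online_pred, target_proj, eps=1e-12):
    """BYOL regression loss: 2 - 2 * cosine similarity, averaged over the batch."""
    p = online_pred / (np.linalg.norm(online_pred, axis=1, keepdims=True) + eps)
    z = target_proj / (np.linalg.norm(target_proj, axis=1, keepdims=True) + eps)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=1)))

def joint_loss(logits, targets, student_emb, teacher_emb, online_pred, target_proj,
               w_cls=1.0, w_distill=1.0, w_ssl=1.0):
    """Weighted sum of the three objectives; the weights are illustrative assumptions."""
    return (w_cls * bce_with_logits(logits, targets)
            + w_distill * l2_distillation(student_emb, teacher_emb)
            + w_ssl * byol_loss(online_pred, target_proj))
```

In the BYOL term, target_proj would come from a slowly updated target network applied to a second augmented view of the same clip, with gradients flowing only through the online branch.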

In practice

Utilize VGGish backbones for audio spectrogram processing.
Implement time and frequency masking for spectrogram augmentation.
Apply NNLS on prototype vectors to unmix multi-species audio.
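
The masking and unmixing practices above can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's code: mask_spectrogram and unmix are hypothetical helper names, the mask widths and fill value are placeholders, and the prototype vectors are assumed to be one reference embedding per species (for example a per-species mean), which the summary does not specify. The NNLS step uses scipy.optimize.nnls.

```python
import numpy as np
from scipy.optimize import nnls

def mask_spectrogram(spec, max_freq_width, max_time_width, rng, fill_value=0.0):
    """Return a copy of a (n_mels, n_frames) spectrogram with one frequency band
    and one time band overwritten by fill_value."""
    out = spec.copy()
    n_mels, n_frames = out.shape
    f = rng.integers(0, max_freq_width + 1)   # band height in mel bins (may be 0)
    f0 = rng.integers(0, n_mels - f + 1)      # band start
    out[f0:f0 + f, :] = fill_value
    t = rng.integers(0, max_time_width + 1)   # band width in frames (may be 0)
    t0 = rng.integers(0, n_frames - t + 1)
    out[:, t0:t0 + t] = fill_value
    return out

def unmix(embedding, prototypes):
    """Express an embedding as a non-negative combination of species prototypes.

    prototypes has shape (n_species, dim); returns (weights, residual_norm),
    where weights has shape (n_species,) and every entry is >= 0.
    """
    weights, residual = nnls(prototypes.T, embedding)
    return weights, residual

# Toy usage: a clip embedding built as 0.7 of species A plus 0.3 of species B.
prototypes = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])
mixed = 0.7 * prototypes[0] + 0.3 * prototypes[1]
weights, residual = unmix(mixed, prototypes)
```

A large residual would indicate that the clip is poorly explained by the available prototypes, for instance because it contains a species or noise source outside the prototype set.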

Topics

Multitask Learning
Semisupervised Learning
Orthoptera Classification
Bioacoustic Monitoring
Knowledge Distillation
Active Learning
Ecological Inference

Code references

mbsantiago/whombat

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.