A frequency analysis of filterbank initialisation and noise augmentation for LEAF
Summary
A study investigated the adaptability and interpretability of the LEArnable Frontend (LEAF) in computer audition tasks, focusing on the effects of filterbank initialisation and noise augmentation. The researchers analysed four tasks: speech recognition (SR), speech emotion recognition (SER), acoustic scene classification (ASC), and bird activity detection (BAD). They found that LEAF's performance remains consistently high regardless of filterbank initialisation (Mel-scale, Bark-scale, or linear), provided the initialisation covers the entire frequency spectrum, and that the filterbank parameters adapt only minimally during training. However, a constant initialisation, in which all frequency bands are initialised equally, showed lower performance even though its center frequencies and bandwidths did adapt. Experiments with controlled frequency distributions, including bandpass filtering and low-passed or high-passed noise, confirmed that LEAF filters exhibit a strong bias towards an S-shaped curve, even when information in some frequency ranges is absent or corrupted. The study used an EfficientNet-B0 backend and trained models for 50 epochs on an NVIDIA GeForce RTX 3090.
Key takeaway
Research scientists developing or deploying LEAF-based computer audition systems should recognize that LEAF's filterbank parameters are highly resistant to change, even when task-specific frequency information is limited or corrupted. This suggests that LEAF may not offer the promised interpretability through adaptive filterbanks. Advanced training techniques such as layer-adjusted learning rates or sharpness-aware optimization may be needed to encourage more meaningful filterbank adaptation; alternatively, consider whether traditional, non-learnable filterbanks provide sufficient performance with less complexity.
Key insights
LEAF filterbanks show high inertia, resisting adaptation and converging to a fixed S-curve regardless of input frequency content.
Principles
- Full frequency spectrum coverage is critical for LEAF performance.
- LEAF's filterbank parameters exhibit strong inductive bias.
- Initialisation choice has minimal impact if full spectrum is covered.
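To make the initialisation options concrete, the sketch below computes centre frequencies for Mel-scale, Bark-scale, linear, and constant filterbanks. It is illustrative only and not the study's code. Several details are assumptions: the number of filters (40), the frequency range (60 to 7800 Hz), the common 2595·log10(1 + f/700) Mel formula, the Traunmüller approximation of the Bark scale, and the reading of "constant" as every filter starting at the same centre frequency. The function name `init_center_freqs` is hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common Mel formula (assumed; the study may use a different variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def hz_to_bark(f):
    # Traunmueller approximation of the Bark scale (assumed variant).
    f = np.asarray(f, dtype=float)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    # Algebraic inverse of hz_to_bark.
    z = np.asarray(z, dtype=float)
    return 1960.0 * (z + 0.53) / (26.28 - z)

def init_center_freqs(scale, n_filters=40, f_min=60.0, f_max=7800.0):
    """Return initial centre frequencies in Hz (hypothetical helper)."""
    if scale == "mel":
        # Equally spaced on the Mel axis, then mapped back to Hz.
        points = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters)
        return mel_to_hz(points)
    if scale == "bark":
        # Equally spaced on the Bark axis, then mapped back to Hz.
        points = np.linspace(hz_to_bark(f_min), hz_to_bark(f_max), n_filters)
        return bark_to_hz(points)
    if scale == "linear":
        return np.linspace(f_min, f_max, n_filters)
    if scale == "constant":
        # Assumption: every filter starts at the same centre frequency.
        return np.full(n_filters, 0.5 * (f_min + f_max))
    raise ValueError(f"unknown scale: {scale!r}")

if __name__ == "__main__":
    for name in ("mel", "bark", "linear", "constant"):
        freqs = init_center_freqs(name)
        print(name, np.round(freqs[:3], 1), "...", np.round(freqs[-1], 1))
```

Under these assumptions the Mel, Bark, and linear variants all span the full range from `f_min` to `f_max` and differ only in how densely they sample low frequencies, whereas the constant variant covers a single point of the spectrum.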
Method
The study trained LEAF with an EfficientNet-B0 backend on four computer audition (CA) tasks, using Mel-scale, Bark-scale, linear, and constant filterbank initialisations. It also applied bandpass filtering and frequency-specific noise augmentations to the speech recognition (SR) dataset.
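The controlled-frequency experiments can be pictured with a minimal sketch like the one below. It is not the study's pipeline: the brick-wall FFT mask, the 16 kHz sample rate, the 1 kHz cutoff, and the 10 dB signal-to-noise ratio are all assumptions chosen for illustration, and the helper names `band_limit` and `add_noise_at_snr` are hypothetical.

```python
import numpy as np

def band_limit(x, sample_rate, f_low, f_high):
    """Zero all frequency content outside [f_low, f_high] Hz (brick-wall mask)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    spectrum[(freqs < f_low) | (freqs > f_high)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def add_noise_at_snr(signal, noise, snr_db):
    """Scale the noise to the requested SNR in dB and add it to the signal."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + scale * noise

if __name__ == "__main__":
    sample_rate = 16000                      # assumed sample rate
    rng = np.random.default_rng(0)
    t = np.arange(sample_rate) / sample_rate
    # Stand-in for a speech clip: two tones at 300 Hz and 3 kHz.
    clip = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

    # Bandpass filtering: keep only 1 to 4 kHz of the clip.
    bandpassed = band_limit(clip, sample_rate, 1000.0, 4000.0)

    # Low-passed noise: white noise restricted to below 1 kHz, mixed at 10 dB SNR.
    white = rng.standard_normal(len(clip))
    low_noise = band_limit(white, sample_rate, 0.0, 1000.0)
    noisy = add_noise_at_snr(clip, low_noise, snr_db=10.0)
    print(bandpassed.shape, noisy.shape)
```

High-passed noise follows the same pattern with the band set to, say, 1 kHz up to the Nyquist frequency. A real experiment would more likely use a smoother filter than a brick-wall mask, which is used here only to keep the example short.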
In practice
- Prioritize full frequency coverage in LEAF filterbank initialisation.
- Consider alternative optimization methods for LEAF parameter adaptation.
- Be aware of LEAF's inherent bias towards an S-shaped frequency curve.
Topics
- LEArnable Frontend
- Filterbank Initialisation
- Computer Audition Tasks
- Deep Learning Training Dynamics
- Frequency Analysis
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine learning : nature.com subject feeds.