Scaling few-shot spoken word classification with generative meta-continual learning
Summary
The Generative Meta-Continual Learning (GeMCL) algorithm shows significant potential for scaling few-shot spoken word classification: it lets a model learn sequentially to distinguish between 1000 classes from only five shots per class. Trained on approximately 477 hours of labelled data from the Multilingual Spoken Words Corpus (MSWC), GeMCL achieved accuracy within 3% of a HuBERT base model whose classifier head (CH) is repeatedly retrained, the more practically viable of the HuBERT baselines. Crucially, GeMCL adapts 2000 times faster than the HuBERT baselines because adding new words requires no retraining, only closed-form updates to class statistics. Its per-word performance is also exceptionally stable, unlike that of the HuBERT baselines, which makes it more predictable to deploy for large-scale, continually evolving keyword spotting.
Key takeaway
For Machine Learning Engineers developing scalable, continually learning spoken word classification systems, GeMCL offers a compelling alternative to finetuning large pre-trained models. You should consider GeMCL for applications requiring rapid adaptation to new classes (2000x faster) and highly stable per-word accuracy, especially when dealing with up to 1000 classes and limited training data. This approach minimizes retraining overhead and enhances deployment predictability.
Key insights
GeMCL scales few-shot spoken word classification to 1000 classes with high stability and rapid adaptation.
Principles
- Meta-continual learning prevents catastrophic forgetting.
- Generative classifiers enable closed-form updates.
- Stability is key for real-world continual learning.
Method
GeMCL pairs an encoder with a generative classifier that models each class's distribution as a Gaussian. The class-specific parameters are given a Normal-Gamma distribution, whose parameters are updated in closed form via Bayes' rule as new examples of a class arrive. The model is optimized through meta-training on N-way-K-shot episodes.
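The closed-form update can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes independent embedding dimensions, a shared prior with placeholder values, and classification by the Student-t posterior predictive, and every name in it (`NormalGamma`, `update`, `GenerativeClassifier`) is hypothetical. The embeddings are assumed to come from an already meta-trained encoder, which is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalGamma:
    """Normal-Gamma belief over one dimension's Gaussian mean and precision."""
    mu: float = 0.0     # prior mean (placeholder value)
    kappa: float = 1.0  # pseudo-count behind the mean
    alpha: float = 1.0  # Gamma shape for the precision
    beta: float = 1.0   # Gamma rate for the precision


def update(belief, xs):
    """Closed-form Bayes update of a Normal-Gamma belief with observations xs."""
    n = len(xs)
    if n == 0:
        return belief
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # scatter around the sample mean
    kappa = belief.kappa + n
    mu = (belief.kappa * belief.mu + n * xbar) / kappa
    alpha = belief.alpha + n / 2
    beta = (belief.beta + 0.5 * ss
            + belief.kappa * n * (xbar - belief.mu) ** 2 / (2 * kappa))
    return NormalGamma(mu, kappa, alpha, beta)


def log_predictive(belief, x):
    """Log density of the Student-t posterior predictive at x."""
    nu = 2 * belief.alpha
    scale2 = belief.beta * (belief.kappa + 1) / (belief.alpha * belief.kappa)
    z = (x - belief.mu) ** 2 / (nu * scale2)
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * scale2)
            - (nu + 1) / 2 * math.log1p(z))


class GenerativeClassifier:
    """Keeps one Normal-Gamma belief per class and per embedding dimension."""

    def __init__(self, dim, prior=NormalGamma()):
        self.dim = dim
        self.prior = prior
        self.classes = {}

    def learn(self, label, embeddings):
        """Add or refine a class; the statistics of other classes are untouched."""
        current = self.classes.get(label, [self.prior] * self.dim)
        self.classes[label] = [
            update(current[d], [e[d] for e in embeddings])
            for d in range(self.dim)
        ]

    def predict(self, embedding):
        """Return the class with the highest summed log predictive density."""
        def score(label):
            beliefs = self.classes[label]
            return sum(log_predictive(b, x) for b, x in zip(beliefs, embedding))
        return max(self.classes, key=score)


# Toy usage with 2-dimensional stand-in embeddings, three shots per class.
clf = GenerativeClassifier(dim=2)
clf.learn("yes", [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1]])
clf.learn("no", [[-1.0, -0.9], [-1.1, -1.0], [-0.9, -1.1]])
print(clf.predict([0.95, 1.05]))
```

Because each class keeps its own statistics, learning a new word only touches that word's entry, which is the property the summary credits for fast adaptation without retraining. Updating with examples one at a time yields the same statistics as updating with them all at once, so the order in which shots arrive does not matter in this sketch.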
In practice
- Deploy GeMCL for dynamic keyword spotting systems.
- Use GeMCL for data labeling in low-resource languages.
- Prioritize GeMCL for edge devices needing rapid adaptation.
Topics
- Few-shot Learning
- Spoken Word Classification
- Continual Learning
- Meta-Continual Learning
- GeMCL Algorithm
- HuBERT Model
- Keyword Spotting
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.