ZEBRA: Zero-Shot Entropy-Regularized Prompt Learning for Base-to-Novel Generalization in Audio-Language Models
Summary
ZEBRA is a plug-and-play framework that targets a critical base-to-novel generalization gap that arises when Audio-Language Models (ALMs) are adapted with prompt learning. Few-shot supervised prompt learning raises accuracy on the base classes seen during adaptation, but it often lowers accuracy on novel classes, sometimes below the zero-shot baseline. ZEBRA addresses this by fusing zero-shot logits with prompt-learning logits and adding a self-entropy regularizer intended to reduce overfitting to base classes. Across multiple audio classification datasets, ZEBRA consistently improves novel-class accuracy while keeping base accuracy strong, narrowing the generalization gap relative to standard prompt learning. The framework's code is publicly available.
Key takeaway
For AI scientists developing Audio-Language Models: if prompt learning is degrading performance on novel classes, ZEBRA offers a remedy. Consider integrating this plug-and-play framework, which fuses zero-shot and prompt-learning logits and adds self-entropy regularization. In the reported experiments, this consistently improves generalization to unseen categories while preserving base-class accuracy.
Key insights
Prompt learning in ALMs creates a base-to-novel generalization gap; ZEBRA mitigates it by fusing zero-shot and prompt-learning logits and applying self-entropy regularization.
Principles
- Prompt learning can degrade novel class performance.
- Self-entropy regularization reduces base class overfitting.
Method
ZEBRA fuses zero-shot and prompt-learning logits, then applies self-entropy regularization to reduce overfitting to base classes, which improves novel-class generalization.
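The method can be sketched as a training objective. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the fusion rule (a convex combination weighted by a hypothetical `alpha`), the regularizer weight `lam`, and the sign of the entropy term are all assumptions. Here the entropy is subtracted from the loss, which rewards less peaked fused predictions (a confidence penalty aimed at base-class overfitting); if ZEBRA instead minimizes entropy, the sign would flip. Names such as `zebra_style_loss` are hypothetical.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse_logits(zs_logits: np.ndarray, pl_logits: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Assumed fusion rule: convex combination of zero-shot and prompt-learning logits."""
    return alpha * zs_logits + (1.0 - alpha) * pl_logits


def self_entropy(logits: np.ndarray) -> float:
    """Mean Shannon entropy (in nats) of the per-example softmax distributions."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean negative log-likelihood of the true base-class labels."""
    p = softmax(logits)
    return float(-np.log(p[np.arange(len(labels)), labels] + 1e-12).mean())


def zebra_style_loss(zs_logits, pl_logits, labels, alpha=0.5, lam=0.1):
    """Hypothetical objective: supervised CE on fused logits minus a weighted entropy bonus."""
    fused = fuse_logits(zs_logits, pl_logits, alpha)
    # Assumed sign: subtracting entropy penalizes over-confident fused predictions.
    return cross_entropy(fused, labels) - lam * self_entropy(fused)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    zs = rng.normal(size=(8, 5))    # frozen zero-shot logits: 8 clips x 5 base classes
    pl = rng.normal(size=(8, 5))    # logits produced with the learned prompts
    y = rng.integers(0, 5, size=8)  # few-shot base-class labels
    print(zebra_style_loss(zs, pl, y, alpha=0.5, lam=0.1))
```

In a real setup the prompt-learning logits would come from learnable prompt vectors and the objective would be minimized with an autodiff framework; NumPy here only evaluates the loss so its terms are easy to inspect.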
In practice
- Apply ZEBRA to improve ALM novel-class generalization.
- Integrate zero-shot and prompt-learning logits.
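Integrating the two logit sets can also be illustrated at inference time, including for novel classes never seen during adaptation. The sketch below assumes a CLAP-style dual encoder whose logits are scaled cosine similarities between audio and class-text embeddings, and a CoOp-style learned prompt context that can be paired with any class name. Embeddings are taken as given (no real model is loaded); `fused_predict`, `alpha`, and the `logit_scale` value of 100 are hypothetical choices, not ZEBRA's documented interface.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def similarity_logits(audio_emb: np.ndarray, text_emb: np.ndarray, logit_scale: float = 100.0) -> np.ndarray:
    """CLAP-style logits: scaled cosine similarity between each clip and each class embedding."""
    return logit_scale * l2_normalize(audio_emb) @ l2_normalize(text_emb).T


def fused_predict(audio_emb, zs_text_emb, pl_text_emb, alpha=0.5, logit_scale=100.0):
    """Hypothetical plug-and-play inference: fuse zero-shot and prompt-learning logits, then argmax."""
    zs_logits = similarity_logits(audio_emb, zs_text_emb, logit_scale)  # hand-crafted prompts
    pl_logits = similarity_logits(audio_emb, pl_text_emb, logit_scale)  # learned prompt context
    fused = alpha * zs_logits + (1.0 - alpha) * pl_logits               # assumed convex fusion
    return fused.argmax(axis=-1), fused


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(size=(4, 512))     # 4 clips, illustrative 512-d embeddings
    zs_text = rng.normal(size=(10, 512))  # e.g. "This is a sound of {class}." for 10 classes
    pl_text = rng.normal(size=(10, 512))  # same classes encoded with the learned context
    preds, _ = fused_predict(audio, zs_text, pl_text)
    print(preds)
```

Because the zero-shot branch is not updated by few-shot training, it plausibly keeps anchoring novel-class predictions, while `alpha` controls how much the adapted branch contributes.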
Topics
- Audio-Language Models
- Prompt Learning
- Zero-Shot Learning
- Generalization Gap
- Entropy Regularization
- Audio Classification
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.