GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection
Summary
GP-Adapter is a training-free framework that augments CLIP with Gaussian Process (GP) uncertainty modeling for few-shot classification and out-of-distribution (OOD) detection. It constructs modality-specific, class-wise one-class GPs on frozen CLIP embeddings, using an RBF kernel for image features and a linear kernel for text prompts. The method requires no fine-tuning of the CLIP backbone; it relies only on a small K-shot cache and lightweight hyperparameter selection, with memory cost scaling as O(CK^2) for C classes and K shots per class. Experiments on ImageNet-1k and multiple OOD benchmarks, using CLIP ViT-B/16 or ResNet-50 backbones, show that GP-Adapter delivers competitive few-shot performance and consistently improves OOD detection, especially when combined with prompt-learning baselines such as CoOp and LoCoOp.
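To give a feel for the O(CK^2) memory figure, here is a small back-of-the-envelope sketch. It assumes the dominant cost is one K x K kernel matrix per class stored in float32; the helper name `gp_cache_bytes` and the choice of 16 shots are illustrative assumptions, not values taken from the paper.

```python
def gp_cache_bytes(num_classes: int, shots: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold one (shots x shots) kernel matrix per class.

    Assumes float32 storage (4 bytes per value) unless stated otherwise.
    """
    return num_classes * shots * shots * bytes_per_value


# ImageNet-1k has 1000 classes; 16 shots is an assumed, illustrative setting.
print(gp_cache_bytes(1000, 16))  # 1024000 bytes, roughly 1 MB
```

Under these assumptions the per-class kernel matrices stay small even for a 1000-class problem, because the cost grows quadratically only in the shot count K, which is small in the few-shot regime.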
Key takeaway
For machine learning engineers building robust vision systems with limited labeled data, GP-Adapter offers a training-free way to improve out-of-distribution detection. You can integrate it with existing CLIP-based prompt-learning solutions to improve reliability and reduce overconfident predictions without costly fine-tuning. Consider applying GP-Adapter to medical imaging or industrial inspection tasks, where detecting OOD samples is critical.
Key insights
Integrating GP uncertainty with frozen CLIP enhances few-shot OOD detection without fine-tuning.
Principles
- GP uncertainty improves OOD detection.
- Modality-specific kernels are crucial.
- Training-free adaptation keeps cost low: no backbone fine-tuning, and memory scales as O(CK^2).
Method
GP-Adapter builds class-wise one-class GPs on frozen CLIP embeddings, with an RBF kernel for image features and a linear kernel for text prompts. It fuses the resulting predictive means and variances, then applies a variance-aware Maximum Softmax Probability (MSP) score for OOD detection.
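The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration, assuming the standard one-class GP formulation (zero-mean GP regression onto a constant target of 1). The summary gives neither the fusion rule nor the exact variance-aware MSP formula, so the fusion weight `alpha`, the `temperature`, the `noise` and `length_scale` values, and the way variance discounts each class score are all hypothetical choices, not the authors' definitions.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """RBF kernel between the rows of a (N, D) and b (M, D)."""
    sq_dist = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale**2))


def linear_kernel(a, b):
    """Linear (dot-product) kernel between the rows of a and b."""
    return a @ b.T


def one_class_gp(train, query, kernel, noise=1e-2):
    """Zero-mean GP regression onto the constant target 1 (one-class GP).

    Returns the predictive mean and variance for each query row.
    """
    gram = kernel(train, train) + noise * np.eye(len(train))  # (K, K)
    cross = kernel(query, train)                              # (Q, K)
    prior_var = np.diag(kernel(query, query))                 # (Q,)
    mean = cross @ np.linalg.solve(gram, np.ones(len(train)))
    var = prior_var - np.sum(cross * np.linalg.solve(gram, cross.T).T, axis=1)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off


def variance_aware_msp(means, variances, temperature=0.1):
    """Assumed form: shrink each class score by its uncertainty, then softmax."""
    logits = means / np.sqrt(1.0 + variances) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1), probs.argmax(axis=1)


def gp_adapter_scores(image_cache, text_cache, queries, alpha=0.5):
    """Score query image embeddings against per-class caches.

    image_cache / text_cache: lists holding one (K, D) / (P, D) array per class.
    alpha is an assumed convex fusion weight between the two modalities.
    """
    means, variances = [], []
    for img_feats, txt_feats in zip(image_cache, text_cache):
        m_img, v_img = one_class_gp(img_feats, queries, rbf_kernel)
        m_txt, v_txt = one_class_gp(txt_feats, queries, linear_kernel)
        means.append(alpha * m_img + (1.0 - alpha) * m_txt)
        variances.append(alpha * v_img + (1.0 - alpha) * v_txt)
    return variance_aware_msp(np.stack(means, 1), np.stack(variances, 1))
```

In this sketch, `gp_adapter_scores` returns an MSP value and a predicted class index per query. A low MSP would be read as evidence that the sample is OOD, with the decision threshold chosen on held-out in-distribution data. The embeddings are assumed to be L2-normalized CLIP features computed elsewhere.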
In practice
- Use an RBF kernel for image embeddings.
- Apply a linear kernel for text prompts.
- Combine with prompt-learning baselines such as CoOp or LoCoOp for further OOD-detection gains.
Topics
- Out-of-Distribution Detection
- Gaussian Processes
- CLIP (Contrastive Language-Image Pre-training)
- Few-Shot Learning
- Uncertainty Quantification
- Vision-Language Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.