GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

GP-Adapter is a training-free framework that augments CLIP with Gaussian Process (GP) uncertainty modeling for few-shot classification and out-of-distribution (OOD) detection. It constructs modality-specific, class-wise one-class GPs on frozen CLIP embeddings, using an RBF kernel for image features and a linear kernel for text prompts. The CLIP backbone is never fine-tuned: the method relies only on a small K-shot cache and lightweight hyperparameter selection, with memory cost scaling as O(CK^2). Experiments on ImageNet-1k and multiple OOD benchmarks, using CLIP ViT-B/16 or ResNet-50 backbones, show that GP-Adapter delivers competitive few-shot performance and consistently improves OOD detection, especially when combined with prompt-learning baselines such as CoOp and LoCoOp.
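To give a feel for the O(CK^2) memory term, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not the paper's accounting: it reads C as the number of classes and K as the shots per class, and assumes one K×K kernel matrix is stored per class in float32. The function name `gp_cache_bytes` and the 16-shot setting are illustrative choices, not taken from the source.

```python
def gp_cache_bytes(num_classes: int, shots: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to hold one shots x shots kernel matrix per class.

    Assumption: a single K x K matrix per class, stored densely in float32.
    The cached CLIP features themselves are not counted here.
    """
    return num_classes * shots * shots * bytes_per_value


# Illustrative setting: 1000 classes (as in ImageNet-1k) with 16 shots each.
print(gp_cache_bytes(1000, 16))  # 1024000 bytes, roughly 1 MB
```

Under these assumptions the quadratic term stays small for typical few-shot budgets, because K is small even when C is large.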

Key takeaway

For machine learning engineers building robust vision systems with limited labeled data, GP-Adapter offers a training-free way to improve out-of-distribution detection. It can be layered on top of existing CLIP-based prompt-learning solutions to improve reliability and reduce overconfident predictions without costly fine-tuning. Consider applying it to medical imaging or industrial inspection tasks, where catching OOD samples is critical.

Key insights

Integrating GP uncertainty with frozen CLIP enhances few-shot OOD detection without fine-tuning.

Principles

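GP predictive uncertainty improves OOD detection on top of frozen CLIP features.
Kernels are chosen per modality: RBF for image features, linear for text prompts.
Training-free adaptation keeps cost low: no backbone fine-tuning, and memory scales as O(CK^2).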

Method

GP-Adapter builds class-wise one-class GPs on frozen CLIP image (RBF kernel) and text (linear kernel) embeddings. It fuses predictive means and variances, then applies a variance-aware Maximum Softmax Probability (MSP) for OOD scoring.
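The pipeline described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The one-class GP here uses the standard GP regression posterior with a zero prior mean and a constant target of 1 for every support sample. The source gives neither the fusion rule nor the exact form of the variance-aware MSP, so `variance_aware_msp` (dividing the mean by `sqrt(1 + variance)` before a tempered softmax) is a hypothetical stand-in, and every function name, `length_scale`, `noise`, and `temperature` is an illustrative choice.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """RBF kernel between rows of a (N, D) and rows of b (M, D)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * length_scale ** 2))


def linear_kernel(a, b):
    """Linear (dot-product) kernel, e.g. for text prompt embeddings."""
    return a @ b.T


def one_class_gp(train, query, kernel, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP fit to the constant target 1.

    train: (K, D) support embeddings of one class; query: (Q, D) test embeddings.
    """
    k_tt = kernel(train, train) + noise * np.eye(len(train))
    k_tq = kernel(train, query)                       # (K, Q)
    k_qq = np.diag(kernel(query, query))              # (Q,)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + noise * I)^{-1} 1, computed through the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, np.ones(len(train))))
    v = np.linalg.solve(chol, k_tq)                   # (K, Q)
    mean = k_tq.T @ alpha
    var = np.maximum(k_qq - (v ** 2).sum(0), 0.0)
    return mean, var


def variance_aware_msp(means, variances, temperature=0.1):
    """Hypothetical scoring: shrink each class mean by its uncertainty, then take MSP.

    means, variances: (Q, C). A low returned value suggests an OOD input.
    """
    logits = means / np.sqrt(1.0 + variances) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


# Toy 2-D "embeddings": two classes with two shots each, one ID and one far query.
class_support = [
    np.array([[1.0, 0.0], [0.99, 0.141]]),
    np.array([[0.0, 1.0], [0.141, 0.99]]),
]
queries = np.array([[1.0, 0.0], [-1.0, 0.0]])

image_kernel = lambda a, b: rbf_kernel(a, b, length_scale=0.5)
per_class = [one_class_gp(s, queries, image_kernel) for s in class_support]
means = np.stack([m for m, _ in per_class], axis=1)       # (Q, C)
variances = np.stack([v for _, v in per_class], axis=1)   # (Q, C)
scores = variance_aware_msp(means, variances)
print(scores)  # the first (in-distribution) query scores higher than the second
```

A text-side GP would follow the same pattern with `linear_kernel` on prompt embeddings. How the two modalities' means and variances are fused is left out here because the source does not specify it.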

In practice

Use an RBF kernel for image embeddings.
Apply a linear kernel to text prompt embeddings.
Combine with prompt-learning baselines such as CoOp or LoCoOp for further OOD gains.

Topics

Out-of-Distribution Detection
Gaussian Processes
CLIP (Contrastive Language-Image Pre-training)
Few-Shot Learning
Uncertainty Quantification
Vision-Language Models

Code references

tms-byte/GP-Adapter

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.