GP-Adapter: Gaussian Process CLIP-Adapter for Few-Shot Out-of-Distribution Detection

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Expert, quick

Summary

GP-Adapter is a training-free framework that augments CLIP (Contrastive Language-Image Pre-training) with Gaussian Process (GP) uncertainty modeling. Designed for few-shot classification and out-of-distribution (OOD) detection, it addresses a limitation of CLIP: its deterministic similarity scores carry too little uncertainty information in low-data or distribution-shifted scenarios. GP-Adapter builds modality-specific, class-wise one-class GPs on frozen CLIP embeddings, using an RBF kernel for image features and a linear kernel for text prompts. It then fuses the resulting predictive statistics into a variance-aware confidence score for OOD detection. The method requires no fine-tuning of the CLIP backbone; it operates with a small K-shot cache and lightweight hyperparameter selection, at a memory cost of O(CK^2). Experiments on ImageNet and various OOD benchmarks show competitive few-shot performance and improved OOD detection, particularly when the method is integrated with prompt-learning baselines. The framework was published on 2026-06-05.
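
For reference, the textbook GP regression predictive equations indicate where a per-class variance and the O(CK^2) memory figure plausibly come from. This is standard GP notation, not necessarily the paper's exact formulation. The symbols X_c, y_c, sigma_n and ell are assumptions introduced here, as is reading C as the number of classes and K as the number of cached shots per class.

```latex
% Per-class GP on K cached embeddings X_c = \{x_1, \dots, x_K\} (assumed notation)
\begin{aligned}
\mu_c(x_*)      &= k_*^{\top} \left( K_c + \sigma_n^{2} I \right)^{-1} y_c, \\
\sigma_c^{2}(x_*) &= k(x_*, x_*) - k_*^{\top} \left( K_c + \sigma_n^{2} I \right)^{-1} k_*,
\end{aligned}
\qquad
K_c \in \mathbb{R}^{K \times K}, \quad [k_*]_i = k(x_*, x_i).

% Kernels named in the summary
k_{\mathrm{RBF}}(x, x') = \exp\!\left( -\frac{\lVert x - x' \rVert^{2}}{2\ell^{2}} \right),
\qquad
k_{\mathrm{lin}}(x, x') = x^{\top} x'.

% One K x K Gram matrix (or its factor) per class: C \cdot K^{2} entries, i.e. O(CK^{2}) memory
```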

Key takeaway

If you build vision systems for low-data or distribution-shifted environments, consider adding GP-Adapter to your CLIP-based models. The training-free framework supplies the uncertainty information that CLIP's deterministic similarity scores lack, delivering competitive few-shot classification and improved out-of-distribution detection without fine-tuning the CLIP backbone.

Key insights

GP-Adapter augments CLIP with Gaussian Processes for training-free, uncertainty-aware few-shot OOD detection, improving reliability in low-data settings.

Method

Construct modality-specific, class-wise one-class Gaussian Processes on frozen CLIP embeddings, using an RBF kernel for image features and a linear kernel for text prompts. Then fuse the predictive statistics of the two modalities into a variance-aware confidence score for OOD detection, relying only on a small K-shot cache.
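
The steps above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: synthetic unit vectors stand in for frozen CLIP embeddings, each one-class GP regresses onto a target of ones under a zero-mean prior, and the fusion rule (averaging the two modalities, then subtracting `beta` times the standard deviation) is hypothetical. The names `fit_one_class_gp`, `gp_predict`, `fused_scores`, `length_scale`, `noise` and `beta` are introduced here for illustration only.

```python
import numpy as np

def l2_normalize(x):
    # CLIP embeddings are typically compared on the unit sphere.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def rbf_kernel(a, b, length_scale=0.5):
    # Squared Euclidean distances between rows of a and rows of b.
    sq = (a**2).sum(1)[:, None] + (b**2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def linear_kernel(a, b):
    return a @ b.T

def fit_one_class_gp(support, kernel, noise=1e-2):
    # ASSUMPTION: one-class GP as regression onto a target of ones.
    gram = kernel(support, support) + noise * np.eye(len(support))
    chol = np.linalg.cholesky(gram)  # one K x K factor cached per class
    ones = np.ones(len(support))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, ones))
    return support, chol, alpha

def gp_predict(gp, query, kernel):
    support, chol, alpha = gp
    k_star = kernel(query, support)          # shape (N, K)
    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)      # shape (K, N)
    prior = np.diag(kernel(query, query))    # k(x*, x*) for each query
    var = np.maximum(prior - (v**2).sum(0), 0.0)
    return mean, var

def fused_scores(image_gps, text_gps, query, beta=1.0):
    # ASSUMPTION: hypothetical fusion, not the paper's rule.
    scores = []
    for img_gp, txt_gp in zip(image_gps, text_gps):
        m_i, v_i = gp_predict(img_gp, query, rbf_kernel)
        m_t, v_t = gp_predict(txt_gp, query, linear_kernel)
        mean, var = 0.5 * (m_i + m_t), 0.5 * (v_i + v_t)
        scores.append(mean - beta * np.sqrt(var))  # penalize uncertainty
    return np.stack(scores, axis=1)          # shape (N, C)

def demo():
    rng = np.random.default_rng(0)
    C, K, D = 3, 4, 32
    centers = np.eye(D)[:C]                  # synthetic class directions

    def sample(center, n):
        return l2_normalize(center + 0.05 * rng.normal(size=(n, D)))

    image_gps = [fit_one_class_gp(sample(c, K), rbf_kernel) for c in centers]
    text_gps = [fit_one_class_gp(sample(c, K), linear_kernel) for c in centers]
    # First query lies near class 0; second lies along an unseen direction.
    queries = np.vstack([sample(centers[0], 1), sample(np.eye(D)[-1], 1)])
    scores = fused_scores(image_gps, text_gps, queries)
    return scores.argmax(1), scores.max(1)   # predicted class, confidence

if __name__ == "__main__":
    labels, confidence = demo()
    print(labels, confidence)
```

In this sketch the maximum fused score over classes serves as the confidence: a query far from every class cache has a predictive mean near zero and a variance near the prior, so its score drops and it can be flagged as OOD by thresholding. Caching one K x K Cholesky factor per class and modality is consistent with the O(CK^2) memory cost cited in the summary.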

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.