Multi-Label Test-Time Adaptation with Bayesian Conditional Priors

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

Bayesian Conditional Priors (BCP) Estimation is a gradient-free test-time adaptation (TTA) method for multi-label recognition with frozen Vision-Language Models (VLMs). Standard zero-shot inference is brittle under distribution shift: because each label is scored independently, it often produces incoherent label sets and suppresses weaker but compatible labels. BCP injects label dependency without modifying the VLM backbone by treating zero-shot logits as marginal posteriors and correcting for mismatched label priors. For each test image, it selects a high-confidence anchor label and applies a closed-form Bayesian refinement in logit space. The correction can be interpreted as pointwise mutual information: it promotes labels compatible with the anchor and suppresses incompatible ones. The anchor-conditioned priors are estimated online from the unlabeled test stream using lightweight second-order co-occurrence statistics, which adds negligible overhead. BCP outperforms strong TTA baselines, improving average mAP from 57.31 to 69.22 on RN50 and from 62.61 to 71.79 on ViT-B/16.
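
To make the "prior correction in logit space" idea concrete, here is one plausible way to write it down. This is a sketch under assumed notation, not the paper's exact formulation: z_j, a, \pi_j, and \pi_{j \mid a} are symbols introduced here for illustration. It assumes the likelihood ratio stays fixed when the marginal prior is swapped for the anchor-conditioned one, and that label priors are small enough for the odds to be approximated by the probabilities.

```latex
% Illustrative notation: z_j is the zero-shot logit for label j, a is the anchor label,
% \pi_j = p(y_j = 1) and \pi_{j \mid a} = p(y_j = 1 \mid y_a = 1).
\begin{align*}
z_j &= \log\frac{p(y_j = 1 \mid x)}{p(y_j = 0 \mid x)}
     = \log\frac{p(x \mid y_j = 1)}{p(x \mid y_j = 0)} + \log\frac{\pi_j}{1 - \pi_j}, \\
\tilde{z}_j &= z_j + \log\frac{\pi_{j \mid a}}{1 - \pi_{j \mid a}} - \log\frac{\pi_j}{1 - \pi_j}, \\
\tilde{z}_j &\approx z_j + \log\frac{\pi_{j \mid a}}{\pi_j}
     = z_j + \log\frac{p(y_j = 1, y_a = 1)}{p(y_j = 1)\, p(y_a = 1)}
     = z_j + \mathrm{PMI}(j; a) \quad (\pi_j,\ \pi_{j \mid a} \ll 1).
\end{align*}
```

Under this reading, a label that co-occurs with the anchor more often than chance has positive PMI and its logit rises, while a label that rarely appears with the anchor has negative PMI and its logit falls.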

Key takeaway

For machine learning engineers deploying Vision-Language Models for multi-label recognition under distribution shift, Bayesian Conditional Priors (BCP) Estimation is worth considering: it injects label dependency without backbone tuning or target annotations. The reported gains in average mAP (about 11.9 points on RN50 and 9.2 points on ViT-B/16) come with improved label coherence and negligible computational overhead.

Key insights

BCP enhances multi-label VLM recognition under distribution shift by injecting label dependency via anchor-conditioned Bayesian refinement.

Principles

Method

BCP estimates anchor-conditioned priors online from unlabeled test streams using second-order co-occurrence statistics. It then applies a closed-form Bayesian refinement in logit space based on a high-confidence anchor label.
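
The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation, and it rests on several assumptions: the names `OnlineConditionalPrior`, `refine`, `strength`, and `eps` are hypothetical; sigmoid probabilities of the zero-shot logits serve as soft pseudo-labels for the running statistics; the anchor is simply the highest-scoring label; and the correction is the log-ratio of the anchor-conditioned prior to the marginal prior (a PMI-style term).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OnlineConditionalPrior:
    """Hypothetical helper: running first- and second-order label statistics."""

    def __init__(self, num_labels, eps=1e-6):
        self.n = 0                                        # images seen so far
        self.first = np.zeros(num_labels)                 # running sum of p_j
        self.second = np.zeros((num_labels, num_labels))  # running sum of p_i * p_j
        self.eps = eps                                    # smoothing to avoid log(0)

    def update(self, logits):
        # Assumption: soft pseudo-labels from the frozen model's own logits.
        p = sigmoid(np.asarray(logits, dtype=float))
        self.n += 1
        self.first += p
        self.second += np.outer(p, p)

    def pmi_row(self, anchor):
        """Return log p(j | anchor) - log p(j) for every label j."""
        marginal = (self.first + self.eps) / (self.n + self.eps)
        conditional = (self.second[anchor] + self.eps) / (self.first[anchor] + self.eps)
        return np.log(conditional) - np.log(marginal)


def refine(logits, stats, strength=1.0):
    """Shift each logit by the PMI-style term for the top-scoring (anchor) label."""
    z = np.asarray(logits, dtype=float)
    anchor = int(np.argmax(z))
    if stats.n == 0:
        return z.copy(), anchor      # no statistics yet: leave logits untouched
    out = z + strength * stats.pmi_row(anchor)
    out[anchor] = z[anchor]          # the anchor's own score is kept as-is
    return out, anchor


# Toy stream: labels 0 and 1 appear together, label 2 appears alone.
stats = OnlineConditionalPrior(num_labels=3)
for _ in range(50):
    stats.update([4.0, 4.0, -4.0])
    stats.update([-4.0, -4.0, 4.0])

# Label 0 is the anchor; label 1 is pushed up and label 2 is pushed down.
refined, anchor = refine([3.0, 0.0, 0.0], stats)
```

In a real stream one would typically call `update` on each incoming image as well, so the statistics keep tracking the target distribution. Nothing here touches the backbone or computes gradients, which is consistent with the negligible-overhead claim.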

Best for: Research Scientist, Computer Vision Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.