Calibrating Uncertainty for Zero-Shot Adversarial CLIP

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

A new method, Uncertainty-Calibrated Adversarial fine-Tuning (UCAT), targets a twofold weakness of CLIP under adversarial attack: perturbations degrade accuracy and, by suppressing predictive uncertainty, also make the model unreliably over-confident. UCAT reformulates CLIP's logits as the concentration parameters of a Dirichlet distribution, a single representation that captures both the relative semantic structure across classes and the magnitude of predictive confidence. On top of this representation, it defines an adversarial fine-tuning objective that holistically aligns the Dirichlet distributions of clean and perturbed samples. In experiments on 16 single-label benchmarks and the multi-label MS-COCO dataset, UCAT consistently restores calibrated uncertainty, achieves competitive adversarial robustness, and maintains high clean accuracy, often ranking best or second-best, including under strong attacks such as AutoAttack with ε=2/255. Its performance is also stable across regularization strengths, and it generalizes to different CLIP backbones.

Key takeaway

For AI engineers developing robust vision-language models, you should consider integrating uncertainty calibration into your adversarial fine-tuning pipelines. Conventional adversarial fine-tuning methods often overlook the miscalibration that attacks cause, leaving the model with spuriously confident predictions. By reparameterizing CLIP logits as Dirichlet distributions and aligning those distributions between clean and adversarial samples, you can improve both adversarial robustness and the reliability of uncertainty estimates, especially in zero-shot settings, which should make model behavior more trustworthy under attack.

Key insights

Adversarial attacks on CLIP suppress uncertainty, leading to over-confident misclassifications; calibrating this is crucial for reliability.

Principles

Predictive uncertainty should increase with input difficulty or distributional shift.
Dirichlet distributions can model both inter-class relations and evidence strength.
Aligning clean and adversarial Dirichlet distributions improves robustness.
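The second principle follows from standard Dirichlet identities, shown below. These are textbook properties of the distribution, not equations quoted from the paper. The mean vector carries the relative class structure, while the total concentration α_0 controls how tightly probability mass clusters around that mean.

```latex
p \sim \mathrm{Dir}(\alpha_1,\dots,\alpha_K), \qquad \alpha_0 = \sum_{k=1}^{K} \alpha_k

\mathbb{E}[p_k] = \frac{\alpha_k}{\alpha_0}, \qquad
\mathrm{Var}[p_k] = \frac{\alpha_k(\alpha_0 - \alpha_k)}{\alpha_0^{2}(\alpha_0 + 1)}
```

Scaling every α_k by a constant c > 1 leaves E[p_k] unchanged but divides the variance's denominator term by a larger (cα_0 + 1). The result is the same semantic ranking held with more confidence, which is exactly the "evidence strength" a plain softmax cannot express.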

Method

UCAT reparameterizes CLIP logits as Dirichlet concentration parameters. It then trains with a joint objective that combines a text-guided cross-entropy loss with a KL-divergence term aligning the clean and adversarial Dirichlet distributions.
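The following is a minimal plain-Python sketch of the two ingredients of such an objective, not the authors' implementation. It assumes a hypothetical logit-to-Dirichlet mapping α_k = exp(s_k / τ'), where s_k is the cosine similarity between the image embedding and the k-th class-prompt text embedding. Under that assumption, the Dirichlet mean equals the usual softmax over s_k / τ', and the total concentration α_0 carries the confidence magnitude. The exact mapping, the KL direction (clean as reference), the weight `lam`, and all helper names are illustrative assumptions. A real training loop would compute the same quantities in an autodiff framework and backpropagate through the adversarial branch.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0 (recurrence plus asymptotic series)."""
    result = 0.0
    # Shift x upward using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) + 1/(240x^8) - 1/(132x^10)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))))
    return result + math.log(x) - 0.5 / x - series


def logits_to_dirichlet(sims: list[float], tau: float) -> list[float]:
    """Hypothetical mapping from CLIP similarities to concentrations: alpha_k = exp(s_k / tau)."""
    return [math.exp(s / tau) for s in sims]


def dirichlet_kl(alpha: list[float], beta: list[float]) -> float:
    """Closed-form KL( Dir(alpha) || Dir(beta) )."""
    a0, b0 = sum(alpha), sum(beta)
    kl = math.lgamma(a0) - math.lgamma(b0)
    kl += sum(math.lgamma(b) - math.lgamma(a) for a, b in zip(alpha, beta))
    psi_a0 = digamma(a0)
    kl += sum((a - b) * (digamma(a) - psi_a0) for a, b in zip(alpha, beta))
    return kl


def text_guided_ce(sims: list[float], label: int, tau: float) -> float:
    """Softmax cross-entropy over image-text similarities scaled by 1/tau."""
    z = [s / tau for s in sims]
    m = max(z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in z))  # stable log-sum-exp
    return log_norm - z[label]


def ucat_style_loss(clean_sims, adv_sims, label, tau=0.1, lam=1.0):
    """Hypothetical joint objective: CE on the adversarial view + lam * KL(clean || adv)."""
    ce = text_guided_ce(adv_sims, label, tau)
    kl = dirichlet_kl(logits_to_dirichlet(clean_sims, tau), logits_to_dirichlet(adv_sims, tau))
    return ce + lam * kl


clean = [0.31, 0.22, 0.18]  # made-up similarities for 3 class prompts
adv = [0.24, 0.27, 0.20]    # made-up similarities after a perturbation
print(ucat_style_loss(clean, adv, label=0))
```

Because the KL term compares full Dirichlets, it penalizes an attack that preserves the top class but shrinks α_0, a confidence shift that comparing only predicted labels would miss.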

In practice

Use a Dirichlet parameterization of CLIP logits to estimate uncertainty.
Apply a KL divergence to align the clean and adversarial distributions.
Tune the calibration coefficient τ' to control confidence sharpness.
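To see how a coefficient like τ' affects sharpness, the sketch below sweeps a few values. It uses the same hypothetical mapping α_k = exp(s_k / τ') as above and made-up similarity scores. For each value it reports the largest expected class probability E[p_k] = α_k / α_0 and the total concentration α_0. The τ' values and the helper `dirichlet_summary` are illustrative assumptions, not the paper's settings.

```python
import math


def dirichlet_summary(sims: list[float], tau: float) -> tuple[list[float], float]:
    """Mean probabilities and total concentration alpha_0 under the
    hypothetical mapping alpha_k = exp(s_k / tau)."""
    alpha = [math.exp(s / tau) for s in sims]
    alpha0 = sum(alpha)
    return [a / alpha0 for a in alpha], alpha0


sims = [0.31, 0.22, 0.18]  # made-up image-text cosine similarities
for tau in (0.2, 0.1, 0.05):
    mean, alpha0 = dirichlet_summary(sims, tau)
    print(f"tau'={tau}: max E[p]={max(mean):.3f}, alpha_0={alpha0:.1f}")
```

With all similarities positive, lowering τ' both sharpens the mean and raises α_0, so the model appears more confident. Raising it flattens both. Under this mapping, τ' therefore sets the overall confidence scale that the alignment term then has to keep consistent between clean and perturbed inputs.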

Topics

CLIP
Adversarial Robustness
Uncertainty Calibration
Dirichlet Distribution
Zero-Shot Learning
Vision-Language Models

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.