Calibrating Uncertainty for Zero-Shot Adversarial CLIP
Summary
Uncertainty-Calibrated Adversarial fine-Tuning (UCAT) targets CLIP's vulnerability to adversarial attacks, which both degrade accuracy and suppress predictive uncertainty, leaving the model over-confident in wrong predictions. UCAT reinterprets CLIP's logits as the concentration parameters of a Dirichlet distribution, a single representation that captures both relative semantic structure and the magnitude of predictive confidence. On top of this representation, it defines an adversarial fine-tuning objective that aligns the Dirichlet distributions of clean and perturbed samples as a whole. In experiments on 16 single-label benchmarks and the multi-label MS-COCO dataset, UCAT consistently restores calibrated uncertainty, achieves competitive adversarial robustness, and maintains high clean accuracy, often ranking best or second-best, even against strong attacks such as AutoAttack with ε=2/255. Performance is also stable across regularization strengths, and the approach generalizes to different CLIP backbones.
Key takeaway
If you build robust vision-language models, consider integrating uncertainty calibration into your adversarial fine-tuning pipeline. Existing adversarial fine-tuning methods often overlook the miscalibration that attacks cause, which leads to spuriously confident predictions. Reparameterizing CLIP logits as Dirichlet concentration parameters and aligning the resulting distributions between clean and adversarial samples can improve both adversarial robustness and the reliability of uncertainty estimates, especially in zero-shot settings, making model behavior under attack more trustworthy.
Key insights
Adversarial attacks on CLIP suppress predictive uncertainty, producing over-confident misclassifications; restoring calibrated uncertainty is crucial for reliability.
Principles
- Predictive uncertainty should increase with input difficulty or distributional shift.
- Dirichlet distributions can model both inter-class relations and evidence strength.
- Aligning clean and adversarial Dirichlet distributions improves robustness.
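To make the second principle concrete, here is a minimal sketch of how one Dirichlet concentration vector carries two separate signals. The helper name `dirichlet_summary` is hypothetical, and the uncertainty measure K / α₀ (number of classes over total concentration) is an assumption borrowed from the evidential-learning literature; the summary above does not say which measure UCAT uses.

```python
def dirichlet_summary(alpha):
    """Return (expected class probabilities, vacuity-style uncertainty).

    alpha: positive Dirichlet concentration parameters, one per class.
    """
    if any(a <= 0 for a in alpha):
        raise ValueError("concentration parameters must be positive")
    total = sum(alpha)                    # alpha_0: overall evidence strength
    mean = [a / total for a in alpha]     # relative structure between classes
    uncertainty = len(alpha) / total      # ASSUMED measure: K / alpha_0
    return mean, uncertainty


# Two vectors with identical class proportions but different evidence strength.
strong = [60.0, 30.0, 10.0]
weak = [6.0, 3.0, 1.0]

mean_strong, u_strong = dirichlet_summary(strong)
mean_weak, u_weak = dirichlet_summary(weak)

print(mean_strong, u_strong)
print(mean_weak, u_weak)
```

Both vectors give the same expected class probabilities, but the weaker one has ten times the uncertainty under this measure. A softmax output alone cannot express that difference.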
Method
UCAT reparameterizes CLIP logits as Dirichlet concentration parameters. It then uses a joint objective combining text-guided cross-entropy loss with KL divergence to align clean and adversarial Dirichlet distributions.
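The joint objective described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the mapping α = exp(logit), the KL direction (clean relative to adversarial), the weight `lam`, and all function names are hypothetical. Only the closed-form KL divergence between two Dirichlet distributions is standard. Real training would use an autodiff framework instead of floats.

```python
import math


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    if x <= 0:
        raise ValueError("digamma sketch only supports x > 0")
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def dirichlet_kl(alpha, beta):
    """Closed-form KL( Dir(alpha) || Dir(beta) )."""
    a0, b0 = sum(alpha), sum(beta)
    kl = math.lgamma(a0) - math.lgamma(b0)
    for a, b in zip(alpha, beta):
        kl += math.lgamma(b) - math.lgamma(a)
        kl += (a - b) * (digamma(a) - digamma(a0))
    return kl


def to_concentration(logits):
    """ASSUMED mapping: alpha_k = exp(logit_k), so alpha / sum(alpha) is the softmax."""
    return [math.exp(z) for z in logits]


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_norm - logits[label]


def joint_loss(clean_logits, adv_logits, label, lam=1.0):
    """Cross-entropy on the adversarial logits plus a weighted Dirichlet KL term."""
    ce = cross_entropy(adv_logits, label)
    kl = dirichlet_kl(to_concentration(clean_logits), to_concentration(adv_logits))
    return ce + lam * kl


clean_logits = [4.0, 1.0, 0.5]   # illustrative image-text logits, true class 0
adv_logits = [1.5, 2.5, 0.5]     # the attack shifts mass to class 1
print(joint_loss(clean_logits, adv_logits, label=0))
```

Under the assumed exponential mapping, the Dirichlet mean equals the usual softmax, while the sum of the concentrations keeps the logit magnitude that softmax normalizes away. The KL term therefore penalizes changes in both the class proportions and the overall confidence.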
In practice
- Use Dirichlet parameterization for CLIP logits to estimate uncertainty.
- Apply KL divergence to align clean and adversarial distributions.
- Tune calibration coefficient τ' for confidence sharpness.
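As a rough illustration of the last point, the sketch below sweeps a temperature-like coefficient and reports the resulting uncertainty. The summary does not say how τ' enters UCAT, so the mapping α = exp(similarity / τ') is purely an assumption, as are the function names, the example similarities, and the K / α₀ uncertainty measure.

```python
import math


def concentration(similarities, tau_prime):
    """ASSUMED mapping from image-text similarities to Dirichlet concentrations."""
    if tau_prime <= 0:
        raise ValueError("tau_prime must be positive")
    return [math.exp(s / tau_prime) for s in similarities]


def vacuity(alpha):
    """ASSUMED uncertainty measure: number of classes over total concentration."""
    return len(alpha) / sum(alpha)


# Hypothetical cosine similarities between one image and three class prompts.
similarities = [0.30, 0.22, 0.18]

sweep = {tau: vacuity(concentration(similarities, tau)) for tau in (0.05, 0.1, 0.2)}
for tau, u in sweep.items():
    print(f"tau'={tau}: uncertainty={u:.4f}")
```

For these positive similarities, a smaller coefficient produces larger concentrations and therefore sharper, more confident predictions. A larger coefficient flattens the distribution and raises the uncertainty. A sweep like this is a cheap way to see the effect before committing to a full fine-tuning run.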
Topics
- CLIP
- Adversarial Robustness
- Uncertainty Calibration
- Dirichlet Distribution
- Zero-Shot Learning
- Vision-Language Models
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.