SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

Stability and Suitability-guided Test-time Prompt Tuning (SS-TPT) targets the fragility of vision-language models (VLMs) such as CLIP under adversarial perturbations. Existing test-time adaptation defenses improve robustness but incur significant slowdowns. SS-TPT improves both robustness and throughput by scoring the quality of each augmented view with two complementary measures: stability, which measures how invariant the prediction is to weak augmentations, and suitability, which assesses feature-space density among the views. These SS scores guide adaptation through an SS-guided consistency loss and inference through an SS-weighted prediction, amplifying trustworthy views while suppressing corrupted ones. In the reported experiments, SS-TPT significantly outperforms prior state-of-the-art methods, achieving better robustness-throughput trade-offs across diverse datasets and varying numbers of views, which the authors present as evidence of its practicality and generality.
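To make the two scores concrete, here is a minimal sketch in plain Python. It is one plausible reading of the summary, not the paper's formulation: stability is approximated as the fraction of weak augmentations that keep a view's top-1 class, and suitability as the mean cosine similarity between a view's feature and the features of the other views (a simple density proxy). The function names, the input layout, and both scoring formulas are assumptions made for illustration.

```python
import math


def argmax(xs):
    """Index of the largest value (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def stability_scores(view_probs, weak_probs):
    """Assumed stability proxy: top-1 agreement under weak augmentation.

    view_probs[i] -- class probabilities for augmented view i
    weak_probs[i] -- list of class-probability vectors, one per weak
                     augmentation of view i
    """
    scores = []
    for probs, weak in zip(view_probs, weak_probs):
        top = argmax(probs)
        agree = sum(1 for q in weak if argmax(q) == top)
        scores.append(agree / len(weak))
    return scores


def suitability_scores(features):
    """Assumed suitability proxy: mean cosine similarity to the other views.

    A view sitting in a dense region of feature space scores high;
    an outlier (for example a badly corrupted view) scores low.
    """
    n = len(features)
    scores = []
    for i in range(n):
        sims = [cosine(features[i], features[j]) for j in range(n) if j != i]
        scores.append(sum(sims) / len(sims))
    return scores


# Toy example with three views and three classes (made-up numbers).
view_probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
weak_probs = [
    [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]],  # both weak copies agree with view 0
    [[0.5, 0.4, 0.1], [0.3, 0.6, 0.1]],  # one of two agrees with view 1
    [[0.4, 0.3, 0.3], [0.2, 0.5, 0.3]],  # neither agrees with view 2
]
features = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]  # view 2 is an outlier

stability = stability_scores(view_probs, weak_probs)  # [1.0, 0.5, 0.0]
suitability = suitability_scores(features)
```

In this toy input the third view is both unstable and isolated in feature space, so it receives the lowest score on both measures. A real implementation would likely operate on batched CLIP logits and embeddings rather than Python lists.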

Key takeaway

For machine learning engineers deploying vision-language models in security-sensitive applications, SS-TPT offers a practical way to improve adversarial robustness with less slowdown than prior test-time adaptation defenses. Consider integrating SS-TPT to harden models against perturbations: its stability and suitability scores weight augmented views so that trustworthy views dominate and corrupted ones are suppressed. The reported result is a better robustness-throughput trade-off, which makes VLM deployments more reliable and efficient under attack.

Key insights

SS-TPT enhances VLM adversarial robustness by dynamically weighting augmented views based on prediction stability and feature suitability.

Principles

Evaluate augmented view quality.
Prioritize stable and dense views.
Balance robustness and throughput.

Method

SS-TPT uses stability and suitability scores to guide adaptation via an SS-guided consistency loss and inference through an SS-weighted prediction, amplifying trustworthy views.
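The weighting idea can be sketched as follows, again in plain Python and under stated assumptions. The summary does not say how the two scores are combined or what form the consistency loss takes, so this example multiplies the scores, normalizes them with a temperature softmax, and uses a weighted KL divergence from the SS-weighted prediction to each view's prediction. The names `ss_weights`, `ss_weighted_prediction`, and `ss_consistency_loss`, the product rule, and the temperature value are all hypothetical.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def ss_weights(stability, suitability, temperature=0.1):
    """Assumed combination rule: product of the two scores, then softmax."""
    combined = [s * d for s, d in zip(stability, suitability)]
    return softmax(combined, temperature)


def ss_weighted_prediction(view_probs, weights):
    """Weighted average of per-view class probabilities."""
    num_classes = len(view_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, view_probs))
        for c in range(num_classes)
    ]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def ss_consistency_loss(view_probs, weights):
    """Assumed loss: weighted KL from the SS-weighted prediction to each view.

    Views with high weight are pulled toward agreement; views with low
    weight contribute little to the objective.
    """
    target = ss_weighted_prediction(view_probs, weights)
    return sum(w * kl_divergence(target, probs)
               for w, probs in zip(weights, view_probs))


# Toy example: the third view is unstable and isolated, so it gets little weight.
view_probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
stability = [1.0, 0.5, 0.0]
suitability = [0.9, 0.8, 0.1]

weights = ss_weights(stability, suitability)
prediction = ss_weighted_prediction(view_probs, weights)
loss = ss_consistency_loss(view_probs, weights)
```

In an actual prompt-tuning loop the loss would be computed on differentiable tensors and backpropagated into the learnable prompt; the list-based version here only illustrates how the weights enter both the loss and the final prediction.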

In practice

Improve VLM robustness under attack.
Optimize robustness-throughput trade-offs.
Apply to diverse datasets.

Topics

Vision-Language Models
Adversarial Robustness
Test-Time Adaptation
Prompt Tuning
CLIP
Model Efficiency

Code references

sunoh-kim/SS-TPT

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.