When CLIP Sees More, It Fights Back Harder: Multi-View Guided Adaptive Counterattacks for Test-Time Adversarial Robustness

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

Multi-view guided Adaptive Counterattack (MAC) is a test-time method for improving the adversarial robustness of vision-language models such as CLIP. Existing test-time counterattack (TTC) becomes fragile under strong attacks. MAC addresses this with a corruption-aware soft weighting scheme for multi-view counterattacks. It first constructs augmented views of an input image to obtain diverse embeddings. It then refines the corrupted embeddings with counterattacks whose intensity is scaled per view according to that view's estimated degree of corruption. Finally, it aggregates the adaptively counterattacked views into a single, more robust prediction. Across 20 datasets and multiple attack scenarios, the authors report that MAC substantially improves robustness while keeping inference fast and memory-efficient, which they attribute to its tuning-free design.
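This summary does not give MAC's counterattack objective. For reference, the sketch below assumes a TTC-style counterattack. Starting from the (possibly attacked) input, it takes a few signed-gradient (PGD) steps that push the image embedding away from its own original embedding, while keeping the perturbation inside an L∞ ball of radius `eps`. The linear `matvec` encoder and the hand-derived gradient are hypothetical stand-ins for CLIP's vision encoder and autodiff. The values of `eps`, `alpha`, and `steps` are also illustrative. Only the input is modified and the encoder weights `W` are never updated, which is what lets a counterattack fit a tuning-free defense.

```python
import random
from typing import List, Sequence


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    # Hypothetical toy stand-in for CLIP's image encoder: a fixed linear map.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def counterattack(W, x, eps=0.03, alpha=0.01, steps=5, seed=0):
    """Assumed TTC-style counterattack: PGD ascent on ||f(x + delta) - f(x)||^2
    subject to ||delta||_inf <= eps. Only the input changes; W is never updated."""
    rng = random.Random(seed)
    anchor = matvec(W, x)  # embedding of the (possibly adversarial) input
    # Random start inside the budget: at delta = 0 the gradient of the objective vanishes.
    delta = [rng.uniform(-eps, eps) for _ in x]
    for _ in range(steps):
        z = matvec(W, [xi + di for xi, di in zip(x, delta)])
        diff = [zi - ai for zi, ai in zip(z, anchor)]
        # For a linear encoder the gradient w.r.t. delta is 2 * W^T (f(x + delta) - f(x)).
        grad = [2 * sum(W[i][j] * diff[i] for i in range(len(W))) for j in range(len(x))]
        # Signed ascent step ((g > 0) - (g < 0) is the sign of g as an int),
        # followed by projection back onto the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d + alpha * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    return [xi + di for xi, di in zip(x, delta)]
```

For example, `counterattack([[1.0, 0.5, 0.0], [0.0, 1.0, -0.5]], [0.2, 0.4, 0.6])` returns a perturbed copy of the input whose embedding has drifted away from the original one, with every coordinate within 0.03 of the input. Because this toy objective is convex, each projected signed step can only keep or increase the drift.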

Key takeaway

For machine learning engineers deploying vision-language models such as CLIP in security-sensitive applications, MAC offers a defense against adversarial perturbations that requires no model fine-tuning. Consider integrating multi-view guided adaptive counterattacks where strong attacks are a realistic threat. Because the approach is tuning-free, it preserves high inference speed and memory efficiency, which makes it practical for production environments.

Key insights

MAC improves CLIP's adversarial robustness by adaptively counterattacking multi-view image embeddings with corruption-aware weighting.
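One way to read "corruption-aware weighting" is as a smooth mapping from each view's estimated corruption to the strength of its counterattack. Under this reading, views that look clean are barely perturbed and heavily corrupted views receive close to the full budget. The sketch below is a minimal version under that assumption. It presumes a per-view corruption estimate in [0, 1] is already available, since the summary does not say how MAC computes it. It uses a sigmoid whose `center`, `temperature`, and `eps_max` values are hypothetical.

```python
import math
from typing import List, Sequence


def soft_counterattack_budgets(
    corruption: Sequence[float],
    eps_max: float = 4 / 255,
    center: float = 0.5,
    temperature: float = 0.1,
) -> List[float]:
    """Hypothetical corruption-aware soft weighting.

    Each view's corruption estimate (0 = looks clean, 1 = looks heavily attacked)
    is passed through a sigmoid, and the resulting weight in (0, 1) scales that
    view's counterattack budget. center and temperature are illustrative knobs.
    """
    budgets = []
    for c in corruption:
        c = min(1.0, max(0.0, c))  # keep estimates in [0, 1] (also avoids exp overflow)
        weight = 1.0 / (1.0 + math.exp(-(c - center) / temperature))
        budgets.append(eps_max * weight)
    return budgets
```

For example, `soft_counterattack_budgets([0.1, 0.5, 0.9])` gives the first view a small fraction of `eps_max`, the second exactly half, and the third nearly all of it. A hard on/off threshold would treat each view as all-or-nothing. The sigmoid instead lets partially corrupted views receive partial counterattacks.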

Principles

Method

MAC constructs augmented views of the input image and refines their corrupted embeddings with counterattacks, scaling the intensity per view according to its estimated corruption. It then aggregates the views into a robust prediction.
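The stages described above can be wired together as in the following sketch. Each stage is a pluggable callable, so the toy functions used here can be swapped for a real augmentation pipeline, CLIP's image encoder, a corruption estimator, and a gradient-based counterattack. Several pieces are assumptions for illustration, not MAC's published rules: the function name `mac_style_predict`, the linear mapping from corruption estimate to budget, and mean pooling of unit-norm view embeddings. The final step is the standard CLIP zero-shot recipe: pick the class whose text embedding has the highest cosine similarity to the image embedding.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    # Scale to unit L2 norm; a zero vector is returned unchanged.
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def mac_style_predict(
    image: Vector,
    augmentations: Sequence[Callable[[Vector], Vector]],
    encode: Callable[[Vector], Vector],
    estimate_corruption: Callable[[Vector], float],
    counterattack: Callable[[Vector, float], Vector],
    text_embeddings: Sequence[Vector],
    eps_max: float = 4 / 255,
) -> int:
    """Hypothetical multi-view pipeline:
    augment -> estimate corruption -> scale counterattack -> aggregate -> classify."""
    view_embeddings = []
    for augment in augmentations:
        view = augment(image)
        # Placeholder mapping: budget grows linearly with corruption clamped to [0, 1].
        eps_v = eps_max * min(1.0, max(0.0, estimate_corruption(view)))
        refined = counterattack(view, eps_v)
        view_embeddings.append(normalize(encode(refined)))
    # Aggregate: mean of the unit-norm view embeddings, renormalized.
    dim = len(view_embeddings[0])
    pooled = normalize(
        [sum(e[i] for e in view_embeddings) / len(view_embeddings) for i in range(dim)]
    )
    # Zero-shot prediction: cosine similarity against each class's text embedding.
    scores = [sum(p * t for p, t in zip(pooled, normalize(text))) for text in text_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a quick toy run, identity and rescaling augmentations, a two-number "encoder", and a counterattack that returns its input unchanged are enough to exercise the control flow. Replacing those callables with real components leaves the loop itself untouched.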

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.