Pushing the Frontier of Black-Box LVLM Attacks via Fine-Grained Detail Targeting
Summary
M-Attack-V2 is a black-box adversarial attack designed to improve transferability against Large Vision-Language Models (LVLMs). It targets a weakness of prior state-of-the-art transfer-based attacks such as M-Attack: high-variance gradients, caused by ViT translation sensitivity and structural asymmetry in image crops. On the source side, Multi-Crop Alignment (MCA) averages gradients from multiple local views to reduce that variance. On the target side, Auxiliary Target Alignment (ATA) replaces aggressive augmentation with a small auxiliary set that yields a smoother target manifold. The method also reinterprets momentum as Patch Momentum, which replays historical crop gradients, and uses a refined patch-size ensemble (PE+). Together these changes raise attack success rates on frontier LVLMs: from 8% to 30% on Claude-4.0, from 83% to 97% on Gemini-2.5-Pro, and from 98% to 100% on GPT-5.
Key takeaway
For research scientists developing or evaluating LVLM security, M-Attack-V2 shows that stabilizing gradients and refining local alignment substantially improves black-box attack transferability. Consider integrating these gradient-denoising and alignment techniques into adversarial testing frameworks to assess model vulnerabilities more rigorously, especially for frontier models such as GPT-5 and Gemini-2.5-Pro.
Key insights
M-Attack-V2 improves black-box LVLM attacks by denoising gradients and refining local alignment for better transferability.
Principles
- High-variance gradients destabilize optimization.
- Averaging gradients from multiple views reduces variance.
- Smoother target manifolds improve attack transferability.
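The variance principle can be checked numerically. The sketch below is a toy illustration, not the paper's experiment: it assumes each view's gradient is the true gradient plus independent Gaussian noise, in which case averaging K views divides the variance by roughly K. The names `noisy_gradient`, `averaged_gradient`, and `estimator_variance` are hypothetical. Crops taken from the same image are correlated, so the reduction in a real attack would likely be smaller.

```python
import random
import statistics


def noisy_gradient(rng, true_grad=1.0, noise_std=2.0):
    """One scalar gradient estimate: the true value plus Gaussian noise (toy assumption)."""
    return true_grad + rng.gauss(0.0, noise_std)


def averaged_gradient(rng, num_views):
    """Average the noisy estimates from `num_views` independent views."""
    return sum(noisy_gradient(rng) for _ in range(num_views)) / num_views


def estimator_variance(num_views, trials=4000, seed=0):
    """Empirical variance of the averaged estimator over many repeated trials."""
    rng = random.Random(seed)
    samples = [averaged_gradient(rng, num_views) for _ in range(trials)]
    return statistics.pvariance(samples)


var_single = estimator_variance(num_views=1)
var_avg = estimator_variance(num_views=8)
print(f"single view: {var_single:.2f}, 8 views averaged: {var_avg:.2f}")
```

With a noise standard deviation of 2, the single-view variance should land near 4 and the 8-view variance near 0.5, within sampling error.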
Method
M-Attack-V2 uses Multi-Crop Alignment (MCA) for source-side gradient averaging, Auxiliary Target Alignment (ATA) for target-side manifold smoothing, and Patch Momentum with a patch-size ensemble (PE+) to strengthen transferable directions.
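A minimal sketch of the multi-crop averaging idea, under heavy simplifying assumptions: the "image" is a 1-D list of floats, and a squared distance between matching crops stands in for the embedding-space alignment loss that a real surrogate encoder would provide. The function names, the sign-step update, and the L-infinity budget `epsilon` are illustrative choices, not details taken from the paper.

```python
import random


def crop_grad(image, target, start, size):
    """Gradient of the toy loss 0.5 * ||crop(image) - crop(target)||^2 w.r.t. the full image.

    Entries outside the crop are zero because the crop does not depend on them.
    """
    grad = [0.0] * len(image)
    for i in range(start, start + size):
        grad[i] = image[i] - target[i]
    return grad


def multi_crop_grad(image, target, num_crops, crop_size, rng):
    """Average the gradients of several randomly positioned crops."""
    total = [0.0] * len(image)
    for _ in range(num_crops):
        start = rng.randrange(0, len(image) - crop_size + 1)
        grad = crop_grad(image, target, start, crop_size)
        total = [t + g for t, g in zip(total, grad)]
    return [t / num_crops for t in total]


def sign(x):
    return (x > 0) - (x < 0)


def attack(source, target, epsilon=0.1, step=0.02, iters=50,
           num_crops=4, crop_size=4, seed=0):
    """Sign-gradient descent on the averaged crop loss, projected into an L-inf ball."""
    rng = random.Random(seed)
    adv = list(source)
    for _ in range(iters):
        grad = multi_crop_grad(adv, target, num_crops, crop_size, rng)
        adv = [a - step * sign(g) for a, g in zip(adv, grad)]
        # Keep every pixel within epsilon of the original source value.
        adv = [min(max(a, s - epsilon), s + epsilon) for a, s in zip(adv, source)]
    return adv


source = [0.0] * 8
target = [1.0] * 8
adv = attack(source, target)
```

Averaging before taking the sign means a single unlucky crop cannot flip the update direction on its own, which is the stabilizing effect the method description attributes to MCA.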
In practice
- Apply MCA to reduce gradient variance.
- Use ATA for smoother target manifold generation.
- Implement Patch Momentum for historical gradient replay.
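The last two items can be sketched in the same toy setting. This summary does not give the paper's exact update rules, so both functions below are assumptions: `ema_momentum` uses a plain exponential moving average as a stand-in for Patch Momentum, and `auxiliary_target_grad` adds a down-weighted pull toward a few hypothetical auxiliary feature vectors as a stand-in for ATA, again with squared distance replacing the real embedding loss.

```python
def ema_momentum(momentum, grad, beta=0.9):
    """Blend a new crop gradient into a running average of past gradients."""
    return [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grad)]


def auxiliary_target_grad(features, target, auxiliaries, aux_weight=0.3):
    """Gradient w.r.t. `features` of
    0.5 * ||f - target||^2 + aux_weight * mean_j 0.5 * ||f - aux_j||^2.
    """
    grad = [f - t for f, t in zip(features, target)]
    for aux in auxiliaries:
        for i, (f, a) in enumerate(zip(features, aux)):
            grad[i] += aux_weight * (f - a) / len(auxiliaries)
    return grad


# Replay a short history of noisy crop gradients through the momentum buffer.
history = [[1.0, -1.0], [-0.5, -1.0], [1.0, -1.0], [1.0, 0.5]]
momentum = [0.0, 0.0]
for grad in history:
    momentum = ema_momentum(momentum, grad, beta=0.5)
print(momentum)  # [0.75, -0.1875]

# Main target plus two auxiliary targets that each agree with it on one coordinate.
ata_grad = auxiliary_target_grad(
    features=[0.0, 0.0],
    target=[1.0, 1.0],
    auxiliaries=[[1.0, 0.0], [0.0, 1.0]],
    aux_weight=0.5,
)
print(ata_grad)  # [-1.25, -1.25]
```

In the momentum example, the final gradient points the second coordinate in the opposite direction from the earlier ones, yet the accumulated value stays negative. That damping of one-off outliers is the behavior the momentum term is meant to provide.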
Topics
- Black-box Attacks
- Large Vision-Language Models
- Adversarial Machine Learning
- Gradient-based Attacks
- Transferability
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.