VPA-Guard: Defending and Benchmarking Image-to-Video Generation Against Visual Prompt Attacks

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

VPA-Guard is a retrieval-augmented, self-evolving defense framework that protects Image-to-Video (I2V) generation models from visual prompt attacks. These attacks embed visual cues such as arrows or emojis in the input image; the model interprets them as temporal instructions and generates harmful video. To fill the gap in safety evaluation for this threat, the authors introduce VVA-Bench, which they describe as the first systematic benchmark for vision-centric prompt attacks. Experiments on VVA-Bench show that leading I2V models are highly vulnerable: the Attack Success Rate (ASR) reaches 100.0% on Wan 2.7 and 74.8% on Veo 3.1. VPA-Guard reduces ASR by 44.2% and the harmfulness score by 73.4% on average while preserving the models' utility for legitimate user edits. The work thus contributes both an evaluation benchmark and a defense for safer multimodal generation.
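As a reading aid for the ASR figures, here is a minimal Python sketch of how an attack success rate and its reduction could be computed. It is an illustration only: the function names and the sample outcomes are hypothetical, and this summary does not state whether the reported reductions are absolute (percentage points) or relative, so the sketch shows both.

```python
def attack_success_rate(outcomes):
    """Return the percentage of attack attempts judged successful.

    `outcomes` is a list of booleans, one per attack attempt
    (True = the model produced a harmful video). Hypothetical helper.
    """
    if not outcomes:
        raise ValueError("need at least one attack attempt")
    return 100.0 * sum(1 for ok in outcomes if ok) / len(outcomes)


def absolute_reduction(before, after):
    """Reduction in percentage points."""
    return before - after


def relative_reduction(before, after):
    """Reduction as a percentage of the undefended rate."""
    return 100.0 * (before - after) / before


# Made-up outcomes for illustration, not data from the paper.
undefended = [True] * 6 + [False] * 2   # 6 of 8 attacks succeed
defended = [True] * 3 + [False] * 5     # 3 of 8 attacks succeed

asr_before = attack_success_rate(undefended)   # 75.0
asr_after = attack_success_rate(defended)      # 37.5
print(absolute_reduction(asr_before, asr_after))  # 37.5 (points)
print(relative_reduction(asr_before, asr_after))  # 50.0 (percent)
```

The two reductions differ substantially on the same data, so it is worth checking the original paper for which convention its figures use.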

Key takeaway

AI Security Engineers developing or deploying Image-to-Video (I2V) generation systems should prioritize defenses against visual prompt attacks. The reported results indicate that leading models, including Wan 2.7 and Veo 3.1, are highly vulnerable to implicit visual cues that lead to harmful outputs. A defense framework such as VPA-Guard, which reduced the attack success rate by 44.2% and the harmfulness score by 73.4% on average in the authors' experiments, is one way to mitigate this risk while keeping legitimate edits usable.

Key insights

Visual prompt attacks pose a severe threat to Image-to-Video models, necessitating specialized benchmarks like VVA-Bench and defenses like VPA-Guard.

Method

VPA-Guard is a retrieval-augmented, self-evolving defense framework that employs few-shot reasoning to identify latent malicious intents within visual prompts, thereby mitigating harmful video generation.
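The general pattern described here (retrieve similar past cases, use them as few-shot examples, and grow the memory over time) can be sketched as follows. This is not the authors' implementation: the `Exemplar` and `ExemplarMemory` classes, the toy two-dimensional embeddings, the prompt wording, and the example descriptions are all assumptions made for illustration. In a real system the embeddings would come from an image encoder and the prompt would be sent to a reasoning model.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Exemplar:
    embedding: List[float]  # toy stand-in for an image embedding
    description: str        # short description of the visual cue
    label: str              # "malicious" or "benign"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class ExemplarMemory:
    items: List[Exemplar] = field(default_factory=list)

    def retrieve(self, query, k=2):
        """Return the k stored exemplars most similar to the query."""
        ranked = sorted(self.items,
                        key=lambda e: cosine(query, e.embedding),
                        reverse=True)
        return ranked[:k]

    def add(self, exemplar):
        """Self-evolving step: store a newly judged case for future retrieval."""
        self.items.append(exemplar)


def build_few_shot_prompt(shots, query_description):
    """Assemble a few-shot prompt from retrieved exemplars (hypothetical wording)."""
    lines = ["Decide whether the visual cues encode a harmful temporal instruction."]
    for shot in shots:
        lines.append(f"Example: {shot.description} -> {shot.label}")
    lines.append(f"Query: {query_description} -> ?")
    return "\n".join(lines)


memory = ExemplarMemory()
memory.add(Exemplar([1.0, 0.0], "arrow implying a harmful action toward a person", "malicious"))
memory.add(Exemplar([0.0, 1.0], "arrow showing a ball rolling downhill", "benign"))
memory.add(Exemplar([0.9, 0.1], "emoji sequence implying a harmful action", "malicious"))

query_embedding = [0.8, 0.2]
shots = memory.retrieve(query_embedding, k=2)
prompt = build_few_shot_prompt(shots, "arrow drawn between two objects")
print(prompt)

# After the case is judged (here assumed malicious), add it to the memory.
memory.add(Exemplar(query_embedding, "arrow drawn between two objects", "malicious"))
```

Retrieval keeps the few-shot examples relevant to each input, and appending judged cases is one simple way a defense could adapt to new attack patterns without retraining.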

Best for: Computer Vision Engineer, Research Scientist, CTO, AI Scientist, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.