Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review
Summary
PaperGuard introduces the first comprehensive benchmark and defense framework to address significant adversarial manipulation risks in AI-assisted scientific peer review, particularly for Multimodal LLMs (MLLMs). Current robustness studies are limited to text, overlooking figures which convey core evidence in scientific papers. PaperGuard's framework comprises a new multimodal peer-review dataset spanning various scientific domains, a unified suite of attacks including black-box prompt injections and white-box perturbations targeting both text (GCG) and figures (PGD), and a practical defense utilizing chunk-based embedding search to efficiently localize and mitigate harmful instructions in long academic papers. Extensive experiments across state-of-the-art models confirm that AI reviewers are pervasively vulnerable to these domain-specific, cross-modal attacks. PaperGuard establishes foundational protocols and an actionable defense for trustworthy, attack-resilient AI scholarly reviewing.
Key takeaway
For AI Security Engineers developing or deploying AI-assisted peer review systems, you must prioritize defenses against sophisticated cross-modal adversarial attacks. Your current text-only robustness evaluations are insufficient, as figures are critical attack vectors. Implement chunk-based embedding search or similar long-context mitigation strategies to protect against targeted manipulations that could compromise review integrity. Proactively integrate multimodal datasets and attack suites like PaperGuard's into your development lifecycle to ensure system resilience.
Key insights
AI peer review is highly vulnerable to targeted, cross-modal adversarial attacks, necessitating specialized defenses beyond standard jailbreaking.
Principles
- AI peer review faces distinct cross-modal attack vectors.
- Robustness studies must extend beyond text-only analysis.
- Targeted attacks differ from general safety policy violations.
Method
PaperGuard's defense uses chunk-based embedding search to localize and mitigate harmful instructions within long academic papers, addressing the long-context challenge.
In practice
- Evaluate AI reviewers against cross-modal attacks.
- Implement chunk-based embedding search for defense.
- Develop multimodal peer-review datasets.
Topics
- Multimodal LLMs
- Peer Review Automation
- Adversarial Attacks
- Cross-Modal Robustness
- PaperGuard Benchmark
- Prompt Injection
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Research Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.