Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

2026-06-10 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

PaperGuard introduces the first comprehensive benchmark and defense framework to address significant adversarial manipulation risks in AI-assisted scientific peer review, particularly for Multimodal LLMs (MLLMs). Current robustness studies are limited to text, overlooking figures which convey core evidence in scientific papers. PaperGuard's framework comprises a new multimodal peer-review dataset spanning various scientific domains, a unified suite of attacks including black-box prompt injections and white-box perturbations targeting both text (GCG) and figures (PGD), and a practical defense utilizing chunk-based embedding search to efficiently localize and mitigate harmful instructions in long academic papers. Extensive experiments across state-of-the-art models confirm that AI reviewers are pervasively vulnerable to these domain-specific, cross-modal attacks. PaperGuard establishes foundational protocols and an actionable defense for trustworthy, attack-resilient AI scholarly reviewing.

Key takeaway

For AI Security Engineers developing or deploying AI-assisted peer review systems, you must prioritize defenses against sophisticated cross-modal adversarial attacks. Your current text-only robustness evaluations are insufficient, as figures are critical attack vectors. Implement chunk-based embedding search or similar long-context mitigation strategies to protect against targeted manipulations that could compromise review integrity. Proactively integrate multimodal datasets and attack suites like PaperGuard's into your development lifecycle to ensure system resilience.

Key insights

AI peer review is highly vulnerable to targeted, cross-modal adversarial attacks, necessitating specialized defenses beyond standard jailbreaking.

Principles

AI peer review faces distinct cross-modal attack vectors.
Robustness studies must extend beyond text-only analysis.
Targeted attacks differ from general safety policy violations.

Method

PaperGuard's defense uses chunk-based embedding search to localize and mitigate harmful instructions within long academic papers, addressing the long-context challenge.

In practice

Evaluate AI reviewers against cross-modal attacks.
Implement chunk-based embedding search for defense.
Develop multimodal peer-review datasets.

Topics

Multimodal LLMs
Peer Review Automation
Adversarial Attacks
Cross-Modal Robustness
PaperGuard Benchmark
Prompt Injection

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Research Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.