Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

StakeBench is a stakeholder-centric benchmark for evaluating prompt-injection security in large language model (LLM)-based web agents operating in real-world environments. It addresses a limitation of existing attack-centric benchmarks by systematically categorizing harm and attributing it to specific entities such as users, sellers, and platforms. The benchmark decomposes attacks into 12 concrete objectives, realized through 22 templates across 12 product categories, yielding 264 adversarial cases. Across 3,168 runs on the NanoBrowser and BrowserUse agents, with GPT-5 and Gemini-2.5-Flash as LLM backbones, the evaluation revealed substantial, heterogeneous vulnerabilities. Indirect prompt injection achieved attack success rates (ASRs) between 41.67% and 68.16%, with failures falling into three categories: "stealthy parasitism," "misaligned disruption," and "compounded failure." The findings indicate that vulnerability is not a scalar property but a distribution of harm that depends on the affected stakeholder and the LLM backbone.
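The case count is consistent with pairing every template with every product category (22 × 12 = 264). The sketch below only illustrates that arithmetic. The identifiers are hypothetical placeholders, since this summary does not list the benchmark's actual template or category names, and the pairing rule itself is an assumption inferred from the numbers.

```python
from itertools import product

# Hypothetical placeholder identifiers; the real template and category
# names are not given in this summary.
templates = [f"template_{i:02d}" for i in range(1, 23)]    # 22 templates
categories = [f"category_{i:02d}" for i in range(1, 13)]   # 12 product categories

# Assumption: one adversarial case per (template, category) pair.
cases = [
    {"template": t, "category": c}
    for t, c in product(templates, categories)
]

print(len(cases))  # 264
```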

Key takeaway

AI security engineers deploying LLM-based web agents should move beyond aggregate, attack-centric metrics. Adopt a stakeholder-centric evaluation framework that assesses distinct harm pathways for users, sellers, and platforms. Doing so reveals asymmetric vulnerabilities and distinct failure modes, such as "stealthy parasitism" and "misaligned disruption," which matter for comprehensive risk mitigation. Use multi-axis evaluation (ASR, TDR, BIR) to capture the distribution of harm rather than a single score.

Key insights

Prompt injection risk in web agents is victim-dependent, necessitating multi-stakeholder and multi-metric evaluation for accurate assessment.

Principles

Prompt injection harm is victim-dependent and asymmetric.
Vulnerability profiles differ sharply across stakeholders.
Semantic alignment modulates failure type and success.

Method

StakeBench models harm from the stakeholder's perspective: it categorizes attacks by the affected entity (User, Seller, Platform), decomposes them into 12 objectives, and evaluates agents with the ASR, TDR, and BIR metrics.
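A minimal sketch of what per-stakeholder attribution could look like in an evaluation harness follows. It is not the benchmark's implementation: the `RunResult` fields, the objective labels, and the toy data are all hypothetical, and only ASR is computed because this summary does not define TDR or BIR.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Stakeholder(Enum):
    USER = "user"
    SELLER = "seller"
    PLATFORM = "platform"


@dataclass(frozen=True)
class RunResult:
    stakeholder: Stakeholder   # entity the attack is meant to harm
    objective: str             # hypothetical objective label
    attack_succeeded: bool     # hypothetical per-run success flag


def asr_by_stakeholder(runs):
    """Return {stakeholder: attack success rate} over the given runs."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for run in runs:
        totals[run.stakeholder] += 1
        successes[run.stakeholder] += int(run.attack_succeeded)
    return {s: successes[s] / totals[s] for s in totals}


# Toy data for illustration only; these are not results from the benchmark.
runs = [
    RunResult(Stakeholder.USER, "obj_a", True),
    RunResult(Stakeholder.USER, "obj_a", False),
    RunResult(Stakeholder.SELLER, "obj_b", True),
    RunResult(Stakeholder.SELLER, "obj_b", True),
    RunResult(Stakeholder.PLATFORM, "obj_c", False),
    RunResult(Stakeholder.PLATFORM, "obj_c", False),
]

rates = asr_by_stakeholder(runs)
overall = sum(r.attack_succeeded for r in runs) / len(runs)

for stakeholder, rate in rates.items():
    print(f"{stakeholder.value}: {rate:.2f}")
print(f"aggregate: {overall:.2f}")
```

In this toy data the aggregate rate is 0.50 while the per-stakeholder rates range from 0.00 to 1.00, which is the kind of asymmetry a single aggregate number would conceal.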

In practice

Categorize failures by stakeholder (User, Seller, Platform).
Evaluate visual content as an indirect prompt injection (IPI) attack surface.
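One way to act on the first point is to cross-tabulate failed runs by stakeholder and failure mode. The sketch below assumes each failed run has already been labeled with one of the three failure categories named in the summary. How a run earns its label is not specified here, so the labeling step, the function name `tabulate_failures`, and the sample data are hypothetical.

```python
from collections import Counter

# Failure categories named in the summary; their precise definitions
# are not given there, so labels are treated as opaque inputs.
FAILURE_MODES = (
    "stealthy parasitism",
    "misaligned disruption",
    "compounded failure",
)


def tabulate_failures(labeled_runs):
    """Count failed runs per (stakeholder, failure_mode) pair."""
    counts = Counter()
    for stakeholder, mode in labeled_runs:
        if mode not in FAILURE_MODES:
            raise ValueError(f"unknown failure mode: {mode!r}")
        counts[(stakeholder, mode)] += 1
    return counts


# Toy labels for illustration only; not benchmark results.
labeled_runs = [
    ("User", "stealthy parasitism"),
    ("User", "stealthy parasitism"),
    ("Seller", "misaligned disruption"),
    ("Platform", "compounded failure"),
]

table = tabulate_failures(labeled_runs)
for (stakeholder, mode), n in sorted(table.items()):
    print(f"{stakeholder:<8} {mode:<22} {n}")
```

Keeping the stakeholder and the failure mode as separate keys means the same table can be sliced either way, for example by adding the LLM backbone as a third key.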

Topics

Prompt Injection
Web Agents
LLM Security
Security Benchmarking
Stakeholder Analysis
Asymmetric Harm

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.