Who Pays the Price? Stakeholder-Centric Prompt Injection Benchmarking for Real-world Web Agents
Summary
StakeBench is a stakeholder-centric benchmark for evaluating prompt-injection security in large language model (LLM)-based web agents operating in real-world environments. It addresses a limitation of existing attack-centric benchmarks by attributing harm to the specific entity affected: users, sellers, or platforms. The benchmark decomposes attacks into 12 concrete objectives, realized through 22 templates across 12 product categories, for 264 adversarial cases in total. Across 3,168 runs on the NanoBrowser and BrowserUse agents with GPT-5 and Gemini-2.5-Flash as LLM backbones, the evaluation found substantial, heterogeneous vulnerabilities. Indirect prompt injection achieved attack success rates (ASRs) between 41.67% and 68.16%, and failures fell into three categories: "stealthy parasitism," "misaligned disruption," and "compounded failure." The findings indicate that vulnerability is not a scalar property but a distribution of harm that depends on the affected stakeholder and the LLM backbone.
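The case count in the summary follows from a simple cross product of templates and product categories. The sketch below only reproduces that arithmetic; the `AdversarialCase` record and the integer IDs are hypothetical stand-ins, not the benchmark's actual data format.

```python
from dataclasses import dataclass
from itertools import product

N_TEMPLATES = 22    # attack templates, as reported in the summary
N_CATEGORIES = 12   # product categories, as reported in the summary
TOTAL_RUNS = 3168   # total evaluation runs, as reported in the summary


@dataclass(frozen=True)
class AdversarialCase:
    """Hypothetical record: one template instantiated in one product category."""
    template_id: int
    category_id: int


# One adversarial case per (template, category) pair.
cases = [
    AdversarialCase(t, c)
    for t, c in product(range(N_TEMPLATES), range(N_CATEGORIES))
]

# Number of runs each case receives across all evaluated configurations.
runs_per_case = TOTAL_RUNS // len(cases)

print(len(cases))      # 264
print(runs_per_case)   # 12
```

The summary does not say how the 12 runs per case break down across agents, LLM backbones, and any other experimental factors, so the sketch leaves that split unspecified.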
Key takeaway
If you are an AI security engineer deploying LLM-based web agents, move beyond aggregate attack-centric metrics. Adopt a stakeholder-centric evaluation framework that assesses distinct harm pathways for users, sellers, and platforms. Doing so reveals asymmetric vulnerabilities and distinct failure modes, such as "stealthy parasitism" or "misaligned disruption," that a single aggregate score hides. Evaluate along multiple axes (ASR, TDR, BIR) to see how harm is actually distributed across stakeholders.
Key insights
Prompt injection risk in web agents is victim-dependent, necessitating multi-stakeholder and multi-metric evaluation for accurate assessment.
Principles
- Prompt injection harm is victim-dependent and asymmetric.
- Vulnerability profiles differ sharply across stakeholders.
- Semantic alignment modulates failure type and success.
Method
StakeBench uses stakeholder-centric harm modeling: it categorizes attacks by the affected entity (User, Seller, Platform), decomposes them into 12 objectives, and evaluates agents with the ASR, TDR, and BIR metrics.
In practice
- Categorize failures by stakeholder (User, Seller, Platform).
- Evaluate visual content as an indirect prompt injection (IPI) attack surface.
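As a minimal sketch of the first practice, the snippet below breaks attack success rate down by stakeholder instead of reporting one aggregate number. The function name, the `(stakeholder, succeeded)` run format, and the sample data are all illustrative assumptions; they are not StakeBench's interface or results.

```python
from collections import defaultdict


def asr_by_stakeholder(runs):
    """Return attack success rate per stakeholder.

    `runs` is an iterable of (stakeholder, attack_succeeded) pairs.
    This input format is an assumption made for illustration.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for stakeholder, succeeded in runs:
        totals[stakeholder] += 1
        successes[stakeholder] += int(succeeded)
    return {s: successes[s] / totals[s] for s in totals}


# Made-up outcomes, purely to show the shape of the analysis.
runs = [
    ("user", True), ("user", False),
    ("seller", True), ("seller", True),
    ("platform", False), ("platform", False),
    ("platform", True), ("platform", False),
]

aggregate_asr = sum(ok for _, ok in runs) / len(runs)
per_stakeholder = asr_by_stakeholder(runs)

print(aggregate_asr)     # 0.5
print(per_stakeholder)   # {'user': 0.5, 'seller': 1.0, 'platform': 0.25}
```

In this toy data the aggregate rate of 0.5 hides that the seller is compromised in every run while the platform is compromised in one run out of four, which is the kind of asymmetry a per-stakeholder breakdown is meant to surface.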
Topics
- Prompt Injection
- Web Agents
- LLM Security
- Security Benchmarking
- Stakeholder Analysis
- Asymmetric Harm
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.