LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

LivePI (Live Prompt Injection) is a new structured benchmark for evaluating indirect prompt injection (IPI) risk in AI agents running in production-like, test-controlled environments. IPI occurs when an agent such as OpenClaw executes harmful instructions embedded in untrusted inputs such as emails, downloaded files, or webpages. LivePI covers seven input surfaces, twelve attack/rendering families, and five malicious goals, including protected-information exfiltration and unauthorized security-control changes. The benchmark runs on a real virtual machine with live, test-controlled interfaces for email, chat, web, local files, repositories, and cryptocurrency wallets. Evaluations of GPT-5.3-Codex, Claude Opus 4.6, Gemini 3.1 Pro, Kimi K2.5, and GLM-5 found total attack success rates between 10.7% and 29.6%, and group-chat injection was uniformly successful. A two-layer defense that combines prompt-level filtering with pre-execution tool-call authorization intercepted all malicious-goal completions for GPT-5.3-Codex in LivePI while maintaining benign utility.
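To make the two-layer idea concrete, here is a minimal Python sketch of a prompt-level filter followed by a pre-execution tool-call check. It is an illustration under stated assumptions, not the defense evaluated in the paper: the regex patterns, the tool names (`read_file`, `search_web`, `send_email`), the trusted-recipient list, and the function names are all hypothetical, and a real filter would likely be a trained classifier rather than a handful of patterns.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; not taken from LivePI.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all|any|previous|prior) instructions", re.I),
    re.compile(r"(send|forward|upload) .* (password|api key|private key)", re.I),
]

def filter_untrusted(text: str) -> bool:
    """Layer 1: return True if untrusted content looks like an injected instruction."""
    return any(p.search(text) for p in SUSPICIOUS_PATTERNS)

@dataclass
class ToolCall:
    name: str
    args: dict

# Hypothetical policy: read-only tools are always allowed, email only to known recipients.
SAFE_TOOLS = {"read_file", "search_web"}
TRUSTED_RECIPIENTS = {"owner@example.com"}

def authorize(call: ToolCall) -> bool:
    """Layer 2: decide whether a proposed tool call may run, before it executes."""
    if call.name in SAFE_TOOLS:
        return True
    if call.name == "send_email":
        return call.args.get("to") in TRUSTED_RECIPIENTS
    return False  # deny by default: unknown tools never run

def guarded_step(untrusted_text: str, proposed_call: ToolCall) -> str:
    """Apply both layers in order and report which one, if any, blocked the step."""
    if filter_untrusted(untrusted_text):
        return "blocked_by_filter"
    if not authorize(proposed_call):
        return "blocked_by_authorization"
    return "allowed"
```

The second layer is deliberately independent of the first: an injection that is phrased innocuously enough to slip past the text filter still cannot trigger a tool call that the policy does not permit.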

Key takeaway

AI/ML directors deploying agents with external tool access should rigorously evaluate indirect prompt injection (IPI) risk using realistic benchmarks such as LivePI. Teams should prioritize multi-layered defenses, including prompt filtering and pre-execution tool-call authorization, especially for high-risk channels such as group chat and repository links, to prevent data exfiltration and unauthorized actions.

Key insights

LivePI benchmarks indirect prompt injection risks in AI agents across diverse attack vectors and malicious goals.

Method

LivePI runs agents on a real VM with live, test-controlled interfaces and evaluates their responses to untrusted inputs across seven input surfaces, twelve attack/rendering families, and five malicious goals.
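As a rough illustration of how results from such a setup might be summarized, the Python sketch below computes an attack success rate (successful attacks divided by attempts) per group and overall. The record format, the function name `attack_success_rates`, and the sample data are assumptions made for this example; they are not the benchmark's actual schema or results.

```python
from collections import defaultdict

def attack_success_rates(trials):
    """Compute per-group and total attack success rates.

    trials: iterable of (group, succeeded) pairs, where group could be an
    input surface, an attack family, or a malicious goal.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [successes, attempts]
    for group, succeeded in trials:
        counts[group][0] += int(succeeded)
        counts[group][1] += 1
    per_group = {g: s / n for g, (s, n) in counts.items()}
    total_successes = sum(s for s, _ in counts.values())
    total_attempts = sum(n for _, n in counts.values())
    total = total_successes / total_attempts if total_attempts else 0.0
    return per_group, total

# Made-up records for illustration only; not results from the paper.
trials = [("email", False), ("email", True), ("web", False), ("web", False)]
per_surface, total = attack_success_rates(trials)
```

Grouping by surface, family, or goal with the same function makes it straightforward to see whether a low total rate hides a single channel with a much higher one.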

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.