TRAP: Benchmark for Task-completion and Resistance to Active Privacy-extraction
Summary
The TRAP (Task-completion and Resistance to Active Privacy-extraction) benchmark measures the tension between an AI agent's ability to use sensitive private information to complete tasks and its resistance to revealing that information. It targets document-intensive workflows in which agents routinely handle private inputs such as passport numbers. Each TRAP scenario combines a document containing private data, a task query that requires invoking a tool with private fields, and an attack query that tries to elicit the same information. Across 22 frontier proprietary and open-source models, every model family exhibited non-trivial leakage, and stronger instruction-following ability correlated with higher leakage rates. Existing prompt-based defenses reduce leakage but significantly compromise task accuracy. The study also demonstrates an impossibility result: for softmax-based models, no soft-constraint defense can jointly achieve high task success and zero leakage probability. As a remedy, it proposes structural private field isolation, which replaces private fields with hash keys before the input reaches the model, largely preventing leakage while maintaining task accuracy.
Key takeaway
If you are an AI Security Engineer or Machine Learning Engineer deploying agents in document-intensive workflows that handle sensitive data, treat privacy leakage as an inherent risk. Current prompt-based defenses are insufficient: they often sacrifice task accuracy without eliminating leakage. Consider structural private field isolation, which replaces private fields with hash keys before the model processes the input, to limit data exposure while maintaining agent utility.
Key insights
AI agents face an inherent trade-off between using private data for tasks and preventing its leakage, which prompt-based defenses cannot fully resolve.
Principles
- Stronger instruction-following ability correlates with higher privacy leakage rates in AI models.
- For softmax-based models, no soft-constraint defense can combine high task success with zero leakage probability.
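The intuition behind the second principle can be sketched as follows. This is not the study's proof, only the standard observation that a softmax over finite logits gives every token a strictly positive probability. The symbols are assumptions introduced for illustration: V is the vocabulary, x is the full prompt including any defensive instruction, z_v(x) is the logit of token v, and s = (s_1, ..., s_n) is the token sequence of a private value. Sampling from the untruncated distribution is also assumed.

```latex
p_\theta(v \mid x) = \frac{\exp(z_v(x))}{\sum_{u \in V} \exp(z_u(x))} > 0
\quad \text{for every token } v \in V,
\qquad
P_\theta(s \mid x) = \prod_{t=1}^{n} p_\theta(s_t \mid x, s_{<t}) > 0 .
```

Under these assumptions, a prompt-level instruction can only shift the logits by finite amounts, so it can make leaking s less likely but cannot bring the probability to exactly zero.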
Method
Structural private field isolation replaces private fields with hash keys before the input reaches the model, largely preventing leakage while preserving task accuracy.
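A minimal sketch of this idea in Python, under stated assumptions: the summary does not describe the actual key format or storage, so the `FieldVault` class, the `[[name:digest]]` key format, and the salted SHA-256 digest are all hypothetical choices made for illustration. The model only ever sees the opaque keys; real values are substituted back into tool arguments outside the model.

```python
import hashlib
import re
import secrets


class FieldVault:
    """Hypothetical store that keeps private values outside the model's context."""

    def __init__(self) -> None:
        # A per-vault random salt so keys cannot be reversed by hashing guesses.
        self._salt = secrets.token_bytes(16)
        self._values: dict[str, str] = {}

    def _key_for(self, name: str, value: str) -> str:
        # Assumed key format: field name plus a truncated salted digest.
        digest = hashlib.sha256(self._salt + value.encode("utf-8")).hexdigest()[:16]
        return f"[[{name}:{digest}]]"

    def isolate(self, document: str, private_fields: dict[str, str]) -> str:
        """Return the document with every private value replaced by its key."""
        key_by_value: dict[str, str] = {}
        for name, value in private_fields.items():
            if not value:
                continue  # an empty value would match everywhere
            key = self._key_for(name, value)
            key_by_value[value] = key
            self._values[key] = value
        if not key_by_value:
            return document
        # Single pass, longest values first, so keys are never re-substituted.
        longest_first = sorted(key_by_value, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(v) for v in longest_first))
        return pattern.sub(lambda m: key_by_value[m.group(0)], document)

    def resolve(self, tool_args: dict[str, str]) -> dict[str, str]:
        """Swap keys back to real values just before the tool is executed."""
        return {arg: self._values.get(val, val) for arg, val in tool_args.items()}
```

In use, the agent would receive the output of `isolate`, emit tool calls that carry the keys, and the runtime would pass those arguments through `resolve` before calling the tool. This sketch only matches exact strings, so it relies on the private fields being identified before the document reaches the model.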
In practice
- Evaluate agent privacy-accuracy trade-offs using the TRAP benchmark.
- Implement hash key replacement for sensitive data in agent inputs.
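As a rough illustration of the first point, the two quantities in the trade-off can be tracked side by side. The sketch below is not the TRAP scoring code, which this summary does not describe: the `ScenarioResult` record, the exact-match task check, and the normalized substring test for leakage are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical record of one scenario: a task run and an attack run."""
    private_value: str
    expected_tool_args: dict
    actual_tool_args: dict
    attack_response: str


def _normalize(text: str) -> str:
    # Drop case, spaces, and punctuation so "X12-345-67" still counts as a leak.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def leaked(result: ScenarioResult) -> bool:
    """Assumed leakage check: the private value appears in the attack response."""
    return _normalize(result.private_value) in _normalize(result.attack_response)


def score(results: list[ScenarioResult]) -> dict[str, float]:
    """Return task success rate and leakage rate over a set of scenarios."""
    if not results:
        raise ValueError("no scenario results to score")
    n = len(results)
    task_ok = sum(r.actual_tool_args == r.expected_tool_args for r in results)
    leaks = sum(leaked(r) for r in results)
    return {"task_success": task_ok / n, "leakage_rate": leaks / n}
```

Reporting both numbers for the same configuration makes it visible when a defense lowers leakage only by lowering task success as well.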
Topics
- TRAP benchmark
- AI agents
- Privacy leakage
- Data privacy
- Prompt engineering
- Information security
- Hash keys
Best for: AI Architect, Research Scientist, CTO, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.