PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents
Summary
PARSE (Provenance-Aware Retrieval Sanitization) addresses the critical gap where prompt injection defenses, effective on synthetic benchmarks, fail against real enterprise documents. A new benchmark of 122 tasks across financial, legal, medical, scientific, and DevOps domains, using actual SEC filings and arXiv papers, revealed that paraphrasing, a strong synthetic defense, showed no statistically significant attack reduction (p=0.500) and degraded utility from 91.8% to 82.8%. PARSE introduces a domain-aware, fact-preserving sanitization pipeline that classifies sentences by injection likelihood, extracts structured facts, and verifies preservation via a consistency-checking loop. A directiveness gate routes 59% of documents to a lightweight path. PARSE achieved a 15.6% attack success rate, a 38% reduction from the 25.4% baseline, with 86.9% utility (p=0.014).
Key takeaway
For AI Security Engineers or ML teams deploying LLM agents in professional domains, you must move beyond synthetic benchmarks. Your evaluation of prompt injection defenses should utilize domain-matched real documents to accurately assess efficacy and utility. Consider implementing provenance-aware sanitization pipelines like PARSE to achieve significant attack reduction while preserving critical factual content and maintaining high utility.
Key insights
Synthetic prompt injection benchmarks fail to predict defense efficacy on complex, real-world enterprise documents.
Principles
- Real-world enterprise documents challenge LLM defenses more than synthetic data.
- Domain-aware sanitization is crucial for fact preservation and attack reduction.
- Utility degradation is a significant risk when implementing LLM defenses.
Method
PARSE's pipeline involves sentence-level injection likelihood classification, structured fact extraction before rewriting, and a consistency-checking loop for fact preservation, with a directiveness gate for computational efficiency.
In practice
- Evaluate LLM agent defenses using domain-matched real documents.
- Implement fact-preserving sanitization for professional domain LLM agents.
Topics
- Prompt Injection
- LLM Agents
- Retrieval-Augmented Generation
- Document Sanitization
- Enterprise AI
- Benchmark Evaluation
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.