PARSE: Provenance-Aware Retrieval Sanitization for Professional Domain LLM Agents

2026-06-16 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Advanced, quick

Summary

PARSE (Provenance-Aware Retrieval Sanitization) addresses the critical gap where prompt injection defenses, effective on synthetic benchmarks, fail against real enterprise documents. A new benchmark of 122 tasks across financial, legal, medical, scientific, and DevOps domains, using actual SEC filings and arXiv papers, revealed that paraphrasing, a strong synthetic defense, showed no statistically significant attack reduction (p=0.500) and degraded utility from 91.8% to 82.8%. PARSE introduces a domain-aware, fact-preserving sanitization pipeline that classifies sentences by injection likelihood, extracts structured facts, and verifies preservation via a consistency-checking loop. A directiveness gate routes 59% of documents to a lightweight path. PARSE achieved a 15.6% attack success rate, a 38% reduction from the 25.4% baseline, with 86.9% utility (p=0.014).

Key takeaway

For AI Security Engineers or ML teams deploying LLM agents in professional domains, you must move beyond synthetic benchmarks. Your evaluation of prompt injection defenses should utilize domain-matched real documents to accurately assess efficacy and utility. Consider implementing provenance-aware sanitization pipelines like PARSE to achieve significant attack reduction while preserving critical factual content and maintaining high utility.

Key insights

Synthetic prompt injection benchmarks fail to predict defense efficacy on complex, real-world enterprise documents.

Principles

Real-world enterprise documents challenge LLM defenses more than synthetic data.
Domain-aware sanitization is crucial for fact preservation and attack reduction.
Utility degradation is a significant risk when implementing LLM defenses.

Method

PARSE's pipeline involves sentence-level injection likelihood classification, structured fact extraction before rewriting, and a consistency-checking loop for fact preservation, with a directiveness gate for computational efficiency.

In practice

Evaluate LLM agent defenses using domain-matched real documents.
Implement fact-preserving sanitization for professional domain LLM agents.

Topics

Prompt Injection
LLM Agents
Retrieval-Augmented Generation
Document Sanitization
Enterprise AI
Benchmark Evaluation

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.