Context-Fractured Decomposition Attacks on Tool-Using LLM Agents: Exploiting Artifact Provenance Gaps
Summary
Context-Fractured Decomposition (CFD) attacks are a novel family of multi-step jailbreaks that target tool-using LLM agents by exploiting "provenance gaps" in how artifact state is tracked. Traditional jailbreaks assume a single contiguous conversation. CFD attacks instead take advantage of the way real agent pipelines fragment enforcement across tools, modules, and time. An early interaction creates benign-looking intermediate artifacts. A later interaction, potentially with a different agent instance or workflow stage, then issues individually innocuous tool actions that combine with the earlier artifacts to elicit harmful behavior. The research operationalizes this deployment failure mode, instruments it with trace-level diagnostics, and outlines "provenance lineage tagging" as a verifiable mitigation. CFD attacks improve success rates by up to 28.3 percentage points over state-of-the-art baselines on agent-system jailbreak benchmarks, even against strong single-turn judges.
Key takeaway
AI security engineers designing defenses for tool-using LLM agents must move beyond the assumption of a single contiguous conversation. Defense strategies should explicitly track artifact provenance across all tools and workflow stages to prevent Context-Fractured Decomposition attacks. Implement provenance lineage tagging and robust cross-step composition reasoning to mitigate delayed jailbreaks that exploit fragmented enforcement.
Key insights
Context-Fractured Decomposition (CFD) attacks exploit untracked artifact provenance in LLM agents, enabling delayed, multi-step jailbreaks.
Principles
- Defenses must reason about cross-step composition.
- Artifact provenance gaps enable delayed attacks.
- Fragmented enforcement creates vulnerabilities.
Method
A CFD attack first creates benign-looking intermediate artifacts in an early interaction. A later interaction, possibly in a different agent instance or workflow stage, then uses individually innocuous tool actions that combine with those artifacts to elicit harmful behavior.
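The gap between per-step and cross-step enforcement can be illustrated with a deliberately abstract toy model. This is a sketch only, not the paper's method: the tag names, `FORBIDDEN_COMBINATION`, `per_step_check`, and `trace_check` are hypothetical placeholders, and each step is reduced to a set of labels rather than real tool calls.

```python
# Toy model (assumption): a policy forbids two labelled parts from being
# combined. The labels are placeholders and carry no real-world meaning.
FORBIDDEN_COMBINATION = frozenset({"part_a", "part_b"})


def per_step_check(step_tags):
    """Fragmented enforcement: inspect one step in isolation.

    A step is allowed unless it alone contains the whole forbidden combination.
    """
    return not FORBIDDEN_COMBINATION <= set(step_tags)


def trace_check(steps):
    """Cross-step enforcement: inspect the union of tags over the whole trace."""
    seen = set()
    for tags in steps:
        seen |= set(tags)
    return not FORBIDDEN_COMBINATION <= seen


# Three steps, possibly spread over different sessions or workflow stages.
trace = [{"part_a"}, {"format"}, {"part_b"}]

every_step_passes = all(per_step_check(tags) for tags in trace)
whole_trace_passes = trace_check(trace)
```

In this toy trace every individual step passes the per-step check, while the trace-level check rejects the composition. A real judge would reason over artifact content rather than labels, so this shows only the structural point.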
In practice
- Instrument agent pipelines with trace-level diagnostics.
- Implement provenance lineage tagging for artifacts.
- Evaluate defenses against multi-step, cross-context attacks.
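One way to picture provenance lineage tagging is a small in-memory store that records where each artifact came from and what it was derived from. The summary does not specify the paper's design, so everything below is an assumption made for illustration: the `Artifact` schema, the `ProvenanceStore` class, and the `needs_composition_review` rule are hypothetical names, and a real pipeline would persist tags and protect them from tampering.

```python
from dataclasses import dataclass, field
from typing import FrozenSet
import uuid


@dataclass(frozen=True)
class Artifact:
    """Intermediate state plus a lineage tag (hypothetical schema)."""
    content: str
    origin: str  # session or workflow stage that produced the artifact
    parents: FrozenSet[str] = frozenset()  # ids of artifacts it derives from
    artifact_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ProvenanceStore:
    """Keeps every registered artifact so lineage can be walked later."""

    def __init__(self):
        self._by_id = {}

    def register(self, content, origin, parents=()):
        parent_ids = frozenset(p.artifact_id for p in parents)
        artifact = Artifact(content=content, origin=origin, parents=parent_ids)
        self._by_id[artifact.artifact_id] = artifact
        return artifact

    def lineage(self, artifact):
        """Return the artifact and all of its known ancestors."""
        seen = {}
        stack = [artifact]
        while stack:
            current = stack.pop()
            if current.artifact_id in seen:
                continue
            seen[current.artifact_id] = current
            stack.extend(
                self._by_id[pid] for pid in current.parents if pid in self._by_id
            )
        return list(seen.values())

    def origins(self, artifact):
        """Return every origin that contributed to this artifact."""
        return {a.origin for a in self.lineage(artifact)}


def needs_composition_review(store, inputs, current_origin):
    """Flag a tool call when any input traces back to another origin.

    Assumed policy: cross-origin lineage triggers a check over the whole
    lineage instead of a single-step judgment.
    """
    return any(store.origins(a) - {current_origin} for a in inputs)
```

A possible usage: register a note in `"session-A"`, derive a summary from it in `"session-B"`, and call `needs_composition_review(store, [summary], "session-B")`. The call returns `True` because the summary's lineage includes an artifact from a different session, which is the situation a single-turn judge would not see.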
Topics
- LLM Agents
- Jailbreak Attacks
- Provenance Gaps
- Context-Fractured Decomposition
- Artifact Security
- AI Security
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.