FragFuse: Bypassing Access Control of Large Language Model Agents via Memory-Based Query Fragmentation and Fusion

2026-06-14 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

FragFuse is an attack that bypasses access control mechanisms in large language model (LLM) agents by exploiting their long-term memory operations. It uses memory as a temporal channel: prohibited content that access control would normally block is fragmented across multiple interactions. These benign-appearing fragments are stored in the agent's long-term memory and later reconstructed through memory retrieval, so the harmful content never appears explicitly in the final user query. FragFuse operates in three stages: identifying rejection-responsive fragments via black-box adaptive querying, injecting these fragments into memory using marker carrier queries, and then retrieving and fusing them with a follow-up attack query. A surrogate-based optimization scheme tunes the fusion instructions and marker designs so that attacks can be generated automatically. Evaluated across four agent settings and three representative access-control mechanisms, FragFuse achieved an 86.3% average bypass success rate and a 41.1% average end-to-end harmful task success rate, with only 4.4% task-success degradation. Existing prompt-injection and perplexity detectors proved ineffective against this attack.

Key takeaway

AI security engineers deploying LLM agents with access control should treat current mechanisms as highly vulnerable to memory-based query fragmentation attacks such as FragFuse. In the reported evaluation, existing prompt-injection and perplexity detectors were ineffective against it. Re-evaluate your agent's security posture with a focus on memory interaction channels, and develop defenses that prevent fragmented harmful content from being injected into, and reconstructed from, long-term memory. The attack's 86.3% average bypass success rate makes this a priority.

Key insights

Long-term memory in LLM agents opens a new attack surface, which FragFuse exploits to bypass access control by injecting fragmented content and reconstructing it at retrieval time.

Principles

Agent memory operations introduce a temporal attack channel.
Fragmenting content evades direct access control checks.
Reconstructing content from memory bypasses explicit query scrutiny.

Method

FragFuse proceeds in three stages: it identifies rejection-responsive fragments through black-box adaptive querying, injects them into memory using marker carrier queries, then retrieves and fuses them via a follow-up attack query. An automated surrogate-based optimization scheme tunes the fusion instructions and marker designs.

In practice

Evaluate agent access controls against memory-based attacks.
Implement memory sanitization for LLM agent interactions.
Develop detectors for fragmented harmful content patterns.
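
One way to read the recommendations above is to screen the query together with the memory entries retrieved for it, rather than the query alone. The sketch below illustrates that idea with a deliberately toy policy; it is not a defense described in the source. The names `RESTRICTED_TERM_SETS`, `is_restricted`, `check_per_message`, and `check_fused_context` are hypothetical, and the term-set rule is an assumption standing in for whatever access-control check a real deployment uses.

```python
import re

# Hypothetical policy (assumption): a request is restricted when every term
# of a set co-occurs in the text being screened.
RESTRICTED_TERM_SETS = [frozenset({"export", "payroll", "database"})]


def tokens(text):
    """Return the set of lowercase word tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_restricted(text):
    """Toy access-control check: True if any restricted term set is fully present."""
    found = tokens(text)
    return any(term_set <= found for term_set in RESTRICTED_TERM_SETS)


def check_per_message(messages):
    """Baseline: screen each message in isolation, as a query-only filter would."""
    return [is_restricted(m) for m in messages]


def check_fused_context(query, retrieved_memories):
    """Screen the query together with the memory entries retrieved for it."""
    fused = " ".join([*retrieved_memories, query])
    return is_restricted(fused)


# Benign placeholder entries that only match the toy policy once combined.
memory = ["note A: export payroll", "note B: the database"]
query = "carry out note A on note B"

per_message = check_per_message(memory + [query])
fused_flag = check_fused_context(query, memory)
print(per_message, fused_flag)
```

In this toy setup each stored entry and the final query pass the per-message check, while the fused context is flagged. A term co-occurrence rule like this would produce false positives on real traffic, so treat it only as an illustration of where the check sits in the pipeline, not as a usable detector.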

Topics

LLM Agents
Access Control Bypass
Memory Attacks
Query Fragmentation
AI Security
Vulnerability

Best for: CTO, AI Architect, VP of Engineering/Data, AI Scientist, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.