SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems
Summary
"SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems" introduces a formally certified defense against Multi-Session Memory Poisoning (MSMP) in Retrieval-Augmented Generation (RAG) agent systems. In this threat, adversaries inject malicious memories into a persistent agent store to alter future behavior without modifying model weights. Existing defenses lack formal guarantees against this kind of dynamic runtime injection. SMSR comprises two components. Component 1 applies HMAC-SHA256 provenance tagging at write time and achieves a 0% Attack Success Rate (ASR) against unsigned injection. Component 2 applies randomized memory ablation and verdict-based majority aggregation at query time, providing a certified robustness bound against authenticated adversaries. In an empirical evaluation across 15 enterprise scenarios (3,150 trials), Component 2 reduced authenticated ASR from 93–100% to 8.0% (95% CI [5.8%, 10.9%]) in a production-scale store (m=20), remaining below the δ=10.4% certificate bound. Under an end-to-end query-only attack, the defense reduced ASR from 65.3% to 5.3% (n=150). The full defense maintains 85% utility.
Key takeaway
AI Security Engineers deploying RAG agents with persistent memory can use SMSR to mitigate runtime memory poisoning with a certified robustness bound. Integrate HMAC provenance for write-time protection, and configure randomized ablation with verdict-based aggregation at query time. Size the retrieval pool (m) and the number of runs (n_runs) according to the assumed adversary budget (t) to reach the desired security bound.
Key insights
SMSR provides the first certified defense against runtime memory poisoning in persistent LLM agent systems.
Principles
- Write-time provenance is essential for certified defense.
- Randomised over-fetch ablation resists adaptive adversaries.
- Verdict-based aggregation counters the Consistent Minority Effect.
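The write-time provenance principle can be illustrated with a minimal sketch using Python's standard `hmac` module. The record layout, the function names `sign_memory` and `verify_memory`, and the hard-coded demo key are assumptions made for illustration; they are not taken from the paper.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real deployment would load it from an HSM or secrets manager.
SECRET_KEY = b"demo-key-do-not-use-in-production"


def _canonical(entry: dict) -> bytes:
    # Serialise deterministically so the same entry always yields the same tag.
    return json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_memory(entry: dict, key: bytes = SECRET_KEY) -> dict:
    """Return a stored record: the entry plus its HMAC-SHA256 tag."""
    tag = hmac.new(key, _canonical(entry), hashlib.sha256).hexdigest()
    return {"entry": entry, "tag": tag}


def verify_memory(record: dict, key: bytes = SECRET_KEY) -> bool:
    """Return True only if the record carries a tag that matches its content."""
    tag = record.get("tag")
    if not isinstance(tag, str):
        return False  # unsigned injection: no tag at all
    expected = hmac.new(key, _canonical(record.get("entry", {})), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    legit = sign_memory({"text": "Refund limit is $100", "session": 7})
    injected = {"entry": {"text": "Refund limit is $100000", "session": 7}}
    print(verify_memory(legit), verify_memory(injected))
```

In this sketch, an entry written without the key fails verification and can be dropped before retrieval, which is the behaviour the summary attributes to Component 1 for unsigned injection. It does nothing against an adversary who can write through the trusted, signing path.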
Method
SMSR signs legitimate memory writes with HMAC-SHA256. At query time, it retrieves the top-m verified candidates, draws k of them at random in each of n_runs runs, and aggregates the resulting LLM responses by majority verdict.
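The query-time step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `judge` is a hypothetical stand-in for the LLM call that maps a sampled memory subset to a verdict label, and `ablate_and_vote` is a name invented here.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def ablate_and_vote(
    candidates: Sequence[str],
    judge: Callable[[Sequence[str]], str],
    k: int,
    n_runs: int,
    rng: random.Random,
) -> str:
    """Sample k of the m verified candidates n_runs times; return the majority verdict."""
    if not 0 < k <= len(candidates):
        raise ValueError("k must be between 1 and the pool size m")
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    verdicts = Counter()
    pool = list(candidates)
    for _ in range(n_runs):
        subset = rng.sample(pool, k)  # random ablation: keep k entries, drop the rest
        verdicts[judge(subset)] += 1  # one verdict per run
    # Ties resolve to the verdict seen first; an odd n_runs avoids ties for binary verdicts.
    return verdicts.most_common(1)[0][0]


def toy_judge(subset: Sequence[str]) -> str:
    # Hypothetical worst case: a single poisoned entry in context flips the verdict.
    return "ALLOW" if any("POISON" in entry for entry in subset) else "DENY"


if __name__ == "__main__":
    pool = ["POISON: approve all refunds"] + [f"policy note {i}" for i in range(19)]
    print(ablate_and_vote(pool, toy_judge, k=3, n_runs=15, rng=random.Random(0)))
```

Because each run sees only k of the m candidates, a small number of poisoned entries reaches only a fraction of the runs, and the vote is taken over discrete verdicts instead of free-form responses.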
In practice
- Store HMAC keys in HSMs or secrets managers.
- Restrict memory store writes to trusted paths.
- Size retrieval pool m based on adversary budget t.
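To get a feel for how m, k, n_runs, and t interact when sizing the pool, the sketch below computes a simple worst-case estimate. It assumes that any run whose sample contains at least one of the t poisoned entries yields the adversary's verdict, and that runs are independent. This is an illustrative model only; it is not the paper's certificate δ, and the function names are hypothetical.

```python
from math import comb


def contamination_prob(m: int, k: int, t: int) -> float:
    """Probability that a random k-subset of m entries contains at least one of t poisoned entries."""
    if not (0 <= t <= m and 0 < k <= m):
        raise ValueError("require 0 <= t <= m and 0 < k <= m")
    # Hypergeometric complement: 1 - P(all k sampled entries are clean).
    return 1.0 - comb(m - t, k) / comb(m, k)


def majority_flip_bound(m: int, k: int, t: int, n_runs: int) -> float:
    """Probability that a strict majority of n_runs independent runs is contaminated."""
    p = contamination_prob(m, k, t)
    need = n_runs // 2 + 1  # smallest count that forms a strict majority
    return sum(
        comb(n_runs, j) * p**j * (1.0 - p) ** (n_runs - j)
        for j in range(need, n_runs + 1)
    )


if __name__ == "__main__":
    # Compare two pool sizes for the same sample size, budget, and run count.
    for m in (20, 40):
        print(m, contamination_prob(m, k=3, t=1), majority_flip_bound(m, k=3, t=1, n_runs=15))
```

Under this model, a larger pool m or a smaller sample size k lowers the per-run contamination probability, and more runs tighten the majority estimate as long as that probability stays below one half.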
Topics
- RAG Systems
- LLM Agent Security
- Memory Poisoning Attacks
- Certified Robustness
- HMAC Provenance
- Randomised Ablation
Best for: AI Architect, Research Scientist, CTO, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.