Persona Attack: Incremental Memory Injection Jailbreak Attack against Large Language Models
Summary
Persona Attack is a novel memory injection jailbreak method designed to exploit Large Language Models' conversational memory. Unlike traditional single-prompt injections, this technique manipulates the model's context window incrementally, step-by-step. Experiments on several widely used LLMs demonstrate that as these injections accumulate, models increasingly prioritize the injected instructions over their internal safety alignment mechanisms. The attack's success rate, which can reach 95% under specific instruction configurations, varies significantly based on the model's memory implementation and the combination of instructions used.
Key takeaway
For AI Security Engineers evaluating LLM robustness, this research indicates that traditional safety training is insufficient against memory-based jailbreak attacks like Persona Attack. You should prioritize developing and implementing defenses that specifically address incremental context window manipulation and the accumulation of malicious instructions over conversational turns, rather than solely focusing on single-turn prompt injections.
Key insights
Persona Attack exploits LLM conversational memory to bypass safety alignment via incremental instruction injection.
Principles
- LLMs prioritize accumulated injected instructions over safety.
- Jailbreak success varies by memory implementation and instruction sets.
Method
Persona Attack incrementally injects instructions into an LLM's context window, causing the model to prioritize these over its internal safety alignment mechanisms.
In practice
- Exploit LLM conversational memory for jailbreaking.
- Test instruction combinations for higher attack rates.
Topics
- LLM Jailbreak
- Persona Attack
- Memory Injection
- Context Window
- Safety Alignment
- Prompt Engineering
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, NLP Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.