AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor
Summary
AnchorKV is a novel drop-in modification designed to enhance the safety alignment of Large Language Models (LLMs) during KV cache compression. LLMs excel in generative inference and long-context tasks, but their large size creates significant memory and energy challenges. The key-value (KV) cache is a primary bottleneck. Existing compression methods, such as FastKV, SnapKV, and DynamicKV, reduce costs and preserve accuracy. However, these methods often fail against jailbreak attacks or compromise safety under aggressive token eviction. AnchorKV addresses this by biasing token retention scores away from key space directions associated with harmful prompts. It constructs an offline safety anchor, adapting a difference-of-means representation engineering approach to the layer-specific key projection space. This mechanism enables a soft penalty token selection rule, trading minor utility for substantially improved safety alignment. It also reverts to the original compressor when no penalty is applied.
Key takeaway
For Machine Learning Engineers deploying Large Language Models with KV cache compression, you should consider integrating AnchorKV to mitigate safety risks. Your current compression methods might leave models vulnerable to jailbreak attacks or degrade alignment. AnchorKV offers a way to substantially improve safety alignment by biasing token retention. This trades a small amount of utility for enhanced robustness, ensuring your compressed LLMs maintain safety without significant performance compromise.
Key insights
AnchorKV improves LLM safety during KV cache compression by biasing token retention away from harmful prompt directions.
Principles
- KV cache compression can degrade safety alignment.
- Biasing token retention scores enhances safety.
- Offline safety anchors guide token selection.
Method
Construct an offline safety anchor using difference-of-means representation engineering in the layer-specific key projection space. Apply a soft penalty token selection rule based on this anchor.
In practice
- Integrate AnchorKV as a drop-in modification.
- Improve LLM robustness against jailbreak attacks.
- Balance utility and safety in compressed LLMs.
Topics
- AnchorKV
- KV Cache Compression
- LLM Safety
- Jailbreak Attacks
- Representation Engineering
Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.