Output Vector Editing for Memorization Mitigation in Large Language Models

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Expert, extended

Summary

Output vector editing is a new method for reducing memorization risk in large language models: rather than zeroing activations, it minimally modifies the output vectors of MLP neurons. The technique was evaluated on four models ranging from 360M to 7B parameters (SmolLM-360M, OLMo-1B, OLMo-7B, Llama2-7B) and achieved up to 87.9% suppression on 6,831 memorized sequences from OLMo-7B, a 2.7x improvement over zero ablation of the same located neurons. Four edit modes span a spectrum from aggressive suppression to minimal redirection; the "Next-best" mode achieved 81.5% suppression with no catastrophic locality failures. Approximately 14% of memorized sequences resisted MLP-only editing, which points to attention-layer intervention as a complementary fallback.

Key takeaway

AI security engineers concerned with LLM privacy and copyright risks can use output vector editing as a targeted mitigation strategy. Prioritize the "Next-best" edit mode (k=5), which reached 81.5% suppression on OLMo-7B with zero catastrophic locality failures. For broader coverage, consider an ensemble of edit modes. Approximately 14% of memorized sequences may require complementary attention-layer interventions, especially for copy-style continuations.

Key insights

Output vector editing surgically mitigates LLM memorization by redirecting MLP neuron contributions, preserving other encoded features.

Principles

Method

The method locates the MLP neurons responsible for a memorized continuation and applies a rank-one weight update to their output vectors, introducing a distractor token without any gradient computation.
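
To make the idea of a gradient-free, rank-one edit to a neuron's output vector concrete, here is a minimal NumPy sketch. It is not the paper's procedure, which this summary does not spell out. The sketch assumes a down-projection matrix `W_out` of shape `(d_model, d_ff)` whose column `i` is neuron `i`'s output vector, an unembedding matrix `U` of shape `(vocab, d_model)`, and a single already-located neuron. The function name `edit_output_vector`, the `margin` parameter, and the minimum-norm rule for choosing the shift are all hypothetical choices made for illustration. Layer normalization and activation magnitudes are ignored.

```python
import numpy as np

def edit_output_vector(W_out, U, neuron, memorized_id, distractor_id, margin=1.0):
    """Return a copy of W_out in which one neuron's output vector is shifted
    just enough that it favors the distractor token over the memorized token.

    Illustrative assumption: "favoring" means the neuron's direct logit
    contribution to the distractor exceeds that to the memorized token
    by at least `margin`.
    """
    v = W_out[:, neuron]                          # the neuron's output vector
    direction = U[distractor_id] - U[memorized_id]
    gap = direction @ v                           # current logit gap (distractor - memorized)
    if gap >= margin:
        return W_out.copy()                       # already favors the distractor

    # Smallest-norm shift delta such that direction @ (v + delta) == margin.
    delta = (margin - gap) * direction / (direction @ direction)

    # Write the change as an outer product so the update is rank one.
    one_hot = np.zeros(W_out.shape[1])
    one_hot[neuron] = 1.0
    return W_out + np.outer(delta, one_hot)
```

Calling `edit_output_vector(W_out, U, neuron=3, memorized_id=7, distractor_id=11)` on random matrices changes only column 3 of `W_out`. Every other neuron's output vector is untouched, which is the sense in which such an edit is more local than zeroing the neuron's activation.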

Best for: Research Scientist, CTO, Director of AI/ML, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.