Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution

Source: Machine Learning · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Emerging Technologies & Innovation) · Depth: Expert, quick

Summary

A new study finds that standard machine unlearning methods, which are evaluated in full precision, systematically fail once language models undergo 4-bit post-training quantization (PTQ). The authors attribute the failure to update size: per-parameter updates are 47 to 828 times smaller than the NF4 quantization bin width, so they do not clear quantization boundaries and the affected weights round back to their pre-unlearning values. This gives rise to a sparsity-permanence tradeoff. The research introduces MANSU (Mechanistic-Aligned Null-Space Unlearning), which combines three components: causal circuit attribution to identify minimal forget-set subgraphs, circuit-restricted null-space projection with a diagonal-Fisher retain bound, and a per-parameter magnitude floor to ensure that updates survive quantization. The study also presents Circuit Attribution Divergence (CAD), a new metric that verifies structural erasure and distinguishes it from mere behavioral suppression. MANSU is the first method demonstrated to achieve meaningful forgetting, retain-set preservation, a non-positive PTQ gap, and structural erasure across various model families and hazard benchmarks. Gradient-based baselines, by contrast, can recover up to +0.05 accuracy after compression.

Key takeaway

If you are a research scientist developing machine unlearning techniques for deployed large language models, account for post-training quantization: current gradient-based unlearning methods are likely to be reversed by 4-bit compression. Prioritize methods such as MANSU that are designed for quantization survival and structural erasure, and use metrics such as Circuit Attribution Divergence to confirm that forgetting is structural rather than merely behavioral, so that unlearning persists in real-world deployments.

Key insights

Quantization can reverse machine unlearning, necessitating methods that ensure forgetting persists post-compression.
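The reversal can be illustrated with a toy example. The sketch below is not the study's code: it assumes simple per-block absmax scaling and uses approximate NF4 levels (rounded from the values published with QLoRA); the weights, the update, and the 500x scale-up are hypothetical numbers chosen only to show one update that stays inside its bin and one that crosses a boundary.

```python
# Approximate NF4 levels, rounded to 4 decimals; illustrative, not authoritative.
NF4_LEVELS = [-1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
              0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0]


def nf4_codes(block):
    """Return the index of the nearest NF4 level for each weight in one block."""
    # Absmax scaling: one scale per block (assumed scheme for this sketch).
    scale = max(abs(w) for w in block) or 1.0
    return [min(range(16), key=lambda i: abs(w / scale - NF4_LEVELS[i]))
            for w in block]


weights = [0.5, -0.2, 0.1, 0.03, -0.4]       # hypothetical weight block
step = [0.0, 1e-4, -1e-4, 1e-4, -1e-4]       # hypothetical unlearning update

small = [w + d for w, d in zip(weights, step)]          # far below bin width
large = [w + 500 * d for w, d in zip(weights, step)]    # crosses a boundary

print(nf4_codes(weights) == nf4_codes(small))  # True: update erased by PTQ
print(nf4_codes(weights) == nf4_codes(large))  # False: update survives PTQ
```

After quantization, the model with the small update is indistinguishable from the original block, which is the sense in which compression can undo unlearning.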

Method

MANSU uses causal circuit attribution, circuit-restricted null-space projection with a diagonal-Fisher retain bound, and a per-parameter magnitude floor to ensure quantization-permanent unlearning.
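To make the last component concrete, here is a minimal sketch of what a per-parameter magnitude floor could look like. This summary does not state how MANSU computes or applies its floor, so the function name `apply_magnitude_floor`, the treatment of zero entries as "outside the selected circuit", and the floor value of 0.01 are all assumptions for illustration.

```python
import math


def apply_magnitude_floor(update, floor):
    """Raise each nonzero update to at least `floor` in magnitude, keeping its sign.

    Zero entries are treated as parameters outside the selected circuit and are
    left untouched (an assumption of this sketch, not a detail from the study).
    """
    return [0.0 if d == 0 else math.copysign(max(abs(d), floor), d)
            for d in update]


update = [0.0, 1e-4, -2e-4, 0.05]            # hypothetical sparse update
floored = apply_magnitude_floor(update, floor=0.01)
print(floored)  # [0.0, 0.01, -0.01, 0.05]
```

The idea is that every parameter that is edited at all moves by at least the floor, which in practice would be chosen relative to the quantization bin width so that the change is not rounded away.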

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.