Learning What to Forget: Improving LLM Unlearning via Learned Token-Level Importance
Summary
The Alternating Token-Weighted Unlearning (ATWU) framework improves large language model (LLM) unlearning by learning how forget-specific each token is. Machine unlearning aims to remove specific knowledge while preserving general capabilities, but existing methods often overlook that not all tokens in a forget sample are equally relevant. ATWU characterizes a token's relevance through its interaction with the retain objective and formalizes the result as a joint optimization problem. The framework is lightweight: it jointly learns token forget-specificity and model parameters using a simple linear scorer over hidden states, and it requires no external token-level supervision. On the TOFU and RWKU benchmarks, ATWU achieves better forget-retain trade-offs than sample-level, probability-based, and auxiliary-model-based approaches. Its learned scores also align substantially better with ground-truth forget-specific spans, which indicates that it picks up semantically meaningful forgetting signals at minimal computational overhead.
Key takeaway
Machine learning engineers implementing LLM unlearning should consider adopting the ATWU framework. It reports better forget-retain trade-offs than current sample-level or heuristic approaches by learning which tokens are forget-specific. Integrating ATWU can make removal of targeted knowledge more precise while better preserving general capabilities, at minimal computational overhead. Evaluate it on your own unlearning tasks before relying on it for model maintenance.
Key insights
LLM unlearning improves when token-level forget-specificity, defined through conflict with the retain objective, is learned directly from model representations without external supervision.
Principles
- A token is forget-specific when unlearning it conflicts minimally with retain optimality.
- Retain conflict effectively identifies what LLMs should forget.
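One way to picture the joint optimization behind these principles is the schematic objective below. It is a hedged sketch, not the paper's stated formulation: the symbols $\theta$ (model parameters), $\phi$ and $b$ (scorer parameters), $h_t$ (hidden state at token $t$), $\ell_f$ (per-token forget loss), $\mathcal{L}_r$ (retain loss), $\lambda$ (trade-off weight), and the sigmoid $\sigma$ are all assumptions introduced for illustration.

```latex
% Schematic token-weighted unlearning objective (illustrative notation only).
\min_{\theta,\,\phi}\;
  \sum_{x \in \mathcal{D}_f} \sum_{t=1}^{|x|}
    w_\phi\!\left(h_t(x;\theta)\right)\, \ell_f\!\left(x_t;\theta\right)
  \;+\; \lambda\, \mathcal{L}_r\!\left(\theta;\mathcal{D}_r\right),
\qquad
w_\phi(h_t) = \sigma\!\left(\phi^{\top} h_t + b\right)
```

Under this reading, an alternating scheme would update $\phi$ with $\theta$ held fixed, then update $\theta$ with $\phi$ held fixed. The summary does not say how the scorer is kept from collapsing to trivial weights, so any such constraint is left out here.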
Method
ATWU jointly optimizes model parameters and token weights. A linear scorer over hidden states learns each token's forget-specificity, with the learning signal coming from how the token interacts with the retain objective rather than from external token-level supervision.
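The scoring step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `token_weights` and `weighted_forget_loss` are hypothetical, and the sigmoid squashing and weighted-mean normalization are illustrative choices the source does not confirm.

```python
import numpy as np


def token_weights(hidden, w, b=0.0):
    """Linear scorer over hidden states, squashed to (0, 1) per token.

    hidden: array of shape (T, d), one hidden state per token.
    w:      array of shape (d,), scorer weights (assumed learnable).
    """
    logits = hidden @ w + b              # shape (T,)
    return 1.0 / (1.0 + np.exp(-logits))


def weighted_forget_loss(token_loss, weights, eps=1e-8):
    """Weighted mean of per-token forget losses.

    Tokens with larger weights contribute more to the forget signal.
    """
    return float(np.sum(weights * token_loss) / (np.sum(weights) + eps))


# Toy data standing in for one forget sample (values are synthetic).
rng = np.random.default_rng(0)
T, d = 6, 4
hidden = rng.normal(size=(T, d))
token_loss = rng.uniform(0.5, 3.0, size=T)

# A zero scorer weights every token equally, recovering sample-level behavior.
uniform_w = token_weights(hidden, np.zeros(d))
uniform_loss = weighted_forget_loss(token_loss, uniform_w)

# A non-zero scorer redistributes emphasis across tokens.
scored_w = token_weights(hidden, np.ones(d))
scored_loss = weighted_forget_loss(token_loss, scored_w)
```

With a zero scorer the weighted loss reduces to the plain mean over tokens, which makes the sample-level baseline a special case of the token-weighted one. In a full training loop the scorer and the model would be updated in alternation, which this sketch leaves out.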
In practice
- Apply ATWU for superior forget-retain trade-offs in LLM unlearning.
- Use ATWU to identify semantically meaningful token-level forgetting signals.
Topics
- Machine Unlearning
- Large Language Models
- Token-Level Importance
- Forget-Retain Trade-offs
- Joint Optimization
- ATWU Framework
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.