Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning
Summary
MAST (Mechanism-Aligned Selective Targeting) is a method for unlearning reasoning induced by Reinforcement Learning from Verifiable Rewards (RLVR) in large language models. Developed by Chenyu Zhou, Qiliang Jiang, Shuning Wu, and Xu Zhou, it causes less collateral damage than full-parameter unlearning. The approach ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling, and updates only the top-ranked subset. The method was evaluated on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base. On the primary model, it achieved statistically significant target forgetting, reducing the MATH forget score from 45/150 to 37/150 (McNemar p=0.0078), while GSM8K changed by +0.8 pp and MATH retain by -0.5 pp. The advantage held across seeds, objectives, and models; on Qwen3, MAST preserved GSM8K performance where full-parameter unlearning did not.
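The reported p-value can be sanity-checked with an exact McNemar test. The sketch below assumes the scores are counts of correctly answered problems and that all 8 changed items flipped from correct to incorrect, with none flipping the other way. The source does not report the discordant pair counts, so this 8-versus-0 split is only one split consistent with the 45/150 to 37/150 figures, and the function name `mcnemar_exact_p` is illustrative.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: items correct before unlearning and wrong after
    c: items wrong before unlearning and correct after
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Lower tail of Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed split (not stated in the source): 8 items flipped
# correct -> wrong, 0 flipped wrong -> correct.
p_value = mcnemar_exact_p(b=8, c=0)
print(round(p_value, 4))  # 0.0078
```

Under that assumed split the exact test gives 2 * (1/2)^8 = 0.0078125, which matches the reported value to four decimal places.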
Key takeaway
Machine learning engineers tasked with unlearning specific reasoning capabilities from RLVR-trained models should consider mechanism-guided selective unlearning methods such as MAST. The approach targets the knowledge to be forgotten, such as specific mathematical reasoning, while preserving the model's broader reasoning ability on benchmarks like GSM8K. In the reported experiments, selective updates avoided the utility collapse seen with full-parameter updates.
Key insights
MAST selectively unlearns RLVR-induced reasoning by targeting specific attention tensors, minimizing collateral damage to other capabilities.
Principles
- Full-parameter unlearning damages general reasoning.
- Selective tensor updates reduce collateral damage.
- RLVR-induced reasoning shows a distinct token-level delta-log-probability pattern.
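The summary does not define token-level delta-log-probability. A minimal sketch, assuming it means the per-token difference between the log-probability assigned by the RLVR-tuned model and by the base model to the same tokens, could look like this. The function names and the toy logits are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one row of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def token_delta_logprob(base_logits, tuned_logits, token_ids):
    """Per-position log p_tuned(token) - log p_base(token).

    base_logits, tuned_logits: one row of vocabulary logits per position
    token_ids: the token actually observed at each position
    """
    deltas = []
    for base_row, tuned_row, tok in zip(base_logits, tuned_logits, token_ids):
        deltas.append(log_softmax(tuned_row)[tok] - log_softmax(base_row)[tok])
    return deltas


# Toy example: 3-token vocabulary, 2 positions (values are made up).
base_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
tuned_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
deltas = token_delta_logprob(base_logits, tuned_logits, token_ids=[0, 1])
print(deltas)
```

In this toy case only the first position has a positive delta, because the tuned logits favor the observed token there and are unchanged at the second position.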
Method
MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling. It then updates only the top-ranked subset of these tensors.
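The ranking step can be illustrated with a small NumPy sketch. The summary names the three criteria but not their formulas, so everything below is an assumption rather than the authors' implementation: off-principal energy is taken as the fraction of the RLVR update lying outside the top singular subspaces of the base weight, update magnitude as the Frobenius norm of the update, coupling as the absolute cosine similarity between the update and the forget-set gradient, and the three are combined by summing z-scores with equal weight. All function names, tensor names, and data are hypothetical.

```python
import numpy as np


def off_principal_energy(w_base, delta, rank):
    """Fraction of the update's energy outside the top-`rank` singular
    subspaces of the base weight (assumed definition)."""
    u, _, vt = np.linalg.svd(w_base, full_matrices=False)
    u_r, v_r = u[:, :rank], vt[:rank].T
    principal = u_r @ (u_r.T @ delta @ v_r) @ v_r.T
    return float(np.sum((delta - principal) ** 2) / (np.sum(delta ** 2) + 1e-12))


def coupling(delta, forget_grad):
    """Absolute cosine similarity between update and forget gradient
    (assumed definition)."""
    num = abs(float(np.sum(delta * forget_grad)))
    den = float(np.linalg.norm(delta) * np.linalg.norm(forget_grad)) + 1e-12
    return num / den


def rank_tensors(base, tuned, forget_grads, top_k=2, rank=4):
    """Return the names of the top_k tensors by combined score."""
    names = list(base)
    raw = np.array([
        [
            off_principal_energy(base[n], tuned[n] - base[n], rank),
            float(np.linalg.norm(tuned[n] - base[n])),
            coupling(tuned[n] - base[n], forget_grads[n]),
        ]
        for n in names
    ])
    # z-score each criterion across tensors, then sum (assumed equal weights).
    z = (raw - raw.mean(axis=0)) / (raw.std(axis=0) + 1e-12)
    order = np.argsort(-z.sum(axis=1))
    return [names[i] for i in order[:top_k]]


# Toy data: four 8x8 "query projection" tensors; one has a large update
# that is aligned with its forget gradient.
rng = np.random.default_rng(0)
names = [f"layers.{i}.q_proj" for i in range(4)]
base = {n: rng.normal(size=(8, 8)) for n in names}
tuned, forget_grads = {}, {}
for n in names:
    is_target = n == "layers.1.q_proj"
    delta = (1.0 if is_target else 0.01) * rng.normal(size=(8, 8))
    tuned[n] = base[n] + delta
    forget_grads[n] = delta if is_target else rng.normal(size=(8, 8))

selected = rank_tensors(base, tuned, forget_grads, top_k=2)
print(selected)
```

Only the tensors in `selected` would then receive unlearning updates, with all other parameters left frozen. How many tensors to keep and how to weight the criteria are tuning choices this sketch does not settle.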
In practice
- Apply MAST for targeted unlearning of RLVR-induced reasoning.
- Use MAST to preserve general reasoning skills.
- Evaluate unlearning impact on specific benchmarks.
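The evaluation step amounts to comparing forget-set and retain-set scores before and after unlearning. The sketch below uses the figures quoted in the summary; the helper names and the 1.0 pp retain tolerance are hypothetical choices, not values from the source.

```python
def pp_change(before, after):
    """Change in accuracy, in percentage points, from (correct, total) pairs."""
    return 100.0 * after[0] / after[1] - 100.0 * before[0] / before[1]


def check_unlearning(forget_pp, retain_pps, max_retain_drop=1.0):
    """True if the forget score fell and no retain benchmark dropped by
    more than max_retain_drop percentage points (hypothetical tolerance)."""
    forgot = forget_pp < 0
    preserved = all(d >= -max_retain_drop for d in retain_pps.values())
    return forgot and preserved


# MATH forget: 45/150 -> 37/150, as reported in the summary.
forget_pp = pp_change(before=(45, 150), after=(37, 150))

# Retain-side changes are given in the summary only as pp deltas.
retain_pps = {"GSM8K": 0.8, "MATH retain": -0.5}

print(round(forget_pp, 2))
print(check_unlearning(forget_pp, retain_pps))
```

A point-estimate check like this is best paired with a significance test on the forget set and with repeated runs across seeds, since small retain-side changes can fall within noise.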
Topics
- Machine Unlearning
- RLVR
- Selective Forgetting
- Attention Mechanisms
- Qwen Models
- Model Utility Preservation
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.