Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

MAST (Mechanism-Aligned Selective Targeting) is a method for unlearning reasoning capabilities induced by Reinforcement Learning from Verifiable Rewards (RLVR) in large language models. Developed by Chenyu Zhou, Qiliang Jiang, Shuning Wu, and Xu Zhou, it is reported to cause less collateral damage than full-parameter unlearning. Rather than updating every weight, MAST ranks attention-projection tensors by three criteria (off-principal energy, update magnitude, and forget-gradient coupling) and updates only the top-ranked subset. The authors evaluated it on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base. On the primary model, the MATH forget score fell from 45/150 to 37/150, a statistically significant drop (McNemar p=0.0078), while GSM8K (+0.8 pp) and MATH retain (-0.5 pp) were essentially unchanged. The advantage held across seeds, objectives, and models; notably, MAST preserved GSM8K on Qwen3, where full-parameter unlearning failed to do so.
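
The reported p-value can be sanity-checked with a few lines of arithmetic. The sketch below computes a two-sided exact McNemar p-value from discordant pair counts. It assumes the forget scores count correct answers on the same 150 paired items, and that 8 items flipped from correct to incorrect while none flipped the other way. That split is not stated in the summary; it is simply one split consistent with the 45 to 37 drop and p=0.0078. The function name mcnemar_exact_p is illustrative.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: items correct before unlearning and incorrect after.
    c: items incorrect before unlearning and correct after.
    Under the null hypothesis each discordant pair is a fair coin flip.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of change
    k = min(b, c)
    # Probability of a split at least as lopsided, on one side.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# ASSUMPTION: 8 discordant pairs, all in the "forgotten" direction.
p_value = mcnemar_exact_p(8, 0)
print(f"{p_value:.4f}")  # prints 0.0078
```

Under that assumed split the p-value is 2 × 0.5^8 = 0.0078125, which matches the figure quoted above. A split such as 9 versus 1 would give the same net drop of 8 but a larger p-value.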

Key takeaway

Machine learning engineers who need to remove specific reasoning capabilities from RLVR-trained models should consider mechanism-guided selective unlearning methods such as MAST. Restricting updates to a ranked subset of attention-projection tensors lets you forget a targeted capability, such as specific mathematical reasoning, while largely preserving broader reasoning ability on benchmarks like GSM8K. In the reported experiments, this avoided the utility collapse seen with full-parameter updates.

Key insights

MAST selectively unlearns RLVR-induced reasoning by targeting specific attention tensors, minimizing collateral damage to other capabilities.

Principles

Method

MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling. It then updates only the top-ranked subset of these tensors.
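
The rank-then-select step can be illustrated with a minimal NumPy sketch. The summary names the three criteria but not their formulas or how they are combined, so everything below is an assumption made for illustration and not the authors' implementation: off-principal energy is taken as the share of the RLVR update lying outside the top singular directions of the base weight, update magnitude as a relative Frobenius norm, forget-gradient coupling as the absolute cosine between the update and the forget gradient, and the combined score as the mean of per-criterion ranks. All function names are hypothetical.

```python
import numpy as np


def off_principal_energy(w_base, delta, rank=4):
    # ASSUMED definition: fraction of the update's squared norm that lies
    # outside the span of the top-`rank` left singular vectors of w_base.
    u, _, _ = np.linalg.svd(w_base, full_matrices=False)
    u_top = u[:, :rank]
    residual = delta - u_top @ (u_top.T @ delta)
    return float(np.sum(residual ** 2) / (np.sum(delta ** 2) + 1e-12))


def update_magnitude(w_base, delta):
    # ASSUMED definition: Frobenius norm of the update relative to the base.
    return float(np.linalg.norm(delta) / (np.linalg.norm(w_base) + 1e-12))


def forget_coupling(delta, forget_grad):
    # ASSUMED definition: absolute cosine between update and forget gradient.
    num = abs(float(np.sum(delta * forget_grad)))
    den = np.linalg.norm(delta) * np.linalg.norm(forget_grad) + 1e-12
    return num / den


def select_tensors(tensors, top_k, rank=4):
    """tensors maps name -> (w_base, w_rlvr, forget_grad); returns top_k names."""
    names = list(tensors)
    scores = np.array([
        [
            off_principal_energy(wb, wr - wb, rank),
            update_magnitude(wb, wr - wb),
            forget_coupling(wr - wb, g),
        ]
        for wb, wr, g in (tensors[n] for n in names)
    ])
    # Convert each criterion to ranks (0 = lowest), then average the ranks.
    ranks = scores.argsort(axis=0).argsort(axis=0)
    combined = ranks.mean(axis=1)
    order = np.argsort(-combined, kind="stable")
    return [names[i] for i in order[:top_k]]


def masked_update(params, grads, selected, lr=1e-2):
    # Take a gradient step only on the selected tensors; leave the rest as-is.
    chosen = set(selected)
    return {n: (w - lr * grads[n] if n in chosen else w) for n, w in params.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    demo = {}
    for name in ("q_proj", "k_proj", "v_proj", "o_proj"):  # hypothetical names
        w = rng.normal(size=(16, 16))
        w_rlvr = w + 0.05 * rng.normal(size=(16, 16))
        demo[name] = (w, w_rlvr, rng.normal(size=(16, 16)))
    print(select_tensors(demo, top_k=2))
```

Averaging ranks rather than raw scores keeps any one criterion from dominating because of its scale; a weighted sum of normalized scores would be an equally plausible choice.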

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.