Mechanism-Guided Selective Unlearning for RLVR-Induced Reasoning

2026-06-17 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

MAST (Mechanism-Aligned Selective Targeting) is a new method for unlearning reasoning induced by Reinforcement Learning from Verifiable Rewards (RLVR) in large language models. Developed by Chenyu Zhou, Qiliang Jiang, Shuning Wu, and Xu Zhou, it causes substantially less collateral damage than traditional full-parameter unlearning. Rather than updating every weight, MAST identifies and updates only a top-ranked subset of attention-projection tensors, ranked by off-principal energy, update magnitude, and forget-gradient coupling. The method was evaluated on Qwen2.5-Math-1.5B and Qwen3-1.7B-Base. On the primary model, it achieved statistically significant target forgetting, reducing the MATH forget score from 45/150 to 37/150 (McNemar p=0.0078), while preserving GSM8K (+0.8 pp) and MATH retain (-0.5 pp). The advantage held across seeds, objectives, and models; notably, MAST preserved GSM8K on Qwen3, where full-parameter unlearning did not.
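The reported p-value can be sanity-checked with an exact McNemar test. The sketch below assumes that 8 problems flipped from correct to incorrect and none flipped the other way. That split is inferred from the 45/150 to 37/150 change and is not stated in the source, and the function name is illustrative.

```python
from math import comb


def mcnemar_exact_p(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b: items correct before unlearning and incorrect after.
    c: items incorrect before unlearning and correct after.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a change
    k = min(b, c)
    # Binomial(n, 0.5) tail probability up to the smaller count, doubled.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed split: 8 answers lost, 0 gained (net change of 45 - 37 = 8).
p_value = mcnemar_exact_p(8, 0)
print(round(p_value, 4))  # 0.0078
```

Under this assumption the result matches the reported p=0.0078. A split such as 9 lost and 1 gained has the same net change but gives a larger p-value (about 0.0215), so the reported figure is consistent with forgetting that went almost entirely in one direction.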

Key takeaway

Machine learning engineers tasked with unlearning specific reasoning capabilities from RLVR-trained models should consider mechanism-guided selective unlearning methods like MAST. This approach forgets targeted capabilities, such as specific mathematical reasoning, while preserving the model's broader reasoning ability on benchmarks like GSM8K. Selective unlearning avoids the utility collapse seen with full-parameter updates, so models stay usable after unlearning.

Key insights

MAST unlearns RLVR-induced reasoning by updating only selected attention-projection tensors, which limits collateral damage to other capabilities.

Principles

Full-parameter unlearning damages general reasoning.
Selective tensor updates reduce collateral damage.
RLVR-induced reasoning shows a distinct token-level delta-log-probability signature.

Method

MAST ranks attention-projection tensors by off-principal energy, update magnitude, and forget-gradient coupling. It then updates only the top-ranked subset of these tensors.
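The ranking-and-selection step can be sketched as follows. The source names the three criteria but does not say how each score is computed or how they are combined, so this sketch takes the per-tensor scores as given inputs and combines them by summing rank positions, which is an assumption. The tensor names, score values, and function names are hypothetical.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each tensor name to its rank position (0 = highest score)."""
    ordered = sorted(scores, key=lambda name: scores[name], reverse=True)
    return {name: pos for pos, name in enumerate(ordered)}


def select_top_tensors(
    off_principal_energy: Dict[str, float],
    update_magnitude: Dict[str, float],
    forget_coupling: Dict[str, float],
    k: int,
) -> List[str]:
    """Return the k tensors with the best combined rank across the criteria.

    Assumption: criteria are combined by summing rank positions. The source
    does not specify the actual combination rule.
    """
    rankings = [
        rank_positions(off_principal_energy),
        rank_positions(update_magnitude),
        rank_positions(forget_coupling),
    ]
    combined = {
        name: sum(ranking[name] for ranking in rankings)
        for name in off_principal_energy
    }
    # Lower combined rank is better; ties are broken by name for determinism.
    return sorted(combined, key=lambda name: (combined[name], name))[:k]


# Hypothetical scores for four attention-projection tensors.
energy = {"L0.q_proj": 0.9, "L0.k_proj": 0.2, "L1.v_proj": 0.7, "L1.o_proj": 0.1}
magnitude = {"L0.q_proj": 0.8, "L0.k_proj": 0.3, "L1.v_proj": 0.9, "L1.o_proj": 0.2}
coupling = {"L0.q_proj": 0.6, "L0.k_proj": 0.1, "L1.v_proj": 0.5, "L1.o_proj": 0.4}

selected = select_top_tensors(energy, magnitude, coupling, k=2)
print(selected)  # ['L0.q_proj', 'L1.v_proj']
```

In a training loop, only the tensors in `selected` would receive gradient updates from the unlearning objective, while all other parameters stay frozen.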

In practice

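Apply MAST when the goal is targeted unlearning of RLVR-induced reasoning.
Prefer it over full-parameter unlearning when general reasoning skills must be preserved.
Evaluate unlearning on both forget and retain benchmarks, such as MATH forget, MATH retain, and GSM8K.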

Topics

Machine Unlearning
RLVR
Selective Forgetting
Attention Mechanisms
Qwen Models
Model Utility Preservation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.