Replay What Matters: Off-Policy Replay for Efficient LLM Reinforcement Unlearning

Source: Computation and Language · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

ReRULE adds off-policy replay to reinforcement unlearning for LLMs, targeting an inefficiency in existing RL-based methods such as RULE: they repeatedly sample easy cases and discard low-reward rollouts from hard cases, which wastes compute. ReRULE stores low-reward hard-case rollout groups in a replay buffer during early GRPO training and reuses them in later stages through importance-sampled off-policy updates, concentrating computation on challenging boundary cases. Theoretically, ReRULE offers a tighter hard-case convergence bound. Empirically, it raises MUSE-Books Retain Quality from 46.3 to 56.2 while adding only 5-11% to training time across benchmarks. The gains are larger on the more complex MUSE-Books benchmark than in the simpler TOFU setting.

Key takeaway

If you are a machine learning engineer building LLM unlearning pipelines and are limited by compute cost or weak retain quality, consider adding off-policy replay. ReRULE shows that storing and reusing hard-case rollouts improves retain quality (MUSE-Books Retain Quality rises from 46.3 to 56.2) while adding only 5-11% to training time. The approach redirects computation toward critical boundary cases, making unlearning more effective and resource-efficient, especially on complex datasets.

Key insights

Off-policy replay targets hard cases in LLM reinforcement unlearning, improving retain quality for a 5-11% increase in training time.

Principles

Method

ReRULE stores low-reward hard-case rollout groups in a replay buffer during early GRPO training, then reuses them in later stages via importance-sampled off-policy updates.
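
The store-then-replay idea can be illustrated with a minimal Python sketch. It is not ReRULE's actual implementation: the summary does not say how "hard" groups are selected, how the buffer is managed, or which off-policy objective is used. The names `RolloutGroup`, `HardCaseReplayBuffer`, `reward_threshold`, and `off_policy_surrogate` are hypothetical. The mean-reward threshold, the FIFO eviction, the sequence-level log-probabilities, and the PPO-style clipping of the importance ratio are all assumptions chosen for illustration.

```python
import math
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class RolloutGroup:
    """One prompt's group of sampled responses (GRPO samples several per prompt)."""
    prompt: str
    rewards: List[float]            # one scalar reward per rollout
    behavior_logprobs: List[float]  # log-prob of each rollout under the policy that sampled it


class HardCaseReplayBuffer:
    """FIFO buffer that keeps only low-reward ("hard") rollout groups."""

    def __init__(self, capacity: int, reward_threshold: float):
        self.groups: Deque[RolloutGroup] = deque(maxlen=capacity)
        # Assumed hardness criterion: mean group reward below this value.
        self.reward_threshold = reward_threshold

    def maybe_store(self, group: RolloutGroup) -> bool:
        mean_reward = sum(group.rewards) / len(group.rewards)
        if mean_reward < self.reward_threshold:
            self.groups.append(group)  # oldest group is evicted when full
            return True
        return False


def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: rewards standardized within their own group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def off_policy_surrogate(group: RolloutGroup,
                         current_logprobs: List[float],
                         clip_eps: float = 0.2) -> float:
    """Clipped, importance-weighted objective (to be maximized) for a replayed group."""
    advantages = group_advantages(group.rewards)
    total = 0.0
    for adv, new_lp, old_lp in zip(advantages, current_logprobs, group.behavior_logprobs):
        ratio = math.exp(new_lp - old_lp)  # importance weight: pi_current / pi_behavior
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Early training: only the low-reward group is kept.
buffer = HardCaseReplayBuffer(capacity=100, reward_threshold=0.5)
buffer.maybe_store(RolloutGroup("easy prompt", [1.0, 0.9], [-1.0, -1.2]))  # skipped
buffer.maybe_store(RolloutGroup("hard prompt", [0.0, 0.4], [-2.0, -2.5]))  # stored

# Later training: replay a stored group against the current policy's log-probs.
replayed = buffer.groups[0]
objective = off_policy_surrogate(replayed, current_logprobs=[-2.1, -2.0])
```

The importance ratio corrects for the fact that replayed rollouts were sampled by an earlier policy, and clipping limits how far a single stale sample can move the update. One caveat with this simplified version: a group whose rewards are all identical yields zero advantages and therefore no learning signal, so a real system would need to handle that case separately.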

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.