RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

RePAIR introduces Interactive Machine Unlearning (IMU), a paradigm in which end users instruct a large language model (LLM) to forget specific knowledge through natural-language prompts at inference time. Existing unlearning methods are provider-centric; IMU instead gives users direct control over what is removed. The framework has three components: a watchdog model that detects unlearning intent, a surgeon model that generates the repair procedure, and a patient model whose parameters are updated. Its core technique, Steering Through Activation Manipulation with PseudoInverse (STAMP), is a training-free, single-sample unlearning method that redirects MLP activations using closed-form pseudoinverse updates. A low-rank variant of STAMP reduces computational complexity from O(d^3) to O(r^3 + r^2 * d) and achieves up to roughly 3x speedup over training-based baselines. In the reported experiments, RePAIR reaches near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while preserving utility (Acc_r up to 84.47, R-RL up to 0.88) on harmful knowledge suppression, misinformation correction, and personal data erasure, outperforming six state-of-the-art baselines.
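
To give a feel for the complexity claim, the sketch below plugs numbers into the two asymptotic expressions quoted in the summary. The hidden size `d = 4096` and rank `r = 64` are illustrative assumptions, not values taken from the paper, and constant factors are ignored, so the result is an operation-count ratio for the update step only. It is not the same quantity as the roughly 3x wall-clock speedup, which is measured against training-based baselines.

```python
# Compare the asymptotic operation counts quoted in the summary.
# d and r are assumed values chosen only for illustration.

def full_cost(d: int) -> int:
    """Leading-order cost of the full update, O(d^3)."""
    return d ** 3

def low_rank_cost(d: int, r: int) -> int:
    """Leading-order cost of the low-rank variant, O(r^3 + r^2 * d)."""
    return r ** 3 + r ** 2 * d

d, r = 4096, 64  # assumed hidden size and rank
full = full_cost(d)
low = low_rank_cost(d, r)
ratio = full / low

print(f"full: {full:,}  low-rank: {low:,}  ratio: {ratio:.0f}")
```

With these assumed sizes the low-rank expression is smaller by a factor of about four thousand, which shows why the rank `r` rather than the hidden size `d` dominates the cost once `r` is much smaller than `d`.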

Key takeaway

For research scientists and CTOs evaluating LLM deployment strategies, RePAIR points toward user-centric model governance. Teams can consider on-device, interactive unlearning for data privacy and content moderation without extensive retraining pipelines. Shifting control to the end user could reduce compliance burdens and improve trust in LLM applications.

Key insights

RePAIR enables user-driven, interactive machine unlearning in LLMs via prompt-aware model repair and efficient activation manipulation.

Method

RePAIR uses a watchdog model to detect unlearning intent, a surgeon model to generate repair procedures, and STAMP to redirect MLP activations via closed-form pseudoinverse updates for efficient unlearning.
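
As a rough illustration of what a closed-form pseudoinverse update can look like, the sketch below edits a single linear layer so that one chosen input activation maps to a new target output. This is a generic minimum-norm, rank-one construction and should not be read as the paper's exact STAMP formulation. The function name `redirect_activation`, the toy dimensions, and the random data are all hypothetical.

```python
import numpy as np

def redirect_activation(W: np.ndarray, k: np.ndarray, v_target: np.ndarray) -> np.ndarray:
    """Return W' such that W' @ k == v_target.

    Uses the pseudoinverse of the column vector k, which gives the
    smallest (Frobenius-norm) change to W that satisfies the constraint.
    """
    k_col = k.reshape(-1, 1)                      # shape (d_in, 1)
    k_pinv = np.linalg.pinv(k_col)                # shape (1, d_in), equals k^T / (k^T k)
    residual = (v_target - W @ k).reshape(-1, 1)  # gap between current and target output
    return W + residual @ k_pinv                  # rank-one, training-free update

# Toy example with assumed sizes and random values.
rng = np.random.default_rng(0)
d_in, d_out = 8, 4
W = rng.normal(size=(d_out, d_in))       # stand-in for an MLP projection matrix
k = rng.normal(size=d_in)                # activation tied to the knowledge to forget
v_target = rng.normal(size=d_out)        # output the layer should produce instead

W_new = redirect_activation(W, k, v_target)

# An input orthogonal to k is left untouched by the update.
k_other = rng.normal(size=d_in)
k_other -= (k_other @ k) / (k @ k) * k

print(np.allclose(W_new @ k, v_target))
print(np.allclose(W_new @ k_other, W @ k_other))
```

The update needs only one sample and no gradient steps, and inputs orthogonal to `k` pass through unchanged, which matches the intuition of forgetting one thing while retaining the rest. Real activations are rarely orthogonal, so retained behavior would still need to be measured empirically.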

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.