RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair

2026-04-14 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

RePAIR introduces Interactive Machine Unlearning (IMU), a paradigm in which end users instruct a large language model (LLM) to forget specific knowledge through natural language prompts at inference time. Existing unlearning methods are provider-centric; IMU instead gives the user direct control over what is removed. RePAIR has three components: a watchdog model that detects intent, a surgeon model that generates repair procedures, and a patient model whose parameters are updated. Its core method is Steering Through Activation Manipulation with PseudoInverse (STAMP), a training-free, single-sample unlearning technique that redirects MLP activations using closed-form pseudoinverse updates. A low-rank variant of STAMP reduces computational complexity from O(d^3) to O(r^3 + r^2 d) and achieves up to about 3x speedup over training-based baselines. In experiments, RePAIR reaches near-zero forget scores (Acc_f = 0.00, F-RL = 0.00) while maintaining utility (Acc_r up to 84.47, R-RL up to 0.88) on harmful knowledge suppression, misinformation correction, and personal data erasure, outperforming six state-of-the-art baselines.
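To make the O(r^3 + r^2 d) figure concrete, here is a minimal NumPy sketch of a generic linear-algebra identity, not the paper's low-rank STAMP variant, whose details this summary does not give. It assumes a d x d matrix that factors as U V^T with full-column-rank d x r factors, and applies its pseudoinverse to a vector without ever forming or decomposing the full matrix. The function name `lowrank_pinv_apply` and the sizes d = 64, r = 4 are illustrative choices.

```python
import numpy as np

def lowrank_pinv_apply(U, V, x):
    """Apply pinv(U @ V.T) to x without forming the d x d matrix.

    Assumes U and V are (d, r) with full column rank, so that
    pinv(U @ V.T) = V (V^T V)^{-1} (U^T U)^{-1} U^T.
    """
    gram_u = U.T @ U                       # (r, r), costs O(r^2 d)
    gram_v = V.T @ V                       # (r, r), costs O(r^2 d)
    y = np.linalg.solve(gram_u, U.T @ x)   # r x r solve, O(r^3)
    z = np.linalg.solve(gram_v, y)         # r x r solve, O(r^3)
    return V @ z                           # back to dimension d, O(r d)

rng = np.random.default_rng(0)
d, r = 64, 4                               # illustrative sizes only
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))
x = rng.standard_normal(d)

fast = lowrank_pinv_apply(U, V, x)
# Dense reference: an SVD of the full d x d matrix, O(d^3).
dense = np.linalg.pinv(U @ V.T, rcond=1e-10) @ x
print(np.allclose(fast, dense))
```

The saving comes from doing all inversions on r x r Gram matrices; the only operations that touch dimension d are matrix products with the thin factors.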

Key takeaway

For research scientists and CTOs evaluating LLM deployment strategies, RePAIR is a step toward user-centric model governance. Because STAMP is training-free and works from a single sample, your teams can consider on-device, interactive unlearning for data privacy and content moderation without building a retraining pipeline. Shifting control to the end user could reduce compliance burdens and improve trust in LLM applications.

Key insights

RePAIR enables user-driven, interactive machine unlearning in LLMs via prompt-aware model repair and efficient activation manipulation.

Principles

User control over LLM knowledge is achievable at inference time.
Unlearning can be training-free and single-sample efficient.

Method

RePAIR uses a watchdog model to detect intent, a surgeon model to generate the repair procedure, and STAMP to redirect MLP activations in the patient model through closed-form pseudoinverse updates, with no training step.
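The idea of a closed-form pseudoinverse update can be illustrated with a small NumPy sketch. This is a generic minimum-norm rank-one edit of a single linear layer, offered as an assumption-laden illustration and not as the STAMP procedure itself: given a weight matrix W, one input activation k, and a desired output v_target, it returns W' such that W' k = v_target while leaving outputs for inputs orthogonal to k unchanged. The names `redirect_activation`, `k`, and `v_target` are hypothetical.

```python
import numpy as np

def redirect_activation(W, k, v_target):
    """Return W' with W' @ k == v_target via a minimum-norm rank-one update.

    W' = W + (v_target - W k) k^+, where k^+ is the pseudoinverse of k.
    """
    k_col = k.reshape(-1, 1)                  # (d_in, 1)
    k_pinv = np.linalg.pinv(k_col)            # (1, d_in), equals k^T / (k^T k)
    residual = v_target.reshape(-1, 1) - W @ k_col   # (d_out, 1)
    return W + residual @ k_pinv              # rank-one correction

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))               # toy layer: 6 inputs, 4 outputs
k = rng.standard_normal(6)                    # activation to be redirected
v_target = np.zeros(4)                        # illustrative target output

W_new = redirect_activation(W, k, v_target)

# An input orthogonal to k should be unaffected by the edit.
q = rng.standard_normal(6)
q = q - (q @ k) / (k @ k) * k

print(np.allclose(W_new @ k, v_target))
print(np.allclose(W_new @ q, W @ q))
```

No gradient steps are involved: the update is computed in one shot from a single sample, which is the property the summary attributes to training-free, single-sample unlearning.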

In practice

Suppress harmful knowledge in LLMs.
Correct misinformation in model outputs.
Erase personal data from LLM memory.

Topics

Machine Unlearning
Large Language Models
RePAIR Framework
STAMP Method
Interactive Machine Unlearning

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.